summaryrefslogtreecommitdiffstats
path: root/virt/kvm/arm
AgeCommit message (Collapse)AuthorFilesLines
2020-02-28Merge tag 'kvmarm-fixes-5.6-1' of ↵Paolo Bonzini2-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm fixes for 5.6, take #1 - Fix compilation on 32bit - Move VHE guest entry/exit into the VHE-specific entry code - Make sure all functions called by the non-VHE HYP code is tagged as __always_inline
2020-02-17kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()Mark Rutland1-2/+0
With VHE, running a vCPU always requires the sequence: 1. kvm_arm_vhe_guest_enter(); 2. kvm_vcpu_run_vhe(); 3. kvm_arm_vhe_guest_exit() ... and as we invoke this from the shared arm/arm64 KVM code, 32-bit arm has to provide stubs for all three functions. To simplify the common code, and make it easier to make further modifications to the arm64-specific portions in the near future, let's fold kvm_arm_vhe_guest_enter() and kvm_arm_vhe_guest_exit() into kvm_vcpu_run_vhe(). The 32-bit stubs for kvm_arm_vhe_guest_enter() and kvm_arm_vhe_guest_exit() are removed, as they are no longer used. The 32-bit stub for kvm_vcpu_run_vhe() is left as-is. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200210114757.2889-1-mark.rutland@arm.com
2020-02-12KVM: Disable preemption in kvm_get_running_vcpu()Marc Zyngier1-12/+0
Accessing a per-cpu variable only makes sense when preemption is disabled (and the kernel does check this when the right debug options are switched on). For kvm_get_running_vcpu(), it is fine to return the value after re-enabling preemption, as the preempt notifiers will make sure that this is kept consistent across task migration (the comment above the function hints at it, but lacks the crucial preemption management). While we're at it, move the comment from the ARM code, which explains why the whole thing works. Fixes: 7495e22bb165 ("KVM: Move running VCPU from ARM to common code"). Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/318984f6-bc36-33a3-abc6-bf2295974b06@huawei.com Message-id: <20200207163410.31276-1-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: arm/arm64: Fix up includes for trace.hJeremy Cline1-0/+1
Fedora kernel builds on armv7hl began failing recently because kvm_arm_exception_type and kvm_arm_exception_class were undeclared in trace.h. Add the missing include. Fixes: 0e20f5e25556 ("KVM: arm/arm64: Cleanup MMIO handling") Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200205134146.82678-1-jcline@redhat.com
2020-01-30Merge tag 'kvmarm-5.6' of ↵Paolo Bonzini9-132/+228
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for Linux 5.6 - Fix MMIO sign extension - Fix HYP VA tagging on tag space exhaustion - Fix PSTATE/CPSR handling when generating exception - Fix MMU notifier's advertizing of young pages - Fix poisoned page handling - Fix PMU SW event handling - Fix TVAL register access - Fix AArch32 external abort injection - Fix ITS unmapped collection handling - Various cleanups
2020-01-28KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integerAlexandru Elisei1-1/+2
According to the ARM ARM, registers CNT{P,V}_TVAL_EL0 have bits [63:32] RES0 [1]. When reading the register, the value is truncated to the least significant 32 bits [2], and on writes, TimerValue is treated as a signed 32-bit integer [1, 2]. When the guest behaves correctly and writes 32-bit values, treating TVAL as an unsigned 64 bit register works as expected. However, things start to break down when the guest writes larger values, because (u64)0x1_ffff_ffff = 8589934591. but (s32)0x1_ffff_ffff = -1, and the former will cause the timer interrupt to be asserted in the future, but the latter will cause it to be asserted now. Let's treat TVAL as a signed 32-bit register on writes, to match the behaviour described in the architecture, and the behaviour experimentally exhibited by the virtual timer on a non-vhe host. [1] Arm DDI 0487E.a, section D13.8.18 [2] Arm DDI 0487E.a, section D11.2.4 Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> [maz: replaced the read-side mask with lower_32_bits] Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: 8fa761624871 ("KVM: arm/arm64: arch_timer: Fix CNTP_TVAL calculation") Link: https://lore.kernel.org/r/20200127103652.2326-1-alexandru.elisei@arm.com
2020-01-28KVM: arm64: pmu: Only handle supported event countersEric Auger1-5/+5
Let the code never use unsupported event counters. Change kvm_pmu_handle_pmcr() to only reset supported counters and kvm_pmu_vcpu_reset() to only stop supported counters. Other actions are filtered on the supported counters in kvm/sysregs.c Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200124142535.29386-5-eric.auger@redhat.com
2020-01-28KVM: arm64: pmu: Fix chained SW_INCR countersEric Auger1-13/+30
At the moment a SW_INCR counter always overflows on 32-bit boundary, independently on whether the n+1th counter is programmed as CHAIN. Check whether the SW_INCR counter is a 64b counter and if so, implement the 64b logic. Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200124142535.29386-4-eric.auger@redhat.com
2020-01-28KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabledEric Auger1-29/+33
At the moment we update the chain bitmap on type setting. This does not take into account the enable state of the odd register. Let's make sure a counter is never considered as chained if the high counter is disabled. We recompute the chain state on enable/disable and type changes. Also let create_perf_event() use the chain bitmap and not use kvm_pmu_idx_has_chain_evtype(). Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200124142535.29386-3-eric.auger@redhat.com
2020-01-28KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unsetEric Auger1-0/+3
The specification says PMSWINC increments PMEVCNTR<n>_EL1 by 1 if PMEVCNTR<n>_EL0 is enabled and configured to count SW_INCR. For PMEVCNTR<n>_EL0 to be enabled, we need both PMCNTENSET to be set for the corresponding event counter but we also need the PMCR.E bit to be set. Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register") Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200124142535.29386-2-eric.auger@redhat.com
2020-01-27mm: thp: KVM: Explicitly check for THP when populating secondary MMUSean Christopherson1-7/+1
Add a helper, is_transparent_hugepage(), to explicitly check whether a compound page is a THP and use it when populating KVM's secondary MMU. The explicit check fixes a bug where a remapped compound page, e.g. for an XDP Rx socket, is mapped into a KVM guest and is mistaken for a THP, which results in KVM incorrectly creating a huge page in its secondary MMU. Fixes: 936a5fe6e6148 ("thp: kvm mmu transparent hugepage support") Reported-by: syzbot+c9d1fb51ac9d0d10c39d@syzkaller.appspotmail.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: Move running VCPU from ARM to common codePaolo Bonzini4-45/+7
For ring-based dirty log tracking, it will be more efficient to account writes during schedule-out or schedule-in to the currently running VCPU. We would like to do it even if the write doesn't use the current VCPU's address space, as is the case for cached writes (see commit 4e335d9e7ddb, "Revert "KVM: Support vCPU-based gfn->hva cache"", 2017-05-02). Therefore, add a mechanism to track the currently-loaded kvm_vcpu struct. There is already something similar in KVM/ARM; one important difference is that kvm_arch_vcpu_{load,put} have two callers in virt/kvm/kvm_main.c: we have to update both the architecture-independent vcpu_{load,put} and the preempt notifiers. Another change made in the process is to allow using kvm_get_running_vcpu() in preemptible code. This is allowed because preempt notifiers ensure that the value does not change even after the VCPU thread is migrated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: Drop kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit()Sean Christopherson1-5/+0
Remove kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() now that all arch specific implementations are nops. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: arm64: Free sve_state via arm specific hookSean Christopherson1-0/+2
Add an arm specific hook to free the arm64-only sve_state. Doing so eliminates the last functional code from kvm_arch_vcpu_uninit() across all architectures and paves the way for removing kvm_arch_vcpu_init() and kvm_arch_vcpu_uninit() entirely. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: ARM: Move all vcpu init code into kvm_arch_vcpu_create()Sean Christopherson1-14/+20
Fold init() into create() now that the two are called back-to-back by common KVM code (kvm_vcpu_init() calls kvm_arch_vcpu_init() as its last action, and kvm_vm_ioctl_create_vcpu() calls kvm_arch_vcpu_create() immediately thereafter). This paves the way for removing kvm_arch_vcpu_{un}init() entirely. Note, there is no associated unwinding in kvm_arch_vcpu_uninit() that needs to be relocated (to kvm_arch_vcpu_destroy()). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: Move vcpu alloc and init invocation to common codeSean Christopherson1-27/+2
Now that all architectures tightly couple vcpu allocation/free with the mandatory calls to kvm_{un}init_vcpu(), move the sequences verbatim to common KVM code. Move both allocation and initialization in a single patch to eliminate thrash in arch specific code. The bisection benefits of moving the two pieces in separate patches is marginal at best, whereas the odds of introducing a transient arch specific bug are non-zero. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: Introduce kvm_vcpu_destroy()Sean Christopherson1-1/+1
Add kvm_vcpu_destroy() and wire up all architectures to call the common function instead of their arch specific implementation. The common destruction function will be used by future patches to move allocation and initialization of vCPUs to common KVM code, i.e. to free resources that are allocated by arch agnostic code. No functional change intended. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: Add kvm_arch_vcpu_precreate() to handle pre-allocation issuesSean Christopherson1-10/+11
Add a pre-allocation arch hook to handle checks that are currently done by arch specific code prior to allocating the vCPU object. This paves the way for moving the allocation to common KVM code. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: arm: Drop kvm_arch_vcpu_free()Sean Christopherson1-7/+2
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-23KVM: arm: Make inject_abt32() inject an external abort insteadJames Morse1-3/+7
KVM's inject_abt64() injects an external-abort into an aarch64 guest. The KVM_CAP_ARM_INJECT_EXT_DABT is intended to do exactly this, but for an aarch32 guest inject_abt32() injects an implementation-defined exception, 'Lockdown fault'. Change this to external abort. For non-LPAE we now get the documented: | Unhandled fault: external abort on non-linefetch (0x008) at 0x9c800f00 and for LPAE: | Unhandled fault: synchronous external abort (0x210) at 0x9c800f00 Fixes: 74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection") Reported-by: Beata Michalska <beata.michalska@linaro.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200121123356.203000-3-james.morse@arm.com
2020-01-23KVM: arm: Fix DFSR setting for non-LPAE aarch32 guestsJames Morse1-3/+5
Beata reports that KVM_SET_VCPU_EVENTS doesn't inject the expected exception to a non-LPAE aarch32 guest. The host intends to inject DFSR.FS=0x14 "IMPLEMENTATION DEFINED fault (Lockdown fault)", but the guest receives DFSR.FS=0x04 "Fault on instruction cache maintenance". This fault is hooked by do_translation_fault() since ARMv6, which goes on to silently 'handle' the exception, and restart the faulting instruction. It turns out, when TTBCR.EAE is clear DFSR is split, and FS[4] has to shuffle up to DFSR[10]. As KVM only does this in one place, fix up the static values. We now get the expected: | Unhandled fault: lock abort (0x404) at 0x9c800f00 Fixes: 74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection") Reported-by: Beata Michalska <beata.michalska@linaro.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200121123356.203000-2-james.morse@arm.com
2020-01-23KVM: arm/arm64: Fix young bit from mmu notifierGavin Shan1-1/+2
kvm_test_age_hva() is called upon mmu_notifier_test_young(), but wrong address range has been passed to handle_hva_to_gpa(). With the wrong address range, no young bits will be checked in handle_hva_to_gpa(). It means zero is always returned from mmu_notifier_test_young(). This fixes the issue by passing correct address range to the underly function handle_hva_to_gpa(), so that the hardware young (access) bit will be visited. Fixes: 35307b9a5f7e ("arm/arm64: KVM: Implement Stage-2 page aging") Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200121055659.19560-1-gshan@redhat.com
2020-01-23KVM: arm/arm64: Cleanup MMIO handlingMarc Zyngier2-49/+22
Our MMIO handling is a bit odd, in the sense that it uses an intermediate per-vcpu structure to store the various decoded information that describe the access. But the same information is readily available in the HSR/ESR_EL2 field, and we actually use this field to populate the structure. Let's simplify the whole thing by getting rid of the superfluous structure and save a (tiny) bit of space in the vcpu structure. [32bit fix courtesy of Olof Johansson <olof@lixom.net>] Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-01-19KVM: arm/arm64: vgic: Drop the kvm_vgic_register_mmio_region()Zenghui Yu1-5/+0
kvm_vgic_register_mmio_region() was introduced in commit 4493b1c4866a ("KVM: arm/arm64: vgic-new: Add MMIO handling framework") but never used, and even never implemented. Remove it to avoid confusing readers. Reported-by: Haibin Wang <wanghaibin.wang@huawei.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200119090604.398-1-yuzenghui@huawei.com
2020-01-19KVM: arm/arm64: vgic-its: Properly check the unmapped coll in DISCARD handlerZenghui Yu1-2/+1
Discard is supposed to fail if the collection is not mapped to any target redistributor. We currently check if the collection is mapped by "ite->collection" but this is incomplete (e.g., mapping a LPI to an unmapped collection also results in a non NULL ite->collection). What actually needs to be checked is its_is_collection_mapped(), let's turn to it. Also take this chance to remove an extra blank line. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20200114112212.1411-1-yuzenghui@huawei.com
2020-01-19KVM: arm/arm64: Correct AArch32 SPSR on exception entryMark Rutland1-3/+3
Confusingly, there are three SPSR layouts that a kernel may need to deal with: (1) An AArch64 SPSR_ELx view of an AArch64 pstate (2) An AArch64 SPSR_ELx view of an AArch32 pstate (3) An AArch32 SPSR_* view of an AArch32 pstate When the KVM AArch32 support code deals with SPSR_{EL2,HYP}, it's either dealing with #2 or #3 consistently. On arm64 the PSR_AA32_* definitions match the AArch64 SPSR_ELx view, and on arm the PSR_AA32_* definitions match the AArch32 SPSR_* view. However, when we inject an exception into an AArch32 guest, we have to synthesize the AArch32 SPSR_* that the guest will see. Thus, an AArch64 host needs to synthesize layout #3 from layout #2. This patch adds a new host_spsr_to_spsr32() helper for this, and makes use of it in the KVM AArch32 support code. For arm64 we need to shuffle the DIT bit around, and remove the SS bit, while for arm we can use the value as-is. I've open-coded the bit manipulation for now to avoid having to rework the existing PSR_* definitions into PSR64_AA32_* and PSR32_AA32_* definitions. I hope to perform a more thorough refactoring in future so that we can handle pstate view manipulation more consistently across the kernel tree. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200108134324.46500-4-mark.rutland@arm.com
2020-01-19KVM: arm/arm64: Correct CPSR on exception entryMark Rutland1-10/+101
When KVM injects an exception into a guest, it generates the CPSR value from scratch, configuring CPSR.{M,A,I,T,E}, and setting all other bits to zero. This isn't correct, as the architecture specifies that some CPSR bits are (conditionally) cleared or set upon an exception, and others are unchanged from the original context. This patch adds logic to match the architectural behaviour. To make this simple to follow/audit/extend, documentation references are provided, and bits are configured in order of their layout in SPSR_EL2. This layout can be seen in the diagram on ARM DDI 0487E.a page C5-426. Note that this code is used by both arm and arm64, and is intended to fuction with the SPSR_EL2 and SPSR_HYP layouts. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200108134324.46500-3-mark.rutland@arm.com
2020-01-19KVM: arm/arm64: Re-check VMA on detecting a poisoned pageJames Morse1-11/+9
When we check for a poisoned page, we use the VMA to tell userspace about the looming disaster. But we pass a pointer to this VMA after having released the mmap_sem, which isn't a good idea. Instead, stash the shift value that goes with this pfn while we are holding the mmap_sem. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191211165651.7889-3-maz@kernel.org Link: https://lore.kernel.org/r/20191217123809.197392-1-james.morse@arm.com
2020-01-19KVM: arm: Remove duplicate includeYueHaibing1-2/+0
Remove duplicate header which is included twice. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20191113014045.15276-1-yuehaibing@huawei.com
2020-01-19KVM: ARM: Call hyp_cpu_pm_exit at the right placeShannon Zhao1-1/+1
It doesn't needs to call hyp_cpu_pm_exit() in init_hyp_mode() when some error occurs. hyp_cpu_pm_exit() only needs to be called in kvm_arch_init() if init_subsystems() fails. So move hyp_cpu_pm_exit() out from teardown_hyp_mode() and call it directly in kvm_arch_init(). Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1575272531-3204-1-git-send-email-shannon.zhao@linux.alibaba.com
2020-01-19KVM: arm/arm64: vgic: Handle GICR_PENDBASER.PTZ filed as RAZZenghui Yu1-1/+4
Although guest will hardly read and use the PTZ (Pending Table Zero) bit in GICR_PENDBASER, let us emulate the architecture strictly. As per IHI 0069E 9.11.30, PTZ field is WO, and reads as 0. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20191220111833.1422-1-yuzenghui@huawei.com
2020-01-19KVM: arm/arm64: vgic-its: Fix restoration of unmapped collectionsEric Auger1-1/+2
Saving/restoring an unmapped collection is a valid scenario. For example this happens if a MAPTI command was sent, featuring an unmapped collection. At the moment the CTE fails to be restored. Only compare against the number of online vcpus if the rdist base is set. Fixes: ea1ad53e1e31a ("KVM: arm64: vgic-its: Collection table save/restore") Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20191213094237.19627-1-eric.auger@redhat.com
2020-01-19KVM: arm64: Only sign-extend MMIO up to register widthChristoffer Dall1-0/+6
On AArch64 you can do a sign-extended load to either a 32-bit or 64-bit register, and we should only sign extend the register up to the width of the register as specified in the operation (by using the 32-bit Wn or 64-bit Xn register specifier). As it turns out, the architecture provides this decoding information in the SF ("Sixty-Four" -- how cute...) bit. Let's take advantage of this with the usual 32-bit/64-bit header file dance and do the right thing on AArch64 hosts. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191212195055.5541-1-christoffer.dall@arm.com
2019-12-18Merge tag 'kvmarm-fixes-5.5-1' of ↵Paolo Bonzini13-163/+324
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm fixes for .5.5, take #1 - Fix uninitialised sysreg accessor - Fix handling of demand-paged device mappings - Stop spamming the console on IMPDEF sysregs - Relax mappings of writable memslots - Assorted cleanups
2019-12-12KVM: arm/arm64: Properly handle faulting of device mappingsMarc Zyngier1-4/+17
A device mapping is normally always mapped at Stage-2, since there is very little gain in having it faulted in. Nonetheless, it is possible to end-up in a situation where the device mapping has been removed from Stage-2 (userspace munmaped the VFIO region, and the MMU notifier did its job), but present in a userspace mapping (userpace has mapped it back at the same address). In such a situation, the device mapping will be demand-paged as the guest performs memory accesses. This requires to be careful when dealing with mapping size, cache management, and to handle potential execution of a device mapping. Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191211165651.7889-2-maz@kernel.org
2019-12-06KVM: arm/arm64: Remove excessive permission check in ↵Jia He1-9/+0
kvm_arch_prepare_memory_region In kvm_arch_prepare_memory_region, arm kvm regards the memory region as writable if the flag has no KVM_MEM_READONLY, and the vm is readonly if !VM_WRITE. But there is common usage for setting kvm memory region as follows: e.g. qemu side (see the PROT_NONE flag) 1. mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); memory_region_init_ram_ptr() 2. re mmap the above area with read/write authority. Such example is used in virtio-fs qemu codes which hasn't been upstreamed [1]. But seems we can't forbid this example. Without this patch, it will cause an EPERM during kvm_set_memory_region() and cause qemu boot crash. As told by Ard, "the underlying assumption is incorrect, i.e., that the value of vm_flags at this point in time defines how the VMA is used during its lifetime. There may be other cases where a VMA is created with VM_READ vm_flags that are changed to VM_READ|VM_WRITE later, and we are currently rejecting this use case as well." [1] https://gitlab.com/virtio-fs/qemu/blob/5a356e/hw/virtio/vhost-user-fs.c#L488 Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191206020802.196108-1-justin.he@arm.com
2019-12-06KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in ↵Miaohe Lin1-15/+4
kvm_vgic_create() Use wrapper function lock_all_vcpus()/unlock_all_vcpus() in kvm_vgic_create() to remove duplicated code dealing with locking and unlocking all vcpus in a vm. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/1575081918-11401-1-git-send-email-linmiaohe@huawei.com
2019-12-06KVM: arm/arm64: vgic: Fix potential double free dist->spis in ↵Miaohe Lin1-0/+1
__kvm_vgic_destroy() In kvm_vgic_dist_init() called from kvm_vgic_map_resources(), if dist->vgic_model is invalid, dist->spis will be freed without set dist->spis = NULL. And in vgicv2 resources clean up path, __kvm_vgic_destroy() will be called to free allocated resources. And dist->spis will be freed again in clean up chain because we forget to set dist->spis = NULL in kvm_vgic_dist_init() failed path. So double free would happen. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/1574923128-19956-1-git-send-email-linmiaohe@huawei.com
2019-12-06KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()Miaohe Lin1-2/+2
As arg dummy is not really needed, there's no need to pass NULL when calling cpu_init_hyp_mode(). So clean it up. Fixes: 67f691976662 ("arm64: kvm: allows kvm cpu hotplug") Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1574320559-5662-1-git-send-email-linmiaohe@huawei.com
2019-11-08Merge remote-tracking branch 'kvmarm/misc-5.5' into kvmarm/nextMarc Zyngier8-50/+55
2019-11-08KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI ↵Marc Zyngier1-2/+2
injection Just like we do for WFE trapping, it can be useful to turn off WFI trapping when the physical CPU is not oversubscribed (that is, the vcpu is the only runnable process on this CPU) *and* that we're using direct injection of interrupts. The conditions are reevaluated on each vcpu_load(), ensuring that we don't switch to this mode on a busy system. On a GICv4 system, this has the effect of reducing the generation of doorbell interrupts to zero when the right conditions are met, which is a huge improvement over the current situation (where the doorbells are screaming if the CPU ever hits a blocking WFI). Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191107160412.30301-3-maz@kernel.org
2019-11-08KVM: vgic-v4: Track the number of VLPIs per vcpuMarc Zyngier3-0/+6
In order to find out whether a vcpu is likely to be the target of VLPIs (and to further optimize the way we deal with those), let's track the number of VLPIs a vcpu can receive. This gets implemented with an atomic variable that gets incremented or decremented on map, unmap and move of a VLPI. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191107160412.30301-2-maz@kernel.org
2019-11-07KVM: arm/arm64: Let the timer expire in hardirq context on RTThomas Gleixner1-4/+4
The timers are canceled from an preempt-notifier which is invoked with disabled preemption which is not allowed on PREEMPT_RT. The timer callback is short so in could be invoked in hard-IRQ context on -RT. Let the timer expire on hard-IRQ context even on -RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Julien Grall <julien.grall@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191107095424.16647-1-bigeasy@linutronix.de
2019-10-29KVM: arm/arm64: vgic: Don't rely on the wrong pending tableZenghui Yu1-3/+3
It's possible that two LPIs locate in the same "byte_offset" but target two different vcpus, where their pending status are indicated by two different pending tables. In such a scenario, using last_byte_offset optimization will lead KVM relying on the wrong pending table entry. Let us use last_ptr instead, which can be treated as a byte index into a pending table and also, can be vcpu specific. Fixes: 280771252c1b ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES") Cc: stable@vger.kernel.org Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20191029071919.177-4-yuzenghui@huawei.com
2019-10-29KVM: arm/arm64: vgic: Fix some comments typoZenghui Yu2-2/+2
Fix various comments, including wrong function names, grammar mistakes and specification references. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com
2019-10-28KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/putMarc Zyngier5-39/+38
When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-10-24Merge remote-tracking branch 'kvmarm/kvm-arm64/stolen-time' into ↵Marc Zyngier4-82/+215
kvmarm-master/next
2019-10-21KVM: arm64: Provide VCPU attributes for stolen timeSteven Price1-0/+59
Allow user space to inform the KVM host where in the physical memory map the paravirtualized time structures should be located. User space can set an attribute on the VCPU providing the IPA base address of the stolen time structure for that VCPU. This must be repeated for every VCPU in the VM. The address is given in terms of the physical address visible to the guest and must be 64 byte aligned. The guest will discover the address via a hypercall. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Support stolen time reporting via shared structureSteven Price3-0/+69
Implement the service call for configuring a shared structure between a VCPU and the hypervisor in which the hypervisor can write the time stolen from the VCPU's execution time by other tasks on the host. User space allocates memory which is placed at an IPA also chosen by user space. The hypervisor then updates the shared structure using kvm_put_guest() to ensure single copy atomicity of the 64-bit value reporting the stolen time in nanoseconds. Whenever stolen time is enabled by the guest, the stolen time counter is reset. The stolen time itself is retrieved from the sched_info structure maintained by the Linux scheduler code. We enable SCHEDSTATS when selecting KVM Kconfig to ensure this value is meaningful. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Implement PV_TIME_FEATURES callSteven Price2-1/+27
This provides a mechanism for querying which paravirtualized time features are available in this hypervisor. Also add the header file which defines the ABI for the paravirtualized time features we're about to add. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>