summaryrefslogtreecommitdiffstats
path: root/virt/kvm/arm/vgic/vgic.c
AgeCommit message (Collapse)AuthorFilesLines
2019-10-28KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/putMarc Zyngier1-4/+0
When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-09-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-9/+17
Pull KVM updates from Paolo Bonzini: "s390: - ioctl hardening - selftests ARM: - ITS translation cache - support for 512 vCPUs - various cleanups and bugfixes PPC: - various minor fixes and preparation x86: - bugfixes all over the place (posted interrupts, SVM, emulation corner cases, blocked INIT) - some IPI optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (75 commits) KVM: X86: Use IPI shorthands in kvm guest when support KVM: x86: Fix INIT signal handling in various CPU states KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode KVM: VMX: Stop the preemption timer during vCPU reset KVM: LAPIC: Micro optimize IPI latency kvm: Nested KVM MMUs need PAE root too KVM: x86: set ctxt->have_exception in x86_decode_insn() KVM: x86: always stop emulation on page fault KVM: nVMX: trace nested VM-Enter failures detected by H/W KVM: nVMX: add tracepoint for failed nested VM-Enter x86: KVM: svm: Fix a check in nested_svm_vmrun() KVM: x86: Return to userspace with internal error on unexpected exit reason KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers doc: kvm: Fix return description of KVM_SET_MSRS KVM: X86: Tune PLE Window tracepoint KVM: VMX: Change ple_window type to unsigned int KVM: X86: Remove tailing newline for tracepoints KVM: X86: Trace vcpu_id for vmexit KVM: x86: Manually calculate reserved bits when loading PDPTRS ...
2019-08-28Merge tag 'arm64-fixes' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Hot on the heels of our last set of fixes are a few more for -rc7. Two of them are fixing issues with our virtual interrupt controller implementation in KVM/arm, while the other is a longstanding but straightforward kallsyms fix which was been acked by Masami and resolves an initialisation failure in kprobes observed on arm64. - Fix GICv2 emulation bug (KVM) - Fix deadlock in virtual GIC interrupt injection code (KVM) - Fix kprobes blacklist init failure due to broken kallsyms lookup" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol
2019-08-27KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is longHeyi Guo1-0/+7
If the ap_list is longer than 256 entries, merge_final() in list_sort() will call the comparison callback with the same element twice, causing a deadlock in vgic_irq_cmp(). Fix it by returning early when irqa == irqb. Cc: stable@vger.kernel.org # 4.7+ Fixes: 8e4447457965 ("KVM: arm/arm64: vgic-new: Add IRQ sorting") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Heyi Guo <guoheyi@huawei.com> [maz: massaged commit log and patch, added Fixes and Cc-stable] Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-18KVM: arm/arm64: vgic: Add __vgic_put_lpi_locked primitiveMarc Zyngier1-9/+17
Our LPI translation cache needs to be able to drop the refcount on an LPI whilst already holding the lpi_list_lock. Let's add a new primitive for this. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-05KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to blockMarc Zyngier1-0/+11
Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner1-12/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03KVM: arm/arm64: vgic-v3: Retire pending interrupts on disabling LPIsMarc Zyngier1-0/+21
When disabling LPIs (for example on reset) at the redistributor level, it is expected that LPIs that was pending in the CPU interface are eventually retired. Currently, this is not what is happening, and these LPIs will stay in the ap_list, eventually being acknowledged by the vcpu (which didn't quite expect this behaviour). The fix is thus to retire these LPIs from the list of pending interrupts as we disable LPIs. Reported-by: Heyi Guo <guoheyi@huawei.com> Tested-by: Heyi Guo <guoheyi@huawei.com> Fixes: 0e4e82f154e3 ("KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controller") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-19arm64: KVM: Always set ICH_HCR_EL2.EN if GICv4 is enabledMarc Zyngier1-4/+10
The normal interrupt flow is not to enable the vgic when no virtual interrupt is to be injected (i.e. the LRs are empty). But when a guest is likely to use GICv4 for LPIs, we absolutely need to switch it on at all times. Otherwise, VLPIs only get delivered when there is something in the LRs, which doesn't happen very often. Reported-by: Nianyao Tang <tangnianyao@huawei.com> Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-01-24KVM: arm/arm64: vgic: Make vgic_cpu->ap_list_lock a raw_spinlockJulien Thierry1-18/+19
vgic_cpu->ap_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-01-24KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlockJulien Thierry1-5/+5
vgic_dist->lpi_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2019-01-24KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlockJulien Thierry1-36/+35
vgic_irq->irq_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
2018-12-26Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest RCU changes in this cycle were: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) rcutorture: Don't do busted forward-progress testing rcutorture: Use 100ms buckets for forward-progress callback histograms rcutorture: Recover from OOM during forward-progress tests rcutorture: Print forward-progress test age upon failure rcutorture: Print time since GP end upon forward-progress failure rcutorture: Print histogram of CB invocation at OOM time rcutorture: Print GP age upon forward-progress failure rcu: Print per-CPU callback counts for forward-progress failures rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings rcutorture: Dump grace-period diagnostics upon forward-progress OOM rcutorture: Prepare for asynchronous access to rcu_fwd_startat torture: Remove unnecessary "ret" variables rcutorture: Affinity forward-progress test to avoid housekeeping CPUs rcutorture: Break up too-long rcu_torture_fwd_prog() function rcutorture: Remove cbflood facility torture: Bring any extra CPUs online during kernel startup rcutorture: Add call_rcu() flooding forward-progress tests rcutorture/formal: Replace synchronize_sched() with synchronize_rcu() tools/kernel.h: Replace synchronize_sched() with synchronize_rcu() net/decnet: Replace rcu_barrier_bh() with rcu_barrier() ...
2018-12-19KVM: arm/arm64: vgic: Consider priority and active state for pending irqChristoffer Dall1-1/+6
When checking if there are any pending IRQs for the VM, consider the active state and priority of the IRQs as well. Otherwise we could be continuously scheduling a guest hypervisor without it seeing an IRQ. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq()Gustavo A. R. Silva1-1/+1
When using the nospec API, it should be taken into account that: "...if the CPU speculates past the bounds check then * array_index_nospec() will clamp the index within the range of [0, * size)." The above is part of the header for macro array_index_nospec() in linux/nospec.h Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic: /* SGIs and PPIs */ if (intid <= VGIC_MAX_PRIVATE) return &vcpu->arch.vgic_cpu.private_irqs[intid]; /* SPIs */ if (intid <= VGIC_MAX_SPI) return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS]; are valid values for intid. Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1 and VGIC_MAX_SPI + 1 as arguments for its parameter size. Fixes: 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> [dropped the SPI part which was fixed separately] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximumMarc Zyngier1-2/+2
SPIs should be checked against the VMs specific configuration, and not the architectural maximum. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-11-12KVM: arm/arm64: vgic: Replace spin_is_locked() with lockdepLance Roy1-6/+6
lockdep_assert_held() is better suited to checking locking requirements, since it only checks if the current thread holds the lock regardless of whether someone else does. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: <kvmarm@lists.cs.columbia.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com>
2018-08-12KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabledJia He1-6/+7
kvm_vgic_sync_hwstate is only called with IRQ being disabled. There is thus no need to call spin_lock_irqsave/restore in vgic_fold_lr_state and vgic_prune_ap_list. This patch replace them with the non irq-safe version. Signed-off-by: Jia He <jia.he@hxt-semitech.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-08-12KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.hJia He1-6/+0
DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3, so let's move it to vgic.h Signed-off-by: Jia He <jia.he@hxt-semitech.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-15KVM: arm/arm64: Properly protect VGIC locks from IRQsAndre Przywara1-8/+14
As Jan reported [1], lockdep complains about the VGIC not being bullet proof. This seems to be due to two issues: - When commit 006df0f34930 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context") promoted irq_lock and ap_list_lock to _irqsave, we forgot two instances of irq_lock. lockdeps seems to pick those up. - If a lock is _irqsave, any other locks we take inside them should be _irqsafe as well. So the lpi_list_lock needs to be promoted also. This fixes both issues by simply making the remaining instances of those locks _irqsave. One irq_lock is addressed in a separate patch, to simplify backporting. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/575718.html Cc: stable@vger.kernel.org Fixes: 006df0f34930 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context") Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-05Merge tag 'kvmarm-fixes-for-4.17-2' of ↵Radim Krčmář1-23/+7
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/arm fixes for 4.17, take #2 - Fix proxying of GICv2 CPU interface accesses - Fix crash when switching to BE - Track source vcpu git GICv2 SGIs - Fix an outdated bit of documentation
2018-04-27rMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+8
Pull KVM fixes from Radim Krčmář: "ARM: - PSCI selection API, a leftover from 4.16 (for stable) - Kick vcpu on active interrupt affinity change - Plug a VMID allocation race on oversubscribed systems - Silence debug messages - Update Christoffer's email address (linaro -> arm) x86: - Expose userspace-relevant bits of a newly added feature - Fix TLB flushing on VMX with VPID, but without EPT" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use arm/arm64: KVM: Add PSCI version selection API KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration arm64: KVM: Demote SVE and LORegion warnings to debug only MAINTAINERS: Update e-mail address for Christoffer Dall KVM: arm/arm64: Close VMID generation race
2018-04-27KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGIMarc Zyngier1-23/+7
Now that we make sure we don't inject multiple instances of the same GICv2 SGI at the same time, we've made another bug more obvious: If we exit with an active SGI, we completely lose track of which vcpu it came from. On the next entry, we restore it with 0 as a source, and if that wasn't the right one, too bad. While this doesn't seem to trouble GIC-400, the architectural model gets offended and doesn't deactivate the interrupt on EOI. Another connected issue is that we will happilly make pending an interrupt from another vcpu, overriding the above zero with something that is just as inconsistent. Don't do that. The final issue is that we signal a maintenance interrupt when no pending interrupts are present in the LR. Assuming we've fixed the two issues above, we end-up in a situation where we keep exiting as soon as we've reached the active state, and not be able to inject the following pending. The fix comes in 3 parts: - GICv2 SGIs have their source vcpu saved if they are active on exit, and restored on entry - Multi-SGIs cannot go via the Pending+Active state, as this would corrupt the source field - Multi-SGIs are converted to using MI on EOI instead of NPIE Fixes: 16ca6a607d84bef0 ("KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid") Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-04-26KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()Mark Rutland1-4/+10
It's possible for userspace to control intid. Sanitize intid when using it as an array index. At the same time, sort the includes when adding <linux/nospec.h>. Found by smatch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-17KVM: arm/arm64: vgic: Kick new VCPU on interrupt migrationAndre Przywara1-0/+8
When vgic_prune_ap_list() finds an interrupt that needs to be migrated to a new VCPU, we should notify this VCPU of the pending interrupt, since it requires immediate action. Kick this VCPU once we have added the new IRQ to the list, but only after dropping the locks. Reported-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19Merge tag 'kvm-arm-fixes-for-v4.16-2' into HEADMarc Zyngier1-14/+73
Resolve conflicts with current mainline
2018-03-19KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQsChristoffer Dall1-5/+4
We can finally get completely rid of any calls to the VGICv3 save/restore functions when the AP lists are empty on VHE systems. This requires carefully factoring out trap configuration from saving and restoring state, and carefully choosing what to do on the VHE and non-VHE path. One of the challenges is that we cannot save/restore the VMCR lazily because we can only write the VMCR when ICC_SRE_EL1.SRE is cleared when emulating a GICv2-on-GICv3, since otherwise all Group-0 interrupts end up being delivered as FIQ. To solve this problem, and still provide fast performance in the fast path of exiting a VM when no interrupts are pending (which also optimized the latency for actually delivering virtual interrupts coming from physical interrupts), we orchestrate a dance of only doing the activate/deactivate traps in vgic load/put for VHE systems (which can have ICC_SRE_EL1.SRE cleared when running in the host), and doing the configuration on every round-trip on non-VHE systems. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on VHEChristoffer Dall1-2/+19
Just like we can program the GICv2 hypervisor control interface directly from the core vgic code, we can do the same for the GICv3 hypervisor control interface on VHE systems. We do this by simply calling the save/restore functions when we have VHE and we can then get rid of the save/restore function calls from the VHE world switch function. One caveat is that we now write GICv3 system register state before the potential early exit path in the run loop, and because we sync back state in the early exit path, we have to ensure that we read a consistent GIC state from the sync path, even though we have never actually run the guest with the newly written GIC state. We solve this by inserting an ISB in the early exit path. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-19KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC codeChristoffer Dall1-1/+18
We can program the GICv2 hypervisor control interface logic directly from the core vgic code and can instead do the save/restore directly from the flush/sync functions, which can lead to a number of future optimizations. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintidMarc Zyngier1-14/+47
The vgic code is trying to be clever when injecting GICv2 SGIs, and will happily populate LRs with the same interrupt number if they come from multiple vcpus (after all, they are distinct interrupt sources). Unfortunately, this is against the letter of the architecture, and the GICv2 architecture spec says "Each valid interrupt stored in the List registers must have a unique VirtualID for that virtual CPU interface.". GICv3 has similar (although slightly ambiguous) restrictions. This results in guests locking up when using GICv2-on-GICv3, for example. The obvious fix is to stop trying so hard, and inject a single vcpu per SGI per guest entry. After all, pending SGIs with multiple source vcpus are pretty rare, and are mostly seen in scenario where the physical CPUs are severely overcomitted. But as we now only inject a single instance of a multi-source SGI per vcpu entry, we may delay those interrupts for longer than strictly necessary, and run the risk of injecting lower priority interrupts in the meantime. In order to address this, we adopt a three stage strategy: - If we encounter a multi-source SGI in the AP list while computing its depth, we force the list to be sorted - When populating the LRs, we prevent the injection of any interrupt of lower priority than that of the first multi-source SGI we've injected. - Finally, the injection of a multi-source SGI triggers the request of a maintenance interrupt when there will be no pending interrupt in the LRs (HCR_NPIE). At the point where the last pending interrupt in the LRs switches from Pending to Active, the maintenance interrupt will be delivered, allowing us to add the remaining SGIs using the same process. Cc: stable@vger.kernel.org Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Acked-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-14KVM: arm/arm64: Reset mapped IRQs on VM resetChristoffer Dall1-0/+26
We currently don't allow resetting mapped IRQs from userspace, because their state is controlled by the hardware. But we do need to reset the state when the VM is reset, so we provide a function for the 'owner' of the mapped interrupt to reset the interrupt state. Currently only the timer uses mapped interrupts, so we call this function from the timer reset logic. Cc: stable@vger.kernel.org Fixes: 4c60e360d6df ("KVM: arm/arm64: Provide a get_input_level for the arch timer") Signed-off-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-01-02KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQsChristoffer Dall1-0/+7
For mapped IRQs (with the HW bit set in the LR) we have to follow some rules of the architecture. One of these rules is that VM must not be allowed to deactivate a virtual interrupt with the HW bit set unless the physical interrupt is also active. This works fine when injecting mapped interrupts, because we leave it up to the injector to either set EOImode==1 or manually set the active state of the physical interrupt. However, the guest can set virtual interrupt to be pending or active by writing to the virtual distributor, which could lead to deactivating a virtual interrupt with the HW bit set without the physical interrupt being active. We could set the physical interrupt to active whenever we are about to enter the VM with a HW interrupt either pending or active, but that would be really slow, especially on GICv2. So we take the long way around and do the hard work when needed, which is expected to be extremely rare. When the VM sets the pending state for a HW interrupt on the virtual distributor we set the active state on the physical distributor, because the virtual interrupt can become active and then the guest can deactivate it. When the VM clears the pending state we also clear it on the physical side, because the injector might otherwise raise the interrupt. We also clear the physical active state when the virtual interrupt is not active, since otherwise a SPEND/CPEND sequence from the guest would prevent signaling of future interrupts. Changing the state of mapped interrupts from userspace is not supported, and it's expected that userspace unmaps devices from VFIO before attempting to set the interrupt state, because the interrupt state is driven by hardware. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02KVM: arm/arm64: Support a vgic interrupt line level sample functionChristoffer Dall1-4/+9
The GIC sometimes need to sample the physical line of a mapped interrupt. As we know this to be notoriously slow, provide a callback function for devices (such as the timer) which can do this much faster than talking to the distributor, for example by comparing a few in-memory values. Fall back to the good old method of poking the physical GIC if no callback is provided. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2018-01-02KVM: arm/arm64: vgic: Support level-triggered mapped interruptsChristoffer Dall1-0/+23
Level-triggered mapped IRQs are special because we only observe rising edges as input to the VGIC, and we don't set the EOI flag and therefore are not told when the level goes down, so that we can re-queue a new interrupt when the level goes up. One way to solve this problem is to side-step the logic of the VGIC and special case the validation in the injection path, but it has the unfortunate drawback of having to peak into the physical GIC state whenever we want to know if the interrupt is pending on the virtual distributor. Instead, we can maintain the current semantics of a level triggered interrupt by sort of treating it as an edge-triggered interrupt, following from the fact that we only observe an asserting edge. This requires us to be a bit careful when populating the LRs and when folding the state back in though: * We lower the line level when populating the LR, so that when subsequently observing an asserting edge, the VGIC will do the right thing. * If the guest never acked the interrupt while running (for example if it had masked interrupts at the CPU level while running), we have to preserve the pending state of the LR and move it back to the line_level field of the struct irq when folding LR state. If the guest never acked the interrupt while running, but changed the device state and lowered the line (again with interrupts masked) then we need to observe this change in the line_level. Both of the above situations are solved by sampling the physical line and set the line level when folding the LR back. * Finally, if the guest never acked the interrupt while running and sampling the line reveals that the device state has changed and the line has been lowered, we must clear the physical active state, since we will otherwise never be told when the interrupt becomes asserted again. This has the added benefit of making the timer optimization patches (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a bit simpler, because the timer code doesn't have to clear the active state on the sync anymore. It also potentially improves the performance of the timer implementation because the GIC knows the state or the LR and only needs to clear the active state when the pending bit in the LR is still set, where the timer has to always clear it when returning from running the guest with an injected timer interrupt. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-01KVM: arm/arm64: Fix spinlock acquisition in vgic_set_ownerMarc Zyngier1-2/+3
vgic_set_owner acquires the irq lock without disabling interrupts, resulting in a lockdep splat (an interrupt could fire and result in the same lock being taken if the same virtual irq is to be injected). In practice, it is almost impossible to trigger this bug, but better safe than sorry. Convert the lock acquisition to a spin_lock_irqsave() and keep lockdep happy. Reported-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-11-29KVM: arm/arm64: VGIC: extend !vgic_is_initialized guardAndre Przywara1-1/+2
Commit f39d16cbabf9 ("KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initialized") introduced a check whether the VGIC has been initialized before accessing the spinlock and the VGIC data structure. However the vgic_get_irq() call in the variable declaration sneaked through the net, so lets make sure that this also gets called only after we actually allocated the arrays this function accesses. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-11-10KVM: arm/arm64: GICv4: Hook vPE scheduling into vgic flush/syncMarc Zyngier1-0/+4
The redistributor needs to be told which vPE is about to be run, and tells us whether there is any pending VLPI on exit. Let's add the scheduling calls to the vgic flush/sync functions, allowing the VLPIs to be delivered to the guest. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-11-10KVM: arm/arm64: GICv4: Use pending_last as a scheduling hintMarc Zyngier1-0/+3
When a vPE exits, the pending_last flag is set when there are pending VLPIs stored in the pending table. Similarily, this flag will be set when a doorbell interrupt fires, as it indicates the same condition. Let's update kvm_vgic_vcpu_pending_irq() to account for that flag as well, making a vcpu runnable when set. Acked-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-11-06KVM: arm/arm64: vgic: restructure kvm_vgic_(un)map_phys_irqEric Auger1-15/+45
We want to reuse the core of the map/unmap functions for IRQ forwarding. Let's move the computation of the hwirq in kvm_vgic_map_phys_irq and pass the linux IRQ as parameter. the host_irq is added to struct vgic_irq. We introduce kvm_vgic_map/unmap_irq which take a struct vgic_irq handle as a parameter. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-11-06KVM: arm/arm64: Support calling vgic_update_irq_pending from irq contextChristoffer Dall1-23/+36
We are about to optimize our timer handling logic which involves injecting irqs to the vgic directly from the irq handler. Unfortunately, the injection path can take any AP list lock and irq lock and we must therefore make sure to use spin_lock_irqsave where ever interrupts are enabled and we are taking any of those locks, to avoid deadlocking between process context and the ISR. This changes a lot of the VGIC code, but the good news are that the changes are mostly mechanical. Acked-by: Marc Zyngier <marc,zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-11-06KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initializedChristoffer Dall1-0/+3
If the vgic is not initialized, don't try to grab its spinlocks or traverse its data structures. This is important because we soon have to start considering the active state of a virtual interrupts when doing vcpu_load, which may happen early on before the vgic is initialized. Signed-off-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-06-08KVM: arm/arm64: Disallow userspace control of in-kernel IRQ linesChristoffer Dall1-4/+11
When injecting an IRQ to the VGIC, you now have to present an owner token for that IRQ line to show that you are the owner of that line. IRQ lines driven from userspace or via an irqfd do not have an owner and will simply pass a NULL pointer. Also get rid of the unused kvm_vgic_inject_mapped_irq prototype. Signed-off-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-06-08KVM: arm/arm64: Introduce an allocator for in-kernel irq linesChristoffer Dall1-0/+33
Having multiple devices being able to signal the same interrupt line is very confusing and almost certainly guarantees a configuration error. Therefore, introduce a very simple allocator which allows a device to claim an interrupt line from the vgic for a given VM. Signed-off-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-06-04KVM: arm/arm64: use vcpu requests for irq injectionAndrew Jones1-2/+7
Don't use request-less VCPU kicks when injecting IRQs, as a VCPU kick meant to trigger the interrupt injection could be sent while the VCPU is outside guest mode, which means no IPI is sent, and after it has called kvm_vgic_flush_hwstate(), meaning it won't see the updated GIC state until its next exit some time later for some other reason. The receiving VCPU only needs to check this request in VCPU RUN to handle it. By checking it, if it's pending, a memory barrier will be issued that ensures all state is visible. See "Ensuring Requests Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst Signed-off-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-05-23KVM: arm/arm64: Simplify active_change_prepare and plug raceChristoffer Dall1-5/+6
We don't need to stop a specific VCPU when changing the active state, because private IRQs can only be modified by a running VCPU for the VCPU itself and it is therefore already stopped. However, it is also possible for two VCPUs to be modifying the active state of SPIs at the same time, which can cause the thread being stuck in the loop that checks other VCPU threads for a potentially very long time, or to modify the active state of a running VCPU. Fix this by serializing all accesses to setting and clearing the active state of interrupts using the KVM mutex. Reported-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
2017-05-09Merge tag 'kvm-arm-for-v4.12-round2' of ↵Paolo Bonzini1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of KVM/ARM Changes for v4.12. Changes include: - A fix related to the 32-bit idmap stub - A fix to the bitmask used to deode the operands of an AArch32 CP instruction - We have moved the files shared between arch/arm/kvm and arch/arm64/kvm to virt/kvm/arm - We add support for saving/restoring the virtual ITS state to userspace
2017-05-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-16/+44
Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...
2017-05-04KVM: arm/arm64: Move shared files to virt/kvm/armChristoffer Dall1-1/+1
For some time now we have been having a lot of shared functionality between the arm and arm64 KVM support in arch/arm, which not only required a horrible inter-arch reference from the Makefile in arch/arm64/kvm, but also created confusion for newcomers to the code base, as was recently seen on the mailing list. Further, it causes confusion for things like cscope, which needs special attention to index specific shared files for arm64 from the arm tree. Move the shared files into virt/kvm/arm and move the trace points along with it. When moving the tracepoints we have to modify the way the vgic creates definitions of the trace points, so we take the chance to include the VGIC tracepoints in its very own special vgic trace.h file. Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09KVM: arm/arm64: vgic: Improve sync_hwstate performanceChristoffer Dall1-4/+6
There is no need to call any functions to fold LRs when we don't use any LRs and we don't need to mess with overflow flags, take spinlocks, or prune the AP list if the AP list is empty. Note: list_empty is a single atomic read (uses READ_ONCE) and can therefore check if a list is empty or not without the need to take the spinlock protecting the list. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-04-09KVM: arm/arm64: vgic: Don't check vgic_initialized in sync/flushChristoffer Dall1-6/+0
Now when we do an early init of the static parts of the VGIC data structures, we can do things like checking if the AP lists are empty directly without having to explicitly check if the vgic is initialized and reduce a bit of work in our critical path. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>