summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kvm
AgeCommit message (Collapse)AuthorFilesLines
2018-03-24Merge branch 'topic/ppc-kvm' into nextMichael Ellerman6-9/+512
This brings in two series from Paul, one of which touches KVM code and may need to be merged into the kvm-ppc tree to resolve conflicts.
2018-03-24KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend statePaul Mackerras1-7/+10
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where the contents of the TEXASR can get corrupted while a thread is in fake suspend state. The workaround is for the instruction emulation code to use the value saved at the most recent guest exit in real suspend mode. We achieve this by simply not saving the TEXASR into the vcpu struct on an exit in fake suspend state. We also have to take care to set the orig_texasr field only on guest exit in real suspend state. This also means that on guest entry in fake suspend state, TEXASR will be restored to the value it had on the last exit in real suspend state, effectively counteracting any hardware-caused corruption. This works because TEXASR may not be written in suspend state. With this, the guest might see the wrong values in TEXASR if it reads it while in suspend state, but will see the correct value in non-transactional state (e.g. after a treclaim), and treclaim will work correctly. With this workaround, the code will actually run slightly faster, and will operate correctly on systems without the TEXASR bug (since TEXASR may not be written in suspend state, and is only changed by failure recording, which will have already been done before we get into fake suspend state). Therefore these changes are not made subject to a CPU feature bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend modeSuraj Jitindar Singh1-4/+20
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where a treclaim performed in fake suspend mode can cause subsequent reads from the XER register to return inconsistent values for the SO (summary overflow) bit. The inconsistent SO bit state can potentially be observed on any thread in the core. We have to do the treclaim because that is the only way to get the thread out of suspend state (fake or real) and into non-transactional state. The workaround for the bug is to force the core into SMT4 mode before doing the treclaim. This patch adds the code to do that, conditional on the CPU_FTR_P9_TM_XER_SO_BUG feature bit. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9Paul Mackerras6-7/+491
POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation. The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt. On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0. On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint. Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts. The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path. With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13Merge two commits from 'kvm-ppc-fixes' into nextMichael Ellerman1-1/+3
This merges two commits from the `kvm-ppc-fixes` branch into next, as they fix build breaks we are seeing while testing next.
2018-02-22treewide/trivial: Remove ';;$' typo noiseIngo Molnar1-1/+1
On lkml suggestions were made to split up such trivial typo fixes into per subsystem patches: --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height) struct efi_uga_draw_protocol *uga = NULL, *first_uga; efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID; unsigned long nr_ugas; - u32 *handles = (u32 *)uga_handle;; + u32 *handles = (u32 *)uga_handle; efi_status_t status = EFI_INVALID_PARAMETER; int i; This patch is the result of the following script: $ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq) ... followed by manual review to make sure it's all good. Splitting this up is just crazy talk, let's get over with this and just do it. Reported-by: Pavel Machek <pavel@ucw.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-13KVM: PPC: Book3S: Fix compile error that occurs with some gcc versionsPaul Mackerras1-1/+1
Some versions of gcc generate a warning that the variable "emulated" may be used uninitialized in function kvmppc_handle_load128_by2x64(). It would be used uninitialized if kvmppc_handle_load128_by2x64 was ever called with vcpu->arch.mmio_vmx_copy_nums == 0, but neither of the callers ever do that, so there is no actual bug. When gcc generates a warning, it causes the build to fail because arch/powerpc is compiled with -Werror. This silences the warning by initializing "emulated" to EMULATE_DONE. Fixes: 09f984961c13 ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-13KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=nPaul Mackerras1-0/+2
Commit accb757d798c ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run", 2017-12-04) added a "goto out" statement and an "out:" label to kvm_arch_vcpu_ioctl_run(). Since the only "goto out" is inside a CONFIG_VSX block, compiling with CONFIG_VSX=n gives a warning that label "out" is defined but not used, and because arch/powerpc is compiled with -Werror, that becomes a compile error that makes the kernel build fail. Merge commit 1ab03c072feb ("Merge tag 'kvm-ppc-next-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc", 2018-02-09) added a similar block of code inside a #ifdef CONFIG_ALTIVEC, with a "goto out" statement. In order to make the build succeed, this adds a #ifdef around the "out:" label. This is a minimal, ugly fix, to be replaced later by a refactoring of the code. Since CONFIG_VSX depends on CONFIG_ALTIVEC, it is sufficient to use #ifdef CONFIG_ALTIVEC here. Fixes: accb757d798c ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-10Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds14-196/+610
Pull KVM updates from Radim Krčmář: "ARM: - icache invalidation optimizations, improving VM startup time - support for forwarded level-triggered interrupts, improving performance for timers and passthrough platform devices - a small fix for power-management notifiers, and some cosmetic changes PPC: - add MMIO emulation for vector loads and stores - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization of older CPU versions - improve the handling of escalation interrupts with the XIVE interrupt controller - support decrement register migration - various cleanups and bugfixes. s390: - Cornelia Huck passed maintainership to Janosch Frank - exitless interrupts for emulated devices - cleanup of cpuflag handling - kvm_stat counter improvements - VSIE improvements - mm cleanup x86: - hypervisor part of SEV - UMIP, RDPID, and MSR_SMI_COUNT emulation - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512 features - show vcpu id in its anonymous inode name - many fixes and cleanups - per-VCPU MSR bitmaps (already merged through x86/pti branch) - stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)" * tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits) KVM: PPC: Book3S: Add MMIO emulation for VMX instructions KVM: PPC: Book3S HV: Branch inside feature section KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code KVM: PPC: Book3S PR: Fix broken select due to misspelling KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs() KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled KVM: PPC: Book3S HV: Drop locks before reading guest memory kvm: x86: remove efer_reload entry in kvm_vcpu_stat KVM: x86: AMD Processor Topology Information x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested kvm: embed vcpu id to dentry of vcpu anon inode kvm: Map PFN-type memory regions as writable (if possible) x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n KVM: arm/arm64: Fixup userspace irqchip static key optimization KVM: arm/arm64: Fix userspace_irqchip_in_use counting KVM: arm/arm64: Fix incorrect timer_is_pending logic MAINTAINERS: update KVM/s390 maintainers MAINTAINERS: add Halil as additional vfio-ccw maintainer MAINTAINERS: add David as a reviewer for KVM/s390 ...
2018-02-09Merge tag 'kvm-ppc-next-4.16-2' of ↵Radim Krčmář8-35/+237
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc Second PPC KVM update for 4.16 Seven fixes that are either trivial or that address bugs that people are actually hitting. The main ones are: - Drop spinlocks before reading guest memory - Fix a bug causing corruption of VCPU state in PR KVM with preemption enabled - Make HPT resizing work on POWER9 - Add MMIO emulation for vector loads and stores, because guests now use these instructions in memcpy and similar routines.
2018-02-09KVM: PPC: Book3S: Add MMIO emulation for VMX instructionsJose Ricardo Ziviani2-0/+186
This patch provides the MMIO load/store vector indexed X-Form emulation. Instructions implemented: lvx: the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0 is loaded into VRT. stvx: the contents of VRS are stored into the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0. Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09KVM: PPC: Book3S HV: Branch inside feature sectionAlexander Graf1-1/+2
We ended up with code that did a conditional branch inside a feature section to code outside of the feature section. Depending on how the object file gets organized, that might mean we exceed the 14bit relocation limit for conditional branches: arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4 So instead of doing a conditional branch outside of the feature section, let's just jump at the end of the same, making the branch very short. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09KVM: PPC: Book3S HV: Make HPT resizing work on POWER9David Gibson2-9/+24
This adds code to enable the HPT resizing code to work on POWER9, which uses a slightly modified HPT entry format compared to POWER8. On POWER9, we convert HPTEs read from the HPT from the new format to the old format so that the rest of the HPT resizing code can work as before. HPTEs written to the new HPT are converted to the new format as the last step before writing them into the new HPT. This takes out the checks added by commit bcd3bb63dbc8 ("KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18), now that HPT resizing works on POWER9. On POWER9, when we pivot to the new HPT, we now call kvmppc_setup_partition_table() to update the partition table in order to make the hardware use the new HPT. [paulus@ozlabs.org - added kvmppc_setup_partition_table() call, wrote commit message.] Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-09KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing codePaul Mackerras1-6/+2
This fixes the computation of the HPTE index to use when the HPT resizing code encounters a bolted HPTE which is stored in its secondary HPTE group. The code inverts the HPTE group number, which is correct, but doesn't then mask it with new_hash_mask. As a result, new_pteg will be effectively negative, resulting in new_hptep pointing before the new HPT, which will corrupt memory. In addition, this removes two BUG_ON statements. The condition that the BUG_ONs were testing -- that we have computed the hash value incorrectly -- has never been observed in testing, and if it did occur, would only affect the guest, not the host. Given that BUG_ON should only be used in conditions where the kernel (i.e. the host kernel, in this case) can't possibly continue execution, it is not appropriate here. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-08KVM: PPC: Book3S PR: Fix broken select due to misspellingUlf Magnusson1-1/+1
Commit 76d837a4c0f9 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms") added a reference to the globally undefined symbol PPC_SERIES. Looking at the rest of the commit, PPC_PSERIES was probably intended. Change PPC_SERIES to PPC_PSERIES. Discovered with the https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py script. Fixes: 76d837a4c0f9 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-02Merge tag 'powerpc-4.16-1' of ↵Linus Torvalds5-24/+22
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Enable support for memory protection keys aka "pkeys" on Power7/8/9 when using the hash table MMU. - Extend our interrupt soft masking to support masking PMU interrupts as well as "normal" interrupts, and then use that to implement local_t for a ~4x speedup vs the current atomics-based implementation. - A new driver "ocxl" for "Open Coherent Accelerator Processor Interface (OpenCAPI)" devices. - Support for new device tree properties on PowerVM to describe hotpluggable memory and devices. - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO. - Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI erratum workaround, plus a minor cleanup patch. As well as quite a lot of other changes all over the place, and small fixes and cleanups as always. Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl Gomonovych" * tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits) powerpc/mm/radix: Fix build error when RADIX_MMU=n macintosh/ams-input: Use true and false for boolean values macintosh: change some data types from int to bool powerpc/watchdog: Print the NIP in soft_nmi_interrupt() powerpc/watchdog: regs can't be null in soft_nmi_interrupt() powerpc/watchdog: Tweak watchdog printks powerpc/cell: Remove axonram driver rtc-opal: Fix handling of firmware error codes, prevent busy loops powerpc/mpc52xx_gpt: make use of raw_spinlock variants macintosh/adb: Properly mark continued kernel messages powerpc/pseries: Fix cpu hotplug crash with memoryless nodes powerpc/numa: Ensure nodes initialized for hotplug powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes powerpc/kernel: Block interrupts when updating TIDR powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn powerpc/mm/nohash: do not flush the entire mm when range is a single page powerpc/pseries: Add Initialization of VF Bars powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV powerpc/eeh: Add EEH notify resume sysfs powerpc/eeh: Add EEH operations to notify resume ...
2018-02-01Merge tag 'kvm-ppc-next-4.16-1' of ↵Radim Krčmář8-144/+298
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc PPC KVM update for 4.16 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization that earlier CPU versions required. - A series from Ben Herrenschmidt to improve the handling of escalation interrupts with the XIVE interrupt controller. - Provide for the decrementer register to be copied across on migration. - Various minor cleanups and bugfixes.
2018-02-01Merge branch 'x86/hyperv' of ↵Radim Krčmář8-41/+210
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Topic branch for stable KVM clockource under Hyper-V. Thanks to Christoffer Dall for resolving the ARM conflict.
2018-01-31Merge branch 'work.whack-a-mole' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull asm/uaccess.h whack-a-mole from Al Viro: "It's linux/uaccess.h, damnit... Oh, well - eventually they'll stop cropping up..." * 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: asm-prototypes.h: use linux/uaccess.h, not asm/uaccess.h riscv: use linux/uaccess.h, not asm/uaccess.h... ppc: for put_user() pull linux/uaccess.h, not asm/uaccess.h
2018-02-01KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabledAlexander Graf2-14/+10
When copying between the vcpu and svcpu, we may get scheduled away onto a different host CPU which in turn means our svcpu pointer may change. That means we need to atomically copy to and from the svcpu with preemption disabled, so that all code around it always sees a coherent state. Reported-by: Simon Guo <wei.guo.simon@gmail.com> Fixes: 3d3319b45eea ("KVM: PPC: Book3S: PR: Enable interrupts earlier") Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-02-01KVM: PPC: Book3S HV: Drop locks before reading guest memoryPaul Mackerras1-4/+12
Running with CONFIG_DEBUG_ATOMIC_SLEEP reveals that HV KVM tries to read guest memory, in order to emulate guest instructions, while preempt is disabled and a vcore lock is held. This occurs in kvmppc_handle_exit_hv(), called from post_guest_process(), when emulating guest doorbell instructions on POWER9 systems, and also when checking whether we have hit a hypervisor breakpoint. Reading guest memory can cause a page fault and thus cause the task to sleep, so we need to avoid reading guest memory while holding a spinlock or when preempt is disabled. To fix this, we move the preempt_enable() in kvmppc_run_core() to before the loop that calls post_guest_process() for each vcore that has just run, and we drop and re-take the vcore lock around the calls to kvmppc_emulate_debug_inst() and kvmppc_emulate_doorbell_instr(). Dropping the lock is safe with respect to the iteration over the runnable vcpus in post_guest_process(); for_each_runnable_thread is actually safe to use locklessly. It is possible for a vcpu to become runnable and add itself to the runnable_threads array (code near the beginning of kvmppc_run_vcpu()) and then get included in the iteration in post_guest_process despite the fact that it has not just run. This is benign because vcpu->arch.trap and vcpu->arch.ceded will be zero. Cc: stable@vger.kernel.org # v4.13+ Fixes: 579006944e0d ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-22powerpc: Use octal numbers for file permissionsRussell Currey2-7/+5
Symbolic macros are unintuitive and hard to read, whereas octal constants are much easier to interpret. Replace macros for the basic permission flags (user/group/other read/write/execute) with numeric constants instead, across the whole powerpc tree. Introducing a significant number of changes across the tree for no runtime benefit isn't exactly desirable, but so long as these macros are still used in the tree people will keep sending patches that add them. Not only are they hard to parse at a glance, there are multiple ways of coming to the same value (as you can see with 0444 and 0644 in this patch) which hurts readability. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21Merge branch 'fixes' into nextMichael Ellerman4-12/+15
Merge our fixes branch from the 4.15 cycle. Unusually the fixes branch saw some significant features merged, notably the RFI flush patches, so we want the code in next to be tested against that, to avoid any surprises when the two are merged. There's also some other work on the panic handling that was reverted in fixes and we now want to do properly in next, which would conflict. And we also fix a few other minor merge conflicts.
2018-01-21Merge branch 'topic/ppc-kvm' into nextMichael Ellerman3-12/+14
Merge the topic branch we share with kvm-ppc, this brings in two xive commits, one from Paul to rework HMI handling, and a minor cleanup to drop an unused flag.
2018-01-21powerpc/mm: Remove unused flag arg in global_invalidatesAneesh Kumar K.V1-5/+4
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Rename soft_enabled to irq_soft_maskMadhavan Srinivasan1-1/+1
Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19KVM: PPC: Book3S: Provide information about hardware/firmware CVE workaroundsPaul Mackerras1-0/+131
This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace information about the underlying machine's level of vulnerability to the recently announced vulnerabilities CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754, and whether the machine provides instructions to assist software to work around the vulnerabilities. The ioctl returns two u64 words describing characteristics of the CPU and required software behaviour respectively, plus two mask words which indicate which bits have been filled in by the kernel, for extensibility. The bit definitions are the same as for the new H_GET_CPU_CHARACTERISTICS hypercall. There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which indicates whether the new ioctl is available. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless cededBenjamin Herrenschmidt2-1/+91
This works on top of the single escalation support. When in single escalation, with this change, we will keep the escalation interrupt disabled unless the VCPU is in H_CEDE (idle). In any other case, we know the VCPU will be rescheduled and thus there is no need to take escalation interrupts in the host whenever a guest interrupt fires. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19KVM: PPC: Book3S HV: Make xive_pushed a byte, not a wordBenjamin Herrenschmidt1-3/+3
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19KVM: PPC: Book3S HV: Check DR not IR to chose real vs virt mode MMIOsBenjamin Herrenschmidt1-1/+1
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19KVM: PPC: Book3S HV: Don't use existing "prodded" flag for XIVE escalationsBenjamin Herrenschmidt3-3/+12
The prodded flag is only cleared at the beginning of H_CEDE, so every time we have an escalation, we will cause the *next* H_CEDE to return immediately. Instead use a dedicated "irq_pending" flag to indicate that a guest interrupt is pending for the VCPU. We don't reuse the existing exception bitmap so as to avoid expensive atomic ops. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19KVM: PPC: Book3S HV: Enable use of the new XIVE "single escalation" featureBenjamin Herrenschmidt2-25/+38
That feature, provided by Power9 DD2.0 and later, when supported by newer OPAL versions, allows us to sacrifice a queue (priority 7) in favor of merging all the escalation interrupts of the queues of a single VP into a single interrupt. This reduces the number of host interrupts used up by KVM guests especially when those guests use multiple priorities. It will also enable a future change to control the masking of the escalation interrupts more precisely to avoid spurious ones. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19KVM: PPC: Book3S HV: Add more info about XIVE queues in debugfsBenjamin Herrenschmidt1-0/+28
Add details about enabled queues and escalation interrupts. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras4-16/+15
This merges in the ppc-kvm topic branch of the powerpc tree to get two patches which are prerequisites for the following patch series, plus another patch which touches both powerpc and KVM code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-18KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9Paul Mackerras2-7/+10
Hypervisor maintenance interrupts (HMIs) are generated by various causes, signalled by bits in the hypervisor maintenance exception register (HMER). In most cases calling OPAL to handle the interrupt is the correct thing to do, but the "debug trigger" HMIs signalled by PPC bit 17 (bit 46) of HMER are used to invoke software workarounds for hardware bugs, and OPAL does not have any code to handle this cause. The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips to work around a hardware bug in executing vector load instructions to cache inhibited memory. In POWER9 DD2.2 chips, it is generated when conditions are detected relating to threads being in TM (transactional memory) suspended mode when the core SMT configuration needs to be reconfigured. The kernel currently has code to detect the vector CI load condition, but only when the HMI occurs in the host, not when it occurs in a guest. If a HMI occurs in the guest, it is always passed to OPAL, and then we always re-sync the timebase, because the HMI cause might have been a timebase error, for which OPAL would re-sync the timebase, thus removing the timebase offset which KVM applied for the guest. Since we don't know what OPAL did, we don't know whether to subtract the timebase offset from the timebase, so instead we re-sync the timebase. This adds code to determine explicitly what the cause of a debug trigger HMI will be. This is based on a new device-tree property under the CPU nodes called ibm,hmi-special-triggers, if it is present, or otherwise based on the PVR (processor version register). The handling of debug trigger HMIs is pulled out into a separate function which can be called from the KVM guest exit code. If this function handles and clears the HMI, and no other HMI causes remain, then we skip calling OPAL and we proceed to subtract the guest timebase offset from the timebase. The overall handling for HMIs that occur in the host (i.e. not in a KVM guest) is largely unchanged, except that we now don't set the flag for the vector CI load workaround on DD2.2 processors. This also removes a BUG_ON in the KVM code. BUG_ON is generally not useful in KVM guest entry/exit code since it is difficult to handle the resulting trap gracefully. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-18KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2Paul Mackerras1-6/+24
POWER9 chip versions starting with "Nimbus" v2.2 can support running with some threads of a core in HPT mode and others in radix mode. This means that we don't have to prohibit independent-threads mode when running a HPT guest on a radix host, and we don't have to do any of the synchronization between threads that was introduced in commit c01015091a77 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts", 2017-10-19). Rather than using up another CPU feature bit, we just do an explicit test on the PVR (processor version register) at module startup time to determine whether we have to take steps to avoid having some threads in HPT mode and some in radix mode (so-called "mixed mode"). We test for "Nimbus" (indicated by 0 or 1 in the top nibble of the lower 16 bits) v2.2 or later, or "Cumulus" (indicated by 2 or 3 in that nibble) v1.1 or later. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-18powerpc/64s: Improve local TLB flush for boot and MCE on POWER9Nicholas Piggin1-4/+2
There are several cases outside the normal address space management where a CPU's entire local TLB is to be flushed: 1. Booting the kernel, in case something has left stale entries in the TLB (e.g., kexec). 2. Machine check, to clean corrupted TLB entries. One other place where the TLB is flushed, is waking from deep idle states. The flush is a side-effect of calling ->cpu_restore with the intention of re-setting various SPRs. The flush itself is unnecessary because in the first case, the TLB should not acquire new corrupted TLB entries as part of sleep/wake (though they may be lost). This type of TLB flush is coded inflexibly, several times for each CPU type, and they have a number of problems with ISA v3.0B: - The current radix mode of the MMU is not taken into account, it is always done as a hash flushn For IS=2 (LPID-matching flush from host) and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if the R field does not match the current radix mode. - ISA v3.0B hash must flush the partition and process table caches as well. - ISA v3.0B radix must flush partition and process scoped translations, partition and process table caches, and also the page walk cache. So consolidate the flushing code and implement it in C and inline asm under the mm/ directory with the rest of the flush code. Add ISA v3.0B cases for radix and hash, and use the radix flush in radix environment. Provide a way for IS=2 (LPID flush) to specify the radix mode of the partition. Have KVM pass in the radix mode of the guest. Take out the flushes from early cputable/dt_cpu_ftrs detection hooks, and move it later in the boot process after, the MMU registers are set up and before relocation is first turned on. The TLB flush is no longer called when restoring from deep idle states. This was not be done as a separate step because booting secondaries uses the same cpu_restore as idle restore, which needs the TLB flush. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-17KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loadedPaul Mackerras1-54/+55
This moves the code that loads and unloads the guest SLB values so that it is done while the guest LPCR value is loaded in the LPCR register. The reason for doing this is that on POWER9, the behaviour of the slbmte instruction depends on the LPCR[UPRT] bit. If UPRT is 1, as it is for a radix host (or guest), the SLB index is truncated to 2 bits. This means that for a HPT guest on a radix host, the SLB was not being loaded correctly, causing the guest to crash. The SLB is now loaded much later in the guest entry path, after the LPCR is loaded, which for a secondary thread is after it sees that the primary thread has switched the MMU to the guest. The loop that waits for the primary thread has a branch out to the exit code that is taken if it sees that other threads have commenced exiting the guest. Since we have now not loaded the SLB at this point, we make this path branch to a new label 'guest_bypass' and we move the SLB unload code to before this label. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-17KVM: PPC: Book3S HV: Make sure we don't re-enter guest without XIVE loadedPaul Mackerras1-20/+20
This fixes a bug where it is possible to enter a guest on a POWER9 system without having the XIVE (interrupt controller) context loaded. This can happen because we unload the XIVE context from the CPU before doing the real-mode handling for machine checks. After the real-mode handler runs, it is possible that we re-enter the guest via a fast path which does not load the XIVE context. To fix this, we move the unloading of the XIVE context to come after the real-mode machine check handler is called. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-16KVM: PPC: Book3S HV: Enable migration of decrementer registerPaul Mackerras2-1/+9
This adds a register identifier for use with the one_reg interface to allow the decrementer expiry time to be read and written by userspace. The decrementer expiry time is in guest timebase units and is equal to the sum of the decrementer and the guest timebase. (The expiry time is used rather than the decrementer value itself because the expiry time is not constantly changing, though the decrementer value is, while the guest vcpu is not running.) Without this, a guest vcpu migrated to a new host will see its decrementer set to some random value. On POWER8 and earlier, the decrementer is 32 bits wide and counts down at 512MHz, so the guest vcpu will potentially see no decrementer interrupts for up to about 4 seconds, which will lead to a stall. With POWER9, the decrementer is now 56 bits side, so the stall can be much longer (up to 2.23 years) and more noticeable. To help work around the problem in cases where userspace has not been updated to migrate the decrementer expiry time, we now set the default decrementer expiry at vcpu creation time to the current time rather than the maximum possible value. This should mean an immediate decrementer interrupt when a migrated vcpu starts running. In cases where the decrementer is 32 bits wide and more than 4 seconds elapse between the creation of the vcpu and when it first runs, the decrementer would have wrapped around to positive values and there may still be a stall - but this is no worse than the current situation. In the large-decrementer case, we are sure to get an immediate decrementer interrupt (assuming the time from vcpu creation to first run is less than 2.23 years) and we thus avoid a very long stall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-14Merge tag 'powerpc-4.15-7' of ↵Linus Torvalds3-9/+11
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for an oops at boot if we take a hotplug interrupt before we are ready to handle it. The bulk is patches to implement mitigation for Meltdown, see the change logs for more details. Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon Masters, Jose Ricardo Ziviani, David Gibson" * tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Check device-tree for RFI flush settings powerpc/pseries: Query hypervisor for RFI flush settings powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti powerpc/64s: Add support for RFI flush of L1-D cache powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL powerpc/64s: Simple RFI macro conversions powerpc/64: Add macros for annotating the destination of rfid/hrfid powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
2018-01-11Merge tag 'kvm-ppc-fixes-4.15-3' of ↵Paolo Bonzini3-29/+64
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master PPC KVM fixes for 4.15 Four commits here, including two that were tagged but never merged. Three of them are for the HPT resizing code; two of those fix a user-triggerable use-after-free in the host, and one that fixes stale TLB entries in the guest. The remaining commit fixes a bug causing PR KVM guests under PowerVM to fail to start.
2018-01-11KVM: PPC: Use seq_puts() in kvmppc_exit_timing_show()Markus Elfring1-2/+1
A headline should be quickly put into a sequence. Thus use the function "seq_puts" instead of "seq_printf" for this purpose. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-11KVM: PPC: Book3S HV: Remove vcpu->arch.dec usageAlexander Graf1-1/+0
On Book3S in HV mode, we don't use the vcpu->arch.dec field at all. Instead, all logic is built around vcpu->arch.dec_expires. So let's remove the one remaining piece of code that was setting it. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-10KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()David Gibson1-2/+4
The KVM_PPC_ALLOCATE_HTAB ioctl(), implemented by kvmppc_alloc_reset_hpt() is supposed to completely clear and reset a guest's Hashed Page Table (HPT) allocating or re-allocating it if necessary. In the case where an HPT of the right size already exists and it just zeroes it, it forces a TLB flush on all guest CPUs, to remove any stale TLB entries loaded from the old HPT. However, that situation can arise when the HPT is resizing as well - or even when switching from an RPT to HPT - so those cases need a TLB flush as well. So, move the TLB flush to trigger in all cases except for errors. Cc: stable@vger.kernel.org # v4.10+ Fixes: f98a8bf9ee20 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-10KVM: PPC: Book3S PR: Fix WIMG handling under pHypAlexey Kardashevskiy2-0/+3
Commit 96df226 ("KVM: PPC: Book3S PR: Preserve storage control bits") added code to preserve WIMG bits but it missed 2 special cases: - a magic page in kvmppc_mmu_book3s_64_xlate() and - guest real mode in kvmppc_handle_pagefault(). For these ptes, WIMG was 0 and pHyp failed on these causing a guest to stop in the very beginning at NIP=0x100 (due to bd9166ffe "KVM: PPC: Book3S PR: Exit KVM on failed mapping"). According to LoPAPR v1.1 14.5.4.1.2 H_ENTER: The hypervisor checks that the WIMG bits within the PTE are appropriate for the physical page number else H_Parameter return. (For System Memory pages WIMG=0010, or, 1110 if the SAO option is enabled, and for IO pages WIMG=01**.) This hence initializes WIMG to non-zero value HPTE_R_M (0x10), as expected by pHyp. [paulus@ozlabs.org - fix compile for 32-bit] Cc: stable@vger.kernel.org # v4.11+ Fixes: 96df226 "KVM: PPC: Book3S PR: Preserve storage control bits" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Ruediger Oertel <ro@suse.de> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-10powerpc/64s: Simple RFI macro conversionsNicholas Piggin3-9/+11
This commit does simple conversions of rfi/rfid to the new macros that include the expected destination context. By simple we mean cases where there is a single well known destination context, and it's simply a matter of substituting the instruction for the appropriate macro. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-22KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()Laurent Vivier1-1/+2
When we migrate a VM from a POWER8 host (XICS) to a POWER9 host (XICS-on-XIVE), we have an error: qemu-kvm: Unable to restore KVM interrupt controller state \ (0xff000000) for CPU 0: Invalid argument This is because kvmppc_xics_set_icp() checks the new state is internaly consistent, and especially: ... 1129 if (xisr == 0) { 1130 if (pending_pri != 0xff) 1131 return -EINVAL; ... On the other side, kvmppc_xive_get_icp() doesn't set neither the pending_pri value, nor the xisr value (set to 0) (and kvmppc_xive_set_icp() ignores the pending_pri value) As xisr is 0, pending_pri must be set to 0xff. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-22KVM: PPC: Book3S: fix XIVE migration of pending interruptsCédric Le Goater1-2/+2
When restoring a pending interrupt, we are setting the Q bit to force a retrigger in xive_finish_unmask(). But we also need to force an EOI in this case to reach the same initial state : P=1, Q=0. This can be done by not setting 'old_p' for pending interrupts which will inform xive_finish_unmask() that an EOI needs to be sent. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-14KVM: introduce kvm_arch_vcpu_async_ioctlPaolo Bonzini2-3/+12
After the vcpu_load/vcpu_put pushdown, the handling of asynchronous VCPU ioctl is already much clearer in that it is obvious that they bypass vcpu_load and vcpu_put. However, it is still not perfect in that the different state of the VCPU mutex is still hidden in the caller. Separate those ioctls into a new function kvm_arch_vcpu_async_ioctl that returns -ENOIOCTLCMD for more "traditional" synchronous ioctls. Cc: James Hogan <jhogan@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>