summaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2021-11-18riscv: kvm: fix non-kernel-doc comment blockRandy Dunlap1-1/+1
Don't use "/**" to begin a comment block for a non-kernel-doc comment. Prevents this docs build warning: vcpu_sbi.c:3: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Copyright (c) 2019 Western Digital Corporation or its affiliates. Fixes: dea8ee31a039 ("RISC-V: KVM: Add SBI v0.1 support") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Atish Patra <atish.patra@wdc.com> Cc: Anup Patel <anup.patel@wdc.com> Cc: kvm@vger.kernel.org Cc: kvm-riscv@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Message-Id: <20211107034706.30672-1-rdunlap@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18Merge branch 'kvm-5.16-fixes' into kvm-masterPaolo Bonzini9-58/+108
* Fixes for Xen emulation * Kill kvm_map_gfn() / kvm_unmap_gfn() and broken gfn_to_pfn_cache * Fixes for migration of 32-bit nested guests on 64-bit hypervisor * Compilation fixes * More SEV cleanups
2021-11-18KVM: SEV: Fix typo in and tweak name of cmd_allowed_from_miror()Sean Christopherson1-2/+2
Rename cmd_allowed_from_miror() to is_cmd_allowed_from_mirror(), fixing a typo and making it obvious that the result is a boolean where false means "not allowed". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109215101.2211373-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: SEV: Drop a redundant setting of sev->asid during initializationSean Christopherson1-1/+0
Remove a fully redundant write to sev->asid during SEV/SEV-ES guest initialization. The ASID is set a few lines earlier prior to the call to sev_platform_init(), which doesn't take "sev" as a param, i.e. can't muck with the ASID barring some truly magical behind-the-scenes code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109215101.2211373-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: SEV: WARN if SEV-ES is marked active but SEV is notSean Christopherson1-1/+1
WARN if the VM is tagged as SEV-ES but not SEV. KVM relies on SEV and SEV-ES being set atomically, and guards common flows with "is SEV", i.e. observing SEV-ES without SEV means KVM has a fatal bug. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109215101.2211373-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: SEV: Set sev_info.active after initial checks in sev_guest_init()Sean Christopherson1-3/+3
Set sev_info.active during SEV/SEV-ES activation before calling any code that can potentially consume sev_info.es_active, e.g. set "active" and "es_active" as a pair immediately after the initial sanity checks. KVM generally expects that es_active can be true if and only if active is true, e.g. sev_asid_new() deliberately avoids sev_es_guest() so that it doesn't get a false negative. This will allow WARNing in sev_es_guest() if the VM is tagged as SEV-ES but not SEV. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109215101.2211373-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUsSean Christopherson1-1/+6
Reject COPY_ENC_CONTEXT_FROM if the destination VM has created vCPUs. KVM relies on SEV activation to occur before vCPUs are created, e.g. to set VMCB flags and intercepts correctly. Fixes: 54526d1fd593 ("KVM: x86: Support KVM VMs sharing SEV context") Cc: stable@vger.kernel.org Cc: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Nathan Tempelman <natet@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109215101.2211373-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: nVMX: Use a gfn_to_hva_cache for vmptrldDavid Woodhouse2-9/+22
And thus another call to kvm_vcpu_map() can die. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-7-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: nVMX: Use kvm_read_guest_offset_cached() for nested VMCS checkDavid Woodhouse1-11/+15
Kill another mostly gratuitous kvm_vcpu_map() which could just use the userspace HVA for it. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-6-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: x86/xen: Use sizeof_field() instead of open-coding itDavid Woodhouse1-9/+9
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-4-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: nVMX: Use kvm_{read,write}_guest_cached() for shadow_vmcs12David Woodhouse2-9/+20
Using kvm_vcpu_map() for reading from the guest is entirely gratuitous, when all we do is a single memcpy and unmap it again. Fix it up to use kvm_read_guest()... but in fact I couldn't bring myself to do that without also making it use a gfn_to_hva_cache for both that *and* the copy in the other direction. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-5-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFODavid Woodhouse1-1/+1
In commit 319afe68567b ("KVM: xen: do not use struct gfn_to_hva_cache") we stopped storing this in-kernel as a GPA, and started storing it as a GFN. Which means we probably should have stopped calling gpa_to_gfn() on it when userspace asks for it back. Cc: stable@vger.kernel.org Fixes: 319afe68567b ("KVM: xen: do not use struct gfn_to_hva_cache") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: x86/mmu: include EFER.LMA in extended mmu roleMaxim Levitsky2-0/+2
Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the guest root level and is not reflected in kvm_mmu_page_role.level when TDP is in use. When simply running the guest, it is impossible for EFER.LMA and kvm_mmu.root_level to get out of sync, as the guest cannot transition from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without first bouncing through a different MMU context. And stuffing guest state via KVM_SET_SREGS{,2} also ensures a full MMU context reset. However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to set guest state when migrating the VM while L2 is active, the vCPU state will reflect L2, not L1. If L1 is using TDP for L2, then root_mmu will have been configured using L2's state, despite not being used for L2. If L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will be configured for guest PAE paging, but will match the mmu_role for 64-bit paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit. Alternatively, the root_mmu's role could be invalidated after a successful KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu, i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily tricky, and not even needed if L1 and L2 do have the same role (e.g., they are both 64-bit guests and run with the same CR4). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested ↵Maxim Levitsky1-5/+17
state load When loading nested state, don't use check vcpu->arch.efer to get the L1 host's 64-bit vs. 32-bit state and don't check it for consistency with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU may be stale when KVM_SET_NESTED_STATE is called---and architecturally does not exist. When restoring L2 state in KVM, the CPU is placed in non-root where nested VMX code has no snapshot of L1 host state: VMX (conditionally) loads host state fields loaded on VM-exit, but they need not correspond to the state before entry. A simple case occurs in KVM itself, where the host RIP field points to vmx_vmexit rather than the instruction following vmlaunch/vmresume. However, for the particular case of L1 being in 32- or 64-bit mode on entry, the exit controls can be treated instead as the source of truth regarding the state of L1 on entry, and can be used to check that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if vmcs12.VM_EXIT_LOAD_IA32_EFER is set. The consistency check on CPU EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only on VM-Enter. That's because, again, there's conceptually no "current" L1 EFER to check on KVM_SET_NESTED_STATE. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: Fix steal time asm constraintsDavid Woodhouse1-3/+3
In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use the [abcd] registers. For which, GCC has the "q" constraint instead of the less restrictive "r". Also fix st->preempted, which is an input/output operand rather than an input. Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18cpuid: kvm_find_kvm_cpuid_features() should be declared 'static'Paul Durrant1-1/+1
The lack a static declaration currently results in: arch/x86/kvm/cpuid.c:128:26: warning: no previous prototype for function 'kvm_find_kvm_cpuid_features' when compiling with "W=1". Reported-by: kernel test robot <lkp@intel.com> Fixes: 760849b1476c ("KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES") Signed-off-by: Paul Durrant <pdurrant@amazon.com> Message-Id: <20211115144131.5943-1-pdurrant@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-16KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()黄乐1-2/+6
In vcpu_load_eoi_exitmap(), currently the eoi_exit_bitmap[4] array is initialized only when Hyper-V context is available, in other path it is just passed to kvm_x86_ops.load_eoi_exitmap() directly from on the stack, which would cause unexpected interrupt delivery/handling issues, e.g. an *old* linux kernel that relies on PIT to do clock calibration on KVM might randomly fail to boot. Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V context is not available. Fixes: f2bc14b69c38 ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Huang Le <huangle1@jd.com> Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-12Merge tag 'kvmarm-fixes-5.16-1' of ↵Paolo Bonzini8-16/+19
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.16, take #1 - Fix the host S2 finalization by solely iterating over the memblocks instead of the whole IPA space - Tighten the return value of kvm_vcpu_preferred_target() now that 32bit support is long gone - Make sure the extraction of ESR_ELx.EC is limited to the architected bits - Comment fixups
2021-11-12KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_fromPaolo Bonzini1-14/+11
Use the same cleanup code independent of whether the cgroup to be uncharged and unref'd is the source or the destination cgroup. Use a bool to track whether the destination cgroup has been charged, which also fixes a bug in the error case: the destination cgroup must be uncharged only if it does not match the source. Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-12KVM: x86: move guest_pv_has out of user_access sectionPaolo Bonzini1-3/+6
When UBSAN is enabled, the code emitted for the call to guest_pv_has includes a call to __ubsan_handle_load_invalid_value. objtool complains that this call happens with UACCESS enabled; to avoid the warning, pull the calls to user_access_begin into both arms of the "if" statement, after the check for guest_pv_has. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11Merge branch 'kvm-5.16-fixes' into kvm-masterPaolo Bonzini20-264/+317
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status * Fix selftests on APICv machines * Fix sparse warnings * Fix detection of KVM features in CPUID * Cleanups for bogus writes to MSR_KVM_PV_EOI_EN * Fixes and cleanups for MSR bitmap handling * Cleanups for INVPCID * Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
2021-11-11Merge branch 'kvm-sev-move-context' into kvm-masterPaolo Bonzini5-63/+285
Add support for AMD SEV and SEV-ES intra-host migration support. Intra host migration provides a low-cost mechanism for userspace VMM upgrades. In the common case for intra host migration, we can rely on the normal ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other confidential compute environments make most of this information opaque, and render KVM ioctls such as "KVM_GET_REGS" irrelevant. As a result, we need the ability to pass this opaque metadata from one VMM to the next. The easiest way to do this is to leave this data in the kernel, and transfer ownership of the metadata from one KVM VM (or vCPU) to the next. In-kernel hand off makes it possible to move any data that would be unsafe/impossible for the kernel to hand directly to userspace, and cannot be reproduced using data that can be handed to userspace. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUSVitaly Kuznetsov2-2/+1
KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports either num_online_cpus() or num_present_cpus(), s390 has multiple constants depending on hardware features. On x86, KVM reports an arbitrary value of '710' which is supposed to be the maximum tested value but it's possible to test all KVM_MAX_VCPUS even when there are less physical CPUs available. Drop the arbitrary '710' value and return num_online_cpus() on x86 as well. The recommendation will match other architectures and will mean 'no CPU overcommit'. For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning when the requested vCPU number exceeds it. The static limit of '710' is quite weird as smaller systems with just a few physical CPUs should certainly "recommend" less. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211111134733.86601-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid()Vipin Sharma3-11/+2
Handle #GP on INVPCID due to an invalid type in the common switch statement instead of relying on the callers (VMX and SVM) to manually validate the type. Unlike INVVPID and INVEPT, INVPCID is not explicitly documented to check the type before reading the operand from memory, so deferring the type validity check until after that point is architecturally allowed. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109174426.2350547-3-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, ↵Vipin Sharma3-5/+14
INVVPID, and INVEPT handle_invept(), handle_invvpid(), handle_invpcid() read the same reg2 field in vmcs.VMX_INSTRUCTION_INFO to get the index of the GPR that holds the invalidation type. Add a helper to retrieve reg2 from VMX instruction info to consolidate and document the shift+mask magic. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109174426.2350547-2-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: nVMX: Clean up x2APIC MSR handling for L2Sean Christopherson1-39/+14
Clean up the x2APIC MSR bitmap intereption code for L2, which is the last holdout of open coded bitmap manipulations. Freshen up the SDM/PRM comment, rename the function to make it abundantly clear the funky behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored (the previous comment was flat out wrong for x2APIC behavior). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109013047.2041518-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: VMX: Macrofy the MSR bitmap getters and settersSean Christopherson1-60/+25
Add builder macros to generate the MSR bitmap helpers to reduce the amount of copy-paste code, especially with respect to all the magic numbers needed to calc the correct bit location. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109013047.2041518-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: nVMX: Handle dynamic MSR intercept togglingSean Christopherson3-110/+111
Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2, and always update the relevant bits in vmcs02. This fixes two distinct, but intertwined bugs related to dynamic MSR bitmap modifications. The first issue is that KVM fails to enable MSR interception in vmcs02 for the FS/GS base MSRs if L1 first runs L2 with interception disabled, and later enables interception. The second issue is that KVM fails to honor userspace MSR filtering when preparing vmcs02. Fix both issues simultaneous as fixing only one of the issues (doesn't matter which) would create a mess that no one should have to bisect. Fixing only the first bug would exacerbate the MSR filtering issue as userspace would see inconsistent behavior depending on the whims of L1. Fixing only the second bug (MSR filtering) effectively requires fixing the first, as the nVMX code only knows how to transition vmcs02's bitmap from 1->0. Move the various accessor/mutators that are currently buried in vmx.c into vmx.h so that they can be shared by the nested code. Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering") Fixes: d69129b4e46a ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible") Cc: stable@vger.kernel.org Cc: Alexander Graf <graf@amazon.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109013047.2041518-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in useSean Christopherson1-4/+4
Check the current VMCS controls to determine if an MSR write will be intercepted due to MSR bitmaps being disabled. In the nested VMX case, KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or if KVM can't map L1's bitmaps for whatever reason. Note, the bad behavior is relatively benign in the current code base as KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and only if L0 KVM also disables interception of an MSR, and only uses the buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it isn't strictly necessary. Tag the fix for stable in case a future fix wants to use msr_write_intercepted(), in which case a buggy implementation in older kernels could prove subtly problematic. Fixes: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109013047.2041518-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was ↵Vitaly Kuznetsov1-8/+13
written to MSR_KVM_PV_EOI_EN When kvm_gfn_to_hva_cache_init() call from kvm_lapic_set_pv_eoi() fails, MSR write to MSR_KVM_PV_EOI_EN results in #GP so it is reasonable to expect that the value we keep internally in KVM wasn't updated. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211108152819.12485-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: Rename kvm_lapic_enable_pv_eoi()Vitaly Kuznetsov4-5/+5
kvm_lapic_enable_pv_eoi() is a misnomer as the function is also used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi(). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211108152819.12485-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURESPaul Durrant5-8/+47
Currently when kvm_update_cpuid_runtime() runs, it assumes that the KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true, however, if Hyper-V support is enabled. In this case the KVM leaves will be offset. This patch introdues as new 'kvm_cpuid_base' field into struct kvm_vcpu_arch to track the location of the KVM leaves and function kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now target the correct leaf. NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced into processor.h to avoid having duplicate code for the iteration over possible hypervisor base leaves. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Message-Id: <20211105095101.5384-3-pdurrant@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flowsSean Christopherson1-23/+24
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the only difference between the two ioctls() is the format of the userspace struct. A future fix will add yet more code to the core logic. No functional change intended. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211105095101.5384-2-pdurrant@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11kvm: mmu: Use fast PF path for access tracking of huge pages when possibleJunaid Shahid1-5/+5
The fast page fault path bails out on write faults to huge pages in order to accommodate dirty logging. This change adds a check to do that only when dirty logging is actually enabled, so that access tracking for huge pages can still use the fast path for write faults in the common case. Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211104003359.2201967-1-junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iteratorSean Christopherson1-1/+1
Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with rcu_dereference(). Shadow pages in the TDP MMU, and thus their SPTEs, are protected by rcu. This fixes a Sparse warning at tdp_mmu.c:900:51: warning: incorrect type in argument 1 (different address spaces) expected unsigned long long [usertype] *sptep got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep Fixes: 7158bee4b475 ("KVM: MMU: pass kvm_mmu_page struct to make_spte") Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211103161833.3769487-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ activeMaxim Levitsky4-2/+25
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using standard kvm's inject_pending_event, and not via APICv/AVIC. Since this is a debug feature, just inhibit APICv/AVIC while KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU. Fixes: 61e5f69ef0837 ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to boolJim Mattson5-11/+11
These function names sound like predicates, and they have siblings, *is_valid_msr(), which _are_ predicates. Moreover, there are comments that essentially warn that these functions behave unexpectedly. Flip the polarity of the return values, so that they become predicates, and convert the boolean result to a success/failure code at the outer call site. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211105202058.1048757-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: Fix recording of guest steal time / preempted statusDavid Woodhouse2-31/+76
In commit b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed") we switched to using a gfn_to_pfn_cache for accessing the guest steal time structure in order to allow for an atomic xchg of the preempted field. This has a couple of problems. Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a guest vCPU using an IOMEM page for its steal time would never have its preempted field set. Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it should have been. There are two stages to the GFN->PFN conversion; first the GFN is converted to a userspace HVA, and then that HVA is looked up in the process page tables to find the underlying host PFN. Correct invalidation of the latter would require being hooked up to the MMU notifiers, but that doesn't happen---so it just keeps mapping and unmapping the *wrong* PFN after the userspace page tables change. In the !IOMEM case at least the stale page *is* pinned all the time it's cached, so it won't be freed and reused by anyone else while still receiving the steal time updates. The map/unmap dance only takes care of the KVM administrivia such as marking the page dirty. Until the gfn_to_pfn cache handles the remapping automatically by integrating with the MMU notifiers, we might as well not get a kernel mapping of it, and use the perfectly serviceable userspace HVA that we already have. We just need to implement the atomic xchg on the userspace address with appropriate exception handling, which is fairly trivial. Cc: stable@vger.kernel.org Fixes: b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org> [I didn't entirely agree with David's assessment of the usefulness of the gfn_to_pfn cache, and integrated the outcome of the discussion in the above commit message. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: SEV: Add support for SEV-ES intra host migrationPeter Gonda1-1/+47
For SEV-ES to work with intra host migration the VMSAs, GHCB metadata, and other SEV-ES info needs to be preserved along with the guest's memory. Signed-off-by: Peter Gonda <pgonda@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20211021174303.385706-4-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: SEV: Add support for SEV intra host migrationPeter Gonda5-0/+162
For SEV to work with intra host migration, contents of the SEV info struct such as the ASID (used to index the encryption key in the AMD SP) and the list of memory regions need to be transferred to the target VM. This change adds a commands for a target VMM to get a source SEV VM's sev info. Signed-off-by: Peter Gonda <pgonda@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20211021174303.385706-3-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: SEV: provide helpers to charge/uncharge misc_cgPaolo Bonzini1-7/+15
Avoid code duplication across all callers of misc_cg_try_charge and misc_cg_uncharge. The resource type for KVM is always derived from sev->es_active, and the quantity is always 1. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: generalize "bugged" VM to "dead" VMPaolo Bonzini1-1/+1
Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases where it is by design that the VM cannot be operated upon. In this case any KVM_BUG_ON should still warn, so introduce a new flag kvm->vm_dead that is separate from kvm->vm_bugged. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: SEV: Refactor out sev_es_state structPeter Gonda3-56/+61
Move SEV-ES vCPU metadata into new sev_es_state struct from vcpu_svm. Signed-off-by: Peter Gonda <pgonda@google.com> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20211021174303.385706-2-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11Merge branch 'kvm-guest-sev-migration' into kvm-masterPaolo Bonzini9-9/+201
Add guest api and guest kernel support for SEV live migration. Introduces a new hypercall to notify the host of changes to the page encryption status. If the page is encrypted then it must be migrated through the SEV firmware or a helper VM sharing the key. If page is not encrypted then it can be migrated normally by userspace. This new hypercall is invoked using paravirt_ops. Conflicts: sev_active() replaced by cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT).
2021-11-11x86/kvm: Add kexec support for SEV Live Migration.Ashish Kalra1-0/+25
Reset the host's shared pages list related to kernel specific page encryption status settings before we load a new kernel by kexec. We cannot reset the complete shared pages list here as we need to retain the UEFI/OVMF firmware specific settings. The host's shared pages list is maintained for the guest to keep track of all unencrypted guest memory regions, therefore we need to explicitly mark all shared pages as encrypted again before rebooting into the new guest kernel. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Message-Id: <3e051424ab839ea470f88333273d7a185006754f.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11x86/kvm: Add guest support for detecting and enabling SEV Live Migration ↵Ashish Kalra3-0/+91
feature. The guest support for detecting and enabling SEV Live migration feature uses the following logic : - kvm_init_plaform() checks if its booted under the EFI - If not EFI, i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), issue a wrmsrl() to enable the SEV live migration support - If EFI, i) If kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), read the UEFI variable which indicates OVMF support for live migration ii) the variable indicates live migration is supported, issue a wrmsrl() to enable the SEV live migration support The EFI live migration check is done using a late_initcall() callback. Also, ensure that _bss_decrypted section is marked as decrypted in the hypervisor's guest page encryption status tracking. Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Message-Id: <b4453e4c87103ebef12217d2505ea99a1c3e0f0f.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11mm: x86: Invoke hypercall when page encryption status is changedBrijesh Singh6-9/+73
Invoke a hypercall when a memory region is changed from encrypted -> decrypted and vice versa. Hypervisor needs to know the page encryption status during the guest migration. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Message-Id: <0a237d5bb08793916c7790a3e653a2cbe7485761.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11x86/kvm: Add AMD SEV specific Hypercall3Brijesh Singh1-0/+12
KVM hypercall framework relies on alternative framework to patch the VMCALL -> VMMCALL on AMD platform. If a hypercall is made before apply_alternative() is called then it defaults to VMCALL. The approach works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor will be able to decode the instruction and do the right things. But when SEV is active, guest memory is encrypted with guest key and hypervisor will not be able to decode the instruction bytes. To highlight the need to provide this interface, capturing the flow of apply_alternatives() : setup_arch() call init_hypervisor_platform() which detects the hypervisor platform the kernel is running under and then the hypervisor specific initialization code can make early hypercalls. For example, KVM specific initialization in case of SEV will try to mark the "__bss_decrypted" section's encryption state via early page encryption status hypercalls. Now, apply_alternatives() is called much later when setup_arch() calls check_bugs(), so we do need some kind of an early, pre-alternatives hypercall interface. Other cases of pre-alternatives hypercalls include marking per-cpu GHCB pages as decrypted on SEV-ES and per-cpu apf_reason, steal_time and kvm_apic_eoi as decrypted for SEV generally. Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall will be used by the SEV guest to notify encrypted pages to the hypervisor. This kvm_sev_hypercall3() function is abstracted and used as follows : All these early hypercalls are made through early_set_memory_XX() interfaces, which in turn invoke pv_ops (paravirt_ops). This early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed() is a generic interface and can easily have SEV, TDX and any other future platform specific abstractions added to it. Currently, pv_ops.mmu.notify_page_enc_status_changed() callback is setup to invoke kvm_sev_hypercall3() in case of SEV. Similarly, in case of TDX, pv_ops.mmu.notify_page_enc_status_changed() can be setup to a TDX specific callback. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <6fd25c749205dd0b1eb492c60d41b124760cc6ae.1629726117.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-10Merge branch 'exit-cleanups-for-v5.16' of ↵Linus Torvalds27-59/+45
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull exit cleanups from Eric Biederman: "While looking at some issues related to the exit path in the kernel I found several instances where the code is not using the existing abstractions properly. This set of changes introduces force_fatal_sig a way of sending a signal and not allowing it to be caught, and corrects the misuse of the existing abstractions that I found. A lot of the misuse of the existing abstractions are silly things such as doing something after calling a no return function, rolling BUG by hand, doing more work than necessary to terminate a kernel thread, or calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL). In the review a deficiency in force_fatal_sig and force_sig_seccomp where ptrace or sigaction could prevent the delivery of the signal was found. I have added a change that adds SA_IMMUTABLE to change that makes it impossible to interrupt the delivery of those signals, and allows backporting to fix force_sig_seccomp And Arnd found an issue where a function passed to kthread_run had the wrong prototype, and after my cleanup was failing to build." * 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) soc: ti: fix wkup_m3_rproc_boot_thread return type signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV) exit/r8188eu: Replace the macro thread_exit with a simple return 0 exit/rtl8712: Replace the macro thread_exit with a simple return 0 exit/rtl8723bs: Replace the macro thread_exit with a simple return 0 signal/x86: In emulate_vsyscall force a signal instead of calling do_exit signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails exit/syscall_user_dispatch: Send ordinary signals on failure signal: Implement force_fatal_sig exit/kthread: Have kernel threads return instead of calling do_exit signal/s390: Use force_sigsegv in default_trap_handler signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved. signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON signal/sparc: In setup_tsb_params convert open coded BUG into BUG signal/powerpc: On swapcontext failure force SIGSEGV signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT signal/sparc32: Remove unreachable do_exit in do_sparc_fault ...
2021-11-10Merge tag 'arm64-fixes' of ↵Linus Torvalds7-12/+24
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix double-evaluation of 'pte' macro argument when using 52-bit PAs - Fix signedness of some MTE prctl PR_* constants - Fix kmemleak memory usage by skipping early pgtable allocations - Fix printing of CPU feature register strings - Remove redundant -nostdlib linker flag for vDSO binaries * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions arm64: Track no early_pgtable_alloc() for kmemleak arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long arm64: vdso: remove -nostdlib compiler flag arm64: arm64_ftr_reg->name may not be a human-readable string