summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/svm/svm.c
AgeCommit message (Collapse)AuthorFilesLines
2021-02-04KVM: x86: move kvm_inject_gp up from kvm_set_xcr to callersPaolo Bonzini1-5/+2
Push the injection of #GP up to the callers, so that they can just use kvm_complete_insn_gp. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: Replace hard-coded value with #defineKrish Sadhukhan1-1/+1
Replace the hard-coded value for bit# 1 in EFLAGS, with the available #define. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20210203012842.101447-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: use .prepare_guest_switch() to handle CPU register save/setupMichael Roth1-24/+52
Currently we save host state like user-visible host MSRs, and do some initial guest register setup for MSR_TSC_AUX and MSR_AMD64_TSC_RATIO in svm_vcpu_load(). Defer this until just before we enter the guest by moving the handling to kvm_x86_ops.prepare_guest_switch() similarly to how it is done for the VMX implementation. Additionally, since handling of saving/restoring host user MSRs is the same both with/without SEV-ES enabled, move that handling to common code. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20210202190126.2185715-4-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: remove uneeded fields from host_save_users_msrsMichael Roth1-4/+2
Now that the set of host user MSRs that need to be individually saved/restored are the same with/without SEV-ES, we can drop the .sev_es_restored flag and just iterate through the list unconditionally for both cases. A subsequent patch can then move these loops to a common path. Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20210202190126.2185715-3-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: use vmsave/vmload for saving/restoring additional host stateMichael Roth1-26/+5
Using a guest workload which simply issues 'hlt' in a tight loop to generate VMEXITs, it was observed (on a recent EPYC processor) that a significant amount of the VMEXIT overhead measured on the host was the result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to perf: 67.49%--kvm_arch_vcpu_ioctl_run | |--23.13%--vcpu_put | kvm_arch_vcpu_put | | | |--21.31%--native_write_msr | | | --1.27%--svm_set_cr4 | |--16.11%--vcpu_load | | | --15.58%--kvm_arch_vcpu_load | | | |--13.97%--svm_set_cr4 | | | | | |--12.64%--native_read_msr Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and can be saved/restored using 'vmsave'/'vmload' instructions rather than explicit MSR reads/writes. In doing so there is a significant reduction in the svm_vcpu_load/svm_vcpu_put overhead measured for the above workload: 50.92%--kvm_arch_vcpu_ioctl_run | |--19.28%--disable_nmi_singlestep | |--13.68%--vcpu_load | kvm_arch_vcpu_load | | | |--9.19%--svm_set_cr4 | | | | | --6.44%--native_read_msr | | | --3.55%--native_write_msr | |--6.05%--kvm_inject_nmi |--2.80%--kvm_sev_es_mmio_read |--2.19%--vcpu_put | | | --1.25%--kvm_arch_vcpu_put | native_write_msr Quantifying this further, if we look at the raw cycle counts for a normal iteration of the above workload (according to 'rdtscp'), kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with the current behavior. Using 'vmsave'/'vmload', this is reduced to ~2800 cycles, a savings of 39%. While this approach doesn't seem to manifest in any noticeable improvement for more realistic workloads like UnixBench, netperf, and kernel builds, likely due to their exit paths generally involving IO with comparatively high latencies, it does improve overall overhead of KVM_RUN significantly, which may still be noticeable for certain situations. It also simplifies some aspects of the code. With this change, explicit save/restore is no longer needed for the following host MSRs, since they are documented[1] as being part of the VMCB State Save Area: MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, MSR_FS_BASE, MSR_GS_BASE and only the following MSR needs individual handling in svm_vcpu_put/svm_vcpu_load: MSR_TSC_AUX We could drop the host_save_user_msrs array/loop and instead handle MSR read/write of MSR_TSC_AUX directly, but we leave that for now as a potential follow-up. Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment registers (and associated hidden state)[2], some of the code previously used to handle this is no longer needed, so we drop it as well. The first public release of the SVM spec[3] also documents the same handling for the host state in question, so we make these changes unconditionally. Also worth noting is that we 'vmsave' to the same page that is subsequently used by 'vmrun' to record some host additional state. This is okay, since, in accordance with the spec[2], the additional state written to the page by 'vmrun' does not overwrite any fields written by 'vmsave'. This has also been confirmed through testing (for the above CPU, at least). [1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, Table B-2 [2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, VMSAVE/VMLOAD [3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01 Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20210202190126.2185715-2-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: Use asm goto to handle unexpected #UD on SVM instructionsSean Christopherson1-15/+1
Add svm_asm*() macros, a la the existing vmx_asm*() macros, to handle faults on SVM instructions instead of using the generic __ex(), a.k.a. __kvm_handle_fault_on_reboot(). Using asm goto generates slightly better code as it eliminates the in-line JMP+CALL sequences that are needed by __kvm_handle_fault_on_reboot() to avoid triggering BUG() from fixup (which generates bad stack traces). Using SVM specific macros also drops the last user of __ex() and the the last asm linkage to kvm_spurious_fault(), and adds a helper for VMSAVE, which may gain an addition call site in the future (as part of optimizing the SVM context switching). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201231002702.2223707-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: X86: prepend vmx/svm prefix to additional kvm_x86_ops functionsJason Baron1-10/+10
A subsequent patch introduces macros in preparation for simplifying the definition for vmx_x86_ops and svm_x86_ops. Making the naming more uniform expands the coverage of the macros. Add vmx/svm prefix to the following functions: update_exception_bitmap(), enable_nmi_window(), enable_irq_window(), update_cr8_intercept and enable_smi_window(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <ed594696f8e2c2b2bfc747504cee9bbb2a269300.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: Fix #GP handling for doubly-nested virtualizationWei Huang1-2/+18
Under the case of nested on nested (L0, L1, L2 are all hypervisors), we do not support emulation of the vVMLOAD/VMSAVE feature, the L0 hypervisor can inject the proper #VMEXIT to inform L1 of what is happening and L1 can avoid invoking the #GP workaround. For this reason we turns on guest VM's X86_FEATURE_SVME_ADDR_CHK bit for KVM running inside VM to receive the notification and change behavior. Similarly we check if vcpu is under guest mode before emulating the vmware-backdoor instructions. For the case of nested on nested, we let the guest handle it. Co-developed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210126081831.570253-5-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: Add support for SVM instruction address check changeWei Huang1-0/+3
New AMD CPUs have a change that checks #VMEXIT intercept on special SVM instructions before checking their EAX against reserved memory region. This change is indicated by CPUID_0x8000000A_EDX[28]. If it is 1, #VMEXIT is triggered before #GP. KVM doesn't need to intercept and emulate #GP faults as #GP is supposed to be triggered. Co-developed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210126081831.570253-4-wei.huang2@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: SVM: Add emulation support for #GP triggered by SVM instructionsBandan Das1-18/+91
While running SVM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD CPUs check EAX against reserved memory regions (e.g. SMM memory on host) before checking VMCB's instruction intercept. If EAX falls into such memory areas, #GP is triggered before VMEXIT. This causes problem under nested virtualization. To solve this problem, KVM needs to trap #GP and check the instructions triggering #GP. For VM execution instructions, KVM emulates these instructions. Co-developed-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Bandan Das <bsd@redhat.com> Message-Id: <20210126081831.570253-3-wei.huang2@amd.com> [Conditionally enable #GP intercept. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOWChenyi Qiang1-3/+3
DR6_INIT contains the 1-reserved bits as well as the bit that is cleared to 0 when the condition (e.g. RTM) happens. The value can be used to initialize dr6 and also be the XOR mask between the #DB exit qualification (or payload) and DR6. Concerning that DR6_INIT is used as initial value only once, rename it to DR6_ACTIVE_LOW and apply it in other places, which would make the incoming changes for bus lock debug exception more simple. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com> [Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-03KVM: SVM: Treat SVM as unsupported when running as an SEV guestSean Christopherson1-0/+5
Don't let KVM load when running as an SEV guest, regardless of what CPUID says. Memory is encrypted with a key that is not accessible to the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll see garbage when reading the VMCB. Technically, KVM could decrypt all memory that needs to be accessible to the L0 and use shadow paging so that L0 does not need to shadow NPT, but exposing such information to L0 largely defeats the purpose of running as an SEV guest. This can always be revisited if someone comes up with a use case for running VMs inside SEV guests. Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM is doomed even if the SEV guest is debuggable and the hypervisor is willing to decrypt the VMCB. This may or may not be fixed on CPUs that have the SVME_ADDR_CHK fix. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210202212017.2486595-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25kvm: tracing: Fix unmatched kvm_entry and kvm_exit eventsLorenzo Brescia1-0/+2
On VMX, if we exit and then re-enter immediately without leaving the vmx_vcpu_run() function, the kvm_entry event is not logged. That means we will see one (or more) kvm_exit, without its (their) corresponding kvm_entry, as shown here: CPU-1979 [002] 89.871187: kvm_entry: vcpu 1 CPU-1979 [002] 89.871218: kvm_exit: reason MSR_WRITE CPU-1979 [002] 89.871259: kvm_exit: reason MSR_WRITE It also seems possible for a kvm_entry event to be logged, but then we leave vmx_vcpu_run() right away (if vmx->emulation_required is true). In this case, we will have a spurious kvm_entry event in the trace. Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run() (where trace_kvm_exit() already is). A trace obtained with this patch applied looks like this: CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0 CPU-14295 [000] 8388.395392: kvm_exit: reason MSR_WRITE CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0 CPU-14295 [000] 8388.395503: kvm_exit: reason EXTERNAL_INTERRUPT Of course, not calling trace_kvm_entry() in common x86 code any longer means that we need to adjust the SVM side of things too. Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it> Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: SVM: Add support for booting APs in an SEV-ES guestTom Lendacky1-0/+10
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence, where the guest vCPU register state is updated and then the vCPU is VMRUN to begin execution of the AP. For an SEV-ES guest, this won't work because the guest register state is encrypted. Following the GHCB specification, the hypervisor must not alter the guest register state, so KVM must track an AP/vCPU boot. Should the guest want to park the AP, it must use the AP Reset Hold exit event in place of, for example, a HLT loop. First AP boot (first INIT-SIPI-SIPI sequence): Execute the AP (vCPU) as it was initialized and measured by the SEV-ES support. It is up to the guest to transfer control of the AP to the proper location. Subsequent AP boot: KVM will expect to receive an AP Reset Hold exit event indicating that the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to awaken it. When the AP Reset Hold exit event is received, KVM will place the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI sequence, KVM will make the vCPU runnable. It is again up to the guest to then transfer control of the AP to the proper location. To differentiate between an actual HLT and an AP Reset Hold, a new MP state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is placed in upon receiving the AP Reset Hold exit event. Additionally, to communicate the AP Reset Hold exit event up to userspace (if needed), a new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD. A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds a new function that, for non SEV-ES guests, invokes the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests, implements the logic above. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.cUros Bizjak1-2/+0
Commit 16809ecdc1e8a moved __svm_vcpu_run the prototype to svm.h, but forgot to remove the original from svm.c. Fixes: 16809ecdc1e8a ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201220200339.65115-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07Merge branch 'kvm-master' into kvm-nextPaolo Bonzini1-3/+5
Fixes to get_mmio_spte, destined to 5.10 stable branch.
2020-12-15KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guestsTom Lendacky1-9/+16
The run sequence is different for an SEV-ES guest compared to a legacy or even an SEV guest. The guest vCPU register state of an SEV-ES guest will be restored on VMRUN and saved on VMEXIT. There is no need to restore the guest registers directly and through VMLOAD before VMRUN and no need to save the guest registers directly and through VMSAVE on VMEXIT. Update the svm_vcpu_run() function to skip register state saving and restoring and provide an alternative function for running an SEV-ES guest in vmenter.S Additionally, certain host state is restored across an SEV-ES VMRUN. As a result certain register states are not required to be restored upon VMEXIT (e.g. FS, GS, etc.), so only do that if the guest is not an SEV-ES guest. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <fb1c66d32f2194e171b95fc1a8affd6d326e10c1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Provide support for SEV-ES vCPU loadingTom Lendacky1-13/+23
An SEV-ES vCPU requires additional VMCB vCPU load/put requirements. SEV-ES hardware will restore certain registers on VMEXIT, but not save them on VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the following changes: General vCPU load changes: - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and save the current values of XCR0, XSS and PKRU to the per-CPU SVM save area as these registers will be restored on VMEXIT. General vCPU put changes: - Do not attempt to restore registers that SEV-ES hardware has already restored on VMEXIT. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Provide support for SEV-ES vCPU creation/loadingTom Lendacky1-3/+17
An SEV-ES vCPU requires additional VMCB initialization requirements for vCPU creation and vCPU load/put requirements. This includes: General VMCB initialization changes: - Set a VMCB control bit to enable SEV-ES support on the vCPU. - Set the VMCB encrypted VM save area address. - CRx registers are part of the encrypted register state and cannot be updated. Remove the CRx register read and write intercepts and replace them with CRx register write traps to track the CRx register values. - Certain MSR values are part of the encrypted register state and cannot be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.). - Remove the #GP intercept (no support for "enable_vmware_backdoor"). - Remove the XSETBV intercept since the hypervisor cannot modify XCR0. General vCPU creation changes: - Set the initial GHCB gpa value as per the GHCB specification. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <3a8aef366416eddd5556dfa3fdc212aafa1ad0a2.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Set the encryption mask for the SVM host save areaTom Lendacky1-1/+2
The SVM host save area is used to restore some host state on VMEXIT of an SEV-ES guest. After allocating the save area, clear it and add the encryption mask to the SVM host save area physical address that is programmed into the VM_HSAVE_PA MSR. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <b77aa28af6d7f1a0cb545959e08d6dc75e0c3cba.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add NMI support for an SEV-ES guestTom Lendacky1-7/+13
The GHCB specification defines how NMIs are to be handled for an SEV-ES guest. To detect the completion of an NMI the hypervisor must not intercept the IRET instruction (because a #VC while running the NMI will issue an IRET) and, instead, must receive an NMI Complete exit event from the guest. Update the KVM support for detecting the completion of NMIs in the guest to follow the GHCB specification. When an SEV-ES guest is active, the IRET instruction will no longer be intercepted. Now, when the NMI Complete exit event is received, the iret_interception() function will be called to simulate the completion of the NMI. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5ea3dd69b8d4396cefdc9048ebc1ab7caa70a847.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guestTom Lendacky1-0/+8
The guest FPU state is automatically restored on VMRUN and saved on VMEXIT by the hardware, so there is no reason to do this in KVM. Eliminate the allocation of the guest_fpu save area and key off that to skip operations related to the guest FPU state. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <173e429b4d0d962c6a443c4553ffdaf31b7665a4.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Do not report support for SMM for an SEV-ES guestTom Lendacky1-1/+10
SEV-ES guests do not currently support SMM. Update the has_emulated_msr() kvm_x86_ops function to take a struct kvm parameter so that the capability can be reported at a VM level. Since this op is also called during KVM initialization and before a struct kvm instance is available, comments will be added to each implementation of has_emulated_msr() to indicate the kvm parameter can be null. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <75de5138e33b945d2fb17f81ae507bda381808e3.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for CR8 write traps for an SEV-ES guestTom Lendacky1-1/+6
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES guests introduce new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR8 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5a01033f4c8b3106ca9374b7cadf8e33da852df1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for CR4 write traps for an SEV-ES guestTom Lendacky1-0/+7
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES guests introduce new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR4 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c3880bf2db8693aa26f648528fbc6e967ab46e25.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for CR0 write traps for an SEV-ES guestTom Lendacky1-0/+26
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES support introduces new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR0 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <182c9baf99df7e40ad9617ff90b84542705ef0d7.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for EFER write traps for an SEV-ES guestTom Lendacky1-0/+20
For SEV-ES guests, the interception of EFER write access is not recommended. EFER interception occurs prior to EFER being modified and the hypervisor is unable to modify EFER itself because the register is located in the encrypted register state. SEV-ES support introduces a new EFER write trap. This trap provides intercept support of an EFER write after it has been modified. The new EFER value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest EFER. Add support to track the value of the guest EFER value using the EFER write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <8993149352a3a87cd0625b3b61bfd31ab28977e1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Support string IO operations for an SEV-ES guestTom Lendacky1-3/+8
For an SEV-ES guest, string-based port IO is performed to a shared (un-encrypted) page so that both the hypervisor and guest can read or write to it and each see the contents. For string-based port IO operations, invoke SEV-ES specific routines that can complete the operation using common KVM port IO support. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add initial support for a VMGEXIT VMEXITTom Lendacky1-2/+4
SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a VMGEXIT includes mapping the GHCB based on the guest GPA, which is obtained from a new VMCB field, and then validating the required inputs for the VMGEXIT exit reason. Since many of the VMGEXIT exit reasons correspond to existing VMEXIT reasons, the information from the GHCB is copied into the VMCB control exit code areas and KVM register areas. The standard exit handlers are invoked, similar to standard VMEXIT processing. Before restarting the vCPU, the GHCB is updated with any registers that have been updated by the hypervisor. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c6a4ed4294a369bd75c44d03bd7ce0f0c3840e50.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Prepare for SEV-ES exit handling in the sev.c fileTom Lendacky1-26/+38
This is a pre-patch to consolidate some exit handling code into callable functions. Follow-on patches for SEV-ES exit handling will then be able to use them from the sev.c file. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5b8b0ffca8137f3e1e257f83df9f5c881c8a96a3.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ESTom Lendacky1-0/+7
When a SHUTDOWN VMEXIT is encountered, normally the VMCB is re-initialized so that the guest can be re-launched. But when a guest is running as an SEV-ES guest, the VMSA cannot be re-initialized because it has been encrypted. For now, just return -EINVAL to prevent a possible attempt at a guest reset. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <aa6506000f6f3a574de8dbcdab0707df844cb00c.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Do not allow instruction emulation under SEV-ESTom Lendacky1-0/+6
When a guest is running as an SEV-ES guest, it is not possible to emulate instructions. Add support to prevent instruction emulation. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <f6355ea3024fda0a3eb5eb99c6b62dca10d792bd.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Prevent debugging under SEV-ESTom Lendacky1-0/+9
Since the guest register state of an SEV-ES guest is encrypted, debugging is not supported. Update the code to prevent guest debugging when the guest has protected state. Additionally, an SEV-ES guest must only and always intercept DR7 reads and writes. Update set_dr_intercepts() and clr_dr_intercepts() to account for this. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <8db966fa2f9803d6454ce773863025d0e2e7f3cc.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add required changes to support intercepts under SEV-ESTom Lendacky1-10/+73
When a guest is running under SEV-ES, the hypervisor cannot access the guest register state. There are numerous places in the KVM code where certain registers are accessed that are not allowed to be accessed (e.g. RIP, CR0, etc). Add checks to prevent register accesses and add intercept update support at various points within the KVM code. Also, when handling a VMGEXIT, exceptions are passed back through the GHCB. Since the RDMSR/WRMSR intercepts (may) inject a #GP on error, update the SVM intercepts to handle this for SEV-ES guests. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [Redo MSR part using the .complete_emulated_msr callback. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: x86: introduce complete_emulated_msr callbackPaolo Bonzini1-0/+1
This will be used by SEV-ES to inject MSR failure via the GHCB. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14KVM: SVM: Add support for the SEV-ES VMSATom Lendacky1-2/+22
Allocate a page during vCPU creation to be used as the encrypted VM save area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch structure that indicates whether the guest state is protected. When freeing a VMSA page that has been encrypted, the cache contents must be flushed using the MSR_AMD64_VM_PAGE_FLUSH before freeing the page. [ i386 build warnings ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <fde272b17eec804f3b9db18c131262fe074015c5.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14KVM: SVM: Add support for SEV-ES capability in KVMTom Lendacky1-10/+10
Add support to KVM for determining if a system is capable of supporting SEV-ES as well as determining if a guest is an SEV-ES guest. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e66792323982c822350e40c7a1cf67ea2978a70b.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-14KVM/VMX/SVM: Move kvm_machine_check function to x86.hUros Bizjak1-20/+0
Move kvm_machine_check to x86.h to avoid two exact copies of the same function in kvm.c and svm.c. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201029135600.122392-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-11KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bitsPaolo Bonzini1-10/+4
Until commit e7c587da1252 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD. Testing only Intel bits on VMX processors, or only AMD bits on SVM processors, fails if the guests are created with the "opposite" vendor as the host. While at it, also tweak the host CPU check to use the vendor-agnostic feature bit X86_FEATURE_IBPB, since we only care about the availability of the MSR on the host here and not about specific CPUID bits. Fixes: e7c587da1252 ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP") Cc: stable@vger.kernel.org Reported-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-04kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()Jacob Xu1-2/+2
The cpu arg for svm_cpu_uninit() was previously ignored resulting in the per cpu structure svm_cpu_data not being de-allocated for all cpus. Signed-off-by: Jacob Xu <jacobhxu@google.com> Message-Id: <20201203205939.1783969-1-jacobhxu@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-17KVM: SVM: fix error return code in svm_create_vcpu()Chen Zhou1-1/+3
Fix to return a negative error code from the error handling case instead of 0 in function svm_create_vcpu(), as done elsewhere in this function. Fixes: f4c847a95654 ("KVM: SVM: refactor msr permission bitmap allocation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Message-Id: <20201117025426.167824-1-chenzhou10@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16KVM: SVM: check CR4 changes against vcpu->archPaolo Bonzini1-1/+1
Similarly to what vmx/vmx.c does, use vcpu->arch.cr4 to check if CR4 bits PGE, PKE and OSXSAVE have changed. When switching between VMCB01 and VMCB02, CPUID has to be adjusted every time if CR4.PKE or CR4.OSXSAVE change; without this patch, instead, CR4 would be checked against the previous value for L2 on vmentry, and against the previous value for L1 on vmexit, and CPUID would not be updated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16KVM: SVM: Move asid to vcpu_svmCathy Avery1-3/+7
KVM does not have separate ASIDs for L1 and L2; either the nested hypervisor and nested guests share a single ASID, or on older processor the ASID is used only to implement TLB flushing. Either way, ASIDs are handled at the VM level. In preparation for having different VMCBs passed to VMLOAD/VMRUN/VMSAVE for L1 and L2, store the current ASID to struct vcpu_svm and only move it to the VMCB in svm_vcpu_run. This way, TLB flushes can be applied no matter which VMCB will be active during the next svm_vcpu_run. Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20201011184818.3609-2-cavery@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15kvm: x86: Sink cpuid update into vendor-specific set_cr4 functionsJim Mattson1-0/+3
On emulated VM-entry and VM-exit, update the CPUID bits that reflect CR4.OSXSAVE and CR4.PKE. This fixes a bug where the CPUID bits could continue to reflect L2 CR4 values after emulated VM-exit to L1. It also fixes a related bug where the CPUID bits could continue to reflect L1 CR4 values after emulated VM-entry to L2. The latter bug is mainly relevant to SVM, wherein CPUID is not a required intercept. However, it could also be relevant to VMX, because the code to conditionally update these CPUID bits assumes that the guest CPUID and the guest CR4 are always in sync. Fixes: 8eb3f87d903168 ("KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit") Fixes: 2acf923e38fb6a ("KVM: VMX: Enable XSAVE/XRSTOR for guest") Fixes: b9baba86148904 ("KVM, pkeys: expose CPUID/CR4 to guest") Reported-by: Abhiroop Dabral <adabral@paloaltonetworks.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Peter Shier <pshier@google.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Cc: Dexuan Cui <dexuan.cui@intel.com> Cc: Huaitong Han <huaitong.han@intel.com> Message-Id: <20201029170648.483210-1-jmattson@google.com>
2020-11-15KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hookSean Christopherson1-2/+7
Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(), and invoke the new hook from kvm_valid_cr4(). This fixes an issue where KVM_SET_SREGS would return success while failing to actually set CR4. Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return in __set_sregs() is not a viable option as KVM has already stuffed a variety of vCPU state. Note, kvm_valid_cr4() and is_valid_cr4() have different return types and inverted semantics. This will be remedied in a future patch. Fixes: 5e1746d6205d ("KVM: nVMX: Allow setting the VMXE bit in CR4") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-15KVM: SVM: Drop VMXE check from svm_set_cr4()Sean Christopherson1-3/+0
Drop svm_set_cr4()'s explicit check CR4.VMXE now that common x86 handles the check by incorporating VMXE into the CR4 reserved bits, via kvm_cpu_caps. SVM obviously does not set X86_FEATURE_VMX. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guestsBabu Moger1-0/+8
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask the memory encryption bit in reserved bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-136/+268
Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
2020-10-21Merge branch 'kvm-fixes' into 'next'Paolo Bonzini1-1/+7
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21KVM: nSVM: implement on demand allocation of the nested stateMaxim Levitsky1-28/+33
This way we don't waste memory on VMs which don't use nesting virtualization even when the host enabled it for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>