path: root/arch/x86/kvm/svm
2020-05-27  Merge branch 'kvm-master' into HEAD  (Paolo Bonzini, 1 file, -1/+1)
Merge AMD fixes before doing more development work.

2020-05-27  KVM: x86: simplify is_mmio_spte  (Paolo Bonzini, 1 file, -1/+1)
We can simply look at bits 52-53 to identify MMIO entries in KVM's page tables. Therefore, there is no need to pass a mask to kvm_mmu_set_mmio_spte_mask.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

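As a rough standalone sketch of that idea (the exact marker value placed in bits 52-53 is an assumption here, not taken from the kernel source):

        #include <stdbool.h>
        #include <stdint.h>

        /* Bits 52-53 of a shadow PTE carry KVM's MMIO marker. */
        #define SPTE_MMIO_BITS   (3ULL << 52)
        #define SPTE_MMIO_VALUE  (3ULL << 52)   /* assumed marker value */

        static bool is_mmio_spte(uint64_t spte)
        {
                return (spte & SPTE_MMIO_BITS) == SPTE_MMIO_VALUE;
        }
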
2020-05-20  Merge tag 'noinstr-x86-kvm-2020-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD  (Paolo Bonzini, 1 file, -1/+1)

2020-05-16  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds, 2 files, -22/+53)
Pull kvm fixes from Paolo Bonzini:
 "A new testcase for guest debugging (gdbstub) that exposed a bunch of bugs, mostly for AMD processors. And a few other x86 fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
  KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
  KVM: SVM: Disable AVIC before setting V_IRQ
  KVM: Introduce kvm_make_all_cpus_request_except()
  KVM: VMX: pass correct DR6 for GD userspace exit
  KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
  KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
  KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on
  KVM: selftests: Add KVM_SET_GUEST_DEBUG test
  KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG
  KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG
  KVM: x86: fix DR6 delivery for various cases of #DB injection
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly

2020-05-15  KVM: SVM: Remove unnecessary V_IRQ unsetting  (Suravee Suthikulpanit, 1 file, -2/+0)
This has already been handled in the prior call to svm_clear_vintr().

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1588771076-73790-5-git-send-email-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-15  KVM: SVM: Merge svm_enable_vintr into svm_set_vintr  (Suravee Suthikulpanit, 1 file, -8/+2)
Clean up the code and remove the unnecessary intercept check for INTERCEPT_VINTR.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1588771076-73790-4-git-send-email-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-15  KVM: X86: Introduce more exit_fastpath_completion enum values  (Wanpeng Li, 1 file, -8/+7)
Add a fastpath_t typedef, since the enum lines are a bit long, and replace EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values:

 - EXIT_FASTPATH_EXIT_HANDLED: KVM will still go through its full run loop, but it will skip invoking the exit handler.

 - EXIT_FASTPATH_REENTER_GUEST: complete fastpath; the guest can be re-entered without invoking the exit handler or going back to vcpu_run.

Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

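A minimal sketch of what the resulting definitions could look like (member order and exact spelling are assumptions, not copied from the kernel headers):

        /* Completion status of fastpath VM-exit handling. */
        enum exit_fastpath_completion {
                EXIT_FASTPATH_NONE,             /* no fastpath handling, run the full exit handler */
                EXIT_FASTPATH_REENTER_GUEST,    /* fully handled, re-enter the guest directly */
                EXIT_FASTPATH_EXIT_HANDLED,     /* handled, but still go through the run loop */
        };
        typedef enum exit_fastpath_completion fastpath_t;
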
2020-05-15  KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enums  (Sean Christopherson, 1 file, -1/+1)
Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G. KVM's enums are borderline impossible to remember and result in code that is visually difficult to audit, e.g.

        if (!enable_ept)
                ept_lpage_level = 0;
        else if (cpu_has_vmx_ept_1g_page())
                ept_lpage_level = PT_PDPE_LEVEL;
        else if (cpu_has_vmx_ept_2m_page())
                ept_lpage_level = PT_DIRECTORY_LEVEL;
        else
                ept_lpage_level = PT_PAGE_TABLE_LEVEL;

versus

        if (!enable_ept)
                ept_lpage_level = 0;
        else if (cpu_has_vmx_ept_1g_page())
                ept_lpage_level = PG_LEVEL_1G;
        else if (cpu_has_vmx_ept_2m_page())
                ept_lpage_level = PG_LEVEL_2M;
        else
                ept_lpage_level = PG_LEVEL_4K;

No functional change intended.

Suggested-by: Barret Rhoden <brho@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: x86/mmu: Capture TDP level when updating CPUID  (Sean Christopherson, 1 file, -1/+1)
Snapshot the TDP level now that it's invariant (SVM) or dependent only on host capabilities and guest CPUID (VMX). This avoids having to call kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or calculating the page role, and thus avoids the associated retpoline. Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is active is legal, if dodgy.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: VMX: Add proper cache tracking for CR0  (Sean Christopherson, 1 file, -5/+0)
Move CR0 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0() is called multiple times in a single VM-Exit, and more importantly eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR0, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr0_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: VMX: Add proper cache tracking for CR4  (Sean Christopherson, 1 file, -5/+0)
Move CR4 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs and retpolines (when configured) during nested VMX transitions, as kvm_read_cr4_bits() is invoked multiple times on each transition, e.g. when stuffing CR0 and CR3. As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR4, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr4_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch'  (Sean Christopherson, 1 file, -11/+0)
Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops hook read_l1_tsc_offset(). This avoids a retpoline (when configured) when reading L1's effective TSC, which is done at least once on every VM-Exit.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: x86: handle wrap around 32-bit address space  (Paolo Bonzini, 1 file, -3/+0)
KVM is not handling the case where EIP wraps around the 32-bit address space (that is, outside long mode). This is needed both in vmx.c and in emulate.c. SVM with NRIPS is okay, but it can still print an error to dmesg due to integer overflow.

Reported-by: Nick Peterson <everdox@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

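Conceptually, the fix amounts to truncating the advanced instruction pointer when outside long mode; a standalone sketch of that rule (not the kernel's exact code):

        #include <stdbool.h>
        #include <stdint.h>

        /* Advance RIP past an instruction; outside long mode EIP wraps at 32 bits. */
        static uint64_t advance_rip(uint64_t rip, unsigned int insn_len, bool long_mode)
        {
                rip += insn_len;
                if (!long_mode)
                        rip = (uint32_t)rip;
                return rip;
        }
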
2020-05-13  KVM: x86: Replace late check_nested_events() hack with more precise fix  (Paolo Bonzini, 1 file, -5/+20)
Add an argument to interrupt_allowed and nmi_allowed to check whether interrupt injection is blocked. Use the hook to handle the case where an interrupt arrives between check_nested_events() and the injection logic. Drop the retry of check_nested_events() that hack-a-fixed the same condition.

Blocking injection is also a bit of a hack, e.g. KVM should do exiting and non-exiting interrupt processing in a single pass, but it's a more precise hack. The old comment is also misleading, e.g. KVM_REQ_EVENT is purely an optimization; setting it on every run loop (which KVM doesn't do) should not affect functionality, only performance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
[Extend to SVM, add SMI and NMI. Even though NMI and SMI cannot come asynchronously right now, making the fix generic is easy and removes a special case. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: nSVM: Preserve IRQ/NMI/SMI priority irrespective of exiting behavior  (Paolo Bonzini, 1 file, -3/+9)
Short circuit svm_check_nested_events() if an unblocked IRQ/NMI/SMI is pending and needs to be injected into L2, as the priority between coincident events does not depend on exiting behavior.

Fixes: b518ba9fa691 ("KVM: nSVM: implement check_nested_events for interrupts")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: nSVM: Report interrupts as allowed when in L2 and exit-on-interrupt is set  (Paolo Bonzini, 3 files, -11/+22)
Report interrupts as allowed when the vCPU is in L2 and L2 is being run with exit-on-interrupts enabled and EFLAGS.IF=1 (either on the host or on the guest according to VINTR). Interrupts are always unblocked from L1's perspective in this case.

While moving nested_exit_on_intr to svm.h, use INTERCEPT_INTR properly instead of assuming it's zero (which it is of course).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

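A minimal standalone sketch of the check the helper performs (the real version in svm.h operates on struct vcpu_svm; as the commit notes, INTERCEPT_INTR happens to be bit 0):

        #include <stdbool.h>
        #include <stdint.h>

        #define INTERCEPT_INTR 0   /* bit index of the INTR intercept on SVM */

        /* True if L1 asked to intercept physical interrupts for this L2 run. */
        static bool nested_exit_on_intr(uint64_t nested_intercepts)
        {
                return nested_intercepts & (1ULL << INTERCEPT_INTR);
        }
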
2020-05-13  KVM: SVM: Split out architectural interrupt/NMI/SMI blocking checks  (Paolo Bonzini, 2 files, -11/+43)
Move the architectural (non-KVM specific) interrupt/NMI/SMI blocking checks to a separate helper so that they can be used in a future patch by svm_check_nested_events().

No functional change intended.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: nSVM: Move SMI vmexit handling to svm_check_nested_events()  (Paolo Bonzini, 3 files, -8/+21)
Unlike VMX, SVM allows a hypervisor to take an SMI vmexit without having any special SMM-monitor enablement sequence. Therefore, it has to be handled like interrupts and NMIs. Check for an unblocked SMI in svm_check_nested_events() so that pending SMIs are correctly prioritized over IRQs and NMIs when the latter events will trigger VM-Exit.

Note that there is no need to test explicitly for SMI vmexits, because guests always run outside SMM and therefore can never get an SMI while they are blocked.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: nSVM: Report NMIs as allowed when in L2 and Exit-on-NMI is set  (Paolo Bonzini, 3 files, -5/+8)
Report NMIs as allowed when the vCPU is in L2 and L2 is being run with Exit-on-NMI enabled, as NMIs are always unblocked from L1's perspective in this case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: x86: replace is_smm checks with kvm_x86_ops.smi_allowed  (Paolo Bonzini, 1 file, -1/+1)
Do not hardcode is_smm so that all the architectural conditions for blocking SMIs are listed in a single place. Well, in two places, because this introduces some code duplication between Intel and AMD.

This ensures that nested SVM obeys GIF in kvm_vcpu_has_events.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: x86: Make return for {interrupt_nmi,smi}_allowed() a bool instead of int  (Sean Christopherson, 1 file, -8/+8)
Return an actual bool for kvm_x86_ops' {interrupt,nmi,smi}_allowed() hooks to better reflect the return semantics, and to avoid creating an even bigger mess when the related VMX code is refactored in upcoming patches.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: SVM: Implement check_nested_events for NMI  (Cathy Avery, 3 files, -19/+23)
Migrate nested guest NMI intercept processing to new check_nested_events.

Signed-off-by: Cathy Avery <cavery@redhat.com>
Message-Id: <20200414201107.22952-2-cavery@redhat.com>
[Reorder clauses as NMIs have higher priority than IRQs; inject immediate vmexit as is now done for IRQ vmexits. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: SVM: immediately inject INTR vmexit  (Paolo Bonzini, 1 file, -3/+3)
We can immediately leave SVM guest mode in svm_check_nested_events now that we have the nested_run_pending mechanism. This makes things easier because we can run the rest of inject_pending_event with GIF=0, and KVM will naturally end up requesting the next interrupt window.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  KVM: SVM: leave halted state on vmexit  (Paolo Bonzini, 1 file, -0/+3)
Similar to VMX, we need to leave the halted state when performing a vmexit. Failure to do so will cause a hang after vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

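The gist of the fix, as a hedged sketch (the exact spot in the nested vmexit path is assumed): mark the vCPU runnable again so a HLT executed by L2 does not leave it stuck once control returns to L1.

        /* Sketch: in the nested vmexit path, clear any lingering halted state. */
        svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
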
2020-05-13  KVM: SVM: introduce nested_run_pending  (Paolo Bonzini, 3 files, -1/+8)
We want to inject vmexits immediately from svm_check_nested_events, so that the interrupt/NMI window requests happen in inject_pending_event right after it returns. This however has the same issue as in vmx_check_nested_events, so introduce a nested_run_pending flag with the exact same purpose of delaying vmexit injection after the vmentry.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-13  Merge branch 'kvm-amd-fixes' into HEAD  (Paolo Bonzini, 2 files, -22/+55)

2020-05-08  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds, 1 file, -1/+1)
Merge misc fixes from Andrew Morton:
 "14 fixes and one selftest to verify the ipc fixes herein"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: limit boost_watermark on small zones
  ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
  mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
  epoll: atomically remove wait entry on wake up
  kselftests: introduce new epoll60 testcase for catching lost wakeups
  percpu: make pcpu_alloc() aware of current gfp context
  mm/slub: fix incorrect interpretation of s->offset
  scripts/gdb: repair rb_first() and rb_last()
  eventpoll: fix missing wakeup for ovflist in ep_poll_callback
  arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
  scripts/decodecode: fix trapping instruction formatting
  kernel/kcov.c: fix typos in kcov_remote_start documentation
  mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  mm, memcg: fix error return value of mem_cgroup_css_alloc()
  ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()

2020-05-08  KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6  (Paolo Bonzini, 2 files, -12/+32)
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD.

On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead, the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use:

 * the guest DR6 is clobbered

 * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue)

This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest if KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled.

Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless.

This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-08  KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6  (Paolo Bonzini, 2 files, -15/+18)
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-07  arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()  (Janakarajan Natarajan, 1 file, -1/+1)
When trying to lock read-only pages, sev_pin_memory() fails because FOLL_WRITE is used as the flag for get_user_pages_fast(). Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") updated the get_user_pages_fast() call sites to use flags, but incorrectly updated the call in sev_pin_memory(). As the original coding of this call was correct, revert the change made by that commit.

Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

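The corrected call presumably looks like the following fragment (variable names follow the commit message; the surrounding sev_pin_memory() code is omitted), honoring the caller's write flag instead of always passing FOLL_WRITE:

        /* Pin the user range; request write access only when the caller asked for it. */
        npinned = get_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
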
2020-05-07  KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on  (Paolo Bonzini, 1 file, -0/+8)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-05-04  KVM: SVM: fill in kvm_run->debug.arch.dr[67]  (Paolo Bonzini, 1 file, -0/+2)
The corresponding code was added for VMX in commit 42dbaa5a057 ("KVM: x86: Virtualize debug registers", 2008-12-15) but never for AMD. Fix this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-24  KVM: SVM: do not allow VMRUN inside SMM  (Paolo Bonzini, 1 file, -1/+5)
VMRUN is not supported inside the SMM handler and the behavior is undefined. Just raise a #UD.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

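A hedged sketch of what the added guard in the VMRUN intercept amounts to (exact placement assumed):

        /* VMRUN from within SMM is undefined; refuse it with #UD. */
        if (is_smm(&svm->vcpu)) {
                kvm_queue_exception(&svm->vcpu, UD_VECTOR);
                return 1;
        }
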
2020-04-23  KVM: x86: move nested-related kvm_x86_ops to a separate struct  (Paolo Bonzini, 3 files, -10/+12)
Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to nested virtualization into a separate struct. As a result, these ops will always be non-NULL on VMX. This is not a problem:

 * check_nested_events is only called if is_guest_mode(vcpu) returns true

 * get_nested_state treats VMXOFF state the same as nested being disabled

 * set_nested_state fails if you attempt to set nested state while nesting is disabled

 * nested_enable_evmcs could already be called on a CPU without VMX enabled in CPUID

 * nested_get_evmcs_version was fixed in the previous patch

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

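A sketch of what such a sub-struct could look like, going only by the op names listed above (member types and signatures are assumptions, not the kernel's actual declarations):

        struct kvm_x86_nested_ops {
                int (*check_events)(struct kvm_vcpu *vcpu);
                int (*get_state)(struct kvm_vcpu *vcpu, void __user *state, unsigned int size);
                int (*set_state)(struct kvm_vcpu *vcpu, const void __user *state);
                int (*enable_evmcs)(struct kvm_vcpu *vcpu, u16 *vmcs_version);
                u32 (*get_evmcs_version)(struct kvm_vcpu *vcpu);
        };
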
2020-04-21  KVM: SVM: avoid infinite loop on NPF from bad address  (Paolo Bonzini, 1 file, -0/+7)
When a nested page fault is taken from an address that does not have a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE (via mmu_set_spte) and kvm_mmu_page_fault then invokes svm_need_emulation_on_page_fault. The default answer there is to return false, but in this case this just causes the page fault to be retried ad libitum. Since this is not a fast path, and the only other case where it is taken is an erratum, just stick a kvm_vcpu_gfn_to_memslot check in there to detect the common case where the erratum is not happening.

This fixes an infinite loop in the new set_memory_region_test.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-21  KVM: nSVM: Check for CR0.CD and CR0.NW on VMRUN of nested guests  (Krish Sadhukhan, 1 file, -0/+4)
According to section "Canonicalization and Consistency Checks" in APM vol. 2, the following guest state combination is illegal:

        "CR0.CD is zero and CR0.NW is set"

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20200409205035.16830-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

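A standalone sketch of that consistency check (CR0.CD is bit 30 and CR0.NW is bit 29; the function name is illustrative):

        #include <stdbool.h>
        #include <stdint.h>

        #define X86_CR0_NW (1ULL << 29)
        #define X86_CR0_CD (1ULL << 30)

        /* Reject the illegal combination "CR0.CD is zero and CR0.NW is set". */
        static bool nested_cr0_valid(uint64_t cr0)
        {
                return !(!(cr0 & X86_CR0_CD) && (cr0 & X86_CR0_NW));
        }
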
2020-04-21  KVM: X86: Improve latency for single target IPI fastpath  (Wanpeng Li, 1 file, -8/+16)
In our cloud environment we observe that IPI and timer MSR writes cause the bulk of MSR-write vmexits, so let's optimize virtual IPI latency more aggressively and inject the target IPI as soon as possible. Running the kvm-unit-tests/vmexit.flat IPI test on an SKX server, with the adaptive advance lapic timer and adaptive halt-polling disabled to avoid interference, this patch gives another 7% improvement:

        w/o fastpath   -> x86.c fastpath    4238 -> 3543    16.4%
        x86.c fastpath -> vmx.c fastpath    3543 -> 3293     7%
        w/o fastpath   -> vmx.c fastpath    4238 -> 3293    22.3%

Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200410174703.1138-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-21  KVM: SVM: Use do_machine_check to pass MCE to the host  (Uros Bizjak, 1 file, -5/+21)
Use do_machine_check instead of a software "int $0x12" (#MC) to pass the MCE to the host, the same approach VMX uses.

On a related note, there is no reason to limit the use of do_machine_check to 64-bit targets, as is currently done for VMX. MCE handling works for both target families.

The patch is only compile tested, for both 64- and 32-bit targets; someone should test the passing of the exception by injecting some MCEs into the guest.

For a future non-RFC patch, kvm_machine_check should be moved to some appropriate header file.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200411153627.3474710-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-21  KVM: x86: Introduce KVM_REQ_TLB_FLUSH_CURRENT to flush current ASID  (Sean Christopherson, 1 file, -0/+1)
Add KVM_REQ_TLB_FLUSH_CURRENT to allow optimized TLB flushing of VMX's EPTP/VPID contexts[*] from the KVM MMU and/or in a deferred manner, e.g. to flush L2's context during nested VM-Enter.

Convert KVM_REQ_TLB_FLUSH to KVM_REQ_TLB_FLUSH_CURRENT in flows where the flush is directly associated with vCPU-scoped instruction emulation, i.e. MOV CR3 and INVPCID.

Add a comment in vmx_vcpu_load_vmcs() above its KVM_REQ_TLB_FLUSH to make it clear that it deliberately requests a flush of all contexts.

Service any pending flush request on nested VM-Exit as it's possible a nested VM-Exit could occur after requesting a flush for L2. Add the same logic for nested VM-Enter even though it's _extremely_ unlikely for flush to be pending on nested VM-Enter, but theoretically possible (in the future) due to RSM (SMM) emulation.

[*] Intel also has an Address Space Identifier (ASID) concept, e.g. EPTP+VPID+PCID == ASID, it's just not documented in the SDM because the rules of invalidation are different based on which piece of the ASID is being changed, i.e. whether the EPTP, VPID, or PCID context must be invalidated.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-25-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-21  KVM: x86: Rename ->tlb_flush() to ->tlb_flush_all()  (Sean Christopherson, 1 file, -1/+1)
Rename ->tlb_flush() to ->tlb_flush_all() in preparation for adding a new hook to flush only the current ASID/context.

Opportunistically replace the comment in vmx_flush_tlb() that explains why it flushes all EPTP/VPID contexts with a comment explaining why it unconditionally uses INVEPT when EPT is enabled, i.e. rely on the "all" part of the name to clarify why it does global INVEPT/INVVPID.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-23-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-21  KVM: SVM: Document the ASID logic in svm_flush_tlb()  (Sean Christopherson, 1 file, -0/+7)
Add a comment in svm_flush_tlb() to document why it flushes only the current ASID, even when it is invoked when flushing remote TLBs.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-22-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-21  KVM: SVM: Wire up ->tlb_flush_guest() directly to svm_flush_tlb()  (Sean Christopherson, 1 file, -6/+1)
Use svm_flush_tlb() directly for kvm_x86_ops->tlb_flush_guest() now that the @invalidate_gpa param to ->tlb_flush() is gone, i.e. the wrapper for ->tlb_flush_guest() is no longer necessary.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-18-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-21  KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush()  (Sean Christopherson, 3 files, -5/+5)
Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now that all callers pass %true for said param, or ignore the param (SVM has an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat arbitrarily passes %false).

Remove __vmx_flush_tlb() as it is no longer used.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-17-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-20  KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook  (Sean Christopherson, 1 file, -0/+6)
Add a dedicated hook to handle flushing TLB entries on behalf of the guest, i.e. for a paravirtualized TLB flush, and use it directly instead of bouncing through kvm_vcpu_flush_tlb().

For VMX, change the effective implementation to never do INVEPT and flush only the current context, i.e. to always flush via INVVPID(SINGLE_CONTEXT). The INVEPT performed by __vmx_flush_tlb() when @invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only flush guest-physical mappings; linear and combined mappings are flushed by VM-Enter when VPID is disabled, and changes in the guest page tables do not affect guest-physical mappings.

When EPT and VPID are enabled, doing INVVPID is not required (by Intel's architecture) to invalidate guest-physical mappings, i.e. TLB entries that cache guest-physical mappings can live across INVVPID as the mappings are associated with an EPTP, not a VPID. The intent of @invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate gpa mappings", i.e. do INVEPT and not simply INVVPID. Other than nested VPID handling, which now calls vpid_sync_context() directly, the only scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is enabled) is if KVM is flushing TLB entries from the guest's perspective, i.e. is only required to invalidate linear mappings.

For SVM, flushing TLB entries from the guest's perspective can be done by flushing the current ASID, as changes to the guest's page tables are associated only with the current ASID.

Adding a dedicated ->tlb_flush_guest() paves the way toward removing @invalidate_gpa, which is a potentially dangerous control flag as its meaning is not exactly crystal clear, even for those who are familiar with the subtleties of what mappings Intel CPUs are/aren't allowed to keep across various invalidation scenarios.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

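On the SVM side, "flush the current ASID" boils down to something like this sketch (field names and the flush-by-ASID fallback are assumptions based on the description above, not a copy of the kernel code):

        /*
         * Flush only the guest's current ASID: either ask hardware to flush
         * that ASID directly, or force a new ASID on the next VMRUN.
         */
        static void svm_flush_tlb(struct kvm_vcpu *vcpu)
        {
                struct vcpu_svm *svm = to_svm(vcpu);

                if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
                        svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
                else
                        svm->asid_generation--;
        }
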
2020-04-15  KVM: SVM: Fix __svm_vcpu_run declaration.  (Uros Bizjak, 1 file, -1/+1)
The function returns no value.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 199cd1d7b534 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200409114926.1407442-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-15  KVM: SVM: Do not setup frame pointer in __svm_vcpu_run  (Uros Bizjak, 1 file, -1/+0)
__svm_vcpu_run is a leaf function and does not need a frame pointer. %rbp is also destroyed a few instructions later when guest registers are loaded.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200409120440.1427215-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-15  KVM: SVM: Fix build error due to missing release_pages() include  (Borislav Petkov, 1 file, -0/+1)
Fix:

        arch/x86/kvm/svm/sev.c: In function ‘sev_pin_memory’:
        arch/x86/kvm/svm/sev.c:360:3: error: implicit declaration of function ‘release_pages’;\
        did you mean ‘reclaim_pages’? [-Werror=implicit-function-declaration]
          360 |   release_pages(pages, npinned);
              |   ^~~~~~~~~~~~~
              |   reclaim_pages

because svm.c includes pagemap.h but the carved out sev.c needs it too. Triggered by a randconfig build.

Fixes: eaf78265a4ab ("KVM: SVM: Move SEV code to separate file")
Signed-off-by: Borislav Petkov <bp@suse.de>
Message-Id: <20200411160927.27954-1-bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-15  KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD  (Uros Bizjak, 1 file, -1/+0)
svm_vcpu_run does not change stack or frame pointer anymore.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200414113612.104501-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-14  KVM: SVM: move more vmentry code to assembly  (Paolo Bonzini, 2 files, -7/+9)
Manipulate IF around vmload/vmsave to remove the confusing usage of local_irq_enable where interrupts are actually disabled via GIF. And stuff the RSB immediately without waiting for a RET to avoid Spectre-v2 attacks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2020-04-14  KVM: SVM: fix compilation with modular PSP and non-modular KVM  (Paolo Bonzini, 1 file, -1/+4)
Use svm_sev_enabled() in order to cull all calls to PSP code. Otherwise, compilation fails with undefined symbols if the PSP device driver is compiled as a module and KVM is not.

Reported-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>