summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
AgeCommit message (Collapse)AuthorFilesLines
2020-06-23KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkruSean Christopherson1-2/+0
Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was moved to common x86 code. No functional change intended. Fixes: 37486135d3a7b ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-23KVM: x86: allow TSC to differ by NTP correction bounds without TSC scalingMarcelo Tosatti1-1/+2
The Linux TSC calibration procedure is subject to small variations (its common to see +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processor can fail, in case of a small variation in calibrated TSC between them. Without TSC scaling, the current kernel interface will either return an error (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode. This change enables the following TSC tolerance check to accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default). /* * Compute the variation in TSC rate which is acceptable * within the range of tolerance and decide if the * rate being applied is within that bounds of the hardware * rate. If so, no scaling or compensation need be done. */ thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm); thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm); if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) { pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi); use_scaling = 1; } NTP daemon in the guest can correct this difference (NTP can correct upto 500ppm). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20200616114741.GA298183@fuller.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-23KVM: X86: Fix MSR range of APIC registers in X2APIC modeXiaoyao Li1-2/+2
Only MSR address range 0x800 through 0x8ff is architecturally reserved and dedicated for accessing APIC registers in x2APIC mode. Fixes: 0105d1a52640 ("KVM: x2apic interface to lapic") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROLSean Christopherson1-18/+0
Remove support for context switching between the guest's and host's desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't required for correct functionality, e.g. KVM intercepts reads and writes to the MSR, and the latency effects of the settings controlled by the MSR are not architecturally visible. As a general rule, KVM should not allow the guest to control power management settings unless explicitly enabled by userspace, e.g. see KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2 can improve the performance of SMT siblings. A devious guest could disable C0.2 so as to improve the performance of their workloads at the detriment to workloads running in the host or on other VMs. Wholesale removal of UMWAIT_CONTROL context switching also fixes a race condition where updates from the host may cause KVM to enter the guest with the incorrect value. Because updates are are propagated to all CPUs via IPI (SMP function callback), the value in hardware may be stale with respect to the cached value and KVM could enter the guest with the wrong value in hardware. As above, the guest can't observe the bad value, but it's a weird and confusing wart in the implementation. Removal also fixes the unnecessary usage of VMX's atomic load/store MSR lists. Using the lists is only necessary for MSRs that are required for correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on old hardware, or for MSRs that need to-the-uop precision, e.g. perf related MSRs. For UMWAIT_CONTROL, the effects are only visible in the kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in vcpu_vmx_run(). Using the atomic lists is undesirable as they are more expensive than direct RDMSR/WRMSR. Furthermore, even if giving the guest control of the MSR is legitimate, e.g. in pass-through scenarios, it's not clear that the benefits would outweigh the overhead. E.g. saving and restoring an MSR across a VMX roundtrip costs ~250 cycles, and if the guest diverged from the host that cost would be paid on every run of the guest. In other words, if there is a legitimate use case then it should be enabled by a new per-VM capability. Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can correctly expose other WAITPKG features to the guest, e.g. TPAUSE, UMWAIT and UMONITOR. Fixes: 6e3ba4abcea56 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Cc: Jingqi Liu <jingqi.liu@intel.com> Cc: Tao Xu <tao3.xu@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: nVMX: Plumb L2 GPA through to PML emulationSean Christopherson4-9/+10
Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all intents and purposes is vmx_write_pml_buffer(), instead of having the latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS. If the dirty bit update is the result of KVM emulation (rare for L2), then the GPA in the VMCS may be stale and/or hold a completely unrelated GPA. Fixes: c5f983f6e8455 ("nVMX: Implement emulated Page Modification Logging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic()Vitaly Kuznetsov1-6/+3
translate_gpa() returns a GPA, assigning it to 'real_gfn' seems obviously wrong. There is no real issue because both 'gpa_t' and 'gfn_t' are u64 and we don't use the value in 'real_gfn' as a GFN, we do real_gfn = gpa_to_gfn(real_gfn); instead. 'If you see a "buffalo" sign on an elephant's cage, do not trust your eyes', but let's fix it for good. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200622151435.752560-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22KVM: LAPIC: ensure APIC map is up to date on concurrent update requestsPaolo Bonzini1-20/+31
The following race can cause lost map update events: cpu1 cpu2 apic_map_dirty = true ------------------------------------------------------------ kvm_recalculate_apic_map: pass check mutex_lock(&kvm->arch.apic_map_lock); if (!kvm->arch.apic_map_dirty) and in process of updating map ------------------------------------------------------------- other calls to apic_map_dirty = true might be too late for affected cpu ------------------------------------------------------------- apic_map_dirty = false ------------------------------------------------------------- kvm_recalculate_apic_map: bail out on if (!kvm->arch.apic_map_dirty) To fix it, record the beginning of an update of the APIC map in apic_map_dirty. If another APIC map change switches apic_map_dirty back to DIRTY during the update, kvm_recalculate_apic_map should not make it CLEAN, and the other caller will go through the slow path. Reported-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22kvm: lapic: fix broken vcpu hotplugIgor Mammedov1-0/+1
Guest fails to online hotplugged CPU with error smpboot: do_boot_cpu failed(-1) to wakeup CPU#4 It's caused by the fact that kvm_apic_set_state(), which used to call recalculate_apic_map() unconditionally and pulled hotplugged CPU into apic map, is updating map conditionally on state changes. In this case the APIC map is not considered dirty and the is not updated. Fix the issue by forcing unconditional update from kvm_apic_set_state(), like it used to be. Fixes: 4abaffce4d25a ("KVM: LAPIC: Recalculate apic map in batch") Cc: stable@vger.kernel.org Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200622160830.426022-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-19Revert "KVM: VMX: Micro-optimize vmexit time when not exposing PMU"Vitaly Kuznetsov1-2/+1
Guest crashes are observed on a Cascade Lake system when 'perf top' is launched on the host, e.g. BUG: unable to handle kernel paging request at fffffe0000073038 PGD 7ffa7067 P4D 7ffa7067 PUD 7ffa6067 PMD 7ffa5067 PTE ffffffffff120 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1 Comm: systemd Not tainted 4.18.0+ #380 ... Call Trace: serial8250_console_write+0xfe/0x1f0 call_console_drivers.constprop.0+0x9d/0x120 console_unlock+0x1ea/0x460 Call traces are different but the crash is imminent. The problem was blindly bisected to the commit 041bc42ce2d0 ("KVM: VMX: Micro-optimize vmexit time when not exposing PMU"). It was also confirmed that the issue goes away if PMU is exposed to the guest. With some instrumentation of the guest we can see what is being switched (when we do atomic_switch_perf_msrs()): vmx_vcpu_run: switching 2 msrs vmx_vcpu_run: switching MSR38f guest: 70000000d host: 70000000f vmx_vcpu_run: switching MSR3f1 guest: 0 host: 2 The current guess is that PEBS (MSR_IA32_PEBS_ENABLE, 0x3f1) is to blame. Regardless of whether PMU is exposed to the guest or not, PEBS needs to be disabled upon switch. This reverts commit 041bc42ce2d0efac3b85bbb81dea8c74b81f4ef9. Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200619094046.654019-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-15KVM: VMX: Add helpers to identify interrupt type from intr_infoSean Christopherson1-12/+20
Add is_intr_type() and is_intr_type_n() to consolidate the boilerplate code for querying a specific type of interrupt given an encoded value from VMCS.VM_{ENTER,EXIT}_INTR_INFO, with and without an associated vector respectively. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200609014518.26756-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-15kvm/svm: disable KCSAN for svm_vcpu_run()Qian Cai1-1/+1
For some reasons, running a simple qemu-kvm command with KCSAN will reset AMD hosts. It turns out svm_vcpu_run() could not be instrumented. Disable it for now. # /usr/libexec/qemu-kvm -name ubuntu-18.04-server-cloudimg -cpu host -smp 2 -m 2G -hda ubuntu-18.04-server-cloudimg.qcow2 === console output === Kernel 5.6.0-next-20200408+ on an x86_64 hp-dl385g10-05 login: <...host reset...> HPE ProLiant System BIOS A40 v1.20 (03/09/2018) (C) Copyright 1982-2018 Hewlett Packard Enterprise Development LP Early system initialization, please wait... Signed-off-by: Qian Cai <cai@lca.pw> Message-Id: <20200415153709.1559-1-cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-13Merge tag 'kbuild-v5.8-2' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: treewide: replace '---help---' in Kconfig files with 'help' kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables samples: binderfs: really compile this sample and fix build issues
2020-06-13Merge tag 'x86-entry-2020-06-12' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Thomas Gleixner: "The x86 entry, exception and interrupt code rework This all started about 6 month ago with the attempt to move the Posix CPU timer heavy lifting out of the timer interrupt code and just have lockless quick checks in that code path. Trivial 5 patches. This unearthed an inconsistency in the KVM handling of task work and the review requested to move all of this into generic code so other architectures can share. Valid request and solved with another 25 patches but those unearthed inconsistencies vs. RCU and instrumentation. Digging into this made it obvious that there are quite some inconsistencies vs. instrumentation in general. The int3 text poke handling in particular was completely unprotected and with the batched update of trace events even more likely to expose to endless int3 recursion. In parallel the RCU implications of instrumenting fragile entry code came up in several discussions. The conclusion of the x86 maintainer team was to go all the way and make the protection against any form of instrumentation of fragile and dangerous code pathes enforcable and verifiable by tooling. A first batch of preparatory work hit mainline with commit d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner") That (almost) full solution introduced a new code section '.noinstr.text' into which all code which needs to be protected from instrumentation of all sorts goes into. Any call into instrumentable code out of this section has to be annotated. objtool has support to validate this. Kprobes now excludes this section fully which also prevents BPF from fiddling with it and all 'noinstr' annotated functions also keep ftrace off. The section, kprobes and objtool changes are already merged. The major changes coming with this are: - Preparatory cleanups - Annotating of relevant functions to move them into the noinstr.text section or enforcing inlining by marking them __always_inline so the compiler cannot misplace or instrument them. - Splitting and simplifying the idtentry macro maze so that it is now clearly separated into simple exception entries and the more interesting ones which use interrupt stacks and have the paranoid handling vs. CR3 and GS. - Move quite some of the low level ASM functionality into C code: - enter_from and exit to user space handling. The ASM code now calls into C after doing the really necessary ASM handling and the return path goes back out without bells and whistels in ASM. - exception entry/exit got the equivivalent treatment - move all IRQ tracepoints from ASM to C so they can be placed as appropriate which is especially important for the int3 recursion issue. - Consolidate the declaration and definition of entry points between 32 and 64 bit. They share a common header and macros now. - Remove the extra device interrupt entry maze and just use the regular exception entry code. - All ASM entry points except NMI are now generated from the shared header file and the corresponding macros in the 32 and 64 bit entry ASM. - The C code entry points are consolidated as well with the help of DEFINE_IDTENTRY*() macros. This allows to ensure at one central point that all corresponding entry points share the same semantics. The actual function body for most entry points is in an instrumentable and sane state. There are special macros for the more sensitive entry points, e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF. They allow to put the whole entry instrumentation and RCU handling into safe places instead of the previous pray that it is correct approach. - The INT3 text poke handling is now completely isolated and the recursion issue banned. Aside of the entry rework this required other isolation work, e.g. the ability to force inline bsearch. - Prevent #DB on fragile entry code, entry relevant memory and disable it on NMI, #MC entry, which allowed to get rid of the nested #DB IST stack shifting hackery. - A few other cleanups and enhancements which have been made possible through this and already merged changes, e.g. consolidating and further restricting the IDT code so the IDT table becomes RO after init which removes yet another popular attack vector - About 680 lines of ASM maze are gone. There are a few open issues: - An escape out of the noinstr section in the MCE handler which needs some more thought but under the aspect that MCE is a complete trainwreck by design and the propability to survive it is low, this was not high on the priority list. - Paravirtualization When PV is enabled then objtool complains about a bunch of indirect calls out of the noinstr section. There are a few straight forward ways to fix this, but the other issues vs. general correctness were more pressing than parawitz. - KVM KVM is inconsistent as well. Patches have been posted, but they have not yet been commented on or picked up by the KVM folks. - IDLE Pretty much the same problems can be found in the low level idle code especially the parts where RCU stopped watching. This was beyond the scope of the more obvious and exposable problems and is on the todo list. The lesson learned from this brain melting exercise to morph the evolved code base into something which can be validated and understood is that once again the violation of the most important engineering principle "correctness first" has caused quite a few people to spend valuable time on problems which could have been avoided in the first place. The "features first" tinkering mindset really has to stop. With that I want to say thanks to everyone involved in contributing to this effort. Special thanks go to the following people (alphabetical order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra, Vitaly Kuznetsov, and Will Deacon" * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits) x86/entry: Force rcu_irq_enter() when in idle task x86/entry: Make NMI use IDTENTRY_RAW x86/entry: Treat BUG/WARN as NMI-like entries x86/entry: Unbreak __irqentry_text_start/end magic x86/entry: __always_inline CR2 for noinstr lockdep: __always_inline more for noinstr x86/entry: Re-order #DB handler to avoid *SAN instrumentation x86/entry: __always_inline arch_atomic_* for noinstr x86/entry: __always_inline irqflags for noinstr x86/entry: __always_inline debugreg for noinstr x86/idt: Consolidate idt functionality x86/idt: Cleanup trap_init() x86/idt: Use proper constants for table size x86/idt: Add comments about early #PF handling x86/idt: Mark init only functions __init x86/entry: Rename trace_hardirqs_off_prepare() x86/entry: Clarify irq_{enter,exit}_rcu() x86/entry: Remove DBn stacks x86/entry: Remove debug IDT frobbing x86/entry: Optimize local_db_save() for virt ...
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada1-6/+6
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds11-150/+169
Pull more KVM updates from Paolo Bonzini: "The guest side of the asynchronous page fault work has been delayed to 5.9 in order to sync with Thomas's interrupt entry rework, but here's the rest of the KVM updates for this merge window. MIPS: - Loongson port PPC: - Fixes ARM: - Fixes x86: - KVM_SET_USER_MEMORY_REGION optimizations - Fixes - Selftest fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits) KVM: x86: do not pass poisoned hva to __kvm_set_memory_region KVM: selftests: fix sync_with_host() in smm_test KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected KVM: async_pf: Cleanup kvm_setup_async_pf() kvm: i8254: remove redundant assignment to pointer s KVM: x86: respect singlestep when emulating instruction KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check KVM: nVMX: Consult only the "basic" exit reason when routing nested exit KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts KVM: arm64: Remove host_cpu_context member from vcpu structure KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr KVM: arm64: Handle PtrAuth traps early KVM: x86: Unexport x86_fpu_cache and make it static KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K KVM: arm64: Save the host's PtrAuth keys in non-preemptible context KVM: arm64: Stop save/restoring ACTLR_EL1 KVM: arm64: Add emulation for 32bit guests accessing ACTLR2 ...
2020-06-11KVM: x86: do not pass poisoned hva to __kvm_set_memory_regionPaolo Bonzini1-6/+1
__kvm_set_memory_region does not use the hva at all, so trying to catch use-after-delete is pointless and, worse, it fails access_ok now that we apply it to all memslots including private kernel ones. This fixes an AVIC regression. Fixes: 09d952c971a5 ("KVM: check userspace_addr for all memslots") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11KVM: async_pf: Inject 'page ready' event only if 'page not present' was ↵Vitaly Kuznetsov1-2/+5
previously injected 'Page not present' event may or may not get injected depending on guest's state. If the event wasn't injected, there is no need to inject the corresponding 'page ready' event as the guest may get confused. E.g. Linux thinks that the corresponding 'page not present' event wasn't delivered *yet* and allocates a 'dummy entry' for it. This entry is never freed. Note, 'wakeup all' events have no corresponding 'page not present' event and always get injected. s390 seems to always be able to inject 'page not present', the change is effectively a nop. Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200610175532.779793-2-vkuznets@redhat.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11kvm: i8254: remove redundant assignment to pointer sColin Ian King1-1/+0
The pointer s is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Message-Id: <20200609233121.1118683-1-colin.king@canonical.com> Fixes: 7837699fa6d7 ("KVM: In kernel PIT model") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11KVM: x86: respect singlestep when emulating instructionFelipe Franciosi1-1/+1
When userspace configures KVM_GUESTDBG_SINGLESTEP, KVM will manage the presence of X86_EFLAGS_TF via kvm_set/get_rflags on vcpus. The actual rflag bit is therefore hidden from callers. That includes init_emulate_ctxt() which uses the value returned from kvm_get_flags() to set ctxt->tf. As a result, x86_emulate_instruction() will skip a single step, leaving singlestep_rip stale and not returning to userspace. This resolves the issue by observing the vcpu guest_debug configuration alongside ctxt->tf in x86_emulate_instruction(), performing the single step if set. Cc: stable@vger.kernel.org Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Message-Id: <20200519081048.8204-1-felipe@nutanix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11Merge branch 'kvm-basic-exit-reason' into HEADPaolo Bonzini1-2/+2
Using a topic branch so that stable branches can simply cherry-pick the patch. Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11KVM: nVMX: Consult only the "basic" exit reason when routing nested exitSean Christopherson1-1/+1
Consult only the basic exit reason, i.e. bits 15:0 of vmcs.EXIT_REASON, when determining whether a nested VM-Exit should be reflected into L1 or handled by KVM in L0. For better or worse, the switch statement in nested_vmx_exit_reflected() currently defaults to "true", i.e. reflects any nested VM-Exit without dedicated logic. Because the case statements only contain the basic exit reason, any VM-Exit with modifier bits set will be reflected to L1, even if KVM intended to handle it in L0. Practically speaking, this only affects EXIT_REASON_MCE_DURING_VMENTRY, i.e. a #MC that occurs on nested VM-Enter would be incorrectly routed to L1, as "failed VM-Entry" is the only modifier that KVM can currently encounter. The SMM modifiers will never be generated as KVM doesn't support/employ a SMI Transfer Monitor. Ditto for "exit from enclave", as KVM doesn't yet support virtualizing SGX, i.e. it's impossible to enter an enclave in a KVM guest (L1 or L2). Fixes: 644d711aa0e1 ("KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit") Cc: Jim Mattson <jmattson@google.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200227174430.26371-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-11x86/entry: Optimize local_db_save() for virtPeter Zijlstra1-1/+1
Because DRn access is 'difficult' with virt; but the DR7 read is cheaper than a cacheline miss on native, add a virt specific fast path to local_db_save(), such that when breakpoints are not in use to avoid touching DRn entirely. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200529213321.187833200@infradead.org
2020-06-11x86/entry: Convert Machine Check to IDTENTRY_ISTThomas Gleixner2-2/+2
Convert #MC to IDTENTRY_MCE: - Implement the C entry points with DEFINE_IDTENTRY_MCE - Emit the ASM stub with DECLARE_IDTENTRY_MCE - Remove the ASM idtentry in 64bit - Remove the open coded ASM entry code in 32bit - Fixup the XEN/PV code - Remove the old prototypes - Remove the error code from *machine_check_vector() as it is always 0 and not used by any of the functions it can point to. Fixup all the functions as well. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135314.334980426@linutronix.de
2020-06-10Merge branch 'uaccess.misc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc uaccess updates from Al Viro: "Assorted uaccess patches for this cycle - the stuff that didn't fit into thematic series" * 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bpf: make bpf_check_uarg_tail_zero() use check_zeroed_user() x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user() user_regset_copyout_zero(): use clear_user() TEST_ACCESS_OK _never_ had been checked anywhere x86: switch cp_stat64() to unsafe_put_user() binfmt_flat: don't use __put_user() binfmt_elf_fdpic: don't use __... uaccess primitives binfmt_elf: don't bother with __{put,copy_to}_user() pselect6() and friends: take handling the combined 6th/7th args into helper
2020-06-09mmap locking API: convert mmap_sem call sites missed by coccinelleMichel Lespinasse1-4/+4
Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files ?) [akpm@linux-foundation.org: convert linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09KVM: x86: Unexport x86_fpu_cache and make it staticSean Christopherson1-2/+1
Make x86_fpu_cache static now that FPU allocation and destruction is handled entirely by common x86 code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200608180218.20946-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08KVM: x86: Fix APIC page invalidation raceEiichi Tsukata1-5/+2
Commit b1394e745b94 ("KVM: x86: fix APIC page invalidation") tried to fix inappropriate APIC page invalidation by re-introducing arch specific kvm_arch_mmu_notifier_invalidate_range() and calling it from kvm_mmu_notifier_invalidate_range_start. However, the patch left a possible race where the VMCS APIC address cache is updated *before* it is unmapped: (Invalidator) kvm_mmu_notifier_invalidate_range_start() (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD) (KVM VCPU) vcpu_enter_guest() (KVM VCPU) kvm_vcpu_reload_apic_access_page() (Invalidator) actually unmap page Because of the above race, there can be a mismatch between the host physical address stored in the APIC_ACCESS_PAGE VMCS field and the host physical address stored in the EPT entry for the APIC GPA (0xfee0000). When this happens, the processor will not trap APIC accesses, and will instead show the raw contents of the APIC-access page. Because Windows OS periodically checks for unexpected modifications to the LAPIC register, this will show up as a BSOD crash with BugCheck CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in https://bugzilla.redhat.com/show_bug.cgi?id=1751017. The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range() cannot guarantee that no additional references are taken to the pages in the range before kvm_mmu_notifier_invalidate_range_end(). Fortunately, this case is supported by the MMU notifier API, as documented in include/linux/mmu_notifier.h: * If the subsystem * can't guarantee that no additional references are taken to * the pages in the range, it has to implement the * invalidate_range() notifier to remove any references taken * after invalidate_range_start(). The fix therefore is to reload the APIC-access page field in the VMCS from kvm_mmu_notifier_invalidate_range() instead of ..._range_start(). Cc: stable@vger.kernel.org Fixes: b1394e745b94 ("KVM: x86: fix APIC page invalidation") Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951 Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08KVM: SVM: fix calls to is_interceptPaolo Bonzini2-2/+4
is_intercept takes an INTERCEPT_* constant, not SVM_EXIT_*; because of this, the compiler was removing the body of the conditionals, as if is_intercept returned 0. This unveils a latent bug: when clearing the VINTR intercept, int_ctl must also be changed in the L1 VMCB (svm->nested.hsave), just like the intercept itself is also changed in the L1 VMCB. Otherwise V_IRQ remains set and, due to the VINTR intercept being clear, we get a spurious injection of a vector 0 interrupt on the next L2->L1 vmexit. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08Revert "KVM: x86: work around leak of uninitialized stack contents"Vitaly Kuznetsov1-7/+0
handle_vmptrst()/handle_vmread() stopped injecting #PF unconditionally and switched to nested_vmx_handle_memory_failure() which just kills the guest with KVM_EXIT_INTERNAL_ERROR in case of MMIO access, zeroing 'exception' in kvm_write_guest_virt_system() is not needed anymore. This reverts commit 541ab2aeb28251bf7135c7961f3a6080eebcc705. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200605115906.532682-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08KVM: VMX: Properly handle kvm_read/write_guest_virt*() resultVitaly Kuznetsov3-40/+74
Syzbot reports the following issue: WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618 kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618 ... Call Trace: ... RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618 ... nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638 handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline] handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728 vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067 'exception' we're trying to inject with kvm_inject_emulated_page_fault() comes from: nested_vmx_get_vmptr() kvm_read_guest_virt() kvm_read_guest_virt_helper() vcpu->arch.walk_mmu->gva_to_gpa() but it is only set when GVA to GPA conversion fails. In case it doesn't but we still fail kvm_vcpu_read_guest_page(), X86EMUL_IO_NEEDED is returned and nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault() with zeroed 'exception'. This happen when the argument is MMIO. Paolo also noticed that nested_vmx_get_vmptr() is not the only place in KVM code where kvm_read/write_guest_virt*() return result is mishandled. VMX instructions along with INVPCID have the same issue. This was already noticed before, e.g. see commit 541ab2aeb282 ("KVM: x86: work around leak of uninitialized stack contents") but was never fully fixed. KVM could've handled the request correctly by going to userspace and performing I/O but there doesn't seem to be a good need for such requests in the first place. Introduce vmx_handle_memory_failure() as an interim solution. Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF, KVM_EXIT_INTERNAL_ERROR and callers need to know if userspace exit is needed (for KVM_EXIT_INTERNAL_ERROR) in case of failure. We don't seem to have a good enum describing this tristate, just add "int *ret" to nested_vmx_get_vmptr() interface to pass the information. Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200605115906.532682-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-05KVM: x86: emulate reserved nops from 0f/18 to 0f/1fPaolo Bonzini1-2/+6
Instructions starting with 0f18 up to 0f1f are reserved nops, except those that were assigned to MPX. These include the endbr markers used by CET. List them correctly in the opcode table. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: x86: minor code refactor and comments fixup around dirty loggingAnthony Yznaga1-55/+49
Consolidate the code and correct the comments to show that the actions taken to update existing mappings to disable or enable dirty logging are not necessary when creating, moving, or deleting a memslot. Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Message-Id: <1591128450-11977-4-git-send-email-anthony.yznaga@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: x86: avoid unnecessary rmap walks when creating/moving slotsAnthony Yznaga1-1/+1
On large memory guests it has been observed that creating a memslot for a very large range can take noticeable amount of time. Investigation showed that the time is spent walking the rmaps to update existing sptes to remove write access or set/clear dirty bits to support dirty logging. These rmap walks are unnecessary when creating or moving a memslot. A newly created memslot will not have any existing mappings, and the existing mappings of a moved memslot will have been invalidated and flushed. Any mappings established once the new/moved memslot becomes visible will be set using the properties of the new slot. Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Message-Id: <1591128450-11977-3-git-send-email-anthony.yznaga@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: x86: remove unnecessary rmap walk of read-only memslotsAnthony Yznaga1-4/+2
There's no write access to remove. An existing memslot cannot be updated to set or clear KVM_MEM_READONLY, and any mappings established in a newly created or moved read-only memslot will already be read-only. Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Message-Id: <1591128450-11977-2-git-send-email-anthony.yznaga@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: Use vmemdup_user()Denis Efremov1-10/+7
Replace opencoded alloc and copy with vmemdup_user(). Signed-off-by: Denis Efremov <efremov@linux.com> Message-Id: <20200603101131.2107303-1-efremov@linux.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: x86: Move MPK feature detection to common codeBabu Moger2-5/+8
Both Intel and AMD support (MPK) Memory Protection Key feature. Move the feature detection from VMX to the common code. It should work for both the platforms now. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <158932795627.44260.15144185478040178638.stgit@naples-babu.amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: x86: Assign correct value to array.maxnentXiaoyao Li1-2/+3
Delay the assignment of array.maxnent to use correct value for the case cpuid->nent > KVM_MAX_CPUID_ENTRIES. Fixes: e53c95e8d41e ("KVM: x86: Encapsulate CPUID entries and metadata in struct") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200604041636.1187-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: VMX: Always treat MSR_IA32_PERF_CAPABILITIES as a valid PMU MSRSean Christopherson1-1/+1
Unconditionally return true when querying the validity of MSR_IA32_PERF_CAPABILITIES so as to defer the validity check to intel_pmu_{get,set}_msr(), which can properly give the MSR a pass when the access is initiated from host userspace. The MSR is emulated so there is no underlying hardware dependency to worry about. Fixes: 27461da31089a ("KVM: x86/pmu: Support full width counting") Cc: Like Xu <like.xu@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200603203303.28545-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directoriesPaolo Bonzini1-5/+5
After commit 63d0434 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") we are creating the pre-vCPU debugfs files after the creation of the vCPU file descriptor. This makes it possible for userspace to reach kvm_vcpu_release before kvm_create_vcpu_debugfs has finished. The vcpu->debugfs_dentry then does not have any associated inode anymore, and this causes a NULL-pointer dereference in debugfs_create_file. The solution is simply to avoid removing the files; they are cleaned up when the VM file descriptor is closed (and that must be after KVM_CREATE_VCPU returns). We can stop storing the dentry in struct kvm_vcpu too, because it is not needed anywhere after kvm_create_vcpu_debugfs returns. Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com Fixes: 63d04348371b ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds37-1575/+2716
Pull kvm updates from Paolo Bonzini: "ARM: - Move the arch-specific code into arch/arm64/kvm - Start the post-32bit cleanup - Cherry-pick a few non-invasive pre-NV patches x86: - Rework of TLB flushing - Rework of event injection, especially with respect to nested virtualization - Nested AMD event injection facelift, building on the rework of generic code and fixing a lot of corner cases - Nested AMD live migration support - Optimization for TSC deadline MSR writes and IPIs - Various cleanups - Asynchronous page fault cleanups (from tglx, common topic branch with tip tree) - Interrupt-based delivery of asynchronous "page ready" events (host side) - Hyper-V MSRs and hypercalls for guest debugging - VMX preemption timer fixes s390: - Cleanups Generic: - switch vCPU thread wakeup from swait to rcuwait The other architectures, and the guest side of the asynchronous page fault work, will come next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits) KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test KVM: check userspace_addr for all memslots KVM: selftests: update hyperv_cpuid with SynDBG tests x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls x86/kvm/hyper-v: enable hypercalls regardless of hypercall page x86/kvm/hyper-v: Add support for synthetic debugger interface x86/hyper-v: Add synthetic debugger definitions KVM: selftests: VMX preemption timer migration test KVM: nVMX: Fix VMX preemption timer migration x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit KVM: x86/pmu: Support full width counting KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT KVM: x86: acknowledgment mechanism for async pf page ready notifications KVM: x86: interrupt based APF 'page ready' event delivery KVM: introduce kvm_read_guest_offset_cached() KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" KVM: VMX: Replace zero-length array with flexible-array ...
2020-06-03Merge tag 'hyperv-next-signed' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyper-v updates from Wei Liu: - a series from Andrea to support channel reassignment - a series from Vitaly to clean up Vmbus message handling - a series from Michael to clean up and augment hyperv-tlfs.h - patches from Andy to clean up GUID usage in Hyper-V code - a few other misc patches * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (29 commits) Drivers: hv: vmbus: Resolve more races involving init_vp_index() Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug vmbus: Replace zero-length array with flexible-array Driver: hv: vmbus: drop a no long applicable comment hyper-v: Switch to use UUID types directly hyper-v: Replace open-coded variant of %*phN specifier hyper-v: Supply GUID pointer to printf() like functions hyper-v: Use UUID API for exporting the GUID (part 2) asm-generic/hyperv: Add definitions for Get/SetVpRegister hypercalls x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent files x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #defines KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page drivers: hv: remove redundant assignment to pointer primary_channel scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal hv_utils: Always execute the fcopy and vss callbacks in a tasklet ...
2020-06-03x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user()Al Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+1
Merge updates from Andrew Morton: "A few little subsystems and a start of a lot of MM patches. Subsystems affected by this patch series: squashfs, ocfs2, parisc, vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup, swap, memcg, pagemap, memory-failure, vmalloc, kasan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits) kasan: move kasan_report() into report.c mm/mm_init.c: report kasan-tag information stored in page->flags ubsan: entirely disable alignment checks under UBSAN_TRAP kasan: fix clang compilation warning due to stack protector x86/mm: remove vmalloc faulting mm: remove vmalloc_sync_(un)mappings() x86/mm/32: implement arch_sync_kernel_mappings() x86/mm/64: implement arch_sync_kernel_mappings() mm/ioremap: track which page-table levels were modified mm/vmalloc: track which page-table levels were modified mm: add functions to track page directory modifications s390: use __vmalloc_node in stack_alloc powerpc: use __vmalloc_node in alloc_vm_stack arm64: use __vmalloc_node in arch_alloc_vmap_stack mm: remove vmalloc_user_node_flags mm: switch the test_vmalloc module to use __vmalloc_node mm: remove __vmalloc_node_flags_caller mm: remove both instances of __vmalloc_node_flags mm: remove the prot argument to __vmalloc_node mm: remove the pgprot argument to __vmalloc ...
2020-06-02mm: remove the pgprot argument to __vmallocChristoph Hellwig1-2/+1
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-01Merge tag 'docs-5.8' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...
2020-06-01x86/kvm/hyper-v: Add support for synthetic debugger via hypercallsJon Doron1-0/+28
There is another mode for the synthetic debugger which uses hypercalls to send/recv network data instead of the MSR interface. This interface is much slower and less recommended since you might get a lot of VMExits while KDVM polling for new packets to recv, rather than simply checking the pending page to see if there is data avialble and then request. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200529134543.1127440-6-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01x86/kvm/hyper-v: enable hypercalls regardless of hypercall pageJon Doron1-1/+1
Microsoft's kdvm.dll dbgtransport module does not respect the hypercall page and simply identifies the CPU being used (AMD/Intel) and according to it simply makes hypercalls with the relevant instruction (vmmcall/vmcall respectively). The relevant function in kdvm is KdHvConnectHypervisor which first checks if the hypercall page has been enabled via HV_X64_MSR_HYPERCALL_ENABLE, and in case it was not it simply sets the HV_X64_MSR_GUEST_OS_ID to 0x1000101010001 which means: build_number = 0x0001 service_version = 0x01 minor_version = 0x01 major_version = 0x01 os_id = 0x00 (Undefined) vendor_id = 1 (Microsoft) os_type = 0 (A value of 0 indicates a proprietary, closed source OS) and starts issuing the hypercall without setting the hypercall page. To resolve this issue simply enable hypercalls also if the guest_os_id is not 0. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200529134543.1127440-5-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01x86/kvm/hyper-v: Add support for synthetic debugger interfaceJon Doron4-3/+219
Add support for Hyper-V synthetic debugger (syndbg) interface. The syndbg interface is using MSRs to emulate a way to send/recv packets data. The debug transport dll (kdvm/kdnet) will identify if Hyper-V is enabled and if it supports the synthetic debugger interface it will attempt to use it, instead of trying to initialize a network adapter. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200529134543.1127440-4-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01x86/hyper-v: Add synthetic debugger definitionsJon Doron1-0/+27
Hyper-V synthetic debugger has two modes, one that uses MSRs and the other that use Hypercalls. Add all the required definitions to both types of synthetic debugger interface. Some of the required new CPUIDs and MSRs are not documented in the TLFS so they are in hyperv.h instead. The reason they are not documented is because they are subjected to be removed in future versions of Windows. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Jon Doron <arilou@gmail.com> Message-Id: <20200529134543.1127440-3-arilou@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: selftests: VMX preemption timer migration testMakarand Sonare1-8/+4
When a nested VM with a VMX-preemption timer is migrated, verify that the nested VM and its parent VM observe the VMX-preemption timer exit close to the original expiration deadline. Signed-off-by: Makarand Sonare <makarandsonare@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200526215107.205814-3-makarandsonare@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>