summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2021-01-20Merge tag 'for-linus-5.11-rc5-tag' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A fix for build failure showing up in some configurations" * tag 'for-linus-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix 'nopvspin' build error
2021-01-19Merge tag 'hyperv-fixes-signed-20210119' of ↵Linus Torvalds1-3/+26
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fix from Wei Liu: "One patch from Dexuan to fix clockevent initialization" * tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Initialize clockevents after LAPIC is initialized
2021-01-18x86/xen: fix 'nopvspin' build errorRandy Dunlap1-0/+2
Fix build error in x86/xen/ when PARAVIRT_SPINLOCKS is not enabled. Fixes this build error: ../arch/x86/xen/smp_hvm.c: In function ‘xen_hvm_smp_init’: ../arch/x86/xen/smp_hvm.c:77:3: error: ‘nopvspin’ undeclared (first use in this function) nopvspin = true; Fixes: 3d7746bea925 ("x86/xen: Fix xen_hvm_smp_init() when vector callback not available") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210115191123.27572-1-rdunlap@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-01-17x86/hyperv: Initialize clockevents after LAPIC is initializedDexuan Cui1-3/+26
With commit 4df4cb9e99f8, the Hyper-V direct-mode STIMER is actually initialized before LAPIC is initialized: see apic_intr_mode_init() x86_platform.apic_post_init() hyperv_init() hv_stimer_alloc() apic_bsp_setup() setup_local_APIC() setup_local_APIC() temporarily disables LAPIC, initializes it and re-eanble it. The direct-mode STIMER depends on LAPIC, and when it's registered, it can be programmed immediately and the timer can fire very soon: hv_stimer_init clockevents_config_and_register clockevents_register_device tick_check_new_device tick_setup_device tick_setup_periodic(), tick_setup_oneshot() clockevents_program_event When the timer fires in the hypervisor, if the LAPIC is in the disabled state, new versions of Hyper-V ignore the event and don't inject the timer interrupt into the VM, and hence the VM hangs when it boots. Note: when the VM starts/reboots, the LAPIC is pre-enabled by the firmware, so the window of LAPIC being temporarily disabled is pretty small, and the issue can only happen once out of 100~200 reboots for a 40-vCPU VM on one dev host, and on another host the issue doesn't reproduce after 2000 reboots. The issue is more noticeable for kdump/kexec, because the LAPIC is disabled by the first kernel, and stays disabled until the kdump/kexec kernel enables it. This is especially an issue to a Generation-2 VM (for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000 (rather than CONFIG_HZ=250) is used. Fix the issue by moving hv_stimer_alloc() to a later place where the LAPIC timer is initialized. Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-01-15Merge tag 'for-linus-5.11-rc4-tag' of ↵Linus Torvalds2-13/+29
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A series to fix a regression when running as a fully virtualized guest on an old Xen hypervisor not supporting PV interrupt callbacks for HVM guests. - A patch to add support to query Xen resource sizes (setting was possible already) from user mode. * tag 'for-linus-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Fix xen_hvm_smp_init() when vector callback not available x86/xen: Don't register Xen IPIs when they aren't going to be used x86/xen: Add xen_no_vector_callback option to test PCI INTX delivery xen: Set platform PCI device INTX affinity to CPU0 xen: Fix event channel callback via INTX/GSI xen/privcmd: allow fetching resource sizes
2021-01-13x86/xen: Fix xen_hvm_smp_init() when vector callback not availableDavid Woodhouse1-10/+17
Only the IPI-related functions in the smp_ops should be conditional on the vector callback being available. The rest should still happen: • xen_hvm_smp_prepare_boot_cpu() This function does two things, both of which should still happen if there is no vector callback support. The call to xen_vcpu_setup() for vCPU0 should still happen as it just sets up the vcpu_info for CPU0. That does happen for the secondary vCPUs too, from xen_cpu_up_prepare_hvm(). The second thing it does is call xen_init_spinlocks(), which perhaps counter-intuitively should *also* still be happening in the case without vector callbacks, so that it can clear its local xen_pvspin flag and disable the virt_spin_lock_key accordingly. Checking xen_have_vector_callback in xen_init_spinlocks() itself would affect PV guests, so set the global nopvspin flag in xen_hvm_smp_init() instead, when vector callbacks aren't available. • xen_hvm_smp_prepare_cpus() This does some IPI-related setup by calling xen_smp_intr_init() and xen_init_lock_cpu(), which can be made conditional. And it sets the xen_vcpu_id to XEN_VCPU_ID_INVALID for all possible CPUS, which does need to happen. • xen_smp_cpus_done() This offlines any vCPUs which doesn't fit in the global shared_info page, if separate vcpu_info placement isn't available. That part also needs to happen regardless of vector callback support. • xen_hvm_cpu_die() This doesn't actually do anything other than commin_cpu_die() right right now in the !vector_callback case; all three teardown functions it calls should be no-ops. But to guard against future regressions it's useful to call it anyway, and for it to explicitly check for xen_have_vector_callback before calling those additional functions. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210106153958.584169-6-dwmw2@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-01-13x86/xen: Don't register Xen IPIs when they aren't going to be usedDavid Woodhouse1-2/+2
In the case where xen_have_vector_callback is false, we still register the IPI vectors in xen_smp_intr_init() for the secondary CPUs even though they aren't going to be used. Stop doing that. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210106153958.584169-5-dwmw2@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-01-13x86/xen: Add xen_no_vector_callback option to test PCI INTX deliveryDavid Woodhouse1-1/+10
It's useful to be able to test non-vector event channel delivery, to make sure Linux will work properly on older Xen which doesn't have it. It's also useful for those working on Xen and Xen-compatible hypervisors, because there are guest kernels still in active use which use PCI INTX even when vector delivery is available. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210106153958.584169-4-dwmw2@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-01-11Merge tag 'hyperv-fixes-signed-20210111' of ↵Linus Torvalds4-3/+33
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - fix kexec panic/hang (Dexuan Cui) - fix occasional crashes when flushing TLB (Wei Liu) * tag 'hyperv-fixes-signed-20210111' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: check cpu mask after interrupt has been disabled x86/hyperv: Fix kexec panic/hang issues
2021-01-10Merge tag 'x86_urgent_for_v5.11_rc3' of ↵Linus Torvalds5-73/+57
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "As expected, fixes started trickling in after the holidays so here is the accumulated pile of x86 fixes for 5.11: - A fix for fanotify_mark() missing the conversion of x86_32 native syscalls which take 64-bit arguments to the compat handlers due to former having a general compat handler. (Brian Gerst) - Add a forgotten pmd page destructor call to pud_free_pmd_page() where a pmd page is freed. (Dan Williams) - Make IN/OUT insns with an u8 immediate port operand handling for SEV-ES guests more precise by using only the single port byte and not the whole s32 value of the insn decoder. (Peter Gonda) - Correct a straddling end range check before returning the proper MTRR type, when the end address is the same as top of memory. (Ying-Tsun Huang) - Change PQR_ASSOC MSR update scheme when moving a task to a resctrl resource group to avoid significant performance overhead with some resctrl workloads. (Fenghua Yu) - Avoid the actual task move overhead when the task is already in the resource group. (Fenghua Yu)" * tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Don't move a task to the same resource group x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR x86/mtrr: Correct the range check before performing MTRR type lookups x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling x86/mm: Fix leak of pmd ptlock fanotify: Fix sys_fanotify_mark() on native x86-32
2021-01-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds13-97/+178
Pull kvm fixes from Paolo Bonzini: "x86: - Fixes for the new scalable MMU - Fixes for migration of nested hypervisors on AMD - Fix for clang integrated assembler - Fix for left shift by 64 (UBSAN) - Small cleanups - Straggler SEV-ES patch ARM: - VM init cleanups - PSCI relay cleanups - Kill CONFIG_KVM_ARM_PMU - Fixup __init annotations - Fixup reg_to_encoding() - Fix spurious PMCR_EL0 access Misc: - selftests cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (38 commits) KVM: x86: __kvm_vcpu_halt can be static KVM: SVM: Add support for booting APs in an SEV-ES guest KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode KVM: nSVM: correctly restore nested_run_pending on migration KVM: x86/mmu: Clarify TDP MMU page list invariants KVM: x86/mmu: Ensure TDP MMU roots are freed after yield kvm: check tlbs_dirty directly KVM: x86: change in pv_eoi_get_pending() to make code more readable MAINTAINERS: Really update email address for Sean Christopherson KVM: x86: fix shift out of bounds reported by UBSAN KVM: selftests: Implement perf_test_util more conventionally KVM: selftests: Use vm_create_with_vcpus in create_vm KVM: selftests: Factor out guest mode code KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte() KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() ...
2021-01-08KVM: x86: __kvm_vcpu_halt can be staticPaolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-08x86/resctrl: Don't move a task to the same resource groupFenghua Yu1-0/+7
Shakeel Butt reported in [1] that a user can request a task to be moved to a resource group even if the task is already in the group. It just wastes time to do the move operation which could be costly to send IPI to a different CPU. Add a sanity check to ensure that the move operation only happens when the task is not already in the resource group. [1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/ Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files") Reported-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com
2021-01-08x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSRFenghua Yu1-69/+43
Currently, when moving a task to a resource group the PQR_ASSOC MSR is updated with the new closid and rmid in an added task callback. If the task is running, the work is run as soon as possible. If the task is not running, the work is executed later in the kernel exit path when the kernel returns to the task again. Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task is running is the right thing to do. Queueing work for a task that is not running is unnecessary (the PQR_ASSOC MSR is already updated when the task is scheduled in) and causing system resource waste with the way in which it is implemented: Work to update the PQR_ASSOC register is queued every time the user writes a task id to the "tasks" file, even if the task already belongs to the resource group. This could result in multiple pending work items associated with a single task even if they are all identical and even though only a single update with most recent values is needed. Specifically, even if a task is moved between different resource groups while it is sleeping then it is only the last move that is relevant but yet a work item is queued during each move. This unnecessary queueing of work items could result in significant system resource waste, especially on tasks sleeping for a long time. For example, as demonstrated by Shakeel Butt in [1] writing the same task id to the "tasks" file can quickly consume significant memory. The same problem (wasted system resources) occurs when moving a task between different resource groups. As pointed out by Valentin Schneider in [2] there is an additional issue with the way in which the queueing of work is done in that the task_struct update is currently done after the work is queued, resulting in a race with the register update possibly done before the data needed by the update is available. To solve these issues, update the PQR_ASSOC MSR in a synchronous way right after the new closid and rmid are ready during the task movement, only if the task is running. If a moved task is not running nothing is done since the PQR_ASSOC MSR will be updated next time the task is scheduled. This is the same way used to update the register when tasks are moved as part of resource group removal. [1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/ [2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com [ bp: Massage commit message and drop the two update_task_closid_rmid() variants. ] Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files") Reported-by: Shakeel Butt <shakeelb@google.com> Reported-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
2021-01-07KVM: SVM: Add support for booting APs in an SEV-ES guestTom Lendacky7-6/+61
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence, where the guest vCPU register state is updated and then the vCPU is VMRUN to begin execution of the AP. For an SEV-ES guest, this won't work because the guest register state is encrypted. Following the GHCB specification, the hypervisor must not alter the guest register state, so KVM must track an AP/vCPU boot. Should the guest want to park the AP, it must use the AP Reset Hold exit event in place of, for example, a HLT loop. First AP boot (first INIT-SIPI-SIPI sequence): Execute the AP (vCPU) as it was initialized and measured by the SEV-ES support. It is up to the guest to transfer control of the AP to the proper location. Subsequent AP boot: KVM will expect to receive an AP Reset Hold exit event indicating that the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to awaken it. When the AP Reset Hold exit event is received, KVM will place the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI sequence, KVM will make the vCPU runnable. It is again up to the guest to then transfer control of the AP to the proper location. To differentiate between an actual HLT and an AP Reset Hold, a new MP state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is placed in upon receiving the AP Reset Hold exit event. Additionally, to communicate the AP Reset Hold exit event up to userspace (if needed), a new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD. A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds a new function that, for non SEV-ES guests, invokes the original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests, implements the logic above. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexitMaxim Levitsky3-1/+8
It is possible to exit the nested guest mode, entered by svm_set_nested_state prior to first vm entry to it (e.g due to pending event) if the nested run was not pending during the migration. In this case we must not switch to the nested msr permission bitmap. Also add a warning to catch similar cases in the future. Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first nested VM run") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest modeMaxim Levitsky1-0/+1
We overwrite most of vmcb fields while doing so, so we must mark it as dirty. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: nSVM: correctly restore nested_run_pending on migrationMaxim Levitsky1-0/+4
The code to store it on the migration exists, but no code was restoring it. One of the side effects of fixing this is that L1->L2 injected events are no longer lost when migration happens with nested run pending. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: x86/mmu: Clarify TDP MMU page list invariantsBen Gardon1-2/+14
The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any pages with a non-zero root_count and tdp_mmu_roots should only contain pages with a positive root_count, unless a thread holds the MMU lock and is in the process of modifying the list. Various functions expect these invariants to be maintained, but they are not explictily documented. Add to the comments on both fields to document the above invariants. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210107001935.3732070-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: x86/mmu: Ensure TDP MMU roots are freed after yieldBen Gardon1-56/+48
Many TDP MMU functions which need to perform some action on all TDP MMU roots hold a reference on that root so that they can safely drop the MMU lock in order to yield to other threads. However, when releasing the reference on the root, there is a bug: the root will not be freed even if its reference count (root_count) is reduced to 0. To simplify acquiring and releasing references on TDP MMU root pages, and to ensure that these roots are properly freed, move the get/put operations into another TDP MMU root iterator macro. Moving the get/put operations into an iterator macro also helps simplify control flow when a root does need to be freed. Note that using the list_for_each_entry_safe macro would not have been appropriate in this situation because it could keep a pointer to the next root across an MMU lock release + reacquire, during which time that root could be freed. Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: 063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU") Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210107001935.3732070-1-bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: x86: change in pv_eoi_get_pending() to make code more readableStephen Zhang1-1/+1
Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com> Message-Id: <1608277897-1932-1-git-send-email-stephenzhangzsd@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: x86: fix shift out of bounds reported by UBSANPaolo Bonzini1-1/+1
Since we know that e >= s, we can reassociate the left shift, changing the shifted number from 1 to 2 in exchange for decreasing the right hand side by 1. Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.cUros Bizjak1-2/+0
Commit 16809ecdc1e8a moved __svm_vcpu_run the prototype to svm.h, but forgot to remove the original from svm.c. Fixes: 16809ecdc1e8a ("KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201220200339.65115-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_loadNathan Chancellor1-1/+1
When using LLVM's integrated assembler (LLVM_IAS=1) while building x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build error occurs: $ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory"); ^ arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex' #define __ex(x) __kvm_handle_fault_on_reboot(x) ^ ./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot' "666: \n\t" \ ^ <inline asm>:2:2: note: instantiated into assembly here vmsave ^ 1 error generated. This happens because LLVM currently does not support calling vmsave without the fixed register operand (%rax for 64-bit and %eax for 32-bit). This will be fixed in LLVM 12 but the kernel currently supports LLVM 10.0.1 and newer so this needs to be handled. Add the proper register using the _ASM_AX macro, which matches the vmsave call in vmenter.S. Fixes: 861377730aa9 ("KVM: SVM: Provide support for SEV-ES vCPU loading") Link: https://reviews.llvm.org/D93524 Link: https://github.com/ClangBuiltLinux/linux/issues/1216 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07Merge branch 'kvm-master' into kvm-nextPaolo Bonzini45-297/+331
Fixes to get_mmio_spte, destined to 5.10 stable branch.
2021-01-07KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()Sean Christopherson1-7/+13
Check only the terminal leaf for a "!PRESENT || MMIO" SPTE when looking for reserved bits on valid, non-MMIO SPTEs. The get_walk() helpers terminate their walks if a not-present or MMIO SPTE is encountered, i.e. the non-terminal SPTEs have already been verified to be regular SPTEs. This eliminates an extra check-and-branch in a relatively hot loop. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: x86/mmu: Use raw level to index into MMIO walks' sptes arraySean Christopherson2-9/+8
Bump the size of the sptes array by one and use the raw level of the SPTE to index into the sptes array. Using the SPTE level directly improves readability by eliminating the need to reason out why the level is being adjusted when indexing the array. The array is on the stack and is not explicitly initialized; bumping its size is nothing more than a superficial adjustment to the stack frame. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-4-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTESean Christopherson3-11/+13
Get the so called "root" level from the low level shadow page table walkers instead of manually attempting to calculate it higher up the stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging, the starting level of the walk, from the callers perspective, is not the CR3 root but rather the PDPTR "root". Checking for reserved bits from the CR3 root causes get_mmio_spte() to consume uninitialized stack data due to indexing into sptes[] for a level that was not filled by get_walk(). This can result in false positives and/or negatives depending on what garbage happens to be on the stack. Opportunistically nuke a few extra newlines. Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Reported-by: Richard Herbert <rherbert@sympatico.ca> Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-07KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()Sean Christopherson2-2/+7
Return -1 from the get_walk() helpers if the shadow walk doesn't fill at least one spte, which can theoretically happen if the walk hits a not-present PDPTR. Returning the root level in such a case will cause get_mmio_spte() to return garbage (uninitialized stack data). In practice, such a scenario should be impossible as KVM shouldn't get a reserved-bit page fault with a not-present PDPTR. Note, using mmu->root_level in get_walk() is wrong for other reasons, too, but that's now a moot point. Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-06x86/mtrr: Correct the range check before performing MTRR type lookupsYing-Tsun Huang1-3/+3
In mtrr_type_lookup(), if the input memory address region is not in the MTRR, over 4GB, and not over the top of memory, a write-back attribute is returned. These condition checks are for ensuring the input memory address region is actually mapped to the physical memory. However, if the end address is just aligned with the top of memory, the condition check treats the address is over the top of memory, and write-back attribute is not returned. And this hits in a real use case with NVDIMM: the nd_pmem module tries to map NVDIMMs as cacheable memories when NVDIMMs are connected. If a NVDIMM is the last of the DIMMs, the performance of this NVDIMM becomes very low since it is aligned with the top of memory and its memory type is uncached-minus. Move the input end address change to inclusive up into mtrr_type_lookup(), before checking for the top of memory in either mtrr_type_lookup_{variable,fixed}() helpers. [ bp: Massage commit message. ] Fixes: 0cc705f56e40 ("x86/mm/mtrr: Clean up mtrr_type_lookup()") Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com
2021-01-06x86/hyperv: check cpu mask after interrupt has been disabledWei Liu1-3/+9
We've observed crashes due to an empty cpu mask in hyperv_flush_tlb_others. Obviously the cpu mask in question is changed between the cpumask_empty call at the beginning of the function and when it is actually used later. One theory is that an interrupt comes in between and a code path ends up changing the mask. Move the check after interrupt has been disabled to see if it fixes the issue. Signed-off-by: Wei Liu <wei.liu@kernel.org> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20210105175043.28325-1-wei.liu@kernel.org Reviewed-by: Michael Kelley <mikelley@microsoft.com>
2021-01-05x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handlingPeter Gonda1-2/+2
The IN and OUT instructions with port address as an immediate operand only use an 8-bit immediate (imm8). The current VC handler uses the entire 32-bit immediate value but these instructions only set the first bytes. Cast the operand to an u8 for that. [ bp: Massage commit message. ] Fixes: 25189d08e5168 ("x86/sev-es: Add support for handling IOIO exceptions") Signed-off-by: Peter Gonda <pgonda@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: David Rientjes <rientjes@google.com> Link: https://lkml.kernel.org/r/20210105163311.221490-1-pgonda@google.com
2021-01-05x86/hyperv: Fix kexec panic/hang issuesDexuan Cui3-0/+24
Currently the kexec kernel can panic or hang due to 2 causes: 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the old VP Assist Pages when the kexec kernel runs. The same issue is fixed for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the VP assist page for hibernation"). Now fix it for kexec. 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so between hv_kexec_handler() and native_machine_shutdown(), the other CPUs can still try to access the hypercall page and cause panic. The workaround "hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliabe. Move hyperv_cleanup() to a better place. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20201222065541.24312-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-01-05x86/mm: Fix leak of pmd ptlockDan Williams1-0/+2
Commit 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") introduced a new location where a pmd was released, but neglected to run the pmd page destructor. In fact, this happened previously for a different pmd release path and was fixed by commit: c283610e44ec ("x86, mm: do not leak page->ptl for pmd page tables"). This issue was hidden until recently because the failure mode is silent, but commit: b2b29d6d0119 ("mm: account PMD tables like PTE tables") turns the failure mode into this signature: BUG: Bad page state in process lt-pmem-ns pfn:15943d page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d flags: 0xaffff800000000() raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000 raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000 page dumped because: nonzero mapcount [..] dump_stack+0x8b/0xb0 bad_page.cold+0x63/0x94 free_pcp_prepare+0x224/0x270 free_unref_page+0x18/0xd0 pud_free_pmd_page+0x146/0x160 ioremap_pud_range+0xe3/0x350 ioremap_page_range+0x108/0x160 __ioremap_caller.constprop.0+0x174/0x2b0 ? memremap+0x7a/0x110 memremap+0x7a/0x110 devm_memremap+0x53/0xa0 pmem_attach_disk+0x4ed/0x530 [nd_pmem] ? __devm_release_region+0x52/0x80 nvdimm_bus_probe+0x85/0x210 [libnvdimm] Given this is a repeat occurrence it seemed prudent to look for other places where this destructor might be missing and whether a better helper is needed. try_to_free_pmd_page() looks like a candidate, but testing with setting up and tearing down pmd mappings via the dax unit tests is thus far not triggering the failure. As for a better helper pmd_free() is close, but it is a messy fit due to requiring an @mm arg. Also, ___pmd_free_tlb() wants to call paravirt_tlb_remove_table() instead of free_page(), so open-coded pgtable_pmd_page_dtor() seems the best way forward for now. Debugged together with Matthew Wilcox <willy@infradead.org>. Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yi Zhang <yi.zhang@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/160697689204.605323.17629854984697045602.stgit@dwillia2-desk3.amr.corp.intel.com
2020-12-29local64.h: make <asm/local64.h> mandatoryRandy Dunlap1-1/+0
Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and remove all arch/*/include/asm/local64.h arch-specific files since they only #include <asm-generic/local64.h>. This fixes build errors on arch/c6x/ and arch/nios2/ for block/blk-iocost.c. Build-tested on 21 of 25 arch-es. (tools problems on the others) Yes, we could even rename <asm-generic/local64.h> to <linux/local64.h> and change all #includes to use <linux/local64.h> instead. Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-28fanotify: Fix sys_fanotify_mark() on native x86-32Brian Gerst1-0/+1
Commit 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") converted native x86-32 which take 64-bit arguments to use the compat handlers to allow conversion to passing args via pt_regs. sys_fanotify_mark() was however missed, as it has a general compat handler. Add a config option that will use the syscall wrapper that takes the split args for native 32-bit. [ bp: Fix typo in Kconfig help text. ] Fixes: 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") Reported-by: Paweł Jasiak <pawel@jasiak.xyz> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
2020-12-24Merge tag 'irq-core-2020-12-23' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "This is the second attempt after the first one failed miserably and got zapped to unblock the rest of the interrupt related patches. A treewide cleanup of interrupt descriptor (ab)use with all sorts of racy accesses, inefficient and disfunctional code. The goal is to remove the export of irq_to_desc() to prevent these things from creeping up again" * tag 'irq-core-2020-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) genirq: Restrict export of irq_to_desc() xen/events: Implement irq distribution xen/events: Reduce irq_info:: Spurious_cnt storage size xen/events: Only force affinity mask for percpu interrupts xen/events: Use immediate affinity setting xen/events: Remove disfunct affinity spreading xen/events: Remove unused bind_evtchn_to_irq_lateeoi() net/mlx5: Use effective interrupt affinity net/mlx5: Replace irq_to_desc() abuse net/mlx4: Use effective interrupt affinity net/mlx4: Replace irq_to_desc() abuse PCI: mobiveil: Use irq_data_get_irq_chip_data() PCI: xilinx-nwl: Use irq_data_get_irq_chip_data() NTB/msi: Use irq_has_action() mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc pinctrl: nomadik: Use irq_has_action() drm/i915/pmu: Replace open coded kstat_irqs() copy drm/i915/lpe_audio: Remove pointless irq_to_desc() usage s390/irq: Use irq_desc_kstat_cpu() in show_msi_interrupt() parisc/irq: Use irq_desc_kstat_cpu() in show_interrupts() ...
2020-12-24Merge tag 'efi_updates_for_v5.11' of ↵Linus Torvalds5-127/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Borislav Petkov: "These got delayed due to a last minute ia64 build issue which got fixed in the meantime. EFI updates collected by Ard Biesheuvel: - Don't move BSS section around pointlessly in the x86 decompressor - Refactor helper for discovering the EFI secure boot mode - Wire up EFI secure boot to IMA for arm64 - Some fixes for the capsule loader - Expose the RT_PROP table via the EFI test module - Relax DT and kernel placement restrictions on ARM with a few followup fixes: - fix the build breakage on IA64 caused by recent capsule loader changes - suppress a type mismatch build warning in the expansion of EFI_PHYS_ALIGN on ARM" * tag 'efi_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: arm: force use of unsigned type for EFI_PHYS_ALIGN efi: ia64: disable the capsule loader efi: stub: get rid of efi_get_max_fdt_addr() efi/efi_test: read RuntimeServicesSupported efi: arm: reduce minimum alignment of uncompressed kernel efi: capsule: clean scatter-gather entries from the D-cache efi: capsule: use atomic kmap for transient sglist mappings efi: x86/xen: switch to efi_get_secureboot_mode helper arm64/ima: add ima_arch support ima: generalize x86/EFI arch glue for other EFI architectures efi: generalize efi_get_secureboot efi/libstub: EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER should not default to yes efi/x86: Only copy the compressed kernel image in efi_relocate_kernel() efi/libstub/x86: simplify efi_is_native()
2020-12-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-1/+2
Merge KASAN updates from Andrew Morton. This adds a new hardware tag-based mode to KASAN. The new mode is similar to the existing software tag-based KASAN, but relies on arm64 Memory Tagging Extension (MTE) to perform memory and pointer tagging (instead of shadow memory and compiler instrumentation). By Andrey Konovalov and Vincenzo Frascino. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (60 commits) kasan: update documentation kasan, mm: allow cache merging with no metadata kasan: sanitize objects when metadata doesn't fit kasan: clarify comment in __kasan_kfree_large kasan: simplify assign_tag and set_tag calls kasan: don't round_up too much kasan, mm: rename kasan_poison_kfree kasan, mm: check kasan_enabled in annotations kasan: add and integrate kasan boot parameters kasan: inline (un)poison_range and check_invalid_free kasan: open-code kasan_unpoison_slab kasan: inline random_tag for HW_TAGS kasan: inline kasan_reset_tag for tag-based modes kasan: remove __kasan_unpoison_stack kasan: allow VMAP_STACK for HW_TAGS mode kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK kasan: introduce set_alloc_info kasan: rename get_alloc/free_info kasan: simplify quarantine_put call site kselftest/arm64: check GCR_EL1 after context switch ...
2020-12-22x86/split-lock: Avoid returning with interrupts enabledAndi Kleen1-1/+2
When a split lock is detected always make sure to disable interrupts before returning from the trap handler. The kernel exit code assumes that all exits run with interrupts disabled, otherwise the SWAPGS sequence can race against interrupts and cause recursing page faults and later panics. The problem will only happen on CPUs with split lock disable functionality, so Icelake Server, Tiger Lake, Snow Ridge, Jacobsville. Fixes: ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code") Fixes: bce9b042ec73 ("x86/traps: Disable interrupts in exc_aligment_check()") # v5.8+ Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-22kasan, arm64: unpoison stack only with CONFIG_KASAN_STACKAndrey Konovalov1-1/+1
There's a config option CONFIG_KASAN_STACK that has to be enabled for KASAN to use stack instrumentation and perform validity checks for stack variables. There's no need to unpoison stack when CONFIG_KASAN_STACK is not enabled. Only call kasan_unpoison_task_stack[_below]() when CONFIG_KASAN_STACK is enabled. Note, that CONFIG_KASAN_STACK is an option that is currently always defined when CONFIG_KASAN is enabled, and therefore has to be tested with #if instead of #ifdef. Link: https://lkml.kernel.org/r/d09dd3f8abb388da397fd11598c5edeaa83fe559.1606162397.git.andreyknvl@google.com Link: https://linux-review.googlesource.com/id/If8a891e9fe01ea543e00b576852685afec0887e3 Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-22kasan, x86, s390: update undef CONFIG_KASANAndrey Konovalov1-0/+1
With the intoduction of hardware tag-based KASAN some kernel checks of this kind: ifdef CONFIG_KASAN will be updated to: if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) x86 and s390 use a trick to #undef CONFIG_KASAN for some of the code that isn't linked with KASAN runtime and shouldn't have any KASAN annotations. Also #undef CONFIG_KASAN_GENERIC with CONFIG_KASAN. Link: https://lkml.kernel.org/r/9d84bfaaf8fabe0fc89f913c9e420a30bd31a260.1606161801.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds38-498/+2297
Pull KVM updates from Paolo Bonzini: "Much x86 work was pushed out to 5.12, but ARM more than made up for it. ARM: - PSCI relay at EL2 when "protected KVM" is enabled - New exception injection code - Simplification of AArch32 system register handling - Fix PMU accesses when no PMU is enabled - Expose CSV3 on non-Meltdown hosts - Cache hierarchy discovery fixes - PV steal-time cleanups - Allow function pointers at EL2 - Various host EL2 entry cleanups - Simplification of the EL2 vector allocation s390: - memcg accouting for s390 specific parts of kvm and gmap - selftest for diag318 - new kvm_stat for when async_pf falls back to sync x86: - Tracepoints for the new pagetable code from 5.10 - Catch VFIO and KVM irqfd events before userspace - Reporting dirty pages to userspace with a ring buffer - SEV-ES host support - Nested VMX support for wait-for-SIPI activity state - New feature flag (AVX512 FP16) - New system ioctl to report Hyper-V-compatible paravirtualization features Generic: - Selftest improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits) KVM: SVM: fix 32-bit compilation KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting KVM: SVM: Provide support to launch and run an SEV-ES guest KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests KVM: SVM: Provide support for SEV-ES vCPU loading KVM: SVM: Provide support for SEV-ES vCPU creation/loading KVM: SVM: Update ASID allocation to support SEV-ES guests KVM: SVM: Set the encryption mask for the SVM host save area KVM: SVM: Add NMI support for an SEV-ES guest KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest KVM: SVM: Do not report support for SMM for an SEV-ES guest KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES KVM: SVM: Add support for CR8 write traps for an SEV-ES guest KVM: SVM: Add support for CR4 write traps for an SEV-ES guest KVM: SVM: Add support for CR0 write traps for an SEV-ES guest KVM: SVM: Add support for EFER write traps for an SEV-ES guest KVM: SVM: Support string IO operations for an SEV-ES guest KVM: SVM: Support MMIO for an SEV-ES guest KVM: SVM: Create trace events for VMGEXIT MSR protocol processing KVM: SVM: Create trace events for VMGEXIT processing ...
2020-12-19Merge tag 'for-linus-5.11-rc1b-tag' of ↵Linus Torvalds3-28/+24
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: "Some minor cleanup patches and a small series disentangling some Xen related Kconfig options" * tag 'for-linus-5.11-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Kconfig: remove X86_64 depends from XEN_512GB xen/manage: Fix fall-through warnings for Clang xen-blkfront: Fix fall-through warnings for Clang xen: remove trailing semicolon in macro definition xen: Kconfig: nest Xen guest options xen: Remove Xen PVH/PVHVM dependency on PCI x86/xen: Convert to DEFINE_SHOW_ATTRIBUTE
2020-12-19epoll: wire up syscall epoll_pwait2Willem de Bruijn2-0/+2
Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-19mm, kvm: account kvm_vcpu_mmap to kmemcgShakeel Butt1-1/+1
A VCPU of a VM can allocate couple of pages which can be mmap'ed by the user space application. At the moment this memory is not charged to the memcg of the VMM. On a large machine running large number of VMs or small number of VMs having large number of VCPUs, this unaccounted memory can be very significant. So, charge this memory to the memcg of the VMM. Please note that lifetime of these allocations corresponds to the lifetime of the VMM. Link: https://lkml.kernel.org/r/20201106202923.2087414-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-19xen: Kconfig: remove X86_64 depends from XEN_512GBJason Andryuk1-1/+1
commit bfda93aee0ec ("xen: Kconfig: nest Xen guest options") accidentally re-added X86_64 as a dependency to XEN_512GB. It was originally removed in commit a13f2ef168cb ("x86/xen: remove 32-bit Xen PV guest support"). Remove it again. Fixes: bfda93aee0ec ("xen: Kconfig: nest Xen guest options") Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20201216140838.16085-1-jandryuk@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2020-12-17Merge tag 'trace-v5.11' of ↵Linus Torvalds5-7/+46
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The major update to this release is that there's a new arch config option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it. All the ftrace callbacks now take a struct ftrace_regs instead of a struct pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will have enough information to read the arguments of the function being traced, as well as access to the stack pointer. This way, if a user (like live kernel patching) only cares about the arguments, then it can avoid using the heavier weight "regs" callback, that puts in enough information in the struct ftrace_regs to simulate a breakpoint exception (needed for kprobes). A new config option that audits the timestamps of the ftrace ring buffer at most every event recorded. Ftrace recursion protection has been cleaned up to move the protection to the callback itself (this saves on an extra function call for those callbacks). Perf now handles its own RCU protection and does not depend on ftrace to do it for it (saving on that extra function call). New debug option to add "recursed_functions" file to tracefs that lists all the places that triggered the recursion protection of the function tracer. This will show where things need to be fixed as recursion slows down the function tracer. The eval enum mapping updates done at boot up are now offloaded to a work queue, as it caused a noticeable pause on slow embedded boards. Various clean ups and last minute fixes" * tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Offload eval map updates to a work queue Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" ring-buffer: Add rb_check_bpage in __rb_allocate_pages ring-buffer: Fix two typos in comments tracing: Drop unneeded assignment in ring_buffer_resize() tracing: Disable ftrace selftests when any tracer is running seq_buf: Avoid type mismatch for seq_buf_init ring-buffer: Fix a typo in function description ring-buffer: Remove obsolete rb_event_is_commit() ring-buffer: Add test to validate the time stamp deltas ftrace/documentation: Fix RST C code blocks tracing: Clean up after filter logic rewriting tracing: Remove the useless value assignment in test_create_synth_event() livepatch: Use the default ftrace_ops instead of REGS when ARGS is available ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs MAINTAINERS: assign ./fs/tracefs to TRACING tracing: Fix some typos in comments ftrace: Remove unused varible 'ret' ring-buffer: Add recording of ring buffer recursion into recursed_functions ...
2020-12-16Merge tag 'arm-soc-drivers-5.11' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "There are a couple of subsystems maintained by other people that merge their drivers through the SoC tree, those changes include: - The SCMI firmware framework gains support for sensor notifications and for controlling voltage domains. - A large update for the Tegra memory controller driver, integrating it better with the interconnect framework - The memory controller subsystem gains support for Mediatek MT8192 - The reset controller framework gains support for sharing pulsed resets For Soc specific drivers in drivers/soc, the main changes are - The Allwinner/sunxi MBUS gets a rework for the way it handles dma_map_ops and offsets between physical and dma address spaces. - An errata fix plus some cleanups for Freescale Layerscape SoCs - A cleanup for renesas drivers regarding MMIO accesses. - New SoC specific drivers for Mediatek MT8192 and MT8183 power domains - New SoC specific drivers for Aspeed AST2600 LPC bus control and SoC identification. - Core Power Domain support for Qualcomm MSM8916, MSM8939, SDM660 and SDX55. - A rework of the TI AM33xx 'genpd' power domain support to use information from DT instead of platform data - Support for TI AM64x SoCs - Allow building some Amlogic drivers as modules instead of built-in Finally, there are numerous cleanups and smaller bug fixes for Mediatek, Tegra, Samsung, Qualcomm, TI OMAP, Amlogic, Rockchips, Renesas, and Xilinx SoCs" * tag 'arm-soc-drivers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (222 commits) soc: mediatek: mmsys: Specify HAS_IOMEM dependency for MTK_MMSYS firmware: xilinx: Properly align function parameter firmware: xilinx: Add a blank line after function declaration firmware: xilinx: Remove additional newline firmware: xilinx: Fix kernel-doc warnings firmware: xlnx-zynqmp: fix compilation warning soc: xilinx: vcu: add missing register NUM_CORE soc: xilinx: vcu: use vcu-settings syscon registers dt-bindings: soc: xlnx: extract xlnx, vcu-settings to separate binding soc: xilinx: vcu: drop useless success message clk: samsung: mark PM functions as __maybe_unused soc: samsung: exynos-chipid: initialize later - with arch_initcall soc: samsung: exynos-chipid: order list of SoCs by name memory: jz4780_nemc: Fix potential NULL dereference in jz4780_nemc_probe() memory: ti-emif-sram: only build for ARMv7 memory: tegra30: Support interconnect framework memory: tegra20: Support hardware versioning and clean up OPP table initialization dt-bindings: memory: tegra20-emc: Document opp-supported-hw property soc: rockchip: io-domain: Fix error return code in rockchip_iodomain_probe() reset-controller: ti: force the write operation when assert or deassert ...
2020-12-16Merge branch 'stable/for-linus-5.11' of ↵Linus Torvalds3-0/+39
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb update from Konrad Rzeszutek Wilk: "A generic (but for right now engaged only with AMD SEV) mechanism to adjust a larger size SWIOTLB based on the total memory of the SEV guests which right now require the bounce buffer for interacting with the outside world. Normal knobs (swiotlb=XYZ) still work" * 'stable/for-linus-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests