summaryrefslogtreecommitdiffstats
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2022-11-30KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configuredDavid Woodhouse1-0/+1
Closer inspection of the Xen code shows that we aren't supposed to be using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall. If we randomly set the top bit of ->state_entry_time for a guest that hasn't asked for it and doesn't expect it, that could make the runtimes fail to add up and confuse the guest. Without the flag it's perfectly safe for a vCPU to read its own vcpu_runstate_info; just not for one vCPU to read *another's*. I briefly pondered adding a word for the whole set of VMASST_TYPE_* flags but the only one we care about for HVM guests is this, so it seemed a bit pointless. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221127122210.248427-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-30KVM: x86/xen: Compatibility fixes for shared runstate areaDavid Woodhouse1-0/+1
The guest runstate area can be arbitrarily byte-aligned. In fact, even when a sane 32-bit guest aligns the overall structure nicely, the 64-bit fields in the structure end up being unaligned due to the fact that the 32-bit ABI only aligns them to 32 bits. So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE is buggy, because if it's unaligned then we can't update the whole field atomically; the low bytes might be observable before the _UPDATE bit is. Xen actually updates the *byte* containing that top bit, on its own. KVM should do the same. In addition, we cannot assume that the runstate area fits within a single page. One option might be to make the gfn_to_pfn cache cope with regions that cross a page — but getting a contiguous virtual kernel mapping of a discontiguous set of IOMEM pages is a distinctly non-trivial exercise, and it seems this is the *only* current use case for the GPC which would benefit from it. An earlier version of the runstate code did use a gfn_to_hva cache for this purpose, but it still had the single-page restriction because it used the uhva directly — because it needs to be able to do so atomically when the vCPU is being scheduled out, so it used pagefault_disable() around the accesses and didn't just use kvm_write_guest_cached() which has a fallback path. So... use a pair of GPCs for the first and potential second page covering the runstate area. We can get away with locking both at once because nothing else takes more than one GPC lock at a time so we can invent a trivial ordering rule. The common case where it's all in the same page is kept as a fast path, but in both cases, the actual guest structure (compat or not) is built up from the fields in @vx, following preset pointers to the state and times fields. The only difference is whether those pointers point to the kernel stack (in the split case) or to guest memory directly via the GPC. The fast path is also fixed to use a byte access for the XEN_RUNSTATE_UPDATE bit, then the only real difference is the dual memcpy. Finally, Xen also does write the runstate area immediately when it's configured. Flip the kvm_xen_update_runstate() and …_guest() functions and call the latter directly when the runstate area is set. This means that other ioctls which modify the runstate also write it immediately to the guest when they do so, which is also intended. Update the xen_shinfo_test to exercise the pathological case where the XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is actually in a different page to the rest of the 64-bit word. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-29x86/cpuid: Carve out all CPUID functionalityBorislav Petkov2-134/+140
Carve it out into a special header, where it belongs. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221124164150.3040-1-bp@alien8.de
2022-11-28x86: KVM: Advertise AVX-IFMA CPUID to user spaceJiaxi Chen1-0/+1
AVX-IFMA is a new instruction in the latest Intel platform Sierra Forest. This instruction packed multiplies unsigned 52-bit integers and adds the low/high 52-bit products to Qword Accumulators. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 23] AVX-IFMA is on an expected-dense CPUID leaf and some other bits on this leaf have kernel usages. Given that, define this feature bit like X86_FEATURE_<name> in kernel. Considering AVX-IFMA itself has no truly kernel usages and /proc/cpuinfo has too much unreadable flags, hide this one in /proc/cpuinfo. Advertise AVX-IFMA to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature. Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Message-Id: <20221125125845.1182922-6-jiaxi.chen@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28x86: KVM: Advertise AMX-FP16 CPUID to user spaceChang S. Bae1-0/+1
Latest Intel platform Granite Rapids has introduced a new instruction - AMX-FP16, which performs dot-products of two FP16 tiles and accumulates the results into a packed single precision tile. AMX-FP16 adds FP16 capability and also allows a FP16 GPU trained model to run faster without loss of accuracy or added SW overhead. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 21] AMX-FP16 is on an expected-dense CPUID leaf and some other bits on this leaf have kernel usages. Given that, define this feature bit like X86_FEATURE_<name> in kernel. Considering AMX-FP16 itself has no truly kernel usages and /proc/cpuinfo has too much unreadable flags, hide this one in /proc/cpuinfo. Advertise AMX-FP16 to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Message-Id: <20221125125845.1182922-5-jiaxi.chen@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28x86: KVM: Advertise CMPccXADD CPUID to user spaceJiaxi Chen1-0/+1
CMPccXADD is a new set of instructions in the latest Intel platform Sierra Forest. This new instruction set includes a semaphore operation that can compare and add the operands if condition is met, which can improve database performance. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 7] CMPccXADD is on an expected-dense CPUID leaf and some other bits on this leaf have kernel usages. Given that, define this feature bit like X86_FEATURE_<name> in kernel. Considering CMPccXADD itself has no truly kernel usages and /proc/cpuinfo has too much unreadable flags, hide this one in /proc/cpuinfo. Advertise CMPCCXADD to KVM userspace. This is safe because there are no new VMX controls or additional host enabling required for guests to use this feature. Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Message-Id: <20221125125845.1182922-4-jiaxi.chen@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-28x86/hyperv: Expand definition of struct hv_vp_assist_pageSaurabh Sengar1-1/+10
The struct hv_vp_assist_page has 24 bytes which is defined as u64[3], expand that to expose vtl_entry_reason, vtl_ret_x64rax and vtl_ret_x64rcx field. vtl_entry_reason is updated by hypervisor for the entry reason as to why the VTL was entered on the virtual processor. Guest updates the vtl_ret_* fields to provide the register values to restore on VTL return. The specific register values that are restored which will be updated on vtl_ret_x64rax and vtl_ret_x64rcx. Also added the missing fields for synthetic_time_unhalted_timer_expired, virtualization_fault_information and intercept_message. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: <anrayabh@linux.microsoft.com> Link: https://lore.kernel.org/r/1667587123-31645-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-27x86/resctrl: Move MSR defines into msr-index.hBorislav Petkov2-11/+18
msr-index.h should contain all MSRs for easier grepping for MSR numbers when dealing with unchecked MSR access warnings, for example. Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221106212923.20699-1-bp@alien8.de
2022-11-25x86/boot: Skip realmode init code when running as Xen PV guestJuergen Gross2-0/+5
When running as a Xen PV guest there is no need for setting up the realmode trampoline, as realmode isn't supported in this environment. Trying to setup the trampoline has been proven to be problematic in some cases, especially when trying to debug early boot problems with Xen requiring to keep the EFI boot-services memory mapped (some firmware variants seem to claim basically all memory below 1Mb for boot services). Introduce new x86_platform_ops operations for that purpose, which can be set to a NOP by the Xen PV specific kernel boot code. [ bp: s/call_init_real_mode/do_init_real_mode/ ] Fixes: 084ee1c641a0 ("x86, realmode: Relocator for realmode code") Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221123114523.3467-1-jgross@suse.com
2022-11-24x86/paravirt: Use common macro for creating simple asm paravirt functionsJuergen Gross2-27/+32
There are some paravirt assembler functions which are sharing a common pattern. Introduce a macro DEFINE_PARAVIRT_ASM() for creating them. Note that this macro is including explicit alignment of the generated functions, leading to __raw_callee_save___kvm_vcpu_is_preempted(), _paravirt_nop() and paravirt_ret0() to be aligned at 4 byte boundaries now. The explicit _paravirt_nop() prototype in paravirt.c isn't needed, as it is included in paravirt_types.h already. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Link: https://lkml.kernel.org/r/20221109134418.6516-1-jgross@suse.com
2022-11-24x86/paravirt: Remove clobber bitmask from .parainstructionsKees Cook1-49/+12
The u16 "clobber" value is not used in .parainstructions since commit 27876f3882fd ("x86/paravirt: Remove clobbers from struct paravirt_patch_site") Remove the u16 from the section macro, the argument from all macros, and all now-unused CLBR_* macros. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220903073706.3193746-1-keescook@chromium.org
2022-11-22x86/cpu: Drop 32-bit Xen PV guest code in update_task_stack()Juergen Gross1-5/+2
Testing for Xen PV guest mode in a 32-bit only code section can be dropped, as Xen PV guests are supported in 64-bit mode only. While at it, switch from boot_cpu_has() to cpu_feature_enabled() in the 64-bit part of the code. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20221104072701.20283-4-jgross@suse.com
2022-11-22x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode()Juergen Gross1-2/+2
The check for 64-bit mode when testing X86_FEATURE_XENPV isn't needed, as Xen PV guests are no longer supported in 32-bit mode, see a13f2ef168cb ("x86/xen: remove 32-bit Xen PV guest support"). While at it switch from boot_cpu_has() to cpu_feature_enabled(). Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20221104072701.20283-3-jgross@suse.com
2022-11-22x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.hJuergen Gross1-1/+7
Add X86_FEATURE_XENPV to the features handled specially in disabled-features.h. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20221104072701.20283-2-jgross@suse.com
2022-11-21Merge tag 'v6.1-rc6' into x86/core, to resolve conflictsIngo Molnar9-23/+45
Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c: # upstream: debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file") # retbleed work in x86/core: 5d8213864ade ("x86/retbleed: Add SKL return thunk") ... and these commits in include/linux/bpf.h: # upstram: 18acb7fac22f ("bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")") # x86/core commits: 931ab63664f0 ("x86/ibt: Implement FineIBT") bea75b33895f ("x86/Kconfig: Introduce function padding") The latter two modify BPF_DISPATCHER_ATTRIBUTES(), which was removed upstream. Conflicts: arch/x86/kernel/asm-offsets.c include/linux/bpf.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-11-21x86/tsx: Add a feature bit for TSX control MSR supportPawan Gupta1-0/+3
Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES. This is different from how other CPU features are enumerated i.e. via CPUID. Currently, a call to tsx_ctrl_is_supported() is required for enumerating the feature. In the absence of a feature bit for TSX control, any code that relies on checking feature bits directly will not work. In preparation for adding a feature bit check in MSR save/restore during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when MSR_IA32_TSX_CTRL is present. Also make tsx_ctrl_is_supported() use the new feature bit to avoid any overhead of reading the MSR. [ bp: Remove tsx_ctrl_is_supported(), add room for two more feature bits in word 11 which are coming up in the next merge window. ] Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.1668539735.git.pawan.kumar.gupta@linux.intel.com
2022-11-20Merge tag 'locking_urgent_for_v6.1_rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix a build error with clang 11 * tag 'locking_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: Fix qspinlock/x86 inline asm error
2022-11-19platform/x86/intel/ifs: Use generic microcode headers and functionsJithu Joseph1-0/+1
Existing implementation (broken) of IFS used a header format (for IFS test images) which was very similar to microcode format, but didn’t accommodate extended signatures. This meant same IFS test image had to be duplicated for different steppings and the validation code in the driver was only looking at the primary header parameters. Going forward, IFS test image headers have been tweaked to become fully compatible with the microcode format. Newer IFS test image headers will use header version 2 in order to distinguish it from microcode images and older IFS test images. In light of the above, reuse struct microcode_header_intel directly in the IFS driver and reuse microcode functions for validation and sanity checking. [ bp: Massage commit message. ] Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221117225039.30166-1-jithu.joseph@intel.com
2022-11-18x86/microcode/intel: Use a reserved field for metasizeJithu Joseph1-1/+2
Intel is using microcode file format for IFS test images too. IFS test images use one of the existing reserved fields in microcode header to indicate the size of the region in the file allocated for metadata structures. In preparation for this, rename first of the existing reserved fields in microcode header to metasize. In subsequent patches IFS specific code will make use of this field while parsing IFS images. Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20221117035935.4136738-10-jithu.joseph@intel.com
2022-11-18x86/microcode/intel: Add hdr_type to intel_microcode_sanity_check()Jithu Joseph2-1/+2
IFS test images and microcode blobs use the same header format. Microcode blobs use header type of 1, whereas IFS test images will use header type of 2. In preparation for IFS reusing intel_microcode_sanity_check(), add header type as a parameter for sanity check. [ bp: Touchups. ] Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20221117035935.4136738-9-jithu.joseph@intel.com
2022-11-18x86/microcode/intel: Reuse microcode_sanity_check()Jithu Joseph1-0/+1
IFS test image carries the same microcode header as regular Intel microcode blobs. Reuse microcode_sanity_check() in the IFS driver to perform sanity check of the IFS test images too. Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20221117035935.4136738-8-jithu.joseph@intel.com
2022-11-18x86/microcode/intel: Reuse find_matching_signature()Jithu Joseph1-0/+1
IFS uses test images provided by Intel that can be regarded as firmware. An IFS test image carries microcode header with an extended signature table. Reuse find_matching_signature() for verifying if the test image header or the extended signature table indicate whether that image is fit to run on a system. No functional changes. Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20221117035935.4136738-6-jithu.joseph@intel.com
2022-11-18KVM: nSVM: hyper-v: Enable L2 TLB flushVitaly Kuznetsov1-0/+4
Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled both in extended 'nested controls' in VMCB and VP assist page. According to Hyper-V TLFS, synthetic vmexit to L1 is performed with - HV_SVM_EXITCODE_ENL exit_code. - HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1. Note: VP assist page is cached in 'struct kvm_vcpu_hv' so recalc_intercepts() doesn't need to read from guest's memory. KVM needs to update the case upon each VMRUN and after svm_set_nested_state (svm_get_nested_state_pages()) to handle the case when the guest got migrated while L2 was running. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-29-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: nVMX: hyper-v: Enable L2 TLB flushVitaly Kuznetsov1-0/+9
Enable L2 TLB flush feature on nVMX when: - Enlightened VMCS is in use. - The feature flag is enabled in eVMCS. - The feature flag is enabled in partition assist page. Perform synthetic vmexit to L1 after processing TLB flush call upon request (HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH). Note: nested_evmcs_l2_tlb_flush_enabled() uses cached VP assist page copy which gets updated from nested_vmx_handle_enlightened_vmptrld(). This is also guaranteed to happen post migration with eVMCS backed L2 running. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-27-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv'Vitaly Kuznetsov1-0/+2
In preparation to enabling L2 TLB flush, cache VP assist page in 'struct kvm_vcpu_hv'. While on it, rename nested_enlightened_vmentry() to nested_get_evmptr() and make it return eVMCS GPA directly. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-26-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush() nested hookVitaly Kuznetsov1-0/+1
Hyper-V supports injecting synthetic L2->L1 exit after performing L2 TLB flush operation but the procedure is vendor specific. Introduce .hv_inject_synthetic_vmexit_post_tlb_flush nested hook for it. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-22-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in useVitaly Kuznetsov1-0/+6
To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/ VP_IDs which are set by L1 hypervisor. 'Partition assist page' address is also needed to handle post-flush exit to L1 upon request. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-20-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead ↵Vitaly Kuznetsov1-0/+3
of on-stack 'sparse_banks' To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1 may use vCPU overcommit for L2. To avoid growing on-stack allocation, make 'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated dynamically. Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2 requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings, i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so can remain an on-stack allocation. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-19-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: x86: hyper-v: Create a separate fifo for L2 TLB flushVitaly Kuznetsov1-1/+7
To handle L2 TLB flush requests, KVM needs to use a separate fifo from regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush something in L2 is made, the target vCPU can transition from L2 to L1, receive a request to flush a GVA for L1 and then try to enter L2 back. The first request needs to be processed at this point. Similarly, requests to flush GVAs in L1 must wait until L2 exits to L1. No functional change as KVM doesn't handle L2 TLB flush requests from L2 yet. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-18-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: x86: hyper-v: Expose support for extended gva ranges for flush hypercallsVitaly Kuznetsov1-0/+2
Extended GVA ranges support bit seems to indicate whether lower 12 bits of GVA can be used to specify up to 4095 additional consequent GVAs to flush. This is somewhat described in TLFS. Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests by flushing the whole VPID so technically, extended GVA ranges were already supported. As such requests are handled more gently now, advertizing support for extended ranges starts making sense to reduce the size of TLB flush requests. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-13-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: x86: hyper-v: Introduce TLB flush fifoVitaly Kuznetsov1-0/+20
To allow flushing individual GVAs instead of always flushing the whole VPID a per-vCPU structure to pass the requests is needed. Use standard 'kfifo' to queue two types of entries: individual GVA (GFN + up to 4095 following GFNs in the lower 12 bits) and 'flush all'. The size of the fifo is arbitrarily set to '16'. Note, kvm_hv_flush_tlb() only queues 'flush all' entries for now and kvm_hv_vcpu_flush_tlb() doesn't actually read the fifo just resets the queue before returning -EOPNOTSUPP (which triggers full TLB flush) so the functional change is very small but the infrastructure is prepared to handle individual GVA flush requests. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flagVitaly Kuznetsov1-0/+2
In preparation to implementing fine-grained Hyper-V TLB flush and L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As KVM_REQ_TLB_FLUSH_GUEST is a stronger operation, clear KVM_REQ_HV_TLB_FLUSH request in kvm_vcpu_flush_tlb_guest(). The flush itself is temporary handled by kvm_vcpu_flush_tlb_guest(). No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'Vitaly Kuznetsov2-2/+2
To make terminology between Hyper-V-on-KVM and KVM-on-Hyper-V consistent, rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'. The change eliminates the use of confusing 'direct' and adds the missing underscore. No functional change. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18x86/hyperv: KVM: Rename "hv_enlightenments" to "hv_vmcb_enlightenments"Sean Christopherson2-2/+2
Now that KVM isn't littered with "struct hv_enlightenments" casts, rename the struct to "hv_vmcb_enlightenments" to highlight the fact that the struct is specifically for SVM's VMCB. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18KVM: SVM: Add a proper field for Hyper-V VMCB enlightenmentsSean Christopherson1-1/+6
Add a union to provide hv_enlightenments side-by-side with the sw_reserved bytes that Hyper-V's enlightenments overlay. Casting sw_reserved everywhere is messy, confusing, and unnecessarily unsafe. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18x86/hyperv: Move VMCB enlightenment definitions to hyperv-tlfs.hSean Christopherson1-0/+22
Move Hyper-V's VMCB enlightenment definitions to the TLFS header; the definitions come directly from the TLFS[*], not from KVM. No functional change intended. [*] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_svm_enlightened_vmcb_fields [vitaly: rename VMCB_HV_ -> HV_VMCB_ to match the rest of hyperv-tlfs.h, keep svm/hyperv.h] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-18ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accessesMark Rutland1-0/+16
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch adds new ftrace_regs_{get,set}_*() helpers which can be used to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y, these can always be used on any ftrace_regs, and when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are available. A new ftrace_regs_has_args(fregs) helper is added which code can use to check when these are usable. Co-developed-by: Florent Revest <revest@chromium.org> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18ftrace: rename ftrace_instruction_pointer_set() -> ↵Mark Rutland1-1/+1
ftrace_regs_set_instruction_pointer() In subsequent patches we'll add a sew of ftrace_regs_{get,set}_*() helpers. In preparation, this patch renames ftrace_instruction_pointer_set() to ftrace_regs_set_instruction_pointer(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18ftrace: pass fregs to arch_ftrace_set_direct_caller()Mark Rutland1-13/+18
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch changes the prototype of arch_ftrace_set_direct_caller() to take ftrace_regs rather than pt_regs, and moves the extraction of the pt_regs into arch_ftrace_set_direct_caller(). On x86, arch_ftrace_set_direct_caller() can be used even when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines struct ftrace_regs. Due to this, it's necessary to define arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete type. I've also moved the body of arch_ftrace_set_direct_caller() after the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y defineidion of struct ftrace_regs. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18efi: x86: Move EFI runtime map sysfs code to arch/x86Ard Biesheuvel1-0/+22
The EFI runtime map code is only wired up on x86, which is the only architecture that has a need for it in its implementation of kexec. So let's move this code under arch/x86 and drop all references to it from generic code. To ensure that the efi_runtime_map_init() is invoked at the appropriate time use a 'sync' subsys_initcall() that will be called right after the EFI initcall made from generic code where the original invocation of efi_runtime_map_init() resided. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Dave Young <dyoung@redhat.com>
2022-11-18efi: memmap: Move manipulation routines into x86 arch treeArd Biesheuvel1-0/+12
The EFI memory map is a description of the memory layout as provided by the firmware, and only x86 manipulates it in various different ways for its own memory bookkeeping. So let's move the memmap routines that are only used by x86 into the x86 arch tree. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18efi: memmap: Move EFI fake memmap support into x86 arch treeArd Biesheuvel1-0/+5
The EFI fake memmap support is specific to x86, which manipulates the EFI memory map in various different ways after receiving it from the EFI stub. On other architectures, we have managed to push back on this, and the EFI memory map is kept pristine. So let's move the fake memmap code into the x86 arch tree, where it arguably belongs. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18efi: libstub: Add mixed mode support to command line initrd loaderArd Biesheuvel1-0/+11
Now that we have support for calling protocols that need additional marshalling for mixed mode, wire up the initrd command line loader. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18efi: libstub: Permit mixed mode return types other than efi_status_tArd Biesheuvel1-35/+30
Rework the EFI stub macro wrappers around protocol method calls and other indirect calls in order to allow return types other than efi_status_t. This means the widening should be conditional on whether or not the return type is efi_status_t, and should be omitted otherwise. Also, switch to _Generic() to implement the type based compile time conditionals, which is more concise, and distinguishes between efi_status_t and u64 properly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18stackprotector: actually use get_random_canary()Jason A. Donenfeld1-13/+1
The RNG always mixes in the Linux version extremely early in boot. It also always includes a cycle counter, not only during early boot, but each and every time it is invoked prior to being fully initialized. Together, this means that the use of additional xors inside of the various stackprotector.h files is superfluous and over-complicated. Instead, we can get exactly the same thing, but better, by just calling `get_random_canary()`. Acked-by: Guo Ren <guoren@kernel.org> # for csky Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64 Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-17x86/tdx: Add a wrapper to get TDREPORT0 from the TDX ModuleKuppuswamy Sathyanarayanan1-0/+2
To support TDX attestation, the TDX guest driver exposes an IOCTL interface to allow userspace to get the TDREPORT0 (a.k.a. TDREPORT subtype 0) from the TDX module via TDG.MR.TDREPORT TDCALL. In order to get the TDREPORT0 in the TDX guest driver, instead of using a low level function like __tdx_module_call(), add a tdx_mcall_get_report0() wrapper function to handle it. This is a preparatory patch for adding attestation support. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/all/20221116223820.819090-2-sathyanarayanan.kuppuswamy%40linux.intel.com
2022-11-17x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORSThomas Gleixner1-3/+1
Now that the PCI/MSI core code does early checking for multi-MSI support X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore. Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de
2022-11-17PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAINThomas Gleixner1-2/+2
What a zoo: PCI_MSI select GENERIC_MSI_IRQ PCI_MSI_IRQ_DOMAIN def_bool y depends on PCI_MSI select GENERIC_MSI_IRQ_DOMAIN Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are just an indirection to PCI_MSI. Match the reality and just admit that PCI_MSI requires GENERIC_MSI_IRQ_DOMAIN. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122014.467556921@linutronix.de
2022-11-17clocksource/drivers/hyper-v: Include asm/hyperv-tlfs.h not asm/mshyperv.hThomas Gleixner2-2/+9
clocksource/hyperv_timer.h is included into the VDSO build. It includes asm/mshyperv.h which in turn includes the world and some more. This worked so far by chance, but any subtle change in the include chain results in a build breakage because VDSO builds are building user space libraries. Include asm/hyperv-tlfs.h instead which contains everything what the VDSO build needs except the hv_get_raw_timer() define. Move this define into a separate header file, which contains the prerequisites (msr.h) and is included by clocksource/hyperv_timer.h. Fixup drivers/hv/vmbus_drv.c which relies on the indirect include of asm/mshyperv.h. With that the VDSO build only pulls in the minimum requirements. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/87fsemtut0.ffs@tglx
2022-11-16KVM: x86/mmu: Use BIT{,_ULL}() for PFERR masksDavid Matlack1-10/+10
Use the preferred BIT() and BIT_ULL() to construct the PFERR masks rather than open-coding the bit shifting. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221102184654.282799-6-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>