summaryrefslogtreecommitdiffstats
path: root/arch/x86/include/uapi/asm
AgeCommit message (Collapse)AuthorFilesLines
2022-07-10x86/boot: Fix the setup data types max limitBorislav Petkov1-1/+1
Commit in Fixes forgot to change the SETUP_TYPE_MAX definition which contains the highest valid setup data type. Correct that. Fixes: 5ea98e01ab52 ("x86/boot: Add Confidential Computing type to setup_data") Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/ddba81dd-cc92-699c-5274-785396a17fb5@zytor.com
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-5/+6
Pull kvm updates from Paolo Bonzini: "S390: - ultravisor communication device driver - fix TEID on terminating storage key ops RISC-V: - Added Sv57x4 support for G-stage page table - Added range based local HFENCE functions - Added remote HFENCE functions based on VCPU requests - Added ISA extension registers in ONE_REG interface - Updated KVM RISC-V maintainers entry to cover selftests support ARM: - Add support for the ARMv8.6 WFxT extension - Guard pages for the EL2 stacks - Trap and emulate AArch32 ID registers to hide unsupported features - Ability to select and save/restore the set of hypercalls exposed to the guest - Support for PSCI-initiated suspend in collaboration with userspace - GICv3 register-based LPI invalidation support - Move host PMU event merging into the vcpu data structure - GICv3 ITS save/restore fixes - The usual set of small-scale cleanups and fixes x86: - New ioctls to get/set TSC frequency for a whole VM - Allow userspace to opt out of hypercall patching - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr AMD SEV improvements: - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES - V_TSC_AUX support Nested virtualization improvements for AMD: - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE, nested vGIF) - Allow AVIC to co-exist with a nested guest running - Fixes for LBR virtualizations when a nested guest is running, and nested LBR virtualization support - PAUSE filtering for nested hypervisors Guest support: - Decoupling of vcpu_is_preempted from PV spinlocks" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits) KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest KVM: selftests: x86: Sync the new name of the test case to .gitignore Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND x86, kvm: use correct GFP flags for preemption disabled KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer x86/kvm: Alloc dummy async #PF token outside of raw spinlock KVM: x86: avoid calling x86 emulator without a decoded instruction KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave) s390/uv_uapi: depend on CONFIG_S390 KVM: selftests: x86: Fix test failure on arch lbr capable platforms KVM: LAPIC: Trace LAPIC timer expiration on every vmentry KVM: s390: selftest: Test suppression indication on key prot exception KVM: s390: Don't indicate suppression on dirtying, failing memop selftests: drivers/s390x: Add uvdevice tests drivers/s390/char: Add Ultravisor io device MAINTAINERS: Update KVM RISC-V entry to cover selftests support RISC-V: KVM: Introduce ISA extension register RISC-V: KVM: Cleanup stale TLB entries when host CPU changes RISC-V: KVM: Add remote HFENCE functions based on VCPU requests ...
2022-05-26Merge tag 'mm-stable-2022-05-25' of ↵Linus Torvalds1-14/+0
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Almost all of MM here. A few things are still getting finished off, reviewed, etc. - Yang Shi has improved the behaviour of khugepaged collapsing of readonly file-backed transparent hugepages. - Johannes Weiner has arranged for zswap memory use to be tracked and managed on a per-cgroup basis. - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime enablement of the recent huge page vmemmap optimization feature. - Baolin Wang contributes a series to fix some issues around hugetlb pagetable invalidation. - Zhenwei Pi has fixed some interactions between hwpoisoned pages and virtualization. - Tong Tiangen has enabled the use of the presently x86-only page_table_check debugging feature on arm64 and riscv. - David Vernet has done some fixup work on the memcg selftests. - Peter Xu has taught userfaultfd to handle write protection faults against shmem- and hugetlbfs-backed files. - More DAMON development from SeongJae Park - adding online tuning of the feature and support for monitoring of fixed virtual address ranges. Also easier discovery of which monitoring operations are available. - Nadav Amit has done some optimization of TLB flushing during mprotect(). - Neil Brown continues to labor away at improving our swap-over-NFS support. - David Hildenbrand has some fixes to anon page COWing versus get_user_pages(). - Peng Liu fixed some errors in the core hugetlb code. - Joao Martins has reduced the amount of memory consumed by device-dax's compound devmaps. - Some cleanups of the arch-specific pagemap code from Anshuman Khandual. - Muchun Song has found and fixed some errors in the TLB flushing of transparent hugepages. - Roman Gushchin has done more work on the memcg selftests. ... and, of course, many smaller fixes and cleanups. Notably, the customary million cleanup serieses from Miaohe Lin" * tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits) mm: kfence: use PAGE_ALIGNED helper selftests: vm: add the "settings" file with timeout variable selftests: vm: add "test_hmm.sh" to TEST_FILES selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests selftests: vm: add migration to the .gitignore selftests/vm/pkeys: fix typo in comment ksm: fix typo in comment selftests: vm: add process_mrelease tests Revert "mm/vmscan: never demote for memcg reclaim" mm/kfence: print disabling or re-enabling message include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace" include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion" mm: fix a potential infinite loop in start_isolate_page_range() MAINTAINERS: add Muchun as co-maintainer for HugeTLB zram: fix Kconfig dependency warning mm/shmem: fix shmem folio swapoff hang cgroup: fix an error handling path in alloc_pagecache_max_30M() mm: damon: use HPAGE_PMD_SIZE tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate nodemask.h: fix compilation error with GCC12 ...
2022-05-23Merge tag 'platform-drivers-x86-v5.19-1' of ↵Linus Torvalds1-5/+109
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "This includes some small changes to kernel/stop_machine.c and arch/x86 which are deps of the new Intel IFS support. Highlights: - New drivers: - Intel "In Field Scan" (IFS) support - Winmate FM07/FM07P buttons - Mellanox SN2201 support - AMD PMC driver enhancements - Lots of various other small fixes and hardware-id additions" * tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits) platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency platform/x86: intel_cht_int33fe: Set driver data platform/x86: intel-hid: fix _DSM function index handling platform/x86: toshiba_acpi: use kobj_to_dev() platform/x86: samsung-laptop: use kobj_to_dev() platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu tools/power/x86/intel-speed-select: Display error on turbo mode disabled Documentation: In-Field Scan platform/x86/intel/ifs: add ABI documentation for IFS trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations platform/x86/intel/ifs: Add IFS sysfs interface platform/x86/intel/ifs: Add scan test support platform/x86/intel/ifs: Authenticate and copy to secured memory platform/x86/intel/ifs: Check IFS Image sanity platform/x86/intel/ifs: Read IFS firmware image platform/x86/intel/ifs: Add stub driver for In-Field Scan stop_machine: Add stop_core_cpuslocked() for per-core operations x86/msr-index: Define INTEGRITY_CAPABILITIES MSR x86/microcode/intel: Expose collect_cpu_info_early() for IFS ...
2022-04-28x86/mm: enable ARCH_HAS_VM_GET_PAGE_PROTChristoph Hellwig1-14/+0
This defines and exports a platform specific custom vm_get_page_prot() via subscribing ARCH_HAS_VM_GET_PAGE_PROT. This also unsubscribes from config ARCH_HAS_FILTER_PGPROT, after dropping off arch_filter_pgprot() and arch_vm_get_page_prot(). Link: https://lkml.kernel.org/r/20220414062125.609297-6-anshuman.khandual@arm.com Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David S. Miller <davem@davemloft.net> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-27amd_hsmp: Add HSMP protocol version 5 messagesSuma Hegde1-5/+109
HSMP protocol version 5 is supported on AMD family 19h model 10h EPYC processors. This version brings new features such as -- DIMM statistics -- Bandwidth for IO and xGMI links -- Monitor socket and core frequency limits -- Configure power efficiency modes, DF pstate range etc Signed-off-by: Suma Hegde <suma.hegde@amd.com> Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Link: https://lore.kernel.org/r/20220427152248.25643-1-nchatrad@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-04-13Merge branch 'kvm-older-features' into HEADPaolo Bonzini1-5/+6
Merge branch for features that did not make it into 5.18: * New ioctls to get/set TSC frequency for a whole VM * Allow userspace to opt out of hypercall patching Nested virtualization improvements for AMD: * Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE, nested vGIF) * Allow AVIC to co-exist with a nested guest running * Fixes for LBR virtualizations when a nested guest is running, and nested LBR virtualization support * PAUSE filtering for nested hypervisors Guest support: * Decoupling of vcpu_is_preempted from PV spinlocks Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-07x86/sev: Provide support for SNP guest request NAEsBrijesh Singh1-0/+4
Version 2 of GHCB specification provides SNP_GUEST_REQUEST and SNP_EXT_GUEST_REQUEST NAE that can be used by the SNP guest to communicate with the PSP. While at it, add a snp_issue_guest_request() helper that will be used by driver or other subsystem to issue the request to PSP. See SEV-SNP firmware and GHCB spec for more details. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-42-brijesh.singh@amd.com
2022-04-07x86/boot: Add a pointer to Confidential Computing blob in bootparamsMichael Roth1-1/+2
The previously defined Confidential Computing blob is provided to the kernel via a setup_data structure or EFI config table entry. Currently, these are both checked for by boot/compressed kernel to access the CPUID table address within it for use with SEV-SNP CPUID enforcement. To also enable that enforcement for the run-time kernel, similar access to the CPUID table is needed early on while it's still using the identity-mapped page table set up by boot/compressed, where global pointers need to be accessed via fixup_pointer(). This isn't much of an issue for accessing setup_data, and the EFI config table helper code currently used in boot/compressed *could* be used in this case as well since they both rely on identity-mapping. However, it has some reliance on EFI helpers/string constants that would need to be accessed via fixup_pointer(), and fixing it up while making it shareable between boot/compressed and run-time kernel is fragile and introduces a good bit of ugliness. Instead, add a boot_params->cc_blob_address pointer that the boot/compressed kernel can initialize so that the run-time kernel can access the CC blob from there instead of re-scanning the EFI config table. Also document these in Documentation/x86/zero-page.rst. While there, add missing documentation for the acpi_rsdp_addr field, which serves a similar purpose in providing the run-time kernel a pointer to the ACPI RSDP table so that it does not need to [re-]scan the EFI configuration table. [ bp: Fix typos, massage commit message. ] Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-34-brijesh.singh@amd.com
2022-04-07x86/boot: Add Confidential Computing type to setup_dataBrijesh Singh1-0/+1
While launching encrypted guests, the hypervisor may need to provide some additional information during the guest boot. When booting under an EFI-based BIOS, the EFI configuration table contains an entry for the confidential computing blob that contains the required information. To support booting encrypted guests on non-EFI VMs, the hypervisor needs to pass this additional information to the guest kernel using a different method. For this purpose, introduce SETUP_CC_BLOB type in setup_data to hold the physical address of the confidential computing blob location. The boot loader or hypervisor may choose to use this method instead of an EFI configuration table. The CC blob location scanning should give preference to a setup_data blob over an EFI configuration table. In AMD SEV-SNP, the CC blob contains the address of the secrets and CPUID pages. The secrets page includes information such as a VM to PSP communication key and the CPUID page contains PSP-filtered CPUID values. Define the AMD SEV confidential computing blob structure. While at it, define the EFI GUID for the confidential computing blob. [ bp: Massage commit message, mark struct __packed. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220307213356.2797205-30-brijesh.singh@amd.com
2022-04-06x86/sev: Use SEV-SNP AP creation to start secondary CPUsTom Lendacky1-0/+5
To provide a more secure way to start APs under SEV-SNP, use the SEV-SNP AP Creation NAE event. This allows for guest control over the AP register state rather than trusting the hypervisor with the SEV-ES Jump Table address. During native_smp_prepare_cpus(), invoke an SEV-SNP function that, if SEV-SNP is active, will set/override apic->wakeup_secondary_cpu. This will allow the SEV-SNP AP Creation NAE event method to be used to boot the APs. As a result of installing the override when SEV-SNP is active, this method of starting the APs becomes the required method. The override function will fail to start the AP if the hypervisor does not have support for AP creation. [ bp: Work in forgotten review comments. ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-23-brijesh.singh@amd.com
2022-04-06x86/mm: Validate memory when changing the C-bitBrijesh Singh1-0/+2
Add the needed functionality to change pages state from shared to private and vice-versa using the Page State Change VMGEXIT as documented in the GHCB spec. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-22-brijesh.singh@amd.com
2022-04-06x86/sev: Check SEV-SNP features supportBrijesh Singh1-0/+2
Version 2 of the GHCB specification added the advertisement of features that are supported by the hypervisor. If the hypervisor supports SEV-SNP then it must set the SEV-SNP features bit to indicate that the base functionality is supported. Check that feature bit while establishing the GHCB; if failed, terminate the guest. Version 2 of the GHCB specification adds several new Non-Automatic Exits (NAEs), most of them are optional except the hypervisor feature. Now that the hypervisor feature NAE is implemented, bump the GHCB maximum supported protocol version. While at it, move the GHCB protocol negotiation check from the #VC exception handler to sev_enable() so that all feature detection happens before the first #VC exception. While at it, document why the GHCB page cannot be setup from load_stage2_idt(). [ bp: Massage commit message. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-13-brijesh.singh@amd.com
2022-04-02KVM: x86: Allow userspace to opt out of hypercall patchingOliver Upton1-5/+6
KVM handles the VMCALL/VMMCALL instructions very strangely. Even though both of these instructions really should #UD when executed on the wrong vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the guest's instruction with the appropriate instruction for the vendor. Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only") do not patch in the appropriate instruction using alternatives, likely motivating KVM's intervention. Add a quirk allowing userspace to opt out of hypercall patching. If the quirk is disabled, KVM synthesizes a #UD in the guest. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220316005538.2282772-2-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-27Merge tag 'x86_core_for_5.18_rc1' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra: "Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP. Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is limited to 2 instructions (and typically fewer) on branch targets not starting with ENDBR. CET-IBT also limits speculation of the next sequential instruction after the indirect CALL/JMP [1]. CET-IBT is fundamentally incompatible with retpolines, but provides, as described above, speculation limits itself" [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html * tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) kvm/emulate: Fix SETcc emulation for ENDBR x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0 x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0 kbuild: Fixup the IBT kbuild changes x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy x86: Remove toolchain check for X32 ABI capability x86/alternative: Use .ibt_endbr_seal to seal indirect calls objtool: Find unused ENDBR instructions objtool: Validate IBT assumptions objtool: Add IBT/ENDBR decoding objtool: Read the NOENDBR annotation x86: Annotate idtentry_df() x86,objtool: Move the ASM_REACHABLE annotation to objtool.h x86: Annotate call_on_stack() objtool: Rework ASM_REACHABLE x86: Mark __invalid_creds() __noreturn exit: Mark do_group_exit() __noreturn x86: Mark stop_this_cpu() __noreturn objtool: Ignore extra-symbol code objtool: Rename --duplicate to --lto ...
2022-03-25Merge tag 'platform-drivers-x86-v5.18-1' of ↵Linus Torvalds1-0/+203
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "New drivers: - AMD Host System Management Port (HSMP) - Intel Software Defined Silicon Removed drivers (functionality folded into other drivers): - intel_cht_int33fe_microb - surface3_button amd-pmc: - s2idle bug-fixes - Support for AMD Spill to DRAM STB feature hp-wmi: - Fix SW_TABLET_MODE detection method (and other fixes) - Support omen thermal profile policy v1 serial-multi-instantiate: - Add SPI device support - Add support for CS35L41 amplifiers used in new laptops think-lmi: - syfs-class-firmware-attributes Certificate authentication support thinkpad_acpi: - Fixes + quirks - Add platform_profile support on AMD based ThinkPads x86-android-tablets: - Improve Asus ME176C / TF103C support - Support Nextbook Ares 8, Lenovo Tab 2 830 and 1050 tablets Lots of various other small fixes and hardware-id additions" * tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (60 commits) platform/x86: think-lmi: Certificate authentication support Documentation: syfs-class-firmware-attributes: Lenovo Certificate support platform/x86: amd-pmc: Only report STB errors when STB enabled platform/x86: amd-pmc: Drop CPU QoS workaround platform/x86: amd-pmc: Output error codes in messages platform/x86: amd-pmc: Move to later in the suspend process ACPI / x86: Add support for LPS0 callback handler platform/x86: thinkpad_acpi: consistently check fan_get_status return. platform/x86: hp-wmi: support omen thermal profile policy v1 platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls platform/x86: hp-wmi: Fix SW_TABLET_MODE detection method platform/x86: hp-wmi: Fix hp_wmi_read_int() reporting error (0x05) platform/x86: amd-pmc: Validate entry into the deepest state on resume platform/x86: thinkpad_acpi: Don't use test_bit on an integer platform/x86: thinkpad_acpi: Fix compiler warning about uninitialized err variable platform/x86: thinkpad_acpi: clean up dytc profile convert platform/x86: x86-android-tablets: Depend on EFI and SPI platform/x86: amd-pmc: uninitialized variable in amd_pmc_s2d_init() platform/x86: intel-uncore-freq: fix uncore_freq_common_init() error codes ...
2022-03-15x86/ibt: Add IBT feature, MSR and #CP handlingPeter Zijlstra1-0/+2
The bits required to make the hardware go.. Of note is that, provided the syscall entry points are covered with ENDBR, #CP doesn't need to be an IST because we'll never hit the syscall gap. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.582331711@infradead.org
2022-03-02platform/x86: Add AMD system management interfaceSuma Hegde1-0/+203
Recent Fam19h EPYC server line of processors from AMD support system management functionality via HSMP (Host System Management Port) interface. The Host System Management Port (HSMP) is an interface to provide OS-level software with access to system management functions via a set of mailbox registers. More details on the interface can be found in chapter "7 Host System Management Port (HSMP)" of the following PPR https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip This patch adds new amd_hsmp module under the drivers/platforms/x86/ which creates miscdevice with an IOCTL interface to the user space. /dev/hsmp is for running the hsmp mailbox commands. Signed-off-by: Suma Hegde <suma.hegde@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Nathan Fontenot <nathan.fontenot@amd.com> Link: https://lore.kernel.org/r/20220222050501.18789-1-nchatrad@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-02-17shmbuf.h: add asm/shmbuf.h to UAPI compile-test coverageMasahiro Yamada1-1/+5
asm/shmbuf.h is currently excluded from the UAPI compile-test because of the errors like follows: HDRTEST usr/include/asm/shmbuf.h In file included from ./usr/include/asm/shmbuf.h:6, from <command-line>: ./usr/include/asm-generic/shmbuf.h:26:33: error: field ‘shm_perm’ has incomplete type 26 | struct ipc64_perm shm_perm; /* operation perms */ | ^~~~~~~~ ./usr/include/asm-generic/shmbuf.h:27:9: error: unknown type name ‘size_t’ 27 | size_t shm_segsz; /* size of segment (bytes) */ | ^~~~~~ ./usr/include/asm-generic/shmbuf.h:40:9: error: unknown type name ‘__kernel_pid_t’ 40 | __kernel_pid_t shm_cpid; /* pid of creator */ | ^~~~~~~~~~~~~~ ./usr/include/asm-generic/shmbuf.h:41:9: error: unknown type name ‘__kernel_pid_t’ 41 | __kernel_pid_t shm_lpid; /* pid of last operator */ | ^~~~~~~~~~~~~~ The errors can be fixed by replacing size_t with __kernel_size_t and by including proper headers. Then, remove the no-header-test entry from user/include/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-17signal.h: add linux/signal.h and asm/signal.h to UAPI compile-test coverageMasahiro Yamada1-1/+1
linux/signal.h and asm/signal.h are currently excluded from the UAPI compile-test because of the errors like follows: HDRTEST usr/include/asm/signal.h In file included from <command-line>: ./usr/include/asm/signal.h:103:9: error: unknown type name ‘size_t’ 103 | size_t ss_size; | ^~~~~~ The errors can be fixed by replacing size_t with __kernel_size_t. Then, remove the no-header-test entries from user/include/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-01-28KVM: x86: add system attribute to retrieve full set of supported xsave statesPaolo Bonzini1-0/+3
Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that have not been enabled. Probing those, for example so that they can be passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl. The latter is in fact worse, even though that is what the rest of the API uses, because it would require supported_xcr0 to be moved from the KVM module to the kernel just for this use. In addition, the value would be nonsensical (or an error would have to be returned) until the KVM module is loaded in. Therefore, to limit the growth of system ioctls, add a /dev/kvm variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86 with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14kvm: x86: Add support for getting/setting expanded xstate bufferGuang Zeng1-1/+15
With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set xstate data from/to KVM. This doesn't work when dynamic xfeatures (e.g. AMX) are exposed to the guest as they require a larger buffer size. Introduce a new capability (KVM_CAP_XSAVE2). Userspace VMM gets the required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). KVM_SET_XSAVE is extended to work with both legacy and new capabilities by doing properly-sized memdup_user() based on the guest fpu container. KVM_GET_XSAVE is kept for backward-compatible reason. Instead, KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred interface for getting xstate buffer (4KB or larger size) from KVM (Link: https://lkml.org/lkml/2021/12/15/510) Also, update the api doc with the new KVM_GET_XSAVE2 ioctl. Signed-off-by: Guang Zeng <guang.zeng@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-19-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07x86/fpu: Extend fpu_xstate_prctl() with guest permissionsThomas Gleixner1-12/+14
KVM requires a clear separation of host user space and guest permissions for dynamic XSTATE components. Add a guest permissions member to struct fpu and a separate set of prctl() arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM. The semantics are equivalent to the host user space permission control except for the following constraints: 1) Permissions have to be requested before the first vCPU is created 2) Permissions are frozen when the first vCPU is created to ensure consistency. Any attempt to expand permissions via the prctl() after that point is rejected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-2-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11Merge branch 'kvm-5.16-fixes' into kvm-masterPaolo Bonzini1-0/+1
* Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status * Fix selftests on APICv machines * Fix sparse warnings * Fix detection of KVM features in CPUID * Cleanups for bogus writes to MSR_KVM_PV_EOI_EN * Fixes and cleanups for MSR bitmap handling * Cleanups for INVPCID * Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures
2021-11-11KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURESPaul Durrant1-0/+1
Currently when kvm_update_cpuid_runtime() runs, it assumes that the KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true, however, if Hyper-V support is enabled. In this case the KVM leaves will be offset. This patch introdues as new 'kvm_cpuid_base' field into struct kvm_vcpu_arch to track the location of the KVM leaves and function kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now target the correct leaf. NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is intoduced into processor.h to avoid having duplicate code for the iteration over possible hypervisor base leaves. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Message-Id: <20211105095101.5384-3-pdurrant@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+4
Pull KVM updates from Paolo Bonzini: "ARM: - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us RISC-V: - New KVM port. x86: - New API to control TSC offset from userspace - TSC scaling for nested hypervisors on SVM - Switch masterclock protection from raw_spin_lock to seqcount - Clean up function prototypes in the page fault code and avoid repeated memslot lookups - Convey the exit reason to userspace on emulation failure - Configure time between NX page recovery iterations - Expose Predictive Store Forwarding Disable CPUID leaf - Allocate page tracking data structures lazily (if the i915 KVM-GT functionality is not compiled in) - Cleanups, fixes and optimizations for the shadow MMU code s390: - SIGP Fixes - initial preparations for lazy destroy of secure VMs - storage key improvements/fixes - Log the guest CPNC Starting from this release, KVM-PPC patches will come from Michael Ellerman's PPC tree" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) RISC-V: KVM: fix boolreturn.cocci warnings RISC-V: KVM: remove unneeded semicolon RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions RISC-V: KVM: Factor-out FP virtualization into separate sources KVM: s390: add debug statement for diag 318 CPNC data KVM: s390: pv: properly handle page flags for protected guests KVM: s390: Fix handle_sske page fault handling KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol KVM: x86: On emulation failure, convey the exit reason, etc. to userspace KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info KVM: x86: Clarify the kvm_run.emulation_failure structure layout KVM: s390: Add a routine for setting userspace CPU state KVM: s390: Simplify SIGP Set Arch handling KVM: s390: pv: avoid stalls when making pages secure KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm KVM: s390: pv: avoid double free of sida page KVM: s390: pv: add macros for UVC CC values s390/mm: optimize reset_guest_reference_bit() s390/mm: optimize set_guest_storage_key() s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present ...
2021-11-01Merge tag 'x86_sgx_for_v5.16_rc1' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX updates from Borislav Petkov: "Add a SGX_IOC_VEPC_REMOVE ioctl to the /dev/sgx_vepc virt interface with which EPC pages can be put back into their uninitialized state without having to reopen /dev/sgx_vepc, which could not be possible anymore after startup due to security policies" * tag 'x86_sgx_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx/virt: implement SGX_IOC_VEPC_REMOVE ioctl x86/sgx/virt: extract sgx_vepc_remove_page
2021-10-26x86/arch_prctl: Add controls for dynamic XSTATE componentsChang S. Bae1-0/+4
Dynamically enabled XSTATE features are by default disabled for all processes. A process has to request permission to use such a feature. To support this implement a architecture specific prctl() with the options: - ARCH_GET_XCOMP_SUPP Copies the supported feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_GET_XCOMP_PERM Copies the process wide permitted feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_REQ_XCOMP_PERM Request permission for a feature set. A feature set can be mapped to a facility, e.g. AMX, and can require one or more XSTATE components to be enabled. The feature argument is the number of the highest XSTATE component which is required for a facility to work. The request argument is not a user supplied bitmap because that makes filtering harder (think seccomp) and even impossible because to support 32bit tasks the argument would have to be a pointer. The permission mechanism works this way: Task asks for permission for a facility and kernel checks whether that's supported. If supported it does: 1) Check whether permission has already been granted 2) Compute the size of the required kernel and user space buffer (sigframe) size. 3) Validate that no task has a sigaltstack installed which is smaller than the resulting sigframe size 4) Add the requested feature bit(s) to the permission bitmap of current->group_leader->fpu and store the sizes in the group leaders fpu struct as well. If that is successful then the feature is still not enabled for any of the tasks. The first usage of a related instruction will result in a #NM trap. The trap handler validates the permission bit of the tasks group leader and if permitted it installs a larger kernel buffer and transfers the permission and size info to the new fpstate container which makes all the FPU functions which require per task information aware of the extended feature set. [ tglx: Adopted to new base code, added missing serialization, massaged namings, comments and changelog ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com
2021-10-22x86/sgx/virt: implement SGX_IOC_VEPC_REMOVE ioctlPaolo Bonzini1-0/+2
For bare-metal SGX on real hardware, the hardware provides guarantees SGX state at reboot. For instance, all pages start out uninitialized. The vepc driver provides a similar guarantee today for freshly-opened vepc instances, but guests such as Windows expect all pages to be in uninitialized state on startup, including after every guest reboot. Some userspace implementations of virtual SGX would rather avoid having to close and reopen the /dev/sgx_vepc file descriptor and re-mmap the virtual EPC. For example, they could sandbox themselves after the guest starts and forbid further calls to open(), in order to mitigate exploits from untrusted guests. Therefore, add a ioctl that does this with EREMOVE. Userspace can invoke the ioctl to bring its vEPC pages back to uninitialized state. There is a possibility that some pages fail to be removed if they are SECS pages, and the child and SECS pages could be in separate vEPC regions. Therefore, the ioctl returns the number of EREMOVE failures, telling userspace to try the ioctl again after it's done with all vEPC regions. A more verbose description of the correct usage and the possible error conditions is documented in sgx.rst. Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20211021201155.1523989-3-pbonzini@redhat.com
2021-10-18KVM: x86: Expose TSC offset controls to userspaceOliver Upton1-0/+4
To date, VMM-directed TSC synchronization and migration has been a bit messy. KVM has some baked-in heuristics around TSC writes to infer if the VMM is attempting to synchronize. This is problematic, as it depends on host userspace writing to the guest's TSC within 1 second of the last write. A much cleaner approach to configuring the guest's views of the TSC is to simply migrate the TSC offset for every vCPU. Offsets are idempotent, and thus not subject to change depending on when the VMM actually reads/writes values from/to KVM. The VMM can then read the TSC once with KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when the guest is paused. Cc: David Matlack <dmatlack@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210916181538.968978-8-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20KVM: x86: implement KVM_GUESTDBG_BLOCKIRQMaxim Levitsky1-0/+1
KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts while running. This change is mostly intended for more robust single stepping of the guest and it has the following benefits when enabled: * Resuming from a breakpoint is much more reliable. When resuming execution from a breakpoint, with interrupts enabled, more often than not, KVM would inject an interrupt and make the CPU jump immediately to the interrupt handler and eventually return to the breakpoint, to trigger it again. From the user point of view it looks like the CPU never executed a single instruction and in some cases that can even prevent forward progress, for example, when the breakpoint is placed by an automated script (e.g lx-symbols), which does something in response to the breakpoint and then continues the guest automatically. If the script execution takes enough time for another interrupt to arrive, the guest will be stuck on the same breakpoint RIP forever. * Normal single stepping is much more predictable, since it won't land the debugger into an interrupt handler. * RFLAGS.TF has less chance to be leaked to the guest: We set that flag behind the guest's back to do single stepping but if single step lands us into an interrupt/exception handler it will be leaked to the guest in the form of being pushed to the stack. This doesn't completely eliminate this problem as exceptions can still happen, but at least this reduces the chances of this happening. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-07Merge tag 'x86-fpu-2021-07-07' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Thomas Gleixner: "Fixes and improvements for FPU handling on x86: - Prevent sigaltstack out of bounds writes. The kernel unconditionally writes the FPU state to the alternate stack without checking whether the stack is large enough to accomodate it. Check the alternate stack size before doing so and in case it's too small force a SIGSEGV instead of silently corrupting user space data. - MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been updated despite the fact that the FPU state which is stored on the signal stack has grown over time which causes trouble in the field when AVX512 is available on a CPU. The kernel does not expose the minimum requirements for the alternate stack size depending on the available and enabled CPU features. ARM already added an aux vector AT_MINSIGSTKSZ for the same reason. Add it to x86 as well. - A major cleanup of the x86 FPU code. The recent discoveries of XSTATE related issues unearthed quite some inconsistencies, duplicated code and other issues. The fine granular overhaul addresses this, makes the code more robust and maintainable, which allows to integrate upcoming XSTATE related features in sane ways" * tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again x86/fpu/signal: Let xrstor handle the features to init x86/fpu/signal: Handle #PF in the direct restore path x86/fpu: Return proper error codes from user access functions x86/fpu/signal: Split out the direct restore code x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing() x86/fpu/signal: Sanitize the xstate check on sigframe x86/fpu/signal: Remove the legacy alignment check x86/fpu/signal: Move initial checks into fpu__restore_sig() x86/fpu: Mark init_fpstate __ro_after_init x86/pkru: Remove xstate fiddling from write_pkru() x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate() x86/fpu: Remove PKRU handling from switch_fpu_finish() x86/fpu: Mask PKRU from kernel XRSTOR[S] operations x86/fpu: Hook up PKRU into ptrace() x86/fpu: Add PKRU storage outside of task XSAVE buffer x86/fpu: Dont restore PKRU in fpregs_restore_userspace() x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi() x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs() x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs() ...
2021-06-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-0/+29
Pull kvm updates from Paolo Bonzini: "This covers all architectures (except MIPS) so I don't expect any other feature pull requests this merge window. ARM: - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes PPC: - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes S390: - new HW facilities for guests - make inline assembly more robust with KASAN and co x86: - Allow userspace to handle emulation errors (unknown instructions) - Lazy allocation of the rmap (host physical -> guest physical address) - Support for virtualizing TSC scaling on VMX machines - Optimizations to avoid shattering huge pages at the beginning of live migration - Support for initializing the PDPTRs without loading them from memory - Many TLB flushing cleanups - Refuse to load if two-stage paging is available but NX is not (this has been a requirement in practice for over a year) - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from CR0/CR4/EFER, using the MMU mode everywhere once it is computed from the CPU registers - Use PM notifier to notify the guest about host suspend or hibernate - Support for passing arguments to Hyper-V hypercalls using XMM registers - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on AMD processors - Hide Hyper-V hypercalls that are not included in the guest CPUID - Fixes for live migration of virtual machines that use the Hyper-V "enlightened VMCS" optimization of nested virtualization - Bugfixes (not many) Generic: - Support for retrieving statistics without debugfs - Cleanups for the KVM selftests API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits) KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled kvm: x86: disable the narrow guest module parameter on unload selftests: kvm: Allows userspace to handle emulation errors. kvm: x86: Allow userspace to handle emulation errors KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic KVM: x86: Enhance comments for MMU roles and nested transition trickiness KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU KVM: x86/mmu: Use MMU's role to determine PTTYPE KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers KVM: x86/mmu: Add a helper to calculate root from role_regs KVM: x86/mmu: Add helper to update paging metadata KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper KVM: x86/mmu: Get nested MMU's root level from the MMU's role ...
2021-06-17KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercallAshish Kalra1-0/+13
This hypercall is used by the SEV guest to notify a change in the page encryption status to the hypervisor. The hypercall should be invoked only when the encryption attribute is changed from encrypted -> decrypted and vice versa. By default all guest pages are considered encrypted. The hypercall exits to userspace to manage the guest shared regions and integrate with the userspace VMM's migration code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2Maxim Levitsky1-0/+13
This is a new version of KVM_GET_SREGS / KVM_SET_SREGS. It has the following changes: * Has flags for future extensions * Has vcpu's PDPTRs, allowing to save/restore them on migration. * Lacks obsolete interrupt bitmap (done now via KVM_SET_VCPU_EVENTS) New capability, KVM_CAP_SREGS2 is added to signal the userspace of this ioctl. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: SVM: Software reserved fieldsVineeth Pillai1-0/+3
SVM added support for certain reserved fields to be used by software or hypervisor. Add the following reserved fields: - VMCB offset 0x3e0 - 0x3ff - Clean bit 31 - SVM intercept exit code 0xf0000000 Later patches will make use of this for supporting Hyper-V nested virtualization enhancements. Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <a1f17a43a8e9e751a1a9cc0281649d71bdbf721b.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-21x86/elf: Use _BITUL() macro in UAPI headersJoe Richey1-2/+4
Replace BIT() in x86's UAPI header with _BITUL(). BIT() is not defined in the UAPI headers and its usage may cause userspace build errors. Fixes: 742c45c3ecc9 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2") Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210521085849.37676-2-joerichey94@gmail.com
2021-05-19x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZChang S. Bae1-2/+2
Historically, signal.h defines MINSIGSTKSZ (2KB) and SIGSTKSZ (8KB), for use by all architectures with sigaltstack(2). Over time, the hardware state size grew, but these constants did not evolve. Today, literal use of these constants on several architectures may result in signal stack overflow, and thus user data corruption. A few years ago, the ARM team addressed this issue by establishing getauxval(AT_MINSIGSTKSZ). This enables the kernel to supply a value at runtime that is an appropriate replacement on current and future hardware. Add getauxval(AT_MINSIGSTKSZ) support to x86, analogous to the support added for ARM in 94b07c1f8c39 ("arm64: signal: Report signal frame size to userspace via auxv"). Also, include a documentation to describe x86-specific auxiliary vectors. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210518200320.17239-4-chang.seok.bae@intel.com
2021-05-07KVM: nVMX: Properly pad 'struct kvm_vmx_nested_state_hdr'Vitaly Kuznetsov1-0/+2
Eliminate the probably unwanted hole in 'struct kvm_vmx_nested_state_hdr': Pre-patch: struct kvm_vmx_nested_state_hdr { __u64 vmxon_pa; /* 0 8 */ __u64 vmcs12_pa; /* 8 8 */ struct { __u16 flags; /* 16 2 */ } smm; /* 16 2 */ /* XXX 2 bytes hole, try to pack */ __u32 flags; /* 20 4 */ __u64 preemption_timer_deadline; /* 24 8 */ }; Post-patch: struct kvm_vmx_nested_state_hdr { __u64 vmxon_pa; /* 0 8 */ __u64 vmcs12_pa; /* 8 8 */ struct { __u16 flags; /* 16 2 */ } smm; /* 16 2 */ __u16 pad; /* 18 2 */ __u32 flags; /* 20 4 */ __u64 preemption_timer_deadline; /* 24 8 */ }; Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210503150854.1144255-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...
2021-04-26Merge tag 'x86-splitlock-2021-04-26' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 bus lock detection updates from Thomas Gleixner: "Support for enhanced split lock detection: Newer CPUs provide a second mechanism to detect operations with lock prefix which go accross a cache line boundary. Such operations have to take bus lock which causes a system wide performance degradation when these operations happen frequently. The new mechanism is not using the #AC exception. It triggers #DB and is restricted to operations in user space. Kernel side split lock access can only be detected by the #AC based variant. Contrary to the #AC based mechanism the #DB based variant triggers _after_ the instruction was executed. The mechanism is CPUID enumerated and contrary to the #AC version which is based on the magic TEST_CTRL_MSR and model/family based enumeration on the way to become architectural" * tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/admin-guide: Change doc for split_lock_detect parameter x86/traps: Handle #DB for bus lock x86/cpufeatures: Enumerate #DB for bus lock detection
2021-04-20KVM: VMX: Add basic handling of VM-Exit from SGX enclaveSean Christopherson1-0/+1
Add support for handling VM-Exits that originate from a guest SGX enclave. In SGX, an "enclave" is a new CPL3-only execution environment, wherein the CPU and memory state is protected by hardware to make the state inaccesible to code running outside of the enclave. When exiting an enclave due to an asynchronous event (from the perspective of the enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state is automatically saved and scrubbed (the CPU loads synthetic state), and then reloaded when re-entering the enclave. E.g. after an instruction based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP of the enclave instruction that trigered VM-Exit, but will instead point to a RIP in the enclave's untrusted runtime (the guest userspace code that coordinates entry/exit to/from the enclave). To help a VMM recognize and handle exits from enclaves, SGX adds bits to existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR. Define the new architectural bits, and add a boolean to struct vcpu_vmx to cache VMX_EXIT_REASON_FROM_ENCLAVE. Clear the bit in exit_reason so that checks against exit_reason do not need to account for SGX, e.g. "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work. KVM is a largely a passive observer of the new bits, e.g. KVM needs to account for the bits when propagating information to a nested VMM, but otherwise doesn't need to act differently for the majority of VM-Exits from enclaves. The one scenario that is directly impacted is emulation, which is for all intents and purposes impossible[1] since KVM does not have access to the RIP or instruction stream that triggered the VM-Exit. The inability to emulate is a non-issue for KVM, as most instructions that might trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit check. For the few instruction that conditionally #UD, KVM either never sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only if the feature is not exposed to the guest in order to inject a #UD, e.g. RDRAND_EXITING. But, because it is still possible for a guest to trigger emulation, e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit from an enclave. This is architecturally accurate for instruction VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable to killing the VM. In practice, only broken or particularly stupid guests should ever encounter this behavior. Add a WARN in skip_emulated_instruction to detect any attempt to modify the guest's RIP during an SGX enclave VM-Exit as all such flows should either be unreachable or must handle exits from enclaves before getting to skip_emulated_instruction. [1] Impossible for all practical purposes. Not truly impossible since KVM could implement some form of para-virtualization scheme. [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at CPL3, so we also don't need to worry about that interaction. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-28x86/traps: Handle #DB for bus lockFenghua Yu1-0/+1
Bus locks degrade performance for the whole system, not just for the CPU that requested the bus lock. Two CPU features "#AC for split lock" and "#DB for bus lock" provide hooks so that the operating system may choose one of several mitigation strategies. #AC for split lock is already implemented. Add code to use the #DB for bus lock feature to cover additional situations with new options to mitigate. split_lock_detect= #AC for split lock #DB for bus lock off Do nothing Do nothing warn Kernel OOPs Warn once per task and Warn once per task and and continues to run. disable future checking When both features are supported, warn in #AC fatal Kernel OOPs Send SIGBUS to user. Send SIGBUS to user When both features are supported, fatal in #AC ratelimit:N Do nothing Limit bus lock rate to N per second in the current non-root user. Default option is "warn". Hardware only generates #DB for bus lock detect when CPL>0 to avoid nested #DB from multiple bus locks while the first #DB is being handled. So no need to handle #DB for bus lock detected in the kernel. #DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR while #AC for split lock is enabled by split lock detection bit 29 in TEST_CTRL MSR. Both breakpoint and bus lock in the same instruction can trigger one #DB. The bus lock is handled before the breakpoint in the #DB handler. Delivery of #DB for bus lock in userspace clears DR6[11], which is set by the #DB handler right after reading DR6. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com
2021-03-18x86: Fix various typos in commentsIngo Molnar5-6/+6
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-02-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-1/+4
Pull KVM updates from Paolo Bonzini: "x86: - Support for userspace to emulate Xen hypercalls - Raise the maximum number of user memslots - Scalability improvements for the new MMU. Instead of the complex "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent, but the code that can run against page faults is limited. Right now only page faults take the lock for reading; in the future this will be extended to some cases of page table destruction. I hope to switch the default MMU around 5.12-rc3 (some testing was delayed due to Chinese New Year). - Cleanups for MAXPHYADDR checks - Use static calls for vendor-specific callbacks - On AMD, use VMLOAD/VMSAVE to save and restore host state - Stop using deprecated jump label APIs - Workaround for AMD erratum that made nested virtualization unreliable - Support for LBR emulation in the guest - Support for communicating bus lock vmexits to userspace - Add support for SEV attestation command - Miscellaneous cleanups PPC: - Support for second data watchpoint on POWER10 - Remove some complex workarounds for buggy early versions of POWER9 - Guest entry/exit fixes ARM64: - Make the nVHE EL2 object relocatable - Cleanups for concurrent translation faults hitting the same page - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Simplification of the early init hypercall handling Non-KVM changes (with acks): - Detection of contended rwlocks (implemented only for qrwlocks, because KVM only needs it for x86) - Allow __DISABLE_EXPORTS from assembly code - Provide a saner follow_pfn replacements for modules" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits) KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes KVM: selftests: Don't bother mapping GVA for Xen shinfo test KVM: selftests: Fix hex vs. decimal snafu in Xen test KVM: selftests: Fix size of memslots created by Xen tests KVM: selftests: Ignore recently added Xen tests' build output KVM: selftests: Add missing header file needed by xAPIC IPI tests KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static locking/arch: Move qrwlock.h include after qspinlock.h KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path KVM: PPC: remove unneeded semicolon KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest KVM: PPC: Book3S HV: Fix radix guest SLB side channel KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR ...
2021-02-04KVM: VMX: Enable bus lock VM exitChenyi Qiang2-1/+4
Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-21x86/vm86/32: Remove VM86_SCREEN_BITMAP supportAndy Lutomirski1-2/+2
The implementation was rather buggy. It unconditionally marked PTEs read-only, even for VM_SHARED mappings. I'm not sure whether this is actually a problem, but it certainly seems unwise. More importantly, it released the mmap lock before flushing the TLB, which could allow a racing CoW operation to falsely believe that the underlying memory was not writable. I can't find any users at all of this mechanism, so just remove it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Stas Sergeev <stsp2@yandex.ru> Link: https://lkml.kernel.org/r/f3086de0babcab36f69949b5780bde851f719bc8.1611078018.git.luto@kernel.org
2020-12-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-0/+31
Pull KVM updates from Paolo Bonzini: "Much x86 work was pushed out to 5.12, but ARM more than made up for it. ARM: - PSCI relay at EL2 when "protected KVM" is enabled - New exception injection code - Simplification of AArch32 system register handling - Fix PMU accesses when no PMU is enabled - Expose CSV3 on non-Meltdown hosts - Cache hierarchy discovery fixes - PV steal-time cleanups - Allow function pointers at EL2 - Various host EL2 entry cleanups - Simplification of the EL2 vector allocation s390: - memcg accouting for s390 specific parts of kvm and gmap - selftest for diag318 - new kvm_stat for when async_pf falls back to sync x86: - Tracepoints for the new pagetable code from 5.10 - Catch VFIO and KVM irqfd events before userspace - Reporting dirty pages to userspace with a ring buffer - SEV-ES host support - Nested VMX support for wait-for-SIPI activity state - New feature flag (AVX512 FP16) - New system ioctl to report Hyper-V-compatible paravirtualization features Generic: - Selftest improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits) KVM: SVM: fix 32-bit compilation KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting KVM: SVM: Provide support to launch and run an SEV-ES guest KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests KVM: SVM: Provide support for SEV-ES vCPU loading KVM: SVM: Provide support for SEV-ES vCPU creation/loading KVM: SVM: Update ASID allocation to support SEV-ES guests KVM: SVM: Set the encryption mask for the SVM host save area KVM: SVM: Add NMI support for an SEV-ES guest KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest KVM: SVM: Do not report support for SMM for an SEV-ES guest KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES KVM: SVM: Add support for CR8 write traps for an SEV-ES guest KVM: SVM: Add support for CR4 write traps for an SEV-ES guest KVM: SVM: Add support for CR0 write traps for an SEV-ES guest KVM: SVM: Add support for EFER write traps for an SEV-ES guest KVM: SVM: Support string IO operations for an SEV-ES guest KVM: SVM: Support MMIO for an SEV-ES guest KVM: SVM: Create trace events for VMGEXIT MSR protocol processing KVM: SVM: Create trace events for VMGEXIT processing ...
2020-12-15KVM: SVM: Add support for CR8 write traps for an SEV-ES guestTom Lendacky1-0/+1
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES guests introduce new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR8 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5a01033f4c8b3106ca9374b7cadf8e33da852df1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for CR4 write traps for an SEV-ES guestTom Lendacky1-0/+1
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES guests introduce new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR4 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c3880bf2db8693aa26f648528fbc6e967ab46e25.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>