summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/fpu/xstate.c
AgeCommit message (Collapse)AuthorFilesLines
2022-11-16x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not setKyle Huey1-1/+14
The hardware XRSTOR instruction resets the PKRU register to its hardware init value (namely 0) if the PKRU bit is not set in the xfeatures mask. Emulating that here restores the pre-5.14 behavior for PTRACE_SET_REGSET with NT_X86_XSTATE, and makes sigreturn (which still uses XRSTOR) and ptrace behave identically. KVM has never used XRSTOR and never had this behavior, so KVM opts-out of this emulation by passing a NULL pkru pointer to copy_uabi_to_xstate(). Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()") Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-6-khuey%40kylehuey.com
2022-11-16x86/fpu: Allow PKRU to be (once again) written by ptrace.Kyle Huey1-1/+20
Move KVM's PKRU handling code in fpu_copy_uabi_to_guest_fpstate() to copy_uabi_to_xstate() so that it is shared with other APIs that write the XSTATE such as PTRACE_SETREGSET with NT_X86_XSTATE. This restores the pre-5.14 behavior of ptrace. The regression can be seen by running gdb and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected kernels (5.14+) the write to the PKRU register (which gdb performs through ptrace) is ignored. [ dhansen: removed stable@ tag for now. The ABI was broken for long enough that this is not urgent material. Let's let it stew in tip for a few weeks before it's submitted to stable because there are so many ABIs potentially affected. ] Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()") Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-5-khuey%40kylehuey.com
2022-11-16x86/fpu: Add a pkru argument to copy_uabi_to_xstate()Kyle Huey1-3/+13
In preparation for moving PKRU handling code out of fpu_copy_uabi_to_guest_fpstate() and into copy_uabi_to_xstate(), add an argument that copy_uabi_from_kernel_to_xstate() can use to pass the canonical location of the PKRU value. For copy_sigframe_from_user_to_xstate() the kernel will actually restore the PKRU value from the fpstate, but pass in the thread_struct's pkru location anyways for consistency. Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-4-khuey%40kylehuey.com
2022-11-16x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate().Kyle Huey1-1/+1
Both KVM (through KVM_SET_XSTATE) and ptrace (through PTRACE_SETREGSET with NT_X86_XSTATE) ultimately call copy_uabi_from_kernel_to_xstate(), but the canonical locations for the current PKRU value for KVM guests and processes in a ptrace stop are different (in the kvm_vcpu_arch and the thread_state structs respectively). In preparation for eventually handling PKRU in copy_uabi_to_xstate, pass in a pointer to the PKRU location. Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-3-khuey%40kylehuey.com
2022-11-16x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate()Kyle Huey1-2/+2
This will allow copy_sigframe_from_user_to_xstate() to grab the address of thread_struct's pkru value in a later patch. Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-2-khuey%40kylehuey.com
2022-11-09x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnosticsAndrew Cooper1-6/+6
"XSAVE consistency problem" has been reported under Xen, but that's the extent of my divination skills. Modify XSTATE_WARN_ON() to force the caller to provide relevant diagnostic information, and modify each caller suitably. For check_xstate_against_struct(), this removes a double WARN() where one will do perfectly fine. CC stable as this has been wonky debugging for 7 years and it is good to have there too. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220810221909.12768-1-andrew.cooper3@citrix.com
2022-10-21x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctlyChang S. Bae1-0/+9
When an extended state component is not present in fpstate, but in init state, the function copies from init_fpstate via copy_feature(). But, dynamic states are not present in init_fpstate because of all-zeros init states. Then retrieving them from init_fpstate will explode like this: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... RIP: 0010:memcpy_erms+0x6/0x10 ? __copy_xstate_to_uabi_buf+0x381/0x870 fpu_copy_guest_fpstate_to_uabi+0x28/0x80 kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm] ? __this_cpu_preempt_check+0x13/0x20 ? vmx_vcpu_put+0x2e/0x260 [kvm_intel] kvm_vcpu_ioctl+0xea/0x6b0 [kvm] ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm] ? __fget_light+0xd4/0x130 __x64_sys_ioctl+0xe3/0x910 ? debug_smp_processor_id+0x17/0x20 ? fpregs_assert_state_consistent+0x27/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Adjust the 'mask' to zero out the userspace buffer for the features that are not available both from fpstate and from init_fpstate. The dynamic features depend on the compacted XSAVE format. Ensure it is enabled before reading XCOMP_BV in init_fpstate. Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode") Reported-by: Yuan Yao <yuan.yao@intel.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/ Link: https://lkml.kernel.org/r/20221021185844.13472-1-chang.seok.bae@intel.com
2022-10-17x86/fpu: Exclude dynamic states from init_fpstateChang S. Bae1-3/+6
== Background == The XSTATE init code initializes all enabled and supported components. Then, the init states are saved in the init_fpstate buffer that is statically allocated in about one page. The AMX TILE_DATA state is large (8KB) but its init state is zero. And the feature comes only with the compacted format with these established dependencies: AMX->XFD->XSAVES. So this state is excludable from init_fpstate. == Problem == But the buffer is formatted to include that large state. Then, this can be the cause of a noisy splat like the below. This came from XRSTORS for the task with init_fpstate in its XSAVE buffer. It is reproducible on AMX systems when the running kernel is built with CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y: Bad FPU state detected at restore_fpregs_from_fpstate+0x57/0xd0, reinitializing FPU registers. ... RIP: 0010:restore_fpregs_from_fpstate+0x57/0xd0 ? restore_fpregs_from_fpstate+0x45/0xd0 switch_fpu_return+0x4e/0xe0 exit_to_user_mode_prepare+0x17b/0x1b0 syscall_exit_to_user_mode+0x29/0x40 do_syscall_64+0x67/0x80 ? do_syscall_64+0x67/0x80 ? exc_page_fault+0x86/0x180 entry_SYSCALL_64_after_hwframe+0x63/0xcd == Solution == Adjust init_fpstate to exclude dynamic states. XRSTORS from init_fpstate still initializes those states when their bits are set in the requested-feature bitmap. Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode") Reported-by: Lin X Wang <lin.x.wang@intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Lin X Wang <lin.x.wang@intel.com> Link: https://lore.kernel.org/r/20220824191223.1248-4-chang.seok.bae@intel.com
2022-10-17x86/fpu: Fix the init_fpstate size check with the actual sizeChang S. Bae1-18/+6
The init_fpstate buffer is statically allocated. Thus, the sanity test was established to check whether the pre-allocated buffer is enough for the calculated size or not. The currently measured size is not strictly relevant. Fix to validate the calculated init_fpstate size with the pre-allocated area. Also, replace the sanity check function with open code for clarity. The abstraction itself and the function naming do not tend to represent simply what it does. Fixes: 2ae996e0c1a3 ("x86/fpu: Calculate the default sizes independently") Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220824191223.1248-3-chang.seok.bae@intel.com
2022-10-17x86/fpu: Configure init_fpstate attributes orderlyChang S. Bae1-1/+5
The init_fpstate setup code is spread out and out of order. The init image is recorded before its scoped features and the buffer size are determined. Determine the scope of init_fpstate components and its size before recording the init state. Also move the relevant code together. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: neelnatu@google.com Link: https://lore.kernel.org/r/20220824191223.1248-2-chang.seok.bae@intel.com
2022-05-23Merge tag 'x86_fpu_for_v5.19_rc1' of ↵Linus Torvalds1-21/+39
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Borislav Petkov: - Add support for XSAVEC - the Compacted XSTATE saving variant - and thus allow for guests to use this compacted XSTATE variant when the hypervisor exports that support - A variable shadowing cleanup * tag 'x86_fpu_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Cleanup variable shadowing x86/fpu/xsave: Support XSAVEC in the kernel
2022-05-13x86/prctl: Remove pointless task argumentThomas Gleixner1-4/+1
The functions invoked via do_arch_prctl_common() can only operate on the current task and none of these function uses the task argument. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/87lev7vtxj.ffs@tglx
2022-05-02x86/fpu: Cleanup variable shadowingThomas Gleixner1-1/+1
Addresses: warning: Local variable 'mask' shadows outer variable Remove extra variable declaration and switch the bit mask assignment to use BIT_ULL() while at it. Fixes: 522e92743b35 ("x86/fpu: Deduplicate copy_uabi_from_user/kernel_to_xstate()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/202204262032.jFYKit5j-lkp@intel.com
2022-04-25x86/fpu/xsave: Support XSAVEC in the kernelThomas Gleixner1-20/+38
XSAVEC is the user space counterpart of XSAVES which cannot save supervisor state. In virtualization scenarios the hypervisor does not expose XSAVES but XSAVEC to the guest, though the kernel does not make use of it. That's unfortunate because XSAVEC uses the compacted format of saving the XSTATE. This is more efficient in terms of storage space vs. XSAVE[OPT] as it does not create holes for XSTATE components which are not supported or enabled by the kernel but are available in hardware. There is room for further optimizations when XSAVEC/S and XGETBV1 are supported. In order to support XSAVEC: - Define the XSAVEC ASM macro as it's not yet supported by the required minimal toolchain. - Create a software defined X86_FEATURE_XCOMPACTED to select the compacted XSTATE buffer format for both XSAVEC and XSAVES. - Make XSAVEC an option in the 'XSAVE' ASM alternatives Requested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220404104820.598704095@linutronix.de
2022-03-30x86/fpu/xstate: Consolidate size calculationsThomas Gleixner1-41/+8
Use the offset calculation to do the size calculation which avoids yet another series of CPUID instructions for each invocation. [ Fix the FP/SSE only case which missed to take the xstate header into account, as Reported-by: kernel test robot <oliver.sang@intel.com> ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/87o81pgbp2.ffs@tglx
2022-03-30x86/fpu/xstate: Handle supervisor states in XSTATE permissionsThomas Gleixner1-0/+3
The size calculation in __xstate_request_perm() fails to take supervisor states into account because the permission bitmap is only relevant for user states. Up to 5.17 this does not matter because there are no supervisor states supported, but the (re-)enabling of ENQCMD makes them available. Fixes: 7c1ef59145f1 ("x86/cpufeatures: Re-enable ENQCMD") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.681768598@linutronix.de
2022-03-30x86/fpu/xsave: Handle compacted offsets correctly with supervisor statesThomas Gleixner1-45/+41
So far the cached fixed compacted offsets worked, but with (re-)enabling of ENQCMD this does no longer work with KVM fpstate. KVM does not have supervisor features enabled for the guest FPU, which means that KVM has then a different XSAVE area layout than the host FPU state. This in turn breaks the copy from/to UABI functions when invoked for a guest state. Remove the pre-calculated compacted offsets and calculate the offset of each component at runtime based on the XCOMP_BV field in the XSAVE header. The runtime overhead is not interesting because these copy from/to UABI functions are not used in critical fast paths. KVM uses them to save and restore FPU state during migration. The host uses them for ptrace and for the slow path of 32bit signal handling. Fixes: 7c1ef59145f1 ("x86/cpufeatures: Re-enable ENQCMD") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.627636809@linutronix.de
2022-03-30x86/fpu: Cache xfeature flags from CPUIDThomas Gleixner1-36/+13
In preparation for runtime calculation of XSAVE offsets cache the feature flags for each XSTATE component during feature enumeration via CPUID(0xD). EDX has two relevant bits: 0 Supervisor component 1 Feature storage must be 64 byte aligned These bits are currently only evaluated during init, but the alignment bit must be cached to make runtime calculation of XSAVE offsets efficient. Cache the full EDX content and use it for the existing alignment and supervisor checks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.573656209@linutronix.de
2022-03-30x86/fpu/xsave: Initialize offset/size cache earlyThomas Gleixner1-2/+5
Reading XSTATE feature information from CPUID over and over does not make sense. The information has to be cached anyway, so it can be done early. Prepare for runtime calculation of XSTATE offsets and allow consolidation of the size calculation functions in a later step. Rename the function while at it as it does not setup any features. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.519411939@linutronix.de
2022-03-30x86/fpu: Remove unused supervisor only offsetsThomas Gleixner1-30/+0
No users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.465066249@linutronix.de
2022-03-23x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementationYang Zhong1-1/+1
ARCH_REQ_XCOMP_PERM is supposed to add the requested feature to the permission bitmap of thread_group_leader()->fpu. But the code overwrites the bitmap with the requested feature bit only rather than adding it. Fix the code to add the requested feature bit to the master bitmask. Fixes: db8268df0983 ("x86/arch_prctl: Add controls for dynamic XSTATE components") Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <bonzini@gnu.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220129173647.27981-2-chang.seok.bae@intel.com
2022-02-17x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0Leonardo Bras1-1/+4
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate(). When xsave feature is available, the fpu swap is done by: - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used to store the current state of the fpu registers to a buffer. - xrstor(s) instruction, with (fpu_kernel_cfg.max_features & XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs. For xsave(s) the mask is used to limit what parts of the fpu regs will be copied to the buffer. Likewise on xrstor(s), the mask is used to limit what parts of the fpu regs will be changed. The mask for xsave(s), the guest's fpstate->xfeatures, is defined on kvm_arch_vcpu_create(), which (in summary) sets it to all features supported by the cpu which are enabled on kernel config. This means that xsave(s) will save to guest buffer all the fpu regs contents the cpu has enabled when the guest is paused, even if they are not used. This would not be an issue, if xrstor(s) would also do that. xrstor(s)'s mask for host/guest swap is basically every valid feature contained in kernel config, except XFEATURE_MASK_PKRU. Accordingto kernel src, it is instead switched in switch_to() and flush_thread(). Then, the following happens with a host supporting PKRU starts a guest that does not support it: 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest, 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU) 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU) 4 - guest runs, then switch back to host, 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU) 6 - xrstor(s) host fpstate to fpu regs. 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with XFEATURE_MASK_PKRU, which should not be supported by guest vcpu) On 5, even though the guest does not support PKRU, it does have the flag set on guest fpstate, which is transferred to userspace via vcpu ioctl KVM_GET_XSAVE. This becomes a problem when the user decides on migrating the above guest to another machine that does not support PKRU: the new host restores guest's fpu regs to as they were before (xrstor(s)), but since the new host don't support PKRU, a general-protection exception ocurs in xrstor(s) and that crashes the guest. This can be solved by making the guest's fpstate->user_xfeatures hold a copy of guest_supported_xcr0. This way, on 7 the only flags copied to userspace will be the ones compatible to guest requirements, and thus there will be no issue during migration. As a bonus, it will also fail if userspace tries to set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to the guest configuration. Such features will never be returned by KVM_GET_XSAVE or KVM_GET_XSAVE2. Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures, there is not need to set it in kvm_check_cpuid(). So, change fpstate_realloc() so it does not touch fpstate->user_xfeatures if a non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid() calls it. Signed-off-by: Leonardo Bras <leobras@redhat.com> Message-Id: <20220217053028.96432-2-leobras@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Fix inline prefix warningsYang Zhong1-1/+1
Fix sparse warnings in xstate and remove inline prefix. Fixes: 980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions") Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220113180825.322333-1-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Add uabi_size to guest_fpuThomas Gleixner1-0/+1
Userspace needs to inquire KVM about the buffer size to work with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info to guest_fpu for KVM to access. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-18-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Add guest support to xfd_enable_feature()Thomas Gleixner1-39/+54
Guest support for dynamically enabled FPU features requires a few modifications to the enablement function which is currently invoked from the #NM handler: 1) Use guest permissions and sizes for the update 2) Update fpu_guest state accordingly 3) Take into account that the enabling can be triggered either from a running guest via XSETBV and MSR_IA32_XFD write emulation or from a guest restore. In the latter case the guests fpstate is not the current tasks active fpstate. Split the function and implement the guest mechanics throughout the callchain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-7-yang.zhong@intel.com> [Add 32-bit stub for __xfd_enable_feature. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07x86/fpu: Extend fpu_xstate_prctl() with guest permissionsThomas Gleixner1-14/+39
KVM requires a clear separation of host user space and guest permissions for dynamic XSTATE components. Add a guest permissions member to struct fpu and a separate set of prctl() arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM. The semantics are equivalent to the host user space permission control except for the following constraints: 1) Permissions have to be requested before the first vCPU is created 2) Permissions are frozen when the first vCPU is created to ensure consistency. Any attempt to expand permissions via the prctl() after that point is rejected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-2-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-26x86/fpu/amx: Enable the AMX feature in 64-bit modeChang S. Bae1-2/+3
Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the TILE_DATA component to the dynamic states and update the permission check table accordingly. This is only effective on 64 bit kernels as for 32bit kernels XFEATURE_MASK_TILE is defined as 0. TILE_DATA is caller-saved state and the only dynamic state. Add build time sanity check to ensure the assumption that every dynamic feature is caller- saved. Make AMX state depend on XFD as it is dynamic feature. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add XFD handling for dynamic statesChang S. Bae1-1/+27
To handle the dynamic sizing of buffers on first use the XFD MSR has to be armed. Store the delta between the maximum available and the default feature bits in init_fpstate where it can be retrieved for task creation. If the delta is non zero then dynamic features are enabled. This needs also to enable the static key which guards the XFD updates. This is delayed to an initcall because the FPU setup runs before jump labels are initialized. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-23-chang.seok.bae@intel.com
2021-10-26x86/fpu: Calculate the default sizes independentlyChang S. Bae1-9/+21
When dynamically enabled states are supported the maximum and default sizes for the kernel buffers and user space interfaces are not longer identical. Put the necessary calculations in place which only take the default enabled features into account. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-22-chang.seok.bae@intel.com
2021-10-26x86/fpu/amx: Define AMX state components and have it used for boot-time checksChang S. Bae1-1/+79
The XSTATE initialization uses check_xstate_against_struct() to sanity check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature, and its size is not hard-coded but discoverable at run-time via CPUID. The AMX state is composed of state components 17 and 18, which are all user state components. The first component is the XTILECFG state of a 64-byte tile-related control register. The state component 18, called XTILEDATA, contains the actual tile data, and the state size varies on implementations. The architectural maximum, as defined in the CPUID(0x1d, 1): EAX[15:0], is a byte less than 64KB. The first implementation supports 8KB. Check the XTILEDATA state size dynamically. The feature introduces the new tile register, TMM. Define one register struct only and read the number of registers from CPUID. Cross-check the overall size with CPUID again. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbersChang S. Bae1-13/+16
The kernel checks at boot time which features are available by walking a XSAVE feature table which contains the CPUID feature bit numbers which need to be checked whether a feature is available on a CPU or not. So far the feature numbers have been linear, but AMX will create a gap which the current code cannot handle. Make the table entries explicitly indexed and adjust the loop code accordingly to prepare for that. No functional change. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-20-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Add fpstate_realloc()/free()Chang S. Bae1-8/+89
The fpstate embedded in struct fpu is the default state for storing the FPU registers. It's sized so that the default supported features can be stored. For dynamically enabled features the register buffer is too small. The #NM handler detects first use of a feature which is disabled in the XFD MSR. After handling permission checks it recalculates the size for kernel space and user space state and invokes fpstate_realloc() which tries to reallocate fpstate and install it. Provide the allocator function which checks whether the current buffer size is sufficient and if not allocates one. If allocation is successful the new fpstate is initialized with the new features and sizes and the now enabled features is removed from the task's XFD mask. realloc_fpstate() uses vzalloc(). If use of this mechanism grows to re-allocate buffers larger than 64KB, a more sophisticated allocation scheme that includes purpose-built reclaim capability might be justified. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Add XFD #NM handlerChang S. Bae1-0/+47
If the XFD MSR has feature bits set then #NM will be raised when user space attempts to use an instruction related to one of these features. When the task has no permissions to use that feature, raise SIGILL, which is the same behavior as #UD. If the task has permissions, calculate the new buffer size for the extended feature set and allocate a larger fpstate. In the unlikely case that vzalloc() fails, SIGSEGV is raised. The allocation function will be added in the next step. Provide a stub which fails for now. [ tglx: Updated serialization ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
2021-10-26x86/fpu: Update XFD state where requiredChang S. Bae1-0/+12
The IA32_XFD_MSR allows to arm #NM traps for XSTATE components which are enabled in XCR0. The register has to be restored before the tasks XSTATE is restored. The life time rules are the same as for FPU state. XFD is updated on return to userspace only when the FPU state of the task is not up to date in the registers. It's updated before the XRSTORS so that eventually enabled dynamic features are restored as well and not brought into init state. Also in signal handling for restoring FPU state from user space the correctness of the XFD state has to be ensured. Add it to CPU initialization and resume as well. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-17-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add sanity checks for XFDThomas Gleixner1-0/+58
Add debug functionality to ensure that the XFD MSR is up to date for XSAVE* and XRSTOR* operations. [ tglx: Improve comment. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-16-chang.seok.bae@intel.com
2021-10-26x86/fpu/signal: Prepare for variable sigframe lengthChang S. Bae1-1/+0
The software reserved portion of the fxsave frame in the signal frame is copied from structures which have been set up at boot time. With dynamically enabled features the content of these structures is no longer correct because the xfeatures and size can be different per task. Calculate the software reserved portion at runtime and fill in the xfeatures and size values from the tasks active fpstate. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-10-chang.seok.bae@intel.com
2021-10-26x86/arch_prctl: Add controls for dynamic XSTATE componentsChang S. Bae1-0/+156
Dynamically enabled XSTATE features are by default disabled for all processes. A process has to request permission to use such a feature. To support this implement a architecture specific prctl() with the options: - ARCH_GET_XCOMP_SUPP Copies the supported feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_GET_XCOMP_PERM Copies the process wide permitted feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_REQ_XCOMP_PERM Request permission for a feature set. A feature set can be mapped to a facility, e.g. AMX, and can require one or more XSTATE components to be enabled. The feature argument is the number of the highest XSTATE component which is required for a facility to work. The request argument is not a user supplied bitmap because that makes filtering harder (think seccomp) and even impossible because to support 32bit tasks the argument would have to be a pointer. The permission mechanism works this way: Task asks for permission for a facility and kernel checks whether that's supported. If supported it does: 1) Check whether permission has already been granted 2) Compute the size of the required kernel and user space buffer (sigframe) size. 3) Validate that no task has a sigaltstack installed which is smaller than the resulting sigframe size 4) Add the requested feature bit(s) to the permission bitmap of current->group_leader->fpu and store the sizes in the group leaders fpu struct as well. If that is successful then the feature is still not enabled for any of the tasks. The first usage of a related instruction will result in a #NM trap. The trap handler validates the permission bit of the tasks group leader and if permitted it installs a larger kernel buffer and transfers the permission and size info to the new fpstate container which makes all the FPU functions which require per task information aware of the extended feature set. [ tglx: Adopted to new base code, added missing serialization, massaged namings, comments and changelog ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Provide xstate_calculate_size()Chang S. Bae1-18/+28
Split out the size calculation from the paranoia check so it can be used for recalculating buffer sizes when dynamically enabled features are supported. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> [ tglx: Adopted to changed base code ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-4-chang.seok.bae@intel.com
2021-10-22x86/fpu: Rework restore_regs_from_fpstate()Thomas Gleixner1-1/+1
xfeatures_mask_fpstate() is no longer valid when dynamically enabled features come into play. Rework restore_regs_from_fpstate() so it takes a constant mask which will then be applied against the maximum feature set so that the restore operation brings all features which are not in the xsave buffer xfeature bitmap into init state. This ensures that if the previous task used a dynamically enabled feature that the task which restores has all unused components properly initialized. Cleanup the last user of xfeatures_mask_fpstate() as well and remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211014230739.461348278@linutronix.de
2021-10-22x86/fpu: Mop up xfeatures_mask_uabi()Thomas Gleixner1-3/+3
Use the new fpu_user_cfg to retrieve the information instead of xfeatures_mask_uabi() which will be no longer correct when dynamically enabled features become available. Using fpu_user_cfg is appropriate when setting XCOMP_BV in the init_fpstate since it has space allocated for "max_features". But, normal fpstates might only have space for default xfeatures. Since XRSTOR* derives the format of the XSAVE buffer from XCOMP_BV, this can lead to XRSTOR reading out of bounds. So when copying actively used fpstate, simply read the XCOMP_BV features bits directly out of the fpstate instead. This correction courtesy of Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211014230739.408879849@linutronix.de
2021-10-21x86/fpu: Move xstate feature masks to fpu_*_cfgThomas Gleixner1-28/+29
Move the feature mask storage to the kernel and user config structs. Default and maximum feature set are the same for now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211014230739.352041752@linutronix.de
2021-10-21x86/fpu: Move xstate size to fpu_*_cfgThomas Gleixner1-14/+18
Use the new kernel and user space config storage to store and retrieve the XSTATE buffer sizes. The default and the maximum size are the same for now, but will change when support for dynamically enabled features is added. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211014230739.296830097@linutronix.de
2021-10-21x86/fpu/xstate: Cleanup size calculationsThomas Gleixner1-36/+46
The size calculations are partially unreadable gunk. Clean them up. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211014230739.241223689@linutronix.de
2021-10-21x86/fpu/xstate: Use fpstate for copy_uabi_to_xstate()Thomas Gleixner1-8/+10
Prepare for dynamically enabled states per task. The function needs to retrieve the features and sizes which are valid in a fpstate context. Retrieve them from fpstate. Move the function declarations to the core header as they are not required anywhere else. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211013145323.233529986@linutronix.de
2021-10-21x86/fpu: Use fpstate in __copy_xstate_to_uabi_buf()Thomas Gleixner1-5/+6
With dynamically enabled features the copy function must know the features and the size which is valid for the task. Retrieve them from fpstate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211013145323.181495492@linutronix.de
2021-10-21x86/fpu: Add size and mask information to fpstateThomas Gleixner1-0/+3
Add state size and feature mask information to the fpstate container. This will be used for runtime checks with the upcoming support for dynamically enabled features and dynamically sized buffers. That avoids conditionals all over the place as the required information is accessible for both default and extended buffers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211013145322.921388806@linutronix.de
2021-10-20x86/fpu/core: Convert to fpstateThomas Gleixner1-1/+1
Convert the rest of the core code to the new register storage mechanism in preparation for dynamically sized buffers. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211013145322.659456185@linutronix.de
2021-10-20x86/fpu: Replace KVMs xstate component clearingThomas Gleixner1-1/+11
In order to prepare for the support of dynamically enabled FPU features, move the clearing of xstate components to the FPU core code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/20211013145322.399567049@linutronix.de
2021-10-20x86/fpu: Convert fpstate_init() to struct fpstateThomas Gleixner1-6/+6
Convert fpstate_init() and related code to the new register storage mechanism in preparation for dynamically sized buffers. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211013145322.292157401@linutronix.de
2021-10-20x86/fpu: Replace the includes of fpu/internal.hThomas Gleixner1-1/+0
Now that the file is empty, fixup all references with the proper includes and delete the former kitchen sink. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211015011540.001197214@linutronix.de