diff options
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/entry_64.rst | 2 | ||||
-rw-r--r-- | Documentation/x86/index.rst | 1 | ||||
-rw-r--r-- | Documentation/x86/orc-unwinder.rst | 4 | ||||
-rw-r--r-- | Documentation/x86/sgx.rst | 35 | ||||
-rw-r--r-- | Documentation/x86/xstate.rst | 65 |
5 files changed, 104 insertions, 3 deletions
diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst index a48b3f6ebbe8..e433e08f7018 100644 --- a/Documentation/x86/entry_64.rst +++ b/Documentation/x86/entry_64.rst @@ -8,7 +8,7 @@ This file documents some of the kernel entries in arch/x86/entry/entry_64.S. A lot of this explanation is adapted from an email from Ingo Molnar: -http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu> +https://lore.kernel.org/r/20110529191055.GC9835%40elte.hu The x86 architecture has quite a few different ways to jump into kernel code. Most of these entry points are registered in diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index 383048396336..f498f1d36cd3 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -37,3 +37,4 @@ x86-specific Documentation sgx features elf_auxvec + xstate diff --git a/Documentation/x86/orc-unwinder.rst b/Documentation/x86/orc-unwinder.rst index d811576c1f3e..9a66a88be765 100644 --- a/Documentation/x86/orc-unwinder.rst +++ b/Documentation/x86/orc-unwinder.rst @@ -177,6 +177,6 @@ brutal, unyielding efficiency. ORC stands for Oops Rewind Capability. -.. [1] https://lkml.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de -.. [2] https://lkml.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz +.. [1] https://lore.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de +.. [2] https://lore.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz .. [3] http://dustin.wikidot.com/half-orcs-and-orcs diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst index dd0ac96ff9ef..a608f667fb95 100644 --- a/Documentation/x86/sgx.rst +++ b/Documentation/x86/sgx.rst @@ -250,3 +250,38 @@ user wants to deploy SGX applications both on the host and in guests on the same machine, the user should reserve enough EPC (by taking out total virtual EPC size of all SGX VMs from the physical EPC size) for host SGX applications so they can run with acceptable performance. + +Architectural behavior is to restore all EPC pages to an uninitialized +state also after a guest reboot. Because this state can be reached only +through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc`` +provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction +on all pages in the virtual EPC. + +``EREMOVE`` can fail for three reasons. Userspace must pay attention +to expected failures and handle them as follows: + +1. Page removal will always fail when any thread is running in the + enclave to which the page belongs. In this case the ioctl will + return ``EBUSY`` independent of whether it has successfully removed + some pages; userspace can avoid these failures by preventing execution + of any vcpu which maps the virtual EPC. + +2. Page removal will cause a general protection fault if two calls to + ``EREMOVE`` happen concurrently for pages that refer to the same + "SECS" metadata pages. This can happen if there are concurrent + invocations to ``SGX_IOC_VEPC_REMOVE_ALL``, or if a ``/dev/sgx_vepc`` + file descriptor in the guest is closed at the same time as + ``SGX_IOC_VEPC_REMOVE_ALL``; it will also be reported as ``EBUSY``. + This can be avoided in userspace by serializing calls to the ioctl() + and to close(), but in general it should not be a problem. + +3. Finally, page removal will fail for SECS metadata pages which still + have child pages. Child pages can be removed by executing + ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors + mapped into the guest. This means that the ioctl() must be called + twice: an initial set of calls to remove child pages and a subsequent + set of calls to remove SECS pages. The second set of calls is only + required for those mappings that returned a nonzero value from the + first call. It indicates a bug in the kernel or the userspace client + if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has + a return code other than 0. diff --git a/Documentation/x86/xstate.rst b/Documentation/x86/xstate.rst new file mode 100644 index 000000000000..65de3f054ba5 --- /dev/null +++ b/Documentation/x86/xstate.rst @@ -0,0 +1,65 @@ +Using XSTATE features in user space applications +================================================ + +The x86 architecture supports floating-point extensions which are +enumerated via CPUID. Applications consult CPUID and use XGETBV to +evaluate which features have been enabled by the kernel XCR0. + +Up to AVX-512 and PKRU states, these features are automatically enabled by +the kernel if available. Features like AMX TILE_DATA (XSTATE component 18) +are enabled by XCR0 as well, but the first use of related instruction is +trapped by the kernel because by default the required large XSTATE buffers +are not allocated automatically. + +Using dynamically enabled XSTATE features in user space applications +-------------------------------------------------------------------- + +The kernel provides an arch_prctl(2) based mechanism for applications to +request the usage of such features. The arch_prctl(2) options related to +this are: + +-ARCH_GET_XCOMP_SUPP + + arch_prctl(ARCH_GET_XCOMP_SUPP, &features); + + ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of + type uint64_t. The second argument is a pointer to that storage. + +-ARCH_GET_XCOMP_PERM + + arch_prctl(ARCH_GET_XCOMP_PERM, &features); + + ARCH_GET_XCOMP_PERM stores the features for which the userspace process + has permission in userspace storage of type uint64_t. The second argument + is a pointer to that storage. + +-ARCH_REQ_XCOMP_PERM + + arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr); + + ARCH_REQ_XCOMP_PERM allows to request permission for a dynamically enabled + feature or a feature set. A feature set can be mapped to a facility, e.g. + AMX, and can require one or more XSTATE components to be enabled. + + The feature argument is the number of the highest XSTATE component which + is required for a facility to work. + +When requesting permission for a feature, the kernel checks the +availability. The kernel ensures that sigaltstacks in the process's tasks +are large enough to accommodate the resulting large signal frame. It +enforces this both during ARCH_REQ_XCOMP_SUPP and during any subsequent +sigaltstack(2) calls. If an installed sigaltstack is smaller than the +resulting sigframe size, ARCH_REQ_XCOMP_SUPP results in -ENOSUPP. Also, +sigaltstack(2) results in -ENOMEM if the requested altstack is too small +for the permitted features. + +Permission, when granted, is valid per process. Permissions are inherited +on fork(2) and cleared on exec(3). + +The first use of an instruction related to a dynamically enabled feature is +trapped by the kernel. The trap handler checks whether the process has +permission to use the feature. If the process has no permission then the +kernel sends SIGILL to the application. If the process has permission then +the handler allocates a larger xstate buffer for the task so the large +state can be context switched. In the unlikely cases that the allocation +fails, the kernel sends SIGSEGV. |