path: root/arch/x86/include/asm/cpu_entry_area.h
author	Linus Torvalds <torvalds@linux-foundation.org>	2018-02-23 11:35:10 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2018-02-26 19:06:51 -0800
commit	fd468043d4d87da49d717d7747dba9f21bf13ed7 (patch)
tree	6a9dc8dbd6d331043b79c8970fc2d95fe7205bcf /arch/x86/include/asm/cpu_entry_area.h
parent	6f70eb2b00eb416146247c65003d31f4df983ce0 (diff)
download	linux-fd468043d4d87da49d717d7747dba9f21bf13ed7.tar.bz2
x86: avoid per-cpu system call trampoline
The per-cpu system call trampoline was a clever trick, and allows us to have percpu data even before swapgs is done by just doing %rip-relative addressing. And that was important, because syscall doesn't have a kernel stack, so we needed that percpu data very very early, just to get a temporary register to switch the page tables around.

However, it turns out to be unnecessary. Because we actually have a temporary register that we can use: %r11 is destroyed by the 'syscall' instruction anyway.

Ok, technically it contains the user mode flags register, but we *have* that information anyway: it's still in %rflags, we've just masked off a few unimportant bits. We'll destroy the rest too when we do the "and" of the CR3 value, but who cares? It's a system call.

Btw, there are a few bits in eflags that might matter to user space: DF and AC. Right now this clears them, but that is fixable by just changing the MSR_SYSCALL_MASK value to not include them, and clearing them by hand the way we do for all other kernel entry points anyway.

So the only _real_ flags we'd destroy are IF and the arithmetic flags that get trampled on by the arithmetic instructions that are part of the %cr3 reload logic.

However, if we really end up caring, we can save off even those: we'd take advantage of the fact that %rcx - which contains the returning IP of the system call - also has 8 bits free.

Why 8? Even with 5-level paging, we only have 57 bits of virtual address space, and the high address space is for the kernel (and vsyscall, but we'd just disable native vsyscall). So the %rip value saved in %rcx can have only 56 valid bits, which means that we have 8 bits free.

So *if* we care about IF and the arithmetic flags being saved over a system call, we'd do:

	shlq $8,%rcx
	movb %r11b,%cl
	shrl $8,%r11d
	andl $8,%r11d
	orb %r11b,%cl

to save those bits off before we then use %r11 as a temporary register (we'd obviously need to then undo that as we save the user space state on the stack).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'arch/x86/include/asm/cpu_entry_area.h')
-rw-r--r--	arch/x86/include/asm/cpu_entry_area.h	2
1 file changed, 0 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 4a7884b8dca5..29c706415443 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -30,8 +30,6 @@ struct cpu_entry_area {
 	 */
 	struct tss_struct tss;
 
-	char entry_trampoline[PAGE_SIZE];
-
 #ifdef CONFIG_X86_64
 	/*
 	 * Exception stacks used for IST entries.