summaryrefslogtreecommitdiffstats
path: root/arch
diff options
context:
space:
mode:
authorH. Peter Anvin (Intel) <hpa@zytor.com>2021-05-18 12:13:01 -0700
committerThomas Gleixner <tglx@linutronix.de>2021-05-20 15:19:49 +0200
commit0595494891723a1dcca5eaa8eeca8ab54ad953b9 (patch)
tree11640da20dbf9e37b38ec809fecc7604f5862e9c /arch
parent795e2a023b8080b95442811f26f0762184116caa (diff)
downloadlinux-0595494891723a1dcca5eaa8eeca8ab54ad953b9.tar.bz2
x86/entry/64: Sign-extend system calls on entry to int
Right now, *some* code will treat e.g. 0x0000000100000001 as a system call and some will not. Some of the code, notably in ptrace, will treat 0x000000018000000 as a system call and some will not. Finally, right now, e.g. 335 for x86-64 will force the exit code to be set to -ENOSYS even if poked by ptrace, but 548 will not, because there is an observable difference between an out of range system call and a system call number that falls outside the range of the table. This is visible to the user: for example, the syscall_numbering_64 test fails if run under strace, because as strace uses ptrace, it ends up clobbering the upper half of the 64-bit system call number. The architecture independent code all assumes that a system call is "int" that the value -1 specifically and not just any negative value is used for a non-system call. This is the case on x86 as well when arch-independent code is involved. The arch-independent API is defined/documented (but not *implemented*!) in <asm-generic/syscall.h>. This is an ABI change, but is in fact a revert to the original x86-64 ABI. The original assembly entry code would zero-extend the system call number; Use sign extend to be explicit that this is treated as a signed number (although in practice it makes no difference, of course) and to avoid people getting the idea of "optimizing" it, as has happened on at least two(!) separate occasions. Do not store the extended value into regs->orig_ax, however: on x86-64, the ABI is that the callee is responsible for extending parameters, so only examining the lower 32 bits is fully consistent with any "int" argument to any system call, e.g. regs->di for write(2). The full value of %rax on entry to the kernel is thus still available. [ tglx: Add a comment to the ASM code ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210518191303.4135296-5-hpa@zytor.com
Diffstat (limited to 'arch')
-rw-r--r--arch/x86/entry/entry_64.S3
1 files changed, 2 insertions, 1 deletions
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1d9db15fdc69..a5f02d03c585 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -108,7 +108,8 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
/* IRQs are off. */
movq %rsp, %rdi
- movq %rax, %rsi
+ /* Sign extend the lower 32bit as syscall numbers are treated as int */
+ movslq %eax, %rsi
call do_syscall_64 /* returns with IRQs disabled */
/*