summaryrefslogtreecommitdiffstats
path: root/arch/x86/entry
AgeCommit message (Collapse)AuthorFilesLines
2020-07-19x86/entry: Actually disable stack protectorKees Cook1-3/+11
Some builds of GCC enable stack protector by default. Simply removing the arguments is not sufficient to disable stack protector, as the stack protector for those GCC builds must be explicitly disabled. Remove the argument removals and add -fno-stack-protector. Additionally include missed x32 argument updates, and adjust whitespace for readability. Fixes: 20355e5f73a7 ("x86/entry: Exclude low level entry code from sanitizing") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/202006261333.585319CA6B@keescook
2020-07-09x86/entry/common: Make prepare_exit_to_usermode() staticThomas Gleixner1-1/+1
No users outside this file anymore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.301116609@linutronix.de
2020-07-09x86/entry: Mark check_user_regs() noinstrThomas Gleixner1-1/+1
It's called from the non-instrumentable section. Fixes: c9c26150e61d ("x86/entry: Assert that syscalls are on the right stack") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.191497962@linutronix.de
2020-07-04x86/entry, selftests: Further improve user entry sanity checksAndy Lutomirski1-0/+19
Chasing down a Xen bug caused me to realize that the new entry sanity checks are still fairly weak. Add some more checks. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/881de09e786ab93ce56ee4a2437ba2c308afe7a9.1593795633.git.luto@kernel.org
2020-07-04x86/entry/compat: Clear RAX high bits on Xen PV SYSENTERAndy Lutomirski1-9/+10
Move the clearing of the high bits of RAX after Xen PV joins the SYSENTER path so that Xen PV doesn't skip it. Arguably this code should be deleted instead, but that would belong in the merge window. Fixes: ffae641f5747 ("x86/entry/64/compat: Fix Xen PV SYSENTER frame setup") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/9d33b3f3216dcab008070f1c28b6091ae7199969.1593795633.git.luto@kernel.org
2020-07-01x86/entry/64/compat: Fix Xen PV SYSENTER frame setupAndy Lutomirski1-0/+1
The SYSENTER frame setup was nonsense. It worked by accident because the normal code into which the Xen asm jumped (entry_SYSENTER_32/compat) threw away SP without touching the stack. entry_SYSENTER_compat was recently modified such that it relied on having a valid stack pointer, so now the Xen asm needs to invoke it with a valid stack. Fix it up like SYSCALL: use the Xen-provided frame and skip the bare metal prologue. Fixes: 1c3e5d3f60e2 ("x86/entry: Make entry_64_compat.S objtool clean") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lkml.kernel.org/r/947880c41ade688ff4836f665d0c9fcaa9bd1201.1593191971.git.luto@kernel.org
2020-07-01x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into CAndy Lutomirski3-9/+19
The SYSENTER asm (32-bit and compat) contains fixups for regs->sp and regs->flags. Move the fixups into C and fix some comments while at it. This is a valid cleanup all by itself, and it also simplifies the subsequent patch that will fix Xen PV SYSENTER. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/fe62bef67eda7fac75b8f3dbafccf571dc4ece6b.1593191971.git.luto@kernel.org
2020-07-01x86/entry: Assert that syscalls are on the right stackAndy Lutomirski1-3/+15
Now that the entry stack is a full page, it's too easy to regress the system call entry code and end up on the wrong stack without noticing. Assert that all system calls (SYSCALL64, SYSCALL32, SYSENTER, and INT80) are on the right stack and have pt_regs in the right place. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/52059e42bb0ab8551153d012d68f7be18d72ff8e.1593191971.git.luto@kernel.org
2020-06-13Merge tag 'x86-entry-2020-06-12' of ↵Linus Torvalds7-1106/+761
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Thomas Gleixner: "The x86 entry, exception and interrupt code rework This all started about 6 month ago with the attempt to move the Posix CPU timer heavy lifting out of the timer interrupt code and just have lockless quick checks in that code path. Trivial 5 patches. This unearthed an inconsistency in the KVM handling of task work and the review requested to move all of this into generic code so other architectures can share. Valid request and solved with another 25 patches but those unearthed inconsistencies vs. RCU and instrumentation. Digging into this made it obvious that there are quite some inconsistencies vs. instrumentation in general. The int3 text poke handling in particular was completely unprotected and with the batched update of trace events even more likely to expose to endless int3 recursion. In parallel the RCU implications of instrumenting fragile entry code came up in several discussions. The conclusion of the x86 maintainer team was to go all the way and make the protection against any form of instrumentation of fragile and dangerous code pathes enforcable and verifiable by tooling. A first batch of preparatory work hit mainline with commit d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner") That (almost) full solution introduced a new code section '.noinstr.text' into which all code which needs to be protected from instrumentation of all sorts goes into. Any call into instrumentable code out of this section has to be annotated. objtool has support to validate this. Kprobes now excludes this section fully which also prevents BPF from fiddling with it and all 'noinstr' annotated functions also keep ftrace off. The section, kprobes and objtool changes are already merged. The major changes coming with this are: - Preparatory cleanups - Annotating of relevant functions to move them into the noinstr.text section or enforcing inlining by marking them __always_inline so the compiler cannot misplace or instrument them. - Splitting and simplifying the idtentry macro maze so that it is now clearly separated into simple exception entries and the more interesting ones which use interrupt stacks and have the paranoid handling vs. CR3 and GS. - Move quite some of the low level ASM functionality into C code: - enter_from and exit to user space handling. The ASM code now calls into C after doing the really necessary ASM handling and the return path goes back out without bells and whistels in ASM. - exception entry/exit got the equivivalent treatment - move all IRQ tracepoints from ASM to C so they can be placed as appropriate which is especially important for the int3 recursion issue. - Consolidate the declaration and definition of entry points between 32 and 64 bit. They share a common header and macros now. - Remove the extra device interrupt entry maze and just use the regular exception entry code. - All ASM entry points except NMI are now generated from the shared header file and the corresponding macros in the 32 and 64 bit entry ASM. - The C code entry points are consolidated as well with the help of DEFINE_IDTENTRY*() macros. This allows to ensure at one central point that all corresponding entry points share the same semantics. The actual function body for most entry points is in an instrumentable and sane state. There are special macros for the more sensitive entry points, e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF. They allow to put the whole entry instrumentation and RCU handling into safe places instead of the previous pray that it is correct approach. - The INT3 text poke handling is now completely isolated and the recursion issue banned. Aside of the entry rework this required other isolation work, e.g. the ability to force inline bsearch. - Prevent #DB on fragile entry code, entry relevant memory and disable it on NMI, #MC entry, which allowed to get rid of the nested #DB IST stack shifting hackery. - A few other cleanups and enhancements which have been made possible through this and already merged changes, e.g. consolidating and further restricting the IDT code so the IDT table becomes RO after init which removes yet another popular attack vector - About 680 lines of ASM maze are gone. There are a few open issues: - An escape out of the noinstr section in the MCE handler which needs some more thought but under the aspect that MCE is a complete trainwreck by design and the propability to survive it is low, this was not high on the priority list. - Paravirtualization When PV is enabled then objtool complains about a bunch of indirect calls out of the noinstr section. There are a few straight forward ways to fix this, but the other issues vs. general correctness were more pressing than parawitz. - KVM KVM is inconsistent as well. Patches have been posted, but they have not yet been commented on or picked up by the KVM folks. - IDLE Pretty much the same problems can be found in the low level idle code especially the parts where RCU stopped watching. This was beyond the scope of the more obvious and exposable problems and is on the todo list. The lesson learned from this brain melting exercise to morph the evolved code base into something which can be validated and understood is that once again the violation of the most important engineering principle "correctness first" has caused quite a few people to spend valuable time on problems which could have been avoided in the first place. The "features first" tinkering mindset really has to stop. With that I want to say thanks to everyone involved in contributing to this effort. Special thanks go to the following people (alphabetical order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra, Vitaly Kuznetsov, and Will Deacon" * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits) x86/entry: Force rcu_irq_enter() when in idle task x86/entry: Make NMI use IDTENTRY_RAW x86/entry: Treat BUG/WARN as NMI-like entries x86/entry: Unbreak __irqentry_text_start/end magic x86/entry: __always_inline CR2 for noinstr lockdep: __always_inline more for noinstr x86/entry: Re-order #DB handler to avoid *SAN instrumentation x86/entry: __always_inline arch_atomic_* for noinstr x86/entry: __always_inline irqflags for noinstr x86/entry: __always_inline debugreg for noinstr x86/idt: Consolidate idt functionality x86/idt: Cleanup trap_init() x86/idt: Use proper constants for table size x86/idt: Add comments about early #PF handling x86/idt: Mark init only functions __init x86/entry: Rename trace_hardirqs_off_prepare() x86/entry: Clarify irq_{enter,exit}_rcu() x86/entry: Remove DBn stacks x86/entry: Remove debug IDT frobbing x86/entry: Optimize local_db_save() for virt ...
2020-06-12x86/entry: Force rcu_irq_enter() when in idle taskThomas Gleixner1-7/+28
The idea of conditionally calling into rcu_irq_enter() only when RCU is not watching turned out to be not completely thought through. Paul noticed occasional premature end of grace periods in RCU torture testing. Bisection led to the commit which made the invocation of rcu_irq_enter() conditional on !rcu_is_watching(). It turned out that this conditional breaks RCU assumptions about the idle task when the scheduler tick happens to be a nested interrupt. Nested interrupts can happen when the first interrupt invokes softirq processing on return which enables interrupts. If that nested tick interrupt does not invoke rcu_irq_enter() then the RCU's irq-nesting checks will believe that this interrupt came directly from idle, which will cause RCU to report a quiescent state. Because this interrupt instead came from a softirq handler which might have been executing an RCU read-side critical section, this can cause the grace period to end prematurely. Change the condition from !rcu_is_watching() to is_idle_task(current) which enforces that interrupts in the idle task unconditionally invoke rcu_irq_enter() independent of the RCU state. This is also correct vs. user mode entries in NOHZ full scenarios because user mode entries bring RCU out of EQS and force the RCU irq nesting state accounting to nested. As only the first interrupt can enter from user mode a nested tick interrupt will enter from kernel mode and as the nesting state accounting is forced to nesting it will not do anything stupid even if rcu_irq_enter() has not been invoked. Fixes: 3eeec3858488 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/87wo4cxubv.fsf@nanos.tec.linutronix.de
2020-06-11Rebase locking/kcsan to locking/urgentThomas Gleixner1-0/+6
Merge the state of the locking kcsan branch before the read/write_once() and the atomics modifications got merged. Squash the fallout of the rebase on top of the read/write once and atomic fallback work into the merge. The history of the original branch is preserved in tag locking-kcsan-2020-06-02. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11x86/entry: Unbreak __irqentry_text_start/end magicThomas Gleixner2-2/+20
The entry rework moved interrupt entry code from the irqentry to the noinstr section which made the irqentry section empty. This breaks boundary checks which rely on the __irqentry_text_start/end markers to find out whether a function in a stack trace is interrupt/exception entry code. This affects the function graph tracer and filter_irq_stacks(). As the IDT entry points are all sequentialy emitted this is rather simple to unbreak by injecting __irqentry_text_start/end as global labels. To make this work correctly: - Remove the IRQENTRY_TEXT section from the x86 linker script - Define __irqentry so it breaks the build if it's used - Adjust the entry mirroring in PTI - Remove the redundant kprobes and unwinder bound checks Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11x86/entry: Rename trace_hardirqs_off_prepare()Peter Zijlstra1-3/+3
The typical pattern for trace_hardirqs_off_prepare() is: ENTRY lockdep_hardirqs_off(); // because hardware ... do entry magic instrumentation_begin(); trace_hardirqs_off_prepare(); ... do actual work trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(); instrumentation_end(); ... do exit magic lockdep_hardirqs_on(); which shows that it's named wrong, rename it to trace_hardirqs_off_finish(), as it concludes the hardirq_off transition. Also, given that the above is the only correct order, make the traditional all-in-one trace_hardirqs_off() follow suit. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200529213321.415774872@infradead.org
2020-06-11x86/entry: Remove DBn stacksPeter Zijlstra1-17/+0
Both #DB itself, as all other IST users (NMI, #MC) now clear DR7 on entry. Combined with not allowing breakpoints on entry/noinstr/NOKPROBE text and no single step (EFLAGS.TF) inside the #DB handler should guarantee no nested #DB. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200529213321.303027161@infradead.org
2020-06-11x86/entry: Remove the TRACE_IRQS cruftThomas Gleixner2-21/+1
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202120.523289762@linutronix.de
2020-06-11x86/entry: Move paranoid irq tracing out of ASM codeThomas Gleixner1-13/+0
The last step to remove the irq tracing cruft from ASM. Ignore #DF as the maschine is going to die anyway. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202120.414043330@linutronix.de
2020-06-11x86/entry/64: Remove TRACE_IRQS_*_DEBUGThomas Gleixner1-45/+3
Since INT3/#BP no longer runs on an IST, this workaround is no longer required. Tested by running lockdep+ftrace as described in the initial commit: 5963e317b1e9 ("ftrace/x86: Do not change stacks in DEBUG when calling lockdep") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202120.319418546@linutronix.de
2020-06-11x86/entry/32: Remove redundant irq disable codeThomas Gleixner1-76/+0
All exceptions/interrupts return with interrupts disabled now. No point in doing this in ASM again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202120.221223450@linutronix.de
2020-06-11x86/entry: Make enter_from_user_mode() staticThomas Gleixner1-1/+1
The ASM users are gone. All callers are local. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202120.129232680@linutronix.de
2020-06-11x86/entry/64: Remove IRQ stack switching ASMThomas Gleixner1-96/+0
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202120.021462159@linutronix.de
2020-06-11x86/entry: Remove the apic/BUILD interrupt leftoversThomas Gleixner3-181/+0
Remove all the code which was there to emit the system vector stubs. All users are gone. Move the now unused GET_CR2_INTO macro muck to head_64.S where the last user is. Fixup the eye hurting comment there while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.927433002@linutronix.de
2020-06-11x86/entry: Convert reschedule interrupt to IDTENTRY_SYSVEC_SIMPLEThomas Gleixner1-4/+0
The scheduler IPI does not need the full interrupt entry handling logic when the entry is from kernel mode. Use IDTENTRY_SYSVEC_SIMPLE and spare all the overhead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.835425642@linutronix.de
2020-06-11x86/entry: Convert XEN hypercall vector to IDTENTRY_SYSVECThomas Gleixner2-10/+0
Convert the last oldstyle defined vector to IDTENTRY_SYSVEC: - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC - Remove the ASM idtentries in 64-bit - Remove the BUILD_INTERRUPT entries in 32-bit - Remove the old prototypes Fixup the related XEN code by providing the primary C entry point in x86 to avoid cluttering the generic code with X86'isms. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.741950104@linutronix.de
2020-06-11x86/entry: Convert various hypervisor vectors to IDTENTRY_SYSVECThomas Gleixner2-31/+0
Convert various hypervisor vectors to IDTENTRY_SYSVEC: - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC - Remove the ASM idtentries in 64-bit - Remove the BUILD_INTERRUPT entries in 32-bit - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20200521202119.647997594@linutronix.de
2020-06-11x86/entry: Convert KVM vectors to IDTENTRY_SYSVEC*Thomas Gleixner1-7/+0
Convert KVM specific system vectors to IDTENTRY_SYSVEC*: The two empty stub handlers which only increment the stats counter do no need to run on the interrupt stack. Use IDTENTRY_SYSVEC_SIMPLE for them. The wakeup handler does more work and runs on the interrupt stack. None of these handlers need to save and restore the irq_regs pointer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.555715519@linutronix.de
2020-06-11x86/entry: Convert various system vectorsThomas Gleixner1-19/+0
Convert various system vectors to IDTENTRY_SYSVEC: - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC - Remove the ASM idtentries in 64-bit - Remove the BUILD_INTERRUPT entries in 32-bit - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.464812973@linutronix.de
2020-06-11x86/entry: Convert SMP system vectors to IDTENTRY_SYSVECThomas Gleixner1-7/+0
Convert SMP system vectors to IDTENTRY_SYSVEC: - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC - Remove the ASM idtentries in 64-bit - Remove the BUILD_INTERRUPT entries in 32-bit - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.372234635@linutronix.de
2020-06-11x86/entry: Convert APIC interrupts to IDTENTRY_SYSVECThomas Gleixner1-6/+0
Convert APIC interrupts to IDTENTRY_SYSVEC: - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC - Remove the ASM idtentries in 64-bit - Remove the BUILD_INTERRUPT entries in 32-bit - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.280728850@linutronix.de
2020-06-11x86/entry: Provide IDTENTRY_SYSVECThomas Gleixner2-0/+12
Provide IDTENTRY variants for system vectors to consolidate the different mechanisms to emit the ASM stubs for 32- and 64-bit. On 64-bit this also moves the stack switching from ASM to C code. 32-bit will excute the system vectors w/o stack switching as before. The simple variant is meant for "empty" system vectors like scheduler IPI and KVM posted interrupt vectors. These do not need the full glory of irq enter/exit handling with softirq processing and more. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.185317067@linutronix.de
2020-06-11x86/entry: Use idtentry for interruptsThomas Gleixner2-59/+3
Replace the extra interrupt handling code and reuse the existing idtentry machinery. This moves the irq stack switching on 64-bit from ASM to C code; 32-bit already does the stack switching in C. This requires to remove HAVE_IRQ_EXIT_ON_IRQ_STACK as the stack switch is not longer in the low level entry code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202119.078690991@linutronix.de
2020-06-11x86/entry: Add IRQENTRY_IRQ macroThomas Gleixner2-0/+28
Provide a seperate IDTENTRY macro for device interrupts. Similar to IDTENTRY_ERRORCODE with the addition of invoking irq_enter/exit_rcu() and providing the errorcode as a 'u8' argument to the C function, which truncates the sign extended vector number. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.984573165@linutronix.de
2020-06-11x86/irq: Convey vector as argument and not in ptregsThomas Gleixner3-63/+15
Device interrupts which go through do_IRQ() or the spurious interrupt handler have their separate entry code on 64 bit for no good reason. Both 32 and 64 bit transport the vector number through ORIG_[RE]AX in pt_regs. Further the vector number is forced to fit into an u8 and is complemented and offset by 0x80 so it's in the signed character range. Otherwise GAS would expand the pushq to a 5 byte instruction for any vector > 0x7F. Treat the vector number like an error code and hand it to the C function as argument. This allows to get rid of the extra entry code in a later step. Simplify the error code push magic by implementing the pushq imm8 via a '.byte 0x6a, vector' sequence so GAS is not able to screw it up. As the pushq imm8 is sign extending the resulting error code needs to be truncated to 8 bits in C code. Originally-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.796915981@linutronix.de
2020-06-11x86/entry/32: Remove common_exception()Thomas Gleixner1-21/+0
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.611906966@linutronix.de
2020-06-11x86/entry/64: Remove error_exit()Thomas Gleixner1-9/+0
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.516757524@linutronix.de
2020-06-11x86/entry: Change exit path of xen_failsafe_callbackThomas Gleixner2-2/+2
xen_failsafe_callback() is invoked from XEN for two cases: 1. Fault while reloading DS, ES, FS or GS 2. Fault while executing IRET #1 retries the IRET after XEN has fixed up the segments. #2 injects a #GP which kills the task For #1 there is no reason to go through the full exception return path because the tasks TIF state is still the same. So just going straight to the IRET path is good enough. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.423224507@linutronix.de
2020-06-11x86/entry: Remove the transition leftoversThomas Gleixner2-24/+5
Now that all exceptions are converted over the sane flag is not longer needed. Also the vector argument of idtentry_body on 64-bit is pointless now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.331115895@linutronix.de
2020-06-11x86/entry: Switch page fault exception to IDTENTRY_RAWThomas Gleixner2-49/+0
Convert page fault exceptions to IDTENTRY_RAW: - Implement the C entry point with DEFINE_IDTENTRY_RAW - Add the CR2 read into the exception handler - Add the idtentry_enter/exit_cond_rcu() invocations in in the regular page fault handler and in the async PF part. - Emit the ASM stub with DECLARE_IDTENTRY_RAW - Remove the ASM idtentry in 64-bit - Remove the CR2 read from 64-bit - Remove the open coded ASM entry code in 32-bit - Fix up the XEN/PV code - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.238455120@linutronix.de
2020-06-11x86/entry/64: Simplify idtentry_bodyThomas Gleixner1-2/+0
All C functions which do not have an error code have been converted to the new IDTENTRY interface which does not expect an error code in the arguments. Spare the XORL. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202118.145811853@linutronix.de
2020-06-11x86/entry: Switch XEN/PV hypercall entry to IDTENTRYThomas Gleixner3-22/+96
Convert the XEN/PV hypercall to IDTENTRY: - Emit the ASM stub with DECLARE_IDTENTRY - Remove the ASM idtentry in 64-bit - Remove the open coded ASM entry code in 32-bit - Remove the old prototypes The handler stubs need to stay in ASM code as they need corner case handling and adjustment of the stack pointer. Provide a new C function which invokes the entry/exit handling and calls into the XEN handler on the interrupt stack if required. The exit code is slightly different from the regular idtentry_exit() on non-preemptible kernels. If the hypercall is preemptible and need_resched() is set then XEN provides a preempt hypercall scheduling function. Move this functionality into the entry code so it can use the existing idtentry functionality. [ mingo: Build fixes. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Juergen Gross <jgross@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20200521202118.055270078@linutronix.de
2020-06-11x86/entry: Split out idtentry_exit_cond_resched()Thomas Gleixner1-15/+15
The XEN PV hypercall requires the ability of conditional rescheduling when preemption is disabled because some hypercalls take ages. Split out the rescheduling code from idtentry_exit_cond_rcu() so it can be reused for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202117.962199649@linutronix.de
2020-06-11x86/entry/64: Move do_softirq_own_stack() to CThomas Gleixner1-13/+0
The first step to get rid of the ENTER/LEAVE_IRQ_STACK ASM macro maze. Use the new C code helpers to move do_softirq_own_stack() out of ASM code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202117.870911120@linutronix.de
2020-06-11x86/entry: Provide helpers for executing on the irqstackThomas Gleixner1-0/+39
Device interrupt handlers and system vector handlers are executed on the interrupt stack. The stack switch happens in the low level assembly entry code. This conflicts with the efforts to consolidate the exit code in C to ensure correctness vs. RCU and tracing. As there is no way to move #DB away from IST due to the MOV SS issue, the requirements vs. #DB and NMI for switching to the interrupt stack do not exist anymore. The only requirement is that interrupts are disabled. That allows the moving of the stack switching to C code, which simplifies the entry/exit handling further, because it allows the switching of stacks after handling the entry and on exit before handling RCU, returning to usermode and kernel preemption in the same way as for regular exceptions. The initial attempt of having the stack switching in inline ASM caused too much headache vs. objtool and the unwinder. After analysing the use cases it was agreed on that having the stack switch in ASM for the price of an indirect call is acceptable, as the main users are indirect call heavy anyway and the few system vectors which are empty shells (scheduler IPI and KVM posted interrupt vectors) can run from the regular stack. Provide helper functions to check whether the interrupt stack is already active and whether stack switching is required. 64-bit only for now, as 32-bit has a variant of that already. Once this is cleaned up, the two implementations might be consolidated as an additional cleanup on top. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202117.763775313@linutronix.de
2020-06-11x86/entry: Clean up idtentry_enter/exit() leftoversThomas Gleixner1-38/+29
Now that everything is converted to conditional RCU handling remove idtentry_enter/exit() and tidy up the conditional functions. This does not remove rcu_irq_exit_preempt(), to avoid conflicts with the RCU tree. Will be removed once all of this hits Linus's tree. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202117.473597954@linutronix.de
2020-06-11x86/entry: Provide idtentry_enter/exit_user()Thomas Gleixner1-0/+31
As there are exceptions which already handle entry from user mode and from kernel mode separately, providing explicit user entry/exit handling callbacks makes sense and makes the code easier to understand. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20200521202117.289548561@linutronix.de
2020-06-11x86/entry: Provide idtentry_entry/exit_cond_rcu()Thomas Gleixner1-15/+64
After a lengthy discussion [1] it turned out that RCU does not need a full rcu_irq_enter/exit() when RCU is already watching. All it needs if NOHZ_FULL is active is to check whether the tick needs to be restarted. This allows to avoid a separate variant for the pagefault handler which cannot invoke rcu_irq_enter() on a kernel pagefault which might sleep. The cond_rcu argument is only temporary and will be removed once the existing users of idtentry_enter/exit() have been cleaned up. After that the code can be significantly simplified. [ mingo: Simplified the control flow ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Paul E. McKenney" <paulmck@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: [1] https://lkml.kernel.org/r/20200515235125.628629605@linutronix.de Link: https://lore.kernel.org/r/20200521202117.181397835@linutronix.de
2020-06-11x86/entry: Convert double fault exception to IDTENTRY_DFThomas Gleixner2-11/+3
Convert #DF to IDTENTRY_DF - Implement the C entry point with DEFINE_IDTENTRY_DF - Emit the ASM stub with DECLARE_IDTENTRY_DF on 64bit - Remove the ASM idtentry in 64bit - Adjust the 32bit shim code - Fixup the XEN/PV code - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135315.583415264@linutronix.de
2020-06-11x86/entry: Implement user mode C entry points for #DB and #MCEThomas Gleixner1-1/+1
The MCE entry point uses the same mechanism as the IST entry point for now. For #DB split the inner workings and just keep the nmi_enter/exit() magic in the IST variant. Fixup the ASM code to emit the proper noist_##cfunc call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135315.177564104@linutronix.de
2020-06-11x86/entry/64: Remove error code clearing from #DB and #MCE ASM stubThomas Gleixner1-1/+0
The C entry points do not expect an error code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135314.992621707@linutronix.de
2020-06-11x86/entry: Convert Debug exception to IDTENTRY_DBThomas Gleixner2-12/+0
Convert #DB to IDTENTRY_ERRORCODE: - Implement the C entry point with DEFINE_IDTENTRY_DB - Emit the ASM stub with DECLARE_IDTENTRY - Remove the ASM idtentry in 64bit - Remove the open coded ASM entry code in 32bit - Fixup the XEN/PV code - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135314.900297476@linutronix.de
2020-06-11x86/entry: Convert NMI to IDTENTRY_NMIThomas Gleixner2-12/+11
Convert #NMI to IDTENTRY_NMI: - Implement the C entry point with DEFINE_IDTENTRY_NMI - Fixup the XEN/PV code - Remove the old prototypes No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200505135314.609932306@linutronix.de