|
The IRQ stacks series made some changes to the unwinder, to permit
unwinding across different stacks. This is needed because otherwise, the
call stack would terminate at the point where the stack switch between
the task stack and the IRQ stack occurs, which would defeat any
diagnostics that rely on timer interrupts, such as RCU stall detection.
Unfortunately, getting the unwind annotations correct turns out to be
difficult, given that this now involves a frame pointer which needs to
point into the right location in the task stack when unwinding from the
IRQ stack. Getting this wrong for an exception handling routine results
in the stack pointer being unwound from the wrong location, causing all
kinds of issues with any subsequent unwind attempts, as reported by
Naresh here [0].
So let's simplify this, by deferring the stack switch to
call_with_stack(), which already has the correct unwind annotations, and
removing all the complicated handling of the stack frame from the IRQ
exception entrypoint itself.
[0] https://lore.kernel.org/all/CA+G9fYtpy8VgK+ag6OsA9TDrwi5YGU4hu7GM8xwpO7v6LrCD4Q@mail.gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
When e.g. a WARN_ON() is encountered, we attempt to unwind the current
thread. To do this, we set frame.pc to unwind_backtrace, which means it
points at the beginning of the function. However, the rest of the state
is initialised from within the function, which means the function
prologue has already been run.
This can be confusing, and with a recent patch from Ard, can result in
the unwinder misbehaving if we want to be strict about the PC value.
If we correctly initialise the state so it is self-consistent (in other
words, set frame.pc to the location at which we initialise that state)
then we eliminate this confusion and avoid possible future issues.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Commit 41918ec82eb6 ("ARM: ftrace: enable the graph tracer with the EABI
unwinder") removed the dummy version of return_address() that was
provided for the CONFIG_ARM_UNWIND=y case, on the assumption that the
removal of the kernel_text_address() call from unwind_frame() in the
preceding patch made it safe to do so.
However, this turns out not to be the case: Corentin reports warnings
about suspicious RCU usage and other strange behavior that seems to
originate in the stack unwinding that occurs in return_address().
Given that the function graph tracer (which is what these changes were
enabling for CONFIG_ARM_UNWIND=y builds) does not appear to care about
this distinction, let's revert return_address() to the old state.
Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
Fixes: 41918ec82eb6 ("ARM: ftrace: enable the graph tracer with the EABI unwinder")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Corentin reports that since commit 538b9265c063 ("ARM: unwind: track
location of LR value in stack frame"), numerous spurious warnings are
emitted into the kernel log:
[ 0.000000] unwind: Index not found c0f0c440
[ 0.000000] unwind: Index not found 00000000
[ 0.000000] unwind: Index not found c0f0c440
[ 0.000000] unwind: Index not found 00000000
This is because the commit in question removed a check of whether the
PC value in the unwound frame is actually a kernel text address, on the
assumption that an address that is not kernel text will not be
associated with valid unwind data to begin with, which is checked right
afterwards.
The reason for removing this check was that unwind_frame() will be
called by the ftrace graph tracer code, which means that it can no
longer be safely instrumented itself, or any code that it calls, as it
could cause infinite recursion.
In order to prevent the spurious diagnostics, let's add back the call to
kernel_text_address(), but this time, only make it when no unwind data
could be found for the address in question. This is more efficient for
the common successful case, and it should avoid any unintended
recursion, as the call is only made once the search for unwind data has
already failed.
Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
Fixes: 538b9265c063 ("ARM: unwind: track location of LR value in stack frame")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
This reverts commit ecb108e3e3f7c692082b7c6fce41779c3835854a.
Clang + Thumb2 with ftrace is now supported.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The SMC calling convention uses R7 as an argument register, which
conflicts with its use as a frame pointer when building in Thumb2 mode.
Given that Clang with ftrace does not permit frame pointers to be
disabled, let's omit this compilation unit from ftrace instrumentation.
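As a sketch, such an exemption is typically expressed in the unit's
Makefile (the object name below is illustrative):

	# Hypothetical object name; CC_FLAGS_FTRACE holds the -pg (and
	# related) options added when CONFIG_FUNCTION_TRACER=y
	CFLAGS_REMOVE_smc-call.o := $(CC_FLAGS_FTRACE)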
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
Thumb2 uses R7 rather than R11 as the frame pointer, and even though we
rarely use a frame pointer to begin with when building in Thumb2 mode,
there are cases where it is required by the compiler (e.g., by Clang
when inserting profiling hooks via -pg).
However, preserving and restoring the frame pointer is risky, as any
unhandled exceptions raised in the meantime will produce a bogus
backtrace, and it would be better not to touch the frame pointer at all.
This is the case even when CONFIG_FRAME_POINTER is not set, as the
unwind directives used by the unwinder may also use R7 or R11 as the
unwind anchor, even if the frame pointer is not managed strictly
according to the frame pointer ABI.
So let's tweak the cacheflush asm code not to clobber R7 or R11 at all,
so that we can drop R7 from the clobber lists of the inline asm blocks
that call these routines, and remove the code that preserves/restores
R11.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
Thumb2 code uses R7 as the frame pointer rather than R11, because the
opcodes to access it are generally shorter.
This means that there are cases where we cannot simply add it to the
clobber list of an asm() block, but need to preserve/restore it
explicitly, or the compiler may complain in some cases (e.g., Clang
builds with ftrace enabled).
Since R11 is not special in that case, clobber it instead, and use it to
preserve/restore the value of R7.
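A minimal sketch of the pattern (the SVC standing in for whatever call
actually needs R7 is hypothetical):

	static void do_call_with_r7(unsigned long arg)
	{
		asm volatile(
			"mov	r11, r7	@ stash the Thumb2 frame pointer\n\t"
			"mov	r7, %0	@ R7 carries the argument here\n\t"
			"svc	#0	@ hypothetical call that needs R7\n\t"
			"mov	r7, r11	@ restore the frame pointer\n\t"
			:: "r" (arg) : "r11", "memory");
	}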
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
|
|
Enable the function graph tracer in combination with the EABI unwinder,
so that Thumb2 builds or Clang ARM builds can make use of it.
This involves using the unwinder to locate the return address of an
instrumented function on the stack, so that it can be overridden and
made to refer to the ftrace handling routines that need to be called at
function return.
Given that for these builds, it is not guaranteed that the value of the
link register is stored on the stack, fall back to the stack slot that
will be used by the ftrace exit code to restore LR in the instrumented
function's execution context.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The ftrace graph tracer needs to override the return address of an
instrumented function, in order to install a hook that gets invoked when
the function returns again.
Currently, we only support this when building for ARM using GCC with
frame pointers, as only in that case is it guaranteed that the function
will reload LR from [FP, #-4], and so we can simply pass that address to
the ftrace code.
In order to support this for configurations that rely on the EABI
unwinder, such as Thumb2 builds, make the unwinder keep track of the
address from which LR was unwound, permitting ftrace to make use of this
in a subsequent patch.
Drop the call to kernel_text_address(), which is problematic in terms
of ftrace recursion, given that it may be instrumented itself. The call
is redundant anyway, as no unwind directives will be found unless the PC
points to memory that is known to contain executable code.
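As a sketch, the unwinder state could record this as an extra field
(the field name here is an assumption):

	struct stackframe {
		unsigned long fp;
		unsigned long sp;
		unsigned long lr;
		unsigned long pc;
		/* stack address from which LR was unwound, if any */
		unsigned long *lr_addr;
	};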
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
Fix the frame pointer handling in the function graph tracer entry and
exit code so we can enable HAVE_FUNCTION_GRAPH_FP_TEST. Instead of using
FP directly (which will have different values between the entry and exit
pieces of the function graph tracer), use the value of SP at entry and
exit, as we can derive the former value from the frame pointer.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Avoid explicit literal loads and instead, use accessor macros that
generate the optimal sequence depending on the architecture revision
being targeted.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Tweak the ftrace return paths to avoid redundant loads of SP, as well as
unnecessary clobbering of IP.
This also fixes the inconsistency of using MOV to perform a function
return, which is sub-optimal on recent micro-architectures but more
importantly, does not perform an interworking return, unlike compiler
generated function returns in Thumb2 builds.
Let's fix this by popping PC from the stack like most ordinary code
does.
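Roughly, for the return path (illustrative):

	@ before: reload LR, then return via MOV, which does not interwork
	ldr	lr, [sp], #4
	mov	pc, lr

	@ after: a single interworking return, like compiler-generated code
	pop	{pc}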
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Kernel images that are large in comparison to the range of a direct
branch may fail to work as expected with ftrace, as patching a direct
branch to one of the core ftrace routines may not be possible from the
.init.text section, if it is emitted too far away from the normal .text
section.
This is more likely to affect Thumb2 builds, given that its range is
only -/+ 16 MiB (as opposed to ARM which has -/+ 32 MiB), but may occur
in either ISA.
To work around this, add a couple of trampolines to .init.text and
swap these in when the ftrace patching code is operating on callers in
.init.text.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The compiler-emitted hook used for ftrace consists of a PUSH {LR} to
preserve the link register, followed by a branch-and-link (BL) to
__gnu_mcount_nc. Dynamic ftrace patches away the latter to turn the
combined sequence into a NOP, using a POP {LR} instruction.
This is not necessary, since the link register does not get clobbered in
this case, and simply adding #4 to the stack pointer is sufficient, and
avoids a memory access that may take a few cycles to resolve depending
on the micro-architecture.
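Roughly, the sequences involved look like this (sketch):

	@ compiler-emitted hook at function entry
	push	{lr}
	bl	__gnu_mcount_nc

	@ NOP replacement after this patch: discard the pushed word
	@ without reloading LR from memory
	push	{lr}
	add	sp, sp, #4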
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Using ADR to take the address of 'ftrace_stub' via a local label
produces an address that has the Thumb bit cleared, which means the
subsequent comparison is guaranteed to fail. Instead, use the badr
macro, which forces the Thumb bit to be set.
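Illustrated (sketch):

	adr	r0, ftrace_stub		@ Thumb bit clear: compare fails
	badr	r0, ftrace_stub		@ Thumb bit set in Thumb2 builds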
Fixes: a3ba87a61499 ("ARM: 6316/1: ftrace: add Thumb-2 support")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The get_current() and __my_cpu_offset() accessors evaluate to only a
single instruction emitted inline, but due to the size of the asm string
that is created for SMP+v6 configurations, the compiler assumes
otherwise, and may emit the functions out of line instead.
So use __always_inline to avoid this.
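A simplified sketch of the shape of the accessor after the change:

	static __always_inline struct task_struct *get_current(void)
	{
		struct task_struct *cur;

		/* a single MRC reading the TPIDRURO register */
		asm("mrc p15, 0, %0, c13, c0, 3" : "=r" (cur));
		return cur;
	}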
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
Only SMP systems use the secondary startup path by definition, so there
is no need for SMP conditionals there.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The build bots complain about iop_handle_irq() not being declared, so
let's make it static instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Rework the vmalloc_seq handling so it can be used safely under SMP, as
we started using it to ensure that vmap'ed stacks are guaranteed to be
mapped by the active mm before switching to a task, and here we need to
ensure that changes to the page tables are visible to other CPUs when
they observe a change in the sequence count.
Since LPAE needs none of this, fold a check against it into the
vmalloc_seq counter check after breaking it out into a separate static
inline helper.
Given that vmap'ed stacks are now also supported on !SMP configurations,
let's drop the WARN() that could potentially now fire spuriously.
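A sketch of the resulting check, assuming an atomic_t counter and a
helper named check_vmalloc_seq():

	static inline void check_vmalloc_seq(struct mm_struct *mm)
	{
		if (!IS_ENABLED(CONFIG_ARM_LPAE) &&
		    unlikely(atomic_read(&mm->context.vmalloc_seq) !=
			     atomic_read(&init_mm.context.vmalloc_seq)))
			__check_vmalloc_seq(mm);
	}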
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Avoid using R9 in the IRQ handler code, as the entry code uses it for
tsk, and expects it to remain untouched between the IRQ entry and exit
code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Use the SMP_ON_UP patching framework to elide HWCAP_TLS tests from the
context switch and return to userspace code paths, as SMP systems are
guaranteed to have this h/w capability.
At the same time, omit the update of __entry_task if the system is
detected to be UP at runtime, as in that case, the value is never used.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Nathan reports the group relocations go out of range in pathological
cases such as allyesconfig kernels, which have little chance of actually
booting but are still used in validation.
So add a Kconfig symbol for this feature, and make it depend on
!COMPILE_TEST.
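The shape such a symbol could take (the name and the linker version
check are assumptions):

	config ARM_HAS_GROUP_RELOCS
		def_bool y
		depends on !LD_IS_LLD || LLD_VERSION >= 140000
		depends on !COMPILE_TEST
		help
		  Whether or not to use R_ARM_ALU_PC_Gn group relocations
		  for direct references to symbols from code.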
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
When onlining a CPU, switch to swapper_pg_dir as soon as possible so
that it is guaranteed that the vmap'ed stack is mapped before it is
used.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Nathan reports that the new get_current() and per-CPU offset accessors
may cause problems at build time due to the use of a literal to hold the
address of the respective variables. This is due to the fact that LLD
before v14 does not support the PC-relative group relocations that are
normally used for this, and the fallback relies on literals but does not
emit the literal pools explicitly using the .ltorg directive.
./arch/arm/include/asm/current.h:53:6: error: out of range pc-relative fixup value
asm(LOAD_SYM_ARMV6(%0, __current) : "=r"(cur));
^
./arch/arm/include/asm/insn.h:25:2: note: expanded from macro 'LOAD_SYM_ARMV6'
" ldr " #reg ", =" #sym " nt"
^
<inline asm>:1:3: note: instantiated into assembly here
ldr r0, =__current
^
Since emitting a literal pool in this particular case is not possible,
let's avoid LOAD_SYM_ARMV6() entirely, and use an ordinary C
assignment instead.
As it turns out, there are other such cases, and here, using .ltorg to
emit the literal pool within range of the LDR instruction would be
possible due to the presence of an unconditional branch right after it.
Unfortunately, putting .ltorg directives in subsections appears to
confuse the Clang inline assembler, resulting in similar errors even
though the .ltorg is most definitely within range.
So let's fix this by emitting the literal explicitly, rather than
relying on the assembler to figure this out. This means we have to move
the fallback out of the LOAD_SYM_ARMV6() macro and into the callers.
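A sketch of the idea: emit the literal word and a branch around it by
hand, instead of relying on the assembler's literal pool machinery:

	ldr	r0, 1f		@ load the address from the literal below
	b	2f		@ branch over the literal word
1:	.long	__current	@ the literal itself, emitted in place
2:	ldr	r0, [r0]	@ dereference to obtain 'current'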
Link: https://github.com/ClangBuiltLinux/linux/issues/1551
Fixes: 9c46929e7989 ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
There are several reports about the new vmap'ed stacks code breaking
suspend/resume on Exynos, Renesas and Tegra SMP platforms. While this is
under investigation, let's disable the vmap'ed stacks feature for the
time being for SMP configurations that have suspend/resume enabled.
[0] https://lore.kernel.org/linux-arm-kernel/20211122092816.2865873-8-ardb@kernel.org/
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Enable support for IRQ stacks on !MMU, and add the code to the IRQ entry
path to switch to the IRQ stack if not running from it already.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
On UP systems, only a single task can be 'current' at the same time,
which means we can use a global variable to track it. This means we can
also enable THREAD_INFO_IN_TASK for those systems, as in that case,
thread_info is accessed via current rather than the other way around,
removing the need to store thread_info at the base of the task stack.
This, in turn, permits us to enable IRQ stacks and vmap'ed stacks on UP
systems as well.
To partially mitigate the performance overhead of this arrangement, use
an ADD/ADD/LDR sequence with the appropriate PC-relative group
relocations to load the value of current when needed. This means that
accessing current will still only require a single load as before,
avoiding the need for a literal to carry the address of the global
variable in each function. However, accessing thread_info will now
require this load as well.
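A sketch of how such a sequence can be emitted, using .reloc directives
so that the linker fixes up the immediates (and may flip ADD to SUB, as
the group relocations permit); the register choice is illustrative:

	.reloc	0f, R_ARM_ALU_PC_G0_NC, __current
	.reloc	1f, R_ARM_ALU_PC_G1_NC, __current
	.reloc	2f, R_ARM_LDR_PC_G2, __current
0:	add	r1, pc, #0	@ immediate rewritten at link time
1:	add	r1, r1, #0	@ immediate rewritten at link time
2:	ldr	r1, [r1, #0]	@ offset rewritten at link time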
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Defer TPIDRURO updates for user space until exit also for CPU_V6+SMP
configurations so that we can decide at runtime whether to use it to
carry the current pointer, provided that we are running on a CPU that
actually implements this register. This is needed for
THREAD_INFO_IN_TASK support for UP systems, which requires that all SMP
capable systems use the TPIDRURO based access to 'current' as the only
remaining alternative will be a global variable which only works on UP.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Enable the use of the TLS register to hold the 'current' pointer also on
non-SMP configurations that target v6k or later CPUs. This will permit
the use of THREAD_INFO_IN_TASK as well as IRQ stacks and vmap'ed stacks
for such configurations.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Permit the use of the TPIDRPRW system register for carrying the per-CPU
offset in generic SMP configurations that also target non-SMP capable
ARMv6 cores. This uses the SMP_ON_UP code patching framework to turn all
TPIDRPRW accesses into reads/writes of entry #0 in the __per_cpu_offset
array.
While at it, switch over some existing direct TPIDRPRW accesses in asm
code to invocations of a new helper that is patched in the same way when
necessary.
Note that CPU_V6+SMP without SMP_ON_UP results in a kernel that does not
boot on v6 CPUs without SMP extensions, so add this dependency to
Kconfig as well.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
We will be adding variable loads to various hot paths, so it makes sense
to add a helper macro that can load variables from asm code without the
use of literal pool entries. On v7 or later, we can simply use MOVW/MOVT
pairs, but on earlier cores, this requires a bit of hackery to emit an
instruction sequence that implements this using a sequence of ADD/LDR
instructions.
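On v7, such a macro could be as simple as the below (the macro name is
illustrative; the pre-v7 ADD/LDR variant is omitted here):

	.macro	load_var, rd:req, sym:req
	movw	\rd, #:lower16:\sym
	movt	\rd, #:upper16:\sym
	ldr	\rd, [\rd]
	.endm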
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Add support for the R_ARM_ALU_PC_Gn_NC and R_ARM_LDR_PC_G2 group
relocations [0] so we can use them in modules. These will be used to
load the current task pointer from a global variable without having to
rely on a literal pool entry to carry the address of this variable,
which may have a significant negative impact on cache utilization for
variables that are used often and in many different places, as each
occurrence will result in a literal pool entry and therefore a line in
the D-cache.
[0] 'ELF for the ARM architecture'
https://github.com/ARM-software/abi-aa/releases
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Tweak the UP stack protector handling code so that the thread info
pointer is preserved in R7 until set_current is called. This is needed
for a subsequent patch that implements THREAD_INFO_IN_TASK and
set_current for UP as well.
This also means we will prefer the per-task protector on UP systems that
implement the thread ID registers, so tweak the preprocessor
conditionals to reflect this.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Rather than restructuring the ARMv7M entry logic per the TODO, just move
NVIC to GENERIC_IRQ_MULTI_HANDLER.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
The last user of arch_irq_handler_default is gone now, so the
entry-macro-multi.S file and all references to mach/entry-macro.S can
be removed, as well as the asm_do_IRQ() entrypoint into the interrupt
handling routines implemented in C.
Note: The ARMv7-M entry still uses its own top-level IRQ entry, calling
nvic_handle_irq() from assembly. This could be changed to go through
generic_handle_arch_irq() as well, but it's unclear to me if there are
any benefits.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: keep irq_handler macro as it carries all the IRQ stack handling]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
|
|
iop32x uses the entry-macro.S file for both the IRQ entry and for
hooking into the arch_ret_to_user code path. This is done because the
cp6 registers have to be enabled before accessing any of the interrupt
controller registers but have to be disabled when running in user space.
There is also a lazy-enable logic in cp6.c, but during a hardirq, we
know it has to be enabled.
Both the cp6-enable code and the code to read the IRQ status can be
lifted into the normal generic_handle_arch_irq() path, but the
cp6-disable code has to remain in the user return code. As nothing
other than iop32x uses this hook, just open-code it there with an
ifdef for the platform that can eventually be removed when iop32x
has reached the end of its life.
The cp6-enable path in the IRQ entry has an extra cp_wait barrier that
the trap version does not have, but it is harmless to do it in both
cases to simplify the logic here at the cost of a few extra cycles
for the trap.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
iop32x is one of the last platforms to use IRQ 0, and this has apparently
stopped working in a 2014 cleanup without anyone noticing. This interrupt
is used for the DMA engine, so most likely this has not actually worked
in the past 7 years, but it's also not essential for using this board.
I'm splitting out this change from my GENERIC_IRQ_MULTI_HANDLER
conversion so it can be backported if anyone cares.
Fixes: a71b092a9c68 ("ARM: Convert handle_IRQ to use __handle_domain_irq")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: take +1 offset into account in mask/unmask and init as well]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Footbridge still uses the classic IRQ entry path in assembler,
but this is easily converted into an equivalent C version.
In this case, the correlation between IRQ numbers and bits in
the status register is non-obvious, and the priorities are
handled by manually checking each bit in a static order,
re-reading the status register after each handled event.
I moved the code into the new file and edited the syntax without
changing this sequence to keep the behavior as close as possible
to what it traditionally did.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This is one of the last platforms using the old entry path.
While this code path is spread over a few files, it is fairly
straightforward to convert it into an equivalent C version,
leaving the existing algorithm and all the priority handling
the same.
Unlike most irqchip drivers, this means reading the status
register(s) in a loop and always handling the highest-priority
irq first.
The IOMD_IRQREQC and IOMD_IRQREQD registers are not actually
used here, but I left the code in place for the time being,
to keep the conversion as direct as possible. It could be
removed in a cleanup on top.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ardb: drop obsolete IOMD_IRQREQC/IOMD_IRQREQD handling]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Neither IOMD_IRQREQC nor IOMD_IRQREQD is ever defined, so any
conditionally compiled code that depends on them is dead code, and can
be removed.
Suggested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Wire up the generic support for managing task stack allocations via vmalloc,
and implement the entry code that detects whether we faulted because of a
stack overrun (or a future stack overrun caused by pushing the pt_regs
array).
While this adds a fair amount of tricky entry asm code, it should be
noted that it only adds a TST + branch to the svc_entry path. The code
implementing the non-trivial handling of the overflow stack is emitted
out-of-line into the .text section.
Since on ARM, we rely on do_translation_fault() to keep PMD level page
table entries that cover the vmalloc region up to date, we need to
ensure that we don't hit such a stale PMD entry when accessing the
stack. So we do a dummy read from the new stack while still running from
the old one on the context switch path, and bump the vmalloc_seq counter
when PMD level entries in the vmalloc range are modified, so that the MM
switch fetches the latest version of the entries.
Note that we need to increase the per-mode stack by 1 word, to gain some
space to stash a GPR until we know it is safe to touch the stack.
However, due to the cacheline alignment of the struct, this does not
actually increase the memory footprint of the struct stack array at all.
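Conceptually, the TST + branch added to the svc_entry hot path amounts
to something like this (a sketch, assuming stacks aligned to twice
their size so that a single bit distinguishes a healthy SP from an
overflowed one; the label and branch polarity are illustrative):

	sub	sp, sp, #SVC_REGS_SIZE	@ make room for the register frame
	tst	sp, #THREAD_SIZE	@ did SP leave the stack's half?
	bne	.Lstack_overflow	@ rare case, handled out of line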
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
The original Thumb-2 enablement patches updated the stack realignment
code in svc_entry to work around the lack of a STMIB instruction in
Thumb-2, by subtracting 4 from the frame size, inverting the sense of
the misalignment check, and changing to a STMIA instruction and a final
stack push of a 4 byte quantity that results in the stack becoming
aligned at the end of the sequence. It also pushes and pops R0 to the
stack in order to have a temp register that Thumb-2 allows in general
purpose ALU instructions, as TST using SP is not permitted.
Both are a bit problematic for vmap'ed stacks, as using the stack is
only permitted after we decide that we did not overflow the stack, or
have already switched to the overflow stack.
As for the alignment check: the current approach creates a corner case
where, if the initial SUB of SP ends up right at the start of the stack,
we will end up subtracting another 8 bytes and overflowing it. This
means we would need to add the overflow check *after* the SUB that
deliberately misaligns the stack. However, this would require us to keep
local state (i.e., whether we performed the subtract or not) across the
overflow check, but without any GPRs or stack available.
So let's switch to an approach where we don't use the stack, and where
the alignment check of the stack pointer occurs in the usual way, as
this is guaranteed not to result in overflow. This means we will be able
to do the overflow check first.
While at it, switch to R1 so the mode stack pointer in R0 remains
accessible.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
The load-multiple instruction that essentially performs the switch_to
operation in ARM mode, by loading all callee save registers as well as
the
stack pointer and the program counter, is split into 3 separate loads
for Thumb-2, with the IP register used as a temporary to capture the
value of R4 before it gets overwritten.
We can clean this up a bit, by sticking with a single LDMIA instruction,
but one that pops SP and PC into IP and LR, respectively, and by using
ordinary move register and branch instructions to get those values into
SP and PC. This also allows us to move the set_current call closer to
the assignment of SP, reducing the window where those are mutually out
of sync. This is especially relevant for CONFIG_VMAP_STACK, which is
being introduced in a subsequent patch, where we need to issue a load
that might fault from the new stack while running from the old one, to
ensure that stale PMD entries in the VMALLOC space are synced up.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
When unwinding the stack from a stack overflow, we are likely to start
from a stack push instruction, given that this is the most common way to
grow the stack for compiler-emitted code. This push instruction rarely
appears anywhere other than at offset 0x0 of the function, and when it
doesn't, the compiler tends to split up the unwind annotations, given
that the stack frame layout is apparently not the same throughout the
function.
This means that, in the general case, if the frame's PC points at the
first instruction covered by a certain unwind entry, there is no way the
stack frame that the unwind entry describes could have been created yet,
and so we are still on the stack frame of the caller in that case. So
treat this as a special case, and return with the new PC taken from the
frame's LR, without applying the unwind transformations to the virtual
register set.
This permits us to unwind the call stack on stack overflow when the
overflow was caused by a stack push on function entry.
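A simplified sketch of the resulting check, using the unwinder's
existing prel31 helpers:

	if (frame->pc == prel31_to_addr(&idx->addr_offset)) {
		/*
		 * The frame this entry describes cannot exist yet at
		 * this PC, so we must still be running on the caller's
		 * frame: return via the current value of LR instead.
		 */
		frame->pc = frame->lr;
		return URC_OK;
	}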
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
The memset implementation carves the code up into different sections,
each covered by its own unwind info. In this case, it is done in a way
similar to how the compiler might do it, to disambiguate between parts
where the return address is in LR and the SP is unmodified, and parts
where a stack frame is live, and the unwinder needs to know the size of
the stack frame and the location of the return address within it.
Only the placement of the unwind directives is slightly odd: the stack
pushes are placed in the wrong sections, which may confuse the unwinder
when attempting to unwind with PC pointing at the stack push in
question.
So let's fix this up, by reordering the directives and instructions as
appropriate.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
The memmove routine is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction accurate, and does not allow
tracking the SP value as it changes.
Commit 207a6cb06990c ("ARM: 8224/1: Add unwinding support for memmove
function") addressed this by carving up the function in different chunks
as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding with the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.
Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.
Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the function, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.
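A sketch of the resulting prologue, with fpreg standing for R7 or R11
depending on the ISA (assuming such an alias exists; the register list
is illustrative):

UNWIND(	.fnstart			)
UNWIND(	.save	{r0, r4, fpreg, lr}	)
	push	{r0, r4, fpreg, lr}
UNWIND(	.setfp	fpreg, sp		)
	mov	fpreg, sp	@ SP may move later; unwind via fpreg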
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
The memcpy template is a bit unusual in the way it manages the stack
pointer: depending on the execution path through the function, the SP
assumes different values as different subsets of the register file are
preserved and restored again. This is problematic when it comes to EHABI
unwind info, as it is not instruction accurate, and does not allow
tracking the SP value as it changes.
Commit 279f487e0b471 ("ARM: 8225/1: Add unwinding support for memory
copy functions") addressed this by carving up the function in different
chunks as far as the unwinder is concerned, and keeping a set of unwind
directives for each of them, each corresponding with the state of the
stack pointer during execution of the chunk in question. This not only
duplicates unwind info unnecessarily, but it also complicates unwinding
the stack upon overflow.
Instead, let's do what the compiler does when the SP is updated halfway
through a function, which is to use a frame pointer and emit the
appropriate unwind directives to communicate this to the unwinder.
Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's
avoid touching R7 in the body of the template, so that Thumb-2 can use
it as the frame pointer. R11 was not modified in the first place.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Now that we have enabled IRQ stacks, any softIRQs that are handled on
the back of a hard IRQ will run from the IRQ stack as well. However, any
synchronous softirq processing that happens when re-enabling softIRQs
from task context will still execute on that task's stack.
Since any call to local_bh_enable() at any level in the task's call
stack may trigger a softIRQ processing run, which could potentially
cause a task stack overflow if the combined stack footprints exceed the
stack's size, let's run these synchronous invocations of do_softirq() on
the IRQ stack as well.
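A sketch of the resulting hook, assuming a per-CPU irq_stack_ptr
variable (call_with_stack() is the existing ARM helper for running a
function on another stack):

	static void ____do_softirq(void *arg)
	{
		__do_softirq();
	}

	void do_softirq_own_stack(void)
	{
		call_with_stack(____do_softirq, NULL,
				__this_cpu_read(irq_stack_ptr));
	}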
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|
|
Restructure the code and add the unwind annotations so that both the
frame pointer unwinder as well as the EHABI unwind info based unwinder
will be able to follow the call stack through call_with_stack().
Since GCC and Clang use different formats for the stack frame, two
methods are implemented: a GCC version that pushes fp, sp, lr and pc for
compatibility with the frame pointer unwinder, and a second version that
works with Clang, as well as with the EHABI unwinder both in ARM and
Thumb2 modes.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Keith Packard <keithpac@amazon.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
|