summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/entry_64.S
AgeCommit message (Collapse)AuthorFilesLines
2018-01-21Merge branch 'fixes' into nextMichael Ellerman1-8/+36
Merge our fixes branch from the 4.15 cycle. Unusually the fixes branch saw some significant features merged, notably the RFI flush patches, so we want the code in next to be tested against that, to avoid any surprises when the two are merged. There's also some other work on the panic handling that was reverted in fixes and we now want to do properly in next, which would conflict. And we also fix a few other minor merge conflicts.
2018-01-19powerpc: Add new kconfig CONFIG_PPC_IRQ_SOFT_MASK_DEBUGMadhavan Srinivasan1-2/+2
New Kconfig is added "CONFIG_PPC_IRQ_SOFT_MASK_DEBUG" to add WARN_ON to alert the invalid transitions. Also moved the code under the CONFIG_TRACE_IRQFLAGS in arch_local_irq_restore() to new Kconfig. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Fix name of CONFIG option in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64s: Add support to mask perf interrupts and replay themMadhavan Srinivasan1-0/+5
Two new bit mask field "IRQ_DISABLE_MASK_PMU" is introduced to support the masking of PMI and "IRQ_DISABLE_MASK_ALL" to aid interrupt masking checking. Couple of new irq #defs "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*" added to use in the exception code to check for PMI interrupts. In the masked_interrupt handler, for PMIs we reset the MSR[EE] and return. In the __check_irq_replay(), replay the PMI interrupt by calling performance_monitor_common handler. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Rename soft_enabled to irq_soft_maskMadhavan Srinivasan1-6/+6
Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Change soft_enabled from flag to bitmaskMadhavan Srinivasan1-11/+10
"paca->soft_enabled" is used as a flag to mask some of interrupts. Currently supported flags values and their details: soft_enabled MSR[EE] 0 0 Disabled (PMI and HMI not masked) 1 1 Enabled "paca->soft_enabled" is initialized to 1 to make the interripts as enabled. arch_local_irq_disable() will toggle the value when interrupts needs to disbled. At this point, the interrupts are not actually disabled, instead, interrupt vector has code to check for the flag and mask it when it occurs. By "mask it", it update interrupt paca->irq_happened and return. arch_local_irq_restore() is called to re-enable interrupts, which checks and replays interrupts if any occured. Now, as mentioned, current logic doesnot mask "performance monitoring interrupts" and PMIs are implemented as NMI. But this patchset depends on local_irq_* for a successful local_* update. Meaning, mask all possible interrupts during local_* update and replay them after the update. So the idea here is to reserve the "paca->soft_enabled" logic. New values and details: soft_enabled MSR[EE] 1 0 Disabled (PMI and HMI not masked) 0 1 Enabled Reason for the this change is to create foundation for a third mask value "0x2" for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is set to a value "3", PMI interrupts are mask and when set to a value of "1", PMI are not mask. With this patch also extends soft_enabled as interrupt disable mask. Current flags are renamed from IRQ_[EN?DIS}ABLED to IRQS_ENABLED and IRQS_DISABLED. Patch also fixes the ptrace call to force the user to see the softe value to be alway 1. Reason being, even though userspace has no business knowing about softe, it is part of pt_regs. Like-wise in signal context. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Add #defines for paca->soft_enabled flagsMadhavan Srinivasan1-8/+8
Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-18powerpc/64: rtas avoid accessing paca in 32-bit modeNicholas Piggin1-6/+11
Commit 177ba7c647f3 ("powerpc/mm/radix: Limit paca allocation in radix") limited the paca allocation address to 1G on pSeries because RTAS return accesses the paca in 32-bit mode: On return from RTAS we access the paca variables and we have 64 bit disabled. This requires us to limit paca in 32 bit range. Fix this by setting ppc64_rma_size to first_memblock_size/1G range. Avoid this limit by switching to 64-bit mode before accessing any memory. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-10powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNELNicholas Piggin1-2/+16
Similar to the syscall return path, in fast_exception_return we may be returning to user or kernel context. We already have a test for that, because we conditionally restore r13. So use that existing test and branch, and bifurcate the return based on that. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-10powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNELNicholas Piggin1-1/+11
In the syscall exit path we may be returning to user or kernel context. We already have a test for that, because we conditionally restore r13. So use that existing test and branch, and bifurcate the return based on that. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-10powerpc/64s: Simple RFI macro conversionsNicholas Piggin1-5/+9
This commit does simple conversions of rfi/rfid to the new macros that include the expected destination context. By simple we mean cases where there is a single well known destination context, and it's simply a matter of substituting the instruction for the appropriate macro. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11powerpc/64: Don't trace irqs-off at interrupt return to soft-disabled contextNicholas Piggin1-3/+7
When an interrupt is returning to a soft-disabled context (which can happen for non-maskable interrupts or synchronous interrupts), it goes through the motions of soft-disabling again, including calling TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). This is not necessary, because we must already be soft-disabled in the interrupt context, it also may be causing crashes in the irq tracing code to re-enter as an nmi. Replace it with a warning to ensure that soft-interrupts are still disabled. Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64Michael Ellerman1-2/+2
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU found in "server" processors, from IBM mainly. Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's annoying to have two symbols that always have the same value, it's not quite annoying enough to bother removing one. However with the arrival of Power9, we now have the situation where CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the Radix MMU - *not* the "standard" MMU. So it is now actively confusing to use it, because it implies that code is disabled or inactive when the Radix MMU is in use, however that is not necessarily true. So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor formatting updates of some of the affected lines. This will be a pain for backports, but c'est la vie. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/asm: Convert .llong directives to .8byteTobin C. Harding1-1/+1
.llong is an undocumented PPC specific directive. The generic equivalent is .quad, but even better (because it's self describing) is .8byte. Convert all .llong directives to .8byte. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64: Remove redundant instruction in interrupt replayNicholas Piggin1-1/+0
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Merge HV and non-HV paths for doorbell IRQ replayNicholas Piggin1-5/+1
This results in smaller code, and fewer branches. This relies on the fact that both the 0xe80 and 0xa00 handlers call the same upper level code, namely doorbell_exception(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-07Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"Michael Ellerman1-42/+18
This reverts commit bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f. As reported by Andreas, this commit is causing unrecoverable SLB misses in the system call exit path: Unrecoverable exception 4100 at c00000000000a1ec Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2 PowerMac ... CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1 task: c00000018335e080 task.stack: c000000139e50000 NIP: c00000000000a1ec LR: c00000000000a118 CTR: 0000000000000000 REGS: c000000139e53bb0 TRAP: 4100 Not tainted (4.13.0-rc3) MSR: 9000000000001030 <SF,HV,ME,IR,DR> CR: 24000044 XER: 20000000 SOFTE: 1 GPR00: 0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe GPR04: c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080 GPR08: 900000000000d032 0000000000000000 0000000000000002 fffffffffffff001 GPR12: c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88 GPR16: 0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000 GPR20: 00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000 GPR24: 00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000 GPR28: 00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10 NIP [c00000000000a1ec] system_call_exit+0xc0/0x21c LR [c00000000000a118] system_call+0x58/0x6c Call Trace: [c000000139e53e30] [c00000000000a118] system_call+0x58/0x6c (unreliable) Instruction dump: 64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000 60000000 7c004039 4082001c e8ed0170 <88070b78> 88c70b79 7c003214 2c200000 This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an SLB miss on the thread struct. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Diagnosed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Blacklist rtas entry/exit from kprobesNaveen N. Rao1-0/+4
We can't take traps with relocation off, so blacklist enter_rtas() and rtas_return_loc(). However, instead of blacklisting all of enter_rtas(), introduce a new symbol __enter_rtas from where on we can't take a trap and blacklist that. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Blacklist functions invoked on a trapNaveen N. Rao1-13/+22
Blacklist all functions involved while handling a trap. We: - convert some of the symbols into private symbols, and - blacklist most functions involved while handling a trap. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Un-blacklist system_call() from kprobesNaveen N. Rao1-1/+13
It is actually safe to probe system_call() in entry_64.S, but only till we unset MSR_RI. To allow this, add a new symbol system_call_exit() after the mtmsrd and blacklist that. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Move system_call() symbol to just after setting MSR_EENaveen N. Rao1-3/+4
It is common to get a PMU interrupt right after the mtmsr instruction that enables interrupts. Due to this, the stack trace profile gets needlessly split across system_call_common() and system_call(). Previously, system_call() symbol was at the current place to hide a few earlier symbols which have since been made private or removed entirely. So, let's move system_call() slightly higher up, right after the mtmsr instruction that enables interrupts. Convert existing references to system_call to a local syscall symbol. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Blacklist system_call() and system_call_common() from kprobesNaveen N. Rao1-12/+14
Convert some of the symbols into private symbols and blacklist system_call_common() and system_call() from kprobes. We can't take a trap at parts of these functions as either MSR_RI is unset or the kernel stack pointer is not yet setup. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Don't convert system_call_common to _GLOBAL()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15powerpc/64s: Avoid cpabort in context switch when possibleNicholas Piggin1-9/+0
The ISA v3.0B copy-paste facility only requires cpabort when switching to a process that has foreign real addresses mapped (direct access to accelerators), to clear a potential copy buffer filled by a previous thread. There is no accelerator driver implemented yet, so cpabort can be removed. It can be be re-added when a driver is implemented. POWER9 DD1 requires the copy buffer to always be cleared on context switch, but if accelerators are not in use, then an unpaired copy from a dummy region is sufficient to clear data out of the copy buffer. This increases context switch performance by about 5% on POWER9. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15powerpc/64: Drop explicit hwsync in context switchNicholas Piggin1-6/+17
The sync (aka. hwsync, aka. heavyweight sync) in the context switch code to prevent MMIO access being reordered from the point of view of a single process if it gets migrated to a different CPU is not required because there is an hwsync performed earlier in the context switch path. Comment this so it's clear enough if anything changes on the scheduler or the powerpc sides. Remove the hwsync from _switch. This improves context switch performance by 2-3% on POWER8. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15powerpc/64: Drop reservation-clearing ldarx in context switchNicholas Piggin1-8/+3
There is no need to explicitly break the reservation in _switch, because we are guaranteed that the context switch path will include a larx/stcx. Comment the guarantee and remove the reservation clear from _switch. This is worth 1-2% in context switch performance. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15powerpc/64s: Leave interrupts hard enabled in context switch for radixNicholas Piggin1-0/+8
Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug") hard disabled interrupts over the low level context switch, because the SLB management can't cope with a PMU interrupt accesing the stack in that window. Radix based kernel mapping does not use the SLB so it does not require interrupts hard disabled here. This is worth 1-2% in context switch performance on POWER9. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15powerpc/64: Avoid restore_math call if possible in syscall exitNicholas Piggin1-19/+43
The syscall exit code that branches to restore_math is quite heavy on Book3S, consisting of 2 mtmsr instructions. Threads that don't use both FP and vector can get caught here if the kernel ever uses FP or vector. Lazy-FP/vec context switching also trips this case. So check for lazy FP and vector before switching RI for restore_math. Move most of this case out of line. For threads that do want to restore math registers, the MSR switches are still suboptimal. Future direction may be to use a soft-RI bit to avoid MSR switches in kernel (similar to soft-EE), but for now at least the no-restore POWER9 context switch rate increases by about 5% due to sched_yield(2) return performance. I haven't constructed a test to measure the syscall cost. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-05Merge tag 'powerpc-4.12-1' of ↵Linus Torvalds1-380/+0
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerpc/book3s/mce: Move add_taint() later in virtual mode powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body powerpc/smp: Document irq enable/disable after migrating IRQs powerpc/mpc52xx: Don't select user-visible RTAS_PROC powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset() powerpc/eeh: Clean up and document event handling functions powerpc/eeh: Avoid use after free in eeh_handle_special_event() cxl: Mask slice error interrupts after first occurrence cxl: Route eeh events to all drivers in cxl_pci_error_detected() cxl: Force context lock during EEH flow powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST powerpc/xmon: Teach xmon oops about radix vectors powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids powerpc/pseries: Enable VFIO powerpc/powernv: Fix iommu table size calculation hook for small tables powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc powerpc: Add arch/powerpc/tools directory ...
2017-04-27powerpc: Split ftrace bits into a separate fileNaveen N. Rao1-378/+0
entry_*.S now includes a lot more than just kernel entry/exit code. As a first step at cleaning this up, let's split out the ftrace bits into separate files. Also move all related tracing code into a new trace/ subdirectory. No functional changes. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-25Merge branch 'topic/kprobes' into nextMichael Ellerman1-6/+7
Although most of these kprobes patches are powerpc specific, there's a couple that touch generic code (with Acks). At the moment there's one conflict with acme's tree, but it's not too bad. Still just in case some other conflicts show up, we've put these in a topic branch so another tree could merge some or all of it if necessary.
2017-04-24powerpc/ftrace: Restore LR from pt_regsNaveen N. Rao1-6/+7
Pass the real LR to the ftrace handler. This is needed for KPROBES_ON_FTRACE for the pre handlers. Also, with KPROBES_ON_FTRACE, the link register may be updated by the pre handlers or by a registed kretprobe. Honor updated LR by restoring it from pt_regs, rather than from the stack save area. Live patch and function graph continue to work fine with this change. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/ftrace: Move stack setup and teardown code into ftrace_graph_caller()Naveen N. Rao1-4/+1
Move the stack setup and teardown code into ftrace_graph_caller(). This way, we don't incur the cost of setting it up unless function graph is enabled for this function. Also, remove the extraneous LR restore code after the function graph stub. LR has previously been restored and neither livepatch_handler() nor ftrace_graph_caller() return back here. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Drop bad change to non-mprofile-kernel version of ftrace_graph_caller] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-18powerpc/kprobe: Fix oops when kprobed on 'stdu' instructionRavi Bangoria1-3/+3
If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel OOPS: Bad kernel stack pointer cd93c840 at c000000000009868 Oops: Bad kernel stack pointer, sig: 6 [#1] ... GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840 ... NIP [c000000000009868] resume_kernel+0x2c/0x58 LR [c000000000006208] program_check_common+0x108/0x180 On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does not emulate actual store in emulate_step() because it may corrupt the exception frame. So the kernel does the actual store operation in exception return code i.e. resume_kernel(). resume_kernel() loads the saved stack pointer from memory using lwz, which only loads the low 32-bits of the address, causing the kernel crash. Fix this by loading the 64-bit value instead. Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> [mpe: Change log massage, add stable tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-10-14Merge branch 'kbuild' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - EXPORT_SYMBOL for asm source by Al Viro. This does bring a regression, because genksyms no longer generates checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is working on a patch to fix this. Plus, we are talking about functions like strcpy(), which rarely change prototypes. - Fixes for PPC fallout of the above by Stephen Rothwell and Nick Piggin - fixdep speedup by Alexey Dobriyan. - preparatory work by Nick Piggin to allow architectures to build with -ffunction-sections, -fdata-sections and --gc-sections - CONFIG_THIN_ARCHIVES support by Stephen Rothwell - fix for filenames with colons in the initramfs source by me. * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits) initramfs: Escape colons in depfile ppc: there is no clear_pages to export powerpc/64: whitelist unresolved modversions CRCs kbuild: -ffunction-sections fix for archs with conflicting sections kbuild: add arch specific post-link Makefile kbuild: allow archs to select link dead code/data elimination kbuild: allow architectures to use thin archives instead of ld -r kbuild: Regenerate genksyms lexer kbuild: genksyms fix for typeof handling fixdep: faster CONFIG_ search ia64: move exports to definitions sparc32: debride memcpy.S a bit [sparc] unify 32bit and 64bit string.h sparc: move exports to definitions ppc: move exports to definitions arm: move exports to definitions s390: move exports to definitions m68k: move exports to definitions alpha: move exports to actual definitions x86: move exports to actual definitions ...
2016-09-20powerpc/64s: Optimise MSR handling in exception handlingNicholas Piggin1-12/+9
mtmsrd with L=1 only affects MSR_EE and MSR_RI bits, and we always know what state those bits are, so the kernel MSR does not need to be loaded when modifying them. mtmsrd is often in the critical execution path, so avoiding dependency on even L1 load is noticable. On a POWER8 this saves about 3 cycles from the syscall path, and possibly a few from other exception returns (not measured). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-29powerpc/tm: do not use r13 for tabort_syscallNicholas Piggin1-6/+6
tabort_syscall runs with RI=1, so a nested recoverable machine check will load the paca into r13 and overwrite what we loaded it with, because exceptions returning to privileged mode do not restore r13. Fixes: b4b56f9ecab4 (powerpc/tm: Abort syscalls in active transactions) Cc: stable@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-07ppc: move exports to definitionsAl Viro1-0/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-01powerpc/mm: Make MMU_FTR_RADIX a MMU family featureAneesh Kumar K.V1-1/+1
MMU feature bits are defined such that we use the lower half to present MMU family features. Remove the strict split of half and also move Radix to a mmu family feature. Radix introduce a new MMU model and strictly speaking it is a new MMU family. This also free up bits which can be used for individual features later. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-09powerpc32: provide VIRT_CPU_ACCOUNTINGChristophe Leroy1-3/+3
This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data. In order to reuse on PPC32 the PPC64 functions, all u64 data has been replaced by 'unsigned long' so that it is u32 on PPC32 and u64 on PPC64 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-06-14powerpc: Define and use PPC64_ELF_ABI_v2/v1Michael Ellerman1-1/+1
We're approaching 20 locations where we need to check for ELF ABI v2. That's fine, except the logic is a bit awkward, because we have to check that _CALL_ELF is defined and then what its value is. So check it once in asm/types.h and define PPC64_ELF_ABI_v2 when ELF ABI v2 is detected. We also have a few places where what we're really trying to check is that we are using the 64-bit v1 ABI, ie. function descriptors. So also add a #define for that, which simplifies several checks. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related codeAneesh Kumar K.V1-2/+5
We also use MMU_FTR_RADIX to branch out from code path specific to hash. No functionality change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27powerpc: Add support for userspace P9 copy pasteChris Smart1-0/+9
The copy paste facility introduced in POWER9 provides an optimised mechanism for a userspace application to copy a cacheline. This is provided by a pair of instructions, copy and paste, while a third, cp_abort (copy paste abort), provides a clean up of the state in case of a failure. The copy instruction will read a 128 byte cacheline and store it in an internal buffer. The subsequent paste instruction will store this internal buffer to memory and set a CR field if the paste succeeds. Since the state of the copy paste buffer is internal (and not architecturally visible), in the unlikely event of a context switch, the state cannot be stored and the paste should therefore fail. The cp_abort instruction exists to fail and clean up any such interrupted copy paste sequence and is to be called by the kernel as part of the context switch. Doing so prevents data from a preceding copy in one process leaking into the paste of another. This code enables use of the cp_abort instruction if a supported processor is detected. NOTE: this is for userspace only, not in kernel, and does not deal with KVM guests. Patch created with much assistance from Michael Neuling <mikey@neuling.org> Signed-off-by: Chris Smart <chris@distroguy.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-18Merge branch 'topic/livepatch' into nextMichael Ellerman1-0/+97
Merge the support for live patching on ppc64le using mprofile-kernel. This branch has also been merged into the livepatching tree for v4.7.
2016-04-14powerpc/livepatch: Add live patching support on ppc64leMichael Ellerman1-0/+97
Add the kconfig logic & assembly support for handling live patched functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn depends on the new -mprofile-kernel ftrace ABI, which is only supported currently on ppc64le. Live patching is handled by a special ftrace handler. This means it runs from ftrace_caller(). The live patch handler modifies the NIP so as to redirect the return from ftrace_caller() to the new patched function. However there is one particularly tricky case we need to handle. If a function A calls another function B, and it is known at link time that they share the same TOC, then A will not save or restore its TOC, and will call the local entry point of B. When we live patch B, we replace it with a new function C, which may not have the same TOC as A. At live patch time it's too late to modify A to do the TOC save/restore, so the live patching code must interpose itself between A and C, and do the TOC save/restore that A omitted. An additionaly complication is that the livepatch code can not create a stack frame in order to save the TOC. That is because if C takes > 8 arguments, or is varargs, A will have written the arguments for C in A's stack frame. To solve this, we introduce a "livepatch stack" which grows upward from the base of the regular stack, and is used to store the TOC & LR when calling a live patched function. When the patched function returns, we retrieve the real LR & TOC from the livepatch stack, restore them, and pop the livepatch "stack frame". Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Torsten Duwe <duwe@suse.de> Reviewed-by: Balbir Singh <bsingharora@gmail.com>
2016-03-16powerpc: Fix unrecoverable SLB miss during restore_math()Cyril Bur1-0/+9
Commit 70fe3d9 "powerpc: Restore FPU/VEC/VSX if previously used" introduces a call to restore_math() late in the syscall return path, after MSR_RI has been cleared. The MSR_RI flag is used to indicate whether the kernel can take another exception or not. A cleared MSR_RI flag indicates that the kernel cannot. Unfortunately when a machine is under SLB pressure an SLB miss can occur in restore_math() which (with MSR_RI cleared) leads to an unrecoverable exception. Unrecoverable exception 4100 at c0000000000088d8 cpu 0x0: Vector: 4100 at [c0000003fa473b20] pc: c0000000000088d8: .load_vr_state+0x70/0x110 lr: c00000000000f710: .restore_math+0x130/0x188 sp: c0000003fa473da0 msr: 9000000002003030 current = 0xc0000007f876f180 paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01 pid = 1944, comm = K08umountfs [link register ] c00000000000f710 .restore_math+0x130/0x188 [c0000003fa473da0] c0000003fa473e30 (unreliable) [c0000003fa473e30] c000000000007b6c system_call+0x84/0xfc The clearing of MSR_RI is actually an optimisation to avoid multiple MSR writes, what must be disabled are interrupts. See comment in entry_64.S: /* * For performance reasons we clear RI the same time that we * clear EE. We only need to clear RI just before we restore r13 * below, but batching it with EE saves us one expensive mtmsrd call. * We have to be careful to restore RI if we branch anywhere from * here (eg syscall_exit_work). */ At the point of calling restore_math() r13 has not been restored, as such, the quick fix of turning MSR_RI back on for the call to restore_math() will eliminate the occurrence of an unrecoverable exception. We'd like to do a better fix in future. Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-11Merge branch 'topic/mprofile-kernel' into nextMichael Ellerman1-1/+165
Merge the ftrace changes to support -mprofile-kernel on ppc64le. This is a prerequisite for live patching, the support for which will be merged via the livepatch tree based on this topic branch.
2016-03-07powerpc/ftrace: Add support for -mprofile-kernel ftrace ABITorsten Duwe1-1/+165
The gcc switch -mprofile-kernel defines a new ABI for calling _mcount() very early in the function with minimal overhead. Although mprofile-kernel has been available since GCC 3.4, there were bugs which were only fixed recently. Currently it is known to work in GCC 4.9, 5 and 6. Additionally there are two possible code sequences generated by the flag, the first uses mflr/std/bl and the second is optimised to omit the std. Currently only gcc 6 has the optimised sequence. This patch supports both sequences. Initial work started by Vojtech Pavlik, used with permission. Key changes: - rework _mcount() to work for both the old and new ABIs. - implement new versions of ftrace_caller() and ftrace_graph_caller() which deal with the new ABI. - updates to __ftrace_make_nop() to recognise the new mcount calling sequence. - updates to __ftrace_make_call() to recognise the nop'ed sequence. - implement ftrace_modify_call(). - updates to the module loader to surpress the toc save in the module stub when calling mcount with the new ABI. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-02powerpc: Restore FPU/VEC/VSX if previously usedCyril Bur1-3/+18
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a problem unless a process is using these facilities. Modern versions of GCC are very good at automatically vectorising code, new and modernised workloads make use of floating point and vector facilities, even the kernel makes use of vectorised memcpy. All this combined greatly increases the cost of a syscall since the kernel uses the facilities sometimes even in syscall fast-path making it increasingly common for a thread to take an *_unavailable exception soon after a syscall, not to mention potentially taking all three. The obvious overcompensation to this problem is to simply always load all the facilities on every exit to userspace. Loading up all FPU, VEC and VSX registers every time can be expensive and if a workload does avoid using them, it should not be forced to incur this penalty. An 8bit counter is used to detect if the registers have been used in the past and the registers are always loaded until the value wraps to back to zero. Several versions of the assembly in entry_64.S were tested: 1. Always calling C. 2. Performing a common case check and then calling C. 3. A complex check in asm. After some benchmarking it was determined that avoiding C in the common case is a performance benefit (option 2). The full check in asm (option 3) greatly complicated that codepath for a negligible performance gain and the trade-off was deemed not worth it. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Move load_vec in the struct to fill an existing hole, reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> fixup
2015-12-17powerpc/kernel: Open code SET_DEFAULT_THREAD_PPRMichael Ellerman1-1/+7
This is only used in one location, open code it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17powerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPRMichael Ellerman1-1/+5
HMT_MEDIUM_LOW_HAS_PPR is only used in once place, open code it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-01powerpc: Remove redundant mflr in _switchAnton Blanchard1-3/+1
No need to execute mflr twice. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>