summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-03-02powerpc/mm: Check secondary hash page tableRashmica Gupta1-1/+1
We were always calling base_hpte_find() with primary = true, even when we wanted to check the secondary table. mpe: I broke this when refactoring Rashmica's original patch. Fixes: 1515ab932156 ("powerpc/mm: Dump hash table") Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02powerpc: remove nargs from __SYSCALLFiroz Khan3-6/+6
The __SYSCALL macro's arguments are system call number, system call entry name and number of arguments for the system call. Argument- nargs in __SYSCALL(nr, entry, nargs) is neither calculated nor used anywhere. So it would be better to keep the implementaion as __SYSCALL(nr, entry). This will unifies the implementation with some other architetures too. Signed-off-by: Firoz Khan <firoz.khan@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-9/+17
Merge another commit in the topic/ppc-kvm branch we're sharing with kvm-ppc.
2019-03-02powerpc/64s: Fix unrelocated interrupt trampoline address testNicholas Piggin4-16/+16
The recent commit got this test wrong, it declared the assembler symbols the wrong way, and also used the wrong symbol name (xxx_start rather than start_xxx, see asm/head-64.h). Fixes: ccd477028a ("powerpc/64s: Fix HV NMI vs HV interrupt recoverability test") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-28powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tablesAlexey Kardashevskiy2-2/+6
We store 2 multilevel tables in iommu_table - one for the hardware and one with the corresponding userspace addresses. Before allocating the tables, the iommu_table_group_ops::get_table_size() hook returns the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts the locked_vm counter correctly. When the table is actually allocated, the amount of allocated memory is stored in iommu_table::it_allocated_size and used to decrement the locked_vm counter when we release the memory used by the table; .get_table_size() and .create_table() calculate it independently but the result is expected to be the same. However the allocator does not add the userspace table size to .it_allocated_size so when we destroy the table because of VFIO PCI unplug (i.e. VFIO container is gone but the userspace keeps running), we decrement locked_vm by just a half of size of memory we are releasing. To make things worse, since we enabled on-demand allocation of indirect levels, it_allocated_size contains only the amount of memory actually allocated at the table creation time which can just be a fraction. It is not a problem with incrementing locked_vm (as get_table_size() value is used) but it is with decrementing. As the result, we leak locked_vm and may not be able to allocate more IOMMU tables after few iterations of hotplug/unplug. This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table() hook to what pnv_pci_ioda2_get_table_size() returns so from now on we have a single place which calculates the maximum memory a table can occupy. The original meaning of it_allocated_size is somewhat lost now though. We do not ditch it_allocated_size whatsoever here and we do not call get_table_size() from vfio_iommu_spapr_tce.c when decrementing locked_vm as we may have multiple IOMMU groups per container and even though they all are supposed to have the same get_table_size() implementation, there is a small chance for failure or confusion. Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-27powerpc/fsl: Fix the flush of branch predictor.Christophe Leroy1-0/+1
The commit identified below adds MC_BTB_FLUSH macro only when CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error on some configs (seen several times with kisskb randconfig_defconfig) arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush' make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1 make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2 make[1]: *** [Makefile:1043: arch/powerpc] Error 2 make: *** [Makefile:152: sub-make] Error 2 This patch adds a blank definition of MC_BTB_FLUSH for other cases. Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)") Cc: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-27powerpc/powernv: Make opal log only readable by rootJordan Niethe1-1/+1
Currently the opal log is globally readable. It is kernel policy to limit the visibility of physical addresses / kernel pointers to root. Given this and the fact the opal log may contain this information it would be better to limit the readability to root. Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpcNathan Chancellor1-1/+1
When building with -Wsometimes-uninitialized, Clang warns: arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (cpu_has_feature(CPU_FTRS_POWER9)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here if (opcode == NULL) ^~~~~~ arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its condition is always true if (cpu_has_feature(CPU_FTRS_POWER9)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable 'opcode' to silence this warning const struct powerpc_opcode *opcode; ^ = NULL 1 warning generated. This warning seems to make no sense on the surface because opcode is set to NULL right below this statement. However, there is a comma instead of semicolon to end the dialect assignment, meaning that the opcode assignment only happens in the if statement. Properly terminate that line so that Clang no longer warns. Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to CNicholas Piggin4-314/+328
The OPAL call wrapper gets interrupt disabling wrong. It disables interrupts just by clearing MSR[EE], which has two problems: - It doesn't call into the IRQ tracing subsystem, which means tracing across OPAL calls does not always notice IRQs have been disabled. - It doesn't go through the IRQ soft-mask code, which causes a minor bug. MSR[EE] can not be restored by saving the MSR then clearing MSR[EE], because a racing interrupt while soft-masked could clear MSR[EE] between the two steps. This can cause MSR[EE] to be incorrectly enabled when the OPAL call returns. Fortunately that should only result in another masked interrupt being taken to disable MSR[EE] again, but it's a bit sloppy. The existing code also saves MSR to PACA, which is not re-entrant if there is a nested OPAL call from different MSR contexts, which can happen these days with SRESET interrupts on bare metal. To fix these issues, move the tracing and IRQ handling code to C, and call into asm just for the low level call when everything is ready to go. Save the MSR on stack rather than PACA. Performance cost is kept to a minimum with a few optimisations: - The endian switch upon return is combined with the MSR restore, which avoids an expensive context synchronizing operation for LE kernels. This makes up for the additional mtmsrd to enable interrupts with local_irq_enable(). - blr is now used to return from the opal_* functions that are called as C functions, to avoid link stack corruption. This requires a skiboot fix as well to keep the call stack balanced. A NULL call is more costly after this, (410ns->430ns on POWER9), but OPAL calls are generally not performance critical at this scale. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/64s: Fix data interrupts vs d-side MCE reentrancyNicholas Piggin1-10/+26
Handlers for interrupts that set DAR / DSISR, set MSR[RI] before those SPRs are read. If a d-side machine check hits in this window, DAR / DSISR will be clobbered silently, leading to random corruption. Fix this by having handlers save those registers before setting MSR[RI]. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancyNicholas Piggin1-6/+42
A subsequent fix for data interrupts (those that set DAR / DSISR) requires some interrupt macros to be open-coded, and also requires the 0x300 interrupt handler to be moved out-of-line. This patch does that without changing behaviour, which makes the later fix a smaller change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/64s: system reset interrupt preserve HSRRsNicholas Piggin1-1/+24
Code that uses HSRR registers is not required to clear MSR[RI] by convention, however the system reset NMI itself may use HSRR registers (e.g., to call OPAL) and clobber them. Rather than introduce the requirement to clear RI in order to use HSRRs, have system reset interrupt save and restore HSRRs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/64s: Fix HV NMI vs HV interrupt recoverability testNicholas Piggin5-0/+87
HV interrupts that use HSRR registers do not enter with MSR[RI] clear, but their entry code is not recoverable vs NMI, due to shared use of HSPRG1 as a scratch register to save r13. This means that a system reset or machine check that hits in HSRR interrupt entry can cause r13 to be silently corrupted. Fix this by marking NMIs non-recoverable if they land in HV interrupt ranges. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown ↵Aneesh Kumar K.V1-4/+6
search When doing top-down search the low_limit is not PAGE_SIZE but rather max(PAGE_SIZE, mmap_min_addr). This handle cases in which mmap_min_addr > PAGE_SIZE. Fixes: fba2369e6ceb ("mm: use vm_unmapped_area() on powerpc architecture") Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callbackAneesh Kumar K.V1-2/+3
After we ALIGN up the address we need to make sure we didn't overflow and resulted in zero address. In that case, we need to make sure that the returned address is greater than mmap_min_addr. This fixes selftest va_128TBswitch --run-hugetlb reporting failures when run as non root user for mmap(-1, MAP_HUGETLB) The bug is that a non-root user requesting address -1 will be given address 0 which will then fail, whereas they should have been given something else that would have succeeded. We also avoid the first mmap(-1, MAP_HUGETLB) returning NULL address as mmap address with this change. So we think this is not a security issue, because it only affects whether we choose an address below mmap_min_addr, not whether we actually allow that address to be mapped. ie. there are existing capability checks to prevent a user mapping below mmap_min_addr and those will still be honoured even without this fix. Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb") Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26selftests/powerpc: Remove duplicate headerMichael Ellerman1-1/+0
Remove duplicate headers which are included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc sstep: Add support for modsd, modud instructionsSandipan Das1-2/+15
This adds emulation support for the following integer instructions: * Modulo Signed Doubleword (modsd) * Modulo Unsigned Doubleword (modud) Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc sstep: Add support for modsw, moduw instructionsPrasannaKumar Muralidharan1-0/+14
This adds emulation support for the following integer instructions: * Modulo Signed Word (modsw) * Modulo Unsigned Word (moduw) Signed-off-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc sstep: Add support for extswsli instructionSandipan Das1-0/+14
This adds emulation support for the following integer instructions: * Extend-Sign Word and Shift Left Immediate (extswsli[.]) Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc sstep: Add support for cnttzw, cnttzd instructionsSandipan Das1-0/+14
This adds emulation support for the following integer instructions: * Count Trailing Zeros Word (cnttzw[.]) * Count Trailing Zeros Doubleword (cnttzd[.]) Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc: sstep: Add support for darn instructionSandipan Das1-0/+22
This adds emulation support for the following integer instructions: * Deliver A Random Number (darn) As suggested by Michael, this uses a raw .long for specifying the instruction word when using inline assembly to retain compatibility with older binutils. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc: sstep: Add support for maddhd, maddhdu, maddld instructionsSandipan Das2-2/+48
This adds emulation support for the following integer instructions: * Multiply-Add High Doubleword (maddhd) * Multiply-Add High Doubleword Unsigned (maddhdu) * Multiply-Add Low Doubleword (maddld) As suggested by Michael, this uses a raw .long for specifying the instruction word when using inline assembly to retain compatibility with older binutils. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: clean stack pointers namingChristophe Leroy2-18/+10
Some stack pointers used to also be thread_info pointers and were called tp. Now that they are only stack pointers, rename them sp. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/64: Replace CURRENT_THREAD_INFO with PACA_THREAD_INFOChristophe Leroy9-18/+16
Now that current_thread_info is located at the beginning of 'current' task struct, CURRENT_THREAD_INFO macro is not really needed any more. This patch replaces it by loads of the value at PACA_THREAD_INFO(r13). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Add PACA_THREAD_INFO rather than using PACACURRENT] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPUChristophe Leroy11-72/+37
Now that thread_info is similar to task_struct, its address is in r2 so CURRENT_THREAD_INFO() macro is useless. This patch removes it. This patch also moves the 'tovirt(r2, r2)' down just before the reactivation of MMU translation, so that we keep the physical address of 'current' in r2 until then. It avoids a few calls to tophys(). At the same time, as the 'cpu' field is not anymore in thread_info, TI_CPU is renamed TASK_CPU by this patch. It also allows to get rid of a couple of '#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE' as ACCOUNT_CPU_USER_ENTRY() and ACCOUNT_CPU_USER_EXIT() are empty when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Fix a missed conversion of TI_CPU idle_6xx.S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: 'current_set' is now a table of task_struct pointersChristophe Leroy5-15/+13
The table of pointers 'current_set' has been used for retrieving the stack and current. They used to be thread_info pointers as they were pointing to the stack and current was taken from the 'task' field of the thread_info. Now, the pointers of 'current_set' table are now both pointers to task_struct and pointers to thread_info. As they are used to get current, and the stack pointer is retrieved from current's stack field, this patch changes their type to task_struct, and renames secondary_ti to secondary_current. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: regain entire stack spaceChristophe Leroy8-55/+38
thread_info is not anymore in the stack, so the entire stack can now be used. There is also no risk anymore of corrupting task_cpu(p) with a stack overflow so the patch removes the test. When doing this, an explicit test for NULL stack pointer is needed in validate_sp() as it is not anymore implicitely covered by the sizeof(thread_info) gap. In the meantime, with the previous patch all pointers to the stacks are not anymore pointers to thread_info so this patch changes them to void* Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Activate CONFIG_THREAD_INFO_IN_TASKChristophe Leroy21-194/+56
This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. This has the following consequences: - thread_info is now located at the beginning of task_struct. - The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active. - thread_info doesn't have anymore the 'task' field. This patch: - Removes all recopy of thread_info struct when the stack changes. - Changes the CURRENT_THREAD_INFO() macro to point to current. - Selects CONFIG_THREAD_INFO_IN_TASK. - Modifies raw_smp_processor_id() to get ->cpu from current without including linux/sched.h to avoid circular inclusion and without including asm/asm-offsets.h to avoid symbol names duplication between ASM constants and C constants. - Modifies klp_init_thread_info() to take a task_struct pointer argument. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add task_stack.h to livepatch.h to fix build fails] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/idle/6xx: Use r1 with CURRENT_THREAD_INFO()Christophe Leroy1-1/+2
Make sure CURRENT_THREAD_INFO() is used with r1 which is the virtual address of the stack, in order to ease the switch to r2 when we enable THREAD_INFO_IN_TASK, as we have no register having the phys address of current. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Use task_stack_page() in current_pt_regs()Christophe Leroy1-1/+1
Change current_pt_regs() to use task_stack_page() rather than current_thread_info() so that it keeps working once we enable THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of large patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Use linux/thread_info.h in processor.hChristophe Leroy1-1/+1
When we enable THREAD_INFO_IN_TASK we will remove our definition of current_thread_info(). Instead it will come from linux/thread_info.h So switch processor.h to include the latter, so that it can continue to find current_thread_info(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Use sizeof(struct thread_info) in INIT_SP_LIMITChristophe Leroy1-1/+1
Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol won't exist when we enable THREAD_INFO_IN_TASK. So just use the sizeof the type which is the same value but will continue to work. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/64: Use task_stack_page() to initialise paca->kstackChristophe Leroy1-1/+3
Rather than using the thread info use task_stack_page() to initialise paca->kstack, that way it will work with THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Update comments in preparation for THREAD_INFO_IN_TASKChristophe Leroy4-4/+4
Update a few comments that talk about current_thread_info() in preparation for THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Replace current_thread_info()->task with currentChristophe Leroy1-3/+3
We have a few places that use current_thread_info()->task to access current. This won't work with THREAD_INFO_IN_TASK so fix them now. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Don't use CURRENT_THREAD_INFO to find the stackChristophe Leroy3-3/+3
A few places use CURRENT_THREAD_INFO, or the C version, to find the stack. This will no longer work with THREAD_INFO_IN_TASK so change them to find the stack in other ways. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: call_do_[soft]irq() takes a pointer to the stackChristophe Leroy2-3/+3
The purpose of the pointer given to call_do_softirq() and call_do_irq() is to point the new stack. Currently that's the same thing as the thread_info, but won't be with THREAD_INFO_IN_TASK. So change the parameter to void* and rename it 'sp'. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Rename THREAD_INFO to TASK_STACKChristophe Leroy7-9/+9
This patch renames THREAD_INFO to TASK_STACK, because it is in fact the offset of the pointer to the stack in task_struct so this pointer will not be impacted by the move of THREAD_INFO. Also make it available on 64-bit, as we'll need it there when we activate THREAD_INFO_IN_TASK. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make available on 64-bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: prep stack walkers for THREAD_INFO_IN_TASKChristophe Leroy2-6/+49
[text copied from commit 9bbd4c56b0b6 ("arm64: prep stack walkers for THREAD_INFO_IN_TASK")] When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed before a task is destroyed. To account for this, the stacks are refcounted, and when manipulating the stack of another task, it is necessary to get/put the stack to ensure it isn't freed and/or re-used while we do so. This patch reworks the powerpc stack walking code to account for this. When CONFIG_THREAD_INFO_IN_TASK is not selected these perform no refcounting, and this should only be a structural change that does not affect behaviour. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move try_get_task_stack() below tsk == NULL check in show_stack()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Only use task_struct 'cpu' field on SMPChristophe Leroy3-1/+7
When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field gets moved into task_struct and only defined when CONFIG_SMP is set. This patch ensures that TI_CPU is only used when CONFIG_SMP is set and that task_struct 'cpu' field is not used directly out of SMP code. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Avoid circular header inclusion in mmu-hash.hChristophe Leroy5-96/+107
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h. In order to do that, this patch moves into a new header called asm/task_size_64/32.h all the TASK_SIZE related constants, which can then be included in mmu-hash.h directly. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out all the TASK_SIZE constants not just 64-bit ones] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/irq: use memblock functions returning virtual addressChristophe Leroy3-27/+23
Since only the virtual address of allocated blocks is used, lets use functions returning directly virtual address. Those functions have the advantage of also zeroing the block. Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/64: Simplify __secondary_start paca->kstack handlingMichael Ellerman1-11/+9
In __secondary_start() we load the thread_info of the idle task of the secondary CPU from current_set[cpu], and then convert it into a stack pointer before storing that back to paca->kstack. As pointed out in commit f761622e5943 ("powerpc: Initialise paca->kstack before early_setup_secondary") it's important that we initialise paca->kstack before calling the MMU setup code, in particular slb_initialize(), because it will bolt the SLB entry for the kstack into the SLB. However we have already setup paca->kstack in cpu_idle_thread_init(), since commit 3b5750644b2f ("[POWERPC] Bolt in SLB entry for kernel stack on secondary cpus") (May 2008). It's also in cpu_idle_thread_init() that we initialise current_set[cpu] with the thread_info pointer, so there is no issue of the timing being different between the two. Therefore the initialisation of paca->kstack in __setup_secondary() is completely redundant, so remove it. This has the added benefit of removing code that runs in real mode, and is therefore restricted by the RMO, and so opens the way for us to enable THREAD_INFO_IN_TASK. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/64s: Remove MSR_RI optimisation in system_call_exit()Michael Ellerman1-20/+14
Currently in system_call_exit() we have an optimisation where we disable MSR_RI (recoverable interrupt) and MSR_EE (external interrupt enable) in a single mtmsrd instruction. Unfortunately this will no longer work with THREAD_INFO_IN_TASK, because then the load of TI_FLAGS might fault and faulting with MSR_RI clear is treated as an unrecoverable exception which leads to a panic(). So change the code to only clear MSR_EE prior to loading TI_FLAGS, leaving the clear of MSR_RI until later. We have some latitude in where do the clear of MSR_RI. A bit of experimentation has shown that this location gives the least slow down. This still causes a noticeable slow down in our null_syscall performance. On a Power9 DD2.2: Before After Delta Delta % 955 cycles 999 cycles -44 -4.6% On the plus side this does simplify the code somewhat, because we don't have to reenable MSR_RI on the restore_math() or syscall_exit_work() paths which was necessitated previously by the optimisation. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc: Enable kcovAndrew Donnellan7-2/+22
kcov provides kernel coverage data that's useful for fuzzing tools like syzkaller. Wire up kcov support on powerpc. Disable kcov instrumentation on the same files where we currently disable gcov and UBSan instrumentation, plus some additional exclusions which appear necessary to boot on book3e machines. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Daniel Axtens <dja@axtens.net> # e6500 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/kconfig: make _etext and data areas alignment configurable on 8xxChristophe Leroy2-5/+17
On 8xx, large pages (512kb or 8M) are used to map kernel linear memory. Aligning to 8M reduces TLB misses as only 8M pages are used in that case. We make 8M the default for data. This patchs allows the user to do it via Kconfig. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWXChristophe Leroy6-16/+78
This patch implements handling of STRICT_KERNEL_RWX with large TLBs directly in the TLB miss handlers. To do so, etext and sinittext are aligned on 512kB boundaries and the miss handlers use 512kB pages instead of 8Mb pages for addresses close to the boundaries. It sets RO PP flags for addresses under sinittext. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32Christophe Leroy1-2/+30
Depending on the number of available BATs for mapping the different kernel areas, it might be needed to increase the alignment of _etext and/or of data areas. This patchs allows the user to do it via Kconfig. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWXChristophe Leroy6-10/+112
Today, STRICT_KERNEL_RWX is based on the use of regular pages to map kernel pages. On Book3s 32, it has three consequences: - Using pages instead of BAT for mapping kernel linear memory severely impacts performance. - Exec protection is not effective because no-execute cannot be set at page level (except on 603 which doesn't have hash tables) - Write protection is not effective because PP bits do not provide RO mode for kernel-only pages (except on 603 which handles it in software via PAGE_DIRTY) On the 603+, we have: - Independent IBAT and DBAT allowing limitation of exec parts. - NX bit can be set in segment registers to forbit execution on memory mapped by pages. - RO mode on DBATs even for kernel-only blocks. On the 601, there is nothing much we can do other than warn the user about it, because: - BATs are common to instructions and data. - BAT do not provide RO mode for kernel-only blocks. - segment registers don't have the NX bit. In order to use IBAT for exec protection, this patch: - Aligns _etext to BAT block sizes (128kb) - Set NX bit in kernel segment register (Except on vmalloc area when CONFIG_MODULES is selected) - Maps kernel text with IBATs. In order to use DBAT for exec protection, this patch: - Aligns RW DATA to BAT block sizes (4M) - Maps kernel RO area with write prohibited DBATs - Maps remaining memory with remaining DBATs Here is what we get with this patch on a 832x when activating STRICT_KERNEL_RWX: Symbols: c0000000 T _stext c0680000 R __start_rodata c0680000 R _etext c0800000 T __init_begin c0800000 T _sinittext ~# cat /sys/kernel/debug/block_address_translation ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent 1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent 2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent 3: - 4: - 5: - 6: - 7: - ---[ Data Block Address Translation ]--- 0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent 1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent 2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent 3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent 4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent 5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent 6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent 7: - ~# cat /sys/kernel/debug/segment_registers ---[ User Segments ]--- 0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0 0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1 0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2 0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903 0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14 0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25 0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36 0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47 0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58 0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69 0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a 0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b ---[ Kernel Segments ]--- 0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc 0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd 0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee 0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff Aligning _etext to 128kb allows to map up to 32Mb text with 8 IBATs: 16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb (A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block) Aligning data to 4M allows to map up to 512Mb data with 8 DBATs: 16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb Because some processors only have 4 BATs and because some targets need DBATs for mapping other areas, the following patch will allow to modify _etext and data alignment. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/mm/32s: add setibat() clearibat() and update_bats()Christophe Leroy3-0/+69
setibat() and clearibat() allows to manipulate IBATs independently of DBATs. update_bats() allows to update bats after init. This is done with MMU off. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>