path: root/arch
Age    Commit message    Author    Files    Lines
2020-11-19Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"Zhang Xiaoxu1-0/+1
This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980. Since commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR support for drc-info property"), 'cpu_drcs' is no longer double freed when the 'cpus' node is not found, so commit a0ff72f9f5a7 is no longer needed; keeping it applied would instead leak the memory. Fixes: a0ff72f9f5a7 ("powerpc/pseries/hotplug-cpu: Remove double free in error path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> [mpe: Caused by me applying a patch to a function that had changed in the interim] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111020752.1686139-1-zhangxiaoxu5@huawei.com
2020-11-19powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memoryNicholas Piggin2-2/+3
read_user_stack_slow, which walks the user address translation by hand, is only required on hash, because a hash fault cannot be serviced from "NMI" context (to avoid re-entering the hash code); the user stack can therefore be mapped into the Linux page tables yet still be inaccessible to the CPU. Radix MMU mode does not have this restriction. A page fault failure would indicate the page is not accessible via get_user_pages() either, so avoid the slow walk on radix. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111120151.3150658-1-npiggin@gmail.com
2020-11-19powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.SYouling Tang1-18/+1
Use the common INIT_DATA_SECTION rule for the linker script in an effort to regularize the linker script. Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604487550-20040-1-git-send-email-tangyouling@loongson.cn
2020-11-19powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32Christophe Leroy1-5/+0
On 8xx, we get the following features:

    [ 0.000000] cpu_features = 0x0000000000000100
    [ 0.000000] possible = 0x0000000000000120
    [ 0.000000] always = 0x0000000000000000

This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all other configurations, the three values should be equal. The problem is due to CPU_FTRS_GENERIC_32, which is taken when CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is pointless because there is no generic configuration supporting all 32-bit CPUs other than book3s/32. Remove this pointless generic features definition to unbreak the calculation of the 'possible' and 'always' features. Fixes: 76bc080ef5a3 ("[POWERPC] Make default cputable entries reflect selected CPU family") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/76a85f30bf981d1aeaae00df99321235494da254.1604426550.git.christophe.leroy@csgroup.eu
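For illustration, a rough sketch of how the two masks are built (not the actual cputable.h contents; the macro layout is assumed): "possible" is the bitwise OR of the feature masks of every CPU the kernel may run on, "always" the bitwise AND, so a stray generic term distorts both.

    /* Sketch only -- simplified from the real CPU_FTRS_* definitions. */
    #ifdef CONFIG_PPC_8xx
    /* After the fix, only the 8xx mask contributes, so cpu_features,
     * possible and always all print the same value at boot. */
    #define CPU_FTRS_POSSIBLE   (CPU_FTRS_8XX)
    #define CPU_FTRS_ALWAYS     (CPU_FTRS_8XX)
    /* Before the fix the stray term crept in, roughly:
     *   possible = CPU_FTRS_8XX | CPU_FTRS_GENERIC_32   (extra bit 0x20)
     *   always   = CPU_FTRS_8XX & CPU_FTRS_GENERIC_32   (collapses to 0)
     */
    #endif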
2020-11-19powerpc/mm: Update tlbiel loop on POWER10Aneesh Kumar K.V3-9/+32
With POWER10, a single tlbiel instruction invalidates all the congruence classes of the TLB, hence we need to issue only one tlbiel with SET=0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
2020-11-19powerpc: Avoid broken GCC __attribute__((optimize))Ard Biesheuvel4-9/+6
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") introduced a couple of uses of __attribute__((optimize)) with function scope, to disable the stack protector in some early boot code. Unfortunately, and this is documented in the GCC man pages [0], overriding function attributes for optimization is broken, and is only supported for debug scenarios, not for production: the problem appears to be that setting GCC -f flags using this method will cause it to forget about some or all other optimization settings that have been applied. So the only safe way to disable the stack protector is to disable it for the entire source file. [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> [mpe: Drop one remaining use of __nostackprotector, reported by snowpatch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
2020-11-19powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()Qinglang Miao1-1/+1
An iounmap() of msgr_block_addr is missing from the error handling paths of mpic_msgr_probe(). Use devm_ioremap() instead of plain ioremap() when remapping the message register block, so the mapping is released automatically on probe failure. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
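A minimal sketch of the change described, assuming the usual platform-device probe shape (the exact resource handling in mpic_msgr_probe() is not shown):

    /* Before: plain ioremap(), which every error path had to undo. */
    msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));

    /* After: device-managed mapping, released automatically if probe
     * fails later or the device is unbound. */
    msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
    if (!msgr_block_addr)
            return -ENOMEM;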
2020-11-19powerpc/ps3: Drop unused DBG macroMichael Ellerman1-7/+0
This DBG macro is unused, and has been unused since the file was originally merged into mainline. Just drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201023031305.3284819-1-mpe@ellerman.id.au
2020-11-19powerpc/85xx: Fix declaration made after definitionMichael Ellerman1-2/+1
Currently the clang build of corenet64_smp_defconfig fails with: arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error: attribute declaration must precede definition machine_arch_initcall(corenet_generic, corenet_gen_publish_devices); Fix it by moving the initcall definition prior to the machine definition, and directly below the function it calls, which is the usual style anyway. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201023020838.3274226-1-mpe@ellerman.id.au
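The resulting ordering is roughly as follows (sketch; the function body is abridged and its contents are assumed):

    static int __init corenet_gen_publish_devices(void)
    {
            return of_platform_bus_probe(NULL, of_device_ids, NULL);
    }
    /* The initcall now sits directly below the function it registers,
     * before the machine definition that references corenet_generic. */
    machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);

    define_machine(corenet_generic) {
            .name   = "CoreNet Generic",
            /* ... */
    };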
2020-11-19powerpc/mm: Move setting PTE specific flags to pfn_pmd()Aneesh Kumar K.V2-2/+23
powerpc used to set the PTE specific flags in set_pte_at(). That is different from other architectures. To be consistent with other architectures powerpc updated pfn_pte() to set _PAGE_PTE in commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte") That commit didn't do the same for pfn_pmd() because we expect pmd_mkhuge() to do that. But as per Linus that is a bad rule: The rule that you must use "pmd_mkhuge()" seems _completely_ wrong. The only valid use to ever make a pmd out of a pfn is to make a huge-page. Hence update pfn_pmd() to set _PAGE_PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201022091115.39568-1-aneesh.kumar@linux.ibm.com
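A sketch of what the updated helper looks like (simplified; the real book3s64 version goes through its protection-bit helpers):

    static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
    {
            unsigned long pmdv;

            pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
            /* Set _PAGE_PTE here, mirroring pfn_pte(), instead of
             * relying on callers to follow up with pmd_mkhuge(). */
            return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE);
    }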
2020-11-19powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()Christophe Leroy1-2/+21
fls() and fls64() are using __builtin_clz() and __builtin_clzll(). On powerpc, those builtins trivially use the cntlzw and cntlzd power instructions. Although those instructions provide the expected result with input argument 0, __builtin_clz() and __builtin_clzll() are documented as undefined for value 0. The easiest fix would be to use the fls() and fls64() functions defined in include/asm-generic/bitops/builtin-fls.h and include/asm-generic/bitops/fls64.h, but GCC output is not optimal:

    00000388 <testfls>:
     388: 2c 03 00 00  cmpwi   r3,0
     38c: 41 82 00 10  beq     39c <testfls+0x14>
     390: 7c 63 00 34  cntlzw  r3,r3
     394: 20 63 00 20  subfic  r3,r3,32
     398: 4e 80 00 20  blr
     39c: 38 60 00 00  li      r3,0
     3a0: 4e 80 00 20  blr

    000003b0 <testfls64>:
     3b0: 2c 03 00 00  cmpwi   r3,0
     3b4: 40 82 00 1c  bne     3d0 <testfls64+0x20>
     3b8: 2f 84 00 00  cmpwi   cr7,r4,0
     3bc: 38 60 00 00  li      r3,0
     3c0: 4d 9e 00 20  beqlr   cr7
     3c4: 7c 83 00 34  cntlzw  r3,r4
     3c8: 20 63 00 20  subfic  r3,r3,32
     3cc: 4e 80 00 20  blr
     3d0: 7c 63 00 34  cntlzw  r3,r3
     3d4: 20 63 00 40  subfic  r3,r3,64
     3d8: 4e 80 00 20  blr

When the input of fls(x) is a constant, just check x for nullity and return either 0 or __builtin_clz(x). Otherwise, use the cntlzw instruction directly. For fls64() on PPC64, do the same but with __builtin_clzll() and the cntlzd instruction. On PPC32, let's take the generic fls64(), which will use our fls(). The result is as expected:

    00000388 <testfls>:
     388: 7c 63 00 34  cntlzw  r3,r3
     38c: 20 63 00 20  subfic  r3,r3,32
     390: 4e 80 00 20  blr

    000003a0 <testfls64>:
     3a0: 2c 03 00 00  cmpwi   r3,0
     3a4: 40 82 00 10  bne     3b4 <testfls64+0x14>
     3a8: 7c 83 00 34  cntlzw  r3,r4
     3ac: 20 63 00 20  subfic  r3,r3,32
     3b0: 4e 80 00 20  blr
     3b4: 7c 63 00 34  cntlzw  r3,r3
     3b8: 20 63 00 40  subfic  r3,r3,64
     3bc: 4e 80 00 20  blr

Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
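The resulting fls() is roughly the following (sketch of the 32-bit case; fls64() follows the same pattern with __builtin_clzll()/cntlzd on PPC64):

    static __always_inline int fls(unsigned int x)
    {
            int lz;

            /* Constant input: let the compiler fold the result, guarding
             * the value 0 for which __builtin_clz() is undefined. */
            if (__builtin_constant_p(x))
                    return x ? 32 - __builtin_clz(x) : 0;

            /* Variable input: cntlzw is architecturally defined for 0
             * (it returns 32), so use the instruction directly. */
            asm("cntlzw %0,%1" : "=r" (lz) : "r" (x));
            return 32 - lz;
    }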
2020-11-19powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to CJordan Niethe4-260/+287
The only thing keeping the cpu_setup() and cpu_restore() functions used in the cputable entries for Power7, Power8, Power9 and Power10 in assembly was cpu_restore() being called before there was a stack in generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel stack for secondaries before cpu_restore()") means that it is now possible to use C. Rewrite the functions in C so they are a little bit easier to read. This is not changing their functionality. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Tweak copyright and authorship notes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com
2020-11-18powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU contextNicholas Piggin1-4/+7
Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") removed KVM guest tests from interrupts that do not set HV=1, when PR-KVM is not configured. This is wrong for HV-KVM HPT guest MMIO emulation case which attempts to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with the guest MMU context loaded. This can cause host DSI, DSLB interrupts which must test for KVM guest. Restore this and add a comment. Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
2020-11-17powerpc: Drop -me200 addition to build flagsMichael Ellerman1-1/+0
Currently a build with CONFIG_E200=y will fail with:

    Error: invalid switch -me200
    Error: unrecognized option -me200

Upstream binutils has never supported an -me200 option. Presumably it was supported at some point by either a fork or Freescale internal binutils. We can't support code that we can't even build test, so drop the addition of -me200 to the build flags, so we can at least build with CONFIG_E200=y. Reported-by: Németh Márton <nm127@freemail.hu> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Scott Wood <oss@buserror.net> Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
2020-11-16KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB pageCédric Le Goater1-0/+7
When accessing the ESB page of a source interrupt, the fault handler will retrieve the page address from the XIVE interrupt 'xive_irq_data' structure. If the associated KVM XIVE interrupt is not valid, that is not allocated at the HW level for some reason, the fault handler will dereference a NULL pointer leading to the oops below : WARNING: CPU: 40 PID: 59101 at arch/powerpc/kvm/book3s_xive_native.c:259 xive_native_esb_fault+0xe4/0x240 [kvm] CPU: 40 PID: 59101 Comm: qemu-system-ppc Kdump: loaded Tainted: G W --------- - - 4.18.0-240.el8.ppc64le #1 NIP: c00800000e949fac LR: c00000000044b164 CTR: c00800000e949ec8 REGS: c000001f69617840 TRAP: 0700 Tainted: G W --------- - - (4.18.0-240.el8.ppc64le) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44044282 XER: 00000000 CFAR: c00000000044b160 IRQMASK: 0 GPR00: c00000000044b164 c000001f69617ac0 c00800000e96e000 c000001f69617c10 GPR04: 05faa2b21e000080 0000000000000000 0000000000000005 ffffffffffffffff GPR08: 0000000000000000 0000000000000001 0000000000000000 0000000000000001 GPR12: c00800000e949ec8 c000001ffffd3400 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c000001f5c065160 c000000001c76f90 GPR24: c000001f06f20000 c000001f5c065100 0000000000000008 c000001f0eb98c78 GPR28: c000001dcab40000 c000001dcab403d8 c000001f69617c10 0000000000000011 NIP [c00800000e949fac] xive_native_esb_fault+0xe4/0x240 [kvm] LR [c00000000044b164] __do_fault+0x64/0x220 Call Trace: [c000001f69617ac0] [0000000137a5dc20] 0x137a5dc20 (unreliable) [c000001f69617b50] [c00000000044b164] __do_fault+0x64/0x220 [c000001f69617b90] [c000000000453838] do_fault+0x218/0x930 [c000001f69617bf0] [c000000000456f50] __handle_mm_fault+0x350/0xdf0 [c000001f69617cd0] [c000000000457b1c] handle_mm_fault+0x12c/0x310 [c000001f69617d10] [c00000000007ef44] __do_page_fault+0x264/0xbb0 [c000001f69617df0] [c00000000007f8c8] do_page_fault+0x38/0xd0 [c000001f69617e30] [c00000000000a714] handle_page_fault+0x18/0x38 Instruction dump: 40c2fff0 7c2004ac 2fa90000 409e0118 73e90001 41820080 e8bd0008 7c2004ac 7ca90074 39400000 915c0000 7929d182 <0b090000> 2fa50000 419e0080 e89e0018 ---[ end trace 66c6ff034c53f64f ]--- xive-kvm: xive_native_esb_fault: accessing invalid ESB page for source 8 ! Fix that by checking the validity of the KVM XIVE interrupt structure. Fixes: 6520ca64cde7 ("KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201105134713.656160-1-clg@kaod.org
2020-11-16powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=yNicholas Piggin1-2/+0
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs, which is basically the same as the regular handlers for those interrupts. The system reset FWNMI handler did not have a KVM guest test in it, although it probably should have because the guest can itself run guests. Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt handlers to new style code gen macros") converted the handler faithfully, avoiding a KVM test with a "clever" trick: modify the IKVM_REAL setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y). This worked when the KVM test was generated in the interrupt entry handlers, but a later patch moved the KVM test to the common handler, and the common handler macro is expanded below the fwnmi entry. This prevents the KVM test from being generated even for the 0x100 entry point. The result is that NMI IPIs taken in the host kernel while a guest is running will use guest registers. This goes particularly badly when an HPT guest is running and the MMU is set to guest mode. Remove this trickery and just generate the test always. Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
2020-11-08powerpc/32s: Use relocation offset when setting early hash tableChristophe Leroy1-1/+2
When calling early_hash_table(), the kernel hasn't been yet relocated to its linking address, so data must be addressed with relocation offset. Add relocation offset to write into Hash in early_hash_table(). Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu
2020-11-06powerpc/numa: Fix build when CONFIG_NUMA=nScott Cheloha1-3/+9
Add a non-NUMA definition for of_drconf_to_nid_single() to topology.h so we have one even if powerpc/mm/numa.c is not compiled. On a non-NUMA kernel the appropriate node id is always first_online_node. Fixes: 72cdd117c449 ("pseries/hotplug-memory: hot-add: skip redundant LMB lookup") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201105223040.3612663-1-cheloha@linux.ibm.com
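The stub amounts to something like this in asm/topology.h (sketch; the guard and prototype follow the commit description):

    #ifndef CONFIG_NUMA
    /* Without NUMA there is only one node to hot-add memory into. */
    static inline int of_drconf_to_nid_single(struct drmem_lmb *lmb)
    {
            return first_online_node;
    }
    #endif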
2020-11-05powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entryChristophe Leroy4-66/+28
When _PAGE_ACCESSED is not set, a minor fault is expected. To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED into the L2 entry valid bit. To simplify the processing and reduce the number of instructions in TLB miss exceptions, manage it as an APG bit and get it next to _PAGE_GUARDED bit to allow a copy in one go. Then declare the corresponding groups as handling all accesses as user accesses. As the PP bits always define user as No Access, it will generate a fault. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/8xx: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-12/+2
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. This adds at least 3 instructions to the TLB miss exception handlers fast path. Following patch will reduce this overhead. Also update the rotation instruction to the correct number of bits to reflect all changes done to _PAGE_ACCESSED over time. Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.") Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU") Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits") Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC") Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/40x: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-8/+0
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/603: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-12/+0
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu
2020-11-05powerpc: Use asm_goto_volatile for put_user()Michael Ellerman1-2/+2
Andreas reported that commit ee0a49a6870e ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") broke CLONE_CHILD_SETTID. Further inspection showed that the put_user() in schedule_tail() was missing entirely, the store not emitted by the compiler. <.schedule_tail>: mflr r0 std r0,16(r1) stdu r1,-112(r1) bl <.finish_task_switch> ld r9,2496(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0x60> ld r3,392(r13) ld r9,1392(r3) cmpdi cr7,r9,0 beq cr7,<.schedule_tail+0x3c> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop bl <.calculate_sigpending> nop addi r1,r1,112 ld r0,16(r1) mtlr r0 blr nop nop nop bl <.__balance_callback> b <.schedule_tail+0x1c> Notice there are no stores other than to the stack. There should be a stw in there for the store to current->set_child_tid. This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and 4.9.4), and only when CONFIG_PPC_KUAP is disabled. When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync() and mtspr() inlined via allow_user_access() seems to be enough to avoid the bug. We already have a macro to work around this (or a similar bug), called asm_volatile_goto which includes an empty asm block to tickle the compiler into generating the right code. So use that. With this applied the code generation looks more like it will work: <.schedule_tail>: mflr r0 std r31,-8(r1) std r0,16(r1) stdu r1,-144(r1) std r3,112(r1) bl <._mcount> nop ld r3,112(r1) bl <.finish_task_switch> ld r9,2624(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0xa0> ld r3,2408(r13) ld r31,1856(r3) cmpdi cr7,r31,0 beq cr7,<.schedule_tail+0x80> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop li r9,-1 clrldi r9,r9,12 cmpld cr7,r31,r9 bgt cr7,<.schedule_tail+0x80> lis r9,16 rldicr r9,r9,32,31 subf r9,r31,r9 cmpldi cr7,r9,3 ble cr7,<.schedule_tail+0x80> li r9,0 stw r3,0(r31) <-- stw nop bl <.calculate_sigpending> nop addi r1,r1,144 ld r0,16(r1) ld r31,-8(r1) mtlr r0 blr nop bl <.__balance_callback> b <.schedule_tail+0x30> Fixes: ee0a49a6870e ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") Reported-by: Andreas Schwab <schwab@linux-m68k.org> Tested-by: Andreas Schwab <schwab@linux-m68k.org> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201104111742.672142-1-mpe@ellerman.id.au
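For reference, the workaround macro is essentially an asm goto followed by an empty asm statement (sketch of the compiler-gcc.h definition; the exact form can differ between kernel versions):

    /* The empty asm("") nudges affected GCC versions (seen with 4.9-era
     * compilers) into keeping the side effects of the preceding asm goto,
     * so the store feeding put_user() is not dropped. */
    #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)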
2020-11-02powerpc/64: Set up a kernel stack for secondaries before cpu_restore()Jordan Niethe2-6/+6
Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore() is called before a stack has been set up in r1. This was previously fine as the cpu_restore() functions were implemented in assembly and did not use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") used __restore_cpu_cpufeatures() as the cpu_restore() function for a device-tree features based cputable entry. This is a C function and hence uses a stack in r1. generic_secondary_smp_init() is entered on the secondary cpus via the primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware thread has its own stack. The OPAL call is ran in the primary's hardware thread. During the call, a job is scheduled on a secondary cpu that will start executing at the address of generic_secondary_smp_init(). Hence the value that will be left in r1 when the secondary cpu enters the kernel is part of that secondary cpu's individual OPAL stack. This means that __restore_cpu_cpufeatures() will write to that OPAL stack. This is not horribly bad as each hardware thread has its own stack and the call that enters the kernel from OPAL never returns, but it is still wrong and should be corrected. Create the temp kernel stack before calling cpu_restore(). As noted by mpe, for a kexec boot, the secondary CPUs are released from the spin loop at address 0x60 by smp_release_cpus() and then jump to generic_secondary_smp_init(). The call to smp_release_cpus() is in setup_arch(), and it comes before the call to emergency_stack_init(). emergency_stack_init() allocates an emergency stack in the PACA for each CPU. This address in the PACA is what is used to set up the temp kernel stack in generic_secondary_smp_init(). Move releasing the secondary CPUs to after the PACAs have been allocated an emergency stack, otherwise the PACA stack pointer will contain garbage and hence the temp kernel stack created from it will be broken. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-1-jniethe5@gmail.com
2020-11-02powerpc/smp: Call rcu_cpu_starting() earlierQian Cai1-1/+2
The call to rcu_cpu_starting() in start_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows (with CONFIG_PROVE_RCU_LIST=y): WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call Trace: dump_stack+0xec/0x144 (unreliable) lockdep_rcu_suspicious+0x128/0x14c __lock_acquire+0x1060/0x1c60 lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 clockevents_register_device+0x74/0x270 register_decrementer_clockevent+0x94/0x110 start_secondary+0x134/0x800 start_secondary_prolog+0x10/0x14 This is avoided by adding a call to rcu_cpu_starting() near the beginning of the start_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers. It's safe to call rcu_cpu_starting() in the arch code as well as later in generic code, as explained by Paul: It uses a per-CPU variable so that RCU pays attention only to the first call to rcu_cpu_starting() if there is more than one of them. This is even intentional, due to there being a generic arch-independent call to rcu_cpu_starting() in notify_cpu_starting(). So multiple calls to rcu_cpu_starting() are fine by design. Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion") Signed-off-by: Qian Cai <cai@redhat.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> [mpe: Add Fixes tag, reword slightly & expand change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com
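The change boils down to one early call in start_secondary(), roughly (sketch; surrounding code abridged and ordering assumed):

    void start_secondary(void *unused)
    {
            unsigned int cpu = raw_smp_processor_id();

            /* Let RCU watch this CPU before anything that can reach
             * lockdep-tracked locks, e.g. clockevents registration.
             * The later duplicate call from notify_cpu_starting() is
             * harmless by design. */
            rcu_cpu_starting(cpu);

            mmgrab(&init_mm);
            current->active_mm = &init_mm;
            /* ... */
    }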
2020-11-02powerpc/eeh_cache: Fix a possible debugfs deadlockQian Cai1-2/+3
Lockdep complains that a possible deadlock below in eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled, but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ disabled. Let's just make eeh_addr_cache_show() acquire the lock with IRQ disabled as well. CPU0 CPU1 ---- ---- lock(&pci_io_addr_cache_root.piar_lock); local_irq_disable(); lock(&tp->lock); lock(&pci_io_addr_cache_root.piar_lock); <Interrupt> lock(&tp->lock); *** DEADLOCK *** lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 eeh_addr_cache_insert_dev+0x48/0x390 eeh_probe_device+0xb8/0x1a0 pnv_pcibios_bus_add_device+0x3c/0x80 pcibios_bus_add_device+0x118/0x290 pci_bus_add_device+0x28/0xe0 pci_bus_add_devices+0x54/0xb0 pcibios_init+0xc4/0x124 do_one_initcall+0xac/0x528 kernel_init_freeable+0x35c/0x3fc kernel_init+0x24/0x148 ret_from_kernel_thread+0x5c/0x80 lock_acquire+0x140/0x5f0 _raw_spin_lock+0x4c/0x70 eeh_addr_cache_show+0x38/0x110 seq_read+0x1a0/0x660 vfs_read+0xc8/0x1f0 ksys_read+0x74/0x130 system_call_exception+0xf8/0x1d0 system_call_common+0xe8/0x218 Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache") Signed-off-by: Qian Cai <cai@redhat.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com
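A sketch of the corrected debugfs routine (lock and function names as quoted in the traces above; the cache walk is elided):

    static int eeh_addr_cache_show(struct seq_file *s, void *v)
    {
            unsigned long flags;

            /* Disable IRQs while holding piar_lock, matching
             * eeh_addr_cache_insert_dev(), so an interrupt cannot
             * re-enter and deadlock on the same lock. */
            spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
            /* ... walk the cache and seq_printf() each range ... */
            spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);

            return 0;
    }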
2020-11-01Merge tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-13/+30
Pull x86 fixes from Thomas Gleixner: "Three fixes all related to #DB:

 - Handle the BTF bit correctly so it doesn't get lost due to a kernel #DB

 - Only clear and set the virtual DR6 value used by ptrace on user space triggered #DB. A kernel #DB must leave it alone to ensure data consistency for ptrace.

 - Make the bitmasking of the virtual DR6 storage correct so it does not lose DR_STEP"

* tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
  x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
  x86/debug: Fix BTF handling
2020-11-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds19-69/+127
Pull kvm fixes from Paolo Bonzini: "ARM: - selftest fix - force PTE mapping on device pages provided via VFIO - fix detection of cacheable mapping at S2 - fallback to PMD/PTE mappings for composite huge pages - fix accounting of Stage-2 PGD allocation - fix AArch32 handling of some of the debug registers - simplify host HYP entry - fix stray pointer conversion on nVHE TLB invalidation - fix initialization of the nVHE code - simplify handling of capabilities exposed to HYP - nuke VCPUs caught using a forbidden AArch32 EL0 x86: - new nested virtualization selftest - miscellaneous fixes - make W=1 fixes - reserve new CPUID bit in the KVM leaves" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: vmx: remove unused variable KVM: selftests: Don't require THP to run tests KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again KVM: selftests: test behavior of unmapped L2 APIC-access address KVM: x86: Fix NULL dereference at kvm_msr_ignored_check() KVM: x86: replace static const variables with macros KVM: arm64: Handle Asymmetric AArch32 systems arm64: cpufeature: upgrade hyp caps to final arm64: cpufeature: reorder cpus_have_{const, final}_cap() KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code() KVM: arm64: Force PTE mapping on fault resulting in a device mapping KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes KVM: arm64: Fix masks in stage2_pte_cacheable() KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
2020-10-31KVM: vmx: remove unused variablePaolo Bonzini1-2/+0
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-31KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work againVitaly Kuznetsov3-5/+5
It was noticed that evmcs_sanitize_exec_ctrls() is not being executed nowadays despite the code checking 'enable_evmcs' static key looking correct. Turns out, static key magic doesn't work in '__init' section (and it is unclear when things changed) but setup_vmcs_config() is called only once per CPU so we don't really need it to. Switch to checking 'enlightened_vmcs' instead, it is supposed to be in sync with 'enable_evmcs'. Opportunistically make evmcs_sanitize_exec_ctrls '__init' and drop unneeded extra newline from it. Reported-by: Yang Weijiang <weijiang.yang@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20201014143346.2430936-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-30Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds24-40/+90
Pull arm64 fixes from Will Deacon: "The diffstat is a bit spread out thanks to an invasive CPU erratum workaround which missed the merge window and also a bunch of fixes to the recently added MTE selftests.

 - Fixes to MTE kselftests

 - Fix return code from KVM Spectre-v2 hypercall

 - Build fixes for ld.lld and Clang's infamous integrated assembler

 - Ensure RCU is up and running before we use printk()

 - Workaround for Cortex-A77 erratum 1508412

 - Fix linker warnings from unexpected ELF sections

 - Ensure PE/COFF sections are 64k aligned"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.S
  arm64/smp: Move rcu_cpu_starting() earlier
  arm64: Add workaround for Arm Cortex-A77 erratum 1508412
  arm64: Add part number for Arm Cortex-A77
  arm64: mte: Document that user PSTATE.TCO is ignored by kernel uaccess
  module: use hidden visibility for weak symbol references
  arm64: efi: increase EFI PE/COFF header padding to 64 KB
  arm64: vmlinux.lds: account for spurious empty .igot.plt sections
  kselftest/arm64: Fix check_user_mem test
  kselftest/arm64: Fix check_ksm_options test
  kselftest/arm64: Fix check_mmap_options test
  kselftest/arm64: Fix check_child_memory test
  kselftest/arm64: Fix check_tags_inclusion test
  kselftest/arm64: Fix check_buffer_fill test
  arm64: avoid -Woverride-init warning
  KVM: arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED
  arm64: vdso32: Allow ld.lld to properly link the VDSO
2020-10-30Merge tag 'arm-soc-fixes-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds20-19/+135
Pull ARM SoC fixes from Arnd Bergmann: "This is a fairly large set of bug fixes on top of -rc1, as most of them were ready but didn't quite make it into the last-minute pull requests for the merge window.

 Allwinner:
  - fix for incorrect CPU overtemperature limit

 Amlogic:
  - multiple smaller DT bugfixes, and missing device nodes

 Marvell EBU:
  - add missing aliases for ethernet switch ports on espressobin board

 Marvell MMP:
  - DTC warning fix
  - bugfix for camera interface power-down

 NXP i.MX:
  - re-enable the GPIO driver on all defconfigs

 ST STM32MP1:
  - fix random crashes from incorrect voltage settings

 Synaptics Berlin:
  - enable the correct hardware timer driver

 Texas Instruments K2G:
  - fix a boot regression in the power domain code

 TEE drivers:
  - fix regression in TEE "login" method

 SCMI drivers:
  - multiple code fixes for corner cases in newly added code

 MAINTAINERS file:
  - move Kukjin Kim and Sangbeom Kim to credits (used to work on Samsung Exynos)
  - Masahiro Yamada is stepping down as Uniphier maintainer

 I did not include a series of patches that work around a regression caused by a bugfix in an ethernet phy driver that resulted in an inadvertent DT binding change. This is still under discussion"

* tag 'arm-soc-fixes-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits)
  soc: ti: ti_sci_pm_domains: check for proper args count in xlate
  ARM: dts: stm32: Describe Vin power supply on stm32mp157c-edx board
  ARM: dts: stm32: Describe Vin power supply on stm32mp15xx-dkx board
  ARM: multi_v5_defconfig: Select CONFIG_GPIO_MXC
  ARM: imx_v4_v5_defconfig: Select CONFIG_GPIO_MXC
  ARM: dts: mmp2-olpc-xo-1-75: Use plural form of "-gpios"
  ARM: dts: mmp3: Add power domain for the camera
  arm64: berlin: Select DW_APB_TIMER_OF
  dt-bindings: sram: sunxi-sram: add V3s compatible string
  MAINTAINERS: Move Sangbeom Kim to credits
  MAINTAINERS: Move Kukjin Kim to credits
  MAINTAINERS: step down as maintainer of UniPhier SoCs and Denali driver
  ARM: multi_v7_defconfig: Build in CONFIG_GPIO_MXC by default
  ARM: imx_v6_v7_defconfig: Build in CONFIG_GPIO_MXC by default
  arm64: defconfig: Build in CONFIG_GPIO_MXC by default
  arm64: dts: meson: odroid-n2 plus: fix vddcpu_a pwm
  ARM: dts: meson8: remove two invalid interrupt lines from the GPU node
  arm64: dts: amlogic: add missing ethernet reset ID
  firmware: arm_scmi: Fix duplicate workqueue name
  firmware: arm_scmi: Fix locking in notifications
  ...
2020-10-30KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()Takashi Iwai1-4/+4
The newly introduced kvm_msr_ignored_check() tries to print error or debug messages via vcpu_*() macros, but those may cause Oops when NULL vcpu is passed for KVM_GET_MSRS ioctl. Fix it by replacing the print calls with kvm_*() macros. (Note that this will leave vcpu argument completely unused in the function, but I didn't touch it to make the fix as small as possible. A clean up may be applied later.) Fixes: 12bc2132b15e ("KVM: X86: Do the same ignore_msrs check for feature msrs") BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178280 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Message-Id: <20201030151414.20165-1-tiwai@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-30KVM: x86: replace static const variables with macrosPaolo Bonzini3-21/+21
Even though the compiler is able to replace static const variables with their value, it will warn about them being unused when Linux is built with W=1. Use good old macros instead, this is not C++. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-30Merge tag 'kvmarm-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini11-37/+96
KVM/arm64 fixes for 5.10, take #1

- Force PTE mapping on device pages provided via VFIO
- Fix detection of cacheable mapping at S2
- Fallback to PMD/PTE mappings for composite huge pages
- Fix accounting of Stage-2 PGD allocation
- Fix AArch32 handling of some of the debug registers
- Simplify host HYP entry
- Fix stray pointer conversion on nVHE TLB invalidation
- Fix initialization of the nVHE code
- Simplify handling of capabilities exposed to HYP
- Nuke VCPUs caught using a forbidden AArch32 EL0
2020-10-30KVM: arm64: Handle Asymmetric AArch32 systemsQais Yousef1-0/+19
On a system without uniform support for AArch32 at EL0, it is possible for the guest to force run AArch32 at EL0 and potentially cause an illegal exception if running on a core without AArch32. Add an extra check so that if we catch the guest doing that, then we prevent it from running again by resetting vcpu->arch.target and returning ARM_EXCEPTION_IL. We try to catch this misbehaviour as early as possible and not rely on an illegal exception occurring to signal the problem. Attempting to run a 32bit app in the guest will produce an error from QEMU if the guest exits while running in AArch32 EL0. Tested on Juno by instrumenting the host to fake asym aarch32 and instrumenting KVM to make the asymmetry visible to the guest. [will: Incorporated feedback from Marc] Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201021104611.2744565-2-qais.yousef@arm.com Link: https://lore.kernel.org/r/20201027215118.27003-2-will@kernel.org
2020-10-30arm64: cpufeature: upgrade hyp caps to finalMark Rutland3-15/+24
We finalize caps before initializing kvm hyp code, and any use of cpus_have_const_cap() in kvm hyp code generates redundant and potentially unsound code to read the cpu_hwcaps array. A number of helper functions used in both hyp context and regular kernel context use cpus_have_const_cap(), as some regular kernel code runs before the capabilities are finalized. It's tedious and error-prone to write separate copies of these for hyp and non-hyp code. So that we can avoid the redundant code, let's automatically upgrade cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context. With this change, there's never a reason to access to cpu_hwcaps array from hyp code, and we don't need to create an NVHE alias for this. This should have no effect on non-hyp code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com
2020-10-30arm64: cpufeature: reorder cpus_have_{const, final}_cap()Mark Rutland1-8/+8
In a subsequent patch we'll modify cpus_have_const_cap() to call cpus_have_final_cap(), and hence we need to define cpus_have_final_cap() first. To make subsequent changes easier to follow, this patch reorders the two without making any other changes. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-3-mark.rutland@arm.com
2020-10-30KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()Mark Rutland1-5/+16
Currently has_vhe() detects whether it is being compiled for VHE/NVHE hyp code based on preprocessor definitions, and uses this knowledge to avoid redundant runtime checks. There are other cases where we'd like to use this knowledge, so let's factor the preprocessor checks out into separate helpers. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-2-mark.rutland@arm.com
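The factored-out helpers look roughly like this (sketch based on the description above, using the __KVM_*_HYPERVISOR__ defines that distinguish the hyp objects):

    /* Compiled into the VHE hyp object? */
    static __always_inline bool is_vhe_hyp_code(void)
    {
            return __is_defined(__KVM_VHE_HYPERVISOR__);
    }

    /* Compiled into the nVHE hyp object? */
    static __always_inline bool is_nvhe_hyp_code(void)
    {
            return __is_defined(__KVM_NVHE_HYPERVISOR__);
    }

    /* Either flavour of hyp code. */
    static __always_inline bool is_hyp_code(void)
    {
            return is_vhe_hyp_code() || is_nvhe_hyp_code();
    }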
2020-10-30arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.SFangrui Song3-6/+3
Commit 39d114ddc682 ("arm64: add KASAN support") added .weak directives to arch/arm64/lib/mem*.S instead of changing the existing SYM_FUNC_START_PI macros. This can lead to the assembly snippet `.weak memcpy ... .globl memcpy` which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Use the appropriate SYM_FUNC_START_WEAK_PI instead. Fixes: 39d114ddc682 ("arm64: add KASAN support") Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Fangrui Song <maskray@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201029181951.1866093-1-maskray@google.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-30arm64/smp: Move rcu_cpu_starting() earlierQian Cai1-0/+1
The call to rcu_cpu_starting() in secondary_start_kernel() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows: WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call trace: dump_backtrace+0x0/0x3c8 show_stack+0x14/0x60 dump_stack+0x14c/0x1c4 lockdep_rcu_suspicious+0x134/0x14c __lock_acquire+0x1c30/0x2600 lock_acquire+0x274/0xc48 _raw_spin_lock+0xc8/0x140 vprintk_emit+0x90/0x3d0 vprintk_default+0x34/0x40 vprintk_func+0x378/0x590 printk+0xa8/0xd4 __cpuinfo_store_cpu+0x71c/0x868 cpuinfo_store_cpu+0x2c/0xc8 secondary_start_kernel+0x244/0x318 This is avoided by moving the call to rcu_cpu_starting up near the beginning of the secondary_start_kernel() function. Signed-off-by: Qian Cai <cai@redhat.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ Link: https://lore.kernel.org/r/20201028182614.13655-1-cai@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-29KVM: arm64: Force PTE mapping on fault resulting in a device mappingSantosh Shukla1-0/+1
VFIO allows a device driver to resolve a fault by mapping a MMIO range. This can subsequently result in user_mem_abort() trying to compute a huge mapping based on the MMIO pfn, which is a sure recipe for things to go wrong. Instead, force a PTE mapping when the pfn faulted in has a device mapping. Fixes: 6d674e28f642 ("KVM: arm/arm64: Properly handle faulting of device mappings") Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Santosh Shukla <sashukla@nvidia.com> [maz: rewritten commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1603711447-11998-2-git-send-email-sashukla@nvidia.com
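The gist of the fix is a small hunk in user_mem_abort(), roughly (sketch; variable names assumed from context):

    if (kvm_is_device_pfn(pfn)) {
            device = true;
            /* Never build a block mapping out of MMIO pfns; stick to a
             * single PTE for device memory provided via VFIO. */
            force_pte = true;
    }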
2020-10-29KVM: arm64: Use fallback mapping sizes for contiguous huge page sizesGavin Shan1-7/+19
Although huge pages can be created out of multiple contiguous PMDs or PTEs, the corresponding sizes are not supported at Stage-2 yet. Instead of failing the mapping, fall back to the nearer supported mapping size (CONT_PMD to PMD and CONT_PTE to PTE respectively). Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Gavin Shan <gshan@redhat.com> [maz: rewritten commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201025230626.18501-1-gshan@redhat.com
2020-10-29KVM: arm64: Fix masks in stage2_pte_cacheable()Will Deacon1-1/+1
stage2_pte_cacheable() tries to figure out whether the mapping installed in its 'pte' parameter is cacheable or not. Unfortunately, it fails miserably because it extracts the memory attributes from the entry using FIELD_GET(), which returns the attributes shifted down to bit 0, but then compares this with the unshifted value generated by the PAGE_S2_MEMATTR() macro. A direct consequence of this bug is that cache maintenance is silently skipped, which in turn causes 32-bit guests to crash early on when their set/way maintenance is trapped but not emulated correctly. Fix the broken masks by avoiding the use of FIELD_GET() altogether. Fixes: 6d9d2115c480 ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table") Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201029144716.30476-1-will@kernel.org
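The corrected helper compares the attribute field in place instead of mixing shifted and unshifted values, roughly (sketch; macro names as used in the generic stage-2 page-table code):

    static bool stage2_pte_cacheable(kvm_pte_t pte)
    {
            /* Mask the attribute bits out of the PTE and compare against
             * the unshifted PAGE_S2_MEMATTR() encoding; the previous
             * FIELD_GET() shifted one side down to bit 0, so the
             * comparison could never match. */
            u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;

            return memattr == PAGE_S2_MEMATTR(NORMAL);
    }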
2020-10-29KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCRMarc Zyngier2-3/+4
The DBGD{CCINT,SCRext} and DBGVCR register entries in the cp14 array are missing their target register, resulting in all accesses being targetted at the guard sysreg (indexed by __INVALID_SYSREG__). Point the emulation code at the actual register entries. Fixes: bdfb4b389c8d ("arm64: KVM: add trap handlers for AArch32 debug registers") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201029172409.2768336-1-maz@kernel.org
2020-10-29KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNTWill Deacon1-1/+1
For consistency with the rest of the stage-2 page-table page allocations (performed using a kvm_mmu_memory_cache), ensure that __GFP_ACCOUNT is included in the GFP flags for the PGD pages. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201026144423.24683-1-will@kernel.org
2020-10-29KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transitionMarc Zyngier1-2/+0
Setting PSTATE.PAN when entering EL2 on nVHE doesn't make much sense as this bit only means something for translation regimes that include EL0. This obviously isn't the case in the nVHE case, so let's drop this setting. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Link: https://lore.kernel.org/r/20201026095116.72051-4-maz@kernel.org
2020-10-29KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidationMarc Zyngier1-1/+0
The new calling convention says that pointers coming from the SMCCC interface are turned into their HYP version in the host HVC handler. However, there is still a stray kern_hyp_va() in the TLB invalidation code, which could result in a corrupted pointer. Drop the spurious conversion. Fixes: a071261d9318 ("KVM: arm64: nVHE: Fix pointers during SMCCC convertion") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201026095116.72051-3-maz@kernel.org
2020-10-29KVM: arm64: Don't corrupt tpidr_el2 on failed HVC callMarc Zyngier1-7/+16
The hyp-init code starts by stashing a register in TPIDR_EL2 in order to free a register. This happens no matter if the HVC call is legal or not. Although nothing wrong seems to come out of it, it feels odd to alter the EL2 state for something that eventually returns an error. Instead, use the fact that we know exactly which bits of the __kvm_hyp_init call are non-zero to perform the check with a series of EOR/ROR instructions, combined with a build-time check that the value is the one we expect. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201026095116.72051-2-maz@kernel.org
2020-10-29arm64: Add workaround for Arm Cortex-A77 erratum 1508412Rob Herring12-15/+64
On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected system. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information. [1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>