summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-08-07Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds60-823/+1145
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
2020-08-07Merge branch 'work.regset' of ↵Linus Torvalds7-304/+148
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ptrace regset updates from Al Viro: "Internal regset API changes: - regularize copy_regset_{to,from}_user() callers - switch to saner calling conventions for ->get() - kill user_regset_copyout() The ->put() side of things will have to wait for the next cycle, unfortunately. The balance is about -1KLoC and replacements for ->get() instances are a lot saner" * 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits) regset: kill user_regset_copyout{,_zero}() regset(): kill ->get_size() regset: kill ->get() csky: switch to ->regset_get() xtensa: switch to ->regset_get() parisc: switch to ->regset_get() nds32: switch to ->regset_get() nios2: switch to ->regset_get() hexagon: switch to ->regset_get() h8300: switch to ->regset_get() openrisc: switch to ->regset_get() riscv: switch to ->regset_get() c6x: switch to ->regset_get() ia64: switch to ->regset_get() arc: switch to ->regset_get() arm: switch to ->regset_get() sh: convert to ->regset_get() arm64: switch to ->regset_get() mips: switch to ->regset_get() sparc: switch to ->regset_get() ...
2020-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds1-2/+2
Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
2020-08-04Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-81/+9
Pull dma-mapping updates from Christoph Hellwig: - make support for dma_ops optional - move more code out of line - add generic support for a dma_ops bypass mode - misc cleanups * tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping: dma-contiguous: cleanup dma_alloc_contiguous dma-debug: use named initializers for dir2name powerpc: use the generic dma_ops_bypass mode dma-mapping: add a dma_ops_bypass flag to struct device dma-mapping: make support for dma ops optional dma-mapping: inline the fast path dma-direct calls dma-mapping: move the remaining DMA API calls out of line
2020-08-04Merge tag 'close-range-v5.9' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull close_range() implementation from Christian Brauner: "This adds the close_range() syscall. It allows to efficiently close a range of file descriptors up to all file descriptors of a calling task. This is coordinated with the FreeBSD folks which have copied our version of this syscall and in the meantime have already merged it in April 2019: https://reviews.freebsd.org/D21627 https://svnweb.freebsd.org/base?view=revision&revision=359836 The syscall originally came up in a discussion around the new mount API and making new file descriptor types cloexec by default. During this discussion, Al suggested the close_range() syscall. First, it helps to close all file descriptors of an exec()ing task. This can be done safely via (quoting Al's example from [1] verbatim): /* that exec is sensitive */ unshare(CLONE_FILES); /* we don't want anything past stderr here */ close_range(3, ~0U); execve(....); The code snippet above is one way of working around the problem that file descriptors are not cloexec by default. This is aggravated by the fact that we can't just switch them over without massively regressing userspace. For a whole class of programs having an in-kernel method of closing all file descriptors is very helpful (e.g. demons, service managers, programming language standard libraries, container managers etc.). Second, it allows userspace to avoid implementing closing all file descriptors by parsing through /proc/<pid>/fd/* and calling close() on each file descriptor and other hacks. From looking at various large(ish) userspace code bases this or similar patterns are very common in service managers, container runtimes, and programming language runtimes/standard libraries such as Python or Rust. In addition, the syscall will also work for tasks that do not have procfs mounted and on kernels that do not have procfs support compiled in. In such situations the only way to make sure that all file descriptors are closed is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery. Based on Linus' suggestion close_range() also comes with a new flag CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping right before exec. This would usually be expressed in the sequence: unshare(CLONE_FILES); close_range(3, ~0U); as pointed out by Linus it might be desirable to have this be a part of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which gets especially handy when we're closing all file descriptors above a certain threshold. Test-suite as always included" * tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add CLOSE_RANGE_UNSHARE tests close_range: add CLOSE_RANGE_UNSHARE tests: add close_range() tests arch: wire-up close_range() open: add close_range()
2020-08-04Merge tag 'fork-v5.9' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fork cleanups from Christian Brauner: "This is cleanup series from when we reworked a chunk of the process creation paths in the kernel and switched to struct {kernel_}clone_args. High-level this does two main things: - Remove the double export of both do_fork() and _do_fork() where do_fork() used the incosistent legacy clone calling convention. Now we only export _do_fork() which is based on struct kernel_clone_args. - Remove the copy_thread_tls()/copy_thread() split making the architecture specific HAVE_COYP_THREAD_TLS config option obsolete. This switches all remaining architectures to select HAVE_COPY_THREAD_TLS and thus to the copy_thread_tls() calling convention. The current split makes the process creation codepaths more convoluted than they need to be. Each architecture has their own copy_thread() function unless it selects HAVE_COPY_THREAD_TLS then it has a copy_thread_tls() function. The split is not needed anymore nowadays, all architectures support CLONE_SETTLS but quite a few of them never bothered to select HAVE_COPY_THREAD_TLS and instead simply continued to use copy_thread() and use the old calling convention. Removing this split cleans up the process creation codepaths and paves the way for implementing clone3() on such architectures since it requires the copy_thread_tls() calling convention. After having made each architectures support copy_thread_tls() this series simply renames that function back to copy_thread(). It also switches all architectures that call do_fork() directly over to _do_fork() and the struct kernel_clone_args calling convention. This is a corollary of switching the architectures that did not yet support it over to copy_thread_tls() since do_fork() is conditional on not supporting copy_thread_tls() (Mostly because it lacks a separate argument for tls which is trivial to fix but there's no need for this function to exist.). The do_fork() removal is in itself already useful as it allows to to remove the export of both do_fork() and _do_fork() we currently have in favor of only _do_fork(). This has already been discussed back when we added clone3(). The legacy clone() calling convention is - as is probably well-known - somewhat odd: # # ABI hall of shame # config CLONE_BACKWARDS config CLONE_BACKWARDS2 config CLONE_BACKWARDS3 that is aggravated by the fact that some architectures such as sparc follow the CLONE_BACKWARDSx calling convention but don't really select the corresponding config option since they call do_fork() directly. So do_fork() enforces a somewhat arbitrary calling convention in the first place that doesn't really help the individual architectures that deviate from it. They can thus simply be switched to _do_fork() enforcing a single calling convention. (I really hope that any new architectures will __not__ try to implement their own calling conventions...) Most architectures already have made a similar switch (m68k comes to mind). Overall this removes more code than it adds even with a good portion of added comments. It simplifies a chunk of arch specific assembly either by moving the code into C or by simply rewriting the assembly. Architectures that have been touched in non-trivial ways have all been actually boot and stress tested: sparc and ia64 have been tested with Debian 9 images. They are the two architectures which have been touched the most. All non-trivial changes to architectures have seen acks from the relevant maintainers. nios2 with a custom built buildroot image. h8300 I couldn't get something bootable to test on but the changes have been fairly automatic and I'm sure we'll hear people yell if I broke something there. All other architectures that have been touched in trivial ways have been compile tested for each single patch of the series via git rebase -x "make ..." v5.8-rc2. arm{64} and x86{_64} have been boot tested even though they have just been trivially touched (removal of the HAVE_COPY_THREAD_TLS macro from their Kconfig) because well they are basically "core architectures" and since it is trivial to get your hands on a useable image" * tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: arch: rename copy_thread_tls() back to copy_thread() arch: remove HAVE_COPY_THREAD_TLS unicore: switch to copy_thread_tls() sh: switch to copy_thread_tls() nds32: switch to copy_thread_tls() microblaze: switch to copy_thread_tls() hexagon: switch to copy_thread_tls() c6x: switch to copy_thread_tls() alpha: switch to copy_thread_tls() fork: remove do_fork() h8300: select HAVE_COPY_THREAD_TLS, switch to kernel_clone_args nios2: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args ia64: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args sparc: unconditionally enable HAVE_COPY_THREAD_TLS sparc: share process creation helpers between sparc and sparc64 sparc64: enable HAVE_COPY_THREAD_TLS fork: fold legacy_clone_args_valid() into _do_fork()
2020-08-03Merge tag 'locking-core-2020-08-03' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - LKMM updates: mostly documentation changes, but also some new litmus tests for atomic ops. - KCSAN updates: the most important change is that GCC 11 now has all fixes in place to support KCSAN, so GCC support can be enabled again. Also more annotations. - futex updates: minor cleanups and simplifications - seqlock updates: merge preparatory changes/cleanups for the 'associated locks' facilities. - lockdep updates: - simplify IRQ trace event handling - add various new debug checks - simplify header dependencies, split out <linux/lockdep_types.h>, decouple lockdep from other low level headers some more - fix NMI handling - misc cleanups and smaller fixes * tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) kcsan: Improve IRQ state trace reporting lockdep: Refactor IRQ trace events fields into struct seqlock: lockdep assert non-preemptibility on seqcount_t write lockdep: Add preemption enabled/disabled assertion APIs seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs seqlock: Reorder seqcount_t and seqlock_t API definitions seqlock: seqcount_t latch: End read sections with read_seqcount_retry() seqlock: Properly format kernel-doc code samples Documentation: locking: Describe seqlock design and usage locking/qspinlock: Do not include atomic.h from qspinlock_types.h locking/atomic: Move ATOMIC_INIT into linux/types.h lockdep: Move list.h inclusion into lockdep.h locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs futex: Remove unused or redundant includes futex: Consistently use fshared as boolean futex: Remove needless goto's futex: Remove put_futex_key() rwsem: fix commas in initialisation docs: locking: Replace HTTP links with HTTPS ones ...
2020-08-03powerpc/40x: Fix assembler warning about r0Michael Ellerman1-1/+1
The assembler says: arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression It's objecting to the use of r0 as the RA argument. That's because when RA = 0 the literal value 0 is used, rather than the content of r0, making the use of r0 in the source potentially confusing. Fix it to use a literal 0, the generated code is identical. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200722022422.825197-1-mpe@ellerman.id.au
2020-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-3/+11
Resolved kernel/bpf/btf.c using instructions from merge commit 69138b34a7248d2396ab85c8652e20c0c39beaba Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30powerpc/cacheinfo: Warn if cache object chain becomes unorderedNathan Lynch1-0/+9
This can catch cases where the device tree has gotten mishandled into an inconsistent state at runtime, e.g. the cache nodes for both the source and the destination are present after a migration. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190627051537.7298-5-nathanl@linux.ibm.com
2020-07-30powerpc/cacheinfo: Improve diagnostics about malformed cache listsNathan Lynch1-2/+8
If we have a bug which causes us to start with the wrong kind of OF node when linking up the cache tree, it's helpful for debugging to print information about what we found vs what we expected. So replace uses of WARN_ON_ONCE with WARN_ONCE, which lets us include an informative message instead of a contentless backtrace. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190627051537.7298-4-nathanl@linux.ibm.com
2020-07-30powerpc/cacheinfo: Use name@unit instead of full DT path in debug messagesNathan Lynch1-8/+8
We know that every OF node we deal with in this code is under /cpus, so we can make the debug messages a little less verbose without losing information. E.g. cacheinfo: creating L1 dcache and icache for /cpus/PowerPC,POWER8@0 cacheinfo: creating L2 ucache for /cpus/l2-cache@2006 cacheinfo: creating L3 ucache for /cpus/l3-cache@3106 becomes cacheinfo: creating L1 dcache and icache for PowerPC,POWER8@0 cacheinfo: creating L2 ucache for l2-cache@2006 cacheinfo: creating L3 ucache for l3-cache@3106 Replace all '%pOF' specifiers with '%pOFP'. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190627051537.7298-3-nathanl@linux.ibm.com
2020-07-30powerpc/cacheinfo: Set pr_fmt()Nathan Lynch1-0/+2
Set pr_fmt() so we get a nice prefix on messages. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190627051537.7298-2-nathanl@linux.ibm.com
2020-07-30powerpc: fix function annotations to avoid section mismatch warnings with gcc-10Vladis Dronov1-2/+2
Certain warnings are emitted for powerpc code when building with a gcc-10 toolset: WARNING: modpost: vmlinux.o(.text.unlikely+0x377c): Section mismatch in reference from the function remove_pmd_table() to the function .meminit.text:split_kernel_mapping() The function remove_pmd_table() references the function __meminit split_kernel_mapping(). This is often because remove_pmd_table lacks a __meminit annotation or the annotation of split_kernel_mapping is wrong. Add the appropriate __init and __meminit annotations to make modpost not complain. In all the cases there are just a single callsite from another __init or __meminit function: __meminit remove_pagetable() -> remove_pud_table() -> remove_pmd_table() __init prom_init() -> setup_secure_guest() __init xive_spapr_init() -> xive_spapr_disabled() Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200729133741.62789-1-vdronov@redhat.com
2020-07-29powerpc/drmem: Make LMB walk a bit more flexibleHari Bathini1-5/+8
Currently, numa & prom are the only users of drmem LMB walk code. Loading kdump with kexec_file also needs to walk the drmem LMBs to setup the usable memory ranges for kdump kernel. But there are couple of issues in using the code as is. One, walk_drmem_lmb() code is built into the .init section currently, while kexec_file needs it later. Two, there is no scope to pass data to the callback function for processing and/or erroring out on certain conditions. Fix that by, moving drmem LMB walk code out of .init section, adding scope to pass data to the callback function and bailing out when an error is encountered in the callback function. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159602282727.575379.3979857013827701828.stgit@hbathini
2020-07-29powerpc/64s: Move HMI IRQ stat from percpu variable to paca.Mahesh Salgaonkar2-4/+7
With the proposed change in percpu bootmem allocator to use page mapping [1], the percpu first chunk memory area can come from vmalloc ranges. This makes the HMI (Hypervisor Maintenance Interrupt) handler crash the kernel whenever percpu variable is accessed in real mode. This patch fixes this issue by moving the HMI IRQ stat inside paca for safe access in realmode. [1] https://lore.kernel.org/linuxppc-dev/20200608070904.387440-1-aneesh.kumar@linux.ibm.com/ Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159290806973.3642154.5244613424529764050.stgit@jupiter
2020-07-29powerpc/build: vdso linker warning for orphan sectionsNicholas Piggin4-3/+5
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200303012748.4190929-1-npiggin@gmail.com
2020-07-29powerpc: Use fallthrough pseudo-keywordGustavo A. R. Silva1-4/+4
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727224201.GA10133@embeddedor
2020-07-29powerpc/book3s64/radix: Add kernel command line option to disable radix GTSEAneesh Kumar K.V1-4/+9
This adds a kernel command line option that can be used to disable GTSE support. Disabling GTSE implies kernel will make hcalls to invalidate TLB entries. This was done so that we can do VM migration between configs that enable/disable GTSE support via hypervisor. To migrate a VM from a system that supports GTSE to a system that doesn't, we can boot the guest with radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB invalidates. The check for hcall availability is done in pSeries_setup_arch so that the panic message appears on the console. This should only happen on a hypervisor that doesn't force the guest to hash translation even though it can't handle the radix GTSE=0 request via CAS. With radix_hcall_invalidate=on if the hypervisor doesn't support hcall_rpt_invalidate hcall it should force the LPAR to hash translation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727085908.420806-1-aneesh.kumar@linux.ibm.com
2020-07-29powerpc/hugetlb/cma: Allocate gigantic hugetlb pages using CMAAneesh Kumar K.V1-0/+3
commit: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") added support for allocating gigantic hugepages using CMA. This patch enables the same for powerpc Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713150749.25245-1-aneesh.kumar@linux.ibm.com
2020-07-29powerpc/32s: Remove TAUException wart in traps.cMichael Ellerman2-8/+4
All 32 and 64-bit builds that don't have CONFIG_TAU_INT enabled (all of them), get a definition of TAUException() in traps.c. On 64-bit it's completely useless, and just wastes ~120 bytes of text. On 32-bit it allows the kernel to link because head_32.S calls it unconditionally. Instead follow the example of altivec_assist_exception(), and if CONFIG_TAU_INT is not enabled just point it at unknown_exception using the preprocessor. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131728.1643966-6-mpe@ellerman.id.au
2020-07-29powerpc/pseries: Add KVM guest doorbell restrictionsNicholas Piggin2-3/+21
KVM guests have certain restrictions and performance quirks when using doorbells. This patch moves the EPAPR KVM guest test so it can be shared with PSERIES, and uses that in doorbell setup code to apply the KVM guest quirks and improves IPI performance for two cases: - PowerVM guests may now use doorbells even if they are secure. - KVM guests no longer use doorbells if XIVE is available. There is a valid complaint that "KVM guest" is not a very reasonable thing to test for, it's preferable for the hypervisor to advertise particular behaviours to the guest so they could change if the hypervisor implementation or configuration changes. However in this case we were already assuming a KVM guest worst case, so this patch is about containing those quirks. If KVM later advertises fast doorbells, we should test for that and override the quirks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200726035155.1424103-4-npiggin@gmail.com
2020-07-29powerpc: Inline doorbell sending functionsNicholas Piggin1-55/+0
These are only called in one place for a given platform, so inline them for performance. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Cédric Le Goater <clg@kaod.org> [mpe: Fix build errors related to KVM] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200726035155.1424103-2-npiggin@gmail.com
2020-07-27powerpc: switch to ->regset_get()Al Viro7-304/+148
Note: compat variant of REGSET_TM_CGPR is almost certainly wrong; it claims to be 48*64bit, but just as compat REGSET_GPR it stores 44*32bit of (truncated) registers + 4 32bit zeros... followed by 48 more 32bit zeroes. Might be too late to change - it's a userland ABI, after all ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=yMichael Ellerman1-1/+3
skiroot_defconfig fails: arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used 48 | static atomic_t cpus_in_fadump; Fix it by moving the definition into the #ifdef where it's used. Fixes: ba608c4fa12c ("powerpc/fadump: fix race between pstore write and fadump crash trigger") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727070341.595634-1-mpe@ellerman.id.au
2020-07-27powerpc/64s/hash: Fix hash_preload running with interrupts enabledNicholas Piggin1-3/+11
Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") removed the local_irq_disable from hash_preload, but it was required for more than just the page table walk: the hash pte busy bit is effectively a lock which may be taken in interrupt context, and the local update flag test must not be preempted before it's used. This solves apparent lockups with perf interrupting __hash_page_64K. If get_perf_callchain then also takes a hash fault on the same page while it is already locked, it will loop forever taking hash faults, which looks like this: cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70] pc: c000000000072dc8: hash_page_mm+0x8/0x800 lr: c00000000000c5a4: do_hash_page+0x24/0x38 sp: c0002ac1cc69ac70 msr: 8000000000081033 current = 0xc0002ac1cc602e00 paca = 0xc00000001de1f280 irqmask: 0x03 irq_happened: 0x01 pid = 20118, comm = pread2_processe Linux version 5.8.0-rc6-00345-g1fad14f18bc6 49e:mon> t [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable) --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac [link register ] c000000000335d10 copy_from_user_nofault+0xf0/0x150 [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable) [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0 [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410 [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40 [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360 [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0 [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790 [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0 [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0 [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750 [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510 [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60 [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0 --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160 [link register ] c0000000000846f0 __hash_page_64K+0x210/0x540 [c0002ac1cc69ba50] 0000000000000000 (unreliable) [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0 [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0 [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60 [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60 [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90 [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c --- Exception: 300 (Data Access) at 00007fff8c861fe8 SP (7ffff6b19660) is in userspace Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
2020-07-27powerpc/32s: Kernel space starts at TASK_SIZEChristophe Leroy1-6/+6
Kernel space starts at TASK_SIZE. Select kernel page table when address is over TASK_SIZE. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/893425e32cd0a003539573b2d115e0ffa98bc26c.1593428200.git.christophe.leroy@csgroup.eu
2020-07-27powerpc: Use MODULES_VADDR if definedChristophe Leroy1-0/+11
In order to allow allocation of modules outside of vmalloc space, use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined. Redefine module_alloc() when MODULES_VADDR defined. Unmap corresponding KASAN shadow memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7ecf5fff1eef67d450e73fc412b6ec3818483d75.1593428200.git.christophe.leroy@csgroup.eu
2020-07-26powerpc/perf: Initialize power10 PMU registers in cpu setup routineAthira Rajeev1-4/+15
Initialize Monitor Mode Control Register 3 (MMCR3) SPR which is new in power10. For PowerISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch also initializes MMCRA BHRBRD to disable BHRB feature at boot for power10. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1595489557-2047-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-26powerpc/eeh: Move PE tree setup into the platformOliver O'Halloran1-52/+19
The EEH core has a concept of a "PE tree" to support PowerNV. The PE tree follows the PCI bus structures because a reset asserted on an upstream bridge will be propagated to the downstream bridges. On pseries there's a 1-1 correspondence between what the guest sees are a PHB and a PE so the "tree" is really just a single node. Current the EEH core is reponsible for setting up this PE tree which it does by traversing the pci_dn tree. The structure of the pci_dn tree matches the bus tree on PowerNV which leads to the PE tree being "correct" this setup method doesn't make a whole lot of sense and it's actively confusing for the pseries case where it doesn't really do anything. We want to remove the dependence on pci_dn anyway so this patch move choosing where to insert a new PE into the platform code rather than being part of the generic EEH code. For PowerNV this simplifies the tree building logic and removes the use of pci_dn. For pseries we keep the existing logic. I'm not really convinced it does anything due to the 1-1 PE-to-PHB correspondence so every device under that PHB should be in the same PE, but I'd rather not remove it entirely until we've had a chance to look at it more deeply. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-14-oohall@gmail.com
2020-07-26powerpc/eeh: Drop pdn use in eeh_pe_tree_insert()Oliver O'Halloran1-8/+7
This is mostly just to make the subsequent diffs less noisy. No functional changes. One thing that needs calling out is the removal of the "config_addr" variable and replacing it with edev->bdfn. The contents of edev->bdfn are the same, however it's worth pointing out that what RTAS calls a "config_addr" isn't the same as the bdfn. The config_addr is supposed to be: <bus><devfn><reg> with each field being an 8 bit number. Various parts of the EEH code use BDFN and "config_addr" as interchangeable quantities even though they aren't really. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-13-oohall@gmail.com
2020-07-26powerpc/eeh: Rename eeh_{add_to|remove_from}_parent_pe()Oliver O'Halloran4-8/+8
The naming of eeh_{add_to|remove_from}_parent_pe() doesn't really reflect what they actually do. If the PE referred to be edev->pe_config_addr already exists under that PHB then the edev is added to that PE. However, if the PE doesn't exist the a new one is created for the edev. The bulk of the implementation of eeh_add_to_parent_pe() covers that second case. Similarly, most of eeh_remove_from_parent_pe() is determining when it's safe to delete a PE. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-12-oohall@gmail.com
2020-07-26powerpc/eeh: Remove spurious use of pci_dn in eeh_dump_dev_logOliver O'Halloran1-10/+4
Retrieve the domain, bus, device, and function numbers from the edev. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-10-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->{read|write}_config()Oliver O'Halloran2-37/+32
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-9-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->resume_notify()Oliver O'Halloran2-3/+3
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-8-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->restore_config()Oliver O'Halloran2-7/+4
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-7-oohall@gmail.com
2020-07-26powerpc/eeh: Remove VF config space restorationOliver O'Halloran1-59/+0
There's a bunch of strange things about this code. First up is that none of the fields being written to are functional for a VF. The SR-IOV specification lists then as "Reserved, but OS should preserve" so writing new values to them doesn't do anything and is clearly wrong from a correctness perspective. However, since VFs are designed to be managed by the OS there is an argument to be made that we should be saving and restoring some parts of config space. We already sort of do that by saving the first 64 bytes of config space in the eeh_dev (see eeh_dev->config_space[]). This is inadequate since it doesn't even consider saving and restoring the PCI capability structures. However, this is a problem with EEH in general and that needs to be fixed for non-VF devices too. There's no real reason to keep around this around so delete it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-6-oohall@gmail.com
2020-07-26powerpc/eeh: Move vf_index out of pci_dn and into eeh_devOliver O'Halloran2-7/+6
Drivers that do not support the PCI error handling callbacks are handled by tearing down the device and re-probing them. If the device being removed is a virtual function then we need to know the VF index so it can be removed using the pci_iov_{add|remove}_virtfn() API. Currently this is handled by looking up the pci_dn, and using the vf_index that was stashed there when the pci_dn for the VF was created in pcibios_sriov_enable(). We would like to eliminate the use of pci_dn outside of pseries though so we need to provide the generic EEH code with some other way to find the vf_index. The easiest thing to do here is move the vf_index field out of pci_dn and into eeh_dev. Currently pci_dn and eeh_dev are allocated and initialized together so this is a fairly minimal change in preparation for splitting pci_dn and eeh_dev in the future. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-3-oohall@gmail.com
2020-07-26powerpc/eeh: Remove eeh_dev.cOliver O'Halloran3-55/+21
The only thing in this file is eeh_dev_init() which is allocates and initialises an eeh_dev based on a pci_dn. This is only ever called from pci_dn.c so move it into there and remove the file. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-2-oohall@gmail.com
2020-07-26powerpc/eeh: Remove eeh_dev_phb_init_dynamic()Oliver O'Halloran3-16/+3
This function is a one line wrapper around eeh_phb_pe_create() and despite the name it doesn't create any eeh_dev structures. Replace it with direct calls to eeh_phb_pe_create() since that does what it says on the tin and removes a layer of indirection. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-1-oohall@gmail.com
2020-07-26powerpc/watchpoint: Remove 512 byte boundaryRavi Bangoria1-2/+3
Power10 has removed 512 bytes boundary from match criteria i.e. the watch range can cross 512 bytes boundary. Note: ISA 3.1 Book III 9.4 match criteria includes 512 byte limit but that is a documentation mistake and hopefully will be fixed in the next version of ISA. Though, ISA 3.1 change log mentions about removal of 512B boundary: Multiple DEAW: Added a second Data Address Watchpoint. [H]DAR is set to the first byte of overlap. 512B boundary is removed. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-11-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Guest support for 2nd DAWR hcallRavi Bangoria1-1/+1
2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5. Enable powervm guest support with that. This has no effect on kvm guest because kvm will return error if guest does hcall with resource value 5. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-9-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bitRavi Bangoria1-0/+2
As per the PAPR, bit 0 of byte 64 in pa-features property indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd DAWR is present, otherwise not. Host generally uses "cpu-features", which masks "pa-features". But "cpu-features" are still not used for guests and thus this change is mostly applicable for guests only. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-7-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/dt_cpu_ftrs: Add feature for 2nd DAWRRavi Bangoria1-0/+1
Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-6-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix DAWR exception for CACHEOPRavi Bangoria1-1/+20
'ea' returned by analyse_instr() needs to be aligned down to cache block size for CACHEOP instructions. analyse_instr() does not set size for CACHEOP, thus size also needs to be calculated manually. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-4-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix DAWR exception constraintRavi Bangoria1-31/+41
Pedro Miraglia Franco de Carvalho noticed that on p8/p9, DAR value is inconsistent with different type of load/store. Like for byte,word etc. load/stores, DAR is set to the address of the first byte of overlap between watch range and real access. But for quadword load/ store it's sometime set to the address of the first byte of real access whereas sometime set to the address of the first byte of overlap. This issue has been fixed in p10. In p10(ISA 3.1), DAR is always set to the address of the first byte of overlap. Commit 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") wrongly assumes that DAR is set to the address of the first byte of overlap for all load/stores on p8/p9 as well. Fix that. With the fix, we now rely on 'ea' provided by analyse_instr(). If analyse_instr() fails, generate event unconditionally on p8/p9, and on p10 generate event only if DAR is within a DAWR range. Note: 8xx is not affected. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-3-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix 512 byte boundary limitRavi Bangoria1-1/+1
Milton Miller reported that we are aligning start and end address to wrong size SZ_512M. It should be SZ_512. Fix that. While doing this change I also found a case where ALIGN() comparison fails. Within a given aligned range, ALIGN() of two addresses does not match when start address is pointing to the first byte and end address is pointing to any other byte except the first one. But that's not true for ALIGN_DOWN(). ALIGN_DOWN() of any two addresses within that range will always point to the first byte. So use ALIGN_DOWN() instead of ALIGN(). Fixes: e68ef121c1f4 ("powerpc/watchpoint: Use builtin ALIGN*() macros") Reported-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-2-ravi.bangoria@linux.ibm.com
2020-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2-2/+2
The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25Merge tag 'v5.8-rc6' into locking/core, to pick up fixesIngo Molnar2-2/+2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-23Merge branch 'scv' support into nextMichael Ellerman9-27/+374
From Nick's cover letter: Linux powerpc new system call instruction and ABI System Call Vectored (scv) ABI ============================== The scv instruction is introduced with POWER9 / ISA3, it comes with an rfscv counter-part. The benefit of these instructions is performance (trading slower SRR0/1 with faster LR/CTR registers, and entering the kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR updates. The scv instruction has 128 levels (not enough to cover the Linux system call space). Assignment and advertisement ---------------------------- The proposal is to assign scv levels conservatively, and advertise them with HWCAP feature bits as we add support for more. Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will cause the kernel to log a "SCV facility unavilable" message, and deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it. This change allocates the zero level ('scv 0'), advertised with PPC_FEATURE2_SCV, which will be used to provide normal Linux system calls (equivalent to 'sc'). Attempting to execute scv with other levels will cause a SIGILL to be delivered the same as before, but will not log a "SCV facility unavailable" message (because the processor facility is enabled). Calling convention ------------------ The proposal is for scv 0 to provide the standard Linux system call ABI with the following differences from sc convention[1]: - LR is to be volatile across scv calls. This is necessary because the scv instruction clobbers LR. From previous discussion, this should be possible to deal with in GCC clobbers and CFI. - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the kernel system call exit to avoid restoring the volatile cr registers (although we probably still would anyway to avoid information leaks). - Error handling: The consensus among kernel, glibc, and musl is to move to using negative return values in r3 rather than CR0[SO]=1 to indicate error, which matches most other architectures, and is closer to a function call. Notes ----- - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as submitted currently preserves them. This is to leave room for deciding which way to go with these. Some small benefit was found by preserving them[1] but I'm not convinced it's worth deviating from the C function call ABI just for this. Release code should follow the ABI. Previous discussions: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html