summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-12-15kprobes: kretprobe events missing on 2-core KVM guestwuqiang1-6/+2
Default value of maxactive is set as num_possible_cpus() for nonpreemptable systems. For a 2-core system, only 2 kretprobe instances would be allocated in default, then these 2 instances for execve kretprobe are very likely to be used up with a pipelined command. Here's the testcase: a shell script was added to crontab, and the content of the script is: #!/bin/sh do_something_magic `tr -dc a-z < /dev/urandom | head -c 10` cron will trigger a series of program executions (4 times every hour). Then events loss would be noticed normally after 3-4 hours of testings. The issue is caused by a burst of series of execve requests. The best number of kretprobe instances could be different case by case, and should be user's duty to determine, but num_possible_cpus() as the default value is inadequate especially for systems with small number of cpus. This patch enables the logic for preemption as default, thus increases the minimum of maxactive to 10 for nonpreemptable systems. Link: https://lore.kernel.org/all/20221110081502.492289-1-wuqiang.matt@bytedance.com/ Signed-off-by: wuqiang <wuqiang.matt@bytedance.com> Reviewed-by: Solar Designer <solar@openwall.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-12-14Merge tag 'x86_core_for_v6.2' of ↵Linus Torvalds2-6/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...
2022-12-14Merge tag 'v6.2-p1' of ↵Linus Torvalds1-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Optimise away self-test overhead when they are disabled - Support symmetric encryption via keyring keys in af_alg - Flip hwrng default_quality, the default is now maximum entropy Algorithms: - Add library version of aesgcm - CFI fixes for assembly code - Add arm/arm64 accelerated versions of sm3/sm4 Drivers: - Remove assumption on arm64 that kmalloc is DMA-aligned - Fix selftest failures in rockchip - Add support for RK3328/RK3399 in rockchip - Add deflate support in qat - Merge ux500 into stm32 - Add support for TEE for PCI ID 0x14CA in ccp - Add mt7986 support in mtk - Add MaxLinear platform support in inside-secure - Add NPCM8XX support in npcm" * tag 'v6.2-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (184 commits) crypto: ux500/cryp - delete driver crypto: stm32/cryp - enable for use with Ux500 crypto: stm32 - enable drivers to be used on Ux500 dt-bindings: crypto: Let STM32 define Ux500 CRYP hwrng: geode - Fix PCI device refcount leak hwrng: amd - Fix PCI device refcount leak crypto: qce - Set DMA alignment explicitly crypto: octeontx2 - Set DMA alignment explicitly crypto: octeontx - Set DMA alignment explicitly crypto: keembay - Set DMA alignment explicitly crypto: safexcel - Set DMA alignment explicitly crypto: hisilicon/hpre - Set DMA alignment explicitly crypto: chelsio - Set DMA alignment explicitly crypto: ccree - Set DMA alignment explicitly crypto: ccp - Set DMA alignment explicitly crypto: cavium - Set DMA alignment explicitly crypto: img-hash - Fix variable dereferenced before check 'hdev->req' crypto: arm64/ghash-ce - use frame_push/pop macros consistently crypto: arm64/crct10dif - use frame_push/pop macros consistently crypto: arm64/aes-modes - use frame_push/pop macros consistently ...
2022-12-14Merge tag 'hardening-v6.2-rc1' of ↵Linus Torvalds6-15/+112
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: - Convert flexible array members, fix -Wstringop-overflow warnings, and fix KCFI function type mismatches that went ignored by maintainers (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook) - Remove the remaining side-effect users of ksize() by converting dma-buf, btrfs, and coredump to using kmalloc_size_roundup(), add more __alloc_size attributes, and introduce full testing of all allocator functions. Finally remove the ksize() side-effect so that each allocation-aware checker can finally behave without exceptions - Introduce oops_limit (default 10,000) and warn_limit (default off) to provide greater granularity of control for panic_on_oops and panic_on_warn (Jann Horn, Kees Cook) - Introduce overflows_type() and castable_to_type() helpers for cleaner overflow checking - Improve code generation for strscpy() and update str*() kern-doc - Convert strscpy and sigphash tests to KUnit, and expand memcpy tests - Always use a non-NULL argument for prepare_kernel_cred() - Disable structleak plugin in FORTIFY KUnit test (Anders Roxell) - Adjust orphan linker section checking to respect CONFIG_WERROR (Xin Li) - Make sure siginfo is cleared for forced SIGKILL (haifeng.xu) - Fix um vs FORTIFY warnings for always-NULL arguments * tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (31 commits) ksmbd: replace one-element arrays with flexible-array members hpet: Replace one-element array with flexible-array member um: virt-pci: Avoid GCC non-NULL warning signal: Initialize the info in ksignal lib: fortify_kunit: build without structleak plugin panic: Expose "warn_count" to sysfs panic: Introduce warn_limit panic: Consolidate open-coded panic_on_warn checks exit: Allow oops_limit to be disabled exit: Expose "oops_count" to sysfs exit: Put an upper limit on how often we can oops panic: Separate sysctl logic from CONFIG_SMP mm/pgtable: Fix multiple -Wstringop-overflow warnings mm: Make ksize() a reporting-only function kunit/fortify: Validate __alloc_size attribute results drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid() drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid() driver core: Add __alloc_size hint to devm allocators overflow: Introduce overflows_type() and castable_to_type() coredump: Proactively round up to kmalloc bucket size ...
2022-12-14bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_funcJiri Olsa1-0/+5
Hao Sun reported crash in dispatcher image [1]. Currently we don't have any sync between bpf_dispatcher_update and bpf_dispatcher_xdp_func, so following race is possible: cpu 0: cpu 1: bpf_prog_run_xdp ... bpf_dispatcher_xdp_func in image at offset 0x0 bpf_dispatcher_update update image at offset 0x800 bpf_dispatcher_update update image at offset 0x0 in image at offset 0x0 -> crash Fixing this by synchronizing dispatcher image update (which is done in bpf_dispatcher_update function) with bpf_dispatcher_xdp_func that reads and execute the dispatcher image. Calling synchronize_rcu after updating and installing new image ensures that readers leave old image before it's changed in the next dispatcher update. The update itself is locked with dispatcher's mutex. The bpf_prog_run_xdp is called under local_bh_disable and synchronize_rcu will wait for it to leave [2]. [1] https://lore.kernel.org/bpf/Y5SFho7ZYXr9ifRn@krava/T/#m00c29ece654bc9f332a17df493bbca33e702896c [2] https://lore.kernel.org/bpf/0B62D35A-E695-4B7A-A0D4-774767544C1A@gmail.com/T/#mff43e2c003ae99f4a38f353c7969be4c7162e877 Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20221214123542.1389719-1-jolsa@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-14bpf: prevent leak of lsm program after failed attachMilan Landaverde1-3/+3
In [0], we added the ability to bpf_prog_attach LSM programs to cgroups, but in our validation to make sure the prog is meant to be attached to BPF_LSM_CGROUP, we return too early if the check fails. This results in lack of decrementing prog's refcnt (through bpf_prog_put) leaving the LSM program alive past the point of the expected lifecycle. This fix allows for the decrement to take place. [0] https://lore.kernel.org/all/20220628174314.1216643-4-sdf@google.com/ Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor") Signed-off-by: Milan Landaverde <milan@mdaverde.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221213175714.31963-1-milan@mdaverde.com
2022-12-14Merge tag 'for-linus-iommufd' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd implementation from Jason Gunthorpe: "iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA" For more background, see the extended explanations in Jason's pull request: https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits) iommufd: Change the order of MSI setup iommufd: Improve a few unclear bits of code iommufd: Fix comment typos vfio: Move vfio group specific code into group.c vfio: Refactor dma APIs for emulated devices vfio: Wrap vfio group module init/clean code into helpers vfio: Refactor vfio_device open and close vfio: Make vfio_device_open() truly device specific vfio: Swap order of vfio_device_container_register() and open_device() vfio: Set device->group in helper function vfio: Create wrappers for group register/unregister vfio: Move the sanity check of the group to vfio_create_group() vfio: Simplify vfio_create_group() iommufd: Allow iommufd to supply /dev/vfio/vfio vfio: Make vfio_container optionally compiled vfio: Move container related MODULE_ALIAS statements into container.c vfio-iommufd: Support iommufd for emulated VFIO devices vfio-iommufd: Support iommufd for physical VFIO devices vfio-iommufd: Allow iommufd to be used in place of a container fd vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() ...
2022-12-14tracing: Improve panic/die notifiersGuilherme G. Piccoli1-28/+27
Currently the tracing dump_on_oops feature is implemented through separate notifiers, one for die/oops and the other for panic; given they have the same functionality, let's unify them. Also improve the function comment and change the priority of the notifier to make it execute earlier, avoiding showing useless trace data (like the callback names for the other notifiers); finally, we also removed an unnecessary header inclusion. Link: https://lkml.kernel.org/r/20220819221731.480795-7-gpiccoli@igalia.com Cc: Petr Mladek <pmladek@suse.com> Cc: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-14ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY kernelsGuilherme G. Piccoli1-0/+1
The function match_records() may take a while due to a large number of string comparisons, so when in PREEMPT_VOLUNTARY kernels we could face RCU stalls due to that. Add a cond_resched() to prevent that. Link: https://lkml.kernel.org/r/20221115204847.593616-1-gpiccoli@igalia.com Cc: Mark Rutland <mark.rutland@arm.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> # from RCU CPU stall warning perspective Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-14tracing: Do not synchronize freeing of trigger filter on boot upSteven Rostedt (Google)1-2/+8
If a trigger filter on the kernel command line fails to apply (due to syntax error), it will be freed. The freeing will call tracepoint_synchronize_unregister(), but this is not needed during early boot up, and will even trigger a lockdep splat. Avoid calling the synchronization function when system_state is SYSTEM_BOOTING. Link: https://lore.kernel.org/linux-trace-kernel/20221213172429.7774f4ba@gandalf.local.home Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-14padata: Mark padata_work_init() as __refNathan Chancellor1-2/+10
When building arm64 allmodconfig + ThinLTO with clang and a proposed modpost update to account for -ffuncton-sections, the following warning appears: WARNING: modpost: vmlinux.o: section mismatch in reference: padata_work_init (section: .text.padata_work_init) -> padata_mt_helper (section: .init.text) WARNING: modpost: vmlinux.o: section mismatch in reference: padata_work_init (section: .text.padata_work_init) -> padata_mt_helper (section: .init.text) LLVM has optimized padata_work_init() to include the address of padata_mt_helper() directly because it inlined the other call to padata_work_init() with padata_parallel_worker(), meaning the remaining uses of padata_work_init() use padata_mt_helper() as the work_fn argument. This optimization causes modpost to complain since padata_work_init() is not __init, whereas padata_mt_helper() is. Since padata_work_init() is only called from __init code when padata_mt_helper() is passed as the work_fn argument, mark padata_work_init() as __ref, which makes it clear to modpost that this scenario is okay. Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-12-13Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds4-8/+22
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-13Merge tag 'net-next-6.2' of ↵Linus Torvalds27-1623/+4459
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Allow live renaming when an interface is up - Add retpoline wrappers for tc, improving considerably the performances of complex queue discipline configurations - Add inet drop monitor support - A few GRO performance improvements - Add infrastructure for atomic dev stats, addressing long standing data races - De-duplicate common code between OVS and conntrack offloading infrastructure - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements - Netfilter: introduce packet parser for tunneled packets - Replace IPVS timer-based estimators with kthreads to scale up the workload with the number of available CPUs - Add the helper support for connection-tracking OVS offload BPF: - Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF - Make cgroup local storage available to non-cgroup attached BPF programs - Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers - A relevant bunch of BPF verifier fixes and improvements - Veristat tool improvements to support custom filtering, sorting, and replay of results - Add LLVM disassembler as default library for dumping JITed code - Lots of new BPF documentation for various BPF maps - Add bpf_rcu_read_{,un}lock() support for sleepable programs - Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs - Add support storing struct task_struct objects as kptrs in maps - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions Protocols: - TCP: implement Protective Load Balancing across switch links - TCP: allow dynamically disabling TCP-MD5 static key, reverting back to fast[er]-path - UDP: Introduce optional per-netns hash lookup table - IPv6: simplify and cleanup sockets disposal - Netlink: support different type policies for each generic netlink operation - MPTCP: add MSG_FASTOPEN and FastOpen listener side support - MPTCP: add netlink notification support for listener sockets events - SCTP: add VRF support, allowing sctp sockets binding to VRF devices - Add bridging MAC Authentication Bypass (MAB) support - Extensions for Ethernet VPN bridging implementation to better support multicast scenarios - More work for Wi-Fi 7 support, comprising conversion of all the existing drivers to internal TX queue usage - IPSec: introduce a new offload type (packet offload) allowing complete header processing and crypto offloading - IPSec: extended ack support for more descriptive XFRM error reporting - RXRPC: increase SACK table size and move processing into a per-local endpoint kernel thread, reducing considerably the required locking - IEEE 802154: synchronous send frame and extended filtering support, initial support for scanning available 15.4 networks - Tun: bump the link speed from 10Mbps to 10Gbps - Tun/VirtioNet: implement UDP segmentation offload support Driver API: - PHY/SFP: improve power level switching between standard level 1 and the higher power levels - New API for netdev <-> devlink_port linkage - PTP: convert existing drivers to new frequency adjustment implementation - DSA: add support for rx offloading - Autoload DSA tagging driver when dynamically changing protocol - Add new PCP and APPTRUST attributes to Data Center Bridging - Add configuration support for 800Gbps link speed - Add devlink port function attribute to enable/disable RoCE and migratable - Extend devlink-rate to support strict prioriry and weighted fair queuing - Add devlink support to directly reading from region memory - New device tree helper to fetch MAC address from nvmem - New big TCP helper to simplify temporary header stripping New hardware / drivers: - Ethernet: - Marvel Octeon CNF95N and CN10KB Ethernet Switches - Marvel Prestera AC5X Ethernet Switch - WangXun 10 Gigabit NIC - Motorcomm yt8521 Gigabit Ethernet - Microchip ksz9563 Gigabit Ethernet Switch - Microsoft Azure Network Adapter - Linux Automation 10Base-T1L adapter - PHY: - Aquantia AQR112 and AQR412 - Motorcomm YT8531S - PTP: - Orolia ART-CARD - WiFi: - MediaTek Wi-Fi 7 (802.11be) devices - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB devices - Bluetooth: - Broadcom BCM4377/4378/4387 Bluetooth chipsets - Realtek RTL8852BE and RTL8723DS - Cypress.CYW4373A0 WiFi + Bluetooth combo device Drivers: - CAN: - gs_usb: bus error reporting support - kvaser_usb: listen only and bus error reporting support - Ethernet NICs: - Intel (100G): - extend action skbedit to RX queue mapping - implement devlink-rate support - support direct read from memory - nVidia/Mellanox (mlx5): - SW steering improvements, increasing rules update rate - Support for enhanced events compression - extend H/W offload packet manipulation capabilities - implement IPSec packet offload mode - nVidia/Mellanox (mlx4): - better big TCP support - Netronome Ethernet NICs (nfp): - IPsec offload support - add support for multicast filter - Broadcom: - RSS and PTP support improvements - AMD/SolarFlare: - netlink extened ack improvements - add basic flower matches to offload, and related stats - Virtual NICs: - ibmvnic: introduce affinity hint support - small / embedded: - FreeScale fec: add initial XDP support - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood - TI am65-cpsw: add suspend/resume support - Mediatek MT7986: add RX wireless wthernet dispatch support - Realtek 8169: enable GRO software interrupt coalescing per default - Ethernet high-speed switches: - Microchip (sparx5): - add support for Sparx5 TC/flower H/W offload via VCAP - Mellanox mlxsw: - add 802.1X and MAC Authentication Bypass offload support - add ip6gre support - Embedded Ethernet switches: - Mediatek (mtk_eth_soc): - improve PCS implementation, add DSA untag support - enable flow offload support - Renesas: - add rswitch R-Car Gen4 gPTP support - Microchip (lan966x): - add full XDP support - add TC H/W offload via VCAP - enable PTP on bridge interfaces - Microchip (ksz8): - add MTU support for KSZ8 series - Qualcomm 802.11ax WiFi (ath11k): - support configuring channel dwell time during scan - MediaTek WiFi (mt76): - enable Wireless Ethernet Dispatch (WED) offload support - add ack signal support - enable coredump support - remain_on_channel support - Intel WiFi (iwlwifi): - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support - RealTek WiFi (rtw89): - new dynamic header firmware format support - wake-over-WLAN support" * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits) ipvs: fix type warning in do_div() on 32 bit net: lan966x: Remove a useless test in lan966x_ptp_add_trap() net: ipa: add IPA v4.7 support dt-bindings: net: qcom,ipa: Add SM6350 compatible bnxt: Use generic HBH removal helper in tx path IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver selftests: forwarding: Add bridge MDB test selftests: forwarding: Rename bridge_mdb test bridge: mcast: Support replacement of MDB port group entries bridge: mcast: Allow user space to specify MDB entry routing protocol bridge: mcast: Allow user space to add (*, G) with a source list and filter mode bridge: mcast: Add support for (*, G) with a source list and filter mode bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source bridge: mcast: Add a flag for user installed source entries bridge: mcast: Expose __br_multicast_del_group_src() bridge: mcast: Expose br_multicast_new_group_src() bridge: mcast: Add a centralized error path bridge: mcast: Place netlink policy before validation functions bridge: mcast: Split (*, G) and (S, G) addition into different functions bridge: mcast: Do not derive entry type from its filter mode ...
2022-12-13Merge tag 'sysctl-6.2-rc1' of ↵Linus Torvalds3-23/+27
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "Only a small step forward on the sysctl cleanups for this cycle" * tag 'sysctl-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sched: Move numa_balancing sysctls to its own file
2022-12-13Merge tag 'modules-6.2-rc1' of ↵Linus Torvalds11-39/+739
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: "Tux gets for xmas an improvement to the average lookup performance of kallsyms_lookup_name() by 715x thanks to the work by Zhen Lei, which upgraded our old implementation from being O(n) to O(log(n)), while also retaining the old implementation support on /proc/kallsyms. The only penalty was increasing the memory footprint by 3 * kallsyms_num_syms. Folks who want to improve this further now also have a dedicated selftest facility through KALLSYMS_SELFTEST. Stephen Boyd added zstd in-kernel decompression support, but the only users of this would be folks using the load-pin LSM because otherwise we do module decompression in userspace. The only other thing with mentioning is a minor boot time optimization by Rasmus Villemoes which deferes param_sysfs_init() to late init. The rest is cleanups and minor fixes" * tag 'modules-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: livepatch: Call klp_match_callback() in klp_find_callback() to avoid code duplication module/decompress: Support zstd in-kernel decompression kallsyms: Remove unneeded semicolon kallsyms: Add self-test facility livepatch: Use kallsyms_on_each_match_symbol() to improve performance kallsyms: Add helper kallsyms_on_each_match_symbol() kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] kallsyms: Correctly sequence symbols when CONFIG_LTO_CLANG=y kallsyms: Improve the performance of kallsyms_lookup_name() scripts/kallsyms: rename build_initial_tok_table() module: Fix NULL vs IS_ERR checking for module_get_next_page kernel/params.c: defer most of param_sysfs_init() to late_initcall time module: Remove unused macros module_addr_min/max module: remove redundant module_sysfs_initialized variable
2022-12-13tracing: Have trigger filter parsing errors show up in error_logSteven Rostedt (Google)1-1/+8
It is annoying that the filter parsing of triggers do not show up in the error_log. Trying to figure out what is incorrect in the input is difficult when it fails for a typo. Have the errors of filter parsing show up in error_log as well. Link: https://lore.kernel.org/linux-trace-kernel/20221213095602.083fa9fd@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-13Merge tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linuxLinus Torvalds2-10/+15
Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - Support some passthrough commands without CAP_SYS_ADMIN (Kanchan Joshi) - Refactor PCIe probing and reset (Christoph Hellwig) - Various fabrics authentication fixes and improvements (Sagi Grimberg) - Avoid fallback to sequential scan due to transient issues (Uday Shankar) - Implement support for the DEAC bit in Write Zeroes (Christoph Hellwig) - Allow overriding the IEEE OUI and firmware revision in configfs for nvmet (Aleksandr Miloserdov) - Force reconnect when number of queue changes in nvmet (Daniel Wagner) - Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi Grimberg, Christoph Hellwig, Christophe JAILLET) - Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni) - Use the common tagset helpers in nvme-pci driver (Christoph Hellwig) - Cleanup the nvme-pci removal path (Christoph Hellwig) - Use kstrtobool() instead of strtobool (Christophe JAILLET) - Allow unprivileged passthrough of Identify Controller (Joel Granados) - Support io stats on the mpath device (Sagi Grimberg) - Minor nvmet cleanup (Sagi Grimberg) - MD pull requests via Song: - Code cleanups (Christoph) - Various fixes - Floppy pull request from Denis: - Fix a memory leak in the init error path (Yuan) - Series fixing some batch wakeup issues with sbitmap (Gabriel) - Removal of the pktcdvd driver that was deprecated more than 5 years ago, and subsequent removal of the devnode callback in struct block_device_operations as no users are now left (Greg) - Fix for partition read on an exclusively opened bdev (Jan) - Series of elevator API cleanups (Jinlong, Christoph) - Series of fixes and cleanups for blk-iocost (Kemeng) - Series of fixes and cleanups for blk-throttle (Kemeng) - Series adding concurrent support for sync queues in BFQ (Yu) - Series bringing drbd a bit closer to the out-of-tree maintained version (Christian, Joel, Lars, Philipp) - Misc drbd fixes (Wang) - blk-wbt fixes and tweaks for enable/disable (Yu) - Fixes for mq-deadline for zoned devices (Damien) - Add support for read-only and offline zones for null_blk (Shin'ichiro) - Series fixing the delayed holder tracking, as used by DM (Yu, Christoph) - Series enabling bio alloc caching for IRQ based IO (Pavel) - Series enabling userspace peer-to-peer DMA (Logan) - BFQ waker fixes (Khazhismel) - Series fixing elevator refcount issues (Christoph, Jinlong) - Series cleaning up references around queue destruction (Christoph) - Series doing quiesce by tagset, enabling cleanups in drivers (Christoph, Chao) - Series untangling the queue kobject and queue references (Christoph) - Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye, Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph) * tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits) blktrace: Fix output non-blktrace event when blk_classic option enabled block: sed-opal: Don't include <linux/kernel.h> sed-opal: allow using IOC_OPAL_SAVE for locking too blk-cgroup: Fix typo in comment block: remove bio_set_op_attrs nvmet: don't open-code NVME_NS_ATTR_RO enumeration nvme-pci: use the tagset alloc/free helpers nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers nvme: consolidate setting the tagset flags nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set block: bio_copy_data_iter nvme-pci: split out a nvme_pci_ctrl_is_dead helper nvme-pci: return early on ctrl state mismatch in nvme_reset_work nvme-pci: rename nvme_disable_io_queues nvme-pci: cleanup nvme_suspend_queue nvme-pci: remove nvme_pci_disable nvme-pci: remove nvme_disable_admin_queue nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl nvme: use nvme_wait_ready in nvme_shutdown_ctrl ...
2022-12-13Merge tag 'audit-pr-20221212' of ↵Linus Torvalds1-36/+39
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Two performance oriented patches for the audit subsystem: one consolidates similar code to gain some caching advantages, while the other stores a value in a stack variable to avoid repeated lookups in a loop. The commit descriptions have more information, including some before/after performance measurements" * tag 'audit-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: unify audit_filter_{uring(), inode_name(), syscall()} audit: cache ctx->major in audit_filter_syscall()
2022-12-13Merge tag 'dma-mapping-6.2-2022-12-13' of ↵Linus Torvalds2-24/+47
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - reduce the swiotlb buffer size on allocation failure (Alexey Kardashevskiy) - clean up passing of bogus GFP flags to the dma-coherent allocator (Christoph Hellwig) * tag 'dma-mapping-6.2-2022-12-13' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: reject __GFP_COMP in dma_alloc_attrs ALSA: memalloc: don't pass bogus GFP_ flags to dma_alloc_* s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent cnic: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/qib: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/hfi1: don't pass bogus GFP_ flags to dma_alloc_coherent media: videobuf-dma-contig: use dma_mmap_coherent swiotlb: reduce the swiotlb buffer size on allocation failure
2022-12-12Merge tag 'fs.vfsuid.conversion.v6.2' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfsuid updates from Christian Brauner: "Last cycle we introduced the vfs{g,u}id_t types and associated helpers to gain type safety when dealing with idmapped mounts. That initial work already converted a lot of places over but there were still some left, This converts all remaining places that still make use of non-type safe idmapping helpers to rely on the new type safe vfs{g,u}id based helpers. Afterwards it removes all the old non-type safe helpers" * tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: remove unused idmapping helpers ovl: port to vfs{g,u}id_t and associated helpers fuse: port to vfs{g,u}id_t and associated helpers ima: use type safe idmapping helpers apparmor: use type safe idmapping helpers caps: use type safe idmapping helpers fs: use type safe idmapping helpers mnt_idmapping: add missing helpers
2022-12-12livepatch: Call klp_match_callback() in klp_find_callback() to avoid code ↵Zhen Lei1-22/+11
duplication The implementation of function klp_match_callback() is identical to the partial implementation of function klp_find_callback(). So call function klp_match_callback() in function klp_find_callback() instead of the duplicated code. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-12-12Merge tag 'pull-iov_iter' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls
2022-12-12Merge tag 'pull-elfcore' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull elf coredumping updates from Al Viro: "Unification of regset and non-regset sides of ELF coredump handling. Collecting per-thread register values is the only thing that needs to be ifdefed there..." * tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: [elf] get rid of get_note_info_size() [elf] unify regset and non-regset cases [elf][non-regset] use elf_core_copy_task_regs() for dumper as well [elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument) elf_core_copy_task_regs(): task_pt_regs is defined everywhere [elf][regset] simplify thread list handling in fill_note_info() [elf][regset] clean fill_note_info() a bit kill extern of vsyscall32_sysctl kill coredump_params->regs kill signal_pt_regs()
2022-12-12Merge tag 'mm-nonmm-stable-2022-12-12' of ↵Linus Torvalds8-18/+19
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - A ptrace API cleanup series from Sergey Shtylyov - Fixes and cleanups for kexec from ye xingchen - nilfs2 updates from Ryusuke Konishi - squashfs feature work from Xiaoming Ni: permit configuration of the filesystem's compression concurrency from the mount command line - A series from Akinobu Mita which addresses bound checking errors when writing to debugfs files - A series from Yang Yingliang to address rapidio memory leaks - A series from Zheng Yejian to address possible overflow errors in encode_comp_t() - And a whole shower of singleton patches all over the place * tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits) ipc: fix memory leak in init_mqueue_fs() hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount rapidio: devices: fix missing put_device in mport_cdev_open kcov: fix spelling typos in comments hfs: Fix OOB Write in hfs_asc2mac hfs: fix OOB Read in __hfs_brec_find relay: fix type mismatch when allocating memory in relay_create_buf() ocfs2: always read both high and low parts of dinode link count io-mapping: move some code within the include guarded section kernel: kcsan: kcsan_test: build without structleak plugin mailmap: update email for Iskren Chernev eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD rapidio: fix possible UAF when kfifo_alloc() fails relay: use strscpy() is more robust and safer cpumask: limit visibility of FORCE_NR_CPUS acct: fix potential integer overflow in encode_comp_t() acct: fix accuracy loss for input value of encode_comp_t() linux/init.h: include <linux/build_bug.h> and <linux/stringify.h> rapidio: rio: fix possible name leak in rio_register_mport() rapidio: fix possible name leaks when rio_add_device() fails ...
2022-12-12Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds5-16/+8
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-12Merge tag 'livepatching-for-6.2' of ↵Linus Torvalds1-27/+27
git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching update from Petr Mladek: - code cleanup * tag 'livepatching-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Move the result-invariant calculation out of the loop
2022-12-12Merge tag 'cgroup-for-6.2' of ↵Linus Torvalds2-11/+45
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting: - Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations kprobable - A couple cpuset optimizations - Other misc changes including doc and test updates" * tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: remove rcu_read_lock()/rcu_read_unlock() in critical section of spin_lock_irq() cgroup/cpuset: Improve cpuset_css_alloc() description kselftest/cgroup: Add cleanup() to test_cpuset_prs.sh cgroup/cpuset: Optimize cpuset_attach() on v2 cgroup/cpuset: Skip spread flags update on v2 kselftest/cgroup: Fix gathering number of CPUs cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REF cgroup: Implement DEBUG_CGROUP_REF
2022-12-12Merge tag 'sched-core-2022-12-12' of ↵Linus Torvalds6-180/+603
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Implement persistent user-requested affinity: introduce affinity_context::user_mask and unconditionally preserve the user-requested CPU affinity masks, for long-lived tasks to better interact with cpusets & CPU hotplug events over longer timespans, without destroying the original affinity intent if the underlying topology changes. - Uclamp updates: fix relationship between uclamp and fits_capacity() - PSI fixes - Misc fixes & updates * tag 'sched-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Clear ttwu_pending after enqueue_task() sched/psi: Use task->psi_flags to clear in CPU migration sched/psi: Stop relying on timer_pending() for poll_work rescheduling sched/psi: Fix avgs_work re-arm in psi_avgs_work() sched/psi: Fix possible missing or delayed pending event sched: Always clear user_cpus_ptr in do_set_cpus_allowed() sched: Enforce user requested affinity sched: Always preserve the user requested cpumask sched: Introduce affinity_context sched: Add __releases annotations to affine_move_task() sched/fair: Check if prev_cpu has highest spare cap in feec() sched/fair: Consider capacity inversion in util_fits_cpu() sched/fair: Detect capacity inversion sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition sched/uclamp: Make cpu_overutilized() use util_fits_cpu() sched/uclamp: Make asym_fits_capacity() use util_fits_cpu() sched/uclamp: Make select_idle_capacity() use util_fits_cpu() sched/uclamp: Fix fits_capacity() check in feec() sched/uclamp: Make task_fits_capacity() use util_fits_cpu() sched/uclamp: Fix relationship between uclamp and migration margin
2022-12-12Merge tag 'perf-core-2022-12-12' of ↵Linus Torvalds1-1013/+1087
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Thoroughly rewrite the data structures that implement perf task context handling, with the goal of fixing various quirks and unfeatures both in already merged, and in upcoming proposed code. The old data structure is the per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' In this new design this is replaced with a single task context and a single CPU context, plus intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' [ See commit bd2756811766 for more details. ] This rewrite was developed by Peter Zijlstra and Ravi Bangoria. - Optimize perf_tp_event() - Update the Intel uncore PMU driver, extending it with UPI topology discovery on various hardware models. - Misc fixes & cleanups * tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) perf/x86/intel/uncore: Fix reference count leak in __uncore_imc_init_box() perf/x86/intel/uncore: Fix reference count leak in snr_uncore_mmio_map() perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox() perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology() perf/x86/intel/uncore: Make set_mapping() procedure void perf/x86/intel/uncore: Update sysfs-devices-mapping file perf/x86/intel/uncore: Enable UPI topology discovery for Sapphire Rapids perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server perf/x86/intel/uncore: Get UPI NodeID and GroupID perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server perf/x86/intel/uncore: Generalize get_topology() for SKX PMUs perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D perf/x86/intel/uncore: Clear attr_update properly perf/x86/intel/uncore: Introduce UPI topology type perf/x86/intel/uncore: Generalize IIO topology support perf/core: Don't allow grouping events from different hw pmus perf/amd/ibs: Make IBS a core pmu perf: Fix function pointer case perf/x86/amd: Remove the repeated declaration perf: Fix possible memleak in pmu_dev_alloc() ...
2022-12-12Merge tag 'locking-core-2022-12-12' of ↵Linus Torvalds2-15/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Two changes in this cycle: - a micro-optimization in static_key_slow_inc_cpuslocked() - fix futex death-notification wakeup bug" * tag 'locking-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Resend potentially swallowed owner death notification jump_label: Use atomic_try_cmpxchg() in static_key_slow_inc_cpuslocked()
2022-12-12Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds1-5/+8
Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for 6.2. While it may seem backwards, the CXL update this time around includes some focus on CXL 1.x enabling where the work to date had been with CXL 2.0 (VH topologies) in mind. First generation CXL can mostly be supported via BIOS, similar to DDR, however it became clear there are use cases for OS native CXL error handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x hosts (Restricted CXL Host (RCH) topologies). So, this update brings RCH topologies into the Linux CXL device model. In support of the ongoing CXL 2.0+ enabling two new core kernel facilities are added. One is the ability for the kernel to flag collisions between userspace access to PCI configuration registers and kernel accesses. This is brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware mailbox over config-cycles. The other is a cpu_cache_invalidate_memregion() API that maps to wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest VMs and architectures that do not support it yet. The CXL paths that need it, dynamic memory region creation and security commands (erase / unlock), are disabled when it is not present. As for the CXL 2.0+ this cycle the subsystem gains support Persistent Memory Security commands, error handling in response to PCIe AER notifications, and support for the "XOR" host bridge interleave algorithm. Summary: - Add the cpu_cache_invalidate_memregion() API for cache flushing in response to physical memory reconfiguration, or memory-side data invalidation from operations like secure erase or memory-device unlock. - Add a facility for the kernel to warn about collisions between kernel and userspace access to PCI configuration registers - Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1) - Add handling and reporting of CXL errors reported via the PCIe AER mechanism - Add support for CXL Persistent Memory Security commands - Add support for the "XOR" algorithm for CXL host bridge interleave - Rework / simplify CXL to NVDIMM interactions - Miscellaneous cleanups and fixes" * tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits) cxl/region: Fix memdev reuse check cxl/pci: Remove endian confusion cxl/pci: Add some type-safety to the AER trace points cxl/security: Drop security command ioctl uapi cxl/mbox: Add variable output size validation for internal commands cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size cxl/security: Fix Get Security State output payload endian handling cxl: update names for interleave ways conversion macros cxl: update names for interleave granularity conversion macros cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry tools/testing/cxl: Require cache invalidation bypass cxl/acpi: Fail decoder add if CXIMS for HBIG is missing cxl/region: Fix spelling mistake "memergion" -> "memregion" cxl/regs: Fix sparse warning cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support tools/testing/cxl: Add an RCH topology cxl/port: Add RCD endpoint port enumeration cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem tools/testing/cxl: Add XOR Math support to cxl_test cxl/acpi: Support CXL XOR Interleave Math (CXIMS) ...
2022-12-12Merge tag 'pm-6.2-rc1' of ↵Linus Torvalds3-22/+21
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These include two new drivers (cpufreq driver for Apple SoC CPU P-states and the SCMI Powercap based power capping driver), other new hardware support and driver extensions (Qualcomm cpufreq driver and its DT bindings, TI cpufreq driver, intel_pstate, intel-uncore-freq), a bunch of fixes and cleanups all over and a cpupower utility update including new features related to RAPL support. Specifics: - Fix nasty and hard to debug race condition introduced by mistake in the runtime PM core code and clean up that code somewhat on top of the fix (Rafael Wysocki) - Generalize of_perf_domain_get_sharing_cpumask phandle format (Hector Martin) - Add new cpufreq driver for Apple SoC CPU P-states (Hector Martin) - Update Qualcomm cpufreq driver (Manivannan Sadhasivam, Chen Hui): - CPU clock provider support - Generic cleanups or reorganization - Potential memleak fix - Fix of the return value of cpufreq_driver->get() - Update Qualcomm cpufreq driver's DT bindings (Manivannan Sadhasivam, Rob Herring, Melody Olvera): - Support for CPU clock provider - Missing cache-related properties fixes - Support for QDU1000/QRU1000 - Add support for ti,am625 SoC and enable build of ti-cpufreq for ARCH_K3 (Dave Gerlach, and Vibhore Vardhan) - Use flexible array to simplify memory allocation in the tegra186 cpufreq driver (Christophe JAILLET) - Convert cpufreq statistics code to use sysfs_emit_at() (ye xingchen) - Allow intel_pstate to use no-HWP mode on Sapphire Rapids (Giovanni Gherdovich) - Add missing pci_dev_put() to the amd_freq_sensitivity cpufreq driver (Xiongfeng Wang) - Initialize the kobj_unregister completion before calling kobject_init_and_add() in the cpufreq core code (Yongqiang Liu) - Defer setting boost MSRs in the ACPI cpufreq driver (Stuart Hayes, Nathan Chancellor) - Make intel_pstate accept initial EPP value of 0x80 (Srinivas Pandruvada) - Make read-only array sys_clk_src in the SPEAr cpufreq driver static (Colin Ian King) - Make array speeds in the longhaul cpufreq driver static (Colin Ian King) - Use str_enabled_disabled() helper in the ACPI cpufreq driver (Andy Shevchenko) - Drop a reference to CVS from cpufreq documentation (Conghui Wang) - Improve kernel messages printed by the PSCI cpuidle driver (Ulf Hansson) - Make the DT cpuidle driver return the correct number of parsed idle states, clean it up and clarify a comment in it (Ulf Hansson) - Modify the tasks freezing code to avoid using pr_cont() and refine an error message printed by it (Rafael Wysocki) - Make the hibernation core code complain about memory map mismatches during resume to help diagnostics (Xueqin Luo) - Fix mistake in a kerneldoc comment in the hibernation code (xiongxin) - Reverse the order of performance and enabling operations in the generic power domains code (Abel Vesa) - Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the generic power domains code (Abel Vesa) - Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn Guo) - Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo) - Drop generic power domain status manipulation during hibernate restore (Shawn Guo) - Fix compiler warnings with make W=1 in the idle_inject power capping driver (Srinivas Pandruvada) - Use kstrtobool() instead of strtobool() in the power capping sysfs interface (Christophe JAILLET) - Add SCMI Powercap based power capping driver (Cristian Marussi) - Add Emerald Rapids support to the intel-uncore-freq driver (Artem Bityutskiy) - Repair slips in kernel-doc comments in the generic notifier code (Lukas Bulwahn) - Fix several DT issues in the OPP library reorganize code around opp-microvolt-<named> DT property (Viresh Kumar) - Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties to be present without the others present (James Calligeros) - Fix clock-latency-ns property in DT example (Serge Semin) - Add a private governor_data for devfreq governors (Kant Fan) - Reorganize devfreq code to use device_match_of_node() and devm_platform_get_and_ioremap_resource() instead of open coding them (ye xingchen, Minghao Chi) - Make cpupower choose base_cpu to display default cpupower details instead of picking CPU 0 (Saket Kumar Bhaskar) - Add Georgian translation to cpupower documentation (Zurab Kargareteli) - Introduce powercap intel-rapl library, powercap-info command, and RAPL monitor into cpupower (Thomas Renninger)" * tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits) PM: runtime: Adjust white space in the core code cpufreq: Remove CVS version control contents from documentation cpufreq: stats: Convert to use sysfs_emit_at() API cpufreq: ACPI: Only set boost MSRs on supported CPUs PM: sleep: Refine error message in try_to_freeze_tasks() PM: sleep: Avoid using pr_cont() in the tasks freezing code PM: runtime: Relocate rpm_callback() right after __rpm_callback() PM: runtime: Do not call __rpm_callback() from rpm_idle() PM / devfreq: event: use devm_platform_get_and_ioremap_resource() PM / devfreq: event: Use device_match_of_node() PM / devfreq: Use device_match_of_node() powercap: idle_inject: Fix warnings with make W=1 PM: hibernate: Complain about memory map mismatches during resume dt-bindings: cpufreq: cpufreq-qcom-hw: Add QDU1000/QRU1000 cpufreq cpufreq: tegra186: Use flexible array to simplify memory allocation cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain cpupower: Introduce powercap intel-rapl library and powercap-info command cpupower: Add Georgian translation cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode cpufreq: amd_freq_sensitivity: Add missing pci_dev_put() ...
2022-12-12Merge tag 'timers-core-2022-12-10' of ↵Linus Torvalds3-99/+348
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and drivers: Core: - The timer_shutdown[_sync]() infrastructure: Tearing down timers can be tedious when there are circular dependencies to other things which need to be torn down. A prime example is timer and workqueue where the timer schedules work and the work arms the timer. What needs to prevented is that pending work which is drained via destroy_workqueue() does not rearm the previously shutdown timer. Nothing in that shutdown sequence relies on the timer being functional. The conclusion was that the semantics of timer_shutdown_sync() should be: - timer is not enqueued - timer callback is not running - timer cannot be rearmed Preventing the rearming of shutdown timers is done by discarding rearm attempts silently. A warning for the case that a rearm attempt of a shutdown timer is detected would not be really helpful because it's entirely unclear how it should be acted upon. The only way to address such a case is to add 'if (in_shutdown)' conditionals all over the place. This is error prone and in most cases of teardown not required all. - The real fix for the bluetooth HCI teardown based on timer_shutdown_sync(). A larger scale conversion to timer_shutdown_sync() is work in progress. - Consolidation of VDSO time namespace helper functions - Small fixes for timer and timerqueue Drivers: - Prevent integer overflow on the XGene-1 TVAL register which causes an never ending interrupt storm. - The usual set of new device tree bindings - Small fixes and improvements all over the place" * tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support dt-bindings: timer: renesas,tmu: Add r8a779g0 support clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool() clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock() clocksource/drivers/timer-ti-dm: Clear settings on probe and free clocksource/drivers/timer-ti-dm: Make timer_get_irq static clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks dt-bindings: timer: rockchip: Add rockchip,rk3128-timer clockevents: Repair kernel-doc for clockevent_delta2ns() clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct clocksource/drivers/sh_cmt: Access registers according to spec vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy Bluetooth: hci_qca: Fix the teardown problem for real timers: Update the documentation to reflect on the new timer_shutdown() API timers: Provide timer_shutdown[_sync]() timers: Add shutdown mechanism to the internal functions timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode ...
2022-12-12Merge tag 'smp-core-2022-12-10' of ↵Linus Torvalds1-17/+44
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "A small set of updates for CPU hotplug: - Prevent stale CPU hotplug state in the cpu_down() path which was detected by stress testing the sysfs interface - Ensure that the target CPU hotplug state for the boot CPU is CPUHP_ONLINE instead of the compile time init value CPUHP_OFFLINE. - Switch back to the original behaviour of warning when a CPU hotplug callback in the DYING/STARTING section returns an error code. Otherwise a buggy callback can leave the CPUs in an non recoverable state" * tag 'smp-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Do not bail-out in DYING/STARTING sections cpu/hotplug: Set cpuhp target for boot cpu cpu/hotplug: Make target_store() a nop when target == state
2022-12-12Merge tag 'for-netdev' of ↵Jakub Kicinski8-299/+605
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-12-11 We've added 74 non-merge commits during the last 11 day(s) which contain a total of 88 files changed, 3362 insertions(+), 789 deletions(-). The main changes are: 1) Decouple prune and jump points handling in the verifier, from Andrii. 2) Do not rely on ALLOW_ERROR_INJECTION for fmod_ret, from Benjamin. Merged from hid tree. 3) Do not zero-extend kfunc return values. Necessary fix for 32-bit archs, from Björn. 4) Don't use rcu_users to refcount in task kfuncs, from David. 5) Three reg_state->id fixes in the verifier, from Eduard. 6) Optimize bpf_mem_alloc by reusing elements from free_by_rcu, from Hou. 7) Refactor dynptr handling in the verifier, from Kumar. 8) Remove the "/sys" mount and umount dance in {open,close}_netns in bpf selftests, from Martin. 9) Enable sleepable support for cgrp local storage, from Yonghong. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (74 commits) selftests/bpf: test case for relaxed prunning of active_lock.id selftests/bpf: Add pruning test case for bpf_spin_lock bpf: use check_ids() for active_lock comparison selftests/bpf: verify states_equal() maintains idmap across all frames bpf: states_equal() must build idmap for all function frames selftests/bpf: test cases for regsafe() bug skipping check_id() bpf: regsafe() must not skip check_ids() docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE selftests/bpf: Add test for dynptr reinit in user_ringbuf callback bpf: Use memmove for bpf_dynptr_{read,write} bpf: Move PTR_TO_STACK alignment check to process_dynptr_func bpf: Rework check_func_arg_reg_off bpf: Rework process_dynptr_func bpf: Propagate errors from process_* checks in check_func_arg bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true bpf: Reuse freed element in free_by_rcu during allocation selftests/bpf: Bring test_offload.py back to life bpf: Fix comment error in fixup_kfunc_call function bpf: Do not zero-extend kfunc return values ... ==================== Link: https://lore.kernel.org/r/20221212024701.73809-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12Merge tag 'irq-core-2022-12-10' of ↵Linus Torvalds6-189/+761
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
2022-12-12rtmutex: Add acquire semantics for rtmutex lock acquisition slow pathMel Gorman2-12/+49
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench on XFS on arm64. kernel BUG at fs/inode.c:625! Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1 pc : clear_inode+0xa0/0xc0 lr : clear_inode+0x38/0xc0 Call trace: clear_inode+0xa0/0xc0 evict+0x160/0x180 iput+0x154/0x240 do_unlinkat+0x184/0x300 __arm64_sys_unlinkat+0x48/0xc0 el0_svc_common.constprop.4+0xe4/0x2c0 do_el0_svc+0xac/0x100 el0_svc+0x78/0x200 el0t_64_sync_handler+0x9c/0xc0 el0t_64_sync+0x19c/0x1a0 It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this is likely a bug that existed forever and only became visible when ARM support was added to preempt-rt. The same problem does not occur on x86-64 and he also reported that converting sb->s_inode_wblist_lock to raw_spinlock_t makes the problem disappear indicating that the RT spinlock variant is the problem. Which in turn means that RT mutexes on ARM64 and any other weakly ordered architecture are affected by this independent of RT. Will Deacon observed: "I'd be more inclined to be suspicious of the slowpath tbh, as we need to make sure that we have acquire semantics on all paths where the lock can be taken. Looking at the rtmutex code, this really isn't obvious to me -- for example, try_to_take_rt_mutex() appears to be able to return via the 'takeit' label without acquire semantics and it looks like we might be relying on the caller's subsequent _unlock_ of the wait_lock for ordering, but that will give us release semantics which aren't correct." Sebastian Andrzej Siewior prototyped a fix that does work based on that comment but it was a little bit overkill and added some fences that should not be necessary. The lock owner is updated with an IRQ-safe raw spinlock held, but the spin_unlock does not provide acquire semantics which are needed when acquiring a mutex. Adds the necessary acquire semantics for lock owner updates in the slow path acquisition and the waiter bit logic. It successfully completed 10 iterations of the dbench workload while the vanilla kernel fails on the first iteration. [ bigeasy@linutronix.de: Initial prototype fix ] Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics") Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core") Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
2022-12-12Merge tag 'arm64-upstream' of ↵Linus Torvalds4-8/+17
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The highlights this time are support for dynamically enabling and disabling Clang's Shadow Call Stack at boot and a long-awaited optimisation to the way in which we handle the SVE register state on system call entry to avoid taking unnecessary traps from userspace. Summary: ACPI: - Enable FPDT support for boot-time profiling - Fix CPU PMU probing to work better with PREEMPT_RT - Update SMMUv3 MSI DeviceID parsing to latest IORT spec - APMT support for probing Arm CoreSight PMU devices CPU features: - Advertise new SVE instructions (v2.1) - Advertise range prefetch instruction - Advertise CSSC ("Common Short Sequence Compression") scalar instructions, adding things like min, max, abs, popcount - Enable DIT (Data Independent Timing) when running in the kernel - More conversion of system register fields over to the generated header CPU misfeatures: - Workaround for Cortex-A715 erratum #2645198 Dynamic SCS: - Support for dynamic shadow call stacks to allow switching at runtime between Clang's SCS implementation and the CPU's pointer authentication feature when it is supported (complete with scary DWARF parser!) Tracing and debug: - Remove static ftrace in favour of, err, dynamic ftrace! - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace and existing arch code - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the old FTRACE_WITH_REGS - Extend 'crashkernel=' parameter with default value and fallback to placement above 4G physical if initial (low) allocation fails SVE: - Optimisation to avoid disabling SVE unconditionally on syscall entry and just zeroing the non-shared state on return instead Exceptions: - Rework of undefined instruction handling to avoid serialisation on global lock (this includes emulation of user accesses to the ID registers) Perf and PMU: - Support for TLP filters in Hisilicon's PCIe PMU device - Support for the DDR PMU present in Amlogic Meson G12 SoCs - Support for the terribly-named "CoreSight PMU" architecture from Arm (and Nvidia's implementation of said architecture) Misc: - Tighten up our boot protocol for systems with memory above 52 bits physical - Const-ify static keys to satisty jump label asm constraints - Trivial FFA driver cleanups in preparation for v1.1 support - Export the kernel_neon_* APIs as GPL symbols - Harden our instruction generation routines against instrumentation - A bunch of robustness improvements to our arch-specific selftests - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits) arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler arm64: Prohibit instrumentation on arch_stack_walk() arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: alternatives: add __init/__initconst to some functions/variables arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init() kselftest/arm64: Allow epoll_wait() to return more than one result kselftest/arm64: Don't drain output while spawning children kselftest/arm64: Hold fp-stress children until they're all spawned arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation ...
2022-12-12Merge tag 'slab-for-6.2-rc1' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - SLOB deprecation and SLUB_TINY The SLOB allocator adds maintenance burden and stands in the way of API improvements [1]. Deprecate it by renaming the config option (to make users notice) to CONFIG_SLOB_DEPRECATED with updated help text. SLUB should be used instead as SLAB will be the next on the removal list. Based on reports from a riscv k210 board with 8MB RAM, add a CONFIG_SLUB_TINY option to minimize SLUB's memory usage at the expense of scalability. This has resolved the k210 regression [2] so in case there are no others (that wouldn't be resolvable by further tweaks to SLUB_TINY) plan is to remove SLOB in a few cycles. Existing defconfigs with CONFIG_SLOB are converted to CONFIG_SLUB_TINY. - kmalloc() slub_debug redzone improvements A series from Feng Tang that builds on the tracking or requested size for kmalloc() allocations (for caches with debugging enabled) added in 6.1, to make redzone checks consider the requested size and not the rounded up one, in order to catch more subtle buffer overruns. Includes new slub_kunit test. - struct slab fields reordering to accomodate larger rcu_head RCU folks would like to grow rcu_head with debugging options, which breaks current struct slab layout's assumptions, so reorganize it to make this possible. - Miscellaneous improvements/fixes: - __alloc_size checking compiler workaround (Kees Cook) - Optimize and cleanup SLUB's sysfs init (Rasmus Villemoes) - Make SLAB compatible with PROVE_RAW_LOCK_NESTING (Jiri Kosina) - Correct SLUB's percpu allocation estimates (Baoquan He) - Re-enableS LUB's run-time failslab sysfs control (Alexander Atanasov) - Make tools/vm/slabinfo more user friendly when not run as root (Rong Tao) - Dead code removal in SLUB (Hyeonggon Yoo) * tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (31 commits) mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED mm, slub: don't aggressively inline with CONFIG_SLUB_TINY mm, slub: remove percpu slabs with CONFIG_SLUB_TINY mm, slub: split out allocations from pre/post hooks mm/slub, kunit: Add a test case for kmalloc redzone check mm/slub, kunit: add SLAB_SKIP_KFENCE flag for cache creation mm, slub: refactor free debug processing mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY mm, slub: disable SYSFS support with CONFIG_SLUB_TINY mm, slub: add CONFIG_SLUB_TINY mm, slab: ignore hardened usercopy parameters when disabled slab: Remove special-casing of const 0 size allocations slab: Clean up SLOB vs kmalloc() definition mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head mm/migrate: make isolate_movable_page() skip slab pages mm/slab: move and adjust kernel-doc for kmem_cache_alloc mm/slub, percpu: correct the calculation of early percpu allocation size ...
2022-12-12Merge tag 'printk-for-6.2' of ↵Linus Torvalds8-140/+428
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add NMI-safe SRCU reader API. It uses atomic_inc() instead of this_cpu_inc() on strong load-store architectures. - Introduce new console_list_lock to synchronize a manipulation of the list of registered consoles and their flags. This is a first step in removing the big-kernel-lock-like behavior of console_lock(). This semaphore still serializes console->write() calbacks against: - each other. It primary prevents potential races between early and proper console drivers using the same device. - suspend()/resume() callbacks and init() operations in some drivers. - various other operations in the tty/vt and framebufer susbsystems. It is likely that console_lock() serializes even operations that are not directly conflicting with the console->write() callbacks here. This is the most complicated big-kernel-lock aspect of the console_lock() that will be hard to untangle. - Introduce new console_srcu lock that is used to safely iterate and access the registered console drivers under SRCU read lock. This is a prerequisite for introducing atomic console drivers and console kthreads. It will reduce the complexity of serialization against normal consoles and console_lock(). Also it should remove the risk of deadlock during critical situations, like Oops or panic, when only atomic consoles are registered. - Check whether the console is registered instead of enabled on many locations. It was a historical leftover. - Cleanly force a preferred console in xenfb code instead of a dirty hack. - A lot of code and comment clean ups and improvements. * tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (47 commits) printk: htmldocs: add missing description tty: serial: sh-sci: use setup() callback for early console printk: relieve console_lock of list synchronization duties tty: serial: kgdboc: use console_list_lock to trap exit tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console() tty: serial: kgdboc: use console_list_lock for list traversal tty: serial: kgdboc: use srcu console list iterator proc: consoles: use console_list_lock for list iteration tty: tty_io: use console_list_lock for list synchronization printk, xen: fbfront: create/use safe function for forcing preferred netconsole: avoid CON_ENABLED misuse to track registration usb: early: xhci-dbc: use console_is_registered() tty: serial: xilinx_uartps: use console_is_registered() tty: serial: samsung_tty: use console_is_registered() tty: serial: pic32_uart: use console_is_registered() tty: serial: earlycon: use console_is_registered() tty: hvc: use console_is_registered() efi: earlycon: use console_is_registered() tty: nfcon: use console_is_registered() serial_core: replace uart_console_enabled() with uart_console_registered() ...
2022-12-12Merge tag 'execve-v6.2-rc1' of ↵Linus Torvalds2-11/+21
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: "Most are small refactorings and bug fixes, but three things stand out: switching timens (which got reverted before) looks solid now, FOLL_FORCE has been removed (no failures seen yet across several weeks in -next), and some whitespace cleanups (which are long overdue). - Add timens support (when switching mm). This version has survived in -next for the entire cycle (Andrei Vagin) - Various small bug fixes, refactoring, and readability improvements (Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin) - Remove FOLL_FORCE for stack setup (Kees Cook) - Whitespace cleanups (Rolf Eike Beer, Kees Cook)" * tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_misc: fix shift-out-of-bounds in check_special_flags binfmt: Fix error return code in load_elf_fdpic_binary() exec: Remove FOLL_FORCE for stack setup binfmt_elf: replace IS_ERR() with IS_ERR_VALUE() binfmt_elf: simplify error handling in load_elf_phdrs() binfmt_elf: fix documented return value for load_elf_phdrs() exec: simplify initial stack size expansion binfmt: Fix whitespace issues exec: Add comments on check_unsafe_exec() fs counting ELF uapi: add spaces before '{' selftests/timens: add a test for vfork+exit fs/exec: switch timens when a task gets a new mm
2022-12-12Merge tag 'seccomp-v6.2-rc1' of ↵Linus Torvalds1-6/+11
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - Add missing kerndoc parameter (Randy Dunlap) - Improve seccomp selftest to check CAP_SYS_ADMIN (Gautam Menghani) - Fix allocation leak when cloned thread immediately dies (Kuniyuki Iwashima) * tag 'seccomp-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: document the "filter_count" field seccomp: Move copy_seccomp() to no failure path. selftests/seccomp: Check CAP_SYS_ADMIN capability in the test mode_filter_without_nnp
2022-12-12Merge tag 'kcsan.2022.12.02a' of ↵Linus Torvalds1-0/+50
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull KCSAN updates from Paul McKenney: - Add instrumentation for memcpy(), memset(), and memmove() for Clang v16+'s new function names that are used when the -fsanitize=thread argument is given - Fix objtool warnings from KCSAN's volatile instrumentation, and typos in a pair of Kconfig options' help clauses * tag 'kcsan.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan: Fix trivial typo in Kconfig help comments objtool, kcsan: Add volatile read/write instrumentation to whitelist kcsan: Instrument memcpy/memset/memmove with newer Clang
2022-12-12Merge tag 'rcu.2022.12.02a' of ↵Linus Torvalds14-153/+545
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates. This is the second in a series from an ongoing review of the RCU documentation. - Miscellaneous fixes. - Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument CPU lists, causes call_rcu() to introduce delays. These delays result in significant power savings on nearly idle Android and ChromeOS systems. These savings range from a few percent to more than ten percent. This series also includes several commits that change call_rcu() to a new call_rcu_hurry() function that avoids these delays in a few cases, for example, where timely wakeups are required. Several of these are outside of RCU and thus have acks and reviews from the relevant maintainers. - Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe() for architectures that support NMIs, but which do not provide NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required by the upcoming lockless printk() work by John Ogness et al. - Changes providing minor but important increases in torture test coverage for the new RCU polled-grace-period APIs. - Changes to torturescript that avoid redundant kernel builds, thus providing about a 30% speedup for the torture.sh acceptance test. * tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits) net: devinet: Reduce refcount before grace period net: Use call_rcu_hurry() for dst_release() workqueue: Make queue_rcu_work() use call_rcu_hurry() percpu-refcount: Use call_rcu_hurry() for atomic switch scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu() rcu/rcutorture: Use call_rcu_hurry() where needed rcu/rcuscale: Use call_rcu_hurry() for async reader test rcu/sync: Use call_rcu_hurry() instead of call_rcu rcuscale: Add laziness and kfree tests rcu: Shrinker for lazy rcu rcu: Refactor code a bit in rcu_nocb_do_flush_bypass() rcu: Make call_rcu() lazy to save power rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC srcu: Debug NMI safety even on archs that don't require it srcu: Explain the reason behind the read side critical section on GP start srcu: Warn when NMI-unsafe API is used in NMI arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state() rcu-tasks: Make grace-period-age message human-readable ...
2022-12-12Merge branches 'powercap', 'pm-x86', 'pm-opp' and 'pm-misc'Rafael J. Wysocki1-3/+3
Merge power capping code updates, x86-specific power management pdate, operating performance points library updates and miscellaneous power management updates for 6.2-rc1: - Fix compiler warnings with make W=1 in the idle_inject power capping driver (Srinivas Pandruvada). - Use kstrtobool() instead of strtobool() in the power capping sysfs interface (Christophe JAILLET). - Add SCMI Powercap based power capping driver (Cristian Marussi). - Add Emerald Rapids support to the intel-uncore-freq driver (Artem Bityutskiy). - Repair slips in kernel-doc comments in the generic notifier code (Lukas Bulwahn). - Fix several DT issues in the OPP library reorganize code around opp-microvolt-<named> DT property (Viresh Kumar). - Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties to be present without the others present (James Calligeros). - Fix clock-latency-ns property in DT example (Serge Semin). * powercap: powercap: idle_inject: Fix warnings with make W=1 powercap: Use kstrtobool() instead of strtobool() powercap: arm_scmi: Add SCMI Powercap based driver * pm-x86: platform/x86: intel-uncore-freq: add Emerald Rapids support * pm-opp: dt-bindings: opp-v2: Fix clock-latency-ns prop in example OPP: decouple dt properties in opp_parse_supplies() OPP: Simplify opp_parse_supplies() by restructuring it OPP: Parse named opp-microwatt property too dt-bindings: opp: Fix named microwatt property dt-bindings: opp: Fix usage of current in microwatt property * pm-misc: notifier: repair slips in kernel-doc comments
2022-12-12tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_lineYang Jihong1-1/+14
print_trace_line may overflow seq_file buffer. If the event is not consumed, the while loop keeps peeking this event, causing a infinite loop. Link: https://lkml.kernel.org/r/20221129113009.182425-1-yangjihong1@huawei.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 088b1e427dbba ("ftrace: pipe fixes") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-12Merge branches 'pm-cpuidle', 'pm-sleep' and 'pm-domains'Rafael J. Wysocki2-19/+18
Merge cpuidle changes, updates related to system sleep amd generic power domains code fixes for 6.2-rc1: - Improve kernel messages printed by the cpuidle PCI driver (Ulf Hansson). - Make the DT cpuidle driver return the correct number of parsed idle states, clean it up and clarify a comment in it (Ulf Hansson). - Modify the tasks freezing code to avoid using pr_cont() and refine an error message printed by it (Rafael Wysocki). - Make the hibernation core code complain about memory map mismatches during resume to help diagnostics (Xueqin Luo). - Fix mistake in a kerneldoc comment in the hibernation code (xiongxin). - Reverse the order of performance and enabling operations in the generic power domains code (Abel Vesa). - Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the generic power domains code (Abel Vesa). - Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn Guo). - Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo). - Drop generic power domain status manipulation during hibernate restore (Shawn Guo). * pm-cpuidle: cpuidle: dt: Clarify a comment and simplify code in dt_init_idle_driver() cpuidle: dt: Return the correct numbers of parsed idle states cpuidle: psci: Extend information in log about OSI/PC mode * pm-sleep: PM: sleep: Refine error message in try_to_freeze_tasks() PM: sleep: Avoid using pr_cont() in the tasks freezing code PM: hibernate: Complain about memory map mismatches during resume PM: hibernate: Fix mistake in kerneldoc comment * pm-domains: PM: domains: Reverse the order of performance and enabling ops PM: domains: Power off[on] domain in hibernate .freeze[thaw]_noirq hook PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq() PM: domains: Pass generic PM noirq hooks to genpd_finish_suspend() PM: domains: Drop genpd status manipulation for hibernate restore
2022-12-11relay: fix type mismatch when allocating memory in relay_create_buf()Gavrilov Ilia1-2/+2
The 'padding' field of the 'rchan_buf' structure is an array of 'size_t' elements, but the memory is allocated for an array of 'size_t *' elements. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20221129092002.3538384-1-Ilia.Gavrilov@infotecs.ru Fixes: b86ff981a825 ("[PATCH] relay: migrate from relayfs to a generic relay API") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11kernel: kcsan: kcsan_test: build without structleak pluginAnders Roxell1-0/+1
Building kcsan_test with structleak plugin enabled makes the stack frame size to grow. kernel/kcsan/kcsan_test.c:704:1: error: the frame size of 3296 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Turn off the structleak plugin checks for kcsan_test. Link: https://lkml.kernel.org/r/20221128104358.2660634-1-anders.roxell@linaro.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marco Elver <elver@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Gow <davidgow@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-11relay: use strscpy() is more robust and saferXu Panda1-2/+2
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Link: https://lkml.kernel.org/r/202211220853259244666@zte.com.cn Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>