summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-10-12ext4,f2fs: fix readahead of verity dataMatthew Wilcox (Oracle)2-2/+4
The recent change of page_cache_ra_unbounded() arguments was buggy in the two callers, causing us to readahead the wrong pages. Move the definition of ractl down to after the index is set correctly. This affected performance on configurations that use fs-verity. Link: https://lkml.kernel.org/r/20221012193419.1453558-1-willy@infradead.org Fixes: 73bb49da50cd ("mm/readahead: make page_cache_ra_unbounded take a readahead_control") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Jintao Yin <nicememory@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12mm/mmap: undo ->mmap() when arch_validate_flags() failsCarlos Llamas1-1/+4
Commit c462ac288f2c ("mm: Introduce arch_validate_flags()") added a late check in mmap_region() to let architectures validate vm_flags. The check needs to happen after calling ->mmap() as the flags can potentially be modified during this callback. If arch_validate_flags() check fails we unmap and free the vma. However, the error path fails to undo the ->mmap() call that previously succeeded and depending on the specific ->mmap() implementation this translates to reference increments, memory allocations and other operations what will not be cleaned up. There are several places (mainly device drivers) where this is an issue. However, one specific example is bpf_map_mmap() which keeps count of the mappings in map->writecnt. The count is incremented on ->mmap() and then decremented on vm_ops->close(). When arch_validate_flags() fails this count is off since bpf_map_mmap_close() is never called. One can reproduce this issue in arm64 devices with MTE support. Here the vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set previously. From userspace then is enough to pass the PROT_MTE flag to mmap() syscall to trigger the arch_validate_flags() failure. The following program reproduces this issue: #include <stdio.h> #include <unistd.h> #include <linux/unistd.h> #include <linux/bpf.h> #include <sys/mman.h> int main(void) { union bpf_attr attr = { .map_type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(int), .value_size = sizeof(long long), .max_entries = 256, .map_flags = BPF_F_MMAPABLE, }; int fd; fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr)); mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0); return 0; } By manually adding some log statements to the vm_ops callbacks we can confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon ->release(): With PROT_MTE flag: root@debian:~# ./bpf-test [ 111.263874] bpf_map_write_active_inc: map=9 writecnt=1 [ 111.288763] bpf_map_release: map=9 writecnt=1 Without PROT_MTE flag: root@debian:~# ./bpf-test [ 157.816912] bpf_map_write_active_inc: map=10 writecnt=1 [ 157.830442] bpf_map_write_active_dec: map=10 writecnt=0 [ 157.832396] bpf_map_release: map=10 writecnt=0 This patch fixes the above issue by calling vm_ops->close() when the arch_validate_flags() check fails, after this we can proceed to unmap and free the vma on the error path. Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Liam Howlett <liam.howlett@oracle.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12net: phy: micrel: Fixes FIELD_GET assertionDivya Koppera1-4/+5
FIELD_GET() must only be used with a mask that is a compile-time constant. Mark the functions as __always_inline to avoid the problem. Fixes: 21b688dabecb6a ("net: phy: micrel: Cable Diag feature for lan8814 phy") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com> Link: https://lore.kernel.org/r/20221011095437.12580-1-Divya.Koppera@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12openvswitch: add nf_ct_is_confirmed check before assigning the helperXin Long1-1/+2
A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)' applies on a confirmed connection: WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98 RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack] Call Trace: <TASK> nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack] __nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack] __ovs_ct_lookup+0x72e/0x780 [openvswitch] ovs_ct_execute+0x1d8/0x920 [openvswitch] do_execute_actions+0x4e6/0xb60 [openvswitch] ovs_execute_actions+0x60/0x140 [openvswitch] ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch] genl_family_rcv_msg_doit.isra.15+0x113/0x150 genl_rcv_msg+0xef/0x1f0 which can be reproduced with these OVS flows: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(commit, table=1) table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new actions=ct(commit, alg=ftp),normal The issue was introduced by commit 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") where it somehow removed the check of nf_ct_is_confirmed before asigning the helper. This patch is to fix it by bringing it back. Fixes: 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/c5c9092a22a2194650222bffaf786902613deb16.1665085502.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12Merge branch 'tcp-udp-fix-memory-leaks-and-data-races-around-ipv6_addrform'Jakub Kicinski14-45/+102
Kuniyuki Iwashima says: ==================== tcp/udp: Fix memory leaks and data races around IPV6_ADDRFORM. This series fixes some memory leaks and data races caused in the same scenario where one thread converts an IPv6 socket into IPv4 with IPV6_ADDRFORM and another accesses the socket concurrently. v4: https://lore.kernel.org/netdev/20221004171802.40968-1-kuniyu@amazon.com/ v3 (Resend): https://lore.kernel.org/netdev/20221003154425.49458-1-kuniyu@amazon.com/ v3: https://lore.kernel.org/netdev/20220929012542.55424-1-kuniyu@amazon.com/ v2: https://lore.kernel.org/netdev/20220928002741.64237-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20220927161209.32939-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20221006185349.74777-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12tcp: Fix data races around icsk->icsk_af_ops.Kuniyuki Iwashima3-7/+12
setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads and writes. Thanks to Eric Dumazet for providing the syzbot report: BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0: tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240 __inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660 inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724 __sys_connect_file net/socket.c:1976 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1993 __do_sys_connect net/socket.c:2003 [inline] __se_sys_connect net/socket.c:2000 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:2000 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1: tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585 __sys_setsockopt+0x212/0x2b0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 [inline] __se_sys_setsockopt net/socket.c:2260 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted 6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12ipv6: Fix data races around sk->sk_prot.Kuniyuki Iwashima3-11/+22
Commit 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot") fixed some data-races around sk->sk_prot but it was not enough. Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid load tearing. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().Kuniyuki Iwashima9-15/+46
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak: setsockopt(IPV6_ADDRFORM) sendmsg() +-----------------------+ +-------+ - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...) - sockopt_lock_sock(sk) ^._ called via udpv6_prot - lock_sock(sk) before WRITE_ONCE() - WRITE_ONCE(sk->sk_prot, &tcp_prot) - inet6_destroy_sock() - if (!corkreq) - sockopt_release_sock(sk) - ip6_make_skb(sk, ...) - release_sock(sk) ^._ lockless fast path for the non-corking case - __ip6_append_data(sk, ...) - ipv6_local_rxpmtu(sk, ...) - xchg(&np->rxpmtu, skb) ^._ rxpmtu is never freed. - goto out_no_dst; - lock_sock(sk) For now, rxpmtu is only the case, but not to miss the future change and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that we can clean up IPv6 resources finally. We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next. Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).Kuniyuki Iwashima3-12/+15
Commit 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") forgot to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6 socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is changed to udp_prot and ->destroy() never cleans it up, resulting in a memory leak. This is due to the discrepancy between inet6_destroy_sock() and IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM to remove the difference. However, this is not enough for now because rxpmtu can be changed without lock_sock() after commit 03485f2adcde ("udpv6: Add lockless sendmsg() support"). We will fix this case in the following patch. Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy() in the future. Fixes: 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12tcp/udp: Fix memory leak in ipv6_renew_options().Kuniyuki Iwashima1-0/+7
syzbot reported a memory leak [0] related to IPV6_ADDRFORM. The scenario is that while one thread is converting an IPv6 socket into IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and allocates memory to inet6_sk(sk)->XXX after conversion. Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources, which inet6_destroy_sock() should have cleaned up. setsockopt(IPV6_ADDRFORM) setsockopt(IPV6_DSTOPTS) +-----------------------+ +----------------------+ - do_ipv6_setsockopt(sk, ...) - sockopt_lock_sock(sk) - do_ipv6_setsockopt(sk, ...) - lock_sock(sk) ^._ called via tcpv6_prot - WRITE_ONCE(sk->sk_prot, &tcp_prot) before WRITE_ONCE() - xchg(&np->opt, NULL) - txopt_put(opt) - sockopt_release_sock(sk) - release_sock(sk) - sockopt_lock_sock(sk) - lock_sock(sk) - ipv6_set_opt_hdr(sk, ...) - ipv6_update_options(sk, opt) - xchg(&inet6_sk(sk)->opt, opt) ^._ opt is never freed. - sockopt_release_sock(sk) - release_sock(sk) Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after acquiring the lock. This issue exists from the initial commit between IPV6_ADDRFORM and IPV6_PKTOPTIONS. [0]: BUG: memory leak unreferenced object 0xffff888009ab9f80 (size 96): comm "syz-executor583", pid 328, jiffies 4294916198 (age 13.034s) hex dump (first 32 bytes): 01 00 00 00 48 00 00 00 08 00 00 00 00 00 00 00 ....H........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000002ee98ae1>] kmalloc include/linux/slab.h:605 [inline] [<000000002ee98ae1>] sock_kmalloc+0xb3/0x100 net/core/sock.c:2566 [<0000000065d7b698>] ipv6_renew_options+0x21e/0x10b0 net/ipv6/exthdrs.c:1318 [<00000000a8c756d7>] ipv6_set_opt_hdr net/ipv6/ipv6_sockglue.c:354 [inline] [<00000000a8c756d7>] do_ipv6_setsockopt.constprop.0+0x28b7/0x4350 net/ipv6/ipv6_sockglue.c:668 [<000000002854d204>] ipv6_setsockopt+0xdf/0x190 net/ipv6/ipv6_sockglue.c:1021 [<00000000e69fdcf8>] tcp_setsockopt+0x13b/0x2620 net/ipv4/tcp.c:3789 [<0000000090da4b9b>] __sys_setsockopt+0x239/0x620 net/socket.c:2252 [<00000000b10d192f>] __do_sys_setsockopt net/socket.c:2263 [inline] [<00000000b10d192f>] __se_sys_setsockopt net/socket.c:2260 [inline] [<00000000b10d192f>] __x64_sys_setsockopt+0xbe/0x160 net/socket.c:2260 [<000000000a80d7aa>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<000000000a80d7aa>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<000000004562b5c6>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12Merge patch series "Fix dt-validate issues on qemu dtbdumps due to dt-bindings"Palmer Dabbelt3-9/+19
Conor Dooley <mail@conchuod.ie> says: From: Conor Dooley <conor.dooley@microchip.com> The device trees produced automatically for the virt and spike machines fail dt-validate on several grounds. Some of these need to be fixed in the linux kernel's dt-bindings, but others are caused by bugs in QEMU. Patches been sent that fix the QEMU issues [0], but a couple of them need to be fixed in the kernel's dt-bindings. The first patches add compatibles for "riscv,{clint,plic}0" which are present in drivers and the auto generated QEMU dtbs. Thanks to Rob Herring for reporting these issues [1], Conor. To reproduce the errors: ./build/qemu-system-riscv64 -nographic -machine virt,dumpdtb=qemu.dtb dt-validate -p /path/to/linux/kernel/Documentation/devicetree/bindings/processed-schema.json qemu.dtb (The processed schema needs to be generated first) 0 - https://lore.kernel.org/linux-riscv/20220810184612.157317-1-mail@conchuod.ie/ 1 - https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/ * fix-dt-validate: dt-bindings: riscv: add new riscv,isa strings for emulators dt-bindings: interrupt-controller: sifive,plic: add legacy riscv compatible dt-bindings: timer: sifive,clint: add legacy riscv compatible Link: https://lore.kernel.org/r/20220823183319.3314940-1-mail@conchuod.ie [Palmer: some cover letter pruning, and dropped #4 as suggested.] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-12dt-bindings: riscv: add new riscv,isa strings for emulatorsConor Dooley1-3/+2
The QEMU virt and spike machines currently export a riscv,isa string of "rv64imafdcsuh", While the RISC-V foundation has been ratifying a bunch of extenstions etc, the kernel has remained relatively static with what hardware is supported - but the same is not true of QEMU. Using the virt machine and running dt-validate on the dumped dtb fails, partly due to the unexpected isa string. Rather than enumerate the many many possbilities, change the pattern to a regex, with the following assumptions: - ima are required - the single letter order is fixed & we don't care about things that can't even do "ima" - the standard multi letter extensions are all in a "_z<foo>" format where the first letter of <foo> is a valid single letter extension - _s & _h are used for supervisor and hyper visor extensions - convention says that after the first two chars, a standard multi letter extension name could be an english word (ifencei anyone?) so it's not worth restricting the charset - as the above is just convention, don't apply any charset restrictions to reduce future churn - vendor ISA extensions begind with _x and have no charset restrictions - we don't care about an e extension from an OS pov - that attempting to validate the contents of the multiletter extensions with dt-validate beyond the formatting is a futile, massively verbose or unwieldy exercise at best The following limitations also apply: - multi letter extension ordering is not enforced. dt-schema does not appear to allow for named match groups, so the resulting regex would be even more of a headache - ditto for the numbered extensions Finally, add me as a maintainer of the binding so that when it breaks in the future, I can be held responsible! Reported-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/ Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Guo Ren <guoren@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220823183319.3314940-4-mail@conchuod.ie Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-12dt-bindings: interrupt-controller: sifive,plic: add legacy riscv compatibleConor Dooley1-0/+5
While "real" hardware might not use the compatible string "riscv,plic0" it is present in the driver & QEMU uses it for automatically generated virt machine dtbs. To avoid dt-validate problems with QEMU produced dtbs, such as the following, add it to the binding. riscv-virt.dtb: plic@c000000: compatible: 'oneOf' conditional failed, one must be fixed: 'sifive,plic-1.0.0' is not one of ['sifive,fu540-c000-plic', 'starfive,jh7100-plic', 'canaan,k210-plic'] 'sifive,plic-1.0.0' is not one of ['allwinner,sun20i-d1-plic'] 'sifive,plic-1.0.0' was expected 'thead,c900-plic' was expected riscv-virt.dtb: plic@c000000: '#address-cells' is a required property Reported-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/ Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220823183319.3314940-3-mail@conchuod.ie Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-12dt-bindings: timer: sifive,clint: add legacy riscv compatibleConor Dooley1-6/+12
While "real" hardware might not use the compatible string "riscv,clint0" it is present in the driver & QEMU uses it for automatically generated virt machine dtbs. To avoid dt-validate problems with QEMU produced dtbs, such as the following, add it to the binding. riscv-virt.dtb: clint@2000000: compatible:0: 'sifive,clint0' is not one of ['sifive,fu540-c000-clint', 'starfive,jh7100-clint', 'canaan,k210-clint'] Reported-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/linux-riscv/20220803170552.GA2250266-robh@kernel.org/ Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220823183319.3314940-2-mail@conchuod.ie Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-12mm/uffd: fix warning without PTE_MARKER_UFFD_WP compiled inPeter Xu3-0/+8
When PTE_MARKER_UFFD_WP not configured, it's still possible to reach pte marker code and trigger an warning. Add a few CONFIG_PTE_MARKER_UFFD_WP ifdefs to make sure the code won't be reached when not compiled in. Link: https://lkml.kernel.org/r/YzeR+R6b4bwBlBHh@x1n Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: <syzbot+2b9b4f0895be09a6dec3@syzkaller.appspotmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Edward Liaw <edliaw@google.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12mm/mmap: preallocate maple nodes for brk vma expansionLiam Howlett1-12/+6
If the brk VMA is the last vma in a maple node and meets the rare criteria that it can be expanded, then preallocation is necessary to avoid a potential fs_reclaim circular lock issue on low resources. At the same time use the actual vma start address (unaligned) when calling vma_adjust_trans_huge(). Link: https://lkml.kernel.org/r/20221011160624.1253454-1-Liam.Howlett@oracle.com Fixes: 2e7ce7d354f2 (mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()) Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12mm: more vma cache removalAlexey Dobriyan1-2/+0
Link: https://lkml.kernel.org/r/Y0WuE3Riv4iy5Jx8@localhost.localdomain Fixes: 7964cf8caa4d ("mm: remove vmacache") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12mmap: fix copy_vma() failure pathLiam Howlett1-0/+5
The anon vma was not unlinked and the file was not closed in the failure path when the machine runs out of memory during the maple tree modification. This caused a memory leak of the anon vma chain and vma since neither would be freed. Link: https://lkml.kernel.org/r/20221011203621.1446507-1-Liam.Howlett@oracle.com Fixes: 524e00b36e8c ("mm: remove rb tree") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12mm/compaction: fix set skip in fast_find_migrateblockChuyi Zhou1-1/+0
When we successfully find a pageblock in fast_find_migrateblock(), the block will be set skip-flag through set_pageblock_skip(). However, when entering isolate_migratepages_block(), the whole pageblock will be skipped due to the branch 'if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages))'. Eventually we will goto isolate_abort and isolate nothing. That makes fast_find_migrateblock useless. In this patch, when we find a suitable pageblock in fast_find_migrateblock, we do noting but let isolate_migratepages_block to set skip flag to the pageblock after scan it. Normally, we would isolate some pages from the fast-find block. I use mmtest/thpscale-madvhugepage test it. Here is the result: baseline patch Amean fault-both-1 1331.66 ( 0.00%) 1261.04 * 5.30%* Amean fault-both-3 1383.95 ( 0.00%) 1191.69 * 13.89%* Amean fault-both-5 1568.13 ( 0.00%) 1445.20 * 7.84%* Amean fault-both-7 1819.62 ( 0.00%) 1555.13 * 14.54%* Amean fault-both-12 1106.96 ( 0.00%) 1149.43 * -3.84%* Amean fault-both-18 2196.93 ( 0.00%) 1875.77 * 14.62%* Amean fault-both-24 2642.69 ( 0.00%) 2671.21 * -1.08%* Amean fault-both-30 2901.89 ( 0.00%) 2857.32 * 1.54%* Amean fault-both-32 3747.00 ( 0.00%) 3479.23 * 7.15%* Link: https://lkml.kernel.org/r/20220713062009.597255-1-zhouchuyi@bytedance.com Fixes: 70b44595eafe9 ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: zhouchuyi <zhouchuyi@bytedance.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-12mm/hugetlb.c: make __hugetlb_vma_unlock_write_put() staticAndrew Morton1-1/+1
Reported-by: kernel test robot <lkp@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-13rtc: k3: Use devm_clk_get_enabled() helperChristophe JAILLET1-18/+4
The devm_clk_get_enabled() helper: - calls devm_clk_get() - calls clk_prepare_enable() and registers what is needed in order to call clk_disable_unprepare() when needed, as a managed resource. This simplifies the code, the error handling paths and avoid the need of a dedicated function used with devm_add_action_or_reset(). Based on my test with allyesconfig, this reduces the .o size from: text data bss dec hex filename 12843 4804 64 17711 452f drivers/rtc/rtc-ti-k3.o down to: 12523 4804 64 17391 43ef drivers/rtc/rtc-ti-k3.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/601288834ab71c0fddde7eedd8cdb8001254ed7e.1661329498.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: jz4740: Use devm_clk_get_enabled() helperChristophe JAILLET1-22/+3
The devm_clk_get_enabled() helper: - calls devm_clk_get() - calls clk_prepare_enable() and registers what is needed in order to call clk_disable_unprepare() when needed, as a managed resource. This simplifies the code, the error handling paths and avoid the need of a dedicated function used with devm_add_action_or_reset(). As a side effect, some error messages are not logged anymore, so also use dev_err_probe() instead of dev_err() in case of error. At least the error code will be logged (and -EPROBE_DEFER will be filtered) Based on my test with allyesconfig, this reduces the .o size from: text data bss dec hex filename 9025 2488 128 11641 2d79 drivers/rtc/rtc-jz4740.o down to: 8267 2080 128 10475 28eb drivers/rtc/rtc-jz4740.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Paul Cercueil <paul@crapouillou.net> Link: https://lore.kernel.org/r/af10570000d7e103d70bbea590ce8df4f8902b67.1661330532.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: mpfs: Use devm_clk_get_enabled() helperChristophe JAILLET1-18/+1
The devm_clk_get_enabled() helper: - calls devm_clk_get() - calls clk_prepare_enable() and registers what is needed in order to call clk_disable_unprepare() when needed, as a managed resource. This simplifies the code, the error handling paths and avoid the need of a dedicated function used with devm_add_action_or_reset(). That said, mpfs_rtc_init_clk() is the same as devm_clk_get_enabled(), so use this function directly instead. This also fixes an (unlikely) unchecked devm_add_action_or_reset() error. Based on my test with allyesconfig, this reduces the .o size from: text data bss dec hex filename 5330 2208 0 7538 1d72 drivers/rtc/rtc-mpfs.o down to: 5074 2208 0 7282 1c72 drivers/rtc/rtc-mpfs.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/e55c959f2821a2c367a4c5de529a638b1cc6b8cd.1661329086.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-12io_uring/rw: ensure kiocb_end_write() is always calledJens Axboe1-9/+29
A previous commit moved the notifications and end-write handling, but it is now missing a few spots where we also want to call both of those. Without that, we can potentially be missing file notifications, and more importantly, have an imbalance in the super_block writers sem accounting. Fixes: b000145e9907 ("io_uring/rw: defer fsnotify calls to task context") Reported-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/all/20221010050319.GC2703033@dread.disaster.area/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring: fix fdinfo sqe offsets calculationPavel Begunkov1-1/+1
Only with the big sqe feature they take 128 bytes per entry, but we unconditionally advance by 128B. Fix it by using sq_shift. Fixes: 3b8fdd1dc35e3 ("io_uring/fdinfo: fix sqe dumping for IORING_SETUP_SQE128") Reported-and-tested-by: syzbot+e5198737e8a2d23d958c@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8b41287cb75d5efb8fcb5cccde845ddbbadd8372.1665449983.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring: local variable rw shadows outer variable in io_writeStefan Roesch1-4/+4
This fixes the shadowing of the outer variable rw in the function io_write(). No issue is caused by this, but let's silence the shadowing warning anyway. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Stefan Roesch <shr@devkernel.io> Link: https://lore.kernel.org/r/20221010234330.244244-1-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring/opdef: remove 'audit_skip' from SENDMSG_ZCJens Axboe1-1/+0
The msg variants of sending aren't audited separately, so we should not be setting audit_skip for the zerocopy sendmsg variant either. Fixes: 493108d95f14 ("io_uring/net: zerocopy sendmsg") Reported-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring: optimise locking for local tw with submit_waitPavel Begunkov2-2/+17
Running local task_work requires taking uring_lock, for submit + wait we can try to run them right after submit while we still hold the lock and save one lock/unlokc pair. The optimisation was implemented in the first local tw patches but got dropped for simplicity. Suggested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/281fc79d98b5d91fe4778c5137a17a2ab4693e5c.1665088876.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring: remove redundant memory barrier in io_req_local_work_addPavel Begunkov2-4/+12
io_cqring_wake() needs a barrier for the waitqueue_active() check. However, in the case of io_req_local_work_add(), we call llist_add() first, which implies an atomic. Hence we can replace smb_mb() with smp_mb__after_atomic(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/43983bc8bc507172adda7a0f00cab1aff09fd238.1665018309.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECTJens Axboe1-6/+22
We treat EINPROGRESS like EAGAIN, but if we're retrying post getting EINPROGRESS, then we just need to check the socket for errors and terminate the request. This was exposed on a bluetooth connection request which ends up taking a while and hitting EINPROGRESS, and yields a CQE result of -EBADFD because we're retrying a connect on a socket that is now connected. Cc: stable@vger.kernel.org Fixes: 87f80d623c6c ("io_uring: handle connect -EINPROGRESS like -EAGAIN") Link: https://github.com/axboe/liburing/issues/671 Reported-by: Aidan Sun <aidansun05@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring: remove notif leftoversPavel Begunkov2-6/+0
Notifications were killed but there is a couple of fields and struct declarations left, remove them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8df8877d677be5a2b43afd936d600e60105ea960.1664849941.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring: correct pinned_vm accountingPavel Begunkov1-6/+4
->mm_account should be released only after we free all registered buffers, otherwise __io_sqe_buffers_unregister() will see a NULL ->mm_account and skip locked_vm accounting. Cc: <Stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6d798f65ed4ab8db3664c4d3397d4af16ca98846.1664849932.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-12io_uring/af_unix: defer registered files gc to io_uring releasePavel Begunkov3-0/+23
Instead of putting io_uring's registered files in unix_gc() we want it to be done by io_uring itself. The trick here is to consider io_uring registered files for cycle detection but not actually putting them down. Because io_uring can't register other ring instances, this will remove all refs to the ring file triggering the ->release path and clean up with io_ring_ctx_free(). Cc: stable@vger.kernel.org Fixes: 6b06314c47e1 ("io_uring: add file set registration") Reported-and-tested-by: David Bouman <dbouman03@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> [axboe: add kerneldoc comment to skb, fold in skb leak fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-13rtc: ds1685: Fix spelling of function name in comment blockColin Ian King1-1/+1
The function name is missing the letter 'd' in the comment block. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Joshua Kinard <kumba@gentoo.org> Link: https://lore.kernel.org/r/20221003153711.271630-1-colin.i.king@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: switch to using regmap APIRasmus Villemoes2-85/+26
The regmap abstraction allows us to avoid the private i2c transfer helpers, and also offers some nice utility functions such as the regmap_update_bits family. While at it, simplify the code even more by not keeping track of ->write_enabled: rtc_set_time is not a hot path, so one extra i2c read doesn't hurt (regmap_update_bits elides the write when the bits are already as desired). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-9-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: drop redundant write to HR registerRasmus Villemoes1-14/+0
There's nothing in the data sheet that says writing to one of the time keeping registers is necessary to start the RTC. It does so at the stop condition of the i2c transfer setting the WRTC bit: Upon initialization or power-up, the WRTC must be set to "1" to enable the RTC. Upon the completion of a valid write (STOP), the RTC starts counting. Moreover, even if such a write to one of the timekeeping registers was necessary, that's exactly what we do anyway just below when we actually write the given struct rtc_time to the device. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-8-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: use dev_set_drvdata() instead of i2c_set_clientdata()Rasmus Villemoes1-3/+2
As another preparation for removing direct references to the i2c_client in the helper functions, stash a pointer to the private data via dev_set_drvdata() instead of i2c_set_clientdata(). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-7-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: use %ptRRasmus Villemoes1-10/+2
Simplify the code and make the output format consistent with other RTC drivers by standardizing on using the %ptR printf extension. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-6-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: simplify some expressionsRasmus Villemoes1-4/+4
These instances of '&client->dev' might as well be spelled 'dev', since 'client' has been computed from 'dev' via 'client = to_i2c_client(dev)'. Later patches will get rid of that local variable 'client', so remove these unnecessary references so those later patches become easier to read. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-5-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: drop a dev_info()Rasmus Villemoes1-3/+0
This dev_info() seems to be a debug leftover, and it would only get printed once (or, once per battery change). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-4-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: specify range_min and range_maxRasmus Villemoes1-0/+2
The isl12022 can (only) keep track of times in the range 2000-2099. The data sheet says The calendar registers track date, month, year, and day of the week and are accurate through 2099, with automatic leap year correction. The lower bound of 2000 is obtained by simply observing that its YR register only counts from 00 through 99. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-3-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: isl12022: stop using deprecated devm_rtc_device_register()Rasmus Villemoes1-4/+7
The comments say that devm_rtc_device_register() is deprecated and that one should instead use devm_rtc_allocate_device() and [devm_]rtc_register_device. So do that. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://lore.kernel.org/r/20220921114624.3250848-2-linux@rasmusvillemoes.dk Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-13rtc: stmp3xxx: Add failure handling for stmp3xxx_wdt_register()Lin Yujun1-0/+2
Use platform_device_put() to free platform device before print error message when platform_device_add() fails to run. Fixes: 1a71fb84fda6 ("rtc: stmp3xxx: add wdt-accessor function") Signed-off-by: Lin Yujun <linyujun809@huawei.com> Reviewed-by: Wolfram Sang <wsa@kernel.org> Link: https://lore.kernel.org/r/20220915065253.43668-1-linyujun809@huawei.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-12Merge tag 'linux-kselftest-kunit-6.1-rc1-2' of ↵Linus Torvalds9-206/+131
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more KUnit updates from Shuah Khan: "Features and fixes: - simplify resource use - make kunit_malloc() and kunit_free() allocations and frees consistent. kunit_free() frees only the memory allocated by kunit_malloc() - stop downloading risc-v opensbi binaries using wget - other fixes and improvements to tool and KUnit framework" * tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: Documentation: kunit: Update description of --alltests option kunit: declare kunit_assert structs as const kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED kunit: remove format func from struct kunit_assert, get it to 0 bytes kunit: tool: Don't download risc-v opensbi firmware with wget kunit: make kunit_kfree(NULL) a no-op to match kfree() kunit: make kunit_kfree() not segfault on invalid inputs kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends kunit: drop test pointer in string_stream_fragment kunit: string-stream: Simplify resource use
2022-10-12Merge tag 'dt-for-palmer-v6.1-mw1' of ↵Palmer Dabbelt10-45/+512
git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into for-next Microchip RISC-V devicetrees for v6.1 Fixups, reference design changes and new boards: - The addition of QSPI support for mpfs had a corresponding change to the devicetree node. - The v2022.{09,10} reference designs brought with them several memory map changes which are not backwards compatible. The old devicetrees from the v2022.08 and earlier releases still work with current kernels. - Two new devicetrees for a first-party development kit and for the Aries Embedded M100FPSEVP kit. - Corresponding dt-bindings changes for the above. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'dt-for-palmer-v6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: riscv: dts: microchip: fix fabric i2c reg size riscv: dts: microchip: update memory configuration for v2022.10 riscv: dts: microchip: add a devicetree for aries' m100pfsevp riscv: dts: microchip: add sevkit device tree riscv: dts: microchip: reduce the fic3 clock rate riscv: dts: microchip: icicle: re-jig fabric peripheral addresses riscv: dts: microchip: icicle: update pci address properties riscv: dts: microchip: move the mpfs' pci node to -fabric.dtsi riscv: dts: microchip: add pci dma ranges for the icicle kit dt-bindings: riscv: microchip: document the sev kit dt-bindings: riscv: microchip: document the aries m100pfsevp dt-bindings: riscv: microchip: document icicle reference design riscv: dts: microchip: add qspi compatible fallback Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-12Merge tag 'linux-kselftest-next-6.1-rc1-2' of ↵Linus Torvalds3-10/+30
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more Kselftest updates from Shuah Khan: "This consists of fixes and improvements to memory-hotplug test and a minor spelling fix to ftrace test" * tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: docs: notifier-error-inject: Correct test's name selftests/memory-hotplug: Adjust log info for maintainability selftests/memory-hotplug: Restore memory before exit selftests/memory-hotplug: Add checking after online or offline selftests/ftrace: func_event_triggers: fix typo in user message
2022-10-12Merge tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds51-2720/+4887
Pull VFIO updates from Alex Williamson: - Prune private items from vfio_pci_core.h to a new internal header, fix missed function rename, and refactor vfio-pci interrupt defines (Jason Gunthorpe) - Create consistent naming and handling of ioctls with a function per ioctl for vfio-pci and vfio group handling, use proper type args where available (Jason Gunthorpe) - Implement a set of low power device feature ioctls allowing userspace to make use of power states such as D3cold where supported (Abhishek Sahu) - Remove device counter on vfio groups, which had restricted the page pinning interface to singleton groups to account for limitations in the type1 IOMMU backend. Document usage as limited to emulated IOMMU devices, ie. traditional mdev devices where this restriction is consistent (Jason Gunthorpe) - Correct function prefix in hisi_acc driver incurred during previous refactoring (Shameer Kolothum) - Correct typo and remove redundant warning triggers in vfio-fsl driver (Christophe JAILLET) - Introduce device level DMA dirty tracking uAPI and implementation in the mlx5 variant driver (Yishai Hadas & Joao Martins) - Move much of the vfio_device life cycle management into vfio core, simplifying and avoiding duplication across drivers. This also facilitates adding a struct device to vfio_device which begins the introduction of device rather than group level user support and fills a gap allowing userspace identify devices as vfio capable without implicit knowledge of the driver (Kevin Tian & Yi Liu) - Split vfio container handling to a separate file, creating a more well defined API between the core and container code, masking IOMMU backend implementation from the core, allowing for an easier future transition to an iommufd based implementation of the same (Jason Gunthorpe) - Attempt to resolve race accessing the iommu_group for a device between vfio releasing DMA ownership and removal of the device from the IOMMU driver. Follow-up with support to allow vfio_group to exist with NULL iommu_group pointer to support existing userspace use cases of holding the group file open (Jason Gunthorpe) - Fix error code and hi/lo register manipulation issues in the hisi_acc variant driver, along with various code cleanups (Longfang Liu) - Fix a prior regression in GVT-g group teardown, resulting in unreleased resources (Jason Gunthorpe) - A significant cleanup and simplification of the mdev interface, consolidating much of the open coded per driver sysfs interface support into the mdev core (Christoph Hellwig) - Simplification of tracking and locking around vfio_groups that fall out from previous refactoring (Jason Gunthorpe) - Replace trivial open coded f_ops tests with new helper (Alex Williamson) * tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits) vfio: More vfio_file_is_group() use cases vfio: Make the group FD disassociate from the iommu_group vfio: Hold a reference to the iommu_group in kvm for SPAPR vfio: Add vfio_file_is_group() vfio: Change vfio_group->group_rwsem to a mutex vfio: Remove the vfio_group->users and users_comp vfio/mdev: add mdev available instance checking to the core vfio/mdev: consolidate all the description sysfs into the core code vfio/mdev: consolidate all the available_instance sysfs into the core code vfio/mdev: consolidate all the name sysfs into the core code vfio/mdev: consolidate all the device_api sysfs into the core code vfio/mdev: remove mtype_get_parent_dev vfio/mdev: remove mdev_parent_dev vfio/mdev: unexport mdev_bus_type vfio/mdev: remove mdev_from_dev vfio/mdev: simplify mdev_type handling vfio/mdev: embedd struct mdev_parent in the parent data structure vfio/mdev: make mdev.h standalone includable drm/i915/gvt: simplify vgpu configuration management drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types ...
2022-10-12i3c: master: Remove the wrong place of reattach.Billy Tsai1-4/+0
The reattach should be used when an I3C device has its address changed. But the modified place in this patch doesn't have the address changed of the newdev. This wrong reattach will reserve the same address slot twice and return unexpected -EBUSY when the bus find the duplicate device with diffent dynamic address. Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com> Link: https://lore.kernel.org/r/20220926105145.8145-2-billy_tsai@aspeedtech.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-12i3c: master: Free the old_dyn_addr when reattach.Billy Tsai1-0/+3
This patch is used to free the old_dyn_addr when the caller want to reattach the device to the different dynamic address. If the old_dyn_addr is 0 the function will treat it as no old_dyn_addr is reserved on the bus. Without the patch, when the driver reattach the i3c device after setnewda the old_dyn_addr will be permanently occupied. Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com> Link: https://lore.kernel.org/r/20220926105145.8145-1-billy_tsai@aspeedtech.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-10-12Merge tag 'for-linus-6.1-rc1-tag' of ↵Linus Torvalds12-242/+313
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - Some minor typo fixes - A fix of the Xen pcifront driver for supporting the device model to run in a Linux stub domain - A cleanup of the pcifront driver - A series to enable grant-based virtio with Xen on x86 - A cleanup of Xen PV guests to distinguish between safe and faulting MSR accesses - Two fixes of the Xen gntdev driver - Two fixes of the new xen grant DMA driver * tag 'for-linus-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Kconfig: Fix spelling mistake "Maxmium" -> "Maximum" xen/pv: support selecting safe/unsafe msr accesses xen/pv: refactor msr access functions to support safe and unsafe accesses xen/pv: fix vendor checks for pmu emulation xen/pv: add fault recovery control to pmu msr accesses xen/virtio: enable grant based virtio on x86 xen/virtio: use dom0 as default backend for CONFIG_XEN_VIRTIO_FORCE_GRANT xen/virtio: restructure xen grant dma setup xen/pcifront: move xenstore config scanning into sub-function xen/gntdev: Accommodate VMA splitting xen/gntdev: Prevent leaking grants xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page() xen/xenbus: Fix spelling mistake "hardward" -> "hardware" xen-pcifront: Handle missed Connected state