summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-10-19netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.Guillaume Nault4-0/+5
Currently netfilter's rpfilter and fib modules implicitely initialise ->flowic_uid with 0. This is normally the root UID. However, this isn't the case in user namespaces, where user ID 0 is mapped to a different kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get the root UID of the user namespace, thus keeping the same behaviour whether or not we're running in a user namepspace. Note, this is similar to commit 8bcfd0925ef1 ("ipv4: add missing initialization for flowi4_uid"), which fixed the rp_filter sysctl. Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-10-18ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failedZhengchao Shao1-0/+2
If the initialization fails in calling addrconf_init_net(), devconf_all is the pointer that has been released. Then ip6mr_sk_done() is called to release the net, accessing devconf->mc_forwarding directly causes invalid pointer access. The process is as follows: setup_net() ops_init() addrconf_init_net() all = kmemdup(...) ---> alloc "all" ... net->ipv6.devconf_all = all; __addrconf_sysctl_register() ---> failed ... kfree(all); ---> ipv6.devconf_all invalid ... ops_exit_list() ... ip6mr_sk_done() devconf = net->ipv6.devconf_all; //devconf is invalid pointer if (!devconf || !atomic_read(&devconf->mc_forwarding)) The following is the Call Trace information: BUG: KASAN: use-after-free in ip6mr_sk_done+0x112/0x3a0 Read of size 4 at addr ffff888075508e88 by task ip/14554 Call Trace: <TASK> dump_stack_lvl+0x8e/0xd1 print_report+0x155/0x454 kasan_report+0xba/0x1f0 kasan_check_range+0x35/0x1b0 ip6mr_sk_done+0x112/0x3a0 rawv6_close+0x48/0x70 inet_release+0x109/0x230 inet6_release+0x4c/0x70 sock_release+0x87/0x1b0 igmp6_net_exit+0x6b/0x170 ops_exit_list+0xb0/0x170 setup_net+0x7ac/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f7963322547 </TASK> Allocated by task 14554: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0xa1/0xb0 __kmalloc_node_track_caller+0x4a/0xb0 kmemdup+0x28/0x60 addrconf_init_net+0x1be/0x840 ops_init+0xa5/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 14554: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x155/0x1b0 slab_free_freelist_hook+0x11b/0x220 __kmem_cache_free+0xa4/0x360 addrconf_init_net+0x623/0x840 ops_init+0xa5/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 7d9b1b578d67 ("ip6mr: fix use-after-free in ip6mr_sk_done()") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221017080331.16878-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-18udp: Update reuse->has_conns under reuseport_lock.Kuniyuki Iwashima6-10/+25
When we call connect() for a UDP socket in a reuseport group, we have to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel could select a unconnected socket wrongly for packets sent to the connected socket. However, the current way to set has_conns is illegal and possible to trigger that problem. reuseport_has_conns() changes has_conns under rcu_read_lock(), which upgrades the RCU reader to the updater. Then, it must do the update under the updater's lock, reuseport_lock, but it doesn't for now. For this reason, there is a race below where we fail to set has_conns resulting in the wrong socket selection. To avoid the race, let's split the reader and updater with proper locking. cpu1 cpu2 +----+ +----+ __ip[46]_datagram_connect() reuseport_grow() . . |- reuseport_has_conns(sk, true) |- more_reuse = __reuseport_alloc(more_socks_size) | . | | |- rcu_read_lock() | |- reuse = rcu_dereference(sk->sk_reuseport_cb) | | | | | /* reuse->has_conns == 0 here */ | | |- more_reuse->has_conns = reuse->has_conns | |- reuse->has_conns = 1 | /* more_reuse->has_conns SHOULD BE 1 HERE */ | | | | | |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb, | | | more_reuse) | `- rcu_read_unlock() `- kfree_rcu(reuse, rcu) | |- sk->sk_state = TCP_ESTABLISHED Note the likely(reuse) in reuseport_has_conns_set() is always true, but we put the test there for ease of review. [0] For the record, usually, sk_reuseport_cb is changed under lock_sock(). The only exception is reuseport_grow() & TCP reqsk migration case. 1) shutdown() TCP listener, which is moved into the latter part of reuse->socks[] to migrate reqsk. 2) New listen() overflows reuse->socks[] and call reuseport_grow(). 3) reuse->max_socks overflows u16 with the new listener. 4) reuseport_grow() pops the old shutdown()ed listener from the array and update its sk->sk_reuseport_cb as NULL without lock_sock(). shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(), but, reuseport_has_conns_set() is called only for UDP under lock_sock(), so likely(reuse) never be false in reuseport_has_conns_set(). [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/ Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-17net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable()Jiapeng Chong1-6/+0
The function mtk_foe_entry_usable() is defined in the mtk_ppe.c file, but not called elsewhere, so delete this unused function. drivers/net/ethernet/mediatek/mtk_ppe.c:400:20: warning: unused function 'mtk_foe_entry_usable'. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2409 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-17Merge branch 'mtk_eth_wed-leak-fixes'David S. Miller2-8/+24
Yang Yingliang says: ==================== net: ethernet: mtk_eth_wed: fixe some leaks I found some leaks in mtk_eth_soc.c/mtk_wed.c. patch#1 - I found mtk_wed_exit() is never called, I think mtk_wed_exit() need be called in error path or module remove function to free the memory allocated in mtk_wed_add_hw(). patch#2 - The device is not put in error path in mtk_wed_add_hw(). patch#3 - The device_node pointer returned by of_parse_phandle() with refcount incremented, it should be decreased when it done. This patchset was just compiled tested because I don't have any HW on which to do the actual tests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-17net: ethernet: mtk_eth_wed: add missing of_node_put()Yang Yingliang1-1/+4
The device_node pointer returned by of_parse_phandle() with refcount incremented, when finish using it, the refcount need be decreased. Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-17net: ethernet: mtk_eth_wed: add missing put_device() in mtk_wed_add_hw()Yang Yingliang1-2/+8
After calling get_device() in mtk_wed_add_hw(), in error path, put_device() needs be called. Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-17net: ethernet: mtk_eth_soc: fix possible memory leak in mtk_probe()Yang Yingliang1-5/+12
If mtk_wed_add_hw() has been called, mtk_wed_exit() needs be called in error path or removing module to free the memory allocated in mtk_wed_add_hw(). Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-16skmsg: pass gfp argument to alloc_sk_msg()Eric Dumazet1-4/+4
syzbot found that alloc_sk_msg() could be called from a non sleepable context. sk_psock_verdict_recv() uses rcu_read_lock() protection. We need the callers to pass a gfp_t argument to avoid issues. syzbot report was: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3613, name: syz-executor414 preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 INFO: lockdep is turned off. CPU: 0 PID: 3613 Comm: syz-executor414 Not tainted 6.0.0-syzkaller-09589-g55be6084c8e0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106 __might_resched+0x538/0x6a0 kernel/sched/core.c:9877 might_alloc include/linux/sched/mm.h:274 [inline] slab_pre_alloc_hook mm/slab.h:700 [inline] slab_alloc_node mm/slub.c:3162 [inline] slab_alloc mm/slub.c:3256 [inline] kmem_cache_alloc_trace+0x59/0x310 mm/slub.c:3287 kmalloc include/linux/slab.h:600 [inline] kzalloc include/linux/slab.h:733 [inline] alloc_sk_msg net/core/skmsg.c:507 [inline] sk_psock_skb_ingress_self+0x5c/0x330 net/core/skmsg.c:600 sk_psock_verdict_apply+0x395/0x440 net/core/skmsg.c:1014 sk_psock_verdict_recv+0x34d/0x560 net/core/skmsg.c:1201 tcp_read_skb+0x4a1/0x790 net/ipv4/tcp.c:1770 tcp_rcv_established+0x129d/0x1a10 net/ipv4/tcp_input.c:5971 tcp_v4_do_rcv+0x479/0xac0 net/ipv4/tcp_ipv4.c:1681 sk_backlog_rcv include/net/sock.h:1109 [inline] __release_sock+0x1d8/0x4c0 net/core/sock.c:2906 release_sock+0x5d/0x1c0 net/core/sock.c:3462 tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1483 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] __sys_sendto+0x46d/0x5f0 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xda/0xf0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 43312915b5ba ("skmsg: Get rid of unncessary memset()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15Merge branch 'phylink_set_mac_pm'David S. Miller3-0/+6
Shenwei Wang says: ==================== net: phylink: add phylink_set_mac_pm() helper Per Russell's suggestion, the implementation is changed from the helper function to add an extra property in phylink_config structure because this change can easily cover SFP usecase too. Changes in v6: - update the fix tag hash and format Changes in v5: - Add fix tag in the commit message Changes in v4: - Clean up the codes in phylink.c - Continue the version number ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15net: stmmac: Enable mac_managed_pm phylink configShenwei Wang1-0/+1
Enable the mac_managed_pm configuration in the phylink_config structure to avoid the kernel warning during system resume. Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15net: phylink: add mac_managed_pm in phylink_config structureShenwei Wang2-0/+5
The recent commit 'commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")' requires the MAC driver explicitly tell the phy driver who is managing the PM, otherwise you will see warning during resume stage. Add a boolean property in the phylink_config structure so that the MAC driver can use it to tell the PHY driver if it wants to manage the PM. Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15Revert "net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}"Jakub Kicinski1-4/+6
This reverts commit 854701ba4c39afae2362ba19a580c461cb183e4f. We have more violations around, which leads to: WARNING: CPU: 2 PID: 1 at include/linux/cpumask.h:110 __netif_set_xps_queue+0x14e/0x770 Let's back this out and retry with a larger clean up in -next. Fixes: 854701ba4c39 ("net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}") Link: https://lore.kernel.org/all/20221014030459.3272206-2-guoren@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15net: dsa: uninitialized variable in dsa_slave_netdevice_event()Dan Carpenter1-1/+1
Return zero if both dsa_slave_dev_check() and netdev_uses_dsa() are false. Fixes: acc43b7bf52a ("net: dsa: allow masters to join a LAG") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15sunhme: Uninitialized variable in happy_meal_init()Dan Carpenter1-1/+1
The "burst" string is only initialized for CONFIG_SPARC. It should be set to "64" because that's what is used by PCI. Fixes: 24cddbc3ef11 ("sunhme: Combine continued messages") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15net/smc: Fix an error code in smc_lgr_create()Dan Carpenter1-1/+2
If smc_wr_alloc_lgr_mem() fails then return an error code. Don't return success. Fixes: 8799e310fb3f ("net/smc: add v2 support to the work request layer") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15net: phy: dp83867: Extend RX strap quirk for SGMII modeHarini Katakam1-0/+8
When RX strap in HW is not set to MODE 3 or 4, bit 7 and 8 in CF4 register should be set. The former is already handled in dp83867_config_init; add the latter in SGMII specific initialization. Fixes: 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy") Signed-off-by: Harini Katakam <harini.katakam@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15net: hv_netvsc: Fix a warning triggered by memcpy in rndis_filterCezar Bulinaru1-2/+4
memcpy: detected field-spanning write (size 168) of single field "(void *)&request->response_msg + (sizeof(struct rndis_message) - sizeof(union rndis_message_container)) + sizeof(*req_id)" at drivers/net/hyperv/rndis_filter.c:338 (size 40) RSP: 0018:ffffc90000144de0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8881766b4000 RCX: 0000000000000000 RDX: 0000000000000102 RSI: 0000000000009ffb RDI: 00000000ffffffff RBP: ffffc90000144e38 R08: 0000000000000000 R09: 00000000ffffdfff R10: ffffc90000144c48 R11: ffffffff82f56ac8 R12: ffff8881766b403c R13: 00000000000000a8 R14: ffff888100b75000 R15: ffff888179301d00 FS: 0000000000000000(0000) GS:ffff8884d6280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f8b024c418 CR3: 0000000176548001 CR4: 00000000003706e0 Call Trace: <IRQ> ? _raw_spin_unlock_irqrestore+0x27/0x50 netvsc_poll+0x556/0x940 [hv_netvsc] __napi_poll+0x2e/0x170 net_rx_action+0x299/0x2f0 __do_softirq+0xed/0x2ef __irq_exit_rcu+0x9f/0x110 irq_exit_rcu+0xe/0x20 sysvec_hyperv_callback+0xb0/0xd0 </IRQ> <TASK> asm_sysvec_hyperv_callback+0x1b/0x20 RIP: 0010:native_safe_halt+0xb/0x10 Fixes: A warning triggered when the response message len exceeds the size of rndis_message. Inside the rndis_request structure these fields are however followed by a RNDIS_EXT_LEN padding so it is safe to use unsafe_memcpy. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Cezar Bulinaru <cbulinaru@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15net/atm: fix proc_mpc_write incorrect return valueXiaobo Liu1-1/+2
Then the input contains '\0' or '\n', proc_mpc_write has read them, so the return value needs +1. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xiaobo Liu <cppcoffee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-15sfc: Change VF mac via PF as first preference if available.Jonathan Cooper1-34/+24
Changing a VF's mac address through the VF (rather than via the PF) fails with EPERM because the latter part of efx_ef10_set_mac_address attempts to change the vport mac address list as the VF. Even with this fixed it still fails with EBUSY because the vadaptor is still assigned on the VF - the vadaptor reassignment must be within a section where the VF has torn down its state. A major reason this has broken is because we have two functions that ostensibly do the same thing - have a PF and VF cooperate to change a VF mac address. Rather than do this, if we are changing the mac of a VF that has a link to the PF in the same VM then simply call sriov_set_vf_mac instead, which is a proven working function that does that. If there is no PF available, or that fails non-fatally, then attempt to change the VF's mac address as we would a PF, without updating the PF's data. Test case: Create a VF: echo 1 > /sys/class/net/<if>/device/sriov_numvfs Set the mac address of the VF directly: ip link set <vf> addr 00:11:22:33:44:55 Set the MAC address of the VF via the PF: ip link set <pf> vf 0 mac 00:11:22:33:44:66 Without this patch the last command will fail with ENOENT. Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com> Reported-by: Íñigo Huguet <ihuguet@redhat.com> Fixes: 910c8789a777 ("set the MAC address using MC_CMD_VADAPTOR_SET_MAC") Acked-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14MAINTAINERS: nfc: s3fwrn5: Drop Krzysztof OpasiakKrzysztof Kozlowski2-2/+0
Emails to Krzysztof Opasiak bounce ("Recipient address rejected: User unknown") so drop his email from maintainers of s3fwrn5 NFC bindings and driver. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14MAINTAINERS: git://github -> https://github.com for petkanPalmer Dabbelt1-2/+2
Github deprecated the git:// links about a year ago, so let's move to the https:// URLs instead. Reported-by: Conor Dooley <conor.dooley@microchip.com> Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14net: macvlan: change schedule system_wq to system_unbound_wqzhangxiangqian1-1/+1
For FT2000+/64 devices, when four virtual machines share the same physical network interface, DROP will occur due to the single core CPU performance problem. ip_check_defrag and macvlan_process_broadcast is on the same CPU. When the MACVLAN PORT increases, the CPU usage reaches more than 90%. bc_queue > bc_queue_len_used (default 1000), causing DROP. Signed-off-by: zhangxiangqian <zhangxiangqian@kylinos.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14tls: strp: make sure the TCP skbs do not have overlapping dataJakub Kicinski1-4/+28
TLS tries to get away with using the TCP input queue directly. This does not work if there is duplicated data (multiple skbs holding bytes for the same seq number range due to retransmits). Check for this condition and fall back to copy mode, it should be rare. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14i40e: Fix DMA mappings leakJan Sokolowski6-28/+74
During reallocation of RX buffers, new DMA mappings are created for those buffers. steps for reproduction: while : do for ((i=0; i<=8160; i=i+32)) do ethtool -G enp130s0f0 rx $i tx $i sleep 0.5 ethtool -g enp130s0f0 done done This resulted in crash: i40e 0000:01:00.1: Unable to allocate memory for the Rx descriptor ring, size=65536 Driver BUG WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:141 xdp_rxq_info_unreg+0x43/0x50 Call Trace: i40e_free_rx_resources+0x70/0x80 [i40e] i40e_set_ringparam+0x27c/0x800 [i40e] ethnl_set_rings+0x1b2/0x290 genl_family_rcv_msg_doit.isra.15+0x10f/0x150 genl_family_rcv_msg+0xb3/0x160 ? rings_fill_reply+0x1a0/0x1a0 genl_rcv_msg+0x47/0x90 ? genl_family_rcv_msg+0x160/0x160 netlink_rcv_skb+0x4c/0x120 genl_rcv+0x24/0x40 netlink_unicast+0x196/0x230 netlink_sendmsg+0x204/0x3d0 sock_sendmsg+0x4c/0x50 __sys_sendto+0xee/0x160 ? handle_mm_fault+0xbe/0x1e0 ? syscall_trace_enter+0x1d3/0x2c0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7f5eac8b035b Missing register, driver bug WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:119 xdp_rxq_info_unreg_mem_model+0x69/0x140 Call Trace: xdp_rxq_info_unreg+0x1e/0x50 i40e_free_rx_resources+0x70/0x80 [i40e] i40e_set_ringparam+0x27c/0x800 [i40e] ethnl_set_rings+0x1b2/0x290 genl_family_rcv_msg_doit.isra.15+0x10f/0x150 genl_family_rcv_msg+0xb3/0x160 ? rings_fill_reply+0x1a0/0x1a0 genl_rcv_msg+0x47/0x90 ? genl_family_rcv_msg+0x160/0x160 netlink_rcv_skb+0x4c/0x120 genl_rcv+0x24/0x40 netlink_unicast+0x196/0x230 netlink_sendmsg+0x204/0x3d0 sock_sendmsg+0x4c/0x50 __sys_sendto+0xee/0x160 ? handle_mm_fault+0xbe/0x1e0 ? syscall_trace_enter+0x1d3/0x2c0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7f5eac8b035b This was caused because of new buffers with different RX ring count should substitute older ones, but those buffers were freed in i40e_configure_rx_ring and reallocated again with i40e_alloc_rx_bi, thus kfree on rx_bi caused leak of already mapped DMA. Fix this by reallocating ZC with rx_bi_zc struct when BPF program loads. Additionally reallocate back to rx_bi when BPF program unloads. If BPF program is loaded/unloaded and XSK pools are created, reallocate RX queues accordingly in XSP_SETUP_XSK_POOL handler. Fixes: be1222b585fd ("i40e: Separate kernel allocated rx_bi rings from AF_XDP rings") Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel) Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14net: dsa: qca8k: fix ethtool autocast mib for big-endian systemsChristian Marangi2-13/+9
The switch sends autocast mib in little-endian. This is problematic for big-endian system as the values needs to be converted. Fix this by converting each mib value to cpu byte order. Fixes: 5c957c7ca78c ("net: dsa: qca8k: add support for mib autocast in Ethernet packet") Tested-by: Pawel Dembicki <paweldembicki@gmail.com> Tested-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14net: dsa: qca8k: fix inband mgmt for big-endian systemsChristian Marangi2-19/+50
The header and the data of the skb for the inband mgmt requires to be in little-endian. This is problematic for big-endian system as the mgmt header is written in the cpu byte order. Fix this by converting each value for the mgmt header and data to little-endian, and convert to cpu byte order the mgmt header and data sent by the switch. Fixes: 5950c7c0a68c ("net: dsa: qca8k: add support for mgmt read/write in Ethernet packet") Tested-by: Pawel Dembicki <paweldembicki@gmail.com> Tested-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Lech Perczak <lech.perczak@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14tipc: fix an information leak in tipc_topsrv_kern_subscrAlexander Potapenko1-1/+1
Use a 8-byte write to initialize sub.usr_handle in tipc_topsrv_kern_subscr(), otherwise four bytes remain uninitialized when issuing setsockopt(..., SOL_TIPC, ...). This resulted in an infoleak reported by KMSAN when the packet was received: ===================================================== BUG: KMSAN: kernel-infoleak in copyout+0xbc/0x100 lib/iov_iter.c:169 instrument_copy_to_user ./include/linux/instrumented.h:121 copyout+0xbc/0x100 lib/iov_iter.c:169 _copy_to_iter+0x5c0/0x20a0 lib/iov_iter.c:527 copy_to_iter ./include/linux/uio.h:176 simple_copy_to_iter+0x64/0xa0 net/core/datagram.c:513 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419 skb_copy_datagram_iter+0x58/0x200 net/core/datagram.c:527 skb_copy_datagram_msg ./include/linux/skbuff.h:3903 packet_recvmsg+0x521/0x1e70 net/packet/af_packet.c:3469 ____sys_recvmsg+0x2c4/0x810 net/socket.c:? ___sys_recvmsg+0x217/0x840 net/socket.c:2743 __sys_recvmsg net/socket.c:2773 __do_sys_recvmsg net/socket.c:2783 __se_sys_recvmsg net/socket.c:2780 __x64_sys_recvmsg+0x364/0x540 net/socket.c:2780 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 ... Uninit was stored to memory at: tipc_sub_subscribe+0x42d/0xb50 net/tipc/subscr.c:156 tipc_conn_rcv_sub+0x246/0x620 net/tipc/topsrv.c:375 tipc_topsrv_kern_subscr+0x2e8/0x400 net/tipc/topsrv.c:579 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 tipc_sk_join+0x2a8/0x770 net/tipc/socket.c:3084 tipc_setsockopt+0xae5/0xe40 net/tipc/socket.c:3201 __sys_setsockopt+0x87f/0xdc0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 __se_sys_setsockopt net/socket.c:2260 __x64_sys_setsockopt+0xe0/0x160 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120 Local variable sub created at: tipc_topsrv_kern_subscr+0x57/0x400 net/tipc/topsrv.c:562 tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190 Bytes 84-87 of 88 are uninitialized Memory access of size 88 starts at ffff88801ed57cd0 Data copied to user address 0000000020000400 ... ===================================================== Signed-off-by: Alexander Potapenko <glider@google.com> Fixes: 026321c6d056a5 ("tipc: rename tipc_server to tipc_topsrv") Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-14tipc: Fix recognition of trial periodMark Tomlinson1-1/+1
The trial period exists until jiffies is after addr_trial_end. But as jiffies will eventually overflow, just using time_after will eventually give incorrect results. As the node address is set once the trial period ends, this can be used to know that we are not in the trial period. Fixes: e415577f57f4 ("tipc: correct discovery message handling during address trial period") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-13Merge tag 'net-6.1-rc1' of ↵Linus Torvalds88-275/+684
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, and wifi. Current release - regressions: - Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs", it may cause crashes when the qdisc is reconfigured - inet: ping: fix splat due to packet allocation refactoring in inet - tcp: clean up kernel listener's reqsk in inet_twsk_purge(), fix UAF due to races when per-netns hash table is used Current release - new code bugs: - eth: adin1110: check in netdev_event that netdev belongs to driver - fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co. Previous releases - regressions: - ipv4: handle attempt to delete multipath route when fib_info contains an nh reference, avoid oob access - wifi: fix handful of bugs in the new Multi-BSSID code - wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer, fix checksum offload - wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases) - wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx Previous releases - always broken: - ieee802154: don't warn zero-sized raw_sendmsg() - ipv6: ping: fix wrong checksum for large frames - mctp: prevent double key removal and unref - tcp/udp: fix memory leaks and races around IPV6_ADDRFORM - hv_netvsc: fix race between VF offering and VF association message Misc: - remove -Warray-bounds silencing in the drivers, compilers fixed" * tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) sunhme: fix an IS_ERR() vs NULL check in probe net: marvell: prestera: fix a couple NULL vs IS_ERR() checks kcm: avoid potential race in kcm_tx_work tcp: Clean up kernel listener's reqsk in inet_twsk_purge() net: phy: micrel: Fixes FIELD_GET assertion openvswitch: add nf_ct_is_confirmed check before assigning the helper tcp: Fix data races around icsk->icsk_af_ops. ipv6: Fix data races around sk->sk_prot. tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). tcp/udp: Fix memory leak in ipv6_renew_options(). mctp: prevent double key removal and unref selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1 netfilter: rpfilter/fib: Populate flowic_l3mdev field selftests: netfilter: Test reverse path filtering net/mlx5: Make ASO poll CQ usable in atomic context tcp: cdg: allow tcp_cdg_release() to be called multiple times inet: ping: fix recent breakage ipv6: ping: fix wrong checksum for large frames net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports ...
2022-10-13Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-2/+6
Pull virtio fixes from Michael Tsirkin: - Fix a regression in virtio pci on power - Add a reviewer for ifcvf * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/ifcvf: add reviewer virtio_pci: use irq to detect interrupt support
2022-10-13Merge tag 'trace-v6.1-1' of ↵Linus Torvalds6-123/+146
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Found that the synthetic events were using strlen/strscpy() on values that could have come from userspace, and that is bad. Consolidate the string logic of kprobe and eprobe and extend it to the synthetic events to safely process string addresses. - Clean up content of text dump in ftrace_bug() where the output does not make char reads into signed and sign extending the byte output. - Fix some kernel docs in the ring buffer code. * tag 'trace-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix reading strings from synthetic events tracing: Add "(fault)" name injection to kernel probes tracing: Move duplicate code of trace_kprobe/eprobe.c into header ring-buffer: Fix kernel-doc ftrace: Fix char print issue in print_ip_ins()
2022-10-13Merge tag 'linux-watchdog-6.1-rc1' of ↵Linus Torvalds32-223/+1097
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - new driver for Exar/MaxLinear XR28V38x - support for exynosautov9 SoC - support for Renesas R-Car V5H (R8A779G0) and RZ/V2M (r9a09g011) SoC - support for imx93 - several other fixes and improvements * tag 'linux-watchdog-6.1-rc1' of git://www.linux-watchdog.org/linux-watchdog: (36 commits) watchdog: twl4030_wdt: add missing mod_devicetable.h include dt-bindings: watchdog: migrate mt7621 text bindings to YAML watchdog: sp5100_tco: Add "action" module parameter watchdog: imx93: add watchdog timer on imx93 watchdog: imx7ulp_wdt: init wdog when it was active watchdog: imx7ulp_wdt: Handle wdog reconfigure failure watchdog: imx7ulp_wdt: Fix RCS timeout issue watchdog: imx7ulp_wdt: Check CMD32EN in wdog init watchdog: imx7ulp: Add explict memory barrier for unlock sequence watchdog: imx7ulp: Move suspend/resume to noirq phase watchdog: rti-wdt:using the pm_runtime_resume_and_get to simplify the code dt-bindings: watchdog: rockchip: add rockchip,rk3128-wdt watchdog: s3c2410_wdt: support exynosautov9 watchdog dt-bindings: watchdog: add exynosautov9 compatible watchdog: npcm: Enable clock if provided watchdog: meson: keep running if already active watchdog: dt-bindings: atmel,at91sam9-wdt: convert to json-schema watchdog: armada_37xx_wdt: Fix .set_timeout callback watchdog: sa1100: make variable sa1100dog_driver static watchdog: w83977f_wdt: Fix comment typo ...
2022-10-13Merge tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds10-52/+64
Pull ceph updates from Ilya Dryomov: "A quiet round this time: several assorted filesystem fixes, the most noteworthy one being some additional wakeups in cap handling code, and a messenger cleanup" * tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client: ceph: remove Sage's git tree from documentation ceph: fix incorrectly showing the .snap size for stat ceph: fail the open_by_handle_at() if the dentry is being unlinked ceph: increment i_version when doing a setattr with caps ceph: Use kcalloc for allocating multiple elements ceph: no need to wait for transition RDCACHE|RD -> RD ceph: fail the request if the peer MDS doesn't support getvxattr op ceph: wake up the waiters if any new caps comes libceph: drop last_piece flag from ceph_msg_data_cursor
2022-10-13Merge tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds30-116/+342
Pull NFS client updates from Anna Schumaker: "New Features: - Add NFSv4.2 xattr tracepoints - Replace xprtiod WQ in rpcrdma - Flexfiles cancels I/O on layout recall or revoke Bugfixes and Cleanups: - Directly use ida_alloc() / ida_free() - Don't open-code max_t() - Prefer using strscpy over strlcpy - Remove unused forward declarations - Always return layout states on flexfiles layout return - Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of error - Allow more xprtrdma memory allocations to fail without triggering a reclaim - Various other xprtrdma clean ups - Fix rpc_killall_tasks() races" * tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked SUNRPC: Add API to force the client to disconnect SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls SUNRPC: Fix races with rpc_killall_tasks() xprtrdma: Fix uninitialized variable xprtrdma: Prevent memory allocations from driving a reclaim xprtrdma: Memory allocation should be allowed to fail during connect xprtrdma: MR-related memory allocation should be allowed to fail xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc() xprtrdma: Clean up synopsis of rpcrdma_req_create() svcrdma: Clean up RPCRDMA_DEF_GFP SUNRPC: Replace the use of the xprtiod WQ in rpcrdma NFSv4.2: Add a tracepoint for listxattr NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2 NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration NFSv4: remove nfs4_renewd_prepare_shutdown() declaration fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment NFSv4/pNFS: Always return layout stats on layout return for flexfiles ...
2022-10-13Merge tag 'for-linus-6.1-ofs1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs update from Mike Marshall: "Change iterate to iterate_shared" * tag 'for-linus-6.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: Orangefs: change iterate to iterate_shared
2022-10-13sunhme: fix an IS_ERR() vs NULL check in probeDan Carpenter1-2/+2
The devm_request_region() function does not return error pointers, it returns NULL on error. Fixes: 914d9b2711dd ("sunhme: switch to devres") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sean Anderson <seanga2@gmail.com> Reviewed-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Link: https://lore.kernel.org/r/Y0bWzJL8JknX8MUf@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-13net: marvell: prestera: fix a couple NULL vs IS_ERR() checksDan Carpenter1-3/+3
The __prestera_nexthop_group_create() function returns NULL on error and the prestera_nexthop_group_get() returns error pointers. Fix these two checks. Fixes: 0a23ae237171 ("net: marvell: prestera: Add router nexthops ABI") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Y0bWq+7DoKK465z8@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-13kcm: avoid potential race in kcm_tx_workEric Dumazet1-1/+1
syzbot found that kcm_tx_work() could crash [1] in: /* Primarily for SOCK_SEQPACKET sockets */ if (likely(sk->sk_socket) && test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) { <<*>> clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags); sk->sk_write_space(sk); } I think the reason is that another thread might concurrently run in kcm_release() and call sock_orphan(sk) while sk is not locked. kcm_tx_work() find sk->sk_socket being NULL. [1] BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline] BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742 Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53 CPU: 0 PID: 53 Comm: kworker/u4:3 Not tainted 5.19.0-rc3-next-20220621-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: kkcmd kcm_tx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 kasan_report+0xbe/0x1f0 mm/kasan/report.c:495 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_write include/linux/instrumented.h:86 [inline] clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302 </TASK> Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Link: https://lore.kernel.org/r/20221012133412.519394-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-13tcp: Clean up kernel listener's reqsk in inet_twsk_purge()Kuniyuki Iwashima2-5/+19
Eric Dumazet reported a use-after-free related to the per-netns ehash series. [0] When we create a TCP socket from userspace, the socket always holds a refcnt of the netns. This guarantees that a reqsk timer is always fired before netns dismantle. Each reqsk has a refcnt of its listener, so the listener is not freed before the reqsk, and the net is not freed before the listener as well. OTOH, when in-kernel users create a TCP socket, it might not hold a refcnt of its netns. Thus, a reqsk timer can be fired after the netns dismantle and access freed per-netns ehash. To avoid the use-after-free, we need to clean up TCP_NEW_SYN_RECV sockets in inet_twsk_purge() if the netns uses a per-netns ehash. [0]: https://lore.kernel.org/netdev/CANn89iLXMup0dRD_Ov79Xt8N9FM0XdhCHEN05sf3eLwxKweM6w@mail.gmail.com/ BUG: KASAN: use-after-free in tcp_or_dccp_get_hashinfo include/net/inet_hashtables.h:181 [inline] BUG: KASAN: use-after-free in reqsk_queue_unlink+0x320/0x350 net/ipv4/inet_connection_sock.c:913 Read of size 8 at addr ffff88807545bd80 by task syz-executor.2/8301 CPU: 1 PID: 8301 Comm: syz-executor.2 Not tainted 6.0.0-syzkaller-02757-gaf7d23f9d96a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 tcp_or_dccp_get_hashinfo include/net/inet_hashtables.h:181 [inline] reqsk_queue_unlink+0x320/0x350 net/ipv4/inet_connection_sock.c:913 inet_csk_reqsk_queue_drop net/ipv4/inet_connection_sock.c:927 [inline] inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:939 [inline] reqsk_timer_handler+0x724/0x1160 net/ipv4/inet_connection_sock.c:1053 call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790 __run_timers kernel/time/timer.c:1768 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803 __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571 invoke_softirq kernel/softirq.c:445 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1107 </IRQ> Fixes: d1e5e6408b30 ("tcp: Introduce optional per-netns ehash.") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221012145036.74960-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-13vdpa/ifcvf: add reviewerMichael S. Tsirkin1-0/+4
Zhu Lingshan has been writing and reviewing ifcvf patches for a while now, add as reviewer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-10-13virtio_pci: use irq to detect interrupt supportMichael S. Tsirkin1-2/+2
commit 71491c54eafa ("virtio_pci: don't try to use intxif pin is zero") breaks virtio_pci on powerpc, when running as a qemu guest. vp_find_vqs() bails out because pci_dev->pin == 0. But pci_dev->irq is populated correctly, so vp_find_vqs_intx() would succeed if we called it - which is what the code used to do. This seems to happen because pci_dev->pin is not populated in pci_assign_irq(). A PCI core bug? Maybe. However Linus said: I really think that that is basically the only time you should use that 'pci_dev->pin' thing: it basically exists not for "does this device have an IRQ", but for "what is the routing of this irq on this device". and The correct way to check for "no irq" doesn't use NO_IRQ at all, it just does if (dev->irq) ... so let's just check irq and be done with it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Fixes: 71491c54eafa ("virtio_pci: don't try to use intxif pin is zero") Cc: "Angus Chen" <angus.chen@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221012220312.308522-1-mst@redhat.com>
2022-10-13Merge tag 'wireless-2022-10-13' of ↵Paolo Abeni5-47/+84
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== More wireless fixes for 6.1 This has only the fixes for the scan parsing issues. * tag 'wireless-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: cfg80211: update hidden BSSes to avoid WARN_ON wifi: mac80211: fix crash in beacon protection for P2P-device wifi: mac80211_hwsim: avoid mac80211 warning on bad rate wifi: cfg80211: avoid nontransmitted BSS list corruption wifi: cfg80211: fix BSS refcounting bugs wifi: cfg80211: ensure length byte is present before access wifi: mac80211: fix MBSSID parsing use-after-free wifi: cfg80211/mac80211: reject bad MBSSID elements wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans() ==================== Link: https://lore.kernel.org/r/20221013100522.46346-1-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-13Merge branch 'cve-fixes-2022-10-13'Johannes Berg28-89/+161
Pull in the fixes for various scan parsing bugs found by Sönke Huster by fuzzing.
2022-10-12net: phy: micrel: Fixes FIELD_GET assertionDivya Koppera1-4/+5
FIELD_GET() must only be used with a mask that is a compile-time constant. Mark the functions as __always_inline to avoid the problem. Fixes: 21b688dabecb6a ("net: phy: micrel: Cable Diag feature for lan8814 phy") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com> Link: https://lore.kernel.org/r/20221011095437.12580-1-Divya.Koppera@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12openvswitch: add nf_ct_is_confirmed check before assigning the helperXin Long1-1/+2
A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)' applies on a confirmed connection: WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98 RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack] Call Trace: <TASK> nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack] __nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack] __ovs_ct_lookup+0x72e/0x780 [openvswitch] ovs_ct_execute+0x1d8/0x920 [openvswitch] do_execute_actions+0x4e6/0xb60 [openvswitch] ovs_execute_actions+0x60/0x140 [openvswitch] ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch] genl_family_rcv_msg_doit.isra.15+0x113/0x150 genl_rcv_msg+0xef/0x1f0 which can be reproduced with these OVS flows: table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk actions=ct(commit, table=1) table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new actions=ct(commit, alg=ftp),normal The issue was introduced by commit 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") where it somehow removed the check of nf_ct_is_confirmed before asigning the helper. This patch is to fix it by bringing it back. Fixes: 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/c5c9092a22a2194650222bffaf786902613deb16.1665085502.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12Merge branch 'tcp-udp-fix-memory-leaks-and-data-races-around-ipv6_addrform'Jakub Kicinski14-45/+102
Kuniyuki Iwashima says: ==================== tcp/udp: Fix memory leaks and data races around IPV6_ADDRFORM. This series fixes some memory leaks and data races caused in the same scenario where one thread converts an IPv6 socket into IPv4 with IPV6_ADDRFORM and another accesses the socket concurrently. v4: https://lore.kernel.org/netdev/20221004171802.40968-1-kuniyu@amazon.com/ v3 (Resend): https://lore.kernel.org/netdev/20221003154425.49458-1-kuniyu@amazon.com/ v3: https://lore.kernel.org/netdev/20220929012542.55424-1-kuniyu@amazon.com/ v2: https://lore.kernel.org/netdev/20220928002741.64237-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20220927161209.32939-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20221006185349.74777-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12tcp: Fix data races around icsk->icsk_af_ops.Kuniyuki Iwashima3-7/+12
setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads and writes. Thanks to Eric Dumazet for providing the syzbot report: BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0: tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240 __inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660 inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724 __sys_connect_file net/socket.c:1976 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1993 __do_sys_connect net/socket.c:2003 [inline] __se_sys_connect net/socket.c:2000 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:2000 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1: tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585 __sys_setsockopt+0x212/0x2b0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 [inline] __se_sys_setsockopt net/socket.c:2260 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted 6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12ipv6: Fix data races around sk->sk_prot.Kuniyuki Iwashima3-11/+22
Commit 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot") fixed some data-races around sk->sk_prot but it was not enough. Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid load tearing. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-12tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().Kuniyuki Iwashima9-15/+46
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak: setsockopt(IPV6_ADDRFORM) sendmsg() +-----------------------+ +-------+ - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...) - sockopt_lock_sock(sk) ^._ called via udpv6_prot - lock_sock(sk) before WRITE_ONCE() - WRITE_ONCE(sk->sk_prot, &tcp_prot) - inet6_destroy_sock() - if (!corkreq) - sockopt_release_sock(sk) - ip6_make_skb(sk, ...) - release_sock(sk) ^._ lockless fast path for the non-corking case - __ip6_append_data(sk, ...) - ipv6_local_rxpmtu(sk, ...) - xchg(&np->rxpmtu, skb) ^._ rxpmtu is never freed. - goto out_no_dst; - lock_sock(sk) For now, rxpmtu is only the case, but not to miss the future change and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that we can clean up IPv6 resources finally. We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next. Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>