summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-12-22bpf: fix nullness propagation for reg to reg comparisonsHao Sun1-1/+8
After befae75856ab, the verifier would propagate null information after JEQ/JNE, e.g., if two pointers, one is maybe_null and the other is not, the former would be marked as non-null in eq path. However, as comment "PTR_TO_BTF_ID points to a kernel struct that does not need to be null checked by the BPF program ... The verifier must keep this in mind and can make no assumptions about null or non-null when doing branch ...". If one pointer is maybe_null and the other is PTR_TO_BTF, the former is incorrectly marked non-null. The following BPF prog can trigger a null-ptr-deref, also see this report for more details[1]: 0: (18) r1 = map_fd ; R1_w=map_ptr(ks=4, vs=4) 2: (79) r6 = *(u64 *)(r1 +8) ; R6_w=bpf_map->inner_map_data ; R6 is PTR_TO_BTF_ID ; equals to null at runtime 3: (bf) r2 = r10 4: (07) r2 += -4 5: (62) *(u32 *)(r2 +0) = 0 6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null 7: (1d) if r6 == r0 goto pc+1 8: (95) exit ; from 7 to 9: R0=map_value R6=ptr_bpf_map 9: (61) r0 = *(u32 *)(r0 +0) ; null-ptr-deref 10: (95) exit So, make the verifier propagate nullness information for reg to reg comparisons only if neither reg is PTR_TO_BTF_ID. [1] https://lore.kernel.org/bpf/CACkBjsaFJwjC5oiw-1KXvcazywodwXo4zGYsRHwbr2gSG9WcSw@mail.gmail.com/T/#u Fixes: befae75856ab ("bpf: propagate nullness information for reg to reg comparisons") Signed-off-by: Hao Sun <sunhao.th@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221222024414.29539-1-sunhao.th@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-22bpftool: Fix linkage with statically built libllvmAnton Protopopov1-0/+4
Since the commit eb9d1acf634b ("bpftool: Add LLVM as default library for disassembling JIT-ed programs") we might link the bpftool program with the libllvm library. This works fine when a shared libllvm library is available, but fails if we want to link bpftool with a statically built LLVM: [...] /usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContextCleanup::~CrashRecoveryContextCleanup()': CrashRecoveryContext.cpp:(.text._ZN4llvm27CrashRecoveryContextCleanupD0Ev+0x17): undefined reference to `operator delete(void*, unsigned long)' /usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContext::~CrashRecoveryContext()': CrashRecoveryContext.cpp:(.text._ZN4llvm20CrashRecoveryContextD2Ev+0xc8): undefined reference to `operator delete(void*, unsigned long)' [...] So in the case of static libllvm we need to explicitly link bpftool with required libraries, namely, libstdc++ and those provided by the `llvm-config --system-libs` command. We can distinguish between the shared and static cases by using the `llvm-config --shared-mode` command. Fixes: eb9d1acf634b ("bpftool: Add LLVM as default library for disassembling JIT-ed programs") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221222102627.1643709-1-aspsk@isovalent.com
2022-12-22selftests/bpf: Test bpf_skb_adjust_room on CHECKSUM_PARTIALMartin KaFai Lau4-0/+160
When the bpf_skb_adjust_room() shrinks the skb such that its csum_start is invalid, the skb->ip_summed should be reset from CHECKSUM_PARTIAL to CHECKSUM_NONE. The commit 54c3f1a81421 ("bpf: pull before calling skb_postpull_rcsum()") fixed it. This patch adds a test to ensure the skb->ip_summed changed from CHECKSUM_PARTIAL to CHECKSUM_NONE after bpf_skb_adjust_room(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221221185653.1589961-1-martin.lau@linux.dev
2022-12-20bpf: pull before calling skb_postpull_rcsum()Jakub Kicinski1-2/+5
Anand hit a BUG() when pulling off headers on egress to a SW tunnel. We get to skb_checksum_help() with an invalid checksum offset (commit d7ea0d9df2a6 ("net: remove two BUG() from skb_checksum_help()") converted those BUGs to WARN_ONs()). He points out oddness in how skb_postpull_rcsum() gets used. Indeed looks like we should pull before "postpull", otherwise the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not be able to do its job: if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start_offset(skb) < 0) skb->ip_summed = CHECKSUM_NONE; Reported-by: Anand Parthasarathy <anpartha@meta.com> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-19bpf: Define sock security related BTF IDs under CONFIG_SECURITY_NETWORKHou Tao1-0/+2
There are warnings reported from resolve_btfids when building vmlinux with CONFIG_SECURITY_NETWORK disabled: WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_free_security WARN: resolve_btfids: unresolved symbol bpf_lsm_sk_alloc_security So only define BTF IDs for these LSM hooks when CONFIG_SECURITY_NETWORK is enabled. Fixes: c0c852dd1876 ("bpf: Do not mark certain LSM hook arguments as trusted") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221217062144.2507222-1-houtao@huaweicloud.com
2022-12-19mctp: serial: Fix starting value for frame check sequenceJeremy Kerr1-2/+4
RFC1662 defines the start state for the crc16 FCS to be 0xffff, but we're currently starting at zero. This change uses the correct start state. We're only early in the adoption for the serial binding, so there aren't yet any other users to interface to. Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding") Reported-by: Harsh Tyagi <harshtya@google.com> Tested-by: Harsh Tyagi <harshtya@google.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19nfp: fix unaligned io read of capabilities wordHuanhuan Wang1-1/+1
The address of 32-bit extend capability is not qword aligned, and may cause exception in some arch. Fixes: 484963ce9f1e ("nfp: extend capability and control words") Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19net: stream: purge sk_error_queue in sk_stream_kill_queues()Eric Dumazet1-0/+6
Changheon Lee reported TCP socket leaks, with a nice repro. It seems we leak TCP sockets with the following sequence: 1) SOF_TIMESTAMPING_TX_ACK is enabled on the socket. Each ACK will cook an skb put in error queue, from __skb_tstamp_tx(). __skb_tstamp_tx() is using skb_clone(), unless SOF_TIMESTAMPING_OPT_TSONLY was also requested. 2) If the application is also using MSG_ZEROCOPY, then we put in the error queue cloned skbs that had a struct ubuf_info attached to them. Whenever an struct ubuf_info is allocated, sock_zerocopy_alloc() does a sock_hold(). As long as the cloned skbs are still in sk_error_queue, socket refcount is kept elevated. 3) Application closes the socket, while error queue is not empty. Since tcp_close() no longer purges the socket error queue, we might end up with a TCP socket with at least one skb in error queue keeping the socket alive forever. This bug can be (ab)used to consume all kernel memory and freeze the host. We need to purge the error queue, with proper synchronization against concurrent writers. Fixes: 24bcbe1cc69f ("net: stream: don't purge sk_error_queue in sk_stream_kill_queues()") Reported-by: Changheon Lee <darklight2357@icloud.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19myri10ge: Fix an error handling path in myri10ge_probe()Christophe JAILLET1-0/+1
Some memory allocated in myri10ge_probe_slices() is not released in the error handling path of myri10ge_probe(). Add the corresponding kfree(), as already done in the remove function. Fixes: 0dcffac1a329 ("myri10ge: add multislices support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19net: microchip: vcap: Fix initialization of value and maskHoratiu Vultur1-0/+2
Fix the following smatch warning: smatch warnings: drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:103 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'value'. drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:106 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'mask'. In case the vcap field was VCAP_FIELD_U128 and the key was different than IP6_S/DIP then the value and mask were not initialized, therefore initialize them. Fixes: 610c32b2ce66 ("net: microchip: vcap: Add vcap_get_rule") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19Merge branch 'rxrpc-fixes'David S. Miller12-49/+40
David Howells says: ==================== rxrpc: Fixes for I/O thread conversion/SACK table expansion Here are some fixes for AF_RXRPC: (1) Fix missing unlock in rxrpc's sendmsg. (2) Fix (lack of) propagation of security settings to rxrpc_call. (3) Fix NULL ptr deref in rxrpc_unuse_local(). (4) Fix problem with kthread_run() not invoking the I/O thread function if the kthread gets stopped first. Possibly this should actually be fixed in the kthread code. (5) Fix locking problem as putting a peer (which may be done from RCU) may now invoke kthread_stop(). (6) Fix switched parameters in a couple of trace calls. (7) Fix I/O thread's checking for kthread stop to make sure it completes all outstanding work before returning so that calls are cleaned up. (8) Fix an uninitialised var in the new rxperf test server. (9) Fix the return value of rxrpc_new_incoming_call() so that the checks on it work correctly. The patches fix at least one syzbot bug[1] and probably some others that don't have reproducers[2][3][4]. I think it also fixes another[5], but that showed another failure during testing that was different to the original. There's also an outstanding bug in rxrpc_put_peer()[6] that is fixed by a combination of several patches in my rxrpc-next branch, but I haven't included that here. ==================== Tested-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: kafs-testing+fedora36_64checkkafs-build-164@auristor.com Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix the return value of rxrpc_new_incoming_call()David Howells3-14/+14
Dan Carpenter sayeth[1]: The patch 5e6ef4f1017c: "rxrpc: Make the I/O thread take over the call and local processor work" from Jan 23, 2020, leads to the following Smatch static checker warning: net/rxrpc/io_thread.c:283 rxrpc_input_packet() warn: bool is not less than zero. Fix this (for now) by changing rxrpc_new_incoming_call() to return an int with 0 or error code rather than bool. Note that the actual return value of rxrpc_input_packet() is currently ignored. I have a separate patch to clean that up. Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1] Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: rxperf: Fix uninitialised variableDavid Howells1-1/+1
Dan Carpenter sayeth[1]: The patch 75bfdbf2fca3: "rxrpc: Implement an in-kernel rxperf server for testing purposes" from Nov 3, 2022, leads to the following Smatch static checker warning: net/rxrpc/rxperf.c:337 rxperf_deliver_to_call() error: uninitialized symbol 'ret'. Fix this by initialising ret to 0. The value is only used for tracing purposes in the rxperf server. Fixes: 75bfdbf2fca3 ("rxrpc: Implement an in-kernel rxperf server for testing purposes") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006124.html [1] Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix I/O thread stopDavid Howells1-1/+3
The rxrpc I/O thread checks to see if there's any work it needs to do, and if not, checks kthread_should_stop() before scheduling, and if it should stop, breaks out of the loop and tries to clean up and exit. This can, however, race with socket destruction, wherein outstanding calls are aborted and released from the socket and then the socket unuses the local endpoint, causing kthread_stop() to be issued. The abort is deferred to the I/O thread and the event can by issued between the I/O thread checking if there's any work to be done (such as processing call aborts) and the stop being seen. This results in the I/O thread stopping processing of events whilst call cleanup events are still outstanding, leading to connections or other objects still being around and uncleaned up, which can result in assertions being triggered, e.g.: rxrpc: AF_RXRPC: Leaked client conn 00000000e8009865 {2} ------------[ cut here ]------------ kernel BUG at net/rxrpc/conn_client.c:64! Fix this by retrieving the kthread_should_stop() indication, then checking to see if there's more work to do, and going back round the loop if there is, and breaking out of the loop only if there wasn't. This was triggered by a syzbot test that produced some other symptom[1]. Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1] Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix switched parameters in peer tracingDavid Howells2-3/+3
Fix the switched parameters on rxrpc_alloc_peer() and rxrpc_get_peer(). The ref argument and the why argument got mixed. Fixes: 47c810a79844 ("rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix locking issues in rxrpc_put_peer_locked()David Howells3-23/+7
Now that rxrpc_put_local() may call kthread_stop(), it can't be called under spinlock as it might sleep. This can cause a problem in the peer keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock from the point it needs to re-add peer->keepalive_link to going round the loop again in rxrpc_peer_keepalive_dispatch(). Fix this by just dropping the lock when we don't need it and accepting that we'll have to take it again. This code is only called about every 20s for each peer, so not very often. This allows rxrpc_put_peer_unlocked() to be removed also. If triggered, this bug produces an oops like the following, as reproduced by a syzbot reproducer for a different oops[1]: BUG: sleeping function called from invalid context at kernel/sched/completion.c:101 ... RCU nest depth: 0, expected: 0 3 locks held by kworker/u9:0/50: #0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636 #1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636 #2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f ... Call Trace: <TASK> dump_stack_lvl+0x4c/0x5f __might_resched+0x2cf/0x2f2 __wait_for_common+0x87/0x1e8 kthread_stop+0x14d/0x255 rxrpc_peer_keepalive_dispatch+0x333/0x35f rxrpc_peer_keepalive_worker+0x2e9/0x449 process_one_work+0x3c1/0x636 worker_thread+0x25f/0x359 kthread+0x1a6/0x1b5 ret_from_fork+0x1f/0x30 Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1] Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix I/O thread startup getting skippedDavid Howells3-0/+5
When starting a kthread, the __kthread_create_on_node() function, as called from kthread_run(), waits for a completion to indicate that the task_struct (or failure state) of the new kernel thread is available before continuing. This does not wait, however, for the thread function to be invoked and, indeed, will skip it if kthread_stop() gets called before it gets there. If this happens, though, kthread_run() will have returned successfully, indicating that the thread was started and returning the task_struct pointer. The actual error indication is returned by kthread_stop(). Note that this is ambiguous, as the caller cannot tell whether the -EINTR error code came from kthread() or from the thread function. This was encountered in the new rxrpc I/O thread, where if the system is being pounded hard by, say, syzbot, the check of KTHREAD_SHOULD_STOP can be delayed long enough for kthread_stop() to get called when rxrpc releases a socket - and this causes an oops because the I/O thread function doesn't get started and thus doesn't remove the rxrpc_local struct from the local_endpoints list. Fix this by using a completion to wait for the thread to actually enter rxrpc_io_thread(). This makes sure the thread can't be prematurely stopped and makes sure the relied-upon cleanup is done. Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread") Reported-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/000000000000229f1505ef2b6159@google.com/ Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix NULL deref in rxrpc_unuse_local()David Howells1-1/+2
Fix rxrpc_unuse_local() to get the debug_id *after* checking to see if local is NULL. Fixes: a2cf3264f331 ("rxrpc: Fold __rxrpc_unuse_local() into rxrpc_unuse_local()") Reported-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix security setting propagationDavid Howells3-5/+4
Fix the propagation of the security settings from sendmsg to the rxrpc_call struct. Fixes: f3441d4125fc ("rxrpc: Copy client call parameters into rxrpc_call earlier") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19rxrpc: Fix missing unlock in rxrpc_do_sendmsg()David Howells1-1/+1
One of the error paths in rxrpc_do_sendmsg() doesn't unlock the call mutex before returning. Fix it to do this. Note that this still doesn't get rid of the checker warning: ../net/rxrpc/sendmsg.c:617:5: warning: context imbalance in 'rxrpc_do_sendmsg' - wrong count at exit I think the interplay between the socket lock and the call's user_mutex may be too complicated for checker to analyse, especially as rxrpc_new_client_call_for_sendmsg(), which it calls, returns with the call's user_mutex if successful but unconditionally drops the socket lock. Fixes: e754eba685aa ("rxrpc: Provide a cmsg to specify the amount of Tx data for a call") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19net_sched: reject TCF_EM_SIMPLE case for complex ematch moduleCong Wang1-0/+2
When TCF_EM_SIMPLE was introduced, it is supposed to be convenient for ematch implementation: https://lore.kernel.org/all/20050105110048.GO26856@postel.suug.ch/ "You don't have to, providing a 32bit data chunk without TCF_EM_SIMPLE set will simply result in allocating & copy. It's an optimization, nothing more." So if an ematch module provides ops->datalen that means it wants a complex data structure (saved in its em->data) instead of a simple u32 value. We should simply reject such a combination, otherwise this u32 could be misinterpreted as a pointer. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+4caeae4c7103813598ae@syzkaller.appspotmail.com Reported-by: Jun Nie <jun.nie@linaro.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-18Merge branch '1GbE' of ↵David S. Miller4-40/+188
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-12-15 (igc) Muhammad Husaini Zulkifli says: This patch series fixes bugs for the Time-Sensitive Networking(TSN) Qbv Scheduling features. An overview of each patch series is given below: Patch 1: Using a first flag bit to schedule a packet to the next cycle if packet cannot fit in current Qbv cycle. Patch 2: Enable strict cycle for Qbv scheduling. Patch 3: Prevent user to set basetime less than zero during tc config. Patch 4: Allow the basetime enrollment with zero value. Patch 5: Calculate the new end time value to exclude the time interval that exceed the cycle time as user can specify the cycle time in tc config. Patch 6: Resolve the HW bugs where the gate is not fully closed. --- This contains the net patches from this original pull request: https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-16skbuff: Account for tail adjustment during pull operationsSubash Abhinov Kasiviswanathan1-0/+3
Extending the tail can have some unexpected side effects if a program uses a helper like BPF_FUNC_skb_pull_data to read partial content beyond the head skb headlen when all the skbs in the gso frag_list are linear with no head_frag - kernel BUG at net/core/skbuff.c:4219! pc : skb_segment+0xcf4/0xd2c lr : skb_segment+0x63c/0xd2c Call trace: skb_segment+0xcf4/0xd2c __udp_gso_segment+0xa4/0x544 udp4_ufo_fragment+0x184/0x1c0 inet_gso_segment+0x16c/0x3a4 skb_mac_gso_segment+0xd4/0x1b0 __skb_gso_segment+0xcc/0x12c udp_rcv_segment+0x54/0x16c udp_queue_rcv_skb+0x78/0x144 udp_unicast_rcv_skb+0x8c/0xa4 __udp4_lib_rcv+0x490/0x68c udp_rcv+0x20/0x30 ip_protocol_deliver_rcu+0x1b0/0x33c ip_local_deliver+0xd8/0x1f0 ip_rcv+0x98/0x1a4 deliver_ptype_list_skb+0x98/0x1ec __netif_receive_skb_core+0x978/0xc60 Fix this by marking these skbs as GSO_DODGY so segmentation can handle the tail updates accordingly. Fixes: 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list") Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/1671084718-24796-1-git-send-email-quic_subashab@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-16devlink: protect devlink dump by the instance lockJakub Kicinski1-0/+3
Take the instance lock around devlink_nl_fill() when dumping, doit takes it already. We are only dumping basic info so in the worst case we were risking data races around the reload statistics. Until the big devlink mutex was removed all relevant code was protected by it, so the missing instance lock was not exposed. Fixes: d3efc2a6a6d8 ("net: devlink: remove devlink_mutex") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221216044122.1863550-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-16net: ethernet: ti: am65-cpsw: fix CONFIG_PM #ifdefArnd Bergmann1-3/+1
The #ifdef check is incorrect and leads to a warning: drivers/net/ethernet/ti/am65-cpsw-nuss.c:1679:13: error: 'am65_cpsw_nuss_remove_rx_chns' defined but not used [-Werror=unused-function] 1679 | static void am65_cpsw_nuss_remove_rx_chns(void *data) It's better to remove the #ifdef here and use the modern SYSTEM_SLEEP_PM_OPS() macro instead. Fixes: 24bc19b05f1f ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20221215163918.611609-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-16Merge tag 'for-netdev' of ↵Jakub Kicinski9-36/+119
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2022-12-16 We've added 7 non-merge commits during the last 2 day(s) which contain a total of 9 files changed, 119 insertions(+), 36 deletions(-). 1) Fix for recent syzkaller XDP dispatcher update splat, from Jiri Olsa. 2) Fix BPF program refcount leak in LSM attachment failure path, from Milan Landaverde. 3) Fix BPF program type in map compatibility check for fext, from Toke Høiland-Jørgensen. 4) Fix a BPF selftest compilation error under !CONFIG_SMP config, from Yonghong Song. 5) Fix CI to enable CONFIG_FUNCTION_ERROR_INJECTION after it got changed to a prompt, from Song Liu. 6) Various BPF documentation fixes for socket local storage, from Donald Hunter. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program bpf: Resolve fext program type when checking map compatibility bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func bpf: prevent leak of lsm program after failed attach selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE ==================== Link: https://lore.kernel.org/r/20221216174540.16598-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-16openvswitch: Fix flow lookup to use unmasked keyEelco Chaudron1-9/+16
The commit mentioned below causes the ovs_flow_tbl_lookup() function to be called with the masked key. However, it's supposed to be called with the unmasked key. This due to the fact that the datapath supports installing wider flows, and OVS relies on this behavior. For example if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/ 128.0.0.0) is allowed to be added. However, if we try to add a wildcard rule, the installation fails: $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2 $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2 ovs-vswitchd: updating flow table (File exists) The reason is that the key used to determine if the flow is already present in the system uses the original key ANDed with the mask. This results in the IP address not being part of the (miniflow) key, i.e., being substituted with an all-zero value. When doing the actual lookup, this results in the key wrongfully matching the first flow, and therefore the flow does not get installed. This change reverses the commit below, but rather than having the key on the stack, it's allocated. Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-16Merge branch 'devlink-fixes'David S. Miller3-2/+17
Jakub Kicinski says: ==================== devlink: region snapshot locking fix and selftest adjustments Minor fix for region snapshot locking and adjustments to selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-16selftests: devlink: add a warning for interfaces coming upJakub Kicinski1-0/+13
NetworkManager (and other daemons) may bring the interface up and cause failures in quiescence checks. Print a helpful warning, and take the interface down again. I seem to forget about this every time I run these tests on a new VM. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-16selftests: devlink: fix the fd redirect in dummy_reporter_testJakub Kicinski1-2/+2
$number + > bash means redirect FD $number, e.g. commonly used 2> redirects stderr (fd 2). The test uses 8192> to write the number 8192 to a file, this results in: ./devlink.sh: line 499: 8192: Bad file descriptor Oddly the test also papers over this issue by checking for failure (expecting an error rather than success) so it passes, anyway. Fixes: ff18176ad806 ("selftests: Add a test of large binary to devlink health test") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-16devlink: hold region lock when flushing snapshotsJakub Kicinski1-0/+2
Netdevsim triggers a splat on reload, when it destroys regions with snapshots pending: WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140 CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d6e1c #580 RIP: 0010:devlink_region_snapshot_del+0x12e/0x140 Call Trace: <TASK> devl_region_destroy+0x70/0x140 nsim_dev_reload_down+0x2f/0x60 [netdevsim] devlink_reload+0x1f7/0x360 devlink_nl_cmd_reload+0x6ce/0x860 genl_family_rcv_msg_doit.isra.0+0x145/0x1c0 This is the locking assert in devlink_region_snapshot_del(), we're supposed to be holding the region->snapshot_lock here. Fixes: 2dec18ad826f ("net: devlink: remove region snapshots list dependency on devlink->lock") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-15net: dsa: mt7530: remove redundant assignmentDaniel Golle1-3/+0
Russell King correctly pointed out that the MAC_2500FD capability is already added for port 5 (if not in RGMII mode) and port 6 (which only supports SGMII) by mt7531_mac_port_get_caps. Remove the reduntant setting of this capability flag which was added by a previous commit. Fixes: e19de30d2080 ("net: dsa: mt7530: add support for in-band link status") Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/Y5qY7x6la5TxZxzX@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-15igc: Set Qbv start_time and end_time to end_time if not being configured in GCLTan Tee Min1-1/+13
The default setting of end_time minus start_time is whole 1 second. Thus, if it's not being configured in any GCL entry then it will be staying at original 1 second. This patch is changing the start_time and end_time to be end_time as if setting zero will be having weird HW behavior where the gate will not be fully closed. Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: recalculate Qbv end_time by considering cycle timeTan Tee Min1-0/+15
Qbv users can specify a cycle time that is not equal to the total GCL intervals. Hence, recalculation is necessary here to exclude the time interval that exceeds the cycle time. As those GCL which exceeds the cycle time will be truncated. According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of the list is reached, it will switch to the END_OF_CYCLE state and leave the gates in the same state until the next cycle is started. Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: allow BaseTime 0 enrollment for QbvTan Tee Min3-1/+4
Introduce qbv_enable flag in igc_adapter struct to store the Qbv on/off. So this allow the BaseTime to enroll with zero value. Fixes: 61572d5f8f91 ("igc: Simplify TSN flags handling") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: Add checking for basetime less than zeroMuhammad Husaini Zulkifli1-0/+3
Using the tc qdisc command, the user can set basetime to any value. Checking should be done on the driver's side to prevent registering basetime values that are less than zero. Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: Use strict cycles for Qbv schedulingVinicius Costa Gomes1-9/+2
Configuring strict cycle mode in the controller forces more well behaved transmissions when taprio is offloaded. When set this strict_cycle and strict_end, transmission is not enabled if the whole packet cannot be completed before end of the Qbv cycle. Fixes: 82faa9b79950 ("igc: Add support for ETF offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: Enhance Qbv scheduling by using first flag bitVinicius Costa Gomes3-29/+151
The I225 hardware has a limitation that packets can only be scheduled in the [0, cycle-time] interval. So, scheduling a packet to the start of the next cycle doesn't usually work. To overcome this, we use the Transmit Descriptor first flag to indicates that a packet should be the first packet (from a queue) in a cycle according to the section 7.5.2.9.3.4 The First Packet on Each QBV Cycle in Intel Discrete I225/6 User Manual. But this only works if there was any packet from that queue during the current cycle, to avoid this issue, we issue an empty packet if that's not the case. Also require one more descriptor to be available, to take into account the empty packet that might be issued. Test Setup: Talker: Use l2_tai to generate the launchtime into packet load. Listener: Use timedump.c to compute the delta between packet arrival and LaunchTime packet payload. Test Result: Before: 1666000610127300000,1666000610127300096,96,621273 1666000610127400000,1666000610127400192,192,621274 1666000610127500000,1666000610127500032,32,621275 1666000610127600000,1666000610127600128,128,621276 1666000610127700000,1666000610127700224,224,621277 1666000610127800000,1666000610127800064,64,621278 1666000610127900000,1666000610127900160,160,621279 1666000610128000000,1666000610128000000,0,621280 1666000610128100000,1666000610128100096,96,621281 1666000610128200000,1666000610128200192,192,621282 1666000610128300000,1666000610128300032,32,621283 1666000610128400000,1666000610128301056,-98944,621284 1666000610128500000,1666000610128302080,-197920,621285 1666000610128600000,1666000610128302848,-297152,621286 1666000610128700000,1666000610128303872,-396128,621287 1666000610128800000,1666000610128304896,-495104,621288 1666000610128900000,1666000610128305664,-594336,621289 1666000610129000000,1666000610128306688,-693312,621290 1666000610129100000,1666000610128307712,-792288,621291 1666000610129200000,1666000610128308480,-891520,621292 1666000610129300000,1666000610128309504,-990496,621293 1666000610129400000,1666000610128310528,-1089472,621294 1666000610129500000,1666000610128311296,-1188704,621295 1666000610129600000,1666000610128312320,-1287680,621296 1666000610129700000,1666000610128313344,-1386656,621297 1666000610129800000,1666000610128314112,-1485888,621298 1666000610129900000,1666000610128315136,-1584864,621299 1666000610130000000,1666000610128316160,-1683840,621300 1666000610130100000,1666000610128316928,-1783072,621301 1666000610130200000,1666000610128317952,-1882048,621302 1666000610130300000,1666000610128318976,-1981024,621303 1666000610130400000,1666000610128319744,-2080256,621304 1666000610130500000,1666000610128320768,-2179232,621305 1666000610130600000,1666000610128321792,-2278208,621306 1666000610130700000,1666000610128322816,-2377184,621307 1666000610130800000,1666000610128323584,-2476416,621308 1666000610130900000,1666000610128324608,-2575392,621309 1666000610131000000,1666000610128325632,-2674368,621310 1666000610131100000,1666000610128326400,-2773600,621311 1666000610131200000,1666000610128327424,-2872576,621312 1666000610131300000,1666000610128328448,-2971552,621313 1666000610131400000,1666000610128329216,-3070784,621314 1666000610131500000,1666000610131500032,32,621315 1666000610131600000,1666000610131600128,128,621316 1666000610131700000,1666000610131700224,224,621317 After: 1666073510646200000,1666073510646200064,64,2676462 1666073510646300000,1666073510646300160,160,2676463 1666073510646400000,1666073510646400256,256,2676464 1666073510646500000,1666073510646500096,96,2676465 1666073510646600000,1666073510646600192,192,2676466 1666073510646700000,1666073510646700032,32,2676467 1666073510646800000,1666073510646800128,128,2676468 1666073510646900000,1666073510646900224,224,2676469 1666073510647000000,1666073510647000064,64,2676470 1666073510647100000,1666073510647100160,160,2676471 1666073510647200000,1666073510647200256,256,2676472 1666073510647300000,1666073510647300096,96,2676473 1666073510647400000,1666073510647400192,192,2676474 1666073510647500000,1666073510647500032,32,2676475 1666073510647600000,1666073510647600128,128,2676476 1666073510647700000,1666073510647700224,224,2676477 1666073510647800000,1666073510647800064,64,2676478 1666073510647900000,1666073510647900160,160,2676479 1666073510648000000,1666073510648000000,0,2676480 1666073510648100000,1666073510648100096,96,2676481 1666073510648200000,1666073510648200192,192,2676482 1666073510648300000,1666073510648300032,32,2676483 1666073510648400000,1666073510648400128,128,2676484 1666073510648500000,1666073510648500224,224,2676485 1666073510648600000,1666073510648600064,64,2676486 1666073510648700000,1666073510648700160,160,2676487 1666073510648800000,1666073510648800000,0,2676488 1666073510648900000,1666073510648900096,96,2676489 1666073510649000000,1666073510649000192,192,2676490 1666073510649100000,1666073510649100032,32,2676491 1666073510649200000,1666073510649200128,128,2676492 1666073510649300000,1666073510649300224,224,2676493 1666073510649400000,1666073510649400064,64,2676494 1666073510649500000,1666073510649500160,160,2676495 1666073510649600000,1666073510649600000,0,2676496 1666073510649700000,1666073510649700096,96,2676497 1666073510649800000,1666073510649800192,192,2676498 1666073510649900000,1666073510649900032,32,2676499 1666073510650000000,1666073510650000128,128,2676500 Fixes: 82faa9b79950 ("igc: Add support for ETF offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Co-developed-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Co-developed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Malli C <mallikarjuna.chilakala@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15net: dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()Vladimir Oltean1-5/+4
In the blamed commit, it was not noticed that one implementation of chip->info->ops->phylink_get_caps(), called by mv88e6xxx_get_caps(), may access hardware registers, and in doing so, it takes the mv88e6xxx_reg_lock(). Namely, this is mv88e6352_phylink_get_caps(). This is a problem because mv88e6xxx_get_caps(), apart from being a top-level function (method invoked by dsa_switch_ops), is now also directly called from mv88e6xxx_setup_port(), which runs under the mv88e6xxx_reg_lock() taken by mv88e6xxx_setup(). Therefore, when running on mv88e6352, the reg_lock would be acquired a second time and the system would deadlock on driver probe. The things that mv88e6xxx_setup() can compete with in terms of register access with are the IRQ handlers and MDIO bus operations registered by mv88e6xxx_probe(). So there is a real need to acquire the register lock. The register lock can, in principle, be dropped and re-acquired pretty much at will within the driver, as long as no operations that involve waiting for indirect access to complete (essentially, callers of mv88e6xxx_smi_direct_wait() and mv88e6xxx_wait_mask()) are interrupted with the lock released. However, I would guess that in mv88e6xxx_setup(), the critical section is kept open for such a long time just in order to optimize away multiple lock/unlock operations on the registers. We could, in principle, drop the reg_lock right before the mv88e6xxx_setup_port() -> mv88e6xxx_get_caps() call, and re-acquire it immediately afterwards. But this would look ugly, because mv88e6xxx_setup_port() would release a lock which it didn't acquire, but the caller did. A cleaner solution to this issue comes from the observation that struct mv88e6xxxx_ops methods generally assume they are called with the reg_lock already acquired. Whereas mv88e6352_phylink_get_caps() is more the exception rather than the norm, in that it acquires the lock itself. Let's enforce the same locking pattern/convention for chip->info->ops->phylink_get_caps() as well, and make mv88e6xxx_get_caps(), the top-level function, acquire the register lock explicitly, for this one implementation that will access registers for port 4 to work properly. This means that mv88e6xxx_setup_port() will no longer call the top-level function, but the low-level mv88e6xxx_ops method which expects the correct calling context (register lock held). Compared to chip->info->ops->phylink_get_caps(), mv88e6xxx_get_caps() also fixes up the supported_interfaces bitmap for internal ports, since that can be done generically and does not require per-switch knowledge. That's code which will no longer execute, however mv88e6xxx_setup_port() doesn't need that. It just needs to look at the mac_capabilities bitmap. Fixes: cc1049ccee20 ("net: dsa: mv88e6xxx: fix speed setting for CPU/DSA ports") Reported-by: Maksim Kiselev <bigunclemax@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Maksim Kiselev <bigunclemax@gmail.com> Link: https://lore.kernel.org/r/20221214110120.3368472-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-15ravb: Fix "failed to switch device to config mode" message during unbindBiju Das1-1/+1
This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch device to config mode" during unbind. We are doing register access after pm_runtime_put_sync(). We usually do cleanup in reverse order of init. Currently in remove(), the "pm_runtime_put_sync" is not in reverse order. Probe reset_control_deassert(rstc); pm_runtime_enable(&pdev->dev); pm_runtime_get_sync(&pdev->dev); remove pm_runtime_put_sync(&pdev->dev); unregister_netdev(ndev); .. ravb_mdio_release(priv); pm_runtime_disable(&pdev->dev); Consider the call to unregister_netdev() unregister_netdev->unregister_netdevice_queue->rollback_registered_many that calls the below functions which access the registers after pm_runtime_put_sync() 1) ravb_get_stats 2) ravb_close Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Cc: stable@vger.kernel.org Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221214105118.2495313-1-biju.das.jz@bp.renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-15net: stmmac: fix errno when create_singlethread_workqueue() failsGaosheng Cui1-0/+1
We should set the return value to -ENOMEM explicitly when create_singlethread_workqueue() fails in stmmac_dvr_probe(), otherwise we'll lose the error value. Fixes: a137f3f27f92 ("net: stmmac: fix possible memory leak in stmmac_dvr_probe()") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221214080117.3514615-1-cuigaosheng1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-15r6040: Fix kmemleak in probe and removeLi Zetao1-1/+4
There is a memory leaks reported by kmemleak: unreferenced object 0xffff888116111000 (size 2048): comm "modprobe", pid 817, jiffies 4294759745 (age 76.502s) hex dump (first 32 bytes): 00 c4 0a 04 81 88 ff ff 08 10 11 16 81 88 ff ff ................ 08 10 11 16 81 88 ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815bcd82>] kmalloc_trace+0x22/0x60 [<ffffffff827e20ee>] phy_device_create+0x4e/0x90 [<ffffffff827e6072>] get_phy_device+0xd2/0x220 [<ffffffff827e7844>] mdiobus_scan+0xa4/0x2e0 [<ffffffff827e8be2>] __mdiobus_register+0x482/0x8b0 [<ffffffffa01f5d24>] r6040_init_one+0x714/0xd2c [r6040] ... The problem occurs in probe process as follows: r6040_init_one: mdiobus_register mdiobus_scan <- alloc and register phy_device, the reference count of phy_device is 3 r6040_mii_probe phy_connect <- connect to the first phy_device, so the reference count of the first phy_device is 4, others are 3 register_netdev <- fault inject succeeded, goto error handling path // error handling path err_out_mdio_unregister: mdiobus_unregister(lp->mii_bus); err_out_mdio: mdiobus_free(lp->mii_bus); <- the reference count of the first phy_device is 1, it is not released and other phy_devices are released // similarly, the remove process also has the same problem The root cause is traced to the phy_device is not disconnected when removes one r6040 device in r6040_remove_one() or on error handling path after r6040_mii probed successfully. In r6040_mii_probe(), a net ethernet device is connected to the first PHY device of mii_bus, in order to notify the connected driver when the link status changes, which is the default behavior of the PHY infrastructure to handle everything. Therefore the phy_device should be disconnected when removes one r6040 device or on error handling path. Fix it by adding phy_disconnect() when removes one r6040 device or on error handling path after r6040_mii probed successfully. Fixes: 3831861b4ad8 ("r6040: implement phylib") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221213125614.927754-1-lizetao1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-15unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()Kirill Tkhai1-2/+9
There is a race resulting in alive SOCK_SEQPACKET socket may change its state from TCP_ESTABLISHED to TCP_CLOSE: unix_release_sock(peer) unix_dgram_sendmsg(sk) sock_orphan(peer) sock_set_flag(peer, SOCK_DEAD) sock_alloc_send_pskb() if !(sk->sk_shutdown & SEND_SHUTDOWN) OK if sock_flag(peer, SOCK_DEAD) sk->sk_state = TCP_CLOSE sk->sk_shutdown = SHUTDOWN_MASK After that socket sk remains almost normal: it is able to connect, listen, accept and recvmsg, while it can't sendmsg. Since this is the only possibility for alive SOCK_SEQPACKET to change the state in such way, we should better fix this strange and potentially danger corner case. Note, that we will return EPIPE here like this is normally done in sock_alloc_send_pskb(). Originally used ECONNREFUSED looks strange, since it's strange to return a specific retval in dependence of race in kernel, when user can't affect on this. Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock to fix race with unix_dgram_connect(): unix_dgram_connect(other) unix_dgram_sendmsg(sk) unix_peer(sk) = NULL unix_state_unlock(sk) unix_state_double_lock(sk, other) sk->sk_state = TCP_ESTABLISHED unix_peer(sk) = other unix_state_double_unlock(sk, other) sk->sk_state = TCP_CLOSED This patch fixes both of these races. Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Kirill Tkhai <tkhai@ya.ru> Link: https://lore.kernel.org/r/135fda25-22d5-837a-782b-ceee50e19844@ya.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-14selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP programToke Høiland-Jørgensen2-0/+72
This adds a simple test for inserting an XDP program into a cpumap that is "owned" by an XDP program that was loaded as PROG_TYPE_EXT (as libxdp does). Prior to the kernel fix this would fail because the map type ownership would be set to PROG_TYPE_EXT instead of being resolved to PROG_TYPE_XDP. v5: - Fix a few nits from Andrii, add his ACK v4: - Use skeletons for selftest v3: - Update comment to better explain the cause - Add Yonghong's ACK Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221214230254.790066-2-toke@redhat.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-14bpf: Resolve fext program type when checking map compatibilityToke Høiland-Jørgensen1-2/+3
The bpf_prog_map_compatible() check makes sure that BPF program types are not mixed inside BPF map types that can contain programs (tail call maps, cpumaps and devmaps). It does this by setting the fields of the map->owner struct to the values of the first program being checked against, and rejecting any subsequent programs if the values don't match. One of the values being set in the map owner struct is the program type, and since the code did not resolve the prog type for fext programs, the map owner type would be set to PROG_TYPE_EXT and subsequent loading of programs of the target type into the map would fail. This bug is seen in particular for XDP programs that are loaded as PROG_TYPE_EXT using libxdp; these cannot insert programs into devmaps and cpumaps because the check fails as described above. Fix the bug by resolving the fext program type to its target program type as elsewhere in the verifier. v3: - Add Yonghong's ACK Fixes: f45d5b6ce2e8 ("bpf: generalise tail call map compatibility check") Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221214230254.790066-1-toke@redhat.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-14nfc: pn533: Clear nfc_target before being usedMinsuk Kang1-0/+4
Fix a slab-out-of-bounds read that occurs in nla_put() called from nfc_genl_send_target() when target->sensb_res_len, which is duplicated from an nfc_target in pn533, is too large as the nfc_target is not properly initialized and retains garbage values. Clear nfc_targets with memset() before they are used. Found by a modified version of syzkaller. BUG: KASAN: slab-out-of-bounds in nla_put Call Trace: memcpy nla_put nfc_genl_dump_targets genl_lock_dumpit netlink_dump __netlink_dump_start genl_family_rcv_msg_dumpit genl_rcv_msg netlink_rcv_skb genl_rcv netlink_unicast netlink_sendmsg sock_sendmsg ____sys_sendmsg ___sys_sendmsg __sys_sendmsg do_syscall_64 Fixes: 673088fb42d0 ("NFC: pn533: Send ATR_REQ directly for active device detection") Fixes: 361f3cb7f9cf ("NFC: DEP link hook implementation for pn533") Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221214015139.119673-1-linuxlovemin@yonsei.ac.kr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-14net: enetc: avoid buffer leaks on xdp_do_redirect() failureVladimir Oltean1-27/+8
Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software BD in the RX ring between index orig_i and i can have one of 2 refcount values on its page. We are the owner of the current buffer that is being processed, so the refcount will be at least 1. If the current owner of the buffer at the diametrically opposed index in the RX ring (i.o.w, the other half of this page) has not yet called kfree(), this page's refcount could even be 2. enetc_page_reusable() in enetc_flip_rx_buff() tests for the page refcount against 1, and [ if it's 2 ] does not attempt to reuse it. But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call, the page refcount can have one of 3 values. It can also be 0, if there is no owner of the other page half, and xdp_do_redirect() for this buffer ran so far that it triggered a flush of the devmap/cpumap bulk queue, and the consumers of those bulk queues also freed the buffer, all by the time xdp_do_redirect() returns the execution back to enetc. This is the reason why enetc_flip_rx_buff() is called before xdp_do_redirect(), but there is a big flaw with that reasoning: enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the enetc_page_reusable() branch, and if xdp_do_redirect() returns an error, we call enetc_xdp_free(), which does not deal gracefully with that. In fact, what happens is quite special. The page refcounts start as 1. enetc_flip_rx_buff() figures they're reusable, transfers these rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and bumps the refcount to 2. When xdp_do_redirect() later returns an error, we call the no-op enetc_xdp_free(), but we still haven't lost the reference to that page. A copy of it is still at rx_ring->next_to_alloc, but that has refcount 2 (and there are no concurrent owners of it in flight, to drop the refcount). What really kills the system is when we'll flip the rx_swbd->page the second time around. With an updated refcount of 2, the page will not be reusable and we'll really leak it. Then enetc_new_page() will have to allocate more pages, which will then eventually leak again on further errors from xdp_do_redirect(). The problem, summarized, is that we zeroize rx_swbd->page before we're completely done with it, and this makes it impossible for the error path to do something with it. Since the packet is potentially multi-buffer and therefore the rx_swbd->page is potentially an array, manual passing of the old pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit difficult. For the sake of going with a simple solution, we accept the possibility of racing with xdp_do_redirect(), and we move the flip procedure to execute only on the redirect success path. By racing, I mean that the page may be deemed as not reusable by enetc (having a refcount of 0), but there will be no leak in that case, either. Once we accept that, we have something better to do with buffers on XDP_REDIRECT failure. Since we haven't performed half-page flipping yet, we won't, either (and this way, we can avoid enetc_xdp_free() completely, which gives the entire page to the slab allocator). Instead, we'll call enetc_xdp_drop(), which will recycle this half of the buffer back to the RX ring. Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT") Suggested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-14bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_funcJiri Olsa1-0/+5
Hao Sun reported crash in dispatcher image [1]. Currently we don't have any sync between bpf_dispatcher_update and bpf_dispatcher_xdp_func, so following race is possible: cpu 0: cpu 1: bpf_prog_run_xdp ... bpf_dispatcher_xdp_func in image at offset 0x0 bpf_dispatcher_update update image at offset 0x800 bpf_dispatcher_update update image at offset 0x0 in image at offset 0x0 -> crash Fixing this by synchronizing dispatcher image update (which is done in bpf_dispatcher_update function) with bpf_dispatcher_xdp_func that reads and execute the dispatcher image. Calling synchronize_rcu after updating and installing new image ensures that readers leave old image before it's changed in the next dispatcher update. The update itself is locked with dispatcher's mutex. The bpf_prog_run_xdp is called under local_bh_disable and synchronize_rcu will wait for it to leave [2]. [1] https://lore.kernel.org/bpf/Y5SFho7ZYXr9ifRn@krava/T/#m00c29ece654bc9f332a17df493bbca33e702896c [2] https://lore.kernel.org/bpf/0B62D35A-E695-4B7A-A0D4-774767544C1A@gmail.com/T/#mff43e2c003ae99f4a38f353c7969be4c7162e877 Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20221214123542.1389719-1-jolsa@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-14bpf: prevent leak of lsm program after failed attachMilan Landaverde1-3/+3
In [0], we added the ability to bpf_prog_attach LSM programs to cgroups, but in our validation to make sure the prog is meant to be attached to BPF_LSM_CGROUP, we return too early if the check fails. This results in lack of decrementing prog's refcnt (through bpf_prog_put) leaving the LSM program alive past the point of the expected lifecycle. This fix allows for the decrement to take place. [0] https://lore.kernel.org/all/20220628174314.1216643-4-sdf@google.com/ Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor") Signed-off-by: Milan Landaverde <milan@mdaverde.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221213175714.31963-1-milan@mdaverde.com
2022-12-14selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTIONSong Liu1-0/+1
BPF selftests require CONFIG_FUNCTION_ERROR_INJECTION to work. However, CONFIG_FUNCTION_ERROR_INJECTION is no longer 'y' by default after recent changes. As a result, we are seeing errors like the following from BPF CI: bpf_testmod_test_read() is not modifiable __x64_sys_setdomainname is not sleepable __x64_sys_getpgid is not sleepable Fix this by explicitly selecting CONFIG_FUNCTION_ERROR_INJECTION in the selftest config. Fixes: a4412fdd49dc ("error-injection: Add prompt for function error injection") Reported-by: Daniel Müller <deso@posteo.net> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Müller <deso@posteo.net> Link: https://lore.kernel.org/bpf/20221213220500.3427947-1-song@kernel.org