summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-08-03Merge tag 'pull-work.9p' of ↵Linus Torvalds2-85/+35
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 9p iov_iter fix from Al Viro: "net/9p abuses iov_iter primitives - it attempts to copy _from_ a destination-only iov_iter when it handles Rerror arriving in reply to zero-copy request. Not hard to fix, fortunately. This is a prereq for the iov_iter_get_pages() work in the second part of iov_iter series, ended up in a separate branch" * tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: handling Rerror without copy_from_iter_full()
2022-08-02Merge tag 'v5.20-p1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Make proc files report fips module name and version Algorithms: - Move generic SHA1 code into lib/crypto - Implement Chinese Remainder Theorem for RSA - Remove blake2s - Add XCTR with x86/arm64 acceleration - Add POLYVAL with x86/arm64 acceleration - Add HCTR2 - Add ARIA Drivers: - Add support for new CCP/PSP device ID in ccp" * tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (89 commits) crypto: tcrypt - Remove the static variable initialisations to NULL crypto: arm64/poly1305 - fix a read out-of-bound crypto: hisilicon/zip - Use the bitmap API to allocate bitmaps crypto: hisilicon/sec - fix auth key size error crypto: ccree - Remove a useless dma_supported() call crypto: ccp - Add support for new CCP/PSP device ID crypto: inside-secure - Add missing MODULE_DEVICE_TABLE for of crypto: hisilicon/hpre - don't use GFP_KERNEL to alloc mem during softirq crypto: testmgr - some more fixes to RSA test vectors cyrpto: powerpc/aes - delete the rebundant word "block" in comments hwrng: via - Fix comment typo crypto: twofish - Fix comment typo crypto: rmd160 - fix Kconfig "its" grammar crypto: keembay-ocs-ecc - Drop if with an always false condition Documentation: qat: rewrite description Documentation: qat: Use code block for qat sysfs example crypto: lib - add module license to libsha1 crypto: lib - make the sha1 library optional crypto: lib - move lib/sha1.c into lib/crypto/ crypto: fips - make proc files report fips module name and version ...
2022-08-02Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of ↵Linus Torvalds7-46/+140
git://git.kernel.dk/linux-block Pull io_uring zerocopy support from Jens Axboe: "This adds support for efficient support for zerocopy sends through io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and UDP. The core network changes to support this is in a stable branch from Jakub that both io_uring and net-next has pulled in, and the io_uring changes are layered on top of that. All of the work has been done by Pavel" * tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits) io_uring: notification completion optimisation io_uring: export req alloc from core io_uring/net: use unsigned for flags io_uring/net: make page accounting more consistent io_uring/net: checks errors of zc mem accounting io_uring/net: improve io_get_notif_slot types selftests/io_uring: test zerocopy send io_uring: enable managed frags with register buffers io_uring: add zc notification flush requests io_uring: rename IORING_OP_FILES_UPDATE io_uring: flush notifiers after sendzc io_uring: sendzc with fixed buffers io_uring: allow to pass addr into sendzc io_uring: account locked pages for non-fixed zc io_uring: wire send zc request type io_uring: add notification slot registration io_uring: add rsrc referencing for notifiers io_uring: complete notifiers in tw io_uring: cache struct io_notif io_uring: add zc notification infrastructure ...
2022-08-02Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-blockLinus Torvalds2-43/+33
Pull io_uring updates from Jens Axboe: - As per (valid) complaint in the last merge window, fs/io_uring.c has grown quite large these days. io_uring isn't really tied to fs either, as it supports a wide variety of functionality outside of that. Move the code to io_uring/ and split it into files that either implement a specific request type, and split some code into helpers as well. The code is organized a lot better like this, and io_uring.c is now < 4K LOC (me). - Deprecate the epoll_ctl opcode. It'll still work, just trigger a warning once if used. If we don't get any complaints on this, and I don't expect any, then we can fully remove it in a future release (me). - Improve the cancel hash locking (Hao) - kbuf cleanups (Hao) - Efficiency improvements to the task_work handling (Dylan, Pavel) - Provided buffer improvements (Dylan) - Add support for recv/recvmsg multishot support. This is similar to the accept (or poll) support for have for multishot, where a single SQE can trigger everytime data is received. For applications that expect to do more than a few receives on an instantiated socket, this greatly improves efficiency (Dylan). - Efficiency improvements for poll handling (Pavel) - Poll cancelation improvements (Pavel) - Allow specifiying a range for direct descriptor allocations (Pavel) - Cleanup the cqe32 handling (Pavel) - Move io_uring types to greatly cleanup the tracing (Pavel) - Tons of great code cleanups and improvements (Pavel) - Add a way to do sync cancelations rather than through the sqe -> cqe interface, as that's a lot easier to use for some use cases (me). - Add support to IORING_OP_MSG_RING for sending direct descriptors to a different ring. This avoids the usually problematic SCM case, as we disallow those. (me) - Make the per-command alloc cache we use for apoll generic, place limits on it, and use it for netmsg as well (me). - Various cleanups (me, Michal, Gustavo, Uros) * tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ...
2022-07-28net: ping6: Fix memleak in ipv6_renew_options().Kuniyuki Iwashima1-0/+6
When we close ping6 sockets, some resources are left unfreed because pingv6_prot is missing sk->sk_prot->destroy(). As reported by syzbot [0], just three syscalls leak 96 bytes and easily cause OOM. struct ipv6_sr_hdr *hdr; char data[24] = {0}; int fd; hdr = (struct ipv6_sr_hdr *)data; hdr->hdrlen = 2; hdr->type = IPV6_SRCRT_TYPE_4; fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP); setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24); close(fd); To fix memory leaks, let's add a destroy function. Note the socket() syscall checks if the GID is within the range of net.ipv4.ping_group_range. The default value is [1, 0] so that no GID meets the condition (1 <= GID <= 0). Thus, the local DoS does not succeed until we change the default value. However, at least Ubuntu/Fedora/RHEL loosen it. $ cat /usr/lib/sysctl.d/50-default.conf ... -net.ipv4.ping_group_range = 0 2147483647 Also, there could be another path reported with these options, and some of them require CAP_NET_RAW. setsockopt IPV6_ADDRFORM (inet6_sk(sk)->pktoptions) IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu) IPV6_HOPOPTS (inet6_sk(sk)->opt) IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt) IPV6_RTHDR (inet6_sk(sk)->opt) IPV6_DSTOPTS (inet6_sk(sk)->opt) IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt) getsockopt IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list) For the record, I left a different splat with syzbot's one. unreferenced object 0xffff888006270c60 (size 96): comm "repro2", pid 231, jiffies 4294696626 (age 13.118s) hex dump (first 32 bytes): 01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554) [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715) [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024) [<000000007096a025>] __sys_setsockopt (net/socket.c:2254) [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262) [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176 Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com Reported-by: Ayushman Dutta <ayudutta@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27sctp: leave the err path free in sctp_stream_init to sctp_stream_freeXin Long2-19/+5
A NULL pointer dereference was reported by Wei Chen: BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:__list_del_entry_valid+0x26/0x80 Call Trace: <TASK> sctp_sched_dequeue_common+0x1c/0x90 sctp_sched_prio_dequeue+0x67/0x80 __sctp_outq_teardown+0x299/0x380 sctp_outq_free+0x15/0x20 sctp_association_free+0xc3/0x440 sctp_do_sm+0x1ca7/0x2210 sctp_assoc_bh_rcv+0x1f6/0x340 This happens when calling sctp_sendmsg without connecting to server first. In this case, a data chunk already queues up in send queue of client side when processing the INIT_ACK from server in sctp_process_init() where it calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in all stream_out will be freed in sctp_stream_init's err path. Then in the asoc freeing it will crash when dequeuing this data chunk as stream_out is missing. As we can't free stream out before dequeuing all data from send queue, and this patch is to fix it by moving the err path stream_out/in freeing in sctp_stream_init() to sctp_stream_free() which is eventually called when freeing the asoc in sctp_association_free(). This fix also makes the code in sctp_process_init() more clear. Note that in sctp_association_init() when it fails in sctp_stream_init(), sctp_association_free() will not be called, and in that case it should go to 'stream_free' err path to free stream instead of 'fail_init'. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tcp: md5: fix IPv4-mapped supportEric Dumazet1-3/+12
After the blamed commit, IPv4 SYN packets handled by a dual stack IPv6 socket are dropped, even if perfectly valid. $ nstat | grep MD5 TcpExtTCPMD5Failure 5 0.0 For a dual stack listener, an incoming IPv4 SYN packet would call tcp_inbound_md5_hash() with @family == AF_INET, while tp->af_specific is pointing to tcp_sock_ipv6_specific. Only later when an IPv4-mapped child is created, tp->af_specific is changed to tcp_sock_ipv6_mapped_specific. Fixes: 7bbb765b7349 ("net/tcp: Merge TCP-MD5 inbound callbacks") Reported-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Dmitry Safonov <dima@arista.com> Tested-by: Leonard Crestez <cdleonard@gmail.com> Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mptcp: Do not return EINPROGRESS when subflow creation succeedsMat Martineau1-1/+1
New subflows are created within the kernel using O_NONBLOCK, so EINPROGRESS is the expected return value from kernel_connect(). __mptcp_subflow_connect() has the correct logic to consider EINPROGRESS to be a successful case, but it has also used that error code as its return value. Before v5.19 this was benign: all the callers ignored the return value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic netlink command that does use the return value, so the EINPROGRESS gets propagated to userspace. Make __mptcp_subflow_connect() always return 0 on success instead. Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests") Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski3-1/+39
Florian Westphal says: ==================== netfilter updates for net Three late fixes for netfilter: 1) If nf_queue user requests packet truncation below size of l3 header, we corrupt the skb, then crash. Reject such requests. 2) add cond_resched() calls when doing cycle detection in the nf_tables graph. This avoids softlockup warning with certain rulesets. 3) Reject rulesets that use nftables 'queue' expression in family/chain combinations other than those that are supported. Currently the ruleset will load, but when userspace attempts to reinject you get WARN splat + packet drops. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_queue: only allow supported familes and hooks netfilter: nf_tables: add rescheduling points during loop detection walks netfilter: nf_queue: do not allow packet truncation below transport header offset ==================== Link: https://lore.kernel.org/r/20220726192056.13497-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Merge tag 'for-net-2022-07-26' of ↵Jakub Kicinski3-17/+51
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix early wakeup after suspend - Fix double free on error - Fix use-after-free on l2cap_chan_put * tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put Bluetooth: Always set event mask on suspend Bluetooth: mgmt: Fix double free on error path ==================== Link: https://lore.kernel.org/r/20220726221328.423714-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_putLuiz Augusto von Dentz1-13/+48
This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: stable@kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26Bluetooth: Always set event mask on suspendAbhishek Pandit-Subedi1-3/+3
When suspending, always set the event mask once disconnects are successful. Otherwise, if wakeup is disallowed, the event mask is not set before suspend continues and can result in an early wakeup. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Cc: stable@vger.kernel.org Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26Bluetooth: mgmt: Fix double free on error pathDan Carpenter1-1/+0
Don't call mgmt_pending_remove() twice (double free). Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()Tetsuo Handa1-2/+1
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1], for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") guards clear_bit() using fq.lock even before fq_init() from ieee80211_txq_setup_flows() initializes this spinlock. According to discussion [2], Toke was not happy with expanding usage of fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we can instead use synchronize_rcu() for flushing ieee80211_wake_txqs(). Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1] Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2] Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp [ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next] Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26netfilter: nft_queue: only allow supported familes and hooksFlorian Westphal1-0/+27
Trying to use 'queue' statement in ingress (for example) triggers a splat on reinject: WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291 ... because nf_reinject cannot find the ruleset head. The netdev family doesn't support async resume at the moment anyway, so disallow loading such rulesets with a more appropriate error message. v2: add 'validate' callback and also check hook points, v1 did allow ingress use in 'table inet', but that doesn't work either. (Pablo) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26netfilter: nf_tables: add rescheduling points during loop detection walksFlorian Westphal1-0/+6
Add explicit rescheduling points during ruleset walk. Switching to a faster algorithm is possible but this is a much smaller change, suitable for nf tree. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26netfilter: nf_queue: do not allow packet truncation below transport header ↵Florian Westphal1-1/+6
offset Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26bridge: Do not send empty IFLA_AF_SPEC attributeBenjamin Poirier1-2/+6
After commit b6c02ef54913 ("bridge: Netlink interface fix."), br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a bridge vlan dump is requested but an interface does not have any vlans configured. iproute2 ignores such an empty attribute since commit b262a9becbcb ("bridge: Fix output with empty vlan lists") but older iproute2 versions as well as other utilities have their output changed by the cited kernel commit, resulting in failed test cases. Regardless, emitting an empty attribute is pointless and inefficient. Avoid this change by canceling the attribute if no AF_SPEC data was added. Fixes: b6c02ef54913 ("bridge: Netlink interface fix.") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-25sctp: fix sleep in atomic context bug in timer handlersDuoming Zhou1-1/+1
There are sleep in atomic context bugs in timer handlers of sctp such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(), sctp_generate_t1_init_event(), sctp_generate_timeout_event(), sctp_generate_t3_rtx_event() and so on. The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter that may sleep could be called by different timer handlers which is in interrupt context. One of the call paths that could trigger bug is shown below: (interrupt context) sctp_generate_probe_event sctp_do_sm sctp_side_effects sctp_cmd_interpreter sctp_outq_teardown sctp_outq_init sctp_sched_set_sched n->init_sid(..,GFP_KERNEL) sctp_sched_prio_init_sid //may sleep This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched() from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic context bugs. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25net: dsa: fix reference counting for LAG FDBsVladimir Oltean1-0/+1
Due to an invalid conflict resolution on my side while working on 2 different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add() does not store the database associated with a dsa_mac_addr structure. So after adding an FDB entry associated with a LAG, dsa_mac_addr_find() fails to find it while deleting it, because &a->db is zeroized memory for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del() returns -ENOENT rather than deleting the entry. Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.Kuniyuki Iwashima1-2/+5
While reading sysctl_fib_notify_on_flag_change, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 680aea08e78c ("net: ipv4: Emit notification when fib hardware flags are changed") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25tcp: Fix data-races around sysctl_tcp_reflect_tos.Kuniyuki Iwashima2-4/+4
While reading sysctl_tcp_reflect_tos, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_comp_sack_slack_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.Kuniyuki Iwashima1-1/+2
While reading sysctl_tcp_comp_sack_delay_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25net: Fix data-races around sysctl_[rw]mem(_offset)?.Kuniyuki Iwashima6-16/+17
While reading these sysctl variables, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - .sysctl_rmem - .sysctl_rwmem - .sysctl_rmem_offset - .sysctl_wmem_offset - sysctl_tcp_rmem[1, 2] - sysctl_tcp_wmem[1, 2] - sysctl_decnet_rmem[1] - sysctl_decnet_wmem[1] - sysctl_tipc_rmem[1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25tcp: Fix data-races around sk_pacing_rate.Kuniyuki Iwashima1-2/+2
While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: 43e122b014c9 ("tcp: refine pacing rate determination") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25net: mld: fix reference count leak in mld_{query | report}_work()Taehee Yoo1-6/+8
mld_{query | report}_work() processes queued events. If there are too many events in the queue, it re-queue a work. And then, it returns without in6_dev_put(). But if queuing is failed, it should call in6_dev_put(), but it doesn't. So, a reference count leak would occur. THREAD0 THREAD1 mld_report_work() spin_lock_bh() if (!mod_delayed_work()) in6_dev_hold(); spin_unlock_bh() spin_lock_bh() schedule_delayed_work() spin_unlock_bh() Script to reproduce(by Hangbin Liu): ip netns add ns1 ip netns add ns2 ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip -n ns1 link add veth0 type veth peer name veth0 netns ns2 ip -n ns1 link set veth0 up ip -n ns2 link set veth0 up for i in `seq 50`; do for j in `seq 100`; do ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0 ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0 done done modprobe -r veth ip -a netns del splat looks like: unregister_netdevice: waiting for veth0 to become free. Usage count = 2 leaked reference. ipv6_add_dev+0x324/0xec0 addrconf_notify+0x481/0xd10 raw_notifier_call_chain+0xe3/0x120 call_netdevice_notifiers+0x106/0x160 register_netdevice+0x114c/0x16b0 veth_newlink+0x48b/0xa50 [veth] rtnl_newlink+0x11a2/0x1a40 rtnetlink_rcv_msg+0x63f/0xc00 netlink_rcv_skb+0x1df/0x3e0 netlink_unicast+0x5de/0x850 netlink_sendmsg+0x6c9/0xa90 ____sys_sendmsg+0x76a/0x780 __sys_sendmsg+0x27c/0x340 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Tested-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-24Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-sendJens Axboe67-388/+445
* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24net: fix compat pointer in get_compat_msghdr()Jens Axboe1-1/+1
A previous change enabled external users to copy the data before calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or __io_compat_recvmsg_copy_hdr() to take that into account. They are both stil passing in the __user pointer rather than the copied version. Ensure we pass in the kernel struct, not the pointer to the user data. Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/ Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24net: copy from user before calling __get_compat_msghdrDylan Yudaken1-22/+17
this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24net: copy from user before calling __copy_msghdrDylan Yudaken1-21/+16
this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24net/tls: Remove the context from the list in tls_device_downMaxim Mikityanskiy1-1/+6
tls_device_down takes a reference on all contexts it's going to move to the degraded state (software fallback). If sk_destruct runs afterwards, it can reduce the reference counter back to 1 and return early without destroying the context. Then tls_device_down will release the reference it took and call tls_device_free_ctx. However, the context will still stay in tls_device_down_list forever. The list will contain an item, memory for which is released, making a memory corruption possible. Fix the above bug by properly removing the context from all lists before any call to tls_device_free_ctx. Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22Revert "tcp: change pingpong threshold to 3"Wei Wang1-9/+6
This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. This to-be-reverted commit was meant to apply a stricter rule for the stack to enter pingpong mode. However, the condition used to check for interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is jiffy based and might be too coarse, which delays the stack entering pingpong mode. We revert this patch so that we no longer use the above condition to determine interactive session, and also reduce pingpong threshold to 1. Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") Reported-by: LemmyHuang <hlm3280@163.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22net: fix uninitialised msghdr->sg_from_iterPavel Begunkov1-1/+1
Because of how struct msghdr is usually initialised some fields and sg_from_iter in particular might be left out not initialised, so we can't safely use it in __zerocopy_sg_from_iter(). For now use the callback only when there is ->msg_ubuf set relying on the fact that they're used together and we properly zero ->msg_ubuf. Fixes: ebe73a284f4de8 ("net: Allow custom iter handler in msghdr") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Message-Id: <ce8b68b41351488f79fd998b032b3c56e9b1cc6c.1658401817.git.asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22caif: Fix bitmap data type in "struct caifsock"Christophe JAILLET1-13/+7
Bitmap are "unsigned long", so use it instead of a "u32" to make things more explicit. While at it, remove some useless cast (and leading spaces) when using the bitmap API. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.Kuniyuki Iwashima1-1/+2
While reading sysctl_tcp_invalid_ratelimit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_autocorking.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_autocorking, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_tso_rtt_log.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_tso_rtt_log, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_min_tso_segs.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_min_tso_segs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_challenge_ack_limit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_limit_output_bytes, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix data-races around sysctl_tcp_workaround_signed_windows.Kuniyuki Iwashima2-3/+3
While reading sysctl_tcp_workaround_signed_windows, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 15d99e02baba ("[TCP]: sysctl to allow TCP window > 32767 sans wscale") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.Kuniyuki Iwashima2-2/+2
While reading sysctl_tcp_moderate_rcvbuf, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.Kuniyuki Iwashima1-4/+4
While reading sysctl_tcp_no_ssthresh_metrics_save, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 65e6d90168f3 ("net-tcp: Disable TCP ssthresh metrics cache by default") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_nometrics_save.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_nometrics_save, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_frto.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_frto, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix a data-race around sysctl_tcp_app_win.Kuniyuki Iwashima1-1/+1
While reading sysctl_tcp_app_win, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22tcp: Fix data-races around sysctl_tcp_dsack.Kuniyuki Iwashima1-2/+2
While reading sysctl_tcp_dsack, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>