summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-09-09net: sched: act_connmark: get rid of tcf_connmark_walker and tcf_connmark_searchZhengchao Shao1-19/+0
tcf_connmark_walker() and tcf_connmark_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_bpf: get rid of tcf_bpf_walker and tcf_bpf_searchZhengchao Shao1-19/+0
tcf_bpf_walker() and tcf_bpf_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act_api: implement generic walker and search for tc actionZhengchao Shao1-4/+29
Being able to get tc_action_net by using net_id stored in tc_action_ops and execute the generic walk/search function, add __tcf_generic_walker() and __tcf_idr_search() helpers. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: sched: act: move global static variable net_id to tc_action_opsZhengchao Shao20-152/+130
Each tc action module has a corresponding net_id, so put net_id directly into the structure tc_action_ops. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller14-252/+263
Florian Westphal says: ==================== The following set contains changes for your *net-next* tree: - make conntrack ignore packets that are delayed (containing data already acked). The current behaviour to flag them as INVALID causes more harm than good, let them pass so peer can send an immediate ACK for the most recent sequence number. - make conntrack recognize when both peers have sent 'invalid' FINs: This helps cleaning out stale connections faster for those cases where conntrack is no longer in sync with the actual connection state. - Now that DECNET is gone, we don't need to reserve space for DECNET related information. - compact common 'find a free port number for the new inbound connection' code and move it to a helper, then cap number of tries the new helper will make until it gives up. - replace various instances of strlcpy with strscpy, from Wolfram Sang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni37-235/+368
drivers/net/ethernet/freescale/fec.h 7d650df99d52 ("net: fec: add pm_qos support on imx6q platform") 40c79ce13b03 ("net: fec: add stop mode support for imx8 platform") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-08Revert "SUNRPC: Remove unreachable error condition"Dan Aloni1-0/+3
This reverts commit efe57fd58e1cb77f9186152ee12a8aa4ae3348e0. The assumption that it is impossible to return an ERR pointer from rpc_run_task() no longer holds due to commit 25cf32ad5dba ("SUNRPC: Handle allocation failure in rpc_new_task()"). Fixes: 25cf32ad5dba ('SUNRPC: Handle allocation failure in rpc_new_task()') Fixes: efe57fd58e1c ('SUNRPC: Remove unreachable error condition') Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-09-08sch_sfb: Also store skb len before calling child enqueueToke Høiland-Jørgensen1-1/+2
Cong Wang noticed that the previous fix for sch_sfb accessing the queued skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue function was also calling qdisc_qstats_backlog_inc() after enqueue, which reads the pkt len from the skb cb field. Fix this by also storing the skb len, and using the stored value to increment the backlog after enqueueing. Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-07freezer,sched: Rewrite core freezer logicPeter Zijlstra2-12/+8
Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically; the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freezer() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before doing bringing SMP back before userspace etc.. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run. As said; replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states *can* be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
2022-09-07selftests/bpf: Add tests for kfunc returning a memory pointerBenjamin Tissoires1-0/+36
We add 2 new kfuncs that are following the RET_PTR_TO_MEM capability from the previous commit. Then we test them in selftests: the first tests are testing valid case, and are not failing, and the later ones are actually preventing the program to be loaded because they are wrong. To work around that, we mark the failing ones as not autoloaded (with SEC("?tc")), and we manually enable them one by one, ensuring the verifier rejects them. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220906151303.2780789-8-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-07selftests/bpf: add test for accessing ctx from syscall program typeBenjamin Tissoires1-0/+1
We need to also export the kfunc set to the syscall program type, and then add a couple of eBPF programs that are testing those calls. The first one checks for valid access, and the second one is OK from a static analysis point of view but fails at run time because we are trying to access outside of the allocated memory. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220906151303.2780789-5-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-07net/smc: Fix possible access to freed memory in link clearYacan Liu4-0/+13
After modifying the QP to the Error state, all RX WR would be completed with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not wait for it is done, but destroy the QP and free the link group directly. So there is a risk that accessing the freed memory in tasklet context. Here is a crash example: BUG: unable to handle page fault for address: ffffffff8f220860 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23 Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018 RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0 Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32 RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086 RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000 RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00 RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010 R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> _raw_spin_lock_irqsave+0x30/0x40 mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib] smc_wr_rx_tasklet_fn+0x56/0xa0 [smc] tasklet_action_common.isra.21+0x66/0x100 __do_softirq+0xd5/0x29c asm_call_irq_on_stack+0x12/0x20 </IRQ> do_softirq_own_stack+0x37/0x40 irq_exit_rcu+0x9d/0xa0 sysvec_call_function_single+0x34/0x80 asm_sysvec_call_function_single+0x12/0x20 Fixes: bd4ad57718cc ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07netfilter: nat: avoid long-running port range loopFlorian Westphal1-2/+14
Looping a large port range takes too long. Instead select a random offset within [ntohs(exp->saved_proto.tcp.port), 65535] and try 128 ports. This is a rehash of an erlier patch to do the same, but generalized to handle other helpers as well. Link: https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210920204439.13179-2-Cole.Dishington@alliedtelesis.co.nz/ Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: nat: move repetitive nat port reserve loop to a helperFlorian Westphal6-111/+29
Almost all nat helpers reserve an expecation port the same way: Try the port inidcated by the peer, then move to next port if that port is already in use. We can squash this into a helper. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: move from strlcpy with unused retval to strscpyWolfram Sang7-21/+21
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: conntrack: reduce timeout when receiving out-of-window fin or rstFlorian Westphal1-0/+58
In case the endpoints and conntrack go out-of-sync, i.e. there is disagreement wrt. validy of sequence/ack numbers between conntracks internal state and those of the endpoints, connections can hang for a long time (until ESTABLISHED timeout). This adds a check to detect a fin/fin exchange even if those are invalid. The timeout is then lowered to UNACKED (default 300s). Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: conntrack: remove unneeded indent levelFlorian Westphal1-53/+45
After previous patch, the conditional branch is obsolete, reformat it. gcc generates same code as before this change. Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07net: sysctl: remove unused variable long_maxLiu Shixin1-1/+0
The variable long_max is replaced by bpf_jit_limit_max and no longer be used. So remove it. No functional change. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUMMenglong Dong3-23/+6
As Eric reported, the 'reason' field is not presented when trace the kfree_skb event by perf: $ perf record -e skb:kfree_skb -a sleep 10 $ perf script ip_defrag 14605 [021] 221.614303: skb:kfree_skb: skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 reason: The cause seems to be passing kernel address directly to TP_printk(), which is not right. As the enum 'skb_drop_reason' is not exported to user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason string from the 'reason' field, which is a number. Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used to define the trace enum by TRACE_DEFINE_ENUM(). With the help of DEFINE_DROP_REASON(), now we can remove the auto-generate that we introduced in the commit ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string"), and define the string array 'drop_reasons'. Hmmmm...now we come back to the situation that have to maintain drop reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they are both in dropreason.h, which makes it easier. After this commit, now the format of kfree_skb is like this: $ cat /tracing/events/skb/kfree_skb/format name: kfree_skb ID: 1524 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:void * skbaddr; offset:8; size:8; signed:0; field:void * location; offset:16; size:8; signed:0; field:unsigned short protocol; offset:24; size:2; signed:0; field:enum skb_drop_reason reason; offset:28; size:4; signed:0; print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ...... Fixes: ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string") Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/ Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find()Pablo Neira Ayuso1-1/+3
nf_osf_find() incorrectly returns true on mismatch, this leads to copying uninitialized memory area in nft_osf which can be used to leak stale kernel stack data to userspace. Fixes: 22c7652cdaa8 ("netfilter: nft_osf: Add version option support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: nf_conntrack_irc: Tighten matching on DCC messageDavid Leadbeater1-6/+28
CTCP messages should only be at the start of an IRC message, not anywhere within it. While the helper only decodes packes in the ORIGINAL direction, its possible to make a client send a CTCP message back by empedding one into a PING request. As-is, thats enough to make the helper believe that it saw a CTCP message. Fixes: 869f37d8e48f ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port") Signed-off-by: David Leadbeater <dgl@dgl.cx> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: conntrack: ignore overly delayed tcp packetsFlorian Westphal1-28/+21
If 'nf_conntrack_tcp_loose' is off (the default), tcp packets that are outside of the current window are marked as INVALID. nf/iptables rulesets often drop such packets via 'ct state invalid' or similar checks. For overly delayed acks, this can be a nuisance if such 'invalid' packets are also logged. Since they are not invalid in a strict sense, just ignore them, i.e. conntrack won't extend timeout or change state so that they do not match invalid state rules anymore. This also avoids unwantend connection stalls in case conntrack considers retransmission (of data that did not reach the peer) as too old. The else branch of the conditional becomes obsolete. Next patch will reformant the now always-true if condition. The existing workaround for data that exceeds the calculated receive window is adjusted to use the 'ignore' state so that these packets do not refresh the timeout or change state other than updating ->td_end. Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: conntrack: prepare tcp_in_window for ternary return valueFlorian Westphal1-49/+87
tcp_in_window returns true if the packet is in window and false if it is not. If its outside of window, packet will be treated as INVALID. There are corner cases where the packet should still be tracked, because rulesets may drop or log such packets, even though they can occur during normal operation, such as overly delayed acks. In extreme cases, connection may hang forever because conntrack state differs from real state. There is no retransmission for ACKs. In case of ACK loss after conntrack processing, its possible that a connection can be stuck because the actual retransmits are considered stale ("SEQ is under the lower bound (already ACKed data retransmitted)". The problem is made worse by carrier-grade-nat which can also result in stale packets from old connections to get treated as 'recent' packets in conntrack (it doesn't support tcp timestamps at this time). Prepare tcp_in_window() to return an enum that tells the desired action (in-window/accept, bogus/drop). A third action (accept the packet as in-window, but do not change state) is added in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: nf_conntrack_sip: fix ct_sip_walk_headersIgor Ryzhov1-2/+2
ct_sip_next_header and ct_sip_get_header return an absolute value of matchoff, not a shift from current dataoff. So dataoff should be assigned matchoff, not incremented by it. This issue can be seen in the scenario when there are multiple Contact headers and the first one is using a hostname and other headers use IP addresses. In this case, ct_sip_walk_headers will work as follows: The first ct_sip_get_header call to will find the first Contact header but will return -1 as the header uses a hostname. But matchoff will be changed to the offset of this header. After that, dataoff should be set to matchoff, so that the next ct_sip_get_header call find the next Contact header. But instead of assigning dataoff to matchoff, it is incremented by it, which is not correct, as matchoff is an absolute value of the offset. So on the next call to the ct_sip_get_header, dataoff will be incorrect, and the next Contact header may not be found at all. Fixes: 05e3ced297fe ("[NETFILTER]: nf_conntrack_sip: introduce SIP-URI parsing helper") Signed-off-by: Igor Ryzhov <iryzhov@nfware.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-07netfilter: nft_payload: reject out-of-range attributes via policyFlorian Westphal1-3/+3
Now that nla_policy allows range checks for bigendian data make use of this to reject such attributes. At this time, reject happens later from the init or select_ops callbacks, but its prone to errors. In the future, new attributes can be handled via NLA_POLICY_MAX_BE and exiting ones can be converted one by one. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-06Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextPaolo Abeni11-604/+575
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-09-05 The following pull-request contains BPF updates for your *net-next* tree. We've added 106 non-merge commits during the last 18 day(s) which contain a total of 159 files changed, 5225 insertions(+), 1358 deletions(-). There are two small merge conflicts, resolve them as follows: 1) tools/testing/selftests/bpf/DENYLIST.s390x Commit 27e23836ce22 ("selftests/bpf: Add lru_bug to s390x deny list") in bpf tree was needed to get BPF CI green on s390x, but it conflicted with newly added tests on bpf-next. Resolve by adding both hunks, result: [...] lru_bug # prog 'printk': failed to auto-attach: -524 setget_sockopt # attach unexpected error: -524 (trampoline) cb_refs # expected error message unexpected error: -524 (trampoline) cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) [...] 2) net/core/filter.c Commit 1227c1771dd2 ("net: Fix data-races around sysctl_[rw]mem_(max|default).") from net tree conflicts with commit 29003875bd5b ("bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt()") from bpf-next tree. Take the code as it is from bpf-next tree, result: [...] if (getopt) { if (optname == SO_BINDTODEVICE) return -EINVAL; return sk_getsockopt(sk, SOL_SOCKET, optname, KERNEL_SOCKPTR(optval), KERNEL_SOCKPTR(optlen)); } return sk_setsockopt(sk, SOL_SOCKET, optname, KERNEL_SOCKPTR(optval), *optlen); [...] The main changes are: 1) Add any-context BPF specific memory allocator which is useful in particular for BPF tracing with bonus of performance equal to full prealloc, from Alexei Starovoitov. 2) Big batch to remove duplicated code from bpf_{get,set}sockopt() helpers as an effort to reuse the existing core socket code as much as possible, from Martin KaFai Lau. 3) Extend BPF flow dissector for BPF programs to just augment the in-kernel dissector with custom logic. In other words, allow for partial replacement, from Shmulik Ladkani. 4) Add a new cgroup iterator to BPF with different traversal options, from Hao Luo. 5) Support for BPF to collect hierarchical cgroup statistics efficiently through BPF integration with the rstat framework, from Yosry Ahmed. 6) Support bpf_{g,s}et_retval() under more BPF cgroup hooks, from Stanislav Fomichev. 7) BPF hash table and local storages fixes under fully preemptible kernel, from Hou Tao. 8) Add various improvements to BPF selftests and libbpf for compilation with gcc BPF backend, from James Hilliard. 9) Fix verifier helper permissions and reference state management for synchronous callbacks, from Kumar Kartikeya Dwivedi. 10) Add support for BPF selftest's xskxceiver to also be used against real devices that support MAC loopback, from Maciej Fijalkowski. 11) Various fixes to the bpf-helpers(7) man page generation script, from Quentin Monnet. 12) Document BPF verifier's tnum_in(tnum_range(), ...) gotchas, from Shung-Hsi Yu. 13) Various minor misc improvements all over the place. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (106 commits) bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc. bpf: Remove usage of kmem_cache from bpf_mem_cache. bpf: Remove prealloc-only restriction for sleepable bpf programs. bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs. bpf: Remove tracing program restriction on map types bpf: Convert percpu hash map to per-cpu bpf_mem_alloc. bpf: Add percpu allocation support to bpf_mem_alloc. bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU. bpf: Adjust low/high watermarks in bpf_mem_cache bpf: Optimize call_rcu in non-preallocated hash map. bpf: Optimize element count in non-preallocated hash map. bpf: Relax the requirement to use preallocated hash maps in tracing progs. samples/bpf: Reduce syscall overhead in map_perf_test. selftests/bpf: Improve test coverage of test_maps bpf: Convert hash map to bpf_mem_alloc. bpf: Introduce any context BPF specific memory allocator. selftest/bpf: Add test for bpf_getsockopt() bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() ... ==================== Link: https://lore.kernel.org/r/20220905161136.9150-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-06Bluetooth: hci_sync: Fix hci_read_buffer_size_syncLuiz Augusto von Dentz1-6/+6
hci_read_buffer_size_sync shall not use HCI_OP_LE_READ_BUFFER_SIZE_V2 sinze that is LE specific, instead it is hci_le_read_buffer_size_sync version that shall use it. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216382 Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-09-06Bluetooth: Add experimental wrapper for MGMT based meshBrian Gix1-8/+104
This introduces a "Mesh UUID" and an Experimental Feature bit to the hdev mask, and depending all underlying Mesh functionality on it. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-09-06Bluetooth: Implement support for MeshBrian Gix7-44/+690
The patch adds state bits, storage and HCI command chains for sending and receiving Bluetooth Mesh advertising packets, and delivery to requesting user space processes. It specifically creates 4 new MGMT commands and 2 new MGMT events: MGMT_OP_SET_MESH_RECEIVER - Sets passive scan parameters and a list of AD Types which will trigger Mesh Packet Received events MGMT_OP_MESH_READ_FEATURES - Returns information on how many outbound Mesh packets can be simultaneously queued, and what the currently queued handles are. MGMT_OP_MESH_SEND - Command to queue a specific outbound Mesh packet, with the number of times it should be sent, and the BD Addr to use. Discrete advertisments are added to the ADV Instance list. MGMT_OP_MESH_SEND_CANCEL - Command to cancel a prior outbound message request. MGMT_EV_MESH_DEVICE_FOUND - Event to deliver entire received Mesh Advertisement packet, along with timing information. MGMT_EV_MESH_PACKET_CMPLT - Event to indicate that an outbound packet is no longer queued for delivery. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-09-06tcp: fix early ETIMEDOUT after spurious non-SACK RTONeal Cardwell1-7/+18
Fix a bug reported and analyzed by Nagaraj Arankal, where the handling of a spurious non-SACK RTO could cause a connection to fail to clear retrans_stamp, causing a later RTO to very prematurely time out the connection with ETIMEDOUT. Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent report: (*1) Send one data packet on a non-SACK connection (*2) Because no ACK packet is received, the packet is retransmitted and we enter CA_Loss; but this retransmission is spurious. (*3) The ACK for the original data is received. The transmitted packet is acknowledged. The TCP timestamp is before the retrans_stamp, so tcp_may_undo() returns true, and tcp_try_undo_loss() returns true without changing state to Open (because tcp_is_sack() is false), and tcp_process_loss() returns without calling tcp_try_undo_recovery(). Normally after undoing a CA_Loss episode, tcp_fastretrans_alert() would see that the connection has returned to CA_Open and fall through and call tcp_try_to_open(), which would set retrans_stamp to 0. However, for non-SACK connections we hold the connection in CA_Loss, so do not fall through to call tcp_try_to_open() and do not set retrans_stamp to 0. So retrans_stamp is (erroneously) still non-zero. At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event". However, retrans_stamp is erroneously still set. (And we are still in CA_Loss, which is correct.) (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data packet is sent. Note: No data is transmitted between (*3) and (*4) and we disabled keep alives. The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago (step (*2)). (*5) Because no ACK packet is received, the packet is retransmitted. (*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT, prematurely, because retrans_stamp is (erroneously) too far in the past (set at the time of (*2)). This commit fixes this bug by ensuring that we reuse in tcp_try_undo_loss() the same careful logic for non-SACK connections that we have in tcp_try_undo_recovery(). To avoid duplicating logic, we factor out that logic into a new tcp_is_non_sack_preventing_reopen() helper and call that helper from both undo functions. Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss") Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com> Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/ Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220903121023.866900-1-ncardwell.kernel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-06wifi: mac80211: implement link switchingJohannes Berg6-0/+250
Implement an API function and debugfs file to switch active links. Also provide an async version of the API so drivers can call it in arbitrary contexts, e.g. while in the authorized callback. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: keep A-MSDU data in sta and per-linkBenjamin Berg7-16/+72
The A-MSDU data needs to be stored per-link and aggregated into a single value for the station. Add a new struct ieee_80211_sta_aggregates in order to store this data and a new function ieee80211_sta_recalc_aggregates to update the current data for the STA. Note that in the non MLO case the pointer in ieee80211_sta will directly reference the data in deflink.agg, which means that recalculation may be skipped in that case. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: set up beacon timing config on linksJohannes Berg1-36/+47
On secondary MLO links, I forgot to set the beacon interval and DTIM period, fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: add vif/sta link RCU dereference macrosJohannes Berg1-0/+10
Add macros (and an exported function) to allow checking some link RCU protected accesses that are happening in callbacks from mac80211 and are thus under the correct lock. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: extend ieee80211_nullfunc_get() for MLOJohannes Berg2-17/+31
Add a link_id parameter to ieee80211_nullfunc_get() to be able to obtain a correctly addressed frame. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: add ieee80211_find_sta_by_link_addrs APIJohannes Berg1-0/+37
Add a new API function ieee80211_find_sta_by_link_addrs() that looks up the STA and link ID based on interface and station link addresses. We're going to use it for mac80211-hwsim to track on the AP side which links are active. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: isolate driver from inactive linksJohannes Berg7-174/+270
In order to let the driver select active links and properly make multi-link connections, as a first step isolate the driver from inactive links, and set the active links to be only the association link for client-side interfaces. For AP side nothing changes since APs always have to have all their links active. To simplify things, update the for_each_sta_active_link() API to include the appropriate vif pointer. This also implies not allocating a chanctx for an inactive link, which requires a few more changes. Since we now no longer try to program multiple links to the driver, remove the check in the MLME code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: make smps_mode per-linkBenjamin Berg5-11/+13
The SMPS power save mode needs to be per-link rather than being shared for all links. As such, move it into struct ieee80211_link_sta. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: use correct rx link_sta instead of defaultBenjamin Berg1-34/+35
Use rx->link_sta everywhere instead of accessing the default link. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06wifi: mac80211: set link_sta in reorder timeoutJohannes Berg1-0/+1
Now that we have a link_sta pointer in the rx struct we also need to fill it in all the cases. It didn't matter so much until now as we weren't using it, but the code should really be able to assume that if the rx.sta is set, so is rx.link_sta. Fixes: ccdde7c74ffd ("wifi: mac80211: properly implement MLO key handling") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06Merge remote-tracking branch 'wireless/main' into wireless-nextJohannes Berg4-9/+13
Merge wireless/main to get the rx.link fix, which is needed for further work in this area. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-06can: raw: use guard clause to optimize nesting in raw_rcv()Ziyang Xuan1-7/+6
We can use guard clause to optimize nesting codes like if (condition) { ... } else { return; } in raw_rcv(); Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/all/0170ad1f07dbe838965df4274fce950980fa9d1f.1661584485.git.william.xuanziyang@huawei.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-06can: raw: process optimization in raw_init()Ziyang Xuan1-3/+11
Now, register notifier after register proto successfully. It can create raw socket and set socket options once register proto successfully, so it is possible missing notifier event before register notifier successfully although this is a low probability scenario. Move notifier registration to the front of proto registration like done in j1939. In addition, register_netdevice_notifier() may fail, check its result is necessary. Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/all/7af9401f0d2d9fed36c1667b5ac9b8df8f8b87ee.1661584485.git.william.xuanziyang@huawei.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-05netlink: Bounds-check struct nlmsgerr creationKees Cook2-6/+10
In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(), switch from __nlmsg_put to nlmsg_put(), and explain the bounds check for dealing with the memcpy() across a composite flexible array struct. Avoids this future run-time warning: memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16) Cc: Jakub Kicinski <kuba@kernel.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: syzbot <syzkaller@googlegroups.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05Merge tag 'for-net-2022-09-02' of ↵David S. Miller1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix regression preventing ACL packet transmission ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05ipv6: sr: fix out-of-bounds read when setting HMAC data.David Lebrun1-0/+5
The SRv6 layer allows defining HMAC data that can later be used to sign IPv6 Segment Routing Headers. This configuration is realised via netlink through four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual length of the SECRET attribute, it is possible to provide invalid combinations (e.g., secret = "", secretlen = 64). This case is not checked in the code and with an appropriately crafted netlink message, an out-of-bounds read of up to 64 bytes (max secret length) can occur past the skb end pointer and into skb_shared_info: Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 208 memcpy(hinfo->secret, secret, slen); (gdb) bt #0 seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 #1 0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600, extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>, family=<optimized out>) at net/netlink/genetlink.c:731 #2 0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00, family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775 #3 genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792 #4 0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>) at net/netlink/af_netlink.c:2501 #5 0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803 #6 0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000) at net/netlink/af_netlink.c:1319 #7 netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>) at net/netlink/af_netlink.c:1345 #8 0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921 ... (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end $1 = 0xffff88800b1b76c0 (gdb) p/x secret $2 = 0xffff88800b1b76c0 (gdb) p slen $3 = 64 '@' The OOB data can then be read back from userspace by dumping HMAC state. This commit fixes this by ensuring SECRETLEN cannot exceed the actual length of SECRET. Reported-by: Lucas Leong <wmliang.tw@gmail.com> Tested: verified that EINVAL is correctly returned when secretlen > len(secret) Fixes: 4f4853dc1c9c1 ("ipv6: sr: implement API to control SR HMAC structure") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05bonding: add all node mcast address when slave upHangbin Liu1-2/+6
When a link is enslave to bond, it need to set the interface down first. This makes the slave remove mac multicast address 33:33:00:00:00:01(The IPv6 multicast address ff02::1 is kept even when the interface down). When bond set the slave up, ipv6_mc_up() was not called due to commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master"). This is not an issue before we adding the lladdr target feature for bonding, as the mac multicast address will be added back when bond interface up and join group ff02::1. But after adding lladdr target feature for bonding. When user set a lladdr target, the unsolicited NA message with all-nodes multicast dest will be dropped as the slave interface never add 33:33:00:00:00:01 back. Fix this by calling ipv6_mc_up() to add 33:33:00:00:00:01 back when the slave interface up. Reported-by: LiLiang <liali@redhat.com> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05Merge 6.0-rc4 into tty-nextGreg Kroah-Hartman86-437/+744
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-04Merge tag 'wireless-next-2022-09-03' of ↵David S. Miller11-302/+417
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== drivers - rtw89: large update across the map, e.g. coex, pci(e), etc. - ath9k: uninit memory read fix - ath10k: small peer map fix and a WCN3990 device fix - wfx: underflow stack - the "change MAC address while IFF_UP" change from James we discussed - more MLO work, including a set of fixes for the previous code, now that we have more code we can exercise it more - prevent some features with MLO that aren't ready yet (AP_VLAN and 4-address connections) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-04Merge tag 'wireless-2022-09-03' of ↵David S. Miller4-9/+13
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes berg says: ==================== We have a handful of fixes: - fix DMA from stack in wilc1000 driver - fix crash on chip reset failure in mt7921e - fix for the reported warning on aggregation timer expiry - check packet lengths in hwsim virtio paths - fix compiler warnings/errors with AAD construction by using struct_group - fix Intel 4965 driver rate scale operation - release channel contexts correctly in mac80211 mlme code ==================== Signed-off-by: David S. Miller <davem@davemloft.net>