linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2022-12-22	selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID	Hao Sun	1	-0/+42
	Verify that nullness information is not porpagated in the branches of register to register JEQ and JNE operations if one of them is PTR_TO_BTF_ID. Implement this in C level so we can use CO-RE. Signed-off-by: Hao Sun <sunhao.th@gmail.com> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221222024414.29539-2-sunhao.th@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-22	selftests/bpf: Test bpf_skb_adjust_room on CHECKSUM_PARTIAL	Martin KaFai Lau	2	-0/+74
	When the bpf_skb_adjust_room() shrinks the skb such that its csum_start is invalid, the skb->ip_summed should be reset from CHECKSUM_PARTIAL to CHECKSUM_NONE. The commit 54c3f1a81421 ("bpf: pull before calling skb_postpull_rcsum()") fixed it. This patch adds a test to ensure the skb->ip_summed changed from CHECKSUM_PARTIAL to CHECKSUM_NONE after bpf_skb_adjust_room(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221221185653.1589961-1-martin.lau@linux.dev
2022-12-14	selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program	Toke Høiland-Jørgensen	1	-0/+24
	This adds a simple test for inserting an XDP program into a cpumap that is "owned" by an XDP program that was loaded as PROG_TYPE_EXT (as libxdp does). Prior to the kernel fix this would fail because the map type ownership would be set to PROG_TYPE_EXT instead of being resolved to PROG_TYPE_XDP. v5: - Fix a few nits from Andrii, add his ACK v4: - Use skeletons for selftest v3: - Update comment to better explain the cause - Add Yonghong's ACK Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221214230254.790066-2-toke@redhat.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-14	selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n	Yonghong Song	2	-5/+5
	Kernel test robot reported bpf selftest build failure when CONFIG_SMP is not set. The error message looks below: >> progs/rcu_read_lock.c:256:34: error: no member named 'last_wakee' in 'struct task_struct' last_wakee = task->real_parent->last_wakee; ~~~~~~~~~~~~~~~~~ ^ 1 error generated. When CONFIG_SMP is not set, the field 'last_wakee' is not available in struct 'task_struct'. Hence the above compilation failure. To fix the issue, let us choose another field 'group_leader' which is available regardless of CONFIG_SMP set or not. Fixes: fe147956fca4 ("bpf/selftests: Add selftests for new task kfuncs") Fixes: 48671232fcb8 ("selftests/bpf: Add tests for bpf_rcu_read_lock()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20221213012224.379581-1-yhs@fb.com
2022-12-08	selftests/bpf: Add test for dynptr reinit in user_ringbuf callback	Kumar Kartikeya Dwivedi	1	-8/+43
	The original support for bpf_user_ringbuf_drain callbacks simply short-circuited checks for the dynptr state, allowing users to pass PTR_TO_DYNPTR (now CONST_PTR_TO_DYNPTR) to helpers that initialize a dynptr. This bug would have also surfaced with other dynptr helpers in the future that changed dynptr view or modified it in some way. Include test cases for all cases, i.e. both bpf_dynptr_from_mem and bpf_ringbuf_reserve_dynptr, and ensure verifier rejects both of them. Without the fix, both of these programs load and pass verification. While at it, remove sys_nanosleep target from failure cases' SEC definition, as there is no such tracepoint. Acked-by: David Vernet <void@manifault.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221207204141.308952-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-08	bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func	Kumar Kartikeya Dwivedi	1	-12/+0
	ARG_PTR_TO_DYNPTR is akin to ARG_PTR_TO_TIMER, ARG_PTR_TO_KPTR, where the underlying register type is subjected to more special checks to determine the type of object represented by the pointer and its state consistency. Move dynptr checks to their own 'process_dynptr_func' function so that is consistent and in-line with existing code. This also makes it easier to reuse this code for kfunc handling. Then, reuse this consolidated function in kfunc dynptr handling too. Note that for kfuncs, the arg_type constraint of DYNPTR_TYPE_LOCAL has been lifted. Acked-by: David Vernet <void@manifault.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221207204141.308952-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-07	selftests/bpf: convert dynptr_fail and map_kptr_fail subtests to generic tester	Andrii Nakryiko	3	-0/+59
	Convert big chunks of dynptr and map_kptr subtests to use generic verification_tester. They are switched from using manually maintained tables of test cases, specifying program name and expected error verifier message, to btf_decl_tag-based annotations directly on corresponding BPF programs: __failure to specify that BPF program is expected to fail verification, and __msg() to specify expected log message. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221207201648.2990661-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-07	selftests/bpf: add generic BPF program tester-loader	Andrii Nakryiko	1	-0/+5
	It's become a common pattern to have a collection of small BPF programs in one BPF object file, each representing one test case. On user-space side of such tests we maintain a table of program names and expected failure or success, along with optional expected verifier log message. This works, but each set of tests reimplement this mundane code over and over again, which is a waste of time for anyone trying to add a new set of tests. Furthermore, it's quite error prone as it's way too easy to miss some entries in these manually maintained test tables (as evidences by dynptr_fail tests, in which ringbuf_release_uninit_dynptr subtest was accidentally missed; this is fixed in next patch). So this patch implements generic test_loader, which accepts skeleton name and handles the rest of details: opens and loads BPF object file, making sure each program is tested in isolation. Optionally each test case can specify expected BPF verifier log message. In case of failure, tester makes sure to report verifier log, but it also reports verifier log in verbose mode unconditionally. Now, the interesting deviation from existing custom implementations is the use of btf_decl_tag attribute to specify expected-to-fail vs expected-to-succeed markers and, optionally, expected log message directly next to BPF program source code, eliminating the need to manually create and update table of tests. We define few macros wrapping btf_decl_tag with a convention that all values of btf_decl_tag start with "comment:" prefix, and then utilizing a very simple "just_some_text_tag" or "some_key_name=<value>" pattern to define things like expected success/failure, expected verifier message, extra verifier log level (if necessary). This approach is demonstrated by next patch in which two existing sets of failure tests are converted. Tester supports both expected-to-fail and expected-to-succeed programs, though this patch set didn't convert any existing expected-to-succeed programs yet, as existing tests couple BPF program loading with their further execution through attach or test_prog_run. One way to allow testing scenarios like this would be ability to specify custom callback, executed for each successfully loaded BPF program. This is left for follow up patches, after some more analysis of existing test cases. This test_loader is, hopefully, a start of a test_verifier-like runner, but integrated into test_progs infrastructure. It will allow much better "user experience" of defining low-level verification tests that can take advantage of all the libbpf-provided nicety features on BPF side: global variables, declarative maps, etc. All while having a choice of defining it in C or as BPF assembly (through __attribute__((naked)) functions and using embedded asm), depending on what makes most sense in each particular case. This will be explored in follow up patches as well. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221207201648.2990661-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-06	bpf: Don't use rcu_users to refcount in task kfuncs	David Vernet	2	-2/+12
	A series of prior patches added some kfuncs that allow struct task_struct * objects to be used as kptrs. These kfuncs leveraged the 'refcount_t rcu_users' field of the task for performing refcounting. This field was used instead of 'refcount_t usage', as we wanted to leverage the safety provided by RCU for ensuring a task's lifetime. A struct task_struct is refcounted by two different refcount_t fields: 1. p->usage: The "true" refcount field which task lifetime. The task is freed as soon as this refcount drops to 0. 2. p->rcu_users: An "RCU users" refcount field which is statically initialized to 2, and is co-located in a union with a struct rcu_head field (p->rcu). p->rcu_users essentially encapsulates a single p->usage refcount, and when p->rcu_users goes to 0, an RCU callback is scheduled on the struct rcu_head which decrements the p->usage refcount. Our logic was that by using p->rcu_users, we would be able to use RCU to safely issue refcount_inc_not_zero() a task's rcu_users field to determine if a task could still be acquired, or was exiting. Unfortunately, this does not work due to p->rcu_users and p->rcu sharing a union. When p->rcu_users goes to 0, an RCU callback is scheduled to drop a single p->usage refcount, and because the fields share a union, the refcount immediately becomes nonzero again after the callback is scheduled. If we were to split the fields out of the union, this wouldn't be a problem. Doing so should also be rather non-controversial, as there are a number of places in struct task_struct that have padding which we could use to avoid growing the structure by splitting up the fields. For now, so as to fix the kfuncs to be correct, this patch instead updates bpf_task_acquire() and bpf_task_release() to use the p->usage field for refcounting via the get_task_struct() and put_task_struct() functions. Because we can no longer rely on RCU, the change also guts the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions pending a resolution on the above problem. In addition, the task fixes the kfunc and rcu_read_lock selftests to expect this new behavior. Fixes: 90660309b0c7 ("bpf: Add kfuncs for storing struct task_struct * as a kptr") Fixes: fca1aa75518c ("bpf: Handle MEM_RCU type properly") Reported-by: Matus Jokay <matus.jokay@stuba.sk> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-06	selftests/bpf: Allow building bpf tests with CONFIG_XFRM_INTERFACE=[m\|n]	Martin KaFai Lau	1	-4/+9
	It is useful to use vmlinux.h in the xfrm_info test like other kfunc tests do. In particular, it is common for kfunc bpf prog that requires to use other core kernel structures in vmlinux.h Although vmlinux.h is preferred, it needs a ___local flavor of struct bpf_xfrm_info in order to build the bpf selftests when CONFIG_XFRM_INTERFACE=[m\|n]. Cc: Eyal Birger <eyal.birger@gmail.com> Fixes: 90a3a05eb33f ("selftests/bpf: add xfrm_info tests") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221206193554.1059757-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-05	selftests/bpf: add xfrm_info tests	Eyal Birger	2	-0/+38
	Test the xfrm_info kfunc helpers. The test setup creates three name spaces - NS0, NS1, NS2. XFRM tunnels are setup between NS0 and the two other NSs. The kfunc helpers are used to steer traffic from NS0 to the other NSs based on a userspace populated bpf global variable and validate that the return traffic had arrived from the desired NS. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-5-eyal.birger@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-04	selftests/bpf: Fix conflicts with built-in functions in bpf_iter_ksym	James Hilliard	1	-3/+3
	Both tolower and toupper are built in c functions, we should not redefine them as this can result in a build error. Fixes the following errors: progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch] 10 \| static inline char tolower(char c) \| ^~~~~~~ progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>' 4 \| #include <bpf/bpf_helpers.h> +++ \|+#include <ctype.h> 5 \| progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch] 17 \| static inline char toupper(char c) \| ^~~~~~~ progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>' See background on this sort of issue: https://stackoverflow.com/a/20582607 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213 (C99, 7.1.3p1) "All identifiers with external linkage in any of the following subclauses (including the future library directions) are always reserved for use as identifiers with external linkage." This is documented behavior in GCC: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2 Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221203010847.2191265-1-james.hilliard1@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-04	bpf: Add sleepable prog tests for cgrp local storage	Yonghong Song	1	-0/+80
	Add three tests for cgrp local storage support for sleepable progs. Two tests can load and run properly, one for cgroup_iter, another for passing current->cgroups->dfl_cgrp to bpf_cgrp_storage_get() helper. One test has bpf_rcu_read_lock() and failed to load. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221201050449.2785613-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-04	bpf: Do not mark certain LSM hook arguments as trusted	Yonghong Song	1	-0/+11
	Martin mentioned that the verifier cannot assume arguments from LSM hook sk_alloc_security being trusted since after the hook is called, the sk ref_count is set to 1. This will overwrite the ref_count changed by the bpf program and may cause ref_count underflow later on. I then further checked some other hooks. For example, for bpf_lsm_file_alloc() hook in fs/file_table.c, f->f_cred = get_cred(cred); error = security_file_alloc(f); if (unlikely(error)) { file_free_rcu(&f->f_rcuhead); return ERR_PTR(error); } atomic_long_set(&f->f_count, 1); The input parameter 'f' to security_file_alloc() cannot be trusted as well. Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other hooks should not be considered as trusted. This may not be a complete list, but it covers common usage for sk and task. Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-04	selftests/bpf: Fix rcu_read_lock test with new MEM_RCU semantics	Yonghong Song	1	-10/+45
	Add MEM_RCU pointer null checking for related tests. Also modified task_acquire test so it takes a rcu ptr 'ptr' where 'ptr = rcu_ptr->rcu_field'. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203184607.478314-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-01	selftests/bpf: Validate multiple ref release_on_unlock logic	Dave Marchevsky	1	-1/+16
	Modify list_push_pop_multiple to alloc and insert nodes 2-at-a-time. Without the previous patch's fix, this block of code: bpf_spin_lock(lock); bpf_list_push_front(head, &f[i]->node); bpf_list_push_front(head, &f[i + 1]->node); bpf_spin_unlock(lock); would fail check_reference_leak check as release_on_unlock logic would miss a ref that should've been released. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221201183406.1203621-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-30	bpf: Tighten ptr_to_btf_id checks.	Alexei Starovoitov	1	-0/+1
	The networking programs typically don't require CAP_PERFMON, but through kfuncs like bpf_cast_to_kern_ctx() they can access memory through PTR_TO_BTF_ID. In such case enforce CAP_PERFMON. Also make sure that only GPL programs can access kernel data structures. All kfuncs require GPL already. Also remove allow_ptr_to_map_access. It's the same as allow_ptr_leaks and different name for the same check only causes confusion. Fixes: fd264ca02094 ("bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx") Fixes: 50c6b8a9aea2 ("selftests/bpf: Add a test for btf_type_tag "percpu"") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221125220617.26846-1-alexei.starovoitov@gmail.com
2022-11-28	Daniel Borkmann says:	Jakub Kicinski	15	-2/+2700
	==================== bpf-next 2022-11-25 We've added 101 non-merge commits during the last 11 day(s) which contain a total of 109 files changed, 8827 insertions(+), 1129 deletions(-). The main changes are: 1) Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF, from Kumar Kartikeya Dwivedi. 2) Add bpf_rcu_read_{,un}lock() support for sleepable programs, from Yonghong Song. 3) Add support storing struct task_struct objects as kptrs in maps, from David Vernet. 4) Batch of BPF map documentation improvements, from Maryam Tahhan and Donald Hunter. 5) Improve BPF verifier to propagate nullness information for branches of register to register comparisons, from Eduard Zingerman. 6) Fix cgroup BPF iter infra to hold reference on the start cgroup, from Hou Tao. 7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted given it is not the case for them, from Alexei Starovoitov. 8) Improve BPF verifier's realloc handling to better play along with dynamic runtime analysis tools like KASAN and friends, from Kees Cook. 9) Remove legacy libbpf mode support from bpftool, from Sahid Orentino Ferdjaoui. 10) Rework zero-len skb redirection checks to avoid potentially breaking existing BPF test infra users, from Stanislav Fomichev. 11) Two small refactorings which are independent and have been split out of the XDP queueing RFC series, from Toke Høiland-Jørgensen. 12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen. 13) Documentation on how to run BPF CI without patch submission, from Daniel Müller. Signed-off-by: Jakub Kicinski <kuba@kernel.org> ==================== Link: https://lore.kernel.org/r/20221125012450.441-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-24	selftests/bpf: Add tests for bpf_rcu_read_lock()	Yonghong Song	1	-0/+290
	Add a few positive/negative tests to test bpf_rcu_read_lock() and its corresponding verifier support. The new test will fail on s390x and aarch64, so an entry is added to each of their respective deny lists. Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221124053222.2374650-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-23	selftests/bpf: Add selftests for bpf_task_from_pid()	David Vernet	3	-0/+87
	Add some selftest testcases that validate the expected behavior of the bpf_task_from_pid() kfunc that was added in the prior patch. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122145300.251210-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-22	selftests/bpf: Add selftests for bpf_cgroup_ancestor() kfunc	David Vernet	2	-0/+46
	bpf_cgroup_ancestor() allows BPF programs to access the ancestor of a struct cgroup *. This patch adds selftests that validate its expected behavior. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122055458.173143-5-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-22	selftests/bpf: Add cgroup kfunc / kptr selftests	David Vernet	3	-0/+456
	This patch adds a selftest suite to validate the cgroup kfuncs that were added in the prior patch. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122055458.173143-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-22	selftests/bpf: Workaround for llvm nop-4 bug	Alexei Starovoitov	1	-1/+1
	Currently LLVM fails to recognize .data.* as data section and defaults to .text section. Later BPF backend tries to emit 4-byte NOP instruction which doesn't exist in BPF ISA and aborts. The fix for LLVM is pending: https://reviews.llvm.org/D138477 While waiting for the fix lets workaround the linked_list test case by using .bss.* prefix which is properly recognized by LLVM as BSS section. Fix libbpf to support .bss. prefix and adjust tests. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-22	Revert "selftests/bpf: Temporarily disable linked list tests"	Alexei Starovoitov	3	-20/+9
	This reverts commit 0a2f85a1be4328d29aefa54684d10c23a3298fef. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-21	selftests/bpf: Make sure zero-len skbs aren't redirectable	Stanislav Fomichev	1	-0/+37
	LWT_XMIT to test L3 case, TC to test L2 case. v2: - s/veth_ifindex/ipip_ifindex/ in two places (Martin) - add comment about which condition triggers the rejection (Martin) Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221121180340.1983627-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-11-20	bpf: Add type cast unit tests	Yonghong Song	1	-0/+83
	Three tests are added. One is from John Fastabend ({1]) which tests tracing style access for xdp program from the kernel ctx. Another is a tc test to test both kernel ctx tracing style access and explicit non-ctx type cast. The third one is for negative tests including two tests, a tp_bpf test where the bpf_rdonly_cast() returns a untrusted ptr which cannot be used as helper argument, and a tracepoint test where the kernel ctx is a u64. Also added the test to DENYLIST.s390x since s390 does not currently support calling kernel functions in JIT mode. [1] https://lore.kernel.org/bpf/20221109215242.1279993-1-john.fastabend@gmail.com/ Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221120195442.3114844-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-20	bpf/selftests: Add selftests for new task kfuncs	David Vernet	3	-0/+480
	A previous change added a series of kfuncs for storing struct task_struct objects as referenced kptrs. This patch adds a new task_kfunc test suite for validating their expected behavior. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221120051004.3605026-5-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17	selftests/bpf: Temporarily disable linked list tests	Kumar Kartikeya Dwivedi	3	-9/+20
	The latest clang nightly as of writing crashes with the given test case for BPF linked lists wherever global glock, ghead, glock2 are used, hence comment out the parts that cause the crash, and prepare this commit so that it can be reverted when the fix has been made. More context in [0]. [0]: https://lore.kernel.org/bpf/d56223f9-483e-fbc1-4564-44c0858a1e3e@meta.com Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-25-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17	selftests/bpf: Add BPF linked list API tests	Kumar Kartikeya Dwivedi	3	-0/+1007
	Include various tests covering the success and failure cases. Also, run the success cases at runtime to verify correctness of linked list manipulation routines, in addition to ensuring successful verification. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-23-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17	selftests/bpf: Add failure test cases for spin lock pairing	Kumar Kartikeya Dwivedi	1	-0/+204
	First, ensure that whenever a bpf_spin_lock is present in an allocation, the reg->id is preserved. This won't be true for global variables however, since they have a single map value per map, hence the verifier harcodes it to 0 (so that multiple pseudo ldimm64 insns can yield the same lock object per map at a given offset). Next, add test cases for all possible combinations (kptr, global, map value, inner map value). Since we lifted restriction on locking in inner maps, also add test cases for them. Currently, each lookup into an inner map gets a fresh reg->id, so even if the reg->map_ptr is same, they will be treated as separate allocations and the incorrect unlock pairing will be rejected. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-22-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17	selftests/bpf: Update spinlock selftest	Kumar Kartikeya Dwivedi	1	-2/+2
	Make updates in preparation for adding more test cases to this selftest: - Convert from CHECK_ to ASSERT macros. - Use BPF skeleton - Fix typo sping -> spin - Rename spinlock.c -> spin_lock.c Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-21-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	1	-0/+5
	include/linux/bpf.h 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record") f71b2f64177a ("bpf: Refactor map->off_arr handling") https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-16	selftests/bpf: fix memory leak of lsm_cgroup	Wang Yufen	1	-0/+8
	kmemleak reports this issue: unreferenced object 0xffff88810b7835c0 (size 32): comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................ backtrace: [<00000000376cdeab>] kmalloc_trace+0x27/0x110 [<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110 [<000000003959008f>] security_sk_alloc+0x47/0x80 [<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0 [<0000000002d6343a>] sk_alloc+0x3b/0x940 [<000000009812a46d>] unix_create1+0x8f/0x3d0 [<000000005ed0976b>] unix_create+0xa1/0x150 [<0000000086a1d27f>] __sock_create+0x233/0x4a0 [<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110 [<0000000007c63f20>] __sys_socket+0x49/0xf0 [<00000000b08753c8>] __x64_sys_socket+0x42/0x50 [<00000000b56e26b3>] do_syscall_64+0x3b/0x90 [<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd The issue occurs in the following scenarios: unix_create1() sk_alloc() sk_prot_alloc() security_sk_alloc() call_int_hook() hlist_for_each_entry() entry1->hook.sk_alloc_security <-- selinux_sk_alloc_security() succeeded, <-- sk->security alloced here. entry2->hook.sk_alloc_security <-- bpf_lsm_sk_alloc_security() failed goto out_free; ... <-- the sk->security not freed, memleak The core problem is that the LSM is not yet fully stacked (work is actively going on in this space) which means that some LSM hooks do not support multiple LSMs at the same time. To fix, skip the "EPERM" test when it runs in the environments that already have non-bpf lsms installed Fixes: dca85aac8895 ("selftests/bpf: lsm_cgroup functional test") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Cc: Stanislav Fomichev <sdf@google.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-11-11	selftests/bpf: Test skops->skb_hwtstamp	Martin KaFai Lau	1	-0/+4
	This patch tests reading the skops->skb_hwtstamp field. A local test was also done such that the shinfo hwtstamp was temporary set to a non zero value in the kernel bpf_skops_parse_hdr() and the same value can be read by the skops test. An adjustment is needed to the btf_dump selftest because the changes in the 'struct bpf_sock_ops'. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221107230420.4192307-4-martin.lau@linux.dev
2022-11-11	selftests: bpf: Add a test when bpf_probe_read_kernel_str() returns EFAULT	Alban Crequy	1	-0/+5
	This commit tests previous fix of bpf_probe_read_kernel_str(). The BPF helper bpf_probe_read_kernel_str should return -EFAULT when given a bad source pointer and the target buffer should only be modified to make the string NULL terminated. bpf_probe_read_kernel_str() was previously inserting a NULL before the beginning of the dst buffer. This test should ensure that the implementation stays correct for now on. Without the fix, this test will fail as follows: $ cd tools/testing/selftests/bpf $ make $ sudo ./test_progs --name=varlen ... test_varlen:FAIL:check got 0 != exp 66 Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221110085614.111213-3-albancrequy@linux.microsoft.com Changes v1 to v2: - add ack tag - fix my email - rebase on bpf tree and tag for bpf tree
2022-11-02	Merge tag 'for-netdev' of ↵	Jakub Kicinski	12	-4/+538
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2022-11-02 We've added 70 non-merge commits during the last 14 day(s) which contain a total of 96 files changed, 3203 insertions(+), 640 deletions(-). The main changes are: 1) Make cgroup local storage available to non-cgroup attached BPF programs such as tc BPF ones, from Yonghong Song. 2) Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers, from Martin KaFai Lau. 3) Add LLVM disassembler as default library for dumping JITed code in bpftool, from Quentin Monnet. 4) Various kprobe_multi_link fixes related to kernel modules, from Jiri Olsa. 5) Optimize x86-64 JIT with emitting BMI2-based shift instructions, from Jie Meng. 6) Improve BPF verifier's memory type compatibility for map key/value arguments, from Dave Marchevsky. 7) Only create mmap-able data section maps in libbpf when data is exposed via skeletons, from Andrii Nakryiko. 8) Add an autoattach option for bpftool to load all object assets, from Wang Yufen. 9) Various memory handling fixes for libbpf and BPF selftests, from Xu Kuohai. 10) Initial support for BPF selftest's vmtest.sh on arm64, from Manu Bretelle. 11) Improve libbpf's BTF handling to dedup identical structs, from Alan Maguire. 12) Add BPF CI and denylist documentation for BPF selftests, from Daniel Müller. 13) Check BPF cpumap max_entries before doing allocation work, from Florian Lehner. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits) samples/bpf: Fix typo in README bpf: Remove the obsolte u64_stats_fetch_*_irq() users. bpf: check max_entries before allocating memory bpf: Fix a typo in comment for DFS algorithm bpftool: Fix spelling mistake "disasembler" -> "disassembler" selftests/bpf: Fix bpftool synctypes checking failure selftests/bpf: Panic on hard/soft lockup docs/bpf: Add documentation for new cgroup local storage selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x selftests/bpf: Add selftests for new cgroup local storage selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str bpftool: Support new cgroup local storage libbpf: Support new cgroup local storage bpf: Implement cgroup storage available to non-cgroup-attached bpf progs bpf: Refactor some inode/task/sk storage functions for reuse bpf: Make struct cgroup btf id global selftests/bpf: Tracing prog can still do lookup under busy lock selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection bpf: Add new bpf_task_storage_delete proto with no deadlock detection bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check ... ==================== Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25	selftests/bpf: Add selftests for new cgroup local storage	Yonghong Song	4	-0/+285
	Add four tests for new cgroup local storage, (1) testing bpf program helpers and user space map APIs, (2) testing recursive fentry triggering won't deadlock, (3) testing progs attached to cgroups, and (4) a negative test if the bpf_cgrp_storage_get() helper key is not a cgroup btf id. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221026042911.675546-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25	selftests/bpf: Tracing prog can still do lookup under busy lock	Martin KaFai Lau	1	-3/+40
	This patch modifies the task_ls_recursion test to check that the first bpf_task_storage_get(&map_a, ...) in BPF_PROG(on_update) can still do the lockless lookup even it cannot acquire the percpu busy lock. If the lookup succeeds, it will increment the value by 1 and the value in the task storage map_a will become 200+1=201. After that, BPF_PROG(on_update) tries to delete from map_a and should get -EBUSY because it cannot acquire the percpu busy lock after finding the data. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221025184524.3526117-10-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25	selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to ↵	Martin KaFai Lau	1	-0/+47
	deadlock detection This patch adds a test to check for deadlock failure in bpf_task_storage_{get,delete} when called by a sleepable bpf_lsm prog. It also checks if the prog_info.recursion_misses is non zero. The test starts with 32 threads and they are affinitized to one cpu. In my qemu setup, with CONFIG_PREEMPT=y, I can reproduce it within one second if it is run without the previous patches of this set. Here is the test error message before adding the no deadlock detection version of the bpf_task_storage_{get,delete}: test_nodeadlock:FAIL:bpf_task_storage_get busy unexpected bpf_task_storage_get busy: actual 2 != expected 0 test_nodeadlock:FAIL:bpf_task_storage_delete busy unexpected bpf_task_storage_delete busy: actual 2 != expected 0 Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221025184524.3526117-9-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25	selftests/bpf: Add kprobe_multi kmod attach api tests	Jiri Olsa	1	-0/+50
	Adding kprobe_multi kmod attach api tests that attach bpf_testmod functions via bpf_program__attach_kprobe_multi_opts. Running it as serial test, because we don't want other tests to reload bpf_testmod while it's running. Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221025134148.3300700-9-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25	selftests/bpf: Add kprobe_multi check to module attach test	Jiri Olsa	1	-0/+6
	Adding test that makes sure the kernel module won't be removed if there's kprobe multi link defined on top of it. Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221025134148.3300700-8-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-24	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	1	-2/+2
	include/linux/net.h a5ef058dc4d9 ("net: introduce and use custom sockopt socket flag") e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-21	selftests/bpf: Add write to hashmap to array_map iter test	Dave Marchevsky	1	-1/+20
	Modify iter prog in existing bpf_iter_bpf_array_map.c, which currently dumps arraymap key/val, to also do a write of (val, key) into a newly-added hashmap. Confirm that the write succeeds as expected by modifying the userspace runner program. Before a change added in an earlier commit - considering PTR_TO_BUF reg a valid input to helpers which expect MAP_{KEY,VAL} - the verifier would've rejected this prog change due to type mismatch. Since using current iter's key/val to access a separate map is a reasonable usecase, let's add support for it. Note that the test prog cannot directly write (val, key) into hashmap via bpf_map_update_elem when both come from iter context because key is marked MEM_RDONLY. This is due to bpf_map_update_elem - and other basic map helpers - taking ARG_PTR_TO_MAP_{KEY,VALUE} w/o MEM_RDONLY type flag. bpf_map_{lookup,update,delete}_elem don't modify their input key/val so it should be possible to tag their args READONLY, but due to the ubiquitous use of these helpers and verifier checks for type == MAP_VALUE, such a change is nontrivial and seems better to address in a followup series. Also fixup some 'goto's in test runner's map checking loop. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221020160721.4030492-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-21	selftests/bpf: Add test verifying bpf_ringbuf_reserve retval use in map ops	Dave Marchevsky	1	-0/+70
	Add a test_ringbuf_map_key test prog, borrowing heavily from extant test_ringbuf.c. The program tries to use the result of bpf_ringbuf_reserve as map_key, which was not possible before previouis commits in this series. The test runner added to prog_tests/ringbuf.c verifies that the program loads and does basic sanity checks to confirm that it runs as expected. Also, refactor test_ringbuf such that runners for existing test_ringbuf and newly-added test_ringbuf_map_key are subtests of 'ringbuf' top-level test. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221020160721.4030492-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-21	selftests/bpf: fix task_local_storage/exit_creds rcu usage	Delyan Kratunov	1	-0/+3
	BPF CI has revealed flakiness in the task_local_storage/exit_creds test. The failure point in CI [1] is that null_ptr_count is equal to 0, which indicates that the program hasn't run yet. This points to the kern_sync_rcu (sys_membarrier -> synchronize_rcu underneath) not waiting sufficiently. Indeed, synchronize_rcu only waits for read-side sections that started before the call. If the program execution starts during the synchronize_rcu invocation (due to, say, preemption), the test won't wait long enough. As a speculative fix, make the synchornize_rcu calls in a loop until an explicit run counter has gone up. [1]: https://github.com/kernel-patches/bpf/actions/runs/3268263235/jobs/5374940791 Signed-off-by: Delyan Kratunov <delyank@meta.com> Link: https://lore.kernel.org/r/156d4ef82275a074e8da8f4cffbd01b0c1466493.camel@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19	libbpf: add non-mmapable data section selftest	Andrii Nakryiko	1	-0/+17
	Add non-mmapable data section to test_skeleton selftest and make sure it really isn't mmapable by trying to mmap() it anyways. Also make sure that libbpf doesn't report BPF_F_MMAPABLE flag to users. Additional, some more manual testing was performed that this feature works as intended. Looking at created map through bpftool shows that flags passed to kernel are indeed zero: $ bpftool map show ... 1782: array name .data.non_mmapa flags 0x0 key 4B value 16B max_entries 1 memlock 4096B btf_id 1169 pids test_progs(8311) ... Checking BTF uploaded to kernel for this map shows that zero_key and zero_value are indeed marked as static, even though zero_key is actually original global (but STV_HIDDEN) variable: $ bpftool btf dump id 1169 ... [51] VAR 'zero_key' type_id=2, linkage=static [52] VAR 'zero_value' type_id=7, linkage=static ... [62] DATASEC '.data.non_mmapable' size=16 vlen=2 type_id=51 offset=0 size=4 (VAR 'zero_key') type_id=52 offset=4 size=12 (VAR 'zero_value') ... And original BTF does have zero_key marked as linkage=global: $ bpftool btf dump file test_skeleton.bpf.linked3.o ... [51] VAR 'zero_key' type_id=2, linkage=global [52] VAR 'zero_value' type_id=7, linkage=static ... [62] DATASEC '.data.non_mmapable' size=16 vlen=2 type_id=51 offset=0 size=4 (VAR 'zero_key') type_id=52 offset=4 size=12 (VAR 'zero_value') Bpftool didn't require any changes at all because it checks whether internal map is mmapable already, but just to double-check generated skeleton, we see that .data.non_mmapable neither sets mmaped pointer nor has a corresponding field in the skeleton: $ grep non_mmapable test_skeleton.skel.h struct bpf_map data_non_mmapable; s->maps[7].name = ".data.non_mmapable"; s->maps[7].map = &obj->maps.data_non_mmapable; But .data.read_mostly has all of those things: $ grep read_mostly test_skeleton.skel.h struct bpf_map data_read_mostly; struct test_skeleton__data_read_mostly { int read_mostly_var; } data_read_mostly; s->maps[6].name = ".data.read_mostly"; s->maps[6].map = &obj->maps.data_read_mostly; s->maps[6].mmaped = (void *)&obj->data_read_mostly; _Static_assert(sizeof(s->data_read_mostly->read_mostly_var) == 4, "unexpected size of 'read_mostly_var'"); Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221019002816.359650-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-13	selftests/bpf: Make bpf_user_ringbuf_drain() selftest callback return 1	David Vernet	1	-2/+2
	In commit 1bfe26fb0827 ("bpf: Add verifier support for custom callback return range"), the verifier was updated to require callbacks to BPF helpers to explicitly specify the range of values that can be returned. bpf_user_ringbuf_drain() was merged after this in commit 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper"), and this change in default behavior was missed. This patch updates the BPF_MAP_TYPE_USER_RINGBUF selftests to also return 1 from a bpf_user_ringbuf_drain() callback so as to properly test this going forward. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221012232015.1510043-3-void@manifault.com
2022-10-10	selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id()	Roberto Sassu	1	-0/+36
	Introduce the data_input map, write-protected with a small eBPF program implementing the lsm/bpf_map hook. Then, ensure that bpf_map_get_fd_by_id() and bpf_map_get_fd_by_id_opts() with NULL opts don't succeed due to requesting read-write access to the write-protected map. Also, ensure that bpf_map_get_fd_by_id_opts() with open_flags in opts set to BPF_F_RDONLY instead succeeds. After obtaining a read-only fd, ensure that only map lookup succeeds and not update. Ensure that update works only with the read-write fd obtained at program loading time, when the write protection was not yet enabled. Finally, ensure that the other _opts variants of bpf_*_get_fd_by_id() don't work if the BPF_F_RDONLY flag is set in opts (due to the kernel not handling the open_flags member of bpf_attr). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221006110736.84253-7-roberto.sassu@huaweicloud.com
2022-10-05	selftests/bpf: Test btf dump for struct with padding only fields	Eduard Zingerman	1	-0/+9
	Structures with zero regular fields but some padding constitute a special case in btf_dump.c:btf_dump_emit_struct_def with regards to newline before closing '}'. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221001104425.415768-2-eddyz87@gmail.com
2022-10-04	Merge tag 'net-next-6.1' of ↵	Linus Torvalds	46	-98/+2228
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ...