summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-08-25bpf: Introduce cgroup iterHao Luo6-2/+357
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only the given cgroup. When attaching cgroup_iter, one can set a cgroup to the iter_link created from attaching. This cgroup is passed as a file descriptor or cgroup id and serves as the starting point of the walk. If no cgroup is specified, the starting point will be the root cgroup v2. For walking descendants, one can specify the order: either pre-order or post-order. For walking ancestors, the walk starts at the specified cgroup and ends at the root. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. Currently only one session is supported, which means, depending on the volume of data bpf program intends to send to user space, the number of cgroups that can be walked is limited. For example, given the current buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can be walked is 512. This is a limitation of cgroup_iter. If the output data is larger than the kernel buffer size, after all data in the kernel buffer is consumed by user space, the subsequent read() syscall will signal EOPNOTSUPP. In order to work around, the user may have to update their program to reduce the volume of data sent to output. For example, skip some uninteresting cgroups. In future, we may extend bpf_iter flags to allow customizing buffer size. Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-24selftests/bpf: Fix wrong size passed to bpf_setsockopt()Yang Yingliang1-3/+7
sizeof(new_cc) is not real memory size that new_cc points to; introduce a new_cc_len to store the size and then pass it to bpf_setsockopt(). Fixes: 31123c0360e0 ("selftests/bpf: bpf_setsockopt tests") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220824013907.380448-1-yangyingliang@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-24selftests/bpf: Add cb_refs test to s390x deny listDaniel Müller1-0/+1
The cb_refs BPF selftest is failing execution on s390x machines. This is a newly added test that requires a feature not presently supported on this architecture. Denylist the test for this architecture. Fixes: 3cf7e7d8685c ("selftests/bpf: Add tests for reference state fixes for callbacks") Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220824163906.1186832-1-deso@posteo.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-24Merge branch 'Fix reference state management for synchronous callbacks'Alexei Starovoitov5-13/+212
Kumar Kartikeya Dwivedi says: ==================== This is patch 1, 2 + their individual tests split into a separate series from the RFC, so that these can be taken in, while we continue working towards a fix for handling stack access inside the callback. Changelog: ---------- v1 -> v2: v1: https://lore.kernel.org/bpf/20220822131923.21476-1-memxor@gmail.com * Fix error for test_progs-no_alu32 due to distinct alloc_insn in errstr RFC v1 -> v1: RFC v1: https://lore.kernel.org/bpf/20220815051540.18791-1-memxor@gmail.com * Fix up commit log to add more explanation (Alexei) * Split reference state fix out into a separate series ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-24selftests/bpf: Add tests for reference state fixes for callbacksKumar Kartikeya Dwivedi2-0/+164
These are regression tests to ensure we don't end up in invalid runtime state for helpers that execute callbacks multiple times. It exercises the fixes to verifier callback handling for reference state in previous patches. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220823013226.24988-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-24bpf: Fix reference state management for synchronous callbacksKumar Kartikeya Dwivedi2-9/+44
Currently, verifier verifies callback functions (sync and async) as if they will be executed once, (i.e. it explores execution state as if the function was being called once). The next insn to explore is set to start of subprog and the exit from nested frame is handled using curframe > 0 and prepare_func_exit. In case of async callback it uses a customized variant of push_stack simulating a kind of branch to set up custom state and execution context for the async callback. While this approach is simple and works when callback really will be executed only once, it is unsafe for all of our current helpers which are for_each style, i.e. they execute the callback multiple times. A callback releasing acquired references of the caller may do so multiple times, but currently verifier sees it as one call inside the frame, which then returns to caller. Hence, it thinks it released some reference that the cb e.g. got access through callback_ctx (register filled inside cb from spilled typed register on stack). Similarly, it may see that an acquire call is unpaired inside the callback, so the caller will copy the reference state of callback and then will have to release the register with new ref_obj_ids. But again, the callback may execute multiple times, but the verifier will only account for acquired references for a single symbolic execution of the callback, which will cause leaks. Note that for async callback case, things are different. While currently we have bpf_timer_set_callback which only executes it once, even for multiple executions it would be safe, as reference state is NULL and check_reference_leak would force program to release state before BPF_EXIT. The state is also unaffected by analysis for the caller frame. Hence async callback is safe. Since we want the reference state to be accessible, e.g. for pointers loaded from stack through callback_ctx's PTR_TO_STACK, we still have to copy caller's reference_state to callback's bpf_func_state, but we enforce that whatever references it adds to that reference_state has been released before it hits BPF_EXIT. This requires introducing a new callback_ref member in the reference state to distinguish between caller vs callee references. Hence, check_reference_leak now errors out if it sees we are in callback_fn and we have not released callback_ref refs. Since there can be multiple nested callbacks, like frame 0 -> cb1 -> cb2 etc. we need to also distinguish between whether this particular ref belongs to this callback frame or parent, and only error for our own, so we store state->frameno (which is always non-zero for callbacks). In short, callbacks can read parent reference_state, but cannot mutate it, to be able to use pointers acquired by the caller. They must only undo their changes (by releasing their own acquired_refs before BPF_EXIT) on top of caller reference_state before returning (at which point the caller and callback state will match anyway, so no need to copy it back to caller). Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220823013125.24938-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPFKumar Kartikeya Dwivedi1-4/+4
They would require func_info which needs prog BTF anyway. Loading BTF and setting the prog btf_fd while loading the prog indirectly requires CAP_BPF, so just to reduce confusion, move both these helpers taking callback under bpf_capable() protection as well, since they cannot be used without CAP_BPF. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220823013117.24916-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23Merge branch 'bpf: expose bpf_{g,s}et_retval to more cgroup hooks'Alexei Starovoitov12-124/+322
Stanislav Fomichev says: ==================== Apparently, only a small subset of cgroup hooks actually falls back to cgroup_base_func_proto. This leads to unexpected result where not all cgroup helpers have bpf_{g,s}et_retval. It's getting harder and harder to manage which helpers are exported to which hooks. We now have the following call chains: - cg_skb_func_proto - sk_filter_func_proto - bpf_sk_base_func_proto - bpf_base_func_proto So by looking at cg_skb_func_proto it's pretty hard to understand what's going on. For cgroup helpers, I'm proposing we do the following instead: func_proto = cgroup_common_func_proto(); if (func_proto) return func_proto; /* optional, if hook has 'current' */ func_proto = cgroup_current_func_proto(); if (func_proto) return func_proto; ... switch (func_id) { /* hook specific helpers */ case BPF_FUNC_hook_specific_helper: return &xyz; default: /* always fall back to plain bpf_base_func_proto */ bpf_base_func_proto(func_id); } If this turns out more workable, we can follow up with converting the rest to the same pattern. v5: - remove net/cls_cgroup.h include from patch 1/5 (Martin) - move endif changes from patch 1/5 to 3/5 (Martin) - don't define __weak protos, the ones in core.c suffice (Martin) v4: - don't touch existing helper.c helpers (Martin) - drop unneeded CONFIG_CGROUP_BPF in bpf_lsm_func_proto (Martin) v3: - expose strtol/strtoul everywhere (Martin) - move helpers declarations from bpf.h to bpf-cgroup.h (Martin) - revise bpf_{g,s}et_retval documentation (Martin) - don't expose bpf_{g,s}et_retval to cg_skb hooks (Martin) v2: - move everything into kernel/bpf/cgroup.c instead (Martin) - use cgroup_common_func_proto in lsm (Martin) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23selftests/bpf: Make sure bpf_{g,s}et_retval is exposed everywhereStanislav Fomichev4-0/+90
For each hook, have a simple bpf_set_retval(bpf_get_retval) program and make sure it loads for the hooks we want. The exceptions are the hooks which don't propagate the error to the callers: - sockops - recvmsg - getpeername - getsockname - cg_skb ingress and egress Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-6-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf: update bpf_{g,s}et_retval documentationStanislav Fomichev2-10/+34
* replace 'syscall' with 'upper layers', still mention that it's being exported via syscall errno * describe what happens in set_retval(-EPERM) + return 1 * describe what happens with bind's 'return 3' Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf: expose bpf_strtol and bpf_strtoul to all program typesStanislav Fomichev2-5/+5
bpf_strncmp is already exposed everywhere. The motivation is to keep those helpers in kernel/bpf/helpers.c. Otherwise it's tempting to move them under kernel/bpf/cgroup.c because they are currently only used by sysctl prog types. Suggested-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-4-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf: Use cgroup_{common,current}_func_proto in more hooksStanislav Fomichev4-58/+80
The following hooks are per-cgroup hooks but they are not using cgroup_{common,current}_func_proto, fix it: * BPF_PROG_TYPE_CGROUP_SKB (cg_skb) * BPF_PROG_TYPE_CGROUP_SOCK_ADDR (cg_sock_addr) * BPF_PROG_TYPE_CGROUP_SOCK (cg_sock) * BPF_PROG_TYPE_LSM+BPF_LSM_CGROUP Also: * move common func_proto's into cgroup func_proto handlers * make sure bpf_{g,s}et_retval are not accessible from recvmsg, getpeername and getsockname (return/errno is ignored in these places) * as a side effect, expose get_current_pid_tgid, get_current_comm_proto, get_current_ancestor_cgroup_id, get_cgroup_classid to more cgroup hooks Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-3-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf: Introduce cgroup_{common,current}_func_protoStanislav Fomichev3-53/+115
Split cgroup_base_func_proto into the following: * cgroup_common_func_proto - common helpers for all cgroup hooks * cgroup_current_func_proto - common helpers for all cgroup hooks running in the process context (== have meaningful 'current'). Move bpf_{g,s}et_retval and other cgroup-related helpers into kernel/bpf/cgroup.c so they closer to where they are being used. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220823222555.523590-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23scripts/bpf: Set date attribute for bpf-helpers(7) man pageQuentin Monnet1-2/+18
The bpf-helpers(7) manual page shipped in the man-pages project is generated from the documentation contained in the BPF UAPI header, in the Linux repository, parsed by script/bpf_doc.py and then fed to rst2man. The man page should contain the date of last modification of the documentation. This commit adds the relevant date when generating the page. Before: $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH' .TH BPF-HELPERS 7 "" "Linux v5.19-14022-g30d2a4d74e11" "" After: $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH' .TH BPF-HELPERS 7 "2022-08-15" "Linux v5.19-14022-g30d2a4d74e11" "" We get the version by using "git log" to look for the commit date of the latest change to the section of the BPF header containing the documentation. If the command fails, we just skip the date field. and keep generating the page. Reported-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com> Link: https://lore.kernel.org/bpf/20220823155327.98888-2-quentin@isovalent.com
2022-08-23scripts/bpf: Set version attribute for bpf-helpers(7) man pageQuentin Monnet1-1/+20
The bpf-helpers(7) manual page shipped in the man-pages project is generated from the documentation contained in the BPF UAPI header, in the Linux repository, parsed by script/bpf_doc.py and then fed to rst2man. After a recent update of that page [0], Alejandro reported that the linter used to validate the man pages complains about the generated document [1]. The header for the page is supposed to contain some attributes that we do not set correctly with the script. This commit updates the "project and version" field. We discussed the format of those fields in [1] and [2]. Before: $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH' .TH BPF-HELPERS 7 "" "" "" After: $ ./scripts/bpf_doc.py helpers | rst2man | grep '\.TH' .TH BPF-HELPERS 7 "" "Linux v5.19-14022-g30d2a4d74e11" "" We get the version from "git describe", but if unavailable, we fall back on "make kernelversion". If none works, for example because neither git nore make are installed, we just set the field to "Linux" and keep generating the page. [0] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man7/bpf-helpers.7?id=19c7f78393f2b038e76099f87335ddf43a87f039 [1] https://lore.kernel.org/all/20220823084719.13613-1-quentin@isovalent.com/t/#m58a418a318642c6428e14ce9bb84eba5183b06e8 [2] https://lore.kernel.org/all/20220721110821.8240-1-alx.manpages@gmail.com/t/#m8e689a822e03f6e2530a0d6de9d128401916c5de Reported-by: Alejandro Colomar <alx.manpages@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com> Link: https://lore.kernel.org/bpf/20220823155327.98888-1-quentin@isovalent.com
2022-08-23bpf, selftests: Test BPF_FLOW_DISSECTOR_CONTINUEShmulik Ladkani3-0/+44
The dissector program returns BPF_FLOW_DISSECTOR_CONTINUE (and avoids setting skb->flow_keys or last_dissection map) in case it encounters IP packets whose (outer) source address is 127.0.0.127. Additional test is added to prog_tests/flow_dissector.c which sets this address as test's pkk.iph.saddr, with the expected retval of BPF_FLOW_DISSECTOR_CONTINUE. Also, legacy test_flow_dissector.sh was similarly augmented. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-5-shmulik.ladkani@gmail.com
2022-08-23bpf, test_run: Propagate bpf_flow_dissect's retval to user's ↵Shmulik Ladkani3-3/+24
bpf_attr.test.retval Formerly, a boolean denoting whether bpf_flow_dissect returned BPF_OK was set into 'bpf_attr.test.retval'. Augment this, so users can check the actual return code of the dissector program under test. Existing prog_tests/flow_dissector*.c tests were correspondingly changed to check against each test's expected retval. Also, tests' resulting 'flow_keys' are verified only in case the expected retval is BPF_OK. This allows adding new tests that expect non BPF_OK. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-4-shmulik.ladkani@gmail.com
2022-08-23bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progsShmulik Ladkani3-0/+13
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely replaces the flow-dissector logic with custom dissection logic. This forces implementors to write programs that handle dissection for any flows expected in the namespace. It makes sense for flow-dissector BPF programs to just augment the dissector with custom logic (e.g. dissecting certain flows or custom protocols), while enjoying the broad capabilities of the standard dissector for any other traffic. Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF programs may return this to indicate no dissection was made, and fallback to the standard dissector is requested. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com
2022-08-23flow_dissector: Make 'bpf_flow_dissect' return the bpf program retcodeShmulik Ladkani3-9/+10
Let 'bpf_flow_dissect' callers know the BPF program's retcode and act accordingly. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-2-shmulik.ladkani@gmail.com
2022-08-19selftest/bpf: Add setget_sockopt to DENYLIST.s390xMartin KaFai Lau1-0/+1
Trampoline is not supported in s390. Fixes: 31123c0360e0 ("selftests/bpf: bpf_setsockopt tests") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220819192155.91713-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-19selftests/bpf: Fix spelling mistake.Colin Ian King1-1/+1
There is a spelling mistake in an ASSERT_OK literal string. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Mykola Lysenko <mykolal@fb.com> Link: https://lore.kernel.org/r/20220817213242.101277-1-colin.i.king@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18Merge branch 'bpf: net: Remove duplicated code from bpf_setsockopt()'Alexei Starovoitov16-277/+874
Martin KaFai Lau says: ==================== The code in bpf_setsockopt() is mostly a copy-and-paste from the sock_setsockopt(), do_tcp_setsockopt(), do_ipv6_setsockopt(), and do_ip_setsockopt(). As the allowed optnames in bpf_setsockopt() grows, so are the duplicated code. The code between the copies also slowly drifted. This set is an effort to clean this up and reuse the existing {sock,do_tcp,do_ipv6,do_ip}_setsockopt() as much as possible. After the clean up, this set also adds a few allowed optnames that we need to the bpf_setsockopt(). The initial attempt was to clean up both bpf_setsockopt() and bpf_getsockopt() together. However, the patch set was getting too long. It is beneficial to leave the bpf_getsockopt() out for another patch set. Thus, this set is focusing on the bpf_setsockopt(). v4: - This set now depends on the commit f574f7f839fc ("net: bpf: Use the protocol's set_rcvlowat behavior if there is one") in the net-next tree. The commit calls a specific protocol's set_rcvlowat and it changed the bpf_setsockopt which this set has also changed. Because of this, patch 9 of this set has also adjusted and a 'sock' NULL check is added to the sk_setsockopt() because some of the bpf hooks have a NULL sk->sk_socket. This removes more dup code from the bpf_setsockopt() side. - Avoid mentioning specific prog types in the comment of the has_current_bpf_ctx(). (Andrii) - Replace signed with unsigned int bitfield in the patch 15 selftest. (Daniel) v3: - s/in_bpf/has_current_bpf_ctx/ (Andrii) - Add comment to has_current_bpf_ctx() and sockopt_lock_sock() (Stanislav) - Use vmlinux.h in selftest and add defines to bpf_tracing_net.h (Stanislav) - Use bpf_getsockopt(SO_MARK) in selftest (Stanislav) - Use BPF_CORE_READ_BITFIELD in selftest (Yonghong) v2: - A major change is to use in_bpf() to test if a setsockopt() is called by a bpf prog and use in_bpf() to skip capable check. Suggested by Stanislav. - Instead of passing is_locked through sockptr_t or through an extra argument to sk_setsockopt, v2 uses in_bpf() to skip the lock_sock() also because bpf prog has the lock acquired. - No change to the current sockptr_t in this revision - s/codes/code/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18selftests/bpf: bpf_setsockopt testsMartin KaFai Lau3-1/+606
This patch adds tests to exercise optnames that are allowed in bpf_setsockopt(). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061847.4182339-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Add a few optnames to bpf_setsockoptMartin KaFai Lau1-0/+5
This patch adds a few optnames for bpf_setsockopt: SO_REUSEADDR, IPV6_AUTOFLOWLABEL, TCP_MAXSEG, TCP_NODELAY, and TCP_THIN_LINEAR_TIMEOUTS. Thanks to the previous patches of this set, all additions can reuse the sk_setsockopt(), do_ipv6_setsockopt(), and do_tcp_setsockopt(). The only change here is to allow them in bpf_setsockopt. The bpf prog has been able to read all members of a sk by using PTR_TO_BTF_ID of a sk. The optname additions here can also be read by the same approach. Meaning there is a way to read the values back. These optnames can also be added to bpf_getsockopt() later with another patch set that makes the bpf_getsockopt() to reuse the sock_getsockopt(), tcp_getsockopt(), and ip[v6]_getsockopt(). Thus, this patch does not add more duplicated code to bpf_getsockopt() now. Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061841.4181642-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Change bpf_setsockopt(SOL_IPV6) to reuse do_ipv6_setsockopt()Martin KaFai Lau5-32/+33
After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IPV6) and reuses the implementation in do_ipv6_setsockopt(). ipv6 could be compiled as a module. Like how other code solved it with stubs in ipv6_stubs.h, this patch adds the do_ipv6_setsockopt to the ipv6_bpf_stub. The current bpf_setsockopt(IPV6_TCLASS) does not take the INET_ECN_MASK into the account for tcp. The do_ipv6_setsockopt(IPV6_TCLASS) will handle it correctly. The existing optname white-list is refactored into a new function sol_ipv6_setsockopt(). After this last SOL_IPV6 dup code removal, the __bpf_setsockopt() is simplified enough that the extra "{ }" around the if statement can be removed. Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061834.4181198-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Change bpf_setsockopt(SOL_IP) to reuse do_ip_setsockopt()Martin KaFai Lau3-22/+24
After the prep work in the previous patches, this patch removes the dup code from bpf_setsockopt(SOL_IP) and reuses the implementation in do_ip_setsockopt(). The existing optname white-list is refactored into a new function sol_ip_setsockopt(). NOTE, the current bpf_setsockopt(IP_TOS) is quite different from the the do_ip_setsockopt(IP_TOS). For example, it does not take the INET_ECN_MASK into the account for tcp and also does not adjust sk->sk_priority. It looks like the current bpf_setsockopt(IP_TOS) was referencing the IPV6_TCLASS implementation instead of IP_TOS. This patch tries to rectify that by using the do_ip_setsockopt(IP_TOS). While this is a behavior change, the do_ip_setsockopt(IP_TOS) behavior is arguably what the user is expecting. At least, the INET_ECN_MASK bits should be masked out for tcp. Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061826.4180990-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Change bpf_setsockopt(SOL_TCP) to reuse do_tcp_setsockopt()Martin KaFai Lau3-69/+34
After the prep work in the previous patches, this patch removes all the dup code from bpf_setsockopt(SOL_TCP) and reuses the do_tcp_setsockopt(). The existing optname white-list is refactored into a new function sol_tcp_setsockopt(). The sol_tcp_setsockopt() also calls the bpf_sol_tcp_setsockopt() to handle the TCP_BPF_XXX specific optnames. bpf_setsockopt(TCP_SAVE_SYN) now also allows a value 2 to save the eth header also and it comes for free from do_tcp_setsockopt(). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061819.4180146-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Refactor bpf specific tcp optnames to a new functionMartin KaFai Lau1-29/+50
The patch moves all bpf specific tcp optnames (TCP_BPF_XXX) to a new function bpf_sol_tcp_setsockopt(). This will make the next patch easier to follow. Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061812.4179645-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt()Martin KaFai Lau3-98/+34
After the prep work in the previous patches, this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET) and reuses them from sk_setsockopt(). The sock ptr test is added to the SO_RCVLOWAT because the sk->sk_socket could be NULL in some of the bpf hooks. The existing optname white-list is refactored into a new function sol_socket_setsockopt(). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061804.4178920-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Embed kernel CONFIG check into the if statement in bpf_setsockoptMartin KaFai Lau1-7/+3
This patch moves the "#ifdef CONFIG_XXX" check into the "if/else" statement itself. The change is done for the bpf_setsockopt() function only. It will make the latter patches easier to follow without the surrounding ifdef macro. Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061758.4178374-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: Initialize the bpf_run_ctx in bpf_iter_run_prog()Martin KaFai Lau1-0/+5
The bpf-iter-prog for tcp and unix sk can do bpf_setsockopt() which needs has_current_bpf_ctx() to decide if it is called by a bpf prog. This patch initializes the bpf_run_ctx in bpf_iter_run_prog() for the has_current_bpf_ctx() to use. Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061751.4177657-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: net: Change do_ipv6_setsockopt() to use the sockopt's lock_sock() and ↵Martin KaFai Lau1-7/+7
capable() Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_ipv6_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061744.4176893-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: net: Change do_ip_setsockopt() to use the sockopt's lock_sock() and ↵Martin KaFai Lau1-6/+6
capable() Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_ip_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061737.4176402-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: net: Change do_tcp_setsockopt() to use the sockopt's lock_sock() and ↵Martin KaFai Lau1-9/+9
capable() Similar to the earlier patch that avoids sk_setsockopt() from taking sk lock and doing capable test when called by bpf. This patch changes do_tcp_setsockopt() to use the sockopt_{lock,release}_sock() and sockopt_[ns_]capable(). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061730.4176021-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: net: Consider has_current_bpf_ctx() when testing capable() in ↵Martin KaFai Lau2-13/+27
sk_setsockopt() When bpf program calling bpf_setsockopt(SOL_SOCKET), it could be run in softirq and doesn't make sense to do the capable check. There was a similar situation in bpf_setsockopt(TCP_CONGESTION). In commit 8d650cdedaab ("tcp: fix tcp_set_congestion_control() use from bpf hook"), tcp_set_congestion_control(..., cap_net_admin) was added to skip the cap check for bpf prog. This patch adds sockopt_ns_capable() and sockopt_capable() for the sk_setsockopt() to use. They will consider the has_current_bpf_ctx() before doing the ns_capable() and capable() test. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Suggested-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061723.4175820-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpfMartin KaFai Lau3-3/+43
Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from the sk_setsockopt(). The number of supported optnames are increasing ever and so as the duplicated code. One issue in reusing sk_setsockopt() is that the bpf prog has already acquired the sk lock. This patch adds a has_current_bpf_ctx() to tell if the sk_setsockopt() is called from a bpf prog. The bpf prog calling bpf_setsockopt() is either running in_task() or in_serving_softirq(). Both cases have the current->bpf_ctx initialized. Thus, the has_current_bpf_ctx() only needs to test !!current->bpf_ctx. This patch also adds sockopt_{lock,release}_sock() helpers for sk_setsockopt() to use. These helpers will test has_current_bpf_ctx() before acquiring/releasing the lock. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Note on the change in sock_setbindtodevice(). sockopt_lock_sock() is done in sock_setbindtodevice() instead of doing the lock_sock in sock_bindtoindex(..., lock_sk = true). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061717.4175589-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18net: Add sk_setsockopt() to take the sk ptr instead of the sock ptrMartin KaFai Lau1-3/+10
A latter patch refactors bpf_setsockopt(SOL_SOCKET) with the sock_setsockopt() to avoid code duplication and code drift between the two duplicates. The current sock_setsockopt() takes sock ptr as the argument. The very first thing of this function is to get back the sk ptr by 'sk = sock->sk'. bpf_setsockopt() could be called when the sk does not have the sock ptr created. Meaning sk->sk_socket is NULL. For example, when a passive tcp connection has just been established but has yet been accept()-ed. Thus, it cannot use the sock_setsockopt(sk->sk_socket) or else it will pass a NULL ptr. This patch moves all sock_setsockopt implementation to the newly added sk_setsockopt(). The new sk_setsockopt() takes a sk ptr and immediately gets the sock ptr by 'sock = sk->sk_socket' The existing sock_setsockopt(sock) is changed to call sk_setsockopt(sock->sk). All existing callers have both sock->sk and sk->sk_socket pointer. The latter patch will make bpf_setsockopt(SOL_SOCKET) call sk_setsockopt(sk) directly. The bpf_setsockopt(SOL_SOCKET) does not use the optnames that require sk->sk_socket, so it will be safe. Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061711.4175048-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18net: ethernet: altera: Add use of ethtool_op_get_ts_infoMaxime Chevallier1-0/+1
Add the ethtool_op_get_ts_info() callback to ethtool ops, so that we can at least use software timestamping. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://lore.kernel.org/r/20220817095725.97444-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17stmmac: intel: remove unused 'has_crossts' flagWong Vee Khee2-2/+0
The 'has_crossts' flag was not used anywhere in the stmmac driver, removing it from both header file and dwmac-intel driver. Signed-off-by: Wong Vee Khee <veekhee@apple.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20220817064324.10025-1-veekhee@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski61-372/+986
Andrii Nakryiko says: ==================== bpf-next 2022-08-17 We've added 45 non-merge commits during the last 14 day(s) which contain a total of 61 files changed, 986 insertions(+), 372 deletions(-). The main changes are: 1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt Kanzenbach and Jesper Dangaard Brouer. 2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko. 3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov. 4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires. 5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle unsupported names on old kernels, from Hangbin Liu. 6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton, from Hao Luo. 7) Relax libbpf's requirement for shared libs to be marked executable, from Henqgi Chen. 8) Improve bpf_iter internals handling of error returns, from Hao Luo. 9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard. 10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong. 11) bpftool improvements to handle full BPF program names better, from Manu Bretelle. 12) bpftool fixes around libcap use, from Quentin Monnet. 13) BPF map internals clean ups and improvements around memory allocations, from Yafang Shao. 14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup iterator to work on cgroupv1, from Yosry Ahmed. 15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong. 16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) selftests/bpf: Few fixes for selftests/bpf built in release mode libbpf: Clean up deprecated and legacy aliases libbpf: Streamline bpf_attr and perf_event_attr initialization libbpf: Fix potential NULL dereference when parsing ELF selftests/bpf: Tests libbpf autoattach APIs libbpf: Allows disabling auto attach selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm libbpf: Making bpf_prog_load() ignore name if kernel doesn't support selftests/bpf: Update CI kconfig selftests/bpf: Add connmark read test selftests/bpf: Add existing connection bpf_*_ct_lookup() test bpftool: Clear errno after libcap's checks bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation bpftool: Fix a typo in a comment libbpf: Add names for auxiliary maps bpf: Use bpf_map_area_alloc consistently on bpf map creation bpf: Make __GFP_NOWARN consistent in bpf map creation bpf: Use bpf_map_area_free instread of kvfree bpf: Remove unneeded memset in queue_stack_map creation libbpf: preserve errno across pr_warn/pr_info/pr_debug ... ==================== Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17selftests/bpf: Few fixes for selftests/bpf built in release modeAndrii Nakryiko4-6/+6
Fix few issues found when building and running test_progs in release mode. First, potentially uninitialized idx variable in xskxceiver, force-initialize to zero to satisfy compiler. Few instances of defining uprobe trigger functions break in release mode unless marked as noinline, due to being static. Add noinline to make sure everything works. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220816001929.369487-5-andrii@kernel.org
2022-08-17libbpf: Clean up deprecated and legacy aliasesAndrii Nakryiko5-10/+2
Remove three missed deprecated APIs that were aliased to new APIs: bpf_object__unload, bpf_prog_attach_xattr and btf__load. Also move legacy API libbpf_find_kernel_btf (aliased to btf__load_vmlinux_btf) into libbpf_legacy.h. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220816001929.369487-4-andrii@kernel.org
2022-08-17libbpf: Streamline bpf_attr and perf_event_attr initializationAndrii Nakryiko4-91/+138
Make sure that entire libbpf code base is initializing bpf_attr and perf_event_attr with memset(0). Also for bpf_attr make sure we clear and pass to kernel only relevant parts of bpf_attr. bpf_attr is a huge union of independent sub-command attributes, so there is no need to clear and pass entire union bpf_attr, which over time grows quite a lot and for most commands this growth is completely irrelevant. Few cases where we were relying on compiler initialization of BPF UAPI structs (like bpf_prog_info, bpf_map_info, etc) with `= {};` were switched to memset(0) pattern for future-proofing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220816001929.369487-3-andrii@kernel.org
2022-08-17libbpf: Fix potential NULL dereference when parsing ELFAndrii Nakryiko1-1/+1
Fix if condition filtering empty ELF sections to prevent NULL dereference. Fixes: 47ea7417b074 ("libbpf: Skip empty sections in bpf_object__init_global_data_maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220816001929.369487-2-andrii@kernel.org
2022-08-17Merge branch 'net-dsa-bcm_sf2-utilize-phylink-for-all-ports'Jakub Kicinski1-67/+63
Florian Fainelli says: ==================== net: dsa: bcm_sf2: Utilize PHYLINK for all ports This patch series has the bcm_sf2 driver utilize PHYLINK to configure the CPU port link parameters to unify the configuration and pave the way for DSA to utilize PHYLINK for all ports in the future. Tested on BCM7445 and BCM7278 ==================== Link: https://lore.kernel.org/r/20220815175009.2681932-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: bcm_sf2: Have PHYLINK configure CPU/IMP port(s)Florian Fainelli1-52/+43
Remove the artificial limitations imposed upon bcm_sf2_sw_mac_link_{up,down} and allow us to override the link parameters for IMP port(s) as well as regular ports by accounting for the special differences that exist there. Remove the code that did override the link parameters in bcm_sf2_imp_setup(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: bcm_sf2: Introduce helper for port override offsetFlorian Fainelli1-16/+21
Depending upon the generation of switches, we have different offsets for configuring a given port's status override where link parameters are applied. Introduce a helper function that we re-use throughout the code in order to let phylink callbacks configure the IMP/CPU port(s) in subsequent changes. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: sfp: use simplified HWMON_CHANNEL_INFO macroBeniamin Sandu1-83/+38
This makes the code look cleaner and easier to read. Signed-off-by: Beniamin Sandu <beniaminsandu@gmail.com> Link: https://lore.kernel.org/r/20220813204658.848372-1-beniaminsandu@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17selftests/bpf: Tests libbpf autoattach APIsHao Luo2-0/+53
Adds test for libbpf APIs that toggle bpf program auto-attaching. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220816234012.910255-2-haoluo@google.com
2022-08-17libbpf: Allows disabling auto attachHao Luo3-1/+18
Adds libbpf APIs for disabling auto-attach for individual functions. This is motivated by the use case of cgroup iter [1]. Some iter types require their parameters to be non-zero, therefore applying auto-attach on them will fail. With these two new APIs, users who want to use auto-attach and these types of iters can disable auto-attach on the program and perform manual attach. [1] https://lore.kernel.org/bpf/CAEf4BzZ+a2uDo_t6kGBziqdz--m2gh2_EUwkGLDtMd65uwxUjA@mail.gmail.com/ Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220816234012.910255-1-haoluo@google.com