summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-03-10bpftool: Introduce "prog profile" commandSong Liu5-1/+608
With fentry/fexit programs, it is possible to profile BPF program with hardware counters. Introduce bpftool "prog profile", which measures key metrics of a BPF program. bpftool prog profile command creates per-cpu perf events. Then it attaches fentry/fexit programs to the target BPF program. The fentry program saves perf event value to a map. The fexit program reads the perf event again, and calculates the difference, which is the instructions/cycles used by the target program. Example input and output: ./bpftool prog profile id 337 duration 3 cycles instructions llc_misses 4228 run_cnt 3403698 cycles (84.08%) 3525294 instructions # 1.04 insn per cycle (84.05%) 13 llc_misses # 3.69 LLC misses per million isns (83.50%) This command measures cycles and instructions for BPF program with id 337 for 3 seconds. The program has triggered 4228 times. The rest of the output is similar to perf-stat. In this example, the counters were only counting ~84% of the time because of time multiplexing of perf counters. Note that, this approach measures cycles and instructions in very small increments. So the fentry/fexit programs introduce noticeable errors to the measurement results. The fentry/fexit programs are generated with BPF skeletons. Therefore, we build bpftool twice. The first time _bpftool is built without skeletons. Then, _bpftool is used to generate the skeletons. The second time, bpftool is built with skeletons. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200309173218.2739965-2-songliubraving@fb.com
2020-03-09bpf, doc: Update maintainers for L7 BPFLorenz Bauer1-0/+2
Add Jakub and myself as maintainers for sockmap related code. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-13-lmb@cloudflare.com
2020-03-09selftests: bpf: Enable UDP sockmap reuseport testsLorenz Bauer1-6/+0
Remove the guard that disables UDP tests now that sockmap has support for them. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-12-lmb@cloudflare.com
2020-03-09selftests: bpf: Add tests for UDP sockets in sockmapLorenz Bauer1-30/+127
Expand the TCP sockmap test suite to also check UDP sockets. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-11-lmb@cloudflare.com
2020-03-09selftests: bpf: Don't listen() on UDP socketsLorenz Bauer1-22/+25
Most tests for TCP sockmap can be adapted to UDP sockmap if the listen call is skipped. Rename listen_loopback, etc. to socket_loopback and skip listen() for SOCK_DGRAM. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-10-lmb@cloudflare.com
2020-03-09bpf: sockmap: Add UDP supportLorenz Bauer1-4/+33
Allow adding hashed UDP sockets to sockmaps. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-9-lmb@cloudflare.com
2020-03-09bpf: Add sockmap hooks for UDP socketsLorenz Bauer4-0/+60
Add basic psock hooks for UDP sockets. This allows adding and removing sockets, as well as automatic removal on unhash and close. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-8-lmb@cloudflare.com
2020-03-09bpf: sockmap: Simplify sock_map_init_protoLorenz Bauer1-15/+4
We can take advantage of the fact that both callers of sock_map_init_proto are holding a RCU read lock, and have verified that psock is valid. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-7-lmb@cloudflare.com
2020-03-09bpf: sockmap: Move generic sockmap hooks from BPF TCPLorenz Bauer5-119/+118
The init, close and unhash handlers from TCP sockmap are generic, and can be reused by UDP sockmap. Move the helpers into the sockmap code base and expose them. This requires tcp_bpf_get_proto and tcp_bpf_clone to be conditional on BPF_STREAM_PARSER. The moved functions are unmodified, except that sk_psock_unlink is renamed to sock_map_unlink to better match its behaviour. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-6-lmb@cloudflare.com
2020-03-09bpf: tcp: Guard declarations with CONFIG_NET_SOCK_MSGLorenz Bauer1-2/+2
tcp_bpf.c is only included in the build if CONFIG_NET_SOCK_MSG is selected. The declaration should therefore be guarded as such. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-5-lmb@cloudflare.com
2020-03-09bpf: tcp: Move assertions into tcp_bpf_get_protoLorenz Bauer3-37/+31
We need to ensure that sk->sk_prot uses certain callbacks, so that code that directly calls e.g. tcp_sendmsg in certain corner cases works. To avoid spurious asserts, we must to do this only if sk_psock_update_proto has not yet been called. The same invariants apply for tcp_bpf_check_v6_needs_rebuild, so move the call as well. Doing so allows us to merge tcp_bpf_init and tcp_bpf_reinit. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-4-lmb@cloudflare.com
2020-03-09skmsg: Update saved hooks only onceLorenz Bauer2-19/+17
Only update psock->saved_* if psock->sk_proto has not been initialized yet. This allows us to get rid of tcp_bpf_reinit_sk_prot. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-3-lmb@cloudflare.com
2020-03-09bpf: sockmap: Only check ULP for TCP socketsLorenz Bauer4-12/+15
The sock map code checks that a socket does not have an active upper layer protocol before inserting it into the map. This requires casting via inet_csk, which isn't valid for UDP sockets. Guard checks for ULP by checking inet_sk(sk)->is_icsk first. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200309111243.6982-2-lmb@cloudflare.com
2020-03-05bpf: Fix bpf_prog_test_run_tracing for !CONFIG_NETKP Singh1-0/+7
test_run.o is not built when CONFIG_NET is not set and bpf_prog_test_run_tracing being referenced in bpf_trace.o causes the linker error: ld: kernel/trace/bpf_trace.o:(.rodata+0x38): undefined reference to `bpf_prog_test_run_tracing' Add a __weak function in bpf_trace.c to handle this. Fixes: da00d2f117a0 ("bpf: Add test ops for BPF_PROG_TYPE_TRACING") Signed-off-by: KP Singh <kpsingh@google.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200305220127.29109-1-kpsingh@chromium.org
2020-03-05bpf: Remove unnecessary CAP_MAC_ADMIN checkKP Singh1-10/+3
While well intentioned, checking CAP_MAC_ADMIN for attaching BPF_MODIFY_RETURN tracing programs to "security_" functions is not necessary as tracing BPF programs already require CAP_SYS_ADMIN. Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200305204955.31123-1-kpsingh@chromium.org
2020-03-05MAINTAINERS: Add entry for RV32G BPF JITLuke Nelson1-1/+12
Add a new entry for the 32-bit RISC-V JIT to MAINTAINERS and change mailing list to netdev and bpf following the guidelines from commit e42da4c62abb ("docs/bpf: Update bpf development Q/A file"). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200305050207.4159-5-luke.r.nels@gmail.com
2020-03-05bpf, doc: Add BPF JIT for RV32G to BPF documentationLuke Nelson2-2/+3
Update filter.txt and admin-guide to mention the BPF JIT for RV32G. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200305050207.4159-4-luke.r.nels@gmail.com
2020-03-05riscv, bpf: Add RV32G eBPF JITLuke Nelson4-2/+1366
This is an eBPF JIT for RV32G, adapted from the JIT for RV64G and the 32-bit ARM JIT. There are two main changes required for this to work compared to the RV64 JIT. First, eBPF registers are 64-bit, while RV32G registers are 32-bit. BPF registers either map directly to 2 RISC-V registers, or reside in stack scratch space and are saved and restored when used. Second, many 64-bit ALU operations do not trivially map to 32-bit operations. Operations that move bits between high and low words, such as ADD, LSH, MUL, and others must emulate the 64-bit behavior in terms of 32-bit instructions. This patch also makes related changes to bpf_jit.h, such as adding RISC-V instructions required by the RV32 JIT. Supported features: The RV32 JIT supports the same features and instructions as the RV64 JIT, with the following exceptions: - ALU64 DIV/MOD: Requires loops to implement on 32-bit hardware. - BPF_XADD | BPF_DW: There's no 8-byte atomic instruction in RV32. These features are also unsupported on other BPF JITs for 32-bit architectures. Testing: - lib/test_bpf.c test_bpf: Summary: 378 PASSED, 0 FAILED, [349/366 JIT'ed] test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED The tests that are not JITed are all due to use of 64-bit div/mod or 64-bit xadd. - tools/testing/selftests/bpf/test_verifier.c Summary: 1415 PASSED, 122 SKIPPED, 43 FAILED Tested both with and without BPF JIT hardening. This is the same set of tests that pass using the BPF interpreter with the JIT disabled. Verification and synthesis: We developed the RV32 JIT using our automated verification tool, Serval. We have used Serval in the past to verify patches to the RV64 JIT. We also used Serval to superoptimize the resulting code through program synthesis. You can find the tool and a guide to the approach and results here: https://github.com/uw-unsat/serval-bpf/tree/rv32-jit-v5 Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200305050207.4159-3-luke.r.nels@gmail.com
2020-03-05riscv, bpf: Factor common RISC-V JIT codeLuke Nelson4-601/+639
This patch factors out code that can be used by both the RV64 and RV32 BPF JITs to a common bpf_jit.h and bpf_jit_core.c. Move struct definitions and macro-like functions to header. Rename rv_sb_insn/rv_uj_insn to rv_b_insn/rv_j_insn to match the RISC-V specification. Move reusable functions emit_body() and bpf_int_jit_compile() to bpf_jit_core.c with minor simplifications. Rename emit_insn() and build_{prologue,epilogue}() to be prefixed with "bpf_jit_" as they are no longer static. Rename bpf_jit_comp.c to bpf_jit_comp64.c to be more explicit. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn.topel@gmail.com> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20200305050207.4159-2-luke.r.nels@gmail.com
2020-03-04Merge branch 'bpf_modify_ret'Alexei Starovoitov17-187/+524
KP Singh says: ==================== v3 -> v4: * Fix a memory leak noticed by Daniel. v2 -> v3: * bpf_trampoline_update_progs -> bpf_trampoline_get_progs + const qualification. * Typos in commit messages. * Added Andrii's Acks. v1 -> v2: * Adressed Andrii's feedback. * Fixed a bug that Alexei noticed about nop generation. * Rebase. This was brought up in the KRSI v4 discussion and found to be useful both for security and tracing programs. https://lore.kernel.org/bpf/20200225193108.GB22391@chromium.org/ The modify_return programs are allowed for security hooks (with an extra CAP_MAC_ADMIN check) and functions whitelisted for error injection (ALLOW_ERROR_INJECTION). The "security_" check is expected to be cleaned up with the KRSI patch series. Here is an example of how a fmod_ret program behaves: int func_to_be_attached(int a, int b) { <--- do_fentry do_fmod_ret: <update ret by calling fmod_ret> if (ret != 0) goto do_fexit; original_function: <side_effects_happen_here> } <--- do_fexit ALLOW_ERROR_INJECTION(func_to_be_attached, ERRNO) The fmod_ret program attached to this function can be defined as: SEC("fmod_ret/func_to_be_attached") int BPF_PROG(func_name, int a, int b, int ret) { // This will skip the original function logic. return -1; } ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-04bpf: Add selftests for BPF_MODIFY_RETURNKP Singh3-1/+135
Test for two scenarios: * When the fmod_ret program returns 0, the original function should be called along with fentry and fexit programs. * When the fmod_ret program returns a non-zero value, the original function should not be called, no side effect should be observed and fentry and fexit programs should be called. The result from the kernel function call and whether a side-effect is observed is returned via the retval attr of the BPF_PROG_TEST_RUN (bpf) syscall. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-8-kpsingh@chromium.org
2020-03-04bpf: Add test ops for BPF_PROG_TYPE_TRACINGKP Singh6-76/+67
The current fexit and fentry tests rely on a different program to exercise the functions they attach to. Instead of doing this, implement the test operations for tracing which will also be used for BPF_MODIFY_RETURN in a subsequent patch. Also, clean up the fexit test to use the generated skeleton. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-7-kpsingh@chromium.org
2020-03-04tools/libbpf: Add support for BPF_MODIFY_RETURNKP Singh1-0/+4
Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-6-kpsingh@chromium.org
2020-03-04bpf: Attachment verification for BPF_MODIFY_RETURNKP Singh2-8/+51
- Allow BPF_MODIFY_RETURN attachment only to functions that are: * Whitelisted for error injection by checking within_error_injection_list. Similar discussions happened for the bpf_override_return helper. * security hooks, this is expected to be cleaned up with the LSM changes after the KRSI patches introduce the LSM_HOOK macro: https://lore.kernel.org/bpf/20200220175250.10795-1-kpsingh@chromium.org/ - The attachment is currently limited to functions that return an int. This can be extended later other types (e.g. PTR). Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-5-kpsingh@chromium.org
2020-03-04bpf: Introduce BPF_MODIFY_RETURNKP Singh8-13/+130
When multiple programs are attached, each program receives the return value from the previous program on the stack and the last program provides the return value to the attached function. The fmod_ret bpf programs are run after the fentry programs and before the fexit programs. The original function is only called if all the fmod_ret programs return 0 to avoid any unintended side-effects. The success value, i.e. 0 is not currently configurable but can be made so where user-space can specify it at load time. For example: int func_to_be_attached(int a, int b) { <--- do_fentry do_fmod_ret: <update ret by calling fmod_ret> if (ret != 0) goto do_fexit; original_function: <side_effects_happen_here> } <--- do_fexit The fmod_ret program attached to this function can be defined as: SEC("fmod_ret/func_to_be_attached") int BPF_PROG(func_name, int a, int b, int ret) { // This will skip the original function logic. return 1; } The first fmod_ret program is passed 0 in its return argument. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-4-kpsingh@chromium.org
2020-03-04bpf: JIT helpers for fmod_ret progsKP Singh1-63/+85
* Split the invoke_bpf program to prepare for special handling of fmod_ret programs introduced in a subsequent patch. * Move the definition of emit_cond_near_jump and emit_nops as they are needed for fmod_ret. * Refactor branch target alignment into its own generic helper function i.e. emit_align. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-3-kpsingh@chromium.org
2020-03-04bpf: Refactor trampoline update codeKP Singh4-45/+71
As we need to introduce a third type of attachment for trampolines, the flattened signature of arch_prepare_bpf_trampoline gets even more complicated. Refactor the prog and count argument to arch_prepare_bpf_trampoline to use bpf_tramp_progs to simplify the addition and accounting for new attachment types. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-2-kpsingh@chromium.org
2020-03-04selftests/bpf: Support out-of-tree vmlinux builds for VMLINUX_BTFAndrii Nakryiko1-4/+7
Add detection of out-of-tree built vmlinux image for the purpose of VMLINUX_BTF detection. According to Documentation/kbuild/kbuild.rst, O takes precedence over KBUILD_OUTPUT. Also ensure ~/path/to/build/dir also works by relying on wildcard's resolution first, but then applying $(abspath) at the end to also handle O=../../whatever cases. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200304184336.165766-1-andriin@fb.com
2020-03-04kbuild: Remove debug info from kallsyms linkingKees Cook1-9/+19
When CONFIG_DEBUG_INFO is enabled, the two kallsyms linking steps spend time collecting and writing the dwarf sections to the temporary output files. kallsyms does not need this information, and leaving it off halves their linking time. This is especially noticeable without CONFIG_DEBUG_INFO_REDUCED. The BTF linking stage, however, does still need those details. Refactor the BTF and kallsyms generation stages slightly for more regularized temporary names. Skip debug during kallsyms links. Additionally move "info BTF" to the correct place since commit 8959e39272d6 ("kbuild: Parameterize kallsyms generation and correct reporting"), which added "info LD ..." to vmlinux_link calls. For a full debug info build with BTF, my link time goes from 1m06s to 0m54s, saving about 12 seconds, or 18%. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/202003031814.4AEA3351@keescook
2020-03-04Merge branch 'bpf-uapi-enums'Daniel Borkmann5-141/+224
Andrii Nakryiko says: ==================== Convert BPF-related UAPI constants, currently defined as #define macro, into anonymous enums. This has no difference in terms of usage of such constants in C code (they are still could be used in all the compile-time contexts that `#define`s can), but they are recorded as part of DWARF type info, and subsequently get recorded as part of kernel's BTF type info. This allows those constants to be emitted as part of vmlinux.h auto-generated header file and be used from BPF programs. Which is especially convenient for all kinds of BPF helper flags and makes CO-RE BPF programs nicer to write. libbpf's btf_dump logic currently assumes enum values are signed 32-bit values, but that doesn't match a typical case, so switch it to emit unsigned values. Once BTF encoding of BTF_KIND_ENUM is extended to capture signedness properly, this will be made more flexible. As an immediate validation of the approach, runqslower's copy of BPF_F_CURRENT_CPU #define is dropped in favor of its enum variant from vmlinux.h. v2->v3: - convert only constants usable from BPF programs (BPF helper flags, map create flags, etc) (Alexei); v1->v2: - fix up btf_dump test to use max 32-bit unsigned value instead of negative one. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-03-04tools/runqslower: Drop copy/pasted BPF_F_CURRENT_CPU definitonAndrii Nakryiko1-3/+0
With BPF_F_CURRENT_CPU being an enum, it is now captured in vmlinux.h and is readily usable by runqslower. So drop local copy/pasted definition in favor of the one coming from vmlinux.h. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200303003233.3496043-4-andriin@fb.com
2020-03-04libbpf: Assume unsigned values for BTF_KIND_ENUMAndrii Nakryiko2-5/+5
Currently, BTF_KIND_ENUM type doesn't record whether enum values should be interpreted as signed or unsigned. In Linux, most enums are unsigned, though, so interpreting them as unsigned matches real world better. Change btf_dump test case to test maximum 32-bit value, instead of negative value. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200303003233.3496043-3-andriin@fb.com
2020-03-04bpf: Switch BPF UAPI #define constants used from BPF program side to enumsAndrii Nakryiko2-133/+219
Switch BPF UAPI constants, previously defined as #define macro, to anonymous enum values. This preserves constants values and behavior in expressions, but has added advantaged of being captured as part of DWARF and, subsequently, BTF type info. Which, in turn, greatly improves usefulness of generated vmlinux.h for BPF applications, as it will not require BPF users to copy/paste various flags and constants, which are frequently used with BPF helpers. Only those constants that are used/useful from BPF program side are converted. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200303003233.3496043-2-andriin@fb.com
2020-03-03libbpf: Fix handling of optional field_name in btf_dump__emit_type_declAndrii Nakryiko1-1/+1
Internal functions, used by btf_dump__emit_type_decl(), assume field_name is never going to be NULL. Ensure it's always the case. Fixes: 9f81654eebe8 ("libbpf: Expose BTF-to-C type declaration emitting API") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303180800.3303471-1-andriin@fb.com
2020-03-03Merge branch 'bpf_gso_size'Alexei Starovoitov7-14/+89
Willem de Bruijn says: ==================== See first patch for details. Patch split across three parts { kernel feature, uapi header, tools } following the custom for such __sk_buff changes. ==================== Acked-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-03selftests/bpf: Test new __sk_buff field gso_sizeWillem de Bruijn3-0/+50
Analogous to the gso_segs selftests introduced in commit d9ff286a0f59 ("bpf: allow BPF programs access skb_shared_info->gso_segs field"). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303200503.226217-4-willemdebruijn.kernel@gmail.com
2020-03-03bpf: Sync uapi bpf.h to tools/Willem de Bruijn1-0/+1
sync tools/include/uapi/linux/bpf.h to match include/uapi/linux/bpf.h Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303200503.226217-3-willemdebruijn.kernel@gmail.com
2020-03-03bpf: Add gso_size to __sk_buffWillem de Bruijn3-14/+38
BPF programs may want to know whether an skb is gso. The canonical answer is skb_is_gso(skb), which tests that gso_size != 0. Expose this field in the same manner as gso_segs. That field itself is not a sufficient signal, as the comment in skb_shared_info makes clear: gso_segs may be zero, e.g., from dodgy sources. Also prepare net/bpf/test_run for upcoming BPF_PROG_TEST_RUN tests of the feature. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303200503.226217-2-willemdebruijn.kernel@gmail.com
2020-03-02Merge branch 'bpf_link'Alexei Starovoitov8-73/+476
Andrii Nakryiko says: ==================== This patch series adds bpf_link abstraction, analogous to libbpf's already existing bpf_link abstraction. This formalizes and makes more uniform existing bpf_link-like BPF program link (attachment) types (raw tracepoint and tracing links), which are FD-based objects that are automatically detached when last file reference is closed. These types of BPF program links are switched to using bpf_link framework. FD-based bpf_link approach provides great safety guarantees, by ensuring there is not going to be an abandoned BPF program attached, if user process suddenly exits or forgets to clean up after itself. This is especially important in production environment and is what all the recent new BPF link types followed. One of the previously existing inconveniences of FD-based approach, though, was the scenario in which user process wants to install BPF link and exit, but let attached BPF program run. Now, with bpf_link abstraction in place, it's easy to support pinning links in BPF FS, which is done as part of the same patch #1. This allows FD-based BPF program links to survive exit of a user process and original file descriptor being closed, by creating an file entry in BPF FS. This provides great safety by default, with simple way to opt out for cases where it's needed. Corresponding libbpf APIs are added in the same patch set, as well as selftests for this functionality. Other types of BPF program attachments (XDP, cgroup, perf_event, etc) are going to be converted in subsequent patches to follow similar approach. v1->v2: - use bpf_link_new_fd() uniformly (Alexei). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-02selftests/bpf: Add link pinning selftestsAndrii Nakryiko2-0/+130
Add selftests validating link pinning/unpinning and associated BPF link (attachment) lifetime. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303043159.323675-4-andriin@fb.com
2020-03-02libbpf: Add bpf_link pinning/unpinningAndrii Nakryiko3-27/+114
With bpf_link abstraction supported by kernel explicitly, add pinning/unpinning API for links. Also allow to create (open) bpf_link from BPF FS file. This API allows to have an "ephemeral" FD-based BPF links (like raw tracepoint or fexit/freplace attachments) surviving user process exit, by pinning them in a BPF FS, which is an important use case for long-running BPF programs. As part of this, expose underlying FD for bpf_link. While legacy bpf_link's might not have a FD associated with them (which will be expressed as a bpf_link with fd=-1), kernel's abstraction is based around FD-based usage, so match it closely. This, subsequently, allows to have a generic pinning/unpinning API for generalized bpf_link. For some types of bpf_links kernel might not support pinning, in which case bpf_link__pin() will return error. With FD being part of generic bpf_link, also get rid of bpf_link_fd in favor of using vanialla bpf_link. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303043159.323675-3-andriin@fb.com
2020-03-02bpf: Introduce pinnable bpf_link abstractionAndrii Nakryiko3-46/+232
Introduce bpf_link abstraction, representing an attachment of BPF program to a BPF hook point (e.g., tracepoint, perf event, etc). bpf_link encapsulates ownership of attached BPF program, reference counting of a link itself, when reference from multiple anonymous inodes, as well as ensures that release callback will be called from a process context, so that users can safely take mutex locks and sleep. Additionally, with a new abstraction it's now possible to generalize pinning of a link object in BPF FS, allowing to explicitly prevent BPF program detachment on process exit by pinning it in a BPF FS and let it open from independent other process to keep working with it. Convert two existing bpf_link-like objects (raw tracepoint and tracing BPF program attachments) into utilizing bpf_link framework, making them pinnable in BPF FS. More FD-based bpf_links will be added in follow up patches. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303043159.323675-2-andriin@fb.com
2020-03-02selftests/bpf: Declare bpf_log_buf variables as staticToke Høiland-Jørgensen3-3/+3
The cgroup selftests did not declare the bpf_log_buf variable as static, leading to a linker error with GCC 10 (which defaults to -fno-common). Fix this by adding the missing static declarations. Fixes: 257c88559f36 ("selftests/bpf: Convert test_cgroup_attach to prog_tests") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20200302145348.559177-1-toke@redhat.com
2020-03-02bpf: Reliably preserve btf_trace_xxx typesAndrii Nakryiko1-7/+11
btf_trace_xxx types, crucial for tp_btf BPF programs (raw tracepoint with verifier-checked direct memory access), have to be preserved in kernel BTF to allow verifier do its job and enforce type/memory safety. It was reported ([0]) that for kernels built with Clang current type-casting approach doesn't preserve these types. This patch fixes it by declaring an anonymous union for each registered tracepoint, capturing both struct bpf_raw_event_map information, as well as recording btf_trace_##call type reliably. Structurally, it's still the same content as for a plain struct bpf_raw_event_map, so no other changes are necessary. [0] https://github.com/iovisor/bcc/issues/2770#issuecomment-591007692 Fixes: e8c423fb31fa ("bpf: Add typecast to raw_tracepoints to help BTF generation") Reported-by: Wenbo Zhang <ethercflow@gmail.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200301081045.3491005-2-andriin@fb.com
2020-03-02Merge branch 'move_BPF_PROG_to_libbpf'Alexei Starovoitov17-139/+140
Andrii Nakryiko says: ==================== Move BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE helper macros from private selftests helpers to public libbpf ones. These helpers are extremely helpful for writing tracing BPF applications and have been requested to be exposed for easy use (e.g., [0]). As part of this move, fix up BPF_KRETPROBE to not allow for capturing input arguments (as it's unreliable and they will be often clobbered). Also, add vmlinux.h header guard to allow multi-time inclusion, if necessary; but also to let PT_REGS_PARM do proper detection of struct pt_regs field names on x86 arch. See relevant patches for more details. [0] https://github.com/iovisor/bcc/pull/2778#issue-381642907 ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-02libbpf: Merge selftests' bpf_trace_helpers.h into libbpf's bpf_tracing.hAndrii Nakryiko16-135/+131
Move BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macro into libbpf's bpf_tracing.h header to make it available for non-selftests users. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200229231112.1240137-5-andriin@fb.com
2020-03-02selftests/bpf: Fix BPF_KRETPROBE macro and use it in attach_probe testAndrii Nakryiko3-11/+11
For kretprobes, there is no point in capturing input arguments from pt_regs, as they are going to be, most probably, clobbered by the time probed kernel function returns. So switch BPF_KRETPROBE to accept zero or one argument (optional return result). Fixes: ac065870d928 ("selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200229231112.1240137-4-andriin@fb.com
2020-03-02libbpf: Fix use of PT_REGS_PARM macros with vmlinux.hAndrii Nakryiko1-1/+1
Add detection of vmlinux.h to bpf_tracing.h header for PT_REGS macro. Currently, BPF applications have to define __KERNEL__ symbol to use correct definition of struct pt_regs on x86 arch. This is due to different field names under internal kernel vs UAPI conditions. To make this more transparent for users, detect vmlinux.h by checking __VMLINUX_H__ symbol. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200229231112.1240137-3-andriin@fb.com
2020-03-02bpftool: Add header guards to generated vmlinux.hAndrii Nakryiko1-0/+5
Add canonical #ifndef/#define/#endif guard for generated vmlinux.h header with __VMLINUX_H__ symbol. __VMLINUX_H__ is also going to play double role of identifying whether vmlinux.h is being used, versus, say, BCC or non-CO-RE libbpf modes with dependency on kernel headers. This will make it possible to write helper macro/functions, agnostic to exact BPF program set up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200229231112.1240137-2-andriin@fb.com
2020-03-02mvneta: add XDP ethtool errors stats for TX to driverJesper Dangaard Brouer1-4/+26
Adding ethtool stats for when XDP transmitted packets overrun the TX queue. This is recorded separately for XDP_TX and ndo_xdp_xmit. This is an important aid for troubleshooting XDP based setups. It is currently a known weakness and property of XDP that there isn't any push-back or congestion feedback when transmitting frames via XDP. It's easy to realise when redirecting from a higher speed link into a slower speed link, or simply two ingress links into a single egress. The situation can also happen when Ethernet flow control is active. For testing the patch and provoking the situation to occur on my Espressobin board, I configured the TX-queue to be smaller (434) than RX-queue (512) and overload network with large MTU size frames (as a larger frame takes longer to transmit). Hopefully the upcoming XDP TX hook can be extended to provide insight into these TX queue overflows, to allow programmable adaptation strategies. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>