linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2019-11-02	bpf, samples: Use bpf_probe_read_user where appropriate	Daniel Borkmann	3	-5/+5
	Use bpf_probe_read_user() helper instead of bpf_probe_read() for samples that attach to kprobes probing on user addresses. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/5b0144b3f8e031ec5e2438bd7de8d7877e63bf2f.1572649915.git.daniel@iogearbox.net
2019-11-02	bpf: Switch BPF probe insns to bpf_probe_read_kernel	Daniel Borkmann	1	-4/+5
	Commit 2a02759ef5f8 ("bpf: Add support for BTF pointers to interpreter") explicitly states that the pointer to BTF object is a pointer to a kernel object or NULL. Therefore we should also switch to using the strict kernel probe helper which is restricted to kernel addresses only when architectures have non-overlapping address spaces. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/d2b90827837685424a4b8008dfe0460558abfada.1572649915.git.daniel@iogearbox.net
2019-11-02	bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers	Daniel Borkmann	3	-126/+299
	The current bpf_probe_read() and bpf_probe_read_str() helpers are broken in that they assume they can be used for probing memory access for kernel space addresses /as well as/ user space addresses. However, plain use of probe_kernel_read() for both cases will attempt to always access kernel space address space given access is performed under KERNEL_DS and some archs in-fact have overlapping address spaces where a kernel pointer and user pointer would have the /same/ address value and therefore accessing application memory via bpf_probe_read{,_str}() would read garbage values. Lets fix BPF side by making use of recently added 3d7081822f7f ("uaccess: Add non-pagefault user-space read functions"). Unfortunately, the only way to fix this status quo is to add dedicated bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str() helpers. The bpf_probe_read{,_str}() helpers are kept as-is to retain their current behavior. The two _user() variants attempt the access always under USER_DS set, the two _kernel() variants will -EFAULT when accessing user memory if the underlying architecture has non-overlapping address ranges, also avoiding throwing the kernel warning via 00c42373d397 ("x86-64: add warning for non-canonical user access address dereferences"). Fixes: a5e8c07059d0 ("bpf: add bpf_probe_read_str helper") Fixes: 2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/796ee46e948bc808d54891a1108435f8652c6ca4.1572649915.git.daniel@iogearbox.net
2019-11-02	bpf: Make use of probe_user_write in probe write helper	Daniel Borkmann	1	-4/+2
	Convert the bpf_probe_write_user() helper to probe_user_write() such that writes are not attempted under KERNEL_DS anymore which is buggy as kernel and user space pointers can have overlapping addresses. Also, given we have the access_ok() check inside probe_user_write(), the helper doesn't need to do it twice. Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/841c461781874c07a0ee404a454c3bc0459eed30.1572649915.git.daniel@iogearbox.net
2019-11-02	uaccess: Add strict non-pagefault kernel-space read function	Daniel Borkmann	4	-2/+72
	Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict() helpers which by default alias to the __probe_kernel_read() and the __strncpy_from_unsafe(), respectively, but can be overridden by archs which have non-overlapping address ranges for kernel space and user space in order to bail out with -EFAULT when attempting to probe user memory including non-canonical user access addresses [0]: 4-level page tables: user-space mem: 0x0000000000000000 - 0x00007fffffffffff non-canonical: 0x0000800000000000 - 0xffff7fffffffffff 5-level page tables: user-space mem: 0x0000000000000000 - 0x00ffffffffffffff non-canonical: 0x0100000000000000 - 0xfeffffffffffffff The idea is that these helpers are complementary to the probe_user_read() and strncpy_from_unsafe_user() which probe user-only memory. Both added helpers here do the same, but for kernel-only addresses. Both set of helpers are going to be used for BPF tracing. They also explicitly avoid throwing the splat for non-canonical user addresses from 00c42373d397 ("x86-64: add warning for non-canonical user access address dereferences"). For compat, the current probe_kernel_read() and strncpy_from_unsafe() are left as-is. [0] Documentation/x86/x86_64/mm.txt Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: x86@kernel.org Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net
2019-11-02	uaccess: Add non-pagefault user-space write function	Daniel Borkmann	2	-4/+53
	Commit 3d7081822f7f ("uaccess: Add non-pagefault user-space read functions") missed to add probe write function, therefore factor out a probe_write_common() helper with most logic of probe_kernel_write() except setting KERNEL_DS, and add a new probe_user_write() helper so it can be used from BPF side. Again, on some archs, the user address space and kernel address space can co-exist and be overlapping, so in such case, setting KERNEL_DS would mean that the given address is treated as being in kernel address space. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/9df2542e68141bfa3addde631441ee45503856a8.1572649915.git.daniel@iogearbox.net
2019-11-02	Merge branch 'map-pinning'	Alexei Starovoitov	7	-81/+591
	Toke Høiland-Jørgensen says: ==================== This series adds support to libbpf for reading 'pinning' settings from BTF-based map definitions. It introduces a new open option which can set the pinning path; if no path is set, /sys/fs/bpf is used as the default. Callers can customise the pinning between open and load by setting the pin path per map, and still get the automatic reuse feature. The semantics of the pinning is similar to the iproute2 "PIN_GLOBAL" setting, and the eventual goal is to move the iproute2 implementation to be based on libbpf and the functions introduced in this series. Changelog: v6: - Fix leak of struct bpf_object in selftest - Make struct bpf_map arg const in bpf_map__is_pinned() and bpf_map__get_pin_path() v5: - Don't pin maps with pinning set, but with a value of LIBBPF_PIN_NONE - Add a few more selftests: - Should not pin map with pinning set, but value LIBBPF_PIN_NONE - Should fail to load a map with an invalid pinning value - Should fail to re-use maps with parameter mismatch - Alphabetise libbpf.map - Whitespace and typo fixes v4: - Don't check key_type_id and value_type_id when checking for map reuse compatibility. - Move building of map->pin_path into init_user_btf_map() - Get rid of 'pinning' attribute in struct bpf_map - Make sure we also create parent directory on auto-pin (new patch 3). - Abort the selftest on error instead of attempting to continue. - Support unpinning all pinned maps with bpf_object__unpin_maps(obj, NULL) - Support pinning at map->pin_path with bpf_object__pin_maps(obj, NULL) - Make re-pinning a map at the same path a noop - Rename the open option to pin_root_path - Add a bunch more self-tests for pin_maps(NULL) and unpin_maps(NULL) - Fix a couple of smaller nits v3: - Drop bpf_object__pin_maps_opts() and just use an open option to customise the pin path; also don't touch bpf_object__{un,}pin_maps() - Integrate pinning and reuse into bpf_object__create_maps() instead of having multiple loops though the map structure - Make errors in map reuse and pinning fatal to the load procedure - Add selftest to exercise pinning feature - Rebase series to latest bpf-next v2: - Drop patch that adds mounting of bpffs - Only support a single value of the pinning attribute - Add patch to fixup error handling in reuse_fd() - Implement the full automatic pinning and map reuse logic on load ==================== Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-02	selftests: Add tests for automatic map pinning	Toke Høiland-Jørgensen	3	-0/+257
	This adds a new BPF selftest to exercise the new automatic map pinning code. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269298209.394725.15420085139296213182.stgit@toke.dk
2019-11-02	libbpf: Add auto-pinning of maps when loading BPF objects	Toke Høiland-Jørgensen	3	-9/+156
	This adds support to libbpf for setting map pinning information as part of the BTF map declaration, to get automatic map pinning (and reuse) on load. The pinning type currently only supports a single PIN_BY_NAME mode, where each map will be pinned by its name in a path that can be overridden, but defaults to /sys/fs/bpf. Since auto-pinning only does something if any maps actually have a 'pinning' BTF attribute set, we default the new option to enabled, on the assumption that seamless pinning is what most callers want. When a map has a pin_path set at load time, libbpf will compare the map pinned at that location (if any), and if the attributes match, will re-use that map instead of creating a new one. If no existing map is found, the newly created map will instead be pinned at the location. Programs wanting to customise the pinning can override the pinning paths using bpf_map__set_pin_path() before calling bpf_object__load() (including setting it to NULL to disable pinning of a particular map). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269298092.394725.3966306029218559681.stgit@toke.dk
2019-11-02	libbpf: Move directory creation into _pin() functions	Toke Høiland-Jørgensen	1	-27/+34
	The existing pin_*() functions all try to create the parent directory before pinning. Move this check into the per-object _pin() functions instead. This ensures consistent behaviour when auto-pinning is added (which doesn't go through the top-level pin_maps() function), at the cost of a few more calls to mkdir(). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269297985.394725.5882630952992598610.stgit@toke.dk
2019-11-02	libbpf: Store map pin path and status in struct bpf_map	Toke Høiland-Jørgensen	3	-41/+134
	Support storing and setting a pin path in struct bpf_map, which can be used for automatic pinning. Also store the pin status so we can avoid attempts to re-pin a map that has already been pinned (or reused from a previous pinning). The behaviour of bpf_object__{un,}pin_maps() is changed so that if it is called with a NULL path argument (which was previously illegal), it will (un)pin only those maps that have a pin_path set. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269297876.394725.14782206533681896279.stgit@toke.dk
2019-11-02	libbpf: Fix error handling in bpf_map__reuse_fd()	Toke Høiland-Jørgensen	1	-4/+10
	bpf_map__reuse_fd() was calling close() in the error path before returning an error value based on errno. However, close can change errno, so that can lead to potentially misleading error messages. Instead, explicitly store errno in the err variable before each goto. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157269297769.394725.12634985106772698611.stgit@toke.dk
2019-11-02	Merge branch 'bpf-xskmap-perf-improvements'	Daniel Borkmann	4	-115/+106
	Björn Töpel says: ==================== This set consists of three patches from Maciej and myself which are optimizing the XSKMAP lookups. In the first patch, the sockets are moved to be stored at the tail of the struct xsk_map. The second patch, Maciej implements map_gen_lookup() for XSKMAP. The third patch, introduced in this revision, moves various XSKMAP functions, to permit the compiler to do more aggressive inlining. Based on the XDP program from tools/lib/bpf/xsk.c where bpf_map_lookup_elem() is explicitly called, this work yields a 5% improvement for xdpsock's rxdrop scenario. The last patch yields 2% improvement. Jonathan's Acked-by: for patch 1 and 2 was carried on. Note that the overflow checks are done in the bpf_map_area_alloc() and bpf_map_charge_init() functions, which was fixed in commit ff1c08e1f74b ("bpf: Change size to u64 for bpf_map_{area_alloc, charge_init}()"). [1] https://patchwork.ozlabs.org/patch/1186170/ v1->v2: * Change size/cost to size_t and use {struct, array}_size where appropriate. (Jakub) v2->v3: * Proper commit message for patch 2. v3->v4: * Change size_t to u64 to handle 32-bit overflows. (Jakub) * Introduced patch 3. v4->v5: * Use BPF_SIZEOF size, instead of BPF_DW, for correct pointer-sized loads. (Daniel) ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-02	xsk: Restructure/inline XSKMAP lookup/redirect/flush	Björn Töpel	4	-87/+70
	In this commit the XSKMAP entry lookup function used by the XDP redirect code is moved from the xskmap.c file to the xdp_sock.h header, so the lookup can be inlined from, e.g., the bpf_xdp_redirect_map() function. Further the __xsk_map_redirect() and __xsk_map_flush() is moved to the xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush() functions. Finally, all the XDP socket functions were moved from linux/bpf.h to net/xdp_sock.h, where most of the XDP sockets functions are anyway. This yields a ~2% performance boost for the xdpsock "rx_drop" scenario. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com
2019-11-02	bpf: Implement map_gen_lookup() callback for XSKMAP	Maciej Fijalkowski	1	-0/+17
	Inline the xsk_map_lookup_elem() via implementing the map_gen_lookup() callback. This results in emitting the bpf instructions in place of bpf_map_lookup_elem() helper call and better performance of bpf programs. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/20191101110346.15004-3-bjorn.topel@gmail.com
2019-11-02	xsk: Store struct xdp_sock as a flexible array member of the XSKMAP	Björn Töpel	1	-32/+23
	Prior this commit, the array storing XDP socket instances were stored in a separate allocated array of the XSKMAP. Now, we store the sockets as a flexible array member in a similar fashion as the arraymap. Doing so, we do less pointer chasing in the lookup. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/20191101110346.15004-2-bjorn.topel@gmail.com
2019-11-01	Revert "selftests: bpf: Don't try to read files without read permission"	Jakub Kicinski	1	-1/+1
	This reverts commit 5bc60de50dfe ("selftests: bpf: Don't try to read files without read permission"). Quoted commit does not work at all, and was never tested. Script requires root permissions (and tests for them) and os.access() will always return true for root. The correct fix is needed in the bpf tree, so let's just revert and save ourselves the merge conflict. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Pirko <jiri@resnulli.us> Link: https://lore.kernel.org/bpf/20191101005127.1355-1-jakub.kicinski@netronome.com
2019-10-31	Merge branch 'bpf-cleanup-btf-raw-tp'	Daniel Borkmann	13	-49/+142
	Alexei Starovoitov says: ==================== v1->v2: addressed Andrii's feedback When BTF-enabled raw_tp were introduced the plan was to follow up with BTF-enabled kprobe and kretprobe reusing PROG_RAW_TRACEPOINT and PROG_KPROBE types. But k[ret]probe expect pt_regs while BTF-enabled program ctx will be the same as raw_tp. kretprobe is indistinguishable from kprobe while BTF-enabled kretprobe will have access to retval while kprobe will not. Hence PROG_KPROBE type is not reusable and reusing PROG_RAW_TRACEPOINT no longer fits well. Hence introduce 'umbrella' prog type BPF_PROG_TYPE_TRACING that will cover different BTF-enabled tracing attach points. The changes make libbpf side cleaner as well. check_attach_btf_id() is cleaner too. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-10-31	libbpf: Add support for prog_tracing	Alexei Starovoitov	7	-28/+71
	Cleanup libbpf from expected_attach_type == attach_btf_id hack and introduce BPF_PROG_TYPE_TRACING. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191030223212.953010-3-ast@kernel.org
2019-10-31	bpf: Replace prog_raw_tp+btf_id with prog_tracing	Alexei Starovoitov	6	-21/+71
	The bpf program type raw_tp together with 'expected_attach_type' was the most appropriate api to indicate BTF-enabled raw_tp programs. But during development it became apparent that 'expected_attach_type' cannot be used and new 'attach_btf_id' field had to be introduced. Which means that the information is duplicated in two fields where one of them is ignored. Clean it up by introducing new program type where both 'expected_attach_type' and 'attach_btf_id' fields have specific meaning. In the future 'expected_attach_type' will be extended with other attach points that have similar semantics to raw_tp. This patch is replacing BTF-enabled BPF_PROG_TYPE_RAW_TRACEPOINT with prog_type = BPF_RPOG_TYPE_TRACING expected_attach_type = BPF_TRACE_RAW_TP attach_btf_id = btf_id of raw tracepoint inside the kernel Future patches will add expected_attach_type = BPF_TRACE_FENTRY or BPF_TRACE_FEXIT where programs have the same input context and the same helpers, but different attach points. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191030223212.953010-2-ast@kernel.org
2019-10-31	bpf: Fix bpf jit kallsym access	Alexei Starovoitov	1	-3/+0
	Jiri reported crash when JIT is on, but net.core.bpf_jit_kallsyms is off. bpf_prog_kallsyms_find() was skipping addr->bpf_prog resolution logic in oops and stack traces. That's incorrect. It should only skip addr->name resolution for 'cat /proc/kallsyms'. That's what bpf_jit_kallsyms and bpf_jit_harden protect. Fixes: 3dec541b2e63 ("bpf: Add support for BTF pointers to x86 JIT") Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191030233019.1187404-1-ast@kernel.org
2019-10-30	bpf, testing: Introduce 'gso_linear_no_head_frag' skb_segment test	Shmulik Ladkani	1	-0/+63
	Following reports of skb_segment() hitting a BUG_ON when working on GROed skbs which have their gso_size mangled (e.g. after a bpf_skb_change_proto call), add a reproducer test that mimics the input skbs that lead to the mentioned BUG_ON as in [1] and validates the fix submitted in [2]. [1] https://lists.openwall.net/netdev/2019/08/26/110 [2] commit 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list") Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191025134223.2761-3-shmulik.ladkani@gmail.com
2019-10-30	bpf, testing: Refactor test_skb_segment() for testing skb_segment() on ↵	Shmulik Ladkani	1	-10/+41
	different skbs Currently, test_skb_segment() builds a single test skb and runs skb_segment() on it. Extend test_skb_segment() so it processes an array of numerous skb/feature pairs to test. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191025134223.2761-2-shmulik.ladkani@gmail.com
2019-10-30	bpf: Add s390 testing documentation	Ilya Leoshkevich	2	-0/+214
	This commits adds a document that explains how to test BPF in an s390 QEMU guest. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191029172916.36528-1-iii@linux.ibm.com
2019-10-30	selftests/bpf: Test narrow load from bpf_sysctl.write	Ilya Leoshkevich	1	-0/+23
	There are tests for full and narrows loads from bpf_sysctl.file_pos, but for bpf_sysctl.write only full load is tested. Add the missing test. Suggested-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20191029143027.28681-1-iii@linux.ibm.com
2019-10-30	bpf: Enforce 'return 0' in BTF-enabled raw_tp programs	Alexei Starovoitov	1	-0/+5
	The return value of raw_tp programs is ignored by __bpf_trace_run() that calls them. The verifier also allows any value to be returned. For BTF-enabled raw_tp lets enforce 'return 0', so that return value can be used for something in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191029032426.1206762-1-ast@kernel.org
2019-10-29	libbpf: Don't use kernel-side u32 type in xsk.c	Andrii Nakryiko	1	-4/+4
	u32 is a kernel-side typedef. User-space library is supposed to use __u32. This breaks Github's projection of libbpf. Do u32 -> __u32 fix. Fixes: 94ff9ebb49a5 ("libbpf: Fix compatibility for kernels without need_wakeup") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20191029055953.2461336-1-andriin@fb.com
2019-10-28	libbpf: Fix off-by-one error in ELF sanity check	Andrii Nakryiko	1	-1/+1
	libbpf's bpf_object__elf_collect() does simple sanity check after iterating over all ELF sections, if checks that .strtab index is correct. Unfortunately, due to section indices being 1-based, the check breaks for cases when .strtab ends up being the very last section in ELF. Fixes: 77ba9a5b48a7 ("tools lib bpf: Fetch map names from correct strtab") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191028233727.1286699-1-andriin@fb.com
2019-10-28	libbpf: Fix compatibility for kernels without need_wakeup	Magnus Karlsson	1	-12/+71
	When the need_wakeup flag was added to AF_XDP, the format of the XDP_MMAP_OFFSETS getsockopt was extended. Code was added to the kernel to take care of compatibility issues arrising from running applications using any of the two formats. However, libbpf was not extended to take care of the case when the application/libbpf uses the new format but the kernel only supports the old format. This patch adds support in libbpf for parsing the old format, before the need_wakeup flag was added, and emulating a set of static need_wakeup flags that will always work for the application. v2 -> v3: * Incorporated code improvements suggested by Jonathan Lemon v1 -> v2: * Rebased to bpf-next * Rewrote the code as the previous version made you blind Fixes: a4500432c2587cb2a ("libbpf: add support for need_wakeup flag in AF_XDP part") Reported-by: Eloy Degen <degeneloy@gmail.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1571995035-21889-1-git-send-email-magnus.karlsson@intel.com
2019-10-28	selftests/bpf: Restore $(OUTPUT)/test_stub.o rule	Ilya Leoshkevich	1	-0/+3
	`make O=/linux-build kselftest TARGETS=bpf` fails with make[3]: *** No rule to make target '/linux-build/bpf/test_stub.o', needed by '/linux-build/bpf/test_verifier' The same command without the O= part works, presumably thanks to the implicit rule. Fix by restoring the explicit $(OUTPUT)/test_stub.o rule. Fixes: 74b5a5968fe8 ("selftests/bpf: Replace test_progs and test_maps w/ general rule") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191028102110.7545-1-iii@linux.ibm.com
2019-10-28	selftest/bpf: Use -m{little, big}-endian for clang	Ilya Leoshkevich	1	-6/+7
	When cross-compiling tests from x86 to s390, the resulting BPF objects fail to load due to endianness mismatch. Fix by using BPF-GCC endianness check for clang as well. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191028102049.7489-1-iii@linux.ibm.com
2019-10-26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller	65	-1100/+2604
	Daniel Borkmann says: ==================== pull-request: bpf-next 2019-10-27 The following pull-request contains BPF updates for your net-next tree. We've added 52 non-merge commits during the last 11 day(s) which contain a total of 65 files changed, 2604 insertions(+), 1100 deletions(-). The main changes are: 1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF assembly code. The work here teaches BPF verifier to recognize kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints such that verifier allows direct use of bpf_skb_event_output() helper used in tc BPF et al (w/o probing memory access) that dumps skb data into perf ring buffer. Also add direct loads to probe memory in order to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov. 2) Big batch of changes to improve libbpf and BPF kselftests. Besides others: generalization of libbpf's CO-RE relocation support to now also include field existence relocations, revamp the BPF kselftest Makefile to add test runner concept allowing to exercise various ways to build BPF programs, and teach bpf_object__open() and friends to automatically derive BPF program type/expected attach type from section names to ease their use, from Andrii Nakryiko. 3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu. 4) Allow to read BTF as raw data from bpftool. Most notable use case is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa. 5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which manages to improve "rx_drop" performance by ~4%., from Björn Töpel. 6) Fix to restore the flow dissector after reattach BPF test and also fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki. 7) Improve verifier's BTF ctx access for use outside of raw_tp, from Martin KaFai Lau. 8) Improve documentation for AF_XDP with new sections and to reflect latest features, from Magnus Karlsson. 9) Add back 'version' section parsing to libbpf for old kernels, from John Fastabend. 10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(), from KP Singh. 11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order to improve insn coverage for built BPF progs, from Yonghong Song. 12) Misc minor cleanups and fixes, from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26	tc-testing: list required kernel options for act_ct action	Roman Mashak	1	-0/+10
	Updated config with required kernel options for conntrac TC action, so that tdc can run the tests. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next	David S. Miller	43	-730/+1346
	Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, more specifically: * Updates for ipset: 1) Coding style fix for ipset comment extension, from Jeremy Sowden. 2) De-inline many functions in ipset, from Jeremy Sowden. 3) Move ipset function definition from header to source file. 4) Move ip_set_put_flags() to source, export it as a symbol, remove inline. 5) Move range_to_mask() to the source file where this is used. 6) Move ip_set_get_ip_port() to the source file where this is used. * IPVS selftests and netns improvements: 7) Two patches to speedup ipvs netns dismantle, from Haishuang Yan. 8) Three patches to add selftest script for ipvs, also from Haishuang Yan. * Conntrack updates and new nf_hook_slow_list() function: 9) Document ct ecache extension, from Florian Westphal. 10) Skip ct extensions from ctnetlink dump, from Florian. 11) Free ct extension immediately, from Florian. 12) Skip access to ecache extension from nf_ct_deliver_cached_events() this is not correct as reported by Syzbot. 13) Add and use nf_hook_slow_list(), from Florian. * Flowtable infrastructure updates: 14) Move priority to nf_flowtable definition. 15) Dynamic allocation of per-device hooks in flowtables. 16) Allow to include netdevice only once in flowtable definitions. 17) Rise maximum number of devices per flowtable. * Netfilter hardware offload infrastructure updates: 18) Add nft_flow_block_chain() helper function. 19) Pass callback list to nft_setup_cb_call(). 20) Add nft_flow_cls_offload_setup() helper function. 21) Remove rules for the unregistered device via netdevice event. 22) Support for multiple devices in a basechain definition at the ingress hook. 22) Add nft_chain_offload_cmd() helper function. 23) Add nft_flow_block_offload_init() helper function. 24) Rewind in case of failing to bind multiple devices to hook. 25) Typo in IPv6 tproxy module description, from Norman Rasmussen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26	Merge branch 'net-aquantia-ptp-followup-fixes'	David S. Miller	5	-8/+92
	Igor Russkikh says: ==================== net: aquantia: ptp followup fixes Here are two sparse warnings, third patch is a fix for scaled_ppm_to_ppb missing. Eventually I reworked this to exclude ptp module from build. Please consider it instead of this patch: https://patchwork.ozlabs.org/patch/1184171/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26	net: aquantia: disable ptp object build if no config	Igor Russkikh	2	-1/+86
	We do disable aq_ptp module build using inline stubs when CONFIG_PTP_1588_CLOCK is not declared. This reduces module size and removes unnecessary code. Reported-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26	net: aquantia: fix warnings on endianness	Igor Russkikh	2	-6/+5
	fixes to remove sparse warnings: sparse: sparse: cast to restricted __be64 Fixes: 04a1839950d9 ("net: aquantia: implement data PTP datapath") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26	net: aquantia: fix var initialization warning	Igor Russkikh	1	-1/+1
	found by sparse, simply useless local initialization with zero. Fixes: 94ad94558b0f ("net: aquantia: add PTP rings infrastructure") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26	netfilter: nf_tables_offload: unbind if multi-device binding fails	Pablo Neira Ayuso	1	-2/+17
	nft_flow_block_chain() needs to unbind in case of error when performing the multi-device binding. Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook") Reported-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-26	netfilter: nf_tables_offload: add nft_flow_block_offload_init()	Pablo Neira Ayuso	1	-21/+21
	This patch adds the nft_flow_block_offload_init() helper function to initialize the flow_block_offload object. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-26	netfilter: nf_tables_offload: add nft_chain_offload_cmd()	Pablo Neira Ayuso	1	-5/+15
	This patch adds the nft_chain_offload_cmd() helper function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-26	netfilter: ecache: don't look for ecache extension on dying/unconfirmed ↵	Florian Westphal	1	-3/+3
	conntracks syzbot reported following splat: BUG: KASAN: use-after-free in __nf_ct_ext_exist include/net/netfilter/nf_conntrack_extend.h:53 [inline] BUG: KASAN: use-after-free in nf_ct_deliver_cached_events+0x5c3/0x6d0 net/netfilter/nf_conntrack_ecache.c:205 nf_conntrack_confirm include/net/netfilter/nf_conntrack_core.h:65 [inline] nf_confirm+0x3d8/0x4d0 net/netfilter/nf_conntrack_proto.c:154 [..] While there is no reproducer yet, the syzbot report contains one interesting bit of information: Freed by task 27585: [..] kfree+0x10a/0x2c0 mm/slab.c:3757 nf_ct_ext_destroy+0x2ab/0x2e0 net/netfilter/nf_conntrack_extend.c:38 nf_conntrack_free+0x8f/0xe0 net/netfilter/nf_conntrack_core.c:1418 destroy_conntrack+0x1a2/0x270 net/netfilter/nf_conntrack_core.c:626 nf_conntrack_put include/linux/netfilter/nf_conntrack_common.h:31 [inline] nf_ct_resolve_clash net/netfilter/nf_conntrack_core.c:915 [inline] ^^^^^^^^^^^^^^^^^^^ __nf_conntrack_confirm+0x21ca/0x2830 net/netfilter/nf_conntrack_core.c:1038 nf_conntrack_confirm include/net/netfilter/nf_conntrack_core.h:63 [inline] nf_confirm+0x3e7/0x4d0 net/netfilter/nf_conntrack_proto.c:154 This is whats happening: 1. a conntrack entry is about to be confirmed (added to hash table). 2. a clash with existing entry is detected. 3. nf_ct_resolve_clash() puts skb->nfct (the "losing" entry). 4. this entry now has a refcount of 0 and is freed to SLAB_TYPESAFE_BY_RCU kmem cache. skb->nfct has been replaced by the one found in the hash. Problem is that nf_conntrack_confirm() uses the old ct: static inline int nf_conntrack_confirm(struct sk_buff skb) { struct nf_conn ct = (struct nf_conn )skb_nfct(skb); int ret = NF_ACCEPT; if (ct) { if (!nf_ct_is_confirmed(ct)) ret = __nf_conntrack_confirm(skb); if (likely(ret == NF_ACCEPT)) nf_ct_deliver_cached_events(ct); / This ct has refcount 0! */ } return ret; } As of "netfilter: conntrack: free extension area immediately", we can't access conntrack extensions in this case. To fix this, make sure we check the dying bit presence before attempting to get the eache extension. Reported-by: syzbot+c7aabc9fe93e7f3637ba@syzkaller.appspotmail.com Fixes: 2ad9d7747c10d1 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-25	Merge branch 'ionic-updates'	David S. Miller	7	-186/+413
	Shannon Nelson says: ==================== ionic updates These are a few of the driver updates we've been working on internally. These clean up a few mismatched struct comments, add checking for dead firmware, fix an initialization bug, and change the Rx buffer management. These are based on net-next v5.4-rc3-709-g985fd98ab5cc. v2: clear napi->skb in the error case in ionic_rx_frags() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25	ionic: update driver version	Shannon Nelson	1	-1/+1
	Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25	ionic: implement support for rx sgl	Shannon Nelson	3	-84/+224
	Even out Rx performance across MTU sizes by changing from full skb allocations to page-based frag allocations. The device supports a form of scatter-gather in the Rx path, so we can set up a number of pages for each descriptor, all of which are easier to alloc and pass around than the standard kzalloc'd buffer. An skb is wrapped around the pages while processing the received packets, and pages are recycled as needed, or left alone if they weren't used in the Rx. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25	ionic: add a watchdog timer to monitor heartbeat	Shannon Nelson	3	-2/+20
	Add a watchdog to periodically monitor the NIC heartbeat. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25	ionic: add heartbeat check	Shannon Nelson	3	-1/+70
	Most of our firmware has a heartbeat feature that the driver can watch for to see if the FW is still alive and likely to answer a dev_cmd or AdminQ request. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25	ionic: reverse an interrupt coalesce calculation	Shannon Nelson	1	-1/+1
	Fix the initial interrupt coalesce usec-to-hw setting to actually be usec-to-hw. Fixes: 780eded34ccc ("ionic: report users coalesce request") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25	ionic: fix up struct name comments	Shannon Nelson	1	-98/+98
	Fix up struct names in the ionic_if.h comments Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25	r8169: improve rtl8169_rx_fill	Heiner Kallweit	1	-6/+3
	We have only one user of the error path, so we can inline it. In addition the call to rtl8169_make_unusable_by_asic() can be removed because rtl8169_alloc_rx_data() didn't call rtl8169_mark_to_asic() yet for the respective index if returning NULL. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>