summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-03-10bpf-lsm: Make bpf_lsm_kernel_read_file() as sleepableRoberto Sassu1-0/+1
Make bpf_lsm_kernel_read_file() as sleepable, so that bpf_ima_inode_hash() or bpf_ima_file_hash() can be called inside the implementation of this hook. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-8-roberto.sassu@huawei.com
2022-03-10bpf-lsm: Introduce new helper bpf_ima_file_hash()Roberto Sassu1-0/+20
ima_file_hash() has been modified to calculate the measurement of a file on demand, if it has not been already performed by IMA or the measurement is not fresh. For compatibility reasons, ima_inode_hash() remains unchanged. Keep the same approach in eBPF and introduce the new helper bpf_ima_file_hash() to take advantage of the modified behavior of ima_file_hash(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220302111404.193900-4-roberto.sassu@huawei.com
2022-03-10bpf: Use offsetofend() to simplify macro definitionYuntao Wang1-2/+1
Use offsetofend() instead of offsetof() + sizeof() to simplify MIN_BPF_LINEINFO_SIZE macro definition. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/bpf/20220310161518.534544-1-ytcoode@gmail.com
2022-03-09bpf: Add "live packet" mode for XDP in BPF_PROG_RUNToke Høiland-Jørgensen2-1/+2
This adds support for running XDP programs through BPF_PROG_RUN in a mode that enables live packet processing of the resulting frames. Previous uses of BPF_PROG_RUN for XDP returned the XDP program return code and the modified packet data to userspace, which is useful for unit testing of XDP programs. The existing BPF_PROG_RUN for XDP allows userspace to set the ingress ifindex and RXQ number as part of the context object being passed to the kernel. This patch reuses that code, but adds a new mode with different semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES flag. When running BPF_PROG_RUN in this mode, the XDP program return codes will be honoured: returning XDP_PASS will result in the frame being injected into the networking stack as if it came from the selected networking interface, while returning XDP_TX and XDP_REDIRECT will result in the frame being transmitted out that interface. XDP_TX is translated into an XDP_REDIRECT operation to the same interface, since the real XDP_TX action is only possible from within the network drivers themselves, not from the process context where BPF_PROG_RUN is executed. Internally, this new mode of operation creates a page pool instance while setting up the test run, and feeds pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support the performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 9 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Enabling the new flag is only allowed when not setting ctx_out and data_out in the test specification, since using it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
2022-03-08bpf: Determine buf_info inside check_buffer_access()Shung-Hsi Yu1-9/+3
Instead of determining buf_info string in the caller of check_buffer_access(), we can determine whether the register type is read-only through type_is_rdonly_mem() helper inside check_buffer_access() and construct buf_info, making the code slightly cleaner. Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/YiWYLnAkEZXBP/gH@syu-laptop
2022-03-07bpf: Remove redundant slashYuntao Wang1-3/+2
The trailing slash of LIBBPF_SRCS is redundant, remove it. Also inline it as its only used in LIBBPF_INCLUDE. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220305161013.361646-1-ytcoode@gmail.com
2022-03-07bpf: Replace strncpy() with strscpy()Yuntao Wang1-7/+2
Using strncpy() on NUL-terminated strings is considered deprecated[1]. Moreover, if the length of 'task->comm' is less than the destination buffer size, strncpy() will NUL-pad the destination buffer, which is a needless performance penalty. Replacing strncpy() with strscpy() fixes all these issues. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304070408.233658-1-ytcoode@gmail.com
2022-03-05bpf: Reject programs that try to load __percpu memory.Hao Luo2-11/+21
With the introduction of the btf_type_tag "percpu", we can add a MEM_PERCPU to identify those pointers that point to percpu memory. The ability of differetiating percpu pointers from regular memory pointers have two benefits: 1. It forbids unexpected use of percpu pointers, such as direct loads. In kernel, there are special functions used for accessing percpu memory. Directly loading percpu memory is meaningless. We already have BPF helpers like bpf_per_cpu_ptr() and bpf_this_cpu_ptr() that wrap the kernel percpu functions. So we can now convert percpu pointers into regular pointers in a safe way. 2. Previously, bpf_per_cpu_ptr() and bpf_this_cpu_ptr() only work on PTR_TO_PERCPU_BTF_ID, a special reg_type which describes static percpu variables in kernel (we rely on pahole to encode them into vmlinux BTF). Now, since we can identify __percpu tagged pointers, we can also identify dynamically allocated percpu memory as well. It means we can use bpf_xxx_cpu_ptr() on dynamic percpu memory. This would be very convenient when accessing fields like "cgroup->rstat_cpu". Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304191657.981240-4-haoluo@google.com
2022-03-05bpf: Fix checking PTR_TO_BTF_ID in check_mem_accessHao Luo1-1/+2
With the introduction of MEM_USER in commit c6f1bfe89ac9 ("bpf: reject program if a __user tagged memory accessed in kernel way") PTR_TO_BTF_ID can be combined with a MEM_USER tag. Therefore, most likely, when we compare reg_type against PTR_TO_BTF_ID, we want to use the reg's base_type. Previously the check in check_mem_access() wants to say: if the reg is BTF_ID but not NULL, the execution flow falls into the 'then' branch. But now a reg of (BTF_ID | MEM_USER), which should go into the 'then' branch, goes into the 'else'. The end results before and after this patch are the same: regs tagged with MEM_USER get rejected, but not in a way we intended. So fix the condition, the error message now is correct. Before (log from commit 696c39011538): $ ./test_progs -v -n 22/3 ... libbpf: prog 'test_user1': BPF program load failed: Permission denied libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG -- R1 type=ctx expected=fp 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg) 0: (79) r1 = *(u64 *)(r1 +0) func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 136561 type STRUCT 'bpf_testmod_btf_type_tag_1' 1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,off=0,imm=0) ; g = arg->a; 1: (61) r1 = *(u32 *)(r1 +0) R1 invalid mem access 'user_ptr_' Now: libbpf: prog 'test_user1': BPF program load failed: Permission denied libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG -- R1 type=ctx expected=fp 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg) 0: (79) r1 = *(u64 *)(r1 +0) func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 104036 type STRUCT 'bpf_testmod_btf_type_tag_1' 1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,ref_obj_id=0,off=0,imm=0) ; g = arg->a; 1: (61) r1 = *(u32 *)(r1 +0) R1 is ptr_bpf_testmod_btf_type_tag_1 access user memory: off=0 Note the error message for the reason of rejection. Fixes: c6f1bfe89ac9 ("bpf: reject program if a __user tagged memory accessed in kernel way") Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304191657.981240-2-haoluo@google.com
2022-03-05bpf: Harden register offset checks for release helpers and kfuncsKumar Kartikeya Dwivedi2-17/+41
Let's ensure that the PTR_TO_BTF_ID reg being passed in to release BPF helpers and kfuncs always has its offset set to 0. While not a real problem now, there's a very real possibility this will become a problem when more and more kfuncs are exposed, and more BPF helpers are added which can release PTR_TO_BTF_ID. Previous commits already protected against non-zero var_off. One of the case we are concerned about now is when we have a type that can be returned by e.g. an acquire kfunc: struct foo { int a; int b; struct bar b; }; ... and struct bar is also a type that can be returned by another acquire kfunc. Then, doing the following sequence: struct foo *f = bpf_get_foo(); // acquire kfunc if (!f) return 0; bpf_put_bar(&f->b); // release kfunc ... would work with the current code, since the btf_struct_ids_match takes reg->off into account for matching pointer type with release kfunc argument type, but would obviously be incorrect, and most likely lead to a kernel crash. A test has been included later to prevent regressions in this area. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-5-memxor@gmail.com
2022-03-05bpf: Disallow negative offset in check_ptr_off_regKumar Kartikeya Dwivedi1-0/+6
check_ptr_off_reg only allows fixed offset to be set for PTR_TO_BTF_ID, where reg->off < 0 doesn't make sense. This would shift the pointer backwards, and fails later in btf_struct_ids_match or btf_struct_walk due to out of bounds access (since offset is interpreted as unsigned). Improve the verifier by rejecting this case by using a better error message for BPF helpers and kfunc, by putting a check inside the check_func_arg_reg_off function. Also, update existing verifier selftests to work with new error string. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-4-memxor@gmail.com
2022-03-05bpf: Fix PTR_TO_BTF_ID var_off checkKumar Kartikeya Dwivedi1-3/+6
When kfunc support was added, check_ctx_reg was called for PTR_TO_CTX register, but no offset checks were made for PTR_TO_BTF_ID. Only reg->off was taken into account by btf_struct_ids_match, which protected against type mismatch due to non-zero reg->off, but when reg->off was zero, a user could set the variable offset of the register and allow it to be passed to kfunc, leading to bad pointer being passed into the kernel. Fix this by reusing the extracted helper check_func_arg_reg_off from previous commit, and make one call before checking all supported register types. Since the list is maintained, any future changes will be taken into account by updating check_func_arg_reg_off. This function prevents non-zero var_off to be set for PTR_TO_BTF_ID, but still allows a fixed non-zero reg->off, which is needed for type matching to work correctly when using pointer arithmetic. ARG_DONTCARE is passed as arg_type, since kfunc doesn't support accepting a ARG_PTR_TO_ALLOC_MEM without relying on size of parameter type from BTF (in case of pointer), or using a mem, len pair. The forcing of offset check for ARG_PTR_TO_ALLOC_MEM is done because ringbuf helpers obtain the size from the header located at the beginning of the memory region, hence any changes to the original pointer shouldn't be allowed. In case of kfunc, size is always known, either at verification time, or using the length parameter, hence this forcing is not required. Since this check will happen once already for PTR_TO_CTX, remove the check_ptr_off_reg call inside its block. Fixes: e6ac2450d6de ("bpf: Support bpf program calling kernel function") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-3-memxor@gmail.com
2022-03-05bpf: Add check_func_arg_reg_off functionKumar Kartikeya Dwivedi1-28/+41
Lift the list of register types allowed for having fixed and variable offsets when passed as helper function arguments into a common helper, so that they can be reused for kfunc checks in later commits. Keeping a common helper aids maintainability and allows us to follow the same consistent rules across helpers and kfuncs. Also, convert check_func_arg to use this function. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-2-memxor@gmail.com
2022-03-04Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski14-50/+82
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-03-04 We've added 32 non-merge commits during the last 14 day(s) which contain a total of 59 files changed, 1038 insertions(+), 473 deletions(-). The main changes are: 1) Optimize BPF stackmap's build_id retrieval by caching last valid build_id, as consecutive stack frames are likely to be in the same VMA and therefore have the same build id, from Hao Luo. 2) Several improvements to arm64 BPF JIT, that is, support for JITing the atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and lastly atomic[64]_{xchg|cmpxchg}. Also fix the BTF line info dump for JITed programs, from Hou Tao. 3) Optimize generic BPF map batch deletion by only enforcing synchronize_rcu() barrier once upon return to user space, from Eric Dumazet. 4) For kernel build parse DWARF and generate BTF through pahole with enabled multithreading, from Kui-Feng Lee. 5) BPF verifier usability improvements by making log info more concise and replacing inv with scalar type name, from Mykola Lysenko. 6) Two follow-up fixes for BPF prog JIT pack allocator, from Song Liu. 7) Add a new Kconfig to allow for loading kernel modules with non-matching BTF type info; their BTF info is then removed on load, from Connor O'Brien. 8) Remove reallocarray() usage from bpftool and switch to libbpf_reallocarray() in order to fix compilation errors for older glibc, from Mauricio Vásquez. 9) Fix libbpf to error on conflicting name in BTF when type declaration appears before the definition, from Xu Kuohai. 10) Fix issue in BPF preload for in-kernel light skeleton where loaded BPF program fds prevent init process from setting up fd 0-2, from Yucong Sun. 11) Fix libbpf reuse of pinned perf RB map when max_entries is auto-determined by libbpf, from Stijn Tintel. 12) Several cleanups for libbpf and a fix to enforce perf RB map #pages to be non-zero, from Yuntao Wang. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (32 commits) bpf: Small BPF verifier log improvements libbpf: Add a check to ensure that page_cnt is non-zero bpf, x86: Set header->size properly before freeing it x86: Disable HAVE_ARCH_HUGE_VMALLOC on 32-bit x86 bpf, test_run: Fix overflow in XDP frags bpf_test_finish selftests/bpf: Update btf_dump case for conflicting names libbpf: Skip forward declaration when counting duplicated type names bpf: Add some description about BPF_JIT_ALWAYS_ON in Kconfig bpf, docs: Add a missing colon in verifier.rst bpf: Cache the last valid build_id libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning bpf, selftests: Use raw_tp program for atomic test bpf, arm64: Support more atomic operations bpftool: Remove redundant slashes bpf: Add config to allow loading modules with BTF mismatches bpf, arm64: Feed byte-offset into bpf line info bpf, arm64: Call build_prologue() first in first JIT pass bpf: Fix issue with bpf preload module taking over stdout/stdin of kernel. bpftool: Bpf skeletons assert type sizes bpf: Cleanup comments ... ==================== Link: https://lore.kernel.org/r/20220304164313.31675-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski11-55/+133
net/batman-adv/hard-interface.c commit 690bb6fb64f5 ("batman-adv: Request iflink once in batadv-on-batadv check") commit 6ee3c393eeb7 ("batman-adv: Demote batadv-on-batadv skip error message") https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/ net/smc/af_smc.c commit 4d08b7b57ece ("net/smc: Fix cleanup when register ULP fails") commit 462791bbfa35 ("net/smc: add sysctl interface for SMC") https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-03bpf: Small BPF verifier log improvementsMykola Lysenko1-29/+35
In particular these include: 1) Remove output of inv for scalars in print_verifier_state 2) Replace inv with scalar in verifier error messages 3) Remove _value suffixes for umin/umax/s32_min/etc (except map_value) 4) Remove output of id=0 5) Remove output of ref_obj_id=0 Signed-off-by: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220301222745.1667206-1-mykolal@fb.com
2022-03-02Merge branch 'ucount-rlimit-fixes-for-v5.17' of ↵Linus Torvalds1-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fix from Eric Biederman: "Etienne Dechamps recently found a regression caused by enforcing RLIMIT_NPROC for root where the rlimit was not previously enforced. Michal Koutný had previously pointed out the inconsistency in enforcing the RLIMIT_NPROC that had been on the root owned process after the root user creates a user namespace. Which makes the fix for the regression simply removing the inconsistency" * 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Fix systemd LimitNPROC with private users regression
2022-03-02bpf, x86: Set header->size properly before freeing itSong Liu1-3/+6
On do_jit failure path, the header is freed by bpf_jit_binary_pack_free. While bpf_jit_binary_pack_free doesn't require proper ro_header->size, bpf_prog_pack_free still uses it. Set header->size in bpf_int_jit_compile before calling bpf_jit_binary_pack_free. Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Reported-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220302175126.247459-3-song@kernel.org
2022-03-01bpf: Add some description about BPF_JIT_ALWAYS_ON in KconfigTiezhu Yang1-0/+4
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, /proc/sys/net/core/bpf_jit_enable is permanently set to 1 and setting any other value than that will return failure. Add the above description in the help text of config BPF_JIT_ALWAYS_ON, and then we can distinguish between BPF_JIT_ALWAYS_ON and BPF_JIT_DEFAULT_ON. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/1645523826-18149-2-git-send-email-yangtiezhu@loongson.cn
2022-02-28bpf: Cache the last valid build_idHao Luo1-1/+11
For binaries that are statically linked, consecutive stack frames are likely to be in the same VMA and therefore have the same build id. On a real-world workload, we observed that 66% of CPU cycles in __bpf_get_stackid() were spent on build_id_parse() and find_vma(). As an optimization for this case, we can cache the previous frame's VMA, if the new frame has the same VMA as the previous one, reuse the previous one's build id. We are holding the MM locks as reader across the entire loop, so we don't need to worry about VMA going away. Tested through "stacktrace_build_id" and "stacktrace_build_id_nmi" in test_progs. Suggested-by: Greg Thelen <gthelen@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/bpf/20220224000531.1265030-1-haoluo@google.com
2022-02-28bpf: Add config to allow loading modules with BTF mismatchesConnor O'Brien1-1/+2
BTF mismatch can occur for a separately-built module even when the ABI is otherwise compatible and nothing else would prevent successfully loading. Add a new Kconfig to control how mismatches are handled. By default, preserve the current behavior of refusing to load the module. If MODULE_ALLOW_BTF_MISMATCH is enabled, load the module but ignore its BTF information. Suggested-by: Yonghong Song <yhs@fb.com> Suggested-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/CAADnVQJ+OVPnBz8z3vNu8gKXX42jCUqfuvhWAyCQDu8N_yqqwQ@mail.gmail.com Link: https://lore.kernel.org/bpf/20220223012814.1898677-1-connoro@google.com
2022-02-27Merge tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-1/+2
Pull dma-mapping fix from Christoph Hellwig: - fix a swiotlb info leak (Halil Pasic) * tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix info leak with DMA_FROM_DEVICE
2022-02-26Merge tag 'trace-v5.17-rc4' of ↵Linus Torvalds9-53/+118
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - rtla (Real-Time Linux Analysis tool): - fix typo in man page - Update API -e to -E before it is released - Error message fix and memory leak fix - Partially uninline trace event soft disable to shrink text - Fix function graph start up test - Have triggers affect the trace instance they are in and not top level - Have osnoise sleep in the units it says it uses - Remove unused ftrace stub function - Remove event probe redundant info from event in the buffer - Fix group ownership setting in tracefs - Ensure trace buffer is minimum size to prevent crashes * tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rtla/osnoise: Fix error message when failing to enable trace instance rtla/osnoise: Free params at the exit rtla/hist: Make -E the short version of --entries tracing: Fix selftest config check for function graph start up test tracefs: Set the group ownership in apply_options() not parse_options() tracing/osnoise: Make osnoise_main to sleep for microseconds ftrace: Remove unused ftrace_startup_enable() stub tracing: Ensure trace buffer is at least 4096 bytes large tracing: Uninline trace_trigger_soft_disabled() partly eprobes: Remove redundant event type information tracing: Have traceon and traceoff trigger honor the instance tracing: Dump stacktrace trigger to the corresponding instance rtla: Fix systme -> system typo on man page
2022-02-25tracing: Fix selftest config check for function graph start up testChristophe Leroy1-4/+2
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test direct tramp. Link: https://lkml.kernel.org/r/bdc7e594e13b0891c1d61bc8d56c94b1890eaed7.1640017960.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25bpf: Fix issue with bpf preload module taking over stdout/stdin of kernel.Yucong Sun1-0/+7
In cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") BPF preload was switched from user mode process to use in-kernel light skeleton instead. However, in the kernel context, early in the boot sequence, the first available FD can start from 0, instead of normally 3 for user mode process. So FDs 0 and 1 are then used for loaded BPF programs and prevent init process from setting up stdin/stdout/stderr on FD 0, 1, and 2 as expected. Before the fix: ls -lah /proc/1/fd/* lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console After the fix: ls -lah /proc/1/fd/* lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console Fix by closing prog FDs after initialization. struct bpf_prog's themselves are kept alive through direct kernel references taken with bpf_link_get_from_fd(). Fixes: cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220225185923.2535519-1-fallentree@fb.com
2022-02-25tracing/osnoise: Make osnoise_main to sleep for microsecondsDaniel Bristot de Oliveira1-21/+32
osnoise's runtime and period are in the microseconds scale, but it is currently sleeping in the millisecond's scale. This behavior roots in the usage of hwlat as the skeleton for osnoise. Make osnoise to sleep in the microseconds scale. Also, move the sleep to a specialized function. Link: https://lkml.kernel.org/r/302aa6c7bdf2d131719b22901905e9da122a11b2.1645197336.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25ftrace: Remove unused ftrace_startup_enable() stubNathan Chancellor1-1/+0
When building with clang + CONFIG_DYNAMIC_FTRACE=n + W=1, there is a warning: kernel/trace/ftrace.c:7194:20: error: unused function 'ftrace_startup_enable' [-Werror,-Wunused-function] static inline void ftrace_startup_enable(int command) { } ^ 1 error generated. Clang warns on instances of static inline functions in .c files with W=1 after commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). The ftrace_startup_enable() stub has been unused since commit e1effa0144a1 ("ftrace: Annotate the ops operation on update"), where its use outside of the CONFIG_DYNAMIC_TRACE section was replaced by ftrace_startup_all(). Remove it to resolve the warning. Link: https://lkml.kernel.org/r/20220214192847.488166-1-nathan@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25tracing: Ensure trace buffer is at least 4096 bytes largeSven Schnelle1-4/+6
Booting the kernel with 'trace_buf_size=1' give a warning at boot during the ftrace selftests: [ 0.892809] Running postponed tracer tests: [ 0.892893] Testing tracer function: [ 0.901899] Callback from call_rcu_tasks_trace() invoked. [ 0.983829] Callback from call_rcu_tasks_rude() invoked. [ 1.072003] .. bad ring buffer .. corrupted trace buffer .. [ 1.091944] Callback from call_rcu_tasks() invoked. [ 1.097695] PASSED [ 1.097701] Testing dynamic ftrace: .. filter failed count=0 ..FAILED! [ 1.353474] ------------[ cut here ]------------ [ 1.353478] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1951 run_tracer_selftest+0x13c/0x1b0 Therefore enforce a minimum of 4096 bytes to make the selftest pass. Link: https://lkml.kernel.org/r/20220214134456.1751749-1-svens@linux.ibm.com Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25tracing: Uninline trace_trigger_soft_disabled() partlyChristophe Leroy1-0/+14
On a powerpc32 build with CONFIG_CC_OPTIMISE_FOR_SIZE, the inline keyword is not honored and trace_trigger_soft_disabled() appears approx 50 times in vmlinux. Adding -Winline to the build, the following message appears: ./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline] That function is rather big for an inlined function: c003df60 <trace_trigger_soft_disabled>: c003df60: 94 21 ff f0 stwu r1,-16(r1) c003df64: 7c 08 02 a6 mflr r0 c003df68: 90 01 00 14 stw r0,20(r1) c003df6c: bf c1 00 08 stmw r30,8(r1) c003df70: 83 e3 00 24 lwz r31,36(r3) c003df74: 73 e9 01 00 andi. r9,r31,256 c003df78: 41 82 00 10 beq c003df88 <trace_trigger_soft_disabled+0x28> c003df7c: 38 60 00 00 li r3,0 c003df80: 39 61 00 10 addi r11,r1,16 c003df84: 4b fd 60 ac b c0014030 <_rest32gpr_30_x> c003df88: 73 e9 00 80 andi. r9,r31,128 c003df8c: 7c 7e 1b 78 mr r30,r3 c003df90: 41 a2 00 14 beq c003dfa4 <trace_trigger_soft_disabled+0x44> c003df94: 38 c0 00 00 li r6,0 c003df98: 38 a0 00 00 li r5,0 c003df9c: 38 80 00 00 li r4,0 c003dfa0: 48 05 c5 f1 bl c009a590 <event_triggers_call> c003dfa4: 73 e9 00 40 andi. r9,r31,64 c003dfa8: 40 82 00 28 bne c003dfd0 <trace_trigger_soft_disabled+0x70> c003dfac: 73 ff 02 00 andi. r31,r31,512 c003dfb0: 41 82 ff cc beq c003df7c <trace_trigger_soft_disabled+0x1c> c003dfb4: 80 01 00 14 lwz r0,20(r1) c003dfb8: 83 e1 00 0c lwz r31,12(r1) c003dfbc: 7f c3 f3 78 mr r3,r30 c003dfc0: 83 c1 00 08 lwz r30,8(r1) c003dfc4: 7c 08 03 a6 mtlr r0 c003dfc8: 38 21 00 10 addi r1,r1,16 c003dfcc: 48 05 6f 6c b c0094f38 <trace_event_ignore_this_pid> c003dfd0: 38 60 00 01 li r3,1 c003dfd4: 4b ff ff ac b c003df80 <trace_trigger_soft_disabled+0x20> However it is located in a hot path so inlining it is important. But forcing inlining of the entire function by using __always_inline leads to increasing the text size by approx 20 kbytes. Instead, split the fonction in two parts, one part with the likely fast path, flagged __always_inline, and a second part out of line. With this change, on a powerpc32 with CONFIG_CC_OPTIMISE_FOR_SIZE vmlinux text increases by only 1,4 kbytes, which is partly compensated by a decrease of vmlinux data by 7 kbytes. On ppc64_defconfig which has CONFIG_CC_OPTIMISE_FOR_SPEED, this change reduces vmlinux text by more than 30 kbytes. Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25eprobes: Remove redundant event type informationSteven Rostedt (Google)4-16/+12
Currently, the event probes save the type of the event they are attached to when recording the event. For example: # echo 'e:switch sched/sched_switch prev_state=$prev_state prev_prio=$prev_prio next_pid=$next_pid next_prio=$next_prio' > dynamic_events # cat events/eprobes/switch/format name: switch ID: 1717 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned int __probe_type; offset:8; size:4; signed:0; field:u64 prev_state; offset:12; size:8; signed:0; field:u64 prev_prio; offset:20; size:8; signed:0; field:u64 next_pid; offset:28; size:8; signed:0; field:u64 next_prio; offset:36; size:8; signed:0; print fmt: "(%u) prev_state=0x%Lx prev_prio=0x%Lx next_pid=0x%Lx next_prio=0x%Lx", REC->__probe_type, REC->prev_state, REC->prev_prio, REC->next_pid, REC->next_prio The __probe_type adds 4 bytes to every event. One of the reasons for creating eprobes is to limit what is traced in an event to be able to limit what is written into the ring buffer. Having this redundant 4 bytes to every event takes away from this. The event that is recorded can be retrieved from the event probe itself, that is available when the trace is happening. For user space tools, it could simply read the dynamic_event file to find the event they are for. So there is really no reason to write this information into the ring buffer for every event. Link: https://lkml.kernel.org/r/20220218190057.2f5a19a8@gandalf.local.home Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25tracing: Have traceon and traceoff trigger honor the instanceSteven Rostedt (Google)1-6/+46
If a trigger is set on an event to disable or enable tracing within an instance, then tracing should be disabled or enabled in the instance and not at the top level, which is confusing to users. Link: https://lkml.kernel.org/r/20220223223837.14f94ec3@rorschach.local.home Cc: stable@vger.kernel.org Fixes: ae63b31e4d0e2 ("tracing: Separate out trace events from global variables") Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25ucounts: Fix systemd LimitNPROC with private users regressionEric W. Biederman1-1/+13
Long story short recursively enforcing RLIMIT_NPROC when it is not enforced on the process that creates a new user namespace, causes currently working code to fail. There is no reason to enforce RLIMIT_NPROC recursively when we don't enforce it normally so update the code to detect this case. I would like to simply use capable(CAP_SYS_RESOURCE) to detect when RLIMIT_NPROC is not enforced upon the caller. Unfortunately because RLIMIT_NPROC is charged and checked for enforcement based upon the real uid, using capable() which is euid based is inconsistent with reality. Come as close as possible to testing for capable(CAP_SYS_RESOURCE) by testing for when the real uid would match the conditions when CAP_SYS_RESOURCE would be present if the real uid was the effective uid. Reported-by: Etienne Dechamps <etienne@edechamps.fr> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215596 Link: https://lkml.kernel.org/r/e9589141-cfeb-90cd-2d0e-83a62787239a@edechamps.fr Link: https://lkml.kernel.org/r/87sfs8jmpz.fsf_-_@email.froward.int.ebiederm.org Cc: stable@vger.kernel.org Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-02-24tracing: Dump stacktrace trigger to the corresponding instanceDaniel Bristot de Oliveira1-1/+6
The stacktrace event trigger is not dumping the stacktrace to the instance where it was enabled, but to the global "instance." Use the private_data, pointing to the trigger file, to figure out the corresponding trace instance, and use it in the trigger action, like snapshot_trigger does. Link: https://lkml.kernel.org/r/afbb0b4f18ba92c276865bc97204d438473f4ebc.1645396236.git.bristot@kernel.org Cc: stable@vger.kernel.org Fixes: ae63b31e4d0e2 ("tracing: Separate out trace events from global variables") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-46/+86
tools/testing/selftests/net/mptcp/mptcp_join.sh 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") 857898eb4b28 ("selftests: mptcp: add missing join check") 6ef84b1517e0 ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions") c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") 09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") 84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr") efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") 3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24Merge tag 'net-5.17-rc6' of ↵Linus Torvalds3-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. Current release - regressions: - bpf: fix crash due to out of bounds access into reg2btf_ids - mvpp2: always set port pcs ops, avoid null-deref - eth: marvell: fix driver load from initrd - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC" Current release - new code bugs: - mptcp: fix race in overlapping signal events Previous releases - regressions: - xen-netback: revert hotplug-status changes causing devices to not be configured - dsa: - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held - fix panic when removing unoffloaded port from bridge - dsa: microchip: fix bridging with more than two member ports Previous releases - always broken: - bpf: - fix crash due to incorrect copy_map_value when both spin lock and timer are present in a single value - fix a bpf_timer initialization issue with clang - do not try bpf_msg_push_data with len 0 - add schedule points in batch ops - nf_tables: - unregister flowtable hooks on netns exit - correct flow offload action array size - fix a couple of memory leaks - vsock: don't check owner in vhost_vsock_stop() while releasing - gso: do not skip outer ip header in case of ipip and net_failover - smc: use a mutex for locking "struct smc_pnettable" - openvswitch: fix setting ipv6 fields causing hw csum failure - mptcp: fix race in incoming ADD_ADDR option processing - sysfs: add check for netdevice being present to speed_show - sched: act_ct: fix flow table lookup after ct clear or switching zones - eth: intel: fixes for SR-IOV forwarding offloads - eth: broadcom: fixes for selftests and error recovery - eth: mellanox: flow steering and SR-IOV forwarding fixes Misc: - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends not report freed skbs as drops - force inlining of checksum functions in net/checksum.h" * tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits) net: mv643xx_eth: process retval from of_get_mac_address ping: remove pr_err from ping_lookup Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC" openvswitch: Fix setting ipv6 fields causing hw csum failure ipv6: prevent a possible race condition with lifetimes net/smc: Use a mutex for locking "struct smc_pnettable" bnx2x: fix driver load from initrd Revert "xen-netback: Check for hotplug-status existence before watching" Revert "xen-netback: remove 'hotplug-status' once it has served its purpose" net/mlx5e: Fix VF min/max rate parameters interchange mistake net/mlx5e: Add missing increment of count net/mlx5e: MPLSoUDP decap, fix check for unsupported matches net/mlx5e: Fix MPLSoUDP encap to use MPLS action information net/mlx5e: Add feature check for set fec counters net/mlx5e: TC, Skip redundant ct clear actions net/mlx5e: TC, Reject rules with forward and drop actions net/mlx5e: TC, Reject rules with drop and modify hdr action net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets net/mlx5e: Fix wrong return value on ioctl EEPROM query failure net/mlx5: Fix possible deadlock on rule deletion ...
2022-02-23bpf: Cleanup commentsTom Rix9-14/+14
Add leading space to spdx tag Use // for spdx c file comment Replacements resereved to reserved inbetween to in between everytime to every time intutivie to intuitive currenct to current encontered to encountered referenceing to referencing upto to up to exectuted to executed Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220220184055.3608317-1-trix@redhat.com
2022-02-22Merge branch 'for-5.17-fixes' of ↵Linus Torvalds3-7/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix for a subtle bug in the recent release_agent permission check update - Fix for a long-standing race condition between cpuset and cpu hotplug - Comment updates * 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: Fix kernel-doc cgroup-v1: Correct privileges check in release_agent writes cgroup: clarify cgroup_css_set_fork() cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
2022-02-22cpuset: Fix kernel-docJiapeng Chong1-5/+5
Fix the following W=1 kernel warnings: kernel/cgroup/cpuset.c:3718: warning: expecting prototype for cpuset_memory_pressure_bump(). Prototype was for __cpuset_memory_pressure_bump() instead. kernel/cgroup/cpuset.c:3568: warning: expecting prototype for cpuset_node_allowed(). Prototype was for __cpuset_node_allowed() instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-22cgroup-v1: Correct privileges check in release_agent writesMichal Koutný1-2/+4
The idea is to check: a) the owning user_ns of cgroup_ns, b) capabilities in init_user_ns. The commit 24f600856418 ("cgroup-v1: Require capabilities to set release_agent") got this wrong in the write handler of release_agent since it checked user_ns of the opener (may be different from the owning user_ns of cgroup_ns). Secondly, to avoid possibly confused deputy, the capability of the opener must be checked. Fixes: 24f600856418 ("cgroup-v1: Require capabilities to set release_agent") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/ Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-22cgroup: clarify cgroup_css_set_fork()Christian Brauner1-0/+14
With recent fixes for the permission checking when moving a task into a cgroup using a file descriptor to a cgroup's cgroup.procs file and calling write() it seems a good idea to clarify CLONE_INTO_CGROUP permission checking with a comment. Cc: Tejun Heo <tj@kernel.org> Cc: <cgroups@vger.kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-20Merge tag 'locking_urgent_for_v5.17_rc5' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: "Fix a NULL ptr dereference when dumping lockdep chains through /proc/lockdep_chains" * tag 'locking_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Correct lock_classes index mapping
2022-02-20Merge tag 'sched_urgent_for_v5.17_rc5' of ↵Linus Torvalds2-14/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: "Fix task exposure order when forking tasks" * tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix yet more sched_fork() races
2022-02-20Merge tag 'pidfd.v5.17-rc4' of ↵Linus Torvalds1-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd fix from Christian Brauner: "This fixes a problem reported by lockdep when installing a pidfd via fd_install() with siglock and the tasklisk write lock held in copy_process() when calling clone()/clone3() with CLONE_PIDFD. Originally a pidfd was created prior to holding any of these locks but this required a call to ksys_close(). So quite some time ago in 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups") we switched to a get_unused_fd_flags() + fd_install() model. As part of that we moved fd_install() as late as possible. This was done for two main reasons. First, because we needed to ensure that we call fd_install() past the point of no return as once that's called the fd is live in the task's file table. Second, because we tried to ensure that the fd is visible in /proc/<pid>/fd/<pidfd> right when the task is visible. This fix moves the fd_install() to an even later point which means that a task will be visible in proc while the pidfd isn't yet under /proc/<pid>/fd/<pidfd>. While this is a user visible change it's very unlikely that this will have any impact. Nobody should be relying on that and if they do we need to come up with something better but again, it's doubtful this is relevant" * tag 'pidfd.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: copy_process(): Move fd_install() out of sighand->siglock critical section
2022-02-20Merge branch 'ucount-rlimit-fixes-for-v5.17' of ↵Linus Torvalds4-19/+23
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fixes from Eric Biederman: "Michal Koutný recently found some bugs in the enforcement of RLIMIT_NPROC in the recent ucount rlimit implementation. In this set of patches I have developed a very conservative approach changing only what is necessary to fix the bugs that I can see clearly. Cleanups and anything that is making the code more consistent can follow after we have the code working as it has historically. The problem is not so much inconsistencies (although those exist) but that it is very difficult to figure out what the code should be doing in the case of RLIMIT_NPROC. All other rlimits are only enforced where the resource is acquired (allocated). RLIMIT_NPROC by necessity needs to be enforced in an additional location, and our current implementation stumbled it's way into that implementation" * 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Handle wrapping in is_ucounts_overlimit ucounts: Move RLIMIT_NPROC handling after set_user ucounts: Base set_cred_ucounts changes on the real user ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1 rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
2022-02-19bpf: Initialize ret to 0 inside btf_populate_kfunc_set()Souptick Joarder (HPE)1-1/+1
Kernel test robot reported below error -> kernel/bpf/btf.c:6718 btf_populate_kfunc_set() error: uninitialized symbol 'ret'. Initialize ret to 0. Fixes: dee872e124e8 ("bpf: Populate kfunc BTF ID sets in struct btf") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20220219163915.125770-1-jrdr.linux@gmail.com
2022-02-19sched: Fix yet more sched_fork() racesPeter Zijlstra2-14/+33
Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") fixed a fork race vs cgroup, it opened up a race vs syscalls by not placing the task on the runqueue before it gets exposed through the pidhash. Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is trying to fix a single instance of this, instead fix the whole class of issues, effectively reverting this commit. Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Zhang Qiao <zhangqiao22@huawei.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
2022-02-18bpf: Call maybe_wait_bpf_programs() only once from generic_map_delete_batch()Eric Dumazet1-1/+2
As stated in the comment found in maybe_wait_bpf_programs(), the synchronize_rcu() barrier is only needed before returning to userspace, not after each deletion in the batch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20220218181801.2971275-1-eric.dumazet@gmail.com
2022-02-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski13-329/+196
Daniel Borkmann says: ==================== bpf-next 2022-02-17 We've added 29 non-merge commits during the last 8 day(s) which contain a total of 34 files changed, 1502 insertions(+), 524 deletions(-). The main changes are: 1) Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info, from Mauricio Vásquez, Rafael David Tinoco, Lorenzo Fontana and Leonardo Di Donato. (Details: https://lpc.events/event/11/contributions/948/) 2) Prepare light skeleton to be used in both kernel module and user space and convert bpf_preload.ko to use light skeleton, from Alexei Starovoitov. 3) Rework bpftool's versioning scheme and align with libbpf's version number; also add linked libbpf version info to "bpftool version", from Quentin Monnet. 4) Add minimal C++ specific additions to bpftool's skeleton codegen to facilitate use of C skeletons in C++ applications, from Andrii Nakryiko. 5) Add BPF verifier sanity check whether relative offset on kfunc calls overflows desc->imm and reject the BPF program if the case, from Hou Tao. 6) Fix libbpf to use a dynamically allocated buffer for netlink messages to avoid receiving truncated messages on some archs, from Toke Høiland-Jørgensen. 7) Various follow-up fixes to the JIT bpf_prog_pack allocator, from Song Liu. 8) Various BPF selftest and vmtest.sh fixes, from Yucong Sun. 9) Fix bpftool pretty print handling on dumping map keys/values when no BTF is available, from Jiri Olsa and Yinjun Zhang. 10) Extend XDP frags selftest to check for invalid length, from Lorenzo Bianconi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (29 commits) bpf: bpf_prog_pack: Set proper size before freeing ro_header selftests/bpf: Fix crash in core_reloc when bpftool btfgen fails selftests/bpf: Fix vmtest.sh to launch smp vm. libbpf: Fix memleak in libbpf_netlink_recv() bpftool: Fix C++ additions to skeleton bpftool: Fix pretty print dump for maps without BTF loaded selftests/bpf: Test "bpftool gen min_core_btf" bpftool: Gen min_core_btf explanation and examples bpftool: Implement btfgen_get_btf() bpftool: Implement "gen min_core_btf" logic bpftool: Add gen min_core_btf command libbpf: Expose bpf_core_{add,free}_cands() to bpftool libbpf: Split bpf_core_apply_relo() bpf: Reject kfunc calls that overflow insn->imm selftests/bpf: Add Skeleton templated wrapper as an example bpftool: Add C++-specific open/load/etc skeleton wrappers selftests/bpf: Fix GCC11 compiler warnings in -O2 mode bpftool: Fix the error when lookup in no-btf maps libbpf: Use dynamically allocated buffer when receiving netlink messages libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0 ... ==================== Link: https://lore.kernel.org/r/20220217232027.29831-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17bpf: bpf_prog_pack: Set proper size before freeing ro_headerSong Liu1-0/+1
bpf_prog_pack_free() uses header->size to decide whether the header should be freed with module_memfree() or the bpf_prog_pack logic. However, in kvmalloc() failure path of bpf_jit_binary_pack_alloc(), header->size is not set yet. As a result, bpf_prog_pack_free() may treat a slice of a pack as a standalone kvmalloc'd header and call module_memfree() on the whole pack. This in turn causes use-after-free by other users of the pack. Fix this by setting ro_header->size before freeing ro_header. Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com Reported-by: syzbot+ecb1e7e51c52f68f7481@syzkaller.appspotmail.com Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220217183001.1876034-1-song@kernel.org
2022-02-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-2/+8
Fast path bpf marge for some -next work. Signed-off-by: Jakub Kicinski <kuba@kernel.org>