summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-04-16tools: bpftool: remove blank line after btf_id when listing programsQuentin Monnet1-1/+1
Commit 569b0c77735d ("tools/bpftool: show btf id in program information") made bpftool print an empty line after each program entry when listing the BPF programs loaded on the system (plain output). This is especially confusing when some programs have an associated BTF id, and others don't. Let's remove the blank line. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-16bpf: reserve flags in bpf_skb_net_shrinkWillem de Bruijn1-0/+3
The ENCAP flags in bpf_skb_adjust_room are ignored on decap with bpf_skb_net_shrink. Reserve these bits for future use. Fixes: 868d523535c2d ("bpf: add bpf_skb_adjust_room encap flags") Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-16bpf: fix whitespace for ENCAP_L2 defines in bpf.hAlan Maguire2-6/+6
replace tab after #define with space in line with rest of definitions Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-16selftests/bpf: bring back (void *) cast to set_ipv4_csum in test_tc_tunnelStanislav Fomichev1-1/+1
It was removed in commit 166b5a7f2ca3 ("selftests_bpf: extend test_tc_tunnel for UDP encap") without any explanation. Otherwise I see: progs/test_tc_tunnel.c:160:17: warning: taking address of packed member 'ip' of class or structure 'v4hdr' may result in an unaligned pointer value [-Waddress-of-packed-member] set_ipv4_csum(&h_outer.ip); ^~~~~~~~~~ 1 warning generated. Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Fixes: 166b5a7f2ca3 ("selftests_bpf: extend test_tc_tunnel for UDP encap") Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-16selftests/btf: add VAR and DATASEC case for dedup testsAndrii Nakryiko1-0/+49
Add test case verifying that dedup happens (INTs are deduped in this case) and VAR/DATASEC types are not deduped, but have their referenced type IDs adjusted correctly. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Yonghong Song <yhs@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-16btf: add support for VAR and DATASEC in btf_dedup()Andrii Nakryiko1-2/+27
This patch adds support for VAR and DATASEC in btf_dedup(). VAR/DATASEC are never deduplicated, but they need to be processed anyway as types they refer to might need to be remapped due to deduplication and compaction. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Yonghong Song <yhs@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-16kbuild: handle old pahole more gracefully when generating BTFAndrii Nakryiko1-1/+1
When CONFIG_DEBUG_INFO_BTF is enabled but available version of pahole is too old to support BTF generation, build script is supposed to emit warning and proceed with the build. Due to using exit instead of return from BASH function, existing handling code prematurely exits exit code 0, not completing some of the build steps. This patch fixes issue by correctly returning just from gen_btf() function only. Fixes: e83b9f55448a ("kbuild: add ability to generate BTF type info for vmlinux") Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-12bpf: refactor "check_reg_arg" to eliminate code redundancyJiong Wang1-6/+8
There are a few "regs[regno]" here are there across "check_reg_arg", this patch factor it out into a simple "reg" pointer. The intention is to simplify code indentation and make the later patches in this set look cleaner. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: factor out reg and stack slot propagation into "propagate_liveness_reg"Jiong Wang1-10/+20
After code refactor in previous patches, the propagation logic inside the for loop in "propagate_liveness" becomes clear that they are good enough to be factored out into a common function "propagate_liveness_reg". Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: refactor propagate_liveness to eliminate code redundanceJiong Wang1-14/+20
Access to reg states were not factored out, the consequence is long code for dereferencing them which made the indentation not good for reading. This patch factor out these code so the core code in the loop could be easier to follow. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: refactor propagate_liveness to eliminate duplicated for loopJiong Wang1-3/+1
Propagation for register and stack slot are finished in separate for loop, while they are perfect to be put into a single loop. This could also let them share some common variables in later patches. Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Fix distinct pointer types warning for ARCH=i386Andrey Ignatov1-1/+1
Fix a new warning reported by kbuild for make ARCH=i386: In file included from kernel/bpf/cgroup.c:11:0: kernel/bpf/cgroup.c: In function '__cgroup_bpf_run_filter_sysctl': include/linux/kernel.h:827:29: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) ^ include/linux/kernel.h:841:4: note: in expansion of macro '__typecheck' (__typecheck(x, y) && __no_side_effects(x, y)) ^~~~~~~~~~~ include/linux/kernel.h:851:24: note: in expansion of macro '__safe_cmp' __builtin_choose_expr(__safe_cmp(x, y), \ ^~~~~~~~~~ include/linux/kernel.h:860:19: note: in expansion of macro '__careful_cmp' #define min(x, y) __careful_cmp(x, y, <) ^~~~~~~~~~~~~ >> kernel/bpf/cgroup.c:837:17: note: in expansion of macro 'min' ctx.new_len = min(PAGE_SIZE, *pcount); ^~~ Fixes: 4e63acdff864 ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers") Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12Merge branch 'bpf-sysctl-hook'Alexei Starovoitov19-8/+2697
Andrey Ignatov says: ==================== v2->v3: - simplify C based selftests by relying on variable offset stack access. v1->v2: - add fs/proc/proc_sysctl.c mainteners to Cc:. The patch set introduces new BPF hook for sysctl. It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type BPF_CGROUP_SYSCTL. BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so that accesses (read/write) to sysctl can be controlled for specific cgroup and either allowed or denied, or traced. The hook has access to sysctl name, current sysctl value and (on write only) to new sysctl value via corresponding helpers. New sysctl value can be overridden by program. Both name and values (current/new) are represented as strings same way they're visible in /proc/sys/. It is up to program to parse these strings. To help with parsing the most common kind of sysctl value, vector of integers, two new helpers are provided: bpf_strtol and bpf_strtoul with semantic similar to user space strtol(3) and strtoul(3). The hook also provides bpf_sysctl context with two fields: * @write indicates whether sysctl is being read (= 0) or written (= 1); * @file_pos is sysctl file position to read from or write to, can be overridden. The hook allows to make better isolation for containerized applications that are run as root so that one container can't change a sysctl and affect all other containers on a host, make changes to allowed sysctl in a safer way and simplify sysctl tracing for cgroups. Patch 1 is preliminary refactoring. Patch 2 adds new program and attach types. Patches 3-5 implement helpers to access sysctl name and value. Patch 6 adds file_pos field to bpf_sysctl context. Patch 7 updates UAPI in tools. Patches 8-9 add support for the new hook to libbpf and corresponding test. Patches 10-14 add selftests for the new hook. Patch 15 adds support for new arg types to verifier: pointer to integer. Patch 16 adds bpf_strto{l,ul} helpers to parse integers from sysctl value. Patch 17 updates UAPI in tools. Patch 18 updates bpf_helpers.h. Patch 19 adds selftests for pointer to integer in verifier. Patches 20-21 add selftests for bpf_strto{l,ul}, including integration C based test for sysctl value parsing. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: C based test for sysctl and strtoXAndrey Ignatov2-1/+126
Add C based test for a few bpf_sysctl_* helpers and bpf_strtoul. Make sure that sysctl can be identified by name and that multiple integers can be parsed from sysctl value with bpf_strtoul. net/ipv4/tcp_mem is chosen as a testing sysctl, it contains 3 unsigned longs, they all are parsed and compared (val[0] < val[1] < val[2]). Example of output: # ./test_sysctl ... Test case: C prog: deny all writes .. [PASS] Test case: C prog: deny access by name .. [PASS] Test case: C prog: read tcp_mem .. [PASS] Summary: 39 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test bpf_strtol and bpf_strtoul helpersAndrey Ignatov1-0/+485
Test that bpf_strtol and bpf_strtoul helpers can be used to convert provided buffer to long or unsigned long correspondingly and return both correct result and number of consumed bytes, or proper errno. Example of output: # ./test_sysctl .. Test case: bpf_strtoul one number string .. [PASS] Test case: bpf_strtoul multi number string .. [PASS] Test case: bpf_strtoul buf_len = 0, reject .. [PASS] Test case: bpf_strtoul supported base, ok .. [PASS] Test case: bpf_strtoul unsupported base, EINVAL .. [PASS] Test case: bpf_strtoul buf with spaces only, EINVAL .. [PASS] Test case: bpf_strtoul negative number, EINVAL .. [PASS] Test case: bpf_strtol negative number, ok .. [PASS] Test case: bpf_strtol hex number, ok .. [PASS] Test case: bpf_strtol max long .. [PASS] Test case: bpf_strtol overflow, ERANGE .. [PASS] Summary: 36 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test ARG_PTR_TO_LONG arg typeAndrey Ignatov1-0/+160
Test that verifier handles new argument types properly, including uninitialized or partially initialized value, misaligned stack access, etc. Example of output: #456/p ARG_PTR_TO_LONG uninitialized OK #457/p ARG_PTR_TO_LONG half-uninitialized OK #458/p ARG_PTR_TO_LONG misaligned OK #459/p ARG_PTR_TO_LONG size < sizeof(long) OK #460/p ARG_PTR_TO_LONG initialized OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.hAndrey Ignatov1-0/+19
Add bpf_sysctl_* and bpf_strtoX helpers to bpf_helpers.h. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Sync bpf.h to tools/Andrey Ignatov1-1/+50
Sync bpf_strtoX related bpf UAPI changes to tools/. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_strtol and bpf_strtoul helpersAndrey Ignatov4-1/+187
Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned long correspondingly. It's similar to user space strtol(3) and strtoul(3) with a few changes to the API: * instead of NUL-terminated C string the helpers expect buffer and buffer length; * resulting long or unsigned long is returned in a separate result-argument; * return value is used to indicate success or failure, on success number of consumed bytes is returned that can be used to identify position to read next if the buffer is expected to contain multiple integers; * instead of *base* argument, *flags* is used that provides base in 5 LSB, other bits are reserved for future use; * number of supported bases is limited. Documentation for the new helpers is provided in bpf.h UAPI. The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to be able to convert string input to e.g. "ulongvec" output. E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be parsed by calling to bpf_strtoul three times. Implementation notes: Implementation includes "../../lib/kstrtox.h" to reuse integer parsing functions. It's done exactly same way as fs/proc/base.c already does. Unfortunately existing kstrtoX function can't be used directly since they fail if any invalid character is present right after integer in the string. Existing simple_strtoX functions can't be used either since they're obsolete and don't handle overflow properly. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce ARG_PTR_TO_{INT,LONG} arg typesAndrey Ignatov2-0/+31
Currently the way to pass result from BPF helper to BPF program is to provide memory area defined by pointer and size: func(void *, size_t). It works great for generic use-case, but for simple types, such as int, it's overkill and consumes two arguments when it could use just one. Introduce new argument types ARG_PTR_TO_INT and ARG_PTR_TO_LONG to be able to pass result from helper to program via pointer to int and long correspondingly: func(int *) or func(long *). New argument types are similar to ARG_PTR_TO_MEM with the following differences: * they don't require corresponding ARG_CONST_SIZE argument, predefined access sizes are used instead (32bit for int, 64bit for long); * it's possible to use more than one such an argument in a helper; * provided pointers have to be aligned. It's easy to introduce similar ARG_PTR_TO_CHAR and ARG_PTR_TO_SHORT argument types. It's not done due to lack of use-case though. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test file_pos field in bpf_sysctl ctxAndrey Ignatov1-0/+64
Test access to file_pos field of bpf_sysctl context, both read (incl. narrow read) and write. # ./test_sysctl ... Test case: ctx:file_pos sysctl:read read ok .. [PASS] Test case: ctx:file_pos sysctl:read read ok narrow .. [PASS] Test case: ctx:file_pos sysctl:read write ok .. [PASS] ... Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpersAndrey Ignatov1-0/+222
Test that new value provided by user space on sysctl write can be read by bpf_sysctl_get_new_value and overridden by bpf_sysctl_set_new_value. # ./test_sysctl ... Test case: sysctl_get_new_value sysctl:read EINVAL .. [PASS] Test case: sysctl_get_new_value sysctl:write ok .. [PASS] Test case: sysctl_get_new_value sysctl:write ok long .. [PASS] Test case: sysctl_get_new_value sysctl:write E2BIG .. [PASS] Test case: sysctl_set_new_value sysctl:read EINVAL .. [PASS] Test case: sysctl_set_new_value sysctl:write ok .. [PASS] Summary: 22 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test sysctl_get_current_value helperAndrey Ignatov1-0/+228
Test sysctl_get_current_value on sysctl read and write, buffers with enough space and too small buffers to get E2BIG and truncated result, etc. # ./test_sysctl ... Test case: sysctl_get_current_value sysctl:read ok, gt .. [PASS] Test case: sysctl_get_current_value sysctl:read ok, eq .. [PASS] Test case: sysctl_get_current_value sysctl:read E2BIG truncated .. [PASS] Test case: sysctl_get_current_value sysctl:read EINVAL .. [PASS] Test case: sysctl_get_current_value sysctl:write ok .. [PASS] Summary: 16 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test bpf_sysctl_get_name helperAndrey Ignatov1-0/+222
Test w/ and w/o BPF_F_SYSCTL_BASE_NAME, buffers with enough space and too small buffers to get E2BIG and truncated result, etc. # ./test_sysctl ... Test case: sysctl_get_name sysctl_value:base ok .. [PASS] Test case: sysctl_get_name sysctl_value:base E2BIG truncated .. [PASS] Test case: sysctl_get_name sysctl:full ok .. [PASS] Test case: sysctl_get_name sysctl:full E2BIG truncated .. [PASS] Test case: sysctl_get_name sysctl:full E2BIG truncated small .. [PASS] Summary: 11 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test BPF_CGROUP_SYSCTLAndrey Ignatov2-1/+293
Add unit test for BPF_PROG_TYPE_CGROUP_SYSCTL program type. Test that program can allow/deny access. Test both valid and invalid accesses to ctx->write. Example of output: # ./test_sysctl Test case: sysctl wrong attach_type .. [PASS] Test case: sysctl:read allow all .. [PASS] Test case: sysctl:read deny all .. [PASS] Test case: ctx:write sysctl:read read ok .. [PASS] Test case: ctx:write sysctl:write read ok .. [PASS] Test case: ctx:write sysctl:read write reject .. [PASS] Summary: 6 PASSED, 0 FAILED Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12selftests/bpf: Test sysctl section nameAndrey Ignatov1-0/+5
Add unit test to verify that program and attach types are properly identified for "cgroup/sysctl" section name. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12libbpf: Support sysctl hookAndrey Ignatov2-0/+4
Support BPF_PROG_TYPE_CGROUP_SYSCTL program in libbpf: identifying program and attach types by section name, probe. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Sync bpf.h to tools/Andrey Ignatov1-1/+89
Sync BPF_PROG_TYPE_CGROUP_SYSCTL related bpf UAPI changes to tools/. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Add file_pos field to bpf_sysctl ctxAndrey Ignatov5-8/+63
Add file_pos field to bpf_sysctl context to read and write sysctl file position at which sysctl is being accessed (read or written). The field can be used to e.g. override whole sysctl value on write to sysctl even when sys_write is called by user space with file_pos > 0. Or BPF program may reject such accesses. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_sysctl_{get,set}_new_value helpersAndrey Ignatov5-10/+142
Add helpers to work with new value being written to sysctl by user space. bpf_sysctl_get_new_value() copies value being written to sysctl into provided buffer. bpf_sysctl_set_new_value() overrides new value being written by user space with a one from provided buffer. Buffer should contain string representation of the value, similar to what can be seen in /proc/sys/. Both helpers can be used only on sysctl write. File position matters and can be managed by an interface that will be introduced separately. E.g. if user space calls sys_write to a file in /proc/sys/ at file position = X, where X > 0, then the value set by bpf_sysctl_set_new_value() will be written starting from X. If program wants to override whole value with specified buffer, file position has to be set to zero. Documentation for the new helpers is provided in bpf.h UAPI. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_sysctl_get_current_value helperAndrey Ignatov3-1/+88
Add bpf_sysctl_get_current_value() helper to copy current sysctl value into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer. It provides same string as user space can see by reading corresponding file in /proc/sys/, including new line, etc. Documentation for the new helper is provided in bpf.h UAPI. Since current value is kept in ctl_table->data in a parsed form, ctl_table->proc_handler() with write=0 is called to read that data and convert it to a string. Such a string can later be parsed by a program using helpers that will be introduced separately. Unfortunately it's not trivial to provide API to access parsed data due to variety of data representations (string, intvec, uintvec, ulongvec, custom structures, even NULL, etc). Instead it's assumed that user know how to handle specific sysctl they're interested in and appropriate helpers can be used. Since ctl_table->proc_handler() expects __user buffer, conversion to __user happens for kernel allocated one where the value is stored. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_sysctl_get_name helperAndrey Ignatov2-2/+90
Add bpf_sysctl_get_name() helper to copy sysctl name (/proc/sys/ entry) into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer. By default full name (w/o /proc/sys/) is copied, e.g. "net/ipv4/tcp_mem". If BPF_F_SYSCTL_BASE_NAME flag is set, only base name will be copied, e.g. "tcp_mem". Documentation for the new helper is provided in bpf.h UAPI. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Sysctl hookAndrey Ignatov8-0/+141
Containerized applications may run as root and it may create problems for whole host. Specifically such applications may change a sysctl and affect applications in other containers. Furthermore in existing infrastructure it may not be possible to just completely disable writing to sysctl, instead such a process should be gradual with ability to log what sysctl are being changed by a container, investigate, limit the set of writable sysctl to currently used ones (so that new ones can not be changed) and eventually reduce this set to zero. The patch introduces new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type BPF_CGROUP_SYSCTL to solve these problems on cgroup basis. New program type has access to following minimal context: struct bpf_sysctl { __u32 write; }; Where @write indicates whether sysctl is being read (= 0) or written (= 1). Helpers to access sysctl name and value will be introduced separately. BPF_CGROUP_SYSCTL attach point is added to sysctl code right before passing control to ctl_table->proc_handler so that BPF program can either allow or deny access to sysctl. Suggested-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Add base proto function for cgroup-bpf programsAndrey Ignatov1-1/+7
Currently kernel/bpf/cgroup.c contains only one program type and one proto function cgroup_dev_func_proto(). It'd be useful to have base proto function that can be reused for new cgroup-bpf program types coming soon. Introduce cgroup_base_func_proto(). Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12Merge branch 'smc-next'David S. Miller8-274/+294
Ursula Braun says: ==================== net/smc: patches 2019-04-12 here are patches for SMC: * patch 1 improves behavior of non-blocking connect * patches 2, 3, 5, 7, and 8 improve connecting return codes * patches 4 and 6 are a cleanups without functional change ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: improve smc_conn_create reason codesKarsten Graul4-59/+58
Rework smc_conn_create() to always return a valid DECLINE reason code. This removes the need to translate the return codes on 4 different places and allows to easily add more detailed return codes by changing smc_conn_create() only. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: improve smc_listen_work reason codesKarsten Graul2-46/+54
Rework smc_listen_work() to provide improved reason codes when an SMC connection is declined. This allows better debugging on user side. This also adds 3 more detailed reason codes in smc_clc.h to indicate what type of device was not found (ism or rdma or both), or if ism cannot talk to the peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: code cleanup smc_listen_workKarsten Graul1-15/+14
In smc_listen_work() the variables rc and reason_code are defined which have the same meaning. Eliminate reason_code in favor of the shorter name rc. No functional changes. Rename the functions smc_check_ism() and smc_check_rdma() into smc_find_ism_device() and smc_find_rdma_device() to make there purpose more clear. No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: cleanup of get vlan idKarsten Graul3-6/+10
The vlan_id of the underlying CLC socket was retrieved two times during processing of the listen handshaking. Change this to get the vlan id one time in connect and in listen processing, and reuse the id. And add a new CLC DECLINE return code for the case when the retrieval of the vlan id failed. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: consolidate function parametersKarsten Graul7-141/+139
During initialization of an SMC socket a lot of function parameters need to get passed down the function call path. Consolidate the parameters in a helper struct so there are less enough parameters to get all passed by register. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: check for ip prefix and subnetKarsten Graul2-3/+10
The check for a matching ip prefix and subnet was only done for SMC-R in smc_listen_rdma_check() but not when an SMC-D connection was possible. Rename the function into smc_listen_prfx_check() and move its call to a place where it is called for both SMC variants. And add a new CLC DECLINE reason for the case when the IP prefix or subnet check fails so the reason for the failing SMC connection can be found out more easily. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: fallback to TCP after connect problemsKarsten Graul1-4/+4
Correct the CLC decline reason codes for internal problems to not have the sign bit set, negative reason codes are interpreted as not eligible for TCP fallback. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: nonblocking connect reworkUrsula Braun2-42/+47
For nonblocking sockets move the kernel_connect() from the connect worker into the initial smc_connect part to return kernel_connect() errors other than -EINPROGRESS to user space. Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12xen-netback: add reference from xenvif to backend_info to facilitate ↵Dongli Zhang2-16/+19
coredump analysis During coredump analysis, it is not easy to obtain the address of backend_info in xen-netback. So far there are two ways to obtain backend_info: 1. Do what xenbus_device_find() does for vmcore to find the xenbus_device and then derive it from dev_get_drvdata(). 2. Extract backend_info from callstack of xenwatch (e.g., netback_remove() or frontend_changed()). This patch adds a reference from xenvif to backend_info so that it would be much more easier to obtain backend_info during coredump analysis. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11Merge branch 'sctp-skb-list'David S. Miller3-41/+71
David Miller says: ==================== SCTP: Event skb list overhaul. This patch series eliminates the explicit reference to the skb list implementation via skb->prev dereferences. The approach used is to pass a non-empty skb list around instead of an event skb object which may or may not be on a list. I'd like to thank Marcelo Leitner, Xin Long, and Neil Horman for reviewing previous versions of this series. Testing would be very much appreciated, in addition to the review of course. v4 --> v5: Rebase to net-next v3 --> v4: Fix the logic in patch #4 so that we don't miss cases where we should add event to the on-stack temp list. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11sctp: Pass sk_buff_head explicitly to sctp_ulpq_tail_event().David Miller3-20/+13
Now the SKB list implementation assumption can be removed. And now that we know that the list head is always non-NULL we can remove the code blocks dealing with that as well. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11sctp: Make sctp_enqueue_event tak an skb list.David Miller2-15/+39
Pass this, instead of an event. Then everything trickles down and we always have events a non-empty list. Then we needs a list creating stub to place into .enqueue_event for sctp_stream_interleave_1. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11sctp: Use helper for sctp_ulpq_tail_event() when hooked up to ->enqueue_eventDavid Miller1-1/+10
This way we can make sure events sent this way to sctp_ulpq_tail_event() are on a list as well. Now all such code paths are fully covered. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11sctp: Always pass skbs on a list to sctp_ulpq_tail_event().David Miller1-6/+10
This way we can simplify the logic and remove assumptions about the implementation of skb lists. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11sctp: Remove superfluous test in sctp_ulpq_reasm_drain().David Miller1-1/+1
Inside the loop, we always start with event non-NULL. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>