author     Linus Torvalds <torvalds@linux-foundation.org>  2020-10-15 18:42:13 -0700
committer  Linus Torvalds <torvalds@linux-foundation.org>  2020-10-15 18:42:13 -0700
commit     9ff9b0d392ea08090cd1780fb196f36dbb586529
tree       276a3a5c4525b84dee64eda30b423fc31bf94850 /tools/lib
parent     840e5bb326bbcb16ce82dd2416d2769de4839aea
parent     105faa8742437c28815b2a3eb8314ebc5fd9288c
download   linux-9ff9b0d392ea08090cd1780fb196f36dbb586529.tar.bz2
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, which limits stack
traversal in common container configs and improves TCP back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-path TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to the same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. better handle various corner cases when
nexthop objects are removed from groups, skip unnecessary
notifications, and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the door to easy experimental and deployment-specific TCP
option use (see the sock_ops sketch after this list).
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning the full path for a given
'struct path' (see the sketch after this list).
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings; the actively developed subsections of
networking drivers should now reportedly build W=1 warning-free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use the new tasklet_setup() API (sadly this
conversion is not yet complete).
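
As a rough sketch of the TCP header option support above (not part of the
pull message itself): a sock_ops program first reserves room for a custom
option and then writes it into outgoing segments. The helper and callback
names below (bpf_reserve_hdr_opt(), bpf_store_hdr_opt(),
BPF_SOCK_OPS_HDR_OPT_LEN_CB/WRITE_HDR_OPT_CB) should be checked against the
5.10 UAPI headers, the option layout is invented, and enabling the relevant
sock_ops callback flags is omitted.

/* Sketch only: write an experimental TCP option from a sock_ops program. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct exp_opt {
    __u8  kind;   /* 254 = experimental option kind */
    __u8  len;    /* total option length, incl. kind/len bytes */
    __u16 magic;  /* experiment identifier (made up) */
} __attribute__((packed));

SEC("sockops")
int tcp_hdr_opt(struct bpf_sock_ops *skops)
{
    struct exp_opt opt = { .kind = 254, .len = sizeof(opt), .magic = 0xeB9F };

    switch (skops->op) {
    case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
        /* ask the stack to leave room for our option */
        bpf_reserve_hdr_opt(skops, sizeof(opt), 0);
        break;
    case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
        /* fill in the reserved bytes */
        bpf_store_hdr_opt(skops, &opt, sizeof(opt), 0);
        break;
    }
    return 1;
}

char LICENSE[] SEC("license") = "GPL";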
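
Similarly, a minimal sketch of bpf_d_path() from a tracing program, assuming
a CO-RE build with a bpftool-generated vmlinux.h and that the attach point
(filp_close) is on the helper's allow-list:

/* Sketch only: log the full path of every file being closed. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char path_buf[256];

SEC("fentry/filp_close")
int BPF_PROG(trace_close, struct file *file)
{
    /* resolve file->f_path into a printable path */
    if (bpf_d_path(&file->f_path, path_buf, sizeof(path_buf)) < 0)
        return 0;
    bpf_printk("close: %s", path_buf);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";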
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
Diffstat (limited to 'tools/lib')
-rw-r--r--  tools/lib/bpf/Makefile           |   28
-rw-r--r--  tools/lib/bpf/bpf.c              |   70
-rw-r--r--  tools/lib/bpf/bpf.h              |   39
-rw-r--r--  tools/lib/bpf/bpf_core_read.h    |  120
-rw-r--r--  tools/lib/bpf/bpf_helpers.h      |   49
-rw-r--r--  tools/lib/bpf/bpf_prog_linfo.c   |    3
-rw-r--r--  tools/lib/bpf/bpf_tracing.h      |    4
-rw-r--r--  tools/lib/bpf/btf.c              | 1631
-rw-r--r--  tools/lib/bpf/btf.h              |  103
-rw-r--r--  tools/lib/bpf/btf_dump.c         |   87
-rw-r--r--  tools/lib/bpf/hashmap.c          |    3
-rw-r--r--  tools/lib/bpf/hashmap.h          |   12
-rw-r--r--  tools/lib/bpf/libbpf.c           | 3407
-rw-r--r--  tools/lib/bpf/libbpf.h           |   12
-rw-r--r--  tools/lib/bpf/libbpf.map         |   38
-rw-r--r--  tools/lib/bpf/libbpf_common.h    |    2
-rw-r--r--  tools/lib/bpf/libbpf_internal.h  |  147
-rw-r--r--  tools/lib/bpf/libbpf_probes.c    |    8
-rw-r--r--  tools/lib/bpf/netlink.c          |  128
-rw-r--r--  tools/lib/bpf/nlattr.c           |    9
-rw-r--r--  tools/lib/bpf/ringbuf.c          |    8
-rw-r--r--  tools/lib/bpf/xsk.c              |  383
-rw-r--r--  tools/lib/bpf/xsk.h              |    9
23 files changed, 4726 insertions, 1574 deletions
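
Among the libbpf changes in the diff below, tools/lib/bpf/bpf.{c,h} gain
bpf_prog_test_run_opts(), an extensible-options wrapper around
BPF_PROG_TEST_RUN (including the new cpu and flags fields). A minimal
user-space sketch, assuming prog_fd refers to an already-loaded program and
with error handling trimmed:

/* Sketch: run a loaded BPF program once over a dummy 64-byte packet. */
#include <stdio.h>
#include <string.h>
#include <bpf/bpf.h>

int test_run_once(int prog_fd)
{
    char data_in[64], data_out[256];

    memset(data_in, 0, sizeof(data_in));

    DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
        .data_in = data_in,
        .data_size_in = sizeof(data_in),
        .data_out = data_out,
        .data_size_out = sizeof(data_out), /* in: buffer size, out: bytes written */
        .repeat = 1,
    );

    if (bpf_prog_test_run_opts(prog_fd, &opts))
        return -1;

    printf("retval=%u duration=%uns out_len=%u\n",
           opts.retval, opts.duration, opts.data_size_out);
    return 0;
}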
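
bpf_helpers.h below also adds bpf_tail_call_static(), which pins the
prog-array index to a compile-time constant so the x86-64 JIT can patch in a
retpoline-free direct jump; together with the "make bpf_tail_call compatible
with bpf-to-bpf calls" work above it can be used from non-inlined
subprograms. A BPF-side sketch (map and section names are illustrative):

/* Sketch: tail call into slot 0 of a prog array from a subprogram. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 4);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

static __noinline int dispatch(struct __sk_buff *skb)
{
    bpf_tail_call_static(skb, &jmp_table, 0); /* constant slot */
    return 0; /* reached only if slot 0 is empty */
}

SEC("classifier")
int entry_prog(struct __sk_buff *skb)
{
    return dispatch(skb);
}

char LICENSE[] SEC("license") = "GPL";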
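
btf.c and btf.h below further turn struct btf into a writable object:
btf__new_empty() plus btf__add_int(), btf__add_struct(), btf__add_field(),
btf__add_str() and friends, backed by the split header/types/strings layout
described in the btf.c comments. A user-space sketch of building BTF for a
two-member struct (error handling trimmed):

/* Sketch: build BTF describing  struct pair { int a; int b; };  from scratch. */
#include <linux/btf.h>
#include <bpf/btf.h>
#include <bpf/libbpf.h>

struct btf *build_pair_btf(void)
{
    struct btf *btf = btf__new_empty();
    int int_id, pair_id;

    if (libbpf_get_error(btf))
        return NULL;

    int_id  = btf__add_int(btf, "int", 4, BTF_INT_SIGNED);
    pair_id = btf__add_struct(btf, "pair", 8);
    btf__add_field(btf, "a", int_id, 0, 0);   /* bit offset 0, not a bitfield */
    btf__add_field(btf, "b", int_id, 32, 0);  /* bit offset 32 */

    if (int_id < 0 || pair_id < 0) {
        btf__free(btf);
        return NULL;
    }
    return btf;
}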
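
Finally, bpf_core_read.h below adds type- and enum-level CO-RE macros
(bpf_core_type_exists(), bpf_core_type_id_kernel(),
bpf_core_enum_value_exists(), bpf_core_enum_value()) alongside switching the
CO-RE wrappers to bpf_probe_read_kernel(). A BPF-side sketch; it needs a
clang with the corresponding built-ins, and the probed type/enumerator are
only examples:

/* Sketch: probe the running kernel's BTF for a type and an enum value. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>

SEC("raw_tp/sys_enter")
int BPF_PROG(probe_kernel_btf)
{
    __u64 tcp_sock_id = 0;
    long ringbuf_val = -1;

    if (bpf_core_type_exists(struct tcp_sock))
        tcp_sock_id = bpf_core_type_id_kernel(struct tcp_sock);

    if (bpf_core_enum_value_exists(enum bpf_map_type, BPF_MAP_TYPE_RINGBUF))
        ringbuf_val = bpf_core_enum_value(enum bpf_map_type, BPF_MAP_TYPE_RINGBUF);

    bpf_printk("tcp_sock id=%llu ringbuf=%ld", tcp_sock_id, ringbuf_val);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";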
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 9ae8f4ef0aac..5f9abed3e226 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -1,6 +1,9 @@ # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) # Most of this file is copied from tools/lib/traceevent/Makefile +RM ?= rm +srctree = $(abs_srctree) + LIBBPF_VERSION := $(shell \ grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \ sort -rV | head -n1 | cut -d'_' -f2) @@ -56,7 +59,7 @@ ifndef VERBOSE endif FEATURE_USER = .libbpf -FEATURE_TESTS = libelf libelf-mmap zlib bpf reallocarray +FEATURE_TESTS = libelf zlib bpf FEATURE_DISPLAY = libelf zlib bpf INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/include/uapi @@ -95,27 +98,18 @@ PC_FILE = libbpf.pc ifdef EXTRA_CFLAGS CFLAGS := $(EXTRA_CFLAGS) else - CFLAGS := -g -Wall -endif - -ifeq ($(feature-libelf-mmap), 1) - override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT -endif - -ifeq ($(feature-reallocarray), 0) - override CFLAGS += -DCOMPAT_NEED_REALLOCARRAY + CFLAGS := -g -O2 endif # Append required CFLAGS -override CFLAGS += $(EXTRA_WARNINGS) +override CFLAGS += $(EXTRA_WARNINGS) -Wno-switch-enum override CFLAGS += -Werror -Wall -override CFLAGS += -fPIC override CFLAGS += $(INCLUDES) override CFLAGS += -fvisibility=hidden override CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 # flags specific for shared library -SHLIB_FLAGS := -DSHARED +SHLIB_FLAGS := -DSHARED -fPIC ifeq ($(VERBOSE),1) Q = @@ -197,7 +191,7 @@ $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED) @ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION) $(OUTPUT)libbpf.a: $(BPF_IN_STATIC) - $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^ + $(QUIET_LINK)$(RM) -f $@; $(AR) rcs $@ $^ $(OUTPUT)libbpf.pc: $(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \ @@ -271,10 +265,10 @@ install: install_lib install_pkgconfig install_headers ### Cleaning rules config-clean: - $(call QUIET_CLEAN, config) + $(call QUIET_CLEAN, feature-detect) $(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null -clean: +clean: config-clean $(call QUIET_CLEAN, libbpf) $(RM) -rf $(CMD_TARGETS) \ *~ .*.d .*.cmd LIBBPF-CFLAGS $(BPF_HELPER_DEFS) \ $(SHARED_OBJDIR) $(STATIC_OBJDIR) \ @@ -301,7 +295,7 @@ cscope: cscope -b -q -I $(srctree)/include -f cscope.out tags: - rm -f TAGS tags + $(RM) -f TAGS tags ls *.c *.h | xargs $(TAGS_PROG) -a # Declare the contents of the .PHONY variable as phony. We keep that diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 0750681057c2..d27e34133973 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -32,9 +32,6 @@ #include "libbpf.h" #include "libbpf_internal.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - /* * When building perf, unistd.h is overridden. __NR_bpf is * required to be defined explicitly. 
@@ -589,19 +586,31 @@ int bpf_link_create(int prog_fd, int target_fd, enum bpf_attach_type attach_type, const struct bpf_link_create_opts *opts) { + __u32 target_btf_id, iter_info_len; union bpf_attr attr; if (!OPTS_VALID(opts, bpf_link_create_opts)) return -EINVAL; + iter_info_len = OPTS_GET(opts, iter_info_len, 0); + target_btf_id = OPTS_GET(opts, target_btf_id, 0); + + if (iter_info_len && target_btf_id) + return -EINVAL; + memset(&attr, 0, sizeof(attr)); attr.link_create.prog_fd = prog_fd; attr.link_create.target_fd = target_fd; attr.link_create.attach_type = attach_type; attr.link_create.flags = OPTS_GET(opts, flags, 0); - attr.link_create.iter_info = - ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0)); - attr.link_create.iter_info_len = OPTS_GET(opts, iter_info_len, 0); + + if (iter_info_len) { + attr.link_create.iter_info = + ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0)); + attr.link_create.iter_info_len = iter_info_len; + } else if (target_btf_id) { + attr.link_create.target_btf_id = target_btf_id; + } return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr)); } @@ -715,6 +724,37 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr) return ret; } +int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts) +{ + union bpf_attr attr; + int ret; + + if (!OPTS_VALID(opts, bpf_test_run_opts)) + return -EINVAL; + + memset(&attr, 0, sizeof(attr)); + attr.test.prog_fd = prog_fd; + attr.test.cpu = OPTS_GET(opts, cpu, 0); + attr.test.flags = OPTS_GET(opts, flags, 0); + attr.test.repeat = OPTS_GET(opts, repeat, 0); + attr.test.duration = OPTS_GET(opts, duration, 0); + attr.test.ctx_size_in = OPTS_GET(opts, ctx_size_in, 0); + attr.test.ctx_size_out = OPTS_GET(opts, ctx_size_out, 0); + attr.test.data_size_in = OPTS_GET(opts, data_size_in, 0); + attr.test.data_size_out = OPTS_GET(opts, data_size_out, 0); + attr.test.ctx_in = ptr_to_u64(OPTS_GET(opts, ctx_in, NULL)); + attr.test.ctx_out = ptr_to_u64(OPTS_GET(opts, ctx_out, NULL)); + attr.test.data_in = ptr_to_u64(OPTS_GET(opts, data_in, NULL)); + attr.test.data_out = ptr_to_u64(OPTS_GET(opts, data_out, NULL)); + + ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr)); + OPTS_SET(opts, data_size_out, attr.test.data_size_out); + OPTS_SET(opts, ctx_size_out, attr.test.ctx_size_out); + OPTS_SET(opts, duration, attr.test.duration); + OPTS_SET(opts, retval, attr.test.retval); + return ret; +} + static int bpf_obj_get_next_id(__u32 start_id, __u32 *next_id, int cmd) { union bpf_attr attr; @@ -818,7 +858,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd) return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr)); } -int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, +int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, bool do_log) { union bpf_attr attr = {}; @@ -875,3 +915,19 @@ int bpf_enable_stats(enum bpf_stats_type type) return sys_bpf(BPF_ENABLE_STATS, &attr, sizeof(attr)); } + +int bpf_prog_bind_map(int prog_fd, int map_fd, + const struct bpf_prog_bind_opts *opts) +{ + union bpf_attr attr; + + if (!OPTS_VALID(opts, bpf_prog_bind_opts)) + return -EINVAL; + + memset(&attr, 0, sizeof(attr)); + attr.prog_bind_map.prog_fd = prog_fd; + attr.prog_bind_map.map_fd = map_fd; + attr.prog_bind_map.flags = OPTS_GET(opts, flags, 0); + + return sys_bpf(BPF_PROG_BIND_MAP, &attr, sizeof(attr)); +} diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 015d13f25fcc..875dde20d56e 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -174,8 +174,9 @@ 
struct bpf_link_create_opts { __u32 flags; union bpf_iter_link_info *iter_info; __u32 iter_info_len; + __u32 target_btf_id; }; -#define bpf_link_create_opts__last_field iter_info_len +#define bpf_link_create_opts__last_field target_btf_id LIBBPF_API int bpf_link_create(int prog_fd, int target_fd, enum bpf_attach_type attach_type, @@ -234,7 +235,7 @@ LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags, __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt); LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd); -LIBBPF_API int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, +LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, bool do_log); LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len, __u32 *prog_id, __u32 *fd_type, @@ -243,6 +244,40 @@ LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, enum bpf_stats_type; /* defined in up-to-date linux/bpf.h */ LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type); +struct bpf_prog_bind_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ + __u32 flags; +}; +#define bpf_prog_bind_opts__last_field flags + +LIBBPF_API int bpf_prog_bind_map(int prog_fd, int map_fd, + const struct bpf_prog_bind_opts *opts); + +struct bpf_test_run_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ + const void *data_in; /* optional */ + void *data_out; /* optional */ + __u32 data_size_in; + __u32 data_size_out; /* in: max length of data_out + * out: length of data_out + */ + const void *ctx_in; /* optional */ + void *ctx_out; /* optional */ + __u32 ctx_size_in; + __u32 ctx_size_out; /* in: max length of ctx_out + * out: length of cxt_out + */ + __u32 retval; /* out: return code of the BPF program */ + int repeat; + __u32 duration; /* out: average per repetition in ns */ + __u32 flags; + __u32 cpu; +}; +#define bpf_test_run_opts__last_field cpu + +LIBBPF_API int bpf_prog_test_run_opts(int prog_fd, + struct bpf_test_run_opts *opts); + #ifdef __cplusplus } /* extern "C" */ #endif diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h index eae5cccff761..bbcefb3ff5a5 100644 --- a/tools/lib/bpf/bpf_core_read.h +++ b/tools/lib/bpf/bpf_core_read.h @@ -19,32 +19,52 @@ enum bpf_field_info_kind { BPF_FIELD_RSHIFT_U64 = 5, }; +/* second argument to __builtin_btf_type_id() built-in */ +enum bpf_type_id_kind { + BPF_TYPE_ID_LOCAL = 0, /* BTF type ID in local program */ + BPF_TYPE_ID_TARGET = 1, /* BTF type ID in target kernel */ +}; + +/* second argument to __builtin_preserve_type_info() built-in */ +enum bpf_type_info_kind { + BPF_TYPE_EXISTS = 0, /* type existence in target kernel */ + BPF_TYPE_SIZE = 1, /* type size in target kernel */ +}; + +/* second argument to __builtin_preserve_enum_value() built-in */ +enum bpf_enum_value_kind { + BPF_ENUMVAL_EXISTS = 0, /* enum value existence in kernel */ + BPF_ENUMVAL_VALUE = 1, /* enum value value relocation */ +}; + #define __CORE_RELO(src, field, info) \ __builtin_preserve_field_info((src)->field, BPF_FIELD_##info) #if __BYTE_ORDER == __LITTLE_ENDIAN #define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \ - bpf_probe_read((void *)dst, \ - __CORE_RELO(src, fld, BYTE_SIZE), \ - (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) + bpf_probe_read_kernel( \ + (void *)dst, \ + __CORE_RELO(src, fld, BYTE_SIZE), \ + (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) #else /* semantics of 
LSHIFT_64 assumes loading values into low-ordered bytes, so * for big-endian we need to adjust destination pointer accordingly, based on * field byte size */ #define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \ - bpf_probe_read((void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \ - __CORE_RELO(src, fld, BYTE_SIZE), \ - (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) + bpf_probe_read_kernel( \ + (void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \ + __CORE_RELO(src, fld, BYTE_SIZE), \ + (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) #endif /* * Extract bitfield, identified by s->field, and return its value as u64. * All this is done in relocatable manner, so bitfield changes such as * signedness, bit size, offset changes, this will be handled automatically. - * This version of macro is using bpf_probe_read() to read underlying integer - * storage. Macro functions as an expression and its return type is - * bpf_probe_read()'s return value: 0, on success, <0 on error. + * This version of macro is using bpf_probe_read_kernel() to read underlying + * integer storage. Macro functions as an expression and its return type is + * bpf_probe_read_kernel()'s return value: 0, on success, <0 on error. */ #define BPF_CORE_READ_BITFIELD_PROBED(s, field) ({ \ unsigned long long val = 0; \ @@ -92,15 +112,75 @@ enum bpf_field_info_kind { __builtin_preserve_field_info(field, BPF_FIELD_EXISTS) /* - * Convenience macro to get byte size of a field. Works for integers, + * Convenience macro to get the byte size of a field. Works for integers, * struct/unions, pointers, arrays, and enums. */ #define bpf_core_field_size(field) \ __builtin_preserve_field_info(field, BPF_FIELD_BYTE_SIZE) /* - * bpf_core_read() abstracts away bpf_probe_read() call and captures offset - * relocation for source address using __builtin_preserve_access_index() + * Convenience macro to get BTF type ID of a specified type, using a local BTF + * information. Return 32-bit unsigned integer with type ID from program's own + * BTF. Always succeeds. + */ +#define bpf_core_type_id_local(type) \ + __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_LOCAL) + +/* + * Convenience macro to get BTF type ID of a target kernel's type that matches + * specified local type. + * Returns: + * - valid 32-bit unsigned type ID in kernel BTF; + * - 0, if no matching type was found in a target kernel BTF. + */ +#define bpf_core_type_id_kernel(type) \ + __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET) + +/* + * Convenience macro to check that provided named type + * (struct/union/enum/typedef) exists in a target kernel. + * Returns: + * 1, if such type is present in target kernel's BTF; + * 0, if no matching type is found. + */ +#define bpf_core_type_exists(type) \ + __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_EXISTS) + +/* + * Convenience macro to get the byte size of a provided named type + * (struct/union/enum/typedef) in a target kernel. + * Returns: + * >= 0 size (in bytes), if type is present in target kernel's BTF; + * 0, if no matching type is found. + */ +#define bpf_core_type_size(type) \ + __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE) + +/* + * Convenience macro to check that provided enumerator value is defined in + * a target kernel. + * Returns: + * 1, if specified enum type and its enumerator value are present in target + * kernel's BTF; + * 0, if no matching enum and/or enum value within that enum is found. 
+ */ +#define bpf_core_enum_value_exists(enum_type, enum_value) \ + __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_EXISTS) + +/* + * Convenience macro to get the integer value of an enumerator value in + * a target kernel. + * Returns: + * 64-bit value, if specified enum type and its enumerator value are + * present in target kernel's BTF; + * 0, if no matching enum and/or enum value within that enum is found. + */ +#define bpf_core_enum_value(enum_type, enum_value) \ + __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_VALUE) + +/* + * bpf_core_read() abstracts away bpf_probe_read_kernel() call and captures + * offset relocation for source address using __builtin_preserve_access_index() * built-in, provided by Clang. * * __builtin_preserve_access_index() takes as an argument an expression of @@ -115,8 +195,8 @@ enum bpf_field_info_kind { * (local) BTF, used to record relocation. */ #define bpf_core_read(dst, sz, src) \ - bpf_probe_read(dst, sz, \ - (const void *)__builtin_preserve_access_index(src)) + bpf_probe_read_kernel(dst, sz, \ + (const void *)__builtin_preserve_access_index(src)) /* * bpf_core_read_str() is a thin wrapper around bpf_probe_read_str() @@ -124,8 +204,8 @@ enum bpf_field_info_kind { * argument. */ #define bpf_core_read_str(dst, sz, src) \ - bpf_probe_read_str(dst, sz, \ - (const void *)__builtin_preserve_access_index(src)) + bpf_probe_read_kernel_str(dst, sz, \ + (const void *)__builtin_preserve_access_index(src)) #define ___concat(a, b) a ## b #define ___apply(fn, n) ___concat(fn, n) @@ -239,15 +319,17 @@ enum bpf_field_info_kind { * int x = BPF_CORE_READ(s, a.b.c, d.e, f, g); * * BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF - * CO-RE relocatable bpf_probe_read() wrapper) calls, logically equivalent to: + * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically + * equivalent to: * 1. const void *__t = s->a.b.c; * 2. __t = __t->d.e; * 3. __t = __t->f; * 4. return __t->g; * * Equivalence is logical, because there is a heavy type casting/preservation - * involved, as well as all the reads are happening through bpf_probe_read() - * calls using __builtin_preserve_access_index() to emit CO-RE relocations. + * involved, as well as all the reads are happening through + * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to + * emit CO-RE relocations. * * N.B. Only up to 9 "field accessors" are supported, which should be more * than enough for any practical purpose. diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h index e9a4ecddb7a5..2bdb7d6dbad2 100644 --- a/tools/lib/bpf/bpf_helpers.h +++ b/tools/lib/bpf/bpf_helpers.h @@ -32,6 +32,9 @@ #ifndef __always_inline #define __always_inline __attribute__((always_inline)) #endif +#ifndef __noinline +#define __noinline __attribute__((noinline)) +#endif #ifndef __weak #define __weak __attribute__((weak)) #endif @@ -51,6 +54,52 @@ #endif /* + * Helper macro to throw a compilation error if __bpf_unreachable() gets + * built into the resulting code. This works given BPF back end does not + * implement __builtin_trap(). This is useful to assert that certain paths + * of the program code are never used and hence eliminated by the compiler. + * + * For example, consider a switch statement that covers known cases used by + * the program. __bpf_unreachable() can then reside in the default case. 
If + * the program gets extended such that a case is not covered in the switch + * statement, then it will throw a build error due to the default case not + * being compiled out. + */ +#ifndef __bpf_unreachable +# define __bpf_unreachable() __builtin_trap() +#endif + +/* + * Helper function to perform a tail call with a constant/immediate map slot. + */ +static __always_inline void +bpf_tail_call_static(void *ctx, const void *map, const __u32 slot) +{ + if (!__builtin_constant_p(slot)) + __bpf_unreachable(); + + /* + * Provide a hard guarantee that LLVM won't optimize setting r2 (map + * pointer) and r3 (constant map index) from _different paths_ ending + * up at the _same_ call insn as otherwise we won't be able to use the + * jmpq/nopl retpoline-free patching by the x86-64 JIT in the kernel + * given they mismatch. See also d2e4c1e6c294 ("bpf: Constant map key + * tracking for prog array pokes") for details on verifier tracking. + * + * Note on clobber list: we need to stay in-line with BPF calling + * convention, so even if we don't end up using r0, r4, r5, we need + * to mark them as clobber so that LLVM doesn't end up using them + * before / after the call. + */ + asm volatile("r1 = %[ctx]\n\t" + "r2 = %[map]\n\t" + "r3 = %[slot]\n\t" + "call 12" + :: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot) + : "r0", "r1", "r2", "r3", "r4", "r5"); +} + +/* * Helper structure used by eBPF C program * to describe BPF map attributes to libbpf loader */ diff --git a/tools/lib/bpf/bpf_prog_linfo.c b/tools/lib/bpf/bpf_prog_linfo.c index bafca49cb1e6..3ed1a27b5f7c 100644 --- a/tools/lib/bpf/bpf_prog_linfo.c +++ b/tools/lib/bpf/bpf_prog_linfo.c @@ -8,9 +8,6 @@ #include "libbpf.h" #include "libbpf_internal.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - struct bpf_prog_linfo { void *raw_linfo; void *raw_jited_linfo; diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h index eebf020cbe3e..f9ef37707888 100644 --- a/tools/lib/bpf/bpf_tracing.h +++ b/tools/lib/bpf/bpf_tracing.h @@ -289,9 +289,9 @@ struct pt_regs; #define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP #else #define BPF_KPROBE_READ_RET_IP(ip, ctx) \ - ({ bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); }) + ({ bpf_probe_read_kernel(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); }) #define BPF_KRETPROBE_READ_RET_IP(ip, ctx) \ - ({ bpf_probe_read(&(ip), sizeof(ip), \ + ({ bpf_probe_read_kernel(&(ip), sizeof(ip), \ (void *)(PT_REGS_FP(ctx) + sizeof(ip))); }) #endif diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 6bdbc389b493..231b07203e3d 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* Copyright (c) 2018 Facebook */ +#include <byteswap.h> #include <endian.h> #include <stdio.h> #include <stdlib.h> @@ -21,26 +22,78 @@ #include "libbpf_internal.h" #include "hashmap.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - #define BTF_MAX_NR_TYPES 0x7fffffffU #define BTF_MAX_STR_OFFSET 0x7fffffffU static struct btf_type btf_void; struct btf { - union { - struct btf_header *hdr; - void *data; - }; - struct btf_type **types; - const char *strings; - void *nohdr_data; + /* raw BTF data in native endianness */ + void *raw_data; + /* raw BTF data in non-native endianness */ + void *raw_data_swapped; + __u32 raw_size; + /* whether target endianness differs from the native one */ + bool 
swapped_endian; + + /* + * When BTF is loaded from an ELF or raw memory it is stored + * in a contiguous memory block. The hdr, type_data, and, strs_data + * point inside that memory region to their respective parts of BTF + * representation: + * + * +--------------------------------+ + * | Header | Types | Strings | + * +--------------------------------+ + * ^ ^ ^ + * | | | + * hdr | | + * types_data-+ | + * strs_data------------+ + * + * If BTF data is later modified, e.g., due to types added or + * removed, BTF deduplication performed, etc, this contiguous + * representation is broken up into three independently allocated + * memory regions to be able to modify them independently. + * raw_data is nulled out at that point, but can be later allocated + * and cached again if user calls btf__get_raw_data(), at which point + * raw_data will contain a contiguous copy of header, types, and + * strings: + * + * +----------+ +---------+ +-----------+ + * | Header | | Types | | Strings | + * +----------+ +---------+ +-----------+ + * ^ ^ ^ + * | | | + * hdr | | + * types_data----+ | + * strs_data------------------+ + * + * +----------+---------+-----------+ + * | Header | Types | Strings | + * raw_data----->+----------+---------+-----------+ + */ + struct btf_header *hdr; + + void *types_data; + size_t types_data_cap; /* used size stored in hdr->type_len */ + + /* type ID to `struct btf_type *` lookup index */ + __u32 *type_offs; + size_t type_offs_cap; __u32 nr_types; - __u32 types_size; - __u32 data_size; + + void *strs_data; + size_t strs_data_cap; /* used size stored in hdr->str_len */ + + /* lookup index for each unique string in strings section */ + struct hashmap *strs_hash; + /* whether strings are already deduplicated */ + bool strs_deduped; + /* BTF object FD, if loaded into kernel */ int fd; + + /* Pointer size (in bytes) for a target architecture of this BTF */ int ptr_sz; }; @@ -49,60 +102,114 @@ static inline __u64 ptr_to_u64(const void *ptr) return (__u64) (unsigned long) ptr; } -static int btf_add_type(struct btf *btf, struct btf_type *t) +/* Ensure given dynamically allocated memory region pointed to by *data* with + * capacity of *cap_cnt* elements each taking *elem_sz* bytes has enough + * memory to accomodate *add_cnt* new elements, assuming *cur_cnt* elements + * are already used. At most *max_cnt* elements can be ever allocated. + * If necessary, memory is reallocated and all existing data is copied over, + * new pointer to the memory region is stored at *data, new memory region + * capacity (in number of elements) is stored in *cap. + * On success, memory pointer to the beginning of unused memory is returned. + * On error, NULL is returned. 
+ */ +void *btf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz, + size_t cur_cnt, size_t max_cnt, size_t add_cnt) { - if (btf->types_size - btf->nr_types < 2) { - struct btf_type **new_types; - __u32 expand_by, new_size; + size_t new_cnt; + void *new_data; - if (btf->types_size == BTF_MAX_NR_TYPES) - return -E2BIG; + if (cur_cnt + add_cnt <= *cap_cnt) + return *data + cur_cnt * elem_sz; - expand_by = max(btf->types_size >> 2, 16U); - new_size = min(BTF_MAX_NR_TYPES, btf->types_size + expand_by); + /* requested more than the set limit */ + if (cur_cnt + add_cnt > max_cnt) + return NULL; - new_types = realloc(btf->types, sizeof(*new_types) * new_size); - if (!new_types) - return -ENOMEM; + new_cnt = *cap_cnt; + new_cnt += new_cnt / 4; /* expand by 25% */ + if (new_cnt < 16) /* but at least 16 elements */ + new_cnt = 16; + if (new_cnt > max_cnt) /* but not exceeding a set limit */ + new_cnt = max_cnt; + if (new_cnt < cur_cnt + add_cnt) /* also ensure we have enough memory */ + new_cnt = cur_cnt + add_cnt; + + new_data = libbpf_reallocarray(*data, new_cnt, elem_sz); + if (!new_data) + return NULL; - if (btf->nr_types == 0) - new_types[0] = &btf_void; + /* zero out newly allocated portion of memory */ + memset(new_data + (*cap_cnt) * elem_sz, 0, (new_cnt - *cap_cnt) * elem_sz); - btf->types = new_types; - btf->types_size = new_size; - } + *data = new_data; + *cap_cnt = new_cnt; + return new_data + cur_cnt * elem_sz; +} - btf->types[++(btf->nr_types)] = t; +/* Ensure given dynamically allocated memory region has enough allocated space + * to accommodate *need_cnt* elements of size *elem_sz* bytes each + */ +int btf_ensure_mem(void **data, size_t *cap_cnt, size_t elem_sz, size_t need_cnt) +{ + void *p; + + if (need_cnt <= *cap_cnt) + return 0; + p = btf_add_mem(data, cap_cnt, elem_sz, *cap_cnt, SIZE_MAX, need_cnt - *cap_cnt); + if (!p) + return -ENOMEM; + + return 0; +} + +static int btf_add_type_idx_entry(struct btf *btf, __u32 type_off) +{ + __u32 *p; + + p = btf_add_mem((void **)&btf->type_offs, &btf->type_offs_cap, sizeof(__u32), + btf->nr_types + 1, BTF_MAX_NR_TYPES, 1); + if (!p) + return -ENOMEM; + + *p = type_off; return 0; } +static void btf_bswap_hdr(struct btf_header *h) +{ + h->magic = bswap_16(h->magic); + h->hdr_len = bswap_32(h->hdr_len); + h->type_off = bswap_32(h->type_off); + h->type_len = bswap_32(h->type_len); + h->str_off = bswap_32(h->str_off); + h->str_len = bswap_32(h->str_len); +} + static int btf_parse_hdr(struct btf *btf) { - const struct btf_header *hdr = btf->hdr; + struct btf_header *hdr = btf->hdr; __u32 meta_left; - if (btf->data_size < sizeof(struct btf_header)) { + if (btf->raw_size < sizeof(struct btf_header)) { pr_debug("BTF header not found\n"); return -EINVAL; } - if (hdr->magic != BTF_MAGIC) { + if (hdr->magic == bswap_16(BTF_MAGIC)) { + btf->swapped_endian = true; + if (bswap_32(hdr->hdr_len) != sizeof(struct btf_header)) { + pr_warn("Can't load BTF with non-native endianness due to unsupported header length %u\n", + bswap_32(hdr->hdr_len)); + return -ENOTSUP; + } + btf_bswap_hdr(hdr); + } else if (hdr->magic != BTF_MAGIC) { pr_debug("Invalid BTF magic:%x\n", hdr->magic); return -EINVAL; } - if (hdr->version != BTF_VERSION) { - pr_debug("Unsupported BTF version:%u\n", hdr->version); - return -ENOTSUP; - } - - if (hdr->flags) { - pr_debug("Unsupported BTF flags:%x\n", hdr->flags); - return -ENOTSUP; - } - - meta_left = btf->data_size - sizeof(*hdr); + meta_left = btf->raw_size - sizeof(*hdr); if (!meta_left) { pr_debug("BTF has no data\n"); return 
-EINVAL; @@ -128,15 +235,13 @@ static int btf_parse_hdr(struct btf *btf) return -EINVAL; } - btf->nohdr_data = btf->hdr + 1; - return 0; } static int btf_parse_str_sec(struct btf *btf) { const struct btf_header *hdr = btf->hdr; - const char *start = btf->nohdr_data + hdr->str_off; + const char *start = btf->strs_data; const char *end = start + btf->hdr->str_len; if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET || @@ -145,14 +250,12 @@ static int btf_parse_str_sec(struct btf *btf) return -EINVAL; } - btf->strings = start; - return 0; } -static int btf_type_size(struct btf_type *t) +static int btf_type_size(const struct btf_type *t) { - int base_size = sizeof(struct btf_type); + const int base_size = sizeof(struct btf_type); __u16 vlen = btf_vlen(t); switch (btf_kind(t)) { @@ -185,25 +288,120 @@ static int btf_type_size(struct btf_type *t) } } +static void btf_bswap_type_base(struct btf_type *t) +{ + t->name_off = bswap_32(t->name_off); + t->info = bswap_32(t->info); + t->type = bswap_32(t->type); +} + +static int btf_bswap_type_rest(struct btf_type *t) +{ + struct btf_var_secinfo *v; + struct btf_member *m; + struct btf_array *a; + struct btf_param *p; + struct btf_enum *e; + __u16 vlen = btf_vlen(t); + int i; + + switch (btf_kind(t)) { + case BTF_KIND_FWD: + case BTF_KIND_CONST: + case BTF_KIND_VOLATILE: + case BTF_KIND_RESTRICT: + case BTF_KIND_PTR: + case BTF_KIND_TYPEDEF: + case BTF_KIND_FUNC: + return 0; + case BTF_KIND_INT: + *(__u32 *)(t + 1) = bswap_32(*(__u32 *)(t + 1)); + return 0; + case BTF_KIND_ENUM: + for (i = 0, e = btf_enum(t); i < vlen; i++, e++) { + e->name_off = bswap_32(e->name_off); + e->val = bswap_32(e->val); + } + return 0; + case BTF_KIND_ARRAY: + a = btf_array(t); + a->type = bswap_32(a->type); + a->index_type = bswap_32(a->index_type); + a->nelems = bswap_32(a->nelems); + return 0; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + for (i = 0, m = btf_members(t); i < vlen; i++, m++) { + m->name_off = bswap_32(m->name_off); + m->type = bswap_32(m->type); + m->offset = bswap_32(m->offset); + } + return 0; + case BTF_KIND_FUNC_PROTO: + for (i = 0, p = btf_params(t); i < vlen; i++, p++) { + p->name_off = bswap_32(p->name_off); + p->type = bswap_32(p->type); + } + return 0; + case BTF_KIND_VAR: + btf_var(t)->linkage = bswap_32(btf_var(t)->linkage); + return 0; + case BTF_KIND_DATASEC: + for (i = 0, v = btf_var_secinfos(t); i < vlen; i++, v++) { + v->type = bswap_32(v->type); + v->offset = bswap_32(v->offset); + v->size = bswap_32(v->size); + } + return 0; + default: + pr_debug("Unsupported BTF_KIND:%u\n", btf_kind(t)); + return -EINVAL; + } +} + static int btf_parse_type_sec(struct btf *btf) { struct btf_header *hdr = btf->hdr; - void *nohdr_data = btf->nohdr_data; - void *next_type = nohdr_data + hdr->type_off; - void *end_type = nohdr_data + hdr->str_off; + void *next_type = btf->types_data; + void *end_type = next_type + hdr->type_len; + int err, i = 0, type_size; - while (next_type < end_type) { - struct btf_type *t = next_type; - int type_size; - int err; + /* VOID (type_id == 0) is specially handled by btf__get_type_by_id(), + * so ensure we can never properly use its offset from index by + * setting it to a large value + */ + err = btf_add_type_idx_entry(btf, UINT_MAX); + if (err) + return err; + + while (next_type + sizeof(struct btf_type) <= end_type) { + i++; + + if (btf->swapped_endian) + btf_bswap_type_base(next_type); - type_size = btf_type_size(t); + type_size = btf_type_size(next_type); if (type_size < 0) return type_size; - next_type += 
type_size; - err = btf_add_type(btf, t); + if (next_type + type_size > end_type) { + pr_warn("BTF type [%d] is malformed\n", i); + return -EINVAL; + } + + if (btf->swapped_endian && btf_bswap_type_rest(next_type)) + return -EINVAL; + + err = btf_add_type_idx_entry(btf, next_type - btf->types_data); if (err) return err; + + next_type += type_size; + btf->nr_types++; + } + + if (next_type != end_type) { + pr_warn("BTF types data is malformed\n"); + return -EINVAL; } return 0; @@ -214,12 +412,20 @@ __u32 btf__get_nr_types(const struct btf *btf) return btf->nr_types; } +/* internal helper returning non-const pointer to a type */ +static struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id) +{ + if (type_id == 0) + return &btf_void; + + return btf->types_data + btf->type_offs[type_id]; +} + const struct btf_type *btf__type_by_id(const struct btf *btf, __u32 type_id) { if (type_id > btf->nr_types) return NULL; - - return btf->types[type_id]; + return btf_type_by_id((struct btf *)btf, type_id); } static int determine_ptr_size(const struct btf *btf) @@ -286,6 +492,38 @@ int btf__set_pointer_size(struct btf *btf, size_t ptr_sz) return 0; } +static bool is_host_big_endian(void) +{ +#if __BYTE_ORDER == __LITTLE_ENDIAN + return false; +#elif __BYTE_ORDER == __BIG_ENDIAN + return true; +#else +# error "Unrecognized __BYTE_ORDER__" +#endif +} + +enum btf_endianness btf__endianness(const struct btf *btf) +{ + if (is_host_big_endian()) + return btf->swapped_endian ? BTF_LITTLE_ENDIAN : BTF_BIG_ENDIAN; + else + return btf->swapped_endian ? BTF_BIG_ENDIAN : BTF_LITTLE_ENDIAN; +} + +int btf__set_endianness(struct btf *btf, enum btf_endianness endian) +{ + if (endian != BTF_LITTLE_ENDIAN && endian != BTF_BIG_ENDIAN) + return -EINVAL; + + btf->swapped_endian = is_host_big_endian() != (endian == BTF_BIG_ENDIAN); + if (!btf->swapped_endian) { + free(btf->raw_data_swapped); + btf->raw_data_swapped = NULL; + } + return 0; +} + static bool btf_type_is_void(const struct btf_type *t) { return t == &btf_void || btf_is_fwd(t); @@ -417,7 +655,7 @@ __s32 btf__find_by_name(const struct btf *btf, const char *type_name) return 0; for (i = 1; i <= btf->nr_types; i++) { - const struct btf_type *t = btf->types[i]; + const struct btf_type *t = btf__type_by_id(btf, i); const char *name = btf__name_by_offset(btf, t->name_off); if (name && !strcmp(type_name, name)) @@ -436,7 +674,7 @@ __s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name, return 0; for (i = 1; i <= btf->nr_types; i++) { - const struct btf_type *t = btf->types[i]; + const struct btf_type *t = btf__type_by_id(btf, i); const char *name; if (btf_kind(t) != kind) @@ -449,6 +687,11 @@ __s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name, return -ENOENT; } +static bool btf_is_modifiable(const struct btf *btf) +{ + return (void *)btf->hdr != btf->raw_data; +} + void btf__free(struct btf *btf) { if (IS_ERR_OR_NULL(btf)) @@ -457,11 +700,55 @@ void btf__free(struct btf *btf) if (btf->fd >= 0) close(btf->fd); - free(btf->data); - free(btf->types); + if (btf_is_modifiable(btf)) { + /* if BTF was modified after loading, it will have a split + * in-memory representation for header, types, and strings + * sections, so we need to free all of them individually. It + * might still have a cached contiguous raw data present, + * which will be unconditionally freed below. 
+ */ + free(btf->hdr); + free(btf->types_data); + free(btf->strs_data); + } + free(btf->raw_data); + free(btf->raw_data_swapped); + free(btf->type_offs); free(btf); } +struct btf *btf__new_empty(void) +{ + struct btf *btf; + + btf = calloc(1, sizeof(*btf)); + if (!btf) + return ERR_PTR(-ENOMEM); + + btf->fd = -1; + btf->ptr_sz = sizeof(void *); + btf->swapped_endian = false; + + /* +1 for empty string at offset 0 */ + btf->raw_size = sizeof(struct btf_header) + 1; + btf->raw_data = calloc(1, btf->raw_size); + if (!btf->raw_data) { + free(btf); + return ERR_PTR(-ENOMEM); + } + + btf->hdr = btf->raw_data; + btf->hdr->hdr_len = sizeof(struct btf_header); + btf->hdr->magic = BTF_MAGIC; + btf->hdr->version = BTF_VERSION; + + btf->types_data = btf->raw_data + btf->hdr->hdr_len; + btf->strs_data = btf->raw_data + btf->hdr->hdr_len; + btf->hdr->str_len = 1; /* empty string at offset 0 */ + + return btf; +} + struct btf *btf__new(const void *data, __u32 size) { struct btf *btf; @@ -471,26 +758,28 @@ struct btf *btf__new(const void *data, __u32 size) if (!btf) return ERR_PTR(-ENOMEM); - btf->fd = -1; - - btf->data = malloc(size); - if (!btf->data) { + btf->raw_data = malloc(size); + if (!btf->raw_data) { err = -ENOMEM; goto done; } + memcpy(btf->raw_data, data, size); + btf->raw_size = size; - memcpy(btf->data, data, size); - btf->data_size = size; - + btf->hdr = btf->raw_data; err = btf_parse_hdr(btf); if (err) goto done; + btf->strs_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->str_off; + btf->types_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->type_off; + err = btf_parse_str_sec(btf); + err = err ?: btf_parse_type_sec(btf); if (err) goto done; - err = btf_parse_type_sec(btf); + btf->fd = -1; done: if (err) { @@ -501,17 +790,6 @@ done: return btf; } -static bool btf_check_endianness(const GElf_Ehdr *ehdr) -{ -#if __BYTE_ORDER == __LITTLE_ENDIAN - return ehdr->e_ident[EI_DATA] == ELFDATA2LSB; -#elif __BYTE_ORDER == __BIG_ENDIAN - return ehdr->e_ident[EI_DATA] == ELFDATA2MSB; -#else -# error "Unrecognized __BYTE_ORDER__" -#endif -} - struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext) { Elf_Data *btf_data = NULL, *btf_ext_data = NULL; @@ -544,10 +822,6 @@ struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext) pr_warn("failed to get EHDR from %s\n", path); goto done; } - if (!btf_check_endianness(&ehdr)) { - pr_warn("non-native ELF endianness is not supported\n"); - goto done; - } if (!elf_rawdata(elf_getscn(elf, ehdr.e_shstrndx), NULL)) { pr_warn("failed to get e_shstrndx from %s\n", path); goto done; @@ -659,13 +933,7 @@ struct btf *btf__parse_raw(const char *path) err = -EIO; goto err_out; } - if (magic == __bswap_16(BTF_MAGIC)) { - /* non-native endian raw BTF */ - pr_warn("non-native BTF endianness is not supported\n"); - err = -LIBBPF_ERRNO__ENDIAN; - goto err_out; - } - if (magic != BTF_MAGIC) { + if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) { /* definitely not a raw BTF */ err = -EPROTO; goto err_out; @@ -798,7 +1066,7 @@ int btf__finalize_data(struct bpf_object *obj, struct btf *btf) __u32 i; for (i = 1; i <= btf->nr_types; i++) { - struct btf_type *t = btf->types[i]; + struct btf_type *t = btf_type_by_id(btf, i); /* Loader needs to fix up some of the things compiler * couldn't get its hands on while emitting BTF. 
This @@ -815,10 +1083,13 @@ int btf__finalize_data(struct bpf_object *obj, struct btf *btf) return err; } +static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian); + int btf__load(struct btf *btf) { - __u32 log_buf_size = 0; + __u32 log_buf_size = 0, raw_size; char *log_buf = NULL; + void *raw_data; int err = 0; if (btf->fd >= 0) @@ -833,8 +1104,16 @@ retry_load: *log_buf = 0; } - btf->fd = bpf_load_btf(btf->data, btf->data_size, - log_buf, log_buf_size, false); + raw_data = btf_get_raw_data(btf, &raw_size, false); + if (!raw_data) { + err = -ENOMEM; + goto done; + } + /* cache native raw data representation */ + btf->raw_size = raw_size; + btf->raw_data = raw_data; + + btf->fd = bpf_load_btf(raw_data, raw_size, log_buf, log_buf_size, false); if (btf->fd < 0) { if (!log_buf || errno == ENOSPC) { log_buf_size = max((__u32)BPF_LOG_BUF_SIZE, @@ -865,20 +1144,88 @@ void btf__set_fd(struct btf *btf, int fd) btf->fd = fd; } -const void *btf__get_raw_data(const struct btf *btf, __u32 *size) +static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian) { - *size = btf->data_size; - return btf->data; + struct btf_header *hdr = btf->hdr; + struct btf_type *t; + void *data, *p; + __u32 data_sz; + int i; + + data = swap_endian ? btf->raw_data_swapped : btf->raw_data; + if (data) { + *size = btf->raw_size; + return data; + } + + data_sz = hdr->hdr_len + hdr->type_len + hdr->str_len; + data = calloc(1, data_sz); + if (!data) + return NULL; + p = data; + + memcpy(p, hdr, hdr->hdr_len); + if (swap_endian) + btf_bswap_hdr(p); + p += hdr->hdr_len; + + memcpy(p, btf->types_data, hdr->type_len); + if (swap_endian) { + for (i = 1; i <= btf->nr_types; i++) { + t = p + btf->type_offs[i]; + /* btf_bswap_type_rest() relies on native t->info, so + * we swap base type info after we swapped all the + * additional information + */ + if (btf_bswap_type_rest(t)) + goto err_out; + btf_bswap_type_base(t); + } + } + p += hdr->type_len; + + memcpy(p, btf->strs_data, hdr->str_len); + p += hdr->str_len; + + *size = data_sz; + return data; +err_out: + free(data); + return NULL; } -const char *btf__name_by_offset(const struct btf *btf, __u32 offset) +const void *btf__get_raw_data(const struct btf *btf_ro, __u32 *size) +{ + struct btf *btf = (struct btf *)btf_ro; + __u32 data_sz; + void *data; + + data = btf_get_raw_data(btf, &data_sz, btf->swapped_endian); + if (!data) + return NULL; + + btf->raw_size = data_sz; + if (btf->swapped_endian) + btf->raw_data_swapped = data; + else + btf->raw_data = data; + *size = data_sz; + return data; +} + +const char *btf__str_by_offset(const struct btf *btf, __u32 offset) { if (offset < btf->hdr->str_len) - return &btf->strings[offset]; + return btf->strs_data + offset; else return NULL; } +const char *btf__name_by_offset(const struct btf *btf, __u32 offset) +{ + return btf__str_by_offset(btf, offset); +} + int btf__get_from_id(__u32 id, struct btf **btf) { struct bpf_btf_info btf_info = { 0 }; @@ -1014,6 +1361,970 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name, return 0; } +static size_t strs_hash_fn(const void *key, void *ctx) +{ + struct btf *btf = ctx; + const char *str = btf->strs_data + (long)key; + + return str_hash(str); +} + +static bool strs_hash_equal_fn(const void *key1, const void *key2, void *ctx) +{ + struct btf *btf = ctx; + const char *str1 = btf->strs_data + (long)key1; + const char *str2 = btf->strs_data + (long)key2; + + return strcmp(str1, str2) == 0; +} + +static void 
btf_invalidate_raw_data(struct btf *btf) +{ + if (btf->raw_data) { + free(btf->raw_data); + btf->raw_data = NULL; + } + if (btf->raw_data_swapped) { + free(btf->raw_data_swapped); + btf->raw_data_swapped = NULL; + } +} + +/* Ensure BTF is ready to be modified (by splitting into a three memory + * regions for header, types, and strings). Also invalidate cached + * raw_data, if any. + */ +static int btf_ensure_modifiable(struct btf *btf) +{ + void *hdr, *types, *strs, *strs_end, *s; + struct hashmap *hash = NULL; + long off; + int err; + + if (btf_is_modifiable(btf)) { + /* any BTF modification invalidates raw_data */ + btf_invalidate_raw_data(btf); + return 0; + } + + /* split raw data into three memory regions */ + hdr = malloc(btf->hdr->hdr_len); + types = malloc(btf->hdr->type_len); + strs = malloc(btf->hdr->str_len); + if (!hdr || !types || !strs) + goto err_out; + + memcpy(hdr, btf->hdr, btf->hdr->hdr_len); + memcpy(types, btf->types_data, btf->hdr->type_len); + memcpy(strs, btf->strs_data, btf->hdr->str_len); + + /* build lookup index for all strings */ + hash = hashmap__new(strs_hash_fn, strs_hash_equal_fn, btf); + if (IS_ERR(hash)) { + err = PTR_ERR(hash); + hash = NULL; + goto err_out; + } + + strs_end = strs + btf->hdr->str_len; + for (off = 0, s = strs; s < strs_end; off += strlen(s) + 1, s = strs + off) { + /* hashmap__add() returns EEXIST if string with the same + * content already is in the hash map + */ + err = hashmap__add(hash, (void *)off, (void *)off); + if (err == -EEXIST) + continue; /* duplicate */ + if (err) + goto err_out; + } + + /* only when everything was successful, update internal state */ + btf->hdr = hdr; + btf->types_data = types; + btf->types_data_cap = btf->hdr->type_len; + btf->strs_data = strs; + btf->strs_data_cap = btf->hdr->str_len; + btf->strs_hash = hash; + /* if BTF was created from scratch, all strings are guaranteed to be + * unique and deduplicated + */ + btf->strs_deduped = btf->hdr->str_len <= 1; + + /* invalidate raw_data representation */ + btf_invalidate_raw_data(btf); + + return 0; + +err_out: + hashmap__free(hash); + free(hdr); + free(types); + free(strs); + return -ENOMEM; +} + +static void *btf_add_str_mem(struct btf *btf, size_t add_sz) +{ + return btf_add_mem(&btf->strs_data, &btf->strs_data_cap, 1, + btf->hdr->str_len, BTF_MAX_STR_OFFSET, add_sz); +} + +/* Find an offset in BTF string section that corresponds to a given string *s*. + * Returns: + * - >0 offset into string section, if string is found; + * - -ENOENT, if string is not in the string section; + * - <0, on any other error. + */ +int btf__find_str(struct btf *btf, const char *s) +{ + long old_off, new_off, len; + void *p; + + /* BTF needs to be in a modifiable state to build string lookup index */ + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + /* see btf__add_str() for why we do this */ + len = strlen(s) + 1; + p = btf_add_str_mem(btf, len); + if (!p) + return -ENOMEM; + + new_off = btf->hdr->str_len; + memcpy(p, s, len); + + if (hashmap__find(btf->strs_hash, (void *)new_off, (void **)&old_off)) + return old_off; + + return -ENOENT; +} + +/* Add a string s to the BTF string section. + * Returns: + * - > 0 offset into string section, on success; + * - < 0, on error. 
+ */ +int btf__add_str(struct btf *btf, const char *s) +{ + long old_off, new_off, len; + void *p; + int err; + + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + /* Hashmap keys are always offsets within btf->strs_data, so to even + * look up some string from the "outside", we need to first append it + * at the end, so that it can be addressed with an offset. Luckily, + * until btf->hdr->str_len is incremented, that string is just a piece + * of garbage for the rest of BTF code, so no harm, no foul. On the + * other hand, if the string is unique, it's already appended and + * ready to be used, only a simple btf->hdr->str_len increment away. + */ + len = strlen(s) + 1; + p = btf_add_str_mem(btf, len); + if (!p) + return -ENOMEM; + + new_off = btf->hdr->str_len; + memcpy(p, s, len); + + /* Now attempt to add the string, but only if the string with the same + * contents doesn't exist already (HASHMAP_ADD strategy). If such + * string exists, we'll get its offset in old_off (that's old_key). + */ + err = hashmap__insert(btf->strs_hash, (void *)new_off, (void *)new_off, + HASHMAP_ADD, (const void **)&old_off, NULL); + if (err == -EEXIST) + return old_off; /* duplicated string, return existing offset */ + if (err) + return err; + + btf->hdr->str_len += len; /* new unique string, adjust data length */ + return new_off; +} + +static void *btf_add_type_mem(struct btf *btf, size_t add_sz) +{ + return btf_add_mem(&btf->types_data, &btf->types_data_cap, 1, + btf->hdr->type_len, UINT_MAX, add_sz); +} + +static __u32 btf_type_info(int kind, int vlen, int kflag) +{ + return (kflag << 31) | (kind << 24) | vlen; +} + +static void btf_type_inc_vlen(struct btf_type *t) +{ + t->info = btf_type_info(btf_kind(t), btf_vlen(t) + 1, btf_kflag(t)); +} + +/* + * Append new BTF_KIND_INT type with: + * - *name* - non-empty, non-NULL type name; + * - *sz* - power-of-2 (1, 2, 4, ..) size of the type, in bytes; + * - encoding is a combination of BTF_INT_SIGNED, BTF_INT_CHAR, BTF_INT_BOOL. + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. 
+ */ +int btf__add_int(struct btf *btf, const char *name, size_t byte_sz, int encoding) +{ + struct btf_type *t; + int sz, err, name_off; + + /* non-empty name */ + if (!name || !name[0]) + return -EINVAL; + /* byte_sz must be power of 2 */ + if (!byte_sz || (byte_sz & (byte_sz - 1)) || byte_sz > 16) + return -EINVAL; + if (encoding & ~(BTF_INT_SIGNED | BTF_INT_CHAR | BTF_INT_BOOL)) + return -EINVAL; + + /* deconstruct BTF, if necessary, and invalidate raw_data */ + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type) + sizeof(int); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + /* if something goes wrong later, we might end up with an extra string, + * but that shouldn't be a problem, because BTF can't be constructed + * completely anyway and will most probably be just discarded + */ + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + + t->name_off = name_off; + t->info = btf_type_info(BTF_KIND_INT, 0, 0); + t->size = byte_sz; + /* set INT info, we don't allow setting legacy bit offset/size */ + *(__u32 *)(t + 1) = (encoding << 24) | (byte_sz * 8); + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* it's completely legal to append BTF types with type IDs pointing forward to + * types that haven't been appended yet, so we only make sure that id looks + * sane, we can't guarantee that ID will always be valid + */ +static int validate_type_id(int id) +{ + if (id < 0 || id > BTF_MAX_NR_TYPES) + return -EINVAL; + return 0; +} + +/* generic append function for PTR, TYPEDEF, CONST/VOLATILE/RESTRICT */ +static int btf_add_ref_kind(struct btf *btf, int kind, const char *name, int ref_type_id) +{ + struct btf_type *t; + int sz, name_off = 0, err; + + if (validate_type_id(ref_type_id)) + return -EINVAL; + + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + if (name && name[0]) { + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + } + + t->name_off = name_off; + t->info = btf_type_info(kind, 0, 0); + t->type = ref_type_id; + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* + * Append new BTF_KIND_PTR type with: + * - *ref_type_id* - referenced type ID, it might not exist yet; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_ptr(struct btf *btf, int ref_type_id) +{ + return btf_add_ref_kind(btf, BTF_KIND_PTR, NULL, ref_type_id); +} + +/* + * Append new BTF_KIND_ARRAY type with: + * - *index_type_id* - type ID of the type describing array index; + * - *elem_type_id* - type ID of the type describing array element; + * - *nr_elems* - the size of the array; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. 
+ */ +int btf__add_array(struct btf *btf, int index_type_id, int elem_type_id, __u32 nr_elems) +{ + struct btf_type *t; + struct btf_array *a; + int sz, err; + + if (validate_type_id(index_type_id) || validate_type_id(elem_type_id)) + return -EINVAL; + + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type) + sizeof(struct btf_array); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + t->name_off = 0; + t->info = btf_type_info(BTF_KIND_ARRAY, 0, 0); + t->size = 0; + + a = btf_array(t); + a->type = elem_type_id; + a->index_type = index_type_id; + a->nelems = nr_elems; + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* generic STRUCT/UNION append function */ +static int btf_add_composite(struct btf *btf, int kind, const char *name, __u32 bytes_sz) +{ + struct btf_type *t; + int sz, err, name_off = 0; + + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + if (name && name[0]) { + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + } + + /* start out with vlen=0 and no kflag; this will be adjusted when + * adding each member + */ + t->name_off = name_off; + t->info = btf_type_info(kind, 0, 0); + t->size = bytes_sz; + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* + * Append new BTF_KIND_STRUCT type with: + * - *name* - name of the struct, can be NULL or empty for anonymous structs; + * - *byte_sz* - size of the struct, in bytes; + * + * Struct initially has no fields in it. Fields can be added by + * btf__add_field() right after btf__add_struct() succeeds. + * + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_struct(struct btf *btf, const char *name, __u32 byte_sz) +{ + return btf_add_composite(btf, BTF_KIND_STRUCT, name, byte_sz); +} + +/* + * Append new BTF_KIND_UNION type with: + * - *name* - name of the union, can be NULL or empty for anonymous union; + * - *byte_sz* - size of the union, in bytes; + * + * Union initially has no fields in it. Fields can be added by + * btf__add_field() right after btf__add_union() succeeds. All fields + * should have *bit_offset* of 0. + * + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_union(struct btf *btf, const char *name, __u32 byte_sz) +{ + return btf_add_composite(btf, BTF_KIND_UNION, name, byte_sz); +} + +/* + * Append new field for the current STRUCT/UNION type with: + * - *name* - name of the field, can be NULL or empty for anonymous field; + * - *type_id* - type ID for the type describing field type; + * - *bit_offset* - bit offset of the start of the field within struct/union; + * - *bit_size* - bit size of a bitfield, 0 for non-bitfield fields; + * Returns: + * - 0, on success; + * - <0, on error. 
+ */ +int btf__add_field(struct btf *btf, const char *name, int type_id, + __u32 bit_offset, __u32 bit_size) +{ + struct btf_type *t; + struct btf_member *m; + bool is_bitfield; + int sz, name_off = 0; + + /* last type should be union/struct */ + if (btf->nr_types == 0) + return -EINVAL; + t = btf_type_by_id(btf, btf->nr_types); + if (!btf_is_composite(t)) + return -EINVAL; + + if (validate_type_id(type_id)) + return -EINVAL; + /* best-effort bit field offset/size enforcement */ + is_bitfield = bit_size || (bit_offset % 8 != 0); + if (is_bitfield && (bit_size == 0 || bit_size > 255 || bit_offset > 0xffffff)) + return -EINVAL; + + /* only offset 0 is allowed for unions */ + if (btf_is_union(t) && bit_offset) + return -EINVAL; + + /* decompose and invalidate raw data */ + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_member); + m = btf_add_type_mem(btf, sz); + if (!m) + return -ENOMEM; + + if (name && name[0]) { + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + } + + m->name_off = name_off; + m->type = type_id; + m->offset = bit_offset | (bit_size << 24); + + /* btf_add_type_mem can invalidate t pointer */ + t = btf_type_by_id(btf, btf->nr_types); + /* update parent type's vlen and kflag */ + t->info = btf_type_info(btf_kind(t), btf_vlen(t) + 1, is_bitfield || btf_kflag(t)); + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + return 0; +} + +/* + * Append new BTF_KIND_ENUM type with: + * - *name* - name of the enum, can be NULL or empty for anonymous enums; + * - *byte_sz* - size of the enum, in bytes. + * + * Enum initially has no enum values in it (and corresponds to enum forward + * declaration). Enumerator values can be added by btf__add_enum_value() + * immediately after btf__add_enum() succeeds. + * + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_enum(struct btf *btf, const char *name, __u32 byte_sz) +{ + struct btf_type *t; + int sz, err, name_off = 0; + + /* byte_sz must be power of 2 */ + if (!byte_sz || (byte_sz & (byte_sz - 1)) || byte_sz > 8) + return -EINVAL; + + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + if (name && name[0]) { + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + } + + /* start out with vlen=0; it will be adjusted when adding enum values */ + t->name_off = name_off; + t->info = btf_type_info(BTF_KIND_ENUM, 0, 0); + t->size = byte_sz; + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* + * Append new enum value for the current ENUM type with: + * - *name* - name of the enumerator value, can't be NULL or empty; + * - *value* - integer value corresponding to enum value *name*; + * Returns: + * - 0, on success; + * - <0, on error. 
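+ *
+ * A minimal usage sketch (illustrative only, error handling omitted):
+ *
+ *    int e_id = btf__add_enum(btf, "state", 4);
+ *    btf__add_enum_value(btf, "STATE_OFF", 0);
+ *    btf__add_enum_value(btf, "STATE_ON", 1);
+ *
+ * which describes a 4-byte enum "state" with two enumerators.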
+ */ +int btf__add_enum_value(struct btf *btf, const char *name, __s64 value) +{ + struct btf_type *t; + struct btf_enum *v; + int sz, name_off; + + /* last type should be BTF_KIND_ENUM */ + if (btf->nr_types == 0) + return -EINVAL; + t = btf_type_by_id(btf, btf->nr_types); + if (!btf_is_enum(t)) + return -EINVAL; + + /* non-empty name */ + if (!name || !name[0]) + return -EINVAL; + if (value < INT_MIN || value > UINT_MAX) + return -E2BIG; + + /* decompose and invalidate raw data */ + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_enum); + v = btf_add_type_mem(btf, sz); + if (!v) + return -ENOMEM; + + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + + v->name_off = name_off; + v->val = value; + + /* update parent type's vlen */ + t = btf_type_by_id(btf, btf->nr_types); + btf_type_inc_vlen(t); + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + return 0; +} + +/* + * Append new BTF_KIND_FWD type with: + * - *name*, non-empty/non-NULL name; + * - *fwd_kind*, kind of forward declaration, one of BTF_FWD_STRUCT, + * BTF_FWD_UNION, or BTF_FWD_ENUM; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind) +{ + if (!name || !name[0]) + return -EINVAL; + + switch (fwd_kind) { + case BTF_FWD_STRUCT: + case BTF_FWD_UNION: { + struct btf_type *t; + int id; + + id = btf_add_ref_kind(btf, BTF_KIND_FWD, name, 0); + if (id <= 0) + return id; + t = btf_type_by_id(btf, id); + t->info = btf_type_info(BTF_KIND_FWD, 0, fwd_kind == BTF_FWD_UNION); + return id; + } + case BTF_FWD_ENUM: + /* enum forward in BTF currently is just an enum with no enum + * values; we also assume a standard 4-byte size for it + */ + return btf__add_enum(btf, name, sizeof(int)); + default: + return -EINVAL; + } +} + +/* + * Append new BTF_KIND_TYPEDEF type with: + * - *name*, non-empty/non-NULL name; + * - *ref_type_id* - referenced type ID, it might not exist yet; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_typedef(struct btf *btf, const char *name, int ref_type_id) +{ + if (!name || !name[0]) + return -EINVAL; + + return btf_add_ref_kind(btf, BTF_KIND_TYPEDEF, name, ref_type_id); +} + +/* + * Append new BTF_KIND_VOLATILE type with: + * - *ref_type_id* - referenced type ID, it might not exist yet; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_volatile(struct btf *btf, int ref_type_id) +{ + return btf_add_ref_kind(btf, BTF_KIND_VOLATILE, NULL, ref_type_id); +} + +/* + * Append new BTF_KIND_CONST type with: + * - *ref_type_id* - referenced type ID, it might not exist yet; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_const(struct btf *btf, int ref_type_id) +{ + return btf_add_ref_kind(btf, BTF_KIND_CONST, NULL, ref_type_id); +} + +/* + * Append new BTF_KIND_RESTRICT type with: + * - *ref_type_id* - referenced type ID, it might not exist yet; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_restrict(struct btf *btf, int ref_type_id) +{ + return btf_add_ref_kind(btf, BTF_KIND_RESTRICT, NULL, ref_type_id); +} + +/* + * Append new BTF_KIND_FUNC type with: + * - *name*, non-empty/non-NULL name; + * - *proto_type_id* - FUNC_PROTO's type ID, it might not exist yet; + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. 
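+ *
+ * A minimal usage sketch (illustrative only, error handling omitted); *int_id*
+ * is assumed to identify an INT type added earlier:
+ *
+ *    int proto_id = btf__add_func_proto(btf, int_id);
+ *    btf__add_func_param(btf, "x", int_id);
+ *    int fn_id = btf__add_func(btf, "square", BTF_FUNC_GLOBAL, proto_id);
+ *
+ * which describes a global function int square(int x).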
+ */ +int btf__add_func(struct btf *btf, const char *name, + enum btf_func_linkage linkage, int proto_type_id) +{ + int id; + + if (!name || !name[0]) + return -EINVAL; + if (linkage != BTF_FUNC_STATIC && linkage != BTF_FUNC_GLOBAL && + linkage != BTF_FUNC_EXTERN) + return -EINVAL; + + id = btf_add_ref_kind(btf, BTF_KIND_FUNC, name, proto_type_id); + if (id > 0) { + struct btf_type *t = btf_type_by_id(btf, id); + + t->info = btf_type_info(BTF_KIND_FUNC, linkage, 0); + } + return id; +} + +/* + * Append new BTF_KIND_FUNC_PROTO with: + * - *ret_type_id* - type ID for return result of a function. + * + * Function prototype initially has no arguments, but they can be added by + * btf__add_func_param() one by one, immediately after + * btf__add_func_proto() succeeds. + * + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_func_proto(struct btf *btf, int ret_type_id) +{ + struct btf_type *t; + int sz, err; + + if (validate_type_id(ret_type_id)) + return -EINVAL; + + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + /* start out with vlen=0; this will be adjusted when adding function + * parameters, if necessary + */ + t->name_off = 0; + t->info = btf_type_info(BTF_KIND_FUNC_PROTO, 0, 0); + t->type = ret_type_id; + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* + * Append new function parameter for current FUNC_PROTO type with: + * - *name* - parameter name, can be NULL or empty; + * - *type_id* - type ID describing the type of the parameter. + * Returns: + * - 0, on success; + * - <0, on error. + */ +int btf__add_func_param(struct btf *btf, const char *name, int type_id) +{ + struct btf_type *t; + struct btf_param *p; + int sz, name_off = 0; + + if (validate_type_id(type_id)) + return -EINVAL; + + /* last type should be BTF_KIND_FUNC_PROTO */ + if (btf->nr_types == 0) + return -EINVAL; + t = btf_type_by_id(btf, btf->nr_types); + if (!btf_is_func_proto(t)) + return -EINVAL; + + /* decompose and invalidate raw data */ + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_param); + p = btf_add_type_mem(btf, sz); + if (!p) + return -ENOMEM; + + if (name && name[0]) { + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + } + + p->name_off = name_off; + p->type = type_id; + + /* update parent type's vlen */ + t = btf_type_by_id(btf, btf->nr_types); + btf_type_inc_vlen(t); + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + return 0; +} + +/* + * Append new BTF_KIND_VAR type with: + * - *name* - non-empty/non-NULL name; + * - *linkage* - variable linkage, one of BTF_VAR_STATIC, + * BTF_VAR_GLOBAL_ALLOCATED, or BTF_VAR_GLOBAL_EXTERN; + * - *type_id* - type ID of the type describing the type of the variable. + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. 
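+ *
+ * A minimal usage sketch (illustrative only, error handling omitted); *int_id*
+ * is assumed to identify an INT type added earlier:
+ *
+ *    int var_id = btf__add_var(btf, "my_counter", BTF_VAR_GLOBAL_ALLOCATED, int_id);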
+ */ +int btf__add_var(struct btf *btf, const char *name, int linkage, int type_id) +{ + struct btf_type *t; + struct btf_var *v; + int sz, err, name_off; + + /* non-empty name */ + if (!name || !name[0]) + return -EINVAL; + if (linkage != BTF_VAR_STATIC && linkage != BTF_VAR_GLOBAL_ALLOCATED && + linkage != BTF_VAR_GLOBAL_EXTERN) + return -EINVAL; + if (validate_type_id(type_id)) + return -EINVAL; + + /* deconstruct BTF, if necessary, and invalidate raw_data */ + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type) + sizeof(struct btf_var); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + + t->name_off = name_off; + t->info = btf_type_info(BTF_KIND_VAR, 0, 0); + t->type = type_id; + + v = btf_var(t); + v->linkage = linkage; + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* + * Append new BTF_KIND_DATASEC type with: + * - *name* - non-empty/non-NULL name; + * - *byte_sz* - data section size, in bytes. + * + * Data section is initially empty. Variable info entries can be added with + * btf__add_datasec_var_info() calls, after btf__add_datasec() succeeds. + * + * Returns: + * - >0, type ID of newly added BTF type; + * - <0, on error. + */ +int btf__add_datasec(struct btf *btf, const char *name, __u32 byte_sz) +{ + struct btf_type *t; + int sz, err, name_off; + + /* non-empty name */ + if (!name || !name[0]) + return -EINVAL; + + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_type); + t = btf_add_type_mem(btf, sz); + if (!t) + return -ENOMEM; + + name_off = btf__add_str(btf, name); + if (name_off < 0) + return name_off; + + /* start with vlen=0, which will be updated as var_secinfos are added */ + t->name_off = name_off; + t->info = btf_type_info(BTF_KIND_DATASEC, 0, 0); + t->size = byte_sz; + + err = btf_add_type_idx_entry(btf, btf->hdr->type_len); + if (err) + return err; + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + btf->nr_types++; + return btf->nr_types; +} + +/* + * Append new data section variable information entry for current DATASEC type: + * - *var_type_id* - type ID, describing type of the variable; + * - *offset* - variable offset within data section, in bytes; + * - *byte_sz* - variable size, in bytes. + * + * Returns: + * - 0, on success; + * - <0, on error. 
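+ *
+ * A minimal usage sketch (illustrative only, error handling omitted); *var_id*
+ * is assumed to come from an earlier btf__add_var() call for a 4-byte variable:
+ *
+ *    int sec_id = btf__add_datasec(btf, ".data", 4);
+ *    btf__add_datasec_var_info(btf, var_id, 0, 4);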
+ */ +int btf__add_datasec_var_info(struct btf *btf, int var_type_id, __u32 offset, __u32 byte_sz) +{ + struct btf_type *t; + struct btf_var_secinfo *v; + int sz; + + /* last type should be BTF_KIND_DATASEC */ + if (btf->nr_types == 0) + return -EINVAL; + t = btf_type_by_id(btf, btf->nr_types); + if (!btf_is_datasec(t)) + return -EINVAL; + + if (validate_type_id(var_type_id)) + return -EINVAL; + + /* decompose and invalidate raw data */ + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + + sz = sizeof(struct btf_var_secinfo); + v = btf_add_type_mem(btf, sz); + if (!v) + return -ENOMEM; + + v->type = var_type_id; + v->offset = offset; + v->size = byte_sz; + + /* update parent type's vlen */ + t = btf_type_by_id(btf, btf->nr_types); + btf_type_inc_vlen(t); + + btf->hdr->type_len += sz; + btf->hdr->str_off += sz; + return 0; +} + struct btf_ext_sec_setup_param { __u32 off; __u32 len; @@ -1137,14 +2448,14 @@ static int btf_ext_setup_line_info(struct btf_ext *btf_ext) return btf_ext_setup_info(btf_ext, &param); } -static int btf_ext_setup_field_reloc(struct btf_ext *btf_ext) +static int btf_ext_setup_core_relos(struct btf_ext *btf_ext) { struct btf_ext_sec_setup_param param = { - .off = btf_ext->hdr->field_reloc_off, - .len = btf_ext->hdr->field_reloc_len, - .min_rec_size = sizeof(struct bpf_field_reloc), - .ext_info = &btf_ext->field_reloc_info, - .desc = "field_reloc", + .off = btf_ext->hdr->core_relo_off, + .len = btf_ext->hdr->core_relo_len, + .min_rec_size = sizeof(struct bpf_core_relo), + .ext_info = &btf_ext->core_relo_info, + .desc = "core_relo", }; return btf_ext_setup_info(btf_ext, &param); @@ -1160,7 +2471,10 @@ static int btf_ext_parse_hdr(__u8 *data, __u32 data_size) return -EINVAL; } - if (hdr->magic != BTF_MAGIC) { + if (hdr->magic == bswap_16(BTF_MAGIC)) { + pr_warn("BTF.ext in non-native endianness is not supported\n"); + return -ENOTSUP; + } else if (hdr->magic != BTF_MAGIC) { pr_debug("Invalid BTF.ext magic:%x\n", hdr->magic); return -EINVAL; } @@ -1223,10 +2537,9 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size) if (err) goto done; - if (btf_ext->hdr->hdr_len < - offsetofend(struct btf_ext_header, field_reloc_len)) + if (btf_ext->hdr->hdr_len < offsetofend(struct btf_ext_header, core_relo_len)) goto done; - err = btf_ext_setup_field_reloc(btf_ext); + err = btf_ext_setup_core_relos(btf_ext); if (err) goto done; @@ -1481,6 +2794,9 @@ int btf__dedup(struct btf *btf, struct btf_ext *btf_ext, return -EINVAL; } + if (btf_ensure_modifiable(btf)) + return -ENOMEM; + err = btf_dedup_strings(d); if (err < 0) { pr_debug("btf_dedup_strings failed:%d\n", err); @@ -1581,7 +2897,7 @@ static int btf_dedup_hypot_map_add(struct btf_dedup *d, __u32 *new_list; d->hypot_cap += max((size_t)16, d->hypot_cap / 2); - new_list = realloc(d->hypot_list, sizeof(__u32) * d->hypot_cap); + new_list = libbpf_reallocarray(d->hypot_list, d->hypot_cap, sizeof(__u32)); if (!new_list) return -ENOMEM; d->hypot_list = new_list; @@ -1665,7 +2981,7 @@ static struct btf_dedup *btf_dedup_new(struct btf *btf, struct btf_ext *btf_ext, /* special BTF "void" type is made canonical immediately */ d->map[0] = 0; for (i = 1; i <= btf->nr_types; i++) { - struct btf_type *t = d->btf->types[i]; + struct btf_type *t = btf_type_by_id(d->btf, i); /* VAR and DATASEC are never deduped and are self-canonical */ if (btf_is_var(t) || btf_is_datasec(t)) @@ -1704,7 +3020,7 @@ static int btf_for_each_str_off(struct btf_dedup *d, str_off_fn_t fn, void *ctx) struct btf_type *t; for (i = 1; i <= d->btf->nr_types; i++) { - t = 
d->btf->types[i]; + t = btf_type_by_id(d->btf, i); r = fn(&t->name_off, ctx); if (r) return r; @@ -1858,8 +3174,7 @@ static int btf_str_remap_offset(__u32 *str_off_ptr, void *ctx) */ static int btf_dedup_strings(struct btf_dedup *d) { - const struct btf_header *hdr = d->btf->hdr; - char *start = (char *)d->btf->nohdr_data + hdr->str_off; + char *start = d->btf->strs_data; char *end = start + d->btf->hdr->str_len; char *p = start, *tmp_strs = NULL; struct btf_str_ptrs strs = { @@ -1871,14 +3186,16 @@ static int btf_dedup_strings(struct btf_dedup *d) int i, j, err = 0, grp_idx; bool grp_used; + if (d->btf->strs_deduped) + return 0; + /* build index of all strings */ while (p < end) { if (strs.cnt + 1 > strs.cap) { struct btf_str_ptr *new_ptrs; strs.cap += max(strs.cnt / 2, 16U); - new_ptrs = realloc(strs.ptrs, - sizeof(strs.ptrs[0]) * strs.cap); + new_ptrs = libbpf_reallocarray(strs.ptrs, strs.cap, sizeof(strs.ptrs[0])); if (!new_ptrs) { err = -ENOMEM; goto done; @@ -1964,6 +3281,7 @@ static int btf_dedup_strings(struct btf_dedup *d) goto done; d->btf->hdr->str_len = end - start; + d->btf->strs_deduped = true; done: free(tmp_strs); @@ -2240,7 +3558,7 @@ static bool btf_compat_fnproto(struct btf_type *t1, struct btf_type *t2) */ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id) { - struct btf_type *t = d->btf->types[type_id]; + struct btf_type *t = btf_type_by_id(d->btf, type_id); struct hashmap_entry *hash_entry; struct btf_type *cand; /* if we don't find equivalent type, then we are canonical */ @@ -2267,7 +3585,7 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id) h = btf_hash_int(t); for_each_dedup_cand(d, hash_entry, h) { cand_id = (__u32)(long)hash_entry->value; - cand = d->btf->types[cand_id]; + cand = btf_type_by_id(d->btf, cand_id); if (btf_equal_int(t, cand)) { new_id = cand_id; break; @@ -2279,7 +3597,7 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id) h = btf_hash_enum(t); for_each_dedup_cand(d, hash_entry, h) { cand_id = (__u32)(long)hash_entry->value; - cand = d->btf->types[cand_id]; + cand = btf_type_by_id(d->btf, cand_id); if (btf_equal_enum(t, cand)) { new_id = cand_id; break; @@ -2302,7 +3620,7 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id) h = btf_hash_common(t); for_each_dedup_cand(d, hash_entry, h) { cand_id = (__u32)(long)hash_entry->value; - cand = d->btf->types[cand_id]; + cand = btf_type_by_id(d->btf, cand_id); if (btf_equal_common(t, cand)) { new_id = cand_id; break; @@ -2361,13 +3679,13 @@ static uint32_t resolve_fwd_id(struct btf_dedup *d, uint32_t type_id) { __u32 orig_type_id = type_id; - if (!btf_is_fwd(d->btf->types[type_id])) + if (!btf_is_fwd(btf__type_by_id(d->btf, type_id))) return type_id; while (is_type_mapped(d, type_id) && d->map[type_id] != type_id) type_id = d->map[type_id]; - if (!btf_is_fwd(d->btf->types[type_id])) + if (!btf_is_fwd(btf__type_by_id(d->btf, type_id))) return type_id; return orig_type_id; @@ -2495,8 +3813,8 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id, if (btf_dedup_hypot_map_add(d, canon_id, cand_id)) return -ENOMEM; - cand_type = d->btf->types[cand_id]; - canon_type = d->btf->types[canon_id]; + cand_type = btf_type_by_id(d->btf, cand_id); + canon_type = btf_type_by_id(d->btf, canon_id); cand_kind = btf_kind(cand_type); canon_kind = btf_kind(canon_type); @@ -2647,8 +3965,8 @@ static void btf_dedup_merge_hypot_map(struct btf_dedup *d) targ_type_id = d->hypot_map[cand_type_id]; t_id = resolve_type_id(d, targ_type_id); c_id = 
resolve_type_id(d, cand_type_id); - t_kind = btf_kind(d->btf->types[t_id]); - c_kind = btf_kind(d->btf->types[c_id]); + t_kind = btf_kind(btf__type_by_id(d->btf, t_id)); + c_kind = btf_kind(btf__type_by_id(d->btf, c_id)); /* * Resolve FWD into STRUCT/UNION. * It's ok to resolve FWD into STRUCT/UNION that's not yet @@ -2716,7 +4034,7 @@ static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id) if (d->map[type_id] <= BTF_MAX_NR_TYPES) return 0; - t = d->btf->types[type_id]; + t = btf_type_by_id(d->btf, type_id); kind = btf_kind(t); if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION) @@ -2737,7 +4055,7 @@ static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id) * creating a loop (FWD -> STRUCT and STRUCT -> FWD), because * FWD and compatible STRUCT/UNION are considered equivalent. */ - cand_type = d->btf->types[cand_id]; + cand_type = btf_type_by_id(d->btf, cand_id); if (!btf_shallow_equal_struct(t, cand_type)) continue; @@ -2809,7 +4127,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id) if (d->map[type_id] <= BTF_MAX_NR_TYPES) return resolve_type_id(d, type_id); - t = d->btf->types[type_id]; + t = btf_type_by_id(d->btf, type_id); d->map[type_id] = BTF_IN_PROGRESS_ID; switch (btf_kind(t)) { @@ -2827,7 +4145,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id) h = btf_hash_common(t); for_each_dedup_cand(d, hash_entry, h) { cand_id = (__u32)(long)hash_entry->value; - cand = d->btf->types[cand_id]; + cand = btf_type_by_id(d->btf, cand_id); if (btf_equal_common(t, cand)) { new_id = cand_id; break; @@ -2851,7 +4169,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id) h = btf_hash_array(t); for_each_dedup_cand(d, hash_entry, h) { cand_id = (__u32)(long)hash_entry->value; - cand = d->btf->types[cand_id]; + cand = btf_type_by_id(d->btf, cand_id); if (btf_equal_array(t, cand)) { new_id = cand_id; break; @@ -2883,7 +4201,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id) h = btf_hash_fnproto(t); for_each_dedup_cand(d, hash_entry, h) { cand_id = (__u32)(long)hash_entry->value; - cand = d->btf->types[cand_id]; + cand = btf_type_by_id(d->btf, cand_id); if (btf_equal_fnproto(t, cand)) { new_id = cand_id; break; @@ -2931,9 +4249,9 @@ static int btf_dedup_ref_types(struct btf_dedup *d) */ static int btf_dedup_compact_types(struct btf_dedup *d) { - struct btf_type **new_types; + __u32 *new_offs; __u32 next_type_id = 1; - char *types_start, *p; + void *p; int i, len; /* we are going to reuse hypot_map to store compaction remapping */ @@ -2941,41 +4259,34 @@ static int btf_dedup_compact_types(struct btf_dedup *d) for (i = 1; i <= d->btf->nr_types; i++) d->hypot_map[i] = BTF_UNPROCESSED_ID; - types_start = d->btf->nohdr_data + d->btf->hdr->type_off; - p = types_start; + p = d->btf->types_data; for (i = 1; i <= d->btf->nr_types; i++) { if (d->map[i] != i) continue; - len = btf_type_size(d->btf->types[i]); + len = btf_type_size(btf__type_by_id(d->btf, i)); if (len < 0) return len; - memmove(p, d->btf->types[i], len); + memmove(p, btf__type_by_id(d->btf, i), len); d->hypot_map[i] = next_type_id; - d->btf->types[next_type_id] = (struct btf_type *)p; + d->btf->type_offs[next_type_id] = p - d->btf->types_data; p += len; next_type_id++; } /* shrink struct btf's internal types index and update btf_header */ d->btf->nr_types = next_type_id - 1; - d->btf->types_size = d->btf->nr_types; - d->btf->hdr->type_len = p - types_start; - new_types = realloc(d->btf->types, - (1 + d->btf->nr_types) * sizeof(struct btf_type *)); - 
if (!new_types) + d->btf->type_offs_cap = d->btf->nr_types + 1; + d->btf->hdr->type_len = p - d->btf->types_data; + new_offs = libbpf_reallocarray(d->btf->type_offs, d->btf->type_offs_cap, + sizeof(*new_offs)); + if (!new_offs) return -ENOMEM; - d->btf->types = new_types; - - /* make sure string section follows type information without gaps */ - d->btf->hdr->str_off = p - (char *)d->btf->nohdr_data; - memmove(p, d->btf->strings, d->btf->hdr->str_len); - d->btf->strings = p; - p += d->btf->hdr->str_len; - - d->btf->data_size = p - (char *)d->btf->data; + d->btf->type_offs = new_offs; + d->btf->hdr->str_off = d->btf->hdr->type_len; + d->btf->raw_size = d->btf->hdr->hdr_len + d->btf->hdr->type_len + d->btf->hdr->str_len; return 0; } @@ -3008,7 +4319,7 @@ static int btf_dedup_remap_type_id(struct btf_dedup *d, __u32 type_id) */ static int btf_dedup_remap_type(struct btf_dedup *d, __u32 type_id) { - struct btf_type *t = d->btf->types[type_id]; + struct btf_type *t = btf_type_by_id(d->btf, type_id); int i, r; switch (btf_kind(t)) { diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 1ca14448df4c..57247240a20a 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -5,6 +5,7 @@ #define __LIBBPF_BTF_H #include <stdarg.h> +#include <stdbool.h> #include <linux/btf.h> #include <linux/types.h> @@ -24,46 +25,14 @@ struct btf_type; struct bpf_object; -/* - * The .BTF.ext ELF section layout defined as - * struct btf_ext_header - * func_info subsection - * - * The func_info subsection layout: - * record size for struct bpf_func_info in the func_info subsection - * struct btf_sec_func_info for section #1 - * a list of bpf_func_info records for section #1 - * where struct bpf_func_info mimics one in include/uapi/linux/bpf.h - * but may not be identical - * struct btf_sec_func_info for section #2 - * a list of bpf_func_info records for section #2 - * ...... - * - * Note that the bpf_func_info record size in .BTF.ext may not - * be the same as the one defined in include/uapi/linux/bpf.h. - * The loader should ensure that record_size meets minimum - * requirement and pass the record as is to the kernel. The - * kernel will handle the func_info properly based on its contents. 
- */ -struct btf_ext_header { - __u16 magic; - __u8 version; - __u8 flags; - __u32 hdr_len; - - /* All offsets are in bytes relative to the end of this header */ - __u32 func_info_off; - __u32 func_info_len; - __u32 line_info_off; - __u32 line_info_len; - - /* optional part of .BTF.ext header */ - __u32 field_reloc_off; - __u32 field_reloc_len; +enum btf_endianness { + BTF_LITTLE_ENDIAN = 0, + BTF_BIG_ENDIAN = 1, }; LIBBPF_API void btf__free(struct btf *btf); LIBBPF_API struct btf *btf__new(const void *data, __u32 size); +LIBBPF_API struct btf *btf__new_empty(void); LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext); LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext); LIBBPF_API struct btf *btf__parse_raw(const char *path); @@ -78,6 +47,8 @@ LIBBPF_API const struct btf_type *btf__type_by_id(const struct btf *btf, __u32 id); LIBBPF_API size_t btf__pointer_size(const struct btf *btf); LIBBPF_API int btf__set_pointer_size(struct btf *btf, size_t ptr_sz); +LIBBPF_API enum btf_endianness btf__endianness(const struct btf *btf); +LIBBPF_API int btf__set_endianness(struct btf *btf, enum btf_endianness endian); LIBBPF_API __s64 btf__resolve_size(const struct btf *btf, __u32 type_id); LIBBPF_API int btf__resolve_type(const struct btf *btf, __u32 type_id); LIBBPF_API int btf__align_of(const struct btf *btf, __u32 id); @@ -85,6 +56,7 @@ LIBBPF_API int btf__fd(const struct btf *btf); LIBBPF_API void btf__set_fd(struct btf *btf, int fd); LIBBPF_API const void *btf__get_raw_data(const struct btf *btf, __u32 *size); LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset); +LIBBPF_API const char *btf__str_by_offset(const struct btf *btf, __u32 offset); LIBBPF_API int btf__get_from_id(__u32 id, struct btf **btf); LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name, __u32 expected_key_size, @@ -95,19 +67,62 @@ LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size); LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext); LIBBPF_API const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext, __u32 *size); -LIBBPF_API int btf_ext__reloc_func_info(const struct btf *btf, - const struct btf_ext *btf_ext, - const char *sec_name, __u32 insns_cnt, - void **func_info, __u32 *cnt); -LIBBPF_API int btf_ext__reloc_line_info(const struct btf *btf, - const struct btf_ext *btf_ext, - const char *sec_name, __u32 insns_cnt, - void **line_info, __u32 *cnt); +LIBBPF_API LIBBPF_DEPRECATED("btf_ext__reloc_func_info was never meant as a public API and has wrong assumptions embedded in it; it will be removed in the future libbpf versions") +int btf_ext__reloc_func_info(const struct btf *btf, + const struct btf_ext *btf_ext, + const char *sec_name, __u32 insns_cnt, + void **func_info, __u32 *cnt); +LIBBPF_API LIBBPF_DEPRECATED("btf_ext__reloc_line_info was never meant as a public API and has wrong assumptions embedded in it; it will be removed in the future libbpf versions") +int btf_ext__reloc_line_info(const struct btf *btf, + const struct btf_ext *btf_ext, + const char *sec_name, __u32 insns_cnt, + void **line_info, __u32 *cnt); LIBBPF_API __u32 btf_ext__func_info_rec_size(const struct btf_ext *btf_ext); LIBBPF_API __u32 btf_ext__line_info_rec_size(const struct btf_ext *btf_ext); LIBBPF_API struct btf *libbpf_find_kernel_btf(void); +LIBBPF_API int btf__find_str(struct btf *btf, const char *s); +LIBBPF_API int btf__add_str(struct btf *btf, const char *s); + +LIBBPF_API int btf__add_int(struct btf *btf, const 
char *name, size_t byte_sz, int encoding); +LIBBPF_API int btf__add_ptr(struct btf *btf, int ref_type_id); +LIBBPF_API int btf__add_array(struct btf *btf, + int index_type_id, int elem_type_id, __u32 nr_elems); +/* struct/union construction APIs */ +LIBBPF_API int btf__add_struct(struct btf *btf, const char *name, __u32 sz); +LIBBPF_API int btf__add_union(struct btf *btf, const char *name, __u32 sz); +LIBBPF_API int btf__add_field(struct btf *btf, const char *name, int field_type_id, + __u32 bit_offset, __u32 bit_size); + +/* enum construction APIs */ +LIBBPF_API int btf__add_enum(struct btf *btf, const char *name, __u32 bytes_sz); +LIBBPF_API int btf__add_enum_value(struct btf *btf, const char *name, __s64 value); + +enum btf_fwd_kind { + BTF_FWD_STRUCT = 0, + BTF_FWD_UNION = 1, + BTF_FWD_ENUM = 2, +}; + +LIBBPF_API int btf__add_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind); +LIBBPF_API int btf__add_typedef(struct btf *btf, const char *name, int ref_type_id); +LIBBPF_API int btf__add_volatile(struct btf *btf, int ref_type_id); +LIBBPF_API int btf__add_const(struct btf *btf, int ref_type_id); +LIBBPF_API int btf__add_restrict(struct btf *btf, int ref_type_id); + +/* func and func_proto construction APIs */ +LIBBPF_API int btf__add_func(struct btf *btf, const char *name, + enum btf_func_linkage linkage, int proto_type_id); +LIBBPF_API int btf__add_func_proto(struct btf *btf, int ret_type_id); +LIBBPF_API int btf__add_func_param(struct btf *btf, const char *name, int type_id); + +/* var & datasec construction APIs */ +LIBBPF_API int btf__add_var(struct btf *btf, const char *name, int linkage, int type_id); +LIBBPF_API int btf__add_datasec(struct btf *btf, const char *name, __u32 byte_sz); +LIBBPF_API int btf__add_datasec_var_info(struct btf *btf, int var_type_id, + __u32 offset, __u32 byte_sz); + struct btf_dedup_opts { unsigned int dedup_table_size; bool dont_resolve_fwds; diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c index 57c00fa63932..2f9d685bd522 100644 --- a/tools/lib/bpf/btf_dump.c +++ b/tools/lib/bpf/btf_dump.c @@ -19,9 +19,6 @@ #include "libbpf.h" #include "libbpf_internal.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - static const char PREFIXES[] = "\t\t\t\t\t\t\t\t\t\t\t\t\t"; static const size_t PREFIX_CNT = sizeof(PREFIXES) - 1; @@ -63,11 +60,14 @@ struct btf_dump { struct btf_dump_opts opts; int ptr_sz; bool strip_mods; + int last_id; /* per-type auxiliary state */ struct btf_dump_type_aux_state *type_states; + size_t type_states_cap; /* per-type optional cached unique name, must be freed, if present */ const char **cached_names; + size_t cached_names_cap; /* topo-sorted list of dependent type definitions */ __u32 *emit_queue; @@ -93,14 +93,7 @@ struct btf_dump { static size_t str_hash_fn(const void *key, void *ctx) { - const char *s = key; - size_t h = 0; - - while (*s) { - h = h * 31 + *s; - s++; - } - return h; + return str_hash(key); } static bool str_equal_fn(const void *a, const void *b, void *ctx) @@ -123,6 +116,7 @@ static void btf_dump_printf(const struct btf_dump *d, const char *fmt, ...) 
} static int btf_dump_mark_referenced(struct btf_dump *d); +static int btf_dump_resize(struct btf_dump *d); struct btf_dump *btf_dump__new(const struct btf *btf, const struct btf_ext *btf_ext, @@ -154,25 +148,8 @@ struct btf_dump *btf_dump__new(const struct btf *btf, d->ident_names = NULL; goto err; } - d->type_states = calloc(1 + btf__get_nr_types(d->btf), - sizeof(d->type_states[0])); - if (!d->type_states) { - err = -ENOMEM; - goto err; - } - d->cached_names = calloc(1 + btf__get_nr_types(d->btf), - sizeof(d->cached_names[0])); - if (!d->cached_names) { - err = -ENOMEM; - goto err; - } - - /* VOID is special */ - d->type_states[0].order_state = ORDERED; - d->type_states[0].emit_state = EMITTED; - /* eagerly determine referenced types for anon enums */ - err = btf_dump_mark_referenced(d); + err = btf_dump_resize(d); if (err) goto err; @@ -182,9 +159,38 @@ err: return ERR_PTR(err); } +static int btf_dump_resize(struct btf_dump *d) +{ + int err, last_id = btf__get_nr_types(d->btf); + + if (last_id <= d->last_id) + return 0; + + if (btf_ensure_mem((void **)&d->type_states, &d->type_states_cap, + sizeof(*d->type_states), last_id + 1)) + return -ENOMEM; + if (btf_ensure_mem((void **)&d->cached_names, &d->cached_names_cap, + sizeof(*d->cached_names), last_id + 1)) + return -ENOMEM; + + if (d->last_id == 0) { + /* VOID is special */ + d->type_states[0].order_state = ORDERED; + d->type_states[0].emit_state = EMITTED; + } + + /* eagerly determine referenced types for anon enums */ + err = btf_dump_mark_referenced(d); + if (err) + return err; + + d->last_id = last_id; + return 0; +} + void btf_dump__free(struct btf_dump *d) { - int i, cnt; + int i; if (IS_ERR_OR_NULL(d)) return; @@ -192,7 +198,7 @@ void btf_dump__free(struct btf_dump *d) free(d->type_states); if (d->cached_names) { /* any set cached name is owned by us and should be freed */ - for (i = 0, cnt = btf__get_nr_types(d->btf); i <= cnt; i++) { + for (i = 0; i <= d->last_id; i++) { if (d->cached_names[i]) free((void *)d->cached_names[i]); } @@ -232,6 +238,10 @@ int btf_dump__dump_type(struct btf_dump *d, __u32 id) if (id > btf__get_nr_types(d->btf)) return -EINVAL; + err = btf_dump_resize(d); + if (err) + return err; + d->emit_queue_cnt = 0; err = btf_dump_order_type(d, id, false); if (err < 0) @@ -261,7 +271,7 @@ static int btf_dump_mark_referenced(struct btf_dump *d) const struct btf_type *t; __u16 vlen; - for (i = 1; i <= n; i++) { + for (i = d->last_id + 1; i <= n; i++) { t = btf__type_by_id(d->btf, i); vlen = btf_vlen(t); @@ -316,6 +326,7 @@ static int btf_dump_mark_referenced(struct btf_dump *d) } return 0; } + static int btf_dump_add_emit_queue_id(struct btf_dump *d, __u32 id) { __u32 *new_queue; @@ -323,8 +334,7 @@ static int btf_dump_add_emit_queue_id(struct btf_dump *d, __u32 id) if (d->emit_queue_cnt >= d->emit_queue_cap) { new_cap = max(16, d->emit_queue_cap * 3 / 2); - new_queue = realloc(d->emit_queue, - new_cap * sizeof(new_queue[0])); + new_queue = libbpf_reallocarray(d->emit_queue, new_cap, sizeof(new_queue[0])); if (!new_queue) return -ENOMEM; d->emit_queue = new_queue; @@ -1003,8 +1013,7 @@ static int btf_dump_push_decl_stack_id(struct btf_dump *d, __u32 id) if (d->decl_stack_cnt >= d->decl_stack_cap) { new_cap = max(16, d->decl_stack_cap * 3 / 2); - new_stack = realloc(d->decl_stack, - new_cap * sizeof(new_stack[0])); + new_stack = libbpf_reallocarray(d->decl_stack, new_cap, sizeof(new_stack[0])); if (!new_stack) return -ENOMEM; d->decl_stack = new_stack; @@ -1061,11 +1070,15 @@ int btf_dump__emit_type_decl(struct 
btf_dump *d, __u32 id, const struct btf_dump_emit_type_decl_opts *opts) { const char *fname; - int lvl; + int lvl, err; if (!OPTS_VALID(opts, btf_dump_emit_type_decl_opts)) return -EINVAL; + err = btf_dump_resize(d); + if (err) + return -EINVAL; + fname = OPTS_GET(opts, field_name, ""); lvl = OPTS_GET(opts, indent_level, 0); d->strip_mods = OPTS_GET(opts, strip_mods, false); diff --git a/tools/lib/bpf/hashmap.c b/tools/lib/bpf/hashmap.c index a405dad068f5..3c20b126d60d 100644 --- a/tools/lib/bpf/hashmap.c +++ b/tools/lib/bpf/hashmap.c @@ -15,6 +15,9 @@ /* make sure libbpf doesn't use kernel-only integer typedefs */ #pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 +/* prevent accidental re-addition of reallocarray() */ +#pragma GCC poison reallocarray + /* start with 4 buckets */ #define HASHMAP_MIN_CAP_BITS 2 diff --git a/tools/lib/bpf/hashmap.h b/tools/lib/bpf/hashmap.h index e0af36b0e5d8..d9b385fe808c 100644 --- a/tools/lib/bpf/hashmap.h +++ b/tools/lib/bpf/hashmap.h @@ -25,6 +25,18 @@ static inline size_t hash_bits(size_t h, int bits) #endif } +/* generic C-string hashing function */ +static inline size_t str_hash(const char *s) +{ + size_t h = 0; + + while (*s) { + h = h * 31 + *s; + s++; + } + return h; +} + typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx); typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx); diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e493d6048143..313034117070 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -44,7 +44,6 @@ #include <sys/vfs.h> #include <sys/utsname.h> #include <sys/resource.h> -#include <tools/libc_compat.h> #include <libelf.h> #include <gelf.h> #include <zlib.h> @@ -56,9 +55,6 @@ #include "libbpf_internal.h" #include "hashmap.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - #ifndef EM_BPF #define EM_BPF 247 #endif @@ -67,6 +63,8 @@ #define BPF_FS_MAGIC 0xcafe4a11 #endif +#define BPF_INSN_SZ (sizeof(struct bpf_insn)) + /* vsprintf() in __base_pr() uses nonliteral format string. It may break * compilation if user enables corresponding warning. Disable it explicitly. */ @@ -75,8 +73,6 @@ #define __printf(a, b) __attribute__((format(printf, a, b))) static struct bpf_map *bpf_object__add_map(struct bpf_object *obj); -static struct bpf_program *bpf_object__find_prog_by_idx(struct bpf_object *obj, - int idx); static const struct btf_type * skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id); @@ -154,34 +150,37 @@ static void pr_perm_msg(int err) ___err; }) #endif -#ifdef HAVE_LIBELF_MMAP_SUPPORT -# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ_MMAP -#else -# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ -#endif - static inline __u64 ptr_to_u64(const void *ptr) { return (__u64) (unsigned long) ptr; } -struct bpf_capabilities { +enum kern_feature_id { /* v4.14: kernel support for program & map names. */ - __u32 name:1; + FEAT_PROG_NAME, /* v5.2: kernel support for global data sections. 
*/ - __u32 global_data:1; + FEAT_GLOBAL_DATA, + /* BTF support */ + FEAT_BTF, /* BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO support */ - __u32 btf_func:1; + FEAT_BTF_FUNC, /* BTF_KIND_VAR and BTF_KIND_DATASEC support */ - __u32 btf_datasec:1; - /* BPF_F_MMAPABLE is supported for arrays */ - __u32 array_mmap:1; + FEAT_BTF_DATASEC, /* BTF_FUNC_GLOBAL is supported */ - __u32 btf_func_global:1; + FEAT_BTF_GLOBAL_FUNC, + /* BPF_F_MMAPABLE is supported for arrays */ + FEAT_ARRAY_MMAP, /* kernel support for expected_attach_type in BPF_PROG_LOAD */ - __u32 exp_attach_type:1; + FEAT_EXP_ATTACH_TYPE, + /* bpf_probe_read_{kernel,user}[_str] helpers */ + FEAT_PROBE_READ_KERN, + /* BPF_PROG_BIND_MAP is supported */ + FEAT_PROG_BIND_MAP, + __FEAT_CNT, }; +static bool kernel_supports(enum kern_feature_id feat_id); + enum reloc_type { RELO_LD64, RELO_CALL, @@ -194,6 +193,7 @@ struct reloc_desc { int insn_idx; int map_idx; int sym_off; + bool processed; }; struct bpf_sec_def; @@ -209,6 +209,7 @@ struct bpf_sec_def { bool is_exp_attach_type_optional; bool is_attachable; bool is_attach_btf; + bool is_sleepable; attach_fn_t attach_fn; }; @@ -217,20 +218,45 @@ struct bpf_sec_def { * linux/filter.h. */ struct bpf_program { - /* Index in elf obj file, for relocation use. */ - int idx; - char *name; - int prog_ifindex; - char *section_name; const struct bpf_sec_def *sec_def; + char *sec_name; + size_t sec_idx; + /* this program's instruction offset (in number of instructions) + * within its containing ELF section + */ + size_t sec_insn_off; + /* number of original instructions in ELF section belonging to this + * program, not taking into account subprogram instructions possibly + * appended later during relocation + */ + size_t sec_insn_cnt; + /* Offset (in number of instructions) of the start of instructions + * belonging to this BPF program within its containing main BPF + * program. For the entry-point (main) BPF program, this is always + * zero. For a sub-program, this gets reset before each of the main BPF + * programs is processed and relocated and is used to determine + * whether the sub-program was already appended to the main program, and + * if yes, at which instruction offset. + */ + size_t sub_insn_off; + + char *name; + /* sec_name with / replaced by _; makes recursive pinning * in bpf_object__pin_programs easier */ char *pin_name; + + /* instructions that belong to BPF program; insns[0] is located at + * sec_insn_off instruction within its ELF section in ELF file, so + * when mapping ELF file instruction index to the local instruction, + * one needs to subtract sec_insn_off; and vice versa. 
+ */ struct bpf_insn *insns; - size_t insns_cnt, main_prog_cnt; - enum bpf_prog_type type; - bool load; + /* actual number of instruction in this BPF program's image; for + * entry-point BPF programs this includes the size of main program + * itself plus all the used sub-programs, appended at the end + */ + size_t insns_cnt; struct reloc_desc *reloc_desc; int nr_reloc; @@ -246,15 +272,16 @@ struct bpf_program { void *priv; bpf_program_clear_priv_t clear_priv; + bool load; + enum bpf_prog_type type; enum bpf_attach_type expected_attach_type; + int prog_ifindex; __u32 attach_btf_id; __u32 attach_prog_fd; void *func_info; __u32 func_info_rec_size; __u32 func_info_cnt; - struct bpf_capabilities *caps; - void *line_info; __u32 line_info_rec_size; __u32 line_info_cnt; @@ -363,6 +390,12 @@ struct extern_desc { } kcfg; struct { unsigned long long addr; + + /* target btf_id of the corresponding kernel var. */ + int vmlinux_btf_id; + + /* local btf_id of the ksym extern's type. */ + __u32 type_id; } ksym; }; }; @@ -384,9 +417,10 @@ struct bpf_object { struct extern_desc *externs; int nr_extern; int kconfig_map_idx; + int rodata_map_idx; bool loaded; - bool has_pseudo_calls; + bool has_subcalls; /* * Information when doing elf related work. Only valid if fd @@ -403,6 +437,7 @@ struct bpf_object { Elf_Data *rodata; Elf_Data *bss; Elf_Data *st_ops_data; + size_t shstrndx; /* section index for section name strings */ size_t strtabidx; struct { GElf_Shdr shdr; @@ -436,12 +471,20 @@ struct bpf_object { void *priv; bpf_object_clear_priv_t clear_priv; - struct bpf_capabilities caps; - char path[]; }; #define obj_elf_valid(o) ((o)->efile.elf) +static const char *elf_sym_str(const struct bpf_object *obj, size_t off); +static const char *elf_sec_str(const struct bpf_object *obj, size_t off); +static Elf_Scn *elf_sec_by_idx(const struct bpf_object *obj, size_t idx); +static Elf_Scn *elf_sec_by_name(const struct bpf_object *obj, const char *name); +static int elf_sec_hdr(const struct bpf_object *obj, Elf_Scn *scn, GElf_Shdr *hdr); +static const char *elf_sec_name(const struct bpf_object *obj, Elf_Scn *scn); +static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn); +static int elf_sym_by_sec_off(const struct bpf_object *obj, size_t sec_idx, + size_t off, __u32 sym_type, GElf_Sym *sym); + void bpf_program__unload(struct bpf_program *prog) { int i; @@ -481,159 +524,160 @@ static void bpf_program__exit(struct bpf_program *prog) bpf_program__unload(prog); zfree(&prog->name); - zfree(&prog->section_name); + zfree(&prog->sec_name); zfree(&prog->pin_name); zfree(&prog->insns); zfree(&prog->reloc_desc); prog->nr_reloc = 0; prog->insns_cnt = 0; - prog->idx = -1; + prog->sec_idx = -1; } static char *__bpf_program__pin_name(struct bpf_program *prog) { char *name, *p; - name = p = strdup(prog->section_name); + name = p = strdup(prog->sec_name); while ((p = strchr(p, '/'))) *p = '_'; return name; } +static bool insn_is_subprog_call(const struct bpf_insn *insn) +{ + return BPF_CLASS(insn->code) == BPF_JMP && + BPF_OP(insn->code) == BPF_CALL && + BPF_SRC(insn->code) == BPF_K && + insn->src_reg == BPF_PSEUDO_CALL && + insn->dst_reg == 0 && + insn->off == 0; +} + static int -bpf_program__init(void *data, size_t size, char *section_name, int idx, - struct bpf_program *prog) +bpf_object__init_prog(struct bpf_object *obj, struct bpf_program *prog, + const char *name, size_t sec_idx, const char *sec_name, + size_t sec_off, void *insn_data, size_t insn_data_sz) { - const size_t bpf_insn_sz = sizeof(struct bpf_insn); 
+ int i; - if (size == 0 || size % bpf_insn_sz) { - pr_warn("corrupted section '%s', size: %zu\n", - section_name, size); + if (insn_data_sz == 0 || insn_data_sz % BPF_INSN_SZ || sec_off % BPF_INSN_SZ) { + pr_warn("sec '%s': corrupted program '%s', offset %zu, size %zu\n", + sec_name, name, sec_off, insn_data_sz); return -EINVAL; } memset(prog, 0, sizeof(*prog)); + prog->obj = obj; + + prog->sec_idx = sec_idx; + prog->sec_insn_off = sec_off / BPF_INSN_SZ; + prog->sec_insn_cnt = insn_data_sz / BPF_INSN_SZ; + /* insns_cnt can later be increased by appending used subprograms */ + prog->insns_cnt = prog->sec_insn_cnt; - prog->section_name = strdup(section_name); - if (!prog->section_name) { - pr_warn("failed to alloc name for prog under section(%d) %s\n", - idx, section_name); + prog->type = BPF_PROG_TYPE_UNSPEC; + prog->load = true; + + prog->instances.fds = NULL; + prog->instances.nr = -1; + + prog->sec_name = strdup(sec_name); + if (!prog->sec_name) + goto errout; + + prog->name = strdup(name); + if (!prog->name) goto errout; - } prog->pin_name = __bpf_program__pin_name(prog); - if (!prog->pin_name) { - pr_warn("failed to alloc pin name for prog under section(%d) %s\n", - idx, section_name); + if (!prog->pin_name) goto errout; - } - prog->insns = malloc(size); - if (!prog->insns) { - pr_warn("failed to alloc insns for prog under section %s\n", - section_name); + prog->insns = malloc(insn_data_sz); + if (!prog->insns) goto errout; + memcpy(prog->insns, insn_data, insn_data_sz); + + for (i = 0; i < prog->insns_cnt; i++) { + if (insn_is_subprog_call(&prog->insns[i])) { + obj->has_subcalls = true; + break; + } } - prog->insns_cnt = size / bpf_insn_sz; - memcpy(prog->insns, data, size); - prog->idx = idx; - prog->instances.fds = NULL; - prog->instances.nr = -1; - prog->type = BPF_PROG_TYPE_UNSPEC; - prog->load = true; return 0; errout: + pr_warn("sec '%s': failed to allocate memory for prog '%s'\n", sec_name, name); bpf_program__exit(prog); return -ENOMEM; } static int -bpf_object__add_program(struct bpf_object *obj, void *data, size_t size, - char *section_name, int idx) +bpf_object__add_programs(struct bpf_object *obj, Elf_Data *sec_data, + const char *sec_name, int sec_idx) { - struct bpf_program prog, *progs; + struct bpf_program *prog, *progs; + void *data = sec_data->d_buf; + size_t sec_sz = sec_data->d_size, sec_off, prog_sz; int nr_progs, err; + const char *name; + GElf_Sym sym; - err = bpf_program__init(data, size, section_name, idx, &prog); - if (err) - return err; - - prog.caps = &obj->caps; progs = obj->programs; nr_progs = obj->nr_programs; + sec_off = 0; - progs = reallocarray(progs, nr_progs + 1, sizeof(progs[0])); - if (!progs) { - /* - * In this case the original obj->programs - * is still valid, so don't need special treat for - * bpf_close_object(). 
- */ - pr_warn("failed to alloc a new program under section '%s'\n", - section_name); - bpf_program__exit(&prog); - return -ENOMEM; - } - - pr_debug("found program %s\n", prog.section_name); - obj->programs = progs; - obj->nr_programs = nr_progs + 1; - prog.obj = obj; - progs[nr_progs] = prog; - return 0; -} - -static int -bpf_object__init_prog_names(struct bpf_object *obj) -{ - Elf_Data *symbols = obj->efile.symbols; - struct bpf_program *prog; - size_t pi, si; + while (sec_off < sec_sz) { + if (elf_sym_by_sec_off(obj, sec_idx, sec_off, STT_FUNC, &sym)) { + pr_warn("sec '%s': failed to find program symbol at offset %zu\n", + sec_name, sec_off); + return -LIBBPF_ERRNO__FORMAT; + } - for (pi = 0; pi < obj->nr_programs; pi++) { - const char *name = NULL; + prog_sz = sym.st_size; - prog = &obj->programs[pi]; + name = elf_sym_str(obj, sym.st_name); + if (!name) { + pr_warn("sec '%s': failed to get symbol name for offset %zu\n", + sec_name, sec_off); + return -LIBBPF_ERRNO__FORMAT; + } - for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name; - si++) { - GElf_Sym sym; + if (sec_off + prog_sz > sec_sz) { + pr_warn("sec '%s': program at offset %zu crosses section boundary\n", + sec_name, sec_off); + return -LIBBPF_ERRNO__FORMAT; + } - if (!gelf_getsym(symbols, si, &sym)) - continue; - if (sym.st_shndx != prog->idx) - continue; - if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL) - continue; + pr_debug("sec '%s': found program '%s' at insn offset %zu (%zu bytes), code size %zu insns (%zu bytes)\n", + sec_name, name, sec_off / BPF_INSN_SZ, sec_off, prog_sz / BPF_INSN_SZ, prog_sz); - name = elf_strptr(obj->efile.elf, - obj->efile.strtabidx, - sym.st_name); - if (!name) { - pr_warn("failed to get sym name string for prog %s\n", - prog->section_name); - return -LIBBPF_ERRNO__LIBELF; - } + progs = libbpf_reallocarray(progs, nr_progs + 1, sizeof(*progs)); + if (!progs) { + /* + * In this case the original obj->programs + * is still valid, so don't need special treat for + * bpf_close_object(). 
+ */ + pr_warn("sec '%s': failed to alloc memory for new program '%s'\n", + sec_name, name); + return -ENOMEM; } + obj->programs = progs; - if (!name && prog->idx == obj->efile.text_shndx) - name = ".text"; + prog = &progs[nr_progs]; - if (!name) { - pr_warn("failed to find sym for prog %s\n", - prog->section_name); - return -EINVAL; - } + err = bpf_object__init_prog(obj, prog, name, sec_idx, sec_name, + sec_off, data + sec_off, prog_sz); + if (err) + return err; - prog->name = strdup(name); - if (!prog->name) { - pr_warn("failed to allocate memory for prog sym %s\n", - name); - return -ENOMEM; - } + nr_progs++; + obj->nr_programs = nr_progs; + + sec_off += prog_sz; } return 0; @@ -1035,6 +1079,7 @@ static struct bpf_object *bpf_object__new(const char *path, obj->efile.bss_shndx = -1; obj->efile.st_ops_shndx = -1; obj->kconfig_map_idx = -1; + obj->rodata_map_idx = -1; obj->kern_version = get_kernel_version(); obj->loaded = false; @@ -1066,13 +1111,18 @@ static void bpf_object__elf_finish(struct bpf_object *obj) obj->efile.obj_buf_sz = 0; } +/* if libelf is old and doesn't support mmap(), fall back to read() */ +#ifndef ELF_C_READ_MMAP +#define ELF_C_READ_MMAP ELF_C_READ +#endif + static int bpf_object__elf_init(struct bpf_object *obj) { int err = 0; GElf_Ehdr *ep; if (obj_elf_valid(obj)) { - pr_warn("elf init: internal error\n"); + pr_warn("elf: init internal error\n"); return -LIBBPF_ERRNO__LIBELF; } @@ -1090,31 +1140,44 @@ static int bpf_object__elf_init(struct bpf_object *obj) err = -errno; cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg)); - pr_warn("failed to open %s: %s\n", obj->path, cp); + pr_warn("elf: failed to open %s: %s\n", obj->path, cp); return err; } - obj->efile.elf = elf_begin(obj->efile.fd, - LIBBPF_ELF_C_READ_MMAP, NULL); + obj->efile.elf = elf_begin(obj->efile.fd, ELF_C_READ_MMAP, NULL); } if (!obj->efile.elf) { - pr_warn("failed to open %s as ELF file\n", obj->path); + pr_warn("elf: failed to open %s as ELF file: %s\n", obj->path, elf_errmsg(-1)); err = -LIBBPF_ERRNO__LIBELF; goto errout; } if (!gelf_getehdr(obj->efile.elf, &obj->efile.ehdr)) { - pr_warn("failed to get EHDR from %s\n", obj->path); + pr_warn("elf: failed to get ELF header from %s: %s\n", obj->path, elf_errmsg(-1)); err = -LIBBPF_ERRNO__FORMAT; goto errout; } ep = &obj->efile.ehdr; + if (elf_getshdrstrndx(obj->efile.elf, &obj->efile.shstrndx)) { + pr_warn("elf: failed to get section names section index for %s: %s\n", + obj->path, elf_errmsg(-1)); + err = -LIBBPF_ERRNO__FORMAT; + goto errout; + } + + /* Elf is corrupted/truncated, avoid calling elf_strptr. 
*/ + if (!elf_rawdata(elf_getscn(obj->efile.elf, obj->efile.shstrndx), NULL)) { + pr_warn("elf: failed to get section names strings from %s: %s\n", + obj->path, elf_errmsg(-1)); + return -LIBBPF_ERRNO__FORMAT; + } + /* Old LLVM set e_machine to EM_NONE */ if (ep->e_type != ET_REL || (ep->e_machine && ep->e_machine != EM_BPF)) { - pr_warn("%s is not an eBPF object file\n", obj->path); + pr_warn("elf: %s is not a valid eBPF object file\n", obj->path); err = -LIBBPF_ERRNO__FORMAT; goto errout; } @@ -1136,7 +1199,7 @@ static int bpf_object__check_endianness(struct bpf_object *obj) #else # error "Unrecognized __BYTE_ORDER__" #endif - pr_warn("endianness mismatch.\n"); + pr_warn("elf: endianness mismatch in %s.\n", obj->path); return -LIBBPF_ERRNO__ENDIAN; } @@ -1171,55 +1234,10 @@ static bool bpf_map_type__is_map_in_map(enum bpf_map_type type) return false; } -static int bpf_object_search_section_size(const struct bpf_object *obj, - const char *name, size_t *d_size) -{ - const GElf_Ehdr *ep = &obj->efile.ehdr; - Elf *elf = obj->efile.elf; - Elf_Scn *scn = NULL; - int idx = 0; - - while ((scn = elf_nextscn(elf, scn)) != NULL) { - const char *sec_name; - Elf_Data *data; - GElf_Shdr sh; - - idx++; - if (gelf_getshdr(scn, &sh) != &sh) { - pr_warn("failed to get section(%d) header from %s\n", - idx, obj->path); - return -EIO; - } - - sec_name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name); - if (!sec_name) { - pr_warn("failed to get section(%d) name from %s\n", - idx, obj->path); - return -EIO; - } - - if (strcmp(name, sec_name)) - continue; - - data = elf_getdata(scn, 0); - if (!data) { - pr_warn("failed to get section(%d) data from %s(%s)\n", - idx, name, obj->path); - return -EIO; - } - - *d_size = data->d_size; - return 0; - } - - return -ENOENT; -} - int bpf_object__section_size(const struct bpf_object *obj, const char *name, __u32 *size) { int ret = -ENOENT; - size_t d_size; *size = 0; if (!name) { @@ -1237,9 +1255,13 @@ int bpf_object__section_size(const struct bpf_object *obj, const char *name, if (obj->efile.st_ops_data) *size = obj->efile.st_ops_data->d_size; } else { - ret = bpf_object_search_section_size(obj, name, &d_size); - if (!ret) - *size = d_size; + Elf_Scn *scn = elf_sec_by_name(obj, name); + Elf_Data *data = elf_sec_data(obj, scn); + + if (data) { + ret = 0; /* found it */ + *size = data->d_size; + } } return *size ? 
0 : ret; @@ -1264,8 +1286,7 @@ int bpf_object__variable_offset(const struct bpf_object *obj, const char *name, GELF_ST_TYPE(sym.st_info) != STT_OBJECT) continue; - sname = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name); + sname = elf_sym_str(obj, sym.st_name); if (!sname) { pr_warn("failed to get sym name string for var %s\n", name); @@ -1290,7 +1311,7 @@ static struct bpf_map *bpf_object__add_map(struct bpf_object *obj) return &obj->maps[obj->nr_maps++]; new_cap = max((size_t)4, obj->maps_cap * 3 / 2); - new_maps = realloc(obj->maps, new_cap * sizeof(*obj->maps)); + new_maps = libbpf_reallocarray(obj->maps, new_cap, sizeof(*obj->maps)); if (!new_maps) { pr_warn("alloc maps for object failed\n"); return ERR_PTR(-ENOMEM); @@ -1417,6 +1438,8 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj) obj->efile.rodata->d_size); if (err) return err; + + obj->rodata_map_idx = obj->nr_maps - 1; } if (obj->efile.bss_shndx >= 0) { err = bpf_object__init_internal_map(obj, LIBBPF_MAP_BSS, @@ -1742,12 +1765,12 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict) if (!symbols) return -EINVAL; - scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx); - if (scn) - data = elf_getdata(scn, NULL); + + scn = elf_sec_by_idx(obj, obj->efile.maps_shndx); + data = elf_sec_data(obj, scn); if (!scn || !data) { - pr_warn("failed to get Elf_Data from map section %d\n", - obj->efile.maps_shndx); + pr_warn("elf: failed to get legacy map definitions for %s\n", + obj->path); return -EINVAL; } @@ -1769,12 +1792,12 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict) nr_maps++; } /* Assume equally sized map definitions */ - pr_debug("maps in %s: %d maps in %zd bytes\n", - obj->path, nr_maps, data->d_size); + pr_debug("elf: found %d legacy map definitions (%zd bytes) in %s\n", + nr_maps, data->d_size, obj->path); if (!data->d_size || nr_maps == 0 || (data->d_size % nr_maps) != 0) { - pr_warn("unable to determine map definition size section %s, %d maps in %zd bytes\n", - obj->path, nr_maps, data->d_size); + pr_warn("elf: unable to determine legacy map definition size in %s\n", + obj->path); return -EINVAL; } map_def_sz = data->d_size / nr_maps; @@ -1795,8 +1818,7 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict) if (IS_ERR(map)) return PTR_ERR(map); - map_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name); + map_name = elf_sym_str(obj, sym.st_name); if (!map_name) { pr_warn("failed to get map #%d name sym string for obj %s\n", i, obj->path); @@ -1884,6 +1906,29 @@ resolve_func_ptr(const struct btf *btf, __u32 id, __u32 *res_id) return btf_is_func_proto(t) ? t : NULL; } +static const char *btf_kind_str(const struct btf_type *t) +{ + switch (btf_kind(t)) { + case BTF_KIND_UNKN: return "void"; + case BTF_KIND_INT: return "int"; + case BTF_KIND_PTR: return "ptr"; + case BTF_KIND_ARRAY: return "array"; + case BTF_KIND_STRUCT: return "struct"; + case BTF_KIND_UNION: return "union"; + case BTF_KIND_ENUM: return "enum"; + case BTF_KIND_FWD: return "fwd"; + case BTF_KIND_TYPEDEF: return "typedef"; + case BTF_KIND_VOLATILE: return "volatile"; + case BTF_KIND_CONST: return "const"; + case BTF_KIND_RESTRICT: return "restrict"; + case BTF_KIND_FUNC: return "func"; + case BTF_KIND_FUNC_PROTO: return "func_proto"; + case BTF_KIND_VAR: return "var"; + case BTF_KIND_DATASEC: return "datasec"; + default: return "unknown"; + } +} + /* * Fetch integer attribute of BTF map definition. 
Such attributes are * represented using a pointer to an array, in which dimensionality of array @@ -1900,8 +1945,8 @@ static bool get_map_field_int(const char *map_name, const struct btf *btf, const struct btf_type *arr_t; if (!btf_is_ptr(t)) { - pr_warn("map '%s': attr '%s': expected PTR, got %u.\n", - map_name, name, btf_kind(t)); + pr_warn("map '%s': attr '%s': expected PTR, got %s.\n", + map_name, name, btf_kind_str(t)); return false; } @@ -1912,8 +1957,8 @@ static bool get_map_field_int(const char *map_name, const struct btf *btf, return false; } if (!btf_is_array(arr_t)) { - pr_warn("map '%s': attr '%s': expected ARRAY, got %u.\n", - map_name, name, btf_kind(arr_t)); + pr_warn("map '%s': attr '%s': expected ARRAY, got %s.\n", + map_name, name, btf_kind_str(arr_t)); return false; } arr_info = btf_array(arr_t); @@ -1924,7 +1969,7 @@ static bool get_map_field_int(const char *map_name, const struct btf *btf, static int build_map_pin_path(struct bpf_map *map, const char *path) { char buf[PATH_MAX]; - int err, len; + int len; if (!path) path = "/sys/fs/bpf"; @@ -1935,11 +1980,7 @@ static int build_map_pin_path(struct bpf_map *map, const char *path) else if (len >= PATH_MAX) return -ENAMETOOLONG; - err = bpf_map__set_pin_path(map, buf); - if (err) - return err; - - return 0; + return bpf_map__set_pin_path(map, buf); } @@ -2007,8 +2048,8 @@ static int parse_btf_map_def(struct bpf_object *obj, return -EINVAL; } if (!btf_is_ptr(t)) { - pr_warn("map '%s': key spec is not PTR: %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': key spec is not PTR: %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; } sz = btf__resolve_size(obj->btf, t->type); @@ -2049,8 +2090,8 @@ static int parse_btf_map_def(struct bpf_object *obj, return -EINVAL; } if (!btf_is_ptr(t)) { - pr_warn("map '%s': value spec is not PTR: %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': value spec is not PTR: %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; } sz = btf__resolve_size(obj->btf, t->type); @@ -2107,14 +2148,14 @@ static int parse_btf_map_def(struct bpf_object *obj, t = skip_mods_and_typedefs(obj->btf, btf_array(t)->type, NULL); if (!btf_is_ptr(t)) { - pr_warn("map '%s': map-in-map inner def is of unexpected kind %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; } t = skip_mods_and_typedefs(obj->btf, t->type, NULL); if (!btf_is_struct(t)) { - pr_warn("map '%s': map-in-map inner def is of unexpected kind %u.\n", - map->name, btf_kind(t)); + pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n", + map->name, btf_kind_str(t)); return -EINVAL; } @@ -2205,8 +2246,8 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj, return -EINVAL; } if (!btf_is_var(var)) { - pr_warn("map '%s': unexpected var kind %u.\n", - map_name, btf_kind(var)); + pr_warn("map '%s': unexpected var kind %s.\n", + map_name, btf_kind_str(var)); return -EINVAL; } if (var_extra->linkage != BTF_VAR_GLOBAL_ALLOCATED && @@ -2218,8 +2259,8 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj, def = skip_mods_and_typedefs(obj->btf, var->type, NULL); if (!btf_is_struct(def)) { - pr_warn("map '%s': unexpected def kind %u.\n", - map_name, btf_kind(var)); + pr_warn("map '%s': unexpected def kind %s.\n", + map_name, btf_kind_str(var)); return -EINVAL; } if (def->size > vi->size) { @@ -2259,12 +2300,11 @@ static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict, if 
(obj->efile.btf_maps_shndx < 0) return 0; - scn = elf_getscn(obj->efile.elf, obj->efile.btf_maps_shndx); - if (scn) - data = elf_getdata(scn, NULL); + scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx); + data = elf_sec_data(obj, scn); if (!scn || !data) { - pr_warn("failed to get Elf_Data from map section %d (%s)\n", - obj->efile.btf_maps_shndx, MAPS_ELF_SEC); + pr_warn("elf: failed to get %s map definitions for %s\n", + MAPS_ELF_SEC, obj->path); return -EINVAL; } @@ -2322,36 +2362,28 @@ static int bpf_object__init_maps(struct bpf_object *obj, static bool section_have_execinstr(struct bpf_object *obj, int idx) { - Elf_Scn *scn; GElf_Shdr sh; - scn = elf_getscn(obj->efile.elf, idx); - if (!scn) + if (elf_sec_hdr(obj, elf_sec_by_idx(obj, idx), &sh)) return false; - if (gelf_getshdr(scn, &sh) != &sh) - return false; - - if (sh.sh_flags & SHF_EXECINSTR) - return true; - - return false; + return sh.sh_flags & SHF_EXECINSTR; } static bool btf_needs_sanitization(struct bpf_object *obj) { - bool has_func_global = obj->caps.btf_func_global; - bool has_datasec = obj->caps.btf_datasec; - bool has_func = obj->caps.btf_func; + bool has_func_global = kernel_supports(FEAT_BTF_GLOBAL_FUNC); + bool has_datasec = kernel_supports(FEAT_BTF_DATASEC); + bool has_func = kernel_supports(FEAT_BTF_FUNC); return !has_func || !has_datasec || !has_func_global; } static void bpf_object__sanitize_btf(struct bpf_object *obj, struct btf *btf) { - bool has_func_global = obj->caps.btf_func_global; - bool has_datasec = obj->caps.btf_datasec; - bool has_func = obj->caps.btf_func; + bool has_func_global = kernel_supports(FEAT_BTF_GLOBAL_FUNC); + bool has_datasec = kernel_supports(FEAT_BTF_DATASEC); + bool has_func = kernel_supports(FEAT_BTF_FUNC); struct btf_type *t; int i, j, vlen; @@ -2496,12 +2528,23 @@ static int bpf_object__load_vmlinux_btf(struct bpf_object *obj) { bool need_vmlinux_btf = false; struct bpf_program *prog; - int err; + int i, err; /* CO-RE relocations need kernel BTF */ - if (obj->btf_ext && obj->btf_ext->field_reloc_info.len) + if (obj->btf_ext && obj->btf_ext->core_relo_info.len) need_vmlinux_btf = true; + /* Support for typed ksyms needs kernel BTF */ + for (i = 0; i < obj->nr_extern; i++) { + const struct extern_desc *ext; + + ext = &obj->externs[i]; + if (ext->type == EXT_KSYM && ext->ksym.type_id) { + need_vmlinux_btf = true; + break; + } + } + bpf_object__for_each_program(prog, obj) { if (!prog->load) continue; @@ -2533,6 +2576,15 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj) if (!obj->btf) return 0; + if (!kernel_supports(FEAT_BTF)) { + if (kernel_needs_btf(obj)) { + err = -EOPNOTSUPP; + goto report; + } + pr_debug("Kernel doesn't support BTF, skipping uploading it.\n"); + return 0; + } + sanitize = btf_needs_sanitization(obj); if (sanitize) { const void *raw_data; @@ -2558,6 +2610,7 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj) } btf__free(kern_btf); } +report: if (err) { btf_mandatory = kernel_needs_btf(obj); pr_warn("Error loading .BTF into kernel: %d. 
%s\n", err, @@ -2569,61 +2622,255 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj) return err; } +static const char *elf_sym_str(const struct bpf_object *obj, size_t off) +{ + const char *name; + + name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, off); + if (!name) { + pr_warn("elf: failed to get section name string at offset %zu from %s: %s\n", + off, obj->path, elf_errmsg(-1)); + return NULL; + } + + return name; +} + +static const char *elf_sec_str(const struct bpf_object *obj, size_t off) +{ + const char *name; + + name = elf_strptr(obj->efile.elf, obj->efile.shstrndx, off); + if (!name) { + pr_warn("elf: failed to get section name string at offset %zu from %s: %s\n", + off, obj->path, elf_errmsg(-1)); + return NULL; + } + + return name; +} + +static Elf_Scn *elf_sec_by_idx(const struct bpf_object *obj, size_t idx) +{ + Elf_Scn *scn; + + scn = elf_getscn(obj->efile.elf, idx); + if (!scn) { + pr_warn("elf: failed to get section(%zu) from %s: %s\n", + idx, obj->path, elf_errmsg(-1)); + return NULL; + } + return scn; +} + +static Elf_Scn *elf_sec_by_name(const struct bpf_object *obj, const char *name) +{ + Elf_Scn *scn = NULL; + Elf *elf = obj->efile.elf; + const char *sec_name; + + while ((scn = elf_nextscn(elf, scn)) != NULL) { + sec_name = elf_sec_name(obj, scn); + if (!sec_name) + return NULL; + + if (strcmp(sec_name, name) != 0) + continue; + + return scn; + } + return NULL; +} + +static int elf_sec_hdr(const struct bpf_object *obj, Elf_Scn *scn, GElf_Shdr *hdr) +{ + if (!scn) + return -EINVAL; + + if (gelf_getshdr(scn, hdr) != hdr) { + pr_warn("elf: failed to get section(%zu) header from %s: %s\n", + elf_ndxscn(scn), obj->path, elf_errmsg(-1)); + return -EINVAL; + } + + return 0; +} + +static const char *elf_sec_name(const struct bpf_object *obj, Elf_Scn *scn) +{ + const char *name; + GElf_Shdr sh; + + if (!scn) + return NULL; + + if (elf_sec_hdr(obj, scn, &sh)) + return NULL; + + name = elf_sec_str(obj, sh.sh_name); + if (!name) { + pr_warn("elf: failed to get section(%zu) name from %s: %s\n", + elf_ndxscn(scn), obj->path, elf_errmsg(-1)); + return NULL; + } + + return name; +} + +static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn) +{ + Elf_Data *data; + + if (!scn) + return NULL; + + data = elf_getdata(scn, 0); + if (!data) { + pr_warn("elf: failed to get section(%zu) %s data from %s: %s\n", + elf_ndxscn(scn), elf_sec_name(obj, scn) ?: "<?>", + obj->path, elf_errmsg(-1)); + return NULL; + } + + return data; +} + +static int elf_sym_by_sec_off(const struct bpf_object *obj, size_t sec_idx, + size_t off, __u32 sym_type, GElf_Sym *sym) +{ + Elf_Data *symbols = obj->efile.symbols; + size_t n = symbols->d_size / sizeof(GElf_Sym); + int i; + + for (i = 0; i < n; i++) { + if (!gelf_getsym(symbols, i, sym)) + continue; + if (sym->st_shndx != sec_idx || sym->st_value != off) + continue; + if (GELF_ST_TYPE(sym->st_info) != sym_type) + continue; + return 0; + } + + return -ENOENT; +} + +static bool is_sec_name_dwarf(const char *name) +{ + /* approximation, but the actual list is too long */ + return strncmp(name, ".debug_", sizeof(".debug_") - 1) == 0; +} + +static bool ignore_elf_section(GElf_Shdr *hdr, const char *name) +{ + /* no special handling of .strtab */ + if (hdr->sh_type == SHT_STRTAB) + return true; + + /* ignore .llvm_addrsig section as well */ + if (hdr->sh_type == 0x6FFF4C03 /* SHT_LLVM_ADDRSIG */) + return true; + + /* no subprograms will lead to an empty .text section, ignore it */ + if (hdr->sh_type == SHT_PROGBITS && 
hdr->sh_size == 0 && + strcmp(name, ".text") == 0) + return true; + + /* DWARF sections */ + if (is_sec_name_dwarf(name)) + return true; + + if (strncmp(name, ".rel", sizeof(".rel") - 1) == 0) { + name += sizeof(".rel") - 1; + /* DWARF section relocations */ + if (is_sec_name_dwarf(name)) + return true; + + /* .BTF and .BTF.ext don't need relocations */ + if (strcmp(name, BTF_ELF_SEC) == 0 || + strcmp(name, BTF_EXT_ELF_SEC) == 0) + return true; + } + + return false; +} + +static int cmp_progs(const void *_a, const void *_b) +{ + const struct bpf_program *a = _a; + const struct bpf_program *b = _b; + + if (a->sec_idx != b->sec_idx) + return a->sec_idx < b->sec_idx ? -1 : 1; + + /* sec_insn_off can't be the same within the section */ + return a->sec_insn_off < b->sec_insn_off ? -1 : 1; +} + static int bpf_object__elf_collect(struct bpf_object *obj) { Elf *elf = obj->efile.elf; - GElf_Ehdr *ep = &obj->efile.ehdr; Elf_Data *btf_ext_data = NULL; Elf_Data *btf_data = NULL; - Elf_Scn *scn = NULL; int idx = 0, err = 0; + const char *name; + Elf_Data *data; + Elf_Scn *scn; + GElf_Shdr sh; - /* Elf is corrupted/truncated, avoid calling elf_strptr. */ - if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) { - pr_warn("failed to get e_shstrndx from %s\n", obj->path); - return -LIBBPF_ERRNO__FORMAT; + /* a bunch of ELF parsing functionality depends on processing symbols, + * so do the first pass and find the symbol table + */ + scn = NULL; + while ((scn = elf_nextscn(elf, scn)) != NULL) { + if (elf_sec_hdr(obj, scn, &sh)) + return -LIBBPF_ERRNO__FORMAT; + + if (sh.sh_type == SHT_SYMTAB) { + if (obj->efile.symbols) { + pr_warn("elf: multiple symbol tables in %s\n", obj->path); + return -LIBBPF_ERRNO__FORMAT; + } + + data = elf_sec_data(obj, scn); + if (!data) + return -LIBBPF_ERRNO__FORMAT; + + obj->efile.symbols = data; + obj->efile.symbols_shndx = elf_ndxscn(scn); + obj->efile.strtabidx = sh.sh_link; + } } + scn = NULL; while ((scn = elf_nextscn(elf, scn)) != NULL) { - char *name; - GElf_Shdr sh; - Elf_Data *data; - idx++; - if (gelf_getshdr(scn, &sh) != &sh) { - pr_warn("failed to get section(%d) header from %s\n", - idx, obj->path); + + if (elf_sec_hdr(obj, scn, &sh)) return -LIBBPF_ERRNO__FORMAT; - } - name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name); - if (!name) { - pr_warn("failed to get section(%d) name from %s\n", - idx, obj->path); + name = elf_sec_str(obj, sh.sh_name); + if (!name) return -LIBBPF_ERRNO__FORMAT; - } - data = elf_getdata(scn, 0); - if (!data) { - pr_warn("failed to get section(%d) data from %s(%s)\n", - idx, name, obj->path); + if (ignore_elf_section(&sh, name)) + continue; + + data = elf_sec_data(obj, scn); + if (!data) return -LIBBPF_ERRNO__FORMAT; - } - pr_debug("section(%d) %s, size %ld, link %d, flags %lx, type=%d\n", + + pr_debug("elf: section(%d) %s, size %ld, link %d, flags %lx, type=%d\n", idx, name, (unsigned long)data->d_size, (int)sh.sh_link, (unsigned long)sh.sh_flags, (int)sh.sh_type); if (strcmp(name, "license") == 0) { - err = bpf_object__init_license(obj, - data->d_buf, - data->d_size); + err = bpf_object__init_license(obj, data->d_buf, data->d_size); if (err) return err; } else if (strcmp(name, "version") == 0) { - err = bpf_object__init_kversion(obj, - data->d_buf, - data->d_size); + err = bpf_object__init_kversion(obj, data->d_buf, data->d_size); if (err) return err; } else if (strcmp(name, "maps") == 0) { @@ -2635,31 +2882,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj) } else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) { btf_ext_data = 
data; } else if (sh.sh_type == SHT_SYMTAB) { - if (obj->efile.symbols) { - pr_warn("bpf: multiple SYMTAB in %s\n", - obj->path); - return -LIBBPF_ERRNO__FORMAT; - } - obj->efile.symbols = data; - obj->efile.symbols_shndx = idx; - obj->efile.strtabidx = sh.sh_link; + /* already processed during the first pass above */ } else if (sh.sh_type == SHT_PROGBITS && data->d_size > 0) { if (sh.sh_flags & SHF_EXECINSTR) { if (strcmp(name, ".text") == 0) obj->efile.text_shndx = idx; - err = bpf_object__add_program(obj, data->d_buf, - data->d_size, - name, idx); - if (err) { - char errmsg[STRERR_BUFSIZE]; - char *cp; - - cp = libbpf_strerror_r(-err, errmsg, - sizeof(errmsg)); - pr_warn("failed to alloc program %s (%s): %s", - name, obj->path, cp); + err = bpf_object__add_programs(obj, data, name, idx); + if (err) return err; - } } else if (strcmp(name, DATA_SEC) == 0) { obj->efile.data = data; obj->efile.data_shndx = idx; @@ -2670,7 +2900,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj) obj->efile.st_ops_data = data; obj->efile.st_ops_shndx = idx; } else { - pr_debug("skip section(%d) %s\n", idx, name); + pr_info("elf: skipping unrecognized data section(%d) %s\n", + idx, name); } } else if (sh.sh_type == SHT_REL) { int nr_sects = obj->efile.nr_reloc_sects; @@ -2681,36 +2912,40 @@ static int bpf_object__elf_collect(struct bpf_object *obj) if (!section_have_execinstr(obj, sec) && strcmp(name, ".rel" STRUCT_OPS_SEC) && strcmp(name, ".rel" MAPS_ELF_SEC)) { - pr_debug("skip relo %s(%d) for section(%d)\n", - name, idx, sec); + pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n", + idx, name, sec, + elf_sec_name(obj, elf_sec_by_idx(obj, sec)) ?: "<?>"); continue; } - sects = reallocarray(sects, nr_sects + 1, - sizeof(*obj->efile.reloc_sects)); - if (!sects) { - pr_warn("reloc_sects realloc failed\n"); + sects = libbpf_reallocarray(sects, nr_sects + 1, + sizeof(*obj->efile.reloc_sects)); + if (!sects) return -ENOMEM; - } obj->efile.reloc_sects = sects; obj->efile.nr_reloc_sects++; obj->efile.reloc_sects[nr_sects].shdr = sh; obj->efile.reloc_sects[nr_sects].data = data; - } else if (sh.sh_type == SHT_NOBITS && - strcmp(name, BSS_SEC) == 0) { + } else if (sh.sh_type == SHT_NOBITS && strcmp(name, BSS_SEC) == 0) { obj->efile.bss = data; obj->efile.bss_shndx = idx; } else { - pr_debug("skip section(%d) %s\n", idx, name); + pr_info("elf: skipping section(%d) %s (size %zu)\n", idx, name, + (size_t)sh.sh_size); } } if (!obj->efile.strtabidx || obj->efile.strtabidx > idx) { - pr_warn("Corrupted ELF file: index of strtab invalid\n"); + pr_warn("elf: symbol strings section missing or invalid in %s\n", obj->path); return -LIBBPF_ERRNO__FORMAT; } + + /* sort BPF programs by section name and in-section instruction offset + * for faster search */ + qsort(obj->programs, obj->nr_programs, sizeof(*obj->programs), cmp_progs); + return bpf_object__init_btf(obj, btf_data, btf_ext_data); } @@ -2869,14 +3104,13 @@ static int bpf_object__collect_externs(struct bpf_object *obj) if (!obj->efile.symbols) return 0; - scn = elf_getscn(obj->efile.elf, obj->efile.symbols_shndx); - if (!scn) + scn = elf_sec_by_idx(obj, obj->efile.symbols_shndx); + if (elf_sec_hdr(obj, scn, &sh)) return -LIBBPF_ERRNO__FORMAT; - if (gelf_getshdr(scn, &sh) != &sh) - return -LIBBPF_ERRNO__FORMAT; - n = sh.sh_size / sh.sh_entsize; + n = sh.sh_size / sh.sh_entsize; pr_debug("looking for externs among %d symbols...\n", n); + for (i = 0; i < n; i++) { GElf_Sym sym; @@ -2884,13 +3118,12 @@ static int bpf_object__collect_externs(struct 
bpf_object *obj) return -LIBBPF_ERRNO__FORMAT; if (!sym_is_extern(&sym)) continue; - ext_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name); + ext_name = elf_sym_str(obj, sym.st_name); if (!ext_name || !ext_name[0]) continue; ext = obj->externs; - ext = reallocarray(ext, obj->nr_extern + 1, sizeof(*ext)); + ext = libbpf_reallocarray(ext, obj->nr_extern + 1, sizeof(*ext)); if (!ext) return -ENOMEM; obj->externs = ext; @@ -2940,16 +3173,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj) return -ENOTSUP; } } else if (strcmp(sec_name, KSYMS_SEC) == 0) { - const struct btf_type *vt; - ksym_sec = sec; ext->type = EXT_KSYM; - - vt = skip_mods_and_typedefs(obj->btf, t->type, NULL); - if (!btf_is_void(vt)) { - pr_warn("extern (ksym) '%s' is not typeless (void)\n", ext_name); - return -ENOTSUP; - } + skip_mods_and_typedefs(obj->btf, t->type, + &ext->ksym.type_id); } else { pr_warn("unrecognized extern section '%s'\n", sec_name); return -ENOTSUP; @@ -3037,20 +3264,6 @@ static int bpf_object__collect_externs(struct bpf_object *obj) return 0; } -static struct bpf_program * -bpf_object__find_prog_by_idx(struct bpf_object *obj, int idx) -{ - struct bpf_program *prog; - size_t i; - - for (i = 0; i < obj->nr_programs; i++) { - prog = &obj->programs[i]; - if (prog->idx == idx) - return prog; - } - return NULL; -} - struct bpf_program * bpf_object__find_program_by_title(const struct bpf_object *obj, const char *title) @@ -3058,12 +3271,18 @@ bpf_object__find_program_by_title(const struct bpf_object *obj, struct bpf_program *pos; bpf_object__for_each_program(pos, obj) { - if (pos->section_name && !strcmp(pos->section_name, title)) + if (pos->sec_name && !strcmp(pos->sec_name, title)) return pos; } return NULL; } +static bool prog_is_subprog(const struct bpf_object *obj, + const struct bpf_program *prog) +{ + return prog->sec_idx == obj->efile.text_shndx && obj->has_subcalls; +} + struct bpf_program * bpf_object__find_program_by_name(const struct bpf_object *obj, const char *name) @@ -3071,6 +3290,8 @@ bpf_object__find_program_by_name(const struct bpf_object *obj, struct bpf_program *prog; bpf_object__for_each_program(prog, obj) { + if (prog_is_subprog(obj, prog)) + continue; if (!strcmp(prog->name, name)) return prog; } @@ -3109,7 +3330,7 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx) static int bpf_program__record_reloc(struct bpf_program *prog, struct reloc_desc *reloc_desc, - __u32 insn_idx, const char *name, + __u32 insn_idx, const char *sym_name, const GElf_Sym *sym, const GElf_Rel *rel) { struct bpf_insn *insn = &prog->insns[insn_idx]; @@ -3117,34 +3338,38 @@ static int bpf_program__record_reloc(struct bpf_program *prog, struct bpf_object *obj = prog->obj; __u32 shdr_idx = sym->st_shndx; enum libbpf_map_type type; + const char *sym_sec_name; struct bpf_map *map; + reloc_desc->processed = false; + /* sub-program call relocation */ if (insn->code == (BPF_JMP | BPF_CALL)) { if (insn->src_reg != BPF_PSEUDO_CALL) { - pr_warn("incorrect bpf_call opcode\n"); + pr_warn("prog '%s': incorrect bpf_call opcode\n", prog->name); return -LIBBPF_ERRNO__RELOC; } /* text_shndx can be 0, if no default "main" program exists */ if (!shdr_idx || shdr_idx != obj->efile.text_shndx) { - pr_warn("bad call relo against section %u\n", shdr_idx); + sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx)); + pr_warn("prog '%s': bad call relo against '%s' in section '%s'\n", + prog->name, sym_name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; } - if 
(sym->st_value % 8) { - pr_warn("bad call relo offset: %zu\n", - (size_t)sym->st_value); + if (sym->st_value % BPF_INSN_SZ) { + pr_warn("prog '%s': bad call relo against '%s' at offset %zu\n", + prog->name, sym_name, (size_t)sym->st_value); return -LIBBPF_ERRNO__RELOC; } reloc_desc->type = RELO_CALL; reloc_desc->insn_idx = insn_idx; reloc_desc->sym_off = sym->st_value; - obj->has_pseudo_calls = true; return 0; } if (insn->code != (BPF_LD | BPF_IMM | BPF_DW)) { - pr_warn("invalid relo for insns[%d].code 0x%x\n", - insn_idx, insn->code); + pr_warn("prog '%s': invalid relo against '%s' for insns[%d].code 0x%x\n", + prog->name, sym_name, insn_idx, insn->code); return -LIBBPF_ERRNO__RELOC; } @@ -3159,12 +3384,12 @@ static int bpf_program__record_reloc(struct bpf_program *prog, break; } if (i >= n) { - pr_warn("extern relo failed to find extern for sym %d\n", - sym_idx); + pr_warn("prog '%s': extern relo failed to find extern for '%s' (%d)\n", + prog->name, sym_name, sym_idx); return -LIBBPF_ERRNO__RELOC; } - pr_debug("found extern #%d '%s' (sym %d) for insn %u\n", - i, ext->name, ext->sym_idx, insn_idx); + pr_debug("prog '%s': found extern #%d '%s' (sym %d) for insn #%u\n", + prog->name, i, ext->name, ext->sym_idx, insn_idx); reloc_desc->type = RELO_EXTERN; reloc_desc->insn_idx = insn_idx; reloc_desc->sym_off = i; /* sym_off stores extern index */ @@ -3172,18 +3397,19 @@ static int bpf_program__record_reloc(struct bpf_program *prog, } if (!shdr_idx || shdr_idx >= SHN_LORESERVE) { - pr_warn("invalid relo for \'%s\' in special section 0x%x; forgot to initialize global var?..\n", - name, shdr_idx); + pr_warn("prog '%s': invalid relo against '%s' in special section 0x%x; forgot to initialize global var?..\n", + prog->name, sym_name, shdr_idx); return -LIBBPF_ERRNO__RELOC; } type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx); + sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx)); /* generic map reference relocation */ if (type == LIBBPF_MAP_UNSPEC) { if (!bpf_object__shndx_is_maps(obj, shdr_idx)) { - pr_warn("bad map relo against section %u\n", - shdr_idx); + pr_warn("prog '%s': bad map relo against '%s' in section '%s'\n", + prog->name, sym_name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; } for (map_idx = 0; map_idx < nr_maps; map_idx++) { @@ -3192,14 +3418,14 @@ static int bpf_program__record_reloc(struct bpf_program *prog, map->sec_idx != sym->st_shndx || map->sec_offset != sym->st_value) continue; - pr_debug("found map %zd (%s, sec %d, off %zu) for insn %u\n", - map_idx, map->name, map->sec_idx, + pr_debug("prog '%s': found map %zd (%s, sec %d, off %zu) for insn #%u\n", + prog->name, map_idx, map->name, map->sec_idx, map->sec_offset, insn_idx); break; } if (map_idx >= nr_maps) { - pr_warn("map relo failed to find map for sec %u, off %zu\n", - shdr_idx, (size_t)sym->st_value); + pr_warn("prog '%s': map relo failed to find map for section '%s', off %zu\n", + prog->name, sym_sec_name, (size_t)sym->st_value); return -LIBBPF_ERRNO__RELOC; } reloc_desc->type = RELO_LD64; @@ -3211,21 +3437,22 @@ static int bpf_program__record_reloc(struct bpf_program *prog, /* global data map relocation */ if (!bpf_object__shndx_is_data(obj, shdr_idx)) { - pr_warn("bad data relo against section %u\n", shdr_idx); + pr_warn("prog '%s': bad data relo against section '%s'\n", + prog->name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; } for (map_idx = 0; map_idx < nr_maps; map_idx++) { map = &obj->maps[map_idx]; if (map->libbpf_type != type) continue; - pr_debug("found data map %zd (%s, sec %d, off 
%zu) for insn %u\n", - map_idx, map->name, map->sec_idx, map->sec_offset, - insn_idx); + pr_debug("prog '%s': found data map %zd (%s, sec %d, off %zu) for insn %u\n", + prog->name, map_idx, map->name, map->sec_idx, + map->sec_offset, insn_idx); break; } if (map_idx >= nr_maps) { - pr_warn("data relo failed to find map for sec %u\n", - shdr_idx); + pr_warn("prog '%s': data relo failed to find map for section '%s'\n", + prog->name, sym_sec_name); return -LIBBPF_ERRNO__RELOC; } @@ -3236,55 +3463,113 @@ static int bpf_program__record_reloc(struct bpf_program *prog, return 0; } +static bool prog_contains_insn(const struct bpf_program *prog, size_t insn_idx) +{ + return insn_idx >= prog->sec_insn_off && + insn_idx < prog->sec_insn_off + prog->sec_insn_cnt; +} + +static struct bpf_program *find_prog_by_sec_insn(const struct bpf_object *obj, + size_t sec_idx, size_t insn_idx) +{ + int l = 0, r = obj->nr_programs - 1, m; + struct bpf_program *prog; + + while (l < r) { + m = l + (r - l + 1) / 2; + prog = &obj->programs[m]; + + if (prog->sec_idx < sec_idx || + (prog->sec_idx == sec_idx && prog->sec_insn_off <= insn_idx)) + l = m; + else + r = m - 1; + } + /* matching program could be at index l, but it still might be the + * wrong one, so we need to double check conditions for the last time + */ + prog = &obj->programs[l]; + if (prog->sec_idx == sec_idx && prog_contains_insn(prog, insn_idx)) + return prog; + return NULL; +} + static int -bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr, - Elf_Data *data, struct bpf_object *obj) +bpf_object__collect_prog_relos(struct bpf_object *obj, GElf_Shdr *shdr, Elf_Data *data) { Elf_Data *symbols = obj->efile.symbols; + const char *relo_sec_name, *sec_name; + size_t sec_idx = shdr->sh_info; + struct bpf_program *prog; + struct reloc_desc *relos; int err, i, nrels; + const char *sym_name; + __u32 insn_idx; + GElf_Sym sym; + GElf_Rel rel; - pr_debug("collecting relocating info for: '%s'\n", prog->section_name); - nrels = shdr->sh_size / shdr->sh_entsize; + relo_sec_name = elf_sec_str(obj, shdr->sh_name); + sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx)); + if (!relo_sec_name || !sec_name) + return -EINVAL; - prog->reloc_desc = malloc(sizeof(*prog->reloc_desc) * nrels); - if (!prog->reloc_desc) { - pr_warn("failed to alloc memory in relocation\n"); - return -ENOMEM; - } - prog->nr_reloc = nrels; + pr_debug("sec '%s': collecting relocation for section(%zu) '%s'\n", + relo_sec_name, sec_idx, sec_name); + nrels = shdr->sh_size / shdr->sh_entsize; for (i = 0; i < nrels; i++) { - const char *name; - __u32 insn_idx; - GElf_Sym sym; - GElf_Rel rel; - if (!gelf_getrel(data, i, &rel)) { - pr_warn("relocation: failed to get %d reloc\n", i); + pr_warn("sec '%s': failed to get relo #%d\n", relo_sec_name, i); return -LIBBPF_ERRNO__FORMAT; } if (!gelf_getsym(symbols, GELF_R_SYM(rel.r_info), &sym)) { - pr_warn("relocation: symbol %"PRIx64" not found\n", - GELF_R_SYM(rel.r_info)); + pr_warn("sec '%s': symbol 0x%zx not found for relo #%d\n", + relo_sec_name, (size_t)GELF_R_SYM(rel.r_info), i); return -LIBBPF_ERRNO__FORMAT; } - if (rel.r_offset % sizeof(struct bpf_insn)) + if (rel.r_offset % BPF_INSN_SZ) { + pr_warn("sec '%s': invalid offset 0x%zx for relo #%d\n", + relo_sec_name, (size_t)GELF_R_SYM(rel.r_info), i); return -LIBBPF_ERRNO__FORMAT; + } + + insn_idx = rel.r_offset / BPF_INSN_SZ; + /* relocations against static functions are recorded as + * relocations against the section that contains a function; + * in such case, symbol will be 
STT_SECTION and sym.st_name + * will point to empty string (0), so fetch section name + * instead + */ + if (GELF_ST_TYPE(sym.st_info) == STT_SECTION && sym.st_name == 0) + sym_name = elf_sec_name(obj, elf_sec_by_idx(obj, sym.st_shndx)); + else + sym_name = elf_sym_str(obj, sym.st_name); + sym_name = sym_name ?: "<?"; - insn_idx = rel.r_offset / sizeof(struct bpf_insn); - name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name) ? : "<?>"; + pr_debug("sec '%s': relo #%d: insn #%u against '%s'\n", + relo_sec_name, i, insn_idx, sym_name); - pr_debug("relo for shdr %u, symb %zu, value %zu, type %d, bind %d, name %d (\'%s\'), insn %u\n", - (__u32)sym.st_shndx, (size_t)GELF_R_SYM(rel.r_info), - (size_t)sym.st_value, GELF_ST_TYPE(sym.st_info), - GELF_ST_BIND(sym.st_info), sym.st_name, name, - insn_idx); + prog = find_prog_by_sec_insn(obj, sec_idx, insn_idx); + if (!prog) { + pr_warn("sec '%s': relo #%d: program not found in section '%s' for insn #%u\n", + relo_sec_name, i, sec_name, insn_idx); + return -LIBBPF_ERRNO__RELOC; + } - err = bpf_program__record_reloc(prog, &prog->reloc_desc[i], - insn_idx, name, &sym, &rel); + relos = libbpf_reallocarray(prog->reloc_desc, + prog->nr_reloc + 1, sizeof(*relos)); + if (!relos) + return -ENOMEM; + prog->reloc_desc = relos; + + /* adjust insn_idx to local BPF program frame of reference */ + insn_idx -= prog->sec_insn_off; + err = bpf_program__record_reloc(prog, &relos[prog->nr_reloc], + insn_idx, sym_name, &sym, &rel); if (err) return err; + + prog->nr_reloc++; } return 0; } @@ -3433,8 +3718,14 @@ bpf_object__probe_loading(struct bpf_object *obj) return 0; } -static int -bpf_object__probe_name(struct bpf_object *obj) +static int probe_fd(int fd) +{ + if (fd >= 0) + close(fd); + return fd >= 0; +} + +static int probe_kern_prog_name(void) { struct bpf_load_program_attr attr; struct bpf_insn insns[] = { @@ -3452,16 +3743,10 @@ bpf_object__probe_name(struct bpf_object *obj) attr.license = "GPL"; attr.name = "test"; ret = bpf_load_program_xattr(&attr, NULL, 0); - if (ret >= 0) { - obj->caps.name = 1; - close(ret); - } - - return 0; + return probe_fd(ret); } -static int -bpf_object__probe_global_data(struct bpf_object *obj) +static int probe_kern_global_data(void) { struct bpf_load_program_attr prg_attr; struct bpf_create_map_attr map_attr; @@ -3498,16 +3783,23 @@ bpf_object__probe_global_data(struct bpf_object *obj) prg_attr.license = "GPL"; ret = bpf_load_program_xattr(&prg_attr, NULL, 0); - if (ret >= 0) { - obj->caps.global_data = 1; - close(ret); - } - close(map); - return 0; + return probe_fd(ret); +} + +static int probe_kern_btf(void) +{ + static const char strs[] = "\0int"; + __u32 types[] = { + /* int */ + BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), + }; + + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); } -static int bpf_object__probe_btf_func(struct bpf_object *obj) +static int probe_kern_btf_func(void) { static const char strs[] = "\0int\0x\0a"; /* void x(int a) {} */ @@ -3520,20 +3812,12 @@ static int bpf_object__probe_btf_func(struct bpf_object *obj) /* FUNC x */ /* [3] */ BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2), }; - int btf_fd; - - btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (btf_fd >= 0) { - obj->caps.btf_func = 1; - close(btf_fd); - return 1; - } - return 0; + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); } -static int bpf_object__probe_btf_func_global(struct bpf_object *obj) +static 
int probe_kern_btf_func_global(void) { static const char strs[] = "\0int\0x\0a"; /* static void x(int a) {} */ @@ -3546,20 +3830,12 @@ static int bpf_object__probe_btf_func_global(struct bpf_object *obj) /* FUNC x BTF_FUNC_GLOBAL */ /* [3] */ BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2), }; - int btf_fd; - btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (btf_fd >= 0) { - obj->caps.btf_func_global = 1; - close(btf_fd); - return 1; - } - - return 0; + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); } -static int bpf_object__probe_btf_datasec(struct bpf_object *obj) +static int probe_kern_btf_datasec(void) { static const char strs[] = "\0x\0.data"; /* static int a; */ @@ -3573,20 +3849,12 @@ static int bpf_object__probe_btf_datasec(struct bpf_object *obj) BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4), BTF_VAR_SECINFO_ENC(2, 0, 4), }; - int btf_fd; - btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), - strs, sizeof(strs)); - if (btf_fd >= 0) { - obj->caps.btf_datasec = 1; - close(btf_fd); - return 1; - } - - return 0; + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs))); } -static int bpf_object__probe_array_mmap(struct bpf_object *obj) +static int probe_kern_array_mmap(void) { struct bpf_create_map_attr attr = { .map_type = BPF_MAP_TYPE_ARRAY, @@ -3595,27 +3863,17 @@ static int bpf_object__probe_array_mmap(struct bpf_object *obj) .value_size = sizeof(int), .max_entries = 1, }; - int fd; - - fd = bpf_create_map_xattr(&attr); - if (fd >= 0) { - obj->caps.array_mmap = 1; - close(fd); - return 1; - } - return 0; + return probe_fd(bpf_create_map_xattr(&attr)); } -static int -bpf_object__probe_exp_attach_type(struct bpf_object *obj) +static int probe_kern_exp_attach_type(void) { struct bpf_load_program_attr attr; struct bpf_insn insns[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }; - int fd; memset(&attr, 0, sizeof(attr)); /* use any valid combination of program type and (optional) @@ -3629,36 +3887,140 @@ bpf_object__probe_exp_attach_type(struct bpf_object *obj) attr.insns_cnt = ARRAY_SIZE(insns); attr.license = "GPL"; - fd = bpf_load_program_xattr(&attr, NULL, 0); - if (fd >= 0) { - obj->caps.exp_attach_type = 1; - close(fd); - return 1; - } - return 0; + return probe_fd(bpf_load_program_xattr(&attr, NULL, 0)); } -static int -bpf_object__probe_caps(struct bpf_object *obj) -{ - int (*probe_fn[])(struct bpf_object *obj) = { - bpf_object__probe_name, - bpf_object__probe_global_data, - bpf_object__probe_btf_func, - bpf_object__probe_btf_func_global, - bpf_object__probe_btf_datasec, - bpf_object__probe_array_mmap, - bpf_object__probe_exp_attach_type, +static int probe_kern_probe_read_kernel(void) +{ + struct bpf_load_program_attr attr; + struct bpf_insn insns[] = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */ + BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */ + BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */ + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel), + BPF_EXIT_INSN(), }; - int i, ret; - for (i = 0; i < ARRAY_SIZE(probe_fn); i++) { - ret = probe_fn[i](obj); - if (ret < 0) - pr_debug("Probe #%d failed with %d.\n", i, ret); + memset(&attr, 0, sizeof(attr)); + attr.prog_type = BPF_PROG_TYPE_KPROBE; + attr.insns = insns; + attr.insns_cnt = ARRAY_SIZE(insns); + attr.license = "GPL"; + + return probe_fd(bpf_load_program_xattr(&attr, NULL, 0)); +} + 
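All of the probe_kern_*() helpers above share one shape: attempt to load a minimal program, map, or BTF blob, close the returned fd via probe_fd(), and report success as a boolean that later gets cached per feature. A minimal standalone sketch of that probe-and-cache pattern, using made-up names (my_feat_id, my_probe_a, my_supports) that are not part of the libbpf API, could look like this:

#include <stdbool.h>
#include <unistd.h>

enum my_feat_id { MY_FEAT_A, MY_FEAT_B, __MY_FEAT_CNT };
enum my_feat_result { MY_FEAT_UNKNOWN = 0, MY_FEAT_SUPPORTED, MY_FEAT_MISSING };

/* each probe tries a tiny load and returns an fd (>= 0) on success */
typedef int (*my_probe_fn)(void);

static int my_probe_a(void) { return -1; /* e.g. try loading a two-insn program */ }
static int my_probe_b(void) { return -1; /* e.g. try creating a small array map */ }

static struct {
	my_probe_fn probe;
	enum my_feat_result res;
} my_feats[__MY_FEAT_CNT] = {
	[MY_FEAT_A] = { my_probe_a },
	[MY_FEAT_B] = { my_probe_b },
};

/* mirrors probe_fd(): success means "we got a valid fd", which is then closed */
static bool my_probe_ok(int fd)
{
	if (fd >= 0)
		close(fd);
	return fd >= 0;
}

static bool my_supports(enum my_feat_id id)
{
	/* probe lazily, once, and cache the result for later callers */
	if (my_feats[id].res == MY_FEAT_UNKNOWN)
		my_feats[id].res = my_probe_ok(my_feats[id].probe()) ?
				   MY_FEAT_SUPPORTED : MY_FEAT_MISSING;
	return my_feats[id].res == MY_FEAT_SUPPORTED;
}

The diff's actual kernel_supports() additionally wraps the cached result in READ_ONCE()/WRITE_ONCE() so concurrent callers do not race, and prints a warning when a probe itself fails with an error.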
+static int probe_prog_bind_map(void) +{ + struct bpf_load_program_attr prg_attr; + struct bpf_create_map_attr map_attr; + char *cp, errmsg[STRERR_BUFSIZE]; + struct bpf_insn insns[] = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }; + int ret, map, prog; + + memset(&map_attr, 0, sizeof(map_attr)); + map_attr.map_type = BPF_MAP_TYPE_ARRAY; + map_attr.key_size = sizeof(int); + map_attr.value_size = 32; + map_attr.max_entries = 1; + + map = bpf_create_map_xattr(&map_attr); + if (map < 0) { + ret = -errno; + cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg)); + pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n", + __func__, cp, -ret); + return ret; } - return 0; + memset(&prg_attr, 0, sizeof(prg_attr)); + prg_attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER; + prg_attr.insns = insns; + prg_attr.insns_cnt = ARRAY_SIZE(insns); + prg_attr.license = "GPL"; + + prog = bpf_load_program_xattr(&prg_attr, NULL, 0); + if (prog < 0) { + close(map); + return 0; + } + + ret = bpf_prog_bind_map(prog, map, NULL); + + close(map); + close(prog); + + return ret >= 0; +} + +enum kern_feature_result { + FEAT_UNKNOWN = 0, + FEAT_SUPPORTED = 1, + FEAT_MISSING = 2, +}; + +typedef int (*feature_probe_fn)(void); + +static struct kern_feature_desc { + const char *desc; + feature_probe_fn probe; + enum kern_feature_result res; +} feature_probes[__FEAT_CNT] = { + [FEAT_PROG_NAME] = { + "BPF program name", probe_kern_prog_name, + }, + [FEAT_GLOBAL_DATA] = { + "global variables", probe_kern_global_data, + }, + [FEAT_BTF] = { + "minimal BTF", probe_kern_btf, + }, + [FEAT_BTF_FUNC] = { + "BTF functions", probe_kern_btf_func, + }, + [FEAT_BTF_GLOBAL_FUNC] = { + "BTF global function", probe_kern_btf_func_global, + }, + [FEAT_BTF_DATASEC] = { + "BTF data section and variable", probe_kern_btf_datasec, + }, + [FEAT_ARRAY_MMAP] = { + "ARRAY map mmap()", probe_kern_array_mmap, + }, + [FEAT_EXP_ATTACH_TYPE] = { + "BPF_PROG_LOAD expected_attach_type attribute", + probe_kern_exp_attach_type, + }, + [FEAT_PROBE_READ_KERN] = { + "bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel, + }, + [FEAT_PROG_BIND_MAP] = { + "BPF_PROG_BIND_MAP support", probe_prog_bind_map, + } +}; + +static bool kernel_supports(enum kern_feature_id feat_id) +{ + struct kern_feature_desc *feat = &feature_probes[feat_id]; + int ret; + + if (READ_ONCE(feat->res) == FEAT_UNKNOWN) { + ret = feat->probe(); + if (ret > 0) { + WRITE_ONCE(feat->res, FEAT_SUPPORTED); + } else if (ret == 0) { + WRITE_ONCE(feat->res, FEAT_MISSING); + } else { + pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret); + WRITE_ONCE(feat->res, FEAT_MISSING); + } + } + + return READ_ONCE(feat->res) == FEAT_SUPPORTED; } static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd) @@ -3760,7 +4122,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map) memset(&create_attr, 0, sizeof(create_attr)); - if (obj->caps.name) + if (kernel_supports(FEAT_PROG_NAME)) create_attr.name = map->name; create_attr.map_ifindex = map->map_ifindex; create_attr.map_type = def->type; @@ -3841,6 +4203,36 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map) return 0; } +static int init_map_slots(struct bpf_map *map) +{ + const struct bpf_map *targ_map; + unsigned int i; + int fd, err; + + for (i = 0; i < map->init_slots_sz; i++) { + if (!map->init_slots[i]) + continue; + + targ_map = map->init_slots[i]; + fd = bpf_map__fd(targ_map); + err = bpf_map_update_elem(map->fd, &i, &fd, 0); + if (err) { + err = 
-errno; + pr_warn("map '%s': failed to initialize slot [%d] to map '%s' fd=%d: %d\n", + map->name, i, targ_map->name, + fd, err); + return err; + } + pr_debug("map '%s': slot [%d] set to map '%s' fd=%d\n", + map->name, i, targ_map->name, fd); + } + + zfree(&map->init_slots); + map->init_slots_sz = 0; + + return 0; +} + static int bpf_object__create_maps(struct bpf_object *obj) { @@ -3864,47 +4256,29 @@ bpf_object__create_maps(struct bpf_object *obj) if (map->fd >= 0) { pr_debug("map '%s': skipping creation (preset fd=%d)\n", map->name, map->fd); - continue; - } - - err = bpf_object__create_map(obj, map); - if (err) - goto err_out; - - pr_debug("map '%s': created successfully, fd=%d\n", map->name, - map->fd); - - if (bpf_map__is_internal(map)) { - err = bpf_object__populate_internal_map(obj, map); - if (err < 0) { - zclose(map->fd); + } else { + err = bpf_object__create_map(obj, map); + if (err) goto err_out; - } - } - if (map->init_slots_sz) { - for (j = 0; j < map->init_slots_sz; j++) { - const struct bpf_map *targ_map; - int fd; + pr_debug("map '%s': created successfully, fd=%d\n", + map->name, map->fd); - if (!map->init_slots[j]) - continue; + if (bpf_map__is_internal(map)) { + err = bpf_object__populate_internal_map(obj, map); + if (err < 0) { + zclose(map->fd); + goto err_out; + } + } - targ_map = map->init_slots[j]; - fd = bpf_map__fd(targ_map); - err = bpf_map_update_elem(map->fd, &j, &fd, 0); - if (err) { - err = -errno; - pr_warn("map '%s': failed to initialize slot [%d] to map '%s' fd=%d: %d\n", - map->name, j, targ_map->name, - fd, err); + if (map->init_slots_sz) { + err = init_map_slots(map); + if (err < 0) { + zclose(map->fd); goto err_out; } - pr_debug("map '%s': slot [%d] set to map '%s' fd=%d\n", - map->name, j, targ_map->name, fd); } - zfree(&map->init_slots); - map->init_slots_sz = 0; } if (map->pin_path && !map->pinned) { @@ -3929,75 +4303,6 @@ err_out: return err; } -static int -check_btf_ext_reloc_err(struct bpf_program *prog, int err, - void *btf_prog_info, const char *info_name) -{ - if (err != -ENOENT) { - pr_warn("Error in loading %s for sec %s.\n", - info_name, prog->section_name); - return err; - } - - /* err == -ENOENT (i.e. prog->section_name not found in btf_ext) */ - - if (btf_prog_info) { - /* - * Some info has already been found but has problem - * in the last btf_ext reloc. Must have to error out. - */ - pr_warn("Error in relocating %s for sec %s.\n", - info_name, prog->section_name); - return err; - } - - /* Have problem loading the very first info. Ignore the rest. */ - pr_warn("Cannot find %s for main program sec %s. Ignore all %s.\n", - info_name, prog->section_name, info_name); - return 0; -} - -static int -bpf_program_reloc_btf_ext(struct bpf_program *prog, struct bpf_object *obj, - const char *section_name, __u32 insn_offset) -{ - int err; - - if (!insn_offset || prog->func_info) { - /* - * !insn_offset => main program - * - * For sub prog, the main program's func_info has to - * be loaded first (i.e. 
prog->func_info != NULL) - */ - err = btf_ext__reloc_func_info(obj->btf, obj->btf_ext, - section_name, insn_offset, - &prog->func_info, - &prog->func_info_cnt); - if (err) - return check_btf_ext_reloc_err(prog, err, - prog->func_info, - "bpf_func_info"); - - prog->func_info_rec_size = btf_ext__func_info_rec_size(obj->btf_ext); - } - - if (!insn_offset || prog->line_info) { - err = btf_ext__reloc_line_info(obj->btf, obj->btf_ext, - section_name, insn_offset, - &prog->line_info, - &prog->line_info_cnt); - if (err) - return check_btf_ext_reloc_err(prog, err, - prog->line_info, - "bpf_line_info"); - - prog->line_info_rec_size = btf_ext__line_info_rec_size(obj->btf_ext); - } - - return 0; -} - #define BPF_CORE_SPEC_MAX_LEN 64 /* represents BPF CO-RE field or array element accessor */ @@ -4011,6 +4316,10 @@ struct bpf_core_spec { const struct btf *btf; /* high-level spec: named fields and array indices only */ struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN]; + /* original unresolved (no skip_mods_or_typedefs) root type ID */ + __u32 root_type_id; + /* CO-RE relocation kind */ + enum bpf_core_relo_kind relo_kind; /* high-level spec length */ int len; /* raw, low-level spec: 1-to-1 with accessor spec string */ @@ -4041,8 +4350,66 @@ static bool is_flex_arr(const struct btf *btf, return acc->idx == btf_vlen(t) - 1; } +static const char *core_relo_kind_str(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_FIELD_BYTE_OFFSET: return "byte_off"; + case BPF_FIELD_BYTE_SIZE: return "byte_sz"; + case BPF_FIELD_EXISTS: return "field_exists"; + case BPF_FIELD_SIGNED: return "signed"; + case BPF_FIELD_LSHIFT_U64: return "lshift_u64"; + case BPF_FIELD_RSHIFT_U64: return "rshift_u64"; + case BPF_TYPE_ID_LOCAL: return "local_type_id"; + case BPF_TYPE_ID_TARGET: return "target_type_id"; + case BPF_TYPE_EXISTS: return "type_exists"; + case BPF_TYPE_SIZE: return "type_size"; + case BPF_ENUMVAL_EXISTS: return "enumval_exists"; + case BPF_ENUMVAL_VALUE: return "enumval_value"; + default: return "unknown"; + } +} + +static bool core_relo_is_field_based(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_FIELD_BYTE_OFFSET: + case BPF_FIELD_BYTE_SIZE: + case BPF_FIELD_EXISTS: + case BPF_FIELD_SIGNED: + case BPF_FIELD_LSHIFT_U64: + case BPF_FIELD_RSHIFT_U64: + return true; + default: + return false; + } +} + +static bool core_relo_is_type_based(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_TYPE_ID_LOCAL: + case BPF_TYPE_ID_TARGET: + case BPF_TYPE_EXISTS: + case BPF_TYPE_SIZE: + return true; + default: + return false; + } +} + +static bool core_relo_is_enumval_based(enum bpf_core_relo_kind kind) +{ + switch (kind) { + case BPF_ENUMVAL_EXISTS: + case BPF_ENUMVAL_VALUE: + return true; + default: + return false; + } +} + /* - * Turn bpf_field_reloc into a low- and high-level spec representation, + * Turn bpf_core_relo into a low- and high-level spec representation, * validating correctness along the way, as well as calculating resulting * field bit offset, specified by accessor string. Low-level spec captures * every single level of nestedness, including traversing anonymous @@ -4071,10 +4438,17 @@ static bool is_flex_arr(const struct btf *btf, * - field 'a' access (corresponds to '2' in low-level spec); * - array element #3 access (corresponds to '3' in low-level spec). * + * Type-based relocations (TYPE_EXISTS/TYPE_SIZE, + * TYPE_ID_LOCAL/TYPE_ID_TARGET) don't capture any field information. Their + * spec and raw_spec are kept empty. 
+ * + * Enum value-based relocations (ENUMVAL_EXISTS/ENUMVAL_VALUE) use access + * string to specify enumerator's value index that need to be relocated. */ -static int bpf_core_spec_parse(const struct btf *btf, +static int bpf_core_parse_spec(const struct btf *btf, __u32 type_id, const char *spec_str, + enum bpf_core_relo_kind relo_kind, struct bpf_core_spec *spec) { int access_idx, parsed_len, i; @@ -4089,6 +4463,15 @@ static int bpf_core_spec_parse(const struct btf *btf, memset(spec, 0, sizeof(*spec)); spec->btf = btf; + spec->root_type_id = type_id; + spec->relo_kind = relo_kind; + + /* type-based relocations don't have a field access string */ + if (core_relo_is_type_based(relo_kind)) { + if (strcmp(spec_str, "0")) + return -EINVAL; + return 0; + } /* parse spec_str="0:1:2:3:4" into array raw_spec=[0, 1, 2, 3, 4] */ while (*spec_str) { @@ -4105,16 +4488,28 @@ static int bpf_core_spec_parse(const struct btf *btf, if (spec->raw_len == 0) return -EINVAL; - /* first spec value is always reloc type array index */ t = skip_mods_and_typedefs(btf, type_id, &id); if (!t) return -EINVAL; access_idx = spec->raw_spec[0]; - spec->spec[0].type_id = id; - spec->spec[0].idx = access_idx; + acc = &spec->spec[0]; + acc->type_id = id; + acc->idx = access_idx; spec->len++; + if (core_relo_is_enumval_based(relo_kind)) { + if (!btf_is_enum(t) || spec->raw_len > 1 || access_idx >= btf_vlen(t)) + return -EINVAL; + + /* record enumerator name in a first accessor */ + acc->name = btf__name_by_offset(btf, btf_enum(t)[access_idx].name_off); + return 0; + } + + if (!core_relo_is_field_based(relo_kind)) + return -EINVAL; + sz = btf__resolve_size(btf, id); if (sz < 0) return sz; @@ -4172,8 +4567,8 @@ static int bpf_core_spec_parse(const struct btf *btf, return sz; spec->bit_offset += access_idx * sz * 8; } else { - pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n", - type_id, spec_str, i, id, btf_kind(t)); + pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %s\n", + type_id, spec_str, i, id, btf_kind_str(t)); return -EINVAL; } } @@ -4223,16 +4618,16 @@ static struct ids_vec *bpf_core_find_cands(const struct btf *local_btf, { size_t local_essent_len, targ_essent_len; const char *local_name, *targ_name; - const struct btf_type *t; + const struct btf_type *t, *local_t; struct ids_vec *cand_ids; __u32 *new_ids; int i, err, n; - t = btf__type_by_id(local_btf, local_type_id); - if (!t) + local_t = btf__type_by_id(local_btf, local_type_id); + if (!local_t) return ERR_PTR(-EINVAL); - local_name = btf__name_by_offset(local_btf, t->name_off); + local_name = btf__name_by_offset(local_btf, local_t->name_off); if (str_is_empty(local_name)) return ERR_PTR(-EINVAL); local_essent_len = bpf_core_essential_name_len(local_name); @@ -4244,12 +4639,11 @@ static struct ids_vec *bpf_core_find_cands(const struct btf *local_btf, n = btf__get_nr_types(targ_btf); for (i = 1; i <= n; i++) { t = btf__type_by_id(targ_btf, i); - targ_name = btf__name_by_offset(targ_btf, t->name_off); - if (str_is_empty(targ_name)) + if (btf_kind(t) != btf_kind(local_t)) continue; - t = skip_mods_and_typedefs(targ_btf, i, NULL); - if (!btf_is_composite(t) && !btf_is_array(t)) + targ_name = btf__name_by_offset(targ_btf, t->name_off); + if (str_is_empty(targ_name)) continue; targ_essent_len = bpf_core_essential_name_len(targ_name); @@ -4257,11 +4651,12 @@ static struct ids_vec *bpf_core_find_cands(const struct btf *local_btf, continue; if (strncmp(local_name, targ_name, local_essent_len) == 0) { - 
pr_debug("[%d] %s: found candidate [%d] %s\n", - local_type_id, local_name, i, targ_name); - new_ids = reallocarray(cand_ids->data, - cand_ids->len + 1, - sizeof(*cand_ids->data)); + pr_debug("CO-RE relocating [%d] %s %s: found target candidate [%d] %s %s\n", + local_type_id, btf_kind_str(local_t), + local_name, i, btf_kind_str(t), targ_name); + new_ids = libbpf_reallocarray(cand_ids->data, + cand_ids->len + 1, + sizeof(*cand_ids->data)); if (!new_ids) { err = -ENOMEM; goto err_out; @@ -4276,8 +4671,9 @@ err_out: return ERR_PTR(err); } -/* Check two types for compatibility, skipping const/volatile/restrict and - * typedefs, to ensure we are relocating compatible entities: +/* Check two types for compatibility for the purpose of field access + * relocation. const/volatile/restrict and typedefs are skipped to ensure we + * are relocating semantically compatible entities: * - any two STRUCTs/UNIONs are compatible and can be mixed; * - any two FWDs are compatible, if their names match (modulo flavor suffix); * - any two PTRs are always compatible; @@ -4432,6 +4828,100 @@ static int bpf_core_match_member(const struct btf *local_btf, return 0; } +/* Check local and target types for compatibility. This check is used for + * type-based CO-RE relocations and follow slightly different rules than + * field-based relocations. This function assumes that root types were already + * checked for name match. Beyond that initial root-level name check, names + * are completely ignored. Compatibility rules are as follows: + * - any two STRUCTs/UNIONs/FWDs/ENUMs/INTs are considered compatible, but + * kind should match for local and target types (i.e., STRUCT is not + * compatible with UNION); + * - for ENUMs, the size is ignored; + * - for INT, size and signedness are ignored; + * - for ARRAY, dimensionality is ignored, element types are checked for + * compatibility recursively; + * - CONST/VOLATILE/RESTRICT modifiers are ignored; + * - TYPEDEFs/PTRs are compatible if types they pointing to are compatible; + * - FUNC_PROTOs are compatible if they have compatible signature: same + * number of input args and compatible return and argument types. + * These rules are not set in stone and probably will be adjusted as we get + * more experience with using BPF CO-RE relocations. 
+ */ +static int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id, + const struct btf *targ_btf, __u32 targ_id) +{ + const struct btf_type *local_type, *targ_type; + int depth = 32; /* max recursion depth */ + + /* caller made sure that names match (ignoring flavor suffix) */ + local_type = btf__type_by_id(local_btf, local_id); + targ_type = btf__type_by_id(targ_btf, targ_id); + if (btf_kind(local_type) != btf_kind(targ_type)) + return 0; + +recur: + depth--; + if (depth < 0) + return -EINVAL; + + local_type = skip_mods_and_typedefs(local_btf, local_id, &local_id); + targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id); + if (!local_type || !targ_type) + return -EINVAL; + + if (btf_kind(local_type) != btf_kind(targ_type)) + return 0; + + switch (btf_kind(local_type)) { + case BTF_KIND_UNKN: + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + case BTF_KIND_ENUM: + case BTF_KIND_FWD: + return 1; + case BTF_KIND_INT: + /* just reject deprecated bitfield-like integers; all other + * integers are by default compatible between each other + */ + return btf_int_offset(local_type) == 0 && btf_int_offset(targ_type) == 0; + case BTF_KIND_PTR: + local_id = local_type->type; + targ_id = targ_type->type; + goto recur; + case BTF_KIND_ARRAY: + local_id = btf_array(local_type)->type; + targ_id = btf_array(targ_type)->type; + goto recur; + case BTF_KIND_FUNC_PROTO: { + struct btf_param *local_p = btf_params(local_type); + struct btf_param *targ_p = btf_params(targ_type); + __u16 local_vlen = btf_vlen(local_type); + __u16 targ_vlen = btf_vlen(targ_type); + int i, err; + + if (local_vlen != targ_vlen) + return 0; + + for (i = 0; i < local_vlen; i++, local_p++, targ_p++) { + skip_mods_and_typedefs(local_btf, local_p->type, &local_id); + skip_mods_and_typedefs(targ_btf, targ_p->type, &targ_id); + err = bpf_core_types_are_compat(local_btf, local_id, targ_btf, targ_id); + if (err <= 0) + return err; + } + + /* tail recurse for return type check */ + skip_mods_and_typedefs(local_btf, local_type->type, &local_id); + skip_mods_and_typedefs(targ_btf, targ_type->type, &targ_id); + goto recur; + } + default: + pr_warn("unexpected kind %s relocated, local [%d], target [%d]\n", + btf_kind_str(local_type), local_id, targ_id); + return 0; + } +} + /* * Try to match local spec to a target type and, if successful, produce full * target spec (high-level, low-level + bit offset). 
@@ -4447,10 +4937,51 @@ static int bpf_core_spec_match(struct bpf_core_spec *local_spec, memset(targ_spec, 0, sizeof(*targ_spec)); targ_spec->btf = targ_btf; + targ_spec->root_type_id = targ_id; + targ_spec->relo_kind = local_spec->relo_kind; + + if (core_relo_is_type_based(local_spec->relo_kind)) { + return bpf_core_types_are_compat(local_spec->btf, + local_spec->root_type_id, + targ_btf, targ_id); + } local_acc = &local_spec->spec[0]; targ_acc = &targ_spec->spec[0]; + if (core_relo_is_enumval_based(local_spec->relo_kind)) { + size_t local_essent_len, targ_essent_len; + const struct btf_enum *e; + const char *targ_name; + + /* has to resolve to an enum */ + targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id, &targ_id); + if (!btf_is_enum(targ_type)) + return 0; + + local_essent_len = bpf_core_essential_name_len(local_acc->name); + + for (i = 0, e = btf_enum(targ_type); i < btf_vlen(targ_type); i++, e++) { + targ_name = btf__name_by_offset(targ_spec->btf, e->name_off); + targ_essent_len = bpf_core_essential_name_len(targ_name); + if (targ_essent_len != local_essent_len) + continue; + if (strncmp(local_acc->name, targ_name, local_essent_len) == 0) { + targ_acc->type_id = targ_id; + targ_acc->idx = i; + targ_acc->name = targ_name; + targ_spec->len++; + targ_spec->raw_spec[targ_spec->raw_len] = targ_acc->idx; + targ_spec->raw_len++; + return 1; + } + } + return 0; + } + + if (!core_relo_is_field_based(local_spec->relo_kind)) + return -EINVAL; + for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) { targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id, &targ_id); @@ -4507,22 +5038,42 @@ static int bpf_core_spec_match(struct bpf_core_spec *local_spec, } static int bpf_core_calc_field_relo(const struct bpf_program *prog, - const struct bpf_field_reloc *relo, + const struct bpf_core_relo *relo, const struct bpf_core_spec *spec, - __u32 *val, bool *validate) + __u32 *val, __u32 *field_sz, __u32 *type_id, + bool *validate) { - const struct bpf_core_accessor *acc = &spec->spec[spec->len - 1]; - const struct btf_type *t = btf__type_by_id(spec->btf, acc->type_id); - __u32 byte_off, byte_sz, bit_off, bit_sz; + const struct bpf_core_accessor *acc; + const struct btf_type *t; + __u32 byte_off, byte_sz, bit_off, bit_sz, field_type_id; const struct btf_member *m; const struct btf_type *mt; bool bitfield; __s64 sz; + *field_sz = 0; + + if (relo->kind == BPF_FIELD_EXISTS) { + *val = spec ? 
1 : 0; + return 0; + } + + if (!spec) + return -EUCLEAN; /* request instruction poisoning */ + + acc = &spec->spec[spec->len - 1]; + t = btf__type_by_id(spec->btf, acc->type_id); + /* a[n] accessor needs special handling */ if (!acc->name) { if (relo->kind == BPF_FIELD_BYTE_OFFSET) { *val = spec->bit_offset / 8; + /* remember field size for load/store mem size */ + sz = btf__resolve_size(spec->btf, acc->type_id); + if (sz < 0) + return -EINVAL; + *field_sz = sz; + *type_id = acc->type_id; } else if (relo->kind == BPF_FIELD_BYTE_SIZE) { sz = btf__resolve_size(spec->btf, acc->type_id); if (sz < 0) @@ -4530,8 +5081,7 @@ static int bpf_core_calc_field_relo(const struct bpf_program *prog, *val = sz; } else { pr_warn("prog '%s': relo %d at insn #%d can't be applied to array access\n", - bpf_program__title(prog, false), - relo->kind, relo->insn_off / 8); + prog->name, relo->kind, relo->insn_off / 8); return -EINVAL; } if (validate) @@ -4540,7 +5090,7 @@ static int bpf_core_calc_field_relo(const struct bpf_program *prog, } m = btf_members(t) + acc->idx; - mt = skip_mods_and_typedefs(spec->btf, m->type, NULL); + mt = skip_mods_and_typedefs(spec->btf, m->type, &field_type_id); bit_off = spec->bit_offset; bit_sz = btf_member_bitfield_size(t, acc->idx); @@ -4553,15 +5103,14 @@ static int bpf_core_calc_field_relo(const struct bpf_program *prog, if (byte_sz >= 8) { /* bitfield can't be read with 64-bit read */ pr_warn("prog '%s': relo %d at insn #%d can't be satisfied for bitfield\n", - bpf_program__title(prog, false), - relo->kind, relo->insn_off / 8); + prog->name, relo->kind, relo->insn_off / 8); return -E2BIG; } byte_sz *= 2; byte_off = bit_off / 8 / byte_sz * byte_sz; } } else { - sz = btf__resolve_size(spec->btf, m->type); + sz = btf__resolve_size(spec->btf, field_type_id); if (sz < 0) return -EINVAL; byte_sz = sz; @@ -4579,6 +5128,10 @@ static int bpf_core_calc_field_relo(const struct bpf_program *prog, switch (relo->kind) { case BPF_FIELD_BYTE_OFFSET: *val = byte_off; + if (!bitfield) { + *field_sz = byte_sz; + *type_id = field_type_id; + } break; case BPF_FIELD_BYTE_SIZE: *val = byte_sz; @@ -4604,117 +5157,384 @@ static int bpf_core_calc_field_relo(const struct bpf_program *prog, break; case BPF_FIELD_EXISTS: default: - pr_warn("prog '%s': unknown relo %d at insn #%d\n", - bpf_program__title(prog, false), - relo->kind, relo->insn_off / 8); - return -EINVAL; + return -EOPNOTSUPP; } return 0; } +static int bpf_core_calc_type_relo(const struct bpf_core_relo *relo, + const struct bpf_core_spec *spec, + __u32 *val) +{ + __s64 sz; + + /* type-based relos return zero when target type is not found */ + if (!spec) { + *val = 0; + return 0; + } + + switch (relo->kind) { + case BPF_TYPE_ID_TARGET: + *val = spec->root_type_id; + break; + case BPF_TYPE_EXISTS: + *val = 1; + break; + case BPF_TYPE_SIZE: + sz = btf__resolve_size(spec->btf, spec->root_type_id); + if (sz < 0) + return -EINVAL; + *val = sz; + break; + case BPF_TYPE_ID_LOCAL: + /* BPF_TYPE_ID_LOCAL is handled specially and shouldn't get here */ + default: + return -EOPNOTSUPP; + } + + return 0; +} + +static int bpf_core_calc_enumval_relo(const struct bpf_core_relo *relo, + const struct bpf_core_spec *spec, + __u32 *val) +{ + const struct btf_type *t; + const struct btf_enum *e; + + switch (relo->kind) { + case BPF_ENUMVAL_EXISTS: + *val = spec ? 
1 : 0; + break; + case BPF_ENUMVAL_VALUE: + if (!spec) + return -EUCLEAN; /* request instruction poisoning */ + t = btf__type_by_id(spec->btf, spec->spec[0].type_id); + e = btf_enum(t) + spec->spec[0].idx; + *val = e->val; + break; + default: + return -EOPNOTSUPP; + } + + return 0; +} + +struct bpf_core_relo_res +{ + /* expected value in the instruction, unless validate == false */ + __u32 orig_val; + /* new value that needs to be patched up to */ + __u32 new_val; + /* relocation unsuccessful, poison instruction, but don't fail load */ + bool poison; + /* some relocations can't be validated against orig_val */ + bool validate; + /* for field byte offset relocations or the forms: + * *(T *)(rX + <off>) = rY + * rX = *(T *)(rY + <off>), + * we remember original and resolved field size to adjust direct + * memory loads of pointers and integers; this is necessary for 32-bit + * host kernel architectures, but also allows to automatically + * relocate fields that were resized from, e.g., u32 to u64, etc. + */ + bool fail_memsz_adjust; + __u32 orig_sz; + __u32 orig_type_id; + __u32 new_sz; + __u32 new_type_id; +}; + +/* Calculate original and target relocation values, given local and target + * specs and relocation kind. These values are calculated for each candidate. + * If there are multiple candidates, resulting values should all be consistent + * with each other. Otherwise, libbpf will refuse to proceed due to ambiguity. + * If instruction has to be poisoned, *poison will be set to true. + */ +static int bpf_core_calc_relo(const struct bpf_program *prog, + const struct bpf_core_relo *relo, + int relo_idx, + const struct bpf_core_spec *local_spec, + const struct bpf_core_spec *targ_spec, + struct bpf_core_relo_res *res) +{ + int err = -EOPNOTSUPP; + + res->orig_val = 0; + res->new_val = 0; + res->poison = false; + res->validate = true; + res->fail_memsz_adjust = false; + res->orig_sz = res->new_sz = 0; + res->orig_type_id = res->new_type_id = 0; + + if (core_relo_is_field_based(relo->kind)) { + err = bpf_core_calc_field_relo(prog, relo, local_spec, + &res->orig_val, &res->orig_sz, + &res->orig_type_id, &res->validate); + err = err ?: bpf_core_calc_field_relo(prog, relo, targ_spec, + &res->new_val, &res->new_sz, + &res->new_type_id, NULL); + if (err) + goto done; + /* Validate if it's safe to adjust load/store memory size. + * Adjustments are performed only if original and new memory + * sizes differ. + */ + res->fail_memsz_adjust = false; + if (res->orig_sz != res->new_sz) { + const struct btf_type *orig_t, *new_t; + + orig_t = btf__type_by_id(local_spec->btf, res->orig_type_id); + new_t = btf__type_by_id(targ_spec->btf, res->new_type_id); + + /* There are two use cases in which it's safe to + * adjust load/store's mem size: + * - reading a 32-bit kernel pointer, while on BPF + * size pointers are always 64-bit; in this case + * it's safe to "downsize" instruction size due to + * pointer being treated as unsigned integer with + * zero-extended upper 32-bits; + * - reading unsigned integers, again due to + * zero-extension is preserving the value correctly. + * + * In all other cases it's incorrect to attempt to + * load/store field because read value will be + * incorrect, so we poison relocated instruction. 
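 *
 * An illustrative example of the effect described above (the sizes here are
 * hypothetical, not taken from the patch): a pointer field is always 8 bytes
 * from the BPF program's point of view, so the compiler emits
 *
 *     r1 = *(u64 *)(r2 + <off>);
 *
 * but on a 32-bit host kernel the same field is only 4 bytes wide, so the
 * load can be rewritten to
 *
 *     r1 = *(u32 *)(r2 + <off>);
 *
 * and still yield the correct value, because the 32-bit load is
 * zero-extended into the 64-bit register. A signed integer field would need
 * sign extension instead, so its load is not resized; the instruction is
 * poisoned instead.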
+ */ + if (btf_is_ptr(orig_t) && btf_is_ptr(new_t)) + goto done; + if (btf_is_int(orig_t) && btf_is_int(new_t) && + btf_int_encoding(orig_t) != BTF_INT_SIGNED && + btf_int_encoding(new_t) != BTF_INT_SIGNED) + goto done; + + /* mark as invalid mem size adjustment, but this will + * only be checked for LDX/STX/ST insns + */ + res->fail_memsz_adjust = true; + } + } else if (core_relo_is_type_based(relo->kind)) { + err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val); + err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val); + } else if (core_relo_is_enumval_based(relo->kind)) { + err = bpf_core_calc_enumval_relo(relo, local_spec, &res->orig_val); + err = err ?: bpf_core_calc_enumval_relo(relo, targ_spec, &res->new_val); + } + +done: + if (err == -EUCLEAN) { + /* EUCLEAN is used to signal instruction poisoning request */ + res->poison = true; + err = 0; + } else if (err == -EOPNOTSUPP) { + /* EOPNOTSUPP means unknown/unsupported relocation */ + pr_warn("prog '%s': relo #%d: unrecognized CO-RE relocation %s (%d) at insn #%d\n", + prog->name, relo_idx, core_relo_kind_str(relo->kind), + relo->kind, relo->insn_off / 8); + } + + return err; +} + +/* + * Turn instruction for which CO_RE relocation failed into invalid one with + * distinct signature. + */ +static void bpf_core_poison_insn(struct bpf_program *prog, int relo_idx, + int insn_idx, struct bpf_insn *insn) +{ + pr_debug("prog '%s': relo #%d: substituting insn #%d w/ invalid insn\n", + prog->name, relo_idx, insn_idx); + insn->code = BPF_JMP | BPF_CALL; + insn->dst_reg = 0; + insn->src_reg = 0; + insn->off = 0; + /* if this instruction is reachable (not a dead code), + * verifier will complain with the following message: + * invalid func unknown#195896080 + */ + insn->imm = 195896080; /* => 0xbad2310 => "bad relo" */ +} + +static bool is_ldimm64(struct bpf_insn *insn) +{ + return insn->code == (BPF_LD | BPF_IMM | BPF_DW); +} + +static int insn_bpf_size_to_bytes(struct bpf_insn *insn) +{ + switch (BPF_SIZE(insn->code)) { + case BPF_DW: return 8; + case BPF_W: return 4; + case BPF_H: return 2; + case BPF_B: return 1; + default: return -1; + } +} + +static int insn_bytes_to_bpf_size(__u32 sz) +{ + switch (sz) { + case 8: return BPF_DW; + case 4: return BPF_W; + case 2: return BPF_H; + case 1: return BPF_B; + default: return -1; + } +} + /* * Patch relocatable BPF instruction. * * Patched value is determined by relocation kind and target specification. - * For field existence relocation target spec will be NULL if field is not - * found. + * For existence relocations target spec will be NULL if field/type is not found. * Expected insn->imm value is determined using relocation kind and local * spec, and is checked before patching instruction. If actual insn->imm value * is wrong, bail out with error. * - * Currently three kinds of BPF instructions are supported: + * Currently supported classes of BPF instruction are: * 1. rX = <imm> (assignment with immediate operand); * 2. rX += <imm> (arithmetic operations with immediate operand); + * 3. rX = <imm64> (load with 64-bit immediate value); + * 4. rX = *(T *)(rY + <off>), where T is one of {u8, u16, u32, u64}; + * 5. *(T *)(rX + <off>) = rY, where T is one of {u8, u16, u32, u64}; + * 6. *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}. 
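 *
 * For illustration (example offsets below are made up, not from the patch):
 * a BPF_FIELD_BYTE_OFFSET relocation against an instruction of class 4, say
 *
 *     r1 = *(u32 *)(r2 + 104);     // offset taken from compile-time BTF
 *
 * is patched by replacing the offset (and, when needed and safe, the load
 * size) with the value computed from the running kernel's BTF, e.g.
 *
 *     r1 = *(u32 *)(r2 + 112);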
*/ -static int bpf_core_reloc_insn(struct bpf_program *prog, - const struct bpf_field_reloc *relo, +static int bpf_core_patch_insn(struct bpf_program *prog, + const struct bpf_core_relo *relo, int relo_idx, - const struct bpf_core_spec *local_spec, - const struct bpf_core_spec *targ_spec) + const struct bpf_core_relo_res *res) { __u32 orig_val, new_val; struct bpf_insn *insn; - bool validate = true; - int insn_idx, err; + int insn_idx; __u8 class; - if (relo->insn_off % sizeof(struct bpf_insn)) + if (relo->insn_off % BPF_INSN_SZ) return -EINVAL; - insn_idx = relo->insn_off / sizeof(struct bpf_insn); + insn_idx = relo->insn_off / BPF_INSN_SZ; + /* adjust insn_idx from section frame of reference to the local + * program's frame of reference; (sub-)program code is not yet + * relocated, so it's enough to just subtract in-section offset + */ + insn_idx = insn_idx - prog->sec_insn_off; insn = &prog->insns[insn_idx]; class = BPF_CLASS(insn->code); - if (relo->kind == BPF_FIELD_EXISTS) { - orig_val = 1; /* can't generate EXISTS relo w/o local field */ - new_val = targ_spec ? 1 : 0; - } else if (!targ_spec) { - pr_debug("prog '%s': relo #%d: substituting insn #%d w/ invalid insn\n", - bpf_program__title(prog, false), relo_idx, insn_idx); - insn->code = BPF_JMP | BPF_CALL; - insn->dst_reg = 0; - insn->src_reg = 0; - insn->off = 0; - /* if this instruction is reachable (not a dead code), - * verifier will complain with the following message: - * invalid func unknown#195896080 + if (res->poison) { +poison: + /* poison second part of ldimm64 to avoid confusing error from + * verifier about "unknown opcode 00" */ - insn->imm = 195896080; /* => 0xbad2310 => "bad relo" */ + if (is_ldimm64(insn)) + bpf_core_poison_insn(prog, relo_idx, insn_idx + 1, insn + 1); + bpf_core_poison_insn(prog, relo_idx, insn_idx, insn); return 0; - } else { - err = bpf_core_calc_field_relo(prog, relo, local_spec, - &orig_val, &validate); - if (err) - return err; - err = bpf_core_calc_field_relo(prog, relo, targ_spec, - &new_val, NULL); - if (err) - return err; } + orig_val = res->orig_val; + new_val = res->new_val; + switch (class) { case BPF_ALU: case BPF_ALU64: if (BPF_SRC(insn->code) != BPF_K) return -EINVAL; - if (validate && insn->imm != orig_val) { + if (res->validate && insn->imm != orig_val) { pr_warn("prog '%s': relo #%d: unexpected insn #%d (ALU/ALU64) value: got %u, exp %u -> %u\n", - bpf_program__title(prog, false), relo_idx, + prog->name, relo_idx, insn_idx, insn->imm, orig_val, new_val); return -EINVAL; } orig_val = insn->imm; insn->imm = new_val; pr_debug("prog '%s': relo #%d: patched insn #%d (ALU/ALU64) imm %u -> %u\n", - bpf_program__title(prog, false), relo_idx, insn_idx, + prog->name, relo_idx, insn_idx, orig_val, new_val); break; case BPF_LDX: case BPF_ST: case BPF_STX: - if (validate && insn->off != orig_val) { - pr_warn("prog '%s': relo #%d: unexpected insn #%d (LD/LDX/ST/STX) value: got %u, exp %u -> %u\n", - bpf_program__title(prog, false), relo_idx, - insn_idx, insn->off, orig_val, new_val); + if (res->validate && insn->off != orig_val) { + pr_warn("prog '%s': relo #%d: unexpected insn #%d (LDX/ST/STX) value: got %u, exp %u -> %u\n", + prog->name, relo_idx, insn_idx, insn->off, orig_val, new_val); return -EINVAL; } if (new_val > SHRT_MAX) { pr_warn("prog '%s': relo #%d: insn #%d (LDX/ST/STX) value too big: %u\n", - bpf_program__title(prog, false), relo_idx, - insn_idx, new_val); + prog->name, relo_idx, insn_idx, new_val); return -ERANGE; } + if (res->fail_memsz_adjust) { + pr_warn("prog '%s': relo #%d: 
insn #%d (LDX/ST/STX) accesses field incorrectly. " + "Make sure you are accessing pointers, unsigned integers, or fields of matching type and size.\n", + prog->name, relo_idx, insn_idx); + goto poison; + } + orig_val = insn->off; insn->off = new_val; pr_debug("prog '%s': relo #%d: patched insn #%d (LDX/ST/STX) off %u -> %u\n", - bpf_program__title(prog, false), relo_idx, insn_idx, - orig_val, new_val); + prog->name, relo_idx, insn_idx, orig_val, new_val); + + if (res->new_sz != res->orig_sz) { + int insn_bytes_sz, insn_bpf_sz; + + insn_bytes_sz = insn_bpf_size_to_bytes(insn); + if (insn_bytes_sz != res->orig_sz) { + pr_warn("prog '%s': relo #%d: insn #%d (LDX/ST/STX) unexpected mem size: got %d, exp %u\n", + prog->name, relo_idx, insn_idx, insn_bytes_sz, res->orig_sz); + return -EINVAL; + } + + insn_bpf_sz = insn_bytes_to_bpf_size(res->new_sz); + if (insn_bpf_sz < 0) { + pr_warn("prog '%s': relo #%d: insn #%d (LDX/ST/STX) invalid new mem size: %u\n", + prog->name, relo_idx, insn_idx, res->new_sz); + return -EINVAL; + } + + insn->code = BPF_MODE(insn->code) | insn_bpf_sz | BPF_CLASS(insn->code); + pr_debug("prog '%s': relo #%d: patched insn #%d (LDX/ST/STX) mem_sz %u -> %u\n", + prog->name, relo_idx, insn_idx, res->orig_sz, res->new_sz); + } + break; + case BPF_LD: { + __u64 imm; + + if (!is_ldimm64(insn) || + insn[0].src_reg != 0 || insn[0].off != 0 || + insn_idx + 1 >= prog->insns_cnt || + insn[1].code != 0 || insn[1].dst_reg != 0 || + insn[1].src_reg != 0 || insn[1].off != 0) { + pr_warn("prog '%s': relo #%d: insn #%d (LDIMM64) has unexpected form\n", + prog->name, relo_idx, insn_idx); + return -EINVAL; + } + + imm = insn[0].imm + ((__u64)insn[1].imm << 32); + if (res->validate && imm != orig_val) { + pr_warn("prog '%s': relo #%d: unexpected insn #%d (LDIMM64) value: got %llu, exp %u -> %u\n", + prog->name, relo_idx, + insn_idx, (unsigned long long)imm, + orig_val, new_val); + return -EINVAL; + } + + insn[0].imm = new_val; + insn[1].imm = 0; /* currently only 32-bit values are supported */ + pr_debug("prog '%s': relo #%d: patched insn #%d (LDIMM64) imm64 %llu -> %u\n", + prog->name, relo_idx, insn_idx, + (unsigned long long)imm, new_val); break; + } default: - pr_warn("prog '%s': relo #%d: trying to relocate unrecognized insn #%d, code:%x, src:%x, dst:%x, off:%x, imm:%x\n", - bpf_program__title(prog, false), relo_idx, - insn_idx, insn->code, insn->src_reg, insn->dst_reg, - insn->off, insn->imm); + pr_warn("prog '%s': relo #%d: trying to relocate unrecognized insn #%d, code:0x%x, src:0x%x, dst:0x%x, off:0x%x, imm:0x%x\n", + prog->name, relo_idx, insn_idx, insn->code, + insn->src_reg, insn->dst_reg, insn->off, insn->imm); return -EINVAL; } @@ -4728,29 +5548,48 @@ static int bpf_core_reloc_insn(struct bpf_program *prog, static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec) { const struct btf_type *t; + const struct btf_enum *e; const char *s; __u32 type_id; int i; - type_id = spec->spec[0].type_id; + type_id = spec->root_type_id; t = btf__type_by_id(spec->btf, type_id); s = btf__name_by_offset(spec->btf, t->name_off); - libbpf_print(level, "[%u] %s + ", type_id, s); - for (i = 0; i < spec->raw_len; i++) - libbpf_print(level, "%d%s", spec->raw_spec[i], - i == spec->raw_len - 1 ? " => " : ":"); + libbpf_print(level, "[%u] %s %s", type_id, btf_kind_str(t), str_is_empty(s) ? 
"<anon>" : s); - libbpf_print(level, "%u.%u @ &x", - spec->bit_offset / 8, spec->bit_offset % 8); + if (core_relo_is_type_based(spec->relo_kind)) + return; - for (i = 0; i < spec->len; i++) { - if (spec->spec[i].name) - libbpf_print(level, ".%s", spec->spec[i].name); - else - libbpf_print(level, "[%u]", spec->spec[i].idx); + if (core_relo_is_enumval_based(spec->relo_kind)) { + t = skip_mods_and_typedefs(spec->btf, type_id, NULL); + e = btf_enum(t) + spec->raw_spec[0]; + s = btf__name_by_offset(spec->btf, e->name_off); + + libbpf_print(level, "::%s = %u", s, e->val); + return; } + if (core_relo_is_field_based(spec->relo_kind)) { + for (i = 0; i < spec->len; i++) { + if (spec->spec[i].name) + libbpf_print(level, ".%s", spec->spec[i].name); + else if (i > 0 || spec->spec[i].idx > 0) + libbpf_print(level, "[%u]", spec->spec[i].idx); + } + + libbpf_print(level, " ("); + for (i = 0; i < spec->raw_len; i++) + libbpf_print(level, "%s%d", i == 0 ? "" : ":", spec->raw_spec[i]); + + if (spec->bit_offset % 8) + libbpf_print(level, " @ offset %u.%u)", + spec->bit_offset / 8, spec->bit_offset % 8); + else + libbpf_print(level, " @ offset %u)", spec->bit_offset / 8); + return; + } } static size_t bpf_core_hash_fn(const void *key, void *ctx) @@ -4814,22 +5653,22 @@ static void *u32_as_hash_key(__u32 x) * CPU-wise compared to prebuilding a map from all local type names to * a list of candidate type names. It's also sped up by caching resolved * list of matching candidates per each local "root" type ID, that has at - * least one bpf_field_reloc associated with it. This list is shared + * least one bpf_core_relo associated with it. This list is shared * between multiple relocations for the same type ID and is updated as some * of the candidates are pruned due to structural incompatibility. */ -static int bpf_core_reloc_field(struct bpf_program *prog, - const struct bpf_field_reloc *relo, - int relo_idx, - const struct btf *local_btf, - const struct btf *targ_btf, - struct hashmap *cand_cache) +static int bpf_core_apply_relo(struct bpf_program *prog, + const struct bpf_core_relo *relo, + int relo_idx, + const struct btf *local_btf, + const struct btf *targ_btf, + struct hashmap *cand_cache) { - const char *prog_name = bpf_program__title(prog, false); - struct bpf_core_spec local_spec, cand_spec, targ_spec; + struct bpf_core_spec local_spec, cand_spec, targ_spec = {}; const void *type_key = u32_as_hash_key(relo->type_id); - const struct btf_type *local_type, *cand_type; - const char *local_name, *cand_name; + struct bpf_core_relo_res cand_res, targ_res; + const struct btf_type *local_type; + const char *local_name; struct ids_vec *cand_ids; __u32 local_id, cand_id; const char *spec_str; @@ -4841,32 +5680,49 @@ static int bpf_core_reloc_field(struct bpf_program *prog, return -EINVAL; local_name = btf__name_by_offset(local_btf, local_type->name_off); - if (str_is_empty(local_name)) + if (!local_name) return -EINVAL; spec_str = btf__name_by_offset(local_btf, relo->access_str_off); if (str_is_empty(spec_str)) return -EINVAL; - err = bpf_core_spec_parse(local_btf, local_id, spec_str, &local_spec); + err = bpf_core_parse_spec(local_btf, local_id, spec_str, relo->kind, &local_spec); if (err) { - pr_warn("prog '%s': relo #%d: parsing [%d] %s + %s failed: %d\n", - prog_name, relo_idx, local_id, local_name, spec_str, - err); + pr_warn("prog '%s': relo #%d: parsing [%d] %s %s + %s failed: %d\n", + prog->name, relo_idx, local_id, btf_kind_str(local_type), + str_is_empty(local_name) ? 
"<anon>" : local_name, + spec_str, err); return -EINVAL; } - pr_debug("prog '%s': relo #%d: kind %d, spec is ", prog_name, relo_idx, - relo->kind); + pr_debug("prog '%s': relo #%d: kind <%s> (%d), spec is ", prog->name, + relo_idx, core_relo_kind_str(relo->kind), relo->kind); bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec); libbpf_print(LIBBPF_DEBUG, "\n"); + /* TYPE_ID_LOCAL relo is special and doesn't need candidate search */ + if (relo->kind == BPF_TYPE_ID_LOCAL) { + targ_res.validate = true; + targ_res.poison = false; + targ_res.orig_val = local_spec.root_type_id; + targ_res.new_val = local_spec.root_type_id; + goto patch_insn; + } + + /* libbpf doesn't support candidate search for anonymous types */ + if (str_is_empty(spec_str)) { + pr_warn("prog '%s': relo #%d: <%s> (%d) relocation doesn't support anonymous types\n", + prog->name, relo_idx, core_relo_kind_str(relo->kind), relo->kind); + return -EOPNOTSUPP; + } + if (!hashmap__find(cand_cache, type_key, (void **)&cand_ids)) { cand_ids = bpf_core_find_cands(local_btf, local_id, targ_btf); if (IS_ERR(cand_ids)) { - pr_warn("prog '%s': relo #%d: target candidate search failed for [%d] %s: %ld", - prog_name, relo_idx, local_id, local_name, - PTR_ERR(cand_ids)); + pr_warn("prog '%s': relo #%d: target candidate search failed for [%d] %s %s: %ld", + prog->name, relo_idx, local_id, btf_kind_str(local_type), + local_name, PTR_ERR(cand_ids)); return PTR_ERR(cand_ids); } err = hashmap__set(cand_cache, type_key, cand_ids, NULL, NULL); @@ -4878,36 +5734,51 @@ static int bpf_core_reloc_field(struct bpf_program *prog, for (i = 0, j = 0; i < cand_ids->len; i++) { cand_id = cand_ids->data[i]; - cand_type = btf__type_by_id(targ_btf, cand_id); - cand_name = btf__name_by_offset(targ_btf, cand_type->name_off); - - err = bpf_core_spec_match(&local_spec, targ_btf, - cand_id, &cand_spec); - pr_debug("prog '%s': relo #%d: matching candidate #%d %s against spec ", - prog_name, relo_idx, i, cand_name); - bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec); - libbpf_print(LIBBPF_DEBUG, ": %d\n", err); + err = bpf_core_spec_match(&local_spec, targ_btf, cand_id, &cand_spec); if (err < 0) { - pr_warn("prog '%s': relo #%d: matching error: %d\n", - prog_name, relo_idx, err); + pr_warn("prog '%s': relo #%d: error matching candidate #%d ", + prog->name, relo_idx, i); + bpf_core_dump_spec(LIBBPF_WARN, &cand_spec); + libbpf_print(LIBBPF_WARN, ": %d\n", err); return err; } + + pr_debug("prog '%s': relo #%d: %s candidate #%d ", prog->name, + relo_idx, err == 0 ? 
"non-matching" : "matching", i); + bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec); + libbpf_print(LIBBPF_DEBUG, "\n"); + if (err == 0) continue; + err = bpf_core_calc_relo(prog, relo, relo_idx, &local_spec, &cand_spec, &cand_res); + if (err) + return err; + if (j == 0) { + targ_res = cand_res; targ_spec = cand_spec; } else if (cand_spec.bit_offset != targ_spec.bit_offset) { - /* if there are many candidates, they should all - * resolve to the same bit offset + /* if there are many field relo candidates, they + * should all resolve to the same bit offset */ - pr_warn("prog '%s': relo #%d: offset ambiguity: %u != %u\n", - prog_name, relo_idx, cand_spec.bit_offset, + pr_warn("prog '%s': relo #%d: field offset ambiguity: %u != %u\n", + prog->name, relo_idx, cand_spec.bit_offset, targ_spec.bit_offset); return -EINVAL; + } else if (cand_res.poison != targ_res.poison || cand_res.new_val != targ_res.new_val) { + /* all candidates should result in the same relocation + * decision and value, otherwise it's dangerous to + * proceed due to ambiguity + */ + pr_warn("prog '%s': relo #%d: relocation decision ambiguity: %s %u != %s %u\n", + prog->name, relo_idx, + cand_res.poison ? "failure" : "success", cand_res.new_val, + targ_res.poison ? "failure" : "success", targ_res.new_val); + return -EINVAL; } - cand_ids->data[j++] = cand_spec.spec[0].type_id; + cand_ids->data[j++] = cand_spec.root_type_id; } /* @@ -4926,22 +5797,28 @@ static int bpf_core_reloc_field(struct bpf_program *prog, * as well as expected case, depending whether instruction w/ * relocation is guarded in some way that makes it unreachable (dead * code) if relocation can't be resolved. This is handled in - * bpf_core_reloc_insn() uniformly by replacing that instruction with + * bpf_core_patch_insn() uniformly by replacing that instruction with * BPF helper call insn (using invalid helper ID). If that instruction * is indeed unreachable, then it will be ignored and eliminated by * verifier. If it was an error, then verifier will complain and point * to a specific instruction number in its log. */ - if (j == 0) - pr_debug("prog '%s': relo #%d: no matching targets found for [%d] %s + %s\n", - prog_name, relo_idx, local_id, local_name, spec_str); + if (j == 0) { + pr_debug("prog '%s': relo #%d: no matching targets found\n", + prog->name, relo_idx); + + /* calculate single target relo result explicitly */ + err = bpf_core_calc_relo(prog, relo, relo_idx, &local_spec, NULL, &targ_res); + if (err) + return err; + } - /* bpf_core_reloc_insn should know how to handle missing targ_spec */ - err = bpf_core_reloc_insn(prog, relo, relo_idx, &local_spec, - j ? 
&targ_spec : NULL); +patch_insn: + /* bpf_core_patch_insn() should know how to handle missing targ_spec */ + err = bpf_core_patch_insn(prog, relo, relo_idx, &targ_res); if (err) { pr_warn("prog '%s': relo #%d: failed to patch insn at offset %d: %d\n", - prog_name, relo_idx, relo->insn_off, err); + prog->name, relo_idx, relo->insn_off, err); return -EINVAL; } @@ -4949,20 +5826,23 @@ static int bpf_core_reloc_field(struct bpf_program *prog, } static int -bpf_core_reloc_fields(struct bpf_object *obj, const char *targ_btf_path) +bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path) { const struct btf_ext_info_sec *sec; - const struct bpf_field_reloc *rec; + const struct bpf_core_relo *rec; const struct btf_ext_info *seg; struct hashmap_entry *entry; struct hashmap *cand_cache = NULL; struct bpf_program *prog; struct btf *targ_btf; const char *sec_name; - int i, err = 0; + int i, err = 0, insn_idx, sec_idx; + + if (obj->btf_ext->core_relo_info.len == 0) + return 0; if (targ_btf_path) - targ_btf = btf__parse_elf(targ_btf_path, NULL); + targ_btf = btf__parse(targ_btf_path, NULL); else targ_btf = obj->btf_vmlinux; if (IS_ERR_OR_NULL(targ_btf)) { @@ -4976,36 +5856,54 @@ bpf_core_reloc_fields(struct bpf_object *obj, const char *targ_btf_path) goto out; } - seg = &obj->btf_ext->field_reloc_info; + seg = &obj->btf_ext->core_relo_info; for_each_btf_ext_sec(seg, sec) { sec_name = btf__name_by_offset(obj->btf, sec->sec_name_off); if (str_is_empty(sec_name)) { err = -EINVAL; goto out; } + /* bpf_object's ELF is gone by now so it's not easy to find + * section index by section name, but we can find *any* + * bpf_program within desired section name and use it's + * prog->sec_idx to do a proper search by section index and + * instruction offset + */ prog = NULL; for (i = 0; i < obj->nr_programs; i++) { - if (!strcmp(obj->programs[i].section_name, sec_name)) { - prog = &obj->programs[i]; + prog = &obj->programs[i]; + if (strcmp(prog->sec_name, sec_name) == 0) break; - } } if (!prog) { - pr_warn("failed to find program '%s' for CO-RE offset relocation\n", - sec_name); - err = -EINVAL; - goto out; + pr_warn("sec '%s': failed to find a BPF program\n", sec_name); + return -ENOENT; } + sec_idx = prog->sec_idx; - pr_debug("prog '%s': performing %d CO-RE offset relocs\n", + pr_debug("sec '%s': found %d CO-RE relocations\n", sec_name, sec->num_info); for_each_btf_ext_rec(seg, sec, i, rec) { - err = bpf_core_reloc_field(prog, rec, i, obj->btf, - targ_btf, cand_cache); + insn_idx = rec->insn_off / BPF_INSN_SZ; + prog = find_prog_by_sec_insn(obj, sec_idx, insn_idx); + if (!prog) { + pr_warn("sec '%s': failed to find program at insn #%d for CO-RE offset relocation #%d\n", + sec_name, insn_idx, i); + err = -EINVAL; + goto out; + } + /* no need to apply CO-RE relocation if the program is + * not going to be loaded + */ + if (!prog->load) + continue; + + err = bpf_core_apply_relo(prog, rec, i, obj->btf, + targ_btf, cand_cache); if (err) { pr_warn("prog '%s': relo #%d: failed to relocate: %d\n", - sec_name, i, err); + prog->name, i, err); goto out; } } @@ -5024,125 +5922,432 @@ out: return err; } +/* Relocate data references within program code: + * - map references; + * - global variable references; + * - extern references. 
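 *
 * As an illustration (not part of the patch itself): a map reference starts
 * out as an ld_imm64 against the map's ELF symbol,
 *
 *     r1 = <map>;            // imm not yet meaningful after compilation
 *
 * and RELO_LD64 processing below turns it into
 *
 *     r1 = map_fd;           // src_reg = BPF_PSEUDO_MAP_FD, imm = map FD
 *
 * so the kernel can substitute the actual map pointer at load time.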
+ */ static int -bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path) +bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog) { - int err = 0; + int i; - if (obj->btf_ext->field_reloc_info.len) - err = bpf_core_reloc_fields(obj, targ_btf_path); + for (i = 0; i < prog->nr_reloc; i++) { + struct reloc_desc *relo = &prog->reloc_desc[i]; + struct bpf_insn *insn = &prog->insns[relo->insn_idx]; + struct extern_desc *ext; - return err; + switch (relo->type) { + case RELO_LD64: + insn[0].src_reg = BPF_PSEUDO_MAP_FD; + insn[0].imm = obj->maps[relo->map_idx].fd; + relo->processed = true; + break; + case RELO_DATA: + insn[0].src_reg = BPF_PSEUDO_MAP_VALUE; + insn[1].imm = insn[0].imm + relo->sym_off; + insn[0].imm = obj->maps[relo->map_idx].fd; + relo->processed = true; + break; + case RELO_EXTERN: + ext = &obj->externs[relo->sym_off]; + if (ext->type == EXT_KCFG) { + insn[0].src_reg = BPF_PSEUDO_MAP_VALUE; + insn[0].imm = obj->maps[obj->kconfig_map_idx].fd; + insn[1].imm = ext->kcfg.data_off; + } else /* EXT_KSYM */ { + if (ext->ksym.type_id) { /* typed ksyms */ + insn[0].src_reg = BPF_PSEUDO_BTF_ID; + insn[0].imm = ext->ksym.vmlinux_btf_id; + } else { /* typeless ksyms */ + insn[0].imm = (__u32)ext->ksym.addr; + insn[1].imm = ext->ksym.addr >> 32; + } + } + relo->processed = true; + break; + case RELO_CALL: + /* will be handled as a follow up pass */ + break; + default: + pr_warn("prog '%s': relo #%d: bad relo type %d\n", + prog->name, i, relo->type); + return -EINVAL; + } + } + + return 0; } -static int -bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj, - struct reloc_desc *relo) +static int adjust_prog_btf_ext_info(const struct bpf_object *obj, + const struct bpf_program *prog, + const struct btf_ext_info *ext_info, + void **prog_info, __u32 *prog_rec_cnt, + __u32 *prog_rec_sz) { - struct bpf_insn *insn, *new_insn; - struct bpf_program *text; - size_t new_cnt; - int err; + void *copy_start = NULL, *copy_end = NULL; + void *rec, *rec_end, *new_prog_info; + const struct btf_ext_info_sec *sec; + size_t old_sz, new_sz; + const char *sec_name; + int i, off_adj; - if (prog->idx != obj->efile.text_shndx && prog->main_prog_cnt == 0) { - text = bpf_object__find_prog_by_idx(obj, obj->efile.text_shndx); - if (!text) { - pr_warn("no .text section found yet relo into text exist\n"); - return -LIBBPF_ERRNO__RELOC; + for_each_btf_ext_sec(ext_info, sec) { + sec_name = btf__name_by_offset(obj->btf, sec->sec_name_off); + if (!sec_name) + return -EINVAL; + if (strcmp(sec_name, prog->sec_name) != 0) + continue; + + for_each_btf_ext_rec(ext_info, sec, i, rec) { + __u32 insn_off = *(__u32 *)rec / BPF_INSN_SZ; + + if (insn_off < prog->sec_insn_off) + continue; + if (insn_off >= prog->sec_insn_off + prog->sec_insn_cnt) + break; + + if (!copy_start) + copy_start = rec; + copy_end = rec + ext_info->rec_size; } - new_cnt = prog->insns_cnt + text->insns_cnt; - new_insn = reallocarray(prog->insns, new_cnt, sizeof(*insn)); - if (!new_insn) { - pr_warn("oom in prog realloc\n"); + + if (!copy_start) + return -ENOENT; + + /* append func/line info of a given (sub-)program to the main + * program func/line info + */ + old_sz = (size_t)(*prog_rec_cnt) * ext_info->rec_size; + new_sz = old_sz + (copy_end - copy_start); + new_prog_info = realloc(*prog_info, new_sz); + if (!new_prog_info) return -ENOMEM; - } - prog->insns = new_insn; + *prog_info = new_prog_info; + *prog_rec_cnt = new_sz / ext_info->rec_size; + memcpy(new_prog_info + old_sz, copy_start, copy_end - 
copy_start); + + /* Kernel instruction offsets are in units of 8-byte + * instructions, while .BTF.ext instruction offsets generated + * by Clang are in units of bytes. So convert Clang offsets + * into kernel offsets and adjust offset according to program + * relocated position. + */ + off_adj = prog->sub_insn_off - prog->sec_insn_off; + rec = new_prog_info + old_sz; + rec_end = new_prog_info + new_sz; + for (; rec < rec_end; rec += ext_info->rec_size) { + __u32 *insn_off = rec; - if (obj->btf_ext) { - err = bpf_program_reloc_btf_ext(prog, obj, - text->section_name, - prog->insns_cnt); - if (err) - return err; + *insn_off = *insn_off / BPF_INSN_SZ + off_adj; } - - memcpy(new_insn + prog->insns_cnt, text->insns, - text->insns_cnt * sizeof(*insn)); - prog->main_prog_cnt = prog->insns_cnt; - prog->insns_cnt = new_cnt; - pr_debug("added %zd insn from %s to prog %s\n", - text->insns_cnt, text->section_name, - prog->section_name); + *prog_rec_sz = ext_info->rec_size; + return 0; } - insn = &prog->insns[relo->insn_idx]; - insn->imm += relo->sym_off / 8 + prog->main_prog_cnt - relo->insn_idx; - return 0; + return -ENOENT; } static int -bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj) +reloc_prog_func_and_line_info(const struct bpf_object *obj, + struct bpf_program *main_prog, + const struct bpf_program *prog) { - int i, err; + int err; - if (!prog) + /* no .BTF.ext relocation if .BTF.ext is missing or kernel doesn't + * supprot func/line info + */ + if (!obj->btf_ext || !kernel_supports(FEAT_BTF_FUNC)) return 0; - if (obj->btf_ext) { - err = bpf_program_reloc_btf_ext(prog, obj, - prog->section_name, 0); - if (err) + /* only attempt func info relocation if main program's func_info + * relocation was successful + */ + if (main_prog != prog && !main_prog->func_info) + goto line_info; + + err = adjust_prog_btf_ext_info(obj, prog, &obj->btf_ext->func_info, + &main_prog->func_info, + &main_prog->func_info_cnt, + &main_prog->func_info_rec_size); + if (err) { + if (err != -ENOENT) { + pr_warn("prog '%s': error relocating .BTF.ext function info: %d\n", + prog->name, err); + return err; + } + if (main_prog->func_info) { + /* + * Some info has already been found but has problem + * in the last btf_ext reloc. Must have to error out. + */ + pr_warn("prog '%s': missing .BTF.ext function info.\n", prog->name); return err; + } + /* Have problem loading the very first info. Ignore the rest. */ + pr_warn("prog '%s': missing .BTF.ext function info for the main program, skipping all of .BTF.ext func info.\n", + prog->name); } - if (!prog->reloc_desc) +line_info: + /* don't relocate line info if main program's relocation failed */ + if (main_prog != prog && !main_prog->line_info) return 0; - for (i = 0; i < prog->nr_reloc; i++) { - struct reloc_desc *relo = &prog->reloc_desc[i]; - struct bpf_insn *insn = &prog->insns[relo->insn_idx]; - struct extern_desc *ext; + err = adjust_prog_btf_ext_info(obj, prog, &obj->btf_ext->line_info, + &main_prog->line_info, + &main_prog->line_info_cnt, + &main_prog->line_info_rec_size); + if (err) { + if (err != -ENOENT) { + pr_warn("prog '%s': error relocating .BTF.ext line info: %d\n", + prog->name, err); + return err; + } + if (main_prog->line_info) { + /* + * Some info has already been found but has problem + * in the last btf_ext reloc. Must have to error out. + */ + pr_warn("prog '%s': missing .BTF.ext line info.\n", prog->name); + return err; + } + /* Have problem loading the very first info. Ignore the rest. 
*/ + pr_warn("prog '%s': missing .BTF.ext line info for the main program, skipping all of .BTF.ext line info.\n", + prog->name); + } + return 0; +} + +static int cmp_relo_by_insn_idx(const void *key, const void *elem) +{ + size_t insn_idx = *(const size_t *)key; + const struct reloc_desc *relo = elem; + + if (insn_idx == relo->insn_idx) + return 0; + return insn_idx < relo->insn_idx ? -1 : 1; +} + +static struct reloc_desc *find_prog_insn_relo(const struct bpf_program *prog, size_t insn_idx) +{ + return bsearch(&insn_idx, prog->reloc_desc, prog->nr_reloc, + sizeof(*prog->reloc_desc), cmp_relo_by_insn_idx); +} + +static int +bpf_object__reloc_code(struct bpf_object *obj, struct bpf_program *main_prog, + struct bpf_program *prog) +{ + size_t sub_insn_idx, insn_idx, new_cnt; + struct bpf_program *subprog; + struct bpf_insn *insns, *insn; + struct reloc_desc *relo; + int err; - if (relo->insn_idx + 1 >= (int)prog->insns_cnt) { - pr_warn("relocation out of range: '%s'\n", - prog->section_name); + err = reloc_prog_func_and_line_info(obj, main_prog, prog); + if (err) + return err; + + for (insn_idx = 0; insn_idx < prog->sec_insn_cnt; insn_idx++) { + insn = &main_prog->insns[prog->sub_insn_off + insn_idx]; + if (!insn_is_subprog_call(insn)) + continue; + + relo = find_prog_insn_relo(prog, insn_idx); + if (relo && relo->type != RELO_CALL) { + pr_warn("prog '%s': unexpected relo for insn #%zu, type %d\n", + prog->name, insn_idx, relo->type); return -LIBBPF_ERRNO__RELOC; } + if (relo) { + /* sub-program instruction index is a combination of + * an offset of a symbol pointed to by relocation and + * call instruction's imm field; for global functions, + * call always has imm = -1, but for static functions + * relocation is against STT_SECTION and insn->imm + * points to a start of a static function + */ + sub_insn_idx = relo->sym_off / BPF_INSN_SZ + insn->imm + 1; + } else { + /* if subprogram call is to a static function within + * the same ELF section, there won't be any relocation + * emitted, but it also means there is no additional + * offset necessary, insns->imm is relative to + * instruction's original position within the section + */ + sub_insn_idx = prog->sec_insn_off + insn_idx + insn->imm + 1; + } - switch (relo->type) { - case RELO_LD64: - insn[0].src_reg = BPF_PSEUDO_MAP_FD; - insn[0].imm = obj->maps[relo->map_idx].fd; - break; - case RELO_DATA: - insn[0].src_reg = BPF_PSEUDO_MAP_VALUE; - insn[1].imm = insn[0].imm + relo->sym_off; - insn[0].imm = obj->maps[relo->map_idx].fd; - break; - case RELO_EXTERN: - ext = &obj->externs[relo->sym_off]; - if (ext->type == EXT_KCFG) { - insn[0].src_reg = BPF_PSEUDO_MAP_VALUE; - insn[0].imm = obj->maps[obj->kconfig_map_idx].fd; - insn[1].imm = ext->kcfg.data_off; - } else /* EXT_KSYM */ { - insn[0].imm = (__u32)ext->ksym.addr; - insn[1].imm = ext->ksym.addr >> 32; + /* we enforce that sub-programs should be in .text section */ + subprog = find_prog_by_sec_insn(obj, obj->efile.text_shndx, sub_insn_idx); + if (!subprog) { + pr_warn("prog '%s': no .text section found yet sub-program call exists\n", + prog->name); + return -LIBBPF_ERRNO__RELOC; + } + + /* if it's the first call instruction calling into this + * subprogram (meaning this subprog hasn't been processed + * yet) within the context of current main program: + * - append it at the end of main program's instructions blog; + * - process is recursively, while current program is put on hold; + * - if that subprogram calls some other not yet processes + * subprogram, same thing will happen recursively 
until + * there are no more unprocesses subprograms left to append + * and relocate. + */ + if (subprog->sub_insn_off == 0) { + subprog->sub_insn_off = main_prog->insns_cnt; + + new_cnt = main_prog->insns_cnt + subprog->insns_cnt; + insns = libbpf_reallocarray(main_prog->insns, new_cnt, sizeof(*insns)); + if (!insns) { + pr_warn("prog '%s': failed to realloc prog code\n", main_prog->name); + return -ENOMEM; } - break; - case RELO_CALL: - err = bpf_program__reloc_text(prog, obj, relo); + main_prog->insns = insns; + main_prog->insns_cnt = new_cnt; + + memcpy(main_prog->insns + subprog->sub_insn_off, subprog->insns, + subprog->insns_cnt * sizeof(*insns)); + + pr_debug("prog '%s': added %zu insns from sub-prog '%s'\n", + main_prog->name, subprog->insns_cnt, subprog->name); + + err = bpf_object__reloc_code(obj, main_prog, subprog); if (err) return err; - break; - default: - pr_warn("relo #%d: bad relo type %d\n", i, relo->type); - return -EINVAL; } + + /* main_prog->insns memory could have been re-allocated, so + * calculate pointer again + */ + insn = &main_prog->insns[prog->sub_insn_off + insn_idx]; + /* calculate correct instruction position within current main + * prog; each main prog can have a different set of + * subprograms appended (potentially in different order as + * well), so position of any subprog can be different for + * different main programs */ + insn->imm = subprog->sub_insn_off - (prog->sub_insn_off + insn_idx) - 1; + + if (relo) + relo->processed = true; + + pr_debug("prog '%s': insn #%zu relocated, imm %d points to subprog '%s' (now at %zu offset)\n", + prog->name, insn_idx, insn->imm, subprog->name, subprog->sub_insn_off); } - zfree(&prog->reloc_desc); - prog->nr_reloc = 0; + return 0; +} + +/* + * Relocate sub-program calls. + * + * Algorithm operates as follows. Each entry-point BPF program (referred to as + * main prog) is processed separately. For each subprog (non-entry functions, + * that can be called from either entry progs or other subprogs) gets their + * sub_insn_off reset to zero. This serves as indicator that this subprogram + * hasn't been yet appended and relocated within current main prog. Once its + * relocated, sub_insn_off will point at the position within current main prog + * where given subprog was appended. This will further be used to relocate all + * the call instructions jumping into this subprog. + * + * We start with main program and process all call instructions. If the call + * is into a subprog that hasn't been processed (i.e., subprog->sub_insn_off + * is zero), subprog instructions are appended at the end of main program's + * instruction array. Then main program is "put on hold" while we recursively + * process newly appended subprogram. If that subprogram calls into another + * subprogram that hasn't been appended, new subprogram is appended again to + * the *main* prog's instructions (subprog's instructions are always left + * untouched, as they need to be in unmodified state for subsequent main progs + * and subprog instructions are always sent only as part of a main prog) and + * the process continues recursively. Once all the subprogs called from a main + * prog or any of its subprogs are appended (and relocated), all their + * positions within finalized instructions array are known, so it's easy to + * rewrite call instructions with correct relative offsets, corresponding to + * desired target subprog. 
+ *
+ * It's important to realize that some subprogs might not be called from some
+ * main prog and any of its called/used subprogs. Those will keep their
+ * subprog->sub_insn_off as zero at all times and won't be appended to current
+ * main prog and won't be relocated within the context of current main prog.
+ * They might still be used from other main progs later.
+ *
+ * Visually this process can be shown as below. Suppose we have two main
+ * programs, mainA and mainB, and the BPF object contains three subprogs:
+ * subA, subB, and subC. mainA calls only subA, mainB calls only subC, but
+ * subA and subC both call subB:
+ *
+ *        +--------+ +-------+
+ *        |        v v       |
+ *     +--+---+ +--+-+-+ +---+--+
+ *     | subA | | subB | | subC |
+ *     +--+---+ +------+ +---+--+
+ *        ^                  ^
+ *        |                  |
+ *    +---+-------+   +------+----+
+ *    |   mainA   |   |   mainB   |
+ *    +-----------+   +-----------+
+ *
+ * We'll start relocating mainA, will find subA, append it and start
+ * processing subA recursively:
+ *
+ *     +-----------+------+
+ *     |   mainA   | subA |
+ *     +-----------+------+
+ *
+ * At this point we notice that subB is used from subA, so we append it and
+ * relocate (there are no further subcalls from subB):
+ *
+ *     +-----------+------+------+
+ *     |   mainA   | subA | subB |
+ *     +-----------+------+------+
+ *
+ * At this point, we relocate subA calls, then go one level up and finish with
+ * relocating mainA calls. mainA is done.
+ *
+ * For mainB the process is similar but results in a different order. We start
+ * with mainB and skip subA and subB, as mainB never calls them (at least
+ * directly), but we see subC is needed, so we append and start processing it:
+ *
+ *     +-----------+------+
+ *     |   mainB   | subC |
+ *     +-----------+------+
+ * Now we see subC needs subB, so we go back to it, append and relocate it:
+ *
+ *     +-----------+------+------+
+ *     |   mainB   | subC | subB |
+ *     +-----------+------+------+
+ *
+ * At this point we unwind recursion, relocate calls in subC, then in mainB.
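 *
 * A worked example of the final call fixup (all numbers are hypothetical):
 * if mainA is 20 instructions long and subA was appended at
 * sub_insn_off = 20, then a call to subA sitting at instruction #5 of mainA
 * (mainA's own sub_insn_off being 0) is patched to
 * imm = 20 - (0 + 5) - 1 = 14, i.e. "continue 14 instructions past the
 * call", which is exactly where subA now begins.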
+ */ +static int +bpf_object__relocate_calls(struct bpf_object *obj, struct bpf_program *prog) +{ + struct bpf_program *subprog; + int i, j, err; + + /* mark all subprogs as not relocated (yet) within the context of + * current main program + */ + for (i = 0; i < obj->nr_programs; i++) { + subprog = &obj->programs[i]; + if (!prog_is_subprog(obj, subprog)) + continue; + + subprog->sub_insn_off = 0; + for (j = 0; j < subprog->nr_reloc; j++) + if (subprog->reloc_desc[j].type == RELO_CALL) + subprog->reloc_desc[j].processed = false; + } + + err = bpf_object__reloc_code(obj, prog, prog); + if (err) + return err; + + return 0; } @@ -5161,35 +6366,45 @@ bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path) return err; } } - /* ensure .text is relocated first, as it's going to be copied as-is - * later for sub-program calls + /* relocate data references first for all programs and sub-programs, + * as they don't change relative to code locations, so subsequent + * subprogram processing won't need to re-calculate any of them */ for (i = 0; i < obj->nr_programs; i++) { prog = &obj->programs[i]; - if (prog->idx != obj->efile.text_shndx) - continue; - - err = bpf_program__relocate(prog, obj); + err = bpf_object__relocate_data(obj, prog); if (err) { - pr_warn("failed to relocate '%s'\n", prog->section_name); + pr_warn("prog '%s': failed to relocate data references: %d\n", + prog->name, err); return err; } - break; } - /* now relocate everything but .text, which by now is relocated - * properly, so we can copy raw sub-program instructions as is safely + /* now relocate subprogram calls and append used subprograms to main + * programs; each copy of subprogram code needs to be relocated + * differently for each main program, because its code location might + * have changed */ for (i = 0; i < obj->nr_programs; i++) { prog = &obj->programs[i]; - if (prog->idx == obj->efile.text_shndx) + /* sub-program's sub-calls are relocated within the context of + * its main program only + */ + if (prog_is_subprog(obj, prog)) continue; - err = bpf_program__relocate(prog, obj); + err = bpf_object__relocate_calls(obj, prog); if (err) { - pr_warn("failed to relocate '%s'\n", prog->section_name); + pr_warn("prog '%s': failed to relocate calls: %d\n", + prog->name, err); return err; } } + /* free up relocation descriptors */ + for (i = 0; i < obj->nr_programs; i++) { + prog = &obj->programs[i]; + zfree(&prog->reloc_desc); + prog->nr_reloc = 0; + } return 0; } @@ -5230,8 +6445,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, i, (size_t)GELF_R_SYM(rel.r_info)); return -LIBBPF_ERRNO__FORMAT; } - name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name) ? 
: "<?>"; + name = elf_sym_str(obj, sym.st_name) ?: "<?>"; if (sym.st_shndx != obj->efile.btf_maps_shndx) { pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n", i, name); @@ -5293,7 +6507,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, moff /= bpf_ptr_sz; if (moff >= map->init_slots_sz) { new_sz = moff + 1; - tmp = realloc(map->init_slots, new_sz * host_ptr_sz); + tmp = libbpf_reallocarray(map->init_slots, new_sz, host_ptr_sz); if (!tmp) return -ENOMEM; map->init_slots = tmp; @@ -5310,41 +6524,98 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj, return 0; } -static int bpf_object__collect_reloc(struct bpf_object *obj) +static int cmp_relocs(const void *_a, const void *_b) { - int i, err; + const struct reloc_desc *a = _a; + const struct reloc_desc *b = _b; - if (!obj_elf_valid(obj)) { - pr_warn("Internal error: elf object is closed\n"); - return -LIBBPF_ERRNO__INTERNAL; - } + if (a->insn_idx != b->insn_idx) + return a->insn_idx < b->insn_idx ? -1 : 1; + + /* no two relocations should have the same insn_idx, but ... */ + if (a->type != b->type) + return a->type < b->type ? -1 : 1; + + return 0; +} + +static int bpf_object__collect_relos(struct bpf_object *obj) +{ + int i, err; for (i = 0; i < obj->efile.nr_reloc_sects; i++) { GElf_Shdr *shdr = &obj->efile.reloc_sects[i].shdr; Elf_Data *data = obj->efile.reloc_sects[i].data; int idx = shdr->sh_info; - struct bpf_program *prog; if (shdr->sh_type != SHT_REL) { pr_warn("internal error at %d\n", __LINE__); return -LIBBPF_ERRNO__INTERNAL; } - if (idx == obj->efile.st_ops_shndx) { + if (idx == obj->efile.st_ops_shndx) err = bpf_object__collect_st_ops_relos(obj, shdr, data); - } else if (idx == obj->efile.btf_maps_shndx) { + else if (idx == obj->efile.btf_maps_shndx) err = bpf_object__collect_map_relos(obj, shdr, data); - } else { - prog = bpf_object__find_prog_by_idx(obj, idx); - if (!prog) { - pr_warn("relocation failed: no prog in section(%d)\n", idx); - return -LIBBPF_ERRNO__RELOC; - } - err = bpf_program__collect_reloc(prog, shdr, data, obj); - } + else + err = bpf_object__collect_prog_relos(obj, shdr, data); if (err) return err; } + + for (i = 0; i < obj->nr_programs; i++) { + struct bpf_program *p = &obj->programs[i]; + + if (!p->nr_reloc) + continue; + + qsort(p->reloc_desc, p->nr_reloc, sizeof(*p->reloc_desc), cmp_relocs); + } + return 0; +} + +static bool insn_is_helper_call(struct bpf_insn *insn, enum bpf_func_id *func_id) +{ + if (BPF_CLASS(insn->code) == BPF_JMP && + BPF_OP(insn->code) == BPF_CALL && + BPF_SRC(insn->code) == BPF_K && + insn->src_reg == 0 && + insn->dst_reg == 0) { + *func_id = insn->imm; + return true; + } + return false; +} + +static int bpf_object__sanitize_prog(struct bpf_object* obj, struct bpf_program *prog) +{ + struct bpf_insn *insn = prog->insns; + enum bpf_func_id func_id; + int i; + + for (i = 0; i < prog->insns_cnt; i++, insn++) { + if (!insn_is_helper_call(insn, &func_id)) + continue; + + /* on kernels that don't yet support + * bpf_probe_read_{kernel,user}[_str] helpers, fall back + * to bpf_probe_read() which works well for old kernels + */ + switch (func_id) { + case BPF_FUNC_probe_read_kernel: + case BPF_FUNC_probe_read_user: + if (!kernel_supports(FEAT_PROBE_READ_KERN)) + insn->imm = BPF_FUNC_probe_read; + break; + case BPF_FUNC_probe_read_kernel_str: + case BPF_FUNC_probe_read_user_str: + if (!kernel_supports(FEAT_PROBE_READ_KERN)) + insn->imm = BPF_FUNC_probe_read_str; + break; + default: + break; + } + } return 0; } @@ -5364,12 +6635,12 @@ 
load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, memset(&load_attr, 0, sizeof(struct bpf_load_program_attr)); load_attr.prog_type = prog->type; /* old kernels might not support specifying expected_attach_type */ - if (!prog->caps->exp_attach_type && prog->sec_def && + if (!kernel_supports(FEAT_EXP_ATTACH_TYPE) && prog->sec_def && prog->sec_def->is_exp_attach_type_optional) load_attr.expected_attach_type = 0; else load_attr.expected_attach_type = prog->expected_attach_type; - if (prog->caps->name) + if (kernel_supports(FEAT_PROG_NAME)) load_attr.name = prog->name; load_attr.insns = insns; load_attr.insns_cnt = insns_cnt; @@ -5387,7 +6658,7 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, } /* specify func_info/line_info only if kernel supports them */ btf_fd = bpf_object__btf_fd(prog->obj); - if (btf_fd >= 0 && prog->obj->caps.btf_func) { + if (btf_fd >= 0 && kernel_supports(FEAT_BTF_FUNC)) { load_attr.prog_btf_fd = btf_fd; load_attr.func_info = prog->func_info; load_attr.func_info_rec_size = prog->func_info_rec_size; @@ -5413,6 +6684,20 @@ retry_load: if (ret >= 0) { if (log_buf && load_attr.log_level) pr_debug("verifier log:\n%s", log_buf); + + if (prog->obj->rodata_map_idx >= 0 && + kernel_supports(FEAT_PROG_BIND_MAP)) { + struct bpf_map *rodata_map = + &prog->obj->maps[prog->obj->rodata_map_idx]; + + if (bpf_prog_bind_map(ret, bpf_map__fd(rodata_map), NULL)) { + cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); + pr_warn("prog '%s': failed to bind .rodata map: %s\n", + prog->name, cp); + /* Don't fail hard if can't bind rodata. */ + } + } + *pfd = ret; ret = 0; goto out; @@ -5425,7 +6710,7 @@ retry_load: free(log_buf); goto retry_load; } - ret = -errno; + ret = errno ? -errno : -LIBBPF_ERRNO__LOAD; cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); pr_warn("load bpf program failed: %s\n", cp); pr_perm_msg(ret); @@ -5465,8 +6750,7 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver) int err = 0, fd, i, btf_id; if (prog->obj->loaded) { - pr_warn("prog '%s'('%s'): can't load after object was loaded\n", - prog->name, prog->section_name); + pr_warn("prog '%s': can't load after object was loaded\n", prog->name); return -EINVAL; } @@ -5482,7 +6766,7 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver) if (prog->instances.nr < 0 || !prog->instances.fds) { if (prog->preprocessor) { pr_warn("Internal error: can't load program '%s'\n", - prog->section_name); + prog->name); return -LIBBPF_ERRNO__INTERNAL; } @@ -5497,8 +6781,8 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver) if (!prog->preprocessor) { if (prog->instances.nr != 1) { - pr_warn("Program '%s' is inconsistent: nr(%d) != 1\n", - prog->section_name, prog->instances.nr); + pr_warn("prog '%s': inconsistent nr(%d) != 1\n", + prog->name, prog->instances.nr); } err = load_program(prog, prog->insns, prog->insns_cnt, license, kern_ver, &fd); @@ -5516,13 +6800,13 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver) prog->insns_cnt, &result); if (err) { pr_warn("Preprocessing the %dth instance of program '%s' failed\n", - i, prog->section_name); + i, prog->name); goto out; } if (!result.new_insn_ptr || !result.new_insn_cnt) { pr_debug("Skip loading the %dth instance of program '%s'\n", - i, prog->section_name); + i, prog->name); prog->instances.fds[i] = -1; if (result.pfd) *result.pfd = -1; @@ -5533,7 +6817,7 @@ int bpf_program__load(struct bpf_program *prog, 
char *license, __u32 kern_ver) result.new_insn_cnt, license, kern_ver, &fd); if (err) { pr_warn("Loading the %dth instance of program '%s' failed\n", - i, prog->section_name); + i, prog->name); goto out; } @@ -5543,18 +6827,12 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver) } out: if (err) - pr_warn("failed to load program '%s'\n", prog->section_name); + pr_warn("failed to load program '%s'\n", prog->name); zfree(&prog->insns); prog->insns_cnt = 0; return err; } -static bool bpf_program__is_function_storage(const struct bpf_program *prog, - const struct bpf_object *obj) -{ - return prog->idx == obj->efile.text_shndx && obj->has_pseudo_calls; -} - static int bpf_object__load_progs(struct bpf_object *obj, int log_level) { @@ -5564,11 +6842,17 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level) for (i = 0; i < obj->nr_programs; i++) { prog = &obj->programs[i]; - if (bpf_program__is_function_storage(prog, obj)) + err = bpf_object__sanitize_prog(obj, prog); + if (err) + return err; + } + + for (i = 0; i < obj->nr_programs; i++) { + prog = &obj->programs[i]; + if (prog_is_subprog(obj, prog)) continue; if (!prog->load) { - pr_debug("prog '%s'('%s'): skipped loading\n", - prog->name, prog->section_name); + pr_debug("prog '%s': skipped loading\n", prog->name); continue; } prog->log_level |= log_level; @@ -5629,18 +6913,19 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz, err = err ? : bpf_object__collect_externs(obj); err = err ? : bpf_object__finalize_btf(obj); err = err ? : bpf_object__init_maps(obj, opts); - err = err ? : bpf_object__init_prog_names(obj); - err = err ? : bpf_object__collect_reloc(obj); + err = err ? : bpf_object__collect_relos(obj); if (err) goto out; bpf_object__elf_finish(obj); bpf_object__for_each_program(prog, obj) { - prog->sec_def = find_sec_def(prog->section_name); + prog->sec_def = find_sec_def(prog->sec_name); if (!prog->sec_def) /* couldn't guess, but user might manually specify */ continue; + if (prog->sec_def->is_sleepable) + prog->prog_flags |= BPF_F_SLEEPABLE; bpf_program__set_type(prog, prog->sec_def->prog_type); bpf_program__set_expected_attach_type(prog, prog->sec_def->expected_attach_type); @@ -5750,11 +7035,11 @@ static int bpf_object__sanitize_maps(struct bpf_object *obj) bpf_object__for_each_map(m, obj) { if (!bpf_map__is_internal(m)) continue; - if (!obj->caps.global_data) { + if (!kernel_supports(FEAT_GLOBAL_DATA)) { pr_warn("kernel doesn't support global data\n"); return -ENOTSUP; } - if (!obj->caps.array_mmap) + if (!kernel_supports(FEAT_ARRAY_MMAP)) m->def.map_flags ^= BPF_F_MMAPABLE; } @@ -5809,10 +7094,72 @@ out: return err; } +static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj) +{ + struct extern_desc *ext; + int i, id; + + for (i = 0; i < obj->nr_extern; i++) { + const struct btf_type *targ_var, *targ_type; + __u32 targ_type_id, local_type_id; + const char *targ_var_name; + int ret; + + ext = &obj->externs[i]; + if (ext->type != EXT_KSYM || !ext->ksym.type_id) + continue; + + id = btf__find_by_name_kind(obj->btf_vmlinux, ext->name, + BTF_KIND_VAR); + if (id <= 0) { + pr_warn("extern (ksym) '%s': failed to find BTF ID in vmlinux BTF.\n", + ext->name); + return -ESRCH; + } + + /* find local type_id */ + local_type_id = ext->ksym.type_id; + + /* find target type_id */ + targ_var = btf__type_by_id(obj->btf_vmlinux, id); + targ_var_name = btf__name_by_offset(obj->btf_vmlinux, + targ_var->name_off); + targ_type = skip_mods_and_typedefs(obj->btf_vmlinux, + 
targ_var->type, + &targ_type_id); + + ret = bpf_core_types_are_compat(obj->btf, local_type_id, + obj->btf_vmlinux, targ_type_id); + if (ret <= 0) { + const struct btf_type *local_type; + const char *targ_name, *local_name; + + local_type = btf__type_by_id(obj->btf, local_type_id); + local_name = btf__name_by_offset(obj->btf, + local_type->name_off); + targ_name = btf__name_by_offset(obj->btf_vmlinux, + targ_type->name_off); + + pr_warn("extern (ksym) '%s': incompatible types, expected [%d] %s %s, but kernel has [%d] %s %s\n", + ext->name, local_type_id, + btf_kind_str(local_type), local_name, targ_type_id, + btf_kind_str(targ_type), targ_name); + return -EINVAL; + } + + ext->is_set = true; + ext->ksym.vmlinux_btf_id = id; + pr_debug("extern (ksym) '%s': resolved to [%d] %s %s\n", + ext->name, id, btf_kind_str(targ_var), targ_var_name); + } + return 0; +} + static int bpf_object__resolve_externs(struct bpf_object *obj, const char *extra_kconfig) { bool need_config = false, need_kallsyms = false; + bool need_vmlinux_btf = false; struct extern_desc *ext; void *kcfg_data = NULL; int err, i; @@ -5843,7 +7190,10 @@ static int bpf_object__resolve_externs(struct bpf_object *obj, strncmp(ext->name, "CONFIG_", 7) == 0) { need_config = true; } else if (ext->type == EXT_KSYM) { - need_kallsyms = true; + if (ext->ksym.type_id) + need_vmlinux_btf = true; + else + need_kallsyms = true; } else { pr_warn("unrecognized extern '%s'\n", ext->name); return -EINVAL; @@ -5872,6 +7222,11 @@ static int bpf_object__resolve_externs(struct bpf_object *obj, if (err) return -EINVAL; } + if (need_vmlinux_btf) { + err = bpf_object__resolve_ksyms_btf_id(obj); + if (err) + return -EINVAL; + } for (i = 0; i < obj->nr_extern; i++) { ext = &obj->externs[i]; @@ -5904,11 +7259,10 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr) } err = bpf_object__probe_loading(obj); - err = err ? : bpf_object__probe_caps(obj); + err = err ? : bpf_object__load_vmlinux_btf(obj); err = err ? : bpf_object__resolve_externs(obj, obj->kconfig); err = err ? : bpf_object__sanitize_and_load_btf(obj); err = err ? : bpf_object__sanitize_maps(obj); - err = err ? : bpf_object__load_vmlinux_btf(obj); err = err ? : bpf_object__init_kern_struct_ops_maps(obj); err = err ? : bpf_object__create_maps(obj); err = err ? 
: bpf_object__relocate(obj, attr->target_btf_path); @@ -6016,7 +7370,7 @@ int bpf_program__pin_instance(struct bpf_program *prog, const char *path, if (instance < 0 || instance >= prog->instances.nr) { pr_warn("invalid prog instance %d of prog %s (max %d)\n", - instance, prog->section_name, prog->instances.nr); + instance, prog->name, prog->instances.nr); return -EINVAL; } @@ -6047,7 +7401,7 @@ int bpf_program__unpin_instance(struct bpf_program *prog, const char *path, if (instance < 0 || instance >= prog->instances.nr) { pr_warn("invalid prog instance %d of prog %s (max %d)\n", - instance, prog->section_name, prog->instances.nr); + instance, prog->name, prog->instances.nr); return -EINVAL; } @@ -6077,8 +7431,7 @@ int bpf_program__pin(struct bpf_program *prog, const char *path) } if (prog->instances.nr <= 0) { - pr_warn("no instances of prog %s to pin\n", - prog->section_name); + pr_warn("no instances of prog %s to pin\n", prog->name); return -EINVAL; } @@ -6140,8 +7493,7 @@ int bpf_program__unpin(struct bpf_program *prog, const char *path) } if (prog->instances.nr <= 0) { - pr_warn("no instances of prog %s to pin\n", - prog->section_name); + pr_warn("no instances of prog %s to pin\n", prog->name); return -EINVAL; } @@ -6633,7 +7985,7 @@ bpf_program__next(struct bpf_program *prev, const struct bpf_object *obj) do { prog = __bpf_program__iter(prog, obj, true); - } while (prog && bpf_program__is_function_storage(prog, obj)); + } while (prog && prog_is_subprog(obj, prog)); return prog; } @@ -6645,7 +7997,7 @@ bpf_program__prev(struct bpf_program *next, const struct bpf_object *obj) do { prog = __bpf_program__iter(prog, obj, false); - } while (prog && bpf_program__is_function_storage(prog, obj)); + } while (prog && prog_is_subprog(obj, prog)); return prog; } @@ -6676,11 +8028,16 @@ const char *bpf_program__name(const struct bpf_program *prog) return prog->name; } +const char *bpf_program__section_name(const struct bpf_program *prog) +{ + return prog->sec_name; +} + const char *bpf_program__title(const struct bpf_program *prog, bool needs_copy) { const char *title; - title = prog->section_name; + title = prog->sec_name; if (needs_copy) { title = strdup(title); if (!title) { @@ -6713,7 +8070,7 @@ int bpf_program__fd(const struct bpf_program *prog) size_t bpf_program__size(const struct bpf_program *prog) { - return prog->insns_cnt * sizeof(struct bpf_insn); + return prog->insns_cnt * BPF_INSN_SZ; } int bpf_program__set_prep(struct bpf_program *prog, int nr_instances, @@ -6753,14 +8110,14 @@ int bpf_program__nth_fd(const struct bpf_program *prog, int n) if (n >= prog->instances.nr || n < 0) { pr_warn("Can't get the %dth fd from program %s: only %d instances\n", - n, prog->section_name, prog->instances.nr); + n, prog->name, prog->instances.nr); return -EINVAL; } fd = prog->instances.fds[n]; if (fd < 0) { pr_warn("%dth instance of program '%s' is invalid\n", - n, prog->section_name); + n, prog->name); return -ENOENT; } @@ -6910,6 +8267,21 @@ static const struct bpf_sec_def section_defs[] = { .expected_attach_type = BPF_TRACE_FEXIT, .is_attach_btf = true, .attach_fn = attach_trace), + SEC_DEF("fentry.s/", TRACING, + .expected_attach_type = BPF_TRACE_FENTRY, + .is_attach_btf = true, + .is_sleepable = true, + .attach_fn = attach_trace), + SEC_DEF("fmod_ret.s/", TRACING, + .expected_attach_type = BPF_MODIFY_RETURN, + .is_attach_btf = true, + .is_sleepable = true, + .attach_fn = attach_trace), + SEC_DEF("fexit.s/", TRACING, + .expected_attach_type = BPF_TRACE_FEXIT, + .is_attach_btf = true, + 
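The newly exported bpf_program__section_name() replaces the confusingly named bpf_program__title(). A minimal user-space sketch (the object file name is illustrative):

#include <stdio.h>
#include <bpf/libbpf.h>

int main(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	obj = bpf_object__open("prog.bpf.o");
	if (libbpf_get_error(obj))
		return 1;

	bpf_object__for_each_program(prog, obj) {
		/* function name vs. ELF section name of each program */
		printf("%s (section %s)\n", bpf_program__name(prog),
		       bpf_program__section_name(prog));
	}

	bpf_object__close(obj);
	return 0;
}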
.is_sleepable = true, + .attach_fn = attach_trace), SEC_DEF("freplace/", EXT, .is_attach_btf = true, .attach_fn = attach_trace), @@ -6917,6 +8289,11 @@ static const struct bpf_sec_def section_defs[] = { .is_attach_btf = true, .expected_attach_type = BPF_LSM_MAC, .attach_fn = attach_lsm), + SEC_DEF("lsm.s/", LSM, + .is_attach_btf = true, + .is_sleepable = true, + .expected_attach_type = BPF_LSM_MAC, + .attach_fn = attach_lsm), SEC_DEF("iter/", TRACING, .expected_attach_type = BPF_TRACE_ITER, .is_attach_btf = true, @@ -7100,7 +8477,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, const struct btf *btf; struct bpf_map *map; Elf_Data *symbols; - unsigned int moff; + unsigned int moff, insn_idx; const char *name; __u32 member_idx; GElf_Sym sym; @@ -7122,8 +8499,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, return -LIBBPF_ERRNO__FORMAT; } - name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, - sym.st_name) ? : "<?>"; + name = elf_sym_str(obj, sym.st_name) ?: "<?>"; map = find_struct_ops_map_by_offset(obj, rel.r_offset); if (!map) { pr_warn("struct_ops reloc: cannot find map at rel.r_offset %zu\n", @@ -7146,6 +8522,12 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, map->name, (size_t)rel.r_offset, shdr_idx); return -LIBBPF_ERRNO__RELOC; } + if (sym.st_value % BPF_INSN_SZ) { + pr_warn("struct_ops reloc %s: invalid target program offset %llu\n", + map->name, (unsigned long long)sym.st_value); + return -LIBBPF_ERRNO__FORMAT; + } + insn_idx = sym.st_value / BPF_INSN_SZ; member = find_member_by_offset(st_ops->type, moff * 8); if (!member) { @@ -7162,7 +8544,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, return -EINVAL; } - prog = bpf_object__find_prog_by_idx(obj, shdr_idx); + prog = find_prog_by_sec_insn(obj, shdr_idx, insn_idx); if (!prog) { pr_warn("struct_ops reloc %s: cannot find prog at shdr_idx %u to relocate func ptr %s\n", map->name, shdr_idx, name); @@ -7172,7 +8554,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, if (prog->type == BPF_PROG_TYPE_UNSPEC) { const struct bpf_sec_def *sec_def; - sec_def = find_sec_def(prog->section_name); + sec_def = find_sec_def(prog->sec_name); if (sec_def && sec_def->prog_type != BPF_PROG_TYPE_STRUCT_OPS) { /* for pr_warn */ @@ -7195,7 +8577,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, invalid_prog: pr_warn("struct_ops reloc %s: cannot use prog %s in sec %s with type %u attach_btf_id %u expected_attach_type %u for func ptr %s\n", - map->name, prog->name, prog->section_name, prog->type, + map->name, prog->name, prog->sec_name, prog->type, prog->attach_btf_id, prog->expected_attach_type, name); return -EINVAL; } @@ -7299,7 +8681,7 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog) { enum bpf_attach_type attach_type = prog->expected_attach_type; __u32 attach_prog_fd = prog->attach_prog_fd; - const char *name = prog->section_name; + const char *name = prog->sec_name; int i, err; if (!name) @@ -7640,7 +9022,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr, prog->prog_ifindex = attr->ifindex; prog->log_level = attr->log_level; - prog->prog_flags = attr->prog_flags; + prog->prog_flags |= attr->prog_flags; if (!first_prog) first_prog = prog; } @@ -7826,14 +9208,14 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, int prog_fd, err; if (pfd < 0) { - pr_warn("program '%s': invalid perf event FD %d\n", - bpf_program__title(prog, false), pfd); + pr_warn("prog 
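With the new ".s"-suffixed section definitions, placing a program in e.g. "lsm.s/..." is enough for libbpf to set BPF_F_SLEEPABLE at load time. A BPF-side sketch, assuming vmlinux.h and a kernel built with CONFIG_BPF_LSM (the file_open hook is just an example):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* ".s" suffix => sleepable program, so it may use sleepable helpers. */
SEC("lsm.s/file_open")
int BPF_PROG(restrict_open, struct file *file)
{
	return 0;	/* 0 == allow */
}

char LICENSE[] SEC("license") = "GPL";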
'%s': invalid perf event FD %d\n", + prog->name, pfd); return ERR_PTR(-EINVAL); } prog_fd = bpf_program__fd(prog); if (prog_fd < 0) { - pr_warn("program '%s': can't attach BPF program w/o FD (did you load it?)\n", - bpf_program__title(prog, false)); + pr_warn("prog '%s': can't attach BPF program w/o FD (did you load it?)\n", + prog->name); return ERR_PTR(-EINVAL); } @@ -7846,20 +9228,18 @@ struct bpf_link *bpf_program__attach_perf_event(struct bpf_program *prog, if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) { err = -errno; free(link); - pr_warn("program '%s': failed to attach to pfd %d: %s\n", - bpf_program__title(prog, false), pfd, - libbpf_strerror_r(err, errmsg, sizeof(errmsg))); + pr_warn("prog '%s': failed to attach to pfd %d: %s\n", + prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg))); if (err == -EPROTO) - pr_warn("program '%s': try add PERF_SAMPLE_CALLCHAIN to or remove exclude_callchain_[kernel|user] from pfd %d\n", - bpf_program__title(prog, false), pfd); + pr_warn("prog '%s': try add PERF_SAMPLE_CALLCHAIN to or remove exclude_callchain_[kernel|user] from pfd %d\n", + prog->name, pfd); return ERR_PTR(err); } if (ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) { err = -errno; free(link); - pr_warn("program '%s': failed to enable pfd %d: %s\n", - bpf_program__title(prog, false), pfd, - libbpf_strerror_r(err, errmsg, sizeof(errmsg))); + pr_warn("prog '%s': failed to enable pfd %d: %s\n", + prog->name, pfd, libbpf_strerror_r(err, errmsg, sizeof(errmsg))); return ERR_PTR(err); } return link; @@ -7981,9 +9361,8 @@ struct bpf_link *bpf_program__attach_kprobe(struct bpf_program *prog, pfd = perf_event_open_probe(false /* uprobe */, retprobe, func_name, 0 /* offset */, -1 /* pid */); if (pfd < 0) { - pr_warn("program '%s': failed to create %s '%s' perf event: %s\n", - bpf_program__title(prog, false), - retprobe ? "kretprobe" : "kprobe", func_name, + pr_warn("prog '%s': failed to create %s '%s' perf event: %s\n", + prog->name, retprobe ? "kretprobe" : "kprobe", func_name, libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); return ERR_PTR(pfd); } @@ -7991,9 +9370,8 @@ struct bpf_link *bpf_program__attach_kprobe(struct bpf_program *prog, if (IS_ERR(link)) { close(pfd); err = PTR_ERR(link); - pr_warn("program '%s': failed to attach to %s '%s': %s\n", - bpf_program__title(prog, false), - retprobe ? "kretprobe" : "kprobe", func_name, + pr_warn("prog '%s': failed to attach to %s '%s': %s\n", + prog->name, retprobe ? "kretprobe" : "kprobe", func_name, libbpf_strerror_r(err, errmsg, sizeof(errmsg))); return link; } @@ -8006,7 +9384,7 @@ static struct bpf_link *attach_kprobe(const struct bpf_sec_def *sec, const char *func_name; bool retprobe; - func_name = bpf_program__title(prog, false) + sec->len; + func_name = prog->sec_name + sec->len; retprobe = strcmp(sec->sec, "kretprobe/") == 0; return bpf_program__attach_kprobe(prog, retprobe, func_name); @@ -8024,9 +9402,8 @@ struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog, pfd = perf_event_open_probe(true /* uprobe */, retprobe, binary_path, func_offset, pid); if (pfd < 0) { - pr_warn("program '%s': failed to create %s '%s:0x%zx' perf event: %s\n", - bpf_program__title(prog, false), - retprobe ? "uretprobe" : "uprobe", + pr_warn("prog '%s': failed to create %s '%s:0x%zx' perf event: %s\n", + prog->name, retprobe ? 
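For reference, the kprobe attach API whose warnings were reworded above is driven from user space as below; the kernel function name is an example and `prog` is assumed to come from an already-loaded bpf_object:

#include <stdio.h>
#include <bpf/libbpf.h>

static int attach_unlink_kprobe(struct bpf_program *prog)
{
	struct bpf_link *link;

	/* third argument is the kernel function; passing true attaches a kretprobe */
	link = bpf_program__attach_kprobe(prog, false, "do_unlinkat");
	if (libbpf_get_error(link)) {
		fprintf(stderr, "attach failed: %ld\n", libbpf_get_error(link));
		return -1;
	}
	return 0;
}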
"uretprobe" : "uprobe", binary_path, func_offset, libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); return ERR_PTR(pfd); @@ -8035,9 +9412,8 @@ struct bpf_link *bpf_program__attach_uprobe(struct bpf_program *prog, if (IS_ERR(link)) { close(pfd); err = PTR_ERR(link); - pr_warn("program '%s': failed to attach to %s '%s:0x%zx': %s\n", - bpf_program__title(prog, false), - retprobe ? "uretprobe" : "uprobe", + pr_warn("prog '%s': failed to attach to %s '%s:0x%zx': %s\n", + prog->name, retprobe ? "uretprobe" : "uprobe", binary_path, func_offset, libbpf_strerror_r(err, errmsg, sizeof(errmsg))); return link; @@ -8105,9 +9481,8 @@ struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog, pfd = perf_event_open_tracepoint(tp_category, tp_name); if (pfd < 0) { - pr_warn("program '%s': failed to create tracepoint '%s/%s' perf event: %s\n", - bpf_program__title(prog, false), - tp_category, tp_name, + pr_warn("prog '%s': failed to create tracepoint '%s/%s' perf event: %s\n", + prog->name, tp_category, tp_name, libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); return ERR_PTR(pfd); } @@ -8115,9 +9490,8 @@ struct bpf_link *bpf_program__attach_tracepoint(struct bpf_program *prog, if (IS_ERR(link)) { close(pfd); err = PTR_ERR(link); - pr_warn("program '%s': failed to attach to tracepoint '%s/%s': %s\n", - bpf_program__title(prog, false), - tp_category, tp_name, + pr_warn("prog '%s': failed to attach to tracepoint '%s/%s': %s\n", + prog->name, tp_category, tp_name, libbpf_strerror_r(err, errmsg, sizeof(errmsg))); return link; } @@ -8130,7 +9504,7 @@ static struct bpf_link *attach_tp(const struct bpf_sec_def *sec, char *sec_name, *tp_cat, *tp_name; struct bpf_link *link; - sec_name = strdup(bpf_program__title(prog, false)); + sec_name = strdup(prog->sec_name); if (!sec_name) return ERR_PTR(-ENOMEM); @@ -8159,8 +9533,7 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(struct bpf_program *prog, prog_fd = bpf_program__fd(prog); if (prog_fd < 0) { - pr_warn("program '%s': can't attach before loaded\n", - bpf_program__title(prog, false)); + pr_warn("prog '%s': can't attach before loaded\n", prog->name); return ERR_PTR(-EINVAL); } @@ -8173,9 +9546,8 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(struct bpf_program *prog, if (pfd < 0) { pfd = -errno; free(link); - pr_warn("program '%s': failed to attach to raw tracepoint '%s': %s\n", - bpf_program__title(prog, false), tp_name, - libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); + pr_warn("prog '%s': failed to attach to raw tracepoint '%s': %s\n", + prog->name, tp_name, libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); return ERR_PTR(pfd); } link->fd = pfd; @@ -8185,7 +9557,7 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(struct bpf_program *prog, static struct bpf_link *attach_raw_tp(const struct bpf_sec_def *sec, struct bpf_program *prog) { - const char *tp_name = bpf_program__title(prog, false) + sec->len; + const char *tp_name = prog->sec_name + sec->len; return bpf_program__attach_raw_tracepoint(prog, tp_name); } @@ -8199,8 +9571,7 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog) prog_fd = bpf_program__fd(prog); if (prog_fd < 0) { - pr_warn("program '%s': can't attach before loaded\n", - bpf_program__title(prog, false)); + pr_warn("prog '%s': can't attach before loaded\n", prog->name); return ERR_PTR(-EINVAL); } @@ -8213,9 +9584,8 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog) if (pfd < 0) { pfd = -errno; free(link); - pr_warn("program '%s': failed to attach: %s\n", - 
bpf_program__title(prog, false), - libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); + pr_warn("prog '%s': failed to attach: %s\n", + prog->name, libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); return ERR_PTR(pfd); } link->fd = pfd; @@ -8251,9 +9621,11 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec, } static struct bpf_link * -bpf_program__attach_fd(struct bpf_program *prog, int target_fd, +bpf_program__attach_fd(struct bpf_program *prog, int target_fd, int btf_id, const char *target_name) { + DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts, + .target_btf_id = btf_id); enum bpf_attach_type attach_type; char errmsg[STRERR_BUFSIZE]; struct bpf_link *link; @@ -8261,8 +9633,7 @@ bpf_program__attach_fd(struct bpf_program *prog, int target_fd, prog_fd = bpf_program__fd(prog); if (prog_fd < 0) { - pr_warn("program '%s': can't attach before loaded\n", - bpf_program__title(prog, false)); + pr_warn("prog '%s': can't attach before loaded\n", prog->name); return ERR_PTR(-EINVAL); } @@ -8272,12 +9643,12 @@ bpf_program__attach_fd(struct bpf_program *prog, int target_fd, link->detach = &bpf_link__detach_fd; attach_type = bpf_program__get_expected_attach_type(prog); - link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL); + link_fd = bpf_link_create(prog_fd, target_fd, attach_type, &opts); if (link_fd < 0) { link_fd = -errno; free(link); - pr_warn("program '%s': failed to attach to %s: %s\n", - bpf_program__title(prog, false), target_name, + pr_warn("prog '%s': failed to attach to %s: %s\n", + prog->name, target_name, libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); return ERR_PTR(link_fd); } @@ -8288,19 +9659,51 @@ bpf_program__attach_fd(struct bpf_program *prog, int target_fd, struct bpf_link * bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd) { - return bpf_program__attach_fd(prog, cgroup_fd, "cgroup"); + return bpf_program__attach_fd(prog, cgroup_fd, 0, "cgroup"); } struct bpf_link * bpf_program__attach_netns(struct bpf_program *prog, int netns_fd) { - return bpf_program__attach_fd(prog, netns_fd, "netns"); + return bpf_program__attach_fd(prog, netns_fd, 0, "netns"); } struct bpf_link *bpf_program__attach_xdp(struct bpf_program *prog, int ifindex) { /* target_fd/target_ifindex use the same field in LINK_CREATE */ - return bpf_program__attach_fd(prog, ifindex, "xdp"); + return bpf_program__attach_fd(prog, ifindex, 0, "xdp"); +} + +struct bpf_link *bpf_program__attach_freplace(struct bpf_program *prog, + int target_fd, + const char *attach_func_name) +{ + int btf_id; + + if (!!target_fd != !!attach_func_name) { + pr_warn("prog '%s': supply none or both of target_fd and attach_func_name\n", + prog->name); + return ERR_PTR(-EINVAL); + } + + if (prog->type != BPF_PROG_TYPE_EXT) { + pr_warn("prog '%s': only BPF_PROG_TYPE_EXT can attach as freplace", + prog->name); + return ERR_PTR(-EINVAL); + } + + if (target_fd) { + btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd); + if (btf_id < 0) + return ERR_PTR(btf_id); + + return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace"); + } else { + /* no target, so use raw_tracepoint_open for compatibility + * with old kernels + */ + return bpf_program__attach_trace(prog); + } } struct bpf_link * @@ -8321,8 +9724,7 @@ bpf_program__attach_iter(struct bpf_program *prog, prog_fd = bpf_program__fd(prog); if (prog_fd < 0) { - pr_warn("program '%s': can't attach before loaded\n", - bpf_program__title(prog, false)); + pr_warn("prog '%s': can't attach before loaded\n", prog->name); return 
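bpf_program__attach_freplace() added above takes an optional target program FD plus the name of the function being replaced; when both are omitted it falls back to raw_tracepoint_open for older kernels. Sketch, with an illustrative function name:

#include <bpf/libbpf.h>

/* `ext_prog` is a loaded BPF_PROG_TYPE_EXT program; `target_fd` is the FD of
 * the already-loaded program whose subprogram is being replaced. */
static struct bpf_link *replace_func(struct bpf_program *ext_prog, int target_fd)
{
	return bpf_program__attach_freplace(ext_prog, target_fd, "xdp_dummy_prog");
}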
ERR_PTR(-EINVAL); } @@ -8336,9 +9738,8 @@ bpf_program__attach_iter(struct bpf_program *prog, if (link_fd < 0) { link_fd = -errno; free(link); - pr_warn("program '%s': failed to attach to iterator: %s\n", - bpf_program__title(prog, false), - libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); + pr_warn("prog '%s': failed to attach to iterator: %s\n", + prog->name, libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); return ERR_PTR(link_fd); } link->fd = link_fd; @@ -8349,7 +9750,7 @@ struct bpf_link *bpf_program__attach(struct bpf_program *prog) { const struct bpf_sec_def *sec_def; - sec_def = find_sec_def(bpf_program__title(prog, false)); + sec_def = find_sec_def(prog->sec_name); if (!sec_def || !sec_def->attach_fn) return ERR_PTR(-ESRCH); @@ -8594,7 +9995,7 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, struct perf_buffer_params p = {}; struct perf_event_attr attr = { 0, }; - attr.config = PERF_COUNT_SW_BPF_OUTPUT, + attr.config = PERF_COUNT_SW_BPF_OUTPUT; attr.type = PERF_TYPE_SOFTWARE; attr.sample_type = PERF_SAMPLE_RAW; attr.sample_period = 1; @@ -8832,6 +10233,11 @@ static int perf_buffer__process_records(struct perf_buffer *pb, return 0; } +int perf_buffer__epoll_fd(const struct perf_buffer *pb) +{ + return pb->epoll_fd; +} + int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms) { int i, cnt, err; @@ -8849,6 +10255,55 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms) return cnt < 0 ? -errno : cnt; } +/* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer + * manager. + */ +size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb) +{ + return pb->cpu_cnt; +} + +/* + * Return perf_event FD of a ring buffer in *buf_idx* slot of + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using + * select()/poll()/epoll() Linux syscalls. + */ +int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx) +{ + struct perf_cpu_buf *cpu_buf; + + if (buf_idx >= pb->cpu_cnt) + return -EINVAL; + + cpu_buf = pb->cpu_bufs[buf_idx]; + if (!cpu_buf) + return -ENOENT; + + return cpu_buf->fd; +} + +/* + * Consume data from perf ring buffer corresponding to slot *buf_idx* in + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to + * consume, do nothing and return success. + * Returns: + * - 0 on success; + * - <0 on failure. 
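bpf_program__attach() above keys off the section-definition table, so most programs can be attached generically without caring which attach API applies; a sketch:

#include <errno.h>
#include <bpf/libbpf.h>

static int autoattach_all(struct bpf_object *obj)
{
	struct bpf_program *prog;

	bpf_object__for_each_program(prog, obj) {
		struct bpf_link *link = bpf_program__attach(prog);

		/* -ESRCH means this section name has no auto-attach handler */
		if (libbpf_get_error(link) &&
		    libbpf_get_error(link) != -ESRCH)
			return -1;
	}
	return 0;
}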
+ */ +int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx) +{ + struct perf_cpu_buf *cpu_buf; + + if (buf_idx >= pb->cpu_cnt) + return -EINVAL; + + cpu_buf = pb->cpu_bufs[buf_idx]; + if (!cpu_buf) + return -ENOENT; + + return perf_buffer__process_records(pb, cpu_buf); +} + int perf_buffer__consume(struct perf_buffer *pb) { int i, err; @@ -8861,7 +10316,7 @@ int perf_buffer__consume(struct perf_buffer *pb) err = perf_buffer__process_records(pb, cpu_buf); if (err) { - pr_warn("error while processing records: %d\n", err); + pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err); return err; } } @@ -9129,9 +10584,8 @@ int bpf_program__set_attach_target(struct bpf_program *prog, btf_id = libbpf_find_prog_btf_id(attach_func_name, attach_prog_fd); else - btf_id = __find_vmlinux_btf_id(prog->obj->btf_vmlinux, - attach_func_name, - prog->expected_attach_type); + btf_id = libbpf_find_vmlinux_btf_id(attach_func_name, + prog->expected_attach_type); if (btf_id < 0) return btf_id; @@ -9365,12 +10819,11 @@ int bpf_object__attach_skeleton(struct bpf_object_skeleton *s) struct bpf_program *prog = *s->progs[i].prog; struct bpf_link **link = s->progs[i].link; const struct bpf_sec_def *sec_def; - const char *sec_name = bpf_program__title(prog, false); if (!prog->load) continue; - sec_def = find_sec_def(sec_name); + sec_def = find_sec_def(prog->sec_name); if (!sec_def || !sec_def->attach_fn) continue; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 5ecb4069a9f0..6909ee81113a 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -198,8 +198,9 @@ LIBBPF_API void bpf_program__set_ifindex(struct bpf_program *prog, __u32 ifindex); LIBBPF_API const char *bpf_program__name(const struct bpf_program *prog); -LIBBPF_API const char *bpf_program__title(const struct bpf_program *prog, - bool needs_copy); +LIBBPF_API const char *bpf_program__section_name(const struct bpf_program *prog); +LIBBPF_API LIBBPF_DEPRECATED("BPF program title is confusing term; please use bpf_program__section_name() instead") +const char *bpf_program__title(const struct bpf_program *prog, bool needs_copy); LIBBPF_API bool bpf_program__autoload(const struct bpf_program *prog); LIBBPF_API int bpf_program__set_autoload(struct bpf_program *prog, bool autoload); @@ -260,6 +261,9 @@ LIBBPF_API struct bpf_link * bpf_program__attach_netns(struct bpf_program *prog, int netns_fd); LIBBPF_API struct bpf_link * bpf_program__attach_xdp(struct bpf_program *prog, int ifindex); +LIBBPF_API struct bpf_link * +bpf_program__attach_freplace(struct bpf_program *prog, + int target_fd, const char *attach_func_name); struct bpf_map; @@ -588,8 +592,12 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt, const struct perf_buffer_raw_opts *opts); LIBBPF_API void perf_buffer__free(struct perf_buffer *pb); +LIBBPF_API int perf_buffer__epoll_fd(const struct perf_buffer *pb); LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms); LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb); +LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx); +LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb); +LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx); typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(struct perf_event_header *hdr, diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index e35bd6cdbdbf..4ebfadf45b47 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ 
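The perf_buffer additions (perf_buffer__epoll_fd(), __buffer_cnt(), __buffer_fd(), __consume_buffer()) let callers drive ring consumption from their own event loop: the epoll FD can be nested into an external poll set and individual per-CPU buffers drained without blocking. A minimal sketch, assuming the perf_buffer was created elsewhere:

#include <bpf/libbpf.h>

static void drain_ready_buffers(struct perf_buffer *pb)
{
	size_t i, n = perf_buffer__buffer_cnt(pb);	/* number of map slots */

	for (i = 0; i < n; i++) {
		if (perf_buffer__buffer_fd(pb, i) < 0)
			continue;			/* unused/offline slot */
		perf_buffer__consume_buffer(pb, i);	/* non-blocking drain */
	}
}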
-299,3 +299,41 @@ LIBBPF_0.1.0 { btf__set_fd; btf__set_pointer_size; } LIBBPF_0.0.9; + +LIBBPF_0.2.0 { + global: + bpf_prog_bind_map; + bpf_prog_test_run_opts; + bpf_program__attach_freplace; + bpf_program__section_name; + btf__add_array; + btf__add_const; + btf__add_enum; + btf__add_enum_value; + btf__add_datasec; + btf__add_datasec_var_info; + btf__add_field; + btf__add_func; + btf__add_func_param; + btf__add_func_proto; + btf__add_fwd; + btf__add_int; + btf__add_ptr; + btf__add_restrict; + btf__add_str; + btf__add_struct; + btf__add_typedef; + btf__add_union; + btf__add_var; + btf__add_volatile; + btf__endianness; + btf__find_str; + btf__new_empty; + btf__set_endianness; + btf__str_by_offset; + perf_buffer__buffer_cnt; + perf_buffer__buffer_fd; + perf_buffer__epoll_fd; + perf_buffer__consume_buffer; + xsk_socket__create_shared; +} LIBBPF_0.1.0; diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h index a23ae1ac27eb..947d8bd8a7bb 100644 --- a/tools/lib/bpf/libbpf_common.h +++ b/tools/lib/bpf/libbpf_common.h @@ -15,6 +15,8 @@ #define LIBBPF_API __attribute__((visibility("default"))) #endif +#define LIBBPF_DEPRECATED(msg) __attribute__((deprecated(msg))) + /* Helper macro to declare and initialize libbpf options struct * * This dance with uninitialized declaration, followed by memset to zero, diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h index 50d70e90d5f1..d99bc847bf84 100644 --- a/tools/lib/bpf/libbpf_internal.h +++ b/tools/lib/bpf/libbpf_internal.h @@ -9,6 +9,15 @@ #ifndef __LIBBPF_LIBBPF_INTERNAL_H #define __LIBBPF_LIBBPF_INTERNAL_H +#include <stdlib.h> +#include <limits.h> + +/* make sure libbpf doesn't use kernel-only integer typedefs */ +#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 + +/* prevent accidental re-addition of reallocarray() */ +#pragma GCC poison reallocarray + #include "libbpf.h" #define BTF_INFO_ENC(kind, kind_flag, vlen) \ @@ -23,6 +32,12 @@ #define BTF_PARAM_ENC(name, type) (name), (type) #define BTF_VAR_SECINFO_ENC(type, offset, size) (type), (offset), (size) +#ifndef likely +#define likely(x) __builtin_expect(!!(x), 1) +#endif +#ifndef unlikely +#define unlikely(x) __builtin_expect(!!(x), 0) +#endif #ifndef min # define min(x, y) ((x) < (y) ? (x) : (y)) #endif @@ -63,6 +78,37 @@ do { \ #define pr_info(fmt, ...) __pr(LIBBPF_INFO, fmt, ##__VA_ARGS__) #define pr_debug(fmt, ...) __pr(LIBBPF_DEBUG, fmt, ##__VA_ARGS__) +#ifndef __has_builtin +#define __has_builtin(x) 0 +#endif +/* + * Re-implement glibc's reallocarray() for libbpf internal-only use. + * reallocarray(), unfortunately, is not available in all versions of glibc, + * so requires extra feature detection and using reallocarray() stub from + * <tools/libc_compat.h> and COMPAT_NEED_REALLOCARRAY. All this complicates + * build of libbpf unnecessarily and is just a maintenance burden. Instead, + * it's trivial to implement libbpf-specific internal version and use it + * throughout libbpf. 
+ */ +static inline void *libbpf_reallocarray(void *ptr, size_t nmemb, size_t size) +{ + size_t total; + +#if __has_builtin(__builtin_mul_overflow) + if (unlikely(__builtin_mul_overflow(nmemb, size, &total))) + return NULL; +#else + if (size == 0 || nmemb > ULONG_MAX / size) + return NULL; + total = nmemb * size; +#endif + return realloc(ptr, total); +} + +void *btf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz, + size_t cur_cnt, size_t max_cnt, size_t add_cnt); +int btf_ensure_mem(void **data, size_t *cap_cnt, size_t elem_sz, size_t need_cnt); + static inline bool libbpf_validate_opts(const char *opts, size_t opts_sz, size_t user_sz, const char *type_name) @@ -94,6 +140,11 @@ static inline bool libbpf_validate_opts(const char *opts, ((opts) && opts->sz >= offsetofend(typeof(*(opts)), field)) #define OPTS_GET(opts, field, fallback_value) \ (OPTS_HAS(opts, field) ? (opts)->field : fallback_value) +#define OPTS_SET(opts, field, value) \ + do { \ + if (OPTS_HAS(opts, field)) \ + (opts)->field = value; \ + } while (0) int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz); int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz); @@ -105,18 +156,6 @@ int bpf_object__section_size(const struct bpf_object *obj, const char *name, int bpf_object__variable_offset(const struct bpf_object *obj, const char *name, __u32 *off); -struct nlattr; -typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb); -int libbpf_netlink_open(unsigned int *nl_pid); -int libbpf_nl_get_link(int sock, unsigned int nl_pid, - libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie); -int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex, - libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie); -int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, - libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie); -int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, - libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie); - struct btf_ext_info { /* * info points to the individual info section (e.g. func_info and @@ -138,6 +177,44 @@ struct btf_ext_info { i < (sec)->num_info; \ i++, rec = (void *)rec + (seg)->rec_size) +/* + * The .BTF.ext ELF section layout defined as + * struct btf_ext_header + * func_info subsection + * + * The func_info subsection layout: + * record size for struct bpf_func_info in the func_info subsection + * struct btf_sec_func_info for section #1 + * a list of bpf_func_info records for section #1 + * where struct bpf_func_info mimics one in include/uapi/linux/bpf.h + * but may not be identical + * struct btf_sec_func_info for section #2 + * a list of bpf_func_info records for section #2 + * ...... + * + * Note that the bpf_func_info record size in .BTF.ext may not + * be the same as the one defined in include/uapi/linux/bpf.h. + * The loader should ensure that record_size meets minimum + * requirement and pass the record as is to the kernel. The + * kernel will handle the func_info properly based on its contents. 
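The point of the checked multiplication above is that a plain realloc(ptr, nmemb * size) silently wraps on overflow and hands back a too-small buffer, while the helper fails cleanly. A purely illustrative snippet (the helper lives in libbpf's internal header, so this is not callable from outside the library):

#include <stdint.h>
#include "libbpf_internal.h"	/* libbpf-internal: libbpf_reallocarray() */

static int overflow_demo(void)
{
	/* nmemb * size wraps past SIZE_MAX, so a naive realloc() would be
	 * handed a tiny wrapped size; libbpf_reallocarray() returns NULL. */
	void *p = libbpf_reallocarray(NULL, SIZE_MAX / 2 + 1, 4);

	return p ? -1 : 0;
}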
+ */ +struct btf_ext_header { + __u16 magic; + __u8 version; + __u8 flags; + __u32 hdr_len; + + /* All offsets are in bytes relative to the end of this header */ + __u32 func_info_off; + __u32 func_info_len; + __u32 line_info_off; + __u32 line_info_len; + + /* optional part of .BTF.ext header */ + __u32 core_relo_off; + __u32 core_relo_len; +}; + struct btf_ext { union { struct btf_ext_header *hdr; @@ -145,7 +222,7 @@ struct btf_ext { }; struct btf_ext_info func_info; struct btf_ext_info line_info; - struct btf_ext_info field_reloc_info; + struct btf_ext_info core_relo_info; __u32 data_size; }; @@ -170,32 +247,40 @@ struct bpf_line_info_min { __u32 line_col; }; -/* bpf_field_info_kind encodes which aspect of captured field has to be - * adjusted by relocations. Currently supported values are: - * - BPF_FIELD_BYTE_OFFSET: field offset (in bytes); - * - BPF_FIELD_EXISTS: field existence (1, if field exists; 0, otherwise); +/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value + * has to be adjusted by relocations. */ -enum bpf_field_info_kind { +enum bpf_core_relo_kind { BPF_FIELD_BYTE_OFFSET = 0, /* field byte offset */ - BPF_FIELD_BYTE_SIZE = 1, + BPF_FIELD_BYTE_SIZE = 1, /* field size in bytes */ BPF_FIELD_EXISTS = 2, /* field existence in target kernel */ - BPF_FIELD_SIGNED = 3, - BPF_FIELD_LSHIFT_U64 = 4, - BPF_FIELD_RSHIFT_U64 = 5, + BPF_FIELD_SIGNED = 3, /* field signedness (0 - unsigned, 1 - signed) */ + BPF_FIELD_LSHIFT_U64 = 4, /* bitfield-specific left bitshift */ + BPF_FIELD_RSHIFT_U64 = 5, /* bitfield-specific right bitshift */ + BPF_TYPE_ID_LOCAL = 6, /* type ID in local BPF object */ + BPF_TYPE_ID_TARGET = 7, /* type ID in target kernel */ + BPF_TYPE_EXISTS = 8, /* type existence in target kernel */ + BPF_TYPE_SIZE = 9, /* type size in bytes */ + BPF_ENUMVAL_EXISTS = 10, /* enum value existence in target kernel */ + BPF_ENUMVAL_VALUE = 11, /* enum value integer value */ }; -/* The minimum bpf_field_reloc checked by the loader +/* The minimum bpf_core_relo checked by the loader * - * Field relocation captures the following data: + * CO-RE relocation captures the following data: * - insn_off - instruction offset (in bytes) within a BPF program that needs * its insn->imm field to be relocated with actual field info; * - type_id - BTF type ID of the "root" (containing) entity of a relocatable - * field; + * type or field; * - access_str_off - offset into corresponding .BTF string section. String - * itself encodes an accessed field using a sequence of field and array - * indicies, separated by colon (:). It's conceptually very close to LLVM's - * getelementptr ([0]) instruction's arguments for identifying offset to - * a field. + * interpretation depends on specific relocation kind: + * - for field-based relocations, string encodes an accessed field using + * a sequence of field and array indices, separated by colon (:). It's + * conceptually very close to LLVM's getelementptr ([0]) instruction's + * arguments for identifying offset to a field. + * - for type-based relocations, strings is expected to be just "0"; + * - for enum value-based relocations, string contains an index of enum + * value within its enum type; * * Example to provide a better feel. 
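On the program side, the new relocation kinds are emitted through helper macros added to bpf_core_read.h in the same series (bpf_core_type_exists(), bpf_core_type_size(), bpf_core_enum_value_exists(), bpf_core_enum_value()); a sketch, assuming a recent clang and vmlinux.h, with example types:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("raw_tp/sys_enter")
int relo_demo(void *ctx)
{
	__u64 sz = 0, xdp = 0;

	if (bpf_core_type_exists(struct tcp_sock))		/* BPF_TYPE_EXISTS */
		sz = bpf_core_type_size(struct tcp_sock);	/* BPF_TYPE_SIZE */

	if (bpf_core_enum_value_exists(enum bpf_prog_type, BPF_PROG_TYPE_XDP))
		xdp = bpf_core_enum_value(enum bpf_prog_type, BPF_PROG_TYPE_XDP);

	bpf_printk("tcp_sock size %llu, XDP enum value %llu", sz, xdp);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";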
* @@ -226,11 +311,11 @@ enum bpf_field_info_kind { * * [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction */ -struct bpf_field_reloc { +struct bpf_core_relo { __u32 insn_off; __u32 type_id; __u32 access_str_off; - enum bpf_field_info_kind kind; + enum bpf_core_relo_kind kind; }; #endif /* __LIBBPF_LIBBPF_INTERNAL_H */ diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c index 5a3d3f078408..5482a9b7ae2d 100644 --- a/tools/lib/bpf/libbpf_probes.c +++ b/tools/lib/bpf/libbpf_probes.c @@ -17,9 +17,6 @@ #include "libbpf.h" #include "libbpf_internal.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - static bool grep(const char *buffer, const char *pattern) { return !!strstr(buffer, pattern); @@ -173,7 +170,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len, return btf_fd; } -static int load_sk_storage_btf(void) +static int load_local_storage_btf(void) { const char strs[] = "\0bpf_spin_lock\0val\0cnt\0l"; /* struct bpf_spin_lock { @@ -232,12 +229,13 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex) key_size = 0; break; case BPF_MAP_TYPE_SK_STORAGE: + case BPF_MAP_TYPE_INODE_STORAGE: btf_key_type_id = 1; btf_value_type_id = 3; value_size = 8; max_entries = 0; map_flags = BPF_F_NO_PREALLOC; - btf_fd = load_sk_storage_btf(); + btf_fd = load_local_storage_btf(); if (btf_fd < 0) return false; break; diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 312f887570b2..4dd73de00b6f 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -15,13 +15,12 @@ #include "libbpf_internal.h" #include "nlattr.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - #ifndef SOL_NETLINK #define SOL_NETLINK 270 #endif +typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb); + typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t, void *cookie); @@ -31,7 +30,7 @@ struct xdp_id_md { struct xdp_link_info info; }; -int libbpf_netlink_open(__u32 *nl_pid) +static int libbpf_netlink_open(__u32 *nl_pid) { struct sockaddr_nl sa; socklen_t addrlen; @@ -283,6 +282,9 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb) return 0; } +static int libbpf_nl_get_link(int sock, unsigned int nl_pid, + libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie); + int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info, size_t info_size, __u32 flags) { @@ -368,121 +370,3 @@ int libbpf_nl_get_link(int sock, unsigned int nl_pid, return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg, dump_link_nlmsg, cookie); } - -static int __dump_class_nlmsg(struct nlmsghdr *nlh, - libbpf_dump_nlmsg_t dump_class_nlmsg, - void *cookie) -{ - struct nlattr *tb[TCA_MAX + 1], *attr; - struct tcmsg *t = NLMSG_DATA(nlh); - int len; - - len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); - attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); - if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) - return -LIBBPF_ERRNO__NLPARSE; - - return dump_class_nlmsg(cookie, t, tb); -} - -int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex, - libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie) -{ - struct { - struct nlmsghdr nlh; - struct tcmsg t; - } req = { - .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), - .nlh.nlmsg_type = RTM_GETTCLASS, - .nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST, - .t.tcm_family = AF_UNSPEC, - .t.tcm_ifindex = 
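load_local_storage_btf() generalizes the old sk-storage probe so the new BPF_MAP_TYPE_INODE_STORAGE can be feature-probed the same way from user space (assumes UAPI headers that define the new map type):

#include <stdbool.h>
#include <linux/bpf.h>
#include <bpf/libbpf.h>

static bool have_inode_storage(void)
{
	/* second argument is ifindex, only relevant for offloaded map types */
	return bpf_probe_map_type(BPF_MAP_TYPE_INODE_STORAGE, 0);
}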
ifindex, - }; - int seq = time(NULL); - - req.nlh.nlmsg_seq = seq; - if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0) - return -errno; - - return bpf_netlink_recv(sock, nl_pid, seq, __dump_class_nlmsg, - dump_class_nlmsg, cookie); -} - -static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh, - libbpf_dump_nlmsg_t dump_qdisc_nlmsg, - void *cookie) -{ - struct nlattr *tb[TCA_MAX + 1], *attr; - struct tcmsg *t = NLMSG_DATA(nlh); - int len; - - len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); - attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); - if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) - return -LIBBPF_ERRNO__NLPARSE; - - return dump_qdisc_nlmsg(cookie, t, tb); -} - -int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex, - libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie) -{ - struct { - struct nlmsghdr nlh; - struct tcmsg t; - } req = { - .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), - .nlh.nlmsg_type = RTM_GETQDISC, - .nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST, - .t.tcm_family = AF_UNSPEC, - .t.tcm_ifindex = ifindex, - }; - int seq = time(NULL); - - req.nlh.nlmsg_seq = seq; - if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0) - return -errno; - - return bpf_netlink_recv(sock, nl_pid, seq, __dump_qdisc_nlmsg, - dump_qdisc_nlmsg, cookie); -} - -static int __dump_filter_nlmsg(struct nlmsghdr *nlh, - libbpf_dump_nlmsg_t dump_filter_nlmsg, - void *cookie) -{ - struct nlattr *tb[TCA_MAX + 1], *attr; - struct tcmsg *t = NLMSG_DATA(nlh); - int len; - - len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t)); - attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t))); - if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0) - return -LIBBPF_ERRNO__NLPARSE; - - return dump_filter_nlmsg(cookie, t, tb); -} - -int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle, - libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie) -{ - struct { - struct nlmsghdr nlh; - struct tcmsg t; - } req = { - .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), - .nlh.nlmsg_type = RTM_GETTFILTER, - .nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST, - .t.tcm_family = AF_UNSPEC, - .t.tcm_ifindex = ifindex, - .t.tcm_parent = handle, - }; - int seq = time(NULL); - - req.nlh.nlmsg_seq = seq; - if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0) - return -errno; - - return bpf_netlink_recv(sock, nl_pid, seq, __dump_filter_nlmsg, - dump_filter_nlmsg, cookie); -} diff --git a/tools/lib/bpf/nlattr.c b/tools/lib/bpf/nlattr.c index 0ad41dfea8eb..b607fa9852b1 100644 --- a/tools/lib/bpf/nlattr.c +++ b/tools/lib/bpf/nlattr.c @@ -7,14 +7,11 @@ */ #include <errno.h> -#include "nlattr.h" -#include "libbpf_internal.h" -#include <linux/rtnetlink.h> #include <string.h> #include <stdio.h> - -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 +#include <linux/rtnetlink.h> +#include "nlattr.h" +#include "libbpf_internal.h" static uint16_t nla_attr_minlen[LIBBPF_NLA_TYPE_MAX+1] = { [LIBBPF_NLA_U8] = sizeof(uint8_t), diff --git a/tools/lib/bpf/ringbuf.c b/tools/lib/bpf/ringbuf.c index 4fc6c6cbb4eb..5c6522c89af1 100644 --- a/tools/lib/bpf/ringbuf.c +++ b/tools/lib/bpf/ringbuf.c @@ -16,15 +16,11 @@ #include <asm/barrier.h> #include <sys/mman.h> #include <sys/epoll.h> -#include <tools/libc_compat.h> #include "libbpf.h" #include "libbpf_internal.h" #include "bpf.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - struct ring { ring_buffer_sample_fn sample_cb; 
void *ctx; @@ -82,12 +78,12 @@ int ring_buffer__add(struct ring_buffer *rb, int map_fd, return -EINVAL; } - tmp = reallocarray(rb->rings, rb->ring_cnt + 1, sizeof(*rb->rings)); + tmp = libbpf_reallocarray(rb->rings, rb->ring_cnt + 1, sizeof(*rb->rings)); if (!tmp) return -ENOMEM; rb->rings = tmp; - tmp = reallocarray(rb->events, rb->ring_cnt + 1, sizeof(*rb->events)); + tmp = libbpf_reallocarray(rb->events, rb->ring_cnt + 1, sizeof(*rb->events)); if (!tmp) return -ENOMEM; rb->events = tmp; diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index f7f4efb70a4c..e3c98c007825 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -20,6 +20,8 @@ #include <linux/if_ether.h> #include <linux/if_packet.h> #include <linux/if_xdp.h> +#include <linux/kernel.h> +#include <linux/list.h> #include <linux/sockios.h> #include <net/if.h> #include <sys/ioctl.h> @@ -32,9 +34,6 @@ #include "libbpf_internal.h" #include "xsk.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - #ifndef SOL_XDP #define SOL_XDP 283 #endif @@ -48,26 +47,35 @@ #endif struct xsk_umem { - struct xsk_ring_prod *fill; - struct xsk_ring_cons *comp; + struct xsk_ring_prod *fill_save; + struct xsk_ring_cons *comp_save; char *umem_area; struct xsk_umem_config config; int fd; int refcount; + struct list_head ctx_list; +}; + +struct xsk_ctx { + struct xsk_ring_prod *fill; + struct xsk_ring_cons *comp; + __u32 queue_id; + struct xsk_umem *umem; + int refcount; + int ifindex; + struct list_head list; + int prog_fd; + int xsks_map_fd; + char ifname[IFNAMSIZ]; }; struct xsk_socket { struct xsk_ring_cons *rx; struct xsk_ring_prod *tx; __u64 outstanding_tx; - struct xsk_umem *umem; + struct xsk_ctx *ctx; struct xsk_socket_config config; int fd; - int ifindex; - int prog_fd; - int xsks_map_fd; - __u32 queue_id; - char ifname[IFNAMSIZ]; }; struct xsk_nl_info { @@ -203,15 +211,73 @@ static int xsk_get_mmap_offsets(int fd, struct xdp_mmap_offsets *off) return -EINVAL; } +static int xsk_create_umem_rings(struct xsk_umem *umem, int fd, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp) +{ + struct xdp_mmap_offsets off; + void *map; + int err; + + err = setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING, + &umem->config.fill_size, + sizeof(umem->config.fill_size)); + if (err) + return -errno; + + err = setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, + &umem->config.comp_size, + sizeof(umem->config.comp_size)); + if (err) + return -errno; + + err = xsk_get_mmap_offsets(fd, &off); + if (err) + return -errno; + + map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, + XDP_UMEM_PGOFF_FILL_RING); + if (map == MAP_FAILED) + return -errno; + + fill->mask = umem->config.fill_size - 1; + fill->size = umem->config.fill_size; + fill->producer = map + off.fr.producer; + fill->consumer = map + off.fr.consumer; + fill->flags = map + off.fr.flags; + fill->ring = map + off.fr.desc; + fill->cached_cons = umem->config.fill_size; + + map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, + XDP_UMEM_PGOFF_COMPLETION_RING); + if (map == MAP_FAILED) { + err = -errno; + goto out_mmap; + } + + comp->mask = umem->config.comp_size - 1; + comp->size = umem->config.comp_size; + comp->producer = map + off.cr.producer; + comp->consumer = map + off.cr.consumer; + comp->flags = map + off.cr.flags; + comp->ring = map + off.cr.desc; + + return 0; + +out_mmap: + 
munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64)); + return err; +} + int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, __u64 size, struct xsk_ring_prod *fill, struct xsk_ring_cons *comp, const struct xsk_umem_config *usr_config) { - struct xdp_mmap_offsets off; struct xdp_umem_reg mr; struct xsk_umem *umem; - void *map; int err; if (!umem_area || !umem_ptr || !fill || !comp) @@ -230,6 +296,7 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, } umem->umem_area = umem_area; + INIT_LIST_HEAD(&umem->ctx_list); xsk_set_umem_config(&umem->config, usr_config); memset(&mr, 0, sizeof(mr)); @@ -244,71 +311,16 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, err = -errno; goto out_socket; } - err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_FILL_RING, - &umem->config.fill_size, - sizeof(umem->config.fill_size)); - if (err) { - err = -errno; - goto out_socket; - } - err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, - &umem->config.comp_size, - sizeof(umem->config.comp_size)); - if (err) { - err = -errno; - goto out_socket; - } - err = xsk_get_mmap_offsets(umem->fd, &off); - if (err) { - err = -errno; - goto out_socket; - } - - map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64), - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd, - XDP_UMEM_PGOFF_FILL_RING); - if (map == MAP_FAILED) { - err = -errno; + err = xsk_create_umem_rings(umem, umem->fd, fill, comp); + if (err) goto out_socket; - } - - umem->fill = fill; - fill->mask = umem->config.fill_size - 1; - fill->size = umem->config.fill_size; - fill->producer = map + off.fr.producer; - fill->consumer = map + off.fr.consumer; - fill->flags = map + off.fr.flags; - fill->ring = map + off.fr.desc; - fill->cached_prod = *fill->producer; - /* cached_cons is "size" bigger than the real consumer pointer - * See xsk_prod_nb_free - */ - fill->cached_cons = *fill->consumer + umem->config.fill_size; - - map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64), - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd, - XDP_UMEM_PGOFF_COMPLETION_RING); - if (map == MAP_FAILED) { - err = -errno; - goto out_mmap; - } - - umem->comp = comp; - comp->mask = umem->config.comp_size - 1; - comp->size = umem->config.comp_size; - comp->producer = map + off.cr.producer; - comp->consumer = map + off.cr.consumer; - comp->flags = map + off.cr.flags; - comp->ring = map + off.cr.desc; - comp->cached_prod = *comp->producer; - comp->cached_cons = *comp->consumer; + umem->fill_save = fill; + umem->comp_save = comp; *umem_ptr = umem; return 0; -out_mmap: - munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64)); out_socket: close(umem->fd); out_umem_alloc: @@ -342,6 +354,7 @@ DEFAULT_VERSION(xsk_umem__create_v0_0_4, xsk_umem__create, LIBBPF_0.0.4) static int xsk_load_xdp_prog(struct xsk_socket *xsk) { static const int log_buf_size = 16 * 1024; + struct xsk_ctx *ctx = xsk->ctx; char log_buf[log_buf_size]; int err, prog_fd; @@ -369,7 +382,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* *(u32 *)(r10 - 4) = r2 */ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -4), /* r1 = xskmap[] */ - BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd), + BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd), /* r3 = XDP_PASS */ BPF_MOV64_IMM(BPF_REG_3, 2), /* call bpf_redirect_map */ @@ -381,7 +394,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* r2 += -4 */ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r1 = xskmap[] */ - BPF_LD_MAP_FD(BPF_REG_1, 
xsk->xsks_map_fd), + BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd), /* call bpf_map_lookup_elem */ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), /* r1 = r0 */ @@ -393,7 +406,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* r2 = *(u32 *)(r10 - 4) */ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4), /* r1 = xskmap[] */ - BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd), + BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd), /* r3 = 0 */ BPF_MOV64_IMM(BPF_REG_3, 0), /* call bpf_redirect_map */ @@ -411,19 +424,21 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) return prog_fd; } - err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags); + err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd, + xsk->config.xdp_flags); if (err) { close(prog_fd); return err; } - xsk->prog_fd = prog_fd; + ctx->prog_fd = prog_fd; return 0; } static int xsk_get_max_queues(struct xsk_socket *xsk) { struct ethtool_channels channels = { .cmd = ETHTOOL_GCHANNELS }; + struct xsk_ctx *ctx = xsk->ctx; struct ifreq ifr = {}; int fd, err, ret; @@ -432,7 +447,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk) return -errno; ifr.ifr_data = (void *)&channels; - memcpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1); + memcpy(ifr.ifr_name, ctx->ifname, IFNAMSIZ - 1); ifr.ifr_name[IFNAMSIZ - 1] = '\0'; err = ioctl(fd, SIOCETHTOOL, &ifr); if (err && errno != EOPNOTSUPP) { @@ -460,6 +475,7 @@ out: static int xsk_create_bpf_maps(struct xsk_socket *xsk) { + struct xsk_ctx *ctx = xsk->ctx; int max_queues; int fd; @@ -472,15 +488,17 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk) if (fd < 0) return fd; - xsk->xsks_map_fd = fd; + ctx->xsks_map_fd = fd; return 0; } static void xsk_delete_bpf_maps(struct xsk_socket *xsk) { - bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id); - close(xsk->xsks_map_fd); + struct xsk_ctx *ctx = xsk->ctx; + + bpf_map_delete_elem(ctx->xsks_map_fd, &ctx->queue_id); + close(ctx->xsks_map_fd); } static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) @@ -488,10 +506,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) __u32 i, *map_ids, num_maps, prog_len = sizeof(struct bpf_prog_info); __u32 map_len = sizeof(struct bpf_map_info); struct bpf_prog_info prog_info = {}; + struct xsk_ctx *ctx = xsk->ctx; struct bpf_map_info map_info; int fd, err; - err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len); + err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len); if (err) return err; @@ -505,11 +524,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) prog_info.nr_map_ids = num_maps; prog_info.map_ids = (__u64)(unsigned long)map_ids; - err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len); + err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len); if (err) goto out_map_ids; - xsk->xsks_map_fd = -1; + ctx->xsks_map_fd = -1; for (i = 0; i < prog_info.nr_map_ids; i++) { fd = bpf_map_get_fd_by_id(map_ids[i]); @@ -523,7 +542,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) } if (!strcmp(map_info.name, "xsks_map")) { - xsk->xsks_map_fd = fd; + ctx->xsks_map_fd = fd; continue; } @@ -531,7 +550,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk) } err = 0; - if (xsk->xsks_map_fd == -1) + if (ctx->xsks_map_fd == -1) err = -ENOENT; out_map_ids: @@ -541,16 +560,19 @@ out_map_ids: static int xsk_set_bpf_maps(struct xsk_socket *xsk) { - return bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id, + struct xsk_ctx *ctx = xsk->ctx; + + return bpf_map_update_elem(ctx->xsks_map_fd, &ctx->queue_id, &xsk->fd, 0); } static int 
xsk_setup_xdp_prog(struct xsk_socket *xsk) { + struct xsk_ctx *ctx = xsk->ctx; __u32 prog_id = 0; int err; - err = bpf_get_link_xdp_id(xsk->ifindex, &prog_id, + err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id, xsk->config.xdp_flags); if (err) return err; @@ -566,12 +588,12 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk) return err; } } else { - xsk->prog_fd = bpf_prog_get_fd_by_id(prog_id); - if (xsk->prog_fd < 0) + ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id); + if (ctx->prog_fd < 0) return -errno; err = xsk_lookup_bpf_maps(xsk); if (err) { - close(xsk->prog_fd); + close(ctx->prog_fd); return err; } } @@ -580,23 +602,108 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk) err = xsk_set_bpf_maps(xsk); if (err) { xsk_delete_bpf_maps(xsk); - close(xsk->prog_fd); + close(ctx->prog_fd); return err; } return 0; } -int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, - __u32 queue_id, struct xsk_umem *umem, - struct xsk_ring_cons *rx, struct xsk_ring_prod *tx, - const struct xsk_socket_config *usr_config) +static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex, + __u32 queue_id) +{ + struct xsk_ctx *ctx; + + if (list_empty(&umem->ctx_list)) + return NULL; + + list_for_each_entry(ctx, &umem->ctx_list, list) { + if (ctx->ifindex == ifindex && ctx->queue_id == queue_id) { + ctx->refcount++; + return ctx; + } + } + + return NULL; +} + +static void xsk_put_ctx(struct xsk_ctx *ctx) +{ + struct xsk_umem *umem = ctx->umem; + struct xdp_mmap_offsets off; + int err; + + if (--ctx->refcount == 0) { + err = xsk_get_mmap_offsets(umem->fd, &off); + if (!err) { + munmap(ctx->fill->ring - off.fr.desc, + off.fr.desc + umem->config.fill_size * + sizeof(__u64)); + munmap(ctx->comp->ring - off.cr.desc, + off.cr.desc + umem->config.comp_size * + sizeof(__u64)); + } + + list_del(&ctx->list); + free(ctx); + } +} + +static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk, + struct xsk_umem *umem, int ifindex, + const char *ifname, __u32 queue_id, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp) +{ + struct xsk_ctx *ctx; + int err; + + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) + return NULL; + + if (!umem->fill_save) { + err = xsk_create_umem_rings(umem, xsk->fd, fill, comp); + if (err) { + free(ctx); + return NULL; + } + } else if (umem->fill_save != fill || umem->comp_save != comp) { + /* Copy over rings to new structs. 
*/ + memcpy(fill, umem->fill_save, sizeof(*fill)); + memcpy(comp, umem->comp_save, sizeof(*comp)); + } + + ctx->ifindex = ifindex; + ctx->refcount = 1; + ctx->umem = umem; + ctx->queue_id = queue_id; + memcpy(ctx->ifname, ifname, IFNAMSIZ - 1); + ctx->ifname[IFNAMSIZ - 1] = '\0'; + + umem->fill_save = NULL; + umem->comp_save = NULL; + ctx->fill = fill; + ctx->comp = comp; + list_add(&ctx->list, &umem->ctx_list); + return ctx; +} + +int xsk_socket__create_shared(struct xsk_socket **xsk_ptr, + const char *ifname, + __u32 queue_id, struct xsk_umem *umem, + struct xsk_ring_cons *rx, + struct xsk_ring_prod *tx, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp, + const struct xsk_socket_config *usr_config) { void *rx_map = NULL, *tx_map = NULL; struct sockaddr_xdp sxdp = {}; struct xdp_mmap_offsets off; struct xsk_socket *xsk; - int err; + struct xsk_ctx *ctx; + int err, ifindex; if (!umem || !xsk_ptr || !(rx || tx)) return -EFAULT; @@ -609,10 +716,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, if (err) goto out_xsk_alloc; - if (umem->refcount && - !(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) { - pr_warn("Error: shared umems not supported by libbpf supplied XDP program.\n"); - err = -EBUSY; + xsk->outstanding_tx = 0; + ifindex = if_nametoindex(ifname); + if (!ifindex) { + err = -errno; goto out_xsk_alloc; } @@ -626,16 +733,21 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, xsk->fd = umem->fd; } - xsk->outstanding_tx = 0; - xsk->queue_id = queue_id; - xsk->umem = umem; - xsk->ifindex = if_nametoindex(ifname); - if (!xsk->ifindex) { - err = -errno; - goto out_socket; + ctx = xsk_get_ctx(umem, ifindex, queue_id); + if (!ctx) { + if (!fill || !comp) { + err = -EFAULT; + goto out_socket; + } + + ctx = xsk_create_ctx(xsk, umem, ifindex, ifname, queue_id, + fill, comp); + if (!ctx) { + err = -ENOMEM; + goto out_socket; + } } - memcpy(xsk->ifname, ifname, IFNAMSIZ - 1); - xsk->ifname[IFNAMSIZ - 1] = '\0'; + xsk->ctx = ctx; if (rx) { err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING, @@ -643,7 +755,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, sizeof(xsk->config.rx_size)); if (err) { err = -errno; - goto out_socket; + goto out_put_ctx; } } if (tx) { @@ -652,14 +764,14 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, sizeof(xsk->config.tx_size)); if (err) { err = -errno; - goto out_socket; + goto out_put_ctx; } } err = xsk_get_mmap_offsets(xsk->fd, &off); if (err) { err = -errno; - goto out_socket; + goto out_put_ctx; } if (rx) { @@ -669,7 +781,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, xsk->fd, XDP_PGOFF_RX_RING); if (rx_map == MAP_FAILED) { err = -errno; - goto out_socket; + goto out_put_ctx; } rx->mask = xsk->config.rx_size - 1; @@ -708,10 +820,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, xsk->tx = tx; sxdp.sxdp_family = PF_XDP; - sxdp.sxdp_ifindex = xsk->ifindex; - sxdp.sxdp_queue_id = xsk->queue_id; + sxdp.sxdp_ifindex = ctx->ifindex; + sxdp.sxdp_queue_id = ctx->queue_id; if (umem->refcount > 1) { - sxdp.sxdp_flags = XDP_SHARED_UMEM; + sxdp.sxdp_flags |= XDP_SHARED_UMEM; sxdp.sxdp_shared_umem_fd = umem->fd; } else { sxdp.sxdp_flags = xsk->config.bind_flags; @@ -723,7 +835,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, goto out_mmap_tx; } - xsk->prog_fd = -1; + ctx->prog_fd = -1; if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) { err = 
xsk_setup_xdp_prog(xsk); @@ -742,6 +854,8 @@ out_mmap_rx: if (rx) munmap(rx_map, off.rx.desc + xsk->config.rx_size * sizeof(struct xdp_desc)); +out_put_ctx: + xsk_put_ctx(ctx); out_socket: if (--umem->refcount) close(xsk->fd); @@ -750,25 +864,24 @@ out_xsk_alloc: return err; } -int xsk_umem__delete(struct xsk_umem *umem) +int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname, + __u32 queue_id, struct xsk_umem *umem, + struct xsk_ring_cons *rx, struct xsk_ring_prod *tx, + const struct xsk_socket_config *usr_config) { - struct xdp_mmap_offsets off; - int err; + return xsk_socket__create_shared(xsk_ptr, ifname, queue_id, umem, + rx, tx, umem->fill_save, + umem->comp_save, usr_config); +} +int xsk_umem__delete(struct xsk_umem *umem) +{ if (!umem) return 0; if (umem->refcount) return -EBUSY; - err = xsk_get_mmap_offsets(umem->fd, &off); - if (!err) { - munmap(umem->fill->ring - off.fr.desc, - off.fr.desc + umem->config.fill_size * sizeof(__u64)); - munmap(umem->comp->ring - off.cr.desc, - off.cr.desc + umem->config.comp_size * sizeof(__u64)); - } - close(umem->fd); free(umem); @@ -778,15 +891,16 @@ int xsk_umem__delete(struct xsk_umem *umem) void xsk_socket__delete(struct xsk_socket *xsk) { size_t desc_sz = sizeof(struct xdp_desc); + struct xsk_ctx *ctx = xsk->ctx; struct xdp_mmap_offsets off; int err; if (!xsk) return; - if (xsk->prog_fd != -1) { + if (ctx->prog_fd != -1) { xsk_delete_bpf_maps(xsk); - close(xsk->prog_fd); + close(ctx->prog_fd); } err = xsk_get_mmap_offsets(xsk->fd, &off); @@ -799,14 +913,15 @@ void xsk_socket__delete(struct xsk_socket *xsk) munmap(xsk->tx->ring - off.tx.desc, off.tx.desc + xsk->config.tx_size * desc_sz); } - } - xsk->umem->refcount--; + xsk_put_ctx(ctx); + + ctx->umem->refcount--; /* Do not close an fd that also has an associated umem connected * to it. */ - if (xsk->fd != xsk->umem->fd) + if (xsk->fd != ctx->umem->fd) close(xsk->fd); free(xsk); } diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h index 584f6820a639..1069c46364ff 100644 --- a/tools/lib/bpf/xsk.h +++ b/tools/lib/bpf/xsk.h @@ -234,6 +234,15 @@ LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk, struct xsk_ring_cons *rx, struct xsk_ring_prod *tx, const struct xsk_socket_config *config); +LIBBPF_API int +xsk_socket__create_shared(struct xsk_socket **xsk_ptr, + const char *ifname, + __u32 queue_id, struct xsk_umem *umem, + struct xsk_ring_cons *rx, + struct xsk_ring_prod *tx, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp, + const struct xsk_socket_config *config); /* Returns 0 for success and -EBUSY if the umem is still in use. */ LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem); |
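The AF_XDP rework above hangs the fill/completion rings off a per-(ifindex, queue) context so a single umem can back several sockets; xsk_socket__create_shared() is the user-visible entry point. A sketch, where the interface name, queue ids and the caller-provided umem/ring structures are assumptions:

#include <bpf/xsk.h>

/* The second socket binds to a different queue, so it must bring its own
 * fill and completion rings; the first socket reuses the ones registered
 * when the umem was created. */
static int create_two_sockets(struct xsk_umem *umem,
			      struct xsk_ring_cons *rx0, struct xsk_ring_prod *tx0,
			      struct xsk_ring_cons *rx1, struct xsk_ring_prod *tx1,
			      struct xsk_ring_prod *fill1, struct xsk_ring_cons *comp1,
			      const struct xsk_socket_config *cfg,
			      struct xsk_socket **xsk0, struct xsk_socket **xsk1)
{
	int err;

	err = xsk_socket__create(xsk0, "eth0", 0 /* queue */, umem, rx0, tx0, cfg);
	if (err)
		return err;

	return xsk_socket__create_shared(xsk1, "eth0", 1 /* queue */, umem,
					 rx1, tx1, fill1, comp1, cfg);
}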