author Andrii Nakryiko <andrii@kernel.org> 2022-05-23 14:30:18 -0700
committer Andrii Nakryiko <andrii@kernel.org> 2022-05-23 14:31:29 -0700
commit 608b638ebf368f18431f47bbbd0d93828cbbdf83
tree cae0fe762d981c3965aab29479d0c7af319ce73c /include
parent 1ec5ee8c8a5a65ea377f8bea64bf4d5b743f6f79
parent 0cf7052a55128f7fd7905f6ae6eb995d6db76b52
Merge branch 'Dynamic pointers'

Joanne Koong says:

====================
This patchset implements the basics of dynamic pointers in bpf.

A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
alongside the address it points to. This abstraction is useful in bpf given
that every memory access in a bpf program must be safe. The verifier and bpf
helper functions can use the metadata to enforce safety guarantees for things
such as dynamically sized strings and kernel heap allocations.

From the program side, the bpf_dynptr is an opaque struct and the verifier
will enforce that its contents are never written to by the program.
It can only be written to through specific bpf helper functions.

There are several use cases for dynamic pointers in bpf programs. Some
examples include: dynamically sized ringbuf reservations without extra
memcpys, dynamic string parsing and memory comparisons, dynamic memory
allocations that can be persisted in maps, and dynamic + ergonomic parsing of
sk_buff and xdp_md packet data.

At a high level, the patches are as follows:
1/6 - Adds verifier support for dynptrs
2/6 - Adds bpf_dynptr_from_mem (local dynptr)
3/6 - Adds dynptr support for ring buffers
4/6 - Adds bpf_dynptr_read and bpf_dynptr_write
5/6 - Adds dynptr data slices (ptr to the dynptr data)
6/6 - Tests to check that the verifier rejects invalid cases and passes valid ones

This is the first dynptr patchset in a larger series. The next series of
patches will add dynptrs that support dynamic memory allocations that can also
be persisted in maps, support for parsing packet data through dynptrs,
convenience helpers for using dynptrs as iterators, and more helper functions
for interacting with strings and memory dynamically.

Changelog:
----------
v5 -> v6:
v5: https://lore.kernel.org/bpf/20220520044245.3305025-1-joannelkoong@gmail.com/
* enforce PTR_TO_MAP_VALUE for bpf_dynptr_from_mem data in check_helper_call
  instead of using DYNPTR_TYPE_LOCAL when checking func arg compatibility
* remove MEM_DYNPTR modifier

v4 -> v5:
v4: https://lore.kernel.org/bpf/20220509224257.3222614-1-joannelkoong@gmail.com/
* Remove malloc dynptr; this will be part of the 2nd patchset while we
  figure out memory accounting
* For data slices, only set the register's ref_obj_id to dynptr_id (Alexei)
* Tidying (eg remove "inline", move offset checking to
  "check_func_arg_reg_off") (David)
* Add a few new test cases, remove malloc-only ones.

v3 -> v4:
v3: https://lore.kernel.org/bpf/20220428211059.4065379-1-joannelkoong@gmail.com/
1/6 - Change mem ptr + size check to use more concise inequality expression
      (David + Andrii)
2/6 - Add check for meta->uninit_dynptr_regno not already set (Andrii)
      Move DYNPTR_TYPE_FLAG_MASK to include/linux/bpf.h (Andrii)
3/6 - Remove four underscores for invoking BPF_CALL (Andrii)
      Add __BPF_TYPE_FLAG_MAX and use it for __BPF_TYPE_LAST_FLAG (Andrii)
4/6 - Fix capacity to be bpf_dynptr size value in check_off_len (Andrii)
      Change -EINVAL to -E2BIG if len + offset is out of bounds (Andrii)
5/6 - Add check for only 1 dynptr arg for dynptr data function (Andrii)
6/6 - For ringbuf map, set max_entries from userspace (Andrii)
      Use err ?: ... for interacting with dynptr APIs (Andrii)
      Define array_map2 for add_dynptr_to_map2 test where value is a struct
      with an embedded dynptr
      Remove ref id from missing_put_callback message, since on different
      environments, ref id is not always = 1

v2 -> v3:
v2: https://lore.kernel.org/bpf/20220416063429.3314021-1-joannelkoong@gmail.com/
* Reorder patches (move ringbuffer patch to be right after the verifier +
  malloc dynptr patchset)
* Remove local type dynptrs (Andrii + Alexei)
* Mark stack slot as STACK_MISC after any writes into a dynptr instead of
  explicitly prohibiting writes (Alexei)
* Pass number of slots, not memory size to is_spi_bounds_valid (Kumar)
* Check reference leaks by adding dynptr id to state->refs instead of checking
  stack slots (Alexei)

v1 -> v2:
v1: https://lore.kernel.org/bpf/20220402015826.3941317-1-joannekoong@fb.com/
1/7 -
* Remove ARG_PTR_TO_MAP_VALUE_UNINIT alias and use
  ARG_PTR_TO_MAP_VALUE | MEM_UNINIT directly (Andrii)
* Drop arg_type_is_mem_ptr() wrapper function (Andrii)
2/7 -
* Change name from MEM_RELEASE to OBJ_RELEASE (Andrii)
* Use meta.release_ref instead of ref_obj_id != 0 to determine whether
  to release reference (Kumar)
* Drop type_is_release_mem() wrapper function (Andrii)
3/7 -
* Add checks for nested dynptrs edge-cases, which could lead to corrupt
  writes of the dynptr stack variable.
* Add u64 flags to bpf_dynptr_from_mem() and bpf_dynptr_alloc() (Andrii)
* Rename from bpf_malloc/bpf_free to bpf_dynptr_alloc/bpf_dynptr_put
  (Alexei)
* Support alloc flag __GFP_ZERO (Andrii)
* Reserve upper 8 bits in dynptr size and offset fields instead of
  reserving just the upper 4 bits (Andrii)
* Allow dynptr zero-slices (Andrii)
* Use the highest bit for is_rdonly instead of the 28th bit (Andrii)
* Rename check_* functions to is_* functions for better readability
  (Andrii)
* Add comment for code that checks the spi bounds (Andrii)
4/7 -
* Fix doc description for bpf_dynptr_read (Toke)
* Move bpf_dynptr_check_off_len() from function patch 1 to here (Andrii)
5/7 -
* When finding the id for the dynptr to associate the data slice with,
  look for dynptr arg instead of assuming it is BPF_REG_1.
6/7 -
* Add __force when casting from unsigned long to void * (kernel test
  robot)
* Expand on docs for ringbuf dynptr APIs (Andrii)
7/7 -
* Use table approach for defining test programs and error messages
  (Andrii)
* Print out full log if there’s an error (Andrii)
* Use bpf_object__find_program_by_name() instead of specifying
  program name as a string (Andrii)
* Add 6 extra cases: invalid_nested_dynptrs1, invalid_nested_dynptrs2,
  invalid_ref_mem1, invalid_ref_mem2, zero_slice_access,
  and test_alloc_zero_bytes
* Add checking for edge cases (eg allocing with invalid flags)
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Diffstat (limited to 'include')
-rw-r--r-- include/linux/bpf.h 42
-rw-r--r-- include/linux/bpf_verifier.h 20
-rw-r--r-- include/uapi/linux/bpf.h 83
3 files changed, 145 insertions, 0 deletions
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a9b1875212f6..a7080c86fa76 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -392,10 +392,18 @@ enum bpf_type_flag {
MEM_UNINIT = BIT(7 + BPF_BASE_TYPE_BITS),
+ /* DYNPTR points to memory local to the bpf program. */
+ DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS),
+
+ /* DYNPTR points to a ringbuf record. */
+ DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS),
+
__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};
+#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF)
+
/* Max number of base types. */
#define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS)
@@ -438,6 +446,7 @@ enum bpf_arg_type {
ARG_PTR_TO_CONST_STR, /* pointer to a null terminated read-only string */
ARG_PTR_TO_TIMER, /* pointer to bpf_timer */
ARG_PTR_TO_KPTR, /* pointer to referenced kptr */
+ ARG_PTR_TO_DYNPTR, /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
__BPF_ARG_TYPE_MAX,
/* Extended arg_types. */
@@ -479,6 +488,7 @@ enum bpf_return_type {
RET_PTR_TO_TCP_SOCK_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK,
RET_PTR_TO_SOCK_COMMON_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON,
RET_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_ALLOC_MEM,
+ RET_PTR_TO_DYNPTR_MEM_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_ALLOC_MEM,
RET_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
/* This must be the last entry. Its purpose is to ensure the enum is
@@ -2225,6 +2235,9 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
extern const struct bpf_func_proto bpf_ringbuf_query_proto;
+extern const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto;
extern const struct bpf_func_proto bpf_skc_to_tcp6_sock_proto;
extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto;
extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
@@ -2376,4 +2389,33 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
u32 **bin_buf, u32 num_args);
void bpf_bprintf_cleanup(void);
+/* the implementation of the opaque uapi struct bpf_dynptr */
+struct bpf_dynptr_kern {
+ void *data;
+ /* Size represents the number of usable bytes of dynptr data.
+ * If for example the offset is at 4 for a local dynptr whose data is
+ * of type u64, the number of usable bytes is 4.
+ *
+ * The upper 8 bits are reserved. It is as follows:
+ * Bits 0 - 23 = size
+ * Bits 24 - 30 = dynptr type
+ * Bit 31 = whether dynptr is read-only
+ */
+ u32 size;
+ u32 offset;
+} __aligned(8);
+
+enum bpf_dynptr_type {
+ BPF_DYNPTR_TYPE_INVALID,
+ /* Points to memory that is local to the bpf program */
+ BPF_DYNPTR_TYPE_LOCAL,
+ /* Underlying data is a ringbuf record */
+ BPF_DYNPTR_TYPE_RINGBUF,
+};
+
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
+ enum bpf_dynptr_type type, u32 offset, u32 size);
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
+int bpf_dynptr_check_size(u32 size);
+
#endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1f1e7f2ea967..e8439f6cbe57 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -72,6 +72,18 @@ struct bpf_reg_state {
u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
+ /* For dynptr stack slots */
+ struct {
+ enum bpf_dynptr_type type;
+ /* A dynptr is 16 bytes so it takes up 2 stack slots.
+ * We need to track which slot is the first slot
+ * to protect against cases where the user may try to
+ * pass in an address starting at the second slot of the
+ * dynptr.
+ */
+ bool first_slot;
+ } dynptr;
+
/* Max size from any of the above. */
struct {
unsigned long raw1;
@@ -88,6 +100,8 @@ struct bpf_reg_state {
* for the purpose of tracking that it's freed.
* For PTR_TO_SOCKET this is used to share which pointers retain the
* same reference to the socket, to determine proper reference freeing.
+ * For stack slots that are dynptrs, this is used to track references to
+ * the dynptr to determine proper reference freeing.
*/
u32 id;
/* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned
@@ -174,9 +188,15 @@ enum bpf_stack_slot_type {
STACK_SPILL, /* register spilled into stack */
STACK_MISC, /* BPF program wrote some data into this slot */
STACK_ZERO, /* BPF program wrote constant zero */
+ /* A dynptr is stored in this stack slot. The type of dynptr
+ * is stored in bpf_stack_state->spilled_ptr.dynptr.type
+ */
+ STACK_DYNPTR,
};
#define BPF_REG_SIZE 8 /* size of eBPF register in bytes */
+#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern)
+#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE)
struct bpf_stack_state {
struct bpf_reg_state spilled_ptr;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 56688bee20d9..f4009dbdf62d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5178,6 +5178,77 @@ union bpf_attr {
* Dynamically cast a *sk* pointer to a *mptcp_sock* pointer.
* Return
* *sk* if casting is valid, or **NULL** otherwise.
+ *
+ * long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ * Description
+ * Get a dynptr to local memory *data*.
+ *
+ * *data* must be a ptr to a map value.
+ * The maximum *size* supported is DYNPTR_MAX_SIZE.
+ * *flags* is currently unused.
+ * Return
+ * 0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
+ * -EINVAL if flags is not 0.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ * Description
+ * Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ * through the dynptr interface. *flags* must be 0.
+ *
+ * Please note that a corresponding bpf_ringbuf_submit_dynptr or
+ * bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the
+ * reservation fails. This is enforced by the verifier.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ * Description
+ * Submit reserved ring buffer sample, pointed to by *data*,
+ * through the dynptr interface. This is a no-op if the dynptr is
+ * invalid/null.
+ *
+ * For more information on *flags*, please see
+ * 'bpf_ringbuf_submit'.
+ * Return
+ * Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ * Description
+ * Discard reserved ring buffer sample through the dynptr
+ * interface. This is a no-op if the dynptr is invalid/null.
+ *
+ * For more information on *flags*, please see
+ * 'bpf_ringbuf_discard'.
+ * Return
+ * Nothing. Always succeeds.
+ *
+ * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
+ * Description
+ * Read *len* bytes from *src* into *dst*, starting from *offset*
+ * into *src*.
+ * Return
+ * 0 on success, -E2BIG if *offset* + *len* exceeds the length
+ * of *src*'s data, -EINVAL if *src* is an invalid dynptr.
+ *
+ * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
+ * Description
+ * Write *len* bytes from *src* into *dst*, starting from *offset*
+ * into *dst*.
+ * Return
+ * 0 on success, -E2BIG if *offset* + *len* exceeds the length
+ * of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst*
+ * is a read-only dynptr.
+ *
+ * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
+ * Description
+ * Get a pointer to the underlying dynptr data.
+ *
+ * *len* must be a statically known value. The returned data slice
+ * is invalidated whenever the dynptr is invalidated.
+ * Return
+ * Pointer to the underlying dynptr data, NULL if the dynptr is
+ * read-only, if the dynptr is invalid, or if the offset and length
+ * is out of bounds.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5377,6 +5448,13 @@ union bpf_attr {
FN(kptr_xchg), \
FN(map_lookup_percpu_elem), \
FN(skc_to_mptcp_sock), \
+ FN(dynptr_from_mem), \
+ FN(ringbuf_reserve_dynptr), \
+ FN(ringbuf_submit_dynptr), \
+ FN(ringbuf_discard_dynptr), \
+ FN(dynptr_read), \
+ FN(dynptr_write), \
+ FN(dynptr_data), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6528,6 +6606,11 @@ struct bpf_timer {
__u64 :64;
} __attribute__((aligned(8)));
+struct bpf_dynptr {
+ __u64 :64;
+ __u64 :64;
+} __attribute__((aligned(8)));
+
struct bpf_sysctl {
__u32 write; /* Sysctl is being read (= 0) or written (= 1).
* Allows 1,2,4-byte read, but no write.