Merge branch 'bpf-cap'

Alexei Starovoitov says: ==================== v6->v7: - permit SK_REUSEPORT program type under CAP_BPF as suggested by Marek Majkowski. It's equivalent to SOCKET_FILTER which is unpriv. v5->v6: - split allow_ptr_leaks into four flags. - retain bpf_jit_limit under cap_sys_admin. - fixed few other issues spotted by Daniel. v4->v5: Split BPF operations that are allowed under CAP_SYS_ADMIN into combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN and keep some of them under CAP_SYS_ADMIN. The user process has to have - CAP_BPF to create maps, do other sys_bpf() commands and load SK_REUSEPORT progs. Note: dev_map, sock_hash, sock_map map types still require CAP_NET_ADMIN. That could be relaxed in the future. - CAP_BPF and CAP_PERFMON to load tracing programs. - CAP_BPF and CAP_NET_ADMIN to load networking programs. (or CAP_SYS_ADMIN for backward compatibility). CAP_BPF solves three main goals: 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF. More on this below. This is the major difference vs v4 set back from Sep 2019. 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN prevents pointer leaks and arbitrary kernel memory access. 3. enables fuzzers to exercise all of the verifier logic. Eventually finding bugs and making BPF infra more secure. Currently fuzzers run in unpriv. They will be able to run with CAP_BPF. The patchset is long overdue follow-up from the last plumbers conference. Comparing to what was discussed at LPC the CAP* checks at attach time are gone. For tracing progs the CAP_SYS_ADMIN check was done at load time only. There was no check at attach time. For networking and cgroup progs CAP_SYS_ADMIN was required at load time and CAP_NET_ADMIN at attach time, but there are several ways to bypass CAP_NET_ADMIN: - if networking prog is using tail_call writing FD into prog_array will effectively attach it, but bpf_map_update_elem is an unprivileged operation. - freplace prog with CAP_SYS_ADMIN can replace networking prog Consolidating all CAP checks at load time makes security model similar to open() syscall. Once the user got an FD it can do everything with it. read/write/poll don't check permissions. The same way when bpf_prog_load command returns an FD the user can do everything (including attaching, detaching, and bpf_test_run). The important design decision is to allow ID->FD transition for CAP_SYS_ADMIN only. What it means that user processes can run with CAP_BPF and CAP_NET_ADMIN and they will not be able to affect each other unless they pass FDs via scm_rights or via pinning in bpffs. ID->FD is a mechanism for human override and introspection. An admin can do 'sudo bpftool prog ...'. It's possible to enforce via LSM that only bpftool binary does bpf syscall with CAP_SYS_ADMIN and the rest of user space processes do bpf syscall with CAP_BPF isolating bpf objects (progs, maps, links) that are owned by such processes from each other. Another significant change from LPC is that the verifier checks are split into four flags. The allow_ptr_leaks flag allows pointer manipulations. The bpf_capable flag enables all modern verifier features like bpf-to-bpf calls, BTF, bounded loops, dead code elimination, etc. All the goodness. The bypass_spec_v1 flag enables indirect stack access from bpf programs and disables speculative analysis and bpf array mitigations. The bypass_spec_v4 flag disables store sanitation. That allows networking progs with CAP_BPF + CAP_NET_ADMIN enjoy modern verifier features while being more secure. Some networking progs may need CAP_BPF + CAP_NET_ADMIN + CAP_PERFMON, since subtracting pointers (like skb->data_end - skb->data) is a pointer leak, but the verifier may get smarter in the future. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
author: Daniel Borkmann <daniel@iogearbox.net> 2020-05-15 17:29:42 +0200
committer: Daniel Borkmann <daniel@iogearbox.net> 2020-05-15 17:29:46 +0200
commit: ed24a7a852b542911479383d5c80b9a2b4bb8caa (patch)
tree: 6315790b1ff6943b35aab55167ec86996184b770 /include
parent: 0ee52c0f6c67e187ff1906f6048af7c96df320c7 (diff)
parent: 81626001187609b9c49696a5b48d5abcf0e5f9be (diff)
download: linux-ed24a7a852b542911479383d5c80b9a2b4bb8caa.tar.bz2
4 files changed, 58 insertions, 2 deletions
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c45d198ac38c..efe8836b5c48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -19,6 +19,7 @@
 #include <linux/mutex.h>
 #include <linux/module.h>
 #include <linux/kallsyms.h>
+#include <linux/capability.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -119,7 +120,7 @@ struct bpf_map {
 	struct bpf_map_memory memory;
 	char name[BPF_OBJ_NAME_LEN];
 	u32 btf_vmlinux_value_type_id;
-	bool unpriv_array;
+	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
 	/* 22 bytes hole */
 
@@ -1095,6 +1096,21 @@ struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
 
 extern int sysctl_unprivileged_bpf_disabled;
 
+static inline bool bpf_allow_ptr_leaks(void)
+{
+	return perfmon_capable();
+}
+
+static inline bool bpf_bypass_spec_v1(void)
+{
+	return perfmon_capable();
+}
+
+static inline bool bpf_bypass_spec_v4(void)
+{
+	return perfmon_capable();
+}
+
 int bpf_map_new_fd(struct bpf_map *map, int flags);
 int bpf_prog_new_fd(struct bpf_prog *prog);
 
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 6abd5a778fcd..ea833087e853 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -375,6 +375,9 @@ struct bpf_verifier_env {
 	u32 used_map_cnt;		/* number of used maps */
 	u32 id_gen;			/* used to generate unique reg IDs */
 	bool allow_ptr_leaks;
+	bool bpf_capable;
+	bool bypass_spec_v1;
+	bool bypass_spec_v4;
 	bool seen_direct_write;
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	const struct bpf_line_info *prev_linfo;
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 027d7e4a853b..b4345b38a6be 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -256,6 +256,11 @@ static inline bool perfmon_capable(void)
 	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
 }
 
+static inline bool bpf_capable(void)
+{
+	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
+}
+
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
 
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index e58c9636741b..c7372180a0a9 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -274,6 +274,7 @@ struct vfs_ns_cap_data {
    arbitrary SCSI commands */
 /* Allow setting encryption key on loopback filesystem */
 /* Allow setting zone reclaim policy */
+/* Allow everything under CAP_BPF and CAP_PERFMON for backward compatibility */
 
 #define CAP_SYS_ADMIN        21
 
@@ -374,7 +375,38 @@ struct vfs_ns_cap_data {
 
 #define CAP_PERFMON		38
 
-#define CAP_LAST_CAP         CAP_PERFMON
+/*
+ * CAP_BPF allows the following BPF operations:
+ * - Creating all types of BPF maps
+ * - Advanced verifier features
+ *   - Indirect variable access
+ *   - Bounded loops
+ *   - BPF to BPF function calls
+ *   - Scalar precision tracking
+ *   - Larger complexity limits
+ *   - Dead code elimination
+ *   - And potentially other features
+ * - Loading BPF Type Format (BTF) data
+ * - Retrieve xlated and JITed code of BPF programs
+ * - Use bpf_spin_lock() helper
+ *
+ * CAP_PERFMON relaxes the verifier checks further:
+ * - BPF progs can use of pointer-to-integer conversions
+ * - speculation attack hardening measures are bypassed
+ * - bpf_probe_read to read arbitrary kernel memory is allowed
+ * - bpf_trace_printk to print kernel memory is allowed
+ *
+ * CAP_SYS_ADMIN is required to use bpf_probe_write_user.
+ *
+ * CAP_SYS_ADMIN is required to iterate system wide loaded
+ * programs, maps, links, BTFs and convert their IDs to file descriptors.
+ *
+ * CAP_PERFMON and CAP_BPF are required to load tracing programs.
+ * CAP_NET_ADMIN and CAP_BPF are required to load networking programs.
+ */
+#define CAP_BPF			39
+
+#define CAP_LAST_CAP         CAP_BPF
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
author	Daniel Borkmann <daniel@iogearbox.net>	2020-05-15 17:29:42 +0200
committer	Daniel Borkmann <daniel@iogearbox.net>	2020-05-15 17:29:46 +0200
commit	ed24a7a852b542911479383d5c80b9a2b4bb8caa (patch)
tree	6315790b1ff6943b35aab55167ec86996184b770 /include
parent	0ee52c0f6c67e187ff1906f6048af7c96df320c7 (diff)
parent	81626001187609b9c49696a5b48d5abcf0e5f9be (diff)
download	linux-ed24a7a852b542911479383d5c80b9a2b4bb8caa.tar.bz2