Merge branch 'bpf-sockmap-tls-fixes'

John Fastabend says:

====================
To date our usage of sockmap/tls has been fairly simple: the BPF programs did only well-defined pop, push, pull and apply/cork operations. Now that we have started to push more complex programs into sockmap, we have uncovered a series of issues, which are addressed here. Further, OpenSSL 3.0 should be released soon with kTLS support, so it's important to get any remaining issues in BPF and kTLS support resolved.

Additionally, I have a patch under development to allow sockmap to be enabled/disabled at runtime for Cilium endpoints. This allows us to stress the map insert/delete with kTLS more than previously, where Cilium only added the socket to the map when it entered ESTABLISHED state and never touched it from the control path side again, relying on the socket's own close() hook to remove it.

To test, I have a set of test cases in test_sockmap.c that expose these issues. Once we get the fixes here merged and in bpf-next, I'll submit the tests to the bpf-next tree to ensure we don't regress again. Also, I've run these patches in the Cilium CI with OpenSSL (master branch); this runs tools such as netperf, ab, wrk2, curl, etc. to get a broad set of testing.

I'm aware of two more issues that we are working to resolve in another couple (probably two) patches. First, we see an auth tag corruption in kTLS when sending small 1-byte chunks under stress. I've not pinned this down yet, but since it shows up under the 1-byte stress tests, I'm guessing some error path is being triggered. Second, we need to ensure BPF RX programs are not skipped when the kTLS ULP is loaded. This breaks some of the sockmap selftests when running with kTLS. I'll send a follow-up for this.

v2: I dropped a patch that added a !0 size check in tls_push_record. It originated from a panic I caught a while ago with a trace in the crypto stack, but I cannot reproduce it anymore, so I will dig into that and send another patch later if needed. Anyway, after a bit of thought, it would be nicer if tls/crypto/bpf didn't require special-case handling for the !0 size.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
author: Daniel Borkmann <daniel@iogearbox.net> 2020-01-15 23:26:14 +0100
committer: Daniel Borkmann <daniel@iogearbox.net> 2020-01-15 23:26:23 +0100
commit: 85ddd9c3173102930c16b0cfe8dbb771af434532 (patch)
tree: bf73100781a6b1416cb714f3d72321f247bf44ea /net/tls
parent: 0af2ffc93a4b50948f9dad2786b7f1bd253bf0b9 (diff)
parent: 7361d44896ff20d48bdd502d1a0cd66308055d45 (diff)
download: linux-85ddd9c3173102930c16b0cfe8dbb771af434532.tar.bz2
2 files changed, 34 insertions, 7 deletions
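
The first tls_sw.c hunk in the diff that follows adds a check to tls_push_record() that forces a split whenever the plaintext plus the per-record overhead would not fit in the encryption buffer. As a rough illustration only, the decision can be modelled in userspace as below. The struct and function names (split_decision, decide_split) and the plain size_t parameters are hypothetical stand-ins for msg_pl->apply_bytes, msg_pl->sg.size, msg_en->sg.size and prot->overhead_size; this is a sketch of the condition, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: whether to split and at which byte offset. */
struct split_decision {
	bool split;
	size_t split_point;
};

/*
 * Userspace model of the split decision (names are illustrative):
 *   apply_bytes ~ msg_pl->apply_bytes
 *   pl_size     ~ msg_pl->sg.size (plaintext bytes queued)
 *   en_size     ~ msg_en->sg.size (encryption buffer bytes)
 *   overhead    ~ prot->overhead_size
 */
static struct split_decision decide_split(size_t apply_bytes, size_t pl_size,
					  size_t en_size, size_t overhead)
{
	struct split_decision d;

	/* Pre-existing rule: split only when apply_bytes covers part of the data. */
	d.split_point = apply_bytes;
	d.split = apply_bytes && apply_bytes < pl_size;

	/* Added rule: if the chosen span plus overhead does not fit in the
	 * encryption buffer, force a split at the encryption buffer size.
	 */
	if ((!d.split && pl_size + overhead > en_size) ||
	    (d.split && d.split_point + overhead > en_size)) {
		d.split = true;
		d.split_point = en_size;
	}
	return d;
}
```

For example, with 100 plaintext bytes, no apply_bytes, an overhead of 29 and a 110-byte encryption buffer, the model reports a forced split at offset 110, while a 200-byte encryption buffer leaves the record unsplit.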
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index dac24c7aa7d4..94774c0e5ff3 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -732,15 +732,19 @@ out:
 	return rc;
 }
 
-static void tls_update(struct sock *sk, struct proto *p)
+static void tls_update(struct sock *sk, struct proto *p,
+		       void (*write_space)(struct sock *sk))
 {
 	struct tls_context *ctx;
 
 	ctx = tls_get_ctx(sk);
-	if (likely(ctx))
+	if (likely(ctx)) {
+		ctx->sk_write_space = write_space;
 		ctx->sk_proto = p;
-	else
+	} else {
 		sk->sk_prot = p;
+		sk->sk_write_space = write_space;
+	}
 }
 
 static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index c6803a82b769..159d49dab403 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -682,12 +682,32 @@ static int tls_push_record(struct sock *sk, int flags,
 
 	split_point = msg_pl->apply_bytes;
 	split = split_point && split_point < msg_pl->sg.size;
+	if (unlikely((!split &&
+		      msg_pl->sg.size +
+		      prot->overhead_size > msg_en->sg.size) ||
+		     (split &&
+		      split_point +
+		      prot->overhead_size > msg_en->sg.size))) {
+		split = true;
+		split_point = msg_en->sg.size;
+	}
 	if (split) {
 		rc = tls_split_open_record(sk, rec, &tmp, msg_pl, msg_en,
 					   split_point, prot->overhead_size,
 					   &orig_end);
 		if (rc < 0)
 			return rc;
+		/* This can happen if above tls_split_open_record allocates
+		 * a single large encryption buffer instead of two smaller
+		 * ones. In this case adjust pointers and continue without
+		 * split.
+		 */
+		if (!msg_pl->sg.size) {
+			tls_merge_open_record(sk, rec, tmp, orig_end);
+			msg_pl = &rec->msg_plaintext;
+			msg_en = &rec->msg_encrypted;
+			split = false;
+		}
 		sk_msg_trim(sk, msg_en, msg_pl->sg.size +
 			    prot->overhead_size);
 	}
@@ -709,6 +729,12 @@ static int tls_push_record(struct sock *sk, int flags,
 		sg_mark_end(sk_msg_elem(msg_pl, i));
 	}
 
+	if (msg_pl->sg.end < msg_pl->sg.start) {
+		sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
+			 MAX_SKB_FRAGS - msg_pl->sg.start + 1,
+			 msg_pl->sg.data);
+	}
+
 	i = msg_pl->sg.start;
 	sg_chain(rec->sg_aead_in, 2, &msg_pl->sg.data[i]);
 
@@ -783,10 +809,7 @@ more_data:
 	if (psock->eval == __SK_NONE) {
 		delta = msg->sg.size;
 		psock->eval = sk_psock_msg_verdict(sk, psock, msg);
-		if (delta < msg->sg.size)
-			delta -= msg->sg.size;
-		else
-			delta = 0;
+		delta -= msg->sg.size;
 	}
 	if (msg->cork_bytes && msg->cork_bytes > msg->sg.size &&
 	    !enospc && !full_record) {
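
The last hunk above replaces a conditional delta computation with a plain subtraction. A minimal userspace sketch of the two variants follows, assuming delta is meant to capture how much the verdict program changed the message size. The function names delta_old and delta_new are hypothetical, and signed long is used here purely for readability; the kernel variable's actual type is not shown in this diff.

```c
/*
 * Model of the removed logic: the difference was only recorded when the
 * message grew during the verdict; a message that shrank yielded 0.
 */
static long delta_old(long size_before, long size_after)
{
	long delta = size_before;

	if (delta < size_after)
		delta -= size_after;
	else
		delta = 0;
	return delta;
}

/*
 * Model of the replacement: always record size_before - size_after, so
 * bytes removed by the program give a positive delta and bytes added
 * give a negative one.
 */
static long delta_new(long size_before, long size_after)
{
	return size_before - size_after;
}
```

In this model, a message that shrinks from 100 to 60 bytes gives delta_old() == 0 but delta_new() == 40, which is the case the old branch dropped. Both variants agree when the message grows.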