From a4dc6a49156b1f8d6e17251ffda17c9e6a5db78a Mon Sep 17 00:00:00 2001
From: Maxime Chevallier <maxime.chevallier@bootlin.com>
Date: Sat, 16 Mar 2019 14:41:30 +0100
Subject: packets: Always register packet sk in the same order

When using fanouts with AF_PACKET, the demux functions such as
fanout_demux_cpu will return an index in the fanout socket array, which
corresponds to the selected socket.

The ordering of this array depends on the order the sockets were added
to a given fanout group, so for FANOUT_CPU this means sockets are bound
to cpus in the order they are configured, which is OK.

However, when stopping then restarting the interface these sockets are
bound to, the sockets are reassigned to the fanout group in the reverse
order, due to the fact that they were inserted at the head of the
interface's AF_PACKET socket list.

This means that traffic that was directed to the first socket in the
fanout group is now directed to the last one after an interface restart.

In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
socket that used to receive traffic from the last CPU after an interface
restart.

This commit introduces a helper to add a socket at the tail of a list,
then uses it to register AF_PACKET sockets.

Note that this changes the order in which sockets are listed in /proc and
with sock_diag.

Fixes: dc99f600698d ("packet: Add fanout support")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net/packet/af_packet.c')

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8376bc1c1508..8754d7c93b84 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3243,7 +3243,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	}
 
 	mutex_lock(&net->packet.sklist_lock);
-	sk_add_node_rcu(sk, &net->packet.sklist);
+	sk_add_node_tail_rcu(sk, &net->packet.sklist);
 	mutex_unlock(&net->packet.sklist_lock);
 
 	preempt_disable();
-- 
cgit v1.2.3


From 18bed89107a400af0d672ec85a270f1545db2569 Mon Sep 17 00:00:00 2001
From: Yoshiki Komachi <komachi.yoshiki@lab.ntt.co.jp>
Date: Mon, 18 Mar 2019 14:39:52 +0900
Subject: af_packet: fix the tx skb protocol in raw sockets with ETH_P_ALL

I am using "protocol ip" filters in TC to manipulate TC flower
classifiers, which are only available with "protocol ip". However,
I faced an issue that packets sent via raw sockets with ETH_P_ALL
did not match the ip filters even if they did satisfy the condition
(e.g., DHCP offer from dhcpd).

I have determined that the behavior was caused by an unexpected
value stored in skb->protocol, namely, ETH_P_ALL instead of ETH_P_IP,
when packets were sent via raw sockets with ETH_P_ALL set.

IMHO, storing ETH_P_ALL in skb->protocol is not appropriate for
packets sent via raw sockets because ETH_P_ALL is not a real ether
type used on wire, but a virtual one.

This patch fixes the tx protocol selection in cases of transmission
via raw sockets created with ETH_P_ALL so that it asks the driver to
extract protocol from the Ethernet header.

Fixes: 75c65772c3 ("net/packet: Ask driver for protocol if not provided by user")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@lab.ntt.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/packet/af_packet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'net/packet/af_packet.c')

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8754d7c93b84..323655a25674 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1852,7 +1852,8 @@ oom:
 
 static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
 {
-	if (!skb->protocol && sock->type == SOCK_RAW) {
+	if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) &&
+	    sock->type == SOCK_RAW) {
 		skb_reset_mac_header(skb);
 		skb->protocol = dev_parse_header_protocol(skb);
 	}
-- 
cgit v1.2.3


From 398f0132c14754fcd03c1c4f8e7176d001ce8ea1 Mon Sep 17 00:00:00 2001
From: Christoph Paasch <cpaasch@apple.com>
Date: Mon, 18 Mar 2019 23:14:52 -0700
Subject: net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec

Since commit fc62814d690c ("net/packet: fix 4gb buffer limit due to overflow check")
one can now allocate packet ring buffers >= UINT_MAX. However, syzkaller
found that that triggers a warning:

[   21.100000] WARNING: CPU: 2 PID: 2075 at mm/page_alloc.c:4584 __alloc_pages_nod0
[   21.101490] Modules linked in:
[   21.101921] CPU: 2 PID: 2075 Comm: syz-executor.0 Not tainted 5.0.0 #146
[   21.102784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   21.103887] RIP: 0010:__alloc_pages_nodemask+0x2a0/0x630
[   21.104640] Code: fe ff ff 65 48 8b 04 25 c0 de 01 00 48 05 90 0f 00 00 41 bd 01 00 00 00 48 89 44 24 48 e9 9c fe 3
[   21.107121] RSP: 0018:ffff88805e1cf920 EFLAGS: 00010246
[   21.107819] RAX: 0000000000000000 RBX: ffffffff85a488a0 RCX: 0000000000000000
[   21.108753] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
[   21.109699] RBP: 1ffff1100bc39f28 R08: ffffed100bcefb67 R09: ffffed100bcefb67
[   21.110646] R10: 0000000000000001 R11: ffffed100bcefb66 R12: 000000000000000d
[   21.111623] R13: 0000000000000000 R14: ffff88805e77d888 R15: 000000000000000d
[   21.112552] FS:  00007f7c7de05700(0000) GS:ffff88806d100000(0000) knlGS:0000000000000000
[   21.113612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.114405] CR2: 000000000065c000 CR3: 000000005e58e006 CR4: 00000000001606e0
[   21.115367] Call Trace:
[   21.115705]  ? __alloc_pages_slowpath+0x21c0/0x21c0
[   21.116362]  alloc_pages_current+0xac/0x1e0
[   21.116923]  kmalloc_order+0x18/0x70
[   21.117393]  kmalloc_order_trace+0x18/0x110
[   21.117949]  packet_set_ring+0x9d5/0x1770
[   21.118524]  ? packet_rcv_spkt+0x440/0x440
[   21.119094]  ? lock_downgrade+0x620/0x620
[   21.119646]  ? __might_fault+0x177/0x1b0
[   21.120177]  packet_setsockopt+0x981/0x2940
[   21.120753]  ? __fget+0x2fb/0x4b0
[   21.121209]  ? packet_release+0xab0/0xab0
[   21.121740]  ? sock_has_perm+0x1cd/0x260
[   21.122297]  ? selinux_secmark_relabel_packet+0xd0/0xd0
[   21.123013]  ? __fget+0x324/0x4b0
[   21.123451]  ? selinux_netlbl_socket_setsockopt+0x101/0x320
[   21.124186]  ? selinux_netlbl_sock_rcv_skb+0x3a0/0x3a0
[   21.124908]  ? __lock_acquire+0x529/0x3200
[   21.125453]  ? selinux_socket_setsockopt+0x5d/0x70
[   21.126075]  ? __sys_setsockopt+0x131/0x210
[   21.126533]  ? packet_release+0xab0/0xab0
[   21.127004]  __sys_setsockopt+0x131/0x210
[   21.127449]  ? kernel_accept+0x2f0/0x2f0
[   21.127911]  ? ret_from_fork+0x8/0x50
[   21.128313]  ? do_raw_spin_lock+0x11b/0x280
[   21.128800]  __x64_sys_setsockopt+0xba/0x150
[   21.129271]  ? lockdep_hardirqs_on+0x37f/0x560
[   21.129769]  do_syscall_64+0x9f/0x450
[   21.130182]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

We should allocate with __GFP_NOWARN to handle this.

Cc: Kal Conley <kal.conley@dectris.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Fixes: fc62814d690c ("net/packet: fix 4gb buffer limit due to overflow check")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net/packet/af_packet.c')

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 323655a25674..9419c5cf4de5 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4210,7 +4210,7 @@ static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
 	struct pgv *pg_vec;
 	int i;
 
-	pg_vec = kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL);
+	pg_vec = kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL | __GFP_NOWARN);
 	if (unlikely(!pg_vec))
 		goto out;
 
-- 
cgit v1.2.3