summaryrefslogtreecommitdiffstats
path: root/net/ipv6
AgeCommit message (Collapse)AuthorFilesLines
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner22-91/+22
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-10/+12
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Module autoload for masquerade and redirection does not work. 2) Leak in unqueued packets in nf_ct_frag6_queue(). Ignore duplicated fragments, pretend they are placed into the queue. Patches from Guillaume Nault. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ipv6: Stop sending in-kernel notifications for each nexthopIdo Schimmel1-12/+17
Both listeners - mlxsw and netdevsim - of IPv6 FIB notifications are now ready to handle IPv6 multipath notifications. Therefore, stop ignoring such notifications in both drivers and stop sending notification for each added / deleted nexthop. v2: * Remove 'multipath_rt' from 'struct fib6_entry_notifier_info' Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ipv6: Add IPv6 multipath notification for route deleteIdo Schimmel1-0/+6
If all the nexthops of a multipath route are being deleted, send one notification for the entire route, instead of one per-nexthop. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ipv6: Add IPv6 multipath notifications for add / replaceIdo Schimmel1-0/+15
Emit a notification when a multipath routes is added or replace. Note that unlike the replace notifications sent from fib6_add_rt2node(), it is possible we are sending a 'FIB_EVENT_ENTRY_REPLACE' when a route was merely added and not replaced. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ipv6: Extend notifier info for multipath routesIdo Schimmel1-0/+17
Extend the IPv6 FIB notifier info with number of sibling routes being notified. This will later allow listeners to process one notification for a multipath routes instead of N, where N is the number of nexthops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-35/+27
Honestly all the conflicts were simple overlapping changes, nothing really interesting to report. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds5-14/+24
Pull networking fixes from David Miller: "Lots of bug fixes here: 1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer. 2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John Crispin. 3) Use after free in psock backlog workqueue, from John Fastabend. 4) Fix source port matching in fdb peer flow rule of mlx5, from Raed Salem. 5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet. 6) Network header needs to be set for packet redirect in nfp, from John Hurley. 7) Fix udp zerocopy refcnt, from Willem de Bruijn. 8) Don't assume linear buffers in vxlan and geneve error handlers, from Stefano Brivio. 9) Fix TOS matching in mlxsw, from Jiri Pirko. 10) More SCTP cookie memory leak fixes, from Neil Horman. 11) Fix VLAN filtering in rtl8366, from Linus Walluij. 12) Various TCP SACK payload size and fragmentation memory limit fixes from Eric Dumazet. 13) Use after free in pneigh_get_next(), also from Eric Dumazet. 14) LAPB control block leak fix from Jeremy Sowden" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits) lapb: fixed leak of control-blocks. tipc: purge deferredq list for each grp member in tipc_group_delete ax25: fix inconsistent lock state in ax25_destroy_timer neigh: fix use-after-free read in pneigh_get_next tcp: fix compile error if !CONFIG_SYSCTL hv_sock: Suppress bogus "may be used uninitialized" warnings be2net: Fix number of Rx queues used for flow hashing net: handle 802.1P vlan 0 packets properly tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() tcp: add tcp_min_snd_mss sysctl tcp: tcp_fragment() should apply sane memory limits tcp: limit payload size of sacked skbs Revert "net: phylink: set the autoneg state in phylink_phy_change" bpf: fix nested bpf tracepoints with per-cpu data bpf: Fix out of bounds memory access in bpf_sk_storage vsock/virtio: set SOCK_DONE on peer shutdown net: dsa: rtl8366: Fix up VLAN filtering net: phylink: set the autoneg state in phylink_phy_change net: add high_order_alloc_disable sysctl/static key tcp: add tcp_tx_skb_cache sysctl ...
2019-06-17netfilter: synproxy: extract SYNPROXY infrastructure from {ipt, ip6t}_SYNPROXYFernando Fernandez Mancera1-411/+9
Add common functions into nf_synproxy_core.c to prepare for nftables support. The prototypes of the functions used by {ipt, ip6t}_SYNPROXY are in the new file nf_synproxy.h Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-17netfilter: synproxy: remove module dependency on IPv6 SYNPROXYFernando Fernandez Mancera1-0/+2
This is a prerequisite for the infrastructure module NETFILTER_SYNPROXY. The new module is needed to avoid duplicated code for the SYNPROXY nftables support. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-14udp: Remove unused variable/function (exact_dif)Tim Beale1-11/+0
This was originally passed through to the VRF logic in compute_score(). But that logic has now been replaced by udp_sk_bound_dev_eq() and so this code is no longer used or needed. Signed-off-by: Tim Beale <timbeale@catalyst.net.nz> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14udp: Remove unused parameter (exact_dif)Tim Beale1-7/+6
Originally this was used by the VRF logic in compute_score(), but that was later replaced by udp_sk_bound_dev_eq() and the parameter became unused. Note this change adds an 'unused variable' compiler warning that will be removed in the next patch (I've split the removal in two to make review slightly easier). Signed-off-by: Tim Beale <timbeale@catalyst.net.nz> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14ipv4: tcp: fix ACK/RST sent with a transmit delayEric Dumazet1-1/+1
If we want to set a EDT time for the skb we want to send via ip_send_unicast_reply(), we have to pass a new parameter and initialize ipc.sockc.transmit_time with it. This fixes the EDT time for ACK/RST packets sent on behalf of a TIME_WAIT socket. Fixes: a842fe1425cb ("tcp: add optional per socket transmit delay") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-14docs: kbuild: convert docs to ReST and rename to *.rstMauro Carvalho Chehab1-1/+1
The kbuild documentation clearly shows that the documents there are written at different times: some use markdown, some use their own peculiar logic to split sections. Convert everything to ReST without affecting too much the author's style and avoiding adding uneeded markups. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-06-12tcp: add optional per socket transmit delayEric Dumazet1-0/+1
Adding delays to TCP flows is crucial for studying behavior of TCP stacks, including congestion control modules. Linux offers netem module, but it has unpractical constraints : - Need root access to change qdisc - Hard to setup on egress if combined with non trivial qdisc like FQ - Single delay for all flows. EDT (Earliest Departure Time) adoption in TCP stack allows us to enable a per socket delay at a very small cost. Networking tools can now establish thousands of flows, each of them with a different delay, simulating real world conditions. This requires FQ packet scheduler or a EDT-enabled NIC. This patchs adds TCP_TX_DELAY socket option, to set a delay in usec units. unsigned int tx_delay = 10000; /* 10 msec */ setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay)); Note that FQ packet scheduler limits might need some tweaking : man tc-fq PARAMETERS limit Hard limit on the real queue size. When this limit is reached, new packets are dropped. If the value is lowered, packets are dropped so that the new limit is met. Default is 10000 packets. flow_limit Hard limit on the maximum number of packets queued per flow. Default value is 100. Use of TCP_TX_DELAY option will increase number of skbs in FQ qdisc, so packets would be dropped if any of the previous limit is hit. Use of a jump label makes this support runtime-free, for hosts never using the option. Also note that TSQ (TCP Small Queues) limits are slightly changed with this patch : we need to account that skbs artificially delayed wont stop us providind more skbs to feed the pipe (netem uses skb_orphan_partial() for this purpose, but FQ can not use this trick) Because of that, using big delays might very well trigger old bugs in TSO auto defer logic and/or sndbuf limited detection. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12vrf: Increment Icmp6InMsgs on the original netdevStephen Suryaputra2-8/+13
Get the ingress interface and increment ICMP counters based on that instead of skb->dev when the the dev is a VRF device. This is a follow up on the following message: https://www.spinics.net/lists/netdev/msg560268.html v2: Avoid changing skb->dev since it has unintended effect for local delivery (David Ahern). Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-11net: correct udp zerocopy refcnt also when zerocopy only on appendWillem de Bruijn1-1/+1
The below patch fixes an incorrect zerocopy refcnt increment when appending with MSG_MORE to an existing zerocopy udp skb. send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt 1 send(.., MSG_ZEROCOPY | MSG_MORE); // refcnt still 1 (bar frags) But it missed that zerocopy need not be passed at the first send. The right test whether the uarg is newly allocated and thus has extra refcnt 1 is not !skb, but !skb_zcopy. send(.., MSG_MORE); // <no uarg> send(.., MSG_ZEROCOPY); // refcnt 1 Fixes: 100f6d8e09905 ("net: correct zerocopy refcnt with udp MSG_MORE") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Allow routes to use nexthop objectsDavid Ahern1-8/+81
Add support for RTA_NH_ID attribute to allow a user to specify a nexthop id to use with a route. fc_nh_id is added to fib6_config to hold the value passed in the RTA_NH_ID attribute. If a nexthop id is given, the gateway, device, encap and multipath attributes can not be set. Update ip6_route_del to check metric and protocol before nexthop specs. If fc_nh_id is set, then it must match the id in the route entry. Since IPv6 allows delete of a cached entry (an exception), add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in a fib entry if it is using a nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in mtu updatesDavid Ahern1-1/+28
Use nexthop_for_each_fib6_nh to call fib6_nh_mtu_change for each fib6_nh in a nexthop for rt6_mtu_change_route. For __ip6_rt_update_pmtu, we need to find the nexthop that correlates to the device and gateway in the rt6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in rt6_do_redirectDavid Ahern1-1/+19
Use nexthop_for_each_fib6_nh and fib6_nh_find_match to find the fib6_nh in a nexthop that correlates to the device and gateway in the rt6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in __ip6_route_redirectDavid Ahern1-4/+35
Add a hook in __ip6_route_redirect to handle a nexthop struct in a fib6_info. Use nexthop_for_each_fib6_nh and fib6_nh_redirect_match to call ip6_redirect_nh_match for each fib6_nh looking for a match. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in exception handlingDavid Ahern1-3/+108
Add a hook in rt6_flush_exceptions, rt6_remove_exception_rt, rt6_update_exception_stamp_rt, and rt6_age_exceptions to handle nexthop struct in a fib6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in fib6_info_uses_devDavid Ahern1-0/+18
Add a hook in fib6_info_uses_dev to handle nexthop struct in a fib6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in rt6_nlmsg_sizeDavid Ahern1-12/+37
Add a hook in rt6_nlmsg_size to handle nexthop struct in a fib6_info. rt6_nh_nlmsg_size is used to sum the space needed for all nexthops in the fib entry. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in __find_rr_leafDavid Ahern1-2/+47
Add a hook in __find_rr_leaf to handle nexthop struct in a fib6_info. nexthop_for_each_fib6_nh is used to walk each fib6_nh in a nexthop and call find_match. On a match, use the fib6_nh saved in the callback arg to setup fib6_result. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in rt6_device_matchDavid Ahern1-2/+52
Add a hook in rt6_device_match to handle nexthop struct in a fib6_info. The new rt6_nh_dev_match uses nexthop_for_each_fib6_nh to walk each fib6_nh in a nexthop and call __rt6_device_match. On match, rt6_nh_dev_match returns the fib6_nh and rt6_device_match uses it to setup fib6_result. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in fib6_drop_pcpu_fromDavid Ahern1-4/+27
Use nexthop_for_each_fib6_nh to walk all fib6_nh in a nexthop when dropping 'from' reference in pcpu routes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10Update my email addressJozsef Kadlecsik1-1/+1
It's better to use my kadlec@netfilter.org email address in the source code. I might not be able to use kadlec@blackhole.kfki.hu in the future. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2019-06-09ipv6: tcp: send consistent autoflowlabel in TIME_WAIT stateEric Dumazet1-3/+10
In case autoflowlabel is in action, skb_get_hash_flowi6() derives a non zero skb->hash to the flowlabel. If skb->hash is zero, a flow dissection is performed. Since all TCP skbs sent from ESTABLISH state inherit their skb->hash from sk->sk_txhash, we better keep a copy of sk->sk_txhash into the TIME_WAIT socket. After this patch, ACK or RST packets sent on behalf of a TIME_WAIT socket have the flowlabel that was previously used by the flow. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()Eric Dumazet1-1/+1
syzbot found a crash in tcp_v6_send_reset() caused by my latest change. Problem is that if an skb has been queued to socket prequeue, skb_dst(skb)->dev can not anymore point to the device. Fortunately in this case the socket pointer is not NULL. A similar issue has been fixed in commit 0f85feae6b71 ("tcp: fix more NULL deref after prequeue changes"), I should have known better. Fixes: 323a53c41292 ("ipv6: tcp: enable flowlabel reflection in some RST packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-09ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zeroEric Dumazet1-3/+4
Before taking a refcount, make sure the object is not already scheduled for deletion. Same fix is needed in ipv6_flowlabel_opt() Fixes: 18367681a10b ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08Merge tag 'spdx-5.2-rc4' of ↵Linus Torvalds3-21/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Another round of SPDX header file fixes for 5.2-rc4 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being added, based on the text in the files. We are slowly chipping away at the 700+ different ways people tried to write the license text. All of these were reviewed on the spdx mailing list by a number of different people. We now have over 60% of the kernel files covered with SPDX tags: $ ./scripts/spdxcheck.py -v 2>&1 | grep Files Files checked: 64533 Files with SPDX: 40392 Files with errors: 0 I think the majority of the "easy" fixups are now done, it's now the start of the longer-tail of crazy variants to wade through" * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429 ...
2019-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-2/+6
Daniel Borkmann says: ==================== pull-request: bpf 2019-06-07 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix several bugs in riscv64 JIT code emission which forgot to clear high 32-bits for alu32 ops, from Björn and Luke with selftests covering all relevant BPF alu ops from Björn and Jiong. 2) Two fixes for UDP BPF reuseport that avoid calling the program in case of __udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption that skb->data is pointing to transport header, from Martin. 3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog workqueue, and a missing restore of sk_write_space when psock gets dropped, from Jakub and John. 4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it breaks standard applications like DNS if reverse NAT is not performed upon receive, from Daniel. 5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6 fails to verify that the length of the tuple is long enough, from Lorenz. 6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for {un,}successful probe) as that is expected to be propagated as an fd to load_sk_storage_btf() and thus closing the wrong descriptor otherwise, from Michal. 7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir. 8) Minor misc fixes in docs, samples and selftests, from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller45-241/+63
Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-7/+18
Pull networking fixes from David Miller: 1) Free AF_PACKET po->rollover properly, from Willem de Bruijn. 2) Read SFP eeprom in max 16 byte increments to avoid problems with some SFP modules, from Russell King. 3) Fix UDP socket lookup wrt. VRF, from Tim Beale. 4) Handle route invalidation properly in s390 qeth driver, from Julian Wiedmann. 5) Memory leak on unload in RDS, from Zhu Yanjun. 6) sctp_process_init leak, from Neil HOrman. 7) Fix fib_rules rule insertion semantic change that broke Android, from Hangbin Liu. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) pktgen: do not sleep with the thread lock held. net: mvpp2: Use strscpy to handle stat strings net: rds: fix memory leak in rds_ib_flush_mr_pool ipv6: fix EFAULT on sendto with icmpv6 and hdrincl ipv6: use READ_ONCE() for inet->hdrincl as in ipv4 Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied" net: aquantia: fix wol configuration not applied sometimes ethtool: fix potential userspace buffer overflow Fix memory leak in sctp_process_init net: rds: fix memory leak when unload rds_rdma ipv6: fix the check before getting the cookie in rt6_get_cookie ipv4: not do cache for local delivery if bc_forwarding is enabled s390/qeth: handle error when updating TX queue count s390/qeth: fix VLAN attribute in bridge_hostnotify udev event s390/qeth: check dst entry before use s390/qeth: handle limited IPv4 broadcast in L3 TX path net: fix indirect calls helpers for ptype list hooks. net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set udp: only choose unbound UDP socket for multicast when not in a VRF net/tls: replace the sleeping lock around RX resync with a bit lock ...
2019-06-07netfilter: ipv6: nf_defrag: accept duplicate fragments againGuillaume Nault1-3/+7
When fixing the skb leak introduced by the conversion to rbtree, I forgot about the special case of duplicate fragments. The condition under the 'insert_error' label isn't effective anymore as nf_ct_frg6_gather() doesn't override the returned value anymore. So duplicate fragments now get NF_DROP verdict. To accept duplicate fragments again, handle them specially as soon as inet_frag_queue_insert() reports them. Return -EINPROGRESS which will translate to NF_STOLEN verdict, like any accepted fragment. However, such packets don't carry any new information and aren't queued, so we just drop them immediately. Fixes: a0d56cb911ca ("netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-06bpf: fix unconnected udp hooksDaniel Borkmann1-0/+4
Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently to applications as also stated in original motivation in 7828f20e3779 ("Merge branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter two hooks into Cilium to enable host based load-balancing with Kubernetes, I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes typically sets up DNS as a service and is thus subject to load-balancing. Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API is currently insufficient and thus not usable as-is for standard applications shipped with most distros. To break down the issue we ran into with a simple example: # cat /etc/resolv.conf nameserver 147.75.207.207 nameserver 147.75.207.208 For the purpose of a simple test, we set up above IPs as service IPs and transparently redirect traffic to a different DNS backend server for that node: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 The attached BPF program is basically selecting one of the backends if the service IP/port matches on the cgroup hook. DNS breaks here, because the hooks are not transparent enough to applications which have built-in msg_name address checks: # nslookup 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ;; connection timed out; no servers could be reached # dig 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; connection timed out; no servers could be reached For comparison, if none of the service IPs is used, and we tell nslookup to use 8.8.8.8 directly it works just fine, of course: # nslookup 1.1.1.1 8.8.8.8 1.1.1.1.in-addr.arpa name = one.one.one.one. In order to fix this and thus act more transparent to the application, this needs reverse translation on recvmsg() side. A minimal fix for this API is to add similar recvmsg() hooks behind the BPF cgroups static key such that the program can track state and replace the current sockaddr_in{,6} with the original service IP. From BPF side, this basically tracks the service tuple plus socket cookie in an LRU map where the reverse NAT can then be retrieved via map value as one example. Side-note: the BPF cgroups static key should be converted to a per-hook static key in future. Same example after this fix: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 Lookups work fine now: # nslookup 1.1.1.1 1.1.1.1.in-addr.arpa name = one.one.one.one. Authoritative answers can be found from: # dig 1.1.1.1 ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;1.1.1.1. IN A ;; AUTHORITY SECTION: . 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400 ;; Query time: 17 msec ;; SERVER: 147.75.207.207#53(147.75.207.207) ;; WHEN: Tue May 21 12:59:38 UTC 2019 ;; MSG SIZE rcvd: 111 And from an actual packet level it shows that we're using the back end server when talking via 147.75.207.20{7,8} front end: # tcpdump -i any udp [...] 12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) [...] In order to be flexible and to have same semantics as in sendmsg BPF programs, we only allow return codes in [1,1] range. In the sendmsg case the program is called if msg->msg_name is present which can be the case in both, connected and unconnected UDP. The former only relies on the sockaddr_in{,6} passed via connect(2) if passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar way to call into the BPF program whenever a non-NULL msg->msg_name was passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note that for TCP case, the msg->msg_name is ignored in the regular recvmsg path and therefore not relevant. For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE, the hook is not called. This is intentional as it aligns with the same semantics as in case of TCP cgroup BPF hooks right now. This might be better addressed in future through a different bpf_attach_type such that this case can be distinguished from the regular recvmsg paths, for example. Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-06ipv6: fix spelling mistake: "wtih" -> "with"Colin Ian King1-1/+1
There is a spelling mistake in a NL_SET_ERR_MSG message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06ipv6: fix EFAULT on sendto with icmpv6 and hdrinclOlivier Matz1-5/+8
The following code returns EFAULT (Bad address): s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6); setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1); sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */ The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW instead of IPPROTO_ICMPV6. The failure happens because 2 bytes are eaten from the msghdr by rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt""), but at that time it was not a problem because IPV6_HDRINCL was not yet introduced. Only eat these 2 bytes if hdrincl == 0. Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets") Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06ipv6: use READ_ONCE() for inet->hdrincl as in ipv4Olivier Matz1-2/+10
As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the value of inet->hdrincl in a local variable, to avoid introducing a race condition in the next commit. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gatherwenxu1-1/+3
CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 is not set ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined! Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-06xfrm: remove type and offload_type map from xfrm_state_afinfoFlorian Westphal5-14/+6
Only a handful of xfrm_types exist, no need to have 512 pointers for them. Reduces size of afinfo struct from 4k to 120 bytes on 64bit platforms. Also, the unregister function doesn't need to return an error, no single caller does anything useful with it. Just place a WARN_ON() where needed instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-06-06xfrm: remove eth_proto value from xfrm_state_afinfoFlorian Westphal1-2/+0
xfrm_prepare_input needs to lookup the state afinfo backend again to fetch the address family ethernet protocol value. There are only two address families, so a switch statement is simpler. While at it, use u8 for family and proto and remove the owner member -- its not used anywhere. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-06-06xfrm: remove state and template sort indirections from xfrm_state_afinfoFlorian Westphal1-98/+0
No module dependency, placing this in xfrm_state.c avoids need for an indirection. This also removes the state spinlock -- I don't see why we would need to hold it during sorting. This in turn allows to remove the 'net' argument passed to xfrm_tmpl_sort. Last, remove the EXPORT_SYMBOL, there are no modular callers. For the CONFIG_IPV6=m case, vmlinux size increase is about 300 byte. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-06-05ipv6: tcp: send consistent flowlabel in TIME_WAIT stateEric Dumazet1-0/+2
After commit 1d13a96c74fc ("ipv6: tcp: fix flowlabel value in ACK messages"), we stored in tw_flowlabel the flowlabel, in the case ACK packets needed to be sent on behalf of a TIME_WAIT socket. We can use the same field so that RST packets sent from TIME_WAIT state also use a consistent flowlabel. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05ipv6: tcp: enable flowlabel reflection in some RST packetsEric Dumazet3-4/+14
When RST packets are sent because no socket could be found, it makes sense to use flowlabel_reflect sysctl to decide if a reflection of the flowlabel is requested. This extends commit 22b6722bfa59 ("ipv6: Add sysctl for per namespace flow label reflection"), for some TCP RST packets. In order to provide full control of this new feature, flowlabel_reflect becomes a bitmask. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05net: ipv6: drop unneeded likely() call around IS_ERR()Enrico Weigelt2-2/+2
IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 422Thomas Gleixner1-4/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 101 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190113.822954939@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 372Thomas Gleixner1-4/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 135 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081036.435762997@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 343Thomas Gleixner1-13/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190530000437.338011816@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>