summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller16-97/+128
2018-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds15-85/+126
Pull networking fixes from David Miller: 1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and Alexander Duyck. 2) Undo unintentional UAPI change in netfilter conntrack, from Florian Westphal. 3) Revert a change to how error codes are returned from dev_get_valid_name(), it broke some apps. 4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6 dual-stack. From Eli Cooper. 5) Fix missed PMTU updates in geneve, from Xin Long. 6) Cure double free in macvlan, from Gao Feng. 7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from Mohamed Ghannam. 8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed deferral of probe when the regulator is not ready yet). 9) Missing DMA mapping error checks in 3c59x, from Neil Horman. 10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli. 11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB() isn't initialized. From Andrei Vagin. 12) Fix crashes in fib6_add(), from Wei Wang. 13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits) sh_eth: fix TXALCR1 offsets mdio-sun4i: Fix a memory leak phylink: mark expected switch fall-throughs in phylink_mii_ioctl sctp: fix the handling of ICMP Frag Needed for too small MTUs sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled xen-netfront: enable device after manual module load bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine. bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc() sh_eth: fix SH7757 GEther initialization net: fec: free/restore resource in related probe error pathes uapi/if_ether.h: prevent redefinition of struct ethhdr ipv6: fix general protection fault in fib6_add() RDS: null pointer dereference in rds_atomic_free_op sh_eth: fix TSU resource handling net: stmmac: enable EEE in MII, GMII or RGMII only rtnetlink: give a user socket to get_target_net() MAINTAINERS: Update my email address. can: ems_usb: improve error reporting for error warning and error passive can: flex_can: Correct the checking for frame length in flexcan_start_xmit() can: gs_usb: fix return value of the "set_bittiming" callback ...
2018-01-08net: tipc: remove unused hardirq.hYang Shi1-1/+0
Preempt counter APIs have been split out, currently, hardirq.h just includes irq_enter/exit APIs which are not used by TIPC at all. So, remove the unused hardirq.h. Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: ovs: remove unused hardirq.hYang Shi1-1/+0
Preempt counter APIs have been split out, currently, hardirq.h just includes irq_enter/exit APIs which are not used by openvswitch at all. So, remove the unused hardirq.h. Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: dev@openvswitch.org Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: caif: remove unused hardirq.hYang Shi2-2/+0
Preempt counter APIs have been split out, currently, hardirq.h just includes irq_enter/exit APIs which are not used by caif at all. So, remove the unused hardirq.h. Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller84-1241/+3370
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree: 1) Free hooks via call_rcu to speed up netns release path, from Florian Westphal. 2) Reduce memory footprint of hook arrays, skip allocation if family is not present - useful in case decnet support is not compiled built-in. Patches from Florian Westphal. 3) Remove defensive check for malformed IPv4 - including ihl field - and IPv6 headers in x_tables and nf_tables. 4) Add generic flow table offload infrastructure for nf_tables, this includes the netlink control plane and support for IPv4, IPv6 and mixed IPv4/IPv6 dataplanes. This comes with NAT support too. This patchset adds the IPS_OFFLOAD conntrack status bit to indicate that this flow has been offloaded. 5) Add secpath matching support for nf_tables, from Florian. 6) Save some code bytes in the fast path for the nf_tables netdev, bridge and inet families. 7) Allow one single NAT hook per point and do not allow to register NAT hooks in nf_tables before the conntrack hook, patches from Florian. 8) Seven patches to remove the struct nf_af_info abstraction, instead we perform direct calls for IPv4 which is faster. IPv6 indirections are still needed to avoid dependencies with the 'ipv6' module, but these now reside in struct nf_ipv6_ops. 9) Seven patches to handle NFPROTO_INET from the Netfilter core, hence we can remove specific code in nf_tables to handle this pseudofamily. 10) No need for synchronize_net() call for nf_queue after conversion to hook arrays. Also from Florian. 11) Call cond_resched_rcu() when dumping large sets in ipset to avoid softlockup. Again from Florian. 12) Pass lockdep_nfnl_is_held() to rcu_dereference_protected(), patch from Florian Westphal. 13) Fix matching of counters in ipset, from Jozsef Kadlecsik. 14) Missing nfnl lock protection in the ip_set_net_exit path, also from Jozsef. 15) Move connlimit code that we can reuse from nf_tables into nf_conncount, from Florian Westhal. And asorted cleanups: 16) Get rid of nft_dereference(), it only has one single caller. 17) Add nft_set_is_anonymous() helper function. 18) Remove NF_ARP_FORWARD leftover chain definition in nf_tables_arp. 19) Remove unnecessary comments in nf_conntrack_h323_asn1.c From Varsha Rao. 20) Remove useless parameters in frag_safe_skb_hp(), from Gao Feng. 21) Constify layer 4 conntrack protocol definitions, function parameters to register/unregister these protocol trackers, and timeouts. Patches from Florian Westphal. 22) Remove nlattr_size indirection, from Florian Westphal. 23) Add fall-through comments as -Wimplicit-fallthrough needs this, from Gustavo A. R. Silva. 24) Use swap() macro to exchange values in ipset, patch from Gustavo A. R. Silva. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08l2tp: adjust comments about L2TPv3 offsetsGuillaume Nault1-4/+3
The "offset" option has been removed by commit 900631ee6a26 ("l2tp: remove configurable payload offset"). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08sctp: fix the handling of ICMP Frag Needed for too small MTUsMarcelo Ricardo Leitner2-12/+25
syzbot reported a hang involving SCTP, on which it kept flooding dmesg with the message: [ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too low, using default minimum of 512 That happened because whenever SCTP hits an ICMP Frag Needed, it tries to adjust to the new MTU and triggers an immediate retransmission. But it didn't consider the fact that MTUs smaller than the SCTP minimum MTU allowed (512) would not cause the PMTU to change, and issued the retransmission anyway (thus leading to another ICMP Frag Needed, and so on). As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU are higher than that, sctp_transport_update_pmtu() is changed to re-fetch the PMTU that got set after our request, and with that, detect if there was an actual change or not. The fix, thus, skips the immediate retransmission if the received ICMP resulted in no change, in the hope that SCTP will select another path. Note: The value being used for the minimum MTU (512, SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576, SCTP_MIN_PMTU), but such change belongs to another patch. Changes from v1: - do not disable PMTU discovery, in the light of commit 06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small") and as suggested by Xin Long. - changed the way to break the rtx loop by detecting if the icmp resulted in a change or not Changes from v2: none See-also: https://lkml.org/lkml/2017/12/22/811 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08sctp: do not retransmit upon FragNeeded if PMTU discovery is disabledMarcelo Ricardo Leitner1-12/+12
Currently, if PMTU discovery is disabled on a given transport, but the configured value is higher than the actual PMTU, it is likely that we will get some icmp Frag Needed. The issue is, if PMTU discovery is disabled, we won't update the information and will issue a retransmission immediately, which may very well trigger another ICMP, and another retransmission, leading to a loop. The fix is to simply not trigger immediate retransmissions if PMTU discovery is disabled on the given transport. Changes from v2: - updated stale comment, noticed by Xin Long Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08tcp: Split BUG_ON() in tcp_tso_should_defer() into two assertionsStefano Brivio1-1/+2
The two conditions triggering BUG_ON() are somewhat unrelated: the tcp_skb_pcount() check is meant to catch TSO flaws, the second one checks sanity of congestion window bookkeeping. Split them into two separate BUG_ON() assertions on two lines, so that we know which one actually triggers, when they do. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: ipv6: Allow connect to linklocal address from socket bound to vrfDavid Ahern2-4/+2
Allow a process bound to a VRF to connect to a linklocal address. Currently, this fails because of a mismatch between the scope of the linklocal address and the sk_bound_dev_if inherited by the VRF binding: $ ssh -6 fe80::70b8:cff:fedd:ead8%eth1 ssh: connect to host fe80::70b8:cff:fedd:ead8%eth1 port 22: Invalid argument Relax the scope check to allow the socket to be bound to the same L3 device as the scope id. This makes ipv6 linklocal consistent with other relaxed checks enabled by commits 1ff23beebdd3 ("net: l3mdev: Allow send on enslaved interface") and 7bb387c5ab12a ("net: Allow IP_MULTICAST_IF to set index to L3 slave"). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08netfilter: ipset: Missing nfnl_lock()/nfnl_unlock() is added to ↵Jozsef Kadlecsik1-0/+2
ip_set_net_exit() Patch "netfilter: ipset: use nfnl_mutex_is_locked" is added the real mutex locking check, which revealed the missing locking in ip_set_net_exit(). Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Reported-by: syzbot+36b06f219f2439fe62e1@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: ipset: Fix "don't update counters" mode when counters used at the ↵Jozsef Kadlecsik5-122/+89
matching The matching of the counters was not taken into account, fixed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: ipset: use swap macro instead of _manually_ swapping valuesGustavo A. R. Silva3-18/+6
Make use of the swap macro and remove unnecessary variables tmp. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: flow offload expressionPablo Neira Ayuso3-0/+272
Add new instruction for the nf_tables VM that allows us to specify what flows are offloaded into a given flow table via name. This new instruction creates the flow entry and adds it to the flow table. Only established flows, ie. we have seen traffic in both directions, are added to the flow table. You can still decide to offload entries at a later stage via packet counting or checking the ct status in case you want to offload assured conntracks. This new extension depends on the conntrack subsystem. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: flow table support for the mixed IPv4/IPv6 familyPablo Neira Ayuso5-2/+61
This patch adds the IPv6 flow table type, that implements the datapath flow table to forward IPv6 traffic. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: flow table support for IPv6Pablo Neira Ayuso4-1/+290
This patch adds the IPv6 flow table type, that implements the datapath flow table to forward IPv6 traffic. This patch exports ip6_dst_mtu_forward() that is required to check for mtu to pass up packets that need PMTUD handling to the classic forwarding path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: flow table support for IPv4Pablo Neira Ayuso3-0/+294
This patch adds the IPv4 flow table type, that implements the datapath flow table to forward IPv4 traffic. Rationale is: 1) Look up for the packet in the flow table, from the ingress hook. 2) If there's a hit, decrement ttl and pass it on to the neighbour layer for transmission. 3) If there's a miss, packet is passed up to the classic forwarding path. This patch also supports layer 3 source and destination NAT. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: add generic flow table infrastructurePablo Neira Ayuso3-0/+439
This patch defines the API to interact with flow tables, this allows to add, delete and lookup for entries in the flow table. This also adds the generic garbage code that removes entries that have expired, ie. no traffic has been seen for a while. Users of the flow table infrastructure can delete entries via flow_offload_dead(), which sets the dying bit, this signals the garbage collector to release an entry from user context. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: add flow table netlink frontendPablo Neira Ayuso1-1/+746
This patch introduces a netlink control plane to create, delete and dump flow tables. Flow tables are identified by name, this name is used from rules to refer to an specific flow table. Flow tables use the rhashtable class and a generic garbage collector to remove expired entries. This also adds the infrastructure to add different flow table types, so we can add one for each layer 3 protocol family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_conntrack: add IPS_OFFLOAD status bitPablo Neira Ayuso4-5/+45
This new bit tells us that the conntrack entry is owned by the flow table offload infrastructure. # cat /proc/net/nf_conntrack ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392 dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443 dport=36392 [OFFLOAD] mark=0 zone=0 use=2 Note the [OFFLOAD] tag in the listing. The timer of such conntrack entries look like stopped from userspace. In practise, to make sure the conntrack entry does not go away, the conntrack timer is periodically set to an arbitrary large value that gets refreshed on every iteration from the garbage collector, so it never expires- and they display no internal state in the case of TCP flows. This allows us to save a bitcheck from the packet path via nf_ct_is_expired(). Conntrack entries that have been offloaded to the flow table infrastructure cannot be deleted/flushed via ctnetlink. The flow table infrastructure is also responsible for releasing this conntrack entry. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: remove nft_dereference()Pablo Neira Ayuso1-3/+3
This macro is unnecessary, it just hides details for one single caller. nfnl_dereference() is just enough. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: remove defensive check on malformed packets from raw socketsPablo Neira Ayuso13-128/+3
Users cannot forge malformed IPv4/IPv6 headers via raw sockets that they can inject into the stack. Specifically, not for IPv4 since 55888dfb6ba7 ("AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)"). IPv6 raw sockets also ensure that packets have a well-formed IPv6 header available in the skbuff. At quick glance, br_netfilter also validates layer 3 headers and it drops malformed both IPv4 and IPv6 packets. Therefore, let's remove this defensive check all over the place. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: meta: secpath supportFlorian Westphal1-0/+43
replacement for iptables "-m policy --dir in --policy {ipsec,none}". Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: remove struct nf_afinfo and its helper functionsPablo Neira Ayuso4-52/+6
This abstraction has no clients anymore, remove it. This is what remains from previous authors, so correct copyright statement after recent modifications and code removal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: remove route_key_size field in struct nf_afinfoPablo Neira Ayuso3-8/+16
This is only needed by nf_queue, place this code where it belongs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: move reroute indirection to struct nf_ipv6_opsPablo Neira Ayuso5-17/+27
We cannot make a direct call to nf_ip6_reroute() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define reroute indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: move route indirection to struct nf_ipv6_opsPablo Neira Ayuso9-56/+57
We cannot make a direct call to nf_ip6_route() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define route indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: remove saveroute indirection in struct nf_afinfoPablo Neira Ayuso4-61/+41
This is only used by nf_queue.c and this function comes with no symbol dependencies with IPv6, it just refers to structure layouts. Therefore, we can replace it by a direct function call from where it belongs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: move checksum_partial indirection to struct nf_ipv6_opsPablo Neira Ayuso4-17/+33
We cannot make a direct call to nf_ip6_checksum_partial() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define checksum_partial indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: move checksum indirection to struct nf_ipv6_opsPablo Neira Ayuso5-11/+29
We cannot make a direct call to nf_ip6_checksum() because that would result in autoloading the 'ipv6' module because of symbol dependencies. Therefore, define checksum indirection in nf_ipv6_ops where this really belongs to. For IPv4, we can indeed make a direct function call, which is faster, given IPv4 is built-in in the networking code by default. Still, CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline stub for IPv4 in such case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: connlimit: split xt_connlimit into front and backendFlorian Westphal4-345/+402
This allows to reuse xt_connlimit infrastructure from nf_tables. The upcoming nf_tables frontend can just pass in an nftables register as input key, this allows limiting by any nft-supported key, including concatenations. For xt_connlimit, pass in the zone and the ip/ipv6 address. With help from Yi-Hung Wei. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: remove hooks from family definitionPablo Neira Ayuso7-40/+36
They don't belong to the family definition, move them to the filter chain type definition instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: remove multihook chains and familiesPablo Neira Ayuso8-70/+48
Since NFPROTO_INET is handled from the core, we don't need to maintain extra infrastructure in nf_tables to handle the double hook registration, one for IPv4 and another for IPv6. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables_inet: don't use multihook infrastructure anymorePablo Neira Ayuso3-16/+60
Use new native NFPROTO_INET support in netfilter core, this gets rid of ad-hoc code in the nf_tables API codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: core: support for NFPROTO_INET hook registrationPablo Neira Ayuso1-9/+44
Expand NFPROTO_INET in two hook registrations, one for NFPROTO_IPV4 and another for NFPROTO_IPV6. Hence, we handle NFPROTO_INET from the core. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: core: pass family as parameter to nf_remove_net_hook()Pablo Neira Ayuso1-5/+5
So static_key_slow_dec applies to the family behind NFPROTO_INET. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: core: pass hook number, family and device to nf_find_hook_list()Pablo Neira Ayuso1-17/+19
Instead of passing struct nf_hook_ops, this is needed by follow up patches to handle NFPROTO_INET from the core. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: core: add nf_remove_net_hookPablo Neira Ayuso1-4/+4
Just a cleanup, __nf_unregister_net_hook() is used by a follow up patch when handling NFPROTO_INET as a real family from the core. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: add nft_set_is_anonymous() helperPablo Neira Ayuso2-5/+5
Add helper function to test for the NFT_SET_ANONYMOUS flag. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: explicit nft_set_pktinfo() call from hook pathPablo Neira Ayuso9-13/+24
Instead of calling this function from the family specific variant, this reduces the code size in the fast path for the netdev, bridge and inet families. After this change, we must call nft_set_pktinfo() upfront from the chain hook indirection. Before: text data bss dec hex filename 2145 208 0 2353 931 net/netfilter/nf_tables_netdev.o After: text data bss dec hex filename 2125 208 0 2333 91d net/netfilter/nf_tables_netdev.o Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables_arp: don't set forward chainPablo Neira Ayuso1-1/+0
46928a0b49f3 ("netfilter: nf_tables: remove multihook chains and families") already removed this, this is a leftover. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: reject nat hook registration if prio is before conntrackFlorian Westphal1-1/+6
No problem for iptables as priorities are fixed values defined in the nat modules, but in nftables the priority its coming from userspace. Reject in case we see that such a hook would not work. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: core: only allow one nat hook per hook pointFlorian Westphal4-0/+16
The netfilter NAT core cannot deal with more than one NAT hook per hook location (prerouting, input ...), because the NAT hooks install a NAT null binding in case the iptables nat table (iptable_nat hooks) or the corresponding nftables chain (nft nat hooks) doesn't specify a nat transformation. Null bindings are needed to detect port collsisions between NAT-ed and non-NAT-ed connections. This causes nftables NAT rules to not work when iptable_nat module is loaded, and vice versa because nat binding has already been attached when the second nat hook is consulted. The netfilter core is not really the correct location to handle this (hooks are just hooks, the core has no notion of what kinds of side effects a hook implements), but its the only place where we can check for conflicts between both iptables hooks and nftables hooks without adding dependencies. So add nat annotation to hook_ops to describe those hooks that will add NAT bindings and then make core reject if such a hook already exists. The annotation fills a padding hole, in case further restrictions appar we might change this to a 'u8 type' instead of bool. iptables error if nft nat hook active: iptables -t nat -A POSTROUTING -j MASQUERADE iptables v1.4.21: can't initialize iptables table `nat': File exists Perhaps iptables or your kernel needs to be upgraded. nftables error if iptables nat table present: nft -f /etc/nftables/ipv4-nat /usr/etc/nftables/ipv4-nat:3:1-2: Error: Could not process rule: File exists table nat { ^^ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: xtables: add and use xt_request_find_table_lockFlorian Westphal4-51/+63
currently we always return -ENOENT to userspace if we can't find a particular table, or if the table initialization fails. Followup patch will make nat table init fail in case nftables already registered a nat hook so this change makes xt_find_table_lock return an ERR_PTR to return the errno value reported from the table init function. Add xt_request_find_table_lock as try_then_request_module replacement and use it where needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: don't allocate space for arp/bridge hooks unless neededFlorian Westphal6-0/+21
no need to define hook points if the family isn't supported. Because we need these hooks for either nftables, arp/ebtables or the 'call-iptables' hack we have in the bridge layer add two new dependencies, NETFILTER_FAMILY_{ARP,BRIDGE}, and have the users select them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: don't allocate space for decnet hooks unless neededFlorian Westphal1-0/+4
no need to define hook points if the family isn't supported. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: reduce hook array sizes to what is neededFlorian Westphal1-7/+17
Not all families share the same hook count, adjust sizes to what is needed. struct net before: /* size: 6592, cachelines: 103, members: 46 */ after: /* size: 5952, cachelines: 93, members: 46 */ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: reduce size of hook entry point locationsFlorian Westphal3-11/+50
struct net contains: struct nf_hook_entries __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS]; which store the hook entry point locations for the various protocol families and the hooks. Using array results in compact c code when doing accesses, i.e. x = rcu_dereference(net->nf.hooks[pf][hook]); but its also wasting a lot of memory, as most families are not used. So split the array into those families that are used, which are only 5 (instead of 13). In most cases, the 'pf' argument is constant, i.e. gcc removes switch statement. struct net before: /* size: 5184, cachelines: 81, members: 46 */ after: /* size: 4672, cachelines: 73, members: 46 */ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: core: free hooks with call_rcuFlorian Westphal1-6/+28
Giuseppe Scrivano says: "SELinux, if enabled, registers for each new network namespace 6 netfilter hooks." Cost for this is high. With synchronize_net() removed: "The net benefit on an SMP machine with two cores is that creating a new network namespace takes -40% of the original time." This patch replaces synchronize_net+kvfree with call_rcu(). We store rcu_head at the tail of a structure that has no fixed layout, i.e. we cannot use offsetof() to compute the start of the original allocation. Thus store this information right after the rcu head. We could simplify this by just placing the rcu_head at the start of struct nf_hook_entries. However, this structure is used in packet processing hotpath, so only place what is needed for that at the beginning of the struct. Reported-by: Giuseppe Scrivano <gscrivan@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>