summaryrefslogtreecommitdiffstats
path: root/net/netfilter
AgeCommit message (Collapse)AuthorFilesLines
2020-10-14netfilter: nf_log: missing vlan offload tag and protoPablo Neira Ayuso1-0/+12
Dump vlan tag and proto for the usual vlan offload case if the NF_LOG_MACDECODE flag is set on. Without this information the logging is misleading as there is no reference to the VLAN header. [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0 [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1 Fixes: 83e96d443b37 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: nftables: extend error reporting for chain updatesPablo Neira Ayuso1-5/+14
The initial support for netlink extended ACK is missing the chain update path, which results in misleading error reporting in case of EEXIST. Fixes 36dd1bcc07e5 ("netfilter: nf_tables: initial support for extended ACK reporting") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12ipvs: clear skb->tstamp in forwarding pathJulian Anastasov1-0/+6
fq qdisc requires tstamp to be cleared in forwarding path Reported-by: Evgeny B <abt-admin@mail.ru> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209427 Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08netfilter: nft_meta: use socket user_ns to retrieve skuid and skgidPablo Neira Ayuso1-2/+2
... instead of using init_user_ns. Fixes: 96518518cc41 ("netfilter: add nftables") Tested-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08netfilter: conntrack: nf_conncount_init is failing with IPv6 disabledEelco Chaudron1-0/+2
The openvswitch module fails initialization when used in a kernel without IPv6 enabled. nf_conncount_init() fails because the ct code unconditionally tries to initialize the netns IPv6 related bit, regardless of the build option. The change below ignores the IPv6 part if not enabled. Note that the corresponding _put() function already has this IPv6 configuration check. Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08netfilter: ctnetlink: fix mark based dump filtering regressionMartin Willi1-16/+3
conntrack mark based dump filtering may falsely skip entries if a mask is given: If the mask-based check does not filter out the entry, the else-if check is always true and compares the mark without considering the mask. The if/else-if logic seems wrong. Given that the mask during filter setup is implicitly set to 0xffffffff if not specified explicitly, the mark filtering flags seem to just complicate things. Restore the previously used approach by always matching against a zero mask is no filter mark is given. Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08netfilter: nf_tables: coalesce multiple notifications into one skbuffPablo Neira Ayuso1-13/+57
On x86_64, each notification results in one skbuff allocation which consumes at least 768 bytes due to the skbuff overhead. This patch coalesces several notifications into one single skbuff, so each notification consumes at least ~211 bytes, that ~3.5 times less memory consumption. As a result, this is reducing the chances to exhaust the netlink socket receive buffer. Rule of thumb is that each notification batch only contains netlink messages whose report flag is the same, nfnetlink_send() requires this to do appropriate delivery to userspace, either via unicast (echo mode) or multicast (monitor mode). The skbuff control buffer is used to annotate the report flag for later handling at the new coalescing routine. The batch skbuff notification size is NLMSG_GOODSIZE, using a larger skbuff would allow for more socket receiver buffer savings (to amortize the cost of the skbuff even more), however, going over that size might break userspace applications, so let's be conservative and stick to NLMSG_GOODSIZE. Reported-by: Phil Sutter <phil@nwl.cc> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08netfilter: ctnetlink: add a range check for l3/l4 protonumWill McVicker1-1/+2
The indexes to the nf_nat_l[34]protos arrays come from userspace. So check the tuple's family, e.g. l3num, when creating the conntrack in order to prevent an OOB memory access during setup. Here is an example kernel panic on 4.14.180 when userspace passes in an index greater than NFPROTO_NUMPROTO. Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in:... Process poc (pid: 5614, stack limit = 0x00000000a3933121) CPU: 4 PID: 5614 Comm: poc Tainted: G S W O 4.14.180-g051355490483 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM task: 000000002a3dfffe task.stack: 00000000a3933121 pc : __cfi_check_fail+0x1c/0x24 lr : __cfi_check_fail+0x1c/0x24 ... Call trace: __cfi_check_fail+0x1c/0x24 name_to_dev_t+0x0/0x468 nfnetlink_parse_nat_setup+0x234/0x258 ctnetlink_parse_nat_setup+0x4c/0x228 ctnetlink_new_conntrack+0x590/0xc40 nfnetlink_rcv_msg+0x31c/0x4d4 netlink_rcv_skb+0x100/0x184 nfnetlink_rcv+0xf4/0x180 netlink_unicast+0x360/0x770 netlink_sendmsg+0x5a0/0x6a4 ___sys_sendmsg+0x314/0x46c SyS_sendmsg+0xb4/0x108 el0_svc_naked+0x34/0x38 This crash is not happening since 5.4+, however, ctnetlink still allows for creating entries with unsupported layer 3 protocol number. Fixes: c1d10adb4a521 ("[NETFILTER]: Add ctnetlink port for nf_conntrack") Signed-off-by: Will McVicker <willmcvicker@google.com> [pablo@netfilter.org: rebased original patch on top of nf.git] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds12-74/+140
Pull networking fixes from David Miller: 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi Kivilinna. 2) Fix loss of RTT samples in rxrpc, from David Howells. 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu. 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka. 5) We disable BH for too lokng in sctp_get_port_local(), add a cond_resched() here as well, from Xin Long. 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu. 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from Yonghong Song. 8) Missing of_node_put() in mt7530 DSA driver, from Sumera Priyadarsini. 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan. 10) Fix geneve tunnel checksumming bug in hns3, from Yi Li. 11) Memory leak in rxkad_verify_response, from Dinghao Liu. 12) In tipc, don't use smp_processor_id() in preemptible context. From Tuong Lien. 13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu. 14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter. 15) Fix ABI mismatch between driver and firmware in nfp, from Louis Peens. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits) net/smc: fix sock refcounting in case of termination net/smc: reset sndbuf_desc if freed net/smc: set rx_off for SMCR explicitly net/smc: fix toleration of fake add_link messages tg3: Fix soft lockup when tg3_reset_task() fails. doc: net: dsa: Fix typo in config code sample net: dp83867: Fix WoL SecureOn password nfp: flower: fix ABI mismatch between driver and firmware tipc: fix shutdown() of connectionless socket ipv6: Fix sysctl max for fib_multipath_hash_policy drivers/net/wan/hdlc: Change the default of hard_header_len to 0 net: gemini: Fix another missing clk_disable_unprepare() in probe net: bcmgenet: fix mask check in bcmgenet_validate_flow() amd-xgbe: Add support for new port mode net: usb: dm9601: Add USB ID of Keenetic Plus DSL vhost: fix typo in error message net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() pktgen: fix error message with wrong function name net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode cxgb4: fix thermal zone device registration ...
2020-08-29netfilter: conntrack: do not auto-delete clash entries on replyFlorian Westphal2-17/+11
Its possible that we have more than one packet with the same ct tuple simultaneously, e.g. when an application emits n packets on same UDP socket from multiple threads. NAT rules might be applied to those packets. With the right set of rules, n packets will be mapped to m destinations, where at least two packets end up with the same destination. When this happens, the existing clash resolution may merge the skb that is processed after the first has been received with the identical tuple already in hash table. However, its possible that this identical tuple is a NAT_CLASH tuple. In that case the second skb will be sent, but no reply can be received since the reply that is processed first removes the NAT_CLASH tuple. Do not auto-delete, this gives a 1 second window for replies to be passed back to originator. Packets that are coming later (udp stream case) will not be affected: they match the original ct entry, not a NAT_CLASH one. Also prevent NAT_CLASH entries from getting offloaded. Fixes: 6a757c07e51f ("netfilter: conntrack: allow insertion of clashing entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFSPablo Neira Ayuso4-38/+39
Frontend callback reports EAGAIN to nfnetlink to retry a command, this is used to signal that module autoloading is required. Unfortunately, nlmsg_unicast() reports EAGAIN in case the receiver socket buffer gets full, so it enters a busy-loop. This patch updates nfnetlink_unicast() to turn EAGAIN into ENOBUFS and to use nlmsg_unicast(). Remove the flags field in nfnetlink_unicast() since this is always MSG_DONTWAIT in the existing code which is exactly what nlmsg_unicast() passes to netlink_unicast() as parameter. Fixes: 96518518cc41 ("netfilter: add nftables") Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: delete repeated wordsRandy Dunlap3-3/+3
Drop duplicated words in net/netfilter/ and net/ipv4/netfilter/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2-2/+2
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-21netfilter: nf_tables: fix destination register zeroingFlorian Westphal1-1/+3
Following bug was reported via irc: nft list ruleset set knock_candidates_ipv4 { type ipv4_addr . inet_service size 65535 elements = { 127.0.0.1 . 123, 127.0.0.1 . 123 } } .. udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 } udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport } It should not have been possible to add a duplicate set entry. After some debugging it turned out that the problem is the immediate value (123) in the second-to-last rule. Concatenations use 32bit registers, i.e. the elements are 8 bytes each, not 6 and it turns out the kernel inserted inet firewall @knock_candidates_ipv4 element 0100007f ffff7b00 : 0 [end] element 0100007f 00007b00 : 0 [end] Note the non-zero upper bits of the first element. It turns out that nft_immediate doesn't zero the destination register, but this is needed when the length isn't a multiple of 4. Furthermore, the zeroing in nft_payload is broken. We can't use [len / 4] = 0 -- if len is a multiple of 4, index is off by one. Skip zeroing in this case and use a conditional instead of (len -1) / 4. Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-21netfilter: nf_tables: add NFTA_SET_USERDATA if not nullPablo Neira Ayuso1-1/+2
Kernel sends an empty NFTA_SET_USERDATA attribute with no value if userspace adds a set with no NFTA_SET_USERDATA attribute. Fixes: e6d8ecac9e68 ("netfilter: nf_tables: Add new attributes into nft_set to store user data.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-21netfilter: nft_set_rbtree: Detect partial overlap with start endpoint matchStefano Brivio1-1/+33
Getting creative with nft and omitting the interval_overlap() check from the set_overlap() function, without omitting set_overlap() altogether, led to the observation of a partial overlap that wasn't detected, and would actually result in replacement of the end element of an existing interval. This is due to the fact that we'll return -EEXIST on a matching, pre-existing start element, instead of -ENOTEMPTY, and the error is cleared by API if NLM_F_EXCL is not given. At this point, we can insert a matching start, and duplicate the end element as long as we don't end up into other intervals. For instance, inserting interval 0 - 2 with an existing 0 - 3 interval would result in a single 0 - 2 interval, and a dangling '3' end element. This is because nft will proceed after inserting the '0' start element as no error is reported, and no further conflicting intervals are detected on insertion of the end element. This needs a different approach as it's a local condition that can be detected by looking for duplicate ends coming from left and right, separately. Track those and directly report -ENOTEMPTY on duplicated end elements for a matching start. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-21netfilter: nft_set_rbtree: Handle outcomes of tree rotations in overlap ↵Stefano Brivio1-9/+14
detection Checks for partial overlaps on insertion assume that end elements are always descendant nodes of their corresponding start, because they are inserted later. However, this is not the case if a previous delete operation caused a tree rotation as part of rebalancing. Taking the issue reported by Andreas Fischer as an example, if we omit delete operations, the existing procedure works because, equivalently, we are inserting a start item with value 40 in the this region of the red-black tree with single-sized intervals: overlap flag 10 (start) / \ false 20 (start) / \ false 30 (start) / \ false 60 (start) / \ false 50 (end) / \ false 20 (end) / \ false 40 (start) if we now delete interval 30 - 30, the tree can be rearranged in a way similar to this (note the rotation involving 50 - 50): overlap flag 10 (start) / \ false 20 (start) / \ false 25 (start) / \ false 70 (start) / \ false 50 (end) / \ true (from rule a1.) 50 (start) / \ true 40 (start) and we traverse interval 50 - 50 from the opposite direction compared to what was expected. To deal with those cases, add a start-before-start rule, b4., that covers traversal of existing intervals from the right. We now need to restrict start-after-end rule b3. to cases where there are no occurring nodes between existing start and end elements, because addition of rule b4. isn't sufficient to ensure that the pre-existing end element we encounter while descending the tree corresponds to a start element of an interval that we already traversed entirely. Different types of overlap detection on trees with rotations resulting from re-balancing will be covered by nft test case sets/0044interval_overlap_1. Reported-by: Andreas Fischer <netfilter@d9c.eu> Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1449 Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-20netfilter: conntrack: allow sctp hearbeat after connection re-useFlorian Westphal1-4/+35
If an sctp connection gets re-used, heartbeats are flagged as invalid because their vtag doesn't match. Handle this in a similar way as TCP conntrack when it suspects that the endpoints and conntrack are out-of-sync. When a HEARTBEAT request fails its vtag validation, flag this in the conntrack state and accept the packet. When a HEARTBEAT_ACK is received with an invalid vtag in the reverse direction after we allowed such a HEARTBEAT through, assume we are out-of-sync and re-set the vtag info. v2: remove left-over snippet from an older incarnation that moved new_state/old_state assignments, thats not needed so keep that as-is. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller3-27/+20
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Endianness issue in IPv4 option support in nft_exthdr, from Stephen Suryaputra. 2) Removes the waitcount optimization in nft_compat, from Florian Westphal. 3) Remove ipv6 -> nf_defrag_ipv6 module dependency, from Florian Westphal. 4) Memleak in chain binding support, also from Florian. 5) Simplify nft_flowtable.sh selftest, from Fabian Frederick. 6) Optional MTU arguments for selftest nft_flowtable.sh, also from Fabian. 7) Remove noise error report when killing process in selftest nft_flowtable.sh, from Fabian Frederick. 8) Reject bogus getsockopt option length in ebtables, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-13netfilter: nf_tables: free chain context when BINDING flag is missingFlorian Westphal1-2/+4
syzbot found a memory leak in nf_tables_addchain() because the chain object is not free'd correctly on error. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: syzbot+c99868fde67014f7e9f5@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-10Merge tag 'locking-urgent-2020-08-10' of ↵Linus Torvalds2-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ...
2020-08-10netfilter: nft_compat: remove flush counter optimizationFlorian Westphal1-23/+14
WARNING: CPU: 1 PID: 16059 at lib/refcount.c:31 refcount_warn_saturate+0xdf/0xf [..] __nft_mt_tg_destroy+0x42/0x50 [nft_compat] nft_target_destroy+0x63/0x80 [nft_compat] nf_tables_expr_destroy+0x1b/0x30 [nf_tables] nf_tables_rule_destroy+0x3a/0x70 [nf_tables] nf_tables_exit_net+0x186/0x3d0 [nf_tables] Happens when a compat expr is destoyed from abort path. There is no functional impact; after this work queue is flushed unconditionally if its pending. This removes the waitcount optimization. Test of repeated iptables-restore of a ~60k kubernetes ruleset doesn't indicate a slowdown. In case the counter is needed after all for some workloads we can revert this and increment the refcount for the != NFT_PREPARE_TRANS case to avoid the increment/decrement imbalance. While at it, also flush for match case, this was an oversight in the original patch. Fixes: ffe8923f109b7e ("netfilter: nft_compat: make sure xtables destructors have run") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-10netfilter: nf_tables: nft_exthdr: the presence return value should be ↵Stephen Suryaputra1-2/+2
little-endian On big-endian machine, the returned register data when the exthdr is present is not being compared correctly because little-endian is assumed. The function nft_cmp_fast_mask(), called by nft_cmp_fast_eval() and nft_cmp_fast_init(), calls cpu_to_le32(). The following dump also shows that little endian is assumed: $ nft --debug=netlink add rule ip recordroute forward ip option rr exists counter ip [ exthdr load ipv4 1b @ 7 + 0 present => reg 1 ] [ cmp eq reg 1 0x01000000 ] [ counter pkts 0 bytes 0 ] Lastly, debug print in nft_cmp_fast_init() and nft_cmp_fast_eval() when RR option exists in the packet shows that the comparison fails because the assumption: nft_cmp_fast_init:189 priv->sreg=4 desc.len=8 mask=0xff000000 data.data[0]=0x10003e0 nft_cmp_fast_eval:57 regs->data[priv->sreg=4]=0x1 mask=0xff000000 priv->data=0x1000000 v2: use nft_reg_store8() instead (Florian Westphal). Also to avoid the warnings reported by kernel test robot. Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options") Fixes: c078ca3b0c5b ("netfilter: nft_exthdr: Add support for existence check") Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds35-255/+554
Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
2020-08-04Merge tag 'audit-pr-20200803' of ↵Linus Torvalds2-2/+115
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Aside from some smaller bug fixes, here are the highlights: - add a new backlog wait metric to the audit status message, this is intended to help admins determine how long processes have been waiting for the audit backlog queue to clear - generate audit records for nftables configuration changes - generate CWD audit records for for the relevant LSM audit records" * tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: report audit wait metric in audit status reply audit: purge audit_log_string from the intra-kernel audit API audit: issue CWD record to accompany LSM_AUDIT_DATA_* records audit: use the proper gfp flags in the audit_log_nfcfg() calls audit: remove unused !CONFIG_AUDITSYSCALL __audit_inode* stubs audit: add gfp parameter to audit_log_nfcfg audit: log nftables configuration change events audit: Use struct_size() helper in alloc_chunk
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-19/+43
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Flush the cleanup xtables worker to make sure destructors have completed, from Florian Westphal. 2) iifgroup is matching erroneously, also from Florian. 3) Add selftest for meta interface matching, from Florian Westphal. 4) Move nf_ct_offload_timeout() to header, from Roi Dayan. 5) Call nf_ct_offload_timeout() from flow_offload_add() to make sure garbage collection does not evict offloaded flow, from Roi Dayan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller27-82/+137
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) UAF in chain binding support from previous batch, from Dan Carpenter. 2) Queue up delayed work to expire connections with no destination, from Andrew Sy Kim. 3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva. 4) Replace HTTP links with HTTPS, from Alexander A. Klimov. 5) Remove superfluous null header checks in ip6tables, from Gaurav Singh. 6) Add extended netlink error reporting for expression. 7) Report EEXIST on overlapping chain, set elements and flowtable devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03netfilter: flowtable: Set offload timeout when adding flowRoi Dayan1-0/+2
On heavily loaded systems the GC can take time to go over all existing conns and reset their timeout. At that time other calls like from nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as expired. To fix this when we set the offload bit we should also reset the timeout instead of counting on GC to finish first iteration over all conns before the initial timeout. Fixes: 90964016e5d3 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-03netfilter: conntrack: Move nf_ct_offload_timeout to header fileRoi Dayan1-12/+0
To be used by callers from other modules. [ Rename DAY to NF_CT_DAY to avoid possible symbol name pollution issue --Pablo ] Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-02netfilter: nf_tables: report EEXIST on overlapsPablo Neira Ayuso1-9/+7
Replace EBUSY by EEXIST in the following cases: - If the user adds a chain with a different configuration such as different type, hook and priority. - If the user adds a non-base chain that clashes with an existing basechain. - If the user adds a { key : value } mapping element and the key exists but the value differs. - If the device already belongs to an existing flowtable. User describe that this error reporting is confusing: - https://bugzilla.netfilter.org/show_bug.cgi?id=1176 - https://bugzilla.netfilter.org/show_bug.cgi?id=1413 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-02netfilter: nft_meta: fix iifgroup matchingFlorian Westphal1-1/+1
iifgroup matching erroneously checks the output interface. Fixes: 8724e819cc9a ("netfilter: nft_meta: move all interface related keys to helper") Reported-by: Demi M. Obenour <demiobenour@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-02netfilter: nf_tables: extended netlink error reporting for expressionsPablo Neira Ayuso1-1/+6
This patch extends 36dd1bcc07e5 ("netfilter: nf_tables: initial support for extended ACK reporting") to include netlink extended error reporting for expressions. This allows userspace to identify what rule expression is triggering the error. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-31netfilter: nft_compat: make sure xtables destructors have runFlorian Westphal2-6/+40
Pablo Neira found that after recent update of xt_IDLETIMER the iptables-nft tests sometimes show an error. He tracked this down to the delayed cleanup used by nf_tables core: del rule (transaction A) add rule (transaction B) Its possible that by time transaction B (both in same netns) runs, the xt target destructor has not been invoked yet. For native nft expressions this is no problem because all expressions that have such side effects make sure these are handled from the commit phase, rather than async cleanup. For nft_compat however this isn't true. Instead of forcing synchronous behaviour for nft_compat, keep track of the number of outstanding destructor calls. When we attempt to create a new expression, flush the cleanup worker to make sure destructors have completed. With lots of help from Pablo Neira. Reported-by: Pablo Neira Ayso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-29netfilter: Replace HTTP links with HTTPS onesAlexander A. Klimov7-8/+8
Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-29netfilter: nft_set_rbtree: Use sequence counter with associated rwlockAhmed S. Darwish1-2/+2
A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_rwlock_t data type, which allows to associate a rwlock with the sequence counter. This enables lockdep to verify that the rwlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-16-a.darwish@linutronix.de
2020-07-29netfilter: conntrack: Use sequence counter with associated spinlockAhmed S. Darwish1-2/+3
A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-15-a.darwish@linutronix.de
2020-07-28net: remove sockptr_advanceChristoph Hellwig1-3/+4
sockptr_advance never properly worked. Replace it with _offset variants of copy_from_sockptr and copy_to_sockptr. Fixes: ba423fdaa589 ("net: add a new sockptr_t type") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2-31/+22
The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24netfilter: switch nf_setsockopt to sockptr_tChristoph Hellwig2-3/+3
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24netfilter: switch xt_copy_counters to sockptr_tChristoph Hellwig1-10/+10
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22ipvs: fix the connection sync failed in some casesguodeqing1-4/+8
The sync_thread_backup only checks sk_receive_queue is empty or not, there is a situation which cannot sync the connection entries when sk_receive_queue is empty and sk_rmem_alloc is larger than sk_rcvbuf, the sync packets are dropped in __udp_enqueue_schedule_skb, this is because the packets in reader_queue is not read, so the rmem is not reclaimed. Here I add the check of whether the reader_queue of the udp sock is empty or not to solve this problem. Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Reported-by: zhouxudong <zhouxudong8@huawei.com> Signed-off-by: guodeqing <geffrey.guo@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-22netfilter: Use fallthrough pseudo-keywordGustavo A. R. Silva16-35/+33
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-22ipvs: queue delayed work to expire no destination connections if ↵Andrew Sy Kim3-27/+81
expire_nodest_conn=1 When expire_nodest_conn=1 and a destination is deleted, IPVS does not expire the existing connections until the next matching incoming packet. If there are many connection entries from a single client to a single destination, many packets may get dropped before all the connections are expired (more likely with lots of UDP traffic). An optimization can be made where upon deletion of a destination, IPVS queues up delayed work to immediately expire any connections with a deleted destination. This ensures any reused source ports from a client (within the IPVS timeouts) are scheduled to new real servers instead of silently dropped. Signed-off-by: Andrew Sy Kim <kim.andrewsy@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-19netfilter: split nf_sockoptChristoph Hellwig1-17/+13
Split nf_sockopt into a getsockopt and setsockopt side as they share very little code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19netfilter: remove the compat argument to xt_copy_counters_from_userChristoph Hellwig1-5/+4
Lift the in_compat_syscall() from the callers instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19netfilter: remove the compat_{get,set} methodsChristoph Hellwig1-42/+0
All instances handle compat sockopts via in_compat_syscall() now, so remove the compat_{get,set} methods as well as the compat_nf_{get,set}sockopt wrappers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-16treewide: Remove uninitialized_var() usageKees Cook3-4/+4
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-15netfilter: nf_tables: Fix a use after free in nft_immediate_destroy()Dan Carpenter1-2/+2
The nf_tables_rule_release() function frees "rule" so we have to use the _safe() version of list_for_each_entry(). Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-15netfilter: nf_tables: fix nat hook table deletionFlorian Westphal1-27/+14
sybot came up with following transaction: add table ip syz0 add chain ip syz0 syz2 { type nat hook prerouting priority 0; policy accept; } add table ip syz0 { flags dormant; } delete chain ip syz0 syz2 delete table ip syz0 which yields: hook not found, pf 2 num 0 WARNING: CPU: 0 PID: 6775 at net/netfilter/core.c:413 __nf_unregister_net_hook+0x3e6/0x4a0 net/netfilter/core.c:413 [..] nft_unregister_basechain_hooks net/netfilter/nf_tables_api.c:206 [inline] nft_table_disable net/netfilter/nf_tables_api.c:835 [inline] nf_tables_table_disable net/netfilter/nf_tables_api.c:868 [inline] nf_tables_commit+0x32d3/0x4d70 net/netfilter/nf_tables_api.c:7550 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:486 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:544 [inline] nfnetlink_rcv+0x14a5/0x1e50 net/netfilter/nfnetlink.c:562 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] Problem is that when I added ability to override base hook registration to make nat basechains register with the nat core instead of netfilter core, I forgot to update nft_table_disable() to use that instead of the 'raw' hook register interface. In syzbot transaction, the basechain is of 'nat' type. Its registered with the nat core. The switch to 'dormant mode' attempts to delete from netfilter core instead. After updating nft_table_disable/enable to use the correct helper, nft_(un)register_basechain_hooks can be folded into the only remaining caller. Because nft_trans_table_enable() won't do anything when the DORMANT flag is set, remove the flag first, then re-add it in case re-enablement fails, else this patch breaks sequence: add table ip x { flags dormant; } /* add base chains */ add table ip x The last 'add' will remove the dormant flags, but won't have any other effect -- base chains are not registered. Then, next 'set dormant flag' will create another 'hook not found' splat. Reported-by: syzbot+2570f2c036e3da5db176@syzkaller.appspotmail.com Fixes: 4e25ceb80b58 ("netfilter: nf_tables: allow chain type to override hook register") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>