path: root/net/netfilter
Age | Commit message | Author | Files | Lines
2020-12-14 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next | Jakub Kicinski | 12 | -105/+450
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Missing dependencies in NFT_BRIDGE_REJECT, from Randy Dunlap. 2) Use atomic_inc_return() instead of atomic_add_return() in IPVS, from Yejune Deng. 3) Simplify check for overquota in xt_nfacct, from Kaixu Xia. 4) Move nfnl_acct_list away from struct net, from Miao Wang. 5) Pass actual sk in reject actions, from Jan Engelhardt. 6) Add timeout and protoinfo to ctnetlink destroy events, from Florian Westphal. 7) Four patches to generalize set infrastructure to support for multiple expressions per set element. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: nftables: netlink support for several set element expressions netfilter: nftables: generalize set extension to support for several expressions netfilter: nftables: move nft_expr before nft_set netfilter: nftables: generalize set expressions support netfilter: ctnetlink: add timeout and protoinfo to destroy events netfilter: use actual socket sk for REJECT action netfilter: nfnl_acct: remove data from struct net netfilter: Remove unnecessary conversion to bool ipvs: replace atomic_add_return() netfilter: nft_reject_bridge: fix build errors due to code movement ==================== Link: https://lore.kernel.org/r/20201212230513.3465-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-12 | netfilter: nftables: netlink support for several set element expressions | Pablo Neira Ayuso | 2 | -6/+143
This patch adds three new netlink attributes to encapsulate a list of expressions per set element: - NFTA_SET_EXPRESSIONS: this attribute provides the set definition in terms of expressions. New set elements get the list of expressions specified by this new netlink attribute attached to them. - NFTA_SET_ELEM_EXPRESSIONS: this attribute allows users to restore (or initialize) the stateful information of set elements when adding an element to the set. - NFTA_DYNSET_EXPRESSIONS: this attribute specifies the list of expressions that the set element gets when it is inserted from the packet path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 | netfilter: nftables: generalize set extension to support for several expressions | Pablo Neira Ayuso | 3 | -47/+168
This patch replaces NFT_SET_EXPR by NFT_SET_EXT_EXPRESSIONS. This new extension allows attaching several expressions to one set element (not only the single expression that NFT_SET_EXPR provides). This patch prepares for supporting several expressions per set element in the netlink userspace API. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 | netfilter: nftables: generalize set expressions support | Pablo Neira Ayuso | 2 | -27/+66
Currently, the set infrastructure allows for one single expression per element. This patch extends the existing infrastructure to allow for up to two expressions. This does not update the netlink API yet; it is an initial preparation patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 | netfilter: ctnetlink: add timeout and protoinfo to destroy events | Florian Westphal | 4 | -22/+48
DESTROY events do not include the remaining timeout. Add the timeout if the entry was removed explicitly. This can happen when a conntrack gets deleted prematurely, e.g. due to a tcp reset, module removal, netdev notifier (nat/masquerade device went down), ctnetlink and so on. Add the protocol state too for the destroy message to check for abnormal state on connection termination. Joint work with Pablo. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-11 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 4 | -41/+26
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09 | netfilter: nft_ct: Remove confirmation check for NFT_CT_ID | Brett Mastbergen | 1 | -2/+0
Since commit 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation") the ct id will not change from initialization to confirmation. Removing the confirmation check allows for things like adding an element to a 'typeof ct id' set in prerouting upon reception of the first packet of a new connection, and then being able to reference that set consistently both before and after the connection is confirmed. Fixes: 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation") Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08 | netfilter: nftables: comment indirect serialization of commit_mutex with rtnl_mutex | Pablo Neira Ayuso | 1 | -0/+4
Add an explicit comment in the code to describe the indirect serialization of the holders of the commit_mutex with the rtnl_mutex. Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on netdevice from preparation phase") already describes this, but a comment in this case is better for reference. Reported-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-08 | netfilter: nft_dynset: fix timeouts later than 23 days | Pablo Neira Ayuso | 2 | -5/+7
Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by 8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23 days"), otherwise ruleset listing breaks. Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
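The arithmetic behind a 64-bit timeout conversion can be sketched in userspace C. This is not the kernel's nf_msecs_to_jiffies64(); the helper names msecs_to_ticks64() and fits_in_s32() are hypothetical, and the sketch only assumes that a timeout is given in milliseconds and converted to ticks at a rate of hz ticks per second. It shows that a 30-day timeout no longer fits in a signed 32-bit millisecond value, while a 64-bit conversion handles it for any tick rate.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): convert milliseconds to
 * ticks using 64-bit arithmetic. Returns false if msecs * hz would
 * overflow 64 bits.
 */
static bool msecs_to_ticks64(uint64_t msecs, uint32_t hz, uint64_t *ticks)
{
	if (hz != 0 && msecs > UINT64_MAX / hz)
		return false;
	*ticks = msecs * hz / 1000;
	return true;
}

/* True if a millisecond count fits in a signed 32-bit integer. */
static bool fits_in_s32(uint64_t msecs)
{
	return msecs <= (uint64_t)INT32_MAX;
}
```

For example, 30 days is 2,592,000,000 ms, which exceeds INT32_MAX, yet msecs_to_ticks64() converts it without loss at hz = 250 or hz = 1000.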
2020-12-08 | netfilter: x_tables: Switch synchronization to RCU | Subash Abhinov Kasiviswanathan | 1 | -34/+15
When running concurrent iptables rules replacement with data, the per CPU sequence count is checked after the assignment of the new information. The sequence count is used to synchronize with the packet path without the use of any explicit locking. If there are any packets in the packet path using the table information, the sequence count is incremented to an odd value and is incremented to an even after the packet process completion. The new table value assignment is followed by a write memory barrier so every CPU should see the latest value. If the packet path has started with the old table information, the sequence counter will be odd and the iptables replacement will wait till the sequence count is even prior to freeing the old table info. However, this assumes that the new table information assignment and the memory barrier is actually executed prior to the counter check in the replacement thread. If CPU decides to execute the assignment later as there is no user of the table information prior to the sequence check, the packet path in another CPU may use the old table information. The replacement thread would then free the table information under it leading to a use after free in the packet processing context- Unable to handle kernel NULL pointer dereference at virtual address 000000000000008e pc : ip6t_do_table+0x5d0/0x89c lr : ip6t_do_table+0x5b8/0x89c ip6t_do_table+0x5d0/0x89c ip6table_filter_hook+0x24/0x30 nf_hook_slow+0x84/0x120 ip6_input+0x74/0xe0 ip6_rcv_finish+0x7c/0x128 ipv6_rcv+0xac/0xe4 __netif_receive_skb+0x84/0x17c process_backlog+0x15c/0x1b8 napi_poll+0x88/0x284 net_rx_action+0xbc/0x23c __do_softirq+0x20c/0x48c This could be fixed by forcing instruction order after the new table information assignment or by switching to RCU for the synchronization. 
Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore") Reported-by: Sean Tranchetti <stranche@codeaurora.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
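The even/odd sequence-count protocol that this commit message describes (and that the patch replaces with RCU) can be modelled in a few lines of userspace C11. This is a single-threaded sketch, not the x_tables code: struct cpu_state, current_table and the function names are hypothetical, and the model follows the commit text rather than the kernel source. It uses sequentially consistent atomics, so the table assignment is ordered before the later counter check, which is the ordering the commit message says was not guaranteed.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct table_info { int generation; };

/* Hypothetical model of one CPU's packet-path state. */
struct cpu_state {
	atomic_uint seq;	/* odd while a packet is being processed */
};

static _Atomic(struct table_info *) current_table;

/* Packet path entry: bump the counter to odd, then pick up the table. */
static struct table_info *packet_path_enter(struct cpu_state *cpu)
{
	atomic_fetch_add(&cpu->seq, 1);
	return atomic_load(&current_table);
}

/* Packet path exit: bump the counter back to even. */
static void packet_path_exit(struct cpu_state *cpu)
{
	atomic_fetch_add(&cpu->seq, 1);
}

/* Replacement path: publish the new table and hand back the old one. */
static struct table_info *replace_table(struct table_info *new_table)
{
	return atomic_exchange(&current_table, new_table);
}

/* The old table must not be freed while this returns true. */
static bool cpu_may_use_old_table(struct cpu_state *cpu)
{
	return (atomic_load(&cpu->seq) & 1u) != 0;
}
```

If the publish in replace_table() could be reordered after the check in cpu_may_use_old_table(), a packet entering in between would still see the old table after the replacer decided it was safe to free it.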
2020-12-03 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 7 | -38/+110
Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-01 | netfilter: use actual socket sk for REJECT action | Jan Engelhardt | 1 | -2/+4
True to the message of commit v5.10-rc1-105-g46d6c5ae953c, _do_ actually make use of state->sk when possible, such as in the REJECT modules. Reported-by: Minqiang Chen <ptpt52@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-01 | netfilter: nfnl_acct: remove data from struct net | Wang Shanker | 1 | -9/+29
This patch removes nfnl_acct_list from struct net to reduce the default memory footprint for the netns structure. Signed-off-by: Miao Wang <shankerwangmiao@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-01 | netfilter: Remove unnecessary conversion to bool | Kaixu Xia | 1 | -1/+1
Here we could use the '!=' expression to fix the following coccicheck warning: ./net/netfilter/xt_nfacct.c:30:41-46: WARNING: conversion to bool not needed here Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-27 | netfilter: nftables_offload: build mask based from the matching bytes | Pablo Neira Ayuso | 3 | -29/+61
Userspace might match on prefix bytes of header fields if they are on the byte boundary, this requires that the mask is adjusted accordingly. Use NFT_OFFLOAD_MATCH_EXACT() for meta since prefix byte matching is not allowed for this type of selector. The bitwise expression might be optimized out by userspace, hence the kernel needs to infer the prefix from the number of payload bytes to match on. This patch adds nft_payload_offload_mask() to calculate the bitmask to match on the prefix. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
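Inferring a prefix mask from the number of payload bytes to match can be sketched as follows. This is an illustration only: build_prefix_mask() is a hypothetical name and the kernel's nft_payload_offload_mask() may be implemented differently. The sketch assumes the match always covers whole leading bytes of the field, as the commit message states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill 'mask' (field_len bytes) so that the first
 * match_len bytes are 0xff and the remaining bytes are 0x00.
 * Returns false if the match is longer than the field.
 */
static bool build_prefix_mask(uint8_t *mask, size_t field_len, size_t match_len)
{
	if (match_len > field_len)
		return false;
	memset(mask, 0x00, field_len);
	memset(mask, 0xff, match_len);
	return true;
}
```

Matching the first three bytes of a four-byte IPv4 address field (a /24 prefix) yields the mask ff ff ff 00.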
2020-11-27 | netfilter: nftables_offload: set address type in control dissector | Pablo Neira Ayuso | 2 | -0/+21
This patch adds nft_flow_rule_set_addr_type() to set the address type from the nft_payload expression accordingly. If the address type is not set in the control dissector then a rule that matches either on source or destination IP address does not work. After this patch, nft hardware offload generates the flow dissector configuration as tc-flower does to match on an IP address. This patch has been also tested functionally to make sure packets are filtered out by the NIC. This is also getting the code aligned with the existing netfilter flow offload infrastructure which is also setting the control dissector. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-27 | ipvs: fix possible memory leak in ip_vs_control_net_init | Wang Hai | 1 | -6/+25
kmemleak report a memory leak as follows: BUG: memory leak unreferenced object 0xffff8880759ea000 (size 256): backtrace: [<00000000c0bf2deb>] kmem_cache_zalloc include/linux/slab.h:656 [inline] [<00000000c0bf2deb>] __proc_create+0x23d/0x7d0 fs/proc/generic.c:421 [<000000009d718d02>] proc_create_reg+0x8e/0x140 fs/proc/generic.c:535 [<0000000097bbfc4f>] proc_create_net_data+0x8c/0x1b0 fs/proc/proc_net.c:126 [<00000000652480fc>] ip_vs_control_net_init+0x308/0x13a0 net/netfilter/ipvs/ip_vs_ctl.c:4169 [<000000004c927ebe>] __ip_vs_init+0x211/0x400 net/netfilter/ipvs/ip_vs_core.c:2429 [<00000000aa6b72d9>] ops_init+0xa8/0x3c0 net/core/net_namespace.c:151 [<00000000153fd114>] setup_net+0x2de/0x7e0 net/core/net_namespace.c:341 [<00000000be4e4f07>] copy_net_ns+0x27d/0x530 net/core/net_namespace.c:482 [<00000000f1c23ec9>] create_new_namespaces+0x382/0xa30 kernel/nsproxy.c:110 [<00000000098a5757>] copy_namespaces+0x2e6/0x3b0 kernel/nsproxy.c:179 [<0000000026ce39e9>] copy_process+0x220a/0x5f00 kernel/fork.c:2072 [<00000000b71f4efe>] _do_fork+0xc7/0xda0 kernel/fork.c:2428 [<000000002974ee96>] __do_sys_clone3+0x18a/0x280 kernel/fork.c:2703 [<0000000062ac0a4d>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 [<0000000093f1ce2c>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 In the error path of ip_vs_control_net_init(), remove_proc_entry() needs to be called to remove the added proc entry, otherwise a memory leak will occur. Also, add some '#ifdef CONFIG_PROC_FS' because proc_create_net* return NULL when PROC is not used. Fixes: b17fc9963f83 ("IPVS: netns, ip_vs_stats and its procfs") Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
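The general shape of such an error-path fix is the usual goto-based unwinding, where each failure label releases what was set up before it. The sketch below is a userspace illustration and not the ip_vs_control_net_init() code: struct net_state, create_entry() and control_init() are hypothetical names, malloc()/free() stand in for creating and removing proc entries, and the -1 return value stands in for a kernel error code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-namespace state holding two created entries. */
struct net_state {
	void *proc_a;
	void *proc_b;
};

/* Stand-in for an entry creation call that may fail. */
static void *create_entry(bool fail)
{
	return fail ? NULL : malloc(1);
}

/* Returns 0 on success, -1 on failure with everything released. */
static int control_init(struct net_state *st, bool fail_a, bool fail_b,
			bool fail_late)
{
	st->proc_a = create_entry(fail_a);
	if (!st->proc_a)
		goto err_a;

	st->proc_b = create_entry(fail_b);
	if (!st->proc_b)
		goto err_b;

	if (fail_late)		/* a later initialization step failed */
		goto err_late;

	return 0;

err_late:
	free(st->proc_b);
	st->proc_b = NULL;
err_b:
	free(st->proc_a);
	st->proc_a = NULL;
err_a:
	return -1;
}
```

Without the cleanup under err_late and err_b, a late failure would leave the earlier entries allocated, which is the kind of leak the report above shows.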
2020-11-26 | netfilter: nf_tables: avoid false-postive lockdep splat | Florian Westphal | 1 | -1/+2
There are reports of lockdep splats in nftables, e.g.: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 31416 at net/netfilter/nf_tables_api.c:622 lockdep_nfnl_nft_mutex_not_held+0x28/0x38 [nf_tables] ... These are caused by an earlier, unrelated bug such as an ABBA deadlock in a different subsystem. In such an event, lockdep is disabled and lockdep_is_held returns true unconditionally. This then causes the WARN() in nf_tables. Make the WARN conditional on lockdep still being active to avoid this. Fixes: f102d66b335a417 ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/linux-kselftest/CA+G9fYvFUpODs+NkSYcnwKnXm62tmP=ksLeBPmB+KFrB2rvCtQ@mail.gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-26 | netfilter: ipset: prevent uninit-value in hash_ip6_add | Eric Dumazet | 1 | -2/+1
syzbot found that we are not validating user input properly before copying 16 bytes [1]. Using NLA_BINARY in ipaddr_policy[] for IPv6 address is not correct, since it ensures at most 16 bytes were provided. We should instead make sure user provided exactly 16 bytes. In old kernels (before v4.20), fix would be to remove the NLA_BINARY, since NLA_POLICY_EXACT_LEN() was not yet available. [1] BUG: KMSAN: uninit-value in hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892 CPU: 1 PID: 11611 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197 hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892 hash_ip6_uadt+0x976/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:267 call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720 ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808 ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833 nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45deb9 
Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fe2e503fc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000029ec0 RCX: 000000000045deb9 RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003 RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c R13: 000000000169fb7f R14: 00007fe2e50409c0 R15: 000000000118bf2c Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline] kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289 __msan_chain_origin+0x57/0xa0 mm/kmsan/kmsan_instr.c:147 ip6_netmask include/linux/netfilter/ipset/pfxlen.h:49 [inline] hash_ip6_netmask net/netfilter/ipset/ip_set_hash_ip.c:185 [inline] hash_ip6_uadt+0xb1c/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:263 call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720 ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808 ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833 nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit 
was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline] kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289 kmsan_memcpy_memmove_metadata+0x25e/0x2d0 mm/kmsan/kmsan.c:226 kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:246 __msan_memcpy+0x46/0x60 mm/kmsan/kmsan_instr.c:110 ip_set_get_ipaddr6+0x2cb/0x370 net/netfilter/ipset/ip_set_core.c:310 hash_ip6_uadt+0x439/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:255 call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720 ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808 ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833 nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline] kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104 kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76 slab_alloc_node mm/slub.c:2906 [inline] __kmalloc_node_track_caller+0xc61/0x15f0 mm/slub.c:4512 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x309/0xae0 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1094 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline] netlink_sendmsg+0xdb8/0x1840 net/netlink/af_netlink.c:1894 
sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d5/0x830 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
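The difference between the two validation styles named in this commit message (an upper bound versus an exact length) can be shown with a small userspace sketch. The helper names are hypothetical and this is not the netlink policy code; it only assumes that the consumer always copies 16 bytes, as the message says.

```c
#include <stdbool.h>
#include <stddef.h>

#define IPV6_ADDR_LEN 16

/* Upper-bound check: accepts any payload of at most max_len bytes. */
static bool len_ok_at_most(size_t payload_len, size_t max_len)
{
	return payload_len <= max_len;
}

/* Exact check: accepts only payloads of exactly want_len bytes. */
static bool len_ok_exact(size_t payload_len, size_t want_len)
{
	return payload_len == want_len;
}
```

An 8-byte attribute passes the upper-bound check, so a later 16-byte copy would read 8 bytes the user never supplied; the exact check rejects it.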
2020-11-22 | ipvs: replace atomic_add_return() | Yejune Deng | 2 | -3/+3
atomic_inc_return() looks better Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-20 | net: openvswitch: Be liberal in tcp conntrack. | Numan Siddique | 1 | -6/+0
There is no easy way to distinguish if a conntracked tcp packet is marked invalid because of tcp_in_window() check error or because it doesn't belong to an existing connection. With this patch, openvswitch sets liberal tcp flag for the established sessions so that out of window packets are not marked invalid. A helper function - nf_ct_set_tcp_be_liberal(nf_conn) is added which sets this flag for both the directions of the nf_conn. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20201116130126.3065077-1-nusiddiq@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16 | treewide: rename nla_strlcpy to nla_strscpy. | Francis Laniel | 6 | -10/+10
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new name of this function. Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 8 | -21/+41
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-01 | netfilter: nft_reject_inet: allow to use reject from inet ingress | Pablo Neira Ayuso | 1 | -1/+13
Enhance validation to support reject from inet ingress chains. Note that reject from inet ingress and reject from netdev ingress differ. Reject packets from inet ingress are sent through ip_local_out() since inet reject emulates the IP layer receive path, so the reject packet follows the classic IP output and postrouting paths. The reject action from netdev ingress assumes the packet has not yet entered the IP layer, so the reject packet is sent through dev_queue_xmit(). Therefore, reject packets from netdev ingress do not follow the classic IP output and postrouting paths. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-11-01 | netfilter: nftables: Add __printf() attribute | Andrew Lunn | 1 | -1/+2
nft_request_module calls vsnprintf() using parameters passed to it. Mark the function with the __printf() attribute so the compiler can check the format and arguments. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
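The same compiler check is available in userspace through the GCC/Clang format attribute, which is what the kernel's __printf(a, b) macro expands to. The sketch below is illustrative only: PRINTF_LIKE and format_module_name() are hypothetical names, not the nft_request_module() code.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang-specific: argument fmt_idx is a printf format string and
 * the variadic arguments start at position first_arg. */
#define PRINTF_LIKE(fmt_idx, first_arg) \
	__attribute__((format(printf, fmt_idx, first_arg)))

/*
 * Hypothetical wrapper around vsnprintf(). With the attribute, the
 * compiler can warn at each call site when the arguments do not match
 * the format string.
 */
PRINTF_LIKE(3, 4)
static int format_module_name(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = vsnprintf(buf, size, fmt, args);
	va_end(args);
	return ret;
}
```

With the attribute in place, a call such as format_module_name(buf, sizeof(buf), "nft-%s", 42) can be flagged by -Wformat at compile time instead of misbehaving at run time.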
2020-10-31 | netfilter: ipset: Expose the initval hash parameter to userspace | Jozsef Kadlecsik | 13 | -16/+33
This makes it possible to reproduce exactly the same set after a save/restore. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31 | netfilter: ipset: Add bucketsize parameter to all hash types | Jozsef Kadlecsik | 14 | -39/+71
The parameter defines the upper limit on the number of entries in any hash bucket when adding new entries from userspace: if the limit would be exceeded, ipset doubles the hash size and rehashes. This means the set may consume more memory but is evaluated faster when matching against it. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31 | netfilter: ipset: Support the -exist flag with the destroy command | Jozsef Kadlecsik | 1 | -1/+3
The -exist flag was supported with the create, add and delete commands. In order to gracefully handle the destroy command with nonexistent sets, the -exist flag is added to destroy too. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31 | netfilter: ipset: Update byte and packet counters regardless of whether they match | Stefano Brivio | 1 | -1/+2
In ip_set_match_extensions(), for sets with counters, we take care of updating counters themselves by calling ip_set_update_counter(), and of checking if the given comparison and values match, by calling ip_set_match_counter() if needed. However, if a given comparison on counters doesn't match the configured values, that doesn't mean the set entry itself isn't matching. This fix restores the behaviour we had before commit 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching"), without reintroducing the issue fixed there: back then, mtype_data_match() first updated counters in any case, and then took care of matching on counters. Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set, ip_set_update_counter() will anyway skip counter updates if desired. The issue observed is illustrated by this reproducer:
  ipset create c hash:ip counters
  ipset add c 192.0.2.1
  iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP
if we now send packets from 192.0.2.1, bytes and packets counters for the entry as shown by 'ipset list' are always zero, and, no matter how many bytes we send, the rule will never match, because counters themselves are not updated. Reported-by: Mithil Mhatre <mmhatre@redhat.com> Fixes: 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
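Why the order of "update" and "compare" matters can be seen in a small userspace model of the reproducer. This is a sketch under assumptions, not the ipset code: struct entry_counter and both function names are hypothetical, and only a "bytes greater than" comparison is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-entry counters. */
struct entry_counter {
	uint64_t bytes;
	uint64_t packets;
};

/* Order described for the fix: always update first, then compare. */
static bool match_update_first(struct entry_counter *c, uint64_t pkt_len,
			       uint64_t bytes_gt)
{
	c->bytes += pkt_len;
	c->packets += 1;
	return c->bytes > bytes_gt;
}

/* Problematic order: compare first and update only when it matches. */
static bool match_compare_first(struct entry_counter *c, uint64_t pkt_len,
				uint64_t bytes_gt)
{
	if (!(c->bytes > bytes_gt))
		return false;
	c->bytes += pkt_len;
	c->packets += 1;
	return true;
}
```

With a threshold of 800 bytes and 500-byte packets, the first variant matches from the second packet on, while the second variant never leaves zero and therefore never matches.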
2020-10-31 | netfilter: nft_reject: add reject verdict support for netdev | Jose M. Guisado Gomez | 3 | -0/+200
Adds support for reject from the ingress hook in the netdev family, for both the IPv4 and IPv6 stacks, with reject packets supporting ICMP and TCP RST. This ability is required in devices that need to REJECT legitimate clients whose traffic is forwarded from the ingress hook. Joint work with Laura Garcia. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-31 | netfilter: nft_reject: unify reject init and dump into nft_reject | Jose M. Guisado Gomez | 2 | -59/+13
The bridge family uses the same static init and dump functions as inet. This patch removes the duplicate code by unifying the bodies of these functions into nft_reject.c so they can be reused by the other families supporting the reject verdict. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-30 | netfilter: nf_tables: missing validation from the abort path | Pablo Neira Ayuso | 2 | -9/+28
If userspace does not include the trailing end of batch message, then nfnetlink aborts the transaction. This allows checking that ruleset updates trigger no errors. After this patch, invoking this command from the prerouting chain: # nft -c add rule x y fib saddr . oif type local fails since oif is not supported there. This patch fixes the lack of rule validation from the abort/check path to catch configuration errors such as the one above. Fixes: a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-30 | netfilter: use actual socket sk rather than skb sk when routing harder | Jason A. Donenfeld | 5 | -9/+9
If netfilter changes the packet mark when mangling, the packet is rerouted using the route_me_harder set of functions. Prior to this commit, there's one big difference between route_me_harder and the ordinary initial routing functions, described in the comment above __ip_queue_xmit(): /* Note: skb->sk can be different from sk, in case of tunnels */ int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl, That function goes on to correctly make use of sk->sk_bound_dev_if, rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a tunnel will receive a packet in ndo_start_xmit with an initial skb->sk. It will make some transformations to that packet, and then it will send the encapsulated packet out of a *new* socket. That new socket will basically always have a different sk_bound_dev_if (otherwise there'd be a routing loop). So for the purposes of routing the encapsulated packet, the routing information as it pertains to the socket should come from that socket's sk, rather than the packet's original skb->sk. For that reason __ip_queue_xmit() and related functions all do the right thing. One might argue that all tunnels should just call skb_orphan(skb) before transmitting the encapsulated packet into the new socket. But tunnels do *not* do this -- and this is wisely avoided in skb_scrub_packet() too -- because features like TSQ rely on skb->destructor() being called when that buffer space is truly available again. Calling skb_orphan(skb) too early would result in buffers filling up unnecessarily and accounting info being all wrong. Instead, additional routing must take into account the new sk, just as __ip_queue_xmit() notes. So, this commit addresses the problem by fishing the correct sk out of state->sk -- it's already set properly in the call to nf_hook() in __ip_local_out(), which receives the sk as part of its normal functionality. 
So we make sure to plumb state->sk through the various route_me_harder functions, and then make correct use of it following the example of __ip_queue_xmit(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-30 | netfilter: nftables: fix netlink report logic in flowtable and genid | Pablo Neira Ayuso | 1 | -2/+2
The netlink report should be sent regardless of the available listeners. Fixes: 84d7fce69388 ("netfilter: nf_tables: export rule-set generation ID") Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-22 | netfilter: nf_fwd_netdev: clear timestamp in forwarding path | Pablo Neira Ayuso | 2 | -0/+2
Similar to 7980d2eabde8 ("ipvs: clear skb->tstamp in forwarding path"). fq qdisc requires tstamp to be cleared in forwarding path. Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-20 | netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create | Saeed Mirzamohammadi | 2 | -5/+5
This patch fixes the issue due to: BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2 net/netfilter/nf_tables_offload.c:40 Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244 The error happens when expr->ops is accessed before the boundary check is performed, after nft_expr_next() has moved expr out of bounds. This patch checks the boundary condition before accessing expr->ops, which fixes the slab-out-of-bounds read. Add nft_expr_more() and use it to fix this problem. Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
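The ordering issue (check the boundary first, dereference second) can be illustrated with a minimal userspace loop. The sketch is not the nf_tables code: struct expr and count_exprs() are hypothetical, and a plain array with an end pointer stands in for the expressions of a rule.

```c
#include <stddef.h>

/* Hypothetical expression record; 'ops' is NULL for an unused slot. */
struct expr {
	const void *ops;
};

/*
 * Count the expressions in [first, end). The boundary test comes before
 * the e->ops read, so 'end' itself is never dereferenced. Swapping the
 * two operands of && would read one element past the buffer.
 */
static size_t count_exprs(const struct expr *first, const struct expr *end)
{
	const struct expr *e = first;
	size_t n = 0;

	while (e != end && e->ops != NULL) {
		n++;
		e++;
	}
	return n;
}
```

Because && short-circuits, the loop stops as soon as e reaches end and never touches memory past the last expression.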
2020-10-20 | netfilter: conntrack: connection timeout after re-register | Francesco Ruggeri | 1 | -6/+13
If the first packet conntrack sees after a re-register is an outgoing keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to SND.NXT-1. When the peer correctly acknowledges SND.NXT, tcp_in_window fails check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and returns false, which cascades into nf_conntrack_in setting skb->_nfct = 0 and into later conntrack iptables rules not matching. In cases where iptables drops packets that do not match conntrack rules, this can cause idle TCP connections to time out. v2: adjust td_end when getting the reply rather than when sending out the keepalive packet. Fixes: f94e63801ab2 ("netfilter: conntrack: reset tcp maxwin on re-register") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
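The failing comparison can be reproduced with wraparound-safe sequence arithmetic. This is a userspace sketch, not the tcp_in_window() code: seq_after() and check_iii() are hypothetical helpers that model only the "sack <= receiver.td_end" test quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparison: true if sequence number a is after b. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Check III from the commit text: sack <= receiver.td_end. */
static bool check_iii(uint32_t sack, uint32_t receiver_td_end)
{
	return !seq_after(sack, receiver_td_end);
}
```

If td_end was learned from a zero-length keepalive as SND.NXT-1, the peer's acknowledgment of SND.NXT is one past td_end and check_iii() returns false; once td_end equals SND.NXT the same acknowledgment passes.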
2020-10-20ipvs: adjust the debug info in function set_tcp_statelongguang.yue1-4/+6
Output client, virtual, and destination address info when the TCP state changes, which makes connection debugging clearer. Signed-off-by: longguang.yue <bigclouds@163.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-5/+32
Minor conflicts in net/mptcp/protocol.h and tools/testing/selftests/net/Makefile. In both cases code was added on both sides in the same place so just keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statementsDavide Caratti2-0/+29
nftables payload statements are used to mangle SCTP headers, but they can only replace the Internet Checksum. As a consequence, nftables rules that mangle sport/dport/vtag in SCTP headers potentially generate packets that are discarded by the receiver, unless the CRC-32C is "offloaded" (e.g. the rule mangles a skb having 'ip_summed' equal to 'CHECKSUM_PARTIAL'). Fix this by extending the uAPI definitions and the L4 checksum update function, so that userspace programs (e.g. nft) can instruct the kernel to compute CRC-32C in SCTP headers. Also ensure that LIBCRC32C is built if NF_TABLES is 'y' or 'm' in the kernel build configuration. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
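For readers unfamiliar with why the Internet Checksum cannot stand in for the SCTP checksum: CRC-32C is a different algorithm altogether, so any change to the covered bytes requires recomputing the CRC. The following is a minimal bitwise sketch of CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78) in userspace C. It is not the kernel's LIBCRC32C implementation and does not model the SCTP header layout; the function name crc32c is chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;             /* initial value */

    assert(data != NULL || len == 0);

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the dropped bit was 1. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;               /* final inversion */
}
```

The standard check value for this CRC over the ASCII string "123456789" is 0xE3069283, which gives a quick way to validate any implementation. A table-driven or hardware-assisted variant would be preferable for real traffic; the bitwise form is used here only because it is short.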
2020-10-14netfilter: restore NF_INET_NUMHOOKSPablo Neira Ayuso1-1/+1
This definition is used by the iptables legacy UAPI, restore it. Fixes: d3519cb89f6d ("netfilter: nf_tables: add inet ingress support") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-14netfilter: nf_log: missing vlan offload tag and protoPablo Neira Ayuso1-0/+12
Dump vlan tag and proto for the usual vlan offload case if the NF_LOG_MACDECODE flag is set on. Without this information the logging is misleading as there is no reference to the VLAN header. [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0 [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1 Fixes: 83e96d443b37 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
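The two fields added to the log lines above can be reproduced with a short userspace sketch. It assumes the VLAN ID is the low 12 bits of the tag control information, as defined by IEEE 802.1Q, and mirrors the "VPROTO=... VID=..." layout seen in the sample output. The function name format_vlan and the mask macro are hypothetical and are not taken from nf_log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Low 12 bits of the 802.1Q tag control information hold the VLAN ID. */
#define SKETCH_VLAN_VID_MASK 0x0fffu

/*
 * Format the VLAN protocol (host byte order) and VLAN ID in the style of the
 * sample log lines. Returns the number of characters snprintf() would write.
 */
static int format_vlan(char *out, size_t outlen, uint16_t proto, uint16_t tci)
{
    assert(out != NULL && outlen > 0);

    return snprintf(out, outlen, "VPROTO=%04x VID=%u ",
                    (unsigned int)proto,
                    (unsigned int)(tci & SKETCH_VLAN_VID_MASK));
}
```

Masking the tag matters because the upper bits carry the priority and drop-eligible fields, which should not leak into the reported VID.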
2020-10-12netfilter: nftables: extend error reporting for chain updatesPablo Neira Ayuso1-5/+14
The initial support for netlink extended ACK is missing the chain update path, which results in misleading error reporting in case of EEXIST. Fixes: 36dd1bcc07e5 ("netfilter: nf_tables: initial support for extended ACK reporting") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12ipvs: clear skb->tstamp in forwarding pathJulian Anastasov1-0/+6
fq qdisc requires tstamp to be cleared in forwarding path Reported-by: Evgeny B <abt-admin@mail.ru> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209427 Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: flowtable: reduce calls to pskb_may_pull()Pablo Neira Ayuso2-27/+30
Make two upfront calls to pskb_may_pull() to linearize the network and transport header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: nf_tables: add inet ingress supportPablo Neira Ayuso2-8/+41
This patch adds a new ingress hook for the inet family. The inet ingress hook emulates the IP receive path code; therefore, unclean packets are dropped before walking over the ruleset in this basechain. This patch also introduces the nft_base_chain_netdev() helper function to check if this hook is bound to one or more devices (through the hook list infrastructure). This check allows the control plane to handle the inet ingress chain the same way as a netdev ingress chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: add inet ingress supportPablo Neira Ayuso1-21/+82
This patch adds the NF_INET_INGRESS pseudohook for the NFPROTO_INET family. It maps this new hook to the existing NFPROTO_NETDEV and NF_NETDEV_INGRESS hook. The hook does not guarantee that packets are inet only; users must filter out non-IP traffic explicitly. This infrastructure makes it easier to support this new hook in nf_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: add nf_ingress_hook() helper functionPablo Neira Ayuso1-2/+7
Add helper function to check if this is an ingress hook. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12netfilter: add nf_static_key_{inc,dec}Pablo Neira Ayuso1-6/+17
Add helper functions to increment and decrement the hook static keys. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12ipvs: inspect reply packets from DR/TUN real serverslongguang.yue2-15/+22
Just like for MASQ, inspect the reply packets coming from DR/TUN real servers and alter the connection's state and timeout according to the protocol. It is IPVS's duty to keep traffic statistics when packets hit a connection, no matter what mode it is in. Signed-off-by: longguang.yue <bigclouds@163.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>