summaryrefslogtreecommitdiffstats
path: root/net/netfilter
AgeCommit message (Collapse)AuthorFilesLines
2020-06-25netfilter: Add MODULE_DESCRIPTION entries to kernel modulesRob Gill29-0/+29
The user tool modinfo is used to get information on kernel modules, including a description where it is available. This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules (descriptions taken from Kconfig file or code comments) Signed-off-by: Rob Gill <rrobgill@protonmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ipset: fix unaligned atomic accessRussell King1-0/+2
When using ip_set with counters and comment, traffic causes the kernel to panic on 32-bit ARM: Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>] Unhandled fault: alignment exception (0x221) at 0xea08133c PC is at ip_set_match_extensions+0xe0/0x224 [ip_set] The problem occurs when we try to update the 64-bit counters - the faulting address above is not 64-bit aligned. The problem occurs due to the way elements are allocated, for example: set->dsize = ip_set_elem_len(set, tb, 0, 0); map = ip_set_alloc(sizeof(*map) + elements * set->dsize); If the element has a requirement for a member to be 64-bit aligned, and set->dsize is not a multiple of 8, but is a multiple of four, then every odd numbered elements will be misaligned - and hitting an atomic64_add() on that element will cause the kernel to panic. ip_set_elem_len() must return a size that is rounded to the maximum alignment of any extension field stored in the element. This change ensures that is the case. Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-19net: flow_offload: fix flow_indr_dev_unregister pathwenxu2-0/+2
If the representor is removed, then identify the indirect flow_blocks that need to be removed by the release callback and the port representor structure. To identify the port representor structure, a new indr.cb_priv field needs to be introduced. The flow_block also needs to be removed from the driver list from the cleanup path. Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-15netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inlineAlaa Hleihel1-45/+0
Currently, nf_flow_table_offload_add/del_cb are exported by nf_flow_table module, therefore modules using them will have hard-dependency on nf_flow_table and will require loading it all the time. This can lead to an unnecessary overhead on systems that do not use this API. To relax the hard-dependency between the modules, we unexport these functions and make them static inline. Fixes: 978703f42549 ("netfilter: flowtable: Add API for registering to flow table events") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller4-25/+65
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix bogus EEXIST on element insertions to the rbtree with timeouts, from Stefano Brivio. 2) Preempt BUG splat in the pipapo element insertion path, also from Stefano. 3) Release filter from the ctnetlink error path. 4) Release flowtable hooks from the deletion path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada2-56/+56
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-12netfilter: nf_tables: hook list memleak in flowtable deletionPablo Neira Ayuso1-7/+24
After looking up for the flowtable hooks that need to be removed, release the hook objects in the deletion list. The error path needs to released these hook objects too. Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable") Reported-by: syzbot+eb9d5924c51d6d59e094@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-10Merge branch 'rwonce/rework' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux Pull READ/WRITE_ONCE rework from Will Deacon: "This the READ_ONCE rework I've been working on for a while, which bumps the minimum GCC version and improves code-gen on arm64 when stack protector is enabled" [ Side note: I'm _really_ tempted to raise the minimum gcc version to 4.9, so that we can just say that we require _Generic() support. That would allow us to more cleanly handle a lot of the cases where we depend on very complex macros with 'sizeof' or __builtin_choose_expr() with __builtin_types_compatible_p() etc. This branch has a workaround for sparse not handling _Generic(), either, but that was already fixed in the sparse development branch, so it's really just gcc-4.9 that we'd require. - Linus ] * 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux: compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse compiler_types.h: Optimize __unqual_scalar_typeof compilation time compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long) compiler-types.h: Include naked type in __pick_integer_type() match READ_ONCE: Fix comment describing 2x32-bit atomicity gcov: Remove old GCC 3.4 support arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros READ_ONCE: Drop pointer qualifiers when reading from scalar types READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE() arm64: csum: Disable KASAN for do_csum() fault_inject: Don't rely on "return value" from WRITE_ONCE() net: tls: Avoid assigning 'const' pointer to non-const pointer netfilter: Avoid assigning 'const' pointer to non-const pointer compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
2020-06-10netfilter: ctnetlink: memleak in filter initialization error pathPablo Neira Ayuso1-10/+22
Release the filter object in case of error. Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Reported-by: syzbot+38b8b548a851a01793c5@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-08netfilter: nft_set_pipapo: Disable preemption before getting per-CPU pointerStefano Brivio1-1/+5
The lkp kernel test robot reports, with CONFIG_DEBUG_PREEMPT enabled: [ 165.316525] BUG: using smp_processor_id() in preemptible [00000000] code: nft/6247 [ 165.319547] caller is nft_pipapo_insert+0x464/0x610 [nf_tables] [ 165.321846] CPU: 1 PID: 6247 Comm: nft Not tainted 5.6.0-rc5-01595-ge32a4dc6512ce3 #1 [ 165.332128] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 165.334892] Call Trace: [ 165.336435] dump_stack+0x8f/0xcb [ 165.338128] debug_smp_processor_id+0xb2/0xc0 [ 165.340117] nft_pipapo_insert+0x464/0x610 [nf_tables] [ 165.342290] ? nft_trans_alloc_gfp+0x1c/0x60 [nf_tables] [ 165.344420] ? rcu_read_lock_sched_held+0x52/0x80 [ 165.346460] ? nft_trans_alloc_gfp+0x1c/0x60 [nf_tables] [ 165.348543] ? __mmu_interval_notifier_insert+0xa0/0xf0 [ 165.350629] nft_add_set_elem+0x5ff/0xa90 [nf_tables] [ 165.352699] ? __lock_acquire+0x241/0x1400 [ 165.354573] ? __lock_acquire+0x241/0x1400 [ 165.356399] ? reacquire_held_locks+0x12f/0x200 [ 165.358384] ? nf_tables_valid_genid+0x1f/0x40 [nf_tables] [ 165.360502] ? nla_strcmp+0x10/0x50 [ 165.362199] ? nft_table_lookup+0x4f/0xa0 [nf_tables] [ 165.364217] ? nla_strcmp+0x10/0x50 [ 165.365891] ? nf_tables_newsetelem+0xd5/0x150 [nf_tables] [ 165.367997] nf_tables_newsetelem+0xd5/0x150 [nf_tables] [ 165.370083] nfnetlink_rcv_batch+0x4fd/0x790 [nfnetlink] [ 165.372205] ? __lock_acquire+0x241/0x1400 [ 165.374058] ? __nla_validate_parse+0x57/0x8a0 [ 165.375989] ? cap_inode_getsecurity+0x230/0x230 [ 165.377954] ? security_capable+0x38/0x50 [ 165.379795] nfnetlink_rcv+0x11d/0x140 [nfnetlink] [ 165.381779] netlink_unicast+0x1b2/0x280 [ 165.383612] netlink_sendmsg+0x351/0x470 [ 165.385439] sock_sendmsg+0x5b/0x60 [ 165.387133] ____sys_sendmsg+0x200/0x280 [ 165.388871] ? copy_msghdr_from_user+0xd9/0x160 [ 165.390805] ___sys_sendmsg+0x88/0xd0 [ 165.392524] ? __might_fault+0x3e/0x90 [ 165.394273] ? sock_getsockopt+0x3d5/0xbb0 [ 165.396021] ? __handle_mm_fault+0x545/0x6a0 [ 165.397822] ? find_held_lock+0x2d/0x90 [ 165.399593] ? __sys_sendmsg+0x5e/0xa0 [ 165.401338] __sys_sendmsg+0x5e/0xa0 [ 165.402979] do_syscall_64+0x60/0x280 [ 165.404680] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 165.406621] RIP: 0033:0x7ff1fa46e783 [ 165.408299] Code: c7 c0 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48 [ 165.414163] RSP: 002b:00007ffedf59ea78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 165.416804] RAX: ffffffffffffffda RBX: 00007ffedf59fc60 RCX: 00007ff1fa46e783 [ 165.419419] RDX: 0000000000000000 RSI: 00007ffedf59fb10 RDI: 0000000000000005 [ 165.421886] RBP: 00007ffedf59fc10 R08: 00007ffedf59ea54 R09: 0000000000000001 [ 165.424445] R10: 00007ff1fa630c6c R11: 0000000000000246 R12: 0000000000020000 [ 165.426954] R13: 0000000000000280 R14: 0000000000000005 R15: 00007ffedf59ea90 Disable preemption before accessing the lookup scratch area in nft_pipapo_insert(). Reported-by: kernel test robot <lkp@intel.com> Analysed-by: Florian Westphal <fw@strlen.de> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-08netfilter: nft_set_rbtree: Don't account for expired elements on insertionStefano Brivio1-7/+14
While checking the validity of insertion in __nft_rbtree_insert(), we currently ignore conflicting elements and intervals only if they are not active within the next generation. However, if we consider expired elements and intervals as potentially conflicting and overlapping, we'll return error for entries that should be added instead. This is particularly visible with garbage collection intervals that are comparable with the element timeout itself, as reported by Mike Dillinger. Other than the simple issue of denying insertion of valid entries, this might also result in insertion of a single element (opening or closing) out of a given interval. With single entries (that are inserted as intervals of size 1), this leads in turn to the creation of new intervals. For example: # nft add element t s { 192.0.2.1 } # nft list ruleset [...] elements = { 192.0.2.1-255.255.255.255 } Always ignore expired elements active in the next generation, while checking for conflicts. It might be more convenient to introduce a new macro that covers both inactive and expired items, as this type of check also appears quite frequently in other set back-ends. This is however beyond the scope of this fix and can be deferred to a separate patch. Other than the overlap detection cases introduced by commit 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"), we also have to cover the original conflict check dealing with conflicts between two intervals of size 1, which was introduced before support for timeout was introduced. This won't return an error to the user as -EEXIST is masked by nft if NLM_F_EXCL is not given, but would result in a silent failure adding the entry. Reported-by: Mike Dillinger <miked@softtalker.com> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds14-304/+808
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-06-02Merge tag 'audit-pr-20200601' of ↵Linus Torvalds1-9/+5
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Summary of the significant patches: - Record information about binds/unbinds to the audit multicast socket. This helps identify which processes have/had access to the information in the audit stream. - Cleanup and add some additional information to the netfilter configuration events collected by audit. - Fix some of the audit error handling code so we don't leak network namespace references" * tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: add subj creds to NETFILTER_CFG record to audit: Replace zero-length array with flexible-array audit: make symbol 'audit_nfcfgs' static netfilter: add audit table unregister actions audit: tidy and extend netfilter_cfg x_tables audit: log audit netlink multicast bind and unbind audit: fix a net reference leak in audit_list_rules_send() audit: fix a net reference leak in audit_send_reply()
2020-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller6-135/+650
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next to extend ctnetlink and the flowtable infrastructure: 1) Extend ctnetlink kernel side netlink dump filtering capabilities, from Romain Bellan. 2) Generalise the flowtable hook parser to take a hook list. 3) Pass a hook list to the flowtable hook registration/unregistration. 4) Add a helper function to release the flowtable hook list. 5) Update the flowtable event notifier to pass a flowtable hook list. 6) Allow users to add new devices to an existing flowtables. 7) Allow users to remove devices to an existing flowtables. 8) Allow for registering a flowtable with no initial devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: remove indirect block netdev event registrationPablo Neira Ayuso2-118/+1
Drivers do not register to netdev events to set up indirect blocks anymore. Remove __flow_indr_block_cb_register() and __flow_indr_block_cb_unregister(). The frontends set up the callbacks through flow_indr_dev_setup_block() Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01net: use flow_indr_dev_setup_offload()Pablo Neira Ayuso2-9/+38
Update existing frontends to use flow_indr_dev_setup_offload(). This new function must be called if ->ndo_setup_tc is unset to deal with tunnel devices. If there is no driver that is subscribed to new tunnel device flow_block bindings, then this function bails out with EOPNOTSUPP. If the driver module is removed, the ->cleanup() callback removes the entries that belong to this tunnel device. This cleanup procedures is triggered when the device unregisters the tunnel device offload handler. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01netfilter: nf_flowtable: expose nf_flow_table_gc_cleanup()Pablo Neira Ayuso1-3/+3
This function schedules the flow teardown state and it forces a gc run. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller4-36/+111
xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27netfilter: nf_tables: skip flowtable hooknum and priority on device updatesPablo Neira Ayuso1-18/+35
On device updates, the hooknum and priority attributes are not required. This patch makes optional these two netlink attributes. Moreover, bail out with EOPNOTSUPP if userspace tries to update the hooknum and priority for existing flowtables. While at this, turn EINVAL into EOPNOTSUPP in case the hooknum is not ingress. EINVAL is reserved for missing netlink attribute / malformed netlink messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: allow to register flowtable with no devicesPablo Neira Ayuso1-9/+11
A flowtable might be composed of dynamic interfaces only. Such dynamic interfaces might show up at a later stage. This patch allows users to register a flowtable with no devices. Once the dynamic interface becomes available, the user adds the dynamic devices to the flowtable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: delete devices from flowtablePablo Neira Ayuso1-16/+97
This patch allows users to delete devices from existing flowtables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: add devices to existing flowtablePablo Neira Ayuso1-11/+86
This patch allows users to add devices to an existing flowtable. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: pass hook list to flowtable event notifierPablo Neira Ayuso1-5/+11
Update the flowtable netlink notifier to take the list of hooks as input. This allows to reuse this function in incremental flowtable hook updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: add nft_flowtable_hooks_destroy()Pablo Neira Ayuso1-4/+11
This patch adds a helper function destroy the flowtable hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: pass hook list to nft_{un,}register_flowtable_net_hooks()Pablo Neira Ayuso1-7/+10
This patch prepares for incremental flowtable hook updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_tables: generalise flowtable hook parsingPablo Neira Ayuso1-11/+25
Update nft_flowtable_parse_hook() to take the flowtable hook list as parameter. This allows to reuse this function to update the hooks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: ctnetlink: add kernel side filtering for dumpRomain Bellan5-71/+381
Conntrack dump does not support kernel side filtering (only get exists, but it returns only one entry. And user has to give a full valid tuple) It means that userspace has to implement filtering after receiving many irrelevant entries, consuming resources (conntrack table is sometimes very huge, much more than a routing table for example). This patch adds filtering in kernel side. To achieve this goal, we: * Add a new CTA_FILTER netlink attributes, actually a flag list to parametize filtering * Convert some *nlattr_to_tuple() functions, to allow a partial parsing of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple it not fully set) Filtering is now possible on: * IP SRC/DST values * Ports for TCP and UDP flows * IMCP(v6) codes types and IDs Filtering is done as an "AND" operator. For example, when flags PROTO_SRC_PORT, PROTO_NUM and IP_SRC are sets, only entries matching all values are dumped. Changes since v1: Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered Changes since v2: Move several constants to nf_internals.h Move a fix on netlink values check in a separate patch Add a check on not-supported flags Return EOPNOTSUPP if CDA_FILTER is set in ctnetlink_flush_conntrack (not yet implemented) Code style issues Changes since v3: Fix compilation warning reported by kbuild test robot Changes since v4: Fix a regression introduced in v3 (returned EINVAL for valid netlink messages without CTA_MARK) Changes since v5: Change definition of CTA_FILTER_F_ALL Fix a regression when CTA_TUPLE_ZONE is not set Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: nf_conntrack_pptp: fix compilation warning with W=1 buildPablo Neira Ayuso1-1/+1
>> include/linux/netfilter/nf_conntrack_pptp.h:13:20: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers] extern const char *const pptp_msg_name(u_int16_t msg); ^~~~~~ Reported-by: kbuild test robot <lkp@intel.com> Fixes: 4c559f15efcc ("netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: conntrack: comparison of unsigned in cthelper confirmationPablo Neira Ayuso1-1/+1
net/netfilter/nf_conntrack_core.c: In function nf_confirm_cthelper: net/netfilter/nf_conntrack_core.c:2117:15: warning: comparison of unsigned expression in < 0 is always false [-Wtype-limits] 2117 | if (protoff < 0 || (frag_off & htons(~0x7)) != 0) | ^ ipv6_skip_exthdr() returns a signed integer. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: 703acd70f249 ("netfilter: nfnetlink_cthelper: unbreak userspace helper support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: conntrack: Pass value of ctinfo to __nf_conntrack_updateNathan Chancellor1-3/+3
Clang warns: net/netfilter/nf_conntrack_core.c:2068:21: warning: variable 'ctinfo' is uninitialized when used here [-Wuninitialized] nf_ct_set(skb, ct, ctinfo); ^~~~~~ net/netfilter/nf_conntrack_core.c:2024:2: note: variable 'ctinfo' is declared here enum ip_conntrack_info ctinfo; ^ 1 warning generated. nf_conntrack_update was split up into nf_conntrack_update and __nf_conntrack_update, where the assignment of ctinfo is in nf_conntrack_update but it is used in __nf_conntrack_update. Pass the value of ctinfo from nf_conntrack_update to __nf_conntrack_update so that uninitialized memory is not used and everything works properly. Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again") Link: https://github.com/ClangBuiltLinux/linux/issues/1039 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-25netfilter: nfnetlink_cthelper: unbreak userspace helper supportPablo Neira Ayuso1-1/+2
Restore helper data size initialization and fix memcopy of the helper data size. Fixes: 157ffffeb5dc ("netfilter: nfnetlink_cthelper: reject too large userspace allocation requests") Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-25netfilter: conntrack: make conntrack userspace helpers work againPablo Neira Ayuso1-6/+72
Florian Westphal says: "Problem is that after the helper hook was merged back into the confirm one, the queueing itself occurs from the confirm hook, i.e. we queue from the last netfilter callback in the hook-list. Therefore, on return, the packet bypasses the confirm action and the connection is never committed to the main conntrack table. To fix this there are several ways: 1. revert the 'Fixes' commit and have a extra helper hook again. Works, but has the drawback of adding another indirect call for everyone. 2. Special case this: split the hooks only when userspace helper gets added, so queueing occurs at a lower priority again, and normal enqueue reinject would eventually call the last hook. 3. Extend the existing nf_queue ct update hook to allow a forced confirmation (plus run the seqadj code). This goes for 3)." Fixes: 827318feb69cb ("netfilter: conntrack: remove helper hook again") Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-25netfilter: nf_conntrack_pptp: prevent buffer overflows in debug codePablo Neira Ayuso1-27/+35
Dan Carpenter says: "Smatch complains that the value for "cmd" comes from the network and can't be trusted." Add pptp_msg_name() helper function that checks for the array boundary. Fixes: f09943fefe6b ("[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-25netfilter: ipset: Fix subcounter update skipPhil Sutter1-1/+1
If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, user requested to not update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE must be set, not unset. Fixes: 6e01781d1c80e ("netfilter: ipset: set match: add support to match the counters") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller4-8/+38
Move the bpf verifier trace check into the new switch statement in HEAD. Resolve the overlapping changes in hinic, where bug fixes overlap the addition of VF support. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12netfilter: nft_set_rbtree: Add missing expired checksPhil Sutter1-0/+11
Expired intervals would still match and be dumped to user space until garbage collection wiped them out. Make sure they stop matching and disappear (from users' perspective) as soon as they expire. Fixes: 8d8540c4f5e03 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-12netfilter: flowtable: set NF_FLOW_TEARDOWN flag on entry expirationPablo Neira Ayuso1-3/+5
If the flow timer expires, the gc sets on the NF_FLOW_TEARDOWN flag. Otherwise, the flowtable software path might race to refresh the timeout, leaving the state machine in inconsistent state. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-11netfilter: conntrack: fix infinite loop on rmmodFlorian Westphal1-1/+12
'rmmod nf_conntrack' can hang forever, because the netns exit gets stuck in nf_conntrack_cleanup_net_list(): i_see_dead_people: busy = 0; list_for_each_entry(net, net_exit_list, exit_list) { nf_ct_iterate_cleanup(kill_all, net, 0, 0); if (atomic_read(&net->ct.count) != 0) busy = 1; } if (busy) { schedule(); goto i_see_dead_people; } When nf_ct_iterate_cleanup iterates the conntrack table, all nf_conn structures can be found twice: once for the original tuple and once for the conntracks reply tuple. get_next_corpse() only calls the iterator when the entry is in original direction -- the idea was to avoid unneeded invocations of the iterator callback. When support for clashing entries was added, the assumption that all nf_conn objects are added twice, once in original, once for reply tuple no longer holds -- NF_CLASH_BIT entries are only added in the non-clashing reply direction. Thus, if at least one NF_CLASH entry is in the list then nf_conntrack_cleanup_net_list() always skips it completely. During normal netns destruction, this causes a hang of several seconds, until the gc worker removes the entry (NF_CLASH entries always have a 1 second timeout). But in the rmmod case, the gc worker has already been stopped, so ct.count never becomes 0. We can fix this in two ways: 1. Add a second test for CLASH_BIT and call iterator for those entries as well, or: 2. Skip the original tuple direction and use the reply tuple. 2) is simpler, so do that. Fixes: 6a757c07e51f80ac ("netfilter: conntrack: allow insertion of clashing entries") Reported-by: Chen Yi <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-11netfilter: flowtable: Remove WQ_MEM_RECLAIM from workqueueRoi Dayan1-1/+1
This workqueue is in charge of handling offloaded flow tasks like add/del/stats we should not use WQ_MEM_RECLAIM flag. The flag can result in the following warning. [ 485.557189] ------------[ cut here ]------------ [ 485.562976] workqueue: WQ_MEM_RECLAIM nf_flow_table_offload:flow_offload_worr [ 485.562985] WARNING: CPU: 7 PID: 3731 at kernel/workqueue.c:2610 check_flush0 [ 485.590191] Kernel panic - not syncing: panic_on_warn set ... [ 485.597100] CPU: 7 PID: 3731 Comm: kworker/u112:8 Not tainted 5.7.0-rc1.21802 [ 485.606629] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/177 [ 485.615487] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_f] [ 485.624834] Call Trace: [ 485.628077] dump_stack+0x50/0x70 [ 485.632280] panic+0xfb/0x2d7 [ 485.636083] ? check_flush_dependency+0x110/0x130 [ 485.641830] __warn.cold.12+0x20/0x2a [ 485.646405] ? check_flush_dependency+0x110/0x130 [ 485.652154] ? check_flush_dependency+0x110/0x130 [ 485.657900] report_bug+0xb8/0x100 [ 485.662187] ? sched_clock_cpu+0xc/0xb0 [ 485.666974] do_error_trap+0x9f/0xc0 [ 485.671464] do_invalid_op+0x36/0x40 [ 485.675950] ? check_flush_dependency+0x110/0x130 [ 485.681699] invalid_op+0x28/0x30 Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command") Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-11netfilter: flowtable: Add pending bit for offload workPaul Blakey1-1/+7
Gc step can queue offloaded flow del work or stats work. Those work items can race each other and a flow could be freed before the stats work is executed and querying it. To avoid that, add a pending bit that if a work exists for a flow don't queue another work for it. This will also avoid adding multiple stats works in case stats work didn't complete but gc step started again. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-10netfilter: conntrack: avoid gcc-10 zero-length-bounds warningArnd Bergmann1-2/+2
gcc-10 warns around a suspicious access to an empty struct member: net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc': net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds] 1522 | memset(&ct->__nfct_init_offset[0], 0, | ^~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from net/netfilter/nf_conntrack_core.c:37: include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset' 90 | u8 __nfct_init_offset[0]; | ^~~~~~~~~~~~~~~~~~ The code is correct but a bit unusual. Rework it slightly in a way that does not trigger the warning, using an empty struct instead of an empty array. There are probably more elegant ways to do this, but this is the smallest change. Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2-8/+8
Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller3-5/+5
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-01 (v2) The following pull-request contains BPF updates for your *net-next* tree. We've added 61 non-merge commits during the last 6 day(s) which contain a total of 153 files changed, 6739 insertions(+), 3367 deletions(-). The main changes are: 1) pulled work.sysctl from vfs tree with sysctl bpf changes. 2) bpf_link observability, from Andrii. 3) BTF-defined map in map, from Andrii. 4) asan fixes for selftests, from Andrii. 5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub. 6) production cloudflare classifier as a selftes, from Lorenz. 7) bpf_ktime_get_*_ns() helper improvements, from Maciej. 8) unprivileged bpftool feature probe, from Quentin. 9) BPF_ENABLE_STATS command, from Song. 10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav. 11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30docs: networking: convert tproxy.txt to ReSTMauro Carvalho Chehab1-1/+1
- add SPDX header; - adjust title markup; - mark code blocks and literals as such; - adjust identation, whitespaces and blank lines where needed; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29netfilter: nf_osf: avoid passing pointer to local varArnd Bergmann1-5/+7
gcc-10 points out that a code path exists where a pointer to a stack variable may be passed back to the caller: net/netfilter/nfnetlink_osf.c: In function 'nf_osf_hdr_ctx_init': cc1: warning: function may return address of local variable [-Wreturn-local-addr] net/netfilter/nfnetlink_osf.c:171:16: note: declared here 171 | struct tcphdr _tcph; | ^~~~~ I am not sure whether this can happen in practice, but moving the variable declaration into the callers avoids the problem. Fixes: 31a9c29210e2 ("netfilter: nf_osf: add struct nf_osf_hdr_ctx") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-28netfilter: add audit table unregister actionsRichard Guy Briggs1-0/+2
Audit the action of unregistering ebtables and x_tables. See: https://github.com/linux-audit/audit-kernel/issues/44 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-28audit: tidy and extend netfilter_cfg x_tablesRichard Guy Briggs1-9/+3
NETFILTER_CFG record generation was inconsistent for x_tables and ebtables configuration changes. The call was needlessly messy and there were supporting records missing at times while they were produced when not requested. Simplify the logging call into a new audit_log_nfcfg call. Honour the audit_enabled setting while more consistently recording information including supporting records by tidying up dummy checks. Add an op= field that indicates the operation being performed (register or replace). Here is the enhanced sample record: type=NETFILTER_CFG msg=audit(1580905834.919:82970): table=filter family=2 entries=83 op=replace Generate audit NETFILTER_CFG records on ebtables table registration. Previously this was being done for x_tables registration and replacement operations and ebtables table replacement only. See: https://github.com/linux-audit/audit-kernel/issues/25 See: https://github.com/linux-audit/audit-kernel/issues/35 See: https://github.com/linux-audit/audit-kernel/issues/43 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-28Merge branch 'work.sysctl' of ↵Daniel Borkmann3-5/+5
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull in Christoph Hellwig's series that changes the sysctl's ->proc_handler methods to take kernel pointers instead. It gets rid of the set_fs address space overrides used by BPF. As per discussion, pull in the feature branch into bpf-next as it relates to BPF sysctl progs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200427071508.GV23230@ZenIV.linux.org.uk/T/
2020-04-28netfilter: nft_nat: add netmap supportPablo Neira Ayuso1-1/+45
This patch allows you to NAT the network address prefix onto another network address prefix, a.k.a. netmapping. Userspace must specify the NF_NAT_RANGE_NETMAP flag and the prefix address through the NFTA_NAT_REG_ADDR_MIN and NFTA_NAT_REG_ADDR_MAX netlink attributes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-28netfilter: nft_nat: add helper function to set up NAT address and protocolPablo Neira Ayuso1-22/+34
This patch add nft_nat_setup_addr() and nft_nat_setup_proto() to set up the NAT mangling. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>