summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-04-13net: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu()Menglong Dong1-4/+12
Replace kfree_skb() used in ip6_protocol_deliver_rcu() with kfree_skb_reason(). No new reasons are added. Some paths are ignored, as they are not common, such as encapsulation on non-final protocol. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: ipv6: add skb drop reasons to ip6_rcv_core()Menglong Dong1-8/+16
Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason(). No new drop reasons are added. Seems now we use 'SKB_DROP_REASON_IP_INHDR' for too many case during ipv6 header parse or check, just like what 'IPSTATS_MIB_INHDRERRORS' do. Will it be too general and hard to know what happened? Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: ipv6: add skb drop reasons to TLV parseMenglong Dong1-12/+24
Replace kfree_skb() used in TLV encoded option header parsing with kfree_skb_reason(). Following functions are involved: ip6_parse_tlv() ipv6_hop_ra() ipv6_hop_ioam() ipv6_hop_jumbo() ipv6_hop_calipso() ipv6_dest_hao() Most skb drops during this process are regarded as 'InHdrErrors', as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails, which make we use 'SKB_DROP_REASON_IP_INHDR' correspondingly. However, 'IP_INHDR' is a relatively general reason. Therefore, we can use other reasons with higher priority in some cases. For example, 'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: ipv6: remove redundant statistics in ipv6_hop_jumbo()Menglong Dong1-8/+1
There are two call chains for ipv6_hop_jumbo(). The first one is: ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo() On this call chain, the drop statistics will be done in ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. The second call chain is: ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv() And the drop statistics will also be done in ip6_rcv_core() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. Therefore, the statistics in ipv6_hop_jumbo() is redundant, which means the drop is counted twice. The statistics in ipv6_hop_jumbo() is almost the same as the outside, except the 'IPSTATS_MIB_INTRUNCATEDPKTS', which seems that we have to ignore it. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: icmp: introduce function icmpv6_param_prob_reason()Menglong Dong1-3/+4
In order to add the skb drop reasons support to icmpv6_param_prob(), introduce the function icmpv6_param_prob_reason() and make icmpv6_param_prob() an inline call to it. This new function will be used in the following patches. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: ip: add skb drop reasons to ip forwardingMenglong Dong2-6/+16
Replace kfree_skb() which is used in ip6_forward() and ip_forward() with kfree_skb_reason(). The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for the case that the length of the packet exceeds MTU and can't fragment. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: ipv6: add skb drop reasons to ip6_pkt_drop()Menglong Dong1-1/+5
Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason(). No new reason is added. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: ipv4: add skb drop reasons to ip_error()Menglong Dong1-1/+5
Eventually, I find out the handler function for inputting route lookup fail: ip_error(). The drop reasons we used in ip_error() are almost corresponding to IPSTATS_MIB_*, and following new reasons are introduced: SKB_DROP_REASON_IP_INADDRERRORS SKB_DROP_REASON_IP_INNOROUTES Isn't the name SKB_DROP_REASON_IP_HOSTUNREACH and SKB_DROP_REASON_IP_NETUNREACH more accurate? To make them corresponding to IPSTATS_MIB_*, we keep their name still. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add support for flush filtering based on ifindex and vlanNikolay Aleksandrov1-1/+44
Add support for fdb flush filtering based on destination ifindex and vlan id. The ifindex must either match a port's device ifindex or the bridge's. The vlan support is trivial since it's already validated by rtnl_fdb_del, we just need to fill it in. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add support for flush filtering based on ndm flags and stateNikolay Aleksandrov2-3/+60
Add support for fdb flush filtering based on ndm flags and state. NDM state and flags are mapped to bridge-specific flags and matched according to the specified masks. NTF_USE is used to represent added_by_user flag since it sets it on fdb add and we don't have a 1:1 mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are ignored. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: rtnetlink: add ndm flags and state mask attributesNikolay Aleksandrov1-0/+2
Add ndm flags/state masks which will be used for bulk delete filtering. All of these are used by the bridge and vxlan drivers. Also minimal attr policy validation is added, it is up to ndo_fdb_del_bulk implementers to further validate them. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add support for fine-grained flushingNikolay Aleksandrov4-12/+54
Add the ability to specify exactly which fdbs to be flushed. They are described by a new structure - net_bridge_fdb_flush_desc. Currently it can match on port/bridge ifindex, vlan id and fdb flags. It is used to describe the existing dynamic fdb flush operation. Note that this flush operation doesn't treat permanent entries in a special way (fdb_delete vs fdb_delete_local), it will delete them regardless if any port is using them, so currently it can't directly replace deletes which need to handle that case, although we can extend it later for that too. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add ndo_fdb_del_bulkNikolay Aleksandrov3-0/+27
Add a minimal ndo_fdb_del_bulk implementation which flushes all entries. Support for more fine-grained filtering will be added in the following patches. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: rtnetlink: add NLM_F_BULK support to rtnl_fdb_delNikolay Aleksandrov1-19/+48
When NLM_F_BULK is specified in a fdb del message we need to handle it differently. First since this is a new call we can strictly validate the passed attributes, at first only ifindex and vlan are allowed as these will be the initially supported filter attributes, any other attribute is rejected. The mac address is no longer mandatory, but we use it to error out in older kernels because it cannot be specified with bulk request (the attribute is not allowed) and then we have to dispatch the call to ndo_fdb_del_bulk if the device supports it. The del bulk callback can do further validation of the attributes if necessary. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: rtnetlink: add bulk delete support flagNikolay Aleksandrov1-0/+8
Add a new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) which is used to verify that the delete operation allows bulk object deletion. Also emit a warning if anyone tries to set it for non-delete kind. Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: rtnetlink: add helper to extract msg type's kindNikolay Aleksandrov1-1/+1
Add a helper which extracts the msg type's kind using the kind mask (0x3). Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: rtnetlink: add msg kind namesNikolay Aleksandrov1-3/+3
Add rtnl kind names instead of using raw values. We'll need to check for DEL kind later to validate bulk flag support. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13NFC: NULL out the dev->rfkill to prevent UAFLin Ma1-0/+1
Commit 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device") assumes the device_is_registered() in function nfc_dev_up() will help to check when the rfkill is unregistered. However, this check only take effect when device_del(&dev->dev) is done in nfc_unregister_device(). Hence, the rfkill object is still possible be dereferenced. The crash trace in latest kernel (5.18-rc2): [ 68.760105] ================================================================== [ 68.760330] BUG: KASAN: use-after-free in __lock_acquire+0x3ec1/0x6750 [ 68.760756] Read of size 8 at addr ffff888009c93018 by task fuzz/313 [ 68.760756] [ 68.760756] CPU: 0 PID: 313 Comm: fuzz Not tainted 5.18.0-rc2 #4 [ 68.760756] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 68.760756] Call Trace: [ 68.760756] <TASK> [ 68.760756] dump_stack_lvl+0x57/0x7d [ 68.760756] print_report.cold+0x5e/0x5db [ 68.760756] ? __lock_acquire+0x3ec1/0x6750 [ 68.760756] kasan_report+0xbe/0x1c0 [ 68.760756] ? __lock_acquire+0x3ec1/0x6750 [ 68.760756] __lock_acquire+0x3ec1/0x6750 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] ? register_lock_class+0x18d0/0x18d0 [ 68.760756] lock_acquire+0x1ac/0x4f0 [ 68.760756] ? rfkill_blocked+0xe/0x60 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] ? mutex_lock_io_nested+0x12c0/0x12c0 [ 68.760756] ? nla_get_range_signed+0x540/0x540 [ 68.760756] ? _raw_spin_lock_irqsave+0x4e/0x50 [ 68.760756] _raw_spin_lock_irqsave+0x39/0x50 [ 68.760756] ? rfkill_blocked+0xe/0x60 [ 68.760756] rfkill_blocked+0xe/0x60 [ 68.760756] nfc_dev_up+0x84/0x260 [ 68.760756] nfc_genl_dev_up+0x90/0xe0 [ 68.760756] genl_family_rcv_msg_doit+0x1f4/0x2f0 [ 68.760756] ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230 [ 68.760756] ? security_capable+0x51/0x90 [ 68.760756] genl_rcv_msg+0x280/0x500 [ 68.760756] ? genl_get_cmd+0x3c0/0x3c0 [ 68.760756] ? lock_acquire+0x1ac/0x4f0 [ 68.760756] ? nfc_genl_dev_down+0xe0/0xe0 [ 68.760756] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 68.760756] netlink_rcv_skb+0x11b/0x340 [ 68.760756] ? genl_get_cmd+0x3c0/0x3c0 [ 68.760756] ? netlink_ack+0x9c0/0x9c0 [ 68.760756] ? netlink_deliver_tap+0x136/0xb00 [ 68.760756] genl_rcv+0x1f/0x30 [ 68.760756] netlink_unicast+0x430/0x710 [ 68.760756] ? memset+0x20/0x40 [ 68.760756] ? netlink_attachskb+0x740/0x740 [ 68.760756] ? __build_skb_around+0x1f4/0x2a0 [ 68.760756] netlink_sendmsg+0x75d/0xc00 [ 68.760756] ? netlink_unicast+0x710/0x710 [ 68.760756] ? netlink_unicast+0x710/0x710 [ 68.760756] sock_sendmsg+0xdf/0x110 [ 68.760756] __sys_sendto+0x19e/0x270 [ 68.760756] ? __ia32_sys_getpeername+0xa0/0xa0 [ 68.760756] ? fd_install+0x178/0x4c0 [ 68.760756] ? fd_install+0x195/0x4c0 [ 68.760756] ? kernel_fpu_begin_mask+0x1c0/0x1c0 [ 68.760756] __x64_sys_sendto+0xd8/0x1b0 [ 68.760756] ? lockdep_hardirqs_on+0xbf/0x130 [ 68.760756] ? syscall_enter_from_user_mode+0x1d/0x50 [ 68.760756] do_syscall_64+0x3b/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 68.760756] RIP: 0033:0x7f67fb50e6b3 ... [ 68.760756] RSP: 002b:00007f67fa91fe90 EFLAGS: 00000293 ORIG_RAX: 000000000000002c [ 68.760756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f67fb50e6b3 [ 68.760756] RDX: 000000000000001c RSI: 0000559354603090 RDI: 0000000000000003 [ 68.760756] RBP: 00007f67fa91ff00 R08: 00007f67fa91fedc R09: 000000000000000c [ 68.760756] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe824d496e [ 68.760756] R13: 00007ffe824d496f R14: 00007f67fa120000 R15: 0000000000000003 [ 68.760756] </TASK> [ 68.760756] [ 68.760756] Allocated by task 279: [ 68.760756] kasan_save_stack+0x1e/0x40 [ 68.760756] __kasan_kmalloc+0x81/0xa0 [ 68.760756] rfkill_alloc+0x7f/0x280 [ 68.760756] nfc_register_device+0xa3/0x1a0 [ 68.760756] nci_register_device+0x77a/0xad0 [ 68.760756] nfcmrvl_nci_register_dev+0x20b/0x2c0 [ 68.760756] nfcmrvl_nci_uart_open+0xf2/0x1dd [ 68.760756] nci_uart_tty_ioctl+0x2c3/0x4a0 [ 68.760756] tty_ioctl+0x764/0x1310 [ 68.760756] __x64_sys_ioctl+0x122/0x190 [ 68.760756] do_syscall_64+0x3b/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 68.760756] [ 68.760756] Freed by task 314: [ 68.760756] kasan_save_stack+0x1e/0x40 [ 68.760756] kasan_set_track+0x21/0x30 [ 68.760756] kasan_set_free_info+0x20/0x30 [ 68.760756] __kasan_slab_free+0x108/0x170 [ 68.760756] kfree+0xb0/0x330 [ 68.760756] device_release+0x96/0x200 [ 68.760756] kobject_put+0xf9/0x1d0 [ 68.760756] nfc_unregister_device+0x77/0x190 [ 68.760756] nfcmrvl_nci_unregister_dev+0x88/0xd0 [ 68.760756] nci_uart_tty_close+0xdf/0x180 [ 68.760756] tty_ldisc_kill+0x73/0x110 [ 68.760756] tty_ldisc_hangup+0x281/0x5b0 [ 68.760756] __tty_hangup.part.0+0x431/0x890 [ 68.760756] tty_release+0x3a8/0xc80 [ 68.760756] __fput+0x1f0/0x8c0 [ 68.760756] task_work_run+0xc9/0x170 [ 68.760756] exit_to_user_mode_prepare+0x194/0x1a0 [ 68.760756] syscall_exit_to_user_mode+0x19/0x50 [ 68.760756] do_syscall_64+0x48/0x90 [ 68.760756] entry_SYSCALL_64_after_hwframe+0x44/0xae This patch just add the null out of dev->rfkill to make sure such dereference cannot happen. This is safe since the device_lock() already protect the check/write from data race. Fixes: 3e3b5dfcd16a ("NFC: reorder the logic in nfc_{un,}register_device") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13ipv6: exthdrs: use swap() instead of open coding itGuo Zhengkui1-4/+1
Address the following coccicheck warning: net/ipv6/exthdrs.c:620:44-45: WARNING opportunity for swap() by using swap() for the swapping of variable values and drop the tmp (`addr`) variable that is not needed any more. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: only copy IV from the packet for TLS 1.2Jakub Kicinski1-10/+10
TLS 1.3 and ChaChaPoly don't carry IV in the packet. The code before this change would copy out iv_size worth of whatever followed the TLS header in the packet and then for TLS 1.3 | ChaCha overwrite that with the sequence number. Waste of cycles especially with TLS 1.2 being close to dead and TLS 1.3 being the common case. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: use MAX_IV_SIZE for allocationsJakub Kicinski1-1/+1
IVs are 8 or 16 bytes, no point reading out the exact value for quantities this small. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: use async as an in-out argumentJakub Kicinski1-15/+16
Propagating EINPROGRESS thru multiple layers of functions is error prone. Use darg->async as an in/out argument, like we use darg->zc today. On input it tells the code if async is allowed, on output if it took place. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: return the already-copied data on crypto errorJakub Kicinski1-6/+10
async crypto handler will report the socket error no need to report it again. We can, however, let the data we already copied be reported to user space but we need to make sure the error will be reported next time around. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: treat process_rx_list() errors as transientJakub Kicinski1-12/+8
process_rx_list() only fails if it can't copy data to user space. There is no point recording the error onto sk->sk_err or giving up on the data which was read partially. Treat the return value like a normal socket partial read. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: assume crypto always calls our callbackJakub Kicinski1-3/+0
If crypto didn't always invoke our callback for async we'd not be clearing skb->sk and would crash in the skb core when freeing it. This if must be dead code. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: don't handle TLS 1.3 in the async crypto callbackJakub Kicinski1-10/+5
Async crypto never worked with TLS 1.3 and was explicitly disabled in commit 8497ded2d16c ("net/tls: Disable async decrytion for tls1.3"). There's no need for us to handle TLS 1.3 padding in the async cb. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: move counting TlsDecryptErrors for syncJakub Kicinski1-2/+2
Move counting TlsDecryptErrors to tls_do_decryption() where differences between sync and async crypto are reconciled. No functional changes, this code just always gave me a pause. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: reuse leave_on_list label for psockJakub Kicinski1-8/+4
The code is identical, we can save a few LoC. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13tls: rx: consistently use unlocked accessors for rx_listJakub Kicinski1-5/+5
rx_list is protected by the socket lock, no need to take the built-in spin lock on accesses. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-12fou: Remove XRFM from NET_FOU KconfigCoco Li2-2/+0
XRFM is no longer needed for configuring FOU tunnels (CONFIG_NET_FOU_IP_TUNNELS), remove from Kconfig. Also remove the xrfm.h dependency in fou.c. It was added in '23461551c006 ("fou: Support for foo-over-udp RX path")' for depencies of udp_del_offload and udp_offloads, which were removed in 'd92283e338f6 ("fou: change to use UDP socket GRO")'. Built and installed kernel and setup GUE/FOU tunnels. Signed-off-by: Coco Li <lixiaoyan@google.com> Link: https://lore.kernel.org/r/20220411213717.3688789-1-lixiaoyan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12page_pool: Add recycle stats to page_pool_put_page_bulkLorenzo Bianconi1-2/+13
Add missing recycle stats to page_pool_put_page_bulk routine. Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/3712178b51c007cfaed910ea80e68f00c916b1fa.1649685634.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-12net: remove noblock parameter from recvmsg() entitiesOliver Hartkopp29-99/+71
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to commit f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()") this patch removes the separate 'noblock' parameter for recvmsg(). Analogue to the referenced patch for skb_recv_datagram() the 'flags' and 'noblock' parameters are unnecessarily split up with e.g. err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); or in err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); instead of simply using only flags all the time and check for MSG_DONTWAIT where needed (to preserve for the formerly separated no(n)block condition). Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-12net: bridge: add support for host l2 mdb entriesJoachim Wiberg1-5/+7
This patch expands on the earlier work on layer-2 mdb entries by adding support for host entries. Due to the fact that host joined entries do not have any flag field, we infer the permanent flag when reporting the entries to userspace, which otherwise would be listed as 'temp'. Before patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent Error: bridge: Flags are not allowed for host groups. ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee Error: bridge: Only permanent L2 entries allowed. After patch: ~# bridge mdb add dev br0 port br0 grp 01:00:00:c0:ff:ee permanent ~# bridge mdb show dev br0 port br0 grp 01:00:00:c0:ff:ee permanent vid 1 Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-11net: bridge: offload BR_HAIRPIN_MODE, BR_ISOLATED, BR_MULTICAST_TO_UNICASTArınç ÜNAL1-1/+2
Add BR_HAIRPIN_MODE, BR_ISOLATED and BR_MULTICAST_TO_UNICAST port flags to BR_PORT_FLAGS_HW_OFFLOAD so that switchdev drivers which have an offloaded data plane have a chance to reject these bridge port flags if they don't support them yet. It makes the code path go through the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS driver handlers, which return -EINVAL for everything they don't recognize. For drivers that don't catch SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS at all, switchdev will return -EOPNOTSUPP for those which is then ignored, but those are in the minority. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220410134227.18810-1-arinc.unal@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11ipv4: Use dscp_t in struct fib_entry_notifier_infoGuillaume Nault1-2/+2
Use the new dscp_t type to replace the tos field of struct fib_entry_notifier_info. This ensures ECN bits are ignored and makes it compatible with the dscp field of struct fib_rt_info. This also allows sparse to flag potential incorrect uses of DSCP and ECN bits. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11ipv4: Use dscp_t in struct fib_rt_infoGuillaume Nault3-7/+7
Use the new dscp_t type to replace the tos field of struct fib_rt_info. This ensures ECN bits are ignored and makes it compatible with the fa_dscp field of struct fib_alias. This also allows sparse to flag potential incorrect uses of DSCP and ECN bits. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11mptcp: listen diag dump supportFlorian Westphal1-0/+91
makes 'ss -Ml' show mptcp listen sockets. Iterate over the tcp listen sockets and pick those that have mptcp ulp info attached. mptcp_diag_get_info() is modified to prefer msk->first for mptcp sockets in listen state. This reports accurate number for recv and send queue (pending / max connection backlog counters). Sample output: ss -Mil State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 20 127.0.0.1:12000 0.0.0.0:* subflows_max:2 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11mptcp: remove locking in mptcp_diag_fill_infoFlorian Westphal1-6/+0
Problem is that listener iteration would call this from atomic context so this locking is not allowed. One way is to drop locks before calling the helper, but afaics the lock isn't really needed, all values are fetched via READ_ONCE(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11mptcp: diag: switch to context structureFlorian Westphal1-3/+11
Raw access to cb->arg[] is deprecated, use a context structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11mptcp: add pm_nl_pernet helpersGeliang Tang1-17/+24
This patch adds two pm_nl_pernet related helpers, named pm_nl_get_pernet() and pm_nl_get_pernet_from_msk() to get pm_nl_pernet from 'net' or 'msk'. Use these helpers instead of using net_generic() directly. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11mptcp: reset the packet scheduler on PRIO changePaolo Abeni1-0/+2
Similar to the previous patch, for priority changes requested by the local PM. Reported-and-suggested-by: Davide Caratti <dcaratti@redhat.com> Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11mptcp: reset the packet scheduler on incoming MP_PRIOPaolo Abeni3-4/+18
When an incoming MP_PRIO option changes the backup status of any subflow, we need to reset the packet scheduler status, or the next send could keep using the previously selected subflow, without taking in account the new priorities. Reported-by: Davide Caratti <dcaratti@redhat.com> Fixes: 40453a5c61f4 ("mptcp: add the incoming MP_PRIO support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11mptcp: optimize release_cb for the common casePaolo Abeni1-7/+9
The mptcp release callback checks several flags in atomic context, but only MPTCP_CLEAN_UNA can be up frequently. Reorganize the code to avoid multiple conditionals in the most common scenarios. Additional clarify a related comment. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller9-125/+143
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Replace unnecessary list_for_each_entry_continue() in nf_tables, from Jakob Koschel. 2) Add struct nf_conntrack_net_ecache to conntrack event cache and use it, from Florian Westphal. 3) Refactor ctnetlink_dump_list(), also from Florian. 4) Bump module reference counter on cttimeout object addition/removal, from Florian. 5) Consolidate nf_log MAC printer, from Phil Sutter. 6) Add basic logging support for unknown ethertype, from Phil Sutter. 7) Consolidate check for sysctl nf_log_all_netns toggle, also from Phil. 8) Replace hardcode value in nft_bitwise, from Jeremy Sowden. 9) Rename BASIC-like goto tags in nft_bitwise to more meaningful names, also from Jeremy. 10) nft_fib support for reverse path filtering with policy-based routing on iif. Extend selftests to cover for this new usecase, from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11netfilter: nft_fib: reverse path filter for policy-based routing on iifPablo Neira Ayuso3-0/+12
If policy-based routing using the iif selector is used, then the fib expression fails to look up for the reverse path from the prerouting hook because the input interface cannot be inferred. In order to support this scenario, extend the fib expression to allow to use after the route lookup, from the forward hook. This patch also adds support for the input hook for usability reasons. Since the prerouting hook cannot be used for the scenario described above, users need two rules: one for the forward chain and another rule for the input chain to check for the reverse path check for locally targeted traffic. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-11net: icmp: add skb drop reasons to icmp protocolMenglong Dong3-46/+67
Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with kfree_skb_reason(). In order to get the reasons of the skb drops after icmp message handle, we change the return type of 'handler()' in 'struct icmp_control' from 'bool' to 'enum skb_drop_reason'. This may change its original intention, as 'false' means failure, but 'SKB_NOT_DROPPED_YET' means success now. Therefore, all 'handler' and the call of them need to be handled. Following 'handler' functions are involved: icmp_unreach() icmp_redirect() icmp_echo() icmp_timestamp() icmp_discard() And following new drop reasons are added: SKB_DROP_REASON_ICMP_CSUM SKB_DROP_REASON_INVALID_PROTO The reason 'INVALID_PROTO' is introduced for the case that the packet doesn't follow rfc 1122 and is dropped. This is not a common case, and I believe we can locate the problem from the data in the packet. For now, this 'INVALID_PROTO' is used for the icmp broadcasts with wrong types. Maybe there should be a document file for these reasons. For example, list all the case that causes the 'UNHANDLED_PROTO' and 'INVALID_PROTO' drop reason. Therefore, users can locate their problems according to the document. Reviewed-by: Hao Peng <flyingpeng@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11net: icmp: introduce __ping_queue_rcv_skb() to report drop reasonsMenglong Dong1-5/+13
In order to avoid to change the return value of ping_queue_rcv_skb(), introduce the function __ping_queue_rcv_skb(), which is able to report the reasons of skb drop as its return value, as Paolo suggested. Meanwhile, make ping_queue_rcv_skb() a simple call to __ping_queue_rcv_skb(). The kfree_skb() and sock_queue_rcv_skb() used in ping_queue_rcv_skb() are replaced with kfree_skb_reason() and sock_queue_rcv_skb_reason() now. Reviewed-by: Hao Peng <flyingpeng@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11net: skb: rename SKB_DROP_REASON_PTYPE_ABSENTMenglong Dong1-5/+3
As David Ahern suggested, the reasons for skb drops should be more general and not be code based. Therefore, rename SKB_DROP_REASON_PTYPE_ABSENT to SKB_DROP_REASON_UNHANDLED_PROTO, which is used for the cases of no L3 protocol handler, no L4 protocol handler, version extensions, etc. From previous discussion, now we have the aim to make these reasons more abstract and users based, avoiding code based. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11net: sock: introduce sock_queue_rcv_skb_reason()Menglong Dong1-6/+24
In order to report the reasons of skb drops in 'sock_queue_rcv_skb()', introduce the function 'sock_queue_rcv_skb_reason()'. As the return value of 'sock_queue_rcv_skb()' is used as the error code, we can't make it as drop reason and have to pass extra output argument. 'sock_queue_rcv_skb()' is used in many places, so we can't change it directly. Introduce the new function 'sock_queue_rcv_skb_reason()' and make 'sock_queue_rcv_skb()' an inline call to it. Reviewed-by: Hao Peng <flyingpeng@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-10tls: rx: jump out for cases which need to leave skb on listJakub Kicinski1-21/+22
The current invese logic is harder to follow (and adds extra tests to the fast path). We have to enumerate all cases which need to keep the skb before consuming it. It's simpler to jump out of the full record flow as we detect those cases. This makes it clear that partial consumption and peek can only reach end of the function thru the !zc case so move the code up there. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>