summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2021-06-03net: caif: add proper error handlingPavel Skripkin4-10/+18
caif_enroll_dev() can fail in some cases. Ingnoring these cases can lead to memory leak due to not assigning link_support pointer to anywhere. Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03net: caif: added cfserl_release functionPavel Skripkin2-0/+6
Added cfserl_release() function. Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03Merge branch '1GbE' of ↵David S. Miller13-68/+112
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== This series contains updates to igb, igc, ixgbe, ixgbevf, i40e and ice drivers. Kurt Kanzenbach fixes XDP for igb when PTP is enabled by pulling the timestamp and adjusting appropriate values prior to XDP operations. Magnus adds missing exception tracing for XDP on igb, igc, ixgbe, ixgbevf, i40e and ice drivers. Maciej adds tracking of AF_XDP zero copy enabled queues to resolve an issue with copy mode Tx for the ice driver. Note: Patch 7 will conflict when merged with net-next. Please carry these changes forward. IGC_XDP_TX and IGC_XDP_REDIRECT will need to be changed to return to conform with the net-next changes. Let me know if you have issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller4-24/+19
Daniel Borkmann says: ==================== pull-request: bpf 2021-06-02 The following pull-request contains BPF updates for your *net* tree. We've added 2 non-merge commits during the last 7 day(s) which contain a total of 4 files changed, 19 insertions(+), 24 deletions(-). The main changes are: 1) Fix pahole BTF generation when ccache is used, from Javier Martinez Canillas. 2) Fix BPF lockdown hooks in bpf_probe_read_kernel{,_str}() helpers which caused a deadlock from bcc programs, triggered OOM killer from audit side and didn't work generally with SELinux policy rules due to pointing to wrong task struct, from Daniel Borkmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03net: kcm: fix memory leak in kcm_sendmsgPavel Skripkin1-0/+5
Syzbot reported memory leak in kcm_sendmsg()[1]. The problem was in non-freed frag_list in case of error. In the while loop: if (head == skb) skb_shinfo(head)->frag_list = tskb; else skb->next = tskb; frag_list filled with skbs, but nothing was freeing them. backtrace: [<0000000094c02615>] __alloc_skb+0x5e/0x250 net/core/skbuff.c:198 [<00000000e5386cbd>] alloc_skb include/linux/skbuff.h:1083 [inline] [<00000000e5386cbd>] kcm_sendmsg+0x3b6/0xa50 net/kcm/kcmsock.c:967 [1] [<00000000f1613a8a>] sock_sendmsg_nosec net/socket.c:652 [inline] [<00000000f1613a8a>] sock_sendmsg+0x4c/0x60 net/socket.c:672 Reported-and-tested-by: syzbot+b039f5699bd82e1fb011@syzkaller.appspotmail.com Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03sit: set name of device back to struct parmszhang kai1-0/+3
addrconf_set_sit_dstaddr will use parms->name. Signed-off-by: zhang kai <zhangkaiheb@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03rtnetlink: Fix missing error code in rtnl_bridge_notify()Jiapeng Chong1-1/+3
The error code is missing in this code scenario, add the error code '-EINVAL' to the return value 'err'. Eliminate the follow smatch warning: net/core/rtnetlink.c:4834 rtnl_bridge_notify() warn: missing error code 'err'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2-3/+7
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Do not allow to add conntrack helper extension for confirmed conntracks in the nf_tables ct expectation support. 2) Fix bogus EBUSY in nfnetlink_cthelper when NFCTH_PRIV_DATA_LEN is passed on userspace helper updates. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03ice: track AF_XDP ZC enabled queues in bitmapMaciej Fijalkowski3-3/+18
Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") silently introduced a regression and broke the Tx side of AF_XDP in copy mode. xsk_pool on ice_ring is set only based on the existence of the XDP prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed. That is not something that should happen for copy mode as it should use the regular data path ice_clean_tx_irq. This results in a following splat when xdpsock is run in txonly or l2fwd scenarios in copy mode: <snip> [ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 106.057269] #PF: supervisor read access in kernel mode [ 106.062493] #PF: error_code(0x0000) - not-present page [ 106.067709] PGD 0 P4D 0 [ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45 [ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50 [ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00 [ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206 [ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800 [ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800 [ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800 [ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff [ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018 [ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000 [ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0 [ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.192898] PKRU: 55555554 [ 106.195653] Call Trace: [ 106.198143] <IRQ> [ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice] [ 106.205087] ice_napi_poll+0x3e/0x590 [ice] [ 106.209356] __napi_poll+0x2a/0x160 [ 106.212911] net_rx_action+0xd6/0x200 [ 106.216634] __do_softirq+0xbf/0x29b [ 106.220274] irq_exit_rcu+0x88/0xc0 [ 106.223819] common_interrupt+0x7b/0xa0 [ 106.227719] </IRQ> [ 106.229857] asm_common_interrupt+0x1e/0x40 </snip> Fix this by introducing the bitmap of queues that are zero-copy enabled, where each bit, corresponding to a queue id that xsk pool is being configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The latter is a function used for deciding which napi poll routine is executed. Idea is being taken from our other drivers such as i40e and ixgbe. Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03igc: add correct exception tracing for XDPMagnus Karlsson1-6/+5
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 73f1071c1d29 ("igc: Add support for XDP_TX action") Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03ixgbevf: add correct exception tracing for XDPMagnus Karlsson1-0/+3
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03igb: add correct exception tracing for XDPMagnus Karlsson1-4/+6
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 9cbc948b5a20 ("igb: add XDP support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03ixgbe: add correct exception tracing for XDPMagnus Karlsson2-14/+16
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action") Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03ice: add correct exception tracing for XDPMagnus Karlsson2-5/+15
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03i40e: add correct exception tracing for XDPMagnus Karlsson2-3/+12
Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action") Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03igb: Fix XDP with PTP enabledKurt Kanzenbach3-33/+37
When using native XDP with the igb driver, the XDP frame data doesn't point to the beginning of the packet. It's off by 16 bytes. Everything works as expected with XDP skb mode. Actually these 16 bytes are used to store the packet timestamps. Therefore, pull the timestamp before executing any XDP operations and adjust all other code accordingly. The igc driver does it like that as well. Tested with Intel i210 card and AF_XDP sockets. Fixes: 9cbc948b5a20 ("igb: add XDP support") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-02net: stmmac: fix issue where clk is being unprepared twiceWong Vee Khee1-1/+0
In the case of MDIO bus registration failure due to no external PHY devices is connected to the MAC, clk_disable_unprepare() is called in stmmac_bus_clk_config() and intel_eth_pci_probe() respectively. The second call in intel_eth_pci_probe() will caused the following:- [ 16.578605] intel-eth-pci 0000:00:1e.5: No PHY found [ 16.583778] intel-eth-pci 0000:00:1e.5: stmmac_dvr_probe: MDIO bus (id: 2) registration failed [ 16.680181] ------------[ cut here ]------------ [ 16.684861] stmmac-0000:00:1e.5 already disabled [ 16.689547] WARNING: CPU: 13 PID: 2053 at drivers/clk/clk.c:952 clk_core_disable+0x96/0x1b0 [ 16.697963] Modules linked in: dwc3 iTCO_wdt mei_hdcp iTCO_vendor_support udc_core x86_pkg_temp_thermal kvm_intel marvell10g kvm sch_fq_codel nfsd irqbypass dwmac_intel(+) stmmac uio ax88179_178a pcs_xpcs phylink uhid spi_pxa2xx_platform usbnet mei_me pcspkr tpm_crb mii i2c_i801 dw_dmac dwc3_pci thermal dw_dmac_core intel_rapl_msr libphy i2c_smbus mei tpm_tis intel_th_gth tpm_tis_core tpm intel_th_acpi intel_pmc_core intel_th i915 fuse configfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore [ 16.746785] CPU: 13 PID: 2053 Comm: systemd-udevd Tainted: G U 5.13.0-rc3-intel-lts #76 [ 16.756134] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020 [ 16.769465] RIP: 0010:clk_core_disable+0x96/0x1b0 [ 16.774222] Code: 00 8b 05 45 96 17 01 85 c0 7f 24 48 8b 5b 30 48 85 db 74 a5 8b 43 7c 85 c0 75 93 48 8b 33 48 c7 c7 6e 32 cc b7 e8 b2 5d 52 00 <0f> 0b 5b 5d c3 65 8b 05 76 31 18 49 89 c0 48 0f a3 05 bc 92 1a 01 [ 16.793016] RSP: 0018:ffffa44580523aa0 EFLAGS: 00010086 [ 16.798287] RAX: 0000000000000000 RBX: ffff8d7d0eb70a00 RCX: 0000000000000000 [ 16.805435] RDX: 0000000000000002 RSI: ffffffffb7c62d5f RDI: 00000000ffffffff [ 16.812610] RBP: 0000000000000287 R08: 0000000000000000 R09: ffffa445805238d0 [ 16.819759] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8d7d0eb70a00 [ 16.826904] R13: ffff8d7d027370c8 R14: 0000000000000006 R15: ffffa44580523ad0 [ 16.834047] FS: 00007f9882fa2600(0000) GS:ffff8d80a0940000(0000) knlGS:0000000000000000 [ 16.842177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.847966] CR2: 00007f9882bea3d8 CR3: 000000010b126001 CR4: 0000000000370ee0 [ 16.855144] Call Trace: [ 16.857614] clk_core_disable_lock+0x1b/0x30 [ 16.861941] intel_eth_pci_probe.cold+0x11d/0x136 [dwmac_intel] [ 16.867913] pci_device_probe+0xcf/0x150 [ 16.871890] really_probe+0xf5/0x3e0 [ 16.875526] driver_probe_device+0x64/0x150 [ 16.879763] device_driver_attach+0x53/0x60 [ 16.883998] __driver_attach+0x9f/0x150 [ 16.887883] ? device_driver_attach+0x60/0x60 [ 16.892288] ? device_driver_attach+0x60/0x60 [ 16.896698] bus_for_each_dev+0x77/0xc0 [ 16.900583] bus_add_driver+0x184/0x1f0 [ 16.904469] driver_register+0x6c/0xc0 [ 16.908268] ? 0xffffffffc07ae000 [ 16.911598] do_one_initcall+0x4a/0x210 [ 16.915489] ? kmem_cache_alloc_trace+0x305/0x4e0 [ 16.920247] do_init_module+0x5c/0x230 [ 16.924057] load_module+0x2894/0x2b70 [ 16.927857] ? __do_sys_finit_module+0xb5/0x120 [ 16.932441] __do_sys_finit_module+0xb5/0x120 [ 16.936845] do_syscall_64+0x42/0x80 [ 16.940476] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.945586] RIP: 0033:0x7f98830e5ccd [ 16.949177] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48 [ 16.967970] RSP: 002b:00007ffc66b60168 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 16.975583] RAX: ffffffffffffffda RBX: 000055885de35ef0 RCX: 00007f98830e5ccd [ 16.982725] RDX: 0000000000000000 RSI: 00007f98832541e3 RDI: 0000000000000012 [ 16.989868] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000 [ 16.997042] R10: 0000000000000012 R11: 0000000000000246 R12: 00007f98832541e3 [ 17.004222] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffc66b60328 [ 17.011369] ---[ end trace df06a3dab26b988c ]--- [ 17.016062] ------------[ cut here ]------------ [ 17.020701] stmmac-0000:00:1e.5 already unprepared Removing the stmmac_bus_clks_config() call in stmmac_dvr_probe and let dwmac-intel to handle the unprepare and disable of the clk device. Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver") Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-02net: ipconfig: Don't override command-line hostnames or domainsJosh Triplett1-5/+8
If the user specifies a hostname or domain name as part of the ip= command-line option, preserve it and don't overwrite it with one supplied by DHCP/BOOTP. For instance, ip=::::myhostname::dhcp will use "myhostname" rather than ignoring and overwriting it. Fix the comment on ic_bootp_string that suggests it only copies a string "if not already set"; it doesn't have any such logic. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-02Merge tag 'mlx5-fixes-2021-06-01' of git://git.kernel.org/pub/scm/linuDavid S. Miller9-20/+96
x/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-06-01 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-02bpf, lockdown, audit: Fix buggy SELinux lockdown permission checksDaniel Borkmann2-22/+17
Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting audit subsystem involved to report events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit: 1) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. 2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit() when trying to wake up kauditd, for example, when using trace_sched_switch() tracepoint, see details in [1]. Triggering this was not via some hypothetical corner case, but with existing tools like runqlat & runqslower from bcc, for example, which make use of this tracepoint. Rough call sequence goes like: rq_lock(rq) -> -------------------------+ trace_sched_switch() -> | bpf_prog_xyz() -> +-> deadlock selinux_lockdown() -> | audit_log_end() -> | wake_up_interruptible() -> | try_to_wake_up() -> | rq_lock(rq) --------------+ What's worse is that the intention of 59438b46471a to further restrict lockdown settings for specific applications in respect to the global lockdown policy is completely broken for BPF. The SELinux policy rule for the current lockdown check looks something like this: allow <who> <who> : lockdown { <reason> }; However, this doesn't match with the 'current' task where the security_locked_down() is executed, example: httpd does a syscall. There is a tracing program attached to the syscall which triggers a BPF program to run, which ends up doing a bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does the permission check against 'current', that is, httpd in this example. httpd has literally zero relation to this tracing program, and it would be nonsensical having to write an SELinux policy rule against httpd to let the tracing helper pass. The policy in this case needs to be against the entity that is installing the BPF program. For example, if bpftrace would generate a histogram of syscall counts by user space application: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' bpftrace would then go and generate a BPF program from this internally. One way of doing it [for the sake of the example] could be to call bpf_get_current_task() helper and then access current->comm via one of bpf_probe_read_kernel{,_str}() helpers. So the program itself has nothing to do with httpd or any other random app doing a syscall here. The BPF program _explicitly initiated_ the lockdown check. The allow/deny policy belongs in the context of bpftrace: meaning, you want to grant bpftrace access to use these helpers, but other tracers on the system like my_random_tracer _not_. Therefore fix all three issues at the same time by taking a completely different approach for the security_locked_down() hook, that is, move the check into the program verification phase where we actually retrieve the BPF func proto. This also reliably gets the task (current) that is trying to install the BPF tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helper's fast-path which can be called several millions of times per second. The check is then also in line with other security_locked_down() hooks in the system where the enforcement is performed at open/load time, for example, open_kcore() for /proc/kcore access or module_sig_check() for module signatures just to pick few random ones. What's out of scope in the fix as well as in other security_locked_down() hook locations /outside/ of BPF subsystem is that if the lockdown policy changes on the fly there is no retrospective action. This requires a different discussion, potentially complex infrastructure, and it's also not clear whether this can be solved generically. Either way, it is out of scope for a suitable stable fix which this one is targeting. Note that the breakage is specifically on 59438b46471a where it started to rely on 'current' as UAPI behavior, and _not_ earlier infrastructure such as 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode"). [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says: I starting seeing this with F-34. When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like: type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0 This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine. [...] auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1 Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [...] [1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/, Serhei Makarov says: Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on ppc64le. Example stack trace: [...] [ 730.868702] stack backtrace: [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1 [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 730.873278] Call Trace: [ 730.873770] dump_stack+0x7f/0xa1 [ 730.874433] check_noncircular+0xdf/0x100 [ 730.875232] __lock_acquire+0x1202/0x1e10 [ 730.876031] ? __lock_acquire+0xfc0/0x1e10 [ 730.876844] lock_acquire+0xc2/0x3a0 [ 730.877551] ? __wake_up_common_lock+0x52/0x90 [ 730.878434] ? lock_acquire+0xc2/0x3a0 [ 730.879186] ? lock_is_held_type+0xa7/0x120 [ 730.880044] ? skb_queue_tail+0x1b/0x50 [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90 [ 730.881656] ? __wake_up_common_lock+0x52/0x90 [ 730.882532] __wake_up_common_lock+0x52/0x90 [ 730.883375] audit_log_end+0x5b/0x100 [ 730.884104] slow_avc_audit+0x69/0x90 [ 730.884836] avc_has_perm+0x8b/0xb0 [ 730.885532] selinux_lockdown+0xa5/0xd0 [ 730.886297] security_locked_down+0x20/0x40 [ 730.887133] bpf_probe_read_compat+0x66/0xd0 [ 730.887983] bpf_prog_250599c5469ac7b5+0x10f/0x820 [ 730.888917] trace_call_bpf+0xe9/0x240 [ 730.889672] perf_trace_run_bpf_submit+0x4d/0xc0 [ 730.890579] perf_trace_sched_switch+0x142/0x180 [ 730.891485] ? __schedule+0x6d8/0xb20 [ 730.892209] __schedule+0x6d8/0xb20 [ 730.892899] schedule+0x5b/0xc0 [ 730.893522] exit_to_user_mode_prepare+0x11d/0x240 [ 730.894457] syscall_exit_to_user_mode+0x27/0x70 [ 730.895361] entry_SYSCALL_64_after_hwframe+0x44/0xae [...] Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: Jakub Hrozek <jhrozek@redhat.com> Reported-by: Serhei Makarov <smakarov@redhat.com> Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jamorris@linux.microsoft.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Frank Eigler <fche@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net
2021-06-02netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatchesPablo Neira Ayuso1-2/+6
The private helper data size cannot be updated. However, updates that contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is the same. Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-02netfilter: nft_ct: skip expectations for confirmed conntrackPablo Neira Ayuso1-1/+1
nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed conntrack entry. However, nf_ct_ext_add() can only be called for !nf_ct_is_confirmed(). [ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00 [ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202 [ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887 [ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440 [ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447 [ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440 [ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20 [ 1825.352240] FS: 00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000 [ 1825.352343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0 [ 1825.352508] Call Trace: [ 1825.352544] nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack] [ 1825.352641] nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct] [ 1825.352716] nft_do_chain+0x232/0x850 [nf_tables] Add the ct helper extension only for unconfirmed conntrack. Skip rule evaluation if the ct helper extension does not exist. Thus, you can only create expectations from the first packet. It should be possible to remove this limitation by adding a new action to attach a generic ct helper to the first packet. Then, use this ct helper extension from follow up packets to create the ct expectation. While at it, add a missing check to skip the template conntrack too and remove check for IPCT_UNTRACK which is implicit to !ct. Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-01net/mlx5: DR, Create multi-destination flow table with level less than 64Yevgeny Kliteynik2-1/+4
Flow table that contains flow pointing to multiple flow tables or multiple TIRs must have a level lower than 64. In our case it applies to muli- destination flow table. Fix the level of the created table to comply with HW Spec definitions, and still make sure that its level lower than SW-owned tables, so that it would be possible to point from the multi-destination FW table to SW tables. Fixes: 34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01net/mlx5e: Fix conflict with HW TS and CQE compressionAya Levin1-0/+7
When a driver's profile doesn't support a dedicated PTP-RQ, configuration of CQE compression while HW TS is configured should fail. Fixes: 885b8cfb161e ("net/mlx5e: Update ethtool setting of CQE compression") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01net/mlx5e: Fix HW TS with CQE compression according to profileAya Levin1-15/+52
When the driver's profile doesn't support a dedicated PTP-RQ, the PTP accuracy of HW TS is affected by the CQE compression. In this case, turn off CQE compression. Otherwise, the driver crashes: BUG: kernel NULL pointer dereference, address:0000000000000018 ... ... RIP: 0010:mlx5e_ptp_rx_set_fs+0x25/0x1a0 [mlx5_core] ... ... Call Trace: mlx5e_ptp_activate_channel+0xb2/0xf0 [mlx5_core] mlx5e_activate_priv_channels+0x3b9/0x8c0 [mlx5_core] ? __mutex_unlock_slowpath+0x45/0x2a0 ? mlx5e_refresh_tirs+0x151/0x1e0 [mlx5_core] mlx5e_switch_priv_channels+0x1cd/0x2d0 [mlx5_core] ? mlx5e_xdp_allowed+0x150/0x150 [mlx5_core] mlx5e_safe_switch_params+0x118/0x3c0 [mlx5_core] ? __mutex_lock+0x6e/0x8e0 ? mlx5e_hwstamp_set+0xa9/0x300 [mlx5_core] mlx5e_hwstamp_set+0x194/0x300 [mlx5_core] ? dev_ioctl+0x9b/0x3d0 mlx5i_ioctl+0x37/0x60 [mlx5_core] mlx5i_pkey_ioctl+0x12/0x20 [mlx5_core] dev_ioctl+0xa9/0x3d0 sock_ioctl+0x268/0x420 __x64_sys_ioctl+0x3d8/0x790 ? lockdep_hardirqs_on_prepare+0xe4/0x190 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 960fbfe222a4 ("net/mlx5e: Allow coexistence of CQE compression and HW TS PTP") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01net/mlx5e: Fix adding encap rules to slow pathRoi Dayan3-2/+8
On some devices the ignore flow level cap is not supported and we shouldn't use it. Setting the dest ft with mlx5_chains_get_tc_end_ft() already gives the correct end ft if ignore flow level cap is supported or not. Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01net/mlx5e: Check for needed capability for cvlan matchingRoi Dayan1-0/+9
If not supported show an error and return instead of trying to offload to the hardware and fail. Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01net/mlx5: Check firmware sync reset requested is set before trying to abort itMoshe Shemesh1-0/+3
In case driver sent NACK to firmware on sync reset request, it will get sync reset abort event while it didn't set sync reset requested mode. Thus, on abort sync reset event handler, driver should check reset requested is set before trying to stop sync reset poll. Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01net/mlx5e: Disable TLS offload for uplink representorRoi Dayan1-0/+10
TLS offload is not supported in switchdev mode. Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01net/mlx5e: Fix incompatible castingAya Levin1-2/+3
Device supports setting of a single fec mode at a time, enforce this by bitmap_weight == 1. Input from fec command is in u32, avoid cast to unsigned long and use bitmap_from_arr32 to populate bitmap safely. Fixes: 4bd9d5070b92 ("net/mlx5e: Enforce setting of a single FEC mode") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01MAINTAINERS: nfc mailing lists are subscribers-onlyJoe Perches1-5/+5
It looks as if the MAINTAINERS entries for the nfc mailing list should be updated as I just got a "rejected" bounce from the nfc list. ------- Your message to the Linux-nfc mailing-list was rejected for the following reasons: The message is not from a list member ------- Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01Merge branch 'ktls-use-after-free'David S. Miller4-12/+66
Maxim Mikityanskiy says: ==================== Fix use-after-free after the TLS device goes down and up This small series fixes a use-after-free bug in the TLS offload code. The first patch is a preparation for the second one, and the second is the fix itself. v2 changes: Remove unneeded EXPORT_SYMBOL_GPL. ==================== Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net/tls: Fix use-after-free after the TLS device goes down and upMaxim Mikityanskiy4-5/+64
When a netdev with active TLS offload goes down, tls_device_down is called to stop the offload and tear down the TLS context. However, the socket stays alive, and it still points to the TLS context, which is now deallocated. If a netdev goes up, while the connection is still active, and the data flow resumes after a number of TCP retransmissions, it will lead to a use-after-free of the TLS context. This commit addresses this bug by keeping the context alive until its normal destruction, and implements the necessary fallbacks, so that the connection can resume in software (non-offloaded) kTLS mode. On the TX side tls_sw_fallback is used to encrypt all packets. The RX side already has all the necessary fallbacks, because receiving non-decrypted packets is supported. The thing needed on the RX side is to block resync requests, which are normally produced after receiving non-decrypted packets. The necessary synchronization is implemented for a graceful teardown: first the fallbacks are deployed, then the driver resources are released (it used to be possible to have a tls_dev_resync after tls_dev_del). A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback mode. It's used to skip the RX resync logic completely, as it becomes useless, and some objects may be released (for example, resync_async, which is allocated and freed by the driver). Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net/tls: Replace TLS_RX_SYNC_RUNNING with RCUMaxim Mikityanskiy2-8/+3
RCU synchronization is guaranteed to finish in finite time, unlike a busy loop that polls a flag. This patch is a preparation for the bugfix in the next patch, where the same synchronize_net() call will also be used to sync with the TX datapath. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01ethernet: myri10ge: Fix missing error code in myri10ge_probe()Jiapeng Chong1-0/+1
The error code is missing in this code scenario, add the error code '-EINVAL' to the return value 'status'. Eliminate the follow smatch warning: drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3818 myri10ge_probe() warn: missing error code 'status'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01Merge branch 'virtio_net-build_skb-fixes'David S. Miller1-12/+8
Xuan Zhuo says: ==================== virtio-net: fix for build_skb() The logic of this piece is really messy. Fortunately, my refactored patch can be completed with a small amount of testing. ==================== Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01virtio_net: get build_skb() buf by data ptrXuan Zhuo1-11/+6
In the case of merge, the page passed into page_to_skb() may be a head page, not the page where the current data is located. So when trying to get the buf where the data is located, we should get buf based on headroom instead of offset. This patch solves this problem. But if you don't use this patch, the original code can also run, because if the page is not the page of the current data, the calculated tailroom will be less than 0, and will not enter the logic of build_skb() . The significance of this patch is to modify this logical problem, allowing more situations to use build_skb(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01virtio-net: fix for unable to handle page fault for addressXuan Zhuo1-1/+2
In merge mode, when xdp is enabled, if the headroom of buf is smaller than virtnet_get_headroom(), xdp_linearize_page() will be called but the variable of "headroom" is still 0, which leads to wrong logic after entering page_to_skb(). [ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[ 16.602175] #PF: supervisor read access in kernel mode [ 16.603350] #PF: error_code(0x0000) - not-present page [ 16.604200] PGD 0 P4D 0 [ 16.604686] Oops: 0000 [#1] SMP PTI [ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312 [ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04 [ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.624047] Call Trace: [ 16.624525] ? release_pages+0x24d/0x730 [ 16.625209] unmap_single_vma+0xa9/0x130 [ 16.625885] unmap_vmas+0x76/0xf0 [ 16.626480] exit_mmap+0xa0/0x210 [ 16.627129] mmput+0x67/0x180 [ 16.627673] do_exit+0x3d1/0xf10 [ 16.628259] ? do_user_addr_fault+0x231/0x840 [ 16.629000] do_group_exit+0x53/0xd0 [ 16.629631] __x64_sys_exit_group+0x1d/0x20 [ 16.630354] do_syscall_64+0x3c/0x80 [ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.631828] RIP: 0033:0x7f1a043d0191 [ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167. [ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191 [ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 [ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001 [ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490 [ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000 [ 16.640408] Modules linked in: [ 16.640958] CR2: ffffecbfff7b43c8 [ 16.641557] ---[ end trace bc4891c6ce46354c ]--- [ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.658290] Kernel panic - not syncing: Fatal exception [ 16.659613] Kernel Offset: disabled [ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net: sock: fix in-kernel mark settingAlexander Aring1-4/+12
This patch fixes the in-kernel mark setting by doing an additional sk_dst_reset() which was introduced by commit 50254256f382 ("sock: Reset dst when changing sk_mark via setsockopt"). The code is now shared to avoid any further suprises when changing the socket mark value. Fixes: 84d1c617402e ("net: sock: add sock_set_mark") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANsVladimir Oltean1-1/+1
When using sub-VLANs in the range of 1-7, the resulting value from: rx_vid = dsa_8021q_rx_vid_subvlan(ds, port, subvlan); is wrong according to the description from tag_8021q.c: | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | +-----------+-----+-----------------+-----------+-----------------------+ | DIR | SVL | SWITCH_ID | SUBVLAN | PORT | +-----------+-----+-----------------+-----------+-----------------------+ For example, when ds->index == 0, port == 3 and subvlan == 1, dsa_8021q_rx_vid_subvlan() returns 1027, same as it returns for subvlan == 0, but it should have returned 1043. This is because the low portion of the subvlan bits are not masked properly when writing into the 12-bit VLAN value. They are masked into bits 4:3, but they should be masked into bits 5:4. Fixes: 3eaae1d05f2b ("net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-31nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connectKrzysztof Kozlowski1-0/+2
It's possible to trigger NULL pointer dereference by local unprivileged user, when calling getsockname() after failed bind() (e.g. the bind fails because LLCP_SAP_MAX used as SAP): BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 Call Trace: llcp_sock_getname+0xb1/0xe0 __sys_getpeername+0x95/0xc0 ? lockdep_hardirqs_on_prepare+0xd5/0x180 ? syscall_enter_from_user_mode+0x1c/0x40 __x64_sys_getpeername+0x11/0x20 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduced with Syzkaller C repro (bind followed by getpeername): https://syzkaller.appspot.com/x/repro.c?x=14def446e00000 Cc: <stable@vger.kernel.org> Fixes: d646960f7986 ("NFC: Initial LLCP support") Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-30net: stmmac: fix kernel panic due to NULL pointer dereference of mdio_bus_dataSriranjani P1-2/+3
Fixed link does not need mdio bus and in that case mdio_bus_data will not be allocated. Before using mdio_bus_data we should check for NULL. This patch fix the kernel panic due to NULL pointer dereference of mdio_bus_data when it is not allocated. Without this patch we do see following kernel crash caused due to kernel NULL pointer dereference. Call trace: stmmac_dvr_probe+0x3c/0x10b0 dwc_eth_dwmac_probe+0x224/0x378 platform_probe+0x68/0xe0 really_probe+0x130/0x3d8 driver_probe_device+0x68/0xd0 device_driver_attach+0x74/0x80 __driver_attach+0x58/0xf8 bus_for_each_dev+0x7c/0xd8 driver_attach+0x24/0x30 bus_add_driver+0x148/0x1f0 driver_register+0x64/0x120 __platform_driver_register+0x28/0x38 dwc_eth_dwmac_driver_init+0x1c/0x28 do_one_initcall+0x78/0x158 kernel_init_freeable+0x1f0/0x244 kernel_init+0x14/0x118 ret_from_fork+0x10/0x30 Code: f9002bfb 9113e2d9 910e6273 aa0003f7 (f9405c78) ---[ end trace 32d9d41562ddc081 ]--- Fixes: e5e5b771f684 ("net: stmmac: make in-band AN mode parsing is supported for non-DT") Signed-off-by: Sriranjani P <sriranjani.p@samsung.com> Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Link: https://lore.kernel.org/r/20210528071056.35252-1-sriranjani.p@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28Merge branch 'mptcp-fixes-for-5-13'Jakub Kicinski3-44/+64
Mat Martineau says: ==================== mptcp: Fixes for 5.13 These patches address two issues in MPTCP. Patch 1 fixes a locking issue affecting MPTCP-level retransmissions. Patches 2-4 improve handling of out-of-order packet arrival early in a connection, so it falls back to TCP rather than forcing a reset. Includes a selftest. ==================== Link: https://lore.kernel.org/r/20210527233140.182728-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: update selftest for fallback due to OoOPaolo Abeni1-4/+9
The previous commit noted that we can have fallback scenario due to OoO (or packet drop). Update the self-tests accordingly Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: do not reset MP_CAPABLE subflow on mapping errorsPaolo Abeni1-30/+32
When some mapping related errors occurs we close the main MPC subflow with a RST. We should instead fallback gracefully to TCP, and do the reset only for MPJ subflows. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/192 Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: always parse mptcp options for MPC reqskPaolo Abeni1-9/+8
In subflow_syn_recv_sock() we currently skip options parsing for OoO packet, given that such packets may not carry the relevant MPC option. If the peer generates an MPC+data TSO packet and some of the early segments are lost or get reorder, we server will ignore the peer key, causing transient, unexpected fallback to TCP. The solution is always parsing the incoming MPTCP options, and do the fallback only for in-order packets. This actually cleans the existing code a bit. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: fix sk_forward_memory corruption on retransmissionPaolo Abeni1-1/+15
MPTCP sk_forward_memory handling is a bit special, as such field is protected by the msk socket spin_lock, instead of the plain socket lock. Currently we have a code path updating such field without handling the relevant lock: __mptcp_retrans() -> __mptcp_clean_una_wakeup() Several helpers in __mptcp_clean_una_wakeup() will update sk_forward_alloc, possibly causing such field corruption, as reported by Matthieu. Address the issue providing and using a new variant of blamed function which explicitly acquires the msk spin lock. Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/172 Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski4-35/+59
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Fix incorrect sockopts unregistration from error path, from Florian Westphal. 2) A few patches to provide better error reporting when missing kernel netfilter options are missing in .config. 3) Fix dormant table flag updates. 4) Memleak in IPVS when adding service with IP_VS_SVC_F_HASHED flag. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service netfilter: nf_tables: fix table flag updates netfilter: nf_tables: extended netlink error reporting for chain type netfilter: nf_tables: missing error reporting for not selected expressions netfilter: conntrack: unregister ipv4 sockopts on error unwind ==================== Link: https://lore.kernel.org/r/20210527190115.98503-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27net/sched: act_ct: Fix ct template allocation for zone 0Ariel Levkovich1-3/+0
Fix current behavior of skipping template allocation in case the ct action is in zone 0. Skipping the allocation may cause the datapath ct code to ignore the entire ct action with all its attributes (commit, nat) in case the ct action in zone 0 was preceded by a ct clear action. The ct clear action sets the ct_state to untracked and resets the skb->_nfct pointer. Under these conditions and without an allocated ct template, the skb->_nfct pointer will remain NULL which will cause the tc ct action handler to exit without handling commit and nat actions, if such exist. For example, the following rule in OVS dp: recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \ in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \ recirc(0x37a) Will result in act_ct skipping the commit and nat actions in zone 0. The change removes the skipping of template allocation for zone 0 and treats it the same as any other zone. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27net/sched: act_ct: Offload connections with commit actionPaul Blakey1-3/+4
Currently established connections are not offloaded if the filter has a "ct commit" action. This behavior will not offload connections of the following scenario: $ tc_filter add dev $DEV ingress protocol ip prio 1 flower \ ct_state -trk \ action ct commit action goto chain 1 $ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \ action mirred egress redirect dev $DEV2 $ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \ action ct commit action goto chain 1 $ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \ ct_state +trk+est \ action mirred egress redirect dev $DEV Offload established connections, regardless of the commit flag. Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows") Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Link: https://lore.kernel.org/r/1622029449-27060-1-git-send-email-paulb@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>