summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-09-19gve: Fix GFP flags when allocing pagesShailend Chand1-1/+1
Use GFP_ATOMIC when allocating pages out of the hotpath, continue to use GFP_KERNEL when allocating pages during setup. GFP_KERNEL will allow blocking which allows it to succeed more often in a low memory enviornment but in the hotpath we do not want to allow the allocation to block. Fixes: 9b8dd5e5ea48b ("gve: DQO: Add RX path") Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://lore.kernel.org/r/20220913000901.959546-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19bnxt_en: fix flags to check for supported fw versionVadim Fedorenko1-2/+2
The warning message of unsupported FW appears every time RX timestamps are disabled on the interface. The patch fixes the flags to correct set for the check. Fixes: 66ed81dcedc6 ("bnxt_en: Enable packet timestamping for all RX packets") Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20220915234932.25497-1-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19Merge tag 'wireless-2022-09-19' of ↵Jakub Kicinski4-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.0 Late stage fixes for v6.0. Temporarily mark iwlwifi's mei code broken as it breaks suspend for iwd users and also don't spam nss trimming messages. mt76 has fixes for aggregation sequence numbers and a regression related to the VHT extended NSS BW feature. * tag 'wireless-2022-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2 wifi: mt76: fix reading current per-tid starting sequence number for aggregation wifi: iwlwifi: Mark IWLMEI as broken wifi: iwlwifi: don't spam logs with NSS>2 messages ==================== Link: https://lore.kernel.org/r/20220919105003.1EAE7C433B5@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19Merge tag 'batadv-net-pullrequest-20220916' of ↵Jakub Kicinski1-0/+4
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Fix hang up with small MTU hard-interface, by Shigeru Yoshida * tag 'batadv-net-pullrequest-20220916' of git://git.open-mesh.org/linux-merge: batman-adv: Fix hang up with small MTU hard-interface ==================== Link: https://lore.kernel.org/r/20220916160931.1412407-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19sfc: fix null pointer dereference in efx_hard_start_xmitÍñigo Huguet1-1/+1
Trying to get the channel from the tx_queue variable here is wrong because we can only be here if tx_queue is NULL, so we shouldn't dereference it. As the above comment in the code says, this is very unlikely to happen, but it's wrong anyway so let's fix it. I hit this issue because of a different bug that caused tx_queue to be NULL. If that happens, this is the error message that we get here: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [...] RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc] Fixes: 12804793b17c ("sfc: decouple TXQ type from label") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220914111135.21038-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19sfc: fix TX channel offset when using legacy interruptsÍñigo Huguet1-1/+1
In legacy interrupt mode the tx_channel_offset was hardcoded to 1, but that's not correct if efx_sepparate_tx_channels is false. In that case, the offset is 0 because the tx queues are in the single existing channel at index 0, together with the rx queue. Without this fix, as soon as you try to send any traffic, it tries to get the tx queues from an uninitialized channel getting these errors: WARNING: CPU: 1 PID: 0 at drivers/net/ethernet/sfc/tx.c:540 efx_hard_start_xmit+0x12e/0x170 [sfc] [...] RIP: 0010:efx_hard_start_xmit+0x12e/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [...] RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] Fixes: c308dfd1b43e ("sfc: fix wrong tx channel offset with efx_separate_tx_channels") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220914103648.16902-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19Merge tag 'for-net-2022-09-09' of ↵Jakub Kicinski1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix HCIGETDEVINFO regression * tag 'for-net-2022-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix HCIGETDEVINFO regression ==================== Link: https://lore.kernel.org/r/20220909201642.3810565-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19net: ethernet: mtk_eth_soc: enable XDP support just for MT7986 SoCLorenzo Bianconi1-1/+1
Disable page_pool/XDP support for MT7621 SoC in order fix a regression introduce adding XDP for MT7986 SoC. There is no a real use case for XDP on MT7621 since it is a low-end cpu. Moreover this patch reduces the memory footprint. Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Fixes: 23233e577ef9 ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/2bf31e27b888c43228b0d84dd2ef5033338269e2.1663074002.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19net: mana: Add rmb after checking owner bitsHaiyang Zhang1-0/+10
Per GDMA spec, rmb is necessary after checking owner_bits, before reading EQ or CQ entries. Add rmb in these two places to comply with the specs. Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reported-by: Sinan Kaya <Sinan.Kaya@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1662928805-15861-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19MAINTAINERS: gve: update developersJeroen de Borst1-2/+2
Updating active developers. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://lore.kernel.org/r/20220913185319.1061909-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19netdevsim: Fix hwstats debugfs file permissionsIdo Schimmel1-3/+3
The hwstats debugfs files are only writeable, but they are created with read and write permissions, causing certain selftests to fail [1]. Fix by creating the files with write permission only. [1] # ./test_offload.py Test destruction of generic XDP... Traceback (most recent call last): File "/home/idosch/code/linux/tools/testing/selftests/bpf/./test_offload.py", line 810, in <module> simdev = NetdevSimDev() [...] Exception: Command failed: cat /sys/kernel/debug/netdevsim/netdevsim0//ports/0/dev/hwstats/l3/disable_ifindex cat: /sys/kernel/debug/netdevsim/netdevsim0//ports/0/dev/hwstats/l3/disable_ifindex: Invalid argument Fixes: 1a6d7ae7d63c ("netdevsim: Introduce support for L3 offload xstats") Reported-by: Jie2x Zhou <jie2x.zhou@intel.com> Tested-by: Jie2x Zhou <jie2x.zhou@intel.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20220909153830.3732504-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19i40e: Fix set max_tx_rate when it is lower than 1 MbpsMichal Jaron1-6/+26
While converting max_tx_rate from bytes to Mbps, this value was set to 0, if the original value was lower than 125000 bytes (1 Mbps). This would cause no transmission rate limiting to occur. This happened due to lack of check of max_tx_rate against the 1 Mbps value for max_tx_rate and the following division by 125000. Fix this issue by adding a helper i40e_bw_bytes_to_mbits() which sets max_tx_rate to minimum usable value of 50 Mbps, if its value is less than 1 Mbps, otherwise do the required conversion by dividing by 125000. Fixes: 5ecae4120a6b ("i40e: Refactor VF BW rate limiting") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-19i40e: Fix VF set max MTU sizeMichal Jaron1-0/+20
Max MTU sent to VF is set to 0 during memory allocation. It cause that max MTU on VF is changed to IAVF_MAX_RXBUFFER and does not depend on data from HW. Set max_mtu field in virtchnl_vf_resource struct to inform VF in GET_VF_RESOURCES msg what size should be max frame. Fixes: dab86afdbbd1 ("i40e/i40evf: Change the way we limit the maximum frame size for Rx") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-19iavf: Fix set max MTU size with port VLAN and jumbo framesMichal Jaron1-2/+5
After setting port VLAN and MTU to 9000 on VF with ice driver there was an iavf error "PF returned error -5 (IAVF_ERR_PARAM) to our request 6". During queue configuration, VF's max packet size was set to IAVF_MAX_RXBUFFER but on ice max frame size was smaller by VLAN_HLEN due to making some space for port VLAN as VF is not aware whether it's in a port VLAN. This mismatch in sizes caused ice to reject queue configuration with ERR_PARAM error. Proper max_mtu is sent from ice PF to VF with GET_VF_RESOURCES msg but VF does not look at this. In iavf change max_frame from IAVF_MAX_RXBUFFER to max_mtu received from pf with GET_VF_RESOURCES msg to make vf's max_frame_size dependent from pf. Add check if received max_mtu is not in eligible range then set it to IAVF_MAX_RXBUFFER. Fixes: dab86afdbbd1 ("i40e/i40evf: Change the way we limit the maximum frame size for Rx") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-19mlxbf_gige: clear MDIO gateway lock after readDavid Thompson1-0/+6
The MDIO gateway (GW) lock in BlueField-2 GIGE logic is set after read. This patch adds logic to make sure the lock is always cleared at the end of each MDIO transaction. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20220902164247.19862-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19iavf: Fix bad page stateNorbert Zulinski1-2/+2
Fix bad page state, free inappropriate page in handling dummy descriptor. iavf_build_skb now has to check not only if rx_buffer is NULL but also if size is zero, same thing in iavf_clean_rx_irq. Without this patch driver would free page that will be used by napi_build_skb. Fixes: a9f49e006030 ("iavf: Fix handling of dummy receive descriptors") Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-16tcp: Use WARN_ON_ONCE() in tcp_read_skb()Peilin Ye1-1/+1
Prevent tcp_read_skb() from flooding the syslog. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16Merge branch 'net-unsync-addresses-from-ports'David S. Miller14-32/+297
From: Benjamin Poirier <bpoirier@nvidia.com> To: netdev@vger.kernel.org Cc: Jay Vosburgh <j.vosburgh@gmail.com>, Veaceslav Falico <vfalico@gmail.com>, Andy Gospodarek <andy@greyhouse.net>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Jiri Pirko <jiri@resnulli.us>, Shuah Khan <shuah@kernel.org>, Jonathan Toppins <jtoppins@redhat.com>, linux-kselftest@vger.kernel.org Subject: [PATCH net v3 0/4] Unsync addresses from ports when stopping aggregated devices Date: Wed, 7 Sep 2022 16:56:38 +0900 [thread overview] Message-ID: <20220907075642.475236-1-bpoirier@nvidia.com> (raw) This series fixes similar problems in the bonding and team drivers. Because of missing dev_{uc,mc}_unsync() calls, addresses added to underlying devices may be leftover after the aggregated device is deleted. Add the missing calls and a few related tests. v2: * fix selftest installation, see patch 3 v3: * Split lacpdu_multicast changes to their own patch, #1 * In ndo_{add,del}_slave methods, only perform address list changes when the aggregated device is up (patches 2 & 3) * Add selftest function related to the above change (patch 4) ==================== Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net: Add tests for bonding and team address list managementBenjamin Poirier9-1/+237
Test that the bonding and team drivers clean up an underlying device's address lists (dev->uc, dev->mc) when the aggregated device is deleted. Test addition and removal of the LACPDU multicast address on underlying devices by the bonding driver. v2: * add lag_lib.sh to TEST_FILES v3: * extend bond_listen_lacpdu_multicast test to init_state up and down cases * remove some superfluous shell syntax and 'set dev ... up' commands Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net: team: Unsync device addresses on ndo_stopBenjamin Poirier1-6/+18
Netdev drivers are expected to call dev_{uc,mc}_sync() in their ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method. This is mentioned in the kerneldoc for those dev_* functions. The team driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have already been emptied in unregister_netdevice_many() before ndo_uninit is called. This mistake can result in addresses being leftover on former team ports after a team device has been deleted; see test_LAG_cleanup() in the last patch in this series. Add unsync calls at their expected location, team_close(). v3: * When adding or deleting a port, only sync/unsync addresses if the team device is up. In other cases, it is taken care of at the right time by ndo_open/ndo_set_rx_mode/ndo_stop. Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net: bonding: Unsync device addresses on ndo_stopBenjamin Poirier1-12/+35
Netdev drivers are expected to call dev_{uc,mc}_sync() in their ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method. This is mentioned in the kerneldoc for those dev_* functions. The bonding driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have already been emptied in unregister_netdevice_many() before ndo_uninit is called. This mistake can result in addresses being leftover on former bond slaves after a bond has been deleted; see test_LAG_cleanup() in the last patch in this series. Add unsync calls, via bond_hw_addr_flush(), at their expected location, bond_close(). Add dev_mc_add() call to bond_open() to match the above change. v3: * When adding or deleting a slave, only sync/unsync, add/del addresses if the bond is up. In other cases, it is taken care of at the right time by ndo_open/ndo_set_rx_mode/ndo_stop. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net: bonding: Share lacpdu_mcast_addr definitionBenjamin Poirier4-16/+10
There are already a few definitions of arrays containing MULTICAST_LACPDU_ADDR and the next patch will add one more use. These all contain the same constant data so define one common instance for all bonding code. Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16Merge branch '100GbE' of ↵David S. Miller4-25/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-08 (ice, iavf) This series contains updates to ice and iavf drivers. Dave removes extra unplug of auxiliary bus on reset which caused a scheduling while atomic to be reported for ice. Ding Hui defers setting of queues for TCs to ensure valid configuration and restores old config if invalid for ice. Sylwester fixes a check of setting MAC address to occur after result is received from PF for iavf driver. Brett changes check of ring tail to use software cached value as not all devices have access to register tail for iavf driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net: marvell: prestera: add support for for Aldrin2Oleksandr Mazur1-0/+1
Aldrin2 (98DX8525) is a Marvell Prestera PP, with 100G support. Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> V2: - retarget to net tree instead of net-next; - fix missed colon in patch subject ('net marvell' vs 'net: mavell'); Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net/ieee802154: fix uninit value bug in dgram_sendmsgHaimin Zhang2-19/+60
There is uninit value bug in dgram_sendmsg function in net/ieee802154/socket.c when the length of valid data pointed by the msg->msg_name isn't verified. We introducing a helper function ieee802154_sockaddr_check_size to check namelen. First we check there is addr_type in ieee802154_addr_sa. Then, we check namelen according to addr_type. Also fixed in raw_bind, dgram_bind, dgram_connect. Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-13Documentation: mptcp: fix pm_type formattingMatthieu Baerts1-1/+0
When looking at the rendered HTML version, we can see 'pm_type' is not displayed with a bold font: https://docs.kernel.org/5.19/networking/mptcp-sysctl.html The empty line under 'pm_type' is then removed to have the same style as the others. Fixes: 6bb63ccc25d4 ("mptcp: Add a per-namespace sysctl to set the default path manager type") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20220906180404.1255873-2-matthieu.baerts@tessares.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-13mptcp: fix fwd memory accounting on coalescePaolo Abeni1-1/+7
The intel bot reported a memory accounting related splat: [ 240.473094] ------------[ cut here ]------------ [ 240.478507] page_counter underflow: -4294828518 nr_pages=4294967290 [ 240.485500] WARNING: CPU: 2 PID: 14986 at mm/page_counter.c:56 page_counter_cancel+0x96/0xc0 [ 240.570849] CPU: 2 PID: 14986 Comm: mptcp_connect Tainted: G S 5.19.0-rc4-00739-gd24141fe7b48 #1 [ 240.581637] Hardware name: HP HP Z240 SFF Workstation/802E, BIOS N51 Ver. 01.63 10/05/2017 [ 240.590600] RIP: 0010:page_counter_cancel+0x96/0xc0 [ 240.596179] Code: 00 00 00 45 31 c0 48 89 ef 5d 4c 89 c6 41 5c e9 40 fd ff ff 4c 89 e2 48 c7 c7 20 73 39 84 c6 05 d5 b1 52 04 01 e8 e7 95 f3 01 <0f> 0b eb a9 48 89 ef e8 1e 25 fc ff eb c3 66 66 2e 0f 1f 84 00 00 [ 240.615639] RSP: 0018:ffffc9000496f7c8 EFLAGS: 00010082 [ 240.621569] RAX: 0000000000000000 RBX: ffff88819c9c0120 RCX: 0000000000000000 [ 240.629404] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff5200092deeb [ 240.637239] RBP: ffff88819c9c0120 R08: 0000000000000001 R09: ffff888366527a2b [ 240.645069] R10: ffffed106cca4f45 R11: 0000000000000001 R12: 00000000fffffffa [ 240.652903] R13: ffff888366536118 R14: 00000000fffffffa R15: ffff88819c9c0000 [ 240.660738] FS: 00007f3786e72540(0000) GS:ffff888366500000(0000) knlGS:0000000000000000 [ 240.669529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 240.675974] CR2: 00007f966b346000 CR3: 0000000168cea002 CR4: 00000000003706e0 [ 240.683807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 240.691641] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 240.699468] Call Trace: [ 240.702613] <TASK> [ 240.705413] page_counter_uncharge+0x29/0x80 [ 240.710389] drain_stock+0xd0/0x180 [ 240.714585] refill_stock+0x278/0x580 [ 240.718951] __sk_mem_reduce_allocated+0x222/0x5c0 [ 240.729248] __mptcp_update_rmem+0x235/0x2c0 [ 240.734228] __mptcp_move_skbs+0x194/0x6c0 [ 240.749764] mptcp_recvmsg+0xdfa/0x1340 [ 240.763153] inet_recvmsg+0x37f/0x500 [ 240.782109] sock_read_iter+0x24a/0x380 [ 240.805353] new_sync_read+0x420/0x540 [ 240.838552] vfs_read+0x37f/0x4c0 [ 240.842582] ksys_read+0x170/0x200 [ 240.864039] do_syscall_64+0x5c/0x80 [ 240.872770] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 240.878526] RIP: 0033:0x7f3786d9ae8e [ 240.882805] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 18 0a 00 e8 89 e8 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28 [ 240.902259] RSP: 002b:00007fff7be81e08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 240.910533] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007f3786d9ae8e [ 240.918368] RDX: 0000000000002000 RSI: 00007fff7be87ec0 RDI: 0000000000000005 [ 240.926206] RBP: 0000000000000005 R08: 00007f3786e6a230 R09: 00007f3786e6a240 [ 240.934046] R10: fffffffffffff288 R11: 0000000000000246 R12: 0000000000002000 [ 240.941884] R13: 00007fff7be87ec0 R14: 00007fff7be87ec0 R15: 0000000000002000 [ 240.949741] </TASK> [ 240.952632] irq event stamp: 27367 [ 240.956735] hardirqs last enabled at (27366): [<ffffffff81ba50ea>] mem_cgroup_uncharge_skmem+0x6a/0x80 [ 240.966848] hardirqs last disabled at (27367): [<ffffffff81b8fd42>] refill_stock+0x282/0x580 [ 240.976017] softirqs last enabled at (27360): [<ffffffff83a4d8ef>] mptcp_recvmsg+0xaf/0x1340 [ 240.985273] softirqs last disabled at (27364): [<ffffffff83a4d30c>] __mptcp_move_skbs+0x18c/0x6c0 [ 240.994872] ---[ end trace 0000000000000000 ]--- After commit d24141fe7b48 ("mptcp: drop SK_RECLAIM_* macros"), if rmem_fwd_alloc become negative, mptcp_rmem_uncharge() can try to reclaim a negative amount of pages, since the expression: reclaimable >= PAGE_SIZE will evaluate to true for any negative value of the int 'reclaimable': 'PAGE_SIZE' is an unsigned long and the negative integer will be promoted to a (very large) unsigned long value. Still after the mentioned commit, kfree_skb_partial() in mptcp_try_coalesce() will reclaim most of just released fwd memory, so that following charging of the skb delta size will lead to negative fwd memory values. At that point a racing recvmsg() can trigger the splat. Address the issue switching the order of the memory accounting operations. The fwd memory can still transiently reach negative values, but that will happen in an atomic scope and no code path could touch/use such value. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: d24141fe7b48 ("mptcp: drop SK_RECLAIM_* macros") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20220906180404.1255873-1-matthieu.baerts@tessares.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-13net: phy: aquantia: wait for the suspend/resume operations to finishIoana Ciornei1-4/+49
The Aquantia datasheet notes that after issuing a Processor-Intensive MDIO operation, like changing the low-power state of the device, the driver should wait for the operation to finish before issuing a new MDIO command. The new aqr107_wait_processor_intensive_op() function is added which can be used after these kind of MDIO operations. At the moment, we are only adding it at the end of the suspend/resume calls. The issue was identified on a board featuring the AQR113C PHY, on which commands like 'ip link (..) up / down' issued without any delays between them would render the link on the PHY to remain down. The issue was easy to reproduce with a one-liner: $ ip link set dev ethX down; ip link set dev ethX up; \ ip link set dev ethX down; ip link set dev ethX up; Fixes: ac9e81c230eb ("net: phy: aquantia: add suspend / resume callbacks for AQR107 family") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220906130451.1483448-1-ioana.ciornei@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-12wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2Felix Fietkau1-1/+2
Some users have reported being unable to connect to MT76x0 APs running mt76 after a commit enabling the VHT extneded NSS BW feature. Fix this regression by ensuring that this feature only gets enabled on drivers that support it Cc: stable@vger.kernel.org Fixes: d9fcfc1424aa ("mt76: enable the VHT extended NSS BW feature") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220907095228.82072-1-nbd@nbd.name
2022-09-12wifi: mt76: fix reading current per-tid starting sequence number for aggregationFelix Fietkau1-1/+1
The code was accidentally shifting register values down by tid % 32 instead of (tid * field_size) % 32. Cc: stable@vger.kernel.org Fixes: a28bef561a5c ("mt76: mt7615: re-enable offloading of sequence number assignment") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220826182329.18155-1-nbd@nbd.name
2022-09-12wifi: iwlwifi: Mark IWLMEI as brokenToke Høiland-Jørgensen1-0/+1
The iwlmei driver breaks iwlwifi when returning from suspend. The interface ends up in the 'down' state after coming back from suspend. And iwd doesn't touch the interface state, but wpa_supplicant does, so the bug only happens on iwd. The bug report[0] has been open for four months now, and no fix seems to be forthcoming. Since just disabling the iwlmei driver works as a workaround, let's mark the config option as broken until it can be fixed properly. [0] https://bugzilla.kernel.org/show_bug.cgi?id=215937 Fixes: 2da4366f9e2c ("iwlwifi: mei: add the driver to allow cooperation with CSME") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220907134450.1183045-1-toke@toke.dk
2022-09-09Bluetooth: Fix HCIGETDEVINFO regressionLuiz Augusto von Dentz1-2/+0
Recent changes breaks HCIGETDEVINFO since it changes the size of hci_dev_info. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-09-09net: core: fix flow symmetric hashLudovic Cintrat1-3/+2
__flow_hash_consistentify() wrongly swaps ipv4 addresses in few cases. This function is indirectly used by __skb_get_hash_symmetric(), which is used to fanout packets in AF_PACKET. Intrusion detection systems may be impacted by this issue. __flow_hash_consistentify() computes the addresses difference then swaps them if the difference is negative. In few cases src - dst and dst - src are both negative. The following snippet mimics __flow_hash_consistentify(): ``` #include <stdio.h> #include <stdint.h> int main(int argc, char** argv) { int diffs_d, diffd_s; uint32_t dst = 0xb225a8c0; /* 178.37.168.192 --> 192.168.37.178 */ uint32_t src = 0x3225a8c0; /* 50.37.168.192 --> 192.168.37.50 */ uint32_t dst2 = 0x3325a8c0; /* 51.37.168.192 --> 192.168.37.51 */ diffs_d = src - dst; diffd_s = dst - src; printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n", src, dst, diffs_d, diffs_d, diffd_s, diffd_s); diffs_d = src - dst2; diffd_s = dst2 - src; printf("src:%08x dst:%08x, diff(s-d)=%d(0x%x) diff(d-s)=%d(0x%x)\n", src, dst2, diffs_d, diffs_d, diffd_s, diffd_s); return 0; } ``` Results: src:3225a8c0 dst:b225a8c0, \ diff(s-d)=-2147483648(0x80000000) \ diff(d-s)=-2147483648(0x80000000) src:3225a8c0 dst:3325a8c0, \ diff(s-d)=-16777216(0xff000000) \ diff(d-s)=16777216(0x1000000) In the first case the addresses differences are always < 0, therefore __flow_hash_consistentify() always swaps, thus dst->src and src->dst packets have differents hashes. Fixes: c3f8324188fa8 ("net: Add full IPv6 addresses to flow_keys") Signed-off-by: Ludovic Cintrat <ludovic.cintrat@gatewatcher.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09ipvlan: Fix out-of-bound bugs caused by unset skb->mac_headerLu Wei1-2/+4
If an AF_PACKET socket is used to send packets through ipvlan and the default xmit function of the AF_PACKET socket is changed from dev_queue_xmit() to packet_direct_xmit() via setsockopt() with the option name of PACKET_QDISC_BYPASS, the skb->mac_header may not be reset and remains as the initial value of 65535, this may trigger slab-out-of-bounds bugs as following: ================================================================= UG: KASAN: slab-out-of-bounds in ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan] PU: 2 PID: 1768 Comm: raw_send Kdump: loaded Not tainted 6.0.0-rc4+ #6 ardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 all Trace: print_address_description.constprop.0+0x1d/0x160 print_report.cold+0x4f/0x112 kasan_report+0xa3/0x130 ipvlan_xmit_mode_l2+0xdb/0x330 [ipvlan] ipvlan_start_xmit+0x29/0xa0 [ipvlan] __dev_direct_xmit+0x2e2/0x380 packet_direct_xmit+0x22/0x60 packet_snd+0x7c9/0xc40 sock_sendmsg+0x9a/0xa0 __sys_sendto+0x18a/0x230 __x64_sys_sendto+0x74/0x90 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is: 1. packet_snd() only reset skb->mac_header when sock->type is SOCK_RAW and skb->protocol is not specified as in packet_parse_headers() 2. packet_direct_xmit() doesn't reset skb->mac_header as dev_queue_xmit() In this case, skb->mac_header is 65535 when ipvlan_xmit_mode_l2() is called. So when ipvlan_xmit_mode_l2() gets mac header with eth_hdr() which use "skb->head + skb->mac_header", out-of-bound access occurs. This patch replaces eth_hdr() with skb_eth_hdr() in ipvlan_xmit_mode_l2() and reset mac header in multicast to solve this out-of-bound bug. Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller4-21/+86
Florian Westhal says: ==================== netfilter: bugfixes for net The following set contains four netfilter patches for your *net* tree. When there are multiple Contact headers in a SIP message its possible the next headers won't be found because the SIP helper confuses relative and absolute offsets in the message. From Igor Ryzhov. Make the nft_concat_range self-test support socat, this makes the selftest pass on my test VM, from myself. nf_conntrack_irc helper can be tricked into opening a local port forward that the client never requested by embedding a DCC message in a PING request sent to the client. Fix from David Leadbeater. Both have been broken since the kernel 2.6.x days. The 'osf' match might indicate success while it could not find anything, broken since 5.2 . Fix from Pablo Neira. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-08iavf: Fix cached head and tail value for iavf_get_tx_pendingBrett Creeley1-1/+4
The underlying hardware may or may not allow reading of the head or tail registers and it really makes no difference if we use the software cached values. So, always used the software cached values. Fixes: 9c6c12595b73 ("i40e: Detection and recovery of TX queue hung logic moved to service_task from tx_timeout") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Co-developed-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08iavf: Fix change VF's mac addressSylwester Dziedziuch1-6/+3
Previously changing mac address gives false negative because ip link set <interface> address <MAC> return with RTNLINK: Permission denied. In iavf_set_mac was check if PF handled our mac set request, even before filter was added to list. Because this check returns always true and it never waits for PF's response. Move iavf_is_mac_handled to wait_event_interruptible_timeout instead of false. Now it will wait for PF's response and then check if address was added or rejected. Fixes: 35a2443d0910 ("iavf: Add waiting for response from PF in set mac") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Co-developed-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08ice: Fix crash by keep old cfg when update TCs more than queuesDing Hui1-16/+26
There are problems if allocated queues less than Traffic Classes. Commit a632b2a4c920 ("ice: ethtool: Prohibit improper channel config for DCB") already disallow setting less queues than TCs. Another case is if we first set less queues, and later update more TCs config due to LLDP, ice_vsi_cfg_tc() will failed but left dirty num_txq/rxq and tc_cfg in vsi, that will cause invalid pointer access. [ 95.968089] ice 0000:3b:00.1: More TCs defined than queues/rings allocated. [ 95.968092] ice 0000:3b:00.1: Trying to use more Rx queues (8), than were allocated (1)! [ 95.968093] ice 0000:3b:00.1: Failed to config TC for VSI index: 0 [ 95.969621] general protection fault: 0000 [#1] SMP NOPTI [ 95.969705] CPU: 1 PID: 58405 Comm: lldpad Kdump: loaded Tainted: G U W O --------- -t - 4.18.0 #1 [ 95.969867] Hardware name: O.E.M/BC11SPSCB10, BIOS 8.23 12/30/2021 [ 95.969992] RIP: 0010:devm_kmalloc+0xa/0x60 [ 95.970052] Code: 5c ff ff ff 31 c0 5b 5d 41 5c c3 b8 f4 ff ff ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 89 d1 <8b> 97 60 02 00 00 48 8d 7e 18 48 39 f7 72 3f 55 89 ce 53 48 8b 4c [ 95.970344] RSP: 0018:ffffc9003f553888 EFLAGS: 00010206 [ 95.970425] RAX: dead000000000200 RBX: ffffea003c425b00 RCX: 00000000006080c0 [ 95.970536] RDX: 00000000006080c0 RSI: 0000000000000200 RDI: dead000000000200 [ 95.970648] RBP: dead000000000200 R08: 00000000000463c0 R09: ffff888ffa900000 [ 95.970760] R10: 0000000000000000 R11: 0000000000000002 R12: ffff888ff6b40100 [ 95.970870] R13: ffff888ff6a55018 R14: 0000000000000000 R15: ffff888ff6a55460 [ 95.970981] FS: 00007f51b7d24700(0000) GS:ffff88903ee80000(0000) knlGS:0000000000000000 [ 95.971108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.971197] CR2: 00007fac5410d710 CR3: 0000000f2c1de002 CR4: 00000000007606e0 [ 95.971309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 95.971419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 95.971530] PKRU: 55555554 [ 95.971573] Call Trace: [ 95.971622] ice_setup_rx_ring+0x39/0x110 [ice] [ 95.971695] ice_vsi_setup_rx_rings+0x54/0x90 [ice] [ 95.971774] ice_vsi_open+0x25/0x120 [ice] [ 95.971843] ice_open_internal+0xb8/0x1f0 [ice] [ 95.971919] ice_ena_vsi+0x4f/0xd0 [ice] [ 95.971987] ice_dcb_ena_dis_vsi.constprop.5+0x29/0x90 [ice] [ 95.972082] ice_pf_dcb_cfg+0x29a/0x380 [ice] [ 95.972154] ice_dcbnl_setets+0x174/0x1b0 [ice] [ 95.972220] dcbnl_ieee_set+0x89/0x230 [ 95.972279] ? dcbnl_ieee_del+0x150/0x150 [ 95.972341] dcb_doit+0x124/0x1b0 [ 95.972392] rtnetlink_rcv_msg+0x243/0x2f0 [ 95.972457] ? dcb_doit+0x14d/0x1b0 [ 95.972510] ? __kmalloc_node_track_caller+0x1d3/0x280 [ 95.972591] ? rtnl_calcit.isra.31+0x100/0x100 [ 95.972661] netlink_rcv_skb+0xcf/0xf0 [ 95.972720] netlink_unicast+0x16d/0x220 [ 95.972781] netlink_sendmsg+0x2ba/0x3a0 [ 95.975891] sock_sendmsg+0x4c/0x50 [ 95.979032] ___sys_sendmsg+0x2e4/0x300 [ 95.982147] ? kmem_cache_alloc+0x13e/0x190 [ 95.985242] ? __wake_up_common_lock+0x79/0x90 [ 95.988338] ? __check_object_size+0xac/0x1b0 [ 95.991440] ? _copy_to_user+0x22/0x30 [ 95.994539] ? move_addr_to_user+0xbb/0xd0 [ 95.997619] ? __sys_sendmsg+0x53/0x80 [ 96.000664] __sys_sendmsg+0x53/0x80 [ 96.003747] do_syscall_64+0x5b/0x1d0 [ 96.006862] entry_SYSCALL_64_after_hwframe+0x65/0xca Only update num_txq/rxq when passed check, and restore tc_cfg if setup queue map failed. Fixes: a632b2a4c920 ("ice: ethtool: Prohibit improper channel config for DCB") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Reviewed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08ice: Don't double unplug aux on peer initiated resetDave Ertman1-2/+0
In the IDC callback that is accessed when the aux drivers request a reset, the function to unplug the aux devices is called. This function is also called in the ice_prepare_for_reset function. This double call is causing a "scheduling while atomic" BUG. [ 662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003 [ 662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003 [ 662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003 [ 662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424 [ 662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe r ttm [ 662.815546] nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse [ 662.815557] Preemption disabled at: [ 662.815558] [<0000000000000000>] 0x0 [ 662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S OE 5.17.1 #2 [ 662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815568] Call Trace: [ 662.815572] <IRQ> [ 662.815574] dump_stack_lvl+0x33/0x42 [ 662.815581] __schedule_bug.cold.147+0x7d/0x8a [ 662.815588] __schedule+0x798/0x990 [ 662.815595] schedule+0x44/0xc0 [ 662.815597] schedule_preempt_disabled+0x14/0x20 [ 662.815600] __mutex_lock.isra.11+0x46c/0x490 [ 662.815603] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815633] device_del+0x37/0x3d0 [ 662.815639] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815674] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815693] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815712] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815719] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.815741] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.815756] __handle_irq_event_percpu+0x4c/0x180 [ 662.815762] handle_irq_event_percpu+0xf/0x40 [ 662.815764] handle_irq_event+0x34/0x60 [ 662.815766] handle_edge_irq+0x9a/0x1c0 [ 662.815770] __common_interrupt+0x62/0x100 [ 662.815774] common_interrupt+0xb4/0xd0 [ 662.815779] </IRQ> [ 662.815780] <TASK> [ 662.815780] asm_common_interrupt+0x1e/0x40 [ 662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.815801] cpuidle_enter+0x29/0x40 [ 662.815803] do_idle+0x261/0x2b0 [ 662.815807] cpu_startup_entry+0x19/0x20 [ 662.815809] start_secondary+0x114/0x150 [ 662.815813] secondary_startup_64_no_verify+0xd5/0xdb [ 662.815818] </TASK> [ 662.815846] bad: scheduling from the idle thread! [ 662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S W OE 5.17.1 #2 [ 662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815853] Call Trace: [ 662.815855] <IRQ> [ 662.815856] dump_stack_lvl+0x33/0x42 [ 662.815860] dequeue_task_idle+0x20/0x30 [ 662.815863] __schedule+0x1c3/0x990 [ 662.815868] schedule+0x44/0xc0 [ 662.815871] schedule_preempt_disabled+0x14/0x20 [ 662.815873] __mutex_lock.isra.11+0x3a8/0x490 [ 662.815876] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815904] device_del+0x37/0x3d0 [ 662.815909] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815937] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815961] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815979] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815985] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.816011] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.816033] __handle_irq_event_percpu+0x4c/0x180 [ 662.816037] handle_irq_event_percpu+0xf/0x40 [ 662.816039] handle_irq_event+0x34/0x60 [ 662.816042] handle_edge_irq+0x9a/0x1c0 [ 662.816045] __common_interrupt+0x62/0x100 [ 662.816048] common_interrupt+0xb4/0xd0 [ 662.816052] </IRQ> [ 662.816053] <TASK> [ 662.816054] asm_common_interrupt+0x1e/0x40 [ 662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.816075] cpuidle_enter+0x29/0x40 [ 662.816077] do_idle+0x261/0x2b0 [ 662.816080] cpu_startup_entry+0x19/0x20 [ 662.816083] start_secondary+0x114/0x150 [ 662.816087] secondary_startup_64_no_verify+0xd5/0xdb [ 662.816091] </TASK> [ 662.816169] bad: scheduling from the idle thread! The correct place to unplug the aux devices for a reset is in the prepare_for_reset function, as this is a common place for all reset flows. It also has built in protection from being called twice in a single reset instance before the aux devices are replugged. Fixes: f9f5301e7e2d4 ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08Merge tag 'net-6.0-rc5' of ↵Linus Torvalds82-420/+913
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from rxrpc, netfilter, wireless and bluetooth subtrees. Current release - regressions: - skb: export skb drop reaons to user by TRACE_DEFINE_ENUM - bluetooth: fix regression preventing ACL packet transmission Current release - new code bugs: - dsa: microchip: fix kernel oops on ksz8 switches - dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data Previous releases - regressions: - netfilter: clean up hook list when offload flags check fails - wifi: mt76: fix crash in chip reset fail - rxrpc: fix ICMP/ICMP6 error handling - ice: fix DMA mappings leak - i40e: fix kernel crash during module removal Previous releases - always broken: - ipv6: sr: fix out-of-bounds read when setting HMAC data. - tcp: TX zerocopy should not sense pfmemalloc status - sch_sfb: don't assume the skb is still around after enqueueing to child - netfilter: drop dst references before setting - wifi: wilc1000: fix DMA on stack objects - rxrpc: fix an insufficiently large sglist in rxkad_verify_packet_2() - fec: use a spinlock to guard `fep->ptp_clk_on` Misc: - usb: qmi_wwan: add Quectel RM520N" * tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) sch_sfb: Also store skb len before calling child enqueue net: phy: lan87xx: change interrupt src of link_up to comm_ready net/smc: Fix possible access to freed memory in link clear net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet net: usb: qmi_wwan: add Quectel RM520N net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data tcp: fix early ETIMEDOUT after spurious non-SACK RTO stmmac: intel: Simplify intel_eth_pci_remove() net: mvpp2: debugfs: fix memory leak when using debugfs_lookup() ipv6: sr: fix out-of-bounds read when setting HMAC data. bonding: accept unsolicited NA message bonding: add all node mcast address when slave up bonding: use unspecified address if no available link local address wifi: use struct_group to copy addresses wifi: mac80211_hwsim: check length for virtio packets ...
2022-09-08fs: only do a memory barrier for the first set_buffer_uptodate()Linus Torvalds1-0/+11
Commit d4252071b97d ("add barriers to buffer_uptodate and set_buffer_uptodate") added proper memory barriers to the buffer head BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date will be guaranteed to actually see initialized state. However, that commit didn't _just_ add the memory barrier, it also ended up dropping the "was it already set" logic that the BUFFER_FNS() macro had. That's conceptually the right thing for a generic "this is a memory barrier" operation, but in the case of the buffer contents, we really only care about the memory barrier for the _first_ time we set the bit, in that the only memory ordering protection we need is to avoid anybody seeing uninitialized memory contents. Any other access ordering wouldn't be about the BH_Uptodate bit anyway, and would require some other proper lock (typically BH_Lock or the folio lock). A reader that races with somebody invalidating the buffer head isn't an issue wrt the memory ordering, it's a serialization issue. Now, you'd think that the buffer head operations don't matter in this day and age (and I certainly thought so), but apparently some loads still end up being heavy users of buffer heads. In particular, the kernel test robot reported that not having this bit access optimization in place caused a noticeable direct IO performance regression on ext4: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression although you presumably need a fast disk and a lot of cores to actually notice. Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/ Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Fengwei Yin <fengwei.yin@intel.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-08Merge tag 'efi-urgent-for-v6.0-1' of ↵Linus Torvalds3-25/+14
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "A couple of low-priority EFI fixes: - prevent the randstruct plugin from re-ordering EFI protocol definitions - fix a use-after-free in the capsule loader - drop unused variable" * tag 'efi-urgent-for-v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: capsule-loader: Fix use-after-free in efi_capsule_write efi/x86: libstub: remove unused variable efi: libstub: Disable struct randomization
2022-09-08sch_sfb: Also store skb len before calling child enqueueToke Høiland-Jørgensen1-1/+2
Cong Wang noticed that the previous fix for sch_sfb accessing the queued skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue function was also calling qdisc_qstats_backlog_inc() after enqueue, which reads the pkt len from the skb cb field. Fix this by also storing the skb len, and using the stored value to increment the backlog after enqueueing. Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-08net: phy: lan87xx: change interrupt src of link_up to comm_readyArun Ramadoss1-4/+54
Currently phy link up/down interrupt is enabled using the LAN87xx_INTERRUPT_MASK register. In the lan87xx_read_status function, phy link is determined using the T1_MODE_STAT_REG register comm_ready bit. comm_ready bit is set using the loc_rcvr_status & rem_rcvr_status. Whenever the phy link is up, LAN87xx_INTERRUPT_SOURCE link_up bit is set first but comm_ready bit takes some time to set based on local and remote receiver status. As per the current implementation, interrupt is triggered using link_up but the comm_ready bit is still cleared in the read_status function. So, link is always down. Initially tested with the shared interrupt mechanism with switch and internal phy which is working, but after implementing interrupt controller it is not working. It can fixed either by updating the read_status function to read from LAN87XX_INTERRUPT_SOURCE register or enable the interrupt mask for comm_ready bit. But the validation team recommends the use of comm_ready for link detection. This patch fixes by enabling the comm_ready bit for link_up in the LAN87XX_INTERRUPT_MASK_2 register (MISC Bank) and link_down in LAN87xx_INTERRUPT_MASK register. Fixes: 8a1b415d70b7 ("net: phy: added ethtool master-slave configuration support") Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220905152750.5079-1-arun.ramadoss@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-07efi: capsule-loader: Fix use-after-free in efi_capsule_writeHyunwoo Kim1-24/+7
A race condition may occur if the user calls close() on another thread during a write() operation on the device node of the efi capsule. This is a race condition that occurs between the efi_capsule_write() and efi_capsule_flush() functions of efi_capsule_fops, which ultimately results in UAF. So, the page freeing process is modified to be done in efi_capsule_release() instead of efi_capsule_flush(). Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Link: https://lore.kernel.org/all/20220907102920.GA88602@ubuntu/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-07net/smc: Fix possible access to freed memory in link clearYacan Liu4-0/+13
After modifying the QP to the Error state, all RX WR would be completed with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not wait for it is done, but destroy the QP and free the link group directly. So there is a risk that accessing the freed memory in tasklet context. Here is a crash example: BUG: unable to handle page fault for address: ffffffff8f220860 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23 Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018 RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0 Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32 RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086 RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000 RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00 RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010 R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> _raw_spin_lock_irqsave+0x30/0x40 mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib] smc_wr_rx_tasklet_fn+0x56/0xa0 [smc] tasklet_action_common.isra.21+0x66/0x100 __do_softirq+0xd5/0x29c asm_call_irq_on_stack+0x12/0x20 </IRQ> do_softirq_own_stack+0x37/0x40 irq_exit_rcu+0x9d/0xa0 sysvec_call_function_single+0x34/0x80 asm_sysvec_call_function_single+0x12/0x20 Fixes: bd4ad57718cc ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skbLorenzo Bianconi1-0/+3
Even if max hash configured in hw in mtk_ppe_hash_entry is MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in mtk_ppe_check_skb routine Fixes: c4f033d9e03e9 ("net: ethernet: mtk_eth_soc: rework hardware flow table management") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUMMenglong Dong5-24/+87
As Eric reported, the 'reason' field is not presented when trace the kfree_skb event by perf: $ perf record -e skb:kfree_skb -a sleep 10 $ perf script ip_defrag 14605 [021] 221.614303: skb:kfree_skb: skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 reason: The cause seems to be passing kernel address directly to TP_printk(), which is not right. As the enum 'skb_drop_reason' is not exported to user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason string from the 'reason' field, which is a number. Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used to define the trace enum by TRACE_DEFINE_ENUM(). With the help of DEFINE_DROP_REASON(), now we can remove the auto-generate that we introduced in the commit ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string"), and define the string array 'drop_reasons'. Hmmmm...now we come back to the situation that have to maintain drop reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they are both in dropreason.h, which makes it easier. After this commit, now the format of kfree_skb is like this: $ cat /tracing/events/skb/kfree_skb/format name: kfree_skb ID: 1524 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:void * skbaddr; offset:8; size:8; signed:0; field:void * location; offset:16; size:8; signed:0; field:unsigned short protocol; offset:24; size:2; signed:0; field:enum skb_drop_reason reason; offset:28; size:4; signed:0; print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ...... Fixes: ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string") Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/ Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clearLorenzo Bianconi1-1/+1
Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine. Fixes: 33fc42de33278 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find()Pablo Neira Ayuso1-1/+3
nf_osf_find() incorrectly returns true on mismatch, this leads to copying uninitialized memory area in nft_osf which can be used to leak stale kernel stack data to userspace. Fixes: 22c7652cdaa8 ("netfilter: nft_osf: Add version option support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>