summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2022-06-22Merge branch '100GbE' of ↵Jakub Kicinski3-7/+89
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-06-21 This series contains updates to ice driver only. Marcin fixes GTP filters by allowing ignoring of the inner ethertype field. Wojciech adds VSI handle tracking in order to properly distinguish similar filters for removal. Anatolii removes ability to set 1000baseT and 1000baseX fields concurrently which caused link issues. He also disallows setting channels to less than the number of Traffic Classes which would cause NULL pointer dereference. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: ethtool: Prohibit improper channel config for DCB ice: ethtool: advertise 1000M speeds properly ice: Fix switchdev rules book keeping ice: ignore protocol field in GTP offload ==================== Link: https://lore.kernel.org/r/20220621224756.631765-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-22igb: Make DMA faster when CPU is active on the PCIe linkKai-Heng Feng1-7/+5
Intel I210 on some Intel Alder Lake platforms can only achieve ~750Mbps Tx speed via iperf. The RR2DCDELAY shows around 0x2xxx DMA delay, which will be significantly lower when 1) ASPM is disabled or 2) SoC package c-state stays above PC3. When the RR2DCDELAY is around 0x1xxx the Tx speed can reach to ~950Mbps. According to the I210 datasheet "8.26.1 PCIe Misc. Register - PCIEMISC", "DMA Idle Indication" doesn't seem to tie to DMA coalesce anymore, so set it to 1b for "DMA is considered idle when there is no Rx or Tx AND when there are no TLPs indicating that CPU is active detected on the PCIe link (such as the host executes CSR or Configuration register read or write operation)" and performing Tx should also fall under "active CPU on PCIe link" case. In addition to that, commit b6e0c419f040 ("igb: Move DMA Coalescing init code to separate function.") seems to wrongly changed from enabling E1000_PCIEMISC_LX_DECISION to disabling it, also fix that. Fixes: b6e0c419f040 ("igb: Move DMA Coalescing init code to separate function.") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220621221056.604304-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-21ice: ethtool: Prohibit improper channel config for DCBAnatolii Gerasymenko2-5/+47
Do not allow setting less channels, than Traffic Classes there are via ethtool. There must be at least one channel per Traffic Class. If you set less channels, than Traffic Classes there are, then during ice_vsi_rebuild there would be allocated only the requested amount of tx/rx rings in ice_vsi_alloc_arrays. But later in ice_vsi_setup_q_map there would be requested at least one channel per Traffic Class. This results in setting num_rxq > alloc_rxq and num_txq > alloc_txq. Later, there would be a NULL pointer dereference in ice_vsi_map_rings_to_vectors, because we go beyond of rx_rings or tx_rings arrays. Change ice_set_channels() to return error if you try to allocate less channels, than Traffic Classes there are. Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return status code instead of void. Add error handling for ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc(). [53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi. [53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [53763.992915] PGD 14b45f5067 P4D 0 [53763.996444] Oops: 0002 [#1] SMP NOPTI [53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE --------- - - 4.18.0-240.el8.x86_64 #1 [53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020 [53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice] [53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4 c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44 [53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206 [53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018 [53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018 [53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004 [53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000 [53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018 [53764.090976] FS: 000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000 [53764.099362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0 [53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [53764.127747] PKRU: 55555554 [53764.130781] Call Trace: [53764.133564] ice_vsi_rebuild+0x611/0x870 [ice] [53764.138341] ice_vsi_recfg_qs+0x94/0x100 [ice] [53764.143116] ice_set_channels+0x1a8/0x3e0 [ice] [53764.147975] ethtool_set_channels+0x14e/0x240 [53764.152667] dev_ethtool+0xd74/0x2a10 [53764.156665] ? __mod_lruvec_state+0x44/0x110 [53764.161280] ? __mod_lruvec_state+0x44/0x110 [53764.165893] ? page_add_file_rmap+0x15/0x170 [53764.170518] ? inet_ioctl+0xd1/0x220 [53764.174445] ? netdev_run_todo+0x5e/0x290 [53764.178808] dev_ioctl+0xb5/0x550 [53764.182485] sock_do_ioctl+0xa0/0x140 [53764.186512] sock_ioctl+0x1a8/0x300 [53764.190367] ? selinux_file_ioctl+0x161/0x200 [53764.195090] do_vfs_ioctl+0xa4/0x640 [53764.199035] ksys_ioctl+0x60/0x90 [53764.202722] __x64_sys_ioctl+0x16/0x20 [53764.206845] do_syscall_64+0x5b/0x1a0 [53764.210887] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 87324e747fde ("ice: Implement ethtool ops for channels") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-21ice: ethtool: advertise 1000M speeds properlyAnatolii Gerasymenko1-1/+38
In current implementation ice_update_phy_type enables all link modes for selected speed. This approach doesn't work for 1000M speeds, because both copper (1000baseT) and optical (1000baseX) standards cannot be enabled at once. Fix this, by adding the function `ice_set_phy_type_from_speed()` for 1000M speeds. Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-21ice: Fix switchdev rules book keepingWojciech Drewek1-0/+1
Adding two filters with same matching criteria ends up with one rule in hardware with act = ICE_FWD_TO_VSI_LIST. In order to remove them properly we have to keep the information about vsi handle which is used in VSI bitmap (ice_adv_fltr_mgmt_list_entry::vsi_list_info::vsi_map). Fixes: 0d08a441fb1a ("ice: ndo_setup_tc implementation for PF") Reported-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-21ice: ignore protocol field in GTP offloadMarcin Szycik1-1/+3
Commit 34a897758efe ("ice: Add support for inner etype in switchdev") added the ability to match on inner ethertype. A side effect of that change is that it is now impossible to add some filters for protocols which do not contain inner ethtype field. tc requires the protocol field to be specified when providing certain other options, e.g. src_ip. This is a problem in case of GTP - when user wants to specify e.g. src_ip, they also need to specify protocol in tc command (otherwise tc fails with: Illegal "src_ip"). Because GTP is a tunnel, the protocol field is treated as inner protocol. GTP does not contain inner ethtype field and the filter cannot be added. To fix this, ignore the ethertype field in case of GTP filters. Fixes: 9a225f81f540 ("ice: Support GTP-U and GTP-C offload in switchdev") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-17igb: fix a use-after-free issue in igb_clean_tx_ringLorenzo Bianconi1-2/+5
Fix the following use-after-free bug in igb_clean_tx_ring routine when the NIC is running in XDP mode. The issue can be triggered redirecting traffic into the igb NIC and then closing the device while the traffic is flowing. [ 73.322719] CPU: 1 PID: 487 Comm: xdp_redirect Not tainted 5.18.3-apu2 #9 [ 73.330639] Hardware name: PC Engines APU2/APU2, BIOS 4.0.7 02/28/2017 [ 73.337434] RIP: 0010:refcount_warn_saturate+0xa7/0xf0 [ 73.362283] RSP: 0018:ffffc9000081f798 EFLAGS: 00010282 [ 73.367761] RAX: 0000000000000000 RBX: ffffc90000420f80 RCX: 0000000000000000 [ 73.375200] RDX: ffff88811ad22d00 RSI: ffff88811ad171e0 RDI: ffff88811ad171e0 [ 73.382590] RBP: 0000000000000900 R08: ffffffff82298f28 R09: 0000000000000058 [ 73.390008] R10: 0000000000000219 R11: ffffffff82280f40 R12: 0000000000000090 [ 73.397356] R13: ffff888102343a40 R14: ffff88810359e0e4 R15: 0000000000000000 [ 73.404806] FS: 00007ff38d31d740(0000) GS:ffff88811ad00000(0000) knlGS:0000000000000000 [ 73.413129] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 73.419096] CR2: 000055cff35f13f8 CR3: 0000000106391000 CR4: 00000000000406e0 [ 73.426565] Call Trace: [ 73.429087] <TASK> [ 73.431314] igb_clean_tx_ring+0x43/0x140 [igb] [ 73.436002] igb_down+0x1d7/0x220 [igb] [ 73.439974] __igb_close+0x3c/0x120 [igb] [ 73.444118] igb_xdp+0x10c/0x150 [igb] [ 73.447983] ? igb_pci_sriov_configure+0x70/0x70 [igb] [ 73.453362] dev_xdp_install+0xda/0x110 [ 73.457371] dev_xdp_attach+0x1da/0x550 [ 73.461369] do_setlink+0xfd0/0x10f0 [ 73.465166] ? __nla_validate_parse+0x89/0xc70 [ 73.469714] rtnl_setlink+0x11a/0x1e0 [ 73.473547] rtnetlink_rcv_msg+0x145/0x3d0 [ 73.477709] ? rtnl_calcit.isra.0+0x130/0x130 [ 73.482258] netlink_rcv_skb+0x8d/0x110 [ 73.486229] netlink_unicast+0x230/0x340 [ 73.490317] netlink_sendmsg+0x215/0x470 [ 73.494395] __sys_sendto+0x179/0x190 [ 73.498268] ? move_addr_to_user+0x37/0x70 [ 73.502547] ? __sys_getsockname+0x84/0xe0 [ 73.506853] ? netlink_setsockopt+0x1c1/0x4a0 [ 73.511349] ? __sys_setsockopt+0xc8/0x1d0 [ 73.515636] __x64_sys_sendto+0x20/0x30 [ 73.519603] do_syscall_64+0x3b/0x80 [ 73.523399] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 73.528712] RIP: 0033:0x7ff38d41f20c [ 73.551866] RSP: 002b:00007fff3b945a68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 73.559640] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff38d41f20c [ 73.567066] RDX: 0000000000000034 RSI: 00007fff3b945b30 RDI: 0000000000000003 [ 73.574457] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 73.581852] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff3b945ab0 [ 73.589179] R13: 0000000000000000 R14: 0000000000000003 R15: 00007fff3b945b30 [ 73.596545] </TASK> [ 73.598842] ---[ end trace 0000000000000000 ]--- Fixes: 9cbc948b5a20c ("igb: add XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/e5c01d549dc37bff18e46aeabd6fb28a7bcf84be.1655388571.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14ice: Fix memory corruption in VF driverPrzemyslaw Patynowski1-0/+5
Disable VF's RX/TX queues, when it's disabled. VF can have queues enabled, when it requests a reset. If PF driver assumes that VF is disabled, while VF still has queues configured, VF may unmap DMA resources. In such scenario device still can map packets to memory, which ends up silently corrupting it. Previously, VF driver could experience memory corruption, which lead to crash: [ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237 [ 5119.170166] PGD 0 P4D 0 [ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI [ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G W I --------- - - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1 [ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019 [ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf] [ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30 [ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74 [ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282 [ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002 [ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203 [ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009 [ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000 [ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000 [ 5119.170269] FS: 0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000 [ 5119.170274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0 [ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5119.170290] PKRU: 55555554 [ 5119.170292] Call Trace: [ 5119.170298] iavf_clean_rx_ring+0xad/0x110 [iavf] [ 5119.170324] iavf_free_rx_resources+0xe/0x50 [iavf] [ 5119.170342] iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf] [ 5119.170358] iavf_virtchnl_completion+0xd8a/0x15b0 [iavf] [ 5119.170377] ? iavf_clean_arq_element+0x210/0x280 [iavf] [ 5119.170397] iavf_adminq_task+0x126/0x2e0 [iavf] [ 5119.170416] process_one_work+0x18f/0x420 [ 5119.170429] worker_thread+0x30/0x370 [ 5119.170437] ? process_one_work+0x420/0x420 [ 5119.170445] kthread+0x151/0x170 [ 5119.170452] ? set_kthread_struct+0x40/0x40 [ 5119.170460] ret_from_fork+0x35/0x40 [ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas [ 5119.170613] i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf] [ 5119.170627] CR2: 00001b9780003237 Fixes: ec4f5a436bdf ("ice: Check if VF is disabled for Opcode and other operations") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Co-developed-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-14ice: Fix queue config fail handlingPrzemyslaw Patynowski1-27/+26
Disable VF's RX/TX queues, when VIRTCHNL_OP_CONFIG_VSI_QUEUES fail. Not disabling them might lead to scenario, where PF driver leaves VF queues enabled, when VF's VSI failed queue config. In this scenario VF should not have RX/TX queues enabled. If PF failed to set up VF's queues, VF will reset due to TX timeouts in VF driver. Initialize iterator 'i' to -1, so if error happens prior to configuring queues then error path code will not disable queue 0. Loop that configures queues will is using same iterator, so error path code will only disable queues that were configured. Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Suggested-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-14ice: Sync VLAN filtering features for DVMRoman Storozhenko1-18/+31
VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be both enabled or disabled. In case of turning off/on only one of the features, another feature must be turned off/on automatically with issuing an appropriate message to the kernel log. Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev") Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-14ice: Fix PTP TX timestamp offset calculationMichal Michalik2-1/+32
The offset was being incorrectly calculated for E822 - that led to collisions in choosing TX timestamp register location when more than one port was trying to use timestamping mechanism. In E822 one quad is being logically split between ports, so quad 0 is having trackers for ports 0-3, quad 1 ports 4-7 etc. Each port should have separate memory location for tracking timestamps. Due to error for example ports 1 and 2 had been assigned to quad 0 with same offset (0), while port 1 should have offset 0 and 1 offset 16. Fix it by correctly calculating quad offset. Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09iavf: Fix issue with MAC address of VF shown as zeroMichal Wilczynski1-1/+1
After reinitialization of iavf, ice driver gets VIRTCHNL_OP_ADD_ETH_ADDR message with incorrectly set type of MAC address. Hardware address should have is_primary flag set as true. This way ice driver knows what it has to set as a MAC address. Check if the address is primary in iavf_add_filter function and set flag accordingly. To test set all-zero MAC on a VF. This triggers iavf re-initialization and VIRTCHNL_OP_ADD_ETH_ADDR message gets sent to PF. For example: ip link set dev ens785 vf 0 mac 00:00:00:00:00:00 This triggers re-initialization of iavf. New MAC should be assigned. Now check if MAC is non-zero: ip link show dev ens785 Fixes: a3e839d539e0 ("iavf: Add usage of new virtchnl format to set default MAC") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09i40e: Fix call trace in setup_tx_descriptorsAleksandr Loktionov1-8/+17
After PF reset and ethtool -t there was call trace in dmesg sometimes leading to panic. When there was some time, around 5 seconds, between reset and test there were no errors. Problem was that pf reset calls i40e_vsi_close in prep_for_reset and ethtool -t calls i40e_vsi_close in diag_test. If there was not enough time between those commands the second i40e_vsi_close starts before previous i40e_vsi_close was done which leads to crash. Add check to diag_test if pf is in reset and don't start offline tests if it is true. Add netif_info("testing failed") into unhappy path of i40e_diag_test() Fixes: e17bc411aea8 ("i40e: Disable offline diagnostics if VFs are enabled") Fixes: 510efb2682b3 ("i40e: Fix ethtool offline diagnostic with netqueues") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09i40e: Fix calculating the number of queue pairsGrzegorz Szczurek1-1/+1
If ADQ is enabled for a VF, then actual number of queue pair is a number of currently available traffic classes for this VF. Without this change the configuration of the Rx/Tx queues fails with error. Fixes: d29e0d233e0d ("i40e: missing input validation on VF message handling by the PF") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09i40e: Fix adding ADQ filter to TC0Grzegorz Szczurek1-0/+5
Procedure of configure tc flower filters erroneously allows to create filters on TC0 where unfiltered packets are also directed by default. Issue was caused by insufficient checks of hw_tc parameter specifying the hardware traffic class to pass matching packets to. Fix checking hw_tc parameter which blocks creation of filters on TC0. Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-07ixgbe: fix unexpected VLAN Rx in promisc mode on VFOlivier Matz1-2/+2
When the promiscuous mode is enabled on a VF, the IXGBE_VMOLR_VPE bit (VLAN Promiscuous Enable) is set. This means that the VF will receive packets whose VLAN is not the same than the VLAN of the VF. For instance, in this situation: ┌────────┐ ┌────────┐ ┌────────┐ │ │ │ │ │ │ │ │ │ │ │ │ │ VF0├────┤VF1 VF2├────┤VF3 │ │ │ │ │ │ │ └────────┘ └────────┘ └────────┘ VM1 VM2 VM3 vf 0: vlan 1000 vf 1: vlan 1000 vf 2: vlan 1001 vf 3: vlan 1001 If we tcpdump on VF3, we see all the packets, even those transmitted on vlan 1000. This behavior prevents to bridge VF1 and VF2 in VM2, because it will create a loop: packets transmitted on VF1 will be received by VF2 and vice-versa, and bridged again through the software bridge. This patch remove the activation of VLAN Promiscuous when a VF enables the promiscuous mode. However, the IXGBE_VMOLR_UPE bit (Unicast Promiscuous) is kept, so that a VF receives all packets that has the same VLAN, whatever the destination MAC address. Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-07ixgbe: fix bcast packets Rx on VF after promisc removalOlivier Matz1-2/+2
After a VF requested to remove the promiscuous flag on an interface, the broadcast packets are not received anymore. This breaks some protocols like ARP. In ixgbe_update_vf_xcast_mode(), we should keep the IXGBE_VMOLR_BAM bit (Broadcast Accept) on promiscuous removal. This flag is already set by default in ixgbe_set_vmolr() on VF reset. Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-02ice: fix access-beyond-end in the switch codeAlexander Lobakin4-139/+115
Global `-Warray-bounds` enablement revealed some problems, one of which is the way we define and use AQC rules messages. In fact, they have a shared header, followed by the actual message, which can be of one of several different formats. So it is straightforward enough to define that header as a separate struct and then embed it into message structures as needed, but currently all the formats reside in one union coupled with the header. Then, the code allocates only the memory needed for a particular message format, leaving the union potentially incomplete. There are no actual reads or writes beyond the end of an allocated chunk, but at the same time, the whole implementation is fragile and backed by an equilibrium rather than strong type and memory checks. Define the structures the other way around: one for the common header and the rest for the actual formats with the header embedded. There are no places where several union members would be used at the same time anyway. This allows to use proper struct_size() and let the compiler know what is going to be done. Finally, unsilence `-Warray-bounds` back for ice_switch.c. Other little things worth mentioning: * &ice_sw_rule_vsi_list_query is not used anywhere, remove it. It's weird anyway to talk to hardware with purely kernel types (bitmaps); * expand the ICE_SW_RULE_*_SIZE() macros to pass a structure variable name to struct_size() to let it do strict typechecking; * rename ice_sw_rule_lkup_rx_tx::hdr to ::hdr_data to keep ::hdr for the header structure to have the same name for it constistenly everywhere; * drop the duplicate of %ICE_SW_RULE_RX_TX_NO_HDR_SIZE residing in ice_switch.h. Fixes: 9daf8208dd4d ("ice: Add support for switch filter programming") Fixes: 66486d8943ba ("ice: replace single-element array used for C struct hack") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220601105924.2841410-1-alexandr.lobakin@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-22eth: ice: silence the GCC 12 array-bounds warningJakub Kicinski1-0/+5
GCC 12 gets upset because driver allocates partial struct ice_aqc_sw_rules_elem buffers. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-19/+37
drivers/net/ethernet/mellanox/mlx5/core/main.c b33886971dbc ("net/mlx5: Initialize flow steering during driver probe") 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support") f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh e274f7154008 ("selftests: mptcp: add subflow limits test-cases") b6e074e171bc ("selftests: mptcp: add infinite map testcase") 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/ net/mptcp/options.c ba2c89e0ea74 ("mptcp: fix checksum byte order") 1e39e5a32ad7 ("mptcp: infinite mapping sending") ea66758c1795 ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/ net/mptcp/pm.c 95d686517884 ("mptcp: fix subflow accounting on close") 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/ net/mptcp/subflow.c ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure") 0348c690ed37 ("mptcp: add the fallback check") f8d4bcacff3b ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-18igb: skip phy status check where unavailableKevin Mitchell1-1/+2
igb_read_phy_reg() will silently return, leaving phy_data untouched, if hw->ops.read_reg isn't set. Depending on the uninitialized value of phy_data, this led to the phy status check either succeeding immediately or looping continuously for 2 seconds before emitting a noisy err-level timeout. This message went out to the console even though there was no actual problem. Instead, first check if there is read_reg function pointer. If not, proceed without trying to check the phy status register. Fixes: b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status condition") Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-17ice: Fix interrupt moderation settings getting clearedMichal Wilczynski2-11/+16
Adaptive-rx and Adaptive-tx are interrupt moderation settings that can be enabled/disabled using ethtool: ethtool -C ethX adaptive-rx on/off adaptive-tx on/off Unfortunately those settings are getting cleared after changing number of queues, or in ethtool world 'channels': ethtool -L ethX rx 1 tx 1 Clearing was happening due to introduction of bit fields in ice_ring_container struct. This way only itr_setting bits were rebuilt during ice_vsi_rebuild_set_coalesce(). Introduce an anonymous struct of bitfields and create a union to refer to them as a single variable. This way variable can be easily saved and restored. Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17ice: fix possible under reporting of ethtool Tx and Rx statisticsPaul Greenwalt1-3/+4
The hardware statistics counters are not cleared during resets so the drivers first access is to initialize the baseline and then subsequent reads are for reporting the counters. The statistics counters are read during the watchdog subtask when the interface is up. If the baseline is not initialized before the interface is up, then there can be a brief window in which some traffic can be transmitted/received before the initial baseline reading takes place. Directly initialize ethtool statistics in driver open so the baseline will be initialized when the interface is up, and any dropped packets incremented before the interface is up won't be reported. Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17ice: fix crash when writing timestamp on RX ringsArkadiusz Kubalewski1-4/+15
Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Dave Cain <dcain@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-13ice: Expose RSS indirection tables for queue groups via ethtoolSridhar Samudrala1-18/+51
When ADQ queue groups (TCs) are created via tc mqprio command, RSS contexts and associated RSS indirection tables are configured automatically per TC based on the queue ranges specified for each traffic class. For ex: tc qdisc add dev enp175s0f0 root mqprio num_tc 3 map 0 1 2 \ queues 2@0 8@2 4@10 hw 1 mode channel will create 3 queue groups (TC 0-2) with queue ranges 2, 8 and 4 in 3 queue groups. Each queue group is associated with its own RSS context and RSS indirection table. Add support to expose RSS indirection tables for all ADQ queue groups using ethtool RSS contexts interface. ethtool -x enp175s0f0 context <tc-num> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220512213249.3747424-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-13ixgbe: add xdp frags support to ndo_xdp_xmitLorenzo Bianconi1-36/+63
Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit callback. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220512212621.3746140-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-13Merge branch 'master' of ↵Jakub Kicinski4-10/+9
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-05-13 1) Cleanups for the code behind the XFRM offload API. This is a preparation for the extension of the API for policy offload. From Leon Romanovsky. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: drop not needed flags variable in XFRM offload struct net/mlx5e: Use XFRM state direction instead of flags netdevsim: rely on XFRM state direction instead of flags ixgbe: propagate XFRM offload state direction instead of flags xfrm: store and rely on direction to construct offload flags xfrm: rename xfrm_state_offload struct to allow reuse xfrm: delete not used number of external headers xfrm: free not used XFRM_ESP_NO_TRAILER flag ==================== Link: https://lore.kernel.org/r/20220513151218.4010119-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski6-41/+92
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11i40e: i40e_main: fix a missing check on list iteratorXiaomeng Tong1-13/+14
The bug is here: ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err); The list iterator 'ch' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator, otherwise it will lead to a invalid memory access. To fix this bug, use a new variable 'iter' as the list iterator, while use the origin variable 'ch' as a dedicated pointer to point to the found element. Cc: stable@vger.kernel.org Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10igc: Change type of the 'igc_check_downshift' methodSasha Neftin2-6/+2
The 'igc_check_downshift' method always returns 0; there is no need for a return value so change the type of this method to void. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-10igc: Remove unused phy_type enumSasha Neftin3-18/+3
Complete to commit 8e153faf5827 ("igc: Remove unused phy type") i225 parts have only one PHY. There is no point to use phy_type enum. Clean up the code accordingly, and get rid of the unused enum lines. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-10igc: Remove igc_set_spd_dplx methodSasha Neftin2-51/+0
igc_set_spd_dplx method is not used. This patch comes to tidy up the driver code. Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-09Merge branch '100GbE' of ↵Jakub Kicinski2-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-05-06 Marcin Szycik says: This patchset adds support for systemd defined naming scheme for port representors, as well as re-enables displaying PCI bus-info in ethtool. bus-info information has previously been removed from ethtool for port representors, as a workaround for a bug in lshw tool, where the tool would sometimes display wrong descriptions for port representors/PF. Now the bug has been fixed in lshw tool [1]. Removing the workaround can be considered a regression (user might be running an older, unpatched version of lshw) (see [2] for discussion). However, calling SET_NETDEV_DEV also produces the same effect as removing the workaround, i.e. lshw is able to access PCI bus-info (this time not via ethtool, but in some other way) and the bug can occur. Adding SET_NETDEV_DEV is important, as it greatly improves netdev naming - - port representors are named based on PF name. Currently port representors are named "ethX", which might be confusing, especially when spawning VFs on multiple PFs. Furthermore, it's currently harder to determine to which PF does a particular port representor belong, as bus-info is not shown in ethtool. Consider the following three cases: Case 1: current code - driver workaround in place, no SET_NETDEV_DEV, lshw with or without fix. Port representors are not displayed because they don't have bus-info (the workaround), PFs are labelled correctly: $ sudo ./lshw -c net -businfo Bus info Device Class Description ======================================================== pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP <-- PF pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function <-- VF pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... Case 2: driver workaround in place, SET_NETDEV_DEV, no lshw fix. Port representors have predictable names. lshw is able to get bus-info because of SET_NETDEV_DEV and netdevs CAN be mislabelled: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP <-- mislabeled port representor pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf26 network Ethernet interface pci@0000:02:00.0 ens6f0 network Ethernet interface <-- mislabeled PF pci@0000:02:00.0 ens6f0npf0vf81 network Ethernet interface ... $ sudo ethtool -i ens6f0npf0vf60 driver: ice ... bus-info: ... Output of lshw would be the same with workaround removed; it does not change the fact that lshw labels netdevs incorrectly, while at the same time it prevents ethtool from displaying potentially useful data (bus-info). Case 3: workaround removed, SET_NETDEV_DEV, lshw fix: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf73 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf5 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP ... $ sudo ethtool -i ens6f0npf0vf73 driver: ice ... bus-info: 0000:02:00.0 ... In this case poort representors have predictable names, netdevs are not mislabelled in lshw, and bus-info is shown in ethtool. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 [2] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20220321144731.3935-1-marcin.szycik@linux.intel.com * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode" ice: link representors to PCI device ==================== Link: https://lore.kernel.org/r/20220506180052.5256-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09rtnetlink: add extack support in fdb del handlersAlaa Mohamed1-1/+2
Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08eth: switch to netif_napi_add_weight()Jakub Kicinski1-1/+1
Switch all Ethernet drivers which use custom napi weights to the new API. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06Merge branch '10GbE' of ↵Jakub Kicinski2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2022-05-05 This series contains updates to ixgbe and igb drivers. Jeff Daly adjusts type for 'allow_unsupported_sfp' to match the associated struct value for ixgbe. Alaa Mohamed converts, deprecated, kmap() call to kmap_local_page() for igb. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igb: Convert kmap() to kmap_local_page() ixgbe: Fix module_param allow_unsupported_sfp type ==================== Link: https://lore.kernel.org/r/20220505155651.2606195-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06ice: fix PTP stale Tx timestamps cleanupMichal Michalik1-2/+8
Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: clear stale Tx queue settings before configuringAnatolii Gerasymenko1-18/+50
The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: Fix race during aux device (un)pluggingIvan Vecera3-8/+20
Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode"Marcin Szycik1-5/+3
This reverts commit bfaaba99e680bf82bf2cbf69866c3f37434ff766. Commit bfaaba99e680 ("ice: Hide bus-info in ethtool for PRs in switchdev mode") was a workaround for lshw tool displaying incorrect descriptions for port representors and PF in switchdev mode. Now the issue has been fixed in the lshw tool itself [1]. Removing the workaround can be considered a regression, as the user might be running older, unpatched lshw version. However, another important change (ice: link representors to PCI device, which improves port representor netdev naming with SET_NETDEV_DEV) also causes the same "regression" as removing the workaround, i.e. unpatched lshw is able to access bus-info information (this time not via ethtool) and the bug can occur. Therefore, the workaround no longer prevents the bug and can be removed. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: link representors to PCI deviceMichal Swiatkowski1-0/+1
Link port representors to parent PCI device to benefit from systemd defined naming scheme. Example from ip tool: - without linking: eth0 ... - with linking: eth0 ... altname enp24s0f0npf0vf0 The port representor name is being shown in altname, because the name is longer than IFNAMSIZ (16) limit. Altname can be used in ip tool. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06net: make drivers set the TSO limit not the GSO limitJakub Kicinski1-2/+2
Drivers should call the TSO setting helper, GSO is controllable by user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06ixgbe: propagate XFRM offload state direction instead of flagsLeon Romanovsky4-10/+9
Convert the ixgbe driver to rely on XFRM offload state direction instead of flags bits that were not checked at all. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-05-05ice: remove period on argument description in ice_for_each_vfJacob Keller1-2/+2
The ice_for_each_vf macros have comments describing the implementation. One of the arguments has a period on the end, which is not our typical style. Remove the unnecessary period. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: add a function comment for ice_cfg_mac_antispoofJacob Keller1-0/+7
This function definition was missing a comment describing its implementation. Add one. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: fix wording in comment for ice_reset_vfJacob Keller1-2/+2
The comment explaining ice_reset_vf has an extraneous "the" with the "if the resets are disabled". Remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: remove return value comment for ice_reset_all_vfsJacob Keller1-2/+2
Since commit fe99d1c06c16 ("ice: make ice_reset_all_vfs void"), the ice_reset_all_vfs function has not returned anything. The function comment still indicated it did. Fix this. While here, also add a line to clarify the function resets all VFs at once in response to hardware resets such as a PF reset. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: always check VF VSI pointer valuesJacob Keller6-7/+77
The ice_get_vf_vsi function can return NULL in some cases, such as if handling messages during a reset where the VSI is being removed and recreated. Several places throughout the driver do not bother to check whether this VSI pointer is valid. Static analysis tools maybe report issues because they detect paths where a potentially NULL pointer could be dereferenced. Fix this by checking the return value of ice_get_vf_vsi everywhere. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: add newline to dev_dbg in ice_vf_fdir_dump_infoJacob Keller1-1/+1
The debug print in ice_vf_fdir_dump_info does not end in newlines. This can look confusing when reading the kernel log, as the next print will immediately continue on the same line. Fix this by adding the forgotten newline. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: get switch id on switchdev devicesMichal Swiatkowski2-0/+37
Switch id should be the same for each netdevice on a driver. The id must be unique between devices on the same system, but does not need to be unique between devices on different systems. The switch id is used to locate ports on a switch and to know if aggregated ports belong to the same switch. To meet this requirements, use pci_get_dsn as switch id value, as this is unique value for each devices on the same system. Implementing switch id is needed by automatic tools for kubernetes. Set switch id by setting devlink port attribiutes and calling devlink_port_attrs_set while creating pf (for uplink) and vf (for representator) devlink port. To get switch id (in switchdev mode): cat /sys/class/net/$PF0/phys_switch_id Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>