path: root/drivers/net/ethernet/intel/ice
Age, Commit message, Author, Files, Lines
2022-04-05ice: Do not skip not enabled queues in ice_vc_dis_qs_msgAnatolii Gerasymenko1-2/+2
Disable the check for the queue being enabled in ice_vc_dis_qs_msg, because queues may have been created but never enabled. Those queues still need to be deleted.

Normal workflow for a VF looks like:

Enable path:
    VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10)
    VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6)
    VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)

Disable path:
    VIRTCHNL_OP_DISABLE_QUEUES (opcode 9)
    VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)

The issue appears only under stress, when the VF is enabled and disabled very quickly. Eventually queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. These queues are then not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because ice_vc_dis_qs_msg checks whether queues are enabled.

When the VF is brought up again, the "Failed to set LAN Tx queue context" error appears during the VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because the old 16 queues were not deleted and the VF requests 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq fails to find a parent node for the first newly requested queue (all nodes are allocated to the 16 old queues).

Testing Hints:

Enable and disable the VF fast enough that it is disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES.

    while true; do
        ip link set dev ens785f0v0 up
        sleep 0.065 # adjust delay value for your machine
        ip link set dev ens785f0v0 down
    done

Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05ice: Set txq_teid to ICE_INVAL_TEID on ring creationAnatolii Gerasymenko1-0/+1
When a VF is freshly created but not brought up, ring->txq_teid is set to 0 by default. But 0 is a valid TEID: on some platforms the Root Node of the Tx scheduler has TEID = 0. This can cause issues as shown below. The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).

Testing Hints:

    echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
    ip link set dev ens785f0v0 up
    ip link set dev ens785f0v0 down

If a freshly created VF is turned on and off so quickly that it never reaches the VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, the VIRTCHNL_OP_DISABLE_QUEUES stage fails with this error:

    [ 639.531454] disable queue 89 failed 14
    [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
    [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5

The failure occurs because we try to send an AQ command to delete queue 89, which has never been created, and receive an "invalid argument" error from firmware.

As this queue has never been created, its teid and ring->txq_teid have the default value 0. ice_dis_vsi_txq has a check against non-existent queues:

    node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
    if (!node)
        continue;

But on some platforms the Root Node of the Tx scheduler has teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go on to submit an erroneous request to firmware.

Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
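The sentinel problem described above can be reproduced outside the driver. The following is a minimal userspace sketch, not driver code: `struct node`, `find_node_by_teid`, `would_send_disable` and `INVAL_TEID` are hypothetical stand-ins for the scheduler tree, `ice_sched_find_node_by_teid`, the check in `ice_dis_vsi_txq` and `ICE_INVAL_TEID`.

```c
#include <stddef.h>
#include <stdint.h>

#define INVAL_TEID 0xFFFFFFFFu /* stand-in for ICE_INVAL_TEID */

/* Hypothetical, simplified scheduler node. */
struct node {
	uint32_t teid;
	struct node *child;   /* first child */
	struct node *sibling; /* next sibling */
};

/* Depth-first search for a node with the given TEID. */
static struct node *find_node_by_teid(struct node *n, uint32_t teid)
{
	for (; n; n = n->sibling) {
		struct node *hit;

		if (n->teid == teid)
			return n;
		hit = find_node_by_teid(n->child, teid);
		if (hit)
			return hit;
	}
	return NULL;
}

/*
 * Mirrors the "skip non-existent queues" check: returns 1 when a disable
 * request would be submitted for this TEID, 0 when it would be skipped.
 */
static int would_send_disable(struct node *root, uint32_t txq_teid)
{
	return find_node_by_teid(root, txq_teid) != NULL;
}
```

With a root node whose TEID is 0, a never-configured ring whose `txq_teid` defaults to 0 passes the check, while one initialized to `INVAL_TEID` is skipped.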
2022-04-01ice: Fix broken IFF_ALLMULTI handlingIvan Vecera3-38/+121
Handling of the all-multicast flag and the associated multicast promiscuous mode is broken in the ice driver. When a user switches the allmulticast flag on or off, the driver checks whether any VLANs are configured over the interface (except the default VLAN 0).

If any extra VLANs are registered, it enables multicast promiscuous mode for all these VLANs (including default VLAN 0) using the ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all multicast packets that are untagged or tagged with a known VLAN ID are received, and multicast packets tagged with an unknown VLAN ID are ignored.

If no extra VLANs are registered (so only VLAN 0 exists), it enables multicast promiscuous mode for VLAN 0 and uses the ICE_SW_LKUP_PROMISC look-up type. In this situation all multicast packets, including tagged ones, are received.

The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:

    ice_vsi_sync_fltr() {
      ...
      if (changed_flags & IFF_ALLMULTI) {
        if (netdev->flags & IFF_ALLMULTI) {
          if (vsi->num_vlans > 1)
            ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
          else
            ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
        } else {
          if (vsi->num_vlans > 1)
            ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
          else
            ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
        }
      }
      ...
    }

The code above depends on the value vsi->num_vlan, which specifies the number of VLANs configured over the interface (including VLAN 0). This is a problem because that value is modified in the NDO callbacks ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid().

Scenario 1:
1. ip link set ens7f0 allmulticast on
2. ip link add vlan10 link ens7f0 type vlan id 10
3. ip link set ens7f0 allmulticast off
4. ip link set ens7f0 allmulticast on

[1] In this scenario IFF_ALLMULTI is enabled and the driver calls ice_set_promisc(..., ICE_MCAST_PROMISC_BITS), which installs a multicast promisc rule with the non-VLAN look-up type.
[2] Then the VLAN with ID 10 is added and vsi->num_vlan is incremented to 2.
[3] The command switches IFF_ALLMULTI off and the driver calls ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS), but this call is effectively a NOP because it looks for multicast promisc rules for VLAN 0 and VLAN 10 with the VLAN look-up type and no such rules exist. So all-multicast silently remains enabled in hardware.
[4] Command tries to switch IFF_ALLMULTI on and the driver calls ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call fails (-EEXIST) because non-VLAN multicast promisc rule already exists.

Scenario 2:
1. ip link add vlan10 link ens7f0 type vlan id 10
2. ip link set ens7f0 allmulticast on
3. ip link add vlan20 link ens7f0 type vlan id 20
4. ip link del vlan10 ; ip link del vlan20
5. ip link set ens7f0 allmulticast off

[1] The VLAN with ID 10 is added and vsi->num_vlan == 2.
[2] The command switches IFF_ALLMULTI on and the driver installs multicast promisc rules with the VLAN look-up type for VLAN 0 and 10.
[3] The VLAN with ID 20 is added and vsi->num_vlan == 3, but no multicast promisc rule is added for this new VLAN, so the interface does not receive MC packets from VLAN 20.
[4] Both VLANs are removed but the multicast rule for VLAN 10 remains installed, so the interface receives multicast packets from VLAN 10.
[5] The command switches IFF_ALLMULTI off and, because vsi->num_vlan is 1, the driver tries to remove the multicast promisc rule for VLAN 0 with the non-VLAN look-up type, which does not exist. All-multicast looks disabled from the user's point of view but it is partially enabled in HW (the interface receives all multicast packets that are either untagged or tagged with VLAN ID 10).

To resolve these issues the patch introduces these changes:
1. Adds handling for IFF_ALLMULTI to the ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid() callbacks. So when a VLAN is added/removed and IFF_ALLMULTI is enabled, an appropriate multicast promisc rule for that VLAN ID is added/removed.
2. In ice_vlan_rx_add_vid(), when the first VLAN besides VLAN 0 is added (so vsi->num_vlan == 2) and IFF_ALLMULTI is enabled, the look-up type for the existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_VLAN_PROMISC_BITS.
3. In ice_vlan_rx_kill_vid(), when the last VLAN besides VLAN 0 is removed (so vsi->num_vlan == 1) and IFF_ALLMULTI is enabled, the look-up type for the existing multicast promisc rule for VLAN 0 is updated to ICE_MCAST_PROMISC_BITS.
4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY bit protection to avoid races with ice_vsi_sync_fltr(), which runs in ice_service_task() context.
5. Bit ICE_VSI_VLAN_FLTR_CHANGED is useless and can be removed.
6. Error messages are added to the ice_fltr_*_vsi_promisc() helper functions so their callers do not need to print them.
7. Small improvements to increase readability.

Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01ice: Fix MAC address settingIvan Vecera1-2/+5
Commit 2ccc1c1ccc671b ("ice: Remove excess error variables") merged the usage of the 'status' and 'err' variables into a single one in ice_set_mac_address(). Unfortunately this causes a regression when the call to ice_fltr_add_mac() returns -EEXIST: this return value does not indicate an error in this case, but 'err' keeps the value -EEXIST until the end of the function and is returned to the caller. Prior to the mentioned commit this did not happen, because the return value of ice_fltr_add_mac() was stored in the 'status' variable first, and if it was -EEXIST then 'err' remained zero. Fix the problem by resetting 'err' to zero when ice_fltr_add_mac() returns -EEXIST. Fixes: 2ccc1c1ccc671b ("ice: Remove excess error variables") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
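The regression pattern can be illustrated in isolation. This is a hedged sketch rather than the driver's code: `set_mac_buggy` and `set_mac_fixed` are hypothetical names, and the return value of ice_fltr_add_mac() is passed in as a plain integer so the example stays self-contained.

```c
#include <errno.h>

/*
 * Single-variable version with the regression: -EEXIST from the filter
 * add is tolerated, but the value is never cleared and leaks out as the
 * function's return value.
 */
static int set_mac_buggy(int add_mac_ret)
{
	int err = add_mac_ret;

	if (err && err != -EEXIST)
		return err; /* genuine failure */

	/* ... remaining steps succeed and do not touch err ... */
	return err;
}

/* Fixed version: treat -EEXIST as success by resetting err to zero. */
static int set_mac_fixed(int add_mac_ret)
{
	int err = add_mac_ret;

	if (err == -EEXIST)
		err = 0; /* filter already present: not an error */
	else if (err)
		return err; /* genuine failure */

	/* ... remaining steps succeed and do not touch err ... */
	return err;
}
```

Both versions propagate real failures unchanged; they differ only in what the caller sees when the MAC filter already exists.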
2022-04-01ice: Clear default forwarding VSI during VSI releaseIvan Vecera1-0/+2
A VSI is set as the default forwarding one when promisc mode is set for the PF interface, when the PF is switched to switchdev mode, or when the VF driver asks to enable allmulticast or promisc mode for the VF interface (when the vf-true-promisc-support priv flag is off). The third case is buggy because the VSI associated with the VF remains the default one after VF removal.

Reproducer:
1. Create VF
   echo 1 > /sys/class/net/ens7f0/device/sriov_numvfs
2. Enable allmulticast or promisc mode on VF
   ip link set ens7f0v0 allmulticast on
   ip link set ens7f0v0 promisc on
3. Delete VF
   echo 0 > /sys/class/net/ens7f0/device/sriov_numvfs
4. Try to enable promisc mode on PF
   ip link set ens7f0 promisc on

Although promisc mode on the PF looks enabled, the opposite is true: ice_vsi_sync_fltr(), which is responsible for IFF_PROMISC handling, first checks whether any other VSI is set as the default forwarding one, and if so it does nothing. At this point it is not possible to enable promisc mode on the PF without re-probing the device.

To resolve the issue this patch clears the default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one.

Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-28ice: xsk: Fix indexing in ice_tx_xsk_pool()Maciej Fijalkowski1-1/+1
The ice driver always tries to create the XDP rings array with num_possible_cpus() entries, regardless of the user's queue count setting, which can be changed via ethtool -L for example. Currently, ice_tx_xsk_pool() calculates the qid by subtracting the count of XDP queues from ring->q_index, but ring->q_index is set to 'i + vsi->alloc_txq'. When the user runs ethtool -L $IFACE combined 1, alloc_txq is 1, but vsi->num_xdp_txq is still num_possible_cpus(). ice_tx_xsk_pool() then does an OOB access and, as a result, the ring does not get an xsk_pool pointer assigned. Each ice_xsk_wakeup() call then fails with an error and it is not possible to get into NAPI and do the processing from the driver side. Fix this by subtracting vsi->alloc_txq instead of vsi->num_xdp_txq from ring->q_index in ice_tx_xsk_pool(), so the calculation matches how ring->q_index is set. Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
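The index arithmetic is easy to check by hand. The helpers below are a sketch with hypothetical names (`xdp_q_index`, `qid_old`, `qid_new`); signed integers are used so that an out-of-range result is visible as a negative number, which may differ from the types the driver actually uses.

```c
/* XDP ring i is created with q_index = i + alloc_txq. */
static int xdp_q_index(int i, int alloc_txq)
{
	return i + alloc_txq;
}

/* Old calculation: subtract the number of XDP Tx queues. */
static int qid_old(int q_index, int num_xdp_txq)
{
	return q_index - num_xdp_txq;
}

/* Fixed calculation: undo exactly the offset that was added. */
static int qid_new(int q_index, int alloc_txq)
{
	return q_index - alloc_txq;
}
```

For example, assuming 8 possible CPUs and `ethtool -L $IFACE combined 1`, XDP ring 0 has q_index 1; the old formula yields 1 - 8 = -7, which is outside the pool array, while the fixed formula yields 0.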
2022-03-28ice: xsk: Stop Rx processing when ntc catches ntuMaciej Fijalkowski1-0/+3
This can happen with big budget values and some breakage in re-filling descriptors, as we do not clear the entry that ntu points to at the end of ice_alloc_rx_bufs_zc. So if ntc is at ntu, status_error0 might hold an old, uncleared value and ntc would go past ntu with processing, which would produce false results. Break the Rx loop when ntc == ntu to avoid broken behavior. Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-4-maciej.fijalkowski@intel.com
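A toy model of the Rx loop shows why the extra check matters. This sketch is an assumption-laden simplification: the ring is reduced to an array of "descriptor done" bytes, and `clean_rx_sketch` with its `stop_at_ntu` switch is a hypothetical function, not the driver's ice_clean_rx_irq_zc().

```c
#include <stdint.h>

/*
 * Count how many descriptors one poll would process.
 * dd[i] != 0 means the "descriptor done" status is set for slot i; slots
 * at or past ntu may still hold stale values from an earlier pass.
 */
static unsigned int clean_rx_sketch(const uint8_t *dd, unsigned int count,
				    unsigned int ntc, unsigned int ntu,
				    unsigned int budget, int stop_at_ntu)
{
	unsigned int done = 0;

	while (done < budget) {
		if (stop_at_ntu && ntc == ntu)
			break; /* caught up with the producer index */
		if (!dd[ntc])
			break; /* hardware has not written this slot yet */
		done++;
		ntc = (ntc + 1) % count;
	}
	return done;
}
```

With a 4-entry ring whose status bytes are all stale-but-set, ntc = 0, ntu = 2 and a budget of 8, the loop without the check consumes the whole budget on invalid entries, while the loop with the check stops after the two valid ones.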
2022-03-28ice: xsk: Eliminate unnecessary loop iterationMagnus Karlsson1-1/+1
The NIC Tx ring completion routine cleans entries from the ring in batches. However, it processes one more batch than it is supposed to. Note that this does not matter from a functionality point of view since it will not find a set DD bit for the next batch and just exit the loop. But from a performance perspective, it is faster to terminate the loop before and not issue an expensive read over PCIe to get the DD bit. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-3-maciej.fijalkowski@intel.com
2022-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-10/+20
Merge in overtime fixes, no conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23ice: don't allow to run ice_send_event_to_aux() in atomic ctxAlexander Lobakin1-0/+3
ice_send_event_to_aux() eventually descends to mutex_lock() (-> might_sleep()), so it must not be called from non-task context. However, at least two fixes have already been needed for bug splats that occurred because this function was called from atomic context. To make the emergency landings softer, bail out early when executed in non-task context, emitting a warn splat only once. This way we trade some potentially lost events for system stability and avoid any related hangs and crashes. Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23ice: fix 'scheduling while atomic' on aux critical err interruptAlexander Lobakin2-10/+17
There's a kernel BUG splat on processing aux critical error interrupts in ice_misc_intr():

    [ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000
    ...
    [ 2101.060770] Call Trace:
    [ 2101.063229] <IRQ>
    [ 2101.065252] dump_stack+0x41/0x60
    [ 2101.068587] __schedule_bug.cold.100+0x4c/0x58
    [ 2101.073060] __schedule+0x6a4/0x830
    [ 2101.076570] schedule+0x35/0xa0
    [ 2101.079727] schedule_preempt_disabled+0xa/0x10
    [ 2101.084284] __mutex_lock.isra.7+0x310/0x420
    [ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice]
    [ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice]
    [ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice]
    [ 2101.102232] __handle_irq_event_percpu+0x40/0x180
    [ 2101.106965] handle_irq_event_percpu+0x30/0x80
    [ 2101.111434] handle_irq_event+0x36/0x53
    [ 2101.115292] handle_edge_irq+0x82/0x190
    [ 2101.119148] handle_irq+0x1c/0x30
    [ 2101.122480] do_IRQ+0x49/0xd0
    [ 2101.125465] common_interrupt+0xf/0xf
    [ 2101.129146] </IRQ>
    ...

As Andrew correctly mentioned previously[0], the following call ladder happens:

    ice_misc_intr() <- hardirq
      ice_send_event_to_aux()
        device_lock()
          mutex_lock()
            might_sleep()
              might_resched() <- oops

Add a new PF state bit which indicates that an aux critical error occurred and serve it in ice_service_task() in process context. The new ice_pf::oicr_err_reg is read-write in both hardirq and process contexts, but only 3 bits of non-critical data probably aren't worth explicit synchronizing (and they're even in the same byte [31:24]).

[0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch

Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
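The "record in hardirq, handle in process context" pattern described above can be sketched in userspace with C11 atomics. Everything here is illustrative: `struct pf_sketch`, `misc_intr_sketch` and `service_task_sketch` are hypothetical names, and an atomic boolean stands in for the new PF state bit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the PF state used by this sketch. */
struct pf_sketch {
	atomic_bool aux_err_pending; /* stand-in for the new PF state bit */
	uint32_t oicr_err_reg;       /* error bits saved by the interrupt */
	unsigned int events_sent;    /* how many events reached the aux driver */
};

/* Interrupt side: must not sleep, so only record what happened. */
static void misc_intr_sketch(struct pf_sketch *pf, uint32_t oicr)
{
	pf->oicr_err_reg |= oicr;
	atomic_store(&pf->aux_err_pending, true);
}

/* Service task side: runs in process context, where sleeping is allowed. */
static void service_task_sketch(struct pf_sketch *pf)
{
	if (!atomic_exchange(&pf->aux_err_pending, false))
		return; /* nothing was deferred */

	/* This is where a sleeping call such as sending the event would go. */
	pf->events_sent++;
	pf->oicr_err_reg = 0;
}
```

The interrupt handler never calls anything that may sleep; the event is delivered on the next run of the service task, and a second run without a new interrupt does nothing.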
2022-03-17Merge branch '100GbE' of ↵Jakub Kicinski4-3/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-03-16

This series contains updates to the gtp and ice drivers.

Wojciech fixes smatch-reported inconsistent indenting for gtp and ice.

Yang Yingliang fixes a couple of return value checks for GNSS to use IS_ERR instead of a NULL test.

Jacob adds support for trace events on tx timestamps.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: add trace events for tx timestamps
  ice: fix return value check in ice_gnss.c
  ice: Fix inconsistent indenting in ice_switch
  gtp: Fix inconsistent indenting
====================

Link: https://lore.kernel.org/r/20220316204024.3201500-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+4
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16ice: add trace events for tx timestampsJacob Keller2-0/+32
We've previously run into many issues related to the latency of a Tx timestamp completion with the ice hardware. It can be difficult to determine the root cause of a slow Tx timestamp. To aid in this, introduce new trace events which capture timing data about when the driver reaches certain points while processing a transmit timestamp:

* ice_tx_tstamp_request: Trace when the stack initiates a new timestamp request.
* ice_tx_tstamp_fw_req: Trace when the driver begins a read of the timestamp register in the work thread.
* ice_tx_tstamp_fw_done: Trace when the driver finishes reading a timestamp register in the work thread.
* ice_tx_tstamp_complete: Trace when the driver submits the skb back to the stack with a completed Tx timestamp.

These trace events can be enabled using the standard trace event subsystem exposed by the ice driver. If they are disabled, they become no-ops with no run time cost.

The following is a simple GNU AWK script which highlights one potential way to use the trace events to capture latency data from the trace buffer about how long the driver takes to process a timestamp:

-----
BEGIN {
    PREC=256
}

# Detect requests
/tx_tstamp_request/ {
    time=strtonum($4)
    skb=$7
    idx=$9

    # Store the time of request for this skb
    requests[skb] = time

    printf("skb %s: idx %d at %.6f\n", skb, idx, time)
}

# Detect completions
/tx_tstamp_complete/ {
    time=strtonum($4)
    skb=$7
    idx=$9

    if (skb in requests) {
        latency = (time - requests[skb]) * 1000
        printf("skb %s: %.3f to complete\n", skb, latency)
        if (latency > 4) {
            printf(">>> HIGH LATENCY <<<\n")
        }
        printf("\n")
    } else {
        printf("!!! skb %s (idx %d) at %.6f\n", skb, idx, time)
    }
}
-----

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-16ice: fix return value check in ice_gnss.cYang Yingliang1-2/+2
kthread_create_worker() and tty_alloc_driver() return ERR_PTR() and never return NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: 43113ff73453 ("ice: add TTY for GNSS module for E810T device") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
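The difference between a NULL test and IS_ERR() can be shown with a userspace re-implementation modeled on the kernel's error-pointer helpers. The helpers below are simplified copies written for illustration, and `create_worker_sketch` is a hypothetical stand-in for a function that, like kthread_create_worker(), reports failure through ERR_PTR() rather than NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True when ptr lies in the top MAX_ERRNO addresses, i.e. encodes an error. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: never returns NULL, fails with ERR_PTR(-ENOMEM). */
static void *create_worker_sketch(int fail)
{
	static int worker;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&worker;
}
```

A failed call returns a non-NULL pointer, so `if (!ptr)` never fires and the error goes unnoticed; `IS_ERR(ptr)` catches it and `PTR_ERR(ptr)` yields the error code.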
2022-03-16ice: Fix inconsistent indenting in ice_switchWojciech Drewek1-1/+1
Fix the following warning as reported by smatch: smatch warnings: drivers/net/ethernet/intel/ice/ice_switch.c:5568 ice_find_dummy_packet() warn: inconsistent indenting Fixes: 9a225f81f540 ("ice: Support GTP-U and GTP-C offload in switchdev") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: destroy flow director filter mutex after releasing VSIsSudheer Mogilappagari1-1/+1
Currently fdir_fltr_lock is accessed in ice_vsi_release_all() function after it is destroyed. Instead destroy mutex after ice_vsi_release_all. Fixes: 40319796b732 ("ice: Add flow director support for channel mode") Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()Maciej Fijalkowski1-2/+3
It is possible to do NULL pointer dereference in routine that updates Tx ring stats. Currently only stats and bytes are updated when ring pointer is valid, but later on ring is accessed to propagate gathered Tx stats onto VSI stats. Change the existing logic to move to next ring when ring is NULL. Fixes: e72bba21355d ("ice: split ice_ring onto Tx/Rx separate structs") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
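The shape of the fix, skipping a NULL ring entirely instead of guarding only part of the loop body, can be sketched as follows. The structures and the function name are hypothetical and much smaller than the driver's; only the control flow is the point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ring counters. */
struct ring_sketch {
	uint64_t pkts;
	uint64_t bytes;
	uint64_t restart_q;
};

/* Hypothetical aggregated VSI counters. */
struct vsi_stats_sketch {
	uint64_t tx_packets;
	uint64_t tx_bytes;
	uint64_t tx_restart;
};

/* Sum per-ring Tx counters into the VSI totals, tolerating NULL slots. */
static void update_tx_stats_sketch(struct ring_sketch **rings, size_t n,
				   struct vsi_stats_sketch *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct ring_sketch *ring = rings[i];

		if (!ring)
			continue; /* move to the next ring; nothing below may touch it */

		out->tx_packets += ring->pkts;
		out->tx_bytes += ring->bytes;
		out->tx_restart += ring->restart_q;
	}
}
```

Because the `continue` sits at the top of the loop body, every later access to `ring` is covered by the same NULL check.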
2022-03-15ice: remove PF pointer from ice_check_vf_initJacob Keller3-16/+14
The ice_check_vf_init function takes both a PF and a VF pointer. Every caller looks up the PF pointer from the VF structure. Some callers' only use of the PF pointer is to call this function. Move the lookup inside ice_check_vf_init and drop the unnecessary argument. Clean up the callers to drop the now unnecessary local variables. In particular, replace the local PF pointer with a HW structure pointer in ice_vc_get_vf_res_msg, which simplifies a few accesses to the HW structure in that function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce ice_virtchnl.c and ice_virtchnl.hJacob Keller5-3825/+3869
Just as we moved the generic virtualization library logic into ice_vf_lib.c, move the virtchnl message handling into ice_virtchnl.c Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: cleanup long lines in ice_sriov.cJacob Keller1-12/+27
Before we move the virtchnl message handling from ice_sriov.c into ice_virtchnl.c, cleanup some long line warnings to avoid checkpatch.pl complaints. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce ICE_VF_RESET_LOCK flagJacob Keller4-16/+19
The ice_reset_vf function performs actions which must be taken only while holding the VF configuration lock. Some flows already acquired the lock, while other flows must acquire it just for the reset function. Add the ICE_VF_RESET_LOCK flag to the function so that it can handle taking and releasing the lock instead at the appropriate scope. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce ICE_VF_RESET_NOTIFY flagJacob Keller3-39/+34
In some cases of resetting a VF, the PF would like to first notify the VF that a reset is impending. This is currently done via ice_vc_notify_vf_reset. A wrapper to ice_reset_vf, ice_vc_reset_vf, is used to call this function first before calling ice_reset_vf. In fact, every single call to ice_vc_notify_vf_reset occurs just prior to a call to ice_vc_reset_vf. Now that ice_reset_vf has flags, replace this separate call with an ICE_VF_RESET_NOTIFY flag. This removes the unnecessary export of ice_vc_notify_vf_reset, and also leaves a single function to reset VFs (ice_reset_vf). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: convert ice_reset_vf to take flagsJacob Keller4-9/+17
The ice_reset_vf function takes a boolean parameter which indicates whether or not the reset is due to a VFLR event. This is somewhat confusing to read because readers must interpret what "true" and "false" mean when seeing a line of code like "ice_reset_vf(vf, false)". We will want to add another toggle to ice_reset_vf in a following change. To avoid proliferating many arguments, convert this function to take flags instead. ICE_VF_RESET_VFLR will indicate if this is a VFLR reset. A value of 0 indicates no flags. One could argue that "ice_reset_vf(vf, 0)" is no more readable than "ice_reset_vf(vf, false)". However, this type of flags interface is somewhat common and using 0 to mean "no flags" makes sense in this context. We could bother to add a define for "ICE_VF_RESET_PLAIN" or something similar, but this can be confusing since it's not an actual bit flag. This paves the way to add another flag to the function in a following change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
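A flags-based interface of the kind described here might look like the sketch below. The names are shortened, hypothetical versions of ICE_VF_RESET_VFLR and of the NOTIFY and LOCK flags introduced by the follow-up commits listed earlier in this log; the bit positions and the `reset_trace` return type are assumptions made only so the behavior can be observed.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical flag values; the driver's actual bit assignment may differ. */
enum vf_reset_flags_sketch {
	VF_RESET_VFLR   = BIT(0), /* reset was triggered by a VFLR event */
	VF_RESET_NOTIFY = BIT(1), /* notify the VF before resetting it */
	VF_RESET_LOCK   = BIT(2), /* take the VF configuration lock */
};

/* Records which optional steps a reset performed. */
struct reset_trace {
	int vflr;
	int notified;
	int locked;
};

/* One entry point; callers combine flags instead of passing booleans. */
static struct reset_trace reset_vf_sketch(uint32_t flags)
{
	struct reset_trace t = { 0, 0, 0 };

	if (flags & VF_RESET_LOCK)
		t.locked = 1;
	if (flags & VF_RESET_NOTIFY)
		t.notified = 1;
	if (flags & VF_RESET_VFLR)
		t.vflr = 1;
	return t;
}
```

A call such as `reset_vf_sketch(VF_RESET_VFLR | VF_RESET_NOTIFY)` names each option at the call site, and `reset_vf_sketch(0)` means "no flags", which is the trade-off the commit message discusses.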
2022-03-15ice: convert ice_reset_vf to standard error codesJacob Keller2-10/+11
The ice_reset_vf function returns a boolean value indicating whether or not the VF reset. This is a bit confusing since it means that callers need to know how to interpret the return value when needing to indicate an error. Refactor the function and call sites to report a regular error code. We still report success (i.e. return 0) in cases where the reset is in progress or is disabled. Existing callers don't care because they do not check the return value. We keep the error code anyways instead of a void return because we expect future code which may care about or at least report the error value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: make ice_reset_all_vfs voidJacob Keller2-8/+5
The ice_reset_all_vfs function returns true if any VFs were reset, and false otherwise. However, no callers check the return value. Drop this return value and make the function void since the callers do not care about this. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: drop is_vflr parameter from ice_reset_all_vfsJacob Keller3-7/+6
The ice_reset_all_vfs function takes a parameter to handle whether it's operating after a VFLR event or not. This is not necessary as every caller always passes true. Simplify the interface by removing the parameter. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: move reset functionality into ice_vf_lib.cJacob Keller5-490/+488
Now that the reset functions do not rely on Single Root specific behavior, move the ice_reset_vf, ice_reset_all_vfs, and ice_vf_rebuild_host_cfg functions and their dependent helper functions out of ice_sriov.c and into ice_vf_lib.c Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: fix a long line warning in ice_reset_vfJacob Keller1-1/+2
We're about to move ice_reset_vf out of ice_sriov.c and into ice_vf_lib.c One of the dev_err statements has a checkpatch.pl violation due to putting the vf->vf_id on the same line as the dev_err. Fix this style issue first before moving the code. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce VF operations structure for reset flowsJacob Keller3-139/+196
The ice driver currently supports virtualization using Single Root IOV, with code in the ice_sriov.c file. In the future, we plan to also implement support for Scalable IOV, which uses slightly different hardware implementations for some functionality. To eventually allow this, we introduce a new ice_vf_ops structure which will contain the basic operations that are different between the two IOV implementations. This primarily includes logic for how to handle the VF reset registers, as well as what to do before and after rebuilding the VF's VSI. Implement these ops structures and call the ops table instead of directly calling the SR-IOV specific function. This will allow us to easily add the Scalable IOV implementation in the future. Additionally, it helps separate the generalized VF logic from SR-IOV specifics. This change allows us to move the reset logic out of ice_sriov.c and into ice_vf_lib.c without placing any Single Root specific details into the generic file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
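An operations table of this kind is a struct of function pointers that generic code calls without knowing which IOV flavor is behind it. The sketch below uses hypothetical names (`vf_ops_sketch`, `trigger_reset`, `post_rebuild`, `generic_reset_sketch`); the members of the real ice_vf_ops structure are not listed in this log and are not reproduced here.

```c
/* Forward declaration so the ops can take the VF as an argument. */
struct vf_sketch;

/* Operations that differ between IOV implementations (illustrative set). */
struct vf_ops_sketch {
	void (*trigger_reset)(struct vf_sketch *vf);
	int (*post_rebuild)(struct vf_sketch *vf);
};

/* Hypothetical, reduced VF structure. */
struct vf_sketch {
	const struct vf_ops_sketch *ops;
	int reset_triggered;
	int rebuilt;
};

/* SR-IOV flavored implementations of the two operations. */
static void sriov_trigger_reset(struct vf_sketch *vf)
{
	vf->reset_triggered = 1; /* would write the SR-IOV reset registers */
}

static int sriov_post_rebuild(struct vf_sketch *vf)
{
	vf->rebuilt = 1; /* would redo SR-IOV specific setup after the rebuild */
	return 0;
}

static const struct vf_ops_sketch sriov_ops_sketch = {
	.trigger_reset = sriov_trigger_reset,
	.post_rebuild = sriov_post_rebuild,
};

/* Generic reset flow: no SR-IOV specifics, only calls through the table. */
static int generic_reset_sketch(struct vf_sketch *vf)
{
	vf->ops->trigger_reset(vf);
	/* ... generic VSI rebuild would happen here ... */
	return vf->ops->post_rebuild(vf);
}
```

Adding a second implementation then means providing another ops table, while `generic_reset_sketch` stays unchanged.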
2022-03-15ice: fix incorrect dev_dbg print mistaking 'i' for vf->vf_idJacob Keller1-1/+2
If we fail to clear the malicious VF indication after a VF reset, the dev_dbg message which is printed uses the local variable 'i' when it meant to use vf->vf_id. Fix this. Fixes: 0891c89674e8 ("ice: warn about potentially malicious VFs") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15ice: introduce ice_vf_lib.c, ice_vf_lib.h, and ice_vf_lib_private.hJacob Keller8-711/+823
Introduce the ice_vf_lib.c file along with the ice_vf_lib.h and ice_vf_lib_private.h header files. These files will house the generic VF structures and access functions. Move struct ice_vf and its dependent definitions into this new header file. The ice_vf_lib.c is compiled conditionally on CONFIG_PCI_IOV. Some of its functionality is required by all driver files. However, some of its functionality will only be required by other files also conditionally compiled based on CONFIG_PCI_IOV. Declaring these functions used only in CONFIG_PCI_IOV files in ice_vf_lib.h is verbose. This is because we must provide a fallback implementation for each function in this header since it is included in files which may not be compiled with CONFIG_PCI_IOV. Instead, introduce a new ice_vf_lib_private.h header which verifies that CONFIG_PCI_IOV is enabled. This header is intended to be directly included in .c files which are CONFIG_PCI_IOV only. Add a #error indication that will complain if the file ever gets included by another C file on a kernel with CONFIG_PCI_IOV disabled. Add a comment indicating the nature of the file and why it is useful. This makes it so that we can easily define functions exposed from ice_vf_lib.c into other virtualization files without needing to add fallback implementations for every single function. This begins the path to separate out generic code which will be reused by other virtualization implementations from ice_sriov.h and ice_sriov.c Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: use ice_is_vf_trusted helper functionJacob Keller1-10/+10
The ice_vc_cfg_promiscuous_mode_msg function directly checks ICE_VIRTCHNL_VF_CAP_PRIVILEGE, instead of using the existing helper function ice_is_vf_trusted. Switch this to use the helper function so that all trusted checks are consistent. This aids in any potential future refactor by ensuring consistent code. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: log an error message when eswitch fails to configureJacob Keller1-1/+3
When ice_eswitch_configure fails, print an error message to make it more obvious why VF initialization did not succeed. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: cleanup error logging for ice_ena_vfsJacob Keller1-13/+19
The ice_ena_vfs function and some of its sub-functions like ice_set_per_vf_res use a "if (<function>) { <print error> ; <exit> }" flow. This flow discards specialized errors reported by the called function. This style is generally not preferred as it makes tracing error sources more difficult. It also means we cannot log the actual error received properly. Refactor several calls in the ice_ena_vfs function that do this to catch the error in the 'ret' variable. Report this in the messages, and then return the more precise error value. Doing this reveals that ice_set_per_vf_res returns -EINVAL or -EIO in places where -ENOSPC makes more sense. Fix these calls up to return the more appropriate value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
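The refactor described above follows a common pattern: capture the callee's return value in 'ret', report it, and return it unchanged. The sketch below is a user-space illustration under assumed names (`example_set_per_vf_res`, `example_ena_vfs`, and the resource check are hypothetical and not the driver's code).

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-VF resource sizing step. */
static int example_set_per_vf_res(int avail_queues, int num_vfs)
{
	if (num_vfs <= 0)
		return -EINVAL;
	if (avail_queues < num_vfs)
		return -ENOSPC; /* out of resources, not a bad argument */
	return 0;
}

static int example_ena_vfs(int avail_queues, int num_vfs)
{
	int ret;

	/* Keep the specific error instead of collapsing it to one code. */
	ret = example_set_per_vf_res(avail_queues, num_vfs);
	if (ret) {
		fprintf(stderr, "Failed to set per-VF resources, err %d\n", ret);
		return ret;
	}

	return 0;
}
```

With the older "if (<function>) { <print error> ; <exit> }" shape, both failure cases above would have produced the same message and the same return value.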
2022-03-14ice: move ice_set_vf_port_vlan near other .ndo opsJacob Keller1-96/+96
The ice_set_vf_port_vlan function is located in ice_sriov.c very far away from the other .ndo operations that it is similar to. Move this so that it's located near the other .ndo operation definitions. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: refactor spoofchk control code in ice_sriov.cJacob Keller1-11/+7
The API to control the VSI spoof checking for a VF VSI has three functions: enable, disable, and set. The set function takes the VSI and the VF and decides whether to call enable or disable based on the vf->spoofchk field. In some flows, vf->spoofchk is not yet set, such as the function used to control the setting for a VF. (vf->spoofchk is only updated after a success). Simplify this API by refactoring ice_vf_set_spoofchk_cfg to be "ice_vsi_apply_spoofchk" which takes the boolean and allows all callers to avoid having to determine whether to call enable or disable themselves. This matches the expected callers better, and will prevent the need to export more than one function when this code must be called from another file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
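A single "apply" entry point that takes the desired state might look like the sketch below. All types and function names here are hypothetical simplifications for illustration; the real driver operates on struct ice_vsi and programs hardware.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct example_vsi {
	bool spoofchk_on;
};

struct example_vf {
	struct example_vsi vsi;
	bool spoofchk; /* only updated after the setting was applied */
};

static int example_vsi_ena_spoofchk(struct example_vsi *vsi)
{
	vsi->spoofchk_on = true;
	return 0;
}

static int example_vsi_dis_spoofchk(struct example_vsi *vsi)
{
	vsi->spoofchk_on = false;
	return 0;
}

/* One exported function: the caller passes the state it wants. */
static int example_vsi_apply_spoofchk(struct example_vsi *vsi, bool enable)
{
	return enable ? example_vsi_ena_spoofchk(vsi)
		      : example_vsi_dis_spoofchk(vsi);
}

/* A caller that must not rely on vf->spoofchk being set yet. */
static int example_set_vf_spoofchk(struct example_vf *vf, bool ena)
{
	int ret = example_vsi_apply_spoofchk(&vf->vsi, ena);

	if (ret)
		return ret;

	vf->spoofchk = ena; /* record the new value only on success */
	return 0;
}
```

Because the boolean is an explicit argument, the caller no longer needs the stored field to be up to date before the hardware setting is changed.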
2022-03-14ice: rename ICE_MAX_VF_COUNT to avoid confusionJacob Keller3-7/+7
The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true for SR-IOV but will not be true for all VF implementations, such as when the ice driver supports Scalable IOV. Rename this definition to clearly indicate ICE_MAX_SRIOV_VFS. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: remove unused definitions from ice_sriov.hJacob Keller2-7/+1
A few more macros exist in ice_sriov.h which are not used anywhere. These can be safely removed. Note that ICE_VIRTCHNL_VF_CAP_L2 capability is set but never checked anywhere in the driver. Thus it is also safe to remove. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: convert vf->vc_ops to a const pointerJacob Keller3-23/+55
The vc_ops structure is used to allow different handlers for virtchnl commands when the driver is in representor mode. The current implementation uses a copy of the ops table in each VF, and modifies this copy dynamically. The usual practice in kernel code is to store the ops table in a constant structure and point to different versions. This has a number of advantages: 1. Reduced memory usage. Each VF merely points to the correct table, so they're able to re-use the same constant lookup table in memory. 2. Consistency. It becomes more difficult to accidentally update or edit only one op call. Instead, the code switches to the correct table by a single pointer write. In general this is atomic: either the pointer is updated or it's not. 3. Code Layout. The VF structure can store a pointer to the table without needing to have the full structure definition defined prior to the VF structure definition. This will aid in future refactoring of code by allowing the VF pointer to be kept in ice_vf_lib.h while the virtchnl ops table can be maintained in ice_virtchnl.h. There is one major downside in the case of the vc_ops structure. Most of the operations in the table are the same between the two current implementations. This can appear to lead to duplication since each implementation must now fill in the complete table. It could make spotting the differences in the representor mode more challenging. Unfortunately, methods to make this less error-prone either add complexity overhead (macros using CPP token concatenation) or don't work on all compilers we support (constant initializer from another constant structure). The cost of maintaining two structures does not outweigh the benefits of the constant table model. While we're making these changes, go ahead and rename the structure and implementations with "virtchnl" instead of "vc_vf_". 
This will more closely align with the planned file renaming, and avoid similar names when we later introduce a "vf ops" table for separating Scalable IOV and Single Root IOV implementations. Leave the accessor/assignment functions in order to avoid issues with compiling with options disabled. The interface makes it easier to handle when CONFIG_PCI_IOV is disabled in the kernel. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
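The constant ops table model described in this commit can be illustrated with a small user-space sketch. The structure, the two operations, and the behavior of the representor variant are hypothetical and chosen only to show the shape of the pattern; they are not the driver's actual table.

```c
#include <errno.h>

struct example_vf; /* a forward declaration is enough for the table */

struct example_virtchnl_ops {
	int (*add_mac_addr)(struct example_vf *vf);
	int (*get_stats)(struct example_vf *vf);
};

struct example_vf {
	const struct example_virtchnl_ops *virtchnl_ops; /* pointer, not a copy */
	int vf_id;
};

static int example_add_mac(struct example_vf *vf)
{
	(void)vf;
	return 0;
}

/* Assumption for illustration: this op is refused in representor mode. */
static int example_add_mac_repr(struct example_vf *vf)
{
	(void)vf;
	return -EOPNOTSUPP;
}

static int example_get_stats(struct example_vf *vf)
{
	return vf->vf_id;
}

/* Each mode fills in the complete table, even where the ops are shared. */
static const struct example_virtchnl_ops example_dflt_ops = {
	.add_mac_addr = example_add_mac,
	.get_stats = example_get_stats,
};

static const struct example_virtchnl_ops example_repr_ops = {
	.add_mac_addr = example_add_mac_repr,
	.get_stats = example_get_stats,
};

/* Switching modes is a single pointer write. */
static void example_set_dflt_ops(struct example_vf *vf)
{
	vf->virtchnl_ops = &example_dflt_ops;
}

static void example_set_repr_ops(struct example_vf *vf)
{
	vf->virtchnl_ops = &example_repr_ops;
}
```

Every VF in the same mode shares one read-only table, and the duplication between the two initializers is the cost the commit message accepts.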
2022-03-14ice: remove circular header dependencies on ice.hJacob Keller14-11/+35
Several headers in the ice driver include ice.h even though they are themselves included by that header. The most notable of these is ice_common.h, but several other headers also do this. Such a recursive inclusion is problematic as it forces headers to be included in a strict order; otherwise compilation errors can result. The circular inclusions do not trigger an endless loop due to standard header inclusion guards; however, other errors can occur. For example, ice_flow.h defines ice_rss_hash_cfg, which is used by ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx. ice_flow.h includes ice_acl.h, which includes ice_common.h, which finally includes ice.h. Since ice.h itself includes ice_sriov.h, this creates a circular dependency. The definition in ice_sriov.h requires things from ice_flow.h, but ice_flow.h itself will lead to trying to load ice_sriov.h as part of its process for expanding ice.h. The current code avoids this issue by having an implicit dependency without the include of ice_flow.h. If we were to fix that so that ice_sriov.h explicitly depends on ice_flow.h, the following pattern would occur: ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h. At this point, during this expansion, the header guard for ice_flow.h is already set, so when ice_sriov.h attempts to load the ice_flow.h header it is skipped. Then, we go on to begin including the rest of ice_sriov.h, including structure definitions which depend on ice_rss_hash_cfg. This produces a compiler warning because ice_rss_hash_cfg hasn't yet been defined. Remember, we're just at the start of ice_flow.h! If the order of headers is incorrect (ice_flow.h is not implicitly loaded first in all files which include ice_sriov.h) then we get the same failure. Removing this recursive inclusion requires fixing a few cases where some headers depended on the header inclusions from ice.h. In addition, a few other changes are also required. 
Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h, which is the likely reason that ice_common.h includes ice.h at all. This macro implementation requires the full definition of ice_pf in order to properly compile. Fix this by moving it to a function declared in ice_main.c, so that we do not require all files to depend on the layout of the ice_pf structure. Note that this change only fixes circular dependencies, but it does not fully resolve all implicit dependencies where one header may depend on the inclusion of another. I tried to fix as many of the implicit dependencies as I noticed, but fixing them all requires a somewhat tedious analysis of each header and attempting to compile it separately. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
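The macro-to-function change can be sketched in a single file. The point is that callers need only forward declarations, while the one definition that knows the full structure layout lives in a single translation unit. All names and fields below are hypothetical simplifications; the container_of-style pointer arithmetic is an assumption about how the enclosing structure is recovered, spelled out with offsetof so the sketch builds in user space.

```c
#include <stddef.h>

/* --- what a low-level header needs: forward declarations only --- */
struct example_hw;
struct example_device;
struct example_device *example_hw_to_dev(struct example_hw *hw);

/* --- what only the defining .c file needs: the full layouts --- */
struct example_device {
	int id;
};

struct example_hw {
	int reg_base;
};

struct example_pf {
	struct example_device *dev;
	struct example_hw hw; /* embedded, so the pf can be recovered from it */
};

struct example_device *example_hw_to_dev(struct example_hw *hw)
{
	/* Step back from the embedded member to the enclosing structure. */
	struct example_pf *pf =
		(struct example_pf *)((char *)hw - offsetof(struct example_pf, hw));

	return pf->dev;
}
```

A macro with the same body would force every file using it to see the complete definition of the enclosing structure, which is the dependency the commit removes.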
2022-03-14ice: rename ice_virtchnl_pf.c to ice_sriov.cJacob Keller7-8/+8
The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the code for implementing Single Root IOV virtualization resides. This code includes support for bringing up and tearing down VFs, hooks into the kernel SR-IOV netdev operations, and for handling virtchnl messages from VFs. In the future, we plan to support Scalable IOV in addition to Single Root IOV as an alternative virtualization scheme. This implementation will re-use some but not all of the code in ice_virtchnl_pf.c. To prepare for this future, we want to refactor and split up the code in ice_virtchnl_pf.c into the following scheme: * ice_vf_lib.[ch] Basic VF structures and accessors. This is where scheme-independent code will reside. * ice_virtchnl.[ch] Virtchnl message handling. This is where the bulk of the logic for processing messages from VFs using the virtchnl messaging scheme will reside. This is separated from ice_vf_lib.c because it is distinct and has a bulk of the processing code. * ice_sriov.[ch] Single Root IOV implementation, including initialization and the routines for interacting with SR-IOV based netdev operations. * (future) ice_siov.[ch] Scalable IOV implementation. As a first step, let's assume that all of the code in ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to ice_sriov.c and its header to ice_sriov.h. Future changes will further split out the code in these files following the plan outlined here. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14ice: rename ice_sriov.c to ice_vf_mbx.cJacob Keller4-6/+6
The ice_sriov.c file primarily contains code which handles the logic for mailbox overflow detection and some other utility functions related to the virtualization mailbox. The bulk of the SR-IOV implementation is actually found in ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific. In the future, the ice driver will support an additional virtualization scheme known as Scalable IOV, and the code in this file will be used for this alternative implementation. Rename this file (and its associated header) to ice_vf_mbx.c, so that we can later re-use the ice_sriov.c file as the SR-IOV specific file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11ice: Support GTP-U and GTP-C offload in switchdevMarcin Szycik8-24/+765
Add support for creating filters for GTP-U and GTP-C in switchdev mode. Add support for parsing GTP-specific options (QFI and PDU type) and TEID. By default, a filter for GTP-U will be added. To add a filter for GTP-C, specify enc_dst_port = 2123, e.g.: tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ enc_dst_port 2123 action mirred egress redirect dev $VF1_PR Note: GTP-U with outer IPv6 offload is not supported yet. Note: GTP-U with no payload offload is not supported yet. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11ice: Fix FV offset searchingMichal Swiatkowski3-51/+12
Checking only protocol ids while searching for correct FVs can lead to a situation where an incorrect FV is added to the list. Incorrect means that the FV has the correct protocol id but an incorrect offset. Call ice_get_sw_fv_list with the ice_prot_lkup_ext struct, which contains all protocol ids with offsets. With this modification, allocating and collecting the protocol ids list is no longer needed. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
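The difference between matching on protocol id alone and matching on the (protocol id, offset) pair can be shown with a small sketch. The types and function names are hypothetical and much simpler than the driver's field vector handling; they only illustrate the comparison being tightened.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified field vector word: protocol id plus offset. */
struct example_fv_word {
	uint8_t prot_id;
	uint16_t off;
};

/* True only if the FV contains this exact (prot_id, off) pair. */
static bool example_fv_has_word(const struct example_fv_word *fv,
				size_t fv_len, struct example_fv_word want)
{
	for (size_t i = 0; i < fv_len; i++)
		if (fv[i].prot_id == want.prot_id && fv[i].off == want.off)
			return true;

	return false;
}

/* An FV qualifies only if every requested lookup word is present. */
static bool example_fv_matches(const struct example_fv_word *fv, size_t fv_len,
			       const struct example_fv_word *lkup,
			       size_t lkup_len)
{
	for (size_t i = 0; i < lkup_len; i++)
		if (!example_fv_has_word(fv, fv_len, lkup[i]))
			return false;

	return true;
}
```

An FV holding protocol 2 at offset 4 would have been accepted for a lookup of protocol 2 at offset 8 under an id-only comparison; with the pair comparison it is rejected.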
2022-03-10Merge branch '100GbE' of ↵Jakub Kicinski8-28/+370
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-09 This series contains updates to ice driver only. Martyna implements switchdev filtering on inner EtherType field for tunnels. Marcin adds reporting of slowpath statistics for port representors. Jonathan Toppins changes a non-fatal link error message from warning to debug. Maciej removes unnecessary checks in ice_clean_tx_irq(). Amritha adds support for ADQ to match outer destination MAC for tunnels. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add support for outer dest MAC for ADQ tunnels ice: avoid XDP checks in ice_clean_tx_irq() ice: change "can't set link" message to dbg level ice: Add slow path offload stats on port representor in switchdev ice: Add support for inner etype in switchdev ==================== Link: https://lore.kernel.org/r/20220309190315.1380414-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-40/+38
net/dsa/dsa2.c commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails") commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master") https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/ drivers/net/ethernet/intel/ice/ice.h commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU") commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device") https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/ drivers/staging/gdm724x/gdm_lte.c commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()") commit 4bcc4249b4cf ("staging: Use netif_rx().") https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10ice: Fix race condition during interface enslaveIvan Vecera2-2/+21
Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... 
[20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. The command 'ip link ... set nomaster' causes ice_plug_aux_dev() to be called from ice_service_task() context; the aux device is created and the associated device->lock is taken. 2. The command 'ip link ... set master...' calls ice's notifier under the RTNL lock, and that notifier calls ice_unplug_aux_dev(). That function tries to take the aux device->lock, but it is already held by ice_plug_aux_dev() from step 1. 3. Later, ice_plug_aux_dev() tries to take the RTNL lock, but it is already held from step 2. 4. Deadlock. The patch fixes this issue with the following changes: - The ICE_FLAG_PLUG_AUX_DEV bit is kept set during the ice_plug_aux_dev() call in ice_service_task(). - The bit is checked in ice_clear_rdma_cap(), and ice_unplug_aux_dev() is called only if it is not set. 
If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the bit - Once ice_plug_aux_dev() call (in ice_service_task) is finished the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked whether it was already cleared by ice_clear_rdma_cap(). If so then aux device is unplugged. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Co-developed-by: Petr Oros <poros@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
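The flag hand-off described in this commit can be modeled in user space with C11 atomics. This is a single-threaded sketch under stated assumptions, not the driver code: the structure, the function names, and the `race_clear_during_plug` test hook are hypothetical, and `atomic_exchange` stands in for a test-and-clear bit operation because it returns the previous value.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct example_pf {
	atomic_bool plug_requested;  /* models ICE_FLAG_PLUG_AUX_DEV */
	bool aux_plugged;
	bool race_clear_during_plug; /* hypothetical test hook */
};

static void example_plug_aux_dev(struct example_pf *pf)
{
	pf->aux_plugged = true;
}

static void example_unplug_aux_dev(struct example_pf *pf)
{
	pf->aux_plugged = false;
}

/*
 * Unplug directly only if no plug is pending. If one is pending, just
 * clear the flag and leave the unplug to the service task.
 */
static void example_clear_rdma_cap(struct example_pf *pf)
{
	if (!atomic_exchange(&pf->plug_requested, false))
		example_unplug_aux_dev(pf);
}

static void example_service_task(struct example_pf *pf)
{
	if (!atomic_load(&pf->plug_requested))
		return;

	/* The flag stays set for the whole duration of the plug call. */
	example_plug_aux_dev(pf);

	/* Stands in for a concurrent caller arriving while plug is running. */
	if (pf->race_clear_during_plug)
		example_clear_rdma_cap(pf);

	/* Mark plugging done; if someone cleared the flag meanwhile, unplug. */
	if (!atomic_exchange(&pf->plug_requested, false))
		example_unplug_aux_dev(pf);
}
```

In this model the path that runs under the RTNL lock never calls the unplug routine while a plug may be in progress, which is the property the fix relies on to break the lock-ordering cycle.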
2022-03-09ice: Add support for outer dest MAC for ADQ tunnelsAmritha Nambiar1-4/+28
TC flower does not support matching on user specified outer MAC address for tunnels. For ADQ tunnels, the driver adds outer destination MAC address as lower netdev's active unicast MAC address to filter out packets with unrelated MAC address being delivered to ADQ VSIs. Example: - create tunnel device ip l add $VXLAN_DEV type vxlan id $VXLAN_VNI dstport $VXLAN_PORT \ dev $PF - add TC filter (in ADQ mode) $tc filter add dev $VXLAN_DEV protocol ip parent ffff: flower \ dst_ip $INNER_DST_IP ip_proto tcp dst_port $INNER_DST_PORT \ enc_key_id $VXLAN_VNI hw_tc $ADQ_TC Note: Filters with wild-card tunnel ID (when user does not specify tunnel key) are also supported. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09ice: avoid XDP checks in ice_clean_tx_irq()Maciej Fijalkowski1-6/+1
Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced Tx IRQ cleaning routine dedicated for XDP rings. Currently it is impossible to call ice_clean_tx_irq() against XDP ring, so it is safe to drop ice_ring_is_xdp() calls in there. Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>