summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
AgeCommit message (Collapse)AuthorFilesLines
2023-01-09net/mlx5: E-switch, Coverity: overlapping copyShay Drory1-4/+0
When a capability is set via port function caps callbacks, a memcpy() is performed in which the source and the target are the same address, e.g.: the copy is redundant. Hence, Remove it. Discovered by Coverity. Fixes: 7db98396ef45 ("net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE") Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-09net/mlx5: check attr pointer validity before dereferencing itAriel Levkovich1-1/+1
Fix attr pointer validity checks after it was already dereferenced. Fixes: cb0d54cbf948 ("net/mlx5e: Fix wrong source vport matching on tunnel rule") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08net/mlx5e: TC, add support for meter mtu offloadOz Shlomo1-0/+19
Initialize the meter object with the TC police mtu parameter. Use the hardware range destination to compare the pkt len to the mtu setting. Assign the range destination hit/miss ft to the police conform/exceed attributes. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08net/mlx5e: E-Switch, handle flow attribute with no destinationsOz Shlomo1-0/+5
Rules with drop action are not required to have a destination. Currently the destination list is allocated with the maximum number of destinations and passed to the fs_core layer along with the actual number of destinations. Remove redundant passing of dest pointer when count of dest is 0. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07net/mlx5: E-Switch, Implement devlink port function cmds to control migratableShay Drory1-0/+101
Implement devlink port function commands to enable / disable migratable. This is used to control the migratable capability of the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07net/mlx5: E-Switch, Implement devlink port function cmds to control RoCEYishai Hadas1-0/+108
Implement devlink port function commands to enable / disable RoCE. This is used to control the RoCE device capabilities. This patch implement infrastructure which will be used by downstream patches that will add additional capabilities. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-07net/mlx5: Add generic getters for other functions capsShay Drory1-1/+1
Downstream patch requires to get other function GENERAL2 caps while mlx5_vport_get_other_func_cap() gets only one type of caps (general). Rename it to represent this and introduce a generic implementation of mlx5_vport_get_other_func_cap(). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-29net/mlx5e: TC, Add offload support for trap with additional actionsMaor Dickman1-11/+11
TC trap action offload is currently supported only when trap is the sole action in the flow. This patch remove this limitation by changing trap action offload to not use MLX5_ATTR_FLAG_SLOW_PATH flag and instead set the flow destination table explicitly to be the slow table. This will allow offload of the additional actions. TC flow example: tc filter add dev $REP2 protocol ip prio 2 root \ flower skip_sw dst_mac $mac0 \ action mirred egress redirect dev $REP3 \ action pedit ex munge eth dst set $mac2 pipe \ action trap Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-29net/mlx5e: Do early return when setup vports dests for slow path flowRoi Dayan1-5/+8
Adding flow flag cases in setup vport dests before the slow path case is incorrect as the slow path should take precedence. Current code doesn't show this importance so make the slow path case return early and separate from the other cases and remove the redundant comparison of it in the sample case. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-24net/mlx5: E-switch, Destroy legacy fdb table when neededChris Mi1-0/+7
The cited commit removes eswitch mode none. But when disabling sriov in legacy mode or changing from switchdev to legacy mode without sriov enabled, the legacy fdb table is not destroyed. It is not the right behavior. Destroy legacy fdb table in above two caes. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21net/mlx5: E-Switch, Set correctly vport destinationRoi Dayan1-1/+1
The cited commit moved from using reformat_id integer to packet_reformat pointer which introduced the possibility to null pointer dereference. When setting packet reformat flag and pkt_reformat pointer must exists so checking MLX5_ESW_DEST_ENCAP is not enough, we need to make sure the pkt_reformat is valid and check for MLX5_ESW_DEST_ENCAP_VALID. If the dest encap valid flag does not exists then pkt_reformat can be either invalid address or null. Also, to make sure we don't try to access invalid pkt_reformat set it to null when invalidated and invalidate it before calling add flow code as its logically more correct and to be safe. Fixes: 2b688ea5efde ("net/mlx5: Add flow steering actions to fs_cmd shim layer") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-09net/mlx5: E-switch, Set to legacy mode if failed to change switchdev modeChris Mi1-15/+3
No need to rollback to the other mode because probably will fail again. Just set to legacy mode and clear fdb table created flag. So that fdb table will not be cleared again. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-10-03net/mlx5: E-Switch, Allow offloading fwd dest flow table with vportRoi Dayan1-7/+9
Before this commit a fwd dest flow table resulted in ignoring vport dests which is incorrect and is supported. With this commit the dests can be a mix of flow table and vport dests. There is still a limitation that there cannot be more than one flow table dest. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+5
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/ https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-22net/mlx5: E-Switch, Move send to vport meta rule creationRoi Dayan1-84/+28
Move the creation of the rules from offloads fdb table init to per rep vport init. This way the driver will creating the send to vport meta rule on any representor, e.g. SF representors. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: E-Switch, Split creating fdb tables into smaller chunksRoi Dayan1-124/+206
Split esw_create_offloads_fdb_tables() into smaller functions. This will help maintenance. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: E-Switch, Add default drop rule for unmatched packetsJianbo Liu1-3/+92
The ft_offloads table serves to steer packets, which are from the eswitch, to the representor associated with the packets' source vport. Previously, if a packet's source vport or metadata was not associated with any representor, it was forwarded to the uplink representor. The representor got packets it shouldn't have as they weren't coming from the uplink vport. One such effect of this breakage can be observed if the uplink representor is attached to a bridge, where such illegal packets will be broadcast to the remaining ports, flooding the switch with illegal packets. In the case where IB loopback (e.g, SNAP) is enabled, all transmitted packets would be looped back, and received by the uplink representor, and result in an infinite feedback loop. Therefore, block this hole by adding a default drop rule to the ft_offloads table, so that all unmatched packets with no associated representor are dropped. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: unlock on error path in esw_vfs_changed_event_handler()Dan Carpenter1-1/+3
Unlock before returning on this error path. Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-22net/mlx5: Eswitch, Fix forwarding decision to uplinkEli Cohen1-1/+2
Make sure to modify the rule for uplink forwarding only for the case where destination vport number is MLX5_VPORT_UPLINK. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-6/+17
Conflicts: net/ax25/af_ax25.c d7c4c9e075f8c ("ax25: fix incorrect dev_tracker usage") d62607c3fe459 ("net: rename reference+tracking helpers") drivers/net/netdevsim/fib.c 180a6a3ee60a ("netdevsim: fib: Fix reference count leak on route deletion failure") 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28net/mlx5e: Modify slow path rules to go to slow fdbVlad Buslov1-6/+17
While extending available range of supported chains/prios referenced commit also modified slow path rules to go to FT chain instead of actual slow FDB. However neither of existing users of the MLX5_ATTR_FLAG_SLOW_PATH flag (tunnel encap entries with invalid encap and flows with trap action) need to match on FT chain. After bridge offload was implemented packets of such flows can also be matched by bridge priority tables which is undesirable. Restore slow path flows implementation to redirect packets to slow_fdb. Fixes: 278d51f24330 ("net/mlx5: E-Switch, Increase number of chains and priorities") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-13net/mlx5e: configure meter in flow actionJianbo Liu1-0/+18
After police action is parsed, set meter data in flow action, so they can be used when adding FTE. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-13net/mlx5: Expose vnic diagnostic counters for eswitch managed vportsMichael Guralnik1-0/+3
Expose on vport group managers debug counters for their managed vports. Counters are exposed through debugfs, the directory will be present only for functions that are eswitch managers and only counters that are supported on their specific HW/FW will be exposed. Example: $ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/ pf sf_8 vf_0 vf_1 $ ls -l /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/ cq_overrun quota_exceeded_command total_q_under_processor_handle invalid_command send_queue_priority_update_flow List of all counter added: total_q_under_processor_handle - number of queues in error state due to an async error or errored command. send_queue_priority_update_flow - number of QP/SQ priority/SL update events. cq_overrun - number of times CQ entered an error state due to an overflow. async_eq_overrun -number of time an EQ mapped to async events was overrun. comp_eq_overrun - number of time an EQ mapped to completion events was overrun. quota_exceeded_command - number of commands issued and failed due to quota exceeded. invalid_command - number of commands issued and failed dues to any reason other than quota exceeded. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-12net/mlx5: Remove devl_unlock from mlx5_devlink_eswitch_mode_setMoshe Shemesh1-20/+0
The callback mlx5_devlink_eswitch_mode_set() had unlocked devlink as a temporary workaround once devlink instance lock was added to devlink eswitch callbacks. Now that all flows triggered by this function that took devlink lock are using devl_ API and all parallel paths are locked we can remove this workaround. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-12net/mlx5: Use devl_ API in mlx5e_devlink_port_registerMoshe Shemesh1-0/+2
As part of the flows invoked by mlx5_devlink_eswitch_mode_set() get to mlx5_rescan_drivers_locked() which can call mlx5e_probe()/mlx5e_remove and register/unregister mlx5e driver ports accordingly. This can lead to deadlock once mlx5_devlink_eswitch_mode_set() will use devlink lock. Use devl_port_register/unregister() instead of devlink_port_register/unregister() and add devlink instance locks in the driver paths to this function to have it locked while calling devl_ API function. If remove or probe were called by module init or module cleanup flows, need to lock devlink just before calling devl_port_register(), otherwise it is called by attach/detach or register/unregister flow and we can have the flow locked. Added flag to distinguish between these cases. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink locked. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-12net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_registerMoshe Shemesh1-0/+10
The function mlx5_esw_offloads_devlink_port_register() calls devlink_port_register() and devlink_rate_leaf_create(). Use devl_ API to call devl_port_register() and devl_rate_leaf_create() accordingly and add devlink instance lock in driver paths to this function. Similarly, use devl_ API to call devl_port_unregister() and devl_rate_leaf_destroy() in mlx5_esw_offloads_devlink_port_unregister() and ensure locking devlink instance lock on the paths to this function too. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink lock held. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-12net/mlx5: Use devl_ API for rate nodes destroyMoshe Shemesh1-0/+2
Use devl_rate_nodes_destroy() instead of devlink_rate_nodes_destroy(). Add devlink instance lock in the driver paths to this function to have it locked while calling devl_ API function. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink lock held. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-12net/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enterMoshe Shemesh1-32/+11
The function mlx5_eswtich_mode_callback_enter() was added as a temporary workaround once devlink instance lock was added to devlink eswitch callbacks. However, code review and testing show that all the callbacks part to eswitch_mode_set don't take devlink instance lock in any flow and so unlocking devlink instance lock while entering these functions is not needed. Remove devl_lock from mlx5_eswtich_mode_callback_enter() and devl_unlock from mlx5_eswtich_mode_callback_exit(). Also remove the functions mlx5_eswtich_mode_callback_enter()/exit() as they are not needed any more. The callback eswitch_mode_set will be treated separately in the following patches. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-02net/mlx5: E-switch: Change eswitch mode only via devlink commandChris Mi1-8/+9
Enable or disable switchdev according to the eswitch mode set by devlink command. So it is not changed by other functions anymore. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-02net/mlx5: E-switch, Remove dependency between sriov and eswitch modeChris Mi1-37/+15
Currently, there are three eswitch modes, none, legacy and switchdev. None is the default mode. Remove redundant none mode as eswitch mode should always be either legacy mode or switchdev mode. With this patch, there are two behavior changes: 1. Legacy becomes the default mode. When querying eswitch mode using devlink, a valid mode is always returned. 2. When disabling sriov, the eswitch mode will not change, only vfs are unloaded. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-08net/mlx5: E-Switch, pair only capable devicesMark Bloch1-3/+6
OFFLOADS paring using devcom is possible only on devices that support LAG. Filter based on lag capabilities. This fixes an issue where mlx5_get_next_phys_dev() was called without holding the interface lock. This issue was found when commit bc4c2f2e0179 ("net/mlx5: Lag, filter non compatible devices") added an assert that verifies the interface lock is held. WARNING: CPU: 9 PID: 1706 at drivers/net/ethernet/mellanox/mlx5/core/dev.c:642 mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core] Modules linked in: mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core overlay fuse [last unloaded: mlx5_core] CPU: 9 PID: 1706 Comm: devlink Not tainted 5.18.0-rc7+ #11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core] Code: 02 00 75 48 48 8b 85 80 04 00 00 5d c3 31 c0 5d c3 be ff ff ff ff 48 c7 c7 08 41 5b a0 e8 36 87 28 e3 85 c0 0f 85 6f ff ff ff <0f> 0b e9 68 ff ff ff 48 c7 c7 0c 91 cc 84 e8 cb 36 6f e1 e9 4d ff RSP: 0018:ffff88811bf47458 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88811b398000 RCX: 0000000000000001 RDX: 0000000080000000 RSI: ffffffffa05b4108 RDI: ffff88812daaaa78 RBP: ffff88812d050380 R08: 0000000000000001 R09: ffff88811d6b3437 R10: 0000000000000001 R11: 00000000fddd3581 R12: ffff88815238c000 R13: ffff88812d050380 R14: ffff8881018aa7e0 R15: ffff88811d6b3428 FS: 00007fc82e18ae80(0000) GS:ffff88842e080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9630d1b421 CR3: 0000000149802004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> mlx5_esw_offloads_devcom_event+0x99/0x3b0 [mlx5_core] mlx5_devcom_send_event+0x167/0x1d0 [mlx5_core] esw_offloads_enable+0x1153/0x1500 [mlx5_core] ? mlx5_esw_offloads_controller_valid+0x170/0x170 [mlx5_core] ? wait_for_completion_io_timeout+0x20/0x20 ? mlx5_rescan_drivers_locked+0x318/0x810 [mlx5_core] mlx5_eswitch_enable_locked+0x586/0xc50 [mlx5_core] ? mlx5_eswitch_disable_pf_vf_vports+0x1d0/0x1d0 [mlx5_core] ? mlx5_esw_try_lock+0x1b/0xb0 [mlx5_core] ? mlx5_eswitch_enable+0x270/0x270 [mlx5_core] ? __debugfs_create_file+0x260/0x3e0 mlx5_devlink_eswitch_mode_set+0x27e/0x870 [mlx5_core] ? mutex_lock_io_nested+0x12c0/0x12c0 ? esw_offloads_disable+0x250/0x250 [mlx5_core] ? devlink_nl_cmd_trap_get_dumpit+0x470/0x470 ? rcu_read_lock_sched_held+0x3f/0x70 devlink_nl_cmd_eswitch_set_doit+0x217/0x620 Fixes: dd3fddb82780 ("net/mlx5: E-Switch, handle devcom events only for ports on the same device") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17net/mlx5: Support multiport eswitch modeEli Cohen1-0/+3
Multiport eswitch mode is a LAG mode that allows to add rules that forward traffic to a specific physical port without being affected by LAG affinity configuration. This mode of operation is mutual exclusive with the other LAG modes used by multipath and bonding. To make the transition between the modes, we maintain a counter on the number of rules specifying one of the uplink representors as the target of mirred egress redirect action. An example of such rule would be: $ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \ 00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0 If the reference count just grows to one and LAG is not in use, we create the LAG in multiport eswitch mode. Other mode changes are not allowed while in this mode. When the reference count reaches zero, we destroy the LAG and let other modes be used if needed. logic also changed such that if forwarding to some uplink destination cannot be guaranteed, we fail the operation so the rule will eventually be in software and not in hardware. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04net/mlx5e: Fix wrong source vport matching on tunnel ruleAriel Levkovich1-1/+1
When OVS internal port is the vtep device, the first decap rule is matching on the internal port's vport metadata value and then changes the metadata to be the uplink's value. Therefore, following rules on the tunnel, in chain > 0, should avoid matching on internal port metadata and use the uplink vport metadata instead. Select the uplink's metadata value for the source vport match in case the rule is in chain greater than zero, even if the tunnel route device is internal port. Fixes: 166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-21devlink: hold the instance lock during eswitch_mode callbacksJakub Kicinski1-12/+42
Make the devlink core hold the instance lock during eswitch_mode callbacks. Cheat in case of mlx5 (see the cover letter). Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-28Merge branch 'mlx5-next' of ↵Jakub Kicinski1-67/+26
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next 2022-22-02 The following PR includes updates to mlx5-next branch: Headlines: ========== 1) Jakub cleans up unused static inline functions 2) I did some low level firmware command interface return status changes to provide the caller with full visibility on the error/status returned by the Firmware. 3) Use the new command interface in RDMA DEVX usecases to avoid flooding dmesg with some "expected" user error prone use cases. 4) Moshe also uses the new command interface to grab the specific error code from MFRL register command to provide the exact error reason for why SW reset couldn't perform internally in FW. 5) From Mark Bloch: Lag, drop packets in hardware when possible In active-backup mode the inactive interface's packets are dropped by the bond device. In switchdev where TC rules are offloaded to the FDB this can lead to packets being hit in the FDB where without offload they would have been dropped before reaching TC rules in the kernel. Create a drop rule to make sure packets on inactive ports are dropped before reaching the FDB. Listen on NETDEV_CHANGEUPPER / NETDEV_CHANGEINFODATA events and record the inactive state and offload accordingly. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add clarification on sync reset failure net/mlx5: Add reset_state field to MFRL register RDMA/mlx5: Use new command interface API net/mlx5: cmdif, Refactor error handling and reporting of async commands net/mlx5: Use mlx5_cmd_do() in core create_{cq,dct} net/mlx5: cmdif, Add new api for command execution net/mlx5: cmdif, cmd_check refactoring net/mlx5: cmdif, Return value improvements net/mlx5: Lag, offload active-backup drops to hardware net/mlx5: Lag, record inactive state of bond device net/mlx5: Lag, don't use magic numbers for ports net/mlx5: Lag, use local variable already defined to access E-Switch net/mlx5: E-switch, add drop rule support to ingress ACL net/mlx5: E-switch, remove special uplink ingress ACL handling net/mlx5: E-Switch, reserve and use same uplink metadata across ports net/mlx5: Add ability to insert to specific flow group mlx5: remove unused static inlines ==================== Link: https://lore.kernel.org/r/20220223233930.319301-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-4/+0
tools/testing/selftests/net/mptcp/mptcp_join.sh 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") 857898eb4b28 ("selftests: mptcp: add missing join check") 6ef84b1517e0 ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions") c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") 09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") 84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr") efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") 3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23net/mlx5: Fix wrong limitation of metadata match on ecpfAriel Levkovich1-4/+0
Match metadata support check returns false for ecpf device. However, this support does exist for ecpf and therefore this limitation should be removed to allow feature such as stacked devices and internal port offloaded to be supported. Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-23net/mlx5: E-switch, remove special uplink ingress ACL handlingMark Bloch1-64/+1
As both uplinks set the same metadata there is no need to merge the ACL handling of both into a single one. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-23net/mlx5: E-Switch, reserve and use same uplink metadata across portsSunil Rani1-3/+25
When in switchdev mode wire traffic will hit the FDB in one of two scenarios. - Shared FDB, in that case traffic from both physical ports should be tagged by the same metadata value so a single FDB rule could catch traffic from both ports. - Two E-Switches, traffic from each physical port will hit the native E-Switch which means traffic from one physical port can't reach the E-Switch of the other one. Looking at those two scenarios it means we can always use the same metadata value to tag wire traffic regardless of the mode. Reserve a single metadata value to be used to tag wire traffic. Signed-off-by: Sunil Rani <sunrani@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-27net/mlx5e: Test CT and SAMPLE on flow attrRoi Dayan1-1/+2
Currently the mlx5_flow object contains a single mlx5_attr instance. However, multi table actions (e.g. CT) instantiate multiple attr instances. Prepare for multiple attr instances by testing for CT or SAMPLE flag on attr flags instead of flow flag. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-27net/mlx5e: Refactor eswitch attr flags to just attr flagsRoi Dayan1-13/+13
The flags are flow attrs and not esw specific attr flags. Refactor to remove the esw prefix and move from eswitch.h to en_tc.h where struct mlx5_flow_attr exists. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-27net/mlx5e: TC, Hold sample_attr on stack instead of pointerRoi Dayan1-3/+3
In later commit we are going to instantiate multiple attr instances for flow instead of single attr. Parsing TC sample allocates a new memory but there is no symmetric cleanup in the infrastructure. To avoid asymmetric alloc/free use sample_attr as part of the flow attr and not allocated and held as a pointer. This will avoid a cleanup leak when sample action is not on the first attr. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-13/+15
Merge in fixes directly in prep for the 5.17 merge window. No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06net/mlx5e: Fix nullptr on deleting mirroring ruleDima Chumak1-13/+15
Deleting a Tc rule with multiple outputs, one of which is internal port, like this one: tc filter del dev enp8s0f0_0 ingress protocol ip pref 5 flower \ dst_mac 0c:42:a1:d1:d0:88 \ src_mac e4:ea:09:08:00:02 \ action tunnel_key set \ src_ip 0.0.0.0 \ dst_ip 7.7.7.8 \ id 8 \ dst_port 4789 \ action mirred egress mirror dev vxlan_sys_4789 pipe \ action mirred egress redirect dev enp8s0f0_1 Triggers a call trace: BUG: kernel NULL pointer dereference, address: 0000000000000230 RIP: 0010:del_sw_hw_rule+0x2b/0x1f0 [mlx5_core] Call Trace: tree_remove_node+0x16/0x30 [mlx5_core] mlx5_del_flow_rules+0x51/0x160 [mlx5_core] __mlx5_eswitch_del_rule+0x4b/0x170 [mlx5_core] mlx5e_tc_del_fdb_flow+0x295/0x550 [mlx5_core] mlx5e_flow_put+0x1f/0x70 [mlx5_core] mlx5e_delete_flower+0x286/0x390 [mlx5_core] tc_setup_cb_destroy+0xac/0x170 fl_hw_destroy_filter+0x94/0xc0 [cls_flower] __fl_delete+0x15e/0x170 [cls_flower] fl_delete+0x36/0x80 [cls_flower] tc_del_tfilter+0x3a6/0x6e0 rtnetlink_rcv_msg+0xe5/0x360 ? rtnl_calcit.isra.0+0x110/0x110 netlink_rcv_skb+0x46/0x110 netlink_unicast+0x16b/0x200 netlink_sendmsg+0x202/0x3d0 sock_sendmsg+0x33/0x40 ____sys_sendmsg+0x1c3/0x200 ? copy_msghdr_from_user+0xd6/0x150 ___sys_sendmsg+0x88/0xd0 ? ___sys_recvmsg+0x88/0xc0 ? do_futex+0x10c/0x460 __sys_sendmsg+0x59/0xa0 do_syscall_64+0x48/0x140 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix by disabling offloading for flows matching esw_is_chain_src_port_rewrite() which have more than one output. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-4/+16
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-30net/mlx5: E-Switch, Use indirect table only if all destinations support itMaor Dickman1-4/+15
When adding rule with multiple destinations, indirect table is used for all of the destinations if at least one of the destinations support it, this can cause creation of invalid indirect tables for the destinations that doesn't support it. Fixed it by using indirect table only if all destinations support it. Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-30net/mlx5: E-Switch, fix single FDB creation on BlueFieldMark Bloch1-0/+1
Always use MLX5_FLOW_TABLE_OTHER_VPORT flag when creating egress ACL table for single FDB. Not doing so on BlueField will make firmware fail the command. On BlueField the E-Switch manager is the ECPF (vport 0xFFFE) which is filled in the flow table creation command but as the other_vport field wasn't set the firmware complains about a bad parameter. This is different from a regular HCA where the E-Switch manager vport is the PF (vport 0x0). Passing MLX5_FLOW_TABLE_OTHER_VPORT will make the firmware happy both on BlueField and on regular HCAs without special condition for each. This fixes the bellow firmware syndrome: mlx5_cmd_check:819:(pid 571): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x754a4) Fixes: db202995f503 ("net/mlx5: E-Switch, add logic to enable shared FDB") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-8/+1
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-16net/mlx5: E-switch, move offloads mode callbacks to offloads fileParav Pandit1-0/+59
eswitch.c is mainly for common code between legacy and offloads mode. MAC address get and set via devlink is applicable only in offloads mode. Hence, move it to eswitch_offloads.c file. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-16net/mlx5: E-Switch, return error if encap isn't supportedRaed Salem1-1/+1
On regular ConnectX HCAs getting encap mode isn't supported when the E-Switch is in NONE mode. Current code would return no error code when trying to get encap mode in such case which is wrong. Fix by returning error value to indicate failure to caller in such case. Fixes: 8e0aa4bc959c ("net/mlx5: E-switch, Protect eswitch mode changes") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>