summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)AuthorFilesLines
2020-01-24net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel pathTariq Toukan1-4/+10
When TCP out-of-order is identified (unexpected tcp seq mismatch), driver analyzes the packet and decides what handling should it get: 1. go to accelerated path (to be encrypted in HW), 2. go to regular xmit path (send w/o encryption), 3. drop. Packets marked with skb->decrypted by the TLS stack in the TX flow skips SW encryption, and rely on the HW offload. Verify that such packets are never sent un-encrypted on the wire. Add a WARN to catch such bugs, and prefer dropping the packet in these cases. Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5e: kTLS, Remove redundant posts in TX resync flowTariq Toukan1-2/+0
The call to tx_post_resync_params() is done earlier in the flow, the post of the control WQEs is unnecessarily repeated. Remove it. Fixes: 700ec4974240 ("net/mlx5e: kTLS, Fix missing SQ edge fill") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5e: kTLS, Fix corner-case checks in TX resync flowTariq Toukan1-14/+19
There are the following cases: 1. Packet ends before start marker: bypass offload. 2. Packet starts before start marker and ends after it: drop, not supported, breaks contract with kernel. 3. packet ends before tls record info starts: drop, this packet was already acknowledged and its record info was released. Add the above as comment in code. Mind possible wraparounds of the TCP seq, replace the simple comparison with a call to the TCP before() method. In addition, remove logic that handles negative sync_len values, as it became impossible. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5e: Clear VF config when switching modesDmytro Linkin2-4/+11
Currently VF in LEGACY mode are not able to go up. Also in OFFLOADS mode, when switching to it first time, VF can go up independently to his representor, which is not expected. Perform clearing of VF config when switching modes and set link state to AUTO as default value. Also, when switching to OFFLOADS mode set link state to DOWN, which allow VF link state to be controlled by its REP. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: DR, use non preemptible call to get the current cpu numberErez Shitrit1-1/+2
Use raw_smp_processor_id instead of smp_processor_id() otherwise we will get the following trace in debug-kernel: BUG: using smp_processor_id() in preemptible [00000000] code: devlink caller is dr_create_cq.constprop.2+0x31d/0x970 [mlx5_core] Call Trace: dump_stack+0x9a/0xf0 debug_smp_processor_id+0x1f3/0x200 dr_create_cq.constprop.2+0x31d/0x970 genl_family_rcv_msg+0x5fd/0x1170 genl_rcv_msg+0xb8/0x160 netlink_rcv_skb+0x11e/0x340 Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: E-Switch, Prevent ingress rate configuration of uplink repEli Cohen1-2/+7
Since the implementation relies on limiting the VF transmit rate to simulate ingress rate limiting, and since either uplink representor or ecpf are not associated with a VF, we limit the rate limit configuration for those ports. Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: DR, Enable counter on non-fwd-dest objectsErez Shitrit1-13/+29
The current code handles only counters that attached to dest, we still have the cases where we have counter on non-dest, like over drop etc. Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: Update the list of the PCI supported devicesMeir Lichtinger1-0/+1
Add the upcoming ConnectX-7 device ID. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: Fix lowest FDB pool sizePaul Blakey1-1/+1
The pool sizes represent the pool sizes in the fw. when we request a pool size from fw, it will return the next possible group. We track how many pools the fw has left and start requesting groups from the big to the small. When we start request 4k group, which doesn't exists in fw, fw wants to allocate the next possible size, 64k, but will fail since its exhausted. The correct smallest pool size in fw is 128 and not 4k. Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-23mlxsw: spectrum_acl: Fix use-after-free during reloadIdo Schimmel1-4/+12
During reload (or module unload), the router block is de-initialized. Among other things, this results in the removal of a default multicast route from each active virtual router (VRF). These default routes are configured during initialization to trap packets to the CPU. In Spectrum-2, unlike Spectrum-1, multicast routes are implemented using ACL rules. Since the router block is de-initialized before the ACL block, it is possible that the ACL rules corresponding to the default routes are deleted while being accessed by the ACL delayed work that queries rules' activity from the device. This can result in a rare use-after-free [1]. Fix this by protecting the rules list accessed by the delayed work with a lock. We cannot use a spinlock as the activity read operation is blocking. [1] [ 123.331662] ================================================================== [ 123.339920] BUG: KASAN: use-after-free in mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0 [ 123.349381] Read of size 8 at addr ffff8881f3bb4520 by task kworker/0:2/78 [ 123.357080] [ 123.358773] CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 5.5.0-rc5-custom-33108-gf5df95d3ef41 #2209 [ 123.368898] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018 [ 123.378456] Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work [ 123.385970] Call Trace: [ 123.388734] dump_stack+0xc6/0x11e [ 123.392568] print_address_description.constprop.4+0x21/0x340 [ 123.403236] __kasan_report.cold.8+0x76/0xb1 [ 123.414884] kasan_report+0xe/0x20 [ 123.418716] mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0 [ 123.444034] process_one_work+0xb06/0x19a0 [ 123.453731] worker_thread+0x91/0xe90 [ 123.467348] kthread+0x348/0x410 [ 123.476847] ret_from_fork+0x24/0x30 [ 123.480863] [ 123.482545] Allocated by task 73: [ 123.486273] save_stack+0x19/0x80 [ 123.490000] __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 123.495379] mlxsw_sp_acl_rule_create+0xa7/0x230 [ 123.500566] mlxsw_sp2_mr_tcam_route_create+0xf6/0x3e0 [ 123.506334] mlxsw_sp_mr_tcam_route_create+0x5b4/0x820 [ 123.512102] mlxsw_sp_mr_table_create+0x3b5/0x690 [ 123.517389] mlxsw_sp_vr_get+0x289/0x4d0 [ 123.521797] mlxsw_sp_fib_node_get+0xa2/0x990 [ 123.526692] mlxsw_sp_router_fib4_event_work+0x54c/0x2d60 [ 123.532752] process_one_work+0xb06/0x19a0 [ 123.537352] worker_thread+0x91/0xe90 [ 123.541471] kthread+0x348/0x410 [ 123.545103] ret_from_fork+0x24/0x30 [ 123.549113] [ 123.550795] Freed by task 518: [ 123.554231] save_stack+0x19/0x80 [ 123.557958] __kasan_slab_free+0x125/0x170 [ 123.562556] kfree+0xd7/0x3a0 [ 123.565895] mlxsw_sp_acl_rule_destroy+0x63/0xd0 [ 123.571081] mlxsw_sp2_mr_tcam_route_destroy+0xd5/0x130 [ 123.576946] mlxsw_sp_mr_tcam_route_destroy+0xba/0x260 [ 123.582714] mlxsw_sp_mr_table_destroy+0x1ab/0x290 [ 123.588091] mlxsw_sp_vr_put+0x1db/0x350 [ 123.592496] mlxsw_sp_fib_node_put+0x298/0x4c0 [ 123.597486] mlxsw_sp_vr_fib_flush+0x15b/0x360 [ 123.602476] mlxsw_sp_router_fib_flush+0xba/0x470 [ 123.607756] mlxsw_sp_vrs_fini+0xaa/0x120 [ 123.612260] mlxsw_sp_router_fini+0x137/0x384 [ 123.617152] mlxsw_sp_fini+0x30a/0x4a0 [ 123.621374] mlxsw_core_bus_device_unregister+0x159/0x600 [ 123.627435] mlxsw_devlink_core_bus_device_reload_down+0x7e/0xb0 [ 123.634176] devlink_reload+0xb4/0x380 [ 123.638391] devlink_nl_cmd_reload+0x610/0x700 [ 123.643382] genl_rcv_msg+0x6a8/0xdc0 [ 123.647497] netlink_rcv_skb+0x134/0x3a0 [ 123.651904] genl_rcv+0x29/0x40 [ 123.655436] netlink_unicast+0x4d4/0x700 [ 123.659843] netlink_sendmsg+0x7c0/0xc70 [ 123.664251] __sys_sendto+0x265/0x3c0 [ 123.668367] __x64_sys_sendto+0xe2/0x1b0 [ 123.672773] do_syscall_64+0xa0/0x530 [ 123.676892] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 123.682552] [ 123.684238] The buggy address belongs to the object at ffff8881f3bb4500 [ 123.684238] which belongs to the cache kmalloc-128 of size 128 [ 123.698261] The buggy address is located 32 bytes inside of [ 123.698261] 128-byte region [ffff8881f3bb4500, ffff8881f3bb4580) [ 123.711303] The buggy address belongs to the page: [ 123.716682] page:ffffea0007ceed00 refcount:1 mapcount:0 mapping:ffff888236403500 index:0x0 [ 123.725958] raw: 0200000000000200 dead000000000100 dead000000000122 ffff888236403500 [ 123.734646] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 [ 123.743315] page dumped because: kasan: bad access detected [ 123.749562] [ 123.751241] Memory state around the buggy address: [ 123.756620] ffff8881f3bb4400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 123.764716] ffff8881f3bb4480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 123.772812] >ffff8881f3bb4500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 123.780904] ^ [ 123.785697] ffff8881f3bb4580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 123.793793] ffff8881f3bb4600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 123.801883] ================================================================== Fixes: cf7221a4f5a5 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15mlxsw: spectrum_qdisc: Include MC TCs in Qdisc countersPetr Machata1-7/+23
mlxsw configures Spectrum in such a way that BUM traffic is passed not through its nominal traffic class TC, but through its MC counterpart TC+8. However, when collecting statistics, Qdiscs only look at the nominal TC and ignore the MC TC. Add two helpers to compute the value for logical TC from the constituents, one for backlog, the other for tail drops. Use them throughout instead of going through the xstats pointer directly. Counters for TX bytes and packets are deduced from packet priority counters, and therefore already include BUM traffic. wred_drop counter is irrelevant on MC TCs, because RED is not enabled on them. Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15mlxsw: spectrum: Wipe xstats.backlog of down portsPetr Machata1-0/+13
Per-port counter cache used by Qdiscs is updated periodically, unless the port is down. The fact that the cache is not updated for down ports is no problem for most counters, which are relative in nature. However, backlog is absolute in nature, and if there is a non-zero value in the cache around the time that the port goes down, that value just stays there. This value then leaks to offloaded Qdiscs that report non-zero backlog even if there (obviously) is no traffic. The HW does not keep backlog of a downed port, so do likewise: as the port goes down, wipe the backlog value from xstats. Fixes: 075ab8adaf4e ("mlxsw: spectrum: Collect tclass related stats periodically") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15mlxsw: switchx2: Do not modify cloned SKBs during xmitIdo Schimmel1-11/+6
The driver needs to prepend a Tx header to each packet it is transmitting. The header includes information such as the egress port and traffic class. The addition of the header requires the driver to modify the SKB's header and therefore it must not be shared. Otherwise, we risk hitting various race conditions. For example, when a packet is flooded (cloned) by the bridge driver to two switch ports swp1 and swp2: t0 - mlxsw_sp_port_xmit() is called for swp1. Tx header is prepended with swp1's port number t1 - mlxsw_sp_port_xmit() is called for swp2. Tx header is prepended with swp2's port number, overwriting swp1's port number t2 - The device processes data buffer from t0. Packet is transmitted via swp2 t3 - The device processes data buffer from t1. Packet is transmitted via swp2 Usually, the device is fast enough and transmits the packet before its Tx header is overwritten, but this is not the case in emulated environments. Fix this by making sure the SKB's header is writable by calling skb_cow_head(). Since the function ensures we have headroom to push the Tx header, the check further in the function can be removed. v2: * Use skb_cow_head() instead of skb_unshare() as suggested by Jakub * Remove unnecessary check regarding headroom Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Shalom Toledo <shalomt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15mlxsw: spectrum: Do not modify cloned SKBs during xmitIdo Schimmel1-12/+6
The driver needs to prepend a Tx header to each packet it is transmitting. The header includes information such as the egress port and traffic class. The addition of the header requires the driver to modify the SKB's header and therefore it must not be shared. Otherwise, we risk hitting various race conditions. For example, when a packet is flooded (cloned) by the bridge driver to two switch ports swp1 and swp2: t0 - mlxsw_sp_port_xmit() is called for swp1. Tx header is prepended with swp1's port number t1 - mlxsw_sp_port_xmit() is called for swp2. Tx header is prepended with swp2's port number, overwriting swp1's port number t2 - The device processes data buffer from t0. Packet is transmitted via swp2 t3 - The device processes data buffer from t1. Packet is transmitted via swp2 Usually, the device is fast enough and transmits the packet before its Tx header is overwritten, but this is not the case in emulated environments. Fix this by making sure the SKB's header is writable by calling skb_cow_head(). Since the function ensures we have headroom to push the Tx header, the check further in the function can be removed. v2: * Use skb_cow_head() instead of skb_unshare() as suggested by Jakub * Remove unnecessary check regarding headroom Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Shalom Toledo <shalomt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15mlxsw: spectrum: Do not enforce same firmware version for multiple ASICsIdo Schimmel1-1/+22
In commit a72afb6879bb ("mlxsw: Enforce firmware version for Spectrum-2") I added a required firmware version for Spectrum-2, but missed the fact that mlxsw_sp2_init() is used by both Spectrum-2 and Spectrum-3. This means that the same firmware version will be used for both, which is wrong. Fix this by creating a new init() callback for Spectrum-3. Fixes: a72afb6879bb ("mlxsw: Enforce firmware version for Spectrum-2") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-11devlink: correct misspelling of snapshotJacob Keller1-1/+1
The function to obtain a unique snapshot id was mistakenly typo'd as devlink_region_shapshot_id_get. Fix this typo by renaming the function and all of its users. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFOPetr Machata1-0/+7
The following patch will change PRIO to replace a removed Qdisc with an invisible FIFO, instead of NOOP. mlxsw will see this replacement due to the graft message that is generated. But because FIFO does not issue its own REPLACE message, when the graft operation takes place, the Qdisc that mlxsw tracks under the indicated band is still the old one. The child handle (0:0) therefore does not match, and mlxsw rejects the graft operation, which leads to an extack message: Warning: Offloading graft operation failed. Fix by ignoring the invisible children in the PRIO graft handler. The DESTROY message of the removed Qdisc is going to follow shortly and handle the removal. Fixes: 32dc5efc6cb4 ("mlxsw: spectrum: qdiscs: prio: Handle graft command") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06net/mlx5: DR, Init lists that are used in rule's memberErez Shitrit1-0/+3
Whenever adding new member of rule object we attach it to 2 lists, These 2 lists should be initialized first. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5e: Fix hairpin RSS table sizeEli Cohen3-17/+17
Set hairpin table size to the corret size, based on the groups that would be created in it. Groups are laid out on the table such that a group occupies a range of entries in the table. This implies that the group ranges should have correspondence to the table they are laid upon. The patch cited below made group 1's size to grow hence causing overflow of group range laid on the table. Fixes: a795d8db2a6d ("net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets") Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5: DR, No need for atomic refcount for internal SW steering resourcesYevgeny Kliteynik3-12/+14
No need for an atomic refcounter for the STE and hashtables. These are internal SW steering resources and they are always under domain mutex. This also fixes the following refcount error: refcount_t: addition on 0; use-after-free. WARNING: CPU: 9 PID: 3527 at lib/refcount.c:25 refcount_warn_saturate+0x81/0xe0 Call Trace: dr_table_init_nic+0x10d/0x110 [mlx5_core] mlx5dr_table_create+0xb4/0x230 [mlx5_core] mlx5_cmd_dr_create_flow_table+0x39/0x120 [mlx5_core] __mlx5_create_flow_table+0x221/0x5f0 [mlx5_core] esw_create_offloads_fdb_tables+0x180/0x5a0 [mlx5_core] ... Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06Revert "net/mlx5: Support lockless FTE read lookups"Parav Pandit2-56/+15
This reverts commit 7dee607ed0e04500459db53001d8e02f8831f084. During cleanup path, FTE's parent node group is removed which is referenced by the FTE while freeing the FTE. Hence FTE's lockless read lookup optimization done in cited commit is not possible at the moment. Hence, revert the commit. This avoid below KAZAN call trace. [ 110.390896] BUG: KASAN: use-after-free in find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391048] Read of size 4 at addr ffff888c19e6d220 by task swapper/12/0 [ 110.391219] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.5.0-rc1+ [ 110.391222] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014 [ 110.391225] Call Trace: [ 110.391229] <IRQ> [ 110.391246] dump_stack+0x95/0xd5 [ 110.391307] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391320] print_address_description.constprop.5+0x20/0x320 [ 110.391379] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391435] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391441] __kasan_report+0x149/0x18c [ 110.391499] ? find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391504] kasan_report+0x12/0x20 [ 110.391511] __asan_report_load4_noabort+0x14/0x20 [ 110.391567] find_root.isra.14+0x56/0x60 [mlx5_core] [ 110.391625] del_sw_fte_rcu+0x4a/0x100 [mlx5_core] [ 110.391633] rcu_core+0x404/0x1950 [ 110.391640] ? rcu_accelerate_cbs_unlocked+0x100/0x100 [ 110.391649] ? run_rebalance_domains+0x201/0x280 [ 110.391654] rcu_core_si+0xe/0x10 [ 110.391661] __do_softirq+0x181/0x66c [ 110.391670] irq_exit+0x12c/0x150 [ 110.391675] smp_apic_timer_interrupt+0xf0/0x370 [ 110.391681] apic_timer_interrupt+0xf/0x20 [ 110.391684] </IRQ> [ 110.391695] RIP: 0010:cpuidle_enter_state+0xfa/0xba0 [ 110.391703] Code: 3d c3 9b b5 50 e8 56 75 6e fe 48 89 45 c8 0f 1f 44 00 00 31 ff e8 a6 94 6e fe 45 84 ff 0f 85 f6 02 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 db 06 00 00 4d 63 fe 4b 8d 04 7f 49 8d 04 87 49 8d [ 110.391706] RSP: 0018:ffff888c23a6fce8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 [ 110.391712] RAX: dffffc0000000000 RBX: ffffe8ffff7002f8 RCX: 000000000000001f [ 110.391715] RDX: 1ffff11184ee6cb5 RSI: 0000000040277d83 RDI: ffff888c277365a8 [ 110.391718] RBP: ffff888c23a6fd40 R08: 0000000000000002 R09: 0000000000035280 [ 110.391721] R10: ffff888c23a6fc80 R11: ffffed11847485d0 R12: ffffffffb1017740 [ 110.391723] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000000 [ 110.391732] ? cpuidle_enter_state+0xea/0xba0 [ 110.391738] cpuidle_enter+0x4f/0xa0 [ 110.391747] call_cpuidle+0x6d/0xc0 [ 110.391752] do_idle+0x360/0x430 [ 110.391758] ? arch_cpu_idle_exit+0x40/0x40 [ 110.391765] ? complete+0x67/0x80 [ 110.391771] cpu_startup_entry+0x1d/0x20 [ 110.391779] start_secondary+0x2f3/0x3c0 [ 110.391784] ? set_cpu_sibling_map+0x2500/0x2500 [ 110.391795] secondary_startup_64+0xa4/0xb0 [ 110.391841] Allocated by task 290: [ 110.391917] save_stack+0x21/0x90 [ 110.391921] __kasan_kmalloc.constprop.8+0xa7/0xd0 [ 110.391925] kasan_kmalloc+0x9/0x10 [ 110.391929] kmem_cache_alloc_trace+0xf6/0x270 [ 110.391987] create_root_ns.isra.36+0x58/0x260 [mlx5_core] [ 110.392044] mlx5_init_fs+0x5fd/0x1ee0 [mlx5_core] [ 110.392092] mlx5_load_one+0xc7a/0x3860 [mlx5_core] [ 110.392139] init_one+0x6ff/0xf90 [mlx5_core] [ 110.392145] local_pci_probe+0xde/0x190 [ 110.392150] work_for_cpu_fn+0x56/0xa0 [ 110.392153] process_one_work+0x678/0x1140 [ 110.392157] worker_thread+0x573/0xba0 [ 110.392162] kthread+0x341/0x400 [ 110.392166] ret_from_fork+0x1f/0x40 [ 110.392218] Freed by task 2742: [ 110.392288] save_stack+0x21/0x90 [ 110.392292] __kasan_slab_free+0x137/0x190 [ 110.392296] kasan_slab_free+0xe/0x10 [ 110.392299] kfree+0x94/0x250 [ 110.392357] tree_put_node+0x257/0x360 [mlx5_core] [ 110.392413] tree_remove_node+0x63/0xb0 [mlx5_core] [ 110.392469] clean_tree+0x199/0x240 [mlx5_core] [ 110.392525] mlx5_cleanup_fs+0x76/0x580 [mlx5_core] [ 110.392572] mlx5_unload+0x22/0xc0 [mlx5_core] [ 110.392619] mlx5_unload_one+0x99/0x260 [mlx5_core] [ 110.392666] remove_one+0x61/0x160 [mlx5_core] [ 110.392671] pci_device_remove+0x10b/0x2c0 [ 110.392677] device_release_driver_internal+0x1e4/0x490 [ 110.392681] device_driver_detach+0x36/0x40 [ 110.392685] unbind_store+0x147/0x200 [ 110.392688] drv_attr_store+0x6f/0xb0 [ 110.392693] sysfs_kf_write+0x127/0x1d0 [ 110.392697] kernfs_fop_write+0x296/0x420 [ 110.392702] __vfs_write+0x66/0x110 [ 110.392707] vfs_write+0x1a0/0x500 [ 110.392711] ksys_write+0x164/0x250 [ 110.392715] __x64_sys_write+0x73/0xb0 [ 110.392720] do_syscall_64+0x9f/0x3a0 [ 110.392725] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 7dee607ed0e0 ("net/mlx5: Support lockless FTE read lookups") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5: Move devlink registration before interfaces loadMichael Guralnik1-7/+9
Register devlink before interfaces are added. This will allow interfaces to use devlink while initalizing. For example, call mlx5_is_roce_enabled. Fixes: aba25279c100 ("net/mlx5e: Add TX reporter support") Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5e: Always print health reporter message to dmesgEran Ben Elisha1-3/+4
In case a reporter exists, error message is logged only to the devlink tracer. The devlink tracer is a visibility utility only, which user can choose not to monitor. After cited patch, 3rd party monitoring tools that tracks these error message will no longer find them in dmesg, causing a regression. With this patch, error messages are also logged into the dmesg. Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-06net/mlx5e: Avoid duplicating rule destinationsDmytro Linkin1-1/+57
Following scenario easily break driver logic and crash the kernel: 1. Add rule with mirred actions to same device. 2. Delete this rule. In described scenario rule is not added to database and on deletion driver access invalid entry. Example: $ tc filter add dev ens1f0_0 ingress protocol ip prio 1 \ flower skip_sw \ action mirred egress mirror dev ens1f0_1 pipe \ action mirred egress redirect dev ens1f0_1 $ tc filter del dev ens1f0_0 ingress protocol ip prio 1 Dmesg output: [ 376.634396] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 3439): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f) [ 376.654983] mlx5_core 0000:82:00.0: del_hw_flow_group:567:(pid 3439): flow steering can't destroy fg 89 of ft 3145728 [ 376.673433] kasan: CONFIG_KASAN_INLINE enabled [ 376.683769] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 376.695229] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI [ 376.705069] CPU: 7 PID: 3439 Comm: tc Not tainted 5.4.0-rc5+ #76 [ 376.714959] Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0a 08/12/2016 [ 376.726371] RIP: 0010:mlx5_del_flow_rules+0x105/0x960 [mlx5_core] [ 376.735817] Code: 01 00 00 00 48 83 eb 08 e8 28 d9 ff ff 4c 39 e3 75 d8 4c 8d bd c0 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 84 04 00 00 48 8d 7d 28 8b 9 d [ 376.761261] RSP: 0018:ffff888847c56db8 EFLAGS: 00010202 [ 376.770054] RAX: dffffc0000000000 RBX: ffff8888582a6da0 RCX: ffff888847c56d60 [ 376.780743] RDX: 0000000000000058 RSI: 0000000000000008 RDI: 0000000000000282 [ 376.791328] RBP: 0000000000000000 R08: fffffbfff0c60ea6 R09: fffffbfff0c60ea6 [ 376.802050] R10: fffffbfff0c60ea5 R11: ffffffff8630752f R12: ffff8888582a6da0 [ 376.812798] R13: dffffc0000000000 R14: ffff8888582a6da0 R15: 00000000000002c0 [ 376.823445] FS: 00007f675f9a8840(0000) GS:ffff88886d200000(0000) knlGS:0000000000000000 [ 376.834971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 376.844179] CR2: 00000000007d9640 CR3: 00000007d3f26003 CR4: 00000000001606e0 [ 376.854843] Call Trace: [ 376.868542] __mlx5_eswitch_del_rule+0x49/0x300 [mlx5_core] [ 376.877735] mlx5e_tc_del_fdb_flow+0x6ec/0x9e0 [mlx5_core] [ 376.921549] mlx5e_flow_put+0x2b/0x50 [mlx5_core] [ 376.929813] mlx5e_delete_flower+0x5b6/0xbd0 [mlx5_core] [ 376.973030] tc_setup_cb_reoffload+0x29/0xc0 [ 376.980619] fl_reoffload+0x50a/0x770 [cls_flower] [ 377.015087] tcf_block_playback_offloads+0xbd/0x250 [ 377.033400] tcf_block_setup+0x1b2/0xc60 [ 377.057247] tcf_block_offload_cmd+0x195/0x240 [ 377.098826] tcf_block_offload_unbind+0xe7/0x180 [ 377.107056] __tcf_block_put+0xe5/0x400 [ 377.114528] ingress_destroy+0x3d/0x60 [sch_ingress] [ 377.122894] qdisc_destroy+0xf1/0x5a0 [ 377.129993] qdisc_graft+0xa3d/0xe50 [ 377.151227] tc_get_qdisc+0x48e/0xa20 [ 377.165167] rtnetlink_rcv_msg+0x35d/0x8d0 [ 377.199528] netlink_rcv_skb+0x11e/0x340 [ 377.219638] netlink_unicast+0x408/0x5b0 [ 377.239913] netlink_sendmsg+0x71b/0xb30 [ 377.267505] sock_sendmsg+0xb1/0xf0 [ 377.273801] ___sys_sendmsg+0x635/0x900 [ 377.312784] __sys_sendmsg+0xd3/0x170 [ 377.338693] do_syscall_64+0x95/0x460 [ 377.344833] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 377.352321] RIP: 0033:0x7f675e58e090 To avoid this, for every mirred action check if output device was already processed. If so - drop rule with EOPNOTSUPP error. Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-29mlxsw: spectrum: Use dedicated policer for VRRP packetsIdo Schimmel2-2/+8
Currently, VRRP packets and packets that hit exceptions during routing (e.g., MTU error) are policed using the same policer towards the CPU. This means, for example, that misconfiguration of the MTU on a routed interface can prevent VRRP packets from reaching the CPU, which in turn can cause the VRRP daemon to assume it is the Master router. Fix this by using a dedicated policer for VRRP packets. Fixes: 11566d34f895 ("mlxsw: spectrum: Add VRRP traps") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-29mlxsw: spectrum_router: Skip loopback RIFs during MAC validationAmit Cohen1-0/+3
When a router interface (RIF) is created the MAC address of the backing netdev is verified to have the same MSBs as existing RIFs. This is required in order to avoid changing existing RIF MAC addresses that all share the same MSBs. Loopback RIFs are special in this regard as they do not have a MAC address, given they are only used to loop packets from the overlay to the underlay. Without this change, an error is returned when trying to create a RIF after the creation of a GRE tunnel that is represented by a loopback RIF. 'rif->dev->dev_addr' points to the GRE device's local IP, which does not share the same MSBs as physical interfaces. Adding an IP address to any physical interface results in: Error: mlxsw_spectrum: All router interface MAC addresses must have the same prefix. Fix this by skipping loopback RIFs during MAC validation. Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses") Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-26net/mlxfw: Fix out-of-memory error in mfa2 flash burningVladyslav Tarasiuk1-3/+4
The burning process requires to perform internal allocations of large chunks of memory. This memory doesn't need to be contiguous and can be safely allocated by vzalloc() instead of kzalloc(). This patch changes such allocation to avoid possible out-of-memory failure. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds6-34/+19
Pull networking fixes from David Miller: 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso, including adding a missing ipv6 match description. 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi Bhat. 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet. 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold. 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien. 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul Chaignon. 7) Multicast MAC limit test is off by one in qede, from Manish Chopra. 8) Fix established socket lookup race when socket goes from TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening RCU grace period. From Eric Dumazet. 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet. 10) Fix active backup transition after link failure in bonding, from Mahesh Bandewar. 11) Avoid zero sized hash table in gtp driver, from Taehee Yoo. 12) Fix wrong interface passed to ->mac_link_up(), from Russell King. 13) Fix DSA egress flooding settings in b53, from Florian Fainelli. 14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost. 15) Fix double free in dpaa2-ptp code, from Ioana Ciornei. 16) Reject invalid MTU values in stmmac, from Jose Abreu. 17) Fix refcount leak in error path of u32 classifier, from Davide Caratti. 18) Fix regression causing iwlwifi firmware crashes on boot, from Anders Kaseorg. 19) Fix inverted return value logic in llc2 code, from Chan Shu Tak. 20) Disable hardware GRO when XDP is attached to qede, frm Manish Chopra. 21) Since we encode state in the low pointer bits, dst metrics must be at least 4 byte aligned, which is not necessarily true on m68k. Add annotations to fix this, from Geert Uytterhoeven. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits) sfc: Include XDP packet headroom in buffer step size. sfc: fix channel allocation with brute force net: dst: Force 4-byte alignment of dst_metrics selftests: pmtu: fix init mtu value in description hv_netvsc: Fix unwanted rx_table reset net: phy: ensure that phy IDs are correctly typed mod_devicetable: fix PHY module format qede: Disable hardware gro when xdp prog is installed net: ena: fix issues in setting interrupt moderation params in ethtool net: ena: fix default tx interrupt moderation interval net/smc: unregister ib devices in reboot_event net: stmmac: platform: Fix MDIO init for platforms without PHY llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) net: hisilicon: Fix a BUG trigered by wrong bytes_compl net: dsa: ksz: use common define for tag len s390/qeth: don't return -ENOTSUPP to userspace s390/qeth: fix promiscuous mode after reset s390/qeth: handle error due to unsupported transport mode cxgb4: fix refcount init for TC-MQPRIO offload tc-testing: initial tdc selftests for cls_u32 ...
2019-12-19net/mlx5e: Fix concurrency issues between config flow and XSKMaxim Mikityanskiy5-33/+13
After disabling resources necessary for XSK (the XDP program, channels, XSK queues), use synchronize_rcu to wait until the XSK wakeup function finishes, before freeing the resources. Suspend XSK wakeups during switching channels. If the XDP program is being removed, synchronize_rcu before closing the old channels to allow XSK wakeup to complete. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-3-maximmi@mellanox.com
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya3-6/+6
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09mlxsw: spectrum_router: Remove unlikely user-triggerable warningIdo Schimmel1-1/+6
In case the driver vetoes the addition of an IPv6 multipath route, the IPv6 stack will emit delete notifications for the sibling routes that were already added to the FIB trie. Since these siblings are not present in hardware, a warning will be generated. Have the driver ignore notifications for routes it does not have. Fixes: ebee3cad835f ("ipv6: Add IPv6 multipath notifications for add / replace") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds11-80/+148
Pull networking fixes from David Miller: 1) More jumbo frame fixes in r8169, from Heiner Kallweit. 2) Fix bpf build in minimal configuration, from Alexei Starovoitov. 3) Use after free in slcan driver, from Jouni Hogander. 4) Flower classifier port ranges don't work properly in the HW offload case, from Yoshiki Komachi. 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin. 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk. 7) Fix flow dissection in dsa TX path, from Alexander Lobakin. 8) Stale syncookie timestampe fixes from Guillaume Nault. [ Did an evil merge to silence a warning introduced by this pull - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) r8169: fix rtl_hw_jumbo_disable for RTL8168evl net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() r8169: add missing RX enabling for WoL on RTL8125 vhost/vsock: accept only packets with the right dst_cid net: phy: dp83867: fix hfs boot in rgmii mode net: ethernet: ti: cpsw: fix extra rx interrupt inet: protect against too small mtu values. gre: refetch erspan header from skb->data after pskb_may_pull() pppoe: remove redundant BUG_ON() check in pppoe_pernet tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() tcp: tighten acceptance of ACKs not matching a child socket tcp: fix rejected syncookies due to stale timestamps lpc_eth: kernel BUG on remove tcp: md5: fix potential overestimation of TCP option space net: sched: allow indirect blocks to bind to clsact in TC net: core: rename indirect block ingress cb function net-sysfs: Call dev_hold always in netdev_queue_add_kobject net: dsa: fix flow dissection on Tx path net/tls: Fix return values to avoid ENOTSUPP net: avoid an indirect call in ____sys_recvmsg() ...
2019-12-05net/mlx5e: E-switch, Fix Ingress ACL groups in switchdev mode for prio tagParav Pandit2-38/+93
In cited commit, when prio tag mode is enabled, FTE creation fails due to missing group with valid match criteria. Hence, (a) create prio tag group metadata_prio_tag_grp when prio tag is enabled with match criteria for vlan push FTE. (b) Rename metadata_grp to metadata_allmatch_grp to reflect its purpose. Also when priority tag is enabled, delete metadata settings after deleting ingress rules, which are using it. Tide up rest of the ingress config code for unnecessary labels. Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: ethtool, Fix analysis of speed settingAya Levin1-10/+3
When setting speed to 100G via ethtool (AN is set to off), only 25G*4 is configured while the user, who has an advanced HW which supports extended PTYS, expects also 50G*2 to be configured. With this patch, when extended PTYS mode is available, configure PTYS via extended fields. Fixes: 4b95840a6ced ("net/mlx5e: Fix matching of speed to PRM link modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix translation of link mode into speedAya Levin1-0/+1
Add a missing value in translation of PTYS ext_eth_proto_oper to its corresponding speed. When ext_eth_proto_oper bit 10 is set, ethtool shows unknown speed. With this fix, ethtool shows speed is 100G as expected. Fixes: a08b4ed1373d ("net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix free peer_flow when refcount is 0Roi Dayan1-2/+5
It could be neigh update flow took a refcount on peer flow so sometimes we cannot release peer flow even if parent flow is being freed now. Fixes: 5a7e5bcb663d ("net/mlx5e: Extend tc flow struct with reference counter") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix freeing flow with kfree() and not kvfree()Roi Dayan1-1/+1
Flows are allocated with kzalloc() so free with kfree(). Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix SFF 8472 eeprom lengthEran Ben Elisha1-1/+1
SFF 8472 eeprom length is 512 bytes. Fix module info return value to support 512 bytes read. Fixes: ace329f4ab3b ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Query global pause state before setting prio2bufferHuy Nguyen1-2/+25
When the user changes prio2buffer mapping while global pause is enabled, mlx5 driver incorrectly sets all active buffers (buffer that has at least one priority mapped) to lossy. Solution: If global pause is enabled, set all the active buffers to lossless in prio2buffer command. Also, add error message when buffer size is not enough to meet xoff threshold. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-05net/mlx5e: Fix TXQ indices to be sequentialEran Ben Elisha4-22/+15
Cited patch changed (channel index, tc) => (TXQ index) mapping to be a static one, in order to keep indices consistent when changing number of channels or TCs. For 32 channels (OOB) and 8 TCs, real num of TXQs is 256. When reducing the amount of channels to 8, the real num of TXQs will be changed to 64. This indices method is buggy: - Channel #0, TC 3, the TXQ index is 96. - Index 8 is not valid, as there is no such TXQ from driver perspective (As it represents channel #8, TC 0, which is not valid with the above configuration). As part of driver's select queue, it calls netdev_pick_tx which returns an index in the range of real number of TXQs. Depends on the return value, with the examples above, driver could have returned index larger than the real number of tx queues, or crash the kernel as it tries to read invalid address of SQ which was not allocated. Fix that by allocating sequential TXQ indices, and hold a new mapping between (channel index, tc) => (real TXQ index). This mapping will be updated as part of priv channels activation, and is used in mlx5e_select_queue to find the selected queue index. The existing indices mapping (channel_tc2txq) is no longer needed, as it is used only for statistics structures and can be calculated on run time. Delete its definintion and updates. Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-04net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca1-4/+4
ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-30net/mlx5e: Fix build error without IPV6YueHaibing1-36/+38
If IPV6 is not set and CONFIG_MLX5_ESWITCH is y, building fails: drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c:322:5: error: redefinition of mlx5e_tc_tun_create_header_ipv6 int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c:7:0: drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h:67:1: note: previous definition of mlx5e_tc_tun_create_header_ipv6 was here mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use #ifdef to guard this, also move mlx5e_route_lookup_ipv6 to cleanup unused warning. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: e689e998e102 ("net/mlx5e: TC, Stub out ipv6 tun create header function") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-31/+1
Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...
2019-11-25Merge branch 'ib-guids' into rdma.git for-nextJason Gunthorpe6-420/+666
Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-24mlxsw: spectrum_router: Fix use of uninitialized adjacency indexAmit Cohen1-3/+2
When mlxsw_sp_adj_discard_write() is called for the first time, the value stored in 'mlxsw_sp->router->adj_discard_index' is invalid, as indicated by 'mlxsw_sp->router->adj_discard_index_valid' being set to 'false'. In this case, we should not use the value initially stored in 'mlxsw_sp->router->adj_discard_index' (0) and instead use the value allocated later in the function. Fixes: 983db6198f0d ("mlxsw: spectrum_router: Allocate discard adjacency entry when needed") Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24mlxsw: spectrum_router: After underlay moves, demote conflicting tunnelsPetr Machata1-1/+38
When a GRE tunnel is bound to an underlay netdevice and that netdevice is moved to a different VRF, that could cause two tunnels to have the same underlay local address in the same VRF. Linux in this situation dispatches the traffic according to the tunnel key (or lack thereof), but that cannot be offloaded to Spectrum devices. Detect this situation and unoffload the two impacted tunnels when it happens. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23Merge tag 'mlx5-updates-2019-11-22' of ↵Jakub Kicinski7-93/+186
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-11-22 1) Misc Cleanups 2) Software steering support for Geneve ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23net: use rhashtable_lookup() instead of rhashtable_lookup_fast()Taehee Yoo1-1/+1
rhashtable_lookup_fast() internally calls rcu_read_lock() then, calls rhashtable_lookup(). So if rcu_read_lock() is already held, rhashtable_lookup() is enough. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski14-72/+70
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock from commit c8183f548902 ("s390/qeth: fix potential deadlock on workqueue flush"), removed the code which was removed by commit 9897d583b015 ("s390/qeth: consolidate some duplicated HW cmd code"). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-22net/mlx5e: Remove redundant pointer checkEli Cohen1-12/+10
When code reaches the "out" label, n is guaranteed to be valid so we can unconditionally call neigh_release. Also change the label to release_neigh to better reflect the fact that we unconditionally free the neighbour and also match other labels convention. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>