linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2020-05-24	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller	1	-1/+1
	The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller	1	-6/+1
	Daniel Borkmann says: ==================== pull-request: bpf-next 2020-05-23 The following pull-request contains BPF updates for your net-next tree. We've added 50 non-merge commits during the last 8 day(s) which contain a total of 109 files changed, 2776 insertions(+), 2887 deletions(-). The main changes are: 1) Add a new AF_XDP buffer allocation API to the core in order to help lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe as well as mlx5 have been moved over to the new API and also gained a small improvement in performance, from Björn Töpel and Magnus Karlsson. 2) Add getpeername()/getsockname() attach types for BPF sock_addr programs in order to allow for e.g. reverse translation of load-balancer backend to service address/port tuple from a connected peer, from Daniel Borkmann. 3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers being non-NULL, e.g. if after an initial test another non-NULL test on that pointer follows in a given path, then it can be pruned right away, from John Fastabend. 4) Larger rework of BPF sockmap selftests to make output easier to understand and to reduce overall runtime as well as adding new BPF kTLS selftests that run in combination with sockmap, also from John Fastabend. 5) Batch of misc updates to BPF selftests including fixing up test_align to match verifier output again and moving it under test_progs, allowing bpf_iter selftest to compile on machines with older vmlinux.h, and updating config options for lirc and v6 segment routing helpers, from Stanislav Fomichev, Andrii Nakryiko and Alan Maguire. 6) Conversion of BPF tracing samples outdated internal BPF loader to use libbpf API instead, from Daniel T. Lee. 7) Follow-up to BPF kernel test infrastructure in order to fix a flake in the XDP selftests, from Jesper Dangaard Brouer. 8) Minor improvements to libbpf's internal hashmap implementation, from Ian Rogers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22	net/mlx5e: Fix inner tirs handling	Roi Dayan	1	-1/+1
	In the cited commit inner_tirs argument was added to create and destroy inner tirs, and no indication was added to mlx5e_modify_tirs_hash() function. In order to have a consistent handling, use inner_indir_tir[0].tirn in tirs destroy/modify function as an indication to whether inner tirs are created. Inner tirs are not created for representors and before this commit, a call to mlx5e_modify_tirs_hash() was sending HW commands to modify non-existent inner tirs. Fixes: 46dc933cee82 ("net/mlx5e: Provide explicit directive if to create inner indirect tirs") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-21	mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL	Björn Töpel	1	-6/+1
	Use the new MEM_TYPE_XSK_BUFF_POOL API in lieu of MEM_TYPE_ZERO_COPY in mlx5e. It allows to drop a lot of code from the driver (which is now common in AF_XDP core and was related to XSK RX frame allocation, DMA mapping, etc.) and slightly improve performance (RX +0.8 Mpps, TX +0.4 Mpps). rfc->v1: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim) v1->v2: Fix DMA address handling, set XDP metadata to invalid. (Maxim) v2->v3: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use xsk_buff API for DMA sync on TX, add performance numbers. (Maxim) v3->v4: Remove unused variable num_xsk_frames. (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-12-bjorn.topel@gmail.com
2020-05-15	net/mlx5e: Take DCBNL-related definitions into dedicated files	Tariq Toukan	1	-49/+1
	Take DCBNL-related definitions out of the common en.h header, Use a dedicated header file for exposing them. Some need not to be exposed, use them locally in the .c file. Use stubs to eliminate use of CONFIG_MLX5_CORE_EN_DCB in the generic control flows. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15	net/mlx5e: IPoIB, Enable loopback packets for IPoIB interfaces	Erez Shitrit	1	-1/+2
	Enable loopback of unicast and multicast traffic for IPoIB enhanced mode. This will allow interfaces with the same pkey to communicate between them e.g cloned interfaces that located in different namespaces. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15	net/mlx5: Move internal timer read function to clock library	Eran Ben Elisha	1	-1/+0
	Move mlx5_read_internal_timer() into lib/clock.c file as it is being used there. As such, make this function a static one. In addition, rearrange headers include to support function move. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-14	mlx5: Rx queue setup time determine frame_sz for XDP	Jesper Dangaard Brouer	1	-0/+1
	The mlx5 driver have multiple memory models, which are also changed according to whether a XDP bpf_prog is attached. The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.: # ethtool --set-priv-flags mlx5p2 rx_striding_rq off On the general case with 4K page_size and regular MTU packet, then the frame_sz is 2048 and 4096 when XDP is enabled, in both modes. The info on the given frame size is stored differently depending on the RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe. In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ). In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is what the XDP case cares about. To reduce effect on fast-path, this patch determine the frame_sz at setup time, to avoid determining the memory model runtime. Variable is named frame0_sz to make it clear that this is only the frame size of the first fragment. This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe as it have done a DMA-map on the entire PAGE_SIZE. The driver also already does a XDP length check against sq->hw_mtu on the possible XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe(). V3+4: Change variable name first_frame_sz to frame0_sz V2: Fix that frag_size need to be recalc before creating SKB. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945348021.97035.12295039384250022883.stgit@firesoul
2020-05-09	net/mlx5e: Take TX WQE info structures out of general EN header	Tariq Toukan	1	-27/+0
	Into the txrx header file. The mlx5e_sq_wqe_info structure describes WQE info for the ICOSQ, rename it to better reflect this. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-09	net/mlx5e: Return void from mlx5e_sq_xmit and mlx5i_sq_xmit	Maxim Mikityanskiy	1	-2/+2
	mlx5e_sq_xmit and mlx5i_sq_xmit always return NETDEV_TX_OK. Drop the return value to simplify the code. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30	net/mlx5e: Rename ICOSQ WQE info struct and field	Maxim Mikityanskiy	1	-2/+2
	Structs mlx5e_txqsq and mlx5e_xdpsq contain wqe_info arrays to store supplementary information corresponding to WQEs in the queue. Struct mlx5e_icosq also has such an array, but it's called differently - ico_wqe. This patch renames it to unify with the other SQs. In addition, rename the struct to emphasize its specific usage. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-30	Merge branch 'mlx5-next' of ↵	Saeed Mahameed	1	-3/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5 updates for both net-next and rdma-next: 1) HW bits and definitions for TLS and IPsec offlaods 2) Release all pages capability bits 3) New command interface helpers and some code cleanup as a result 4) Move qp.c out of mlx5 core driver into mlx5_ib rdma driver Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-23	net/mlx5: Update transobj.c new cmd interface	Leon Romanovsky	1	-3/+3
	Do mass update of transobj.c to reuse newly introduced mlx5_cmd_exec_in*() interfaces. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-20	net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns	Maxim Mikityanskiy	1	-1/+2
	XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK ICOSQ. When the application floods the driver with wakeup requests by calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq, the XSK ICOSQ may overflow. Multiple NOPs are not required and won't accelerate the process, so avoid posting a second NOP if there is one already on the way. This way we also avoid increasing the queue size (which might not help anyway). Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller	1	-0/+2
	Overlapping header include additions in macsec.c A bug fix in 'net' overlapping with the removal of 'version' string in ena_netdev.c Overlapping test additions in selftests Makefile Overlapping PCI ID table adjustments in iwlwifi driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24	net/mlx5e: Fix ICOSQ recovery flow with Striding RQ	Aya Levin	1	-0/+1
	In striding RQ mode, the buffers of an RX WQE are first prepared and posted to the HW using a UMR WQEs via the ICOSQ. We maintain the state of these in-progress WQEs in the RQ SW struct. In the flow of ICOSQ recovery, the corresponding RQ is not in error state, hence: - The buffers of the in-progress WQEs must be released and the RQ metadata should reflect it. - Existing RX WQEs in the RQ should not be affected. For this, wrap the dealloc of the in-progress WQEs in a function, and use it in the ICOSQ recovery flow instead of mlx5e_free_rx_descs(). Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24	net/mlx5e: Enhance ICOSQ WQE info fields	Aya Levin	1	-0/+1
	Add number of WQEBBs (WQE's Basic Block) to WQE info struct. Set the number of WQEBBs on WQE post, and increment the consumer counter (cc) on completion. In case of error completions, the cc was mistakenly not incremented, keeping a gap between cc and pc (producer counter). This failed the recovery flow on the ICOSQ from a CQE error which timed-out waiting for the cc and pc to meet. Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09	net/mlx5e: Show/set Rx network flow classification rules on ul rep	Vlad Buslov	1	-0/+3
	Reuse infrastructure that already exists for pf in legacy mode to show/set Rx network flow classification rules for uplink representors. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-09	net/mlx5e: Show/set Rx flow indir table and RSS hash key on ul rep	Vlad Buslov	1	-0/+3
	Reuse infrastructure that already exists for pf in legacy mode to show/set Rx flow hash indirection table and RSS hash key for uplink representors. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-09	Merge branch 'mlx5-next' of ↵	Saeed Mahameed	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This series adds some HW bits and definitions for mlx5 driver, to be used by downstream features in both rdma and netdev branches. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: HW bit for goto chain offload support net/mlx5: Expose link speed directly net/mlx5: Introduce TLS and IPSec objects enums net/mlx5: Introduce egress acl forward-to-vport capability net/mlx5: Expose raw packet pacing APIs net/mlx5e: Replace zero-length array with flexible-array member net/mlx5: fix spelling mistake "reserverd" -> "reserved" Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-03	net/mlx5e: Use devlink virtual flavour for VF devlink port	Parav Pandit	1	-1/+1
	Use newly introduce 'virtual' port flavour for devlink port of PCI VF devlink device in non-representors mode. While at it, remove recently introduced empty lines at end of the file. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27	net/mlx5e: Add support for devlink-port in non-representors mode	Vladyslav Tarasiuk	1	-0/+1
	Added devlink_port field to mlx5e_priv structure and a callback to netdev ops to enable devlink to get info about the port. The port registration happens at driver initialization. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25	net/mlx5e: Add context to the preactivate hook	Maxim Mikityanskiy	1	-4/+11
	Sometimes the preactivate hook of mlx5e_safe_switch_channels needs more parameters than just struct mlx5e_priv . For such cases, a new parameter (void context) is added to preactivate hooks. Some of the existing normal functions are currently used as preactivate callbacks. To avoid adding an extra unused parameter, they are wrapped in an automatic way using the MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX macro. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25	net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases	Maxim Mikityanskiy	1	-1/+10
	Currently, mlx5e notifies the kernel about the number of queues and sets the default XPS cpumasks when channels are activated. This implementation has several corner cases, in which the kernel may not be updated on time, or XPS cpumasks may be reset when not directly touched by the user. This commit fixes these corner cases to match the following expected behavior: 1. The number of queues always corresponds to the number of channels configured. 2. XPS cpumasks are set to driver's defaults on netdev attach. 3. XPS cpumasks set by user are not reset, unless the number of channels changes. If the number of channels changes, they are reset to driver's defaults. (In general case, when the number of channels increases or decreases, it's not possible to guess how to convert the current XPS cpumasks to work with the new number of channels, so we let the user reconfigure it if they change the number of channels.) XPS cpumasks are no longer stored per channel. Only one temporary cpumask is used. The old stored cpumasks didn't reflect the user's changes and were not used after applying them. A scratchpad area is added to struct mlx5e_priv. As cpumask_var_t requires allocation, and the preactivate hook can't fail, we need to preallocate the temporary cpumask in advance. It's stored in the scratchpad. Fixes: 149e566fef81 ("net/mlx5e: Expand XPS cpumask to cover all online cpus") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25	net/mlx5e: Use preactivate hook to set the indirection table	Maxim Mikityanskiy	1	-0/+1
	mlx5e_ethtool_set_channels updates the indirection table before switching to the new channels. If the switch fails, the indirection table is new, but the channels are old, which is wrong. Fix it by using the preactivate hook of mlx5e_safe_switch_channels to update the indirection table at the stage when nothing can fail anymore. As the code that updates the indirection table is now encapsulated into a new function, use that function in the attach flow when the driver has to reduce the number of channels, and prepare the code for the next commit. Fixes: 85082dba0a ("net/mlx5e: Correctly handle RSS indirection table when changing number of channels") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25	net/mlx5e: Rename hw_modify to preactivate	Maxim Mikityanskiy	1	-3/+3
	mlx5e_safe_switch_channels accepts a callback to be called before activating new channels. It is intended to configure some hardware parameters in cases where channels are recreated because some configuration has changed. Recently, this callback has started being used to update the driver's internal MLX5E_STATE_XDP_OPEN flag, and the following patches also intend to use this callback for software preparations. This patch renames the hw_modify callback to preactivate, so that the name fits better. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-19	net/mlx5e: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	1	-1/+1
	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-22	net/mlx5e: Convert stats groups array to array of group pointers	Saeed Mahameed	1	-1/+1
	Convert stats groups array to array of "stats group" pointers to allow sharing and individual selection of groups per profile as illustrated in the next patches. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2020-01-22	net/mlx5e: Profile specific stats groups	Saeed Mahameed	1	-1/+2
	Attach stats groups array to the profiles and make the stats utility functions (get_num, update, fill, fill_strings) generic and use the profile->stats_grps rather the hardcoded NIC stats groups. This will allow future extension to have per profile stats groups. In this patch mlx5e NIC and IPoIB will still share the same stats groups. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2020-01-07	net/mlx5: Increase the max number of channels to 128	Fan Li	1	-3/+3
	Currently the max number of channels is limited to 64, which is half of the indirection table size to allow some flexibility. But on servers with more than 64 cores, users may want to utilize more queues. This patch increases the advertised max number of channels to 128 by changing the ratio between channels and indirection table slots to 1:1. At the same time, the driver still enable no more than 64 channels at loading. Users can change it by ethtool afterwards. Signed-off-by: Fan Li <fanl@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-12-19	net/mlx5e: Fix concurrency issues between config flow and XSK	Maxim Mikityanskiy	1	-1/+1
	After disabling resources necessary for XSK (the XDP program, channels, XSK queues), use synchronize_rcu to wait until the XSK wakeup function finishes, before freeing the resources. Suspend XSK wakeups during switching channels. If the XDP program is being removed, synchronize_rcu before closing the old channels to allow XSK wakeup to complete. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-3-maximmi@mellanox.com
2019-12-05	net/mlx5e: Fix TXQ indices to be sequential	Eran Ben Elisha	1	-1/+1
	Cited patch changed (channel index, tc) => (TXQ index) mapping to be a static one, in order to keep indices consistent when changing number of channels or TCs. For 32 channels (OOB) and 8 TCs, real num of TXQs is 256. When reducing the amount of channels to 8, the real num of TXQs will be changed to 64. This indices method is buggy: - Channel #0, TC 3, the TXQ index is 96. - Index 8 is not valid, as there is no such TXQ from driver perspective (As it represents channel #8, TC 0, which is not valid with the above configuration). As part of driver's select queue, it calls netdev_pick_tx which returns an index in the range of real number of TXQs. Depends on the return value, with the examples above, driver could have returned index larger than the real number of tx queues, or crash the kernel as it tries to read invalid address of SQ which was not allocated. Fix that by allocating sequential TXQ indices, and hold a new mapping between (channel index, tc) => (real TXQ index). This mapping will be updated as part of priv channels activation, and is used in mlx5e_select_queue to find the selected queue index. The existing indices mapping (channel_tc2txq) is no longer needed, as it is used only for statistics structures and can be calculated on run time. Delete its definintion and updates. Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18	net/mlx5e: kTLS, Limit DUMP wqe size	Tariq Toukan	1	-0/+1
	HW expects the data size in DUMP WQEs to be up to MTU. Make sure they are in range. We elevate the frag page refcount by 'n-1', in addition to the one obtained in tx_sync_info_get(), having an overall of 'n' references. We bulk increments by using a single page_ref_add() command, to optimize perfermance. The refcounts are released one by one, by the corresponding completions. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-18	net/mlx5e: kTLS, Save only the frag page to release at completion	Tariq Toukan	1	-1/+1
	In TX resync flow where DUMP WQEs are posted, keep a pointer to the fragment page to unref it upon completion, instead of saving the whole fragment. In addition, move it the end of the arguments list in tx_fill_wi(). Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-28	net/mlx5e: Change function's position to a more fitting file	Aya Levin	1	-6/+0
	Move function which indicates whether tunnel inner flow table is supported from en.h to en_fs.c. It fits better right after tunnel protocol rules definitions. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-28	net/mlx5e: Support LAG TX port affinity distribution	Maxim Mikityanskiy	1	-1/+10
	When the VF LAG is in use, round-robin the TX affinity of channels among the different ports, if supported by the firmware. Create a set of TISes per port, while doing round-robin of the channels over the different sets. Let all SQs of a channel share the same set of TISes. If lag_tx_port_affinity HCA cap bit is supported, num_lag_ports > 1 and we aren't the LACP owner (PF in the regular use), assign the affinities, otherwise use tx_affinity == 0 in TIS context to let the FW assign the affinities itself. The TISes of the LACP owner are mapped only to the native physical port. For VFs, the starting port for round-robin is determined by its vhca_id, because a VF may have only one channel if attached to a single-core VM. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-28	net/mlx5e: Expose new function for TIS destroy loop	Tariq Toukan	1	-0/+1
	For better modularity and code sharing. Function internal change to be introduced in the next patches. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-22	net/mlx5e: Add mlx5e HV VHCA stats agent	Eran Ben Elisha	1	-0/+13
	HV VHCA stats agent is responsible on running a preiodic rx/tx packets/bytes stats update. Currently the supported format is version MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated for statistics data transfer from the VF to the PF. The reporter fetch the statistics data from all opened channels, fill it in a buffer and send it to mlx5_hv_vhca_write_agent. As the stats layer should include some metadata per block (sequence and offset), the HV VHCA layer shall modify the buffer before actually send it over block 1. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20	net/mlx5e: Report and recover from CQE with error on RQ	Aya Levin	1	-0/+3
	Add support for report and recovery from error on completion on RQ by setting the queue back to ready state. Handle only errors with a syndrome indicating the RQ might enter error state and could be recovered. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20	net/mlx5e: Report and recover from CQE error on ICOSQ	Aya Levin	1	-0/+8
	Add support for report and recovery from error on completion on ICOSQ. Deactivate RQ and flush, then deactivate ICOSQ. Set the queue back to ready state (firmware) and reset the ICOSQ and the RQ (software resources). Finally, activate the ICOSQ and the RQ. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20	net/mlx5e: Add support to rx reporter diagnose	Aya Levin	1	-0/+21
	Add rx reporter, which supports diagnose call-back. Diagnostics output include: information common to all RQs: RQ type, RQ size, RQ stride size, CQ size and CQ stride size. In addition advertise information per RQ and its related icosq and attached CQ. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4308 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 channel ix: 1 rqn: 4313 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 channel ix: 2 rqn: 4318 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1040 HW status: 0 channel ix: 3 rqn: 4323 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1044 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp { "Common config": { "RQ": { "type": 2, "stride size": 2048, "size": 8 }, "CQ": { "stride size": 64, "size": 1024 } }, "RQs": [ { "channel ix": 0, "rqn": 4308, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1032, "HW status": 0 } },{ "channel ix": 1, "rqn": 4313, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1036, "HW status": 0 } },{ "channel ix": 2, "rqn": 4318, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1040, "HW status": 0 } },{ "channel ix": 3, "rqn": 4323, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1044, "HW status": 0 } } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller	1	-2/+9
	Merge conflict of mlx5 resolved using instructions in merge commit 9566e650bf7fdf58384bb06df634f7531ca3a97e. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15	net/mlx5e: Fix compatibility issue with ethtool flash device	Eran Ben Elisha	1	-0/+2
	Cited patch deleted ethtool flash device support, as ethtool core can fallback into devlink flash callback. However, this is supported only if there is a devlink port registered over the corresponding netdevice. As mlx5e do not have devlink port support over native netdevice, it broke the ability to flash device via ethtool. This patch re-add the ethtool callback to avoid user functionality breakage when trying to flash device via ethtool. Fixes: 9c8bca2637b8 ("mlx5: Move firmware flash implementation to devlink") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-08	net/mlx5e: kTLS, Fix progress params context WQE layout	Tariq Toukan	1	-2/+7
	The TLS progress params context WQE should not include an Eth segment, drop it. In addition, align the tls_progress_params layout with the HW specification document: - fix the tisn field name. - remove the valid bit. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller	1	-9/+3
	Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01	net/mlx5e: Tx, Soften inline mode VLAN dependencies	Tariq Toukan	1	-1/+1
	If capable, use zero inline mode in TX WQE for non-VLAN packets. For VLAN ones, keep the enforcement of at least L2 inline mode, unless the WQE VLAN insertion offload cap is on. Performance: Tested single core packet rate of 64Bytes. NIC: ConnectX-5 CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz pktgen: Before: 12.46 Mpps After: 14.65 Mpps (+17.5%) XDP_TX: The MPWQE flow is not affected, as it already has this optimization. So we test with priv-flag xdp_tx_mpwqe: off. Before: 9.90 Mpps After: 10.20 Mpps (+3%) Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Tested-by: Noam Stolero <noams@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-01	net/mlx5e: XDP, Close TX MPWQE session when no room for inline packet left	Shay Agroskin	1	-2/+0
	In MPWQE mode, when transmitting packets with XDP, a packet that is smaller than a certain size (set to 256 bytes) would be sent inline within its WQE TX descriptor (mem-copied), in case the hardware tx queue is congested beyond a pre-defined water-mark. If a MPWQE cannot contain an additional inline packet, we close this MPWQE session, and send the packet inlined within the next MPWQE. To save some MPWQE session close+open operations, we don't open MPWQE sessions that are contiguously smaller than certain size (set to the HW MPWQE maximum size). If there isn't enough contiguous room in the send queue, we fill it with NOPs and wrap the send queue index around. This way, qualified packets are always sent inline. Perf tests: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz XDP_TX: With 24 channels: \| ------ \| bounced packets \| inlined packets \| inline ratio \| \| before \| 113.6Mpps \| 96.3Mpps \| 84% \| \| after \| 115Mpps \| 99.5Mpps \| 86% \| With one channel: \| ------ \| bounced packets \| inlined packets \| inline ratio \| \| before \| 6.7Mpps \| 0pps \| 0% \| \| after \| 6.8Mpps \| 0pps \| 0% \| As we can see, there is improvement in both inline ratio and overall packet rate for 24 channels. Also, we see no degradation for the one-channel case. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-25	net/mlx5e: Fix wrong max num channels indication	Tariq Toukan	1	-9/+3
	No XSK support in the enhanced IPoIB driver and representors. Add a profile property to specify this, and enhance the logic that calculates the max number of channels to take it into account. Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-11	Merge tag 'mlx5-fixes-2019-07-11' of ↵	David S. Miller	1	-0/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-07-11 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.15 ('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn') For -stable v5.1 ('net/mlx5e: Fix port tunnel GRE entropy control') ('net/mlx5e: Rx, Fix checksum calculation for new hardware') ('net/mlx5e: Fix return value from timeout recover function') ('net/mlx5e: Fix error flow in tx reporter diagnose') For -stable v5.2 ('net/mlx5: E-Switch, Fix default encap mode') Conflict note: This pull request will produce a small conflict when merged with net-next. In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c Take the hunk from net and replace: esw_offloads_steering_init(esw, vf_nvports, total_nvports); with: esw_offloads_steering_init(esw); ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11	net/mlx5e: Rx, Fix checksum calculation for new hardware	Saeed Mahameed	1	-0/+1
	CQE checksum full mode in new HW, provides a full checksum of rx frame. Covering bytes starting from eth protocol up to last byte in the received frame (frame_size - ETH_HLEN), as expected by the stack. Fixing up skb->csum by the driver is not required in such case. This fix is to avoid wrong checksum calculation in drivers which already support the new hardware with the new checksum mode. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>