summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw
AgeCommit message (Collapse)AuthorFilesLines
2022-06-07RDMA/qedr: Fix reporting QP timeout attributeKamal Heib2-1/+4
Make sure to save the passed QP timeout attribute when the QP gets modified, so when calling query QP the right value is reported and not the converted value that is required by the firmware. This issue was found while running the pyverbs tests. Fixes: cecbcddf6461 ("qedr: Add support for QP verbs") Link: https://lore.kernel.org/r/20220525132029.84813-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds45-1696/+1578
Pull rdma updates from Jason Gunthorpe: "Small collection of incremental improvement patches: - Minor code cleanup patches, comment improvements, etc from static tools - Clean the some of the kernel caps, reducing the historical stealth uAPI leftovers - Bug fixes and minor changes for rdmavt, hns, rxe, irdma - Remove unimplemented cruft from rxe - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer - flush_workqueue(system_unbound_wq) removal - Ensure rxe waits for objects to be unused before allowing the core to free them - Several rc quality bug fixes for hfi1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits) RDMA/rtrs-clt: Fix one kernel-doc comment RDMA/hfi1: Remove all traces of diagpkt support RDMA/hfi1: Consolidate software versions RDMA/hfi1: Remove pointless driver version RDMA/hfi1: Fix potential integer multiplication overflow errors RDMA/hfi1: Prevent panic when SDMA is disabled RDMA/hfi1: Prevent use of lock before it is initialized RDMA/rxe: Fix an error handling path in rxe_get_mcg() IB/core: Fix typo in comment RDMA/core: Fix typo in comment IB/hf1: Fix typo in comment IB/qib: Fix typo in comment IB/iser: Fix typo in comment RDMA/mlx4: Avoid flush_scheduled_work() usage IB/isert: Avoid flush_scheduled_work() usage RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr() RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq() RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx() RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx() RDMA/irdma: Add SW mechanism to generate completions on error ...
2022-05-24RDMA/hfi1: Remove all traces of diagpkt supportDennis Dalessandro1-23/+0
One of the concessions we made to get our driver upstream was to remove the diagnostic packet support. There is however still some cruft that was left over. Remove it. Link: https://lore.kernel.org/r/20220520183727.48973.93587.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Consolidate software versionsDennis Dalessandro2-18/+1
There is no need to have separate user and kernel software versions. There is a single software that the kernel is compatible with. Also remove the notion of a "kernel type" that is long since deprecated. Link: https://lore.kernel.org/r/20220520183722.48973.60262.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Remove pointless driver versionDennis Dalessandro2-21/+0
Driver versions have long been forbidden in the RDMA subsystem. We removed most of the code relating to them and have been very strict about not allowing. However there is some leftover versioning that we do not need. Get rid of that. Link: https://lore.kernel.org/r/20220520183717.48973.17418.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Fix potential integer multiplication overflow errorsDennis Dalessandro1-1/+1
When multiplying of different types, an overflow is possible even when storing the result in a larger type. This is because the conversion is done after the multiplication. So arithmetic overflow and thus in incorrect value is possible. Correct an instance of this in the inter packet delay calculation. Fix by ensuring one of the operands is u64 which will promote the other to u64 as well ensuring no overflow. Cc: stable@vger.kernel.org Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20220520183712.48973.29855.stgit@awfm-01.cornelisnetworks.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Prevent panic when SDMA is disabledDouglas Miller1-0/+2
If the hfi1 module is loaded with HFI1_CAP_SDMA off, a call to hfi1_write_iter() will dereference a NULL pointer and panic. A typical stack frame is: sdma_select_user_engine [hfi1] hfi1_user_sdma_process_request [hfi1] hfi1_write_iter [hfi1] do_iter_readv_writev do_iter_write vfs_writev do_writev do_syscall_64 The fix is to test for SDMA in hfi1_write_iter() and fail the I/O with EINVAL. Link: https://lore.kernel.org/r/20220520183706.48973.79803.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Prevent use of lock before it is initializedDouglas Miller1-5/+7
If there is a failure during probe of hfi1 before the sdma_map_lock is initialized, the call to hfi1_free_devdata() will attempt to use a lock that has not been initialized. If the locking correctness validator is on then an INFO message and stack trace resembling the following may be seen: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. Call Trace: register_lock_class+0x11b/0x880 __lock_acquire+0xf3/0x7930 lock_acquire+0xff/0x2d0 _raw_spin_lock_irq+0x46/0x60 sdma_clean+0x42a/0x660 [hfi1] hfi1_free_devdata+0x3a7/0x420 [hfi1] init_one+0x867/0x11a0 [hfi1] pci_device_probe+0x40e/0x8d0 The use of sdma_map_lock in sdma_clean() is for freeing the sdma_map memory, and sdma_map is not allocated/initialized until after sdma_map_lock has been initialized. This code only needs to be run if sdma_map is not NULL, and so checking for that condition will avoid trying to use the lock before it is initialized. Fixes: 473291b3ea0e ("IB/hfi1: Fix for early release of sdma context") Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20220520183701.48973.72434.stgit@awfm-01.cornelisnetworks.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24Merge tag 'v5.18' into rdma.git for-nextJason Gunthorpe5-38/+31
Following patches have dependencies. Resolve the merge conflict in drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names for the fs functions following linux-next: https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24IB/hf1: Fix typo in commentJulia Lawall1-1/+1
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lore.kernel.org/r/20220521111145.81697-83-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24IB/qib: Fix typo in commentJulia Lawall1-1/+1
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lore.kernel.org/r/20220521111145.81697-56-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-20RDMA/mlx4: Avoid flush_scheduled_work() usageTetsuo Handa3-8/+34
Flushing system-wide workqueues is dangerous and will be forbidden. Replace system_wq with local cm_wq. Link: https://lore.kernel.org/r/22f7183b-cc16-5a34-e879-7605f5efc6e6@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-19RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()Daisuke Matsuda1-1/+0
The pointer imr->umem is assigned twice. Fix this by removing the redundant one. Link: https://lore.kernel.org/r/20220518044914.1903125-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-17RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()Minghao Chi1-1/+0
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Link: https://lore.kernel.org/r/20220513081647.1631141-1-chi.minghao@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-37/+21
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()Wenpeng Liang4-254/+104
To reduce the code size and make the code clearer, replace all roce_get_xxx() with hr_reg_read() to read the data fields. Link: https://lore.kernel.org/r/20220512080012.38728-3-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-12RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()Wenpeng Liang2-270/+157
To reduce the code size and make the code clearer, replace all roce_set_xxx() with hr_reg_xxx() to write the data fields. Link: https://lore.kernel.org/r/20220512080012.38728-2-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-11RDMA/irdma: Add SW mechanism to generate completions on errorMustafa Ismail4-37/+210
HW flushes after QP in error state is not reliable. This can lead to application hang waiting on a completion for outstanding WRs. Implement a SW mechanism to generate completions for any outstanding WR's after the QP is modified to error. This is accomplished by starting a delayed worker after the QP is modified to error and the HW flush is performed. The worker will generate completions that will be returned to the application when it polls the CQ. This mechanism only applies to Kernel applications. Link: https://lore.kernel.org/r/20220425181624.1617-1-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09net/mlx5: Lag, expose number of lag portsMark Bloch4-2/+4
Downstream patches will add support for hardware lag with more than 2 ports. Add a way for users to query the number of lag ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04RDMA/hns: Remove the num_cqc_timer variableYixing Liu4-5/+3
The bt number of cqc_timer of HIP09 increases compared with that of HIP08. Therefore, cqc_timer_bt_num and num_cqc_timer do not match. As a result, the driver may fail to allocate cqc_timer. So the driver needs to uniquely uses cqc_timer_bt_num to represent the bt number of cqc_timer. Fixes: 0e40dc2f70cd ("RDMA/hns: Add timer allocation support for hip08") Link: https://lore.kernel.org/r/20220429093545.58070-1-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04RDMA/hns: Add the detection for CMDQ status in the device initialization processYangyang Li2-0/+27
CMDQ may fail during HNS ROCEE initialization. The following is the log when the execution fails: hns3 0000:bd:00.2: In reset process RoCE client reinit. hns3 0000:bd:00.2: CMDQ move tail from 840 to 839 hns3 0000:bd:00.2 hns_2: failed to set gid, ret = -11! hns3 0000:bd:00.2: CMDQ move tail from 840 to 839 <...> hns3 0000:bd:00.2: CMDQ move tail from 840 to 839 hns3 0000:bd:00.2: CMDQ move tail from 840 to 0 hns3 0000:bd:00.2: [cmd]token 14e mailbox 20 timeout. hns3 0000:bd:00.2 hns_2: set HEM step 0 failed! hns3 0000:bd:00.2 hns_2: set HEM address to HW failed! hns3 0000:bd:00.2 hns_2: failed to alloc mtpt, ret = -16. infiniband hns_2: Couldn't create ib_mad PD infiniband hns_2: Couldn't open port 1 hns3 0000:bd:00.2: Reset done, RoCE client reinit finished. However, even if ib_mad client registration failed, ib_register_device() still returns success to the driver. In the device initialization process, CMDQ execution fails because HW/FW is abnormal. Therefore, if CMDQ fails, the initialization function should set CMDQ to a fatal error state and return a failure to the caller. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/20220429093104.26687-1-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04RDMA/hns: Remove unnecessary ret variable from hns_roce_dereg_mr()Guo Zhengkui1-2/+1
Fix the following coccicheck warning: drivers/infiniband/hw/hns/hns_roce_mr.c:343:5-8: Unneeded variable: "ret". Return 0 directly instead. Link: https://lore.kernel.org/r/20220426070858.9098-1-guozhengkui@vivo.com Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-02RDMA/irdma: Fix possible crash due to NULL netdev in notifierMustafa Ismail1-12/+9
For some net events in irdma_net_event notifier, the netdev can be NULL which will cause a crash in rdma_vlan_dev_real_dev. Fix this by moving all processing to the NETEVENT_NEIGH_UPDATE case where the netdev is guaranteed to not be NULL. Fixes: 6702bc147448 ("RDMA/irdma: Fix netdev notifications for vlan's") Link: https://lore.kernel.org/r/20220425181703.1634-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-02RDMA/irdma: Reduce iWARP QP destroy timeShiraz Saleem1-6/+4
QP destroy is synchronous and waits for its refcnt to be decremented in irdma_cm_node_free_cb (for iWARP) which fires after the RCU grace period elapses. Applications running a large number of connections are exposed to high wait times on destroy QP for events like SIGABORT. The long pole for this wait time is the firing of the call_rcu callback during a CM node destroy which can be slow. It holds the QP reference count and blocks the destroy QP from completing. call_rcu only needs to make sure that list walkers have a reference to the cm_node object before freeing it and thus need to wait for grace period elapse. The rest of the connection teardown in irdma_cm_node_free_cb is moved out of the grace period wait in irdma_destroy_connection. Also, replace call_rcu with a simple kfree_rcu as it just needs to do a kfree on the cm_node Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://lore.kernel.org/r/20220425181703.1634-3-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-02RDMA/irdma: Flush iWARP QP if modified to ERR from RTR stateTatyana Nikolova2-13/+7
When connection establishment fails in iWARP mode, an app can drain the QPs and hang because flush isn't issued when the QP is modified from RTR state to error. Issue a flush in this case using function irdma_cm_disconn(). Update irdma_cm_disconn() to do flush when cm_id is NULL, which is the case when the QP is in RTR state and there is an error in the connection establishment. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20220425181703.1634-2-shiraz.saleem@intel.com Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Clean UMR QP type flow from mlx5_ib_post_send()Aharon Landau4-145/+1
No internal UMR operation is using mlx5_ib_post_send(), remove the UMR QP type logic from this function. Link: https://lore.kernel.org/r/0b2f368f14bc9266ebdf92a601ca4e1e5b1e1188.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Use mlx5_umr_post_send_wait() to update xltAharon Landau5-208/+116
Move mlx5_ib_update_mr_pas logic to umr.c, and use mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait(). Since it is the last use of mlx5_ib_post_send_wait(), remove it. Link: https://lore.kernel.org/r/55a4972f156aba3592a2fc9bcb33e2059acf295f.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Use mlx5_umr_post_send_wait() to update MR pasAharon Landau5-75/+164
Move mlx5_ib_update_mr_pas logic to umr.c, and use mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait(). Link: https://lore.kernel.org/r/ed8f2ee6c64804072155d727149abf7105f92536.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Move creation and free of translation tables to umr.cAharon Landau3-100/+116
The only use of the translation tables is to update the mkey translation by a UMR operation. Move the responsibility of creating and freeing them to umr.c Link: https://lore.kernel.org/r/1d93f1381be82a22aaf1168cdbdfb227eac1ce62.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Use mlx5_umr_post_send_wait() to rereg pd accessAharon Landau3-25/+45
Move rereg_pd_access logic to umr.c, and use mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait(). Link: https://lore.kernel.org/r/18da4f47edbc2561f652b7ee4e7a5269e866af77.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Use mlx5_umr_post_send_wait() to revoke MRsAharon Landau3-28/+34
Move the revoke_mr logic to umr.c, and using mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait(). In the new implementation, do not zero out the access flags. Before reusing the MR, we will update it to the required access. Link: https://lore.kernel.org/r/63717dfdaf6007f81b3e6dbf598f5bf3875ce86f.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Introduce mlx5_umr_post_send_wait()Aharon Landau2-0/+104
Introduce mlx5_umr_post_send_wait() that uses a UMR adjusted flow for posting WQEs. The next patches will gradually move UMR operations to use this flow. Once done, will get rid of mlx5_ib_post_send_wait(). mlx5_umr_post_send_wait gets already written WQE segments and will only memcpy it to the SQ. This way, we avoid packing all the data in a WR just to unpack it into the WQE. Link: https://lore.kernel.org/r/f027dd592fde62402b2d49efded8d1d22229d22b.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Expose wqe posting helpers outside of wr.cAharon Landau2-111/+121
Split posting WQEs logic to helpers, generalize it and expose for future use in the UMR post send. Link: https://lore.kernel.org/r/a2b0f6cd96f0405a65d38e82c6ae7ef34dcb34bc.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Simplify get_umr_update_access_mask()Aharon Landau1-10/+5
Instead of getting the update access capabilities each call to get_umr_update_access_mask(), pass struct mlx5_ib_dev and get the capabilities inside the function. Link: https://lore.kernel.org/r/f22b8a84ef32e29ada26691f06b57e2ed5943b76.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Move mkey ctrl segment logic to umr.cAharon Landau5-141/+147
Move set_reg_umr_segment() and its helpers to umr.c. Link: https://lore.kernel.org/r/5a7fac8ae8543521d19d174663245ae84b910310.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Move umr checks to umr.hAharon Landau5-70/+72
Move mlx5_ib_can_load_pas_with_umr() and mlx5_ib_can_reconfig_with_umr() to umr.h and rename them accordingly. Link: https://lore.kernel.org/r/1b799b0142534a63dfd5bacc5f8ad2256d7777ad.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/mlx5: Move init and cleanup of UMR to umr.cAharon Landau4-103/+125
The first patch in a series to split UMR logic to a dedicated file. As a start, move the init and cleanup of UMR resources to umr.c. Link: https://lore.kernel.org/r/849e632dd1945a2534712a320cc5779f2149ba96.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-19RDMA/mlx5: Fix flow steering egress flowLeon Romanovsky1-5/+0
The commit mentioned in Fixes line removed the function that was called to check validity of esp_aes_gcm attribute. Sadly, that is_valid_esp_aes_gcm() returned success even for specs without esp_aes_gcm at all. So the right fix will be to remove whole if () and such fix the following error observed in smatch too. drivers/infiniband/hw/mlx5/fs.c:1126 _create_flow_rule() warn: duplicate check 'is_egress' (previous on line 1098) Fixes: de8bdb476908 ("RDMA/mlx5: Drop crypto flow steering API") Link: https://lore.kernel.org/r/11b31c1f85bc8c8add385529aa3f307c3b383a11.1649842371.git.leonro@nvidia.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-19RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core()Duoming Zhou1-6/+1
There is a deadlock in irdma_cleanup_cm_core(), which is shown below: (Thread 1) | (Thread 2) | irdma_schedule_cm_timer() irdma_cleanup_cm_core() | add_timer() spin_lock_irqsave() //(1) | (wait a time) ... | irdma_cm_timer_tick() del_timer_sync() | spin_lock_irqsave() //(2) (wait timer to stop) | ... We hold cm_core->ht_lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need cm_core->ht_lock in position (2) of thread 2. As a result, irdma_cleanup_cm_core() will block forever. This patch removes the check of timer_pending() in irdma_cleanup_cm_core(), because the del_timer_sync() function will just return directly if there isn't a pending timer. As a result, the lock is redundant, because there is no resource it could protect. Link: https://lore.kernel.org/r/20220418153322.42524-1-duoming@zju.edu.cn Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2-1/+10
2022-04-12Merge branch 'mlx5-next' of ↵Jason Gunthorpe2-252/+2
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== Mellanox shared branch that includes: * Removal of FPGA TLS code https://lore.kernel.org/all/cover.1649073691.git.leonro@nvidia.com Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code, clean build config options and delete useless kTLS vs. TLS separation. [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf * Removal of FPGA IPsec code https://lore.kernel.org/all/cover.1649232994.git.leonro@nvidia.com Together with FPGA TLS, the IPsec went to EOL state in the November of 2019 [1]. Exactly like FPGA TLS, no active customers exist for this upstream code and all the complexity around that area can be deleted. [2] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf * Fix to undefined behavior from Borislav https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de ==================== * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Remove not-implemented IPsec capabilities net/mlx5: Remove ipsec_ops function table net/mlx5: Reduce kconfig complexity while building crypto support net/mlx5: Move IPsec file to relevant directory net/mlx5: Remove not-needed IPsec config net/mlx5: Align flow steering allocation namespace to common style net/mlx5: Unify device IPsec capabilities check net/mlx5: Remove useless IPsec device checks net/mlx5: Remove ipsec vs. ipsec offload file separation RDMA/core: Delete IPsec flow action logic from the core RDMA/mlx5: Drop crypto flow steering API RDMA/mlx5: Delete never supported IPsec flow action net/mlx5: Remove FPGA ipsec specific statistics net/mlx5: Remove XFRM no_trailer flag net/mlx5: Remove not-used IDA field from IPsec struct net/mlx5: Delete metadata handling logic net/mlx5_fpga: Drop INNOVA IPsec support IB/mlx5: Fix undefined behavior due to shift overflowing the constant net/mlx5: Cleanup kTLS function names and their exposure net/mlx5: Remove tls vs. ktls separation as it is the same net/mlx5: Remove indirection in TLS build net/mlx5: Reliably return TLS device capabilities net/mlx5_fpga: Drop INNOVA TLS support Link: https://lore.kernel.org/r/20220409055303.1223644-1-leon@kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-11RDMA/hns: Init the variable at the suitable placeHaoyue Xu1-1/+2
Assigning a value to ret in the init statement of a for-loop makes the code less readable. Link: https://lore.kernel.org/r/20220409083254.9696-6-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-11RDMA/hns: Add judgment on the execution result of CMDQ that free vf resourceWenpeng Liang1-4/+13
CDMQ may fail to execute, so its return value should not be ignored. Link: https://lore.kernel.org/r/20220409083254.9696-5-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-11RDMA/hns: Remove redundant variable "ret"Guofeng Yue1-4/+1
It is completely redundant for this function to use "ret" to store the return value of the subfunction. Link: https://lore.kernel.org/r/20220409083254.9696-4-liangwenpeng@huawei.com Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-11RDMA/hns: Remove unused function to_hns_roce_state()Yixing Liu2-31/+0
This function is only used in HIP06, which has been removed. So remove it. Link: https://lore.kernel.org/r/20220409083254.9696-3-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-11RDMA/hns: Remove unnecessary check for the sgid_attr when modifying QPChengchang Tang1-3/+1
The sgid_attr cannot be null in this scenario. This judgment is redundant. Fixes: 606bf89e98ef ("RDMA/hns: Refactor for hns_roce_v2_modify_qp function") Link: https://lore.kernel.org/r/20220409083254.9696-2-liangwenpeng@huawei.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-09RDMA/mlx5: Drop crypto flow steering APILeon Romanovsky2-248/+2
The mlx5 flow steering crypto API was intended to be used in FPGA devices, which is not supported for years already. The removal of mlx5 crypto FPGA code together with inability to configure encryption keys makes the low steering API completely unusable. So delete the code, so any ESP flow steering requests will fail with not supported error, as it is happening now anyway as no device support this type of API. Link: https://lore.kernel.org/r/634a5face7734381463d809bfb89850f6998deac.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09RDMA/mlx5: Delete never supported IPsec flow actionLeon Romanovsky1-4/+0
The IPSEC_REQUIRED_METADATA capability bit is never set, and can be safely removed from the flow action flags. Link: https://lore.kernel.org/r/697cd60bd5c9b6a004c449c1a41c2798fac844ff.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-08RDMA/hfi1: Fix use-after-free bug for mm structDouglas Miller1-0/+6
Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may represent the last reference held on the task mm. hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed before the final use in hfi1_release_user_pages(). A new task may allocate the mm structure while it is still being used, resulting in problems. One manifestation is corruption of the mmap_sem counter leading to a hang in down_write(). Another is corruption of an mm struct that is in use by another task. Fixes: 3d2a9d642512 ("IB/hfi1: Ensure correct mm is used at all times") Link: https://lore.kernel.org/r/20220408133523.122165.72975.stgit@awfm-01.cornelisnetworks.com Cc: <stable@vger.kernel.org> Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08RDMA/usnic: Refactor usnic_uiom_alloc_pd()Robin Murphy3-6/+4
Rather than hard-coding pci_bus_type, pass the PF device through to usnic_uiom_alloc_pd() and retrieve its bus there. This prepares for iommu_domain_alloc() changing to take a device rather than a bus_type. Link: https://lore.kernel.org/r/ef607cb3f5a09920b86971b8c8e60af8c647457e.1649169359.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>