summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/bnxt_re/qplib_rcfw.c
AgeCommit message (Collapse)AuthorFilesLines
2021-11-29RDMA/bnxt_re: Use bitmap_zalloc() when applicableChristophe JAILLET1-4/+2
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid some open-coded arithmetic in allocator arguments. Also change the corresponding 'kfree()' into 'bitmap_free()' to keep consistency. Link: https://lore.kernel.org/r/5c029daf43b92fdc27926fe8a98084843437c498.1637872888.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-16RDMA/bnxt_re: Scan the whole bitmap when checking if "disabling RCFW with ↵Christophe JAILLET1-4/+2
pending cmd-bit" The 'cmdq->cmdq_bitmap' bitmap is 'rcfw->cmdq_depth' bits long. The size stored in 'cmdq->bmap_size' is the size of the bitmap in bytes. Remove this erroneous 'bmap_size' and use 'rcfw->cmdq_depth' directly in 'bnxt_qplib_disable_rcfw_channel()'. Otherwise some error messages may be missing. Other uses of 'cmdq_bitmap' already take into account 'rcfw->cmdq_depth' directly. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/47ed717c3070a1d0f53e7b4c768a4fd11caf365d.1636707421.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20RDMA/bnxt_re: Use GFP_KERNEL in non atomic contextSelvin Xavier1-2/+2
Use GFP_KERNEL instead of GFP_ATOMIC while allocating control path structures which will be only called from non atomic context Link: https://lore.kernel.org/r/1631709163-2287-10-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20RDMA/bnxt_re: Reduce the delay in polling for hwrm command completionSelvin Xavier1-1/+1
Driver has 1ms delay between the polling for atomic command completion. Polling immediately after issuing command usually doesn't report any completions. So all commands in the blocking path needs two iterations. So effectively 1ms spend on each command. HW requires much lesser time for each command. So reduce the delay to 1us and increase the iteration count to wait for the same time. Link: https://lore.kernel.org/r/1631709163-2287-5-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26RDMA/bnxt_re: Move device to error state upon device crashSelvin Xavier1-0/+4
When the L2 driver detects a device crash or device undergone reset, it invokes a stop callback to recover from error. The current RoCE driver doesn't recover the device. So move the device to error state and dispatch fatal events to all qps Release the MSIx vectors to avoid a crash when L2 driver disables the MSIx. Also, check for the device state to avoid posting further commands to the HW. Link: https://lore.kernel.org/r/1615968942-30970-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18Merge branch 'mlx5_active_speed' into rdma.git for-nextJason Gunthorpe1-4/+6
Leon Romanovsky says: ==================== IBTA declares speed as 16 bits, but kernel stores it in u8. This series fixes in-kernel declaration while keeping external interface intact. ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies. * branch 'mlx5_active_speed': RDMA: Fix link active_speed size RDMA/mlx5: Delete duplicated mlx5_ptys_width enum net/mlx5: Refactor query port speed functions
2020-09-03RDMA/bnxt_re: Convert tasklets to use new tasklet_setup() APIAllen Pais1-6/+5
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-2-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/bnxt_re: Fix the qp table indexingSelvin Xavier1-4/+6
qp->id can be a value outside the max number of qp. Indexing the qp table with the id can cause out of bounds crash. So changing the qp table indexing by (qp->id % max_qp -1). Allocating one extra entry for QP1. Some adapters create one more than the max_qp requested to accommodate QP1. If the qp->id is 1, store the inforamtion in the last entry of the qp table. Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") Link: https://lore.kernel.org/r/1598292876-26529-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-04-14RDMA/bnxt_re: Simplify obtaining queue entry from hw ringDevesh Sharma1-11/+5
Restructring the data path and control path queue management code to simplify the way a queue element is extracted from the hardware ring. Introduced a new function which will give a pointer to the next ring item depending upon the current cons/prod index in the hardware queue. Further, there are hardcoding when size of queue entry is calculated, replacing it with an inline function. This function would be easier to expand if need going forward. The code section to initialize the PSN search areas has also been restructured and couple of functions has been added there. Link: https://lore.kernel.org/r/1585851136-2316-4-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-14RDMA/bnxt_re: Reduce device page size detection codeDevesh Sharma1-45/+27
Getting rid of the repeated code in the driver when deciding on the page size of the hardware ring memory. A new common function would translate the ring page size into device specific page size. Link: https://lore.kernel.org/r/1585851136-2316-2-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02RDMA/bnxt_re: Remove set but not used variables 'pg' and 'idx'YueHaibing1-4/+0
Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/bnxt_re/qplib_rcfw.c: In function '__send_message': drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:101:10: warning: variable 'idx' set but not used [-Wunused-but-set-variable] drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:101:6: warning: variable 'pg' set but not used [-Wunused-but-set-variable] commit cee0c7bba486 ("RDMA/bnxt_re: Refactor command queue management code") involved this, but not used. Link: https://lore.kernel.org/r/20200227064900.92255-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21RDMA/bnxt_re: Refactor doorbell management functionsDevesh Sharma1-13/+8
Moving all the fast path doorbell functions at one place under qplib_res.h. To pass doorbell record information a new structure bnxt_qplib_db_info has been introduced. Every roce object holds an instance of this structure and doorbell information is initialized during resource creation. When DB is rung only the current queue index is read from hardware ring and rest of the data is taken from pre-initialized dbinfo structure. Link: https://lore.kernel.org/r/1581786665-23705-8-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21RDMA/bnxt_re: Refactor command queue management codeDevesh Sharma1-166/+253
Refactoring the command queue (rcfw) management code. A new data-structure is introduced to describe the bar register. each object which deals with mmio space should have a descriptor structure. This structure specifically hold DB register information. Thus, slow path creq structure now hold a bar register descriptor. Further cleanup the rcfw structure to introduce the command queue context and command response event queue context structures. Rest of the rcfw related code has been touched to incorporate these three structures. Link: https://lore.kernel.org/r/1581786665-23705-6-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-21RDMA/bnxt_re: Refactor hardware queue memory allocationDevesh Sharma1-25/+32
At top level there are three major data structure addition. viz bnxt_qplib_hwq_attr, bnxt_qplib_sg_info and bnxt_qplib_tqm_ctx Intorduction of first data structure reduces the arguments list to bnxt_re_alloc_init_hwq() function. There are changes all over the driver code to incorporate this new structure. The caller needs to fill the attribute data structure and pass to this function. The second data structure is to pass memory region description viz. sghead, page_size and page_shift. There are changes all over the driver code to initialize bnxt_re_sg_info data structure. The new data structure helps to reduce the argument list of __alloc_pbl() function call. Till now the TQM rings related members were not collected under any specific data-structure making it hard to manage. The third data sctructure bnxt_qplib_tqm_ctx is added to refactor the TQM queue allocation and initialization. Link: https://lore.kernel.org/r/1581786665-23705-4-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-06remove ioremap_nocache and devm_ioremap_nocacheChristoph Hellwig1-2/+2
ioremap has provided non-cached semantics by default since the Linux 2.6 days, so remove the additional ioremap_nocache interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de>
2019-10-08RDMA/bnxt_re: Enable SRIOV VF support on Broadcom's 57500 adapter seriesDevesh Sharma1-1/+4
Broadcom's 575xx adapter series has support for SRIOV VFs. Making changes to enable SRIOV VF support. There are two major area where changes are done: - Added new DB location for control-path and data-path DB ring - New devices do not need to issue the sriov-config slow-path command thus, skipping to call that firmware command. For now enabling support for 64 RoCE VFs. Link: https://lore.kernel.org/r/1570081715-14301-1-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-22RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_messageSelvin Xavier1-1/+7
Driver copies FW commands to the HW queue as units of 16 bytes. Some of the command structures are not exact multiple of 16. So while copying the data from those structures, the stack out of bounds messages are reported by KASAN. The following error is reported. [ 1337.530155] ================================================================== [ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785 [ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G OE 5.2.0-rc6+ #75 [ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 1337.530542] Call Trace: [ 1337.530548] dump_stack+0x5b/0x90 [ 1337.530556] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530560] print_address_description+0x65/0x22e [ 1337.530568] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530575] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530577] __kasan_report.cold.3+0x37/0x77 [ 1337.530581] ? _raw_write_trylock+0x10/0xe0 [ 1337.530588] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530590] kasan_report+0xe/0x20 [ 1337.530592] memcpy+0x1f/0x50 [ 1337.530600] bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530608] ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re] [ 1337.530611] ? xas_create+0x3aa/0x5f0 [ 1337.530613] ? xas_start+0x77/0x110 [ 1337.530615] ? xas_clear_mark+0x34/0xd0 [ 1337.530623] bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re] [ 1337.530631] ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re] [ 1337.530633] ? bit_wait_io_timeout+0xc0/0xc0 [ 1337.530641] bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re] [ 1337.530648] bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re] [ 1337.530655] bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re] [ 1337.530677] ib_dealloc_pd_user+0xbe/0xe0 [ib_core] [ 1337.530683] srpt_remove_one+0x5de/0x690 [ib_srpt] [ 1337.530689] ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt] [ 1337.530692] ? xa_load+0x87/0xe0 ... [ 1337.530840] do_syscall_64+0x6d/0x1f0 [ 1337.530843] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1337.530845] RIP: 0033:0x7ff5b389035b [ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48 [ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b [ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8 [ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000 [ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50 [ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750 [ 1337.530885] The buggy address belongs to the page: [ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 1337.530964] flags: 0x57ffffc0000000() [ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000 [ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 1337.530970] page dumped because: kasan: bad access detected [ 1337.530996] Memory state around the buggy address: [ 1337.531072] ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2 [ 1337.531180] ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 [ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00 [ 1337.531393] ^ [ 1337.531478] ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531585] ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531691] ================================================================== Fix this by passing the exact size of each FW command to bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending the command to HW, modify the req->cmd_size to number of 16 byte units. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1566468170-489-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-03-28RDMA/bnxt_re: Use correct sizing on buffers holding page DMA addressesSelvin Xavier1-2/+2
umem->nmap is used while allocating internal buffer for storing page DMA addresses. This causes out of bounds array access while iterating the umem DMA-mapped SGL with umem page combining as umem->nmap can be less than number of system pages in umem. Use ib_umem_num_pages() instead of umem->nmap to size the page array. Add a new structure (bnxt_qplib_sg_info) to pass sglist, npages and nmap. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07RDMA/bnxt_re: Skip backing store allocation for 57500 seriesDevesh Sharma1-2/+4
The backing store to keep HW context data structures is allocated and initialized by L2 driver. For 57500 chip RoCE driver do not require to allocate and initialize additional memory. Changing to skip duplicate allocation and initialization for 57500 adapters. Driver continues as before for older chips. This patch also takes care of stats context memory alignment to 128 boundary, a requirement for 57500 series of chip. Older chips do not care of alignment, thus the change is unconditional. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07RDMA/bnxt_re: Add 64bit doorbells for 57500 seriesDevesh Sharma1-10/+21
The new chip series has 64 bit doorbell for notification queues. Thus, both control and data path event queues need new routines to write 64 bit doorbell. Adding the same. There is new doorbell interface between the chip and driver. Changing the chip specific data structure definitions. Additional significant changes are listed below - bnxt_re_net_ring_free/alloc takes a new argument - bnxt_qplib_enable_nq and enable_rcfw uses new doorbell offset for new chip. - DB mapping for NQ and CREQ now maps 8 bytes. - DBR_DBR_* macros renames to DBC_DBC_* - store nq_db_offset in a 32bit data type. - got rid of __iowrite64_copy, used writeq instead. - changed the DB header initialization to simpler scheme. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29Merge branch 'devx-async' into k.o/for-nextJason Gunthorpe1-2/+2
Yishai Hadas says: Enable DEVX asynchronous query commands This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and it will be able to get the response back once that it will be ready. To enable the above functionality: - DEVX asynchronous command completion FD object was introduced. - The applicable file operations were implemented to enable using it by the user application. - Query asynchronous method was added to the DEVX object, it will call the firmware asynchronously and manages the response on the given input FD. - Hot unplug support was added for the FD to work properly upon unbind/disassociate. - mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate. This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'devx-async': IB/mlx5: Implement DEVX hot unplug for async command FD IB/mlx5: Implement the file ops of DEVX async command FD IB/mlx5: Introduce async DEVX obj query API IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA/bnxt_re: fix a size calculationDan Carpenter1-2/+1
This is from static analysis not from testing. Depending on the value of rcfw->cmdq_depth, then this might not cause an issue at runtime. The BITS_TO_LONGS() macro tells us how many longs it take to hold a bitmap. In other words, it divides by the number if bits per long and rounds up. Then we want to take that number and multiple by sizeof(long) to get the number of bytes to allocate. The code here does the multiplication first so the rounding up is done in the wrong place. So imagine we want to allocate 1 bit, then "(1 * 8) / 64 = 1" when we round up. But it should be "(1 / 64) * 8 = 8". In other words, because of the rounding difference we might allocate up to "sizeof(long) - 1" bytes fewer than intended. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-By: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08cross-tree: phase out dma_zalloc_coherent()Luis Chamberlain1-2/+2
We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-19RDMA/bnxt_re: Increase depth of control path command queueDevesh Sharma1-12/+22
Increasing the depth of control path command queue to 8K entries to handle burst of commands. This feature needs support from FW and the driver/fw compatibility is checked from the interface version number. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16RDMA/bnxt_re: Prevent driver crash due to NULL pointer in error message printSomnath Kotur1-4/+6
crsqe->resp would be NULL in case the host command timed out before getting a response from HW. Check for NULL pointer to avoid a potential crash while printing the error message. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16RDMA/bnxt_re: Drop L2 async events silentlyDevesh Sharma1-3/+4
In some FW versions, RoCE driver also receives an async notification which was directed to L2 driver. RoCE driver does not handle this and print a message to syslog. Drop these notifications silently. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16RDMA/bnxt_re: Avoid NULL check after accessing the pointerSelvin Xavier1-7/+6
This is reported by smatch check. rcfw->creq_bar_reg_iomem is accessed in bnxt_qplib_rcfw_stop_irq and this variable check afterwards doesn't make sense. Also, rcfw->creq_bar_reg_iomem will never be NULL. So Removing this check. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16RDMA/bnxt_re: Fix recursive lock warning in debug kernelSelvin Xavier1-2/+11
Fix possible recursive lock warning. Its a false warning as the locks are part of two differnt HW Queue data structure - cmdq and creq. Debug kernel is throwing the following warning and stack trace. [ 783.914967] ============================================ [ 783.914970] WARNING: possible recursive locking detected [ 783.914973] 4.19.0-rc2+ #33 Not tainted [ 783.914976] -------------------------------------------- [ 783.914979] swapper/2/0 is trying to acquire lock: [ 783.914982] 000000002aa3949d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.914999] but task is already holding lock: [ 783.915002] 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re] [ 783.915013] other info that might help us debug this: [ 783.915016] Possible unsafe locking scenario: [ 783.915019] CPU0 [ 783.915021] ---- [ 783.915034] lock(&(&hwq->lock)->rlock); [ 783.915035] lock(&(&hwq->lock)->rlock); [ 783.915037] *** DEADLOCK *** [ 783.915038] May be due to missing lock nesting notation [ 783.915039] 1 lock held by swapper/2/0: [ 783.915040] #0: 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re] [ 783.915044] stack backtrace: [ 783.915046] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0-rc2+ #33 [ 783.915047] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 783.915048] Call Trace: [ 783.915049] <IRQ> [ 783.915054] dump_stack+0x90/0xe3 [ 783.915058] __lock_acquire+0x106c/0x1080 [ 783.915061] ? sched_clock+0x5/0x10 [ 783.915063] lock_acquire+0xbd/0x1a0 [ 783.915065] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915069] _raw_spin_lock_irqsave+0x4a/0x90 [ 783.915071] ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915073] bnxt_qplib_service_creq+0x232/0x350 [bnxt_re] [ 783.915078] tasklet_action_common.isra.17+0x197/0x1b0 [ 783.915081] __do_softirq+0xcb/0x3a6 [ 783.915084] irq_exit+0xe9/0x100 [ 783.915085] do_IRQ+0x6a/0x120 [ 783.915087] common_interrupt+0xf/0xf [ 783.915088] </IRQ> Use nested notation for the spin_lock to avoid this warning. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05RDMA/bnxt_re: QPLIB: Add and use #define dev_fmt(fmt) "QPLIB: " fmtJoe Perches1-26/+25
Consistently use the "QPLIB: " prefix for dev_<level> logging. Miscellanea: o Add missing newlines to avoid possible message interleaving o Coalesce consecutive dev_<level> uses that emit a message header to avoid < 80 column lengths and mistakenly output on multiple lines o Reflow modified lines to use 80 columns where appropriate o Consistently use "%s: " where __func__ is output o QPLIB: is now always output immediately after the dev_<level> header Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-25RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changesDevesh Sharma1-18/+43
The recent changes in Broadcom's ethernet driver(L2 driver) broke RoCE functionality in terms of MSIx vector allocation and de-allocation. There is a possibility that L2 driver would initiate MSIx vector reallocation depending upon the requests coming from administrator. In such cases L2 driver needs to free up all the MSIx vectors allocated previously and reallocate/initialize those. If RoCE driver is loaded and reshuffling is attempted, there will be kernel crashes because RoCE driver would still be holding the MSIx vectors but L2 driver would attempt to free in-use vectors. Thus leading to a kernel crash. Making changes in roce driver to fix crashes described above. As part of solution L2 driver tells RoCE driver to release the MSIx vector whenever there is a need. When RoCE driver get message it sync up with all the running tasklets and IRQ handlers and releases the vectors. L2 driver send one more message to RoCE driver to resume the MSIx vectors. L2 driver guarantees that RoCE vector do not change during reshuffling. Fixes: ec86f14ea506 ("bnxt_en: Add ULP calls to stop and restart IRQs.") Fixes: 08654eb213a8 ("bnxt_en: Change IRQ assignment for RDMA driver.") Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06RDMA/bnxt_re: Avoid Hard lockup during error CQE processingSelvin Xavier1-2/+1
Hitting the following hardlockup due to a race condition in error CQE processing. [26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req [26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa [26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4 [26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace [26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded [26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, [26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [26156.472639] Call Trace: [26156.475379] <NMI> [<ffffffff98d0d722>] dump_stack+0x19/0x1b [26156.481833] [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140 [26156.489341] [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100 [26156.496256] [<ffffffff98787c24>] perf_event_overflow+0x14/0x20 [26156.502887] [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510 [26156.509813] [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50 [26156.516738] [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150 [26156.523273] [<ffffffff98d17be8>] do_nmi+0x218/0x460 [26156.528834] [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e [26156.534980] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.543268] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.551556] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.559842] <EOE> [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf [26156.567555] [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30 [26156.573696] [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re] [26156.581789] [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re] [26156.589493] [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re] The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to put the error QP in flush list. Since SQ and RQ of each error qp are added to two different flush list, we need to protect it using locks of corresponding CQs. Difference in order of acquiring the lock in SQ poll_cq and RQ poll_cq can cause a hard lockup. Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock. Instead of this lock, introduces qplib_cq.flush_lock to handle addition/deletion of QPs in flush list. Also, always invoke the flush_lock in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential deadlock. Other than the poll_cq context, the movement of QP to/from flush list can be done in modify_qp context or from an async error event from HW. Synchronize these operations using the bnxt_re verbs layer CQ locks. To achieve this, adds a call back to the HW abstraction layer(qplib) to bnxt_re ib_verbs layer in case of async error event. Also, removes the buddy cq functions as it is no longer required. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28RDMA/bnxt_re: Fix incorrect DB offset calculationDevesh Sharma1-1/+5
To support host systems with non 4K page size, l2_db_size shall be calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size to FW during initialization. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-18RDMA/bnxt_re: Add SRQ support for Broadcom adaptersDevesh Sharma1-1/+1
Shared receive queue (SRQ) is defined as a pool of receive buffers shared among multiple QPs which belong to same protection domain in a given process context. Use of SRQ reduces the memory foot print of IB applications. Broadcom adapters support SRQ, adding code-changes to enable shared receive queue. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17RDMA/bnxt_re: Add support for query firmware versionSelvin Xavier1-1/+2
The device now reports firmware version thus, removing the hard coded values of the FW version string and redundant fw_rev hook from sysfs. Adding code to query firmware version from underlying device and report it through the kernel verb to get firmware version string. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entriesSomnath Kotur1-0/+4
The code determines if the next ring entry is valid before proceeding further to read the rest of the entry. The CPU can re-order and read the rest of the entry first, possibly reading a stale entry, if DMA of a new entry happens right after reading it. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13bnxt_re: fix a crash in qp error event processingSriharsha Basavapatna1-0/+2
In bnxt_qplib_process_qp_event(), for qp error events we look up the qp-handle and pass it for further processing. But we don't check if the handle is NULL. This could lead to a crash in the called functions when that qp-handle is dereferenced, if the qp is destroyed in the meantime. Fix this by checking for a valid qp-handle in that function. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18bnxt_re: Fix incorrect usage of test_bit()Somnath Kotur1-4/+4
test_bit() takes a bit number while the 'flags' field in struct bnxt_qplib_rcfw was using actual BIT position converted values. Fix this by assigning bit numbers and use consistent APIs all the flag values. Also logging a message in case of failure. Thanks to Dan Carpenter for pointing this out. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14RDMA/bnxt_re: Remove set-but-not-used variablesBart Van Assche1-4/+0
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-22bnxt_re: Stop issuing further cmds to FW once a cmd times outSomnath Kotur1-0/+4
Once a cmd to FW times out(after 20s) it is reasonable to assume the FW or atleast the control path is dead. No point issuing further cmds to the FW as each subsequent cmd with another 20s timeout will cascade resulting in unnecessary traces and/or NMI Lockups. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24RDMA/bnxt_re: Allow posting when QPs are in errorSelvin Xavier1-1/+25
This patch allows driver to post send and receive requests on QPs which are in error state. Instead of flushing the QP in the context of polling error CQEs, the QPs will be added to a flush list maintained per CQ. QP state is moved to error. QP is added to flush list if the user moves it to error state using modify_qp also. After polling the HW CQ in poll_cq routine, this flush list is traversed and driver completes work requests on each QP in the flush list, till the budget expires. The QP is moved out of flush list during QP destroy or during modify_QP to RESET. When ULPs post Work Requests while QP is in error state, driver will store the ULP data and then increment the QP producer s/w index, without ringing doorbell. It then schedules a worker to invoke the CQ handler since the interrupts wont be generated from the HW for this request. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-13RDMA/bnxt_re: Fixing the Control path command and response handlingDevesh Sharma1-150/+164
Fixing a concurrency issue with creq handling. Each caller was given a globally managed crsq element, which was accessed outside a lock. This could result in corruption, if lot of applications are simultaneously issuing Control Path commands. Now, each caller will provide its own response buffer and the responses will be copied under a lock. Also, Fixing the queue full condition check for the CMDQ. As a part of these changes, the control path code is refactored to remove the code replication in the response status checking. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14RDMA/bnxt_re: Add bnxt_re RoCE driverSelvin Xavier1-0/+694
This patch introduces the RoCE driver for the Broadcom NetXtreme-E 10/25/40/50G RoCE HCAs. The RoCE driver is a two part driver that relies on the parent bnxt_en NIC driver to operate. The changes needed in the bnxt_en driver have already been incorporated via Dave Miller's net tree into the mainline kernel. The vendor official git repository for this driver is available on github as: https://github.com/Broadcom/linux-rdma-nxt/ Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>