summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-01-18IB/mlx5: Mmap the HCA's clock info to user-spaceFeras Daoud3-13/+59
This patch maps the new page to user space applications to allow converting a user space completion timestamp to system wall time at the lowest possible latency cost. By using a versioning scheme we allow compatibility between current and future userspace libraries. The change moves mlx5_ib_mmap_cmd enum from mlx5_ib.h to the abi header file mlx5-abi.h. Reviewed-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18net/mlx5e: Add clock info page to mlx5 core devicesFeras Daoud3-0/+74
Adds a new page to mlx5 core containing clock info data that allows user level applications to translate between cqe timestamp to nanoseconds. The information stored into this page is represented through mlx5_ib_clock_info. In order to synchronize between kernel and user space a sequence number is incremented at the beginning and end of each update. An odd number means the data is being updated while an even means the access was already done. To guarantee that the data structure was accessed atomically user will: repeat: seq1 = <read sequence> goto <repeate> while odd <read data structure> seq2 = <read sequence> if seq1 != seq2 goto repeat Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_directSagi Grimberg1-10/+13
polling the completion queue directly does not interfere with the existing polling logic, hence drop the requirement. Be aware that running ib_process_cq_direct with non IB_POLL_DIRECT CQ may trigger concurrent CQ processing. This can be used for polling mode ULPs. Cc: Bart Van Assche <bart.vanassche@wdc.com> Reported-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> [maxg: added wcs array argument to __ib_process_cq] Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18IB/core: postpone WR initialization during queue drainMax Gurtovoy1-8/+8
No need to initialize completion and WR in case we fail during QP modification. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/rxe: Fix rxe_qp_cleanup()Bart Van Assche2-2/+13
rxe_qp_cleanup() can sleep so it must be run in thread context and not in atomic context. This patch avoids that the following bug is triggered: Kernel BUG at 00000000560033f3 [verbose debug info unavailable] BUG: sleeping function called from invalid context at net/core/sock.c:2761 in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0 INFO: lockdep is turned off. Preemption disabled at: [<00000000b6e69628>] __do_softirq+0x4e/0x540 CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x85/0xbf ___might_sleep+0x177/0x260 lock_sock_nested+0x1d/0x90 inet_shutdown+0x2e/0xd0 rxe_qp_cleanup+0x107/0x140 [rdma_rxe] rxe_elem_release+0x18/0x80 [rdma_rxe] rxe_requester+0x1cf/0x11b0 [rdma_rxe] rxe_do_task+0x78/0xf0 [rdma_rxe] tasklet_action+0x99/0x270 __do_softirq+0xc0/0x540 run_ksoftirqd+0x1c/0x70 smpboot_thread_fn+0x1be/0x270 kthread+0x117/0x130 ret_from_fork+0x24/0x30 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Moni Shoua <monis@mellanox.com> Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/rxe: Fix a race condition in rxe_requester()Bart Van Assche3-10/+2
The rxe driver works as follows: * The send queue, receive queue and completion queues are implemented as circular buffers. * ib_post_send() and ib_post_recv() calls are serialized through a spinlock. * Removing elements from various queues happens from tasklet context. Tasklets are guaranteed to run on at most one CPU. This serializes access to these queues. See also rxe_completer(), rxe_requester() and rxe_responder(). * rxe_completer() processes the skbs queued onto qp->resp_pkts. * rxe_requester() handles the send queue (qp->sq.queue). * rxe_responder() processes the skbs queued onto qp->req_pkts. Since rxe_drain_req_pkts() processes qp->req_pkts, calling rxe_drain_req_pkts() from rxe_requester() is racy. Hence this patch. Reported-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/bnxt_re: Add SRQ support for Broadcom adaptersDevesh Sharma8-98/+825
Shared receive queue (SRQ) is defined as a pool of receive buffers shared among multiple QPs which belong to same protection domain in a given process context. Use of SRQ reduces the memory foot print of IB applications. Broadcom adapters support SRQ, adding code-changes to enable shared receive queue. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/bnxt_re: expose detailed stats retrieved from HWSelvin Xavier7-10/+417
Broadcom's adapter supports more granular statistics to allow better understanding about the state of the chip when data traffic is flowing. Exposing the detailed stats to the consumer through the standard hook available in the kverbs interface. In order to retrieve all the information, driver implements a firmware command. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18RDMA/bnxt_re: Add support for MRs with Huge pagesSomnath Kotur5-57/+140
Depending on the OS page-table configurations, applications may request MRs which has page size alignment other than 4K Underlying provider driver needs to adjust its PBL boundaries according to the incoming page boundaries in the PA list. Adding a capability to register MRs having pages-sizes other than 4K (Hugepages). Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17RDMA/bnxt_re: Add support for query firmware versionSelvin Xavier6-15/+39
The device now reports firmware version thus, removing the hard coded values of the FW version string and redundant fw_rev hook from sysfs. Adding code to query firmware version from underlying device and report it through the kernel verb to get firmware version string. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17RDMA/bnxt_re: Enable RoCE on virtual functionsSelvin Xavier6-37/+182
RoCE can be used by virtual functions (VFs) as well. Adding code changes to allow resource reservation, initialization and avail the resources to the RDMA applications running on those VFs. Currently, fifty percent of the total available resources are reserved for PF and remaining are equally divided among active VFs. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-16i40iw: Free IEQ resourcesMustafa Ismail3-2/+3
The iWARP Exception Queue (IEQ) resources are not freed when a QP is destroyed. Fix this by freeing IEQ resources when freeing QP resources. Fixes: d37498417947 ("i40iw: add files for iwarp interface") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16i40iw: Remove setting of rem_addr.lenMustafa Ismail1-3/+0
Remove setting of rem_addr.len before calling iw_rdma_write, iw_inline_rdma_write and rdma_read. rem_addr.len is not used in those functions. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16i40iw: Remove limit on re-posting AEQ entries to HWSindhu Devale2-3/+0
Currently, if the number of processed Asynchronous Event Queue (AEQ) entries exceeds 255, they are not returned to HW for re-use. During scale-up, the unreturned AEQ entries can grow to the max AEQ size and cause the HW to report an AEQ overflow. Remove the check which limits the number of processed AEQ entries returned to HW. Fixes: 86dbcd0f12e9 ("RDMA/i40iw: add file to handle cqp calls") Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16i40iw: Zero-out consumer key on allocate stag for FMRShiraz Saleem1-0/+1
If the application invalidates the MR before the FMR WR, HW parses the consumer key portion of the stag and returns an invalid stag key Asynchronous Event (AE) that tears down the QP. Fix this by zeroing-out the consumer key portion of the allocated stag returned to application for FMR. Fixes: ee855d3b93f3 ("RDMA/i40iw: Add base memory management extensions") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16i40iw: Remove extra call to i40iw_est_sd()Shiraz Saleem1-2/+0
Remove redundant estimate SD function call. sd_needed should already be updated at the end of the do while resource reduction loop. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16RDMA/hns: Set the guid for hip08 RoCE deviceoulijun1-0/+4
This patch assign a guid(Global Unique identifer) value to the hip08 device. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16RDMA/hns: Update the verbs of polling for completionoulijun2-1/+13
If the port is a RoCEv2 port, the remote port address and QP information which returned for UD will be modified. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16RDMA/hns: Assign zero for pkey_index of wc in hip08oulijun1-0/+3
Because pkey is fixed for hip08 RoCE, it needs to assign zero for pkey_index of wc. otherwise, it will happen an error when establishing connection by communication management mechanism. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16RDMA/hns: Fill sq wqe context of ud type in hip08oulijun2-145/+385
This patch mainly configure the fields of sq wqe of ud type when posting wr of gsi qp type. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16RDMA/hns: Add gsi qp support for modifying qp in hip08oulijun2-16/+49
It needs to Assign the values for some fields in qp context when qp type is gsi qp type in hip08. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16RDMA/hns: Create gsi qp in hip08oulijun1-2/+16
The gsi qp and rc qp use the same qp context structure and the created flow, only differentiate them by qpn and qp type. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16RDMA/hns: Assign the correct value for tx_cqnoulijun1-1/+1
When modifying qp from init to init, it need to assign the cqn of send cq for tx cqn field of qp context. Otherwise, it will cause a mistake when the send and recv cq sizes are different. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/cma: use strlcpy() instead of strncpy()Xiongfeng Wang1-1/+1
gcc-8 reports drivers/infiniband/core/cma_configfs.c: In function 'make_cma_dev': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 64 equals destination size [-Wstringop-truncation] We need to use strlcpy() to make sure the string is nul-terminated. Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA/core: Clarify rdma_ah_find_typeParav Pandit1-2/+1
iWARP does not use rdma_ah_attr_type, and for this reason we do not have a RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp ports and for clarity it shouldn't have a special test for iWarp. This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB when wrongly called on an iWarp port. Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Fix ib_wc structure size to remain in 64 bytes boundaryBodong Wang1-1/+1
The change of slid from u16 to u32 results in sizeof(struct ib_wc) cross 64B boundary, which causes more cache misses. This patch rearranges the fields and remain the size to 64B. Pahole output before this change: struct ib_wc { union { u64 wr_id; /* 8 */ struct ib_cqe * wr_cqe; /* 8 */ }; /* 0 8 */ enum ib_wc_status status; /* 8 4 */ enum ib_wc_opcode opcode; /* 12 4 */ u32 vendor_err; /* 16 4 */ u32 byte_len; /* 20 4 */ struct ib_qp * qp; /* 24 8 */ union { __be32 imm_data; /* 4 */ u32 invalidate_rkey; /* 4 */ } ex; /* 32 4 */ u32 src_qp; /* 36 4 */ int wc_flags; /* 40 4 */ u16 pkey_index; /* 44 2 */ /* XXX 2 bytes hole, try to pack */ u32 slid; /* 48 4 */ u8 sl; /* 52 1 */ u8 dlid_path_bits; /* 53 1 */ u8 port_num; /* 54 1 */ u8 smac[6]; /* 55 6 */ /* XXX 1 byte hole, try to pack */ u16 vlan_id; /* 62 2 */ /* --- cacheline 1 boundary (64 bytes) --- */ u8 network_hdr_type; /* 64 1 */ /* size: 72, cachelines: 2, members: 17 */ /* sum members: 62, holes: 2, sum holes: 3 */ /* padding: 7 */ /* last cacheline: 8 bytes */ }; Pahole output after this change: struct ib_wc { union { u64 wr_id; /* 8 */ struct ib_cqe * wr_cqe; /* 8 */ }; /* 0 8 */ enum ib_wc_status status; /* 8 4 */ enum ib_wc_opcode opcode; /* 12 4 */ u32 vendor_err; /* 16 4 */ u32 byte_len; /* 20 4 */ struct ib_qp * qp; /* 24 8 */ union { __be32 imm_data; /* 4 */ u32 invalidate_rkey; /* 4 */ } ex; /* 32 4 */ u32 src_qp; /* 36 4 */ u32 slid; /* 40 4 */ int wc_flags; /* 44 4 */ u16 pkey_index; /* 48 2 */ u8 sl; /* 50 1 */ u8 dlid_path_bits; /* 51 1 */ u8 port_num; /* 52 1 */ u8 smac[6]; /* 53 6 */ /* XXX 1 byte hole, try to pack */ u16 vlan_id; /* 60 2 */ u8 network_hdr_type; /* 62 1 */ /* size: 64, cachelines: 1, members: 17 */ /* sum members: 62, holes: 1, sum holes: 1 */ /* padding: 1 */ }; Cc: <stable@vger.kernel.org> # v4.13 Fixes: 7db20ecd1d97 ("IB/core: Change wc.slid from 16 to 32 bits") Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH portsJack Morgenstein2-8/+8
Allocating steerable UD QPs depends on having at least one IB port, while releasing those QPs does not. As a result, when there are only ETH ports, the IB (RoCE) driver requests releasing a qp range whose base qp is zero, with qp count zero. When SR-IOV is enabled, and the VF driver is running on a VM over a hypervisor which treats such qp release calls as errors (rather than NOPs), we see lines in the VM message log like: mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0 Fix this by adding a check for a zero count in mlx4_release_qp_range() (which thus treats releasing 0 qps as a nop), and eliminating the check for device managed flow steering when releasing steerable UD QPs. (Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it remains NULL when steerable UD QPs are not allocated). Cc: <stable@vger.kernel.org> Fixes: 4196670be786 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA/qedr: Fix endian problems around imm_dataJason Gunthorpe1-2/+2
The double swap matches what user space rdma-core does to imm_data. wc->imm_data is not used in the kernel so this change has no practical impact. Acked-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA/hns: Fix endian problems around imm_data and rkeyJason Gunthorpe3-7/+11
This matches the changes made recently to the userspace hns driver when it was made sparse clean. See rdma-core commit bffd380cfe56 ("libhns: Make the provider sparse clean") wc->imm_data is not used in the kernel so this change has no practical impact. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA: Mark imm_data as be32 in the verbs uapi headerJason Gunthorpe4-7/+5
This matches what the userspace copy of this header has been doing for a while. imm_data is an opaque 4 byte array carried over the network, and invalidate_rkey is in CPU byte order. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Limit DMAC resolution to RoCE Connected QPsParav Pandit1-1/+10
Resolving DMAC for RoCE is applicable to only Connected mode QPs. So resolve DMAC for only for Connected mode QPs. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Attempt DMAC resolution for only RoCEParav Pandit1-4/+2
Instead of returning 0 (success) for RoCE scenarios where DMAC should not be resolved, avoid such attempt and make code consistent with ib_create_user_ah(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Limit DMAC resolution to userspace QPsParav Pandit1-19/+29
Currently ah_attr is initialized by the ib_cm layer for rdma_cm based applications. For RoCE transport ah_attr.roce.dmac is already initialized by ib_cm, rdma_cm either from wc, path record, route resolve, explicit path record setting depending on active or passive side QP. Therefore avoid resolving DMAC for QP of kernel consumers. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Perform modify QP on real oneParav Pandit1-2/+3
Currently qp->port stores the port number whenever IB_QP_PORT QP attribute mask is set (during QP state transition to INIT state). This port number should be stored for the real QP when XRC target QP is used. Follow the ib_modify_qp() implementation and hide the access to ->real_qp. Fixes: a512c2fbef9c ("IB/core: Introduce modify QP operation with udata") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10infiniband: fix sw/rdmavt/* kernel-doc notationRandy Dunlap5-17/+18
Use correct parameter names and formatting in function kernel-doc notation to eliminate warnings from scripts/kernel-doc. ../drivers/infiniband/sw/rdmavt/mr.c:784: warning: Excess function parameter 'ibmfr' description in 'rvt_map_phys_fmr' ../drivers/infiniband/sw/rdmavt/vt.c:234: warning: Excess function parameter 'intex' description in 'rvt_query_pkey' ../drivers/infiniband/sw/rdmavt/vt.c:266: warning: Excess function parameter 'index' description in 'rvt_query_gid' ../drivers/infiniband/sw/rdmavt/vt.c:306: warning: Excess function parameter 'data' description in 'rvt_alloc_ucontext' ../drivers/infiniband/sw/rdmavt/cq.c:65: warning: Excess function parameter 'sig' description in 'rvt_cq_enter' ../drivers/infiniband/sw/rdmavt/qp.c:279: warning: Excess function parameter 'qpt' description in 'rvt_free_all_qps' ../drivers/infiniband/sw/rdmavt/mcast.c:282: warning: Excess function parameter 'igd' description in 'rvt_attach_mcast' ../drivers/infiniband/sw/rdmavt/mcast.c:345: warning: Excess function parameter 'igd' description in 'rvt_detach_mcast' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: linux-doc@vger.kernel.org Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10infiniband: fix ulp/opa_vnic/opa_vnic_vema.c kernel-doc notationRandy Dunlap1-1/+1
Use correct parameter name and description in kernel-doc notation to eliminate a kernel-doc warning. ../drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c:730: warning: Excess function parameter 'cport' description in 'opa_vnic_vema_send_trap' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: linux-doc@vger.kernel.org Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10infiniband: fix core/fmr_pool.c kernel-doc notationRandy Dunlap1-7/+5
Fix kernel-doc warning for ib_fmr_pool_map_phys() and also format it with function description and text spacing. ../drivers/infiniband/core/fmr_pool.c:404: warning: Excess function parameter 'pool' description in 'ib_fmr_pool_map_phys' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: linux-doc@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10infiniband: fix core/verbs.c kernel-doc notationRandy Dunlap1-2/+2
Change function parameter name in kernel-doc notation and other comments to eliminate a kernel-doc warning. ../drivers/infiniband/core/verbs.c:1790: warning: Excess function parameter 'wq_init_attr' description in 'ib_create_wq' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: linux-doc@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10RDMA/cma: Fix rdma_cm path querying for RoCEParav Pandit1-3/+4
The 'if' logic in ucma_query_path was broken with OPA was introduced and started to treat RoCE paths as as OPA paths. Invert the logic of the 'if' so only OPA paths are treated as OPA paths. Otherwise the path records returned to rdma_cma users are mangled when in RoCE mode. Fixes: 57520751445b ("IB/SA: Add OPA path record type") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10RDMA/cma: Fix rdma_cm raw IB path setting for RoCEParav Pandit1-0/+14
rdma_set_ib_path() missed setting path record fields for RoCE transport when RoCE support was added. This results in setting incorrect ndev, destination mac address, incorrect GID type etc errors when user space attempts to set a raw IB path using the roce IB path compatibility mapping from userspace. Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10RDMA/{cma, ucma}: Simplify and rename rdma_set_ib_pathsParav Pandit3-11/+11
Since 2006 there has been no user of rdmacm based application to make use of setting multiple path records using rdma_set_ib_paths API. Therefore code is simplified to allow setting one path record entry. Now that it sets only single path, it is renamed to reflect the same. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10RDMA/cma: Provide a function to set RoCE path record L2 parametersParav Pandit1-35/+53
Introduce a helper function to set path record L2 fields for RoCE. This includes setting GID type, destination mac address and netdev ifindex. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10RDMA/cma: Use the right net namespace for the rdma_cm_idParav Pandit1-2/+3
The net namespace is set in addr during create_rdma_id(), cma_resolve_iboe_route() should use that instead of the init namespace. The original code was added in commit fa20105e09e9 ("IB/cma: Add support for network namespaces"), but this path wasn't in use back then. This patch updates the code to use right namespace, as preparation for improving namespace support. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10IB/core: Increase number of char device minorsHuy Nguyen4-156/+113
There is a need to increase number of possible char devices to support large number of SR-IOV instances. The current limit is in the range of 64-128 devices/ports. Increase it to support up to 1024. The patch performs the following steps to refactor the code: 1. Removes the split bitmap for fixed and overflow dev numbers. 2. Pre-allocates the non-legacy major number range during driver initialization, choosen for simplicity. 3. Add new define (RDMA_MAX_PORTS) that is shared between all drivers. This is the maximum total number of ports on all struct ib_devices. 4. Set RDMA_MAX_PORTS to 1024. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10IB/core: Remove the locking for character device bitmapsHuy Nguyen2-10/+0
Remove the locks that protect character device bitmaps of uverbs, umad and issm. The character device bitmaps are accessed in "client->add" and "client->remove" calls from ib_register_device and ib_unregister_device respectively. These calls are already protected by the "device_mutex" mutex. Thus, the spinlocks are not needed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10RDMA/rxe: Fix a race condition related to the QP error stateBart Van Assche1-0/+2
The following sequence: * Change queue pair state into IB_QPS_ERR. * Post a work request on the queue pair. Triggers the following race condition in the rdma_rxe driver: * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function that examines the QP send queue. * rxe_post_send() posts a work request on the QP send queue. If rxe_completer() runs prior to rxe_post_send(), it will drain the send queue and the driver will assume no further action is necessary. However, once we post the send to the send queue, because the queue is in error, no send completion will ever happen and the send will get stuck. In order to process the send, we need to make sure that rxe_completer() gets run after a send is posted to a queue pair in an error state. This patch ensures that happens. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Moni Shoua <monis@mellanox.com> Cc: <stable@vger.kernel.org> # v4.8 Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-10IB/mlx5: remove redundant assignment of mdevColin Ian King1-1/+1
The initial assignment to mdev is redundant as mdev is re-assigned later and the first assigned value is never read. Remove this redundant assignment. Cleans up clang warning: drivers/infiniband/hw/mlx5/main.c:359:24: warning: Value stored to 'mdev' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-08IB/rxe: remove unnecessary skb_clone in xmitZhu Yanjun1-11/+5
In xmit, there is a skb_clone. This function copies the struct sk_buff. And some parameters are changed to the new skb. Then the new skb is sent while the old skb is freed. While the function skb_clone is removed, the parameter changes are made on the old skb, then the old skb is sent. It can also work well. The following tests are made. server client --------- --------- |1.1.1.1|<----rxe-channel--->|1.1.1.2| --------- --------- On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512 On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512 The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server and client. This test runs for several hours. There is no memory leak and the whole system can work well. As the above network, the following tests are made. Server: ibv_rc_pingpong -d rxe0 -g 1 Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1 The result on Server. Before: 8192000 bytes in 0.88 seconds = 74.36 Mbit/sec 1000 iters in 0.88 seconds = 881.30 usec/iter After: 8192000 bytes in 0.81 seconds = 81.15 Mbit/sec 1000 iters in 0.81 seconds = 807.62 usec/iter The throughput is enhanced and the latency is reduced. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-08IB/rxe: add the static type to the variableZhu Yanjun2-2/+1
The variable recv_sockets is only used in the file rxe_net.c. So it is better to add static type to it. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-08Merge branch 'bart-srpt-for-next' into k.o/wip/dl-for-nextDoug Ledford8-186/+299
Merging in 12 patch series from Bart that required changes in the current for-rc branch in order to apply cleanly. Signed-off-by: Doug Ledford <dledford@redhat.com>