summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2017-01-09Merge tag 'mlx5-4kuar-for-4.11' of ↵David S. Miller4-280/+330
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5 4K UAR The following series of patches optimizes the usage of the UAR area which is contained within the BAR 0-1. Previous versions of the firmware and the driver assumed each system page contains a single UAR. This patch set will query the firmware for a new capability that if published, means that the firmware can support UARs of fixed 4K regardless of system page size. In the case of powerpc, where page size equals 64KB, this means we can utilize 16 UARs per system page. Since user space processes by default consume eight UARs per context this means that with this change a process will need a single system page to fulfill that requirement and in fact make use of more UARs which is better in terms of performance. In addition to optimizing user-space processes, we introduce an allocator that can be used by kernel consumers to allocate blue flame registers (which are areas within a UAR that are used to write doorbells). This provides further optimization on using the UAR area since the Ethernet driver makes use of a single blue flame register per system page and now it will use two blue flame registers per 4K. The series also makes changes to naming conventions and now the terms used in the driver code match the terms used in the PRM (programmers reference manual). Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame register). In order to support compatibility between different versions of library/driver/firmware, the library has now means to notify the kernel driver that it supports the new scheme and the kernel can notify the library if it supports this extension. So mixed versions of libraries can run concurrently without any issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09IB/mlx5: Support 4k UAR for libmlx5Eli Cohen1-2/+19
Add fields to structs to convey to kernel an indication whether the library supports multi UARs per page and return to the library the size of a UAR based on the queried value. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09IB/mlx5: Allow future extension of libmlx5 input dataEli Cohen4-153/+198
Current check requests that new fields in struct mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero. This was introduced so new libraries passing additional information to the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified by old kernels that do not support their request by failing the operation. This schecme is problematic since it requires libmlx5 to issue the requests with descending input size for struct mlx5_ib_alloc_ucontext_req_v2. To avoid this, we require that new features that will obey the following rules: If the feature requires one or more fields in the response and the at least one of the fields can be encoded such that a zero value means the kernel ignored the request then this field will provide the indication to the library. If no response is required or if zero is a valid response, a new field should be added that indicates to the library whether its request was processed. Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09IB/mlx5: Use blue flame register allocator in mlx5_ibEli Cohen4-69/+51
Make use of the blue flame registers allocator at mlx5_ib. Since blue flame was not really supported we remove all the code that is related to blue flame and we let all consumers to use the same blue flame register. Once blue flame is supported we will add the code. As part of this patch we also move the definition of struct mlx5_bf to mlx5_ib.h as it is only used by mlx5_ib. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08IB/mlx5: Fix retrieval of index to first hi class bfregEli Cohen1-10/+14
First the function retrieving the index of the first hi latency class blue flame register. High latency class bfregs are located right above medium latency class bfregs. Fixes: c1be5232d21d ('IB/mlx5: Fix micro UAR allocator') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08mlx5: Fix naming convention with respect to UARsEli Cohen4-133/+135
This establishes a solid naming conventions for UARs. A UAR (User Access Region) can have size identical to a system page or can be fixed 4KB depending on a value queried by firmware. Each UAR always has 4 blue flame register which are used to post doorbell to send queue. In addition, a UAR has section used for posting doorbells to CQs or EQs. In this patch we change names to reflect this conventions. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08IB/mlx5: Fix error handling order in create_kernel_qpEli Cohen1-2/+2
Make sure order of cleanup is exactly the opposite of initialization. Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08IB/mlx5: Fix kernel to user leak prevention logicEli Cohen1-7/+7
The logic was broken as it failed to update the response length for architectures with PAGE_SIZE larger than 4kB. As a result further extension of the ucontext response struct would fail. Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+12
2017-01-02IB/mlx5: Improve MR checkArtemy Kovalyov2-2/+12
Add "type" field to mlx5_core MKEY struct. Check whether page fault happens on MKEY corresponding to MR. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add ODP atomics supportArtemy Kovalyov1-38/+50
Handle ODP atomic operations. When initiator of RDMA atomic operation use ODP MR to provide source data handle pagefault properly. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02{net,IB}/mlx5: Refactor page fault handlingArtemy Kovalyov4-274/+115
* Update page fault event according to last specification. * Separate code path for page fault EQ, completion EQ and async EQ. * Move page fault handling work queue from mlx5_ib static variable into mlx5_core page fault EQ. * Allocate memory to store ODP event dynamically as the events arrive, since in atomic context - use mempool. * Make mlx5_ib page fault handler run in process context. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add MR cache for large UMR regionsArtemy Kovalyov5-247/+219
In this change we turn mlx5_ib_update_mtt() into generic mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions supporting both atomic and process contexts and not limited by number of modified entries. Using this function we increase preallocated MRs up to 16GB. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add support for big MRsArtemy Kovalyov3-3/+11
Make use of extended UMR translation offset. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Refactor UMR post send formatArtemy Kovalyov4-114/+91
* Update struct mlx5_wqe_umr_ctrl_seg. * Currenlty UMR send_flags aim only certain use cases: enabled/disable cached MR, modifying XLT for ODP. By making flags independent make UMR more flexible allowing arbitrary manipulations. * Since different UMR formats have different entry sizes UMR request should receive exact size of translation table update instead of number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr and update relevant code accordingly. * Add support of length64 bit. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add helper mlx5_ib_post_send_waitBinoy Jayan1-83/+32
Clean up the following common code (to post a list of work requests to the send queue of the specified QP) at various places and add a helper function 'mlx5_ib_post_send_wait' to implement the same. - Initialize 'mlx5_ib_umr_context' on stack - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe - Acquire the semaphore - call ib_post_send with a single ib_send_wr - wait_for_completion() - Check for umr_context.status - Release the semaphore Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Reorder code in query device commandLeon Romanovsky1-11/+11
The order of features exposed by private mlx5-abi.h file is CQE zipping, packet pacing and multi-packet WQE. The internal order implemented in mlx5_ib_query_device() is multi-packet WQE, CQE zipping and packet pacing. Such difference hurts code readability, so let's sync, while mlx5-abi.h (exposed to userspace) is the primary order. This commit doesn't change any functionality. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net/mlx4_core: Fix raw qp flow steering rules under SRIOVJack Morgenstein1-2/+12
Demoting simple flow steering rule priority (for DPDK) was achieved by wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper function for the PF and all VFs. In function mlx4_ib_create_flow(), this change caused the main rule creation for the PF to be wrapped, while it left the associated tunnel steering rule creation unwrapped for the PF. This mismatch caused rule deletion failures in mlx4_ib_destroy_flow() for the PF when the detach wrapper function did not find the associated tunnel-steering rule (since creation of that rule for the PF did not go through the wrapper function). Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native" (so that the PF invocation does not go through the wrapper), and perform the required priority demotion for the PF in the mlx4_ib_create_flow() code path. Fixes: 48564135cba8 ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds6-6/+6
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-23Merge tag 'for-linus' of ↵Linus Torvalds14-41/+86
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "First round of -rc fixes for 4.10 kernel: - a series of qedr fixes - a series of rxe fixes - one i40iw fix - one cma fix - one cxgb4 fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/rxe: Don't check for null ptr in send() IB/rxe: Drop future atomic/read packets rather than retrying IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends qedr: Always notify the verb consumer of flushed CQEs qedr: clear the vendor error field in the work completion qedr: post_send/recv according to QP state qedr: ignore inline flag in read verbs qedr: modify QP state to error when destroying it qedr: return correct value on modify qp qedr: return error if destroy CQ failed qedr: configure the number of CQEs on CQ creation i40iw: Set 128B as the only supported RQ WQE size IB/cma: Fix a race condition in iboe_addr_get_sgid() IB/rxe: Fix a memory leak in rxe_qp_cleanup() iw_cxgb4: set correct FetchBurstMax for QPs
2016-12-22IB/rxe: Don't check for null ptr in send()Andrew Boyer1-2/+1
pkt->qp was already dereferenced earlier in the function. Fixes Smatch complaint: drivers/infiniband/sw/rxe/rxe_net.c:458 send() warn: variable dereferenced before check 'pkt->qp' (see line 441) Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22IB/rxe: Drop future atomic/read packets rather than retryingAndrew Boyer1-1/+1
If the completer is in the middle of a large read operation, one lost packet can cause havoc. Going to COMPST_ERROR_RETRY will cause the requester to resend the request. After that, any packet from the first attempt still in the receive queue will be interpreted as an error, restarting the error/retry sequence. The transfer will quickly exhaust its retries. This behavior is very noticeable when doing 512KB reads on a QEMU system configured with 1500B MTU. Also, a resent request here will prompt the responder on the other side to immediately start resending, but the resent packets will get stuck in the already-loaded receive queue and will never be processed. Rather than erroring out every time an unexpected future packet arrives, just drop it. Eventually the retry timer will send a duplicate request; the completer will be able to make progress since the queue will start relatively empty. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sendsAndrew Boyer1-1/+2
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: Always notify the verb consumer of flushed CQEsAmrani, Ram1-1/+1
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: clear the vendor error field in the work completionAmrani, Ram1-0/+3
We clear the vendor error field in the work completion so that if a work completion is erroneous the field won't confuse the caller. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: post_send/recv according to QP stateAmrani, Ram1-4/+4
Enable posting to SQ only in RTS, ERR and SQD QP state. Enable posting to RQ in ERR QP state. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: ignore inline flag in read verbsAmrani, Ram1-1/+3
In the current implementation a read verb with IB_SEND_INLINE may be illegally configured. In this fix we ignore the inline bit in the case of a read verb. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: modify QP state to error when destroying itAmrani, Ram1-2/+4
Current code didn't modify the QP state to error because it queried the QP state as a bitmap while it isn't. So the code never got executed. This patch fixes this and queries for each QP state respectively and not at once via a bitmask. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: return correct value on modify qpAmrani, Ram1-1/+1
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: return error if destroy CQ failedAmrani, Ram1-1/+6
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22qedr: configure the number of CQEs on CQ creationAmrani, Ram1-0/+3
Configure ibcq->cqe when a CQ is created. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22i40iw: Set 128B as the only supported RQ WQE sizeChien Tin Tung8-25/+53
RQ WQE size other than 128B is not supported. Correct RQ size calculation to use 128B only. Since this breaks ABI, add additional code to provide compatibility with v4 user provider, libi40iw. Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-18IB/rxe: Fix a memory leak in rxe_qp_cleanup()Bart Van Assche1-0/+1
A socket is associated with every QP by the rxe driver but sock_release() is never called. Add a call to sock_release() in rxe_qp_cleanup(). Fixes: commit 8700e3e7c48A5 ("Add Soft RoCE driver") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Moni Shoua <monis@mellanox.com> Cc: Kamal Heib <kamalh@mellanox.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: <stable@vger.kernel.org> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-18iw_cxgb4: set correct FetchBurstMax for QPsSteve Wise1-2/+3
The current QP FetchBurstMax value is 256B, which is incorrect since a WR can exceed that value. The result being a partial WR fetched by hardware, and a fatal "bad WR" error posted by the SGE. So bump the FetchBurstMax to 512B. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-15rdma: fix buggy code that the compiler warns aboutLinus Torvalds1-1/+2
Get rid of this warning: drivers/infiniband/sw/rdmavt/cq.c: In function ‘rvt_cq_exit’: drivers/infiniband/sw/rdmavt/cq.c:542:2: warning: ‘worker’ may be used uninitialized in this function [-Wmaybe-uninitialized] kthread_destroy_worker(worker); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ by fixing the function to actually work. Fixes: 6efaf10f163d ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") Cc: Petr Mladek <pmladek@suse.com> Cc: Doug Ledford <dledford@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15Merge tag 'for-linus' of ↵Linus Torvalds180-2638/+11640
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...
2016-12-14mm: add locked parameter to get_user_pages_remote()Lorenzo Stoakes1-1/+1
Patch series "mm: unexport __get_user_pages_unlocked()". This patch series continues the cleanup of get_user_pages*() functions taking advantage of the fact we can now pass gup_flags as we please. It firstly adds an additional 'locked' parameter to get_user_pages_remote() to allow for its callers to utilise VM_FAULT_RETRY functionality. This is necessary as the invocation of __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of this and no other existing higher level function would allow it to do so. Secondly existing callers of __get_user_pages_unlocked() are replaced with the appropriate higher-level replacement - get_user_pages_unlocked() if the current task and memory descriptor are referenced, or get_user_pages_remote() if other task/memory descriptors are referenced (having acquiring mmap_sem.) This patch (of 2): Add a int *locked parameter to get_user_pages_remote() to allow VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked(). Taking into account the previous adjustments to get_user_pages*() functions allowing for the passing of gup_flags, we are now in a position where __get_user_pages_unlocked() need only be exported for his ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to subsequently unexport __get_user_pages_unlocked() as well as allowing for future flexibility in the use of get_user_pages_remote(). [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change] Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14Merge branch 'vmw_pvrdma' into merge-testDoug Ledford16-0/+5710
2016-12-14IB: Add vmw_pvrdma driverAdit Ranadive16-0/+5710
This patch series adds a driver for a paravirtual RDMA device. The device is developed for VMware's Virtual Machines and allows existing RDMA applications to continue to use existing Verbs API when deployed in VMs on ESXi. We recently did a presentation in the OFA Workshop [1] regarding this device. Description and RDMA Support ============================ The virtual device is exposed as a dual function PCIe device. One part is a virtual network device (VMXNet3) which provides networking properties like MAC, IP addresses to the RDMA part of the device. The networking properties are used to register GIDs required by RDMA applications to communicate. These patches add support and the all required infrastructure for letting applications use such a device. We support the mandatory Verbs API as well as the base memory management extensions (Local Inv, Send with Inv and Fast Register Work Requests). We currently support both Reliable Connected and Unreliable Datagram QPs but do not support Shared Receive Queues (SRQs). Also, we support the following types of Work Requests: o Send/Receive (with or without Immediate Data) o RDMA Write (with or without Immediate Data) o RDMA Read o Local Invalidate o Send with Invalidate o Fast Register Work Requests This version only adds support for version 1 of RoCE. We will add RoCEv2 support in a future patch. We do support registration of both MAC-based and IP-based GIDs. I have also created a git tree for our user-level driver [2]. Testing ======= We have tested this internally for various types of Guest OS - Red Hat, Centos, Ubuntu 12.04/14.04/16.04, Oracle Enterprise Linux, SLES 12 using backported versions of this driver. The tests included several runs of the performance tests (included with OFED), Intel MPI PingPong benchmark on OpenMPI, krping for FRWRs. Mellanox has been kind enough to test the backported version of the driver internally on their hardware using a VMware provided ESX build. I have also applied and tested this with Doug's k.o/for-4.9 branch (commit 5603910b). Note, that this patch series should be applied all together. I split out the commits so that it may be easier to review. PVRDMA Resources ================ [1] OFA Workshop Presentation - https://openfabrics.org/images/eventpresos/2016presentations/102parardma.pdf [2] Libpvrdma User-level library - http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Reviewed-by: George Zhang <georgezhang@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Reviewed-by: Bryan Tan <bryantan@vmware.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14Merge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp' into merge-testDoug Ledford29-139/+367
2016-12-14Merge branch 'mlx' into merge-testDoug Ledford35-318/+740
2016-12-14Merge branch 'hfi1' into merge-testDoug Ledford42-648/+1677
2016-12-14Merge branches 'chelsio', 'debug-cleanup', 'hns' and 'i40iw' into merge-testDoug Ledford76-1531/+3143
2016-12-14IB/mlx4: fix improper return valuePan Bian1-1/+2
If uhw->inlen is non-zero, the value of variable err is 0 if the copy succeeds. Then, if kzalloc() or kmalloc() returns a NULL pointer, it will return 0 to the callers. As a result, the callers cannot detect the errors. This patch fixes the bug, assign "-ENOMEM" to err before the NULL pointer checks, and remove the initialization of err at the beginning. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189031 Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/ocrdma: fix bad initializationPan Bian1-1/+1
In function ocrdma_mbx_create_ah_tbl(), returns the value of status on errors. However, because status is initialized with 0, 0 will be returned even if on error paths. This patch initialize status with "-ENOMEM". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188831 Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14infiniband: nes: return value of skb_linearize should be handledZhouyi Zhou1-2/+6
Return value of skb_linearize should be handled in function nes_netdev_start_xmit. Compiled in x86_64 Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/core: fix unmap_sg argumentSebastian Ott1-1/+1
__ib_umem_release calls dma_unmap_sg with a different number of sg_entries than ib_umem_get uses for dma_map_sg. This might cause trouble for implementations that merge sglist entries and results in the following dma debug complaint: DMA-API: device driver frees DMA sg list with different entry count [map count=2] [unmap count=1] Fix it by using the correct value. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/mthca: Replace pci_pool_alloc by pci_pool_zallocSouptick Joarder1-4/+2
In mthca_create_ah(), pci_pool_alloc() followed by memset will be replaced by pci_pool_zalloc() Signed-off-by: Souptick joarder <jrdr.linux@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14mlx5, calc_sq_size(): Make a debug message more informativeBart Van Assche1-1/+2
Make it clear that qp->sq.wqe_cnt is not the number of WQEs. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Eli Cohen <eli@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14mlx5: Remove a set-but-not-used variableBart Van Assche1-7/+0
This has been detected by building the mlx5 driver with W=1. Fixes: 1a412fb1caa2 ('net/mlx5: Fixes: 1a412fb1caa2 (IB/mlx5: Modify QP commands via mlx5 ifc') Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Eli Cohen <eli@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>