summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/ulp
AgeCommit message (Collapse)AuthorFilesLines
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2-2/+2
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15Merge tag 'for-linus' of ↵Linus Torvalds8-66/+67
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...
2016-12-14Merge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp' into merge-testDoug Ledford5-32/+58
2016-12-14Merge branch 'mlx' into merge-testDoug Ledford1-2/+0
2016-12-14IB/srp: Make writing the add_target sysfs attr interruptibleBart Van Assche1-1/+4
Avoid that shutdown of srp_daemon is delayed if add_target_mutex is held by another process. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/srp: Make mapping failures easier to debugBart Van Assche1-2/+10
Make it easier to figure out what is going on if memory mapping fails because more memory regions than mr_per_cmd are needed. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/srp: Make login failures easier to debugBart Van Assche1-0/+3
If login fails because memory region allocation failed it can be hard to figure out what happened. Make it easier to figure out why login failed by logging a message if ib_alloc_mr() fails. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/srp: Introduce a local variable in srp_add_one()Bart Van Assche1-8/+9
This patch makes the srp_add_one() code more compact and does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n buildBart Van Assche1-5/+6
Avoid that the kernel build fails as follows if dynamic debug support is disabled: drivers/infiniband/ulp/srp/ib_srp.c:2272:3: error: implicit declaration of function 'DEFINE_DYNAMIC_DEBUG_METADATA' drivers/infiniband/ulp/srp/ib_srp.c:2272:33: error: 'ddm' undeclared (first use in this function) drivers/infiniband/ulp/srp/ib_srp.c:2275:39: error: '_DPRINTK_FLAGS_PRINT' undeclared (first use in this function) Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IPoIB: Avoid reading an uninitialized member variableBart Van Assche1-2/+5
This patch avoids that Coverity reports the following: Using uninitialized value port_attr.state when calling printk Fixes: commit 94232d9ce817 ("IPoIB: Start multicast join process only on active ports") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: <stable@vger.kernel.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/srpt: Report login failures only onceBart Van Assche1-13/+9
Report the following message only once if no ACL has been configured yet for an initiator port: "Rejected login because no ACL has been configured yet for initiator %s.\n" Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagig@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/isert: do not ignore errors in dma_map_single()Alexey Khoroshilov1-0/+6
There are several places, where errors in dma_map_single() are ignored. The patch fixes them. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14ib_isert: log the connection reject messageSteve Wise1-0/+2
Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14ib_iser: log the connection reject messageSteve Wise1-1/+4
Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03IB/ipoib: Remove and fix debug prints after allocation failureLeon Romanovsky3-15/+3
The prints after [k|v][m|z|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03IB/isert: Remove and fix debug prints after allocation failureLeon Romanovsky1-17/+6
The prints after [k|v][m|z|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/IPoIB: Remove can't use GFP_NOIO warningKamal Heib1-2/+0
Remove the warning print of "can't use of GFP_NOIO" to avoid prints in each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO. This print become more annoying when the IPoIB interface is configured to work in connected mode. Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-43/+64
Mostly simple overlapping changes. For example, David Ahern's adjacency list revamp in 'net-next' conflicted with an adjacency list traversal bug fix in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in misc driversJarod Wilson1-0/+1
firewire-net: - set min/max_mtu - remove fwnet_change_mtu nes: - set max_mtu - clean up nes_netdev_change_mtu xpnet: - set min/max_mtu - remove xpnet_dev_change_mtu hippi: - set min/max_mtu - remove hippi_change_mtu batman-adv: - set max_mtu - remove batadv_interface_change_mtu - initialization is a little async, not 100% certain that max_mtu is set in the optimal place, don't have hardware to test with rionet: - set min/max_mtu - remove rionet_change_mtu slip: - set min/max_mtu - streamline sl_change_mtu um/net_kern: - remove pointless ndo_change_mtu hsi/clients/ssi_protocol: - use core MTU range checking - remove now redundant ssip_pn_set_mtu ipoib: - set a default max MTU value - Note: ipoib's actual max MTU can vary, depending on if the device is in connected mode or not, so we'll just set the max_mtu value to the max possible, and let the ndo_change_mtu function continue to validate any new MTU change requests with checks for CM or not. Note that ipoib has no min_mtu set, and thus, the network core's mtu > 0 check is the only lower bounds here. mptlan: - use net core MTU range checking - remove now redundant mpt_lan_change_mtu fddi: - min_mtu = 21, max_mtu = 4470 - remove now redundant fddi_change_mtu (including export) fjes: - min_mtu = 8192, max_mtu = 65536 - The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to get past the core net MTU range checks so fjes_change_mtu can validate a new MTU against what it supports (see fjes_support_mtu in fjes_hw.c) hsr: - min_mtu = 0 (calls ether_setup, max_mtu is 1500) f_phonet: - min_mtu = 6, max_mtu = 65541 u_ether: - min_mtu = 14, max_mtu = 15412 phonet/pep-gprs: - min_mtu = 576, max_mtu = 65530 - remove redundant gprs_set_mtu CC: netdev@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: Stefan Richter <stefanr@s5r6.in-berlin.de> CC: Faisal Latif <faisal.latif@intel.com> CC: linux-rdma@vger.kernel.org CC: Cliff Whickman <cpw@sgi.com> CC: Robin Holt <robinmholt@gmail.com> CC: Jes Sorensen <jes@trained-monkey.org> CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sathya Prakash <sathya.prakash@broadcom.com> CC: Chaitra P B <chaitra.basappa@broadcom.com> CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> CC: MPT-FusionLinux.pdl@broadcom.com CC: Sebastian Reichel <sre@kernel.org> CC: Felipe Balbi <balbi@kernel.org> CC: Arvid Brodin <arvid.brodin@alten.se> CC: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18IB/ipoib: Flip to new dev walk APIDavid Ahern1-12/+25
Convert ipoib_get_net_dev_match_addr to the new upper device walk API. This is just a code conversion; no functional change is intended. v2 - removed typecast of data Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14IB/ipoib: move back IB LL address into the hard headerPaolo Abeni5-43/+64
After the commit 9207f9d45b0a ("net: preserve IP control block during GSO segmentation"), the GSO CB and the IPoIB CB conflict. That destroy the IPoIB address information cached there, causing a severe performance regression, as better described here: http://marc.info/?l=linux-kernel&m=146787279825501&w=2 This change moves the data cached by the IPoIB driver from the skb control lock into the IPoIB hard header, as done before the commit 936d7de3d736 ("IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses"). In order to avoid GRO issue, on packet reception, the IPoIB driver stash into the skb a dummy pseudo header, so that the received packets have actually a hard header matching the declared length. To avoid changing the connected mode maximum mtu, the allocated head buffer size is increased by the pseudo header length. After this commit, IPoIB performances are back to pre-regression value. v2 -> v3: rebased v1 -> v2: avoid changing the max mtu, increasing the head buf size Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-09Merge tag 'for-linus' of ↵Linus Torvalds10-59/+48
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
2016-10-07IB/srp: Fix infinite loop when FMR sg[0].offset != 0Bart Van Assche1-3/+5
Avoid that mapping an sg-list in which the first element has a non-zero offset triggers an infinite loop when using FMR. This patch makes the FMR mapping code similar to that of ib_sg_to_pages(). Note: older Mellanox HCAs do not support non-zero offsets for FMR. See also commit 8c4037b501ac ("IB/srp: always avoid non-zero offsets into an FMR"). Reported-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/srp: Remove an unused argumentBart Van Assche1-2/+2
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07ipoib: Make ipoib_warn ratelimitedkernel@kyup.com1-1/+7
In certain cases it's possible to be flooded by warning messages. To cope with such situations make the ipoib_warn macro be ratelimited. To prevent accidental limiting of legitimate, bursty messages make the limit fairly liberal by allowing up to 100 messages in 10 seconds. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/ipoib_verbs: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar1-1/+1
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues mulitple work items viz &priv->restart_task, &priv->cm.rx_reap_task, &priv->cm.skb_task, &priv->neigh_reap_task, &priv->ah_reap_task, &priv->mcast_task and &priv->carrier_on_task. The work items require strict execution ordering. Hence, an ordered dedicated workqueue has been used. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/ipoib: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar1-1/+2
alloc_ordered_workqueue() replaces deprecated create_singlethread_workqueue(). The workqueue "ipoib_workqueue" that is used for all flush operations for the device. WQ_MEM_RECLAIM has been set since the flush operations may need to complete in order for other network functions to continue, and the memory reclaim operation might need the network functioning in order to make progress. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/srp: use IB_PD_UNSAFE_GLOBAL_RKEYChristoph Hellwig2-27/+20
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/iser: use IB_PD_UNSAFE_GLOBAL_RKEYChristoph Hellwig3-21/+8
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig5-5/+5
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/ipoib: Don't allow MC joins during light MC flushAlex Vesker1-0/+9
This fix solves a race between light flush and on the fly joins. Light flush doesn't set the device to down and unset IPOIB_OPER_UP flag, this means that if while flushing we have a MC join in progress and the QP was attached to BC MGID we can have a mismatches when re-attaching a QP to the BC MGID. The light flush would set the broadcast group to NULL causing an on the fly join to rejoin and reattach to the BC MCG as well as adding the BC MGID to the multicast list. The flush process would later on remove the BC MGID and detach it from the QP. On the next flush the BC MGID is present in the multicast list but not found when trying to detach it because of the previous double attach and single detach. [18332.714265] ------------[ cut here ]------------ [18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core] ... [18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000 [18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300 [18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280 [18332.796199] Call Trace: [18332.798015] [<ffffffff813fed47>] dump_stack+0x63/0x8c [18332.801831] [<ffffffff8109add1>] __warn+0xd1/0xf0 [18332.805403] [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20 [18332.809706] [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core] [18332.814384] [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib] [18332.820031] [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib] [18332.825220] [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib] [18332.830290] [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib] [18332.834911] [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0 [18332.839741] [<ffffffff81772bd1>] rollback_registered+0x31/0x40 [18332.844091] [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80 [18332.848880] [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib] [18332.853848] [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib] [18332.858474] [<ffffffff81520c08>] dev_attr_store+0x18/0x30 [18332.862510] [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50 [18332.866349] [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170 [18332.870471] [<ffffffff81207198>] __vfs_write+0x28/0xe0 [18332.874152] [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50 [18332.878274] [<ffffffff81208062>] vfs_write+0xa2/0x1a0 [18332.881896] [<ffffffff812093a6>] SyS_write+0x46/0xa0 [18332.885632] [<ffffffff810039b7>] do_syscall_64+0x57/0xb0 [18332.889709] [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25 [18332.894727] ---[ end trace 09ebbe31f831ef17 ]--- Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/ipoib: Fix memory corruption in ipoib cm mode connect flowErez Shitrit3-1/+18
When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/isert: Properly release resources on DEVICE_REMOVALRaju Rangoju2-3/+22
When the low level driver exercises the hot unplug they would call rdma_cm cma_remove_one which would fire DEVICE_REMOVAL event to all cma consumers. Now, if consumer doesn't make sure they destroy all IB objects created on that IB device instance prior to finalizing all processing of DEVICE_REMOVAL callback, rdma_cm will let the lld to de-register with IB core and destroy the IB device instance. And if the consumer calls (say) ib_dereg_mr(), it will crash since that dev object is NULL. In the current implementation, iser-target just initiates the cleanup and returns from DEVICE_REMOVAL callback. This deferred work creates a race between iser-target cleaning IB objects(say MR) and lld destroying IB device instance. This patch includes the following fixes -> make sure that consumer frees all IB objects associated with device instance -> return non-zero from the callback to destroy the rdma_cm id Signed-off-by: Raju Rangoju <rajur@chelsio.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-24IB/srpt: Update sport->port_guid with each port refreshDoug Ledford1-4/+5
If port_guid is set with the default subnet_prefix, then we get a change event and run a port refresh, we don't update the port_guid. As a result, attempts to create a target device that uses the new subnet_prefix in the wwn will fail to find a match and be rejected by the ib_srpt driver. This makes it impossible to configure a port if it was initialized with a default subnet_prefix and later changed to any non-default subnet-prefix. Updating the port refresh task to always update the wwn based upon the current subnext_prefix solves this problem. Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: nab@linux-iscsi.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/isert: fix error return code in isert_alloc_login_buf()Wei Yongjun1-1/+1
Fix to return error code -ENOMEM from the alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge tag 'for-linus-2' of ↵Linus Torvalds4-10/+9
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of ↵Linus Torvalds3-11/+8
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford2-7/+6
2016-08-03IB/ipoib: Report SG feature regardless of HW UD CSUM capabilityYuval Shaia2-7/+6
Decouple SG support from HW ability to do UD checksum. This coupling is for historical reasons and removed with 'commit ec5f06156423 ("net: Kill link between CSUM and SG features.")' During driver load it is assumed that device does not supports SG. The final decision is taken after creating UD QP based on device capability. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/isert: Remove an unused member variableBart Van Assche2-3/+0
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/srpt: Simplify srpt_queue_response()Bart Van Assche1-5/+2
Initialize first_wr to &send_wr. This allows to remove a ternary operator and an else branch. This patch does not change the behavior of srpt_queue_response(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/srpt: Limit the number of SG elements per work requestBart Van Assche2-2/+7
Limit the number of SG elements per work request to what the HCA and the queue pair support. Fixes: 34693573fde0 ("IB/srpt: Reduce QP buffer size") Reported-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> #v4.7+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/ipoib: Use new device FW version stringIra Weiny1-4/+2
Using this allows for devices to specify the format of their firmware version rather than forcing a format. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/srpt: Reduce QP buffer sizeBart Van Assche2-2/+2
The memory needed for the send and receive queues associated with a QP is proportional to the max_sge parameter. The current value of that parameter is such that with an mlx4 HCA the QP buffer size is 8 MB. Since DMA is used for communication between HCA and CPU that buffer either has to be allocated coherently or map_single() must succeed for that buffer. Since large contiguous allocations are fragile and since the maximum segment size for e.g. swiotlb is 256 KB, reduce the max_sge parameter. This patch avoids that the following text appears on the console after SRP logout and relogin on a system equipped with multiple IB HCAs: mlx4_core 0000:05:00.0: swiotlb buffer is full (sz: 8388608 bytes) swiotlb: coherent allocation failed for device 0000:05:00.0 size=8388608 CPU: 11 PID: 148 Comm: kworker/11:1 Not tainted 4.7.0-rc4-dbg+ #1 Call Trace: [<ffffffff812c6d35>] dump_stack+0x67/0x92 [<ffffffff812efe71>] swiotlb_alloc_coherent+0x141/0x150 [<ffffffff810458be>] x86_swiotlb_alloc_coherent+0x3e/0x50 [<ffffffffa03861fa>] mlx4_buf_direct_alloc.isra.5+0x9a/0x120 [mlx4_core] [<ffffffffa0386545>] mlx4_buf_alloc+0x165/0x1a0 [mlx4_core] [<ffffffffa035053d>] create_qp_common.isra.29+0x57d/0xff0 [mlx4_ib] [<ffffffffa03510da>] mlx4_ib_create_qp+0x12a/0x3f0 [mlx4_ib] [<ffffffffa031154a>] ib_create_qp+0x3a/0x250 [ib_core] [<ffffffffa055dd4b>] srpt_cm_handler+0x4bb/0xcad [ib_srpt] [<ffffffffa02c1ab0>] cm_process_work+0x20/0xf0 [ib_cm] [<ffffffffa02c3640>] cm_work_handler+0x1ac0/0x2059 [ib_cm] [<ffffffff810737ed>] process_one_work+0x19d/0x490 [<ffffffff81073b29>] worker_thread+0x49/0x490 [<ffffffff8107a0ea>] kthread+0xea/0x100 [<ffffffff815b25af>] ret_from_fork+0x1f/0x40 Fixes: b99f8e4d7bcd ("IB/srpt: convert to the generic RDMA READ/WRITE API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Don't update neigh validity for unresolved entriesErez Shitrit1-1/+3
ipoib_neigh_get unconditionally updates the "alive" variable member on any packet send. This prevents the neighbor garbage collection from cleaning out a dead neighbor entry if we are still queueing packets for it. If the queue for this neighbor is full, then don't update the alive timestamp. That way the neighbor can time out even if packets are still being queued as long as none of them are being sent. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Disable bottom half when dealing with device addressMark Bloch3-11/+11
Align locking usage when touching device address with rest of the kernel. Lock the bottom half when doing so using netif_addr_lock_bh. This also solves the following case as reported by lockdep: CPU0 CPU1 ---- ---- lock(_xmit_INFINIBAND); local_irq_disable(); lock(&(&mc->mca_lock)->rlock); lock(_xmit_INFINIBAND); <Interrupt> lock(&(&mc->mca_lock)->rlock); *** DEADLOCK *** Fixes: 492a7e67ff83 ("IB/IPoIB: Allow setting the device address") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-07IB/IPoIB: Fix race between ipoib_remove_one to sysfs functionsErez Shitrit4-0/+14
In ipoib_remove_one the driver holds the rtnl_lock and tries to do some operation like dev_change_flags or unregister_netdev, while sysfs callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to free its sysfs directory, and we will get a->b, b->a deadlock. Trace like the following: schedule+0x37/0x80 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0xb5/0x120 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x17c/0x320 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib] delete_child+0x54/0x80 [ib_ipoib] dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 mutex_lock+0x16/0x40 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x16/0x75 And schedule+0x37/0x80 __kernfs_remove+0x1a8/0x260 ? wake_atomic_t_function+0x60/0x60 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x50/0x80 kobject_del+0x18/0x50 device_del+0x19f/0x260 netdev_unregister_kobject+0x6a/0x80 rollback_registered_many+0x1fd/0x340 rollback_registered+0x3c/0x70 unregister_netdevice_queue+0x55/0xc0 unregister_netdev+0x20/0x30 ipoib_remove_one+0x114/0x1b0 [ib_ipoib] ib_unregister_client+0x4a/0x170 [ib_core] ? find_module_all+0x71/0xa0 ipoib_cleanup_module+0x10/0x94 [ib_ipoib] SyS_delete_module+0x1b5/0x210 entry_SYSCALL_64_fastpath+0x16/0x75 The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to get out from the sysfs function. Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/srp: Fix srp_map_sg_dma()Bart Van Assche1-2/+1
Because patch "IB/srp: Move common code into the caller" was applied partially srp_map_sg_dma() doesn't work properly. Fix this by applying the remainder of that patch. See also http://thread.gmane.org/gmane.linux.drivers.rdma/35803/focus=35811. Fixes: 3849e44d1c4b ("IB/srp: Move common code into the caller") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Sagi Grimberg <sai@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: Laurence Oberman <loberman@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/srp: Always initialize use_fast_reg and use_fmrBart Van Assche1-3/+1
Avoid that mapping fails due to use_fast_reg != 0 or use_fmr != 0 if both member variables should be zero (if never_register == 1 or if neither FMR nor FR is supported). Remove an initialization that became superfluous due to changing a kmalloc() into a kzalloc() call. Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures") Cc: Sagi Grimberg <sai@grimberg.m> Cc: Christoph Hellwig <hch@lst.de> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-28Merge branch 'for-next' of ↵Linus Torvalds2-9/+11
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Here are the outstanding target pending updates for v4.7-rc1. The highlights this round include: - Allow external PR/ALUA metadata path be defined at runtime via top level configfs attribute (Lee) - Fix target session shutdown bug for ib_srpt multi-channel (hch) - Make TFO close_session() and shutdown_session() optional (hch) - Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref (hch) - Add tcm_qla2xxx endpoint attribute for basic FC jammer (Laurence) - Refactor iscsi-target RX/TX PDU encode/decode into common code (Varun) - Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu, validate_parameters, and get_r2t_ttt for generic ISO offload (Varun) - Initial merge of cxgb iscsi-segment offload target driver (Varun) The bulk of the changes are Chelsio's new driver, along with a number of iscsi-target common code improvements made by Varun + Co along the way" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (29 commits) iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute iscsi-target: Convert transport drivers to signal rdma_shutdown iscsi-target: Make iscsi_tpg_np driver show/store use generic code tcm_qla2xxx Add SCSI command jammer/discard capability iscsi-target: graceful disconnect on invalid mapping to iovec target: need_to_release is always false, remove redundant check and kfree target: remove sess_kref and ->shutdown_session iscsi-target: remove usage of ->shutdown_session tcm_qla2xxx: introduce a private sess_kref target: make close_session optional target: make ->shutdown_session optional target: remove acl_stop target: consolidate and fix session shutdown cxgbit: add files for cxgbit.ko iscsi-target: export symbols iscsi-target: call complete on conn_logout_comp iscsi-target: clear tx_thread_active iscsi-target: add new offload transport type iscsi-target: use conn_transport->transport_type in text rsp ...