summaryrefslogtreecommitdiffstats
path: root/drivers
diff options
context:
space:
mode:
authorHåkon Bugge <haakon.bugge@oracle.com>2017-06-20 14:07:50 +0200
committerDoug Ledford <dledford@redhat.com>2017-07-20 11:20:50 -0400
commit4542e3c79a2c5a167cbeb4f4190d5f705d272002 (patch)
tree13fc15f92dad0a2307f26ba91e7f4721d0e3ee06 /drivers
parenta25ce4270bfdd522207b02f81a594c7d1746b697 (diff)
downloadlinux-4542e3c79a2c5a167cbeb4f4190d5f705d272002.tar.bz2
IB/mlx4: Fix CM REQ retries in paravirt mode
CM REQs cannot be successfully retried, because a new pv_cm_id is created for each request, without checking if one already exists. By checking if an id exists before creating one, the bug is fixed. This bug can be provoked by running an RDMA CM user-land application, but inserting a five seconds delay before the rdma_accept() call on the passive side. This delay is larger than the default CMA timeout, and triggers a retry from the active side. The retried REQ will use another pv_cm_id (the cm_id on the wire). This confuses the CM protocol and two REJs are sent from the passive side. Here is an excerpt from ibdump running without the patch: 3.285092 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382711 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382861 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject 7.387644 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject and here is the same with bug fix applied: 3.251010 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.349387 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 8.258443 LID: 4 -> LID: 4 SDP 290 CM: ConnectReply(SDP Hello) 8.259890 LID: 4 -> LID: 4 InfiniBand 290 CM: ReadyToUse Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com> Tested-by: Wei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Doug Ledford <dledford@redhat.com>
Diffstat (limited to 'drivers')
-rw-r--r--drivers/infiniband/hw/mlx4/cm.c4
1 files changed, 4 insertions, 0 deletions
diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
index 1e6c526450d9..fedaf8260105 100644
--- a/drivers/infiniband/hw/mlx4/cm.c
+++ b/drivers/infiniband/hw/mlx4/cm.c
@@ -323,6 +323,9 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
mad->mad_hdr.attr_id == CM_REP_ATTR_ID ||
mad->mad_hdr.attr_id == CM_SIDR_REQ_ATTR_ID) {
sl_cm_id = get_local_comm_id(mad);
+ id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
+ if (id)
+ goto cont;
id = id_map_alloc(ibdev, slave_id, sl_cm_id);
if (IS_ERR(id)) {
mlx4_ib_warn(ibdev, "%s: id{slave: %d, sl_cm_id: 0x%x} Failed to id_map_alloc\n",
@@ -343,6 +346,7 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
return -EINVAL;
}
+cont:
set_local_comm_id(mad, id->pv_cm_id);
if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID)