nvme-rdma: remove timeout for getting RDMA-CM established event - linux - Linux Kernel (branches are rebased on master from time to time)

author	Israel Rukshin <israelr@nvidia.com>	2022-05-15 18:04:40 +0300
committer	Jens Axboe <axboe@kernel.dk>	2022-08-02 17:22:41 -0600
commit	0525af711b6676156fdffc1072c49ff1d1d5bc0f (patch)
tree	615b15709420f85d1161e5e959559eb50f97b289 /mm/early_ioremap.c
parent	7012eef520cb7cb12910fb799dfd4ad0ed256b77 (diff)
download	linux-0525af711b6676156fdffc1072c49ff1d1d5bc0f.tar.bz2

nvme-rdma: remove timeout for getting RDMA-CM established event

When many controllers start error recovery at the same time (i.e., when a port goes down and comes back up), they may never manage to reconnect. This is because the target can't handle all the connect requests within three seconds (the arbitrary value set today). Even if some of the connections are established, when a single queue fails to connect, all of the controller's queues are destroyed as well. So, on subsequent reconnection attempts the number of connect requests may remain the same.

To fix this, remove the timeout and wait for an RDMA-CM event to abort or complete the connect request. RDMA-CM sends an unreachable event when its timeout of ~90 seconds expires. Other RDMA-CM users, such as SRP and iSER, take the same approach in blocking mode.

The commit also renames NVME_RDMA_CONNECT_TIMEOUT_MS to NVME_RDMA_CM_TIMEOUT_MS.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Diffstat (limited to 'mm/early_ioremap.c')

0 files changed, 0 insertions, 0 deletions