From b112036535eda34460677ea883eaecc3a45a435d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 5 Jan 2021 00:41:04 +0100 Subject: scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression Phil Oester reported that a fix for a possible buffer overrun that I sent caused a regression that manifests in this output: Event Message: A PCI parity error was detected on a component at bus 0 device 5 function 0. Severity: Critical Message ID: PCI1308 The original code tried to handle the sense data pointer differently when using 32-bit 64-bit DMA addressing, which would lead to a 32-bit dma_addr_t value of 0x11223344 to get stored 32-bit kernel: 44 33 22 11 ?? ?? ?? ?? 64-bit LE kernel: 44 33 22 11 00 00 00 00 64-bit BE kernel: 00 00 00 00 44 33 22 11 or a 64-bit dma_addr_t value of 0x1122334455667788 to get stored as 32-bit kernel: 88 77 66 55 ?? ?? ?? ?? 64-bit kernel: 88 77 66 55 44 33 22 11 In my patch, I tried to ensure that the same value is used on both 32-bit and 64-bit kernels, and picked what seemed to be the most sensible combination, storing 32-bit addresses in the first four bytes (as 32-bit kernels already did), and 64-bit addresses in eight consecutive bytes (as 64-bit kernels already did), but evidently this was incorrect. Always storing the dma_addr_t pointer as 64-bit little-endian, i.e. initializing the second four bytes to zero in case of 32-bit addressing, apparently solved the problem for Phil, and is consistent with what all 64-bit little-endian machines did before. I also checked in the history that in previous versions of the code, the pointer was always in the first four bytes without padding, and that previous attempts to fix 64-bit user space, big-endian architectures and 64-bit DMA were clearly flawed and seem to have introduced made this worse. Link: https://lore.kernel.org/r/20210104234137.438275-1-arnd@kernel.org Fixes: 381d34e376e3 ("scsi: megaraid_sas: Check user-provided offsets") Fixes: 107a60dd71b5 ("scsi: megaraid_sas: Add support for 64bit consistent DMA") Fixes: 94cd65ddf4d7 ("[SCSI] megaraid_sas: addded support for big endian architecture") Fixes: 7b2519afa1ab ("[SCSI] megaraid_sas: fix 64 bit sense pointer truncation") Reported-by: Phil Oester Tested-by: Phil Oester Signed-off-by: Arnd Bergmann Signed-off-by: Martin K. Petersen --- drivers/scsi/megaraid/megaraid_sas_base.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index af192096a82b..63a4f48bdc75 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -8244,11 +8244,9 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance, goto out; } + /* always store 64 bits regardless of addressing */ sense_ptr = (void *)cmd->frame + ioc->sense_off; - if (instance->consistent_mask_64bit) - put_unaligned_le64(sense_handle, sense_ptr); - else - put_unaligned_le32(sense_handle, sense_ptr); + put_unaligned_le64(sense_handle, sense_ptr); } /* -- cgit v1.2.3 From 5e6ddadf7637d336acaad1df1f3bcbb07f7d104d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 5 Jan 2021 20:08:22 -0800 Subject: scsi: ufs: ufshcd-pltfrm depends on HAS_IOMEM Building ufshcd-pltfrm.c on arch/s390/ has a linker error since S390 does not support IOMEM, so add a dependency on HAS_IOMEM. s390-linux-ld: drivers/scsi/ufs/ufshcd-pltfrm.o: in function `ufshcd_pltfrm_init': ufshcd-pltfrm.c:(.text+0x38e): undefined reference to `devm_platform_ioremap_resource' where that devm_ function is inside an #ifdef CONFIG_HAS_IOMEM/#endif block. Link: lore.kernel.org/r/202101031125.ZEFCUiKi-lkp@intel.com Link: https://lore.kernel.org/r/20210106040822.933-1-rdunlap@infradead.org Fixes: 03b1781aa978 ("[SCSI] ufs: Add Platform glue driver for ufshcd") Cc: "James E.J. Bottomley" Cc: "Martin K. Petersen" Cc: Alim Akhtar Cc: Avri Altman Cc: linux-scsi@vger.kernel.org Reported-by: kernel test robot Signed-off-by: Randy Dunlap Signed-off-by: Martin K. Petersen --- drivers/scsi/ufs/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers/scsi') diff --git a/drivers/scsi/ufs/Kconfig b/drivers/scsi/ufs/Kconfig index 3f6dfed4fe84..b915b38c2b27 100644 --- a/drivers/scsi/ufs/Kconfig +++ b/drivers/scsi/ufs/Kconfig @@ -72,6 +72,7 @@ config SCSI_UFS_DWC_TC_PCI config SCSI_UFSHCD_PLATFORM tristate "Platform bus based UFS Controller support" depends on SCSI_UFSHCD + depends on HAS_IOMEM help This selects the UFS host controller support. Select this if you have an UFS controller on Platform bus. -- cgit v1.2.3 From 901d01c8e50c35a182073219a38b9c6391e59144 Mon Sep 17 00:00:00 2001 From: Tyrel Datwyler Date: Wed, 6 Jan 2021 14:37:21 -0600 Subject: scsi: ibmvfc: Fix missing cast of ibmvfc_event pointer to u64 handle Commit 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag commands") sets the vfcFrame correlation token to the pointer handle of the associated ibmvfc_event. However, that commit failed to cast the pointer to an appropriate type which in this case is a u64. As such sparse warnings are generated for both correlation token assignments. ibmvfc.c:2375:36: sparse: incorrect type in argument 1 (different base types) ibmvfc.c:2375:36: sparse: expected unsigned long long [usertype] val ibmvfc.c:2375:36: sparse: got struct ibmvfc_event *[assigned] evt Add the appropriate u64 casts when assigning an ibmvfc_event as a correlation token. Link: https://lore.kernel.org/r/20210106203721.1054693-1-tyreld@linux.ibm.com Fixes: 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag commands") Reported-by: kernel test robot Signed-off-by: Tyrel Datwyler Signed-off-by: Martin K. Petersen --- drivers/scsi/ibmvscsi/ibmvfc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 42e4d35e0d35..7312f31df878 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -1744,7 +1744,7 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd, iu->pri_task_attr = IBMVFC_SIMPLE_TASK; } - vfc_cmd->correlation = cpu_to_be64(evt); + vfc_cmd->correlation = cpu_to_be64((u64)evt); if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev)))) return ibmvfc_send_event(evt, vhost, 0); @@ -2418,7 +2418,7 @@ static int ibmvfc_abort_task_set(struct scsi_device *sdev) tmf->flags = cpu_to_be16((IBMVFC_NO_MEM_DESC | IBMVFC_TMF)); evt->sync_iu = &rsp_iu; - tmf->correlation = cpu_to_be64(evt); + tmf->correlation = cpu_to_be64((u64)evt); init_completion(&evt->comp); rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout); -- cgit v1.2.3 From 4ee7ee530bc2bae6268247988d86722c65d02a37 Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim Date: Thu, 7 Jan 2021 10:53:15 -0800 Subject: scsi: ufs: Fix livelock of ufshcd_clear_ua_wluns() When gate_work/ungate_work experience an error during hibern8_enter or exit we can livelock: ufshcd_err_handler() ufshcd_scsi_block_requests() ufshcd_reset_and_restore() ufshcd_clear_ua_wluns() -> stuck ufshcd_scsi_unblock_requests() In order to avoid this, ufshcd_clear_ua_wluns() can be called per recovery flows such as suspend/resume, link_recovery, and error_handler. Link: https://lore.kernel.org/r/20210107185316.788815-2-jaegeuk@kernel.org Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets") Reviewed-by: Can Guo Signed-off-by: Jaegeuk Kim Signed-off-by: Martin K. Petersen --- drivers/scsi/ufs/ufshcd.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index e31d2c5c7b23..cff4f52a91f0 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba) if (ret) dev_err(hba->dev, "%s: link recovery failed, err %d", __func__, ret); + else + ufshcd_clear_ua_wluns(hba); return ret; } @@ -6001,6 +6003,9 @@ skip_err_handling: ufshcd_scsi_unblock_requests(hba); ufshcd_err_handling_unprepare(hba); up(&hba->eh_sem); + + if (!err && needs_reset) + ufshcd_clear_ua_wluns(hba); } /** @@ -6938,14 +6943,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba) ufshcd_set_clk_freq(hba, true); err = ufshcd_hba_enable(hba); - if (err) - goto out; /* Establish the link again and restore the device */ - err = ufshcd_probe_hba(hba, false); if (!err) - ufshcd_clear_ua_wluns(hba); -out: + err = ufshcd_probe_hba(hba, false); + if (err) dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err); ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err); @@ -7716,6 +7718,8 @@ static int ufshcd_add_lus(struct ufs_hba *hba) if (ret) goto out; + ufshcd_clear_ua_wluns(hba); + /* Initialize devfreq after UFS device is detected */ if (ufshcd_is_clkscaling_supported(hba)) { memcpy(&hba->clk_scaling.saved_pwr_info.info, @@ -7917,8 +7921,6 @@ out: pm_runtime_put_sync(hba->dev); ufshcd_exit_clk_scaling(hba); ufshcd_hba_exit(hba); - } else { - ufshcd_clear_ua_wluns(hba); } } @@ -8775,6 +8777,7 @@ enable_gating: ufshcd_resume_clkscaling(hba); hba->clk_gating.is_suspended = false; hba->dev_info.b_rpm_dev_flush_capable = false; + ufshcd_clear_ua_wluns(hba); ufshcd_release(hba); out: if (hba->dev_info.b_rpm_dev_flush_capable) { @@ -8885,6 +8888,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op) cancel_delayed_work(&hba->rpm_dev_flush_recheck_work); } + ufshcd_clear_ua_wluns(hba); + /* Schedule clock gating in case of no access to UFS device yet */ ufshcd_release(hba); -- cgit v1.2.3 From eeb1b55b6e25c5f7265ff45cd050f3bc2cc423a4 Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim Date: Thu, 7 Jan 2021 10:53:16 -0800 Subject: scsi: ufs: Fix tm request when non-fatal error happens When non-fatal error like line-reset happens, ufshcd_err_handler() starts to abort tasks by ufshcd_try_to_abort_task(). When it tries to issue a task management request, we hit two warnings: WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70 WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c After fixing the above warnings we hit another tm_cmd timeout which may be caused by unstable controller state: __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out Then, ufshcd_err_handler() enters full reset, and kernel gets stuck. It turned out ufshcd_print_trs() printed too many messages on console which requires CPU locks. Likewise hba->silence_err_logs, we need to avoid too verbose messages. This is actually not an error case. Link: https://lore.kernel.org/r/20210107185316.788815-3-jaegeuk@kernel.org Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs") Reviewed-by: Can Guo Signed-off-by: Jaegeuk Kim Signed-off-by: Martin K. Petersen --- drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index cff4f52a91f0..fb32d122f2e3 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -4994,7 +4994,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) break; } /* end of switch */ - if ((host_byte(result) != DID_OK) && !hba->silence_err_logs) + if ((host_byte(result) != DID_OK) && + (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs) ufshcd_print_trs(hba, 1 << lrbp->task_tag, true); return result; } @@ -6300,9 +6301,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba) intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS); } - if (enabled_intr_status && retval == IRQ_NONE) { - dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n", - __func__, intr_status); + if (enabled_intr_status && retval == IRQ_NONE && + !ufshcd_eh_in_progress(hba)) { + dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 0x%08x)\n", + __func__, + intr_status, + hba->ufs_stats.last_intr_status, + enabled_intr_status); ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: "); } @@ -6346,7 +6351,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba, * Even though we use wait_event() which sleeps indefinitely, * the maximum wait time is bounded by %TM_CMD_TIMEOUT. */ - req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED); + req = blk_get_request(q, REQ_OP_DRV_OUT, 0); + if (IS_ERR(req)) + return PTR_ERR(req); + req->end_io_data = &wait; free_slot = req->tag; WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs); -- cgit v1.2.3 From 72eeb7c7151302ef007f1acd018cbf6f30e50321 Mon Sep 17 00:00:00 2001 From: Martin Wilck Date: Mon, 11 Jan 2021 15:25:41 +0100 Subject: scsi: scsi_transport_srp: Don't block target in failfast state If the port is in SRP_RPORT_FAIL_FAST state when srp_reconnect_rport() is entered, a transition to SDEV_BLOCK would be illegal, and a kernel WARNING would be triggered. Skip scsi_target_block() in this case. Link: https://lore.kernel.org/r/20210111142541.21534-1-mwilck@suse.com Reviewed-by: Bart Van Assche Signed-off-by: Martin Wilck Signed-off-by: Martin K. Petersen --- drivers/scsi/scsi_transport_srp.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index cba1cf6a1c12..1e939a2a387f 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -541,7 +541,14 @@ int srp_reconnect_rport(struct srp_rport *rport) res = mutex_lock_interruptible(&rport->mutex); if (res) goto out; - scsi_target_block(&shost->shost_gendev); + if (rport->state != SRP_RPORT_FAIL_FAST) + /* + * sdev state must be SDEV_TRANSPORT_OFFLINE, transition + * to SDEV_BLOCK is illegal. Calling scsi_target_unblock() + * later is ok though, scsi_internal_device_unblock_nowait() + * treats SDEV_TRANSPORT_OFFLINE like SDEV_BLOCK. + */ + scsi_target_block(&shost->shost_gendev); res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV; pr_debug("%s (state %d): transport.reconnect() returned %d\n", dev_name(&shost->shost_gendev), rport->state, res); -- cgit v1.2.3 From b2b0f16fa65e910a3ec8771206bb49ee87a54ac5 Mon Sep 17 00:00:00 2001 From: Javed Hasan Date: Tue, 15 Dec 2020 11:47:31 -0800 Subject: scsi: libfc: Avoid invoking response handler twice if ep is already completed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A race condition exists between the response handler getting called because of exchange_mgr_reset() (which clears out all the active XIDs) and the response we get via an interrupt. Sequence of events: rport ba0200: Port timeout, state PLOGI rport ba0200: Port entered PLOGI state from PLOGI state xid 1052: Exchange timer armed : 20000 msecs  xid timer armed here rport ba0200: Received LOGO request while in state PLOGI rport ba0200: Delete port rport ba0200: work event 3 rport ba0200: lld callback ev 3 bnx2fc: rport_event_hdlr: event = 3, port_id = 0xba0200 bnx2fc: ba0200 - rport not created Yet!! /* Here we reset any outstanding exchanges before freeing rport using the exch_mgr_reset() */ xid 1052: Exchange timer canceled /* Here we got two responses for one xid */ xid 1052: invoking resp(), esb 20000000 state 3 xid 1052: invoking resp(), esb 20000000 state 3 xid 1052: fc_rport_plogi_resp() : ep->resp_active 2 xid 1052: fc_rport_plogi_resp() : ep->resp_active 2 Skip the response if the exchange is already completed. Link: https://lore.kernel.org/r/20201215194731.2326-1-jhasan@marvell.com Signed-off-by: Javed Hasan Signed-off-by: Martin K. Petersen --- drivers/scsi/libfc/fc_exch.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c index d71afae6191c..841000445b9a 100644 --- a/drivers/scsi/libfc/fc_exch.c +++ b/drivers/scsi/libfc/fc_exch.c @@ -1623,8 +1623,13 @@ static void fc_exch_recv_seq_resp(struct fc_exch_mgr *mp, struct fc_frame *fp) rc = fc_exch_done_locked(ep); WARN_ON(fc_seq_exch(sp) != ep); spin_unlock_bh(&ep->ex_lock); - if (!rc) + if (!rc) { fc_exch_delete(ep); + } else { + FC_EXCH_DBG(ep, "ep is completed already," + "hence skip calling the resp\n"); + goto skip_resp; + } } /* @@ -1643,6 +1648,7 @@ static void fc_exch_recv_seq_resp(struct fc_exch_mgr *mp, struct fc_frame *fp) if (!fc_invoke_resp(ep, sp, fp)) fc_frame_free(fp); +skip_resp: fc_exch_release(ep); return; rel: @@ -1899,10 +1905,16 @@ static void fc_exch_reset(struct fc_exch *ep) fc_exch_hold(ep); - if (!rc) + if (!rc) { fc_exch_delete(ep); + } else { + FC_EXCH_DBG(ep, "ep is completed already," + "hence skip calling the resp\n"); + goto skip_resp; + } fc_invoke_resp(ep, sp, ERR_PTR(-FC_EX_CLOSED)); +skip_resp: fc_seq_set_resp(sp, NULL, ep->arg); fc_exch_release(ep); } -- cgit v1.2.3 From d6e3ae76728ccde49271d9f5acfebbea0c5625a3 Mon Sep 17 00:00:00 2001 From: Dinghao Liu Date: Fri, 25 Dec 2020 16:35:20 +0800 Subject: scsi: fnic: Fix memleak in vnic_dev_init_devcmd2 When ioread32() returns 0xFFFFFFFF, we should execute cleanup functions like other error handling paths before returning. Link: https://lore.kernel.org/r/20201225083520.22015-1-dinghao.liu@zju.edu.cn Acked-by: Karan Tilak Kumar Signed-off-by: Dinghao Liu Signed-off-by: Martin K. Petersen --- drivers/scsi/fnic/vnic_dev.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/fnic/vnic_dev.c b/drivers/scsi/fnic/vnic_dev.c index a2beee6e09f0..5988c300cc82 100644 --- a/drivers/scsi/fnic/vnic_dev.c +++ b/drivers/scsi/fnic/vnic_dev.c @@ -444,7 +444,8 @@ static int vnic_dev_init_devcmd2(struct vnic_dev *vdev) fetch_index = ioread32(&vdev->devcmd2->wq.ctrl->fetch_index); if (fetch_index == 0xFFFFFFFF) { /* check for hardware gone */ pr_err("error in devcmd2 init"); - return -ENODEV; + err = -ENODEV; + goto err_free_wq; } /* @@ -460,7 +461,7 @@ static int vnic_dev_init_devcmd2(struct vnic_dev *vdev) err = vnic_dev_alloc_desc_ring(vdev, &vdev->devcmd2->results_ring, DEVCMD2_RING_SIZE, DEVCMD2_DESC_SIZE); if (err) - goto err_free_wq; + goto err_disable_wq; vdev->devcmd2->result = (struct devcmd2_result *) vdev->devcmd2->results_ring.descs; @@ -481,8 +482,9 @@ static int vnic_dev_init_devcmd2(struct vnic_dev *vdev) err_free_desc_ring: vnic_dev_free_desc_ring(vdev, &vdev->devcmd2->results_ring); -err_free_wq: +err_disable_wq: vnic_wq_disable(&vdev->devcmd2->wq); +err_free_wq: vnic_wq_free(&vdev->devcmd2->wq); err_free_devcmd2: kfree(vdev->devcmd2); -- cgit v1.2.3 From 764907293edc1af7ac857389af9dc858944f53dc Mon Sep 17 00:00:00 2001 From: Brian King Date: Tue, 12 Jan 2021 09:06:38 -0600 Subject: scsi: ibmvfc: Set default timeout to avoid crash during migration While testing live partition mobility, we have observed occasional crashes of the Linux partition. What we've seen is that during the live migration, for specific configurations with large amounts of memory, slow network links, and workloads that are changing memory a lot, the partition can end up being suspended for 30 seconds or longer. This resulted in the following scenario: CPU 0 CPU 1 ------------------------------- ---------------------------------- scsi_queue_rq migration_store -> blk_mq_start_request -> rtas_ibm_suspend_me -> blk_add_timer -> on_each_cpu(rtas_percpu_suspend_me _______________________________________V | V -> IPI from CPU 1 -> rtas_percpu_suspend_me -> __rtas_suspend_last_cpu -- Linux partition suspended for > 30 seconds -- -> for_each_online_cpu(cpu) plpar_hcall_norets(H_PROD -> scsi_dispatch_cmd -> scsi_times_out -> scsi_abort_command -> queue_delayed_work -> ibmvfc_queuecommand_lck -> ibmvfc_send_event -> ibmvfc_send_crq - returns H_CLOSED <- returns SCSI_MLQUEUE_HOST_BUSY -> __blk_mq_requeue_request -> scmd_eh_abort_handler -> scsi_try_to_abort_cmd - returns SUCCESS -> scsi_queue_insert Normally, the SCMD_STATE_COMPLETE bit would protect against the command completion and the timeout, but that doesn't work here, since we don't check that at all in the SCSI_MLQUEUE_HOST_BUSY path. In this case we end up calling scsi_queue_insert on a request that has already been queued, or possibly even freed, and we crash. The patch below simply increases the default I/O timeout to avoid this race condition. This is also the timeout value that nearly all IBM SAN storage recommends setting as the default value. Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com Signed-off-by: Brian King Signed-off-by: Martin K. Petersen --- drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c index 7312f31df878..65f168c41d23 100644 --- a/drivers/scsi/ibmvscsi/ibmvfc.c +++ b/drivers/scsi/ibmvscsi/ibmvfc.c @@ -3007,8 +3007,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev) unsigned long flags = 0; spin_lock_irqsave(shost->host_lock, flags); - if (sdev->type == TYPE_DISK) + if (sdev->type == TYPE_DISK) { sdev->allow_restart = 1; + blk_queue_rq_timeout(sdev->request_queue, 120 * HZ); + } spin_unlock_irqrestore(shost->host_lock, flags); return 0; } -- cgit v1.2.3 From aa2c24e7f415e9c13635cee22ff4e15a80215551 Mon Sep 17 00:00:00 2001 From: Enzo Matsumiya Date: Mon, 18 Jan 2021 15:49:22 -0300 Subject: scsi: qla2xxx: Fix description for parameter ql2xenforce_iocb_limit Parameter ql2xenforce_iocb_limit is enabled by default. Link: https://lore.kernel.org/r/20210118184922.23793-1-ematsumiya@suse.de Fixes: 89c72f4245a8 ("scsi: qla2xxx: Add IOCB resource tracking") Reviewed-by: Daniel Wagner Reviewed-by: Himanshu Madhani Signed-off-by: Enzo Matsumiya Signed-off-by: Martin K. Petersen --- drivers/scsi/qla2xxx/qla_os.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/scsi') diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index f80abe28f35a..0e0fe5b09496 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -42,7 +42,7 @@ MODULE_PARM_DESC(ql2xfulldump_on_mpifail, int ql2xenforce_iocb_limit = 1; module_param(ql2xenforce_iocb_limit, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(ql2xenforce_iocb_limit, - "Enforce IOCB throttling, to avoid FW congestion. (default: 0)"); + "Enforce IOCB throttling, to avoid FW congestion. (default: 1)"); /* * CT6 CTX allocation cache -- cgit v1.2.3 From 8c65830ae1629b03e5d65e9aafae7e2cf5f8b743 Mon Sep 17 00:00:00 2001 From: James Smart Date: Wed, 27 Jan 2021 14:16:01 -0800 Subject: scsi: lpfc: Fix EEH encountering oops with NVMe traffic In testing, in a configuration with Redfish and native NVMe multipath when an EEH is injected, a kernel oops is being encountered: (unreliable) lpfc_nvme_ls_req+0x328/0x720 [lpfc] __nvme_fc_send_ls_req.constprop.13+0x1d8/0x3d0 [nvme_fc] nvme_fc_create_association+0x224/0xd10 [nvme_fc] nvme_fc_reset_ctrl_work+0x110/0x154 [nvme_fc] process_one_work+0x304/0x5d the NBMe transport is issuing a Disconnect LS request, which the driver receives and tries to post but the work queue used by the driver is already being torn down by the eeh. Fix by validating the validity of the work queue before proceeding with the LS transmit. Link: https://lore.kernel.org/r/20210127221601.84878-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne Signed-off-by: James Smart Signed-off-by: Martin K. Petersen --- drivers/scsi/lpfc/lpfc_nvme.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'drivers/scsi') diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c index 1cb82fa6a60e..39d147e251bf 100644 --- a/drivers/scsi/lpfc/lpfc_nvme.c +++ b/drivers/scsi/lpfc/lpfc_nvme.c @@ -559,6 +559,9 @@ __lpfc_nvme_ls_req(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp, return -ENODEV; } + if (!vport->phba->sli4_hba.nvmels_wq) + return -ENOMEM; + /* * there are two dma buf in the request, actually there is one and * the second one is just the start address + cmd size. -- cgit v1.2.3