summaryrefslogtreecommitdiffstats
path: root/drivers/scsi
AgeCommit message (Collapse)AuthorFilesLines
2022-09-25scsi: pm8001: Fix running_req for internal abort commandsJohn Garry1-0/+4
Disabling the remote phy for a SATA disk causes a hang: root@(none)$ more /sys/class/sas_phy/phy-0:0:8/target_port_protocols sata root@(none)$ echo 0 > sys/class/sas_phy/phy-0:0:8/enable root@(none)$ [ 67.855950] sas: ex 500e004aaaaaaa1f phy08 change count has changed [ 67.920585] sd 0:0:2:0: [sdc] Synchronizing SCSI cache [ 67.925780] sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK [ 67.935094] sd 0:0:2:0: [sdc] Stopping disk [ 67.939305] sd 0:0:2:0: [sdc] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK ... [ 123.998998] INFO: task kworker/u192:1:642 blocked for more than 30 seconds. [ 124.005960] Not tainted 6.0.0-rc1-205202-gf26f8f761e83 #218 [ 124.012049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 124.019872] task:kworker/u192:1 state:D stack:0 pid: 642 ppid: 2 flags:0x00000008 [ 124.028223] Workqueue: 0000:04:00.0_event_q sas_port_event_worker [ 124.034319] Call trace: [ 124.036758] __switch_to+0x128/0x278 [ 124.040333] __schedule+0x434/0xa58 [ 124.043820] schedule+0x94/0x138 [ 124.047045] schedule_timeout+0x2fc/0x368 [ 124.051052] wait_for_completion+0xdc/0x200 [ 124.055234] __flush_workqueue+0x1a8/0x708 [ 124.059328] sas_porte_broadcast_rcvd+0xa8/0xc0 [ 124.063858] sas_port_event_worker+0x60/0x98 [ 124.068126] process_one_work+0x3f8/0x660 [ 124.072134] worker_thread+0x70/0x700 [ 124.075793] kthread+0x1a4/0x1b8 [ 124.079014] ret_from_fork+0x10/0x20 The issue is that the per-device running_req read in pm8001_dev_gone_notify() never goes to zero and we never make progress. This is caused by missing accounting for running_req for when an internal abort command completes. In commit 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") we started to send internal abort commands as a proper sas_task. In this when we deliver a sas_task to HW the per-device running_req is incremented in pm8001_queue_command(). However it is never decremented for internal abort commnds, so decrement in pm8001_mpi_task_abort_resp(). Link: https://lore.kernel.org/r/1663854664-76165-1-git-send-email-john.garry@huawei.com Fixes: 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-25scsi: libsas: Fix use-after-free bug in smp_execute_task_sg()Duoming Zhou1-1/+1
When executing SMP task failed, the smp_execute_task_sg() calls del_timer() to delete "slow_task->timer". However, if the timer handler sas_task_internal_timedout() is running, the del_timer() in smp_execute_task_sg() will not stop it and a UAF will happen. The process is shown below: (thread 1) | (thread 2) smp_execute_task_sg() | sas_task_internal_timedout() ... | del_timer() | ... | ... sas_free_task(task) | kfree(task->slow_task) //FREE| | task->slow_task->... //USE Fix by calling del_timer_sync() in smp_execute_task_sg(), which makes sure the timer handler have finished before the "task->slow_task" is deallocated. Link: https://lore.kernel.org/r/20220920144213.10536-1-duoming@zju.edu.cn Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-25scsi: scsi_transport_fc: Adjust struct fc_nl_event flex array usageKees Cook1-3/+5
In order to help the compiler reason about the destination buffer in struct fc_nl_event, add a flexible array member for this purpose. However, since the header is UAPI, it must not change size or layout, so a union is used. The allocation size calculations are also corrected (it was potentially allocating an extra 8 bytes), and the padding is zeroed to avoid leaking kernel heap memory contents. Detected at run-time by the recently added memcpy() bounds checking: memcpy: detected field-spanning write (size 8) of single field "&event->event_data" at drivers/scsi/scsi_transport_fc.c:581 (size 4) Link: https://lore.kernel.org/linux-next/42404B5E-198B-4FD3-94D6-5E16CF579EF3@linux.ibm.com/ Link: https://lore.kernel.org/r/20220921205155.1451649-1-keescook@chromium.org Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Reported-by: Sachin Sant <sachinp@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-25scsi: core: Make SCSI_MOD depend on BLOCK for cleaner .config filesLukas Bulwahn1-3/+4
SCSI_MOD is a helper config symbol for configuring RAID_ATTRS properly, i.e., RAID_ATTRS needs to be m when SCSI=m. This helper config symbol SCSI_MOD still shows up even in kernel configurations that do not select the block subsystem and where SCSI is not even a configuration option mentioned and selectable. Make this SCSI_MOD depend on BLOCK, so that it only shows up when it is slightly relevant in the kernel configuration. Link: https://lore.kernel.org/r/20220919060112.24802-1-lukas.bulwahn@gmail.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-23scsi: storvsc: remove an extraneous "to" in a commentwangjianli1-1/+1
Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Link: https://lore.kernel.org/r/20220908130754.34999-1-wangjianli@cdjrlc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-23Drivers: hv: vmbus: Optimize vmbus_on_eventSaurabh Sengar1-0/+9
In the vmbus_on_event loop, 2 jiffies timer will not serve the purpose if callback_fn takes longer. For effective use move this check inside of callback functions where needed. Out of all the VMbus drivers using vmbus_on_event, only storvsc has a high packet volume, thus add this limit only in storvsc callback for now. There is no apparent benefit of loop itself because this tasklet will be scheduled anyway again if there are packets left in ring buffer. This patch removes this unnecessary loop as well. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1658741848-4210-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-15scsi: scsi_transport_fc: Use %u for dev_loss_tmoMartin Wilck1-1/+1
dev_loss_tmo is an unsigned value. Using "%d" as output format causes irritating negative values to be shown in sysfs. Link: https://lore.kernel.org/r/20220902131519.16513-1-mwilck@suse.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: csiostor: Convert sysfs snprintf() to sysfs_emit()Xuezhi Zhang1-5/+5
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Link: https://lore.kernel.org/r/20220901015130.419307-1-zhangxuezhi3@gmail.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Xuezhi Zhang <zhangxuezhi1@coolpad.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: megaraid: Convert sysfs snprintf() to sysfs_emit()Xuezhi Zhang1-2/+2
Fix up sysfs show entries to use sysfs_emit() Link: https://lore.kernel.org/r/20220831140325.396295-1-zhangxuezhi3@gmail.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Xuezhi Zhang <zhangxuezhi1@coolpad.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: ibmvscsi_tgt: Fix repeated words in commentwangjianli1-1/+1
Delete the redundant word 'to'. Link: https://lore.kernel.org/r/20220908130910.35680-1-wangjianli@cdjrlc.com Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: mpt3sas: Fix return value check of dma_get_required_mask()Sreekanth Reddy1-1/+1
Fix the incorrect return value check of dma_get_required_mask(). Due to this incorrect check, the driver was always setting the DMA mask to 63 bit. Link: https://lore.kernel.org/r/20220913120538.18759-2-sreekanth.reddy@broadcom.com Fixes: ba27c5cf286d ("scsi: mpt3sas: Don't change the DMA coherent mask after allocations") Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Update lpfc version to 14.2.0.7James Smart1-1/+1
Update lpfc version to 14.2.0.7 Link: https://lore.kernel.org/r/20220911221505.117655-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Fix various issues reported by toolsJames Smart6-118/+148
This patch fixes below Smatch reported issues: 1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl() warn: always true condition '(prg->dist < 4) => (0-3 < 4)' 4. lpfc_init.c:2419 lpfc_parse_vpd() warn: inconsistent indenting. 5. lpfc_init.c:13248 lpfc_sli4_enable_msi() warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535 'eqhdl->irq' 6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_cnt' 7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_size' 8. lpfc_vmid.c:248 lpfc_vmid_get_appid() warn: sleeping in atomic context. 9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup() warn: missing error code 'rc'. 10. lpfc_init.c:13573 lpfc_sli4_hba_unset() warn: variable dereferenced before check 'phba->pport' (see line 13546) 11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply() error: double free of 'hash_value' Fixes: 1. Initialize vlan_id to LPFC_FCOE_NULL_VID. 2. Initialize vlan_id to LPFC_FCOE_NULL_VID. 3. prg->dist is a 2 bit field. Its value can only be between 0-3. Remove redundent check 'if (prg->dist < 4)'. 4. Fix inconsistent indenting. Moved logic into helper function lpfc_fill_vpd(). 5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int. Also, check for return value of pci_irq_vector() and log message in case of failure. 6. Initialize 'ext_cnt' to 0. 7. Initialize 'ext_size' to 0. 8. Use alloc_percpu_gfp() with GFP_ATOMIC flag. 9. 'rc' was not updated when dma_pool_create() fails. Update 'rc = -ENOMEM' when dma_pool_create() fails before calling goto statement. 10. Add check for 'phba->pport' in lpfc_cpuhp_remove(). 11. Initialize 'hash_value' to NULL, same like 'aug_chal' variable. Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Add reporting capability for Link Degrade SignalingJames Smart8-68/+224
Firmware reports link degrade signaling via ACQES. Handlers and new additions to the SET_FEATURES mbox command are implemented so that link degrade parameters for 64GB capable links are reported through EDC ELS frames. Link: https://lore.kernel.org/r/20220911221505.117655-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Rework FDMI attribute registration for unintential paddingJames Smart2-669/+350
Removed the lpfc_fdmi_attr_entry and lpfc_fdmi_attr_def structures that had a union causing unintentional zero padding, which required the usage of __packed. They are replaced with explicit lpfc_fdmi_attr_u32, lpfc_fdmi_attr_wwn, lpfc_fdmi_attr_fc4types, and lpfc_fdmi_attr_string structure defines instead of living in a union. This rids of ambiguous compiler zero padding, and entailed cleaning up bitwise endian declarations. As such, all FDMI attribute registration routines are replaced with generic void *arg and handlers for each of the newly defined attribute structure types. Link: https://lore.kernel.org/r/20220911221505.117655-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Rework lpfc_fdmi_cmd() routine for cleanup and consistencyJames Smart1-23/+30
Switch case logics are reworked so they appear more similar and consistent. This eliminates compiler errors indicating unaligned pointer values and packed members. Added comments to explain previous size offset accumulations. Link: https://lore.kernel.org/r/20220911221505.117655-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Rename mp/bmp dma buffers to rq/rsp in lpfc_fdmi_cmdJames Smart1-31/+32
Clarify naming of the mp/bmp dma buffers: - Rename mp to rq as it is the request buffer - Rename bmp to rsp as it is the response buffer This reduces confusion about what the buffer content is based on their name. Link: https://lore.kernel.org/r/20220911221505.117655-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Update congestion mode logging for Emulex SAN Manager applicationJames Smart1-6/+22
If there is a congestion or automated congestion response mode change, then log the reported change to kmsg. Link: https://lore.kernel.org/r/20220911221505.117655-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Move scsi_host_template outside dynamically allocated/freed phbaJames Smart4-34/+48
On a PCI hotplug capable system, it is possible for scsi_device_put() to happen after lpfc_pci_remove_one() is called. As a result, the sdev->host->hostt->module dereference is for a previously freed memory location because the phba structure containing the hostt template was already freed when lpfc_pci_remove_one() returned. Since the lpfc module is still loaded during power slot disable, all scsi_host_templates should be declared as part of the global data segment instead of inside the heap allocated phba structure. This way the sdev->host->hostt memory area is always valid as long as the module is loaded regardless if PCI hotplug dynamically allocates or frees phba structures. Move all scsi_host_templates in the phba structure to global variables. Create a small helper routine to determine appropriate sg_tablesize during shost allocation. Link: https://lore.kernel.org/r/20220911221505.117655-7-jsmart2021@gmail.com Co-developed-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Co-developed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Daniel Wagner <dwagner@suse.de> Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Fix multiple NVMe remoteport registration calls for the same ↵James Smart3-41/+37
NPort ID When a target makes the mistake of registering a FC4 type with the fabric, but then rejects a PRLI of that type, the lpfc driver incorrectly retries the PRLI causing multiple registrations with the transport. The driver needs to detect the reject reason data and stop any retry. Rework the PRLI reject scenarios. Link: https://lore.kernel.org/r/20220911221505.117655-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Add missing free iocb and nlp kref put for early return VMID casesJames Smart1-3/+7
Sometimes VMID targets are not getting rediscovered after a port reset. The iocb is not freed in lpfc_cmpl_ct_cmd_vmid(), which is the completion function for the appid CT commands. So after a port reset, the count of sges is less than the expected count of 250. This causes post reset operation logic to fail and keep the port offline. Fix by freeing the iocb and kref put for the lpfc_cmpl_ct_cmd_vmid() early return cases. Link: https://lore.kernel.org/r/20220911221505.117655-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Fix mbuf pool resource detected as busy at driver unloadJames Smart1-0/+7
In a situation where the node state changes while a REG_LOGIN is in progress, the LPFC_MBOXQ_t structure is cleared and reused for an UNREG_LOGIN command to release RPI resources without first freeing the mbuf pool resource allocated for REG_LOGIN. Release mbuf pool resource prior to repurposing of the mailbox command structure from REG_LOGIN to UNREG_LOGIN. Link: https://lore.kernel.org/r/20220911221505.117655-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Fix FLOGI ACC with wrong SID in PT2PT topologyJames Smart1-2/+7
When a FLOGI is received before we have issued our FLOGI, the ACC response to the received FLOGI is issued with SID 2 instead of the expected fabric controller SID. Certain target vendors ignore the malformed ACC with SID 2 and wait for a properly filled ACC with a fabric controller SID. The lpfc_sli_prep_wqe() routine depends on the FC_PT2PT flag to fill in the fabric controller SID when in PT2PT mode, but due to a previous commit the flag was getting cleared. Fix by adding a check for the defer_flogi_acc flag to know whether or not to clear the FC_PT2PT flag on link up. Link: https://lore.kernel.org/r/20220911221505.117655-3-jsmart2021@gmail.com Fixes: 439b93293ff2 ("scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discovery") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: lpfc: Fix prli_fc4_req checks in PRLI handlingJames Smart1-2/+2
The if statment check (prli_fc4_req & PRLI_NVME_TYPE) evaluates to true when receiving a PRLI request for bogus FC4 type codes that happen to have the 3rd or 5th bit set because PRLI_NVME_TYPE is 0x28. This leads to sending a PRLI_NVME_ACC even for bogus FC4 type codes. Change the bitwise & check to an exact == type code check to ensure we send PRLI_NVME_ACC only for NVME type coded PRLI requests. Link: https://lore.kernel.org/r/20220911221505.117655-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: mpi3mr: Fix error code in mpi3mr_transport_smp_handler()Dan Carpenter1-2/+4
The error code from mpi3mr_post_transport_req() is supposed to be passed to bsg_job_done(job, rc, reslen), but it isn't. Link: https://lore.kernel.org/r/YyMISJzVDARpVwrr@kili Fixes: 176d4aa69c6e ("scsi: mpi3mr: Support SAS transport class callbacks") Acked-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: mpi3mr: Fix error codes in mpi3mr_report_manufacture()Dan Carpenter1-26/+32
There are three error paths which return success: 1) Propagate the error code from mpi3mr_post_transport_req() if it fails. 2) Return -EINVAL if "ioc_status != MPI3_IOCSTATUS_SUCCESS". 3) Return -EINVAL if "le16_to_cpu(mpi_reply.response_data_length) != sizeof(struct rep_manu_reply)" Link: https://lore.kernel.org/r/YyMIJh1HU2Qz9+Rs@kili Fixes: 2bd37e284914 ("scsi: mpi3mr: Add framework to issue MPT transport cmds") Acked-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: aic79xx: Use __ro_after_init explicitlyKees Cook1-1/+1
ahd_linux_setup_iocell_info() intentionally writes to the const-marked aic79xx_iocell_info array, but is called during __init, so the location is actually writable at this point on most architectures. Annotate this explicitly with __ro_after_init to avoid static analysis confusion. Link: https://lpc.events/event/16/contributions/1175/attachments/1109/2128/2022-LPC-analyzer-talk.pdf Link: https://lore.kernel.org/r/20220914115953.3854029-1-keescook@chromium.org Cc: Hannes Reinecke <hare@suse.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Reported-by: David Malcolm <dmalcolm@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: qla2xxx: Fix memory leak in __qlt_24xx_handle_abts()Rafael Mendonca1-1/+3
Commit 8f394da36a36 ("scsi: qla2xxx: Drop TARGET_SCF_LOOKUP_LUN_FROM_TAG") made the __qlt_24xx_handle_abts() function return early if tcm_qla2xxx_find_cmd_by_tag() didn't find a command, but it missed to clean up the allocated memory for the management command. Link: https://lore.kernel.org/r/20220914024924.695604-1-rafaelmendsr@gmail.com Fixes: 8f394da36a36 ("scsi: qla2xxx: Drop TARGET_SCF_LOOKUP_LUN_FROM_TAG") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: qla2xxx: Remove unused declarations for qla2xxxGaosheng Cui2-14/+0
qla2x00_get_fw_version_str() has been removed since commit abbd8870b9cb ("[SCSI] qla2xxx: Factor-out ISP specific functions to method-based call tables."). qla2x00_release_nvram_protection() has been removed since commit 459c537807bd ("[SCSI] qla2xxx: Add ISP24xx flash-manipulation routines."). qla82xx_rdmem() and qla82xx_wrmem() have been removed since commit 3711333dfbee ("[SCSI] qla2xxx: Updates for ISP82xx."). qla25xx_rd_req_reg(), qla24xx_rd_req_reg(), qla25xx_wrt_rsp_reg(), qla24xx_wrt_rsp_reg(), qla25xx_wrt_req_reg() and qla24xx_wrt_req_reg() have been removed since commit 08029990b25b ("[SCSI] qla2xxx: Refactor request/response-queue register handling."). qla2x00_async_login_done() has been removed since commit 726b85487067 ("qla2xxx: Add framework for async fabric discovery"). qlt_24xx_process_response_error() has been removed since commit c5419e2618b9 ("scsi: qla2xxx: Combine Active command arrays."). Remove the declarations for them from header file. Link: https://lore.kernel.org/r/20220913023722.547249-2-cuigaosheng1@huawei.com Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15scsi: qedf: Fix a UAF bug in __qedf_probe()Letu Ren1-5/+0
In __qedf_probe(), if qedf->cdev is NULL which means qed_ops->common->probe() failed, then the program will goto label err1, and scsi_host_put() will free lport->host pointer. Because the memory qedf points to is allocated by libfc_host_alloc(), it will be freed by scsi_host_put(). However, the if statement below label err0 only checks whether qedf is NULL but doesn't check whether the memory has been freed. So a UAF bug can occur. There are two ways to reach the statements below err0. The first one is described as before, "qedf" should be set to NULL. The second one is goto "err0" directly. In the latter scenario qedf hasn't been changed and it has the initial value NULL. As a result the if statement is not reachable in any situation. The KASAN logs are as follows: [ 2.312969] BUG: KASAN: use-after-free in __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] [ 2.312969] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 2.312969] Call Trace: [ 2.312969] dump_stack_lvl+0x59/0x7b [ 2.312969] print_address_description+0x7c/0x3b0 [ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] __kasan_report+0x160/0x1c0 [ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] kasan_report+0x4b/0x70 [ 2.312969] ? kobject_put+0x25d/0x290 [ 2.312969] kasan_check_range+0x2ca/0x310 [ 2.312969] __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] ? selinux_kernfs_init_security+0xdc/0x5f0 [ 2.312969] ? trace_rpm_return_int_rcuidle+0x18/0x120 [ 2.312969] ? rpm_resume+0xa5c/0x16e0 [ 2.312969] ? qedf_get_generic_tlv_data+0x160/0x160 [ 2.312969] local_pci_probe+0x13c/0x1f0 [ 2.312969] pci_device_probe+0x37e/0x6c0 Link: https://lore.kernel.org/r/20211112120641.16073-1-fantasquex@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Acked-by: Saurav Kashyap <skashyap@marvell.com> Co-developed-by: Wende Tan <twd2.me@gmail.com> Signed-off-by: Wende Tan <twd2.me@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-07freezer: Have {,un}lock_system_sleep() save/restore flagsPeter Zijlstra1-3/+4
Rafael explained that the reason for having both PF_NOFREEZE and PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from kthread context that has previously called set_freezable(). In preparation of merging the flags, have {,un}lock_system_slee() save and restore current->flags. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114648.725003428@infradead.org
2022-09-06scsi: qla2xxx: Fix spelling mistake "definiton" -> "definition"Colin Ian King1-1/+1
There is a spelling mistake in a MODULE_PARM_DESC description. Fix it. Link: https://lore.kernel.org/r/20220906140010.194273-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: mpt3sas: Fix use-after-free warningSreekanth Reddy1-1/+1
Fix the following use-after-free warning which is observed during controller reset: refcount_t: underflow; use-after-free. WARNING: CPU: 23 PID: 5399 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 Link: https://lore.kernel.org/r/20220906134908.1039-2-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Don't send bcast events from HW during nexus HA resetJohn Garry1-4/+12
Remote devices may go missing from the per-device nexus reset part of the HA nexus, i.e after the controller reset. This is because libsas may find the devices to be gone as the phy may be temporarily down when processing the bcast event generated from the nexus reset. Filter out bcast events during this time to stop the devices being lost. Link: https://lore.kernel.org/r/1662378529-101489-6-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Add helper to process bcast eventsJohn Garry5-13/+18
Add a helper for bcast processing to reduce duplication. Link: https://lore.kernel.org/r/1662378529-101489-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology()John Garry1-0/+7
In resetting the controller, SATA devices may be lost. The issue is that when we insert the bcast events to rescan the topology in hisi_sas_rescan_topology(), when we subsequently nexus reset the SATA devices in hisi_sas_async_I_T_nexus_reset(), there is a small timing window in which the remote phy is down and we process the bcast event (meaning that libsas judges that the disk is lost). Ensure that all bcast events are processed prior to the nexus reset to close this window. Link: https://lore.kernel.org/r/1662378529-101489-4-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Clear HISI_SAS_HW_FAULT_BIT earlierJohn Garry1-1/+1
Once the controller HW has been reset then we can unset flag HISI_SAS_HW_FAULT_BIT. In clearing this flag earlier we can now successfully execute commands in hisi_sas_controller_reset_done(), like bcast processing. Link: https://lore.kernel.org/r/1662378529-101489-3-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: hisi_sas: Revert change to limit max hw sectors for v3 HWJohn Garry1-7/+0
Now that libsas and the SCSI core code limits the default sectors from commit 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit") and commit 608128d391fa ("scsi: sd: allow max_sectors be capped at DMA optimal size limit"), there is no need for the hack to limit the max HW sectors. Link: https://lore.kernel.org/r/1662378529-101489-2-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: esas2r: Use flex array destination for memcpy()Kees Cook2-2/+2
In preparation for FORTIFY_SOURCE performing run-time destination buffer bounds checking for memcpy(), specify the destination output buffer explicitly, instead of asking memcpy() to write past the end of what looked like a fixed-size object. Silences future run-time warning: memcpy: detected field-spanning write (size 80) of single field "trc + 1" (size 64) There is no binary code output differences from this change. Link: https://lore.kernel.org/r/20220901205729.2260982-1-keescook@chromium.org Cc: Bradley Grove <linuxdrivers@attotech.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: 3w-9xxx: Avoid disabling device if failing to enable itLetu Ren1-1/+1
The original code will "goto out_disable_device" and call pci_disable_device() if pci_enable_device() fails. The kernel will generate a warning message like "3w-9xxx 0000:00:05.0: disabling already-disabled device". We shouldn't disable a device that failed to be enabled. A simple return is fine. Link: https://lore.kernel.org/r/20220829110115.38789-1-fantasquex@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: qlogicpti: Fix dma_map_sg() checkJack Wang1-1/+2
Add missing error check for dma_map_sg(). Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20220826101435.79170-1-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: core: Convert scsi_decide_disposition() to use SCSIML_STATMike Christie2-25/+11
Don't use: - DID_TARGET_FAILURE - DID_NEXUS_FAILURE - DID_ALLOC_FAILURE - DID_MEDIUM_ERROR Instead use the SCSI midlayer internal values. Link: https://lore.kernel.org/r/20220812010027.8251-10-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: core: Add error codes for internal SCSI midlayer useMike Christie3-0/+38
If a driver returns: - DID_TARGET_FAILURE - DID_NEXUS_FAILURE - DID_ALLOC_FAILURE - DID_MEDIUM_ERROR we hit a couple bugs: 1. The SCSI error handler runs because scsi_decide_disposition() has no case statements for them and we return FAILED. 2. For SG IO the userspace app gets a success status instead of failed, because scsi_result_to_blk_status() clears those errors. This patch adds a new internal error code byte for use by the SCSI midlayer. This will be used instead of the above error codes, so we don't have to play that clearing the host code game in scsi_result_to_blk_status() and drivers cannot accidentally use them. A subsequent commit will then remove the internal users of the above codes and convert us to use the new ones. Link: https://lore.kernel.org/r/20220812010027.8251-9-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: cxlflash: Drop DID_ALLOC_FAILURE useMike Christie1-1/+1
DID_ALLOC_FAILURE is internal to the SCSI layer. Drivers must not use it because: 1. It's not propagated upwards, so SG IO/passthrough users will not see an error and think a command was successful. 2. There is no handling for it in scsi_decide_disposition() so it results in entering SCSI error handling. By the code comment, it looks like the driver wanted a retryable error code, so this has it use DID_ERROR. Link: https://lore.kernel.org/r/20220812010027.8251-8-michael.christie@oracle.com Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: qla2xxx: Drop DID_TARGET_FAILURE useMike Christie1-1/+1
DID_TARGET_FAILURE is internal to the SCSI layer. Drivers must not use it because: 1. It's not propagated upwards, so SG IO/passthrough users will not see an error and think a command was successful. 2. There is no handling for it in scsi_decide_disposition() so it results in entering SCSI error handling. This has qla2xxx use DID_NO_CONNECT because it looks like we hit this error when we can't find a port. It will give us the same hard error behavior and it seems to match the error where we can't find the endpoint. Link: https://lore.kernel.org/r/20220812010027.8251-7-michael.christie@oracle.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: virtio_scsi: Drop DID_NEXUS_FAILURE useMike Christie1-1/+1
DID_NEXUS_FAILURE is internal to the SCSI layer. Drivers must not use it because: 1. It's not propagated upwards, so SG IO/passthrough users will not see an error and think a command was successful. 2. There is no handling for it in scsi_decide_disposition() so it results in entering SCSI error handling. virtio_scsi gets this when something like qemu returns VIRTIO_SCSI_S_NEXUS_FAILURE. It looks like qemu returns that error code if host OS returns DID_NEXUS_FAILURE (qemu's internal SCSI_HOST_RESERVATION_ERROR maps to DID_NEXUS_FAILURE). This shouldn't happen for Linux since we don't propagate that error code to userspace. This has us convert VIRTIO_SCSI_S_NEXUS_FAILURE to a SAM_STAT_RESERVATION_CONFLICT in case some other virt layer is returning it. In that case we will still get the reservation confict failure we expect. Link: https://lore.kernel.org/r/20220812010027.8251-6-michael.christie@oracle.com Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: virtio_scsi: Drop DID_TARGET_FAILURE useMike Christie1-1/+1
DID_TARGET_FAILURE is internal to the SCSI layer. Drivers must not use it because: 1. It's not propagated upwards, so SG IO/passthrough users will not see an error and think a command was successful. 2. There is no handling for it in scsi_decide_disposition() so it results in entering SCSI error handling. virtio_scsi gets this when something like qemu returns VIRTIO_SCSI_S_TARGET_FAILURE. It looks like qemu returns that error code if a host OS returns it, but this shouldn't happen for Linux since we never propagate that error to userspace. This has us use DID_BAD_TARGET in case some other virt layer is returning it. In that case we will still get a hard error like before and it conveys something unexpected happened. Link: https://lore.kernel.org/r/20220812010027.8251-5-michael.christie@oracle.com Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: storvsc: Drop DID_TARGET_FAILURE useMike Christie1-1/+1
DID_TARGET_FAILURE is internal to the SCSI layer. Drivers must not use it because: 1. It's not propagated upwards, so SG IO/passthrough users will not see an error and think a command was successful. 2. There is no handling for it in scsi_decide_disposition() so it results in the SCSI error handling running. It looks like the driver wanted a hard failure so swap it with DID_BAD_TARGET. Link: https://lore.kernel.org/r/20220812010027.8251-3-michael.christie@oracle.com Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-06scsi: xen: Drop use of internal host codesMike Christie1-8/+0
The error codes: - DID_TARGET_FAILURE - DID_NEXUS_FAILURE - DID_ALLOC_FAILURE - DID_MEDIUM_ERROR are internal to the SCSI layer. Drivers must not use them because: 1. They are not propagated upwards, so SG IO/passthrough users will not see an error and think a command was successful. xen-scsiback will never see this error and should not try to send it. 2. There is no handling for them in scsi_decide_disposition() so if xen-scsifront were to return the error to the SCSI midlayer then it kicks off the error handler which is definitely not what we want. Remove the use from xen-scsifront/back. Link: https://lore.kernel.org/r/20220812010027.8251-2-michael.christie@oracle.com Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-01scsi: core: Fix a use-after-freeBart Van Assche5-5/+21
There are two .exit_cmd_priv implementations. Both implementations use resources associated with the SCSI host. Make sure that these resources are still available when .exit_cmd_priv is called by waiting inside scsi_remove_host() until the tag set has been freed. This commit fixes the following use-after-free: ================================================================== BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp] Read of size 8 at addr ffff888100337000 by task multipathd/16727 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db kasan_report+0xab/0x120 srp_exit_cmd_priv+0x27/0xd0 [ib_srp] scsi_mq_exit_request+0x4d/0x70 blk_mq_free_rqs+0x143/0x410 __blk_mq_free_map_and_rqs+0x6e/0x100 blk_mq_free_tag_set+0x2b/0x160 scsi_host_dev_release+0xf3/0x1a0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_device_dev_release_usercontext+0x4c1/0x4e0 execute_in_process_context+0x23/0x90 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_disk_release+0x3f/0x50 device_release+0x54/0xe0 kobject_put+0xa5/0x120 disk_release+0x17f/0x1b0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 dm_put_table_device+0xa3/0x160 [dm_mod] dm_put_device+0xd0/0x140 [dm_mod] free_priority_group+0xd8/0x110 [dm_multipath] free_multipath+0x94/0xe0 [dm_multipath] dm_table_destroy+0xa2/0x1e0 [dm_mod] __dm_destroy+0x196/0x350 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x2c2/0x590 [dm_mod] dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lore.kernel.org/r/20220826002635.919423-1-bvanassche@acm.org Fixes: 65ca846a5314 ("scsi: core: Introduce {init,exit}_cmd_priv()") Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Reported-by: Li Zhijian <lizhijian@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>