Age | Commit message (Collapse) | Author | Files | Lines |
|
Disabling the remote phy for a SATA disk causes a hang:
root@(none)$ more /sys/class/sas_phy/phy-0:0:8/target_port_protocols
sata
root@(none)$ echo 0 > sys/class/sas_phy/phy-0:0:8/enable
root@(none)$ [ 67.855950] sas: ex 500e004aaaaaaa1f phy08 change count has changed
[ 67.920585] sd 0:0:2:0: [sdc] Synchronizing SCSI cache
[ 67.925780] sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK
[ 67.935094] sd 0:0:2:0: [sdc] Stopping disk
[ 67.939305] sd 0:0:2:0: [sdc] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK
...
[ 123.998998] INFO: task kworker/u192:1:642 blocked for more than 30 seconds.
[ 124.005960] Not tainted 6.0.0-rc1-205202-gf26f8f761e83 #218
[ 124.012049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 124.019872] task:kworker/u192:1 state:D stack:0 pid: 642 ppid: 2 flags:0x00000008
[ 124.028223] Workqueue: 0000:04:00.0_event_q sas_port_event_worker
[ 124.034319] Call trace:
[ 124.036758] __switch_to+0x128/0x278
[ 124.040333] __schedule+0x434/0xa58
[ 124.043820] schedule+0x94/0x138
[ 124.047045] schedule_timeout+0x2fc/0x368
[ 124.051052] wait_for_completion+0xdc/0x200
[ 124.055234] __flush_workqueue+0x1a8/0x708
[ 124.059328] sas_porte_broadcast_rcvd+0xa8/0xc0
[ 124.063858] sas_port_event_worker+0x60/0x98
[ 124.068126] process_one_work+0x3f8/0x660
[ 124.072134] worker_thread+0x70/0x700
[ 124.075793] kthread+0x1a4/0x1b8
[ 124.079014] ret_from_fork+0x10/0x20
The issue is that the per-device running_req read in
pm8001_dev_gone_notify() never goes to zero and we never make progress.
This is caused by missing accounting for running_req for when an internal
abort command completes.
In commit 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support")
we started to send internal abort commands as a proper sas_task. In this
when we deliver a sas_task to HW the per-device running_req is incremented
in pm8001_queue_command(). However it is never decremented for internal
abort commnds, so decrement in pm8001_mpi_task_abort_resp().
Link: https://lore.kernel.org/r/1663854664-76165-1-git-send-email-john.garry@huawei.com
Fixes: 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support")
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When executing SMP task failed, the smp_execute_task_sg() calls del_timer()
to delete "slow_task->timer". However, if the timer handler
sas_task_internal_timedout() is running, the del_timer() in
smp_execute_task_sg() will not stop it and a UAF will happen. The process
is shown below:
(thread 1) | (thread 2)
smp_execute_task_sg() | sas_task_internal_timedout()
... |
del_timer() |
... | ...
sas_free_task(task) |
kfree(task->slow_task) //FREE|
| task->slow_task->... //USE
Fix by calling del_timer_sync() in smp_execute_task_sg(), which makes sure
the timer handler have finished before the "task->slow_task" is
deallocated.
Link: https://lore.kernel.org/r/20220920144213.10536-1-duoming@zju.edu.cn
Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In order to help the compiler reason about the destination buffer in struct
fc_nl_event, add a flexible array member for this purpose. However, since
the header is UAPI, it must not change size or layout, so a union is used.
The allocation size calculations are also corrected (it was potentially
allocating an extra 8 bytes), and the padding is zeroed to avoid leaking
kernel heap memory contents.
Detected at run-time by the recently added memcpy() bounds checking:
memcpy: detected field-spanning write (size 8) of single field "&event->event_data" at drivers/scsi/scsi_transport_fc.c:581 (size 4)
Link: https://lore.kernel.org/linux-next/42404B5E-198B-4FD3-94D6-5E16CF579EF3@linux.ibm.com/
Link: https://lore.kernel.org/r/20220921205155.1451649-1-keescook@chromium.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
SCSI_MOD is a helper config symbol for configuring RAID_ATTRS properly,
i.e., RAID_ATTRS needs to be m when SCSI=m.
This helper config symbol SCSI_MOD still shows up even in kernel
configurations that do not select the block subsystem and where SCSI is not
even a configuration option mentioned and selectable.
Make this SCSI_MOD depend on BLOCK, so that it only shows up when it is
slightly relevant in the kernel configuration.
Link: https://lore.kernel.org/r/20220919060112.24802-1-lukas.bulwahn@gmail.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Signed-off-by: wangjianli <wangjianli@cdjrlc.com>
Link: https://lore.kernel.org/r/20220908130754.34999-1-wangjianli@cdjrlc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
In the vmbus_on_event loop, 2 jiffies timer will not serve the purpose if
callback_fn takes longer. For effective use move this check inside of
callback functions where needed. Out of all the VMbus drivers using
vmbus_on_event, only storvsc has a high packet volume, thus add this limit
only in storvsc callback for now.
There is no apparent benefit of loop itself because this tasklet will be
scheduled anyway again if there are packets left in ring buffer. This
patch removes this unnecessary loop as well.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1658741848-4210-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
dev_loss_tmo is an unsigned value. Using "%d" as output format causes
irritating negative values to be shown in sysfs.
Link: https://lore.kernel.org/r/20220902131519.16513-1-mwilck@suse.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the value
to be returned to user space.
Link: https://lore.kernel.org/r/20220901015130.419307-1-zhangxuezhi3@gmail.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Xuezhi Zhang <zhangxuezhi1@coolpad.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix up sysfs show entries to use sysfs_emit()
Link: https://lore.kernel.org/r/20220831140325.396295-1-zhangxuezhi3@gmail.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Xuezhi Zhang <zhangxuezhi1@coolpad.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Delete the redundant word 'to'.
Link: https://lore.kernel.org/r/20220908130910.35680-1-wangjianli@cdjrlc.com
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: wangjianli <wangjianli@cdjrlc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix the incorrect return value check of dma_get_required_mask(). Due to
this incorrect check, the driver was always setting the DMA mask to 63 bit.
Link: https://lore.kernel.org/r/20220913120538.18759-2-sreekanth.reddy@broadcom.com
Fixes: ba27c5cf286d ("scsi: mpt3sas: Don't change the DMA coherent mask after allocations")
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Update lpfc version to 14.2.0.7
Link: https://lore.kernel.org/r/20220911221505.117655-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch fixes below Smatch reported issues:
1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec()
error: uninitialized symbol 'vlan_id'.
3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl()
warn: always true condition '(prg->dist < 4) => (0-3 < 4)'
4. lpfc_init.c:2419 lpfc_parse_vpd()
warn: inconsistent indenting.
5. lpfc_init.c:13248 lpfc_sli4_enable_msi()
warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535
'eqhdl->irq'
6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get()
error: uninitialized symbol 'ext_cnt'
7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get()
error: uninitialized symbol 'ext_size'
8. lpfc_vmid.c:248 lpfc_vmid_get_appid()
warn: sleeping in atomic context.
9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup()
warn: missing error code 'rc'.
10. lpfc_init.c:13573 lpfc_sli4_hba_unset()
warn: variable dereferenced before check 'phba->pport' (see
line 13546)
11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply()
error: double free of 'hash_value'
Fixes:
1. Initialize vlan_id to LPFC_FCOE_NULL_VID.
2. Initialize vlan_id to LPFC_FCOE_NULL_VID.
3. prg->dist is a 2 bit field. Its value can only be between 0-3.
Remove redundent check 'if (prg->dist < 4)'.
4. Fix inconsistent indenting. Moved logic into helper function
lpfc_fill_vpd().
5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int.
Also, check for return value of pci_irq_vector() and log message in
case of failure.
6. Initialize 'ext_cnt' to 0.
7. Initialize 'ext_size' to 0.
8. Use alloc_percpu_gfp() with GFP_ATOMIC flag.
9. 'rc' was not updated when dma_pool_create() fails. Update 'rc =
-ENOMEM' when dma_pool_create() fails before calling goto statement.
10. Add check for 'phba->pport' in lpfc_cpuhp_remove().
11. Initialize 'hash_value' to NULL, same like 'aug_chal' variable.
Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Firmware reports link degrade signaling via ACQES.
Handlers and new additions to the SET_FEATURES mbox command are implemented
so that link degrade parameters for 64GB capable links are reported through
EDC ELS frames.
Link: https://lore.kernel.org/r/20220911221505.117655-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Removed the lpfc_fdmi_attr_entry and lpfc_fdmi_attr_def structures that had
a union causing unintentional zero padding, which required the usage of
__packed. They are replaced with explicit lpfc_fdmi_attr_u32,
lpfc_fdmi_attr_wwn, lpfc_fdmi_attr_fc4types, and lpfc_fdmi_attr_string
structure defines instead of living in a union. This rids of ambiguous
compiler zero padding, and entailed cleaning up bitwise endian
declarations.
As such, all FDMI attribute registration routines are replaced with generic
void *arg and handlers for each of the newly defined attribute structure
types.
Link: https://lore.kernel.org/r/20220911221505.117655-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Switch case logics are reworked so they appear more similar and
consistent. This eliminates compiler errors indicating unaligned pointer
values and packed members.
Added comments to explain previous size offset accumulations.
Link: https://lore.kernel.org/r/20220911221505.117655-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Clarify naming of the mp/bmp dma buffers:
- Rename mp to rq as it is the request buffer
- Rename bmp to rsp as it is the response buffer
This reduces confusion about what the buffer content is based on their
name.
Link: https://lore.kernel.org/r/20220911221505.117655-9-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If there is a congestion or automated congestion response mode change, then
log the reported change to kmsg.
Link: https://lore.kernel.org/r/20220911221505.117655-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
On a PCI hotplug capable system, it is possible for scsi_device_put() to
happen after lpfc_pci_remove_one() is called. As a result, the
sdev->host->hostt->module dereference is for a previously freed memory
location because the phba structure containing the hostt template was
already freed when lpfc_pci_remove_one() returned.
Since the lpfc module is still loaded during power slot disable, all
scsi_host_templates should be declared as part of the global data segment
instead of inside the heap allocated phba structure. This way the
sdev->host->hostt memory area is always valid as long as the module is
loaded regardless if PCI hotplug dynamically allocates or frees phba
structures.
Move all scsi_host_templates in the phba structure to global variables.
Create a small helper routine to determine appropriate sg_tablesize during
shost allocation.
Link: https://lore.kernel.org/r/20220911221505.117655-7-jsmart2021@gmail.com
Co-developed-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Co-developed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
NPort ID
When a target makes the mistake of registering a FC4 type with the fabric,
but then rejects a PRLI of that type, the lpfc driver incorrectly retries
the PRLI causing multiple registrations with the transport. The driver
needs to detect the reject reason data and stop any retry.
Rework the PRLI reject scenarios.
Link: https://lore.kernel.org/r/20220911221505.117655-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Sometimes VMID targets are not getting rediscovered after a port reset.
The iocb is not freed in lpfc_cmpl_ct_cmd_vmid(), which is the completion
function for the appid CT commands. So after a port reset, the count of
sges is less than the expected count of 250. This causes post reset
operation logic to fail and keep the port offline.
Fix by freeing the iocb and kref put for the lpfc_cmpl_ct_cmd_vmid() early
return cases.
Link: https://lore.kernel.org/r/20220911221505.117655-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In a situation where the node state changes while a REG_LOGIN is in
progress, the LPFC_MBOXQ_t structure is cleared and reused for an
UNREG_LOGIN command to release RPI resources without first freeing the mbuf
pool resource allocated for REG_LOGIN.
Release mbuf pool resource prior to repurposing of the mailbox command
structure from REG_LOGIN to UNREG_LOGIN.
Link: https://lore.kernel.org/r/20220911221505.117655-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When a FLOGI is received before we have issued our FLOGI, the ACC response
to the received FLOGI is issued with SID 2 instead of the expected fabric
controller SID. Certain target vendors ignore the malformed ACC with SID 2
and wait for a properly filled ACC with a fabric controller SID.
The lpfc_sli_prep_wqe() routine depends on the FC_PT2PT flag to fill in the
fabric controller SID when in PT2PT mode, but due to a previous commit the
flag was getting cleared. Fix by adding a check for the defer_flogi_acc
flag to know whether or not to clear the FC_PT2PT flag on link up.
Link: https://lore.kernel.org/r/20220911221505.117655-3-jsmart2021@gmail.com
Fixes: 439b93293ff2 ("scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discovery")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The if statment check (prli_fc4_req & PRLI_NVME_TYPE) evaluates to true
when receiving a PRLI request for bogus FC4 type codes that happen to have
the 3rd or 5th bit set because PRLI_NVME_TYPE is 0x28. This leads to
sending a PRLI_NVME_ACC even for bogus FC4 type codes.
Change the bitwise & check to an exact == type code check to ensure we send
PRLI_NVME_ACC only for NVME type coded PRLI requests.
Link: https://lore.kernel.org/r/20220911221505.117655-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The error code from mpi3mr_post_transport_req() is supposed to be passed to
bsg_job_done(job, rc, reslen), but it isn't.
Link: https://lore.kernel.org/r/YyMISJzVDARpVwrr@kili
Fixes: 176d4aa69c6e ("scsi: mpi3mr: Support SAS transport class callbacks")
Acked-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There are three error paths which return success:
1) Propagate the error code from mpi3mr_post_transport_req() if it fails.
2) Return -EINVAL if "ioc_status != MPI3_IOCSTATUS_SUCCESS".
3) Return -EINVAL if "le16_to_cpu(mpi_reply.response_data_length) !=
sizeof(struct rep_manu_reply)"
Link: https://lore.kernel.org/r/YyMIJh1HU2Qz9+Rs@kili
Fixes: 2bd37e284914 ("scsi: mpi3mr: Add framework to issue MPT transport cmds")
Acked-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
ahd_linux_setup_iocell_info() intentionally writes to the const-marked
aic79xx_iocell_info array, but is called during __init, so the location is
actually writable at this point on most architectures. Annotate this
explicitly with __ro_after_init to avoid static analysis confusion.
Link: https://lpc.events/event/16/contributions/1175/attachments/1109/2128/2022-LPC-analyzer-talk.pdf
Link: https://lore.kernel.org/r/20220914115953.3854029-1-keescook@chromium.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Reported-by: David Malcolm <dmalcolm@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit 8f394da36a36 ("scsi: qla2xxx: Drop TARGET_SCF_LOOKUP_LUN_FROM_TAG")
made the __qlt_24xx_handle_abts() function return early if
tcm_qla2xxx_find_cmd_by_tag() didn't find a command, but it missed to clean
up the allocated memory for the management command.
Link: https://lore.kernel.org/r/20220914024924.695604-1-rafaelmendsr@gmail.com
Fixes: 8f394da36a36 ("scsi: qla2xxx: Drop TARGET_SCF_LOOKUP_LUN_FROM_TAG")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
qla2x00_get_fw_version_str() has been removed since commit abbd8870b9cb
("[SCSI] qla2xxx: Factor-out ISP specific functions to method-based call
tables.").
qla2x00_release_nvram_protection() has been removed since commit
459c537807bd ("[SCSI] qla2xxx: Add ISP24xx flash-manipulation routines.").
qla82xx_rdmem() and qla82xx_wrmem() have been removed since commit
3711333dfbee ("[SCSI] qla2xxx: Updates for ISP82xx.").
qla25xx_rd_req_reg(), qla24xx_rd_req_reg(), qla25xx_wrt_rsp_reg(),
qla24xx_wrt_rsp_reg(), qla25xx_wrt_req_reg() and qla24xx_wrt_req_reg() have
been removed since commit 08029990b25b ("[SCSI] qla2xxx: Refactor
request/response-queue register handling.").
qla2x00_async_login_done() has been removed since commit 726b85487067
("qla2xxx: Add framework for async fabric discovery").
qlt_24xx_process_response_error() has been removed since commit
c5419e2618b9 ("scsi: qla2xxx: Combine Active command arrays.").
Remove the declarations for them from header file.
Link: https://lore.kernel.org/r/20220913023722.547249-2-cuigaosheng1@huawei.com
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In __qedf_probe(), if qedf->cdev is NULL which means
qed_ops->common->probe() failed, then the program will goto label err1, and
scsi_host_put() will free lport->host pointer. Because the memory qedf
points to is allocated by libfc_host_alloc(), it will be freed by
scsi_host_put(). However, the if statement below label err0 only checks
whether qedf is NULL but doesn't check whether the memory has been freed.
So a UAF bug can occur.
There are two ways to reach the statements below err0. The first one is
described as before, "qedf" should be set to NULL. The second one is goto
"err0" directly. In the latter scenario qedf hasn't been changed and it has
the initial value NULL. As a result the if statement is not reachable in
any situation.
The KASAN logs are as follows:
[ 2.312969] BUG: KASAN: use-after-free in __qedf_probe+0x5dcf/0x6bc0
[ 2.312969]
[ 2.312969] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 2.312969] Call Trace:
[ 2.312969] dump_stack_lvl+0x59/0x7b
[ 2.312969] print_address_description+0x7c/0x3b0
[ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0
[ 2.312969] __kasan_report+0x160/0x1c0
[ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0
[ 2.312969] kasan_report+0x4b/0x70
[ 2.312969] ? kobject_put+0x25d/0x290
[ 2.312969] kasan_check_range+0x2ca/0x310
[ 2.312969] __qedf_probe+0x5dcf/0x6bc0
[ 2.312969] ? selinux_kernfs_init_security+0xdc/0x5f0
[ 2.312969] ? trace_rpm_return_int_rcuidle+0x18/0x120
[ 2.312969] ? rpm_resume+0xa5c/0x16e0
[ 2.312969] ? qedf_get_generic_tlv_data+0x160/0x160
[ 2.312969] local_pci_probe+0x13c/0x1f0
[ 2.312969] pci_device_probe+0x37e/0x6c0
Link: https://lore.kernel.org/r/20211112120641.16073-1-fantasquex@gmail.com
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Co-developed-by: Wende Tan <twd2.me@gmail.com>
Signed-off-by: Wende Tan <twd2.me@gmail.com>
Signed-off-by: Letu Ren <fantasquex@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Rafael explained that the reason for having both PF_NOFREEZE and
PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from
kthread context that has previously called set_freezable().
In preparation of merging the flags, have {,un}lock_system_slee() save
and restore current->flags.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114648.725003428@infradead.org
|
|
There is a spelling mistake in a MODULE_PARM_DESC description. Fix it.
Link: https://lore.kernel.org/r/20220906140010.194273-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix the following use-after-free warning which is observed during
controller reset:
refcount_t: underflow; use-after-free.
WARNING: CPU: 23 PID: 5399 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
Link: https://lore.kernel.org/r/20220906134908.1039-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Remote devices may go missing from the per-device nexus reset part of the
HA nexus, i.e after the controller reset. This is because libsas may find
the devices to be gone as the phy may be temporarily down when processing
the bcast event generated from the nexus reset. Filter out bcast events
during this time to stop the devices being lost.
Link: https://lore.kernel.org/r/1662378529-101489-6-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add a helper for bcast processing to reduce duplication.
Link: https://lore.kernel.org/r/1662378529-101489-5-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In resetting the controller, SATA devices may be lost.
The issue is that when we insert the bcast events to rescan the topology in
hisi_sas_rescan_topology(), when we subsequently nexus reset the SATA
devices in hisi_sas_async_I_T_nexus_reset(), there is a small timing window
in which the remote phy is down and we process the bcast event (meaning
that libsas judges that the disk is lost).
Ensure that all bcast events are processed prior to the nexus reset to
close this window.
Link: https://lore.kernel.org/r/1662378529-101489-4-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Once the controller HW has been reset then we can unset flag
HISI_SAS_HW_FAULT_BIT. In clearing this flag earlier we can now
successfully execute commands in hisi_sas_controller_reset_done(), like
bcast processing.
Link: https://lore.kernel.org/r/1662378529-101489-3-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Now that libsas and the SCSI core code limits the default sectors from
commit 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors
according to DMA optimal limit") and commit 608128d391fa ("scsi: sd: allow
max_sectors be capped at DMA optimal size limit"), there is no need for
the hack to limit the max HW sectors.
Link: https://lore.kernel.org/r/1662378529-101489-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In preparation for FORTIFY_SOURCE performing run-time destination buffer
bounds checking for memcpy(), specify the destination output buffer
explicitly, instead of asking memcpy() to write past the end of what looked
like a fixed-size object. Silences future run-time warning:
memcpy: detected field-spanning write (size 80) of single field "trc + 1" (size 64)
There is no binary code output differences from this change.
Link: https://lore.kernel.org/r/20220901205729.2260982-1-keescook@chromium.org
Cc: Bradley Grove <linuxdrivers@attotech.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The original code will "goto out_disable_device" and call
pci_disable_device() if pci_enable_device() fails. The kernel will generate
a warning message like "3w-9xxx 0000:00:05.0: disabling already-disabled
device".
We shouldn't disable a device that failed to be enabled. A simple return is
fine.
Link: https://lore.kernel.org/r/20220829110115.38789-1-fantasquex@gmail.com
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Letu Ren <fantasquex@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add missing error check for dma_map_sg().
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/20220826101435.79170-1-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Don't use:
- DID_TARGET_FAILURE
- DID_NEXUS_FAILURE
- DID_ALLOC_FAILURE
- DID_MEDIUM_ERROR
Instead use the SCSI midlayer internal values.
Link: https://lore.kernel.org/r/20220812010027.8251-10-michael.christie@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If a driver returns:
- DID_TARGET_FAILURE
- DID_NEXUS_FAILURE
- DID_ALLOC_FAILURE
- DID_MEDIUM_ERROR
we hit a couple bugs:
1. The SCSI error handler runs because scsi_decide_disposition() has no
case statements for them and we return FAILED.
2. For SG IO the userspace app gets a success status instead of failed,
because scsi_result_to_blk_status() clears those errors.
This patch adds a new internal error code byte for use by the SCSI
midlayer. This will be used instead of the above error codes, so we don't
have to play that clearing the host code game in
scsi_result_to_blk_status() and drivers cannot accidentally use them.
A subsequent commit will then remove the internal users of the above codes
and convert us to use the new ones.
Link: https://lore.kernel.org/r/20220812010027.8251-9-michael.christie@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
DID_ALLOC_FAILURE is internal to the SCSI layer. Drivers must not use it
because:
1. It's not propagated upwards, so SG IO/passthrough users will not see an
error and think a command was successful.
2. There is no handling for it in scsi_decide_disposition() so it results
in entering SCSI error handling.
By the code comment, it looks like the driver wanted a retryable error
code, so this has it use DID_ERROR.
Link: https://lore.kernel.org/r/20220812010027.8251-8-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
DID_TARGET_FAILURE is internal to the SCSI layer. Drivers must not use it
because:
1. It's not propagated upwards, so SG IO/passthrough users will not see an
error and think a command was successful.
2. There is no handling for it in scsi_decide_disposition() so it
results in entering SCSI error handling.
This has qla2xxx use DID_NO_CONNECT because it looks like we hit this error
when we can't find a port. It will give us the same hard error behavior and
it seems to match the error where we can't find the endpoint.
Link: https://lore.kernel.org/r/20220812010027.8251-7-michael.christie@oracle.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
DID_NEXUS_FAILURE is internal to the SCSI layer. Drivers must not use it
because:
1. It's not propagated upwards, so SG IO/passthrough users will not see an
error and think a command was successful.
2. There is no handling for it in scsi_decide_disposition() so it results
in entering SCSI error handling.
virtio_scsi gets this when something like qemu returns
VIRTIO_SCSI_S_NEXUS_FAILURE. It looks like qemu returns that error code if
host OS returns DID_NEXUS_FAILURE (qemu's internal
SCSI_HOST_RESERVATION_ERROR maps to DID_NEXUS_FAILURE). This shouldn't
happen for Linux since we don't propagate that error code to userspace.
This has us convert VIRTIO_SCSI_S_NEXUS_FAILURE to a
SAM_STAT_RESERVATION_CONFLICT in case some other virt layer is returning
it. In that case we will still get the reservation confict failure we
expect.
Link: https://lore.kernel.org/r/20220812010027.8251-6-michael.christie@oracle.com
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
DID_TARGET_FAILURE is internal to the SCSI layer. Drivers must not use it
because:
1. It's not propagated upwards, so SG IO/passthrough users will not see an
error and think a command was successful.
2. There is no handling for it in scsi_decide_disposition() so it results
in entering SCSI error handling.
virtio_scsi gets this when something like qemu returns
VIRTIO_SCSI_S_TARGET_FAILURE. It looks like qemu returns that error code
if a host OS returns it, but this shouldn't happen for Linux since we never
propagate that error to userspace.
This has us use DID_BAD_TARGET in case some other virt layer is returning
it. In that case we will still get a hard error like before and it conveys
something unexpected happened.
Link: https://lore.kernel.org/r/20220812010027.8251-5-michael.christie@oracle.com
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
DID_TARGET_FAILURE is internal to the SCSI layer. Drivers must not use it
because:
1. It's not propagated upwards, so SG IO/passthrough users will not see an
error and think a command was successful.
2. There is no handling for it in scsi_decide_disposition() so it results
in the SCSI error handling running.
It looks like the driver wanted a hard failure so swap it with
DID_BAD_TARGET.
Link: https://lore.kernel.org/r/20220812010027.8251-3-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The error codes:
- DID_TARGET_FAILURE
- DID_NEXUS_FAILURE
- DID_ALLOC_FAILURE
- DID_MEDIUM_ERROR
are internal to the SCSI layer. Drivers must not use them because:
1. They are not propagated upwards, so SG IO/passthrough users will not
see an error and think a command was successful.
xen-scsiback will never see this error and should not try to send it.
2. There is no handling for them in scsi_decide_disposition() so if
xen-scsifront were to return the error to the SCSI midlayer then it
kicks off the error handler which is definitely not what we want.
Remove the use from xen-scsifront/back.
Link: https://lore.kernel.org/r/20220812010027.8251-2-michael.christie@oracle.com
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There are two .exit_cmd_priv implementations. Both implementations use
resources associated with the SCSI host. Make sure that these resources are
still available when .exit_cmd_priv is called by waiting inside
scsi_remove_host() until the tag set has been freed.
This commit fixes the following use-after-free:
==================================================================
BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp]
Read of size 8 at addr ffff888100337000 by task multipathd/16727
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x5e/0x5db
kasan_report+0xab/0x120
srp_exit_cmd_priv+0x27/0xd0 [ib_srp]
scsi_mq_exit_request+0x4d/0x70
blk_mq_free_rqs+0x143/0x410
__blk_mq_free_map_and_rqs+0x6e/0x100
blk_mq_free_tag_set+0x2b/0x160
scsi_host_dev_release+0xf3/0x1a0
device_release+0x54/0xe0
kobject_put+0xa5/0x120
device_release+0x54/0xe0
kobject_put+0xa5/0x120
scsi_device_dev_release_usercontext+0x4c1/0x4e0
execute_in_process_context+0x23/0x90
device_release+0x54/0xe0
kobject_put+0xa5/0x120
scsi_disk_release+0x3f/0x50
device_release+0x54/0xe0
kobject_put+0xa5/0x120
disk_release+0x17f/0x1b0
device_release+0x54/0xe0
kobject_put+0xa5/0x120
dm_put_table_device+0xa3/0x160 [dm_mod]
dm_put_device+0xd0/0x140 [dm_mod]
free_priority_group+0xd8/0x110 [dm_multipath]
free_multipath+0x94/0xe0 [dm_multipath]
dm_table_destroy+0xa2/0x1e0 [dm_mod]
__dm_destroy+0x196/0x350 [dm_mod]
dev_remove+0x10c/0x160 [dm_mod]
ctl_ioctl+0x2c2/0x590 [dm_mod]
dm_ctl_ioctl+0x5/0x10 [dm_mod]
__x64_sys_ioctl+0xb4/0xf0
dm_ctl_ioctl+0x5/0x10 [dm_mod]
__x64_sys_ioctl+0xb4/0xf0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Link: https://lore.kernel.org/r/20220826002635.919423-1-bvanassche@acm.org
Fixes: 65ca846a5314 ("scsi: core: Introduce {init,exit}_cmd_priv()")
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Reported-by: Li Zhijian <lizhijian@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|