summaryrefslogtreecommitdiffstats
path: root/drivers/nvme
AgeCommit message (Collapse)AuthorFilesLines
2021-03-18nvmet-tcp: fix kmap leak when data digest in useElad Grupi1-1/+1
When data digest is enabled we should unmap pdu iovec before handling the data digest pdu. Signed-off-by: Elad Grupi <elad.grupi@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvmet: don't check iosqes,iocqes for discovery controllersSagi Grimberg1-3/+14
From the base spec, Figure 78: "Controller Configuration, these fields are defined as parameters to configure an "I/O Controller (IOC)" and not to configure a "Discovery Controller (DC). ... If the controller does not support I/O queues, then this field shall be read-only with a value of 0h Just perform this check for I/O controllers. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Reported-by: Belanger, Martin <Martin.Belanger@dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvme-rdma: fix possible hang when failing to set io queuesSagi Grimberg1-2/+5
We only setup io queues for nvme controllers, and it makes absolutely no sense to allow a controller (re)connect without any I/O queues. If we happen to fail setting the queue count for any reason, we should not allow this to be a successful reconnect as I/O has no chance in going through. Instead just fail and schedule another reconnect. Reported-by: Chao Leng <lengchao@huawei.com> Fixes: 711023071960 ("nvme-rdma: add a NVMe over Fabrics RDMA host driver") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvme-tcp: fix possible hang when failing to set io queuesSagi Grimberg1-2/+5
We only setup io queues for nvme controllers, and it makes absolutely no sense to allow a controller (re)connect without any I/O queues. If we happen to fail setting the queue count for any reason, we should not allow this to be a successful reconnect as I/O has no chance in going through. Instead just fail and schedule another reconnect. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvme-tcp: fix misuse of __smp_processor_id with preemption enabledSagi Grimberg1-1/+1
For our pure advisory use-case, we only rely on this call as a hint, so fix the warning complaints of using the smp_processor_id variants with preemption enabled. Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context") Fixes: ada831772188 ("nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvme-tcp: fix a NULL deref when receiving a 0-length r2t PDUSagi Grimberg1-0/+7
When the controller sends us a 0-length r2t PDU we should not attempt to try to set up a h2cdata PDU but rather conclude that this is a buggy controller (forward progress is not possible) and simply fail it immediately. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Reported-by: Belanger, Martin <Martin.Belanger@dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvme: fix Write Zeroes limitationsChristoph Hellwig1-24/+12
We voluntarily limit the Write Zeroes sizes to the MDTS value provided by the hardware, but currently get the units wrong, so fix that. Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Tested-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2021-03-18nvme: allocate the keep alive request using BLK_MQ_REQ_NOWAITChristoph Hellwig1-2/+2
To avoid an error recovery deadlock where the keep alive work is waiting for a request and thus can't be flushed to make progress for tearing down the controller. Also print the error code returned from blk_mq_alloc_request to help debugging any future issues in this code. Based on an earlier patch from Hannes Reinecke <hare@suse.de>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de>
2021-03-18nvme: merge nvme_keep_alive into nvme_keep_alive_workChristoph Hellwig1-18/+8
Merge nvme_keep_alive into its only caller to prepare for additional changes to this code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de>
2021-03-18nvme-fabrics: only reserve a single tagChristoph Hellwig5-8/+15
Fabrics drivers currently reserve two tags on the admin queue. But given that the connect command is only run on a freshly created queue or after all commands have been force aborted we only need to reserve a single tag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de>
2021-03-12nvme: fix the nsid value to print in nvme_validate_or_alloc_nsChristoph Hellwig1-1/+1
ns can be NULL at this point, and my move of the check from the original patch by Chaitanya broke this. Fixes: 0ec84df4953b ("nvme-core: check ctrl css before setting up zns") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12Merge tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme into block-5.12Jens Axboe5-11/+24
Pull NVMe fixes from Christoph: "nvme fixes for 5.12: - one more quirk (Dmitry Monakhov) - fix max_zone_append_sectors initialization (Chaitanya Kulkarni) - nvme-fc reset/create race fix (James Smart) - fix status code on aborts/resets (Hannes Reinecke) - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni) - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong)" * tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme: nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done nvme-core: check ctrl css before setting up zns nvme-fc: fix racing controller reset and create association nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange() nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request() nvme: simplify error logic in nvme_validate_ns() nvme: set max_zone_append_sectors nvme_revalidate_zones
2021-03-11block: rename BIO_MAX_PAGES to BIO_MAX_VECSChristoph Hellwig1-3/+3
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725aDmitry Monakhov1-0/+1
This adds a quirk for Samsung PM1725a drive which fixes timeouts and I/O errors due to the fact that the controller does not properly handle the Write Zeroes command, dmesg log: nvme nvme0: I/O 528 QID 10 timeout, aborting nvme nvme0: I/O 529 QID 10 timeout, aborting nvme nvme0: I/O 530 QID 10 timeout, aborting nvme nvme0: I/O 531 QID 10 timeout, aborting nvme nvme0: I/O 532 QID 10 timeout, aborting nvme nvme0: I/O 533 QID 10 timeout, aborting nvme nvme0: I/O 534 QID 10 timeout, aborting nvme nvme0: I/O 535 QID 10 timeout, aborting nvme nvme0: Abort status: 0x0 nvme nvme0: Abort status: 0x0 nvme nvme0: Abort status: 0x0 nvme nvme0: Abort status: 0x0 nvme nvme0: Abort status: 0x0 nvme nvme0: Abort status: 0x0 nvme nvme0: Abort status: 0x0 nvme nvme0: Abort status: 0x0 nvme nvme0: I/O 528 QID 10 timeout, reset controller nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10 nvme nvme0: Device not ready; aborting reset, CSTS=0x3 nvme nvme0: Device not ready; aborting reset, CSTS=0x3 nvme nvme0: Removing after probe failure status: -19 nvme0n1: detected capacity change from 6251233968 to 0 blk_update_request: I/O error, dev nvme0n1, sector 32776 op 0x1:(WRITE) flags 0x3000 phys_seg 6 prio class 0 blk_update_request: I/O error, dev nvme0n1, sector 113319936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0 Buffer I/O error on dev nvme0n1p2, logical block 1, lost async page write blk_update_request: I/O error, dev nvme0n1, sector 113319680 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 Buffer I/O error on dev nvme0n1p2, logical block 2, lost async page write blk_update_request: I/O error, dev nvme0n1, sector 113319424 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 Buffer I/O error on dev nvme0n1p2, logical block 3, lost async page write blk_update_request: I/O error, dev nvme0n1, sector 113319168 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 Buffer I/O error on dev nvme0n1p2, logical block 4, lost async page write blk_update_request: I/O error, dev nvme0n1, sector 113318912 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 Buffer I/O error on dev nvme0n1p2, logical block 5, lost async page write blk_update_request: I/O error, dev nvme0n1, sector 113318656 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 Buffer I/O error on dev nvme0n1p2, logical block 6, lost async page write blk_update_request: I/O error, dev nvme0n1, sector 113318400 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 blk_update_request: I/O error, dev nvme0n1, sector 113318144 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 blk_update_request: I/O error, dev nvme0n1, sector 113317888 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme-rdma: Fix a use after free in nvmet_rdma_write_data_doneLv Yunlong1-3/+2
In nvmet_rdma_write_data_done, rsp is recoverd by wc->wr_cqe and freed by nvmet_rdma_release_rsp(). But after that, pr_info() used the freed chunk's member object and could leak the freed chunk address with wc->wr_cqe by computing the offset. Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme-core: check ctrl css before setting up znsChaitanya Kulkarni1-0/+6
Ensure multiple Command Sets are supported before starting to setup a ZNS namespace. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: move the check around a bit] Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme-fc: fix racing controller reset and create associationJames Smart1-1/+1
Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in interrupt context results in a possible race condition. A controller reset results in errored io completions, which schedules error work. The change of error work to a work element allows it to fire after the ctrl state transition to NVME_CTRL_CONNECTING, causing any outstanding io (used to initialize the controller) to fail and cause problems for connect_work. Add a state check to only schedule error work if not in the RESETTING state. Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been abortedHannes Reinecke1-1/+1
When a command has been aborted we should return NVME_SC_HOST_ABORTED_CMD to be consistent with the other transports. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()Hannes Reinecke1-0/+1
nvme_fc_terminate_exchange() is being called when exchanges are being deleted, and as such we should be setting the NVME_REQ_CANCELLED flag to have identical behaviour on all transports. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()Hannes Reinecke1-0/+1
NVME_REQ_CANCELLED is translated into -EINTR in nvme_submit_sync_cmd(), so we should be setting this flags during nvme_cancel_request() to ensure that the callers to nvme_submit_sync_cmd() will get the correct error code when the controller is reset. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme: simplify error logic in nvme_validate_ns()Hannes Reinecke1-4/+4
We only should remove namespaces when we get fatal error back from the device or when the namespace IDs have changed. So instead of painfully masking out error numbers which might indicate that the error should be ignored we could use an NVME status code to indicated when the namespace should be removed. That simplifies the final logic and makes it less error-prone. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11nvme: set max_zone_append_sectors nvme_revalidate_zonesChaitanya Kulkarni1-2/+7
The chunk_sectors value affects max_zone_append_sectors. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Tested-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05nvmet: model_number must be immutable once setMax Gurtovoy4-45/+50
In case we have already established connection to nvmf target, it shouldn't be allowed to change the model_number. E.g. if someone will identify ctrl and get model_number of "my_model" later on will change the model_numbel via configfs to "my_new_model" this will break the NVMe specification for "Get Log Page – Persistent Event Log" that refers to Model Number as: "This field contains the same value as reported in the Model Number field of the Identify Controller data structure, bytes 63:24." Although it doesn't mentioned explicitly that this field can't be changed, we can assume it. So allow setting this field only once: using configfs or in the first identify ctrl operation. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05nvme-fabrics: fix kato initializationMartin George1-1/+4
Currently kato is initialized to NVME_DEFAULT_KATO for both discovery & i/o controllers. This is a problem specifically for non-persistent discovery controllers since it always ends up with a non-zero kato value. Fix this by initializing kato to zero instead, and ensuring various controllers are assigned appropriate kato values as follows: non-persistent controllers - kato set to zero persistent controllers - kato set to NVMF_DEV_DISC_TMO (or any positive int via nvme-cli) i/o controllers - kato set to NVME_DEFAULT_KATO (or any positive int via nvme-cli) Signed-off-by: Martin George <marting@netapp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05nvme-hwmon: Return error code when registration failsDaniel Wagner1-0/+1
The hwmon pointer wont be NULL if the registration fails. Though the exit code path will assign it to ctrl->hwmon_device. Later nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by returning the error code from hwmon_device_register_with_info(). Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation") Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05nvme-pci: add quirks for Lexar 256GB SSDPascal Terjan1-0/+3
Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN quirks for this buggy device. Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417 Signed-off-by: Pascal Terjan <pterjan@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05nvme-pci: mark Kingston SKC2000 as not supporting the deepest power stateZoltán Böszörményi1-0/+2
My 2TB SKC2000 showed the exact same symptoms that were provided in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed cold boot to get it back. According to some sources, the A2000 is simply a rebadged SKC2000 with a slightly optimized firmware. Adding the SKC2000 PCI ID to the quirk list with the same workaround as the A2000 made my laptop survive a 5 hours long Yocto bootstrap buildfest which reliably triggered the SSD lockup previously. Signed-off-by: Zoltán Böszörményi <zboszor@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.Julian Einwag1-1/+2
The kernel fails to fully detect these SSDs, only the character devices are present: [ 10.785605] nvme nvme0: pci function 0000:04:00.0 [ 10.876787] nvme nvme1: pci function 0000:81:00.0 [ 13.198614] nvme nvme0: missing or invalid SUBNQN field. [ 13.198658] nvme nvme1: missing or invalid SUBNQN field. [ 13.206896] nvme nvme0: Shutdown timeout set to 20 seconds [ 13.215035] nvme nvme1: Shutdown timeout set to 20 seconds [ 13.225407] nvme nvme0: 16/0/0 default/read/poll queues [ 13.233602] nvme nvme1: 16/0/0 default/read/poll queues [ 13.239627] nvme nvme0: Identify Descriptors failed (8194) [ 13.246315] nvme nvme1: Identify Descriptors failed (8194) Adding the NVME_QUIRK_NO_NS_DESC_LIST fixes this problem. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679 Signed-off-by: Julian Einwag <jeinwag-nvme@marcapo.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-02-28Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-blockLinus Torvalds2-6/+6
Pull more block updates from Jens Axboe: "A few stragglers (and one due to me missing it originally), and fixes for changes in this merge window mostly. In particular: - blktrace cleanups (Chaitanya, Greg) - Kill dead blk_pm_* functions (Bart) - Fixes for the bio alloc changes (Christoph) - Fix for the partition changes (Christoph, Ming) - Fix for turning off iopoll with polled IO inflight (Jeffle) - nbd disconnect fix (Josef) - loop fsync error fix (Mauricio) - kyber update depth fix (Yang) - max_sectors alignment fix (Mikulas) - Add bio_max_segs helper (Matthew)" * tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits) block: Add bio_max_segs blktrace: fix documentation for blk_fill_rw() block: memory allocations in bounce_clone_bio must not fail block: remove the gfp_mask argument to bounce_clone_bio block: fix bounce_clone_bio for passthrough bios block-crypto-fallback: use a bio_set for splitting bios block: fix logging on capacity change blk-settings: align max_sectors on "logical_block_size" boundary block: reopen the device in blkdev_reread_part block: don't skip empty device in in disk_uevent blktrace: remove debugfs file dentries from struct blk_trace nbd: handle device refs for DESTROY_ON_DISCONNECT properly kyber: introduce kyber_depth_updated() loop: fix I/O error on fsync() in detached loop devices block: fix potential IO hang when turning off io_poll block: get rid of the trace rq insert wrapper blktrace: fix blk_rq_merge documentation blktrace: fix blk_rq_issue documentation blktrace: add blk_fill_rwbs documentation comment block: remove superfluous param in blk_fill_rwbs() ...
2021-02-26block: Add bio_max_segsMatthew Wilcox (Oracle)2-6/+6
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for the users. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-26Merge branch 'stable/for-linus-5.12' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb updates from Konrad Rzeszutek Wilk: "Two memory encryption related patches (SWIOTLB is enabled by default for AMD-SEV): - Add support for alignment so that NVME can properly work - Keep track of requested DMA buffers length, as underlaying hardware devices can trip SWIOTLB to bounce too much and crash the kernel And a tiny fix to use proper APIs in drivers" * 'stable/for-linus-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Validate bounce size in the sync/unmap path nvme-pci: set min_align_mask swiotlb: respect min_align_mask swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single swiotlb: refactor swiotlb_tbl_map_single swiotlb: clean up swiotlb_tbl_unmap_single swiotlb: factor out a nr_slots helper swiotlb: factor out an io_tlb_offset helper swiotlb: add a IO_TLB_SIZE define driver core: add a min_align_mask field to struct device_dma_parameters sdhci: stop poking into swiotlb internals
2021-02-26nvme-pci: set min_align_maskJianxiong Gao1-0/+1
The PRP addressing scheme requires all PRP entries except for the first one to have a zero offset into the NVMe controller pages (which can be different from the Linux PAGE_SIZE). Use the min_align_mask device parameter to ensure that swiotlb does not change the address of the buffer modulo the device page size to ensure that the PRPs won't be malformed. Signed-off-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-02-21Merge tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds21-235/+401
Pull block driver updates from Jens Axboe: - Remove the skd driver. It's been EOL for a long time (Damien) - NVMe pull requests - fix multipath handling of ->queue_rq errors (Chao Leng) - nvmet cleanups (Chaitanya Kulkarni) - add a quirk for buggy Amazon controller (Filippo Sironi) - avoid devm allocations in nvme-hwmon that don't interact well with fabrics (Hannes Reinecke) - sysfs cleanups (Jiapeng Chong) - fix nr_zones for multipath (Keith Busch) - nvme-tcp crash fix for no-data commands (Sagi Grimberg) - nvmet-tcp fixes (Sagi Grimberg) - add a missing __rcu annotation (Christoph) - failed reconnect fixes (Chao Leng) - various tracing improvements (Michal Krakowiak, Johannes Thumshirn) - switch the nvmet-fc assoc_list to use RCU protection (Leonid Ravich) - resync the status codes with the latest spec (Max Gurtovoy) - minor nvme-tcp improvements (Sagi Grimberg) - various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya Kulkarni, Israel Rukshin) - Floppy O_NDELAY fix (Denis) - MD pull request - raid5 chunk_sectors fix (Guoqing) - Use lore links (Kees) - Use DEFINE_SHOW_ATTRIBUTE for nbd (Liao) - loop lock scaling (Pavel) - mtip32xx PCI fixes (Bjorn) - bcache fixes (Kai, Dongdong) - Misc fixes (Tian, Yang, Guoqing, Joe, Andy) * tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block: (64 commits) lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid() lightnvm: fix unnecessary NULL check warnings nvme-tcp: fix crash triggered with a dataless request submission block: Replace lkml.org links with lore nbd: Convert to DEFINE_SHOW_ATTRIBUTE nvme: add 48-bit DMA address quirk for Amazon NVMe controllers nvme-hwmon: rework to avoid devm allocation nvmet: remove else at the end of the function nvmet: add nvmet_req_subsys() helper nvmet: use min of device_path and disk len nvmet: use invalid cmd opcode helper nvmet: use invalid cmd opcode helper nvmet: add helper to report invalid opcode nvmet: remove extra variable in id-ns handler nvmet: make nvmet_find_namespace() req based nvmet: return uniform error for invalid ns nvmet: set status to 0 in case for invalid nsid nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues nvme-multipath: set nr_zones for zoned namespaces nvmet-tcp: fix potential race of tcp socket closing accept_work ...
2021-02-21Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds8-35/+30
Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
2021-02-11nvme-tcp: fix crash triggered with a dataless request submissionSagi Grimberg1-1/+1
write-zeros has a bio, but does not have any data buffers associated with it. Hence should not initialize the request iter for it (which attempts to reference the bi_io_vec (and crash). -- run blktests nvme/012 at 2021-02-05 21:53:34 BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S I 5.11.0-rc6+ #1 Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000 RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000 R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000 R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp] blk_mq_dispatch_rq_list+0x11c/0x7c0 ? blk_mq_flush_busy_ctxs+0xf6/0x110 __blk_mq_sched_dispatch_requests+0x12b/0x170 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x2b/0x60 process_one_work+0x1cb/0x360 ? process_one_work+0x360/0x360 worker_thread+0x30/0x370 ? process_one_work+0x360/0x360 kthread+0x116/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x1f/0x30 -- Fixes: cb9b870fba3e ("nvme-tcp: fix wrong setting of request iov_iter") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvme: add 48-bit DMA address quirk for Amazon NVMe controllersFilippo Sironi2-1/+26
Some Amazon NVMe controllers do not follow the NVMe specification and are limited to 48-bit DMA addresses. Add a quirk to force bounce buffering if needed and limit the IOVA allocation for these devices. This affects all current Amazon NVMe controllers that expose EBS volumes (0x0061, 0x0065, 0x8061) and local instance storage (0xcd00, 0xcd01, 0xcd02). Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvme-hwmon: rework to avoid devm allocationHannes Reinecke3-10/+30
The original design to use device-managed resource allocation doesn't really work as the NVMe controller has a vastly different lifetime than the hwmon sysfs attributes, causing warning about duplicate sysfs entries upon reconnection. This patch reworks the hwmon allocation to avoid device-managed resource allocation, and uses the NVMe controller as parent for the sysfs attributes. Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Enzo Matsumiya <ematsumiya@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: remove else at the end of the functionChaitanya Kulkarni1-2/+2
The function nvmet_parse_io_cmd() returns value from nvmet_file_parse_io_cmd() or nvmet_bdev_parse_io_cmd() based on which backend is set for the request. Remove the else and just return the value from nvmet_bdev_parse_io_cmd(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: add nvmet_req_subsys() helperChaitanya Kulkarni4-9/+14
Just like what we have to get the passthru ctrl from the req, add an helper to get the subsystem associated with the nvmet_req() instead of open coding the chain of structures. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: use min of device_path and disk lenChaitanya Kulkarni1-3/+6
In function __assign_req_name() instead of using the DEVICE_NAME_LEN in strncpy() use min of DISK_NAME_LEN and strlen(req->ns->device_path). This is needed to turn off the following warnings:- In file included from drivers/nvme/target/core.c:14: In function ‘__assign_req_name’, inlined from ‘trace_event_raw_event_nvmet_req_init’ at drivers/nvme/target/./trace.h:58:1: drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation] strncpy(name, req->ns->device_path, DISK_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function ‘__assign_req_name’, inlined from ‘perf_trace_nvmet_req_complete’ at drivers/nvme/target/./trace.h:100:1: drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation] strncpy(name, req->ns->device_path, DISK_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function ‘__assign_req_name’, inlined from ‘perf_trace_nvmet_req_init’ at drivers/nvme/target/./trace.h:58:1: drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation] strncpy(name, req->ns->device_path, DISK_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function ‘__assign_req_name’, inlined from ‘trace_event_raw_event_nvmet_req_complete’ at drivers/nvme/target/./trace.h:100:1: drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation] strncpy(name, req->ns->device_path, DISK_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: use invalid cmd opcode helperChaitanya Kulkarni1-1/+1
In the NVMeOF block device backend, file backend, and passthru backend we reject and report the commands if opcode is not handled. Use the previously introduced helper in the passthru backend to make the error message uniform. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: use invalid cmd opcode helperChaitanya Kulkarni1-4/+1
In the NVMeOF block device backend, file backend, and passthru backend we reject and report the commands if opcode is not handled. Use the previously introduced helper in file backend to reduce the duplicate code and make the error message uniform. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: add helper to report invalid opcodeChaitanya Kulkarni3-4/+11
In the NVMeOF block device backend, file backend, and passthru backend we reject and report the commands if opcode is not handled. Add an helper and use it in block device backend to keep the code and error message uniform. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: remove extra variable in id-ns handlerChaitanya Kulkarni1-2/+1
In nvmet_execute_identify_ns() local variable ctrl is accessed only in one place, remove that and directly use it from nvmet_req->sq->ctrl. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: make nvmet_find_namespace() req basedChaitanya Kulkarni3-40/+32
The six callers of nvmet_find_namespace() duplicate the error log page update and status setting code for each call on failure. All callers are nvmet requests based functions, so we can pass req to the nvmet_find_namesapce() & derive ctrl from req, that'll allow us to update the error log page in nvmet_find_namespace(). Now that we pass the request we can also get rid of the local variable in nvmet_find_namespace() and use the req->ns and return the error code. Replace the ctrl parameter with nvmet_req for nvmet_find_namespace(), centralize the error log page update for non allocated namesapces, and return uniform error for non-allocated namespace. The nvmet_find_namespace() takes nsid parameter which is from NVMe commands structures such as get_log_page, identify, rw and common. All these commands have same offset for the nsid field. Derive nsid from req->cmd->common.nsid) & remove the extra parameter from the nvmet_find_namespace(). Lastly now we associate the ns to the req parameter that we pass to the nvmet_find_namespace(), rename nvmet_find_namespace() to nvmet_req_find_ns(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: return uniform error for invalid nsChaitanya Kulkarni1-2/+2
For nvmet_find_namespace() error case we have inconsistent error code mapping in the function nvmet_get_smart_log_nsid() and nvmet_set_feat_write_protect(). There is no point in retrying for the invalid namesapce from the host side. Set the error code to the NVME_SC_INVALID_NS | NVME_SC_DNR which matches what we have in nvmet_execute_identify_desclist(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet: set status to 0 in case for invalid nsidChaitanya Kulkarni1-1/+1
For unallocated namespace in nvmet_execute_identify_ns() don't set the status to NVME_SC_INVALID_NS, set it to zero. Fixes: bffcd507780e ("nvmet: set right status on error in id-ns handler") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queuesChristoph Hellwig1-1/+1
Make sparse happy after the recent conversion to RCU lookups. Fixes: 4e2f02bf77da ("nvmet-fc: use RCU proctection for assoc_list") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: James Smart <james.smart@broadcom.com>
2021-02-10nvme-multipath: set nr_zones for zoned namespacesKeith Busch1-0/+4
The bio based drivers only require the request_queue's nr_zones is set, so set this field in the head if the namespace path is zoned. Fixes: 240e6ee272c07 ("nvme: support for zoned namespaces") Reported-by: Minwoo Im <minwoo.im.dev@gmail.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10nvmet-tcp: fix potential race of tcp socket closing accept_workSagi Grimberg1-10/+18
When we accept a TCP connection and allocate an nvmet-tcp queue we should make sure not to fully establish it or reference it as the connection may be already closing, which triggers queue release work, which does not fence against queue establishment. In order to address such a race, we make sure to check the sk_state and contain the queue reference to be done underneath the sk_callback_lock such that the queue release work correctly fences against it. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Reported-by: Elad Grupi <elad.grupi@dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>