summaryrefslogtreecommitdiffstats
path: root/drivers/nvme
AgeCommit message (Collapse)AuthorFilesLines
2020-10-23nvme-fc: shorten reconnect delay if possible for FCJames Smart1-1/+18
We've had several complaints about a 10s reconnect delay (the default) when there was an error while there is connectivity to a subsystem. The max_reconnects and reconnect_delay are set in common code prior to calling the transport to create the controller. This change checks if the default reconnect delay is being used, and if so, it adjusts it to a shorter period (2s) for the nvme-fc transport. It does so by calculating the controller loss tmo window, changing the value of the reconnect delay, and then recalculating the maximum number of reconnect attempts allowed. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-23nvme-fc: wait for queues to freeze before calling update_hr_hw_queuesJames Smart1-2/+5
On reconnect, the code currently does not freeze the controller before possibly updating the number hw queues for the controller. Add the freeze before updating the number of hw queues. Note: the queues are already started and remain started through the reconnect. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-23nvme-fc: fix error loop in create_hw_io_queuesJames Smart1-2/+2
The loop that backs out of hw io queue creation continues through index 0, which corresponds to the admin queue as well. Fix the loop so it only proceeds through indexes 1..n which correspond to I/O queues. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-23nvme-fc: fix io timeout to abort I/OJames Smart1-39/+69
Currently, an I/O timeout unconditionally invokes nvme_fc_error_recovery() which checks for LIVE or CONNECTING state. If live, the routine resets the controller which initiates a reconnect - which is valid. If CONNECTING, err_work is scheduled. Err_work then calls the terminate_io routine, which also checks for CONNECTING and noops any further action on outstanding I/O. The result is nothing happened to the timed out io. As such, if the command was dropped on the wire, it will never timeout / complete, and the connect process will hang. Change the behavior of the io timeout routine to unconditionally abort the I/O. I/O completion handling will note that an io failed due to an abort and will terminate the connection / association as needed. If the abort was unable to happen, continue with a call to nvme_fc_error_recovery(). To ensure something different happens in nvme_fc_error_recovery() rework it so at it will abort all I/Os on the association to force a failure. As I/O aborts now may occur outside of delete_association, counting for completion must be wary and only count those aborted during delete_association when TERMIO is set on the controller. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: don't use BLK_MQ_REQ_NOWAIT for passthruChaitanya Kulkarni1-1/+1
By default, we set the passthru request allocation flag such that it returns the error in the following code path and we fail the I/O when BLK_MQ_REQ_NOWAIT is used for request allocation :- nvme_alloc_request()  blk_mq_alloc_request()   blk_mq_queue_enter()    if (flag & BLK_MQ_REQ_NOWAIT)         return -EBUSY; <-- return if busy. On some controllers using BLK_MQ_REQ_NOWAIT ends up in I/O error where the controller is perfectly healthy and not in a degraded state. Block layer request allocation does allow us to wait instead of immediately returning the error when we BLK_MQ_REQ_NOWAIT flag is not used. This has shown to fix the I/O error problem reported under heavy random write workload. Remove the BLK_MQ_REQ_NOWAIT parameter for passthru request allocation which resolves this issue. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: cleanup nvmet_passthru_map_sg()Logan Gunthorpe1-3/+4
Clean up some confusing elements of nvmet_passthru_map_sg() by returning early if the request is greater than the maximum bio size. This allows us to drop the sg_cnt variable. This should not result in any functional change but makes the code clearer and more understandable. The original code allocated a truncated bio then would return EINVAL when bio_add_pc_page() filled that bio. The new code just returns EINVAL early if this would happen. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Suggested-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: limit passthru MTDS by BIO_MAX_PAGESLogan Gunthorpe1-1/+8
nvmet_passthru_map_sg() only supports mapping a single BIO, not a chain so the effective maximum transfer should also be limitted by BIO_MAX_PAGES (presently this works out to 1MB). For PCI passthru devices the max_sectors would typically be more limitting than BIO_MAX_PAGES, but this may not be true for all passthru devices. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvmet: fix uninitialized work for zero katozhenwei pi1-1/+2
When connecting a controller with a zero kato value using the following command line nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0 the warning below can be reproduced: WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90 with trace: mod_delayed_work_on+0x59/0x90 nvmet_update_cc+0xee/0x100 [nvmet] nvmet_execute_prop_set+0x72/0x80 [nvmet] nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp] nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp] ... This is caused by queuing up an uninitialized work. Althrough the keep-alive timer is disabled during allocating the controller (fixed in 0d3b6a8d213a), ka_work still has a chance to run (called by nvmet_start_ctrl). Fixes: 0d3b6a8d213a ("nvmet: Disable keep-alive timer when kato is cleared to 0h") Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvme-pci: disable Write Zeroes on Sandisk SkyhawkKai-Heng Feng1-0/+2
Like commit 5611ec2b9814 ("nvme-pci: prevent SK hynix PC400 from using Write Zeroes command"), Sandisk Skyhawk has the same issue: [ 6305.633887] blk_update_request: operation not supported error, dev nvme0n1, sector 340812032 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 So also disable Write Zeroes command on Sandisk Skyhawk. BugLink: https://bugs.launchpad.net/bugs/1899503 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvme: use queuedata for nvme_req_qidKeith Busch1-1/+1
The request's rq_disk isn't set for passthrough IO commands, so tracing uses qid 0 for these which incorrectly decodes as an admin command. Use the request_queue's queuedata instead since that value is always set for the IO queues, and never set for the admin queue. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvme-rdma: fix crash due to incorrect cqeChao Leng1-2/+3
A crash happened due to injecting error test. When a CQE has incorrect command id due do an error injection, the host may find a request which is already freed. Dereferencing req->mr->rkey causes a crash in nvme_rdma_process_nvme_rsp because the mr is already freed. Add a check for the mr to fix it. Signed-off-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-22nvme-rdma: fix crash when connect rejectedChao Leng1-1/+0
A crash can happened when a connect is rejected. The host establishes the connection after received ConnectReply, and then continues to send the fabrics Connect command. If the controller does not receive the ReadyToUse capsule, host may receive a ConnectReject reply. Call nvme_rdma_destroy_queue_ib after the host received the RDMA_CM_EVENT_REJECTED event. Then when the fabrics Connect command times out, nvme_rdma_timeout calls nvme_rdma_complete_rq to fail the request. A crash happenes due to use after free in nvme_rdma_complete_rq. nvme_rdma_destroy_queue_ib is redundant when handling the RDMA_CM_EVENT_REJECTED event as nvme_rdma_destroy_queue_ib is already called in connection failure handler. Signed-off-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-13nvme: translate zone resource errorsKeith Busch1-0/+4
Translate zoned resource errors to the appropriate blk_status_t. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-13Merge tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds11-384/+310
Pull block driver updates from Jens Axboe: "Here are the driver updates for 5.10. A few SCSI updates in here too, in coordination with Martin as they depend on core block changes for the shared tag bitmap. This contains: - NVMe pull requests via Christoph: - fix keep alive timer modification (Amit Engel) - order the PCI ID list more sensibly (Andy Shevchenko) - cleanup the open by controller helper (Chaitanya Kulkarni) - use an xarray for the CSE log lookup (Chaitanya Kulkarni) - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni) - fix nvme_ns_report_zones (Christoph Hellwig) - add a sanity check to nvmet-fc (James Smart) - fix interrupt allocation when too many polled queues are specified (Jeffle Xu) - small nvmet-tcp optimization (Mark Wunderlich) - fix a controller refcount leak on init failure (Chaitanya Kulkarni) - misc cleanups (Chaitanya Kulkarni) - major refactoring of the scanning code (Christoph Hellwig) - MD updates via Song: - Bug fixes in bitmap code, from Zhao Heming - Fix a work queue check, from Guoqing Jiang - Fix raid5 oops with reshape, from Song Liu - Clean up unused code, from Jason Yan - Discard improvements, from Xiao Ni - raid5/6 page offset support, from Yufen Yu - Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap, Hannes) - null_blk open/active zone limit support (Niklas) - Set of bcache updates (Coly, Dongsheng, Qinglang)" * tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits) md/raid5: fix oops during stripe resizing md/bitmap: fix memory leak of temporary bitmap md: fix the checking of wrong work queue md/bitmap: md_bitmap_get_counter returns wrong blocks md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks md/raid0: remove unused function is_io_in_chunk_boundary() nvme-core: remove extra condition for vwc nvme-core: remove extra variable nvme: remove nvme_identify_ns_list nvme: refactor nvme_validate_ns nvme: move nvme_validate_ns nvme: query namespace identifiers before adding the namespace nvme: revalidate zone bitmaps in nvme_update_ns_info nvme: remove nvme_update_formats nvme: update the known admin effects nvme: set the queue limits in nvme_update_ns_info nvme: remove the 0 lba_shift check in nvme_update_ns_info nvme: clean up the check for too large logic block sizes nvme: freeze the queue over ->lba_shift updates nvme: factor out a nvme_configure_metadata helper ...
2020-10-13Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds3-38/+38
Pull block updates from Jens Axboe: - Series of merge handling cleanups (Baolin, Christoph) - Series of blk-throttle fixes and cleanups (Baolin) - Series cleaning up BDI, seperating the block device from the backing_dev_info (Christoph) - Removal of bdget() as a generic API (Christoph) - Removal of blkdev_get() as a generic API (Christoph) - Cleanup of is-partition checks (Christoph) - Series reworking disk revalidation (Christoph) - Series cleaning up bio flags (Christoph) - bio crypt fixes (Eric) - IO stats inflight tweak (Gabriel) - blk-mq tags fixes (Hannes) - Buffer invalidation fixes (Jan) - Allow soft limits for zone append (Johannes) - Shared tag set improvements (John, Kashyap) - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel) - DM no-wait support (Mike, Konstantin) - Request allocation improvements (Ming) - Allow md/dm/bcache to use IO stat helpers (Song) - Series improving blk-iocost (Tejun) - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun) * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits) block: fix uapi blkzoned.h comments blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue blk-mq: get rid of the dead flush handle code path block: get rid of unnecessary local variable block: fix comment and add lockdep assert blk-mq: use helper function to test hw stopped block: use helper function to test queue register block: remove redundant mq check block: invoke blk_mq_exit_sched no matter whether have .exit_sched percpu_ref: don't refer to ref->data if it isn't allocated block: ratelimit handle_bad_sector() message blk-throttle: Re-use the throtl_set_slice_end() blk-throttle: Open code __throtl_de/enqueue_tg() blk-throttle: Move service tree validation out of the throtl_rb_first() blk-throttle: Move the list operation after list validation blk-throttle: Fix IO hang for a corner case blk-throttle: Avoid tracking latency if low limit is invalid blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() block: Remove redundant 'return' statement ...
2020-10-08Merge tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+3
Pull block fixes from Jens Axboe: "A few fixes that should go into this release: - NVMe controller error path reference fix (Chaitanya) - Fix regression with IBM partitions on non-dasd devices (Christoph) - Fix a missing clear in the compat CDROM packet structure (Peilin)" * tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block: partitions/ibm: fix non-DASD devices nvme-core: put ctrl ref when module ref get fail block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()
2020-10-07nvme-core: remove extra condition for vwcChaitanya Kulkarni1-3/+1
In nvme_set_queue_limits() we initialize vwc to false and later add a condition to set vwc true. The value of the vwc can be declare initialized which makes all the blk_queue_XXX() calls uniform. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07nvme-core: remove extra variableChaitanya Kulkarni1-3/+2
In nvme_validate_ns() the exra variable ctrl is used only twice. Using ns->ctrl directly still maintains the redability and original length of the lines in the code. Get rid of the extra variable ctrl & use ns->ctrl directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07nvme: remove nvme_identify_ns_listChristoph Hellwig1-12/+8
Just fold it into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2020-10-07nvme: refactor nvme_validate_nsChristoph Hellwig1-20/+17
Move the logic to revalidate the block_device size or remove the namespace from the caller into nvme_validate_ns. This removes the return value and thus the status code translation. Additionally it also catches non-permanent errors from nvme_update_ns_info using the existing logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: move nvme_validate_nsChristoph Hellwig1-37/+37
Move nvme_validate_ns just above its only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: query namespace identifiers before adding the namespaceChristoph Hellwig1-63/+53
Check the namespace identifier list first thing when scanning namespaces. This keeps the code to query the CSI common between the alloc and validate path, and helps to structure the code better for multiple command set support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: revalidate zone bitmaps in nvme_update_ns_infoChristoph Hellwig1-4/+6
Consolidate the two calls into a single place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2020-10-07nvme: remove nvme_update_formatsChristoph Hellwig1-30/+2
Now that the queue is frozen before updating ->lba_shift we can't hit the invalid references mentioned in the comment any more. More importantly this code would not have helped us if the format was changed by another controller or through implementation defined back channels. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: update the known admin effectsChristoph Hellwig1-2/+2
A Format NVM command can change the capabilities of namespaces, while Sanitize does change the Logical Block Content and must be serialized. Also remove CSUPP bit for Format - it is not a mandatory command, and we don't check for the bit anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: set the queue limits in nvme_update_ns_infoChristoph Hellwig1-25/+21
Only set the queue limits once we have the real block size. This also updates the limits on a rescan if needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: remove the 0 lba_shift check in nvme_update_ns_infoChristoph Hellwig1-6/+0
We can no longer reach this code if Identify Namespace failed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: clean up the check for too large logic block sizesChristoph Hellwig1-8/+5
Use a single statement to set both the capacity and fake block size instead of two. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2020-10-07nvme: freeze the queue over ->lba_shift updatesChristoph Hellwig1-6/+14
Ensure that there can't be any I/O in flight went we change the disk geometry in nvme_update_ns_info, most notable the LBA size by lifting the queue free from nvme_update_disk_info into the caller Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: factor out a nvme_configure_metadata helperChristoph Hellwig1-31/+47
Factor out a helper from nvme_update_ns_info that configures the per-namespaces metadata and PI settings. Also make sure the helpers clear the flags explicitly instead of all of ->features to allow for potentially reusing ->features for future non-metadata flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: call nvme_identify_ns as the first thing in nvme_alloc_ns_blockChristoph Hellwig1-8/+7
Check if the namespace actually exists as the very first thing and don't bother with any extra work if not. This should speed up and simplify the sequential scanning for NVMe 1.0 devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: lift the check for an unallocated namespace into nvme_identify_nsChristoph Hellwig1-9/+8
Move the check from the two callers into the common helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2020-10-07nvme: rename __nvme_revalidate_diskChristoph Hellwig1-5/+4
Rename __nvme_revalidate_disk to nvme_update_ns_info and pass a namespace instead of the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: rename _nvme_revalidate_diskChristoph Hellwig1-7/+6
Rename _nvme_revalidate_disk to nvme_validate_ns to better describe what the function does, and pass the struct nvme_ns instead of the gendisk to better match the call chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: rename nvme_validate_ns to nvme_validate_or_alloc_nsChristoph Hellwig1-3/+3
Use a slightly more descriptive name to enable reusing nvme_validate_ns in the next patch for a lower level function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: remove the disk argument to nvme_update_zone_infoChristoph Hellwig3-10/+5
The queue can trivially be derived from the nvme_ns structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: fix initialization of the zone bitmapsChristoph Hellwig3-23/+17
The removal of the ->revalidate_disk method broke the initialization of the zone bitmaps, as nvme_revalidate_disk now never gets called during initialization. Move the zone related code from nvme_revalidate_disk into a new helper in zns.c, and call it from nvme_alloc_ns in addition to nvme_validate_ns to ensure the zone bitmaps are initialized during probe. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme-loop: don't put ctrl on nvme_init_ctrl errorChaitanya Kulkarni1-2/+2
The function nvme_init_ctrl() gets the ctrl reference & when it fails it does put the ctrl reference in the error unwind code. When creating loop ctrl in nvme_loop_create_ctrl() if nvme_init_ctrl() returns non zero (i.e. error) value it jumps to the "out_put_ctrl" label which calls nvme_put_ctrl(), that will lead to douple ctrl put in error unwind path. Update nvme_loop_create_ctrl() such that this patch removes the "out_put_ctrl" label, add a new "out" label after nvme_put_ctrl() in error unwind path and jump to newly added label when nvme_init_ctrl() call retuns an error. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07nvme-core: put ctrl ref when module ref get failChaitanya Kulkarni1-1/+3
When try_module_get() fails in the nvme_dev_open() it returns without releasing the ctrl reference which was taken earlier. Put the ctrl reference which is taken before calling the try_module_get() in the error return code path. Fixes: 52a3974feb1a "nvme-core: get/put ctrl and transport module in nvme_dev_open/release()" Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds1-4/+3
Pull networking fixes from David Miller: 1) Make sure SKB control block is in the proper state during IPSEC ESP-in-TCP encapsulation. From Sabrina Dubroca. 2) Various kinds of attributes were not being cloned properly when we build new xfrm_state objects from existing ones. Fix from Antony Antony. 3) Make sure to keep BTF sections, from Tony Ambardar. 4) TX DMA channels need proper locking in lantiq driver, from Hauke Mehrtens. 5) Honour route MTU during forwarding, always. From Maciej Żenczykowski. 6) Fix races in kTLS which can result in crashes, from Rohit Maheshwari. 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan Jha. 8) Use correct address family in xfrm state lookups, from Herbert Xu. 9) A bridge FDB flush should not clear out user managed fdb entries with the ext_learn flag set, from Nikolay Aleksandrov. 10) Fix nested locking of netdev address lists, from Taehee Yoo. 11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau. 12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit. 13) Don't free command entries in mlx5 while comp handler could still be running, from Eran Ben Elisha. 14) Error flow of request_irq() in mlx5 is busted, due to an off by one we try to free and IRQ never allocated. From Maor Gottlieb. 15) Fix leak when dumping netlink policies, from Johannes Berg. 16) Sendpage cannot be performed when a page is a slab page, or the page count is < 1. Some subsystems such as nvme were doing so. Create a "sendpage_ok()" helper and use it as needed, from Coly Li. 17) Don't leak request socket when using syncookes with mptcp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net/core: check length before updating Ethertype in skb_mpls_{push,pop} net: mvneta: fix double free of txq->buf net_sched: check error pointer in tcf_dump_walker() net: team: fix memory leak in __team_options_register net: typhoon: Fix a typo Typoon --> Typhoon net: hinic: fix DEVLINK build errors net: stmmac: Modify configuration method of EEE timers tcp: fix syn cookied MPTCP request socket leak libceph: use sendpage_ok() in ceph_tcp_sendpage() scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map() drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage() tcp: use sendpage_ok() to detect misused .sendpage nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send net: introduce helper sendpage_ok() in include/linux/net.h net: usb: pegasus: Proper error handing when setting pegasus' MAC address net: core: document two new elements of struct net_device netlink: fix policy dump leak net/mlx5e: Fix race condition on nhe->n pointer in neigh update net/mlx5e: Fix VLAN create flow ...
2020-10-02nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()Coly Li1-4/+3
Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to send slab pages. But for pages allocated by __get_free_pages() without __GFP_COMP, which also have refcount as 0, they are still sent by kernel_sendpage() to remote end, this is problematic. The new introduced helper sendpage_ok() checks both PageSlab tag and page_count counter, and returns true if the checking page is OK to be sent by kernel_sendpage(). This patch fixes the page checking issue of nvme_tcp_try_send_data() with sendpage_ok(). If sendpage_ok() returns true, send this page by kernel_sendpage(), otherwise use sock_no_sendpage to handle this page. Signed-off-by: Coly Li <colyli@suse.de> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Vlastimil Babka <vbabka@suse.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-27nvme-pci: allocate separate interrupt for the reserved non-polled I/O queueJeffle Xu1-17/+15
One queue will be reserved for non-polled IO when nvme.poll_queues is greater or equal than the number of IO queues that the nvme controller can provide. Currently the reserved queue for non-polled IO will reuse the interrupt used by admin queue in this case, e.g, vector 0. This can work and the performance may not be an issue since the admin queue is used unfrequently. However this behaviour may be inconsistent with that when nvme.poll_queues is smaller than the number of IO queues available. Thus allocate separate interrupt for this reserved queue, and thus make the behaviour consistent. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> [hch: minor cleanups, mostly to the pre-existing surrounding code] Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme: fix error handling in nvme_ns_report_zonesChristoph Hellwig1-25/+16
nvme_submit_sync_cmd can return positive NVMe error codes in addition to the negative Linux error code, which are currently ignored. Fix this by removing __nvme_ns_report_zones and handling the errors from nvme_submit_sync_cmd in the caller instead of multiplexing the return value and the number of zones reported into a single return value. Fixes: 240e6ee272c0 ("nvme: support for zoned namespaces") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-09-27nvmet-fc: fix missing check for no hostport structJames Smart1-1/+1
A hostport port pointer is allowed to be NULL as it is not allocated if the lldd does not support the new interfaces for NVME LS request support. The hostport free routine validates the handle but forgot to validate the hostport pointer. Validate the hostport pointer before using it to validate the handle. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet: add passthru ZNS supportChaitanya Kulkarni1-0/+16
In the default passthru implementation NVMeOF target passthru ctrl is not capable of handling Zoned Namespaces (ZNS). Update the nvmet_parse_pasthru_admin_cmd() to allow NVME_ID_CNS_CS_CTRL/NVME_CIS_ZNS and NVME_ID_CNS_CS_NS/NVME_CIS_ZNS. With this addition NVMeOF Passthru allows Zoned Namespaces. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet: handle keep-alive timer when kato is modified by a set features cmdAmit Engel3-2/+6
A user may modify the kato by a set features cmd. To properly deal with races or a kato value of 0 (no keep alive enabled) change nvmet_set_feat_kato to first disable the timer, then set the value and then re-enable the timer. Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet-tcp: have queue io_work context run on sock incoming cpuMark Wunderlich1-11/+10
No real good need to spread queues artificially. Usually the target will serve multiple hosts, and it's better to run on the socket incoming cpu for better affinitization rather than spread queues on all online cpus. We rely on RSS to spread the work around sufficiently. Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme-pci: Move enumeration by class to be last in the tableAndy Shevchenko1-1/+2
It's unusual that we have enumeration by class in the middle of the table. It might potentially be problematic in the future if we add another entry after it. So, move class matching entry to be the last in the ID table. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme: use an xarray to lookup the Commands Supported and Effects logChaitanya Kulkarni2-28/+5
When using linked list we have to open code the locking, search, and destroy operations with the loops even if data structure doesn't fall into the fast path. One of the main advantage of having XArray to store, search, and remove items is that it handles all the locking by itself, avoids the loops when using linked lists, provides clear API to replace the linked list's search and destroy loops. This patch replaces the ctrl->cel list with XArray and removes :- a. Extra code needed for the linked list for ctrl->cel item management such as nvme_find_cel(). b. Destroy loop in the nvme_free_ctrl(). c. Explicit insertion locking in the nvme_get_effects_log(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme: lift the file open code from nvme_ctrl_get_by_pathChaitanya Kulkarni3-32/+22
Lift opening the file open/close code from nvme_ctrl_get_by_path into the caller, just keeping a simple nvme_ctrl_from_file() helper. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: refactored a bit, split the bug fixes into a separate prep patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>