summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/target/loop.c
AgeCommit message (Collapse)AuthorFilesLines
2022-12-07nvme: consolidate setting the tagset flagsChristoph Hellwig1-2/+2
All nvme transports should be using the same flags for their tagsets, with the exception for the blocking flag that should only be set for transports that can block in ->queue_rq. Add a NVME_F_BLOCKING flag to nvme_ctrl_ops to control the blocking behavior and lift setting the flags into nvme_alloc_{admin,io}_tag_set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-12-07nvme: pass nr_maps explicitly to nvme_alloc_io_tag_setChristoph Hellwig1-1/+1
Don't look at ctrl->ops as only RDMA and TCP actually support multiple maps. Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers") Fixes: ceee1953f923 ("nvme-loop: use the tagset alloc/free helpers") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-12-06nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrlChristoph Hellwig1-1/+1
Many of the callers decide which one to use based on a bool argument and there is at least some code to be shared, so merge these two. Also move a comment specific to a single callsite to that callsite. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hector Martin <marcan@marcan.st>
2022-12-06nvme: introduce nvme_start_requestSagi Grimberg1-1/+1
In preparation for nvme-multipath IO stats accounting, we want the accounting to happen in a centralized place. The request completion is already centralized, but we need a common helper to request I/O start. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-11-18nvme: rename the queue quiescing helpersChristoph Hellwig1-3/+3
Naming the nvme helpers that wrap the block quiesce functionality _start/_stop is rather confusing. Switch to using the quiesce naming used by the block layer instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-11-15nvme: move OPAL setup from PCIe to coreChristoph Hellwig1-1/+1
Nothing about the TCG Opal support is PCIe transport specific, so move it to the core code. For this nvme_init_ctrl_finish grows a new was_suspended argument that allows the transport driver to tell the OPAL code if the controller came out of a suspend cycle. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Tested-by Gerd Bayer <gbayer@linxu.ibm.com>
2022-09-27nvme-loop: use the tagset alloc/free helpersChristoph Hellwig1-64/+19
Use the common helpers to allocate and free the tagsets. To make this work the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_loop_ctrl. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-09-27nvme-loop: store the generic nvme_ctrl in set->driver_dataChristoph Hellwig1-5/+5
Point the private data to the generic controller structure in preparation of using the common tagset init/exit code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-09-27nvme-loop: initialize sqsize laterChristoph Hellwig1-1/+1
Defer initializing the sqsize field from the options until it has been capped by MAXCMD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-08-02nvme-loop: use nvme core helpers to cancel all requests in a tagsetSagi Grimberg1-6/+2
A helper now exist, no need to open-code the same functionality. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: simplify disk shutdownChristoph Hellwig1-6/+6
Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-29nvmet: use a private workqueue instead of the system workqueueSagi Grimberg1-2/+2
Any attempt to flush kernel-global WQs has possibility of deadlock so we should simply stop using them, instead introduce nvmet_wq which is the generic nvmet workqueue for work elements that don't explicitly require a dedicated workqueue (by the mere fact that they are using the system_wq). Changes were done using the following replaces: - s/schedule_work(/queue_work(nvmet_wq, /g - s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g - s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28nvme: add a helper to initialize connect_qChaitanya Kulkarni1-4/+2
Add and use helper to remove duplicate code for fabrics connect_q initialization and error handling for all the transports. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-19nvme: loop: clear NVME_CTRL_ADMIN_Q_STOPPED after admin queue is reallocatedMing Lei1-0/+2
The nvme-loop's admin queue may be freed and reallocated, and we have to reset the flag of NVME_CTRL_ADMIN_Q_STOPPED so that the flag can match with the quiesce state of the admin queue. nvme-loop is the only driver to reallocate request queue, and not see such usage in other nvme drivers. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19nvme: apply nvme API to quiesce/unquiesce admin queueMing Lei1-2/+2
Apply the added two APIs to quiesce/unquiesce admin queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16nvme: code command_id with a genctr for use-after-free validationSagi Grimberg1-2/+2
We cannot detect a (perhaps buggy) controller that is sending us a completion for a request that was already completed (for example sending a completion twice), this phenomenon was seen in the wild a few times. So to protect against this, we use the upper 4 msbits of the nvme sqe command_id to use as a 4-bit generation counter and verify it matches the existing request generation that is incrementing on every execution. The 16-bit command_id structure now is constructed by: | xxxx | xxxxxxxxxxxx | gen request tag This means that we are giving up some possible queue depth as 12 bits allow for a maximum queue depth of 4095 instead of 65536, however we never create such long queues anyways so no real harm done. Suggested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-30nvme: use blk_execute_rq() for passthrough commandsKeith Busch1-1/+1
The generic blk_execute_rq() knows how to handle polled completions. Use that instead of implementing an nvme specific handler. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-02nvme-loop: do not warn for deleted controllers during resetHannes Reinecke1-2/+4
During concurrent reset and delete calls the reset workqueue is flushed, causing nvme_loop_reset_ctrl_work() to be executed when the controller is in state DELETING or DELETING_NOIO. But this is expected, so we shouldn't issue a WARN_ON here. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-02nvme-loop: check for NVME_LOOP_Q_LIVE in nvme_loop_destroy_admin_queue()Hannes Reinecke1-1/+2
We need to check the NVME_LOOP_Q_LIVE flag in nvme_loop_destroy_admin_queue() to protect against duplicate invocations eg during concurrent reset and remove calls. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-02nvme-loop: clear NVME_LOOP_Q_LIVE when nvme_loop_configure_admin_queue() failsHannes Reinecke1-0/+1
When the call to nvme_enable_ctrl() in nvme_loop_configure_admin_queue() fails the NVME_LOOP_Q_LIVE flag is not cleared. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-02nvme-loop: reset queue count to 1 in nvme_loop_destroy_io_queues()Hannes Reinecke1-0/+1
The queue count is increased in nvme_loop_init_io_queues(), so we need to reset it to 1 at the end of nvme_loop_destroy_io_queues(). Otherwise the function is not re-entrant safe, and crash will happen during concurrent reset and remove calls. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-05-19nvme-loop: fix memory leak in nvme_loop_create_ctrl()Wu Bo1-1/+3
When creating loop ctrl in nvme_loop_create_ctrl(), if nvme_init_ctrl() fails, the loop ctrl should be freed before jumping to the "out" label. Fixes: 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver") Signed-off-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-05-04nvme: move the fabrics queue ready check routines to coreTao Chiu1-2/+2
queue_rq() in pci only checks if the dispatched queue (nvmeq) is ready, e.g. not being suspended. Since nvme_alloc_admin_tags() in reset flow restarts the admin queue, users are able to submit admin commands to a controller before reset_work() completes. Commands submitted under this condition may interfere with commands that performs identify, IO queue setup in reset_work(), and may result in a hang described in the following patch. As seen in the fabrics, user commands are prevented from being executed under inproper controller states. We may reuse this logic to maintain a clear admin queue during reset_work(). Signed-off-by: Tao Chiu <taochiu@synology.com> Signed-off-by: Cody Wong <codywong@synology.com> Reviewed-by: Leon Chien <leonchien@synology.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-28Merge tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+4
Pull block driver updates from Jens Axboe: - MD changes via Song: - raid5 POWER fix - raid1 failure fix - UAF fix for md cluster - mddev_find_or_alloc() clean up - Fix NULL pointer deref with external bitmap - Performance improvement for raid10 discard requests - Fix missing information of /proc/mdstat - rsxx const qualifier removal (Arnd) - Expose allocated brd pages (Calvin) - rnbd via Gioh Kim: - Change maintainer - Change domain address of maintainers' email - Add polling IO mode and document update - Fix memory leak and some bug detected by static code analysis tools - Code refactoring - Series of floppy cleanups/fixes (Denis) - s390 dasd fixes (Julian) - kerneldoc fixes (Lee) - null_blk double free (Lv) - null_blk virtual boundary addition (Max) - Remove xsysace driver (Michal) - umem driver removal (Davidlohr) - ataflop fixes (Dan) - Revalidate disk removal (Christoph) - Bounce buffer cleanups (Christoph) - Mark lightnvm as deprecated (Christoph) - mtip32xx init cleanups (Shixin) - Various fixes (Tian, Gustavo, Coly, Yang, Zhang, Zhiqiang) * tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block: (143 commits) async_xor: increase src_offs when dropping destination page drivers/block/null_blk/main: Fix a double free in null_init. md/raid1: properly indicate failure when ending a failed write request md-cluster: fix use-after-free issue when removing rdev nvme: introduce generic per-namespace chardev nvme: cleanup nvme_configure_apst nvme: do not try to reconfigure APST when the controller is not live nvme: add 'kato' sysfs attribute nvme: sanitize KATO setting nvmet: avoid queuing keep-alive timer if it is disabled brd: expose number of allocated pages in debugfs ataflop: fix off by one in ataflop_probe() ataflop: potential out of bounds in do_format() drbd: Fix fall-through warnings for Clang block/rnbd: Use strscpy instead of strlcpy block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name block/rnbd-clt: Remove max_segment_size block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev Documentation/ABI/rnbd-clt: Add description for nr_poll_queues ...
2021-04-02nvme: use driver pdu command for passthroughKeith Busch1-1/+3
All nvme transport drivers preallocate an nvme command for each request. Assume to use that command for nvme_setup_cmd() instead of requiring drivers pass a pointer to it. All nvme drivers must initialize the generic nvme_request 'cmd' to point to the transport's preallocated nvme_command. The generic nvme_request cmd pointer had previously been used only as a temporary copy for passthrough commands. Since it now points to the command that gets dispatched, passthrough commands must directly set it up prior to executing the request. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02nvme: rename nvme_init_identify()Chaitanya Kulkarni1-1/+1
This is a prep patch so that we can move the identify data structure related code initialization from nvme_init_identify() into a helper. Rename the function nvmet_init_identify() to nvmet_init_ctrl_finish(). Next patch will move the nvme_id_ctrl related initialization from newly renamed function nvme_init_ctrl_finish() into the nvme_init_identify() helper. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-18nvme-fabrics: only reserve a single tagChristoph Hellwig1-2/+2
Fabrics drivers currently reserve two tags on the admin queue. But given that the connect command is only run on a freshly created queue or after all commands have been force aborted we only need to reserve a single tag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de>
2020-12-16Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block driver updates from Jens Axboe: "Nothing major in here: - NVMe pull request from Christoph: - nvmet passthrough improvements (Chaitanya Kulkarni) - fcloop error injection support (James Smart) - read-only support for zoned namespaces without Zone Append (Javier González) - improve some error message (Minwoo Im) - reject I/O to offline fabrics namespaces (Victor Gladkov) - PCI queue allocation cleanups (Niklas Schnelle) - remove an unused allocation in nvmet (Amit Engel) - a Kconfig spelling fix (Colin Ian King) - nvme_req_qid simplication (Baolin Wang) - MD pull request from Song: - Fix race condition in md_ioctl() (Dae R. Jeong) - Initialize read_slot properly for raid10 (Kevin Vigor) - Code cleanup (Pankaj Gupta) - md-cluster resync/reshape fix (Zhao Heming) - Move null_blk into its own directory (Damien Le Moal) - null_blk zone and discard improvements (Damien Le Moal) - bcache race fix (Dongsheng Yang) - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang, Lutz Pogrell, Md Haris Iqbal) - lightnvm NULL pointer deref fix (tangzhenhao) - sr in_interrupt() removal (Sebastian Andrzej Siewior) - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included as it made it easier for them to funnel the feature through the block driver tree. - Follow up fixes (Colin Ian King)" * tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits) block: drop dead assignments in loop_init() sr: Remove in_interrupt() usage in sr_init_command(). sr: Switch the sector size back to 2048 if sr_read_sector() changed it. cdrom: Reset sector_size back it is not 2048. drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c null_blk: Move driver into its own directory null_blk: Allow controlling max_hw_sectors limit null_blk: discard zones on reset null_blk: cleanup discard handling null_blk: Improve implicit zone close null_blk: improve zone locking block: Align max_hw_sectors to logical blocksize null_blk: Fail zone append to conventional zones null_blk: Fix zone size initialization bcache: fix race between setting bdev state to none and new write request direct to backing block/rnbd: fix a null pointer dereference on dev->blk_symlink_name block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name block/rnbd: call kobject_put in the failure path Documentation/ABI/rnbd-srv: add document for force_close block/rnbd-srv: close a mapped device from server side. ...
2020-12-07nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock classMing Lei1-0/+10
Set nvme-loop's lock class via blk_mq_hctx_set_fq_lock_class for avoiding lockdep possible recursive locking, then we can remove the dynamically allocated lock class for each flush queue, finally we can avoid horrible SCSI probe delay. This way may not address situation in which one nvme-loop is backed on another nvme-loop. However, in reality, people seldom uses this way for test. Even though someone played in this way, it is just one recursive locking false positive, no real deadlock issue. Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Qian Cai <cai@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01nvme: use consistent macro name for timeoutChaitanya Kulkarni1-1/+1
This is purely a clenaup patch, add prefix NVME to the ADMIN_TIMEOUT to make consistent with NVME_IO_TIMEOUT. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07nvme-loop: don't put ctrl on nvme_init_ctrl errorChaitanya Kulkarni1-2/+2
The function nvme_init_ctrl() gets the ctrl reference & when it fails it does put the ctrl reference in the error unwind code. When creating loop ctrl in nvme_loop_create_ctrl() if nvme_init_ctrl() returns non zero (i.e. error) value it jumps to the "out_put_ctrl" label which calls nvme_put_ctrl(), that will lead to douple ctrl put in error unwind path. Update nvme_loop_create_ctrl() such that this patch removes the "out_put_ctrl" label, add a new "out" label after nvme_put_ctrl() in error unwind path and jump to newly added label when nvme_init_ctrl() call retuns an error. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-08-21nvme: rename and document nvme_end_requestChristoph Hellwig1-1/+1
nvme_end_request is a bit misnamed, as it wraps around the blk_mq_complete_* API. It's semantics also are non-trivial, so give it a more descriptive name and add a comment explaining the semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-29nvme-loop: remove extra variable in create ctrlChaitanya Kulkarni1-8/+6
We can call the nvme_change_ctrl_state() directly and have WARN_ON_ONCE(1) call instead of having to use an extra variable which matches the name of the function. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvme-loop: set ctrl state connecting after initChaitanya Kulkarni1-0/+3
When creating a loop controller (ctrl) in nvme_loop_create_ctrl() -> nvme_init_ctrl() we set the ctrl state to NVME_CTRL_NEW. Prior to [1] NVME_CTRL_NEW state was allowed in nvmf_check_ready() for fabrics command type connect. Now, this fails in the following code path for fabrics connect command when creating admin queue :- nvme_loop_create_ctrl() nvme_loo_configure_admin_queue() nvmf_connect_admin_queue() __nvme_submit_sync_cmd() blk_execute_rq() nvme_loop_queue_rq() nvmf_check_ready() # echo "transport=loop,nqn=fs" > /dev/nvme-fabrics [ 6047.741327] nvmet: adding nsid 1 to subsystem fs [ 6048.756430] nvme nvme1: Connect command failed, error wo/DNR bit: 880 We need to set the ctrl state to NVME_CTRL_CONNECTING after :- nvme_loop_create_ctrl() nvme_init_ctrl() so that the above mentioned check for nvmf_check_ready() will return true. This patch sets the ctrl state to connecting after we init the ctrl in nvme_loop_create_ctrl() nvme_init_ctrl() . [1] commit aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Fixes: aa63fa6776a7 ("nvme-fabrics: allow to queue requests for live queues") Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08nvmet-loop: remove unused 'target_ctrl' in nvme_loop_ctrlDongli Zhang1-1/+0
This field is never used since the introduction of nvme loopback by commit 3a85a5de29ea ("nvme-loop: add a NVMe loopback host driver"). Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08Merge tag 'v5.8-rc4' into for-5.9/driversJens Axboe1-2/+2
Merge in 5.8-rc4 for-5.9/block to setup for-5.9/drivers, to provide a clean base and making the life for the NVMe changes easier. Signed-off-by: Jens Axboe <axboe@kernel.dk> * tag 'v5.8-rc4': (732 commits) Linux 5.8-rc4 x86/ldt: use "pr_info_once()" instead of open-coding it badly MIPS: Do not use smp_processor_id() in preemptible code MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen .gitignore: Do not track `defconfig` from `make savedefconfig` io_uring: fix regression with always ignoring signals in io_cqring_wait() x86/ldt: Disable 16-bit segments on Xen PV x86/entry/32: Fix #MC and #DB wiring on x86_32 x86/entry/xen: Route #DB correctly on Xen PV x86/entry, selftests: Further improve user entry sanity checks x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER i2c: mlxcpld: check correct size of maximum RECV_LEN packet i2c: add Kconfig help text for slave mode i2c: slave-eeprom: update documentation i2c: eg20t: Load module automatically if ID matches i2c: designware: platdrv: Set class based on DMI i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665 mm/page_alloc: fix documentation error vmalloc: fix the owner argument for the new __vmalloc_node_range callers mm/cma.c: use exact_nid true to fix possible per-numa cma leak ...
2020-06-24nvme-loop: initialize tagset numa value to the value of the ctrlMax Gurtovoy1-2/+2
Both admin's and drive's tagsets should be set according the numa node of the controller. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-24nvme: use blk_mq_complete_request_remote to avoid an indirect function callChristoph Hellwig1-1/+2
Use the new blk_mq_complete_request_remote helper to avoid an indirect function call in the completion fast path. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-26nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrlIsrael Rukshin1-2/+0
Put the ctrl reference count at nvme_uninit_ctrl as opposed to nvme_init_ctrl which takes it. This decrease the reference count at the core layer instead of decreasing it on each transport separately. Also move the call of nvme_uninit_ctrl at PCI driver after calling to nvme_release_prp_pools and nvme_dev_unmap, in order to put the reference count after using the dev. This is safe because those functions use nvme_dev which is freed only later at nvme_pci_free_ctrl. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2020-03-26nvme: Fix ctrl use-after-free during sysfs deletionIsrael Rukshin1-2/+1
In case nvme_sysfs_delete() is called by the user before taking the ctrl reference count, the ctrl may be freed during the creation and cause the bug. Take the reference as soon as the controller is externally visible, which is done by cdev_device_add() in nvme_init_ctrl(). Also take the reference count at the core layer instead of taking it on each transport separately. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-27nvmet-loop: Avoid preallocating big SGL for dataIsrael Rukshin1-4/+4
nvme_loop_create_io_queues() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments so the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvmet-loop, nr_hw_queues can be 128 and each queue's depth 128. This means the resulting preallocation for the data SGL is 128*128*4K = 64MB per controller. Switch to runtime allocation for SGL for lists longer than 2 entries. This is the approach used by NVMe PCI so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path so this is nothing new. Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-25Merge tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+3
Pull block driver updates from Jens Axboe: "Here are the main block driver updates for 5.5. Nothing major in here, mostly just fixes. This contains: - a set of bcache changes via Coly - MD changes from Song - loop unmap write-zeroes fix (Darrick) - spelling fixes (Geert) - zoned additions cleanups to null_blk/dm (Ajay) - allow null_blk online submit queue changes (Bart) - NVMe changes via Keith, nothing major here either" * tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits) Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()" drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET bcache: don't export symbols bcache: remove the extra cflags for request.o bcache: at least try to shrink 1 node in bch_mca_scan() bcache: add idle_max_writeback_rate sysfs interface bcache: add code comments in bch_btree_leaf_dirty() bcache: fix deadlock in bcache_allocator bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front() bcache: deleted code comments for dead code in bch_data_insert_keys() bcache: add more accurate error messages in read_super() bcache: fix static checker warning in bcache_device_free() bcache: fix a lost wake-up problem caused by mca_cannibalize_lock bcache: fix fifo index swapping condition in journal_pin_cmp() md/raid10: prevent access of uninitialized resync_pages offset md: avoid invalid memory access for array sb->dev_roles md/raid1: avoid soft lockup under high load null_blk: add zone open, close, and finish support dm: add zone open, close and finish support ...
2019-11-04nvmet: Open code nvmet_req_execute()Christoph Hellwig1-1/+1
Now that nvmet_req_execute does nothing, open code it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [split patch, update changelog] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme: move common call to nvme_cleanup_cmd to core layerMax Gurtovoy1-1/+0
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd (symmetrical functions). Move the call for nvme_cleanup_cmd to the common core layer and call it during nvme_complete_rq for the good flow. For error flow, each transport will call nvme_cleanup_cmd independently. Also take care of a special case of path failure, where we call nvme_complete_rq without doing nvme_setup_cmd. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-04nvme: introduce nvme_is_aen_req functionIsrael Rukshin1-2/+2
This function improves code readability and reduces code duplication. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-15nvmet-loop: fix possible leakage during error flowMax Gurtovoy1-1/+3
During nvme_loop_queue_rq error flow, one must call nvme_cleanup_cmd since it's symmetric to nvme_setup_cmd. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-09-17Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-blockLinus Torvalds1-14/+16
Pull block updates from Jens Axboe: - Two NVMe pull requests: - ana log parse fix from Anton - nvme quirks support for Apple devices from Ben - fix missing bio completion tracing for multipath stack devices from Hannes and Mikhail - IP TOS settings for nvme rdma and tcp transports from Israel - rq_dma_dir cleanups from Israel - tracing for Get LBA Status command from Minwoo - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself - Some consolidation between the fabrics transports for handling the CAP register - reset race with ns scanning fix for fabrics (move fabrics commands to a dedicated request queue with a different lifetime from the admin request queue)." - controller reset and namespace scan races fixes - nvme discovery log change uevent support - naming improvements from Keith - multiple discovery controllers reject fix from James - some regular cleanups from various people - Series fixing (and re-fixing) null_blk debug printing and nr_devices checks (André) - A few pull requests from Song, with fixes from Andy, Guoqing, Guilherme, Neil, Nigel, and Yufen. - REQ_OP_ZONE_RESET_ALL support (Chaitanya) - Bio merge handling unification (Christoph) - Pick default elevator correctly for devices with special needs (Damien) - Block stats fixes (Hou) - Timeout and support devices nbd fixes (Mike) - Series fixing races around elevator switching and device add/remove (Ming) - sed-opal cleanups (Revanth) - Per device weight support for BFQ (Fam) - Support for blk-iocost, a new model that can properly account cost of IO workloads. (Tejun) - blk-cgroup writeback fixes (Tejun) - paride queue init fixes (zhengbin) - blk_set_runtime_active() cleanup (Stanley) - Block segment mapping optimizations (Bart) - lightnvm fixes (Hans/Minwoo/YueHaibing) - Various little fixes and cleanups * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits) null_blk: format pr_* logs with pr_fmt null_blk: match the type of parameter nr_devices null_blk: do not fail the module load with zero devices block: also check RQF_STATS in blk_mq_need_time_stamp() block: make rq sector size accessible for block stats bfq: Fix bfq linkage error raid5: use bio_end_sector in r5_next_bio raid5: remove STRIPE_OPS_REQ_PENDING md: add feature flag MD_FEATURE_RAID0_LAYOUT md/raid0: avoid RAID0 data corruption due to layout confusion. raid5: don't set STRIPE_HANDLE to stripe which is in batch list raid5: don't increment read_errors on EILSEQ return nvmet: fix a wrong error status returned in error log page nvme: send discovery log page change events to userspace nvme: add uevent variables for controller devices nvme: enable aen regardless of the presence of I/O queues nvme-fabrics: allow discovery subsystems accept a kato nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery() nvme: Remove redundant assignment of cq vector nvme: Assign subsys instance from first ctrl ...
2019-08-29nvme: make fabrics command run on a separate request queueSagi Grimberg1-3/+13
We have a fundamental issue that fabric commands use the admin_q. The reason is, that admin-connect, register reads and writes and admin commands cannot be guaranteed ordering while we are running controller resets. For example, when we reset a controller we perform: 1. disable the controller 2. teardown the admin queue 3. re-establish the admin queue 4. enable the controller In order to perform (3), we need to unquiesce the admin queue, however we may have some admin commands that are already pending on the quiesced admin_q and will immediate execute when we unquiesce it before we execute (4). The host must not send admin commands to the controller before enabling the controller. To fix this, we have the fabric commands (admin connect and property get/set, but not I/O queue connect) use a separate fabrics_q and make sure to quiesce the admin_q before we disable the controller, and unquiesce it only after we enable the controller. This fixes the error prints from nvmet in a controller reset storm test: kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0 Which indicate that the host is sending an admin command when the controller is not enabled. Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-29nvme: move sqsize setting to the coreSagi Grimberg1-11/+1
nvme_enable_ctrl reads the cap register right after, so no need to do that locally in the transport driver. Have sqsize setting in nvme_init_identify. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2019-08-04nvme: wait until all completed request's complete fn is calledMing Lei1-0/+2
When aborting in-flight request for recovering controller, we have to make sure that queue's complete function is called on completed request before moving on. Otherwise, for example, the warning of WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered on nvme-rdma. Fix this issue by using blk_mq_tagset_wait_completed_request. Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>