summaryrefslogtreecommitdiffstats
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2020-10-07nvme: move nvme_validate_nsChristoph Hellwig1-37/+37
Move nvme_validate_ns just above its only remaining caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: query namespace identifiers before adding the namespaceChristoph Hellwig1-63/+53
Check the namespace identifier list first thing when scanning namespaces. This keeps the code to query the CSI common between the alloc and validate path, and helps to structure the code better for multiple command set support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: revalidate zone bitmaps in nvme_update_ns_infoChristoph Hellwig1-4/+6
Consolidate the two calls into a single place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2020-10-07nvme: remove nvme_update_formatsChristoph Hellwig1-30/+2
Now that the queue is frozen before updating ->lba_shift we can't hit the invalid references mentioned in the comment any more. More importantly this code would not have helped us if the format was changed by another controller or through implementation defined back channels. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: update the known admin effectsChristoph Hellwig1-2/+2
A Format NVM command can change the capabilities of namespaces, while Sanitize does change the Logical Block Content and must be serialized. Also remove CSUPP bit for Format - it is not a mandatory command, and we don't check for the bit anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: set the queue limits in nvme_update_ns_infoChristoph Hellwig1-25/+21
Only set the queue limits once we have the real block size. This also updates the limits on a rescan if needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: remove the 0 lba_shift check in nvme_update_ns_infoChristoph Hellwig1-6/+0
We can no longer reach this code if Identify Namespace failed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: clean up the check for too large logic block sizesChristoph Hellwig1-8/+5
Use a single statement to set both the capacity and fake block size instead of two. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2020-10-07nvme: freeze the queue over ->lba_shift updatesChristoph Hellwig1-6/+14
Ensure that there can't be any I/O in flight went we change the disk geometry in nvme_update_ns_info, most notable the LBA size by lifting the queue free from nvme_update_disk_info into the caller Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: factor out a nvme_configure_metadata helperChristoph Hellwig1-31/+47
Factor out a helper from nvme_update_ns_info that configures the per-namespaces metadata and PI settings. Also make sure the helpers clear the flags explicitly instead of all of ->features to allow for potentially reusing ->features for future non-metadata flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: call nvme_identify_ns as the first thing in nvme_alloc_ns_blockChristoph Hellwig1-8/+7
Check if the namespace actually exists as the very first thing and don't bother with any extra work if not. This should speed up and simplify the sequential scanning for NVMe 1.0 devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: lift the check for an unallocated namespace into nvme_identify_nsChristoph Hellwig1-9/+8
Move the check from the two callers into the common helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2020-10-07nvme: rename __nvme_revalidate_diskChristoph Hellwig1-5/+4
Rename __nvme_revalidate_disk to nvme_update_ns_info and pass a namespace instead of the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: rename _nvme_revalidate_diskChristoph Hellwig1-7/+6
Rename _nvme_revalidate_disk to nvme_validate_ns to better describe what the function does, and pass the struct nvme_ns instead of the gendisk to better match the call chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: rename nvme_validate_ns to nvme_validate_or_alloc_nsChristoph Hellwig1-3/+3
Use a slightly more descriptive name to enable reusing nvme_validate_ns in the next patch for a lower level function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: remove the disk argument to nvme_update_zone_infoChristoph Hellwig3-10/+5
The queue can trivially be derived from the nvme_ns structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme: fix initialization of the zone bitmapsChristoph Hellwig3-23/+17
The removal of the ->revalidate_disk method broke the initialization of the zone bitmaps, as nvme_revalidate_disk now never gets called during initialization. Move the zone related code from nvme_revalidate_disk into a new helper in zns.c, and call it from nvme_alloc_ns in addition to nvme_validate_ns to ensure the zone bitmaps are initialized during probe. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-10-07nvme-loop: don't put ctrl on nvme_init_ctrl errorChaitanya Kulkarni1-2/+2
The function nvme_init_ctrl() gets the ctrl reference & when it fails it does put the ctrl reference in the error unwind code. When creating loop ctrl in nvme_loop_create_ctrl() if nvme_init_ctrl() returns non zero (i.e. error) value it jumps to the "out_put_ctrl" label which calls nvme_put_ctrl(), that will lead to douple ctrl put in error unwind path. Update nvme_loop_create_ctrl() such that this patch removes the "out_put_ctrl" label, add a new "out" label after nvme_put_ctrl() in error unwind path and jump to newly added label when nvme_init_ctrl() call retuns an error. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-06scsi: megaraid_sas: Added support for shared host tagset for cpuhotplugKashyap Desai2-13/+55
Fusion adapters can steer completions to individual queues, and we now have support for shared host-wide tags. So we can enable multiqueue support for fusion adapters. Once driver enable shared host-wide tags, cpu hotplug feature is also supported as it was enabled using below patchsets - commit bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline") Currently driver has provision to disable host-wide tags using "host_tagset_enable" module parameter. Once we do not have any major performance regression using host-wide tags, we will drop the hand-crafted interrupt affinity settings. Performance is also meeting the expecatation - (used both none and mq-deadline scheduler) 24 Drive SSD on Aero with/without this patch can get 3.1M IOPs 3 VDs consist of 8 SAS SSD on Aero with/without this patch can get 3.1M IOPs. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: scsi_debug: Support host tagsetJohn Garry1-18/+10
When host_max_queue is set (> 0), set the Scsi_Host.host_tagset such that blk-mq will use a hostwide tagset over all SCSI host submission queues. This means that we may expose all submission queues and always use the hwq chosen by blk-mq. And since if sdebug_host_max_queue is set, sdebug_max_queue is fixed to the same value, we can simplify how sdebug_driver_template.can_queue is set. Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: hisi_sas: Switch v3 hw to MQJohn Garry3-70/+56
Now that the block layer provides a shared tag, we can switch the driver to expose all HW queues. Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: core: Show nr_hw_queues in sysfsJohn Garry1-0/+11
So that we don't use a value of 0 for when Scsi_Host.nr_hw_queues is unset, use the tag_set->nr_hw_queues (which holds 1 for this case). Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-06scsi: Add host and host template flag 'host_tagset'Hannes Reinecke2-0/+3
Add Host and host template flag 'host_tagset' so hostwide tagset can be shared on multiple reply queues after the SCSI device's reply queue is converted to blk-mq hw queue. [jpg: Update comment on .can_queue and add Scsi_Host.host_tagset] Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02rsxx: Use fallthrough pseudo-keywordGustavo A. R. Silva1-1/+1
Replace /* Fall through. */ comment with the new pseudo-keyword macro fallthrough[1]. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: remove embedded struct cache_sb from struct cache_setColy Li11-59/+46
Since bcache code was merged into mainline kerrnel, each cache set only as one single cache in it. The multiple caches framework is here but the code is far from completed. Considering the multiple copies of cached data can also be stored on e.g. md raid1 devices, it is unnecessary to support multiple caches in one cache set indeed. The previous preparation patches fix the dependencies of explicitly making a cache set only have single cache. Now we don't have to maintain an embedded partial super block in struct cache_set, the in-memory super block can be directly referenced from struct cache. This patch removes the embedded struct cache_sb from struct cache_set, and fixes all locations where the superb lock was referenced from this removed super block by referencing the in-memory super block of struct cache. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: check and set sync status on cache's in-memory super blockColy Li4-10/+7
Currently the cache's sync status is checked and set on cache set's in- memory partial super block. After removing the embedded struct cache_sb from cache set and reference cache's in-memory super block from struct cache_set, the sync status can set and check directly on cache's super block. This patch checks and sets the cache sync status directly on cache's in-memory super block. This is a preparation for later removing embedded struct cache_sb from struct cache_set. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: remove can_attach_cache()Coly Li1-10/+0
After removing the embedded struct cache_sb from struct cache_set, cache set will directly reference the in-memory super block of struct cache. It is unnecessary to compare block_size, bucket_size and nr_in_set from the identical in-memory super block in can_attach_cache(). This is a preparation patch for latter removing cache_set->sb from struct cache_set. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: don't check seq numbers in register_cache_set()Coly Li1-15/+0
In order to update the partial super block of cache set, the seq numbers of cache and cache set are checked in register_cache_set(). If cache's seq number is larger than cache set's seq number, cache set must update its partial super block from cache's super block. It is unncessary when the embedded struct cache_sb is removed from struct cache set. This patch removed the seq numbers checking from register_cache_set(), because later there will be no such partial super block in struct cache set, the cache set will directly reference in-memory super block from struct cache. This is a preparation patch for removing embedded struct cache_sb from struct cache_set. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: only use bucket_bytes() on struct cacheColy Li2-2/+2
Because struct cache_set and struct cache both have struct cache_sb, macro bucket_bytes() currently are used on both of them. When removing the embedded struct cache_sb from struct cache_set, this macro won't be used on struct cache_set anymore. This patch unifies all bucket_bytes() usage only on struct cache, this is one of the preparation to remove the embedded struct cache_sb from struct cache_set. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: remove useless bucket_pages()Coly Li1-1/+0
It seems alloc_bucket_pages() is the only user of bucket_pages(). Considering alloc_bucket_pages() is removed from bcache code, it is safe to remove the useless macro bucket_pages() now. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: remove useless alloc_bucket_pages()Coly Li1-3/+0
Now no one uses alloc_bucket_pages() anymore, remove it from bcache.h. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: only use block_bytes() on struct cacheColy Li7-24/+24
Because struct cache_set and struct cache both have struct cache_sb, therefore macro block_bytes() can be used on both of them. When removing the embedded struct cache_sb from struct cache_set, this macro won't be used on struct cache_set anymore. This patch unifies all block_bytes() usage only on struct cache, this is one of the preparation to remove the embedded struct cache_sb from struct cache_set. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: add set_uuid in struct cache_setColy Li3-13/+14
This patch adds a separated set_uuid[16] in struct cache_set, to store the uuid of the cache set. This is the preparation to remove the embedded struct cache_sb from struct cache_set. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: remove for_each_cache()Coly Li6-294/+237
Since now each cache_set explicitly has single cache, for_each_cache() is unnecessary. This patch removes this macro, and update all locations where it is used, and makes sure all code logic still being consistent. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: explicitly make cache_set only have single cacheColy Li3-17/+12
Currently although the bcache code has a framework for multiple caches in a cache set, but indeed the multiple caches never completed and users use md raid1 for multiple copies of the cached data. This patch does the following change in struct cache_set, to explicitly make a cache_set only have single cache, - Change pointer array "*cache[MAX_CACHES_PER_SET]" to a single pointer "*cache". - Remove pointer array "*cache_by_alloc[MAX_CACHES_PER_SET]". - Remove "caches_loaded". Now the code looks as exactly what it does in practic: only one cache is used in the cache set. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: remove 'int n' from parameter list of bch_bucket_alloc_set()Coly Li4-24/+19
The parameter 'int n' from bch_bucket_alloc_set() is not cleared defined. From the code comments n is the number of buckets to alloc, but from the code itself 'n' is the maximum cache to iterate. Indeed all the locations where bch_bucket_alloc_set() is called, 'n' is alwasy 1. This patch removes the confused and unnecessary 'int n' from parameter list of bch_bucket_alloc_set(), and explicitly allocates only 1 bucket for its caller. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: Convert to DEFINE_SHOW_ATTRIBUTEQinglang Miao1-13/+3
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. As inode->iprivate equals to third parameter of debugfs_create_file() which is NULL. So it's equivalent to original code logic. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: check c->root with IS_ERR_OR_NULL() in mca_reserve()Dongsheng Yang1-1/+1
In mca_reserve(c) macro, we are checking root whether is NULL or not. But that's not enough, when we read the root node in run_cache_set(), if we got an error in bch_btree_node_read_done(), we will return ERR_PTR(-EIO) to c->root. And then we will go continue to unregister, but before calling unregister_shrinker(&c->shrink), there is a possibility to call bch_mca_count(), and we would get a crash with call trace like that: [ 2149.876008] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b5 ... ... [ 2150.598931] Call trace: [ 2150.606439] bch_mca_count+0x58/0x98 [escache] [ 2150.615866] do_shrink_slab+0x54/0x310 [ 2150.624429] shrink_slab+0x248/0x2d0 [ 2150.632633] drop_slab_node+0x54/0x88 [ 2150.640746] drop_slab+0x50/0x88 [ 2150.648228] drop_caches_sysctl_handler+0xf0/0x118 [ 2150.657219] proc_sys_call_handler.isra.18+0xb8/0x110 [ 2150.666342] proc_sys_write+0x40/0x50 [ 2150.673889] __vfs_write+0x48/0x90 [ 2150.681095] vfs_write+0xac/0x1b8 [ 2150.688145] ksys_write+0x6c/0xd0 [ 2150.695127] __arm64_sys_write+0x24/0x30 [ 2150.702749] el0_svc_handler+0xa0/0x128 [ 2150.710296] el0_svc+0x8/0xc Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02bcache: share register sysfs with async registerColy Li1-5/+7
Previously the experimental async registration uses a separate sysfs file register_async. Now the async registration code seems working well for a while, we can do furtuher testing with it now. This patch changes the async bcache registration shares the same sysfs file /sys/fs/bcache/register (and register_quiet). Async registration will be default behavior if BCACHE_ASYNC_REGISTRATION is set in kernel configure. By default, BCACHE_ASYNC_REGISTRATION is not configured yet. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-29null_blk: add support for max open/active zone limit for zoned devicesNiklas Cassel3-58/+282
Add support for user space to set a max open zone and a max active zone limit via configfs. By default, the default values are 0 == no limit. Call the block layer API functions used for exposing the configured limits to sysfs. Add accounting in null_blk_zoned so that these new limits are respected. Performing an operation that would exceed these limits results in a standard I/O error. A max open zone limit exists in the ZBC standard. While null_blk_zoned is used to test the Zoned Block Device model in Linux, when it comes to differences between ZBC and ZNS, null_blk_zoned mostly follows ZBC. Therefore, implement the manage open zone resources function from ZBC, but additionally add support for max active zones. This enables user space not only to test against a device with an open zone limit, but also to test against a device with an active zone limit. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-27nvme-pci: allocate separate interrupt for the reserved non-polled I/O queueJeffle Xu1-17/+15
One queue will be reserved for non-polled IO when nvme.poll_queues is greater or equal than the number of IO queues that the nvme controller can provide. Currently the reserved queue for non-polled IO will reuse the interrupt used by admin queue in this case, e.g, vector 0. This can work and the performance may not be an issue since the admin queue is used unfrequently. However this behaviour may be inconsistent with that when nvme.poll_queues is smaller than the number of IO queues available. Thus allocate separate interrupt for this reserved queue, and thus make the behaviour consistent. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> [hch: minor cleanups, mostly to the pre-existing surrounding code] Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme: fix error handling in nvme_ns_report_zonesChristoph Hellwig1-25/+16
nvme_submit_sync_cmd can return positive NVMe error codes in addition to the negative Linux error code, which are currently ignored. Fix this by removing __nvme_ns_report_zones and handling the errors from nvme_submit_sync_cmd in the caller instead of multiplexing the return value and the number of zones reported into a single return value. Fixes: 240e6ee272c0 ("nvme: support for zoned namespaces") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-09-27nvmet-fc: fix missing check for no hostport structJames Smart1-1/+1
A hostport port pointer is allowed to be NULL as it is not allocated if the lldd does not support the new interfaces for NVME LS request support. The hostport free routine validates the handle but forgot to validate the hostport pointer. Validate the hostport pointer before using it to validate the handle. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet: add passthru ZNS supportChaitanya Kulkarni1-0/+16
In the default passthru implementation NVMeOF target passthru ctrl is not capable of handling Zoned Namespaces (ZNS). Update the nvmet_parse_pasthru_admin_cmd() to allow NVME_ID_CNS_CS_CTRL/NVME_CIS_ZNS and NVME_ID_CNS_CS_NS/NVME_CIS_ZNS. With this addition NVMeOF Passthru allows Zoned Namespaces. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet: handle keep-alive timer when kato is modified by a set features cmdAmit Engel3-2/+6
A user may modify the kato by a set features cmd. To properly deal with races or a kato value of 0 (no keep alive enabled) change nvmet_set_feat_kato to first disable the timer, then set the value and then re-enable the timer. Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvmet-tcp: have queue io_work context run on sock incoming cpuMark Wunderlich1-11/+10
No real good need to spread queues artificially. Usually the target will serve multiple hosts, and it's better to run on the socket incoming cpu for better affinitization rather than spread queues on all online cpus. We rely on RSS to spread the work around sufficiently. Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme-pci: Move enumeration by class to be last in the tableAndy Shevchenko1-1/+2
It's unusual that we have enumeration by class in the middle of the table. It might potentially be problematic in the future if we add another entry after it. So, move class matching entry to be the last in the ID table. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme: use an xarray to lookup the Commands Supported and Effects logChaitanya Kulkarni2-28/+5
When using linked list we have to open code the locking, search, and destroy operations with the loops even if data structure doesn't fall into the fast path. One of the main advantage of having XArray to store, search, and remove items is that it handles all the locking by itself, avoids the loops when using linked lists, provides clear API to replace the linked list's search and destroy loops. This patch replaces the ctrl->cel list with XArray and removes :- a. Extra code needed for the linked list for ctrl->cel item management such as nvme_find_cel(). b. Destroy loop in the nvme_free_ctrl(). c. Explicit insertion locking in the nvme_get_effects_log(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-27nvme: lift the file open code from nvme_ctrl_get_by_pathChaitanya Kulkarni3-32/+22
Lift opening the file open/close code from nvme_ctrl_get_by_path into the caller, just keeping a simple nvme_ctrl_from_file() helper. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: refactored a bit, split the bug fixes into a separate prep patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2020-09-24md/raid10: improve discard request for far layoutXiao Ni2-23/+64
For far layout, the discard region is not continuous on disks. So it needs far copies r10bio to cover all regions. It needs a way to know all r10bios have finish or not. Similar with raid10_sync_request, only the first r10bio master_bio records the discard bio. Other r10bios master_bio record the first r10bio. The first r10bio can finish after other r10bios finish and then return the discard bio. Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>