summaryrefslogtreecommitdiffstats
path: root/drivers/md/dm.c
AgeCommit message (Collapse)AuthorFilesLines
2021-06-04dm: introduce zone append emulationDamien Le Moal1-15/+23
For zoned targets that cannot support zone append operations, implement an emulation using regular write operations. If the original BIO submitted by the user is a zone append operation, change its clone into a regular write operation directed at the target zone write pointer position. To do so, an array of write pointer offsets (write pointer position relative to the start of a zone) is added to struct mapped_device. All operations that modify a sequential zone write pointer (writes, zone reset, zone finish and zone append) are intersepted in __map_bio() and processed using the new functions dm_zone_map_bio(). Detection of the target ability to natively support zone append operations is done from dm_table_set_restrictions() by calling the function dm_set_zones_restrictions(). A target that does not support zone append operation, either by explicitly declaring it using the new struct dm_target field zone_append_not_supported, or because the device table contains a non-zoned device, has its mapped device marked with the new flag DMF_ZONE_APPEND_EMULATED. The helper function dm_emulate_zone_append() is introduced to test a mapped device for this new flag. Atomicity of the zones write pointer tracking and updates is done using a zone write locking mechanism based on a bitmap. This is similar to the block layer method but based on BIOs rather than struct request. A zone write lock is taken in dm_zone_map_bio() for any clone BIO with an operation type that changes the BIO target zone write pointer position. The zone write lock is released if the clone BIO is failed before submission or when dm_zone_endio() is called when the clone BIO completes. The zone write lock bitmap of the mapped device, together with a bitmap indicating zone types (conv_zones_bitmap) and the write pointer offset array (zwp_offset) are allocated and initialized with a full device zone report in dm_set_zones_restrictions() using the function dm_revalidate_zones(). For failed operations that may have modified a zone write pointer, the zone write pointer offset is marked as invalid in dm_zone_endio(). Zones with an invalid write pointer offset are checked and the write pointer updated using an internal report zone operation when the faulty zone is accessed again by the user. All functions added for this emulation have a minimal overhead for zoned targets natively supporting zone append operations. Regular device targets are also not affected. The added code also does not impact builds with CONFIG_BLK_DEV_ZONED disabled by stubbing out all dm zone related functions. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-06-04dm: rearrange core declarations for extended use from dm-zone.cDamien Le Moal1-52/+7
Move the definitions of struct dm_target_io, struct dm_io and the bits of the flags field of struct mapped_device from dm.c to dm-core.h to make them usable from dm-zone.c. For the same reason, declare dec_pending() in dm-core.h after renaming it to dm_io_dec_pending(). And for symmetry of the function names, introduce the inline helper dm_io_inc_pending() instead of directly using atomic_inc() calls. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-06-04dm: Forbid requeue of writes to zonesDamien Le Moal1-6/+19
A target map method requesting the requeue of a bio with DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause unaligned write errors if the bio is a write operation targeting a sequential zone. If a zoned target request such a requeue, warn about it and kill the IO. The function dm_is_zone_write() is introduced to detect write operations to zoned targets. This change does not affect the target drivers supporting zoned devices and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey as none of these targets ever request a requeue. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-06-04dm: move zone related code to dm-zone.cDamien Le Moal1-78/+0
Move core and table code used for zoned targets and conditionally defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c. This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED. The small helper dm_set_zones_restrictions() is introduced to initialize a mapped device request queue zone attributes in dm_table_set_restrictions(). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-06-04dm: Fix dm_accept_partial_bio() relative to zone management commandsDamien Le Moal1-2/+6
Fix dm_accept_partial_bio() to actually check that zone management commands are not passed as explained in the function documentation comment. Also, since a zone append operation cannot be split, add REQ_OP_ZONE_APPEND as a forbidden command. White lines are added around the group of BUG_ON() calls to make the code more legible. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-03-26dm: unexport dm_{get,put}_table_deviceChristoph Hellwig1-2/+0
These are only used by DM core, DM target modules should only use dm_{get,put}_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-03-26dm: remove useless loop in __split_and_process_bioMikulas Patocka1-32/+29
Remove useless "while" loop. If the condition ci.sector_count && !error is true, we go to a branch that ends with "break". If this condition is false, the "while" loop will not be executed again. So, the loop can't be executed more than once. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-03-22dm: don't report "detected capacity change" on device creationMikulas Patocka1-1/+4
When a DM device is first created it doesn't yet have an established capacity, therefore the use of set_capacity_and_notify() should be conditional given the potential for needless pr_info "detected capacity change" noise even if capacity is 0. One could argue that the pr_info() in set_capacity_and_notify() is misplaced, but that position is not held uniformly. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: f64d9b2eacb9 ("dm: use set_capacity_and_notify") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-11dm: fix deadlock when swapping to encrypted deviceMikulas Patocka1-0/+60
The system would deadlock when swapping to a dm-crypt device. The reason is that for each incoming write bio, dm-crypt allocates memory that holds encrypted data. These excessive allocations exhaust all the memory and the result is either deadlock or OOM trigger. This patch limits the number of in-flight swap bios, so that the memory consumed by dm-crypt is limited. The limit is enforced if the target set the "limit_swap_bios" variable and if the bio has REQ_SWAP set. Non-swap bios are not affected becuase taking the semaphore would cause performance degradation. This is similar to request-based drivers - they will also block when the number of requests is over the limit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-11dm: add support for passing through inline crypto supportSatya Tangirala1-1/+17
Update the device-mapper core to support exposing the inline crypto support of the underlying device(s) through the device-mapper device. This works by creating a "passthrough keyslot manager" for the dm device, which declares support for encryption settings which all underlying devices support. When a supported setting is used, the bio cloning code handles cloning the crypto context to the bios for all the underlying devices. When an unsupported setting is used, the blk-crypto fallback is used as usual. Crypto support on each underlying device is ignored unless the corresponding dm target opts into exposing it. This is needed because for inline crypto to semantically operate on the original bio, the data must not be transformed by the dm target. Thus, targets like dm-linear can expose crypto support of the underlying device, but targets like dm-crypt can't. (dm-crypt could use inline crypto itself, though.) A DM device's table can only be changed if the "new" inline encryption capabilities are a (*not* necessarily strict) superset of the "old" inline encryption capabilities. Attempts to make changes to the table that result in some inline encryption capability becoming no longer supported will be rejected. For the sake of clarity, key eviction from underlying devices will be handled in a future patch. Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-09dm table: fix DAX iterate_devices based device capability checksJeffle Xu1-1/+1
Fix dm_table_supports_dax() and invert logic of both iterate_devices_callout_fn so that all devices' DAX capabilities are properly checked. Fixes: 545ed20e6df6 ("dm: add infrastructure for DAX support") Cc: stable@vger.kernel.org Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03dm: cleanup of front padding calculationJeffle Xu1-6/+10
Add two helper macros calculating the offset of bio in struct dm_io and struct dm_target_io respectively. Besides, simplify the front padding calculation in dm_alloc_md_mempools(). Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-24block: store a block_device pointer in struct bioChristoph Hellwig1-7/+7
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-08dm: eliminate potential source of excessive kernel log noiseMike Snitzer1-1/+1
There wasn't ever a real need to log an error in the kernel log for ioctls issued with insufficient permissions. Simply return an error and if an admin/user is sufficiently motivated they can enable DM's dynamic debugging to see an explanation for why the ioctls were disallowed. Reported-by: Nir Soffer <nsoffer@redhat.com> Fixes: e980f62353c6 ("dm: don't allow ioctls to targets that don't map to whole devices") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-22Merge tag 'for-5.11/dm-changes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM verity support for signature verification with 2nd keyring - Fix DM verity to skip verity work if IO completes with error while system is shutting down - Add new DM multipath "IO affinity" path selector that maps IO destined to a given path to a specific CPU based on user provided mapping - Rename DM multipath path selector source files to have "dm-ps" prefix - Add REQ_NOWAIT support to some other simple DM targets that don't block in more elaborate ways waiting for IO - Export DM crypt's kcryptd workqueue via sysfs (WQ_SYSFS) - Fix error return code in DM's target_message() if empty message is received - A handful of other small cleanups * tag 'for-5.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: simplify the return expression of load_mapping() dm ebs: avoid double unlikely() notation when using IS_ERR() dm verity: skip verity work if I/O error when system is shutting down dm crypt: export sysfs of kcryptd workqueue dm ioctl: fix error return code in target_message dm crypt: Constify static crypt_iv_operations dm: add support for REQ_NOWAIT to various targets dm: rename multipath path selector source files to have "dm-ps" prefix dm mpath: add IO affinity path selector dm verity: Add support for signature verification with 2nd keyring dm: remove unnecessary current->bio_list check when submitting split bio
2020-12-16Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds1-40/+18
Pull block updates from Jens Axboe: "Another series of killing more code than what is being added, again thanks to Christoph's relentless cleanups and tech debt tackling. This contains: - blk-iocost improvements (Baolin Wang) - part0 iostat fix (Jeffle Xu) - Disable iopoll for split bios (Jeffle Xu) - block tracepoint cleanups (Christoph Hellwig) - Merging of struct block_device and hd_struct (Christoph Hellwig) - Rework/cleanup of how block device sizes are updated (Christoph Hellwig) - Simplification of gendisk lookup and removal of block device aliasing (Christoph Hellwig) - Block device ioctl cleanups (Christoph Hellwig) - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig) - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig) - sbitmap improvements (Pavel Begunkov) - Hybrid polling fix (Pavel Begunkov) - bvec iteration improvements (Pavel Begunkov) - Zone revalidation fixes (Damien Le Moal) - blk-throttle limit fix (Yu Kuai) - Various little fixes" * tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits) blk-mq: fix msec comment from micro to milli seconds blk-mq: update arg in comment of blk_mq_map_queue blk-mq: add helper allocating tagset->tags Revert "block: Fix a lockdep complaint triggered by request queue flushing" nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class blk-mq: add new API of blk_mq_hctx_set_fq_lock_class block: disable iopoll for split bio block: Improve blk_revalidate_disk_zones() checks sbitmap: simplify wrap check sbitmap: replace CAS with atomic and sbitmap: remove swap_lock sbitmap: optimise sbitmap_deferred_clear() blk-mq: skip hybrid polling if iopoll doesn't spin blk-iocost: Factor out the base vrate change into a separate function blk-iocost: Factor out the active iocgs' state check into a separate function blk-iocost: Move the usage ratio calculation to the correct place blk-iocost: Remove unnecessary advance declaration blk-iocost: Fix some typos in comments blktrace: fix up a kerneldoc comment block: remove the request_queue to argument request based tracepoints ...
2020-12-04dm: remove unnecessary current->bio_list check when submitting split bioJeffle Xu1-1/+1
The depth-first splitting is introduced in commit 18a25da84354 ("dm: ensure bio submission follows a depth-first tree walk"), which is used to fix the potential deadlock in case of the misordering handling of bios caused by bio_list. There're two paths submitting split bios, dm_wq_work() from worker thread and submit_bio() from application. Back upon that time, dm_wq_work() thread calls __split_and_process_bio() directly and thus will not trigger this issue since bio_list doesn't exist here. So this issue will only be triggered from application calling submit_bio(), and the fix has to check if current->bio_list is non-NULL to distinguish this case. However since commit 0c2915b8c6db1 ("dm: fix missing imposition of queue_limits from dm_wq_work() thread"), dm_wq_work() thread calls submit_bio_noacct() and thus also uses bio_list. Since then all entries into __split_and_process_bio() are under protection of bio_list, and thus the checking of current->bio_list when determinning if the depth-first principle should be used, seems kind of nonsense. After all the checking always succeeds now. Fixes: 0c2915b8c6db1 ("dm: fix missing imposition of queue_limits from dm_wq_work() thread") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04dm: remove invalid sparse __acquires and __releases annotationsMike Snitzer1-2/+0
Fixes sparse warnings: drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit Fixes: 971888c46993f ("dm: hold DM table for duration of ioctl rather than use blkdev_get") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04dm: fix double RCU unlock in dm_dax_zero_page_range() error pathMike Snitzer1-2/+0
Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error path to fix sparse warning: drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock Fixes: cdf6cdcd3b99a ("dm,dax: Add dax zero_page_range operation") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04dm: fix IO splittingMike Snitzer1-8/+11
Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") caused a couple regressions: 1) Using lcm_not_zero() when stacking chunk_sectors was a bug because chunk_sectors must reflect the most limited of all devices in the IO stack. 2) DM targets that set max_io_len but that do _not_ provide an .iterate_devices method no longer had there IO split properly. And commit 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") also caused a regression where DM no longer supported varied (per target) IO splitting. The implication being the potential for severely reduced performance for IO stacks that use a DM target like dm-cache to hide performance limitations of a slower device (e.g. one that requires 4K IO splitting). Coming full circle: Fix all these issues by discontinuing stacking chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(), add optional chunk_sectors override argument to blk_max_size_offset() and update DM's max_io_len() to pass ti->max_io_len to its blk_max_size_offset() call. Passing in an optional chunk_sectors override to blk_max_size_offset() allows for code reuse of block's centralized calculation for max IO size based on provided offset and split boundary. Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") Cc: stable@vger.kernel.org Reported-by: John Dorminy <jdorminy@redhat.com> Reported-by: Bruce Johnston <bjohnsto@redhat.com> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: John Dorminy <jdorminy@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: remove the request_queue argument to the block_bio_remap tracepointChristoph Hellwig1-2/+1
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: remove the request_queue argument to the block_split tracepointChristoph Hellwig1-1/+1
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: stop using bdget_disk for partition 0Christoph Hellwig1-14/+2
We can just dereference the point in struct gendisk instead. Also remove the now unused export. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: switch partition lookup to use struct block_deviceChristoph Hellwig1-2/+2
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: allocate struct hd_struct as part of struct bdev_inodeChristoph Hellwig1-2/+2
Allocate hd_struct together with struct block_device to pre-load the lifetime rule changes in preparation of merging the two structures. Note that part0 was previously embedded into struct gendisk, but is a separate allocation now, and already points to the block_device instead of the hd_struct. The lifetime of struct gendisk is still controlled by the struct device embedded in the part0 hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01dm: remove the block_device reference in struct mapped_deviceChristoph Hellwig1-11/+14
Get rid of the long-lasting struct block_device reference in struct mapped_device. The only remaining user is the freeze code, where we can trivially look up the block device at freeze time and release the reference at thaw time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01dm: simplify flush_bio initialization in __send_empty_flushChristoph Hellwig1-9/+3
We don't really need the struct block_device to initialize a bio. So switch from using bio_set_dev to manually setting up bi_disk (bi_partno will always be zero and has been cleared by bio_init already). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01fs: simplify freeze_bdev/thaw_bdevChristoph Hellwig1-14/+6
Store the frozen superblock in struct block_device to avoid the awkward interface that can return a sb only used a cookie, an ERR_PTR or NULL. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01dm: fix bug with RCU locking in dm_blk_report_zonesSergei Shtepa1-2/+4
The dm_get_live_table() function makes RCU read lock so dm_put_live_table() must be called even if dm_table map is not found. Fixes: e76239a3748c9 ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-11-16dm: use set_capacity_and_notifyChristoph Hellwig1-2/+1
Use set_capacity_and_notify to set the size of both the disk and block device. This also gets the uevent notifications for the resize for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16block: remove __blkdev_driver_ioctlChristoph Hellwig1-1/+4
Just open code it in the few callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-14Merge tag 'for-5.10/dm-changes' of ↵Linus Torvalds1-272/+132
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Improve DM core's bio splitting to use blk_max_size_offset(). Also fix bio splitting for bios that were deferred to the worker thread due to a DM device being suspended. - Remove DM core's special handling of NVMe devices now that block core has internalized efficiencies drivers previously needed to be concerned about (via now removed direct_make_request). - Fix request-based DM to not bounce through indirect dm_submit_bio; instead have block core make direct call to blk_mq_submit_bio(). - Various DM core cleanups to simplify and improve code. - Update DM cryot to not use drivers that set CRYPTO_ALG_ALLOCATES_MEMORY. - Fix DM raid's raid1 and raid10 discard limits for the purposes of linux-stable. But then remove DM raid's discard limits settings now that MD raid can efficiently handle large discards. - A couple small cleanups across various targets. * tag 'for-5.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: fix request-based DM to not bounce through indirect dm_submit_bio dm: remove special-casing of bio-based immutable singleton target on NVMe dm: export dm_copy_name_and_uuid dm: fix comment in __dm_suspend() dm: fold dm_process_bio() into dm_submit_bio() dm: fix missing imposition of queue_limits from dm_wq_work() thread dm snap persistent: simplify area_io() dm thin metadata: Remove unused local variable when create thin and snap dm raid: remove unnecessary discard limits for raid10 dm raid: fix discard limits for raid1 and raid10 dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY dm: use dm_table_get_device_name() where appropriate in targets dm table: make 'struct dm_table' definition accessible to all of DM core dm: eliminate need for start_io_acct() forward declaration dm: simplify __process_abnormal_io() dm: push use of on-stack flush_bio down to __send_empty_flush() dm: optimize max_io_len() by inlining max_io_len_target_boundary() dm: push md->immutable_target optimization down to __process_bio() dm: change max_io_len() to use blk_max_size_offset() dm table: stack 'chunk_sectors' limit to account for target-specific splitting
2020-10-13Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds1-17/+9
Pull block updates from Jens Axboe: - Series of merge handling cleanups (Baolin, Christoph) - Series of blk-throttle fixes and cleanups (Baolin) - Series cleaning up BDI, seperating the block device from the backing_dev_info (Christoph) - Removal of bdget() as a generic API (Christoph) - Removal of blkdev_get() as a generic API (Christoph) - Cleanup of is-partition checks (Christoph) - Series reworking disk revalidation (Christoph) - Series cleaning up bio flags (Christoph) - bio crypt fixes (Eric) - IO stats inflight tweak (Gabriel) - blk-mq tags fixes (Hannes) - Buffer invalidation fixes (Jan) - Allow soft limits for zone append (Johannes) - Shared tag set improvements (John, Kashyap) - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel) - DM no-wait support (Mike, Konstantin) - Request allocation improvements (Ming) - Allow md/dm/bcache to use IO stat helpers (Song) - Series improving blk-iocost (Tejun) - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun) * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits) block: fix uapi blkzoned.h comments blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue blk-mq: get rid of the dead flush handle code path block: get rid of unnecessary local variable block: fix comment and add lockdep assert blk-mq: use helper function to test hw stopped block: use helper function to test queue register block: remove redundant mq check block: invoke blk_mq_exit_sched no matter whether have .exit_sched percpu_ref: don't refer to ref->data if it isn't allocated block: ratelimit handle_bad_sector() message blk-throttle: Re-use the throtl_set_slice_end() blk-throttle: Open code __throtl_de/enqueue_tg() blk-throttle: Move service tree validation out of the throtl_rb_first() blk-throttle: Move the list operation after list validation blk-throttle: Fix IO hang for a corner case blk-throttle: Avoid tracking latency if low limit is invalid blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() block: Remove redundant 'return' statement ...
2020-10-07dm: fix request-based DM to not bounce through indirect dm_submit_bioMike Snitzer1-13/+12
It is unnecessary to force request-based DM to call into bio-based dm_submit_bio (via indirect disk->fops->submit_bio) only to have it then call blk_mq_submit_bio(). Fix this by establishing a request-based DM block_device_operations (dm_rq_blk_dops, which doesn't have .submit_bio) and update dm_setup_md_queue() to set md->disk->fops to it for DM_TYPE_REQUEST_BASED. Remove DM_TYPE_REQUEST_BASED conditional in dm_submit_bio and unexport blk_mq_submit_bio. Fixes: c62b37d96b6eb ("block: move ->make_request_fn to struct block_device_operations") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-10-07dm: remove special-casing of bio-based immutable singleton target on NVMeMike Snitzer1-50/+5
Since commit 5a6c35f9af416 ("block: remove direct_make_request") there is no benefit to DM special-casing NVMe. Remove all code used to establish DM_TYPE_NVME_BIO_BASED. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-10-05block: make bio_crypt_clone() able to failEric Biggers1-3/+4
bio_crypt_clone() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. However, bio_crypt_clone() might be called with GFP_ATOMIC via setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via kcryptd_io_read() in drivers/md/dm-crypt.c. Neither case is currently reachable with a bio that actually has an encryption context. However, it's fragile to rely on this. Just make bio_crypt_clone() able to fail, analogous to bio_integrity_clone(). Reported-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Satya Tangirala <satyat@google.com> Cc: Satya Tangirala <satyat@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01dm: fix comment in __dm_suspend()Mike Snitzer1-5/+4
Fix stale references to functions that have been renamed and fix typo. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-10-01dm: fold dm_process_bio() into dm_submit_bio()Mike Snitzer1-30/+22
dm_process_bio() is only called by dm_submit_bio(), there is no benefit to keeping dm_process_bio() factored out, so fold it. While at it, cleanup dm_submit_bio()'s DMF_BLOCK_IO_FOR_SUSPEND related branching and expand scope of dm_get_live_table() rcu reference on map via common 'out' label to dm_put_live_table(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-30dm: fix missing imposition of queue_limits from dm_wq_work() threadMike Snitzer1-25/+9
If a DM device was suspended when bios were issued to it, those bios would be deferred using queue_io(). Once the DM device was resumed dm_process_bio() could be called by dm_wq_work() for original bio that still needs splitting. dm_process_bio()'s check for current->bio_list (meaning call chain is within ->submit_bio) as a prerequisite for calling blk_queue_split() for "abnormal IO" would result in dm_process_bio() never imposing corresponding queue_limits (e.g. discard_granularity, discard_max_bytes, etc). Fix this by always having dm_wq_work() resubmit deferred bios using submit_bio_noacct(). Side-effect is blk_queue_split() is always called for "abnormal IO" from ->submit_bio, be it from application thread or dm_wq_work() workqueue, so proper bio splitting and depth-first bio submission is performed. For sake of clarity, remove current->bio_list check before call to blk_queue_split(). Also, remove dm_wq_work()'s use of dm_{get,put}_live_table() -- no longer needed since IO will be reissued in terms of ->submit_bio. And rename bio variable from 'c' to 'bio'. Fixes: cf9c37865557 ("dm: fix comment in dm_process_bio()") Reported-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm table: make 'struct dm_table' definition accessible to all of DM coreMike Snitzer1-19/+4
Move 'struct dm_table' definition from dm-table.c to dm-core.h and update DM core to access its members directly. Helps optimize max_io_len() and other methods slightly. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm: eliminate need for start_io_acct() forward declarationMike Snitzer1-40/+38
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm: simplify __process_abnormal_io()Mike Snitzer1-51/+17
Only call bio_op() once in switch statement. Also remove the excessive factoring out to one line functions. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm: push use of on-stack flush_bio down to __send_empty_flush()Mike Snitzer1-24/+13
Eliminates duplicate code, no functional change. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm: optimize max_io_len() by inlining max_io_len_target_boundary()Mike Snitzer1-9/+10
Saves redundant dm_target_offset() math. Also, reverse argument order for max_io_len() to be consistent with other similar functions. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm: push md->immutable_target optimization down to __process_bio()Mike Snitzer1-13/+9
Also, update associated stale comment in __bind(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm: change max_io_len() to use blk_max_size_offset()Mike Snitzer1-12/+8
Using blk_max_size_offset() enables DM core's splitting to impose ti->max_io_len (via q->limits.chunk_sectors) and also fallback to respecting q->limits.max_sectors if chunk_sectors isn't set. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29Merge remote-tracking branch 'jens/for-5.10/block' into dm-5.10Mike Snitzer1-14/+5
DM depends on these block 5.10 commits: 22ada802ede8 block: use lcm_not_zero() when stacking chunk_sectors 07d098e6bbad block: allow 'chunk_sectors' to be non-power-of-2 021a24460dc2 block: add QUEUE_FLAG_NOWAIT 6abc49468eea dm: add support for REQ_NOWAIT and enable it for linear target Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-25dm: add support for REQ_NOWAIT and enable it for linear targetKonstantin Khlebnikov1-1/+3
Add DM target feature flag DM_TARGET_NOWAIT which advertises that target works with REQ_NOWAIT bios. Add dm_table_supports_nowait() and update dm_table_set_restrictions() to set/clear QUEUE_FLAG_NOWAIT accordingly. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21dm: fix comment in dm_process_bio()Mike Snitzer1-1/+3
Refer to the correct function (->submit_bio instead of ->queue_bio). Also, add details about why using blk_queue_split() isn't needed for dm_wq_work()'s call to dm_process_bio(). Fixes: c62b37d96b6eb ("block: move ->make_request_fn to struct block_device_operations") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-21dm: fix bio splitting and its bio completion order for regular IOMike Snitzer1-21/+2
dm_queue_split() is removed because __split_and_process_bio() _must_ handle splitting bios to ensure proper bio submission and completion ordering as a bio is split. Otherwise, multiple recursive calls to ->submit_bio will cause multiple split bios to be allocated from the same ->bio_split mempool at the same time. This would result in deadlock in low memory conditions because no progress could be made (only one bio is available in ->bio_split mempool). This fix has been verified to still fix the loss of performance, due to excess splitting, that commit 120c9257f5f1 provided. Fixes: 120c9257f5f1 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"") Cc: stable@vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>