summaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-table.c
AgeCommit message (Collapse)AuthorFilesLines
2020-12-04dm: fix IO splittingMike Snitzer1-5/+0
Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") caused a couple regressions: 1) Using lcm_not_zero() when stacking chunk_sectors was a bug because chunk_sectors must reflect the most limited of all devices in the IO stack. 2) DM targets that set max_io_len but that do _not_ provide an .iterate_devices method no longer had there IO split properly. And commit 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") also caused a regression where DM no longer supported varied (per target) IO splitting. The implication being the potential for severely reduced performance for IO stacks that use a DM target like dm-cache to hide performance limitations of a slower device (e.g. one that requires 4K IO splitting). Coming full circle: Fix all these issues by discontinuing stacking chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(), add optional chunk_sectors override argument to blk_max_size_offset() and update DM's max_io_len() to pass ti->max_io_len to its blk_max_size_offset() call. Passing in an optional chunk_sectors override to blk_max_size_offset() allows for code reuse of block's centralized calculation for max IO size based on provided offset and split boundary. Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") Cc: stable@vger.kernel.org Reported-by: John Dorminy <jdorminy@redhat.com> Reported-by: Bruce Johnston <bjohnsto@redhat.com> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: John Dorminy <jdorminy@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk>
2020-12-01dm table: Remove BUG_ON(in_interrupt())Thomas Gleixner1-6/+0
The BUG_ON(in_interrupt()) in dm_table_event() is a historic leftover from a rework of the dm table code which changed the calling context. Issuing a BUG for a wrong calling context is frowned upon and in_interrupt() is deprecated and only covering parts of the wrong contexts. The sanity check for the context is covered by CONFIG_DEBUG_ATOMIC_SLEEP and other debug facilities already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-10-07dm: remove special-casing of bio-based immutable singleton target on NVMeMike Snitzer1-30/+2
Since commit 5a6c35f9af416 ("block: remove direct_make_request") there is no benefit to DM special-casing NVMe. Remove all code used to establish DM_TYPE_NVME_BIO_BASED. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm table: make 'struct dm_table' definition accessible to all of DM coreMike Snitzer1-45/+2
Move 'struct dm_table' definition from dm-table.c to dm-core.h and update DM core to access its members directly. Helps optimize max_io_len() and other methods slightly. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29dm table: stack 'chunk_sectors' limit to account for target-specific splittingMike Snitzer1-0/+5
If target set ti->max_io_len it must be used when stacking DM device's queue_limits to establish a 'chunk_sectors' that is compatible with the IO stack. By using lcm_not_zero() care is taken to avoid blindly overriding the chunk_sectors limit stacked up by blk_stack_limits(). Depends-on: 07d098e6bbad ("block: allow 'chunk_sectors' to be non-power-of-2") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-29Merge remote-tracking branch 'jens/for-5.10/block' into dm-5.10Mike Snitzer1-6/+37
DM depends on these block 5.10 commits: 22ada802ede8 block: use lcm_not_zero() when stacking chunk_sectors 07d098e6bbad block: allow 'chunk_sectors' to be non-power-of-2 021a24460dc2 block: add QUEUE_FLAG_NOWAIT 6abc49468eea dm: add support for REQ_NOWAIT and enable it for linear target Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-25dm: add support for REQ_NOWAIT and enable it for linear targetKonstantin Khlebnikov1-0/+32
Add DM target feature flag DM_TARGET_NOWAIT which advertises that target works with REQ_NOWAIT bios. Add dm_table_supports_nowait() and update dm_table_set_restrictions() to set/clear QUEUE_FLAG_NOWAIT accordingly. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25block: add a bdev_is_partition helperChristoph Hellwig1-1/+1
Add a littler helper to make the somewhat arcane bd_contains checks a little more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flagChristoph Hellwig1-3/+3
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the backing_dev_info shared between the block drivers and the writeback code. To help untangling the dependency replace it with a queue flag and a superblock flag derived from it. This also helps with the case of e.g. a file system requiring stable writes due to its own checksumming, but not forcing it on other users of the block device like the swap code. One downside is that we an't support the stable_pages_required bdi attribute in sysfs anymore. It is replaced with a queue attribute which also is writable for easier testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-24block: lift setting the readahead size into the block layerChristoph Hellwig1-2/+1
Drivers shouldn't really mess with the readahead size, as that is a VM concept. Instead set it based on the optimal I/O size by lifting the algorithm from the md driver when registering the disk. Also set bdi->io_pages there as well by applying the same scheme based on max_sectors. To ensure the limits work well for stacking drivers a new helper is added to update the readahead limits from the block limits, which is also called from disk_stack_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-20dm: Call proper helper to determine dax supportJan Kara1-3/+7
DM was calling generic_fsdax_supported() to determine whether a device referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that they don't have to duplicate common generic checks. High level code should call dax_supported() helper which that calls into appropriate helper for the particular device. This problem manifested itself as kernel messages: dm-3: error: dax access failed (-95) when lvm2-testsuite run in cases where a DM device was stacked on top of another DM device. Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices") Cc: <stable@vger.kernel.org> Tested-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mike Snitzer <snitzer@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-08-05Merge tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-blockLinus Torvalds1-20/+2
Pull block stacking updates from Jens Axboe: "The stacking related fixes depended on both the core block and drivers branches, so here's a topic branch with that change. Outside of that, a late fix from Johannes for zone revalidation" * tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-block: block: don't do revalidate zones on invalid devices block: remove blk_queue_stack_limits block: remove bdev_stack_limits block: inherit the zoned characteristics in blk_stack_limits
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-07-20block: remove bdev_stack_limitsChristoph Hellwig1-1/+2
This function is just a tiny wrapper around blk_stack_limit and has two callers. Simplify the stack a bit by open coding it in the two callers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-20block: inherit the zoned characteristics in blk_stack_limitsChristoph Hellwig1-19/+0
Lift the code from device mapper into blk_stack_limits to inherity the stacking limitations. This ensures we do the right thing for all stacked zoned block devices. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-16treewide: Remove uninitialized_var() usageKees Cook1-1/+1
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08writeback: remove bdi->congested_fnChristoph Hellwig1-36/+1
Except for pktdvd, the only places setting congested bits are file systems that allocate their own backing_dev_info structures. And pktdvd is a deprecated driver that isn't useful in stack setup either. So remove the dead congested_fn stacking infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [axboe: fixup unused variables in bcache/request.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-25dm: remove the make_request_fn check in device_area_is_invalidChristoph Hellwig1-17/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: don't handle bio based drivers in blk_revalidate_disk_zonesChristoph Hellwig1-5/+7
bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-05dm table: do not allow request-based DM to stack on partitionsMike Snitzer1-19/+8
Partitioned request-based devices cannot be used as underlying devices for request-based DM because no partition offsets are added to each incoming request. As such, until now, stacking on partitioned devices would _always_ result in data corruption (e.g. wiping the partition table, writing to other partitions, etc). Fix this by disallowing request-based stacking on partitions. While at it, since all .request_fn support has been removed from block core, remove legacy dm-table code that differentiated between blk-mq and .request_fn request-based. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-23dm: make dm_table_find_target return NULLMikulas Patocka1-5/+3
Currently, if we pass too high sector number to dm_table_find_target, it returns zeroed dm_target structure and callers test if the structure is zeroed with the macro dm_target_is_valid. However, returning NULL is common practice to indicate errors. This patch refactors the dm code, so that dm_table_find_target returns NULL and its callers test the returned value for NULL. The macro dm_target_is_valid is deleted. In alloc_targets, we no longer allocate an extra zeroed target. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-23dm table: fix invalid memory accesses with too high sector numberMikulas Patocka1-1/+4
If the sector number is too high, dm_table_find_target() should return a pointer to a zeroed dm_target structure (the caller should test it with dm_target_is_valid). However, for some table sizes, the code in dm_table_find_target() that performs btree lookup will access out of bound memory structures. Fix this bug by testing the sector number at the beginning of dm_table_find_target(). Also, add an "inline" keyword to the function dm_table_get_size() because this is a hot path. Fixes: 512875bd9661 ("dm: table detect io beyond device") Cc: stable@vger.kernel.org Reported-by: Zhang Tao <kontais@zoho.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-30dm table: fix various whitespace issues with recent DAX codeMike Snitzer1-7/+7
Also, rename device_synchronous to device_dax_synchronous. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-30dm table: fix dax_dev NULL dereference in device_synchronous()Pankaj Gupta1-1/+1
If a device doesn't support DAX its 'dax_dev' is NULL. Fix device_synchronous() to first check if dax_dev is NULL before dereferencing it. Fixes: 2e9ee0955d3c ("dm: enable synchronous dax") Reported-by: jencce.kernel@gmail.com Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-18Merge tag 'libnvdimm-for-5.3' of ↵Linus Torvalds1-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Primarily just the virtio_pmem driver: - virtio_pmem The new virtio_pmem facility introduces a paravirtualized persistent memory device that allows a guest VM to use DAX mechanisms to access a host-file with host-page-cache. It arranges for MAP_SYNC to be disabled and instead triggers a host fsync() when a 'write-cache flush' command is sent to the virtual disk device. - Miscellaneous small fixups" * tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: virtio_pmem: fix sparse warning xfs: disable map_sync for async flush ext4: disable map_sync for async flush dax: check synchronous mapping is supported dm: enable synchronous dax libnvdimm: add dax_dev sync flag virtio-pmem: Add virtio pmem driver libnvdimm: nd_region flush callback support libnvdimm, namespace: Drop uuid_t implementation detail
2019-07-05dm: enable synchronous daxPankaj Gupta1-6/+18
This patch sets dax device 'DAXDEV_SYNC' flag if all the target devices of device mapper support synchrononous DAX. If device mapper consists of both synchronous and asynchronous dax devices, we don't set 'DAXDEV_SYNC' flag. 'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn' as argument so that the callers can pass the appropriate functions. Suggested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-06-25dm table: don't copy from a NULL pointer in realloc_argv()Jerome Marchand1-1/+1
For the first call to realloc_argv() in dm_split_args(), old_argv is NULL and size is zero. Then memcpy is called, with the NULL old_argv as the source argument and a zero size argument. AFAIK, this is undefined behavior and generates the following warning when compiled with UBSAN on ppc64le: In file included from ./arch/powerpc/include/asm/paca.h:19, from ./arch/powerpc/include/asm/current.h:16, from ./include/linux/sched.h:12, from ./include/linux/kthread.h:6, from drivers/md/dm-core.h:12, from drivers/md/dm-table.c:8: In function 'memcpy', inlined from 'realloc_argv' at drivers/md/dm-table.c:565:3, inlined from 'dm_split_args' at drivers/md/dm-table.c:588:9: ./include/linux/string.h:345:9: error: argument 2 null where non-null expected [-Werror=nonnull] return __builtin_memcpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/md/dm-table.c: In function 'dm_split_args': ./include/linux/string.h:345:9: note: in a call to built-in function '__builtin_memcpy' Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-05-20dax: Arrange for dax_supported check to span multiple devicesDan Williams1-6/+11
Pankaj reports that starting with commit ad428cdb525a "dax: Check the end of the block-device capacity with dax_direct_access()" device-mapper no longer allows dax operation. This results from the stricter checks in __bdev_dax_supported() that validate that the start and end of a block-device map to the same 'pagemap' instance. Teach the dax-core and device-mapper to validate the 'pagemap' on a per-target basis. This is accomplished by refactoring the bdev_dax_supported() internals into generic_fsdax_supported() which takes a sector range to validate. Consequently generic_fsdax_supported() is suitable to be used in a device-mapper ->iterate_devices() callback. A new ->dax_supported() operation is added to allow composite devices to split and route upper-level bdev_dax_supported() requests. Fixes: ad428cdb525a ("dax: Check the end of the block-device...") Cc: <stable@vger.kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-04-01dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errorsIlya Dryomov1-0/+39
Some devices don't use blk_integrity but still want stable pages because they do their own checksumming. Examples include rbd and iSCSI when data digests are negotiated. Stacking DM (and thus LVM) on top of these devices results in sporadic checksum errors. Set BDI_CAP_STABLE_WRITES if any underlying device has it set. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-15block: kill QUEUE_FLAG_NO_SG_MERGEMing Lei1-13/+0
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"), physical segment number is mainly figured out in blk_queue_split() for fast path, and the flag of BIO_SEG_VALID is set there too. Now only blk_recount_segments() and blk_recalc_rq_segments() use this flag. Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID is set in blk_queue_split(). For another user of blk_recalc_rq_segments(): - run in partial completion branch of blk_update_request, which is an unusual case - run in blk_cloned_rq_check_limits(), still not a big problem if the flag is killed since dm-rq is the only user. Multi-page bvec is enabled now, not doing S/G merging is rather pointless with the current setup of the I/O path, as it isn't going to save you a significant amount of cycles. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18dm: do not allow readahead to limit IO sizeJaegeuk Kim1-0/+3
Update DM to set the bdi's io_pages. This fixes reads to be capped at the device's max request size (even if user's read IO exceeds the established readahead setting). Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting") Cc: stable@vger.kernel.org Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-11-16block: add queue_is_mq() helperJens Axboe1-2/+2
Various spots check for q->mq_ops being non-NULL, but provide a helper to do this instead. Where the ->mq_ops != NULL check is redundant, remove it. Since mq == rq-based now that legacy is gone, get rid of the queue_is_rq_based() and just use queue_is_mq() everywhere. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-26Merge tag 'for-4.20/dm-changes' of ↵Linus Torvalds1-34/+12
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - The biggest change this cycle is to remove support for the legacy IO path (.request_fn) from request-based DM. Jens has already started preparing for complete removal of the legacy IO path in 4.21 but this earlier removal of support from DM has been coordinated with Jens (as evidenced by the commit being attributed to him). Making request-based DM exclussively blk-mq only cleans up that portion of DM core quite nicely. - Convert the thinp and zoned targets over to using refcount_t where applicable. - A couple fixes to the DM zoned target for refcounting and other races buried in the implementation of metadata block creation and use. - Small cleanups to remove redundant unlikely() around a couple WARN_ON_ONCE(). - Simplify how dm-ioctl copies from userspace, eliminating some potential for a malicious user trying to change the executed ioctl after its processing has begun. - Tweaked DM crypt target to use the DM device name when naming the various workqueues created for a particular DM crypt device (makes the N workqueues for a DM crypt device more easily understood and enhances user's accounting capabilities at a glance via "ps") - Small fixup to remove dead branch in DM writecache's memory_entry(). * tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm writecache: remove disabled code in memory_entry() dm zoned: fix various dmz_get_mblock() issues dm zoned: fix metadata block ref counting dm raid: avoid bitmap with raid4/5/6 journal device dm crypt: make workqueue names device-specific dm: add dm_table_device_name() dm ioctl: harden copy_params()'s copy_from_user() from malicious users dm: remove unnecessary unlikely() around WARN_ON_ONCE() dm zoned: target: use refcount_t for dm zoned reference counters dm thin: use refcount_t for thin_c reference counting dm table: require that request-based DM be layered on blk-mq devices dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED dm: remove legacy request-based IO path
2018-10-25block: Introduce blk_revalidate_disk_zones()Damien Le Moal1-0/+10
Drivers exposing zoned block devices have to initialize and maintain correctness (i.e. revalidate) of the device zone bitmaps attached to the device request queue (seq_zones_bitmap and seq_zones_wlock). To simplify coding this, introduce a generic helper function blk_revalidate_disk_zones() suitable for most (and likely all) cases. This new function always update the seq_zones_bitmap and seq_zones_wlock bitmaps as well as the queue nr_zones field when called for a disk using a request based queue. For a disk using a BIO based queue, only the number of zones is updated since these queues do not have schedulers and so do not need the zone bitmaps. With this change, the zone bitmap initialization code in sd_zbc.c can be replaced with a call to this function in sd_zbc_read_zones(), which is called from the disk revalidate block operation method. A call to blk_revalidate_disk_zones() is also added to the null_blk driver for devices created with the zoned mode enabled. Finally, to ensure that zoned devices created with dm-linear or dm-flakey expose the correct number of zones through sysfs, a call to blk_revalidate_disk_zones() is added to dm_table_set_restrictions(). The zone bitmaps allocated and initialized with blk_revalidate_disk_zones() are freed automatically from __blk_release_queue() using the block internal function blk_queue_free_zone_bitmaps(). Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-18dm: add dm_table_device_name()Michał Mirosław1-0/+6
Add a shortcut for dm_device_name(dm_table_get_md(t)). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-11dm table: require that request-based DM be layered on blk-mq devicesMike Snitzer1-1/+17
Now that request-based DM (multipath) is blk-mq only: this restriction is required while the legacy request-based IO path still exists. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-11dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASEDMike Snitzer1-6/+1
Now that request-based DM is only using blk-mq, there is no need to differentiate between legacy "rq" and new "mq". We're back to a single request-based DM -- and there was much rejoicing! Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-11dm: remove legacy request-based IO pathJens Axboe1-44/+5
dm supports both, and since we're killing off the legacy path in general, get rid of it in dm. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-28dm: prevent DAX mounts if not supportedRoss Zwisler1-3/+4
Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX flag is set on the device's request queue to decide whether or not the device supports filesystem DAX. Really we should be using bdev_dax_supported() like filesystems do at mount time. This performs other tests like checking to make sure the dax_direct_access() path works. We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if any of the underlying devices do not support DAX. This makes the handling of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags in dm_table_set_restrictions(). Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this will ensure that filesystems built upon DM devices will only be able to mount with DAX if all underlying devices also support DAX. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support") Cc: stable@vger.kernel.org Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook1-1/+1
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05overflow.h: Add allocation size calculation helpersKees Cook1-5/+5
In preparation for replacing unchecked overflows for memory allocations, this creates helpers for the 3 most common calculations: array_size(a, b): 2-dimensional array array3_size(a, b, c): 3-dimensional array struct_size(ptr, member, n): struct followed by n-many trailing members Each of these return SIZE_MAX on overflow instead of wrapping around. (Additionally renames a variable named "array_size" to avoid future collision.) Co-developed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-04-06Merge tag 'for-4.17/dm-changes' of ↵Linus Torvalds1-0/+31
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - DM core passthrough ioctl fix to retain reference to DM table, and that table's block devices, while issuing the ioctl to one of those block devices. - DM core passthrough ioctl fix to _not_ override the fmode_t used to issue the ioctl. Overriding by using the fmode_t that the block device was originally open with during DM table load is a liability. - Add DM core support for secure erase forwarding and update the DM linear and DM striped targets to support them. - A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write same, write zeroes) for targets that make use of the non-splitting IO variant (as is done for multipath or thinp when layered directly on NVMe). - Allow DM targets to return a payload in response to a DM message that they are sent. This is useful for DM targets that would like to provide statistics data in response to DM messages. - Update DM bufio to support non-power-of-2 block sizes. Numerous other related changes prepare the DM bufio code for this support. - Fix DM crypt to use a bounded amount of memory across the entire system. This is to avoid OOM that can otherwise occur in response to certain pathological IO workloads (e.g. discarding a large DM crypt device). - Add a 'check_at_most_once' feature to the DM verity target to allow verity to be used on mobile devices that have very limited resources. - Fix the DM integrity target to fail early if a keyed algorithm (e.g. HMAC) is to be used but the key isn't set. - Add non-power-of-2 support to the DM unstripe target. - Eliminate the use of a Variable Length Array in the DM stripe target. - Update the DM log-writes target to record metadata (REQ_META flag). - DM raid fixes for its nosync status and some variable range issues. * tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits) dm: remove fmode_t argument from .prepare_ioctl hook dm: hold DM table for duration of ioctl rather than use blkdev_get dm raid: fix parse_raid_params() variable range issue dm verity: make verity_for_io_block static dm verity: add 'check_at_most_once' option to only validate hashes once dm bufio: don't embed a bio in the dm_buffer structure dm bufio: support non-power-of-two block sizes dm bufio: use slab cache for dm_buffer structure allocations dm bufio: reorder fields in dm_buffer structure dm bufio: relax alignment constraint on slab cache dm bufio: remove code that merges slab caches dm bufio: get rid of slab cache name allocations dm bufio: move dm-bufio.h to include/linux/ dm bufio: delete outdated comment dm: add support for secure erase forwarding dm: backfill abnormal IO support to non-splitting IO submission dm raid: fix nosync status dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios() dm stripe: get rid of a Variable Length Array (VLA) dm log writes: record metadata flag for better flags record ...
2018-04-05Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-blockLinus Torvalds1-8/+8
Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
2018-04-03dm: add support for secure erase forwardingDenis Semakin1-0/+31
Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's data devices support secure erase. Also, add support for secure erase to both the linear and striped targets. Signed-off-by: Denis Semakin <d.semakin@omprussia.ru> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-03-08block: Use blk_queue_flag_*() in drivers instead of queue_flag_*()Bart Van Assche1-8/+8
This patch has been generated as follows: for verb in set_unlocked clear_unlocked set clear; do replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \ $(git grep -lw queue_flag_${verb} drivers block/bsg*) done Except for protecting all queue flag changes with the queue lock this patch does not change any functionality. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-06dm table: allow upgrade from bio-based to specialized bio-based variantMike Snitzer1-9/+5
In practice this is really only meaningful in the context of the DM multipath target (which uses dm_table_set_type() to set the type of device DM should create via its "queue_mode" option). So this change allows a DM multipath device with "queue_mode bio" to be upgraded from DM_TYPE_BIO_BASED to DM_TYPE_NVME_BIO_BASED -- iff the underlying device(s) are NVMe. DM_TYPE_NVME_BIO_BASED is just a DM core implementation detail that allows for NVMe-specific optimizations (e.g. use direct_make_request instead of generic_make_request). If in the future there is no benefit or need to distinguish NVMe vs not: then it will be removed. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-03-06dm table: fix "nvme" testMikulas Patocka1-1/+1
The strncmp function should compare 4 bytes. Fixes: 22c11858e8002 ("dm: introduce DM_TYPE_NVME_BIO_BASED") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-29dm table: fix NVMe bio-based dm_table_determine_type() validationMike Snitzer1-22/+35
The 'verify_rq_based:' code in dm_table_determine_type() was checking all devices in the DM table rather than only checking the data devices. Fix this by using the immutable target's iterate_devices method. Also, tweak the block of dm_table_determine_type() code that decides whether to upgrade from DM_TYPE_BIO_BASED to DM_TYPE_NVME_BIO_BASED so that it makes sure the immutable_target doesn't support require splitting IOs. These changes have been verified to allow a "thin-pool" target whose data device is an NVMe device to be upgraded to DM_TYPE_NVME_BIO_BASED. Using the thin-pool in NVMe bio-based mode was verified to pass all the device-mapper-test-suite's "thin-provisioning" tests. Also verified that request-based DM multipath (with queue_mode "rq" and "mq") works as expected using the 'mptest' harness. Fixes: 22c11858e ("dm: introduce DM_TYPE_NVME_BIO_BASED") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-20dm: introduce DM_TYPE_NVME_BIO_BASEDMike Snitzer1-6/+48
If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then all devices in the DM table do not support partial completions. Also, the table has a single immutable target that doesn't require DM core to split bios. This will enable adding NVMe optimizations to bio-based DM. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13dm: set QUEUE_FLAG_DAX accordingly in dm_table_set_restrictions()Mike Snitzer1-0/+2
Rather than having DAX support be unique by setting it based on table type in dm_setup_md_queue(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>