summaryrefslogtreecommitdiffstats
path: root/drivers/block/loop.c
AgeCommit message (Collapse)AuthorFilesLines
2021-02-28Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+3
Pull more block updates from Jens Axboe: "A few stragglers (and one due to me missing it originally), and fixes for changes in this merge window mostly. In particular: - blktrace cleanups (Chaitanya, Greg) - Kill dead blk_pm_* functions (Bart) - Fixes for the bio alloc changes (Christoph) - Fix for the partition changes (Christoph, Ming) - Fix for turning off iopoll with polled IO inflight (Jeffle) - nbd disconnect fix (Josef) - loop fsync error fix (Mauricio) - kyber update depth fix (Yang) - max_sectors alignment fix (Mikulas) - Add bio_max_segs helper (Matthew)" * tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits) block: Add bio_max_segs blktrace: fix documentation for blk_fill_rw() block: memory allocations in bounce_clone_bio must not fail block: remove the gfp_mask argument to bounce_clone_bio block: fix bounce_clone_bio for passthrough bios block-crypto-fallback: use a bio_set for splitting bios block: fix logging on capacity change blk-settings: align max_sectors on "logical_block_size" boundary block: reopen the device in blkdev_reread_part block: don't skip empty device in in disk_uevent blktrace: remove debugfs file dentries from struct blk_trace nbd: handle device refs for DESTROY_ON_DISCONNECT properly kyber: introduce kyber_depth_updated() loop: fix I/O error on fsync() in detached loop devices block: fix potential IO hang when turning off io_poll block: get rid of the trace rq insert wrapper blktrace: fix blk_rq_merge documentation blktrace: fix blk_rq_issue documentation blktrace: add blk_fill_rwbs documentation comment block: remove superfluous param in blk_fill_rwbs() ...
2021-02-27Merge branch 'work.misc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff pile - no common topic here" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: whack-a-mole: don't open-code iminor/imajor 9p: fix misuse of sscanf() in v9fs_stat2inode() audit_alloc_mark(): don't open-code ERR_CAST() fs/inode.c: make inode_init_always() initialize i_ino to 0 vfs: don't unnecessarily clone write access for writable fds
2021-02-23whack-a-mole: don't open-code iminor/imajorAl Viro1-1/+1
several instances creeped back into the tree... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-02-22loop: fix I/O error on fsync() in detached loop devicesMauricio Faria de Oliveira1-0/+3
There's an I/O error on fsync() in a detached loop device if it has been previously attached. The issue is write cache is enabled in the attach path in loop_configure() but it isn't disabled in the detach path; thus it remains enabled in the block device regardless of whether it is attached or not. Now fsync() can get an I/O request that will just be failed later in loop_queue_rq() as device's state is not 'Lo_bound'. So, disable write cache in the detach path. Do so based on the queue flag, not the loop device flag for read-only (used to enable) as the queue flag can be changed via sysfs even on read-only loop devices (e.g., losetup -r.) Test-case: # DEV=/dev/loop7 # IMG=/tmp/image # truncate --size 1M $IMG # losetup $DEV $IMG # losetup -d $DEV Before: # strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = -1 EIO (Input/output error) Warning: Error fsyncing/closing /dev/loop7: Input/output error [ 982.529929] blk_update_request: I/O error, dev loop7, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0 After: # strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = 0 Co-developed-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Tested-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26loop: scale loop device by introducing per device lockPavel Tatashin1-40/+53
Currently, loop device has only one global lock: loop_ctl_mutex. This becomes hot in scenarios where many loop devices are used. Scale it by introducing per-device lock: lo_mutex that protects modifications of all fields in struct loop_device. Keep loop_ctl_mutex to protect global data: loop_index_idr, loop_lookup, loop_add. The new lock ordering requirement is that loop_ctl_mutex must be taken before lo_mutex. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-16Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds1-6/+2
Pull block driver updates from Jens Axboe: "Nothing major in here: - NVMe pull request from Christoph: - nvmet passthrough improvements (Chaitanya Kulkarni) - fcloop error injection support (James Smart) - read-only support for zoned namespaces without Zone Append (Javier González) - improve some error message (Minwoo Im) - reject I/O to offline fabrics namespaces (Victor Gladkov) - PCI queue allocation cleanups (Niklas Schnelle) - remove an unused allocation in nvmet (Amit Engel) - a Kconfig spelling fix (Colin Ian King) - nvme_req_qid simplication (Baolin Wang) - MD pull request from Song: - Fix race condition in md_ioctl() (Dae R. Jeong) - Initialize read_slot properly for raid10 (Kevin Vigor) - Code cleanup (Pankaj Gupta) - md-cluster resync/reshape fix (Zhao Heming) - Move null_blk into its own directory (Damien Le Moal) - null_blk zone and discard improvements (Damien Le Moal) - bcache race fix (Dongsheng Yang) - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang, Lutz Pogrell, Md Haris Iqbal) - lightnvm NULL pointer deref fix (tangzhenhao) - sr in_interrupt() removal (Sebastian Andrzej Siewior) - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included as it made it easier for them to funnel the feature through the block driver tree. - Follow up fixes (Colin Ian King)" * tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits) block: drop dead assignments in loop_init() sr: Remove in_interrupt() usage in sr_init_command(). sr: Switch the sector size back to 2048 if sr_read_sector() changed it. cdrom: Reset sector_size back it is not 2048. drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c null_blk: Move driver into its own directory null_blk: Allow controlling max_hw_sectors limit null_blk: discard zones on reset null_blk: cleanup discard handling null_blk: Improve implicit zone close null_blk: improve zone locking block: Align max_hw_sectors to logical blocksize null_blk: Fail zone append to conventional zones null_blk: Fix zone size initialization bcache: fix race between setting bdev state to none and new write request direct to backing block/rnbd: fix a null pointer dereference on dev->blk_symlink_name block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name block/rnbd: call kobject_put in the failure path Documentation/ABI/rnbd-srv: add document for force_close block/rnbd-srv: close a mapped device from server side. ...
2020-12-12block: drop dead assignments in loop_init()Lukas Bulwahn1-6/+2
Commit 8410d38c2552 ("loop: use __register_blkdev to allocate devices on demand") simplified loop_init(); so computing the range of the block region is not required anymore and can be dropped. Drop dead assignments in loop_init(). As compilers will detect these unneeded assignments and optimize this, the resulting object code is identical before and after this change. No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove the nr_sects field in struct hd_structChristoph Hellwig1-1/+0
Now that the hd_struct always has a block device attached to it, there is no need for having two size field that just get out of sync. Additionally the field in hd_struct did not use proper serialization, possibly allowing for torn writes. By only using the block_device field this problem also gets fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: simplify the block device claiming interfaceChristoph Hellwig1-7/+5
Stop passing the whole device as a separate argument given that it can be trivially deducted and cleanup the !holder debug check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove ->bd_containsChristoph Hellwig1-1/+1
Now that each hd_struct has a reference to the corresponding block_device, there is no need for the bd_contains pointer. Add a bdev_whole() helper to look up the whole device block_device struture instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove i_bdevChristoph Hellwig1-5/+3
Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup special case). This means that we now don't need an extra reference in the inode and can generally simplify handling of struct block_device to keep the lookups contained in the core block layer code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01loop: do not call set_blocksizeChristoph Hellwig1-3/+0
set_blocksize is used by file systems to use their preferred buffer cache block size. Block drivers should not set it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16block: remove the update_bdev parameter to set_capacity_revalidate_and_notifyChristoph Hellwig1-1/+1
The update_bdev argument is always set to true, so remove it. Also rename the function to the slighly less verbose set_capacity_and_notify, as propagating the disk size to the block device isn't really revalidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16loop: let set_capacity_revalidate_and_notify update the bdev sizeChristoph Hellwig1-6/+2
There is no good reason to call revalidate_disk_size separately. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16loop: use __register_blkdev to allocate devices on demandChristoph Hellwig1-22/+8
Use the simpler mechanism attached to major_name to allocate a brd device when a currently unregistered minor is accessed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16loop: use set_disk_roChristoph Hellwig1-1/+1
Use set_disk_ro instead of set_device_ro to match all other block drivers and to ensure all partitions mirror the read-only flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-12loop: Fix occasional uevent dropPetr Vorel1-1/+2
Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify") causes an occasional drop of loop device uevent, which are no longer triggered in loop_set_size() but in a different part of code. Bug is reproducible with LTP test uevent01 [1]: i=0; while true; do i=$((i+1)); echo "== $i ==" lsmod |grep -q loop && rmmod -f loop ./uevent01 || break done Put back triggering through code called in loop_set_size(). Fix required to add yet another parameter to set_capacity_revalidate_and_notify(). [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c [hch: rebased on a different change to the prototype of set_capacity_revalidate_and_notify] Cc: stable@vger.kernel.org # v5.9 Fixes: 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify") Reported-by: <ltp@lists.linux.it> Signed-off-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01block: replace bd_set_size with bd_set_nr_sectorsChristoph Hellwig1-2/+2
Replace bd_set_size with a version that takes the number of sectors instead, as that fits most of the current and future callers much better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-28Merge tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+2
Pull block fixes from Jens Axboe: - nbd timeout fix (Hou) - device size fix for loop LOOP_CONFIGURE (Martijn) - MD pull from Song with raid5 stripe size fix (Yufen) * tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block: md/raid5: make sure stripe_size as power of two loop: Set correct device size when using LOOP_CONFIGURE nbd: restore default timeout when setting it to zero
2020-08-26loop: Set correct device size when using LOOP_CONFIGUREMartijn Coenen1-2/+2
The device size calculation was done before processing the loop configuration, which meant that the we set the size on the underlying block device incorrectly in case lo_offset/lo_sizelimit were set in the configuration. Delay computing the size until we've setup the device parameters correctly. Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl") Reported-by: Lennart Poettering <mzxreary@0pointer.de> Tested-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-24Merge tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-blockLinus Torvalds1-15/+18
Pull block fixes from Jens Axboe: - NVMe pull request from Sagi: - nvme completion rework from Christoph and Chao that mostly came from a bit of divergence of how we classify errors related to pathing/retry etc. - nvmet passthru fixes from Chaitanya - minor nvmet fixes from Amit and I - mpath round-robin path selection fix from Martin - ignore noiob for zoned devices from Keith - minor nvme-fc fix from Tianjia" - BFQ cgroup leak fix (Dmitry) - block layer MAINTAINERS addition (Geert) - fix null_blk FUA checking (Hou) - get_max_io_size() size fix (Keith) - fix block page_is_mergeable() for compound pages (Matthew) - discard granularity fixes (Ming) - IO scheduler ordering fix (Ming) - misc fixes * tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block: (31 commits) null_blk: fix passing of REQ_FUA flag in null_handle_rq nvmet: Disable keep-alive timer when kato is cleared to 0h nvme: redirect commands on dying queue nvme: just check the status code type in nvme_is_path_error nvme: refactor command completion nvme: rename and document nvme_end_request nvme: skip noiob for zoned devices nvme-pci: fix PRP pool size nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth nvme: Use spin_lock_irq() when taking the ctrl->lock nvmet: call blk_mq_free_request() directly nvmet: fix oops in pt cmd execution nvmet: add ns tear down label for pt-cmd handling nvme: multipath: round-robin: eliminate "fallback" variable nvme: multipath: round-robin: fix single non-optimized path case nvme-fc: Fix wrong return value in __nvme_fc_init_request() nvmet-passthru: Reject commands with non-sgl flags set nvmet: fix a memory leak blkcg: fix memleak for iolatency MAINTAINERS: Add missing header files to BLOCK LAYER section ...
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva1-2/+2
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-17block: loop: set discard granularity and alignment for block device backed loopMing Lei1-15/+18
In case of block device backend, if the backend supports write zeros, the loop device will set queue flag of QUEUE_FLAG_DISCARD. However, limits.discard_granularity isn't setup, and this way is wrong, see the following description in Documentation/ABI/testing/sysfs-block: A discard_granularity of 0 means that the device does not support discard functionality. Especially 9b15d109a6b2 ("block: improve discard bio alignment in __blkdev_issue_discard()") starts to take q->limits.discard_granularity for computing max discard sectors. And zero discard granularity may cause kernel oops, or fail discard request even though the loop queue claims discard support via QUEUE_FLAG_DISCARD. Fix the issue by setup discard granularity and alignment. Fixes: c52abf563049 ("loop: Better discard support for block devices") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Xiao Ni <xni@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Evan Green <evgreen@chromium.org> Cc: Gwendal Grignou <gwendal@chromium.org> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-11loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURELennart Poettering1-0/+2
When LOOP_CONFIGURE is used with LO_FLAGS_PARTSCAN we need to propagate this into the GENHD_FL_NO_PART_SCAN. LOOP_SETSTATUS does this, LOOP_CONFIGURE doesn't so far. Effect is that setting up a loopback device with partition scanning doesn't actually work when LOOP_CONFIGURE is issued, though it works fine with LOOP_SETSTATUS. Let's correct that and propagate the flag in LOOP_CONFIGURE too. Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl") Signed-off-by: Lennart Poettering <lennart@poettering.net> Acked-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-16block: use bd_prepare_to_claim directly in the loop driverChristoph Hellwig1-4/+3
The arcane magic in bd_start_claiming is only needed to be able to claim a block_device that hasn't been fully set up. Switch the loop driver that claims from the ioctl path with a fully set up struct block_device to just use the much simpler bd_prepare_to_claim directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24loop: be paranoid on exit and prevent new additions / removalsLuis Chamberlain1-0/+4
Be pedantic on removal as well and hold the mutex. This should prevent uses of addition while we exit. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: move failure injection out of blk_mq_complete_requestChristoph Hellwig1-2/+4
Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error handlers, and remove the now superflous blk_mq_force_complete_rq helper. This ensures we don't keep injecting errors into completions that just terminate the Linux request after the hardware has been reset or the command has been aborted. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-18loop: replace kill_bdev with invalidate_bdevZheng Bin1-4/+4
When a filesystem is mounted on a loop device and on a loop ioctl LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting destroyed. kill_bdev truncate_inode_pages truncate_inode_pages_range do_invalidatepage block_invalidatepage discard_buffer -->clear BH_Mapped flag sb_bread __bread_gfp bh = __getblk_gfp -->discard_buffer clear BH_Mapped flag __bread_slow submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-11Merge tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block fixes from Jens Axboe: "Some followup fixes for this merge window. In particular: - Seqcount write missing preemption disable for stats (Ahmed) - blktrace fixes (Chaitanya) - Redundant initializations (Colin) - Various small NVMe fixes (Chaitanya, Christoph, Daniel, Max, Niklas, Rikard) - loop flag bug regression fix (Martijn) - blk-mq tagging fixes (Christoph, Ming)" * tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block: umem: remove redundant initialization of variable ret pktcdvd: remove redundant initialization of variable ret nvmet: fail outstanding host posted AEN req nvme-pci: use simple suspend when a HMB is enabled nvme-fc: don't call nvme_cleanup_cmd() for AENs nvmet-tcp: constify nvmet_tcp_ops nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops nvme: do not call del_gendisk() on a disk that was never added blk-mq: fix blk_mq_all_tag_iter blk-mq: split out a __blk_mq_get_driver_tag helper blktrace: fix endianness for blk_log_remap() blktrace: fix endianness in get_pdu_int() blktrace: use errno instead of bi_status block: nr_sects_write(): Disable preemption on seqcount write block: remove the error argument to the block_bio_complete tracepoint loop: Fix wrong masking of status flags block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed
2020-06-04loop: Fix wrong masking of status flagsMartijn Coenen1-1/+1
In faf1d25440d6, loop_set_status() now assigns lo_status directly from the passed in lo_flags, but then fixes it up by masking out flags that can't be set by LOOP_SET_STATUS; unfortunately the mask was negated. Re-ran all ltp ioctl_loop tests, and they all passed. Pass run of the previously failing one: tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s tst_device.c:88: INFO: Found free device 0 '/dev/loop0' ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0 ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0 ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file = '/tmp/ZRJ6H4/test.img' ioctl_loop01.c:65: PASS: get expected lo_flag 12 ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1 ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1 ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds Summary: passed 8 failed 0 skipped 0 warnings 0 Fixes: faf1d25440d6 ("loop: Clean up LOOP_SET_STATUS lo_flags handling") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Martijn Coenen <maco@android.com> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-02Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-3/+3
Pull DAX updates part one from Darrick Wong: "After many years of LKML-wrangling about how to enable programs to query and influence the file data access mode (DAX) when a filesystem resides on storage devices such as persistent memory, Ira Weiny has emerged with a proposed set of standard behaviors that has not been shot down by anyone! We're more or less standardizing on the current XFS behavior and adapting ext4 to do the same. This is the first of a handful pull requests that will make ext4 and XFS present a consistent interface for user programs that care about DAX. We add a statx attribute that programs can check to see if DAX is enabled on a particular file. Then, we update the DAX documentation to spell out the user-visible behaviors that filesystems will guarantee (until the next storage industry shakeup). The on-disk inode flag has been in XFS for a few years now. Summary: - Clean up io_is_direct. - Add a new statx flag to indicate when file data access is being done via DAX (as opposed to the page cache). - Update the documentation for how system administrators and application programmers can take advantage of the (still experimental DAX) feature" Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/ * tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: Documentation/dax: Update Usage section fs/stat: Define DAX statx attribute fs: Remove unneeded IS_DAX() check in io_is_direct()
2020-06-02Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-157/+226
Pull block driver updates from Jens Axboe: "On top of the core changes, here are the block driver changes for this merge window: - NVMe changes: - NVMe over Fibre Channel protocol updates, which also reach over to drivers/scsi/lpfc (James Smart) - namespace revalidation support on the target (Anthony Iliopoulos) - gcc zero length array fix (Arnd Bergmann) - nvmet cleanups (Chaitanya Kulkarni) - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg) - use a SRQ per completion vector (Max Gurtovoy) - fix handling of runtime changes to the queue count (Weiping Zhang) - t10 protection information support for nvme-rdma and nvmet-rdma (Israel Rukshin and Max Gurtovoy) - target side AEN improvements (Chaitanya Kulkarni) - various fixes and minor improvements all over, icluding the nvme part of the lpfc driver" - Floppy code cleanup series (Willy, Denis) - Floppy contention fix (Jiri) - Loop CONFIGURE support (Martijn) - bcache fixes/improvements (Coly, Joe, Colin) - q->queuedata cleanups (Christoph) - Get rid of ioctl_by_bdev (Christoph, Stefan) - md/raid5 allocation fixes (Coly) - zero length array fixes (Gustavo) - swim3 task state fix (Xu)" * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits) bcache: configure the asynchronous registertion to be experimental bcache: asynchronous devices registration bcache: fix refcount underflow in bcache_device_free() bcache: Convert pr_<level> uses to a more typical style bcache: remove redundant variables i and n lpfc: Fix return value in __lpfc_nvme_ls_abort lpfc: fix axchg pointer reference after free and double frees lpfc: Fix pointer checks and comments in LS receive refactoring nvme: set dma alignment to qword nvmet: cleanups the loop in nvmet_async_events_process nvmet: fix memory leak when removing namespaces and controllers concurrently nvmet-rdma: add metadata/T10-PI support nvmet: add metadata support for block devices nvmet: add metadata/T10-PI support nvme: add Metadata Capabilities enumerations nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len nvmet: rename nvmet_rw_len to nvmet_rw_data_len nvmet: add metadata characteristics for a namespace nvme-rdma: add metadata/T10-PI support nvme-rdma: introduce nvme_rdma_sgl structure ...
2020-06-02Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block updates from Jens Axboe: "Core block changes that have been queued up for this release: - Remove dead blk-throttle and blk-wbt code (Guoqing) - Include pid in blktrace note traces (Jan) - Don't spew I/O errors on wouldblock termination (me) - Zone append addition (Johannes, Keith, Damien) - IO accounting improvements (Konstantin, Christoph) - blk-mq hardware map update improvements (Ming) - Scheduler dispatch improvement (Salman) - Inline block encryption support (Satya) - Request map fixes and improvements (Weiping) - blk-iocost tweaks (Tejun) - Fix for timeout failing with error injection (Keith) - Queue re-run fixes (Douglas) - CPU hotplug improvements (Christoph) - Queue entry/exit improvements (Christoph) - Move DMA drain handling to the few drivers that use it (Christoph) - Partition handling cleanups (Christoph)" * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits) block: mark bio_wouldblock_error() bio with BIO_QUIET blk-wbt: rename __wbt_update_limits to wbt_update_limits blk-wbt: remove wbt_update_limits blk-throttle: remove tg_drain_bios blk-throttle: remove blk_throtl_drain null_blk: force complete for timeout request blk-mq: drain I/O when all CPUs in a hctx are offline blk-mq: add blk_mq_all_tag_iter blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx blk-mq: use BLK_MQ_NO_TAG in more places blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG blk-mq: move more request initialization to blk_mq_rq_ctx_init blk-mq: simplify the blk_mq_get_request calling convention blk-mq: remove the bio argument to ->prepare_request nvme: force complete cancelled requests blk-mq: blk-mq: provide forced completion method block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds block: blk-crypto-fallback: remove redundant initialization of variable err block: reduce part_stat_lock() scope block: use __this_cpu_add() instead of access by smp_processor_id() ...
2020-06-02mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLENeilBrown1-1/+1
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a daemon needs to write to one bdi (the final bdi) in order to free up writes queued to another bdi (the client bdi). The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty pages, so that it can still dirty pages after other processses have been throttled. The purpose of this is to avoid deadlock that happen when the PF_LESS_THROTTLE process must write for any dirty pages to be freed, but it is being thottled and cannot write. This approach was designed when all threads were blocked equally, independently on which device they were writing to, or how fast it was. Since that time the writeback algorithm has changed substantially with different threads getting different allowances based on non-trivial heuristics. This means the simple "add 25%" heuristic is no longer reliable. The important issue is not that the daemon needs a *larger* dirty page allowance, but that it needs a *private* dirty page allowance, so that dirty pages for the "client" bdi that it is helping to clear (the bdi for an NFS filesystem or loop block device etc) do not affect the throttling of the daemon writing to the "final" bdi. This patch changes the heuristic so that the task is not throttled when the bdi it is writing to has a dirty page count below below (or equal to) the free-run threshold for that bdi. This ensures it will always be able to have some pages in flight, and so will not deadlock. In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might still be throttled by global threshold, but that is acceptable as it is only the deadlock state that is interesting for this flag. This approach of "only throttle when target bdi is busy" is consistent with the other use of PF_LESS_THROTTLE in current_may_throttle(), were it causes attention to be focussed only on the target bdi. So this patch - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE, - removes the 25% bonus that that flag gives, and - If PF_LOCAL_THROTTLE is set, don't delay at all unless the global and the local free-run thresholds are exceeded. Note that previously realtime threads were treated the same as PF_LESS_THROTTLE threads. This patch does *not* change the behvaiour for real-time threads, so it is now different from the behaviour of nfsd and loop tasks. I don't know what is wanted for realtime. [akpm@linux-foundation.org: coding style fixes] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chuck Lever <chuck.lever@oracle.com> [nfsd] Cc: Christoph Hellwig <hch@lst.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-29blk-mq: drain I/O when all CPUs in a hctx are offlineMing Lei1-1/+1
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup up queue mapping. Thomas mentioned the following point[1]: "That was the constraint of managed interrupts from the very beginning: The driver/subsystem has to quiesce the interrupt line and the associated queue _before_ it gets shutdown in CPU unplug and not fiddle with it until it's restarted by the core when the CPU is plugged in again." However, current blk-mq implementation doesn't quiesce hw queue before the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so there isn't any chance to quiesce the hctx before shutting down the CPU. Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs where the last CPU goes away, and wait for completion of in-flight requests. This guarantees that there is no inflight I/O before shutting down the managed IRQ. Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need to wait for completion of in-flight requests from these drivers to avoid a potential dead-lock. It is safe to do this for stacking drivers as those do not use interrupts at all and their I/O completions are triggered by underlying devices I/O completion. [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/ [hch: different retry mechanism, merged two patches, minor cleanups] Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-24loop: remove redundant assignment to variable errorColin Ian King1-2/+0
The variable error is being assigned a value that is never read so the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Add LOOP_CONFIGURE ioctlMartijn Coenen1-28/+76
This allows userspace to completely setup a loop device with a single ioctl, removing the in-between state where the device can be partially configured - eg the loop device has a backing file associated with it, but is reading from the wrong offset. Besides removing the intermediate state, another big benefit of this ioctl is that LOOP_SET_STATUS can be slow; the main reason for this slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to freeze the associated queue; this requires waiting for RCU synchronization, which I've measured can take about 15-20ms on this device on average. In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can also be used to: - Set the correct block size immediately by setting loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE) - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO) - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY in loop_config.info.lo_flags Here's setting up ~70 regular loop devices with an offset on an x86 Android device, using LOOP_SET_FD and LOOP_SET_STATUS: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m03.40s real 0m00.02s user 0m00.03s system Here's configuring ~70 devices in the same way, but using a modified losetup that uses the new LOOP_CONFIGURE ioctl: vsoc_x86:/system/apex # time for i in `seq 30 100`; do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done 0m01.94s real 0m00.01s user 0m00.01s system Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Clean up LOOP_SET_STATUS lo_flags handlingMartijn Coenen1-6/+13
LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this explicit by updating the UAPI to include the flags that can be set/cleared using this ioctl. The implementation can then blindly take over the passed in flags, and use the previous flags for those flags that can't be set / cleared using LOOP_SET_STATUS. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Rework lo_ioctl() __user argument castingMartijn Coenen1-6/+5
In preparation for a new ioctl that needs to copy_from_user(); makes the code easier to read as well. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Move loop_set_status_from_info() and friends upMartijn Coenen1-103/+103
So we can use it without forward declaration. This is a separate commit to make it easier to verify that this is just a move, without functional modifications. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Factor out configuring loop from statusMartijn Coenen1-50/+67
Factor out this code into a separate function, so it can be reused by other code more easily. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Remove figure_loop_size()Martijn Coenen1-9/+4
This function was now only used by loop_set_capacity(). Just open code the remaining code in the caller instead. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Refactor loop_set_status() size calculationMartijn Coenen1-18/+19
figure_loop_size() calculates the loop size based on the passed in parameters, but at the same time it updates the offset and sizelimit parameters in the loop device configuration. That is a somewhat unexpected side effect of a function with this name, and it is only only needed by one of the two callers of this function - loop_set_status(). Move the lo_offset and lo_sizelimit assignment back into loop_set_status(), and use the newly factored out functions to validate and apply the newly calculated size. This allows us to get rid of figure_loop_size() in a follow-up commit. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Switch to set_capacity_revalidate_and_notify()Martijn Coenen1-3/+2
This was recently added to block/genhd.c, and takes care of both updating the capacity and notifying userspace of the new size. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Factor out setting loop device sizeMartijn Coenen1-9/+21
This code is used repeatedly. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Remove sector_t truncation checksMartijn Coenen1-14/+7
sector_t is now always u64, so we don't need to check for truncation. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-21loop: Call loop_config_discard() only after new config is appliedMartijn Coenen1-2/+2
loop_set_status() calls loop_config_discard() to configure discard for the loop device; however, the discard configuration depends on whether the loop device uses encryption, and when we call it the encryption configuration has not been updated yet. Move the call down so we apply the correct discard configuration based on the new configuration. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-04fs: Remove unneeded IS_DAX() check in io_is_direct()Ira Weiny1-3/+3
Remove the check because DAX now has it's own read/write methods and file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on their own. Therefore, it does not matter if the file state is DAX when the iocb flags are created. Also remove io_is_direct() as it is just a simple flag check. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-03loop: Better discard support for block devicesEvan Green1-11/+31
If the backing device for a loop device is itself a block device, then mirror the "write zeroes" capabilities of the underlying block device into the loop device. Copy this capability into both max_write_zeroes_sectors and max_discard_sectors of the loop device. The reason for this is that REQ_OP_DISCARD on a loop device translates into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This presents a consistent interface for loop devices (that discarded data is zeroed), regardless of the backing device type of the loop device. There should be no behavior change for loop devices backed by regular files. This change fixes blktest block/003, and removes an extraneous error print in block/013 when testing on a loop device backed by a block device that does not support discard. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [used updated version of Evan's comment in loop_config_discard()] [moved backingq to local scope, removed redundant braces] Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-03loop: Report EOPNOTSUPP properlyEvan Green1-2/+5
Properly plumb out EOPNOTSUPP from loop driver operations, which may get returned when for instance a discard operation is attempted but not supported by the underlying block device. Before this change, everything was reported in the log as an I/O error, which is scary and not helpful in debugging. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>