summaryrefslogtreecommitdiffstats
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2022-07-12blk-iolatency: Use atomic{,64}_try_cmpxchgUros Bizjak1-7/+5
Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old in check_scale_change and atomic64_try_cmpxchg in blkcg_iolatency_done_bio. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220712151947.6783-1-ubizjak@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-12block: Use try_cmpxchg in update_io_ticksUros Bizjak1-1/+1
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in update_io_ticks. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220712152741.7324-1-ubizjak@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-12block/rq_qos: Use atomic_try_cmpxchg in atomic_inc_belowUros Bizjak1-8/+2
Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old in atomic_inc_below. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, atomic_try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220712150547.5786-1-ubizjak@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-12blk-mq: don't create hctx debugfs dir until q->debugfs_dir is createdMing Lei1-0/+3
blk_mq_debugfs_register_hctx() can be called by blk_mq_update_nr_hw_queues when gendisk isn't added yet, such as nvme tcp. Fixes the warning of 'debugfs: Directory 'hctx0' with parent '/' already present!' which can be observed reliably when running blktests nvme/005. Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220711090808.259682-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: move zone related fields to struct gendiskChristoph Hellwig3-28/+25
Move the zone related fields that are currently stored in struct request_queue to struct gendisk as these are part of the highlevel block layer API and are only used for non-passthrough I/O that requires the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: use bdev based helpers in blkdev_zone_mgmt{,all}Christoph Hellwig1-4/+4
Use the bdev based helpers instead of the queue based ones to clean up the code a bit and prepare for storing all zone related fields in struct gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: replace blkdev_nr_zones with bdev_nr_zonesChristoph Hellwig2-8/+9
Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: remove queue_max_open_zones and queue_max_active_zonesChristoph Hellwig1-2/+2
Always use the bdev based helpers instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: pass a gendisk to blk_queue_free_zone_bitmapsChristoph Hellwig3-6/+8
Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: pass a gendisk to blk_queue_clear_zone_settingsChristoph Hellwig3-4/+6
Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: pass a gendisk to blk_queue_set_zonedChristoph Hellwig2-5/+6
Prepare for storing the zone related field in struct gendisk instead of struct request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: simplify blk_check_zone_appendChristoph Hellwig1-3/+2
Use the bdev based helpers instead of open coding them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: simplify blk_mq_plugChristoph Hellwig4-13/+11
Drop the unused q argument, and invert the check to move the exception into a branch and the regular path as the normal return. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: use bdev_is_zoned instead of open coding itChristoph Hellwig4-10/+9
Use bdev_is_zoned in all places where a block_device is available instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: call blk_queue_free_zone_bitmaps from disk_releaseChristoph Hellwig2-2/+1
The zone bitmaps are only used for non-passthrough I/O, so free them as soon as the disk is released. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06blk-mq: Drop local variable for reserved tagJohn Garry1-4/+2
The local variable is now only referenced once so drop it. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-7-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06blk-mq: Drop 'reserved' arg of busy_tag_iter_fnJohn Garry3-11/+8
We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter function so it may be dropped. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> #nvme Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06blk-mq: Drop blk_mq_ops.timeout 'reserved' argJohn Garry2-4/+4
With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06blk-mq: Add a flag for reserved requestsJohn Garry1-0/+6
Add a flag for reserved requests so that drivers may know this for any special handling. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-3-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29blk-cgroup: factor out blkcg_free_all_cpd()Jason Yan1-18/+18
To reduce some duplicated code, factor out blkcg_free_all_cpd(). No functional change. Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220629070917.3113016-3-yanaijie@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29blk-cgroup: factor out blkcg_iostat_update()Jason Yan1-17/+20
To reduce some duplicated code, factor out blkcg_iostat_update(). No functional change. Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220629070917.3113016-2-yanaijie@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29block: simplify disk_set_independent_access_rangesChristoph Hellwig3-44/+18
Lift setting disk->ia_ranges from disk_register_independent_access_ranges into disk_set_independent_access_ranges, and make the behavior the same for the registered vs non-registered queue cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220629062013.1331068-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29block: move ->ia_ranges from the request_queue to the gendiskChristoph Hellwig1-9/+9
Independent access ranges only matter for file system I/O and are only valid with a registered gendisk, so move them there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220629062013.1331068-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29block: remove "select BLK_RQ_IO_DATA_LEN" from BLK_CGROUP_IOCOST dependencyYing Sun1-1/+0
The configuration item BLK_RQ_IO_DATA_LEN is not declared in the kernel. Select BLK_RQ_IO_DATA_LEN is meaningless which could be removed. Signed-off-by: Ying Sun <sunying@nj.iscas.ac.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220629062409.19458-1-sunying@nj.iscas.ac.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29block: Remove check of PageErrorMatthew Wilcox (Oracle)1-4/+0
If read_mapping_page() sees a page with PageError set, it returns a PTR_ERR(). Checking PageError again is simply dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-28blk-mq: cleanup disk sysfs registrationChristoph Hellwig3-25/+26
Pass a gendisk to the sysfs register/unregister functions and give them descriptive names. Also move the unregistration helper next to the one doing the registration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28blk-mq: rename blk_mq_sysfs_{,un}registerChristoph Hellwig3-6/+6
Add a _hctx postfix to better describe what the functions do, match the debugfs equivalents and release the old names for functions that should be using them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: remove the extra gendisk reference in __blk_mq_register_devChristoph Hellwig1-3/+1
kobject_add already grabs a reference to the parent, no need to have another one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: use default groups to register the queue attributesChristoph Hellwig1-6/+6
Set up the default_groups for blk_queue_ktype instead of manually calling sysfs_create_group. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: remove a superflous queue kobject referenceChristoph Hellwig1-5/+1
kobject_add already adds a reference to the parent that is dropped on deletion, so don't bother grabbing another one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: simplify blktrace sysfs attribute creationChristoph Hellwig4-11/+6
Add the trace attributes to the default gendisk attributes, just like we already do for partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220628171850.1313069-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: remove blk_cleanup_diskChristoph Hellwig1-15/+0
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: simplify disk shutdownChristoph Hellwig6-56/+59
Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: stop setting the nomerges flags in blk_cleanup_queueChristoph Hellwig1-3/+0
These flags only apply to file system I/O, and all file system I/O is already drained by del_gendisk and thus can't be in progress when blk_cleanup_queue is called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: remove QUEUE_FLAG_DEADChristoph Hellwig2-8/+3
Disallow setting the blk-mq state on any queue that is already dying as setting the state even then is a bad idea, and remove the now unused QUEUE_FLAG_DEAD flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27blk-mq: blk_mq_tag_busy is no need to return a valueLiu Song2-17/+11
Currently "blk_mq_tag_busy" return value has no effect, so adjust it. Some code implementations have also been adjusted to enhance readability. Signed-off-by: Liu Song <liusong@linux.alibaba.com> Link: https://lore.kernel.org/r/1656170121-1619-1-git-send-email-liusong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: Always initialize bio IO priority on submitJan Kara1-0/+3
Currently, IO priority set in task's IO context is not reflected in the bio->bi_ioprio for most IO (only io_uring and direct IO set it). This results in odd results where process is submitting some bios with one priority and other bios with a different (unset) priority and due to differing priorities bios cannot be merged. Make sure bio->bi_ioprio is always set on bio submission. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-9-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: Initialize bio priority earlierJan Kara1-2/+2
Bio's IO priority needs to be initialized before we try to merge the bio with other bios. Otherwise we could merge bios which would otherwise receive different IO priorities leading to possible QoS issues. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-8-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27blk-ioprio: Convert from rqos policy to direct callJan Kara4-45/+23
Convert blk-ioprio handling from a rqos policy to a direct call from blk_mq_submit_bio(). Firstly, blk-ioprio is not much of a rqos policy anyway, it just needs a hook in bio submission path to set the bio's IO priority. Secondly, the rqos .track hook gets actually called too late for blk-ioprio purposes and introducing a special rqos hook just for blk-ioprio looks even weirder. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-7-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27blk-ioprio: Remove unneeded fieldJan Kara1-6/+3
blkcg->ioprio_set field is not really useful except for avoiding possibly more expensive checks inside blkcg_ioprio_track(). The check for blkcg->prio_policy being equal to POLICY_NO_CHANGE does the same service so just remove the ioprio_set field and replace the check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: Fix handling of tasks without ioprio in ioprio_get(2)Jan Kara1-7/+23
ioprio_get(2) can be asked to return the best IO priority from several tasks (IOPRIO_WHO_PGRP, IOPRIO_WHO_USER). Currently the call treats tasks without set IO priority as having priority IOPRIO_CLASS_BE/IOPRIO_BE_NORM however this does not really reflect the IO priority the task will get (which depends on task's nice value). Fix the code to use the real IO priority task's IO will use. We have to modify code for ioprio_get(IOPRIO_WHO_PROCESS) to keep returning IOPRIO_CLASS_NONE priority for tasks without set IO priority as a special case to maintain userspace visible API. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220623074840.5960-5-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: Make ioprio_best() staticJan Kara1-1/+1
Nobody outside of block/ioprio.c uses it. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: Generalize get_current_ioprio() for any taskJan Kara1-0/+26
get_current_ioprio() operates only on current task. We will need the same functionality for other tasks as well. Generalize get_current_ioprio() for that and also move the bulk out of the header file because it is large enough. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: fix default IO priority handling againJan Kara2-2/+4
Commit e70344c05995 ("block: fix default IO priority handling") introduced an inconsistency in get_current_ioprio() that tasks without IO context return IOPRIO_DEFAULT priority while tasks with freshly allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority. Tasks without IO context used to be rare before 5a9d041ba2f6 ("block: move io_context creation into where it's needed") but after this commit they became common because now only BFQ IO scheduler setups task's IO context. Similar inconsistency is there for get_task_ioprio() so this inconsistency is now exposed to userspace and userspace will see different IO priority for tasks operating on devices with BFQ compared to devices without BFQ. Furthemore the changes done by commit e70344c05995 change the behavior when no IO priority is set for BFQ IO scheduler which is also documented in ioprio_set(2) manpage: "If no I/O scheduler has been set for a thread, then by default the I/O priority will follow the CPU nice value (setpriority(2)). In Linux kernels before version 2.6.24, once an I/O priority had been set using ioprio_set(), there was no way to reset the I/O scheduling behavior to the default. Since Linux 2.6.24, specifying ioprio as 0 can be used to reset to the default I/O scheduling behavior." So make sure we default to IOPRIO_CLASS_NONE as used to be the case before commit e70344c05995. Also cleanup alloc_io_context() to explicitely set this IO priority for the allocated IO context to avoid future surprises. Note that we tweak ioprio_best() to maintain ioprio_get(2) behavior and make this commit easily backportable. CC: stable@vger.kernel.org Fixes: e70344c05995 ("block: fix default IO priority handling") Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074840.5960-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27blk-mq: Don't disable preemption around __blk_mq_run_hw_queue().Sebastian Andrzej Siewior1-5/+1
__blk_mq_delay_run_hw_queue() disables preemption to get a stable current CPU number and then invokes __blk_mq_run_hw_queue() if the CPU number is part the mask. __blk_mq_run_hw_queue() acquires a spin_lock_t which is a sleeping lock on PREEMPT_RT and can't be acquired with disabled preemption. It is not required for correctness to invoke __blk_mq_run_hw_queue() on a CPU matching hctx->cpumask. Both (async and direct requests) can run on a CPU not matching hctx->cpumask. The CPU mask without disabling preemption and invoking __blk_mq_run_hw_queue(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/YrLSEiNvagKJaDs5@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: bfq: Fix kernel-doc headersBart Van Assche2-2/+6
Fix the following warnings: block/bfq-cgroup.c:721: warning: Function parameter or member 'bfqg' not described in '__bfq_bic_change_cgroup' block/bfq-cgroup.c:721: warning: Excess function parameter 'blkcg' description in '__bfq_bic_change_cgroup' block/bfq-cgroup.c:870: warning: Function parameter or member 'ioprio_class' not described in 'bfq_reparent_leaf_entity' block/bfq-cgroup.c:900: warning: Function parameter or member 'ioprio_class' not described in 'bfq_reparent_active_queues' Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220617210859.106623-1-bvanassche@acm.org
2022-06-27block: bfq: Remove an unused function definitionBart Van Assche1-2/+0
This patch is the result of the analysis of a sparse report. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220617204433.102022-1-bvanassche@acm.org
2022-06-27bfq: Remove useless code in bfq_lookup_next_entityGuoYong Zheng1-3/+0
It is no need to judge entity is null or not here, directly return entity is ok, so remove it. Signed-off-by: GuoYong Zheng <zhenggy@chinatelecom.cn> Link: https://lore.kernel.org/r/1655461684-19075-1-git-send-email-zhenggy@chinatelecom.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: move blk_queue_get_max_sectors to blk.hChristoph Hellwig1-0/+13
blk_queue_get_max_sectors is private to the block layer, so move it out of blkdev.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220614090934.570632-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-27block: fold blk_max_size_offset into get_max_io_sizeChristoph Hellwig1-2/+7
Now that blk_max_size_offset has a single caller left, fold it into that and clean up the naming convention for the local variables there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220614090934.570632-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>