From 26bfeb26624071a47acb3a07097efe12044865f2 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 16 Aug 2020 16:39:34 -0700 Subject: block: blk-mq.c: fix @at_head kernel-doc warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix a kernel-doc warning in block/blk-mq.c: ../block/blk-mq.c:1844: warning: Function parameter or member 'at_head' not described in 'blk_mq_request_bypass_insert' Fixes: 01e99aeca397 ("blk-mq: insert passthrough request into hctx->dispatch directly") Signed-off-by: Randy Dunlap Cc: André Almeida Cc: Jens Axboe Cc: Ming Lei Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe --- block/blk-mq.c | 1 + 1 file changed, 1 insertion(+) (limited to 'block') diff --git a/block/blk-mq.c b/block/blk-mq.c index 0015a1892153..35f8d0692442 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1834,6 +1834,7 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, /** * blk_mq_request_bypass_insert - Insert a request at dispatch list. * @rq: Pointer to request to be inserted. + * @at_head: true if the request should be inserted at the head of the list. * @run_queue: If we should run the hardware queue after inserting the request. * * Should only be used carefully, when the caller knows we want to -- cgit v1.2.3 From 03ef5941a04c68b5eb8254493d09b6e899a377a3 Mon Sep 17 00:00:00 2001 From: Xu Wang Date: Mon, 17 Aug 2020 02:16:49 +0000 Subject: bsg-lib: convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Xu Wang Signed-off-by: Jens Axboe --- block/bsg-lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/bsg-lib.c b/block/bsg-lib.c index fb7b347f8010..d185396d88bb 100644 --- a/block/bsg-lib.c +++ b/block/bsg-lib.c @@ -378,7 +378,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name, bset->timeout_fn = timeout; set = &bset->tag_set; - set->ops = &bsg_mq_ops, + set->ops = &bsg_mq_ops; set->nr_hw_queues = 1; set->queue_depth = 128; set->numa_node = NUMA_NO_NODE; -- cgit v1.2.3 From d7d8535f377e9ba87edbf7fbbd634ac942f3f54f Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Mon, 17 Aug 2020 18:01:15 +0800 Subject: blk-mq: order adding requests to hctx->dispatch and checking SCHED_RESTART SCHED_RESTART code path is relied to re-run queue for dispatch requests in hctx->dispatch. Meantime the SCHED_RSTART flag is checked when adding requests to hctx->dispatch. memory barriers have to be used for ordering the following two pair of OPs: 1) adding requests to hctx->dispatch and checking SCHED_RESTART in blk_mq_dispatch_rq_list() 2) clearing SCHED_RESTART and checking if there is request in hctx->dispatch in blk_mq_sched_restart(). Without the added memory barrier, either: 1) blk_mq_sched_restart() may miss requests added to hctx->dispatch meantime blk_mq_dispatch_rq_list() observes SCHED_RESTART, and not run queue in dispatch side or 2) blk_mq_dispatch_rq_list still sees SCHED_RESTART, and not run queue in dispatch side, meantime checking if there is request in hctx->dispatch from blk_mq_sched_restart() is missed. IO hang in ltp/fs_fill test is reported by kernel test robot: https://lkml.org/lkml/2020/7/26/77 Turns out it is caused by the above out-of-order OPs. And the IO hang can't be observed any more after applying this patch. Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Reported-by: kernel test robot Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig Cc: Bart Van Assche Cc: Christoph Hellwig Cc: David Jeffery Cc: Signed-off-by: Jens Axboe --- block/blk-mq-sched.c | 9 +++++++++ block/blk-mq.c | 9 +++++++++ 2 files changed, 18 insertions(+) (limited to 'block') diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index a19cdf159b75..d2790e5b06d1 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -78,6 +78,15 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) return; clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); + /* + * Order clearing SCHED_RESTART and list_empty_careful(&hctx->dispatch) + * in blk_mq_run_hw_queue(). Its pair is the barrier in + * blk_mq_dispatch_rq_list(). So dispatch code won't see SCHED_RESTART, + * meantime new request added to hctx->dispatch is missed to check in + * blk_mq_run_hw_queue(). + */ + smp_mb(); + blk_mq_run_hw_queue(hctx, true); } diff --git a/block/blk-mq.c b/block/blk-mq.c index 35f8d0692442..a80f4986e594 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1437,6 +1437,15 @@ out: list_splice_tail_init(list, &hctx->dispatch); spin_unlock(&hctx->lock); + /* + * Order adding requests to hctx->dispatch and checking + * SCHED_RESTART flag. The pair of this smp_mb() is the one + * in blk_mq_sched_restart(). Avoid restart code path to + * miss the new added requests to hctx->dispatch, meantime + * SCHED_RESTART is observed here. + */ + smp_mb(); + /* * If SCHED_RESTART was set by the caller of this function and * it is no longer set that means that it was cleared by another -- cgit v1.2.3 From 943b40c832beb71115e38a1c4d99b640b5342738 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Mon, 17 Aug 2020 17:52:39 +0800 Subject: block: respect queue limit of max discard segment When queue_max_discard_segments(q) is 1, blk_discard_mergable() will return false for discard request, then normal request merge is applied. However, only queue_max_segments() is checked, so max discard segment limit isn't respected. Check max discard segment limit in the request merge code for fixing the issue. Discard request failure of virtio_blk is fixed. Fixes: 69840466086d ("block: fix the DISCARD request merge") Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig Cc: Stefano Garzarella Signed-off-by: Jens Axboe --- block/blk-merge.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'block') diff --git a/block/blk-merge.c b/block/blk-merge.c index 6529e3aab001..7af1f3668a91 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -533,10 +533,17 @@ int __blk_rq_map_sg(struct request_queue *q, struct request *rq, } EXPORT_SYMBOL(__blk_rq_map_sg); +static inline unsigned int blk_rq_get_max_segments(struct request *rq) +{ + if (req_op(rq) == REQ_OP_DISCARD) + return queue_max_discard_segments(rq->q); + return queue_max_segments(rq->q); +} + static inline int ll_new_hw_segment(struct request *req, struct bio *bio, unsigned int nr_phys_segs) { - if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(req->q)) + if (req->nr_phys_segments + nr_phys_segs > blk_rq_get_max_segments(req)) goto no_merge; if (blk_integrity_merge_bio(req->q, req, bio) == false) @@ -624,7 +631,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, return 0; total_phys_segments = req->nr_phys_segments + next->nr_phys_segments; - if (total_phys_segments > queue_max_segments(q)) + if (total_phys_segments > blk_rq_get_max_segments(req)) return 0; if (blk_integrity_merge_rq(q, req, next) == false) -- cgit v1.2.3 From d81665198b83e55a28339d1f3e4890ed8a434556 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 17 Aug 2020 20:52:06 +0100 Subject: block: Fix page_is_mergeable() for compound pages If we pass in an offset which is larger than PAGE_SIZE, then page_is_mergeable() thinks it's not mergeable with the previous bio_vec, leading to a large number of bio_vecs being used. Use a slightly more obvious test that the two pages are compatible with each other. Fixes: 52d52d1c98a9 ("block: only allow contiguous page structs in a bio_vec") Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Ming Lei Signed-off-by: Jens Axboe --- block/bio.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'block') diff --git a/block/bio.c b/block/bio.c index c63ba04bd629..a9931f23d933 100644 --- a/block/bio.c +++ b/block/bio.c @@ -740,8 +740,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv, struct page *page, unsigned int len, unsigned int off, bool *same_page) { - phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) + - bv->bv_offset + bv->bv_len - 1; + size_t bv_end = bv->bv_offset + bv->bv_len; + phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) + bv_end - 1; phys_addr_t page_addr = page_to_phys(page); if (vec_end_addr + 1 != page_addr + off) @@ -750,9 +750,9 @@ static inline bool page_is_mergeable(const struct bio_vec *bv, return false; *same_page = ((vec_end_addr & PAGE_MASK) == page_addr); - if (!*same_page && pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page) - return false; - return true; + if (*same_page) + return true; + return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE); } /* -- cgit v1.2.3 From 2de791ab4918969d8108f15238a701968375f235 Mon Sep 17 00:00:00 2001 From: Dmitry Monakhov Date: Tue, 11 Aug 2020 06:43:40 +0000 Subject: bfq: fix blkio cgroup leakage v4 Changes from v1: - update commit description with proper ref-accounting justification commit db37a34c563b ("block, bfq: get a ref to a group when adding it to a service tree") introduce leak forbfq_group and blkcg_gq objects because of get/put imbalance. In fact whole idea of original commit is wrong because bfq_group entity can not dissapear under us because it is referenced by child bfq_queue's entities from here: -> bfq_init_entity() ->bfqg_and_blkg_get(bfqg); ->entity->parent = bfqg->my_entity -> bfq_put_queue(bfqq) FINAL_PUT ->bfqg_and_blkg_put(bfqq_group(bfqq)) ->kmem_cache_free(bfq_pool, bfqq); So parent entity can not disappear while child entity is in tree, and child entities already has proper protection. This patch revert commit db37a34c563b ("block, bfq: get a ref to a group when adding it to a service tree") bfq_group leak trace caused by bad commit: -> blkg_alloc -> bfq_pq_alloc -> bfqg_get (+1) ->bfq_activate_bfqq ->bfq_activate_requeue_entity -> __bfq_activate_entity ->bfq_get_entity ->bfqg_and_blkg_get (+1) <==== : Note1 ->bfq_del_bfqq_busy ->bfq_deactivate_entity+0x53/0xc0 [bfq] ->__bfq_deactivate_entity+0x1b8/0x210 [bfq] -> bfq_forget_entity(is_in_service = true) entity->on_st_or_in_serv = false <=== :Note2 if (is_in_service) return; ==> do not touch reference -> blkcg_css_offline -> blkcg_destroy_blkgs -> blkg_destroy -> bfq_pd_offline -> __bfq_deactivate_entity if (!entity->on_st_or_in_serv) /* true, because (Note2) return false; -> bfq_pd_free -> bfqg_put() (-1, byt bfqg->ref == 2) because of (Note2) So bfq_group and blkcg_gq will leak forever, see test-case below. ##TESTCASE_BEGIN: #!/bin/bash max_iters=${1:-100} #prep cgroup mounts mount -t tmpfs cgroup_root /sys/fs/cgroup mkdir /sys/fs/cgroup/blkio mount -t cgroup -o blkio none /sys/fs/cgroup/blkio # Prepare blkdev grep blkio /proc/cgroups truncate -s 1M img losetup /dev/loop0 img echo bfq > /sys/block/loop0/queue/scheduler grep blkio /proc/cgroups for ((i=0;i /sys/fs/cgroup/blkio/a/cgroup.procs dd if=/dev/loop0 bs=4k count=1 of=/dev/null iflag=direct 2> /dev/null echo 0 > /sys/fs/cgroup/blkio/cgroup.procs rmdir /sys/fs/cgroup/blkio/a grep blkio /proc/cgroups done ##TESTCASE_END: Fixes: db37a34c563b ("block, bfq: get a ref to a group when adding it to a service tree") Tested-by: Oleksandr Natalenko Signed-off-by: Dmitry Monakhov Signed-off-by: Jens Axboe --- block/bfq-cgroup.c | 2 +- block/bfq-iosched.h | 1 - block/bfq-wf2q.c | 12 ++---------- 3 files changed, 3 insertions(+), 12 deletions(-) (limited to 'block') diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index 68882b9b8f11..b791e2041e49 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -332,7 +332,7 @@ static void bfqg_put(struct bfq_group *bfqg) kfree(bfqg); } -void bfqg_and_blkg_get(struct bfq_group *bfqg) +static void bfqg_and_blkg_get(struct bfq_group *bfqg) { /* see comments in bfq_bic_update_cgroup for why refcounting bfqg */ bfqg_get(bfqg); diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index cd224aaf9f52..703895224562 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -986,7 +986,6 @@ struct bfq_group *bfq_find_set_group(struct bfq_data *bfqd, struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg); struct bfq_group *bfqq_group(struct bfq_queue *bfqq); struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node); -void bfqg_and_blkg_get(struct bfq_group *bfqg); void bfqg_and_blkg_put(struct bfq_group *bfqg); #ifdef CONFIG_BFQ_GROUP_IOSCHED diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c index eb0e2a6daabe..26776bdbdf36 100644 --- a/block/bfq-wf2q.c +++ b/block/bfq-wf2q.c @@ -533,9 +533,7 @@ static void bfq_get_entity(struct bfq_entity *entity) bfqq->ref++; bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d", bfqq, bfqq->ref); - } else - bfqg_and_blkg_get(container_of(entity, struct bfq_group, - entity)); + } } /** @@ -649,14 +647,8 @@ static void bfq_forget_entity(struct bfq_service_tree *st, entity->on_st_or_in_serv = false; st->wsum -= entity->weight; - if (is_in_service) - return; - - if (bfqq) + if (bfqq && !is_in_service) bfq_put_queue(bfqq); - else - bfqg_and_blkg_put(container_of(entity, struct bfq_group, - entity)); } /** -- cgit v1.2.3 From db03f88fae8a2c8007caafa70287798817df2875 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Tue, 18 Aug 2020 17:07:28 +0800 Subject: blk-mq: insert request not through ->queue_rq into sw/scheduler queue c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") supposed to add request which has been through ->queue_rq() to the hw queue dispatch list, however it adds request running out of budget or driver tag to hw queue too. This way basically bypasses request merge, and causes too many request dispatched to LLD, and system% is unnecessary increased. Fixes this issue by adding request not through ->queue_rq into sw/scheduler queue, and this way is safe because no ->queue_rq is called on this request yet. High %system can be observed on Azure storvsc device, and even soft lock is observed. This patch reduces %system during heavy sequential IO, meantime decreases soft lockup risk. Fixes: c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list") Signed-off-by: Ming Lei Cc: Christoph Hellwig Cc: Bart Van Assche Cc: Mike Snitzer Signed-off-by: Jens Axboe --- block/blk-mq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-mq.c b/block/blk-mq.c index a80f4986e594..b3d2785eefe9 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2026,7 +2026,8 @@ insert: if (bypass_insert) return BLK_STS_RESOURCE; - blk_mq_request_bypass_insert(rq, false, run_queue); + blk_mq_sched_insert_request(rq, false, run_queue, false); + return BLK_STS_OK; } -- cgit v1.2.3 From e4b469c66f3cbb81c2e94d31123d7bcdf3c1dabd Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 6 Aug 2020 14:58:37 -0700 Subject: block: fix get_max_io_size() A previous commit aligning splits to physical block sizes inadvertently modified one return case such that that it now returns 0 length splits when the number of sectors doesn't exceed the physical offset. This later hits a BUG in bio_split(). Restore the previous working behavior. Fixes: 9cc5169cd478b ("block: Improve physical block alignment of split bios") Reported-by: Eric Deal Signed-off-by: Keith Busch Cc: Bart Van Assche Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe --- block/blk-merge.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'block') diff --git a/block/blk-merge.c b/block/blk-merge.c index 7af1f3668a91..f685d633bcc9 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -154,7 +154,7 @@ static inline unsigned get_max_io_size(struct request_queue *q, if (max_sectors > start_offset) return max_sectors - start_offset; - return sectors & (lbs - 1); + return sectors & ~(lbs - 1); } static inline unsigned get_max_segment_size(const struct request_queue *q, -- cgit v1.2.3 From 27029b4b18aa5d3b060f0bf2c26dae254132cfce Mon Sep 17 00:00:00 2001 From: Yufen Yu Date: Mon, 10 Aug 2020 22:21:16 -0400 Subject: blkcg: fix memleak for iolatency Normally, blkcg_iolatency_exit() will free related memory in iolatency when cleanup queue. But if blk_throtl_init() return error and queue init fail, blkcg_iolatency_exit() will not do that for us. Then it cause memory leak. Fixes: d70675121546 ("block: introduce blk-iolatency io controller") Signed-off-by: Yufen Yu Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'block') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 619a79b51068..c195365c9817 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1152,13 +1152,15 @@ int blkcg_init_queue(struct request_queue *q) if (preloaded) radix_tree_preload_end(); - ret = blk_iolatency_init(q); + ret = blk_throtl_init(q); if (ret) goto err_destroy_all; - ret = blk_throtl_init(q); - if (ret) + ret = blk_iolatency_init(q); + if (ret) { + blk_throtl_exit(q); goto err_destroy_all; + } return 0; err_destroy_all: -- cgit v1.2.3