From 14ccb66b3f585b2bc21e7256c96090abed5a512c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 6 Jun 2019 12:29:01 +0200 Subject: block: remove the bi_phys_segments field in struct bio We only need the number of segments in the blk-mq submission path. Remove the field from struct bio, and return it from a variant of blk_queue_split instead, so that it can be passed as an argument to those functions that need the value. This also means we stop recounting segments except for cloning and partial segments. To keep the number of arguments in this hot path down, remove pointless struct request_queue arguments from any of the functions that had it and grew a nr_segs argument. Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- Documentation/block/biodoc.txt | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index ac18b488cb5e..31c177663ed5 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt @@ -436,7 +436,6 @@ struct bio { struct bvec_iter bi_iter; /* current index into bio_vec array */ unsigned int bi_size; /* total size in bytes */ - unsigned short bi_phys_segments; /* segments after physaddr coalesce*/ unsigned short bi_hw_segments; /* segments after DMA remapping */ unsigned int bi_max; /* max bio_vecs we can hold used as index into pool */ -- cgit v1.2.3 From 8060c47ba853f147c46bf1e6f6d93d1726fcb57a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 6 Jun 2019 12:26:24 +0200 Subject: block: rename CONFIG_DEBUG_BLK_CGROUP to CONFIG_BFQ_CGROUP_DEBUG This option is entirely bfq specific, give it an appropriate name. Also make it depend on CONFIG_BFQ_GROUP_IOSCHED in Kconfig, as all the functionality already does so anyway. 
Acked-by: Tejun Heo Acked-by: Paolo Valente Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- Documentation/block/bfq-iosched.txt | 12 ++++++------ Documentation/cgroup-v1/blkio-controller.txt | 12 ++++++------ block/Kconfig.iosched | 7 +++++++ block/bfq-cgroup.c | 27 +++++++++++++-------------- block/bfq-iosched.c | 8 ++++---- block/bfq-iosched.h | 4 ++-- init/Kconfig | 8 -------- 7 files changed, 38 insertions(+), 40 deletions(-) (limited to 'Documentation') diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt index 1a0f2ac02eb6..f02163fabf80 100644 --- a/Documentation/block/bfq-iosched.txt +++ b/Documentation/block/bfq-iosched.txt @@ -38,13 +38,13 @@ stack). To give an idea of the limits with BFQ, on slow or average CPUs, here are, first, the limits of BFQ for three different CPUs, on, respectively, an average laptop, an old desktop, and a cheap embedded system, in case full hierarchical support is enabled (i.e., -CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_DEBUG_BLK_CGROUP is not +CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not set (Section 4-2): - Intel i7-4850HQ: 400 KIOPS - AMD A8-3850: 250 KIOPS - ARM CortexTM-A53 Octa-core: 80 KIOPS -If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical +If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical support is enabled), then the sustainable throughput with BFQ decreases, because all blkio.bfq* statistics are created and updated (Section 4-2). For BFQ, this leads to the following maximum @@ -537,19 +537,19 @@ or io.bfq.weight. As for cgroups-v1 (blkio controller), the exact set of stat files created, and kept up-to-date by bfq, depends on whether -CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all +CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all the stat files documented in Documentation/cgroup-v1/blkio-controller.txt. 
If, instead, -CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files +CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files blkio.bfq.io_service_bytes blkio.bfq.io_service_bytes_recursive blkio.bfq.io_serviced blkio.bfq.io_serviced_recursive -The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum +The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum throughput sustainable with bfq, because updating the blkio.bfq.* stats is rather costly, especially for some of the stats enabled by -CONFIG_DEBUG_BLK_CGROUP. +CONFIG_BFQ_CGROUP_DEBUG. Parameters to set ----------------- diff --git a/Documentation/cgroup-v1/blkio-controller.txt b/Documentation/cgroup-v1/blkio-controller.txt index d1a1b7bdd03a..78ec4500f220 100644 --- a/Documentation/cgroup-v1/blkio-controller.txt +++ b/Documentation/cgroup-v1/blkio-controller.txt @@ -77,7 +77,7 @@ Various user visible config options CONFIG_BLK_CGROUP - Block IO controller. -CONFIG_DEBUG_BLK_CGROUP +CONFIG_BFQ_CGROUP_DEBUG - Debug help. Right now some additional stats file show up in cgroup if this option is enabled. @@ -193,13 +193,13 @@ Proportional weight policy files write, sync or async. - blkio.avg_queue_size - - Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. + - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. The average queue size for this cgroup over the entire time of this cgroup's existence. Queue size samples are taken each time one of the queues of this cgroup gets a timeslice. - blkio.group_wait_time - - Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. + - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This is the amount of time the cgroup had to wait since it became busy (i.e., went from 0 to 1 request queued) to get a timeslice for one of its queues. This is different from the io_wait_time which is the @@ -210,7 +210,7 @@ Proportional weight policy files got a timeslice and will not include the current delta. 
- blkio.empty_time - - Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. + - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This is the amount of time a cgroup spends without any pending requests when not being served, i.e., it does not include any time spent idling for one of the queues of the cgroup. This is in @@ -219,7 +219,7 @@ Proportional weight policy files time it had a pending request and will not include the current delta. - blkio.idle_time - - Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. + - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This is the amount of time spent by the IO scheduler idling for a given cgroup in anticipation of a better request than the existing ones from other queues/cgroups. This is in nanoseconds. If this is read @@ -228,7 +228,7 @@ Proportional weight policy files the current delta. - blkio.dequeue - - Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This + - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This gives the statistics about how many a times a group was dequeued from service tree of the device. First two fields specify the major and minor number of the device and third field specifies the number diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched index 4626b88b2d5a..7a6b2f29a582 100644 --- a/block/Kconfig.iosched +++ b/block/Kconfig.iosched @@ -36,6 +36,13 @@ config BFQ_GROUP_IOSCHED Enable hierarchical scheduling in BFQ, using the blkio (cgroups-v1) or io (cgroups-v2) controller. +config BFQ_CGROUP_DEBUG + bool "BFQ IO controller debugging" + depends on BFQ_GROUP_IOSCHED + ---help--- + Enable some debugging help. Currently it exports additional stat + files in a cgroup which can be useful for debugging. 
+ endmenu endif diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index d84302445e30..0f6cd688924f 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -15,8 +15,7 @@ #include "bfq-iosched.h" -#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP) - +#ifdef CONFIG_BFQ_CGROUP_DEBUG static int bfq_stat_init(struct bfq_stat *stat, gfp_t gfp) { int ret; @@ -253,7 +252,7 @@ void bfqg_stats_update_completion(struct bfq_group *bfqg, u64 start_time_ns, io_start_time_ns - start_time_ns); } -#else /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */ +#else /* CONFIG_BFQ_CGROUP_DEBUG */ void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq, unsigned int op) { } @@ -267,7 +266,7 @@ void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { } void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { } void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { } -#endif /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */ +#endif /* CONFIG_BFQ_CGROUP_DEBUG */ #ifdef CONFIG_BFQ_GROUP_IOSCHED @@ -351,7 +350,7 @@ void bfqg_and_blkg_put(struct bfq_group *bfqg) /* @stats = 0 */ static void bfqg_stats_reset(struct bfqg_stats *stats) { -#ifdef CONFIG_DEBUG_BLK_CGROUP +#ifdef CONFIG_BFQ_CGROUP_DEBUG /* queued stats shouldn't be cleared */ blkg_rwstat_reset(&stats->merged); blkg_rwstat_reset(&stats->service_time); @@ -372,7 +371,7 @@ static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from) if (!to || !from) return; -#ifdef CONFIG_DEBUG_BLK_CGROUP +#ifdef CONFIG_BFQ_CGROUP_DEBUG /* queued stats shouldn't be cleared */ blkg_rwstat_add_aux(&to->merged, &from->merged); blkg_rwstat_add_aux(&to->service_time, &from->service_time); @@ -432,7 +431,7 @@ void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg) static void bfqg_stats_exit(struct bfqg_stats *stats) { -#ifdef CONFIG_DEBUG_BLK_CGROUP +#ifdef CONFIG_BFQ_CGROUP_DEBUG blkg_rwstat_exit(&stats->merged); 
blkg_rwstat_exit(&stats->service_time); blkg_rwstat_exit(&stats->wait_time); @@ -449,7 +448,7 @@ static void bfqg_stats_exit(struct bfqg_stats *stats) static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp) { -#ifdef CONFIG_DEBUG_BLK_CGROUP +#ifdef CONFIG_BFQ_CGROUP_DEBUG if (blkg_rwstat_init(&stats->merged, gfp) || blkg_rwstat_init(&stats->service_time, gfp) || blkg_rwstat_init(&stats->wait_time, gfp) || @@ -986,7 +985,7 @@ static ssize_t bfq_io_set_weight(struct kernfs_open_file *of, return ret ?: nbytes; } -#ifdef CONFIG_DEBUG_BLK_CGROUP +#ifdef CONFIG_BFQ_CGROUP_DEBUG static int bfqg_print_stat(struct seq_file *sf, void *v) { blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat, @@ -1109,7 +1108,7 @@ static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v) 0, false); return 0; } -#endif /* CONFIG_DEBUG_BLK_CGROUP */ +#endif /* CONFIG_BFQ_CGROUP_DEBUG */ struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node) { @@ -1157,7 +1156,7 @@ struct cftype bfq_blkcg_legacy_files[] = { .private = (unsigned long)&blkcg_policy_bfq, .seq_show = blkg_print_stat_ios, }, -#ifdef CONFIG_DEBUG_BLK_CGROUP +#ifdef CONFIG_BFQ_CGROUP_DEBUG { .name = "bfq.time", .private = offsetof(struct bfq_group, stats.time), @@ -1187,7 +1186,7 @@ struct cftype bfq_blkcg_legacy_files[] = { .private = offsetof(struct bfq_group, stats.queued), .seq_show = bfqg_print_rwstat, }, -#endif /* CONFIG_DEBUG_BLK_CGROUP */ +#endif /* CONFIG_BFQ_CGROUP_DEBUG */ /* the same statistics which cover the bfqg and its descendants */ { @@ -1200,7 +1199,7 @@ struct cftype bfq_blkcg_legacy_files[] = { .private = (unsigned long)&blkcg_policy_bfq, .seq_show = blkg_print_stat_ios_recursive, }, -#ifdef CONFIG_DEBUG_BLK_CGROUP +#ifdef CONFIG_BFQ_CGROUP_DEBUG { .name = "bfq.time_recursive", .private = offsetof(struct bfq_group, stats.time), @@ -1254,7 +1253,7 @@ struct cftype bfq_blkcg_legacy_files[] = { .private = offsetof(struct bfq_group, stats.dequeue), .seq_show = 
bfqg_print_stat, }, -#endif /* CONFIG_DEBUG_BLK_CGROUP */ +#endif /* CONFIG_BFQ_CGROUP_DEBUG */ { } /* terminate */ }; diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index a6bf842cbe16..44c6bbcd7720 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -4404,7 +4404,7 @@ exit: return rq; } -#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP) +#ifdef CONFIG_BFQ_CGROUP_DEBUG static void bfq_update_dispatch_stats(struct request_queue *q, struct request *rq, struct bfq_queue *in_serv_queue, @@ -4454,7 +4454,7 @@ static inline void bfq_update_dispatch_stats(struct request_queue *q, struct request *rq, struct bfq_queue *in_serv_queue, bool idle_timer_disabled) {} -#endif +#endif /* CONFIG_BFQ_CGROUP_DEBUG */ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx) { @@ -5008,7 +5008,7 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq) return idle_timer_disabled; } -#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP) +#ifdef CONFIG_BFQ_CGROUP_DEBUG static void bfq_update_insert_stats(struct request_queue *q, struct bfq_queue *bfqq, bool idle_timer_disabled, @@ -5038,7 +5038,7 @@ static inline void bfq_update_insert_stats(struct request_queue *q, struct bfq_queue *bfqq, bool idle_timer_disabled, unsigned int cmd_flags) {} -#endif +#endif /* CONFIG_BFQ_CGROUP_DEBUG */ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, bool at_head) diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index aef4fa0046b8..584d3c9ed8ba 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -783,7 +783,7 @@ struct bfq_stat { }; struct bfqg_stats { -#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP) +#ifdef CONFIG_BFQ_CGROUP_DEBUG /* number of ios merged */ struct blkg_rwstat merged; /* total time spent on device in ns, may not be accurate w/ queueing */ @@ -811,7 +811,7 @@ struct bfqg_stats { u64 start_idle_time; u64 start_empty_time; 
uint16_t flags; -#endif /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */ +#endif /* CONFIG_BFQ_CGROUP_DEBUG */ }; #ifdef CONFIG_BFQ_GROUP_IOSCHED diff --git a/init/Kconfig b/init/Kconfig index 0e2344389501..a41d8fbe09d8 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -799,14 +799,6 @@ config BLK_CGROUP See Documentation/cgroup-v1/blkio-controller.txt for more information. -config DEBUG_BLK_CGROUP - bool "IO controller debugging" - depends on BLK_CGROUP - default n - ---help--- - Enable some debugging help. Currently it exports additional stat - files in a cgroup which can be useful for debugging. - config CGROUP_WRITEBACK bool depends on MEMCG && BLK_CGROUP -- cgit v1.2.3 From 7e31d8215fd8cb1c13b47e23f1136545010e00de Mon Sep 17 00:00:00 2001 From: Akinobu Mita Date: Sun, 9 Jun 2019 23:17:02 +0900 Subject: Documentation: nvme: add an example for nvme fault injection This adds an example of how to inject errors into admin commands. Suggested-by: Thomas Tai Signed-off-by: Akinobu Mita Reviewed-by: Chaitanya Kulkarni Reviewed-by: Minwoo Im Reviewed-by: Christoph Hellwig Signed-off-by: Christoph Hellwig --- .../fault-injection/nvme-fault-injection.txt | 56 ++++++++++++++++++++++ 1 file changed, 56 insertions(+) (limited to 'Documentation') diff --git a/Documentation/fault-injection/nvme-fault-injection.txt b/Documentation/fault-injection/nvme-fault-injection.txt index 8fbf3bf60b62..efcb339a3add 100644 --- a/Documentation/fault-injection/nvme-fault-injection.txt +++ b/Documentation/fault-injection/nvme-fault-injection.txt @@ -114,3 +114,59 @@ R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000 cpu_startup_entry+0x6f/0x80 start_secondary+0x187/0x1e0 secondary_startup_64+0xa5/0xb0 + +Example 3: Inject an error into the 10th admin command +------------------------------------------------------ + +echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability +echo 10 > /sys/kernel/debug/nvme0/fault_inject/space +echo 1 > 
/sys/kernel/debug/nvme0/fault_inject/times +nvme reset /dev/nvme0 + +Expected Result: + +After NVMe controller reset, the reinitialization may or may not succeed. +It depends on which admin command is actually forced to fail. + +Message from dmesg: + +nvme nvme0: resetting controller +FAULT_INJECTION: forcing a failure. +name fault_inject, interval 1, probability 100, space 1, times 1 +CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2 +Hardware name: MSI MS-7A45/B150M MORTAR ARCTIC (MS-7A45), BIOS 1.50 04/25/2017 +Call Trace: + + dump_stack+0x63/0x85 + should_fail+0x14a/0x170 + nvme_should_fail+0x38/0x80 [nvme_core] + nvme_irq+0x129/0x280 [nvme] + ? blk_mq_end_request+0xb3/0x120 + __handle_irq_event_percpu+0x84/0x1a0 + handle_irq_event_percpu+0x32/0x80 + handle_irq_event+0x3b/0x60 + handle_edge_irq+0x7f/0x1a0 + handle_irq+0x20/0x30 + do_IRQ+0x4e/0xe0 + common_interrupt+0xf/0xf + +RIP: 0010:cpuidle_enter_state+0xc5/0x460 +Code: ff e8 8f 5f 86 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 69 03 00 00 31 ff e8 62 aa 8c ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 37 03 00 00 4c 8b 45 d0 4c 2b 45 b8 48 ba cf f7 53 +RSP: 0018:ffffffff88c03dd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdc +RAX: ffff9dac25a2ac80 RBX: ffffffff88d53760 RCX: 000000000000001f +RDX: 0000000000000000 RSI: 000000002d958403 RDI: 0000000000000000 +RBP: ffffffff88c03e18 R08: fffffff75e35ffb7 R09: 00000a49a56c0b48 +R10: ffffffff88c03da0 R11: 0000000000001b0c R12: ffff9dac25a34d00 +R13: 0000000000000006 R14: 0000000000000006 R15: ffffffff88d53760 + cpuidle_enter+0x2e/0x40 + call_cpuidle+0x23/0x40 + do_idle+0x201/0x280 + cpu_startup_entry+0x1d/0x20 + rest_init+0xaa/0xb0 + arch_call_rest_init+0xe/0x1b + start_kernel+0x51c/0x53b + x86_64_start_reservations+0x24/0x26 + x86_64_start_kernel+0x74/0x77 + secondary_startup_64+0xa4/0xb0 +nvme nvme0: Could not set queue count (16385) +nvme nvme0: IO queues not created -- cgit v1.2.3 From 152c7776b9442f2f094da7d81e5a8f345dedb397 Mon Sep 17 00:00:00 
2001 From: Bart Van Assche Date: Fri, 28 Jun 2019 13:07:42 -0700 Subject: block, documentation: Fix wbt_lat_usec documentation Fix the spelling of the wbt_lat_usec sysfs attribute. Fixes: 87760e5eef35 ("block: hook up writeback throttling") # v4.10. Reviewed-by: Martin K. Petersen Signed-off-by: Bart Van Assche Signed-off-by: Jens Axboe --- Documentation/block/queue-sysfs.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index 83b457e24bba..3eaf86806621 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -185,8 +185,8 @@ This is the number of bytes the device can write in a single write-same command. A value of '0' means write-same is not supported by this device. -wb_lat_usec (RW) ----------------- +wbt_lat_usec (RW) +----------------- If the device is registered for writeback throttling, then this file shows the target minimum read latency. If this latency is exceeded in a given window of time (see wb_window_usec), then the writeback throttling will start -- cgit v1.2.3 From 6728ac3396265184abe93f18b32aca329981e5ce Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Fri, 28 Jun 2019 13:07:43 -0700 Subject: block, documentation: Sort queue sysfs attribute names alphabetically Commit f9824952ee1c ("block: update sysfs documentation") # v5.0 broke the alphabetical order of the sysfs attribute names. List queue sysfs attribute names alphabetically. Cc: Damien Le Moal Reviewed-by: Martin K. 
Petersen Signed-off-by: Bart Van Assche Signed-off-by: Jens Axboe --- Documentation/block/queue-sysfs.txt | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index 3eaf86806621..f6da2efe2105 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -14,6 +14,15 @@ add_random (RW) This file allows to turn off the disk entropy contribution. Default value of this file is '1'(on). +chunk_sectors (RO) +------------------ +This has different meaning depending on the type of the block device. +For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors +of the RAID volume stripe segment. For a zoned block device, either host-aware +or host-managed, chunk_sectors indicates the size in 512B sectors of the zones +of the device, with the eventual exception of the last zone of the device which +may be smaller. + dax (RO) -------- This file indicates whether the device supports Direct Access (DAX), @@ -132,6 +141,12 @@ per-block-cgroup request pool. IOW, if there are N block cgroups, each request queue may have up to N request pools, each independently regulated by nr_requests. +nr_zones (RO) +------------- +For zoned block devices (zoned attribute indicating "host-managed" or +"host-aware"), this indicates the total number of zones of the device. +This is always 0 for regular block devices. + optimal_io_size (RO) -------------------- This is the optimal IO size reported by the device. @@ -213,19 +228,4 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC do not support zone commands, they will be treated as regular block devices and zoned will report "none". -nr_zones (RO) -------------- -For zoned block devices (zoned attribute indicating "host-managed" or -"host-aware"), this indicates the total number of zones of the device. 
-This is always 0 for regular block devices. - -chunk_sectors (RO) ------------------- -This has different meaning depending on the type of the block device. -For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors -of the RAID volume stripe segment. For a zoned block device, either host-aware -or host-managed, chunk_sectors indicates the size in 512B sectors of the zones -of the device, with the eventual exception of the last zone of the device which -may be smaller. - Jens Axboe , February 2009 -- cgit v1.2.3 From 0c766e78bda6d4edf40779fc0cd48d0867a04d84 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Fri, 28 Jun 2019 13:07:44 -0700 Subject: block, documentation: Explain the word 'segments' Several block layer users who are not kernel developers do not know that the word 'segment' refers to an element in a DMA scatter/gather list. Make the block layer documentation easier to understand by stating explicitly what the word 'segment' stands for. Reviewed-by: Martin K. Petersen Signed-off-by: Bart Van Assche Signed-off-by: Jens Axboe --- Documentation/block/queue-sysfs.txt | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index f6da2efe2105..1515dcf3dec4 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -98,8 +98,9 @@ This is the maximum number of kilobytes supported in a single data transfer. max_integrity_segments (RO) --------------------------- -When read, this file shows the max limit of integrity segments as -set by block layer which a hardware controller can handle. +Maximum number of elements in a DMA scatter/gather list with integrity +data that will be submitted by the block layer core to the associated +block driver. max_sectors_kb (RW) ------------------- @@ -109,11 +110,12 @@ size allowed by the hardware. 
max_segments (RO) ----------------- -Maximum number of segments of the device. +Maximum number of elements in a DMA scatter/gather list that is submitted +to the associated block driver. max_segment_size (RO) --------------------- -Maximum segment size of the device. +Maximum size in bytes of a single element in a DMA scatter/gather list. minimum_io_size (RO) -------------------- -- cgit v1.2.3 From fbbe7c86b483878da4a2ec7b899e0814195942af Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Fri, 28 Jun 2019 13:07:45 -0700 Subject: block, documentation: Document discard_zeroes_data, fua, max_discard_segments and write_zeroes_max_bytes Reviewed-by: Martin K. Petersen Signed-off-by: Bart Van Assche Signed-off-by: Jens Axboe --- Documentation/block/queue-sysfs.txt | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'Documentation') diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index 1515dcf3dec4..b40b5b7cebd9 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt @@ -52,6 +52,16 @@ large discards are issued, setting this value lower will make Linux issue smaller discards and potentially help reduce latencies induced by large discard operations. +discard_zeroes_data (RO) +------------------------ +Obsolete. Always zero. + +fua (RO) +-------- +Whether or not the block driver supports the FUA flag for write requests. +FUA stands for Force Unit Access. If the FUA flag is set that means that +write requests must bypass the volatile cache of the storage device. + hw_sector_size (RO) ------------------- This is the hardware sector size of the device, in bytes. @@ -92,6 +102,10 @@ logical_block_size (RO) ----------------------- This is the logical block size of the device, in bytes. +max_discard_segments (RO) +------------------------- +The maximum number of DMA scatter/gather entries in a discard request. 
+ max_hw_sectors_kb (RO) ---------------------- This is the maximum number of kilobytes supported in a single data transfer. @@ -218,6 +232,12 @@ blk-throttle makes decision based on the samplings. Lower time means cgroups have more smooth throughput, but higher CPU overhead. This exists only when CONFIG_BLK_DEV_THROTTLING_LOW is enabled. +write_zeroes_max_bytes (RO) +--------------------------- +For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of +bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES +is not supported. + zoned (RO) ---------- This indicates if the device is a zoned block device and the zone model of the -- cgit v1.2.3
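
Note on reading the queue attributes documented in the queue-sysfs.txt hunks above: a minimal sketch, assuming the attributes are exposed under /sys/block/<dev>/queue/. The helper names queue_attr and max_sg_bytes and the SYSFS_ROOT override are hypothetical and exist only for this illustration. The product max_segments * max_segment_size follows from the definitions given in the patch (elements per scatter/gather list times bytes per element) and is only an upper bound, since other limits such as max_sectors_kb also apply.

```shell
#!/bin/sh
# Hypothetical helper: print one queue attribute of a block device.
# SYSFS_ROOT is an illustrative override so the sketch can be tried
# against a scratch directory; it defaults to the real /sys.
queue_attr() {
    dev=$1
    attr=$2
    cat "${SYSFS_ROOT:-/sys}/block/$dev/queue/$attr"
}

# Hypothetical helper: upper bound, in bytes, on the data described by a
# single DMA scatter/gather list: max_segments * max_segment_size.
# Other queue limits (for example max_sectors_kb) may be lower.
max_sg_bytes() {
    segs=$(queue_attr "$1" max_segments) || return 1
    size=$(queue_attr "$1" max_segment_size) || return 1
    echo $((segs * size))
}
```

Usage, assuming a device named sda exists: `queue_attr sda fua` prints the FUA support flag, `queue_attr sda write_zeroes_max_bytes` prints 0 when REQ_OP_WRITE_ZEROES is not supported, and `max_sg_bytes sda` prints the computed bound.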