summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2021-06-30 12:12:56 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2021-06-30 12:12:56 -0700
commitdf668a5fe461bb9d7e899c538acc7197746038f4 (patch)
tree315a71104f5cea7feeb56c9f2c768453408b72f7 /Documentation
parentdf04fbe8680bfe07f3d7487eccff9f768bb02533 (diff)
parent2705dfb2094777e405e065105e307074af8965c1 (diff)
downloadlinux-df668a5fe461bb9d7e899c538acc7197746038f4.tar.bz2
Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe: - disk events cleanup (Christoph) - gendisk and request queue allocation simplifications (Christoph) - bdev_disk_changed cleanups (Christoph) - IO priority improvements (Bart) - Chained bio completion trace fix (Edward) - blk-wbt fixes (Jan) - blk-wbt enable/disable fix (Zhang) - Scheduler dispatch improvements (Jan, Ming) - Shared tagset scheduler improvements (John) - BFQ updates (Paolo, Luca, Pietro) - BFQ lock inversion fix (Jan) - Documentation improvements (Kir) - CLONE_IO block cgroup fix (Tejun) - Remove of ancient and deprecated block dump feature (zhangyi) - Discard merge fix (Ming) - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas, Yang) * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits) block: fix discard request merge block/mq-deadline: Remove a WARN_ON_ONCE() call blk-mq: update hctx->dispatch_busy in case of real scheduler blk: Fix lock inversion between ioc lock and bfqd lock bfq: Remove merged request already in bfq_requests_merged() block: pass a gendisk to bdev_disk_changed block: move bdev_disk_changed block: add the events* attributes to disk_attrs block: move the disk events code to a separate file block: fix trace completion for chained bio block/partitions/msdos: Fix typo inidicator -> indicator block, bfq: reset waker pointer with shared queues block, bfq: check waker only for queues with no in-flight I/O block, bfq: avoid delayed merge of async queues block, bfq: boost throughput by extending queue-merging times block, bfq: consider also creation time in delayed stable merge block, bfq: fix delayed stable merge check block, bfq: let also stably merged queues enjoy weight raising blk-wbt: make sure throttle is enabled properly blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/cgroup-v1/blkio-controller.rst155
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst55
-rw-r--r--Documentation/admin-guide/laptops/laptop-mode.rst11
-rw-r--r--Documentation/admin-guide/sysctl/vm.rst8
-rw-r--r--Documentation/block/bfq-iosched.rst38
-rw-r--r--Documentation/filesystems/locking.rst2
6 files changed, 163 insertions, 106 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
index 36d43ae7dc13..16253eda192e 100644
--- a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
+++ b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
@@ -17,36 +17,37 @@ level logical devices like device mapper.
HOWTO
=====
+
Throttling/Upper Limit policy
-----------------------------
-- Enable Block IO controller::
+Enable Block IO controller::
CONFIG_BLK_CGROUP=y
-- Enable throttling in block layer::
+Enable throttling in block layer::
CONFIG_BLK_DEV_THROTTLING=y
-- Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::
+Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
-- Specify a bandwidth rate on particular device for root group. The format
- for policy is "<major>:<minor> <bytes_per_second>"::
+Specify a bandwidth rate on particular device for root group. The format
+for policy is "<major>:<minor> <bytes_per_second>"::
echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
- Above will put a limit of 1MB/second on reads happening for root group
- on device having major/minor number 8:16.
+This will put a limit of 1MB/second on reads happening for root group
+on device having major/minor number 8:16.
-- Run dd to read a file and see if rate is throttled to 1MB/s or not::
+Run dd to read a file and see if rate is throttled to 1MB/s or not::
# dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
- Limits for writes can be put using blkio.throttle.write_bps_device file.
+Limits for writes can be put using blkio.throttle.write_bps_device file.
Hierarchical Cgroups
====================
@@ -79,85 +80,89 @@ following::
Various user visible config options
===================================
-CONFIG_BLK_CGROUP
- - Block IO controller.
-CONFIG_BFQ_CGROUP_DEBUG
- - Debug help. Right now some additional stats file show up in cgroup
+ CONFIG_BLK_CGROUP
+ Block IO controller.
+
+ CONFIG_BFQ_CGROUP_DEBUG
+ Debug help. Right now some additional stats file show up in cgroup
if this option is enabled.
-CONFIG_BLK_DEV_THROTTLING
- - Enable block device throttling support in block layer.
+ CONFIG_BLK_DEV_THROTTLING
+ Enable block device throttling support in block layer.
Details of cgroup files
=======================
+
Proportional weight policy files
--------------------------------
-- blkio.weight
- - Specifies per cgroup weight. This is default weight of the group
- on all the devices until and unless overridden by per device rule.
- (See blkio.weight_device).
- Currently allowed range of weights is from 10 to 1000.
-- blkio.weight_device
- - One can specify per cgroup per device rules using this interface.
- These rules override the default value of group weight as specified
- by blkio.weight.
+ blkio.bfq.weight
+ Specifies per cgroup weight. This is default weight of the group
+ on all the devices until and unless overridden by per device rule
+ (see `blkio.bfq.weight_device` below).
+
+ Currently allowed range of weights is from 1 to 1000. For more details,
+ see Documentation/block/bfq-iosched.rst.
+
+ blkio.bfq.weight_device
+ Specifes per cgroup per device weights, overriding the default group
+ weight. For more details, see Documentation/block/bfq-iosched.rst.
Following is the format::
- # echo dev_maj:dev_minor weight > blkio.weight_device
+ # echo dev_maj:dev_minor weight > blkio.bfq.weight_device
Configure weight=300 on /dev/sdb (8:16) in this cgroup::
- # echo 8:16 300 > blkio.weight_device
- # cat blkio.weight_device
+ # echo 8:16 300 > blkio.bfq.weight_device
+ # cat blkio.bfq.weight_device
dev weight
8:16 300
Configure weight=500 on /dev/sda (8:0) in this cgroup::
- # echo 8:0 500 > blkio.weight_device
- # cat blkio.weight_device
+ # echo 8:0 500 > blkio.bfq.weight_device
+ # cat blkio.bfq.weight_device
dev weight
8:0 500
8:16 300
Remove specific weight for /dev/sda in this cgroup::
- # echo 8:0 0 > blkio.weight_device
- # cat blkio.weight_device
+ # echo 8:0 0 > blkio.bfq.weight_device
+ # cat blkio.bfq.weight_device
dev weight
8:16 300
-- blkio.time
- - disk time allocated to cgroup per device in milliseconds. First
+ blkio.time
+ Disk time allocated to cgroup per device in milliseconds. First
two fields specify the major and minor number of the device and
third field specifies the disk time allocated to group in
milliseconds.
-- blkio.sectors
- - number of sectors transferred to/from disk by the group. First
+ blkio.sectors
+ Number of sectors transferred to/from disk by the group. First
two fields specify the major and minor number of the device and
third field specifies the number of sectors transferred by the
group to/from the device.
-- blkio.io_service_bytes
- - Number of bytes transferred to/from the disk by the group. These
+ blkio.io_service_bytes
+ Number of bytes transferred to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of bytes.
-- blkio.io_serviced
- - Number of IOs (bio) issued to the disk by the group. These
+ blkio.io_serviced
+ Number of IOs (bio) issued to the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of IOs.
-- blkio.io_service_time
- - Total amount of time between request dispatch and request completion
+ blkio.io_service_time
+ Total amount of time between request dispatch and request completion
for the IOs done by this cgroup. This is in nanoseconds to make it
meaningful for flash devices too. For devices with queue depth of 1,
this time represents the actual service time. When queue_depth > 1,
@@ -170,8 +175,8 @@ Proportional weight policy files
specifies the operation type and the fourth field specifies the
io_service_time in ns.
-- blkio.io_wait_time
- - Total amount of time the IOs for this cgroup spent waiting in the
+ blkio.io_wait_time
+ Total amount of time the IOs for this cgroup spent waiting in the
scheduler queues for service. This can be greater than the total time
elapsed since it is cumulative io_wait_time for all IOs. It is not a
measure of total time the cgroup spent waiting but rather a measure of
@@ -185,24 +190,24 @@ Proportional weight policy files
minor number of the device, third field specifies the operation type
and the fourth field specifies the io_wait_time in ns.
-- blkio.io_merged
- - Total number of bios/requests merged into requests belonging to this
+ blkio.io_merged
+ Total number of bios/requests merged into requests belonging to this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
-- blkio.io_queued
- - Total number of requests queued up at any given instant for this
+ blkio.io_queued
+ Total number of requests queued up at any given instant for this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
-- blkio.avg_queue_size
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.avg_queue_size
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
The average queue size for this cgroup over the entire time of this
cgroup's existence. Queue size samples are taken each time one of the
queues of this cgroup gets a timeslice.
-- blkio.group_wait_time
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.group_wait_time
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
This is the amount of time the cgroup had to wait since it became busy
(i.e., went from 0 to 1 request queued) to get a timeslice for one of
its queues. This is different from the io_wait_time which is the
@@ -212,8 +217,8 @@ Proportional weight policy files
will only report the group_wait_time accumulated till the last time it
got a timeslice and will not include the current delta.
-- blkio.empty_time
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.empty_time
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
This is the amount of time a cgroup spends without any pending
requests when not being served, i.e., it does not include any time
spent idling for one of the queues of the cgroup. This is in
@@ -221,8 +226,8 @@ Proportional weight policy files
the stat will only report the empty_time accumulated till the last
time it had a pending request and will not include the current delta.
-- blkio.idle_time
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.idle_time
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
This is the amount of time spent by the IO scheduler idling for a
given cgroup in anticipation of a better request than the existing ones
from other queues/cgroups. This is in nanoseconds. If this is read
@@ -230,60 +235,60 @@ Proportional weight policy files
idle_time accumulated till the last idle period and will not include
the current delta.
-- blkio.dequeue
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
+ blkio.dequeue
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
gives the statistics about how many a times a group was dequeued
from service tree of the device. First two fields specify the major
and minor number of the device and third field specifies the number
of times a group was dequeued from a particular device.
-- blkio.*_recursive
- - Recursive version of various stats. These files show the
+ blkio.*_recursive
+ Recursive version of various stats. These files show the
same information as their non-recursive counterparts but
include stats from all the descendant cgroups.
Throttling/Upper limit policy files
-----------------------------------
-- blkio.throttle.read_bps_device
- - Specifies upper limit on READ rate from the device. IO rate is
+ blkio.throttle.read_bps_device
+ Specifies upper limit on READ rate from the device. IO rate is
specified in bytes per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device
-- blkio.throttle.write_bps_device
- - Specifies upper limit on WRITE rate to the device. IO rate is
+ blkio.throttle.write_bps_device
+ Specifies upper limit on WRITE rate to the device. IO rate is
specified in bytes per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
-- blkio.throttle.read_iops_device
- - Specifies upper limit on READ rate from the device. IO rate is
+ blkio.throttle.read_iops_device
+ Specifies upper limit on READ rate from the device. IO rate is
specified in IO per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device
-- blkio.throttle.write_iops_device
- - Specifies upper limit on WRITE rate to the device. IO rate is
+ blkio.throttle.write_iops_device
+ Specifies upper limit on WRITE rate to the device. IO rate is
specified in io per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device
-Note: If both BW and IOPS rules are specified for a device, then IO is
- subjected to both the constraints.
+ Note: If both BW and IOPS rules are specified for a device, then IO is
+ subjected to both the constraints.
-- blkio.throttle.io_serviced
- - Number of IOs (bio) issued to the disk by the group. These
+ blkio.throttle.io_serviced
+ Number of IOs (bio) issued to the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of IOs.
-- blkio.throttle.io_service_bytes
- - Number of bytes transferred to/from the disk by the group. These
+ blkio.throttle.io_service_bytes
+ Number of bytes transferred to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
@@ -291,6 +296,6 @@ Note: If both BW and IOPS rules are specified for a device, then IO is
Common files among various policies
-----------------------------------
-- blkio.reset_stats
- - Writing an int to this file will result in resetting all the stats
+ blkio.reset_stats
+ Writing an int to this file will result in resetting all the stats
for that cgroup.
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index b1e81aa8598a..4e59925e6583 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -56,6 +56,7 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-3-3. IO Latency
5-3-3-1. How IO Latency Throttling Works
5-3-3-2. IO Latency Interface Files
+ 5-3-4. IO Priority
5-4. PID
5-4-1. PID Interface Files
5-5. Cpuset
@@ -1866,6 +1867,60 @@ IO Latency Interface Files
duration of time between evaluation events. Windows only elapse
with IO activity. Idle periods extend the most recent window.
+IO Priority
+~~~~~~~~~~~
+
+A single attribute controls the behavior of the I/O priority cgroup policy,
+namely the blkio.prio.class attribute. The following values are accepted for
+that attribute:
+
+ no-change
+ Do not modify the I/O priority class.
+
+ none-to-rt
+ For requests that do not have an I/O priority class (NONE),
+ change the I/O priority class into RT. Do not modify
+ the I/O priority class of other requests.
+
+ restrict-to-be
+ For requests that do not have an I/O priority class or that have I/O
+ priority class RT, change it into BE. Do not modify the I/O priority
+ class of requests that have priority class IDLE.
+
+ idle
+ Change the I/O priority class of all requests into IDLE, the lowest
+ I/O priority class.
+
+The following numerical values are associated with the I/O priority policies:
+
++-------------+---+
+| no-change | 0 |
++-------------+---+
+| none-to-rt | 1 |
++-------------+---+
+| rt-to-be | 2 |
++-------------+---+
+| all-to-idle | 3 |
++-------------+---+
+
+The numerical value that corresponds to each I/O priority class is as follows:
+
++-------------------------------+---+
+| IOPRIO_CLASS_NONE | 0 |
++-------------------------------+---+
+| IOPRIO_CLASS_RT (real-time) | 1 |
++-------------------------------+---+
+| IOPRIO_CLASS_BE (best effort) | 2 |
++-------------------------------+---+
+| IOPRIO_CLASS_IDLE | 3 |
++-------------------------------+---+
+
+The algorithm to set the I/O priority class for a request is as follows:
+
+- Translate the I/O priority class policy into a number.
+- Change the request I/O priority class into the maximum of the I/O priority
+ class policy number and the numerical I/O priority class.
+
PID
---
diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst
index c984c4262f2e..b61cc601d298 100644
--- a/Documentation/admin-guide/laptops/laptop-mode.rst
+++ b/Documentation/admin-guide/laptops/laptop-mode.rst
@@ -101,17 +101,6 @@ this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.
-If you want to find out which process caused the disk to spin up, you can
-gather information by setting the flag /proc/sys/vm/block_dump. When this flag
-is set, Linux reports all disk read and write operations that take place, and
-all block dirtyings done to files. This makes it possible to debug why a disk
-needs to spin up, and to increase battery life even more. The output of
-block_dump is written to the kernel output, and it can be retrieved using
-"dmesg". When you use block_dump and your kernel logging level also includes
-kernel debugging messages, you probably want to turn off klogd, otherwise
-the output of block_dump will be logged, causing disk activity that is not
-normally there.
-
Configuration
-------------
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 8387ad0b0b83..003d5cc3751b 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -25,7 +25,6 @@ files can be found in mm/swap.c.
Currently, these files are in /proc/sys/vm:
- admin_reserve_kbytes
-- block_dump
- compact_memory
- compaction_proactiveness
- compact_unevictable_allowed
@@ -106,13 +105,6 @@ On x86_64 this is about 128MB.
Changing this takes effect whenever an application requests memory.
-block_dump
-==========
-
-block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.
-
-
compact_memory
==============
diff --git a/Documentation/block/bfq-iosched.rst b/Documentation/block/bfq-iosched.rst
index 66c5a4e54130..df3a8a47f58c 100644
--- a/Documentation/block/bfq-iosched.rst
+++ b/Documentation/block/bfq-iosched.rst
@@ -553,20 +553,36 @@ throughput sustainable with bfq, because updating the blkio.bfq.*
stats is rather costly, especially for some of the stats enabled by
CONFIG_BFQ_CGROUP_DEBUG.
-Parameters to set
------------------
+Parameters
+----------
-For each group, there is only the following parameter to set.
+For each group, the following parameters can be set:
-weight (namely blkio.bfq.weight or io.bfq-weight): the weight of the
-group inside its parent. Available values: 1..1000 (default 100). The
-linear mapping between ioprio and weights, described at the beginning
-of the tunable section, is still valid, but all weights higher than
-IOPRIO_BE_NR*10 are mapped to ioprio 0.
+ weight
+ This specifies the default weight for the cgroup inside its parent.
+ Available values: 1..1000 (default: 100).
-Recall that, if low-latency is set, then BFQ automatically raises the
-weight of the queues associated with interactive and soft real-time
-applications. Unset this tunable if you need/want to control weights.
+ For cgroup v1, it is set by writing the value to `blkio.bfq.weight`.
+
+ For cgroup v2, it is set by writing the value to `io.bfq.weight`.
+ (with an optional prefix of `default` and a space).
+
+ The linear mapping between ioprio and weights, described at the beginning
+ of the tunable section, is still valid, but all weights higher than
+ IOPRIO_BE_NR*10 are mapped to ioprio 0.
+
+ Recall that, if low-latency is set, then BFQ automatically raises the
+ weight of the queues associated with interactive and soft real-time
+ applications. Unset this tunable if you need/want to control weights.
+
+ weight_device
+ This specifies a per-device weight for the cgroup. The syntax is
+ `minor:major weight`. A weight of `0` may be used to reset to the default
+ weight.
+
+ For cgroup v1, it is set by writing the value to `blkio.bfq.weight_device`.
+
+ For cgroup v2, the file name is `io.bfq.weight`.
[1]
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 1e894480115b..2183fd8cc350 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -480,7 +480,7 @@ prototypes::
locking rules:
======================= ===================
-ops bd_mutex
+ops open_mutex
======================= ===================
open: yes
release: yes