diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-06-28 16:53:05 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-06-28 16:53:05 -0700 |
commit | 233a806b00e31b3ab8d57a68f1aab40cf1e5eaea (patch) | |
tree | 871dc7c6c461e1fed85995096c24839e698e7f1d | |
parent | 122fa8c588316aacafe7e5a393bb3e875eaf5b25 (diff) | |
parent | 98cf4951842adbb03079dadedddf30b95e623cb0 (diff) | |
download | linux-233a806b00e31b3ab8d57a68f1aab40cf1e5eaea.tar.bz2 |
Merge tag 'docs-5.14' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This was a reasonably active cycle for documentation; this includes:
- Some kernel-doc cleanups. That script is still regex onslaught from
hell, but it has gotten a little better.
- Improvements to the checkpatch docs, which are also used by the
tool itself.
- A major update to the pathname lookup documentation.
- Elimination of :doc: markup, since our automarkup magic can create
references from filenames without all the extra noise.
- The flurry of Chinese translation activity continues.
Plus, of course, the usual collection of updates, typo fixes, and
warning fixes"
* tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits)
docs: path-lookup: use bare function() rather than literals
docs: path-lookup: update symlink description
docs: path-lookup: update get_link() ->follow_link description
docs: path-lookup: update WALK_GET, WALK_PUT desc
docs: path-lookup: no get_link()
docs: path-lookup: update i_op->put_link and cookie description
docs: path-lookup: i_op->follow_link replaced with i_op->get_link
docs: path-lookup: Add macro name to symlink limit description
docs: path-lookup: remove filename_mountpoint
docs: path-lookup: update do_last() part
docs: path-lookup: update path_mountpoint() part
docs: path-lookup: update path_to_nameidata() part
docs: path-lookup: update follow_managed() part
docs: Makefile: Use CONFIG_SHELL not SHELL
docs: Take a little noise out of the build process
docs: x86: avoid using ReST :doc:`foo` markup
docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup
docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup
docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup
...
147 files changed, 5720 insertions, 1076 deletions
diff --git a/Documentation/ABI/obsolete/sysfs-cpuidle b/Documentation/ABI/obsolete/sysfs-cpuidle index e398fb5e542f..972cc11d3434 100644 --- a/Documentation/ABI/obsolete/sysfs-cpuidle +++ b/Documentation/ABI/obsolete/sysfs-cpuidle @@ -6,4 +6,4 @@ Description: with the update that cpuidle governor can be changed at runtime in default, both current_governor and current_governor_ro co-exist under /sys/devices/system/cpu/cpuidle/ file, it's duplicate so make - current_governor_ro obselete. + current_governor_ro obsolete. diff --git a/Documentation/ABI/removed/sysfs-kernel-uids b/Documentation/ABI/removed/sysfs-kernel-uids index dc4463f190a7..85a90b86ce1e 100644 --- a/Documentation/ABI/removed/sysfs-kernel-uids +++ b/Documentation/ABI/removed/sysfs-kernel-uids @@ -5,7 +5,7 @@ Contact: Dhaval Giani <dhaval@linux.vnet.ibm.com> Description: The /sys/kernel/uids/<uid>/cpu_shares tunable is used to set the cpu bandwidth a user is allowed. This is a - propotional value. What that means is that if there + proportional value. What that means is that if there are two users logged in, each with an equal number of shares, then they will get equal CPU bandwidth. Another example would be, if User A has shares = 1024 and user diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus index 42599d9fa161..3066feae1d8d 100644 --- a/Documentation/ABI/stable/sysfs-bus-vmbus +++ b/Documentation/ABI/stable/sysfs-bus-vmbus @@ -61,7 +61,7 @@ Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin@microsoft.com> Description: Directory for per-channel information - NN is the VMBUS relid associtated with the channel. + NN is the VMBUS relid associated with the channel. What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu Date: September. 2017 diff --git a/Documentation/ABI/stable/sysfs-bus-xen-backend b/Documentation/ABI/stable/sysfs-bus-xen-backend index e8b60bd766f7..480a89edfa05 100644 --- a/Documentation/ABI/stable/sysfs-bus-xen-backend +++ b/Documentation/ABI/stable/sysfs-bus-xen-backend @@ -19,7 +19,7 @@ Date: April 2011 KernelVersion: 3.0 Contact: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Description: - The major:minor number (in hexidecimal) of the + The major:minor number (in hexadecimal) of the physical device providing the storage for this backend block device. diff --git a/Documentation/ABI/stable/sysfs-devices-system-cpu b/Documentation/ABI/stable/sysfs-devices-system-cpu index 33c133e2a631..516dafea03eb 100644 --- a/Documentation/ABI/stable/sysfs-devices-system-cpu +++ b/Documentation/ABI/stable/sysfs-devices-system-cpu @@ -23,3 +23,86 @@ Description: Default value for the Data Stream Control Register (DSCR) on here). If set by a process it will be inherited by child processes. Values: 64 bit unsigned integer (bit field) + +What: /sys/devices/system/cpu/cpuX/topology/physical_package_id +Description: physical package id of cpuX. Typically corresponds to a physical + socket number, but the actual value is architecture and platform + dependent. +Values: integer + +What: /sys/devices/system/cpu/cpuX/topology/die_id +Description: the CPU die ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. +Values: integer + +What: /sys/devices/system/cpu/cpuX/topology/core_id +Description: the CPU core ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. +Values: integer + +What: /sys/devices/system/cpu/cpuX/topology/book_id +Description: the book ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. it's only used on s390. +Values: integer + +What: /sys/devices/system/cpu/cpuX/topology/drawer_id +Description: the drawer ID of cpuX. Typically it is the hardware platform's + identifier (rather than the kernel's). The actual value is + architecture and platform dependent. it's only used on s390. +Values: integer + +What: /sys/devices/system/cpu/cpuX/topology/core_cpus +Description: internal kernel map of CPUs within the same core. + (deprecated name: "thread_siblings") +Values: hexadecimal bitmask. + +What: /sys/devices/system/cpu/cpuX/topology/core_cpus_list +Description: human-readable list of CPUs within the same core. + The format is like 0-3, 8-11, 14,17. + (deprecated name: "thread_siblings_list"). +Values: decimal list. + +What: /sys/devices/system/cpu/cpuX/topology/package_cpus +Description: internal kernel map of the CPUs sharing the same physical_package_id. + (deprecated name: "core_siblings"). +Values: hexadecimal bitmask. + +What: /sys/devices/system/cpu/cpuX/topology/package_cpus_list +Description: human-readable list of CPUs sharing the same physical_package_id. + The format is like 0-3, 8-11, 14,17. + (deprecated name: "core_siblings_list") +Values: decimal list. + +What: /sys/devices/system/cpu/cpuX/topology/die_cpus +Description: internal kernel map of CPUs within the same die. +Values: hexadecimal bitmask. + +What: /sys/devices/system/cpu/cpuX/topology/die_cpus_list +Description: human-readable list of CPUs within the same die. + The format is like 0-3, 8-11, 14,17. +Values: decimal list. + +What: /sys/devices/system/cpu/cpuX/topology/book_siblings +Description: internal kernel map of cpuX's hardware threads within the same + book_id. it's only used on s390. +Values: hexadecimal bitmask. + +What: /sys/devices/system/cpu/cpuX/topology/book_siblings_list +Description: human-readable list of cpuX's hardware threads within the same + book_id. + The format is like 0-3, 8-11, 14,17. it's only used on s390. +Values: decimal list. + +What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings +Description: internal kernel map of cpuX's hardware threads within the same + drawer_id. it's only used on s390. +Values: hexadecimal bitmask. + +What: /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list +Description: human-readable list of cpuX's hardware threads within the same + drawer_id. + The format is like 0-3, 8-11, 14,17. it's only used on s390. +Values: decimal list. diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index 55285c136cf0..d431e2d00472 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -173,7 +173,7 @@ What: /sys/bus/dsa/devices/wq<m>.<n>/priority Date: Oct 25, 2019 KernelVersion: 5.6.0 Contact: dmaengine@vger.kernel.org -Description: The priority value of this work queue, it is a vlue relative to +Description: The priority value of this work queue, it is a value relative to other work queue in the same group to control quality of service for dispatching work from multiple workqueues in the same group. diff --git a/Documentation/ABI/stable/sysfs-driver-mlxreg-io b/Documentation/ABI/stable/sysfs-driver-mlxreg-io index fd9a8045bb0c..b2553df2e786 100644 --- a/Documentation/ABI/stable/sysfs-driver-mlxreg-io +++ b/Documentation/ABI/stable/sysfs-driver-mlxreg-io @@ -137,7 +137,7 @@ Contact: Vadim Pasternak <vadimpmellanox.com> Description: These files show the system reset cause, as following: COMEX thermal shutdown; wathchdog power off or reset was derived by one of the next components: COMEX, switch board or by Small Form - Factor mezzanine, reset requested from ASIC, reset cuased by BIOS + Factor mezzanine, reset requested from ASIC, reset caused by BIOS reload. Value 1 in file means this is reset cause, 0 - otherwise. Only one of the above causes could be 1 at the same time, representing only last reset cause. @@ -183,7 +183,7 @@ What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/vpd_wp Date: January 2020 KernelVersion: 5.6 Contact: Vadim Pasternak <vadimpmellanox.com> -Description: This file allows to overwrite system VPD hardware wrtie +Description: This file allows to overwrite system VPD hardware write protection when attribute is set 1. The file is read/write. diff --git a/Documentation/ABI/testing/configfs-iio b/Documentation/ABI/testing/configfs-iio index aebda53ec0f7..1637fcb50f56 100644 --- a/Documentation/ABI/testing/configfs-iio +++ b/Documentation/ABI/testing/configfs-iio @@ -31,4 +31,4 @@ Date: April 2016 KernelVersion: 4.7 Description: Dummy IIO devices directory. Creating a directory here will result - in creating a dummy IIO device in the IIO subystem. + in creating a dummy IIO device in the IIO subsystem. diff --git a/Documentation/ABI/testing/configfs-most b/Documentation/ABI/testing/configfs-most index bc6b8bd18da4..0a4b8649aa5a 100644 --- a/Documentation/ABI/testing/configfs-most +++ b/Documentation/ABI/testing/configfs-most @@ -20,7 +20,7 @@ Description: subbuffer_size configure the sub-buffer size for this channel - (needed for synchronous and isochrnous data) + (needed for synchronous and isochronous data) num_buffers @@ -75,7 +75,7 @@ Description: subbuffer_size configure the sub-buffer size for this channel - (needed for synchronous and isochrnous data) + (needed for synchronous and isochronous data) num_buffers @@ -130,7 +130,7 @@ Description: subbuffer_size configure the sub-buffer size for this channel - (needed for synchronous and isochrnous data) + (needed for synchronous and isochronous data) num_buffers @@ -196,7 +196,7 @@ Description: subbuffer_size configure the sub-buffer size for this channel - (needed for synchronous and isochrnous data) + (needed for synchronous and isochronous data) num_buffers diff --git a/Documentation/ABI/testing/configfs-usb-gadget b/Documentation/ABI/testing/configfs-usb-gadget index dc351e9af80a..b7943aa7e997 100644 --- a/Documentation/ABI/testing/configfs-usb-gadget +++ b/Documentation/ABI/testing/configfs-usb-gadget @@ -137,7 +137,7 @@ Description: This group contains "OS String" extension handling attributes. ============= =============================================== - use flag turning "OS Desctiptors" support on/off + use flag turning "OS Descriptors" support on/off b_vendor_code one-byte value used for custom per-device and per-interface requests qw_sign an identifier to be reported as "OS String" diff --git a/Documentation/ABI/testing/configfs-usb-gadget-uvc b/Documentation/ABI/testing/configfs-usb-gadget-uvc index ac5e11af79a8..889ed45be4ca 100644 --- a/Documentation/ABI/testing/configfs-usb-gadget-uvc +++ b/Documentation/ABI/testing/configfs-usb-gadget-uvc @@ -170,7 +170,7 @@ Description: Default color matching descriptors bMatrixCoefficients matrix used to compute luma and chroma values from the color primaries bTransferCharacteristics optoelectronic transfer - characteristic of the source picutre, + characteristic of the source picture, also called the gamma function bColorPrimaries color primaries and the reference white @@ -311,7 +311,7 @@ Description: Specific streaming header descriptors a hardware trigger interrupt event bTriggerSupport flag specifying if hardware triggering is supported - bStillCaptureMethod method of still image caputre + bStillCaptureMethod method of still image capture supported bTerminalLink id of the output terminal to which the video endpoint of this interface diff --git a/Documentation/ABI/testing/debugfs-driver-genwqe b/Documentation/ABI/testing/debugfs-driver-genwqe index 1c2f25674e8c..b45b016545d8 100644 --- a/Documentation/ABI/testing/debugfs-driver-genwqe +++ b/Documentation/ABI/testing/debugfs-driver-genwqe @@ -31,7 +31,7 @@ What: /sys/kernel/debug/genwqe/genwqe<n>_card/prev_regs Date: Oct 2013 Contact: haver@linux.vnet.ibm.com Description: Dump of the error registers before the last reset of - the card occured. + the card occurred. Only available for PF. What: /sys/kernel/debug/genwqe/genwqe<n>_card/prev_dbg_uid0 diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs index c78fc9282876..e89c6351503c 100644 --- a/Documentation/ABI/testing/debugfs-driver-habanalabs +++ b/Documentation/ABI/testing/debugfs-driver-habanalabs @@ -153,7 +153,7 @@ KernelVersion: 5.1 Contact: ogabbay@kernel.org Description: Triggers an I2C transaction that is generated by the device's CPU. Writing to this file generates a write transaction while - reading from the file generates a read transcation + reading from the file generates a read transaction What: /sys/kernel/debug/habanalabs/hl<n>/i2c_reg Date: Jan 2019 diff --git a/Documentation/ABI/testing/sysfs-bus-fsi b/Documentation/ABI/testing/sysfs-bus-fsi index d148214181a1..76e0caa0c2b3 100644 --- a/Documentation/ABI/testing/sysfs-bus-fsi +++ b/Documentation/ABI/testing/sysfs-bus-fsi @@ -12,7 +12,7 @@ KernelVersion: 4.12 Contact: linux-fsi@lists.ozlabs.org Description: Sends an FSI BREAK command on a master's communication - link to any connnected slaves. A BREAK resets connected + link to any connected slaves. A BREAK resets connected device's logic and preps it to receive further commands from the master. diff --git a/Documentation/ABI/testing/sysfs-bus-iio b/Documentation/ABI/testing/sysfs-bus-iio index 267973541e72..6f98b6a9b785 100644 --- a/Documentation/ABI/testing/sysfs-bus-iio +++ b/Documentation/ABI/testing/sysfs-bus-iio @@ -786,7 +786,7 @@ What: /sys/.../events/in_capacitanceY_adaptive_thresh_rising_en What: /sys/.../events/in_capacitanceY_adaptive_thresh_falling_en KernelVersion: 5.13 Contact: linux-iio@vger.kernel.org -Descrption: +Description: Adaptive thresholds are similar to normal fixed thresholds but the value is expressed as an offset from a value which provides a low frequency approximation of the channel itself. @@ -798,10 +798,10 @@ What: /sys/.../in_capacitanceY_adaptive_thresh_rising_timeout What: /sys/.../in_capacitanceY_adaptive_thresh_falling_timeout KernelVersion: 5.11 Contact: linux-iio@vger.kernel.org -Descrption: +Description: When adaptive thresholds are used, the tracking signal may adjust too slowly to step changes in the raw signal. - *_timeout (in seconds) specifies a time for which the + Thus these specify the time in seconds for which the difference between the slow tracking signal and the raw signal is allowed to remain out-of-range before a reset event occurs in which the tracking signal is made equal diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index ef00fada2efb..793cbb76cd25 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -139,8 +139,8 @@ Description: binary file containing the Vital Product Data for the device. It should follow the VPD format defined in PCI Specification 2.1 or 2.2, but users should consider - that some devices may have malformatted data. If the - underlying VPD has a writable section then the + that some devices may have incorrectly formatted data. + If the underlying VPD has a writable section then the corresponding section of this file will be writable. What: /sys/bus/pci/devices/.../virtfnN diff --git a/Documentation/ABI/testing/sysfs-class-backlight b/Documentation/ABI/testing/sysfs-class-backlight index 1fc86401bf95..c453646b06e2 100644 --- a/Documentation/ABI/testing/sysfs-class-backlight +++ b/Documentation/ABI/testing/sysfs-class-backlight @@ -84,3 +84,103 @@ Description: It can be enabled by writing the value stored in /sys/class/backlight/<backlight>/max_brightness to /sys/class/backlight/<backlight>/brightness. + +What: /sys/class/backlight/<backlight>/<ambient light zone>_max +Date: Sep, 2009 +KernelVersion: v2.6.32 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the maximum brightness for <ambient light zone> + on this <backlight>. Values are between 0 and 127. This file + will also show the brightness level stored for this + <ambient light zone>. + + The <ambient light zone> is device-driver specific: + + For ADP5520 and ADP5501, <ambient light zone> can be: + + =========== ================================================ + Ambient sysfs entry + light zone + =========== ================================================ + daylight /sys/class/backlight/<backlight>/daylight_max + office /sys/class/backlight/<backlight>/office_max + dark /sys/class/backlight/<backlight>/dark_max + =========== ================================================ + + For ADP8860, <ambient light zone> can be: + + =========== ================================================ + Ambient sysfs entry + light zone + =========== ================================================ + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_max + l2_office /sys/class/backlight/<backlight>/l2_office_max + l3_dark /sys/class/backlight/<backlight>/l3_dark_max + =========== ================================================ + + For ADP8870, <ambient light zone> can be: + + =========== ================================================ + Ambient sysfs entry + light zone + =========== ================================================ + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_max + l2_bright /sys/class/backlight/<backlight>/l2_bright_max + l3_office /sys/class/backlight/<backlight>/l3_office_max + l4_indoor /sys/class/backlight/<backlight>/l4_indoor_max + l5_dark /sys/class/backlight/<backlight>/l5_dark_max + =========== ================================================ + + See also: /sys/class/backlight/<backlight>/ambient_light_zone. + +What: /sys/class/backlight/<backlight>/<ambient light zone>_dim +Date: Sep, 2009 +KernelVersion: v2.6.32 +Contact: device-drivers-devel@blackfin.uclinux.org +Description: + Control the dim brightness for <ambient light zone> + on this <backlight>. Values are between 0 and 127, typically + set to 0. Full off when the backlight is disabled. + This file will also show the dim brightness level stored for + this <ambient light zone>. + + The <ambient light zone> is device-driver specific: + + For ADP5520 and ADP5501, <ambient light zone> can be: + + =========== ================================================ + Ambient sysfs entry + light zone + =========== ================================================ + daylight /sys/class/backlight/<backlight>/daylight_dim + office /sys/class/backlight/<backlight>/office_dim + dark /sys/class/backlight/<backlight>/dark_dim + =========== ================================================ + + For ADP8860, <ambient light zone> can be: + + =========== ================================================ + Ambient sysfs entry + light zone + =========== ================================================ + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_dim + l2_office /sys/class/backlight/<backlight>/l2_office_dim + l3_dark /sys/class/backlight/<backlight>/l3_dark_dim + =========== ================================================ + + For ADP8870, <ambient light zone> can be: + + =========== ================================================ + Ambient sysfs entry + light zone + =========== ================================================ + l1_daylight /sys/class/backlight/<backlight>/l1_daylight_dim + l2_bright /sys/class/backlight/<backlight>/l2_bright_dim + l3_office /sys/class/backlight/<backlight>/l3_office_dim + l4_indoor /sys/class/backlight/<backlight>/l4_indoor_dim + l5_dark /sys/class/backlight/<backlight>/l5_dark_dim + =========== ================================================ + + See also: /sys/class/backlight/<backlight>/ambient_light_zone. + diff --git a/Documentation/ABI/testing/sysfs-class-backlight-adp5520 b/Documentation/ABI/testing/sysfs-class-backlight-adp5520 deleted file mode 100644 index 34b6ebafa210..000000000000 --- a/Documentation/ABI/testing/sysfs-class-backlight-adp5520 +++ /dev/null @@ -1,31 +0,0 @@ -sysfs interface for analog devices adp5520(01) backlight driver ---------------------------------------------------------------- - -The backlight brightness control operates at three different levels for the -adp5520 and adp5501 devices: daylight (level 1), office (level 2) and dark -(level 3). By default the brightness operates at the daylight brightness level. - -What: /sys/class/backlight/<backlight>/daylight_max -What: /sys/class/backlight/<backlight>/office_max -What: /sys/class/backlight/<backlight>/dark_max -Date: Sep, 2009 -KernelVersion: v2.6.32 -Contact: Michael Hennerich <michael.hennerich@analog.com> -Description: - (RW) Maximum current setting for the backlight when brightness - is at one of the three levels (daylight, office or dark). This - is an input code between 0 and 127, which is transformed to a - value between 0 mA and 30 mA using linear or non-linear - algorithms. - -What: /sys/class/backlight/<backlight>/daylight_dim -What: /sys/class/backlight/<backlight>/office_dim -What: /sys/class/backlight/<backlight>/dark_dim -Date: Sep, 2009 -KernelVersion: v2.6.32 -Contact: Michael Hennerich <michael.hennerich@analog.com> -Description: - (RW) Dim current setting for the backlight when brightness is at - one of the three levels (daylight, office or dark). This is an - input code between 0 and 127, which is transformed to a value - between 0 mA and 30 mA using linear or non-linear algorithms. diff --git a/Documentation/ABI/testing/sysfs-class-backlight-adp8860 b/Documentation/ABI/testing/sysfs-class-backlight-adp8860 deleted file mode 100644 index 6610ac73f9ba..000000000000 --- a/Documentation/ABI/testing/sysfs-class-backlight-adp8860 +++ /dev/null @@ -1,37 +0,0 @@ -sysfs interface for analog devices adp8860 backlight driver ------------------------------------------------------------ - -The backlight brightness control operates at three different levels for the -adp8860, adp8861 and adp8863 devices: daylight (level 1), office (level 2) and -dark (level 3). By default the brightness operates at the daylight brightness -level. - -See also /sys/class/backlight/<backlight>/ambient_light_level and -/sys/class/backlight/<backlight>/ambient_light_zone. - - -What: /sys/class/backlight/<backlight>/l1_daylight_max -What: /sys/class/backlight/<backlight>/l2_office_max -What: /sys/class/backlight/<backlight>/l3_dark_max -Date: Apr, 2010 -KernelVersion: v2.6.35 -Contact: Michael Hennerich <michael.hennerich@analog.com> -Description: - (RW) Maximum current setting for the backlight when brightness - is at one of the three levels (daylight, office or dark). This - is an input code between 0 and 127, which is transformed to a - value between 0 mA and 30 mA using linear or non-linear - algorithms. - - -What: /sys/class/backlight/<backlight>/l1_daylight_dim -What: /sys/class/backlight/<backlight>/l2_office_dim -What: /sys/class/backlight/<backlight>/l3_dark_dim -Date: Apr, 2010 -KernelVersion: v2.6.35 -Contact: Michael Hennerich <michael.hennerich@analog.com> -Description: - (RW) Dim current setting for the backlight when brightness is at - one of the three levels (daylight, office or dark). This is an - input code between 0 and 127, which is transformed to a value - between 0 mA and 30 mA using linear or non-linear algorithms. diff --git a/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 b/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 deleted file mode 100644 index b08ca912cad4..000000000000 --- a/Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870 +++ /dev/null @@ -1,32 +0,0 @@ -See also /sys/class/backlight/<backlight>/ambient_light_level and -/sys/class/backlight/<backlight>/ambient_light_zone. - -What: /sys/class/backlight/<backlight>/<ambient light zone>_max -What: /sys/class/backlight/<backlight>/l1_daylight_max -What: /sys/class/backlight/<backlight>/l2_bright_max -What: /sys/class/backlight/<backlight>/l3_office_max -What: /sys/class/backlight/<backlight>/l4_indoor_max -What: /sys/class/backlight/<backlight>/l5_dark_max -Date: May 2011 -KernelVersion: 3.0 -Contact: device-drivers-devel@blackfin.uclinux.org -Description: - Control the maximum brightness for <ambient light zone> - on this <backlight>. Values are between 0 and 127. This file - will also show the brightness level stored for this - <ambient light zone>. - -What: /sys/class/backlight/<backlight>/<ambient light zone>_dim -What: /sys/class/backlight/<backlight>/l2_bright_dim -What: /sys/class/backlight/<backlight>/l3_office_dim -What: /sys/class/backlight/<backlight>/l4_indoor_dim -What: /sys/class/backlight/<backlight>/l5_dark_dim -Date: May 2011 -KernelVersion: 3.0 -Contact: device-drivers-devel@blackfin.uclinux.org -Description: - Control the dim brightness for <ambient light zone> - on this <backlight>. Values are between 0 and 127, typically - set to 0. Full off when the backlight is disabled. - This file will also show the dim brightness level stored for - this <ambient light zone>. diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-el15203000 b/Documentation/ABI/testing/sysfs-class-led-driver-el15203000 deleted file mode 100644 index 04f3ffdc5936..000000000000 --- a/Documentation/ABI/testing/sysfs-class-led-driver-el15203000 +++ /dev/null @@ -1,9 +0,0 @@ -What: /sys/class/leds/<led>/repeat -Date: September 2019 -KernelVersion: 5.5 -Description: - EL15203000 supports only indefinitely patterns, - so this file should always store -1. - - For more info, please see: - Documentation/ABI/testing/sysfs-class-led-trigger-pattern diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern index d91a07767adf..8c57d2780554 100644 --- a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern +++ b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern @@ -35,3 +35,6 @@ Description: This file will always return the originally written repeat number. + + It should be noticed that some leds, like EL15203000 may + only support indefinitely patterns, so they always store -1. diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index fe13baa53c59..160b10c029c0 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -50,7 +50,7 @@ Description: Dynamic addition and removal of CPU's. This is not hotplug architecture specific. release: writes to this file dynamically remove a CPU from - the system. Information writtento the file to remove CPU's + the system. Information written to the file to remove CPU's is architecture specific. What: /sys/devices/system/cpu/cpu#/node @@ -97,7 +97,7 @@ Description: CPU topology files that describe a logical CPU's relationship corresponds to a physical socket number, but the actual value is architecture and platform dependent. - thread_siblings: internel kernel map of cpu#'s hardware + thread_siblings: internal kernel map of cpu#'s hardware threads within the same core as cpu# thread_siblings_list: human-readable list of cpu#'s hardware @@ -280,7 +280,7 @@ Description: Disable L3 cache indices on a processor with this functionality will return the currently disabled index for that node. There is one L3 structure per node, or per internal node on MCM machines. Writing a valid - index to one of these files will cause the specificed cache + index to one of these files will cause the specified cache index to be disabled. All AMD processors with L3 caches provide this functionality. @@ -295,7 +295,7 @@ Description: Processor frequency boosting control This switch controls the boost setting for the whole system. Boosting allows the CPU and the firmware to run at a frequency - beyound it's nominal limit. + beyond it's nominal limit. More details can be found in Documentation/admin-guide/pm/cpufreq.rst @@ -532,7 +532,7 @@ What: /sys/devices/system/cpu/smt /sys/devices/system/cpu/smt/control Date: June 2018 Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org> -Description: Control Symetric Multi Threading (SMT) +Description: Control Symmetric Multi Threading (SMT) active: Tells whether SMT is active (enabled and siblings online) diff --git a/Documentation/ABI/testing/sysfs-driver-ufs b/Documentation/ABI/testing/sysfs-driver-ufs index d1bc23cb6a9d..eaac6898f0c0 100644 --- a/Documentation/ABI/testing/sysfs-driver-ufs +++ b/Documentation/ABI/testing/sysfs-driver-ufs @@ -168,7 +168,7 @@ Description: This file shows the manufacturing date in BCD format. What: /sys/bus/platform/drivers/ufshcd/*/device_descriptor/manufacturer_id Date: February 2018 Contact: Stanislav Nijnikov <stanislav.nijnikov@wdc.com> -Description: This file shows the manufacturee ID. This is one of the +Description: This file shows the manufacturer ID. This is one of the UFS device descriptor parameters. The full information about the descriptor could be found at UFS specifications 2.1. @@ -521,7 +521,7 @@ Description: This file shows maximum VCC, VCCQ and VCCQ2 value for What: /sys/bus/platform/drivers/ufshcd/*/string_descriptors/manufacturer_name Date: February 2018 Contact: Stanislav Nijnikov <stanislav.nijnikov@wdc.com> -Description: This file contains a device manufactureer name string. +Description: This file contains a device manufacturer name string. The full information about the descriptor could be found at UFS specifications 2.1. diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 4849b8e84e42..5d9ae27bd462 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -238,7 +238,7 @@ Description: Shows current reserved blocks in system, it may be temporarily What: /sys/fs/f2fs/<disk>/gc_urgent Date: August 2017 Contact: "Jaegeuk Kim" <jaegeuk@kernel.org> -Description: Do background GC agressively when set. When gc_urgent = 1, +Description: Do background GC aggressively when set. When gc_urgent = 1, background thread starts to do GC by given gc_urgent_sleep_time interval. When gc_urgent = 2, F2FS will lower the bar of checking idle in order to process outstanding discard commands diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups index 0fedbb0f94e4..eae2f1c1e11e 100644 --- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups +++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups @@ -25,14 +25,10 @@ Description: /sys/kernel/iommu_groups/reserved_regions list IOVA the base IOVA, the second is the end IOVA and the third field describes the type of the region. -What: /sys/kernel/iommu_groups/reserved_regions -Date: June 2019 -KernelVersion: v5.3 -Contact: Eric Auger <eric.auger@redhat.com> -Description: In case an RMRR is used only by graphics or USB devices - it is now exposed as "direct-relaxable" instead of "direct". - In device assignment use case, for instance, those RMRR - are considered to be relaxable and safe. + Since kernel 5.3, in case an RMRR is used only by graphics or + USB devices it is now exposed as "direct-relaxable" instead + of "direct". In device assignment use case, for instance, + those RMRR are considered to be relaxable and safe. What: /sys/kernel/iommu_groups/<grp_id>/type Date: November 2020 diff --git a/Documentation/Makefile b/Documentation/Makefile index 9c42dde97671..c3feb657b654 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -76,7 +76,7 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4) PYTHONDONTWRITEBYTECODE=1 \ BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \ $(PYTHON3) $(srctree)/scripts/jobserver-exec \ - $(SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \ + $(CONFIG_SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \ $(SPHINXBUILD) \ -b $2 \ -c $(abspath $(srctree)/$(src)) \ diff --git a/Documentation/PCI/acpi-info.rst b/Documentation/PCI/acpi-info.rst index 060217081c79..34c64a5a66ec 100644 --- a/Documentation/PCI/acpi-info.rst +++ b/Documentation/PCI/acpi-info.rst @@ -22,9 +22,9 @@ or if the device has INTx interrupts connected by platform interrupt controllers and a _PRT is needed to describe those connections. ACPI resource description is done via _CRS objects of devices in the ACPI -namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read +namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read _CRS and figure out what resource is being consumed even if it doesn't have -a driver for the device [3]. That's important because it means an old OS +a driver for the device [3]. That's important because it means an old OS can work correctly even on a system with new devices unknown to the OS. The new devices might not do anything, but the OS can at least make sure no resources conflict with them. @@ -41,15 +41,15 @@ ACPI, that device will have a specific _HID/_CID that tells the OS what driver to bind to it, and the _CRS tells the OS and the driver where the device's registers are. -PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should -describe all the address space they consume. This includes all the windows +PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should +describe all the address space they consume. This includes all the windows they forward down to the PCI bus, as well as registers of the host bridge -itself that are not forwarded to PCI. The host bridge registers include +itself that are not forwarded to PCI. The host bridge registers include things like secondary/subordinate bus registers that determine the bus range below the bridge, window registers that describe the apertures, etc. These are all device-specific, non-architected things, so the only way a PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which contain -the device-specific details. The host bridge registers also include ECAM +the device-specific details. The host bridge registers also include ECAM space, since it is consumed by the host bridge. ACPI defines a Consumer/Producer bit to distinguish the bridge registers @@ -66,7 +66,7 @@ the PNP0A03/PNP0A08 device itself. The workaround was to describe the bridge registers (including ECAM space) in PNP0C02 catch-all devices [6]. With the exception of ECAM, the bridge register space is device-specific anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to -know about it. +know about it. New architectures should be able to use "Consumer" Extended Address Space descriptors in the PNP0A03 device for bridge registers, including ECAM, @@ -75,9 +75,9 @@ ia64 kernels assume all address space descriptors, including "Consumer" Extended Address Space ones, are windows, so it would not be safe to describe bridge registers this way on those architectures. -PNP0C02 "motherboard" devices are basically a catch-all. There's no +PNP0C02 "motherboard" devices are basically a catch-all. There's no programming model for them other than "don't use these resources for -anything else." So a PNP0C02 _CRS should claim any address space that is +anything else." So a PNP0C02 _CRS should claim any address space that is (1) not claimed by _CRS under any other device object in the ACPI namespace and (2) should not be assigned by the OS to something else. diff --git a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst index 696f8eeb4738..db609b97ad58 100644 --- a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst +++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst @@ -125,4 +125,4 @@ all the EPF devices are created and linked with the EPC device. | interrupt_pin | function -[1] :doc:`pci-endpoint` +[1] Documentation/PCI/endpoint/pci-endpoint.rst diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst index 814b40f8360b..fa651e25d98c 100644 --- a/Documentation/PCI/pci.rst +++ b/Documentation/PCI/pci.rst @@ -265,7 +265,7 @@ Set the DMA mask size --------------------- .. note:: If anything below doesn't make sense, please refer to - :doc:`/core-api/dma-api`. This section is just a reminder that + Documentation/core-api/dma-api.rst. This section is just a reminder that drivers need to indicate DMA capabilities of the device and is not an authoritative source for DMA interfaces. @@ -291,7 +291,7 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are Setup shared control data ------------------------- Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared) -memory. See :doc:`/core-api/dma-api` for a full description of +memory. See Documentation/core-api/dma-api.rst for a full description of the DMA APIs. This section is just a reminder that it needs to be done before enabling DMA on the device. @@ -421,7 +421,7 @@ owners if there is one. Then clean up "consistent" buffers which contain the control data. -See :doc:`/core-api/dma-api` for details on unmapping interfaces. +See Documentation/core-api/dma-api.rst for details on unmapping interfaces. Unregister from other subsystems diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst index b90dafcc8237..8632a1db36e4 100644 --- a/Documentation/admin-guide/cputopology.rst +++ b/Documentation/admin-guide/cputopology.rst @@ -2,87 +2,10 @@ How CPU topology info is exported via sysfs =========================================== -Export CPU topology info via sysfs. Items (attributes) are similar -to /proc/cpuinfo output of some architectures. They reside in -/sys/devices/system/cpu/cpuX/topology/: - -physical_package_id: - - physical package id of cpuX. Typically corresponds to a physical - socket number, but the actual value is architecture and platform - dependent. - -die_id: - - the CPU die ID of cpuX. Typically it is the hardware platform's - identifier (rather than the kernel's). The actual value is - architecture and platform dependent. - -core_id: - - the CPU core ID of cpuX. Typically it is the hardware platform's - identifier (rather than the kernel's). The actual value is - architecture and platform dependent. - -book_id: - - the book ID of cpuX. Typically it is the hardware platform's - identifier (rather than the kernel's). The actual value is - architecture and platform dependent. - -drawer_id: - - the drawer ID of cpuX. Typically it is the hardware platform's - identifier (rather than the kernel's). The actual value is - architecture and platform dependent. - -core_cpus: - - internal kernel map of CPUs within the same core. - (deprecated name: "thread_siblings") - -core_cpus_list: - - human-readable list of CPUs within the same core. - (deprecated name: "thread_siblings_list"); - -package_cpus: - - internal kernel map of the CPUs sharing the same physical_package_id. - (deprecated name: "core_siblings") - -package_cpus_list: - - human-readable list of CPUs sharing the same physical_package_id. - (deprecated name: "core_siblings_list") - -die_cpus: - - internal kernel map of CPUs within the same die. - -die_cpus_list: - - human-readable list of CPUs within the same die. - -book_siblings: - - internal kernel map of cpuX's hardware threads within the same - book_id. - -book_siblings_list: - - human-readable list of cpuX's hardware threads within the same - book_id. - -drawer_siblings: - - internal kernel map of cpuX's hardware threads within the same - drawer_id. - -drawer_siblings_list: - - human-readable list of cpuX's hardware threads within the same - drawer_id. +CPU topology info is exported via sysfs. Items (attributes) are similar +to /proc/cpuinfo output of some architectures. They reside in +/sys/devices/system/cpu/cpuX/topology/. Please refer to the ABI file: +Documentation/ABI/stable/sysfs-devices-system-cpu. Architecture-neutral, drivers/base/topology.c, exports these attributes. However, the book and drawer related sysfs files will only be created if diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst index d2795ca6821e..4c559e08d11e 100644 --- a/Documentation/admin-guide/ext4.rst +++ b/Documentation/admin-guide/ext4.rst @@ -392,7 +392,7 @@ When mounting an ext4 filesystem, the following option are accepted: dax Use direct access (no page cache). See - Documentation/filesystems/dax.txt. Note that this option is + Documentation/filesystems/dax.rst. Note that this option is incompatible with data=journal. inlinecrypt diff --git a/Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst b/Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst index 3b1ce68d2456..966c9b3296ea 100644 --- a/Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst +++ b/Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst @@ -3,7 +3,8 @@ SRBDS - Special Register Buffer Data Sampling ============================================= -SRBDS is a hardware vulnerability that allows MDS :doc:`mds` techniques to +SRBDS is a hardware vulnerability that allows MDS +Documentation/admin-guide/hw-vuln/mds.rst techniques to infer values returned from special register accesses. Special register accesses are accesses to off core registers. According to Intel's evaluation, the special register reads that have a security expectation of privacy are diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst index 75a9dd98e76e..cb30ca3df27c 100644 --- a/Documentation/admin-guide/kdump/kdump.rst +++ b/Documentation/admin-guide/kdump/kdump.rst @@ -2,7 +2,7 @@ Documentation for Kdump - The kexec-based Crash Dumping Solution ================================================================ -This document includes overview, setup and installation, and analysis +This document includes overview, setup, installation, and analysis information. Overview @@ -13,9 +13,9 @@ dump of the system kernel's memory needs to be taken (for example, when the system panics). The system kernel's memory image is preserved across the reboot and is accessible to the dump-capture kernel. -You can use common commands, such as cp and scp, to copy the -memory image to a dump file on the local disk, or across the network to -a remote system. +You can use common commands, such as cp, scp or makedumpfile to copy +the memory image to a dump file on the local disk, or across the network +to a remote system. Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64, s390x, arm and arm64 architectures. @@ -26,13 +26,15 @@ the dump-capture kernel. This ensures that ongoing Direct Memory Access The kexec -p command loads the dump-capture kernel into this reserved memory. -On x86 machines, the first 640 KB of physical memory is needed to boot, -regardless of where the kernel loads. Therefore, kexec backs up this -region just before rebooting into the dump-capture kernel. +On x86 machines, the first 640 KB of physical memory is needed for boot, +regardless of where the kernel loads. For simpler handling, the whole +low 1M is reserved to avoid any later kernel or device driver writing +data into this area. Like this, the low 1M can be reused as system RAM +by kdump kernel without extra handling. -Similarly on PPC64 machines first 32KB of physical memory is needed for -booting regardless of where the kernel is loaded and to support 64K page -size kexec backs up the first 64KB memory. +On PPC64 machines first 32KB of physical memory is needed for booting +regardless of where the kernel is loaded and to support 64K page size +kexec backs up the first 64KB memory. For s390x, when kdump is triggered, the crashkernel region is exchanged with the region [0, crashkernel region size] and then the kdump kernel @@ -46,14 +48,14 @@ passed to the dump-capture kernel through the elfcorehdr= boot parameter. Optionally the size of the ELF header can also be passed when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax. - With the dump-capture kernel, you can access the memory image through /proc/vmcore. This exports the dump as an ELF-format file that you can -write out using file copy commands such as cp or scp. Further, you can -use analysis tools such as the GNU Debugger (GDB) and the Crash tool to -debug the dump file. This method ensures that the dump pages are correctly -ordered. - +write out using file copy commands such as cp or scp. You can also use +makedumpfile utility to analyze and write out filtered contents with +options, e.g with '-d 31' it will only write out kernel data. Further, +you can use analysis tools such as the GNU Debugger (GDB) and the Crash +tool to debug the dump file. This method ensures that the dump pages are +correctly ordered. Setup and Installation ====================== @@ -125,9 +127,18 @@ dump-capture kernels for enabling kdump support. System kernel config options ---------------------------- -1) Enable "kexec system call" in "Processor type and features.":: +1) Enable "kexec system call" or "kexec file based system call" in + "Processor type and features.":: + + CONFIG_KEXEC=y or CONFIG_KEXEC_FILE=y + + And both of them will select KEXEC_CORE:: - CONFIG_KEXEC=y + CONFIG_KEXEC_CORE=y + + Subsequently, CRASH_CORE is selected by KEXEC_CORE:: + + CONFIG_CRASH_CORE=y 2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo filesystems." This is usually enabled by default:: @@ -175,17 +186,19 @@ Dump-capture kernel config options (Arch Dependent, i386 and x86_64) CONFIG_HIGHMEM4G -2) On i386 and x86_64, disable symmetric multi-processing support - under "Processor type and features":: +2) With CONFIG_SMP=y, usually nr_cpus=1 need specified on the kernel + command line when loading the dump-capture kernel because one + CPU is enough for kdump kernel to dump vmcore on most of systems. - CONFIG_SMP=n + However, you can also specify nr_cpus=X to enable multiple processors + in kdump kernel. In this case, "disable_cpu_apicid=" is needed to + tell kdump kernel which cpu is 1st kernel's BSP. Please refer to + admin-guide/kernel-parameters.txt for more details. - (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line - when loading the dump-capture kernel, see section "Load the Dump-capture - Kernel".) + With CONFIG_SMP=n, the above things are not related. -3) If one wants to build and use a relocatable kernel, - Enable "Build a relocatable kernel" support under "Processor type and +3) A relocatable kernel is suggested to be built by default. If not yet, + enable "Build a relocatable kernel" support under "Processor type and features":: CONFIG_RELOCATABLE=y @@ -232,7 +245,7 @@ Dump-capture kernel config options (Arch Dependent, ia64) as a dump-capture kernel if desired. The crashkernel region can be automatically placed by the system - kernel at run time. This is done by specifying the base address as 0, + kernel at runtime. This is done by specifying the base address as 0, or omitting it all together:: crashkernel=256M@0 @@ -241,10 +254,6 @@ Dump-capture kernel config options (Arch Dependent, ia64) crashkernel=256M - If the start address is specified, note that the start address of the - kernel will be aligned to 64Mb, so if the start address is not then - any space below the alignment point will be wasted. - Dump-capture kernel config options (Arch Dependent, arm) ---------------------------------------------------------- @@ -260,46 +269,82 @@ Dump-capture kernel config options (Arch Dependent, arm64) on non-VHE systems even if it is configured. This is because the CPU will not be reset to EL2 on panic. -Extended crashkernel syntax +crashkernel syntax =========================== +1) crashkernel=size@offset -While the "crashkernel=size[@offset]" syntax is sufficient for most -configurations, sometimes it's handy to have the reserved memory dependent -on the value of System RAM -- that's mostly for distributors that pre-setup -the kernel command line to avoid a unbootable system after some memory has -been removed from the machine. + Here 'size' specifies how much memory to reserve for the dump-capture kernel + and 'offset' specifies the beginning of this reserved memory. For example, + "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory + starting at physical address 0x01000000 (16MB) for the dump-capture kernel. -The syntax is:: + The crashkernel region can be automatically placed by the system + kernel at run time. This is done by specifying the base address as 0, + or omitting it all together:: - crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset] - range=start-[end] + crashkernel=256M@0 -For example:: + or:: - crashkernel=512M-2G:64M,2G-:128M + crashkernel=256M -This would mean: + If the start address is specified, note that the start address of the + kernel will be aligned to a value (which is Arch dependent), so if the + start address is not then any space below the alignment point will be + wasted. - 1) if the RAM is smaller than 512M, then don't reserve anything - (this is the "rescue" case) - 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M - 3) if the RAM size is larger than 2G, then reserve 128M +2) range1:size1[,range2:size2,...][@offset] + While the "crashkernel=size[@offset]" syntax is sufficient for most + configurations, sometimes it's handy to have the reserved memory dependent + on the value of System RAM -- that's mostly for distributors that pre-setup + the kernel command line to avoid a unbootable system after some memory has + been removed from the machine. + The syntax is:: -Boot into System Kernel -======================= + crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset] + range=start-[end] + + For example:: + + crashkernel=512M-2G:64M,2G-:128M + This would mean: + + 1) if the RAM is smaller than 512M, then don't reserve anything + (this is the "rescue" case) + 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M + 3) if the RAM size is larger than 2G, then reserve 128M + +3) crashkernel=size,high and crashkernel=size,low + + If memory above 4G is preferred, crashkernel=size,high can be used to + fulfill that. With it, physical memory is allowed to be allocated from top, + so could be above 4G if system has more than 4G RAM installed. Otherwise, + memory region will be allocated below 4G if available. + + When crashkernel=X,high is passed, kernel could allocate physical memory + region above 4G, low memory under 4G is needed in this case. There are + three ways to get low memory: + + 1) Kernel will allocate at least 256M memory below 4G automatically + if crashkernel=Y,low is not specified. + 2) Let user specify low memory size instead. + 3) Specified value 0 will disable low memory allocation:: + + crashkernel=0,low + +Boot into System Kernel +----------------------- 1) Update the boot loader (such as grub, yaboot, or lilo) configuration files as necessary. -2) Boot the system kernel with the boot parameter "crashkernel=Y@X", - where Y specifies how much memory to reserve for the dump-capture kernel - and X specifies the beginning of this reserved memory. For example, - "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory - starting at physical address 0x01000000 (16MB) for the dump-capture kernel. +2) Boot the system kernel with the boot parameter "crashkernel=Y@X". - On x86 and x86_64, use "crashkernel=64M@16M". + On x86 and x86_64, use "crashkernel=Y[@X]". Most of the time, the + start address 'X' is not necessary, kernel will search a suitable + area. Unless an explicit start address is expected. On ppc64, use "crashkernel=128M@32M". @@ -331,8 +376,8 @@ of dump-capture kernel. Following is the summary. For i386 and x86_64: - - Use vmlinux if kernel is not relocatable. - Use bzImage/vmlinuz if kernel is relocatable. + - Use vmlinux if kernel is not relocatable. For ppc64: @@ -392,7 +437,7 @@ loading dump-capture kernel. For i386, x86_64 and ia64: - "1 irqpoll maxcpus=1 reset_devices" + "1 irqpoll nr_cpus=1 reset_devices" For ppc64: @@ -400,7 +445,7 @@ For ppc64: For s390x: - "1 maxcpus=1 cgroup_disable=memory" + "1 nr_cpus=1 cgroup_disable=memory" For arm: @@ -408,7 +453,7 @@ For arm: For arm64: - "1 maxcpus=1 reset_devices" + "1 nr_cpus=1 reset_devices" Notes on loading the dump-capture kernel: @@ -488,6 +533,10 @@ the following command:: cp /proc/vmcore <dump-file> +You can also use makedumpfile utility to write out the dump file +with specified options to filter out unwanted contents, e.g:: + + makedumpfile -l --message-level 1 -d 31 /proc/vmcore <dump-file> Analysis ======== @@ -535,8 +584,7 @@ This will cause a kdump to occur at the add_taint()->panic() call. Contact ======= -- Vivek Goyal (vgoyal@redhat.com) -- Maneesh Soni (maneesh@in.ibm.com) +- kexec@lists.infradead.org GDB macros ========== diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 34f5b30bcec8..26453f250683 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3513,6 +3513,9 @@ nr_uarts= [SERIAL] maximum number of UARTs to be registered. + numa=off [KNL, ARM64, PPC, RISCV, SPARC, X86] Disable NUMA, Only + set up a single NUMA node spanning all memory. + numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic NUMA balancing. Allowed values are enable and disable diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst index 89309e1b0e48..b799a43da62e 100644 --- a/Documentation/admin-guide/pm/intel_idle.rst +++ b/Documentation/admin-guide/pm/intel_idle.rst @@ -20,8 +20,8 @@ Nehalem and later generations of Intel processors, but the level of support for a particular processor model in it depends on whether or not it recognizes that processor model and may also depend on information coming from the platform firmware. [To understand ``intel_idle`` it is necessary to know how ``CPUIdle`` -works in general, so this is the time to get familiar with :doc:`cpuidle` if you -have not done that yet.] +works in general, so this is the time to get familiar with +Documentation/admin-guide/pm/cpuidle.rst if you have not done that yet.] ``intel_idle`` uses the ``MWAIT`` instruction to inform the processor that the logical CPU executing it is idle and so it may be possible to put some of the @@ -53,7 +53,8 @@ processor) corresponding to them depends on the processor model and it may also depend on the configuration of the platform. In order to create a list of available idle states required by the ``CPUIdle`` -subsystem (see :ref:`idle-states-representation` in :doc:`cpuidle`), +subsystem (see :ref:`idle-states-representation` in +Documentation/admin-guide/pm/cpuidle.rst), ``intel_idle`` can use two sources of information: static tables of idle states for different processor models included in the driver itself and the ACPI tables of the system. The former are always used if the processor model at hand is @@ -98,7 +99,8 @@ states may not be enabled by default if there are no matching entries in the preliminary list of idle states coming from the ACPI tables. In that case user space still can enable them later (on a per-CPU basis) with the help of the ``disable`` idle state attribute in ``sysfs`` (see -:ref:`idle-states-representation` in :doc:`cpuidle`). This basically means that +:ref:`idle-states-representation` in +Documentation/admin-guide/pm/cpuidle.rst). This basically means that the idle states "known" to the driver may not be enabled by default if they have not been exposed by the platform firmware (through the ACPI tables). @@ -186,7 +188,8 @@ be desirable. In practice, it is only really necessary to do that if the idle states in question cannot be enabled during system startup, because in the working state of the system the CPU power management quality of service (PM QoS) feature can be used to prevent ``CPUIdle`` from touching those idle states -even if they have been enumerated (see :ref:`cpu-pm-qos` in :doc:`cpuidle`). +even if they have been enumerated (see :ref:`cpu-pm-qos` in +Documentation/admin-guide/pm/cpuidle.rst). Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail. The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle`` @@ -202,7 +205,8 @@ Namely, the positions of the bits that are set in the ``states_off`` value are the indices of idle states to be disabled by default (as reflected by the names of the corresponding idle state directories in ``sysfs``, :file:`state0`, :file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the index of the given -idle state; see :ref:`idle-states-representation` in :doc:`cpuidle`). +idle state; see :ref:`idle-states-representation` in +Documentation/admin-guide/pm/cpuidle.rst). For example, if ``states_off`` is equal to 3, the driver will disable idle states 0 and 1 by default, and if it is equal to 8, idle state 3 will be diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst index df29b4f1f219..7a7d4b041eac 100644 --- a/Documentation/admin-guide/pm/intel_pstate.rst +++ b/Documentation/admin-guide/pm/intel_pstate.rst @@ -18,8 +18,8 @@ General Information (``CPUFreq``). It is a scaling driver for the Sandy Bridge and later generations of Intel processors. Note, however, that some of those processors may not be supported. [To understand ``intel_pstate`` it is necessary to know -how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if -you have not done that yet.] +how ``CPUFreq`` works in general, so this is the time to read +Documentation/admin-guide/pm/cpufreq.rst if you have not done that yet.] For the processors supported by ``intel_pstate``, the P-state concept is broader than just an operating frequency or an operating performance point (see the @@ -445,8 +445,9 @@ Interpretation of Policy Attributes ----------------------------------- The interpretation of some ``CPUFreq`` policy attributes described in -:doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver -and it generally depends on the driver's `operation mode <Operation Modes_>`_. +Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate`` +as the current scaling driver and it generally depends on the driver's +`operation mode <Operation Modes_>`_. First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and ``scaling_cur_freq`` attributes are produced by applying a processor-specific diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst index 18d8e25ba9df..d7ac13f789cc 100644 --- a/Documentation/admin-guide/reporting-issues.rst +++ b/Documentation/admin-guide/reporting-issues.rst @@ -1248,7 +1248,7 @@ paragraph makes the severeness obvious. In case you performed a successful bisection, use the title of the change that introduced the regression as the second part of your subject. Make the report -also mention the commit id of the culprit. In case of an unsuccessful bisection, +also mention the commit id of the culprit. In case of an unsuccessful bisection, make your report mention the latest tested version that's working fine (say 5.7) and the oldest where the issue occurs (say 5.8-rc1). diff --git a/Documentation/admin-guide/sysctl/abi.rst b/Documentation/admin-guide/sysctl/abi.rst index 77b1d1b2ad42..4e6db0a2a4c0 100644 --- a/Documentation/admin-guide/sysctl/abi.rst +++ b/Documentation/admin-guide/sysctl/abi.rst @@ -11,7 +11,7 @@ Documentation for /proc/sys/abi/ Copyright (c) 2020, Stephen Kitt -For general info, see :doc:`index`. +For general info, see Documentation/admin-guide/sysctl/index.rst. ------------------------------------------------------------------------------ diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 0ef05750dadc..10dd4b111e5c 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -9,7 +9,8 @@ Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com> -For general info and legal blurb, please look in :doc:`index`. +For general info and legal blurb, please look in +Documentation/admin-guide/sysctl/index.rst. ------------------------------------------------------------------------------ @@ -54,7 +55,7 @@ free space valid for 30 seconds. acpi_video_flags ================ -See :doc:`/power/video`. This allows the video resume mode to be set, +See Documentation/power/video.rst. This allows the video resume mode to be set, in a similar fashion to the ``acpi_sleep`` kernel parameter, by combining the following values: @@ -89,7 +90,7 @@ is 0x15 and the full version number is 0x234, this file will contain the value 340 = 0x154. See the ``type_of_loader`` and ``ext_loader_type`` fields in -:doc:`/x86/boot` for additional information. +Documentation/x86/boot.rst for additional information. bootloader_version (x86 only) @@ -99,7 +100,7 @@ The complete bootloader version number. In the example above, this file will contain the value 564 = 0x234. See the ``type_of_loader`` and ``ext_loader_ver`` fields in -:doc:`/x86/boot` for additional information. +Documentation/x86/boot.rst for additional information. bpf_stats_enabled @@ -269,7 +270,7 @@ see the ``hostname(1)`` man page. firmware_config =============== -See :doc:`/driver-api/firmware/fallback-mechanisms`. +See Documentation/driver-api/firmware/fallback-mechanisms.rst. The entries in this directory allow the firmware loader helper fallback to be controlled: @@ -297,7 +298,7 @@ crashes and outputting them to a serial console. ftrace_enabled, stack_tracer_enabled ==================================== -See :doc:`/trace/ftrace`. +See Documentation/trace/ftrace.rst. hardlockup_all_cpu_backtrace @@ -325,7 +326,7 @@ when a hard lockup is detected. 1 Panic on hard lockup. = =========================== -See :doc:`/admin-guide/lockup-watchdogs` for more information. +See Documentation/admin-guide/lockup-watchdogs.rst for more information. This can also be set using the nmi_watchdog kernel parameter. @@ -333,7 +334,12 @@ hotplug ======= Path for the hotplug policy agent. -Default value is "``/sbin/hotplug``". +Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults +to the empty string. + +This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most +modern systems rely exclusively on the netlink-based uevent source and +don't need this. hung_task_all_cpu_backtrace @@ -582,7 +588,8 @@ in a KVM virtual machine. This default can be overridden by adding:: nmi_watchdog=1 -to the guest kernel command line (see :doc:`/admin-guide/kernel-parameters`). +to the guest kernel command line (see +Documentation/admin-guide/kernel-parameters.rst). numa_balancing @@ -1067,7 +1074,7 @@ that support this feature. real-root-dev ============= -See :doc:`/admin-guide/initrd`. +See Documentation/admin-guide/initrd.rst. reboot-cmd (SPARC only) @@ -1161,7 +1168,7 @@ will take effect. seccomp ======= -See :doc:`/userspace-api/seccomp_filter`. +See Documentation/userspace-api/seccomp_filter.rst. sg-big-buff @@ -1332,7 +1339,7 @@ the boot PROM. sysrq ===== -See :doc:`/admin-guide/sysrq`. +See Documentation/admin-guide/sysrq.rst. tainted @@ -1362,15 +1369,16 @@ ORed together. The letters are seen in "Tainted" line of Oops reports. 131072 `(T)` The kernel was built with the struct randomization plugin ====== ===== ============================================================== -See :doc:`/admin-guide/tainted-kernels` for more information. +See Documentation/admin-guide/tainted-kernels.rst for more information. Note: writes to this sysctl interface will fail with ``EINVAL`` if the kernel is booted with the command line option ``panic_on_taint=<bitmask>,nousertaint`` and any of the ORed together values being written to ``tainted`` match with the bitmask declared on panic_on_taint. - See :doc:`/admin-guide/kernel-parameters` for more details on that particular - kernel command line option and its optional ``nousertaint`` switch. + See Documentation/admin-guide/kernel-parameters.rst for more details on + that particular kernel command line option and its optional + ``nousertaint`` switch. threads-max =========== @@ -1394,7 +1402,7 @@ If a value outside of this range is written to ``threads-max`` an traceoff_on_warning =================== -When set, disables tracing (see :doc:`/trace/ftrace`) when a +When set, disables tracing (see Documentation/trace/ftrace.rst) when a ``WARN()`` is hit. @@ -1414,8 +1422,8 @@ will send them to printk() again. This only works if the kernel was booted with ``tp_printk`` enabled. -See :doc:`/admin-guide/kernel-parameters` and -:doc:`/trace/boottime-trace`. +See Documentation/admin-guide/kernel-parameters.rst and +Documentation/trace/boottime-trace.rst. .. _unaligned-dump-stack: diff --git a/Documentation/arm/marvell.rst b/Documentation/arm/marvell.rst index c50be711ec72..db2246493d18 100644 --- a/Documentation/arm/marvell.rst +++ b/Documentation/arm/marvell.rst @@ -259,7 +259,7 @@ Storage family https://web.archive.org/web/20191129073953/http://www.marvell.com/storage/armada-sp/ Core: - Sheeva ARMv7 comatible Quad-core PJ4C + Sheeva ARMv7 compatible Quad-core PJ4C (not supported in upstream Linux kernel) diff --git a/Documentation/block/biodoc.rst b/Documentation/block/biodoc.rst index 1d4d71e391af..2098477851a4 100644 --- a/Documentation/block/biodoc.rst +++ b/Documentation/block/biodoc.rst @@ -196,7 +196,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address do not have a corresponding kernel virtual address space mapping) and low-memory pages. -Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion +Note: Please refer to Documentation/core-api/dma-api-howto.rst for a discussion on PCI high mem DMA aspects and mapping of scatter gather lists, and support for 64 bit PCI. diff --git a/Documentation/block/blk-mq.rst b/Documentation/block/blk-mq.rst index a980d23af48c..d96118c73954 100644 --- a/Documentation/block/blk-mq.rst +++ b/Documentation/block/blk-mq.rst @@ -62,7 +62,7 @@ queue, to be sent in the future, when the hardware is able. Software staging queues ~~~~~~~~~~~~~~~~~~~~~~~ -The block IO subsystem adds requests in the software staging queues +The block IO subsystem adds requests in the software staging queues (represented by struct blk_mq_ctx) in case that they weren't sent directly to the driver. A request is one or more BIOs. They arrived at the block layer through the data structure struct bio. The block layer @@ -132,7 +132,7 @@ In order to indicate which request has been completed, every request is identified by an integer, ranging from 0 to the dispatch queue size. This tag is generated by the block layer and later reused by the device driver, removing the need to create a redundant identifier. When a request is completed in the -drive, the tag is sent back to the block layer to notify it of the finalization. +driver, the tag is sent back to the block layer to notify it of the finalization. This removes the need to do a linear search to find out which IO has been completed. diff --git a/Documentation/block/stat.rst b/Documentation/block/stat.rst index 77311335c08b..a1cd9db2058f 100644 --- a/Documentation/block/stat.rst +++ b/Documentation/block/stat.rst @@ -18,7 +18,7 @@ A. each, it would be impossible to guarantee that a set of readings represent a single point in time. -The stat file consists of a single line of text containing 11 decimal +The stat file consists of a single line of text containing 17 decimal values separated by whitespace. The fields are summarized in the following table, and described in more detail below. diff --git a/Documentation/bpf/bpf_lsm.rst b/Documentation/bpf/bpf_lsm.rst index 1c0a75a51d79..0dc3fb0d9544 100644 --- a/Documentation/bpf/bpf_lsm.rst +++ b/Documentation/bpf/bpf_lsm.rst @@ -20,10 +20,10 @@ LSM hook: Other LSM hooks which can be instrumented can be found in ``include/linux/lsm_hooks.h``. -eBPF programs that use :doc:`/bpf/btf` do not need to include kernel headers -for accessing information from the attached eBPF program's context. They can -simply declare the structures in the eBPF program and only specify the fields -that need to be accessed. +eBPF programs that use Documentation/bpf/btf.rst do not need to include kernel +headers for accessing information from the attached eBPF program's context. +They can simply declare the structures in the eBPF program and only specify +the fields that need to be accessed. .. code-block:: c @@ -88,8 +88,9 @@ example: The ``__attribute__((preserve_access_index))`` is a clang feature that allows the BPF verifier to update the offsets for the access at runtime using the -:doc:`/bpf/btf` information. Since the BPF verifier is aware of the types, it -also validates all the accesses made to the various types in the eBPF program. +Documentation/bpf/btf.rst information. Since the BPF verifier is aware of the +types, it also validates all the accesses made to the various types in the +eBPF program. Loading ------- diff --git a/Documentation/conf.py b/Documentation/conf.py index 879e86dbea66..7d92ec3e5b6e 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -41,15 +41,7 @@ extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'maintainers_include', 'sphinx.ext.autosectionlabel', 'kernel_abi', 'kernel_feat'] -# -# cdomain is badly broken in Sphinx 3+. Leaving it out generates *most* -# of the docs correctly, but not all. Scream bloody murder but allow -# the process to proceed; hopefully somebody will fix this properly soon. -# if major >= 3: - sys.stderr.write('''WARNING: The kernel documentation build process - support for Sphinx v3.0 and above is brand new. Be prepared for - possible issues in the generated output.\n''') if (major > 3) or (minor > 0 or patch >= 2): # Sphinx c function parser is more pedantic with regards to type # checking. Due to that, having macros at c:function cause problems. @@ -353,6 +345,8 @@ latex_elements = { # Additional stuff for the LaTeX preamble. 'preamble': ''' + % Prevent column squeezing of tabulary. + \\setlength{\\tymin}{20em} % Use some font with UTF-8 support with XeLaTeX \\usepackage{fontspec} \\setsansfont{DejaVu Sans} @@ -366,11 +360,23 @@ latex_elements = { cjk_cmd = check_output(['fc-list', '--format="%{family[0]}\n"']).decode('utf-8', 'ignore') if cjk_cmd.find("Noto Sans CJK SC") >= 0: - print ("enabling CJK for LaTeX builder") latex_elements['preamble'] += ''' % This is needed for translations \\usepackage{xeCJK} \\setCJKmainfont{Noto Sans CJK SC} + % Define custom macros to on/off CJK + \\newcommand{\\kerneldocCJKon}{\\makexeCJKactive} + \\newcommand{\\kerneldocCJKoff}{\\makexeCJKinactive} + % To customize \sphinxtableofcontents + \\usepackage{etoolbox} + % Inactivate CJK after tableofcontents + \\apptocmd{\\sphinxtableofcontents}{\\kerneldocCJKoff}{}{} + ''' +else: + latex_elements['preamble'] += ''' + % Custom macros to on/off CJK (Dummy) + \\newcommand{\\kerneldocCJKon}{} + \\newcommand{\\kerneldocCJKoff}{} ''' # Fix reference escape troubles with Sphinx 1.4.x diff --git a/Documentation/core-api/bus-virt-phys-mapping.rst b/Documentation/core-api/bus-virt-phys-mapping.rst index c7bc99cd2e21..c72b24a7d52c 100644 --- a/Documentation/core-api/bus-virt-phys-mapping.rst +++ b/Documentation/core-api/bus-virt-phys-mapping.rst @@ -8,7 +8,7 @@ How to access I/O mapped memory from within device drivers The virt_to_bus() and bus_to_virt() functions have been superseded by the functionality provided by the PCI DMA interface - (see :doc:`/core-api/dma-api-howto`). They continue + (see Documentation/core-api/dma-api-howto.rst). They continue to be documented below for historical purposes, but new code must not use them. --davidm 00/12/12 diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst index 00a1d4fa3f9e..6d6d0edd2d27 100644 --- a/Documentation/core-api/dma-api.rst +++ b/Documentation/core-api/dma-api.rst @@ -5,7 +5,7 @@ Dynamic DMA mapping using the generic device :Author: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> This document describes the DMA API. For a more gentle introduction -of the API (and actual examples), see :doc:`/core-api/dma-api-howto`. +of the API (and actual examples), see Documentation/core-api/dma-api-howto.rst. This API is split into two pieces. Part I describes the basic API. Part II describes extensions for supporting non-consistent memory @@ -479,7 +479,8 @@ without the _attrs suffixes, except that they pass an optional dma_attrs. The interpretation of DMA attributes is architecture-specific, and -each attribute should be documented in :doc:`/core-api/dma-attributes`. +each attribute should be documented in +Documentation/core-api/dma-attributes.rst. If dma_attrs are 0, the semantics of each of these functions is identical to those of the corresponding function diff --git a/Documentation/core-api/dma-isa-lpc.rst b/Documentation/core-api/dma-isa-lpc.rst index e59a3d35a93d..17b193603f0a 100644 --- a/Documentation/core-api/dma-isa-lpc.rst +++ b/Documentation/core-api/dma-isa-lpc.rst @@ -17,7 +17,7 @@ To do ISA style DMA you need to include two headers:: #include <asm/dma.h> The first is the generic DMA API used to convert virtual addresses to -bus addresses (see :doc:`/core-api/dma-api` for details). +bus addresses (see Documentation/core-api/dma-api.rst for details). The second contains the routines specific to ISA DMA transfers. Since this is not present on all platforms make sure you construct your diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index f1c9d20bd42d..5de2c7a4b1b3 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -48,7 +48,7 @@ Concurrency primitives ====================== How Linux keeps everything from happening at the same time. See -:doc:`/locking/index` for more related documentation. +Documentation/locking/index.rst for more related documentation. .. toctree:: :maxdepth: 1 @@ -77,7 +77,7 @@ Memory management ================= How to allocate and use memory in the kernel. Note that there is a lot -more memory-management documentation in :doc:`/vm/index`. +more memory-management documentation in Documentation/vm/index.rst. .. toctree:: :maxdepth: 1 diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index f063a384c7c8..385c0cc52f1f 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -37,14 +37,13 @@ Integer types u64 %llu or %llx -If <type> is dependent on a config option for its size (e.g., sector_t, -blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a -format specifier of its largest possible type and explicitly cast to it. +If <type> is architecture-dependent for its size (e.g., cycles_t, tcflag_t) or +is dependent on a config option for its size (e.g., blk_status_t), use a format +specifier of its largest possible type and explicitly cast to it. Example:: - printk("test: sector number/total blocks: %llu/%llu\n", - (unsigned long long)sector, (unsigned long long)blockcount); + printk("test: latency: %llu cycles\n", (unsigned long long)time); Reminder: sizeof() returns type size_t. diff --git a/Documentation/dev-tools/checkpatch.rst b/Documentation/dev-tools/checkpatch.rst index 51fed1bd72ec..f0956e9ea2d8 100644 --- a/Documentation/dev-tools/checkpatch.rst +++ b/Documentation/dev-tools/checkpatch.rst @@ -246,6 +246,7 @@ Allocation style The first argument for kcalloc or kmalloc_array should be the number of elements. sizeof() as the first argument is generally wrong. + See: https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html **ALLOC_SIZEOF_STRUCT** @@ -264,6 +265,7 @@ Allocation style **ALLOC_WITH_MULTIPLY** Prefer kmalloc_array/kcalloc over kmalloc/kzalloc with a sizeof multiply. + See: https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html @@ -284,6 +286,7 @@ API usage BUG() or BUG_ON() should be avoided totally. Use WARN() and WARN_ON() instead, and handle the "impossible" error condition as gracefully as possible. + See: https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on **CONSIDER_KSTRTO** @@ -292,12 +295,161 @@ API usage may lead to unexpected results in callers. The respective kstrtol(), kstrtoll(), kstrtoul(), and kstrtoull() functions tend to be the correct replacements. + See: https://www.kernel.org/doc/html/latest/process/deprecated.html#simple-strtol-simple-strtoll-simple-strtoul-simple-strtoull + **CONSTANT_CONVERSION** + Use of __constant_<foo> form is discouraged for the following functions:: + + __constant_cpu_to_be[x] + __constant_cpu_to_le[x] + __constant_be[x]_to_cpu + __constant_le[x]_to_cpu + __constant_htons + __constant_ntohs + + Using any of these outside of include/uapi/ is not preferred as using the + function without __constant_ is identical when the argument is a + constant. + + In big endian systems, the macros like __constant_cpu_to_be32(x) and + cpu_to_be32(x) expand to the same expression:: + + #define __constant_cpu_to_be32(x) ((__force __be32)(__u32)(x)) + #define __cpu_to_be32(x) ((__force __be32)(__u32)(x)) + + In little endian systems, the macros __constant_cpu_to_be32(x) and + cpu_to_be32(x) expand to __constant_swab32 and __swab32. __swab32 + has a __builtin_constant_p check:: + + #define __swab32(x) \ + (__builtin_constant_p((__u32)(x)) ? \ + ___constant_swab32(x) : \ + __fswab32(x)) + + So ultimately they have a special case for constants. + Similar is the case with all of the macros in the list. Thus + using the __constant_... forms are unnecessarily verbose and + not preferred outside of include/uapi. + + See: https://lore.kernel.org/lkml/1400106425.12666.6.camel@joe-AO725/ + + **DEPRECATED_API** + Usage of a deprecated RCU API is detected. It is recommended to replace + old flavourful RCU APIs by their new vanilla-RCU counterparts. + + The full list of available RCU APIs can be viewed from the kernel docs. + + See: https://www.kernel.org/doc/html/latest/RCU/whatisRCU.html#full-list-of-rcu-apis + + **DEPRECATED_VARIABLE** + EXTRA_{A,C,CPP,LD}FLAGS are deprecated and should be replaced by the new + flags added via commit f77bf01425b1 ("kbuild: introduce ccflags-y, + asflags-y and ldflags-y"). + + The following conversion scheme maybe used:: + + EXTRA_AFLAGS -> asflags-y + EXTRA_CFLAGS -> ccflags-y + EXTRA_CPPFLAGS -> cppflags-y + EXTRA_LDFLAGS -> ldflags-y + + See: + + 1. https://lore.kernel.org/lkml/20070930191054.GA15876@uranus.ravnborg.org/ + 2. https://lore.kernel.org/lkml/1313384834-24433-12-git-send-email-lacombar@gmail.com/ + 3. https://www.kernel.org/doc/html/latest/kbuild/makefiles.html#compilation-flags + + **DEVICE_ATTR_FUNCTIONS** + The function names used in DEVICE_ATTR is unusual. + Typically, the store and show functions are used with <attr>_store and + <attr>_show, where <attr> is a named attribute variable of the device. + + Consider the following examples:: + + static DEVICE_ATTR(type, 0444, type_show, NULL); + static DEVICE_ATTR(power, 0644, power_show, power_store); + + The function names should preferably follow the above pattern. + + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes + + **DEVICE_ATTR_RO** + The DEVICE_ATTR_RO(name) helper macro can be used instead of + DEVICE_ATTR(name, 0444, name_show, NULL); + + Note that the macro automatically appends _show to the named + attribute variable of the device for the show method. + + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes + + **DEVICE_ATTR_RW** + The DEVICE_ATTR_RW(name) helper macro can be used instead of + DEVICE_ATTR(name, 0644, name_show, name_store); + + Note that the macro automatically appends _show and _store to the + named attribute variable of the device for the show and store methods. + + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes + + **DEVICE_ATTR_WO** + The DEVICE_AATR_WO(name) helper macro can be used instead of + DEVICE_ATTR(name, 0200, NULL, name_store); + + Note that the macro automatically appends _store to the + named attribute variable of the device for the store method. + + See: https://www.kernel.org/doc/html/latest/driver-api/driver-model/device.html#attributes + + **DUPLICATED_SYSCTL_CONST** + Commit d91bff3011cf ("proc/sysctl: add shared variables for range + check") added some shared const variables to be used instead of a local + copy in each source file. + + Consider replacing the sysctl range checking value with the shared + one in include/linux/sysctl.h. The following conversion scheme may + be used:: + + &zero -> SYSCTL_ZERO + &one -> SYSCTL_ONE + &int_max -> SYSCTL_INT_MAX + + See: + + 1. https://lore.kernel.org/lkml/20190430180111.10688-1-mcroce@redhat.com/ + 2. https://lore.kernel.org/lkml/20190531131422.14970-1-mcroce@redhat.com/ + + **ENOSYS** + ENOSYS means that a nonexistent system call was called. + Earlier, it was wrongly used for things like invalid operations on + otherwise valid syscalls. This should be avoided in new code. + + See: https://lore.kernel.org/lkml/5eb299021dec23c1a48fa7d9f2c8b794e967766d.1408730669.git.luto@amacapital.net/ + + **ENOTSUPP** + ENOTSUPP is not a standard error code and should be avoided in new patches. + EOPNOTSUPP should be used instead. + + See: https://lore.kernel.org/netdev/20200510182252.GA411829@lunn.ch/ + + **EXPORT_SYMBOL** + EXPORT_SYMBOL should immediately follow the symbol to be exported. + + **IN_ATOMIC** + in_atomic() is not for driver use so any such use is reported as an ERROR. + Also in_atomic() is often used to determine if sleeping is permitted, + but it is not reliable in this use model. Therefore its use is + strongly discouraged. + + However, in_atomic() is ok for core kernel use. + + See: https://lore.kernel.org/lkml/20080320201723.b87b3732.akpm@linux-foundation.org/ + **LOCKDEP** The lockdep_no_validate class was added as a temporary measure to prevent warnings on conversion of device->sem to device->mutex. It should not be used for any other purpose. + See: https://lore.kernel.org/lkml/1268959062.9440.467.camel@laptop/ **MALFORMED_INCLUDE** @@ -308,14 +460,21 @@ API usage **USE_LOCKDEP** lockdep_assert_held() annotations should be preferred over assertions based on spin_is_locked() + See: https://www.kernel.org/doc/html/latest/locking/lockdep-design.html#annotations **UAPI_INCLUDE** No #include statements in include/uapi should use a uapi/ path. + **USLEEP_RANGE** + usleep_range() should be preferred over udelay(). The proper way of + using usleep_range() is mentioned in the kernel docs. -Comment style -------------- + See: https://www.kernel.org/doc/html/latest/timers/timers-howto.html#delays-information-on-the-various-kernel-delay-sleep-mechanisms + + +Comments +-------- **BLOCK_COMMENT_STYLE** The comment style is incorrect. The preferred style for multi- @@ -338,8 +497,24 @@ Comment style **C99_COMMENTS** C99 style single line comments (//) should not be used. Prefer the block comment style instead. + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#commenting + **DATA_RACE** + Applications of data_race() should have a comment so as to document the + reasoning behind why it was deemed safe. + + See: https://lore.kernel.org/lkml/20200401101714.44781-1-elver@google.com/ + + **FSF_MAILING_ADDRESS** + Kernel maintainers reject new instances of the GPL boilerplate paragraph + directing people to write to the FSF for a copy of the GPL, since the + FSF has moved in the past and may do so again. + So do not write paragraphs about writing to the Free Software Foundation's + mailing address. + + See: https://lore.kernel.org/lkml/20131006222342.GT19510@leaf/ + Commit message -------------- @@ -347,6 +522,7 @@ Commit message **BAD_SIGN_OFF** The signed-off-by line does not fall in line with the standards specified by the community. + See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#developer-s-certificate-of-origin-1-1 **BAD_STABLE_ADDRESS_STYLE** @@ -368,12 +544,33 @@ Commit message **COMMIT_MESSAGE** The patch is missing a commit description. A brief description of the changes made by the patch should be added. + + See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes + + **EMAIL_SUBJECT** + Naming the tool that found the issue is not very useful in the + subject line. A good subject line summarizes the change that + the patch brings. + See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes + **FROM_SIGN_OFF_MISMATCH** + The author's email does not match with that in the Signed-off-by: + line(s). This can be sometimes caused due to an improperly configured + email client. + + This message is emitted due to any of the following reasons:: + + - The email names do not match. + - The email addresses do not match. + - The email subaddresses do not match. + - The email comments do not match. + **MISSING_SIGN_OFF** The patch is missing a Signed-off-by line. A signed-off-by line should be added according to Developer's certificate of Origin. + See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin **NO_AUTHOR_SIGN_OFF** @@ -382,6 +579,7 @@ Commit message end of explanation of the patch to denote that the author has written it or otherwise has the rights to pass it on as an open source patch. + See: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin **DIFF_IN_COMMIT_MSG** @@ -389,6 +587,7 @@ Commit message This causes problems when one tries to apply a file containing both the changelog and the diff because patch(1) tries to apply the diff which it found in the changelog. + See: https://lore.kernel.org/lkml/20150611134006.9df79a893e3636019ad2759e@linux-foundation.org/ **GERRIT_CHANGE_ID** @@ -431,6 +630,7 @@ Comparison style **BOOL_COMPARISON** Comparisons of A to true and false are better written as A and !A. + See: https://lore.kernel.org/lkml/1365563834.27174.12.camel@joe-AO722/ **COMPARISON_TO_NULL** @@ -442,6 +642,87 @@ Comparison style side of the test should be avoided. +Indentation and Line Breaks +--------------------------- + + **CODE_INDENT** + Code indent should use tabs instead of spaces. + Outside of comments, documentation and Kconfig, + spaces are never used for indentation. + + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation + + **DEEP_INDENTATION** + Indentation with 6 or more tabs usually indicate overly indented + code. + + It is suggested to refactor excessive indentation of + if/else/for/do/while/switch statements. + + See: https://lore.kernel.org/lkml/1328311239.21255.24.camel@joe2Laptop/ + + **SWITCH_CASE_INDENT_LEVEL** + switch should be at the same indent as case. + Example:: + + switch (suffix) { + case 'G': + case 'g': + mem <<= 30; + break; + case 'M': + case 'm': + mem <<= 20; + break; + case 'K': + case 'k': + mem <<= 10; + fallthrough; + default: + break; + } + + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation + + **LONG_LINE** + The line has exceeded the specified maximum length. + To use a different maximum line length, the --max-line-length=n option + may be added while invoking checkpatch. + + Earlier, the default line length was 80 columns. Commit bdc48fa11e46 + ("checkpatch/coding-style: deprecate 80-column warning") increased the + limit to 100 columns. This is not a hard limit either and it's + preferable to stay within 80 columns whenever possible. + + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings + + **LONG_LINE_STRING** + A string starts before but extends beyond the maximum line length. + To use a different maximum line length, the --max-line-length=n option + may be added while invoking checkpatch. + + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings + + **LONG_LINE_COMMENT** + A comment starts before but extends beyond the maximum line length. + To use a different maximum line length, the --max-line-length=n option + may be added while invoking checkpatch. + + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#breaking-long-lines-and-strings + + **TRAILING_STATEMENTS** + Trailing statements (for example after any conditional) should be + on the next line. + Statements, such as:: + + if (x == y) break; + + should be:: + + if (x == y) + break; + + Macros, Attributes and Symbols ------------------------------ @@ -472,7 +753,7 @@ Macros, Attributes and Symbols **BIT_MACRO** Defines like: 1 << <digit> could be BIT(digit). - The BIT() macro is defined in include/linux/bitops.h:: + The BIT() macro is defined via include/linux/bits.h:: #define BIT(nr) (1UL << (nr)) @@ -492,6 +773,7 @@ Macros, Attributes and Symbols The kernel does *not* use the ``__DATE__`` and ``__TIME__`` macros, and enables warnings if they are used as they can lead to non-deterministic builds. + See: https://www.kernel.org/doc/html/latest/kbuild/reproducible-builds.html#timestamps **DEFINE_ARCH_HAS** @@ -502,8 +784,12 @@ Macros, Attributes and Symbols want architectures able to override them with optimized ones, we should either use weak functions (appropriate for some cases), or the symbol that protects them should be the same symbol we use. + See: https://lore.kernel.org/lkml/CA+55aFycQ9XJvEOsiM3txHL5bjUc8CeKWJNR_H+MiicaddB42Q@mail.gmail.com/ + **DO_WHILE_MACRO_WITH_TRAILING_SEMICOLON** + do {} while(0) macros should not have a trailing semicolon. + **INIT_ATTRIBUTE** Const init definitions should use __initconst instead of __initdata. @@ -528,6 +814,20 @@ Macros, Attributes and Symbols ... } + **MISPLACED_INIT** + It is possible to use section markers on variables in a way + which gcc doesn't understand (or at least not the way the + developer intended):: + + static struct __initdata samsung_pll_clock exynos4_plls[nr_plls] = { + + does not put exynos4_plls in the .initdata section. The __initdata + marker can be virtually anywhere on the line, except right after + "struct". The preferred location is before the "=" sign if there is + one, or before the trailing ";" otherwise. + + See: https://lore.kernel.org/lkml/1377655732.3619.19.camel@joe-AO722/ + **MULTISTATEMENT_MACRO_USE_DO_WHILE** Macros with multiple statements should be enclosed in a do - while block. Same should also be the case for macros @@ -541,6 +841,10 @@ Macros, Attributes and Symbols See: https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl + **PREFER_FALLTHROUGH** + Use the `fallthrough;` pseudo keyword instead of + `/* fallthrough */` like comments. + **WEAK_DECLARATION** Using weak declarations like __attribute__((weak)) or __weak can have unintended link defects. Avoid using them. @@ -551,8 +855,51 @@ Functions and Variables **CAMELCASE** Avoid CamelCase Identifiers. + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#naming + **CONST_CONST** + Using `const <type> const *` is generally meant to be + written `const <type> * const`. + + **CONST_STRUCT** + Using const is generally a good idea. Checkpatch reads + a list of frequently used structs that are always or + almost always constant. + + The existing structs list can be viewed from + `scripts/const_structs.checkpatch`. + + See: https://lore.kernel.org/lkml/alpine.DEB.2.10.1608281509480.3321@hadrien/ + + **EMBEDDED_FUNCTION_NAME** + Embedded function names are less appropriate to use as + refactoring can cause function renaming. Prefer the use of + "%s", __func__ to embedded function names. + + Note that this does not work with -f (--file) checkpatch option + as it depends on patch context providing the function name. + + **FUNCTION_ARGUMENTS** + This warning is emitted due to any of the following reasons: + + 1. Arguments for the function declaration do not follow + the identifier name. Example:: + + void foo + (int bar, int baz) + + This should be corrected to:: + + void foo(int bar, int baz) + + 2. Some arguments for the function definition do not + have an identifier name. Example:: + + void foo(int) + + All arguments should have identifier names. + **FUNCTION_WITHOUT_ARGS** Function declarations without arguments like:: @@ -583,6 +930,34 @@ Functions and Variables return bar; +Permissions +----------- + + **DEVICE_ATTR_PERMS** + The permissions used in DEVICE_ATTR are unusual. + Typically only three permissions are used - 0644 (RW), 0444 (RO) + and 0200 (WO). + + See: https://www.kernel.org/doc/html/latest/filesystems/sysfs.html#attributes + + **EXECUTE_PERMISSIONS** + There is no reason for source files to be executable. The executable + bit can be removed safely. + + **EXPORTED_WORLD_WRITABLE** + Exporting world writable sysfs/debugfs files is usually a bad thing. + When done arbitrarily they can introduce serious security bugs. + In the past, some of the debugfs vulnerabilities would seemingly allow + any local user to write arbitrary values into device registers - a + situation from which little good can be expected to emerge. + + See: https://lore.kernel.org/linux-arm-kernel/cover.1296818921.git.segoon@openwall.com/ + + **NON_OCTAL_PERMISSIONS** + Permission bits should use 4 digit octal permissions (like 0700 or 0444). + Avoid using any other base like decimal. + + Spacing and Brackets -------------------- @@ -616,7 +991,7 @@ Spacing and Brackets 1. With a type on the left:: - ;int [] a; + int [] a; 2. At the beginning of a line for slice initialisers:: @@ -626,12 +1001,6 @@ Spacing and Brackets = { [0...10] = 5 } - **CODE_INDENT** - Code indent should use tabs instead of spaces. - Outside of comments, documentation and Kconfig, - spaces are never used for indentation. - See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation - **CONCATENATED_STRING** Concatenated elements should have a space in between. Example:: @@ -644,17 +1013,20 @@ Spacing and Brackets **ELSE_AFTER_BRACE** `else {` should follow the closing block `}` on the same line. + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#placing-braces-and-spaces **LINE_SPACING** Vertical space is wasted given the limited number of lines an editor window can display when multiple blank lines are used. + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#spaces **OPEN_BRACE** The opening brace should be following the function definitions on the next line. For any non-functional block it should be on the same line as the last construct. + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#placing-braces-and-spaces **POINTER_LOCATION** @@ -671,37 +1043,47 @@ Spacing and Brackets **SPACING** Whitespace style used in the kernel sources is described in kernel docs. - See: https://www.kernel.org/doc/html/latest/process/coding-style.html#spaces - **SWITCH_CASE_INDENT_LEVEL** - switch should be at the same indent as case. - Example:: - - switch (suffix) { - case 'G': - case 'g': - mem <<= 30; - break; - case 'M': - case 'm': - mem <<= 20; - break; - case 'K': - case 'k': - mem <<= 10; - /* fall through */ - default: - break; - } - - See: https://www.kernel.org/doc/html/latest/process/coding-style.html#indentation + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#spaces **TRAILING_WHITESPACE** Trailing whitespace should always be removed. Some editors highlight the trailing whitespace and cause visual distractions when editing files. + See: https://www.kernel.org/doc/html/latest/process/coding-style.html#spaces + **UNNECESSARY_PARENTHESES** + Parentheses are not required in the following cases: + + 1. Function pointer uses:: + + (foo->bar)(); + + could be:: + + foo->bar(); + + 2. Comparisons in if:: + + if ((foo->bar) && (foo->baz)) + if ((foo == bar)) + + could be:: + + if (foo->bar && foo->baz) + if (foo == bar) + + 3. addressof/dereference single Lvalues:: + + &(foo->bar) + *(foo->bar) + + could be:: + + &foo->bar + *foo->bar + **WHILE_AFTER_BRACE** while should follow the closing bracket on the same line:: @@ -723,17 +1105,50 @@ Others The patch seems to be corrupted or lines are wrapped. Please regenerate the patch file before sending it to the maintainer. + **CVS_KEYWORD** + Since linux moved to git, the CVS markers are no longer used. + So, CVS style keywords ($Id$, $Revision$, $Log$) should not be + added. + + **DEFAULT_NO_BREAK** + switch default case is sometimes written as "default:;". This can + cause new cases added below default to be defective. + + A "break;" should be added after empty default statement to avoid + unwanted fallthrough. + **DOS_LINE_ENDINGS** For DOS-formatted patches, there are extra ^M symbols at the end of the line. These should be removed. - **EXECUTE_PERMISSIONS** - There is no reason for source files to be executable. The executable - bit can be removed safely. + **DT_SCHEMA_BINDING_PATCH** + DT bindings moved to a json-schema based format instead of + freeform text. - **NON_OCTAL_PERMISSIONS** - Permission bits should use 4 digit octal permissions (like 0700 or 0444). - Avoid using any other base like decimal. + See: https://www.kernel.org/doc/html/latest/devicetree/bindings/writing-schema.html + + **DT_SPLIT_BINDING_PATCH** + Devicetree bindings should be their own patch. This is because + bindings are logically independent from a driver implementation, + they have a different maintainer (even though they often + are applied via the same tree), and it makes for a cleaner history in the + DT only tree created with git-filter-branch. + + See: https://www.kernel.org/doc/html/latest/devicetree/bindings/submitting-patches.html#i-for-patch-submitters + + **EMBEDDED_FILENAME** + Embedding the complete filename path inside the file isn't particularly + useful as often the path is moved around and becomes incorrect. + + **FILE_PATH_CHANGES** + Whenever files are added, moved, or deleted, the MAINTAINERS file + patterns can be out of sync or outdated. + + So MAINTAINERS might need updating in these cases. + + **MEMSET** + The memset use appears to be incorrect. This may be caused due to + badly ordered parameters. Please recheck the usage. **NOT_UNIFIED_DIFF** The patch file does not appear to be in unified-diff format. Please @@ -742,14 +1157,12 @@ Others **PRINTF_0XDECIMAL** Prefixing 0x with decimal output is defective and should be corrected. - **TRAILING_STATEMENTS** - Trailing statements (for example after any conditional) should be - on the next line. - Like:: - - if (x == y) break; + **SPDX_LICENSE_TAG** + The source file is missing or has an improper SPDX identifier tag. + The Linux kernel requires the precise SPDX identifier in all source files, + and it is thoroughly documented in the kernel docs. - should be:: + See: https://www.kernel.org/doc/html/latest/process/license-rules.html - if (x == y) - break; + **TYPO_SPELLING** + Some words may have been misspelled. Consider reviewing them. diff --git a/Documentation/dev-tools/kunit/api/index.rst b/Documentation/dev-tools/kunit/api/index.rst index 9b9bffe5d41a..b33ad72bcf0b 100644 --- a/Documentation/dev-tools/kunit/api/index.rst +++ b/Documentation/dev-tools/kunit/api/index.rst @@ -10,7 +10,7 @@ API Reference This section documents the KUnit kernel testing API. It is divided into the following sections: -================================= ============================================== -:doc:`test` documents all of the standard testing API - excluding mocking or mocking related features. -================================= ============================================== +Documentation/dev-tools/kunit/api/test.rst + + - documents all of the standard testing API excluding mocking + or mocking related features. diff --git a/Documentation/dev-tools/kunit/faq.rst b/Documentation/dev-tools/kunit/faq.rst index 8d5029ad210a..5c6555d020f3 100644 --- a/Documentation/dev-tools/kunit/faq.rst +++ b/Documentation/dev-tools/kunit/faq.rst @@ -97,7 +97,7 @@ things to try. modules will automatically execute associated tests when loaded. Test results can be collected from ``/sys/kernel/debug/kunit/<test suite>/results``, and can be parsed with ``kunit.py parse``. For more details, see "KUnit on - non-UML architectures" in :doc:`usage`. + non-UML architectures" in Documentation/dev-tools/kunit/usage.rst. If none of the above tricks help, you are always welcome to email any issues to kunit-dev@googlegroups.com. diff --git a/Documentation/dev-tools/kunit/index.rst b/Documentation/dev-tools/kunit/index.rst index 848478838347..25d92a9a05ea 100644 --- a/Documentation/dev-tools/kunit/index.rst +++ b/Documentation/dev-tools/kunit/index.rst @@ -36,7 +36,7 @@ To make running these tests (and reading the results) easier, KUnit offers results. This provides a quick way of running KUnit tests during development, without requiring a virtual machine or separate hardware. -Get started now: :doc:`start` +Get started now: Documentation/dev-tools/kunit/start.rst Why KUnit? ========== @@ -88,9 +88,9 @@ it takes to read their test log? How do I use it? ================ -* :doc:`start` - for new users of KUnit -* :doc:`tips` - for short examples of best practices -* :doc:`usage` - for a more detailed explanation of KUnit features -* :doc:`api/index` - for the list of KUnit APIs used for testing -* :doc:`kunit-tool` - for more information on the kunit_tool helper script -* :doc:`faq` - for answers to some common questions about KUnit +* Documentation/dev-tools/kunit/start.rst - for new users of KUnit +* Documentation/dev-tools/kunit/tips.rst - for short examples of best practices +* Documentation/dev-tools/kunit/usage.rst - for a more detailed explanation of KUnit features +* Documentation/dev-tools/kunit/api/index.rst - for the list of KUnit APIs used for testing +* Documentation/dev-tools/kunit/kunit-tool.rst - for more information on the kunit_tool helper script +* Documentation/dev-tools/kunit/faq.rst - for answers to some common questions about KUnit diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst index 0e65cabe08eb..63ef7b625c13 100644 --- a/Documentation/dev-tools/kunit/start.rst +++ b/Documentation/dev-tools/kunit/start.rst @@ -21,7 +21,7 @@ The wrapper can be run with: ./tools/testing/kunit/kunit.py run For more information on this wrapper (also called kunit_tool) check out the -:doc:`kunit-tool` page. +Documentation/dev-tools/kunit/kunit-tool.rst page. Creating a .kunitconfig ----------------------- @@ -234,7 +234,7 @@ Congrats! You just wrote your first KUnit test! Next Steps ========== -* Check out the :doc:`tips` page for tips on +* Check out the Documentation/dev-tools/kunit/tips.rst page for tips on writing idiomatic KUnit tests. * Optional: see the :doc:`usage` page for a more in-depth explanation of KUnit. diff --git a/Documentation/dev-tools/kunit/tips.rst b/Documentation/dev-tools/kunit/tips.rst index 8d8c238f7f79..492d2ded2f5a 100644 --- a/Documentation/dev-tools/kunit/tips.rst +++ b/Documentation/dev-tools/kunit/tips.rst @@ -125,7 +125,8 @@ Here's a slightly in-depth example of how one could implement "mocking": Note: here we're able to get away with using ``test->priv``, but if you wanted -something more flexible you could use a named ``kunit_resource``, see :doc:`api/test`. +something more flexible you could use a named ``kunit_resource``, see +Documentation/dev-tools/kunit/api/test.rst. Failing the current test ------------------------ @@ -185,5 +186,5 @@ Alternatively, one can take full control over the error message by using ``KUNIT Next Steps ========== -* Optional: see the :doc:`usage` page for a more +* Optional: see the Documentation/dev-tools/kunit/usage.rst page for a more in-depth explanation of KUnit. diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst index 650f99590df5..3ee7ab91f712 100644 --- a/Documentation/dev-tools/kunit/usage.rst +++ b/Documentation/dev-tools/kunit/usage.rst @@ -10,7 +10,7 @@ understand it. This guide assumes a working knowledge of the Linux kernel and some basic knowledge of testing. For a high level introduction to KUnit, including setting up KUnit for your -project, see :doc:`start`. +project, see Documentation/dev-tools/kunit/start.rst. Organization of this document ============================= @@ -99,7 +99,8 @@ violated; however, the test will continue running, potentially trying other expectations until the test case ends or is otherwise terminated. This is as opposed to *assertions* which are discussed later. -To learn about more expectations supported by KUnit, see :doc:`api/test`. +To learn about more expectations supported by KUnit, see +Documentation/dev-tools/kunit/api/test.rst. .. note:: A single test case should be pretty short, pretty easy to understand, @@ -216,7 +217,8 @@ test suite in a special linker section so that it can be run by KUnit either after late_init, or when the test module is loaded (depending on whether the test was built in or not). -For more information on these types of things see the :doc:`api/test`. +For more information on these types of things see the +Documentation/dev-tools/kunit/api/test.rst. Common Patterns =============== diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst index b5b46709969c..65feb81edb14 100644 --- a/Documentation/dev-tools/testing-overview.rst +++ b/Documentation/dev-tools/testing-overview.rst @@ -71,15 +71,15 @@ can be used to verify that a test is executing particular functions or lines of code. This is useful for determining how much of the kernel is being tested, and for finding corner-cases which are not covered by the appropriate test. -:doc:`gcov` is GCC's coverage testing tool, which can be used with the kernel -to get global or per-module coverage. Unlike KCOV, it does not record per-task -coverage. Coverage data can be read from debugfs, and interpreted using the -usual gcov tooling. - -:doc:`kcov` is a feature which can be built in to the kernel to allow -capturing coverage on a per-task level. It's therefore useful for fuzzing and -other situations where information about code executed during, for example, a -single syscall is useful. +Documentation/dev-tools/gcov.rst is GCC's coverage testing tool, which can be +used with the kernel to get global or per-module coverage. Unlike KCOV, it +does not record per-task coverage. Coverage data can be read from debugfs, +and interpreted using the usual gcov tooling. + +Documentation/dev-tools/kcov.rst is a feature which can be built in to the +kernel to allow capturing coverage on a per-task level. It's therefore useful +for fuzzing and other situations where information about code executed during, +for example, a single syscall is useful. Dynamic Analysis Tools diff --git a/Documentation/devicetree/bindings/submitting-patches.rst b/Documentation/devicetree/bindings/submitting-patches.rst index 104fa8fb2c17..8087780f1685 100644 --- a/Documentation/devicetree/bindings/submitting-patches.rst +++ b/Documentation/devicetree/bindings/submitting-patches.rst @@ -7,8 +7,8 @@ Submitting Devicetree (DT) binding patches I. For patch submitters ======================= - 0) Normal patch submission rules from Documentation/process/submitting-patches.rst - applies. + 0) Normal patch submission rules from + Documentation/process/submitting-patches.rst applies. 1) The Documentation/ and include/dt-bindings/ portion of the patch should be a separate patch. The preferred subject prefix for binding patches is:: @@ -25,8 +25,8 @@ I. For patch submitters make dt_binding_check - See Documentation/devicetree/bindings/writing-schema.rst for more details about - schema and tools setup. + See Documentation/devicetree/bindings/writing-schema.rst for more details + about schema and tools setup. 3) DT binding files should be dual licensed. The preferred license tag is (GPL-2.0-only OR BSD-2-Clause). @@ -84,7 +84,8 @@ II. For kernel maintainers III. Notes ========== - 0) Please see :doc:`ABI` for details regarding devicetree ABI. + 0) Please see Documentation/devicetree/bindings/ABI.rst for details + regarding devicetree ABI. 1) This document is intended as a general familiarization with the process as decided at the 2013 Kernel Summit. When in doubt, the current word of the diff --git a/Documentation/doc-guide/contributing.rst b/Documentation/doc-guide/contributing.rst index 67ee3691f91f..207fd93d7c80 100644 --- a/Documentation/doc-guide/contributing.rst +++ b/Documentation/doc-guide/contributing.rst @@ -237,10 +237,10 @@ We have been trying to improve the situation through the creation of a set of "books" that group documentation for specific readers. These include: - - :doc:`../admin-guide/index` - - :doc:`../core-api/index` - - :doc:`../driver-api/index` - - :doc:`../userspace-api/index` + - Documentation/admin-guide/index.rst + - Documentation/core-api/index.rst + - Documentation/driver-api/index.rst + - Documentation/userspace-api/index.rst As well as this book on documentation itself. diff --git a/Documentation/driver-api/acpi/linuxized-acpica.rst b/Documentation/driver-api/acpi/linuxized-acpica.rst index 6bee03383225..cc234353d2c4 100644 --- a/Documentation/driver-api/acpi/linuxized-acpica.rst +++ b/Documentation/driver-api/acpi/linuxized-acpica.rst @@ -276,4 +276,4 @@ before they become available from the ACPICA release process. # git clone https://github.com/acpica/acpica # git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git # cd acpica - # generate/linux/divergences.sh -s ../linux + # generate/linux/divergence.sh -s ../linux diff --git a/Documentation/driver-api/gpio/using-gpio.rst b/Documentation/driver-api/gpio/using-gpio.rst index dda069444032..64c8d3f76c3a 100644 --- a/Documentation/driver-api/gpio/using-gpio.rst +++ b/Documentation/driver-api/gpio/using-gpio.rst @@ -9,13 +9,13 @@ with them. For examples of already existing generic drivers that will also be good examples for any other kernel drivers you want to author, refer to -:doc:`drivers-on-gpio` +Documentation/driver-api/gpio/drivers-on-gpio.rst For any kind of mass produced system you want to support, such as servers, laptops, phones, tablets, routers, and any consumer or office or business goods using appropriate kernel drivers is paramount. Submit your code for inclusion in the upstream Linux kernel when you feel it is mature enough and you will get -help to refine it, see :doc:`../../process/submitting-patches`. +help to refine it, see Documentation/process/submitting-patches.rst. In Linux GPIO lines also have a userspace ABI. diff --git a/Documentation/driver-api/ioctl.rst b/Documentation/driver-api/ioctl.rst index c455db0e1627..35795f6a151a 100644 --- a/Documentation/driver-api/ioctl.rst +++ b/Documentation/driver-api/ioctl.rst @@ -25,16 +25,16 @@ ioctl commands that follow modern conventions: ``_IO``, ``_IOR``, with the correct parameters: _IO/_IOR/_IOW/_IOWR - The macro name specifies how the argument will be used. It may be a + The macro name specifies how the argument will be used. It may be a pointer to data to be passed into the kernel (_IOW), out of the kernel - (_IOR), or both (_IOWR). _IO can indicate either commands with no + (_IOR), or both (_IOWR). _IO can indicate either commands with no argument or those passing an integer value instead of a pointer. It is recommended to only use _IO for commands without arguments, and use pointers for passing data. type An 8-bit number, often a character literal, specific to a subsystem - or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number` + or driver, and listed in Documentation/userspace-api/ioctl/ioctl-number.rst nr An 8-bit number identifying the specific command, unique for a give @@ -200,10 +200,10 @@ cause an information leak, which can be used to defeat kernel address space layout randomization (KASLR), helping in an attack. For this reason (and for compat support) it is best to avoid any -implicit padding in data structures. Where there is implicit padding +implicit padding in data structures. Where there is implicit padding in an existing structure, kernel drivers must be careful to fully initialize an instance of the structure before copying it to user -space. This is usually done by calling memset() before assigning to +space. This is usually done by calling memset() before assigning to individual members. Subsystem abstractions diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst index 6b3bfd29fd84..d448cb57df86 100644 --- a/Documentation/driver-api/pm/devices.rst +++ b/Documentation/driver-api/pm/devices.rst @@ -217,7 +217,7 @@ system-wide transition to a sleep state even though its :c:member:`runtime_auto` flag is clear. For more information about the runtime power management framework, refer to -:file:`Documentation/power/runtime_pm.rst`. +Documentation/power/runtime_pm.rst. Calling Drivers to Enter and Leave System Sleep States @@ -655,7 +655,7 @@ been thawed. Generally speaking, the PM notifiers are suitable for performing actions that either require user space to be available, or at least won't interfere with user space. -For details refer to :doc:`notifiers`. +For details refer to Documentation/driver-api/pm/notifiers.rst. Device Low-Power (suspend) States @@ -726,7 +726,7 @@ it into account in any way. Devices may be defined as IRQ-safe which indicates to the PM core that their runtime PM callbacks may be invoked with disabled interrupts (see -:file:`Documentation/power/runtime_pm.rst` for more information). If an +Documentation/power/runtime_pm.rst for more information). If an IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be disallowed, unless the domain itself is defined as IRQ-safe. However, it makes sense to define a PM domain as IRQ-safe only if all the devices in it @@ -805,7 +805,7 @@ The ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag -------------------------------------------- During system-wide resume from a sleep state it's easiest to put devices into -the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`. +the full-power state, as explained in Documentation/power/runtime_pm.rst. [Refer to that document for more information regarding this particular issue as well as for information on the device runtime power management framework in general.] However, it often is desirable to leave devices in suspend after diff --git a/Documentation/driver-api/surface_aggregator/clients/index.rst b/Documentation/driver-api/surface_aggregator/clients/index.rst index 98ea9946b8a2..30160513afa5 100644 --- a/Documentation/driver-api/surface_aggregator/clients/index.rst +++ b/Documentation/driver-api/surface_aggregator/clients/index.rst @@ -5,7 +5,8 @@ Client Driver Documentation =========================== This is the documentation for client drivers themselves. Refer to -:doc:`../client` for documentation on how to write client drivers. +Documentation/driver-api/surface_aggregator/client.rst for documentation +on how to write client drivers. .. toctree:: :maxdepth: 1 diff --git a/Documentation/driver-api/surface_aggregator/internal.rst b/Documentation/driver-api/surface_aggregator/internal.rst index 72704734982a..8c7c80c9f418 100644 --- a/Documentation/driver-api/surface_aggregator/internal.rst +++ b/Documentation/driver-api/surface_aggregator/internal.rst @@ -87,10 +87,11 @@ native SSAM devices, i.e. devices that are not defined in ACPI and not implemented as platform devices, via |ssam_device| and |ssam_device_driver| simplify management of client devices and client drivers. -Refer to :doc:`client` for documentation regarding the client device/driver -API and interface options for other kernel drivers. It is recommended to -familiarize oneself with that chapter and the :doc:`ssh` before continuing -with the architectural overview below. +Refer to Documentation/driver-api/surface_aggregator/client.rst for +documentation regarding the client device/driver API and interface options +for other kernel drivers. It is recommended to familiarize oneself with +that chapter and the Documentation/driver-api/surface_aggregator/ssh.rst +before continuing with the architectural overview below. Packet Transport Layer @@ -190,9 +191,9 @@ with success on the transmitter thread. Transmission of sequenced packets is limited by the number of concurrently pending packets, i.e. a limit on how many packets may be waiting for an ACK -from the EC in parallel. This limit is currently set to one (see :doc:`ssh` -for the reasoning behind this). Control packets (i.e. ACK and NAK) can -always be transmitted. +from the EC in parallel. This limit is currently set to one (see +Documentation/driver-api/surface_aggregator/ssh.rst for the reasoning behind +this). Control packets (i.e. ACK and NAK) can always be transmitted. Receiver Thread --------------- diff --git a/Documentation/driver-api/surface_aggregator/overview.rst b/Documentation/driver-api/surface_aggregator/overview.rst index 1e9d57e50063..26415e1ab7da 100644 --- a/Documentation/driver-api/surface_aggregator/overview.rst +++ b/Documentation/driver-api/surface_aggregator/overview.rst @@ -73,5 +73,7 @@ being a direct response to a previous request. We may also refer to requests without response as commands. In general, events need to be enabled via one of multiple dedicated requests before they are sent by the EC. -See :doc:`ssh` for a more technical protocol documentation and -:doc:`internal` for an overview of the internal driver architecture. +See Documentation/driver-api/surface_aggregator/ssh.rst for a +more technical protocol documentation and +Documentation/driver-api/surface_aggregator/internal.rst for an +overview of the internal driver architecture. diff --git a/Documentation/driver-api/usb/dma.rst b/Documentation/driver-api/usb/dma.rst index 2b3dbd3265b4..d32c27e11b90 100644 --- a/Documentation/driver-api/usb/dma.rst +++ b/Documentation/driver-api/usb/dma.rst @@ -10,7 +10,7 @@ API overview The big picture is that USB drivers can continue to ignore most DMA issues, though they still must provide DMA-ready buffers (see -:doc:`/core-api/dma-api-howto`). That's how they've worked through +Documentation/core-api/dma-api-howto.rst). That's how they've worked through the 2.4 (and earlier) kernels, or they can now be DMA-aware. DMA-aware usb drivers: @@ -60,7 +60,7 @@ and effects like cache-trashing can impose subtle penalties. force a consistent memory access ordering by using memory barriers. It's not using a streaming DMA mapping, so it's good for small transfers on systems where the I/O would otherwise thrash an IOMMU mapping. (See - :doc:`/core-api/dma-api-howto` for definitions of "coherent" and + Documentation/core-api/dma-api-howto.rst for definitions of "coherent" and "streaming" DMA mappings.) Asking for 1/Nth of a page (as well as asking for N pages) is reasonably @@ -91,7 +91,7 @@ Working with existing buffers Existing buffers aren't usable for DMA without first being mapped into the DMA address space of the device. However, most buffers passed to your driver can safely be used with such DMA mapping. (See the first section -of :doc:`/core-api/dma-api-howto`, titled "What memory is DMA-able?") +of Documentation/core-api/dma-api-howto.rst, titled "What memory is DMA-able?") - When you're using scatterlists, you can map everything at once. On some systems, this kicks in an IOMMU and turns the scatterlists into single diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst index 31ecfe44e5b4..f47d05ed0d94 100644 --- a/Documentation/fault-injection/fault-injection.rst +++ b/Documentation/fault-injection/fault-injection.rst @@ -78,8 +78,10 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail*/times: - specifies how many times failures may happen at most. - A value of -1 means "no limit". + specifies how many times failures may happen at most. A value of -1 + means "no limit". Note, though, that this file only accepts unsigned + values. So, if you want to specify -1, you better use 'printf' instead + of 'echo', e.g.: $ printf %#x -1 > times - /sys/kernel/debug/fail*/space: @@ -167,11 +169,13 @@ configuration of fault-injection capabilities. - ERRNO: retval must be -1 to -MAX_ERRNO (-4096). - ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4096). -- /sys/kernel/debug/fail_function/<functiuon-name>/retval: +- /sys/kernel/debug/fail_function/<function-name>/retval: - specifies the "error" return value to inject to the given - function for given function. This will be created when - user specifies new injection entry. + specifies the "error" return value to inject to the given function. + This will be created when the user specifies a new injection entry. + Note that this file only accepts unsigned values. So, if you want to + use a negative errno, you better use 'printf' instead of 'echo', e.g.: + $ printf %#x -12 > retval Boot option ^^^^^^^^^^^ @@ -255,7 +259,7 @@ Application Examples echo Y > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 100 > /sys/kernel/debug/$FAILTYPE/interval - echo -1 > /sys/kernel/debug/$FAILTYPE/times + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 2 > /sys/kernel/debug/$FAILTYPE/verbose echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait @@ -309,7 +313,7 @@ Application Examples echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 100 > /sys/kernel/debug/$FAILTYPE/interval - echo -1 > /sys/kernel/debug/$FAILTYPE/times + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 2 > /sys/kernel/debug/$FAILTYPE/verbose echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait @@ -336,11 +340,11 @@ Application Examples FAILTYPE=fail_function FAILFUNC=open_ctree echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject - echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval + printf %#x -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 100 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval - echo -1 > /sys/kernel/debug/$FAILTYPE/times + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst new file mode 100644 index 000000000000..9a1b8fd9e82b --- /dev/null +++ b/Documentation/filesystems/dax.rst @@ -0,0 +1,291 @@ +======================= +Direct Access for files +======================= + +Motivation +---------- + +The page cache is usually used to buffer reads and writes to files. +It is also used to provide the pages which are mapped into userspace +by a call to mmap. + +For block devices that are memory-like, the page cache pages would be +unnecessary copies of the original storage. The `DAX` code removes the +extra copy by performing reads and writes directly to the storage device. +For file mappings, the storage device is mapped directly into userspace. + + +Usage +----- + +If you have a block device which supports `DAX`, you can make a filesystem +on it as usual. The `DAX` code currently only supports files with a block +size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block +size when creating the filesystem. + +Currently 3 filesystems support `DAX`: ext2, ext4 and xfs. Enabling `DAX` on them +is different. + +Enabling DAX on ext2 +-------------------- + +When mounting the filesystem, use the ``-o dax`` option on the command line or +add 'dax' to the options in ``/etc/fstab``. This works to enable `DAX` on all files +within the filesystem. It is equivalent to the ``-o dax=always`` behavior below. + + +Enabling DAX on xfs and ext4 +---------------------------- + +Summary +------- + + 1. There exists an in-kernel file access mode flag `S_DAX` that corresponds to + the statx flag `STATX_ATTR_DAX`. See the manpage for statx(2) for details + about this access mode. + + 2. There exists a persistent flag `FS_XFLAG_DAX` that can be applied to regular + files and directories. This advisory flag can be set or cleared at any + time, but doing so does not immediately affect the `S_DAX` state. + + 3. If the persistent `FS_XFLAG_DAX` flag is set on a directory, this flag will + be inherited by all regular files and subdirectories that are subsequently + created in this directory. Files and subdirectories that exist at the time + this flag is set or cleared on the parent directory are not modified by + this modification of the parent directory. + + 4. There exist dax mount options which can override `FS_XFLAG_DAX` in the + setting of the `S_DAX` flag. Given underlying storage which supports `DAX` the + following hold: + + ``-o dax=inode`` means "follow `FS_XFLAG_DAX`" and is the default. + + ``-o dax=never`` means "never set `S_DAX`, ignore `FS_XFLAG_DAX`." + + ``-o dax=always`` means "always set `S_DAX` ignore `FS_XFLAG_DAX`." + + ``-o dax`` is a legacy option which is an alias for ``dax=always``. + + .. warning:: + + The option ``-o dax`` may be removed in the future so ``-o dax=always`` is + the preferred method for specifying this behavior. + + .. note:: + + Modifications to and the inheritance behavior of `FS_XFLAG_DAX` remain + the same even when the filesystem is mounted with a dax option. However, + in-core inode state (`S_DAX`) will be overridden until the filesystem is + remounted with dax=inode and the inode is evicted from kernel memory. + + 5. The `S_DAX` policy can be changed via: + + a) Setting the parent directory `FS_XFLAG_DAX` as needed before files are + created + + b) Setting the appropriate dax="foo" mount option + + c) Changing the `FS_XFLAG_DAX` flag on existing regular files and + directories. This has runtime constraints and limitations that are + described in 6) below. + + 6. When changing the `S_DAX` policy via toggling the persistent `FS_XFLAG_DAX` + flag, the change to existing regular files won't take effect until the + files are closed by all processes. + + +Details +------- + +There are 2 per-file dax flags. One is a persistent inode setting (`FS_XFLAG_DAX`) +and the other is a volatile flag indicating the active state of the feature +(`S_DAX`). + +`FS_XFLAG_DAX` is preserved within the filesystem. This persistent config +setting can be set, cleared and/or queried using the `FS_IOC_FS`[`GS`]`ETXATTR` ioctl +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'. + +New files and directories automatically inherit `FS_XFLAG_DAX` from +their parent directory **when created**. Therefore, setting `FS_XFLAG_DAX` at +directory creation time can be used to set a default behavior for an entire +sub-tree. + +To clarify inheritance, here are 3 examples: + +Example A: + +.. code-block:: shell + + mkdir -p a/b/c + xfs_io -c 'chattr +x' a + mkdir a/b/c/d + mkdir a/e + + ------[outcome]------ + + dax: a,e + no dax: b,c,d + +Example B: + +.. code-block:: shell + + mkdir a + xfs_io -c 'chattr +x' a + mkdir -p a/b/c/d + + ------[outcome]------ + + dax: a,b,c,d + no dax: + +Example C: + +.. code-block:: shell + + mkdir -p a/b/c + xfs_io -c 'chattr +x' c + mkdir a/b/c/d + + ------[outcome]------ + + dax: c,d + no dax: a,b + +The current enabled state (`S_DAX`) is set when a file inode is instantiated in +memory by the kernel. It is set based on the underlying media support, the +value of `FS_XFLAG_DAX` and the filesystem's dax mount option. + +statx can be used to query `S_DAX`. + +.. note:: + + That only regular files will ever have `S_DAX` set and therefore statx + will never indicate that `S_DAX` is set on directories. + +Setting the `FS_XFLAG_DAX` flag (specifically or through inheritance) occurs even +if the underlying media does not support dax and/or the filesystem is +overridden with a mount option. + + +Implementation Tips for Block Driver Writers +-------------------------------------------- + +To support `DAX` in your block driver, implement the 'direct_access' +block device operation. It is used to translate the sector number +(expressed in units of 512-byte sectors) to a page frame number (pfn) +that identifies the physical page for the memory. It also returns a +kernel virtual address that can be used to access the memory. + +The direct_access method takes a 'size' parameter that indicates the +number of bytes being requested. The function should return the number +of bytes that can be contiguously accessed at that offset. It may also +return a negative errno if an error occurs. + +In order to support this method, the storage must be byte-accessible by +the CPU at all times. If your device uses paging techniques to expose +a large amount of memory through a smaller window, then you cannot +implement direct_access. Equally, if your device can occasionally +stall the CPU for an extended period, you should also not attempt to +implement direct_access. + +These block devices may be used for inspiration: +- brd: RAM backed block device driver +- dcssblk: s390 dcss block device driver +- pmem: NVDIMM persistent memory driver + + +Implementation Tips for Filesystem Writers +------------------------------------------ + +Filesystem support consists of: + +* Adding support to mark inodes as being `DAX` by setting the `S_DAX` flag in + i_flags +* Implementing ->read_iter and ->write_iter operations which use + :c:func:`dax_iomap_rw()` when inode has `S_DAX` flag set +* Implementing an mmap file operation for `DAX` files which sets the + `VM_MIXEDMAP` and `VM_HUGEPAGE` flags on the `VMA`, and setting the vm_ops to + include handlers for fault, pmd_fault, page_mkwrite, pfn_mkwrite. These + handlers should probably call :c:func:`dax_iomap_fault()` passing the + appropriate fault size and iomap operations. +* Calling :c:func:`iomap_zero_range()` passing appropriate iomap operations + instead of :c:func:`block_truncate_page()` for `DAX` files +* Ensuring that there is sufficient locking between reads, writes, + truncates and page faults + +The iomap handlers for allocating blocks must make sure that allocated blocks +are zeroed out and converted to written extents before being returned to avoid +exposure of uninitialized data through mmap. + +These filesystems may be used for inspiration: + +.. seealso:: + + ext2: see Documentation/filesystems/ext2.rst + +.. seealso:: + + xfs: see Documentation/admin-guide/xfs.rst + +.. seealso:: + + ext4: see Documentation/filesystems/ext4/ + + +Handling Media Errors +--------------------- + +The libnvdimm subsystem stores a record of known media error locations for +each pmem block device (in gendisk->badblocks). If we fault at such location, +or one with a latent error not yet discovered, the application can expect +to receive a `SIGBUS`. Libnvdimm also allows clearing of these errors by simply +writing the affected sectors (through the pmem driver, and if the underlying +NVDIMM supports the clear_poison DSM defined by ACPI). + +Since `DAX` IO normally doesn't go through the ``driver/bio`` path, applications or +sysadmins have an option to restore the lost data from a prior ``backup/inbuilt`` +redundancy in the following ways: + +1. Delete the affected file, and restore from a backup (sysadmin route): + This will free the filesystem blocks that were being used by the file, + and the next time they're allocated, they will be zeroed first, which + happens through the driver, and will clear bad sectors. + +2. Truncate or hole-punch the part of the file that has a bad-block (at least + an entire aligned sector has to be hole-punched, but not necessarily an + entire filesystem block). + +These are the two basic paths that allow `DAX` filesystems to continue operating +in the presence of media errors. More robust error recovery mechanisms can be +built on top of this in the future, for example, involving redundancy/mirroring +provided at the block layer through DM, or additionally, at the filesystem +level. These would have to rely on the above two tenets, that error clearing +can happen either by sending an IO through the driver, or zeroing (also through +the driver). + + +Shortcomings +------------ + +Even if the kernel or its modules are stored on a filesystem that supports +`DAX` on a block device that supports `DAX`, they will still be copied into RAM. + +The DAX code does not work correctly on architectures which have virtually +mapped caches such as ARM, MIPS and SPARC. + +Calling :c:func:`get_user_pages()` on a range of user memory that has been +mmaped from a `DAX` file will fail when there are no 'struct page' to describe +those pages. This problem has been addressed in some device drivers +by adding optional struct page support for pages under the control of +the driver (see `CONFIG_NVDIMM_PFN` in ``drivers/nvdimm`` for an example of +how to do this). In the non struct page cases `O_DIRECT` reads/writes to +those memory ranges from a non-`DAX` file will fail + + +.. note:: + + `O_DIRECT` reads/writes _of a `DAX` file do work, it is the memory that + is being accessed that is key here). Other things that will not work in + the non struct page case include RDMA, :c:func:`sendfile()` and + :c:func:`splice()`. diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt deleted file mode 100644 index e03c20564f3a..000000000000 --- a/Documentation/filesystems/dax.txt +++ /dev/null @@ -1,257 +0,0 @@ -Direct Access for files ------------------------ - -Motivation ----------- - -The page cache is usually used to buffer reads and writes to files. -It is also used to provide the pages which are mapped into userspace -by a call to mmap. - -For block devices that are memory-like, the page cache pages would be -unnecessary copies of the original storage. The DAX code removes the -extra copy by performing reads and writes directly to the storage device. -For file mappings, the storage device is mapped directly into userspace. - - -Usage ------ - -If you have a block device which supports DAX, you can make a filesystem -on it as usual. The DAX code currently only supports files with a block -size equal to your kernel's PAGE_SIZE, so you may need to specify a block -size when creating the filesystem. - -Currently 3 filesystems support DAX: ext2, ext4 and xfs. Enabling DAX on them -is different. - -Enabling DAX on ext2 ------------------------------ - -When mounting the filesystem, use the "-o dax" option on the command line or -add 'dax' to the options in /etc/fstab. This works to enable DAX on all files -within the filesystem. It is equivalent to the '-o dax=always' behavior below. - - -Enabling DAX on xfs and ext4 ----------------------------- - -Summary -------- - - 1. There exists an in-kernel file access mode flag S_DAX that corresponds to - the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details - about this access mode. - - 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular - files and directories. This advisory flag can be set or cleared at any - time, but doing so does not immediately affect the S_DAX state. - - 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will - be inherited by all regular files and subdirectories that are subsequently - created in this directory. Files and subdirectories that exist at the time - this flag is set or cleared on the parent directory are not modified by - this modification of the parent directory. - - 4. There exist dax mount options which can override FS_XFLAG_DAX in the - setting of the S_DAX flag. Given underlying storage which supports DAX the - following hold: - - "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default. - - "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX." - - "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX." - - "-o dax" is a legacy option which is an alias for "dax=always". - This may be removed in the future so "-o dax=always" is - the preferred method for specifying this behavior. - - NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain - the same even when the filesystem is mounted with a dax option. However, - in-core inode state (S_DAX) will be overridden until the filesystem is - remounted with dax=inode and the inode is evicted from kernel memory. - - 5. The S_DAX policy can be changed via: - - a) Setting the parent directory FS_XFLAG_DAX as needed before files are - created - - b) Setting the appropriate dax="foo" mount option - - c) Changing the FS_XFLAG_DAX flag on existing regular files and - directories. This has runtime constraints and limitations that are - described in 6) below. - - 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX - flag, the change to existing regular files won't take effect until the - files are closed by all processes. - - -Details -------- - -There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX) -and the other is a volatile flag indicating the active state of the feature -(S_DAX). - -FS_XFLAG_DAX is preserved within the filesystem. This persistent config -setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl -(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'. - -New files and directories automatically inherit FS_XFLAG_DAX from -their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at -directory creation time can be used to set a default behavior for an entire -sub-tree. - -To clarify inheritance, here are 3 examples: - -Example A: - -mkdir -p a/b/c -xfs_io -c 'chattr +x' a -mkdir a/b/c/d -mkdir a/e - - dax: a,e - no dax: b,c,d - -Example B: - -mkdir a -xfs_io -c 'chattr +x' a -mkdir -p a/b/c/d - - dax: a,b,c,d - no dax: - -Example C: - -mkdir -p a/b/c -xfs_io -c 'chattr +x' c -mkdir a/b/c/d - - dax: c,d - no dax: a,b - - -The current enabled state (S_DAX) is set when a file inode is instantiated in -memory by the kernel. It is set based on the underlying media support, the -value of FS_XFLAG_DAX and the filesystem's dax mount option. - -statx can be used to query S_DAX. NOTE that only regular files will ever have -S_DAX set and therefore statx will never indicate that S_DAX is set on -directories. - -Setting the FS_XFLAG_DAX flag (specifically or through inheritance) occurs even -if the underlying media does not support dax and/or the filesystem is -overridden with a mount option. - - - -Implementation Tips for Block Driver Writers --------------------------------------------- - -To support DAX in your block driver, implement the 'direct_access' -block device operation. It is used to translate the sector number -(expressed in units of 512-byte sectors) to a page frame number (pfn) -that identifies the physical page for the memory. It also returns a -kernel virtual address that can be used to access the memory. - -The direct_access method takes a 'size' parameter that indicates the -number of bytes being requested. The function should return the number -of bytes that can be contiguously accessed at that offset. It may also -return a negative errno if an error occurs. - -In order to support this method, the storage must be byte-accessible by -the CPU at all times. If your device uses paging techniques to expose -a large amount of memory through a smaller window, then you cannot -implement direct_access. Equally, if your device can occasionally -stall the CPU for an extended period, you should also not attempt to -implement direct_access. - -These block devices may be used for inspiration: -- brd: RAM backed block device driver -- dcssblk: s390 dcss block device driver -- pmem: NVDIMM persistent memory driver - - -Implementation Tips for Filesystem Writers ------------------------------------------- - -Filesystem support consists of -- adding support to mark inodes as being DAX by setting the S_DAX flag in - i_flags -- implementing ->read_iter and ->write_iter operations which use dax_iomap_rw() - when inode has S_DAX flag set -- implementing an mmap file operation for DAX files which sets the - VM_MIXEDMAP and VM_HUGEPAGE flags on the VMA, and setting the vm_ops to - include handlers for fault, pmd_fault, page_mkwrite, pfn_mkwrite. These - handlers should probably call dax_iomap_fault() passing the appropriate - fault size and iomap operations. -- calling iomap_zero_range() passing appropriate iomap operations instead of - block_truncate_page() for DAX files -- ensuring that there is sufficient locking between reads, writes, - truncates and page faults - -The iomap handlers for allocating blocks must make sure that allocated blocks -are zeroed out and converted to written extents before being returned to avoid -exposure of uninitialized data through mmap. - -These filesystems may be used for inspiration: -- ext2: see Documentation/filesystems/ext2.rst -- ext4: see Documentation/filesystems/ext4/ -- xfs: see Documentation/admin-guide/xfs.rst - - -Handling Media Errors ---------------------- - -The libnvdimm subsystem stores a record of known media error locations for -each pmem block device (in gendisk->badblocks). If we fault at such location, -or one with a latent error not yet discovered, the application can expect -to receive a SIGBUS. Libnvdimm also allows clearing of these errors by simply -writing the affected sectors (through the pmem driver, and if the underlying -NVDIMM supports the clear_poison DSM defined by ACPI). - -Since DAX IO normally doesn't go through the driver/bio path, applications or -sysadmins have an option to restore the lost data from a prior backup/inbuilt -redundancy in the following ways: - -1. Delete the affected file, and restore from a backup (sysadmin route): - This will free the filesystem blocks that were being used by the file, - and the next time they're allocated, they will be zeroed first, which - happens through the driver, and will clear bad sectors. - -2. Truncate or hole-punch the part of the file that has a bad-block (at least - an entire aligned sector has to be hole-punched, but not necessarily an - entire filesystem block). - -These are the two basic paths that allow DAX filesystems to continue operating -in the presence of media errors. More robust error recovery mechanisms can be -built on top of this in the future, for example, involving redundancy/mirroring -provided at the block layer through DM, or additionally, at the filesystem -level. These would have to rely on the above two tenets, that error clearing -can happen either by sending an IO through the driver, or zeroing (also through -the driver). - - -Shortcomings ------------- - -Even if the kernel or its modules are stored on a filesystem that supports -DAX on a block device that supports DAX, they will still be copied into RAM. - -The DAX code does not work correctly on architectures which have virtually -mapped caches such as ARM, MIPS and SPARC. - -Calling get_user_pages() on a range of user memory that has been mmaped -from a DAX file will fail when there are no 'struct page' to describe -those pages. This problem has been addressed in some device drivers -by adding optional struct page support for pages under the control of -the driver (see CONFIG_NVDIMM_PFN in drivers/nvdimm for an example of -how to do this). In the non struct page cases O_DIRECT reads/writes to -those memory ranges from a non-DAX file will fail (note that O_DIRECT -reads/writes _of a DAX file_ do work, it is the memory that is being -accessed that is key here). Other things that will not work in the -non struct page case include RDMA, sendfile() and splice(). diff --git a/Documentation/filesystems/ext2.rst b/Documentation/filesystems/ext2.rst index c2fce22cfd03..154101cf0e4f 100644 --- a/Documentation/filesystems/ext2.rst +++ b/Documentation/filesystems/ext2.rst @@ -25,7 +25,7 @@ check=none, nocheck (*) Don't do extra checking of bitmaps on mount (check=normal and check=strict options removed) dax Use direct access (no page cache). See - Documentation/filesystems/dax.txt. + Documentation/filesystems/dax.rst. debug Extra debugging information is sent to the kernel syslog. Useful for developers. diff --git a/Documentation/filesystems/ext4/blockgroup.rst b/Documentation/filesystems/ext4/blockgroup.rst index 3da156633339..d5d652addce5 100644 --- a/Documentation/filesystems/ext4/blockgroup.rst +++ b/Documentation/filesystems/ext4/blockgroup.rst @@ -84,7 +84,7 @@ Without the option META\_BG, for safety concerns, all block group descriptors copies are kept in the first block group. Given the default 128MiB(2^27 bytes) block group size and 64-byte group descriptors, ext4 can have at most 2^27/64 = 2^21 block groups. This limits the entire -filesystem size to 2^21 ∗ 2^27 = 2^48bytes or 256TiB. +filesystem size to 2^21 * 2^27 = 2^48bytes or 256TiB. The solution to this problem is to use the metablock group feature (META\_BG), which is already in ext3 for all 2.6 releases. With the diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index d4853cb919d2..246af51b277a 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -77,6 +77,7 @@ Documentation for filesystem implementations. coda configfs cramfs + dax debugfs dlmfs ecryptfs diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst index c482e1619e77..a6fa7619b69e 100644 --- a/Documentation/filesystems/path-lookup.rst +++ b/Documentation/filesystems/path-lookup.rst @@ -448,15 +448,17 @@ described. If it finds a ``LAST_NORM`` component it first calls filesystem to revalidate the result if it is that sort of filesystem. If that doesn't get a good result, it calls "``lookup_slow()``" which takes ``i_rwsem``, rechecks the cache, and then asks the filesystem -to find a definitive answer. Each of these will call -``follow_managed()`` (as described below) to handle any mount points. - -In the absence of symbolic links, ``walk_component()`` creates a new -``struct path`` containing a counted reference to the new dentry and a -reference to the new ``vfsmount`` which is only counted if it is -different from the previous ``vfsmount``. It then calls -``path_to_nameidata()`` to install the new ``struct path`` in the -``struct nameidata`` and drop the unneeded references. +to find a definitive answer. + +As the last step of walk_component(), step_into() will be called either +directly from walk_component() or from handle_dots(). It calls +handle_mounts(), to check and handle mount points, in which a new +``struct path`` is created containing a counted reference to the new dentry and +a reference to the new ``vfsmount`` which is only counted if it is +different from the previous ``vfsmount``. Then if there is +a symbolic link, step_into() calls pick_link() to deal with it, +otherwise it installs the new ``struct path`` in the ``struct nameidata``, and +drops the unneeded references. This "hand-over-hand" sequencing of getting a reference to the new dentry before dropping the reference to the previous dentry may @@ -470,8 +472,8 @@ Handling the final component ``nd->last_type`` to refer to the final component of the path. It does not call ``walk_component()`` that last time. Handling that final component remains for the caller to sort out. Those callers are -``path_lookupat()``, ``path_parentat()``, ``path_mountpoint()`` and -``path_openat()`` each of which handles the differing requirements of +path_lookupat(), path_parentat() and +path_openat() each of which handles the differing requirements of different system calls. ``path_parentat()`` is clearly the simplest - it just wraps a little bit @@ -486,20 +488,18 @@ perform their operation. object is wanted such as by ``stat()`` or ``chmod()``. It essentially just calls ``walk_component()`` on the final component through a call to ``lookup_last()``. ``path_lookupat()`` returns just the final dentry. - -``path_mountpoint()`` handles the special case of unmounting which must -not try to revalidate the mounted filesystem. It effectively -contains, through a call to ``mountpoint_last()``, an alternate -implementation of ``lookup_slow()`` which skips that step. This is -important when unmounting a filesystem that is inaccessible, such as +It is worth noting that when flag ``LOOKUP_MOUNTPOINT`` is set, +path_lookupat() will unset LOOKUP_JUMPED in nameidata so that in the +subsequent path traversal d_weak_revalidate() won't be called. +This is important when unmounting a filesystem that is inaccessible, such as one provided by a dead NFS server. Finally ``path_openat()`` is used for the ``open()`` system call; it -contains, in support functions starting with "``do_last()``", all the +contains, in support functions starting with "open_last_lookups()", all the complexity needed to handle the different subtleties of O_CREAT (with or without O_EXCL), final "``/``" characters, and trailing symbolic links. We will revisit this in the final part of this series, which -focuses on those symbolic links. "``do_last()``" will sometimes, but +focuses on those symbolic links. "open_last_lookups()" will sometimes, but not always, take ``i_rwsem``, depending on what it finds. Each of these, or the functions which call them, need to be alert to @@ -535,8 +535,7 @@ covered in greater detail in autofs.txt in the Linux documentation tree, but a few notes specifically related to path lookup are in order here. -The Linux VFS has a concept of "managed" dentries which is reflected -in function names such as "``follow_managed()``". There are three +The Linux VFS has a concept of "managed" dentries. There are three potentially interesting things about these dentries corresponding to three different flags that might be set in ``dentry->d_flags``: @@ -652,10 +651,10 @@ RCU-walk finds it cannot stop gracefully, it simply gives up and restarts from the top with REF-walk. This pattern of "try RCU-walk, if that fails try REF-walk" can be -clearly seen in functions like ``filename_lookup()``, -``filename_parentat()``, ``filename_mountpoint()``, -``do_filp_open()``, and ``do_file_open_root()``. These five -correspond roughly to the four ``path_*()`` functions we met earlier, +clearly seen in functions like filename_lookup(), +filename_parentat(), +do_filp_open(), and do_file_open_root(). These four +correspond roughly to the three ``path_*()`` functions we met earlier, each of which calls ``link_path_walk()``. The ``path_*()`` functions are called using different mode flags until a mode is found which works. They are first called with ``LOOKUP_RCU`` set to request "RCU-walk". If @@ -993,8 +992,8 @@ is 4096. There are a number of reasons for this limit; not letting the kernel spend too much time on just one path is one of them. With symbolic links you can effectively generate much longer paths so some sort of limit is needed for the same reason. Linux imposes a limit of -at most 40 symlinks in any one path lookup. It previously imposed a -further limit of eight on the maximum depth of recursion, but that was +at most 40 (MAXSYMLINKS) symlinks in any one path lookup. It previously imposed +a further limit of eight on the maximum depth of recursion, but that was raised to 40 when a separate stack was implemented, so there is now just the one limit. @@ -1061,42 +1060,26 @@ filesystem cannot successfully get a reference in RCU-walk mode, it must return ``-ECHILD`` and ``unlazy_walk()`` will be called to return to REF-walk mode in which the filesystem is allowed to sleep. -The place for all this to happen is the ``i_op->follow_link()`` inode -method. In the present mainline code this is never actually called in -RCU-walk mode as the rewrite is not quite complete. It is likely that -in a future release this method will be passed an ``inode`` pointer when -called in RCU-walk mode so it both (1) knows to be careful, and (2) has the -validated pointer. Much like the ``i_op->permission()`` method we -looked at previously, ``->follow_link()`` would need to be careful that +The place for all this to happen is the ``i_op->get_link()`` inode +method. This is called both in RCU-walk and REF-walk. In RCU-walk the +``dentry*`` argument is NULL, ``->get_link()`` can return -ECHILD to drop out of +RCU-walk. Much like the ``i_op->permission()`` method we +looked at previously, ``->get_link()`` would need to be careful that all the data structures it references are safe to be accessed while -holding no counted reference, only the RCU lock. Though getting a -reference with ``->follow_link()`` is not yet done in RCU-walk mode, the -code is ready to release the reference when that does happen. - -This need to drop the reference to a symlink adds significant -complexity. It requires a reference to the inode so that the -``i_op->put_link()`` inode operation can be called. In REF-walk, that -reference is kept implicitly through a reference to the dentry, so -keeping the ``struct path`` of the symlink is easiest. For RCU-walk, -the pointer to the inode is kept separately. To allow switching from -RCU-walk back to REF-walk in the middle of processing nested symlinks -we also need the seq number for the dentry so we can confirm that -switching back was safe. - -Finally, when providing a reference to a symlink, the filesystem also -provides an opaque "cookie" that must be passed to ``->put_link()`` so that it -knows what to free. This might be the allocated memory area, or a -pointer to the ``struct page`` in the page cache, or something else -completely. Only the filesystem knows what it is. +holding no counted reference, only the RCU lock. A callback +``struct delayed_called`` will be passed to ``->get_link()``: +file systems can set their own put_link function and argument through +set_delayed_call(). Later on, when VFS wants to put link, it will call +do_delayed_call() to invoke that callback function with the argument. In order for the reference to each symlink to be dropped when the walk completes, whether in RCU-walk or REF-walk, the symlink stack needs to contain, along with the path remnants: -- the ``struct path`` to provide a reference to the inode in REF-walk -- the ``struct inode *`` to provide a reference to the inode in RCU-walk +- the ``struct path`` to provide a reference to the previous path +- the ``const char *`` to provide a reference to the to previous name - the ``seq`` to allow the path to be safely switched from RCU-walk to REF-walk -- the ``cookie`` that tells ``->put_path()`` what to put. +- the ``struct delayed_call`` for later invocation. This means that each entry in the symlink stack needs to hold five pointers and an integer instead of just one pointer (the path @@ -1120,12 +1103,10 @@ doesn't need to notice. Getting this ``name`` variable on and off the stack is very straightforward; pushing and popping the references is a little more complex. -When a symlink is found, ``walk_component()`` returns the value ``1`` -(``0`` is returned for any other sort of success, and a negative number -is, as usual, an error indicator). This causes ``get_link()`` to be -called; it then gets the link from the filesystem. Providing that -operation is successful, the old path ``name`` is placed on the stack, -and the new value is used as the ``name`` for a while. When the end of +When a symlink is found, walk_component() calls pick_link() via step_into() +which returns the link from the filesystem. +Providing that operation is successful, the old path ``name`` is placed on the +stack, and the new value is used as the ``name`` for a while. When the end of the path is found (i.e. ``*name`` is ``'\0'``) the old ``name`` is restored off the stack and path walking continues. @@ -1142,23 +1123,23 @@ stack in ``walk_component()`` immediately when the symlink is found; old symlink as it walks that last component. So it is quite convenient for ``walk_component()`` to release the old symlink and pop the references just before pushing the reference information for the -new symlink. It is guided in this by two flags; ``WALK_GET``, which -gives it permission to follow a symlink if it finds one, and -``WALK_PUT``, which tells it to release the current symlink after it has been -followed. ``WALK_PUT`` is tested first, leading to a call to -``put_link()``. ``WALK_GET`` is tested subsequently (by -``should_follow_link()``) leading to a call to ``pick_link()`` which sets -up the stack frame. +new symlink. It is guided in this by three flags: ``WALK_NOFOLLOW`` which +forbids it from following a symlink if it finds one, ``WALK_MORE`` +which indicates that it is yet too early to release the +current symlink, and ``WALK_TRAILING`` which indicates that it is on the final +component of the lookup, so we will check userspace flag ``LOOKUP_FOLLOW`` to +decide whether follow it when it is a symlink and call ``may_follow_link()`` to +check if we have privilege to follow it. Symlinks with no final component ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A pair of special-case symlinks deserve a little further explanation. Both result in a new ``struct path`` (with mount and dentry) being set -up in the ``nameidata``, and result in ``get_link()`` returning ``NULL``. +up in the ``nameidata``, and result in pick_link() returning ``NULL``. The more obvious case is a symlink to "``/``". All symlinks starting -with "``/``" are detected in ``get_link()`` which resets the ``nameidata`` +with "``/``" are detected in pick_link() which resets the ``nameidata`` to point to the effective filesystem root. If the symlink only contains "``/``" then there is nothing more to do, no components at all, so ``NULL`` is returned to indicate that the symlink can be released and @@ -1175,12 +1156,11 @@ something that looks like a symlink. It is really a reference to the target file, not just the name of it. When you ``readlink`` these objects you get a name that might refer to the same file - unless it has been unlinked or mounted over. When ``walk_component()`` follows -one of these, the ``->follow_link()`` method in "procfs" doesn't return -a string name, but instead calls ``nd_jump_link()`` which updates the -``nameidata`` in place to point to that target. ``->follow_link()`` then -returns ``NULL``. Again there is no final component and ``get_link()`` -reports this by leaving the ``last_type`` field of ``nameidata`` as -``LAST_BIND``. +one of these, the ``->get_link()`` method in "procfs" doesn't return +a string name, but instead calls nd_jump_link() which updates the +``nameidata`` in place to point to that target. ``->get_link()`` then +returns ``NULL``. Again there is no final component and pick_link() +returns ``NULL``. Following the symlink in the final component -------------------------------------------- @@ -1197,42 +1177,38 @@ potentially need to call ``link_path_walk()`` again and again on successive symlinks until one is found that doesn't point to another symlink. -This case is handled by the relevant caller of ``link_path_walk()``, such as -``path_lookupat()`` using a loop that calls ``link_path_walk()``, and then -handles the final component. If the final component is a symlink -that needs to be followed, then ``trailing_symlink()`` is called to set -things up properly and the loop repeats, calling ``link_path_walk()`` -again. This could loop as many as 40 times if the last component of -each symlink is another symlink. - -The various functions that examine the final component and possibly -report that it is a symlink are ``lookup_last()``, ``mountpoint_last()`` -and ``do_last()``, each of which use the same convention as -``walk_component()`` of returning ``1`` if a symlink was found that needs -to be followed. - -Of these, ``do_last()`` is the most interesting as it is used for -opening a file. Part of ``do_last()`` runs with ``i_rwsem`` held and this -part is in a separate function: ``lookup_open()``. - -Explaining ``do_last()`` completely is beyond the scope of this article, -but a few highlights should help those interested in exploring the -code. - -1. Rather than just finding the target file, ``do_last()`` needs to open +This case is handled by relevant callers of link_path_walk(), such as +path_lookupat(), path_openat() using a loop that calls link_path_walk(), +and then handles the final component by calling open_last_lookups() or +lookup_last(). If it is a symlink that needs to be followed, +open_last_lookups() or lookup_last() will set things up properly and +return the path so that the loop repeats, calling +link_path_walk() again. This could loop as many as 40 times if the last +component of each symlink is another symlink. + +Of the various functions that examine the final component, +open_last_lookups() is the most interesting as it works in tandem +with do_open() for opening a file. Part of open_last_lookups() runs +with ``i_rwsem`` held and this part is in a separate function: lookup_open(). + +Explaining open_last_lookups() and do_open() completely is beyond the scope +of this article, but a few highlights should help those interested in exploring +the code. + +1. Rather than just finding the target file, do_open() is used after + open_last_lookup() to open it. If the file was found in the dcache, then ``vfs_open()`` is used for this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if the filesystem provides it) to combine the final lookup with the open, or - will perform the separate ``lookup_real()`` and ``vfs_create()`` steps + will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps directly. In the later case the actual "open" of this newly found or - created file will be performed by ``vfs_open()``, just as if the name + created file will be performed by vfs_open(), just as if the name were found in the dcache. -2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information - wasn't quite current enough. Rather than restarting the lookup from - the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead, - giving the filesystem a chance to resolve small inconsistencies. - If that doesn't work, only then is the lookup restarted from the top. +2. vfs_open() can fail with ``-EOPENSTALE`` if the cached information + wasn't quite current enough. If it's in RCU-walk ``-ECHILD`` will be returned + otherwise ``-ESTALE`` is returned. When ``-ESTALE`` is returned, the caller may + retry with ``LOOKUP_REVAL`` flag set. 3. An open with O_CREAT **does** follow a symlink in the final component, unlike other creation system calls (like ``mkdir``). So the sequence:: @@ -1242,8 +1218,8 @@ code. will create a file called ``/tmp/bar``. This is not permitted if ``O_EXCL`` is set but otherwise is handled for an O_CREAT open much - like for a non-creating open: ``should_follow_link()`` returns ``1``, and - so does ``do_last()`` so that ``trailing_symlink()`` gets called and the + like for a non-creating open: lookup_last() or open_last_lookup() + returns a non ``NULL`` value, and link_path_walk() gets called and the open process continues on the symlink that was found. Updating the access time diff --git a/Documentation/firmware-guide/acpi/dsd/data-node-references.rst b/Documentation/firmware-guide/acpi/dsd/data-node-references.rst index 9b17dc77d18c..b7ad47df49de 100644 --- a/Documentation/firmware-guide/acpi/dsd/data-node-references.rst +++ b/Documentation/firmware-guide/acpi/dsd/data-node-references.rst @@ -79,7 +79,8 @@ the ANOD object which is also the final target node of the reference. }) } -Please also see a graph example in :doc:`graph`. +Please also see a graph example in +Documentation/firmware-guide/acpi/dsd/graph.rst. References ========== diff --git a/Documentation/firmware-guide/acpi/dsd/graph.rst b/Documentation/firmware-guide/acpi/dsd/graph.rst index 7072db801aeb..4341299aa937 100644 --- a/Documentation/firmware-guide/acpi/dsd/graph.rst +++ b/Documentation/firmware-guide/acpi/dsd/graph.rst @@ -174,4 +174,4 @@ References referenced 2016-10-04. [7] _DSD Device Properties Usage Rules. - :doc:`../DSD-properties-rules` + Documentation/firmware-guide/acpi/DSD-properties-rules.rst diff --git a/Documentation/firmware-guide/acpi/enumeration.rst b/Documentation/firmware-guide/acpi/enumeration.rst index 9f0d5c854fa4..18074eb71860 100644 --- a/Documentation/firmware-guide/acpi/enumeration.rst +++ b/Documentation/firmware-guide/acpi/enumeration.rst @@ -339,8 +339,8 @@ a code like this:: There are also devm_* versions of these functions which release the descriptors once the device is released. -See Documentation/firmware-guide/acpi/gpio-properties.rst for more information about the -_DSD binding related to GPIOs. +See Documentation/firmware-guide/acpi/gpio-properties.rst for more information +about the _DSD binding related to GPIOs. MFD devices =========== @@ -460,7 +460,8 @@ the _DSD of the device object itself or the _DSD of its ancestor in the Otherwise, the _DSD itself is regarded as invalid and therefore the "compatible" property returned by it is meaningless. -Refer to :doc:`DSD-properties-rules` for more information. +Refer to Documentation/firmware-guide/acpi/DSD-properties-rules.rst for more +information. PCI hierarchy representation ============================ diff --git a/Documentation/i2c/instantiating-devices.rst b/Documentation/i2c/instantiating-devices.rst index e558e0a77e0c..890c9360ce19 100644 --- a/Documentation/i2c/instantiating-devices.rst +++ b/Documentation/i2c/instantiating-devices.rst @@ -59,7 +59,7 @@ Declare the I2C devices via ACPI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ACPI can also describe I2C devices. There is special documentation for this -which is currently located at :doc:`../firmware-guide/acpi/enumeration`. +which is currently located at Documentation/firmware-guide/acpi/enumeration.rst. Declare the I2C devices in board files diff --git a/Documentation/i2c/old-module-parameters.rst b/Documentation/i2c/old-module-parameters.rst index 38e55829dee8..b08b6daabce9 100644 --- a/Documentation/i2c/old-module-parameters.rst +++ b/Documentation/i2c/old-module-parameters.rst @@ -17,7 +17,8 @@ address), ``force`` (to forcibly attach the driver to a given device) and With the conversion of the I2C subsystem to the standard device driver binding model, it became clear that these per-module parameters were no longer needed, and that a centralized implementation was possible. The new, -sysfs-based interface is described in :doc:`instantiating-devices`, section +sysfs-based interface is described in +Documentation/i2c/instantiating-devices.rst, section "Method 4: Instantiate from user-space". Below is a mapping from the old module parameters to the new interface. diff --git a/Documentation/i2c/smbus-protocol.rst b/Documentation/i2c/smbus-protocol.rst index 64689d19dd51..9e07e6bbe6a3 100644 --- a/Documentation/i2c/smbus-protocol.rst +++ b/Documentation/i2c/smbus-protocol.rst @@ -27,8 +27,8 @@ a different protocol operation entirely. Each transaction type corresponds to a functionality flag. Before calling a transaction function, a device driver should always check (just once) for the corresponding functionality flag to ensure that the underlying I2C -adapter supports the transaction in question. See :doc:`functionality` for -the details. +adapter supports the transaction in question. See +Documentation/i2c/functionality.rst for the details. Key to symbols diff --git a/Documentation/input/joydev/joystick-api.rst b/Documentation/input/joydev/joystick-api.rst index af5934c10c1c..5db6dc6fe1c5 100644 --- a/Documentation/input/joydev/joystick-api.rst +++ b/Documentation/input/joydev/joystick-api.rst @@ -263,7 +263,7 @@ possible overrun should the name be too long:: char name[128]; if (ioctl(fd, JSIOCGNAME(sizeof(name)), name) < 0) - strncpy(name, "Unknown", sizeof(name)); + strscpy(name, "Unknown", sizeof(name)); printf("Name: %s\n", name); diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst index 451523424942..df65c19aa7df 100644 --- a/Documentation/kernel-hacking/hacking.rst +++ b/Documentation/kernel-hacking/hacking.rst @@ -601,7 +601,7 @@ Defined in ``include/linux/export.h`` This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol namespace. Symbol Namespaces are documented in -:doc:`../core-api/symbol-namespaces` +Documentation/core-api/symbol-namespaces.rst :c:func:`EXPORT_SYMBOL_NS_GPL()` -------------------------------- @@ -610,7 +610,7 @@ Defined in ``include/linux/export.h`` This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol namespace. Symbol Namespaces are documented in -:doc:`../core-api/symbol-namespaces` +Documentation/core-api/symbol-namespaces.rst Routines and Conventions ======================== diff --git a/Documentation/networking/device_drivers/ethernet/intel/i40e.rst b/Documentation/networking/device_drivers/ethernet/intel/i40e.rst index 2d3f6bd969a2..ac35bd472bdc 100644 --- a/Documentation/networking/device_drivers/ethernet/intel/i40e.rst +++ b/Documentation/networking/device_drivers/ethernet/intel/i40e.rst @@ -466,7 +466,7 @@ network. PTP support varies among Intel devices that support this driver. Use "ethtool -T <netdev name>" to get a definitive list of PTP capabilities supported by the device. -IEEE 802.1ad (QinQ) Support +IEEE 802.1ad (QinQ) Support --------------------------- The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as @@ -523,8 +523,8 @@ of a port's bandwidth (should it be available). The sum of all the values for Maximum Bandwidth is not restricted, because no more than 100% of a port's bandwidth can ever be used. -NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions -per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says +NOTE: X710/XXV710 devices fail to enable Max VFs (64) when Multiple Functions +per Port (MFP) and SR-IOV are enabled. An error from i40e is logged that says "add vsi failed for VF N, aq_err 16". To workaround the issue, enable less than 64 virtual functions (VFs). diff --git a/Documentation/networking/device_drivers/ethernet/intel/iavf.rst b/Documentation/networking/device_drivers/ethernet/intel/iavf.rst index 25330b7b5168..151af0a8da9c 100644 --- a/Documentation/networking/device_drivers/ethernet/intel/iavf.rst +++ b/Documentation/networking/device_drivers/ethernet/intel/iavf.rst @@ -113,7 +113,7 @@ which the AVF is associated. The following are base mode features: - AVF device ID - HW mailbox is used for VF to PF communications (including on Windows) -IEEE 802.1ad (QinQ) Support +IEEE 802.1ad (QinQ) Support --------------------------- The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst index 3654c3e9658f..58fe95e9a49d 100644 --- a/Documentation/networking/devlink/devlink-region.rst +++ b/Documentation/networking/devlink/devlink-region.rst @@ -22,7 +22,7 @@ The major benefit to creating a region is to provide access to internal address regions that are otherwise inaccessible to the user. Regions may also be used to provide an additional way to debug complex error -states, but see also :doc:`devlink-health` +states, but see also Documentation/networking/devlink/devlink-health.rst Regions may optionally support capturing a snapshot on demand via the ``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index 935b6397e8cf..efa5f7f42c88 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -495,8 +495,8 @@ help debug packet drops caused by these exceptions. The following list includes links to the description of driver-specific traps registered by various device drivers: - * :doc:`netdevsim` - * :doc:`mlxsw` + * Documentation/networking/devlink/netdevsim.rst + * Documentation/networking/devlink/mlxsw.rst .. _Generic-Packet-Trap-Groups: diff --git a/Documentation/networking/packet_mmap.rst b/Documentation/networking/packet_mmap.rst index 500ef60b1b82..c5da1a5d93de 100644 --- a/Documentation/networking/packet_mmap.rst +++ b/Documentation/networking/packet_mmap.rst @@ -153,7 +153,7 @@ As capture, each frame contains two parts:: struct ifreq s_ifr; ... - strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name)); + strscpy_pad (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name)); /* get interface index of eth0 */ ioctl(this->socket, SIOCGIFINDEX, &s_ifr); diff --git a/Documentation/networking/tuntap.rst b/Documentation/networking/tuntap.rst index a59d1dd6fdcc..4d7087f727be 100644 --- a/Documentation/networking/tuntap.rst +++ b/Documentation/networking/tuntap.rst @@ -107,7 +107,7 @@ Note that the character pointer becomes overwritten with the real device name */ ifr.ifr_flags = IFF_TUN; if( *dev ) - strncpy(ifr.ifr_name, dev, IFNAMSIZ); + strscpy_pad(ifr.ifr_name, dev, IFNAMSIZ); if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ){ close(fd); diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst index c66a19201deb..0852bcf73630 100644 --- a/Documentation/process/submitting-patches.rst +++ b/Documentation/process/submitting-patches.rst @@ -10,10 +10,11 @@ can greatly increase the chances of your change being accepted. This document contains a large number of suggestions in a relatively terse format. For detailed information on how the kernel development process -works, see :doc:`development-process`. Also, read :doc:`submit-checklist` +works, see Documentation/process/development-process.rst. Also, read +Documentation/process/submit-checklist.rst for a list of items to check before submitting code. If you are submitting -a driver, also read :doc:`submitting-drivers`; for device tree binding patches, -read :doc:`submitting-patches`. +a driver, also read Documentation/process/submitting-drivers.rst; for device +tree binding patches, read Documentation/process/submitting-patches.rst. This documentation assumes that you're using ``git`` to prepare your patches. If you're unfamiliar with ``git``, you would be well-advised to learn how to @@ -178,8 +179,7 @@ Style-check your changes ------------------------ Check your patch for basic style violations, details of which can be -found in -:ref:`Documentation/process/coding-style.rst <codingstyle>`. +found in Documentation/process/coding-style.rst. Failure to do so simply wastes the reviewers time and will get your patch rejected, probably without even being read. @@ -238,7 +238,7 @@ If you have a patch that fixes an exploitable security bug, send that patch to security@kernel.org. For severe bugs, a short embargo may be considered to allow distributors to get the patch out to users; in such cases, obviously, the patch should not be sent to any public lists. See also -:doc:`/admin-guide/security-bugs`. +Documentation/admin-guide/security-bugs.rst. Patches that fix a severe bug in a released kernel should be directed toward the stable maintainers by putting a line like this:: @@ -246,9 +246,8 @@ toward the stable maintainers by putting a line like this:: Cc: stable@vger.kernel.org into the sign-off area of your patch (note, NOT an email recipient). You -should also read -:ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>` -in addition to this file. +should also read Documentation/process/stable-kernel-rules.rst +in addition to this document. If changes affect userland-kernel interfaces, please send the MAN-PAGES maintainer (as listed in the MAINTAINERS file) a man-pages patch, or at @@ -305,8 +304,8 @@ decreasing the likelihood of your MIME-attached change being accepted. Exception: If your mailer is mangling patches then someone may ask you to re-send them using MIME. -See :doc:`/process/email-clients` for hints about configuring your e-mail -client so that it sends your patches untouched. +See Documentation/process/email-clients.rst for hints about configuring +your e-mail client so that it sends your patches untouched. Respond to review comments -------------------------- @@ -324,7 +323,7 @@ for their time. Code review is a tiring and time-consuming process, and reviewers sometimes get grumpy. Even in that case, though, respond politely and address the problems they have pointed out. -See :doc:`email-clients` for recommendations on email +See Documentation/process/email-clients.rst for recommendations on email clients and mailing list etiquette. @@ -562,10 +561,10 @@ method for indicating a bug fixed by the patch. See :ref:`describe_changes` for more details. Note: Attaching a Fixes: tag does not subvert the stable kernel rules -process nor the requirement to Cc: stable@vger.kernel.org on all stable +process nor the requirement to Cc: stable@vger.kernel.org on all stable patch candidates. For more information, please read -:ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>` - +Documentation/process/stable-kernel-rules.rst. + .. _the_canonical_patch_format: The canonical patch format @@ -824,8 +823,7 @@ Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer". NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people! <https://lore.kernel.org/r/20050711.125305.08322243.davem@davemloft.net> -Kernel Documentation/process/coding-style.rst: - :ref:`Documentation/process/coding-style.rst <codingstyle>` +Kernel Documentation/process/coding-style.rst Linus Torvalds's mail on the canonical patch format: <https://lore.kernel.org/r/Pine.LNX.4.58.0504071023190.28951@ppc970.osdl.org> diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst index 845eee659199..1fc73555f5c4 100644 --- a/Documentation/scheduler/sched-bwc.rst +++ b/Documentation/scheduler/sched-bwc.rst @@ -29,7 +29,7 @@ Quota and period are managed within the cpu subsystem via cgroupfs. .. note:: The cgroupfs files described in this section are only applicable to cgroup v1. For cgroup v2, see - :ref:`Documentation/admin-guide/cgroupv2.rst <cgroup-v2-cpu>`. + :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`. - cpu.cfs_quota_us: the total available run-time within a period (in microseconds) diff --git a/Documentation/scheduler/sched-nice-design.rst b/Documentation/scheduler/sched-nice-design.rst index 0571f1b47e64..889bf2b737dc 100644 --- a/Documentation/scheduler/sched-nice-design.rst +++ b/Documentation/scheduler/sched-nice-design.rst @@ -60,7 +60,7 @@ within the constraints of HZ and jiffies and their nasty design level coupling to timeslices and granularity it was not really viable. The second (less frequent but still periodically occurring) complaint -about Linux's nice level support was its assymetry around the origo +about Linux's nice level support was its asymmetry around the origin (which you can see demonstrated in the picture above), or more accurately: the fact that nice level behavior depended on the _absolute_ nice level as well, while the nice API itself is fundamentally diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst index 2e84925ae971..3df68cb1d10f 100644 --- a/Documentation/security/landlock.rst +++ b/Documentation/security/landlock.rst @@ -25,7 +25,8 @@ Any user can enforce Landlock rulesets on their processes. They are merged and evaluated according to the inherited ones in a way that ensures that only more constraints can be added. -User space documentation can be found here: :doc:`/userspace-api/landlock`. +User space documentation can be found here: +Documentation/userspace-api/landlock.rst. Guiding principles for safe access controls =========================================== diff --git a/Documentation/trace/coresight/coresight-etm4x-reference.rst b/Documentation/trace/coresight/coresight-etm4x-reference.rst index b64d9a9c79df..d25dfe86af9b 100644 --- a/Documentation/trace/coresight/coresight-etm4x-reference.rst +++ b/Documentation/trace/coresight/coresight-etm4x-reference.rst @@ -427,7 +427,7 @@ the ‘TRC’ prefix. :Syntax: ``echo idx > vmid_idx`` - Where idx < numvmidc + Where idx < numvmidc ---- diff --git a/Documentation/trace/coresight/coresight.rst b/Documentation/trace/coresight/coresight.rst index 169749efd8d1..1ec8dc35b1d8 100644 --- a/Documentation/trace/coresight/coresight.rst +++ b/Documentation/trace/coresight/coresight.rst @@ -315,7 +315,8 @@ intermediate links as required. Note: ``cti_sys0`` appears in two of the connections lists above. CTIs can connect to multiple devices and are arranged in a star topology -via the CTM. See (:doc:`coresight-ect`) [#fourth]_ for further details. +via the CTM. See (Documentation/trace/coresight/coresight-ect.rst) +[#fourth]_ for further details. Looking at this device we see 4 connections:: linaro-developer:~# ls -l /sys/bus/coresight/devices/cti_sys0/connections @@ -606,7 +607,8 @@ interface provided for that purpose by the generic STM API:: crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0 root@genericarmv8:~# -Details on how to use the generic STM API can be found here:- :doc:`../stm` [#second]_. +Details on how to use the generic STM API can be found here: +- Documentation/trace/stm.rst [#second]_. The CTI & CTM Modules --------------------- @@ -616,7 +618,7 @@ individual CTIs and components, and can propagate these between all CTIs via channels on the CTM (Cross Trigger Matrix). A separate documentation file is provided to explain the use of these devices. -(:doc:`coresight-ect`) [#fourth]_. +(Documentation/trace/coresight/coresight-ect.rst) [#fourth]_. .. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 62c98e9bbdd9..cfc81e98e0b8 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -40,7 +40,7 @@ See events.rst for more information. Implementation Details ---------------------- -See :doc:`ftrace-design` for details for arch porters and such. +See Documentation/trace/ftrace-design.rst for details for arch porters and such. The File System @@ -354,8 +354,8 @@ of ftrace. Here is a list of some of the key files: is being directly called by the function. If the count is greater than 1 it most likely will be ftrace_ops_list_func(). - If the callback of the function jumps to a trampoline that is - specific to a the callback and not the standard trampoline, + If the callback of a function jumps to a trampoline that is + specific to the callback and which is not the standard trampoline, its address will be printed as well as the function that the trampoline calls. diff --git a/Documentation/translations/index.rst b/Documentation/translations/index.rst index e446e5ed00a6..556b050884fc 100644 --- a/Documentation/translations/index.rst +++ b/Documentation/translations/index.rst @@ -18,6 +18,10 @@ Translations Disclaimer ---------- +.. raw:: latex + + \kerneldocCJKoff + Translation's purpose is to ease reading and understanding in languages other than English. Its aim is to help people who do not understand English or have doubts about its interpretation. Additionally, some people prefer to read diff --git a/Documentation/translations/it_IT/index.rst b/Documentation/translations/it_IT/index.rst index bb8fa7346939..e80a3097aa57 100644 --- a/Documentation/translations/it_IT/index.rst +++ b/Documentation/translations/it_IT/index.rst @@ -4,6 +4,10 @@ Traduzione italiana =================== +.. raw:: latex + + \kerneldocCJKoff + :manutentore: Federico Vaga <federico.vaga@vaga.pv.it> .. _it_disclaimer: diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst index 95f2e7c985e2..ecc74ba50d3e 100644 --- a/Documentation/translations/it_IT/process/coding-style.rst +++ b/Documentation/translations/it_IT/process/coding-style.rst @@ -62,7 +62,7 @@ i ``case``. Un esempio.: case 'K': case 'k': mem <<= 10; - /* fall through */ + fallthrough; default: break; } diff --git a/Documentation/translations/ja_JP/index.rst b/Documentation/translations/ja_JP/index.rst index 2f91b895e3c2..f94ba62d41c3 100644 --- a/Documentation/translations/ja_JP/index.rst +++ b/Documentation/translations/ja_JP/index.rst @@ -1,7 +1,8 @@ .. raw:: latex - \renewcommand\thesection* - \renewcommand\thesubsection* + \renewcommand\thesection* + \renewcommand\thesubsection* + \kerneldocCJKon Japanese translations ===================== diff --git a/Documentation/translations/ko_KR/index.rst b/Documentation/translations/ko_KR/index.rst index b9e27d20b039..6ae258118bdf 100644 --- a/Documentation/translations/ko_KR/index.rst +++ b/Documentation/translations/ko_KR/index.rst @@ -1,7 +1,8 @@ .. raw:: latex - \renewcommand\thesection* - \renewcommand\thesubsection* + \renewcommand\thesection* + \renewcommand\thesubsection* + \kerneldocCJKon 한국어 번역 =========== diff --git a/Documentation/translations/zh_CN/admin-guide/index.rst b/Documentation/translations/zh_CN/admin-guide/index.rst index be835ec8e632..460034cbc2ab 100644 --- a/Documentation/translations/zh_CN/admin-guide/index.rst +++ b/Documentation/translations/zh_CN/admin-guide/index.rst @@ -65,6 +65,7 @@ Todolist: clearing-warn-once cpu-load + lockup-watchdogs unicode Todolist: @@ -100,7 +101,6 @@ Todolist: laptops/index lcd-panel-cgram ldm - lockup-watchdogs LSM/index md media/index diff --git a/Documentation/translations/zh_CN/admin-guide/lockup-watchdogs.rst b/Documentation/translations/zh_CN/admin-guide/lockup-watchdogs.rst new file mode 100644 index 000000000000..55ed3f4af442 --- /dev/null +++ b/Documentation/translations/zh_CN/admin-guide/lockup-watchdogs.rst @@ -0,0 +1,66 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/admin-guide/lockup-watchdogs.rst +:Translator: Hailong Liu <liu.hailong6@zte.com.cn> + +.. _cn_lockup-watchdogs: + + +================================================= +Softlockup与hardlockup检测机制(又名:nmi_watchdog) +================================================= + +Linux中内核实现了一种用以检测系统发生softlockup和hardlockup的看门狗机制。 + +Softlockup是一种会引发系统在内核态中一直循环超过20秒(详见下面“实现”小节)导致 +其他任务没有机会得到运行的BUG。一旦检测到'softlockup'发生,默认情况下系统会打 +印当前堆栈跟踪信息并进入锁定状态。也可配置使其在检测到'softlockup'后进入panic +状态;通过sysctl命令设置“kernel.softlockup_panic”、使用内核启动参数 +“softlockup_panic”(详见Documentation/admin-guide/kernel-parameters.rst)以及使 +能内核编译选项“BOOTPARAM_SOFTLOCKUP_PANIC”都可实现这种配置。 + +而'hardlockup'是一种会引发系统在内核态一直循环超过10秒钟(详见"实现"小节)导致其 +他中断没有机会运行的缺陷。与'softlockup'情况类似,除了使用sysctl命令设置 +'hardlockup_panic'、使能内核选项“BOOTPARAM_HARDLOCKUP_PANIC”以及使用内核参数 +"nmi_watchdog"(详见:”Documentation/admin-guide/kernel-parameters.rst“)外,一旦检 +测到'hardlockup'默认情况下系统打印当前堆栈跟踪信息,然后进入锁定状态。 + +这个panic选项也可以与panic_timeout结合使用(这个panic_timeout是通过稍具迷惑性的 +sysctl命令"kernel.panic"来设置),使系统在panic指定时间后自动重启。 + +实现 +==== + +Softlockup和hardlockup分别建立在hrtimer(高精度定时器)和perf两个子系统上而实现。 +这也就意味着理论上任何架构只要实现了这两个子系统就支持这两种检测机制。 + +Hrtimer用于周期性产生中断并唤醒watchdog线程;NMI perf事件则以”watchdog_thresh“ +(编译时默认初始化为10秒,也可通过”watchdog_thresh“这个sysctl接口来进行配置修改) +为间隔周期产生以检测 hardlockups。如果一个CPU在这个时间段内没有检测到hrtimer中 +断发生,'hardlockup 检测器'(即NMI perf事件处理函数)将会视系统配置而选择产生内核 +警告或者直接panic。 + +而watchdog线程本质上是一个高优先级内核线程,每调度一次就对时间戳进行一次更新。 +如果时间戳在2*watchdog_thresh(这个是softlockup的触发门限)这段时间都未更新,那么 +"softlocup 检测器"(内部hrtimer定时器回调函数)会将相关的调试信息打印到系统日志中, +然后如果系统配置了进入panic流程则进入panic,否则内核继续执行。 + +Hrtimer定时器的周期是2*watchdog_thresh/5,也就是说在hardlockup被触发前hrtimer有 +2~3次机会产生时钟中断。 + +如上所述,内核相当于为系统管理员提供了一个可调节hrtimer定时器和perf事件周期长度 +的调节旋钮。如何通过这个旋钮为特定使用场景配置一个合理的周期值要对lockups检测的 +响应速度和lockups检测开销这二者之间进行权衡。 + +默认情况下所有在线cpu上都会运行一个watchdog线程。不过在内核配置了”NO_HZ_FULL“的 +情况下watchdog线程默认只会运行在管家(housekeeping)cpu上,而”nohz_full“启动参数指 +定的cpu上则不会有watchdog线程运行。试想,如果我们允许watchdog线程在”nohz_full“指 +定的cpu上运行,这些cpu上必须得运行时钟定时器来激发watchdog线程调度;这样一来就会 +使”nohz_full“保护用户程序免受内核干扰的功能失效。当然,副作用就是”nohz_full“指定 +的cpu即使在内核产生了lockup问题我们也无法检测到。不过,至少我们可以允许watchdog +线程在管家(non-tickless)核上继续运行以便我们能继续正常的监测这些cpus上的lockups +事件。 + +不论哪种情况都可以通过sysctl命令kernel.watchdog_cpumask来对没有运行watchdog线程 +的cpu集合进行调节。对于nohz_full而言,如果nohz_full cpu上有异常挂住的情况,通过 +这种方式打开这些cpu上的watchdog进行调试可能会有所作用。 diff --git a/Documentation/translations/zh_CN/core-api/cachetlb.rst b/Documentation/translations/zh_CN/core-api/cachetlb.rst new file mode 100644 index 000000000000..8376485a534d --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/cachetlb.rst @@ -0,0 +1,336 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/cachetlb.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> + +:校译: + + 吴想成 Wu XiangCheng <bobwxc@email.cn> + +.. _cn_core-api_cachetlb: + +====================== +Linux下的缓存和TLB刷新 +====================== + +:作者: David S. Miller <davem@redhat.com> + +*译注:TLB,Translation Lookaside Buffer,页表缓存/变换旁查缓冲器* + +本文描述了由Linux虚拟内存子系统调用的缓存/TLB刷新接口。它列举了每个接 +口,描述了它的预期目的,以及接口被调用后的预期副作用。 + +下面描述的副作用是针对单处理器的实现,以及在单个处理器上发生的情况。若 +为SMP,则只需将定义简单地扩展一下,使发生在某个特定接口的副作用扩展到系 +统的所有处理器上。不要被这句话吓到,以为SMP的缓存/tlb刷新一定是很低 +效的,事实上,这是一个可以进行很多优化的领域。例如,如果可以证明一个用 +户地址空间从未在某个cpu上执行过(见mm_cpumask()),那么就不需要在该 +cpu上对这个地址空间进行刷新。 + +首先是TLB刷新接口,因为它们是最简单的。在Linux下,TLB被抽象为cpu +用来缓存从软件页表获得的虚拟->物理地址转换的东西。这意味着,如果软件页 +表发生变化,这个“TLB”缓存中就有可能出现过时(脏)的翻译。因此,当软件页表 +发生变化时,内核会在页表发生 *变化后* 调用以下一种刷新方法: + +1) ``void flush_tlb_all(void)`` + + 最严格的刷新。在这个接口运行后,任何以前的页表修改都会对cpu可见。 + + 这通常是在内核页表被改变时调用的,因为这种转换在本质上是“全局”的。 + +2) ``void flush_tlb_mm(struct mm_struct *mm)`` + + 这个接口从TLB中刷新整个用户地址空间。在运行后,这个接口必须确保 + 以前对地址空间‘mm’的任何页表修改对cpu来说是可见的。也就是说,在 + 运行后,TLB中不会有‘mm’的页表项。 + + 这个接口被用来处理整个地址空间的页表操作,比如在fork和exec过程 + 中发生的事情。 + +3) ``void flush_tlb_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end)`` + + 这里我们要从TLB中刷新一个特定范围的(用户)虚拟地址转换。在运行后, + 这个接口必须确保以前对‘start’到‘end-1’范围内的地址空间‘vma->vm_mm’ + 的任何页表修改对cpu来说是可见的。也就是说,在运行后,TLB中不会有 + ‘mm’的页表项用于‘start’到‘end-1’范围内的虚拟地址。 + + “vma”是用于该区域的备份存储。主要是用于munmap()类型的操作。 + + 提供这个接口是希望端口能够找到一个合适的有效方法来从TLB中删除多 + 个页面大小的转换,而不是让内核为每个可能被修改的页表项调用 + flush_tlb_page(见下文)。 + +4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)`` + + 这一次我们需要从TLB中删除PAGE_SIZE大小的转换。‘vma’是Linux用来跟 + 踪进程的mmap区域的支持结构体,地址空间可以通过vma->vm_mm获得。另 + 外,可以通过测试(vma->vm_flags & VM_EXEC)来查看这个区域是否是 + 可执行的(因此在split-tlb类型的设置中可能在“指令TLB”中)。 + + 在运行后,这个接口必须确保之前对用户虚拟地址“addr”的地址空间 + “vma->vm_mm”的页表修改对cpu来说是可见的。也就是说,在运行后,TLB + 中不会有虚拟地址‘addr’的‘vma->vm_mm’的页表项。 + + 这主要是在故障处理时使用。 + +5) ``void update_mmu_cache(struct vm_area_struct *vma, + unsigned long address, pte_t *ptep)`` + + 在每个页面故障结束时,这个程序被调用,以告诉体系结构特定的代码,在 + 软件页表中,在地址空间“vma->vm_mm”的虚拟地址“地址”处,现在存在 + 一个翻译。 + + 可以用它所选择的任何方式使用这个信息来进行移植。例如,它可以使用这 + 个事件来为软件管理的TLB配置预装TLB转换。目前sparc64移植就是这么干 + 的。 + +接下来,我们有缓存刷新接口。一般来说,当Linux将现有的虚拟->物理映射 +改变为新的值时,其顺序将是以下形式之一:: + + 1) flush_cache_mm(mm); + change_all_page_tables_of(mm); + flush_tlb_mm(mm); + + 2) flush_cache_range(vma, start, end); + change_range_of_page_tables(mm, start, end); + flush_tlb_range(vma, start, end); + + 3) flush_cache_page(vma, addr, pfn); + set_pte(pte_pointer, new_pte_val); + flush_tlb_page(vma, addr); + +缓存级别的刷新将永远是第一位的,因为这允许我们正确处理那些缓存严格, +且在虚拟地址被从缓存中刷新时要求一个虚拟地址的虚拟->物理转换存在的系统。 +HyperSparc cpu就是这样一个具有这种属性的cpu。 + +下面的缓存刷新程序只需要在特定的cpu需要的范围内处理缓存刷新。大多数 +情况下,这些程序必须为cpu实现,这些cpu有虚拟索引的缓存,当虚拟->物 +理转换被改变或移除时,必须被刷新。因此,例如,IA32处理器的物理索引 +的物理标记的缓存没有必要实现这些接口,因为这些缓存是完全同步的,并 +且不依赖于翻译信息。 + +下面逐个列出这些程序: + +1) ``void flush_cache_mm(struct mm_struct *mm)`` + + 这个接口将整个用户地址空间从高速缓存中刷掉。也就是说,在运行后, + 将没有与‘mm’相关的缓存行。 + + 这个接口被用来处理整个地址空间的页表操作,比如在退出和执行过程 + 中发生的事情。 + +2) ``void flush_cache_dup_mm(struct mm_struct *mm)`` + + 这个接口将整个用户地址空间从高速缓存中刷新掉。也就是说,在运行 + 后,将没有与‘mm’相关的缓存行。 + + 这个接口被用来处理整个地址空间的页表操作,比如在fork过程中发生 + 的事情。 + + 这个选项与flush_cache_mm分开,以允许对VIPT缓存进行一些优化。 + +3) ``void flush_cache_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end)`` + + 在这里,我们要从缓存中刷新一个特定范围的(用户)虚拟地址。运行 + 后,在“start”到“end-1”范围内的虚拟地址的“vma->vm_mm”的缓存中 + 将没有页表项。 + + “vma”是被用于该区域的备份存储。主要是用于munmap()类型的操作。 + + 提供这个接口是希望端口能够找到一个合适的有效方法来从缓存中删 + 除多个页面大小的区域, 而不是让内核为每个可能被修改的页表项调 + 用 flush_cache_page (见下文)。 + +4) ``void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)`` + + 这一次我们需要从缓存中删除一个PAGE_SIZE大小的区域。“vma”是 + Linux用来跟踪进程的mmap区域的支持结构体,地址空间可以通过 + vma->vm_mm获得。另外,我们可以通过测试(vma->vm_flags & + VM_EXEC)来查看这个区域是否是可执行的(因此在“Harvard”类 + 型的缓存布局中可能是在“指令缓存”中)。 + + “pfn”表示“addr”所对应的物理页框(通过PAGE_SHIFT左移这个 + 值来获得物理地址)。正是这个映射应该从缓存中删除。 + + 在运行之后,对于虚拟地址‘addr’的‘vma->vm_mm’,在缓存中不会 + 有任何页表项,它被翻译成‘pfn’。 + + 这主要是在故障处理过程中使用。 + +5) ``void flush_cache_kmaps(void)`` + + 只有在平台使用高位内存的情况下才需要实现这个程序。它将在所有的 + kmaps失效之前被调用。 + + 运行后,内核虚拟地址范围PKMAP_ADDR(0)到PKMAP_ADDR(LAST_PKMAP) + 的缓存中将没有页表项。 + + 这个程序应该在asm/highmem.h中实现。 + +6) ``void flush_cache_vmap(unsigned long start, unsigned long end)`` + ``void flush_cache_vunmap(unsigned long start, unsigned long end)`` + + 在这里,在这两个接口中,我们从缓存中刷新一个特定范围的(内核) + 虚拟地址。运行后,在“start”到“end-1”范围内的虚拟地址的内核地 + 址空间的缓存中不会有页表项。 + + 这两个程序中的第一个是在vmap_range()安装了页表项之后调用的。 + 第二个是在vunmap_range()删除页表项之前调用的。 + +还有一类cpu缓存问题,目前需要一套完全不同的接口来正确处理。最大 +的问题是处理器的数据缓存中的虚拟别名。 + +.. note:: + + 这段内容有些晦涩,为了减轻中文阅读压力,特作此译注。 + + 别名(alias)属于缓存一致性问题,当不同的虚拟地址映射相同的 + 物理地址,而这些虚拟地址的index不同,此时就发生了别名现象(多 + 个虚拟地址被称为别名)。通俗点来说就是指同一个物理地址的数据被 + 加载到不同的cacheline中就会出现别名现象。 + + 常见的解决方法有两种:第一种是硬件维护一致性,设计特定的cpu电 + 路来解决问题(例如设计为PIPT的cache);第二种是软件维护一致性, + 就是下面介绍的sparc的解决方案——页面染色,涉及的技术细节太多, + 译者不便展开,请读者自行查阅相关资料。 + +您的移植是否容易在其D-cache中出现虚拟别名?嗯,如果您的D-cache +是虚拟索引的,且cache大于PAGE_SIZE(页大小),并且不能防止同一 +物理地址的多个cache行同时存在,您就会遇到这个问题。 + +如果你的D-cache有这个问题,首先正确定义asm/shmparam.h SHMLBA, +它基本上应该是你的虚拟寻址D-cache的大小(或者如果大小是可变的, +则是最大的可能大小)。这个设置将迫使SYSv IPC层只允许用户进程在 +这个值的倍数的地址上对共享内存进行映射。 + +.. note:: + + 这并不能解决共享mmaps的问题,请查看sparc64移植解决 + 这个问题的一个方法(特别是 SPARC_FLAG_MMAPSHARED)。 + +接下来,你必须解决所有其他情况下的D-cache别名问题。请记住这个事 +实,对于一个给定的页面映射到某个用户地址空间,总是至少还有一个映 +射,那就是内核在其线性映射中从PAGE_OFFSET开始。因此,一旦第一个 +用户将一个给定的物理页映射到它的地址空间,就意味着D-cache的别名 +问题有可能存在,因为内核已经将这个页映射到它的虚拟地址。 + + ``void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)`` + ``void clear_user_page(void *to, unsigned long addr, struct page *page)`` + + 这两个程序在用户匿名或COW页中存储数据。它允许一个端口有效地 + 避免用户空间和内核之间的D-cache别名问题。 + + 例如,一个端口可以在复制过程中把“from”和“to”暂时映射到内核 + 的虚拟地址上。这两个页面的虚拟地址的选择方式是,内核的加载/存 + 储指令发生在虚拟地址上,而这些虚拟地址与用户的页面映射是相同 + 的“颜色”。例如,Sparc64就使用这种技术。 + + “addr”参数告诉了用户最终要映射这个页面的虚拟地址,“page”参 + 数给出了一个指向目标页结构体的指针。 + + 如果D-cache别名不是问题,这两个程序可以简单地直接调用 + memcpy/memset而不做其他事情。 + + ``void flush_dcache_page(struct page *page)`` + + 任何时候,当内核写到一个页面缓存页,或者内核要从一个页面缓存 + 页中读出,并且这个页面的用户空间共享/可写映射可能存在时, + 这个程序就会被调用。 + + .. note:: + + 这个程序只需要为有可能被映射到用户进程的地址空间的 + 页面缓存调用。因此,例如,处理页面缓存中vfs符号链 + 接的VFS层代码根本不需要调用这个接口。 + + “内核写入页面缓存的页面”这句话的意思是,具体来说,内核执行存 + 储指令,在该页面的页面->虚拟映射处弄脏该页面的数据。在这里,通 + 过刷新的手段处理D-cache的别名是很重要的,以确保这些内核存储对 + 该页的用户空间映射是可见的。 + + 推论的情况也同样重要,如果有用户对这个文件有共享+可写的映射, + 我们必须确保内核对这些页面的读取会看到用户所做的最新的存储。 + + 如果D-cache别名不是一个问题,这个程序可以简单地定义为该架构上 + 的nop。 + + 在page->flags (PG_arch_1)中有一个位是“架构私有”。内核保证, + 对于分页缓存的页面,当这样的页面第一次进入分页缓存时,它将清除 + 这个位。 + + 这使得这些接口可以更有效地被实现。如果目前没有用户进程映射这个 + 页面,它允许我们“推迟”(也许是无限期)实际的刷新过程。请看 + sparc64的flush_dcache_page和update_mmu_cache实现,以了解如 + 何做到这一点。 + + 这个想法是,首先在flush_dcache_page()时,如果page->mapping->i_mmap + 是一个空树,只需标记架构私有页标志位。之后,在update_mmu_cache() + 中,会对这个标志位进行检查,如果设置了,就进行刷新,并清除标志位。 + + .. important:: + + 通常很重要的是,如果你推迟刷新,实际的刷新发生在同一个 + CPU上,因为它将cpu存储到页面上,使其变脏。同样,请看 + sparc64关于如何处理这个问题的例子。 + + ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page, + unsigned long user_vaddr, void *dst, void *src, int len)`` + ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page, + unsigned long user_vaddr, void *dst, void *src, int len)`` + + 当内核需要复制任意的数据进出任意的用户页时(比如ptrace()),它将使 + 用这两个程序。 + + 任何必要的缓存刷新或其他需要发生的一致性操作都应该在这里发生。如果 + 处理器的指令缓存没有对cpu存储进行窥探,那么你很可能需要为 + copy_to_user_page()刷新指令缓存。 + + ``void flush_anon_page(struct vm_area_struct *vma, struct page *page, + unsigned long vmaddr)`` + + 当内核需要访问一个匿名页的内容时,它会调用这个函数(目前只有 + get_user_pages())。注意:flush_dcache_page()故意对匿名页不起作 + 用。默认的实现是nop(对于所有相干的架构应该保持这样)。对于不一致性 + 的架构,它应该刷新vmaddr处的页面缓存。 + + ``void flush_kernel_dcache_page(struct page *page)`` + + 当内核需要修改一个用kmap获得的用户页时,它会在所有修改完成后(但在 + kunmapping之前)调用这个函数,以使底层页面达到最新状态。这里假定用 + 户没有不一致性的缓存副本(即原始页面是从类似get_user_pages()的机制 + 中获得的)。默认的实现是一个nop,在所有相干的架构上都应该如此。在不 + 一致性的架构上,这应该刷新内核缓存中的页面(使用page_address(page))。 + + + ``void flush_icache_range(unsigned long start, unsigned long end)`` + + 当内核存储到它将执行的地址中时(例如在加载模块时),这个函数被调用。 + + 如果icache不对存储进行窥探,那么这个程序将需要对其进行刷新。 + + ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)`` + + flush_icache_page的所有功能都可以在flush_dcache_page和update_mmu_cache + 中实现。在未来,我们希望能够完全删除这个接口。 + +最后一类API是用于I/O到内核内特意设置的别名地址范围。这种别名是通过使用 +vmap/vmalloc API设置的。由于内核I/O是通过物理页进行的,I/O子系统假定用户 +映射和内核偏移映射是唯一的别名。这对vmap别名来说是不正确的,所以内核中任何 +试图对vmap区域进行I/O的东西都必须手动管理一致性。它必须在做I/O之前刷新vmap +范围,并在I/O返回后使其失效。 + + ``void flush_kernel_vmap_range(void *vaddr, int size)`` + + 刷新vmap区域中指定的虚拟地址范围的内核缓存。这是为了确保内核在vmap范围 + 内修改的任何数据对物理页是可见的。这个设计是为了使这个区域可以安全地执 + 行I/O。注意,这个API并 *没有* 刷新该区域的偏移映射别名。 + + ``void invalidate_kernel_vmap_range(void *vaddr, int size) invalidates`` + + 在vmap区域的一个给定的虚拟地址范围的缓存,这可以防止处理器在物理页的I/O + 发生时通过投机性地读取数据而使缓存变脏。这只对读入vmap区域的数据是必要的。 diff --git a/Documentation/translations/zh_CN/core-api/index.rst b/Documentation/translations/zh_CN/core-api/index.rst index f1fa71e45c77..b4bde9396339 100644 --- a/Documentation/translations/zh_CN/core-api/index.rst +++ b/Documentation/translations/zh_CN/core-api/index.rst @@ -19,12 +19,13 @@ 来的大量 kerneldoc 信息;有朝一日,若有人有动力的话,应当把它们拆分 出来。 -Todolist: +.. toctree:: + :maxdepth: 1 kernel-api - workqueue printk-basics printk-formats + workqueue symbol-namespaces 数据结构和低级实用程序 @@ -32,9 +33,13 @@ Todolist: 在整个内核中使用的函数库。 -Todolist: +.. toctree:: + :maxdepth: 1 kobject + +Todolist: + kref assoc_array xarray @@ -58,12 +63,12 @@ Linux如何让一切同时发生。 详情请参阅 :maxdepth: 1 irq/index - -Todolist: - refcount-vs-atomic local_ops padata + +Todolist: + ../RCU/index 低级硬件管理 @@ -71,9 +76,14 @@ Todolist: 缓存管理,CPU热插拔管理等。 -Todolist: +.. toctree:: + :maxdepth: 1 cachetlb + +Todolist: + + cpu_hotplug memory-hotplug genericirq diff --git a/Documentation/translations/zh_CN/core-api/kernel-api.rst b/Documentation/translations/zh_CN/core-api/kernel-api.rst new file mode 100644 index 000000000000..d6f815ec265b --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/kernel-api.rst @@ -0,0 +1,369 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/kernel-api.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_kernel-api.rst: + + +============ +Linux内核API +============ + + +列表管理函数 +============ + +该API在以下内核代码中: + +include/linux/list.h + +基本的C库函数 +============= + +在编写驱动程序时,一般不能使用C库中的例程。部分函数通常很有用,它们在 +下面被列出。这些函数的行为可能会与ANSI定义的略有不同,这些偏差会在文中 +注明。 + +字符串转换 +---------- + +该API在以下内核代码中: + +lib/vsprintf.c + +include/linux/kernel.h + +include/linux/kernel.h + +lib/kstrtox.c + +lib/string_helpers.c + +字符串处理 +---------- + +该API在以下内核代码中: + +lib/string.c + +include/linux/string.h + +mm/util.c + +基本的内核库函数 +================ + +Linux内核提供了很多实用的基本函数。 + +位运算 +------ + +该API在以下内核代码中: + +include/asm-generic/bitops/instrumented-atomic.h + +include/asm-generic/bitops/instrumented-non-atomic.h + +include/asm-generic/bitops/instrumented-lock.h + +位图运算 +-------- + +该API在以下内核代码中: + +lib/bitmap.c + +include/linux/bitmap.h + +include/linux/bitmap.h + +include/linux/bitmap.h + +lib/bitmap.c + +lib/bitmap.c + +include/linux/bitmap.h + +命令行解析 +---------- + +该API在以下内核代码中: + +lib/cmdline.c + +排序 +---- + +该API在以下内核代码中: + +lib/sort.c + +lib/list_sort.c + +文本检索 +-------- + +该API在以下内核代码中: + +lib/textsearch.c + +lib/textsearch.c + +include/linux/textsearch.h + +Linux中的CRC和数学函数 +====================== + + +CRC函数 +------- + +*译注:CRC,Cyclic Redundancy Check,循环冗余校验* + +该API在以下内核代码中: + +lib/crc4.c + +lib/crc7.c + +lib/crc8.c + +lib/crc16.c + +lib/crc32.c + +lib/crc-ccitt.c + +lib/crc-itu-t.c + +基数为2的对数和幂函数 +--------------------- + +该API在以下内核代码中: + +include/linux/log2.h + +整数幂函数 +---------- + +该API在以下内核代码中: + +lib/math/int_pow.c + +lib/math/int_sqrt.c + +除法函数 +-------- + +该API在以下内核代码中: + +include/asm-generic/div64.h + +include/linux/math64.h + +lib/math/div64.c + +lib/math/gcd.c + +UUID/GUID +--------- + +该API在以下内核代码中: + +lib/uuid.c + +内核IPC设备 +=========== + +IPC实用程序 +----------- + +该API在以下内核代码中: + +ipc/util.c + +FIFO 缓冲区 +=========== + +kfifo接口 +--------- + +该API在以下内核代码中: + +include/linux/kfifo.h + +转发接口支持 +============ + +转发接口支持旨在为工具和设备提供一种有效的机制,将大量数据从内核空间 +转发到用户空间。 + +转发接口 +-------- + +该API在以下内核代码中: + +kernel/relay.c + +kernel/relay.c + +模块支持 +======== + +模块加载 +-------- + +该API在以下内核代码中: + +kernel/kmod.c + +模块接口支持 +------------ + +更多信息请参考文件kernel/module.c。 + +硬件接口 +======== + + +该API在以下内核代码中: + +kernel/dma.c + +资源管理 +-------- + +该API在以下内核代码中: + +kernel/resource.c + +kernel/resource.c + +MTRR处理 +-------- + +该API在以下内核代码中: + +arch/x86/kernel/cpu/mtrr/mtrr.c + +安全框架 +======== + +该API在以下内核代码中: + +security/security.c + +security/inode.c + +审计接口 +======== + +该API在以下内核代码中: + +kernel/audit.c + +kernel/auditsc.c + +kernel/auditfilter.c + +核算框架 +======== + +该API在以下内核代码中: + +kernel/acct.c + +块设备 +====== + +该API在以下内核代码中: + +block/blk-core.c + +block/blk-core.c + +block/blk-map.c + +block/blk-sysfs.c + +block/blk-settings.c + +block/blk-exec.c + +block/blk-flush.c + +block/blk-lib.c + +block/blk-integrity.c + +kernel/trace/blktrace.c + +block/genhd.c + +block/genhd.c + +字符设备 +======== + +该API在以下内核代码中: + +fs/char_dev.c + +时钟框架 +======== + +时钟框架定义了编程接口,以支持系统时钟树的软件管理。该框架广泛用于系统级芯片(SOC)平 +台,以支持电源管理和各种可能需要自定义时钟速率的设备。请注意,这些 “时钟”与计时或实 +时时钟(RTC)无关,它们都有单独的框架。这些:c:type: `struct clk <clk>` 实例可用于管理 +各种时钟信号,例如一个96理例如96MHz的时钟信号,该信号可被用于总线或外设的数据交换,或以 +其他方式触发系统硬件中的同步状态机转换。 + +通过明确的软件时钟门控来支持电源管理:未使用的时钟被禁用,因此系统不会因为改变不在使用 +中的晶体管的状态而浪费电源。在某些系统中,这可能是由硬件时钟门控支持的,其中时钟被门控 +而不在软件中被禁用。芯片的部分,在供电但没有时钟的情况下,可能会保留其最后的状态。这种 +低功耗状态通常被称为*保留模式*。这种模式仍然会产生漏电流,特别是在电路几何结构较细的情 +况下,但对于CMOS电路来说,电能主要是随着时钟翻转而被消耗的。 + +电源感知驱动程序只有在其管理的设备处于活动使用状态时才会启用时钟。此外,系统睡眠状态通 +常根据哪些时钟域处于活动状态而有所不同:“待机”状态可能允许从多个活动域中唤醒,而 +"mem"(暂停到RAM)状态可能需要更全面地关闭来自高速PLL和振荡器的时钟,从而限制了可能 +的唤醒事件源的数量。驱动器的暂停方法可能需要注意目标睡眠状态的系统特定时钟约束。 + +一些平台支持可编程时钟发生器。这些可以被各种外部芯片使用,如其他CPU、多媒体编解码器以 +及对接口时钟有严格要求的设备。 + +该API在以下内核代码中: + +include/linux/clk.h + +同步原语 +======== + +读-复制-更新(RCU) +------------------- + +该API在以下内核代码中: + +include/linux/rcupdate.h + +kernel/rcu/tree.c + +kernel/rcu/tree_exp.h + +kernel/rcu/update.c + +include/linux/srcu.h + +kernel/rcu/srcutree.c + +include/linux/rculist_bl.h + +include/linux/rculist.h + +include/linux/rculist_nulls.h + +include/linux/rcu_sync.h + +kernel/rcu/sync.c diff --git a/Documentation/translations/zh_CN/core-api/kobject.rst b/Documentation/translations/zh_CN/core-api/kobject.rst new file mode 100644 index 000000000000..f0e6a4aeb372 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/kobject.rst @@ -0,0 +1,378 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/kobject.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_core_api_kobject.rst: + +======================================================= +关于kobjects、ksets和ktypes的一切你没想过需要了解的东西 +======================================================= + +:作者: Greg Kroah-Hartman <gregkh@linuxfoundation.org> +:最后一次更新: December 19, 2007 + +根据Jon Corbet于2003年10月1日为lwn.net撰写的原创文章改编,网址是: +https://lwn.net/Articles/51437/ + +理解驱动模型和建立在其上的kobject抽象的部分的困难在于,没有明显的切入点。 +处理kobjects需要理解一些不同的类型,所有这些类型都会相互引用。为了使事情 +变得更简单,我们将多路并进,从模糊的术语开始,并逐渐增加细节。那么,先来 +了解一些我们将要使用的术语的简明定义吧。 + + - 一个kobject是一个kobject结构体类型的对象。Kobjects有一个名字和一个 + 引用计数。一个kobject也有一个父指针(允许对象被排列成层次结构),一个 + 特定的类型,并且,通常在sysfs虚拟文件系统中表示。 + + Kobjects本身通常并不引人关注;相反它们常常被嵌入到其他包含真正引人注目 + 的代码的结构体中。 + + 任何结构体都 **不应该** 有一个以上的kobject嵌入其中。如果有的话,对象的引用计 + 数肯定会被打乱,而且不正确,你的代码就会出现错误。所以不要这样做。 + + - ktype是嵌入一个kobject的对象的类型。每个嵌入kobject的结构体都需要一个 + 相应的ktype。ktype控制着kobject在被创建和销毁时的行为。 + + - 一个kset是一组kobjects。这些kobjects可以是相同的ktype或者属于不同的 + ktype。kset是kobjects集合的基本容器类型。Ksets包含它们自己的kobjects, + 但你可以安全地忽略这个实现细节,因为kset的核心代码会自动处理这个kobject。 + + 当你看到一个下面全是其他目录的sysfs目录时,通常这些目录中的每一个都对应 + 于同一个kset中的一个kobject。 + + 我们将研究如何创建和操作所有这些类型。将采取一种自下而上的方法,所以我们 + 将回到kobjects。 + + +嵌入kobjects +============= + +内核代码很少创建孤立的kobject,只有一个主要的例外,下面会解释。相反, +kobjects被用来控制对一个更大的、特定领域的对象的访问。为此,kobjects会被 +嵌入到其他结构中。如果你习惯于用面向对象的术语来思考问题,那么kobjects可 +以被看作是一个顶级的抽象类,其他的类都是从它派生出来的。一个kobject实现了 +一系列的功能,这些功能本身并不特别有用,但在其他对象中却很好用。C语言不允 +许直接表达继承,所以必须使用其他技术——比如结构体嵌入。 + +(对于那些熟悉内核链表实现的人来说,这类似于“list_head”结构本身很少有用, +但总是被嵌入到感兴趣的更大的对象中)。 + +例如, ``drivers/uio/uio.c`` 中的IO代码有一个结构体,定义了与uio设备相 +关的内存区域:: + + struct uio_map { + struct kobject kobj; + struct uio_mem *mem; + }; + +如果你有一个uio_map结构体,找到其嵌入的kobject只是一个使用kobj成员的问题。 +然而,与kobjects一起工作的代码往往会遇到相反的问题:给定一个结构体kobject +的指针,指向包含结构体的指针是什么?你必须避免使用一些技巧(比如假设 +kobject在结构的开头),相反,你得使用container_of()宏,其可以在 ``<linux/kernel.h>`` +中找到:: + + container_of(ptr, type, member) + +其中: + + * ``ptr`` 是一个指向嵌入kobject的指针, + * ``type`` 是包含结构体的类型, + * ``member`` 是 ``指针`` 所指向的结构体域的名称。 + +container_of()的返回值是一个指向相应容器类型的指针。因此,例如,一个嵌入到 +uio_map结构 **中** 的kobject结构体的指针kp可以被转换为一个指向 **包含** uio_map +结构体的指针,方法是:: + + struct uio_map *u_map = container_of(kp, struct uio_map, kobj); + +为了方便起见,程序员经常定义一个简单的宏,用于将kobject指针 **反推** 到包含 +类型。在早期的 ``drivers/uio/uio.c`` 中正是如此,你可以在这里看到:: + + struct uio_map { + struct kobject kobj; + struct uio_mem *mem; + }; + + #define to_map(map) container_of(map, struct uio_map, kobj) + +其中宏的参数“map”是一个指向有关的kobject结构体的指针。该宏随后被调用:: + + struct uio_map *map = to_map(kobj); + + +kobjects的初始化 +================ + +当然,创建kobject的代码必须初始化该对象。一些内部字段是通过(强制)调用kobject_init() +来设置的:: + + void kobject_init(struct kobject *kobj, struct kobj_type *ktype); + +ktype是正确创建kobject的必要条件,因为每个kobject都必须有一个相关的kobj_type。 +在调用kobject_init()后,为了向sysfs注册kobject,必须调用函数kobject_add():: + + int kobject_add(struct kobject *kobj, struct kobject *parent, + const char *fmt, ...); + +这将正确设置kobject的父级和kobject的名称。如果该kobject要与一个特定的kset相关 +联,在调用kobject_add()之前必须分配kobj->kset。如果kset与kobject相关联,则 +kobject的父级可以在调用kobject_add()时被设置为NULL,则kobject的父级将是kset +本身。 + +由于kobject的名字是在它被添加到内核时设置的,所以kobject的名字不应该被直接操作。 +如果你必须改变kobject的名字,请调用kobject_rename():: + + int kobject_rename(struct kobject *kobj, const char *new_name); + +kobject_rename()函数不会执行任何锁定操作,也不会对name进行可靠性检查,所以调用 +者自己检查和串行化操作是明智的选择 + +有一个叫kobject_set_name()的函数,但那是历史遗产,正在被删除。如果你的代码需 +要调用这个函数,那么它是不正确的,需要被修复。 + +要正确访问kobject的名称,请使用函数kobject_name():: + + const char *kobject_name(const struct kobject * kobj); + +有一个辅助函数可以同时初始化和添加kobject到内核中,令人惊讶的是,该函数被称为 +kobject_init_and_add():: + + int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, + struct kobject *parent, const char *fmt, ...); + +参数与上面描述的单个kobject_init()和kobject_add()函数相同。 + + +Uevents +======= + +当一个kobject被注册到kobject核心后,你需要向全世界宣布它已经被创建了。这可以通 +过调用kobject_uevent()来实现:: + + int kobject_uevent(struct kobject *kobj, enum kobject_action action); + +当kobject第一次被添加到内核时,使用 *KOBJ_ADD* 动作。这应该在该kobject的任 +何属性或子对象被正确初始化后进行,因为当这个调用发生时,用户空间会立即开始寻 +找它们。 + +当kobject从内核中移除时(关于如何做的细节在下面), **KOBJ_REMOVE** 的uevent +将由kobject核心自动创建,所以调用者不必担心手动操作。 + + +引用计数 +======== + +kobject的关键功能之一是作为它所嵌入的对象的一个引用计数器。只要对该对象的引用 +存在,该对象(以及支持它的代码)就必须继续存在。用于操作kobject的引用计数的低 +级函数是:: + + struct kobject *kobject_get(struct kobject *kobj); + void kobject_put(struct kobject *kobj); + +对kobject_get()的成功调用将增加kobject的引用计数器值并返回kobject的指针。 + +当引用被释放时,对kobject_put()的调用将递减引用计数值,并可能释放该对象。请注 +意,kobject_init()将引用计数设置为1,所以设置kobject的代码最终需要kobject_put() +来释放该引用。 + +因为kobjects是动态的,所以它们不能以静态方式或在堆栈中声明,而总是以动态方式分 +配。未来版本的内核将包含对静态创建的kobjects的运行时检查,并将警告开发者这种不 +当的使用。 + +如果你使用struct kobject只是为了给你的结构体提供一个引用计数器,请使用struct kref +来代替;kobject是多余的。关于如何使用kref结构体的更多信息,请参见Linux内核源代 +码树中的文件Documentation/core-api/kref.rst + + +创建“简单的”kobjects +==================== + +有时,开发者想要的只是在sysfs层次结构中创建一个简单的目录,而不必去搞那些复杂 +的ksets、显示和存储函数,以及其他细节。这是一个应该创建单个kobject的例外。要 +创建这样一个条目(即简单的目录),请使用函数:: + + struct kobject *kobject_create_and_add(const char *name, struct kobject *parent); + +这个函数将创建一个kobject,并将其放在sysfs中指定的父kobject下面的位置。要创 +建与此kobject相关的简单属性,请使用:: + + int sysfs_create_file(struct kobject *kobj, const struct attribute *attr); + +或者:: + + int sysfs_create_group(struct kobject *kobj, const struct attribute_group *grp); + +这里使用的两种类型的属性,与已经用kobject_create_and_add()创建的kobject, +都可以是kobj_attribute类型,所以不需要创建特殊的自定义属性。 + +参见示例模块, ``samples/kobject/kobject-example.c`` ,以了解一个简单的 +kobject和属性的实现。 + + + +ktypes和释放方法 +================ + +以上讨论中还缺少一件重要的事情,那就是当一个kobject的引用次数达到零的时候 +会发生什么。创建kobject的代码通常不知道何时会发生这种情况;首先,如果它知 +道,那么使用kobject就没有什么意义。当sysfs被引入时,即使是可预测的对象生命 +周期也会变得更加复杂,因为内核的其他部分可以获得在系统中注册的任何kobject +的引用。 + +最终的结果是,一个由kobject保护的结构体在其引用计数归零之前不能被释放。引 +用计数不受创建kobject的代码的直接控制。因此,每当它的一个kobjects的最后一 +个引用消失时,必须异步通知该代码。 + +一旦你通过kobject_add()注册了你的kobject,你绝对不能使用kfree()来直接释 +放它。唯一安全的方法是使用kobject_put()。在kobject_init()之后总是使用 +kobject_put()以避免错误的发生是一个很好的做法。 + +这个通知是通过kobject的release()方法完成的。通常这样的方法有如下形式:: + + void my_object_release(struct kobject *kobj) + { + struct my_object *mine = container_of(kobj, struct my_object, kobj); + + /* Perform any additional cleanup on this object, then... */ + kfree(mine); + } + +有一点很重要:每个kobject都必须有一个release()方法,而且这个kobject必 +须持续存在(处于一致的状态),直到这个方法被调用。如果这些约束条件没有 +得到满足,那么代码就是有缺陷的。注意,如果你忘记提供release()方法,内 +核会警告你。不要试图通过提供一个“空”的释放函数来摆脱这个警告。 + +如果你的清理函数只需要调用kfree(),那么你必须创建一个包装函数,该函数 +使用container_of()来向上造型到正确的类型(如上面的例子所示),然后在整个 +结构体上调用kfree()。 + +注意,kobject的名字在release函数中是可用的,但它不能在这个回调中被改 +变。否则,在kobject核心中会出现内存泄漏,这让人很不爽。 + +有趣的是,release()方法并不存储在kobject本身;相反,它与ktype相关。 +因此,让我们引入结构体kobj_type:: + + struct kobj_type { + void (*release)(struct kobject *kobj); + const struct sysfs_ops *sysfs_ops; + struct attribute **default_attrs; + const struct attribute_group **default_groups; + const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj); + const void *(*namespace)(struct kobject *kobj); + void (*get_ownership)(struct kobject *kobj, kuid_t *uid, kgid_t *gid); + }; + +这个结构提用来描述一个特定类型的kobject(或者更正确地说,包含对象的 +类型)。每个kobject都需要有一个相关的kobj_type结构;当你调用 +kobject_init()或kobject_init_and_add()时必须指定一个指向该结构的 +指针。 + +当然,kobj_type结构中的release字段是指向这种类型的kobject的release() +方法的一个指针。另外两个字段(sysfs_ops 和 default_attrs)控制这种 +类型的对象如何在 sysfs 中被表示;它们超出了本文的范围。 + +default_attrs 指针是一个默认属性的列表,它将为任何用这个 ktype 注册 +的 kobject 自动创建。 + + +ksets +===== + +一个kset仅仅是一个希望相互关联的kobjects的集合。没有限制它们必须是相 +同的ktype,但是如果它们不是相同的,就要非常小心。 + +一个kset有以下功能: + + - 它像是一个包含一组对象的袋子。一个kset可以被内核用来追踪“所有块 + 设备”或“所有PCI设备驱动”。 + + - kset也是sysfs中的一个子目录,与kset相关的kobjects可以在这里显示 + 出来。每个kset都包含一个kobject,它可以被设置为其他kobject的父对象; + sysfs层次结构的顶级目录就是以这种方式构建的。 + + - Ksets可以支持kobjects的 "热插拔",并影响uevent事件如何被报告给 + 用户空间。 + + 在面向对象的术语中,“kset”是顶级的容器类;ksets包含它们自己的kobject, + 但是这个kobject是由kset代码管理的,不应该被任何其他用户所操纵。 + + kset在一个标准的内核链表中保存它的子对象。Kobjects通过其kset字段指向其 + 包含的kset。在几乎所有的情况下,属于一个kset的kobjects在它们的父 + 对象中都有那个kset(或者,严格地说,它的嵌入kobject)。 + + 由于kset中包含一个kobject,它应该总是被动态地创建,而不是静态地 + 或在堆栈中声明。要创建一个新的kset,请使用:: + + struct kset *kset_create_and_add(const char *name, + const struct kset_uevent_ops *uevent_ops, + struct kobject *parent_kobj); + +当你完成对kset的处理后,调用:: + + void kset_unregister(struct kset *k); + +来销毁它。这将从sysfs中删除该kset并递减其引用计数值。当引用计数 +为零时,该kset将被释放。因为对该kset的其他引用可能仍然存在, +释放可能发生在kset_unregister()返回之后。 + +一个使用kset的例子可以在内核树中的 ``samples/kobject/kset-example.c`` +文件中看到。 + +如果一个kset希望控制与它相关的kobjects的uevent操作,它可以使用 +结构体kset_uevent_ops来处理它:: + + struct kset_uevent_ops { + int (* const filter)(struct kset *kset, struct kobject *kobj); + const char *(* const name)(struct kset *kset, struct kobject *kobj); + int (* const uevent)(struct kset *kset, struct kobject *kobj, + struct kobj_uevent_env *env); + }; + + +过滤器函数允许kset阻止一个特定kobject的uevent被发送到用户空间。 +如果该函数返回0,该uevent将不会被发射出去。 + +name函数将被调用以覆盖uevent发送到用户空间的kset的默认名称。默 +认情况下,该名称将与kset本身相同,但这个函数,如果存在,可以覆盖 +该名称。 + +当uevent即将被发送至用户空间时,uevent函数将被调用,以允许更多 +的环境变量被添加到uevent中。 + +有人可能会问,鉴于没有提出执行该功能的函数,究竟如何将一个kobject +添加到一个kset中。答案是这个任务是由kobject_add()处理的。当一个 +kobject被传递给kobject_add()时,它的kset成员应该指向这个kobject +所属的kset。 kobject_add()将处理剩下的部分。 + +如果属于一个kset的kobject没有父kobject集,它将被添加到kset的目 +录中。并非所有的kset成员都必须住在kset目录中。如果在添加kobject +之前分配了一个明确的父kobject,那么该kobject将被注册到kset中, +但是被添加到父kobject下面。 + + +移除Kobject +=========== + +当一个kobject在kobject核心注册成功后,在代码使用完它时,必须将其 +清理掉。要做到这一点,请调用kobject_put()。通过这样做,kobject核 +心会自动清理这个kobject分配的所有内存。如果为这个对象发送了 ``KOBJ_ADD`` +uevent,那么相应的 ``KOBJ_REMOVE`` uevent也将被发送,任何其他的 +sysfs内务将被正确处理。 + +如果你需要分两次对kobject进行删除(比如说在你要销毁对象时无权睡眠), +那么调用kobject_del()将从sysfs中取消kobject的注册。这使得kobject +“不可见”,但它并没有被清理掉,而且该对象的引用计数仍然是一样的。在稍 +后的时间调用kobject_put()来完成与该kobject相关的内存的清理。 + +kobject_del()可以用来放弃对父对象的引用,如果循环引用被构建的话。 +在某些情况下,一个父对象引用一个子对象是有效的。循环引用必须通过明 +确调用kobject_del()来打断,这样一个释放函数就会被调用,前一个循环 +中的对象会相互释放。 + + +示例代码出处 +============ + +关于正确使用ksets和kobjects的更完整的例子,请参见示例程序 +``samples/kobject/{kobject-example.c,kset-example.c}`` ,如果 +您选择 ``CONFIG_SAMPLE_KOBJECT`` ,它们将被构建为可加载模块。 diff --git a/Documentation/translations/zh_CN/core-api/local_ops.rst b/Documentation/translations/zh_CN/core-api/local_ops.rst new file mode 100644 index 000000000000..ee67379b6869 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/local_ops.rst @@ -0,0 +1,194 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/local_ops.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_local_ops: + + +======================== +本地原子操作的语义和行为 +======================== + +:作者: Mathieu Desnoyers + + +本文解释了本地原子操作的目的,如何为任何给定的架构实现这些操作,并说明了 +如何正确使用这些操作。它还强调了在内存写入顺序很重要的情况下,跨CPU读取 +这些本地变量时必须采取的预防措施。 + +.. note:: + + 注意,基于 ``local_t`` 的操作不建议用于一般内核操作。请使用 ``this_cpu`` + 操作来代替使用,除非真的有特殊目的。大多数内核中使用的 ``local_t`` 已 + 经被 ``this_cpu`` 操作所取代。 ``this_cpu`` 操作在一条指令中结合了重 + 定位和类似 ``local_t`` 的语义,产生了更紧凑和更快的执行代码。 + + +本地原子操作的目的 +================== + +本地原子操作的目的是提供快速和高度可重入的每CPU计数器。它们通过移除LOCK前 +缀和通常需要在CPU间同步的内存屏障,将标准原子操作的性能成本降到最低。 + +在许多情况下,拥有快速的每CPU原子计数器是很有吸引力的:它不需要禁用中断来保护中 +断处理程序,它允许在NMI(Non Maskable Interrupt)处理程序中使用连贯的计数器。 +它对追踪目的和各种性能监测计数器特别有用。 + +本地原子操作只保证在拥有数据的CPU上的变量修改的原子性。因此,必须注意确保只 +有一个CPU写到 ``local_t`` 的数据。这是通过使用每CPU的数据来实现的,并确 +保我们在一个抢占式安全上下文中修改它。然而,从任何一个CPU读取 ``local_t`` +数据都是允许的:这样它就会显得与所有者CPU的其他内存写入顺序不一致。 + + +针对特定架构的实现 +================== + +这可以通过稍微修改标准的原子操作来实现:只有它们的UP变体必须被保留。这通常 +意味着删除LOCK前缀(在i386和x86_64上)和任何SMP同步屏障。如果架构在SMP和 +UP之间没有不同的行为,在你的架构的 ``local.h`` 中包括 ``asm-generic/local.h`` +就足够了。 + +通过在一个结构体中嵌入一个 ``atomic_long_t`` , ``local_t`` 类型被定义为 +一个不透明的 ``signed long`` 。这样做的目的是为了使从这个类型到 +``long`` 的转换失败。该定义看起来像:: + + typedef struct { atomic_long_t a; } local_t; + + +使用本地原子操作时应遵循的规则 +============================== + +* 被本地操作触及的变量必须是每cpu的变量。 + +* *只有* 这些变量的CPU所有者才可以写入这些变量。 + +* 这个CPU可以从任何上下文(进程、中断、软中断、nmi...)中使用本地操作来更新 + 它的local_t变量。 + +* 当在进程上下文中使用本地操作时,必须禁用抢占(或中断),以确保进程在获得每 + CPU变量和进行实际的本地操作之间不会被迁移到不同的CPU。 + +* 当在中断上下文中使用本地操作时,在主线内核上不需要特别注意,因为它们将在局 + 部CPU上运行,并且已经禁用了抢占。然而,我建议无论如何都要明确地禁用抢占, + 以确保它在-rt内核上仍能正确工作。 + +* 读取本地cpu变量将提供该变量的当前拷贝。 + +* 对这些变量的读取可以从任何CPU进行,因为对 “ ``long`` ”,对齐的变量的更新 + 总是原子的。由于写入程序的CPU没有进行内存同步,所以在读取 *其他* cpu的变 + 量时,可以读取该变量的过期副本。 + + +如何使用本地原子操作 +==================== + +:: + + #include <linux/percpu.h> + #include <asm/local.h> + + static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0); + + +计数器 +====== + +计数是在一个signed long的所有位上进行的。 + +在可抢占的上下文中,围绕本地原子操作使用 ``get_cpu_var()`` 和 +``put_cpu_var()`` :它确保在对每个cpu变量进行写访问时,抢占被禁用。比如 +说:: + + local_inc(&get_cpu_var(counters)); + put_cpu_var(counters); + +如果你已经在一个抢占安全上下文中,你可以使用 ``this_cpu_ptr()`` 代替:: + + local_inc(this_cpu_ptr(&counters)); + + + +读取计数器 +========== + +那些本地计数器可以从外部的CPU中读取,以求得计数的总和。请注意,local_read +所看到的跨CPU的数据必须被认为是相对于拥有该数据的CPU上发生的其他内存写入来 +说不符合顺序的:: + + long sum = 0; + for_each_online_cpu(cpu) + sum += local_read(&per_cpu(counters, cpu)); + +如果你想使用远程local_read来同步CPU之间对资源的访问,必须在写入者和读取者 +的CPU上分别使用显式的 ``smp_wmb()`` 和 ``smp_rmb()`` 内存屏障。如果你使 +用 ``local_t`` 变量作为写在缓冲区中的字节的计数器,就会出现这种情况:在缓 +冲区写和计数器增量之间应该有一个 ``smp_wmb()`` ,在计数器读和缓冲区读之间 +也应有一个 ``smp_rmb()`` 。 + +下面是一个使用 ``local.h`` 实现每个cpu基本计数器的示例模块:: + + /* test-local.c + * + * Sample module for local.h usage. + */ + + + #include <asm/local.h> + #include <linux/module.h> + #include <linux/timer.h> + + static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0); + + static struct timer_list test_timer; + + /* IPI called on each CPU. */ + static void test_each(void *info) + { + /* Increment the counter from a non preemptible context */ + printk("Increment on cpu %d\n", smp_processor_id()); + local_inc(this_cpu_ptr(&counters)); + + /* This is what incrementing the variable would look like within a + * preemptible context (it disables preemption) : + * + * local_inc(&get_cpu_var(counters)); + * put_cpu_var(counters); + */ + } + + static void do_test_timer(unsigned long data) + { + int cpu; + + /* Increment the counters */ + on_each_cpu(test_each, NULL, 1); + /* Read all the counters */ + printk("Counters read from CPU %d\n", smp_processor_id()); + for_each_online_cpu(cpu) { + printk("Read : CPU %d, count %ld\n", cpu, + local_read(&per_cpu(counters, cpu))); + } + mod_timer(&test_timer, jiffies + 1000); + } + + static int __init test_init(void) + { + /* initialize the timer that will increment the counter */ + timer_setup(&test_timer, do_test_timer, 0); + mod_timer(&test_timer, jiffies + 1); + + return 0; + } + + static void __exit test_exit(void) + { + del_timer_sync(&test_timer); + } + + module_init(test_init); + module_exit(test_exit); + + MODULE_LICENSE("GPL"); + MODULE_AUTHOR("Mathieu Desnoyers"); + MODULE_DESCRIPTION("Local Atomic Ops"); diff --git a/Documentation/translations/zh_CN/core-api/padata.rst b/Documentation/translations/zh_CN/core-api/padata.rst new file mode 100644 index 000000000000..c627f8f131f9 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/padata.rst @@ -0,0 +1,158 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/padata.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_core_api_padata.rst: + +================== +padata并行执行机制 +================== + +:日期: 2020年5月 + +Padata是一种机制,内核可以通过此机制将工作分散到多个CPU上并行完成,同时 +可以选择保持它们的顺序。 + +它最初是为IPsec开发的,它需要在不对这些数据包重新排序的其前提下,为大量的数 +据包进行加密和解密。这是目前padata的序列化作业支持的唯一用途。 + +Padata还支持多线程作业,将作业平均分割,同时在线程之间进行负载均衡和协调。 + +执行序列化作业 +============== + +初始化 +------ + +使用padata执行序列化作业的第一步是建立一个padata_instance结构体,以全面 +控制作业的运行方式:: + + #include <linux/padata.h> + + struct padata_instance *padata_alloc(const char *name); + +'name'即标识了这个实例。 + +然后,通过分配一个padata_shell来完成padata的初始化:: + + struct padata_shell *padata_alloc_shell(struct padata_instance *pinst); + +一个padata_shell用于向padata提交一个作业,并允许一系列这样的作业被独立地 +序列化。一个padata_instance可以有一个或多个padata_shell与之相关联,每个 +都允许一系列独立的作业。 + +修改cpumasks +------------ + +用于运行作业的CPU可以通过两种方式改变,通过padata_set_cpumask()编程或通 +过sysfs。前者的定义是:: + + int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, + cpumask_var_t cpumask); + +这里cpumask_type是PADATA_CPU_PARALLEL(并行)或PADATA_CPU_SERIAL(串行)之一,其中并 +行cpumask描述了哪些处理器将被用来并行执行提交给这个实例的作业,串行cpumask +定义了哪些处理器被允许用作串行化回调处理器。 cpumask指定了要使用的新cpumask。 + +一个实例的cpumasks可能有sysfs文件。例如,pcrypt的文件在 +/sys/kernel/pcrypt/<instance-name>。在一个实例的目录中,有两个文件,parallel_cpumask +和serial_cpumask,任何一个cpumask都可以通过在文件中回显(echo)一个bitmask +来改变,比如说:: + + echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask + +读取其中一个文件会显示用户提供的cpumask,它可能与“可用”的cpumask不同。 + +Padata内部维护着两对cpumask,用户提供的cpumask和“可用的”cpumask(每一对由一个 +并行和一个串行cpumask组成)。用户提供的cpumasks在实例分配时默认为所有可能的CPU, +并且可以如上所述进行更改。可用的cpumasks总是用户提供的cpumasks的一个子集,只包 +含用户提供的掩码中的在线CPU;这些是padata实际使用的cpumasks。因此,向padata提 +供一个包含离线CPU的cpumask是合法的。一旦用户提供的cpumask中的一个离线CPU上线, +padata就会使用它。 + +改变CPU掩码的操作代价很高,所以不应频繁更改。 + +运行一个作业 +------------- + +实际上向padata实例提交工作需要创建一个padata_priv结构体,它代表一个作业:: + + struct padata_priv { + /* Other stuff here... */ + void (*parallel)(struct padata_priv *padata); + void (*serial)(struct padata_priv *padata); + }; + +这个结构体几乎肯定会被嵌入到一些针对要做的工作的大结构体中。它的大部分字段对 +padata来说是私有的,但是这个结构在初始化时应该被清零,并且应该提供parallel()和 +serial()函数。在完成工作的过程中,这些函数将被调用,我们马上就会遇到。 + +工作的提交是通过:: + + int padata_do_parallel(struct padata_shell *ps, + struct padata_priv *padata, int *cb_cpu); + +ps和padata结构体必须如上所述进行设置;cb_cpu指向作业完成后用于最终回调的首选CPU; +它必须在当前实例的CPU掩码中(如果不是,cb_cpu指针将被更新为指向实际选择的CPU)。 +padata_do_parallel()的返回值在成功时为0,表示工作正在进行中。-EBUSY意味着有人 +在其他地方正在搞乱实例的CPU掩码,而当cb_cpu不在串行cpumask中、并行或串行cpumasks +中无在线CPU,或实例停止时,则会出现-EINVAL反馈。 + +每个提交给padata_do_parallel()的作业将依次传递给一个CPU上的上述parallel()函数 +的一个调用,所以真正的并行是通过提交多个作业来实现的。parallel()在运行时禁用软 +件中断,因此不能睡眠。parallel()函数把获得的padata_priv结构体指针作为其唯一的参 +数;关于实际要做的工作的信息可能是通过使用container_of()找到封装结构体来获得的。 + +请注意,parallel()没有返回值;padata子系统假定parallel()将从此时开始负责这项工 +作。作业不需要在这次调用中完成,但是,如果parallel()留下了未完成的工作,它应该准 +备在前一个作业完成之前,被以新的作业再次调用 + +序列化作业 +---------- + +当一个作业完成时,parallel()(或任何实际完成该工作的函数)应该通过调用通知padata此 +事:: + + void padata_do_serial(struct padata_priv *padata); + +在未来的某个时刻,padata_do_serial()将触发对padata_priv结构体中serial()函数的调 +用。这个调用将发生在最初要求调用padata_do_parallel()的CPU上;它也是在本地软件中断 +被禁用的情况下运行的。 +请注意,这个调用可能会被推迟一段时间,因为padata代码会努力确保作业按照提交的顺序完 +成。 + +销毁 +---- + +清理一个padata实例时,可以预见的是调用两个free函数,这两个函数对应于分配的逆过程:: + + void padata_free_shell(struct padata_shell *ps); + void padata_free(struct padata_instance *pinst); + +用户有责任确保在调用上述任何一项之前,所有未完成的工作都已完成。 + +运行多线程作业 +============== + +一个多线程作业有一个主线程和零个或多个辅助线程,主线程参与作业,然后等待所有辅助线 +程完成。padata将作业分割成称为chunk的单元,其中chunk是一个线程在一次调用线程函数 +中完成的作业片段。 + +用户必须做三件事来运行一个多线程作业。首先,通过定义一个padata_mt_job结构体来描述 +作业,这在接口部分有解释。这包括一个指向线程函数的指针,padata每次将作业块分配给线 +程时都会调用这个函数。然后,定义线程函数,它接受三个参数: ``start`` 、 ``end`` 和 ``arg`` , +其中前两个参数限定了线程操作的范围,最后一个是指向作业共享状态的指针,如果有的话。 +准备好共享状态,它通常被分配在主线程的堆栈中。最后,调用padata_do_multithreaded(), +它将在作业完成后返回。 + +接口 +==== + +该API在以下内核代码中: + +include/linux/padata.h + +kernel/padata.c diff --git a/Documentation/translations/zh_CN/core-api/printk-basics.rst b/Documentation/translations/zh_CN/core-api/printk-basics.rst new file mode 100644 index 000000000000..2b20f6303a82 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/printk-basics.rst @@ -0,0 +1,110 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/printk-basics.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_printk-basics.rst: + + +================== +使用printk记录消息 +================== + +printk()是Linux内核中最广为人知的函数之一。它是我们打印消息的标准工具,通常也是追踪和调试 +的最基本方法。如果你熟悉printf(3),你就能够知道printk()是基于它的,尽管它在功能上有一些不 +同之处: + + - printk() 消息可以指定日志级别。 + + - 格式字符串虽然与C99基本兼容,但并不遵循完全相同的规范。它有一些扩展和一些限制(没 + 有 ``%n`` 或浮点转换指定符)。参见:ref: `如何正确地获得printk格式指定符<printk-specifiers>` 。 + +所有的printk()消息都会被打印到内核日志缓冲区,这是一个通过/dev/kmsg输出到用户空间的环 +形缓冲区。读取它的通常方法是使用 ``dmesg`` 。 + +printk()的用法通常是这样的:: + + printk(KERN_INFO "Message: %s\n", arg); + +其中 ``KERN_INFO`` 是日志级别(注意,它与格式字符串连在一起,日志级别不是一个单独的参数)。 +可用的日志级别是: + + ++----------------+--------+-----------------------------------------------+ +| 名称 | 字符串 | 别名函数 | ++================+========+===============================================+ +| KERN_EMERG | "0" | pr_emerg() | ++----------------+--------+-----------------------------------------------+ +| KERN_ALERT | "1" | pr_alert() | ++----------------+--------+-----------------------------------------------+ +| KERN_CRIT | "2" | pr_crit() | ++----------------+--------+-----------------------------------------------+ +| KERN_ERR | "3" | pr_err() | ++----------------+--------+-----------------------------------------------+ +| KERN_WARNING | "4" | pr_warn() | ++----------------+--------+-----------------------------------------------+ +| KERN_NOTICE | "5" | pr_notice() | ++----------------+--------+-----------------------------------------------+ +| KERN_INFO | "6" | pr_info() | ++----------------+--------+-----------------------------------------------+ +| KERN_DEBUG | "7" | pr_debug() and pr_devel() 若定义了DEBUG | ++----------------+--------+-----------------------------------------------+ +| KERN_DEFAULT | "" | | ++----------------+--------+-----------------------------------------------+ +| KERN_CONT | "c" | pr_cont() | ++----------------+--------+-----------------------------------------------+ + + +日志级别指定了一条消息的重要性。内核根据日志级别和当前 *console_loglevel* (一个内核变量)决 +定是否立即显示消息(将其打印到当前控制台)。如果消息的优先级比 *console_loglevel* 高(日志级 +别值较低),消息将被打印到控制台。 + +如果省略了日志级别,则以 ``KERN_DEFAULT`` 级别打印消息。 + +你可以用以下方法检查当前的 *console_loglevel* :: + + $ cat /proc/sys/kernel/printk + 4 4 1 7 + +结果显示了 *current*, *default*, *minimum* 和 *boot-time-default* 日志级别 + +要改变当前的 console_loglevel,只需在 ``/proc/sys/kernel/printk`` 中写入所需的 +级别。例如,要打印所有的消息到控制台上:: + + # echo 8 > /proc/sys/kernel/printk + +另一种方式,使用 ``dmesg``:: + + # dmesg -n 5 + +设置 console_loglevel 打印 KERN_WARNING (4) 或更严重的消息到控制台。更多消息参 +见 ``dmesg(1)`` 。 + +作为printk()的替代方案,你可以使用 ``pr_*()`` 别名来记录日志。这个系列的宏在宏名中 +嵌入了日志级别。例如:: + + pr_info("Info message no. %d\n", msg_num); + +打印 ``KERN_INFO`` 消息。 + +除了比等效的printk()调用更简洁之外,它们还可以通过pr_fmt()宏为格式字符串使用一个通用 +的定义。例如,在源文件的顶部(在任何 ``#include`` 指令之前)定义这样的内容。:: + + #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__ + +会在该文件中的每一条 pr_*() 消息前加上发起该消息的模块和函数名称。 + +为了调试,还有两个有条件编译的宏: +pr_debug()和pr_devel(),除非定义了 ``DEBUG`` (或者在pr_debug()的情况下定义了 +``CONFIG_DYNAMIC_DEBUG`` ),否则它们会被编译。 + + +函数接口 +======== + +该API在以下内核代码中: + +kernel/printk/printk.c + +include/linux/printk.h diff --git a/Documentation/translations/zh_CN/core-api/printk-formats.rst b/Documentation/translations/zh_CN/core-api/printk-formats.rst new file mode 100644 index 000000000000..a680c8f164c3 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/printk-formats.rst @@ -0,0 +1,595 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/printk-formats.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_printk-formats.rst: + + +============================== +如何获得正确的printk格式占位符 +============================== + + + +:作者: Randy Dunlap <rdunlap@infradead.org> +:作者: Andrew Murray <amurray@mpc-data.co.uk> + + +整数类型 +======== + +:: + + 若变量类型是Type,则使用printk格式占位符。 + ------------------------------------------- + char %d 或 %x + unsigned char %u 或 %x + short int %d 或 %x + unsigned short int %u 或 %x + int %d 或 %x + unsigned int %u 或 %x + long %ld 或 %lx + unsigned long %lu 或 %lx + long long %lld 或 %llx + unsigned long long %llu 或 %llx + size_t %zu 或 %zx + ssize_t %zd 或 %zx + s8 %d 或 %x + u8 %u 或 %x + s16 %d 或 %x + u16 %u 或 %x + s32 %d 或 %x + u32 %u 或 %x + s64 %lld 或 %llx + u64 %llu 或 %llx + + +如果 <type> 的大小依赖于配置选项 (例如 sector_t, blkcnt_t) 或其大小依赖于架构 +(例如 tcflag_t),则使用其可能的最大类型的格式占位符并显式强制转换为它。 + +例如 + +:: + + printk("test: sector number/total blocks: %llu/%llu\n", + (unsigned long long)sector, (unsigned long long)blockcount); + +提醒:sizeof()返回类型为size_t。 + +内核的printf不支持%n。显而易见,浮点格式(%e, %f, %g, %a)也不被识别。使用任何不 +支持的占位符或长度限定符都会导致一个WARN并且终止vsnprintf()执行。 + +指针类型 +======== + +一个原始指针值可以用%p打印,它将在打印前对地址进行哈希处理。内核也支持扩展占位符来打印 +不同类型的指针。 + +一些扩展占位符会打印给定地址上的数据,而不是打印地址本身。在这种情况下,以下错误消息可能 +会被打印出来,而不是无法访问的消息:: + + (null) data on plain NULL address + (efault) data on invalid address + (einval) invalid data on a valid address + +普通指针 +---------- + +:: + + %p abcdef12 or 00000000abcdef12 + +没有指定扩展名的指针(即没有修饰符的%p)被哈希(hash),以防止内核内存布局消息的泄露。这 +样还有一个额外的好处,就是提供一个唯一的标识符。在64位机器上,前32位被清零。当没有足够的 +熵进行散列处理时,内核将打印(ptrval)代替 + +如果可能的话,使用专门的修饰符,如%pS或%pB(如下所述),以避免打印一个必须事后解释的非哈 +希地址。如果不可能,而且打印地址的目的是为调试提供更多的消息,使用%p,并在调试过程中 +用 ``no_hash_pointers`` 参数启动内核,这将打印所有未修改的%p地址。如果你 *真的* 想知 +道未修改的地址,请看下面的%px。 + +如果(也只有在)你将地址作为虚拟文件的内容打印出来,例如在procfs或sysfs中(使用 +seq_printf(),而不是printk())由用户空间进程读取,使用下面描述的%pK修饰符,不 +要用%p或%px。 + + +错误指针 +-------- + +:: + + %pe -ENOSPC + +用于打印错误指针(即IS_ERR()为真的指针)的符号错误名。不知道符号名的错误值会以十进制打印, +而作为%pe参数传递的非ERR_PTR会被视为普通的%p。 + +符号/函数指针 +------------- + +:: + + %pS versatile_init+0x0/0x110 + %ps versatile_init + %pSR versatile_init+0x9/0x110 + (with __builtin_extract_return_addr() translation) + %pB prev_fn_of_versatile_init+0x88/0x88 + + +``S`` 和 ``s`` 占位符用于打印符号格式的指针。它们的结果是符号名称带有(S)或不带有(s)偏移 +量。如果禁用KALLSYMS,则打印符号地址。 + +``B`` 占位符的结果是带有偏移量的符号名,在打印堆栈回溯时应该使用。占位符将考虑编译器优化 +的影响,当使用尾部调用并使用noreturn GCC属性标记时,可能会发生这种优化。 + +如果指针在一个模块内,模块名称和可选的构建ID将被打印在符号名称之后,并在说明符的末尾添加 +一个额外的 ``b`` 。 + +:: + + %pS versatile_init+0x0/0x110 [module_name] + %pSb versatile_init+0x0/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] + %pSRb versatile_init+0x9/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] + (with __builtin_extract_return_addr() translation) + %pBb prev_fn_of_versatile_init+0x88/0x88 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] + +来自BPF / tracing追踪的探查指针 +---------------------------------- + +:: + + %pks kernel string + %pus user string + +``k`` 和 ``u`` 指定符用于打印来自内核内存(k)或用户内存(u)的先前探测的内存。后面的 ``s`` 指 +定符的结果是打印一个字符串。对于直接在常规的vsnprintf()中使用时,(k)和(u)注释被忽略,但是,当 +在BPF的bpf_trace_printk()之外使用时,它会读取它所指向的内存,不会出现错误。 + +内核指针 +-------- + +:: + + %pK 01234567 or 0123456789abcdef + +用于打印应该对非特权用户隐藏的内核指针。%pK的行为取决于kptr_restrict sysctl——详见 +Documentation/admin-guide/sysctl/kernel.rst。 + +未经修改的地址 +-------------- + +:: + + %px 01234567 or 0123456789abcdef + +对于打印指针,当你 *真的* 想打印地址时。在用%px打印指针之前,请考虑你是否泄露了内核内 +存布局的敏感消息。%px在功能上等同于%lx(或%lu)。%px是首选,因为它在grep查找时更唯一。 +如果将来我们需要修改内核处理打印指针的方式,我们将能更好地找到调用点。 + +在使用%px之前,请考虑使用%p并在调试过程中启用' ' no_hash_pointer ' '内核参数是否足 +够(参见上面的%p描述)。%px的一个有效场景可能是在panic发生之前立即打印消息,这样无论如何 +都可以防止任何敏感消息被利用,使用%px就不需要用no_hash_pointer来重现panic。 + +指针差异 +-------- + +:: + + %td 2560 + %tx a00 + +为了打印指针的差异,使用ptrdiff_t的%t修饰符。 + +例如:: + + printk("test: difference between pointers: %td\n", ptr2 - ptr1); + +结构体资源(Resources) +----------------------- + +:: + + %pr [mem 0x60000000-0x6fffffff flags 0x2200] or + [mem 0x0000000060000000-0x000000006fffffff flags 0x2200] + %pR [mem 0x60000000-0x6fffffff pref] or + [mem 0x0000000060000000-0x000000006fffffff pref] + +用于打印结构体资源。 ``R`` 和 ``r`` 占位符的结果是打印出的资源带有(R)或不带有(r)解码标志 +成员。 + +通过引用传递。 + +物理地址类型 phys_addr_t +------------------------ + +:: + + %pa[p] 0x01234567 or 0x0123456789abcdef + +用于打印phys_addr_t类型(以及它的衍生物,如resource_size_t),该类型可以根据构建选项而 +变化,无论CPU数据真实物理地址宽度如何。 + +通过引用传递。 + +DMA地址类型dma_addr_t +--------------------- + +:: + + %pad 0x01234567 or 0x0123456789abcdef + +用于打印dma_addr_t类型,该类型可以根据构建选项而变化,而不考虑CPU数据路径的宽度。 + +通过引用传递。 + +原始缓冲区为转义字符串 +---------------------- + +:: + + %*pE[achnops] + +用于将原始缓冲区打印成转义字符串。对于以下缓冲区:: + + 1b 62 20 5c 43 07 22 90 0d 5d + +几个例子展示了如何进行转换(不包括两端的引号)。:: + + %*pE "\eb \C\a"\220\r]" + %*pEhp "\x1bb \C\x07"\x90\x0d]" + %*pEa "\e\142\040\\\103\a\042\220\r\135" + +转换规则是根据可选的标志组合来应用的(详见:c:func:`string_escape_mem` 内核文档): + + - a - ESCAPE_ANY + - c - ESCAPE_SPECIAL + - h - ESCAPE_HEX + - n - ESCAPE_NULL + - o - ESCAPE_OCTAL + - p - ESCAPE_NP + - s - ESCAPE_SPACE + +默认情况下,使用 ESCAPE_ANY_NP。 + +ESCAPE_ANY_NP是许多情况下的明智选择,特别是对于打印SSID。 + +如果字段宽度被省略,那么将只转义1个字节。 + +原始缓冲区为十六进制字符串 +-------------------------- + +:: + + %*ph 00 01 02 ... 3f + %*phC 00:01:02: ... :3f + %*phD 00-01-02- ... -3f + %*phN 000102 ... 3f + +对于打印小的缓冲区(最长64个字节),可以用一定的分隔符作为一个 +十六进制字符串。对于较大的缓冲区,可以考虑使用 +:c:func:`print_hex_dump` 。 + +MAC/FDDI地址 +------------ + +:: + + %pM 00:01:02:03:04:05 + %pMR 05:04:03:02:01:00 + %pMF 00-01-02-03-04-05 + %pm 000102030405 + %pmR 050403020100 + +用于打印以十六进制表示的6字节MAC/FDDI地址。 ``M`` 和 ``m`` 占位符导致打印的 +地址有(M)或没有(m)字节分隔符。默认的字节分隔符是冒号(:)。 + +对于FDDI地址,可以在 ``M`` 占位符之后使用 ``F`` 说明,以使用破折号(——)分隔符 +代替默认的分隔符。 + +对于蓝牙地址, ``R`` 占位符应使用在 ``M`` 占位符之后,以使用反转的字节顺序,适 +合于以小尾端顺序的蓝牙地址的肉眼可见的解析。 + +通过引用传递。 + +IPv4地址 +-------- + +:: + + %pI4 1.2.3.4 + %pi4 001.002.003.004 + %p[Ii]4[hnbl] + +用于打印IPv4点分隔的十进制地址。 ``I4`` 和 ``i4`` 占位符的结果是打印的地址 +有(i4)或没有(I4)前导零。 + +附加的 ``h`` 、 ``n`` 、 ``b`` 和 ``l`` 占位符分别用于指定主机、网络、大 +尾端或小尾端地址。如果没有提供占位符,则使用默认的网络/大尾端顺序。 + +通过引用传递。 + +IPv6 地址 +--------- + +:: + + %pI6 0001:0002:0003:0004:0005:0006:0007:0008 + %pi6 00010002000300040005000600070008 + %pI6c 1:2:3:4:5:6:7:8 + +用于打印IPv6网络顺序的16位十六进制地址。 ``I6`` 和 ``i6`` 占位符的结果是 +打印的地址有(I6)或没有(i6)分号。始终使用前导零。 + +额外的 ``c`` 占位符可与 ``I`` 占位符一起使用,以打印压缩的IPv6地址,如 +https://tools.ietf.org/html/rfc5952 所述 + +通过引用传递。 + +IPv4/IPv6地址(generic, with port, flowinfo, scope) +-------------------------------------------------- + +:: + + %pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008 + %piS 001.002.003.004 or 00010002000300040005000600070008 + %pISc 1.2.3.4 or 1:2:3:4:5:6:7:8 + %pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345 + %p[Ii]S[pfschnbl] + +用于打印一个IP地址,不需要区分它的类型是AF_INET还是AF_INET6。一个指向有效结构 +体sockaddr的指针,通过 ``IS`` 或 ``IS`` 指定,可以传递给这个格式占位符。 + +附加的 ``p`` 、 ``f`` 和 ``s`` 占位符用于指定port(IPv4, IPv6)、 +flowinfo (IPv6)和sope(IPv6)。port有一个 ``:`` 前缀,flowinfo是 ``/`` 和 +范围是 ``%`` ,每个后面都跟着实际的值。 + +对于IPv6地址,如果指定了额外的指定符 ``c`` ,则使用 +https://tools.ietf.org/html/rfc5952 描述的压缩IPv6地址。 +如https://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07 +所建议的,IPv6地址由'[',']'包围,以防止出现额外的占位符 ``p`` , ``f`` 或 ``s`` 。 + +对于IPv4地址,也可以使用额外的 ``h`` , ``n`` , ``b`` 和 ``l`` 说 +明符,但对于IPv6地址则忽略。 + +通过引用传递。 + +更多例子:: + + %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789 + %pISsc 1.2.3.4 or [1:2:3:4:5:6:7:8]%1234567890 + %pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789 + +UUID/GUID地址 +------------- + +:: + + %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f + %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F + %pUl 03020100-0504-0706-0809-0a0b0c0e0e0f + %pUL 03020100-0504-0706-0809-0A0B0C0E0E0F + +用于打印16字节的UUID/GUIDs地址。附加的 ``l`` , ``L`` , ``b`` 和 ``B`` 占位符用 +于指定小写(l)或大写(L)十六进制表示法中的小尾端顺序,以及小写(b)或大写(B)十六进制表 +示法中的大尾端顺序。 + +如果没有使用额外的占位符,则将打印带有小写十六进制表示法的默认大端顺序。 + +通过引用传递。 + +目录项(dentry)的名称 +---------------------- + +:: + + %pd{,2,3,4} + %pD{,2,3,4} + +用于打印dentry名称;如果我们用 :c:func:`d_move` 和它比较,名称可能是新旧混合的,但 +不会oops。 %pd dentry比较安全,其相当于我们以前用的%s dentry->d_name.name,%pd<n>打 +印 ``n`` 最后的组件。 %pD对结构文件做同样的事情。 + + +通过引用传递。 + +块设备(block_device)名称 +-------------------------- + +:: + + %pg sda, sda1 or loop0p1 + +用于打印block_device指针的名称。 + +va_format结构体 +--------------- + +:: + + %pV + +用于打印结构体va_format。这些结构包含一个格式字符串 +和va_list如下 + +:: + + struct va_format { + const char *fmt; + va_list *va; + }; + +实现 "递归vsnprintf"。 + +如果没有一些机制来验证格式字符串和va_list参数的正确性,请不要使用这个功能。 + +通过引用传递。 + +设备树节点 +---------- + +:: + + %pOF[fnpPcCF] + + +用于打印设备树节点结构。默认行为相当于%pOFf。 + + - f - 设备节点全称 + - n - 设备节点名 + - p - 设备节点句柄 + - P - 设备节点路径规范(名称+@单位) + - F - 设备节点标志 + - c - 主要兼容字符串 + - C - 全兼容字符串 + +当使用多个参数时,分隔符是':'。 + +例如 + +:: + + %pOF /foo/bar@0 - Node full name + %pOFf /foo/bar@0 - Same as above + %pOFfp /foo/bar@0:10 - Node full name + phandle + %pOFfcF /foo/bar@0:foo,device:--P- - Node full name + + major compatible string + + node flags + D - dynamic + d - detached + P - Populated + B - Populated bus + +通过引用传递。 + +Fwnode handles +-------------- + +:: + + %pfw[fP] + +用于打印fwnode_handles的消息。默认情况下是打印完整的节点名称,包括路径。 +这些修饰符在功能上等同于上面的%pOF。 + + - f - 节点的全名,包括路径。 + - P - 节点名称,包括地址(如果有的话)。 + +例如 (ACPI) + +:: + + %pfwf \_SB.PCI0.CIO2.port@1.endpoint@0 - Full node name + %pfwP endpoint@0 - Node name + +例如 (OF) + +:: + + %pfwf /ocp@68000000/i2c@48072000/camera@10/port/endpoint - Full name + %pfwP endpoint - Node name + +时间和日期 +---------- + +:: + + %pt[RT] YYYY-mm-ddTHH:MM:SS + %pt[RT]s YYYY-mm-dd HH:MM:SS + %pt[RT]d YYYY-mm-dd + %pt[RT]t HH:MM:SS + %pt[RT][dt][r][s] + +用于打印日期和时间:: + + R struct rtc_time structure + T time64_t type + +以我们(人类)可读的格式。 + +默认情况下,年将以1900为单位递增,月将以1为单位递增。 使用%pt[RT]r (raw) +来抑制这种行为。 + +%pt[RT]s(空格)将覆盖ISO 8601的分隔符,在日期和时间之间使用''(空格)而 +不是'T'(大写T)。当日期或时间被省略时,它不会有任何影响。 + +通过引用传递。 + +clk结构体 +--------- + +:: + + %pC pll1 + %pCn pll1 + +用于打印clk结构。%pC 和 %pCn 打印时钟的名称(通用时钟框架)或唯一的32位 +ID(传统时钟框架)。 + +通过引用传递。 + +位图及其衍生物,如cpumask和nodemask +----------------------------------- + +:: + + %*pb 0779 + %*pbl 0,3-6,8-10 + +对于打印位图(bitmap)及其派生的cpumask和nodemask,%*pb输出以字段宽度为位数的位图, +%*pbl输出以字段宽度为位数的范围列表。 + +字段宽度用值传递,位图用引用传递。可以使用辅助宏cpumask_pr_args()和 +nodemask_pr_args()来方便打印cpumask和nodemask。 + +标志位字段,如页标志、gfp_flags +------------------------------- + +:: + + %pGp referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff + %pGg GFP_USER|GFP_DMA32|GFP_NOWARN + %pGv read|exec|mayread|maywrite|mayexec|denywrite + +将flags位字段打印为构造值的符号常量集合。标志的类型由第三个字符给出。目前支持的 +是[p]age flags, [v]ma_flags(都期望 ``unsigned long *`` )和 +[g]fp_flags(期望 ``gfp_t *`` )。标志名称和打印顺序取决于特定的类型。 + +注意,这种格式不应该直接用于跟踪点的:c:func:`TP_printk()` 部分。相反,应使 +用 <trace/events/mmflags.h>中的show_*_flags()函数。 + +通过引用传递。 + +网络设备特性 +------------ + +:: + + %pNF 0x000000000000c000 + +用于打印netdev_features_t。 + +通过引用传递。 + +V4L2和DRM FourCC代码(像素格式) +------------------------------ + +:: + + %p4cc + +打印V4L2或DRM使用的FourCC代码,包括格式端序及其十六进制的数值。 + +通过引用传递。 + +例如:: + + %p4cc BG12 little-endian (0x32314742) + %p4cc Y10 little-endian (0x20303159) + %p4cc NV12 big-endian (0xb231564e) + +谢谢 +==== + +如果您添加了其他%p扩展,请在可行的情况下,用一个或多个测试用例扩展<lib/test_printf.c>。 + +谢谢你的合作和关注。 diff --git a/Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst b/Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst new file mode 100644 index 000000000000..ea834e38d2f6 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst @@ -0,0 +1,154 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/refcount-vs-atomic.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_refcount-vs-atomic: + + +======================================= +与atomic_t相比,refcount_t的API是这样的 +======================================= + +.. contents:: :local: + +简介 +==== + +refcount_t API的目标是为实现对象的引用计数器提供一个最小的API。虽然来自 +lib/refcount.c的独立于架构的通用实现在下面使用了原子操作,但一些 ``refcount_*()`` +和 ``atomic_*()`` 函数在内存顺序保证方面有很多不同。本文档概述了这些差异,并 +提供了相应的例子,以帮助开发者根据这些内存顺序保证的变化来验证他们的代码。 + +本文档中使用的术语尽量遵循tools/memory-model/Documentation/explanation.txt +中定义的正式LKMM。 + +memory-barriers.txt和atomic_t.txt提供了更多关于内存顺序的背景,包括通用的 +和针对原子操作的。 + +内存顺序的相关类型 +================== + +.. note:: 下面的部分只涵盖了本文使用的与原子操作和引用计数器有关的一些内存顺 + 序类型。如果想了解更广泛的情况,请查阅memory-barriers.txt文件。 + +在没有任何内存顺序保证的情况下(即完全无序),atomics和refcounters只提供原 +子性和程序顺序(program order, po)关系(在同一个CPU上)。它保证每个 +``atomic_* ()`` 和 ``refcount_*()`` 操作都是原子性的,指令在单个CPU上按程序 +顺序执行。这是用READ_ONCE()/WRITE_ONCE()和比较并交换原语实现的。 + +强(完全)内存顺序保证在同一CPU上的所有较早加载和存储的指令(所有程序顺序较早 +[po-earlier]指令)在执行任何程序顺序较后指令(po-later)之前完成。它还保证 +同一CPU上储存的程序优先较早的指令和来自其他CPU传播的指令必须在该CPU执行任何 +程序顺序较后指令之前传播到其他CPU(A-累积属性)。这是用smp_mb()实现的。 + +RELEASE内存顺序保证了在同一CPU上所有较早加载和存储的指令(所有程序顺序较早 +指令)在此操作前完成。它还保证同一CPU上储存的程序优先较早的指令和来自其他CPU +传播的指令必须在释放(release)操作之前传播到所有其他CPU(A-累积属性)。这是用 +smp_store_release()实现的。 + +ACQUIRE内存顺序保证了同一CPU上的所有后加载和存储的指令(所有程序顺序较后 +指令)在获取(acquire)操作之后完成。它还保证在获取操作执行后,同一CPU上 +储存的所有程序顺序较后指令必须传播到所有其他CPU。这是用 +smp_acquire__after_ctrl_dep()实现的。 + +对Refcounters的控制依赖(取决于成功)保证了如果一个对象的引用被成功获得(引用计数 +器的增量或增加行为发生了,函数返回true),那么进一步的存储是针对这个操作的命令。对存 +储的控制依赖没有使用任何明确的屏障来实现,而是依赖于CPU不对存储进行猜测。这只是 +一个单一的CPU关系,对其他CPU不提供任何保证。 + + +函数的比较 +========== + +情况1) - 非 “读/修改/写”(RMW)操作 +------------------------------------ + +函数变化: + + * atomic_set() --> refcount_set() + * atomic_read() --> refcount_read() + +内存顺序保证变化: + + * none (两者都是完全无序的) + + +情况2) - 基于增量的操作,不返回任何值 +-------------------------------------- + +函数变化: + + * atomic_inc() --> refcount_inc() + * atomic_add() --> refcount_add() + +内存顺序保证变化: + + * none (两者都是完全无序的) + +情况3) - 基于递减的RMW操作,没有返回值 +--------------------------------------- + +函数变化: + + * atomic_dec() --> refcount_dec() + +内存顺序保证变化: + + * 完全无序的 --> RELEASE顺序 + + +情况4) - 基于增量的RMW操作,返回一个值 +--------------------------------------- + +函数变化: + + * atomic_inc_not_zero() --> refcount_inc_not_zero() + * 无原子性对应函数 --> refcount_add_not_zero() + +内存顺序保证变化: + + * 完全有序的 --> 控制依赖于存储的成功 + +.. note:: 此处 **假设** 了,必要的顺序是作为获得对象指针的结果而提供的。 + + +情况 5) - 基于Dec/Sub递减的通用RMW操作,返回一个值 +--------------------------------------------------- + +函数变化: + + * atomic_dec_and_test() --> refcount_dec_and_test() + * atomic_sub_and_test() --> refcount_sub_and_test() + +内存顺序保证变化: + + * 完全有序的 --> RELEASE顺序 + 成功后ACQUIRE顺序 + + +情况6)其他基于递减的RMW操作,返回一个值 +---------------------------------------- + +函数变化: + + * 无原子性对应函数 --> refcount_dec_if_one() + * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)`` + +内存顺序保证变化: + + * 完全有序的 --> RELEASE顺序 + 控制依赖 + +.. note:: atomic_add_unless()只在执行成功时提供完整的顺序。 + + +情况7)--基于锁的RMW +-------------------- + +函数变化: + + * atomic_dec_and_lock() --> refcount_dec_and_lock() + * atomic_dec_and_mutex_lock() --> refcount_dec_and_mutex_lock() + +内存顺序保证变化: + + * 完全有序 --> RELEASE顺序 + 控制依赖 + 持有 diff --git a/Documentation/translations/zh_CN/core-api/symbol-namespaces.rst b/Documentation/translations/zh_CN/core-api/symbol-namespaces.rst new file mode 100644 index 000000000000..ce05c29c7697 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/symbol-namespaces.rst @@ -0,0 +1,142 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/symbol-namespaces.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_symbol-namespaces.rst: + + +================================= +符号命名空间(Symbol Namespaces) +================================= + +本文档描述了如何使用符号命名空间来构造通过EXPORT_SYMBOL()系列宏导出的内核内符号的导出面。 + +.. 目录 + + === 1 简介 + === 2 如何定义符号命名空间 + --- 2.1 使用EXPORT_SYMBOL宏 + --- 2.2 使用DEFAULT_SYMBOL_NAMESPACE定义 + === 3 如何使用命名空间中导出的符号 + === 4 加载使用命名空间符号的模块 + === 5 自动创建MODULE_IMPORT_NS声明 + +1. 简介 +======= + +符号命名空间已经被引入,作为构造内核内API的导出面的一种手段。它允许子系统维护者将 +他们导出的符号划分进独立的命名空间。这对于文档的编写非常有用(想想SUBSYSTEM_DEBUG +命名空间),也可以限制一组符号在内核其他部分的使用。今后,使用导出到命名空间的符号 +的模块必须导入命名空间。否则,内核将根据其配置,拒绝加载该模块或警告说缺少 +导入。 + +2. 如何定义符号命名空间 +======================= + +符号可以用不同的方法导出到命名空间。所有这些都在改变 EXPORT_SYMBOL 和与之类似的那些宏 +被检测到的方式,以创建 ksymtab 条目。 + +2.1 使用EXPORT_SYMBOL宏 +======================= + +除了允许将内核符号导出到内核符号表的宏EXPORT_SYMBOL()和EXPORT_SYMBOL_GPL()之外, +这些宏的变体还可以将符号导出到某个命名空间:EXPORT_SYMBOL_NS() 和 EXPORT_SYMBOL_NS_GPL()。 +它们需要一个额外的参数:命名空间(the namespace)。请注意,由于宏扩展,该参数需 +要是一个预处理器符号。例如,要把符号 ``usb_stor_suspend`` 导出到命名空间 ``USB_STORAGE``, +请使用:: + + EXPORT_SYMBOL_NS(usb_stor_suspend, USB_STORAGE); + +相应的 ksymtab 条目结构体 ``kernel_symbol`` 将有相应的成员 ``命名空间`` 集。 +导出时未指明命名空间的符号将指向 ``NULL`` 。如果没有定义命名空间,则默认没有。 +``modpost`` 和kernel/module.c分别在构建时或模块加载时使用名称空间。 + +2.2 使用DEFAULT_SYMBOL_NAMESPACE定义 +==================================== + +为一个子系统的所有符号定义命名空间可能会非常冗长,并可能变得难以维护。因此,我 +们提供了一个默认定义(DEFAULT_SYMBOL_NAMESPACE),如果设置了这个定义, 它将成 +为所有没有指定命名空间的 EXPORT_SYMBOL() 和 EXPORT_SYMBOL_GPL() 宏扩展的默认 +定义。 + +有多种方法来指定这个定义,使用哪种方法取决于子系统和维护者的喜好。第一种方法是在 +子系统的 ``Makefile`` 中定义默认命名空间。例如,如果要将usb-common中定义的所有符号导 +出到USB_COMMON命名空间,可以在drivers/usb/common/Makefile中添加这样一行:: + + ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=USB_COMMON + +这将影响所有 EXPORT_SYMBOL() 和 EXPORT_SYMBOL_GPL() 语句。当这个定义存在时, +用EXPORT_SYMBOL_NS()导出的符号仍然会被导出到作为命名空间参数传递的命名空间中, +因为这个参数优先于默认的符号命名空间。 + +定义默认命名空间的第二个选项是直接在编译单元中作为预处理声明。上面的例子就会变 +成:: + + #undef DEFAULT_SYMBOL_NAMESPACE + #define DEFAULT_SYMBOL_NAMESPACE USB_COMMON + +应置于相关编译单元中任何 EXPORT_SYMBOL 宏之前 + +3. 如何使用命名空间中导出的符号 +=============================== + +为了使用被导出到命名空间的符号,内核模块需要明确地导入这些命名空间。 +否则内核可能会拒绝加载该模块。模块代码需要使用宏MODULE_IMPORT_NS来 +表示它所使用的命名空间的符号。例如,一个使用usb_stor_suspend符号的 +模块,需要使用如下语句导入命名空间USB_STORAGE:: + + MODULE_IMPORT_NS(USB_STORAGE); + +这将在模块中为每个导入的命名空间创建一个 ``modinfo`` 标签。这也顺带 +使得可以用modinfo检查模块已导入的命名空间:: + + $ modinfo drivers/usb/storage/ums-karma.ko + [...] + import_ns: USB_STORAGE + [...] + + +建议将 MODULE_IMPORT_NS() 语句添加到靠近其他模块元数据定义的地方, +如 MODULE_AUTHOR() 或 MODULE_LICENSE() 。关于自动创建缺失的导入 +语句的方法,请参考第5节。 + +4. 加载使用命名空间符号的模块 +============================= + +在模块加载时(比如 ``insmod`` ),内核将检查每个从模块中引用的符号是否可 +用,以及它可能被导出到的名字空间是否被模块导入。内核的默认行为是拒绝 +加载那些没有指明足以导入的模块。此错误会被记录下来,并且加载将以 +EINVAL方式失败。要允许加载不满足这个前提条件的模块,可以使用此配置选项: +设置 MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS=y 将使加载不受影响,但会 +发出警告。 + +5. 自动创建MODULE_IMPORT_NS声明 +=============================== + +缺少命名空间的导入可以在构建时很容易被检测到。事实上,如果一个模块 +使用了一个命名空间的符号而没有导入它,modpost会发出警告。 +MODULE_IMPORT_NS()语句通常会被添加到一个明确的位置(和其他模块元 +数据一起)。为了使模块作者(和子系统维护者)的生活更加轻松,我们提 +供了一个脚本和make目标来修复丢失的导入。修复丢失的导入可以用:: + + $ make nsdeps + +对模块作者来说,以下情况可能很典型:: + + - 编写依赖未导入命名空间的符号的代码 + - ``make`` + - 注意 ``modpost`` 的警告,提醒你有一个丢失的导入。 + - 运行 ``make nsdeps``将导入添加到正确的代码位置。 + +对于引入命名空间的子系统维护者来说,其步骤非常相似。同样,make nsdeps最终将 +为树内模块添加缺失的命名空间导入:: + + - 向命名空间转移或添加符号(例如,使用EXPORT_SYMBOL_NS())。 + - `make e`(最好是用allmodconfig来覆盖所有的内核模块)。 + - 注意 ``modpost`` 的警告,提醒你有一个丢失的导入。 + - 运行 ``maknsdeps``将导入添加到正确的代码位置。 + +你也可以为外部模块的构建运行nsdeps。典型的用法是:: + + $ make -C <path_to_kernel_src> M=$PWD nsdeps diff --git a/Documentation/translations/zh_CN/core-api/workqueue.rst b/Documentation/translations/zh_CN/core-api/workqueue.rst new file mode 100644 index 000000000000..0b8f730db6c0 --- /dev/null +++ b/Documentation/translations/zh_CN/core-api/workqueue.rst @@ -0,0 +1,337 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/core-api/workqueue.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_workqueue.rst: + + +========================= +并发管理的工作队列 (cmwq) +========================= + +:日期: September, 2010 +:作者: Tejun Heo <tj@kernel.org> +:作者: Florian Mickler <florian@mickler.org> + + +简介 +==== + +在很多情况下,需要一个异步进程的执行环境,工作队列(wq)API是这种情况下 +最常用的机制。 + +当需要这样一个异步执行上下文时,一个描述将要执行的函数的工作项(work, +即一个待执行的任务)被放在队列中。一个独立的线程作为异步执行环境。该队 +列被称为workqueue,线程被称为工作者(worker,即执行这一队列的线程)。 + +当工作队列上有工作项时,工作者会一个接一个地执行与工作项相关的函数。当 +工作队列中没有任何工作项时,工作者就会变得空闲。当一个新的工作项被排入 +队列时,工作者又开始执行。 + + +为什么要cmwq? +============= + +在最初的wq实现中,多线程(MT)wq在每个CPU上有一个工作者线程,而单线程 +(ST)wq在全系统有一个工作者线程。一个MT wq需要保持与CPU数量相同的工 +作者数量。这些年来,内核增加了很多MT wq的用户,随着CPU核心数量的不断 +增加,一些系统刚启动就达到了默认的32k PID的饱和空间。 + +尽管MT wq浪费了大量的资源,但所提供的并发性水平却不能令人满意。这个限 +制在ST和MT wq中都有,只是在MT中没有那么严重。每个wq都保持着自己独立的 +工作者池。一个MT wq只能为每个CPU提供一个执行环境,而一个ST wq则为整个 +系统提供一个。工作项必须竞争这些非常有限的执行上下文,从而导致各种问题, +包括在单一执行上下文周围容易发生死锁。 + +(MT wq)所提供的并发性水平和资源使用之间的矛盾也迫使其用户做出不必要的权衡,比 +如libata选择使用ST wq来轮询PIO,并接受一个不必要的限制,即没有两个轮 +询PIO可以同时进行。由于MT wq并没有提供更好的并发性,需要更高层次的并 +发性的用户,如async或fscache,不得不实现他们自己的线程池。 + +并发管理工作队列(cmwq)是对wq的重新实现,重点是以下目标。 + +* 保持与原始工作队列API的兼容性。 + +* 使用由所有wq共享的每CPU统一的工作者池,在不浪费大量资源的情况下按 +* 需提供灵活的并发水平。 + +* 自动调节工作者池和并发水平,使API用户不需要担心这些细节。 + + +设计 +==== + +为了简化函数的异步执行,引入了一个新的抽象概念,即工作项。 + +一个工作项是一个简单的结构,它持有一个指向将被异步执行的函数的指针。 +每当一个驱动程序或子系统希望一个函数被异步执行时,它必须建立一个指 +向该函数的工作项,并在工作队列中排队等待该工作项。(就是挂到workqueue +队列里面去) + +特定目的线程,称为工作线程(工作者),一个接一个地执行队列中的功能。 +如果没有工作项排队,工作者线程就会闲置。这些工作者线程被管理在所谓 +的工作者池中。 + +cmwq设计区分了面向用户的工作队列,子系统和驱动程序在上面排队工作, +以及管理工作者池和处理排队工作项的后端机制。 + +每个可能的CPU都有两个工作者池,一个用于正常的工作项,另一个用于高 +优先级的工作项,还有一些额外的工作者池,用于服务未绑定工作队列的工 +作项目——这些后备池的数量是动态的。 + +当他们认为合适的时候,子系统和驱动程序可以通过特殊的 +``workqueue API`` 函数创建和排队工作项。他们可以通过在工作队列上 +设置标志来影响工作项执行方式的某些方面,他们把工作项放在那里。这些 +标志包括诸如CPU定位、并发限制、优先级等等。要获得详细的概述,请参 +考下面的 ``alloc_workqueue()`` 的 API 描述。 + +当一个工作项被排入一个工作队列时,目标工作池将根据队列参数和工作队 +列属性确定,并被附加到工作池的共享工作列表上。例如,除非特别重写, +否则一个绑定的工作队列的工作项将被排在与发起线程运行的CPU相关的普 +通或高级工作工作者池的工作项列表中。 + +对于任何工作者池的实施,管理并发水平(有多少执行上下文处于活动状 +态)是一个重要问题。最低水平是为了节省资源,而饱和水平是指系统被 +充分使用。 + +每个与实际CPU绑定的worker-pool通过钩住调度器来实现并发管理。每当 +一个活动的工作者被唤醒或睡眠时,工作者池就会得到通知,并跟踪当前可 +运行的工作者的数量。一般来说,工作项不会占用CPU并消耗很多周期。这 +意味着保持足够的并发性以防止工作处理停滞应该是最优的。只要CPU上有 +一个或多个可运行的工作者,工作者池就不会开始执行新的工作,但是,当 +最后一个运行的工作者进入睡眠状态时,它会立即安排一个新的工作者,这 +样CPU就不会在有待处理的工作项目时闲置。这允许在不损失执行带宽的情 +况下使用最少的工作者。 + +除了kthreads的内存空间外,保留空闲的工作者并没有其他成本,所以cmwq +在杀死它们之前会保留一段时间的空闲。 + +对于非绑定的工作队列,后备池的数量是动态的。可以使用 +``apply_workqueue_attrs()`` 为非绑定工作队列分配自定义属性, +workqueue将自动创建与属性相匹配的后备工作者池。调节并发水平的责任在 +用户身上。也有一个标志可以将绑定的wq标记为忽略并发管理。 +详情请参考API部分。 + +前进进度的保证依赖于当需要更多的执行上下文时可以创建工作者,这也是 +通过使用救援工作者来保证的。所有可能在处理内存回收的代码路径上使用 +的工作项都需要在wq上排队,wq上保留了一个救援工作者,以便在内存有压 +力的情况下下执行。否则,工作者池就有可能出现死锁,等待执行上下文释 +放出来。 + + +应用程序编程接口 (API) +====================== + +``alloc_workqueue()`` 分配了一个wq。原来的 ``create_*workqueue()`` +函数已被废弃,并计划删除。 ``alloc_workqueue()`` 需要三个 +参数 - ``@name`` , ``@flags`` 和 ``@max_active`` 。 +``@name`` 是wq的名称,如果有的话,也用作救援线程的名称。 + +一个wq不再管理执行资源,而是作为前进进度保证、刷新(flush)和 +工作项属性的域。 ``@flags`` 和 ``@max_active`` 控制着工作 +项如何被分配执行资源、安排和执行。 + + +``flags`` +--------- + +``WQ_UNBOUND`` + 排队到非绑定wq的工作项由特殊的工作者池提供服务,这些工作者不 + 绑定在任何特定的CPU上。这使得wq表现得像一个简单的执行环境提 + 供者,没有并发管理。非绑定工作者池试图尽快开始执行工作项。非 + 绑定的wq牺牲了局部性,但在以下情况下是有用的。 + + * 预计并发水平要求会有很大的波动,使用绑定的wq最终可能会在不 + 同的CPU上产生大量大部分未使用的工作者,因为发起线程在不同 + 的CPU上跳转。 + + * 长期运行的CPU密集型工作负载,可以由系统调度器更好地管理。 + +``WQ_FREEZABLE`` + 一个可冻结的wq参与了系统暂停操作的冻结阶段。wq上的工作项被 + 排空,在解冻之前没有新的工作项开始执行。 + +``WQ_MEM_RECLAIM`` + 所有可能在内存回收路径中使用的wq都必须设置这个标志。无论内 + 存压力如何,wq都能保证至少有一个执行上下文。 + +``WQ_HIGHPRI`` + 高优先级wq的工作项目被排到目标cpu的高优先级工作者池中。高 + 优先级的工作者池由具有较高级别的工作者线程提供服务。 + + 请注意,普通工作者池和高优先级工作者池之间并不相互影响。他 + 们各自维护其独立的工作者池,并在其工作者之间实现并发管理。 + +``WQ_CPU_INTENSIVE`` + CPU密集型wq的工作项对并发水平没有贡献。换句话说,可运行的 + CPU密集型工作项不会阻止同一工作者池中的其他工作项开始执行。 + 这对于那些预计会占用CPU周期的绑定工作项很有用,这样它们的 + 执行就会受到系统调度器的监管。 + + 尽管CPU密集型工作项不会对并发水平做出贡献,但它们的执行开 + 始仍然受到并发管理的管制,可运行的非CPU密集型工作项会延迟 + CPU密集型工作项的执行。 + + 这个标志对于未绑定的wq来说是没有意义的。 + +请注意,标志 ``WQ_NON_REENTRANT`` 不再存在,因为现在所有的工作 +队列都是不可逆的——任何工作项都保证在任何时间内最多被整个系统的一 +个工作者执行。 + + +``max_active`` +-------------- + +``@max_active`` 决定了每个CPU可以分配给wq的工作项的最大执行上 +下文数量。例如,如果 ``@max_active为16`` ,每个CPU最多可以同 +时执行16个wq的工作项。 + +目前,对于一个绑定的wq, ``@max_active`` 的最大限制是512,当指 +定为0时使用的默认值是256。对于非绑定的wq,其限制是512和 +4 * ``num_possible_cpus()`` 中的较高值。这些值被选得足够高,所 +以它们不是限制性因素,同时会在失控情况下提供保护。 + +一个wq的活动工作项的数量通常由wq的用户来调节,更具体地说,是由用 +户在同一时间可以排列多少个工作项来调节。除非有特定的需求来控制活动 +工作项的数量,否则建议指定 为"0"。 + +一些用户依赖于ST wq的严格执行顺序。 ``@max_active`` 为1和 ``WQ_UNBOUND`` +的组合用来实现这种行为。这种wq上的工作项目总是被排到未绑定的工作池 +中,并且在任何时候都只有一个工作项目处于活动状态,从而实现与ST wq相 +同的排序属性。 + +在目前的实现中,上述配置只保证了特定NUMA节点内的ST行为。相反, +``alloc_ordered_queue()`` 应该被用来实现全系统的ST行为。 + + +执行场景示例 +============ + +下面的示例执行场景试图说明cmwq在不同配置下的行为。 + + 工作项w0、w1、w2被排到同一个CPU上的一个绑定的wq q0上。w0 + 消耗CPU 5ms,然后睡眠10ms,然后在完成之前再次消耗CPU 5ms。 + +忽略所有其他的任务、工作和处理开销,并假设简单的FIFO调度, +下面是一个高度简化的原始wq的可能事件序列的版本。:: + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 starts and burns CPU + 25 w1 sleeps + 35 w1 wakes up and finishes + 35 w2 starts and burns CPU + 40 w2 sleeps + 50 w2 wakes up and finishes + +And with cmwq with ``@max_active`` >= 3, :: + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 5 w1 starts and burns CPU + 10 w1 sleeps + 10 w2 starts and burns CPU + 15 w2 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 wakes up and finishes + 25 w2 wakes up and finishes + +如果 ``@max_active`` == 2, :: + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 5 w1 starts and burns CPU + 10 w1 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 wakes up and finishes + 20 w2 starts and burns CPU + 25 w2 sleeps + 35 w2 wakes up and finishes + +现在,我们假设w1和w2被排到了不同的wq q1上,这个wq q1 +有 ``WQ_CPU_INTENSIVE`` 设置:: + + TIME IN MSECS EVENT + 0 w0 starts and burns CPU + 5 w0 sleeps + 5 w1 and w2 start and burn CPU + 10 w1 sleeps + 15 w2 sleeps + 15 w0 wakes up and burns CPU + 20 w0 finishes + 20 w1 wakes up and finishes + 25 w2 wakes up and finishes + + +指南 +==== + +* 如果一个wq可能处理在内存回收期间使用的工作项目,请不 + 要忘记使用 ``WQ_MEM_RECLAIM`` 。每个设置了 + ``WQ_MEM_RECLAIM`` 的wq都有一个为其保留的执行环境。 + 如果在内存回收过程中使用的多个工作项之间存在依赖关系, + 它们应该被排在不同的wq中,每个wq都有 ``WQ_MEM_RECLAIM`` 。 + +* 除非需要严格排序,否则没有必要使用ST wq。 + +* 除非有特殊需要,建议使用0作为@max_active。在大多数使用情 + 况下,并发水平通常保持在默认限制之下。 + +* 一个wq作为前进进度保证(WQ_MEM_RECLAIM,冲洗(flush)和工 + 作项属性的域。不涉及内存回收的工作项,不需要作为工作项组的一 + 部分被刷新,也不需要任何特殊属性,可以使用系统中的一个wq。使 + 用专用wq和系统wq在执行特性上没有区别。 + +* 除非工作项预计会消耗大量的CPU周期,否则使用绑定的wq通常是有 + 益的,因为wq操作和工作项执行中的定位水平提高了。 + + +调试 +==== + +因为工作函数是由通用的工作者线程执行的,所以需要一些手段来揭示一些行为不端的工作队列用户。 + +工作者线程在进程列表中显示为: :: + + root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1] + root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2] + root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0] + root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0] + +如果kworkers失控了(使用了太多的cpu),有两类可能的问题: + + 1. 正在迅速调度的事情 + 2. 一个消耗大量cpu周期的工作项。 + +第一个可以用追踪的方式进行跟踪: :: + + $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event + $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt + (wait a few secs) + +如果有什么东西在工作队列上忙着做循环,它就会主导输出,可以用工作项函数确定违规者。 + +对于第二类问题,应该可以只检查违规工作者线程的堆栈跟踪。 :: + + $ cat /proc/THE_OFFENDING_KWORKER/stack + +工作项函数在堆栈追踪中应该是微不足道的。 + + +内核内联文档参考 +================ + +该API在以下内核代码中: + +include/linux/workqueue.h + +kernel/workqueue.c diff --git a/Documentation/translations/zh_CN/dev-tools/index.rst b/Documentation/translations/zh_CN/dev-tools/index.rst index fd73c479917b..e6c99f2f543f 100644 --- a/Documentation/translations/zh_CN/dev-tools/index.rst +++ b/Documentation/translations/zh_CN/dev-tools/index.rst @@ -19,13 +19,13 @@ :maxdepth: 2 gcov + kasan Todolist: - coccinelle - sparse - kcov - - kasan - ubsan - kmemleak - kcsan diff --git a/Documentation/translations/zh_CN/dev-tools/kasan.rst b/Documentation/translations/zh_CN/dev-tools/kasan.rst new file mode 100644 index 000000000000..23db9d419047 --- /dev/null +++ b/Documentation/translations/zh_CN/dev-tools/kasan.rst @@ -0,0 +1,417 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/dev-tools/kasan.rst +:Translator: 万家兵 Wan Jiabing <wanjiabing@vivo.com> + +内核地址消毒剂(KASAN) +===================== + +概述 +---- + +KernelAddressSANitizer(KASAN)是一种动态内存安全错误检测工具,主要功能是 +检查内存越界访问和使用已释放内存的问题。KASAN有三种模式: + +1. 通用KASAN(与用户空间的ASan类似) +2. 基于软件标签的KASAN(与用户空间的HWASan类似) +3. 基于硬件标签的KASAN(基于硬件内存标签) + +由于通用KASAN的内存开销较大,通用KASAN主要用于调试。基于软件标签的KASAN +可用于dogfood测试,因为它具有较低的内存开销,并允许将其用于实际工作量。 +基于硬件标签的KASAN具有较低的内存和性能开销,因此可用于生产。同时可用于 +检测现场内存问题或作为安全缓解措施。 + +软件KASAN模式(#1和#2)使用编译时工具在每次内存访问之前插入有效性检查, +因此需要一个支持它的编译器版本。 + +通用KASAN在GCC和Clang受支持。GCC需要8.3.0或更高版本。任何受支持的Clang +版本都是兼容的,但从Clang 11才开始支持检测全局变量的越界访问。 + +基于软件标签的KASAN模式仅在Clang中受支持。 + +硬件KASAN模式(#3)依赖硬件来执行检查,但仍需要支持内存标签指令的编译器 +版本。GCC 10+和Clang 11+支持此模式。 + +两种软件KASAN模式都适用于SLUB和SLAB内存分配器,而基于硬件标签的KASAN目前 +仅支持SLUB。 + +目前x86_64、arm、arm64、xtensa、s390、riscv架构支持通用KASAN模式,仅 +arm64架构支持基于标签的KASAN模式。 + +用法 +---- + +要启用KASAN,请使用以下命令配置内核:: + + CONFIG_KASAN=y + +同时在 ``CONFIG_KASAN_GENERIC`` (启用通用KASAN模式), ``CONFIG_KASAN_SW_TAGS`` +(启用基于硬件标签的KASAN模式),和 ``CONFIG_KASAN_HW_TAGS`` (启用基于硬件标签 +的KASAN模式)之间进行选择。 + +对于软件模式,还可以在 ``CONFIG_KASAN_OUTLINE`` 和 ``CONFIG_KASAN_INLINE`` +之间进行选择。outline和inline是编译器插桩类型。前者产生较小的二进制文件, +而后者快1.1-2倍。 + +要将受影响的slab对象的alloc和free堆栈跟踪包含到报告中,请启用 +``CONFIG_STACKTRACE`` 。要包括受影响物理页面的分配和释放堆栈跟踪的话, +请启用 ``CONFIG_PAGE_OWNER`` 并使用 ``page_owner=on`` 进行引导。 + +错误报告 +~~~~~~~~ + +典型的KASAN报告如下所示:: + + ================================================================== + BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan] + Write of size 1 at addr ffff8801f44ec37b by task insmod/2760 + + CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 + Call Trace: + dump_stack+0x94/0xd8 + print_address_description+0x73/0x280 + kasan_report+0x144/0x187 + __asan_report_store1_noabort+0x17/0x20 + kmalloc_oob_right+0xa8/0xbc [test_kasan] + kmalloc_tests_init+0x16/0x700 [test_kasan] + do_one_initcall+0xa5/0x3ae + do_init_module+0x1b6/0x547 + load_module+0x75df/0x8070 + __do_sys_init_module+0x1c6/0x200 + __x64_sys_init_module+0x6e/0xb0 + do_syscall_64+0x9f/0x2c0 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + RIP: 0033:0x7f96443109da + RSP: 002b:00007ffcf0b51b08 EFLAGS: 00000202 ORIG_RAX: 00000000000000af + RAX: ffffffffffffffda RBX: 000055dc3ee521a0 RCX: 00007f96443109da + RDX: 00007f96445cff88 RSI: 0000000000057a50 RDI: 00007f9644992000 + RBP: 000055dc3ee510b0 R08: 0000000000000003 R09: 0000000000000000 + R10: 00007f964430cd0a R11: 0000000000000202 R12: 00007f96445cff88 + R13: 000055dc3ee51090 R14: 0000000000000000 R15: 0000000000000000 + + Allocated by task 2760: + save_stack+0x43/0xd0 + kasan_kmalloc+0xa7/0xd0 + kmem_cache_alloc_trace+0xe1/0x1b0 + kmalloc_oob_right+0x56/0xbc [test_kasan] + kmalloc_tests_init+0x16/0x700 [test_kasan] + do_one_initcall+0xa5/0x3ae + do_init_module+0x1b6/0x547 + load_module+0x75df/0x8070 + __do_sys_init_module+0x1c6/0x200 + __x64_sys_init_module+0x6e/0xb0 + do_syscall_64+0x9f/0x2c0 + entry_SYSCALL_64_after_hwframe+0x44/0xa9 + + Freed by task 815: + save_stack+0x43/0xd0 + __kasan_slab_free+0x135/0x190 + kasan_slab_free+0xe/0x10 + kfree+0x93/0x1a0 + umh_complete+0x6a/0xa0 + call_usermodehelper_exec_async+0x4c3/0x640 + ret_from_fork+0x35/0x40 + + The buggy address belongs to the object at ffff8801f44ec300 + which belongs to the cache kmalloc-128 of size 128 + The buggy address is located 123 bytes inside of + 128-byte region [ffff8801f44ec300, ffff8801f44ec380) + The buggy address belongs to the page: + page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0 + flags: 0x200000000000100(slab) + raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640 + raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000 + page dumped because: kasan: bad access detected + + Memory state around the buggy address: + ffff8801f44ec200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb + ffff8801f44ec280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc + >ffff8801f44ec300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 + ^ + ffff8801f44ec380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb + ffff8801f44ec400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc + ================================================================== + +报告标题总结了发生的错误类型以及导致该错误的访问类型。紧随其后的是错误访问的 +堆栈跟踪、所访问内存分配位置的堆栈跟踪(对于访问了slab对象的情况)以及对象 +被释放的位置的堆栈跟踪(对于访问已释放内存的问题报告)。接下来是对访问的 +slab对象的描述以及关于访问的内存页的信息。 + +最后,报告展示了访问地址周围的内存状态。在内部,KASAN单独跟踪每个内存颗粒的 +内存状态,根据KASAN模式分为8或16个对齐字节。报告的内存状态部分中的每个数字 +都显示了围绕访问地址的其中一个内存颗粒的状态。 + +对于通用KASAN,每个内存颗粒的大小为8个字节。每个颗粒的状态被编码在一个影子字节 +中。这8个字节可以是可访问的,部分访问的,已释放的或成为Redzone的一部分。KASAN +对每个影子字节使用以下编码:00表示对应内存区域的所有8个字节都可以访问;数字N +(1 <= N <= 7)表示前N个字节可访问,其他(8 - N)个字节不可访问;任何负值都表示 +无法访问整个8字节。KASAN使用不同的负值来区分不同类型的不可访问内存,如redzones +或已释放的内存(参见 mm/kasan/kasan.h)。 + +在上面的报告中,箭头指向影子字节 ``03`` ,表示访问的地址是部分可访问的。 + +对于基于标签的KASAN模式,报告最后的部分显示了访问地址周围的内存标签 +(参考 `实施细则`_ 章节)。 + +请注意,KASAN错误标题(如 ``slab-out-of-bounds`` 或 ``use-after-free`` ) +是尽量接近的:KASAN根据其拥有的有限信息打印出最可能的错误类型。错误的实际类型 +可能会有所不同。 + +通用KASAN还报告两个辅助调用堆栈跟踪。这些堆栈跟踪指向代码中与对象交互但不直接 +出现在错误访问堆栈跟踪中的位置。目前,这包括 call_rcu() 和排队的工作队列。 + +启动参数 +~~~~~~~~ + +KASAN受通用 ``panic_on_warn`` 命令行参数的影响。启用该功能后,KASAN在打印错误 +报告后会引起内核恐慌。 + +默认情况下,KASAN只为第一次无效内存访问打印错误报告。使用 ``kasan_multi_shot`` , +KASAN会针对每个无效访问打印报告。这有效地禁用了KASAN报告的 ``panic_on_warn`` 。 + +基于硬件标签的KASAN模式(请参阅下面有关各种模式的部分)旨在在生产中用作安全缓解 +措施。因此,它支持允许禁用KASAN或控制其功能的引导参数。 + +- ``kasan=off`` 或 ``=on`` 控制KASAN是否启用 (默认: ``on`` )。 + +- ``kasan.mode=sync`` 或 ``=async`` 控制KASAN是否配置为同步或异步执行模式(默认: + ``sync`` )。同步模式:当标签检查错误发生时,立即检测到错误访问。异步模式: + 延迟错误访问检测。当标签检查错误发生时,信息存储在硬件中(在arm64的 + TFSR_EL1寄存器中)。内核会定期检查硬件,并且仅在这些检查期间报告标签错误。 + +- ``kasan.stacktrace=off`` 或 ``=on`` 禁用或启用alloc和free堆栈跟踪收集 + (默认: ``on`` )。 + +- ``kasan.fault=report`` 或 ``=panic`` 控制是只打印KASAN报告还是同时使内核恐慌 + (默认: ``report`` )。即使启用了 ``kasan_multi_shot`` ,也会发生内核恐慌。 + +实施细则 +-------- + +通用KASAN +~~~~~~~~~ + +软件KASAN模式使用影子内存来记录每个内存字节是否可以安全访问,并使用编译时工具 +在每次内存访问之前插入影子内存检查。 + +通用KASAN将1/8的内核内存专用于其影子内存(16TB以覆盖x86_64上的128TB),并使用 +具有比例和偏移量的直接映射将内存地址转换为其相应的影子地址。 + +这是将地址转换为其相应影子地址的函数:: + + static inline void *kasan_mem_to_shadow(const void *addr) + { + return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT) + + KASAN_SHADOW_OFFSET; + } + +在这里 ``KASAN_SHADOW_SCALE_SHIFT = 3`` 。 + +编译时工具用于插入内存访问检查。编译器在每次访问大小为1、2、4、8或16的内存之前 +插入函数调用( ``__asan_load*(addr)`` , ``__asan_store*(addr)``)。这些函数通过 +检查相应的影子内存来检查内存访问是否有效。 + +使用inline插桩,编译器不进行函数调用,而是直接插入代码来检查影子内存。此选项 +显著地增大了内核体积,但与outline插桩内核相比,它提供了x1.1-x2的性能提升。 + +通用KASAN是唯一一种通过隔离延迟重新使用已释放对象的模式 +(参见 mm/kasan/quarantine.c 以了解实现)。 + +基于软件标签的KASAN模式 +~~~~~~~~~~~~~~~~~~~~~~~ + +基于软件标签的KASAN使用软件内存标签方法来检查访问有效性。目前仅针对arm64架构实现。 + +基于软件标签的KASAN使用arm64 CPU的顶部字节忽略(TBI)特性在内核指针的顶部字节中 +存储一个指针标签。它使用影子内存来存储与每个16字节内存单元相关的内存标签(因此, +它将内核内存的1/16专用于影子内存)。 + +在每次内存分配时,基于软件标签的KASAN都会生成一个随机标签,用这个标签标记分配 +的内存,并将相同的标签嵌入到返回的指针中。 + +基于软件标签的KASAN使用编译时工具在每次内存访问之前插入检查。这些检查确保正在 +访问的内存的标签等于用于访问该内存的指针的标签。如果标签不匹配,基于软件标签 +的KASAN会打印错误报告。 + +基于软件标签的KASAN也有两种插桩模式(outline,发出回调来检查内存访问;inline, +执行内联的影子内存检查)。使用outline插桩模式,会从执行访问检查的函数打印错误 +报告。使用inline插桩,编译器会发出 ``brk`` 指令,并使用专用的 ``brk`` 处理程序 +来打印错误报告。 + +基于软件标签的KASAN使用0xFF作为匹配所有指针标签(不检查通过带有0xFF指针标签 +的指针进行的访问)。值0xFE当前保留用于标记已释放的内存区域。 + +基于软件标签的KASAN目前仅支持对Slab和page_alloc内存进行标记。 + +基于硬件标签的KASAN模式 +~~~~~~~~~~~~~~~~~~~~~~~ + +基于硬件标签的KASAN在概念上类似于软件模式,但它是使用硬件内存标签作为支持而 +不是编译器插桩和影子内存。 + +基于硬件标签的KASAN目前仅针对arm64架构实现,并且基于ARMv8.5指令集架构中引入 +的arm64内存标记扩展(MTE)和最高字节忽略(TBI)。 + +特殊的arm64指令用于为每次内存分配指定内存标签。相同的标签被指定给指向这些分配 +的指针。在每次内存访问时,硬件确保正在访问的内存的标签等于用于访问该内存的指针 +的标签。如果标签不匹配,则会生成故障并打印报告。 + +基于硬件标签的KASAN使用0xFF作为匹配所有指针标签(不检查通过带有0xFF指针标签的 +指针进行的访问)。值0xFE当前保留用于标记已释放的内存区域。 + +基于硬件标签的KASAN目前仅支持对Slab和page_alloc内存进行标记。 + +如果硬件不支持MTE(ARMv8.5之前),则不会启用基于硬件标签的KASAN。在这种情况下, +所有KASAN引导参数都将被忽略。 + +请注意,启用CONFIG_KASAN_HW_TAGS始终会导致启用内核中的TBI。即使提供了 +``kasan.mode=off`` 或硬件不支持MTE(但支持TBI)。 + +基于硬件标签的KASAN只报告第一个发现的错误。之后,MTE标签检查将被禁用。 + +影子内存 +-------- + +内核将内存映射到地址空间的几个不同部分。内核虚拟地址的范围很大:没有足够的真实 +内存来支持内核可以访问的每个地址的真实影子区域。因此,KASAN只为地址空间的某些 +部分映射真实的影子。 + +默认行为 +~~~~~~~~ + +默认情况下,体系结构仅将实际内存映射到用于线性映射的阴影区域(以及可能的其他 +小区域)。对于所有其他区域 —— 例如vmalloc和vmemmap空间 —— 一个只读页面被映射 +到阴影区域上。这个只读的影子页面声明所有内存访问都是允许的。 + +这给模块带来了一个问题:它们不存在于线性映射中,而是存在于专用的模块空间中。 +通过连接模块分配器,KASAN临时映射真实的影子内存以覆盖它们。例如,这允许检测 +对模块全局变量的无效访问。 + +这也造成了与 ``VMAP_STACK`` 的不兼容:如果堆栈位于vmalloc空间中,它将被分配 +只读页面的影子内存,并且内核在尝试为堆栈变量设置影子数据时会出错。 + +CONFIG_KASAN_VMALLOC +~~~~~~~~~~~~~~~~~~~~ + +使用 ``CONFIG_KASAN_VMALLOC`` ,KASAN可以以更大的内存使用为代价覆盖vmalloc +空间。目前,这在x86、riscv、s390和powerpc上受支持。 + +这通过连接到vmalloc和vmap并动态分配真实的影子内存来支持映射。 + +vmalloc空间中的大多数映射都很小,需要不到一整页的阴影空间。因此,为每个映射 +分配一个完整的影子页面将是一种浪费。此外,为了确保不同的映射使用不同的影子 +页面,映射必须与 ``KASAN_GRANULE_SIZE * PAGE_SIZE`` 对齐。 + +相反,KASAN跨多个映射共享后备空间。当vmalloc空间中的映射使用影子区域的特定 +页面时,它会分配一个后备页面。此页面稍后可以由其他vmalloc映射共享。 + +KASAN连接到vmap基础架构以懒清理未使用的影子内存。 + +为了避免交换映射的困难,KASAN预测覆盖vmalloc空间的阴影区域部分将不会被早期 +的阴影页面覆盖,但是将不会被映射。这将需要更改特定于arch的代码。 + +这允许在x86上支持 ``VMAP_STACK`` ,并且可以简化对没有固定模块区域的架构的支持。 + +对于开发者 +---------- + +忽略访问 +~~~~~~~~ + +软件KASAN模式使用编译器插桩来插入有效性检查。此类检测可能与内核的某些部分 +不兼容,因此需要禁用。 + +内核的其他部分可能会访问已分配对象的元数据。通常,KASAN会检测并报告此类访问, +但在某些情况下(例如,在内存分配器中),这些访问是有效的。 + +对于软件KASAN模式,要禁用特定文件或目录的检测,请将 ``KASAN_SANITIZE`` 添加 +到相应的内核Makefile中: + +- 对于单个文件(例如,main.o):: + + KASAN_SANITIZE_main.o := n + +- 对于一个目录下的所有文件:: + + KASAN_SANITIZE := n + +对于软件KASAN模式,要在每个函数的基础上禁用检测,请使用KASAN特定的 +``__no_sanitize_address`` 函数属性或通用的 ``noinstr`` 。 + +请注意,禁用编译器插桩(基于每个文件或每个函数)会使KASAN忽略在软件KASAN模式 +的代码中直接发生的访问。当访问是间接发生的(通过调用检测函数)或使用没有编译器 +插桩的基于硬件标签的模式时,它没有帮助。 + +对于软件KASAN模式,要在当前任务的一部分内核代码中禁用KASAN报告,请使用 +``kasan_disable_current()``/``kasan_enable_current()`` 部分注释这部分代码。 +这也会禁用通过函数调用发生的间接访问的报告。 + +对于基于标签的KASAN模式(包括硬件模式),要禁用访问检查,请使用 +``kasan_reset_tag()`` 或 ``page_kasan_tag_reset()`` 。请注意,通过 +``page_kasan_tag_reset()`` 临时禁用访问检查需要通过 ``page_kasan_tag`` +/ ``page_kasan_tag_set`` 保存和恢复每页KASAN标签。 + +测试 +~~~~ + +有一些KASAN测试可以验证KASAN是否正常工作并可以检测某些类型的内存损坏。 +测试由两部分组成: + +1. 与KUnit测试框架集成的测试。使用 ``CONFIG_KASAN_KUNIT_TEST`` 启用。 +这些测试可以通过几种不同的方式自动运行和部分验证;请参阅下面的说明。 + +2. 与KUnit不兼容的测试。使用 ``CONFIG_KASAN_MODULE_TEST`` 启用并且只能作为模块 +运行。这些测试只能通过加载内核模块并检查内核日志以获取KASAN报告来手动验证。 + +如果检测到错误,每个KUnit兼容的KASAN测试都会打印多个KASAN报告之一,然后测试打印 +其编号和状态。 + +当测试通过:: + + ok 28 - kmalloc_double_kzfree + +当由于 ``kmalloc`` 失败而导致测试失败时:: + + # kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163 + Expected ptr is not null, but is + not ok 4 - kmalloc_large_oob_right + +当由于缺少KASAN报告而导致测试失败时:: + + # kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:629 + Expected kasan_data->report_expected == kasan_data->report_found, but + kasan_data->report_expected == 1 + kasan_data->report_found == 0 + not ok 28 - kmalloc_double_kzfree + +最后打印所有KASAN测试的累积状态。成功:: + + ok 1 - kasan + +或者,如果其中一项测试失败:: + + not ok 1 - kasan + +有几种方法可以运行与KUnit兼容的KASAN测试。 + +1. 可加载模块 + + 启用 ``CONFIG_KUNIT`` 后,KASAN-KUnit测试可以构建为可加载模块,并通过使用 + ``insmod`` 或 ``modprobe`` 加载 ``test_kasan.ko`` 来运行。 + +2. 内置 + + 通过内置 ``CONFIG_KUNIT`` ,也可以内置KASAN-KUnit测试。在这种情况下, + 测试将在启动时作为后期初始化调用运行。 + +3. 使用kunit_tool + + 通过内置 ``CONFIG_KUNIT`` 和 ``CONFIG_KASAN_KUNIT_TEST`` ,还可以使用 + ``kunit_tool`` 以更易读的方式查看KUnit测试结果。这不会打印通过测试 + 的KASAN报告。有关 ``kunit_tool`` 更多最新信息,请参阅 + `KUnit文档 <https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html>`_ 。 + +.. _KUnit: https://www.kernel.org/doc/html/latest/dev-tools/kunit/index.html diff --git a/Documentation/translations/zh_CN/index.rst b/Documentation/translations/zh_CN/index.rst index d56d6b7092e6..1f953d3439a5 100644 --- a/Documentation/translations/zh_CN/index.rst +++ b/Documentation/translations/zh_CN/index.rst @@ -4,6 +4,7 @@ \renewcommand\thesection* \renewcommand\thesubsection* + \kerneldocCJKon .. _linux_doc_zh: @@ -72,11 +73,11 @@ TODOlist: dev-tools/index doc-guide/index kernel-hacking/index + maintainer/index TODOList: * trace/index -* maintainer/index * fault-injection/index * livepatch/index * rust/index @@ -153,6 +154,7 @@ TODOList: arm64/index riscv/index openrisc/index + parisc/index TODOList: @@ -160,7 +162,6 @@ TODOList: * ia64/index * m68k/index * nios2/index -* parisc/index * powerpc/index * s390/index * sh/index diff --git a/Documentation/translations/zh_CN/maintainer/configure-git.rst b/Documentation/translations/zh_CN/maintainer/configure-git.rst new file mode 100644 index 000000000000..a45ea736f73b --- /dev/null +++ b/Documentation/translations/zh_CN/maintainer/configure-git.rst @@ -0,0 +1,62 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/maintainer/configure-git.rst + +:译者: + + 吴想成 Wu XiangCheng <bobwxc@email.cn> + +.. _configuregit_zh: + +Git配置 +======= + +本章讲述了维护者级别的git配置。 + +Documentation/maintainer/pull-requests.rst 中使用的标记分支应使用开发人员的 +GPG公钥进行签名。可以通过将 ``-u`` 标志传递给 ``git tag`` 来创建签名标记。 +但是,由于 *通常* 对同一项目使用同一个密钥,因此可以设置:: + + git config user.signingkey "keyname" + +或者手动编辑你的 ``.git/config`` 或 ``~/.gitconfig`` 文件:: + + [user] + name = Jane Developer + email = jd@domain.org + signingkey = jd@domain.org + +你可能需要告诉 ``git`` 去使用 ``gpg2``:: + + [gpg] + program = /path/to/gpg2 + +你可能也需要告诉 ``gpg`` 去使用哪个 ``tty`` (添加到你的shell rc文件中):: + + export GPG_TTY=$(tty) + + +创建链接到lore.kernel.org的提交 +------------------------------- + +http://lore.kernel.org 网站是所有涉及或影响内核开发的邮件列表的总存档。在这里 +存储补丁存档是推荐的做法,当维护人员将补丁应用到子系统树时,最好提供一个指向 +lore存档链接的标签,以便浏览提交历史的人可以找到某个更改背后的相关讨论和基本 +原理。链接标签如下所示: + + Link: https://lore.kernel.org/r/<message-id> + +通过在git中添加以下钩子,可以将此配置为在发布 ``git am`` 时自动执行: + +.. code-block:: none + + $ git config am.messageid true + $ cat >.git/hooks/applypatch-msg <<'EOF' + #!/bin/sh + . git-sh-setup + perl -pi -e 's|^Message-Id:\s*<?([^>]+)>?$|Link: https://lore.kernel.org/r/$1|g;' "$1" + test -x "$GIT_DIR/hooks/commit-msg" && + exec "$GIT_DIR/hooks/commit-msg" ${1+"$@"} + : + EOF + $ chmod a+x .git/hooks/applypatch-msg diff --git a/Documentation/translations/zh_CN/maintainer/index.rst b/Documentation/translations/zh_CN/maintainer/index.rst new file mode 100644 index 000000000000..eb75ccea9a21 --- /dev/null +++ b/Documentation/translations/zh_CN/maintainer/index.rst @@ -0,0 +1,21 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/maintainer/index.rst + +============== +内核维护者手册 +============== + +本文档本是内核维护者手册的首页。 +本手册还需要大量完善!请自由提出(和编写)本手册的补充内容。 +*译注:指英文原版* + +.. toctree:: + :maxdepth: 2 + + configure-git + rebasing-and-merging + pull-requests + maintainer-entry-profile + modifying-patches + diff --git a/Documentation/translations/zh_CN/maintainer/maintainer-entry-profile.rst b/Documentation/translations/zh_CN/maintainer/maintainer-entry-profile.rst new file mode 100644 index 000000000000..a1ee99c4786e --- /dev/null +++ b/Documentation/translations/zh_CN/maintainer/maintainer-entry-profile.rst @@ -0,0 +1,92 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/maintainer/maintainer-entry-profile.rst + +:译者: + + 吴想成 Wu XiangCheng <bobwxc@email.cn> + +.. _maintainerentryprofile_zh: + +维护者条目概要 +============== + +维护人员条目概要补充了顶层过程文档(提交补丁,提交驱动程序……),增加了子系 +统/设备驱动程序本地习惯以及有关补丁提交生命周期的相关内容。贡献者使用此文档 +来调整他们的期望和避免常见错误;维护人员可以使用这些信息超越子系统层面查看 +是否有机会汇聚到通用实践中。 + + +总览 +---- + +提供了子系统如何操作的介绍。MAINTAINERS文件告诉了贡献者应发送某文件的补丁到哪, +但它没有传达其他子系统的本地基础设施和机制以协助开发。 + +请考虑以下问题: + +- 当补丁被本地树接纳或合并到上游时是否有通知? +- 子系统是否使用patchwork实例?Patchwork状态变更是否有通知? +- 是否有任何机器人或CI基础设施监视列表,或子系统是否使用自动测试反馈以便把 + 控接纳补丁? +- 被拉入-next的Git分支是哪个? +- 贡献者应针对哪个分支提交? +- 是否链接到其他维护者条目概要?例如一个设备驱动可能指向其父子系统的条目。 + 这使得贡献者意识到某维护者可能对提交链中其他维护者负有的义务。 + + +提交检查单补遗 +-------------- + +列出强制性和咨询性标准,超出通用标准“提交检查表,以便维护者检查一个补丁是否 +足够健康。例如:“通过checkpatch.pl,没有错误、没有警告。通过单元测试详见某处”。 + +提交检查单补遗还可以包括有关硬件规格状态的详细信息。例如,子系统接受补丁之前 +是否需要考虑在某个修订版上发布的规范。 + + +开发周期的关键日期 +------------------ + +提交者常常会误以为补丁可以在合并窗口关闭之前的任何时间发送,且下一个-rc1时仍 +可以。事实上,大多数补丁都需要在下一个合并窗口打开之前提前进入linux-next中。 +向提交者澄清关键日期(以-rc发布周为标志)以明确什么时候补丁会被考虑合并以及 +何时需要等待下一个-rc。 + +至少需要讲明: + +- 最后一个可以提交新功能的-rc: + 针对下一个合并窗口的新功能提交应该在此点之前首次发布以供考虑。在此时间点 + 之后提交的补丁应该明确他们的目标为下下个合并窗口,或者给出应加快进度被接受 + 的充足理由。通常新特性贡献者的提交应出现在-rc5之前。 + +- 最后合并-rc:合并决策的最后期限。 + 向贡献者指出尚未接受的补丁集需要等待下下个合并窗口。当然,维护者没有义务 + 接受所有给定的补丁集,但是如果审阅在此时间点尚未结束,那么希望贡献者应该 + 等待并在下一个合并窗口重新提交。 + +可选项: + +- 开发基线分支的首个-rc,列在概述部分,视为已为新提交做好准备。 + + +审阅节奏 +-------- + +贡献者最担心的问题之一是:补丁集已发布却未收到反馈,应在多久后发送提醒。除了 +指定在重新提交之前要等待多长时间,还可以指示更新的首选样式;例如,重新发送 +整个系列,或私下发送提醒邮件。本节也可以列出本区域的代码审阅方式,以及获取 +不能直接从维护者那里得到的反馈的方法。 + + +现有概要 +-------- + +这里列出了现有的维护人员条目概要;我们可能会想要在不久的将来做一些不同的事情。 + +.. toctree:: + :maxdepth: 1 + + ../doc-guide/maintainer-profile + ../../../nvdimm/maintainer-entry-profile + ../../../riscv/patch-acceptance diff --git a/Documentation/translations/zh_CN/maintainer/modifying-patches.rst b/Documentation/translations/zh_CN/maintainer/modifying-patches.rst new file mode 100644 index 000000000000..7f6326137f6b --- /dev/null +++ b/Documentation/translations/zh_CN/maintainer/modifying-patches.rst @@ -0,0 +1,51 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/maintainer/modifying-patches.rst + +:译者: + + 吴想成 Wu XiangCheng <bobwxc@email.cn> + +.. _modifyingpatches_zh: + +修改补丁 +======== + +如果你是子系统或者分支的维护者,由于代码在你的和提交者的树中并不完全相同, +有时你需要稍微修改一下收到的补丁以合并它们。 + +如果你严格遵守开发者来源证书的规则(c),你应该要求提交者重做,但这完全是会 +适得其反的时间、精力浪费。规则(b)允许你调整代码,但这样修改提交者的代码并 +让他背书你的错误是非常不礼貌的。为解决此问题,建议在你之前最后一个 +Signed-off-by标签和你的之间添加一行,以指示更改的性质。这没有强制性要求,最 +好在描述前面加上你的邮件和/或姓名,用方括号括住整行,以明显指出你对最后一刻 +的更改负责。例如:: + + Signed-off-by: Random J Developer <random@developer.example.org> + [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h] + Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org> + +如果您维护着一个稳定的分支,并希望同时明确贡献、跟踪更改、合并修复,并保护 +提交者免受责难,这种做法尤其有用。请注意,在任何情况下都不得更改作者的身份 +(From头),因为它会在变更日志中显示。 + +向后移植(back-port)人员特别要注意:为了便于跟踪,请在提交消息的顶部(即主题行 +之后)插入补丁的来源,这是一种常见而有用的做法。例如,我们可以在3.x稳定版本 +中看到以下内容:: + + Date: Tue Oct 7 07:26:38 2014 -0400 + + libata: Un-break ATA blacklist + + commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream. + +下面是一个旧的内核在某补丁被向后移植后会出现的:: + + Date: Tue May 13 22:12:27 2008 +0200 + + wireless, airo: waitbusy() won't delay + + [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a] + +不管什么格式,这些信息都为人们跟踪你的树,以及试图解决你树中的错误的人提供了 +有价值的帮助。 diff --git a/Documentation/translations/zh_CN/maintainer/pull-requests.rst b/Documentation/translations/zh_CN/maintainer/pull-requests.rst new file mode 100644 index 000000000000..f46d6f3f2498 --- /dev/null +++ b/Documentation/translations/zh_CN/maintainer/pull-requests.rst @@ -0,0 +1,148 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/maintainer/pull-requests.rst + +:译者: + + 吴想成 Wu XiangCheng <bobwxc@email.cn> + +.. _pullrequests_zh: + +如何创建拉取请求 +================ + +本章描述维护人员如何创建并向其他维护人员提交拉取请求。这对将更改从一个维护者 +树转移到另一个维护者树非常有用。 + +本文档由Tobin C. Harding(当时他尚不是一名经验丰富的维护人员)编写,内容主要 +来自Greg Kroah Hartman和Linus Torvalds在LKML上的评论。Jonathan Corbet和Mauro +Carvalho Chehab提出了一些建议和修改。错误不可避免,如有问题,请找Tobin C. +Harding <me@tobin.cc>。 + +原始邮件线程:: + + http://lkml.kernel.org/r/20171114110500.GA21175@kroah.com + + +创建分支 +-------- + +首先,您需要将希望包含拉取请求里的所有更改都放在单独分支中。通常您将基于某开发 +人员树的一个分支,一般是打算向其发送拉取请求的开发人员。 + +为了创建拉取请求,您必须首先标记刚刚创建的分支。建议您选择一个有意义的标记名, +以即使过了一段时间您和他人仍能理解的方式。在名称中包含源子系统和目标内核版本 +的指示也是一个好的做法。 + +Greg提供了以下内容。对于一个含有drivers/char中混杂事项、将应用于4.15-rc1内核的 +拉取请求,可以命名为 ``char-misc-4.15-rc1`` 。如果要在 ``char-misc-next`` 分支 +上打上此标记,您可以使用以下命令:: + + git tag -s char-misc-4.15-rc1 char-misc-next + +这将在 ``char-misc-next`` 分支的最后一个提交上创建一个名为 ``char-misc-4.15-rc1`` +的标记,并用您的gpg密钥签名(参见 Documentation/maintainer/configure-git.rst )。 + +Linus只接受基于签名过的标记的拉取请求。其他维护者可能会有所不同。 + +当您运行上述命令时 ``git`` 会打开编辑器要求你描述一下这个标记。在本例中您需要 +描述拉取请求,所以请概述一下包含的内容,为什么要合并,是否完成任何测试。所有 +这些信息都将留在标记中,然后在维护者合并拉取请求时保留在合并提交中。所以把它 +写好,它将永远留在内核中。 + +正如Linus所说:: + + 不管怎么样,至少对我来说,重要的是 *信息* 。我需要知道我在拉取什么、 + 为什么我要拉取。我也希望将此消息用于合并消息,因此它不仅应该对我有 + 意义,也应该可以成为一个有意义的历史记录。 + + 注意,如果拉取请求有一些不寻常的地方,请详细说明。如果你修改了并非 + 由你维护的文件,请解释 **为什么** 。我总会在差异中看到的,如果你不 + 提的话,我只会觉得分外可疑。当你在合并窗口后给我发新东西的时候, + (甚至是比较重大的错误修复),不仅需要解释做了什么、为什么这么做, + 还请解释一下 **时间问题** 。为什么错过了合并窗口…… + + 我会看你写在拉取请求邮件和签名标记里面的内容,所以根据你的工作流, + 你可以在签名标记里面描述工作内容(也会自动放进拉取请求邮件),也 + 可以只在标记里面放个占位符,稍后在你实际发给我拉取请求时描述工作内容。 + + 是的,我会编辑这些消息。部分因为我需要做一些琐碎的格式调整(整体缩进、 + 括号等),也因为此消息可能对我有意义(描述了冲突或一些个人问题)而对 + 合并提交信息上下文没啥意义,因此我需要尽力让它有意义起来。我也会 + 修复一些拼写和语法错误,特别是非母语者(母语者也是;^)。但我也会删掉 + 或增加一些内容。 + + Linus + +Greg给出了一个拉取请求的例子:: + + Char/Misc patches for 4.15-rc1 + + Here is the big char/misc patch set for the 4.15-rc1 merge window. + Contained in here is the normal set of new functions added to all + of these crazy drivers, as well as the following brand new + subsystems: + - time_travel_controller: Finally a set of drivers for the + latest time travel bus architecture that provides i/o to + the CPU before it asked for it, allowing uninterrupted + processing + - relativity_shifters: due to the affect that the + time_travel_controllers have on the overall system, there + was a need for a new set of relativity shifter drivers to + accommodate the newly formed black holes that would + threaten to suck CPUs into them. This subsystem handles + this in a way to successfully neutralize the problems. + There is a Kconfig option to force these to be enabled + when needed, so problems should not occur. + + All of these patches have been successfully tested in the latest + linux-next releases, and the original problems that it found have + all been resolved (apologies to anyone living near Canberra for the + lack of the Kconfig options in the earlier versions of the + linux-next tree creations.) + + Signed-off-by: Your-name-here <your_email@domain> + + +此标记消息格式就像一个git提交。顶部有一行“总结标题”, 一定要在下面sign-off。 + +现在您已经有了一个本地签名标记,您需要将它推送到可以被拉取的位置:: + + git push origin char-misc-4.15-rc1 + + +创建拉取请求 +------------ + +最后要做的是创建拉取请求消息。可以使用 ``git request-pull`` 命令让 ``git`` +为你做这件事,但它需要确定你想拉取什么,以及拉取针对的基础(显示正确的拉取 +更改和变更状态)。以下命令将生成一个拉取请求:: + + git request-pull master git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ char-misc-4.15-rc1 + +引用Greg的话:: + + 此命令要求git比较从“char-misc-4.15-rc1”标记位置到“master”分支头(上述 + 例子中指向了我从Linus的树分叉的地方,通常是-rc发布)的差异,并去使用 + git:// 协议拉取。如果你希望使用 https:// 协议,也可以用在这里(但是请 + 注意,部分人由于防火墙问题没法用https协议拉取)。 + + 如果char-misc-4.15-rc1标记没有出现在我要求拉取的仓库中,git会提醒 + 它不在那里,所以记得推送到公开地方。 + + “git request-pull”会包含git树的地址和需要拉取的特定标记,以及标记 + 描述全文(详尽描述标记)。同时它也会创建此拉取请求的差异状态和单个 + 提交的缩短日志。 + +Linus回复说他倾向于 ``git://`` 协议。其他维护者可能有不同的偏好。另外,请注意 +如果你创建的拉取请求没有签名标记, ``https://`` 可能是更好的选择。完整的讨论 +请看原邮件。 + + +提交拉取请求 +------------ + +拉取请求的提交方式与普通补丁相同。向维护人员发送内联电子邮件并抄送LKML以及 +任何必要特定子系统的列表。对Linus的拉取请求通常有如下主题行:: + + [GIT PULL] <subsystem> changes for v4.15-rc1 diff --git a/Documentation/translations/zh_CN/maintainer/rebasing-and-merging.rst b/Documentation/translations/zh_CN/maintainer/rebasing-and-merging.rst new file mode 100644 index 000000000000..83b7dabfe88b --- /dev/null +++ b/Documentation/translations/zh_CN/maintainer/rebasing-and-merging.rst @@ -0,0 +1,165 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/maintainer/rebasing-and-merging.rst + +:译者: + + 吴想成 Wu XiangCheng <bobwxc@email.cn> + +========== +变基与合并 +========== + +一般来说,维护子系统需要熟悉Git源代码管理系统。Git是一个功能强大的工具,有 +很多功能;就像这类工具常出现的情况一样,使用这些功能的方法有对有错。本文档 +特别介绍了变基与合并的用法。维护者经常在错误使用这些工具时遇到麻烦,但避免 +问题实际上并不那么困难。 + +总的来说,需要注意的一点是:与许多其他项目不同,内核社区并不害怕在其开发历史 +中看到合并提交。事实上,考虑到该项目的规模,避免合并几乎是不可能的。维护者会 +在希望避免合并时遇到一些问题,而过于频繁的合并也会带来另一些问题。 + +变基 +==== + +“变基(Rebase)”是更改存储库中一系列提交的历史记录的过程。有两种不同型的操作 +都被称为变基,因为这两种操作都使用 ``git rebase`` 命令,但它们之间存在显著 +差异: + + - 更改一系列补丁的父提交(起始提交)。例如,变基操作可以将基于上一内核版本 + 的一个补丁集重建到当前版本上。在下面的讨论中,我们将此操作称为“变根”。 + + - 通过修复(或删除)损坏的提交、添加补丁、添加标记以更改一系列补丁的历史, + 来提交变更日志或更改已应用提交的顺序。在下文中,这种类型的操作称为“历史 + 修改” + +术语“变基”将用于指代上述两种操作。如果使用得当,变基可以产生更清晰、更整洁的 +开发历史;如果使用不当,它可能会模糊历史并引入错误。 + +以下一些经验法则可以帮助开发者避免最糟糕的变基风险: + + - 已经发布到你私人系统之外世界的历史通常不应更改。其他人可能会拉取你的树 + 的副本,然后基于它进行工作;修改你的树会给他们带来麻烦。如果工作需要变基, + 这通常是表明它还没有准备好提交到公共存储库的信号。 + + 但是,总有例外。有些树(linux-next是一个典型的例子)由于它们的需要经常 + 变基,开发人员知道不要基于它们来工作。开发人员有时会公开一个不稳定的分支, + 供其他人或自动测试服务进行测试。如果您确实以这种方式公开了一个可能不稳定 + 的分支,请确保潜在使用者知道不要基于它来工作。 + + - 不要在包含由他人创建的历史的分支上变基。如果你从别的开发者的仓库拉取了变更, + 那你现在就成了他们历史记录的保管人。你不应该改变它,除了少数例外情况。例如 + 树中有问题的提交必须显式恢复(即通过另一个补丁修复),而不是通过修改历史而 + 消失。 + + - 没有合理理由,不要对树变根。仅为了切换到更新的基或避免与上游储存库的合并 + 通常不是合理理由。 + + - 如果你必须对储存库进行变根,请不要随机选取一个提交作为新基。在发布节点之间 + 内核通常处于一个相对不稳定的状态;基于其中某点进行开发会增加遇到意外错误的 + 几率。当一系列补丁必须移动到新基时,请选择移动到一个稳定节点(例如-rc版本 + 节点)。 + + - 请知悉对补丁系列进行变根(或做明显的历史修改)会改变它们的开发环境,且很 + 可能使做过的大部分测试失效。一般来说,变基后的补丁系列应当像新代码一样对 + 待,并重新测试。 + +合并窗口麻烦的一个常见原因是,Linus收到了一个明显在拉取请求发送之前不久才变根 +(通常是变根到随机的提交上)的补丁系列。这样一个系列被充分测试的可能性相对较 +低,拉取请求被接受的几率也同样较低。 + +相反,如果变基仅限于私有树、提交基于一个通用的起点、且经过充分测试,则引起 +麻烦的可能性就很低。 + +合并 +==== + +内核开发过程中,合并是一个很常见的操作;5.1版本开发周期中有超过1126个合并 +——差不多占了整体的9%。内核开发工作积累在100多个不同的子系统树中,每个 +子系统树都可能包含多个主题分支;每个分支通常独立于其他分支进行开发。因此 +在任何给定分支进入上游储存库之前,至少需要一次合并。 + +许多项目要求拉取请求中的分支基于当前主干,这样历史记录中就不会出现合并提交。 +内核并不是这样;任何为了避免合并而重新对分支变基都很可能导致麻烦。 + +子系统维护人员发现他们必须进行两种类型的合并:从较低层级的子系统树和从其他 +子系统树(同级树或主线)进行合并。这两种情况下要遵循的最佳实践是不同的。 + +合并较低层级树 +-------------- + +较大的子系统往往有多个级别的维护人员,较低级别的维护人员向较高级别发送拉取 +请求。合并这样的请求执行几乎肯定会生成一个合并提交;这也是应该的。实际上, +子系统维护人员可能希望在极少数快进合并情况下使用 ``-–no-ff`` 标志来强制添加 +合并提交,以便记录合并的原因。 **任何** 类型的合并的变更日志必须说明 +*为什么* 合并。对于较低级别的树,“为什么”通常是对该取所带来的变化的总结。 + +各级维护人员都应在他们的拉取请求上使用经签名的标签,上游维护人员应在拉取分支 +时验证标签。不这样做会威胁整个开发过程的安全。 + +根据上面列出的规则,一旦您将其他人的历史记录合并到树中,您就不得对该分支进行 +变基,即使您能够这样做。 + +合并同级树或上游树 +------------------ + +虽然来自下游的合并是常见且不起眼的,但当需要将一个分支推向上游时,其中来自 +其他树的合并往往是一个危险信号。这种合并需要仔细考虑并加以充分证明,否则后续 +的拉取请求很可能会被拒绝。 + +想要将主分支合并到存储库中是很自然的;这种类型的合并通常被称为“反向合并” +。反向合并有助于确保与并行的开发没有冲突,并且通常会给人一种温暖、舒服的 +感觉,即处于最新。但这种诱惑几乎总是应该避免的。 + +为什么呢?反向合并将搅乱你自己分支的开发历史。它们会大大增加你遇到来自社区 +其他地方的错误的机会,且使你很难确保你所管理的工作稳定并准备好合入上游。 +频繁的合并还可以掩盖树中开发过程中的问题;它们会隐藏与其他树的交互,而这些 +交互不应该(经常)发生在管理良好的分支中。 + +也就是说,偶尔需要进行反向合并;当这种情况发生时,一定要在提交信息中记录 +*为什么* 。同样,在一个众所周知的稳定点进行合并,而不是随机提交。即使这样, +你也不应该反向合并一棵比你的直接上游树更高层级的树;如果确实需要更高级别的 +反向合并,应首先在上游树进行。 + +导致合并相关问题最常见的原因之一是:在发送拉取请求之前维护者合并上游以解决 +合并冲突。同样,这种诱惑很容易理解,但绝对应该避免。对于最终拉取请求来说 +尤其如此:Linus坚信他更愿意看到合并冲突,而不是不必要的反向合并。看到冲突 +可以让他了解潜在的问题所在。他做过很多合并(在5.1版本开发周期中是382次), +而且在解决冲突方面也很在行——通常比参与的开发人员要强。 + +那么,当他们的子系统分支和主线之间发生冲突时,维护人员应该怎么做呢?最重要 +的一步是在拉取请求中提示Linus会发生冲突;如果啥都没说则表明您的分支可以正常 +合入。对于特别困难的冲突,创建并推送一个 *独立* 分支来展示你将如何解决问题。 +在拉取请求中提到该分支,但是请求本身应该针对未合并的分支。 + +即使不存在已知冲突,在发送拉取请求之前进行合并测试也是个好主意。它可能会提醒 +您一些在linux-next树中没有发现的问题,并帮助您准确地理解您正在要求上游做什么。 + +合并上游树或另一个子系统树的另一个原因是解决依赖关系。这些依赖性问题有时确实 +会发生,而且有时与另一棵树交叉合并是解决这些问题的最佳方法;同样,在这种情况 +下,合并提交应该解释为什么要进行合并。花点时间把它做好;会有人阅读这些变更 +日志。 + +然而依赖性问题通常表明需要改变方法。合并另一个子系统树以解决依赖性风险会带来 +其他缺陷,几乎永远不应这样做。如果该子系统树无法被合到上游,那么它的任何问题 +也都会阻碍你的树合并。更可取的选择包括与维护人员达成一致意见,在其中一个树中 +同时进行两组更改;或者创建一个主题分支专门处理可以合并到两个树中的先决条件提交。 +如果依赖关系与主要的基础结构更改相关,正确的解决方案可能是将依赖提交保留一个 +开发周期,以便这些更改有时间在主线上稳定。 + +最后 +==== + +在开发周期的开头合并主线是比较常见的,可以获取树中其他地方的更改和修复。同样, +这样的合并应该选择一个众所周知的发布点,而不是一些随机点。如果在合并窗口期间 +上游分支已完全清空到主线中,则可以使用以下命令向前拉取它:: + + git merge v5.2-rc1^0 + +“^0”使Git执行快进合并(在这种情况下这应该可以),从而避免多余的虚假合并提交。 + +上面列出的就是指导方针了。总是会有一些情况需要不同的解决方案,这些指导原则 +不应阻止开发人员在需要时做正确的事情。但是,我们应该时刻考虑是否真的出现了 +这样的需求,并准备好解释为什么需要做一些不寻常的事情。 diff --git a/Documentation/translations/zh_CN/parisc/debugging.rst b/Documentation/translations/zh_CN/parisc/debugging.rst new file mode 100644 index 000000000000..c21beb986e15 --- /dev/null +++ b/Documentation/translations/zh_CN/parisc/debugging.rst @@ -0,0 +1,42 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/parisc/debugging.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_parisc_debugging: + +================= +调试PA-RISC +================= + +好吧,这里有一些关于调试linux/parisc的较底层部分的信息。 + + +1. 绝对地址 +===================== + +很多汇编代码目前运行在实模式下,这意味着会使用绝对地址,而不是像内核其他 +部分那样使用虚拟地址。要将绝对地址转换为虚拟地址,你可以在System.map中查 +找,添加__PAGE_OFFSET(目前是0x10000000)。 + + +2. HPMCs +======== + +当实模式的代码试图访问不存在的内存时,会出现HPMC(high priority machine +check)而不是内核oops。若要调试HPMC,请尝试找到系统响应程序/请求程序地址。 +系统请求程序地址应该与(某)处理器的HPA(I/O范围内的高地址)相匹配;系统响应程 +序地址是实模式代码试图访问的地址。 + +系统响应程序地址的典型值是大于__PAGE_OFFSET (0x10000000)的地址,这意味着 +在实模式试图访问它之前,虚拟地址没有被翻译成物理地址。 + + +3. 有趣的Q位 +============ + +某些非常关键的代码必须清除PSW中的Q位。当Q位被清除时,CPU不会更新中断处理 +程序所读取的寄存器,以找出机器被中断的位置——所以如果你在清除Q位的指令和再 +次设置Q位的RFI之间遇到中断,你不知道它到底发生在哪里。如果你幸运的话,IAOQ +会指向清除Q位的指令,如果你不幸运的话,它会指向任何地方。通常Q位的问题会 +表现为无法解释的系统挂起或物理内存越界。 diff --git a/Documentation/translations/zh_CN/parisc/index.rst b/Documentation/translations/zh_CN/parisc/index.rst new file mode 100644 index 000000000000..a47454ebe32e --- /dev/null +++ b/Documentation/translations/zh_CN/parisc/index.rst @@ -0,0 +1,28 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/parisc/index.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_parisc_index: + +==================== +PA-RISC体系架构 +==================== + +.. toctree:: + :maxdepth: 2 + + debugging + registers + +Todolist: + + features + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/translations/zh_CN/parisc/registers.rst b/Documentation/translations/zh_CN/parisc/registers.rst new file mode 100644 index 000000000000..71e2404cd103 --- /dev/null +++ b/Documentation/translations/zh_CN/parisc/registers.rst @@ -0,0 +1,153 @@ +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/parisc/registers.rst +:Translator: Yanteng Si <siyanteng@loongson.cn> + +.. _cn_parisc_registers: + +========================= +Linux/PA-RISC的寄存器用法 +========================= + +[ 用星号表示目前尚未实现的计划用途。 ] + +ABI约定的通用寄存器 +=================== + +控制寄存器 +---------- + +============================ ================================= +CR 0 (恢复计数器) 用于ptrace +CR 1-CR 7(无定义) 未使用 +CR 8 (Protection ID) 每进程值* +CR 9, 12, 13 (PIDS) 未使用 +CR10 (CCR) FPU延迟保存* +CR11 按照ABI的规定(SAR) +CR14 (中断向量) 初始化为 fault_vector +CR15 (EIEM) 所有位初始化为1* +CR16 (间隔计时器) 读取周期数/写入开始时间间隔计时器 +CR17-CR22 中断参数 +CR19 中断指令寄存器 +CR20 中断空间寄存器 +CR21 中断偏移量寄存器 +CR22 中断 PSW +CR23 (EIRR) 读取未决中断/写入清除位 +CR24 (TR 0) 内核空间页目录指针 +CR25 (TR 1) 用户空间页目录指针 +CR26 (TR 2) 不使用 +CR27 (TR 3) 线程描述符指针 +CR28 (TR 4) 不使用 +CR29 (TR 5) 不使用 +CR30 (TR 6) 当前 / 0 +CR31 (TR 7) 临时寄存器,在不同地方使用 +============================ ================================= + +空间寄存器(内核模式) +---------------------- + +======== ============================== +SR0 临时空间寄存器 +SR4-SR7 设置为0 +SR1 临时空间寄存器 +SR2 内核不应该破坏它 +SR3 用于用户空间访问(当前进程) +======== ============================== + +空间寄存器(用户模式) +---------------------- + +======== ============================ +SR0 临时空间寄存器 +SR1 临时空间寄存器 +SR2 保存Linux gateway page的空间 +SR3 在内核中保存用户地址空间的值 +SR4-SR7 定义了用户/内核的短地址空间 +======== ============================ + + +处理器状态字 +------------ + +====================== ================================================ +W (64位地址) 0 +E (小尾端) 0 +S (安全间隔计时器) 0 +T (产生分支陷阱) 0 +H (高特权级陷阱) 0 +L (低特权级陷阱) 0 +N (撤销下一条指令) 被C代码使用 +X (数据存储中断禁用) 0 +B (产生分支) 被C代码使用 +C (代码地址转译) 1, 在执行实模式代码时为0 +V (除法步长校正) 被C代码使用 +M (HPMC 掩码) 0, 在执行HPMC操作*时为1 +C/B (进/借 位) 被C代码使用 +O (有序引用) 1* +F (性能监视器) 0 +R (回收计数器陷阱) 0 +Q (收集中断状态) 1 (在rfi之前的代码中为0) +P (保护标识符) 1* +D (数据地址转译) 1, 在执行实模式代码时为0 +I (外部中断掩码) 由cli()/sti()宏使用。 +====================== ================================================ + +“隐形”寄存器(影子寄存器) +--------------------------- + +============= =================== +PSW W 默认值 0 +PSW E 默认值 0 +影子寄存器 被中断处理代码使用 +TOC启用位 1 +============= =================== + +---------------------------------------------------------- + +PA-RISC架构定义了7个寄存器作为“影子寄存器”。这些寄存器在 +RETURN FROM INTERRUPTION AND RESTORE指令中使用,通过消 +除中断处理程序中对一般寄存器(GR)的保存和恢复的需要来减 +少状态保存和恢复时间。影子寄存器是GRs 1, 8, 9, 16, 17, +24和25。 + +------------------------------------------------------------------------- + +寄存器使用说明,最初由John Marvin提供,并由Randolph Chung提供一些补充说明。 + +对于通用寄存器: + +r1,r2,r19-r26,r28,r29 & r31可以在不保存它们的情况下被使用。当然,如果你 +关心它们,在调用另一个程序之前,你也需要保存它们。上面的一些寄存器确实 +有特殊的含义,你应该注意一下: + + r1: + addil指令是硬性规定将其结果放在r1中,所以如果你使用这条指令要 + 注意这点。 + + r2: + 这就是返回指针。一般来说,你不想使用它,因为你需要这个指针来返 + 回给你的调用者。然而,它与这组寄存器组合在一起,因为调用者不能 + 依赖你返回时的值是相同的,也就是说,你可以将r2复制到另一个寄存 + 器,并在作废r2后通过该寄存器返回,这应该不会给调用程序带来问题。 + + r19-r22: + 这些通常被认为是临时寄存器。 + 请注意,在64位中它们是arg7-arg4。 + + r23-r26: + 这些是arg3-arg0,也就是说,如果你不再关心传入的值,你可以使用 + 它们。 + + r28,r29: + 这俩是ret0和ret1。它们是你传入返回值的地方。r28是主返回值。当返回 + 小结构体时,r29也可以用来将数据传回给调用程序。 + + r30: + 栈指针 + + r31: + ble指令将返回指针放在这里。 + + + r3-r18,r27,r30需要被保存和恢复。r3-r18只是一般用途的寄存器。 + r27是数据指针,用来使对全局变量的引用更容易。r30是栈指针。 diff --git a/Documentation/translations/zh_CN/process/8.Conclusion.rst b/Documentation/translations/zh_CN/process/8.Conclusion.rst index 71c3e30efc6f..4707f0101964 100644 --- a/Documentation/translations/zh_CN/process/8.Conclusion.rst +++ b/Documentation/translations/zh_CN/process/8.Conclusion.rst @@ -19,7 +19,7 @@ :ref:`Documentation/translations/zh_CN/process/howto.rst <cn_process_howto>` 文件是一个重要的起点; :ref:`Documentation/translations/zh_CN/process/submitting-patches.rst <cn_submittingpatches>` -和 :ref:`Documentation/transaltions/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>` +和 :ref:`Documentation/translations/zh_CN/process/submitting-drivers.rst <cn_submittingdrivers>` 也是所有内核开发人员都应该阅读的内容。许多内部内核API都是使用kerneldoc机制 记录的;“make htmldocs”或“make pdfdocs”可用于以HTML或PDF格式生成这些文档 (尽管某些发行版提供的tex版本会遇到内部限制,无法正确处理文档)。 diff --git a/Documentation/translations/zh_CN/process/coding-style.rst b/Documentation/translations/zh_CN/process/coding-style.rst index 406d43a02c02..b8c484a84d10 100644 --- a/Documentation/translations/zh_CN/process/coding-style.rst +++ b/Documentation/translations/zh_CN/process/coding-style.rst @@ -61,7 +61,7 @@ Linux 内核代码风格 case 'K': case 'k': mem <<= 10; - /* fall through */ + fallthrough; default: break; } diff --git a/Documentation/usb/ehci.rst b/Documentation/usb/ehci.rst index 31f650e7c1b4..76190501907a 100644 --- a/Documentation/usb/ehci.rst +++ b/Documentation/usb/ehci.rst @@ -1,4 +1,4 @@ -=========== +=========== EHCI driver =========== diff --git a/Documentation/usb/gadget_printer.rst b/Documentation/usb/gadget_printer.rst index 5e5516c69075..e611a6d91093 100644 --- a/Documentation/usb/gadget_printer.rst +++ b/Documentation/usb/gadget_printer.rst @@ -1,4 +1,4 @@ -=============================== +=============================== Linux USB Printer Gadget Driver =============================== diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst index 62c9361a3c7f..f35552ff19ba 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -145,7 +145,8 @@ Bind mounts and OverlayFS Landlock enables to restrict access to file hierarchies, which means that these access rights can be propagated with bind mounts (cf. -:doc:`/filesystems/sharedsubtree`) but not with :doc:`/filesystems/overlayfs`. +Documentation/filesystems/sharedsubtree.rst) but not with +Documentation/filesystems/overlayfs.rst. A bind mount mirrors a source file hierarchy to a destination. The destination hierarchy is then composed of the exact same files, on which Landlock rules can @@ -170,8 +171,8 @@ Inheritance Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain restrictions from its parent. This is similar to the seccomp inheritance (cf. -:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's -:manpage:`credentials(7)`. For instance, one process's thread may apply +Documentation/userspace-api/seccomp_filter.rst) or any other LSM dealing with +task's :manpage:`credentials(7)`. For instance, one process's thread may apply Landlock rules to itself, but they will not be automatically applied to other sibling threads (unlike POSIX thread credential changes, cf. :manpage:`nptl(7)`). @@ -278,7 +279,7 @@ Memory usage ------------ Kernel memory allocated to create rulesets is accounted and can be restricted -by the :doc:`/admin-guide/cgroup-v1/memory`. +by the Documentation/admin-guide/cgroup-v1/memory.rst. Questions and answers ===================== @@ -303,7 +304,7 @@ issues, especially when untrusted processes can manipulate them (cf. Additional documentation ======================== -* :doc:`/security/landlock` +* Documentation/security/landlock.rst * https://landlock.io .. Links diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index b9ddce5638f5..c7b165ca70b6 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6620,7 +6620,7 @@ system fingerprint. To prevent userspace from circumventing such restrictions by running an enclave in a VM, KVM prevents access to privileged attributes by default. -See Documentation/x86/sgx/2.Kernel-internals.rst for more details. +See Documentation/x86/sgx.rst for more details. 7.26 KVM_CAP_PPC_RPT_INVALIDATE ------------------------------- diff --git a/Documentation/virt/kvm/s390-pv-boot.rst b/Documentation/virt/kvm/s390-pv-boot.rst index ad1f7866c001..73a6083cb5e7 100644 --- a/Documentation/virt/kvm/s390-pv-boot.rst +++ b/Documentation/virt/kvm/s390-pv-boot.rst @@ -10,7 +10,7 @@ The memory of Protected Virtual Machines (PVMs) is not accessible to I/O or the hypervisor. In those cases where the hypervisor needs to access the memory of a PVM, that memory must be made accessible. Memory made accessible to the hypervisor will be encrypted. See -:doc:`s390-pv` for details." +Documentation/virt/kvm/s390-pv.rst for details." On IPL (boot) a small plaintext bootloader is started, which provides information about the encrypted components and necessary metadata to diff --git a/Documentation/virt/kvm/vcpu-requests.rst b/Documentation/virt/kvm/vcpu-requests.rst index af1b37441e0a..ad2915ef7020 100644 --- a/Documentation/virt/kvm/vcpu-requests.rst +++ b/Documentation/virt/kvm/vcpu-requests.rst @@ -304,6 +304,6 @@ VCPU returns from the call. References ========== -.. [atomic-ops] Documentation/core-api/atomic_ops.rst +.. [atomic-ops] Documentation/atomic_bitops.txt and Documentation/atomic_t.txt .. [memory-barriers] Documentation/memory-barriers.txt .. [lwn-mb] https://lwn.net/Articles/573436/ diff --git a/Documentation/vm/zswap.rst b/Documentation/vm/zswap.rst index d8d9fa4a1f0d..8edb8d578caf 100644 --- a/Documentation/vm/zswap.rst +++ b/Documentation/vm/zswap.rst @@ -10,7 +10,7 @@ Overview Zswap is a lightweight compressed cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a dynamically allocated RAM-based memory pool. zswap basically trades CPU cycles -for potentially reduced swap I/O. This trade-off can also result in a +for potentially reduced swap I/O. This trade-off can also result in a significant performance improvement if reads from the compressed cache are faster than reads from a swap device. @@ -26,7 +26,7 @@ faster than reads from a swap device. performance impact of swapping. * Overcommitted guests that share a common I/O resource can dramatically reduce their swap I/O pressure, avoiding heavy handed I/O - throttling by the hypervisor. This allows more work to get done with less + throttling by the hypervisor. This allows more work to get done with less impact to the guest workload and guests sharing the I/O subsystem * Users with SSDs as swap devices can extend the life of the device by drastically reducing life-shortening writes. diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index fc844913dece..894a19897005 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -1343,7 +1343,7 @@ follow:: In addition to read/modify/write the setup header of the struct boot_params as that of 16-bit boot protocol, the boot loader should also fill the additional fields of the struct boot_params as -described in chapter :doc:`zero-page`. +described in chapter Documentation/x86/zero-page.rst. After setting up the struct boot_params, the boot loader can load the 32/64-bit kernel in the same way as that of 16-bit boot protocol. @@ -1379,7 +1379,7 @@ can be calculated as follows:: In addition to read/modify/write the setup header of the struct boot_params as that of 16-bit boot protocol, the boot loader should also fill the additional fields of the struct boot_params as described -in chapter :doc:`zero-page`. +in chapter Documentation/x86/zero-page.rst. After setting up the struct boot_params, the boot loader can load 64-bit kernel in the same way as that of 16-bit boot protocol, but diff --git a/Documentation/x86/mtrr.rst b/Documentation/x86/mtrr.rst index c5b695d75349..9f0b1851771a 100644 --- a/Documentation/x86/mtrr.rst +++ b/Documentation/x86/mtrr.rst @@ -28,7 +28,7 @@ are aligned with platform MTRR setup. If MTRRs are only set up by the platform firmware code though and the OS does not make any specific MTRR mapping requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID. -For details refer to :doc:`pat`. +For details refer to Documentation/x86/pat.rst. .. tip:: On Intel P6 family processors (Pentium Pro, Pentium II and later) diff --git a/include/linux/device.h b/include/linux/device.h index f1a00040fa53..959cb9d2c9ab 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -399,7 +399,7 @@ struct dev_links_info { * along with subsystem-level and driver-level callbacks. * @em_pd: device's energy model performance domain * @pins: For device pin management. - * See Documentation/driver-api/pinctl.rst for details. + * See Documentation/driver-api/pin-control.rst for details. * @msi_list: Hosts MSI descriptors * @msi_domain: The generic MSI domain this device is using. * @numa_node: NUMA node this device is close to. diff --git a/include/linux/mfd/madera/pdata.h b/include/linux/mfd/madera/pdata.h index 601cbbc10370..32e3470708ed 100644 --- a/include/linux/mfd/madera/pdata.h +++ b/include/linux/mfd/madera/pdata.h @@ -31,7 +31,7 @@ struct pinctrl_map; * @irq_flags: Mode for primary IRQ (defaults to active low) * @gpio_base: Base GPIO number * @gpio_configs: Array of GPIO configurations (See - * Documentation/driver-api/pinctl.rst) + * Documentation/driver-api/pin-control.rst) * @n_gpio_configs: Number of entries in gpio_configs * @gpsw: General purpose switch mode setting. Depends on the external * hardware connected to the switch. (See the SW1_MODE field diff --git a/include/linux/pinctrl/pinconf-generic.h b/include/linux/pinctrl/pinconf-generic.h index e18ab3d5908f..5a96602a3316 100644 --- a/include/linux/pinctrl/pinconf-generic.h +++ b/include/linux/pinctrl/pinconf-generic.h @@ -89,7 +89,7 @@ struct pinctrl_map; * it. * @PIN_CONFIG_OUTPUT: this will configure the pin as an output and drive a * value on the line. Use argument 1 to indicate high level, argument 0 to - * indicate low level. (Please see Documentation/driver-api/pinctl.rst, + * indicate low level. (Please see Documentation/driver-api/pin-control.rst, * section "GPIO mode pitfalls" for a discussion around this parameter.) * @PIN_CONFIG_PERSIST_STATE: retain pin state across sleep or controller reset * @PIN_CONFIG_POWER_SOURCE: if the pin can select between different power diff --git a/include/linux/platform_profile.h b/include/linux/platform_profile.h index a6329003aee7..e5cbb6841f3a 100644 --- a/include/linux/platform_profile.h +++ b/include/linux/platform_profile.h @@ -2,7 +2,7 @@ /* * Platform profile sysfs interface * - * See Documentation/ABI/testing/sysfs-platform_profile.rst for more + * See Documentation/userspace-api/sysfs-platform_profile.rst for more * information. */ diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c index 4b2f31828951..f991a66b5b02 100644 --- a/samples/kprobes/kprobe_example.c +++ b/samples/kprobes/kprobe_example.c @@ -10,6 +10,8 @@ * whenever kernel_clone() is invoked to create a new process. */ +#define pr_fmt(fmt) "%s: " fmt, __func__ + #include <linux/kernel.h> #include <linux/module.h> #include <linux/kprobes.h> @@ -27,32 +29,31 @@ static struct kprobe kp = { static int __kprobes handler_pre(struct kprobe *p, struct pt_regs *regs) { #ifdef CONFIG_X86 - pr_info("<%s> pre_handler: p->addr = 0x%p, ip = %lx, flags = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, ip = %lx, flags = 0x%lx\n", p->symbol_name, p->addr, regs->ip, regs->flags); #endif #ifdef CONFIG_PPC - pr_info("<%s> pre_handler: p->addr = 0x%p, nip = 0x%lx, msr = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, nip = 0x%lx, msr = 0x%lx\n", p->symbol_name, p->addr, regs->nip, regs->msr); #endif #ifdef CONFIG_MIPS - pr_info("<%s> pre_handler: p->addr = 0x%p, epc = 0x%lx, status = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, epc = 0x%lx, status = 0x%lx\n", p->symbol_name, p->addr, regs->cp0_epc, regs->cp0_status); #endif #ifdef CONFIG_ARM64 - pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx," - " pstate = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, pc = 0x%lx, pstate = 0x%lx\n", p->symbol_name, p->addr, (long)regs->pc, (long)regs->pstate); #endif #ifdef CONFIG_ARM - pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx, cpsr = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, pc = 0x%lx, cpsr = 0x%lx\n", p->symbol_name, p->addr, (long)regs->ARM_pc, (long)regs->ARM_cpsr); #endif #ifdef CONFIG_RISCV - pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx, status = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, pc = 0x%lx, status = 0x%lx\n", p->symbol_name, p->addr, regs->epc, regs->status); #endif #ifdef CONFIG_S390 - pr_info("<%s> pre_handler: p->addr, 0x%p, ip = 0x%lx, flags = 0x%lx\n", + pr_info("<%s> p->addr, 0x%p, ip = 0x%lx, flags = 0x%lx\n", p->symbol_name, p->addr, regs->psw.addr, regs->flags); #endif @@ -65,31 +66,31 @@ static void __kprobes handler_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags) { #ifdef CONFIG_X86 - pr_info("<%s> post_handler: p->addr = 0x%p, flags = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, flags = 0x%lx\n", p->symbol_name, p->addr, regs->flags); #endif #ifdef CONFIG_PPC - pr_info("<%s> post_handler: p->addr = 0x%p, msr = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, msr = 0x%lx\n", p->symbol_name, p->addr, regs->msr); #endif #ifdef CONFIG_MIPS - pr_info("<%s> post_handler: p->addr = 0x%p, status = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, status = 0x%lx\n", p->symbol_name, p->addr, regs->cp0_status); #endif #ifdef CONFIG_ARM64 - pr_info("<%s> post_handler: p->addr = 0x%p, pstate = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, pstate = 0x%lx\n", p->symbol_name, p->addr, (long)regs->pstate); #endif #ifdef CONFIG_ARM - pr_info("<%s> post_handler: p->addr = 0x%p, cpsr = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, cpsr = 0x%lx\n", p->symbol_name, p->addr, (long)regs->ARM_cpsr); #endif #ifdef CONFIG_RISCV - pr_info("<%s> post_handler: p->addr = 0x%p, status = 0x%lx\n", + pr_info("<%s> p->addr = 0x%p, status = 0x%lx\n", p->symbol_name, p->addr, regs->status); #endif #ifdef CONFIG_S390 - pr_info("<%s> pre_handler: p->addr, 0x%p, flags = 0x%lx\n", + pr_info("<%s> p->addr, 0x%p, flags = 0x%lx\n", p->symbol_name, p->addr, regs->flags); #endif } diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check index c71832b2312b..7187ea5e5149 100755 --- a/scripts/documentation-file-ref-check +++ b/scripts/documentation-file-ref-check @@ -24,7 +24,7 @@ my $help = 0; my $fix = 0; my $warn = 0; -if (! -d ".git") { +if (! -e ".git") { printf "Warning: can't check if file exists, as this is not a git tree\n"; exit 0; } diff --git a/scripts/kernel-doc b/scripts/kernel-doc index 4840e748fca8..7c4a6a507ac4 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -406,6 +406,8 @@ my $doc_inline_sect = '\s*\*\s*(@\s*[\w][\w\.]*\s*):(.*)'; my $doc_inline_end = '^\s*\*/\s*$'; my $doc_inline_oneline = '^\s*/\*\*\s*(@[\w\s]+):\s*(.*)\s*\*/\s*$'; my $export_symbol = '^\s*EXPORT_SYMBOL(_GPL)?\s*\(\s*(\w+)\s*\)\s*;'; +my $function_pointer = qr{([^\(]*\(\*)\s*\)\s*\(([^\)]*)\)}; +my $attribute = qr{__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)}i; my %parameterdescs; my %parameterdesc_start_lines; @@ -694,7 +696,7 @@ sub output_function_man(%) { $post = ");"; } $type = $args{'parametertypes'}{$parameter}; - if ($type =~ m/([^\(]*\(\*)\s*\)\s*\(([^\)]*)\)/) { + if ($type =~ m/$function_pointer/) { # pointer-to-function print ".BI \"" . $parenth . $1 . "\" " . " \") (" . $2 . ")" . $post . "\"\n"; } else { @@ -974,7 +976,7 @@ sub output_function_rst(%) { $count++; $type = $args{'parametertypes'}{$parameter}; - if ($type =~ m/([^\(]*\(\*)\s*\)\s*\(([^\)]*)\)/) { + if ($type =~ m/$function_pointer/) { # pointer-to-function print $1 . $parameter . ") (" . $2 . ")"; } else { @@ -1211,7 +1213,9 @@ sub dump_struct($$) { my $members; my $type = qr{struct|union}; # For capturing struct/union definition body, i.e. "{members*}qualifiers*" - my $definition_body = qr{\{(.*)\}(?:\s*(?:__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned|__attribute__\s*\(\([a-z0-9,_\s\(\)]*\)\)))*}; + my $qualifiers = qr{$attribute|__packed|__aligned|____cacheline_aligned_in_smp|____cacheline_aligned}; + my $definition_body = qr{\{(.*)\}\s*$qualifiers*}; + my $struct_members = qr{($type)([^\{\};]+)\{([^\{\}]*)\}([^\{\}\;]*)\;}; if ($x =~ /($type)\s+(\w+)\s*$definition_body/) { $decl_type = $1; @@ -1235,27 +1239,27 @@ sub dump_struct($$) { # strip comments: $members =~ s/\/\*.*?\*\///gos; # strip attributes - $members =~ s/\s*__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)/ /gi; + $members =~ s/\s*$attribute/ /gi; $members =~ s/\s*__aligned\s*\([^;]*\)/ /gos; $members =~ s/\s*__packed\s*/ /gos; $members =~ s/\s*CRYPTO_MINALIGN_ATTR/ /gos; $members =~ s/\s*____cacheline_aligned_in_smp/ /gos; $members =~ s/\s*____cacheline_aligned/ /gos; + my $args = qr{([^,)]+)}; # replace DECLARE_BITMAP $members =~ s/__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)/DECLARE_BITMAP($1, __ETHTOOL_LINK_MODE_MASK_NBITS)/gos; - $members =~ s/DECLARE_BITMAP\s*\(([^,)]+),\s*([^,)]+)\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos; + $members =~ s/DECLARE_BITMAP\s*\($args,\s*$args\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos; # replace DECLARE_HASHTABLE - $members =~ s/DECLARE_HASHTABLE\s*\(([^,)]+),\s*([^,)]+)\)/unsigned long $1\[1 << (($2) - 1)\]/gos; + $members =~ s/DECLARE_HASHTABLE\s*\($args,\s*$args\)/unsigned long $1\[1 << (($2) - 1)\]/gos; # replace DECLARE_KFIFO - $members =~ s/DECLARE_KFIFO\s*\(([^,)]+),\s*([^,)]+),\s*([^,)]+)\)/$2 \*$1/gos; + $members =~ s/DECLARE_KFIFO\s*\($args,\s*$args,\s*$args\)/$2 \*$1/gos; # replace DECLARE_KFIFO_PTR - $members =~ s/DECLARE_KFIFO_PTR\s*\(([^,)]+),\s*([^,)]+)\)/$2 \*$1/gos; - + $members =~ s/DECLARE_KFIFO_PTR\s*\($args,\s*$args\)/$2 \*$1/gos; my $declaration = $members; # Split nested struct/union elements as newer ones - while ($members =~ m/(struct|union)([^\{\};]+)\{([^\{\}]*)\}([^\{\}\;]*)\;/) { + while ($members =~ m/$struct_members/) { my $newmember; my $maintype = $1; my $ids = $4; @@ -1315,7 +1319,7 @@ sub dump_struct($$) { } } } - $members =~ s/(struct|union)([^\{\};]+)\{([^\{\}]*)\}([^\{\}\;]*)\;/$newmember/; + $members =~ s/$struct_members/$newmember/; } # Ignore other nested elements, like enums @@ -1555,8 +1559,9 @@ sub create_parameterlist($$$$) { my $param; # temporarily replace commas inside function pointer definition - while ($args =~ /(\([^\),]+),/) { - $args =~ s/(\([^\),]+),/$1#/g; + my $arg_expr = qr{\([^\),]+}; + while ($args =~ /$arg_expr,/) { + $args =~ s/($arg_expr),/$1#/g; } foreach my $arg (split($splitter, $args)) { @@ -1707,7 +1712,7 @@ sub check_sections($$$$$) { foreach $px (0 .. $#prms) { $prm_clean = $prms[$px]; $prm_clean =~ s/\[.*\]//; - $prm_clean =~ s/__attribute__\s*\(\([a-z,_\*\s\(\)]*\)\)//i; + $prm_clean =~ s/$attribute//i; # ignore array size in a parameter string; # however, the original param string may contain # spaces, e.g.: addr[6 + 2] @@ -1809,8 +1814,14 @@ sub dump_function($$) { # - parport_register_device (function pointer parameters) # - atomic_set (macro) # - pci_match_device, __copy_to_user (long return type) - - if ($define && $prototype =~ m/^()([a-zA-Z0-9_~:]+)\s+/) { + my $name = qr{[a-zA-Z0-9_~:]+}; + my $prototype_end1 = qr{[^\(]*}; + my $prototype_end2 = qr{[^\{]*}; + my $prototype_end = qr{\(($prototype_end1|$prototype_end2)\)}; + my $type1 = qr{[\w\s]+}; + my $type2 = qr{$type1\*+}; + + if ($define && $prototype =~ m/^()($name)\s+/) { # This is an object-like macro, it has no return type and no parameter # list. # Function-like macros are not allowed to have spaces between @@ -1818,23 +1829,9 @@ sub dump_function($$) { $return_type = $1; $declaration_name = $2; $noret = 1; - } elsif ($prototype =~ m/^()([a-zA-Z0-9_~:]+)\s*\(([^\(]*)\)/ || - $prototype =~ m/^(\w+)\s+([a-zA-Z0-9_~:]+)\s*\(([^\(]*)\)/ || - $prototype =~ m/^(\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*\(([^\(]*)\)/ || - $prototype =~ m/^(\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*\(([^\(]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*\(([^\(]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*\(([^\(]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*\(([^\(]*)\)/ || - $prototype =~ m/^()([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+)\s+([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s+\w+\s+\w+)\s+([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s+\w+\s+\w+\s*\*+)\s*([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/ || - $prototype =~ m/^(\w+\s+\w+\s*\*+\s*\w+\s*\*+\s*)\s*([a-zA-Z0-9_~:]+)\s*\(([^\{]*)\)/) { + } elsif ($prototype =~ m/^()($name)\s*$prototype_end/ || + $prototype =~ m/^($type1)\s+($name)\s*$prototype_end/ || + $prototype =~ m/^($type2+)\s*($name)\s*$prototype_end/) { $return_type = $1; $declaration_name = $2; my $args = $3; @@ -2111,12 +2108,12 @@ sub process_name($$) { } elsif (/$doc_decl/o) { $identifier = $1; my $is_kernel_comment = 0; - my $decl_start = qr{\s*\*}; + my $decl_start = qr{$doc_com}; # test for pointer declaration type, foo * bar() - desc my $fn_type = qr{\w+\s*\*\s*}; my $parenthesis = qr{\(\w*\)}; my $decl_end = qr{[-:].*}; - if (/^$decl_start\s*([\w\s]+?)$parenthesis?\s*$decl_end?$/) { + if (/^$decl_start([\w\s]+?)$parenthesis?\s*$decl_end?$/) { $identifier = $1; } if ($identifier =~ m/^(struct|union|enum|typedef)\b\s*(\S*)/) { @@ -2126,8 +2123,8 @@ sub process_name($$) { } # Look for foo() or static void foo() - description; or misspelt # identifier - elsif (/^$decl_start\s*$fn_type?(\w+)\s*$parenthesis?\s*$decl_end?$/ || - /^$decl_start\s*$fn_type?(\w+.*)$parenthesis?\s*$decl_end$/) { + elsif (/^$decl_start$fn_type?(\w+)\s*$parenthesis?\s*$decl_end?$/ || + /^$decl_start$fn_type?(\w+.*)$parenthesis?\s*$decl_end$/) { $identifier = $1; $decl_type = 'function'; $identifier =~ s/^define\s+//; diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install index fe92020d67e3..288e86a9d1e5 100755 --- a/scripts/sphinx-pre-install +++ b/scripts/sphinx-pre-install @@ -22,16 +22,18 @@ my $need = 0; my $optional = 0; my $need_symlink = 0; my $need_sphinx = 0; -my $need_venv = 0; +my $need_pip = 0; my $need_virtualenv = 0; my $rec_sphinx_upgrade = 0; my $install = ""; my $virtenv_dir = ""; my $python_cmd = ""; +my $activate_cmd; my $min_version; my $cur_version; my $rec_version = "1.7.9"; # PDF won't build here my $min_pdf_version = "2.4.4"; # Min version where pdf builds +my $latest_avail_ver; # # Command line arguments @@ -319,10 +321,7 @@ sub check_sphinx() return; } - if ($cur_version lt $rec_version) { - $rec_sphinx_upgrade = 1; - return; - } + return if ($cur_version lt $rec_version); # On version check mode, just assume Sphinx has all mandatory deps exit (0) if ($version_check); @@ -701,6 +700,162 @@ sub deactivate_help() printf "\tdeactivate\n"; } +sub get_virtenv() +{ + my $ver; + my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate"; + my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate"; + + @activates = sort {$b cmp $a} @activates; + + foreach my $f (@activates) { + next if ($f lt $min_activate); + + my $sphinx_cmd = $f; + $sphinx_cmd =~ s/activate/sphinx-build/; + next if (! -f $sphinx_cmd); + + my $ver = get_sphinx_version($sphinx_cmd); + if ($need_sphinx && ($ver ge $min_version)) { + return ($f, $ver); + } elsif ($ver gt $cur_version) { + return ($f, $ver); + } + } + return ("", ""); +} + +sub recommend_sphinx_upgrade() +{ + my $venv_ver; + + # Avoid running sphinx-builds from venv if $cur_version is good + if ($cur_version && ($cur_version ge $rec_version)) { + $latest_avail_ver = $cur_version; + return; + } + + # Get the highest version from sphinx_*/bin/sphinx-build and the + # corresponding command to activate the venv/virtenv + $activate_cmd = get_virtenv(); + + # Store the highest version from Sphinx existing virtualenvs + if (($activate_cmd ne "") && ($venv_ver gt $cur_version)) { + $latest_avail_ver = $venv_ver; + } else { + $latest_avail_ver = $cur_version if ($cur_version); + } + + # As we don't know package version of Sphinx, and there's no + # virtual environments, don't check if upgrades are needed + if (!$virtualenv) { + return if (!$latest_avail_ver); + } + + # Either there are already a virtual env or a new one should be created + $need_pip = 1; + + # Return if the reason is due to an upgrade or not + if ($latest_avail_ver lt $rec_version) { + $rec_sphinx_upgrade = 1; + } +} + +# +# The logic here is complex, as it have to deal with different versions: +# - minimal supported version; +# - minimal PDF version; +# - recommended version. +# It also needs to work fine with both distro's package and venv/virtualenv +sub recommend_sphinx_version($) +{ + my $virtualenv_cmd = shift; + + if ($latest_avail_ver lt $min_pdf_version) { + print "note: If you want pdf, you need at least Sphinx $min_pdf_version.\n"; + } + + # Version is OK. Nothing to do. + return if ($cur_version && ($cur_version ge $rec_version)); + + if (!$need_sphinx) { + # sphinx-build is present and its version is >= $min_version + + #only recommend enabling a newer virtenv version if makes sense. + if ($latest_avail_ver gt $cur_version) { + printf "\nYou may also use the newer Sphinx version $latest_avail_ver with:\n"; + printf "\tdeactivate\n" if ($ENV{'PWD'} =~ /${virtenv_prefix}/); + printf "\t. $activate_cmd\n"; + deactivate_help(); + + return; + } + return if ($latest_avail_ver ge $rec_version); + } + + if (!$virtualenv) { + # No sphinx either via package or via virtenv. As we can't + # Compare the versions here, just return, recommending the + # user to install it from the package distro. + return if (!$latest_avail_ver); + + # User doesn't want a virtenv recommendation, but he already + # installed one via virtenv with a newer version. + # So, print commands to enable it + if ($latest_avail_ver gt $cur_version) { + printf "\nYou may also use the Sphinx virtualenv version $latest_avail_ver with:\n"; + printf "\tdeactivate\n" if ($ENV{'PWD'} =~ /${virtenv_prefix}/); + printf "\t. $activate_cmd\n"; + deactivate_help(); + + return; + } + print "\n"; + } else { + $need++ if ($need_sphinx); + } + + # Suggest newer versions if current ones are too old + if ($latest_avail_ver && $cur_version ge $min_version) { + # If there's a good enough version, ask the user to enable it + if ($latest_avail_ver ge $rec_version) { + printf "\nNeed to activate Sphinx (version $latest_avail_ver) on virtualenv with:\n"; + printf "\t. $activate_cmd\n"; + deactivate_help(); + + return; + } + + # Version is above the minimal required one, but may be + # below the recommended one. So, print warnings/notes + + if ($latest_avail_ver lt $rec_version) { + print "Warning: It is recommended at least Sphinx version $rec_version.\n"; + } + } + + # At this point, either it needs Sphinx or upgrade is recommended, + # both via pip + + if ($rec_sphinx_upgrade) { + if (!$virtualenv) { + print "Instead of install/upgrade Python Sphinx pkg, you could use pip/pypi with:\n\n"; + } else { + print "To upgrade Sphinx, use:\n\n"; + } + } else { + print "Sphinx needs to be installed either as a package or via pip/pypi with:\n"; + } + + $python_cmd = find_python_no_venv(); + + printf "\t$virtualenv_cmd $virtenv_dir\n"; + + printf "\t. $virtenv_dir/bin/activate\n"; + printf "\tpip install -r $requirement_file\n"; + deactivate_help(); +} + sub check_needs() { # Check if Sphinx is already accessible from current environment @@ -722,15 +877,14 @@ sub check_needs() if ($virtualenv) { my $tmp = qx($python_cmd --version 2>&1); if ($tmp =~ m/(\d+\.)(\d+\.)/) { - if ($1 >= 3 && $2 >= 3) { - $need_venv = 1; # python 3.3 or upper - } else { - $need_virtualenv = 1; - } if ($1 < 3) { # Fail if it finds python2 (or worse) die "Python 3 is required to build the kernel docs\n"; } + if ($1 == 3 && $2 < 3) { + # Need Python 3.3 or upper for venv + $need_virtualenv = 1; + } } else { die "Warning: couldn't identify $python_cmd version!"; } @@ -739,14 +893,22 @@ sub check_needs() } } - # Set virtualenv command line, if python < 3.3 + recommend_sphinx_upgrade(); + my $virtualenv_cmd; - if ($need_virtualenv) { - $virtualenv_cmd = findprog("virtualenv-3"); - $virtualenv_cmd = findprog("virtualenv-3.5") if (!$virtualenv_cmd); - if (!$virtualenv_cmd) { - check_program("virtualenv", 0); - $virtualenv_cmd = "virtualenv"; + + if ($need_pip) { + # Set virtualenv command line, if python < 3.3 + if ($need_virtualenv) { + $virtualenv_cmd = findprog("virtualenv-3"); + $virtualenv_cmd = findprog("virtualenv-3.5") if (!$virtualenv_cmd); + if (!$virtualenv_cmd) { + check_program("virtualenv", 0); + $virtualenv_cmd = "virtualenv"; + } + } else { + $virtualenv_cmd = "$python_cmd -m venv"; + check_python_module("ensurepip", 0); } } @@ -763,10 +925,6 @@ sub check_needs() check_program("rsvg-convert", 2) if ($pdf); check_program("latexmk", 2) if ($pdf); - if ($need_sphinx || $rec_sphinx_upgrade) { - check_python_module("ensurepip", 0) if ($need_venv); - } - # Do distro-specific checks and output distro-install commands check_distros(); @@ -784,67 +942,7 @@ sub check_needs() which("sphinx-build-3"); } - # NOTE: if the system has a too old Sphinx version installed, - # it will recommend installing a newer version using virtualenv - - if ($need_sphinx || $rec_sphinx_upgrade) { - my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate"; - my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate"; - - if ($cur_version lt $rec_version) { - print "Warning: It is recommended at least Sphinx version $rec_version.\n"; - print " If you want pdf, you need at least $min_pdf_version.\n"; - } - if ($cur_version lt $min_pdf_version) { - print "Note: It is recommended at least Sphinx version $min_pdf_version if you need PDF support.\n"; - } - @activates = sort {$b cmp $a} @activates; - my ($activate, $ver); - foreach my $f (@activates) { - next if ($f lt $min_activate); - - my $sphinx_cmd = $f; - $sphinx_cmd =~ s/activate/sphinx-build/; - next if (! -f $sphinx_cmd); - - $ver = get_sphinx_version($sphinx_cmd); - if ($need_sphinx && ($ver ge $min_version)) { - $activate = $f; - last; - } elsif ($ver gt $cur_version) { - $activate = $f; - last; - } - } - if ($activate ne "") { - if ($need_sphinx) { - printf "\nNeed to activate Sphinx (version $ver) on virtualenv with:\n"; - printf "\t. $activate\n"; - deactivate_help(); - exit (1); - } else { - printf "\nYou may also use a newer Sphinx (version $ver) with:\n"; - printf "\tdeactivate && . $activate\n"; - } - } else { - my $rec_activate = "$virtenv_dir/bin/activate"; - - print "To upgrade Sphinx, use:\n\n" if ($rec_sphinx_upgrade); - - $python_cmd = find_python_no_venv(); - - if ($need_venv) { - printf "\t$python_cmd -m venv $virtenv_dir\n"; - } else { - printf "\t$virtualenv_cmd $virtenv_dir\n"; - } - printf "\t. $rec_activate\n"; - printf "\tpip install -r $requirement_file\n"; - deactivate_help(); - - $need++ if (!$rec_sphinx_upgrade); - } - } + recommend_sphinx_version($virtualenv_cmd); printf "\n"; print "All optional dependencies are met.\n" if (!$optional); diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint index 719f18b1edf0..f1af27ce9f20 100755 --- a/tools/debugging/kernel-chktaint +++ b/tools/debugging/kernel-chktaint @@ -196,7 +196,7 @@ else fi echo "For a more detailed explanation of the various taint flags see" -echo " Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel sources" +echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources" echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html" echo "Raw taint value as int/string: $taint/'$out'" #EOF# |