diff options
author | Ingo Molnar <mingo@kernel.org> | 2016-05-04 08:48:26 +0200 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2016-05-04 08:48:26 +0200 |
commit | d63c214b0a39e7e92fbae2a47b66e8145c3c9c81 (patch) | |
tree | ca797379eeb01c96308c8e2abdcea3b55040d489 /Documentation | |
parent | 47a541c3e19374ec9f5d3d96730a922e8480dda5 (diff) | |
parent | 04974df8049fc4240d22759a91e035082ccd18b4 (diff) | |
download | linux-d63c214b0a39e7e92fbae2a47b66e8145c3c9c81.tar.bz2 |
Merge tag 'v4.6-rc6' into x86/platform, to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'Documentation')
24 files changed, 330 insertions, 100 deletions
diff --git a/Documentation/ABI/testing/sysfs-platform-i2c-demux-pinctrl b/Documentation/ABI/testing/sysfs-platform-i2c-demux-pinctrl index 7ac7d7262bb7..3c3514815cd5 100644 --- a/Documentation/ABI/testing/sysfs-platform-i2c-demux-pinctrl +++ b/Documentation/ABI/testing/sysfs-platform-i2c-demux-pinctrl @@ -1,23 +1,18 @@ -What: /sys/devices/platform/<i2c-demux-name>/cur_master +What: /sys/devices/platform/<i2c-demux-name>/available_masters Date: January 2016 KernelVersion: 4.6 Contact: Wolfram Sang <wsa@the-dreams.de> Description: + Reading the file will give you a list of masters which can be + selected for a demultiplexed bus. The format is + "<index>:<name>". Example from a Renesas Lager board: -This file selects the active I2C master for a demultiplexed bus. + 0:/i2c@e6500000 1:/i2c@e6508000 -Write 0 there for the first master, 1 for the second etc. Reading the file will -give you a list with the active master marked. Example from a Renesas Lager -board: - -root@Lager:~# cat /sys/devices/platform/i2c@8/cur_master -* 0 - /i2c@9 - 1 - /i2c@e6520000 - 2 - /i2c@e6530000 - -root@Lager:~# echo 2 > /sys/devices/platform/i2c@8/cur_master - -root@Lager:~# cat /sys/devices/platform/i2c@8/cur_master - 0 - /i2c@9 - 1 - /i2c@e6520000 -* 2 - /i2c@e6530000 +What: /sys/devices/platform/<i2c-demux-name>/current_master +Date: January 2016 +KernelVersion: 4.6 +Contact: Wolfram Sang <wsa@the-dreams.de> +Description: + This file selects/shows the active I2C master for a demultiplexed + bus. It uses the <index> value from the file 'available_masters'. diff --git a/Documentation/devicetree/bindings/arc/archs-pct.txt b/Documentation/devicetree/bindings/arc/archs-pct.txt index 1ae98b87c640..e4b9dcee6d41 100644 --- a/Documentation/devicetree/bindings/arc/archs-pct.txt +++ b/Documentation/devicetree/bindings/arc/archs-pct.txt @@ -2,7 +2,7 @@ The ARC HS can be configured with a pipeline performance monitor for counting CPU and cache events like cache misses and hits. Like conventional PCT there -are 100+ hardware conditions dynamically mapped to upto 32 counters. +are 100+ hardware conditions dynamically mapped to up to 32 counters. It also supports overflow interrupts. Required properties: diff --git a/Documentation/devicetree/bindings/arc/pct.txt b/Documentation/devicetree/bindings/arc/pct.txt index 7b9588444f20..4e874d9a38a6 100644 --- a/Documentation/devicetree/bindings/arc/pct.txt +++ b/Documentation/devicetree/bindings/arc/pct.txt @@ -2,7 +2,7 @@ The ARC700 can be configured with a pipeline performance monitor for counting CPU and cache events like cache misses and hits. Like conventional PCT there -are 100+ hardware conditions dynamically mapped to upto 32 counters +are 100+ hardware conditions dynamically mapped to up to 32 counters Note that: * The ARC 700 PCT does not support interrupts; although HW events may be diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt index ccc62f145306..3f0cbbb8395f 100644 --- a/Documentation/devicetree/bindings/arm/cpus.txt +++ b/Documentation/devicetree/bindings/arm/cpus.txt @@ -192,7 +192,6 @@ nodes to be present and contain the properties described below. can be one of: "allwinner,sun6i-a31" "allwinner,sun8i-a23" - "arm,psci" "arm,realview-smp" "brcm,bcm-nsp-smp" "brcm,brahma-b15" diff --git a/Documentation/devicetree/bindings/clock/qca,ath79-pll.txt b/Documentation/devicetree/bindings/clock/qca,ath79-pll.txt index e0fc2c11dd00..241fb0545b9e 100644 --- a/Documentation/devicetree/bindings/clock/qca,ath79-pll.txt +++ b/Documentation/devicetree/bindings/clock/qca,ath79-pll.txt @@ -3,7 +3,7 @@ Binding for Qualcomm Atheros AR7xxx/AR9XXX PLL controller The PPL controller provides the 3 main clocks of the SoC: CPU, DDR and AHB. Required Properties: -- compatible: has to be "qca,<soctype>-cpu-intc" and one of the following +- compatible: has to be "qca,<soctype>-pll" and one of the following fallbacks: - "qca,ar7100-pll" - "qca,ar7240-pll" @@ -21,8 +21,8 @@ Optional properties: Example: - memory-controller@18050000 { - compatible = "qca,ar9132-ppl", "qca,ar9130-pll"; + pll-controller@18050000 { + compatible = "qca,ar9132-pll", "qca,ar9130-pll"; reg = <0x18050000 0x20>; clock-names = "ref"; diff --git a/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt b/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt index f0d71bc52e64..0b4a85fe2d86 100644 --- a/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt +++ b/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt @@ -6,8 +6,8 @@ RK3xxx SoCs. Required properties : - reg : Offset and length of the register set for the device - - compatible : should be "rockchip,rk3066-i2c", "rockchip,rk3188-i2c" or - "rockchip,rk3288-i2c". + - compatible : should be "rockchip,rk3066-i2c", "rockchip,rk3188-i2c", + "rockchip,rk3228-i2c" or "rockchip,rk3288-i2c". - interrupts : interrupt number - clocks : parent clock diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt index 5ca79290eabf..32eaaca04d9b 100644 --- a/Documentation/devicetree/bindings/net/mediatek-net.txt +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt @@ -9,7 +9,8 @@ have dual GMAC each represented by a child node.. Required properties: - compatible: Should be "mediatek,mt7623-eth" - reg: Address and length of the register set for the device -- interrupts: Should contain the frame engines interrupt +- interrupts: Should contain the three frame engines interrupts in numeric + order. These are fe_int0, fe_int1 and fe_int2. - clocks: the clock used by the core - clock-names: the names of the clock listed in the clocks property. These are "ethif", "esw", "gp2", "gp1" @@ -42,7 +43,9 @@ eth: ethernet@1b100000 { <ðsys CLK_ETHSYS_GP2>, <ðsys CLK_ETHSYS_GP1>; clock-names = "ethif", "esw", "gp2", "gp1"; - interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW>; + interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW + GIC_SPI 199 IRQ_TYPE_LEVEL_LOW + GIC_SPI 198 IRQ_TYPE_LEVEL_LOW>; power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>; resets = <ðsys MT2701_ETHSYS_ETH_RST>; reset-names = "eth"; diff --git a/Documentation/devicetree/bindings/phy/rockchip-dp-phy.txt b/Documentation/devicetree/bindings/phy/rockchip-dp-phy.txt index 50c4f9b00adf..e3b4809fbe82 100644 --- a/Documentation/devicetree/bindings/phy/rockchip-dp-phy.txt +++ b/Documentation/devicetree/bindings/phy/rockchip-dp-phy.txt @@ -8,15 +8,19 @@ Required properties: of memory mapped region. - clock-names: from common clock binding: Required elements: "24m" -- rockchip,grf: phandle to the syscon managing the "general register files" - #phy-cells : from the generic PHY bindings, must be 0; Example: -edp_phy: edp-phy { - compatible = "rockchip,rk3288-dp-phy"; - rockchip,grf = <&grf>; - clocks = <&cru SCLK_EDP_24M>; - clock-names = "24m"; - #phy-cells = <0>; +grf: syscon@ff770000 { + compatible = "rockchip,rk3288-grf", "syscon", "simple-mfd"; + +... + + edp_phy: edp-phy { + compatible = "rockchip,rk3288-dp-phy"; + clocks = <&cru SCLK_EDP_24M>; + clock-names = "24m"; + #phy-cells = <0>; + }; }; diff --git a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt index 61916f15a949..555cb0f40690 100644 --- a/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt +++ b/Documentation/devicetree/bindings/phy/rockchip-emmc-phy.txt @@ -3,17 +3,23 @@ Rockchip EMMC PHY Required properties: - compatible: rockchip,rk3399-emmc-phy - - rockchip,grf : phandle to the syscon managing the "general - register files" - #phy-cells: must be 0 - - reg: PHY configure reg address offset in "general + - reg: PHY register address offset and length in "general register files" Example: -emmcphy: phy { - compatible = "rockchip,rk3399-emmc-phy"; - rockchip,grf = <&grf>; - reg = <0xf780>; - #phy-cells = <0>; + +grf: syscon@ff770000 { + compatible = "rockchip,rk3399-grf", "syscon", "simple-mfd"; + #address-cells = <1>; + #size-cells = <1>; + +... + + emmcphy: phy@f780 { + compatible = "rockchip,rk3399-emmc-phy"; + reg = <0xf780 0x20>; + #phy-cells = <0>; + }; }; diff --git a/Documentation/devicetree/bindings/pinctrl/img,pistachio-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/img,pistachio-pinctrl.txt index 08a4a32c8eb0..0326154c7925 100644 --- a/Documentation/devicetree/bindings/pinctrl/img,pistachio-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/img,pistachio-pinctrl.txt @@ -134,12 +134,12 @@ mfio80 ddr_debug, mips_trace_data, mips_debug mfio81 dreq0, mips_trace_data, eth_debug mfio82 dreq1, mips_trace_data, eth_debug mfio83 mips_pll_lock, mips_trace_data, usb_debug -mfio84 sys_pll_lock, mips_trace_data, usb_debug -mfio85 wifi_pll_lock, mips_trace_data, sdhost_debug -mfio86 bt_pll_lock, mips_trace_data, sdhost_debug -mfio87 rpu_v_pll_lock, dreq2, socif_debug -mfio88 rpu_l_pll_lock, dreq3, socif_debug -mfio89 audio_pll_lock, dreq4, dreq5 +mfio84 audio_pll_lock, mips_trace_data, usb_debug +mfio85 rpu_v_pll_lock, mips_trace_data, sdhost_debug +mfio86 rpu_l_pll_lock, mips_trace_data, sdhost_debug +mfio87 sys_pll_lock, dreq2, socif_debug +mfio88 wifi_pll_lock, dreq3, socif_debug +mfio89 bt_pll_lock, dreq4, dreq5 tck trstn tdi diff --git a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt index 3f6a524cc5ff..32f4a2d6d0b3 100644 --- a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt @@ -1,13 +1,16 @@ == Amlogic Meson pinmux controller == Required properties for the root node: - - compatible: "amlogic,meson8-pinctrl" or "amlogic,meson8b-pinctrl" + - compatible: one of "amlogic,meson8-cbus-pinctrl" + "amlogic,meson8b-cbus-pinctrl" + "amlogic,meson8-aobus-pinctrl" + "amlogic,meson8b-aobus-pinctrl" - reg: address and size of registers controlling irq functionality === GPIO sub-nodes === -The 2 power domains of the controller (regular and always-on) are -represented as sub-nodes and each of them acts as a GPIO controller. +The GPIO bank for the controller is represented as a sub-node and it acts as a +GPIO controller. Required properties for sub-nodes are: - reg: should contain address and size for mux, pull-enable, pull and @@ -18,10 +21,6 @@ Required properties for sub-nodes are: - gpio-controller: identifies the node as a gpio controller - #gpio-cells: must be 2 -Valid sub-node names are: - - "banks" for the regular domain - - "ao-bank" for the always-on domain - === Other sub-nodes === Child nodes without the "gpio-controller" represent some desired @@ -45,7 +44,7 @@ pinctrl-bindings.txt === Example === pinctrl: pinctrl@c1109880 { - compatible = "amlogic,meson8-pinctrl"; + compatible = "amlogic,meson8-cbus-pinctrl"; reg = <0xc1109880 0x10>; #address-cells = <1>; #size-cells = <1>; @@ -61,15 +60,6 @@ pinctrl-bindings.txt #gpio-cells = <2>; }; - gpio_ao: ao-bank@c1108030 { - reg = <0xc8100014 0x4>, - <0xc810002c 0x4>, - <0xc8100024 0x8>; - reg-names = "mux", "pull", "gpio"; - gpio-controller; - #gpio-cells = <2>; - }; - nand { mux { groups = "nand_io", "nand_io_ce0", "nand_io_ce1", @@ -79,18 +69,4 @@ pinctrl-bindings.txt function = "nand"; }; }; - - uart_ao_a { - mux { - groups = "uart_tx_ao_a", "uart_rx_ao_a", - "uart_cts_ao_a", "uart_rts_ao_a"; - function = "uart_ao"; - }; - - conf { - pins = "GPIOAO_0", "GPIOAO_1", - "GPIOAO_2", "GPIOAO_3"; - bias-disable; - }; - }; }; diff --git a/Documentation/devicetree/bindings/rtc/s3c-rtc.txt b/Documentation/devicetree/bindings/rtc/s3c-rtc.txt index 1068ffce9f91..fdde63a5419c 100644 --- a/Documentation/devicetree/bindings/rtc/s3c-rtc.txt +++ b/Documentation/devicetree/bindings/rtc/s3c-rtc.txt @@ -15,9 +15,10 @@ Required properties: is the rtc tick interrupt. The number of cells representing a interrupt depends on the parent interrupt controller. - clocks: Must contain a list of phandle and clock specifier for the rtc - and source clocks. -- clock-names: Must contain "rtc" and "rtc_src" entries sorted in the - same order as the clocks property. + clock and in the case of a s3c6410 compatible controller, also + a source clock. +- clock-names: Must contain "rtc" and for a s3c6410 compatible controller, + a "rtc_src" sorted in the same order as the clocks property. Example: diff --git a/Documentation/filesystems/cramfs.txt b/Documentation/filesystems/cramfs.txt index 31f53f0ab957..4006298f6707 100644 --- a/Documentation/filesystems/cramfs.txt +++ b/Documentation/filesystems/cramfs.txt @@ -38,7 +38,7 @@ the update lasts only as long as the inode is cached in memory, after which the timestamp reverts to 1970, i.e. moves backwards in time. Currently, cramfs must be written and read with architectures of the -same endianness, and can be read only by kernels with PAGE_CACHE_SIZE +same endianness, and can be read only by kernels with PAGE_SIZE == 4096. At least the latter of these is a bug, but it hasn't been decided what the best fix is. For the moment if you have larger pages you can just change the #define in mkcramfs.c, so long as you don't diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index d392e1505f17..d9c11d25bf02 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -60,7 +60,7 @@ size: The limit of allocated bytes for this tmpfs instance. The default is half of your physical RAM without swap. If you oversize your tmpfs instances the machine will deadlock since the OOM handler will not be able to free that memory. -nr_blocks: The same as size, but in blocks of PAGE_CACHE_SIZE. +nr_blocks: The same as size, but in blocks of PAGE_SIZE. nr_inodes: The maximum number of inodes for this instance. The default is half of the number of your physical RAM pages, or (on a machine with highmem) the number of lowmem RAM pages, diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index b02a7d598258..4164bd6397a2 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -708,9 +708,9 @@ struct address_space_operations { from the address space. This generally corresponds to either a truncation, punch hole or a complete invalidation of the address space (in the latter case 'offset' will always be 0 and 'length' - will be PAGE_CACHE_SIZE). Any private data associated with the page + will be PAGE_SIZE). Any private data associated with the page should be updated to reflect this truncation. If offset is 0 and - length is PAGE_CACHE_SIZE, then the private data should be released, + length is PAGE_SIZE, then the private data should be released, because the page must be able to be completely discarded. This may be done by calling the ->releasepage function, but in this case the release MUST succeed. diff --git a/Documentation/input/event-codes.txt b/Documentation/input/event-codes.txt index 3f0f5ce3338b..36ea940e5bb9 100644 --- a/Documentation/input/event-codes.txt +++ b/Documentation/input/event-codes.txt @@ -173,6 +173,10 @@ A few EV_ABS codes have special meanings: proximity of the device and while the value of the BTN_TOUCH code is 0. If the input device may be used freely in three dimensions, consider ABS_Z instead. + - BTN_TOOL_<name> should be set to 1 when the tool comes into detectable + proximity and set to 0 when the tool leaves detectable proximity. + BTN_TOOL_<name> signals the type of tool that is currently detected by the + hardware and is otherwise independent of ABS_DISTANCE and/or BTN_TOUCH. * ABS_MT_<name>: - Used to describe multitouch input events. Please see diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 893a70907f15..8ba7f82f472c 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -4085,6 +4085,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. sector if the number is odd); i = IGNORE_DEVICE (don't bind to this device); + j = NO_REPORT_LUNS (don't use report luns + command, uas only); l = NOT_LOCKABLE (don't try to lock and unlock ejectable media); m = MAX_SECTORS_64 (don't transfer more diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt index fad63136ee3e..2f659129694b 100644 --- a/Documentation/networking/switchdev.txt +++ b/Documentation/networking/switchdev.txt @@ -386,7 +386,7 @@ used. First phase is to "prepare" anything needed, including various checks, memory allocation, etc. The goal is to handle the stuff that is not unlikely to fail here. The second phase is to "commit" the actual changes. -Switchdev provides an inftrastructure for sharing items (for example memory +Switchdev provides an infrastructure for sharing items (for example memory allocations) between the two phases. The object created by a driver in "prepare" phase and it is queued up by: diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 7328cf85236c..1fd1fbe9ce95 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -586,6 +586,10 @@ drivers to make their ->remove() callbacks avoid races with runtime PM directly, but also it allows of more flexibility in the handling of devices during the removal of their drivers. +Drivers in ->remove() callback should undo the runtime PM changes done +in ->probe(). Usually this means calling pm_runtime_disable(), +pm_runtime_dont_use_autosuspend() etc. + The user space can effectively disallow the driver of the device to power manage it at run time by changing the value of its /sys/devices/.../power/control attribute to "on", which causes pm_runtime_forbid() to be called. In principle, diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index cb0368459da3..34a5fece3121 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -581,15 +581,16 @@ Specify "[Nn]ode" for node order "Zone Order" orders the zonelists by zone type, then by node within each zone. Specify "[Zz]one" for zone order. -Specify "[Dd]efault" to request automatic configuration. Autoconfiguration -will select "node" order in following case. -(1) if the DMA zone does not exist or -(2) if the DMA zone comprises greater than 50% of the available memory or -(3) if any node's DMA zone comprises greater than 70% of its local memory and - the amount of local memory is big enough. - -Otherwise, "zone" order will be selected. Default order is recommended unless -this is causing problems for your system/application. +Specify "[Dd]efault" to request automatic configuration. + +On 32-bit, the Normal zone needs to be preserved for allocations accessible +by the kernel, so "zone" order will be selected. + +On 64-bit, devices that require DMA32/DMA are relatively rare, so "node" +order will be selected. + +Default order is recommended unless this is causing problems for your +system/application. ============================================================== diff --git a/Documentation/usb/gadget_multi.txt b/Documentation/usb/gadget_multi.txt index 7d66a8636cb5..5faf514047e9 100644 --- a/Documentation/usb/gadget_multi.txt +++ b/Documentation/usb/gadget_multi.txt @@ -43,7 +43,7 @@ For the gadget two work under Windows two conditions have to be met: First of all, Windows need to detect the gadget as an USB composite gadget which on its own have some conditions[4]. If they are met, Windows lets USB Generic Parent Driver[5] handle the device which then -tries to much drivers for each individual interface (sort of, don't +tries to match drivers for each individual interface (sort of, don't get into too many details). The good news is: you do not have to worry about most of the diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt new file mode 100644 index 000000000000..c281ded1ba16 --- /dev/null +++ b/Documentation/x86/protection-keys.txt @@ -0,0 +1,27 @@ +Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature +which will be found on future Intel CPUs. + +Memory Protection Keys provides a mechanism for enforcing page-based +protections, but without requiring modification of the page tables +when an application changes protection domains. It works by +dedicating 4 previously ignored bits in each page table entry to a +"protection key", giving 16 possible keys. + +There is also a new user-accessible register (PKRU) with two separate +bits (Access Disable and Write Disable) for each key. Being a CPU +register, PKRU is inherently thread-local, potentially giving each +thread a different set of protections from every other thread. + +There are two new instructions (RDPKRU/WRPKRU) for reading and writing +to the new register. The feature is only available in 64-bit mode, +even though there is theoretically space in the PAE PTEs. These +permissions are enforced on data access only and have no effect on +instruction fetches. + +=========================== Config Option =========================== + +This config option adds approximately 1.5kb of text. and 50 bytes of +data to the executable. A workload which does large O_DIRECT reads +of holes in XFS files was run to exercise get_user_pages_fast(). No +performance delta was observed with the config option +enabled or disabled. diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt new file mode 100644 index 000000000000..06afac252f5b --- /dev/null +++ b/Documentation/x86/topology.txt @@ -0,0 +1,208 @@ +x86 Topology +============ + +This documents and clarifies the main aspects of x86 topology modelling and +representation in the kernel. Update/change when doing changes to the +respective code. + +The architecture-agnostic topology definitions are in +Documentation/cputopology.txt. This file holds x86-specific +differences/specialities which must not necessarily apply to the generic +definitions. Thus, the way to read up on Linux topology on x86 is to start +with the generic one and look at this one in parallel for the x86 specifics. + +Needless to say, code should use the generic functions - this file is *only* +here to *document* the inner workings of x86 topology. + +Started by Thomas Gleixner <tglx@linutronix.de> and Borislav Petkov <bp@alien8.de>. + +The main aim of the topology facilities is to present adequate interfaces to +code which needs to know/query/use the structure of the running system wrt +threads, cores, packages, etc. + +The kernel does not care about the concept of physical sockets because a +socket has no relevance to software. It's an electromechanical component. In +the past a socket always contained a single package (see below), but with the +advent of Multi Chip Modules (MCM) a socket can hold more than one package. So +there might be still references to sockets in the code, but they are of +historical nature and should be cleaned up. + +The topology of a system is described in the units of: + + - packages + - cores + - threads + +* Package: + + Packages contain a number of cores plus shared resources, e.g. DRAM + controller, shared caches etc. + + AMD nomenclature for package is 'Node'. + + Package-related topology information in the kernel: + + - cpuinfo_x86.x86_max_cores: + + The number of cores in a package. This information is retrieved via CPUID. + + - cpuinfo_x86.phys_proc_id: + + The physical ID of the package. This information is retrieved via CPUID + and deduced from the APIC IDs of the cores in the package. + + - cpuinfo_x86.logical_id: + + The logical ID of the package. As we do not trust BIOSes to enumerate the + packages in a consistent way, we introduced the concept of logical package + ID so we can sanely calculate the number of maximum possible packages in + the system and have the packages enumerated linearly. + + - topology_max_packages(): + + The maximum possible number of packages in the system. Helpful for per + package facilities to preallocate per package information. + + +* Cores: + + A core consists of 1 or more threads. It does not matter whether the threads + are SMT- or CMT-type threads. + + AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses + "core". + + Core-related topology information in the kernel: + + - smp_num_siblings: + + The number of threads in a core. The number of threads in a package can be + calculated by: + + threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings + + +* Threads: + + A thread is a single scheduling unit. It's the equivalent to a logical Linux + CPU. + + AMDs nomenclature for CMT threads is "Compute Unit Core". The kernel always + uses "thread". + + Thread-related topology information in the kernel: + + - topology_core_cpumask(): + + The cpumask contains all online threads in the package to which a thread + belongs. + + The number of online threads is also printed in /proc/cpuinfo "siblings." + + - topology_sibling_mask(): + + The cpumask contains all online threads in the core to which a thread + belongs. + + - topology_logical_package_id(): + + The logical package ID to which a thread belongs. + + - topology_physical_package_id(): + + The physical package ID to which a thread belongs. + + - topology_core_id(); + + The ID of the core to which a thread belongs. It is also printed in /proc/cpuinfo + "core_id." + + + +System topology examples + +Note: + +The alternative Linux CPU enumeration depends on how the BIOS enumerates the +threads. Many BIOSes enumerate all threads 0 first and then all threads 1. +That has the "advantage" that the logical Linux CPU numbers of threads 0 stay +the same whether threads are enabled or not. That's merely an implementation +detail and has no practical impact. + +1) Single Package, Single Core + + [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 + +2) Single Package, Dual Core + + a) One thread per core + + [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 + -> [core 1] -> [thread 0] -> Linux CPU 1 + + b) Two threads per core + + [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 + -> [thread 1] -> Linux CPU 1 + -> [core 1] -> [thread 0] -> Linux CPU 2 + -> [thread 1] -> Linux CPU 3 + + Alternative enumeration: + + [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 + -> [thread 1] -> Linux CPU 2 + -> [core 1] -> [thread 0] -> Linux CPU 1 + -> [thread 1] -> Linux CPU 3 + + AMD nomenclature for CMT systems: + + [node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0 + -> [Compute Unit Core 1] -> Linux CPU 1 + -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2 + -> [Compute Unit Core 1] -> Linux CPU 3 + +4) Dual Package, Dual Core + + a) One thread per core + + [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 + -> [core 1] -> [thread 0] -> Linux CPU 1 + + [package 1] -> [core 0] -> [thread 0] -> Linux CPU 2 + -> [core 1] -> [thread 0] -> Linux CPU 3 + + b) Two threads per core + + [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 + -> [thread 1] -> Linux CPU 1 + -> [core 1] -> [thread 0] -> Linux CPU 2 + -> [thread 1] -> Linux CPU 3 + + [package 1] -> [core 0] -> [thread 0] -> Linux CPU 4 + -> [thread 1] -> Linux CPU 5 + -> [core 1] -> [thread 0] -> Linux CPU 6 + -> [thread 1] -> Linux CPU 7 + + Alternative enumeration: + + [package 0] -> [core 0] -> [thread 0] -> Linux CPU 0 + -> [thread 1] -> Linux CPU 4 + -> [core 1] -> [thread 0] -> Linux CPU 1 + -> [thread 1] -> Linux CPU 5 + + [package 1] -> [core 0] -> [thread 0] -> Linux CPU 2 + -> [thread 1] -> Linux CPU 6 + -> [core 1] -> [thread 0] -> Linux CPU 3 + -> [thread 1] -> Linux CPU 7 + + AMD nomenclature for CMT systems: + + [node 0] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 0 + -> [Compute Unit Core 1] -> Linux CPU 1 + -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 2 + -> [Compute Unit Core 1] -> Linux CPU 3 + + [node 1] -> [Compute Unit 0] -> [Compute Unit Core 0] -> Linux CPU 4 + -> [Compute Unit Core 1] -> Linux CPU 5 + -> [Compute Unit 1] -> [Compute Unit Core 0] -> Linux CPU 6 + -> [Compute Unit Core 1] -> Linux CPU 7 diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index c518dce7da4d..5aa738346062 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -19,7 +19,7 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ffffffef00000000 - ffffffff00000000 (=64 GB) EFI region mapping space ... unused hole ... ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0 -ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space +ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole @@ -31,8 +31,8 @@ vmalloc space is lazily synchronized into the different PML4 pages of the processes using the page fault handler, with init_level4_pgt as reference. -Current X86-64 implementations only support 40 bits of address space, -but we support up to 46 bits. This expands into MBZ space in the page tables. +Current X86-64 implementations support up to 46 bits of address space (64 TB), +which is our current limit. This expands into MBZ space in the page tables. We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual memory window (this size is arbitrary, it can be raised later if needed). |