From 849b384f92bcadf2bd967f81ceeff815c9cd6af9 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Thu, 27 Jun 2019 12:52:56 -0700 Subject: Documentation: DT: arm: add support for sockets defining package boundaries The current ARM DT topology description provides the operating system with a topological view of the system that is based on leaf nodes representing either cores or threads (in an SMT system) and a hierarchical set of cluster nodes that creates a hierarchical topology view of how those cores and threads are grouped. However this hierarchical representation of clusters does not allow to describe what topology level actually represents the physical package or the socket boundary, which is a key piece of information to be used by an operating system to optimize resource allocation and scheduling. Lets add a new "socket" node type in the cpu-map node to describe the same. Signed-off-by: Sudeep Holla Reviewed-by: Rob Herring Signed-off-by: Paul Walmsley --- Documentation/devicetree/bindings/arm/topology.txt | 172 ++++++++++++--------- 1 file changed, 99 insertions(+), 73 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/arm/topology.txt b/Documentation/devicetree/bindings/arm/topology.txt index b0d80c0fb265..3b8febb46dad 100644 --- a/Documentation/devicetree/bindings/arm/topology.txt +++ b/Documentation/devicetree/bindings/arm/topology.txt @@ -9,6 +9,7 @@ ARM topology binding description In an ARM system, the hierarchy of CPUs is defined through three entities that are used to describe the layout of physical CPUs in the system: +- socket - cluster - core - thread @@ -63,21 +64,23 @@ nodes are listed. The cpu-map node's child nodes can be: - - one or more cluster nodes + - one or more cluster nodes or + - one or more socket nodes in a multi-socket system Any other configuration is considered invalid. -The cpu-map node can only contain three types of child nodes: +The cpu-map node can only contain 4 types of child nodes: +- socket node - cluster node - core node - thread node whose bindings are described in paragraph 3. -The nodes describing the CPU topology (cluster/core/thread) can only -be defined within the cpu-map node and every core/thread in the system -must be defined within the topology. Any other configuration is +The nodes describing the CPU topology (socket/cluster/core/thread) can +only be defined within the cpu-map node and every core/thread in the +system must be defined within the topology. Any other configuration is invalid and therefore must be ignored. =========================================== @@ -85,26 +88,44 @@ invalid and therefore must be ignored. =========================================== cpu-map child nodes must follow a naming convention where the node name -must be "clusterN", "coreN", "threadN" depending on the node type (ie -cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes which -are siblings within a single common parent node must be given a unique and +must be "socketN", "clusterN", "coreN", "threadN" depending on the node type +(ie socket/cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes +which are siblings within a single common parent node must be given a unique and sequential N value, starting from 0). cpu-map child nodes which do not share a common parent node can have the same name (ie same number N as other cpu-map child nodes at different device tree levels) since name uniqueness will be guaranteed by the device tree hierarchy. =========================================== -3 - cluster/core/thread node bindings +3 - socket/cluster/core/thread node bindings =========================================== -Bindings for cluster/cpu/thread nodes are defined as follows: +Bindings for socket/cluster/cpu/thread nodes are defined as follows: + +- socket node + + Description: must be declared within a cpu-map node, one node + per physical socket in the system. A system can + contain single or multiple physical socket. + The association of sockets and NUMA nodes is beyond + the scope of this bindings, please refer [2] for + NUMA bindings. + + This node is optional for a single socket system. + + The socket node name must be "socketN" as described in 2.1 above. + A socket node can not be a leaf node. + + A socket node's child nodes must be one or more cluster nodes. + + Any other configuration is considered invalid. - cluster node Description: must be declared within a cpu-map node, one node per cluster. A system can contain several layers of - clustering and cluster nodes can be contained in parent - cluster nodes. + clustering within a single physical socket and cluster + nodes can be contained in parent cluster nodes. The cluster node name must be "clusterN" as described in 2.1 above. A cluster node can not be a leaf node. @@ -164,90 +185,93 @@ Bindings for cluster/cpu/thread nodes are defined as follows: 4 - Example dts =========================================== -Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters): +Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters in a single +physical socket): cpus { #size-cells = <0>; #address-cells = <2>; cpu-map { - cluster0 { + socket0 { cluster0 { - core0 { - thread0 { - cpu = <&CPU0>; - }; - thread1 { - cpu = <&CPU1>; + cluster0 { + core0 { + thread0 { + cpu = <&CPU0>; + }; + thread1 { + cpu = <&CPU1>; + }; }; - }; - core1 { - thread0 { - cpu = <&CPU2>; - }; - thread1 { - cpu = <&CPU3>; + core1 { + thread0 { + cpu = <&CPU2>; + }; + thread1 { + cpu = <&CPU3>; + }; }; }; - }; - cluster1 { - core0 { - thread0 { - cpu = <&CPU4>; - }; - thread1 { - cpu = <&CPU5>; + cluster1 { + core0 { + thread0 { + cpu = <&CPU4>; + }; + thread1 { + cpu = <&CPU5>; + }; }; - }; - - core1 { - thread0 { - cpu = <&CPU6>; - }; - thread1 { - cpu = <&CPU7>; - }; - }; - }; - }; - cluster1 { - cluster0 { - core0 { - thread0 { - cpu = <&CPU8>; - }; - thread1 { - cpu = <&CPU9>; - }; - }; - core1 { - thread0 { - cpu = <&CPU10>; - }; - thread1 { - cpu = <&CPU11>; + core1 { + thread0 { + cpu = <&CPU6>; + }; + thread1 { + cpu = <&CPU7>; + }; }; }; }; cluster1 { - core0 { - thread0 { - cpu = <&CPU12>; + cluster0 { + core0 { + thread0 { + cpu = <&CPU8>; + }; + thread1 { + cpu = <&CPU9>; + }; }; - thread1 { - cpu = <&CPU13>; + core1 { + thread0 { + cpu = <&CPU10>; + }; + thread1 { + cpu = <&CPU11>; + }; }; }; - core1 { - thread0 { - cpu = <&CPU14>; + + cluster1 { + core0 { + thread0 { + cpu = <&CPU12>; + }; + thread1 { + cpu = <&CPU13>; + }; }; - thread1 { - cpu = <&CPU15>; + core1 { + thread0 { + cpu = <&CPU14>; + }; + thread1 { + cpu = <&CPU15>; + }; }; }; }; @@ -473,3 +497,5 @@ cpus { =============================================================================== [1] ARM Linux kernel documentation Documentation/devicetree/bindings/arm/cpus.yaml +[2] Devicetree NUMA binding description + Documentation/devicetree/bindings/numa.txt -- cgit v1.2.3 From 124e46a86580c71e0eee8459c5da7649318118db Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Thu, 27 Jun 2019 12:52:57 -0700 Subject: dt-binding: cpu-topology: Move cpu-map to a common binding. cpu-map binding can be used to described cpu topology for both RISC-V & ARM. It makes more sense to move the binding to document to a common place. The relevant discussion can be found here. https://lkml.org/lkml/2018/11/6/19 Signed-off-by: Atish Patra Reviewed-by: Sudeep Holla Reviewed-by: Rob Herring Signed-off-by: Paul Walmsley --- Documentation/devicetree/bindings/arm/topology.txt | 501 ------------------- .../devicetree/bindings/cpu/cpu-topology.txt | 553 +++++++++++++++++++++ 2 files changed, 553 insertions(+), 501 deletions(-) delete mode 100644 Documentation/devicetree/bindings/arm/topology.txt create mode 100644 Documentation/devicetree/bindings/cpu/cpu-topology.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/arm/topology.txt b/Documentation/devicetree/bindings/arm/topology.txt deleted file mode 100644 index 3b8febb46dad..000000000000 --- a/Documentation/devicetree/bindings/arm/topology.txt +++ /dev/null @@ -1,501 +0,0 @@ -=========================================== -ARM topology binding description -=========================================== - -=========================================== -1 - Introduction -=========================================== - -In an ARM system, the hierarchy of CPUs is defined through three entities that -are used to describe the layout of physical CPUs in the system: - -- socket -- cluster -- core -- thread - -The cpu nodes (bindings defined in [1]) represent the devices that -correspond to physical CPUs and are to be mapped to the hierarchy levels. - -The bottom hierarchy level sits at core or thread level depending on whether -symmetric multi-threading (SMT) is supported or not. - -For instance in a system where CPUs support SMT, "cpu" nodes represent all -threads existing in the system and map to the hierarchy level "thread" above. -In systems where SMT is not supported "cpu" nodes represent all cores present -in the system and map to the hierarchy level "core" above. - -ARM topology bindings allow one to associate cpu nodes with hierarchical groups -corresponding to the system hierarchy; syntactically they are defined as device -tree nodes. - -The remainder of this document provides the topology bindings for ARM, based -on the Devicetree Specification, available from: - -https://www.devicetree.org/specifications/ - -If not stated otherwise, whenever a reference to a cpu node phandle is made its -value must point to a cpu node compliant with the cpu node bindings as -documented in [1]. -A topology description containing phandles to cpu nodes that are not compliant -with bindings standardized in [1] is therefore considered invalid. - -=========================================== -2 - cpu-map node -=========================================== - -The ARM CPU topology is defined within the cpu-map node, which is a direct -child of the cpus node and provides a container where the actual topology -nodes are listed. - -- cpu-map node - - Usage: Optional - On ARM SMP systems provide CPUs topology to the OS. - ARM uniprocessor systems do not require a topology - description and therefore should not define a - cpu-map node. - - Description: The cpu-map node is just a container node where its - subnodes describe the CPU topology. - - Node name must be "cpu-map". - - The cpu-map node's parent node must be the cpus node. - - The cpu-map node's child nodes can be: - - - one or more cluster nodes or - - one or more socket nodes in a multi-socket system - - Any other configuration is considered invalid. - -The cpu-map node can only contain 4 types of child nodes: - -- socket node -- cluster node -- core node -- thread node - -whose bindings are described in paragraph 3. - -The nodes describing the CPU topology (socket/cluster/core/thread) can -only be defined within the cpu-map node and every core/thread in the -system must be defined within the topology. Any other configuration is -invalid and therefore must be ignored. - -=========================================== -2.1 - cpu-map child nodes naming convention -=========================================== - -cpu-map child nodes must follow a naming convention where the node name -must be "socketN", "clusterN", "coreN", "threadN" depending on the node type -(ie socket/cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes -which are siblings within a single common parent node must be given a unique and -sequential N value, starting from 0). -cpu-map child nodes which do not share a common parent node can have the same -name (ie same number N as other cpu-map child nodes at different device tree -levels) since name uniqueness will be guaranteed by the device tree hierarchy. - -=========================================== -3 - socket/cluster/core/thread node bindings -=========================================== - -Bindings for socket/cluster/cpu/thread nodes are defined as follows: - -- socket node - - Description: must be declared within a cpu-map node, one node - per physical socket in the system. A system can - contain single or multiple physical socket. - The association of sockets and NUMA nodes is beyond - the scope of this bindings, please refer [2] for - NUMA bindings. - - This node is optional for a single socket system. - - The socket node name must be "socketN" as described in 2.1 above. - A socket node can not be a leaf node. - - A socket node's child nodes must be one or more cluster nodes. - - Any other configuration is considered invalid. - -- cluster node - - Description: must be declared within a cpu-map node, one node - per cluster. A system can contain several layers of - clustering within a single physical socket and cluster - nodes can be contained in parent cluster nodes. - - The cluster node name must be "clusterN" as described in 2.1 above. - A cluster node can not be a leaf node. - - A cluster node's child nodes must be: - - - one or more cluster nodes; or - - one or more core nodes - - Any other configuration is considered invalid. - -- core node - - Description: must be declared in a cluster node, one node per core in - the cluster. If the system does not support SMT, core - nodes are leaf nodes, otherwise they become containers of - thread nodes. - - The core node name must be "coreN" as described in 2.1 above. - - A core node must be a leaf node if SMT is not supported. - - Properties for core nodes that are leaf nodes: - - - cpu - Usage: required - Value type: - Definition: a phandle to the cpu node that corresponds to the - core node. - - If a core node is not a leaf node (CPUs supporting SMT) a core node's - child nodes can be: - - - one or more thread nodes - - Any other configuration is considered invalid. - -- thread node - - Description: must be declared in a core node, one node per thread - in the core if the system supports SMT. Thread nodes are - always leaf nodes in the device tree. - - The thread node name must be "threadN" as described in 2.1 above. - - A thread node must be a leaf node. - - A thread node must contain the following property: - - - cpu - Usage: required - Value type: - Definition: a phandle to the cpu node that corresponds to - the thread node. - -=========================================== -4 - Example dts -=========================================== - -Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters in a single -physical socket): - -cpus { - #size-cells = <0>; - #address-cells = <2>; - - cpu-map { - socket0 { - cluster0 { - cluster0 { - core0 { - thread0 { - cpu = <&CPU0>; - }; - thread1 { - cpu = <&CPU1>; - }; - }; - - core1 { - thread0 { - cpu = <&CPU2>; - }; - thread1 { - cpu = <&CPU3>; - }; - }; - }; - - cluster1 { - core0 { - thread0 { - cpu = <&CPU4>; - }; - thread1 { - cpu = <&CPU5>; - }; - }; - - core1 { - thread0 { - cpu = <&CPU6>; - }; - thread1 { - cpu = <&CPU7>; - }; - }; - }; - }; - - cluster1 { - cluster0 { - core0 { - thread0 { - cpu = <&CPU8>; - }; - thread1 { - cpu = <&CPU9>; - }; - }; - core1 { - thread0 { - cpu = <&CPU10>; - }; - thread1 { - cpu = <&CPU11>; - }; - }; - }; - - cluster1 { - core0 { - thread0 { - cpu = <&CPU12>; - }; - thread1 { - cpu = <&CPU13>; - }; - }; - core1 { - thread0 { - cpu = <&CPU14>; - }; - thread1 { - cpu = <&CPU15>; - }; - }; - }; - }; - }; - }; - - CPU0: cpu@0 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x0>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU1: cpu@1 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x1>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU2: cpu@100 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x100>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU3: cpu@101 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x101>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU4: cpu@10000 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x10000>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU5: cpu@10001 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x10001>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU6: cpu@10100 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x10100>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU7: cpu@10101 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x0 0x10101>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU8: cpu@100000000 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x0>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU9: cpu@100000001 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x1>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU10: cpu@100000100 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x100>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU11: cpu@100000101 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x101>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU12: cpu@100010000 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x10000>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU13: cpu@100010001 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x10001>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU14: cpu@100010100 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x10100>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; - - CPU15: cpu@100010101 { - device_type = "cpu"; - compatible = "arm,cortex-a57"; - reg = <0x1 0x10101>; - enable-method = "spin-table"; - cpu-release-addr = <0 0x20000000>; - }; -}; - -Example 2 (ARM 32-bit, dual-cluster, 8-cpu system, no SMT): - -cpus { - #size-cells = <0>; - #address-cells = <1>; - - cpu-map { - cluster0 { - core0 { - cpu = <&CPU0>; - }; - core1 { - cpu = <&CPU1>; - }; - core2 { - cpu = <&CPU2>; - }; - core3 { - cpu = <&CPU3>; - }; - }; - - cluster1 { - core0 { - cpu = <&CPU4>; - }; - core1 { - cpu = <&CPU5>; - }; - core2 { - cpu = <&CPU6>; - }; - core3 { - cpu = <&CPU7>; - }; - }; - }; - - CPU0: cpu@0 { - device_type = "cpu"; - compatible = "arm,cortex-a15"; - reg = <0x0>; - }; - - CPU1: cpu@1 { - device_type = "cpu"; - compatible = "arm,cortex-a15"; - reg = <0x1>; - }; - - CPU2: cpu@2 { - device_type = "cpu"; - compatible = "arm,cortex-a15"; - reg = <0x2>; - }; - - CPU3: cpu@3 { - device_type = "cpu"; - compatible = "arm,cortex-a15"; - reg = <0x3>; - }; - - CPU4: cpu@100 { - device_type = "cpu"; - compatible = "arm,cortex-a7"; - reg = <0x100>; - }; - - CPU5: cpu@101 { - device_type = "cpu"; - compatible = "arm,cortex-a7"; - reg = <0x101>; - }; - - CPU6: cpu@102 { - device_type = "cpu"; - compatible = "arm,cortex-a7"; - reg = <0x102>; - }; - - CPU7: cpu@103 { - device_type = "cpu"; - compatible = "arm,cortex-a7"; - reg = <0x103>; - }; -}; - -=============================================================================== -[1] ARM Linux kernel documentation - Documentation/devicetree/bindings/arm/cpus.yaml -[2] Devicetree NUMA binding description - Documentation/devicetree/bindings/numa.txt diff --git a/Documentation/devicetree/bindings/cpu/cpu-topology.txt b/Documentation/devicetree/bindings/cpu/cpu-topology.txt new file mode 100644 index 000000000000..99918189403c --- /dev/null +++ b/Documentation/devicetree/bindings/cpu/cpu-topology.txt @@ -0,0 +1,553 @@ +=========================================== +CPU topology binding description +=========================================== + +=========================================== +1 - Introduction +=========================================== + +In a SMP system, the hierarchy of CPUs is defined through three entities that +are used to describe the layout of physical CPUs in the system: + +- socket +- cluster +- core +- thread + +The bottom hierarchy level sits at core or thread level depending on whether +symmetric multi-threading (SMT) is supported or not. + +For instance in a system where CPUs support SMT, "cpu" nodes represent all +threads existing in the system and map to the hierarchy level "thread" above. +In systems where SMT is not supported "cpu" nodes represent all cores present +in the system and map to the hierarchy level "core" above. + +CPU topology bindings allow one to associate cpu nodes with hierarchical groups +corresponding to the system hierarchy; syntactically they are defined as device +tree nodes. + +Currently, only ARM/RISC-V intend to use this cpu topology binding but it may be +used for any other architecture as well. + +The cpu nodes, as per bindings defined in [4], represent the devices that +correspond to physical CPUs and are to be mapped to the hierarchy levels. + +A topology description containing phandles to cpu nodes that are not compliant +with bindings standardized in [4] is therefore considered invalid. + +=========================================== +2 - cpu-map node +=========================================== + +The ARM/RISC-V CPU topology is defined within the cpu-map node, which is a direct +child of the cpus node and provides a container where the actual topology +nodes are listed. + +- cpu-map node + + Usage: Optional - On SMP systems provide CPUs topology to the OS. + Uniprocessor systems do not require a topology + description and therefore should not define a + cpu-map node. + + Description: The cpu-map node is just a container node where its + subnodes describe the CPU topology. + + Node name must be "cpu-map". + + The cpu-map node's parent node must be the cpus node. + + The cpu-map node's child nodes can be: + + - one or more cluster nodes or + - one or more socket nodes in a multi-socket system + + Any other configuration is considered invalid. + +The cpu-map node can only contain 4 types of child nodes: + +- socket node +- cluster node +- core node +- thread node + +whose bindings are described in paragraph 3. + +The nodes describing the CPU topology (socket/cluster/core/thread) can +only be defined within the cpu-map node and every core/thread in the +system must be defined within the topology. Any other configuration is +invalid and therefore must be ignored. + +=========================================== +2.1 - cpu-map child nodes naming convention +=========================================== + +cpu-map child nodes must follow a naming convention where the node name +must be "socketN", "clusterN", "coreN", "threadN" depending on the node type +(ie socket/cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes +which are siblings within a single common parent node must be given a unique and +sequential N value, starting from 0). +cpu-map child nodes which do not share a common parent node can have the same +name (ie same number N as other cpu-map child nodes at different device tree +levels) since name uniqueness will be guaranteed by the device tree hierarchy. + +=========================================== +3 - socket/cluster/core/thread node bindings +=========================================== + +Bindings for socket/cluster/cpu/thread nodes are defined as follows: + +- socket node + + Description: must be declared within a cpu-map node, one node + per physical socket in the system. A system can + contain single or multiple physical socket. + The association of sockets and NUMA nodes is beyond + the scope of this bindings, please refer [2] for + NUMA bindings. + + This node is optional for a single socket system. + + The socket node name must be "socketN" as described in 2.1 above. + A socket node can not be a leaf node. + + A socket node's child nodes must be one or more cluster nodes. + + Any other configuration is considered invalid. + +- cluster node + + Description: must be declared within a cpu-map node, one node + per cluster. A system can contain several layers of + clustering within a single physical socket and cluster + nodes can be contained in parent cluster nodes. + + The cluster node name must be "clusterN" as described in 2.1 above. + A cluster node can not be a leaf node. + + A cluster node's child nodes must be: + + - one or more cluster nodes; or + - one or more core nodes + + Any other configuration is considered invalid. + +- core node + + Description: must be declared in a cluster node, one node per core in + the cluster. If the system does not support SMT, core + nodes are leaf nodes, otherwise they become containers of + thread nodes. + + The core node name must be "coreN" as described in 2.1 above. + + A core node must be a leaf node if SMT is not supported. + + Properties for core nodes that are leaf nodes: + + - cpu + Usage: required + Value type: + Definition: a phandle to the cpu node that corresponds to the + core node. + + If a core node is not a leaf node (CPUs supporting SMT) a core node's + child nodes can be: + + - one or more thread nodes + + Any other configuration is considered invalid. + +- thread node + + Description: must be declared in a core node, one node per thread + in the core if the system supports SMT. Thread nodes are + always leaf nodes in the device tree. + + The thread node name must be "threadN" as described in 2.1 above. + + A thread node must be a leaf node. + + A thread node must contain the following property: + + - cpu + Usage: required + Value type: + Definition: a phandle to the cpu node that corresponds to + the thread node. + +=========================================== +4 - Example dts +=========================================== + +Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters in a single +physical socket): + +cpus { + #size-cells = <0>; + #address-cells = <2>; + + cpu-map { + socket0 { + cluster0 { + cluster0 { + core0 { + thread0 { + cpu = <&CPU0>; + }; + thread1 { + cpu = <&CPU1>; + }; + }; + + core1 { + thread0 { + cpu = <&CPU2>; + }; + thread1 { + cpu = <&CPU3>; + }; + }; + }; + + cluster1 { + core0 { + thread0 { + cpu = <&CPU4>; + }; + thread1 { + cpu = <&CPU5>; + }; + }; + + core1 { + thread0 { + cpu = <&CPU6>; + }; + thread1 { + cpu = <&CPU7>; + }; + }; + }; + }; + + cluster1 { + cluster0 { + core0 { + thread0 { + cpu = <&CPU8>; + }; + thread1 { + cpu = <&CPU9>; + }; + }; + core1 { + thread0 { + cpu = <&CPU10>; + }; + thread1 { + cpu = <&CPU11>; + }; + }; + }; + + cluster1 { + core0 { + thread0 { + cpu = <&CPU12>; + }; + thread1 { + cpu = <&CPU13>; + }; + }; + core1 { + thread0 { + cpu = <&CPU14>; + }; + thread1 { + cpu = <&CPU15>; + }; + }; + }; + }; + }; + }; + + CPU0: cpu@0 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x0>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU1: cpu@1 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x1>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU2: cpu@100 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x100>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU3: cpu@101 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x101>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU4: cpu@10000 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x10000>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU5: cpu@10001 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x10001>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU6: cpu@10100 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x10100>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU7: cpu@10101 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x0 0x10101>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU8: cpu@100000000 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x0>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU9: cpu@100000001 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x1>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU10: cpu@100000100 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x100>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU11: cpu@100000101 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x101>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU12: cpu@100010000 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x10000>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU13: cpu@100010001 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x10001>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU14: cpu@100010100 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x10100>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; + + CPU15: cpu@100010101 { + device_type = "cpu"; + compatible = "arm,cortex-a57"; + reg = <0x1 0x10101>; + enable-method = "spin-table"; + cpu-release-addr = <0 0x20000000>; + }; +}; + +Example 2 (ARM 32-bit, dual-cluster, 8-cpu system, no SMT): + +cpus { + #size-cells = <0>; + #address-cells = <1>; + + cpu-map { + cluster0 { + core0 { + cpu = <&CPU0>; + }; + core1 { + cpu = <&CPU1>; + }; + core2 { + cpu = <&CPU2>; + }; + core3 { + cpu = <&CPU3>; + }; + }; + + cluster1 { + core0 { + cpu = <&CPU4>; + }; + core1 { + cpu = <&CPU5>; + }; + core2 { + cpu = <&CPU6>; + }; + core3 { + cpu = <&CPU7>; + }; + }; + }; + + CPU0: cpu@0 { + device_type = "cpu"; + compatible = "arm,cortex-a15"; + reg = <0x0>; + }; + + CPU1: cpu@1 { + device_type = "cpu"; + compatible = "arm,cortex-a15"; + reg = <0x1>; + }; + + CPU2: cpu@2 { + device_type = "cpu"; + compatible = "arm,cortex-a15"; + reg = <0x2>; + }; + + CPU3: cpu@3 { + device_type = "cpu"; + compatible = "arm,cortex-a15"; + reg = <0x3>; + }; + + CPU4: cpu@100 { + device_type = "cpu"; + compatible = "arm,cortex-a7"; + reg = <0x100>; + }; + + CPU5: cpu@101 { + device_type = "cpu"; + compatible = "arm,cortex-a7"; + reg = <0x101>; + }; + + CPU6: cpu@102 { + device_type = "cpu"; + compatible = "arm,cortex-a7"; + reg = <0x102>; + }; + + CPU7: cpu@103 { + device_type = "cpu"; + compatible = "arm,cortex-a7"; + reg = <0x103>; + }; +}; + +Example 3: HiFive Unleashed (RISC-V 64 bit, 4 core system) + +{ + #address-cells = <2>; + #size-cells = <2>; + compatible = "sifive,fu540g", "sifive,fu500"; + model = "sifive,hifive-unleashed-a00"; + + ... + cpus { + #address-cells = <1>; + #size-cells = <0>; + cpu-map { + socket0 { + cluster0 { + core0 { + cpu = <&CPU1>; + }; + core1 { + cpu = <&CPU2>; + }; + core2 { + cpu0 = <&CPU2>; + }; + core3 { + cpu0 = <&CPU3>; + }; + }; + }; + }; + + CPU1: cpu@1 { + device_type = "cpu"; + compatible = "sifive,rocket0", "riscv"; + reg = <0x1>; + } + + CPU2: cpu@2 { + device_type = "cpu"; + compatible = "sifive,rocket0", "riscv"; + reg = <0x2>; + } + CPU3: cpu@3 { + device_type = "cpu"; + compatible = "sifive,rocket0", "riscv"; + reg = <0x3>; + } + CPU4: cpu@4 { + device_type = "cpu"; + compatible = "sifive,rocket0", "riscv"; + reg = <0x4>; + } + } +}; +=============================================================================== +[1] ARM Linux kernel documentation + Documentation/devicetree/bindings/arm/cpus.yaml +[2] Devicetree NUMA binding description + Documentation/devicetree/bindings/numa.txt +[3] RISC-V Linux kernel documentation + Documentation/devicetree/bindings/riscv/cpus.txt +[4] https://www.devicetree.org/specifications/ -- cgit v1.2.3 From 6bd1d0be0e97936d15cdacc71f5c232fbf71293e Mon Sep 17 00:00:00 2001 From: Steve Capper Date: Wed, 7 Aug 2019 16:55:15 +0100 Subject: arm64: kasan: Switch to using KASAN_SHADOW_OFFSET KASAN_SHADOW_OFFSET is a constant that is supplied to gcc as a command line argument and affects the codegen of the inline address sanetiser. Essentially, for an example memory access: *ptr1 = val; The compiler will insert logic similar to the below: shadowValue = *(ptr1 >> KASAN_SHADOW_SCALE_SHIFT + KASAN_SHADOW_OFFSET) if (somethingWrong(shadowValue)) flagAnError(); This code sequence is inserted into many places, thus KASAN_SHADOW_OFFSET is essentially baked into many places in the kernel text. If we want to run a single kernel binary with multiple address spaces, then we need to do this with KASAN_SHADOW_OFFSET fixed. Thankfully, due to the way the KASAN_SHADOW_OFFSET is used to provide shadow addresses we know that the end of the shadow region is constant w.r.t. VA space size: KASAN_SHADOW_END = ~0 >> KASAN_SHADOW_SCALE_SHIFT + KASAN_SHADOW_OFFSET This means that if we increase the size of the VA space, the start of the KASAN region expands into lower addresses whilst the end of the KASAN region is fixed. Currently the arm64 code computes KASAN_SHADOW_OFFSET at build time via build scripts with the VA size used as a parameter. (There are build time checks in the C code too to ensure that expected values are being derived). It is sufficient, and indeed is a simplification, to remove the build scripts (and build time checks) entirely and instead provide KASAN_SHADOW_OFFSET values. This patch removes the logic to compute the KASAN_SHADOW_OFFSET in the arm64 Makefile, and instead we adopt the approach used by x86 to supply offset values in kConfig. To help debug/develop future VA space changes, the Makefile logic has been preserved in a script file in the arm64 Documentation folder. Reviewed-by: Catalin Marinas Signed-off-by: Steve Capper Signed-off-by: Will Deacon --- Documentation/arm64/kasan-offsets.sh | 27 +++++++++++++++++++++++++++ arch/arm64/Kconfig | 15 +++++++++++++++ arch/arm64/Makefile | 8 -------- arch/arm64/include/asm/kasan.h | 11 ++++------- arch/arm64/include/asm/memory.h | 8 +++++--- 5 files changed, 51 insertions(+), 18 deletions(-) create mode 100644 Documentation/arm64/kasan-offsets.sh (limited to 'Documentation') diff --git a/Documentation/arm64/kasan-offsets.sh b/Documentation/arm64/kasan-offsets.sh new file mode 100644 index 000000000000..2b7a021db363 --- /dev/null +++ b/Documentation/arm64/kasan-offsets.sh @@ -0,0 +1,27 @@ +#!/bin/sh + +# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW +# start address at the mid-point of the kernel VA space + +print_kasan_offset () { + printf "%02d\t" $1 + printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \ + + (1 << ($1 - 32 - $2)) \ + - (1 << (64 - 32 - $2)) )) +} + +echo KASAN_SHADOW_SCALE_SHIFT = 3 +printf "VABITS\tKASAN_SHADOW_OFFSET\n" +print_kasan_offset 48 3 +print_kasan_offset 47 3 +print_kasan_offset 42 3 +print_kasan_offset 39 3 +print_kasan_offset 36 3 +echo +echo KASAN_SHADOW_SCALE_SHIFT = 4 +printf "VABITS\tKASAN_SHADOW_OFFSET\n" +print_kasan_offset 48 4 +print_kasan_offset 47 4 +print_kasan_offset 42 4 +print_kasan_offset 39 4 +print_kasan_offset 36 4 diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 3adcec05b1f6..f7f23e47c28f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -297,6 +297,21 @@ config ARCH_SUPPORTS_UPROBES config ARCH_PROC_KCORE_TEXT def_bool y +config KASAN_SHADOW_OFFSET + hex + depends on KASAN + default 0xdfffa00000000000 if (ARM64_VA_BITS_48 || ARM64_USER_VA_BITS_52) && !KASAN_SW_TAGS + default 0xdfffd00000000000 if ARM64_VA_BITS_47 && !KASAN_SW_TAGS + default 0xdffffe8000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS + default 0xdfffffd000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS + default 0xdffffffa00000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS + default 0xefff900000000000 if (ARM64_VA_BITS_48 || ARM64_USER_VA_BITS_52) && KASAN_SW_TAGS + default 0xefffc80000000000 if ARM64_VA_BITS_47 && KASAN_SW_TAGS + default 0xeffffe4000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS + default 0xefffffc800000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS + default 0xeffffff900000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS + default 0xffffffffffffffff + source "arch/arm64/Kconfig.platforms" menu "Kernel Features" diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 61f7926fdeca..a8d2a241ac58 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -126,14 +126,6 @@ KBUILD_CFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT) KBUILD_CPPFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT) KBUILD_AFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT) -# KASAN_SHADOW_OFFSET = VA_START + (1 << (VA_BITS - KASAN_SHADOW_SCALE_SHIFT)) -# - (1 << (64 - KASAN_SHADOW_SCALE_SHIFT)) -# in 32-bit arithmetic -KASAN_SHADOW_OFFSET := $(shell printf "0x%08x00000000\n" $$(( \ - (0xffffffff & (-1 << ($(CONFIG_ARM64_VA_BITS) - 1 - 32))) \ - + (1 << ($(CONFIG_ARM64_VA_BITS) - 32 - $(KASAN_SHADOW_SCALE_SHIFT))) \ - - (1 << (64 - 32 - $(KASAN_SHADOW_SCALE_SHIFT))) )) ) - export TEXT_OFFSET GZFLAGS core-y += arch/arm64/kernel/ arch/arm64/mm/ diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h index b52aacd2c526..10d2add842da 100644 --- a/arch/arm64/include/asm/kasan.h +++ b/arch/arm64/include/asm/kasan.h @@ -18,11 +18,8 @@ * KASAN_SHADOW_START: beginning of the kernel virtual addresses. * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/N of kernel virtual addresses, * where N = (1 << KASAN_SHADOW_SCALE_SHIFT). - */ -#define KASAN_SHADOW_START (VA_START) -#define KASAN_SHADOW_END (KASAN_SHADOW_START + KASAN_SHADOW_SIZE) - -/* + * + * KASAN_SHADOW_OFFSET: * This value is used to map an address to the corresponding shadow * address by the following formula: * shadow_addr = (address >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET @@ -33,8 +30,8 @@ * KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - * (1ULL << (64 - KASAN_SHADOW_SCALE_SHIFT)) */ -#define KASAN_SHADOW_OFFSET (KASAN_SHADOW_END - (1ULL << \ - (64 - KASAN_SHADOW_SCALE_SHIFT))) +#define _KASAN_SHADOW_START(va) (KASAN_SHADOW_END - (1UL << ((va) - KASAN_SHADOW_SCALE_SHIFT))) +#define KASAN_SHADOW_START _KASAN_SHADOW_START(VA_BITS) void kasan_init(void); void kasan_copy_shadow(pgd_t *pgdir); diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h index 380594b1a0ba..968659c90f5c 100644 --- a/arch/arm64/include/asm/memory.h +++ b/arch/arm64/include/asm/memory.h @@ -42,7 +42,7 @@ #define PAGE_OFFSET (UL(0xffffffffffffffff) - \ (UL(1) << VA_BITS) + 1) #define KIMAGE_VADDR (MODULES_END) -#define BPF_JIT_REGION_START (VA_START + KASAN_SHADOW_SIZE) +#define BPF_JIT_REGION_START (KASAN_SHADOW_END) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE) #define MODULES_END (MODULES_VADDR + MODULES_VSIZE) @@ -68,11 +68,13 @@ * significantly, so double the (minimum) stack size when they are in use. */ #ifdef CONFIG_KASAN -#define KASAN_SHADOW_SIZE (UL(1) << (VA_BITS - KASAN_SHADOW_SCALE_SHIFT)) +#define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL) +#define KASAN_SHADOW_END ((UL(1) << (64 - KASAN_SHADOW_SCALE_SHIFT)) \ + + KASAN_SHADOW_OFFSET) #define KASAN_THREAD_SHIFT 1 #else -#define KASAN_SHADOW_SIZE (0) #define KASAN_THREAD_SHIFT 0 +#define KASAN_SHADOW_END (VA_START) #endif #define MIN_THREAD_SHIFT (14 + KASAN_THREAD_SHIFT) -- cgit v1.2.3 From d2c68de192cfb90f607a80c6b10c41ebd8a3de6a Mon Sep 17 00:00:00 2001 From: Steve Capper Date: Wed, 7 Aug 2019 16:55:24 +0100 Subject: docs: arm64: Add layout and 52-bit info to memory document As the kernel no longer prints out the memory layout on boot, this patch adds this information back to the memory document. Also, as the 52-bit support introduces some subtle changes to the arm64 memory, the rationale behind these changes are also added to the memory document. Signed-off-by: Steve Capper Reviewed-by: Catalin Marinas Signed-off-by: Will Deacon --- Documentation/arm64/memory.rst | 123 +++++++++++++++++++++++++++++++---------- 1 file changed, 95 insertions(+), 28 deletions(-) (limited to 'Documentation') diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst index 464b880fc4b7..b040909e45f8 100644 --- a/Documentation/arm64/memory.rst +++ b/Documentation/arm64/memory.rst @@ -14,6 +14,10 @@ with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit 64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) virtual address, are used but the memory layout is the same. +ARMv8.2 adds optional support for Large Virtual Address space. This is +only available when running with a 64KB page size and expands the +number of descriptors in the first level of translation. + User addresses have bits 63:48 set to 0 while the kernel addresses have the same bits set to 1. TTBRx selection is given by bit 63 of the virtual address. The swapper_pg_dir contains only kernel (global) @@ -22,40 +26,43 @@ The swapper_pg_dir address is written to TTBR1 and never written to TTBR0. -AArch64 Linux memory layout with 4KB pages + 3 levels:: - - Start End Size Use - ----------------------------------------------------------------------- - 0000000000000000 0000007fffffffff 512GB user - ffffff8000000000 ffffffffffffffff 512GB kernel - - -AArch64 Linux memory layout with 4KB pages + 4 levels:: +AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: Start End Size Use ----------------------------------------------------------------------- 0000000000000000 0000ffffffffffff 256TB user - ffff000000000000 ffffffffffffffff 256TB kernel - - -AArch64 Linux memory layout with 64KB pages + 2 levels:: + ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map + ffff800000000000 ffff9fffffffffff 32TB kasan shadow region + ffffa00000000000 ffffa00007ffffff 128MB bpf jit region + ffffa00008000000 ffffa0000fffffff 128MB modules + ffffa00010000000 fffffdffbffeffff ~93TB vmalloc + fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region] + fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings + fffffdfffea00000 fffffdfffebfffff 2MB [guard region] + fffffdfffec00000 fffffdffffbfffff 16MB PCI I/O space + fffffdffffc00000 fffffdffffdfffff 2MB [guard region] + fffffdffffe00000 ffffffffffdfffff 2TB vmemmap + ffffffffffe00000 ffffffffffffffff 2MB [guard region] + + +AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: Start End Size Use ----------------------------------------------------------------------- - 0000000000000000 000003ffffffffff 4TB user - fffffc0000000000 ffffffffffffffff 4TB kernel - - -AArch64 Linux memory layout with 64KB pages + 3 levels:: - - Start End Size Use - ----------------------------------------------------------------------- - 0000000000000000 0000ffffffffffff 256TB user - ffff000000000000 ffffffffffffffff 256TB kernel - - -For details of the virtual kernel memory layout please see the kernel -booting log. + 0000000000000000 000fffffffffffff 4PB user + fff0000000000000 fff7ffffffffffff 2PB kernel logical memory map + fff8000000000000 fffd9fffffffffff 1440TB [gap] + fffda00000000000 ffff9fffffffffff 512TB kasan shadow region + ffffa00000000000 ffffa00007ffffff 128MB bpf jit region + ffffa00008000000 ffffa0000fffffff 128MB modules + ffffa00010000000 fffff81ffffeffff ~88TB vmalloc + fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region] + fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings + fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region] + fffffc1ffec00000 fffffc1fffbfffff 16MB PCI I/O space + fffffc1fffc00000 fffffc1fffdfffff 2MB [guard region] + fffffc1fffe00000 ffffffffffdfffff 3968GB vmemmap + ffffffffffe00000 ffffffffffffffff 2MB [guard region] Translation table lookup with 4KB pages:: @@ -83,7 +90,8 @@ Translation table lookup with 64KB pages:: | | | | [15:0] in-page offset | | | +----------> [28:16] L3 index | | +--------------------------> [41:29] L2 index - | +-------------------------------> [47:42] L1 index + | +-------------------------------> [47:42] L1 index (48-bit) + | [51:42] L1 index (52-bit) +-------------------------------------------------> [63] TTBR0/1 @@ -96,3 +104,62 @@ ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs. When using KVM with the Virtualization Host Extensions, no additional mappings are created, since the host kernel runs directly in EL2. + +52-bit VA support in the kernel +------------------------------- +If the ARMv8.2-LVA optional feature is present, and we are running +with a 64KB page size; then it is possible to use 52-bits of address +space for both userspace and kernel addresses. However, any kernel +binary that supports 52-bit must also be able to fall back to 48-bit +at early boot time if the hardware feature is not present. + +This fallback mechanism necessitates the kernel .text to be in the +higher addresses such that they are invariant to 48/52-bit VAs. Due +to the kasan shadow being a fraction of the entire kernel VA space, +the end of the kasan shadow must also be in the higher half of the +kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit, +the end of the kasan shadow is invariant and dependent on ~0UL, +whilst the start address will "grow" towards the lower addresses). + +In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET +is kept constant at 0xFFF0000000000000 (corresponding to 52-bit), +this obviates the need for an extra variable read. The physvirt +offset and vmemmap offsets are computed at early boot to enable +this logic. + +As a single binary will need to support both 48-bit and 52-bit VA +spaces, the VMEMMAP must be sized large enough for 52-bit VAs and +also must be sized large enought to accommodate a fixed PAGE_OFFSET. + +Most code in the kernel should not need to consider the VA_BITS, for +code that does need to know the VA size the variables are +defined as follows: + +VA_BITS constant the *maximum* VA space size + +VA_BITS_MIN constant the *minimum* VA space size + +vabits_actual variable the *actual* VA space size + + +Maximum and minimum sizes can be useful to ensure that buffers are +sized large enough or that addresses are positioned close enough for +the "worst" case. + +52-bit userspace VAs +-------------------- +To maintain compatibility with software that relies on the ARMv8.0 +VA space maximum size of 48-bits, the kernel will, by default, +return virtual addresses to userspace from a 48-bit range. + +Software can "opt-in" to receiving VAs from a 52-bit space by +specifying an mmap hint parameter that is larger than 48-bit. +For example: + maybe_high_address = mmap(~0UL, size, prot, flags,...); + +It is also possible to build a debug kernel that returns addresses +from a 52-bit space by enabling the following kernel config options: + CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y + +Note that this option is only intended for debugging applications +and should not be used in production. -- cgit v1.2.3 From e1b832503e8f29ea6e20c30db9c3176576c0fc78 Mon Sep 17 00:00:00 2001 From: Vincenzo Frascino Date: Wed, 21 Aug 2019 17:47:29 +0100 Subject: arm64: Define Documentation/arm64/tagged-address-abi.rst On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace (EL0) to perform memory accesses through 64-bit pointers with a non-zero top byte. Introduce the document describing the relaxation of the syscall ABI that allows userspace to pass certain tagged pointers to kernel syscalls. Cc: Will Deacon Cc: Szabolcs Nagy Acked-by: Kevin Brodsky Acked-by: Andrey Konovalov Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas Signed-off-by: Will Deacon --- Documentation/arm64/tagged-address-abi.rst | 156 +++++++++++++++++++++++++++++ 1 file changed, 156 insertions(+) create mode 100644 Documentation/arm64/tagged-address-abi.rst (limited to 'Documentation') diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst new file mode 100644 index 000000000000..d4a85d535bf9 --- /dev/null +++ b/Documentation/arm64/tagged-address-abi.rst @@ -0,0 +1,156 @@ +========================== +AArch64 TAGGED ADDRESS ABI +========================== + +Authors: Vincenzo Frascino + Catalin Marinas + +Date: 21 August 2019 + +This document describes the usage and semantics of the Tagged Address +ABI on AArch64 Linux. + +1. Introduction +--------------- + +On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing +userspace (EL0) to perform memory accesses through 64-bit pointers with +a non-zero top byte. This document describes the relaxation of the +syscall ABI that allows userspace to pass certain tagged pointers to +kernel syscalls. + +2. AArch64 Tagged Address ABI +----------------------------- + +From the kernel syscall interface perspective and for the purposes of +this document, a "valid tagged pointer" is a pointer with a potentially +non-zero top-byte that references an address in the user process address +space obtained in one of the following ways: + +- ``mmap()`` syscall where either: + + - flags have the ``MAP_ANONYMOUS`` bit set or + - the file descriptor refers to a regular file (including those + returned by ``memfd_create()``) or ``/dev/zero`` + +- ``brk()`` syscall (i.e. the heap area between the initial location of + the program break at process creation and its current location). + +- any memory mapped by the kernel in the address space of the process + during creation and with the same restrictions as for ``mmap()`` above + (e.g. data, bss, stack). + +The AArch64 Tagged Address ABI has two stages of relaxation depending +how the user addresses are used by the kernel: + +1. User addresses not accessed by the kernel but used for address space + management (e.g. ``mmap()``, ``mprotect()``, ``madvise()``). The use + of valid tagged pointers in this context is always allowed. + +2. User addresses accessed by the kernel (e.g. ``write()``). This ABI + relaxation is disabled by default and the application thread needs to + explicitly enable it via ``prctl()`` as follows: + + - ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged + Address ABI for the calling thread. + + The ``(unsigned int) arg2`` argument is a bit mask describing the + control mode used: + + - ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI. + Default status is disabled. + + Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0. + + - ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged + Address ABI for the calling thread. + + Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0. + + The ABI properties described above are thread-scoped, inherited on + clone() and fork() and cleared on exec(). + + Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)`` + returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally + disabled by ``sysctl abi.tagged_addr_disabled=1``. The default + ``sysctl abi.tagged_addr_disabled`` configuration is 0. + +When the AArch64 Tagged Address ABI is enabled for a thread, the +following behaviours are guaranteed: + +- All syscalls except the cases mentioned in section 3 can accept any + valid tagged pointer. + +- The syscall behaviour is undefined for invalid tagged pointers: it may + result in an error code being returned, a (fatal) signal being raised, + or other modes of failure. + +- The syscall behaviour for a valid tagged pointer is the same as for + the corresponding untagged pointer. + + +A definition of the meaning of tagged pointers on AArch64 can be found +in Documentation/arm64/tagged-pointers.rst. + +3. AArch64 Tagged Address ABI Exceptions +----------------------------------------- + +The following system call parameters must be untagged regardless of the +ABI relaxation: + +- ``prctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``ioctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``shmat()`` and ``shmdt()``. + +Any attempt to use non-zero tagged pointers may result in an error code +being returned, a (fatal) signal being raised, or other modes of +failure. + +4. Example of correct usage +--------------------------- +.. code-block:: c + + #include + #include + #include + #include + #include + + #define PR_SET_TAGGED_ADDR_CTRL 55 + #define PR_TAGGED_ADDR_ENABLE (1UL << 0) + + #define TAG_SHIFT 56 + + int main(void) + { + int tbi_enabled = 0; + unsigned long tag = 0; + char *ptr; + + /* check/enable the tagged address ABI */ + if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)) + tbi_enabled = 1; + + /* memory allocation */ + ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (ptr == MAP_FAILED) + return 1; + + /* set a non-zero tag if the ABI is available */ + if (tbi_enabled) + tag = rand() & 0xff; + ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT)); + + /* memory access to a tagged address */ + strcpy(ptr, "tagged pointer\n"); + + /* syscall with a tagged pointer */ + write(1, ptr, strlen(ptr)); + + return 0; + } -- cgit v1.2.3 From 1243cb6a676ffcfa72dd25859edddf66cde0b638 Mon Sep 17 00:00:00 2001 From: Vincenzo Frascino Date: Thu, 22 Aug 2019 15:17:43 +0100 Subject: arm64: Add tagged-address-abi.rst to index.rst Documentation/arm64/tagged-address-abi.rst introduces the relaxation of the syscall ABI that allows userspace to pass certain tagged pointers to kernel syscalls. Add the document to index.rst for a correct generation of the table of content. Cc: Will Deacon Cc: Catalin Marinas Signed-off-by: Vincenzo Frascino Signed-off-by: Will Deacon --- Documentation/arm64/index.rst | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst index 96b696ba4e6c..5c0c69dc58aa 100644 --- a/Documentation/arm64/index.rst +++ b/Documentation/arm64/index.rst @@ -16,6 +16,7 @@ ARM64 Architecture pointer-authentication silicon-errata sve + tagged-address-abi tagged-pointers .. only:: subproject and html -- cgit v1.2.3 From 92af2b696119e491a95d77acdd8832b582d300d4 Mon Sep 17 00:00:00 2001 From: Vincenzo Frascino Date: Fri, 23 Aug 2019 17:37:17 +0100 Subject: arm64: Relax Documentation/arm64/tagged-pointers.rst On AArch64 the TCR_EL1.TBI0 bit is set by default, allowing userspace (EL0) to perform memory accesses through 64-bit pointers with a non-zero top byte. However, such pointers were not allowed at the user-kernel syscall ABI boundary. With the Tagged Address ABI patchset, it is now possible to pass tagged pointers to the syscalls. Relax the requirements described in tagged-pointers.rst to be compliant with the behaviours guaranteed by the AArch64 Tagged Address ABI. Cc: Will Deacon Cc: Szabolcs Nagy Cc: Kevin Brodsky Acked-by: Andrey Konovalov Signed-off-by: Vincenzo Frascino Co-developed-by: Catalin Marinas Signed-off-by: Catalin Marinas Signed-off-by: Will Deacon --- Documentation/arm64/tagged-pointers.rst | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index 2acdec3ebbeb..eab4323609b9 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -20,7 +20,9 @@ Passing tagged addresses to the kernel -------------------------------------- All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00. +an address tag of 0x00, unless the application enables the AArch64 +Tagged Address ABI explicitly +(Documentation/arm64/tagged-address-abi.rst). This includes, but is not limited to, addresses found in: @@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in: - the frame pointer (x29) and frame records, e.g. when interpreting them to generate a backtrace or call graph. -Using non-zero address tags in any of these locations may result in an -error code being returned, a (fatal) signal being raised, or other modes -of failure. +Using non-zero address tags in any of these locations when the +userspace application did not enable the AArch64 Tagged Address ABI may +result in an error code being returned, a (fatal) signal being raised, +or other modes of failure. -For these reasons, passing non-zero address tags to the kernel via -system calls is forbidden, and using a non-zero address tag for sp is -strongly discouraged. +For these reasons, when the AArch64 Tagged Address ABI is disabled, +passing non-zero address tags to the kernel via system calls is +forbidden, and using a non-zero address tag for sp is strongly +discouraged. Programs maintaining a frame pointer and frame records that use non-zero address tags may suffer impaired or inaccurate debug and profiling @@ -59,6 +63,9 @@ be preserved. The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. +This behaviour is maintained when the AArch64 Tagged Address ABI is +enabled. + Other considerations -------------------- -- cgit v1.2.3 From 3724e186fead350d1446d5202cd92fa6250bffda Mon Sep 17 00:00:00 2001 From: Joakim Zhang Date: Wed, 28 Aug 2019 12:07:56 +0000 Subject: docs/perf: Add documentation for the i.MX8 DDR PMU Add some documentation describing the DDR PMU residing in the Freescale i.MDX SoC and its perf driver implementation in Linux. Signed-off-by: Joakim Zhang Signed-off-by: Will Deacon --- Documentation/admin-guide/perf/imx-ddr.rst | 52 ++++++++++++++++++++++++++++++ MAINTAINERS | 1 + 2 files changed, 53 insertions(+) create mode 100644 Documentation/admin-guide/perf/imx-ddr.rst (limited to 'Documentation') diff --git a/Documentation/admin-guide/perf/imx-ddr.rst b/Documentation/admin-guide/perf/imx-ddr.rst new file mode 100644 index 000000000000..517a205abad6 --- /dev/null +++ b/Documentation/admin-guide/perf/imx-ddr.rst @@ -0,0 +1,52 @@ +===================================================== +Freescale i.MX8 DDR Performance Monitoring Unit (PMU) +===================================================== + +There are no performance counters inside the DRAM controller, so performance +signals are brought out to the edge of the controller where a set of 4 x 32 bit +counters is implemented. This is controlled by the CSV modes programed in counter +control register which causes a large number of PERF signals to be generated. + +Selection of the value for each counter is done via the config registers. There +is one register for each counter. Counter 0 is special in that it always counts +“time” and when expired causes a lock on itself and the other counters and an +interrupt is raised. If any other counter overflows, it continues counting, and +no interrupt is raised. + +The "format" directory describes format of the config (event ID) and config1 +(AXI filtering) fields of the perf_event_attr structure, see /sys/bus/event_source/ +devices/imx8_ddr0/format/. The "events" directory describes the events types +hardware supported that can be used with perf tool, see /sys/bus/event_source/ +devices/imx8_ddr0/events/. + e.g.:: + perf stat -a -e imx8_ddr0/cycles/ cmd + perf stat -a -e imx8_ddr0/read/,imx8_ddr0/write/ cmd + +AXI filtering is only used by CSV modes 0x41 (axid-read) and 0x42 (axid-write) +to count reading or writing matches filter setting. Filter setting is various +from different DRAM controller implementations, which is distinguished by quirks +in the driver. + +* With DDR_CAP_AXI_ID_FILTER quirk. + Filter is defined with two configuration parts: + --AXI_ID defines AxID matching value. + --AXI_MASKING defines which bits of AxID are meaningful for the matching. + 0:corresponding bit is masked. + 1: corresponding bit is not masked, i.e. used to do the matching. + + AXI_ID and AXI_MASKING are mapped on DPCR1 register in performance counter. + When non-masked bits are matching corresponding AXI_ID bits then counter is + incremented. Perf counter is incremented if + AxID && AXI_MASKING == AXI_ID && AXI_MASKING + + This filter doesn't support filter different AXI ID for axid-read and axid-write + event at the same time as this filter is shared between counters. + e.g.:: + perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd + perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd + + NOTE: axi_mask is inverted in userspace(i.e. set bits are bits to mask), and + it will be reverted in driver automatically. so that the user can just specify + axi_id to monitor a specific id, rather than having to specify axi_mask. + e.g.:: + perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12 diff --git a/MAINTAINERS b/MAINTAINERS index 783569e3c4b4..1d442f9e3276 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6434,6 +6434,7 @@ M: Frank Li L: linux-arm-kernel@lists.infradead.org S: Maintained F: drivers/perf/fsl_imx8_ddr_perf.c +F: Documentation/admin-guide/perf/imx-ddr.rst F: Documentation/devicetree/bindings/perf/fsl-imx-ddr.txt FREESCALE IMX LPI2C DRIVER -- cgit v1.2.3