1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
|
==========================================
ARM CPUs capacity bindings
==========================================
==========================================
1 - Introduction
==========================================
ARM systems may be configured to have cpus with different power/performance
characteristics within the same chip. In this case, additional information has
to be made available to the kernel for it to be aware of such differences and
take decisions accordingly.
==========================================
2 - CPU capacity definition
==========================================
CPU capacity is a number that provides the scheduler information about CPUs
heterogeneity. Such heterogeneity can come from micro-architectural differences
(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
(e.g., SMP systems with multiple frequency domains). Heterogeneity in this
context is about differing performance characteristics; this binding tries to
capture a first-order approximation of the relative performance of CPUs.
CPU capacities are obtained by running a suitable benchmark. This binding makes
no guarantees on the validity or suitability of any particular benchmark, the
final capacity should, however, be:
* A "single-threaded" or CPU affine benchmark
* Divided by the running frequency of the CPU executing the benchmark
* Not subject to dynamic frequency scaling of the CPU
For the time being we however advise usage of the Dhrystone benchmark. What
above thus becomes:
CPU capacities are obtained by running the Dhrystone benchmark on each CPU at
max frequency (with caches enabled). The obtained DMIPS score is then divided
by the frequency (in MHz) at which the benchmark has been run, so that
DMIPS/MHz are obtained. Such values are then normalized w.r.t. the highest
score obtained in the system.
==========================================
3 - capacity-dmips-mhz
==========================================
capacity-dmips-mhz is an optional cpu node [1] property: u32 value
representing CPU capacity expressed in normalized DMIPS/MHz. At boot time, the
maximum frequency available to the cpu is then used to calculate the capacity
value internally used by the kernel.
capacity-dmips-mhz property is all-or-nothing: if it is specified for a cpu
node, it has to be specified for every other cpu nodes, or the system will
fall back to the default capacity value for every CPU. If cpufreq is not
available, final capacities are calculated by directly using capacity-dmips-
mhz values (normalized w.r.t. the highest value found while parsing the DT).
===========================================
4 - Examples
===========================================
Example 1 (ARM 64-bit, 6-cpu system, two clusters):
capacities-dmips-mhz are scaled w.r.t. 1024 (cpu@0 and cpu@1)
supposing cluster0@max-freq=1100 and custer1@max-freq=850,
final capacities are 1024 for cluster0 and 446 for cluster1
cpus {
#address-cells = <2>;
#size-cells = <0>;
cpu-map {
cluster0 {
core0 {
cpu = <&A57_0>;
};
core1 {
cpu = <&A57_1>;
};
};
cluster1 {
core0 {
cpu = <&A53_0>;
};
core1 {
cpu = <&A53_1>;
};
core2 {
cpu = <&A53_2>;
};
core3 {
cpu = <&A53_3>;
};
};
};
idle-states {
entry-method = "arm,psci";
CPU_SLEEP_0: cpu-sleep-0 {
compatible = "arm,idle-state";
arm,psci-suspend-param = <0x0010000>;
local-timer-stop;
entry-latency-us = <100>;
exit-latency-us = <250>;
min-residency-us = <150>;
};
CLUSTER_SLEEP_0: cluster-sleep-0 {
compatible = "arm,idle-state";
arm,psci-suspend-param = <0x1010000>;
local-timer-stop;
entry-latency-us = <800>;
exit-latency-us = <700>;
min-residency-us = <2500>;
};
};
A57_0: cpu@0 {
compatible = "arm,cortex-a57","arm,armv8";
reg = <0x0 0x0>;
device_type = "cpu";
enable-method = "psci";
next-level-cache = <&A57_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <1024>;
};
A57_1: cpu@1 {
compatible = "arm,cortex-a57","arm,armv8";
reg = <0x0 0x1>;
device_type = "cpu";
enable-method = "psci";
next-level-cache = <&A57_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <1024>;
};
A53_0: cpu@100 {
compatible = "arm,cortex-a53","arm,armv8";
reg = <0x0 0x100>;
device_type = "cpu";
enable-method = "psci";
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <578>;
};
A53_1: cpu@101 {
compatible = "arm,cortex-a53","arm,armv8";
reg = <0x0 0x101>;
device_type = "cpu";
enable-method = "psci";
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <578>;
};
A53_2: cpu@102 {
compatible = "arm,cortex-a53","arm,armv8";
reg = <0x0 0x102>;
device_type = "cpu";
enable-method = "psci";
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <578>;
};
A53_3: cpu@103 {
compatible = "arm,cortex-a53","arm,armv8";
reg = <0x0 0x103>;
device_type = "cpu";
enable-method = "psci";
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <578>;
};
A57_L2: l2-cache0 {
compatible = "cache";
};
A53_L2: l2-cache1 {
compatible = "cache";
};
};
Example 2 (ARM 32-bit, 4-cpu system, two clusters,
cpus 0,1@1GHz, cpus 2,3@500MHz):
capacities-dmips-mhz are scaled w.r.t. 2 (cpu@0 and cpu@1), this means that first
cpu@0 and cpu@1 are twice fast than cpu@2 and cpu@3 (at the same frequency)
cpus {
#address-cells = <1>;
#size-cells = <0>;
cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a15";
reg = <0>;
capacity-dmips-mhz = <2>;
};
cpu1: cpu@1 {
device_type = "cpu";
compatible = "arm,cortex-a15";
reg = <1>;
capacity-dmips-mhz = <2>;
};
cpu2: cpu@2 {
device_type = "cpu";
compatible = "arm,cortex-a15";
reg = <0x100>;
capacity-dmips-mhz = <1>;
};
cpu3: cpu@3 {
device_type = "cpu";
compatible = "arm,cortex-a15";
reg = <0x101>;
capacity-dmips-mhz = <1>;
};
};
===========================================
5 - References
===========================================
[1] ARM Linux Kernel documentation - CPUs bindings
Documentation/devicetree/bindings/arm/cpus.txt
|