diff options
author | Mauro Carvalho Chehab <mchehab+samsung@kernel.org> | 2019-07-26 09:51:12 -0300 |
---|---|---|
committer | Jonathan Corbet <corbet@lwn.net> | 2019-07-31 13:25:15 -0600 |
commit | eaf7b46083a7e341a23ab3d6042e0ccc115b0914 (patch) | |
tree | 86decb170d9376fca00dc22dcccd499465cd4aa7 /Documentation/thermal | |
parent | fe13225fdc3f7b79e2921869a13386f48b30bf79 (diff) | |
download | linux-eaf7b46083a7e341a23ab3d6042e0ccc115b0914.tar.bz2 |
docs: thermal: add it to the driver API
The file contents mostly describes driver internals.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/thermal')
-rw-r--r-- | Documentation/thermal/cpu-cooling-api.rst | 107 | ||||
-rw-r--r-- | Documentation/thermal/exynos_thermal.rst | 90 | ||||
-rw-r--r-- | Documentation/thermal/exynos_thermal_emulation.rst | 61 | ||||
-rw-r--r-- | Documentation/thermal/index.rst | 18 | ||||
-rw-r--r-- | Documentation/thermal/intel_powerclamp.rst | 320 | ||||
-rw-r--r-- | Documentation/thermal/nouveau_thermal.rst | 96 | ||||
-rw-r--r-- | Documentation/thermal/power_allocator.rst | 271 | ||||
-rw-r--r-- | Documentation/thermal/sysfs-api.rst | 798 | ||||
-rw-r--r-- | Documentation/thermal/x86_pkg_temperature_thermal.rst | 55 |
9 files changed, 0 insertions, 1816 deletions
diff --git a/Documentation/thermal/cpu-cooling-api.rst b/Documentation/thermal/cpu-cooling-api.rst deleted file mode 100644 index 645d914c45a6..000000000000 --- a/Documentation/thermal/cpu-cooling-api.rst +++ /dev/null @@ -1,107 +0,0 @@ -======================= -CPU cooling APIs How To -======================= - -Written by Amit Daniel Kachhap <amit.kachhap@linaro.org> - -Updated: 6 Jan 2015 - -Copyright (c) 2012 Samsung Electronics Co., Ltd(http://www.samsung.com) - -0. Introduction -=============== - -The generic cpu cooling(freq clipping) provides registration/unregistration APIs -to the caller. The binding of the cooling devices to the trip point is left for -the user. The registration APIs returns the cooling device pointer. - -1. cpu cooling APIs -=================== - -1.1 cpufreq registration/unregistration APIs --------------------------------------------- - - :: - - struct thermal_cooling_device - *cpufreq_cooling_register(struct cpumask *clip_cpus) - - This interface function registers the cpufreq cooling device with the name - "thermal-cpufreq-%x". This api can support multiple instances of cpufreq - cooling devices. - - clip_cpus: - cpumask of cpus where the frequency constraints will happen. - - :: - - struct thermal_cooling_device - *of_cpufreq_cooling_register(struct cpufreq_policy *policy) - - This interface function registers the cpufreq cooling device with - the name "thermal-cpufreq-%x" linking it with a device tree node, in - order to bind it via the thermal DT code. This api can support multiple - instances of cpufreq cooling devices. - - policy: - CPUFreq policy. - - - :: - - void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) - - This interface function unregisters the "thermal-cpufreq-%x" cooling device. - - cdev: Cooling device pointer which has to be unregistered. - -2. Power models -=============== - -The power API registration functions provide a simple power model for -CPUs. The current power is calculated as dynamic power (static power isn't -supported currently). This power model requires that the operating-points of -the CPUs are registered using the kernel's opp library and the -`cpufreq_frequency_table` is assigned to the `struct device` of the -cpu. If you are using CONFIG_CPUFREQ_DT then the -`cpufreq_frequency_table` should already be assigned to the cpu -device. - -The dynamic power consumption of a processor depends on many factors. -For a given processor implementation the primary factors are: - -- The time the processor spends running, consuming dynamic power, as - compared to the time in idle states where dynamic consumption is - negligible. Herein we refer to this as 'utilisation'. -- The voltage and frequency levels as a result of DVFS. The DVFS - level is a dominant factor governing power consumption. -- In running time the 'execution' behaviour (instruction types, memory - access patterns and so forth) causes, in most cases, a second order - variation. In pathological cases this variation can be significant, - but typically it is of a much lesser impact than the factors above. - -A high level dynamic power consumption model may then be represented as:: - - Pdyn = f(run) * Voltage^2 * Frequency * Utilisation - -f(run) here represents the described execution behaviour and its -result has a units of Watts/Hz/Volt^2 (this often expressed in -mW/MHz/uVolt^2) - -The detailed behaviour for f(run) could be modelled on-line. However, -in practice, such an on-line model has dependencies on a number of -implementation specific processor support and characterisation -factors. Therefore, in initial implementation that contribution is -represented as a constant coefficient. This is a simplification -consistent with the relative contribution to overall power variation. - -In this simplified representation our model becomes:: - - Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation - -Where `capacitance` is a constant that represents an indicative -running time dynamic power coefficient in fundamental units of -mW/MHz/uVolt^2. Typical values for mobile CPUs might lie in range -from 100 to 500. For reference, the approximate values for the SoC in -ARM's Juno Development Platform are 530 for the Cortex-A57 cluster and -140 for the Cortex-A53 cluster. diff --git a/Documentation/thermal/exynos_thermal.rst b/Documentation/thermal/exynos_thermal.rst deleted file mode 100644 index 5bd556566c70..000000000000 --- a/Documentation/thermal/exynos_thermal.rst +++ /dev/null @@ -1,90 +0,0 @@ -======================== -Kernel driver exynos_tmu -======================== - -Supported chips: - -* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC - - Datasheet: Not publicly available - -Authors: Donggeun Kim <dg77.kim@samsung.com> -Authors: Amit Daniel <amit.daniel@samsung.com> - -TMU controller Description: ---------------------------- - -This driver allows to read temperature inside SAMSUNG EXYNOS4/5 series of SoC. - -The chip only exposes the measured 8-bit temperature code value -through a register. -Temperature can be taken from the temperature code. -There are three equations converting from temperature to temperature code. - -The three equations are: - 1. Two point trimming:: - - Tc = (T - 25) * (TI2 - TI1) / (85 - 25) + TI1 - - 2. One point trimming:: - - Tc = T + TI1 - 25 - - 3. No trimming:: - - Tc = T + 50 - - Tc: - Temperature code, T: Temperature, - TI1: - Trimming info for 25 degree Celsius (stored at TRIMINFO register) - Temperature code measured at 25 degree Celsius which is unchanged - TI2: - Trimming info for 85 degree Celsius (stored at TRIMINFO register) - Temperature code measured at 85 degree Celsius which is unchanged - -TMU(Thermal Management Unit) in EXYNOS4/5 generates interrupt -when temperature exceeds pre-defined levels. -The maximum number of configurable threshold is five. -The threshold levels are defined as follows:: - - Level_0: current temperature > trigger_level_0 + threshold - Level_1: current temperature > trigger_level_1 + threshold - Level_2: current temperature > trigger_level_2 + threshold - Level_3: current temperature > trigger_level_3 + threshold - -The threshold and each trigger_level are set -through the corresponding registers. - -When an interrupt occurs, this driver notify kernel thermal framework -with the function exynos_report_trigger. -Although an interrupt condition for level_0 can be set, -it can be used to synchronize the cooling action. - -TMU driver description: ------------------------ - -The exynos thermal driver is structured as:: - - Kernel Core thermal framework - (thermal_core.c, step_wise.c, cpu_cooling.c) - ^ - | - | - TMU configuration data -----> TMU Driver <----> Exynos Core thermal wrapper - (exynos_tmu_data.c) (exynos_tmu.c) (exynos_thermal_common.c) - (exynos_tmu_data.h) (exynos_tmu.h) (exynos_thermal_common.h) - -a) TMU configuration data: - This consist of TMU register offsets/bitfields - described through structure exynos_tmu_registers. Also several - other platform data (struct exynos_tmu_platform_data) members - are used to configure the TMU. -b) TMU driver: - This component initialises the TMU controller and sets different - thresholds. It invokes core thermal implementation with the call - exynos_report_trigger. -c) Exynos Core thermal wrapper: - This provides 3 wrapper function to use the - Kernel core thermal framework. They are exynos_unregister_thermal, - exynos_register_thermal and exynos_report_trigger. diff --git a/Documentation/thermal/exynos_thermal_emulation.rst b/Documentation/thermal/exynos_thermal_emulation.rst deleted file mode 100644 index c21d10838bc5..000000000000 --- a/Documentation/thermal/exynos_thermal_emulation.rst +++ /dev/null @@ -1,61 +0,0 @@ -===================== -Exynos Emulation Mode -===================== - -Copyright (C) 2012 Samsung Electronics - -Written by Jonghwa Lee <jonghwa3.lee@samsung.com> - -Description ------------ - -Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal -management unit. Thermal emulation mode supports software debug for -TMU's operation. User can set temperature manually with software code -and TMU will read current temperature from user value not from sensor's -value. - -Enabling CONFIG_THERMAL_EMULATION option will make this support -available. When it's enabled, sysfs node will be created as -/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp. - -The sysfs node, 'emul_node', will contain value 0 for the initial state. -When you input any temperature you want to update to sysfs node, it -automatically enable emulation mode and current temperature will be -changed into it. - -(Exynos also supports user changeable delay time which would be used to -delay of changing temperature. However, this node only uses same delay -of real sensing time, 938us.) - -Exynos emulation mode requires synchronous of value changing and -enabling. It means when you want to update the any value of delay or -next temperature, then you have to enable emulation mode at the same -time. (Or you have to keep the mode enabling.) If you don't, it fails to -change the value to updated one and just use last succeessful value -repeatedly. That's why this node gives users the right to change -termerpature only. Just one interface makes it more simply to use. - -Disabling emulation mode only requires writing value 0 to sysfs node. - -:: - - - TEMP 120 | - | - 100 | - | - 80 | - | +----------- - 60 | | | - | +-------------| | - 40 | | | | - | | | | - 20 | | | +---------- - | | | | | - 0 |______________|_____________|__________|__________|_________ - A A A A TIME - |<----->| |<----->| |<----->| | - | 938us | | | | | | - emulation : 0 50 | 70 | 20 | 0 - current temp: sensor 50 70 20 sensor diff --git a/Documentation/thermal/index.rst b/Documentation/thermal/index.rst deleted file mode 100644 index 8c1c00146cad..000000000000 --- a/Documentation/thermal/index.rst +++ /dev/null @@ -1,18 +0,0 @@ -:orphan: - -======= -Thermal -======= - -.. toctree:: - :maxdepth: 1 - - cpu-cooling-api - sysfs-api - power_allocator - - exynos_thermal - exynos_thermal_emulation - intel_powerclamp - nouveau_thermal - x86_pkg_temperature_thermal diff --git a/Documentation/thermal/intel_powerclamp.rst b/Documentation/thermal/intel_powerclamp.rst deleted file mode 100644 index 3f6dfb0b3ea6..000000000000 --- a/Documentation/thermal/intel_powerclamp.rst +++ /dev/null @@ -1,320 +0,0 @@ -======================= -Intel Powerclamp Driver -======================= - -By: - - Arjan van de Ven <arjan@linux.intel.com> - - Jacob Pan <jacob.jun.pan@linux.intel.com> - -.. Contents: - - (*) Introduction - - Goals and Objectives - - (*) Theory of Operation - - Idle Injection - - Calibration - - (*) Performance Analysis - - Effectiveness and Limitations - - Power vs Performance - - Scalability - - Calibration - - Comparison with Alternative Techniques - - (*) Usage and Interfaces - - Generic Thermal Layer (sysfs) - - Kernel APIs (TBD) - -INTRODUCTION -============ - -Consider the situation where a system’s power consumption must be -reduced at runtime, due to power budget, thermal constraint, or noise -level, and where active cooling is not preferred. Software managed -passive power reduction must be performed to prevent the hardware -actions that are designed for catastrophic scenarios. - -Currently, P-states, T-states (clock modulation), and CPU offlining -are used for CPU throttling. - -On Intel CPUs, C-states provide effective power reduction, but so far -they’re only used opportunistically, based on workload. With the -development of intel_powerclamp driver, the method of synchronizing -idle injection across all online CPU threads was introduced. The goal -is to achieve forced and controllable C-state residency. - -Test/Analysis has been made in the areas of power, performance, -scalability, and user experience. In many cases, clear advantage is -shown over taking the CPU offline or modulating the CPU clock. - - -THEORY OF OPERATION -=================== - -Idle Injection --------------- - -On modern Intel processors (Nehalem or later), package level C-state -residency is available in MSRs, thus also available to the kernel. - -These MSRs are:: - - #define MSR_PKG_C2_RESIDENCY 0x60D - #define MSR_PKG_C3_RESIDENCY 0x3F8 - #define MSR_PKG_C6_RESIDENCY 0x3F9 - #define MSR_PKG_C7_RESIDENCY 0x3FA - -If the kernel can also inject idle time to the system, then a -closed-loop control system can be established that manages package -level C-state. The intel_powerclamp driver is conceived as such a -control system, where the target set point is a user-selected idle -ratio (based on power reduction), and the error is the difference -between the actual package level C-state residency ratio and the target idle -ratio. - -Injection is controlled by high priority kernel threads, spawned for -each online CPU. - -These kernel threads, with SCHED_FIFO class, are created to perform -clamping actions of controlled duty ratio and duration. Each per-CPU -thread synchronizes its idle time and duration, based on the rounding -of jiffies, so accumulated errors can be prevented to avoid a jittery -effect. Threads are also bound to the CPU such that they cannot be -migrated, unless the CPU is taken offline. In this case, threads -belong to the offlined CPUs will be terminated immediately. - -Running as SCHED_FIFO and relatively high priority, also allows such -scheme to work for both preemptable and non-preemptable kernels. -Alignment of idle time around jiffies ensures scalability for HZ -values. This effect can be better visualized using a Perf timechart. -The following diagram shows the behavior of kernel thread -kidle_inject/cpu. During idle injection, it runs monitor/mwait idle -for a given "duration", then relinquishes the CPU to other tasks, -until the next time interval. - -The NOHZ schedule tick is disabled during idle time, but interrupts -are not masked. Tests show that the extra wakeups from scheduler tick -have a dramatic impact on the effectiveness of the powerclamp driver -on large scale systems (Westmere system with 80 processors). - -:: - - CPU0 - ____________ ____________ - kidle_inject/0 | sleep | mwait | sleep | - _________| |________| |_______ - duration - CPU1 - ____________ ____________ - kidle_inject/1 | sleep | mwait | sleep | - _________| |________| |_______ - ^ - | - | - roundup(jiffies, interval) - -Only one CPU is allowed to collect statistics and update global -control parameters. This CPU is referred to as the controlling CPU in -this document. The controlling CPU is elected at runtime, with a -policy that favors BSP, taking into account the possibility of a CPU -hot-plug. - -In terms of dynamics of the idle control system, package level idle -time is considered largely as a non-causal system where its behavior -cannot be based on the past or current input. Therefore, the -intel_powerclamp driver attempts to enforce the desired idle time -instantly as given input (target idle ratio). After injection, -powerclamp monitors the actual idle for a given time window and adjust -the next injection accordingly to avoid over/under correction. - -When used in a causal control system, such as a temperature control, -it is up to the user of this driver to implement algorithms where -past samples and outputs are included in the feedback. For example, a -PID-based thermal controller can use the powerclamp driver to -maintain a desired target temperature, based on integral and -derivative gains of the past samples. - - - -Calibration ------------ -During scalability testing, it is observed that synchronized actions -among CPUs become challenging as the number of cores grows. This is -also true for the ability of a system to enter package level C-states. - -To make sure the intel_powerclamp driver scales well, online -calibration is implemented. The goals for doing such a calibration -are: - -a) determine the effective range of idle injection ratio -b) determine the amount of compensation needed at each target ratio - -Compensation to each target ratio consists of two parts: - - a) steady state error compensation - This is to offset the error occurring when the system can - enter idle without extra wakeups (such as external interrupts). - - b) dynamic error compensation - When an excessive amount of wakeups occurs during idle, an - additional idle ratio can be added to quiet interrupts, by - slowing down CPU activities. - -A debugfs file is provided for the user to examine compensation -progress and results, such as on a Westmere system:: - - [jacob@nex01 ~]$ cat - /sys/kernel/debug/intel_powerclamp/powerclamp_calib - controlling cpu: 0 - pct confidence steady dynamic (compensation) - 0 0 0 0 - 1 1 0 0 - 2 1 1 0 - 3 3 1 0 - 4 3 1 0 - 5 3 1 0 - 6 3 1 0 - 7 3 1 0 - 8 3 1 0 - ... - 30 3 2 0 - 31 3 2 0 - 32 3 1 0 - 33 3 2 0 - 34 3 1 0 - 35 3 2 0 - 36 3 1 0 - 37 3 2 0 - 38 3 1 0 - 39 3 2 0 - 40 3 3 0 - 41 3 1 0 - 42 3 2 0 - 43 3 1 0 - 44 3 1 0 - 45 3 2 0 - 46 3 3 0 - 47 3 0 0 - 48 3 2 0 - 49 3 3 0 - -Calibration occurs during runtime. No offline method is available. -Steady state compensation is used only when confidence levels of all -adjacent ratios have reached satisfactory level. A confidence level -is accumulated based on clean data collected at runtime. Data -collected during a period without extra interrupts is considered -clean. - -To compensate for excessive amounts of wakeup during idle, additional -idle time is injected when such a condition is detected. Currently, -we have a simple algorithm to double the injection ratio. A possible -enhancement might be to throttle the offending IRQ, such as delaying -EOI for level triggered interrupts. But it is a challenge to be -non-intrusive to the scheduler or the IRQ core code. - - -CPU Online/Offline ------------------- -Per-CPU kernel threads are started/stopped upon receiving -notifications of CPU hotplug activities. The intel_powerclamp driver -keeps track of clamping kernel threads, even after they are migrated -to other CPUs, after a CPU offline event. - - -Performance Analysis -==================== -This section describes the general performance data collected on -multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P). - -Effectiveness and Limitations ------------------------------ -The maximum range that idle injection is allowed is capped at 50 -percent. As mentioned earlier, since interrupts are allowed during -forced idle time, excessive interrupts could result in less -effectiveness. The extreme case would be doing a ping -f to generated -flooded network interrupts without much CPU acknowledgement. In this -case, little can be done from the idle injection threads. In most -normal cases, such as scp a large file, applications can be throttled -by the powerclamp driver, since slowing down the CPU also slows down -network protocol processing, which in turn reduces interrupts. - -When control parameters change at runtime by the controlling CPU, it -may take an additional period for the rest of the CPUs to catch up -with the changes. During this time, idle injection is out of sync, -thus not able to enter package C- states at the expected ratio. But -this effect is minor, in that in most cases change to the target -ratio is updated much less frequently than the idle injection -frequency. - -Scalability ------------ -Tests also show a minor, but measurable, difference between the 4P/8P -Ivy Bridge system and the 80P Westmere server under 50% idle ratio. -More compensation is needed on Westmere for the same amount of -target idle ratio. The compensation also increases as the idle ratio -gets larger. The above reason constitutes the need for the -calibration code. - -On the IVB 8P system, compared to an offline CPU, powerclamp can -achieve up to 40% better performance per watt. (measured by a spin -counter summed over per CPU counting threads spawned for all running -CPUs). - -Usage and Interfaces -==================== -The powerclamp driver is registered to the generic thermal layer as a -cooling device. Currently, it’s not bound to any thermal zones:: - - jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . * - cur_state:0 - max_state:50 - type:intel_powerclamp - -cur_state allows user to set the desired idle percentage. Writing 0 to -cur_state will stop idle injection. Writing a value between 1 and -max_state will start the idle injection. Reading cur_state returns the -actual and current idle percentage. This may not be the same value -set by the user in that current idle percentage depends on workload -and includes natural idle. When idle injection is disabled, reading -cur_state returns value -1 instead of 0 which is to avoid confusing -100% busy state with the disabled state. - -Example usage: -- To inject 25% idle time:: - - $ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state - -If the system is not busy and has more than 25% idle time already, -then the powerclamp driver will not start idle injection. Using Top -will not show idle injection kernel threads. - -If the system is busy (spin test below) and has less than 25% natural -idle time, powerclamp kernel threads will do idle injection. Forced -idle time is accounted as normal idle in that common code path is -taken as the idle task. - -In this example, 24.1% idle is shown. This helps the system admin or -user determine the cause of slowdown, when a powerclamp driver is in action:: - - - Tasks: 197 total, 1 running, 196 sleeping, 0 stopped, 0 zombie - Cpu(s): 71.2%us, 4.7%sy, 0.0%ni, 24.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st - Mem: 3943228k total, 1689632k used, 2253596k free, 74960k buffers - Swap: 4087804k total, 0k used, 4087804k free, 945336k cached - - PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND - 3352 jacob 20 0 262m 644 428 S 286 0.0 0:17.16 spin - 3341 root -51 0 0 0 0 D 25 0.0 0:01.62 kidle_inject/0 - 3344 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/3 - 3342 root -51 0 0 0 0 D 25 0.0 0:01.61 kidle_inject/1 - 3343 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/2 - 2935 jacob 20 0 696m 125m 35m S 5 3.3 0:31.11 firefox - 1546 root 20 0 158m 20m 6640 S 3 0.5 0:26.97 Xorg - 2100 jacob 20 0 1223m 88m 30m S 3 2.3 0:23.68 compiz - -Tests have shown that by using the powerclamp driver as a cooling -device, a PID based userspace thermal controller can manage to -control CPU temperature effectively, when no other thermal influence -is added. For example, a UltraBook user can compile the kernel under -certain temperature (below most active trip points). diff --git a/Documentation/thermal/nouveau_thermal.rst b/Documentation/thermal/nouveau_thermal.rst deleted file mode 100644 index 37255fd6735d..000000000000 --- a/Documentation/thermal/nouveau_thermal.rst +++ /dev/null @@ -1,96 +0,0 @@ -===================== -Kernel driver nouveau -===================== - -Supported chips: - -* NV43+ - -Authors: Martin Peres (mupuf) <martin.peres@free.fr> - -Description ------------ - -This driver allows to read the GPU core temperature, drive the GPU fan and -set temperature alarms. - -Currently, due to the absence of in-kernel API to access HWMON drivers, Nouveau -cannot access any of the i2c external monitoring chips it may find. If you -have one of those, temperature and/or fan management through Nouveau's HWMON -interface is likely not to work. This document may then not cover your situation -entirely. - -Temperature management ----------------------- - -Temperature is exposed under as a read-only HWMON attribute temp1_input. - -In order to protect the GPU from overheating, Nouveau supports 4 configurable -temperature thresholds: - - * Fan_boost: - Fan speed is set to 100% when reaching this temperature; - * Downclock: - The GPU will be downclocked to reduce its power dissipation; - * Critical: - The GPU is put on hold to further lower power dissipation; - * Shutdown: - Shut the computer down to protect your GPU. - -WARNING: - Some of these thresholds may not be used by Nouveau depending - on your chipset. - -The default value for these thresholds comes from the GPU's vbios. These -thresholds can be configured thanks to the following HWMON attributes: - - * Fan_boost: temp1_auto_point1_temp and temp1_auto_point1_temp_hyst; - * Downclock: temp1_max and temp1_max_hyst; - * Critical: temp1_crit and temp1_crit_hyst; - * Shutdown: temp1_emergency and temp1_emergency_hyst. - -NOTE: Remember that the values are stored as milli degrees Celsius. Don't forget -to multiply! - -Fan management --------------- - -Not all cards have a drivable fan. If you do, then the following HWMON -attributes should be available: - - * pwm1_enable: - Current fan management mode (NONE, MANUAL or AUTO); - * pwm1: - Current PWM value (power percentage); - * pwm1_min: - The minimum PWM speed allowed; - * pwm1_max: - The maximum PWM speed allowed (bypassed when hitting Fan_boost); - -You may also have the following attribute: - - * fan1_input: - Speed in RPM of your fan. - -Your fan can be driven in different modes: - - * 0: The fan is left untouched; - * 1: The fan can be driven in manual (use pwm1 to change the speed); - * 2; The fan is driven automatically depending on the temperature. - -NOTE: - Be sure to use the manual mode if you want to drive the fan speed manually - -NOTE2: - When operating in manual mode outside the vbios-defined - [PWM_min, PWM_max] range, the reported fan speed (RPM) may not be accurate - depending on your hardware. - -Bug reports ------------ - -Thermal management on Nouveau is new and may not work on all cards. If you have -inquiries, please ping mupuf on IRC (#nouveau, freenode). - -Bug reports should be filled on Freedesktop's bug tracker. Please follow -http://nouveau.freedesktop.org/wiki/Bugs diff --git a/Documentation/thermal/power_allocator.rst b/Documentation/thermal/power_allocator.rst deleted file mode 100644 index 67b6a3297238..000000000000 --- a/Documentation/thermal/power_allocator.rst +++ /dev/null @@ -1,271 +0,0 @@ -================================= -Power allocator governor tunables -================================= - -Trip points ------------ - -The governor works optimally with the following two passive trip points: - -1. "switch on" trip point: temperature above which the governor - control loop starts operating. This is the first passive trip - point of the thermal zone. - -2. "desired temperature" trip point: it should be higher than the - "switch on" trip point. This the target temperature the governor - is controlling for. This is the last passive trip point of the - thermal zone. - -PID Controller --------------- - -The power allocator governor implements a -Proportional-Integral-Derivative controller (PID controller) with -temperature as the control input and power as the controlled output: - - P_max = k_p * e + k_i * err_integral + k_d * diff_err + sustainable_power - -where - - e = desired_temperature - current_temperature - - err_integral is the sum of previous errors - - diff_err = e - previous_error - -It is similar to the one depicted below:: - - k_d - | - current_temp | - | v - | +----------+ +---+ - | +----->| diff_err |-->| X |------+ - | | +----------+ +---+ | - | | | tdp actor - | | k_i | | get_requested_power() - | | | | | | | - | | | | | | | ... - v | v v v v v - +---+ | +-------+ +---+ +---+ +---+ +----------+ - | S |-----+----->| sum e |----->| X |--->| S |-->| S |-->|power | - +---+ | +-------+ +---+ +---+ +---+ |allocation| - ^ | ^ +----------+ - | | | | | - | | +---+ | | | - | +------->| X |-------------------+ v v - | +---+ granted performance - desired_temperature ^ - | - | - k_po/k_pu - -Sustainable power ------------------ - -An estimate of the sustainable dissipatable power (in mW) should be -provided while registering the thermal zone. This estimates the -sustained power that can be dissipated at the desired control -temperature. This is the maximum sustained power for allocation at -the desired maximum temperature. The actual sustained power can vary -for a number of reasons. The closed loop controller will take care of -variations such as environmental conditions, and some factors related -to the speed-grade of the silicon. `sustainable_power` is therefore -simply an estimate, and may be tuned to affect the aggressiveness of -the thermal ramp. For reference, the sustainable power of a 4" phone -is typically 2000mW, while on a 10" tablet is around 4500mW (may vary -depending on screen size). - -If you are using device tree, do add it as a property of the -thermal-zone. For example:: - - thermal-zones { - soc_thermal { - polling-delay = <1000>; - polling-delay-passive = <100>; - sustainable-power = <2500>; - ... - -Instead, if the thermal zone is registered from the platform code, pass a -`thermal_zone_params` that has a `sustainable_power`. If no -`thermal_zone_params` were being passed, then something like below -will suffice:: - - static const struct thermal_zone_params tz_params = { - .sustainable_power = 3500, - }; - -and then pass `tz_params` as the 5th parameter to -`thermal_zone_device_register()` - -k_po and k_pu -------------- - -The implementation of the PID controller in the power allocator -thermal governor allows the configuration of two proportional term -constants: `k_po` and `k_pu`. `k_po` is the proportional term -constant during temperature overshoot periods (current temperature is -above "desired temperature" trip point). Conversely, `k_pu` is the -proportional term constant during temperature undershoot periods -(current temperature below "desired temperature" trip point). - -These controls are intended as the primary mechanism for configuring -the permitted thermal "ramp" of the system. For instance, a lower -`k_pu` value will provide a slower ramp, at the cost of capping -available capacity at a low temperature. On the other hand, a high -value of `k_pu` will result in the governor granting very high power -while temperature is low, and may lead to temperature overshooting. - -The default value for `k_pu` is:: - - 2 * sustainable_power / (desired_temperature - switch_on_temp) - -This means that at `switch_on_temp` the output of the controller's -proportional term will be 2 * `sustainable_power`. The default value -for `k_po` is:: - - sustainable_power / (desired_temperature - switch_on_temp) - -Focusing on the proportional and feed forward values of the PID -controller equation we have:: - - P_max = k_p * e + sustainable_power - -The proportional term is proportional to the difference between the -desired temperature and the current one. When the current temperature -is the desired one, then the proportional component is zero and -`P_max` = `sustainable_power`. That is, the system should operate in -thermal equilibrium under constant load. `sustainable_power` is only -an estimate, which is the reason for closed-loop control such as this. - -Expanding `k_pu` we get:: - - P_max = 2 * sustainable_power * (T_set - T) / (T_set - T_on) + - sustainable_power - -where: - - - T_set is the desired temperature - - T is the current temperature - - T_on is the switch on temperature - -When the current temperature is the switch_on temperature, the above -formula becomes:: - - P_max = 2 * sustainable_power * (T_set - T_on) / (T_set - T_on) + - sustainable_power = 2 * sustainable_power + sustainable_power = - 3 * sustainable_power - -Therefore, the proportional term alone linearly decreases power from -3 * `sustainable_power` to `sustainable_power` as the temperature -rises from the switch on temperature to the desired temperature. - -k_i and integral_cutoff ------------------------ - -`k_i` configures the PID loop's integral term constant. This term -allows the PID controller to compensate for long term drift and for -the quantized nature of the output control: cooling devices can't set -the exact power that the governor requests. When the temperature -error is below `integral_cutoff`, errors are accumulated in the -integral term. This term is then multiplied by `k_i` and the result -added to the output of the controller. Typically `k_i` is set low (1 -or 2) and `integral_cutoff` is 0. - -k_d ---- - -`k_d` configures the PID loop's derivative term constant. It's -recommended to leave it as the default: 0. - -Cooling device power API -======================== - -Cooling devices controlled by this governor must supply the additional -"power" API in their `cooling_device_ops`. It consists on three ops: - -1. :: - - int get_requested_power(struct thermal_cooling_device *cdev, - struct thermal_zone_device *tz, u32 *power); - - -@cdev: - The `struct thermal_cooling_device` pointer -@tz: - thermal zone in which we are currently operating -@power: - pointer in which to store the calculated power - -`get_requested_power()` calculates the power requested by the device -in milliwatts and stores it in @power . It should return 0 on -success, -E* on failure. This is currently used by the power -allocator governor to calculate how much power to give to each cooling -device. - -2. :: - - int state2power(struct thermal_cooling_device *cdev, struct - thermal_zone_device *tz, unsigned long state, - u32 *power); - -@cdev: - The `struct thermal_cooling_device` pointer -@tz: - thermal zone in which we are currently operating -@state: - A cooling device state -@power: - pointer in which to store the equivalent power - -Convert cooling device state @state into power consumption in -milliwatts and store it in @power. It should return 0 on success, -E* -on failure. This is currently used by thermal core to calculate the -maximum power that an actor can consume. - -3. :: - - int power2state(struct thermal_cooling_device *cdev, u32 power, - unsigned long *state); - -@cdev: - The `struct thermal_cooling_device` pointer -@power: - power in milliwatts -@state: - pointer in which to store the resulting state - -Calculate a cooling device state that would make the device consume at -most @power mW and store it in @state. It should return 0 on success, --E* on failure. This is currently used by the thermal core to convert -a given power set by the power allocator governor to a state that the -cooling device can set. It is a function because this conversion may -depend on external factors that may change so this function should the -best conversion given "current circumstances". - -Cooling device weights ----------------------- - -Weights are a mechanism to bias the allocation among cooling -devices. They express the relative power efficiency of different -cooling devices. Higher weight can be used to express higher power -efficiency. Weighting is relative such that if each cooling device -has a weight of one they are considered equal. This is particularly -useful in heterogeneous systems where two cooling devices may perform -the same kind of compute, but with different efficiency. For example, -a system with two different types of processors. - -If the thermal zone is registered using -`thermal_zone_device_register()` (i.e., platform code), then weights -are passed as part of the thermal zone's `thermal_bind_parameters`. -If the platform is registered using device tree, then they are passed -as the `contribution` property of each map in the `cooling-maps` node. - -Limitations of the power allocator governor -=========================================== - -The power allocator governor's PID controller works best if there is a -periodic tick. If you have a driver that calls -`thermal_zone_device_update()` (or anything that ends up calling the -governor's `throttle()` function) repetitively, the governor response -won't be very good. Note that this is not particular to this -governor, step-wise will also misbehave if you call its throttle() -faster than the normal thermal framework tick (due to interrupts for -example) as it will overreact. diff --git a/Documentation/thermal/sysfs-api.rst b/Documentation/thermal/sysfs-api.rst deleted file mode 100644 index e4930761d3e5..000000000000 --- a/Documentation/thermal/sysfs-api.rst +++ /dev/null @@ -1,798 +0,0 @@ -=================================== -Generic Thermal Sysfs driver How To -=================================== - -Written by Sujith Thomas <sujith.thomas@intel.com>, Zhang Rui <rui.zhang@intel.com> - -Updated: 2 January 2008 - -Copyright (c) 2008 Intel Corporation - - -0. Introduction -=============== - -The generic thermal sysfs provides a set of interfaces for thermal zone -devices (sensors) and thermal cooling devices (fan, processor...) to register -with the thermal management solution and to be a part of it. - -This how-to focuses on enabling new thermal zone and cooling devices to -participate in thermal management. -This solution is platform independent and any type of thermal zone devices -and cooling devices should be able to make use of the infrastructure. - -The main task of the thermal sysfs driver is to expose thermal zone attributes -as well as cooling device attributes to the user space. -An intelligent thermal management application can make decisions based on -inputs from thermal zone attributes (the current temperature and trip point -temperature) and throttle appropriate devices. - -- `[0-*]` denotes any positive number starting from 0 -- `[1-*]` denotes any positive number starting from 1 - -1. thermal sysfs driver interface functions -=========================================== - -1.1 thermal zone device interface ---------------------------------- - - :: - - struct thermal_zone_device - *thermal_zone_device_register(char *type, - int trips, int mask, void *devdata, - struct thermal_zone_device_ops *ops, - const struct thermal_zone_params *tzp, - int passive_delay, int polling_delay)) - - This interface function adds a new thermal zone device (sensor) to - /sys/class/thermal folder as `thermal_zone[0-*]`. It tries to bind all the - thermal cooling devices registered at the same time. - - type: - the thermal zone type. - trips: - the total number of trip points this thermal zone supports. - mask: - Bit string: If 'n'th bit is set, then trip point 'n' is writeable. - devdata: - device private data - ops: - thermal zone device call-backs. - - .bind: - bind the thermal zone device with a thermal cooling device. - .unbind: - unbind the thermal zone device with a thermal cooling device. - .get_temp: - get the current temperature of the thermal zone. - .set_trips: - set the trip points window. Whenever the current temperature - is updated, the trip points immediately below and above the - current temperature are found. - .get_mode: - get the current mode (enabled/disabled) of the thermal zone. - - - "enabled" means the kernel thermal management is - enabled. - - "disabled" will prevent kernel thermal driver action - upon trip points so that user applications can take - charge of thermal management. - .set_mode: - set the mode (enabled/disabled) of the thermal zone. - .get_trip_type: - get the type of certain trip point. - .get_trip_temp: - get the temperature above which the certain trip point - will be fired. - .set_emul_temp: - set the emulation temperature which helps in debugging - different threshold temperature points. - tzp: - thermal zone platform parameters. - passive_delay: - number of milliseconds to wait between polls when - performing passive cooling. - polling_delay: - number of milliseconds to wait between polls when checking - whether trip points have been crossed (0 for interrupt driven systems). - - :: - - void thermal_zone_device_unregister(struct thermal_zone_device *tz) - - This interface function removes the thermal zone device. - It deletes the corresponding entry from /sys/class/thermal folder and - unbinds all the thermal cooling devices it uses. - - :: - - struct thermal_zone_device - *thermal_zone_of_sensor_register(struct device *dev, int sensor_id, - void *data, - const struct thermal_zone_of_device_ops *ops) - - This interface adds a new sensor to a DT thermal zone. - This function will search the list of thermal zones described in - device tree and look for the zone that refer to the sensor device - pointed by dev->of_node as temperature providers. For the zone - pointing to the sensor node, the sensor will be added to the DT - thermal zone device. - - The parameters for this interface are: - - dev: - Device node of sensor containing valid node pointer in - dev->of_node. - sensor_id: - a sensor identifier, in case the sensor IP has more - than one sensors - data: - a private pointer (owned by the caller) that will be - passed back, when a temperature reading is needed. - ops: - `struct thermal_zone_of_device_ops *`. - - ============== ======================================= - get_temp a pointer to a function that reads the - sensor temperature. This is mandatory - callback provided by sensor driver. - set_trips a pointer to a function that sets a - temperature window. When this window is - left the driver must inform the thermal - core via thermal_zone_device_update. - get_trend a pointer to a function that reads the - sensor temperature trend. - set_emul_temp a pointer to a function that sets - sensor emulated temperature. - ============== ======================================= - - The thermal zone temperature is provided by the get_temp() function - pointer of thermal_zone_of_device_ops. When called, it will - have the private pointer @data back. - - It returns error pointer if fails otherwise valid thermal zone device - handle. Caller should check the return handle with IS_ERR() for finding - whether success or not. - - :: - - void thermal_zone_of_sensor_unregister(struct device *dev, - struct thermal_zone_device *tzd) - - This interface unregisters a sensor from a DT thermal zone which was - successfully added by interface thermal_zone_of_sensor_register(). - This function removes the sensor callbacks and private data from the - thermal zone device registered with thermal_zone_of_sensor_register() - interface. It will also silent the zone by remove the .get_temp() and - get_trend() thermal zone device callbacks. - - :: - - struct thermal_zone_device - *devm_thermal_zone_of_sensor_register(struct device *dev, - int sensor_id, - void *data, - const struct thermal_zone_of_device_ops *ops) - - This interface is resource managed version of - thermal_zone_of_sensor_register(). - - All details of thermal_zone_of_sensor_register() described in - section 1.1.3 is applicable here. - - The benefit of using this interface to register sensor is that it - is not require to explicitly call thermal_zone_of_sensor_unregister() - in error path or during driver unbinding as this is done by driver - resource manager. - - :: - - void devm_thermal_zone_of_sensor_unregister(struct device *dev, - struct thermal_zone_device *tzd) - - This interface is resource managed version of - thermal_zone_of_sensor_unregister(). - All details of thermal_zone_of_sensor_unregister() described in - section 1.1.4 is applicable here. - Normally this function will not need to be called and the resource - management code will ensure that the resource is freed. - - :: - - int thermal_zone_get_slope(struct thermal_zone_device *tz) - - This interface is used to read the slope attribute value - for the thermal zone device, which might be useful for platform - drivers for temperature calculations. - - :: - - int thermal_zone_get_offset(struct thermal_zone_device *tz) - - This interface is used to read the offset attribute value - for the thermal zone device, which might be useful for platform - drivers for temperature calculations. - -1.2 thermal cooling device interface ------------------------------------- - - - :: - - struct thermal_cooling_device - *thermal_cooling_device_register(char *name, - void *devdata, struct thermal_cooling_device_ops *) - - This interface function adds a new thermal cooling device (fan/processor/...) - to /sys/class/thermal/ folder as `cooling_device[0-*]`. It tries to bind itself - to all the thermal zone devices registered at the same time. - - name: - the cooling device name. - devdata: - device private data. - ops: - thermal cooling devices call-backs. - - .get_max_state: - get the Maximum throttle state of the cooling device. - .get_cur_state: - get the Currently requested throttle state of the - cooling device. - .set_cur_state: - set the Current throttle state of the cooling device. - - :: - - void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev) - - This interface function removes the thermal cooling device. - It deletes the corresponding entry from /sys/class/thermal folder and - unbinds itself from all the thermal zone devices using it. - -1.3 interface for binding a thermal zone device with a thermal cooling device ------------------------------------------------------------------------------ - - :: - - int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, - int trip, struct thermal_cooling_device *cdev, - unsigned long upper, unsigned long lower, unsigned int weight); - - This interface function binds a thermal cooling device to a particular trip - point of a thermal zone device. - - This function is usually called in the thermal zone device .bind callback. - - tz: - the thermal zone device - cdev: - thermal cooling device - trip: - indicates which trip point in this thermal zone the cooling device - is associated with. - upper: - the Maximum cooling state for this trip point. - THERMAL_NO_LIMIT means no upper limit, - and the cooling device can be in max_state. - lower: - the Minimum cooling state can be used for this trip point. - THERMAL_NO_LIMIT means no lower limit, - and the cooling device can be in cooling state 0. - weight: - the influence of this cooling device in this thermal - zone. See 1.4.1 below for more information. - - :: - - int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz, - int trip, struct thermal_cooling_device *cdev); - - This interface function unbinds a thermal cooling device from a particular - trip point of a thermal zone device. This function is usually called in - the thermal zone device .unbind callback. - - tz: - the thermal zone device - cdev: - thermal cooling device - trip: - indicates which trip point in this thermal zone the cooling device - is associated with. - -1.4 Thermal Zone Parameters ---------------------------- - - :: - - struct thermal_bind_params - - This structure defines the following parameters that are used to bind - a zone with a cooling device for a particular trip point. - - .cdev: - The cooling device pointer - .weight: - The 'influence' of a particular cooling device on this - zone. This is relative to the rest of the cooling - devices. For example, if all cooling devices have a - weight of 1, then they all contribute the same. You can - use percentages if you want, but it's not mandatory. A - weight of 0 means that this cooling device doesn't - contribute to the cooling of this zone unless all cooling - devices have a weight of 0. If all weights are 0, then - they all contribute the same. - .trip_mask: - This is a bit mask that gives the binding relation between - this thermal zone and cdev, for a particular trip point. - If nth bit is set, then the cdev and thermal zone are bound - for trip point n. - .binding_limits: - This is an array of cooling state limits. Must have - exactly 2 * thermal_zone.number_of_trip_points. It is an - array consisting of tuples <lower-state upper-state> of - state limits. Each trip will be associated with one state - limit tuple when binding. A NULL pointer means - <THERMAL_NO_LIMITS THERMAL_NO_LIMITS> on all trips. - These limits are used when binding a cdev to a trip point. - .match: - This call back returns success(0) if the 'tz and cdev' need to - be bound, as per platform data. - - :: - - struct thermal_zone_params - - This structure defines the platform level parameters for a thermal zone. - This data, for each thermal zone should come from the platform layer. - This is an optional feature where some platforms can choose not to - provide this data. - - .governor_name: - Name of the thermal governor used for this zone - .no_hwmon: - a boolean to indicate if the thermal to hwmon sysfs interface - is required. when no_hwmon == false, a hwmon sysfs interface - will be created. when no_hwmon == true, nothing will be done. - In case the thermal_zone_params is NULL, the hwmon interface - will be created (for backward compatibility). - .num_tbps: - Number of thermal_bind_params entries for this zone - .tbp: - thermal_bind_params entries - -2. sysfs attributes structure -============================= - -== ================ -RO read only value -WO write only value -RW read/write value -== ================ - -Thermal sysfs attributes will be represented under /sys/class/thermal. -Hwmon sysfs I/F extension is also available under /sys/class/hwmon -if hwmon is compiled in or built as a module. - -Thermal zone device sys I/F, created once it's registered:: - - /sys/class/thermal/thermal_zone[0-*]: - |---type: Type of the thermal zone - |---temp: Current temperature - |---mode: Working mode of the thermal zone - |---policy: Thermal governor used for this zone - |---available_policies: Available thermal governors for this zone - |---trip_point_[0-*]_temp: Trip point temperature - |---trip_point_[0-*]_type: Trip point type - |---trip_point_[0-*]_hyst: Hysteresis value for this trip point - |---emul_temp: Emulated temperature set node - |---sustainable_power: Sustainable dissipatable power - |---k_po: Proportional term during temperature overshoot - |---k_pu: Proportional term during temperature undershoot - |---k_i: PID's integral term in the power allocator gov - |---k_d: PID's derivative term in the power allocator - |---integral_cutoff: Offset above which errors are accumulated - |---slope: Slope constant applied as linear extrapolation - |---offset: Offset constant applied as linear extrapolation - -Thermal cooling device sys I/F, created once it's registered:: - - /sys/class/thermal/cooling_device[0-*]: - |---type: Type of the cooling device(processor/fan/...) - |---max_state: Maximum cooling state of the cooling device - |---cur_state: Current cooling state of the cooling device - |---stats: Directory containing cooling device's statistics - |---stats/reset: Writing any value resets the statistics - |---stats/time_in_state_ms: Time (msec) spent in various cooling states - |---stats/total_trans: Total number of times cooling state is changed - |---stats/trans_table: Cooing state transition table - - -Then next two dynamic attributes are created/removed in pairs. They represent -the relationship between a thermal zone and its associated cooling device. -They are created/removed for each successful execution of -thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device. - -:: - - /sys/class/thermal/thermal_zone[0-*]: - |---cdev[0-*]: [0-*]th cooling device in current thermal zone - |---cdev[0-*]_trip_point: Trip point that cdev[0-*] is associated with - |---cdev[0-*]_weight: Influence of the cooling device in - this thermal zone - -Besides the thermal zone device sysfs I/F and cooling device sysfs I/F, -the generic thermal driver also creates a hwmon sysfs I/F for each _type_ -of thermal zone device. E.g. the generic thermal driver registers one hwmon -class device and build the associated hwmon sysfs I/F for all the registered -ACPI thermal zones. - -:: - - /sys/class/hwmon/hwmon[0-*]: - |---name: The type of the thermal zone devices - |---temp[1-*]_input: The current temperature of thermal zone [1-*] - |---temp[1-*]_critical: The critical trip point of thermal zone [1-*] - -Please read Documentation/hwmon/sysfs-interface.rst for additional information. - -Thermal zone attributes ------------------------ - -type - Strings which represent the thermal zone type. - This is given by thermal zone driver as part of registration. - E.g: "acpitz" indicates it's an ACPI thermal device. - In order to keep it consistent with hwmon sys attribute; this should - be a short, lowercase string, not containing spaces nor dashes. - RO, Required - -temp - Current temperature as reported by thermal zone (sensor). - Unit: millidegree Celsius - RO, Required - -mode - One of the predefined values in [enabled, disabled]. - This file gives information about the algorithm that is currently - managing the thermal zone. It can be either default kernel based - algorithm or user space application. - - enabled - enable Kernel Thermal management. - disabled - Preventing kernel thermal zone driver actions upon - trip points so that user application can take full - charge of the thermal management. - - RW, Optional - -policy - One of the various thermal governors used for a particular zone. - - RW, Required - -available_policies - Available thermal governors which can be used for a particular zone. - - RO, Required - -`trip_point_[0-*]_temp` - The temperature above which trip point will be fired. - - Unit: millidegree Celsius - - RO, Optional - -`trip_point_[0-*]_type` - Strings which indicate the type of the trip point. - - E.g. it can be one of critical, hot, passive, `active[0-*]` for ACPI - thermal zone. - - RO, Optional - -`trip_point_[0-*]_hyst` - The hysteresis value for a trip point, represented as an integer - Unit: Celsius - RW, Optional - -`cdev[0-*]` - Sysfs link to the thermal cooling device node where the sys I/F - for cooling device throttling control represents. - - RO, Optional - -`cdev[0-*]_trip_point` - The trip point in this thermal zone which `cdev[0-*]` is associated - with; -1 means the cooling device is not associated with any trip - point. - - RO, Optional - -`cdev[0-*]_weight` - The influence of `cdev[0-*]` in this thermal zone. This value - is relative to the rest of cooling devices in the thermal - zone. For example, if a cooling device has a weight double - than that of other, it's twice as effective in cooling the - thermal zone. - - RW, Optional - -passive - Attribute is only present for zones in which the passive cooling - policy is not supported by native thermal driver. Default is zero - and can be set to a temperature (in millidegrees) to enable a - passive trip point for the zone. Activation is done by polling with - an interval of 1 second. - - Unit: millidegrees Celsius - - Valid values: 0 (disabled) or greater than 1000 - - RW, Optional - -emul_temp - Interface to set the emulated temperature method in thermal zone - (sensor). After setting this temperature, the thermal zone may pass - this temperature to platform emulation function if registered or - cache it locally. This is useful in debugging different temperature - threshold and its associated cooling action. This is write only node - and writing 0 on this node should disable emulation. - Unit: millidegree Celsius - - WO, Optional - - WARNING: - Be careful while enabling this option on production systems, - because userland can easily disable the thermal policy by simply - flooding this sysfs node with low temperature values. - -sustainable_power - An estimate of the sustained power that can be dissipated by - the thermal zone. Used by the power allocator governor. For - more information see Documentation/thermal/power_allocator.rst - - Unit: milliwatts - - RW, Optional - -k_po - The proportional term of the power allocator governor's PID - controller during temperature overshoot. Temperature overshoot - is when the current temperature is above the "desired - temperature" trip point. For more information see - Documentation/thermal/power_allocator.rst - - RW, Optional - -k_pu - The proportional term of the power allocator governor's PID - controller during temperature undershoot. Temperature undershoot - is when the current temperature is below the "desired - temperature" trip point. For more information see - Documentation/thermal/power_allocator.rst - - RW, Optional - -k_i - The integral term of the power allocator governor's PID - controller. This term allows the PID controller to compensate - for long term drift. For more information see - Documentation/thermal/power_allocator.rst - - RW, Optional - -k_d - The derivative term of the power allocator governor's PID - controller. For more information see - Documentation/thermal/power_allocator.rst - - RW, Optional - -integral_cutoff - Temperature offset from the desired temperature trip point - above which the integral term of the power allocator - governor's PID controller starts accumulating errors. For - example, if integral_cutoff is 0, then the integral term only - accumulates error when temperature is above the desired - temperature trip point. For more information see - Documentation/thermal/power_allocator.rst - - Unit: millidegree Celsius - - RW, Optional - -slope - The slope constant used in a linear extrapolation model - to determine a hotspot temperature based off the sensor's - raw readings. It is up to the device driver to determine - the usage of these values. - - RW, Optional - -offset - The offset constant used in a linear extrapolation model - to determine a hotspot temperature based off the sensor's - raw readings. It is up to the device driver to determine - the usage of these values. - - RW, Optional - -Cooling device attributes -------------------------- - -type - String which represents the type of device, e.g: - - - for generic ACPI: should be "Fan", "Processor" or "LCD" - - for memory controller device on intel_menlow platform: - should be "Memory controller". - - RO, Required - -max_state - The maximum permissible cooling state of this cooling device. - - RO, Required - -cur_state - The current cooling state of this cooling device. - The value can any integer numbers between 0 and max_state: - - - cur_state == 0 means no cooling - - cur_state == max_state means the maximum cooling. - - RW, Required - -stats/reset - Writing any value resets the cooling device's statistics. - WO, Required - -stats/time_in_state_ms: - The amount of time spent by the cooling device in various cooling - states. The output will have "<state> <time>" pair in each line, which - will mean this cooling device spent <time> msec of time at <state>. - Output will have one line for each of the supported states. usertime - units here is 10mS (similar to other time exported in /proc). - RO, Required - - -stats/total_trans: - A single positive value showing the total number of times the state of a - cooling device is changed. - - RO, Required - -stats/trans_table: - This gives fine grained information about all the cooling state - transitions. The cat output here is a two dimensional matrix, where an - entry <i,j> (row i, column j) represents the number of transitions from - State_i to State_j. If the transition table is bigger than PAGE_SIZE, - reading this will return an -EFBIG error. - RO, Required - -3. A simple implementation -========================== - -ACPI thermal zone may support multiple trip points like critical, hot, -passive, active. If an ACPI thermal zone supports critical, passive, -active[0] and active[1] at the same time, it may register itself as a -thermal_zone_device (thermal_zone1) with 4 trip points in all. -It has one processor and one fan, which are both registered as -thermal_cooling_device. Both are considered to have the same -effectiveness in cooling the thermal zone. - -If the processor is listed in _PSL method, and the fan is listed in _AL0 -method, the sys I/F structure will be built like this:: - - /sys/class/thermal: - |thermal_zone1: - |---type: acpitz - |---temp: 37000 - |---mode: enabled - |---policy: step_wise - |---available_policies: step_wise fair_share - |---trip_point_0_temp: 100000 - |---trip_point_0_type: critical - |---trip_point_1_temp: 80000 - |---trip_point_1_type: passive - |---trip_point_2_temp: 70000 - |---trip_point_2_type: active0 - |---trip_point_3_temp: 60000 - |---trip_point_3_type: active1 - |---cdev0: --->/sys/class/thermal/cooling_device0 - |---cdev0_trip_point: 1 /* cdev0 can be used for passive */ - |---cdev0_weight: 1024 - |---cdev1: --->/sys/class/thermal/cooling_device3 - |---cdev1_trip_point: 2 /* cdev1 can be used for active[0]*/ - |---cdev1_weight: 1024 - - |cooling_device0: - |---type: Processor - |---max_state: 8 - |---cur_state: 0 - - |cooling_device3: - |---type: Fan - |---max_state: 2 - |---cur_state: 0 - - /sys/class/hwmon: - |hwmon0: - |---name: acpitz - |---temp1_input: 37000 - |---temp1_crit: 100000 - -4. Event Notification -===================== - -The framework includes a simple notification mechanism, in the form of a -netlink event. Netlink socket initialization is done during the _init_ -of the framework. Drivers which intend to use the notification mechanism -just need to call thermal_generate_netlink_event() with two arguments viz -(originator, event). The originator is a pointer to struct thermal_zone_device -from where the event has been originated. An integer which represents the -thermal zone device will be used in the message to identify the zone. The -event will be one of:{THERMAL_AUX0, THERMAL_AUX1, THERMAL_CRITICAL, -THERMAL_DEV_FAULT}. Notification can be sent when the current temperature -crosses any of the configured thresholds. - -5. Export Symbol APIs -===================== - -5.1. get_tz_trend ------------------ - -This function returns the trend of a thermal zone, i.e the rate of change -of temperature of the thermal zone. Ideally, the thermal sensor drivers -are supposed to implement the callback. If they don't, the thermal -framework calculated the trend by comparing the previous and the current -temperature values. - -5.2. get_thermal_instance -------------------------- - -This function returns the thermal_instance corresponding to a given -{thermal_zone, cooling_device, trip_point} combination. Returns NULL -if such an instance does not exist. - -5.3. thermal_notify_framework ------------------------------ - -This function handles the trip events from sensor drivers. It starts -throttling the cooling devices according to the policy configured. -For CRITICAL and HOT trip points, this notifies the respective drivers, -and does actual throttling for other trip points i.e ACTIVE and PASSIVE. -The throttling policy is based on the configured platform data; if no -platform data is provided, this uses the step_wise throttling policy. - -5.4. thermal_cdev_update ------------------------- - -This function serves as an arbitrator to set the state of a cooling -device. It sets the cooling device to the deepest cooling state if -possible. - -6. thermal_emergency_poweroff -============================= - -On an event of critical trip temperature crossing. Thermal framework -allows the system to shutdown gracefully by calling orderly_poweroff(). -In the event of a failure of orderly_poweroff() to shut down the system -we are in danger of keeping the system alive at undesirably high -temperatures. To mitigate this high risk scenario we program a work -queue to fire after a pre-determined number of seconds to start -an emergency shutdown of the device using the kernel_power_off() -function. In case kernel_power_off() fails then finally -emergency_restart() is called in the worst case. - -The delay should be carefully profiled so as to give adequate time for -orderly_poweroff(). In case of failure of an orderly_poweroff() the -emergency poweroff kicks in after the delay has elapsed and shuts down -the system. - -If set to 0 emergency poweroff will not be supported. So a carefully -profiled non-zero positive value is a must for emergerncy poweroff to be -triggered. diff --git a/Documentation/thermal/x86_pkg_temperature_thermal.rst b/Documentation/thermal/x86_pkg_temperature_thermal.rst deleted file mode 100644 index f134dbd3f5a9..000000000000 --- a/Documentation/thermal/x86_pkg_temperature_thermal.rst +++ /dev/null @@ -1,55 +0,0 @@ -=================================== -Kernel driver: x86_pkg_temp_thermal -=================================== - -Supported chips: - -* x86: with package level thermal management - -(Verify using: CPUID.06H:EAX[bit 6] =1) - -Authors: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> - -Reference ---------- - -Intel® 64 and IA-32 Architectures Software Developer’s Manual (Jan, 2013): -Chapter 14.6: PACKAGE LEVEL THERMAL MANAGEMENT - -Description ------------ - -This driver register CPU digital temperature package level sensor as a thermal -zone with maximum two user mode configurable trip points. Number of trip points -depends on the capability of the package. Once the trip point is violated, -user mode can receive notification via thermal notification mechanism and can -take any action to control temperature. - - -Threshold management --------------------- -Each package will register as a thermal zone under /sys/class/thermal. - -Example:: - - /sys/class/thermal/thermal_zone1 - -This contains two trip points: - -- trip_point_0_temp -- trip_point_1_temp - -User can set any temperature between 0 to TJ-Max temperature. Temperature units -are in milli-degree Celsius. Refer to "Documentation/thermal/sysfs-api.rst" for -thermal sys-fs details. - -Any value other than 0 in these trip points, can trigger thermal notifications. -Setting 0, stops sending thermal notifications. - -Thermal notifications: -To get kobject-uevent notifications, set the thermal zone -policy to "user_space". - -For example:: - - echo -n "user_space" > policy |