Age | Commit message (Collapse) | Author | Files | Lines |
|
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data). This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.
Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when thermal framework sets
capping.
Fixes: f12e4f66ab6a3 ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20210614191030.22241-1-lukasz.luba@arm.com
|
|
Slab OOB issue is scanned by KASAN in cpu_power_to_freq().
If power is limited below the power of OPP0 in EM table,
it will cause slab out-of-bound issue with negative array
index.
Return the lowest frequency if limited power cannot found
a suitable OPP in EM table to fix this issue.
Backtrace:
[<ffffffd02d2a37f0>] die+0x104/0x5ac
[<ffffffd02d2a5630>] bug_handler+0x64/0xd0
[<ffffffd02d288ce4>] brk_handler+0x160/0x258
[<ffffffd02d281e5c>] do_debug_exception+0x248/0x3f0
[<ffffffd02d284488>] el1_dbg+0x14/0xbc
[<ffffffd02d75d1d4>] __kasan_report+0x1dc/0x1e0
[<ffffffd02d75c2e0>] kasan_report+0x10/0x20
[<ffffffd02d75def8>] __asan_report_load8_noabort+0x18/0x28
[<ffffffd02e6fce5c>] cpufreq_power2state+0x180/0x43c
[<ffffffd02e6ead80>] power_actor_set_power+0x114/0x1d4
[<ffffffd02e6fac24>] allocate_power+0xaec/0xde0
[<ffffffd02e6f9f80>] power_allocator_throttle+0x3ec/0x5a4
[<ffffffd02e6ea888>] handle_thermal_trip+0x160/0x294
[<ffffffd02e6edd08>] thermal_zone_device_check+0xe4/0x154
[<ffffffd02d351cb4>] process_one_work+0x5e4/0xe28
[<ffffffd02d352f44>] worker_thread+0xa4c/0xfac
[<ffffffd02d360124>] kthread+0x33c/0x358
[<ffffffd02d289940>] ret_from_fork+0xc/0x18
Fixes: 371a3bc79c11b ("thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power")
Signed-off-by: brian-sy yang <brian-sy.yang@mediatek.com>
Signed-off-by: Michael Kao <michael.kao@mediatek.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Cc: stable@vger.kernel.org #v5.7
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201229050831.19493-1-michael.kao@mediatek.com
|
|
There is a list with the purpose of grouping the cpufreq cooling
device together as described in the comments but actually it is
unused, the code evolved since 2012 and the list was no longer needed.
Delete the remaining unused list related code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210314111333.16551-5-daniel.lezcano@linaro.org
|
|
Currently the naming of a cooling device is just a cooling technique
followed by a number. When there are multiple cooling devices using
the same technique, it is impossible to clearly identify the related
device as this one is just a number.
For instance:
thermal-cpufreq-0
thermal-cpufreq-1
etc ...
The 'thermal' prefix is redundant with the subsystem namespace. This
patch removes the 'thermal' prefix and changes the number by the device
name. So the naming above becomes:
cpufreq-cpu0
cpufreq-cpu4
etc ...
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210314111333.16551-2-daniel.lezcano@linaro.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Use the newly introduced 'hot' and 'critical' ops for the acpi
thermal driver (Daniel Lezcano)
- Remove the notify ops as it is no longer used (Daniel Lezcano)
- Remove the 'forced passive' option and the unused bind/unbind
functions (Daniel Lezcano)
- Remove the THERMAL_TRIPS_NONE and the code cleanup around this macro
(Daniel Lezcano)
- Rework the delays to make them pre-computed instead of computing them
again and again at each polling interval (Daniel Lezcano)
- Remove the pointless 'thermal_zone_device_reset' function (Daniel
Lezcano)
- Use the critical and hot ops to prevent an unexpected system shutdown
on int340x (Kai-Heng Feng)
- Make the cooling device state private to the thermal subsystem
(Daniel Lezcano)
- Prevent to use not-power-aware actor devices with the power allocator
governor (Lukasz Luba)
- Remove 'zx' and 'tango' support along with the corresponding
platforms (Arnd Bergman)
- Fix several issues on the Omap thermal driver (Tony Lindgren)
- Add support for adc-tm5 PMIC thermal monitor for Qcom platforms
(Dmitry Baryshkov)
- Fix an initialization loop in the adc-tm5 (Colin Ian King)
- Fix a return error check in the cpufreq cooling device (Viresh Kumar)
* tag 'thermal-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (26 commits)
thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
thermal: qcom: Fix comparison with uninitialized variable channels_available
thermal: qcom: add support for adc-tm5 PMIC thermal monitor
dt-bindings: thermal: qcom: add adc-thermal monitor bindings
thermal: ti-soc-thermal: Use non-inverted define for omap4
thermal: ti-soc-thermal: Simplify polling with iopoll
thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430
thermal: ti-soc-thermal: Skip pointless register access for dra7
thermal/drivers/zx: Remove zx driver
thermal/drivers/tango: Remove tango driver
thermal: power allocator: fail binding for non-power actor devices
thermal/core: Make cooling device state change private
thermal: intel: pch: Fix unexpected shutdown at critical temperature
thermal: int340x: Fix unexpected shutdown at critical temperature
thermal/core: Remove pointless thermal_zone_device_reset() function
thermal/core: Remove ms based delay fields
thermal/core: Use precomputed jiffies for the polling
thermal/core: Precompute the delays from msecs to jiffies
thermal/core: Remove unused macro THERMAL_TRIPS_NONE
thermal/core: Remove THERMAL_TRIPS_NONE test
...
|
|
freq_qos_update_request() returns 1 if the effective constraint value
has changed, 0 if the effective constraint value has not changed, or a
negative error code on failures.
The frequency constraints for CPUs can be set by different parts of the
kernel. If the maximum frequency constraint set by other parts of the
kernel are set at a lower value than the one corresponding to cooling
state 0, then we will never be able to cool down the system as
freq_qos_update_request() will keep on returning 0 and we will skip
updating cpufreq_state and thermal pressure.
Fix that by doing the updates even in the case where
freq_qos_update_request() returns 0, as we have effectively set the
constraint to a new value even if the consolidated value of the
actual constraint is unchanged because of external factors.
Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Thara Gopinath<thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/b2b7e84944937390256669df5a48ce5abba0c1ef.1613540713.git.viresh.kumar@linaro.org
|
|
Several parts of the kernel are already using the effective CPU
utilization (as seen by the scheduler) to get the current load on the
CPU, do the same here instead of depending on the idle time of the CPU,
which isn't that accurate comparatively.
This is also the right thing to do as it makes the cpufreq governor
(schedutil) align better with the cpufreq_cooling driver, as the power
requested by cpufreq_cooling governor will exactly match the next
frequency requested by the schedutil governor since they are both using
the same metric to calculate load.
This was tested on ARM Hikey6220 platform with hackbench, sysbench and
schbench. None of them showed any regression or significant
improvements. Schbench is the most important ones out of these as it
creates the scenario where the utilization numbers provide a better
estimate of the future.
Scenario 1: The CPUs were mostly idle in the previous polling window of
the IPA governor as the tasks were sleeping and here are the details
from traces (load is in %):
Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
Here, the "Old" line gives the load and requested_power (dynamic_power
here) numbers calculated using the idle time based implementation, while
"New" is based on the CPU utilization from scheduler.
As can be clearly seen, the load and requested_power numbers are simply
incorrect in the idle time based approach and the numbers collected from
CPU's utilization are much closer to the reality.
Scenario 2: The CPUs were busy in the previous polling window of the IPA
governor:
Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
As can be seen, the idle time based load is 100% for all the CPUs as it
took only the last window into account, but in reality the CPUs aren't
that loaded as shown by the utilization numbers.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lkml.kernel.org/r/9c255c83d78d58451abc06848001faef94c87a12.1607400596.git.viresh.kumar@linaro.org
|
|
If state has not changed successfully and we updated cpufreq_state,
next time when the new state is equal to cpufreq_state (not changed
successfully last time), we will return directly and miss a
freq_qos_update_request() that should have been.
Fixes: 5130802ddbb1 ("thermal: cpu_cooling: Switch to QoS requests for freq limits")
Cc: v5.4+ <stable@vger.kernel.org> # v5.4+
Signed-off-by: Zhuguangqing <zhuguangqing@xiaomi.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201106092243.15574-1-zhuguangqing83@gmail.com
|
|
1. devfreq_cooling.c: The variable *tz is not used in
devfreq_cooling_get_requested_power(), devfreq_cooling_state2power()
and devfreq_cooling_power2state().
2. cpufreq_cooling.c: After 84fe2cab48590, the variable *tz is not used
anymore in cpufreq_get_requested_power(), cpufreq_state2power() and
cpufreq_power2state().
Remove the variable *tz.
Signed-off-by: zhuguangqing <zhuguangqing@xiaomi.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200914071101.13575-1-zhuguangqing83@gmail.com
|
|
* pm-em:
OPP: refactor dev_pm_opp_of_register_em() and update related drivers
Documentation: power: update Energy Model description
PM / EM: change name of em_pd_energy to em_cpu_energy
PM / EM: remove em_register_perf_domain
PM / EM: add support for other devices than CPUs in Energy Model
PM / EM: update callback structure and add device pointer
PM / EM: introduce em_dev_register_perf_domain function
PM / EM: change naming convention from 'capacity' to 'performance'
* pm-core:
mmc: jz4740: Use pm_ptr() macro
PM: Make *_DEV_PM_OPS macros use __maybe_unused
PM: core: introduce pm_ptr() macro
|
|
The function cpu_power_to_freq is used to find a frequency and set the
cooling device to consume at most the power to be converted. For example,
if the power to be converted is 80mW, and the em table is as follow.
struct em_cap_state table[] = {
/* KHz mW */
{ 1008000, 36, 0 },
{ 1200000, 49, 0 },
{ 1296000, 59, 0 },
{ 1416000, 72, 0 },
{ 1512000, 86, 0 },
};
The target frequency should be 1416000KHz, not 1512000KHz.
Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables")
Cc: <stable@vger.kernel.org> # v4.13+
Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200619090825.32747-1-finley.xiao@rock-chips.com
|
|
The Energy Model uses concept of performance domain and capacity states in
order to calculate power used by CPUs. Change naming convention from
capacity to performance state would enable wider usage in future, e.g.
upcoming support for other devices other than CPUs.
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
cpufreq_cooling cannot be modular, remove the unnecessary module.h
include and replace with export.h to handle EXPORT_SYMBOL family of
macros.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/7a439e41e91d8bc5ff99207f99723fcf04ca36eb.1589199124.git.amit.kucheria@linaro.org
|
|
Sort headers to make it easier to read and find duplicate headers.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/4231f5dfe758b9bf716981be71cadf9642c83528.1589199124.git.amit.kucheria@linaro.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Convert tsens configuration DT binding to yaml (Rajeshwari)
- Add interrupt support on the rcar sensor (Niklas Söderlund)
- Add a new Spreadtrum thermal driver (Baolin Wang)
- Add thermal binding for the fsl scu board, a new API to retrieve the
sensor id bound to the thermal zone and i.MX system controller sensor
(Anson Huang))
- Remove warning log when a deferred probe is requested on Exynos
(Marek Szyprowski)
- Add the thermal monitoring unit support for imx8mm with its DT
bindings (Anson Huang)
- Rephrase the Kconfig text for clarity (Linus Walleij)
- Use the gpio descriptor for the ti-soc-thermal (Linus Walleij)
- Align msg structure to 4 bytes for i.MX SC, fix the Kconfig
dependency, add the __may_be unused annotation for PM functions and
the COMPILE_TEST option for imx8mm (Anson Huang)
- Fix a dependency on regmap in Kconfig for qoriq (Yuantian Tang)
- Add DT binding and support for the rcar gen3 r8a77961 and improve the
error path on the rcar init function (Niklas Söderlund)
- Cleanup and improvements for the tsens Qcom sensor (Amit Kucheria)
- Improve code by removing lock and caching values in the rcar thermal
sensor (Niklas Söderlund)
- Cleanup in the qoriq drivers and add a call to
imx_thermal_unregister_legacy_cooling in the removal function (Anson
Huang)
- Remove redundant 'maxItems' in tsens and sprd DT bindings (Rob
Herring)
- Change the thermal DT bindings by making the cooling-maps optional
(Yuantian Tang)
- Add Tiger Lake support (Sumeet Pawnikar)
- Use scnprintf() for avoiding potential buffer overflow (Takashi Iwai)
- Make pkg_temp_lock a raw_spinlock_t(Clark Williams)
- Fix incorrect data types by changing them to signed on i.MX SC (Anson
Huang)
- Replace zero-length array with flexible-array member (Gustavo A. R.
Silva)
- Add support for i.MX8MP in the driver and in the DT bindings (Anson
Huang)
- Fix return value of the cpufreq_set_cur_state() function (Willy
Wolff)
- Remove abusing and scary WARN_ON in the cpufreq cooling device
(Daniel Lezcano)
- Fix build warning of incorrect argument type reported by sparse on
imx8mm (Anson Huang)
- Fix stub for the devfreq cooling device (Martin Blumenstingl)
- Fix cpu idle cooling documentation (Sergey Vidishev)
* tag 'thermal-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (52 commits)
Documentation: cpu-idle-cooling: Fix diagram for 33% duty cycle
thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
thermal: imx8mm: Fix build warning of incorrect argument type
thermal/drivers/cpufreq_cooling: Remove abusing WARN_ON
thermal/drivers/cpufreq_cooling: Fix return of cpufreq_set_cur_state
thermal: imx8mm: Add i.MX8MP support
dt-bindings: thermal: imx8mm-thermal: Add support for i.MX8MP
thermal: qcom: tsens.h: Replace zero-length array with flexible-array member
thermal: imx_sc_thermal: Fix incorrect data type
thermal: int340x_thermal: Use scnprintf() for avoiding potential buffer overflow
thermal: int340x: processor_thermal: Add Tiger Lake support
thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
dt-bindings: thermal: make cooling-maps property optional
dt-bindings: thermal: qcom-tsens: Remove redundant 'maxItems'
dt-bindings: thermal: sprd: Remove redundant 'maxItems'
thermal: imx: Calling imx_thermal_unregister_legacy_cooling() in .remove
thermal: qoriq: Sort includes alphabetically
thermal: qoriq: Use devm_add_action_or_reset() to handle all cleanups
thermal: rcar_thermal: Remove lock in rcar_thermal_get_current_temp()
thermal: rcar_thermal: Do not store ctemp in rcar_thermal_priv
...
|
|
The WARN_ON macros are used at the entry functions state2power() and
set_cur_state().
state2power() is called with the max_state retrieved from
get_max_state which returns cpufreq_cdev->max_level, then it check if
max_state is > cpufreq_cdev->max_level. The test does not really makes
sense but let's assume we want to make sure to catch an error if the
code evolves. However the WARN_ON is overkill.
set_cur_state() is also called from userspace if we write to the
sysfs. It is easy to see a stack dumped by just writing to sysfs
/sys/class/thermal/cooling_device0/cur_state a value greater than
"max_level". A bit scary. Returing -EINVAL is enough.
Remove these WARN_ON.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20200321193107.21590-1-daniel.lezcano@linaro.org
|
|
When setting the cooling device current state from userspace via sysfs,
the operation fails by returning an -EINVAL.
It appears the recent changes with the per-policy frequency QoS
introduced a regression as reported by:
https://lkml.org/lkml/2020/3/20/599
The function freq_qos_update_request returns 0 or 1 describing update
effectiveness, and a negative error code on failure. However,
cpufreq_set_cur_state returns 0 on success or an error code otherwise.
Consider the QoS update as successful if the function does not return
an error.
Fixes: 3000ce3c52f8b ("cpufreq: Use per-policy frequency QoS")
Signed-off-by: Willy Wolff <willy.mh.wolff.ml@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200321092740.7vvwfxsebcrznydh@macmini.local
|
|
capping
Thermal governors can request for a CPU's maximum supported frequency to
be capped in case of an overheat event. This in turn means that the
maximum capacity available for tasks to run on the particular CPU is
reduced. Delta between the original maximum capacity and capped maximum
capacity is known as thermal pressure. Enable cpufreq cooling device to
update the thermal pressure in event of a capped maximum frequency.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-9-thara.gopinath@linaro.org
|
|
As we introduced the idle injection cooling device called
cpuidle_cooling, let's be consistent and rename the cpu_cooling to
cpufreq_cooling as this one mitigates with OPPs changes.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20191219225317.17158-3-daniel.lezcano@linaro.org
|