path: root/Documentation/admin-guide/pm
Age | Commit message | Author | Files | Lines
2020-09-01 | cpufreq: intel_pstate: Tweak the EPP sysfs interface | Rafael J. Wysocki | 1 | -1/+3
Modify the EPP sysfs interface to reject attempts to change the EPP to values different from 0 ("performance") in the active mode with the "performance" policy (ie. scaling_governor set to "performance"), to avoid situations in which the kernel appears to discard data passed to it via the EPP sysfs attribute. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
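As a rough illustration of the rule this commit describes, the Python sketch below models which EPP writes would be accepted. The function name, the string arguments and the 0..255 range check are assumptions made for the example; the driver itself is kernel C code and is not reproduced here.

```python
def epp_write_allowed(driver_mode: str, scaling_governor: str, epp: int) -> bool:
    """Model (not the kernel code) of the EPP acceptance rule described above."""
    # Assumption: EPP is treated as an 8-bit quantity, so 0..255 is the valid range.
    if not 0 <= epp <= 255:
        return False
    # Active mode with the "performance" policy: only EPP 0 ("performance") is accepted.
    if driver_mode == "active" and scaling_governor == "performance":
        return epp == 0
    # Any other combination is left unrestricted in this sketch.
    return True
```

Rejecting the write outright, rather than accepting and then ignoring it, is what keeps user space from believing that a value took effect when it did not.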
2020-08-24 | Documentation: fix pm/intel_pstate build warning and wording | Randy Dunlap | 1 | -2/+2
Fix documentation build warning and sentence wording: Documentation/admin-guide/pm/intel_pstate.rst:568: WARNING: Unexpected indentation. Fixes: f473bf398bf1 ("cpufreq: intel_pstate: Allow raw energy performance preference value") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-08-14 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki | 1 | -46/+43
* pm-cpufreq: cpufreq: intel_pstate: Implement passive mode with HWP enabled
2020-08-11 | cpufreq: intel_pstate: Implement passive mode with HWP enabled | Rafael J. Wysocki | 1 | -46/+43
Allow intel_pstate to work in the passive mode with HWP enabled and make it set the HWP minimum performance limit (HWP floor) to the P-state value given by the target frequency supplied by the cpufreq governor, so as to prevent the HWP algorithm and the CPU scheduler from working against each other, at least when the schedutil governor is in use, and update the intel_pstate documentation accordingly. Among other things, this allows utilization clamps to be taken into account, at least to a certain extent, when intel_pstate is in use and makes it more likely that sufficient capacity for deadline tasks will be provided. After this change, the resulting behavior of an HWP system with intel_pstate in the passive mode should be close to the behavior of the analogous non-HWP system with intel_pstate in the passive mode, except that the HWP algorithm is generally allowed to make the CPU run at a frequency above the floor P-state set by intel_pstate in the entire available range of P-states, while without HWP a CPU can run in a P-state above the requested one if the latter falls into the range of turbo P-states (referred to as the turbo range) or if the P-states of all CPUs in one package are coordinated with each other at the hardware level. [Note that in principle the HWP floor may not be taken into account by the processor if it falls into the turbo range, in which case the processor has a license to choose any P-state, either below or above the HWP floor, just like a non-HWP processor in the case when the target P-state falls into the turbo range.] With this change applied, intel_pstate in the passive mode assumes complete control over the HWP request MSR and concurrent changes of that MSR (eg. via the direct MSR access interface) are overridden by it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2020-08-04 | Merge tag 'docs-5.9' of git://git.lwn.net/linux | Linus Torvalds | 2 | -3/+3
Pull documentation updates from Jonathan Corbet:
"It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:)
- Lots of typo fixes, warning fixes, and more"
* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-07-05 | Documentation/admin-guide: intel-speed-select: drop doubled words | Randy Dunlap | 1 | -2/+2
Drop the doubled words "that" and "and". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: platform-driver-x86@vger.kernel.org Link: https://lore.kernel.org/r/20200704032020.21923-11-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-07-05 | Documentation/admin-guide: intel_pstate: drop doubled word | Randy Dunlap | 1 | -1/+1
Drop the doubled word "to". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: linux-pm@vger.kernel.org Link: https://lore.kernel.org/r/20200704032020.21923-10-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-07-02 | cpufreq: Specify default governor on command line | Quentin Perret | 1 | -3/+3
Currently, the only way to specify the default CPUfreq governor is via Kconfig options, which suits users who can build the kernel themselves perfectly. However, for those who use a distro-like kernel (such as Android, with the Generic Kernel Image project), the only way to use a non-default governor is to boot to userspace, and to then switch using the sysfs interface. Being able to specify the default governor on the command line, as is the case for cpuidle, would allow those users to specify their governor of choice earlier on, and to simplify the userspace boot procedure slightly. To support this use-case, add a kernel command line parameter allowing the default governor for CPUfreq to be specified, which takes precedence over the built-in default. This implementation has one notable limitation: the default governor must be registered before the driver. This is solved for builtin governors and drivers using appropriate *_initcall() functions. And in the modular case, this must be reflected as a constraint on the module loading order. Signed-off-by: Quentin Perret <qperret@google.com> [ Viresh: Converted 'default_governor' to a string and parsing it only at initcall level, and several updates to cpufreq_init_policy(). ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
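The precedence rule and the registration-order limitation in this commit message can be sketched with a small Python model. The function and argument names are hypothetical, and falling back to the built-in default when the requested governor is not yet registered is an assumption of this sketch, not a statement about the kernel code.

```python
from typing import Iterable, Optional


def pick_default_governor(cmdline_governor: Optional[str],
                          registered_governors: Iterable[str],
                          builtin_default: str) -> str:
    """Model of 'command line takes precedence over the built-in default'."""
    registered = set(registered_governors)
    # The command-line choice can only be honoured if that governor is
    # already registered when the driver initializes its policy.
    if cmdline_governor and cmdline_governor in registered:
        return cmdline_governor
    # Assumed fallback: use the Kconfig (built-in) default.
    return builtin_default
```

In the modular case, the second branch is what a user would hit if the governor module were loaded after the driver.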
2020-07-02 | cpufreq: intel_pstate: Allow raw energy performance preference value | Srinivas Pandruvada | 1 | -1/+5
Currently, using the attribute "energy_performance_preference", user space can write one of the four pre-defined preference strings. These preference strings get mapped to a hard-coded Energy-Performance Preference (EPP) or Energy-Performance Bias (EPB) knob. These four values are supposed to cover a broad spectrum of use cases, but are not uniformly distributed in the range. There are a number of cases where this is not enough. For example: Suppose the user wants more performance when connected to AC. Instead of using the default "balance performance", the "performance" setting can be used. This changes the EPP value from 0x80 to 0x00. But setting EPP to 0 results in electrical and thermal issues on some platforms. This results in aggressive throttling, which causes a drop in performance. But some value between 0x80 and 0x00 results in better performance. But that value can't be fixed as the power curve is not linear. In some cases just changing EPP from 0x80 to 0x75 is enough to get a significant performance gain. Similarly, on battery the default "balance_performance" mode can be aggressive in power consumption. But picking the next choice, "balance power", results in too much loss of performance, which results in a bad user experience in use cases like "Google Hangout". It was observed that some value between these two EPP values is optimal. This change allows fine-grained EPP tuning for platforms like Chromebook or for users who want to fine-tune power and performance. Here, based on the product and use cases, different EPP values can be set. This change is similar to the change done for: /sys/devices/system/cpu/cpu*/power/energy_perf_bias where the user has the choice to write a predefined string or a raw value. The change itself is trivial. When the user preference doesn't match the predefined string preferences and the value is an unsigned integer and in range, use that value for EPP. When the EPP feature is not present, writing a raw value is not supported.
Suggested-by: Len Brown <lenb@kernel.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
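The parsing rule at the end of this commit message (predefined string first, otherwise a raw in-range unsigned integer, and only when EPP is present) can be sketched in Python. Only the 0x00 and 0x80 values come from the message; the 0xC0 and 0xFF entries, the decimal-only parsing, the 0..255 range and all names are assumptions of this sketch.

```python
# String-to-EPP mapping. 0x00 and 0x80 are quoted in the commit message;
# 0xC0 and 0xFF are assumed values for the remaining two strings.
EPP_STRINGS = {
    "performance": 0x00,
    "balance_performance": 0x80,
    "balance_power": 0xC0,
    "power": 0xFF,
}


def parse_epp(text: str, epp_supported: bool = True) -> int:
    """Return the EPP value for a write, or raise ValueError if it is rejected."""
    text = text.strip()
    # A predefined preference string always wins.
    if text in EPP_STRINGS:
        return EPP_STRINGS[text]
    # Raw values are only supported when the EPP feature is present.
    if not epp_supported:
        raise ValueError("raw EPP values need the EPP feature")
    # Assumption: raw values are plain decimal unsigned integers.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("not a predefined string or an unsigned integer")
    value = int(text)
    if value > 0xFF:
        raise ValueError("EPP value out of range")
    return value
```

With this model, writing `117` selects the 0x75 value mentioned above, which none of the four strings can express.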
2020-07-02 | cpufreq: intel_pstate: Allow enable/disable energy efficiency | Srinivas Pandruvada | 1 | -0/+11
By default the intel_pstate driver disables energy efficiency by setting MSR_IA32_POWER_CTL bit 19 for the Kaby Lake desktop CPU model in HWP mode. This CPU model is also shared by Coffee Lake desktop CPUs. This allows these systems to reach the maximum possible frequency. But this adds a power penalty, which some customers don't want. They want some way to enable/disable it dynamically. So, add an additional attribute "energy_efficiency" under /sys/devices/system/cpu/intel_pstate/ for these CPU models. This allows reading and writing bit 19 ("Disable Energy Efficiency Optimization") in the MSR IA32_POWER_CTL. This attribute is present in both HWP and non-HWP mode as this has an effect in both modes. Refer to the Intel Software Developer's Manual for details. The scope of this bit is package wide. Also these systems are single package systems. So reading/writing the MSR on the current CPU is enough. The energy efficiency (EE) bit setting needs to be preserved during suspend/resume and CPU offline/online operation. To do this: - Restore the EE setting from the cpufreq resume() callback, if there is a change from the system default. - By default, don't disable EE from the cpufreq init() callback for matching CPU models. Since the scope is package wide and this is a single package system, move the disable EE calls from the init() callback to the intel_pstate_init() function, which is called only once. Suggested-by: Len Brown <lenb@kernel.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
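Because bit 19 is a "disable" bit, its sense is inverted relative to the attribute name, which is easy to get wrong. The following Python sketch only shows the bit arithmetic on a plain integer standing in for the MSR value; the helper names are hypothetical and no MSR is actually read or written.

```python
# Bit 19 of IA32_POWER_CTL: "Disable Energy Efficiency Optimization".
EE_DISABLE_MASK = 1 << 19


def energy_efficiency_enabled(power_ctl: int) -> bool:
    """Energy efficiency is enabled when the disable bit is clear."""
    return (power_ctl & EE_DISABLE_MASK) == 0


def set_energy_efficiency(power_ctl: int, enable: bool) -> int:
    """Return a new register value with only bit 19 changed."""
    if enable:
        return power_ctl & ~EE_DISABLE_MASK  # clear the disable bit
    return power_ctl | EE_DISABLE_MASK       # set the disable bit
```

All other bits of the value pass through unchanged, which matters because the register carries unrelated controls as well.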
2020-06-01 | Merge branches 'pm-devfreq', 'powercap', 'pm-docs' and 'pm-tools' | Rafael J. Wysocki | 2 | -0/+918
* pm-devfreq:
  PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex
  PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR
  PM / devfreq: Replace strncpy with strscpy
  PM / devfreq: imx: Register interconnect device
  PM / devfreq: Add generic imx bus scaling driver
  PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe()
  PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting
* powercap:
  powercap: RAPL: remove unused local MSR define
  powercap/intel_rapl: add support for ElkhartLake
* pm-docs:
  Documentation: admin-guide: pm: Document intel-speed-select
* pm-tools:
  cpupower: Remove unneeded semicolon
2020-06-01 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki | 1 | -13/+19
* pm-cpufreq:
  cpufreq: Fix up cpufreq_boost_set_sw()
  cpufreq: fix minor typo in struct cpufreq_driver doc comment
  cpufreq: qoriq: Add platform dependencies
  clk: qoriq: add cpufreq platform device
  cpufreq: qoriq: convert to a platform driver
  cpufreq: qcom: fix wrong compatible binding
  cpufreq: imx-cpufreq-dt: support i.MX7ULP
  cpufreq: dt: Add support for r8a7742
  cpufreq: Add i.MX7ULP to cpufreq-dt-platdev blacklist
  cpufreq: omap: Build driver by default for ARCH_OMAP2PLUS
  cpufreq: intel_pstate: Use passive mode by default without HWP
2020-05-19 | Documentation: admin-guide: pm: Document intel-speed-select | Srinivas Pandruvada | 2 | -0/+918
Added documentation to configure servers to use Intel(R) Speed Select Technology using intel-speed-select tool. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Andriy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-19 | Documentation: cpuidle: update the document | Hanjun Guo | 1 | -11/+9
Update the document after the removal of cpuidle_sysfs_switch. Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-17 | cpufreq: intel_pstate: Use passive mode by default without HWP | Rafael J. Wysocki | 1 | -13/+19
After recent changes allowing scale-invariant utilization to be used on x86, the schedutil governor on top of intel_pstate in the passive mode should be on par with (or better than) the active mode "powersave" algorithm of intel_pstate on systems in which hardware-managed P-states (HWP) are not used, so it should not be necessary to use the internal scaling algorithm in those cases. Accordingly, modify intel_pstate to start in the passive mode by default if the processor at hand does not support HWP or if the driver is requested to avoid using HWP through the kernel command line. Among other things, that will allow utilization clamps and the support for RT/DL tasks in the schedutil governor to be utilized on systems in which intel_pstate is used. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
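The default-mode decision described in this commit message reduces to a two-input rule, sketched below in Python. The function and argument names are hypothetical; the sketch also assumes that the active mode remains the default when HWP is in use, and it ignores any explicit mode request on the command line.

```python
def default_operation_mode(cpu_supports_hwp: bool,
                           hwp_disabled_on_cmdline: bool) -> str:
    """Model of the default intel_pstate mode after this change."""
    # HWP available and not disabled on the command line:
    # assumed to keep the previous default (active mode).
    if cpu_supports_hwp and not hwp_disabled_on_cmdline:
        return "active"
    # No HWP support, or HWP use was declined on the command line:
    # start in the passive mode.
    return "passive"
```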
2020-04-03 | Documentation: PM: sleep: Document system-wide suspend code flows | Rafael J. Wysocki | 2 | -0/+271
Add a document describing high-level system-wide suspend code flows in Linux. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-30 | Merge branches 'pm-devfreq', 'powercap' and 'pm-docs' | Rafael J. Wysocki | 2 | -0/+275
* pm-devfreq:
  PM / devfreq: Get rid of some doc warnings
  PM / devfreq: Fix handling dev_pm_qos_remove_request result
  PM / devfreq: Fix a typo in a comment
  PM / devfreq: Change to DEVFREQ_GOV_UPDATE_INTERVAL event name
  PM / devfreq: Remove unneeded extern keyword
  PM / devfreq: Use constant name of userspace governor
* powercap:
  powercap: idle_inject: Replace zero-length array with flexible-array member
* pm-docs:
  docs: cpu-freq: convert cpufreq-stats.txt to ReST
  docs: cpu-freq: convert cpu-drivers.txt to ReST
  docs: cpu-freq: convert core.txt to ReST
  docs: cpu-freq: convert index.txt to ReST
  docs: cpufreq: fix a broken reference
  Documentation: cpufreq: Move legacy driver documentation
2020-03-30 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki | 1 | -2/+2
* pm-cpufreq:
  cpufreq: intel_pstate: Simplify intel_pstate_cpu_init()
  cpufreq: qcom: Add support for krait based socs
  cpufreq: imx6q-cpufreq: Improve the logic of -EPROBE_DEFER handling
  cpufreq: Use scnprintf() for avoiding potential buffer overflow
  Documentation: intel_pstate: update links for references
  cpufreq: intel_pstate: Consolidate policy verification
  cpufreq: dt: Allow platform specific intermediate callbacks
  cpufreq: imx-cpufreq-dt: Correct i.MX8MP's market segment fuse location
  cpufreq: imx6q: read OCOTP through nvmem for imx6q
  cpufreq: imx6q: fix error handling
  cpufreq: imx-cpufreq-dt: Add "cpu-supply" property check
  cpufreq: ti-cpufreq: Add support for OPP_PLUS
  cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
2020-03-14 | Documentation: intel_pstate: update links for references | Alex Hung | 1 | -2/+2
URLs for presentation and Intel Software Developer’s Manual are updated as they were using "http" which are gradually replaced by "https". Signed-off-by: Alex Hung <alex.hung@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-02 | Documentation: cpufreq: Move legacy driver documentation | Rafael J. Wysocki | 2 | -0/+275
There are three legacy driver documents in Documentation/cpu-freq/ that were added years ago and converting them each to the .rst format is rather pointless, even though there is some value in preserving them. However, if they are preserved, they need to go into the admin-guide part of cpufreq documentation where they belong (at least to a certain extent). To preserve them with minimum amount of changes and put them into the right place, and make it possible to process them into HTML (and other formats) along with the rest of the documentation, move them each as a "literal text" block into a separate section of a single .rst "wrapper" file under Documentation/admin-guide/pm/. While at it, repair the PCC specification URL in one of them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-14 | Documentation: PM: QoS: Update to reflect previous code changes | Rafael J. Wysocki | 1 | -37/+36
Update the PM QoS documentation to reflect the previous code changes regarding the removal of PM QoS classes and the CPU latency QoS API rework. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
2020-02-07 | Merge branches 'pm-avs' and 'pm-cpuidle' | Rafael J. Wysocki | 2 | -8/+30
* pm-avs:
  power: avs: qcom-cpr: Avoid clang -Wsometimes-uninitialized in cpr_scale
  power: avs: qcom-cpr: add unspecified HAS_IOMEM dependency
  PM / AVS: rockchip-io: fix the supply naming for the emmc supply on px30
  power: avs: qcom-cpr: add a printout after the driver has been initialized
* pm-cpuidle:
  cpuidle: Documentation: Clean up PM QoS description
  intel_idle: Introduce 'states_off' module parameter
  intel_idle: Introduce 'use_acpi' module parameter
2020-02-05 | cpuidle: Documentation: Clean up PM QoS description | Rafael J. Wysocki | 1 | -4/+4
Clean up the language in one paragraph in the PM QoS description in Documentation/admin-guide/pm/cpuidle.rst. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-03 | Documentation: admin-guide: PM: Update sleep states documentation | Rafael J. Wysocki | 1 | -17/+59
There is some information in Documentation/power/interface.rst that is still missing from Documentation/admin-guide/pm/sleep-states.rst and really should be present in there, so update the latter by adding that information to it and delete the former (as it becomes redundant after that and it is somewhat outdated). While at it, clean up some assorted pieces of sleep-states.rst a bit. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-03 | intel_idle: Introduce 'states_off' module parameter | Rafael J. Wysocki | 1 | -1/+18
In certain system configurations it may not be desirable to use some C-states assumed to be available by intel_idle and the driver needs to be prevented from using them even before the cpuidle sysfs interface becomes accessible to user space. Currently, the only way to achieve that is by setting the 'max_cstate' module parameter to a value lower than the index of the shallowest of the C-states in question, but that may be overly intrusive, because it effectively makes all of the idle states deeper than the 'max_cstate' one go away (and the C-state to avoid may be in the middle of the range normally regarded as available). To allow that limitation to be overcome, introduce a new module parameter called 'states_off' to represent a list of idle states to be disabled by default in the form of a bitmask and update the documentation to cover it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
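To make the bitmask form of 'states_off' concrete, here is a small Python sketch that converts between a list of idle state indices and a mask. It assumes that bit i of the mask corresponds to idle state index i; the helper names are hypothetical and nothing here touches the module parameter itself.

```python
from typing import Iterable, List


def states_off_mask(indices: Iterable[int]) -> int:
    """Build a bitmask with one bit set per idle state index to disable."""
    mask = 0
    for index in indices:
        if index < 0:
            raise ValueError("idle state indices are non-negative")
        mask |= 1 << index
    return mask


def disabled_states(mask: int) -> List[int]:
    """Inverse helper: list the state indices whose bits are set in the mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]
```

Under that assumption, disabling only states 1 and 3 gives a mask of 10, leaving the deeper states available, which is the case 'max_cstate' cannot express.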
2020-02-03 | intel_idle: Introduce 'use_acpi' module parameter | Rafael J. Wysocki | 1 | -4/+9
For diagnostics, it is generally useful to be able to make intel_idle take the system's ACPI tables into consideration even if that is not required for the processor model in there, so introduce a new module parameter, 'use_acpi', to make that happen and update the documentation to cover it. While at it, fix the 'no_acpi' module parameter name in the documentation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-01-15 | Documentation: admin-guide: PM: Add intel_idle document | Rafael J. Wysocki | 2 | -0/+247
Add an admin-guide document for the intel_idle driver to describe how it works: how it enumerates idle states, what happens during the initialization of it, how it can be controlled via the kernel command line and so on. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
2019-12-27 | cpuidle: Allow idle states to be disabled by default | Rafael J. Wysocki | 1 | -0/+3
In certain situations it may be useful to prevent some idle states from being used by default while allowing user space to enable them later on. For this purpose, introduce a new state flag, CPUIDLE_FLAG_OFF, to mark idle states that should be disabled by default, make the core set CPUIDLE_STATE_DISABLED_BY_USER for those states at the initialization time and add a new state attribute in sysfs, "default_status", to inform user space of the initial status of the given idle state ("disabled" if CPUIDLE_FLAG_OFF is set for it, "enabled" otherwise). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-05-06 | Merge branches 'pm-docs' and 'pm-misc' | Rafael J. Wysocki | 8 | -22/+58
* pm-docs:
  Documentation: PM: Unify copyright notices
  Documentation: PM: Add SPDX license tags to multiple files
  cpufreq: intel_pstate: Documentation: Add references sections
* pm-misc:
  firmware/psci: add support for SYSTEM_RESET2
  drivers: firmware: psci: Announce support for OS initiated suspend mode
  drivers: firmware: psci: Simplify error path of psci_dt_init()
  drivers: firmware: psci: Split psci_dt_cpu_init_idle()
  MAINTAINERS: Update files for PSCI
  drivers: firmware: psci: Move psci to separate directory
2019-04-08 | admin-guide: pm: intel_epb: Add SPDX license tag and copyright notice | Rafael J. Wysocki | 1 | -0/+8
Add an SPDX license tag and a copyright notice to the intel_epb.rst file under Documentation/admin-guide/pm. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-08 | Documentation: PM: Unify copyright notices | Rafael J. Wysocki | 5 | -10/+19
Unify copyright notices in the .rst files under Documentation/driver-api/pm and Documentation/admin-guide/pm. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-08 | Documentation: PM: Add SPDX license tags to multiple files | Rafael J. Wysocki | 8 | -0/+16
Add SPDX license tags to .rst files under Documentation/driver-api/pm and Documentation/admin-guide/pm. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-08 | cpufreq: intel_pstate: Documentation: Add references sections | Rafael J. Wysocki | 2 | -12/+23
Add separate references sections to the cpufreq.rst and intel_pstate.rst documents under admin-guide/pm and list the references to external documentation in there. Update the ACPI specification URL while at it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-07 | PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface | Rafael J. Wysocki | 1 | -0/+27
The Performance and Energy Bias Hint (EPB) is expected to be set by user space through the generic MSR interface, but that interface is not particularly nice and there are security concerns regarding it, so it is not always available. For this reason, add a sysfs interface for reading and updating the EPB, in the form of a new attribute, energy_perf_bias, located under /sys/devices/system/cpu/cpu#/power/ for online CPUs that support the EPB feature. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Borislav Petkov <bp@suse.de>
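A minimal Python sketch of reading the attribute this commit adds is shown below. The path layout follows the commit message; the helper names, the `sysfs_root` parameter (there so the code can be exercised against a scratch directory) and the assumption that the file holds a single decimal number are not taken from the source.

```python
from pathlib import Path
from typing import Optional


def epb_path(cpu: int, sysfs_root: str = "/sys") -> Path:
    """Path of the energy_perf_bias attribute for one CPU."""
    return (Path(sysfs_root) / "devices" / "system" / "cpu"
            / f"cpu{cpu}" / "power" / "energy_perf_bias")


def read_epb(cpu: int, sysfs_root: str = "/sys") -> Optional[int]:
    """Return the EPB value, or None if the attribute is absent.

    The attribute is expected to be missing for offline CPUs and for
    CPUs without the EPB feature.
    """
    path = epb_path(cpu, sysfs_root)
    if not path.exists():
        return None
    # Assumption: the attribute contains one decimal integer.
    return int(path.read_text().strip())
```

Treating a missing file as `None` rather than an error keeps callers simple on systems where only some CPUs are online.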
2019-04-07 | PM / arch: x86: Rework the MSR_IA32_ENERGY_PERF_BIAS handling | Rafael J. Wysocki | 2 | -0/+7
The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is problematic, because it may cause changes made by user space to that MSR (with the help of the x86_energy_perf_policy tool, for example) to be lost every time a CPU goes offline and then back online as well as during system-wide power management transitions into sleep states and back into the working state. The first problem is that if the current EPB value for a CPU going online is 0 ('performance'), the kernel will change it to 6 ('normal') regardless of whether or not this is the first bring-up of that CPU. That also happens during system-wide resume from sleep states (including, but not limited to, hibernation). However, the EPB may have been adjusted by user space this way and the kernel should not blindly override that setting. The second problem is that if the platform firmware resets the EPB values for any CPUs during system-wide resume from a sleep state, the kernel will not restore their previous EPB values that may have been set by user space before the preceding system-wide suspend transition. Again, that behavior may at least be confusing from the user space perspective. In order to address these issues, rework the handling of MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU offline and restored on CPU online as well as (for the boot CPU) during the syscore stages of system-wide suspend and resume transitions, respectively. However, retain the policy by which the EPB is set to 6 ('normal') on the first bring-up of each CPU if its initial value is 0, based on the observation that 0 may mean 'not initialized' just as well as 'performance' in that case. While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into a separate file and document it in Documentation/admin-guide. 
Fixes: abe48b108247 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS) Fixes: b51ef52df71c (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume) Reported-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de>
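The save/restore policy described in this commit message can be modelled in a few lines of Python. This is a simplified sketch with hypothetical names: it tracks a per-CPU saved value in a dictionary and does not model the syscore suspend/resume path for the boot CPU separately.

```python
EPB_PERFORMANCE = 0  # may also mean "not initialized" on first bring-up
EPB_NORMAL = 6


class EpbModel:
    """Toy model of the reworked EPB handling (not the kernel code)."""

    def __init__(self) -> None:
        self.saved = {}  # cpu number -> EPB value saved at offline time

    def offline(self, cpu: int, current_epb: int) -> None:
        # Save whatever the CPU had, including a user-chosen 0.
        self.saved[cpu] = current_epb

    def online(self, cpu: int, hw_epb: int) -> int:
        """Return the EPB value the CPU should end up with."""
        if cpu in self.saved:
            # Not the first bring-up: restore the saved value, even if
            # firmware reset the hardware value in the meantime.
            return self.saved[cpu]
        # First bring-up: 0 is treated as "not initialized".
        return EPB_NORMAL if hw_epb == EPB_PERFORMANCE else hw_epb
```

The key difference from the old behaviour is the first branch: a user-selected 0 survives an offline/online cycle instead of being rewritten to 6.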
2019-01-16 | cpuidle: New timer events oriented governor for tickless systems | Rafael J. Wysocki | 1 | -8/+96
The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. 
Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-12-21 | Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-cpufreq-sched' | Rafael J. Wysocki | 3 | -1/+641
* pm-cpuidle:
  cpuidle: Add 'above' and 'below' idle state metrics
  cpuidle: big.LITTLE: fix refcount leak
  cpuidle: Add cpuidle.governor= command line parameter
  cpuidle: poll_state: Disregard disable idle states
  Documentation: admin-guide: PM: Add cpuidle document
* pm-cpufreq:
  cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver
  dt-bindings: cpufreq: Introduce QCOM cpufreq firmware bindings
  cpufreq: nforce2: Remove meaningless return
  cpufreq: ia64: Remove unused header files
  cpufreq: imx6q: save one condition block for normal case of nvmem read
  cpufreq: imx6q: remove unused code
  cpufreq: pmac64: add of_node_put()
  cpufreq: powernv: add of_node_put()
  Documentation: intel_pstate: Clarify coordination of P-State limits
  cpufreq: intel_pstate: Force HWP min perf before offline
  cpufreq: s3c24xx: Change to use DEFINE_SHOW_ATTRIBUTE macro
* pm-cpufreq-sched:
  sched/cpufreq: Add the SPDX tags
2018-12-12 | cpuidle: Add 'above' and 'below' idle state metrics | Rafael J. Wysocki | 1 | -0/+10
Add two new metrics for CPU idle states, "above" and "below", to count the number of times the given state had been asked for (or entered from the kernel's perspective), but the observed idle duration turned out to be too short or too long for it (respectively). These metrics help to estimate the quality of the CPU idle governor in use. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
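One way to picture the two counters is the Python sketch below. It assumes that all states are enabled and ordered from shallowest to deepest, that "too short" means shorter than the chosen state's target residency, and that "too long" means at least the target residency of the next deeper state; these criteria and all names are assumptions of the sketch rather than details given in the commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IdleState:
    name: str
    target_residency_us: int
    above: int = 0  # chosen, but the idle period turned out too short
    below: int = 0  # chosen, but the idle period turned out too long


def account(states: List[IdleState], chosen: int, measured_us: int) -> None:
    """Update the counters of the chosen state for one idle period."""
    state = states[chosen]
    if measured_us < state.target_residency_us:
        state.above += 1
    elif (chosen + 1 < len(states)
          and measured_us >= states[chosen + 1].target_residency_us):
        state.below += 1
```

A governor that is doing well keeps both counters small relative to the number of times each state was requested.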
2018-12-11 | cpuidle: Add cpuidle.governor= command line parameter | Rafael J. Wysocki | 1 | -0/+7
Add cpuidle.governor= command line parameter to allow the default cpuidle governor to be replaced. That is useful, for example, if someone running a tickful kernel wants to use the menu governor on it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-03 | Documentation: admin-guide: PM: Add cpuidle document | Rafael J. Wysocki | 2 | -0/+615
Important information is missing from user/admin cpuidle documentation available today, so add a new user/admin document for cpuidle containing current and comprehensive information to admin-guide and drop the old .txt documents it is replacing. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-11-29 | Documentation: intel_pstate: Clarify coordination of P-State limits | Srinivas Pandruvada | 1 | -1/+9
Explain influence of per-core P-states and hyper threading on the effective performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-07 | Documentation: cpufreq: Correct a typo | Zhao Wei Liew | 1 | -1/+1
Fix a typo in the admin-guide documentation for cpufreq. Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-16 | Documentation: intel_pstate: Add base_frequency information | Srinivas Pandruvada | 1 | -0/+7
Updated documentation to explain base_frequency attribute. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-27 | Documentation: intel_pstate: Describe hwp_dynamic_boost sysfs knob | Rafael J. Wysocki | 1 | -0/+11
Document the recently introduced hwp_dynamic_boost sysfs knob allowing user space to tell intel_pstate to use iowait boosting in the active mode with HWP enabled (to improve performance). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2018-06-27 | Documentation: admin-guide: intel_pstate: Fix sysfs path | Rafael J. Wysocki | 1 | -2/+1
Fix an incorrect sysfs path in the intel_pstate admin-guide documentation. Fixes: 33fc30b47098 (cpufreq: intel_pstate: Document the current behavior and user interface) Reported-by: Pawit Pornkitprasan <p.pawit@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-21 | Documentation: intel_pstate: Fix typo | Rafael J. Wysocki | 1 | -1/+1
Fix a typo in the intel_pstate admin-guide documentation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-09 | PM: docs: intel_pstate: fix Active Mode w/o HWP paragraph | Juri Lelli | 1 | -1/+1
P-state selection algorithm (powersave or performance) is selected by echoing the desired choice to scaling_governor sysfs attribute and not to scaling_cur_freq (as currently stated). Fix it. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-09 | PM: docs: sleep-states: Fix a typo ("includig") | Jonathan Neuschäfer | 1 | -1/+1
Fix a typo in admin-guide/pm/sleep-states.rst. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-09-04 | Merge branch 'pm-docs' | Rafael J. Wysocki | 5 | -9/+317
* pm-docs:
  PM: docs: Delete the obsolete states.txt document
  PM: docs: Describe high-level PM strategies and sleep states
2017-09-04 | Merge branch 'intel_pstate' | Rafael J. Wysocki | 1 | -53/+8
* intel_pstate:
  cpufreq: intel_pstate: Shorten a couple of long names
  cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate()
  cpufreq: intel_pstate: Improve IO performance with per-core P-states
  cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL
  cpufreq: intel_pstate: Drop ->update_util from pstate_funcs
  cpufreq: intel_pstate: Do not use PID-based P-state selection