summaryrefslogtreecommitdiffstats
path: root/drivers/cpuidle
AgeCommit message (Collapse)AuthorFilesLines
2011-12-21cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystemKay Sievers3-49/+47
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@amd64.org> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07Merge branch 'release' of ↵Linus Torvalds5-88/+115
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: cpuidle: Single/Global registration of idle states cpuidle: Split cpuidle_state structure and move per-cpu statistics fields cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare() cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning ACPI: Export FADT pm_profile integer value to userspace thermal: Prevent polling from happening during system suspend ACPI: Drop ACPI_NO_HARDWARE_INIT ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast() PNPACPI: Simplify disabled resource registration ACPI: Fix possible recursive locking in hwregs.c ACPI: use kstrdup() mrst pmu: update comment tools/power turbostat: less verbose debugging
2011-11-06cpuidle: Single/Global registration of idle statesDeepthi Dharwar5-53/+68
This patch makes the cpuidle_states structure global (single copy) instead of per-cpu. The statistics needed on per-cpu basis by the governor are kept per-cpu. This simplifies the cpuidle subsystem as state registration is done by single cpu only. Having single copy of cpuidle_states saves memory. Rare case of asymmetric C-states can be handled within the cpuidle driver and architectures such as POWER do not have asymmetric C-states. Having single/global registration of all the idle states, dynamic C-state transitions on x86 are handled by the boot cpu. Here, the boot cpu would disable all the devices, re-populate the states and later enable all the devices, irrespective of the cpu that would receive the notification first. Reference: https://lkml.org/lkml/2011/4/25/83 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06cpuidle: Split cpuidle_state structure and move per-cpu statistics fieldsDeepthi Dharwar2-11/+19
This is the first step towards global registration of cpuidle states. The statistics used primarily by the governor are per-cpu and have to be split from rest of the fields inside cpuidle_state, which would be made global i.e. single copy. The driver_data field is also per-cpu and moved. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()Deepthi Dharwar2-12/+0
The cpuidle_device->prepare() mechanism causes updates to the cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE to tell the governor not to chose a state on a per-cpu basis at run-time. State demotion is now handled by the driver and it returns the actual state entered. Hence, this mechanism is not required. Also this removes per-cpu flags from cpuidle_state enabling it to be made global. Reference: https://lkml.org/lkml/2011/3/25/52 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06cpuidle: Move dev->last_residency update to driver enter routine; remove ↵Deepthi Dharwar3-18/+34
dev->last_state Cpuidle governor only suggests the state to enter using the governor->select() interface, but allows the low level driver to override the recommended state. The actual entered state may be different because of software or hardware demotion. Software demotion is done by the back-end cpuidle driver and can be accounted correctly. Current cpuidle code uses last_state field to capture the actual state entered and based on that updates the statistics for the state entered. Ideally the driver enter routine should update the counters, and it should return the state actually entered rather than the time spent there. The generic cpuidle code should simply handle where the counters live in the sysfs namespace, not updating the counters. Reference: https://lkml.org/lkml/2011/3/25/52 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-10-31cpuidle: ladder.c needs module.h and not just moduleparam.hPaul Gortmaker1-1/+1
This file has module_init/exit and MODULE_LICENSE, and so it needs the full module.h header. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31cpuidle: Add module.h to drivers/cpuidle files as required.Paul Gortmaker2-0/+2
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-08-25PM QoS: Move and rename the implementation filesJean Pihet3-3/+3
The PM QoS implementation files are better named kernel/power/qos.c and include/linux/pm_qos.h. The PM QoS support is compiled under the CONFIG_PM option. Signed-off-by: Jean Pihet <j-pihet@ti.com> Acked-by: markgross <markgross@thegnar.org> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-03cpuidle: stop depending on pm_idleLen Brown1-20/+18
cpuidle users should call cpuidle_call_idle() directly rather than via (pm_idle)() function pointer. Architecture may choose to continue using (pm_idle)(), but cpuidle need not depend on it: my_arch_cpu_idle() ... if(cpuidle_call_idle()) pm_idle(); cc: Kevin Hilman <khilman@deeprootsystems.com> cc: Paul Mundt <lethal@linux-sh.org> cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03cpuidle: replace xen access to x86 pm_idle and default_idleLen Brown1-0/+4
When a Xen Dom0 kernel boots on a hypervisor, it gets access to the raw-hardware ACPI tables. While it parses the idle tables for the hypervisor's beneift, it uses HLT for its own idle. Rather than have xen scribble on pm_idle and access default_idle, have it simply disable_cpuidle() so acpi_idle will not load and architecture default HLT will be used. cc: xen-devel@lists.xensource.com Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03cpuidle: create bootparam "cpuidle.off=1"Len Brown4-0/+17
useful for disabling cpuidle to fall back to architecture-default idle loop cpuidle drivers and governors will fail to register. on x86 they'll say so: intel_idle: intel_idle yielding to (null) ACPI: acpi_idle yielding to (null) Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29Merge branch 'idle-release' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param x86 idle: deprecate "no-hlt" cmdline param x86 idle APM: deprecate CONFIG_APM_CPU_IDLE x86 idle floppy: deprecate disable_hlt() x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it x86 idle: clarify AMD erratum 400 workaround idle governor: Avoid lock acquisition to read pm_qos before entering idle cpuidle: menu: fixed wrapping timers at 4.294 seconds
2011-05-29cpuidle: menu: fixed wrapping timers at 4.294 secondsTero Kristo1-1/+3
Cpuidle menu governor is using u32 as a temporary datatype for storing nanosecond values which wrap around at 4.294 seconds. This causes errors in predicted sleep times resulting in higher than should be C state selection and increased power consumption. This also breaks cpuidle state residency statistics. cc: stable@kernel.org # .32.x through .39.x Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-15Merge branch 'master' into for-nextJiri Kosina1-43/+49
2011-01-19Kill off warning: ‘inline’ is not at beginning of declarationJesper Juhl1-1/+1
Fix a bunch of warning: ‘inline’ is not at beginning of declaration messages when building a 'make allyesconfig' kernel with -Wextra. These warnings are trivial to kill, yet rather annoying when building with -Wextra. The more we can cut down on pointless crap like this the better (IMHO). A previous patch to do this for a 'allnoconfig' build has already been merged. This just takes the cleanup a little further. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-12Merge branch 'cpuidle-perf-events' into idle-testLen Brown1-2/+8
2011-01-12Merge branch 'linus' into idle-testLen Brown1-1/+2
2011-01-12cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle ↵Thomas Renninger1-2/+8
events from the cpuidle layer Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle" events -> this patch fixes it and makes cpu_idle events throwing less complex. It also introduces cpu_idle events for all architectures which use the cpuidle subsystem, namely: - arch/arm/mach-at91/cpuidle.c - arch/arm/mach-davinci/cpuidle.c - arch/arm/mach-kirkwood/cpuidle.c - arch/arm/mach-omap2/cpuidle34xx.c - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait) - arch/x86/kernel/process.c (did throw events before, but was a mess) - drivers/idle/intel_idle.c (did throw events before) Convention should be: Fire cpu_idle events inside the current pm_idle function (not somewhere down the the callee tree) to keep things easy. Current possible pm_idle functions in X86: c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle -> this is really easy is now. This affects userspace: The type field of the cpu_idle power event can now direclty get mapped to: /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...} instead of throwing very CPU/mwait specific values. This change is not visible for the intel_idle driver. For the acpi_idle driver it should only be visible if the vendor misses out C-states in his BIOS. Another (perf timechart) patch reads out cpuidle info of cpu_idle events from: /sys/.../cpuidle/stateX/*, then the cpuidle events are mapped to the correct C-/cpuidle state again, even if e.g. vendors miss out C-states in their BIOS and for example only export C1 and C3. -> everything is fine. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Robert Schoene <robert.schoene@tu-dresden.de> CC: Jean Pihet <j-pihet@ti.com> CC: Arjan van de Ven <arjan@linux.intel.com> CC: Ingo Molnar <mingo@elte.hu> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: linux-pm@lists.linux-foundation.org CC: linux-acpi@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-perf-users@vger.kernel.org CC: linux-omap@vger.kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12cpuidle: delete NOP CPUIDLE_FLAG_POLLLen Brown1-1/+1
it serves no purpose Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12cpuidle: Rename X86 specific idle poll state[0] from C0 to POLLThomas Renninger1-1/+1
C0 means and is well know as "not idle". All documentation out there uses this term as "running"/"not idle" state. Also Linux userspace tools (e.g. cpufreq-aperf and turbostat) show C0 residency which there is correct, but means something totally else than cpuidle "POLL" state. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12cpuidle: Make cpuidle_enable_device() call poll_idle_init()Rafael J. Wysocki1-41/+41
The following scenario is possible with the current cpuidle code and the ACPI cpuidle driver: (1) acpi_processor_cst_has_changed() is called, (2) cpuidle_disable_device() is called, (3) cpuidle_remove_state_sysfs() is called to remove the (presumably outdated) states info from sysfs, (3) acpi_processor_get_power_info() is called, the first entry in the pr->power.states[] table is filled with zeros, (4) acpi_processor_setup_cpuidle() is called and it doesn't fill the first entry in pr->power.states[], (5) cpuidle_enable_device() is called, (6) __cpuidle_register_device() is _not_ called, since the device has already been registered, (7) Consequently, poll_idle_init() is _not_ called either, (8) cpuidle_add_state_sysfs() is called to create the sysfs attributes for the new states and it uses the bogus first table entry from acpi_processor_get_power_info() for creating state0. This problem is avoided if cpuidle_enable_device() unconditionally calls poll_idle_init(). Reported-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com> cc: stable@kernel.org
2011-01-07Merge branch 'for-2.6.38' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.
2011-01-04perf: Clean up power events by introducing new, more generic onesThomas Renninger1-0/+1
Add these new power trace events: power:cpu_idle power:cpu_frequency power:machine_suspend The old C-state/idle accounting events: power:power_start power:power_end Have now a replacement (but we are still keeping the old tracepoints for compatibility): power:cpu_idle and power:power_frequency is replaced with: power:cpu_frequency power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. the type= field got removed from both, it was never used and the type is differed by the event type itself. perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2010-12-17drivers: Replace __get_cpu_var with __this_cpu_read if not used for an address.Christoph Lameter1-1/+1
__get_cpu_var() can be replaced with this_cpu_read and will then use a single read instruction with implied address calculation to access the correct per cpu instance. However, the address of a per cpu variable passed to __this_cpu_read() cannot be determed (since its an implied address conversion through segment prefixes). Therefore apply this only to uses of __get_cpu_var where the addres of the variable is not used. V3->V4: - Move one instance of this_cpu_inc_return to a later patch so that this one can go in without percpu infrastructrure changes. Sedat: fixed compile failure caused by an extra ')'. Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-09-28cpuidle: Fix typosLucas De Marchi1-1/+1
Signed-off-by: Len Brown <len.brown@intel.com>
2010-08-09cpuidle: extend cpuidle and menu governor to handle dynamic statesAi Li2-7/+47
On some SoC chips, HW resources may be in use during any particular idle period. As a consequence, the cpuidle states that the SoC is safe to enter can change from idle period to idle period. In addition, the latency and threshold of each cpuidle state can vary, depending on the operating condition when the CPU becomes idle, e.g. the current cpu frequency, the current state of the HW blocks, etc. cpuidle core and the menu governor, in the current form, are geared towards cpuidle states that are static, i.e. the availabiltiy of the states, their latencies, their thresholds are non-changing during run time. cpuidle does not provide any hook that cpuidle drivers can use to adjust those values on the fly for the current idle period before the menu governor selects the target cpuidle state. This patch extends cpuidle core and the menu governor to handle states that are dynamic. There are three additions in the patch and the patch maintains backwards-compatibility with existing cpuidle drivers. 1) add prepare() to struct cpuidle_device. A cpuidle driver can hook into the callback and cpuidle will call prepare() before calling the governor's select function. The callback gives the cpuidle driver a chance to update the dynamic information of the cpuidle states for the current idle period, e.g. state availability, latencies, thresholds, power values, etc. 2) add CPUIDLE_FLAG_IGNORE as one of the state flags. In the prepare() function, a cpuidle driver can set/clear the flag to indicate to the menu governor whether a cpuidle state should be ignored, i.e. not available, during the current idle period. 3) add power_specified bit to struct cpuidle_device. The menu governor currently assumes that the cpuidle states are arranged in the order of increasing latency, threshold, and power savings. This is true or can be made true for static states. Once the state parameters are dynamic, the latencies, thresholds, and power savings for the cpuidle states can increase or decrease by different amounts from idle period to idle period. So the assumption of increasing latency, threshold, and power savings from Cn to C(n+1) can no longer be guaranteed. It can be straightforward to calculate the power consumption of each available state and to specify it in power_usage for the idle period. Using the power_usage fields, the menu governor then selects the state that has the lowest power consumption and that still satisfies all other critieria. The power_specified bit defaults to 0. For existing cpuidle drivers, cpuidle detects that power_specified is 0 and fills in a dummy set of power_usage values. Signed-off-by: Ai Li <aili@codeaurora.org> Cc: Len Brown <len.brown@intel.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-03[CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independentThomas Renninger1-1/+1
and fix the broken case if a core's frequency depends on others. trace_power_frequency was only implemented in a rather ungeneric way in acpi-cpufreq driver's target() function only. -> Move the call to trace_power_frequency to cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE notifier is triggered. This will support power frequency tracing by all cpufreq drivers trace_power_frequency did not trace frequency changes correctly when the userspace governor was used or when CPU cores' frequency depend on each other. -> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu which gets switched automatically fixes this. Robert Schoene provided some important fixes on top of my initial quick shot version which are integrated in this patch: - Forgot some changes in power_end trace (TP_printk/variable names) - Variable dummy in power_end must now be cpu_id - Use static 64 bit variable instead of unsigned int for cpu_id Signed-off-by: Thomas Renninger <trenn@suse.de> CC: davej@redhat.com CC: arjan@infradead.org CC: linux-kernel@vger.kernel.org CC: robert.schoene@tu-dresden.de Tested-by: robert.schoene@tu-dresden.de Signed-off-by: Dave Jones <davej@redhat.com>
2010-07-01sched: Cure nr_iowait_cpu() usersPeter Zijlstra1-2/+2
Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us()) broke things by not making sure preemption was indeed disabled by the callers of nr_iowait_cpu() which took the iowait value of the current cpu. This resulted in a heap of preempt warnings. Cure this by making nr_iowait_cpu() take a cpu number and fix up the callers to pass in the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1277968037.1868.120.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-28Merge branch 'idle-release' of ↵Linus Torvalds4-10/+24
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: intel_idle: native hardware cpuidle driver for latest Intel processors ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING sched: clarify commment for TS_POLLING ACPI: allow a native cpuidle driver to displace ACPI cpuidle: make cpuidle_curr_driver static cpuidle: add cpuidle_unregister_driver() error check cpuidle: fail to register if !CONFIG_CPU_IDLE
2010-05-27cpuidle: make cpuidle_curr_driver staticLen Brown4-9/+20
cpuidle_register_driver() sets cpuidle_curr_driver cpuidle_unregister_driver() clears cpuidle_curr_driver We should't expose cpuidle_curr_driver to potential modification except via these interfaces. So make it static and create cpuidle_get_driver() to observe it. Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-27cpuidle: add cpuidle_unregister_driver() error checkLen Brown1-1/+4
Assure that cpuidle_unregister_driver() will not clobber the registered driver if unregistered by somebody else. Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-25cpuidle: add a repeating pattern detector to the menu governorArjan van de Ven1-1/+59
Currently, the menu governor uses the (corrected) next timer as key item for predicting the idle duration. It turns out that there are specific cases where this breaks down: There are cases where we have a very repetitive pattern of idle durations, where the idle period is pretty much the same, for reasons completely unrelated to the next timer event. Examples of such repeating patterns are network loads with irq mitigation, the mouse moving but in theory also the wifi beacons. This patch adds a relatively simple detector for such repeating patterns, where the standard deviation of the last 8 idle periods is compared to a threshold. With this extra predictor in place, measurements show that the DECAY factor can now be increased (the decaying average will now decay slower) to get an even more stable result. [arjan@infradead.org: fix bug identified by Frank] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Corrado Zoccolo <czoccolo@gmail.com> Cc: Frank Rowand <frank.rowand@am.sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-10PM QOS updateMark Gross2-2/+2
This patch changes the string based list management to a handle base implementation to help with the hot path use of pm-qos, it also renames much of the API to use "request" as opposed to "requirement" that was used in the initial implementation. I did this because request more accurately represents what it actually does. Also, I added a string based ABI for users wanting to use a string interface. So if the user writes 0xDDDDDDDD formatted hex it will be accepted by the interface. (someone asked me for it and I don't think it hurts anything.) This patch updates some documentation input I got from Randy. Signed-off-by: markgross <mgross@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-05-09cpuidle: Fix incorrect optimizationArjan van de Ven1-5/+4
commit 672917dcc78 ("cpuidle: menu governor: reduce latency on exit") added an optimization, where the analysis on the past idle period moved from the end of idle, to the beginning of the new idle. Unfortunately, this optimization had a bug where it zeroed one key variable for new use, that is needed for the analysis. The fix is simple, zero the variable after doing the work from the previous idle. During the audit of the code that found this issue, another issue was also found; the ->measured_us data structure member is never set, a local variable is always used instead. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Corrado Zoccolo <czoccolo@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-07Driver core: Constify struct sysfs_ops in struct kobj_typeEmese Revfy1-2/+2
Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07sysdev: Pass attribute in sysdev_class attributes show/storeAndi Kleen1-0/+4
Passing the attribute to the low level IO functions allows all kinds of cleanups, by sharing low level IO code without requiring an own function for every piece of data. Also drivers can extend the attributes with own data fields and use that in the low level function. Similar to sysdev_attributes and normal attributes. This is a tree-wide sweep, converting everything in one go. No functional changes in this patch other than passing the new argument everywhere. Tested on x86, the non x86 parts are uncompiled. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-06cpuidle menu: remove 8 bytes of padding on 64 bit buildsRichard Kennedy1-1/+1
Reorder struct menu_device to remove 8 bytes of padding on 64 bit builds. Size drops from 136 to 128 bytes, so possibly needing one fewer cache lines. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11drivers/cpuidle/governors/menu.c: fix undefined reference to `__udivdi3'Stephen Hemminger1-3/+9
menu: use proper 64 bit math The new menu governor is incorrectly doing a 64 bit divide. Compile tested only Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15drivers/cpuidle: Move dereference after NULL testJulia Lawall1-3/+0
It does not seem possible that ldev can be NULL, so drop the unnecessary test. If ldev can somehow be NULL, then the initialization of last_idx should be moved below the test. A simplified version of the semantic match that detects this problem is as follows (http://coccinelle.lip6.fr/): // <smpl> @match exists@ expression x, E; identifier fld; @@ * x->fld ... when != \(x = E\|&x\) * x == NULL // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-09tree-wide: fix typos "aquire" -> "acquire", "cumsumed" -> "consumed"Uwe Kleine-König1-2/+2
This patch was generated by git grep -E -i -l '[Aa]quire' | xargs -r perl -p -i -e 's/([Aa])quire/$1cquire/' and the cumsumed was found by checking the diff for aquire. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-10-29cpuidle: always return with interrupts enabledKevin Hilman1-1/+4
In the case where cpuidle_idle_call() returns before changing state due to a need_resched(), it was returning with IRQs disabled. The idle path assumes that the platform specific idle code returns with interrupts enabled (although this too is undocumented AFAICT) and on ARM we have a WARN_ON(!(irqs_disabled()) when returning from the idle loop, so the user-visible effects were only a warning since interrupts were eventually re-enabled later. On x86, this same problem exists, but there is no WARN_ON() to detect it. As on ARM, the interrupts are eventually re-enabled, so I'm not sure of any actual bugs triggered by this. It's primarily a correctness/consistency fix. This patch ensures IRQs are (re)enabled before returning. Reported-by: Hemanth V <hemanthv@ti.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Tested-by: Martin Michlmayr <tbm@cyrius.com> Cc: <stable@kernel.org> [2.6.31.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22cpuidle: menu governor: reduce latency on exitCorrado Zoccolo1-1/+19
Move the state residency accounting and statistics computation off the hot exit path. On exit, the need to recompute statistics is recorded, and new statistics will be computed when menu_select is called again. The expected effect is to reduce processor wakeup latency from sleep (C-states). We are speaking of few hundreds of cycles reduction out of a several microseconds latency (determined by the hardware transition), so it is difficult to measure. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Adam Belay <abelay@novell.com Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22cpuidle: fix the menu governor to boost IO performanceArjan van de Ven1-39/+212
Fix the menu idle governor which balances power savings, energy efficiency and performance impact. The reason for a reworked governor is that there have been serious performance issues reported with the existing code on Nehalem server systems. To show this I'm sure Andrew wants to see benchmark results: (benchmark is "fio", "no cstates" is using "idle=poll") no cstates current linux new algorithm 1 disk 107 Mb/s 85 Mb/s 105 Mb/s 2 disks 215 Mb/s 123 Mb/s 209 Mb/s 12 disks 590 Mb/s 320 Mb/s 585 Mb/s In various power benchmark measurements, no degredation was found by our measurement&diagnostics team. Obviously a small percentage more power was used in the "fio" benchmark, due to the much higher performance. While it would be a novel idea to describe the new algorithm in this commit message, I cheaped out and described it in comments in the code instead. [changes since first post: spelling fixes from akpm, review feedback, folded menu-tng into menu.c] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-19tracing, x86, cpuidle: Move the end point of a C state in the power tracerArjan van de Ven1-0/+2
The "end of a C state" trace point currently happens before the code runs that corrects the TSC for having stopped during idle. The result of this is that the timestamp of the end-of-C-state event is garbage on cpus where the TSC stops during idle. This patch moves the end point of the C state to after the timekeeping engine of the kernel has been corrected. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090919133533.139c2a46@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-30cpuidle: Add decaying history logic to menu idle predictorPallipadi, Venkatesh1-1/+9
Add decaying history of predicted idle time, instead of using the last early wakeup. This logic helps menu governor do better job of predicting idle time. With this change, we also measured noticable (~8%) power savings on a DP server system with CPUs supporting deep C states, when system was lightly loaded. There was no change to power or perf on other load conditions. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-09regression: disable timer peek-ahead for 2.6.28Arjan van de Ven1-1/+3
It's showing up as regressions; disabling it very likely just papers over an underlying issue, but time is running out for 2.6.28, lets get back to this for 2.6.29 Fixes: #11826 and #11893 Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23Merge branch 'v28-range-hrtimers-for-linus-v2' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits) hrtimers: add missing docbook comments to struct hrtimer hrtimers: simplify hrtimer_peek_ahead_timers() hrtimers: fix docbook comments DECLARE_PER_CPU needs linux/percpu.h hrtimers: fix typo rangetimers: fix the bug reported by Ingo for real rangetimer: fix BUG_ON reported by Ingo rangetimer: fix x86 build failure for the !HRTIMERS case select: fix alpha OSF wrapper select: fix alpha OSF wrapper hrtimer: peek at the timer queue just before going idle hrtimer: make the futex() system call use the per process slack value hrtimer: make the nanosleep() syscall use the per process slack hrtimer: fix signed/unsigned bug in slack estimator hrtimer: show the timer ranges in /proc/timer_list hrtimer: incorporate feedback from Peter Zijlstra hrtimer: add a hrtimer_start_range() function hrtimer: another build fix hrtimer: fix build bug found by Ingo hrtimer: make select() and poll() use the hrtimer range feature ...
2008-10-16cpuidle: upon BIOS bug, default to default_idle rather than pollingVenkatesh Pallipadi1-0/+4
http://bugzilla.kernel.org/show_bug.cgi?id=11345 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>