From 62d6ae880e3e76098d5e345decd2dce443975889 Mon Sep 17 00:00:00 2001
From: Carsten Emde <C.Emde@osadl.org>
Date: Thu, 19 Jul 2012 20:34:10 +0000
Subject: Honor state disabling in the cpuidle ladder governor

There are two cpuidle governors ladder and menu. While the ladder
governor is always available, if CONFIG_CPU_IDLE is selected, the
menu governor additionally requires CONFIG_NO_HZ.

A particular C state can be disabled by writing to the sysfs file
/sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
is only implemented in the menu governor. Thus, in a system where
CONFIG_NO_HZ is not selected, the ladder governor becomes default and
always will walk through all sleep states - irrespective of whether the
C state was disabled via sysfs or not. The only way to select a specific
C state was to write the related latency to /dev/cpu_dma_latency and
keep the file open as long as this setting was required - not very
practical and not suitable for setting a single core in an SMP system.

With this patch, the ladder governor only will promote to the next
C state, if it has not been disabled, and it will demote, if the
current C state was disabled.

Note that the patch does not make the setting of the sysfs variable
"disable" coherent, i.e. if one is disabling a light state, then all
deeper states are disabled as well, but the "disable" variable does not
reflect it. Likewise, if one enables a deep state but a lighter state
still is disabled, then this has no effect. A related section has been
added to the documentation.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/cpuidle/sysfs.txt | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/cpuidle/sysfs.txt b/Documentation/cpuidle/sysfs.txt
index 9d28a3406e74..b6f44f490ed7 100644
--- a/Documentation/cpuidle/sysfs.txt
+++ b/Documentation/cpuidle/sysfs.txt
@@ -76,9 +76,17 @@ total 0
 
 
 * desc : Small description about the idle state (string)
-* disable : Option to disable this idle state (bool)
+* disable : Option to disable this idle state (bool) -> see note below
 * latency : Latency to exit out of this idle state (in microseconds)
 * name : Name of the idle state (string)
 * power : Power consumed while in this idle state (in milliwatts)
 * time : Total time spent in this idle state (in microseconds)
 * usage : Number of times this state was entered (count)
+
+Note:
+The behavior and the effect of the disable variable depends on the
+implementation of a particular governor. In the ladder governor, for
+example, it is not coherent, i.e. if one is disabling a light state,
+then all deeper states are disabled as well, but the disable variable
+does not reflect it. Likewise, if one enables a deep state but a lighter
+state still is disabled, then this has no effect.
-- 
cgit v1.2.3


From 615b7300717b9ad5c23d1f391843484fe30f6c12 Mon Sep 17 00:00:00 2001
From: Andre Przywara <andre.przywara@amd.com>
Date: Tue, 4 Sep 2012 08:28:07 +0000
Subject: acpi-cpufreq: Add support for disabling dynamic overclocking

One feature present in powernow-k8 that isn't present in acpi-cpufreq
is support for enabling or disabling AMD's core performance boost
technology. This patch adds support to acpi-cpufreq, but also
includes support for Intel's dynamic acceleration.

The original boost disabling sysfs file was per CPU, but acted
globally. Also the naming (cpb) was at least not intuitive.
So lets introduce a single file simply called "boost", which sits
once in /sys/devices/system/cpu/cpufreq.
This should be the only way of using this feature, so add
documentation about the rationale and the usage.

A following patch will re-introduce the cpb knob for compatibility
reasons on AMD CPUs.

Per-CPU boost switching is possible, but not trivial and is thus
postponed to a later patch series.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  11 ++
 Documentation/cpu-freq/boost.txt                   |  93 +++++++++++
 drivers/cpufreq/acpi-cpufreq.c                     | 177 +++++++++++++++++++++
 3 files changed, 281 insertions(+)
 create mode 100644 Documentation/cpu-freq/boost.txt

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 5dab36448b44..6943133afcb8 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -176,3 +176,14 @@ Description:	Disable L3 cache indices
 		All AMD processors with L3 caches provide this functionality.
 		For details, see BKDGs at
 		http://developer.amd.com/documentation/guides/Pages/default.aspx
+
+
+What:		/sys/devices/system/cpu/cpufreq/boost
+Date:		August 2012
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Processor frequency boosting control
+
+		This switch controls the boost setting for the whole system.
+		Boosting allows the CPU and the firmware to run at a frequency
+		beyound it's nominal limit.
+		More details can be found in Documentation/cpu-freq/boost.txt
diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt
new file mode 100644
index 000000000000..9b4edfcf486f
--- /dev/null
+++ b/Documentation/cpu-freq/boost.txt
@@ -0,0 +1,93 @@
+Processor boosting control
+
+	- information for users -
+
+Quick guide for the impatient:
+--------------------
+/sys/devices/system/cpu/cpufreq/boost
+controls the boost setting for the whole system. You can read and write
+that file with either "0" (boosting disabled) or "1" (boosting allowed).
+Reading or writing 1 does not mean that the system is boosting at this
+very moment, but only that the CPU _may_ raise the frequency at it's
+discretion.
+--------------------
+
+Introduction
+-------------
+Some CPUs support a functionality to raise the operating frequency of
+some cores in a multi-core package if certain conditions apply, mostly
+if the whole chip is not fully utilized and below it's intended thermal
+budget. This is done without operating system control by a combination
+of hardware and firmware.
+On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core",
+in technical documentation "Core performance boost". In Linux we use
+the term "boost" for convenience.
+
+Rationale for disable switch
+----------------------------
+
+Though the idea is to just give better performance without any user
+intervention, sometimes the need arises to disable this functionality.
+Most systems offer a switch in the (BIOS) firmware to disable the
+functionality at all, but a more fine-grained and dynamic control would
+be desirable:
+1. While running benchmarks, reproducible results are important. Since
+   the boosting functionality depends on the load of the whole package,
+   single thread performance can vary. By explicitly disabling the boost
+   functionality at least for the benchmark's run-time the system will run
+   at a fixed frequency and results are reproducible again.
+2. To examine the impact of the boosting functionality it is helpful
+   to do tests with and without boosting.
+3. Boosting means overclocking the processor, though under controlled
+   conditions. By raising the frequency and the voltage the processor
+   will consume more power than without the boosting, which may be
+   undesirable for instance for mobile users. Disabling boosting may
+   save power here, though this depends on the workload.
+
+
+User controlled switch
+----------------------
+
+To allow the user to toggle the boosting functionality, the acpi-cpufreq
+driver exports a sysfs knob to disable it. There is a file:
+/sys/devices/system/cpu/cpufreq/boost
+which can either read "0" (boosting disabled) or "1" (boosting enabled).
+Reading the file is always supported, even if the processor does not
+support boosting. In this case the file will be read-only and always
+reads as "0". Explicitly changing the permissions and writing to that
+file anyway will return EINVAL.
+
+On supported CPUs one can write either a "0" or a "1" into this file.
+This will either disable the boost functionality on all cores in the
+whole system (0) or will allow the hardware to boost at will (1).
+
+Writing a "1" does not explicitly boost the system, but just allows the
+CPU (and the firmware) to boost at their discretion. Some implementations
+take external factors like the chip's temperature into account, so
+boosting once does not necessarily mean that it will occur every time
+even using the exact same software setup.
+
+
+AMD legacy cpb switch
+---------------------
+The AMD powernow-k8 driver used to support a very similar switch to
+disable or enable the "Core Performance Boost" feature of some AMD CPUs.
+This switch was instantiated in each CPU's cpufreq directory
+(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb".
+Though the per CPU existence hints at a more fine grained control, the
+actual implementation only supported a system-global switch semantics,
+which was simply reflected into each CPU's file. Writing a 0 or 1 into it
+would pull the other CPUs to the same state.
+For compatibility reasons this file and its behavior is still supported
+on AMD CPUs, though it is now protected by a config switch
+(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created,
+even with the config option set.
+This functionality is considered legacy and will be removed in some future
+kernel version.
+
+More fine grained boosting control
+----------------------------------
+
+Technically it is possible to switch the boosting functionality at least
+on a per package basis, for some CPUs even per core. Currently the driver
+does not support it, but this may be implemented in the future.
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 70e717305c29..dffa7af1db71 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -63,6 +63,8 @@ enum {
 #define INTEL_MSR_RANGE		(0xffff)
 #define AMD_MSR_RANGE		(0x7)
 
+#define MSR_K7_HWCR_CPB_DIS	(1ULL << 25)
+
 struct acpi_cpufreq_data {
 	struct acpi_processor_performance *acpi_data;
 	struct cpufreq_frequency_table *freq_table;
@@ -78,6 +80,96 @@ static struct acpi_processor_performance __percpu *acpi_perf_data;
 static struct cpufreq_driver acpi_cpufreq_driver;
 
 static unsigned int acpi_pstate_strict;
+static bool boost_enabled, boost_supported;
+static struct msr __percpu *msrs;
+
+static bool boost_state(unsigned int cpu)
+{
+	u32 lo, hi;
+	u64 msr;
+
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_INTEL:
+		rdmsr_on_cpu(cpu, MSR_IA32_MISC_ENABLE, &lo, &hi);
+		msr = lo | ((u64)hi << 32);
+		return !(msr & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
+	case X86_VENDOR_AMD:
+		rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
+		msr = lo | ((u64)hi << 32);
+		return !(msr & MSR_K7_HWCR_CPB_DIS);
+	}
+	return false;
+}
+
+static void boost_set_msrs(bool enable, const struct cpumask *cpumask)
+{
+	u32 cpu;
+	u32 msr_addr;
+	u64 msr_mask;
+
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_INTEL:
+		msr_addr = MSR_IA32_MISC_ENABLE;
+		msr_mask = MSR_IA32_MISC_ENABLE_TURBO_DISABLE;
+		break;
+	case X86_VENDOR_AMD:
+		msr_addr = MSR_K7_HWCR;
+		msr_mask = MSR_K7_HWCR_CPB_DIS;
+		break;
+	default:
+		return;
+	}
+
+	rdmsr_on_cpus(cpumask, msr_addr, msrs);
+
+	for_each_cpu(cpu, cpumask) {
+		struct msr *reg = per_cpu_ptr(msrs, cpu);
+		if (enable)
+			reg->q &= ~msr_mask;
+		else
+			reg->q |= msr_mask;
+	}
+
+	wrmsr_on_cpus(cpumask, msr_addr, msrs);
+}
+
+static ssize_t store_global_boost(struct kobject *kobj, struct attribute *attr,
+				  const char *buf, size_t count)
+{
+	int ret;
+	unsigned long val = 0;
+
+	if (!boost_supported)
+		return -EINVAL;
+
+	ret = kstrtoul(buf, 10, &val);
+	if (ret || (val > 1))
+		return -EINVAL;
+
+	if ((val && boost_enabled) || (!val && !boost_enabled))
+		return count;
+
+	get_online_cpus();
+
+	boost_set_msrs(val, cpu_online_mask);
+
+	put_online_cpus();
+
+	boost_enabled = val;
+	pr_debug("Core Boosting %sabled.\n", val ? "en" : "dis");
+
+	return count;
+}
+
+static ssize_t show_global_boost(struct kobject *kobj,
+				 struct attribute *attr, char *buf)
+{
+	return sprintf(buf, "%u\n", boost_enabled);
+}
+
+static struct global_attr global_boost = __ATTR(boost, 0644,
+						show_global_boost,
+						store_global_boost);
 
 static int check_est_cpu(unsigned int cpuid)
 {
@@ -448,6 +540,44 @@ static void free_acpi_perf_data(void)
 	free_percpu(acpi_perf_data);
 }
 
+static int boost_notify(struct notifier_block *nb, unsigned long action,
+		      void *hcpu)
+{
+	unsigned cpu = (long)hcpu;
+	const struct cpumask *cpumask;
+
+	cpumask = get_cpu_mask(cpu);
+
+	/*
+	 * Clear the boost-disable bit on the CPU_DOWN path so that
+	 * this cpu cannot block the remaining ones from boosting. On
+	 * the CPU_UP path we simply keep the boost-disable flag in
+	 * sync with the current global state.
+	 */
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		boost_set_msrs(boost_enabled, cpumask);
+		break;
+
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+		boost_set_msrs(1, cpumask);
+		break;
+
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+
+static struct notifier_block boost_nb = {
+	.notifier_call          = boost_notify,
+};
+
 /*
  * acpi_cpufreq_early_init - initialize ACPI P-States library
  *
@@ -774,6 +904,49 @@ static struct cpufreq_driver acpi_cpufreq_driver = {
 	.attr		= acpi_cpufreq_attr,
 };
 
+static void __init acpi_cpufreq_boost_init(void)
+{
+	if (boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA)) {
+		msrs = msrs_alloc();
+
+		if (!msrs)
+			return;
+
+		boost_supported = true;
+		boost_enabled = boost_state(0);
+
+		get_online_cpus();
+
+		/* Force all MSRs to the same value */
+		boost_set_msrs(boost_enabled, cpu_online_mask);
+
+		register_cpu_notifier(&boost_nb);
+
+		put_online_cpus();
+	} else
+		global_boost.attr.mode = 0444;
+
+	/* We create the boost file in any case, though for systems without
+	 * hardware support it will be read-only and hardwired to return 0.
+	 */
+	if (sysfs_create_file(cpufreq_global_kobject, &(global_boost.attr)))
+		pr_warn(PFX "could not register global boost sysfs file\n");
+	else
+		pr_debug("registered global boost sysfs file\n");
+}
+
+static void __exit acpi_cpufreq_boost_exit(void)
+{
+	sysfs_remove_file(cpufreq_global_kobject, &(global_boost.attr));
+
+	if (msrs) {
+		unregister_cpu_notifier(&boost_nb);
+
+		msrs_free(msrs);
+		msrs = NULL;
+	}
+}
+
 static int __init acpi_cpufreq_init(void)
 {
 	int ret;
@@ -790,6 +963,8 @@ static int __init acpi_cpufreq_init(void)
 	ret = cpufreq_register_driver(&acpi_cpufreq_driver);
 	if (ret)
 		free_acpi_perf_data();
+	else
+		acpi_cpufreq_boost_init();
 
 	return ret;
 }
@@ -798,6 +973,8 @@ static void __exit acpi_cpufreq_exit(void)
 {
 	pr_debug("acpi_cpufreq_exit\n");
 
+	acpi_cpufreq_boost_exit();
+
 	cpufreq_unregister_driver(&acpi_cpufreq_driver);
 
 	free_acpi_perf_data();
-- 
cgit v1.2.3


From b496dfbc94ab86f970ef0167eaabe51f930aa5fb Mon Sep 17 00:00:00 2001
From: Shawn Guo <shawn.guo@linaro.org>
Date: Wed, 5 Sep 2012 01:09:12 +0200
Subject: PM / OPP: Initialize OPP table from device tree

With a lot of devices booting from device tree nowadays, it requires
that OPP table can be initialized from device tree.  The patch adds
a helper function of_init_opp_table together with a binding doc for
that purpose.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/devicetree/bindings/power/opp.txt | 25 +++++++++++++
 drivers/base/power/opp.c                        | 47 +++++++++++++++++++++++++
 include/linux/opp.h                             |  8 +++++
 3 files changed, 80 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/opp.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
new file mode 100644
index 000000000000..74499e5033fc
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/opp.txt
@@ -0,0 +1,25 @@
+* Generic OPP Interface
+
+SoCs have a standard set of tuples consisting of frequency and
+voltage pairs that the device will support per voltage domain. These
+are called Operating Performance Points or OPPs.
+
+Properties:
+- operating-points: An array of 2-tuples items, and each item consists
+  of frequency and voltage like <freq-kHz vol-uV>.
+	freq: clock frequency in kHz
+	vol: voltage in microvolt
+
+Examples:
+
+cpu@0 {
+	compatible = "arm,cortex-a9";
+	reg = <0>;
+	next-level-cache = <&L2>;
+	operating-points = <
+		/* kHz    uV */
+		792000  1100000
+		396000  950000
+		198000  850000
+	>;
+};
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index ac993eafec82..d9468642fc41 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -22,6 +22,7 @@
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
 #include <linux/opp.h>
+#include <linux/of.h>
 
 /*
  * Internal data structure organization with the OPP layer library is as
@@ -674,3 +675,49 @@ struct srcu_notifier_head *opp_get_notifier(struct device *dev)
 
 	return &dev_opp->head;
 }
+
+#ifdef CONFIG_OF
+/**
+ * of_init_opp_table() - Initialize opp table from device tree
+ * @dev:	device pointer used to lookup device OPPs.
+ *
+ * Register the initial OPP table with the OPP library for given device.
+ */
+int of_init_opp_table(struct device *dev)
+{
+	const struct property *prop;
+	const __be32 *val;
+	int nr;
+
+	prop = of_find_property(dev->of_node, "operating-points", NULL);
+	if (!prop)
+		return -ENODEV;
+	if (!prop->value)
+		return -ENODATA;
+
+	/*
+	 * Each OPP is a set of tuples consisting of frequency and
+	 * voltage like <freq-kHz vol-uV>.
+	 */
+	nr = prop->length / sizeof(u32);
+	if (nr % 2) {
+		dev_err(dev, "%s: Invalid OPP list\n", __func__);
+		return -EINVAL;
+	}
+
+	val = prop->value;
+	while (nr) {
+		unsigned long freq = be32_to_cpup(val++) * 1000;
+		unsigned long volt = be32_to_cpup(val++);
+
+		if (opp_add(dev, freq, volt)) {
+			dev_warn(dev, "%s: Failed to add OPP %ld\n",
+				 __func__, freq);
+			continue;
+		}
+		nr -= 2;
+	}
+
+	return 0;
+}
+#endif
diff --git a/include/linux/opp.h b/include/linux/opp.h
index 2a4e5faee904..214e0ebcb84d 100644
--- a/include/linux/opp.h
+++ b/include/linux/opp.h
@@ -48,6 +48,14 @@ int opp_disable(struct device *dev, unsigned long freq);
 
 struct srcu_notifier_head *opp_get_notifier(struct device *dev);
 
+#ifdef CONFIG_OF
+int of_init_opp_table(struct device *dev);
+#else
+static inline int of_init_opp_table(struct device *dev)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_OF */
 #else
 static inline unsigned long opp_get_voltage(struct opp *opp)
 {
-- 
cgit v1.2.3


From 95ceafd46359dfd901f9d3b881b33d3036e4b0ce Mon Sep 17 00:00:00 2001
From: Shawn Guo <shawn.guo@linaro.org>
Date: Thu, 6 Sep 2012 07:09:11 +0000
Subject: cpufreq: Add a generic cpufreq-cpu0 driver

It adds a generic cpufreq driver for CPU0 frequency management based on
clk, regulator, OPP and device tree support.  It can support both
uniprocessor (UP) and those symmetric multiprocessor (SMP) systems which
share clock and voltage across all CPUs.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: AnilKumar Ch <anilkumar@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 .../devicetree/bindings/cpufreq/cpufreq-cpu0.txt   |  55 +++++
 drivers/cpufreq/Kconfig                            |  11 +
 drivers/cpufreq/Makefile                           |   2 +
 drivers/cpufreq/cpufreq-cpu0.c                     | 269 +++++++++++++++++++++
 4 files changed, 337 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt
 create mode 100644 drivers/cpufreq/cpufreq-cpu0.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt
new file mode 100644
index 000000000000..4416ccc33472
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-cpu0.txt
@@ -0,0 +1,55 @@
+Generic CPU0 cpufreq driver
+
+It is a generic cpufreq driver for CPU0 frequency management.  It
+supports both uniprocessor (UP) and symmetric multiprocessor (SMP)
+systems which share clock and voltage across all CPUs.
+
+Both required and optional properties listed below must be defined
+under node /cpus/cpu@0.
+
+Required properties:
+- operating-points: Refer to Documentation/devicetree/bindings/power/opp.txt
+  for details
+
+Optional properties:
+- clock-latency: Specify the possible maximum transition latency for clock,
+  in unit of nanoseconds.
+- voltage-tolerance: Specify the CPU voltage tolerance in percentage.
+
+Examples:
+
+cpus {
+	#address-cells = <1>;
+	#size-cells = <0>;
+
+	cpu@0 {
+		compatible = "arm,cortex-a9";
+		reg = <0>;
+		next-level-cache = <&L2>;
+		operating-points = <
+			/* kHz    uV */
+			792000  1100000
+			396000  950000
+			198000  850000
+		>;
+		transition-latency = <61036>; /* two CLK32 periods */
+	};
+
+	cpu@1 {
+		compatible = "arm,cortex-a9";
+		reg = <1>;
+		next-level-cache = <&L2>;
+	};
+
+	cpu@2 {
+		compatible = "arm,cortex-a9";
+		reg = <2>;
+		next-level-cache = <&L2>;
+	};
+
+	cpu@3 {
+		compatible = "arm,cortex-a9";
+		reg = <3>;
+		next-level-cache = <&L2>;
+	};
+};
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index e24a2a1b6666..ea512f47b789 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -179,6 +179,17 @@ config CPU_FREQ_GOV_CONSERVATIVE
 
 	  If in doubt, say N.
 
+config GENERIC_CPUFREQ_CPU0
+	bool "Generic CPU0 cpufreq driver"
+	depends on HAVE_CLK && REGULATOR && PM_OPP && OF
+	select CPU_FREQ_TABLE
+	help
+	  This adds a generic cpufreq driver for CPU0 frequency management.
+	  It supports both uniprocessor (UP) and symmetric multiprocessor (SMP)
+	  systems which share clock and voltage across all CPUs.
+
+	  If in doubt, say N.
+
 menu "x86 CPU frequency scaling drivers"
 depends on X86
 source "drivers/cpufreq/Kconfig.x86"
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index b99790f400c4..1bc90e1306d8 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -13,6 +13,8 @@ obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE)	+= cpufreq_conservative.o
 # CPUfreq cross-arch helpers
 obj-$(CONFIG_CPU_FREQ_TABLE)		+= freq_table.o
 
+obj-$(CONFIG_GENERIC_CPUFREQ_CPU0)	+= cpufreq-cpu0.o
+
 ##################################################################################
 # x86 drivers.
 # Link order matters. K8 is preferred to ACPI because of firmware bugs in early
diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
new file mode 100644
index 000000000000..e9158278c71d
--- /dev/null
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -0,0 +1,269 @@
+/*
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ *
+ * The OPP code in function cpu0_set_target() is reused from
+ * drivers/cpufreq/omap-cpufreq.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/clk.h>
+#include <linux/cpu.h>
+#include <linux/cpufreq.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/opp.h>
+#include <linux/regulator/consumer.h>
+#include <linux/slab.h>
+
+static unsigned int transition_latency;
+static unsigned int voltage_tolerance; /* in percentage */
+
+static struct device *cpu_dev;
+static struct clk *cpu_clk;
+static struct regulator *cpu_reg;
+static struct cpufreq_frequency_table *freq_table;
+
+static int cpu0_verify_speed(struct cpufreq_policy *policy)
+{
+	return cpufreq_frequency_table_verify(policy, freq_table);
+}
+
+static unsigned int cpu0_get_speed(unsigned int cpu)
+{
+	return clk_get_rate(cpu_clk) / 1000;
+}
+
+static int cpu0_set_target(struct cpufreq_policy *policy,
+			   unsigned int target_freq, unsigned int relation)
+{
+	struct cpufreq_freqs freqs;
+	struct opp *opp;
+	unsigned long freq_Hz, volt = 0, volt_old = 0, tol = 0;
+	unsigned int index, cpu;
+	int ret;
+
+	ret = cpufreq_frequency_table_target(policy, freq_table, target_freq,
+					     relation, &index);
+	if (ret) {
+		pr_err("failed to match target freqency %d: %d\n",
+		       target_freq, ret);
+		return ret;
+	}
+
+	freq_Hz = clk_round_rate(cpu_clk, freq_table[index].frequency * 1000);
+	if (freq_Hz < 0)
+		freq_Hz = freq_table[index].frequency * 1000;
+	freqs.new = freq_Hz / 1000;
+	freqs.old = clk_get_rate(cpu_clk) / 1000;
+
+	if (freqs.old == freqs.new)
+		return 0;
+
+	for_each_online_cpu(cpu) {
+		freqs.cpu = cpu;
+		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+	}
+
+	if (cpu_reg) {
+		opp = opp_find_freq_ceil(cpu_dev, &freq_Hz);
+		if (IS_ERR(opp)) {
+			pr_err("failed to find OPP for %ld\n", freq_Hz);
+			return PTR_ERR(opp);
+		}
+		volt = opp_get_voltage(opp);
+		tol = volt * voltage_tolerance / 100;
+		volt_old = regulator_get_voltage(cpu_reg);
+	}
+
+	pr_debug("%u MHz, %ld mV --> %u MHz, %ld mV\n",
+		 freqs.old / 1000, volt_old ? volt_old / 1000 : -1,
+		 freqs.new / 1000, volt ? volt / 1000 : -1);
+
+	/* scaling up?  scale voltage before frequency */
+	if (cpu_reg && freqs.new > freqs.old) {
+		ret = regulator_set_voltage_tol(cpu_reg, volt, tol);
+		if (ret) {
+			pr_err("failed to scale voltage up: %d\n", ret);
+			freqs.new = freqs.old;
+			return ret;
+		}
+	}
+
+	ret = clk_set_rate(cpu_clk, freqs.new * 1000);
+	if (ret) {
+		pr_err("failed to set clock rate: %d\n", ret);
+		if (cpu_reg)
+			regulator_set_voltage_tol(cpu_reg, volt_old, tol);
+		return ret;
+	}
+
+	/* scaling down?  scale voltage after frequency */
+	if (cpu_reg && freqs.new < freqs.old) {
+		ret = regulator_set_voltage_tol(cpu_reg, volt, tol);
+		if (ret) {
+			pr_err("failed to scale voltage down: %d\n", ret);
+			clk_set_rate(cpu_clk, freqs.old * 1000);
+			freqs.new = freqs.old;
+			return ret;
+		}
+	}
+
+	for_each_online_cpu(cpu) {
+		freqs.cpu = cpu;
+		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+	}
+
+	return 0;
+}
+
+static int cpu0_cpufreq_init(struct cpufreq_policy *policy)
+{
+	int ret;
+
+	if (policy->cpu != 0)
+		return -EINVAL;
+
+	ret = cpufreq_frequency_table_cpuinfo(policy, freq_table);
+	if (ret) {
+		pr_err("invalid frequency table: %d\n", ret);
+		return ret;
+	}
+
+	policy->cpuinfo.transition_latency = transition_latency;
+	policy->cur = clk_get_rate(cpu_clk) / 1000;
+
+	/*
+	 * The driver only supports the SMP configuartion where all processors
+	 * share the clock and voltage and clock.  Use cpufreq affected_cpus
+	 * interface to have all CPUs scaled together.
+	 */
+	policy->shared_type = CPUFREQ_SHARED_TYPE_ANY;
+	cpumask_setall(policy->cpus);
+
+	cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
+
+	return 0;
+}
+
+static int cpu0_cpufreq_exit(struct cpufreq_policy *policy)
+{
+	cpufreq_frequency_table_put_attr(policy->cpu);
+
+	return 0;
+}
+
+static struct freq_attr *cpu0_cpufreq_attr[] = {
+	&cpufreq_freq_attr_scaling_available_freqs,
+	NULL,
+};
+
+static struct cpufreq_driver cpu0_cpufreq_driver = {
+	.flags = CPUFREQ_STICKY,
+	.verify = cpu0_verify_speed,
+	.target = cpu0_set_target,
+	.get = cpu0_get_speed,
+	.init = cpu0_cpufreq_init,
+	.exit = cpu0_cpufreq_exit,
+	.name = "generic_cpu0",
+	.attr = cpu0_cpufreq_attr,
+};
+
+static int __devinit cpu0_cpufreq_driver_init(void)
+{
+	struct device_node *np;
+	int ret;
+
+	np = of_find_node_by_path("/cpus/cpu@0");
+	if (!np) {
+		pr_err("failed to find cpu0 node\n");
+		return -ENOENT;
+	}
+
+	cpu_dev = get_cpu_device(0);
+	if (!cpu_dev) {
+		pr_err("failed to get cpu0 device\n");
+		ret = -ENODEV;
+		goto out_put_node;
+	}
+
+	cpu_dev->of_node = np;
+
+	cpu_clk = clk_get(cpu_dev, NULL);
+	if (IS_ERR(cpu_clk)) {
+		ret = PTR_ERR(cpu_clk);
+		pr_err("failed to get cpu0 clock: %d\n", ret);
+		goto out_put_node;
+	}
+
+	cpu_reg = regulator_get(cpu_dev, "cpu0");
+	if (IS_ERR(cpu_reg)) {
+		pr_warn("failed to get cpu0 regulator\n");
+		cpu_reg = NULL;
+	}
+
+	ret = of_init_opp_table(cpu_dev);
+	if (ret) {
+		pr_err("failed to init OPP table: %d\n", ret);
+		goto out_put_node;
+	}
+
+	ret = opp_init_cpufreq_table(cpu_dev, &freq_table);
+	if (ret) {
+		pr_err("failed to init cpufreq table: %d\n", ret);
+		goto out_put_node;
+	}
+
+	of_property_read_u32(np, "voltage-tolerance", &voltage_tolerance);
+
+	if (of_property_read_u32(np, "clock-latency", &transition_latency))
+		transition_latency = CPUFREQ_ETERNAL;
+
+	if (cpu_reg) {
+		struct opp *opp;
+		unsigned long min_uV, max_uV;
+		int i;
+
+		/*
+		 * OPP is maintained in order of increasing frequency, and
+		 * freq_table initialised from OPP is therefore sorted in the
+		 * same order.
+		 */
+		for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++)
+			;
+		opp = opp_find_freq_exact(cpu_dev,
+				freq_table[0].frequency * 1000, true);
+		min_uV = opp_get_voltage(opp);
+		opp = opp_find_freq_exact(cpu_dev,
+				freq_table[i-1].frequency * 1000, true);
+		max_uV = opp_get_voltage(opp);
+		ret = regulator_set_voltage_time(cpu_reg, min_uV, max_uV);
+		if (ret > 0)
+			transition_latency += ret * 1000;
+	}
+
+	ret = cpufreq_register_driver(&cpu0_cpufreq_driver);
+	if (ret) {
+		pr_err("failed register driver: %d\n", ret);
+		goto out_free_table;
+	}
+
+	of_node_put(np);
+	return 0;
+
+out_free_table:
+	opp_free_cpufreq_table(cpu_dev, &freq_table);
+out_put_node:
+	of_node_put(np);
+	return ret;
+}
+late_initcall(cpu0_cpufreq_driver_init);
+
+MODULE_AUTHOR("Shawn Guo <shawn.guo@linaro.org>");
+MODULE_DESCRIPTION("Generic CPU0 cpufreq driver");
+MODULE_LICENSE("GPL");
-- 
cgit v1.2.3