linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2012-10-12	Merge branch 'rcu/urgent' of ↵	Ingo Molnar	3	-11/+28
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU fixes from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-10	rcu: Advise most users not to enable RCU user mode	Frederic Weisbecker	1	-0/+12
	Discourage distros from enabling CONFIG_RCU_USER_QS because it brings overhead for no benefits yet. It's not a useful feature on its own until we can fully run an adaptive tickless kernel. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-10-08	rcu: Grace-period initialization excludes only RCU notifier	Paul E. McKenney	2	-11/+16
	Kirill noted the following deadlock cycle on shutdown involving padata: > With commit 755609a9087fa983f567dc5452b2fa7b089b591f I've got deadlock on > poweroff. > > It guess it happens because of race for cpu_hotplug.lock: > > CPU A CPU B > disable_nonboot_cpus() > _cpu_down() > cpu_hotplug_begin() > mutex_lock(&cpu_hotplug.lock); > __cpu_notify() > padata_cpu_callback() > __padata_remove_cpu() > padata_replace() > synchronize_rcu() > rcu_gp_kthread() > get_online_cpus(); > mutex_lock(&cpu_hotplug.lock); It would of course be good to eliminate grace-period delays from CPU-hotplug notifiers, but that is a separate issue. Deadlock is not an appropriate diagnostic for excessive CPU-hotplug latency. Fortunately, grace-period initialization does not actually need to exclude all of the CPU-hotplug operation, but rather only RCU's own CPU_UP_PREPARE and CPU_DEAD CPU-hotplug notifiers. This commit therefore introduces a new per-rcu_state onoff_mutex that provides the required concurrency control in place of the get_online_cpus() that was previously in rcu_gp_init(). Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
2012-09-27	Merge branch 'rcu/idle' of ↵	Ingo Molnar	17	-79/+407
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the RCU adaptive-idle feature from Paul E. McKenney: "This series adds RCU APIs that allow non-idle tasks to enter RCU idle mode and provides x86 code to make use of them, allowing RCU to treat user-mode execution as an extended quiescent state when the new RCU_USER_QS kernel configuration parameter is specified. Work is in progress to port this to a few other architectures, but is not part of this series." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-26	rcu: Apply micro-optimization and int/bool fixes to RCU's idle handling	Paul E. McKenney	1	-8/+8
	Checking "user" before "is_idle_task()" allows better optimizations in cases where inlining is possible. Also, "bool" should be passed "true" or "false" rather than "1" or "0". This commit therefore makes these changes, as noted in Josh's review. Reported-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Userspace RCU extended QS selftest	Frederic Weisbecker	2	-1/+9
	Provide a config option that enables the userspace RCU extended quiescent state on every CPUs by default. This is for testing purpose. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	x86: Exit RCU extended QS on notify resume	Frederic Weisbecker	2	-0/+5
	do_notify_resume() may be called on irq or exception exit. But at that time the exception has already called rcu_user_enter() and the irq has already called rcu_irq_exit(). Since it can use RCU read side critical section, we must call rcu_user_exit() before doing anything there. Then we must call back rcu_user_enter() after this function because we know we are going to userspace from there. This complete support for userspace RCU extended quiescent state in x86-64. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	x86: Use the new schedule_user API on userspace preemption	Frederic Weisbecker	2	-4/+17
	This way we can exit the RCU extended quiescent state before we schedule a new task from irq/exception exit. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Exit RCU extended QS on user preemption	Frederic Weisbecker	1	-0/+15
	When exceptions or irq are about to resume userspace, if the task needs to be rescheduled, the arch low level code calls schedule() directly. If we call it, it is because we have the TIF_RESCHED flag: - It can be set after random local calls to set_need_resched() (RCU, drm, ...) - A wake up happened and the CPU needs preemption. This can happen in several ways: * Remotely: the remote waking CPU has set TIF_RESCHED and send the wakee an IPI to schedule the new task. * Remotely enqueued: the remote waking CPU sends an IPI to the target and the wake up is made by the target. * Locally: waking CPU == wakee CPU and the wakeup is done locally. set_need_resched() is called without IPI. In the case of local and remotely enqueued wake ups, the tick can be restarted when we enqueue the new task and RCU can exit the extended quiescent state at the same time. Then by the time we reach irq exit path and we call schedule, we are not in RCU user mode. But if we call schedule() only because something called set_need_resched(), RCU may still be in user mode when we reach schedule. Also if a wake up is done remotely, the CPU might see the TIF_RESCHED flag and call schedule while the IPI has not yet happen to restart the tick and exit RCU user mode. We need to manually protect against these corner cases. Create a new API schedule_user() that calls schedule() inside rcu_user_exit()-rcu_user_enter() in order to protect it. Archs will need to rely on it now to implement user preemption safely. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Exit RCU extended QS on kernel preemption after irq/exception	Frederic Weisbecker	1	-0/+1
	When an exception or an irq exits, and we are going to resume into interrupted kernel code, the low level architecture code calls preempt_schedule_irq() if there is a need to reschedule. If the interrupt/exception occured between a call to rcu_user_enter() (from syscall exit, exception exit, do_notify_resume exit, ...) and a real resume to userspace (iret,...), preempt_schedule_irq() can be called whereas RCU thinks we are in userspace. But preempt_schedule_irq() is going to run kernel code and may be some RCU read side critical section. We must exit the userspace extended quiescent state before we call it. To solve this, just call rcu_user_exit() in the beginning of preempt_schedule_irq(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	x86: Exception hooks for userspace RCU extended QS	Frederic Weisbecker	3	-28/+86
	Add necessary hooks to x86 exception for userspace RCU extended quiescent state support. This includes traps, page fault, debug exceptions, etc... Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-09-26	x86: Unspaghettize do_general_protection()	Frederic Weisbecker	1	-22/+16
	There is some unnatural label based layout in this function. Convert the unnecessary goto to readable conditional blocks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org>
2012-09-26	x86: Syscall hooks for userspace RCU extended QS	Frederic Weisbecker	2	-3/+12
	Add syscall slow path hooks to notify syscall entry and exit on CPUs that want to support userspace RCU extended quiescent state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Switch task's syscall hooks on context switch	Frederic Weisbecker	5	-0/+27
	Clear the syscalls hook of a task when it's scheduled out so that if the task migrates, it doesn't run the syscall slow path on a CPU that might not need it. Also set the syscalls hook on the next task if needed. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Ignore userspace extended quiescent state by default	Frederic Weisbecker	2	-1/+5
	By default we don't want to enter into RCU extended quiescent state while in userspace because doing this produces some overhead (eg: use of syscall slowpath). Set it off by default and ready to run when some feature like adaptive tickless need it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Allow rcu_user_enter()/exit() to nest	Frederic Weisbecker	2	-8/+36
	Allow calls to rcu_user_enter() even if we are already in userspace (as seen by RCU) and allow calls to rcu_user_exit() even if we are already in the kernel. This makes the APIs more flexible to be called from architectures. Exception entries for example won't need to know if they come from userspace before calling rcu_user_exit(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Settle config for userspace extended quiescent state	Frederic Weisbecker	4	-1/+33
	Create a new config option under the RCU menu that put CPUs under RCU extended quiescent state (as in dynticks idle mode) when they run in userspace. This require some contribution from architectures to hook into kernel and userspace boundaries. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: Make RCU_FAST_NO_HZ handle adaptive ticks	Paul E. McKenney	1	-0/+20
	The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid the current CPU of RCU callbacks. This is appropriate when the CPU is entering idle, where it doesn't have much useful to do anyway, but is most definitely not what you want when transitioning to user-mode execution. This commit therefore detects the adaptive-tick case, and refrains from burning CPU time getting rid of RCU callbacks in that case. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: New rcu_user_enter_after_irq() and rcu_user_exit_after_irq() APIs	Frederic Weisbecker	2	-0/+45
	In some cases, it is necessary to enter or exit userspace-RCU-idle mode from an interrupt handler, for example, if some other CPU sends this CPU a resched IPI. In this case, the current CPU would enter the IPI handler in userspace-RCU-idle mode, but would need to exit the IPI handler after having exited that mode. To allow this to work, this commit adds two new APIs to TREE_RCU: - rcu_user_enter_after_irq(). This must be called from an interrupt between rcu_irq_enter() and rcu_irq_exit(). After the irq calls rcu_irq_exit(), the irq handler will return into an RCU extended quiescent state. In theory, this interrupt is never a nested interrupt, but in practice it might interrupt softirq, which looks to RCU like a nested interrupt. - rcu_user_exit_after_irq(). This must be called from a non-nesting interrupt, interrupting an RCU extended quiescent state, also between rcu_irq_enter() and rcu_irq_exit(). After the irq calls rcu_irq_exit(), the irq handler will return in an RCU non-quiescent state. [ Combined with "Allow calls to rcu_exit_user_irq from nesting irqs." ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	rcu: New rcu_user_enter() and rcu_user_exit() APIs	Frederic Weisbecker	2	-34/+103
	RCU currently insists that only idle tasks can enter RCU idle mode, which prohibits an adaptive tickless kernel (AKA nohz cpusets), which in turn would mean that usermode execution would always take scheduling-clock interrupts, even when there is only one task runnable on the CPU in question. This commit therefore adds rcu_user_enter() and rcu_user_exit(), which allow non-idle tasks to enter RCU idle mode. These are quite similar to rcu_idle_enter() and rcu_idle_exit(), respectively, except that they omit the idle-task checks. [ Updated to use "user" flag rather than separate check functions. ] [ paulmck: Updated to drop exports of new functions based on Josh's patch getting rid of the need for them. ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-26	Merge branch 'rcu/next' of ↵	Ingo Molnar	44	-1335/+1507
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull v3.7 RCU commits from Paul E. McKenney: " 0. A fix for a latent bug that has been in RCU ever since the addition of CPU stall warnings. This bug results in false-positive stall warnings, but thus far only on embedded systems with severely cut-down userspace configurations. This fix is located on an rcu/urgent branch, with the rest of the commits based on top of it. This commit CCs stable. Given that the merge window is coming quite soon and given the small number of affected users, I do -not- recommend pushing it to 3.6, but the separate branch makes it easy to find if someone needs it. 1. Further reductions in latency spikes for huge systems, along with additional boot-time adaptation to the actual hardware. This is a large change, as it moves RCU grace-period initialization and cleanup, along with quiescent-state forcing, from softirq to a kthread. However, it appears to be in quite good shape (famous last words). Posted to LKML at https://lkml.org/lkml/2012/9/20/427. 2. Updates to documentation and rcutorture, the latter category including keeping statistics on CPU-hotplug latencies and fixing some initialization-time races. Posted to LKML at https://lkml.org/lkml/2012/8/30/193. 3. Miscellaneous fixes and improvements, posted to LKML at https://lkml.org/lkml/2012/8/30/199. 4. CPU-hotplug fixes and improvements, posted to LKML at https://lkml.org/lkml/2012/8/30/292 for first three and at https://lkml.org/lkml/2012/8/3/416. 5. Idle-loop fixes that were omitted on an earlier submission, posted to LKML at https://lkml.org/lkml/2012/8/30/251. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-25	Merge remote-tracking branch 'tip/core/rcu' into next.2012.09.25b	Paul E. McKenney	538	-2350/+4324
	Resolved conflict in kernel/sched/core.c using Peter Zijlstra's approach from https://lkml.org/lkml/2012/9/5/585.
2012-09-25	Merge remote-tracking branch 'tip/smp/hotplug' into next.2012.09.25b	Paul E. McKenney	15	-801/+751
	The conflicts between kernel/rcutree.h and kernel/rcutree_plugin.h were due to adjacent insertions and deletions, which were resolved by simply accepting the changes on both branches.
2012-09-25	Merge tag 'v3.6-rc7' into core/rcu	Ingo Molnar	538	-2362/+4333
	Merge Linux 3.6-rc7, to pick up fixes and to resolve a conflict in an upcoming pull. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-24	Merge branches 'bigrt.2012.09.23a', 'doctorture.2012.09.23a', ↵	Paul E. McKenney	31	-261/+316
	'fixes.2012.09.23a', 'hotplug.2012.09.23a' and 'idlechop.2012.09.23a' into HEAD bigrt.2012.09.23a contains additional commits to reduce scheduling latency from RCU on huge systems (many hundrends or thousands of CPUs). doctorture.2012.09.23a contains documentation changes and rcutorture fixes. fixes.2012.09.23a contains miscellaneous fixes. hotplug.2012.09.23a contains CPU-hotplug-related changes. idle.2012.09.23a fixes architectures for which RCU no longer considered the idle loop to be a quiescent state due to earlier adaptive-dynticks changes. Affected architectures are alpha, cris, frv, h8300, m32r, m68k, mn10300, parisc, score, xtensa, and ia64.
2012-09-23	Linux 3.6-rc7v3.6-rc7	Linus Torvalds	1	-2/+2

2012-09-23	Merge branch 'rc-fixes' of ↵	Linus Torvalds	2	-3/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild fixes from Michal Marek: "There are two more kbuild fixes for 3.6. One fixes a race between x86's archscripts target and the rule (re)building scripts/basic/fixdep. The second is a fix for the previous attempt at fixing make firmware_install with make 3.82. This new solution should work with any version of GNU make" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: x86/kbuild: archscripts depends on scripts_basic firmware: fix directory creation rule matching with make 3.80
2012-09-23	Merge branch 'hwmon-for-linus' of ↵	Linus Torvalds	3	-2/+23
	git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull hwmon subsystem fixes from Jean Delvare. * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: hwmon: (fam15h_power) Tweak runavg_range on resume hwmon: (coretemp) Use get_online_cpus to avoid races involving CPU hotplug hwmon: (via-cputemp) Use get_online_cpus to avoid races involving CPU hotplug
2012-09-23	Merge tag 'scsi-fixes' of ↵	Linus Torvalds	4	-2/+13
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of four essential fixes: two oops related (bnx2i, virtio-scsi), one data corruption related (hpsa) and one failure to boot due to interrupt routing issues (mpt2ss). Signed-off-by: James Bottomley <JBottomley@Parallels.com>" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] hpsa: fix handling of protocol error [SCSI] mpt2sas: Fix for issue - Unable to boot from the drive connected to HBA [SCSI] bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offload [SCSI] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
2012-09-23	edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.	Shaun Ruffell	1	-20/+39
	Fix potential NULL pointer dereference in edac_unregister_sysfs() on system boot introduced in 3.6-rc1. Since commit 7a623c039 ("edac: rewrite the sysfs code to use struct device") edac_mc_alloc() no longer initializes embedded kobjects in struct mem_ctl_info. Therefore edac_mc_free() can no longer simply decrement a kobject reference count to free the allocated memory unless the memory controller driver module had also called edac_mc_add_mc(). Now edac_mc_free() will check if the newly embedded struct device has been registered with sysfs before using either the standard device release functions or freeing the data structures itself with logic pulled out of the error path of edac_mc_alloc(). The BUG this patch resolves for me: BUG: unable to handle kernel NULL pointer dereference at (null) EIP is at __wake_up_common+0x1a/0x6a Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000) Call Trace: complete_all+0x3f/0x50 device_pm_remove+0x23/0xa2 device_del+0x34/0x142 edac_unregister_sysfs+0x3b/0x5c [edac_core] edac_mc_free+0x29/0x2f [edac_core] e7xxx_probe1+0x268/0x311 [e7xxx_edac] e7xxx_init_one+0x56/0x61 [e7xxx_edac] local_pci_probe+0x13/0x15 ... Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-23	edac_mc: fix messy kfree calls in the error path	Fengguang Wu	1	-5/+7
	coccinelle warns about: + drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429 421 if (mci->csrows) { > 422 for (chn = 0; chn < tot_channels; chn++) { 423 csr = mci->csrows[chn]; 424 if (csr) { > 425 for (chn = 0; chn < tot_channels; chn++) 426 kfree(csr->channels[chn]); 427 kfree(csr); 428 } > 429 kfree(mci->csrows[i]); 430 } 431 kfree(mci->csrows); 432 } and that code block seem to mess things up in several ways (double free, memory leak, out-of-bound reads etc.): L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be "row" and "tot_csrows" respectively. Which means either memory leak, or out-of-bound reads (which if does not trigger an immediate page fault error, will further lead to kfree() on random addresses). L425: The inner loop is reusing the same iterator "chn" as the outer loop, which could lead to premature end of the outer loop, and hence memory leak. L429: The array index 'i' in mci->csrows[i] is a temporary value used in previous loops, and won't change at all in the current loop. Which means either out-of-bound read and possibly kfree(random number), or the same mci->csrows[i] get freed once and again, and possibly double free for the kfree(csr) in L427. L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory. The buggy code was introduced by commit de3910eb ("edac: change the mem allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1 merge window. Fix it by freeing up resources in this order: free csrows[i]->channels[j] free csrows[i]->channels free csrows[i] free csrows CC: Mauro Carvalho Chehab <mchehab@redhat.com> CC: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-23	hwmon: (fam15h_power) Tweak runavg_range on resume	Andreas Herrmann	1	-2/+13
	The quirk introduced with commit 00250ec90963b7ef6678438888f3244985ecde14 (hwmon: fam15h_power: fix bogus values with current BIOSes) is not only required during driver load but also when system resumes from suspend. The BIOS might set the previously recommended (but unsuitable) initilization value for the running average range register during resume. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Andreas Hartmann <andihartmann@01019freenet.de> Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: stable@vger.kernel.org # 3.0+
2012-09-23	hwmon: (coretemp) Use get_online_cpus to avoid races involving CPU hotplug	Silas Boyd-Wickizer	1	-0/+5
	coretemp_init loops with for_each_online_cpu, adding platform_devices and sysfs interfaces, then calls register_hotcpu_notifier. There is a race if a CPU is offlined or onlined after the loop, but before register_hotcpu_notifier. The race might result in the absence of a platform_device+sysfs interface for an online CPU, or the presence of a platform_device+sysfs interface for an offline CPU. A similar race occurs during coretemp_exit, after the module calls unregister_hotcpu_notifier, but before it unregisters all devices, a CPU might offline and a device for an offline CPU will exist for a short while. This fix surrounds for_each_online_cpu and register_hotcpu_notifier with get_online_cpus+put_online_cpus; and surrounds unregister_hotcpu_notifier and device unregistering with get_online_cpus+put_online_cpus. Build tested. Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-09-23	hwmon: (via-cputemp) Use get_online_cpus to avoid races involving CPU hotplug	Silas Boyd-Wickizer	1	-0/+5
	via_cputemp_init loops with for_each_online_cpu, adding platform_devices, then calls register_hotcpu_notifier. If a CPU is offlined between the loop and register_hotcpu_notifier, then later onlined, via_cputemp_device_add will attempt to add platform devices with the same ID. A similar race occurs during via_cputemp_exit, after the module calls unregister_hotcpu_notifier, a CPU might offline and a device will exist for a CPU that is offline. This fix surrounds for_each_online_cpu and register_hotcpu_notifier with get_online_cpus+put_online_cpus; and surrounds unregister_hotcpu_notifier and device unregistering with get_online_cpus+put_online_cpus. Build tested. Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu> Acked-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-09-23	ia64: Add missing RCU idle APIs on idle loop	Paul E. McKenney	1	-0/+3
	Traditionally, the entire idle task served as an RCU quiescent state. But when RCU read side critical sections started appearing within the idle loop, this traditional strategy became untenable. The fix was to create new RCU APIs named rcu_idle_enter() and rcu_idle_exit(), which must be called by each architecture's idle loop so that RCU can tell when it is safe to ignore a given idle CPU. Unfortunately, this fix was never applied to ia64, a shortcoming remedied by this commit. Reported by: Tony Luck <tony.luck@intel.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	xtensa: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the xtensa's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	score: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-1/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in scores's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	parisc: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the parisc's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Parisc <linux-parisc@vger.kernel.org> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	mn10300: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the mn10300's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	m68k: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the m68k's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: m68k <linux-m68k@lists.linux-m68k.org> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	m32r: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the m32r's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	h8300: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the h8300's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	frv: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the Frv's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> # 3.3+ Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	cris: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the Cris's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Cris <linux-cris-kernel@axis.com> Cc: <stable@vger.kernel.org> # 3.3+ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	alpha: Add missing RCU idle APIs on idle loop	Frederic Weisbecker	1	-0/+3
	In the old times, the whole idle task was considered as an RCU quiescent state. But as RCU became more and more successful overtime, some RCU read side critical section have been added even in the code of some architectures idle tasks, for tracing for example. So nowadays, rcu_idle_enter() and rcu_idle_exit() must be called by the architecture to tell RCU about the part in the idle loop that doesn't make use of rcu read side critical sections, typically the part that puts the CPU in low power mode. This is necessary for RCU to find the quiescent states in idle in order to complete grace periods. Add this missing pair of calls in the Alpha's idle loop. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Michael Cree <mcree@orcon.net.nz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: alpha <linux-alpha@vger.kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 3.3+ Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	alpha: Fix preemption handling in idle loop	Frederic Weisbecker	2	-1/+3
	cpu_idle() is called on the boot CPU by the init code with preemption disabled. But the cpu_idle() function in alpha doesn't handle this when it calls schedule() directly. Fix it by converting it into schedule_preempt_disabled(). Also disable preemption before calling cpu_idle() from secondary CPU entry code to stay consistent with this state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Michael Cree <mcree@orcon.net.nz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: alpha <linux-alpha@vger.kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-09-23	Use get_online_cpus to avoid races involving CPU hotplug	Silas Boyd-Wickizer	1	-0/+5
	If arch/x86/kernel/cpuid.c is a module, a CPU might offline or online between the for_each_online_cpu() loop and the call to register_hotcpu_notifier in cpuid_init or the call to unregister_hotcpu_notifier in cpuid_exit. The potential races can lead to leaks/duplicates, attempts to destroy non-existant devices, or random pointer dereferences. For example, in cpuid_exit if: for_each_online_cpu(cpu) cpuid_device_destroy(cpu); class_destroy(cpuid_class); __unregister_chrdev(CPUID_MAJOR, 0, NR_CPUS, "cpu/cpuid"); <----- CPU onlines unregister_hotcpu_notifier(&cpuid_class_cpu_notifier); the hotcpu notifier will attempt to create a device for the cpuid_class, which the module already destroyed. This fix surrounds for_each_online_cpu and register_hotcpu_notifier or unregister_hotcpu_notifier with get_online_cpus+put_online_cpus. Tested on a VM. Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-09-23	Use get_online_cpus to avoid races involving CPU hotplug	Silas Boyd-Wickizer	1	-0/+5
	If arch/x86/kernel/msr.c is a module, a CPU might offline or online between the for_each_online_cpu(i) loop and the call to register_hotcpu_notifier in msr_init or the call to unregister_hotcpu_notifier in msr_exit. The potential races can lead to leaks/duplicates, attempts to destroy non-existant devices, or random pointer dereferences. For example, in msr_init if: for_each_online_cpu(i) { err = msr_device_create(i); if (err != 0) goto out_class; } <----- CPU offlines register_hotcpu_notifier(&msr_class_cpu_notifier); and the CPU never onlines before msr_exit, then the module will never call msr_device_destroy for the associated CPU. This fix surrounds for_each_online_cpu and register_hotcpu_notifier or unregister_hotcpu_notifier with get_online_cpus+put_online_cpus. Tested on a VM. Signed-off-by: Silas Boyd-Wickizer <sbw@mit.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-09-23	sched: Fix load avg vs cpu-hotplug	Peter Zijlstra	1	-21/+20
	Rabik and Paul reported two different issues related to the same few lines of code. Rabik's issue is that the nr_uninterruptible migration code is wrong in that he sees artifacts due to this (Rabik please do expand in more detail). Paul's issue is that this code as it stands relies on us using stop_machine() for unplug, we all would like to remove this assumption so that eventually we can remove this stop_machine() usage altogether. The only reason we'd have to migrate nr_uninterruptible is so that we could use for_each_online_cpu() loops in favour of for_each_possible_cpu() loops, however since nr_uninterruptible() is the only such loop and its using possible lets not bother at all. The problem Rabik sees is (probably) caused by the fact that by migrating nr_uninterruptible we screw rq->calc_load_active for both rqs involved. So don't bother with fancy migration schemes (meaning we now have to keep using for_each_possible_cpu()) and instead fold any nr_active delta after we migrate all tasks away to make sure we don't have any skewed nr_active accounting. [ paulmck: Move call to calc_load_migration to CPU_DEAD to avoid miscounting noted by Rakib. ] Reported-by: Rakib Mullick <rakib.mullick@gmail.com> Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
2012-09-23	rcu: Disallow callback registry on offline CPUs	Paul E. McKenney	1	-0/+10
	Posting a callback after the CPU_DEAD notifier effectively leaks that callback unless/until that CPU comes back online. Silence is unhelpful when attempting to track down such leaks, so this commit emits a WARN_ON_ONCE() and unconditionally leaks the callback when an offline CPU attempts to register a callback. The rdp->nxttail[RCU_NEXT_TAIL] is set to NULL in the CPU_DEAD notifier and restored in the CPU_UP_PREPARE notifier, allowing _call_rcu() to determine exactly when posting callbacks is illegal. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>