smpboot: Add common code for notification from dying CPU - linux - Linux Kernel (branches are rebased on master from time to time)

diff options

author	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-25 10:34:39 -0800
committer	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-11 13:20:25 -0700
commit	8038dad7e888581266c76df15d70ca457a3c5910 (patch)
tree	a921a15c300418540c71a410a1caf558d6ba8a80 /kernel/cpu_pm.c
parent	c517d838eb7d07bbe9507871fab3931deccff539 (diff)
download	linux-8038dad7e888581266c76df15d70ca457a3c5910.tar.bz2

smpboot: Add common code for notification from dying CPU

RCU ignores offlined CPUs, so they cannot safely run RCU read-side code. (They -can- use SRCU, but not RCU.) This means that any use of RCU during or after the call to arch_cpu_idle_dead(). Unfortunately, commit 2ed53c0d6cc99 added a complete() call, which will contain RCU read-side critical sections if there is a task waiting to be awakened. Which, as it turns out, there almost never is. In my qemu/KVM testing, the to-be-awakened task is not yet asleep more than 99.5% of the time. In current mainline, failure is even harder to reproduce, requiring a virtualized environment that delays the outgoing CPU by at least three jiffies between the time it exits its stop_machine() task at CPU_DYING time and the time it calls arch_cpu_idle_dead() from the idle loop. However, this problem really can occur, especially in virtualized environments, and therefore really does need to be fixed This suggests moving back to the polling loop, but using a much shorter wait, with gentle exponential backoff instead of the old 100-millisecond wait. Most of the time, the loop will exit without waiting at all, and almost all of the remaining uses will wait only five microseconds. If the outgoing CPU is preempted, a loop will wait one jiffy, then increase the wait by a factor of 11/10ths, rounding up. As before, there is a five-second timeout. This commit therefore provides common-code infrastructure to do the dying-to-surviving CPU handoff in a safe manner. This code also provides an indication at CPU-online of whether the CPU to be onlined previously timed out on offline. The new cpu_check_up_prepare() function returns -EBUSY if this CPU previously took more than five seconds to go offline, or -EAGAIN if it has not yet managed to go offline. The rationale for -EAGAIN is that it might still be preempted, so an additional wait might well find it correctly offlined. Architecture-specific code can decide how to handle these conditions. Systems in which CPUs take themselves completely offline might respond to an -EBUSY return as if it was a zero (success) return. Systems in which the surviving CPU must take some action might take it at this time, or might simply mark the other CPU as unusable. Note that architectures that take the easy way out and simply pass the -EBUSY and -EAGAIN upwards will change the sysfs API. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> [ paulmck: Fixed state machine for architectures that don't check earlier CPU-hotplug results as suggested by James Hogan. ]

Diffstat (limited to 'kernel/cpu_pm.c')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: