summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/apic/hw_nmi.c
AgeCommit message (Collapse)AuthorFilesLines
2011-10-10x86, nmi: Wire up NMI handlers to new routinesDon Zickus1-22/+5
Just convert all the files that have an nmi handler to the new routines. Most of it is straight forward conversion. A couple of places needed some tweaking like kgdb which separates the debug notifier from the nmi handler and mce removes a call to notify_die. [Thanks to Ying for finding out the history behind that mce call https://lkml.org/lkml/2010/5/27/114 And Boris responding that he would like to remove that call because of it https://lkml.org/lkml/2011/9/21/163] The things that get converted are the registeration/unregistration routines and the nmi handler itself has its args changed along with code removal to check which list it is on (most are on one NMI list except for kgdb which has both an NMI routine and an NMI Unknown routine). Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Corey Minyard <minyard@acm.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Corey Minyard <minyard@acm.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-23watchdog: Change the default timeout and configure nmi watchdog period based ↵Mandeep Singh Baines1-2/+2
on watchdog_thresh Before the conversion of the NMI watchdog to perf event, the watchdog timeout was 5 seconds. Now it is 60 seconds. For my particular application, netbooks, 5 seconds was a better timeout. With a short timeout, we catch faults earlier and are able to send back a panic. With a 60 second timeout, the user is unlikely to wait and will instead hit the power button, causing us to lose the panic info. This change configures the NMI period to watchdog_thresh and sets the softlockup_thresh to watchdog_thresh * 2. In addition, watchdog_thresh was reduced to 10 seconds as suggested by Ingo Molnar. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1306127423-3347-4-git-send-email-msb@chromium.org Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20110517071642.GF22305@elte.hu>
2011-03-29x86: Stop including <linux/delay.h> in two asm header filesJean Delvare1-0/+1
Stop including <linux/delay.h> in x86 header files which don't need it. This will let the compiler complain when this header is not included by source files when it should, so that contributors can fix the problem before building on other architectures starts to fail. Credits go to Geert for the idea. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: James E.J. Bottomley <James.Bottomley@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20110325152014.297890ec@endymion.delvare> [ this also fixes an upstream build bug in drivers/media/rc/ite-cir.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-18x86: Don't call dump_stack() from arch_trigger_all_cpu_backtrace_handler()Jan Beulich1-1/+0
show_regs() already prints two(!) stack traces, no need for a third one. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4D5D512902000078000326EE@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07x86, NMI: Remove DIE_NMI_IPIDon Zickus1-1/+0
With priorities in place and no one really understanding the difference between DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI. This also simplifies default_do_nmi() a little bit. Instead of calling the die_notifier in both the if and else part, just pull it out and call it before the if-statement. This has the side benefit of avoiding a call to the ioport to see if there is an external NMI sitting around until after the (more frequent) internal NMIs are dealt with. Patch-Inspired-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07x86, NMI: Add priorities to handlersDon Zickus1-1/+1
In order to consolidate the NMI die_chain events, we need to setup the priorities for the die notifiers. I started by defining a bunch of common priorities that can be used by the notifier blocks. Then I modified the notifier blocks to use the newly created priorities. Now that the priorities are straightened out, it should be easier to remove the event DIE_NMI_IPI. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-05x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same timeDongdong Deng1-0/+13
The spin_lock_debug/rcu_cpu_stall detector uses trigger_all_cpu_backtrace() to dump cpu backtrace. Therefore it is possible that trigger_all_cpu_backtrace() could be called at the same time on different CPUs, which triggers and 'unknown reason NMI' warning. The following case illustrates the problem: CPU1 CPU2 ... CPU N trigger_all_cpu_backtrace() set "backtrace_mask" to cpu mask | generate NMI interrupts generate NMI interrupts ... \ | / \ | / The "backtrace_mask" will be cleaned by the first NMI interrupt at nmi_watchdog_tick(), then the following NMI interrupts generated by other cpus's arch_trigger_all_cpu_backtrace() will be taken as unknown reason NMI interrupts. This patch uses a test_and_set to avoid the problem, and stop the arch_trigger_all_cpu_backtrace() from calling to avoid dumping a double cpu backtrace info when there is already a trigger_all_cpu_backtrace() in progress. Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com> Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com> Cc: fweisbec@gmail.com LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Don Zickus <dzickus@redhat.com>
2011-01-05x86: Only call smp_processor_id in non-preempt casesDon Zickus1-1/+2
There are some paths that walk the die_chain with preemption on. Make sure we are in an NMI call before we start doing anything. This was triggered by do_general_protection calling notify_die with DIE_GPF. Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Don Zickus <dzickus@redhat.com> LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-22x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on ↵Don Zickus1-10/+0
CONFIG_HARDLOCKUP_DETECTOR The x86 arch has shifted its use of the nmi_watchdog from a local implementation to the global one provide by kernel/watchdog.c. This shift has caused a whole bunch of compile problems under different config options. I attempt to simplify things with the patch below. In order to simplify things, I had to come to terms with the meaning of two terms ARCH_HAS_NMI_WATCHDOG and CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing, the former on a local level and the latter on a global level. With the old x86 nmi watchdog gone, there is no need to rely on defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't make sense any more. x86 will now use the global implementation. The changes below do a few things. First it changes the few places that relied on ARCH_HAS_NMI_WATCHDOG to use CONFIG_X86_LOCAL_APIC (the former was an alias for the latter anyway, so nothing unusual here). Those pieces of code were relying more on local apic functionality the nmi watchdog functionality, so the change should make sense. Second, I removed the x86 implementation of touch_nmi_watchdog(). It isn't need now, instead x86 will rely on kernel/watchdog.c's implementation. Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from x86. And tweaked the include/linux/nmi.h file to tell users to look for an externally defined touch_nmi_watchdog in the case of ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This changes removes some of the ugliness in that file. Finally, I added a Kconfig dependency for CONFIG_HARDLOCKUP_DETECTOR that said you can't have ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR. You can only have one nmi_watchdog. Tested with ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig, (various broken configs) Hopefully, after this patch I won't get any more compile broken emails. :-) v3: changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-13x86, watchdog: Compile fix when CONFIG_LOCAL_APIC not enabledDon Zickus1-0/+3
When adjusting the code to handle removing the old nmi watchdog, I forgot to consider the compile case when the local apic is not enabled. This change fixes the following build error: arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’ Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <20101213153719.GD18577@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctlsDon Zickus1-3/+0
Originally adapted from Huang Ying's patch which moved the unknown_nmi_panic to the traps.c file. Because the old nmi watchdog was deleted before this change happened, the unknown_nmi_panic sysctl was lost. This re-adds it. Also, the nmi_watchdog sysctl was re-implemented and its documentation updated accordingly. Patch-inspired-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: fweisbec@gmail.com LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-10lockup detector: Compile fixes from removing the old x86 nmi watchdogDon Zickus1-1/+7
My patch that removed the old x86 nmi watchdog broke other arches. This change reverts a piece of that patch and puts the change in the correct spot. Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: fweisbec@gmail.com Cc: yinghai@kernel.org LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-09x86: Address 'unused' warning in hw_nmi.c againRakib Mullick1-1/+1
arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog). Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace section again. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <AANLkTi=rcc38QzoKa6LFy4m++-p_9=Zt4_kDQE=GeKxf@mail.gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-26Merge branch 'perf/urgent' into perf/coreIngo Molnar1-3/+4
Conflicts: arch/x86/kernel/apic/hw_nmi.c Merge reason: Resolve conflict, queue up dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOGRakib Mullick1-3/+4
backtrace_mask has been used under the code context of ARCH_HAS_NMI_WATCHDOG. So put it into that context. We were warned by the following warning: arch/x86/kernel/apic/hw_nmi.c:21: warning: ‘backtrace_mask’ defined but not used Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Don Zickus <dzickus@redhat.com> LKML-Reference: <1289573455-3410-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdogDon Zickus1-10/+0
Now that the bulk of the old nmi_watchdog is gone, remove all the stub variables and hooks associated with it. This touches lots of files mainly because of how the io_apic nmi_watchdog was implemented. Now that the io_apic nmi_watchdog is forever gone, remove all its fingers. Most of this code was not being exercised by virtue of nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to risky here. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18x86, nmi_watchdog: Remove the old nmi_watchdogDon Zickus1-3/+3
Now that we have a new nmi_watchdog that is more generic and sits on top of the perf subsystem, we really do not need the old nmi_watchdog any more. In addition, the old nmi_watchdog doesn't really work if you are using the default clocksource, hpet. The old nmi_watchdog code relied on local apic interrupts to determine if the cpu is still alive. With hpet as the clocksource, these interrupts don't increment any more and the old nmi_watchdog triggers false postives. This piece removes the old nmi_watchdog code and stubs out any variables and functions calls. The stubs are the same ones used by the new nmi_watchdog code, so it should be well tested. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-13x86, watchdog: Fix build error in hw_nmi.cIngo Molnar1-0/+1
On some configs the following build error triggers: arch/x86/kernel/apic/hw_nmi.c:35: error: 'apic' undeclared (first use in this function) arch/x86/kernel/apic/hw_nmi.c:35: error: (Each undeclared identifier is reported only once arch/x86/kernel/apic/hw_nmi.c:35: error: for each function it appears in.) Because asm/apic.h was only included implicitly. Include it explicitly. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1273713674-8434-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-12x86: Cleanup hw_nmi.c cruftDon Zickus1-58/+0
The design of the hardlockup watchdog has changed and cruft was left behind in the hw_nmi.c file. Just remove the code that isn't used anymore. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-7-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-12x86: Move trigger_all_cpu_backtrace to its own die_notifierDon Zickus1-14/+51
As part of the transition of the nmi watchdog to something more generic, the trigger_all_cpu_backtrace code is getting left behind. Put it in its own die_notifier so it can still be used. V2: - use arch_spin_locks Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-6-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-12lockup_detector: Combine nmi_watchdog and softlockup detectorDon Zickus1-1/+1
The new nmi_watchdog (which uses the perf event subsystem) is very similar in structure to the softlockup detector. Using Ingo's suggestion, I combined the two functionalities into one file: kernel/watchdog.c. Now both the nmi_watchdog (or hardlockup detector) and softlockup detector sit on top of the perf event subsystem, which is run every 60 seconds or so to see if there are any lockups. To detect hardlockups, cpus not responding to interrupts, I implemented an hrtimer that runs 5 times for every perf event overflow event. If that stops counting on a cpu, then the cpu is most likely in trouble. To detect softlockups, tasks not yielding to the scheduler, I used the previous kthread idea that now gets kicked every time the hrtimer fires. If the kthread isn't being scheduled neither is anyone else and the warning is printed to the console. I tested this on x86_64 and both the softlockup and hardlockup paths work. V2: - cleaned up the Kconfig and softlockup combination - surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI - seperated out the softlockup case from perf event subsystem - re-arranged the enabling/disabling nmi watchdog from proc space - added cpumasks for hardlockup failure cases - removed fallback to soft events if no PMU exists for hard events V3: - comment cleanups - drop support for older softlockup code - per_cpu cleanups - completely remove software clock base hardlockup detector - use per_cpu masking on hard/soft lockup detection - #ifdef cleanups - rename config option NMI_WATCHDOG to LOCKUP_DETECTOR - documentation additions V4: - documentation fixes - convert per_cpu to __get_cpu_var - powerpc compile fixes V5: - split apart warn flags for hard and soft lockups TODO: - figure out how to make an arch-agnostic clock2cycles call (if possible) to feed into perf events as a sample period [fweisbec: merged conflict patch] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-02-25nmi_watchdog: Clean up various small detailsDon Zickus1-7/+7
Mostly copy/paste whitespace damage with a couple of nitpicks by the checkpatch script. Fix the struct definition as requested by Ingo too. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: aris@redhat.com LKML-Reference: <1266880143-24943-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> -- arch/x86/kernel/apic/hw_nmi.c | 14 +++++------ arch/x86/kernel/traps.c | 6 ++-- include/linux/nmi.h | 2 - kernel/nmi_watchdog.c | 51 ++++++++++++++++++++---------------------- 4 files changed, 36 insertions(+), 37 deletions(-)
2010-02-19nmi_watchdog: Fix undefined 'apic' build bugDon Zickus1-0/+2
Ingo provided me a config that fails to compile with: arch/x86/built-in.o: In function `arch_trigger_all_cpu_backtrace': (.text+0x17e78): undefined reference to `apic' make: *** [.tmp_vmlinux1] Error 1 I realized I changed the compile behaviour of the nmi code by not wrapping it with CONFIG_LOCAL_APIC. To fix this I add a compile check for ARCH_HAS_NMI_WATCHDOG around arch_trigger_all_cpu_backtrace. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: a.p.zijlstra@chello.nl Cc: gorcunov@gmail.com Cc: aris@redhat.com LKML-Reference: <1266548212-24243-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-14nmi_watchdog: Compile and portability fixesDon Zickus1-5/+16
The original patch was x86_64 centric. Changed the code to make it less so. ested by building and running on a powerpc. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: aris@redhat.com LKML-Reference: <1266013161-31197-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-08nmi_watchdog: Add new, generic implementation, using perf eventsDon Zickus1-0/+114
This is a new generic nmi_watchdog implementation using the perf events infrastructure as suggested by Ingo. The implementation is simple, just create an in-kernel perf event and register an overflow handler to check for cpu lockups. I created a generic implementation that lives in kernel/ and the hardware specific part that for now lives in arch/x86. This approach has a number of advantages: - It simplifies the x86 PMU implementation in the long run, in that it removes the hardcoded low-level PMU implementation that was the NMI watchdog before. - It allows new NMI watchdog features to be added in a central place. - It allows other architectures to enable the NMI watchdog, as long as they have perf events (that provide NMIs) implemented. - It also allows for more graceful co-existence of existing perf events apps and the NMI watchdog - before these changes the relationship was exclusive. (The NMI watchdog will 'spend' a perf event when enabled. In later iterations we might be able to piggyback from an existing NMI event without having to allocate a hardware event for the NMI watchdog - turning this into a no-hardware-cost feature.) As for compatibility, we'll keep the old NMI watchdog code as well until the new one can 100% replace it on all CPUs, old and new alike. That might take some time as the NMI watchdog has been ported to many CPU models. I have done light testing to make sure the framework works correctly and it does. v2: Set the correct timeout values based on the old nmi watchdog Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: gorcunov@gmail.com Cc: aris@redhat.com Cc: peterz@infradead.org LKML-Reference: <1265424425-31562-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>