summaryrefslogtreecommitdiffstats
path: root/kernel/panic.c
AgeCommit message (Collapse)AuthorFilesLines
2020-08-12panic: make print_oops_end_marker() staticYue Hu1-1/+1
Since print_oops_end_marker() is not used externally, also remove it in kernel.h at the same time. Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20200724011516.12756-1-zbestahu@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12kernel/panic.c: make oops_may_print() return boolTiezhu Yang1-1/+1
The return value of oops_may_print() is true or false, so change its type to reflect that. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Link: http://lkml.kernel.org/r/1591103358-32087-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11bug: Annotate WARN/BUG/stackfail as noinstr safeThomas Gleixner1-1/+3
Warnings, bugs and stack protection fails from noinstr sections, e.g. low level and early entry code, are likely to be fatal. Mark them as "safe" to be invoked from noinstr protected code to avoid annotating all usage sites. Getting the information out is important. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134100.376598577@linutronix.de
2020-06-08panic: add sysctl to dump all CPUs backtraces on oops eventGuilherme G. Piccoli1-0/+11
Usually when the kernel reaches an oops condition, it's a point of no return; in case not enough debug information is available in the kernel splat, one of the last resorts would be to collect a kernel crash dump and analyze it. The problem with this approach is that in order to collect the dump, a panic is required (to kexec-load the crash kernel). When in an environment of multiple virtual machines, users may prefer to try living with the oops, at least until being able to properly shutdown their VMs / finish their important tasks. This patch implements a way to collect a bit more debug details when an oops event is reached, by printing all the CPUs backtraces through the usage of NMIs (on architectures that support that). The sysctl added (and documented) here was called "oops_all_cpu_backtrace", and when set will (as the name suggests) dump all CPUs backtraces. Far from ideal, this may be the last option though for users that for some reason cannot panic on oops. Most of times oopses are clear enough to indicate the kernel portion that must be investigated, but in virtual environments it's possible to observe hypervisor/KVM issues that could lead to oopses shown in other guests CPUs (like virtual APIC crashes). This patch hence aims to help debug such complex issues without resorting to kdump. Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08kernel: add panic_on_taintRafael Aquini1-0/+34
Analogously to the introduction of panic_on_warn, this patch introduces a kernel option named panic_on_taint in order to provide a simple and generic way to stop execution and catch a coredump when the kernel gets tainted by any given flag. This is useful for debugging sessions as it avoids having to rebuild the kernel to explicitly add calls to panic() into the code sites that introduce the taint flags of interest. For instance, if one is interested in proceeding with a post-mortem analysis at the point a given code path is hitting a bad page (i.e. unaccount_page_cache_page(), or slab_bug()), a coredump can be collected by rebooting the kernel with 'panic_on_taint=0x20' amended to the command line. Another, perhaps less frequent, use for this option would be as a means for assuring a security policy case where only a subset of taints, or no single taint (in paranoid mode), is allowed for the running system. The optional switch 'nousertaint' is handy in this particular scenario, as it will avoid userspace induced crashes by writes to sysctl interface /proc/sys/kernel/tainted causing false positive hits for such policies. [akpm@linux-foundation.org: tweak kernel-parameters.txt wording] Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Bunk <bunk@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-25locking/refcount: Remove unused 'refcount_error_report()' functionWill Deacon1-11/+0
'refcount_error_report()' has no callers. Remove it. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-10-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-07panic: ensure preemption is disabled during panic()Will Deacon1-0/+1
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the calling CPU in an infinite loop, but with interrupts and preemption enabled. From this state, userspace can continue to be scheduled, despite the system being "dead" as far as the kernel is concerned. This is easily reproducible on arm64 when booting with "nosmp" on the command line; a couple of shell scripts print out a periodic "Ping" message whilst another triggers a crash by writing to /proc/sysrq-trigger: | sysrq: Trigger a crash | Kernel panic - not syncing: sysrq triggered crash | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x148 | show_stack+0x14/0x20 | dump_stack+0xa0/0xc4 | panic+0x140/0x32c | sysrq_handle_reboot+0x0/0x20 | __handle_sysrq+0x124/0x190 | write_sysrq_trigger+0x64/0x88 | proc_reg_write+0x60/0xa8 | __vfs_write+0x18/0x40 | vfs_write+0xa4/0x1b8 | ksys_write+0x64/0xf0 | __arm64_sys_write+0x14/0x20 | el0_svc_common.constprop.0+0xb0/0x168 | el0_svc_handler+0x28/0x78 | el0_svc+0x8/0xc | Kernel Offset: disabled | CPU features: 0x0002,24002004 | Memory Limit: none | ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- | Ping 2! | Ping 1! | Ping 1! | Ping 2! The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise local interrupts are disabled in 'smp_send_stop()'. Disable preemption in 'panic()' before re-enabling interrupts. Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone Signed-off-by: Will Deacon <will@kernel.org> Reported-by: Xogium <contact@xogium.me> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Feng Tang <feng.tang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25bug: consolidate __WARN_FLAGS usageKees Cook1-1/+1
Instead of having separate tests for __WARN_FLAGS, merge the two #ifdef blocks and replace the synonym WANT_WARN_ON_SLOWPATH macro. Link: http://lkml.kernel.org/r/20190819234111.9019-7-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25bug: lift "cut here" out of __warn()Kees Cook1-4/+2
In preparation for cleaning up "cut here", move the "cut here" logic up out of __warn() and into callers that pass non-NULL args. For anyone looking closely, there are two callers that pass NULL args: one already explicitly prints "cut here". The remaining case is covered by how a WARN is built, which will be cleaned up in the next patch. Link: http://lkml.kernel.org/r/20190819234111.9019-5-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25bug: consolidate warn_slowpath_fmt() usageKees Cook1-7/+7
Instead of having a separate helper for no printk output, just consolidate the logic into warn_slowpath_fmt(). Link: http://lkml.kernel.org/r/20190819234111.9019-4-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25bug: refactor away warn_slowpath_fmt_taint()Kees Cook1-15/+3
Patch series "Clean up WARN() "cut here" handling", v2. Christophe Leroy noticed that the fix for missing "cut here" in the WARN() case was adding explicit printk() calls instead of teaching the exception handler to add it. This refactors the bug/warn infrastructure to pass this information as a new BUGFLAG. Longer details repeated from the last patch in the series: bug: move WARN_ON() "cut here" into exception handler The original cleanup of "cut here" missed the WARN_ON() case (that does not have a printk message), which was fixed recently by adding an explicit printk of "cut here". This had the downside of adding a printk() to every WARN_ON() caller, which reduces the utility of using an instruction exception to streamline the resulting code. By making this a new BUGFLAG, all of these can be removed and "cut here" can be handled by the exception handler. This was very pronounced on PowerPC, but the effect can be seen on x86 as well. The resulting text size of a defconfig build shows some small savings from this patch: text data bss dec hex filename 19691167 5134320 1646664 26472151 193eed7 vmlinux.before 19676362 5134260 1663048 26473670 193f4c6 vmlinux.after This change also opens the door for creating something like BUG_MSG(), where a custom printk() before issuing BUG(), without confusing the "cut here" line. This patch (of 7): There's no reason to have specialized helpers for passing the warn taint down to __warn(). Consolidate and refactor helper macros, removing __WARN_printf() and warn_slowpath_fmt_taint(). Link: http://lkml.kernel.org/r/20190819234111.9019-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25kgdb: don't use a notifier to enter kgdb at panic; call directlyDouglas Anderson1-0/+8
Right now kgdb/kdb hooks up to debug panics by registering for the panic notifier. This works OK except that it means that kgdb/kdb gets called _after_ the CPUs in the system are taken offline. That means that if anything important was happening on those CPUs (like something that might have contributed to the panic) you can't debug them. Specifically I ran into a case where I got a panic because a task was "blocked for more than 120 seconds" which was detected on CPU 2. I nicely got shown stack traces in the kernel log for all CPUs including CPU 0, which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of __mmc_switch(). I then ended up at the kdb prompt where switched over to kgdb to try to look at local variables of the process on CPU 0. I found that I couldn't. Digging more, I found that I had no info on any tasks running on CPUs other than CPU 2 and that asking kdb for help showed me "Error: no saved data for this cpu". This was because all the CPUs were offline. Let's move the entry of kdb/kgdb to a direct call from panic() and stop using the generic notifier. Putting a direct call in allows us to order things more properly and it also doesn't seem like we're breaking any abstractions by calling into the debugger from the panic function. Daniel said: : This patch changes the way kdump and kgdb interact with each other. : However it would seem rather odd to have both tools simultaneously armed : and, even if they were, the user still has the option to use panic_timeout : to force a kdump to happen. Thus I think the change of order is : acceptable. Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Feng Tang <feng.tang@intel.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-15docs: admin-guide: move sysctl directory to itMauro Carvalho Chehab1-1/+1
The stuff under sysctl describes /sys interface from userspace point of view. So, add it to the admin-guide and remove the :orphan: from its index file. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15docs: sysctl: convert to ReSTMauro Carvalho Chehab1-1/+1
Rename the /proc/sys/ documentation files to ReST, using the README file as a template for an index.rst, adding the other files there via TOC markup. Despite being written on different times with different styles, try to make them somewhat coherent with a similar look and feel, ensuring that they'll look nice as both raw text file and as via the html output produced by the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner1-0/+1
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-18panic: add an option to replay all the printk message in bufferFeng Tang1-1/+5
Currently on panic, kernel will lower the loglevel and print out pending printk msg only with console_flush_on_panic(). Add an option for users to configure the "panic_print" to replay all dmesg in buffer, some of which they may have never seen due to the loglevel setting, which will help panic debugging . [feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()] Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com [feng.tang@intel.com: use logbuf lock to protect the console log index] Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14panic/reboot: allow specifying reboot_mode for panic onlyAaro Koskinen1-0/+2
Allow specifying reboot_mode for panic only. This is needed on systems where ramoops is used to store panic logs, and user wants to use warm reset to preserve those, while still having cold reset on normal reboots. Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14panic: avoid the extra noise dmesgFeng Tang1-0/+3
When kernel panic happens, it will first print the panic call stack, then the ending msg like: [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception [ 35.749975] ------------[ cut here ]------------ The above message are very useful for debugging. But if system is configured to not reboot on panic, say the "panic_timeout" parameter equals 0, it will likely print out many noisy message like WARN() call stack for each and every CPU except the panic one, messages like below: WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 Call Trace: <IRQ> try_to_wake_up default_wake_function autoremove_wake_function __wake_up_common __wake_up_common_lock __wake_up wake_up_klogd_work_func irq_work_run_list irq_work_tick update_process_times tick_sched_timer __hrtimer_run_queues hrtimer_interrupt smp_apic_timer_interrupt apic_timer_interrupt For people working in console mode, the screen will first show the panic call stack, but immediately overridden by these noisy extra messages, which makes debugging much more difficult, as the original context gets lost on screen. Also these noisy messages will confuse some users, as I have seen many bug reporters posted the noisy message into bugzilla, instead of the real panic call stack and context. Adding a flag "suppress_printk" which gets set in panic() to avoid those noisy messages, without changing current kernel behavior that both panic blinking and sysrq magic key can work as is, suggested by Petr Mladek. To verify this, make sure kernel is not configured to reboot on panic and in console # echo c > /proc/sysrq-trigger to see if console only prints out the panic call stack. Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-02s390: simplify disabled_waitMartin Schwidefsky1-6/+1
The disabled_wait() function uses its argument as the PSW address when it stops the CPU with a wait PSW that is disabled for interrupts. The different callers sometimes use a specific number like 0xdeadbeef to indicate a specific failure, the early boot code uses 0 and some other calls sites use __builtin_return_address(0). At the time a dump is created the current PSW and the registers of a CPU are written to lowcore to make them avaiable to the dump analysis tool. For a CPU stopped with disabled_wait the PSW and the registers do not really make sense together, the PSW address does not point to the function the registers belong to. Simplify disabled_wait() by using _THIS_IP_ for the PSW address and drop the argument to the function. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-07kernel/panic.c: taint: fix debugfs_simple_attr.cocci warningsYueHaibing1-6/+4
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE for debugfs files. Semantic patch information: Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file() imposes some significant overhead as compared to DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe(). Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci The _unsafe() part suggests that some of them "safeness responsibilities" are now panic.c responsibilities. The patch is OK since panic's clear_warn_once_fops struct file_operations is safe against removal, so we don't have to use otherwise necessary debugfs_file_get()/debugfs_file_put(). [sergey.senozhatsky.work@gmail.com: changelog addition] Link: http://lkml.kernel.org/r/1545990861-158097-1-git-send-email-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04kernel/sysctl: add panic_print into sysctlFeng Tang1-1/+1
So that we can also runtime chose to print out the needed system info for panic, other than setting the kernel cmdline. Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04panic: add options to print system info when panic happensFeng Tang1-0/+28
Kernel panic issues are always painful to debug, partially because it's not easy to get enough information of the context when panic happens. And we have ramoops and kdump for that, while this commit tries to provide a easier way to show the system info by adding a cmdline parameter, referring some idea from sysrq handler. Link: http://lkml.kernel.org/r/1543398842-19295-2-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-27Merge tag 'printk-for-4.21' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Keep spinlocks busted until the end of panic() - Fix races between calculating number of messages that would fit into user space buffers, filling the buffers, and switching printk.time parameter - Some code clean up * tag 'printk-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: Remove print_prefix() calls with NULL buffer. printk: fix printk_time race. printk: Make printk_emit() local function. panic: avoid deadlocks in re-entrant console drivers
2018-11-22panic: avoid deadlocks in re-entrant console driversSergey Senozhatsky1-1/+5
From printk()/serial console point of view panic() is special, because it may force CPU to re-enter printk() or/and serial console driver. Therefore, some of serial consoles drivers are re-entrant. E.g. 8250: serial8250_console_write() { if (port->sysrq) locked = 0; else if (oops_in_progress) locked = spin_trylock_irqsave(&port->lock, flags); else spin_lock_irqsave(&port->lock, flags); ... } panic() does set oops_in_progress via bust_spinlocks(1), so in theory we should be able to re-enter serial console driver from panic(): CPU0 <NMI> uart_console_write() serial8250_console_write() // if (oops_in_progress) // spin_trylock_irqsave() call_console_drivers() console_unlock() console_flush_on_panic() bust_spinlocks(1) // oops_in_progress++ panic() <NMI/> spin_lock_irqsave(&port->lock, flags) // spin_lock_irqsave() serial8250_console_write() call_console_drivers() console_unlock() printk() ... However, this does not happen and we deadlock in serial console on port->lock spinlock. And the problem is that console_flush_on_panic() called after bust_spinlocks(0): void panic(const char *fmt, ...) { bust_spinlocks(1); ... bust_spinlocks(0); console_flush_on_panic(); ... } bust_spinlocks(0) decrements oops_in_progress, so oops_in_progress can go back to zero. Thus even re-entrant console drivers will simply spin on port->lock spinlock. Given that port->lock may already be locked either by a stopped CPU, or by the very same CPU we execute panic() on (for instance, NMI panic() on printing CPU) the system deadlocks and does not reboot. Fix this by removing bust_spinlocks(0), so oops_in_progress is always set in panic() now and, thus, re-entrant console drivers will trylock the port->lock instead of spinning on it forever, when we call them from console_flush_on_panic(). Link: http://lkml.kernel.org/r/20181025101036.6823-1-sergey.senozhatsky@gmail.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Daniel Wang <wonderfly@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jiri Slaby <jslaby@suse.com> Cc: Peter Feiner <pfeiner@google.com> Cc: linux-serial@vger.kernel.org Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-10-31kernel/panic.c: filter out a potential trailing newlineBorislav Petkov1-2/+6
If a call to panic() terminates the string with a \n , the result puts the closing brace ']---' on a newline because panic() itself adds \n too. Now, if one goes and removes the newline chars from all panic() invocations - and the stats right now look like this: ~300 calls with a \n ~500 calls without a \n one is destined to a neverending game of whack-a-mole because the usual thing to do is add a newline at the end of a string a function is supposed to print. Therefore, simply zap any \n at the end of the panic string to avoid touching so many places in the kernel. Link: http://lkml.kernel.org/r/20181009205019.2786-1-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31kernel/panic.c: do not append newline to the stack protector panic stringBorislav Petkov1-1/+1
... because panic() itself already does this. Otherwise you have line-broken trailer: [ 1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0 [ 1.836965] ]--- Link: http://lkml.kernel.org/r/20181008202901.7894-1-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-14Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variablesLinus Torvalds1-1/+1
The changes to automatically test for working stack protector compiler support in the Kconfig files removed the special STACKPROTECTOR_AUTO option that picked the strongest stack protector that the compiler supported. That was all a nice cleanup - it makes no sense to have the AUTO case now that the Kconfig phase can just determine the compiler support directly. HOWEVER. It also meant that doing "make oldconfig" would now _disable_ the strong stackprotector if you had AUTO enabled, because in a legacy config file, the sane stack protector configuration would look like CONFIG_HAVE_CC_STACKPROTECTOR=y # CONFIG_CC_STACKPROTECTOR_NONE is not set # CONFIG_CC_STACKPROTECTOR_REGULAR is not set # CONFIG_CC_STACKPROTECTOR_STRONG is not set CONFIG_CC_STACKPROTECTOR_AUTO=y and when you ran this through "make oldconfig" with the Kbuild changes, it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version used to be disabled (because it was really enabled by AUTO), and would disable it in the new config, resulting in: CONFIG_HAVE_CC_STACKPROTECTOR=y CONFIG_CC_HAS_STACKPROTECTOR_NONE=y CONFIG_CC_STACKPROTECTOR=y # CONFIG_CC_STACKPROTECTOR_STRONG is not set CONFIG_CC_HAS_SANE_STACKPROTECTOR=y That's dangerously subtle - people could suddenly find themselves with the weaker stack protector setup without even realizing. The solution here is to just rename not just the old RECULAR stack protector option, but also the strong one. This does that by just removing the CC_ prefix entirely for the user choices, because it really is not about the compiler support (the compiler support now instead automatially impacts _visibility_ of the options to users). This results in "make oldconfig" actually asking the user for their choice, so that we don't have any silent subtle security model changes. The end result would generally look like this: CONFIG_HAVE_CC_STACKPROTECTOR=y CONFIG_CC_HAS_STACKPROTECTOR_NONE=y CONFIG_STACKPROTECTOR=y CONFIG_STACKPROTECTOR_STRONG=y CONFIG_CC_HAS_SANE_STACKPROTECTOR=y where the "CC_" versions really are about internal compiler infrastructure, not the user selections. Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11taint: add taint for randstructKees Cook1-1/+3
Since the randstruct plugin can intentionally produce extremely unusual kernel structure layouts (even performance pathological ones), some maintainers want to be able to trivially determine if an Oops is coming from a randstruct-built kernel, so as to keep their sanity when debugging. This adds the new flag and initializes taint_mask immediately when built with randstruct. Link: http://lkml.kernel.org/r/1519084390-43867-4-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11taint: consolidate documentationKees Cook1-19/+4
This consolidates the taint bit documentation into a single place with both numeric and letter values. Additionally adds the missing TAINT_AUX documentation. Link: http://lkml.kernel.org/r/1519084390-43867-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11taint: convert to indexed initializationKees Cook1-17/+19
This converts to using indexed initializers instead of comments, adds a comment on why the taint flags can't be an enum, and make sure that no one forgets to update the taint_flags when adding new bits. Link: http://lkml.kernel.org/r/1519084390-43867-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-10Merge tag 'trace-v4.17' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New features: - Tom Zanussi's extended histogram work. This adds the synthetic events to have histograms from multiple event data Adds triggers "onmatch" and "onmax" to call the synthetic events Several updates to the histogram code from this - Allow way to nest ring buffer calls in the same context - Allow absolute time stamps in ring buffer - Rewrite of filter code parsing based on Al Viro's suggestions - Setting of trace_clock to global if TSC is unstable (on boot) - Better OOM handling when allocating large ring buffers - Added initcall tracepoints (consolidated initcall_debug code with them) And other various fixes and clean ups" * tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) init: Have initcall_debug still work without CONFIG_TRACEPOINTS init, tracing: Have printk come through the trace events for initcall_debug init, tracing: instrument security and console initcall trace events init, tracing: Add initcall trace events tracing: Add rcu dereference annotation for test func that touches filter->prog tracing: Add rcu dereference annotation for filter->prog tracing: Fixup logic inversion on setting trace_global_clock defaults tracing: Hide global trace clock from lockdep ring-buffer: Add set/clear_current_oom_origin() during allocations ring-buffer: Check if memory is available before allocation lockdep: Add print_irqtrace_events() to __warn vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK) tracing: Uninitialized variable in create_tracing_map_fields() tracing: Make sure variable string fields are NULL-terminated tracing: Add action comparisons when testing matching hist triggers tracing: Don't add flag strings when displaying variable references tracing: Fix display of hist trigger expressions containing timestamps ftrace: Drop a VLA in module_exists() tracing: Mention trace_clock=global when warning about unstable clocks tracing: Default to using trace_global_clock if sched_clock is unstable ...
2018-04-06lockdep: Add print_irqtrace_events() to __warnSteven Rostedt (VMware)1-0/+2
Running a test on a x86_32 kernel I triggered a bug that an interrupt disable/enable isn't being catched by lockdep. At least knowing where the last one was found would be helpful, but the warnings that are produced do not show this information. Even without debugging lockdep, having the WARN() display the last place hard and soft irqs were enabled or disabled is valuable. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-02Merge branch 'core-core-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc core updates from Ingo Molnar: "Two changes: - add membarriers to Documentation/features/ - fix a minor nit in panic printk formatting" * 'core-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: panic: Add closing panic marker parenthesis Documentation/features, membarriers: Document membarrier-sync-core architecture support Documentation/features: Allow comments in arch features files
2018-03-09bug: use %pB in BUG and stack protector failureKees Cook1-1/+1
The BUG and stack protector reports were still using a raw %p. This changes it to %pB for more meaningful output. Link: http://lkml.kernel.org/r/20180301225704.GA34198@beast Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Richard Weinberger <richard.weinberger@gmail.com>, Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-08panic: Add closing panic marker parenthesisBorislav Petkov1-1/+1
Otherwise it looks unbalanced. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180306094920.16917-2-bp@alien8.de
2017-11-17kernel/panic.c: add TAINT_AUXBorislav Petkov1-0/+2
This is the gist of a patch which we've been forward-porting in our kernels for a long time now and it probably would make a good sense to have such TAINT_AUX flag upstream which can be used by each distro etc, how they see fit. This way, we won't need to forward-port a distro-only version indefinitely. Add an auxiliary taint flag to be used by distros and others. This obviates the need to forward-port whatever internal solutions people have in favor of a single flag which they can map arbitrarily to a definition of their pleasing. The "X" mnemonic could also mean eXternal, which would be taint from a distro or something else but not the upstream kernel. We will use it to mark modules for which we don't provide support. I.e., a really eXternal module. Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17bug: fix "cut here" location for __WARN_TAINT architecturesKees Cook1-1/+15
Prior to v4.11, x86 used warn_slowpath_fmt() for handling WARN()s. After WARN() was moved to using UD0 on x86, the warning text started appearing _before_ the "cut here" line. This appears to have been a long-standing bug on architectures that used __WARN_TAINT, but it didn't get fixed. v4.11 and earlier on x86: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2956 at drivers/misc/lkdtm_bugs.c:65 lkdtm_WARNING+0x21/0x30 This is a warning message Modules linked in: v4.12 and later on x86: This is a warning message ------------[ cut here ]------------ WARNING: CPU: 1 PID: 2982 at drivers/misc/lkdtm_bugs.c:68 lkdtm_WARNING+0x15/0x20 Modules linked in: With this fix: ------------[ cut here ]------------ This is a warning message WARNING: CPU: 3 PID: 3009 at drivers/misc/lkdtm_bugs.c:67 lkdtm_WARNING+0x15/0x20 Since the __FILE__ reporting happens as part of the UD0 handler, it isn't trivial to move the message to after the WARNING line, but at least we can fix the position of the "cut here" line so all the various logging tools will start including the actual runtime warning message again, when they follow the instruction and "cut here". Link: http://lkml.kernel.org/r/1510100869-73751-4-git-send-email-keescook@chromium.org Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17bug: define the "cut here" string in a single placeKees Cook1-1/+1
The "cut here" string is used in a few paths. Define it in a single place. Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17kernel debug: support resetting WARN_ONCE for all architecturesAndi Kleen1-1/+2
Some architectures store the WARN_ONCE state in the flags field of the bug_entry. Clear that one too when resetting once state through /sys/kernel/debug/clear_warn_once Pointed out by Michael Ellerman Improves the earlier patch that add clear_warn_once. [ak@linux.intel.com: add a missing ifdef CONFIG_MODULES] Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org [akpm@linux-foundation.org: fix unused var warning] [akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe] [akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe] Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17kernel debug: support resetting WARN*_ONCEAndi Kleen1-0/+28
I like _ONCE warnings because it's guaranteed that they don't flood the log. During testing I find it useful to reset the state of the once warnings, so that I can rerun tests and see if they trigger again, or can guarantee that a test run always hits the same warnings. This patch adds a debugfs interface to reset all the _ONCE warnings so that they appear again: echo 1 > /sys/kernel/debug/clear_warn_once This is implemented by putting all the warning booleans into a special section, and clearing it. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20171017221455.6740-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17locking/refcounts, x86/asm: Implement fast refcount overflow protectionKees Cook1-0/+12
This implements refcount_t overflow protection on x86 without a noticeable performance impact, though without the fuller checking of REFCOUNT_FULL. This is done by duplicating the existing atomic_t refcount implementation but with normally a single instruction added to detect if the refcount has gone negative (e.g. wrapped past INT_MAX or below zero). When detected, the handler saturates the refcount_t to INT_MIN / 2. With this overflow protection, the erroneous reference release that would follow a wrap back to zero is blocked from happening, avoiding the class of refcount-overflow use-after-free vulnerabilities entirely. Only the overflow case of refcounting can be perfectly protected, since it can be detected and stopped before the reference is freed and left to be abused by an attacker. There isn't a way to block early decrements, and while REFCOUNT_FULL stops increment-from-zero cases (which would be the state _after_ an early decrement and stops potential double-free conditions), this fast implementation does not, since it would require the more expensive cmpxchg loops. Since the overflow case is much more common (e.g. missing a "put" during an error path), this protection provides real-world protection. For example, the two public refcount overflow use-after-free exploits published in 2016 would have been rendered unexploitable: http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/ http://cyseclabs.com/page?n=02012016 This implementation does, however, notice an unchecked decrement to zero (i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it resulted in a zero). Decrements under zero are noticed (since they will have resulted in a negative value), though this only indicates that a use-after-free may have already happened. Such notifications are likely avoidable by an attacker that has already exploited a use-after-free vulnerability, but it's better to have them reported than allow such conditions to remain universally silent. On first overflow detection, the refcount value is reset to INT_MIN / 2 (which serves as a saturation value) and a report and stack trace are produced. When operations detect only negative value results (such as changing an already saturated value), saturation still happens but no notification is performed (since the value was already saturated). On the matter of races, since the entire range beyond INT_MAX but before 0 is negative, every operation at INT_MIN / 2 will trap, leaving no overflow-only race condition. As for performance, this implementation adds a single "js" instruction to the regular execution flow of a copy of the standard atomic_t refcount operations. (The non-"and_test" refcount_dec() function, which is uncommon in regular refcount design patterns, has an additional "jz" instruction to detect reaching exactly zero.) Since this is a forward jump, it is by default the non-predicted path, which will be reinforced by dynamic branch prediction. The result is this protection having virtually no measurable change in performance over standard atomic_t operations. The error path, located in .text.unlikely, saves the refcount location and then uses UD0 to fire a refcount exception handler, which resets the refcount, handles reporting, and returns to regular execution. This keeps the changes to .text size minimal, avoiding return jumps and open-coded calls to the error reporting routine. Example assembly comparison: refcount_inc() before: .text: ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp) refcount_inc() after: .text: ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp) ffffffff8154614d: 0f 88 80 d5 17 00 js ffffffff816c36d3 ... .text.unlikely: ffffffff816c36d3: 48 8d 4d f4 lea -0xc(%rbp),%rcx ffffffff816c36d7: 0f ff (bad) These are the cycle counts comparing a loop of refcount_inc() from 1 to INT_MAX and back down to 0 (via refcount_dec_and_test()), between unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL (refcount_t-full), and this overflow-protected refcount (refcount_t-fast): 2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s: cycles protections atomic_t 82249267387 none refcount_t-fast 82211446892 overflow, untested dec-to-zero refcount_t-full 144814735193 overflow, untested dec-to-zero, inc-from-zero This code is a modified version of the x86 PAX_REFCOUNT atomic_t overflow defense from the last public patch of PaX/grsecurity, based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Thanks to PaX Team for various suggestions for improvement for repurposing this code to be a refcount-only protection. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Hans Liljestrand <ishkamiel@gmail.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arozansk@redhat.com Cc: axboe@kernel.dk Cc: kernel-hardening@lists.openwall.com Cc: linux-arch <linux-arch@vger.kernel.org> Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-23sparc64: Send break twice from console to return to boot promVijay Kumar1-1/+2
Now we can also jump to boot prom from sunhv console by sending break twice on console for both running and panicked kernel cases. Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-22Merge branch 'for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven Rostedt as the printk reviewer. This idea came up after the discussion about printk issues at Kernel Summit. It was formulated and discussed at lkml[1]. - Extend a lock-less NMI per-cpu buffers idea to handle recursive printk() calls by Sergey Senozhatsky[2]. It is the first step in sanitizing printk as discussed at Kernel Summit. The change allows to see messages that would normally get ignored or would cause a deadlock. Also it allows to enable lockdep in printk(). This already paid off. The testing in linux-next helped to discover two old problems that were hidden before[3][4]. - Remove unused parameter by Sergey Senozhatsky. Clean up after a past change. [1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com [2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com [3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com [4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: drop call_console_drivers() unused param printk: convert the rest to printk-safe printk: remove zap_locks() function printk: use printk_safe buffers in printk printk: report lost messages in printk safe/nmi contexts printk: always use deferred printk when flush printk_safe lines printk: introduce per-cpu safe_print seq buffer printk: rename nmi.c and exported api printk: use vprintk_func in vprintk() MAINTAINERS: Add printk maintainers
2017-02-08printk: rename nmi.c and exported apiSergey Senozhatsky1-2/+2
A preparation patch for printk_safe work. No functional change. - rename nmi.c to print_safe.c - add `printk_safe' prefix to some (which used both by printk-safe and printk-nmi) of the exported functions. Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2017-01-24kernel/panic.c: add missing \nJiri Slaby1-1/+1
When a system panics, the "Rebooting in X seconds.." message is never printed because it lacks a new line. Fix it. Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-17taint/module: Fix problems when out-of-kernel driver defines true or falseLarry Finger1-1/+1
Commit 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling") used the key words true and false as character members of a new struct. These names cause problems when out-of-kernel modules such as VirtualBox include their own definitions of true and false. Fixes: 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Petr Mladek <pmladek@suse.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jessica Yu <jeyu@redhat.com>
2016-11-26taint/module: Clean up global and module taint flags handlingPetr Mladek1-28/+25
The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to potentially print one more character. But it did not increase the size of the corresponding buffers in m_show() and print_modules(). We have recently done the same mistake when adding a taint flag for livepatching, see https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com Also struct module uses an incompatible type for mod-taints flags. It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module taint flags in Oops/panic"). There was used "int" for the global taint flags at these times. But only the global tain flags was later changed to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint flags reliable"). This patch defines TAINT_FLAGS_COUNT that can be used to create arrays and buffers of the right size. Note that we could not use enum because the taint flag indexes are used also in assembly code. Then it reworks the table that describes the taint flags. The TAINT_* numbers can be used as the index. Instead, we add information if the taint flag is also shown per-module. Finally, it uses "unsigned long", bit operations, and the updated taint_flags table also for mod->taints. It is not optimal because only few taint flags can be printed by module_taint_flags(). But better be on the safe side. IMHO, it is not worth the optimization and this is a good compromise. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com [jeyu@redhat.com: fix broken lkml link in changelog] Signed-off-by: Jessica Yu <jeyu@redhat.com>
2016-10-11x86/panic: replace smp_send_stop() with kdump friendly version in panic pathHidehiro Kawai1-7/+40
Daniel Walker reported problems which happens when crash_kexec_post_notifiers kernel option is enabled (https://lkml.org/lkml/2015/6/24/44). In that case, smp_send_stop() is called before entering kdump routines which assume other CPUs are still online. As the result, for x86, kdump routines fail to save other CPUs' registers and disable virtualization extensions. To fix this problem, call a new kdump friendly function, crash_smp_send_stop(), instead of the smp_send_stop() when crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak function, and it just call smp_send_stop(). Architecture codes should override it so that kdump can work appropriately. This patch only provides x86-specific version. For Xen's PV kernel, just keep the current behavior. NOTES: - Right solution would be to place crash_smp_send_stop() before __crash_kexec() invocation in all cases and remove smp_send_stop(), but we can't do that until all architectures implement own crash_smp_send_stop() - crash_smp_send_stop()-like work is still needed by machine_crash_shutdown() because crash_kexec() can be called without entering panic() Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reported-by: Daniel Walker <dwalker@fifo99.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: "Steven J. Hill" <steven.hill@cavium.com> Cc: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02kexec: use core_param for crash_kexec_post_notifiers boot optionHidehiro Kawai1-9/+4
crash_kexec_post_notifiers ia a boot option which controls whether the 1st kernel calls panic notifiers or not before booting the 2nd kernel. However, there is no need to limit it to being modifiable only at boot time. So, use core_param instead of early_param. Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>