summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2017-02-20Merge branch 'sched-core-for-linus' of ↵Linus Torvalds32-2406/+2579
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this (fairly busy) cycle were: - There was a class of scheduler bugs related to forgetting to update the rq-clock timestamp which can cause weird and hard to debug problems, so there's a new debug facility for this: which uncovered a whole lot of bugs which convinced us that we want to keep the debug facility. (Peter Zijlstra, Matt Fleming) - Various cputime related updates: eliminate cputime and use u64 nanoseconds directly, simplify and improve the arch interfaces, implement delayed accounting more widely, etc. - (Frederic Weisbecker) - Move code around for better structure plus cleanups (Ingo Molnar) - Move IO schedule accounting deeper into the scheduler plus related changes to improve the situation (Tejun Heo) - ... plus a round of sched/rt and sched/deadline fixes, plus other fixes, updats and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits) sched/core: Remove unlikely() annotation from sched_move_task() sched/autogroup: Rename auto_group.[ch] to autogroup.[ch] sched/topology: Split out scheduler topology code from core.c into topology.c sched/core: Remove unnecessary #include headers sched/rq_clock: Consolidate the ordering of the rq_clock methods delayacct: Include <uapi/linux/taskstats.h> sched/core: Clean up comments sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds sched/clock: Add dummy clear_sched_clock_stable() stub function sched/cputime: Remove generic asm headers sched/cputime: Remove unused nsec_to_cputime() s390, sched/cputime: Remove unused cputime definitions powerpc, sched/cputime: Remove unused cputime definitions s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs ia64, sched/cputime: Remove unused cputime definitions ia64: Convert vtime to use nsec units directly ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it sched/cputime: Remove jiffies based cputime sched/cputime, vtime: Return nsecs instead of cputime_t to account sched/cputime: Complete nsec conversion of tick based accounting ...
2017-02-20Merge branch 'perf-core-for-linus' of ↵Linus Torvalds3-129/+219
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "On the kernel side the main changes in this cycle were: - Add Intel Kaby Lake CPU support (Srinivas Pandruvada) - AMD uncore driver updates for fam17 (Janakarajan Natarajan) - Intel/PT updates and core events optimizations and cleanups (Alexander Shishkin) - cgroups events fixes (David Carrillo-Cisneros) - kprobes improvements (Masami Hiramatsu) - ... plus misc fixes and updates. On the tooling side the main changes were: - Support clang build in tools/{perf,lib/{bpf,traceevent,api}} with CC=clang, to, for instance, take advantage of better warnings (Arnaldo Carvalho de Melo): - Introduce the 'delta-abs' 'perf diff' compute method, that orders the histogram entries by the absolute value of the percentage delta for a function in two perf.data files, i.e. the functions that changed the most (increase or decrease in samples) comes first (Namhyung Kim) - Add support for parsing Intel uncore vendor event files and add uncore vendor events for the Intel server processors (Haswell, Broadwell, IvyBridge), Xeon Phi (Knights Landing) and Broadwell DE (Andi Kleen) - Introduce 'perf ftrace' a perf front end to the kernel's ftrace function and function_graph tracer, defaulting to the "function_graph" tracer, more work will be done in reviving this effort, forward porting it from its initial patch submission (Namhyung Kim) - Add 'e' and 'c' hotkeys to expand/collapse call chains for a single hist entry in the 'perf report' and 'perf top' TUI (Jiri Olsa) - Account thread wait time (off CPU time) separately: sleep, iowait and preempt, based on the prev_state of the last event, show the breakdown when using "perf sched timehist --state" (Namhyumg Kim) - Add more triggers to switch the output file (perf.data.TIMESTAMP). Now, in addition to switching to a different output file when receiving a SIGUSR2, one can also specify file size and time based triggers: perf record -a --switch-output=signal is equivalent to what we had before: perf record -a --switch-output While we can also ask for the file to be "sliced" by size, taking into account that that will happen only when we get woken up by the kernel, i.e. one has to take into account the --mmap-pages (the size of the perf mmap ring buffer): perf record -a --switch-output=2G will break the perf.data output into multiple files limited to 2GB of samples, right when generating the output. For time based samples, alert() will be used, so to have 1 minute limited perf.data output files: perf record -a --switch-output=1m (Jiri Olsa) - Improve 'perf trace' (Arnaldo Carvalho de Melo) - 'perf kallsyms' toy tool to look for extended symbol information on the running kernel and demonstrate the machine/thread/symbol APIs for use in other tools, such as 'perf probe' (Arnaldo Carvalho de Melo) - ... plus tons of other changes, see the shortlog and Git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (131 commits) perf tools: Add missing parse_events_error() prototype perf pmu: Fix check for unset alias->unit array perf tools: Be consistent on the type of map->symbols[] interator perf intel pt decoder: clang has no -Wno-override-init perf evsel: Do not put a variable sized type not at the end of a struct perf probe: Avoid accessing uninitialized 'map' variable perf tools: Do not put a variable sized type not at the end of a struct perf record: Do not put a variable sized type not at the end of a struct perf tests: Synthesize struct instead of using field after variable sized type perf bench numa: Make sure dprintf() is not defined Revert "perf bench futex: Sanitize numeric parameters" tools lib subcmd: Make it an error to pass a signed value to OPTION_UINTEGER tools: Set the maximum optimization level according to the compiler being used tools: Suppress request for warning options not existent in clang samples/bpf: Reset global variables samples/bpf: Ignore already processed ELF sections samples/bpf: Add missing header perf symbols: dso->name is an array, no need to check it against NULL perf tests record: No need to test an array against NULL perf symbols: No need to check if sym->name is NULL ...
2017-02-20Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds12-206/+313
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The RCU changes in this cycle are: - Dynticks updates, consolidating open-coded counter accesses into a well-defined API - SRCU updates: Simplify algorithm, add formal verification - Documentation updates - Miscellaneous fixes - Torture-test updates Most of the diffstat comes from the relatively large documentation update" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) srcu: Reduce probability of SRCU ->unlock_count[] counter overflow rcutorture: Add CBMC-based formal verification for SRCU srcu: Force full grace-period ordering srcu: Implement more-efficient reader counts rcu: Adjust FQS offline checks for exact online-CPU detection rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead rcu: Abstract extended quiescent state determination rcu: Abstract dynticks extended quiescent state enter/exit operations rcu: Add lockdep checks to synchronous expedited primitives rcu: Eliminate unused expedited_normal counter llist: Clarify comments about when locking is needed rcu: Fix comment in rcu_organize_nocb_kthreads() rcu: Enable RCU tracepoints by default to aid in debugging rcu: Make rcu_cpu_starting() use its "cpu" argument rcu: Add comment headers to expedited-grace-period counter functions rcu: Don't wake rcuc/X kthreads on NOCB CPUs rcu: Re-enable TASKS_RCU for User Mode Linux rcu: Once again use NMI-based stack traces in stall warnings rcu: Remove short-term CPU kicking rcu: Add long-term CPU kicking ...
2017-02-20Merge branch 'irq-core-for-linus' of ↵Linus Torvalds3-3/+68
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "This update provides: - Yet another two irq controller chip drivers - A few updates and fixes for GICV3 - A resource managed function for interrupt allocation - Fixes, updates and enhancements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/qcom: Fix error handling genirq: Clarify logic calculating bogus irqreturn_t values genirq/msi: Add stubs for get_cached_msi_msg/pci_write_msi_msg genirq/devres: Use dev_name(dev) as default for devname genirq: Fix /proc/interrupts output alignment irqdesc: Add a resource managed version of irq_alloc_descs() irqchip/gic-v3-its: Zero command on allocation irqchip/gic-v3-its: Fix command buffer allocation irqchip/mips-gic: Fix local interrupts irqchip: Add a driver for Cortina Gemini irqchip: DT bindings for Cortina Gemini irqchip irqchip/gic-v3: Remove duplicate definition of GICD_TYPER_LPIS irqchip/gic-v3-its: Rename MAPVI to MAPTI irqchip/gic-v3-its: Drop deprecated GITS_BASER_TYPE_CPU irqchip/gic-v3-its: Refactor command encoding irqchip/gic-v3-its: Enable cacheable attribute Read-allocate hints irqchip/qcom: Add IRQ combiner driver ACPI: Add support for ResourceSource/IRQ domain mapping ACPI: Generic GSI: Do not attempt to map non-GSI IRQs during bus scan irq/platform-msi: Fix comment about maximal MSIs
2017-02-20Merge branch 'timers-core-for-linus' of ↵Linus Torvalds13-589/+60
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Nothing exciting, just the usual pile of fixes, updates and cleanups: - A bunch of clocksource driver updates - Removal of CONFIG_TIMER_STATS and the related /proc file - More posix timer slim down work - A scalability enhancement in the tick broadcast code - Math cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) hrtimer: Catch invalid clockids again math64, tile: Fix build failure clocksource/drivers/arm_arch_timer:: Mark cyclecounter __ro_after_init timerfd: Protect the might cancel mechanism proper timer_list: Remove useless cast when printing time: Remove CONFIG_TIMER_STATS clocksource/drivers/arm_arch_timer: Work around Hisilicon erratum 161010101 clocksource/drivers/arm_arch_timer: Introduce generic errata handling infrastructure clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter clocksource/drivers/arm_arch_timer: Add dt binding for hisilicon-161010101 erratum clocksource/drivers/ostm: Add renesas-ostm timer driver clocksource/drivers/ostm: Document renesas-ostm timer DT bindings clocksource/drivers/tcb_clksrc: Use 32 bit tcb as sched_clock clocksource/drivers/gemini: Add driver for the Cortina Gemini clocksource: add DT bindings for Cortina Gemini clockevents: Add a clkevt-of mechanism like clksrc-of tick/broadcast: Reduce lock cacheline contention timers: Omit POSIX timer stuff from task_struct when disabled x86/timer: Make delay() work during early bootup delay: Add explanation of udelay() inaccuracy ...
2017-02-18Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "Move the futex init function to core initcall so user mode helper does not run into an uninitialized futex syscall" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Move futex_init() to core_initcall
2017-02-18Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2-10/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two small fixes:: - Prevent deadlock on the tick broadcast lock. Found and fixed by Mike. - Stop using printk() in the timekeeping debug code to prevent a deadlock against the scheduler" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Use deferred printk() in debug code tick/broadcast: Prevent deadlock on tick_broadcast_lock
2017-02-18printk: use rcuidle console tracepointSergey Senozhatsky1-1/+1
Use rcuidle console tracepoint because, apparently, it may be issued from an idle CPU: hw-breakpoint: Failed to enable monitor mode on CPU 0. hw-breakpoint: CPU 0 failed to disable vector catch =============================== [ ERR: suspicious RCU usage. ] 4.10.0-rc8-next-20170215+ #119 Not tainted ------------------------------- ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 2, debug_locks = 0 RCU used illegally from extended quiescent state! 2 locks held by swapper/0/0: #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54 #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119 Hardware name: Generic OMAP4 (Flattened Device Tree) console_unlock vprintk_emit vprintk_default printk reset_ctrl_regs dbg_cpu_pm_notify notifier_call_chain cpu_pm_exit omap_enter_idle_coupled cpuidle_enter_state cpuidle_enter_state_coupled do_idle cpu_startup_entry start_kernel This RCU warning, however, is suppressed by lockdep_off() in printk(). lockdep_off() increments the ->lockdep_recursion counter and thus disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want lockdep to be enabled "current->lockdep_recursion == 0". Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Tony Lindgren <tony@atomide.com> Tested-by: Tony Lindgren <tony@atomide.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Lindgren <tony@atomide.com> Cc: Russell King <rmk@armlinux.org.uk> Cc: <stable@vger.kernel.org> [3.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-18hrtimer: Catch invalid clockids againMarc Zyngier1-5/+15
commit 82e88ff1ea94 ("hrtimer: Revert CLOCK_MONOTONIC_RAW support") removed unfortunately a sanity check in the hrtimer code which was part of that MONOTONIC_RAW patch series. It would have caught the bogus usage of CLOCK_MONOTONIC_RAW in the wireless code. So bring it back. It is way too easy to take any random clockid and feed it to the hrtimer subsystem. At best, it gets mapped to a monotonic base, but it would be better to just catch illegal values as early as possible. Detect invalid clockids, map them to CLOCK_MONOTONIC and emit a warning. [ tglx: Replaced the BUG by a WARN and gracefully map to CLOCK_MONOTONIC ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Tomasz Nowicki <tn@semihalf.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Link: http://lkml.kernel.org/r/1452879670-16133-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-16Revert "nohz: Fix collision between tick and other hrtimers"Linus Torvalds2-14/+2
This reverts commit 24b91e360ef521a2808771633d76ebc68bd5604b and commit 7bdb59f1ad47 ("tick/nohz: Fix possible missing clock reprog after tick soft restart") that depends on it, Pavel reports that it causes occasional boot hangs for him that seem to depend on just how the machine was booted. In particular, his machine hangs at around the PCI fixups of the EHCI USB host controller, but only hangs from cold boot, not from a warm boot. Thomas Gleixner suspecs it's a CPU hotplug interaction, particularly since Pavel also saw suspend/resume issues that seem to be related. We're reverting for now while trying to figure out the root cause. Reported-bisected-and-tested-by: Pavel Machek <pavel@ucw.cz> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org # reverted commits were marked for stable Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-22/+66
Pull networking fixes from David Miller: 1) In order to avoid problems in the future, make cgroup bpf overriding explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Staovoitov. 2) LLC sets skb->sk without proper skb->destructor and this explodes, fix from Eric Dumazet. 3) Make sure when we have an ipv4 mapped source address, the destination is either also an ipv4 mapped address or ipv6_addr_any(). Fix from Jonathan T. Leighton. 4) Avoid packet loss in fec driver by programming the multicast filter more intelligently. From Rui Sousa. 5) Handle multiple threads invoking fanout_add(), fix from Eric Dumazet. 6) Since we can invoke the TCP input path in process context, without BH being disabled, we have to accomodate that in the locking of the TCP probe. Also from Eric Dumazet. 7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we aren't even updating that sysctl value. From Marcus Huewe. 8) Fix endian bugs in ibmvnic driver, from Thomas Falcon. [ This is the second version of the pull that reverts the nested rhashtable changes that looked a bit too scary for this late in the release - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits) rhashtable: Revert nested table changes. ibmvnic: Fix endian errors in error reporting output ibmvnic: Fix endian error when requesting device capabilities net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification net: xilinx_emaclite: fix freezes due to unordered I/O net: xilinx_emaclite: fix receive buffer overflow bpf: kernel header files need to be copied into the tools directory tcp: tcp_probe: use spin_lock_bh() uapi: fix linux/if_pppol2tp.h userspace compilation errors packet: fix races in fanout_add() ibmvnic: Fix initial MTU settings net: ethernet: ti: cpsw: fix cpsw assignment in resume kcm: fix a null pointer dereference in kcm_sendmsg() net: fec: fix multicast filtering hardware setup ipv6: Handle IPv4-mapped src to in6addr_any dst. ipv6: Inhibit IPv4-mapped src address on the wire. net/mlx5e: Disable preemption when doing TC statistics upcall rhashtable: Add nested tables tipc: Fix tipc_sk_reinit race conditions gfs2: Use rhashtable walk interface in glock_hash_walk ...
2017-02-16genirq: Clarify logic calculating bogus irqreturn_t valuesJeremy Kerr1-1/+3
Although irqreturn_t is an enum, we treat it (and its enumeration constants) as a bitmask. However, bad_action_ret() uses a less-than operator to determine whether an irqreturn_t falls within allowable bit values, which means we need to know the signededness of an enum type to read the logic, which is implementation-dependent. This change explicitly uses an unsigned type for the comparison. We do this instead of changing to a bitwise test, as the latter compiles to increased instructions in this hot path. It looks like we get the correct behaviour currently (bad_action_ret(-1) returns 1), so this is purely a readability fix. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Link: http://lkml.kernel.org/r/1487219049-4061-1-git-send-email-jk@ozlabs.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-15timekeeping: Use deferred printk() in debug codeSergey Senozhatsky1-2/+2
We cannot do printk() from tk_debug_account_sleep_time(), because tk_debug_account_sleep_time() is called under tk_core seq lock. The reason why printk() is unsafe there is that console_sem may invoke scheduler (up()->wake_up_process()->activate_task()), which, in turn, can return back to timekeeping code, for instance, via get_time()->ktime_get(), deadlocking the system on tk_core seq lock. [ 48.950592] ====================================================== [ 48.950622] [ INFO: possible circular locking dependency detected ] [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted [ 48.950622] ------------------------------------------------------- [ 48.950622] kworker/0:0/3 is trying to acquire lock: [ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90 [ 48.950683] but task is already holding lock: [ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.950714] which lock already depends on the new lock. [ 48.950714] the existing dependency chain (in reverse order) is: [ 48.950714] -> #5 (hrtimer_bases.lock){-.-...}: [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64 [ 48.950775] lock_hrtimer_base+0x28/0x58 [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8 [ 48.950775] __enqueue_rt_entity+0x320/0x360 [ 48.950805] enqueue_rt_entity+0x2c/0x44 [ 48.950805] enqueue_task_rt+0x24/0x94 [ 48.950836] ttwu_do_activate+0x54/0xc0 [ 48.950836] try_to_wake_up+0x248/0x5c8 [ 48.950836] __setup_irq+0x420/0x5f0 [ 48.950836] request_threaded_irq+0xdc/0x184 [ 48.950866] devm_request_threaded_irq+0x58/0xa4 [ 48.950866] omap_i2c_probe+0x530/0x6a0 [ 48.950897] platform_drv_probe+0x50/0xb0 [ 48.950897] driver_probe_device+0x1f8/0x2cc [ 48.950897] __driver_attach+0xc0/0xc4 [ 48.950927] bus_for_each_dev+0x6c/0xa0 [ 48.950927] bus_add_driver+0x100/0x210 [ 48.950927] driver_register+0x78/0xf4 [ 48.950958] do_one_initcall+0x3c/0x16c [ 48.950958] kernel_init_freeable+0x20c/0x2d8 [ 48.950958] kernel_init+0x8/0x110 [ 48.950988] ret_from_fork+0x14/0x24 [ 48.950988] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [ 48.951019] _raw_spin_lock+0x40/0x50 [ 48.951019] rq_offline_rt+0x9c/0x2bc [ 48.951019] set_rq_offline.part.2+0x2c/0x58 [ 48.951049] rq_attach_root+0x134/0x144 [ 48.951049] cpu_attach_domain+0x18c/0x6f4 [ 48.951049] build_sched_domains+0xba4/0xd80 [ 48.951080] sched_init_smp+0x68/0x10c [ 48.951080] kernel_init_freeable+0x160/0x2d8 [ 48.951080] kernel_init+0x8/0x110 [ 48.951080] ret_from_fork+0x14/0x24 [ 48.951110] -> #3 (&rq->lock){-.-.-.}: [ 48.951110] _raw_spin_lock+0x40/0x50 [ 48.951141] task_fork_fair+0x30/0x124 [ 48.951141] sched_fork+0x194/0x2e0 [ 48.951141] copy_process.part.5+0x448/0x1a20 [ 48.951171] _do_fork+0x98/0x7e8 [ 48.951171] kernel_thread+0x2c/0x34 [ 48.951171] rest_init+0x1c/0x18c [ 48.951202] start_kernel+0x35c/0x3d4 [ 48.951202] 0x8000807c [ 48.951202] -> #2 (&p->pi_lock){-.-.-.}: [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951232] try_to_wake_up+0x30/0x5c8 [ 48.951232] up+0x4c/0x60 [ 48.951263] __up_console_sem+0x2c/0x58 [ 48.951263] console_unlock+0x3b4/0x650 [ 48.951263] vprintk_emit+0x270/0x474 [ 48.951293] vprintk_default+0x20/0x28 [ 48.951293] printk+0x20/0x30 [ 48.951324] kauditd_hold_skb+0x94/0xb8 [ 48.951324] kauditd_thread+0x1a4/0x56c [ 48.951324] kthread+0x104/0x148 [ 48.951354] ret_from_fork+0x14/0x24 [ 48.951354] -> #1 ((console_sem).lock){-.....}: [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951385] down_trylock+0xc/0x2c [ 48.951385] __down_trylock_console_sem+0x24/0x80 [ 48.951385] console_trylock+0x10/0x8c [ 48.951416] vprintk_emit+0x264/0x474 [ 48.951416] vprintk_default+0x20/0x28 [ 48.951416] printk+0x20/0x30 [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70 [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0 [ 48.951446] timekeeping_resume+0x218/0x23c [ 48.951477] syscore_resume+0x94/0x42c [ 48.951477] suspend_enter+0x554/0x9b4 [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4 [ 48.951507] enter_state+0x934/0xbd4 [ 48.951507] pm_suspend+0x14/0x70 [ 48.951507] state_store+0x68/0xc8 [ 48.951538] kernfs_fop_write+0xf4/0x1f8 [ 48.951538] __vfs_write+0x1c/0x114 [ 48.951538] vfs_write+0xa0/0x168 [ 48.951568] SyS_write+0x3c/0x90 [ 48.951568] __sys_trace_return+0x0/0x10 [ 48.951568] -> #0 (tk_core){----..}: [ 48.951599] lock_acquire+0xe0/0x294 [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4 [ 48.951629] retrigger_next_event+0x4c/0x90 [ 48.951629] on_each_cpu+0x40/0x7c [ 48.951629] clock_was_set_work+0x14/0x20 [ 48.951660] process_one_work+0x2b4/0x808 [ 48.951660] worker_thread+0x3c/0x550 [ 48.951660] kthread+0x104/0x148 [ 48.951690] ret_from_fork+0x14/0x24 [ 48.951690] other info that might help us debug this: [ 48.951690] Chain exists of: tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock [ 48.951721] Possible unsafe locking scenario: [ 48.951721] CPU0 CPU1 [ 48.951721] ---- ---- [ 48.951721] lock(hrtimer_bases.lock); [ 48.951751] lock(&rt_b->rt_runtime_lock); [ 48.951751] lock(hrtimer_bases.lock); [ 48.951751] lock(tk_core); [ 48.951782] *** DEADLOCK *** [ 48.951782] 3 locks held by kworker/0:0/3: [ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.951843] stack backtrace: [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ [ 48.951904] Workqueue: events clock_was_set_work [ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) [ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0) [ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308) [ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324) [ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8) [ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294) [ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4) [ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90) [ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c) [ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20) [ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808) [ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550) [ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148) [ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24) Replace printk() with printk_deferred(), which does not call into the scheduler. Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend") Reported-and-tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Stultz <john.stultz@linaro.org> Cc: "[4.9+]" <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-14Merge tag 'v4.10-rc8' into perf/core, to pick up fixesIngo Molnar7-63/+81
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-13futex: Move futex_init() to core_initcallYang Yang1-1/+1
The UEVENT user mode helper is enabled before the initcalls are executed and is available when the root filesystem has been mounted. The user mode helper is triggered by device init calls and the executable might use the futex syscall. futex_init() is marked __initcall which maps to device_initcall, but there is no guarantee that futex_init() is invoked _before_ the first device init call which triggers the UEVENT user mode helper. If the user mode helper uses the futex syscall before futex_init() then the syscall crashes with a NULL pointer dereference because the futex subsystem has not been initialized yet. Move futex_init() to core_initcall so futexes are initialized before the root filesystem is mounted and the usermode helper becomes available. [ tglx: Rewrote changelog ] Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: jiang.biao2@zte.com.cn Cc: jiang.zhengxiong@zte.com.cn Cc: zhong.weidong@zte.com.cn Cc: deng.huali@zte.com.cn Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-13tick/broadcast: Prevent deadlock on tick_broadcast_lockMike Galbraith1-8/+7
tick_broadcast_lock is taken from interrupt context, but the following call chain takes the lock without disabling interrupts: [ 12.703736] _raw_spin_lock+0x3b/0x50 [ 12.703738] tick_broadcast_control+0x5a/0x1a0 [ 12.703742] intel_idle_cpu_online+0x22/0x100 [ 12.703744] cpuhp_invoke_callback+0x245/0x9d0 [ 12.703752] cpuhp_thread_fun+0x52/0x110 [ 12.703754] smpboot_thread_fn+0x276/0x320 So the following deadlock can happen: lock(tick_broadcast_lock); <Interrupt> lock(tick_broadcast_lock); intel_idle_cpu_online() is the only place which violates the calling convention of tick_broadcast_control(). This was caused by the removal of the smp function call in course of the cpu hotplug rework. Instead of slapping local_irq_disable/enable() at the call site, we can relax the calling convention and handle it in the core code, which makes the whole machinery more robust. Fixes: 29d7bbada98e ("intel_idle: Remove superfluous SMP fuction call") Reported-by: Gabriel C <nix.or.die@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ruslan Ruslichenko <rruslich@cisco.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: lwn@lwn.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: stable <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1486953115.5912.4.camel@gmx.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-12bpf: introduce BPF_F_ALLOW_OVERRIDE flagAlexei Starovoitov3-22/+66
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command to the given cgroup the descendent cgroup will be able to override effective bpf program that was inherited from this cgroup. By default it's not passed, therefore override is disallowed. Examples: 1. prog X attached to /A with default prog Y fails to attach to /A/B and /A/B/C Everything under /A runs prog X 2. prog X attached to /A with allow_override. prog Y fails to attach to /A/B with default (non-override) prog M attached to /A/B with allow_override. Everything under /A/B runs prog M only. 3. prog X attached to /A with allow_override. prog Y fails to attach to /A with default. The user has to detach first to switch the mode. In the future this behavior may be extended with a chain of non-overridable programs. Also fix the bug where detach from cgroup where nothing is attached was not throwing error. Return ENOENT in such case. Add several testcases and adjust libbpf. Fixes: 3007098494be ("cgroup: add support for eBPF programs") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Daniel Mack <daniel@zonque.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12genirq/devres: Use dev_name(dev) as default for devnameHeiner Kallweit1-2/+8
Allow the devname parameter to be NULL and use dev_name(dev) in this case. This should be an appropriate default for most use cases. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: http://lkml.kernel.org/r/05c63d67-30b4-7026-02d5-ce7fb7bc185f@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-11Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a sporadic missed timer hw reprogramming bug that can result in random delays" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/nohz: Fix possible missing clock reprog after tick soft restart
2017-02-11Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-10/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A kernel crash fix plus three tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix crash in perf_event_read() perf callchain: Reference count maps perf diff: Fix -o/--order option behavior (again) perf diff: Fix segfault on 'perf diff -o N' option
2017-02-11Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-8/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull lockdep fix from Ingo Molnar: "This fixes an ugly lockdep stack trace output regression. (But also affects other stacktrace users such as kmemleak, KASAN, etc)" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stacktrace, lockdep: Fix address, newline ugliness
2017-02-10genirq: Fix /proc/interrupts output alignmentH Hartley Sweeten1-0/+2
If the irq_desc being output does not have a domain associated the information following the 'name' is not aligned correctly. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Link: http://lkml.kernel.org/r/20170210165416.5629-1-hsweeten@visionengravers.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10irqdesc: Add a resource managed version of irq_alloc_descs()Bartosz Golaszewski1-0/+55
Add a devres flavor of __devm_irq_alloc_descs() and corresponding helper macros. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-doc@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Link: http://lkml.kernel.org/r/1486729403-21132-1-git-send-email-bgolaszewski@baylibre.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10timer_list: Remove useless cast when printingMars Cheng1-1/+1
hrtimer_resolution is already unsigned int, not necessary to cast it when printing. Signed-off-by: Mars Cheng <mars.cheng@mediatek.com> Cc: CC Hwang <cc.hwang@mediatek.com> Cc: wsd_upstream@mediatek.com Cc: Loda Chou <loda.chou@mediatek.com> Cc: Jades Shih <jades.shih@mediatek.com> Cc: Miles Chen <miles.chen@mediatek.com> Cc: John Stultz <john.stultz@linaro.org> Cc: My Chuang <my.chuang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Yingjoe Chen <yingjoe.chen@mediatek.com> Link: http://lkml.kernel.org/r/1486626615-5879-1-git-send-email-mars.cheng@mediatek.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10time: Remove CONFIG_TIMER_STATSKees Cook7-523/+2
Currently CONFIG_TIMER_STATS exposes process information across namespaces: kernel/time/timer_list.c print_timer(): SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); /proc/timer_list: #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570 Given that the tracer can give the same information, this patch entirely removes CONFIG_TIMER_STATS. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: linux-doc@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Xing Gao <xgao01@email.wm.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jessica Frazelle <me@jessfraz.com> Cc: kernel-hardening@lists.openwall.com Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Olof Johansson <olof@lixom.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-api@vger.kernel.org Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10tick/nohz: Fix possible missing clock reprog after tick soft restartFrederic Weisbecker1-0/+5
ts->next_tick keeps track of the next tick deadline in order to optimize clock programmation on irq exit and avoid redundant clock device writes. Now if ts->next_tick missed an update, we may spuriously miss a clock reprog later as the nohz code is fooled by an obsolete next_tick value. This is what happens here on a specific path: when we observe an expired timer from the nohz update code on irq exit, we perform a soft tick restart which simply fires the closest possible tick without actually exiting the nohz mode and restoring a periodic state. But we forget to update ts->next_tick accordingly. As a result, after the next tick resulting from such soft tick restart, the nohz code sees a stale value on ts->next_tick which doesn't match the clock deadline that just expired. If that obsolete ts->next_tick value happens to collide with the actual next tick deadline to be scheduled, we may spuriously bypass the clock reprogramming. In the worst case, the tick may never fire again. Fix this with a ts->next_tick reset on soft tick restart. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed: Wanpeng Li <wanpeng.li@hotmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1486485894-29173-1-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10perf/core: Allow kernel filters on CPU eventsAlexander Shishkin1-14/+28
While supporting file-based address filters for CPU events requires some extra context switch handling, kernel address filters are easy, since the kernel mapping is preserved across address spaces. It is also useful as it permits tracing scheduling paths of the kernel. This patch allows setting up kernel filters for CPU events. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170126094057.13805-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10perf/core: Do error out on a kernel filter on an exclude_filter eventAlexander Shishkin1-0/+1
It is currently possible to configure a kernel address filter for a event that excludes kernel from its traces (attr.exclude_kernel==1). While in reality this doesn't make sense, the SET_FILTER ioctl() should return a error in such case, currently it does not. Furthermore, it will still silently discard the filter and any potentially valid filters that came with it. This patch makes the SET_FILTER ioctl() error out in such cases. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170126094057.13805-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10sched/core: Remove unlikely() annotation from sched_move_task()Steven Rostedt (VMware)1-2/+2
The check for 'running' in sched_move_task() has an unlikely() around it. That is, it is unlikely that the task being moved is running. That use to be true. But with a couple of recent updates, it is now likely that the task will be running. The first change came from ea86cb4b7621 ("sched/cgroup: Fix cpu_cgroup_fork() handling") that moved around the use case of sched_move_task() in do_fork() where the call is now done after the task is woken (hence it is running). The second change came from 8e5bfa8c1f84 ("sched/autogroup: Do not use autogroup->tg in zombie threads") where sched_move_task() is called by the exit path, by the task that is exiting. Hence it too is running. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/20170206110426.27ca6426@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10perf/core: Fix crash in perf_event_read()Peter Zijlstra1-10/+15
Alexei had his box explode because doing read() on a package (rapl/uncore) event that isn't currently scheduled in ends up doing an out-of-bounds load. Rework the code to more explicitly deal with event->oncpu being -1. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Tested-by: Alexei Starovoitov <ast@kernel.org> Tested-by: David Carrillo-Cisneros <davidcc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Fixes: d6a2f9035bfc ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG") Link: http://lkml.kernel.org/r/20170131102710.GL6515@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-08kernel/ucount.c: mark user_header with kmemleak_ignore()Luis R. Rodriguez1-2/+1
The user_header gets caught by kmemleak with the following splat as missing a free: unreferenced object 0xffff99667a733d80 (size 96): comm "swapper/0", pid 1, jiffies 4294892317 (age 62191.468s) hex dump (first 32 bytes): a0 b6 92 b4 ff ff ff ff 00 00 00 00 01 00 00 00 ................ 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: kmemleak_alloc+0x4a/0xa0 __kmalloc+0x144/0x260 __register_sysctl_table+0x54/0x5e0 register_sysctl+0x1b/0x20 user_namespace_sysctl_init+0x17/0x34 do_one_initcall+0x52/0x1a0 kernel_init_freeable+0x173/0x200 kernel_init+0xe/0x100 ret_from_fork+0x2c/0x40 The BUG_ON()s are intended to crash so no need to clean up after ourselves on error there. This is also a kernel/ subsys_init() we don't need a respective exit call here as this is never modular, so just white list it. Link: http://lkml.kernel.org/r/20170203211404.31458-1-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-08sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]Ingo Molnar4-2/+2
The names are all 'autogroup', not 'auto_group' - so rename the kernel/sched/auto_group.[ch] to match the existing nomenclature. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-08stacktrace, lockdep: Fix address, newline uglinessOmar Sandoval1-8/+4
Since KERN_CONT became meaningful again, lockdep stack traces have had annoying extra newlines, like this: [ 5.561122] -> #1 (B){+.+...}: [ 5.561528] [ 5.561532] [<ffffffff810d8873>] lock_acquire+0xc3/0x210 [ 5.562178] [ 5.562181] [<ffffffff816f6414>] mutex_lock_nested+0x74/0x6d0 [ 5.562861] [ 5.562880] [<ffffffffa01aa3c3>] init_btrfs_fs+0x21/0x196 [btrfs] [ 5.563717] [ 5.563721] [<ffffffff81000472>] do_one_initcall+0x52/0x1b0 [ 5.564554] [ 5.564559] [<ffffffff811a3af6>] do_init_module+0x5f/0x209 [ 5.565357] [ 5.565361] [<ffffffff81122f4d>] load_module+0x218d/0x2b80 [ 5.566020] [ 5.566021] [<ffffffff81123beb>] SyS_finit_module+0xeb/0x120 [ 5.566694] [ 5.566696] [<ffffffff816fd241>] entry_SYSCALL_64_fastpath+0x1f/0xc2 That's happening because each printk() call now gets printed on its own line, and we do a separate call to print the spaces before the symbol. Fix it by doing the printk() directly instead of using the print_ip_sym() helper. Additionally, the symbol address isn't very helpful, so let's get rid of that, too. The final result looks like this: [ 5.194518] -> #1 (B){+.+...}: [ 5.195002] lock_acquire+0xc3/0x210 [ 5.195439] mutex_lock_nested+0x74/0x6d0 [ 5.196491] do_one_initcall+0x52/0x1b0 [ 5.196939] do_init_module+0x5f/0x209 [ 5.197355] load_module+0x218d/0x2b80 [ 5.197792] SyS_finit_module+0xeb/0x120 [ 5.198251] entry_SYSCALL_64_fastpath+0x1f/0xc2 Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") Link: http://lkml.kernel.org/r/43b4e114724b2bdb0308fa86cb33aa07d3d67fad.1486510315.git.osandov@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/topology: Split out scheduler topology code from core.c into topology.cIngo Molnar4-1658/+1684
Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/core: Remove unnecessary #include headersIngo Molnar1-49/+10
Over the years sched/core.c accumulated over 50 #include lines, 40 of which are superfluous. (!) Removing them decreases the preprocessed .c file (.i) size noticeably: triton:~/tip> wc -l kernel/sched/core.i Before: 76387 kernel/sched/core.i After: 75896 kernel/sched/core.i Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/rq_clock: Consolidate the ordering of the rq_clock methodsIngo Molnar1-75/+78
update_rq_clock_task() and update_rq_clock() we unnecessarily spread across core.c, requiring an extra prototype line. Move them next to each other and in the proper order. Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/core: Clean up commentsIngo Molnar1-176/+182
Refresh the comments in the core scheduler code: - Capitalize sentences consistently - Capitalize 'CPU' consistently - ... and other small details. Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-06Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar2-11/+10
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-04Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-14/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - Prevent double activation of interrupt lines, which causes problems on certain interrupt controllers - Handle the fallout of the above because x86 (ab)uses the activation function to reconfigure interrupts under the hood. * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Make irq activate operations symmetric irqdomain: Avoid activating interrupts more than once
2017-02-04tick/broadcast: Reduce lock cacheline contentionWaiman Long1-7/+8
It was observed that on an Intel x86 system without the ARAT (Always running APIC timer) feature and with fairly large number of CPUs as well as CPUs coming in and out of intel_idle frequently, the lock contention on the tick_broadcast_lock can become significant. To reduce contention, the lock is put into its own cacheline and all the cpumask_var_t variables are put into the __read_mostly section. Running the SP benchmark of the NAS Parallel Benchmarks on a 4-socket 16-core 32-thread Nehalam system, the performance number improved from 3353.94 Mop/s to 3469.31 Mop/s when this patch was applied on a 4.9.6 kernel. This is a 3.4% improvement. Signed-off-by: Waiman Long <longman@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1485799063-20857-1-git-send-email-longman@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-03Merge tag 'trace-v4.10-rc2-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Simple fix of s/static struct __init/static __init struct/" * tag 'trace-v4.10-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Fix __init annotation
2017-02-03modversions: treat symbol CRCs as 32 bit quantitiesArd Biesheuvel1-28/+25
The modversion symbol CRCs are emitted as ELF symbols, which allows us to easily populate the kcrctab sections by relying on the linker to associate each kcrctab slot with the correct value. This has a couple of downsides: - Given that the CRCs are treated as memory addresses, we waste 4 bytes for each CRC on 64 bit architectures, - On architectures that support runtime relocation, a R_<arch>_RELATIVE relocation entry is emitted for each CRC value, which identifies it as a quantity that requires fixing up based on the actual runtime load offset of the kernel. This results in corrupted CRCs unless we explicitly undo the fixup (and this is currently being handled in the core module code) - Such runtime relocation entries take up 24 bytes of __init space each, resulting in a x8 overhead in [uncompressed] kernel size for CRCs. Switching to explicit 32 bit values on 64 bit architectures fixes most of these issues, given that 32 bit values are not treated as quantities that require fixing up based on the actual runtime load offset. Note that on some ELF64 architectures [such as PPC64], these 32-bit values are still emitted as [absolute] runtime relocatable quantities, even if the value resolves to a build time constant. Since relative relocations are always resolved at build time, this patch enables MODULE_REL_CRCS on powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC references into relative references into .rodata where the actual CRC value is stored. So redefine all CRC fields and variables as u32, and redefine the __CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using inline assembler (which is necessary since 64-bit C code cannot use 32-bit types to hold memory addresses, even if they are ultimately resolved using values that do not exceed 0xffffffff). To avoid potential problems with legacy 32-bit architectures using legacy toolchains, the equivalent C definition of the kcrctab entry is retained for 32-bit architectures. Note that this mostly reverts commit d4703aefdbc8 ("module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y") Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-23/+46
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Five kernel fixes: - an mmap tracing ABI fix for certain mappings - a use-after-free fix, found via KASAN - three CPU hotplug related x86 PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Make package handling more robust perf/x86/intel/uncore: Clean up hotplug conversion fallout perf/x86/intel/rapl: Make package handling more robust perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory perf/core: Fix use-after-free bug
2017-02-02tracing/kprobes: Fix __init annotationArnd Bergmann1-1/+1
clang complains about "__init" being attached to a struct name: kernel/trace/trace_kprobe.c:1375:15: error: '__section__' attribute only applies to functions and global variables The intention must have been to mark the function as __init instead of the type, so move the attribute there. Link: http://lkml.kernel.org/r/20170201165826.2625888-1-arnd@arndb.de Fixes: f18f97ac43d7 ("tracing/kprobes: Add a helper method to return number of probe hits") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-01sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in ↵Shile Zhang3-3/+5
milliseconds We added the 'sched_rr_timeslice_ms' SCHED_RR tuning knob in this commit: ce0dbbbb30ae ("sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice") ... which name suggests to users that it's in milliseconds, while in reality it's being set in milliseconds but the result is shown in jiffies. This is obviously confusing when HZ is not 1000, it makes it appear like the value set failed, such as HZ=100: root# echo 100 > /proc/sys/kernel/sched_rr_timeslice_ms root# cat /proc/sys/kernel/sched_rr_timeslice_ms 10 Fix this to be milliseconds all around. Signed-off-by: Shile Zhang <shile.zhang@nokia.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1485612049-20923-1-git-send-email-shile.zhang@nokia.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01sched/cputime, vtime: Return nsecs instead of cputime_t to accountFrederic Weisbecker1-20/+11
Turn the full dynticks cputime clock source to return nsec while keeping its very internals jiffies based for performance reasons. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-27-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01sched/cputime: Complete nsec conversion of tick based accountingFrederic Weisbecker1-30/+22
This is the final step toward tick based cputime conversion. Now that the whole cputime accounting engine accounts in nsecs, we can convert the very source of the cputime to account in nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-26-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01sched/cputime: Push time to account_system_time() in nsecsFrederic Weisbecker1-20/+19
This is one more step toward converting cputime accounting to pure nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-25-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01sched/cputime: Push time to account_idle_time() in nsecsFrederic Weisbecker1-9/+9
This is one more step toward converting cputime accounting to pure nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-24-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01sched/cputime: Push time to account_steal_time() in nsecsFrederic Weisbecker1-5/+6
This is one more step toward converting cputime accounting to pure nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-23-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>