summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2015-06-19timer: Minimize nohz off overheadThomas Gleixner4-8/+17
If nohz is disabled on the kernel command line the [hr]timer code still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty pointless exercise. Cache nohz_active in [hr]timer per cpu bases and avoid the overhead. Before: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu After: 48.73% hog [.] main 15.36% [kernel] [k] _raw_spin_lock_irqsave 9.77% [kernel] [k] _raw_spin_unlock_irqrestore 6.61% [kernel] [k] lock_timer_base.isra.38 6.42% [kernel] [k] mod_timer 3.90% [kernel] [k] detach_if_pending 3.76% [kernel] [k] del_timer 2.41% [kernel] [k] internal_add_timer 1.39% [kernel] [k] __internal_add_timer 0.76% [kernel] [k] timerfn We probably should have a cached value for nohz full in the per cpu bases as well to avoid the cpumask check. The base cache line is hot already, the cpumask not necessarily. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Reduce timer migration overhead if disabledThomas Gleixner8-43/+121
Eric reported that the timer_migration sysctl is not really nice performance wise as it needs to check at every timer insertion whether the feature is enabled or not. Further the check does not live in the timer code, so we have an extra function call which checks an extra cache line to figure out that it is disabled. We can do better and store that information in the per cpu (hr)timer bases. I pondered to use a static key, but that's a nightmare to update from the nohz code and the timer base cache line is hot anyway when we select a timer base. The old logic enabled the timer migration unconditionally if CONFIG_NO_HZ was set even if nohz was disabled on the kernel command line. With this modification, we start off with migration disabled. The user visible sysctl is still set to enabled. If the kernel switches to NOHZ migration is enabled, if the user did not disable it via the sysctl prior to the switch. If nohz=off is on the kernel command line, migration stays disabled no matter what. Before: 47.76% hog [.] main 14.84% [kernel] [k] _raw_spin_lock_irqsave 9.55% [kernel] [k] _raw_spin_unlock_irqrestore 6.71% [kernel] [k] mod_timer 6.24% [kernel] [k] lock_timer_base.isra.38 3.76% [kernel] [k] detach_if_pending 3.71% [kernel] [k] del_timer 2.50% [kernel] [k] internal_add_timer 1.51% [kernel] [k] get_nohz_timer_target 1.28% [kernel] [k] __internal_add_timer 0.78% [kernel] [k] timerfn 0.48% [kernel] [k] wake_up_nohz_cpu After: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Stats: Simplify the flags handlingThomas Gleixner2-10/+7
Simplify the handling of the flag storage for the timer statistics. No intermediate storage anymore. Just hand over the flags field. I left the printout of 'deferrable' for now because changing this would be an ABI update and I have no idea how strong people feel about that. OTOH, I wonder whether we should kill the whole timer stats stuff because all of that information can be retrieved via ftrace/perf as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Replace timer base by a cpu indexThomas Gleixner1-91/+36
Instead of storing a pointer to the per cpu tvec_base we can simply cache a CPU index in the timer_list and use that to get hold of the correct per cpu tvec_base. This is only used in lock_timer_base() and the slightly larger code is peanuts versus the spinlock operation and the d-cache foot print of the timer wheel. Aside of that this allows to get rid of following nuisances: - boot_tvec_base That statically allocated 4k bss data is just kept around so the timer has a home when it gets statically initialized. It serves no other purpose. With the CPU index we assign the timer to CPU0 at static initialization time and therefor can avoid the whole boot_tvec_base dance. That also simplifies the init code, which just can use the per cpu base. Before: text data bss dec hex filename 17491 9201 4160 30852 7884 ../build/kernel/time/timer.o After: text data bss dec hex filename 17440 9193 0 26633 6809 ../build/kernel/time/timer.o - Overloading the base pointer with various flags The CPU index has enough space to hold the flags (deferrable, irqsafe) so we can get rid of the extra masking and bit fiddling with the base pointer. As a benefit we reduce the size of struct timer_list on 64 bit machines. 4 - 8 bytes, a size reduction up to 15% per struct timer_list, which is a real win as we have tons of them embedded in other structs. This changes also the newly added deferrable printout of the timer start trace point to capture and print all timer->flags, which allows us to decode the target cpu of the timer as well. We might have used bitfields for this, but that would change the static initializers and the init function for no value to accomodate big endian bitfields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Badhri Jagan Sridharan <Badhri@google.com> Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Use hlist for the timer wheel hash bucketsThomas Gleixner1-37/+27
This reduces the size of struct tvec_base by 50% and results in slightly smaller code as well. Before: struct tvec_base: size: 8256, cachelines: 129 text data bss dec hex filename 17698 13297 8256 39251 9953 ../build/kernel/time/timer.o After: struct tvec_base: 4160, cachelines: 65 text data bss dec hex filename 17491 9201 4160 30852 7884 ../build/kernel/time/timer.o Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Remove FIFO "guarantee"Thomas Gleixner1-4/+2
The FIFO guarantee is only there if two timers are queued into the same bucket at the same jiffie on the same cpu: - The slack value depends on the delta between expiry and enqueue time, so the resulting expiry time can be different for timers which are queued in different jiffies. - Timers which are queued into the secondary array end up after a later queued timer which was queued into the primary array due to cascading. - Timers can end up on different cpus due to the NOHZ target moving around. Obviously there is no guarantee of expiry ordering between cpus. So anything which relies on FIFO behaviour of the timer wheel is broken already. This is a preparatory patch for converting the timer wheel to hlist which reduces the memory foot print of the wheel by 50%. It's a seperate patch so any (unlikely to happen) regression caused by this can be identified clearly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Cc: George Spelvin <linux@horizon.com> Link: http://lkml.kernel.org/r/20150526224511.757520403@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timers: Sanitize catchup_timer_jiffies() usageThomas Gleixner1-24/+16
catchup_timer_jiffies() has been applied blindly to several functions without looking for possible better ways to do it. 1) internal_add_timer() Move the update to base->all_timers before we actually insert the timer into the wheel. 2) detach_if_pending() Again the update to base->all_timers allows us to explicitely do the timer_jiffies update in place, if this was the last timer which got removed. 3) __run_timers() We only check on entry, which is silly, because base->timer_jiffies can be behind - especially on NOHZ kernels - and if there is a single deferrable timer somewhere between base->timer_jiffies and jiffies we expire it and then loop until base->timer_jiffies == jiffies. Move it into the loop. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224511.662994644@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19hrtimer: Allow hrtimer::function() to free the timerPeter Zijlstra1-23/+91
Currently an hrtimer callback function cannot free its own timer because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK after it. Freeing the timer would result in a clear use-after-free. Solve this by using a scheme similar to regular timers; track the current running timer in hrtimer_clock_base::running. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: wanpeng.li@linux.intel.com Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19hrtimer: Fix hrtimer_is_queued() holePeter Zijlstra1-10/+13
A queued hrtimer that gets restarted (hrtimer_start*() while hrtimer_is_queued()) will briefly appear as unqueued/inactive, even though the timer has always been active, we just moved it. Close this hole by preserving timer->state in hrtimer_start_range_ns()'s remove_hrtimer() call. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.175989138@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19hrtimer: Remove HRTIMER_STATE_MIGRATEOleg Nesterov1-5/+2
I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally confused it looks buggy and simply unneeded. migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this is not enough: this can fool, say, hrtimer_is_queued() in dequeue_signal(). Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED? This fixes the race and we can kill STATE_MIGRATE. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: wanpeng.li@linux.intel.com Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-18timekeeping: Copy the shadow-timekeeper over the real timekeeper lastJohn Stultz1-1/+2
The fix in d151832650ed9 (time: Move clock_was_set_seq update before updating shadow-timekeeper) was unfortunately incomplete. The main gist of that change was to do the shadow-copy update last, so that any state changes were properly duplicated, and we wouldn't accidentally have stale data in the shadow. Unfortunately in the main update_wall_time() logic, we update use the shadow-timekeeper to calculate the next update values, then while holding the lock, copy the shadow-timekeeper over, then call timekeeping_update() to do some additional bookkeeping, (skipping the shadow mirror). The bug with this is the additional bookkeeping isn't all read-only, and some changes timkeeper state. Thus we might then overwrite this state change on the next update. To avoid this problem, do the timekeeping_update() on the shadow-timekeeper prior to copying the full state over to the real-timekeeper. This avoids problems with both the clock_was_set_seq and next_leap_ktime being overwritten and possibly the fast-timekeepers as well. Many thanks to Prarit for his rigorous testing, which discovered this problem, along with Prarit and Daniel's work validating this fix. Reported-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-18clockevents: Check state instead of mode in suspend/resume pathViresh Kumar1-2/+2
CLOCK_EVT_MODE_* macros are present for backward compatibility (as most of the drivers are still using old ->set_mode() interface). These macro's shouldn't be used anymore in code, that is common to both driver interfaces, i.e. ->set_mode() and ->set_state_*(). Drivers implementing ->set_state_*() interface, which have their clkevt->mode set to 0 (clkevt device structures are normally globally defined), will not participate in suspend/resume as they will always be marked as UNUSED. Fix this by checking state of the clockevent device instead of mode, which is updated for both the interfaces. Fixes: ac34ad27fc16 ("clockevents: Do not suspend/resume if unused") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: alexandre.belloni@free-electrons.com Cc: sylvain.rochet@finsecur.com Link: http://lkml.kernel.org/r/a1964eef6e8a47d02b1ff9083c6c91f73f0ff643.1434537215.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12ntp: Do leapsecond adjustment in adjtimex read pathJohn Stultz1-0/+18
Since the leapsecond is applied at tick-time, this means there is a small window of time at the start of a leap-second where we cross into the next second before applying the leap. This patch modified adjtimex so that the leap-second is applied on the second edge. Providing more correct leapsecond behavior. This does make it so that adjtimex()'s returned time values can be inconsistent with time values read from gettimeofday() or clock_gettime(CLOCK_REALTIME,...) for a brief period of one tick at the leapsecond. However, those other interfaces do not provide the TIME_OOP time_state return that adjtimex() provides, which allows the leapsecond to be properly represented. They instead only see a time discontinuity, and cannot tell the first 23:59:59 from the repeated 23:59:59 leap second. This seems like a reasonable tradeoff given clock_gettime() / gettimeofday() cannot properly represent a leapsecond, and users likely care more about performance, while folks who are using adjtimex() more likely care about leap-second correctness. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edgeJohn Stultz3-8/+58
Currently, leapsecond adjustments are done at tick time. As a result, the leapsecond was applied at the first timer tick *after* the leapsecond (~1-10ms late depending on HZ), rather then exactly on the second edge. This was in part historical from back when we were always tick based, but correcting this since has been avoided since it adds extra conditional checks in the gettime fastpath, which has performance overhead. However, it was recently pointed out that ABS_TIME CLOCK_REALTIME timers set for right after the leapsecond could fire a second early, since some timers may be expired before we trigger the timekeeping timer, which then applies the leapsecond. This isn't quite as bad as it sounds, since behaviorally it is similar to what is possible w/ ntpd made leapsecond adjustments done w/o using the kernel discipline. Where due to latencies, timers may fire just prior to the settimeofday call. (Also, one should note that all applications using CLOCK_REALTIME timers should always be careful, since they are prone to quirks from settimeofday() disturbances.) However, the purpose of having the kernel do the leap adjustment is to avoid such latencies, so I think this is worth fixing. So in order to properly keep those timers from firing a second early, this patch modifies the ntp and timekeeping logic so that we keep enough state so that the update_base_offsets_now accessor, which provides the hrtimer core the current time, can check and apply the leapsecond adjustment on the second edge. This prevents the hrtimer core from expiring timers too early. This patch does not modify any other time read path, so no additional overhead is incurred. However, this also means that the leap-second continues to be applied at tick time for all other read-paths. Apologies to Richard Cochran, who pushed for similar changes years ago, which I resisted due to the concerns about the performance overhead. While I suspect this isn't extremely critical, folks who care about strict leap-second correctness will likely want to watch this. Potentially a -stable candidate eventually. Originally-suggested-by: Richard Cochran <richardcochran@gmail.com> Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com> Reported-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12ntp: Introduce and use SECS_PER_DAY macro instead of 86400John Stultz1-2/+3
Currently the leapsecond logic uses what looks like magic values. Improve this by defining SECS_PER_DAY and using that macro to make the logic more clear. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12time: Move clock_was_set_seq update before updating shadow-timekeeperJohn Stultz1-4/+8
It was reported that 868a3e915f7f5eba (hrtimer: Make offset update smarter) was causing timer problems after suspend/resume. The problem with that change is the modification to clock_was_set_seq in timekeeping_update is done prior to mirroring the time state to the shadow-timekeeper. Thus the next time we do update_wall_time() the updated sequence is overwritten by whats in the shadow copy. This patch moves the shadow-timekeeper mirroring to the end of the function, after all updates have been made, so all data is kept in sync. (This patch also affects the update_fast_timekeeper calls which were also problematically done prior to the mirroring). Reported-and-tested-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10clocksource: Use current logging styleJoe Perches1-12/+12
clocksource messages aren't prefixed in dmesg so it's a bit unclear what subsystem emits the messages. Use pr_fmt and pr_<level> to auto-prefix the messages appropriately. Miscellanea: o Remove "Warning" from KERN_WARNING level messages o Align "timekeeping watchdog: " messages o Coalesce formats o Align multiline arguments Signed-off-by: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1432579795.2846.75.camel@perches.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10time: Refactor usecs_to_jiffiesNicholas Mc Guire1-10/+3
Refactor the usecs_to_jiffies conditional code part in time.c and jiffies.h putting it into conditional functions rather than #ifdefs to improve readability. This is analogous to the msecs_to_jiffies() cleanup in commit ca42aaf0c861 ("time: Refactor msecs_to_jiffies") Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Cc: Masahiro Yamada <yamada.m@jp.panasonic.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Andrew Hunter <ahh@google.com> Cc: Paul Turner <pjt@google.com> Cc: Michal Marek <mmarek@suse.cz> Link: http://lkml.kernel.org/r/1432832996-12129-1-git-send-email-hofrat@osadl.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-02clockevents: Rename state to state_use_accessorsThomas Gleixner1-2/+2
The only sensible way to make abuse of core internal fields obvious and easy to grep for. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>
2015-06-02clockevents: Use set/get state helper functionsThomas Gleixner2-6/+7
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>
2015-06-02clockevents: Provide functions to set and get the stateThomas Gleixner5-24/+35
We want to rename dev->state, so provide proper get and set functions. Rename clockevents_set_state() to clockevents_switch_state() to avoid confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>
2015-06-02clockevents: Use helpers to check the state of a clockevent deviceViresh Kumar4-17/+17
Use accessor functions to check the state of clockevent devices in core code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/fa2b9869fd17f210eaa156ec2b594efd0230b6c7.1432192527.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-27clockevents: Do not suspend/resume if unusedAlexandre Belloni1-2/+2
There is no point in calling suspend/resume for unused clockevents as they are already stopped and disabled. This is really important for AT91 as the hardware is a trainwreck and takes ages to synchronize. Reported-by: Sylvain Rochet <sylvain.rochet@finsecur.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1421399151-26800-1-git-send-email-alexandre.belloni@free-electrons.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-22time: Remove read_boot_clock()Xunlei Pang1-11/+3
Now that we have a read_boot_clock64() function available on every architecture, and converted all the users to it, it's time to remove the (now unused) read_boot_clock() completely from the kernel. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [jstultz: Minor commit message tweak suggested by Ingo] Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22tracing: timer: Add deferrable flag to timer_startBadhri Jagan Sridharan1-1/+1
The timer_start event now shows whether the timer is deferrable in case of a low-res timer. The debug_activate function now includes a deferrable flag while calling the trace_timer_start event. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> [jstultz: Fixed minor whitespace and grammer tweaks pointed out by Ingo] Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22time: Rework debugging variables so they aren't globalJohn Stultz1-22/+11
Ingo suggested that the timekeeping debugging variables recently added should not be global, and should be tied to the timekeeper's read_base. Thus this patch implements that suggestion. This version is different from the earlier versions as it keeps the variables in the timekeeper structure rather then in the tkr. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22timekeeping: Provide new API to get the current time resolutionHarald Geyer1-0/+17
This patch series introduces a new function u32 ktime_get_resolution_ns(void) which allows to clean up some driver code. In particular the IIO subsystem has a function to provide timestamps for events but no means to get their resolution. So currently the dht11 driver tries to guess the resolution in a rather messy and convoluted way. We can do much better with the new code. This API is not designed to be exposed to user space. This has been tested on i386, sunxi and mxs. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Harald Geyer <harald@ccbib.org> [jstultz: Tweaked to make it build after upstream changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-22time: Make sure tz_minuteswest is set to a valid value when setting timeSasha Levin1-0/+4
Invalid values may overflow later, leading to undefined behaviour when multiplied by 60 to get the amount of seconds. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-05-19clockevents: Stop unused clockevent devicesViresh Kumar3-4/+22
To avoid getting spurious interrupts on a tickless CPU, clockevent device can now be stopped by switching to ONESHOT_STOPPED state. The natural place for handling this transition is tick_program_event(). On 'expires == KTIME_MAX', we skip programming the event and so we need to fix such call sites as well, to always call tick_program_event() irrespective of the expires value. Once the clockevent device is required again, check if it was earlier put into ONESHOT_STOPPED state. If yes, switch its state to ONESHOT before programming its event. To make sure we haven't missed any corner case, add a WARN() for the case where we try to reprogram clockevent device while we aren't configured in ONESHOT_STOPPED state. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5146b07be7f0bc497e0ebae036590ec2fa73e540.1428031396.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED stateViresh Kumar2-1/+19
When no timers/hrtimers are pending, the expiry time is set to a special value: 'KTIME_MAX'. This normally happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes. When 'expiry == KTIME_MAX', we either cancel the 'tick-sched' hrtimer (NOHZ_MODE_HIGHRES) or skip reprogramming clockevent device (NOHZ_MODE_LOWRES). But, the clockevent device is already reprogrammed from the tick-handler for next tick. As the clock event device is programmed in ONESHOT mode it will at least fire one more time (unnecessarily). Timers on few implementations (like arm_arch_timer, etc.) only support PERIODIC mode and their drivers emulate ONESHOT over that. Which means that on these platforms we will get spurious interrupts periodically (at last programmed interval rate, normally tick rate). In order to avoid spurious interrupts, the clockevent device should be stopped or its interrupts should be masked. A simple (yet hacky) solution to get this fixed could be: update hrtimer_force_reprogram() to always reprogram clockevent device and update clockevent drivers to STOP generating events (or delay it to max time) when 'expires' is set to KTIME_MAX. But the drawback here is that every clockevent driver has to be hacked for this particular case and its very easy for new ones to miss this. However, Thomas suggested to add an optional state ONESHOT_STOPPED to solve this problem: lkml.org/lkml/2014/5/9/508. This patch adds support for ONESHOT_STOPPED state in clockevents core. It will only be available to drivers that implement the state-specific callbacks instead of the legacy ->set_mode() callback. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: linaro-kernel@lists.linaro.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/b8b383a03ac07b13312c16850b5106b82e4245b5.1428031396.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19Merge branch 'linus' into timers/coreThomas Gleixner25-215/+265
Make sure the upstream fixes are applied before adding further modifications.
2015-05-19time: Refactor msecs_to_jiffiesNicholas Mc Guire1-40/+19
Refactor the msecs_to_jiffies conditional code part in time.c and jiffies.h putting it into conditional functions rather than #ifdefs to improve readability. [ tglx: Verified that there is no binary code change ] Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Cc: Masahiro Yamada <yamada.m@jp.panasonic.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Andrew Hunter <ahh@google.com> Cc: Paul Turner <pjt@google.com> Cc: Michal Marek <mmarek@suse.cz> Link: http://lkml.kernel.org/r/1431951554-5563-2-git-send-email-hofrat@osadl.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19time: Move timeconst.h into include/generatedNicholas Mc Guire3-18/+4
kernel/time/timeconst.h is moved to include/generated/ and generated by the top level Kbuild. This allows using timeconst.h in an earlier build stage. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Cc: Masahiro Yamada <yamada.m@jp.panasonic.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Andrew Hunter <ahh@google.com> Cc: Paul Turner <pjt@google.com> Cc: Michal Marek <mmarek@suse.cz> Link: http://lkml.kernel.org/r/1431951554-5563-1-git-send-email-hofrat@osadl.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-18watchdog: Fix merge 'conflict'Peter Zijlstra1-5/+15
Two watchdog changes that came through different trees had a non conflicting conflict, that is, one changed the semantics of a variable but no actual code conflict happened. So the merge appeared fine, but the resulting code did not behave as expected. Commit 195daf665a62 ("watchdog: enable the new user interface of the watchdog mechanism") changes the semantics of watchdog_user_enabled, which thereafter is only used by the functions introduced by b3738d293233 ("watchdog: Add watchdog enable/disable all functions"). There further appears to be a distinct lack of serialization between setting and using watchdog_enabled, so perhaps we should wrap the {en,dis}able_all() things in watchdog_proc_mutex. This patch fixes a s2r failure reported by Michal; which I cannot readily explain. But this does make the code internally consistent again. Reported-and-tested-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-18sched,perf: Fix periodic timersPeter Zijlstra5-33/+38
In the below two commits (see Fixes) we have periodic timers that can stop themselves when they're no longer required, but need to be (re)-started when their idle condition changes. Further complications is that we want the timer handler to always do the forward such that it will always correctly deal with the overruns, and we do not want to race such that the handler has already decided to stop, but the (external) restart sees the timer still active and we end up with a 'lost' timer. The problem with the current code is that the re-start can come before the callback does the forward, at which point the forward from the callback will WARN about forwarding an enqueued timer. Now, conceptually its easy to detect if you're before or after the fwd by comparing the expiration time against the current time. Of course, that's expensive (and racy) because we don't have the current time. Alternatively one could cache this state inside the timer, but then everybody pays the overhead of maintaining this extra state, and that is undesired. The only other option that I could see is the external timer_active variable, which I tried to kill before. I would love a nicer interface for this seemingly simple 'problem' but alas. Fixes: 272325c4821f ("perf: Fix mux_interval hrtimer wreckage") Fixes: 77a4d1a1b9a1 ("sched: Cleanup bandwidth timers") Cc: pjt@google.com Cc: tglx@linutronix.de Cc: klamm@yandex-team.ru Cc: mingo@kernel.org Cc: bsegall@google.com Cc: hpa@zytor.com Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150514102311.GX21418@twins.programming.kicks-ass.net
2015-05-15Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2-33/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: a suspend/resume related regression fix, and an RT priority boosting fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix regression in cpuset_cpu_inactive() for suspend sched: Handle priority boosted tasks proper in setscheduler()
2015-05-15Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-8/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also a lockdep annotation fix, a PMU event list fix and a new model addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/liblockdep: Fix compilation error tools/liblockdep: Fix linker error in case of cross compile perf tools: Use getconf to determine number of online CPUs tools: Fix tools/vm build perf/x86/rapl: Enable Broadwell-U RAPL support perf/x86/intel: Fix SLM cache event list perf: Annotate inherited event ctx->mutex recursion
2015-05-09Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Two patches from the irq departement: - a simple fix to make dummy_irq_chip usable for wakeup scenarios - removal of the gic arch_extn hackery. Now that all users are converted we really want to get rid of the interface so people wont come up with new use cases" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: gic: Drop support for gic_arch_extn genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
2015-05-09Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A simple fix to actually shut down a detached device instead of keeping it active" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents: Shutdown detached clockevent device
2015-05-08Merge tag 'trace-fixes-v4.1-rc2' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "The newly added ftrace_print_array_seq() function had a bug in it. Luckily, the only user of it didn't make the 4.1 merge window. But the helper function should be fixed before 4.2 when the users start coming in" * tag 'trace-fixes-v4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Make ftrace_print_array_seq compute buf_len
2015-05-08perf: Annotate inherited event ctx->mutex recursionPeter Zijlstra1-8/+33
While fuzzing Sasha tripped over another ctx->mutex recursion lockdep splat. Annotate this. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched/core: Fix regression in cpuset_cpu_inactive() for suspendOmar Sandoval1-16/+12
Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that caused a regression in setting a CPU inactive during suspend. I ran into this when a program was failing pthread_setaffinity_np() with EINVAL after a suspend+wake up. A simple reproducer: $ ./a.out sched_setaffinity: Success $ systemctl suspend $ ./a.out sched_setaffinity: Invalid argument ... where ./a.out is: #define _GNU_SOURCE #include <errno.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> int main(void) { long num_cores; cpu_set_t cpu_set; int ret; num_cores = sysconf(_SC_NPROCESSORS_ONLN); CPU_ZERO(&cpu_set); CPU_SET(num_cores - 1, &cpu_set); errno = 0; ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set); perror("sched_setaffinity"); return ret ? EXIT_FAILURE : EXIT_SUCCESS; } The mistake is that suspend is handled in the action == CPU_DOWN_PREPARE_FROZEN case of the switch statement in cpuset_cpu_inactive(). However, the commit in question masked out CPU_TASKS_FROZEN from the action, making this case dead. The fix is straightforward. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()") Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched: Handle priority boosted tasks proper in setscheduler()Thomas Gleixner2-17/+21
Ronny reported that the following scenario is not handled correctly: T1 (prio = 10) lock(rtmutex); T2 (prio = 20) lock(rtmutex) boost T1 T1 (prio = 20) sys_set_scheduler(prio = 30) T1 prio = 30 .... sys_set_scheduler(prio = 10) T1 prio = 30 The last step is wrong as T1 should now be back at prio 20. Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()") only handles the case where a boosted tasks tries to lower its priority. Fix it by taking the new effective priority into account for the decision whether a change of the priority is required. Reported-by: Ronny Meeus <ronny.meeus@gmail.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-07nohz: Fix !HIGH_RES_TIMERS hangThomas Gleixner1-6/+3
Simon Horman reported this crash on a system with high-res timers disabled but nohz enabled: > ------------[ cut here ]------------ > kernel BUG at kernel/irq_work.c:135! BUG_ON(!irqs_disabled()); So something enabled interrupts in the periodic tick handling machinery, and that code path indeed has a local_irq_disable()/enable pair in tick_nohz_switch_to_nohz() which causes havoc. Fix it. This patch also fixes a +nohz -hrtimers hang reported by Ingo Molnar. Reported-by: Simon Horman <horms@verge.net.au> Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Simon Horman <horms@verge.net.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505071425520.4225@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06tracing: Make ftrace_print_array_seq compute buf_lenAlex Bennée1-1/+2
The only caller to this function (__print_array) was getting it wrong by passing the array length instead of buffer length. As the element size was already being passed for other reasons it seems reasonable to push the calculation of buffer length into the function. Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.org Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-06Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds1-7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "An RCU Kconfig fix that eliminates an annoying interactive kconfig question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Control grace-period delays directly from value
2015-05-05tick: hrtimer-broadcast: Prevent endless restarting when broadcast device is ↵Andreas Sandberg1-3/+7
unused The hrtimer callback in the hrtimer's tick broadcast code sometimes incorrectly ends up scheduling events at the current tick causing the kernel to hang servicing the same hrtimer forever. This typically happens when a device is swapped out by tick_install_broadcast_device(), which replaces the event handler with clock_events_handle_noop() and sets the device mode to CLOCK_EVT_MODE_UNUSED. If the timer is scheduled when this happens, the next_event field will not be updated and the hrtimer ends up being restarted at the current tick. To prevent this from happening, only try to restart the hrtimer if the broadcast clock event device is in one of the active modes and try to cancel the timer when entering the CLOCK_EVT_MODE_UNUSED mode. Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1429880765-5558-1-git-send-email-andreas.sandberg@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05timer: Use timer->base for flag checksJoonwoo Park1-1/+1
At present, internal_add_timer() examines flags with 'base' which doesn't contain flags. Examine with 'timer->base' to avoid unnecessary waking up of nohz CPU when timer base has TIMER_DEFERRABLE set. Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Cc: sboyd@codeaurora.org Cc: skannan@codeaurora.org Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1430187709-21087-1-git-send-email-joonwoop@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05tick-broadcast: Fix the printing of broadcast masksPreeti U Murthy1-4/+4
Today the number of bits of the broadcast masks that is output into /proc/timer_list is sizeof(unsigned long). This means that on machines with a larger number of CPUs, the bitmasks of CPUs beyond this range do not appear. Fix this by using bitmap printing through "%*pb" instead, so as to output the broadcast masks for the range of nr_cpu_ids into /proc/timer_list. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: peterz@infradead.org Cc: linuxppc-dev@ozlabs.org Cc: john.stultz@linaro.org Link: http://lkml.kernel.org/r/20150428084520.3314.62668.stgit@preeti.in.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05tick: broadcast: Simplify oneshot logic and shorten lock regionThomas Gleixner1-25/+17
Simplify the oneshot logic by avoiding the reprogramming loops. That also allows to call the cpu local handler outside of the broadcast_lock held region. Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>