summaryrefslogtreecommitdiffstats
path: root/kernel/sched/core.c
AgeCommit message (Collapse)AuthorFilesLines
2017-05-01Merge branch 'locking-core-for-linus' of ↵Linus Torvalds1-11/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - a big round of FUTEX_UNLOCK_PI improvements, fixes, cleanups and general restructuring - lockdep updates such as new checks for lock_downgrade() - introduce the new atomic_try_cmpxchg() locking API and use it to optimize refcount code generation - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) MAINTAINERS: Add FUTEX SUBSYSTEM futex: Clarify mark_wake_futex memory barrier usage futex: Fix small (and harmless looking) inconsistencies futex: Avoid freeing an active timer rtmutex: Plug preempt count leak in rt_mutex_futex_unlock() rtmutex: Fix more prio comparisons rtmutex: Fix PI chain order integrity sched,tracing: Update trace_sched_pi_setprio() sched/rtmutex: Refactor rt_mutex_setprio() rtmutex: Clean up sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update sched/rtmutex/deadline: Fix a PI crash for deadline tasks rtmutex: Deboost before waking up the top waiter locking/ww-mutex: Limit stress test to 2 seconds locking/atomic: Fix atomic_try_cmpxchg() semantics lockdep: Fix per-cpu static objects futex: Drop hb->lock before enqueueing on the rtmutex futex: Futex_unlock_pi() determinism futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() ...
2017-05-01Merge branch 'sched-core-for-linus' of ↵Linus Torvalds1-98/+103
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...
2017-05-01Merge branch 'for-4.12' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing major. Two notable fixes are Li's second stab at fixing the long-standing race condition in the mount path and suppression of spurious warning from cgroup_get(). All other changes are trivial" * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: mark cgroup_get() with __maybe_unused cgroup: avoid attaching a cgroup root to two different superblocks, take 2 cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc() cgroup: move cgroup_subsys_state parent field for cache locality cpuset: Remove cpuset_update_active_cpus()'s parameter. cgroup: switch to BUG_ON() cgroup: drop duplicate header nsproxy.h kernel: convert css_set.refcount from atomic_t to refcount_t kernel: convert cgroup_namespace.count from atomic_t to refcount_t
2017-04-11cpuset: Remove cpuset_update_active_cpus()'s parameter.Rakib Mullick1-2/+2
In cpuset_update_active_cpus(), cpu_online isn't used anymore. Remove it. Signed-off-by: Rakib Mullick<rakib.mullick@gmail.com> Acked-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-04-04sched,tracing: Update trace_sched_pi_setprio()Peter Zijlstra1-1/+1
Pass the PI donor task, instead of a numerical priority. Numerical priorities are not sufficient to describe state ever since SCHED_DEADLINE. Annotate all sched tracepoints that are currently broken; fixing them will bork userspace. *hate*. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.353599881@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-04sched/rtmutex: Refactor rt_mutex_setprio()Peter Zijlstra1-13/+53
With the introduction of SCHED_DEADLINE the whole notion that priority is a single number is gone, therefore the @prio argument to rt_mutex_setprio() doesn't make sense anymore. So rework the code to pass a pi_task instead. Note this also fixes a problem with pi_top_task caching; previously we would not set the pointer (call rt_mutex_update_top_task) if the priority didn't change, this could lead to a stale pointer. As for the XXX, I think its fine to use pi_task->prio, because if it differs from waiter->prio, a PI chain update is immenent. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.303827095@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-04sched/rtmutex/deadline: Fix a PI crash for deadline tasksXunlei Pang1-0/+2
A crash happened while I was playing with deadline PI rtmutex. BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff810eeb8f>] rt_mutex_get_top_task+0x1f/0x30 PGD 232a75067 PUD 230947067 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 10994 Comm: a.out Not tainted Call Trace: [<ffffffff810b658c>] enqueue_task+0x2c/0x80 [<ffffffff810ba763>] activate_task+0x23/0x30 [<ffffffff810d0ab5>] pull_dl_task+0x1d5/0x260 [<ffffffff810d0be6>] pre_schedule_dl+0x16/0x20 [<ffffffff8164e783>] __schedule+0xd3/0x900 [<ffffffff8164efd9>] schedule+0x29/0x70 [<ffffffff8165035b>] __rt_mutex_slowlock+0x4b/0xc0 [<ffffffff81650501>] rt_mutex_slowlock+0xd1/0x190 [<ffffffff810eeb33>] rt_mutex_timed_lock+0x53/0x60 [<ffffffff810ecbfc>] futex_lock_pi.isra.18+0x28c/0x390 [<ffffffff810ed8b0>] do_futex+0x190/0x5b0 [<ffffffff810edd50>] SyS_futex+0x80/0x180 This is because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi() are only protected by pi_lock when operating pi waiters, while rt_mutex_get_top_task(), will access them with rq lock held but not holding pi_lock. In order to tackle it, we introduce new "pi_top_task" pointer cached in task_struct, and add new rt_mutex_update_top_task() to update its value, it can be called by rt_mutex_setprio() which held both owner's pi_lock and rq lock. Thus "pi_top_task" can be safely accessed by enqueue_task_dl() under rq lock. Originally-From: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.157682758@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-23sched/core: Fix rq lock pinning warning after call balance callbacksWanpeng Li1-3/+3
This can be reproduced by running rt-migrate-test: WARNING: CPU: 2 PID: 2195 at kernel/locking/lockdep.c:3670 lock_unpin_lock() unpinning an unpinned lock ... Call Trace: dump_stack() __warn() warn_slowpath_fmt() lock_unpin_lock() __balance_callback() __schedule() schedule() futex_wait_queue_me() futex_wait() do_futex() SyS_futex() do_syscall_64() entry_SYSCALL64_slow_path() Revert the rq_lock_irqsave() usage here, the whole point of the balance_callback() was to allow dropping rq->lock. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8a8c69c32778 ("sched/core: Add rq->lock wrappers") Link: http://lkml.kernel.org/r/1489718719-3951-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Avoid double update_rq_clock() in move_queued_task()Peter Zijlstra1-1/+2
Address this case: WARNING: CPU: 0 PID: 2070 at ../kernel/sched/core.c:109 update_rq_clock+0x74/0x80 rq->clock_update_flags & RQCF_UPDATED Call Trace: update_rq_clock() move_queued_task() __set_cpus_allowed_ptr() ... Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Avoid obvious double update_rq_clock()Peter Zijlstra1-8/+10
Add DEQUEUE_NOCLOCK to all places where we just did an update_rq_clock() already. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Simplify update_rq_clock() in __schedule()Peter Zijlstra1-5/+3
Instead of relying on deactivate_task() to call update_rq_clock() and handling the case where it didn't happen (task_on_rq_queued), unconditionally do update_rq_clock() and skip any further updates. This also avoids a double update on deactivate_task() + ttwu_local(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Make sched_ttwu_pending() atomic in timePeter Zijlstra1-1/+3
Since all tasks on the wake_list are woken under a single rq->lock avoid calling update_rq_clock() for each task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Add ENQUEUE_NOCLOCK to ENQUEUE_RESTOREPeter Zijlstra1-4/+4
In all cases, ENQUEUE_RESTORE should also have ENQUEUE_NOCLOCK because DEQUEUE_SAVE will have done an update_rq_clock(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Add {EN,DE}QUEUE_NOCLOCK flagsPeter Zijlstra1-2/+8
Currently {en,de}queue_task() do an unconditional update_rq_clock(). However since we want to avoid duplicate updates, so that each rq->lock section appears atomic in time, we need to be able to skip these clock updates. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Add rq->lock wrappersPeter Zijlstra1-81/+74
The missing update_rq_clock() check can work with partial rq->lock wrappery, since a missing wrapper can cause the warning to not be emitted when it should have, but cannot cause the warning to trigger when it should not have. The duplicate update_rq_clock() check however can cause false warnings to trigger. Therefore add more comprehensive rq->lock wrappery. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16sched/core: Add WARNING for multiple update_rq_clock() callsPeter Zijlstra1-0/+3
Now that we have no missing calls, add a warning to find multiple calls. By having only a single update_rq_clock() call per rq-lock section, the section appears 'atomic' wrt time. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-07Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A fix for KVM's scheduler clock which (erroneously) was always marked unstable, a fix for RT/DL load balancing, plus latency fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface sched/core: Fix pick_next_task() for RT,DL sched/fair: Make select_idle_cpu() more aggressive
2017-03-02sched/core: Fix pick_next_task() for RT,DLPeter Zijlstra1-3/+8
Pavan noticed that the following commit: 49ee576809d8 ("sched/core: Optimize pick_next_task() for idle_sched_class") ... broke RT,DL balancing by robbing them of the opportinty to do new-'idle' balancing when their last runnable task (on that runqueue) goes away. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: 49ee576809d8 ("sched/core: Optimize pick_next_task() for idle_sched_class") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/hotplug.h> We are going to split <linux/sched/hotplug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/hotplug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/loadavg.h> We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which will have to be picked up from a couple of .c files. Create a trivial placeholder <linux/sched/topology.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<uapi/linux/sched/types.h> We are going to move scheduler ABI details to <uapi/linux/sched/types.h>, which will be used from a number of .c files. Create empty placeholder header that maps to <linux/types.h>. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/clock.h> We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02rcu: Separate the RCU synchronization types and APIs into ↵Ingo Molnar1-0/+1
<linux/rcupdate_wait.h> So rcupdate.h is a pretty complex header, in particular it includes <linux/completion.h> which includes <linux/wait.h> - creating a dependency that includes <linux/wait.h> in <linux/sched.h>, which prevents the isolation of <linux/sched.h> from the derived <linux/wait.h> header. Solve part of the problem by decoupling rcupdate.h from completions: this can be done by separating out the rcu_synchronize types and APIs, and updating their usage sites. Since this is a mostly RCU-internal types this will not just simplify <linux/sched.h>'s dependencies, but will make all the hundreds of .c files that include rcupdate.h but not completions or wait.h build faster. ( For rcutiny this means that two dependent APIs have to be uninlined, but that shouldn't be much of a problem as they are rare variants. ) Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/core: Remove the tsk_nr_cpus_allowed() wrapperIngo Molnar1-1/+1
tsk_nr_cpus_allowed() too is a pretty pointless wrapper that is not used consistently and which makes the code both harder to read and longer as well. So remove it - this also shrinks <linux/sched.h> a bit. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/core: Remove the tsk_cpus_allowed() wrapperIngo Molnar1-10/+10
So the original intention of tsk_cpus_allowed() was to 'future-proof' the field - but it's pretty ineffectual at that, because half of the code uses ->cpus_allowed directly ... Also, the wrapper makes the code longer than the original expression! So just get rid of it. This also shrinks <linux/sched.h> a bit. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/core: Move the get_preempt_disable_ip() inline to sched/core.cIngo Molnar1-0/+9
It's defined in <linux/sched.h>, but nothing outside the scheduler uses it - so move it to the sched/core.c usage site. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02sched/core: Convert ___assert_task_state() link time assert to BUILD_BUG_ON()Ingo Molnar1-0/+3
The length of TASK_STATE_TO_CHAR_STR was still checked using the old link-time manual error method - convert it to BUILD_BUG_ON(). This has a couple of advantages: - it's more obvious what's going on - it reduces the size and complexity of <linux/sched.h> - BUILD_BUG_ON() will fail during compilation, with a clearer error message than the link time assert. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-28Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-12/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two rq-clock warnings related fixes, plus a cgroups related crash fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/cgroup: Move sched_online_group() back into css_online() to fix crash sched/fair: Update rq clock before changing a task's CPU affinity sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume)
2017-02-27mm: add new mmgrab() helperVegard Nossum1-2/+2
Apart from adding the helper function itself, the rest of the kernel is converted mechanically using: git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/' git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/' This is needed for a later patch that hooks into the helper, but might be a worthwhile cleanup on its own. (Michal Hocko provided most of the kerneldoc comment.) Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24sched/cgroup: Move sched_online_group() back into css_online() to fix crashKonstantin Khlebnikov1-2/+12
Commit: 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init") .. moved sched_online_group() from css_online() to css_alloc(). It exposes half-baked task group into global lists before initializing generic cgroup stuff. LTP testcase (third in cgroup_regression_test) written for testing similar race in kernels 2.6.26-2.6.28 easily triggers this oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: kernfs_path_from_node_locked+0x260/0x320 CPU: 1 PID: 30346 Comm: cat Not tainted 4.10.0-rc5-test #4 Call Trace: ? kernfs_path_from_node+0x4f/0x60 kernfs_path_from_node+0x3e/0x60 print_rt_rq+0x44/0x2b0 print_rt_stats+0x7a/0xd0 print_cpu+0x2fc/0xe80 ? __might_sleep+0x4a/0x80 sched_debug_show+0x17/0x30 seq_read+0xf2/0x3b0 proc_reg_read+0x42/0x70 __vfs_read+0x28/0x130 ? security_file_permission+0x9b/0xc0 ? rw_verify_area+0x4e/0xb0 vfs_read+0xa5/0x170 SyS_read+0x46/0xa0 entry_SYSCALL_64_fastpath+0x1e/0xad Here the task group is already linked into the global RCU-protected 'task_groups' list, but the css->cgroup pointer is still NULL. This patch reverts this chunk and moves online back to css_online(). Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 2f5177f0fd7e ("sched/cgroup: Fix/cleanup cgroup teardown/init") Link: http://lkml.kernel.org/r/148655324740.424917.5302984537258726349.stgit@buzz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-24sched/fair: Update rq clock before changing a task's CPU affinityWanpeng Li1-0/+1
This is triggered during boot when CONFIG_SCHED_DEBUG is enabled: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 81 at kernel/sched/sched.h:812 set_next_entity+0x11d/0x380 rq->clock_update_flags < RQCF_ACT_SKIP CPU: 6 PID: 81 Comm: torture_shuffle Not tainted 4.10.0+ #1 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016 Call Trace: dump_stack+0x85/0xc2 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 set_next_entity+0x11d/0x380 set_curr_task_fair+0x2b/0x60 do_set_cpus_allowed+0x139/0x180 __set_cpus_allowed_ptr+0x113/0x260 set_cpus_allowed_ptr+0x10/0x20 torture_shuffle+0xfd/0x180 kthread+0x10f/0x150 ? torture_shutdown_init+0x60/0x60 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x31/0x40 ---[ end trace dd94d92344cea9c6 ]--- The task is running && !queued, so there is no rq clock update before calling set_curr_task(). This patch fixes it by updating rq clock after holding rq->lock/pi_lock just as what other dequeue + put_prev + enqueue + set_curr story does. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1487749975-5994-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-24sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume)Peter Zijlstra1-10/+4
The hotplug code still triggers the warning about using a stale rq->clock value. Fix things up to actually run update_rq_clock() in a place where we record the 'UPDATED' flag, and then modify the annotation to retain this flag over the rq->lock fiddling that happens as a result of actually migrating all the tasks elsewhere. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Mike Galbraith <efault@gmx.de> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <zwisler@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4d25b35ea372 ("sched/fair: Restore previous rq_flags when migrating tasks in hotplug") Link: http://lkml.kernel.org/r/20170202155506.GX6515@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-21sched/core: Fix build paravirt build on arm and arm64Mark Brown1-0/+3
Commit 004172bdad64 ("sched/core: Remove unnecessary #include headers") removed the inclusion of asm/paravirt.h which is used to get declarations of paravirt_steal_rq_enabled and paravirt_steal_clock. It is implicitly included on x86 but not on arm and arm64 breaking the build if paravirtualization is used. Since things from that header are used directly fix the build by putting the direct inclusion back. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-10sched/core: Remove unlikely() annotation from sched_move_task()Steven Rostedt (VMware)1-2/+2
The check for 'running' in sched_move_task() has an unlikely() around it. That is, it is unlikely that the task being moved is running. That use to be true. But with a couple of recent updates, it is now likely that the task will be running. The first change came from ea86cb4b7621 ("sched/cgroup: Fix cpu_cgroup_fork() handling") that moved around the use case of sched_move_task() in do_fork() where the call is now done after the task is woken (hence it is running). The second change came from 8e5bfa8c1f84 ("sched/autogroup: Do not use autogroup->tg in zombie threads") where sched_move_task() is called by the exit path, by the task that is exiting. Hence it too is running. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/20170206110426.27ca6426@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/topology: Split out scheduler topology code from core.c into topology.cIngo Molnar1-1656/+3
Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/core: Remove unnecessary #include headersIngo Molnar1-49/+10
Over the years sched/core.c accumulated over 50 #include lines, 40 of which are superfluous. (!) Removing them decreases the preprocessed .c file (.i) size noticeably: triton:~/tip> wc -l kernel/sched/core.i Before: 76387 kernel/sched/core.i After: 75896 kernel/sched/core.i Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/rq_clock: Consolidate the ordering of the rq_clock methodsIngo Molnar1-75/+78
update_rq_clock_task() and update_rq_clock() we unnecessarily spread across core.c, requiring an extra prototype line. Move them next to each other and in the proper order. Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07sched/core: Clean up commentsIngo Molnar1-176/+182
Refresh the comments in the core scheduler code: - Capitalize sentences consistently - Capitalize 'CPU' consistently - ... and other small details. Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in ↵Shile Zhang1-2/+3
milliseconds We added the 'sched_rr_timeslice_ms' SCHED_RR tuning knob in this commit: ce0dbbbb30ae ("sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice") ... which name suggests to users that it's in milliseconds, while in reality it's being set in milliseconds but the result is shown in jiffies. This is obviously confusing when HZ is not 1000, it makes it appear like the value set failed, such as HZ=100: root# echo 100 > /proc/sys/kernel/sched_rr_timeslice_ms root# cat /proc/sys/kernel/sched_rr_timeslice_ms 10 Fix this to be milliseconds all around. Signed-off-by: Shile Zhang <shile.zhang@nokia.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1485612049-20923-1-git-send-email-shile.zhang@nokia.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30sched/core: Fix &rd->cpudl memory leakMathieu Poirier1-1/+3
While in the process of initialising a root domain, if function cpupri_init() fails the memory allocated in cpudl_init() is not reclaimed. Adding a new goto target to cleanup the previous initialistion of the root_domain's dl_bw structure reclaims said memory. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1485292295-21298-2-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30sched/core: Fix &rd->rto_mask memory leakMathieu Poirier1-1/+1
If function cpudl_init() fails the memory allocated for &rd->rto_mask needs to be freed, something this patch is addressing. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1485292295-21298-1-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30sched/fair: Restore previous rq_flags when migrating tasks in hotplugMatt Fleming1-1/+9
__migrate_task() can return with a different runqueue locked than the one we passed as an argument. So that we can repin the lock in migrate_tasks() (and keep the update_rq_clock() bit) we need to restore the old rq_flags before repinning. Note that it wouldn't be correct to change move_queued_task() to repin because of the change of runqueue and the fact that having an up-to-date clock on the initial rq doesn't mean the new rq has one too. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30sched/core: Add missing update_rq_clock() call in sched_move_task()Peter Zijlstra1-0/+1
Bug was noticed via this warning: WARNING: CPU: 6 PID: 1 at kernel/sched/sched.h:804 detach_task_cfs_rq+0x8e8/0xb80 rq->clock_update_flags < RQCF_ACT_SKIP Modules linked in: CPU: 6 PID: 1 Comm: systemd Not tainted 4.10.0-rc5-00140-g0874170baf55-dirty #1 Hardware name: Supermicro SYS-4048B-TRFT/X10QBi, BIOS 1.0 04/11/2014 Call Trace: dump_stack+0x4d/0x65 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 detach_task_cfs_rq+0x8e8/0xb80 ? allocate_cgrp_cset_links+0x59/0x80 task_change_group_fair+0x27/0x150 sched_change_group+0x48/0xf0 sched_move_task+0x53/0x150 cpu_cgroup_attach+0x36/0x70 cgroup_taskset_migrate+0x175/0x300 cgroup_migrate+0xab/0xd0 cgroup_attach_task+0xf0/0x190 __cgroup_procs_write+0x1ed/0x2f0 cgroup_procs_write+0x14/0x20 cgroup_file_write+0x3f/0x100 kernfs_fop_write+0x104/0x180 __vfs_write+0x37/0x140 vfs_write+0xb8/0x1b0 SyS_write+0x55/0xc0 do_syscall_64+0x61/0x170 entry_SYSCALL64_slow_path+0x25/0x25 Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30sched/core: Optimize pick_next_task() for idle_sched_classPeter Zijlstra1-3/+2
Steve noticed that when we switch from IDLE to SCHED_OTHER we fail to take the shortcut, even though all runnable tasks are of the fair class, because prev->sched_class != &fair_sched_class. Since I reworked the put_prev_task() stuff, we don't really care about prev->class here, so removing that condition will allow this case. This increases the likely case from 78% to 98% correct for Steve's workload. Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170119174408.GN6485@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/core: Separate out io_schedule_prepare() and io_schedule_finish()Tejun Heo1-5/+28
Now that IO schedule accounting is done inside __schedule(), io_schedule() can be split into three steps - prep, schedule, and finish - where the schedule part doesn't need any special annotation. This allows marking a sleep as iowait by simply wrapping an existing blocking function with io_schedule_prepare() and io_schedule_finish(). Because task_struct->in_iowait is single bit, the caller of io_schedule_prepare() needs to record and the pass its state to io_schedule_finish() to be safe regarding nesting. While this isn't the prettiest, these functions are mostly gonna be used by core functions and we don't want to use more space for ->in_iowait. While at it, as it's simple to do now, reimplement io_schedule() without unnecessarily going through io_schedule_timeout(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adilger.kernel@dilger.ca Cc: jack@suse.com Cc: kernel-team@fb.com Cc: mingbo@fb.com Cc: tytso@mit.edu Link: http://lkml.kernel.org/r/1477673892-28940-3-git-send-email-tj@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/core: move IO scheduling accounting from io_schedule_timeout() into ↵Tejun Heo1-7/+61
scheduler For an interface to support blocking for IOs, it must call io_schedule() instead of schedule(). This makes it tedious to add IO blocking to existing interfaces as the switching between schedule() and io_schedule() is often buried deep. As we already have a way to mark the task as IO scheduling, this can be made easier by separating out io_schedule() into multiple steps so that IO schedule preparation can be performed before invoking a blocking interface and the actual accounting happens inside the scheduler. io_schedule_timeout() does the following three things prior to calling schedule_timeout(). 1. Mark the task as scheduling for IO. 2. Flush out plugged IOs. 3. Account the IO scheduling. done close to the actual scheduling. This patch moves #3 into the scheduler so that later patches can separate out preparation and finish steps from io_schedule(). Patch-originally-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adilger.kernel@dilger.ca Cc: akpm@linux-foundation.org Cc: axboe@kernel.dk Cc: jack@suse.com Cc: kernel-team@fb.com Cc: mingbo@fb.com Cc: tytso@mit.edu Link: http://lkml.kernel.org/r/20161207204841.GA22296@htj.duckdns.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/clock: Delay switching sched_clock to stablePeter Zijlstra1-0/+4
Currently we switch to the stable sched_clock if we guess the TSC is usable, and then switch back to the unstable path if it turns out TSC isn't stable during SMP bringup after all. Delay switching to the stable path until after SMP bringup is complete. This way we'll avoid switching during the time we detect the worst of the TSC offences. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/core: Add debugging code to catch missing update_rq_clock() callsMatt Fleming1-4/+7
There's no diagnostic checks for figuring out when we've accidentally missed update_rq_clock() calls. Let's add some by piggybacking on the rq_*pin_lock() wrappers. The idea behind the diagnostic checks is that upon pining rq lock the rq clock should be updated, via update_rq_clock(), before anybody reads the clock with rq_clock() or rq_clock_task(). The exception to this rule is when updates have explicitly been disabled with the rq_clock_skip_update() optimisation. There are some functions that only unpin the rq lock in order to grab some other lock and avoid deadlock. In that case we don't need to update the clock again and the previous diagnostic state can be carried over in rq_repin_lock() by saving the state in the rq_flags context. Since this patch adds a new clock update flag and some already exist in rq::clock_skip_update, that field has now been renamed. An attempt has been made to keep the flag manipulation code small and fast since it's used in the heart of the __schedule() fast path. For the !CONFIG_SCHED_DEBUG case the only object code change (other than addresses) is the following change to reset RQCF_ACT_SKIP inside of __schedule(), - c7 83 38 09 00 00 00 movl $0x0,0x938(%rbx) - 00 00 00 + 83 a3 38 09 00 00 fc andl $0xfffffffc,0x938(%rbx) Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@unitn.it> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yuyang Du <yuyang.du@intel.com> Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/core: Add missing update_rq_clock() call in set_user_nice()Peter Zijlstra1-0/+2
Address this rq-clock update bug: WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity() rq->clock_update_flags < RQCF_ACT_SKIP Call Trace: dump_stack() __warn() warn_slowpath_fmt() set_next_entity() ? _raw_spin_lock() set_curr_task_fair() set_user_nice.part.85() set_user_nice() create_worker() worker_thread() kthread() ret_from_fork() Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()Peter Zijlstra1-0/+3
Instead of adding the update_rq_clock() all the way at the bottom of the callstack, add one at the top, this to aid later effort to minimize update_rq_lock() calls. WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq() rq->clock_update_flags < RQCF_ACT_SKIP Call Trace: dump_stack() __warn() warn_slowpath_fmt() detach_task_cfs_rq() switched_from_fair() __sched_setscheduler() _sched_setscheduler() sched_set_stop_task() cpu_stop_create() __smpboot_create_thread.part.2() smpboot_register_percpu_thread_cpumask() cpu_stop_init() do_one_initcall() ? print_cpu_info() kernel_init_freeable() ? rest_init() kernel_init() ret_from_fork() Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>