author		Peter Zijlstra <peterz@infradead.org>	2019-07-29 16:05:15 +0200
committer	Peter Zijlstra <peterz@infradead.org>	2019-08-08 09:09:30 +0200
commit		139d025cda1da5484e7287b35c019fe1dcf9b650
tree		9cc8a5329e99b63f61711fceb0e4a53c581c2139 /kernel/sched
parent		130d9c331bc59a8733b47c58ef197a2b1fa3ed43
sched: Clean up active_mm reference counting
The current active_mm reference counting is confusing and sub-optimal.
Rewrite the code to explicitly consider the 4 separate cases:
    user -> user

	When switching between two user tasks, all we need to consider
	is switch_mm().

    user -> kernel

	When switching from a user task to a kernel task (which
	doesn't have an associated mm) we retain the last mm in our
	active_mm. Increment a reference count on active_mm.

    kernel -> kernel

	When switching between kernel threads, all we need to do is
	pass along the active_mm reference.

    kernel -> user

	When switching between a kernel and user task, we must switch
	from the last active_mm to the next mm, hoping of course that
	these are the same. Decrement a reference on the active_mm.
The code keeps a different order, because as you'll note, both 'to
user' cases require switch_mm().
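To make the four cases concrete, here is a minimal userspace C sketch that mirrors that branch order and tracks the reference count. It is a toy model under assumed names (toy_mm, toy_task, toy_switch), not kernel code, and it takes the 'kernel -> user' drop immediately where the kernel defers it to finish_task_switch():

	#include <assert.h>
	#include <stdio.h>

	struct toy_mm {
		int mm_count;			/* models mm_struct::mm_count */
	};

	struct toy_task {
		struct toy_mm *mm;		/* NULL for kernel threads */
		struct toy_mm *active_mm;
	};

	static void toy_switch(struct toy_task *prev, struct toy_task *next)
	{
		if (!next->mm) {			/* to kernel */
			next->active_mm = prev->active_mm;
			if (prev->mm)			/* user -> kernel: grab */
				prev->active_mm->mm_count++;
			else				/* kernel -> kernel: transfer */
				prev->active_mm = NULL;
		} else {				/* to user: switch_mm() */
			if (!prev->mm) {		/* kernel -> user: drop */
				prev->active_mm->mm_count--;
				prev->active_mm = NULL;
			}
			/* user -> user: no refcount traffic at all */
		}
	}

	int main(void)
	{
		struct toy_mm mm = { .mm_count = 1 };	/* the user's own ref */
		struct toy_task user = { .mm = &mm, .active_mm = &mm };
		struct toy_task kt1 = { 0 }, kt2 = { 0 };

		toy_switch(&user, &kt1);	/* user -> kernel: 1 -> 2 */
		assert(mm.mm_count == 2);
		toy_switch(&kt1, &kt2);		/* kernel -> kernel: still 2 */
		assert(mm.mm_count == 2);
		toy_switch(&kt2, &user);	/* kernel -> user: 2 -> 1 */
		assert(mm.mm_count == 1);
		printf("final mm_count = %d\n", mm.mm_count);
		return 0;
	}

Note how the kernel -> kernel path only copies the pointer; that is the neutrality the next paragraph relies on.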
And where the old code would increment/decrement for the 'kernel ->
kernel' case, the new code observes this is a neutral operation and
avoids touching the reference count.
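A tiny, equally hypothetical C11 demo of that observation: a grab paired with a drop ends where it started, just like handing the pointer along, but costs two atomic read-modify-writes on a shared cacheline that the new code never issues:

	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int mm_count = 1;

	static int old_kernel_to_kernel(void)
	{
		atomic_fetch_add(&mm_count, 1);	/* mmgrab() in context_switch() */
		atomic_fetch_sub(&mm_count, 1);	/* mmdrop() in finish_task_switch() */
		return 2;			/* two atomic RMWs */
	}

	static int new_kernel_to_kernel(void)
	{
		/* next->active_mm = prev->active_mm; -- plain pointer copy */
		return 0;			/* zero atomic RMWs */
	}

	int main(void)
	{
		int old_ops = old_kernel_to_kernel();
		int new_ops = new_kernel_to_kernel();

		printf("mm_count = %d (unchanged either way)\n",
		       atomic_load(&mm_count));
		printf("atomic ops: old = %d, new = %d\n", old_ops, new_ops);
		return 0;
	}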
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: luto@kernel.org
Diffstat (limited to 'kernel/sched')
-rw-r--r--	kernel/sched/core.c | 49 ++++++++++++++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 46f3ca9e392a..b4a44bc84749 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3214,12 +3214,8 @@ static __always_inline struct rq *
 context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next, struct rq_flags *rf)
 {
-	struct mm_struct *mm, *oldmm;
-
 	prepare_task_switch(rq, prev, next);
 
-	mm = next->mm;
-	oldmm = prev->active_mm;
 	/*
 	 * For paravirt, this is coupled with an exit in switch_to to
 	 * combine the page table reload and the switch backend into
@@ -3228,22 +3224,37 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	arch_start_context_switch(prev);
 
 	/*
-	 * If mm is non-NULL, we pass through switch_mm(). If mm is
-	 * NULL, we will pass through mmdrop() in finish_task_switch().
-	 * Both of these contain the full memory barrier required by
-	 * membarrier after storing to rq->curr, before returning to
-	 * user-space.
+	 * kernel -> kernel	lazy + transfer active
+	 *   user -> kernel	lazy + mmgrab() active
+	 *
+	 * kernel ->   user	switch + mmdrop() active
+	 *   user ->   user	switch
 	 */
-	if (!mm) {
-		next->active_mm = oldmm;
-		mmgrab(oldmm);
-		enter_lazy_tlb(oldmm, next);
-	} else
-		switch_mm_irqs_off(oldmm, mm, next);
-
-	if (!prev->mm) {
-		prev->active_mm = NULL;
-		rq->prev_mm = oldmm;
+	if (!next->mm) {				// to kernel
+		enter_lazy_tlb(prev->active_mm, next);
+
+		next->active_mm = prev->active_mm;
+		if (prev->mm)				// from user
+			mmgrab(prev->active_mm);
+		else
+			prev->active_mm = NULL;
+	} else {					// to user
+		/*
+		 * sys_membarrier() requires an smp_mb() between setting
+		 * rq->curr and returning to userspace.
+		 *
+		 * The below provides this either through switch_mm(), or in
+		 * case 'prev->active_mm == next->mm' through
+		 * finish_task_switch()'s mmdrop().
+		 */
+
+		switch_mm_irqs_off(prev->active_mm, next->mm, next);
+
+		if (!prev->mm) {			// from kernel
+			/* will mmdrop() in finish_task_switch(). */
+			rq->prev_mm = prev->active_mm;
+			prev->active_mm = NULL;
+		}
 	}
 
 	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
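For context on the primitives the patch pairs up (not part of the change itself): mmgrab() and mmdrop() manage mm_struct::mm_count, roughly as sketched below. This is a simplified paraphrase of include/linux/sched/mm.h, not a verbatim copy:

	static inline void mmgrab(struct mm_struct *mm)
	{
		atomic_inc(&mm->mm_count);	/* pin the mm_struct itself */
	}

	static inline void mmdrop(struct mm_struct *mm)
	{
		/*
		 * The last reference frees the struct via __mmdrop(),
		 * work that context_switch() defers to finish_task_switch().
		 */
		if (unlikely(atomic_dec_and_test(&mm->mm_count)))
			__mmdrop(mm);
	}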