diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 12:35:04 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-21 12:35:04 -0800 |
commit | 657bd90c93146a929c69cd43addf2804eb70c926 (patch) | |
tree | e643825c87070f83df58d37d4daf0417eb17e8c2 /arch | |
parent | 7b15c27e2f7b6d114770c2922b2c49d2e8f3867c (diff) | |
parent | c5e6fc08feb2b88dc5dac2f3c817e1c2a4cafda4 (diff) | |
download | linux-657bd90c93146a929c69cd43addf2804eb70c926.tar.bz2 |
Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler updates:
- Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
preempt=none/voluntary/full boot options (default: full), to allow
distros to build a PREEMPT kernel but fall back to close to
PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
a boot time selection.
There's also the /debug/sched_debug switch to do this runtime.
This feature is implemented via runtime patching (a new variant of
static calls).
The scope of the runtime patching can be best reviewed by looking
at the sched_dynamic_update() function in kernel/sched/core.c.
( Note that the dynamic none/voluntary mode isn't 100% identical,
for example preempt-RCU is available in all cases, plus the
preempt count is maintained in all models, which has runtime
overhead even with the code patching. )
The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
majority of distributions, are supposed to be unaffected.
- Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
was found via rcutorture triggering a hang. The bug is that
rcu_idle_enter() may wake up a NOCB kthread, but this happens after
the last generic need_resched() check. Some cpuidle drivers fix it
by chance but many others don't.
In true 2020 fashion the original bug fix has grown into a 5-patch
scheduler/RCU fix series plus another 16 RCU patches to address the
underlying issue of missed preemption events. These are the initial
fixes that should fix current incarnations of the bug.
- Clean up rbtree usage in the scheduler, by providing & using the
following consistent set of rbtree APIs:
partial-order; less() based:
- rb_add(): add a new entry to the rbtree
- rb_add_cached(): like rb_add(), but for a rb_root_cached
total-order; cmp() based:
- rb_find(): find an entry in an rbtree
- rb_find_add(): find an entry, and add if not found
- rb_find_first(): find the first (leftmost) matching entry
- rb_next_match(): continue from rb_find_first()
- rb_for_each(): iterate a sub-tree using the previous two
- Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
single pass. This is a 4-commit series where each commit improves
one aspect of the idle sibling scan logic.
- Improve the cpufreq cooling driver by getting the effective CPU
utilization metrics from the scheduler
- Improve the fair scheduler's active load-balancing logic by
reducing the number of active LB attempts & lengthen the
load-balancing interval. This improves stress-ng mmapfork
performance.
- Fix CFS's estimated utilization (util_est) calculation bug that can
result in too high utilization values
Misc updates & fixes:
- Fix the HRTICK reprogramming & optimization feature
- Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
- Reduce dl_add_task_root_domain() overhead
- Fix uprobes refcount bug
- Process pending softirqs in flush_smp_call_function_from_idle()
- Clean up task priority related defines, remove *USER_*PRIO and
USER_PRIO()
- Simplify the sched_init_numa() deduplication sort
- Documentation updates
- Fix EAS bug in update_misfit_status(), which degraded the quality
of energy-balancing
- Smaller cleanups"
* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
sched,x86: Allow !PREEMPT_DYNAMIC
entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
entry: Explicitly flush pending rcuog wakeup before last rescheduling point
rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
rcu/nocb: Perform deferred wake up before last idle's need_resched() check
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
sched/features: Distinguish between NORMAL and DEADLINE hrtick
sched/features: Fix hrtick reprogramming
sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
uprobes: (Re)add missing get_uprobe() in __find_uprobe()
smp: Process pending softirqs in flush_smp_call_function_from_idle()
sched: Harden PREEMPT_DYNAMIC
static_call: Allow module use without exposing static_call_key
sched: Add /debug/sched_preempt
preempt/dynamic: Support dynamic preempt with preempt= boot option
preempt/dynamic: Provide irqentry_exit_cond_resched() static call
preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
preempt/dynamic: Provide cond_resched() and might_resched() static calls
preempt: Introduce CONFIG_PREEMPT_DYNAMIC
static_call: Provide DEFINE_STATIC_CALL_RET0()
...
Diffstat (limited to 'arch')
-rw-r--r-- | arch/Kconfig | 9 | ||||
-rw-r--r-- | arch/powerpc/platforms/cell/spufs/sched.c | 2 | ||||
-rw-r--r-- | arch/x86/Kconfig | 1 | ||||
-rw-r--r-- | arch/x86/include/asm/preempt.h | 48 | ||||
-rw-r--r-- | arch/x86/include/asm/static_call.h | 7 | ||||
-rw-r--r-- | arch/x86/kernel/static_call.c | 17 | ||||
-rw-r--r-- | arch/x86/kvm/x86.c | 1 |
7 files changed, 73 insertions, 12 deletions
diff --git a/arch/Kconfig b/arch/Kconfig index 87608c2fa027..4790a5f23d9f 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1058,6 +1058,15 @@ config HAVE_STATIC_CALL_INLINE bool depends on HAVE_STATIC_CALL +config HAVE_PREEMPT_DYNAMIC + bool + depends on HAVE_STATIC_CALL + depends on GENERIC_ENTRY + help + Select this if the architecture support boot time preempt setting + on top of static calls. It is strongly advised to support inline + static call to avoid any overhead. + config ARCH_WANT_LD_ORPHAN_WARN bool help diff --git a/arch/powerpc/platforms/cell/spufs/sched.c b/arch/powerpc/platforms/cell/spufs/sched.c index 9d06fffb1526..369206489895 100644 --- a/arch/powerpc/platforms/cell/spufs/sched.c +++ b/arch/powerpc/platforms/cell/spufs/sched.c @@ -72,7 +72,7 @@ static struct timer_list spuloadavg_timer; #define DEF_SPU_TIMESLICE (100 * HZ / (1000 * SPUSCHED_TICK)) #define SCALE_PRIO(x, prio) \ - max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_SPU_TIMESLICE) + max(x * (MAX_PRIO - prio) / (NICE_WIDTH / 2), MIN_SPU_TIMESLICE) /* * scale user-nice values [ -20 ... 0 ... 19 ] to time slice values: diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7b934a591df2..595193bc2d31 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -224,6 +224,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select HAVE_STATIC_CALL select HAVE_STATIC_CALL_INLINE if HAVE_STACK_VALIDATION + select HAVE_PREEMPT_DYNAMIC select HAVE_RSEQ select HAVE_SYSCALL_TRACEPOINTS select HAVE_UNSTABLE_SCHED_CLOCK diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h index 69485ca13665..f8cb8af4de5c 100644 --- a/arch/x86/include/asm/preempt.h +++ b/arch/x86/include/asm/preempt.h @@ -5,6 +5,7 @@ #include <asm/rmwcc.h> #include <asm/percpu.h> #include <linux/thread_info.h> +#include <linux/static_call_types.h> DECLARE_PER_CPU(int, __preempt_count); @@ -103,16 +104,45 @@ static __always_inline bool should_resched(int preempt_offset) } #ifdef CONFIG_PREEMPTION - extern asmlinkage void preempt_schedule_thunk(void); -# define __preempt_schedule() \ - asm volatile ("call preempt_schedule_thunk" : ASM_CALL_CONSTRAINT) - extern asmlinkage void preempt_schedule(void); - extern asmlinkage void preempt_schedule_notrace_thunk(void); -# define __preempt_schedule_notrace() \ - asm volatile ("call preempt_schedule_notrace_thunk" : ASM_CALL_CONSTRAINT) +extern asmlinkage void preempt_schedule(void); +extern asmlinkage void preempt_schedule_thunk(void); - extern asmlinkage void preempt_schedule_notrace(void); -#endif +#define __preempt_schedule_func preempt_schedule_thunk + +extern asmlinkage void preempt_schedule_notrace(void); +extern asmlinkage void preempt_schedule_notrace_thunk(void); + +#define __preempt_schedule_notrace_func preempt_schedule_notrace_thunk + +#ifdef CONFIG_PREEMPT_DYNAMIC + +DECLARE_STATIC_CALL(preempt_schedule, __preempt_schedule_func); + +#define __preempt_schedule() \ +do { \ + __STATIC_CALL_MOD_ADDRESSABLE(preempt_schedule); \ + asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule) : ASM_CALL_CONSTRAINT); \ +} while (0) + +DECLARE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func); + +#define __preempt_schedule_notrace() \ +do { \ + __STATIC_CALL_MOD_ADDRESSABLE(preempt_schedule_notrace); \ + asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule_notrace) : ASM_CALL_CONSTRAINT); \ +} while (0) + +#else /* PREEMPT_DYNAMIC */ + +#define __preempt_schedule() \ + asm volatile ("call preempt_schedule_thunk" : ASM_CALL_CONSTRAINT); + +#define __preempt_schedule_notrace() \ + asm volatile ("call preempt_schedule_notrace_thunk" : ASM_CALL_CONSTRAINT); + +#endif /* PREEMPT_DYNAMIC */ + +#endif /* PREEMPTION */ #endif /* __ASM_PREEMPT_H */ diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h index c37f11999d0c..cbb67b6030f9 100644 --- a/arch/x86/include/asm/static_call.h +++ b/arch/x86/include/asm/static_call.h @@ -37,4 +37,11 @@ #define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) \ __ARCH_DEFINE_STATIC_CALL_TRAMP(name, "ret; nop; nop; nop; nop") + +#define ARCH_ADD_TRAMP_KEY(name) \ + asm(".pushsection .static_call_tramp_key, \"a\" \n" \ + ".long " STATIC_CALL_TRAMP_STR(name) " - . \n" \ + ".long " STATIC_CALL_KEY_STR(name) " - . \n" \ + ".popsection \n") + #endif /* _ASM_STATIC_CALL_H */ diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c index ca9a380d9c0b..9442c4136c38 100644 --- a/arch/x86/kernel/static_call.c +++ b/arch/x86/kernel/static_call.c @@ -11,14 +11,26 @@ enum insn_type { RET = 3, /* tramp / site cond-tail-call */ }; +/* + * data16 data16 xorq %rax, %rax - a single 5 byte instruction that clears %rax + * The REX.W cancels the effect of any data16. + */ +static const u8 xor5rax[] = { 0x66, 0x66, 0x48, 0x31, 0xc0 }; + static void __ref __static_call_transform(void *insn, enum insn_type type, void *func) { + const void *emulate = NULL; int size = CALL_INSN_SIZE; const void *code; switch (type) { case CALL: code = text_gen_insn(CALL_INSN_OPCODE, insn, func); + if (func == &__static_call_return0) { + emulate = code; + code = &xor5rax; + } + break; case NOP: @@ -41,7 +53,7 @@ static void __ref __static_call_transform(void *insn, enum insn_type type, void if (unlikely(system_state == SYSTEM_BOOTING)) return text_poke_early(insn, code, size); - text_poke_bp(insn, code, size, NULL); + text_poke_bp(insn, code, size, emulate); } static void __static_call_validate(void *insn, bool tail) @@ -54,7 +66,8 @@ static void __static_call_validate(void *insn, bool tail) return; } else { if (opcode == CALL_INSN_OPCODE || - !memcmp(insn, ideal_nops[NOP_ATOMIC5], 5)) + !memcmp(insn, ideal_nops[NOP_ATOMIC5], 5) || + !memcmp(insn, xor5rax, 5)) return; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1b404e4d7dd8..b967c1c774a1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1782,6 +1782,7 @@ EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr); bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu) { + xfer_to_guest_mode_prepare(); return vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) || xfer_to_guest_mode_work_pending(); } |