path: root/kernel/smp.c
Age  Commit message  (Author; files changed, lines -/+)
2022-10-10  Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux  (Linus Torvalds; 1 file, -2/+4)
Pull bitmap updates from Yury Norov:

 - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)

 - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)

   This series cleans that mess and adds a new config option, FORCE_NR_CPUS, that allows optimizing the cpumask subsystem if the number of CPUs is known at compile-time.

 - optimize find_bit() functions (me)

   Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros.

 - add find_nth_bit() (me)

   Adds find_nth_bit(), which is ~70 times faster than bitcounting with a for_each() loop:

       for_each_set_bit(bit, mask, size)
               if (n-- == 0)
                       return bit;

   Also adds bitmap_weight_and() to let people replace this pattern:

       tmp = bitmap_alloc(nbits);
       bitmap_and(tmp, map1, map2, nbits);
       weight = bitmap_weight(tmp, nbits);
       bitmap_free(tmp);

   with a single bitmap_weight_and() call.

 - repair cpumask_check() (me)

   After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it.

 - Add for_each_cpu_andnot() (Valentin Schneider)

   Extends the API with one more function and applies it in sched/core.

* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
  sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  cpumask: Introduce for_each_cpu_andnot()
  lib/find_bit: Introduce find_next_andnot_bit()
  cpumask: fix checking valid cpu range
  lib/bitmap: add tests for for_each() loops
  lib/find: optimize for_each() macros
  lib/bitmap: introduce for_each_set_bit_wrap() macro
  lib/find_bit: add find_next{,_and}_bit_wrap
  cpumask: switch for_each_cpu{,_not} to use for_each_bit()
  net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
  cpumask: add cpumask_nth_{,and,andnot}
  lib/bitmap: remove bitmap_ord_to_pos
  lib/bitmap: add tests for find_nth_bit()
  lib: add find_nth{,_and,_andnot}_bit()
  lib/bitmap: add bitmap_weight_and()
  lib/bitmap: don't call __bitmap_weight() in kernel code
  tools: sync find_bit() implementation
  lib/find_bit: optimize find_next_bit() functions
  lib/find_bit: create find_first_zero_bit_le()
  ...

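For illustration, a minimal sketch of how the two headline additions are meant to be used; the mask names, map names and do_balance_work() are hypothetical:

    int cpu;
    unsigned int w;

    /* Walk CPUs that are in 'housekeeping_mask' but not in 'busy_mask',
     * without allocating a temporary cpumask for the andnot result. */
    for_each_cpu_andnot(cpu, housekeeping_mask, busy_mask)
            do_balance_work(cpu);

    /* Weight of the intersection, without a temporary bitmap. */
    w = bitmap_weight_and(map1, map2, nbits);
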
2022-09-20  lib/cpumask: add FORCE_NR_CPUS config option  (Yury Norov; 1 file, -1/+1)
The size of cpumasks is hard-limited by the compile-time parameter NR_CPUS, but the actual number of CPUs is determined at boot time, when the kernel parses the ACPI/DT tables, and is stored in nr_cpu_ids. In many practical cases the number of CPUs for a target is known at compile time and can be provided via NR_CPUS. In that case the compiler can be instructed to rely on NR_CPUS as the actual number of CPUs rather than an upper limit, which allows many cpumask routines to be optimized and shrinks the kernel image significantly. This patch adds the FORCE_NR_CPUS option to teach the compiler to rely on NR_CPUS and enable the corresponding optimizations. If FORCE_NR_CPUS=y, the kernel does not set nr_cpu_ids at boot; it only checks that the actual number of possible CPUs equals NR_CPUS, and WARNs if that does not hold. The new option is especially useful in embedded applications, where kernel configurations are unique to each SoC, the number of CPUs is constant and well known, and memory limits are typically tighter. For my 4-CPU ARM64 build with NR_CPUS=4, FORCE_NR_CPUS=y saves 46KB:

  add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300)

Signed-off-by: Yury Norov <yury.norov@gmail.com>
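Conceptually, the optimization reduces to making nr_cpu_ids a compile-time constant; a sketch of the idea (not the verbatim kernel code):

    #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
    #define nr_cpu_ids ((unsigned int)NR_CPUS)  /* constant-folded everywhere */
    #else
    extern unsigned int nr_cpu_ids;             /* determined at boot */
    #endif
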
2022-09-19  smp: add set_nr_cpu_ids()  (Yury Norov; 1 file, -2/+2)
In preparation to support compile-time nr_cpu_ids, add a setter for the variable. This is a no-op for all arches. Signed-off-by: Yury Norov <yury.norov@gmail.com>
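A sketch of what the setter looks like once the compile-time case from the entry above exists (shown for context, not verbatim):

    static inline void set_nr_cpu_ids(unsigned int nr)
    {
    #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
            WARN_ON(nr != nr_cpu_ids);  /* nr_cpu_ids is a constant here */
    #else
            nr_cpu_ids = nr;
    #endif
    }
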
2022-09-19  smp: don't declare nr_cpu_ids if NR_CPUS == 1  (Yury Norov; 1 file, -0/+2)
SMP and NR_CPUS are independent options, hence nr_cpu_ids may be declared even if NR_CPUS == 1, which is useless. Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-08-31  sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()  (Zhen Lei; 1 file, -2/+1)
The trigger_all_cpu_backtrace() function attempts to send an NMI to the target CPU, which usually provides much better stack traces than the dump_cpu_task() function's approach of dumping that stack from some other CPU. So much so that most calls to dump_cpu_task() only happen after a call to trigger_all_cpu_backtrace() has failed. And the exception to this rule really should attempt to use trigger_all_cpu_backtrace() first. Therefore, move the trigger_all_cpu_backtrace() invocation into dump_cpu_task(). Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com>
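The resulting flow in dump_cpu_task() is roughly the following (a simplified sketch, not the exact patch):

    void dump_cpu_task(int cpu)
    {
            /* Prefer an NMI backtrace taken on the target CPU itself. */
            if (trigger_single_cpu_backtrace(cpu))
                    return;

            /* Fall back to dumping the remote stack from this CPU. */
            pr_info("Task dump for CPU %d:\n", cpu);
            sched_show_task(cpu_curr(cpu));
    }
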
2022-07-19  locking/csd_lock: Change csdlock_debug from early_param to __setup  (Chen Zhongjin; 1 file, -2/+2)
The csdlock_debug kernel-boot parameter is parsed by the early_param() function csdlock_debug(). If set, csdlock_debug() invokes static_branch_enable() to enable the csd_lock_wait feature, which triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and CONFIG_SPARSEMEM_VMEMMAP=n. With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in static_key_enable() and returns NULL, resulting in a NULL dereference because mem_section is initialized only later in sparse_init(). This is also a problem for powerpc because early_param() functions are invoked earlier than jump_label_init(), also resulting in static_key_enable() failures. These failures cause the warning "static key 'xxx' used before call to jump_label_init()". Thus, early_param is too early for csd_lock_wait to run static_branch_enable(), so change it to __setup to fix this. Fixes: 8d0968cc6b8f ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging") Cc: stable@vger.kernel.org Reported-by: Chen jingwen <chenjingwen6@huawei.com> Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
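The mechanical part of the fix is essentially the registration macro (a sketch; the exact option string may differ):

    /* before: parsed before jump_label_init()/sparse_init(), too early
     * for static_branch_enable() */
    early_param("csdlock_debug", csdlock_debug);

    /* after: parsed late enough that static keys are initialized */
    __setup("csdlock_debug=", csdlock_debug);
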
2022-05-24  Merge tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -8/+24)
Pull scheduler updates from Ingo Molnar:

 - Updates to scheduler metrics:
     - PELT fixes & enhancements
     - PSI fixes & enhancements
     - Refactor cpu_util_without()

 - Updates to instrumentation/debugging:
     - Remove sched_trace_*() helper functions - can be done via debug info
     - Fix double update_rq_clock() warnings

 - Introduce & use "preemption model accessors" to simplify some of the Kconfig complexity.

 - Make softirq handling RT-safe.

 - Misc smaller fixes & cleanups.

* tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  topology: Remove unused cpu_cluster_mask()
  sched: Reverse sched_class layout
  sched/deadline: Remove superfluous rq clock update in push_dl_task()
  sched/core: Avoid obvious double update_rq_clock warning
  smp: Make softirq handling RT safe in flush_smp_call_function_queue()
  smp: Rename flush_smp_call_function_from_idle()
  sched: Fix missing prototype warnings
  sched/fair: Remove cfs_rq_tg_path()
  sched/fair: Remove sched_trace_*() helper functions
  sched/fair: Refactor cpu_util_without()
  sched/fair: Revise comment about lb decision matrix
  sched/psi: report zeroes for CPU full at the system level
  sched/fair: Delete useless condition in tg_unthrottle_up()
  sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq
  sched/fair: Move calculate of avg_load to a better location
  mailmap: Update my email address to @redhat.com
  MAINTAINERS: Add myself as scheduler topology reviewer
  psi: Fix trigger being fired unexpectedly at initial
  ftrace: Use preemption model accessors for trace header printout
  kcsan: Use preemption model accessors

2022-05-23  Merge tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu  (Linus Torvalds; 1 file, -2/+5)
Pull RCU update from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offloading updates, mainly simplifications

 - RCU-tasks updates, including some -rt fixups, handling of systems with sparse CPU numbering, and a fix for a boot-time race-condition failure

 - Put SRCU on a memory diet in order to reduce the size of the srcu_struct structure

 - Torture-test updates fixing some bugs in tests and closing some testing holes

 - Torture-test updates for the RCU tasks flavors, most notably ensuring that building rcutorture and friends does not change the RCU-tasks-related Kconfig options

 - Torture-test scripting updates

 - Expedited grace-period updates, most notably providing milliseconds-scale (not all that) soft real-time response from synchronize_rcu_expedited(). This is also the first time in almost 30 years of RCU that someone other than me has pushed for a reduction in the RCU CPU stall-warning timeout, in this case by more than three orders of magnitude from 21 seconds to 20 milliseconds. This tighter timeout applies only to expedited grace periods

* tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
  rcu: Move expedited grace period (GP) work to RT kthread_worker
  rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
  srcu: Drop needless initialization of sdp in srcu_gp_start()
  srcu: Prevent expedited GPs and blocking readers from consuming CPU
  srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
  srcu: Automatically determine size-transition strategy at boot
  rcutorture: Make torture.sh allow for --kasan
  rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU
  rcutorture: Make kvm.sh allow more memory for --kasan runs
  torture: Save "make allmodconfig" .config file
  scftorture: Remove extraneous "scf" from per_version_boot_params
  rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC
  torture: Enable CSD-lock stall reports for scftorture
  torture: Skip vmlinux check for kvm-again.sh runs
  scftorture: Adjust for TASKS_RCU Kconfig option being selected
  rcuscale: Allow rcuscale without RCU Tasks Rude/Trace
  rcuscale: Allow rcuscale without RCU Tasks
  refscale: Allow refscale without RCU Tasks Rude/Trace
  refscale: Allow refscale without RCU Tasks
  rcutorture: Allow specifying per-scenario stat_interval
  ...

2022-05-06  Merge tag 'v5.18-rc5' into sched/core to pull in fixes & to resolve a conflict  (Ingo Molnar; 1 file, -1/+1)
 - sched/core is on a pretty old -rc1 base - refresh it to include recent fixes.
 - this also allows us to resolve a (trivial) .mailmap conflict

Conflicts:
	.mailmap

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-05-01  smp: Make softirq handling RT safe in flush_smp_call_function_queue()  (Sebastian Andrzej Siewior; 1 file, -1/+4)
flush_smp_call_function_queue() invokes do_softirq() which is not available on PREEMPT_RT. flush_smp_call_function_queue() is invoked from the idle task and the migration task with preemption or interrupts disabled. So RT kernels cannot process soft interrupts in that context as that has to acquire 'sleeping spinlocks' which is not possible with preemption or interrupts disabled and forbidden from the idle task anyway. The currently known SMP function call which raises a soft interrupt is in the block layer, but this functionality is not enabled on RT kernels due to latency and performance reasons. RT could wake up ksoftirqd unconditionally, but this wants to be avoided if there were soft interrupts pending already when this is invoked in the context of the migration task. The migration task might have preempted a threaded interrupt handler which raised a soft interrupt, but did not reach the local_bh_enable() to process it. The "running" ksoftirqd might prevent the handling in the interrupt thread context which is causing latency issues. Add a new function which handles this case explicitly for RT and falls back to do_softirq() on !RT kernels. In the RT case this warns when one of the flushed SMP function calls raised a soft interrupt so this can be investigated. [ tglx: Moved the RT part out of SMP code ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/YgKgL6aPj8aBES6G@linutronix.de Link: https://lore.kernel.org/r/20220413133024.356509586@linutronix.de
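A simplified sketch of the resulting flush path (names follow the flush_smp_call_function_queue() rename from this series; details may differ from the final code):

    void flush_smp_call_function_queue(void)
    {
            unsigned int was_pending;
            unsigned long flags;

            if (llist_empty(this_cpu_ptr(&call_single_queue)))
                    return;

            local_irq_save(flags);
            /* Remember softirqs that were already pending before the flush. */
            was_pending = local_softirq_pending();
            __flush_smp_call_function_queue(true);
            if (local_softirq_pending())
                    /* do_softirq() on !RT; on RT, warn and handle it safely. */
                    do_softirq_post_smp_call_flush(was_pending);
            local_irq_restore(flags);
    }
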
2022-05-01  smp: Rename flush_smp_call_function_from_idle()  (Thomas Gleixner; 1 file, -7/+20)
This is invoked from the stopper thread too, which is definitely not idle. Rename it to flush_smp_call_function_queue() and fixup the callers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220413133024.305001096@linutronix.de
2022-04-20  kernel/smp: Provide boot-time timeout for CSD lock diagnostics  (Paul E. McKenney; 1 file, -2/+5)
Debugging of problems involving insanely long-running SMI handlers proceeds better if the CSD-lock timeout can be adjusted. This commit therefore provides a new smp.csd_lock_timeout kernel boot parameter that specifies the timeout in milliseconds. The default remains at the previously hard-coded value of five seconds. [ paulmck: Apply feedback from Juergen Gross. ] Cc: Rik van Riel <riel@surriel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
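Usage is a single kernel boot parameter whose value is in milliseconds; for example, to raise the stall-report threshold from the default 5 seconds to 30 seconds:

    smp.csd_lock_timeout=30000
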
2022-04-13  smp: Fix offline cpu check in flush_smp_call_function_queue()  (Nadav Amit; 1 file, -1/+1)
The check in flush_smp_call_function_queue() for callbacks that are sent to offline CPUs currently checks whether the queue is empty. However, flush_smp_call_function_queue() has just deleted all the callbacks from the queue and moved all the entries into a local list. This check would only be positive if some callbacks were added in the short time after llist_del_all() was called. This does not seem to be the intention of this check. Change the check to look at the local list to which the entries were moved instead of the queue from which all the callbacks were just removed. Fixes: 8d056c48e4862 ("CPU hotplug, smp: flush any pending IPI callbacks before CPU offline") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220319072015.1495036-1-namit@vmware.com
2021-10-22  sched: Improve wake_up_all_idle_cpus() take #2  (Peter Zijlstra; 1 file, -7/+5)
As reported by syzbot and experienced by Pavel, using cpus_read_lock() in wake_up_all_idle_cpus() generates lock inversion (against mmap_sem and possibly others). Instead, shrink the preempt disable region by iterating all CPUs and checking the online status for each individual CPU while having preemption disabled. Fixes: 8850cb663b5c ("sched: Simplify wake_up_*idle*()") Reported-by: syzbot+d5b23b18d2f4feae8a67@syzkaller.appspotmail.com Reported-by: Pavel Machek <pavel@ucw.cz> Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Qian Cai <quic_qiancai@quicinc.com>
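The reworked walk looks roughly like this (a sketch): cpus_read_lock() is gone, and preemption is disabled only around each per-CPU online check and wakeup instead of around the whole loop:

    void wake_up_all_idle_cpus(void)
    {
            int cpu;

            for_each_possible_cpu(cpu) {
                    preempt_disable();
                    if (cpu != smp_processor_id() && cpu_online(cpu))
                            wake_up_if_idle(cpu);
                    preempt_enable();
            }
    }
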
2021-10-07  sched: Simplify wake_up_*idle*()  (Peter Zijlstra; 1 file, -3/+3)
Simplify and make wake_up_if_idle() more robust, and don't iterate the whole machine with preempt_disable() in its caller, wake_up_all_idle_cpus(). This prepares for another wake_up_if_idle() user that needs a full do_idle() cycle. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> # on s390 Link: https://lkml.kernel.org/r/20210929152428.769328779@infradead.org
2021-08-11  smp: Fix all kernel-doc warnings  (Randy Dunlap; 1 file, -3/+11)
Fix the following warnings:

  kernel/smp.c:1189: warning: cannot understand function prototype: 'struct smp_call_on_cpu_struct '
  kernel/smp.c:788: warning: No description found for return value of 'smp_call_function_single_async'
  kernel/smp.c:990: warning: Function parameter or member 'wait' not described in 'smp_call_function_many'
  kernel/smp.c:990: warning: Excess function parameter 'flags' description in 'smp_call_function_many'
  kernel/smp.c:1198: warning: Function parameter or member 'work' not described in 'smp_call_on_cpu_struct'
  kernel/smp.c:1198: warning: Function parameter or member 'done' not described in 'smp_call_on_cpu_struct'
  kernel/smp.c:1198: warning: Function parameter or member 'func' not described in 'smp_call_on_cpu_struct'
  kernel/smp.c:1198: warning: Function parameter or member 'data' not described in 'smp_call_on_cpu_struct'
  kernel/smp.c:1198: warning: Function parameter or member 'ret' not described in 'smp_call_on_cpu_struct'
  kernel/smp.c:1198: warning: Function parameter or member 'cpu' not described in 'smp_call_on_cpu_struct'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210810225051.3938-1-rdunlap@infradead.org
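The fixes are ordinary kernel-doc annotations; an illustrative (not verbatim) example for the struct warning:

    /**
     * struct smp_call_on_cpu_struct - Call a function on a specific CPU
     * @work: struct work_struct queued on the target CPU
     * @done: completion signalled once @func has run
     * @func: function to call on the target CPU
     * @data: argument passed to @func
     * @ret:  return value of @func
     * @cpu:  target CPU, or -1 for any CPU
     */
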
2021-05-06  smp: Fix smp_call_function_single_async prototype  (Arnd Bergmann; 1 file, -13/+13)
As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct call_single_data"), the smp code prefers 32-byte aligned call_single_data objects for performance reasons, but the block layer includes an instance of this structure in the main 'struct request' that is more sensitive to size than to performance here, see 4ccafe032005 ("block: unalign call_single_data in struct request"). The result is a violation of the calling conventions that clang correctly points out:

  block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
          smp_call_function_single_async(cpu, &rq->csd);

It does seem that the usage of the call_single_data without cache line alignment should still be allowed by the smp code, so just change the function prototype so it accepts both, but leave the default alignment unchanged for the other users. This seems better to me than adding a local hack to shut up an otherwise correct warning in the caller. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jens Axboe <axboe@kernel.dk> Link: https://lkml.kernel.org/r/20210505211300.3174456-1-arnd@kernel.org
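In effect, the exported prototype changes like this (a sketch):

    /* before: demands the 32-byte aligned call_single_data_t */
    int smp_call_function_single_async(int cpu, call_single_data_t *csd);

    /* after: also accepts an unaligned instance, such as the csd
     * embedded in struct request */
    int smp_call_function_single_async(int cpu, struct __call_single_data *csd);
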
2021-03-06  Merge branch 'locking/core' into x86/mm, to resolve conflict  (Ingo Molnar; 1 file, -11/+263)
There's a non-trivial conflict between the parallel TLB flush framework and the IPI flush debugging code - merge them manually. Conflicts: kernel/smp.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-06  smp: Micro-optimize smp_call_function_many_cond()  (Peter Zijlstra; 1 file, -1/+1)
Call the generic send_call_function_single_ipi() function, which will avoid the IPI when @last_cpu is idle. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-06  smp: Inline on_each_cpu_cond() and on_each_cpu()  (Nadav Amit; 1 file, -56/+0)
Simplify the code and avoid having an additional function on the stack by inlining on_each_cpu_cond() and on_each_cpu(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nadav Amit <namit@vmware.com> [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210220231712.2475218-10-namit@vmware.com
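A sketch of the resulting inline wrappers in <linux/smp.h> (simplified): both now funnel into on_each_cpu_cond_mask() instead of having out-of-line copies in kernel/smp.c.

    static inline void on_each_cpu(smp_call_func_t func, void *info, int wait)
    {
            on_each_cpu_cond_mask(NULL, func, info, wait, cpu_online_mask);
    }

    static inline void on_each_cpu_cond(smp_cond_func_t cond_func,
                                        smp_call_func_t func, void *info, bool wait)
    {
            on_each_cpu_cond_mask(cond_func, func, info, wait, cpu_online_mask);
    }
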
2021-03-06  smp: Run functions concurrently in smp_call_function_many_cond()  (Nadav Amit; 1 file, -68/+88)
Currently, on_each_cpu() and similar functions do not exploit the potential of concurrency: the function is first executed remotely and only then is it executed locally. Functions such as TLB flush can take considerable time, so this provides an opportunity for performance optimization. To do so, modify smp_call_function_many_cond() to allow the callers to provide a function that should be executed (remotely/locally), and run them concurrently. Keep the other smp_call_function_many() semantics as they are today for backward compatibility: the called function is not executed locally in this case. smp_call_function_many_cond() does not use the optimized version for a single remote target that smp_call_function_single() implements. For synchronous function calls, smp_call_function_single() keeps a call_single_data (which is used for synchronization) on the stack. Interestingly, it seems that not using this optimization provides greater performance improvements (greater speedup with a single remote target than with multiple ones). Presumably, holding data structures that are intended for synchronization on the stack can introduce overheads due to TLB misses and false-sharing when the stack is used for other purposes. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20210220231712.2475218-2-namit@vmware.com
2021-03-06  locking/csd_lock: Add more data to CSD lock debugging  (Juergen Gross; 1 file, -4/+222)
In order to help identifying problems with IPI handling and remote function execution add some more data to IPI debugging code.

There have been multiple reports of CPUs looping long times (many seconds) in smp_call_function_many() waiting for another CPU executing a function like tlb flushing. Most of these reports have been for cases where the kernel was running as a guest on top of KVM or Xen (there are rumours of that happening under VMWare, too, and even on bare metal). Finding the root cause hasn't been successful yet, even after more than 2 years of chasing this bug by different developers.

Commit: 35feb60474bf4f7 ("kernel/smp: Provide CSD lock timeout diagnostics") tried to address this by adding some debug code and by issuing another IPI when a hang was detected. This helped mitigating the problem (the repeated IPI unlocks the hang), but the root cause is still unknown. Current available data suggests that either an IPI wasn't sent when it should have been, or that the IPI didn't result in the target CPU executing the queued function (due to the IPI not reaching the CPU, the IPI handler not being called, or the handler not seeing the queued request).

Try to add more diagnostic data by introducing a global atomic counter which is being incremented when doing critical operations (before and after queueing a new request, when sending an IPI, and when dequeueing a request). The counter value is stored in percpu variables which can be printed out when a hang is detected. The data of the last event (consisting of sequence counter, source CPU, target CPU, and event type) is stored in a global variable. When a new event is to be traced, the data of the last event is stored in the event related percpu location and the global data is updated with the new event's data. This allows to track two events in one data location: one by the value of the event data (the event before the current one), and one by the location itself (the current event).

A typical printout with a detected hang will look like this:

  csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
  csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
  csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
  csd: cnt(00008cd): ffff->0006 idle
  csd: cnt(0003668): 0001->0006 queue
  csd: cnt(0003669): 0001->0006 ipi
  csd: cnt(0003e0f): 0007->000a queue
  csd: cnt(0003e10): 0001->ffff ping
  csd: cnt(0003e71): 0003->0000 ping
  csd: cnt(0003e72): ffff->0006 gotipi
  csd: cnt(0003e73): ffff->0006 handle
  csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
  csd: cnt(0003e7f): 0004->0006 ping
  csd: cnt(0003e80): 0001->ffff pinged
  csd: cnt(0003eb2): 0005->0001 noipi
  csd: cnt(0003eb3): 0001->0006 queue
  csd: cnt(0003eb4): 0001->0006 noipi
  csd: cnt now: 0003f00

The idea is to print only relevant entries. Those are all events which are associated with the hang (so sender side events for the source CPU of the hanging request, and receiver side events for the target CPU), and the related events just before those (for adding data needed to identify a possible race). Printing all available data would be possible, but this would add large amounts of data printed on larger configurations.

Signed-off-by: Juergen Gross <jgross@suse.com> [ Minor readability edits. Breaks col80 but is far more readable. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20210301101336.7797-4-jgross@suse.com

2021-03-06  locking/csd_lock: Prepare more CSD lock debugging  (Juergen Gross; 1 file, -6/+10)
In order to be able to easily add more CSD lock debugging data to struct call_function_data->csd move the call_single_data_t element into a sub-structure. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210301101336.7797-3-jgross@suse.com
2021-03-06  locking/csd_lock: Add boot parameter for controlling CSD lock debugging  (Juergen Gross; 1 file, -4/+34)
Currently CSD lock debugging can be switched on and off via a kernel config option only. Unfortunately there is at least one problem with CSD lock handling pending for about 2 years now, which has been seen in different environments (mostly when running virtualized under KVM or Xen, at least once on bare metal). Multiple attempts to catch this issue have finally led to introduction of CSD lock debug code, but this code is not in use in most distros as it has some impact on performance. In order to be able to ship kernels with CONFIG_CSD_LOCK_WAIT_DEBUG enabled even for production use, add a boot parameter for switching the debug functionality on. This will reduce any performance impact of the debug coding to a bare minimum when not being used. Signed-off-by: Juergen Gross <jgross@suse.com> [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210301101336.7797-2-jgross@suse.com
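With CONFIG_CSD_LOCK_WAIT_DEBUG=y built in, the debug code is then switched on per boot via the kernel command line:

    csdlock_debug=1
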
2021-02-17  smp: Process pending softirqs in flush_smp_call_function_from_idle()  (Sebastian Andrzej Siewior; 1 file, -0/+4)
send_call_function_single_ipi() may wake an idle CPU without sending an IPI. The woken up CPU will process the SMP-functions in flush_smp_call_function_from_idle(). Any raised softirq from within the SMP-function call will not be processed. Should the CPU have no tasks assigned, then it will go back to idle with pending softirqs and the NOHZ will rightfully complain. Process pending softirqs on return from flush_smp_call_function_queue(). Fixes: b2a02fc43a1f4 ("smp: Optimize send_call_function_single_ipi()") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210123201027.3262800-2-bigeasy@linutronix.de
2020-11-27  Merge branch 'linus' into sched/core, to resolve semantic conflict  (Ingo Molnar; 1 file, -1/+1)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-24  smp: Cleanup smp_call_function*()  (Peter Zijlstra; 1 file, -25/+25)
Get rid of the __call_single_node union and cleanup the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2020-10-18  Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -0/+134)
Pull RCU changes from Ingo Molnar:

 - Debugging for smp_call_function()
 - RT raw/non-raw lock ordering fixes
 - Strict grace periods for KASAN
 - New smp_call_function() torture test
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes

[ This doesn't actually pull the tag - I've dropped the last merge from the RCU branch due to questions about the series. - Linus ]

* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  smp: Make symbol 'csd_bug_count' static
  kernel/smp: Provide CSD lock timeout diagnostics
  smp: Add source and destination CPUs to __call_single_data
  rcu: Shrink each possible cpu krcp
  rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  torture: Add gdb support
  rcutorture: Allow pointer leaks to test diagnostic code
  rcutorture: Hoist OOM registry up one level
  refperf: Avoid null pointer dereference when buf fails to allocate
  rcutorture: Properly synchronize with OOM notifier
  rcutorture: Properly set rcu_fwds for OOM handling
  torture: Add kvm.sh --help and update help message
  rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
  torture: Update initrd documentation
  rcutorture: Replace HTTP links with HTTPS ones
  locktorture: Make function torture_percpu_rwsem_init() static
  torture: document --allcpus argument added to the kvm.sh script
  rcutorture: Output number of elapsed grace periods
  rcutorture: Remove KCSAN stubs
  rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
  ...

2020-10-16  kernel/: fix repeated words in comments  (Randy Dunlap; 1 file, -1/+1)
Fix multiple occurrences of duplicated words in kernel/. Fix one typo/spello on the same line as a duplicate word. Change one instance of "the the" to "that the". Otherwise just drop one of the repeated words. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-04  smp: Make symbol 'csd_bug_count' static  (Wei Yongjun; 1 file, -1/+1)
The sparse tool complains as follows: kernel/smp.c:107:10: warning: symbol 'csd_bug_count' was not declared. Should it be static? Because variable is not used outside of smp.c, this commit marks it static. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2020-09-04  kernel/smp: Provide CSD lock timeout diagnostics  (Paul E. McKenney; 1 file, -2/+130)
This commit causes csd_lock_wait() to emit diagnostics when a CPU fails to respond quickly enough to one of the smp_call_function() family of function calls. These diagnostics are enabled by a new CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL. This commit was inspired by an earlier patch by Josef Bacik. [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ] [ paulmck: Fix KASAN use-after-free issue reported by Qian Cai. ] [ paulmck: Fix botched nr_cpu_ids comparison per Dan Carpenter. ] [ paulmck: Apply Peter Zijlstra feedback. ] Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-04  smp: Add source and destination CPUs to __call_single_data  (Paul E. McKenney; 1 file, -0/+6)
This commit adds a destination CPU to __call_single_data, and is inspired by an earlier commit by Peter Zijlstra. This version adds #ifdef to permit use by 32-bit systems and supplying the destination CPU for all smp_call_function*() requests, not just smp_call_function_single(). If need be, 32-bit systems could be accommodated by shrinking the flags field to 16 bits (the atomic_t variant is currently unused) and by providing only eight bits for CPU on such systems. It is not clear that the addition of the fields to __call_single_node are really needed. [ paulmck: Apply Boqun Feng feedback on 32-bit builds. ] Link: https://lore.kernel.org/lkml/20200615164048.GC2531@hirez.programming.kicks-ass.net/ Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-07-22  smp: Fix a potential usage of stale nr_cpus  (Muchun Song; 1 file, -2/+1)
get_option() may return 0, which means that nr_cpus was not initialized. In that case we would use the stale nr_cpus to initialize nr_cpu_ids. Fix it. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200716070457.53255-1-songmuchun@bytedance.com
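The fix in the nr_cpus= boot-parameter handler boils down to this (a sketch):

    /* before: nr_cpus may be stale when get_option() parses nothing */
    get_option(&str, &nr_cpus);
    if (nr_cpus > 0 && nr_cpus < nr_cpu_ids)
            nr_cpu_ids = nr_cpus;

    /* after: only trust nr_cpus when get_option() actually found a value */
    if (get_option(&str, &nr_cpus) && nr_cpus > 0 && nr_cpus < nr_cpu_ids)
            nr_cpu_ids = nr_cpus;
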
2020-06-28  smp, irq_work: Continue smp_call_function*() and irq_work*() integration  (Peter Zijlstra; 1 file, -18/+0)
Instead of relying on BUG_ON() to ensure the various data structures line up, use a bunch of horrible unions to make it all automatic. Much of the union magic is to ensure irq_work and smp_call_function do not (yet) see the members of their respective data structures change name. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
2020-06-03  Merge tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -43/+132)
Pull scheduler updates from Ingo Molnar:
 "The changes in this cycle are:

   - Optimize the task wakeup CPU selection logic, to improve scalability and reduce wakeup latency spikes

   - PELT enhancements

   - CFS bandwidth handling fixes

   - Optimize the wakeup path by removing rq->wake_list and replacing it with ->ttwu_pending

   - Optimize IPI cross-calls by making flush_smp_call_function_queue() process sync callbacks first.

   - Misc fixes and enhancements"

* tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
  sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
  sched: Replace rq::wake_list
  sched: Add rq::ttwu_pending
  irq_work, smp: Allow irq_work on call_single_queue
  smp: Optimize send_call_function_single_ipi()
  smp: Move irq_work_run() out of flush_smp_call_function_queue()
  smp: Optimize flush_smp_call_function_queue()
  sched: Fix smp_call_function_single_async() usage for ILB
  sched/core: Offload wakee task activation if it the wakee is descheduling
  sched/core: Optimize ttwu() spinning on p->on_cpu
  sched: Defend cfs and rt bandwidth quota against overflow
  sched/cpuacct: Fix charge cpuacct.usage_sys
  sched/fair: Replace zero-length array with flexible-array
  sched/pelt: Sync util/runnable_sum with PELT window when propagating
  sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
  sched/fair: Optimize enqueue_task_fair()
  sched: Make scheduler_ipi inline
  sched: Clean up scheduler_ipi()
  sched/core: Simplify sched_init()
  ...

2020-06-02  irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too  (Ingo Molnar; 1 file, -2/+0)
Some SMP platforms don't have CONFIG_IRQ_WORK defined, resulting in a link error at build time. Define a stub and clean up the prototype definitions. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28  sched/headers: Split out open-coded prototypes into kernel/sched/smp.h  (Ingo Molnar; 1 file, -4/+1)
Move the prototypes for sched_ttwu_pending() and send_call_function_single_ipi() into the newly created kernel/sched/smp.h header, to make sure they are all the same, and to make architectures that use -Wmissing-prototypes happy. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-28  sched: Replace rq::wake_list  (Peter Zijlstra; 1 file, -7/+40)
The recent commit: 90b5363acd47 ("sched: Clean up scheduler_ipi()") got smp_call_function_single_async() subtly wrong. Even though it will return -EBUSY when trying to re-use a csd, that condition is not atomic and still requires external serialization. The change in ttwu_queue_remote() got this wrong. While on first reading ttwu_queue_remote() has an atomic test-and-set that appears to serialize the use, the matching 'release' is not in the right place to actually guarantee this serialization. The actual race is vs the sched_ttwu_pending() call in the idle loop; that can run the wakeup-list without consuming the CSD. Instead of trying to chain the lists, merge them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161908.129371594@infradead.org
2020-05-28  irq_work, smp: Allow irq_work on call_single_queue  (Peter Zijlstra; 1 file, -46/+73)
Currently irq_work_queue_on() will issue an unconditional arch_send_call_function_single_ipi() and has the handler do irq_work_run(). This is unfortunate in that it makes the IPI handler look at a second cacheline and it misses the opportunity to avoid the IPI. Instead note that struct irq_work and struct __call_single_data are very similar in layout, so use a few bits in the flags word to encode a type and stick the irq_work on the call_single_queue list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161908.011635912@infradead.org
2020-05-28  smp: Optimize send_call_function_single_ipi()  (Peter Zijlstra; 1 file, -1/+15)
Just like the ttwu_queue_remote() IPI, make use of _TIF_POLLING_NRFLAG to avoid sending IPIs to idle CPUs. [ mingo: Fix UP build bug. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161907.953304789@infradead.org
2020-05-28  smp: Move irq_work_run() out of flush_smp_call_function_queue()  (Peter Zijlstra; 1 file, -8/+9)
This ensures flush_smp_call_function_queue() is strictly about call_single_queue. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161907.895109676@infradead.org
2020-05-28  smp: Optimize flush_smp_call_function_queue()  (Peter Zijlstra; 1 file, -4/+23)
The call_single_queue can contain two different types of callbacks, synchronous and asynchronous. The current interrupt handler runs them in order, which means that remote CPUs waiting for their synchronous call can be delayed by running asynchronous callbacks. Rework the interrupt handler to first run the synchronous callbacks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200526161907.836818381@infradead.org
2020-04-19  smp: Use smp_call_func_t in on_each_cpu()  (Kaitao Cheng; 1 file, -1/+1)
Use smp_call_func_t instead of the open coded function pointer argument. Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20200417162451.91969-1-pilgrimtao@gmail.com
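The change is purely a prototype cleanup (sketch); smp_call_func_t is the existing typedef for void (*)(void *info):

    /* before */
    void on_each_cpu(void (*func)(void *info), void *info, int wait);

    /* after */
    void on_each_cpu(smp_call_func_t func, void *info, int wait);
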
2020-03-25  cpu/hotplug: Move bringup of secondary CPUs out of smp_init()  (Qais Yousef; 1 file, -8/+1)
This is the last direct user of cpu_up() before it can become an internal implementation detail of the cpu subsystem. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200323135110.30522-17-qais.yousef@arm.com
2020-03-06  smp: Allow smp_call_function_single_async() to insert locked csd  (Peter Xu; 1 file, -3/+11)
Previously we would raise a warning when asked to insert a csd object that still has the LOCK flag set, and we would also wait for the lock to be released. However, this does not match how the function is named: the "_async" suffix hints that the function should not block, yet it would. This patch changes that behavior by simply returning -EBUSY instead of waiting; at the same time the operation is allowed to happen without a warning, turning it into a feature for callers that want to "insert a csd object, and if it's already there, just rely on the queued one". This is pretty safe because in flush_smp_call_function_queue(), for async csd objects (where csd->flags & SYNC is zero), we first unlock and then call csd->func(). So if smp_call_function_single_async() sees csd->flags & LOCK set, it is guaranteed that csd->func() will be called after smp_call_function_single_async() returns -EBUSY. Update the function's comment to reflect this. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20191216213125.9536-2-peterx@redhat.com
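A hypothetical caller illustrating the new -EBUSY semantics; the kick_* names are made up, and INIT_CSD() is the initializer introduced by the later smp_call_function*() cleanup listed above:

    static call_single_data_t kick_csd;

    static void kick_func(void *info)
    {
            /* runs on the target CPU */
    }

    static int __init kick_setup(void)
    {
            INIT_CSD(&kick_csd, kick_func, NULL);
            return 0;
    }
    early_initcall(kick_setup);

    static void kick_cpu(int cpu)
    {
            /*
             * -EBUSY (rather than a warning plus a blocking wait) means the
             * request previously queued on kick_csd has not run yet;
             * kick_func() is still guaranteed to execute, so just move on.
             */
            if (smp_call_function_single_async(cpu, &kick_csd) == -EBUSY)
                    return;
    }
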
2020-01-28  smp: Remove superfluous cond_func check in smp_call_function_many_cond()  (Sebastian Andrzej Siewior; 1 file, -1/+1)
It was requested to remove the cond_func check but the follow up patch was overlooked. Remove it now. Fixes: 67719ef25eeb ("smp: Add a smp_cond_func_t argument to smp_call_function_many()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200127083915.434tdkztorkklpdu@linutronix.de
2020-01-24  smp: Remove allocation mask from on_each_cpu_cond.*()  (Sebastian Andrzej Siewior; 1 file, -10/+3)
The allocation mask is no longer used by on_each_cpu_cond() and on_each_cpu_cond_mask() and can be removed. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-4-bigeasy@linutronix.de
2020-01-24  smp: Add a smp_cond_func_t argument to smp_call_function_many()  (Sebastian Andrzej Siewior; 1 file, -43/+38)
on_each_cpu_cond_mask() allocates a new CPU mask. The newly allocated mask is a subset of the provided mask based on the conditional function. This memory allocation can be avoided by extending smp_call_function_many() with the conditional function and performing the remote function call based on the mask and the conditional function. Rename smp_call_function_many() to smp_call_function_many_cond() and add the smp_cond_func_t argument. If smp_cond_func_t is provided then it is used before invoking the function. Provide smp_call_function_many() with cond_func set to NULL. Let on_each_cpu_cond_mask() use smp_call_function_many_cond(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-3-bigeasy@linutronix.de
2020-01-24  smp: Use smp_cond_func_t as type for the conditional function  (Sebastian Andrzej Siewior; 1 file, -6/+5)
Use a typedef for the conditional function instead of defining it each time in the function prototype. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-2-bigeasy@linutronix.de
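The typedef itself, plus an illustrative conditional callback (cpu_should_flush() and struct flush_data are hypothetical):

    typedef bool (*smp_cond_func_t)(int cpu, void *info);

    static bool cpu_should_flush(int cpu, void *info)
    {
            struct flush_data *fd = info;

            return cpumask_test_cpu(cpu, &fd->pending_cpus);
    }
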
2019-07-20  smp: Warn on function calls from softirq context  (Peter Zijlstra; 1 file, -0/+16)
It's clearly documented that smp function calls cannot be invoked from softirq handling context. Unfortunately nothing enforces that or emits a warning. A single function call can be invoked from softirq context only via smp_call_function_single_async(). The only legit context is task context, so add a warning to that effect. Reported-by: luferry <luferry@163.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190718160601.GP3402@hirez.programming.kicks-ass.net
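The enforcement amounts to a single check added to the smp_call_function*() entry points (a sketch):

    /* Only task context may issue SMP function calls; hard/soft IRQ
     * context can deadlock on the shared csd storage. */
    WARN_ON_ONCE(!in_task());
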