summaryrefslogtreecommitdiffstats
path: root/kernel/rcu/tree_plugin.h
AgeCommit message (Collapse)AuthorFilesLines
2021-08-30Merge tag 'locking-core-2021-08-30' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and atomics updates from Thomas Gleixner: "The regular pile: - A few improvements to the mutex code - Documentation updates for atomics to clarify the difference between cmpxchg() and try_cmpxchg() and to explain the forward progress expectations. - Simplification of the atomics fallback generator - The addition of arch_atomic_long*() variants and generic arch_*() bitops based on them. - Add the missing might_sleep() invocations to the down*() operations of semaphores. The PREEMPT_RT locking core: - Scheduler updates to support the state preserving mechanism for 'sleeping' spin- and rwlocks on RT. This mechanism is carefully preserving the state of the task when blocking on a 'sleeping' spin- or rwlock and takes regular wake-ups targeted at the same task into account. The preserved or updated (via a regular wakeup) state is restored when the lock has been acquired. - Restructuring of the rtmutex code so it can be utilized and extended for the RT specific lock variants. - Restructuring of the ww_mutex code to allow sharing of the ww_mutex specific functionality for rtmutex based ww_mutexes. - Header file disentangling to allow substitution of the regular lock implementations with the PREEMPT_RT variants without creating an unmaintainable #ifdef mess. - Shared base code for the PREEMPT_RT specific rw_semaphore and rwlock implementations. Contrary to the regular rw_semaphores and rwlocks the PREEMPT_RT implementation is writer unfair because it is infeasible to do priority inheritance on multiple readers. Experience over the years has shown that real-time workloads are not the typical workloads which are sensitive to writer starvation. The alternative solution would be to allow only a single reader which has been tried and discarded as it is a major bottleneck especially for mmap_sem. Aside of that many of the writer starvation critical usage sites have been converted to a writer side mutex/spinlock and RCU read side protections in the past decade so that the issue is less prominent than it used to be. - The actual rtmutex based lock substitutions for PREEMPT_RT enabled kernels which affect mutex, ww_mutex, rw_semaphore, spinlock_t and rwlock_t. The spin/rw_lock*() functions disable migration across the critical section to preserve the existing semantics vs per-CPU variables. - Rework of the futex REQUEUE_PI mechanism to handle the case of early wake-ups which interleave with a re-queue operation to prevent the situation that a task would be blocked on both the rtmutex associated to the outer futex and the rtmutex based hash bucket spinlock. While this situation cannot happen on !RT enabled kernels the changes make the underlying concurrency problems easier to understand in general. As a result the difference between !RT and RT kernels is reduced to the handling of waiting for the critical section. !RT kernels simply spin-wait as before and RT kernels utilize rcu_wait(). - The substitution of local_lock for PREEMPT_RT with a spinlock which protects the critical section while staying preemptible. The CPU locality is established by disabling migration. The underlying concepts of this code have been in use in PREEMPT_RT for way more than a decade. The code has been refactored several times over the years and this final incarnation has been optimized once again to be as non-intrusive as possible, i.e. the RT specific parts are mostly isolated. It has been extensively tested in the 5.14-rt patch series and it has been verified that !RT kernels are not affected by these changes" * tag 'locking-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (92 commits) locking/rtmutex: Return success on deadlock for ww_mutex waiters locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes locking/rtmutex: Dequeue waiter on ww_mutex deadlock locking/rtmutex: Dont dereference waiter lockless locking/semaphore: Add might_sleep() to down_*() family locking/ww_mutex: Initialize waiter.ww_ctx properly static_call: Update API documentation locking/local_lock: Add PREEMPT_RT support locking/spinlock/rt: Prepare for RT local_lock locking/rtmutex: Add adaptive spinwait mechanism locking/rtmutex: Implement equal priority lock stealing preempt: Adjust PREEMPT_LOCK_OFFSET for RT locking/rtmutex: Prevent lockdep false positive with PI futexes futex: Prevent requeue_pi() lock nesting issue on RT futex: Simplify handle_early_requeue_pi_wakeup() futex: Reorder sanity checks in futex_requeue() futex: Clarify comment in futex_requeue() futex: Restructure futex_requeue() futex: Correct the number of requeued waiters for PI futex: Remove bogus condition for requeue PI ...
2021-08-17locking/rtmutex: Split out the inner parts of 'struct rtmutex'Peter Zijlstra1-3/+3
RT builds substitutions for rwsem, mutex, spinlock and rwlock around rtmutexes. Split the inner working out so each lock substitution can use them with the appropriate lockdep annotations. This avoids having an extra unused lockdep map in the wrapped rtmutex. No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
2021-08-10Merge branches 'doc.2021.07.20c', 'fixes.2021.08.06a', 'nocb.2021.07.20c', ↵Paul E. McKenney1-1495/+11
'nolibc.2021.07.20c', 'tasks.2021.07.20c', 'torture.2021.07.27a' and 'torturescript.2021.07.27a' into HEAD doc.2021.07.20c: Documentation updates. fixes.2021.08.06a: Miscellaneous fixes. nocb.2021.07.20c: Callback-offloading (NOCB CPU) updates. nolibc.2021.07.20c: Tiny userspace library updates. tasks.2021.07.20c: Tasks RCU updates. torture.2021.07.27a: In-kernel torture-test updates. torturescript.2021.07.27a: Torture-test scripting updates.
2021-08-06rcu: Print human-readable message for schedule() in RCU readerPaul E. McKenney1-1/+1
The WARN_ON_ONCE() invocation within the CONFIG_PREEMPT=y version of rcu_note_context_switch() triggers when there is a voluntary context switch in an RCU read-side critical section, but there is quite a gap between the output of that WARN_ON_ONCE() and this RCU-usage error. This commit therefore converts the WARN_ON_ONCE() to a WARN_ONCE() that explicitly describes the problem in its message. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Mark accesses to ->rcu_read_lock_nestingPaul E. McKenney1-3/+6
KCSAN flags accesses to ->rcu_read_lock_nesting as data races, but in the past, the overhead of marked accesses was excessive. However, that was long ago, and much has changed since then, both in terms of hardware and of compilers. Here is data taken on an eight-core laptop using Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz with a kernel built using gcc version 9.3.0, with all data in nanoseconds. Unmarked accesses (status quo), measured by three refscale runs: Minimum reader duration: 3.286 2.851 3.395 Median reader duration: 3.698 3.531 3.4695 Maximum reader duration: 4.481 5.215 5.157 Marked accesses, also measured by three refscale runs: Minimum reader duration: 3.501 3.677 3.580 Median reader duration: 4.053 3.723 3.895 Maximum reader duration: 7.307 4.999 5.511 This focused microbenhmark shows only sub-nanosecond differences which are unlikely to be visible at the system level. This commit therefore marks data-racing accesses to ->rcu_read_lock_nesting. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-07-20rcu: Fix macro name CONFIG_TASKS_RCU_TRACEZhouyi Zhou1-4/+4
This commit fixes several typos where CONFIG_TASKS_RCU_TRACE should instead be CONFIG_TASKS_TRACE_RCU. Among other things, these typos could cause CONFIG_TASKS_TRACE_RCU_READ_MB=y kernels to suffer from memory-ordering bugs that could result in false-positive quiescent states and too-short grace periods. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-07-20rcu/nocb: Start moving nocb code to its own plugin fileFrederic Weisbecker1-1487/+0
The kernel/rcu/tree_plugin.h file contains not only the plugins for preemptible RCU, but also many other features including rcu_nocbs callback offloading. This offloading has become large and complex, so it is time to put it in its own file. This commit starts that process. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Rename to tree_nocb.h, add Frederic as author. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-07-04Merge branch 'core-rcu-2021.07.04' of ↵Linus Torvalds1-126/+113
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Bitmap parsing support for "all" as an alias for all bits - Documentation updates - Miscellaneous fixes, including some that overlap into mm and lockdep - kvfree_rcu() updates - mem_dump_obj() updates, with acks from one of the slab-allocator maintainers - RCU NOCB CPU updates, including limited deoffloading - SRCU updates - Tasks-RCU updates - Torture-test updates * 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits) tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states rcu: Add missing __releases() annotation rcu: Remove obsolete rcu_read_unlock() deadlock commentary rcu: Improve comments describing RCU read-side critical sections rcu: Create an unrcu_pointer() to remove __rcu from a pointer srcu: Early test SRCU polling start rcu: Fix various typos in comments rcu/nocb: Unify timers rcu/nocb: Prepare for fine-grained deferred wakeup rcu/nocb: Only cancel nocb timer if not polling rcu/nocb: Delete bypass_timer upon nocb_gp wakeup rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup rcu/nocb: Allow de-offloading rdp leader rcu/nocb: Directly call __wake_nocb_gp() from bypass timer rcu: Don't penalize priority boosting when there is nothing to boost rcu: Point to documentation of ordering guarantees rcu: Make rcu_gp_cleanup() be noinline for tracing rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP ...
2021-06-18sched: Introduce task_is_running()Peter Zijlstra1-1/+1
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: task_is_running(p). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
2021-05-18Merge branches 'bitmaprange.2021.05.10c', 'doc.2021.05.10c', ↵Paul E. McKenney1-120/+110
'fixes.2021.05.13a', 'kvfree_rcu.2021.05.10c', 'mmdumpobj.2021.05.10c', 'nocb.2021.05.12a', 'srcu.2021.05.12a', 'tasks.2021.05.18a' and 'torture.2021.05.10c' into HEAD bitmaprange.2021.05.10c: Allow "all" for bitmap ranges. doc.2021.05.10c: Documentation updates. fixes.2021.05.13a: Miscellaneous fixes. kvfree_rcu.2021.05.10c: kvfree_rcu() updates. mmdumpobj.2021.05.10c: mem_dump_obj() updates. nocb.2021.05.12a: RCU NOCB CPU updates, including limited deoffloading. srcu.2021.05.12a: SRCU updates. tasks.2021.05.18a: Tasks-RCU updates. torture.2021.05.10c: Torture-test updates.
2021-05-12rcu: Fix various typos in commentsIngo Molnar1-1/+1
Fix ~12 single-word typos in RCU code comments. [ paulmck: Apply feedback from Randy Dunlap. ] Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12rcu/nocb: Unify timersFrederic Weisbecker1-53/+39
Now that ->nocb_timer and ->nocb_bypass_timer have become quite similar, this commit merges them together. A new RCU_NOCB_WAKE_BYPASS wake level is introduced. As a result, timers perform all kinds of deferred wake ups but other deferred wakeup callsites only handle non-bypass wakeups in order not to wake up rcuo too early. The timer also unconditionally executes a full barrier so as to order timer_pending() and callback enqueue although the path performing RCU_NOCB_WAKE_FORCE that makes use of it is debatable. It should also test against the rdp leader instead of the current rdp. This unconditional full barrier shouldn't bring visible overhead since these timers almost never fire. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12rcu/nocb: Prepare for fine-grained deferred wakeupFrederic Weisbecker1-8/+9
Tuning the deferred wakeup level must be done from a safe wakeup point. Currently those sites are: * ->nocb_timer * user/idle/guest entry * CPU down * softirq/rcuc All of these sites perform the wake up for both RCU_NOCB_WAKE and RCU_NOCB_WAKE_FORCE. In order to merge ->nocb_timer and ->nocb_bypass_timer together, we plan to add a new RCU_NOCB_WAKE_BYPASS that really should be deferred until a timer fires so that we don't wake up the NOCB-gp kthread too early. To prepare for that, this commit specifies the per-callsite wakeup level/limit. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Fix non-NOCB rcu_nocb_need_deferred_wakeup() definition. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12rcu/nocb: Only cancel nocb timer if not pollingFrederic Weisbecker1-7/+7
This commit refrains deleting the ->nocb_timer if rcu_nocb is polling because it should not ever have been queued in the polling case. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12rcu/nocb: Delete bypass_timer upon nocb_gp wakeupFrederic Weisbecker1-0/+2
A NOCB-gp wake p can safely delete the ->nocb_bypass_timer because nocb_gp_wait() will recheck again the bypass state and rearm the bypass timer if necessary. This commit therefore deletes this timer. Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12rcu/nocb: Cancel nocb_timer upon nocb_gp wakeupFrederic Weisbecker1-0/+4
When waking up in nocb_gp_wait(), there is no need to keep the nocb_timer around because this function will traverse the whole rdp list. Any update performed before the timer was armed will now be visible after the ->nocb_gp_lock acquire. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12rcu/nocb: Allow de-offloading rdp leaderFrederic Weisbecker1-4/+0
The only thing that prevented an rdp leader from being de-offloaded was the nocb_bypass_timer that used to lock the nocb_lock of the rdp leader. If an rdp gets de-offloaded, it will subtlely ignore rcu_nocb_lock() calls and do its job in the timer unsafely. Worse yet: If it gets re-offloaded in the middle of the timer, rcu_nocb_unlock() would try to unlock, leaving it imbalanced. Now that the nocb_bypass_timer doesn't use the nocb_lock anymore, de-offloading the rdp leader is now safe. This commit therefore allows the rdp leader to be de-offloaded. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-12rcu/nocb: Directly call __wake_nocb_gp() from bypass timerFrederic Weisbecker1-2/+3
The bypass timer calls __call_rcu_nocb_wake() instead of directly calling __wake_nocb_gp(). The only difference here is that rdp->qlen_last_fqs_check gets overridden. But resetting the deferred force quiescent state base shouldn't be relevant for that timer. In fact the bypass queue in question can be for any rdp from the group and not necessarily the rdp leader on which the bypass timer is attached. This commit therefore calls __wake_nocb_gp() directly. This way we don't even need to lock the ->nocb_lock. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10rcu: Make RCU priority boosting work on single-CPU rcu_node structuresPaul E. McKenney1-22/+7
When any CPU comes online, it checks to see if an RCU-boost kthread has already been created for that CPU's leaf rcu_node structure, and if not, it creates one. Unfortunately, it also verifies that this leaf rcu_node structure actually has at least one online CPU, and if not, it declines to create the kthread. Although this behavior makes sense during early boot, especially on systems that claim far more CPUs than they actually have, it makes no sense for the first CPU to come online for a given rcu_node structure. There is no point in checking because we know there is a CPU on its way in. The problem is that timing differences can cause this incoming CPU to not yet be reflected in the various bit masks even at rcutree_online_cpu() time, and there is no chance at rcutree_prepare_cpu() time. Plus it would be better to create the RCU-boost kthread at rcutree_prepare_cpu() to handle the case where the CPU is involved in an RCU priority inversion very shortly after it comes online. This commit therefore moves the checking to rcu_prepare_kthreads(), which is called only at early boot, when the check is appropriate. In addition, it makes rcutree_prepare_cpu() invoke rcu_spawn_one_boost_kthread(), which no longer does any checking for online CPUs. With this change, RCU priority boosting tests now pass for short rcutorture runs, even with single-CPU leaf rcu_node structures. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Scott Wood <swood@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10rcu: Add quiescent states and boost states to show_rcu_gp_kthreads() outputPaul E. McKenney1-0/+1
This commit adds each rcu_node structure's ->qsmask and "bBEG" output indicating whether: (1) There is a boost kthread, (2) A reader needs to be (or is in the process of being) boosted, (3) A reader is blocking an expedited grace period, and (4) A reader is blocking a normal grace period. This helps diagnose RCU priority boosting failures. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10rcu/nocb: Use the rcuog CPU's ->nocb_timerFrederic Weisbecker1-63/+77
Currently each CPU has its own ->nocb_timer queued when the nocb_gp wakeup must be deferred. This approach has many drawbacks, compared to a solution based on a single timer per NOCB group: * There are a lot of timers to maintain. * The per-rdp ->nocb_lock must be held to queue and cancel the timer and this lock can already be heavily contended. * One timer firing doesn't cancel the other timers in the same group: - These other timers can thus cause spurious wakeups - Each rdp that queued a timer must lock both ->nocb_lock and then ->nocb_gp_lock upon exit from the kernel to idle/user/guest mode. * We can't cancel all of them if we detect an unflushed bypass in nocb_gp_wait(). In fact currently we only ever cancel the ->nocb_timer of the leader group. * The leader group's nocb_timer is cancelled without locking ->nocb_lock in nocb_gp_wait(). This currently appears to be safe but is an accident waiting to happen. * Since the timer acquires ->nocb_lock, it requires extra care in the NOCB (de-)offloading process, requiring that it be either enabled or disabled and then flushed. This commit instead uses the rcuog kthread's CPU's ->nocb_timer instead. It is protected by nocb_gp_lock, which is _way_ less contended and remains so even after this change. As a matter of fact, the nocb_timer almost never fires and the deferred wakeup is mostly carried out upon idle/user/guest entry. Now the early check performed at this point in do_nocb_deferred_wakeup() is done on rdp_gp->nocb_defer_wakeup, which is of course racy. However, this raciness is harmless because we only need the guarantee that the timer is queued if we were the last one to queue it. Any other situation (another CPU has queued it and we either see it or not) is fine. This solves all the issues listed above. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10rcu: Fix typo in comment: kthead -> kthreadRolf Eike Beer1-1/+1
Signed-off-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10rcu/tree_plugin: Don't handle the case of 'all' CPU rangeYury Norov1-6/+3
The 'all' semantics is now supported by the bitmap_parselist() so we can drop supporting it as a special case in RCU code. Since 'all' is properly supported in core bitmap code, also drop legacy comment in RCU for it. This patch does not make any functional changes for existing users. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-24Merge branches 'bitmaprange.2021.03.08a', 'fixes.2021.03.15a', ↵Paul E. McKenney1-82/+164
'kvfree_rcu.2021.03.08a', 'mmdumpobj.2021.03.08a', 'nocb.2021.03.15a', 'poll.2021.03.24a', 'rt.2021.03.08a', 'tasks.2021.03.08a', 'torture.2021.03.08a' and 'torturescript.2021.03.22a' into HEAD bitmaprange.2021.03.08a: Allow 3-N for bitmap ranges. fixes.2021.03.15a: Miscellaneous fixes. kvfree_rcu.2021.03.08a: kvfree_rcu() updates. mmdumpobj.2021.03.08a: mem_dump_obj() updates. nocb.2021.03.15a: RCU NOCB CPU updates, including limited deoffloading. poll.2021.03.24a: Polling grace-period interfaces for RCU. rt.2021.03.08a: Realtime-related RCU changes. tasks.2021.03.08a: Tasks-RCU updates. torture.2021.03.08a: Torture-test updates. torturescript.2021.03.22a: Torture-test scripting updates.
2021-03-15rcu/nocb: Move trace_rcu_nocb_wake() calls outside nocb_lock when possibleFrederic Weisbecker1-3/+3
Those tracing calls don't need to be under ->nocb_lock. This commit therefore moves them outside of that lock. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-15rcu/nocb: Disable bypass when CPU isn't completely offloadedFrederic Weisbecker1-9/+29
Currently, the bypass is flushed at the very last moment in the deoffloading procedure. However, this approach leads to a larger state space than would be preferred. This commit therefore disables the bypass at soon as the deoffloading procedure begins, then flushes it. This guarantees that the bypass remains empty and thus out of the way of the deoffloading procedure. Symmetrically, this commit waits to enable the bypass until the offloading procedure has completed. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-15rcu/nocb: Fix missed nocb_timer requeueFrederic Weisbecker1-2/+5
This sequence of events can lead to a failure to requeue a CPU's ->nocb_timer: 1. There are no callbacks queued for any CPU covered by CPU 0-2's ->nocb_gp_kthread. Note that ->nocb_gp_kthread is associated with CPU 0. 2. CPU 1 enqueues its first callback with interrupts disabled, and thus must defer awakening its ->nocb_gp_kthread. It therefore queues its rcu_data structure's ->nocb_timer. At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE. 3. CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a callback, but with interrupts enabled, allowing it to directly awaken the ->nocb_gp_kthread. 4. The newly awakened ->nocb_gp_kthread associates both CPU 1's and CPU 2's callbacks with a future grace period and arranges for that grace period to be started. 5. This ->nocb_gp_kthread goes to sleep waiting for the end of this future grace period. 6. This grace period elapses before the CPU 1's timer fires. This is normally improbably given that the timer is set for only one jiffy, but timers can be delayed. Besides, it is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. 7. The grace period ends, so rcu_gp_kthread awakens the ->nocb_gp_kthread, which in turn awakens both CPU 1's and CPU 2's ->nocb_cb_kthread. Then ->nocb_gb_kthread sleeps waiting for more newly queued callbacks. 8. CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps waiting for more invocable callbacks. 9. Note that neither kthread updated any ->nocb_timer state, so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE. 10. CPU 1 enqueues its second callback, this time with interrupts enabled so it can wake directly ->nocb_gp_kthread. It does so with calling wake_nocb_gp() which also cancels the pending timer that got queued in step 2. But that doesn't reset CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE. So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now desynchronized. 11. ->nocb_gp_kthread associates the callback queued in 10 with a new grace period, arranges for that grace period to start and sleeps waiting for it to complete. 12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread, which in turn wakes up CPU 1's ->nocb_cb_kthread which then invokes the callback queued in 10. 13. CPU 1 enqueues its third callback, this time with interrupts disabled so it must queue a timer for a deferred wakeup. However the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which incorrectly indicates that a timer is already queued. Instead, CPU 1's ->nocb_timer was cancelled in 10. CPU 1 therefore fails to queue the ->nocb_timer. 14. CPU 1 has its pending callback and it may go unnoticed until some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever calls an explicit deferred wakeup, for example, during idle entry. This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime we delete the ->nocb_timer. It is quite possible that there is a similar scenario involving ->nocb_bypass_timer and ->nocb_defer_wakeup. However, despite some effort from several people, a failure scenario has not yet been located. However, that by no means guarantees that no such scenario exists. Finding a failure scenario is left as an exercise for the reader, and the "Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer. Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing) Cc: <stable@vger.kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-15rcu: Make nocb_nobypass_lim_per_jiffy staticJiapeng Chong1-1/+1
RCU triggerse the following sparse warning: kernel/rcu/tree_plugin.h:1497:5: warning: symbol 'nocb_nobypass_lim_per_jiffy' was not declared. Should it be static? This commit therefore makes this variable static. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-15rcu: Add explicit barrier() to __rcu_read_unlock()Paul E. McKenney1-1/+2
Because preemptible RCU's __rcu_read_unlock() is an external function, the rough equivalent of an implicit barrier() is inserted by the compiler. Except that there is a direct call to __rcu_read_unlock() in that same file, and compilers are getting to the point where they might choose to inline the fastpath of the __rcu_read_unlock() function. This commit therefore adds an explicit barrier() to the very beginning of __rcu_read_unlock(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu: Make rcu_read_unlock_special() expedite strict grace periodsPaul E. McKenney1-0/+1
In kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, every grace period is an expedited grace period. However, rcu_read_unlock_special() does not treat them that way, instead allowing the deferred quiescent state to be reported whenever. This commit therefore adds a check of this Kconfig option that causes rcu_read_unlock_special() to treat all grace periods as expedited for CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu: Expedite deboost in case of deferred quiescent statePaul E. McKenney1-12/+14
Historically, a task that has been subjected to RCU priority boosting is deboosted at rcu_read_unlock() time. However, with the advent of deferred quiescent states, if the outermost rcu_read_unlock() was invoked with either bottom halves, interrupts, or preemption disabled, the deboosting will be delayed for some time. During this time, a low-priority process might be incorrectly running at a high real-time priority level. Fortunately, rcu_read_unlock_special() already provides mechanisms for forcing a minimal deferral of quiescent states, at least for kernels built with CONFIG_IRQ_WORK=y. These mechanisms are currently used when expedited grace periods are pending that might be blocked by the current task. This commit therefore causes those mechanisms to also be used in cases where the current task has been or might soon be subjected to RCU priority boosting. Note that this applies to all kernels built with CONFIG_RCU_BOOST=y, regardless of whether or not they are also built with CONFIG_PREEMPT_RT=y. This approach assumes that kernels build for use with aggressive real-time applications are built with CONFIG_IRQ_WORK=y. It is likely to be far simpler to enable CONFIG_IRQ_WORK=y than to implement a fast-deboosting scheme that works correctly in its absence. While in the area, alphabetize the rcu_preempt_deferred_qs_handler() function's local variables. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Scott Wood <swood@redhat.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu/nocb: Rename nocb_gp_update_state to nocb_gp_update_state_deoffloadingFrederic Weisbecker1-4/+5
The name nocb_gp_update_state() is unenlightening, so this commit changes it to nocb_gp_update_state_deoffloading(). This function now does what its name says, updates state and returns true if the CPU corresponding to the specified rcu_data structure is in the process of being de-offloaded. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu/nocb: Avoid confusing double write of rdp->nocb_cb_sleepFrederic Weisbecker1-3/+4
The nocb_cb_wait() function first sets the rdp->nocb_cb_sleep flag to true by after invoking the callbacks, and then sets it back to false if it finds more callbacks that are ready to invoke. This is confusing and will become unsafe if this flag is ever read locklessly. This commit therefore writes it only once, based on the state after both callback invocation and checking. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu/nocb: Forbid NOCB toggling on offline CPUsFrederic Weisbecker1-36/+21
It makes no sense to de-offload an offline CPU because that CPU will never invoke any remaining callbacks. It also makes little sense to offload an offline CPU because any pending RCU callbacks were migrated when that CPU went offline. Yes, it is in theory possible to use a number of tricks to permit offloading and deoffloading offline CPUs in certain cases, but in practice it is far better to have the simple and deterministic rule "Toggling the offload state of an offline CPU is forbidden". For but one example, consider that an offloaded offline CPU might have millions of callbacks queued. Best to just say "no". This commit therefore forbids toggling of the offloaded state of offline CPUs. Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu/nocb: Comment the reason behind BH disablement on batch processingFrederic Weisbecker1-0/+6
This commit explains why softirqs need to be disabled while invoking callbacks, even when callback processing has been offloaded. After all, invoking callbacks concurrently is one thing, but concurrently invoking the same callback is quite another. Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu/nocb: Detect unsafe checks for offloaded rdpFrederic Weisbecker1-14/+76
Provide CONFIG_PROVE_RCU sanity checks to ensure we are always reading the offloaded state of an rdp in a safe and stable way and prevent from its value to be changed under us. We must either hold the barrier mutex, the cpu-hotplug lock (read or write) or the nocb lock. Local non-preemptible reads are also safe. NOCB kthreads and timers have their own means of synchronization against the offloaded state updaters. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08rcu: deprecate "all" option to rcu_nocbs=Paul Gortmaker1-4/+2
With the core bitmap support now accepting "N" as a placeholder for the end of the bitmap, "all" can be represented as "0-N" and has the advantage of not being specific to RCU (or any other subsystem). So deprecate the use of "all" by removing documentation references to it. The support itself needs to remain for now, since we don't know how many people out there are using it currently, but since it is in an __init area anyway, it isn't worth losing sleep over. Cc: Yury Norov <yury.norov@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-02-21Merge tag 'sched-core-2021-02-17' of ↵Linus Torvalds1-9/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler updates: - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the preempt=none/voluntary/full boot options (default: full), to allow distros to build a PREEMPT kernel but fall back to close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via a boot time selection. There's also the /debug/sched_debug switch to do this runtime. This feature is implemented via runtime patching (a new variant of static calls). The scope of the runtime patching can be best reviewed by looking at the sched_dynamic_update() function in kernel/sched/core.c. ( Note that the dynamic none/voluntary mode isn't 100% identical, for example preempt-RCU is available in all cases, plus the preempt count is maintained in all models, which has runtime overhead even with the code patching. ) The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority of distributions, are supposed to be unaffected. - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that was found via rcutorture triggering a hang. The bug is that rcu_idle_enter() may wake up a NOCB kthread, but this happens after the last generic need_resched() check. Some cpuidle drivers fix it by chance but many others don't. In true 2020 fashion the original bug fix has grown into a 5-patch scheduler/RCU fix series plus another 16 RCU patches to address the underlying issue of missed preemption events. These are the initial fixes that should fix current incarnations of the bug. - Clean up rbtree usage in the scheduler, by providing & using the following consistent set of rbtree APIs: partial-order; less() based: - rb_add(): add a new entry to the rbtree - rb_add_cached(): like rb_add(), but for a rb_root_cached total-order; cmp() based: - rb_find(): find an entry in an rbtree - rb_find_add(): find an entry, and add if not found - rb_find_first(): find the first (leftmost) matching entry - rb_next_match(): continue from rb_find_first() - rb_for_each(): iterate a sub-tree using the previous two - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass. This is a 4-commit series where each commit improves one aspect of the idle sibling scan logic. - Improve the cpufreq cooling driver by getting the effective CPU utilization metrics from the scheduler - Improve the fair scheduler's active load-balancing logic by reducing the number of active LB attempts & lengthen the load-balancing interval. This improves stress-ng mmapfork performance. - Fix CFS's estimated utilization (util_est) calculation bug that can result in too high utilization values Misc updates & fixes: - Fix the HRTICK reprogramming & optimization feature - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code - Reduce dl_add_task_root_domain() overhead - Fix uprobes refcount bug - Process pending softirqs in flush_smp_call_function_from_idle() - Clean up task priority related defines, remove *USER_*PRIO and USER_PRIO() - Simplify the sched_init_numa() deduplication sort - Documentation updates - Fix EAS bug in update_misfit_status(), which degraded the quality of energy-balancing - Smaller cleanups" * tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) sched,x86: Allow !PREEMPT_DYNAMIC entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point entry: Explicitly flush pending rcuog wakeup before last rescheduling point rcu/nocb: Trigger self-IPI on late deferred wake up before user resume rcu/nocb: Perform deferred wake up before last idle's need_resched() check rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers sched/features: Distinguish between NORMAL and DEADLINE hrtick sched/features: Fix hrtick reprogramming sched/deadline: Reduce rq lock contention in dl_add_task_root_domain() uprobes: (Re)add missing get_uprobe() in __find_uprobe() smp: Process pending softirqs in flush_smp_call_function_from_idle() sched: Harden PREEMPT_DYNAMIC static_call: Allow module use without exposing static_call_key sched: Add /debug/sched_preempt preempt/dynamic: Support dynamic preempt with preempt= boot option preempt/dynamic: Provide irqentry_exit_cond_resched() static call preempt/dynamic: Provide preempt_schedule[_notrace]() static calls preempt/dynamic: Provide cond_resched() and might_resched() static calls preempt: Introduce CONFIG_PREEMPT_DYNAMIC static_call: Provide DEFINE_STATIC_CALL_RET0() ...
2021-02-17entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling pointFrederic Weisbecker1-0/+1
Following the idle loop model, cleanly check for pending rcuog wakeup before the last rescheduling point upon resuming to guest mode. This way we can avoid to do it from rcu_user_enter() with the last resort self-IPI hack that enforces rescheduling. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
2021-02-17rcu/nocb: Trigger self-IPI on late deferred wake up before user resumeFrederic Weisbecker1-9/+16
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Unfortunately the call to rcu_user_enter() is already past the last rescheduling opportunity before we resume to userspace or to guest mode. We may escape there with the woken task ignored. The ultimate resort to fix every callsites is to trigger a self-IPI (nohz_full depends on arch to implement arch_irq_work_raise()) that will trigger a reschedule on IRQ tail or guest exit. Eventually every site that want a saner treatment will need to carefully place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit need_resched() check upon resume. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
2021-02-17rcu/nocb: Perform deferred wake up before last idle's need_resched() checkFrederic Weisbecker1-0/+5
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Usually a local wake up happening while running the idle task is handled in one of the need_resched() checks carefully placed within the idle loop that can break to the scheduler. Unfortunately the call to rcu_idle_enter() is already beyond the last generic need_resched() check and we may halt the CPU with a resched request unhandled, leaving the task hanging. Fix this with splitting the rcuog wakeup handling from rcu_idle_enter() and place it before the last generic need_resched() check in the idle loop. It is then assumed that no call to call_rcu() will be performed after that in the idle loop until the CPU is put in low power mode. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
2021-01-22Merge branches 'doc.2021.01.06a', 'fixes.2021.01.04b', ↵Paul E. McKenney1-25/+342
'kfree_rcu.2021.01.04a', 'mmdumpobj.2021.01.22a', 'nocb.2021.01.06a', 'rt.2021.01.04a', 'stall.2021.01.06a', 'torture.2021.01.12a' and 'tortureall.2021.01.06a' into HEAD doc.2021.01.06a: Documentation updates. fixes.2021.01.04b: Miscellaneous fixes. kfree_rcu.2021.01.04a: kfree_rcu() updates. mmdumpobj.2021.01.22a: Dump allocation point for memory blocks. nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths. rt.2021.01.04a: Real-time updates. stall.2021.01.06a: RCU CPU stall warning updates. torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API. tortureall.2021.01.06a: Torture-test script updates.
2021-01-06rcu/nocb: Code-style nits in callback-offloading togglingPaul E. McKenney1-23/+22
This commit addresses a few code-style nits in callback-offloading toggling, including one that predates this toggling. Cc: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Add nocb CB kthread list to show_rcu_nocb_state() outputPaul E. McKenney1-0/+1
This commit improves debuggability by indicating laying out the order in which rcuoc kthreads appear in the ->nocb_next_cb_rdp list. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Add grace period and task state to show_rcu_nocb_state() outputPaul E. McKenney1-7/+32
This commit improves debuggability by indicating which grace period each batch of nocb callbacks is waiting on and by showing the task state and last CPU for reach nocb kthread. [ paulmck: Handle !SMP CB offloading per kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Set SEGCBLIST_SOFTIRQ_ONLY at the very last stage of de-offloadingFrederic Weisbecker1-1/+8
This commit sets SEGCBLIST_SOFTIRQ_ONLY once toggling is otherwise fully complete, allowing further RCU callback manipulation to be carried out locklessly and locally. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Flush bypass before setting SEGCBLIST_SOFTIRQ_ONLYFrederic Weisbecker1-1/+10
This commit flushes the bypass queue and sets state to avoid its being refilled before switching to the final de-offloaded state. To avoid refilling, this commit sets SEGCBLIST_SOFTIRQ_ONLY before re-enabling IRQs. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Shutdown nocb timer on de-offloadingFrederic Weisbecker1-1/+11
This commit ensures that the nocb timer is shut down before reaching the final de-offloaded state. The key goal is to prevent the timer handler from manipulating the callbacks without the protection of the nocb locks. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Re-offload supportFrederic Weisbecker1-22/+136
To re-offload the callback processing off of a CPU, it is necessary to clear SEGCBLIST_SOFTIRQ_ONLY, set SEGCBLIST_OFFLOADED, and then notify both the CB and GP kthreads so that they both set their own bit flag and start processing the callbacks remotely. The re-offloading worker is then notified that it can stop the RCU_SOFTIRQ handler (or rcuc kthread, as the case may be) from processing the callbacks locally. Ordering must be carefully enforced so that the callbacks that used to be processed locally without locking will have the same ordering properties when they are invoked by the nocb CB and GP kthreads. This commit makes this change. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Export rcu_nocb_cpu_offload(). ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: De-offloading GP kthreadFrederic Weisbecker1-3/+51
To de-offload callback processing back onto a CPU, it is necessary to clear SEGCBLIST_OFFLOAD and notify the nocb GP kthread, which will then clear its own bit flag and ignore this CPU until further notice. Whichever of the nocb CB and nocb GP kthreads is last to clear its own bit notifies the de-offloading worker kthread. Once notified, this worker kthread can proceed safe in the knowledge that the nocb CB and GP kthreads will no longer be manipulating this CPU's RCU callback list. This commit makes this change. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>