summaryrefslogtreecommitdiffstats
path: root/kernel/locking
AgeCommit message (Collapse)AuthorFilesLines
2021-08-25locking/rtmutex: Dequeue waiter on ww_mutex deadlockThomas Gleixner1-1/+6
The rt_mutex based ww_mutex variant queues the new waiter first in the lock's rbtree before evaluating the ww_mutex specific conditions which might decide that the waiter should back out. This check and conditional exit happens before the waiter is enqueued into the PI chain. The failure handling at the call site assumes that the waiter, if it is the top most waiter on the lock, is queued in the PI chain and then proceeds to adjust the unmodified PI chain, which results in RB tree corruption. Dequeue the waiter from the lock waiter list in the ww_mutex error exit path to prevent this. Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102454.042280541@linutronix.de
2021-08-25locking/rtmutex: Dont dereference waiter locklessThomas Gleixner2-2/+16
The new rt_mutex_spin_on_onwer() loop checks whether the spinning waiter is still the top waiter on the lock by utilizing rt_mutex_top_waiter(), which is broken because that function contains a sanity check which dereferences the top waiter pointer to check whether the waiter belongs to the lock. That's wrong in the lockless spinwait case: CPU 0 CPU 1 rt_mutex_lock(lock) rt_mutex_lock(lock); queue(waiter0) waiter0 == rt_mutex_top_waiter(lock) rt_mutex_spin_on_onwer(lock, waiter0) { queue(waiter1) waiter1 == rt_mutex_top_waiter(lock) ... top_waiter = rt_mutex_top_waiter(lock) leftmost = rb_first_cached(&lock->waiters); -> signal dequeue(waiter1) destroy(waiter1) w = rb_entry(leftmost, ....) BUG_ON(w->lock != lock) <- UAF The BUG_ON() is correct for the case where the caller holds lock->wait_lock which guarantees that the leftmost waiter entry cannot vanish. For the lockless spinwait case it's broken. Create a new helper function which avoids the pointer dereference and just compares the leftmost entry pointer with current's waiter pointer to validate that currrent is still elegible for spinning. Fixes: 992caf7f1724 ("locking/rtmutex: Add adaptive spinwait mechanism") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102453.981720644@linutronix.de
2021-08-20locking/semaphore: Add might_sleep() to down_*() familyXiaoming Ni1-0/+4
Semaphore is sleeping lock. Add might_sleep() to down*() family (with exception of down_trylock()) to detect atomic context sleep. Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210809021215.19991-1-nixiaoming@huawei.com
2021-08-20locking/ww_mutex: Initialize waiter.ww_ctx properlySebastian Andrzej Siewior1-1/+1
The consolidation of the debug code for mutex waiter intialization sets waiter::ww_ctx to a poison value unconditionally. For regular mutexes this is intended to catch the case where waiter_ww_ctx is dereferenced accidentally. For ww_mutex the poison value has to be overwritten either with a context pointer or NULL for ww_mutexes without context. The rework broke this as it made the store conditional on the context pointer instead of the argument which signals whether ww_mutex code should be compiled in or optiized out. As a result waiter::ww_ctx ends up with the poison pointer for contextless ww_mutexes which causes a later dereference of the poison pointer because it is != NULL. Use the build argument instead so for ww_mutex the poison value is always overwritten. Fixes: c0afb0ffc06e6 ("locking/ww_mutex: Gather mutex_waiter initialization") Reported-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210819193030.zpwrpvvrmy7xxxiy@linutronix.de
2021-08-17locking/spinlock/rt: Prepare for RT local_lockThomas Gleixner1-2/+5
Add the static and runtime initializer mechanics to support the RT variant of local_lock, which requires the lock type in the lockdep map to be set to LD_LOCK_PERCPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.967526724@linutronix.de
2021-08-17locking/rtmutex: Add adaptive spinwait mechanismSteven Rostedt1-2/+65
Going to sleep when locks are contended can be quite inefficient when the contention time is short and the lock owner is running on a different CPU. The MCS mechanism cannot be used because MCS is strictly FIFO ordered while for rtmutex based locks the waiter ordering is priority based. Provide a simple adaptive spinwait mechanism which currently restricts the spinning to the top priority waiter. [ tglx: Provide a contemporary changelog, extended it to all rtmutex based locks and updated it to match the other spin on owner implementations ] Originally-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.912050691@linutronix.de
2021-08-17locking/rtmutex: Implement equal priority lock stealingGregory Haskins1-17/+35
The current logic only allows lock stealing to occur if the current task is of higher priority than the pending owner. Significant throughput improvements can be gained by allowing the lock stealing to include tasks of equal priority when the contended lock is a spin_lock or a rw_lock and the tasks are not in a RT scheduling task. The assumption was that the system will make faster progress by allowing the task already on the CPU to take the lock rather than waiting for the system to wake up a different task. This does add a degree of unfairness, but in reality no negative side effects have been observed in the many years that this has been used in the RT kernel. [ tglx: Refactored and rewritten several times by Steve Rostedt, Sebastian Siewior and myself ] Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.857240222@linutronix.de
2021-08-17locking/rtmutex: Prevent lockdep false positive with PI futexesThomas Gleixner1-0/+12
On PREEMPT_RT the futex hashbucket spinlock becomes 'sleeping' and rtmutex based. That causes a lockdep false positive because some of the futex functions invoke spin_unlock(&hb->lock) with the wait_lock of the rtmutex associated to the pi_futex held. spin_unlock() in turn takes wait_lock of the rtmutex on which the spinlock is based which makes lockdep notice a lock recursion. Give the futex/rtmutex wait_lock a separate key. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.750701219@linutronix.de
2021-08-17locking/rtmutex: Add mutex variant for RTThomas Gleixner2-1/+125
Add the necessary defines, helpers and API functions for replacing struct mutex on a PREEMPT_RT enabled kernel with an rtmutex based variant. No functional change when CONFIG_PREEMPT_RT=n Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.081517417@linutronix.de
2021-08-17locking/ww_mutex: Implement rtmutex based ww_mutex API functionsPeter Zijlstra2-1/+77
Add the actual ww_mutex API functions which replace the mutex based variant on RT enabled kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.024057938@linutronix.de
2021-08-17locking/rtmutex: Extend the rtmutex core to support ww_mutexPeter Zijlstra4-14/+115
Add a ww acquire context pointer to the waiter and various functions and add the ww_mutex related invocations to the proper spots in the locking code, similar to the mutex based variant. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.966139174@linutronix.de
2021-08-17locking/ww_mutex: Add rt_mutex based lock type and accessorsPeter Zijlstra1-3/+3
Provide the defines for RT mutex based ww_mutexes and fix up the debug logic so it's either enabled by DEBUG_MUTEXES or DEBUG_RT_MUTEXES on RT kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.908012566@linutronix.de
2021-08-17locking/ww_mutex: Add RT priority to W/W orderPeter Zijlstra1-15/+49
RT mutex based ww_mutexes cannot order based on timestamps. They have to order based on priority. Add the necessary decision logic. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.847536630@linutronix.de
2021-08-17locking/ww_mutex: Implement rt_mutex accessorsPeter Zijlstra1-0/+80
Provide the type defines and the helper inlines for rtmutex based ww_mutexes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.790760545@linutronix.de
2021-08-17locking/ww_mutex: Abstract out internal lock accessesThomas Gleixner1-4/+19
Accessing the internal wait_lock of mutex and rtmutex is slightly different. Provide helper functions for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.734635961@linutronix.de
2021-08-17locking/ww_mutex: Abstract out mutex typesPeter Zijlstra1-10/+13
Some ww_mutex helper functions use pointers for the underlying mutex and mutex_waiter. The upcoming rtmutex based implementation needs to share these functions. Add and use defines for the types and replace the direct types in the affected functions. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.678720245@linutronix.de
2021-08-17locking/ww_mutex: Abstract out mutex accessorsPeter Zijlstra1-2/+14
Move the mutex related access from various ww_mutex functions into helper functions so they can be substituted for rtmutex based ww_mutex later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.622477030@linutronix.de
2021-08-17locking/ww_mutex: Abstract out waiter enqueueingPeter Zijlstra1-6/+13
The upcoming rtmutex based ww_mutex needs a different handling for enqueueing a waiter. Split it out into a helper function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.566318143@linutronix.de
2021-08-17locking/ww_mutex: Abstract out the waiter iterationPeter Zijlstra1-4/+53
Split out the waiter iteration functions so they can be substituted for a rtmutex based ww_mutex later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.509186185@linutronix.de
2021-08-17locking/ww_mutex: Remove the __sched annotation from ww_mutex APIsPeter Zijlstra1-6/+6
None of these functions will be on the stack when blocking in schedule(), hence __sched is not needed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.453235952@linutronix.de
2021-08-17locking/ww_mutex: Split out the W/W implementation logic into ↵Peter Zijlstra (Intel)2-371/+370
kernel/locking/ww_mutex.h Split the W/W mutex helper functions out into a separate header file, so they can be shared with a rtmutex based variant later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.396893399@linutronix.de
2021-08-17locking/ww_mutex: Split up ww_mutex_unlock()Peter Zijlstra (Intel)1-11/+15
Split the ww related part out into a helper function so it can be reused for a rtmutex based ww_mutex implementation. [ mingo: Fixed bisection failure. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.340166556@linutronix.de
2021-08-17locking/ww_mutex: Gather mutex_waiter initializationPeter Zijlstra2-9/+4
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.281927514@linutronix.de
2021-08-17locking/ww_mutex: Simplify lockdep annotationsPeter Zijlstra1-9/+10
No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.222921634@linutronix.de
2021-08-17locking/mutex: Make mutex::wait_lock rawThomas Gleixner1-11/+11
The wait_lock of mutex is really a low level lock. Convert it to a raw_spinlock like the wait_lock of rtmutex. [ mingo: backmerged the test_lockup.c build fix by bigeasy. ] Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.166863404@linutronix.de
2021-08-17locking/mutex: Move the 'struct mutex_waiter' definition from ↵Thomas Gleixner1-0/+13
<linux/mutex.h> to the internal header Move the mutex waiter declaration from the public <linux/mutex.h> header to the internal kernel/locking/mutex.h header. There is no reason to expose it outside of the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.054325923@linutronix.de
2021-08-17locking/mutex: Consolidate core headers, remove kernel/locking/mutex-debug.hThomas Gleixner4-48/+26
Having two header files which contain just the non-debug and debug variants is mostly waste of disc space and has no real value. Stick the debug variants into the common mutex.h file as counterpart to the stubs for the non-debug case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.995350521@linutronix.de
2021-08-17locking/rtmutex: Squash !RT tasks to DEFAULT_PRIOPeter Zijlstra1-5/+20
Ensure all !RT tasks have the same prio such that they end up in FIFO order and aren't split up according to nice level. The reason why nice levels were taken into account so far is historical. In the early days of the rtmutex code it was done to give the PI boosting and deboosting a larger coverage. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.938676930@linutronix.de
2021-08-17locking/rwlock: Provide RT variantThomas Gleixner3-0/+143
Similar to rw_semaphores, on RT the rwlock substitution is not writer fair, because it's not feasible to have a writer inherit its priority to multiple readers. Readers blocked on a writer follow the normal rules of priority inheritance. Like RT spinlocks, RT rwlocks are state preserving across the slow lock operations (contended case). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.882793524@linutronix.de
2021-08-17locking/spinlock: Provide RT variantThomas Gleixner2-0/+130
Provide the actual locking functions which make use of the general and spinlock specific rtmutex code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.826621464@linutronix.de
2021-08-17locking/rtmutex: Provide the spin/rwlock core lock functionThomas Gleixner2-1/+61
A simplified version of the rtmutex slowlock function, which neither handles signals nor timeouts, and is careful about preserving the state of the blocked task across the lock operation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.770228446@linutronix.de
2021-08-17locking/rtmutex: Guard regular sleeping locks specific functionsThomas Gleixner3-123/+133
Guard the regular sleeping lock specific functionality, which is used for rtmutex on non-RT enabled kernels and for mutex, rtmutex and semaphores on RT enabled kernels so the code can be reused for the RT specific implementation of spinlocks and rwlocks in a different compilation unit. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.311535693@linutronix.de
2021-08-17locking/rtmutex: Prepare RT rt_mutex_wake_q for RT locksThomas Gleixner2-3/+20
Add an rtlock_task pointer to rt_mutex_wake_q, which allows to handle the RT specific wakeup for spin/rwlock waiters. The pointer is just consuming 4/8 bytes on the stack so it is provided unconditionaly to avoid #ifdeffery all over the place. This cannot use a regular wake_q, because a task can have concurrent wakeups which would make it miss either lock or the regular wakeups, depending on what gets queued first, unless task struct gains a separate wake_q_node for this, which would be overkill, because there can only be a single task which gets woken up in the spin/rw_lock unlock path. No functional change for non-RT enabled kernels. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.253614678@linutronix.de
2021-08-17locking/rtmutex: Use rt_mutex_wake_q_headThomas Gleixner3-19/+16
Prepare for the required state aware handling of waiter wakeups via wake_q and switch the rtmutex code over to the rtmutex specific wrapper. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.197113263@linutronix.de
2021-08-17locking/rtmutex: Provide rt_wake_q_head and helpersThomas Gleixner2-0/+29
To handle the difference between wakeups for regular sleeping locks (mutex, rtmutex, rw_semaphore) and the wakeups for 'sleeping' spin/rwlocks on PREEMPT_RT enabled kernels correctly, it is required to provide a wake_q_head construct which allows to keep them separate. Provide a wrapper around wake_q_head and the required helpers, which will be extended with the state handling later. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.139337655@linutronix.de
2021-08-17locking/rtmutex: Add wake_state to rt_mutex_waiterThomas Gleixner2-1/+10
Regular sleeping locks like mutexes, rtmutexes and rw_semaphores are always entering and leaving a blocking section with task state == TASK_RUNNING. On a non-RT kernel spinlocks and rwlocks never affect the task state, but on RT kernels these locks are converted to rtmutex based 'sleeping' locks. So in case of contention the task goes to block, which requires to carefully preserve the task state, and restore it after acquiring the lock taking regular wakeups for the task into account, which happened while the task was blocked. This state preserving is achieved by having a separate task state for blocking on a RT spin/rwlock and a saved_state field in task_struct along with careful handling of these wakeup scenarios in try_to_wake_up(). To avoid conditionals in the rtmutex code, store the wake state which has to be used for waking a lock waiter in rt_mutex_waiter which allows to handle the regular and RT spin/rwlocks by handing it to wake_up_state(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.079800739@linutronix.de
2021-08-17locking/rwsem: Add rtmutex based R/W semaphore implementationThomas Gleixner1-0/+108
The RT specific R/W semaphore implementation used to restrict the number of readers to one, because a writer cannot block on multiple readers and inherit its priority or budget. The single reader restricting was painful in various ways: - Performance bottleneck for multi-threaded applications in the page fault path (mmap sem) - Progress blocker for drivers which are carefully crafted to avoid the potential reader/writer deadlock in mainline. The analysis of the writer code paths shows that properly written RT tasks should not take them. Syscalls like mmap(), file access which take mmap sem write locked have unbound latencies, which are completely unrelated to mmap sem. Other R/W sem users like graphics drivers are not suitable for RT tasks either. So there is little risk to hurt RT tasks when the RT rwsem implementation is done in the following way: - Allow concurrent readers - Make writers block until the last reader left the critical section. This blocking is not subject to priority/budget inheritance. - Readers blocked on a writer inherit their priority/budget in the normal way. There is a drawback with this scheme: R/W semaphores become writer unfair though the applications which have triggered writer starvation (mostly on mmap_sem) in the past are not really the typical workloads running on a RT system. So while it's unlikely to hit writer starvation, it's possible. If there are unexpected workloads on RT systems triggering it, the problem has to be revisited. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.016885947@linutronix.de
2021-08-17locking/rt: Add base code for RT rw_semaphore and rwlockThomas Gleixner1-0/+263
On PREEMPT_RT, rw_semaphores and rwlocks are substituted with an rtmutex and a reader count. The implementation is writer unfair, as it is not feasible to do priority inheritance on multiple readers, but experience has shown that real-time workloads are not the typical workloads which are sensitive to writer starvation. The inner workings of rw_semaphores and rwlocks on RT are almost identical except for the task state and signal handling. rw_semaphores are not state preserving over a contention, they are expected to enter and leave with state == TASK_RUNNING. rwlocks have a mechanism to preserve the state of the task at entry and restore it after unblocking taking potential non-lock related wakeups into account. rw_semaphores can also be subject to signal handling interrupting a blocked state, while rwlocks ignore signals. To avoid code duplication, provide a shared implementation which takes the small difference vs. state and signals into account. The code is included into the relevant rw_semaphore/rwlock base code and compiled for each use case separately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.957920571@linutronix.de
2021-08-17locking/rtmutex: Provide rt_mutex_slowlock_locked()Thomas Gleixner2-43/+59
Split the inner workings of rt_mutex_slowlock() out into a separate function, which can be reused by the upcoming RT lock substitutions, e.g. for rw_semaphores. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.841971086@linutronix.de
2021-08-17locking/rtmutex: Split out the inner parts of 'struct rtmutex'Peter Zijlstra3-68/+75
RT builds substitutions for rwsem, mutex, spinlock and rwlock around rtmutexes. Split the inner working out so each lock substitution can use them with the appropriate lockdep annotations. This avoids having an extra unused lockdep map in the wrapped rtmutex. No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
2021-08-17locking/rtmutex: Split API from implementationThomas Gleixner4-498/+514
Prepare for reusing the inner functions of rtmutex for RT lock substitutions: introduce kernel/locking/rtmutex_api.c and move them there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.726560996@linutronix.de
2021-08-17locking/rtmutex: Switch to from cmpxchg_*() to try_cmpxchg_*()Thomas Gleixner1-2/+2
Allows the compiler to generate better code depending on the architecture. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.668958502@linutronix.de
2021-08-17locking/rtmutex: Convert macros to inlinesSebastian Andrzej Siewior1-4/+27
Inlines are type-safe... Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.610830960@linutronix.de
2021-08-17locking/rtmutex: Set proper wait context for lockdepThomas Gleixner1-1/+1
RT mutexes belong to the LD_WAIT_SLEEP class. Make them so. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.031014562@linutronix.de
2021-08-17Merge tag 'v5.14-rc6' into locking/core, to pick up fixesIngo Molnar8-36/+48
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-08-10locking/rtmutex: Use the correct rtmutex debugging config optionZhen Lei1-1/+1
It's CONFIG_DEBUG_RT_MUTEXES not CONFIG_DEBUG_RT_MUTEX. Fixes: f7efc4799f81 ("locking/rtmutex: Inline chainwalk depth check") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Boqun Feng <boqun.feng@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210731123011.4555-1-thunder.leizhen@huawei.com
2021-07-27locktorture: Count lock readersPaul E. McKenney1-5/+4
Currently, the lock_is_read_held variable is bool, so that a reader sets it to true just after lock acquisition and then to false just before lock release. This works in a rough statistical sense, but can result in false negatives just after one of a pair of concurrent readers has released the lock. This approach does have low overhead, but at the expense of the setting to true potentially never leaving the reader's store buffer, thus resulting in an unconditional false negative. This commit therefore converts this variable to atomic_t and makes the reader use atomic_inc() just after acquisition and atomic_dec() just before release. This does increase overhead, but this increase is negligible compared to the 10-microsecond lock hold time. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-07-27locktorture: Mark statistics data racesPaul E. McKenney1-7/+9
The lock_stress_stats structure's ->n_lock_fail and ->n_lock_acquired fields are incremented and sampled locklessly using plain C-language statements, which KCSAN objects to. This commit therefore marks the statistics gathering with data_race() to flag the intent. While in the area, this commit also reduces the number of accesses to the ->n_lock_acquired field, thus eliminating some possible check/use confusion. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-07-16locking/rwsem: Remove an unused parameter of rwsem_wake()xuyehan1-3/+3
The 2nd parameter 'count' is not used in this function. The places where the function is called are also modified. Signed-off-by: xuyehan <xuyehan@xiaomi.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/1625547043-28103-1-git-send-email-yehanxu1@gmail.com
2021-07-11Merge tag 'locking-urgent-2021-07-11' of ↵Linus Torvalds1-12/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Fix a Sparc crash - Fix a number of objtool warnings - Fix /proc/lockdep output on certain configs - Restore a kprobes fail-safe * tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: sparc: Fix arch_cmpxchg64_local() kprobe/static_call: Restore missing static_call_text_reserved() static_call: Fix static_call_text_reserved() vs __init jump_label: Fix jump_label_text_reserved() vs __init locking/lockdep: Fix meaningless /proc/lockdep output of lock classes on !CONFIG_PROVE_LOCKING