From 9fd7fc34cfcaf9f6c932ee1710cce83da3b7bd59 Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Mon, 8 Dec 2014 09:33:26 +0100 Subject: locking/lglocks: Add documentation of current lglocks implementation Local/global locks are currently not documented anywhere other than in an somewhat out-of-date LWN article - this is an attempt to document the current state of lglocks. This patch is against linux-next 3.18.0-rc6 Signed-off-by: Nicholas Mc Guire Cc: Paul E. McKenney Cc: Carsten Emde Cc: Steven Rostedt Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20141208083326.GA29895@opentech.at Signed-off-by: Ingo Molnar --- Documentation/locking/lglock.txt | 166 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 166 insertions(+) create mode 100644 Documentation/locking/lglock.txt diff --git a/Documentation/locking/lglock.txt b/Documentation/locking/lglock.txt new file mode 100644 index 000000000000..a6971e34fabe --- /dev/null +++ b/Documentation/locking/lglock.txt @@ -0,0 +1,166 @@ +lglock - local/global locks for mostly local access patterns +------------------------------------------------------------ + +Origin: Nick Piggin's VFS scalability series introduced during + 2.6.35++ [1] [2] +Location: kernel/locking/lglock.c + include/linux/lglock.h +Users: currently only the VFS and stop_machine related code + +Design Goal: +------------ + +Improve scalability of globally used large data sets that are +distributed over all CPUs as per_cpu elements. + +To manage global data structures that are partitioned over all CPUs +as per_cpu elements but can be mostly handled by CPU local actions +lglock will be used where the majority of accesses are cpu local +reading and occasional cpu local writing with very infrequent +global write access. + + +* deal with things locally whenever possible + - very fast access to the local per_cpu data + - reasonably fast access to specific per_cpu data on a different + CPU +* while making global action possible when needed + - by expensive access to all CPUs locks - effectively + resulting in a globally visible critical section. + +Design: +------- + +Basically it is an array of per_cpu spinlocks with the +lg_local_lock/unlock accessing the local CPUs lock object and the +lg_local_lock_cpu/unlock_cpu accessing a remote CPUs lock object +the lg_local_lock has to disable preemption as migration protection so +that the reference to the local CPUs lock does not go out of scope. +Due to the lg_local_lock/unlock only touching cpu-local resources it +is fast. Taking the local lock on a different CPU will be more +expensive but still relatively cheap. + +One can relax the migration constraints by acquiring the current +CPUs lock with lg_local_lock_cpu, remember the cpu, and release that +lock at the end of the critical section even if migrated. This should +give most of the performance benefits without inhibiting migration +though needs careful considerations for nesting of lglocks and +consideration of deadlocks with lg_global_lock. + +The lg_global_lock/unlock locks all underlying spinlocks of all +possible CPUs (including those off-line). The preemption disable/enable +are needed in the non-RT kernels to prevent deadlocks like: + + on cpu 1 + + task A task B + lg_global_lock + got cpu 0 lock + <<<< preempt <<<< + lg_local_lock_cpu for cpu 0 + spin on cpu 0 lock + +On -RT this deadlock scenario is resolved by the arch_spin_locks in the +lglocks being replaced by rt_mutexes which resolve the above deadlock +by boosting the lock-holder. + + +Implementation: +--------------- + +The initial lglock implementation from Nick Piggin used some complex +macros to generate the lglock/brlock in lglock.h - they were later +turned into a set of functions by Andi Kleen [7]. The change to functions +was motivated by the presence of multiple lock users and also by them +being easier to maintain than the generating macros. This change to +functions is also the basis to eliminated the restriction of not +being initializeable in kernel modules (the remaining problem is that +locks are not explicitly initialized - see lockdep-design.txt) + +Declaration and initialization: +------------------------------- + + #include + + DEFINE_LGLOCK(name) + or: + DEFINE_STATIC_LGLOCK(name); + + lg_lock_init(&name, "lockdep_name_string"); + + on UP this is mapped to DEFINE_SPINLOCK(name) in both cases, note + also that as of 3.18-rc6 all declaration in use are of the _STATIC_ + variant (and it seems that the non-static was never in use). + lg_lock_init is initializing the lockdep map only. + +Usage: +------ + +From the locking semantics it is a spinlock. It could be called a +locality aware spinlock. lg_local_* behaves like a per_cpu +spinlock and lg_global_* like a global spinlock. +No surprises in the API. + + lg_local_lock(*lglock); + access to protected per_cpu object on this CPU + lg_local_unlock(*lglock); + + lg_local_lock_cpu(*lglock, cpu); + access to protected per_cpu object on other CPU cpu + lg_local_unlock_cpu(*lglock, cpu); + + lg_global_lock(*lglock); + access all protected per_cpu objects on all CPUs + lg_global_unlock(*lglock); + + There are no _trylock variants of the lglocks. + +Note that the lg_global_lock/unlock has to iterate over all possible +CPUs rather than the actually present CPUs or a CPU could go off-line +with a held lock [4] and that makes it very expensive. A discussion on +these issues can be found at [5] + +Constraints: +------------ + + * currently the declaration of lglocks in kernel modules is not + possible, though this should be doable with little change. + * lglocks are not recursive. + * suitable for code that can do most operations on the CPU local + data and will very rarely need the global lock + * lg_global_lock/unlock is *very* expensive and does not scale + * on UP systems all lg_* primitives are simply spinlocks + * in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but + does not change the tasks state while sleeping [6]. + * in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock + is downgraded to a migrate_disable/enable, the other + preempt_disable/enable are downgraded to barriers [6]. + The deadlock noted for non-RT above is resolved due to rt_mutexes + boosting the lock-holder in this case which arch_spin_locks do + not do. + +lglocks were designed for very specific problems in the VFS and probably +only are the right answer in these corner cases. Any new user that looks +at lglocks probably wants to look at the seqlock and RCU alternatives as +her first choice. There are also efforts to resolve the RCU issues that +currently prevent using RCU in place of view remaining lglocks. + +Note on brlock history: +----------------------- + +The 'Big Reader' read-write spinlocks were originally introduced by +Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They +later were introduced by the VFS scalability patch set in 2.6 series +again as the "big reader lock" brlock [2] variant of lglock which has +been replaced by seqlock primitives or by RCU based primitives in the +3.13 kernel series as was suggested in [3] in 2003. The brlock was +entirely removed in the 3.13 kernel series. + +Link: 1 http://lkml.org/lkml/2010/8/2/81 +Link: 2 http://lwn.net/Articles/401738/ +Link: 3 http://lkml.org/lkml/2003/3/9/205 +Link: 4 https://lkml.org/lkml/2011/8/24/185 +Link: 5 http://lkml.org/lkml/2011/12/18/189 +Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/ + patch series - lglocks-rt.patch.patch +Link: 7 http://lkml.org/lkml/2012/3/5/26 -- cgit v1.2.3 From 78bff1c8684fb94f1ae7283688f90188b53fc433 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 1 Dec 2014 22:34:17 +0100 Subject: x86/ticketlock: Fix spin_unlock_wait() livelock arch_spin_unlock_wait() looks very suboptimal, to the point I think this is just wrong and can lead to livelock: if the lock is heavily contended we can never see head == tail. But we do not need to wait for arch_spin_is_locked() == F. If it is locked we only need to wait until the current owner drops this lock. So we could simply spin until old_head != lock->tickets.head in this case, but .head can overflow and thus we can't check "unlocked" only once before the main loop. Also, the "unlocked" check can ignore TICKET_SLOWPATH_FLAG bit. Signed-off-by: Oleg Nesterov Acked-by: Linus Torvalds Cc: Jeremy Fitzhardinge Cc: Paul E.McKenney Cc: Peter Zijlstra Cc: Waiman Long Link: http://lkml.kernel.org/r/20141201213417.GA5842@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/spinlock.h | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index bf156ded74b5..abc34e95398d 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -184,8 +184,20 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock, static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) { - while (arch_spin_is_locked(lock)) + __ticket_t head = ACCESS_ONCE(lock->tickets.head); + + for (;;) { + struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets); + /* + * We need to check "unlocked" in a loop, tmp.head == head + * can be false positive because of overflow. + */ + if (tmp.head == (tmp.tail & ~TICKET_SLOWPATH_FLAG) || + tmp.head != head) + break; + cpu_relax(); + } } /* -- cgit v1.2.3