From 59aabfc7e959f5f213e4e5cc7567ab4934da2adf Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Thu, 30 Apr 2015 17:12:16 -0400 Subject: locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write() In up_write()/up_read(), rwsem_wake() will be called whenever it detects that some writers/readers are waiting. The rwsem_wake() function will take the wait_lock and call __rwsem_do_wake() to do the real wakeup. For a heavily contended rwsem, doing a spin_lock() on wait_lock will cause further contention on the heavily contended rwsem cacheline resulting in delay in the completion of the up_read/up_write operations. This patch makes the wait_lock taking and the call to __rwsem_do_wake() optional if at least one spinning writer is present. The spinning writer will be able to take the rwsem and call rwsem_wake() later when it calls up_write(). With the presence of a spinning writer, rwsem_wake() will now try to acquire the lock using trylock. If that fails, it will just quit. Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Waiman Long Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Davidlohr Bueso Acked-by: Jason Low Cc: Andrew Morton Cc: Borislav Petkov Cc: Douglas Hatch Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Scott J Norton Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar --- include/linux/osq_lock.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h index 3a6490e81b28..703ea5c30a33 100644 --- a/include/linux/osq_lock.h +++ b/include/linux/osq_lock.h @@ -32,4 +32,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock) extern bool osq_lock(struct optimistic_spin_queue *lock); extern void osq_unlock(struct optimistic_spin_queue *lock); +static inline bool osq_is_locked(struct optimistic_spin_queue *lock) +{ + return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL; +} + #endif -- cgit v1.2.3 From 663fdcbee0a656cdaef934e7f50e6c2670373bc9 Mon Sep 17 00:00:00 2001 From: Preeti U Murthy Date: Thu, 30 Apr 2015 17:27:21 +0530 Subject: kernel: Replace reference to ASSIGN_ONCE() with WRITE_ONCE() in comment Looks like commit : 43239cbe79fc ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)") left behind a reference to ASSIGN_ONCE(). Update this to WRITE_ONCE(). Signed-off-by: Preeti U Murthy Signed-off-by: Peter Zijlstra (Intel) Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Thomas Gleixner Cc: borntraeger@de.ibm.com Cc: dave@stgolabs.net Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20150430115721.22278.94082.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar --- include/linux/compiler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 867722591be2..a7c0941d10da 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -450,7 +450,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s * with an explicit memory barrier or atomic instruction that provides the * required ordering. * - * If possible use READ_ONCE/ASSIGN_ONCE instead. + * If possible use READ_ONCE()/WRITE_ONCE() instead. 
*/ #define __ACCESS_ONCE(x) ({ \ __maybe_unused typeof(x) __var = (__force typeof(x)) 0; \ -- cgit v1.2.3 From a33fda35e3a7655fb7df756ed67822afb5ed5e8d Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Fri, 24 Apr 2015 14:56:30 -0400 Subject: locking/qspinlock: Introduce a simple generic 4-byte queued spinlock This patch introduces a new generic queued spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queued spinlock should be almost as fair as the ticket spinlock. It has about the same speed in single-thread and it can be much faster in high contention situations especially when the spinlock is embedded within the data structure to be protected. Only in light to moderate contention where the average queue depth is around 1-3 will this queued spinlock be potentially a bit slower due to the higher slowpath overhead. This queued spinlock is especially suit to NUMA machines with a large number of cores as the chance of spinlock contention is much higher in those machines. The cost of contention is also higher because of slower inter-node memory traffic. Due to the fact that spinlocks are acquired with preemption disabled, the process will not be migrated to another CPU while it is trying to get a spinlock. Ignoring interrupt handling, a CPU can only be contending in one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting activities. By allocating a set of per-cpu queue nodes and used them to form a waiting queue, we can encode the queue node address into a much smaller 24-bit size (including CPU number and queue node index) leaving one byte for the lock. Please note that the queue node is only needed when waiting for the lock. Once the lock is acquired, the queue node can be released to be used later. Signed-off-by: Waiman Long Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Daniel J Blueman Cc: David Vrabel Cc: Douglas Hatch Cc: H. Peter Anvin Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Paolo Bonzini Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Rik van Riel Cc: Scott J Norton Cc: Thomas Gleixner Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar --- include/asm-generic/qspinlock.h | 132 +++++++++++++++++++++ include/asm-generic/qspinlock_types.h | 58 ++++++++++ kernel/Kconfig.locks | 7 ++ kernel/locking/Makefile | 1 + kernel/locking/mcs_spinlock.h | 1 + kernel/locking/qspinlock.c | 209 ++++++++++++++++++++++++++++++++++ 6 files changed, 408 insertions(+) create mode 100644 include/asm-generic/qspinlock.h create mode 100644 include/asm-generic/qspinlock_types.h create mode 100644 kernel/locking/qspinlock.c (limited to 'include') diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h new file mode 100644 index 000000000000..569abcd47a9a --- /dev/null +++ b/include/asm-generic/qspinlock.h @@ -0,0 +1,132 @@ +/* + * Queued spinlock + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * + * Authors: Waiman Long + */ +#ifndef __ASM_GENERIC_QSPINLOCK_H +#define __ASM_GENERIC_QSPINLOCK_H + +#include + +/** + * queued_spin_is_locked - is the spinlock locked? + * @lock: Pointer to queued spinlock structure + * Return: 1 if it is locked, 0 otherwise + */ +static __always_inline int queued_spin_is_locked(struct qspinlock *lock) +{ + return atomic_read(&lock->val); +} + +/** + * queued_spin_value_unlocked - is the spinlock structure unlocked? + * @lock: queued spinlock structure + * Return: 1 if it is unlocked, 0 otherwise + * + * N.B. Whenever there are tasks waiting for the lock, it is considered + * locked wrt the lockref code to avoid lock stealing by the lockref + * code and change things underneath the lock. This also allows some + * optimizations to be applied without conflict with lockref. + */ +static __always_inline int queued_spin_value_unlocked(struct qspinlock lock) +{ + return !atomic_read(&lock.val); +} + +/** + * queued_spin_is_contended - check if the lock is contended + * @lock : Pointer to queued spinlock structure + * Return: 1 if lock contended, 0 otherwise + */ +static __always_inline int queued_spin_is_contended(struct qspinlock *lock) +{ + return atomic_read(&lock->val) & ~_Q_LOCKED_MASK; +} +/** + * queued_spin_trylock - try to acquire the queued spinlock + * @lock : Pointer to queued spinlock structure + * Return: 1 if lock acquired, 0 if failed + */ +static __always_inline int queued_spin_trylock(struct qspinlock *lock) +{ + if (!atomic_read(&lock->val) && + (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0)) + return 1; + return 0; +} + +extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); + +/** + * queued_spin_lock - acquire a queued spinlock + * @lock: Pointer to queued spinlock structure + */ +static __always_inline void queued_spin_lock(struct qspinlock *lock) +{ + u32 val; + + val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL); + if (likely(val == 0)) + return; + queued_spin_lock_slowpath(lock, val); +} + +#ifndef queued_spin_unlock +/** + * queued_spin_unlock - release a queued spinlock + * @lock : Pointer to queued spinlock structure + */ +static __always_inline void queued_spin_unlock(struct qspinlock *lock) +{ + /* + * smp_mb__before_atomic() in order to guarantee release semantics + */ + smp_mb__before_atomic_dec(); + atomic_sub(_Q_LOCKED_VAL, &lock->val); +} +#endif + +/** + * queued_spin_unlock_wait - wait until current lock holder releases the lock + * @lock : Pointer to queued spinlock structure + * + * There is a very slight possibility of live-lock if the lockers keep coming + * and the waiter is just unfortunate enough to not see any unlock state. + */ +static inline void queued_spin_unlock_wait(struct qspinlock *lock) +{ + while (atomic_read(&lock->val) & _Q_LOCKED_MASK) + cpu_relax(); +} + +/* + * Initializier + */ +#define __ARCH_SPIN_LOCK_UNLOCKED { ATOMIC_INIT(0) } + +/* + * Remapping spinlock architecture specific functions to the corresponding + * queued spinlock functions. 
+ */ +#define arch_spin_is_locked(l) queued_spin_is_locked(l) +#define arch_spin_is_contended(l) queued_spin_is_contended(l) +#define arch_spin_value_unlocked(l) queued_spin_value_unlocked(l) +#define arch_spin_lock(l) queued_spin_lock(l) +#define arch_spin_trylock(l) queued_spin_trylock(l) +#define arch_spin_unlock(l) queued_spin_unlock(l) +#define arch_spin_lock_flags(l, f) queued_spin_lock(l) +#define arch_spin_unlock_wait(l) queued_spin_unlock_wait(l) + +#endif /* __ASM_GENERIC_QSPINLOCK_H */ diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h new file mode 100644 index 000000000000..aec05c7ad2f6 --- /dev/null +++ b/include/asm-generic/qspinlock_types.h @@ -0,0 +1,58 @@ +/* + * Queued spinlock + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * + * Authors: Waiman Long + */ +#ifndef __ASM_GENERIC_QSPINLOCK_TYPES_H +#define __ASM_GENERIC_QSPINLOCK_TYPES_H + +/* + * Including atomic.h with PARAVIRT on will cause compilation errors because + * of recursive header file incluson via paravirt_types.h. So don't include + * it if PARAVIRT is on. + */ +#ifndef CONFIG_PARAVIRT +#include +#include +#endif + +typedef struct qspinlock { + atomic_t val; +} arch_spinlock_t; + +/* + * Bitfields in the atomic value: + * + * 0- 7: locked byte + * 8- 9: tail index + * 10-31: tail cpu (+1) + */ +#define _Q_SET_MASK(type) (((1U << _Q_ ## type ## _BITS) - 1)\ + << _Q_ ## type ## _OFFSET) +#define _Q_LOCKED_OFFSET 0 +#define _Q_LOCKED_BITS 8 +#define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED) + +#define _Q_TAIL_IDX_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS) +#define _Q_TAIL_IDX_BITS 2 +#define _Q_TAIL_IDX_MASK _Q_SET_MASK(TAIL_IDX) + +#define _Q_TAIL_CPU_OFFSET (_Q_TAIL_IDX_OFFSET + _Q_TAIL_IDX_BITS) +#define _Q_TAIL_CPU_BITS (32 - _Q_TAIL_CPU_OFFSET) +#define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) + +#define _Q_LOCKED_VAL (1U << _Q_LOCKED_OFFSET) + +#endif /* __ASM_GENERIC_QSPINLOCK_TYPES_H */ diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 08561f1acd13..95fdad866a98 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -235,6 +235,13 @@ config LOCK_SPIN_ON_OWNER def_bool y depends on MUTEX_SPIN_ON_OWNER || RWSEM_SPIN_ON_OWNER +config ARCH_USE_QUEUED_SPINLOCK + bool + +config QUEUED_SPINLOCK + def_bool y if ARCH_USE_QUEUED_SPINLOCK + depends on SMP && !PARAVIRT_SPINLOCKS + config ARCH_USE_QUEUE_RWLOCK bool diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index de7a416cca2a..abfcef3c1ef9 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -17,6 +17,7 @@ obj-$(CONFIG_SMP) += spinlock.o obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_SMP) += lglock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o +obj-$(CONFIG_QUEUED_SPINLOCK) += qspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex.o obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 75e114bdf3f2..fd91aaa4554c 
100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -17,6 +17,7 @@ struct mcs_spinlock { struct mcs_spinlock *next; int locked; /* 1 if lock acquired */ + int count; /* nesting count, see qspinlock.c */ }; #ifndef arch_mcs_spin_lock_contended diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c new file mode 100644 index 000000000000..029b51ce10ea --- /dev/null +++ b/kernel/locking/qspinlock.c @@ -0,0 +1,209 @@ +/* + * Queued spinlock + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. + * (C) Copyright 2013-2014 Red Hat, Inc. + * (C) Copyright 2015 Intel Corp. + * + * Authors: Waiman Long + * Peter Zijlstra + */ +#include +#include +#include +#include +#include +#include +#include + +/* + * The basic principle of a queue-based spinlock can best be understood + * by studying a classic queue-based spinlock implementation called the + * MCS lock. The paper below provides a good description for this kind + * of lock. + * + * http://www.cise.ufl.edu/tr/DOC/REP-1992-71.pdf + * + * This queued spinlock implementation is based on the MCS lock, however to make + * it fit the 4 bytes we assume spinlock_t to be, and preserve its existing + * API, we must modify it somehow. + * + * In particular; where the traditional MCS lock consists of a tail pointer + * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to + * unlock the next pending (next->locked), we compress both these: {tail, + * next->locked} into a single u32 value. + * + * Since a spinlock disables recursion of its own context and there is a limit + * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there + * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now + * we can encode the tail by combining the 2-bit nesting level with the cpu + * number. With one byte for the lock value and 3 bytes for the tail, only a + * 32-bit word is now needed. Even though we only need 1 bit for the lock, + * we extend it to a full byte to achieve better performance for architectures + * that support atomic byte write. + * + * We also change the first spinner to spin on the lock bit instead of its + * node; whereby avoiding the need to carry a node from lock to unlock, and + * preserving existing lock API. This also makes the unlock code simpler and + * faster. + */ + +#include "mcs_spinlock.h" + +/* + * Per-CPU queue node structures; we can never have more than 4 nested + * contexts: task, softirq, hardirq, nmi. + * + * Exactly fits one 64-byte cacheline on a 64-bit architecture. + */ +static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]); + +/* + * We must be able to distinguish between no-tail and the tail at 0:0, + * therefore increment the cpu number by one. 
+ */ + +static inline u32 encode_tail(int cpu, int idx) +{ + u32 tail; + +#ifdef CONFIG_DEBUG_SPINLOCK + BUG_ON(idx > 3); +#endif + tail = (cpu + 1) << _Q_TAIL_CPU_OFFSET; + tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ + + return tail; +} + +static inline struct mcs_spinlock *decode_tail(u32 tail) +{ + int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1; + int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET; + + return per_cpu_ptr(&mcs_nodes[idx], cpu); +} + +/** + * queued_spin_lock_slowpath - acquire the queued spinlock + * @lock: Pointer to queued spinlock structure + * @val: Current value of the queued spinlock 32-bit word + * + * (queue tail, lock value) + * + * fast : slow : unlock + * : : + * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) + * : | ^--------. / : + * : v \ | : + * uncontended : (n,x) --+--> (n,0) | : + * queue : | ^--' | : + * : v | : + * contended : (*,x) --+--> (*,0) -----> (*,1) ---' : + * queue : ^--' : + * + */ +void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +{ + struct mcs_spinlock *prev, *next, *node; + u32 new, old, tail; + int idx; + + BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + + node = this_cpu_ptr(&mcs_nodes[0]); + idx = node->count++; + tail = encode_tail(smp_processor_id(), idx); + + node += idx; + node->locked = 0; + node->next = NULL; + + /* + * trylock || xchg(lock, node) + * + * 0,0 -> 0,1 ; no tail, not locked -> no tail, locked. + * p,x -> n,x ; tail was p -> tail is n; preserving locked. + */ + for (;;) { + new = _Q_LOCKED_VAL; + if (val) + new = tail | (val & _Q_LOCKED_MASK); + + old = atomic_cmpxchg(&lock->val, val, new); + if (old == val) + break; + + val = old; + } + + /* + * we won the trylock; forget about queueing. + */ + if (new == _Q_LOCKED_VAL) + goto release; + + /* + * if there was a previous node; link it and wait until reaching the + * head of the waitqueue. + */ + if (old & ~_Q_LOCKED_MASK) { + prev = decode_tail(old); + WRITE_ONCE(prev->next, node); + + arch_mcs_spin_lock_contended(&node->locked); + } + + /* + * we're at the head of the waitqueue, wait for the owner to go away. + * + * *,x -> *,0 + */ + while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK) + cpu_relax(); + + /* + * claim the lock: + * + * n,0 -> 0,1 : lock, uncontended + * *,0 -> *,1 : lock, contended + */ + for (;;) { + new = _Q_LOCKED_VAL; + if (val != tail) + new |= val; + + old = atomic_cmpxchg(&lock->val, val, new); + if (old == val) + break; + + val = old; + } + + /* + * contended path; wait for next, release. + */ + if (new != _Q_LOCKED_VAL) { + while (!(next = READ_ONCE(node->next))) + cpu_relax(); + + arch_mcs_spin_unlock_contended(&next->locked); + } + +release: + /* + * release the node + */ + this_cpu_dec(mcs_nodes[0].count); +} +EXPORT_SYMBOL(queued_spin_lock_slowpath); -- cgit v1.2.3 From c1fb159db9f2e50e0f4025bed92a67a6a7bfa7b7 Mon Sep 17 00:00:00 2001 From: "Peter Zijlstra (Intel)" Date: Fri, 24 Apr 2015 14:56:32 -0400 Subject: locking/qspinlock: Add pending bit Because the qspinlock needs to touch a second cacheline (the per-cpu mcs_nodes[]); add a pending bit and allow a single in-word spinner before we punt to the second cacheline. It is possible so observe the pending bit without the locked bit when the last owner has just released but the pending owner has not yet taken ownership. In this case we would normally queue -- because the pending bit is already taken. However, in this case the pending bit is guaranteed to be released 'soon', therefore wait for it and avoid queueing. 
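As an illustration of the fast path described above, here is a simplified user-space sketch using C11 atomics on a bare 32-bit word — not the kernel code added by this patch. The constants mirror _Q_LOCKED_VAL and _Q_PENDING_VAL; where the sketch bails out to the queue the kernel instead retries its cmpxchg loop, and the pending->locked hand-over wait from the last paragraph is omitted.

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdint.h>

	#define LOCKED   (1u << 0)
	#define PENDING  (1u << 8)

	/* trylock, else become the single in-word (pending) spinner */
	static bool pending_fastpath(_Atomic uint32_t *lock)
	{
		uint32_t val = 0;

		/* 0,0,0 -> 0,0,1 : uncontended trylock */
		if (atomic_compare_exchange_strong(lock, &val, LOCKED))
			return true;

		/* a tail or the pending bit is already set: go queue */
		if (val & ~LOCKED)
			return false;

		/* 0,0,1 -> 0,1,1 : claim the pending bit */
		if (!atomic_compare_exchange_strong(lock, &val, val | PENDING))
			return false;		/* lost the race; let the caller queue */

		/* *,1,1 -> *,1,0 : wait for the owner to release the locked byte */
		while (atomic_load(lock) & LOCKED)
			;			/* cpu_relax() in the kernel */

		/* *,1,0 -> *,0,1 : take ownership and clear the pending bit */
		atomic_fetch_add(lock, LOCKED - PENDING);
		return true;
	}
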
Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Waiman Long Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Daniel J Blueman Cc: David Vrabel Cc: Douglas Hatch Cc: H. Peter Anvin Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Paolo Bonzini Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Rik van Riel Cc: Scott J Norton Cc: Thomas Gleixner Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar --- include/asm-generic/qspinlock_types.h | 12 +++- kernel/locking/qspinlock.c | 119 ++++++++++++++++++++++++++++------ 2 files changed, 107 insertions(+), 24 deletions(-) (limited to 'include') diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index aec05c7ad2f6..7ee6632cb818 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -36,8 +36,9 @@ typedef struct qspinlock { * Bitfields in the atomic value: * * 0- 7: locked byte - * 8- 9: tail index - * 10-31: tail cpu (+1) + * 8: pending + * 9-10: tail index + * 11-31: tail cpu (+1) */ #define _Q_SET_MASK(type) (((1U << _Q_ ## type ## _BITS) - 1)\ << _Q_ ## type ## _OFFSET) @@ -45,7 +46,11 @@ typedef struct qspinlock { #define _Q_LOCKED_BITS 8 #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED) -#define _Q_TAIL_IDX_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS) +#define _Q_PENDING_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS) +#define _Q_PENDING_BITS 1 +#define _Q_PENDING_MASK _Q_SET_MASK(PENDING) + +#define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS) #define _Q_TAIL_IDX_BITS 2 #define _Q_TAIL_IDX_MASK _Q_SET_MASK(TAIL_IDX) @@ -54,5 +59,6 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) #define _Q_LOCKED_VAL (1U << _Q_LOCKED_OFFSET) +#define _Q_PENDING_VAL (1U << _Q_PENDING_OFFSET) #endif /* __ASM_GENERIC_QSPINLOCK_TYPES_H */ diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 029b51ce10ea..af9c2ef6e930 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -94,24 +94,28 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) return per_cpu_ptr(&mcs_nodes[idx], cpu); } +#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) + /** * queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure * @val: Current value of the queued spinlock 32-bit word * - * (queue tail, lock value) - * - * fast : slow : unlock - * : : - * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) - * : | ^--------. / : - * : v \ | : - * uncontended : (n,x) --+--> (n,0) | : - * queue : | ^--' | : - * : v | : - * contended : (*,x) --+--> (*,0) -----> (*,1) ---' : - * queue : ^--' : + * (queue tail, pending bit, lock value) * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. 
/ : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + * : | ^--' | | : + * : v | | : + * uncontended : (n,x,y) +--> (n,0,0) --' | : + * queue : | ^--' | : + * : v | : + * contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' : + * queue : ^--' : */ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) { @@ -121,6 +125,75 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + /* + * wait for in-progress pending->locked hand-overs + * + * 0,1,0 -> 0,0,1 + */ + if (val == _Q_PENDING_VAL) { + while ((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) + cpu_relax(); + } + + /* + * trylock || pending + * + * 0,0,0 -> 0,0,1 ; trylock + * 0,0,1 -> 0,1,1 ; pending + */ + for (;;) { + /* + * If we observe any contention; queue. + */ + if (val & ~_Q_LOCKED_MASK) + goto queue; + + new = _Q_LOCKED_VAL; + if (val == new) + new |= _Q_PENDING_VAL; + + old = atomic_cmpxchg(&lock->val, val, new); + if (old == val) + break; + + val = old; + } + + /* + * we won the trylock + */ + if (new == _Q_LOCKED_VAL) + return; + + /* + * we're pending, wait for the owner to go away. + * + * *,1,1 -> *,1,0 + */ + while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK) + cpu_relax(); + + /* + * take ownership and clear the pending bit. + * + * *,1,0 -> *,0,1 + */ + for (;;) { + new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL; + + old = atomic_cmpxchg(&lock->val, val, new); + if (old == val) + break; + + val = old; + } + return; + + /* + * End of pending bit optimistic spinning and beginning of MCS + * queuing. + */ +queue: node = this_cpu_ptr(&mcs_nodes[0]); idx = node->count++; tail = encode_tail(smp_processor_id(), idx); @@ -130,15 +203,18 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) node->next = NULL; /* + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * * trylock || xchg(lock, node) * - * 0,0 -> 0,1 ; no tail, not locked -> no tail, locked. - * p,x -> n,x ; tail was p -> tail is n; preserving locked. + * 0,0,0 -> 0,0,1 ; no tail, not locked -> no tail, locked. + * p,y,x -> n,y,x ; tail was p -> tail is n; preserving locked. */ for (;;) { new = _Q_LOCKED_VAL; if (val) - new = tail | (val & _Q_LOCKED_MASK); + new = tail | (val & _Q_LOCKED_PENDING_MASK); old = atomic_cmpxchg(&lock->val, val, new); if (old == val) @@ -157,7 +233,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * if there was a previous node; link it and wait until reaching the * head of the waitqueue. */ - if (old & ~_Q_LOCKED_MASK) { + if (old & ~_Q_LOCKED_PENDING_MASK) { prev = decode_tail(old); WRITE_ONCE(prev->next, node); @@ -165,18 +241,19 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) } /* - * we're at the head of the waitqueue, wait for the owner to go away. + * we're at the head of the waitqueue, wait for the owner & pending to + * go away. 
* - * *,x -> *,0 + * *,x,y -> *,0,0 */ - while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK) + while ((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK) cpu_relax(); /* * claim the lock: * - * n,0 -> 0,1 : lock, uncontended - * *,0 -> *,1 : lock, contended + * n,0,0 -> 0,0,1 : lock, uncontended + * *,0,0 -> *,0,1 : lock, contended */ for (;;) { new = _Q_LOCKED_VAL; -- cgit v1.2.3 From 6403bd7d0ea1878a487296114eccf78658d7dd7a Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Fri, 24 Apr 2015 14:56:33 -0400 Subject: locking/qspinlock: Extract out code snippets for the next patch This is a preparatory patch that extracts out the following 2 code snippets to prepare for the next performance optimization patch. 1) the logic for the exchange of new and previous tail code words into a new xchg_tail() function. 2) the logic for clearing the pending bit and setting the locked bit into a new clear_pending_set_locked() function. This patch also simplifies the trylock operation before queuing by calling queued_spin_trylock() directly. Signed-off-by: Waiman Long Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Daniel J Blueman Cc: David Vrabel Cc: Douglas Hatch Cc: H. Peter Anvin Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Paolo Bonzini Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Rik van Riel Cc: Scott J Norton Cc: Thomas Gleixner Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar --- include/asm-generic/qspinlock_types.h | 2 + kernel/locking/qspinlock.c | 79 +++++++++++++++++++++-------------- 2 files changed, 50 insertions(+), 31 deletions(-) (limited to 'include') diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index 7ee6632cb818..3a7f67173bd0 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -58,6 +58,8 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_BITS (32 - _Q_TAIL_CPU_OFFSET) #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) +#define _Q_TAIL_MASK (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK) + #define _Q_LOCKED_VAL (1U << _Q_LOCKED_OFFSET) #define _Q_PENDING_VAL (1U << _Q_PENDING_OFFSET) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index af9c2ef6e930..82bb4a9e9009 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -96,6 +96,42 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) +/** + * clear_pending_set_locked - take ownership and clear the pending bit. 
+ * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val); +} + +/** + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + u32 old, new, val = atomic_read(&lock->val); + + for (;;) { + new = (val & _Q_LOCKED_PENDING_MASK) | tail; + old = atomic_cmpxchg(&lock->val, val, new); + if (old == val) + break; + + val = old; + } + return old; +} + /** * queued_spin_lock_slowpath - acquire the queued spinlock * @lock: Pointer to queued spinlock structure @@ -178,15 +214,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * * *,1,0 -> *,0,1 */ - for (;;) { - new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL; - - old = atomic_cmpxchg(&lock->val, val, new); - if (old == val) - break; - - val = old; - } + clear_pending_set_locked(lock); return; /* @@ -203,37 +231,26 @@ queue: node->next = NULL; /* - * We have already touched the queueing cacheline; don't bother with - * pending stuff. - * - * trylock || xchg(lock, node) - * - * 0,0,0 -> 0,0,1 ; no tail, not locked -> no tail, locked. - * p,y,x -> n,y,x ; tail was p -> tail is n; preserving locked. + * We touched a (possibly) cold cacheline in the per-cpu queue node; + * attempt the trylock once more in the hope someone let go while we + * weren't watching. */ - for (;;) { - new = _Q_LOCKED_VAL; - if (val) - new = tail | (val & _Q_LOCKED_PENDING_MASK); - - old = atomic_cmpxchg(&lock->val, val, new); - if (old == val) - break; - - val = old; - } + if (queued_spin_trylock(lock)) + goto release; /* - * we won the trylock; forget about queueing. + * We have already touched the queueing cacheline; don't bother with + * pending stuff. + * + * p,*,* -> n,*,* */ - if (new == _Q_LOCKED_VAL) - goto release; + old = xchg_tail(lock, tail); /* * if there was a previous node; link it and wait until reaching the * head of the waitqueue. */ - if (old & ~_Q_LOCKED_PENDING_MASK) { + if (old & _Q_TAIL_MASK) { prev = decode_tail(old); WRITE_ONCE(prev->next, node); -- cgit v1.2.3 From 69f9cae90907e09af95fb991ed384670cef8dd32 Mon Sep 17 00:00:00 2001 From: "Peter Zijlstra (Intel)" Date: Fri, 24 Apr 2015 14:56:34 -0400 Subject: locking/qspinlock: Optimize for smaller NR_CPUS When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16bit. This means we can use xchg16 for the tail part and do away with all the repeated compxchg() operations. This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change. And because both locked and pending are now a full byte we can use simple stores for the state transition, obviating one atomic operation entirely. This optimization is needed to make the qspinlock achieve performance parity with ticket spinlock at light load. All this is horribly broken on Alpha pre EV56 (and any other arch that cannot do single-copy atomic byte stores). 
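To make the two layouts concrete, here is a stand-alone sketch (ordinary C, not kernel code; the offsets follow the qspinlock_types.h hunk below) that encodes the tail code word for CPU 5 at nesting index 2 under both configurations:

	#include <stdio.h>

	#define TAIL_IDX_OFF_SMALL 16	/* NR_CPUS < 16K: pending occupies bits 8-15 */
	#define TAIL_CPU_OFF_SMALL 18
	#define TAIL_IDX_OFF_BIG    9	/* NR_CPUS >= 16K: pending is bit 8 only */
	#define TAIL_CPU_OFF_BIG   11

	/* cpu is stored +1 so that "no tail" (0) differs from cpu 0, index 0 */
	static unsigned int encode_tail(int cpu, int idx, int idx_off, int cpu_off)
	{
		return ((unsigned int)(cpu + 1) << cpu_off) |
		       ((unsigned int)idx << idx_off);
	}

	int main(void)
	{
		printf("NR_CPUS <  16K: %#x\n",
		       encode_tail(5, 2, TAIL_IDX_OFF_SMALL, TAIL_CPU_OFF_SMALL));	/* 0x1a0000 */
		printf("NR_CPUS >= 16K: %#x\n",
		       encode_tail(5, 2, TAIL_IDX_OFF_BIG, TAIL_CPU_OFF_BIG));		/* 0x3400 */
		return 0;
	}

In the first layout the entire tail (0x001a here) lives in the upper 16 bits of the word, which is what allows xchg_tail() to become a single 16-bit xchg() and clear_pending_set_locked() to become a plain halfword store.
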
Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Waiman Long Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Daniel J Blueman Cc: David Vrabel Cc: Douglas Hatch Cc: H. Peter Anvin Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Paolo Bonzini Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Rik van Riel Cc: Scott J Norton Cc: Thomas Gleixner Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar --- include/asm-generic/qspinlock_types.h | 13 +++++++ kernel/locking/qspinlock.c | 69 ++++++++++++++++++++++++++++++++++- 2 files changed, 81 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index 3a7f67173bd0..85f888e86761 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -35,6 +35,14 @@ typedef struct qspinlock { /* * Bitfields in the atomic value: * + * When NR_CPUS < 16K + * 0- 7: locked byte + * 8: pending + * 9-15: not used + * 16-17: tail index + * 18-31: tail cpu (+1) + * + * When NR_CPUS >= 16K * 0- 7: locked byte * 8: pending * 9-10: tail index @@ -47,7 +55,11 @@ typedef struct qspinlock { #define _Q_LOCKED_MASK _Q_SET_MASK(LOCKED) #define _Q_PENDING_OFFSET (_Q_LOCKED_OFFSET + _Q_LOCKED_BITS) +#if CONFIG_NR_CPUS < (1U << 14) +#define _Q_PENDING_BITS 8 +#else #define _Q_PENDING_BITS 1 +#endif #define _Q_PENDING_MASK _Q_SET_MASK(PENDING) #define _Q_TAIL_IDX_OFFSET (_Q_PENDING_OFFSET + _Q_PENDING_BITS) @@ -58,6 +70,7 @@ typedef struct qspinlock { #define _Q_TAIL_CPU_BITS (32 - _Q_TAIL_CPU_OFFSET) #define _Q_TAIL_CPU_MASK _Q_SET_MASK(TAIL_CPU) +#define _Q_TAIL_OFFSET _Q_TAIL_IDX_OFFSET #define _Q_TAIL_MASK (_Q_TAIL_IDX_MASK | _Q_TAIL_CPU_MASK) #define _Q_LOCKED_VAL (1U << _Q_LOCKED_OFFSET) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 82bb4a9e9009..e17efe7b8d4d 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -24,6 +24,7 @@ #include #include #include +#include #include /* @@ -56,6 +57,10 @@ * node; whereby avoiding the need to carry a node from lock to unlock, and * preserving existing lock API. This also makes the unlock code simpler and * faster. + * + * N.B. The current implementation only supports architectures that allow + * atomic operations on smaller 8-bit and 16-bit data types. + * */ #include "mcs_spinlock.h" @@ -96,6 +101,62 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK) +/* + * By using the whole 2nd least significant byte for the pending bit, we + * can allow better optimization of the lock acquisition for the pending + * bit holder. + */ +#if _Q_PENDING_BITS == 8 + +struct __qspinlock { + union { + atomic_t val; + struct { +#ifdef __LITTLE_ENDIAN + u16 locked_pending; + u16 tail; +#else + u16 tail; + u16 locked_pending; +#endif + }; + }; +}; + +/** + * clear_pending_set_locked - take ownership and clear the pending bit. + * @lock: Pointer to queued spinlock structure + * + * *,1,0 -> *,0,1 + * + * Lock stealing is not allowed if this function is used. 
+ */ +static __always_inline void clear_pending_set_locked(struct qspinlock *lock) +{ + struct __qspinlock *l = (void *)lock; + + WRITE_ONCE(l->locked_pending, _Q_LOCKED_VAL); +} + +/* + * xchg_tail - Put in the new queue tail code word & retrieve previous one + * @lock : Pointer to queued spinlock structure + * @tail : The new queue tail code word + * Return: The previous queue tail code word + * + * xchg(lock, tail) + * + * p,*,* -> n,*,* ; prev = xchg(lock, node) + */ +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) +{ + struct __qspinlock *l = (void *)lock; + + return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET; +} + +#else /* _Q_PENDING_BITS == 8 */ + /** * clear_pending_set_locked - take ownership and clear the pending bit. * @lock: Pointer to queued spinlock structure @@ -131,6 +192,7 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail) } return old; } +#endif /* _Q_PENDING_BITS == 8 */ /** * queued_spin_lock_slowpath - acquire the queued spinlock @@ -205,8 +267,13 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) * we're pending, wait for the owner to go away. * * *,1,1 -> *,1,0 + * + * this wait loop must be a load-acquire such that we match the + * store-release that clears the locked bit and create lock + * sequentiality; this is because not all clear_pending_set_locked() + * implementations imply full barriers. */ - while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK) + while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK) cpu_relax(); /* -- cgit v1.2.3 From 2aa79af64263190eec610422b07f60e99a7d230a Mon Sep 17 00:00:00 2001 From: "Peter Zijlstra (Intel)" Date: Fri, 24 Apr 2015 14:56:36 -0400 Subject: locking/qspinlock: Revert to test-and-set on hypervisors When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Waiman Long Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Daniel J Blueman Cc: David Vrabel Cc: Douglas Hatch Cc: H. Peter Anvin Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Paolo Bonzini Cc: Paul E. 
McKenney Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Rik van Riel Cc: Scott J Norton Cc: Thomas Gleixner Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/qspinlock.h | 14 ++++++++++++++ include/asm-generic/qspinlock.h | 7 +++++++ kernel/locking/qspinlock.c | 3 +++ 3 files changed, 24 insertions(+) (limited to 'include') diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index e2aee8273664..f079b7020e3f 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ -1,6 +1,7 @@ #ifndef _ASM_X86_QSPINLOCK_H #define _ASM_X86_QSPINLOCK_H +#include #include #define queued_spin_unlock queued_spin_unlock @@ -15,6 +16,19 @@ static inline void queued_spin_unlock(struct qspinlock *lock) smp_store_release((u8 *)lock, 0); } +#define virt_queued_spin_lock virt_queued_spin_lock + +static inline bool virt_queued_spin_lock(struct qspinlock *lock) +{ + if (!static_cpu_has(X86_FEATURE_HYPERVISOR)) + return false; + + while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0) + cpu_relax(); + + return true; +} + #include #endif /* _ASM_X86_QSPINLOCK_H */ diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h index 569abcd47a9a..83bfb87f5bf1 100644 --- a/include/asm-generic/qspinlock.h +++ b/include/asm-generic/qspinlock.h @@ -111,6 +111,13 @@ static inline void queued_spin_unlock_wait(struct qspinlock *lock) cpu_relax(); } +#ifndef virt_queued_spin_lock +static __always_inline bool virt_queued_spin_lock(struct qspinlock *lock) +{ + return false; +} +#endif + /* * Initializier */ diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 033872113ebb..fd31a474145d 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -249,6 +249,9 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); + if (virt_queued_spin_lock(lock)) + return; + /* * wait for in-progress pending->locked hand-overs * -- cgit v1.2.3 From cede88418b385b50f6841e4b2f1586888b8ab924 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Wed, 25 Feb 2015 18:56:13 +0100 Subject: locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG The rtmutex code is the only user of __HAVE_ARCH_CMPXCHG and we have a few other user of cmpxchg() which do not care about __HAVE_ARCH_CMPXCHG. This define was first introduced in 23f78d4a0 ("[PATCH] pi-futex: rt mutex core") which is v2.6.18. The generic cmpxchg was introduced later in 068fbad288 ("Add cmpxchg_local to asm-generic for per cpu atomic operations") which is v2.6.25. Back then something was required to get rtmutex working with the fast path on architectures without cmpxchg and this seems to be the result. It popped up recently on rt-users because ARM (v6+) does not define __HAVE_ARCH_CMPXCHG (even that it implements it) which results in slower locking performance in the fast path. 
To put some numbers on it: preempt -RT, am335x, 10 loops of 100000 invocations of rt_spin_lock() + rt_spin_unlock() (time "total" is the average of the 10 loops for the 100000 invocations, "loop" is "total / 100000 * 1000"): cmpxchg | slowpath used || cmpxchg used | total | loop || total | loop --------|-----------|-------||------------|------- ARMv6 | 9129.4 us | 91 ns || 3311.9 us | 33 ns generic | 9360.2 us | 94 ns || 10834.6 us | 108 ns ----------------------------||-------------------- Forcing it to generic cmpxchg() made things worse for the slowpath and even worse in cmpxchg() path. It boils down to 14ns more per lock+unlock in a cache hot loop so it might not be that much in real world. The last test was a substitute for pre ARMv6 machine but then I was able to perform the comparison on imx28 which is ARMv5 and therefore is always is using the generic cmpxchg implementation. And the numbers: | total | loop -------- |----------- |-------- slowpath | 263937.2 us | 2639 ns cmpxchg | 16934.2 us | 169 ns -------------------------------- The numbers are larger since the machine is slower in general. However, letting rtmutex use cmpxchg() instead the slowpath seem to improve things. Since from the ARM (tested on am335x + imx28) point of view always using cmpxchg() in rt_mutex_lock() + rt_mutex_unlock() makes sense I would drop the define. Signed-off-by: Sebastian Andrzej Siewior Cc: Arnd Bergmann Cc: Peter Zijlstra Cc: will.deacon@arm.com Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20150225175613.GE6823@linutronix.de Signed-off-by: Thomas Gleixner --- include/asm-generic/cmpxchg.h | 3 --- kernel/locking/rtmutex.c | 6 +++--- 2 files changed, 3 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/asm-generic/cmpxchg.h b/include/asm-generic/cmpxchg.h index 811fb1e9b061..3766ab34aa45 100644 --- a/include/asm-generic/cmpxchg.h +++ b/include/asm-generic/cmpxchg.h @@ -86,9 +86,6 @@ unsigned long __xchg(unsigned long x, volatile void *ptr, int size) /* * Atomic compare and exchange. - * - * Do not define __HAVE_ARCH_CMPXCHG because we want to use it to check whether - * a cmpxchg primitive faster than repeated local irq save/restore exists. */ #include diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index b73279367087..27dff663f9e4 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -70,10 +70,10 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock) } /* - * We can speed up the acquire/release, if the architecture - * supports cmpxchg and if there's no debugging state to be set up + * We can speed up the acquire/release, if there's no debugging state to be + * set up. */ -#if defined(__HAVE_ARCH_CMPXCHG) && !defined(CONFIG_DEBUG_RT_MUTEXES) +#ifndef CONFIG_DEBUG_RT_MUTEXES # define rt_mutex_cmpxchg(l,c,n) (cmpxchg(&l->owner, c, n) == c) static inline void mark_rt_mutex_waiters(struct rt_mutex *lock) { -- cgit v1.2.3 From ab3f02fc237211f0583c1e7ba3bf504747be9b8d Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 12 May 2015 10:52:27 +0200 Subject: locking/arch: Add WRITE_ONCE() to set_mb() Since we assume set_mb() to result in a single store followed by a full memory barrier, employ WRITE_ONCE(). 
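As a rough user-space model of the difference — a sketch under the assumption that WRITE_ONCE() can be approximated by a volatile store and mb() by a C11 seq_cst fence; the kernel's definitions are more involved:

	#include <stdatomic.h>

	#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))
	#define mb()			atomic_thread_fence(memory_order_seq_cst)

	/* old form: do { var = value; mb(); } while (0)
	 * the plain assignment lets the compiler tear or otherwise transform the
	 * store; the volatile access pins it to a single store that is then
	 * followed by the full barrier */
	#define set_mb(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)

	static int flag;

	void publish(void)
	{
		set_mb(flag, 1);	/* exactly one store to flag, then a full fence */
	}
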
Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar --- arch/arm/include/asm/barrier.h | 2 +- arch/arm64/include/asm/barrier.h | 2 +- arch/ia64/include/asm/barrier.h | 2 +- arch/metag/include/asm/barrier.h | 2 +- arch/mips/include/asm/barrier.h | 2 +- arch/powerpc/include/asm/barrier.h | 2 +- arch/s390/include/asm/barrier.h | 2 +- arch/sparc/include/asm/barrier_64.h | 2 +- arch/x86/include/asm/barrier.h | 2 +- arch/x86/um/asm/barrier.h | 2 +- include/asm-generic/barrier.h | 2 +- include/linux/compiler.h | 2 +- 12 files changed, 12 insertions(+), 12 deletions(-) (limited to 'include') diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h index d2f81e6b8c1c..993150aea681 100644 --- a/arch/arm/include/asm/barrier.h +++ b/arch/arm/include/asm/barrier.h @@ -81,7 +81,7 @@ do { \ #define read_barrier_depends() do { } while(0) #define smp_read_barrier_depends() do { } while(0) -#define set_mb(var, value) do { var = value; smp_mb(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) #define smp_mb__before_atomic() smp_mb() #define smp_mb__after_atomic() smp_mb() diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h index 71f19c4dc0de..ff7de78d01b8 100644 --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -114,7 +114,7 @@ do { \ #define read_barrier_depends() do { } while(0) #define smp_read_barrier_depends() do { } while(0) -#define set_mb(var, value) do { var = value; smp_mb(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) #define nop() asm volatile("nop"); #define smp_mb__before_atomic() smp_mb() diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h index f6769eb2bbf9..03117e7b2ab8 100644 --- a/arch/ia64/include/asm/barrier.h +++ b/arch/ia64/include/asm/barrier.h @@ -82,7 +82,7 @@ do { \ * acquire vs release semantics but we can't discuss this stuff with * Linus just yet. Grrr... 
*/ -#define set_mb(var, value) do { (var) = (value); mb(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) /* * The group barrier in front of the rsm & ssm are necessary to ensure diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h index d703d8e26a65..97eb018a2933 100644 --- a/arch/metag/include/asm/barrier.h +++ b/arch/metag/include/asm/barrier.h @@ -84,7 +84,7 @@ static inline void fence(void) #define read_barrier_depends() do { } while (0) #define smp_read_barrier_depends() do { } while (0) -#define set_mb(var, value) do { var = value; smp_mb(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) #define smp_store_release(p, v) \ do { \ diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h index 2b8bbbcb9be0..cff1bbdaa74a 100644 --- a/arch/mips/include/asm/barrier.h +++ b/arch/mips/include/asm/barrier.h @@ -113,7 +113,7 @@ #endif #define set_mb(var, value) \ - do { var = value; smp_mb(); } while (0) + do { WRITE_ONCE(var, value); smp_mb(); } while (0) #define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory") diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index a3bf5be111ff..2a072e48780d 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -34,7 +34,7 @@ #define rmb() __asm__ __volatile__ ("sync" : : : "memory") #define wmb() __asm__ __volatile__ ("sync" : : : "memory") -#define set_mb(var, value) do { var = value; mb(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) #ifdef __SUBARCH_HAS_LWSYNC # define SMPWMB LWSYNC diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h index 8d724718ec21..b66cd53d35fc 100644 --- a/arch/s390/include/asm/barrier.h +++ b/arch/s390/include/asm/barrier.h @@ -36,7 +36,7 @@ #define smp_mb__before_atomic() smp_mb() #define smp_mb__after_atomic() smp_mb() -#define set_mb(var, value) do { var = value; mb(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) #define smp_store_release(p, v) \ do { \ diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h index 76648941fea7..125fec7512f4 100644 --- a/arch/sparc/include/asm/barrier_64.h +++ b/arch/sparc/include/asm/barrier_64.h @@ -41,7 +41,7 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ #define dma_wmb() wmb() #define set_mb(__var, __value) \ - do { __var = __value; membar_safe("#StoreLoad"); } while(0) + do { WRITE_ONCE(__var, __value); membar_safe("#StoreLoad"); } while(0) #ifdef CONFIG_SMP #define smp_mb() mb() diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index 959e45b81fe2..9de5cde133a1 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -40,7 +40,7 @@ #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() -#define set_mb(var, value) do { var = value; barrier(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0) #endif /* SMP */ #define read_barrier_depends() do { } while (0) diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h index 7e8a1a650435..cc0cb01f346d 100644 --- a/arch/x86/um/asm/barrier.h +++ b/arch/x86/um/asm/barrier.h @@ -39,7 +39,7 @@ #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() -#define set_mb(var, value) do { var = value; barrier(); } while (0) +#define 
set_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0) #define read_barrier_depends() do { } while (0) #define smp_read_barrier_depends() do { } while (0) diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h index f5c40b0fadc2..3938716b44d7 100644 --- a/include/asm-generic/barrier.h +++ b/include/asm-generic/barrier.h @@ -67,7 +67,7 @@ #endif #ifndef set_mb -#define set_mb(var, value) do { (var) = (value); mb(); } while (0) +#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) #endif #ifndef smp_mb__before_atomic diff --git a/include/linux/compiler.h b/include/linux/compiler.h index a7c0941d10da..03e227ba481c 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -250,7 +250,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s ({ union { typeof(x) __val; char __c[1]; } __u; __read_once_size(&(x), __u.__c, sizeof(x)); __u.__val; }) #define WRITE_ONCE(x, val) \ - ({ typeof(x) __val = (val); __write_once_size(&(x), &__val, sizeof(__val)); __val; }) + ({ union { typeof(x) __val; char __c[1]; } __u = { .__val = (val) }; __write_once_size(&(x), __u.__c, sizeof(x)); __u.__val; }) #endif /* __KERNEL__ */ -- cgit v1.2.3 From b92b8b35a2e38bde319fd1d68ec84628c1f1b0fb Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 12 May 2015 10:51:55 +0200 Subject: locking/arch: Rename set_mb() to smp_store_mb() Since set_mb() is really about an smp_mb() -- not a IO/DMA barrier like mb() rename it to match the recent smp_load_acquire() and smp_store_release(). Suggested-by: Linus Torvalds Signed-off-by: Peter Zijlstra (Intel) Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar --- Documentation/memory-barriers.txt | 6 +++--- arch/arm/include/asm/barrier.h | 2 +- arch/arm64/include/asm/barrier.h | 2 +- arch/ia64/include/asm/barrier.h | 7 +------ arch/metag/include/asm/barrier.h | 2 +- arch/mips/include/asm/barrier.h | 2 +- arch/powerpc/include/asm/barrier.h | 2 +- arch/s390/include/asm/barrier.h | 2 +- arch/sh/include/asm/barrier.h | 2 +- arch/sparc/include/asm/barrier_64.h | 2 +- arch/x86/include/asm/barrier.h | 4 ++-- arch/x86/um/asm/barrier.h | 3 ++- fs/select.c | 6 +++--- include/asm-generic/barrier.h | 4 ++-- include/linux/sched.h | 8 ++++---- kernel/futex.c | 2 +- kernel/locking/qspinlock_paravirt.h | 2 +- kernel/sched/wait.c | 4 ++-- 18 files changed, 29 insertions(+), 33 deletions(-) (limited to 'include') diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index f95746189b5d..fe4020e4b468 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1662,7 +1662,7 @@ CPU from reordering them. There are some more advanced barrier functions: - (*) set_mb(var, value) + (*) smp_store_mb(var, value) This assigns the value to the variable and then inserts a full memory barrier after it, depending on the function. 
It isn't guaranteed to @@ -1975,7 +1975,7 @@ after it has altered the task state: CPU 1 =============================== set_current_state(); - set_mb(); + smp_store_mb(); STORE current->state LOAD event_indicated @@ -2016,7 +2016,7 @@ between the STORE to indicate the event and the STORE to set TASK_RUNNING: CPU 1 CPU 2 =============================== =============================== set_current_state(); STORE event_indicated - set_mb(); wake_up(); + smp_store_mb(); wake_up(); STORE current->state STORE current->state LOAD event_indicated diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h index 993150aea681..6c2327e1c732 100644 --- a/arch/arm/include/asm/barrier.h +++ b/arch/arm/include/asm/barrier.h @@ -81,7 +81,7 @@ do { \ #define read_barrier_depends() do { } while(0) #define smp_read_barrier_depends() do { } while(0) -#define set_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) #define smp_mb__before_atomic() smp_mb() #define smp_mb__after_atomic() smp_mb() diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h index ff7de78d01b8..0fa47c4275cb 100644 --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -114,7 +114,7 @@ do { \ #define read_barrier_depends() do { } while(0) #define smp_read_barrier_depends() do { } while(0) -#define set_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) #define nop() asm volatile("nop"); #define smp_mb__before_atomic() smp_mb() diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h index 03117e7b2ab8..843ba435e43b 100644 --- a/arch/ia64/include/asm/barrier.h +++ b/arch/ia64/include/asm/barrier.h @@ -77,12 +77,7 @@ do { \ ___p1; \ }) -/* - * XXX check on this ---I suspect what Linus really wants here is - * acquire vs release semantics but we can't discuss this stuff with - * Linus just yet. Grrr... 
- */
-#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
 /*
  * The group barrier in front of the rsm & ssm are necessary to ensure
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 97eb018a2933..5a696e507930 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -84,7 +84,7 @@ static inline void fence(void)
#define read_barrier_depends() do { } while (0)
#define smp_read_barrier_depends() do { } while (0)
-#define set_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
#define smp_store_release(p, v) \
do { \
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index cff1bbdaa74a..7ecba84656d4 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -112,7 +112,7 @@
#define __WEAK_LLSC_MB " \n"
#endif
-#define set_mb(var, value) \
+#define smp_store_mb(var, value) \
	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
#define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 2a072e48780d..39505d660a70 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -34,7 +34,7 @@
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
-#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
#ifdef __SUBARCH_HAS_LWSYNC
# define SMPWMB LWSYNC
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index b66cd53d35fc..e6f8615a11eb 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -36,7 +36,7 @@
#define smp_mb__before_atomic() smp_mb()
#define smp_mb__after_atomic() smp_mb()
-#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
#define smp_store_release(p, v) \
do { \
diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h
index 43715308b068..bf91037db4e0 100644
--- a/arch/sh/include/asm/barrier.h
+++ b/arch/sh/include/asm/barrier.h
@@ -32,7 +32,7 @@
#define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
#endif
-#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
+#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
#include <asm-generic/barrier.h>
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 125fec7512f4..809941e33e12 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -40,7 +40,7 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \
#define dma_rmb() rmb()
#define dma_wmb() wmb()
-#define set_mb(__var, __value) \
+#define smp_store_mb(__var, __value) \
	do { WRITE_ONCE(__var, __value); membar_safe("#StoreLoad"); } while(0)
#ifdef CONFIG_SMP
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 9de5cde133a1..e51a8f803f55 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -35,12 +35,12 @@
#define smp_mb() mb()
#define smp_rmb() dma_rmb()
#define smp_wmb() barrier()
-#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
+#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
#else /* !SMP */
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
-#define set_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
#endif /* SMP */
#define read_barrier_depends() do { } while (0)
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index cc0cb01f346d..b9531d343134 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -39,7 +39,8 @@
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
-#define set_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
+
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
#define read_barrier_depends() do { } while (0)
#define smp_read_barrier_depends() do { } while (0)
diff --git a/fs/select.c b/fs/select.c
index f684c750e08a..015547330e88 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -189,7 +189,7 @@ static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
 * doesn't imply write barrier and the users expect write
 * barrier semantics on wakeup functions. The following
 * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
- * and is paired with set_mb() in poll_schedule_timeout.
+ * and is paired with smp_store_mb() in poll_schedule_timeout.
 */
smp_wmb();
pwq->triggered = 1;
@@ -244,7 +244,7 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
 /*
 * Prepare for the next iteration.
 *
- * The following set_mb() serves two purposes. First, it's
+ * The following smp_store_mb() serves two purposes. First, it's
 * the counterpart rmb of the wmb in pollwake() such that data
 * written before wake up is always visible after wake up.
 * Second, the full barrier guarantees that triggered clearing
@@ -252,7 +252,7 @@ int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
 * this problem doesn't exist for the first iteration as
 * add_wait_queue() has full barrier semantics.
 */
- set_mb(pwq->triggered, 0);
+ smp_store_mb(pwq->triggered, 0);
return rc;
}
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 3938716b44d7..e6a83d712ef6 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -66,8 +66,8 @@
#define smp_read_barrier_depends() do { } while (0)
#endif
-#ifndef set_mb
-#define set_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
+#ifndef smp_store_mb
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0)
#endif
#ifndef smp_mb__before_atomic
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 26a2e6122734..18f197223ebd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -252,7 +252,7 @@ extern char ___assert_task_state[1 - 2*!!(
#define set_task_state(tsk, state_value) \
do { \
(tsk)->task_state_change = _THIS_IP_; \
- set_mb((tsk)->state, (state_value)); \
+ smp_store_mb((tsk)->state, (state_value)); \
} while (0)
 /*
@@ -274,7 +274,7 @@ extern char ___assert_task_state[1 - 2*!!(
#define set_current_state(state_value) \
do { \
current->task_state_change = _THIS_IP_; \
- set_mb(current->state, (state_value)); \
+ smp_store_mb(current->state, (state_value)); \
} while (0)
#else
@@ -282,7 +282,7 @@ extern char ___assert_task_state[1 - 2*!!(
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value) \
- set_mb((tsk)->state, (state_value))
+ smp_store_mb((tsk)->state, (state_value))
 /*
 * set_current_state() includes a barrier so that the write of current->state
@@ -298,7 +298,7 @@ extern char ___assert_task_state[1 - 2*!!(
#define __set_current_state(state_value) \
do { current->state = (state_value); } while (0)
#define set_current_state(state_value) \
- set_mb(current->state, (state_value))
+ smp_store_mb(current->state, (state_value))
#endif
diff --git a/kernel/futex.c b/kernel/futex.c
index 2579e407ff67..55ca63ad9622 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2055,7 +2055,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
{
 /*
 * The task state is guaranteed to be set before another task can
- * wake it. set_current_state() is implemented using set_mb() and
+ * wake it. set_current_state() is implemented using smp_store_mb() and
 * queue_me() calls spin_unlock() upon completion, both serializing
 * access to the hash list and forcing another memory barrier.
 */
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 27ab96dca68c..04ab18151cc8 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -175,7 +175,7 @@ static void pv_wait_node(struct mcs_spinlock *node)
 *
 * Matches the xchg() from pv_kick_node().
 */
- set_mb(pn->state, vcpu_halted);
+ smp_store_mb(pn->state, vcpu_halted);
if (!READ_ONCE(node->locked))
pv_wait(&pn->state, vcpu_halted);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 852143a79f36..9bc82329eaad 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -341,7 +341,7 @@ long wait_woken(wait_queue_t *wait, unsigned mode, long timeout)
 * condition being true _OR_ WQ_FLAG_WOKEN such that we will not miss
 * an event.
 */
- set_mb(wait->flags, wait->flags & ~WQ_FLAG_WOKEN); /* B */
+ smp_store_mb(wait->flags, wait->flags & ~WQ_FLAG_WOKEN); /* B */
return timeout;
}
@@ -354,7 +354,7 @@ int woken_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key)
 * doesn't imply write barrier and the users expects write
 * barrier semantics on wakeup functions. The following
 * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
- * and is paired with set_mb() in wait_woken().
+ * and is paired with smp_store_mb() in wait_woken().
 */
smp_wmb(); /* C */
wait->flags |= WQ_FLAG_WOKEN;
-- cgit v1.2.3

From 68722101ec3a0e179408a13708dd020e04f54aab Mon Sep 17 00:00:00 2001
From: George Beshers
Date: Thu, 18 Jun 2015 10:25:13 -0500
Subject: locking/lockdep: Remove hard coded array size dependency

An apparent oversight left a hardcoded '4' in place when LOCKSTAT_POINTS
was introduced.

The contention_point[] and contending_point[] arrays in the structs
lock_class and lock_class_stats need to be the same size for the loops
in lock_stats() to be correct.

This patch allows LOCKSTAT_POINTS to be changed without affecting the
correctness of the code.

Signed-off-by: George Beshers
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
---
 include/linux/lockdep.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'include')

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 066ba4157541..2722111591a3 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -130,8 +130,8 @@ enum bounce_type {
};
struct lock_class_stats {
- unsigned long contention_point[4];
- unsigned long contending_point[4];
+ unsigned long contention_point[LOCKSTAT_POINTS];
+ unsigned long contending_point[LOCKSTAT_POINTS];
struct lock_time read_waittime;
struct lock_time write_waittime;
struct lock_time read_holdtime;
-- cgit v1.2.3
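Background on the set_mb() -> smp_store_mb() rename above: the macro stores a value and then issues a full memory barrier, which is why several architectures implement it with xchg(). The standalone C11 sketch below mimics that store-plus-full-barrier pattern in userspace to show the missed-wakeup problem it prevents; it is not kernel code, and every identifier in it (store_mb, flag, condition, the two helper functions) is invented for this illustration.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int flag;      /* "waiter is about to sleep" marker (illustrative) */
static atomic_int condition; /* event the waiter sleeps on (illustrative) */

/* Userspace analogue of smp_store_mb(): a store followed by a full fence. */
static void store_mb(atomic_int *var, int value)
{
	atomic_store_explicit(var, value, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Waiter side: publish the flag, then re-check the condition before sleeping. */
static bool waiter_should_sleep(void)
{
	store_mb(&flag, 1);
	/* The fence keeps this load from being ordered before the flag store. */
	return atomic_load_explicit(&condition, memory_order_relaxed) == 0;
}

/* Waker side: publish the condition, then check whether a waiter needs waking. */
static bool waker_must_wake(void)
{
	store_mb(&condition, 1);
	return atomic_load_explicit(&flag, memory_order_relaxed) != 0;
}

int main(void)
{
	/* Single-threaded walk through the protocol; in real use the two
	 * functions run on different threads. */
	printf("waiter sleeps: %d\n", waiter_should_sleep());
	printf("waker wakes:   %d\n", waker_must_wake());
	return 0;
}

With the full fence inside store_mb(), at least one side is guaranteed to observe the other's store, so the waiter cannot go to sleep at the same moment the waker concludes that nobody needs waking.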
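A note on the LOCKSTAT_POINTS change above: lock_stats() walks the per-point arrays of both structures with the same index, so the two arrays must agree on their length, which only holds when both are sized by the same macro. The simplified sketch below illustrates the idea; struct lock_class_demo, struct lock_class_stats_demo, demo_lock_stats() and ARRAY_SIZE_DEMO() are stand-ins invented for this example, not the actual lockdep definitions.

#include <stddef.h>
#include <stdio.h>

#define LOCKSTAT_POINTS 4	/* illustrative; the real value lives in lockdep.h */
#define ARRAY_SIZE_DEMO(a)	(sizeof(a) / sizeof((a)[0]))

/* Simplified stand-ins for struct lock_class and struct lock_class_stats. */
struct lock_class_demo {
	unsigned long contention_point[LOCKSTAT_POINTS];
	unsigned long contending_point[LOCKSTAT_POINTS];
};

struct lock_class_stats_demo {
	unsigned long contention_point[LOCKSTAT_POINTS];
	unsigned long contending_point[LOCKSTAT_POINTS];
};

/*
 * Rough analogue of the copy loops in lock_stats(): both arrays are indexed
 * with the same counter, so source and destination must have the same number
 * of elements.  That is only guaranteed when both structures take their size
 * from the one LOCKSTAT_POINTS macro instead of a hardcoded '4'.
 */
static void demo_lock_stats(struct lock_class_stats_demo *stats,
			    const struct lock_class_demo *class)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE_DEMO(stats->contention_point); i++)
		stats->contention_point[i] = class->contention_point[i];
	for (i = 0; i < ARRAY_SIZE_DEMO(stats->contending_point); i++)
		stats->contending_point[i] = class->contending_point[i];
}

int main(void)
{
	struct lock_class_demo class = { .contention_point = { 1, 2, 3, 4 } };
	struct lock_class_stats_demo stats = { { 0 } };

	demo_lock_stats(&stats, &class);
	printf("copied %zu contention points\n",
	       ARRAY_SIZE_DEMO(stats.contention_point));
	return 0;
}

Sizing both arrays from LOCKSTAT_POINTS keeps the loops and the storage in step; a hardcoded '4' on one side would silently break as soon as the macro value changed.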