diff options
author | Ingo Molnar <mingo@kernel.org> | 2018-01-03 14:14:18 +0100 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2018-01-03 14:14:18 +0100 |
commit | 475c5ee193fd682c6383b5e418e65e46a477d176 (patch) | |
tree | 43f70a77d919e10237edbbbc3ca7ac8076d27410 | |
parent | 30a7acd573899fd8b8ac39236eff6468b195ac7d (diff) | |
parent | 1dfa55e01987288d847220b8c027204871440ed1 (diff) | |
download | linux-475c5ee193fd682c6383b5e418e65e46a477d176.tar.bz2 |
Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- Updates to use cond_resched() instead of cond_resched_rcu_qs()
where feasible (currently everywhere except in kernel/rcu and
in kernel/torture.c). Also a couple of fixes to avoid sending
IPIs to offline CPUs.
- Updates to simplify RCU's dyntick-idle handling.
- Updates to remove almost all uses of smp_read_barrier_depends()
and read_barrier_depends().
- Miscellaneous fixes.
- Torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
81 files changed, 501 insertions, 748 deletions
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 38d6d800761f..6c06e10bd04b 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -1097,7 +1097,8 @@ will cause the CPU to disregard the values of its counters on its next exit from idle. Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect cases where a given operation has resulted in a quiescent state -for all flavors of RCU, for example, <tt>cond_resched_rcu_qs()</tt>. +for all flavors of RCU, for example, <tt>cond_resched()</tt> +when RCU has indicated a need for quiescent states. <h5>RCU Callback Handling</h5> @@ -1182,8 +1183,8 @@ CPU (and from tracing) unless otherwise stated. Its fields are as follows: <pre> - 1 int dynticks_nesting; - 2 int dynticks_nmi_nesting; + 1 long dynticks_nesting; + 2 long dynticks_nmi_nesting; 3 atomic_t dynticks; 4 bool rcu_need_heavy_qs; 5 unsigned long rcu_qs_ctr; @@ -1191,15 +1192,31 @@ Its fields are as follows: </pre> <p>The <tt>->dynticks_nesting</tt> field counts the -nesting depth of normal interrupts. -In addition, this counter is incremented when exiting dyntick-idle -mode and decremented when entering it. +nesting depth of process execution, so that in normal circumstances +this counter has value zero or one. +NMIs, irqs, and tracers are counted by the <tt>->dynticks_nmi_nesting</tt> +field. +Because NMIs cannot be masked, changes to this variable have to be +undertaken carefully using an algorithm provided by Andy Lutomirski. +The initial transition from idle adds one, and nested transitions +add two, so that a nesting level of five is represented by a +<tt>->dynticks_nmi_nesting</tt> value of nine. This counter can therefore be thought of as counting the number of reasons why this CPU cannot be permitted to enter dyntick-idle -mode, aside from non-maskable interrupts (NMIs). -NMIs are counted by the <tt>->dynticks_nmi_nesting</tt> -field, except that NMIs that interrupt non-dyntick-idle execution -are not counted. +mode, aside from process-level transitions. + +<p>However, it turns out that when running in non-idle kernel context, +the Linux kernel is fully capable of entering interrupt handlers that +never exit and perhaps also vice versa. +Therefore, whenever the <tt>->dynticks_nesting</tt> field is +incremented up from zero, the <tt>->dynticks_nmi_nesting</tt> field +is set to a large positive number, and whenever the +<tt>->dynticks_nesting</tt> field is decremented down to zero, +the the <tt>->dynticks_nmi_nesting</tt> field is set to zero. +Assuming that the number of misnested interrupts is not sufficient +to overflow the counter, this approach corrects the +<tt>->dynticks_nmi_nesting</tt> field every time the corresponding +CPU enters the idle loop from process context. </p><p>The <tt>->dynticks</tt> field counts the corresponding CPU's transitions to and from dyntick-idle mode, so that this counter @@ -1231,14 +1248,16 @@ in response. <tr><th> </th></tr> <tr><th align="left">Quick Quiz:</th></tr> <tr><td> - Why not just count all NMIs? - Wouldn't that be simpler and less error prone? + Why not simply combine the <tt>->dynticks_nesting</tt> + and <tt>->dynticks_nmi_nesting</tt> counters into a + single counter that just counts the number of reasons that + the corresponding CPU is non-idle? </td></tr> <tr><th align="left">Answer:</th></tr> <tr><td bgcolor="#ffffff"><font color="ffffff"> - It seems simpler only until you think hard about how to go about - updating the <tt>rcu_dynticks</tt> structure's - <tt>->dynticks</tt> field. + Because this would fail in the presence of interrupts whose + handlers never return and of handlers that manage to return + from a made-up interrupt. </font></td></tr> <tr><td> </td></tr> </table> diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 62e847bcdcdd..49690228b1c6 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -581,7 +581,8 @@ This guarantee was only partially premeditated. DYNIX/ptx used an explicit memory barrier for publication, but had nothing resembling <tt>rcu_dereference()</tt> for subscription, nor did it have anything resembling the <tt>smp_read_barrier_depends()</tt> -that was later subsumed into <tt>rcu_dereference()</tt>. +that was later subsumed into <tt>rcu_dereference()</tt> and later +still into <tt>READ_ONCE()</tt>. The need for these operations made itself known quite suddenly at a late-1990s meeting with the DEC Alpha architects, back in the days when DEC was still a free-standing company. @@ -2797,7 +2798,7 @@ RCU must avoid degrading real-time response for CPU-bound threads, whether executing in usermode (which is one use case for <tt>CONFIG_NO_HZ_FULL=y</tt>) or in the kernel. That said, CPU-bound loops in the kernel must execute -<tt>cond_resched_rcu_qs()</tt> at least once per few tens of milliseconds +<tt>cond_resched()</tt> at least once per few tens of milliseconds in order to avoid receiving an IPI from RCU. <p> @@ -3128,7 +3129,7 @@ The solution, in the form of is to have implicit read-side critical sections that are delimited by voluntary context switches, that is, calls to <tt>schedule()</tt>, -<tt>cond_resched_rcu_qs()</tt>, and +<tt>cond_resched()</tt>, and <tt>synchronize_rcu_tasks()</tt>. In addition, transitions to and from userspace execution also delimit tasks-RCU read-side critical sections. diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt index 1acb26b09b48..ab96227bad42 100644 --- a/Documentation/RCU/rcu_dereference.txt +++ b/Documentation/RCU/rcu_dereference.txt @@ -122,11 +122,7 @@ o Be very careful about comparing pointers obtained from Note that if checks for being within an RCU read-side critical section are not required and the pointer is never dereferenced, rcu_access_pointer() should be used in place - of rcu_dereference(). The rcu_access_pointer() primitive - does not require an enclosing read-side critical section, - and also omits the smp_read_barrier_depends() included in - rcu_dereference(), which in turn should provide a small - performance gain in some CPUs (e.g., the DEC Alpha). + of rcu_dereference(). o The comparison is against a pointer that references memory that was initialized "a long time ago." The reason diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index a08f928c8557..4259f95c3261 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -23,12 +23,10 @@ o A CPU looping with preemption disabled. This condition can o A CPU looping with bottom halves disabled. This condition can result in RCU-sched and RCU-bh stalls. -o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the - kernel without invoking schedule(). Note that cond_resched() - does not necessarily prevent RCU CPU stall warnings. Therefore, - if the looping in the kernel is really expected and desirable - behavior, you might need to replace some of the cond_resched() - calls with calls to cond_resched_rcu_qs(). +o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel + without invoking schedule(). If the looping in the kernel is + really expected and desirable behavior, you might need to add + some calls to cond_resched(). o Booting Linux using a console connection that is too slow to keep up with the boot-time console-message rate. For example, diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index df62466da4e0..a27fbfb0efb8 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -600,8 +600,7 @@ don't forget about them when submitting patches making use of RCU!] #define rcu_dereference(p) \ ({ \ - typeof(p) _________p1 = p; \ - smp_read_barrier_depends(); \ + typeof(p) _________p1 = READ_ONCE(p); \ (_________p1); \ }) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index af7104aaffd9..5e486b6509e5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2053,9 +2053,6 @@ This tests the locking primitive's ability to transition abruptly to and from idle. - locktorture.torture_runnable= [BOOT] - Start locktorture running at boot time. - locktorture.torture_type= [KNL] Specify the locking implementation to test. @@ -3471,9 +3468,6 @@ the same as for rcuperf.nreaders. N, where N is the number of CPUs - rcuperf.perf_runnable= [BOOT] - Start rcuperf running at boot time. - rcuperf.perf_type= [KNL] Specify the RCU implementation to test. @@ -3607,9 +3601,6 @@ Test RCU's dyntick-idle handling. See also the rcutorture.shuffle_interval parameter. - rcutorture.torture_runnable= [BOOT] - Start rcutorture running at boot time. - rcutorture.torture_type= [KNL] Specify the RCU implementation to test. diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt index d4628174b7c5..53e51caa3347 100644 --- a/Documentation/circular-buffers.txt +++ b/Documentation/circular-buffers.txt @@ -220,8 +220,7 @@ before it writes the new tail pointer, which will erase the item. Note the use of READ_ONCE() and smp_load_acquire() to read the opposition index. This prevents the compiler from discarding and -reloading its cached value - which some compilers will do across -smp_read_barrier_depends(). This isn't strictly needed if you can +reloading its cached value. This isn't strictly needed if you can be sure that the opposition index will _only_ be used the once. The smp_load_acquire() additionally forces the CPU to order against subsequent memory references. Similarly, smp_store_release() is used diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index a2ef3a929bf1..6a8df4cd19bf 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -57,11 +57,6 @@ torture_type Type of lock to torture. By default, only spinlocks will o "rwsem_lock": read/write down() and up() semaphore pairs. -torture_runnable Start locktorture at boot time in the case where the - module is built into the kernel, otherwise wait for - torture_runnable to be set via sysfs before starting. - By default it will begin once the module is loaded. - ** Torture-framework (RCU + locking) ** diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 479ecec80593..a863009849a3 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -227,17 +227,20 @@ There are some minimal guarantees that may be expected of a CPU: (*) On any given CPU, dependent memory accesses will be issued in order, with respect to itself. This means that for: - Q = READ_ONCE(P); smp_read_barrier_depends(); D = READ_ONCE(*Q); + Q = READ_ONCE(P); D = READ_ONCE(*Q); the CPU will issue the following memory operations: Q = LOAD P, D = LOAD *Q - and always in that order. On most systems, smp_read_barrier_depends() - does nothing, but it is required for DEC Alpha. The READ_ONCE() - is required to prevent compiler mischief. Please note that you - should normally use something like rcu_dereference() instead of - open-coding smp_read_barrier_depends(). + and always in that order. However, on DEC Alpha, READ_ONCE() also + emits a memory-barrier instruction, so that a DEC Alpha CPU will + instead issue the following memory operations: + + Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER + + Whether on DEC Alpha or not, the READ_ONCE() also prevents compiler + mischief. (*) Overlapping loads and stores within a particular CPU will appear to be ordered within that CPU. This means that for: @@ -1815,7 +1818,7 @@ The Linux kernel has eight basic CPU memory barriers: GENERAL mb() smp_mb() WRITE wmb() smp_wmb() READ rmb() smp_rmb() - DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends() + DATA DEPENDENCY READ_ONCE() All memory barriers except the data dependency barriers imply a compiler @@ -2864,7 +2867,10 @@ access depends on a read, not all do, so it may not be relied on. Other CPUs may also have split caches, but must coordinate between the various cachelets for normal memory accesses. The semantics of the Alpha removes the -need for coordination in the absence of memory barriers. +need for hardware coordination in the absence of memory barriers, which +permitted Alpha to sport higher CPU clock rates back in the day. However, +please note that smp_read_barrier_depends() should not be used except in +Alpha arch-specific code and within the READ_ONCE() macro. CACHE COHERENCY VS DMA diff --git a/MAINTAINERS b/MAINTAINERS index b46c9cea5ae5..f12475919bef 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8193,6 +8193,7 @@ F: arch/*/include/asm/rwsem.h F: include/linux/seqlock.h F: lib/locking*.[ch] F: kernel/locking/ +X: kernel/locking/locktorture.c LOGICAL DISK MANAGER SUPPORT (LDM, Windows 2000/XP/Vista Dynamic Disks) M: "Richard Russon (FlatCap)" <ldm@flatcap.org> @@ -11450,15 +11451,6 @@ L: linux-wireless@vger.kernel.org S: Orphan F: drivers/net/wireless/ray* -RCUTORTURE MODULE -M: Josh Triplett <josh@joshtriplett.org> -M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> -L: linux-kernel@vger.kernel.org -S: Supported -T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git -F: Documentation/RCU/torture.txt -F: kernel/rcu/rcutorture.c - RCUTORTURE TEST FRAMEWORK M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> M: Josh Triplett <josh@joshtriplett.org> @@ -13767,6 +13759,18 @@ L: platform-driver-x86@vger.kernel.org S: Maintained F: drivers/platform/x86/topstar-laptop.c +TORTURE-TEST MODULES +M: Davidlohr Bueso <dave@stgolabs.net> +M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> +M: Josh Triplett <josh@joshtriplett.org> +L: linux-kernel@vger.kernel.org +S: Supported +T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git +F: Documentation/RCU/torture.txt +F: kernel/torture.c +F: kernel/rcu/rcutorture.c +F: kernel/locking/locktorture.c + TOSHIBA ACPI EXTRAS DRIVER M: Azael Avalos <coproscefalo@gmail.com> L: platform-driver-x86@vger.kernel.org diff --git a/arch/mn10300/kernel/mn10300-serial.c b/arch/mn10300/kernel/mn10300-serial.c index d7ef1232a82a..4994b570dfd9 100644 --- a/arch/mn10300/kernel/mn10300-serial.c +++ b/arch/mn10300/kernel/mn10300-serial.c @@ -550,7 +550,7 @@ try_again: return; } - smp_read_barrier_depends(); + /* READ_ONCE() enforces dependency, but dangerous through integer!!! */ ch = port->rx_buffer[ix++]; st = port->rx_buffer[ix++]; smp_mb(); @@ -1728,7 +1728,10 @@ static int mn10300_serial_poll_get_char(struct uart_port *_port) if (CIRC_CNT(port->rx_inp, ix, MNSC_BUFFER_SIZE) == 0) return NO_POLL_CHAR; - smp_read_barrier_depends(); + /* + * READ_ONCE() enforces dependency, but dangerous + * through integer!!! + */ ch = port->rx_buffer[ix++]; st = port->rx_buffer[ix++]; smp_mb(); diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c index 58d4ccd33672..8b5b23a8ace9 100644 --- a/drivers/dma/ioat/dma.c +++ b/drivers/dma/ioat/dma.c @@ -597,7 +597,6 @@ static void __cleanup(struct ioatdma_chan *ioat_chan, dma_addr_t phys_complete) for (i = 0; i < active && !seen_current; i++) { struct dma_async_tx_descriptor *tx; - smp_read_barrier_depends(); prefetch(ioat_get_ring_ent(ioat_chan, idx + i + 1)); desc = ioat_get_ring_ent(ioat_chan, idx + i); dump_desc_dbg(ioat_chan, desc); @@ -715,7 +714,6 @@ static void ioat_abort_descs(struct ioatdma_chan *ioat_chan) for (i = 1; i < active; i++) { struct dma_async_tx_descriptor *tx; - smp_read_barrier_depends(); prefetch(ioat_get_ring_ent(ioat_chan, idx + i + 1)); desc = ioat_get_ring_ent(ioat_chan, idx + i); diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index cbf186522016..5cd700421695 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -4,6 +4,7 @@ menuconfig INFINIBAND depends on NET depends on INET depends on m || IPV6 != m + depends on !ALPHA select IRQ_POLL ---help--- Core support for InfiniBand (IB). Make sure to also select diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c index af5f7936f7e5..7eb5d50578ba 100644 --- a/drivers/infiniband/hw/hfi1/rc.c +++ b/drivers/infiniband/hw/hfi1/rc.c @@ -302,7 +302,6 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND)) goto bail; /* We are in the error state, flush the work request. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_last == READ_ONCE(qp->s_head)) goto bail; /* If DMAs are in progress, we can't flush immediately. */ @@ -346,7 +345,6 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) newreq = 0; if (qp->s_cur == qp->s_tail) { /* Check if send work queue is empty. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_tail == READ_ONCE(qp->s_head)) { clear_ahg(qp); goto bail; @@ -900,7 +898,6 @@ void hfi1_send_rc_ack(struct hfi1_ctxtdata *rcd, } /* Ensure s_rdma_ack_cnt changes are committed */ - smp_read_barrier_depends(); if (qp->s_rdma_ack_cnt) { hfi1_queue_rc_ack(qp, is_fecn); return; @@ -1562,7 +1559,6 @@ static void rc_rcv_resp(struct hfi1_packet *packet) trace_hfi1_ack(qp, psn); /* Ignore invalid responses. */ - smp_read_barrier_depends(); /* see post_one_send */ if (cmp_psn(psn, READ_ONCE(qp->s_next_psn)) >= 0) goto ack_done; diff --git a/drivers/infiniband/hw/hfi1/ruc.c b/drivers/infiniband/hw/hfi1/ruc.c index 2c7fc6e331ea..13b994738f41 100644 --- a/drivers/infiniband/hw/hfi1/ruc.c +++ b/drivers/infiniband/hw/hfi1/ruc.c @@ -362,7 +362,6 @@ static void ruc_loopback(struct rvt_qp *sqp) sqp->s_flags |= RVT_S_BUSY; again: - smp_read_barrier_depends(); /* see post_one_send() */ if (sqp->s_last == READ_ONCE(sqp->s_head)) goto clr_busy; wqe = rvt_get_swqe_ptr(sqp, sqp->s_last); diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c index 31c8f89b5fc8..61c130dbed10 100644 --- a/drivers/infiniband/hw/hfi1/sdma.c +++ b/drivers/infiniband/hw/hfi1/sdma.c @@ -553,7 +553,6 @@ static void sdma_hw_clean_up_task(unsigned long opaque) static inline struct sdma_txreq *get_txhead(struct sdma_engine *sde) { - smp_read_barrier_depends(); /* see sdma_update_tail() */ return sde->tx_ring[sde->tx_head & sde->sdma_mask]; } diff --git a/drivers/infiniband/hw/hfi1/uc.c b/drivers/infiniband/hw/hfi1/uc.c index 991bbee04821..132b63e787d1 100644 --- a/drivers/infiniband/hw/hfi1/uc.c +++ b/drivers/infiniband/hw/hfi1/uc.c @@ -79,7 +79,6 @@ int hfi1_make_uc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND)) goto bail; /* We are in the error state, flush the work request. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_last == READ_ONCE(qp->s_head)) goto bail; /* If DMAs are in progress, we can't flush immediately. */ @@ -119,7 +118,6 @@ int hfi1_make_uc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) RVT_PROCESS_NEXT_SEND_OK)) goto bail; /* Check if send work queue is empty. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_cur == READ_ONCE(qp->s_head)) { clear_ahg(qp); goto bail; diff --git a/drivers/infiniband/hw/hfi1/ud.c b/drivers/infiniband/hw/hfi1/ud.c index beb5091eccca..deb184574395 100644 --- a/drivers/infiniband/hw/hfi1/ud.c +++ b/drivers/infiniband/hw/hfi1/ud.c @@ -486,7 +486,6 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND)) goto bail; /* We are in the error state, flush the work request. */ - smp_read_barrier_depends(); /* see post_one_send */ if (qp->s_last == READ_ONCE(qp->s_head)) goto bail; /* If DMAs are in progress, we can't flush immediately. */ @@ -500,7 +499,6 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) } /* see post_one_send() */ - smp_read_barrier_depends(); if (qp->s_cur == READ_ONCE(qp->s_head)) goto bail; diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c index 8f5754fb8579..1a785c37ad0a 100644 --- a/drivers/infiniband/hw/qib/qib_rc.c +++ b/drivers/infiniband/hw/qib/qib_rc.c @@ -246,7 +246,6 @@ int qib_make_rc_req(struct rvt_qp *qp, unsigned long *flags) if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND)) goto bail; /* We are in the error state, flush the work request. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_last == READ_ONCE(qp->s_head)) goto bail; /* If DMAs are in progress, we can't flush immediately. */ @@ -293,7 +292,6 @@ int qib_make_rc_req(struct rvt_qp *qp, unsigned long *flags) newreq = 0; if (qp->s_cur == qp->s_tail) { /* Check if send work queue is empty. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_tail == READ_ONCE(qp->s_head)) goto bail; /* @@ -1340,7 +1338,6 @@ static void qib_rc_rcv_resp(struct qib_ibport *ibp, goto ack_done; /* Ignore invalid responses. */ - smp_read_barrier_depends(); /* see post_one_send */ if (qib_cmp24(psn, READ_ONCE(qp->s_next_psn)) >= 0) goto ack_done; diff --git a/drivers/infiniband/hw/qib/qib_ruc.c b/drivers/infiniband/hw/qib/qib_ruc.c index 9a37e844d4c8..4662cc7bde92 100644 --- a/drivers/infiniband/hw/qib/qib_ruc.c +++ b/drivers/infiniband/hw/qib/qib_ruc.c @@ -367,7 +367,6 @@ static void qib_ruc_loopback(struct rvt_qp *sqp) sqp->s_flags |= RVT_S_BUSY; again: - smp_read_barrier_depends(); /* see post_one_send() */ if (sqp->s_last == READ_ONCE(sqp->s_head)) goto clr_busy; wqe = rvt_get_swqe_ptr(sqp, sqp->s_last); diff --git a/drivers/infiniband/hw/qib/qib_uc.c b/drivers/infiniband/hw/qib/qib_uc.c index bddcc37ace44..70c58b88192c 100644 --- a/drivers/infiniband/hw/qib/qib_uc.c +++ b/drivers/infiniband/hw/qib/qib_uc.c @@ -60,7 +60,6 @@ int qib_make_uc_req(struct rvt_qp *qp, unsigned long *flags) if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND)) goto bail; /* We are in the error state, flush the work request. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_last == READ_ONCE(qp->s_head)) goto bail; /* If DMAs are in progress, we can't flush immediately. */ @@ -90,7 +89,6 @@ int qib_make_uc_req(struct rvt_qp *qp, unsigned long *flags) RVT_PROCESS_NEXT_SEND_OK)) goto bail; /* Check if send work queue is empty. */ - smp_read_barrier_depends(); /* see post_one_send() */ if (qp->s_cur == READ_ONCE(qp->s_head)) goto bail; /* diff --git a/drivers/infiniband/hw/qib/qib_ud.c b/drivers/infiniband/hw/qib/qib_ud.c index 15962ed193ce..386c3c4da0c7 100644 --- a/drivers/infiniband/hw/qib/qib_ud.c +++ b/drivers/infiniband/hw/qib/qib_ud.c @@ -252,7 +252,6 @@ int qib_make_ud_req(struct rvt_qp *qp, unsigned long *flags) if (!(ib_rvt_state_ops[qp->state] & RVT_FLUSH_SEND)) goto bail; /* We are in the error state, flush the work request. */ - smp_read_barrier_depends(); /* see post_one_send */ if (qp->s_last == READ_ONCE(qp->s_head)) goto bail; /* If DMAs are in progress, we can't flush immediately. */ @@ -266,7 +265,6 @@ int qib_make_ud_req(struct rvt_qp *qp, unsigned long *flags) } /* see post_one_send() */ - smp_read_barrier_depends(); if (qp->s_cur == READ_ONCE(qp->s_head)) goto bail; diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c index 9177df60742a..eae84c216e2f 100644 --- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -1684,7 +1684,6 @@ static inline int rvt_qp_is_avail( /* non-reserved operations */ if (likely(qp->s_avail)) return 0; - smp_read_barrier_depends(); /* see rc.c */ slast = READ_ONCE(qp->s_last); if (qp->s_head >= slast) avail = qp->s_size - (qp->s_head - slast); diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c index be48d9abd001..c1237ec58b6c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_spq.c +++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c @@ -97,9 +97,7 @@ static int __qed_spq_block(struct qed_hwfn *p_hwfn, while (iter_cnt--) { /* Validate we receive completion update */ - if (READ_ONCE(comp_done->done) == 1) { - /* Read updated FW return value */ - smp_read_barrier_depends(); + if (smp_load_acquire(&comp_done->done) == 1) { /* ^^^ */ if (p_fw_ret) *p_fw_ret = comp_done->fw_return_code; return 0; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 33ac2b186b85..78b5940a415a 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1877,12 +1877,7 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc) return -1U; /* Check they're not leading us off end of descriptors. */ - next = vhost16_to_cpu(vq, desc->next); - /* Make sure compiler knows to grab that: we don't want it changing! */ - /* We will use the result as an index in an array, so most - * architectures only need a compiler barrier here. */ - read_barrier_depends(); - + next = vhost16_to_cpu(vq, READ_ONCE(desc->next)); return next; } diff --git a/fs/dcache.c b/fs/dcache.c index 5c7df1df81ff..379dce86f001 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1636,8 +1636,7 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name) dname[name->len] = 0; /* Make sure we always see the terminating NUL character */ - smp_wmb(); - dentry->d_name.name = dname; + smp_store_release(&dentry->d_name.name, dname); /* ^^^ */ dentry->d_lockref.count = 1; dentry->d_flags = 0; @@ -3047,17 +3046,14 @@ static int prepend(char **buffer, int *buflen, const char *str, int namelen) * retry it again when a d_move() does happen. So any garbage in the buffer * due to mismatched pointer and length will be discarded. * - * Data dependency barrier is needed to make sure that we see that terminating - * NUL. Alpha strikes again, film at 11... + * Load acquire is needed to make sure that we see that terminating NUL. */ static int prepend_name(char **buffer, int *buflen, const struct qstr *name) { - const char *dname = READ_ONCE(name->name); + const char *dname = smp_load_acquire(&name->name); /* ^^^ */ u32 dlen = READ_ONCE(name->len); char *p; - smp_read_barrier_depends(); - *buflen -= dlen + 1; if (*buflen < 0) return -ENAMETOOLONG; diff --git a/fs/file.c b/fs/file.c index 3b080834b870..fc0eeb812e2c 100644 --- a/fs/file.c +++ b/fs/file.c @@ -391,7 +391,7 @@ static struct fdtable *close_files(struct files_struct * files) struct file * file = xchg(&fdt->fd[i], NULL); if (file) { filp_close(file, files); - cond_resched_rcu_qs(); + cond_resched(); } } i++; diff --git a/include/linux/genetlink.h b/include/linux/genetlink.h index ecc2928e8046..bc738504ab4a 100644 --- a/include/linux/genetlink.h +++ b/include/linux/genetlink.h @@ -31,8 +31,7 @@ extern wait_queue_head_t genl_sk_destructing_waitq; * @p: The pointer to read, prior to dereferencing * * Return the value of the specified RCU-protected pointer, but omit - * both the smp_read_barrier_depends() and the READ_ONCE(), because - * caller holds genl mutex. + * the READ_ONCE(), because caller holds genl mutex. */ #define genl_dereference(p) \ rcu_dereference_protected(p, lockdep_genl_is_held()) diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h index 495ba4dd9da5..34551f8aaf9d 100644 --- a/include/linux/netfilter/nfnetlink.h +++ b/include/linux/netfilter/nfnetlink.h @@ -67,8 +67,7 @@ static inline bool lockdep_nfnl_is_held(__u8 subsys_id) * @ss: The nfnetlink subsystem ID * * Return the value of the specified RCU-protected pointer, but omit - * both the smp_read_barrier_depends() and the READ_ONCE(), because - * caller holds the NFNL subsystem mutex. + * the READ_ONCE(), because caller holds the NFNL subsystem mutex. */ #define nfnl_dereference(p, ss) \ rcu_dereference_protected(p, lockdep_nfnl_is_held(ss)) diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h index 6658d9ee5257..864d167a1073 100644 --- a/include/linux/percpu-refcount.h +++ b/include/linux/percpu-refcount.h @@ -139,12 +139,12 @@ static inline bool __ref_is_percpu(struct percpu_ref *ref, * when using it as a pointer, __PERCPU_REF_ATOMIC may be set in * between contaminating the pointer value, meaning that * READ_ONCE() is required when fetching it. + * + * The smp_read_barrier_depends() implied by READ_ONCE() pairs + * with smp_store_release() in __percpu_ref_switch_to_percpu(). */ percpu_ptr = READ_ONCE(ref->percpu_count_ptr); - /* paired with smp_store_release() in __percpu_ref_switch_to_percpu() */ - smp_read_barrier_depends(); - /* * Theoretically, the following could test just ATOMIC; however, * then we'd have to mask off DEAD separately as DEAD may be diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index a6ddc42f87a5..043d04784675 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -197,7 +197,7 @@ static inline void exit_tasks_rcu_finish(void) { } #define cond_resched_rcu_qs() \ do { \ if (!cond_resched()) \ - rcu_note_voluntary_context_switch(current); \ + rcu_note_voluntary_context_switch_lite(current); \ } while (0) /* @@ -433,12 +433,12 @@ static inline void rcu_preempt_sleep_check(void) { } * @p: The pointer to read * * Return the value of the specified RCU-protected pointer, but omit the - * smp_read_barrier_depends() and keep the READ_ONCE(). This is useful - * when the value of this pointer is accessed, but the pointer is not - * dereferenced, for example, when testing an RCU-protected pointer against - * NULL. Although rcu_access_pointer() may also be used in cases where - * update-side locks prevent the value of the pointer from changing, you - * should instead use rcu_dereference_protected() for this use case. + * lockdep checks for being in an RCU read-side critical section. This is + * useful when the value of this pointer is accessed, but the pointer is + * not dereferenced, for example, when testing an RCU-protected pointer + * against NULL. Although rcu_access_pointer() may also be used in cases + * where update-side locks prevent the value of the pointer from changing, + * you should instead use rcu_dereference_protected() for this use case. * * It is also permissible to use rcu_access_pointer() when read-side * access to the pointer was removed at least one grace period ago, as @@ -521,12 +521,11 @@ static inline void rcu_preempt_sleep_check(void) { } * @c: The conditions under which the dereference will take place * * Return the value of the specified RCU-protected pointer, but omit - * both the smp_read_barrier_depends() and the READ_ONCE(). This - * is useful in cases where update-side locks prevent the value of the - * pointer from changing. Please note that this primitive does *not* - * prevent the compiler from repeating this reference or combining it - * with other references, so it should not be used without protection - * of appropriate locks. + * the READ_ONCE(). This is useful in cases where update-side locks + * prevent the value of the pointer from changing. Please note that this + * primitive does *not* prevent the compiler from repeating this reference + * or combining it with other references, so it should not be used without + * protection of appropriate locks. * * This function is only for update-side use. Using this function * when protected only by rcu_read_lock() will result in infrequent diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index b3dbf9502fd0..ce9beec35e34 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -111,7 +111,6 @@ static inline void rcu_cpu_stall_reset(void) { } static inline void rcu_idle_enter(void) { } static inline void rcu_idle_exit(void) { } static inline void rcu_irq_enter(void) { } -static inline bool rcu_irq_enter_disabled(void) { return false; } static inline void rcu_irq_exit_irqson(void) { } static inline void rcu_irq_enter_irqson(void) { } static inline void rcu_irq_exit(void) { } diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 37d6fd3b7ff8..fd996cdf1833 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -85,7 +85,6 @@ void rcu_irq_enter(void); void rcu_irq_exit(void); void rcu_irq_enter_irqson(void); void rcu_irq_exit_irqson(void); -bool rcu_irq_enter_disabled(void); void exit_rcu(void); diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 2032ce2eb20b..1eadec3fc228 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -70,8 +70,7 @@ static inline bool lockdep_rtnl_is_held(void) * @p: The pointer to read, prior to dereferencing * * Return the value of the specified RCU-protected pointer, but omit - * both the smp_read_barrier_depends() and the READ_ONCE(), because - * caller holds RTNL. + * the READ_ONCE(), because caller holds RTNL. */ #define rtnl_dereference(p) \ rcu_dereference_protected(p, lockdep_rtnl_is_held()) diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index f189a8a3bbb8..bcf4cf26b8c8 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -278,9 +278,8 @@ static inline void raw_write_seqcount_barrier(seqcount_t *s) static inline int raw_read_seqcount_latch(seqcount_t *s) { - int seq = READ_ONCE(s->sequence); /* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */ - smp_read_barrier_depends(); + int seq = READ_ONCE(s->sequence); /* ^^^ */ return seq; } diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index a949f4f9e4d7..4eda108abee0 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -40,7 +40,7 @@ struct srcu_data { unsigned long srcu_unlock_count[2]; /* Unlocks per CPU. */ /* Update-side state. */ - raw_spinlock_t __private lock ____cacheline_internodealigned_in_smp; + spinlock_t __private lock ____cacheline_internodealigned_in_smp; struct rcu_segcblist srcu_cblist; /* List of callbacks.*/ unsigned long srcu_gp_seq_needed; /* Furthest future GP needed. */ unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */ @@ -58,7 +58,7 @@ struct srcu_data { * Node in SRCU combining tree, similar in function to rcu_data. */ struct srcu_node { - raw_spinlock_t __private lock; + spinlock_t __private lock; unsigned long srcu_have_cbs[4]; /* GP seq for children */ /* having CBs, but only */ /* is > ->srcu_gq_seq. */ @@ -78,7 +78,7 @@ struct srcu_struct { struct srcu_node *level[RCU_NUM_LVLS + 1]; /* First node at each level. */ struct mutex srcu_cb_mutex; /* Serialize CB preparation. */ - raw_spinlock_t __private lock; /* Protect counters */ + spinlock_t __private lock; /* Protect counters */ struct mutex srcu_gp_mutex; /* Serialize GP work. */ unsigned int srcu_idx; /* Current rdr array element. */ unsigned long srcu_gp_seq; /* Grace-period seq #. */ @@ -107,7 +107,7 @@ struct srcu_struct { #define __SRCU_STRUCT_INIT(name) \ { \ .sda = &name##_srcu_data, \ - .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ + .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ .srcu_gp_seq_needed = 0 - 1, \ __SRCU_DEP_MAP_INIT(name) \ } diff --git a/include/linux/torture.h b/include/linux/torture.h index a45702eb3e7b..66272862070b 100644 --- a/include/linux/torture.h +++ b/include/linux/torture.h @@ -79,7 +79,7 @@ void stutter_wait(const char *title); int torture_stutter_init(int s); /* Initialization and cleanup. */ -bool torture_init_begin(char *ttype, bool v, int *runnable); +bool torture_init_begin(char *ttype, bool v); void torture_init_end(void); bool torture_cleanup_begin(void); void torture_cleanup_end(void); @@ -96,4 +96,10 @@ void _torture_stop_kthread(char *m, struct task_struct **tp); #define torture_stop_kthread(n, tp) \ _torture_stop_kthread("Stopping " #n " task", &(tp)) +#ifdef CONFIG_PREEMPT +#define torture_preempt_schedule() preempt_schedule() +#else +#define torture_preempt_schedule() +#endif + #endif /* __LINUX_TORTURE_H */ diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index a26ffbe09e71..c94f466d57ef 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -137,11 +137,8 @@ extern void syscall_unregfunc(void); \ if (!(cond)) \ return; \ - if (rcucheck) { \ - if (WARN_ON_ONCE(rcu_irq_enter_disabled())) \ - return; \ + if (rcucheck) \ rcu_irq_enter_irqson(); \ - } \ rcu_read_lock_sched_notrace(); \ it_func_ptr = rcu_dereference_sched((tp)->funcs); \ if (it_func_ptr) { \ diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index 59d40c454aa0..0b50fda80db0 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -243,6 +243,7 @@ TRACE_EVENT(rcu_exp_funnel_lock, __entry->grphi, __entry->gpevent) ); +#ifdef CONFIG_RCU_NOCB_CPU /* * Tracepoint for RCU no-CBs CPU callback handoffs. This event is intended * to assist debugging of these handoffs. @@ -285,6 +286,7 @@ TRACE_EVENT(rcu_nocb_wake, TP_printk("%s %d %s", __entry->rcuname, __entry->cpu, __entry->reason) ); +#endif /* * Tracepoint for tasks blocking within preemptible-RCU read-side @@ -421,76 +423,40 @@ TRACE_EVENT(rcu_fqs, /* * Tracepoint for dyntick-idle entry/exit events. These take a string - * as argument: "Start" for entering dyntick-idle mode, "End" for - * leaving it, "--=" for events moving towards idle, and "++=" for events - * moving away from idle. "Error on entry: not idle task" and "Error on - * exit: not idle task" indicate that a non-idle task is erroneously - * toying with the idle loop. + * as argument: "Start" for entering dyntick-idle mode, "Startirq" for + * entering it from irq/NMI, "End" for leaving it, "Endirq" for leaving it + * to irq/NMI, "--=" for events moving towards idle, and "++=" for events + * moving away from idle. * * These events also take a pair of numbers, which indicate the nesting - * depth before and after the event of interest. Note that task-related - * events use the upper bits of each number, while interrupt-related - * events use the lower bits. + * depth before and after the event of interest, and a third number that is + * the ->dynticks counter. Note that task-related and interrupt-related + * events use two separate counters, and that the "++=" and "--=" events + * for irq/NMI will change the counter by two, otherwise by one. */ TRACE_EVENT(rcu_dyntick, - TP_PROTO(const char *polarity, long long oldnesting, long long newnesting), + TP_PROTO(const char *polarity, long oldnesting, long newnesting, atomic_t dynticks), - TP_ARGS(polarity, oldnesting, newnesting), + TP_ARGS(polarity, oldnesting, newnesting, dynticks), TP_STRUCT__entry( __field(const char *, polarity) - __field(long long, oldnesting) - __field(long long, newnesting) + __field(long, oldnesting) + __field(long, newnesting) + __field(int, dynticks) ), TP_fast_assign( __entry->polarity = polarity; __entry->oldnesting = oldnesting; __entry->newnesting = newnesting; + __entry->dynticks = atomic_read(&dynticks); ), - TP_printk("%s %llx %llx", __entry->polarity, - __entry->oldnesting, __entry->newnesting) -); - -/* - * Tracepoint for RCU preparation for idle, the goal being to get RCU - * processing done so that the current CPU can shut off its scheduling - * clock and enter dyntick-idle mode. One way to accomplish this is - * to drain all RCU callbacks from this CPU, and the other is to have - * done everything RCU requires for the current grace period. In this - * latter case, the CPU will be awakened at the end of the current grace - * period in order to process the remainder of its callbacks. - * - * These tracepoints take a string as argument: - * - * "No callbacks": Nothing to do, no callbacks on this CPU. - * "In holdoff": Nothing to do, holding off after unsuccessful attempt. - * "Begin holdoff": Attempt failed, don't retry until next jiffy. - * "Dyntick with callbacks": Entering dyntick-idle despite callbacks. - * "Dyntick with lazy callbacks": Entering dyntick-idle w/lazy callbacks. - * "More callbacks": Still more callbacks, try again to clear them out. - * "Callbacks drained": All callbacks processed, off to dyntick idle! - * "Timer": Timer fired to cause CPU to continue processing callbacks. - * "Demigrate": Timer fired on wrong CPU, woke up correct CPU. - * "Cleanup after idle": Idle exited, timer canceled. - */ -TRACE_EVENT(rcu_prep_idle, - - TP_PROTO(const char *reason), - - TP_ARGS(reason), - - TP_STRUCT__entry( - __field(const char *, reason) - ), - - TP_fast_assign( - __entry->reason = reason; - ), - - TP_printk("%s", __entry->reason) + TP_printk("%s %lx %lx %#3x", __entry->polarity, + __entry->oldnesting, __entry->newnesting, + __entry->dynticks & 0xfff) ); /* @@ -799,8 +765,7 @@ TRACE_EVENT(rcu_barrier, grplo, grphi, gp_tasks) do { } \ while (0) #define trace_rcu_fqs(rcuname, gpnum, cpu, qsevent) do { } while (0) -#define trace_rcu_dyntick(polarity, oldnesting, newnesting) do { } while (0) -#define trace_rcu_prep_idle(reason) do { } while (0) +#define trace_rcu_dyntick(polarity, oldnesting, newnesting, dyntick) do { } while (0) #define trace_rcu_callback(rcuname, rhp, qlen_lazy, qlen) do { } while (0) #define trace_rcu_kfree_callback(rcuname, rhp, offset, qlen_lazy, qlen) \ do { } while (0) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 267f6ef91d97..ce6848e46e94 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1167,8 +1167,8 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) } ret = 0; - smp_wmb(); /* pairs with get_xol_area() */ - mm->uprobes_state.xol_area = area; + /* pairs with get_xol_area() */ + smp_store_release(&mm->uprobes_state.xol_area, area); /* ^^^ */ fail: up_write(&mm->mmap_sem); @@ -1230,8 +1230,8 @@ static struct xol_area *get_xol_area(void) if (!mm->uprobes_state.xol_area) __create_xol_area(0); - area = mm->uprobes_state.xol_area; - smp_read_barrier_depends(); /* pairs with wmb in xol_add_vma() */ + /* Pairs with xol_add_vma() smp_store_release() */ + area = READ_ONCE(mm->uprobes_state.xol_area); /* ^^^ */ return area; } @@ -1528,8 +1528,8 @@ static unsigned long get_trampoline_vaddr(void) struct xol_area *area; unsigned long trampoline_vaddr = -1; - area = current->mm->uprobes_state.xol_area; - smp_read_barrier_depends(); + /* Pairs with xol_add_vma() smp_store_release() */ + area = READ_ONCE(current->mm->uprobes_state.xol_area); /* ^^^ */ if (area) trampoline_vaddr = area->vaddr; diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index f24582d4dad3..6850ffd69125 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -77,10 +77,6 @@ struct lock_stress_stats { long n_lock_acquired; }; -int torture_runnable = IS_ENABLED(MODULE); -module_param(torture_runnable, int, 0444); -MODULE_PARM_DESC(torture_runnable, "Start locktorture at module init"); - /* Forward reference. */ static void lock_torture_cleanup(void); @@ -130,10 +126,8 @@ static void torture_lock_busted_write_delay(struct torture_random_state *trsp) if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2000 * longdelay_ms))) mdelay(longdelay_ms); -#ifdef CONFIG_PREEMPT if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) - preempt_schedule(); /* Allow test to be preempted. */ -#endif + torture_preempt_schedule(); /* Allow test to be preempted. */ } static void torture_lock_busted_write_unlock(void) @@ -179,10 +173,8 @@ static void torture_spin_lock_write_delay(struct torture_random_state *trsp) if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2 * shortdelay_us))) udelay(shortdelay_us); -#ifdef CONFIG_PREEMPT if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) - preempt_schedule(); /* Allow test to be preempted. */ -#endif + torture_preempt_schedule(); /* Allow test to be preempted. */ } static void torture_spin_lock_write_unlock(void) __releases(torture_spinlock) @@ -352,10 +344,8 @@ static void torture_mutex_delay(struct torture_random_state *trsp) mdelay(longdelay_ms * 5); else mdelay(longdelay_ms / 5); -#ifdef CONFIG_PREEMPT if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) - preempt_schedule(); /* Allow test to be preempted. */ -#endif + torture_preempt_schedule(); /* Allow test to be preempted. */ } static void torture_mutex_unlock(void) __releases(torture_mutex) @@ -507,10 +497,8 @@ static void torture_rtmutex_delay(struct torture_random_state *trsp) if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 2 * shortdelay_us))) udelay(shortdelay_us); -#ifdef CONFIG_PREEMPT if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) - preempt_schedule(); /* Allow test to be preempted. */ -#endif + torture_preempt_schedule(); /* Allow test to be preempted. */ } static void torture_rtmutex_unlock(void) __releases(torture_rtmutex) @@ -547,10 +535,8 @@ static void torture_rwsem_write_delay(struct torture_random_state *trsp) mdelay(longdelay_ms * 10); else mdelay(longdelay_ms / 10); -#ifdef CONFIG_PREEMPT if (!(torture_random(trsp) % (cxt.nrealwriters_stress * 20000))) - preempt_schedule(); /* Allow test to be preempted. */ -#endif + torture_preempt_schedule(); /* Allow test to be preempted. */ } static void torture_rwsem_up_write(void) __releases(torture_rwsem) @@ -570,14 +556,12 @@ static void torture_rwsem_read_delay(struct torture_random_state *trsp) /* We want a long delay occasionally to force massive contention. */ if (!(torture_random(trsp) % - (cxt.nrealwriters_stress * 2000 * longdelay_ms))) + (cxt.nrealreaders_stress * 2000 * longdelay_ms))) mdelay(longdelay_ms * 2); else mdelay(longdelay_ms / 2); -#ifdef CONFIG_PREEMPT if (!(torture_random(trsp) % (cxt.nrealreaders_stress * 20000))) - preempt_schedule(); /* Allow test to be preempted. */ -#endif + torture_preempt_schedule(); /* Allow test to be preempted. */ } static void torture_rwsem_up_read(void) __releases(torture_rwsem) @@ -715,8 +699,7 @@ static void __torture_print_stats(char *page, { bool fail = 0; int i, n_stress; - long max = 0; - long min = statp[0].n_lock_acquired; + long max = 0, min = statp ? statp[0].n_lock_acquired : 0; long long sum = 0; n_stress = write ? cxt.nrealwriters_stress : cxt.nrealreaders_stress; @@ -823,7 +806,7 @@ static void lock_torture_cleanup(void) * such, only perform the underlying torture-specific cleanups, * and avoid anything related to locktorture. */ - if (!cxt.lwsa) + if (!cxt.lwsa && !cxt.lrsa) goto end; if (writer_tasks) { @@ -879,7 +862,7 @@ static int __init lock_torture_init(void) &percpu_rwsem_lock_ops, }; - if (!torture_init_begin(torture_type, verbose, &torture_runnable)) + if (!torture_init_begin(torture_type, verbose)) return -EBUSY; /* Process args and tell the world that the torturer is on the job. */ @@ -898,6 +881,13 @@ static int __init lock_torture_init(void) firsterr = -EINVAL; goto unwind; } + + if (nwriters_stress == 0 && nreaders_stress == 0) { + pr_alert("lock-torture: must run at least one locking thread\n"); + firsterr = -EINVAL; + goto unwind; + } + if (cxt.cur_ops->init) cxt.cur_ops->init(); @@ -921,17 +911,19 @@ static int __init lock_torture_init(void) #endif /* Initialize the statistics so that each run gets its own numbers. */ + if (nwriters_stress) { + lock_is_write_held = 0; + cxt.lwsa = kmalloc(sizeof(*cxt.lwsa) * cxt.nrealwriters_stress, GFP_KERNEL); + if (cxt.lwsa == NULL) { + VERBOSE_TOROUT_STRING("cxt.lwsa: Out of memory"); + firsterr = -ENOMEM; + goto unwind; + } - lock_is_write_held = 0; - cxt.lwsa = kmalloc(sizeof(*cxt.lwsa) * cxt.nrealwriters_stress, GFP_KERNEL); - if (cxt.lwsa == NULL) { - VERBOSE_TOROUT_STRING("cxt.lwsa: Out of memory"); - firsterr = -ENOMEM; - goto unwind; - } - for (i = 0; i < cxt.nrealwriters_stress; i++) { - cxt.lwsa[i].n_lock_fail = 0; - cxt.lwsa[i].n_lock_acquired = 0; + for (i = 0; i < cxt.nrealwriters_stress; i++) { + cxt.lwsa[i].n_lock_fail = 0; + cxt.lwsa[i].n_lock_acquired = 0; + } } if (cxt.cur_ops->readlock) { @@ -948,19 +940,21 @@ static int __init lock_torture_init(void) cxt.nrealreaders_stress = cxt.nrealwriters_stress; } - lock_is_read_held = 0; - cxt.lrsa = kmalloc(sizeof(*cxt.lrsa) * cxt.nrealreaders_stress, GFP_KERNEL); - if (cxt.lrsa == NULL) { - VERBOSE_TOROUT_STRING("cxt.lrsa: Out of memory"); - firsterr = -ENOMEM; - kfree(cxt.lwsa); - cxt.lwsa = NULL; - goto unwind; - } - - for (i = 0; i < cxt.nrealreaders_stress; i++) { - cxt.lrsa[i].n_lock_fail = 0; - cxt.lrsa[i].n_lock_acquired = 0; + if (nreaders_stress) { + lock_is_read_held = 0; + cxt.lrsa = kmalloc(sizeof(*cxt.lrsa) * cxt.nrealreaders_stress, GFP_KERNEL); + if (cxt.lrsa == NULL) { + VERBOSE_TOROUT_STRING("cxt.lrsa: Out of memory"); + firsterr = -ENOMEM; + kfree(cxt.lwsa); + cxt.lwsa = NULL; + goto unwind; + } + + for (i = 0; i < cxt.nrealreaders_stress; i++) { + cxt.lrsa[i].n_lock_fail = 0; + cxt.lrsa[i].n_lock_acquired = 0; + } } } @@ -990,12 +984,14 @@ static int __init lock_torture_init(void) goto unwind; } - writer_tasks = kzalloc(cxt.nrealwriters_stress * sizeof(writer_tasks[0]), - GFP_KERNEL); - if (writer_tasks == NULL) { - VERBOSE_TOROUT_ERRSTRING("writer_tasks: Out of memory"); - firsterr = -ENOMEM; - goto unwind; + if (nwriters_stress) { + writer_tasks = kzalloc(cxt.nrealwriters_stress * sizeof(writer_tasks[0]), + GFP_KERNEL); + if (writer_tasks == NULL) { + VERBOSE_TOROUT_ERRSTRING("writer_tasks: Out of memory"); + firsterr = -ENOMEM; + goto unwind; + } } if (cxt.cur_ops->readlock) { diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 294294c71ba4..38ece035039e 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -170,7 +170,7 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock) * @tail : The new queue tail code word * Return: The previous queue tail code word * - * xchg(lock, tail) + * xchg(lock, tail), which heads an address dependency * * p,*,* -> n,*,* ; prev = xchg(lock, node) */ @@ -409,13 +409,11 @@ queue: if (old & _Q_TAIL_MASK) { prev = decode_tail(old); /* - * The above xchg_tail() is also a load of @lock which generates, - * through decode_tail(), a pointer. - * - * The address dependency matches the RELEASE of xchg_tail() - * such that the access to @prev must happen after. + * The above xchg_tail() is also a load of @lock which + * generates, through decode_tail(), a pointer. The address + * dependency matches the RELEASE of xchg_tail() such that + * the subsequent access to @prev happens after. */ - smp_read_barrier_depends(); WRITE_ONCE(prev->next, node); diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index 59c471de342a..6334f2c1abd0 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -30,31 +30,8 @@ #define RCU_TRACE(stmt) #endif /* #else #ifdef CONFIG_RCU_TRACE */ -/* - * Process-level increment to ->dynticks_nesting field. This allows for - * architectures that use half-interrupts and half-exceptions from - * process context. - * - * DYNTICK_TASK_NEST_MASK defines a field of width DYNTICK_TASK_NEST_WIDTH - * that counts the number of process-based reasons why RCU cannot - * consider the corresponding CPU to be idle, and DYNTICK_TASK_NEST_VALUE - * is the value used to increment or decrement this field. - * - * The rest of the bits could in principle be used to count interrupts, - * but this would mean that a negative-one value in the interrupt - * field could incorrectly zero out the DYNTICK_TASK_NEST_MASK field. - * We therefore provide a two-bit guard field defined by DYNTICK_TASK_MASK - * that is set to DYNTICK_TASK_FLAG upon initial exit from idle. - * The DYNTICK_TASK_EXIT_IDLE value is thus the combined value used upon - * initial exit from idle. - */ -#define DYNTICK_TASK_NEST_WIDTH 7 -#define DYNTICK_TASK_NEST_VALUE ((LLONG_MAX >> DYNTICK_TASK_NEST_WIDTH) + 1) -#define DYNTICK_TASK_NEST_MASK (LLONG_MAX - DYNTICK_TASK_NEST_VALUE + 1) -#define DYNTICK_TASK_FLAG ((DYNTICK_TASK_NEST_VALUE / 8) * 2) -#define DYNTICK_TASK_MASK ((DYNTICK_TASK_NEST_VALUE / 8) * 3) -#define DYNTICK_TASK_EXIT_IDLE (DYNTICK_TASK_NEST_VALUE + \ - DYNTICK_TASK_FLAG) +/* Offset to allow for unmatched rcu_irq_{enter,exit}(). */ +#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1) /* diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c index 1f87a02c3399..d1ebdf9868bb 100644 --- a/kernel/rcu/rcuperf.c +++ b/kernel/rcu/rcuperf.c @@ -106,10 +106,6 @@ static int rcu_perf_writer_state; #define MAX_MEAS 10000 #define MIN_MEAS 100 -static int perf_runnable = IS_ENABLED(MODULE); -module_param(perf_runnable, int, 0444); -MODULE_PARM_DESC(perf_runnable, "Start rcuperf at boot"); - /* * Operations vector for selecting different types of tests. */ @@ -646,7 +642,7 @@ rcu_perf_init(void) &tasks_ops, }; - if (!torture_init_begin(perf_type, verbose, &perf_runnable)) + if (!torture_init_begin(perf_type, verbose)) return -EBUSY; /* Process args and tell the world that the perf'er is on the job. */ diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 74f6b0146b98..308e6fdbced8 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -187,10 +187,6 @@ static const char *rcu_torture_writer_state_getname(void) return rcu_torture_writer_state_names[i]; } -static int torture_runnable = IS_ENABLED(MODULE); -module_param(torture_runnable, int, 0444); -MODULE_PARM_DESC(torture_runnable, "Start rcutorture at boot"); - #if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU) #define rcu_can_boost() 1 #else /* #if defined(CONFIG_RCU_BOOST) && !defined(CONFIG_HOTPLUG_CPU) */ @@ -315,11 +311,9 @@ static void rcu_read_delay(struct torture_random_state *rrsp) } if (!(torture_random(rrsp) % (nrealreaders * 2 * shortdelay_us))) udelay(shortdelay_us); -#ifdef CONFIG_PREEMPT if (!preempt_count() && - !(torture_random(rrsp) % (nrealreaders * 20000))) - preempt_schedule(); /* No QS if preempt_disable() in effect */ -#endif + !(torture_random(rrsp) % (nrealreaders * 500))) + torture_preempt_schedule(); /* QS only if preemptible. */ } static void rcu_torture_read_unlock(int idx) __releases(RCU) @@ -1731,7 +1725,7 @@ rcu_torture_init(void) &sched_ops, &tasks_ops, }; - if (!torture_init_begin(torture_type, verbose, &torture_runnable)) + if (!torture_init_begin(torture_type, verbose)) return -EBUSY; /* Process args and tell the world that the torturer is on the job. */ diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c index 6d5880089ff6..d5cea81378cc 100644 --- a/kernel/rcu/srcutree.c +++ b/kernel/rcu/srcutree.c @@ -53,6 +53,33 @@ static void srcu_invoke_callbacks(struct work_struct *work); static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay); static void process_srcu(struct work_struct *work); +/* Wrappers for lock acquisition and release, see raw_spin_lock_rcu_node(). */ +#define spin_lock_rcu_node(p) \ +do { \ + spin_lock(&ACCESS_PRIVATE(p, lock)); \ + smp_mb__after_unlock_lock(); \ +} while (0) + +#define spin_unlock_rcu_node(p) spin_unlock(&ACCESS_PRIVATE(p, lock)) + +#define spin_lock_irq_rcu_node(p) \ +do { \ + spin_lock_irq(&ACCESS_PRIVATE(p, lock)); \ + smp_mb__after_unlock_lock(); \ +} while (0) + +#define spin_unlock_irq_rcu_node(p) \ + spin_unlock_irq(&ACCESS_PRIVATE(p, lock)) + +#define spin_lock_irqsave_rcu_node(p, flags) \ +do { \ + spin_lock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \ + smp_mb__after_unlock_lock(); \ +} while (0) + +#define spin_unlock_irqrestore_rcu_node(p, flags) \ + spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags) \ + /* * Initialize SRCU combining tree. Note that statically allocated * srcu_struct structures might already have srcu_read_lock() and @@ -77,7 +104,7 @@ static void init_srcu_struct_nodes(struct srcu_struct *sp, bool is_static) /* Each pass through this loop initializes one srcu_node structure. */ rcu_for_each_node_breadth_first(sp, snp) { - raw_spin_lock_init(&ACCESS_PRIVATE(snp, lock)); + spin_lock_init(&ACCESS_PRIVATE(snp, lock)); WARN_ON_ONCE(ARRAY_SIZE(snp->srcu_have_cbs) != ARRAY_SIZE(snp->srcu_data_have_cbs)); for (i = 0; i < ARRAY_SIZE(snp->srcu_have_cbs); i++) { @@ -111,7 +138,7 @@ static void init_srcu_struct_nodes(struct srcu_struct *sp, bool is_static) snp_first = sp->level[level]; for_each_possible_cpu(cpu) { sdp = per_cpu_ptr(sp->sda, cpu); - raw_spin_lock_init(&ACCESS_PRIVATE(sdp, lock)); + spin_lock_init(&ACCESS_PRIVATE(sdp, lock)); rcu_segcblist_init(&sdp->srcu_cblist); sdp->srcu_cblist_invoking = false; sdp->srcu_gp_seq_needed = sp->srcu_gp_seq; @@ -170,7 +197,7 @@ int __init_srcu_struct(struct srcu_struct *sp, const char *name, /* Don't re-initialize a lock while it is held. */ debug_check_no_locks_freed((void *)sp, sizeof(*sp)); lockdep_init_map(&sp->dep_map, name, key, 0); - raw_spin_lock_init(&ACCESS_PRIVATE(sp, lock)); + spin_lock_init(&ACCESS_PRIVATE(sp, lock)); return init_srcu_struct_fields(sp, false); } EXPORT_SYMBOL_GPL(__init_srcu_struct); @@ -187,7 +214,7 @@ EXPORT_SYMBOL_GPL(__init_srcu_struct); */ int init_srcu_struct(struct srcu_struct *sp) { - raw_spin_lock_init(&ACCESS_PRIVATE(sp, lock)); + spin_lock_init(&ACCESS_PRIVATE(sp, lock)); return init_srcu_struct_fields(sp, false); } EXPORT_SYMBOL_GPL(init_srcu_struct); @@ -210,13 +237,13 @@ static void check_init_srcu_struct(struct srcu_struct *sp) /* The smp_load_acquire() pairs with the smp_store_release(). */ if (!rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq_needed))) /*^^^*/ return; /* Already initialized. */ - raw_spin_lock_irqsave_rcu_node(sp, flags); + spin_lock_irqsave_rcu_node(sp, flags); if (!rcu_seq_state(sp->srcu_gp_seq_needed)) { - raw_spin_unlock_irqrestore_rcu_node(sp, flags); + spin_unlock_irqrestore_rcu_node(sp, flags); return; } init_srcu_struct_fields(sp, true); - raw_spin_unlock_irqrestore_rcu_node(sp, flags); + spin_unlock_irqrestore_rcu_node(sp, flags); } /* @@ -513,7 +540,7 @@ static void srcu_gp_end(struct srcu_struct *sp) mutex_lock(&sp->srcu_cb_mutex); /* End the current grace period. */ - raw_spin_lock_irq_rcu_node(sp); + spin_lock_irq_rcu_node(sp); idx = rcu_seq_state(sp->srcu_gp_seq); WARN_ON_ONCE(idx != SRCU_STATE_SCAN2); cbdelay = srcu_get_delay(sp); @@ -522,7 +549,7 @@ static void srcu_gp_end(struct srcu_struct *sp) gpseq = rcu_seq_current(&sp->srcu_gp_seq); if (ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, gpseq)) sp->srcu_gp_seq_needed_exp = gpseq; - raw_spin_unlock_irq_rcu_node(sp); + spin_unlock_irq_rcu_node(sp); mutex_unlock(&sp->srcu_gp_mutex); /* A new grace period can start at this point. But only one. */ @@ -530,7 +557,7 @@ static void srcu_gp_end(struct srcu_struct *sp) idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs); idxnext = (idx + 1) % ARRAY_SIZE(snp->srcu_have_cbs); rcu_for_each_node_breadth_first(sp, snp) { - raw_spin_lock_irq_rcu_node(snp); + spin_lock_irq_rcu_node(snp); cbs = false; if (snp >= sp->level[rcu_num_lvls - 1]) cbs = snp->srcu_have_cbs[idx] == gpseq; @@ -540,7 +567,7 @@ static void srcu_gp_end(struct srcu_struct *sp) snp->srcu_gp_seq_needed_exp = gpseq; mask = snp->srcu_data_have_cbs[idx]; snp->srcu_data_have_cbs[idx] = 0; - raw_spin_unlock_irq_rcu_node(snp); + spin_unlock_irq_rcu_node(snp); if (cbs) srcu_schedule_cbs_snp(sp, snp, mask, cbdelay); @@ -548,11 +575,11 @@ static void srcu_gp_end(struct srcu_struct *sp) if (!(gpseq & counter_wrap_check)) for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) { sdp = per_cpu_ptr(sp->sda, cpu); - raw_spin_lock_irqsave_rcu_node(sdp, flags); + spin_lock_irqsave_rcu_node(sdp, flags); if (ULONG_CMP_GE(gpseq, sdp->srcu_gp_seq_needed + 100)) sdp->srcu_gp_seq_needed = gpseq; - raw_spin_unlock_irqrestore_rcu_node(sdp, flags); + spin_unlock_irqrestore_rcu_node(sdp, flags); } } @@ -560,17 +587,17 @@ static void srcu_gp_end(struct srcu_struct *sp) mutex_unlock(&sp->srcu_cb_mutex); /* Start a new grace period if needed. */ - raw_spin_lock_irq_rcu_node(sp); + spin_lock_irq_rcu_node(sp); gpseq = rcu_seq_current(&sp->srcu_gp_seq); if (!rcu_seq_state(gpseq) && ULONG_CMP_LT(gpseq, sp->srcu_gp_seq_needed)) { srcu_gp_start(sp); - raw_spin_unlock_irq_rcu_node(sp); + spin_unlock_irq_rcu_node(sp); /* Throttle expedited grace periods: Should be rare! */ srcu_reschedule(sp, rcu_seq_ctr(gpseq) & 0x3ff ? 0 : SRCU_INTERVAL); } else { - raw_spin_unlock_irq_rcu_node(sp); + spin_unlock_irq_rcu_node(sp); } } @@ -590,18 +617,18 @@ static void srcu_funnel_exp_start(struct srcu_struct *sp, struct srcu_node *snp, if (rcu_seq_done(&sp->srcu_gp_seq, s) || ULONG_CMP_GE(READ_ONCE(snp->srcu_gp_seq_needed_exp), s)) return; - raw_spin_lock_irqsave_rcu_node(snp, flags); + spin_lock_irqsave_rcu_node(snp, flags); if (ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s)) { - raw_spin_unlock_irqrestore_rcu_node(snp, flags); + spin_unlock_irqrestore_rcu_node(snp, flags); return; } WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s); - raw_spin_unlock_irqrestore_rcu_node(snp, flags); + spin_unlock_irqrestore_rcu_node(snp, flags); } - raw_spin_lock_irqsave_rcu_node(sp, flags); + spin_lock_irqsave_rcu_node(sp, flags); if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s)) sp->srcu_gp_seq_needed_exp = s; - raw_spin_unlock_irqrestore_rcu_node(sp, flags); + spin_unlock_irqrestore_rcu_node(sp, flags); } /* @@ -623,12 +650,12 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp, for (; snp != NULL; snp = snp->srcu_parent) { if (rcu_seq_done(&sp->srcu_gp_seq, s) && snp != sdp->mynode) return; /* GP already done and CBs recorded. */ - raw_spin_lock_irqsave_rcu_node(snp, flags); + spin_lock_irqsave_rcu_node(snp, flags); if (ULONG_CMP_GE(snp->srcu_have_cbs[idx], s)) { snp_seq = snp->srcu_have_cbs[idx]; if (snp == sdp->mynode && snp_seq == s) snp->srcu_data_have_cbs[idx] |= sdp->grpmask; - raw_spin_unlock_irqrestore_rcu_node(snp, flags); + spin_unlock_irqrestore_rcu_node(snp, flags); if (snp == sdp->mynode && snp_seq != s) { srcu_schedule_cbs_sdp(sdp, do_norm ? SRCU_INTERVAL @@ -644,11 +671,11 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp, snp->srcu_data_have_cbs[idx] |= sdp->grpmask; if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s)) snp->srcu_gp_seq_needed_exp = s; - raw_spin_unlock_irqrestore_rcu_node(snp, flags); + spin_unlock_irqrestore_rcu_node(snp, flags); } /* Top of tree, must ensure the grace period will be started. */ - raw_spin_lock_irqsave_rcu_node(sp, flags); + spin_lock_irqsave_rcu_node(sp, flags); if (ULONG_CMP_LT(sp->srcu_gp_seq_needed, s)) { /* * Record need for grace period s. Pair with load @@ -667,7 +694,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp, queue_delayed_work(system_power_efficient_wq, &sp->work, srcu_get_delay(sp)); } - raw_spin_unlock_irqrestore_rcu_node(sp, flags); + spin_unlock_irqrestore_rcu_node(sp, flags); } /* @@ -830,7 +857,7 @@ void __call_srcu(struct srcu_struct *sp, struct rcu_head *rhp, rhp->func = func; local_irq_save(flags); sdp = this_cpu_ptr(sp->sda); - raw_spin_lock_rcu_node(sdp); + spin_lock_rcu_node(sdp); rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false); rcu_segcblist_advance(&sdp->srcu_cblist, rcu_seq_current(&sp->srcu_gp_seq)); @@ -844,7 +871,7 @@ void __call_srcu(struct srcu_struct *sp, struct rcu_head *rhp, sdp->srcu_gp_seq_needed_exp = s; needexp = true; } - raw_spin_unlock_irqrestore_rcu_node(sdp, flags); + spin_unlock_irqrestore_rcu_node(sdp, flags); if (needgp) srcu_funnel_gp_start(sp, sdp, s, do_norm); else if (needexp) @@ -900,7 +927,7 @@ static void __synchronize_srcu(struct srcu_struct *sp, bool do_norm) /* * Make sure that later code is ordered after the SRCU grace - * period. This pairs with the raw_spin_lock_irq_rcu_node() + * period. This pairs with the spin_lock_irq_rcu_node() * in srcu_invoke_callbacks(). Unlike Tree RCU, this is needed * because the current CPU might have been totally uninvolved with * (and thus unordered against) that grace period. @@ -1024,7 +1051,7 @@ void srcu_barrier(struct srcu_struct *sp) */ for_each_possible_cpu(cpu) { sdp = per_cpu_ptr(sp->sda, cpu); - raw_spin_lock_irq_rcu_node(sdp); + spin_lock_irq_rcu_node(sdp); atomic_inc(&sp->srcu_barrier_cpu_cnt); sdp->srcu_barrier_head.func = srcu_barrier_cb; debug_rcu_head_queue(&sdp->srcu_barrier_head); @@ -1033,7 +1060,7 @@ void srcu_barrier(struct srcu_struct *sp) debug_rcu_head_unqueue(&sdp->srcu_barrier_head); atomic_dec(&sp->srcu_barrier_cpu_cnt); } - raw_spin_unlock_irq_rcu_node(sdp); + spin_unlock_irq_rcu_node(sdp); } /* Remove the initial count, at which point reaching zero can happen. */ @@ -1082,17 +1109,17 @@ static void srcu_advance_state(struct srcu_struct *sp) */ idx = rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq)); /* ^^^ */ if (idx == SRCU_STATE_IDLE) { - raw_spin_lock_irq_rcu_node(sp); + spin_lock_irq_rcu_node(sp); if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) { WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq)); - raw_spin_unlock_irq_rcu_node(sp); + spin_unlock_irq_rcu_node(sp); mutex_unlock(&sp->srcu_gp_mutex); return; } idx = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq)); if (idx == SRCU_STATE_IDLE) srcu_gp_start(sp); - raw_spin_unlock_irq_rcu_node(sp); + spin_unlock_irq_rcu_node(sp); if (idx != SRCU_STATE_IDLE) { mutex_unlock(&sp->srcu_gp_mutex); return; /* Someone else started the grace period. */ @@ -1141,19 +1168,19 @@ static void srcu_invoke_callbacks(struct work_struct *work) sdp = container_of(work, struct srcu_data, work.work); sp = sdp->sp; rcu_cblist_init(&ready_cbs); - raw_spin_lock_irq_rcu_node(sdp); + spin_lock_irq_rcu_node(sdp); rcu_segcblist_advance(&sdp->srcu_cblist, rcu_seq_current(&sp->srcu_gp_seq)); if (sdp->srcu_cblist_invoking || !rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) { - raw_spin_unlock_irq_rcu_node(sdp); + spin_unlock_irq_rcu_node(sdp); return; /* Someone else on the job or nothing to do. */ } /* We are on the job! Extract and invoke ready callbacks. */ sdp->srcu_cblist_invoking = true; rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs); - raw_spin_unlock_irq_rcu_node(sdp); + spin_unlock_irq_rcu_node(sdp); rhp = rcu_cblist_dequeue(&ready_cbs); for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) { debug_rcu_head_unqueue(rhp); @@ -1166,13 +1193,13 @@ static void srcu_invoke_callbacks(struct work_struct *work) * Update counts, accelerate new callbacks, and if needed, * schedule another round of callback invocation. */ - raw_spin_lock_irq_rcu_node(sdp); + spin_lock_irq_rcu_node(sdp); rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs); (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, rcu_seq_snap(&sp->srcu_gp_seq)); sdp->srcu_cblist_invoking = false; more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist); - raw_spin_unlock_irq_rcu_node(sdp); + spin_unlock_irq_rcu_node(sdp); if (more) srcu_schedule_cbs_sdp(sdp, 0); } @@ -1185,7 +1212,7 @@ static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay) { bool pushgp = true; - raw_spin_lock_irq_rcu_node(sp); + spin_lock_irq_rcu_node(sp); if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) { if (!WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq))) { /* All requests fulfilled, time to go idle. */ @@ -1195,7 +1222,7 @@ static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay) /* Outstanding request and no GP. Start one. */ srcu_gp_start(sp); } - raw_spin_unlock_irq_rcu_node(sp); + spin_unlock_irq_rcu_node(sp); if (pushgp) queue_delayed_work(system_power_efficient_wq, &sp->work, delay); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index f9c0ca2ccf0c..491bdf39f276 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -265,25 +265,12 @@ void rcu_bh_qs(void) #endif static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = { - .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE, + .dynticks_nesting = 1, + .dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE, .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR), }; /* - * There's a few places, currently just in the tracing infrastructure, - * that uses rcu_irq_enter() to make sure RCU is watching. But there's - * a small location where that will not even work. In those cases - * rcu_irq_enter_disabled() needs to be checked to make sure rcu_irq_enter() - * can be called. - */ -static DEFINE_PER_CPU(bool, disable_rcu_irq_enter); - -bool rcu_irq_enter_disabled(void) -{ - return this_cpu_read(disable_rcu_irq_enter); -} - -/* * Record entry into an extended quiescent state. This is only to be * called when not already in an extended quiescent state. */ @@ -762,68 +749,39 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp) } /* - * rcu_eqs_enter_common - current CPU is entering an extended quiescent state + * Enter an RCU extended quiescent state, which can be either the + * idle loop or adaptive-tickless usermode execution. * - * Enter idle, doing appropriate accounting. The caller must have - * disabled interrupts. + * We crowbar the ->dynticks_nmi_nesting field to zero to allow for + * the possibility of usermode upcalls having messed up our count + * of interrupt nesting level during the prior busy period. */ -static void rcu_eqs_enter_common(bool user) +static void rcu_eqs_enter(bool user) { struct rcu_state *rsp; struct rcu_data *rdp; - struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); + struct rcu_dynticks *rdtp; - lockdep_assert_irqs_disabled(); - trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0); - if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - !user && !is_idle_task(current)) { - struct task_struct *idle __maybe_unused = - idle_task(smp_processor_id()); - - trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0); - rcu_ftrace_dump(DUMP_ORIG); - WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", - current->pid, current->comm, - idle->pid, idle->comm); /* must be idle task! */ + rdtp = this_cpu_ptr(&rcu_dynticks); + WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && + rdtp->dynticks_nesting == 0); + if (rdtp->dynticks_nesting != 1) { + rdtp->dynticks_nesting--; + return; } + + lockdep_assert_irqs_disabled(); + trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0, rdtp->dynticks); + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)); for_each_rcu_flavor(rsp) { rdp = this_cpu_ptr(rsp->rda); do_nocb_deferred_wakeup(rdp); } rcu_prepare_for_idle(); - __this_cpu_inc(disable_rcu_irq_enter); - rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */ - rcu_dynticks_eqs_enter(); /* After this, tracing works again. */ - __this_cpu_dec(disable_rcu_irq_enter); + WRITE_ONCE(rdtp->dynticks_nesting, 0); /* Avoid irq-access tearing. */ + rcu_dynticks_eqs_enter(); rcu_dynticks_task_enter(); - - /* - * It is illegal to enter an extended quiescent state while - * in an RCU read-side critical section. - */ - RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map), - "Illegal idle entry in RCU read-side critical section."); - RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map), - "Illegal idle entry in RCU-bh read-side critical section."); - RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map), - "Illegal idle entry in RCU-sched read-side critical section."); -} - -/* - * Enter an RCU extended quiescent state, which can be either the - * idle loop or adaptive-tickless usermode execution. - */ -static void rcu_eqs_enter(bool user) -{ - struct rcu_dynticks *rdtp; - - rdtp = this_cpu_ptr(&rcu_dynticks); - WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0); - if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) - rcu_eqs_enter_common(user); - else - rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE; } /** @@ -834,10 +792,6 @@ static void rcu_eqs_enter(bool user) * critical sections can occur in irq handlers in idle, a possibility * handled by irq_enter() and irq_exit().) * - * We crowbar the ->dynticks_nesting field to zero to allow for - * the possibility of usermode upcalls having messed up our count - * of interrupt nesting level during the prior busy period. - * * If you add or remove a call to rcu_idle_enter(), be sure to test with * CONFIG_RCU_EQS_DEBUG=y. */ @@ -867,6 +821,46 @@ void rcu_user_enter(void) #endif /* CONFIG_NO_HZ_FULL */ /** + * rcu_nmi_exit - inform RCU of exit from NMI context + * + * If we are returning from the outermost NMI handler that interrupted an + * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting + * to let the RCU grace-period handling know that the CPU is back to + * being RCU-idle. + * + * If you add or remove a call to rcu_nmi_exit(), be sure to test + * with CONFIG_RCU_EQS_DEBUG=y. + */ +void rcu_nmi_exit(void) +{ + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); + + /* + * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks. + * (We are exiting an NMI handler, so RCU better be paying attention + * to us!) + */ + WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0); + WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs()); + + /* + * If the nesting level is not 1, the CPU wasn't RCU-idle, so + * leave it in non-RCU-idle state. + */ + if (rdtp->dynticks_nmi_nesting != 1) { + trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nmi_nesting, rdtp->dynticks_nmi_nesting - 2, rdtp->dynticks); + WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* No store tearing. */ + rdtp->dynticks_nmi_nesting - 2); + return; + } + + /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */ + trace_rcu_dyntick(TPS("Startirq"), rdtp->dynticks_nmi_nesting, 0, rdtp->dynticks); + WRITE_ONCE(rdtp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */ + rcu_dynticks_eqs_enter(); +} + +/** * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle * * Exit from an interrupt handler, which might possibly result in entering @@ -875,8 +869,8 @@ void rcu_user_enter(void) * * This code assumes that the idle loop never does anything that might * result in unbalanced calls to irq_enter() and irq_exit(). If your - * architecture violates this assumption, RCU will give you what you - * deserve, good and hard. But very infrequently and irreproducibly. + * architecture's idle loop violates this assumption, RCU will give you what + * you deserve, good and hard. But very infrequently and irreproducibly. * * Use things like work queues to work around this limitation. * @@ -887,23 +881,14 @@ void rcu_user_enter(void) */ void rcu_irq_exit(void) { - struct rcu_dynticks *rdtp; + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); lockdep_assert_irqs_disabled(); - rdtp = this_cpu_ptr(&rcu_dynticks); - - /* Page faults can happen in NMI handlers, so check... */ - if (rdtp->dynticks_nmi_nesting) - return; - - WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - rdtp->dynticks_nesting < 1); - if (rdtp->dynticks_nesting <= 1) { - rcu_eqs_enter_common(true); - } else { - trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1); - rdtp->dynticks_nesting--; - } + if (rdtp->dynticks_nmi_nesting == 1) + rcu_prepare_for_idle(); + rcu_nmi_exit(); + if (rdtp->dynticks_nmi_nesting == 0) + rcu_dynticks_task_enter(); } /* @@ -922,55 +907,33 @@ void rcu_irq_exit_irqson(void) } /* - * rcu_eqs_exit_common - current CPU moving away from extended quiescent state - * - * If the new value of the ->dynticks_nesting counter was previously zero, - * we really have exited idle, and must do the appropriate accounting. - * The caller must have disabled interrupts. - */ -static void rcu_eqs_exit_common(long long oldval, int user) -{ - RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);) - - rcu_dynticks_task_exit(); - rcu_dynticks_eqs_exit(); - rcu_cleanup_after_idle(); - trace_rcu_dyntick(TPS("End"), oldval, rdtp->dynticks_nesting); - if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - !user && !is_idle_task(current)) { - struct task_struct *idle __maybe_unused = - idle_task(smp_processor_id()); - - trace_rcu_dyntick(TPS("Error on exit: not idle task"), - oldval, rdtp->dynticks_nesting); - rcu_ftrace_dump(DUMP_ORIG); - WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", - current->pid, current->comm, - idle->pid, idle->comm); /* must be idle task! */ - } -} - -/* * Exit an RCU extended quiescent state, which can be either the * idle loop or adaptive-tickless usermode execution. + * + * We crowbar the ->dynticks_nmi_nesting field to DYNTICK_IRQ_NONIDLE to + * allow for the possibility of usermode upcalls messing up our count of + * interrupt nesting level during the busy period that is just now starting. */ static void rcu_eqs_exit(bool user) { struct rcu_dynticks *rdtp; - long long oldval; + long oldval; lockdep_assert_irqs_disabled(); rdtp = this_cpu_ptr(&rcu_dynticks); oldval = rdtp->dynticks_nesting; WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0); - if (oldval & DYNTICK_TASK_NEST_MASK) { - rdtp->dynticks_nesting += DYNTICK_TASK_NEST_VALUE; - } else { - __this_cpu_inc(disable_rcu_irq_enter); - rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE; - rcu_eqs_exit_common(oldval, user); - __this_cpu_dec(disable_rcu_irq_enter); + if (oldval) { + rdtp->dynticks_nesting++; + return; } + rcu_dynticks_task_exit(); + rcu_dynticks_eqs_exit(); + rcu_cleanup_after_idle(); + trace_rcu_dyntick(TPS("End"), rdtp->dynticks_nesting, 1, rdtp->dynticks); + WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)); + WRITE_ONCE(rdtp->dynticks_nesting, 1); + WRITE_ONCE(rdtp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE); } /** @@ -979,11 +942,6 @@ static void rcu_eqs_exit(bool user) * Exit idle mode, in other words, -enter- the mode in which RCU * read-side critical sections can occur. * - * We crowbar the ->dynticks_nesting field to DYNTICK_TASK_NEST to - * allow for the possibility of usermode upcalls messing up our count - * of interrupt nesting level during the busy period that is just - * now starting. - * * If you add or remove a call to rcu_idle_exit(), be sure to test with * CONFIG_RCU_EQS_DEBUG=y. */ @@ -1013,65 +971,6 @@ void rcu_user_exit(void) #endif /* CONFIG_NO_HZ_FULL */ /** - * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle - * - * Enter an interrupt handler, which might possibly result in exiting - * idle mode, in other words, entering the mode in which read-side critical - * sections can occur. The caller must have disabled interrupts. - * - * Note that the Linux kernel is fully capable of entering an interrupt - * handler that it never exits, for example when doing upcalls to - * user mode! This code assumes that the idle loop never does upcalls to - * user mode. If your architecture does do upcalls from the idle loop (or - * does anything else that results in unbalanced calls to the irq_enter() - * and irq_exit() functions), RCU will give you what you deserve, good - * and hard. But very infrequently and irreproducibly. - * - * Use things like work queues to work around this limitation. - * - * You have been warned. - * - * If you add or remove a call to rcu_irq_enter(), be sure to test with - * CONFIG_RCU_EQS_DEBUG=y. - */ -void rcu_irq_enter(void) -{ - struct rcu_dynticks *rdtp; - long long oldval; - - lockdep_assert_irqs_disabled(); - rdtp = this_cpu_ptr(&rcu_dynticks); - - /* Page faults can happen in NMI handlers, so check... */ - if (rdtp->dynticks_nmi_nesting) - return; - - oldval = rdtp->dynticks_nesting; - rdtp->dynticks_nesting++; - WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - rdtp->dynticks_nesting == 0); - if (oldval) - trace_rcu_dyntick(TPS("++="), oldval, rdtp->dynticks_nesting); - else - rcu_eqs_exit_common(oldval, true); -} - -/* - * Wrapper for rcu_irq_enter() where interrupts are enabled. - * - * If you add or remove a call to rcu_irq_enter_irqson(), be sure to test - * with CONFIG_RCU_EQS_DEBUG=y. - */ -void rcu_irq_enter_irqson(void) -{ - unsigned long flags; - - local_irq_save(flags); - rcu_irq_enter(); - local_irq_restore(flags); -} - -/** * rcu_nmi_enter - inform RCU of entry to NMI context * * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and @@ -1086,7 +985,7 @@ void rcu_irq_enter_irqson(void) void rcu_nmi_enter(void) { struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); - int incby = 2; + long incby = 2; /* Complain about underflow. */ WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0); @@ -1103,45 +1002,61 @@ void rcu_nmi_enter(void) rcu_dynticks_eqs_exit(); incby = 1; } - rdtp->dynticks_nmi_nesting += incby; + trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="), + rdtp->dynticks_nmi_nesting, + rdtp->dynticks_nmi_nesting + incby, rdtp->dynticks); + WRITE_ONCE(rdtp->dynticks_nmi_nesting, /* Prevent store tearing. */ + rdtp->dynticks_nmi_nesting + incby); barrier(); } /** - * rcu_nmi_exit - inform RCU of exit from NMI context + * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle * - * If we are returning from the outermost NMI handler that interrupted an - * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting - * to let the RCU grace-period handling know that the CPU is back to - * being RCU-idle. + * Enter an interrupt handler, which might possibly result in exiting + * idle mode, in other words, entering the mode in which read-side critical + * sections can occur. The caller must have disabled interrupts. * - * If you add or remove a call to rcu_nmi_exit(), be sure to test - * with CONFIG_RCU_EQS_DEBUG=y. + * Note that the Linux kernel is fully capable of entering an interrupt + * handler that it never exits, for example when doing upcalls to user mode! + * This code assumes that the idle loop never does upcalls to user mode. + * If your architecture's idle loop does do upcalls to user mode (or does + * anything else that results in unbalanced calls to the irq_enter() and + * irq_exit() functions), RCU will give you what you deserve, good and hard. + * But very infrequently and irreproducibly. + * + * Use things like work queues to work around this limitation. + * + * You have been warned. + * + * If you add or remove a call to rcu_irq_enter(), be sure to test with + * CONFIG_RCU_EQS_DEBUG=y. */ -void rcu_nmi_exit(void) +void rcu_irq_enter(void) { struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); - /* - * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks. - * (We are exiting an NMI handler, so RCU better be paying attention - * to us!) - */ - WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0); - WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs()); + lockdep_assert_irqs_disabled(); + if (rdtp->dynticks_nmi_nesting == 0) + rcu_dynticks_task_exit(); + rcu_nmi_enter(); + if (rdtp->dynticks_nmi_nesting == 1) + rcu_cleanup_after_idle(); +} - /* - * If the nesting level is not 1, the CPU wasn't RCU-idle, so - * leave it in non-RCU-idle state. - */ - if (rdtp->dynticks_nmi_nesting != 1) { - rdtp->dynticks_nmi_nesting -= 2; - return; - } +/* + * Wrapper for rcu_irq_enter() where interrupts are enabled. + * + * If you add or remove a call to rcu_irq_enter_irqson(), be sure to test + * with CONFIG_RCU_EQS_DEBUG=y. + */ +void rcu_irq_enter_irqson(void) +{ + unsigned long flags; - /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */ - rdtp->dynticks_nmi_nesting = 0; - rcu_dynticks_eqs_enter(); + local_irq_save(flags); + rcu_irq_enter(); + local_irq_restore(flags); } /** @@ -1233,7 +1148,8 @@ EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online); */ static int rcu_is_cpu_rrupt_from_idle(void) { - return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 1; + return __this_cpu_read(rcu_dynticks.dynticks_nesting) <= 0 && + __this_cpu_read(rcu_dynticks.dynticks_nmi_nesting) <= 1; } /* @@ -2789,6 +2705,11 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp) rdp->n_force_qs_snap = rsp->n_force_qs; } else if (count < rdp->qlen_last_fqs_check - qhimark) rdp->qlen_last_fqs_check = count; + + /* + * The following usually indicates a double call_rcu(). To track + * this down, try building with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y. + */ WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) != (count == 0)); local_irq_restore(flags); @@ -3723,7 +3644,7 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp) raw_spin_lock_irqsave_rcu_node(rnp, flags); rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu); rdp->dynticks = &per_cpu(rcu_dynticks, cpu); - WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE); + WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != 1); WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp->dynticks))); rdp->cpu = cpu; rdp->rsp = rsp; @@ -3752,7 +3673,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp) if (rcu_segcblist_empty(&rdp->cblist) && /* No early-boot CBs? */ !init_nocb_callback_list(rdp)) rcu_segcblist_init(&rdp->cblist); /* Re-enable callbacks. */ - rdp->dynticks->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE; + rdp->dynticks->dynticks_nesting = 1; /* CPU not up, no tearing. */ rcu_dynticks_eqs_online(); raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */ diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 46a5d1991450..6488a3b0e729 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -38,9 +38,8 @@ * Dynticks per-CPU state. */ struct rcu_dynticks { - long long dynticks_nesting; /* Track irq/process nesting level. */ - /* Process level is worth LLONG_MAX/2. */ - int dynticks_nmi_nesting; /* Track NMI nesting level. */ + long dynticks_nesting; /* Track process nesting level. */ + long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */ atomic_t dynticks; /* Even value for idle, else odd. */ bool rcu_need_heavy_qs; /* GP old, need heavy quiescent state. */ unsigned long rcu_qs_ctr; /* Light universal quiescent state ctr. */ diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index db85ca3975f1..fb88a028deec 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -61,7 +61,6 @@ DEFINE_PER_CPU(char, rcu_cpu_has_work); #ifdef CONFIG_RCU_NOCB_CPU static cpumask_var_t rcu_nocb_mask; /* CPUs to have callbacks offloaded. */ -static bool have_rcu_nocb_mask; /* Was rcu_nocb_mask allocated? */ static bool __read_mostly rcu_nocb_poll; /* Offload kthread are to poll. */ #endif /* #ifdef CONFIG_RCU_NOCB_CPU */ @@ -1687,7 +1686,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu) } print_cpu_stall_fast_no_hz(fast_no_hz, cpu); delta = rdp->mynode->gpnum - rdp->rcu_iw_gpnum; - pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%llx/%d softirq=%u/%u fqs=%ld %s\n", + pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%ld softirq=%u/%u fqs=%ld %s\n", cpu, "O."[!!cpu_online(cpu)], "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)], @@ -1752,7 +1751,6 @@ static void increment_cpu_stall_ticks(void) static int __init rcu_nocb_setup(char *str) { alloc_bootmem_cpumask_var(&rcu_nocb_mask); - have_rcu_nocb_mask = true; cpulist_parse(str, rcu_nocb_mask); return 1; } @@ -1801,7 +1799,7 @@ static void rcu_init_one_nocb(struct rcu_node *rnp) /* Is the specified CPU a no-CBs CPU? */ bool rcu_is_nocb_cpu(int cpu) { - if (have_rcu_nocb_mask) + if (cpumask_available(rcu_nocb_mask)) return cpumask_test_cpu(cpu, rcu_nocb_mask); return false; } @@ -2295,14 +2293,13 @@ void __init rcu_init_nohz(void) need_rcu_nocb_mask = true; #endif /* #if defined(CONFIG_NO_HZ_FULL) */ - if (!have_rcu_nocb_mask && need_rcu_nocb_mask) { + if (!cpumask_available(rcu_nocb_mask) && need_rcu_nocb_mask) { if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) { pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n"); return; } - have_rcu_nocb_mask = true; } - if (!have_rcu_nocb_mask) + if (!cpumask_available(rcu_nocb_mask)) return; #if defined(CONFIG_NO_HZ_FULL) @@ -2428,7 +2425,7 @@ static void __init rcu_organize_nocb_kthreads(struct rcu_state *rsp) struct rcu_data *rdp_leader = NULL; /* Suppress misguided gcc warn. */ struct rcu_data *rdp_prev = NULL; - if (!have_rcu_nocb_mask) + if (!cpumask_available(rcu_nocb_mask)) return; if (ls == -1) { ls = int_sqrt(nr_cpu_ids); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 644fa2e3d993..729b4fff93b8 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -508,7 +508,8 @@ void resched_cpu(int cpu) unsigned long flags; raw_spin_lock_irqsave(&rq->lock, flags); - resched_curr(rq); + if (cpu_online(cpu) || cpu == smp_processor_id()) + resched_curr(rq); raw_spin_unlock_irqrestore(&rq->lock, flags); } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 665ace2fc558..862a513adca3 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2212,7 +2212,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio) + if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } diff --git a/kernel/softirq.c b/kernel/softirq.c index 2f5e87f1bae2..24d243ef8e71 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -665,7 +665,7 @@ static void run_ksoftirqd(unsigned int cpu) */ __do_softirq(); local_irq_enable(); - cond_resched_rcu_qs(); + cond_resched(); return; } local_irq_enable(); diff --git a/kernel/torture.c b/kernel/torture.c index 637e172835d8..37b94012a3f8 100644 --- a/kernel/torture.c +++ b/kernel/torture.c @@ -47,6 +47,7 @@ #include <linux/ktime.h> #include <asm/byteorder.h> #include <linux/torture.h> +#include "rcu/rcu.h" MODULE_LICENSE("GPL"); MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com>"); @@ -60,7 +61,6 @@ static bool verbose; #define FULLSTOP_RMMOD 2 /* Normal rmmod of torture. */ static int fullstop = FULLSTOP_RMMOD; static DEFINE_MUTEX(fullstop_mutex); -static int *torture_runnable; #ifdef CONFIG_HOTPLUG_CPU @@ -500,7 +500,7 @@ static int torture_shutdown(void *arg) torture_shutdown_hook(); else VERBOSE_TOROUT_STRING("No torture_shutdown_hook(), skipping."); - ftrace_dump(DUMP_ALL); + rcu_ftrace_dump(DUMP_ALL); kernel_power_off(); /* Shut down the system. */ return 0; } @@ -572,17 +572,19 @@ static int stutter; */ void stutter_wait(const char *title) { + int spt; + cond_resched_rcu_qs(); - while (READ_ONCE(stutter_pause_test) || - (torture_runnable && !READ_ONCE(*torture_runnable))) { - if (stutter_pause_test) - if (READ_ONCE(stutter_pause_test) == 1) - schedule_timeout_interruptible(1); - else - while (READ_ONCE(stutter_pause_test)) - cond_resched(); - else + spt = READ_ONCE(stutter_pause_test); + for (; spt; spt = READ_ONCE(stutter_pause_test)) { + if (spt == 1) { + schedule_timeout_interruptible(1); + } else if (spt == 2) { + while (READ_ONCE(stutter_pause_test)) + cond_resched(); + } else { schedule_timeout_interruptible(round_jiffies_relative(HZ)); + } torture_shutdown_absorb(title); } } @@ -596,17 +598,15 @@ static int torture_stutter(void *arg) { VERBOSE_TOROUT_STRING("torture_stutter task started"); do { - if (!torture_must_stop()) { - if (stutter > 1) { - schedule_timeout_interruptible(stutter - 1); - WRITE_ONCE(stutter_pause_test, 2); - } - schedule_timeout_interruptible(1); + if (!torture_must_stop() && stutter > 1) { WRITE_ONCE(stutter_pause_test, 1); + schedule_timeout_interruptible(stutter - 1); + WRITE_ONCE(stutter_pause_test, 2); + schedule_timeout_interruptible(1); } + WRITE_ONCE(stutter_pause_test, 0); if (!torture_must_stop()) schedule_timeout_interruptible(stutter); - WRITE_ONCE(stutter_pause_test, 0); torture_shutdown_absorb("torture_stutter"); } while (!torture_must_stop()); torture_kthread_stopping("torture_stutter"); @@ -647,7 +647,7 @@ static void torture_stutter_cleanup(void) * The runnable parameter points to a flag that controls whether or not * the test is currently runnable. If there is no such flag, pass in NULL. */ -bool torture_init_begin(char *ttype, bool v, int *runnable) +bool torture_init_begin(char *ttype, bool v) { mutex_lock(&fullstop_mutex); if (torture_type != NULL) { @@ -659,7 +659,6 @@ bool torture_init_begin(char *ttype, bool v, int *runnable) } torture_type = ttype; verbose = v; - torture_runnable = runnable; fullstop = FULLSTOP_DONTSTOP; return true; } diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2a8d8a294345..aaf882a8bf1f 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2682,17 +2682,6 @@ void __trace_stack(struct trace_array *tr, unsigned long flags, int skip, if (unlikely(in_nmi())) return; - /* - * It is possible that a function is being traced in a - * location that RCU is not watching. A call to - * rcu_irq_enter() will make sure that it is, but there's - * a few internal rcu functions that could be traced - * where that wont work either. In those cases, we just - * do nothing. - */ - if (unlikely(rcu_irq_enter_disabled())) - return; - rcu_irq_enter_irqson(); __ftrace_trace_stack(buffer, flags, skip, pc, NULL); rcu_irq_exit_irqson(); diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c index 79f838a75077..22fee766081b 100644 --- a/kernel/trace/trace_benchmark.c +++ b/kernel/trace/trace_benchmark.c @@ -165,7 +165,7 @@ static int benchmark_event_kthread(void *arg) * this thread will never voluntarily schedule which would * block synchronize_rcu_tasks() indefinitely. */ - cond_resched_rcu_qs(); + cond_resched(); } return 0; diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 685c50ae6300..671b13457387 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -212,11 +212,10 @@ static int tracepoint_add_func(struct tracepoint *tp, } /* - * rcu_assign_pointer has a smp_wmb() which makes sure that the new - * probe callbacks array is consistent before setting a pointer to it. - * This array is referenced by __DO_TRACE from - * include/linux/tracepoints.h. A matching smp_read_barrier_depends() - * is used. + * rcu_assign_pointer has as smp_store_release() which makes sure + * that the new probe callbacks array is consistent before setting + * a pointer to it. This array is referenced by __DO_TRACE from + * include/linux/tracepoint.h using rcu_dereference_sched(). */ rcu_assign_pointer(tp->funcs, tp_funcs); if (!static_key_enabled(&tp->key)) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 43d18cb46308..acf485eada1b 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2135,7 +2135,7 @@ __acquires(&pool->lock) * stop_machine. At the same time, report a quiescent RCU state so * the same condition doesn't freeze RCU. */ - cond_resched_rcu_qs(); + cond_resched(); spin_lock_irq(&pool->lock); diff --git a/lib/assoc_array.c b/lib/assoc_array.c index b77d51da8c73..c6659cb37033 100644 --- a/lib/assoc_array.c +++ b/lib/assoc_array.c @@ -38,12 +38,10 @@ begin_node: if (assoc_array_ptr_is_shortcut(cursor)) { /* Descend through a shortcut */ shortcut = assoc_array_ptr_to_shortcut(cursor); - smp_read_barrier_depends(); - cursor = READ_ONCE(shortcut->next_node); + cursor = READ_ONCE(shortcut->next_node); /* Address dependency. */ } node = assoc_array_ptr_to_node(cursor); - smp_read_barrier_depends(); slot = 0; /* We perform two passes of each node. @@ -55,15 +53,12 @@ begin_node: */ has_meta = 0; for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) { - ptr = READ_ONCE(node->slots[slot]); + ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */ has_meta |= (unsigned long)ptr; if (ptr && assoc_array_ptr_is_leaf(ptr)) { - /* We need a barrier between the read of the pointer - * and dereferencing the pointer - but only if we are - * actually going to dereference it. + /* We need a barrier between the read of the pointer, + * which is supplied by the above READ_ONCE(). */ - smp_read_barrier_depends(); - /* Invoke the callback */ ret = iterator(assoc_array_ptr_to_leaf(ptr), iterator_data); @@ -86,10 +81,8 @@ begin_node: continue_node: node = assoc_array_ptr_to_node(cursor); - smp_read_barrier_depends(); - for (; slot < ASSOC_ARRAY_FAN_OUT; slot++) { - ptr = READ_ONCE(node->slots[slot]); + ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */ if (assoc_array_ptr_is_meta(ptr)) { cursor = ptr; goto begin_node; @@ -98,16 +91,15 @@ continue_node: finished_node: /* Move up to the parent (may need to skip back over a shortcut) */ - parent = READ_ONCE(node->back_pointer); + parent = READ_ONCE(node->back_pointer); /* Address dependency. */ slot = node->parent_slot; if (parent == stop) return 0; if (assoc_array_ptr_is_shortcut(parent)) { shortcut = assoc_array_ptr_to_shortcut(parent); - smp_read_barrier_depends(); cursor = parent; - parent = READ_ONCE(shortcut->back_pointer); + parent = READ_ONCE(shortcut->back_pointer); /* Address dependency. */ slot = shortcut->parent_slot; if (parent == stop) return 0; @@ -147,7 +139,7 @@ int assoc_array_iterate(const struct assoc_array *array, void *iterator_data), void *iterator_data) { - struct assoc_array_ptr *root = READ_ONCE(array->root); + struct assoc_array_ptr *root = READ_ONCE(array->root); /* Address dependency. */ if (!root) return 0; @@ -194,7 +186,7 @@ assoc_array_walk(const struct assoc_array *array, pr_devel("-->%s()\n", __func__); - cursor = READ_ONCE(array->root); + cursor = READ_ONCE(array->root); /* Address dependency. */ if (!cursor) return assoc_array_walk_tree_empty; @@ -216,11 +208,9 @@ jumped: consider_node: node = assoc_array_ptr_to_node(cursor); - smp_read_barrier_depends(); - slot = segments >> (level & ASSOC_ARRAY_KEY_CHUNK_MASK); slot &= ASSOC_ARRAY_FAN_MASK; - ptr = READ_ONCE(node->slots[slot]); + ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */ pr_devel("consider slot %x [ix=%d type=%lu]\n", slot, level, (unsigned long)ptr & 3); @@ -254,7 +244,6 @@ consider_node: cursor = ptr; follow_shortcut: shortcut = assoc_array_ptr_to_shortcut(cursor); - smp_read_barrier_depends(); pr_devel("shortcut to %d\n", shortcut->skip_to_level); sc_level = level + ASSOC_ARRAY_LEVEL_STEP; BUG_ON(sc_level > shortcut->skip_to_level); @@ -294,7 +283,7 @@ follow_shortcut: } while (sc_level < shortcut->skip_to_level); /* The shortcut matches the leaf's index to this point. */ - cursor = READ_ONCE(shortcut->next_node); + cursor = READ_ONCE(shortcut->next_node); /* Address dependency. */ if (((level ^ sc_level) & ~ASSOC_ARRAY_KEY_CHUNK_MASK) != 0) { level = sc_level; goto jumped; @@ -331,20 +320,18 @@ void *assoc_array_find(const struct assoc_array *array, return NULL; node = result.terminal_node.node; - smp_read_barrier_depends(); /* If the target key is available to us, it's has to be pointed to by * the terminal node. */ for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) { - ptr = READ_ONCE(node->slots[slot]); + ptr = READ_ONCE(node->slots[slot]); /* Address dependency. */ if (ptr && assoc_array_ptr_is_leaf(ptr)) { /* We need a barrier between the read of the pointer * and dereferencing the pointer - but only if we are * actually going to dereference it. */ leaf = assoc_array_ptr_to_leaf(ptr); - smp_read_barrier_depends(); if (ops->compare_object(leaf, index_key)) return (void *)leaf; } diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index fe03c6d52761..30e7dd88148b 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -197,10 +197,10 @@ static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref) atomic_long_add(PERCPU_COUNT_BIAS, &ref->count); /* - * Restore per-cpu operation. smp_store_release() is paired with - * smp_read_barrier_depends() in __ref_is_percpu() and guarantees - * that the zeroing is visible to all percpu accesses which can see - * the following __PERCPU_REF_ATOMIC clearing. + * Restore per-cpu operation. smp_store_release() is paired + * with READ_ONCE() in __ref_is_percpu() and guarantees that the + * zeroing is visible to all percpu accesses which can see the + * following __PERCPU_REF_ATOMIC clearing. */ for_each_possible_cpu(cpu) *per_cpu_ptr(percpu_count, cpu) = 0; @@ -675,15 +675,8 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it) expected_mapping = (void *)((unsigned long)stable_node | PAGE_MAPPING_KSM); again: - kpfn = READ_ONCE(stable_node->kpfn); + kpfn = READ_ONCE(stable_node->kpfn); /* Address dependency. */ page = pfn_to_page(kpfn); - - /* - * page is computed from kpfn, so on most architectures reading - * page->mapping is naturally ordered after reading node->kpfn, - * but on Alpha we need to be more careful. - */ - smp_read_barrier_depends(); if (READ_ONCE(page->mapping) != expected_mapping) goto stale; diff --git a/mm/mlock.c b/mm/mlock.c index 30472d438794..f7f54fd2e13f 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -779,7 +779,7 @@ static int apply_mlockall_flags(int flags) /* Ignore errors */ mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags); - cond_resched_rcu_qs(); + cond_resched(); } out: return 0; diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 0c3c944a7b72..eb8246c39de0 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -202,13 +202,8 @@ unsigned int arpt_do_table(struct sk_buff *skb, local_bh_disable(); addend = xt_write_recseq_begin(); - private = table->private; + private = READ_ONCE(table->private); /* Address dependency. */ cpu = smp_processor_id(); - /* - * Ensure we load private-> members after we've fetched the base - * pointer. - */ - smp_read_barrier_depends(); table_base = private->entries; jumpstack = (struct arpt_entry **)private->jumpstack[cpu]; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 2e0d339028bb..cc984d0e0c69 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -260,13 +260,8 @@ ipt_do_table(struct sk_buff *skb, WARN_ON(!(table->valid_hooks & (1 << hook))); local_bh_disable(); addend = xt_write_recseq_begin(); - private = table->private; + private = READ_ONCE(table->private); /* Address dependency. */ cpu = smp_processor_id(); - /* - * Ensure we load private-> members after we've fetched the base - * pointer. - */ - smp_read_barrier_depends(); table_base = private->entries; jumpstack = (struct ipt_entry **)private->jumpstack[cpu]; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 1d7ae9366335..66a8c69a3db4 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -282,12 +282,7 @@ ip6t_do_table(struct sk_buff *skb, local_bh_disable(); addend = xt_write_recseq_begin(); - private = table->private; - /* - * Ensure we load private-> members after we've fetched the base - * pointer. - */ - smp_read_barrier_depends(); + private = READ_ONCE(table->private); /* Address dependency. */ cpu = smp_processor_id(); table_base = private->entries; jumpstack = (struct ip6t_entry **)private->jumpstack[cpu]; diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 85f643c1e227..4efaa3066c78 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1044,7 +1044,7 @@ static void gc_worker(struct work_struct *work) * we will just continue with next hash slot. */ rcu_read_unlock(); - cond_resched_rcu_qs(); + cond_resched(); } while (++buckets < goal); if (gc_work->exiting) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 31031f10fe56..ba03f17ff662 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -5586,6 +5586,12 @@ sub process { } } +# check for smp_read_barrier_depends and read_barrier_depends + if (!$file && $line =~ /\b(smp_|)read_barrier_depends\s*\(/) { + WARN("READ_BARRIER_DEPENDS", + "$1read_barrier_depends should only be used in READ_ONCE or DEC Alpha code\n" . $herecurr); + } + # check of hardware specific defines if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) { CHK("ARCH_DEFINES", diff --git a/security/keys/keyring.c b/security/keys/keyring.c index d0bccebbd3b5..41bcf57e96f2 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -713,7 +713,6 @@ descend_to_keyring: * doesn't contain any keyring pointers. */ shortcut = assoc_array_ptr_to_shortcut(ptr); - smp_read_barrier_depends(); if ((shortcut->index_key[0] & ASSOC_ARRAY_FAN_MASK) != 0) goto not_this_keyring; @@ -723,8 +722,6 @@ descend_to_keyring: } node = assoc_array_ptr_to_node(ptr); - smp_read_barrier_depends(); - ptr = node->slots[0]; if (!assoc_array_ptr_is_meta(ptr)) goto begin_node; @@ -736,7 +733,6 @@ descend_to_node: kdebug("descend"); if (assoc_array_ptr_is_shortcut(ptr)) { shortcut = assoc_array_ptr_to_shortcut(ptr); - smp_read_barrier_depends(); ptr = READ_ONCE(shortcut->next_node); BUG_ON(!assoc_array_ptr_is_node(ptr)); } @@ -744,7 +740,6 @@ descend_to_node: begin_node: kdebug("begin_node"); - smp_read_barrier_depends(); slot = 0; ascend_to_node: /* Go through the slots in a node */ @@ -792,14 +787,12 @@ ascend_to_node: if (ptr && assoc_array_ptr_is_shortcut(ptr)) { shortcut = assoc_array_ptr_to_shortcut(ptr); - smp_read_barrier_depends(); ptr = READ_ONCE(shortcut->back_pointer); slot = shortcut->parent_slot; } if (!ptr) goto not_this_keyring; node = assoc_array_ptr_to_node(ptr); - smp_read_barrier_depends(); slot++; /* If we've ascended to the root (zero backpointer), we must have just diff --git a/tools/testing/selftests/rcutorture/bin/config2frag.sh b/tools/testing/selftests/rcutorture/bin/config2frag.sh deleted file mode 100755 index 56f51ae13d73..000000000000 --- a/tools/testing/selftests/rcutorture/bin/config2frag.sh +++ /dev/null @@ -1,25 +0,0 @@ -#!/bin/bash -# Usage: config2frag.sh < .config > configfrag -# -# Converts the "# CONFIG_XXX is not set" to "CONFIG_XXX=n" so that the -# resulting file becomes a legitimate Kconfig fragment. -# -# This program is free software; you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation; either version 2 of the License, or -# (at your option) any later version. -# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. -# -# You should have received a copy of the GNU General Public License -# along with this program; if not, you can access it online at -# http://www.gnu.org/licenses/gpl-2.0.html. -# -# Copyright (C) IBM Corporation, 2013 -# -# Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com> - -LANG=C sed -e 's/^# CONFIG_\([a-zA-Z0-9_]*\) is not set$/CONFIG_\1=n/' diff --git a/tools/testing/selftests/rcutorture/bin/configinit.sh b/tools/testing/selftests/rcutorture/bin/configinit.sh index 51f66a7ce876..c15f270e121d 100755 --- a/tools/testing/selftests/rcutorture/bin/configinit.sh +++ b/tools/testing/selftests/rcutorture/bin/configinit.sh @@ -51,7 +51,7 @@ then mkdir $builddir fi else - echo Bad build directory: \"$builddir\" + echo Bad build directory: \"$buildloc\" exit 2 fi fi diff --git a/tools/testing/selftests/rcutorture/bin/kvm-build.sh b/tools/testing/selftests/rcutorture/bin/kvm-build.sh index fb66d0173638..34d126734cde 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-build.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-build.sh @@ -29,11 +29,6 @@ then exit 1 fi builddir=${2} -if test -z "$builddir" -o ! -d "$builddir" -o ! -w "$builddir" -then - echo "kvm-build.sh :$builddir: Not a writable directory, cannot build into it" - exit 1 -fi T=${TMPDIR-/tmp}/test-linux.sh.$$ trap 'rm -rf $T' 0 diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh index 43f764098e50..2de92f43ee8c 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-lock.sh @@ -23,7 +23,7 @@ # Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com> i="$1" -if test -d $i +if test -d "$i" -a -r "$i" then : else diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh index 559e01ac86be..c2e1bb6d0cba 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcu.sh @@ -23,14 +23,14 @@ # Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com> i="$1" -if test -d $i +if test -d "$i" -a -r "$i" then : else echo Unreadable results directory: $i exit 1 fi -. tools/testing/selftests/rcutorture/bin/functions.sh +. functions.sh configfile=`echo $i | sed -e 's/^.*\///'` ngps=`grep ver: $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* ver: //' -e 's/ .*$//'` diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf-ftrace.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf-ftrace.sh index f79b0e9e84fc..963f71289d22 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf-ftrace.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf-ftrace.sh @@ -26,7 +26,7 @@ # Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com> i="$1" -. tools/testing/selftests/rcutorture/bin/functions.sh +. functions.sh if test "`grep -c 'rcu_exp_grace_period.*start' < $i/console.log`" -lt 100 then diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf.sh index 8f3121afc716..ccebf772fa1e 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck-rcuperf.sh @@ -23,7 +23,7 @@ # Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com> i="$1" -if test -d $i +if test -d "$i" -a -r "$i" then : else @@ -31,7 +31,7 @@ else exit 1 fi PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH -. tools/testing/selftests/rcutorture/bin/functions.sh +. functions.sh if kvm-recheck-rcuperf-ftrace.sh $i then diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh index f659346d3358..f7e988f369dd 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh @@ -25,7 +25,7 @@ # Authors: Paul E. McKenney <paulmck@linux.vnet.ibm.com> PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH -. tools/testing/selftests/rcutorture/bin/functions.sh +. functions.sh for rd in "$@" do firsttime=1 diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index ab14b97c942c..1b78a12740e5 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh @@ -42,7 +42,7 @@ T=${TMPDIR-/tmp}/kvm-test-1-run.sh.$$ trap 'rm -rf $T' 0 mkdir $T -. $KVM/bin/functions.sh +. functions.sh . $CONFIGFRAG/ver_functions.sh config_template=${1} @@ -154,9 +154,7 @@ cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"` vcpus=`identify_qemu_vcpus` if test $cpu_count -gt $vcpus then - echo CPU count limited from $cpu_count to $vcpus - touch $resdir/Warnings - echo CPU count limited from $cpu_count to $vcpus >> $resdir/Warnings + echo CPU count limited from $cpu_count to $vcpus | tee -a $resdir/Warnings cpu_count=$vcpus fi qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`" diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh index ccd49e958fd2..7d1f607f0f76 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm.sh @@ -1,8 +1,7 @@ #!/bin/bash # # Run a series of 14 tests under KVM. These are not particularly -# well-selected or well-tuned, but are the current set. Run from the -# top level of the source tree. +# well-selected or well-tuned, but are the current set. # # Edit the definitions below to set the locations of the various directories, # as well as the test duration. @@ -34,6 +33,8 @@ T=${TMPDIR-/tmp}/kvm.sh.$$ trap 'rm -rf $T' 0 mkdir $T +cd `dirname $scriptname`/../../../../../ + dur=$((30*60)) dryrun="" KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM @@ -70,7 +71,7 @@ usage () { echo " --kmake-arg kernel-make-arguments" echo " --mac nn:nn:nn:nn:nn:nn" echo " --no-initrd" - echo " --qemu-args qemu-system-..." + echo " --qemu-args qemu-arguments" echo " --qemu-cmd qemu-system-..." echo " --results absolute-pathname" echo " --torture rcu" @@ -150,7 +151,7 @@ do TORTURE_INITRD=""; export TORTURE_INITRD ;; --qemu-args|--qemu-arg) - checkarg --qemu-args "-qemu args" $# "$2" '^-' '^error' + checkarg --qemu-args "(qemu arguments)" $# "$2" '^-' '^error' TORTURE_QEMU_ARG="$2" shift ;; @@ -238,7 +239,6 @@ BEGIN { } END { - alldone = 0; batch = 0; nc = -1; @@ -331,8 +331,7 @@ awk < $T/cfgcpu.pack \ # Dump out the scripting required to run one test batch. function dump(first, pastlast, batchnum) { - print "echo ----Start batch " batchnum ": `date`"; - print "echo ----Start batch " batchnum ": `date` >> " rd "/log"; + print "echo ----Start batch " batchnum ": `date` | tee -a " rd "log"; print "needqemurun=" jn=1 for (j = first; j < pastlast; j++) { @@ -349,21 +348,18 @@ function dump(first, pastlast, batchnum) ovf = "-ovf"; else ovf = ""; - print "echo ", cfr[jn], cpusr[jn] ovf ": Starting build. `date`"; - print "echo ", cfr[jn], cpusr[jn] ovf ": Starting build. `date` >> " rd "/log"; + print "echo ", cfr[jn], cpusr[jn] ovf ": Starting build. `date` | tee -a " rd "log"; print "rm -f " builddir ".*"; print "touch " builddir ".wait"; print "mkdir " builddir " > /dev/null 2>&1 || :"; print "mkdir " rd cfr[jn] " || :"; print "kvm-test-1-run.sh " CONFIGDIR cf[j], builddir, rd cfr[jn], dur " \"" TORTURE_QEMU_ARG "\" \"" TORTURE_BOOTARGS "\" > " rd cfr[jn] "/kvm-test-1-run.sh.out 2>&1 &" - print "echo ", cfr[jn], cpusr[jn] ovf ": Waiting for build to complete. `date`"; - print "echo ", cfr[jn], cpusr[jn] ovf ": Waiting for build to complete. `date` >> " rd "/log"; + print "echo ", cfr[jn], cpusr[jn] ovf ": Waiting for build to complete. `date` | tee -a " rd "log"; print "while test -f " builddir ".wait" print "do" print "\tsleep 1" print "done" - print "echo ", cfr[jn], cpusr[jn] ovf ": Build complete. `date`"; - print "echo ", cfr[jn], cpusr[jn] ovf ": Build complete. `date` >> " rd "/log"; + print "echo ", cfr[jn], cpusr[jn] ovf ": Build complete. `date` | tee -a " rd "log"; jn++; } for (j = 1; j < jn; j++) { @@ -371,8 +367,7 @@ function dump(first, pastlast, batchnum) print "rm -f " builddir ".ready" print "if test -f \"" rd cfr[j] "/builtkernel\"" print "then" - print "\techo ----", cfr[j], cpusr[j] ovf ": Kernel present. `date`"; - print "\techo ----", cfr[j], cpusr[j] ovf ": Kernel present. `date` >> " rd "/log"; + print "\techo ----", cfr[j], cpusr[j] ovf ": Kernel present. `date` | tee -a " rd "log"; print "\tneedqemurun=1" print "fi" } @@ -386,31 +381,26 @@ function dump(first, pastlast, batchnum) njitter = ja[1]; if (TORTURE_BUILDONLY && njitter != 0) { njitter = 0; - print "echo Build-only run, so suppressing jitter >> " rd "/log" + print "echo Build-only run, so suppressing jitter | tee -a " rd "log" } if (TORTURE_BUILDONLY) { print "needqemurun=" } print "if test -n \"$needqemurun\"" print "then" - print "\techo ---- Starting kernels. `date`"; - print "\techo ---- Starting kernels. `date` >> " rd "/log"; + print "\techo ---- Starting kernels. `date` | tee -a " rd "log"; for (j = 0; j < njitter; j++) print "\tjitter.sh " j " " dur " " ja[2] " " ja[3] "&" print "\twait" - print "\techo ---- All kernel runs complete. `date`"; - print "\techo ---- All kernel runs complete. `date` >> " rd "/log"; + print "\techo ---- All kernel runs complete. `date` | tee -a " rd "log"; print "else" print "\twait" - print "\techo ---- No kernel runs. `date`"; - print "\techo ---- No kernel runs. `date` >> " rd "/log"; + print "\techo ---- No kernel runs. `date` | tee -a " rd "log"; print "fi" for (j = 1; j < jn; j++) { builddir=KVM "/b" j - print "echo ----", cfr[j], cpusr[j] ovf ": Build/run results:"; - print "echo ----", cfr[j], cpusr[j] ovf ": Build/run results: >> " rd "/log"; - print "cat " rd cfr[j] "/kvm-test-1-run.sh.out"; - print "cat " rd cfr[j] "/kvm-test-1-run.sh.out >> " rd "/log"; + print "echo ----", cfr[j], cpusr[j] ovf ": Build/run results: | tee -a " rd "log"; + print "cat " rd cfr[j] "/kvm-test-1-run.sh.out | tee -a " rd "log"; } } diff --git a/tools/testing/selftests/rcutorture/bin/parse-torture.sh b/tools/testing/selftests/rcutorture/bin/parse-torture.sh index f12c38909b00..5987e50cfeb4 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-torture.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-torture.sh @@ -55,7 +55,7 @@ then exit fi -grep --binary-files=text 'torture:.*ver:' $file | grep --binary-files=text -v '(null)' | sed -e 's/^(initramfs)[^]]*] //' -e 's/^\[[^]]*] //' | +grep --binary-files=text 'torture:.*ver:' $file | egrep --binary-files=text -v '\(null\)|rtc: 000000000* ' | sed -e 's/^(initramfs)[^]]*] //' -e 's/^\[[^]]*] //' | awk ' BEGIN { ver = 0; diff --git a/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh b/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh index 252aae618984..80eb646e1319 100644 --- a/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh +++ b/tools/testing/selftests/rcutorture/configs/lock/ver_functions.sh @@ -38,6 +38,5 @@ per_version_boot_params () { echo $1 `locktorture_param_onoff "$1" "$2"` \ locktorture.stat_interval=15 \ locktorture.shutdown_secs=$3 \ - locktorture.torture_runnable=1 \ locktorture.verbose=1 } diff --git a/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh b/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh index ffb85ed786fa..24ec91041957 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh +++ b/tools/testing/selftests/rcutorture/configs/rcu/ver_functions.sh @@ -51,7 +51,6 @@ per_version_boot_params () { `rcutorture_param_n_barrier_cbs "$1"` \ rcutorture.stat_interval=15 \ rcutorture.shutdown_secs=$3 \ - rcutorture.torture_runnable=1 \ rcutorture.test_no_idle_hz=1 \ rcutorture.verbose=1 } diff --git a/tools/testing/selftests/rcutorture/configs/rcuperf/ver_functions.sh b/tools/testing/selftests/rcutorture/configs/rcuperf/ver_functions.sh index 34f2a1b35ee5..b9603115d7c7 100644 --- a/tools/testing/selftests/rcutorture/configs/rcuperf/ver_functions.sh +++ b/tools/testing/selftests/rcutorture/configs/rcuperf/ver_functions.sh @@ -46,7 +46,6 @@ rcuperf_param_nwriters () { per_version_boot_params () { echo $1 `rcuperf_param_nreaders "$1"` \ `rcuperf_param_nwriters "$1"` \ - rcuperf.perf_runnable=1 \ rcuperf.shutdown=1 \ rcuperf.verbose=1 } |