From 3af3999b9a325d462c9353389b7507c4b7bc5428 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 6 Oct 2017 13:48:14 -0700 Subject: doc: Update dyntick-idle design documentation for NMI/irq consolidation Signed-off-by: Paul E. McKenney --- .../Design/Data-Structures/Data-Structures.html | 46 +++++++++++++++------- 1 file changed, 32 insertions(+), 14 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index 38d6d800761f..1ac011de606e 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html @@ -1182,8 +1182,8 @@ CPU (and from tracing) unless otherwise stated. Its fields are as follows:
-  1   int dynticks_nesting;
-  2   int dynticks_nmi_nesting;
+  1   long dynticks_nesting;
+  2   long dynticks_nmi_nesting;
   3   atomic_t dynticks;
   4   bool rcu_need_heavy_qs;
   5   unsigned long rcu_qs_ctr;
@@ -1191,15 +1191,31 @@ Its fields are as follows:
 

The ->dynticks_nesting field counts the -nesting depth of normal interrupts. -In addition, this counter is incremented when exiting dyntick-idle -mode and decremented when entering it. +nesting depth of process execution, so that in normal circumstances +this counter has value zero or one. +NMIs, irqs, and tracers are counted by the ->dynticks_nmi_nesting +field. +Because NMIs cannot be masked, changes to this variable have to be +undertaken carefully using an algorithm provided by Andy Lutomirski. +The initial transition from idle adds one, and nested transitions +add two, so that a nesting level of five is represented by a +->dynticks_nmi_nesting value of nine. This counter can therefore be thought of as counting the number of reasons why this CPU cannot be permitted to enter dyntick-idle -mode, aside from non-maskable interrupts (NMIs). -NMIs are counted by the ->dynticks_nmi_nesting -field, except that NMIs that interrupt non-dyntick-idle execution -are not counted. +mode, aside from process-level transitions. + +

However, it turns out that when running in non-idle kernel context, +the Linux kernel is fully capable of entering interrupt handlers that +never exit and perhaps also vice versa. +Therefore, whenever the ->dynticks_nesting field is +incremented up from zero, the ->dynticks_nmi_nesting field +is set to a large positive number, and whenever the +->dynticks_nesting field is decremented down to zero, +the the ->dynticks_nmi_nesting field is set to zero. +Assuming that the number of misnested interrupts is not sufficient +to overflow the counter, this approach corrects the +->dynticks_nmi_nesting field every time the corresponding +CPU enters the idle loop from process context.

The ->dynticks field counts the corresponding CPU's transitions to and from dyntick-idle mode, so that this counter @@ -1231,14 +1247,16 @@ in response.   Quick Quiz: - Why not just count all NMIs? - Wouldn't that be simpler and less error prone? + Why not simply combine the ->dynticks_nesting + and ->dynticks_nmi_nesting counters into a + single counter that just counts the number of reasons that + the corresponding CPU is non-idle? Answer: - It seems simpler only until you think hard about how to go about - updating the rcu_dynticks structure's - ->dynticks field. + Because this would fail in the presence of interrupts whose + handlers never return and of handlers that manage to return + from a made-up interrupt.   -- cgit From 40555946447a394889243e4393e312f65d847e1e Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 9 Oct 2017 09:15:21 -0700 Subject: doc: READ_ONCE() now implies smp_barrier_depends() This commit updates an example in memory-barriers.txt to account for the fact that READ_ONCE() now implies smp_barrier_depends(). Signed-off-by: Paul E. McKenney [ paulmck: Added MEMORY_BARRIER instructions from DEC Alpha from READ_ONCE(), per David Howells's feedback. ] --- Documentation/memory-barriers.txt | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 479ecec80593..13fd35b6a597 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -227,17 +227,20 @@ There are some minimal guarantees that may be expected of a CPU: (*) On any given CPU, dependent memory accesses will be issued in order, with respect to itself. This means that for: - Q = READ_ONCE(P); smp_read_barrier_depends(); D = READ_ONCE(*Q); + Q = READ_ONCE(P); D = READ_ONCE(*Q); the CPU will issue the following memory operations: Q = LOAD P, D = LOAD *Q - and always in that order. On most systems, smp_read_barrier_depends() - does nothing, but it is required for DEC Alpha. The READ_ONCE() - is required to prevent compiler mischief. Please note that you - should normally use something like rcu_dereference() instead of - open-coding smp_read_barrier_depends(). + and always in that order. However, on DEC Alpha, READ_ONCE() also + emits a memory-barrier instruction, so that a DEC Alpha CPU will + instead issue the following memory operations: + + Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER + + Whether on DEC Alpha or not, the READ_ONCE() also prevents compiler + mischief. (*) Overlapping loads and stores within a particular CPU will appear to be ordered within that CPU. This means that for: -- cgit From 9ad3c143d7d6942c66f27bc6c18f5df638f70aff Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 27 Nov 2017 09:20:40 -0800 Subject: doc: De-emphasize smp_read_barrier_depends This commit keeps only the historical and low-level discussion of smp_read_barrier_depends(). Signed-off-by: Paul E. McKenney [ paulmck: Adjusted to allow for David Howells feedback on prior commit. ] --- Documentation/RCU/Design/Requirements/Requirements.html | 3 ++- Documentation/RCU/rcu_dereference.txt | 6 +----- Documentation/RCU/whatisRCU.txt | 3 +-- Documentation/circular-buffers.txt | 3 +-- Documentation/memory-barriers.txt | 7 +++++-- 5 files changed, 10 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 62e847bcdcdd..571c3d75510f 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -581,7 +581,8 @@ This guarantee was only partially premeditated. DYNIX/ptx used an explicit memory barrier for publication, but had nothing resembling rcu_dereference() for subscription, nor did it have anything resembling the smp_read_barrier_depends() -that was later subsumed into rcu_dereference(). +that was later subsumed into rcu_dereference() and later +still into READ_ONCE(). The need for these operations made itself known quite suddenly at a late-1990s meeting with the DEC Alpha architects, back in the days when DEC was still a free-standing company. diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt index 1acb26b09b48..ab96227bad42 100644 --- a/Documentation/RCU/rcu_dereference.txt +++ b/Documentation/RCU/rcu_dereference.txt @@ -122,11 +122,7 @@ o Be very careful about comparing pointers obtained from Note that if checks for being within an RCU read-side critical section are not required and the pointer is never dereferenced, rcu_access_pointer() should be used in place - of rcu_dereference(). The rcu_access_pointer() primitive - does not require an enclosing read-side critical section, - and also omits the smp_read_barrier_depends() included in - rcu_dereference(), which in turn should provide a small - performance gain in some CPUs (e.g., the DEC Alpha). + of rcu_dereference(). o The comparison is against a pointer that references memory that was initialized "a long time ago." The reason diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index df62466da4e0..a27fbfb0efb8 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -600,8 +600,7 @@ don't forget about them when submitting patches making use of RCU!] #define rcu_dereference(p) \ ({ \ - typeof(p) _________p1 = p; \ - smp_read_barrier_depends(); \ + typeof(p) _________p1 = READ_ONCE(p); \ (_________p1); \ }) diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt index d4628174b7c5..53e51caa3347 100644 --- a/Documentation/circular-buffers.txt +++ b/Documentation/circular-buffers.txt @@ -220,8 +220,7 @@ before it writes the new tail pointer, which will erase the item. Note the use of READ_ONCE() and smp_load_acquire() to read the opposition index. This prevents the compiler from discarding and -reloading its cached value - which some compilers will do across -smp_read_barrier_depends(). This isn't strictly needed if you can +reloading its cached value. This isn't strictly needed if you can be sure that the opposition index will _only_ be used the once. The smp_load_acquire() additionally forces the CPU to order against subsequent memory references. Similarly, smp_store_release() is used diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 13fd35b6a597..a863009849a3 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1818,7 +1818,7 @@ The Linux kernel has eight basic CPU memory barriers: GENERAL mb() smp_mb() WRITE wmb() smp_wmb() READ rmb() smp_rmb() - DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends() + DATA DEPENDENCY READ_ONCE() All memory barriers except the data dependency barriers imply a compiler @@ -2867,7 +2867,10 @@ access depends on a read, not all do, so it may not be relied on. Other CPUs may also have split caches, but must coordinate between the various cachelets for normal memory accesses. The semantics of the Alpha removes the -need for coordination in the absence of memory barriers. +need for hardware coordination in the absence of memory barriers, which +permitted Alpha to sport higher CPU clock rates back in the day. However, +please note that smp_read_barrier_depends() should not be used except in +Alpha arch-specific code and within the READ_ONCE() macro. CACHE COHERENCY VS DMA -- cgit From a2f2577d96ad060b65eb909dd39b57d676754119 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 21 Nov 2017 20:19:17 -0800 Subject: torture: Eliminate torture_runnable and perf_runnable The purpose of torture_runnable is to allow rcutorture and locktorture to be started and stopped via sysfs when they are built into the kernel (as in not compiled as loadable modules). However, the 0444 permissions for both instances of torture_runnable prevent this use case from ever being put into practice. Given that there have been no complaints about this deficiency, it is reasonable to conclude that no one actually makes use of this sysfs capability. The perf_runnable module parameter for rcuperf is in the same situation. This commit therefore removes both torture_runnable instances as well as perf_runnable. Reported-by: Thomas Gleixner Signed-off-by: Paul E. McKenney --- Documentation/admin-guide/kernel-parameters.txt | 9 --------- Documentation/locking/locktorture.txt | 5 ----- 2 files changed, 14 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6571fbfdb2a1..66d471f0b92e 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2049,9 +2049,6 @@ This tests the locking primitive's ability to transition abruptly to and from idle. - locktorture.torture_runnable= [BOOT] - Start locktorture running at boot time. - locktorture.torture_type= [KNL] Specify the locking implementation to test. @@ -3459,9 +3456,6 @@ the same as for rcuperf.nreaders. N, where N is the number of CPUs - rcuperf.perf_runnable= [BOOT] - Start rcuperf running at boot time. - rcuperf.perf_type= [KNL] Specify the RCU implementation to test. @@ -3595,9 +3589,6 @@ Test RCU's dyntick-idle handling. See also the rcutorture.shuffle_interval parameter. - rcutorture.torture_runnable= [BOOT] - Start rcutorture running at boot time. - rcutorture.torture_type= [KNL] Specify the RCU implementation to test. diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt index a2ef3a929bf1..6a8df4cd19bf 100644 --- a/Documentation/locking/locktorture.txt +++ b/Documentation/locking/locktorture.txt @@ -57,11 +57,6 @@ torture_type Type of lock to torture. By default, only spinlocks will o "rwsem_lock": read/write down() and up() semaphore pairs. -torture_runnable Start locktorture at boot time in the case where the - module is built into the kernel, otherwise wait for - torture_runnable to be set via sysfs before starting. - By default it will begin once the module is loaded. - ** Torture-framework (RCU + locking) ** -- cgit