Diffstat (limited to 'Documentation/RCU')
-rw-r--r--  Documentation/RCU/Design/Data-Structures/Data-Structures.rst          | 28
-rw-r--r--  Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst | 10
-rw-r--r--  Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg          |  8
-rw-r--r--  Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg           |  8
-rw-r--r--  Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg               |  8
-rw-r--r--  Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg          |  4
-rw-r--r--  Documentation/RCU/Design/Requirements/Requirements.rst                | 19
-rw-r--r--  Documentation/RCU/checklist.rst                                       | 61
-rw-r--r--  Documentation/RCU/listRCU.rst                                         | 10
-rw-r--r--  Documentation/RCU/rcubarrier.rst                                      |  5
-rw-r--r--  Documentation/RCU/stallwarn.rst                                       |  9
-rw-r--r--  Documentation/RCU/whatisRCU.rst                                       | 45
12 files changed, 126 insertions(+), 89 deletions(-)
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
index b34990c7c377..04e16775c752 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
@@ -921,10 +921,10 @@ This portion of the ``rcu_data`` structure is declared as follows:
::
- 1 int dynticks_snap;
+ 1 int watching_snap;
2 unsigned long dynticks_fqs;
-The ``->dynticks_snap`` field is used to take a snapshot of the
+The ``->watching_snap`` field is used to take a snapshot of the
corresponding CPU's dyntick-idle state when forcing quiescent states,
and is therefore accessed from other CPUs. Finally, the
``->dynticks_fqs`` field is used to count the number of times this CPU
@@ -935,8 +935,8 @@ This portion of the rcu_data structure is declared as follows:
::
- 1 long dynticks_nesting;
- 2 long dynticks_nmi_nesting;
+ 1 long nesting;
+ 2 long nmi_nesting;
3 atomic_t dynticks;
4 bool rcu_need_heavy_qs;
5 bool rcu_urgent_qs;
@@ -945,14 +945,14 @@ These fields in the rcu_data structure maintain the per-CPU dyntick-idle
state for the corresponding CPU. The fields may be accessed only from
the corresponding CPU (and from tracing) unless otherwise stated.
-The ``->dynticks_nesting`` field counts the nesting depth of process
+The ``->nesting`` field counts the nesting depth of process
execution, so that in normal circumstances this counter has value zero
or one. NMIs, irqs, and tracers are counted by the
-``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes
+``->nmi_nesting`` field. Because NMIs cannot be masked, changes
to this variable have to be undertaken carefully using an algorithm
provided by Andy Lutomirski. The initial transition from idle adds one,
and nested transitions add two, so that a nesting level of five is
-represented by a ``->dynticks_nmi_nesting`` value of nine. This counter
+represented by a ``->nmi_nesting`` value of nine. This counter
can therefore be thought of as counting the number of reasons why this
CPU cannot be permitted to enter dyntick-idle mode, aside from
process-level transitions.
@@ -960,12 +960,12 @@ process-level transitions.
However, it turns out that when running in non-idle kernel context, the
Linux kernel is fully capable of entering interrupt handlers that never
exit and perhaps also vice versa. Therefore, whenever the
-``->dynticks_nesting`` field is incremented up from zero, the
-``->dynticks_nmi_nesting`` field is set to a large positive number, and
-whenever the ``->dynticks_nesting`` field is decremented down to zero,
-the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
+``->nesting`` field is incremented up from zero, the
+``->nmi_nesting`` field is set to a large positive number, and
+whenever the ``->nesting`` field is decremented down to zero,
+the ``->nmi_nesting`` field is set to zero. Assuming that
the number of misnested interrupts is not sufficient to overflow the
-counter, this approach corrects the ``->dynticks_nmi_nesting`` field
+counter, this approach corrects the ``->nmi_nesting`` field
every time the corresponding CPU enters the idle loop from process
context.
@@ -992,8 +992,8 @@ code.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
-| Why not simply combine the ``->dynticks_nesting`` and |
-| ``->dynticks_nmi_nesting`` counters into a single counter that just |
+| Why not simply combine the ``->nesting`` and |
+| ``->nmi_nesting`` counters into a single counter that just |
| counts the number of reasons that the corresponding CPU is non-idle? |
+-----------------------------------------------------------------------+
| **Answer**: |
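
For illustration, the "initial transition adds one, nested transitions
add two" rule described in this hunk can be modeled in a few lines of
userspace C. This is a toy sketch of the arithmetic only, with invented
names; the kernel's real logic lives in its context-tracking code and
must also manage the atomic RCU-watching counter::

    #include <stdio.h>

    /* Toy model of the ->nmi_nesting counting rule described above. */
    static long nmi_nesting;    /* Zero means idle from RCU's viewpoint. */

    static void model_irq_nmi_enter(void)
    {
        /* Initial transition from idle adds one, nested transitions add two. */
        nmi_nesting += (nmi_nesting == 0) ? 1 : 2;
    }

    static void model_irq_nmi_exit(void)
    {
        /* Mirror image: outermost exit subtracts one, nested exits subtract two. */
        nmi_nesting -= (nmi_nesting == 1) ? 1 : 2;
    }

    int main(void)
    {
        for (int i = 0; i < 5; i++)
            model_irq_nmi_enter();
        /* A nesting level of five is represented by the value nine. */
        printf("nesting level 5 -> ->nmi_nesting == %ld\n", nmi_nesting);
        return 0;
    }
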
diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
index 5750f125361b..1a5ff1a9f02e 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
@@ -147,11 +147,11 @@ RCU read-side critical sections preceding and following the current
idle sojourn.
This case is handled by calls to the strongly ordered
``atomic_add_return()`` read-modify-write atomic operation that
-is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry
-time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time.
-The grace-period kthread invokes ``rcu_dynticks_snap()`` and
-``rcu_dynticks_in_eqs_since()`` (both of which invoke
-an ``atomic_add_return()`` of zero) to detect idle CPUs.
+is invoked within ``ct_kernel_exit_state()`` at idle-entry
+time and within ``ct_kernel_enter_state()`` at idle-exit time.
+The grace-period kthread first invokes ``ct_rcu_watching_cpu_acquire()``
+(preceded by a full memory barrier) and then
+``rcu_watching_snap_stopped_since()`` (both of which rely on acquire
+semantics) to detect idle CPUs.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
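
For illustration, the ordering pattern implemented by the renamed
helpers above can be modeled roughly with C11 atomics. All names below
are illustrative; the kernel instead uses atomic_add_return() and
friends on a per-CPU counter, and the parity convention (even means
idle) is taken from stallwarn.rst's description::

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int watching_ctr = 1;    /* Odd: RCU is watching this CPU. */

    static void model_idle_entry(void)     /* cf. ct_kernel_exit_state() */
    {
        /* Fully ordered RMW: prior RCU readers happen before the idle sojourn. */
        atomic_fetch_add(&watching_ctr, 1);    /* Now even: idle. */
    }

    static void model_idle_exit(void)      /* cf. ct_kernel_enter_state() */
    {
        /* Fully ordered RMW: the idle sojourn happens before later readers. */
        atomic_fetch_add(&watching_ctr, 1);    /* Now odd: watching. */
    }

    /* Grace-period kthread side. */
    static int model_watching_snap(void)   /* cf. ct_rcu_watching_cpu_acquire() */
    {
        /* The kernel also executes a full memory barrier before this read. */
        return atomic_load_explicit(&watching_ctr, memory_order_acquire);
    }

    static bool model_stopped_since(int snap)  /* cf. rcu_watching_snap_stopped_since() */
    {
        int cur = atomic_load_explicit(&watching_ctr, memory_order_acquire);

        /* Quiescent if the CPU was idle at snapshot time or has
         * passed through an idle transition since. */
        return !(snap & 1) || cur != snap;
    }
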
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg
index 423df00c4df9..3fbc19c48a58 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg
@@ -528,7 +528,7 @@
font-style="normal"
y="-8652.5312"
x="2466.7822"
- xml:space="preserve">dyntick_save_progress_counter()</text>
+ xml:space="preserve">rcu_watching_snap_save()</text>
<text
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
id="text202-7-2-7-2-0"
@@ -537,7 +537,7 @@
font-style="normal"
y="-8368.1475"
x="2463.3262"
- xml:space="preserve">rcu_implicit_dynticks_qs()</text>
+ xml:space="preserve">rcu_watching_snap_recheck()</text>
</g>
<g
id="g4504"
@@ -607,7 +607,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_enter()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_exit_state()</text>
<text
xml:space="preserve"
x="3745.7725"
@@ -638,7 +638,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6-1"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_exit()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_enter_state()</text>
<text
xml:space="preserve"
x="3745.7725"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
index d82a77d03d8c..25c7acc8a4c2 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
@@ -844,7 +844,7 @@
font-style="normal"
y="1547.8876"
x="4417.6396"
- xml:space="preserve">dyntick_save_progress_counter()</text>
+ xml:space="preserve">rcu_watching_snap_save()</text>
<g
style="fill:none;stroke-width:0.025in"
transform="translate(6501.9719,-10685.904)"
@@ -899,7 +899,7 @@
font-style="normal"
y="1858.8729"
x="4414.1836"
- xml:space="preserve">rcu_implicit_dynticks_qs()</text>
+ xml:space="preserve">rcu_watching_snap_recheck()</text>
<text
xml:space="preserve"
x="14659.87"
@@ -977,7 +977,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_enter()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_exit_state()</text>
<text
xml:space="preserve"
x="3745.7725"
@@ -1008,7 +1008,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6-1"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_exit()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_enter_state()</text>
<text
xml:space="preserve"
x="3745.7725"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
index 53e0dc2a2c79..d05bc7b27edb 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
@@ -2974,7 +2974,7 @@
font-style="normal"
y="38114.047"
x="-334.33856"
- xml:space="preserve">dyntick_save_progress_counter()</text>
+ xml:space="preserve">rcu_watching_snap_save()</text>
<g
style="fill:none;stroke-width:0.025in"
transform="translate(1749.9916,25880.249)"
@@ -3029,7 +3029,7 @@
font-style="normal"
y="38425.035"
x="-337.79462"
- xml:space="preserve">rcu_implicit_dynticks_qs()</text>
+ xml:space="preserve">rcu_watching_snap_recheck()</text>
<text
xml:space="preserve"
x="9907.8887"
@@ -3107,7 +3107,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_enter()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_exit_state()</text>
<text
xml:space="preserve"
x="3745.7725"
@@ -3138,7 +3138,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6-1"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_dynticks_eqs_exit()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">ct_kernel_enter_state()</text>
<text
xml:space="preserve"
x="3745.7725"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
index 4fa7506082bf..a92356ce4011 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
@@ -516,7 +516,7 @@
font-style="normal"
y="-8652.5312"
x="2466.7822"
- xml:space="preserve">dyntick_save_progress_counter()</text>
+ xml:space="preserve">rcu_watching_snap_save()</text>
<text
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
id="text202-7-2-7-2-0"
@@ -525,7 +525,7 @@
font-style="normal"
y="-8368.1475"
x="2463.3262"
- xml:space="preserve">rcu_implicit_dynticks_qs()</text>
+ xml:space="preserve">rcu_watching_snap_recheck()</text>
<text
sodipodi:linespacing="125%"
style="font-size:192px;font-style:normal;font-weight:bold;line-height:125%;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier"
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index cccafdaa1f84..6125e7068d2c 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2357,6 +2357,7 @@ section.
#. `Sched Flavor (Historical)`_
#. `Sleepable RCU`_
#. `Tasks RCU`_
+#. `Tasks Trace RCU`_
Bottom-Half Flavor (Historical)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2610,6 +2611,16 @@ critical sections that are delimited by voluntary context switches, that
is, calls to schedule(), cond_resched(), and
synchronize_rcu_tasks(). In addition, transitions to and from
userspace execution also delimit tasks-RCU read-side critical sections.
+Idle tasks are ignored by Tasks RCU, and Tasks Rude RCU may be used to
+interact with them.
+
+Note well that involuntary context switches are *not* Tasks-RCU quiescent
+states. After all, in preemptible kernels, a task executing code in a
+trampoline might be preempted. In this case, the Tasks-RCU grace period
+clearly cannot end until that task resumes and its execution leaves that
+trampoline. This means, among other things, that cond_resched() does
+not provide a Tasks RCU quiescent state. (Instead, use rcu_softirq_qs()
+from softirq or rcu_tasks_classic_qs() otherwise.)
The tasks-RCU API is quite compact, consisting only of
call_rcu_tasks(), synchronize_rcu_tasks(), and
@@ -2632,9 +2643,13 @@ moniker. And this operation is considered to be quite rude by real-time
workloads that don't want their ``nohz_full`` CPUs receiving IPIs and
by battery-powered systems that don't want their idle CPUs to be awakened.
+Once kernel entry/exit and deep-idle functions have been properly tagged
+``noinstr``, Tasks RCU can start paying attention to idle tasks (except
+those that are idle from RCU's perspective) and then Tasks Rude RCU can
+be removed from the kernel.
+
The tasks-rude-RCU API is also reader-marking-free and thus quite compact,
-consisting of call_rcu_tasks_rude(), synchronize_rcu_tasks_rude(),
-and rcu_barrier_tasks_rude().
+consisting solely of synchronize_rcu_tasks_rude().
Tasks Trace RCU
~~~~~~~~~~~~~~~
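
For illustration of the Tasks RCU additions above: the canonical use
case is freeing a trampoline only after every task is known to have
left it, via a voluntary context switch or a userspace round trip. A
hedged sketch follows; the my_*-prefixed names and my_arch_free_insns()
are invented for illustration, not taken from the kernel::

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct my_trampoline {
        struct rcu_head rh;
        void *insns;            /* Dynamically generated instructions. */
    };

    void my_arch_free_insns(void *insns);   /* Hypothetical; must not block. */

    static void my_trampoline_free_cb(struct rcu_head *rhp)
    {
        struct my_trampoline *tp = container_of(rhp, struct my_trampoline, rh);

        my_arch_free_insns(tp->insns);
        kfree(tp);
    }

    static void my_trampoline_remove(struct my_trampoline *tp)
    {
        /* Caller has already unhooked tp, so no new tasks can enter
         * it.  A preempted task might still be executing inside, so
         * wait for a Tasks RCU grace period before freeing. */
        call_rcu_tasks(&tp->rh, my_trampoline_free_cb);
    }
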
diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst
index 3e6407de231c..7de3e308f330 100644
--- a/Documentation/RCU/checklist.rst
+++ b/Documentation/RCU/checklist.rst
@@ -194,14 +194,13 @@ over a rather long period of time, but improvements are always welcome!
when publicizing a pointer to a structure that can
be traversed by an RCU read-side critical section.
-5. If any of call_rcu(), call_srcu(), call_rcu_tasks(),
- call_rcu_tasks_rude(), or call_rcu_tasks_trace() is used,
- the callback function may be invoked from softirq context,
- and in any case with bottom halves disabled. In particular,
- this callback function cannot block. If you need the callback
- to block, run that code in a workqueue handler scheduled from
- the callback. The queue_rcu_work() function does this for you
- in the case of call_rcu().
+5. If any of call_rcu(), call_srcu(), call_rcu_tasks(), or
+ call_rcu_tasks_trace() is used, the callback function may be
+ invoked from softirq context, and in any case with bottom halves
+ disabled. In particular, this callback function cannot block.
+ If you need the callback to block, run that code in a workqueue
+ handler scheduled from the callback. The queue_rcu_work()
+ function does this for you in the case of call_rcu().
6. Since synchronize_rcu() can block, it cannot be called
from any sort of irq context. The same rule applies
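
For illustration of item 5 above: the queue_rcu_work() approach it
recommends looks roughly as follows. This sketch invents struct my_obj
for illustration::

    #include <linux/slab.h>
    #include <linux/workqueue.h>

    struct my_obj {
        struct rcu_work rwork;
        /* ... payload ... */
    };

    static void my_obj_free_workfn(struct work_struct *work)
    {
        struct my_obj *p = container_of(to_rcu_work(work), struct my_obj, rwork);

        /* Runs in process context after a grace period, so it may block. */
        kfree(p);
    }

    static void my_obj_defer_free(struct my_obj *p)
    {
        INIT_RCU_WORK(&p->rwork, my_obj_free_workfn);
        queue_rcu_work(system_wq, &p->rwork);
    }
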
@@ -254,10 +253,10 @@ over a rather long period of time, but improvements are always welcome!
corresponding readers must use rcu_read_lock_trace()
and rcu_read_unlock_trace().
- c. If an updater uses call_rcu_tasks_rude() or
- synchronize_rcu_tasks_rude(), then the corresponding
- readers must use anything that disables preemption,
- for example, preempt_disable() and preempt_enable().
+ c. If an updater uses synchronize_rcu_tasks_rude(),
+ then the corresponding readers must use anything that
+ disables preemption, for example, preempt_disable()
+ and preempt_enable().
Mixing things up will result in confusion and broken kernels, and
has even resulted in an exploitable security issue. Therefore,
@@ -326,11 +325,9 @@ over a rather long period of time, but improvements are always welcome!
d. Periodically invoke rcu_barrier(), permitting a limited
number of updates per grace period.
- The same cautions apply to call_srcu(), call_rcu_tasks(),
- call_rcu_tasks_rude(), and call_rcu_tasks_trace(). This is
- why there is an srcu_barrier(), rcu_barrier_tasks(),
- rcu_barrier_tasks_rude(), and rcu_barrier_tasks_rude(),
- respectively.
+ The same cautions apply to call_srcu(), call_rcu_tasks(), and
+ call_rcu_tasks_trace(). This is why there is an srcu_barrier(),
+ rcu_barrier_tasks(), and rcu_barrier_tasks_trace(), respectively.
Note that although these primitives do take action to avoid
memory exhaustion when any given CPU has too many callbacks,
@@ -383,17 +380,17 @@ over a rather long period of time, but improvements are always welcome!
must use whatever locking or other synchronization is required
to safely access and/or modify that data structure.
- Do not assume that RCU callbacks will be executed on
- the same CPU that executed the corresponding call_rcu(),
- call_srcu(), call_rcu_tasks(), call_rcu_tasks_rude(), or
- call_rcu_tasks_trace(). For example, if a given CPU goes offline
- while having an RCU callback pending, then that RCU callback
- will execute on some surviving CPU. (If this was not the case,
- a self-spawning RCU callback would prevent the victim CPU from
- ever going offline.) Furthermore, CPUs designated by rcu_nocbs=
- might well *always* have their RCU callbacks executed on some
- other CPUs, in fact, for some real-time workloads, this is the
- whole point of using the rcu_nocbs= kernel boot parameter.
+ Do not assume that RCU callbacks will be executed on the same
+ CPU that executed the corresponding call_rcu(), call_srcu(),
+ call_rcu_tasks(), or call_rcu_tasks_trace(). For example, if
+ a given CPU goes offline while having an RCU callback pending,
+ then that RCU callback will execute on some surviving CPU.
+ (If this was not the case, a self-spawning RCU callback would
+ prevent the victim CPU from ever going offline.) Furthermore,
+ CPUs designated by rcu_nocbs= might well *always* have their
+ RCU callbacks executed on some other CPUs; in fact, for some
+ real-time workloads, this is the whole point of using the
+ rcu_nocbs= kernel boot parameter.
In addition, do not assume that callbacks queued in a given order
will be invoked in that order, even if they all are queued on the
@@ -507,9 +504,9 @@ over a rather long period of time, but improvements are always welcome!
These debugging aids can help you find problems that are
otherwise extremely difficult to spot.
-17. If you pass a callback function defined within a module to one of
- call_rcu(), call_srcu(), call_rcu_tasks(), call_rcu_tasks_rude(),
- or call_rcu_tasks_trace(), then it is necessary to wait for all
+17. If you pass a callback function defined within a module
+ to one of call_rcu(), call_srcu(), call_rcu_tasks(), or
+ call_rcu_tasks_trace(), then it is necessary to wait for all
pending callbacks to be invoked before unloading that module.
Note that it is absolutely *not* sufficient to wait for a grace
period! For example, the synchronize_rcu() implementation is *not*
@@ -522,7 +519,6 @@ over a rather long period of time, but improvements are always welcome!
- call_rcu() -> rcu_barrier()
- call_srcu() -> srcu_barrier()
- call_rcu_tasks() -> rcu_barrier_tasks()
- - call_rcu_tasks_rude() -> rcu_barrier_tasks_rude()
- call_rcu_tasks_trace() -> rcu_barrier_tasks_trace()
However, these barrier functions are absolutely *not* guaranteed
@@ -539,7 +535,6 @@ over a rather long period of time, but improvements are always welcome!
- Either synchronize_srcu() or synchronize_srcu_expedited(),
together with srcu_barrier()
- synchronize_rcu_tasks() and rcu_barrier_tasks()
- - synchronize_tasks_rude() and rcu_barrier_tasks_rude()
- synchronize_rcu_tasks_trace() and rcu_barrier_tasks_trace()
If necessary, you can use something like workqueues to execute
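
For illustration of the barrier pairings above: a module that queues
callbacks with call_rcu() typically pairs them with rcu_barrier() in
its exit handler. A minimal sketch, with my_exit() invented for
illustration::

    #include <linux/module.h>
    #include <linux/rcupdate.h>

    static void __exit my_exit(void)
    {
        /* By this point the module must no longer queue new callbacks.
         * Wait for all previously queued call_rcu() callbacks to be
         * invoked before the module's text and data disappear. */
        rcu_barrier();
    }
    module_exit(my_exit);
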
diff --git a/Documentation/RCU/listRCU.rst b/Documentation/RCU/listRCU.rst
index ed5c9d8c9afe..d8bb98623c12 100644
--- a/Documentation/RCU/listRCU.rst
+++ b/Documentation/RCU/listRCU.rst
@@ -334,7 +334,7 @@ If the system-call audit module were to ever need to reject stale data, one way
to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to the
``audit_entry`` structure, and modify audit_filter_task() as follows::
- static enum audit_state audit_filter_task(struct task_struct *tsk)
+ static struct audit_entry *audit_filter_task(struct task_struct *tsk, char **key)
{
struct audit_entry *e;
enum audit_state state;
@@ -346,16 +346,18 @@ to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to
if (e->deleted) {
spin_unlock(&e->lock);
rcu_read_unlock();
- return AUDIT_BUILD_CONTEXT;
+ return NULL;
}
rcu_read_unlock();
if (state == AUDIT_STATE_RECORD)
*key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
- return state;
+ /* As long as e->lock is held, e is valid and
+ * its value is not stale */
+ return e;
}
}
rcu_read_unlock();
- return AUDIT_BUILD_CONTEXT;
+ return NULL;
}
The ``audit_del_rule()`` function would need to set the ``deleted`` flag under the
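
As a usage note on the modified audit_filter_task() above: because it
now returns the matching entry with ``e->lock`` held, its callers must
drop that lock when done. A hypothetical caller sketch (my_audit_check()
is invented for illustration)::

    static void my_audit_check(struct task_struct *tsk)
    {
        struct audit_entry *e;
        char *key = NULL;

        e = audit_filter_task(tsk, &key);
        if (e) {
            /* e cannot become stale while e->lock is held. */
            /* ... use e and key ... */
            spin_unlock(&e->lock);
        }
        kfree(key);    /* kfree(NULL) is a no-op. */
    }
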
diff --git a/Documentation/RCU/rcubarrier.rst b/Documentation/RCU/rcubarrier.rst
index 6da7f66da2a8..12a7b059654f 100644
--- a/Documentation/RCU/rcubarrier.rst
+++ b/Documentation/RCU/rcubarrier.rst
@@ -329,10 +329,7 @@ Answer:
was first added back in 2005. This is because on_each_cpu()
disables preemption, which acted as an RCU read-side critical
section, thus preventing CPU 0's grace period from completing
- until on_each_cpu() had dealt with all of the CPUs. However,
- with the advent of preemptible RCU, rcu_barrier() no longer
- waited on nonpreemptible regions of code in preemptible kernels,
- that being the job of the new rcu_barrier_sched() function.
+ until on_each_cpu() had dealt with all of the CPUs.
However, with the RCU flavor consolidation around v4.20, this
possibility was once again ruled out, because the consolidated
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index ca7b7cd806a1..d1ccd6039a8c 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -96,6 +96,13 @@ warnings:
the ``rcu_.*timer wakeup didn't happen for`` console-log message,
which will include additional debugging information.
+- A timer issue causes time to appear to jump forward, so that RCU
+ believes that the RCU CPU stall-warning timeout has been exceeded
+ when in fact much less time has passed. This could be due to
+ timer hardware bugs, timer driver bugs, or even corruption of
+ the "jiffies" global variable. These sorts of timer hardware
+ and driver bugs are not uncommon when testing new hardware.
+
- A low-level kernel issue that either fails to invoke one of the
variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(),
ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one
@@ -249,7 +256,7 @@ ticks this GP)" indicates that this CPU has not taken any scheduling-clock
interrupts during the current stalled grace period.
The "idle=" portion of the message prints the dyntick-idle state.
-The hex number before the first "/" is the low-order 12 bits of the
+The hex number before the first "/" is the low-order 16 bits of the
dynticks counter, which will have an even-numbered value if the CPU
is in dyntick-idle mode and an odd-numbered value otherwise. The hex
number between the two "/"s is the value of the nesting, which will be
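
For illustration of the corrected bit width above, the parity test
implied by the text can be written as follows. The mask here is
inferred from the prose, not taken from kernel source::

    #include <stdbool.h>

    /* Interpret the hex value printed before the first "/" of "idle=". */
    static bool was_dyntick_idle(unsigned int idle_field)
    {
        /* Only the low-order 16 bits of the counter are printed; an
         * even value means the CPU was in dyntick-idle mode. */
        return ((idle_field & 0xffff) & 1) == 0;
    }
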
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index 94838c65c7d9..be2eb6be16ec 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -15,6 +15,9 @@ to start learning about RCU:
| 2014 Big API Table https://lwn.net/Articles/609973/
| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
| 2019 Big API Table https://lwn.net/Articles/777165/
+| 7. The RCU API, 2024 Edition https://lwn.net/Articles/988638/
+| 2024 Background Information https://lwn.net/Articles/988641/
+| 2024 Big API Table https://lwn.net/Articles/988666/
For those preferring video:
@@ -250,21 +253,25 @@ rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
void rcu_assign_pointer(p, typeof(p) v);
- Yes, rcu_assign_pointer() **is** implemented as a macro, though it
- would be cool to be able to declare a function in this manner.
- (Compiler experts will no doubt disagree.)
+ Yes, rcu_assign_pointer() **is** implemented as a macro, though
+ it would be cool to be able to declare a function in this manner.
+ (And there has been some discussion of adding overloaded functions
+ to the C language, so who knows?)
The updater uses this spatial macro to assign a new value to an
RCU-protected pointer, in order to safely communicate the change
in value from the updater to the reader. This is a spatial (as
opposed to temporal) macro. It does not evaluate to an rvalue,
- but it does execute any memory-barrier instructions required
- for a given CPU architecture. Its ordering properties are that
- of a store-release operation.
-
- Perhaps just as important, it serves to document (1) which
- pointers are protected by RCU and (2) the point at which a
- given structure becomes accessible to other CPUs. That said,
+ but it does provide any compiler directives and memory-barrier
+ instructions required for a given compiler or CPU architecture.
+ Its ordering properties are that of a store-release operation,
+ that is, any prior loads and stores required to initialize the
+ structure are ordered before the store that publishes the pointer
+ to that structure.
+
+ Perhaps just as important, rcu_assign_pointer() serves to document
+ (1) which pointers are protected by RCU and (2) the point at which
+ a given structure becomes accessible to other CPUs. That said,
rcu_assign_pointer() is most frequently used indirectly, via
the _rcu list-manipulation primitives such as list_add_rcu().
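
For illustration of the store-release wording above, here is a minimal
publish-side sketch, reusing the struct foo and gp conventions that
whatisRCU.rst uses elsewhere::

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
        int a;
    };
    struct foo __rcu *gp;    /* RCU-protected global pointer. */

    int publish_foo(int val)
    {
        struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

        if (!p)
            return -ENOMEM;
        p->a = val;                   /* Initialization is ordered ... */
        rcu_assign_pointer(gp, p);    /* ... before the publishing store. */
        return 0;
    }
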
@@ -283,7 +290,11 @@ rcu_dereference()
executes any needed memory-barrier instructions for a given
CPU architecture. Currently, only Alpha needs memory barriers
within rcu_dereference() -- on other CPUs, it compiles to a
- volatile load.
+ volatile load. However, no mainstream C compilers respect
+ address dependencies, so rcu_dereference() uses volatile casts,
+ which, in combination with the coding guidelines listed in
+ rcu_dereference.rst, prevent current compilers from breaking
+ these dependencies.
Common coding practice uses rcu_dereference() to copy an
RCU-protected pointer to a local variable, then dereferences
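
For illustration, a matching reader for the publish-side sketch above,
showing the dependency ordering the paragraph describes::

    int read_foo_a(void)
    {
        struct foo *p;
        int ret = -1;

        rcu_read_lock();
        p = rcu_dereference(gp);    /* Volatile load heads the address dependency. */
        if (p)
            ret = p->a;             /* Dependent access is not reordered before it. */
        rcu_read_unlock();
        return ret;
    }
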
@@ -963,6 +974,16 @@ unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe. Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
+If using refcount_t, the specialized refcount_{add|inc}_not_zero_acquire()
+and refcount_set_release() APIs should be used to ensure correct operation
+ordering when verifying object identity and when initializing newly
+allocated objects.  The acquire fence in refcount_{add|inc}_not_zero_acquire()
+ensures that identity checks happen *after* the reference count is taken.
+refcount_set_release() should be called after a newly allocated object is
+fully initialized, and its release fence ensures that the new values are
+visible *before* the refcount can be successfully taken by other users.
+Once refcount_set_release() is called, the object should be considered
+visible to other tasks.
(Those willing to initialize their locks in a kmem_cache constructor
may also use locking, including cache-friendly sequence locking.)
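
For illustration of the refcount_t guidance above, here is a hedged
sketch of the lookup/initialization pattern, with struct my_obj,
my_cache, and the key field invented for illustration::

    #include <linux/refcount.h>
    #include <linux/slab.h>

    struct my_obj {
        refcount_t ref;
        int key;
    };

    static struct kmem_cache *my_cache;    /* Created with SLAB_TYPESAFE_BY_RCU. */

    static void my_obj_put(struct my_obj *obj)
    {
        if (refcount_dec_and_test(&obj->ref))
            kmem_cache_free(my_cache, obj);
    }

    /* Called under rcu_read_lock() on an object found in a lookup
     * structure; the object may have been freed and recycled. */
    static struct my_obj *my_obj_tryget(struct my_obj *obj, int key)
    {
        /* Acquire fence: the identity check below cannot be reordered
         * before the reference acquisition. */
        if (!refcount_inc_not_zero_acquire(&obj->ref))
            return NULL;
        if (obj->key != key) {    /* Object was recycled: back out. */
            my_obj_put(obj);
            return NULL;
        }
        return obj;
    }

    static struct my_obj *my_obj_new(int key)
    {
        struct my_obj *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);

        if (!obj)
            return NULL;
        obj->key = key;
        /* Release fence: the initialization above is visible before
         * any reader can successfully take a reference. */
        refcount_set_release(&obj->ref, 1);
        return obj;
    }
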
@@ -1095,7 +1116,7 @@ RCU-Tasks-Rude::
	Critical sections	Grace period		Barrier

-	N/A			call_rcu_tasks_rude	rcu_barrier_tasks_rude
+	N/A			N/A
				synchronize_rcu_tasks_rude