path: root/kernel/rcu
Age  Commit message  Author
2021-01-06  rcu/segcblist: Add debug checks for segment lengths  [Joel Fernandes (Google)]
This commit adds debug checks near the end of rcu_do_batch() that emit warnings if an empty rcu_segcblist structure has non-zero segment counts, or, conversely, if a non-empty structure has all-zero segment counts. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Fix queue/segment-length checks. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
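For illustration, the check being described might look like the following sketch, in which the per-segment summing helper is hypothetical rather than necessarily the upstream name:

    /* Near the end of rcu_do_batch(), after callback invocation. */
    long seglen = segcblist_sum_seglen(&rdp->cblist); /* hypothetical helper */

    if (rcu_segcblist_empty(&rdp->cblist))
        WARN_ON_ONCE(seglen != 0); /* Empty list but non-zero segment counts. */
    else
        WARN_ON_ONCE(seglen == 0); /* Non-empty list but all-zero segment counts. */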
2021-01-06  rcu/trace: Add tracing for how segcb list changes  [Joel Fernandes (Google)]
This commit adds tracing to track how the segcb list changes before/after acceleration, during queuing and during dequeuing. This tracing helped discover an optimization that avoided needless GP requests when no callbacks were accelerated. The tracing overhead is minimal as each segment's length is now stored in the respective segment. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06  rcu/tree: segcblist: Remove redundant smp_mb()s  [Joel Fernandes (Google)]
The full memory barriers in rcu_segcblist_enqueue() and in rcu_do_batch() are not needed because rcu_segcblist_add_len(), and thus also rcu_segcblist_inc_len(), already includes a memory barrier *before* and *after* the length of the list is updated. This commit therefore removes these redundant smp_mb() invocations. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06  rcu/segcblist: Add counters to segcblist datastructure  [Joel Fernandes (Google)]
Add counting of the segment lengths of the segmented callback list. This will be useful for a number of things, such as knowing how big the ready-to-execute segment has gotten. The immediate benefit is the ability to trace how the callbacks in the segmented callback list change. This patch also removes the hacks that used the donecbs's ->len field as a temporary variable to save the segmented callback list's length; this can no longer be done and is no longer needed. Also fix SRCU: the negative count of the unsegmented list cannot be used to adjust the segmented one. To fix this, sample the unsegmented length in advance, and use it after CB execution to adjust the segmented list's length. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06  rcu/segcblist: Add additional comments to explain smp_mb()  [Joel Fernandes (Google)]
One counter-intuitive property of RCU is the fact that full memory barriers are needed both before and after updates to the full (non-segmented) length. This patch therefore helps to assist the reader's intuition by adding appropriate comments. [ paulmck: Wordsmithing. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
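For reference, a simplified sketch of the fully ordered length update (modeled on the non-offloaded case of rcu_segcblist_add_len(); the CONFIG_RCU_NOCB_CPU variant differs in representation but not in ordering):

    static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
    {
        smp_mb(); /* Up to the reader! */
        WRITE_ONCE(rsclp->len, rsclp->len + v);
        smp_mb(); /* Up to the reader! */
    }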
2021-01-04  rcu: Make TASKS_TRACE_RCU select IRQ_WORK  [Paul E. McKenney]
Tasks Trace RCU uses irq_work_queue() to safely awaken its grace-period kthread. This commit therefore causes the TASKS_TRACE_RCU Kconfig option to select the IRQ_WORK Kconfig option. Reported-by: kernel test robot <lkp@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu-tasks: Add RCU-tasks self tests  [Uladzislau Rezki (Sony)]
This commit adds self tests for early-boot use of RCU-tasks grace periods. It tests all three variants (Rude, Tasks, and Tasks Trace) and covers both synchronous (e.g., synchronize_rcu_tasks()) and asynchronous (e.g., call_rcu_tasks()) grace-period APIs. Self-tests are run only in kernels built with CONFIG_PROVE_RCU=y. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> [ paulmck: Handle CONFIG_PROVE_RCU=n and identify test cases' callbacks. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu: Add lockdep_assert_irqs_disabled() to raw_spin_unlock_rcu_node() macros  [Paul E. McKenney]
This commit adds a lockdep_assert_irqs_disabled() call to the helper macros that release the rcu_node structure's ->lock, namely to raw_spin_unlock_rcu_node(), raw_spin_unlock_irq_rcu_node() and raw_spin_unlock_irqrestore_rcu_node(). The point of this is to help track down a situation where lockdep appears to be insisting that interrupts are enabled while holding an rcu_node structure's ->lock. Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
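A sketch of the resulting wrapper, simplified from kernel/rcu/rcu.h:

    #define raw_spin_unlock_rcu_node(p)                    \
    do {                                                   \
        lockdep_assert_irqs_disabled();                    \
        raw_spin_unlock(&ACCESS_PRIVATE(p, lock));         \
    } while (0)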
2021-01-04  rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees  [Paul E. McKenney]
This commit adds a number of lockdep_assert_irqs_disabled() calls to rcu_sched_clock_irq() and a number of the functions that it calls. The point of this is to help track down a situation where lockdep appears to be insisting that interrupts are enabled within these functions, which should only ever be invoked from the scheduling-clock interrupt handler. Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu: Do not NMI offline CPUs  [Paul E. McKenney]
Currently, RCU CPU stall warning messages will NMI whatever CPU looks like it is blocking either the current grace period or the grace-period kthread. This can produce confusing output if the target CPU is offline. This commit therefore checks for offline CPUs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu: For RCU grace-period kthread starvation, dump last CPU it ran on  [Paul E. McKenney]
When the RCU CPU stall-warning code detects that the RCU grace-period kthread is being starved, it dumps that kthread's stack. This can sometimes be useful, but it is also useful to know what is running on the CPU that this kthread is attempting to run on. This commit therefore adds a stack trace of this CPU in order to help track down whatever it is that might be preventing RCU's grace-period kthread from running. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu: Mark obtuse portion of stall warning as internal debug  [Paul E. McKenney]
There is a rather obtuse string that can be printed as part of an expedited RCU CPU stall-warning message that starts with "blocking rcu_node structures". Under normal conditions, most of this message is just repeating the list of CPUs blocking the current expedited grace period, but in a manner that is rather difficult to read. This commit therefore marks this message as "(internal RCU debug)" in an effort to give people the option of avoiding wasting time attempting to extract nonexistent additional meaning from this portion of the message. Reported-by: Jonathan Lemon <bsd@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcutorture: Add testing for RCU's global memory ordering  [Paul E. McKenney]
RCU guarantees that anything seen by a given reader will also be seen after any grace period that must wait on that reader. This is very likely to hold based on inspection, but the advantage of having rcutorture do the inspecting is that rcutorture doesn't mind inspecting frequently and often. This commit therefore adds code to test RCU's global memory ordering. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcutorture: Add reader-side tests of polling grace-period API  [Paul E. McKenney]
This commit adds reader-side testing of the polling grace-period API. This testing verifies that a cookie obtained in an SRCU read-side critical section does not get a true return from poll_state_synchronize_srcu() within that same critical section. Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcutorture: Add writer-side tests of polling grace-period API  [Paul E. McKenney]
This commit adds writer-side testing of the polling grace-period API. One test verifies that the polling API sees a grace period caused by some other mechanism. Another test verifies that using the polling API to wait for a grace period does not result in too-short grace periods. A third test verifies that the polling API does not report completion within a read-side critical section. A fourth and final test verifies that the polling API does report completion given an intervening grace period. Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcutorture: Prepare for ->start_gp_poll and ->poll_gp_state  [Paul E. McKenney]
The new get_state_synchronize_srcu(), start_poll_synchronize_srcu() and poll_state_synchronize_srcu() functions need to be tested, and so this commit prepares by renaming the rcu_torture_ops field ->get_state to ->get_gp_state in order to be consistent with the upcoming ->start_gp_poll and ->poll_gp_state fields. Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  srcu: Add comment explaining cookie overflow/wrap  [Paul E. McKenney]
This commit adds to the poll_state_synchronize_srcu() header comment describing the issues surrounding SRCU cookie overflow/wrap for the different kernel configurations. Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  srcu: Provide polling interfaces for Tree SRCU grace periods  [Paul E. McKenney]
There is a need for a polling interface for SRCU grace periods, so this commit supplies get_state_synchronize_srcu(), start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this purpose. The first can be used if future grace periods are inevitable (perhaps due to a later call_srcu() invocation), the second if future grace periods might not otherwise happen, and the third to check if a grace period has elapsed since the corresponding call to either of the first two. As with get_state_synchronize_rcu() and cond_synchronize_rcu(), the return value from either get_state_synchronize_srcu() or start_poll_synchronize_srcu() must be passed in to a later call to poll_state_synchronize_srcu(). Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> [ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ] [ paulmck: Apply feedback from Neeraj Upadhyay. ] Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/ Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  srcu: Provide polling interfaces for Tiny SRCU grace periods  [Paul E. McKenney]
There is a need for a polling interface for SRCU grace periods, so this commit supplies get_state_synchronize_srcu(), start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this purpose. The first can be used if future grace periods are inevitable (perhaps due to a later call_srcu() invocation), the second if future grace periods might not otherwise happen, and the third to check if a grace period has elapsed since the corresponding call to either of the first two. As with get_state_synchronize_rcu() and cond_synchronize_rcu(), the return value from either get_state_synchronize_srcu() or start_poll_synchronize_srcu() must be passed in to a later call to poll_state_synchronize_srcu(). Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> [ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ] [ paulmck: Apply feedback from Neeraj Upadhyay. ] Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/ Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
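An illustrative usage pattern for these interfaces, applicable to both the Tiny and Tree implementations; the srcu_struct, the old_p pointer, and the requeue step are made up for this example:

    static DEFINE_SRCU(my_srcu);

    /* Updater: start a grace period (if needed) without blocking. */
    unsigned long cookie = start_poll_synchronize_srcu(&my_srcu);

    /* ... do other work ... */

    /* Later: free the old object only once that grace period has elapsed. */
    if (poll_state_synchronize_srcu(&my_srcu, cookie))
        kfree(old_p);             /* All pre-existing SRCU readers are done. */
    else
        requeue_for_later(old_p); /* hypothetical: check again later */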
2021-01-04  srcu: Provide internal interface to start a Tree SRCU grace period  [Paul E. McKenney]
There is a need for a polling interface for SRCU grace periods. This polling needs to initiate an SRCU grace period without having to queue (and manage) a callback. This commit therefore splits the Tree SRCU __call_srcu() function into callback-initialization and queuing/start-grace-period portions, with the latter in a new function named srcu_gp_start_if_needed(). This function may be passed a NULL callback pointer, in which case it will refrain from queuing anything. Why have the new function mess with queuing? Locking considerations, of course! Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  srcu: Provide internal interface to start a Tiny SRCU grace period  [Paul E. McKenney]
There is a need for a polling interface for SRCU grace periods. This polling needs to initiate an SRCU grace period without having to queue (and manage) a callback. This commit therefore splits the Tiny SRCU call_srcu() function into callback-queuing and start-grace-period portions, with the latter in a new function named srcu_gp_start_if_needed(). Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  srcu: Make Tiny SRCU use multi-bit grace-period counter  [Paul E. McKenney]
There is a need for a polling interface for SRCU grace periods. This polling needs to distinguish between an SRCU instance being idle on the one hand or in the middle of a grace period on the other. This commit therefore converts the Tiny SRCU srcu_struct structure's srcu_idx from a de facto boolean to a free-running counter, using the bottom bit to indicate that a grace period is in progress. The second-from-bottom bit is thus used as the index returned by srcu_read_lock(). Link: https://lore.kernel.org/rcu/20201112201547.GF3365678@moria.home.lan/ Reported-by: Kent Overstreet <kent.overstreet@gmail.com> [ paulmck: Fix ->srcu_lock_nesting[] indexing per Neeraj Upadhyay. ] Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
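The resulting encoding can be illustrated as follows (an interpretive sketch of the bit layout, not the literal upstream code):

    unsigned long counter = READ_ONCE(ssp->srcu_idx);

    bool gp_in_progress = counter & 0x1;        /* Bottom bit: GP in flight? */
    int reader_idx = (counter >> 1) & 0x1;      /* Next bit: srcu_read_lock() index. */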
2021-01-04  rcu: Enable rcu_normal_after_boot unconditionally for RT  [Julia Cartwright]
Expedited RCU grace periods send IPIs to all non-idle CPUs, and thus can disrupt time-critical code in real-time applications. However, there is a portion of boot-time processing (presumably before any real-time applications have started) where expedited RCU grace periods are the only option. And so it is that experience with the -rt patchset indicates that PREEMPT_RT systems should always set the rcupdate.rcu_normal_after_boot kernel boot parameter. This commit therefore makes the post-boot application environment safe for real-time applications by making PREEMPT_RT systems ignore the rcupdate.rcu_normal_after_boot kernel boot parameter and unconditionally act as if it had been set. This means that post-boot calls to synchronize_rcu_expedited() will be treated as if they were instead calls to synchronize_rcu(), thus preventing the IPIs, and thus avoiding disrupting real-time applications. Suggested-by: Luiz Capitulino <lcapitulino@redhat.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Julia Cartwright <julia@ni.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Update kernel-parameters.txt accordingly. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu: Unconditionally use rcuc threads on PREEMPT_RT  [Scott Wood]
PREEMPT_RT systems have long used the rcutree.use_softirq kernel boot parameter to avoid use of RCU_SOFTIRQ handlers, which can disrupt real-time applications by invoking callbacks during return from interrupts that arrived while executing time-critical code. This kernel boot parameter instead runs RCU core processing in an 'rcuc' kthread, thus allowing the scheduler to do its job of avoiding disrupting time-critical code. This commit therefore disables the rcutree.use_softirq kernel boot parameter on PREEMPT_RT systems, thus forcing such systems to do RCU core processing in 'rcuc' kthreads. This approach has long been in use by users of the -rt patchset, and there have been no complaints. There is therefore no way for the system administrator to override this choice, at least without modifying and rebuilding the kernel. Signed-off-by: Scott Wood <swood@redhat.com> [bigeasy: Reword commit message] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Update kernel-parameters.txt accordingly. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu: Make RCU_BOOST default on CONFIG_PREEMPT_RT  [Sebastian Andrzej Siewior]
On PREEMPT_RT kernels, RCU callbacks are deferred to the `rcuc' kthread. This can stall RCU grace periods due to lengthy preemption not only of RCU readers but also of 'rcuc' kthreads, either of which can prevent grace periods from completing, which can in turn result in OOM. Because PREEMPT_RT kernels have more kthreads that can block grace periods, it is more important for such kernels to enable RCU_BOOST. This commit therefore makes RCU_BOOST the default on PREEMPT_RT. RCU_BOOST can still be manually disabled if need be. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04  rcu: Record kvfree_call_rcu() call stack for KASAN  [Zqiang]
This commit adds a call to kasan_record_aux_stack() in kvfree_call_rcu() in order to record the call stack of the code that caused the object to be freed. Please note that this function does not update the allocated/freed state, which is important because RCU readers might still be referencing this object. Acked-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
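A sketch of where the hook lands (the surrounding function body is elided):

    void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
    {
        <snip>
        /* Record the caller's stack for later KASAN reports; this does
         * not mark the object freed, as RCU readers may still hold it. */
        kasan_record_aux_stack(head);
        <snip>
    }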
2021-01-04  rcu/tree: Make rcu_do_batch count how many callbacks were executed  [Joel Fernandes (Google)]
The rcu_do_batch() function extracts the ready-to-invoke callbacks from the rcu_segcblist located in the ->cblist field of the current CPU's rcu_data structure. These callbacks are first moved to a local (unsegmented) rcu_cblist. The rcu_do_batch() function then uses this rcu_cblist's ->len field to count how many CBs it has invoked, but it does so by counting that field down from zero. Finally, this function negates the value in this ->len field (resulting in a positive number) and subtracts the result from the ->len field of the current CPU's ->cblist field. Except that it is sometimes necessary for rcu_do_batch() to stop invoking callbacks mid-stream, despite there being more ready to invoke, for example, if a high-priority task wakes up. In this case the remaining not-yet-invoked callbacks are requeued back onto the CPU's ->cblist, but remain in the ready-to-invoke segment of that list. As above, the negative of the local rcu_cblist's ->len field is still subtracted from the ->len field of the current CPU's ->cblist field. The design of counting down from 0 is confusing and error-prone, plus use of a positive count will make it easier to provide a uniform and consistent API to deal with the per-segment counts that are added later in this series. For example, rcu_segcblist_extract_done_cbs() can unconditionally populate the resulting unsegmented list's ->len field during extraction. This commit therefore explicitly counts how many callbacks were executed in rcu_do_batch() itself, counting up from zero, and then uses that to update the per-CPU segcb list's ->len field, without relying on the downcounting of rcl->len from zero. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
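A simplified sketch of the new scheme, with the stop-early condition standing in for the real checks:

    struct rcu_head *rhp;
    long count = 0;

    for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
        rhp->func(rhp);
        count++;                      /* Count up from zero... */
        if (should_stop_invoking())   /* hypothetical stand-in */
            break;                    /* ...even when bailing out mid-stream. */
    }
    /* Requeue any leftovers, then subtract only what was actually invoked. */
    rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
    rcu_segcblist_add_len(&rdp->cblist, -count);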
2021-01-04  Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu  [Linus Torvalds]
Pull RCU fix from Paul McKenney: "This is a fix for a regression in the v5.10 merge window, but it was reported quite late in the v5.10 process, plus generating and testing the fix took some time. The regression is due to commit 36dadef23fcc ("kprobes: Init kprobes in early_initcall") which on powerpc can use RCU Tasks before initialization, resulting in boot failures. The fix is straightforward, simply moving initialization of RCU Tasks before the early_initcall()s. The fix has been exposed to -next and kbuild test robot testing, and has been tested by the PowerPC guys"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu-tasks: Move RCU-tasks initialization to before early_initcall()
2020-12-14  Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
Pull scheduler updates from Thomas Gleixner:

- migrate_disable/enable() support which originates from the RT tree and is now a prerequisite for the new preemptible kmap_local() API which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision making
- The usual small fixes and enhancements all over the place

* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
  sched/fair: Trivial correction of the newidle_balance() comment
  sched/fair: Clear SMT siblings after determining the core is not idle
  sched: Fix kernel-doc markup
  x86: Print ratio freq_max/freq_base used in frequency invariance calculations
  x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
  x86, sched: Calculate frequency invariance for AMD systems
  irq_work: Optimize irq_work_single()
  smp: Cleanup smp_call_function*()
  irq_work: Cleanup
  sched: Limit the amount of NUMA imbalance that can exist at fork time
  sched/numa: Allow a floating imbalance between NUMA nodes
  sched: Avoid unnecessary calculation of load imbalance at clone time
  sched/numa: Rename nr_running and break out the magic number
  sched: Make migrate_disable/enable() independent of RT
  sched/topology: Condition EAS enablement on FIE support
  arm64: Rebuild sched domains on invariance status changes
  sched/topology,schedutil: Wrap sched domains rebuild
  sched/uclamp: Allow to reset a task uclamp constraint value
  sched/core: Fix typos in comments
  Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
  ...
2020-12-14  Merge tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
Pull RCU updates from Thomas Gleixner: "RCU, LKMM and KCSAN updates collected by Paul McKenney.

RCU:
- Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs
- Lockdep-RCU updates reducing the need for __maybe_unused
- Tasks-RCU updates
- Miscellaneous fixes
- Documentation updates
- Torture-test updates

KCSAN:
- updates for selftests, avoiding setting watchpoints on NULL pointers
- fix to watchpoint encoding

LKMM:
- updates for documentation along with some updates to example-code litmus tests"

* tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  srcu: Take early exit on memory-allocation failure
  rcu/tree: Defer kvfree_rcu() allocation to a clean context
  rcu: Do not report strict GPs for outgoing CPUs
  rcu: Fix a typo in rcu_blocking_is_gp() header comment
  rcu: Prevent lockdep-RCU splats on lock acquisition/release
  rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
  rcu,ftrace: Fix ftrace recursion
  rcu/tree: Make struct kernel_param_ops definitions const
  rcu/tree: Add a warning if CPU being onlined did not report QS already
  rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
  rcu: Fix single-CPU check in rcu_blocking_is_gp()
  rcu: Implement rcu_segcblist_is_offloaded() config dependent
  list.h: Update comment to explicitly note circular lists
  rcu: Panic after fixed number of stalls
  x86/smpboot: Move rcu_cpu_starting() earlier
  rcu: Allow rcu_irq_enter_check_tick() from NMI
  tools/memory-model: Label MP tests' producers and consumers
  tools/memory-model: Use "buf" and "flag" for message-passing tests
  tools/memory-model: Add types to litmus tests
  tools/memory-model: Add a glossary of LKMM terms
  ...
2020-12-14  rcu-tasks: Move RCU-tasks initialization to before early_initcall()  [Uladzislau Rezki (Sony)]
PowerPC testing encountered boot failures due to RCU Tasks not being fully initialized until core_initcall() time. This commit therefore initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just before early_initcall() time, thus allowing waiting on RCU Tasks grace periods from early_initcall() handlers. Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/ Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-27  Merge branch 'linus' into sched/core, to resolve semantic conflict  [Ingo Molnar]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-24  irq_work: Cleanup  [Peter Zijlstra]
Get rid of the __call_single_node union and clean up the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2020-11-19  Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', 'fixes.2020.11.19b', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD  [Paul E. McKenney]
cpuinfo.2020.11.06a: Speedups for /proc/cpuinfo.
doc.2020.11.06a: Documentation updates.
fixes.2020.11.19b: Miscellaneous fixes.
lockdep.2020.11.02a: Lockdep-RCU updates to avoid "unused variable".
tasks.2020.11.06a: Tasks-RCU updates.
torture.2020.11.06a: Torture-test updates.
2020-11-19  srcu: Take early exit on memory-allocation failure  [Paul E. McKenney]
It turns out that init_srcu_struct() can be invoked from usermode tasks, and that fatal signals received by these tasks can cause memory-allocation failures. These failures are not handled well by init_srcu_struct(), so much so that NULL pointer dereferences can result. This commit therefore causes init_srcu_struct() to take an early exit upon detection of memory-allocation failure. Link: https://lore.kernel.org/lkml/20200908144306.33355-1-aik@ozlabs.ru/ Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu/tree: Defer kvfree_rcu() allocation to a clean context  [Uladzislau Rezki (Sony)]
The current memory-allocation interface causes the following difficulties for kvfree_rcu():

a) If built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep will complain about violation of the nesting rules, as in "BUG: Invalid wait context". This Kconfig option checks for proper raw_spinlock vs. spinlock nesting; in particular, it is not legal to acquire a spinlock_t while holding a raw_spinlock_t. This is a problem because kfree_rcu() uses raw_spinlock_t whereas the "page allocator" internally deals with spinlock_t to access its zones. The code can also be broken from a higher level of view:

    <snip>
    raw_spin_lock(&some_lock);
    kfree_rcu(some_pointer, some_field_offset);
    <snip>

b) If built with CONFIG_PREEMPT_RT, spinlock_t is converted into a sleeplock. This means that invoking the page allocator from atomic contexts results in "BUG: scheduling while atomic".

c) Please note that call_rcu() is already invoked from raw atomic context, so it is only reasonable to expect that kfree_rcu() and kvfree_rcu() will also be called from atomic raw context.

This commit therefore defers page allocation to a clean context using the combination of an hrtimer and a workqueue, as sketched below. The hrtimer stage is required in order to avoid deadlocks with the scheduler. This deferred allocation is required only when kvfree_rcu()'s per-CPU page cache is empty. Link: https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/ Fixes: 3042f83f19be ("rcu: Support reclaim for head-less object") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
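A sketch of the deferral chain; the type and function names here illustrate the approach and are not guaranteed to match upstream:

    /* Runs in hardirq context, so it touches only the workqueue. */
    static enum hrtimer_restart schedule_page_work_fn(struct hrtimer *t)
    {
        struct kfree_rcu_cpu *krcp = container_of(t, struct kfree_rcu_cpu, hrtimer);

        queue_work(system_highpri_wq, &krcp->page_cache_work);
        return HRTIMER_NORESTART;
    }

The work item then refills kvfree_rcu()'s per-CPU page cache from a context in which calling the page allocator is legal.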
2020-11-19  rcu: Do not report strict GPs for outgoing CPUs  [Paul E. McKenney]
An outgoing CPU is marked offline in a stop-machine handler and most of that CPU's services stop at that point, including IRQ work queues. However, that CPU must take another pass through the scheduler and through a number of CPU-hotplug notifiers, many of which contain RCU readers. In the past, these readers were not a problem because the outgoing CPU has interrupts disabled, so that rcu_read_unlock_special() would not be invoked, and thus RCU would never attempt to queue IRQ work on the outgoing CPU. This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option, in which rcu_read_unlock_special() is invoked upon exit from almost all RCU read-side critical sections. Worse yet, because interrupts are disabled, rcu_read_unlock_special() cannot immediately report a quiescent state and will therefore attempt to defer this reporting, for example, by queueing IRQ work. Which fails with a splat because the CPU is already marked as being offline. But it turns out that there is no need to report this quiescent state because rcu_report_dead() will do this job shortly after the outgoing CPU makes its final dive into the idle loop. This commit therefore makes rcu_read_unlock_special() refrain from queuing IRQ work onto outgoing CPUs. Fixes: 44bad5b3cca2 ("rcu: Do full report for .need_qs for strict GPs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jann Horn <jannh@google.com>
2020-11-19  rcu: Fix a typo in rcu_blocking_is_gp() header comment  [Zhouyi Zhou]
This commit fixes a typo in the rcu_blocking_is_gp() function's header comment. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu: Prevent lockdep-RCU splats on lock acquisition/release  [Paul E. McKenney]
The rcu_cpu_starting() and rcu_report_dead() functions transition the current CPU between online and offline state from an RCU perspective. Unfortunately, this means that the rcu_cpu_starting() function's lock acquisition and the rcu_report_dead() function's lock releases happen while the CPU is offline from an RCU perspective, which can result in lockdep-RCU splats about using RCU from an offline CPU. And this situation can also result in too-short grace periods, especially in guest OSes that are subject to vCPU preemption. This commit therefore uses sequence-count-like synchronization to forgive use of RCU while RCU thinks a CPU is offline across the full extent of the rcu_cpu_starting() and rcu_report_dead() function's lock acquisitions and releases. One approach would have been to use the actual sequence-count primitives provided by the Linux kernel. Unfortunately, the resulting code looks completely broken and wrong, and is likely to result in patches that break RCU in an attempt to address this appearance of broken wrongness. Plus there is no net savings in lines of code, given the additional explicit memory barriers required. Therefore, this sequence count is instead implemented by a new ->ofl_seq field in the rcu_node structure. If this counter's value is an odd number, RCU forgives RCU read-side critical sections on other CPUs covered by the same rcu_node structure, even if those CPUs are offline from an RCU perspective. In addition, if a given leaf rcu_node structure's ->ofl_seq counter value is an odd number, rcu_gp_init() delays starting the grace period until that counter value changes. [ paulmck: Apply Peter Zijlstra feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
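An illustrative sketch of the odd/even protocol (simplified, not the exact upstream sequence):

    /* rcu_cpu_starting()/rcu_report_dead(): bracket the transition. */
    WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);    /* Odd: forgive RCU use. */
    smp_mb();
    /* ... rcu_node ->lock acquisitions/releases, online-state update ... */
    smp_mb();
    WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);    /* Even: normal rules resume. */

    /* rcu_gp_init(): do not start a grace period mid-transition. */
    while (READ_ONCE(rnp->ofl_seq) & 0x1)
        schedule_timeout_idle(1);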
2020-11-19  rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs  [Joel Fernandes (Google)]
Testing showed that rcu_pending() can return 1 when offloaded callbacks are ready to execute. This invokes RCU core processing, for example, by raising RCU_SOFTIRQ, eventually resulting in a call to rcu_core(). However, rcu_core() explicitly avoids in any way manipulating offloaded callbacks, which are instead handled by the rcuog and rcuoc kthreads, which work independently of rcu_core(). One exception to this independence is that rcu_core() invokes do_nocb_deferred_wakeup(), however, rcu_pending() also checks rcu_nocb_need_deferred_wakeup() in order to correctly handle this case, invoking rcu_core() when needed. This commit therefore avoids needlessly invoking RCU core processing by checking rcu_segcblist_ready_cbs() only on non-offloaded CPUs. This reduces overhead, for example, by reducing softirq activity. This change passed 30 minute tests of TREE01 through TREE09 each. On TREE08, there is at most 150us from the time that rcu_pending() chose not to invoke RCU core processing to the time when the ready callbacks were invoked by the rcuoc kthread. This provides further evidence that there is no need to invoke rcu_core() for offloaded callbacks that are ready to invoke. Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
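The essence of the change, as a sketch of the rcu_pending() test:

    /* Only non-offloaded ready-to-invoke callbacks need rcu_core();
     * offloaded callbacks are handled by the rcuog/rcuoc kthreads. */
    if (!rcu_segcblist_is_offloaded(&rdp->cblist) &&
        rcu_segcblist_ready_cbs(&rdp->cblist))
        return 1;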
2020-11-19  rcu,ftrace: Fix ftrace recursion  [Peter Zijlstra]
Kim reported that perf-ftrace made his box unhappy. It turns out that commit: ff5c4f5cad33 ("rcu/tree: Mark the idle relevant functions noinstr") removed one too many notrace qualifiers, probably due to there not being a helpful comment. This commit therefore reinstates the notrace and adds a comment to avoid losing it again. [ paulmck: Apply Steven Rostedt's feedback on the comment. ] Fixes: ff5c4f5cad33 ("rcu/tree: Mark the idle relevant functions noinstr") Reported-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu/tree: Make struct kernel_param_ops definitions const  [Joe Perches]
These should be const, so make it so. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu/tree: Add a warning if CPU being onlined did not report QS already  [Joel Fernandes (Google)]
Currently, rcu_cpu_starting() checks to see if the RCU core expects a quiescent state from the incoming CPU. However, the current interaction between RCU quiescent-state reporting and CPU-hotplug operations should mean that the incoming CPU never needs to report a quiescent state. First, the outgoing CPU reports a quiescent state if needed. Second, the race where the CPU is leaving just as RCU is initializing a new grace period is handled by an explicit check for this condition. Third, the CPU's leaf rcu_node structure's ->lock serializes these checks. This means that if rcu_cpu_starting() ever feels the need to report a quiescent state, then there is a bug somewhere in the CPU hotplug code or the RCU grace-period handling code. This commit therefore adds a WARN_ON_ONCE() to bring that bug to everyone's attention. Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config  [Neeraj Upadhyay]
This commit clarifies that the "p" and the "s" in the RCU_NOCB_CPU config-option description refer to the "x" in the "rcuox/N" kthread name. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> [ paulmck: While in the area, update description and advice. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu: Fix single-CPU check in rcu_blocking_is_gp()  [Neeraj Upadhyay]
Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses num_online_cpus() to determine whether there is only one CPU online. When there is only a single CPU online, the simple fact that synchronize_rcu() could be legally called implies that a full grace period has elapsed. Therefore, in the single-CPU case, synchronize_rcu() simply returns immediately. Unfortunately, num_online_cpus() is unreliable while a CPU-hotplug operation is transitioning to or from single-CPU operation because:

1. num_online_cpus() uses atomic_read(&__num_online_cpus) to locklessly sample the number of online CPUs. The hotplug locks are not held, which means that an incoming CPU can concurrently update this count. This in turn means that an RCU read-side critical section on the incoming CPU might observe updates prior to the grace period, but also that this critical section might extend beyond the end of the optimized synchronize_rcu(). This breaks RCU's fundamental guarantee.

2. In addition, num_online_cpus() does no ordering, thus providing another way that RCU's fundamental guarantee can be broken by the current code.

3. The most probable failure mode happens on outgoing CPUs. The outgoing CPU updates the count of online CPUs in the CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in and of itself due to preemption being disabled at the call to num_online_cpus(). Unfortunately, after that stop-machine handler returns, the CPU takes one last trip through the scheduler (which has RCU readers) and, after the resulting context switch, one final dive into the idle loop. During this time, RCU needs to keep track of two CPUs, but num_online_cpus() will say that there is only one, which in turn means that the surviving CPU will incorrectly ignore the outgoing CPU's RCU read-side critical sections.

This problem is illustrated by the following litmus test in which P0() corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU. The herd7 tool confirms that the "exists" clause can be satisfied, thus demonstrating that this breakage can happen according to the Linux kernel memory model.

    {
        int x = 0;
        atomic_t numonline = ATOMIC_INIT(1);
    }

    P0(int *x, atomic_t *numonline)
    {
        int r0;
        WRITE_ONCE(*x, 1);
        r0 = atomic_read(numonline);
        if (r0 == 1) {
            smp_mb();
        } else {
            synchronize_rcu();
        }
        WRITE_ONCE(*x, 2);
    }

    P1(int *x, atomic_t *numonline)
    {
        int r0;
        int r1;
        atomic_inc(numonline);
        smp_mb();
        rcu_read_lock();
        r0 = READ_ONCE(*x);
        smp_rmb();
        r1 = READ_ONCE(*x);
        rcu_read_unlock();
    }

    locations [x;numonline;]
    exists (1:r0=0 /\ 1:r1=2)

It is important to note that these problems arise only when the system is transitioning to or from single-CPU operation. One solution would be to hold the CPU-hotplug locks while sampling num_online_cpus(), which was in fact the intent of the (redundant) preempt_disable() and preempt_enable() surrounding this call to num_online_cpus(). Actually blocking CPU hotplug would not only result in excessive overhead, but would also unnecessarily impede CPU-hotplug operations.

This commit therefore follows long-standing RCU tradition by maintaining a separate RCU-specific set of CPU-hotplug books. This separate set of books is implemented by a new ->n_online_cpus field in the rcu_state structure that maintains RCU's count of the online CPUs. This count is incremented early in the CPU-online process, so that the critical transition away from single-CPU operation will occur when there is only a single CPU. Similarly for the critical transition to single-CPU operation, the counter is decremented late in the CPU-offline process, again while there is only a single CPU. Because there is only ever a single CPU when the ->n_online_cpus field undergoes the critical 1->2 and 2->1 transitions, full memory ordering and mutual exclusion is provided implicitly and, better yet, for free.

In the case where the CPU is coming online, nothing will happen until the current CPU helps it come online. Therefore, the new CPU will see all accesses prior to the optimized grace period, which means that RCU does not need to further delay this new CPU. In the case where the CPU is going offline, the outgoing CPU is totally out of the picture before the optimized grace period starts, which means that this outgoing CPU cannot see any of the accesses following that grace period. Again, RCU needs no further interaction with the outgoing CPU.

This does mean that synchronize_rcu() will unnecessarily do a few grace periods the hard way just before the second CPU comes online and just after the second-to-last CPU goes offline, but it is not worth optimizing this uncommon case. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
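With that book-keeping in place, the fixed single-CPU check reduces to a sketch like this (CONFIG_PREEMPTION handling and preemption disabling elided):

    static bool rcu_blocking_is_gp(void)
    {
        might_sleep();    /* Check for RCU read-side critical section. */
        /* RCU's private count; its 1->2 and 2->1 transitions happen
         * while only a single CPU is running, so no ordering is needed. */
        return READ_ONCE(rcu_state.n_online_cpus) <= 1;
    }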
2020-11-19  rcu: Implement rcu_segcblist_is_offloaded() config dependent  [Frederic Weisbecker]
This commit simplifies the use of the rcu_segcblist_is_offloaded() API so that its callers no longer need to check the RCU_NOCB_CPU Kconfig option. Note that rcu_segcblist_is_offloaded() is defined in the header file, which means that the generated code should be just as efficient as before. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu: Panic after fixed number of stalls  [chao]
Some stalls are transient, such that the system fully recovers. This commit therefore allows users to configure the number of stalls that must happen in order to trigger a kernel panic. Signed-off-by: chao <chao@eero.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19  rcu: Allow rcu_irq_enter_check_tick() from NMI  [Peter Zijlstra]
Eugenio managed to tickle #PF from NMI context, which resulted in hitting a WARN in RCU through irqentry_enter() -> __rcu_irq_enter_check_tick(). However, this situation is perfectly sane and does not warrant a WARN. The #PF will (necessarily) be atomic and not require messing with the tick state, so an early return is correct. This commit therefore removes the WARN. Fixes: aaf2bc50df1f ("rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()") Reported-by: "Eugenio Pérez" <eupm90@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
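The fix itself is small; a sketch of the early return in __rcu_irq_enter_check_tick():

    if (in_nmi())
        return;    /* Enabling the tick is unsafe in NMI handlers. */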
2020-11-17  Merge branch 'urgent-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu  [Linus Torvalds]
Pull RCU fix from Paul McKenney: "A single commit that fixes a bug that was introduced a couple of merge windows ago, but which rather more recently converged to an agreed-upon fix. The bug is that interrupts can be incorrectly enabled while holding an irq-disabled spinlock. This can of course result in self-deadlocks.

The bug is a bit difficult to trigger. It requires that a preempted task be blocking a preemptible-RCU grace period long enough to trigger an RCU CPU stall warning. In addition, an interrupt must occur at just the right time, and that interrupt's handler must acquire that same irq-disabled spinlock. Still, a deadlock is a deadlock.

Furthermore, we do now have a fix, and that fix survives kernel test robot, -next, and rcutorture testing. It has also been verified by Sebastian as fixing the bug. Therefore..."

* 'urgent-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
2020-11-13  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  [Linus Torvalds]
Pull arm64 fixes from Will Deacon:

- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
- Fix RCU splat when failing to online a CPU due to a feature mismatch
- Fix a recently introduced sparse warning in kexec()
- Fix handling of CPU erratum 1418040 for late CPUs
- Ensure hot-added memory falls within linear-mapped region

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
  arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
  arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
  arm64: Add MIDR value for KRYO2XX gold/silver CPU cores
  arm64/mm: Validate hotplug range before creating linear mapping
  arm64: smp: Tell RCU about CPUs that fail to come online
  arm64: psci: Avoid printing in cpu_psci_cpu_die()
  arm64: kexec_file: Fix sparse warning
  arm64: errata: Fix handling of 1418040 with late CPU onlining