path: root/kernel/rcu
Age | Commit message | Author
2021-01-06rcu/nocb: Re-offload supportFrederic Weisbecker
To re-offload the callback processing off of a CPU, it is necessary to clear SEGCBLIST_SOFTIRQ_ONLY, set SEGCBLIST_OFFLOADED, and then notify both the CB and GP kthreads so that they both set their own bit flag and start processing the callbacks remotely. The re-offloading worker is then notified that it can stop the RCU_SOFTIRQ handler (or rcuc kthread, as the case may be) from processing the callbacks locally. Ordering must be carefully enforced so that the callbacks that used to be processed locally without locking will have the same ordering properties when they are invoked by the nocb CB and GP kthreads. This commit makes this change. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Export rcu_nocb_cpu_offload(). ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: De-offloading GP kthreadFrederic Weisbecker
To de-offload callback processing back onto a CPU, it is necessary to clear SEGCBLIST_OFFLOAD and notify the nocb GP kthread, which will then clear its own bit flag and ignore this CPU until further notice. Whichever of the nocb CB and nocb GP kthreads is last to clear its own bit notifies the de-offloading worker kthread. Once notified, this worker kthread can proceed safe in the knowledge that the nocb CB and GP kthreads will no longer be manipulating this CPU's RCU callback list. This commit makes this change. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Don't deoffload an offline CPU with pending workFrederic Weisbecker
Offloaded CPUs do not migrate their callbacks, instead relying on their rcuo kthread to invoke them. But if the CPU is offline, it will be running neither its RCU_SOFTIRQ handler nor its rcuc kthread. This means that de-offloading an offline CPU that still has pending callbacks will strand those callbacks. This commit therefore refuses to toggle offline CPUs having pending callbacks. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
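A minimal plain-C sketch of the guard described above (the structure and helper names are illustrative stand-ins, not the kernel's actual code):

    #include <errno.h>
    #include <stdbool.h>

    /* Hypothetical per-CPU state consulted by the check. */
    struct cpu_rcu_state {
        bool online;                  /* is the CPU online?                 */
        unsigned long cbs_pending;    /* callbacks still queued on ->cblist */
    };

    /*
     * Refuse to de-offload an offline CPU that still has pending callbacks:
     * with neither RCU_SOFTIRQ nor the rcuc kthread running there, those
     * callbacks would be stranded forever.
     */
    static int try_toggle_offload(struct cpu_rcu_state *st)
    {
        if (!st->online && st->cbs_pending)
            return -EBUSY;    /* caller must wait for the CPU to come up */
        /* ... proceed with the de-offload handshake ... */
        return 0;
    }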
2021-01-06rcu/nocb: De-offloading CB kthreadFrederic Weisbecker
To de-offload callback processing back onto a CPU, it is necessary to clear SEGCBLIST_OFFLOAD and notify the nocb CB kthread, which will then clear its own bit flag and go to sleep to stop handling callbacks. This commit makes that change. It will also be necessary to notify the nocb GP kthread in this same way, which is the subject of a follow-on commit. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [ paulmck: Add export per kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Always init segcblist on CPU upFrederic Weisbecker
How the rdp->cblist enabled state is treated at CPU-hotplug time depends on whether or not that ->cblist is offloaded. 1) Not offloaded: The ->cblist is disabled when the CPU goes down. All its callbacks are migrated and none can be enqueued until after some later CPU-hotplug operation brings the CPU back up. 2) Offloaded: The ->cblist is not disabled on CPU down because the CB/GP kthreads must finish invoking the remaining callbacks. There is thus no need to re-enable it on CPU up. Since the ->cblist offloaded state is set in stone at boot, it cannot change between CPU down and CPU up. So 1) and 2) are symmetrical. However, given runtime toggling of the offloaded state, there are two additional asymmetrical scenarios: 3) The ->cblist is not offloaded when the CPU goes down. The ->cblist is later toggled to offloaded and then the CPU comes back up. 4) The ->cblist is offloaded when the CPU goes down. The ->cblist is later toggled to no longer be offloaded and then the CPU comes back up. Scenario 4) is currently handled correctly. The ->cblist remains enabled on CPU down and gets re-initialized on CPU up. The toggling operation will wait until ->cblist is empty, so ->cblist will remain empty until CPU-up time. Scenario 3), however, would run into trouble, as the rdp is disabled on CPU down and not re-initialized/re-enabled on CPU up. Except that in this case, ->cblist is guaranteed to be empty because all its callbacks were migrated away at CPU-down time. And the CPU-up code already initializes and enables any empty ->cblist structures in order to handle the possibility of early-boot invocations of call_rcu() in the case where such invocations don't occur. So all that need be done is to adjust the locking. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/nocb: Provide basic callback offloading state machine bitsFrederic Weisbecker
Offloading and de-offloading RCU callback processes must be done carefully. There must never be a time at which callback processing is disabled because the task driving the offloading or de-offloading might be preempted or otherwise stalled at that point in time, which would result in OOM due to callbacks piling up indefinitely. This implies that there will be times during which a given CPU's callbacks might be concurrently invoked by both that CPU's RCU_SOFTIRQ handler (or, equivalently, that CPU's rcuc kthread) and by that CPU's rcuo kthread. This situation could fatally confuse both rcu_barrier() and the CPU-hotplug offlining process, so these must be excluded during any concurrent-callback-invocation period. In addition, during times of concurrent callback invocation, changes to ->cblist must be protected both as needed for RCU_SOFTIRQ and as needed for the rcuo kthread. This commit therefore defines and documents the states for a state machine that coordinates offloading and de-offloading. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
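As a rough illustration of such a bitmask-based state machine, here is a plain-C sketch in which the toggling worker, the CB kthread, and the GP kthread each own one bit. The flag names mirror ones mentioned elsewhere in this log (SEGCBLIST_SOFTIRQ_ONLY, SEGCBLIST_OFFLOADED); the kthread bits, numeric values, and helpers are assumptions for the sketch, not the kernel's definitions:

    #include <stdbool.h>
    #include <stdint.h>

    #define SEGCBLIST_SOFTIRQ_ONLY  (1U << 0)  /* callbacks run from softirq/rcuc only  */
    #define SEGCBLIST_KTHREAD_CB    (1U << 1)  /* rcuo CB kthread is handling callbacks */
    #define SEGCBLIST_KTHREAD_GP    (1U << 2)  /* nocb GP kthread is handling this CPU  */
    #define SEGCBLIST_OFFLOADED     (1U << 3)  /* (de-)offload requested/in progress    */

    struct segcblist_state {
        uint8_t flags;
    };

    static void set_state(struct segcblist_state *s, uint8_t f)   { s->flags |= f;  }
    static void clear_state(struct segcblist_state *s, uint8_t f) { s->flags &= ~f; }
    static bool test_state(const struct segcblist_state *s, uint8_t f)
    {
        return (s->flags & f) == f;
    }

    /* The list is fully offloaded only after both kthreads have acknowledged. */
    static bool fully_offloaded(const struct segcblist_state *s)
    {
        return test_state(s, SEGCBLIST_OFFLOADED |
                             SEGCBLIST_KTHREAD_CB |
                             SEGCBLIST_KTHREAD_GP);
    }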
2021-01-06rcu/nocb: Turn enabled/offload states into a common flagFrederic Weisbecker
This commit gathers the rcu_segcblist ->enabled and ->offloaded property fields into a single ->flags bitmask to avoid further proliferation of individual u8 fields in the structure. This change prepares for the ->offloaded state to be modified at runtime. Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Inspired-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/segcblist: Add debug checks for segment lengthsJoel Fernandes (Google)
This commit adds debug checks near the end of rcu_do_batch() that emit warnings if an empty rcu_segcblist structure has non-zero segment counts, or, conversely, if a non-empty structure has all-zero segment counts. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Fix queue/segment-length checks. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/trace: Add tracing for how segcb list changesJoel Fernandes (Google)
This commit adds tracing to track how the segcb list changes before/after acceleration, during queuing and during dequeuing. This tracing helped discover an optimization that avoided needless GP requests when no callbacks were accelerated. The tracing overhead is minimal as each segment's length is now stored in the respective segment. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-06rcu/tree: segcblist: Remove redundant smp_mb()sJoel Fernandes (Google)
The full memory barriers in rcu_segcblist_enqueue() and in rcu_do_batch() are not needed because rcu_segcblist_add_len(), and thus also rcu_segcblist_inc_len(), already includes a memory barrier *before* and *after* the length of the list is updated. This commit therefore removes these redundant smp_mb() invocations. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
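A userspace C11 model of the pattern this commit relies on: the length update itself is bracketed by full fences (the analogue of smp_mb()), so callers such as the enqueue and batch-invocation paths need no additional barrier of their own. The names and layout are illustrative, not the kernel's:

    #include <stdatomic.h>

    struct cblist {
        atomic_long len;    /* number of queued callbacks */
    };

    /* Analogue of "smp_mb(); update length; smp_mb()". */
    static void cblist_add_len(struct cblist *cl, long v)
    {
        atomic_thread_fence(memory_order_seq_cst);   /* full barrier before ...  */
        atomic_fetch_add_explicit(&cl->len, v, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);   /* ... and after the update */
    }

    static void cblist_inc_len(struct cblist *cl)
    {
        cblist_add_len(cl, 1);
    }

    /* Enqueue path: no separate smp_mb() needed, cblist_inc_len() already
     * provides full ordering around the length change. */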
2021-01-06rcu/segcblist: Add counters to segcblist datastructureJoel Fernandes (Google)
Add counting of segment lengths to the segmented callback list. This will be useful for a number of things, such as knowing how big the ready-to-execute segment has gotten. The immediate benefit is the ability to trace how the callbacks in the segmented callback list change. This patch also removes hacks related to using donecbs's ->len field as a temporary variable to save the segmented callback list's length. This cannot be done anymore and is not needed. Also fix SRCU: the negative counting of the unsegmented list cannot be used to adjust the segmented one. To fix this, sample the unsegmented length in advance, and use it after CB execution to adjust the segmented list's length. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
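The following sketch shows the kind of bookkeeping being added: a per-segment length array whose sum must always match the list's total length. The segment names follow the usual done/wait/next-ready/next layout; the structure itself is a simplified stand-in, not the kernel's rcu_segcblist:

    enum cb_segment { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, NR_SEGMENTS };

    struct segcblist_counts {
        long len;                   /* total callbacks on the list    */
        long seglen[NR_SEGMENTS];   /* callbacks held in each segment */
    };

    /* Invariant that per-segment counting makes checkable (and traceable):
     * the total length equals the sum of the per-segment lengths. */
    static long segcblist_sum(const struct segcblist_counts *s)
    {
        long sum = 0;

        for (int i = 0; i < NR_SEGMENTS; i++)
            sum += s->seglen[i];
        return sum;
    }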
2021-01-06rcu/segcblist: Add additional comments to explain smp_mb()Joel Fernandes (Google)
One counter-intuitive property of RCU is the fact that full memory barriers are needed both before and after updates to the full (non-segmented) length. This patch therefore assists the reader's intuition by adding appropriate comments. [ paulmck: Wordsmithing. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-01-04rcu/tree: Make rcu_do_batch count how many callbacks were executedJoel Fernandes (Google)
The rcu_do_batch() function extracts the ready-to-invoke callbacks from the rcu_segcblist located in the ->cblist field of the current CPU's rcu_data structure. These callbacks are first moved to a local (unsegmented) rcu_cblist. The rcu_do_batch() function then uses this rcu_cblist's ->len field to count how many CBs it has invoked, but it does so by counting that field down from zero. Finally, this function negates the value in this ->len field (resulting in a positive number) and subtracts the result from the ->len field of the current CPU's ->cblist field. Except that it is sometimes necessary for rcu_do_batch() to stop invoking callbacks mid-stream, despite there being more ready to invoke, for example, if a high-priority task wakes up. In this case the remaining not-yet-invoked callbacks are requeued back onto the CPU's ->cblist, but remain in the ready-to-invoke segment of that list. As above, the negative of the local rcu_cblist's ->len field is still subtracted from the ->len field of the current CPU's ->cblist field. The design of counting down from 0 is confusing and error-prone, plus use of a positive count will make it easier to provide a uniform and consistent API to deal with the per-segment counts that are added later in this series. For example, rcu_segcblist_extract_done_cbs() can unconditionally populate the resulting unsegmented list's ->len field during extraction. This commit therefore explicitly counts how many callbacks were executed in rcu_do_batch() itself, counting up from zero, and then uses that to update the per-CPU segcb list's ->len field, without relying on the downcounting of rcl->len from zero. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
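A simplified sketch of the "count up from zero" accounting described above (the callback type, helpers, and budget handling are stand-ins, not the kernel's rcu_do_batch()):

    struct callback {
        struct callback *next;
        void (*func)(struct callback *cb);
    };

    /*
     * Invoke ready callbacks, stopping early if the budget is exhausted
     * (for example, because a high-priority task woke up).  Returns the
     * number actually invoked, counted upward from zero, which is then
     * subtracted from the list's total length.
     */
    static long do_batch(struct callback *ready, long budget, long *list_len)
    {
        long invoked = 0;
        struct callback *cb = ready;

        while (cb && invoked < budget) {
            struct callback *next = cb->next;

            cb->func(cb);
            invoked++;          /* count up; no negation needed later */
            cb = next;
        }
        *list_len -= invoked;   /* adjust ->len by what actually ran */
        return invoked;
    }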
2020-12-14Merge tag 'sched-core-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - migrate_disable/enable() support which originates from the RT tree and is now a prerequisite for the new preemptible kmap_local() API which aims to replace kmap_atomic(). - A fair amount of topology and NUMA related improvements - Improvements for the frequency invariant calculations - Enhanced robustness for the global CPU priority tracking and decision making - The usual small fixes and enhancements all over the place * tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) sched/fair: Trivial correction of the newidle_balance() comment sched/fair: Clear SMT siblings after determining the core is not idle sched: Fix kernel-doc markup x86: Print ratio freq_max/freq_base used in frequency invariance calculations x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC x86, sched: Calculate frequency invariance for AMD systems irq_work: Optimize irq_work_single() smp: Cleanup smp_call_function*() irq_work: Cleanup sched: Limit the amount of NUMA imbalance that can exist at fork time sched/numa: Allow a floating imbalance between NUMA nodes sched: Avoid unnecessary calculation of load imbalance at clone time sched/numa: Rename nr_running and break out the magic number sched: Make migrate_disable/enable() independent of RT sched/topology: Condition EAS enablement on FIE support arm64: Rebuild sched domains on invariance status changes sched/topology,schedutil: Wrap sched domains rebuild sched/uclamp: Allow to reset a task uclamp constraint value sched/core: Fix typos in comments Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug ...
2020-12-14Merge tag 'core-rcu-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Thomas Gleixner: "RCU, LKMM and KCSAN updates collected by Paul McKenney. RCU: - Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs - Lockdep-RCU updates reducing the need for __maybe_unused - Tasks-RCU updates - Miscellaneous fixes - Documentation updates - Torture-test updates KCSAN: - updates for selftests, avoiding setting watchpoints on NULL pointers - fix to watchpoint encoding LKMM: - updates for documentation along with some updates to example-code litmus tests" * tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) srcu: Take early exit on memory-allocation failure rcu/tree: Defer kvfree_rcu() allocation to a clean context rcu: Do not report strict GPs for outgoing CPUs rcu: Fix a typo in rcu_blocking_is_gp() header comment rcu: Prevent lockdep-RCU splats on lock acquisition/release rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs rcu,ftrace: Fix ftrace recursion rcu/tree: Make struct kernel_param_ops definitions const rcu/tree: Add a warning if CPU being onlined did not report QS already rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config rcu: Fix single-CPU check in rcu_blocking_is_gp() rcu: Implement rcu_segcblist_is_offloaded() config dependent list.h: Update comment to explicitly note circular lists rcu: Panic after fixed number of stalls x86/smpboot: Move rcu_cpu_starting() earlier rcu: Allow rcu_irq_enter_check_tick() from NMI tools/memory-model: Label MP tests' producers and consumers tools/memory-model: Use "buf" and "flag" for message-passing tests tools/memory-model: Add types to litmus tests tools/memory-model: Add a glossary of LKMM terms ...
2020-11-27Merge branch 'linus' into sched/core, to resolve semantic conflictIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-24irq_work: CleanupPeter Zijlstra
Get rid of the __call_single_node union and clean up the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2020-11-19Merge branches 'cpuinfo.2020.11.06a', 'doc.2020.11.06a', ↵Paul E. McKenney
'fixes.2020.11.19b', 'lockdep.2020.11.02a', 'tasks.2020.11.06a' and 'torture.2020.11.06a' into HEAD cpuinfo.2020.11.06a: Speedups for /proc/cpuinfo. doc.2020.11.06a: Documentation updates. fixes.2020.11.19b: Miscellaneous fixes. lockdep.2020.11.02a: Lockdep-RCU updates to avoid "unused variable". tasks.2020.11.06a: Tasks-RCU updates. torture.2020.11.06a: Torture-test updates.
2020-11-19srcu: Take early exit on memory-allocation failurePaul E. McKenney
It turns out that init_srcu_struct() can be invoked from usermode tasks, and that fatal signals received by these tasks can cause memory-allocation failures. These failures are not handled well by init_srcu_struct(), so much so that NULL pointer dereferences can result. This commit therefore causes init_srcu_struct() to take an early exit upon detection of memory-allocation failure. Link: https://lore.kernel.org/lkml/20200908144306.33355-1-aik@ozlabs.ru/ Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
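The shape of the fix can be sketched in a few lines (userspace C with illustrative names; the real function is init_srcu_struct()):

    #include <errno.h>
    #include <stdlib.h>

    struct srcu_like {
        long *per_cpu_counts;
    };

    /* Bail out cleanly on allocation failure instead of continuing and
     * dereferencing a NULL pointer during later initialization steps. */
    static int srcu_like_init(struct srcu_like *sp, int ncpus)
    {
        sp->per_cpu_counts = calloc(ncpus, sizeof(*sp->per_cpu_counts));
        if (!sp->per_cpu_counts)
            return -ENOMEM;     /* early exit; caller sees the failure */
        /* ... remaining initialization ... */
        return 0;
    }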
2020-11-19rcu/tree: Defer kvfree_rcu() allocation to a clean contextUladzislau Rezki (Sony)
The current memory-allocation interface causes the following difficulties for kvfree_rcu(): a) If built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep will complain about violation of the nesting rules, as in "BUG: Invalid wait context". This Kconfig option checks for proper raw_spinlock vs. spinlock nesting; in particular, it is not legal to acquire a spinlock_t while holding a raw_spinlock_t. This is a problem because kfree_rcu() uses raw_spinlock_t whereas the "page allocator" internally deals with spinlock_t to access its zones. The code can also be broken from a higher level of view: <snip> raw_spin_lock(&some_lock); kfree_rcu(some_pointer, some_field_offset); <snip> b) If built with CONFIG_PREEMPT_RT, spinlock_t is converted into a sleeping lock. This means that invoking the page allocator from atomic contexts results in "BUG: scheduling while atomic". c) Please note that call_rcu() is already invoked from raw atomic context, so it is only reasonable to expect that kfree_rcu() and kvfree_rcu() will also be called from atomic raw context. This commit therefore defers page allocation to a clean context using the combination of an hrtimer and a workqueue. The hrtimer stage is required in order to avoid deadlocks with the scheduler. This deferred allocation is required only when kvfree_rcu()'s per-CPU page cache is empty. Link: https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/ Fixes: 3042f83f19be ("rcu: Support reclaim for head-less object") Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
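To illustrate the deferral idea only (the cache layout and all names below are assumptions for the sketch, not the kernel's implementation), an atomic-context caller consumes pages from a per-CPU cache and, when the cache is empty, merely requests a refill that runs later from a sleepable context such as a workqueue handler:

    #include <stdbool.h>
    #include <stdlib.h>

    #define CACHE_PAGES 4

    struct page_cache {
        void *pages[CACHE_PAGES];
        int   nr;
        bool  refill_requested;
    };

    /* Atomic-context path: never calls the page allocator directly. */
    static void *cache_get_page(struct page_cache *pc)
    {
        if (pc->nr > 0)
            return pc->pages[--pc->nr];
        pc->refill_requested = true;    /* an hrtimer/workqueue picks this up later */
        return NULL;                    /* caller falls back to a slower path */
    }

    /* Clean (sleepable) context: refill the cache using the real allocator. */
    static void cache_refill(struct page_cache *pc)
    {
        while (pc->nr < CACHE_PAGES) {
            void *page = malloc(4096);

            if (!page)
                break;
            pc->pages[pc->nr++] = page;
        }
        pc->refill_requested = false;
    }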
2020-11-19rcu: Do not report strict GPs for outgoing CPUsPaul E. McKenney
An outgoing CPU is marked offline in a stop-machine handler and most of that CPU's services stop at that point, including IRQ work queues. However, that CPU must take another pass through the scheduler and through a number of CPU-hotplug notifiers, many of which contain RCU readers. In the past, these readers were not a problem because the outgoing CPU has interrupts disabled, so that rcu_read_unlock_special() would not be invoked, and thus RCU would never attempt to queue IRQ work on the outgoing CPU. This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option, in which rcu_read_unlock_special() is invoked upon exit from almost all RCU read-side critical sections. Worse yet, because interrupts are disabled, rcu_read_unlock_special() cannot immediately report a quiescent state and will therefore attempt to defer this reporting, for example, by queueing IRQ work. Which fails with a splat because the CPU is already marked as being offline. But it turns out that there is no need to report this quiescent state because rcu_report_dead() will do this job shortly after the outgoing CPU makes its final dive into the idle loop. This commit therefore makes rcu_read_unlock_special() refrain from queuing IRQ work onto outgoing CPUs. Fixes: 44bad5b3cca2 ("rcu: Do full report for .need_qs for strict GPs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jann Horn <jannh@google.com>
2020-11-19rcu: Fix a typo in rcu_blocking_is_gp() header commentZhouyi Zhou
This commit fixes a typo in the rcu_blocking_is_gp() function's header comment. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19rcu: Prevent lockdep-RCU splats on lock acquisition/releasePaul E. McKenney
The rcu_cpu_starting() and rcu_report_dead() functions transition the current CPU between online and offline state from an RCU perspective. Unfortunately, this means that the rcu_cpu_starting() function's lock acquisition and the rcu_report_dead() function's lock releases happen while the CPU is offline from an RCU perspective, which can result in lockdep-RCU splats about using RCU from an offline CPU. And this situation can also result in too-short grace periods, especially in guest OSes that are subject to vCPU preemption. This commit therefore uses sequence-count-like synchronization to forgive use of RCU while RCU thinks a CPU is offline across the full extent of the rcu_cpu_starting() and rcu_report_dead() function's lock acquisitions and releases. One approach would have been to use the actual sequence-count primitives provided by the Linux kernel. Unfortunately, the resulting code looks completely broken and wrong, and is likely to result in patches that break RCU in an attempt to address this appearance of broken wrongness. Plus there is no net savings in lines of code, given the additional explicit memory barriers required. Therefore, this sequence count is instead implemented by a new ->ofl_seq field in the rcu_node structure. If this counter's value is an odd number, RCU forgives RCU read-side critical sections on other CPUs covered by the same rcu_node structure, even if those CPUs are offline from an RCU perspective. In addition, if a given leaf rcu_node structure's ->ofl_seq counter value is an odd number, rcu_gp_init() delays starting the grace period until that counter value changes. [ paulmck: Apply Peter Zijlstra feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
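A small sketch of the odd/even counter idea (illustrative names; in the kernel this is the new ->ofl_seq field in the rcu_node structure):

    #include <stdatomic.h>
    #include <stdbool.h>

    struct node_seq {
        atomic_ulong ofl_seq;
    };

    /* Mark the start/end of an online/offline transition.  While the value
     * is odd, RCU "forgives" read-side critical sections on the covered
     * CPUs, and grace-period initialization waits for it to become even. */
    static void transition_begin(struct node_seq *np)
    {
        atomic_fetch_add(&np->ofl_seq, 1);    /* odd: forgiveness window open */
    }

    static void transition_end(struct node_seq *np)
    {
        atomic_fetch_add(&np->ofl_seq, 1);    /* even: window closed */
    }

    static bool in_transition(struct node_seq *np)
    {
        return atomic_load(&np->ofl_seq) & 1;
    }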
2020-11-19rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBsJoel Fernandes (Google)
Testing showed that rcu_pending() can return 1 when offloaded callbacks are ready to execute. This invokes RCU core processing, for example, by raising RCU_SOFTIRQ, eventually resulting in a call to rcu_core(). However, rcu_core() explicitly avoids in any way manipulating offloaded callbacks, which are instead handled by the rcuog and rcuoc kthreads, which work independently of rcu_core(). One exception to this independence is that rcu_core() invokes do_nocb_deferred_wakeup(), however, rcu_pending() also checks rcu_nocb_need_deferred_wakeup() in order to correctly handle this case, invoking rcu_core() when needed. This commit therefore avoids needlessly invoking RCU core processing by checking rcu_segcblist_ready_cbs() only on non-offloaded CPUs. This reduces overhead, for example, by reducing softirq activity. This change passed 30 minute tests of TREE01 through TREE09 each. On TREE08, there is at most 150us from the time that rcu_pending() chose not to invoke RCU core processing to the time when the ready callbacks were invoked by the rcuoc kthread. This provides further evidence that there is no need to invoke rcu_core() for offloaded callbacks that are ready to invoke. Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
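The decision logic being adjusted can be sketched as follows (boolean flags stand in for the real predicates such as rcu_segcblist_ready_cbs() and rcu_nocb_need_deferred_wakeup()):

    #include <stdbool.h>

    struct cpu_rcu_flags {
        bool offloaded;               /* callbacks handled by rcuog/rcuoc kthreads */
        bool ready_cbs;               /* ready-to-invoke callbacks are queued      */
        bool need_deferred_wakeup;    /* nocb deferred wakeup still pending        */
    };

    /* Should RCU core processing (e.g., RCU_SOFTIRQ) be triggered? */
    static bool core_processing_needed(const struct cpu_rcu_flags *st)
    {
        if (st->need_deferred_wakeup)
            return true;                  /* still rcu_core()'s job             */
        if (!st->offloaded && st->ready_cbs)
            return true;                  /* softirq/rcuc path invokes these    */
        return false;                     /* offloaded ready CBs: kthreads' job */
    }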
2020-11-19rcu,ftrace: Fix ftrace recursionPeter Zijlstra
Kim reported that perf-ftrace made his box unhappy. It turns out that commit: ff5c4f5cad33 ("rcu/tree: Mark the idle relevant functions noinstr") removed one too many notrace qualifiers, probably due to there not being a helpful comment. This commit therefore reinstates the notrace and adds a comment to avoid losing it again. [ paulmck: Apply Steven Rostedt's feedback on the comment. ] Fixes: ff5c4f5cad33 ("rcu/tree: Mark the idle relevant functions noinstr") Reported-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19rcu/tree: Make struct kernel_param_ops definitions constJoe Perches
These should be const, so make it so. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19rcu/tree: Add a warning if CPU being onlined did not report QS alreadyJoel Fernandes (Google)
Currently, rcu_cpu_starting() checks to see if the RCU core expects a quiescent state from the incoming CPU. However, the current interaction between RCU quiescent-state reporting and CPU-hotplug operations should mean that the incoming CPU never needs to report a quiescent state. First, the outgoing CPU reports a quiescent state if needed. Second, the race where the CPU is leaving just as RCU is initializing a new grace period is handled by an explicit check for this condition. Third, the CPU's leaf rcu_node structure's ->lock serializes these checks. This means that if rcu_cpu_starting() ever feels the need to report a quiescent state, then there is a bug somewhere in the CPU hotplug code or the RCU grace-period handling code. This commit therefore adds a WARN_ON_ONCE() to bring that bug to everyone's attention. Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU configNeeraj Upadhyay
This commit clarifies that the "p" and the "s" in the RCU_NOCB_CPU config-option description refer to the "x" in the "rcuox/N" kthread name. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> [ paulmck: While in the area, update description and advice. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19rcu: Fix single-CPU check in rcu_blocking_is_gp()Neeraj Upadhyay
Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses num_online_cpus() to determine whether there is only one CPU online. When there is only a single CPU online, the simple fact that synchronize_rcu() could be legally called implies that a full grace period has elapsed. Therefore, in the single-CPU case, synchronize_rcu() simply returns immediately. Unfortunately, num_online_cpus() is unreliable while a CPU-hotplug operation is transitioning to or from single-CPU operation because: 1. num_online_cpus() uses atomic_read(&__num_online_cpus) to locklessly sample the number of online CPUs. The hotplug locks are not held, which means that an incoming CPU can concurrently update this count. This in turn means that an RCU read-side critical section on the incoming CPU might observe updates prior to the grace period, but also that this critical section might extend beyond the end of the optimized synchronize_rcu(). This breaks RCU's fundamental guarantee. 2. In addition, num_online_cpus() does no ordering, thus providing another way that RCU's fundamental guarantee can be broken by the current code. 3. The most probable failure mode happens on outgoing CPUs. The outgoing CPU updates the count of online CPUs in the CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in and of itself due to preemption being disabled at the call to num_online_cpus(). Unfortunately, after that stop-machine handler returns, the CPU takes one last trip through the scheduler (which has RCU readers) and, after the resulting context switch, one final dive into the idle loop. During this time, RCU needs to keep track of two CPUs, but num_online_cpus() will say that there is only one, which in turn means that the surviving CPU will incorrectly ignore the outgoing CPU's RCU read-side critical sections. This problem is illustrated by the following litmus test in which P0() corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU. The herd7 tool confirms that the "exists" clause can be satisfied, thus demonstrating that this breakage can happen according to the Linux kernel memory model.

    {
        int x = 0;
        atomic_t numonline = ATOMIC_INIT(1);
    }

    P0(int *x, atomic_t *numonline)
    {
        int r0;

        WRITE_ONCE(*x, 1);
        r0 = atomic_read(numonline);
        if (r0 == 1) {
            smp_mb();
        } else {
            synchronize_rcu();
        }
        WRITE_ONCE(*x, 2);
    }

    P1(int *x, atomic_t *numonline)
    {
        int r0;
        int r1;

        atomic_inc(numonline);
        smp_mb();
        rcu_read_lock();
        r0 = READ_ONCE(*x);
        smp_rmb();
        r1 = READ_ONCE(*x);
        rcu_read_unlock();
    }

    locations [x;numonline;]
    exists (1:r0=0 /\ 1:r1=2)

It is important to note that these problems arise only when the system is transitioning to or from single-CPU operation. One solution would be to hold the CPU-hotplug locks while sampling num_online_cpus(), which was in fact the intent of the (redundant) preempt_disable() and preempt_enable() surrounding this call to num_online_cpus(). Actually blocking CPU hotplug would not only result in excessive overhead, but would also unnecessarily impede CPU-hotplug operations. This commit therefore follows long-standing RCU tradition by maintaining a separate RCU-specific set of CPU-hotplug books. This separate set of books is implemented by a new ->n_online_cpus field in the rcu_state structure that maintains RCU's count of the online CPUs. This count is incremented early in the CPU-online process, so that the critical transition away from single-CPU operation will occur when there is only a single CPU.
Similarly for the critical transition to single-CPU operation, the counter is decremented late in the CPU-offline process, again while there is only a single CPU. Because there is only ever a single CPU when the ->n_online_cpus field undergoes the critical 1->2 and 2->1 transitions, full memory ordering and mutual exclusion is provided implicitly and, better yet, for free. In the case where the CPU is coming online, nothing will happen until the current CPU helps it come online. Therefore, the new CPU will see all accesses prior to the optimized grace period, which means that RCU does not need to further delay this new CPU. In the case where the CPU is going offline, the outgoing CPU is totally out of the picture before the optimized grace period starts, which means that this outgoing CPU cannot see any of the accesses following that grace period. Again, RCU needs no further interaction with the outgoing CPU. This does mean that synchronize_rcu() will unnecessarily do a few grace periods the hard way just before the second CPU comes online and just after the second-to-last CPU goes offline, but it is not worth optimizing this uncommon case. Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
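Reduced to its essentials, the bookkeeping looks roughly like this (userspace C, illustrative names; the real field is the ->n_online_cpus member of the rcu_state structure, and blocking_is_gp() here is a stand-in for rcu_blocking_is_gp()):

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int rcu_n_online_cpus = 1;   /* boot CPU */

    /* Incremented early during CPU-online ... */
    static void rcu_cpu_coming_up(void)  { atomic_fetch_add(&rcu_n_online_cpus, 1); }
    /* ... and decremented late during CPU-offline, so the 1<->2 transitions
     * happen while only a single CPU can be running. */
    static void rcu_cpu_going_down(void) { atomic_fetch_sub(&rcu_n_online_cpus, 1); }

    /* synchronize_rcu() fast path for CONFIG_PREEMPTION=n: a grace period is
     * implied only when RCU itself believes there is a single CPU online. */
    static bool blocking_is_gp(void)
    {
        return atomic_load(&rcu_n_online_cpus) <= 1;
    }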
2020-11-19rcu: Implement rcu_segcblist_is_offloaded() config dependentFrederic Weisbecker
This commit simplifies the use of the rcu_segcblist_is_offloaded() API so that its callers no longer need to check the RCU_NOCB_CPU Kconfig option. Note that rcu_segcblist_is_offloaded() is defined in the header file, which means that the generated code should be just as efficient as before. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-19rcu: Panic after fixed number of stallschao
Some stalls are transient, so that the system fully recovers. This commit therefore allows users to configure the number of stalls that must happen in order to trigger a kernel panic. Signed-off-by: chao <chao@eero.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
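The policy amounts to a simple counter; a hedged userspace sketch (the variable names are illustrative, not the kernel's sysctl names):

    #include <stdio.h>
    #include <stdlib.h>

    static int max_stalls_before_panic = 3;   /* 0 would mean "never panic" */
    static int stall_count;

    /* Called each time a stall warning is emitted. */
    static void note_stall(void)
    {
        if (max_stalls_before_panic &&
            ++stall_count >= max_stalls_before_panic) {
            fprintf(stderr, "too many RCU stalls, giving up\n");
            abort();              /* stands in for the kernel's panic() */
        }
    }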
2020-11-19rcu: Allow rcu_irq_enter_check_tick() from NMIPeter Zijlstra
Eugenio managed to tickle #PF from NMI context which resulted in hitting a WARN in RCU through irqentry_enter() -> __rcu_irq_enter_check_tick(). However, this situation is perfectly sane and does not warrant a WARN. The #PF will (necessarily) be atomic and not require messing with the tick state, so an early return is correct. This commit therefore removes the WARN. Fixes: aaf2bc50df1f ("rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()") Reported-by: "Eugenio Pérez" <eupm90@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-17Merge branch 'urgent-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "A single commit that fixes a bug that was introduced a couple of merge windows ago, but which rather more recently converged to an agreed-upon fix. The bug is that interrupts can be incorrectly enabled while holding an irq-disabled spinlock. This can of course result in self-deadlocks. The bug is a bit difficult to trigger. It requires that a preempted task be blocking a preemptible-RCU grace period long enough to trigger an RCU CPU stall warning. In addition, an interrupt must occur at just the right time, and that interrupt's handler must acquire that same irq-disabled spinlock. Still, a deadlock is a deadlock. Furthermore, we do now have a fix, and that fix survives kernel test robot, -next, and rcutorture testing. It has also been verified by Sebastian as fixing the bug. Therefore..." * 'urgent-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
2020-11-13Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Spectre/Meltdown safelisting for some Qualcomm KRYO cores - Fix RCU splat when failing to online a CPU due to a feature mismatch - Fix a recently introduced sparse warning in kexec() - Fix handling of CPU erratum 1418040 for late CPUs - Ensure hot-added memory falls within linear-mapped region * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist arm64: Add MIDR value for KRYO2XX gold/silver CPU cores arm64/mm: Validate hotplug range before creating linear mapping arm64: smp: Tell RCU about CPUs that fail to come online arm64: psci: Avoid printing in cpu_psci_cpu_die() arm64: kexec_file: Fix sparse warning arm64: errata: Fix handling of 1418040 with late CPU onlining
2020-11-10rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabledPaul E. McKenney
The try_invoke_on_locked_down_task() function requires that interrupts be enabled, but it is called with interrupts disabled from rcu_print_task_stall(), resulting in an "IRQs not enabled as expected" diagnostic. This commit therefore updates rcu_print_task_stall() to accumulate a list of the first few tasks while holding the current leaf rcu_node structure's ->lock, then releases that lock and only then uses try_invoke_on_locked_down_task() to attempt to obtain per-task detailed information. Of course, as soon as ->lock is released, the task might exit, so the get_task_struct() function is used to prevent the task structure from going away in the meantime. Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/ Fixes: 5bef8da66a9c ("rcu: Add per-task state to RCU CPU stall warnings") Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com Reported-by: syzbot+f04854e1c5c9e913cc27@syzkaller.appspotmail.com Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-10arm64: smp: Tell RCU about CPUs that fail to come onlineWill Deacon
Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured that RCU is informed early about incoming CPUs that might end up calling into printk() before they are online. However, if such a CPU fails the early CPU feature compatibility checks in check_local_cpu_capabilities(), then it will be powered off or parked without informing RCU, leading to an endless stream of stalls: | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: | rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593 | (detected by 0, t=5252 jiffies, g=9317, q=136) | Task dump for CPU 2: | task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028 | Call trace: | ret_from_fork+0x0/0x30 Ensure that the dying CPU invokes rcu_report_dead() prior to being powered off or parked. Cc: Qian Cai <cai@redhat.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Suggested-by: Qian Cai <cai@redhat.com> Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-11-06rcu-tasks: Make the units of ->init_fract be jiffiesPaul E. McKenney
Currently, the units of ->init_fract are milliseconds while those of ->gp_sleep are jiffies. For consistency with each other and with the argument of schedule_timeout_idle(), this commit changes the units of ->init_fract to jiffies. This change does affect the backoff algorithm, but only on systems where HZ is not 1000, and even there the change makes more sense, given that the current setup would "back off" to the same number of jiffies repeatedly. In contrast, with this change, the number of jiffies waited increases on each pass through the loop in the rcu_tasks_wait_gp() function. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Don't do need_resched() testing if ->sync is NULLPaul E. McKenney
If cur_ops->sync is NULL, rcu_torture_fwd_prog_nr() will nevertheless attempt to call through it. This commit therefore flags cases where neither need_resched() nor call_rcu() forward-progress testing can be performed due to NULL function pointers, and also causes rcu_torture_fwd_prog_nr() to take an early exit if cur_ops->sync() is NULL. Reported-by: Tom Rix <trix@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Small code cleanupsPaul E. McKenney
The rcu_torture_cleanup() function fails to NULL out the reader_tasks pointer after freeing it and its fakewriter_tasks loop has redundant braces. This commit therefore cleans these up. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Make stutter_wait() caller restore priorityPaul E. McKenney
Currently, stutter_wait() will happily spin waiting for the stutter interval to end even if the caller is running at a real-time priority level. This could starve normal-priority tasks for no good reason. This commit therefore drops the calling task's priority to SCHED_OTHER MAX_NICE if stutter_wait() needs to wait. But when it waits, stutter_wait() returns true, which allows the caller to restore the priority if needed. Callers that were already running at SCHED_OTHER MAX_NICE obviously do not need any changes, but this commit also restores priority for higher-priority callers. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcutorture: Prevent hangs for invalid argumentsPaul E. McKenney
If an rcutorture torture-test run is given a bad kvm.sh argument, the test will complain to the console, which is good. What is bad is that from the user's perspective, it will just hang for the time specified by the --duration argument. This commit therefore forces an immediate kernel shutdown if a rcu_torture_init()-time error occurs, thus avoiding the appearance of a hang. It also forces a console splat in this case to clearly indicate the presence of an error. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06refscale: Prevent hangs for invalid argumentsPaul E. McKenney
If an refscale torture-test run is given a bad kvm.sh argument, the test will complain to the console, which is good. What is bad is that from the user's perspective, it will just hang for the time specified by the --duration argument. This commit therefore forces an immediate kernel shutdown if a ref_scale_init()-time error occurs, thus avoiding the appearance of a hang. It also forces a console splat in this case to clearly indicate the presence of an error. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcuscale: Prevent hangs for invalid argumentsPaul E. McKenney
If an rcuscale torture-test run is given a bad kvm.sh argument, the test will complain to the console, which is good. What is bad is that from the user's perspective, it will just hang for the time specified by the --duration argument. This commit therefore forces an immediate kernel shutdown if a rcu_scale_init()-time error occurs, thus avoiding the appearance of a hang. It also forces a console splat in this case to clearly indicate the presence of an error. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06rcuscale: Add RCU Tasks TracePaul E. McKenney
This commit adds the ability to test performance and scalability of RCU Tasks Trace updaters. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-06x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUsPaul E. McKenney
Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to learn their clock frequency. Which is a bit strange, given that waking them from idle likely significantly changes their clock frequency. This commit therefore avoids sending /proc/cpuinfo-induced IPIs to idle CPUs. [ paulmck: Also check for idle in arch_freq_prepare_all(). ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org>
2020-11-02refscale: Bounds-check module parametersPaul E. McKenney
The default value for refscale.nreaders is -1, which results in the code setting the value to three-quarters of the number of CPUs. On single-CPU systems, this results in three-quarters of the value one, which the C language's integer arithmetic rounds to zero. This in turn results in a divide-by-zero error. This commit therefore adds bounds checking to the refscale module parameters, so that if they are less than one, they are set to the value one. Reported-by: kernel test robot <lkp@intel.com> Tested-by: "Chen, Rong A" <rong.a.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
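The arithmetic and the fix can be shown in a few lines (an illustrative helper; the actual parameters are refscale.nreaders and friends):

    /* With one CPU, (1 * 3) / 4 truncates to 0 in integer arithmetic, which
     * later causes a divide-by-zero; clamp the result to at least 1. */
    static int compute_nreaders(int nreaders_param, int ncpus)
    {
        int n = nreaders_param;

        if (n < 0)
            n = (ncpus * 3) / 4;    /* default: three-quarters of the CPUs */
        if (n < 1)
            n = 1;                  /* bounds check added by this commit   */
        return n;
    }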
2020-11-02rcutorture: Make grace-period kthread report match RCU flavor being testedPaul E. McKenney
At the end of the test and after rcu_torture_writer() stalls, rcutorture invokes show_rcu_gp_kthreads() in order to dump out information on the RCU grace-period kthread. This makes a lot of sense when testing vanilla RCU, but not so much for the other flavors. This commit therefore allows per-flavor kthread-dump functions to be specified. [ paulmck: Apply feedback from kernel test robot <lkp@intel.com>. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-02rcu-tasks: Convert rcu_tasks_wait_gp() for-loop to while-loopPaul E. McKenney
The infinite for-loop in rcu_tasks_wait_gp() has its only exit at the top of the loop, so this commit does the straightforward conversion to a while-loop, thus saving a few lines. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-02srcu: Use a more appropriate lockdep helperJakub Kicinski
The lockdep_is_held() macro is defined as: #define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map) This hides away the dereference, so that builds with !LOCKDEP don't break. This works in current kernels because the RCU_LOCKDEP_WARN() eliminates its condition at preprocessor time in !LOCKDEP kernels. However, later patches in this series will cause the compiler to see this condition even in !LOCKDEP kernels. This commit prepares for this upcoming change by switching from lock_is_held() to lockdep_is_held(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> -- CC: jiangshanlai@gmail.com CC: paulmck@kernel.org CC: josh@joshtriplett.org CC: rostedt@goodmis.org CC: mathieu.desnoyers@efficios.com Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-10-26stop_machine, rcu: Mark functions as notraceZong Li
Some architectures assume that the stopped CPUs don't make function calls to traceable functions when they are in the stopped state. See also commit cb9d7fd51d9f ("watchdog: Mark watchdog touch functions as notrace"). Violating this assumption causes kernel crashes when switching tracer on RISC-V. Mark rcu_momentary_dyntick_idle() and stop_machine_yield() notrace to prevent this. Fixes: 4ecf0a43e729 ("processor: get rid of cpu_relax_yield") Fixes: 366237e7b083 ("stop_machine: Provide RCU quiescent state in multi_cpu_stop()") Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Atish Patra <atish.patra@wdc.com> Tested-by: Colin Ian King <colin.king@canonical.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201021073839.43935-1-zong.li@sifive.com