From 6d60ea03ac2d3dcf6ddee6b45aa7213d8b0461c5 Mon Sep 17 00:00:00 2001 From: Zqiang Date: Thu, 16 Jun 2022 21:53:47 +0800 Subject: rcu: Fix rcu_read_unlock_strict() strict QS reporting Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y report the quiescent state directly from the outermost rcu_read_unlock(). However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm might still be set, in which case rcu_report_qs_rdp() will exit early, thus failing to report quiescent state. This commit therefore causes rcu_read_unlock_strict() to clear CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp(). Signed-off-by: Zqiang Signed-off-by: Paul E. McKenney --- kernel/rcu/tree_plugin.h | 1 + 1 file changed, 1 insertion(+) (limited to 'kernel/rcu/tree_plugin.h') diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 438ecae6bd7e..86772c95ed0a 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -824,6 +824,7 @@ void rcu_read_unlock_strict(void) if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread) return; rdp = this_cpu_ptr(&rcu_data); + rdp->cpu_no_qs.b.norm = false; rcu_report_qs_rdp(rdp); udelay(rcu_unlock_delay); } -- cgit From bca4fa8cb0f4c096b515952f64e560fd784a0514 Mon Sep 17 00:00:00 2001 From: Zqiang Date: Mon, 20 Jun 2022 14:42:24 +0800 Subject: rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels In non-premptible kernels, tasks never do context switches within RCU read-side critical sections. Therefore, in such kernels, each leaf rcu_node structure's ->blkd_tasks list will always be empty. The comment on the non-preemptible version of rcu_preempt_deferred_qs() confuses this point, so this commit therefore fixes it. Signed-off-by: Zqiang Signed-off-by: Paul E. McKenney --- kernel/rcu/tree_plugin.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'kernel/rcu/tree_plugin.h') diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 86772c95ed0a..4152816dd29f 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -932,10 +932,13 @@ static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t) return false; } -// Except that we do need to respond to a request by an expedited grace -// period for a quiescent state from this CPU. Note that requests from -// tasks are handled when removing the task from the blocked-tasks list -// below. +// Except that we do need to respond to a request by an expedited +// grace period for a quiescent state from this CPU. Note that in +// non-preemptible kernels, there can be no context switches within RCU +// read-side critical sections, which in turn means that the leaf rcu_node +// structure's blocked-tasks list is always empty. is therefore no need to +// actually check it. Instead, a quiescent state from this CPU suffices, +// and this function is only called from such a quiescent state. notrace void rcu_preempt_deferred_qs(struct task_struct *t) { struct rcu_data *rdp = this_cpu_ptr(&rcu_data); -- cgit From 089254fd386eb6800dd7d7863f12a04ada0c35fa Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 3 Aug 2022 08:48:12 -0700 Subject: rcu: Document reason for rcu_all_qs() call to preempt_disable() Given that rcu_all_qs() is in non-preemptible kernels, why on earth should it invoke preempt_disable()? This commit adds the reason, which is to work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels. Reported-by: Neeraj Upadhyay Reported-by: Boqun Feng Reported-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- kernel/rcu/tree_plugin.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/rcu/tree_plugin.h') diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 4152816dd29f..c46b3c74dad1 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -870,7 +870,7 @@ void rcu_all_qs(void) if (!raw_cpu_read(rcu_data.rcu_urgent_qs)) return; - preempt_disable(); + preempt_disable(); // For CONFIG_PREEMPT_COUNT=y kernels /* Load rcu_urgent_qs before other flags. */ if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs))) { preempt_enable(); -- cgit From 621189a1fe93cb2b34d62c5cdb9e258bca044813 Mon Sep 17 00:00:00 2001 From: Zqiang Date: Mon, 8 Aug 2022 10:26:26 +0800 Subject: rcu: Avoid triggering strict-GP irq-work when RCU is idle Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler invokes rcu_preempt_deferred_qs_handle(). The point of this triggering is to force grace periods to end quickly in order to give tools like KASAN a better chance of detecting RCU usage bugs such as leaking RCU-protected pointers out of an RCU read-side critical section. However, this irq-work triggering is unconditional. This works, but there is no point in doing this irq-work unless the current grace period is waiting on the running CPU or task, which is not the common case. After all, in the common case there are many rcu_read_unlock() calls per CPU per grace period. This commit therefore triggers the irq-work only when the current grace period is waiting on the running CPU or task. This change was tested as follows on a four-CPU system: echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter echo 1 > /sys/kernel/debug/tracing/function_profile_enabled insmod rcutorture.ko sleep 20 rmmod rcutorture.ko echo 0 > /sys/kernel/debug/tracing/function_profile_enabled echo > /sys/kernel/debug/tracing/set_ftrace_filter This procedure produces results in this per-CPU set of files: /sys/kernel/debug/tracing/trace_stat/function* Sample output from one of these files is as follows: Function Hit Time Avg s^2 -------- --- ---- --- --- rcu_preempt_deferred_qs_handle 838746 182650.3 us 0.217 us 0.004 us The baseline sum of the "Hit" values (the number of calls to this function) was 3,319,015. With this commit, that sum was 1,140,359, for a 2.9x reduction. The worst-case variance across the CPUs was less than 25%, so this large effect size is statistically significant. The raw data is available in the Link: URL. Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/ Signed-off-by: Zqiang Signed-off-by: Paul E. McKenney --- kernel/rcu/tree_plugin.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'kernel/rcu/tree_plugin.h') diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index c46b3c74dad1..207617f69aa5 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -641,7 +641,8 @@ static void rcu_read_unlock_special(struct task_struct *t) expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) || (rdp->grpmask & READ_ONCE(rnp->expmask)) || - IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) || + (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) && + ((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) || (IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled && t->rcu_blocked_node); // Need to defer quiescent state until everything is enabled. -- cgit From 7634b1eaa0cd135d5eedadb04ad3c91b1ecf28a9 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 24 Aug 2022 14:46:56 -0700 Subject: rcu: Exclude outgoing CPU when it is the last to leave The rcu_boost_kthread_setaffinity() function removes the outgoing CPU from the set_cpus_allowed() mask for the corresponding leaf rcu_node structure's rcub priority-boosting kthread. Except that if the outgoing CPU will leave that structure without any online CPUs, the mask is set to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine unless the outgoing CPU happens to be a housekeeping CPU. This commit therefore removes the outgoing CPU from the housekeeping mask. This would of course be problematic if the outgoing CPU was the last online housekeeping CPU, but in that case you are in a world of hurt anyway. If someone comes up with a valid use case for a system needing all the housekeeping CPUs to be offline, further adjustments can be made. Signed-off-by: Paul E. McKenney --- kernel/rcu/tree_plugin.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'kernel/rcu/tree_plugin.h') diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 207617f69aa5..32b424b571bd 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1243,8 +1243,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu) cpu != outgoingcpu) cpumask_set_cpu(cpu, cm); cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU)); - if (cpumask_empty(cm)) + if (cpumask_empty(cm)) { cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU)); + if (outgoingcpu >= 0) + cpumask_clear_cpu(outgoingcpu, cm); + } set_cpus_allowed_ptr(t, cm); mutex_unlock(&rnp->boost_kthread_mutex); free_cpumask_var(cm); -- cgit From 528262f50274079740b53e29bcaaabf219aa7417 Mon Sep 17 00:00:00 2001 From: Zqiang Date: Tue, 19 Jul 2022 12:39:00 +0800 Subject: rcu-tasks: Make RCU Tasks Trace check for userspace execution Userspace execution is a valid quiescent state for RCU Tasks Trace, but the scheduling-clock interrupt does not currently report such quiescent states. Of course, the scheduling-clock interrupt is not strictly speaking userspace execution. However, the only way that this code is not in a quiescent state is if something invoked rcu_read_lock_trace(), and that would be reflected in the ->trc_reader_nesting field in the task_struct structure. Furthermore, this field is checked by rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in turn invoked by rcu_note_voluntary_context_switch() in kernels building at least one of the RCU Tasks flavors. It is therefore safe to invoke rcu_tasks_trace_qs() from the rcu_sched_clock_irq(). But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU Tasks, which lacks the read-side markers provided by RCU Tasks Trace. This raises the possibility that an RCU Tasks grace period could start after the interrupt from userspace execution, but before the call to rcu_sched_clock_irq(). However, it turns out that this is safe because the RCU Tasks grace period waits for an RCU grace period, which will wait for the entire scheduling-clock interrupt handler, including any RCU Tasks read-side critical section that this handler might contain. This commit therefore updates the rcu_sched_clock_irq() function's check for usermode execution and its call to rcu_tasks_classic_qs() to instead check for both usermode execution and interrupt from idle, and to instead call rcu_note_voluntary_context_switch(). This consolidates code and provides more faster RCU Tasks Trace reporting of quiescent states in kernels that do scheduling-clock interrupts for userspace execution. [ paulmck: Consolidate checks into rcu_sched_clock_irq(). ] Signed-off-by: Zqiang Signed-off-by: Paul E. McKenney --- kernel/rcu/tree_plugin.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'kernel/rcu/tree_plugin.h') diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 438ecae6bd7e..aa64b035c24f 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -718,9 +718,6 @@ static void rcu_flavor_sched_clock_irq(int user) struct task_struct *t = current; lockdep_assert_irqs_disabled(); - if (user || rcu_is_cpu_rrupt_from_idle()) { - rcu_note_voluntary_context_switch(current); - } if (rcu_preempt_depth() > 0 || (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) { /* No QS, force context switch if deferred. */ @@ -972,7 +969,6 @@ static void rcu_flavor_sched_clock_irq(int user) * neither access nor modify, at least not while the * corresponding CPU is online. */ - rcu_qs(); } } -- cgit