summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorZqiang <qiang1.zhang@intel.com>2022-08-08 10:26:26 +0800
committerPaul E. McKenney <paulmck@kernel.org>2022-08-31 05:06:02 -0700
commit621189a1fe93cb2b34d62c5cdb9e258bca044813 (patch)
tree201927b08e8c355acf89d9ae2e7ee6069af476a5
parentbc1cca97e6da6c7c34db7c5b864bb354ca5305ac (diff)
rcu: Avoid triggering strict-GP irq-work when RCU is idle
Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler invokes rcu_preempt_deferred_qs_handle(). The point of this triggering is to force grace periods to end quickly in order to give tools like KASAN a better chance of detecting RCU usage bugs such as leaking RCU-protected pointers out of an RCU read-side critical section. However, this irq-work triggering is unconditional. This works, but there is no point in doing this irq-work unless the current grace period is waiting on the running CPU or task, which is not the common case. After all, in the common case there are many rcu_read_unlock() calls per CPU per grace period. This commit therefore triggers the irq-work only when the current grace period is waiting on the running CPU or task. This change was tested as follows on a four-CPU system: echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter echo 1 > /sys/kernel/debug/tracing/function_profile_enabled insmod rcutorture.ko sleep 20 rmmod rcutorture.ko echo 0 > /sys/kernel/debug/tracing/function_profile_enabled echo > /sys/kernel/debug/tracing/set_ftrace_filter This procedure produces results in this per-CPU set of files: /sys/kernel/debug/tracing/trace_stat/function* Sample output from one of these files is as follows: Function Hit Time Avg s^2 -------- --- ---- --- --- rcu_preempt_deferred_qs_handle 838746 182650.3 us 0.217 us 0.004 us The baseline sum of the "Hit" values (the number of calls to this function) was 3,319,015. With this commit, that sum was 1,140,359, for a 2.9x reduction. The worst-case variance across the CPUs was less than 25%, so this large effect size is statistically significant. The raw data is available in the Link: URL. Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/ Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
-rw-r--r--kernel/rcu/tree_plugin.h3
1 files changed, 2 insertions, 1 deletions
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c46b3c74dad1..207617f69aa5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -641,7 +641,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
(rdp->grpmask & READ_ONCE(rnp->expmask)) ||
- IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
+ (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
+ ((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) ||
(IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
t->rcu_blocked_node);
// Need to defer quiescent state until everything is enabled.