summaryrefslogtreecommitdiff
path: root/kernel/rcu/Kconfig.debug
diff options
context:
space:
mode:
authorZhen Lei <thunder.leizhen@huawei.com>2022-11-19 17:25:06 +0800
committerPaul E. McKenney <paulmck@kernel.org>2023-01-05 12:21:11 -0800
commitbe42f00b73a0f50710d16eb7cb4efda0cce062dd (patch)
tree61232854b9fac3a1b4fa54738c8c7d1a45c0e549 /kernel/rcu/Kconfig.debug
parent7c182722a0a9447e31f9645de4f311e5bc59b480 (diff)
rcu: Add RCU stall diagnosis information
Because RCU CPU stall warnings are driven from the scheduling-clock interrupt handler, a workload consisting of a very large number of short-duration hardware interrupts can result in misleading stall-warning messages. On systems supporting only a single level of interrupts, that is, where interrupts handlers cannot be interrupted, this can produce misleading diagnostics. The stack traces will show the innocent-bystander interrupted task, not the interrupts that are at the very least exacerbating the stall. This situation can be improved by displaying the number of interrupts and the CPU time that they have consumed. Diagnosing other types of stalls can be eased by also providing the count of softirqs and the CPU time that they consumed as well as the number of context switches and the task-level CPU time consumed. Consider the following output given this change: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (1250 ticks this GP) <omitted> rcu: hardirqs softirqs csw/system rcu: number: 624 45 0 rcu: cputime: 69 1 2425 ==> 2500(ms) This output shows that the number of hard and soft interrupts is small, there are no context switches, and the system takes up a lot of time. This indicates that the current task is looping with preemption disabled. The impact on system performance is negligible because snapshot is recorded only once for all continuous RCU stalls. This added debugging information is suppressed by default and can be enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or by booting with rcupdate.rcu_cpu_stall_cputime=1. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Diffstat (limited to 'kernel/rcu/Kconfig.debug')
-rw-r--r--kernel/rcu/Kconfig.debug13
1 files changed, 13 insertions, 0 deletions
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 232e29fe3e5e..49da904df6aa 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -92,6 +92,19 @@ config RCU_EXP_CPU_STALL_TIMEOUT
says to use the RCU_CPU_STALL_TIMEOUT value converted from
seconds to milliseconds.
+config RCU_CPU_STALL_CPUTIME
+ bool "Provide additional RCU stall debug information"
+ depends on RCU_STALL_COMMON
+ default n
+ help
+ Collect statistics during the sampling period, such as the number of
+ (hard interrupts, soft interrupts, task switches) and the cputime of
+ (hard interrupts, soft interrupts, kernel tasks) are added to the
+ RCU stall report. For multiple continuous RCU stalls, all sampling
+ periods begin at half of the first RCU stall timeout.
+ The boot option rcupdate.rcu_cpu_stall_cputime has the same function
+ as this one, but will override this if it exists.
+
config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL