From 4a81e8328d3791a4f99bf5b436d050f6dc5ffea3 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 20 Jun 2014 16:49:01 -0700 Subject: rcu: Reduce overhead of cond_resched() checks for RCU Commit ac1bea85781e (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale "open1" test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen Signed-off-by: Paul E. McKenney Cc: Andi Kleen Cc: Christoph Lameter Cc: Mike Galbraith Cc: Eric Dumazet Reviewed-by: Josh Triplett [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the ktest build robot. Also fixed smp_mb() comment as noted by Oleg Nesterov. ] Merge with e552592e (Reduce overhead of cond_resched() checks for RCU) Signed-off-by: Paul E. McKenney --- Documentation/kernel-parameters.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 6eaa9cdb7094..910c3829f81d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2785,6 +2785,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. leaf rcu_node structure. Useful for very large systems. + rcutree.jiffies_till_sched_qs= [KNL] + Set required age in jiffies for a + given grace period before RCU starts + soliciting quiescent-state help from + rcu_note_context_switch(). + rcutree.jiffies_till_first_fqs= [KNL] Set delay from grace-period initialization to first attempt to force quiescent states. -- cgit From fbce7497ee5af800a1c350c73f3c3f103cb27a15 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 24 Jun 2014 09:26:11 -0700 Subject: rcu: Parallelize and economize NOCB kthread wakeups An 80-CPU system with a context-switch-heavy workload can require so many NOCB kthread wakeups that the RCU grace-period kthreads spend several tens of percent of a CPU just awakening things. This clearly will not scale well: If you add enough CPUs, the RCU grace-period kthreads would get behind, increasing grace-period latency. To avoid this problem, this commit divides the NOCB kthreads into leaders and followers, where the grace-period kthreads awaken the leaders each of whom in turn awakens its followers. By default, the number of groups of kthreads is the square root of the number of CPUs, but this default may be overridden using the rcutree.rcu_nocb_leader_stride boot parameter. This reduces the number of wakeups done per grace period by the RCU grace-period kthread by the square root of the number of CPUs, but of course by shifting those wakeups to the leaders. In addition, because the leaders do grace periods on behalf of their respective followers, the number of wakeups of the followers decreases by up to a factor of two. Instead of being awakened once when new callbacks arrive and again at the end of the grace period, the followers are awakened only at the end of the grace period. For a numerical example, in a 4096-CPU system, the grace-period kthread would awaken 64 leaders, each of which would awaken its 63 followers at the end of the grace period. This compares favorably with the 79 wakeups for the grace-period kthread on an 80-CPU system. Reported-by: Rik van Riel Signed-off-by: Paul E. McKenney --- Documentation/kernel-parameters.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 910c3829f81d..770662c42c9f 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2802,6 +2802,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. quiescent states. Units are jiffies, minimum value is one, and maximum value is HZ. + rcutree.rcu_nocb_leader_stride= [KNL] + Set the number of NOCB kthread groups, which + defaults to the square root of the number of + CPUs. Larger numbers reduces the wakeup overhead + on the per-CPU grace-period kthreads, but increases + that same overhead on each group's leader. + rcutree.qhimark= [KNL] Set threshold of queued RCU callbacks beyond which batch limiting is disabled. -- cgit