summaryrefslogtreecommitdiff
path: root/kernel/rcu
AgeCommit message (Collapse)Author
2022-06-21rcu-tasks: Update commentsPaul E. McKenney
This commit updates comments to reflect the changes in the series of commits that eliminated the full task-list scan. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-21rcu-tasks: Disable and enable CPU hotplug in same functionPaul E. McKenney
The rcu_tasks_trace_pregp_step() function invokes cpus_read_lock() to disable CPU hotplug, and a later call to the rcu_tasks_trace_postscan() function invokes cpus_read_unlock() to re-enable it. This was absolutely necessary in the past in order to protect the intervening scan of the full tasks list, but there is no longer such a scan. This commit therefore improves readability by moving the cpus_read_unlock() call to the end of the rcu_tasks_trace_pregp_step() function. This commit is a pure code-motion commit without any (intended) change in functionality. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-21rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUsPaul E. McKenney
Currently, the RCU Tasks Trace grace-period kthread IPIs each online CPU using smp_call_function_single() in order to track any tasks currently in RCU Tasks Trace read-side critical sections during which the corresponding task has neither blocked nor been preempted. These IPIs are annoying and are also not strictly necessary because any task that blocks or is preempted within its current RCU Tasks Trace read-side critical section will be tracked on one of the per-CPU rcu_tasks_percpu structure's ->rtp_blkd_tasks list. So the only time that this is a problem is if one of the CPUs runs through a long-duration RCU Tasks Trace read-side critical section without a context switch. Note that the task_call_func() function cannot help here because there is no safe way to identify the target task. Of course, the task_call_func() function will be very useful later, when processing the list of tasks, but it needs to know the task. This commit therefore creates a cpu_curr_snapshot() function that returns a pointer the task_struct structure of some task that happened to be running on the specified CPU more or less during the time that the cpu_curr_snapshot() function was executing. If there was no context switch during this time, this function will return a pointer to the task_struct structure of the task that was running throughout. If there was a context switch, then the outgoing task will be taken care of by RCU's context-switch hook, and the incoming task was either already taken care during some previous context switch, or it is not currently within an RCU Tasks Trace read-side critical section. And in this latter case, the grace period already started, so there is no need to wait on this task. This new cpu_curr_snapshot() function is invoked on each CPU early in the RCU Tasks Trace grace-period processing, and the resulting tasks are queued for later quiescent-state inspection. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-21rcu-tasks: Maintain a count of tasks blocking RCU Tasks Trace grace periodPaul E. McKenney
This commit maintains a new n_trc_holdouts counter that tracks the number of tasks blocking the RCU Tasks grace period. This counter is useful for debugging, and its value has been added to a diagostic message. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-21rcu-tasks: Stop RCU Tasks Trace from scanning full tasks listPaul E. McKenney
This commit takes off the training wheels and relies only on scanning currently running tasks and tasks that have blocked or been preempted within their current RCU Tasks Trace read-side critical section. Before this commit, the time complexity of an RCU Tasks Trace grace period is O(T), where T is the number of tasks. After this commit, this time complexity is O(C+B), where C is the number of CPUs and B is the number of tasks that have blocked (or been preempted) at least once during their current RCU Tasks Trace read-side critical sections. Of course, if all tasks have blocked (or been preempted) at least once during their current RCU Tasks Trace read-side critical sections, this is still O(T), but current expectations are that RCU Tasks Trace read-side critical section will be short and that there will normally not be large numbers of tasks blocked within such a critical section. Dave Marchevsky kindly measured the effects of this commit on the RCU Tasks Trace grace-period latency and the rcu_tasks_trace_kthread task's CPU consumption per RCU Tasks Trace grace period over the course of a fixed test, all in milliseconds: Before After GP latency 22.3 ms stddev > 0.1 17.0 ms stddev < 0.1 GP CPU 2.3 ms stddev 0.3 1.1 ms stddev 0.2 This was on a system with 15,000 tasks, so it is reasonable to expect much larger savings on the systems on which this issue was first noted, given that they sport well in excess of 100,000 tasks. CPU consumption was measured using profiling techniques. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org> Tested-by: Dave Marchevsky <davemarchevsky@fb.com>
2022-06-21rcutorture: Update rcutorture.fwd_progress help textPaul E. McKenney
This commit updates the rcutorture.fwd_progress help text to say that it is the number of forward-progress kthreads to spawn rather than the old enable/disable functionality. While in the area, make the list of torture-test parameters easier to read by taking advantage of 100 columns. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
2022-06-20rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()Paul E. McKenney
This commit applies the "noinstr" tag to the rcu_idle_enter() and rcu_idle_exit() functions, which are invoked from portions of the idle loop that cannot be instrumented. These tags require reworking the rcu_eqs_enter() and rcu_eqs_exit() functions that these two functions invoke in order to cause them to use normal assertions rather than lockdep. In addition, within rcu_idle_exit(), the raw versions of local_irq_save() and local_irq_restore() are used, again to avoid issues with lockdep in uninstrumented code. This patch is based in part on an earlier patch by Jiri Olsa, discussions with Peter Zijlstra and Frederic Weisbecker, earlier changes by Thomas Gleixner, and off-list discussions with Yonghong Song. Link: https://lore.kernel.org/lkml/20220515203653.4039075-1-jolsa@kernel.org/ Reported-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Alexei Starovoitov <ast@kernel.org> Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Yonghong Song <yhs@fb.com>
2022-06-20rcu: Dump rcuc kthread status for CPUs not reporting quiescent stateZqiang
If the rcutree.use_softirq kernel boot parameter is disabled, then it is possible that a RCU CPU stall is due to the rcuc kthreads being starved of CPU time. There is currently no easy way to infer this from the RCU CPU stall warning output. This commit therefore adds a string of the form " rcuc=%ld jiffies(starved)" to a given CPU's output if the corresponding rcuc kthread has been starved for more than two seconds. [ paulmck: Eliminate extraneous space characters. ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-20rcu-tasks: Stop RCU Tasks Trace from scanning idle tasksPaul E. McKenney
Now that RCU scans both running tasks and tasks that have blocked within their current RCU Tasks Trace read-side critical section, there is no need for it to scan the idle tasks. After all, an idle loop should not be remain within an RCU Tasks Trace read-side critical section across exit from idle, and from a BPF viewpoint, functions invoked from the idle loop should not sleep. So only running idle tasks can be within RCU Tasks Trace read-side critical sections. This commit therefore removes the scan of the idle tasks from the rcu_tasks_trace_postscan() function. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Pull in tasks blocked within RCU Tasks Trace readersPaul E. McKenney
This commit scans each CPU's ->rtp_blkd_tasks list, adding them to the list of holdout tasks. This will cause the current RCU Tasks Trace grace period to wait until these tasks exit their RCU Tasks Trace read-side critical sections. This commit will enable later work omitting the scan of the full task list. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Scan running tasks for RCU Tasks Trace readersPaul E. McKenney
A running task might be within an RCU Tasks Trace read-side critical section for any length of time, but will not be placed on any of the per-CPU rcu_tasks_percpu structure's ->rtp_blkd_tasks lists. Therefore any RCU Tasks Trace grace-period processing that does not scan the full task list must interact with the running tasks. This commit therefore causes the rcu_tasks_trace_pregp_step() function to IPI each CPU in order to place the corresponding task on the holdouts list and to record whether or not it was in an RCU Tasks Trace read-side critical section. Yes, it is possible to avoid adding it to that list if it is not a reader, but that would prevent the system from remembering that this task was in a quiescent state. Which is why the running tasks are unconditionally added to the holdout list. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Avoid rcu_tasks_trace_pertask() duplicate list additionsPaul E. McKenney
This commit adds checks within rcu_tasks_trace_pertask() to avoid duplicate (and destructive) additions to the holdouts list. These checks will be required later due to the possibility of a given task having blocked while in an RCU Tasks Trace read-side critical section, but now running on a CPU. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Move rcu_tasks_trace_pertask() before rcu_tasks_trace_pregp_step()Paul E. McKenney
This is a code-motion-only commit that moves rcu_tasks_trace_pertask() to precede rcu_tasks_trace_pregp_step(), so that the latter will be able to invoke the other without forward references. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Add blocked-task indicator to RCU Tasks Trace stall warningsPaul E. McKenney
This commit adds a "B" indicator to the RCU Tasks Trace CPU stall warning when the task has blocked within its current read-side critical section. This serves as a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Untrack blocked RCU Tasks Trace at reader endPaul E. McKenney
This commit causes rcu_read_unlock_trace() to check for the current task being on a per-CPU list within the rcu_tasks_percpu structure, and removes it from that list if so. This has the effect of curtailing tracking of a task that blocked within an RCU Tasks Trace read-side critical section once it exits that critical section. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Track blocked RCU Tasks Trace readersPaul E. McKenney
This commit places any task that has ever blocked within its current RCU Tasks Trace read-side critical section on a per-CPU list within the rcu_tasks_percpu structure. Tasks are removed from this list when they exit by the exit_tasks_rcu_finish_trace() function. The purpose of this commit is to provide the information needed to eliminate the current scan of the full task list. This commit offsets the INT_MIN value for ->trc_reader_nesting with the new nesting level in order to avoid queueing tasks that are exiting their read-side critical sections. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Apply feedback from syzbot+9bb26e7c5e8e4fa7e641@syzkaller.appspotmail.com ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: syzbot <syzbot+9bb26e7c5e8e4fa7e641@syzkaller.appspotmail.com> Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Add data structures for lightweight grace periodsPaul E. McKenney
This commit adds fields to task_struct and to rcu_tasks_percpu that will be used to avoid the task-list scan for RCU Tasks Trace grace periods, and also initializes these fields. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Make RCU Tasks Trace stall warning handle idle offline tasksPaul E. McKenney
When a CPU is offline, its idle task can appear to be running, but it cannot be doing anything while CPU-hotplug operations are excluded. This commit takes advantage of that fact by making trc_check_slow_task() check for task_curr(t) && cpu_online(task_cpu(t)), and recording full information in that case. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Make RCU Tasks Trace stall warnings print full .b.need_qs fieldPaul E. McKenney
Currently, the RCU Tasks Trace CPU stall warning simply indicates whether or not the .b.need_qs field is zero. This commit shows the three permitted values and flags other values with either "!" or "?". This is a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Flag offline CPUs in RCU Tasks Trace stall warningsPaul E. McKenney
This commit tags offline CPUs with "(offline)" in RCU Tasks Trace CPU stall warnings. This is a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Add slow-IPI indicator to RCU Tasks Trace stall warningsPaul E. McKenney
This commit adds a "I" indicator to the RCU Tasks Trace CPU stall warning when an IPI directed to a task has thus far failed to arrive. This serves as a debugging aid. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Simplify trc_inspect_reader() QS logicPaul E. McKenney
Currently, trc_inspect_reader() does one check for nesting less than or equal to zero, then sorts out the distinctions within this single "if" statement. This commit simplifies the logic by providing one "if" statement for quiescent states (nesting of zero) and another "if" statement for transitioning from one nesting level to another or the outermost rcu_read_unlock_trace() (negative nesting). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Make rcu_note_context_switch() unconditionally call rcu_tasks_qs()Paul E. McKenney
This commit makes rcu_note_context_switch() unconditionally invoke the rcu_tasks_qs() function, as opposed to doing so only when RCU (as opposed to RCU Tasks Trace) urgently needs a grace period to end. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: RCU Tasks Trace grace-period kthread has implicit QSPaul E. McKenney
Because the task driving the grace-period kthread is in quiescent state throughout, this commit excludes it from the list of tasks from which a quiescent state is needed. This does mean that attaching a sleepable BPF program to function in kernel/rcu/tasks.h is a bad idea, by the way. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Handle idle tasks for recently offlined CPUsPaul E. McKenney
This commit identifies idle tasks for recently offlined CPUs as residing in a quiescent state. This is safe only because CPU-hotplug operations are excluded during these checks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Idle tasks on offline CPUs are in quiescent statesPaul E. McKenney
Any idle task corresponding to an offline CPU is in an RCU Tasks Trace quiescent state. This commit causes rcu_tasks_trace_postscan() to ignore idle tasks for offline CPUs, which it can do safely due to CPU-hotplug operations being disabled. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Make trc_read_check_handler() fetch ->trc_reader_nesting only oncePaul E. McKenney
This commit replaces the pair of READ_ONCE(t->trc_reader_nesting) calls with a single such call and a local variable. This makes the code's intent more clear. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Remove rcu_tasks_trace_postgp() wait for counterPaul E. McKenney
Now that tasks are not removed from the list until they have responded to any needed request for a quiescent state, it is no longer necessary to wait for the trc_n_readers_need_end counter to go to zero. This commit therefore removes that waiting code. It is therefore also no longer necessary for rcu_tasks_trace_postgp() to do the final decrement of this counter, so that code is also removed. This in turn means that trc_n_readers_need_end counter itself can be removed, as can the rcu_tasks_trace_iw irq_work structure and the rcu_read_unlock_iw() function. [ paulmck: Apply feedback from Zqiang. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Merge state into .b.need_qs and atomically updatePaul E. McKenney
This commit gets rid of the task_struct structure's ->trc_reader_checked field, making it instead be a bit within the task_struct structure's existing ->trc_reader_special.b.need_qs field. This commit also atomically loads, stores, and checks the resulting combination of the reader-checked and need-quiescent state flags. This will in turn allow significant simplification of the rcu_tasks_trace_postgp() function as well as elimination of the trc_n_readers_need_end counter in later commits. These changes will in turn simplify later elimination of the RCU Tasks Trace scan of the task list, which will make RCU Tasks Trace grace periods less CPU-intensive. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: KP Singh <kpsingh@kernel.org>
2022-06-20rcu-tasks: Drive synchronous grace periods from calling taskPaul E. McKenney
This commit causes synchronous grace periods to be driven from the task invoking synchronize_rcu_*(), allowing these functions to be invoked from the mid-boot dead zone extending from when the scheduler was initialized to to point that the various RCU tasks grace-period kthreads are spawned. This change will allow the self-tests to run in a consistent manner. Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-20rcu-tasks: Move synchronize_rcu_tasks_generic() downPaul E. McKenney
This is strictly a code-motion commit that moves the synchronize_rcu_tasks_generic() down to where it can invoke rcu_tasks_one_gp() without the need for a forward declaration. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-20rcu-tasks: Split rcu_tasks_one_gp() from rcu_tasks_kthread()Paul E. McKenney
This commit abstracts most of the rcu_tasks_kthread() function's loop body into a new rcu_tasks_one_gp() function. It also introduces a new ->tasks_gp_mutex to synchronize concurrent calls to this new rcu_tasks_one_gp() function. This commit is preparation for allowing RCU tasks grace periods to be driven by the calling task during the mid-boot dead zone. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-20rcu-tasks: Check for abandoned callbacksPaul E. McKenney
This commit adds a debugging scan for callbacks that got lost during a callback-queueing transition. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-20rcutorture: Validate get_completed_synchronize_rcu()Paul E. McKenney
This commit verifies that the RCU grace-period state cookie returned from get_completed_synchronize_rcu() causes poll_state_synchronize_rcu() to return true, as required. This commit is in preparation for polled expedited grace periods. Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/ Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing Cc: Brian Foster <bfoster@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-20rcu: Provide a get_completed_synchronize_rcu() functionPaul E. McKenney
It is currently up to the caller to handle stale return values from get_state_synchronize_rcu(). If poll_state_synchronize_rcu() returned true once, a grace period has elapsed, regardless of the fact that counter wrap might cause some future poll_state_synchronize_rcu() invocation to return false. For example, the caller might store a separate flag that indicates whether some previous call to poll_state_synchronize_rcu() determined that the relevant grace period had already ended. This approach works, but it requires extra storage and is easy to get wrong. This commit therefore introduces a get_completed_synchronize_rcu() that returns a cookie that causes poll_state_synchronize_rcu() to always return true. This already-completed cookie can be stored in place of the cookie that previously caused poll_state_synchronize_rcu() to return true. It can also be used to flag a given structure as not having been exposed to readers, and thus not requiring a grace period to elapse. This commit is in preparation for polled expedited grace periods. Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/ Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing Cc: Brian Foster <bfoster@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-20rcu: Make normal polling GP be more precise about sequence numbersPaul E. McKenney
Currently, poll_state_synchronize_rcu() uses rcu_seq_done() to check whether the specified grace period has completed. However, rcu_seq_done() does a simple comparison that reserves have of the sequence-number space for uncompleted grace periods. This has the unfortunate side-effect of not handling sequence-number wrap gracefully. Of course, one can argue that if someone has already waited for half of the full range of grace periods, they can wait for the other half, but why wait at all in this case? This commit therefore creates a rcu_seq_done_exact() that counts as uncompleted only the two grace periods during which the sequence number might have been handed out, while still being uncompleted. This way, if sequence-number wrap happens to hit that range, at most two additional grace periods need be waited for. This commit is in preparation for polled expedited grace periods. Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/ Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing Cc: Brian Foster <bfoster@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-26Merge tag 'sysctl-5.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "For two kernel releases now kernel/sysctl.c has been being cleaned up slowly, since the tables were grossly long, sprinkled with tons of #ifdefs and all this caused merge conflicts with one susbystem or another. This tree was put together to help try to avoid conflicts with these cleanups going on different trees at time. So nothing exciting on this pull request, just cleanups. Thanks a lot to the Uniontech and Huawei folks for doing some of this nasty work" * tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits) sched: Fix build warning without CONFIG_SYSCTL reboot: Fix build warning without CONFIG_SYSCTL kernel/kexec_core: move kexec_core sysctls into its own file sysctl: minor cleanup in new_dir() ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n fs/proc: Introduce list_for_each_table_entry for proc sysctl mm: fix unused variable kernel warning when SYSCTL=n latencytop: move sysctl to its own file ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y ftrace: Fix build warning ftrace: move sysctl_ftrace_enabled to ftrace.c kernel/do_mount_initrd: move real_root_dev sysctls to its own file kernel/delayacct: move delayacct sysctls to its own file kernel/acct: move acct sysctls to its own file kernel/panic: move panic sysctls to its own file kernel/lockdep: move lockdep sysctls to its own file mm: move page-writeback sysctls to their own file mm: move oom_kill sysctls to their own file kernel/reboot: move reboot sysctls to its own file sched: Move energy_aware sysctls to topology.c ...
2022-05-25Merge tag 'printk-for-5.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Offload writing printk() messages on consoles to per-console kthreads. It prevents soft-lockups when an extensive amount of messages is printed. It was observed, for example, during boot of large systems with a lot of peripherals like disks or network interfaces. It prevents live-lockups that were observed, for example, when messages about allocation failures were reported and a CPU handled consoles instead of reclaiming the memory. It was hard to solve even with rate limiting because it would need to take into account the amount of messages and the speed of all consoles. It is a must to have for real time. Otherwise, any printk() might break latency guarantees. The per-console kthreads allow to handle each console on its own speed. Slow consoles do not longer slow down faster ones. And printk() does not longer unpredictably slows down various code paths. There are situations when the kthreads are either not available or not reliable, for example, early boot, suspend, or panic. In these situations, printk() uses the legacy mode and tries to handle consoles immediately. - Add documentation for the printk index. * tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk, tracing: fix console tracepoint printk: remove @console_locked printk: extend console_lock for per-console locking printk: add kthread console printers printk: add functions to prefer direct printing printk: add pr_flush() printk: move buffer definitions into console_emit_next_record() caller printk: refactor and rework printing logic printk: add con_printk() macro for console details printk: call boot_delay_msec() in printk_delay() printk: get caller_id/timestamp after migration disable printk: wake waiters for safe and NMI contexts printk: wake up all waiters printk: add missing memory barrier to wake_up_klogd() printk: cpu sync always disable interrupts printk: rename cpulock functions printk/index: Printk index feature documentation MAINTAINERS: Add printk indexing maintainers on mention of printk_index
2022-05-11Merge branch 'exp.2022.05.11a' into HEADPaul E. McKenney
exp.2022.05.11a: Expedited-grace-period latency-reduction updates.
2022-05-11rcu: Move expedited grace period (GP) work to RT kthread_workerKalesh Singh
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period latency because its workqueues run at SCHED_OTHER, and thus can be delayed by normal processes. This commit avoids these delays by moving the expedited GP work items to a real-time-priority kthread_worker. This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by default on PREEMPT_RT=y kernels which disable expedited grace periods after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1. The results were evaluated on arm64 Android devices (6GB ram) running 5.10 kernel, and capturing trace data in critical user-level code. The table below shows the resulting order-of-magnitude improvements in synchronize_rcu_expedited() latency: ------------------------------------------------------------------------ | | workqueues | kthread_worker | Diff | ------------------------------------------------------------------------ | Count | 725 | 688 | | ------------------------------------------------------------------------ | Min Duration (ns) | 326 | 447 | 37.12% | ------------------------------------------------------------------------ | Q1 (ns) | 39,428 | 38,971 | -1.16% | ------------------------------------------------------------------------ | Q2 - Median (ns) | 98,225 | 69,743 | -29.00% | ------------------------------------------------------------------------ | Q3 (ns) | 342,122 | 126,638 | -62.98% | ------------------------------------------------------------------------ | Max Duration (ns) | 372,766,967 | 2,329,671 | -99.38% | ------------------------------------------------------------------------ | Avg Duration (ns) | 2,746,353 | 151,242 | -94.49% | ------------------------------------------------------------------------ | Standard Deviation (ns) | 19,327,765 | 294,408 | | ------------------------------------------------------------------------ The below table show the range of maximums/minimums for synchronize_rcu_expedited() latency from all experiments: ------------------------------------------------------------------------ | | workqueues | kthread_worker | Diff | ------------------------------------------------------------------------ | Total No. of Experiments | 25 | 23 | | ------------------------------------------------------------------------ | Largest Maximum (ns) | 372,766,967 | 2,329,671 | -99.38% | ------------------------------------------------------------------------ | Smallest Maximum (ns) | 38,819 | 86,954 | 124.00% | ------------------------------------------------------------------------ | Range of Maximums (ns) | 372,728,148 | 2,242,717 | | ------------------------------------------------------------------------ | Largest Minimum (ns) | 88,623 | 27,588 | -68.87% | ------------------------------------------------------------------------ | Smallest Minimum (ns) | 326 | 447 | 37.12% | ------------------------------------------------------------------------ | Range of Minimums (ns) | 88,297 | 27,141 | | ------------------------------------------------------------------------ Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Tejun Heo <tj@kernel.org> Reported-by: Tim Murray <timmurray@google.com> Reported-by: Wei Wang <wvw@google.com> Tested-by: Kyle Lin <kylelin@google.com> Tested-by: Chunwei Lu <chunweilu@google.com> Tested-by: Lulu Wang <luluw@google.com> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-11rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUTUladzislau Rezki
Currently both expedited and regular grace period stall warnings use a single timeout value that with units of seconds. However, recent Android use cases problem require a sub-100-millisecond expedited RCU CPU stall warning. Given that expedited RCU grace periods normally complete in far less than a single millisecond, especially for small systems, this is not unreasonable. Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel configuration that defaults to 20 msec on Android and remains the same as that of the non-expedited stall warnings otherwise. It also can be changed in run-time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout. [ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ] Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03Merge branches 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b', ↵Paul E. McKenney
'rcu-tasks.2022.04.11b', 'srcu.2022.05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD docs.2022.04.20a: Documentation updates. fixes.2022.04.20a: Miscellaneous fixes. nocb.2022.04.11b: Callback-offloading updates. rcu-tasks.2022.04.11b: RCU-tasks updates. srcu.2022.05.03a: Put SRCU on a memory diet. torture.2022.04.11b: Torture-test updates. torture-tasks.2022.04.20a: Avoid torture testing changing RCU configuration. torturescript.2022.04.20a: Torture-test scripting updates.
2022-05-03srcu: Drop needless initialization of sdp in srcu_gp_start()Lukas Bulwahn
Commit 9c7ef4c30f12 ("srcu: Make Tree SRCU able to operate without snp_node array") initializes the local variable sdp differently depending on the srcu's state in srcu_gp_start(). Either way, this initialization overwrites the value used when sdp is defined. This commit therefore drops this pointless definition-time initialization. Although there is no functional change, compiler code generation may be affected. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03srcu: Prevent expedited GPs and blocking readers from consuming CPUPaul E. McKenney
If an SRCU reader blocks while a synchronize_srcu_expedited() waits for that same reader, then that grace period will spawn an endless series of workqueue handlers, consuming a full CPU. This quickly gets pointless because consuming more CPU isn't going to make that reader get done faster, especially if it is blocked waiting for an external event. This commit therefore spawns at most one pair of back-to-back workqueue handlers per expedited grace period phase, instead inserting increasing delays as that grace period phase grows older, but capped at 10 jiffies. In any case, if there have been at least 100 back-to-back workqueue handlers within a single jiffy, regardless of grace period or grace-period phase, then a one-jiffy delay is inserted. [ paulmck: Apply feedback from kernel test robot. ] Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reported-by: Song Liu <song@kernel.org> Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03srcu: Add contention check to call_srcu() srcu_data ->lock acquisitionPaul E. McKenney
This commit increases the sensitivity of contention detection by adding checks to the acquisition of the srcu_data structure's lock on the call_srcu() code path. Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-03srcu: Automatically determine size-transition strategy at bootPaul E. McKenney
This commit adds a srcutree.convert_to_big option of zero that causes SRCU to decide at boot whether to wait for contention (small systems) or immediately expand to large (large systems). A new srcutree.big_cpu_lim (defaulting to 128) defines how many CPUs constitute a large system. Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-22printk: add functions to prefer direct printingJohn Ogness
Once kthread printing is available, console printing will no longer occur in the context of the printk caller. However, there are some special contexts where it is desirable for the printk caller to directly print out kernel messages. Using pr_flush() to wait for threaded printers is only possible if the caller is in a sleepable context and the kthreads are active. That is not always the case. Introduce printk_prefer_direct_enter() and printk_prefer_direct_exit() functions to explicitly (and globally) activate/deactivate preferred direct console printing. The term "direct console printing" refers to printing to all enabled consoles from the context of the printk caller. The term "prefer" is used because this type of printing is only best effort. If the console is currently locked or other printers are already actively printing, the printk caller will need to rely on the other contexts to handle the printing. This preferred direct printing is how all printing has been handled until now (unless it was explicitly deferred). When kthread printing is introduced, there may be some unanticipated problems due to kthreads being unable to flush important messages. In order to minimize such risks, preferred direct printing is activated for the primary important messages when the system experiences general types of major errors. These are: - emergency reboot/shutdown - cpu and rcu stalls - hard and soft lockups - hung tasks - warn - sysrq Note that since kthread printing does not yet exist, no behavior changes result from this commit. This is only implementing the counter and marking the various places where preferred direct printing is active. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220421212250.565456-13-john.ogness@linutronix.de
2022-04-20rcuscale: Allow rcuscale without RCU Tasks Rude/TracePaul E. McKenney
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks Rude and RCU Tasks Trace. Unless that kernel builds rcuscale, whether built-in or as a module, in which case these RCU Tasks flavors are (unnecessarily) built in. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of rcuscale from the presence of RCU Tasks Rude and RCU Tasks Trace. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20rcuscale: Allow rcuscale without RCU TasksPaul E. McKenney
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks. Unless that kernel builds rcuscale, whether built-in or as a module, in which case RCU Tasks is (unnecessarily) built. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of rcuscale from the presence of RCU Tasks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20refscale: Allow refscale without RCU Tasks Rude/TracePaul E. McKenney
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for RCU Tasks Rude and RCU Tasks Trace. Unless that kernel builds refscale, whether built-in or as a module, in which case these RCU Tasks flavors are (unnecessarily) built in. This both increases kernel size and increases the complexity of certain tracing operations. This commit therefore decouples the presence of refscale from the presence of RCU Tasks Rude and RCU Tasks Trace. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>