2020-08-26 cpuidle: Move trace_cpu_idle() into generic code (Peter Zijlstra)
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and put it in the generic code, right before disabling RCU. Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.428433395@infradead.org
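The resulting ordering in the generic idle path, as a hedged sketch (names follow kernel/sched/idle.c; the exact call sites may differ):

    trace_cpu_idle(1, smp_processor_id());   /* now traced here, in generic code */
    rcu_idle_enter();                        /* ...right before RCU is disabled */
    arch_cpu_idle();                         /* arch implementations no longer trace */
    rcu_idle_exit();
    trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());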
2020-08-26 sched,idle,rcu: Push rcu_idle deeper into the idle path (Peter Zijlstra)
Lots of things take locks; due to a wee bug, rcu_lockdep didn't notice that the locking tracepoints were using RCU. Push rcu_idle_{enter,exit}() as deep as possible into the idle paths; this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage. Specifically, sched_clock_idle_wakeup_event() will use ktime, which will use seqlocks, which will tickle lockdep, and stop_critical_timings() uses locks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org
2020-08-26 lockdep: Use raw_cpu_*() for per-cpu variables (Peter Zijlstra)
Sven reported that commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") caused trouble on s390 because their this_cpu_*() primitives disable preemption, which then lands back in tracing. On the one hand, per-cpu ops should use preempt_*able_notrace() and raw_local_irq_*(); on the other hand, we can trivially use raw_cpu_*() ops for this. Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") Reported-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200821085348.192346882@infradead.org
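A minimal sketch of the substitution, using the hardirqs_enabled per-cpu variable from the Fixes: commit (the comments are assumptions about why the raw op is safe here):

    /* before: this_cpu_write() can disable preemption on some arches,
     * and on s390 that preempt_disable() lands back in tracing */
    this_cpu_write(hardirqs_enabled, 0);

    /* after: plain per-cpu access, no preemption dance; fine because these
     * updates run with IRQs disabled and describe only the local CPU */
    raw_cpu_write(hardirqs_enabled, 0);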
2020-08-25 bpf: Add d_path helper (Jiri Olsa)
Adding a d_path helper function that returns the full path for a given 'struct path' object, which needs to be the kernel BTF 'path' object. The path is returned in the provided buffer 'buf' of size 'sz' and is zero terminated. bpf_d_path(&file->f_path, buf, size); The helper calls the d_path function directly, so there's only a limited set of functions it can be called from; adding just a very modest set for the start. Also updating the bpf.h tools uapi header and adding 'path' to the bpf_helpers_doc.py script. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-11-jolsa@kernel.org
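A minimal sketch of a caller, assuming filp_close is on the allowlist added later in this series (the program name, buffer size, and bpf_printk output are illustrative):

    // SPDX-License-Identifier: GPL-2.0
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    #define MAX_PATH_LEN 256                /* illustrative size */

    char path[MAX_PATH_LEN] = {};

    SEC("fentry/filp_close")
    int BPF_PROG(close_probe, struct file *file)
    {
            /* &file->f_path is a nested 'struct path' inside the BTF
             * 'struct file' argument, which the verifier accepts */
            bpf_d_path(&file->f_path, path, MAX_PATH_LEN);
            bpf_printk("closed: %s", path);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";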
2020-08-25 bpf: Add BTF_SET_START/END macros (Jiri Olsa)
Adding support to define a sorted set of BTF ID values. The following defines a sorted set of BTF ID values: BTF_SET_START(btf_allowlist_d_path) BTF_ID(func, vfs_truncate) BTF_ID(func, vfs_fallocate) BTF_ID(func, dentry_open) BTF_ID(func, vfs_getattr) BTF_ID(func, filp_close) BTF_SET_END(btf_allowlist_d_path) It defines the following 'struct btf_id_set' variable to access values and count: struct btf_id_set btf_allowlist_d_path; Adding an 'allowed' callback to struct bpf_func_proto, to allow the verifier to check for allowed callers. Adding a btf_id_set_contains function, which will be used by 'allowed' callbacks to verify that the caller's BTF ID value is within the allowed set. Also removing an extra '\' in the __BTF_ID_LIST macro, and adding a BTF_SET_START_GLOBAL macro for global sets. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-10-jolsa@kernel.org
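A sketch of how the set and the 'allowed' callback fit together, mirroring the d_path wiring added later in this series (the proto is abbreviated to the relevant field):

    static bool bpf_d_path_allowed(const struct bpf_prog *prog)
    {
            /* gate the helper on the BTF ID of the attach point */
            return btf_id_set_contains(&btf_allowlist_d_path,
                                       prog->aux->attach_btf_id);
    }

    static const struct bpf_func_proto bpf_d_path_proto = {
            /* ... */
            .allowed        = bpf_d_path_allowed,
    };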
2020-08-25 bpf: Add btf_struct_ids_match function (Jiri Olsa)
Adding a btf_struct_ids_match function to check if a given address provided by BTF object + offset is also the address of another nested BTF object. This allows passing an argument to a helper that is defined via parent BTF object + offset, like for bpf_d_path (added in following changes): SEC("fentry/filp_close") int BPF_PROG(prog_close, struct file *file, void *id) { ... ret = bpf_d_path(&file->f_path, ... The first bpf_d_path argument is held by the verifier as a BTF file object plus the offset of the f_path member. The btf_struct_ids_match function will walk the struct file object and check if there's a nested struct path object at the given offset. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-9-jolsa@kernel.org
2020-08-25 bpf: Factor btf_struct_access function (Jiri Olsa)
Adding a btf_struct_walk function that walks through the struct type + given offset and returns the following values: enum bpf_struct_walk_result { /* < 0 error */ WALK_SCALAR = 0, WALK_PTR, WALK_STRUCT, }; WALK_SCALAR - when a SCALAR_VALUE is found WALK_PTR - when a pointer value is found; its ID is stored in the 'next_btf_id' output param WALK_STRUCT - when a nested struct object is found; its ID is stored in the 'next_btf_id' output param It will be used in following patches to get all nested struct objects for a given type and offset. btf_struct_access now calls the btf_struct_walk function repeatedly, for as long as it gets nested structs as the return value. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-8-jolsa@kernel.org
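A hedged sketch of the resulting btf_struct_access() loop (error handling simplified):

    do {
            err = btf_struct_walk(log, t, off, size, &id);
            switch (err) {
            case WALK_PTR:
                    *next_btf_id = id;
                    return PTR_TO_BTF_ID;
            case WALK_SCALAR:
                    return SCALAR_VALUE;
            case WALK_STRUCT:
                    /* nested struct: dive in and keep walking */
                    t = btf_type_by_id(btf_vmlinux, id);
                    break;
            default:
                    return err;             /* < 0 error */
            }
    } while (err == WALK_STRUCT);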
2020-08-25 bpf: Remove recursion call in btf_struct_access (Jiri Olsa)
Andrii suggested we can simply jump to the 'again' label instead of making a recursive call. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-7-jolsa@kernel.org
2020-08-25 bpf: Add type_id pointer as argument to __btf_resolve_size (Jiri Olsa)
Adding a type_id pointer as argument to __btf_resolve_size so that it also returns the BTF ID of the resolved type. It will be used in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-6-jolsa@kernel.org
2020-08-25 bpf: Add elem_id pointer as argument to __btf_resolve_size (Jiri Olsa)
If the resolved type is an array, make btf_resolve_size also return the ID of the element type. It will be needed in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-5-jolsa@kernel.org
2020-08-25 bpf: Move btf_resolve_size into __btf_resolve_size (Jiri Olsa)
Moving btf_resolve_size into __btf_resolve_size and keeping btf_resolve_size public with just the first 3 arguments, because the rest of the arguments are not used by outside callers. The following changes add more arguments that are not useful to outside callers; they will be added to the __btf_resolve_size function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-4-jolsa@kernel.org
2020-08-25 bpf: Disallow BPF_PRELOAD in allmodconfig builds (Alexei Starovoitov)
The CC_CAN_LINK check ensures that the host compiler can link, but bpf_preload relies on libbpf, which in turn needs libelf to be present during linking. allmodconfig runs in odd setups with cross compilers and missing host libraries like libelf. Instead of extending kconfig with every possible library that bpf_preload might need, disallow building BPF_PRELOAD in such build-only configurations. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-08-25 bpf: Allow local storage to be used from LSM programs (KP Singh)
Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used in LSM programs. These helpers are not used for tracing programs (currently), as their usage is tied to the life-cycle of the object and they should only be used where the owning object won't be freed (when the owning object is passed as an argument to the LSM hook). Thus, they are safer to use in LSM hooks than in tracing. Usage of local storage in tracing programs will probably follow a per-function-based whitelist approach. Since the UAPI helper signature for bpf_sk_storage expects a bpf_sock, which leads to a compilation warning for LSM programs, it's also updated to accept a void * pointer instead. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-7-kpsingh@chromium.org
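A minimal sketch of an LSM program using the inode flavor; the map name, hook, value type, and counter semantics are illustrative, and the flag name follows the local-storage uapi:

    // SPDX-License-Identifier: GPL-2.0
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    struct {
            __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __type(key, int);
            __type(value, __u64);
    } inode_opens SEC(".maps");

    SEC("lsm/file_open")
    int BPF_PROG(count_opens, struct file *file)
    {
            __u64 *cnt;

            /* storage lives and dies with the owning inode */
            cnt = bpf_inode_storage_get(&inode_opens, file->f_inode, 0,
                                        BPF_LOCAL_STORAGE_GET_F_CREATE);
            if (cnt)
                    __sync_fetch_and_add(cnt, 1);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";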
2020-08-25 bpf: Implement bpf_local_storage for inodes (KP Singh)
Similar to bpf_local_storage for sockets, add local storage for inodes. The life-cycle of the storage is managed with the life-cycle of the inode, i.e. the storage is destroyed along with the owning inode. The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the security blob; security blobs are now stackable and can co-exist with other LSMs. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825182919.1118197-6-kpsingh@chromium.org
2020-08-25 bpf: Split bpf_local_storage to bpf_sk_storage (KP Singh)
A purely mechanical change: bpf_sk_storage.c = bpf_sk_storage.c + bpf_local_storage.c bpf_sk_storage.h = bpf_sk_storage.h + bpf_local_storage.h Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-5-kpsingh@chromium.org
2020-08-25 alarmtimer: Convert comma to semicolon (Xu Wang)
Replace a comma between expression statements by a semicolon. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/20200818062651.21680-1-vulab@iscas.ac.cn
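The class of bug in miniature (the variable and function names here are hypothetical, not the actual alarmtimer lines):

    expires = ktime_add(expires, interval),  /* comma operator: chains both
                                                expressions into one statement,
                                                which usually goes unnoticed */
    program_next_alarm(expires);

    expires = ktime_add(expires, interval);  /* semicolon: two statements,
                                                as intended */
    program_next_alarm(expires);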
2020-08-24 bpf, sysctl: Let bpf_stats_handler take a kernel pointer buffer (Tobias Klauser)
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the signature of bpf_stats_handler to match ctl_table.proc_handler which fixes the following sparse warning: kernel/sysctl.c:226:49: warning: incorrect type in argument 3 (different address spaces) kernel/sysctl.c:226:49: expected void * kernel/sysctl.c:226:49: got void [noderef] __user *buffer kernel/sysctl.c:2640:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces)) kernel/sysctl.c:2640:35: expected int ( [usertype] *proc_handler )( ... ) kernel/sysctl.c:2640:35: got int ( * )( ... ) Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20200824142047.22043-1-tklauser@distanz.ch
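The shape of the fix: the third parameter loses its __user annotation so that the function matches the post-32927393dc1c ->proc_handler prototype (declarations abbreviated):

    /* before: sparse flags argument 3 as living in the wrong address space */
    static int bpf_stats_handler(struct ctl_table *table, int write,
                                 void __user *buffer, size_t *lenp, loff_t *ppos);

    /* after: kernel pointer, matching ctl_table.proc_handler */
    static int bpf_stats_handler(struct ctl_table *table, int write,
                                 void *buffer, size_t *lenp, loff_t *ppos);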
2020-08-24 bpf: Fix a buffer out-of-bound access when filling raw_tp link_info (Yonghong Song)
Commit f2e10bff16a0 ("bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link") added link query for raw_tp. One of the fields in link_info is to fill a user buffer with tp_name. The current checking only declares "ulen && !ubuf" as invalid, so "!ulen && ubuf" will be considered valid. Later on, we do "copy_to_user(ubuf, tp_name, ulen - 1)", which may overwrite user memory incorrectly. This patch fixes the problem by disallowing the "!ulen && ubuf" case as well. Fixes: f2e10bff16a0 ("bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200821191054.714731-1-yhs@fb.com
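A hedged sketch of the tightened validation (the exact expression in the patch may differ):

    /* a length without a buffer, or a buffer without a length, is invalid */
    if (!ulen ^ !ubuf)
            return -EINVAL;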
2020-08-24 rcutorture: Allow pointer leaks to test diagnostic code (Paul E. McKenney)
This commit adds an rcutorture.leakpointer module parameter that intentionally leaks an RCU-protected pointer out of the RCU read-side critical section and checks to see if the corresponding grace period has elapsed, emitting a WARN_ON_ONCE() if so. This module parameter can be used to test facilities like CONFIG_RCU_STRICT_GRACE_PERIOD that end grace periods quickly. While in the area, also document rcutorture.irqreader, which was previously left out. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcutorture: Hoist OOM registry up one level (Paul E. McKenney)
Currently, registering and unregistering the OOM notifier is done right before and after the test, respectively. This will not work well for multi-threaded tests, so this commit hoists this registering and unregistering up into the rcu_torture_fwd_prog_init() and rcu_torture_fwd_prog_cleanup() functions. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 refperf: Avoid null pointer dereference when buf fails to allocate (Colin Ian King)
Currently, in the unlikely event that buf fails to be allocated, it is dereferenced a few times. Use the errexit flag to determine whether buf should be written to, thereby avoiding the null pointer dereferences. Addresses-Coverity: ("Dereference after null check") Fixes: f518f154ecef ("refperf: Dynamically allocate experiment-summary output buffer") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcutorture: Properly synchronize with OOM notifier (Paul E. McKenney)
The current rcutorture forward-progress code assumes that it is the only cause of out-of-memory (OOM) events. For script-based rcutorture testing, this assumption is in fact correct. However, testing based on modprobe/rmmod might well encounter external OOM events, which could happen at any time. This commit therefore properly synchronizes the interaction between rcutorture's forward-progress testing and its OOM notifier by adding a global mutex. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcutorture: Properly set rcu_fwds for OOM handling (Paul E. McKenney)
The conversion of rcu_fwds to dynamic allocation failed to actually allocate the required structure. This commit therefore allocates it, frees it, and updates rcu_fwds accordingly. While in the area, it abstracts the cleanup actions into rcu_torture_fwd_prog_cleanup(). Fixes: 5155be9994e5 ("rcutorture: Dynamically allocate rcu_fwds structure") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 locktorture: Make function torture_percpu_rwsem_init() static (Wei Yongjun)
The sparse tool complains as follows: kernel/locking/locktorture.c:569:6: warning: symbol 'torture_percpu_rwsem_init' was not declared. Should it be static? And this function is not used outside of locktorture.c, so this commit marks it static. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcutorture: Output number of elapsed grace periods (Joel Fernandes (Google))
This commit adds code to print the grace-period number at the start of the test along with both the grace-period number and the number of elapsed grace periods at the end of the test. Note that variants of RCU without the notion of a grace-period number (for example, Tiny RCU) just print zeroes. [ paulmck: Adjust commit log. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcutorture: Remove KCSAN stubs (Paul E. McKenney)
KCSAN is now in mainline, so this commit removes the stubs for the data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp() (Paul E. McKenney)
The "cpu" parameter to rcu_report_qs_rdp() is not used, with rdp->cpu being used instead. Furthermore, every call to rcu_report_qs_rdp() invokes it on rdp->cpu. This commit therefore removes this unused "cpu" parameter and converts a check of rdp->cpu against smp_processor_id() to a WARN_ON_ONCE(). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs (Paul E. McKenney)
The CONFIG_PREEMPT=n instance of rcu_read_unlock is even more aggressive than that of CONFIG_PREEMPT=y in deferring reporting quiescent states to the RCU core. This is just what is wanted in normal use because it reduces overhead, but the resulting delay is not what is wanted for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. This commit therefore adds an rcu_read_unlock_strict() function that checks for exceptional conditions, and reports the newly started quiescent state if it is safe to do so, also doing a spin-delay if requested via rcutree.rcu_unlock_delay. This commit also adds a call to rcu_read_unlock_strict() from the CONFIG_PREEMPT=n instance of __rcu_read_unlock(). [ paulmck: Fixed bug located by kernel test robot <lkp@intel.com> ] Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Execute RCU reader shortly after rcu_core for strict GPs (Paul E. McKenney)
A kernel built with CONFIG_RCU_STRICT_GRACE_PERIOD=y needs a quiescent state to appear very shortly after a CPU has noticed a new grace period. Placing an RCU reader immediately after this point is ineffective because this normally happens in softirq context, which acts as a big RCU reader. This commit therefore introduces a new per-CPU work_struct, which is used at the end of rcu_core() processing to schedule an RCU read-side critical section from within a clean environment. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Provide optional RCU-reader exit delay for strict GPs (Paul E. McKenney)
The goal of this series is to increase the probability of tools like KASAN detecting that an RCU-protected pointer was used outside of its RCU read-side critical section. Thus far, the approach has been to make grace periods and callback processing happen faster. Another approach is to delay the pointer leaker. This commit therefore allows a delay to be applied to exit from RCU read-side critical sections. This slowdown is specified by a new rcutree.rcu_unlock_delay kernel boot parameter that specifies this delay in microseconds, defaulting to zero. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: IPI all CPUs at GP end for strict GPs (Paul E. McKenney)
Currently, each CPU discovers the end of a given grace period on its own time, which is again good for efficiency but bad for fast grace periods, given that it is things like kfree() within the RCU callbacks that will cause trouble for pointers leaked from RCU read-side critical sections. This commit therefore uses on_each_cpu() to IPI each CPU after grace-period cleanup in order to inform each CPU of the end of the old grace period in a timely manner, but only in kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: IPI all CPUs at GP start for strict GPs (Paul E. McKenney)
Currently, each CPU discovers the beginning of a given grace period on its own time, which is again good for efficiency but bad for fast grace periods. This commit therefore uses on_each_cpu() to IPI each CPU after grace-period initialization in order to inform each CPU of the new grace period in a timely manner, but only in kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
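This commit and the GP-end counterpart above share one shape; a hedged sketch (the helper name is an assumption about the implementation):

    static void rcu_strict_gp_boundary(void *unused)
    {
            invoke_rcu_core();      /* make this CPU act on the GP change now */
    }

    /* at the end of grace-period initialization (and, above, cleanup): */
    if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
            on_each_cpu(rcu_strict_gp_boundary, NULL, 0);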
2020-08-24 rcu: Attempt QS when CPU discovers GP for strict GPs (Paul E. McKenney)
A given CPU normally notes a new grace period during one RCU_SOFTIRQ, but avoids reporting the corresponding quiescent state until some later RCU_SOFTIRQ. This leisurely approach improves efficiency by increasing the number of update requests served by each grace period, but is not what is needed for kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. This commit therefore adds a new rcu_strict_gp_check_qs() function which, in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, simply enters and immediately exits an RCU read-side critical section. If the CPU is in a quiescent state, the rcu_read_unlock() will attempt to report an immediate quiescent state. This rcu_strict_gp_check_qs() function is invoked from note_gp_changes(), so that a CPU just noticing a new grace period might immediately report a quiescent state for that grace period. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
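The commit text pins the new function down almost completely; a sketch consistent with it:

    static void rcu_strict_gp_check_qs(void)
    {
            if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
                    rcu_read_lock();
                    rcu_read_unlock();      /* may report the QS immediately */
            }
    }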
2020-08-24 rcu: Do full report for .need_qs for strict GPs (Paul E. McKenney)
The rcu_preempt_deferred_qs_irqrestore() function is invoked at the end of an RCU read-side critical section (for example, directly from rcu_read_unlock()) and, if .need_qs is set, invokes rcu_qs() to report the new quiescent state. This works, except that rcu_qs() only updates per-CPU state, leaving reporting of the actual quiescent state to a later call to rcu_report_qs_rdp(), for example from within a later RCU_SOFTIRQ instance. Although this approach is exactly what you want if you are more concerned about efficiency than about short grace periods, in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, short grace periods are the name of the game. This commit therefore makes rcu_preempt_deferred_qs_irqrestore() directly invoke rcu_report_qs_rdp() in CONFIG_RCU_STRICT_GRACE_PERIOD=y, thus shortening grace periods. Historical note: To the best of my knowledge, causing rcu_read_unlock() to directly report a quiescent state first appeared in Jim Houston's and Joe Korty's JRCU. This is the second instance of a Linux-kernel RCU feature being inspired by JRCU, the first being RCU callback offloading (as in the RCU_NOCB_CPU Kconfig option). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Always set .need_qs from __rcu_read_lock() for strict GPs (Paul E. McKenney)
The ->rcu_read_unlock_special.b.need_qs field in the task_struct structure indicates that the RCU core needs a quiescent state from the corresponding task. The __rcu_read_unlock() function checks this (via an eventual call to rcu_preempt_deferred_qs_irqrestore()), and if set reports a quiescent state immediately upon exit from the outermost RCU read-side critical section. Currently, this flag is only set when the scheduling-clock interrupt decides that the current RCU grace period is too old, as in about one full second too old. But if the kernel has been built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, we clearly do not want to wait that long. This commit therefore sets the .need_qs field immediately at the start of the RCU read-side critical section from within __rcu_read_lock() in order to unconditionally enlist help from __rcu_read_unlock(). But note the additional check for rcu_state.gp_kthread, which prevents attempts to awaken RCU's grace-period kthread during early boot before there is a scheduler. Leaving off this check results in early boot hangs, so early that there is no console output. Thus, this additional check fails until such time as RCU's grace-period kthread has been created, avoiding these empty-console hangs. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs (Paul E. McKenney)
The value of DEFAULT_RCU_BLIMIT is normally set to 10, the idea being to avoid needless response-time degradation due to RCU callback invocation. However, when CONFIG_RCU_STRICT_GRACE_PERIOD=y it is better to avoid throttling callback execution in order to better detect pointer leaks from RCU read-side critical sections. This commit therefore sets the value of DEFAULT_RCU_BLIMIT to 1000 in kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs (Paul E. McKenney)
If there are idle CPUs, RCU's grace-period kthread will wait several jiffies before even thinking about polling them. This promotes efficiency, which is normally a good thing, but when the kernel has been built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, we care more about short grace periods. This commit therefore restricts the default jiffies_till_first_fqs value to zero in kernels built with CONFIG_RCU_STRICT_GRACE_PERIOD=y, which causes RCU's grace-period kthread to poll for idle CPUs immediately after starting a grace period. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Reduce leaf fanout for strict RCU grace periods (Paul E. McKenney)
Because strict RCU grace periods will complete more quickly, they will experience greater lock contention on each leaf rcu_node structure's ->lock. This commit therefore reduces the leaf fanout in order to reduce this lock contention. Note that this also has the effect of reducing the number of CPUs supported to 16 in the case of CONFIG_RCU_FANOUT_LEAF=2 or 81 in the case of CONFIG_RCU_FANOUT_LEAF=3. However, greater numbers of CPUs are probably a bad idea when using CONFIG_RCU_STRICT_GRACE_PERIOD=y. Those wishing to live dangerously are free to edit their kernel/rcu/Kconfig files accordingly. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcu: Add Kconfig option for strict RCU grace periods (Paul E. McKenney)
People running automated tests have asked for a way to make RCU minimize grace-period duration in order to increase the probability of KASAN detecting a pointer being improperly leaked from an RCU read-side critical section, for example, like this: rcu_read_lock(); p = rcu_dereference(gp); do_something_with(p); // OK rcu_read_unlock(); do_something_else_with(p); // BUG!!! The rcupdate.rcu_expedited boot parameter is a start in this direction, given that it makes calls to synchronize_rcu() instead invoke the faster (and more wasteful) synchronize_rcu_expedited(). However, this does nothing to shorten RCU grace periods that are instead initiated by call_rcu(), and RCU pointer-leak bugs can involve call_rcu() just as surely as they can synchronize_rcu(). This commit therefore adds an RCU_STRICT_GRACE_PERIOD Kconfig option that will be used to shorten normal (non-expedited) RCU grace periods. This commit also dumps out a message when this option is in effect. Later commits will actually shorten grace periods. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 rcuperf: Change rcuperf to rcuscale (Paul E. McKenney)
This commit further avoids conflation of rcuperf with the kernel's perf feature by renaming kernel/rcu/rcuperf.c to kernel/rcu/rcuscale.c, and also by similarly renaming the functions and variables inside this file. This has the side effect of changing the names of the kernel boot parameters, so kernel-parameters.txt and ver_functions.sh are also updated. The rcutorture --torture type was also updated from rcuperf to rcuscale. [ paulmck: Fix bugs located by Stephen Rothwell. ] Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Add cond_resched() to test loop (Paul E. McKenney)
Although the test loop does randomly delay, which would provide quiescent states and so forth, it is possible for there to be a series of long smp_call_function*() handler runtimes with no delays, which results in softlockup and RCU CPU stall warning messages. This commit therefore inserts a cond_resched() into the main test loop. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Adapt memory-ordering test to UP operation (Paul E. McKenney)
On uniprocessor systems, smp_call_function() does nothing. This commit therefore avoids complaining about the lack of handler accesses in the single-CPU case where there is no handler. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Block scftorture_invoker() kthreads for offline CPUs (Paul E. McKenney)
Currently, CPU-hotplug operations might result in all but two of (say) 100 CPUs being offline, which in turn might result in false-positive diagnostics due to overload. This commit therefore causes scftorture_invoker() kthreads for offline CPUs to loop blocking for 200 milliseconds at a time, thus continuously adjusting the number of threads to match the number of online CPUs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Check unexpected "switch" statement value (Paul E. McKenney)
This commit adds a "default" case to the switch statement in scftorture_invoke_one() which contains a WARN_ON_ONCE() and an assignment to ->scfc_out to suppress knock-on warnings. These knock-on warnings could otherwise cause the user to think that there was a memory-ordering problem in smp_call_function() instead of a bug in scftorture.c itself. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Make symbol 'scf_torture_rand' static (Wei Yongjun)
The sparse tool complains as follows: kernel/scftorture.c:124:1: warning: symbol '__pcpu_scope_scf_torture_rand' was not declared. Should it be static? And this per-CPU variable is not used outside of scftorture.c, so this commit marks it static. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Prevent compiler from reducing race probabilities (Paul E. McKenney)
Detecting smp_call_function() memory misordering requires close timing, so it is necessary to have the checks immediately before and after the call to the smp_call_function*() function under test. This commit therefore inserts barrier() calls to prevent the compiler from optimizing memory-misordering detection down into the zone of extreme improbability. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
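A hedged sketch of the placement (field and function names follow scftorture.c, but the exact call sites may differ):

    scfcp->scfc_in = true;
    barrier();              /* keep the store from drifting below the call */
    smp_call_function(scf_handler, scfcp, 1);
    barrier();              /* keep the check from drifting above the call */
    if (!scfcp->scfc_out)
            atomic_inc(&n_mb_out_errs);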
2020-08-24 scftorture: Flag errors in torture-compatible manner (Paul E. McKenney)
This commit prints error counts on the statistics line and also adds a "!!!" if any of the counters are non-zero. Allocation failures are (somewhat) forgiven, but all other errors result in a "FAILURE" print at the end of the test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Consolidate scftorture_invoke_one() scf_check initialization (Paul E. McKenney)
This commit hoists much of the initialization of the scf_check structure out of the switch statement, thus saving a few lines of code. The initialization of the ->scfc_in field remains in each leg of the switch statement in order to more heavily stress memory ordering. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Consolidate scftorture_invoke_one() check and kfree() (Paul E. McKenney)
This commit moves checking of the ->scfc_out field and the freeing of the scf_check structure down below the end of the switch statement, thus saving a few lines of code. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 scftorture: Add smp_call_function() memory-ordering checks (Paul E. McKenney)
This commit adds checks for memory misordering across calls to and returns from smp_call_function() in the case where the caller waits. Misordering results in a splat. Note that in contrast to smp_call_function_single(), this code does not test memory ordering into the handler in the no-wait case because none of the handlers would be able to free the scf_check structure without introducing heavy synchronization to work out which was last. [ paulmck: s/GFP_KERNEL/GFP_ATOMIC/ per kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
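A hedged sketch of the wait-case handshake being checked (the real scf_check structure has more fields; the counter names follow the statistics work above):

    struct scf_check {
            bool scfc_in;   /* set by the caller before the call */
            bool scfc_out;  /* set by the handler, checked after return */
    };

    static void scf_handler(void *arg)
    {
            struct scf_check *scfcp = arg;

            if (!scfcp)
                    return;
            if (WARN_ON_ONCE(!READ_ONCE(scfcp->scfc_in)))
                    atomic_inc(&n_mb_in_errs);      /* ordering into the handler */
            WRITE_ONCE(scfcp->scfc_out, true);      /* checked by the waiting caller */
    }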