summaryrefslogtreecommitdiff
path: root/kernel/rcu
AgeCommit message (Collapse)Author
2017-06-08rcu: Remove *_SLOW_* Kconfig optionsPaul E. McKenney
The RCU_TORTURE_TEST_SLOW_PREINIT, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_INIT, RCU_TORTURE_TEST_SLOW_INIT_DELAY, RCU_TORTURE_TEST_SLOW_CLEANUP, and RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY Kconfig options are only useful for torture testing, and there are the rcutree.gp_cleanup_delay, rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot parameters that rcutorture can use instead. The effect of these parameters is to artificially slow down grace period initialization and cleanup in order to make some types of race conditions happen more often. This commit therefore simplifies Tree RCU a bit by removing the Kconfig options and adding the corresponding kernel parameters to rcutorture's .boot files instead. However, this commit also leaves out the kernel parameters for TREE02, TREE04, and TREE07 in order to have about the same number of tests slowed as not slowed. TREE01, TREE03, TREE05, and TREE06 are slowed, and the rest are not slowed. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Use rnp->lock wrappers to replace explicit memory barriersPaul E. McKenney
This commit uses TREE RCU's rnp->lock wrappers to replace a few explicit memory barriers. This change also has the advantage of making SRCU's memory-ordering properties be implemented in roughly the same way as they are in Tree RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move rnp->lock wrappers for SRCU usePaul E. McKenney
This commit moves the now-generic rnp->lock wrapper macros from kernel/rcu/tree.h to kernel/rcu/rcu.h, thus allowing SRCU to use them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Convert rnp->lock wrappers to macros for SRCU usePaul E. McKenney
Use of smp_mb__after_unlock_lock() would allow SRCU to omit a full memory barrier during callback execution, so this commit converts raw_spin_lock_rcu_node() from inline functions to type-generic macros to allow them to handle locks in srcu_node structures as well as rcu_node structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Apply trivial callback lists to shrink Tiny SRCUPaul E. McKenney
The rcu_segcblist structure provides quite a bit of functionality, and Tiny SRCU needs almost none of it. So this commit replaces Tiny SRCU's uses of rcu_segcblist with a simple singly linked list with tail pointer. This change significantly reduces Tiny SRCU's memory footprint, more than making up for the growth caused by the creation of rcu_segcblist.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Shrink srcu.h by moving docbook and private functionPaul E. McKenney
The call_srcu() docbook entry is currently in include/linux/srcu.h, which causes needless processing for each include point. This commit therefore moves this entry to kernel/rcu/srcutree.c, which the compiler reads only once. In addition, the srcu_batches_completed() function is used only within RCU and its torture-test suites. This commit therefore also moves this function's declaration from include/linux/srcutiny.h, include/linux/srcutree.h, and include/linux/srcuclassic.h to kernel/rcu/rcu.h. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Prevent sdp->srcu_gp_seq_needed counter wrapPaul E. McKenney
If a given CPU never happens to ever start an SRCU grace period, the grace-period sequence counter might wrap. If this CPU were to decide to finally start a grace period, the state of its sdp->srcu_gp_seq_needed might make it appear that it has already requested this grace period, which would prevent starting the grace period. If no other CPU ever started a grace period again, this would look like a grace-period hang. Even if some other CPU took pity and started the needed grace period, the leaf rcu_node structure's ->srcu_data_have_cbs field won't have record of the fact that this CPU has a callback pending, which would look like a very localized grace-period hang. This might seem very unlikely, but SRCU grace periods can take less than a microsecond on small systems, which means that overflow can happen in much less than an hour on a 32-bit embedded system. And embedded systems are especially likely to have long-term idle CPUs. Therefore, it makes sense to prevent this scenario from happening. This commit therefore scans each srcu_data structure occasionally, with frequency controlled by the srcutree.counter_wrap_check kernel boot parameter. This parameter can be set to something like 255 in order to exercise the counter-wrap-prevention code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.hPaul E. McKenney
The rcu_request_urgent_qs_task() function is used only within RCU, so there is no point in exporting it to the rest of the kernel from nclude/linux/rcutiny.h and include/linux/rcutree.h. This commit therefore moves this function to kernel/rcu/rcu.h. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move torture-related functions out of rcutiny.h and rcutree.hPaul E. McKenney
The various functions similar to rcu_batches_started(), the function show_rcu_gp_kthreads(), the various functions similar to rcu_force_quiescent_state(), and the variables rcutorture_testseq and rcutorture_vernum are used only within RCU. There is therefore no point in exporting them to the kernel at large from include/linux/rcutiny.h and include/linux/rcutree.h. This commit therefore moves all of these to kernel/rcu/rcu.h. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.hPaul E. McKenney
The rcu_ftrace_dump() function is used only internally to RCU. This commit therefore moves its declaration from include/linux/rcupdate.h to kernel/rcu/rcu.h. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.hPaul E. McKenney
The rcu_is_nocb_cpu() function is used only internally to RCU. This commit therefore moves its declaration from include/linux/rcupdate.h to kernel/rcu/rcu.h. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Improve __call_rcu() debug-objects error messagePaul E. McKenney
The "__call_rcu(): Leaked duplicate callback" error message from __call_rcu() has proven to be unhelpful. This commit therefore changes it to "__call_rcu(): Double-freed CB" and adds the value of the pointer passed in. The value of the pointer improves debuggability by allowing correlation with tracing output, for example, the rcu:rcu_callback trace event. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.hPaul E. McKenney
The RCU_SCHEDULER_INACTIVE, RCU_SCHEDULER_INIT, and RCU_SCHEDULER_RUNNING definitions are used only within RCU, so this commit moves them from include/linux/rcupdate.h to kernel/rcu/rcu.h. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Eliminate the unused __rcu_is_watching() functionPaul E. McKenney
The __rcu_is_watching() function is currently not used, aside from to implement the rcu_is_watching() function. This commit therefore eliminates __rcu_is_watching(), which has the beneficial side-effect of shrinking include/linux/rcupdate.h a bit. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move torture-related definitions from rcupdate.h to rcu.hPaul E. McKenney
The include/linux/rcupdate.h file contains a number of definitions that are used only to communicate between rcutorture, rcuperf, and the RCU code itself. There is no point in having these definitions exposed globally throughout the kernel, so this commit moves them to kernel/rcu/rcu.h. This change has the added benefit of shrinking rcupdate.h. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move expediting-related access/control out of rcupdate.hPaul E. McKenney
The rcu_gp_is_normal(), rcu_gp_is_expedited(), rcu_expedite_gp(), and rcu_unexpedite_gp() functions are intended only for use within the RCU implementation itself -- the sysfs access is what should be used outside of RCU. This commit therefore moves the declarations for these functions to kernel/rcu/rcu.h, and also includes this file into kernel/rcu/rcutorture.c and kernel/rcu/rcuperf.c. This also has the beneficial effect of shrinking rcupdate.c a bit. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move rcu_expedited and rcu_normal externs from rcupdate.hPaul E. McKenney
The rcu_expedited and rcu_normal variables are used only by sysctl and kernel/rcu/update.c, so it does not make sense to their extern declarations in rcupdate.h. This commit therefore moves these extern declarations to update.c. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Move docbook comments out of rcupdate.hPaul E. McKenney
The include/linux/rcupdate.h file is included by more than 200 files, so shrinking it should provide some build-time benefits. This commit therefore moves several docbook comments from rcupdate.h to kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus reducing the number of times that the compiler has to scan these comments. This likely provides only a small benefit, but every little bit helps. This commit also fixes a malformed bulleted list noted by the 0day Test Robot. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Add memory barriers for NOCB leader wakeupPaul E. McKenney
Wait/wakeup operations do not guarantee ordering on their own. Instead, either locking or memory barriers are required. This commit therefore adds memory barriers to wake_nocb_leader() and nocb_leader_wait(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Krister Johansen <kjlx@templeofstupid.com> Cc: <stable@vger.kernel.org> # 4.6.x
2017-06-08rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKEPaul E. McKenney
The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags are used to mediate wakeups for the no-CBs CPU kthreads. The "NOGP" really doesn't make any sense, so this commit does s/NOGP/NOCB/. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Make synchronize_rcu_mult() check for duplicatesPaul E. McKenney
Currently, doing synchronize_rcu_mult(call_rcu, call_rcu) might (or might not) wait for two RCU grace periods. One approach is of course "don't do that!", but in CONFIG_PREEMPT=n kernels, synchronize_rcu_mult(call_rcu, call_rcu_sched) does exactly that. This results in an ugly #ifdef in sched_cpu_deactivate(). This commit therefore makes __wait_rcu_gp() check for duplicates, which in turn allows duplicates to be passed to synchronize_rcu_mult() without risk of waiting twice on the same type of grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Add DEBUG_OBJECTS_RCU_HEAD functionalityPaul E. McKenney
This commit adds DEBUG_OBJECTS_RCU_HEAD checking to detect call_srcu() counterparts to double-free bugs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Shrink Tiny SRCU a bitPaul E. McKenney
In Tiny SRCU, __srcu_read_lock() is a trivial function, outweighed by its EXPORT_SYMBOL_GPL(), and on many architectures, its call sequence. This commit therefore moves it to srcutiny.h so that it can be inlined. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Add lockdep_assert_held() teeth to tree_plugin.hPaul E. McKenney
Comments can be helpful, but assertions carry more force. This commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to enforce lock-held and interrupt-disabled preconditions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Add lockdep_assert_held() teeth to tree.cPaul E. McKenney
Comments can be helpful, but assertions carry more force. This commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to enforce lock-held and interrupt-disabled preconditions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Print non-default exp_holdoff values at boot timePaul E. McKenney
This commit makes srcu_bootup_announce() check for non-default values of the auto-expedite holdoff time exp_holdoff and print a message if so. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Make exp_holdoff module parameter be staticPaul E. McKenney
Because exp_holdoff is not used outside of srcutree.c, it can be static. This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Update rcu_bootup_announce_oddness()Paul E. McKenney
This commit updates rcu_bootup_announce_oddness() to check additional Kconfig options and module/boot parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Print out rcupdate.c non-default boot-time settingsPaul E. McKenney
This commit adds a rcupdate_announce_bootup_oddness() function to print out non-default values of significant kernel boot parameter settings to aid in debugging. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()Paul E. McKenney
This commit adds WARN_ON_ONCE() calls that trigger if either rcu_sched_qs() or rcu_bh_qs() are invoked with preemption enabled. In the immortal words of Peter Zijlstra: "these are much harder to ignore than comments". Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add writer_holdoff boot parameterPaul E. McKenney
This commit adds a writer_holdoff boot parameter to rcuperf, which is intended to be used to test Tree SRCU's auto-expediting. This boot parameter is in microseconds, and defaults to zero (that is, disabled). Set it to a bit larger than srcutree.exp_holdoff, keeping the nanosecond/microsecond conversion, to force Tree SRCU to auto-expedite more aggressively. This commit also adds documentation for this parameter, and fixes some alphabetization while in the neighborhood. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Set more user-friendly defaultsPaul E. McKenney
Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built as a module, rcuperf.shutdown. This commit therefore sets the default for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown to zero if rcuperf is built as a module and to one otherwise. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Shrink Tiny SRCU a bit morePaul E. McKenney
This commit rearranges Tiny SRCU's srcu_struct structure, substitutes u8 for bool, and shrinks counters down to short. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Make Classic and Tree SRCU announce themselves at bootupPaul E. McKenney
Currently, the only way to tell whether a given kernel is running Classic, Tiny, or Tree SRCU is to look at the .config file, which can easily be lost or associated with the wrong kernel. This commit therefore has Classic and Tree SRCU identify themselves at boot time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add test for dynamically initialized srcu_structPaul E. McKenney
This commit adds a perf_type of "srcud", which species that rcuperf test SRCU on a dynamically initialized srcu_struct. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Make sync_rcu_preempt_exp_done() return boolPaul E. McKenney
The sync_rcu_preempt_exp_done() function returns a logical expression, but its return type is nevertheless int. This commit therefore changes the return type to bool. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Add ability to performance-test call_rcu() and friendsPaul E. McKenney
This commit upgrades rcuperf so that it can do performance testing on asynchronous grace-period primitives such as call_srcu(). There is a new rcuperf.gp_async module parameter that specifies this new behavior, with the pre-existing rcuperf.gp_exp testing expedited grace periods such as synchronize_rcu_expedited, and with the default being to test synchronous non-expedited grace periods such as synchronize_rcu(). There is also a new rcuperf.gp_async_max module parameter that specifies the maximum number of outstanding callbacks per writer kthread, defaulting to 1,000. When this limit is exceeded, the writer thread invokes the appropriate flavor of rcu_barrier() to wait for callbacks to drain. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Removed the redundant initialization noted by Arnd Bergmann. ]
2017-06-08rcu: Remove obsolete reference to synchronize_kernel()Paul E. McKenney
The synchronize_kernel() primitive was removed in favor of synchronize_sched() more than a decade ago, and it seems likely that rather few kernel hackers are familiar with it. Its continued presence is therefore providing more confusion than enlightenment. This commit therefore removes the reference from the synchronize_sched() header comment, and adds the corresponding information to the synchronize_rcu(0 header comment. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcuperf: Defer expedited/normal check to end of testPaul E. McKenney
Current rcuperf startup checks to see if the user asked to measure only expedited grace periods, yet constrained all grace periods to be normal, or if the user asked to measure only normal grace periods, yet constrained all grace periods to be expedited. Useless tests of this sort are aborted. Unfortunately, making RCU work through the mid-boot dead zone [1] puts RCU into expedited-only mode during that zone. Which happens to also be the exact time that rcuperf carries out the aforementioned check. So if the user asks rcuperf to measure only normal grace periods (the default), rcuperf will now always complain and terminate the test. This commit therefore moves the checks to rcu_perf_cleanup(). This has the disadvantage of failing to abort useless tests, but avoids the need to create yet another kthread and the need to do fiddly checks involving the holdoff time. (Yes, another approach is to do the checks in a late-stage init function, but that would require some way to communicate badness to rcuperf's kthreads, and seems not worth the bother.) [1] https://lwn.net/Articles/716148/ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Complain if blocking in preemptible RCU read-side critical sectionPaul E. McKenney
Although preemptible RCU allows its read-side critical sections to be preempted, general blocking is forbidden. The reason for this is that excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a voluntarily blocked task doesn't care how high you boost its priority. Because preemptible RCU is a global mechanism, one ill-behaved reader hurts everyone. Hence the prohibition against general blocking in RCU-preempt read-side critical sections. Preemption yes, blocking no. This commit enforces this prohibition. There is a special exception for the -rt patchset (which they kindly volunteered to implement): It is OK to block (as opposed to merely being preempted) within an RCU-preempt read-side critical section, but only if the blocking is subject to priority inheritance. This exception permits CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble. Why doesn't this exception also apply to mainline's rt_mutex? Because of the possibility that someone does general blocking while holding an rt_mutex. Yes, the priority boosting will affect the rt_mutex, but it won't help with the task doing general blocking while holding that rt_mutex. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Eliminate possibility of destructive counter overflowPaul E. McKenney
Earlier versions of Tree SRCU were subject to a counter overflow bug that could theoretically result in too-short grace periods. This commit eliminates this problem by adding an update-side memory barrier. The short explanation is that if the updater sums the unlock counts too late to see a given __srcu_read_unlock() increment, that CPU's next __srcu_read_lock() must see the new value of ->srcu_idx, thus incrementing the other bank of counters. This eliminates the possibility of destructive counter overflow as long as the srcu_read_lock() nesting level does not exceed floor(ULONG_MAX/NR_CPUS/2), which should be an eminently reasonable nesting limit, especially on 64-bit systems. Reported-by: Lance Roy <ldr709@gmail.com> Suggested-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08rcu: Prevent rcu_barrier() from starting needless grace periodsPaul E. McKenney
Currently rcu_barrier() uses call_rcu() to enqueue new callbacks on each CPU with a non-empty callback list. This works, but means that rcu_barrier() forces grace periods that are not otherwise needed. The key point is that rcu_barrier() never needs to wait for a grace period, but instead only for all pre-existing callbacks to be invoked. This means that rcu_barrier()'s new callbacks should be placed in the callback-list segment containing the last pre-existing callback. This commit makes this change using the new rcu_segcblist_entrain() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Allow use of Classic SRCU from both process and interrupt contextPaolo Bonzini
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Cc: stable@vger.kernel.org Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <linuc.decode@gmail.com> Suggested-by: Linu Cherian <linuc.decode@gmail.com> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08srcu: Allow use of Tiny/Tree SRCU from both process and interrupt contextPaolo Bonzini
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on a single variable. Therefore, as Peter Zijlstra pointed out, Tiny SRCU's implementation already supports mixed-context use of srcu_read_lock() and srcu_read_unlock(), at least as long as uses of srcu_read_lock() and srcu_read_unlock() in each handler are nested and paired properly. In other words, it is still illegal to (say) invoke srcu_read_lock() in an interrupt handler and to invoke the matching srcu_read_unlock() in a softirq handler. Therefore, the only change required for Tiny SRCU is to its comments. Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <linuc.decode@gmail.com> Suggested-by: Linu Cherian <linuc.decode@gmail.com> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com>
2017-05-10Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes are: - Debloat RCU headers - Parallelize SRCU callback handling (plus overlapping patches) - Improve the performance of Tree SRCU on a CPU-hotplug stress test - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Open-code the rcu_cblist_n_lazy_cbs() function rcu: Open-code the rcu_cblist_n_cbs() function rcu: Open-code the rcu_cblist_empty() function rcu: Separately compile large rcu_segcblist functions srcu: Debloat the <linux/rcu_segcblist.h> header srcu: Adjust default auto-expediting holdoff srcu: Specify auto-expedite holdoff time srcu: Expedite first synchronize_srcu() when idle srcu: Expedited grace periods with reduced memory contention srcu: Make rcutorture writer stalls print SRCU GP state srcu: Exact tracking of srcu_data structures containing callbacks srcu: Make SRCU be built by default srcu: Fix Kconfig botch when SRCU not selected rcu: Make non-preemptive schedule be Tasks RCU quiescent state srcu: Expedite srcu_schedule_cbs_snp() callback invocation srcu: Parallelize callback handling kvm: Move srcu_struct fields to end of struct kvm rcu: Fix typo in PER_RCU_NODE_PERIOD header comment rcu: Use true/false in assignment to bool rcu: Use bool value directly ...
2017-05-02rcu: Open-code the rcu_cblist_n_lazy_cbs() functionPaul E. McKenney
Because the rcu_cblist_n_lazy_cbs() just samples the ->len_lazy counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02rcu: Open-code the rcu_cblist_n_cbs() functionPaul E. McKenney
Because the rcu_cblist_n_cbs() just samples the ->len counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02rcu: Open-code the rcu_cblist_empty() functionPaul E. McKenney
Because the rcu_cblist_empty() just samples the ->head pointer, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02rcu: Separately compile large rcu_segcblist functionsPaul E. McKenney
This commit creates a new kernel/rcu/rcu_segcblist.c file that contains non-trivial segcblist functions. Trivial functions remain as static inline functions in kernel/rcu/rcu_segcblist.h Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2017-05-02srcu: Debloat the <linux/rcu_segcblist.h> headerIngo Molnar
Linus noticed that the <linux/rcu_segcblist.h> has huge inline functions which should not be inline at all. As a first step in cleaning this up, move them all to kernel/rcu/ and only keep an absolute minimum of data type defines in the header: before: -rw-r--r-- 1 mingo mingo 22284 May 2 10:25 include/linux/rcu_segcblist.h after: -rw-r--r-- 1 mingo mingo 3180 May 2 10:22 include/linux/rcu_segcblist.h More can be done, such as uninlining the large functions, which inlining is unjustified even if it's an RCU internal matter. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>