summaryrefslogtreecommitdiff
path: root/kernel/locking
AgeCommit message (Collapse)Author
2023-07-17locking/rtmutex: Fix task->pi_waiters integrityPeter Zijlstra
Henry reported that rt_mutex_adjust_prio_check() has an ordering problem and puts the lie to the comment in [7]. Sharing the sort key between lock->waiters and owner->pi_waiters *does* create problems, since unlike what the comment claims, holding [L] is insufficient. Notably, consider: A / \ M1 M2 | | B C That is, task A owns both M1 and M2, B and C block on them. In this case a concurrent chain walk (B & C) will modify their resp. sort keys in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L] is meaningless, they're different Ls. This then gives rise to a race condition between [7] and [11], where the requeue of pi_waiters will observe an inconsistent tree order. B C (holds M1->wait_lock, (holds M2->wait_lock, holds B->pi_lock) holds A->pi_lock) [7] waiter_update_prio(); ... [8] raw_spin_unlock(B->pi_lock); ... [10] raw_spin_lock(A->pi_lock); [11] rt_mutex_enqueue_pi(); // observes inconsistent A->pi_waiters // tree order Fixing this means either extending the range of the owner lock from [10-13] to [6-13], with the immediate problem that this means [6-8] hold both blocked and owner locks, or duplicating the sort key. Since the locking in chain walk is horrible enough without having to consider pi_lock nesting rules, duplicate the sort key instead. By giving each tree their own sort key, the above race becomes harmless, if C sees B at the old location, then B will correct things (if they need correcting) when it walks up the chain and reaches A. Fixes: fb00aca47440 ("rtmutex: Turn the plist into an rb-tree") Reported-by: Henry Wu <triangletrap12@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Henry Wu <triangletrap12@gmail.com> Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
2023-06-28Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level directories - Douglas Anderson has added a new "buddy" mode to the hardlockup detector. It permits the detector to work on architectures which cannot provide the required interrupts, by having CPUs periodically perform checks on other CPUs - Zhen Lei has enhanced kexec's ability to support two crash regions - Petr Mladek has done a lot of cleanup on the hard lockup detector's Kconfig entries - And the usual bunch of singleton patches in various places * tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) kernel/time/posix-stubs.c: remove duplicated include ocfs2: remove redundant assignment to variable bit_off watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h devres: show which resource was invalid in __devm_ioremap_resource() watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64 watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h watchdog/hardlockup: make the config checks more straightforward watchdog/hardlockup: sort hardlockup detector related config values a logical way watchdog/hardlockup: move SMP barriers from common code to buddy code watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY watchdog/buddy: don't copy the cpumask in watchdog_next_cpu() watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog() watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy() watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick() watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails ...
2023-06-27Merge tag 'locking-core-2023-06-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double() The cmpxchg128() family of functions is basically & functionally the same as cmpxchg_double(), but with a saner interface. Instead of a 6-parameter horror that forced u128 - u64/u64-halves layout details on the interface and exposed users to complexity, fragility & bugs, use a natural 3-parameter interface with u128 types. - Restructure the generated atomic headers, and add kerneldoc comments for all of the generic atomic{,64,_long}_t operations. The generated definitions are much cleaner now, and come with documentation. - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when taking multiple locks of the same type. This gets rid of one use of lockdep_set_novalidate_class() in the bcache code. - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable shadowing generating garbage code on Clang on certain ARM builds. * tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg() locking/atomic: treewide: delete arch_atomic_*() kerneldoc locking/atomic: docs: Add atomic operations to the driver basic API documentation locking/atomic: scripts: generate kerneldoc comments docs: scripts: kernel-doc: accept bitwise negation like ~@var locking/atomic: scripts: simplify raw_atomic*() definitions locking/atomic: scripts: simplify raw_atomic_long*() definitions locking/atomic: scripts: split pfx/name/sfx/order locking/atomic: scripts: restructure fallback ifdeffery locking/atomic: scripts: build raw_atomic_long*() directly locking/atomic: treewide: use raw_atomic*_<op>() locking/atomic: scripts: add trivial raw_atomic*_<op>() locking/atomic: scripts: factor out order template generation locking/atomic: scripts: remove leftover "${mult}" locking/atomic: scripts: remove bogus order parameter locking/atomic: xtensa: add preprocessor symbols locking/atomic: x86: add preprocessor symbols locking/atomic: sparc: add preprocessor symbols locking/atomic: sh: add preprocessor symbols ...
2023-06-27Merge tag 'rcu.2023.06.22a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: "Documentation updates Miscellaneous fixes, perhaps most notably: - Remove RCU_NONIDLE(). The new visibility of most of the idle loop to RCU has obsoleted this API. - Make the RCU_SOFTIRQ callback-invocation time limit also apply to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT. - Add a jiffies-based callback-invocation time limit to handle long-running callbacks. (The local_clock() function is only invoked once per 32 callbacks due to its high overhead.) - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which fixes a bug that can occur on systems with non-contiguous CPU numbering. kvfree_rcu updates: - Eliminate the single-argument variant of k[v]free_rcu() now that all uses have been converted to k[v]free_rcu_mightsleep(). - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too soon. Yes, this is closing the barn door after the horse has escaped, but Murphy says that there will be more horses. Callback-offloading updates: - Fix a number of bugs involving the shrinker and lazy callbacks. Tasks RCU updates Torture-test updates" * tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits) torture: Remove duplicated argument -enable-kvm for ppc64 doc/rcutorture: Add description of rcutorture.stall_cpu_block rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() rcutorture: Correct name of use_softirq module parameter locktorture: Add long_hold to adjust lock-hold delays rcu/nocb: Make shrinker iterate only over NOCB CPUs rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs rcu: Make rcu_cpu_starting() rely on interrupts being disabled rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp rcu: Employ jiffies-based backstop to callback time limit rcu: Check callback-invocation time limit for rcuc kthreads rcu: Remove RCU_NONIDLE() rcu: Add more RCU files to kernel-api.rst rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic() rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker rcu/nocb: Fix shrinker race against callback enqueuer rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading ...
2023-06-09locking: add lockevent_read() prototypeArnd Bergmann
lockevent_read() has a __weak definition and the only caller in kernel/locking/lock_events.c, plus a strong definition in qspinlock_stat.h that overrides it, but no other declaration. This causes a W=1 warning: kernel/locking/lock_events.c:61:16: error: no previous prototype for 'lockevent_read' [-Werror=missing-prototypes] Add shared prototype to avoid the warnings. Link: https://lkml.kernel.org/r/20230517131102.934196-7-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Eric Paris <eparis@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-28Merge tag 'core-debugobjects-2023-05-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects fixes from Thomas Gleixner: "Two fixes for debugobjects: - Prevent the allocation path from waking up kswapd. That's a long standing issue due to the GFP_ATOMIC allocation flag. As debug objects can be invoked from pretty much any context waking kswapd can end up in arbitrary lock chains versus the waitqueue lock - Correct the explicit lockdep wait-type violation in debug_object_fill_pool()" * tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Don't wake up kswapd from fill_pool() debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
2023-05-19lockdep: Add lock_set_cmp_fn() annotationKent Overstreet
This implements a new interface to lockdep, lock_set_cmp_fn(), for defining a custom ordering when taking multiple locks of the same class. This is an alternative to subclasses, but can not fully replace them since subclasses allow lock hierarchies with other clasees inter-twined, while this relies on pure class nesting. Specifically, if A is our nesting class then: A/0 <- B <- A/1 Would be a valid lock order with subclasses (each subclass really is a full class from the validation PoV) but not with this annotation, which requires all nesting to be consecutive. Example output: | ============================================ | WARNING: possible recursive locking detected | 6.2.0-rc8-00003-g7d81e591ca6a-dirty #15 Not tainted | -------------------------------------------- | kworker/14:3/938 is trying to acquire lock: | ffff8880143218c8 (&b->lock l=0 0:2803368){++++}-{3:3}, at: bch_btree_node_get.part.0+0x81/0x2b0 | | but task is already holding lock: | ffff8880143de8c8 (&b->lock l=1 1048575:9223372036854775807){++++}-{3:3}, at: __bch_btree_map_nodes+0xea/0x1e0 | and the lock comparison function returns 1: | | other info that might help us debug this: | Possible unsafe locking scenario: | | CPU0 | ---- | lock(&b->lock l=1 1048575:9223372036854775807); | lock(&b->lock l=0 0:2803368); | | *** DEADLOCK *** | | May be due to missing lock nesting notation | | 3 locks held by kworker/14:3/938: | #0: ffff888005ea9d38 ((wq_completion)bcache){+.+.}-{0:0}, at: process_one_work+0x1ec/0x530 | #1: ffff8880098c3e70 ((work_completion)(&cl->work)#3){+.+.}-{0:0}, at: process_one_work+0x1ec/0x530 | #2: ffff8880143de8c8 (&b->lock l=1 1048575:9223372036854775807){++++}-{3:3}, at: __bch_btree_map_nodes+0xea/0x1e0 [peterz: extended changelog] Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230509195847.1745548-1-kent.overstreet@linux.dev
2023-05-11locktorture: Add long_hold to adjust lock-hold delaysPaul E. McKenney
This commit adds a long_hold module parameter to allow testing diagnostics for excessive lock-hold times. Also adjust torture_param() invocations for longer line length while in the area. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-05-08locking/rwsem: Add __always_inline annotation to __down_read_common() and ↵John Stultz
inlined callers Apparently despite it being marked inline, the compiler may not inline __down_read_common() which makes it difficult to identify the cause of lock contention, as the blocked function in traceevents will always be listed as __down_read_common(). So this patch adds __always_inline annotation to the common function (as well as the inlined helper callers) to force it to be inlined so the blocking function will be listed (via Wchan) in traceevents. Fixes: c995e638ccbb ("locking/rwsem: Fold __down_{read,write}*()") Reported-by: Tim Murray <timmurray@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20230503023351.2832796-1-jstultz@google.com
2023-05-05Merge tag 'locking-core-2023-05-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation - Misc cleanups/fixes * tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: Correct (cmp)xchg() instrumentation locking/x86: Define arch_try_cmpxchg_local() locking/arch: Wire up local_try_cmpxchg() locking/generic: Wire up local{,64}_try_cmpxchg() locking/atomic: Add generic try_cmpxchg{,64}_local() support locking/rwbase: Mitigate indefinite writer starvation locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-02debugobjects,locking: Annotate debug_object_fill_pool() wait type violationPeter Zijlstra
There is an explicit wait-type violation in debug_object_fill_pool() for PREEMPT_RT=n kernels which allows them to more easily fill the object pool and reduce the chance of allocation failures. Lockdep's wait-type checks are designed to check the PREEMPT_RT locking rules even for PREEMPT_RT=n kernels and object to this, so create a lockdep annotation to allow this to stand. Specifically, create a 'lock' type that overrides the inner wait-type while it is held -- allowing one to temporarily raise it, such that the violation is hidden. Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Qi Zheng <zhengqi.arch@bytedance.com> Link: https://lkml.kernel.org/r/20230429100614.GA1489784@hirez.programming.kicks-ass.net
2023-04-29locking/rwbase: Mitigate indefinite writer starvationSebastian Andrzej Siewior
On PREEMPT_RT, rw_semaphore and rwlock_t locks are unfair to writers. Readers can indefinitely acquire the lock unless the writer fully acquired the lock, which might never happen if there is always a reader in the critical section owning the lock. Mel Gorman reported that since LTP-20220121 the dio_truncate test case went from having 1 reader to having 16 readers and that number of readers is sufficient to prevent the down_write ever succeeding while readers exist. Eventually the test is killed after 30 minutes as a failure. Mel proposed a timeout to limit how long a writer can be blocked until the reader is forced into the slowpath. Thomas argued that there is no added value by providing this timeout. From a PREEMPT_RT point of view, there are no critical rw_semaphore or rwlock_t locks left where the reader must be preferred. Mitigate indefinite writer starvation by forcing the READER into the slowpath once the WRITER attempts to acquire the lock. Reported-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/877cwbq4cq.ffs@tglx Link: https://lore.kernel.org/r/20230321161140.HMcQEhHb@linutronix.de Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-24Merge tag 'rcu.6.4.april5.2023.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux Pull RCU updates from Joel Fernandes: - Updates and additions to MAINTAINERS files, with Boqun being added to the RCU entry and Zqiang being added as an RCU reviewer. I have also transitioned from reviewer to maintainer; however, Paul will be taking over sending RCU pull-requests for the next merge window. - Resolution of hotplug warning in nohz code, achieved by fixing cpu_is_hotpluggable() through interaction with the nohz subsystem. Tick dependency modifications by Zqiang, focusing on fixing usage of the TICK_DEP_BIT_RCU_EXP bitmask. - Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n kernels, fixed by Zqiang. - Improvements to rcu-tasks stall reporting by Neeraj. - Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for increased robustness, affecting several components like mac802154, drbd, vmw_vmci, tracing, and more. A report by Eric Dumazet showed that the API could be unknowingly used in an atomic context, so we'd rather make sure they know what they're asking for by being explicit: https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/ - Documentation updates, including corrections to spelling, clarifications in comments, and improvements to the srcu_size_state comments. - Better srcu_struct cache locality for readers, by adjusting the size of srcu_struct in support of SRCU usage by Christoph Hellwig. - Teach lockdep to detect deadlocks between srcu_read_lock() vs synchronize_srcu() contributed by Boqun. Previously lockdep could not detect such deadlocks, now it can. - Integration of rcutorture and rcu-related tools, targeted for v6.4 from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis module parameter, and more - Miscellaneous changes, various code cleanups and comment improvements * tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits) checkpatch: Error out if deprecated RCU API used mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep() ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep() net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep() net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep() tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep() misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep() drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep() rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race rcu/trace: use strscpy() to instead of strncpy() tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem ...
2023-03-27locking/lockdep: Improve the deadlock scenario print for sync and read lockBoqun Feng
Lock scenario print is always a weak spot of lockdep splats. Improvement can be made if we rework the dependency search and the error printing. However without touching the graph search, we can improve a little for the circular deadlock case, since we have the to-be-added lock dependency, and know whether these two locks are read/write/sync. In order to know whether a held_lock is sync or not, a bit was "stolen" from ->references, which reduce our limit for the same lock class nesting from 2^12 to 2^11, and it should still be good enough. Besides, since we now have bit in held_lock for sync, we don't need the "hardirqoffs being 1" trick, and also we can avoid the __lock_release() if we jump out of __lock_acquire() before the held_lock stored. With these changes, a deadlock case evolved with read lock and sync gets a better print-out from: [...] Possible unsafe locking scenario: [...] [...] CPU0 CPU1 [...] ---- ---- [...] lock(srcuA); [...] lock(srcuB); [...] lock(srcuA); [...] lock(srcuB); to [...] Possible unsafe locking scenario: [...] [...] CPU0 CPU1 [...] ---- ---- [...] rlock(srcuA); [...] lock(srcuB); [...] lock(srcuA); [...] sync(srcuB); Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2023-03-27locking: Reduce the number of locks in ww_mutex stress testsBoqun Feng
The stress test in test_ww_mutex_init() uses 4095 locks since lockdep::reference has 12 bits, and since we are going to reduce it to 11 bits to support lock_sync(), and 2047 is still a reasonable number of the max nesting level for locks, so adjust the test. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202302011445.9d99dae2-oliver.sang@intel.com Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2023-03-27locking/lockdep: Introduce lock_sync()Boqun Feng
Currently, functions like synchronize_srcu() do not have lockdep annotations resembling those of other write-side locking primitives. Such annotations might look as follows: lock_acquire(); lock_release(); Such annotations would tell lockdep that synchronize_srcu() acts like an empty critical section that waits for other (read-side) critical sections to finish. This would definitely catch some deadlock, but as pointed out by Paul Mckenney [1], this could also introduce false positives because of irq-safe/unsafe detection. Of course, there are tricks could help with this: might_sleep(); // Existing statement in __synchronize_srcu(). if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { local_irq_disable(); lock_acquire(); lock_release(); local_irq_enable(); } But it would be better for lockdep to provide a separate annonation for functions like synchronize_srcu(), so that people won't need to repeat the ugly tricks above. Therefore introduce lock_sync(), which is simply an lock+unlock pair with no irq safe/unsafe deadlock check. This works because the to-be-annontated functions do not create real critical sections, and there is therefore no way that irq can create extra dependencies. [1]: https://lore.kernel.org/lkml/20180412021233.ewncg5jjuzjw3x62@tardis/ Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ boqun: Fix typos reported by Davidlohr Bueso and Paul E. Mckenney ] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2023-03-07locktorture: Add raw_spinlock* torture tests for PREEMPT_RT kernelsZqiang
In PREEMPT_RT kernels, both spin_lock() and spin_lock_irq() are converted to sleepable rt_spin_lock(). This means that the interrupt related suffixes for spin_lock/unlock(_irq, irqsave/irqrestore) do not affect the CPU's interrupt state. This commit therefore adds raw spin-lock torture tests. This in turn permits pure spin locks to be tested in PREEMPT_RT kernels. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Suggested-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07locktorture: With nested locks, occasionally skip main lockJohn Stultz
If we're using nested locking to stress things, occasionally skip taking the main lock, so that we can get some different contention patterns between the writers (to hopefully get two disjoint blocked trees) Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07locktorture: Add nested locking to rtmutex torture testsJohn Stultz
This patch adds randomized nested locking to the rtmutex torture tests. Additionally it adds LOCK09 config files for testing rtmutexes with nested locking. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07locktorture: Add nested locking to mutex torture testsJohn Stultz
This patch adds randomized nested locking to the mutex torture tests, as well as new LOCK08 config files for testing mutexes with nested locking Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-07locktorture: Add nested_[un]lock() hooks and nlocks parameterJohn Stultz
In order to extend locktorture to support lock nesting, add nested_lock() and nested_unlock() hooks to the torture ops. These take a 32bit lockset mask which is generated at random, so some number of locks will be taken before the main lock is taken and released afterwards. Additionally, add nested_locks module parameter to allow specifying the number of nested locks to be used. This has been helpful to uncover issues in the proxy-exec series development. This was inspired by locktorture extensions originally implemented by Connor O'Brien, for stress testing the proxy-execution series: https://lore.kernel.org/lkml/20221003214501.2050087-12-connoro@google.com/ Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: kernel-team@android.com Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Co-developed-by: Connor O'Brien <connoro@google.com> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-02-21Merge tag 'rcu.2023.02.10a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably: - Throttling callback invocation based on the number of callbacks that are now ready to invoke instead of on the total number of callbacks - Several patches that suppress false-positive boot-time diagnostics, for example, due to lockdep not yet being initialized - Make expedited RCU CPU stall warnings dump stacks of any tasks that are blocking the stalled grace period. (Normal RCU CPU stall warnings have done this for many years) - Lazy-callback fixes to avoid delays during boot, suspend, and resume. (Note that lazy callbacks must be explicitly enabled, so this should not (yet) affect production use cases) - Make kfree_rcu() and friends take advantage of polled grace periods, thus reducing memory footprint by almost two orders of magnitude, admittedly on a microbenchmark This also begins the transition from kfree_rcu(p) to kfree_rcu_mightsleep(p). This transition was motivated by bugs where kfree_rcu(p), which can block, was typed instead of the intended kfree_rcu(p, rh) - SRCU updates, perhaps most notably fixing a bug that causes SRCU to fail when booted on a system with a non-zero boot CPU. This surprising situation actually happens for kdump kernels on the powerpc architecture This also adds an srcu_down_read() and srcu_up_read(), which act like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side critical section to be handed off from one task to another - Clean up the now-useless SRCU Kconfig option There are a few more commits that are not yet acked or pulled into maintainer trees, and these will be in a pull request for a later merge window - RCU-tasks updates, perhaps most notably these fixes: - A strange interaction between PID-namespace unshare and the RCU-tasks grace period that results in a low-probability but very real hang - A race between an RCU tasks rude grace period on a single-CPU system and CPU-hotplug addition of the second CPU that can result in a too-short grace period - A race between shrinking RCU tasks down to a single callback list and queuing a new callback to some other CPU, but where that queuing is delayed for more than an RCU grace period. This can result in that callback being stranded on the non-boot CPU - Torture-test updates and fixes - Torture-test scripting updates and fixes - Provide additional RCU CPU stall-warning information in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute timeout limit for expedited RCU CPU stall warnings * tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() kernel/notifier: Remove CONFIG_SRCU init: Remove "select SRCU" fs/quota: Remove "select SRCU" fs/notify: Remove "select SRCU" fs/btrfs: Remove "select SRCU" fs: Remove CONFIG_SRCU drivers/pci/controller: Remove "select SRCU" drivers/net: Remove "select SRCU" drivers/md: Remove "select SRCU" drivers/hwtracing/stm: Remove "select SRCU" drivers/dax: Remove "select SRCU" drivers/base: Remove CONFIG_SRCU rcu: Disable laziness if lazy-tracking says so rcu: Track laziness during boot and suspend rcu: Remove redundant call to rcu_boost_kthread_setaffinity() rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts rcu: Align the output of RCU CPU stall warning messages rcu: Add RCU stall diagnosis information sched: Add helper nr_context_switches_cpu() ...
2023-02-20Merge tag 'sched-core-2023-02-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Improve the scalability of the CFS bandwidth unthrottling logic with large number of CPUs. - Fix & rework various cpuidle routines, simplify interaction with the generic scheduler code. Add __cpuidle methods as noinstr to objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks. - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query previously issued registrations. - Limit scheduler slice duration to the sysctl_sched_latency period, to improve scheduling granularity with a large number of SCHED_IDLE tasks. - Debuggability enhancement on sys_exit(): warn about disabled IRQs, but also enable them to prevent a cascade of followup problems and repeat warnings. - Fix the rescheduling logic in prio_changed_dl(). - Micro-optimize cpufreq and sched-util methods. - Micro-optimize ttwu_runnable() - Micro-optimize the idle-scanning in update_numa_stats(), select_idle_capacity() and steal_cookie_task(). - Update the RSEQ code & self-tests - Constify various scheduler methods - Remove unused methods - Refine __init tags - Documentation updates - Misc other cleanups, fixes * tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits) sched/rt: pick_next_rt_entity(): check list_entry sched/deadline: Add more reschedule cases to prio_changed_dl() sched/fair: sanitize vruntime of entity being placed sched/fair: Remove capacity inversion detection sched/fair: unlink misfit task from cpu overutilized objtool: mem*() are not uaccess safe cpuidle: Fix poll_idle() noinstr annotation sched/clock: Make local_clock() noinstr sched/clock/x86: Mark sched_clock() noinstr x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read() x86/atomics: Always inline arch_atomic64*() cpuidle: tracing, preempt: Squash _rcuidle tracing cpuidle: tracing: Warn about !rcu_is_watching() cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG cpuidle: drivers: firmware: psci: Dont instrument suspend code KVM: selftests: Fix build of rseq test exit: Detect and fix irq disabled state in oops cpuidle, arm64: Fix the ARM64 cpuidle logic cpuidle: mvebu: Fix duplicate flags assignment sched/fair: Limit sched slice duration ...
2023-02-20Merge tag 'locking-core-2023-02-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - rwsem micro-optimizations - spinlock micro-optimizations - cleanups, simplifications * tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vduse: Remove include of rwlock.h locking/lockdep: Remove lockdep_init_map_crosslock. x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock() x86/PAT: Use try_cmpxchg() in set_page_memtype() locking/rwsem: Disable preemption in all down_write*() and up_write() code paths locking/rwsem: Disable preemption in all down_read*() and up_read() code paths locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath locking/qspinlock: Micro-optimize pending state waiting for unlock
2023-02-06rtmutex: Ensure that the top waiter is always woken upWander Lairson Costa
Let L1 and L2 be two spinlocks. Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top waiter of L2. Let T2 be the task holding L2. Let T3 be a task trying to acquire L1. The following events will lead to a state in which the wait queue of L2 isn't empty, but no task actually holds the lock. T1 T2 T3 == == == spin_lock(L1) | raw_spin_lock(L1->wait_lock) | rtlock_slowlock_locked(L1) | | task_blocks_on_rt_mutex(L1, T3) | | | orig_waiter->lock = L1 | | | orig_waiter->task = T3 | | | raw_spin_unlock(L1->wait_lock) | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3) spin_unlock(L2) | | | | | rt_mutex_slowunlock(L2) | | | | | | raw_spin_lock(L2->wait_lock) | | | | | | wakeup(T1) | | | | | | raw_spin_unlock(L2->wait_lock) | | | | | | | | waiter = T1->pi_blocked_on | | | | waiter == rt_mutex_top_waiter(L2) | | | | waiter->task == T1 | | | | raw_spin_lock(L2->wait_lock) | | | | dequeue(L2, waiter) | | | | update_prio(waiter, T1) | | | | enqueue(L2, waiter) | | | | waiter != rt_mutex_top_waiter(L2) | | | | L2->owner == NULL | | | | wakeup(T1) | | | | raw_spin_unlock(L2->wait_lock) T1 wakes up T1 != top_waiter(L2) schedule_rtlock() If the deadline of T1 is updated before the call to update_prio(), and the new deadline is greater than the deadline of the second top waiter, then after the requeue, T1 is no longer the top waiter, and the wrong task is woken up which will then go back to sleep because it is not the top waiter. This can be reproduced in PREEMPT_RT with stress-ng: while true; do stress-ng --sched deadline --sched-period 1000000000 \ --sched-runtime 800000000 --sched-deadline \ 1000000000 --mmapfork 23 -t 20 done A similar issue was pointed out by Thomas versus the cases where the top waiter drops out early due to a signal or timeout, which is a general issue for all regular rtmutex use cases, e.g. futex. The problematic code is in rt_mutex_adjust_prio_chain(): // Save the top waiter before dequeue/enqueue prerequeue_top_waiter = rt_mutex_top_waiter(lock); rt_mutex_dequeue(lock, waiter); waiter_update_prio(waiter, task); rt_mutex_enqueue(lock, waiter); // Lock has no owner? if (!rt_mutex_owner(lock)) { // Top waiter changed ----> if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) ----> wake_up_state(waiter->task, waiter->wake_state); This only takes the case into account where @waiter is the new top waiter due to the requeue operation. But it fails to handle the case where @waiter is not longer the top waiter due to the requeue operation. Ensure that the new top waiter is woken up so in all cases so it can take over the ownerless lock. [ tglx: Amend changelog, add Fixes tag ] Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter") Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com
2023-01-31cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUGPeter Zijlstra
In order to avoid WARN/BUG from generating nested or even recursive warnings, force rcu_is_watching() true during WARN/lockdep_rcu_suspicious(). Notably things like unwinding the stack can trigger rcu_dereference() warnings, which then triggers more unwinding which then triggers more warnings etc.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126151323.408156109@infradead.org
2023-01-26locking/rwsem: Disable preemption in all down_write*() and up_write() code pathsWaiman Long
The previous patch has disabled preemption in all the down_read() and up_read() code paths. For symmetry, this patch extends commit: 48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem lock") ... to have preemption disabled in all the down_write() and up_write() code paths, including downgrade_write(). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126003628.365092-4-longman@redhat.com
2023-01-26locking/rwsem: Disable preemption in all down_read*() and up_read() code pathsWaiman Long
Commit: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner") ... assumes that when the owner field is changed to NULL, the lock will become free soon. But commit: 48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem lock") ... disabled preemption when acquiring rwsem for write. However, preemption has not yet been disabled when acquiring a read lock on a rwsem. So a reader can add a RWSEM_READER_BIAS to count without setting owner to signal a reader, got preempted out by a RT task which then spins in the writer slowpath as owner remains NULL leading to live lock. One easy way to fix this problem is to disable preemption at all the down_read*() and up_read() code paths as implemented in this patch. Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner") Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126003628.365092-3-longman@redhat.com
2023-01-26locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpathWaiman Long
A non-first waiter can potentially spin in the for loop of rwsem_down_write_slowpath() without sleeping but fail to acquire the lock even if the rwsem is free if the following sequence happens: Non-first RT waiter First waiter Lock holder ------------------- ------------ ----------- Acquire wait_lock rwsem_try_write_lock(): Set handoff bit if RT or wait too long Set waiter->handoff_set Release wait_lock Acquire wait_lock Inherit waiter->handoff_set Release wait_lock Clear owner Release lock if (waiter.handoff_set) { rwsem_spin_on_owner((); if (OWNER_NULL) goto trylock_again; } trylock_again: Acquire wait_lock rwsem_try_write_lock(): if (first->handoff_set && (waiter != first)) return false; Release wait_lock A non-first waiter cannot really acquire the rwsem even if it mistakenly believes that it can spin on OWNER_NULL value. If that waiter happens to be an RT task running on the same CPU as the first waiter, it can block the first waiter from acquiring the rwsem leading to live lock. Fix this problem by making sure that a non-first waiter cannot spin in the slowpath loop without sleeping. Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230126003628.365092-2-longman@redhat.com
2023-01-05locktorture: Make the rt_boost factor a tunableJoel Fernandes (Google)
The rt boosting in locktorture has a factor variable s currently large enough that boosting only happens once every minute or so. Add a tunable to reduce the factor so that boosting happens more often, to test paths and arrive at failure modes earlier. With this change, I can set the factor to like 50 and have the boosting happens every 10 seconds or so. Tested with boot parameters: locktorture.torture_type=mutex_lock locktorture.onoff_interval=1 locktorture.nwriters_stress=8 locktorture.stutter=0 locktorture.rt_boost=1 locktorture.rt_boost_factor=50 locktorture.nlocks=3 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-05locktorture: Allow non-rtmutex lock types to be boostedJoel Fernandes (Google)
Currently RT boosting is only done for rtmutex_lock, however with proxy execution, we also have the mutex_lock participating in priorities. To exercise the testing better, add RT boosting to other lock testing types as well, using a new knob (rt_boost). Tested with boot parameters: locktorture.torture_type=mutex_lock locktorture.onoff_interval=1 locktorture.nwriters_stress=8 locktorture.stutter=0 locktorture.rt_boost=1 locktorture.rt_boost_factor=1 locktorture.nlocks=3 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-05locking/qspinlock: Micro-optimize pending state waiting for unlockGuo Ren
When we're pending, we only care about lock value. The xchg_tail wouldn't affect the pending state. That means the hardware thread could stay in a sleep state and leaves the rest execution units' resources of pipeline to other hardware threads. This situation is the SMT scenarios in the same core. Not an entering low-power state situation. Of course, the granularity between cores is "cacheline", but the granularity between SMT hw threads of the same core could be "byte" which internal LSU handles. For example, when a hw-thread yields the resources of the core to other hw-threads, this patch could help the hw-thread stay in the sleep state and prevent it from being woken up by other hw-threads xchg_tail. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20230105021952.3090070-1-guoren@kernel.org Cc: Peter Zijlstra <peterz@infradead.org>
2023-01-01Merge tag 'locking_urgent_for_v6.2_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Prevent the leaking of a debug timer in futex_waitv() - A preempt-RT mutex locking fix, adding the proper acquire semantics * tag 'locking_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
2022-12-13Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-12rtmutex: Add acquire semantics for rtmutex lock acquisition slow pathMel Gorman
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench on XFS on arm64. kernel BUG at fs/inode.c:625! Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1 pc : clear_inode+0xa0/0xc0 lr : clear_inode+0x38/0xc0 Call trace: clear_inode+0xa0/0xc0 evict+0x160/0x180 iput+0x154/0x240 do_unlinkat+0x184/0x300 __arm64_sys_unlinkat+0x48/0xc0 el0_svc_common.constprop.4+0xe4/0x2c0 do_el0_svc+0xac/0x100 el0_svc+0x78/0x200 el0t_64_sync_handler+0x9c/0xc0 el0t_64_sync+0x19c/0x1a0 It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this is likely a bug that existed forever and only became visible when ARM support was added to preempt-rt. The same problem does not occur on x86-64 and he also reported that converting sb->s_inode_wblist_lock to raw_spinlock_t makes the problem disappear indicating that the RT spinlock variant is the problem. Which in turn means that RT mutexes on ARM64 and any other weakly ordered architecture are affected by this independent of RT. Will Deacon observed: "I'd be more inclined to be suspicious of the slowpath tbh, as we need to make sure that we have acquire semantics on all paths where the lock can be taken. Looking at the rtmutex code, this really isn't obvious to me -- for example, try_to_take_rt_mutex() appears to be able to return via the 'takeit' label without acquire semantics and it looks like we might be relying on the caller's subsequent _unlock_ of the wait_lock for ordering, but that will give us release semantics which aren't correct." Sebastian Andrzej Siewior prototyped a fix that does work based on that comment but it was a little bit overkill and added some fences that should not be necessary. The lock owner is updated with an IRQ-safe raw spinlock held, but the spin_unlock does not provide acquire semantics which are needed when acquiring a mutex. Adds the necessary acquire semantics for lock owner updates in the slow path acquisition and the waiter bit logic. It successfully completed 10 iterations of the dbench workload while the vanilla kernel fails on the first iteration. [ bigeasy@linutronix.de: Initial prototype fix ] Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics") Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core") Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
2022-12-11lockdep: allow instrumenting lockdep.c with KMSANAlexander Potapenko
Lockdep and KMSAN used to play badly together, causing deadlocks when KMSAN instrumentation of lockdep.c called lockdep functions recursively. Looks like this is no more the case, and a kernel can run (yet slower) with both KMSAN and lockdep enabled. This patch should fix false positives on wq_head->lock->dep_map, which KMSAN used to consider uninitialized because of lockdep.c not being instrumented. Link: https://lore.kernel.org/lkml/Y3b9AAEKp2Vr3e6O@sol.localdomain/ Link: https://lkml.kernel.org/r/20221128094541.2645890-1-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Eric Biggers <ebiggers@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-10Merge tag 'locking-core-2022-10-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Disable preemption in rwsem_write_trylock()'s attempt to take the rwsem, to avoid RT tasks hogging the CPU, which managed to preempt this function after the owner has been cleared but before a new owner is set. Also add debug checks to enforce this. - Add __lockfunc to more slow path functions and add __sched to semaphore functions. - Mark spinlock APIs noinline when the respective CONFIG_INLINE_SPIN_* toggles are disabled, to reduce LTO text size. - Print more debug information when lockdep gets confused in look_up_lock_class(). - Improve header file abuse checks. - Misc cleanups * tag 'locking-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Print more debug information - report name and key when look_up_lock_class() got confused locking: Add __sched to semaphore functions locking/rwsem: Disable preemption while trying for rwsem lock locking: Detect includes rwlock.h outside of spinlock.h locking: Add __lockfunc to slow path functions locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled selftests: futex: Fix 'the the' typo in comment
2022-10-03kmsan: disable instrumentation of unsupported common kernel codeAlexander Potapenko
EFI stub cannot be linked with KMSAN runtime, so we disable instrumentation for it. Instrumenting kcov, stackdepot or lockdep leads to infinite recursion caused by instrumentation hooks calling instrumented code again. Link: https://lkml.kernel.org/r/20220915150417.722975-13-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-21locking/lockdep: Print more debug information - report name and key when ↵Tetsuo Handa
look_up_lock_class() got confused Printing this information will be helpful: ------------[ cut here ]------------ Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key WARNING: CPU: 1 PID: 14195 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140 Modules linked in: CPU: 1 PID: 14195 Comm: a.out Not tainted 6.0.0-rc6-dirty #863 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:look_up_lock_class+0xcc/0x140 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/bd99391e-f787-efe9-5ec6-3c6dc4c587b0@I-love.SAKURA.ne.jp
2022-09-21Merge tag 'v6.0-rc6' into locking/core, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-09-15locking: Add __sched to semaphore functionsNamhyung Kim
The internal functions are marked with __sched already, let's do the same for external functions too so that we can skip them in the stack trace. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220909000803.4181857-1-namhyung@kernel.org
2022-09-15locking/rwsem: Disable preemption while trying for rwsem lockGokul krishna Krishnakumar
Make the region inside the rwsem_write_trylock non preemptible. We observe RT task is hogging CPU when trying to acquire rwsem lock which was acquired by a kworker task but before the rwsem owner was set. Here is the scenario: 1. CFS task (affined to a particular CPU) takes rwsem lock. 2. CFS task gets preempted by a RT task before setting owner. 3. RT task (FIFO) is trying to acquire the lock, but spinning until RT throttling happens for the lock as the lock was taken by CFS task. This patch attempts to fix the above issue by disabling preemption until owner is set for the lock. While at it also fix the issues at the places where rwsem_{set,clear}_owner() are called. This also adds lockdep annotation of preemption disable in rwsem_{set,clear}_owner() on Peter Z. suggestion. Signed-off-by: Gokul krishna Krishnakumar <quic_gokukris@quicinc.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/1662661467-24203-1-git-send-email-quic_mojha@quicinc.com
2022-08-30locking/percpu-rwsem: Add percpu_is_write_locked() and percpu_is_read_locked()Marco Elver
Implement simple accessors to probe percpu-rwsem's locked state: percpu_is_write_locked(), percpu_is_read_locked(). Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20220829124719.675715-11-elver@google.com
2022-08-19locking: Add __lockfunc to slow path functionsNamhyung Kim
So that we can skip the functions in the perf lock contention and other places like /proc/PID/wchan. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20220810220346.1919485-1-namhyung@kernel.org
2022-08-04locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabledAndi Kleen
Otherwise LTO will inline them anyways and cause a large kernel text increase. Since the explicit intention here is to not inline them marking them noinline is good documentation even for the non-LTO case. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Martin Liska <mliska@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20220719110548.1544-1-jslaby@suse.cz
2022-08-02Merge tag 'rcu.2022.07.26a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes - Callback-offload updates, perhaps most notably a new RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be offloaded at boot time, regardless of kernel boot parameters. This is useful to battery-powered systems such as ChromeOS and Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot parameter prevents offloaded callbacks from interfering with real-time workloads and with energy-efficiency mechanisms - Polled grace-period updates, perhaps most notably making these APIs account for both normal and expedited grace periods - Tasks RCU updates, perhaps most notably reducing the CPU overhead of RCU tasks trace grace periods by more than a factor of two on a system with 15,000 tasks. The reduction is expected to increase with the number of tasks, so it seems reasonable to hypothesize that a system with 150,000 tasks might see a 20-fold reduction in CPU overhead - Torture-test updates - Updates that merge RCU's dyntick-idle tracking into context tracking, thus reducing the overhead of transitioning to kernel mode from either idle or nohz_full userspace execution for kernels that track context independently of RCU. This is expected to be helpful primarily for kernels built with CONFIG_NO_HZ_FULL=y * tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits) rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings rcu: Diagnose extended sync_rcu_do_polled_gp() loops rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings rcutorture: Test polled expedited grace-period primitives rcu: Add polled expedited grace-period primitives rcutorture: Verify that polled GP API sees synchronous grace periods rcu: Make Tiny RCU grace periods visible to polled APIs rcu: Make polled grace-period API account for expedited grace periods rcu: Switch polled grace-period APIs to ->gp_seq_polled rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty rcu/nocb: Add option to opt rcuo kthreads out of RT priority rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread() rcu/nocb: Add an option to offload all CPUs on boot rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order rcu/nocb: Add/del rdp to iterate from rcuog itself rcu/tree: Add comment to describe GP-done condition in fqs loop rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs() rcu/kvfree: Remove useless monitor_todo flag rcu: Cleanup RCU urgency state for offline CPU ...
2022-08-01Merge tag 'locking-core-2022-08-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "This was a fairly quiet cycle for the locking subsystem: - lockdep: Fix a handful of the more complex lockdep_init_map_*() primitives that can lose the lock_type & cause false reports. No such mishap was observed in the wild. - jump_label improvements: simplify the cross-arch support of initial NOP patching by making it arch-specific code (used on MIPS only), and remove the s390 initial NOP patching that was superfluous" * tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix lockdep_init_map_*() confusion jump_label: make initial NOP patching the special case jump_label: mips: move module NOP patching into arch code jump_label: s390: avoid pointless initial NOP patching