summaryrefslogtreecommitdiff
path: root/kernel/locking
AgeCommit message (Collapse)Author
2024-01-12Merge tag 'rcu.release.v6.8' of https://github.com/neeraju/linuxLinus Torvalds
Pull RCU updates from Neeraj Upadhyay: - Documentation and comment updates - RCU torture, locktorture updates that include cleanups; nolibc init build support for mips, ppc and rv64; testing of mid stall duration scenario and fixing fqs task creation conditions - Misc fixes, most notably restricting usage of RCU CPU stall notifiers, to confine their usage primarily to debug kernels - RCU tasks minor fixes - lockdep annotation fix for NMI-safe accesses, callback advancing/acceleration cleanup and documentation improvements * tag 'rcu.release.v6.8' of https://github.com/neeraju/linux: rcu: Force quiescent states only for ongoing grace period doc: Clarify historical disclaimers in memory-barriers.txt doc: Mention address and data dependencies in rcu_dereference.rst doc: Clarify RCU Tasks reader/updater checklist rculist.h: docs: Fix wrong function summary Documentation: RCU: Remove repeated word in comments srcu: Use try-lock lockdep annotation for NMI-safe access. srcu: Explain why callbacks invocations can't run concurrently srcu: No need to advance/accelerate if no callback enqueued srcu: Remove superfluous callbacks advancing from srcu_gp_start() rcu: Remove unused macros from rcupdate.h rcu: Restrict access to RCU CPU stall notifiers rcu-tasks: Mark RCU Tasks accesses to current->rcu_tasks_idle_cpu rcutorture: Add fqs_holdoff check before fqs_task is created rcutorture: Add mid-sized stall to TREE07 rcutorture: add nolibc init support for mips, ppc and rv64 locktorture: Increase Hamming distance between call_rcu_chain and rcu_call_chains
2024-01-10Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull header cleanups from Kent Overstreet: "The goal is to get sched.h down to a type only header, so the main thing happening in this patchset is splitting out various _types.h headers and dependency fixups, as well as moving some things out of sched.h to better locations. This is prep work for the memory allocation profiling patchset which adds new sched.h interdepencencies" * tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits) Kill sched.h dependency on rcupdate.h kill unnecessary thread_info.h include Kill unnecessary kernel.h include preempt.h: Kill dependency on list.h rseq: Split out rseq.h from sched.h LoongArch: signal.c: add header file to fix build error restart_block: Trim includes lockdep: move held_lock to lockdep_types.h sem: Split out sem_types.h uidgid: Split out uidgid_types.h seccomp: Split out seccomp_types.h refcount: Split out refcount_types.h uapi/linux/resource.h: fix include x86/signal: kill dependency on time.h syscall_user_dispatch.h: split out *_types.h mm_types_task.h: Trim dependencies Split out irqflags_types.h ipc: Kill bogus dependency on spinlock.h shm: Slim down dependencies workqueue: Split out workqueue_types.h ...
2024-01-02Merge tag 'v6.7-rc8' into locking/core, to pick up dependent changesIngo Molnar
Pick up these commits from Linus's tree: b106bcf0f99a ("locking/osq_lock: Clarify osq_wait_next()") 563adbfc351b ("locking/osq_lock: Clarify osq_wait_next() calling convention") 7c2230982129 ("locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.c") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-12-30locking/osq_lock: Clarify osq_wait_next()David Laight
Directly return NULL or 'next' instead of breaking out of the loop. Signed-off-by: David Laight <david.laight@aculab.com> [ Split original patch into two independent parts - Linus ] Link: https://lore.kernel.org/lkml/7c8828aec72e42eeb841ca0ee3397e9a@AcuMS.aculab.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30locking/osq_lock: Clarify osq_wait_next() calling conventionDavid Laight
osq_wait_next() is passed 'prev' from osq_lock() and NULL from osq_unlock() but only needs the 'cpu' value to write to lock->tail. Just pass prev->cpu or OSQ_UNLOCKED_VAL instead. Should have no effect on the generated code since gcc manages to assume that 'prev != NULL' due to an earlier dereference. Signed-off-by: David Laight <david.laight@aculab.com> [ Changed 'old' to 'old_cpu' by request from Waiman Long - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-30locking/osq_lock: Move the definition of optimistic_spin_node into osq_lock.cDavid Laight
struct optimistic_spin_node is private to the implementation. Move it into the C file to ensure nothing is accessing it. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-20sched.h: move pid helpers to pid.hKent Overstreet
This is needed for killing the sched.h dependency on rcupdate.h, and pid.h is a better place for this code anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-01locking/mutex: Document that mutex_unlock() is non-atomicJann Horn
I have seen several cases of attempts to use mutex_unlock() to release an object such that the object can then be freed by another task. This is not safe because mutex_unlock(), in the MUTEX_FLAG_WAITERS && !MUTEX_FLAG_HANDOFF case, accesses the mutex structure after having marked it as unlocked; so mutex_unlock() requires its caller to ensure that the mutex stays alive until mutex_unlock() returns. If MUTEX_FLAG_WAITERS is set and there are real waiters, those waiters have to keep the mutex alive, but we could have a spurious MUTEX_FLAG_WAITERS left if an interruptible/killable waiter bailed between the points where __mutex_unlock_slowpath() did the cmpxchg reading the flags and where it acquired the wait_lock. ( With spinlocks, that kind of code pattern is allowed and, from what I remember, used in several places in the kernel. ) Document this, such a semantic difference between mutexes and spinlocks is fairly unintuitive. [ mingo: Made the changelog a bit more assertive, refined the comments. ] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231130204817.2031407-1-jannh@google.com
2023-11-24lockdep: Fix block chain corruptionPeter Zijlstra
Kent reported an occasional KASAN splat in lockdep. Mark then noted: > I suspect the dodgy access is to chain_block_buckets[-1], which hits the last 4 > bytes of the redzone and gets (incorrectly/misleadingly) attributed to > nr_large_chain_blocks. That would mean @size == 0, at which point size_to_bucket() returns -1 and the above happens. alloc_chain_hlocks() has 'size - req', for the first with the precondition 'size >= rq', which allows the 0. This code is trying to split a block, del_chain_block() takes what we need, and add_chain_block() puts back the remainder, except in the above case the remainder is 0 sized and things go sideways. Fixes: 810507fe6fd5 ("locking/lockdep: Reuse freed chain_hlocks entries") Reported-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lkml.kernel.org/r/20231121114126.GH8262@noisy.programming.kicks-ass.net
2023-11-23locktorture: Increase Hamming distance between call_rcu_chain and ↵Paul E. McKenney
rcu_call_chains One letter difference is really not enough, so this commit changes call_rcu_chain to call_rcu_chain_list. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
2023-10-30Merge tag 'rcu-next-v6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks Pull RCU updates from Frederic Weisbecker: - RCU torture, locktorture and generic torture infrastructure updates that include various fixes, cleanups and consolidations. Among the user visible things, ftrace dumps can now be found into their own file, and module parameters get better documented and reported on dumps. - Generic and misc fixes all over the place. Some highlights: * Hotplug handling has seen some light cleanups and comments * An RCU barrier can now be triggered through sysfs to serialize memory stress testing and avoid OOM * Object information is now dumped in case of invalid callback invocation * Also various SRCU issues, too hard to trigger to deserve urgent pull requests, have been fixed - RCU documentation updates - RCU reference scalability test minor fixes and doc improvements. - RCU tasks minor fixes - Stall detection updates. Introduce RCU CPU Stall notifiers that allows a subsystem to provide informations to help debugging. Also cure some false positive stalls. * tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (56 commits) srcu: Only accelerate on enqueue time locktorture: Check the correct variable for allocation failure srcu: Fix callbacks acceleration mishandling rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP rcu: Standardize explicit CPU-hotplug calls rcu: Conditionally build CPU-hotplug teardown callbacks rcu: Remove references to rcu_migrate_callbacks() from diagrams rcu: Assume rcu_report_dead() is always called locally rcu: Assume IRQS disabled from rcu_report_dead() rcu: Use rcu_segcblist_segempty() instead of open coding it rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects srcu: Fix srcu_struct node grpmask overflow on 64-bit systems torture: Convert parse-console.sh to mktemp rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20 torture: Add kvm.sh --debug-info argument locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers doc: Catch-up update for locktorture module parameters locktorture: Add call_rcu_chains module parameter locktorture: Add new module parameters to lock_torture_print_module_parms() ...
2023-10-30Merge tag 'locking-core-2023-10-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Info Molnar: "Futex improvements: - Add the 'futex2' syscall ABI, which is an attempt to get away from the multiplex syscall and adds a little room for extentions, while lifting some limitations. - Fix futex PI recursive rt_mutex waiter state bug - Fix inter-process shared futexes on no-MMU systems - Use folios instead of pages Micro-optimizations of locking primitives: - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock architectures, to improve lockref code generation - Improve the x86-32 lockref_get_not_zero() main loop by adding build-time CMPXCHG8B support detection for the relevant lockref code, and by better interfacing the CMPXCHG8B assembly code with the compiler - Introduce arch_sync_try_cmpxchg() on x86 to improve sync_try_cmpxchg() code generation. Convert some sync_cmpxchg() users to sync_try_cmpxchg(). - Micro-optimize rcuref_put_slowpath() Locking debuggability improvements: - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well - Enforce atomicity of sched_submit_work(), which is de-facto atomic but was un-enforced previously. - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check semantics - Fix ww_mutex self-tests - Clean up const-propagation in <linux/seqlock.h> and simplify the API-instantiation macros a bit RT locking improvements: - Provide the rt_mutex_*_schedule() primitives/helpers and use them in the rtmutex code to avoid recursion vs. rtlock on the PI state. - Add nested blocking lockdep asserts to rt_mutex_lock(), rtlock_lock() and rwbase_read_lock() .. plus misc fixes & cleanups" * tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) futex: Don't include process MM in futex key on no-MMU locking/seqlock: Fix grammar in comment alpha: Fix up new futex syscall numbers locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning locking/seqlock: Change __seqprop() to return the function pointer locking/seqlock: Simplify SEQCOUNT_LOCKNAME() locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath() locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg() locking/atomic/x86: Introduce arch_sync_try_cmpxchg() locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback locking/seqlock: Fix typo in comment futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic() locking/local, arch: Rewrite local_add_unless() as a static inline function locking/debug: Fix debugfs API return value checks to use IS_ERR() locking/ww_mutex/test: Make sure we bail out instead of livelock locking/ww_mutex/test: Fix potential workqueue corruption locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup futex: Add sys_futex_requeue() futex: Add flags2 argument to futex_requeue() ...
2023-10-19locking: export contention tracepoints for bcachefs six locksBrian Foster
The bcachefs implementation of six locks is intended to land in generic locking code in the long term, but has been pulled into the bcachefs subsystem for internal use for the time being. This code lift breaks the bcachefs module build as six locks depend a couple of the generic locking tracepoints. Export these tracepoint symbols for bcachefs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-12locking/lockdep: Fix string sizing bug that triggers a format-truncation ↵Lucy Mielke
compiler-warning On an allyesconfig, with "treat warnings as errors" unset, GCC emits these warnings: kernel/locking/lockdep_proc.c:438:32: Warning: Format specifier '%lld' may be truncated when writing 1 to 17 bytes into a region of size 15 [-Wformat-truncation=] kernel/locking/lockdep_proc.c:438:31: Note: Format directive argument is in the range [-9223372036854775, 9223372036854775] kernel/locking/lockdep_proc.c:438:9: Note: 'snprintf' has output between 5 and 22 bytes into a target of size 15 In seq_time(), the longest s64 is "-9223372036854775808"-ish, which converted to the fixed-point float format is "-9223372036854775.80": 21 bytes, plus termination is another byte: 22. Therefore, a larger buffer size of 22 is needed here - not 15. The code was safe due to the snprintf(). Fix it. Signed-off-by: Lucy Mielke <lucymielke@icloud.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ZSfOEHRkZAWaQr3U@fedora.fritz.box
2023-10-11locktorture: Check the correct variable for allocation failureDan Carpenter
There is a typo so this checks the wrong variable. "chains" plural vs "chain" singular. We already know that "chains" is non-zero. Fixes: 7f993623e9eb ("locktorture: Add call_rcu_chains module parameter") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-10-03locking/debug: Fix debugfs API return value checks to use IS_ERR()Atul Kumar Pant
Update the checking of return values from debugfs_create_file() and debugfs_create_dir() to use IS_ERR(). Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20230807121834.7438-1-atulpant.linux@gmail.com
2023-09-24locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writersPaul E. McKenney
This commit renames the readers_bind and writers_bind module parameters to bind_readers and bind_writers, respectively. This provides added clarity via the imperative mode and better organizes the documentation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add call_rcu_chains module parameterPaul E. McKenney
When running locktorture on large systems, there will normally be enough RCU activity to ensure that there is a grace period in flight at all times. However, on smaller systems, RCU might well be idle the majority of the time. This situation can be inconvenient in cases where the RCU CPU stall warning is part of the debugging process. This commit therefore adds an call_rcu_chains module parameter to locktorture, allowing the user to specify the desired number of self-propagating call_rcu() chains. For good measure, immediately before invoking call_rcu(), the self-propagating RCU callback invokes start_poll_synchronize_rcu() to force the immediate start of a grace period, with the call_rcu() forcing another to start shortly thereafter. Booting with locktorture.call_rcu_chains=2 increases the probability of a stuck locking primitive resulting in an RCU CPU stall warning from about 25% to nearly 100%. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add new module parameters to lock_torture_print_module_parms()Paul E. McKenney
This commit adds new module parameters to lock_torture_print_module_parms, and alphabetizes things while in the area. This change makes locktorture test results more useful and self-contained. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add acq_writer_lim to complain about long acquistion timesPaul E. McKenney
This commit adds a locktorture.acq_writer_lim module parameter that specifies the maximum number of jiffies that is expected to be consumed by write-side lock acquisition. If this limit is exceeded, a WARN_ONCE() causes a splat. Note that this limit applies to the main lock acquisition only, not to any nested acquisitions. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Consolidate "if" statements in lock_torture_writer()Paul E. McKenney
There is a pair of adjacent "if" statements with identical conditions in the lock_torture_writer() function. This commit therefore combines them. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Alphabetize torture_param() entriesPaul E. McKenney
There are getting to be too many module parameters for a random list to be comfortable, so this commit alphabetizes the list. Strictly code motion. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add readers_bind and writers_bind module parametersPaul E. McKenney
This commit adds readers_bind and writers_bind module parameters to locktorture in order to skew tests across socket boundaries. This skewing is intended to provide additional variable-latency stress on the primitive under test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-22locking/ww_mutex/test: Make sure we bail out instead of livelockJohn Stultz
I've seen what appears to be livelocks in the stress_inorder_work() function, and looking at the code it is clear we can have a case where we continually retry acquiring the locks and never check to see if we have passed the specified timeout. This patch reworks that function so we always check the timeout before iterating through the loop again. I believe others may have hit this previously here: https://lore.kernel.org/lkml/895ef450-4fb3-5d29-a6ad-790657106a5a@intel.com/ Reported-by: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922043616.19282-4-jstultz@google.com
2023-09-22locking/ww_mutex/test: Fix potential workqueue corruptionJohn Stultz
In some cases running with the test-ww_mutex code, I was seeing odd behavior where sometimes it seemed flush_workqueue was returning before all the work threads were finished. Often this would cause strange crashes as the mutexes would be freed while they were being used. Looking at the code, there is a lifetime problem as the controlling thread that spawns the work allocates the "struct stress" structures that are passed to the workqueue threads. Then when the workqueue threads are finished, they free the stress struct that was passed to them. Unfortunately the workqueue work_struct node is in the stress struct. Which means the work_struct is freed before the work thread returns and while flush_workqueue is waiting. It seems like a better idea to have the controlling thread both allocate and free the stress structures, so that we can be sure we don't corrupt the workqueue by freeing the structure prematurely. So this patch reworks the test to do so, and with this change I no longer see the early flush_workqueue returns. Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
2023-09-22locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootupJohn Stultz
Booting w/ qemu without kvm, and with 64 cpus, I noticed we'd sometimes hung task watchdog splats in get_random_u32_below() when using the test-ww_mutex stress test. While entropy exhaustion is no longer an issue, the RNG may be slower early in boot. The test-ww_mutex code will spawn off 128 threads (2x cpus) and each thread will call get_random_u32_below() a number of times to generate a random order of the 16 locks. This intense use takes time and without kvm, qemu can be slow enough that we trip the hung task watchdogs. For this test, we don't need true randomness, just mixed up orders for testing ww_mutex lock acquisitions, so it changes the logic to use the prng instead, which takes less time and avoids the watchdgos. Feedback would be appreciated! Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922043616.19282-2-jstultz@google.com
2023-09-20locking/rtmutex: Add a lockdep assert to catch potential nested blockingThomas Gleixner
There used to be a BUG_ON(current->pi_blocked_on) in the lock acquisition functions, but that vanished in one of the rtmutex overhauls. Bring it back in form of a lockdep assert to catch code paths which take rtmutex based locks with current::pi_blocked_on != NULL. Reported-by: Crystal Wood <swood@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230908162254.999499-7-bigeasy@linutronix.de
2023-09-20locking/rtmutex: Use rt_mutex specific scheduler helpersSebastian Andrzej Siewior
Have rt_mutex use the rt_mutex specific scheduler helpers to avoid recursion vs rtlock on the PI state. [[ peterz: adapted to new names ]] Reported-by: Crystal Wood <swood@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230908162254.999499-6-bigeasy@linutronix.de
2023-09-20locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXESSebastian Andrzej Siewior
With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire() always fails and all lock operations take the slow path. Provide a new helper inline rt_mutex_try_acquire() which maps to rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case it invokes rt_mutex_slowtrylock() which can acquire a non-contended rtmutex under full debug coverage. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230908162254.999499-3-bigeasy@linutronix.de
2023-08-29Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
2023-08-28Merge tag 'x86-cleanups-2023-08-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Ingo Molnar: "The following commit deserves special mention: 22dc02f81cddd Revert "sched/fair: Move unused stub functions to header" This is in x86/cleanups, because the revert is a re-application of a number of cleanups that got removed inadvertedly" [ This also effectively undoes the amd_check_microcode() microcode declaration change I had done in my microcode loader merge in commit 42a7f6e3ffe0 ("Merge tag 'x86_microcode_for_v6.6_rc1' [...]"). I picked the declaration change by Arnd from this branch instead, which put it in <asm/processor.h> instead of <asm/microcode.h> like I had done in my merge resolution - Linus ] * tag 'x86-cleanups-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Refactor code using deprecated strncpy() interface to use strscpy() x86/hpet: Refactor code using deprecated strncpy() interface to use strscpy() x86/platform/uv: Refactor code using deprecated strcpy()/strncpy() interfaces to use strscpy() x86/qspinlock-paravirt: Fix missing-prototype warning x86/paravirt: Silence unused native_pv_lock_init() function warning x86/alternative: Add a __alt_reloc_selftest() prototype x86/purgatory: Include header for warn() declaration x86/asm: Avoid unneeded __div64_32 function definition Revert "sched/fair: Move unused stub functions to header" x86/apic: Hide unused safe_smp_processor_id() on 32-bit UP x86/cpu: Fix amd_check_microcode() declaration
2023-08-28Merge tag 'rcu.2023.08.21a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably simplifying SRCU_NOTIFIER_INIT() as suggested - RCU Tasks updates, most notably treating Tasks RCU callbacks as lazy while still treating synchronous grace periods as urgent. Also fixes one bug that restores the ability to apply debug-objects to RCU Tasks and another that fixes a race condition that could result in false-positive failures of the boot-time self-test code - RCU-scalability performance-test updates, most notably adding the ability to measure the RCU-Tasks's grace-period kthread's CPU consumption. This proved quite useful for the RCU Tasks work - Reference-acquisition/release performance-test updates, including a fix for an uninitialized wait_queue_head_t - Miscellaneous torture-test updates - Torture-test scripting updates, including removal of the non-longer-functional formal-verification scripts, test builds of individual RCU Tasks flavors, better diagnostics for loss of connectivity for distributed rcutorture tests, disabling of reboot loops in qemu/KVM-based rcutorture testing, and passing of init parameters to rcutorture's init program * tag 'rcu.2023.08.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (64 commits) rcu: Use WRITE_ONCE() for assignments to ->next for rculist_nulls rcu: Make the rcu_nocb_poll boot parameter usable via boot config rcu: Mark __rcu_irq_enter_check_tick() ->rcu_urgent_qs load srcu,notifier: Remove #ifdefs in favor of SRCU Tiny srcu_usage rcutorture: Stop right-shifting torture_random() return values torture: Stop right-shifting torture_random() return values torture: Move stutter_wait() timeouts to hrtimers torture: Move torture_shuffle() timeouts to hrtimers torture: Move torture_onoff() timeouts to hrtimers torture: Make torture_hrtimeout_*() use TASK_IDLE torture: Add lock_torture writer_fifo module parameter torture: Add a kthread-creation callback to _torture_create_kthread() rcu-tasks: Fix boot-time RCU tasks debug-only deadlock rcu-tasks: Permit use of debug-objects with RCU Tasks flavors checkpatch: Complain about unexpected uses of RCU Tasks Trace torture: Cause mkinitrd.sh to indicate failure on compile errors torture: Make init program dump command-line arguments torture: Switch qemu from -nographic to -display none torture: Add init-program support for loongarch torture: Avoid torture-test reboot loops ...
2023-08-21lockdep: fix static memory detection even moreHelge Deller
On the parisc architecture, lockdep reports for all static objects which are in the __initdata section (e.g. "setup_done" in devtmpfs, "kthreadd_done" in init/main.c) this warning: INFO: trying to register non-static key. The warning itself is wrong, because those objects are in the __initdata section, but the section itself is on parisc outside of range from _stext to _end, which is why the static_obj() functions returns a wrong answer. While fixing this issue, I noticed that the whole existing check can be simplified a lot. Instead of checking against the _stext and _end symbols (which include code areas too) just check for the .data and .bss segments (since we check a data object). This can be done with the existing is_kernel_core_data() macro. In addition objects in the __initdata section can be checked with init_section_contains(), and is_kernel_rodata() allows keys to be in the _ro_after_init section. This partly reverts and simplifies commit bac59d18c701 ("x86/setup: Fix static memory detection"). Link: https://lkml.kernel.org/r/ZNqrLRaOi/3wPAdp@p100 Fixes: bac59d18c701 ("x86/setup: Fix static memory detection") Signed-off-by: Helge Deller <deller@gmx.de> Cc: Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-14torture: Add lock_torture writer_fifo module parameterDietmar Eggemann
This commit adds a module parameter that causes the locktorture writer to run at real-time priority. To use it: insmod /lib/modules/torture.ko random_shuffle=1 insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=50 nested_locks=3 writer_fifo=1 ^^^^^^^^^^^^^ A predecessor to this patch has been helpful to uncover issues with the proxy-execution series. [ paulmck: Remove locktorture-specific code from kernel/torture.c. ] Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: kernel-team@android.com Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [jstultz: Include header change to build, reword commit message] Signed-off-by: John Stultz <jstultz@google.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-03x86/qspinlock-paravirt: Fix missing-prototype warningArnd Bergmann
__pv_queued_spin_unlock_slowpath() is defined in a header file as a global function, and designed to be called from inline asm, but there is no prototype visible in the definition: kernel/locking/qspinlock_paravirt.h:493:1: error: no previous \ prototype for '__pv_queued_spin_unlock_slowpath' [-Werror=missing-prototypes] Add this to the x86 header that contains the inline asm calling it, and ensure this gets included before the definition, rather than after it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230803082619.1369127-8-arnd@kernel.org
2023-07-17locking/rtmutex: Fix task->pi_waiters integrityPeter Zijlstra
Henry reported that rt_mutex_adjust_prio_check() has an ordering problem and puts the lie to the comment in [7]. Sharing the sort key between lock->waiters and owner->pi_waiters *does* create problems, since unlike what the comment claims, holding [L] is insufficient. Notably, consider: A / \ M1 M2 | | B C That is, task A owns both M1 and M2, B and C block on them. In this case a concurrent chain walk (B & C) will modify their resp. sort keys in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L] is meaningless, they're different Ls. This then gives rise to a race condition between [7] and [11], where the requeue of pi_waiters will observe an inconsistent tree order. B C (holds M1->wait_lock, (holds M2->wait_lock, holds B->pi_lock) holds A->pi_lock) [7] waiter_update_prio(); ... [8] raw_spin_unlock(B->pi_lock); ... [10] raw_spin_lock(A->pi_lock); [11] rt_mutex_enqueue_pi(); // observes inconsistent A->pi_waiters // tree order Fixing this means either extending the range of the owner lock from [10-13] to [6-13], with the immediate problem that this means [6-8] hold both blocked and owner locks, or duplicating the sort key. Since the locking in chain walk is horrible enough without having to consider pi_lock nesting rules, duplicate the sort key instead. By giving each tree their own sort key, the above race becomes harmless, if C sees B at the old location, then B will correct things (if they need correcting) when it walks up the chain and reaches A. Fixes: fb00aca47440 ("rtmutex: Turn the plist into an rb-tree") Reported-by: Henry Wu <triangletrap12@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Henry Wu <triangletrap12@gmail.com> Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
2023-06-28Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level directories - Douglas Anderson has added a new "buddy" mode to the hardlockup detector. It permits the detector to work on architectures which cannot provide the required interrupts, by having CPUs periodically perform checks on other CPUs - Zhen Lei has enhanced kexec's ability to support two crash regions - Petr Mladek has done a lot of cleanup on the hard lockup detector's Kconfig entries - And the usual bunch of singleton patches in various places * tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) kernel/time/posix-stubs.c: remove duplicated include ocfs2: remove redundant assignment to variable bit_off watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h devres: show which resource was invalid in __devm_ioremap_resource() watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64 watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h watchdog/hardlockup: make the config checks more straightforward watchdog/hardlockup: sort hardlockup detector related config values a logical way watchdog/hardlockup: move SMP barriers from common code to buddy code watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY watchdog/buddy: don't copy the cpumask in watchdog_next_cpu() watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog() watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy() watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick() watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails ...
2023-06-27Merge tag 'locking-core-2023-06-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double() The cmpxchg128() family of functions is basically & functionally the same as cmpxchg_double(), but with a saner interface. Instead of a 6-parameter horror that forced u128 - u64/u64-halves layout details on the interface and exposed users to complexity, fragility & bugs, use a natural 3-parameter interface with u128 types. - Restructure the generated atomic headers, and add kerneldoc comments for all of the generic atomic{,64,_long}_t operations. The generated definitions are much cleaner now, and come with documentation. - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when taking multiple locks of the same type. This gets rid of one use of lockdep_set_novalidate_class() in the bcache code. - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable shadowing generating garbage code on Clang on certain ARM builds. * tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg() locking/atomic: treewide: delete arch_atomic_*() kerneldoc locking/atomic: docs: Add atomic operations to the driver basic API documentation locking/atomic: scripts: generate kerneldoc comments docs: scripts: kernel-doc: accept bitwise negation like ~@var locking/atomic: scripts: simplify raw_atomic*() definitions locking/atomic: scripts: simplify raw_atomic_long*() definitions locking/atomic: scripts: split pfx/name/sfx/order locking/atomic: scripts: restructure fallback ifdeffery locking/atomic: scripts: build raw_atomic_long*() directly locking/atomic: treewide: use raw_atomic*_<op>() locking/atomic: scripts: add trivial raw_atomic*_<op>() locking/atomic: scripts: factor out order template generation locking/atomic: scripts: remove leftover "${mult}" locking/atomic: scripts: remove bogus order parameter locking/atomic: xtensa: add preprocessor symbols locking/atomic: x86: add preprocessor symbols locking/atomic: sparc: add preprocessor symbols locking/atomic: sh: add preprocessor symbols ...
2023-06-27Merge tag 'rcu.2023.06.22a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: "Documentation updates Miscellaneous fixes, perhaps most notably: - Remove RCU_NONIDLE(). The new visibility of most of the idle loop to RCU has obsoleted this API. - Make the RCU_SOFTIRQ callback-invocation time limit also apply to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT. - Add a jiffies-based callback-invocation time limit to handle long-running callbacks. (The local_clock() function is only invoked once per 32 callbacks due to its high overhead.) - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which fixes a bug that can occur on systems with non-contiguous CPU numbering. kvfree_rcu updates: - Eliminate the single-argument variant of k[v]free_rcu() now that all uses have been converted to k[v]free_rcu_mightsleep(). - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too soon. Yes, this is closing the barn door after the horse has escaped, but Murphy says that there will be more horses. Callback-offloading updates: - Fix a number of bugs involving the shrinker and lazy callbacks. Tasks RCU updates Torture-test updates" * tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits) torture: Remove duplicated argument -enable-kvm for ppc64 doc/rcutorture: Add description of rcutorture.stall_cpu_block rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() rcutorture: Correct name of use_softirq module parameter locktorture: Add long_hold to adjust lock-hold delays rcu/nocb: Make shrinker iterate only over NOCB CPUs rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs rcu: Make rcu_cpu_starting() rely on interrupts being disabled rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp rcu: Employ jiffies-based backstop to callback time limit rcu: Check callback-invocation time limit for rcuc kthreads rcu: Remove RCU_NONIDLE() rcu: Add more RCU files to kernel-api.rst rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic() rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker rcu/nocb: Fix shrinker race against callback enqueuer rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading ...
2023-06-09locking: add lockevent_read() prototypeArnd Bergmann
lockevent_read() has a __weak definition and the only caller in kernel/locking/lock_events.c, plus a strong definition in qspinlock_stat.h that overrides it, but no other declaration. This causes a W=1 warning: kernel/locking/lock_events.c:61:16: error: no previous prototype for 'lockevent_read' [-Werror=missing-prototypes] Add shared prototype to avoid the warnings. Link: https://lkml.kernel.org/r/20230517131102.934196-7-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Eric Paris <eparis@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-28Merge tag 'core-debugobjects-2023-05-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects fixes from Thomas Gleixner: "Two fixes for debugobjects: - Prevent the allocation path from waking up kswapd. That's a long standing issue due to the GFP_ATOMIC allocation flag. As debug objects can be invoked from pretty much any context waking kswapd can end up in arbitrary lock chains versus the waitqueue lock - Correct the explicit lockdep wait-type violation in debug_object_fill_pool()" * tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Don't wake up kswapd from fill_pool() debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
2023-05-19lockdep: Add lock_set_cmp_fn() annotationKent Overstreet
This implements a new interface to lockdep, lock_set_cmp_fn(), for defining a custom ordering when taking multiple locks of the same class. This is an alternative to subclasses, but can not fully replace them since subclasses allow lock hierarchies with other clasees inter-twined, while this relies on pure class nesting. Specifically, if A is our nesting class then: A/0 <- B <- A/1 Would be a valid lock order with subclasses (each subclass really is a full class from the validation PoV) but not with this annotation, which requires all nesting to be consecutive. Example output: | ============================================ | WARNING: possible recursive locking detected | 6.2.0-rc8-00003-g7d81e591ca6a-dirty #15 Not tainted | -------------------------------------------- | kworker/14:3/938 is trying to acquire lock: | ffff8880143218c8 (&b->lock l=0 0:2803368){++++}-{3:3}, at: bch_btree_node_get.part.0+0x81/0x2b0 | | but task is already holding lock: | ffff8880143de8c8 (&b->lock l=1 1048575:9223372036854775807){++++}-{3:3}, at: __bch_btree_map_nodes+0xea/0x1e0 | and the lock comparison function returns 1: | | other info that might help us debug this: | Possible unsafe locking scenario: | | CPU0 | ---- | lock(&b->lock l=1 1048575:9223372036854775807); | lock(&b->lock l=0 0:2803368); | | *** DEADLOCK *** | | May be due to missing lock nesting notation | | 3 locks held by kworker/14:3/938: | #0: ffff888005ea9d38 ((wq_completion)bcache){+.+.}-{0:0}, at: process_one_work+0x1ec/0x530 | #1: ffff8880098c3e70 ((work_completion)(&cl->work)#3){+.+.}-{0:0}, at: process_one_work+0x1ec/0x530 | #2: ffff8880143de8c8 (&b->lock l=1 1048575:9223372036854775807){++++}-{3:3}, at: __bch_btree_map_nodes+0xea/0x1e0 [peterz: extended changelog] Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230509195847.1745548-1-kent.overstreet@linux.dev
2023-05-11locktorture: Add long_hold to adjust lock-hold delaysPaul E. McKenney
This commit adds a long_hold module parameter to allow testing diagnostics for excessive lock-hold times. Also adjust torture_param() invocations for longer line length while in the area. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-05-08locking/rwsem: Add __always_inline annotation to __down_read_common() and ↵John Stultz
inlined callers Apparently despite it being marked inline, the compiler may not inline __down_read_common() which makes it difficult to identify the cause of lock contention, as the blocked function in traceevents will always be listed as __down_read_common(). So this patch adds __always_inline annotation to the common function (as well as the inlined helper callers) to force it to be inlined so the blocking function will be listed (via Wchan) in traceevents. Fixes: c995e638ccbb ("locking/rwsem: Fold __down_{read,write}*()") Reported-by: Tim Murray <timmurray@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20230503023351.2832796-1-jstultz@google.com
2023-05-05Merge tag 'locking-core-2023-05-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation - Misc cleanups/fixes * tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: Correct (cmp)xchg() instrumentation locking/x86: Define arch_try_cmpxchg_local() locking/arch: Wire up local_try_cmpxchg() locking/generic: Wire up local{,64}_try_cmpxchg() locking/atomic: Add generic try_cmpxchg{,64}_local() support locking/rwbase: Mitigate indefinite writer starvation locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-02debugobjects,locking: Annotate debug_object_fill_pool() wait type violationPeter Zijlstra
There is an explicit wait-type violation in debug_object_fill_pool() for PREEMPT_RT=n kernels which allows them to more easily fill the object pool and reduce the chance of allocation failures. Lockdep's wait-type checks are designed to check the PREEMPT_RT locking rules even for PREEMPT_RT=n kernels and object to this, so create a lockdep annotation to allow this to stand. Specifically, create a 'lock' type that overrides the inner wait-type while it is held -- allowing one to temporarily raise it, such that the violation is hidden. Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Qi Zheng <zhengqi.arch@bytedance.com> Link: https://lkml.kernel.org/r/20230429100614.GA1489784@hirez.programming.kicks-ass.net
2023-04-29locking/rwbase: Mitigate indefinite writer starvationSebastian Andrzej Siewior
On PREEMPT_RT, rw_semaphore and rwlock_t locks are unfair to writers. Readers can indefinitely acquire the lock unless the writer fully acquired the lock, which might never happen if there is always a reader in the critical section owning the lock. Mel Gorman reported that since LTP-20220121 the dio_truncate test case went from having 1 reader to having 16 readers and that number of readers is sufficient to prevent the down_write ever succeeding while readers exist. Eventually the test is killed after 30 minutes as a failure. Mel proposed a timeout to limit how long a writer can be blocked until the reader is forced into the slowpath. Thomas argued that there is no added value by providing this timeout. From a PREEMPT_RT point of view, there are no critical rw_semaphore or rwlock_t locks left where the reader must be preferred. Mitigate indefinite writer starvation by forcing the READER into the slowpath once the WRITER attempts to acquire the lock. Reported-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/877cwbq4cq.ffs@tglx Link: https://lore.kernel.org/r/20230321161140.HMcQEhHb@linutronix.de Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-24Merge tag 'rcu.6.4.april5.2023.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux Pull RCU updates from Joel Fernandes: - Updates and additions to MAINTAINERS files, with Boqun being added to the RCU entry and Zqiang being added as an RCU reviewer. I have also transitioned from reviewer to maintainer; however, Paul will be taking over sending RCU pull-requests for the next merge window. - Resolution of hotplug warning in nohz code, achieved by fixing cpu_is_hotpluggable() through interaction with the nohz subsystem. Tick dependency modifications by Zqiang, focusing on fixing usage of the TICK_DEP_BIT_RCU_EXP bitmask. - Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n kernels, fixed by Zqiang. - Improvements to rcu-tasks stall reporting by Neeraj. - Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for increased robustness, affecting several components like mac802154, drbd, vmw_vmci, tracing, and more. A report by Eric Dumazet showed that the API could be unknowingly used in an atomic context, so we'd rather make sure they know what they're asking for by being explicit: https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/ - Documentation updates, including corrections to spelling, clarifications in comments, and improvements to the srcu_size_state comments. - Better srcu_struct cache locality for readers, by adjusting the size of srcu_struct in support of SRCU usage by Christoph Hellwig. - Teach lockdep to detect deadlocks between srcu_read_lock() vs synchronize_srcu() contributed by Boqun. Previously lockdep could not detect such deadlocks, now it can. - Integration of rcutorture and rcu-related tools, targeted for v6.4 from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis module parameter, and more - Miscellaneous changes, various code cleanups and comment improvements * tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits) checkpatch: Error out if deprecated RCU API used mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep() ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep() net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep() net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep() tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep() misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep() drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep() rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race rcu/trace: use strscpy() to instead of strncpy() tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem ...
2023-03-27locking/lockdep: Improve the deadlock scenario print for sync and read lockBoqun Feng
Lock scenario print is always a weak spot of lockdep splats. Improvement can be made if we rework the dependency search and the error printing. However without touching the graph search, we can improve a little for the circular deadlock case, since we have the to-be-added lock dependency, and know whether these two locks are read/write/sync. In order to know whether a held_lock is sync or not, a bit was "stolen" from ->references, which reduce our limit for the same lock class nesting from 2^12 to 2^11, and it should still be good enough. Besides, since we now have bit in held_lock for sync, we don't need the "hardirqoffs being 1" trick, and also we can avoid the __lock_release() if we jump out of __lock_acquire() before the held_lock stored. With these changes, a deadlock case evolved with read lock and sync gets a better print-out from: [...] Possible unsafe locking scenario: [...] [...] CPU0 CPU1 [...] ---- ---- [...] lock(srcuA); [...] lock(srcuB); [...] lock(srcuA); [...] lock(srcuB); to [...] Possible unsafe locking scenario: [...] [...] CPU0 CPU1 [...] ---- ---- [...] rlock(srcuA); [...] lock(srcuB); [...] lock(srcuA); [...] sync(srcuB); Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2023-03-27locking: Reduce the number of locks in ww_mutex stress testsBoqun Feng
The stress test in test_ww_mutex_init() uses 4095 locks since lockdep::reference has 12 bits, and since we are going to reduce it to 11 bits to support lock_sync(), and 2047 is still a reasonable number of the max nesting level for locks, so adjust the test. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202302011445.9d99dae2-oliver.sang@intel.com Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>