path: root/kernel/locking/rtmutex.c
Age    Commit message    Author
2024-03-01locking/rtmutex: Use try_cmpxchg_relaxed() in mark_rt_mutex_waiters()Uros Bizjak
Use try_cmpxchg() instead of cmpxchg(*ptr, old, new) == old. The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after CMPXCHG (and related move instruction in front of CMPXCHG). Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when CMPXCHG fails. There is no need to re-read the value in the loop. Note that the value from *ptr should be read using READ_ONCE() to prevent the compiler from merging, refetching or reordering the read. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240124104953.612063-1-ubizjak@gmail.com
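For illustration, a minimal sketch of the pattern change described above, modelled on mark_rt_mutex_waiters() (simplified; treat it as a sketch rather than the literal diff):

    unsigned long owner, *p = (unsigned long *) &lock->owner;

    /* Old pattern: compare the return value and re-read *p on failure. */
    do {
        owner = READ_ONCE(*p);
    } while (cmpxchg_relaxed(p, owner, owner | RT_MUTEX_HAS_WAITERS) != owner);

    /*
     * New pattern: try_cmpxchg_relaxed() reports success directly and
     * updates 'owner' with the current value on failure, so the loop
     * needs no explicit re-read and x86 can reuse the ZF result.
     */
    owner = READ_ONCE(*p);
    do {
    } while (!try_cmpxchg_relaxed(p, &owner, owner | RT_MUTEX_HAS_WAITERS));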
2023-09-20locking/rtmutex: Add a lockdep assert to catch potential nested blockingThomas Gleixner
There used to be a BUG_ON(current->pi_blocked_on) in the lock acquisition functions, but that vanished in one of the rtmutex overhauls. Bring it back in the form of a lockdep assert to catch code paths which take rtmutex based locks with current::pi_blocked_on != NULL. Reported-by: Crystal Wood <swood@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230908162254.999499-7-bigeasy@linutronix.de
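The assertion amounts to a one-liner in the acquisition slow paths; a sketch (exact placement assumed, not copied from the patch):

    /*
     * Blocking on an rtmutex-based lock while current::pi_blocked_on is
     * already set would corrupt the PI chain; catch it under lockdep.
     */
    lockdep_assert(!current->pi_blocked_on);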
2023-09-20locking/rtmutex: Use rt_mutex specific scheduler helpersSebastian Andrzej Siewior
Have rt_mutex use the rt_mutex specific scheduler helpers to avoid recursion vs rtlock on the PI state. [ peterz: adapted to new names ] Reported-by: Crystal Wood <swood@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230908162254.999499-6-bigeasy@linutronix.de
2023-09-20locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXESSebastian Andrzej Siewior
With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire() always fails and all lock operations take the slow path. Provide a new inline helper rt_mutex_try_acquire() which maps to rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case it invokes rt_mutex_slowtrylock() which can acquire a non-contended rtmutex under full debug coverage. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230908162254.999499-3-bigeasy@linutronix.de
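A sketch of the helper's shape as described, with the non-debug variant aliasing the existing fast path (details assumed, not the literal patch):

    #ifdef CONFIG_DEBUG_RT_MUTEXES
    /*
     * Debug: the cmpxchg fast path always fails, so use the slowpath
     * trylock, which keeps full debug coverage for the uncontended case.
     */
    static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
    {
        return rt_mutex_slowtrylock(lock);
    }
    #else
    /* Non-debug: identical to the existing fast path. */
    static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
    {
        return rt_mutex_cmpxchg_acquire(lock, NULL, current);
    }
    #endif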
2023-07-17locking/rtmutex: Fix task->pi_waiters integrityPeter Zijlstra
Henry reported that rt_mutex_adjust_prio_check() has an ordering problem and puts the lie to the comment in [7]. Sharing the sort key between lock->waiters and owner->pi_waiters *does* create problems, since unlike what the comment claims, holding [L] is insufficient.

Notably, consider:

            A
          /   \
         M1    M2
         |      |
         B      C

That is, task A owns both M1 and M2, B and C block on them. In this case a concurrent chain walk (B & C) will modify their resp. sort keys in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L] is meaningless, they're different Ls.

This then gives rise to a race condition between [7] and [11], where the requeue of pi_waiters will observe an inconsistent tree order.

    B                                   C

    (holds M1->wait_lock,               (holds M2->wait_lock,
     holds B->pi_lock)                   holds A->pi_lock)

    [7] waiter_update_prio();
    ...
    [8] raw_spin_unlock(B->pi_lock);
    ...                                 [10] raw_spin_lock(A->pi_lock);
                                        [11] rt_mutex_enqueue_pi();
                                        // observes inconsistent A->pi_waiters
                                        // tree order

Fixing this means either extending the range of the owner lock from [10-13] to [6-13], with the immediate problem that this means [6-8] hold both blocked and owner locks, or duplicating the sort key.

Since the locking in chain walk is horrible enough without having to consider pi_lock nesting rules, duplicate the sort key instead.

By giving each tree their own sort key, the above race becomes harmless, if C sees B at the old location, then B will correct things (if they need correcting) when it walks up the chain and reaches A.

Fixes: fb00aca47440 ("rtmutex: Turn the plist into an rb-tree") Reported-by: Henry Wu <triangletrap12@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Henry Wu <triangletrap12@gmail.com> Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
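A sketch of the "duplicate the sort key" idea: give each rbtree its own copy of the key rather than sharing one node. Field and type names below are illustrative, not necessarily those of the actual patch:

    struct rt_waiter_node {
        struct rb_node  entry;
        int             prio;
        u64             deadline;
    };

    struct rt_mutex_waiter {
        /* Keyed copy for lock->waiters, updated under lock->wait_lock */
        struct rt_waiter_node   tree;
        /*
         * Independent keyed copy for owner->pi_waiters, updated under
         * the owner's pi_lock.
         */
        struct rt_waiter_node   pi_tree;
        struct task_struct      *task;
        struct rt_mutex_base    *lock;
        /* other members omitted */
    };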
2023-02-06rtmutex: Ensure that the top waiter is always woken upWander Lairson Costa
Let L1 and L2 be two spinlocks.

Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top waiter of L2.

Let T2 be the task holding L2.

Let T3 be a task trying to acquire L1.

The following events will lead to a state in which the wait queue of L2 isn't empty, but no task actually holds the lock.

 T1             T2                                    T3
 ==             ==                                    ==

                                                      spin_lock(L1)
                                                      | raw_spin_lock(L1->wait_lock)
                                                      | rtlock_slowlock_locked(L1)
                                                      | | task_blocks_on_rt_mutex(L1, T3)
                                                      | | | orig_waiter->lock = L1
                                                      | | | orig_waiter->task = T3
                                                      | | | raw_spin_unlock(L1->wait_lock)
                                                      | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)

                spin_unlock(L2)                       | | | |
                | rt_mutex_slowunlock(L2)             | | | |
                | | raw_spin_lock(L2->wait_lock)      | | | |
                | | wakeup(T1)                        | | | |
                | | raw_spin_unlock(L2->wait_lock)    | | | |
                                                      | | | | waiter = T1->pi_blocked_on
                                                      | | | | waiter == rt_mutex_top_waiter(L2)
                                                      | | | | waiter->task == T1
                                                      | | | | raw_spin_lock(L2->wait_lock)
                                                      | | | | dequeue(L2, waiter)
                                                      | | | | update_prio(waiter, T1)
                                                      | | | | enqueue(L2, waiter)
                                                      | | | | waiter != rt_mutex_top_waiter(L2)
                                                      | | | | L2->owner == NULL
                                                      | | | | wakeup(T1)
                                                      | | | | raw_spin_unlock(L2->wait_lock)

 T1 wakes up
 T1 != top_waiter(L2)
 schedule_rtlock()

If the deadline of T1 is updated before the call to update_prio(), and the new deadline is greater than the deadline of the second top waiter, then after the requeue, T1 is no longer the top waiter, and the wrong task is woken up which will then go back to sleep because it is not the top waiter.

This can be reproduced in PREEMPT_RT with stress-ng:

    while true; do
        stress-ng --sched deadline --sched-period 1000000000 \
            --sched-runtime 800000000 --sched-deadline \
            1000000000 --mmapfork 23 -t 20
    done

A similar issue was pointed out by Thomas versus the cases where the top waiter drops out early due to a signal or timeout, which is a general issue for all regular rtmutex use cases, e.g. futex.

The problematic code is in rt_mutex_adjust_prio_chain():

    // Save the top waiter before dequeue/enqueue
    prerequeue_top_waiter = rt_mutex_top_waiter(lock);

    rt_mutex_dequeue(lock, waiter);
    waiter_update_prio(waiter, task);
    rt_mutex_enqueue(lock, waiter);

    // Lock has no owner?
    if (!rt_mutex_owner(lock)) {
        // Top waiter changed
 ---->      if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
 ---->          wake_up_state(waiter->task, waiter->wake_state);

This only takes the case into account where @waiter is the new top waiter due to the requeue operation. But it fails to handle the case where @waiter is no longer the top waiter due to the requeue operation.

Ensure that the new top waiter is woken up in all cases so it can take over the ownerless lock.

[ tglx: Amend changelog, add Fixes tag ]

Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter") Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com
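A sketch of the corrected wakeup logic: wake whatever is the top waiter after the requeue, not only @waiter (illustrative; close to but not necessarily the literal patch):

    if (!rt_mutex_owner(lock)) {
        /*
         * Wake the new top waiter, whoever that is after the requeue,
         * so the ownerless lock is always handed off.
         */
        top_waiter = rt_mutex_top_waiter(lock);
        if (prerequeue_top_waiter != top_waiter)
            wake_up_state(top_waiter->task, top_waiter->wake_state);
    }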
2022-12-12rtmutex: Add acquire semantics for rtmutex lock acquisition slow pathMel Gorman
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench on XFS on arm64.

    kernel BUG at fs/inode.c:625!
    Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
    CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1
    pc : clear_inode+0xa0/0xc0
    lr : clear_inode+0x38/0xc0
    Call trace:
     clear_inode+0xa0/0xc0
     evict+0x160/0x180
     iput+0x154/0x240
     do_unlinkat+0x184/0x300
     __arm64_sys_unlinkat+0x48/0xc0
     el0_svc_common.constprop.4+0xe4/0x2c0
     do_el0_svc+0xac/0x100
     el0_svc+0x78/0x200
     el0t_64_sync_handler+0x9c/0xc0
     el0t_64_sync+0x19c/0x1a0

It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this is likely a bug that existed forever and only became visible when ARM support was added to preempt-rt. The same problem does not occur on x86-64 and he also reported that converting sb->s_inode_wblist_lock to raw_spinlock_t makes the problem disappear indicating that the RT spinlock variant is the problem.

Which in turn means that RT mutexes on ARM64 and any other weakly ordered architecture are affected by this independent of RT.

Will Deacon observed:

    "I'd be more inclined to be suspicious of the slowpath tbh, as we need
    to make sure that we have acquire semantics on all paths where the lock
    can be taken. Looking at the rtmutex code, this really isn't obvious to
    me -- for example, try_to_take_rt_mutex() appears to be able to return
    via the 'takeit' label without acquire semantics and it looks like we
    might be relying on the caller's subsequent _unlock_ of the wait_lock
    for ordering, but that will give us release semantics which aren't
    correct."

Sebastian Andrzej Siewior prototyped a fix that does work based on that comment but it was a little bit overkill and added some fences that should not be necessary.

The lock owner is updated with an IRQ-safe raw spinlock held, but the spin_unlock does not provide acquire semantics which are needed when acquiring a mutex.

Add the necessary acquire semantics for lock owner updates in the slow path acquisition and the waiter bit logic.

It successfully completed 10 iterations of the dbench workload while the vanilla kernel fails on the first iteration.

[ bigeasy@linutronix.de: Initial prototype fix ]

Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics") Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core") Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
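The gist of the fix is to make the owner update itself an acquire operation; a simplified sketch (illustrative only, the actual patch covers additional paths such as the waiter bit fixup):

    static __always_inline void
    rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
    {
        unsigned long val = (unsigned long)owner;

        if (rt_mutex_has_waiters(lock))
            val |= RT_MUTEX_HAS_WAITERS;

        /*
         * xchg_acquire() rather than a plain store: the new owner must
         * not have its later loads ordered before taking ownership.
         */
        xchg_acquire(&lock->owner, (struct task_struct *)val);
    }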
2022-04-05locking: Apply contention tracepoints in the slow pathNamhyung Kim
Add the lock contention tracepoints in various lock function slow paths. Note that each arch can define spinlock differently; I added it only to the generic qspinlock for now. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lkml.kernel.org/r/20220322185709.141236-3-namhyung@kernel.org
2021-12-18Merge branch 'locking/urgent' into locking/coreThomas Gleixner
Pick up the spin loop condition fix. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-12-18locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()Zqiang
Optimistic spinning needs to be terminated when the spinning waiter is no longer the top waiter on the lock, but the condition is negated. It terminates if the waiter is the top waiter, which is defeating the whole purpose. Fixes: c3123c431447 ("locking/rtmutex: Dont dereference waiter lockless") Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com
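A sketch of the corrected termination condition inside rtmutex_spin_on_owner(), based on the description above (shown for illustration):

    /*
     * Stop spinning if the owner is off-CPU, we should reschedule, or
     * this waiter is *no longer* the top waiter on the lock.
     */
    if (!owner_on_cpu(owner) || need_resched() ||
        !rt_mutex_waiter_is_top_waiter(lock, waiter)) {
        res = false;
        break;
    }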
2021-12-04locking: Make owner_on_cpu() into <linux/sched.h>Kefeng Wang
Move owner_on_cpu() from kernel/locking/rwsem.c into include/linux/sched.h under CONFIG_SMP, then use it in mutex/rwsem/rtmutex to simplify the code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211203075935.136808-2-wangkefeng.wang@huawei.com
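The helper being moved is small; a sketch of its shape (as it commonly appears upstream, shown for reference):

    static inline bool owner_on_cpu(struct task_struct *owner)
    {
        /*
         * Spinning is pointless if the owner is not running on a CPU,
         * or if its CPU (as a vCPU) has been preempted by the host.
         */
        return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));
    }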
2021-12-04locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.Peter Zijlstra
Similar to the issues in commits: 6467822b8cc9 ("locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes") a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") ww_rt_mutex_lock() should not return EDEADLK without first going through the __ww_mutex logic to set the required state. In fact, the chain-walk can deal with the spurious cycles (per the above commits) this check warns about and is trying to avoid. Therefore ignore this test for ww_rt_mutex and simply let things fall in place. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211129174654.668506-4-bigeasy@linutronix.de
2021-10-01rtmutex: Wake up the waiters lockless while dropping the read lock.Thomas Gleixner
The rw_semaphore and rwlock_t implementation both wake the waiter while holding the rt_mutex_base::wait_lock acquired. This can be optimized by waking the waiter lockless outside of the locked section to avoid a needless contention on the rt_mutex_base::wait_lock lock. Extend rt_mutex_wake_q_add() to also accept task and state and use it in __rwbase_read_unlock(). Suggested-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210928150006.597310-3-bigeasy@linutronix.de
2021-10-01rtmutex: Check explicit for TASK_RTLOCK_WAIT.Sebastian Andrzej Siewior
rt_mutex_wake_q_add() needs to distinguish between sleeping locks (TASK_RTLOCK_WAIT) and normal locks, which use TASK_NORMAL, to use the proper wake mechanism. Instead of checking for != TASK_NORMAL, make it more robust and check explicitly for TASK_RTLOCK_WAIT, which is the reason why a different wake mechanism is used. No functional change. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210928150006.597310-2-bigeasy@linutronix.de
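A sketch of the explicit check described above (simplified; the surrounding rtlock-task bookkeeping is an assumption for illustration):

    static __always_inline void rt_mutex_wake_q_add(struct rt_wake_q_head *wqh,
                                                    struct rt_mutex_waiter *w)
    {
        if (IS_ENABLED(CONFIG_PREEMPT_RT) && w->wake_state == TASK_RTLOCK_WAIT) {
            /* 'Sleeping' spin/rwlock waiter: use the RT-specific wakeup. */
            get_task_struct(w->task);
            wqh->rtlock_task = w->task;
        } else {
            /* Regular sleeping lock waiter: normal wake_q. */
            wake_q_add(&wqh->head, w->task);
        }
    }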
2021-09-09locking/rtmutex: Fix ww_mutex deadlock checkPeter Zijlstra
Dan reported that rt_mutex_adjust_prio_chain() can be called with .orig_waiter == NULL however commit a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") unconditionally dereferences it. Since both call-sites that have .orig_waiter == NULL don't care for the return value, simply disable the deadlock squash by adding the NULL check. Notably, both callers use the deadlock condition as a termination condition for the iteration; once detected, it is sure that (de)boosting is done. Arguably step [3] would be a more natural termination point, but it's dubious whether adding a third deadlock detection state would improve the code. Fixes: a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/YS9La56fHMiCCo75@hirez.programming.kicks-ass.net
2021-08-27locking/rtmutex: Return success on deadlock for ww_mutex waitersPeter Zijlstra
ww_mutexes can legitimately cause a deadlock situation in the lock graph which is resolved afterwards by the wait/wound mechanics. The rtmutex chain walk can detect such a deadlock and returns EDEADLK which in turn skips the wait/wound mechanism and returns EDEADLK to the caller. That's wrong because both lock chains might get EDEADLK or the wrong waiter would back out. Detect that situation and return 'success' in case that the waiter which initiated the chain walk is a ww_mutex with context. This allows the wait/wound mechanics to resolve the situation according to the rules. [ tglx: Split it apart and added changelog ] Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net
2021-08-27locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexesPeter Zijlstra
rtmutex based ww_mutexes can legitimately create a cycle in the lock graph which can be observed by a blocker which didn't cause the problem:

    P1: A, ww_A, ww_B
    P2: ww_B, ww_A
    P3: A

P3 might therefore be trapped in the ww_mutex induced cycle and run into the lock depth limitation of rt_mutex_adjust_prio_chain() which returns -EDEADLK to the caller. Disable the deadlock detection walk when the chain walk observes a ww_mutex to prevent this looping. [ tglx: Split it apart and added changelog ] Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net
2021-08-25locking/rtmutex: Dequeue waiter on ww_mutex deadlockThomas Gleixner
The rt_mutex based ww_mutex variant queues the new waiter first in the lock's rbtree before evaluating the ww_mutex specific conditions which might decide that the waiter should back out. This check and conditional exit happens before the waiter is enqueued into the PI chain. The failure handling at the call site assumes that the waiter, if it is the top most waiter on the lock, is queued in the PI chain and then proceeds to adjust the unmodified PI chain, which results in RB tree corruption. Dequeue the waiter from the lock waiter list in the ww_mutex error exit path to prevent this. Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102454.042280541@linutronix.de
2021-08-25locking/rtmutex: Dont dereference waiter locklessThomas Gleixner
The new rt_mutex_spin_on_owner() loop checks whether the spinning waiter is still the top waiter on the lock by utilizing rt_mutex_top_waiter(), which is broken because that function contains a sanity check which dereferences the top waiter pointer to check whether the waiter belongs to the lock. That's wrong in the lockless spinwait case:

 CPU 0                                          CPU 1
 rt_mutex_lock(lock)                            rt_mutex_lock(lock);
   queue(waiter0)
   waiter0 == rt_mutex_top_waiter(lock)
   rt_mutex_spin_on_owner(lock, waiter0) {        queue(waiter1)
                                                  waiter1 == rt_mutex_top_waiter(lock)
                                                  ...
     top_waiter = rt_mutex_top_waiter(lock)
       leftmost = rb_first_cached(&lock->waiters);
                                                  -> signal
                                                  dequeue(waiter1)
                                                  destroy(waiter1)
       w = rb_entry(leftmost, ....)
       BUG_ON(w->lock != lock)    <- UAF

The BUG_ON() is correct for the case where the caller holds lock->wait_lock which guarantees that the leftmost waiter entry cannot vanish. For the lockless spinwait case it's broken.

Create a new helper function which avoids the pointer dereference and just compares the leftmost entry pointer with current's waiter pointer to validate that current is still eligible for spinning.

Fixes: 992caf7f1724 ("locking/rtmutex: Add adaptive spinwait mechanism") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102453.981720644@linutronix.de
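A sketch of such a pointer-compare helper (close to the upstream rt_mutex_waiter_is_top_waiter(); the rbtree entry field name is assumed for the kernel version in question):

    static __always_inline bool
    rt_mutex_waiter_is_top_waiter(struct rt_mutex_base *lock,
                                  struct rt_mutex_waiter *waiter)
    {
        struct rb_node *leftmost = rb_first_cached(&lock->waiters);

        /* No dereference of the leftmost entry: just compare pointers. */
        return rb_entry(leftmost, struct rt_mutex_waiter, tree_entry) == waiter;
    }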
2021-08-17locking/rtmutex: Add adaptive spinwait mechanismSteven Rostedt
Going to sleep when locks are contended can be quite inefficient when the contention time is short and the lock owner is running on a different CPU. The MCS mechanism cannot be used because MCS is strictly FIFO ordered while for rtmutex based locks the waiter ordering is priority based. Provide a simple adaptive spinwait mechanism which currently restricts the spinning to the top priority waiter. [ tglx: Provide a contemporary changelog, extended it to all rtmutex based locks and updated it to match the other spin on owner implementations ] Originally-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.912050691@linutronix.de
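A sketch of the shape of such a spin-on-owner loop (illustrative; helper names such as owner_on_cpu() are assumptions here, and later commits in this log adjust the top-waiter check):

    static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
                                      struct rt_mutex_waiter *waiter,
                                      struct task_struct *owner)
    {
        bool res = true;

        rcu_read_lock();
        for (;;) {
            /* Owner changed: stop spinning and retry the acquisition. */
            if (owner != rt_mutex_owner(lock))
                break;
            /*
             * Stop spinning when the owner is not running, when we should
             * reschedule, or when we are no longer the top waiter (only
             * the top waiter is allowed to spin).
             */
            if (!owner_on_cpu(owner) || need_resched() ||
                waiter != rt_mutex_top_waiter(lock)) {
                res = false;
                break;
            }
            cpu_relax();
        }
        rcu_read_unlock();
        return res;
    }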
2021-08-17locking/rtmutex: Implement equal priority lock stealingGregory Haskins
The current logic only allows lock stealing to occur if the current task is of higher priority than the pending owner. Significant throughput improvements can be gained by allowing the lock stealing to include tasks of equal priority when the contended lock is a spin_lock or a rw_lock and the tasks are not in an RT scheduling class. The assumption was that the system will make faster progress by allowing the task already on the CPU to take the lock rather than waiting for the system to wake up a different task. This does add a degree of unfairness, but in reality no negative side effects have been observed in the many years that this has been used in the RT kernel. [ tglx: Refactored and rewritten several times by Steve Rostedt, Sebastian Siewior and myself ] Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.857240222@linutronix.de
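A sketch of the steal rule: a waiter may take the lock ahead of an equal-priority top waiter only for non-RT waiters (illustrative; helper and field names assumed):

    static __always_inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
                                               struct rt_mutex_waiter *top_waiter)
    {
        /* Strictly higher priority always wins. */
        if (rt_mutex_waiter_less(waiter, top_waiter))
            return true;

        /*
         * RT tasks are excluded from same-priority (lateral) steals to
         * preserve strict priority/FIFO ordering for realtime waiters.
         */
        if (rt_prio(waiter->prio) || dl_prio(waiter->prio))
            return false;

        return rt_mutex_waiter_equal(waiter, top_waiter);
    }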
2021-08-17locking/rtmutex: Extend the rtmutex core to support ww_mutexPeter Zijlstra
Add a ww acquire context pointer to the waiter and various functions and add the ww_mutex related invocations to the proper spots in the locking code, similar to the mutex based variant. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.966139174@linutronix.de
2021-08-17locking/rtmutex: Squash !RT tasks to DEFAULT_PRIOPeter Zijlstra
Ensure all !RT tasks have the same prio such that they end up in FIFO order and aren't split up according to nice level. The reason why nice levels were taken into account so far is historical. In the early days of the rtmutex code it was done to give the PI boosting and deboosting a larger coverage. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.938676930@linutronix.de
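A sketch of the prio squash described above (shown for illustration):

    static __always_inline int __waiter_prio(struct task_struct *task)
    {
        int prio = task->prio;

        /* All !RT tasks compare equal, so they queue in FIFO order. */
        if (!rt_prio(prio))
            return DEFAULT_PRIO;

        return prio;
    }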
2021-08-17locking/rtmutex: Provide the spin/rwlock core lock functionThomas Gleixner
Provide a simplified version of the rtmutex slowlock function which neither handles signals nor timeouts, and is careful about preserving the state of the blocked task across the lock operation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.770228446@linutronix.de
2021-08-17locking/rtmutex: Guard regular sleeping locks specific functionsThomas Gleixner
Guard the regular sleeping lock specific functionality, which is used for rtmutex on non-RT enabled kernels and for mutex, rtmutex and semaphores on RT enabled kernels so the code can be reused for the RT specific implementation of spinlocks and rwlocks in a different compilation unit. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.311535693@linutronix.de
2021-08-17locking/rtmutex: Prepare RT rt_mutex_wake_q for RT locksThomas Gleixner
Add an rtlock_task pointer to rt_mutex_wake_q, which allows handling the RT specific wakeup for spin/rwlock waiters. The pointer is just consuming 4/8 bytes on the stack so it is provided unconditionally to avoid #ifdeffery all over the place. This cannot use a regular wake_q, because a task can have concurrent wakeups which would make it miss either the lock wakeup or the regular wakeups, depending on what gets queued first, unless task struct gains a separate wake_q_node for this, which would be overkill, because there can only be a single task which gets woken up in the spin/rw_lock unlock path. No functional change for non-RT enabled kernels. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.253614678@linutronix.de
2021-08-17locking/rtmutex: Use rt_mutex_wake_q_headThomas Gleixner
Prepare for the required state aware handling of waiter wakeups via wake_q and switch the rtmutex code over to the rtmutex specific wrapper. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.197113263@linutronix.de
2021-08-17locking/rtmutex: Provide rt_wake_q_head and helpersThomas Gleixner
To handle the difference between wakeups for regular sleeping locks (mutex, rtmutex, rw_semaphore) and the wakeups for 'sleeping' spin/rwlocks on PREEMPT_RT enabled kernels correctly, it is required to provide a wake_q_head construct which allows keeping them separate. Provide a wrapper around wake_q_head and the required helpers, which will be extended with the state handling later. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.139337655@linutronix.de
2021-08-17locking/rtmutex: Add wake_state to rt_mutex_waiterThomas Gleixner
Regular sleeping locks like mutexes, rtmutexes and rw_semaphores are always entering and leaving a blocking section with task state == TASK_RUNNING. On a non-RT kernel spinlocks and rwlocks never affect the task state, but on RT kernels these locks are converted to rtmutex based 'sleeping' locks. So in case of contention the task blocks, which requires carefully preserving the task state and restoring it after acquiring the lock, taking into account regular wakeups which happened for the task while it was blocked. This state preserving is achieved by having a separate task state for blocking on a RT spin/rwlock and a saved_state field in task_struct along with careful handling of these wakeup scenarios in try_to_wake_up(). To avoid conditionals in the rtmutex code, store the wake state which has to be used for waking a lock waiter in rt_mutex_waiter, which allows handling both the regular and the RT spin/rwlock cases by handing it to wake_up_state(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.079800739@linutronix.de
2021-08-17locking/rtmutex: Provide rt_mutex_slowlock_locked()Thomas Gleixner
Split the inner workings of rt_mutex_slowlock() out into a separate function, which can be reused by the upcoming RT lock substitutions, e.g. for rw_semaphores. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.841971086@linutronix.de
2021-08-17locking/rtmutex: Split out the inner parts of 'struct rtmutex'Peter Zijlstra
RT builds substitute rwsem, mutex, spinlock and rwlock with wrappers around rtmutexes. Split the inner workings out so each lock substitution can use them with the appropriate lockdep annotations. This avoids having an extra unused lockdep map in the wrapped rtmutex. No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
2021-08-17locking/rtmutex: Split API from implementationThomas Gleixner
Prepare for reusing the inner functions of rtmutex for RT lock substitutions: introduce kernel/locking/rtmutex_api.c and move them there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.726560996@linutronix.de
2021-08-17locking/rtmutex: Switch from cmpxchg_*() to try_cmpxchg_*()Thomas Gleixner
Allows the compiler to generate better code depending on the architecture. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.668958502@linutronix.de
2021-08-17locking/rtmutex: Convert macros to inlinesSebastian Andrzej Siewior
Inlines are type-safe... Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.610830960@linutronix.de
2021-08-17locking/rtmutex: Set proper wait context for lockdepThomas Gleixner
RT mutexes belong to the LD_WAIT_SLEEP class. Make them so. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.031014562@linutronix.de
2021-08-10locking/rtmutex: Use the correct rtmutex debugging config optionZhen Lei
It's CONFIG_DEBUG_RT_MUTEXES not CONFIG_DEBUG_RT_MUTEX. Fixes: f7efc4799f81 ("locking/rtmutex: Inline chainwalk depth check") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Boqun Feng <boqun.feng@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210731123011.4555-1-thunder.leizhen@huawei.com
2021-06-18sched: Change task_struct::statePeter Zijlstra
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
2021-03-29locking/rtmutex: Clean up signal handling in __rt_mutex_slowlock()Thomas Gleixner
The signal handling in __rt_mutex_slowlock() is open coded. Use signal_pending_state() instead. Aside of the cleanup this also prepares for the RT lock substituions which require support for TASK_KILLABLE. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153944.533811987@linutronix.de
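A sketch of the effect in the slowlock wait loop (simplified; signal_pending_state() also covers TASK_KILLABLE, which the RT substitutions need):

    if (timeout && !timeout->task) {
        ret = -ETIMEDOUT;
        break;
    }

    /* Replaces the open-coded signal_pending()/state check. */
    if (signal_pending_state(state, current)) {
        ret = -EINTR;
        break;
    }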
2021-03-29locking/rtmutex: Restrict the trylock WARN_ON() to debugThomas Gleixner
The warning as written is expensive and not really required for a production kernel. Make it depend on rt mutex debugging and use !in_task() for the condition which generates far better code and gives the same answer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153944.436565064@linutronix.de
2021-03-29locking/rtmutex: Fix misleading comment in rt_mutex_postunlock()Thomas Gleixner
Preemption is disabled in mark_wakeup_next_waiter(), not in rt_mutex_slowunlock(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153944.341734608@linutronix.de
2021-03-29locking/rtmutex: Consolidate the fast/slowpath invocationThomas Gleixner
The indirection via a function pointer (which is at least optimized into a tail call by the compiler) is making the code hard to read. Clean it up and move the futex related trylock functions down to the futex section. Move the wake_q wakeup into rt_mutex_slowunlock(). No point in handing it to the caller. The futex code uses a different function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153944.247927548@linutronix.de
2021-03-29locking/rtmutex: Make text section and inlining consistentThomas Gleixner
rtmutex is half __sched and the other half is not. If the compiler decides to not inline larger static functions then part of the code ends up in the regular text section. There are also quite some performance related small helpers which are either static or plain inline. Force inline those which make sense and mark the rest __sched. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153944.152977820@linutronix.de
2021-03-29locking/rtmutex: Decrapify __rt_mutex_init()Thomas Gleixner
The conditional debug handling is just another layer of obfuscation. Split the function so rt_mutex_init_proxy_locked() can invoke the inner init and __rt_mutex_init() gets the full treatment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153943.955697588@linutronix.de
2021-03-29locking/rtmutex: Inline chainwalk depth checkThomas Gleixner
There is no point for this wrapper at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153943.754254046@linutronix.de
2021-03-29locking/rtmutex: Move rt_mutex_debug_task_free() to rtmutex.cThomas Gleixner
Prepare for removing the header maze. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153943.646359691@linutronix.de
2021-03-29locking/rtmutex: Remove empty and unused debug stubsThomas Gleixner
Either unused or useless, and therefore just ballast. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153943.549192485@linutronix.de
2021-03-29locking/rtmutex: Remove output from deadlock detectorSebastian Andrzej Siewior
The rtmutex specific deadlock detector predates lockdep coverage of rtmutex and since commit f5694788ad8da ("rt_mutex: Add lockdep annotations") it contains a lot of redundant functionality:

 - lockdep will detect a potential deadlock before rtmutex-debug has a chance to do so

 - the deadlock debugging is restricted to rtmutexes which are not associated to futexes and have an active waiter, which is covered by lockdep already

Remove the redundant functionality and move the actual deadlock WARN() into the deadlock code path. The latter needs a separate cleanup. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153943.320398604@linutronix.de
2021-03-29locking/rtmutex: Remove rtmutex deadlock tester leftoversSebastian Andrzej Siewior
The following debug members of 'struct rtmutex' are unused:

 - save_state: No users

 - file,line: Printed if ::name is NULL. This is only used for non-futex locks so ::name is never NULL

 - magic: Assigned to NULL by rt_mutex_destroy(), no further usage

Remove them along with unused inline and macro leftovers related to the long gone deadlock tester. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153943.195064296@linutronix.de
2021-03-29locking/rtmutex: Remove rt_mutex_timed_lock()Sebastian Andrzej Siewior
rt_mutex_timed_lock() has no callers since: c051b21f71d1f ("rtmutex: Confine deadlock logic to futex") Remove it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210326153943.061103415@linutronix.de
2021-03-22locking: Fix typos in commentsIngo Molnar
Fix ~16 single-word typos in locking code comments. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>