author Mauro Carvalho Chehab <mchehab+huawei@kernel.org> 2020-05-01 17:37:54 +0200
committer Jonathan Corbet <corbet@lwn.net> 2020-05-15 12:05:07 -0600
commit 95ca6d73a8a97ba343082746dbf935863b76375a
tree 5c7514627a4f4fa5d1b34783cf35b83354f4f2d6 /Documentation/futex-requeue-pi.txt
parent 9184027f0aaf6c95856bb57d04d0fa0b16fd9981
docs: move locking-specific documents to locking/
Several files under Documentation/*.txt describe some type of locking API.
Move them to locking/ subdir and add to the locking/index.rst index file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/dd833a10bbd0b2c1461d78913f5ec28a7e27f00b.1588345503.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/futex-requeue-pi.txt')
-rw-r--r-- Documentation/futex-requeue-pi.txt 132
1 files changed, 0 insertions, 132 deletions
diff --git a/Documentation/futex-requeue-pi.txt b/Documentation/futex-requeue-pi.txt
deleted file mode 100644
index 14ab5787b9a7..000000000000
--- a/Documentation/futex-requeue-pi.txt
+++ /dev/null
@@ -1,132 +0,0 @@
-================
-Futex Requeue PI
-================
-
-Requeueing of tasks from a non-PI futex to a PI futex requires
-special handling in order to ensure the underlying rt_mutex is never
-left without an owner if it has waiters; doing so would break the PI
-boosting logic [see rt-mutex-design.txt]. For brevity, this
-action will be referred to as "requeue_pi" throughout this
-document. Priority inheritance is abbreviated throughout as
-"PI".
-
-Motivation
-----------
-
-Without requeue_pi, the glibc implementation of
-pthread_cond_broadcast() must resort to waking all the tasks waiting
-on a pthread_condvar and letting them try to sort out which task
-gets to run first in classic thundering-herd formation. An ideal
-implementation would wake the highest-priority waiter, and leave the
-rest to the natural wakeup inherent in unlocking the mutex
-associated with the condvar.
-
-Consider the simplified glibc calls::
-
-    /* caller must lock mutex */
-    pthread_cond_wait(cond, mutex)
-    {
-        lock(cond->__data.__lock);
-        unlock(mutex);
-        do {
-            unlock(cond->__data.__lock);
-            futex_wait(cond->__data.__futex);
-            lock(cond->__data.__lock);
-        } while (...);
-        unlock(cond->__data.__lock);
-        lock(mutex);
-    }
-
-    pthread_cond_broadcast(cond)
-    {
-        lock(cond->__data.__lock);
-        unlock(cond->__data.__lock);
-        futex_requeue(cond->__data.__futex, cond->mutex);
-    }
-
-Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
-has waiters. Note that pthread_cond_wait() attempts to lock the
-mutex only after it has returned to user space. This will leave the
-underlying rt_mutex with waiters, and no owner, breaking the
-previously mentioned PI-boosting algorithms.
-
-In order to support PI-aware pthread_condvars, the kernel needs to
-be able to requeue tasks to PI futexes. This support implies that
-upon a successful futex_wait system call, the caller would return to
-user space already holding the PI futex. The glibc implementation
-would be modified as follows::
-
-
-    /* caller must lock mutex */
-    pthread_cond_wait_pi(cond, mutex)
-    {
-        lock(cond->__data.__lock);
-        unlock(mutex);
-        do {
-            unlock(cond->__data.__lock);
-            futex_wait_requeue_pi(cond->__data.__futex);
-            lock(cond->__data.__lock);
-        } while (...);
-        unlock(cond->__data.__lock);
-        /* the kernel acquired the mutex for us */
-    }
-
-    pthread_cond_broadcast_pi(cond)
-    {
-        lock(cond->__data.__lock);
-        unlock(cond->__data.__lock);
-        futex_requeue_pi(cond->__data.__futex, cond->mutex);
-    }
-
-The actual glibc implementation will likely test for PI and make the
-necessary changes inside the existing calls rather than creating new
-calls for the PI cases. Similar changes are needed for
-pthread_cond_timedwait() and pthread_cond_signal().
-
-Implementation
---------------
-
-In order to ensure the rt_mutex has an owner if it has waiters, both
-the requeue code and the waiting code must be able to acquire the
-rt_mutex before returning to user space.
-The requeue code cannot simply wake the waiter and leave it to
-acquire the rt_mutex as it would open a race window between the
-requeue call returning to user space and the waiter waking and
-starting to run. This is especially true in the uncontended case.
-
-The solution involves two new rt_mutex helper routines,
-rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
-allow the requeue code to acquire an uncontended rt_mutex on behalf
-of the waiter and to enqueue the waiter on a contended rt_mutex.
-Two new futex operations provide the kernel<->user interface to
-requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
-
-FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
-and pthread_cond_timedwait()) to block on the initial futex and wait
-to be requeued to a PI-aware futex. The implementation is the
-result of a high-speed collision between futex_wait() and
-futex_lock_pi(), with some extra logic to check for the additional
-wake-up scenarios.
-
-FUTEX_CMP_REQUEUE_PI is called by the waker
-(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
-possibly wake the waiting tasks. Internally, this operation is
-still handled by futex_requeue() (by passing requeue_pi=1). Before
-requeueing, futex_requeue() attempts to acquire the requeue target
-PI futex on behalf of the top waiter. If it can, this waiter is
-woken. futex_requeue() then proceeds to requeue the remaining
-nr_wake+nr_requeue tasks to the PI futex, calling
-rt_mutex_start_proxy_lock() prior to each requeue to prepare the
-task as a waiter on the underlying rt_mutex. It is possible that
-the lock can be acquired at this stage as well; if so, the next
-waiter is woken to finish the acquisition of the lock.
-
-FUTEX_CMP_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
-their sum is all that really matters. futex_requeue() will wake or
-requeue up to nr_wake + nr_requeue tasks. It will wake only as many
-tasks as it can acquire the lock for, which in the majority of cases
-should be 0 as good programming practice dictates that the caller of
-either pthread_cond_broadcast() or pthread_cond_signal() acquire the
-mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that
-nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
-signal.
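
The argument rules described in the removed document can be sketched from the user-space side. This is a minimal illustration and not glibc's implementation: it assumes Linux, the raw futex(2) system call reached through syscall(), and the helper names requeue_pi_nr_requeue(), futex_wait_requeue_pi() and futex_cmp_requeue_pi() are hypothetical, chosen only to mirror the pseudocode above.

```c
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: nr_requeue is INT_MAX for broadcast and 0 for
 * signal, as the document states.
 */
static int requeue_pi_nr_requeue(int broadcast)
{
	return broadcast ? INT_MAX : 0;
}

/*
 * Hypothetical waiter-side wrapper: block on the condvar futex until
 * requeued to (and granted) the PI futex at pi_mutex. The timeout, if
 * not NULL, is an absolute time.
 */
static inline long futex_wait_requeue_pi(uint32_t *cond_futex,
					 uint32_t expected,
					 const struct timespec *abs_timeout,
					 uint32_t *pi_mutex)
{
	return syscall(SYS_futex, cond_futex, FUTEX_WAIT_REQUEUE_PI,
		       (unsigned long)expected, abs_timeout, pi_mutex, 0UL);
}

/*
 * Hypothetical waker-side wrapper: requeue waiters from the condvar
 * futex to the PI futex. nr_wake is fixed at 1 because
 * FUTEX_CMP_REQUEUE_PI requires it; nr_requeue travels in the slot
 * that other operations use for the timeout pointer.
 */
static inline long futex_cmp_requeue_pi(uint32_t *cond_futex,
					uint32_t expected,
					uint32_t *pi_mutex, int broadcast)
{
	const unsigned long nr_wake = 1;
	unsigned long nr_requeue =
		(unsigned long)requeue_pi_nr_requeue(broadcast);

	/* Source and target futexes must differ. */
	assert(cond_futex != pi_mutex);

	return syscall(SYS_futex, cond_futex, FUTEX_CMP_REQUEUE_PI, nr_wake,
		       nr_requeue, pi_mutex, (unsigned long)expected);
}
```

A waker modelled on pthread_cond_signal() would pass broadcast=0, and one modelled on pthread_cond_broadcast() would pass broadcast=1. Error handling, and the retry loop around the wait call, are left out of this sketch.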