summaryrefslogtreecommitdiff
path: root/Documentation/scheduler/sched-ext.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/scheduler/sched-ext.rst')
-rw-r--r--Documentation/scheduler/sched-ext.rst70
1 files changed, 56 insertions, 14 deletions
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 6cb8b676ce03..404fe6126a76 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -1,3 +1,5 @@
+.. _sched-ext:
+
==========================
Extensible Scheduler Class
==========================
@@ -16,12 +18,12 @@ programs - the BPF scheduler.
* The system integrity is maintained no matter what the BPF scheduler does.
The default scheduling behavior is restored anytime an error is detected,
a runnable task stalls, or on invoking the SysRq key sequence
- :kbd:`SysRq-S`.
+ `SysRq-S`.
* When the BPF scheduler triggers an error, debug information is dumped to
aid debugging. The debug dump is passed to and printed out by the
scheduler binary. The debug dump can also be accessed through the
- `sched_ext_dump` tracepoint. The SysRq key sequence :kbd:`SysRq-D`
+ `sched_ext_dump` tracepoint. The SysRq key sequence `SysRq-D`
triggers a debug dump. This doesn't terminate the BPF scheduler and can
only be read through the tracepoint.
@@ -47,8 +49,8 @@ options should be enabled to use sched_ext:
sched_ext is used only when the BPF scheduler is loaded and running.
If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
-treated as ``SCHED_NORMAL`` and scheduled by CFS until the BPF scheduler is
-loaded.
+treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until the
+BPF scheduler is loaded.
When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
@@ -57,11 +59,11 @@ in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
-``SCHED_IDLE`` policies are scheduled by CFS.
+``SCHED_IDLE`` policies are scheduled by the fair-class scheduler.
-Terminating the sched_ext scheduler program, triggering :kbd:`SysRq-S`, or
+Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
-BPF scheduler and reverts all tasks back to CFS.
+BPF scheduler and reverts all tasks back to the fair-class scheduler.
.. code-block:: none
@@ -107,8 +109,7 @@ detailed information:
nr_rejected : 0
enable_seq : 1
-If ``CONFIG_SCHED_DEBUG`` is set, whether a given task is on sched_ext can
-be determined as follows:
+Whether a given task is on sched_ext can be determined as follows:
.. code-block:: none
@@ -198,8 +199,8 @@ Dispatch Queues
To match the impedance between the scheduler core and the BPF scheduler,
sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
-and one local dsq per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
-an arbitrary number of dsq's using ``scx_bpf_create_dsq()`` and
+and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
+an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.
A CPU always executes a task from its local DSQ. A task is "inserted" into a
@@ -242,9 +243,9 @@ The following briefly shows how a waking task is scheduled and executed.
task was inserted directly from ``ops.select_cpu()``). ``ops.enqueue()``
can make one of the following decisions:
- * Immediately insert the task into either the global or local DSQ by
- calling ``scx_bpf_dsq_insert()`` with ``SCX_DSQ_GLOBAL`` or
- ``SCX_DSQ_LOCAL``, respectively.
+ * Immediately insert the task into either the global or a local DSQ by
+ calling ``scx_bpf_dsq_insert()`` with one of the following options:
+ ``SCX_DSQ_GLOBAL``, ``SCX_DSQ_LOCAL``, or ``SCX_DSQ_LOCAL_ON | cpu``.
* Immediately insert the task into a custom DSQ by calling
``scx_bpf_dsq_insert()`` with a DSQ ID which is smaller than 2^63.
@@ -294,6 +295,47 @@ dispatching, and must be dispatched to with ``scx_bpf_dsq_insert()``. See
the function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c``
for more information.
+Task Lifecycle
+--------------
+
+The following pseudo-code summarizes the entire lifecycle of a task managed
+by a sched_ext scheduler:
+
+.. code-block:: c
+
+ ops.init_task(); /* A new task is created */
+ ops.enable(); /* Enable BPF scheduling for the task */
+
+ while (task in SCHED_EXT) {
+ if (task can migrate)
+ ops.select_cpu(); /* Called on wakeup (optimization) */
+
+ ops.runnable(); /* Task becomes ready to run */
+
+ while (task is runnable) {
+ if (task is not in a DSQ && task->scx.slice == 0) {
+ ops.enqueue(); /* Task can be added to a DSQ */
+
+ /* Any usable CPU becomes available */
+
+ ops.dispatch(); /* Task is moved to a local DSQ */
+ }
+ ops.running(); /* Task starts running on its assigned CPU */
+ while (task->scx.slice > 0 && task is runnable)
+ ops.tick(); /* Called every 1/HZ seconds */
+ ops.stopping(); /* Task stops running (time slice expires or wait) */
+
+ /* Task's CPU becomes available */
+
+ ops.dispatch(); /* task->scx.slice can be refilled */
+ }
+
+ ops.quiescent(); /* Task releases its assigned CPU (wait) */
+ }
+
+ ops.disable(); /* Disable BPF scheduling for the task */
+ ops.exit_task(); /* Task is destroyed */
+
Where to Look
=============