Diffstat (limited to 'Documentation/scheduler')
 -rw-r--r--  Documentation/scheduler/sched-deadline.rst | 85
 -rw-r--r--  Documentation/scheduler/sched-ext.rst      | 25
 -rw-r--r--  Documentation/scheduler/sched-stats.rst    | 55
 3 files changed, 105 insertions(+), 60 deletions(-)
diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index a727827b8dd5..ec543a12f848 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -20,7 +20,8 @@ Deadline Task Scheduling
4.3 Default behavior
4.4 Behavior of sched_yield()
5. Tasks CPU affinity
- 5.1 SCHED_DEADLINE and cpusets HOWTO
+ 5.1 Using cgroup v1 cpuset controller
+ 5.2 Using cgroup v2 cpuset controller
6. Future plans
A. Test suite
B. Minimal main()
@@ -671,15 +672,17 @@ Deadline Task Scheduling
5. Tasks CPU affinity
=====================
- -deadline tasks cannot have an affinity mask smaller that the entire
- root_domain they are created on. However, affinities can be specified
- through the cpuset facility (Documentation/admin-guide/cgroup-v1/cpusets.rst).
+ Deadline tasks cannot have a cpu affinity mask smaller than the root domain they
+ are created on. So, using ``sched_setaffinity(2)`` won't work. Instead, the
+ deadline task should be created in a restricted root domain. This can be
+ done using the cpuset controller of either cgroup v1 (deprecated) or cgroup v2.
+ See :ref:`Documentation/admin-guide/cgroup-v1/cpusets.rst <cpusets>` and
+ :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2>` for more information.
-5.1 SCHED_DEADLINE and cpusets HOWTO
-------------------------------------
+5.1 Using cgroup v1 cpuset controller
+-------------------------------------
- An example of a simple configuration (pin a -deadline task to CPU0)
- follows (rt-app is used to create a -deadline task)::
+ An example of a simple configuration (pin a -deadline task to CPU0) follows::
mkdir /dev/cpuset
mount -t cgroup -o cpuset cpuset /dev/cpuset
@@ -692,8 +695,20 @@ Deadline Task Scheduling
echo 1 > cpu0/cpuset.cpu_exclusive
echo 1 > cpu0/cpuset.mem_exclusive
echo $$ > cpu0/tasks
- rt-app -t 100000:10000:d:0 -D5 # it is now actually superfluous to specify
- # task affinity
+ chrt --sched-runtime 100000 --sched-period 200000 --deadline 0 yes > /dev/null
+
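+ The task created by ``chrt`` above inherits the shell's affinity and so is
+ confined to CPU0; it is now superfluous to also set the task's affinity.
+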
+5.2 Using cgroup v2 cpuset controller
+-------------------------------------
+
+ Assuming the cgroup v2 root is mounted at ``/sys/fs/cgroup``::
+
+  cd /sys/fs/cgroup
+  echo '+cpuset' > cgroup.subtree_control            # enable the cpuset controller
+  mkdir deadline_group
+  echo 0 > deadline_group/cpuset.cpus                # restrict the group to CPU0
+  echo 'root' > deadline_group/cpuset.cpus.partition # turn it into a root domain
+  echo $$ > deadline_group/cgroup.procs              # move this shell into the group
+  chrt --sched-runtime 100000 --sched-period 200000 --deadline 0 yes > /dev/null
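+
+ Instead of ``chrt``, the confined shell can also start a deadline task
+ programmatically with ``sched_setattr()``, as in the minimal main() of
+ Appendix B. A bare sketch with the same parameters as the ``chrt`` examples
+ above, omitting error handling and assuming the uapi headers export
+ ``struct sched_attr`` (otherwise define it by hand as in Appendix B)::
+
+  #define _GNU_SOURCE
+  #include <linux/sched.h>            /* SCHED_DEADLINE */
+  #include <linux/sched/types.h>      /* struct sched_attr */
+  #include <string.h>
+  #include <sys/syscall.h>
+  #include <unistd.h>
+
+  int main(void)
+  {
+          struct sched_attr attr;
+
+          memset(&attr, 0, sizeof(attr));
+          attr.size = sizeof(attr);
+          attr.sched_policy = SCHED_DEADLINE;
+          /* all values are in nanoseconds: run 100us every 200us */
+          attr.sched_runtime  = 100000;
+          attr.sched_deadline = 200000;   /* defaults to the period */
+          attr.sched_period   = 200000;
+
+          /* no glibc wrapper exists; this may fail with EPERM if the
+           * task's affinity does not span its whole root domain */
+          if (syscall(SYS_sched_setattr, 0, &attr, 0))
+                  return 1;
+
+          for (;;)
+                  ;       /* CPU hog, throttled by the deadline scheduler */
+  }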
6. Future plans
===============
@@ -731,24 +746,38 @@ Appendix A. Test suite
behaves under such workloads. In this way, results are easily reproducible.
rt-app is available at: https://github.com/scheduler-tools/rt-app.
- Thread parameters can be specified from the command line, with something like
- this::
-
- # rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5
-
- The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
- executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
- priority 10, executes for 20ms every 150ms. The test will run for a total
- of 5 seconds.
-
- More interestingly, configurations can be described with a json file that
- can be passed as input to rt-app with something like this::
-
- # rt-app my_config.json
-
- The parameters that can be specified with the second method are a superset
- of the command line options. Please refer to rt-app documentation for more
- details (`<rt-app-sources>/doc/*.json`).
+ rt-app does not accept thread parameters on the command line; instead it
+ reads them from a JSON configuration file. Here is an example ``config.json``:
+
+ .. code-block:: json
+
+ {
+ "tasks": {
+ "dl_task": {
+ "policy": "SCHED_DEADLINE",
+ "priority": 0,
+ "dl-runtime": 10000,
+ "dl-period": 100000,
+ "dl-deadline": 100000
+ },
+ "fifo_task": {
+ "policy": "SCHED_FIFO",
+ "priority": 10,
+ "runtime": 20000,
+ "sleep": 130000
+ }
+ },
+ "global": {
+ "duration": 5
+ }
+ }
+
+ Running ``rt-app config.json`` creates 2 threads. The first one,
+ scheduled by SCHED_DEADLINE, executes for 10ms every 100ms. The second one,
+ scheduled at SCHED_FIFO priority 10, executes for 20ms every 150ms. The test
+ will run for a total of 5 seconds.
+
+ Please refer to the rt-app documentation for the JSON schema and more examples.
 The second testing application uses chrt, which has support
 for SCHED_DEADLINE.
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 0b2654e2164b..404fe6126a76 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -1,3 +1,5 @@
+.. _sched-ext:
+
==========================
Extensible Scheduler Class
==========================
@@ -47,8 +49,8 @@ options should be enabled to use sched_ext:
sched_ext is used only when the BPF scheduler is loaded and running.
If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
-treated as ``SCHED_NORMAL`` and scheduled by CFS until the BPF scheduler is
-loaded.
+treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until the
+BPF scheduler is loaded.
When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
@@ -57,11 +59,11 @@ in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
-``SCHED_IDLE`` policies are scheduled by CFS.
+``SCHED_IDLE`` policies are scheduled by the fair-class scheduler.
Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
-BPF scheduler and reverts all tasks back to CFS.
+BPF scheduler and reverts all tasks back to the fair-class scheduler.
.. code-block:: none
@@ -197,8 +199,8 @@ Dispatch Queues
To match the impedance between the scheduler core and the BPF scheduler,
sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
-and one local dsq per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
-an arbitrary number of dsq's using ``scx_bpf_create_dsq()`` and
+and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
+an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.
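+
+As a sketch of how these pieces fit together, a minimal scheduler in the style
+of the bundled scx_simple example might create one shared DSQ at init time and
+feed every CPU from it. The kfunc names below (``scx_bpf_dsq_insert()``,
+``scx_bpf_dsq_move_to_local()``) are the ones used by recent kernels and are
+assumptions on older ones:
+
+.. code-block:: c
+
+    #define SHARED_DSQ 0
+
+    s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
+    {
+            /* one shared FIFO DSQ, allocated on any NUMA node (-1) */
+            return scx_bpf_create_dsq(SHARED_DSQ, -1);
+    }
+
+    void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
+    {
+            /* queue the task on the shared DSQ with the default slice */
+            scx_bpf_dsq_insert(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags);
+    }
+
+    void BPF_STRUCT_OPS(simple_dispatch, s32 cpu, struct task_struct *prev)
+    {
+            /* refill this CPU's local DSQ from the shared one */
+            scx_bpf_dsq_move_to_local(SHARED_DSQ);
+    }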
A CPU always executes a task from its local DSQ. A task is "inserted" into a
@@ -311,16 +313,21 @@ by a sched_ext scheduler:
ops.runnable(); /* Task becomes ready to run */
while (task is runnable) {
- if (task is not in a DSQ) {
+ if (task is not in a DSQ && task->scx.slice == 0) {
ops.enqueue(); /* Task can be added to a DSQ */
- /* A CPU becomes available */
+ /* Any usable CPU becomes available */
ops.dispatch(); /* Task is moved to a local DSQ */
}
ops.running(); /* Task starts running on its assigned CPU */
- ops.tick(); /* Called every 1/HZ seconds */
+ while (task->scx.slice > 0 && task is runnable)
+ ops.tick(); /* Called every 1/HZ seconds */
ops.stopping(); /* Task stops running (time slice expires or wait) */
+
+ /* Task's CPU becomes available */
+
+ ops.dispatch(); /* task->scx.slice can be refilled */
}
ops.quiescent(); /* Task releases its assigned CPU (wait) */
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 08b6bc9a315c..9d6a337755f4 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -86,13 +86,16 @@ Domain statistics
-----------------
One of these is produced per domain for each cpu described. (Note that if
CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output. <name> is an extension to the domain field
-that prints the name of the corresponding sched domain. It can appear in
-schedstat version 17 and above.
+will not appear in the output.)
domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
-The first field is a bit mask indicating what cpus this domain operates over.
+The <name> field is the name of the sched domain and is only reported from
+schedstat version 17 onwards. On previous versions, <cpumask> is the first
+field.
+
+The <cpumask> field is a bit mask indicating what cpus this domain operates
+over. For example, a line beginning ``domain0 MC 0f`` describes a domain
+named MC that spans CPUs 0-3.
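+
+For a quick look at only the per-domain statistics, the relevant lines can be
+filtered out of ``/proc/schedstat``; a minimal sketch::
+
+    #include <stdio.h>
+    #include <string.h>
+
+    int main(void)
+    {
+            char line[4096];
+            FILE *f = fopen("/proc/schedstat", "r");
+
+            if (!f)
+                    return 1;
+            /* per-domain statistics lines start with "domain" */
+            while (fgets(line, sizeof(line), f))
+                    if (!strncmp(line, "domain", 6))
+                            fputs(line, stdout);
+            fclose(f);
+            return 0;
+    }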
The next 33 are a variety of sched_balance_rq() statistics grouped into types
of idleness (busy, idle and newly idle):
@@ -103,12 +106,13 @@ of idleness (busy, idle and newly idle):
load did not require balancing when busy
3) # of times in this domain sched_balance_rq() tried to move one or
more tasks and failed, when the cpu was busy
- 4) Total imbalance in load when the cpu was busy
- 5) Total imbalance in utilization when the cpu was busy
- 6) Total imbalance in number of tasks when the cpu was busy
- 7) Total imbalance due to misfit tasks when the cpu was busy
- 8) # of times in this domain pull_task() was called when busy
- 9) # of times in this domain pull_task() was called even though the
+ 4) Total imbalance in load in this domain when the cpu was busy
+ 5) Total imbalance in utilization in this domain when the cpu was busy
+ 6) Total imbalance in number of tasks in this domain when the cpu was busy
+ 7) Total imbalance due to misfit tasks in this domain when the cpu was
+ busy
+ 8) # of times in this domain detach_task() was called when busy
+ 9) # of times in this domain detach_task() was called even though the
target task was cache-hot when busy
10) # of times in this domain sched_balance_rq() was called but did not
find a busier queue while the cpu was busy
@@ -121,13 +125,14 @@ of idleness (busy, idle and newly idle):
the load did not require balancing when the cpu was idle
14) # of times in this domain sched_balance_rq() tried to move one or
more tasks and failed, when the cpu was idle
- 15) Total imbalance in load when the cpu was idle
- 16) Total imbalance in utilization when the cpu was idle
- 17) Total imbalance in number of tasks when the cpu was idle
- 18) Total imbalance due to misfit tasks when the cpu was idle
- 19) # of times in this domain pull_task() was called when the cpu
+ 15) Total imbalance in load in this domain when the cpu was idle
+ 16) Total imbalance in utilization in this domain when the cpu was idle
+ 17) Total imbalance in number of tasks in this domain when the cpu was idle
+ 18) Total imbalance due to misfit tasks in this domain when the cpu was
+ idle
+ 19) # of times in this domain detach_task() was called when the cpu
was idle
- 20) # of times in this domain pull_task() was called even though
+ 20) # of times in this domain detach_task() was called even though
the target task was cache-hot when idle
21) # of times in this domain sched_balance_rq() was called but did
not find a busier queue while the cpu was idle
@@ -135,17 +140,21 @@ of idleness (busy, idle and newly idle):
cpu was idle but no busier group was found
23) # of times in this domain sched_balance_rq() was called when the
- was just becoming idle
+ cpu was just becoming idle
24) # of times in this domain sched_balance_rq() checked but found the
load did not require balancing when the cpu was just becoming idle
25) # of times in this domain sched_balance_rq() tried to move one or more
tasks and failed, when the cpu was just becoming idle
- 26) Total imbalance in load when the cpu was just becoming idle
- 27) Total imbalance in utilization when the cpu was just becoming idle
- 28) Total imbalance in number of tasks when the cpu was just becoming idle
- 29) Total imbalance due to misfit tasks when the cpu was just becoming idle
- 30) # of times in this domain pull_task() was called when newly idle
- 31) # of times in this domain pull_task() was called even though the
+ 26) Total imbalance in load in this domain when the cpu was just becoming
+ idle
+ 27) Total imbalance in utilization in this domain when the cpu was just
+ becoming idle
+ 28) Total imbalance in number of tasks in this domain when the cpu was just
+ becoming idle
+ 29) Total imbalance due to misfit tasks in this domain when the cpu was
+ just becoming idle
+ 30) # of times in this domain detach_task() was called when newly idle
+ 31) # of times in this domain detach_task() was called even though the
target task was cache-hot when just becoming idle
32) # of times in this domain sched_balance_rq() was called but did not
find a busier queue while the cpu was just becoming idle