Diffstat (limited to 'Documentation/scheduler')
-rw-r--r--   Documentation/scheduler/sched-deadline.rst | 85
-rw-r--r--   Documentation/scheduler/sched-ext.rst      | 25
-rw-r--r--   Documentation/scheduler/sched-stats.rst    | 55
3 files changed, 105 insertions, 60 deletions
diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index a727827b8dd5..ec543a12f848 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -20,7 +20,8 @@ Deadline Task Scheduling
    4.3 Default behavior
    4.4 Behavior of sched_yield()
  5. Tasks CPU affinity
-  5.1 SCHED_DEADLINE and cpusets HOWTO
+  5.1 Using cgroup v1 cpuset controller
+  5.2 Using cgroup v2 cpuset controller
  6. Future plans
  A. Test suite
  B. Minimal main()
@@ -671,15 +672,17 @@ Deadline Task Scheduling
 5. Tasks CPU affinity
 =====================
 
- -deadline tasks cannot have an affinity mask smaller that the entire
- root_domain they are created on. However, affinities can be specified
- through the cpuset facility (Documentation/admin-guide/cgroup-v1/cpusets.rst).
+ Deadline tasks cannot have a cpu affinity mask smaller than the root domain
+ they are created on. So, using ``sched_setaffinity(2)`` won't work. Instead,
+ the deadline task should be created in a restricted root domain. This can be
+ done using the cpuset controller of either cgroup v1 (deprecated) or cgroup v2.
+ See :ref:`Documentation/admin-guide/cgroup-v1/cpusets.rst <cpusets>` and
+ :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2>` for more information.
 
-5.1 SCHED_DEADLINE and cpusets HOWTO
-------------------------------------
+5.1 Using cgroup v1 cpuset controller
+-------------------------------------
 
- An example of a simple configuration (pin a -deadline task to CPU0)
- follows (rt-app is used to create a -deadline task)::
+ An example of a simple configuration (pin a -deadline task to CPU0) follows::
 
   mkdir /dev/cpuset
   mount -t cgroup -o cpuset cpuset /dev/cpuset
@@ -692,8 +695,20 @@ Deadline Task Scheduling
   echo 1 > cpu0/cpuset.cpu_exclusive
   echo 1 > cpu0/cpuset.mem_exclusive
   echo $$ > cpu0/tasks
-  rt-app -t 100000:10000:d:0 -D5  # it is now actually superfluous to specify
-                                  # task affinity
+  chrt --sched-runtime 100000 --sched-period 200000 --deadline 0 yes > /dev/null
+
+5.2 Using cgroup v2 cpuset controller
+-------------------------------------
+
+ Assuming the cgroup v2 root is mounted at ``/sys/fs/cgroup``::
+
+  cd /sys/fs/cgroup
+  echo '+cpuset' > cgroup.subtree_control
+  mkdir deadline_group
+  echo 0 > deadline_group/cpuset.cpus
+  echo 'root' > deadline_group/cpuset.cpus.partition
+  echo $$ > deadline_group/cgroup.procs
+  chrt --sched-runtime 100000 --sched-period 200000 --deadline 0 yes > /dev/null
 
 6. Future plans
 ===============
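The two ``chrt`` invocations above boil down to a single ``sched_setattr(2)`` call. A
minimal sketch of that call follows; the hand-rolled ``struct sched_attr`` mirrors the
one in Appendix B (older libcs do not expose it), and the runtime/period values are the
same nanosecond figures passed to ``chrt``:

.. code-block:: c

   #define _GNU_SOURCE
   #include <stdint.h>
   #include <stdio.h>
   #include <unistd.h>
   #include <sys/syscall.h>

   #define SCHED_DEADLINE 6

   /* Hand-rolled as in Appendix B; older libcs lack this definition. */
   struct sched_attr {
           uint32_t size;
           uint32_t sched_policy;
           uint64_t sched_flags;
           int32_t  sched_nice;
           uint32_t sched_priority;
           /* SCHED_DEADLINE parameters, all in nanoseconds */
           uint64_t sched_runtime;
           uint64_t sched_deadline;
           uint64_t sched_period;
   };

   int main(void)
   {
           struct sched_attr attr = {
                   .size           = sizeof(attr),
                   .sched_policy   = SCHED_DEADLINE,
                   .sched_runtime  = 100000,  /* 100 us, as passed to chrt */
                   .sched_deadline = 200000,  /* chrt defaults this to the period */
                   .sched_period   = 200000,  /* 200 us */
           };

           /* No glibc wrapper exists; invoke the raw syscall. */
           if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
                   perror("sched_setattr");
                   return 1;
           }

           for (;;)
                   ;  /* burn CPU under deadline scheduling, like 'yes' above */
   }

Run it from a shell already moved into ``cpu0`` (or ``deadline_group``) so that
admission control sees the restricted root domain.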
@@ -731,24 +746,38 @@ Appendix A. Test suite
   behaves under such workloads. In this way, results are easily reproducible.
   rt-app is available at: https://github.com/scheduler-tools/rt-app.
 
-  Thread parameters can be specified from the command line, with something like
-  this::
-
-  # rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5
-
-  The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
-  executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
-  priority 10, executes for 20ms every 150ms. The test will run for a total
-  of 5 seconds.
-
-  More interestingly, configurations can be described with a json file that
-  can be passed as input to rt-app with something like this::
-
-  # rt-app my_config.json
-
-  The parameters that can be specified with the second method are a superset
-  of the command line options. Please refer to rt-app documentation for more
-  details (`<rt-app-sources>/doc/*.json`).
+  rt-app does not accept command line arguments, and instead reads from a JSON
+  configuration file. Here is an example ``config.json``:
+
+  .. code-block:: json
+
+    {
+        "tasks": {
+            "dl_task": {
+                "policy": "SCHED_DEADLINE",
+                "priority": 0,
+                "dl-runtime": 10000,
+                "dl-period": 100000,
+                "dl-deadline": 100000
+            },
+            "fifo_task": {
+                "policy": "SCHED_FIFO",
+                "priority": 10,
+                "runtime": 20000,
+                "sleep": 130000
+            }
+        },
+        "global": {
+            "duration": 5
+        }
+    }
+
+  On running ``rt-app config.json``, it creates 2 threads. The first one,
+  scheduled by SCHED_DEADLINE, executes for 10ms every 100ms. The second one,
+  scheduled at SCHED_FIFO priority 10, executes for 20ms every 150ms. The test
+  will run for a total of 5 seconds.
+
+  Please refer to the rt-app documentation for the JSON schema and more examples.
 
   The second testing application is done using chrt which has support for
   SCHED_DEADLINE.
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 0b2654e2164b..404fe6126a76 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -1,3 +1,5 @@
+.. _sched-ext:
+
 ==========================
 Extensible Scheduler Class
 ==========================
@@ -47,8 +49,8 @@ options should be enabled to use sched_ext:
 sched_ext is used only when the BPF scheduler is loaded and running.
 
 If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
-treated as ``SCHED_NORMAL`` and scheduled by CFS until the BPF scheduler is
-loaded.
+treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until the
+BPF scheduler is loaded.
 
 When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
 in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
@@ -57,11 +59,11 @@ in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
 However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
 set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
 by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
-``SCHED_IDLE`` policies are scheduled by CFS.
+``SCHED_IDLE`` policies are scheduled by the fair-class scheduler.
 
 Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
 detection of any internal error including stalled runnable tasks aborts the
-BPF scheduler and reverts all tasks back to CFS.
+BPF scheduler and reverts all tasks back to the fair-class scheduler.
 
 .. code-block:: none
 
@@ -197,8 +199,8 @@ Dispatch Queues
 To match the impedance between the scheduler core and the BPF scheduler,
 sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
 priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
-and one local dsq per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
-an arbitrary number of dsq's using ``scx_bpf_create_dsq()`` and
+and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
+an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
 ``scx_bpf_destroy_dsq()``. A CPU always executes a task from its local DSQ.
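To sketch how a BPF scheduler drives these DSQs, here is a minimal global-FIFO
enqueue callback, loosely modeled on the in-tree ``scx_simple`` example
(``tools/sched_ext/scx_simple.bpf.c``). It assumes a recent kernel where the
insertion kfunc is named ``scx_bpf_dsq_insert()`` (older trees call it
``scx_bpf_dispatch()``); operations left unimplemented fall back to sched_ext's
defaults:

.. code-block:: c

   #include <scx/common.bpf.h>

   char _license[] SEC("license") = "GPL";

   /*
    * Put every runnable task on the built-in global FIFO with the default
    * slice; CPUs whose local DSQs run dry then pull work from it.
    */
   void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
   {
           scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
   }

   SEC(".struct_ops.link")
   struct sched_ext_ops simple_ops = {
           .enqueue = (void *)simple_enqueue,
           .name    = "simple_fifo",
   };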
A task is "inserted" into a @@ -311,16 +313,21 @@ by a sched_ext scheduler: ops.runnable(); /* Task becomes ready to run */ while (task is runnable) { - if (task is not in a DSQ) { + if (task is not in a DSQ && task->scx.slice == 0) { ops.enqueue(); /* Task can be added to a DSQ */ - /* A CPU becomes available */ + /* Any usable CPU becomes available */ ops.dispatch(); /* Task is moved to a local DSQ */ } ops.running(); /* Task starts running on its assigned CPU */ - ops.tick(); /* Called every 1/HZ seconds */ + while (task->scx.slice > 0 && task is runnable) + ops.tick(); /* Called every 1/HZ seconds */ ops.stopping(); /* Task stops running (time slice expires or wait) */ + + /* Task's CPU becomes available */ + + ops.dispatch(); /* task->scx.slice can be refilled */ } ops.quiescent(); /* Task releases its assigned CPU (wait) */ diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst index 08b6bc9a315c..9d6a337755f4 100644 --- a/Documentation/scheduler/sched-stats.rst +++ b/Documentation/scheduler/sched-stats.rst @@ -86,13 +86,16 @@ Domain statistics ----------------- One of these is produced per domain for each cpu described. (Note that if CONFIG_SMP is not defined, *no* domains are utilized and these lines -will not appear in the output. <name> is an extension to the domain field -that prints the name of the corresponding sched domain. It can appear in -schedstat version 17 and above. +will not appear in the output.) domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 -The first field is a bit mask indicating what cpus this domain operates over. +The <name> field prints the name of the sched domain and is only supported +with schedstat version >= 17. On previous versions, <cpumask> is the first +field. + +The <cpumask> field is a bit mask indicating what cpus this domain operates +over. 
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 08b6bc9a315c..9d6a337755f4 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -86,13 +86,16 @@ Domain statistics
 -----------------
 One of these is produced per domain for each cpu described. (Note that if
 CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output. <name> is an extension to the domain field
-that prints the name of the corresponding sched domain. It can appear in
-schedstat version 17 and above.
+will not appear in the output.)
 
 domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
 
-The first field is a bit mask indicating what cpus this domain operates over.
+The <name> field prints the name of the sched domain and is only supported
+with schedstat version >= 17. On previous versions, <cpumask> is the first
+field.
+
+The <cpumask> field is a bit mask indicating what cpus this domain operates
+over.
 
 The next 33 are a variety of sched_balance_rq() statistics in grouped into types
 of idleness (busy, idle and newly idle):
@@ -103,12 +106,13 @@ of idleness (busy, idle and newly idle):
     load did not require balancing when busy
  3) # of times in this domain sched_balance_rq() tried to move one or more
     tasks and failed, when the cpu was busy
- 4) Total imbalance in load when the cpu was busy
- 5) Total imbalance in utilization when the cpu was busy
- 6) Total imbalance in number of tasks when the cpu was busy
- 7) Total imbalance due to misfit tasks when the cpu was busy
- 8) # of times in this domain pull_task() was called when busy
- 9) # of times in this domain pull_task() was called even though the
+ 4) Total imbalance in load in this domain when the cpu was busy
+ 5) Total imbalance in utilization in this domain when the cpu was busy
+ 6) Total imbalance in number of tasks in this domain when the cpu was busy
+ 7) Total imbalance due to misfit tasks in this domain when the cpu was
+    busy
+ 8) # of times in this domain detach_task() was called when busy
+ 9) # of times in this domain detach_task() was called even though the
     target task was cache-hot when busy
 10) # of times in this domain sched_balance_rq() was called but did not
     find a busier queue while the cpu was busy
@@ -121,13 +125,14 @@ of idleness (busy, idle and newly idle):
     the load did not require balancing when the cpu was idle
 14) # of times in this domain sched_balance_rq() tried to move one or more
     tasks and failed, when the cpu was idle
-15) Total imbalance in load when the cpu was idle
-16) Total imbalance in utilization when the cpu was idle
-17) Total imbalance in number of tasks when the cpu was idle
-18) Total imbalance due to misfit tasks when the cpu was idle
-19) # of times in this domain pull_task() was called when the cpu
+15) Total imbalance in load in this domain when the cpu was idle
+16) Total imbalance in utilization in this domain when the cpu was idle
+17) Total imbalance in number of tasks in this domain when the cpu was idle
+18) Total imbalance due to misfit tasks in this domain when the cpu was
+    idle
+19) # of times in this domain detach_task() was called when the cpu
     was idle
-20) # of times in this domain pull_task() was called even though
+20) # of times in this domain detach_task() was called even though
     the target task was cache-hot when idle
 21) # of times in this domain sched_balance_rq() was called but did not
     find a busier queue while the cpu was idle
@@ -135,17 +140,21 @@
     cpu was idle but no busier group was found
 
 23) # of times in this domain sched_balance_rq() was called when the
-    was just becoming idle
+    cpu was just becoming idle
 24) # of times in this domain sched_balance_rq() checked but found the
     load did not require balancing when the cpu was just becoming idle
 25) # of times in this domain sched_balance_rq() tried to move one or more
     tasks and failed, when the cpu was just becoming idle
-26) Total imbalance in load when the cpu was just becoming idle
-27) Total imbalance in utilization when the cpu was just becoming idle
-28) Total imbalance in number of tasks when the cpu was just becoming idle
-29) Total imbalance due to misfit tasks when the cpu was just becoming idle
-30) # of times in this domain pull_task() was called when newly idle
-31) # of times in this domain pull_task() was called even though the
+26) Total imbalance in load in this domain when the cpu was just becoming
+    idle
+27) Total imbalance in utilization in this domain when the cpu was just
+    becoming idle
+28) Total imbalance in number of tasks in this domain when the cpu was just
+    becoming idle
+29) Total imbalance due to misfit tasks in this domain when the cpu was
+    just becoming idle
+30) # of times in this domain detach_task() was called when newly idle
+31) # of times in this domain detach_task() was called even though the
+    target task was cache-hot when just becoming idle
 32) # of times in this domain sched_balance_rq() was called but did not
     find a busier queue while the cpu was just becoming idle
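To make the numbering concrete, here is a small sketch that pulls the three
``detach_task()`` counters (fields 8, 19 and 30 above) out of each domain line;
it assumes the schedstat version >= 17 layout, i.e. ``domain<N> <name>
<cpumask>`` followed by the numbered fields:

.. code-block:: c

   #include <stdio.h>
   #include <string.h>

   int main(void)
   {
           char line[4096];
           FILE *f = fopen("/proc/schedstat", "r");

           if (!f) {
                   perror("/proc/schedstat");
                   return 1;
           }

           while (fgets(line, sizeof(line), f)) {
                   char dom[64], name[64], mask[256];
                   unsigned long long v[45] = { 0 };
                   int off, n;

                   /* Only domain<N> lines; cpu<N> lines have a different layout. */
                   if (sscanf(line, "%63s %63s %255s%n", dom, name, mask, &off) != 3 ||
                       strncmp(dom, "domain", 6) != 0)
                           continue;

                   for (int i = 0, pos = off; i < 45; i++, pos += n)
                           if (sscanf(line + pos, "%llu%n", &v[i], &n) != 1)
                                   break;

                   /* Fields are numbered from 1 in the list above; v[] from 0. */
                   printf("%s %s: detach_task busy=%llu idle=%llu newidle=%llu\n",
                          dom, name, v[7], v[18], v[29]);
           }

           fclose(f);
           return 0;
   }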