Diffstat (limited to 'Documentation/scheduler')
 -rw-r--r--  Documentation/scheduler/sched-deadline.rst | 85
 -rw-r--r--  Documentation/scheduler/sched-ext.rst      | 25
 -rw-r--r--  Documentation/scheduler/sched-stats.rst    | 55
 3 files changed, 105 insertions(+), 60 deletions(-)
diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index a727827b8dd5..ec543a12f848 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -20,7 +20,8 @@ Deadline Task Scheduling
4.3 Default behavior
4.4 Behavior of sched_yield()
5. Tasks CPU affinity
- 5.1 SCHED_DEADLINE and cpusets HOWTO
+ 5.1 Using cgroup v1 cpuset controller
+ 5.2 Using cgroup v2 cpuset controller
6. Future plans
A. Test suite
B. Minimal main()
@@ -671,15 +672,17 @@ Deadline Task Scheduling
5. Tasks CPU affinity
=====================
- -deadline tasks cannot have an affinity mask smaller that the entire
- root_domain they are created on. However, affinities can be specified
- through the cpuset facility (Documentation/admin-guide/cgroup-v1/cpusets.rst).
+ Deadline tasks cannot have a cpu affinity mask smaller than the root domain they
+ are created on. So, using ``sched_setaffinity(2)`` won't work. Instead, the
+ deadline task should be created in a restricted root domain. This can be
+ done using the cpuset controller of either cgroup v1 (deprecated) or cgroup v2.
+ See :ref:`Documentation/admin-guide/cgroup-v1/cpusets.rst <cpusets>` and
+ :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2>` for more information.
-5.1 SCHED_DEADLINE and cpusets HOWTO
-------------------------------------
+5.1 Using cgroup v1 cpuset controller
+-------------------------------------
- An example of a simple configuration (pin a -deadline task to CPU0)
- follows (rt-app is used to create a -deadline task)::
+ An example of a simple configuration (pin a -deadline task to CPU0) follows::
mkdir /dev/cpuset
mount -t cgroup -o cpuset cpuset /dev/cpuset
@@ -692,8 +695,20 @@ Deadline Task Scheduling
echo 1 > cpu0/cpuset.cpu_exclusive
echo 1 > cpu0/cpuset.mem_exclusive
echo $$ > cpu0/tasks
- rt-app -t 100000:10000:d:0 -D5 # it is now actually superfluous to specify
- # task affinity
+ chrt --sched-runtime 100000 --sched-period 200000 --deadline 0 yes > /dev/null
+
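+ The task created by ``chrt`` above inherits the shell's affinity and so is
+ confined to CPU0; it is now superfluous to also set the task's affinity.
+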
+5.2 Using cgroup v2 cpuset controller
+-------------------------------------
+
+ Assuming the cgroup v2 root is mounted at ``/sys/fs/cgroup``::
+
+  cd /sys/fs/cgroup
+  echo '+cpuset' > cgroup.subtree_control            # enable the cpuset controller
+  mkdir deadline_group
+  echo 0 > deadline_group/cpuset.cpus                # restrict the group to CPU0
+  echo 'root' > deadline_group/cpuset.cpus.partition # turn it into a root domain
+  echo $$ > deadline_group/cgroup.procs              # move this shell into the group
+  chrt --sched-runtime 100000 --sched-period 200000 --deadline 0 yes > /dev/null
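+
+ Instead of ``chrt``, the confined shell can also start a deadline task
+ programmatically with ``sched_setattr()``, as in the minimal main() of
+ Appendix B. A bare sketch with the same parameters as the ``chrt`` examples
+ above, omitting error handling and assuming the uapi headers export
+ ``struct sched_attr`` (otherwise define it by hand as in Appendix B)::
+
+  #define _GNU_SOURCE
+  #include <linux/sched.h>            /* SCHED_DEADLINE */
+  #include <linux/sched/types.h>      /* struct sched_attr */
+  #include <string.h>
+  #include <sys/syscall.h>
+  #include <unistd.h>
+
+  int main(void)
+  {
+          struct sched_attr attr;
+
+          memset(&attr, 0, sizeof(attr));
+          attr.size = sizeof(attr);
+          attr.sched_policy = SCHED_DEADLINE;
+          /* all values are in nanoseconds: run 100us every 200us */
+          attr.sched_runtime  = 100000;
+          attr.sched_deadline = 200000;   /* defaults to the period */
+          attr.sched_period   = 200000;
+
+          /* no glibc wrapper exists; this may fail with EPERM if the
+           * task's affinity does not span its whole root domain */
+          if (syscall(SYS_sched_setattr, 0, &attr, 0))
+                  return 1;
+
+          for (;;)
+                  ;       /* CPU hog, throttled by the deadline scheduler */
+  }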
6. Future plans
===============
@@ -731,24 +746,38 @@ Appendix A. Test suite
behaves under such workloads. In this way, results are easily reproducible.
rt-app is available at: https://github.com/scheduler-tools/rt-app.
- Thread parameters can be specified from the command line, with something like
- this::
-
- # rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5
-
- The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
- executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
- priority 10, executes for 20ms every 150ms. The test will run for a total
- of 5 seconds.
-
- More interestingly, configurations can be described with a json file that
- can be passed as input to rt-app with something like this::
-
- # rt-app my_config.json
-
- The parameters that can be specified with the second method are a superset
- of the command line options. Please refer to rt-app documentation for more
- details (`<rt-app-sources>/doc/*.json`).
+ rt-app does not accept thread parameters on the command line; instead it
+ reads them from a JSON configuration file. Here is an example ``config.json``:
+
+ .. code-block:: json
+
+ {
+ "tasks": {
+ "dl_task": {
+ "policy": "SCHED_DEADLINE",
+ "priority": 0,
+ "dl-runtime": 10000,
+ "dl-period": 100000,
+ "dl-deadline": 100000
+ },
+ "fifo_task": {
+ "policy": "SCHED_FIFO",
+ "priority": 10,
+ "runtime": 20000,
+ "sleep": 130000
+ }
+ },
+ "global": {
+ "duration": 5
+ }
+ }
+
+ Running ``rt-app config.json`` creates 2 threads. The first one,
+ scheduled by SCHED_DEADLINE, executes for 10ms every 100ms. The second one,
+ scheduled at SCHED_FIFO priority 10, executes for 20ms every 150ms. The test
+ will run for a total of 5 seconds.
+
+ Please refer to the rt-app documentation for the JSON schema and more examples.
 The second testing application uses chrt, which has support
 for SCHED_DEADLINE.
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 0b2654e2164b..404fe6126a76 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -1,3 +1,5 @@
+.. _sched-ext:
+
==========================
Extensible Scheduler Class
==========================
@@ -47,8 +49,8 @@ options should be enabled to use sched_ext:
sched_ext is used only when the BPF scheduler is loaded and running.
If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
-treated as ``SCHED_NORMAL`` and scheduled by CFS until the BPF scheduler is
-loaded.
+treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until the
+BPF scheduler is loaded.
When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
@@ -57,11 +59,11 @@ in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
-``SCHED_IDLE`` policies are scheduled by CFS.
+``SCHED_IDLE`` policies are scheduled by the fair-class scheduler.
Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
-BPF scheduler and reverts all tasks back to CFS.
+BPF scheduler and reverts all tasks back to the fair-class scheduler.
.. code-block:: none
@@ -197,8 +199,8 @@ Dispatch Queues
To match the impedance between the scheduler core and the BPF scheduler,
sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
-and one local dsq per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
-an arbitrary number of dsq's using ``scx_bpf_create_dsq()`` and
+and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
+an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.
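+
+As a sketch of how these pieces fit together, a minimal scheduler in the style
+of the bundled scx_simple example might create one shared DSQ at init time and
+feed every CPU from it. The kfunc names below (``scx_bpf_dsq_insert()``,
+``scx_bpf_dsq_move_to_local()``) are the ones used by recent kernels and are
+assumptions on older ones:
+
+.. code-block:: c
+
+    #define SHARED_DSQ 0
+
+    s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
+    {
+            /* one shared FIFO DSQ, allocated on any NUMA node (-1) */
+            return scx_bpf_create_dsq(SHARED_DSQ, -1);
+    }
+
+    void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
+    {
+            /* queue the task on the shared DSQ with the default slice */
+            scx_bpf_dsq_insert(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags);
+    }
+
+    void BPF_STRUCT_OPS(simple_dispatch, s32 cpu, struct task_struct *prev)
+    {
+            /* refill this CPU's local DSQ from the shared one */
+            scx_bpf_dsq_move_to_local(SHARED_DSQ);
+    }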
A CPU always executes a task from its local DSQ. A task is "inserted" into a
@@ -311,16 +313,21 @@ by a sched_ext scheduler:
ops.runnable(); /* Task becomes ready to run */
while (task is runnable) {
- if (task is not in a DSQ) {
+ if (task is not in a DSQ && task->scx.slice == 0) {
ops.enqueue(); /* Task can be added to a DSQ */
- /* A CPU becomes available */
+ /* Any usable CPU becomes available */
ops.dispatch(); /* Task is moved to a local DSQ */
}
ops.running(); /* Task starts running on its assigned CPU */
- ops.tick(); /* Called every 1/HZ seconds */
+ while (task->scx.slice > 0 && task is runnable)
+ ops.tick(); /* Called every 1/HZ seconds */
ops.stopping(); /* Task stops running (time slice expires or wait) */
+
+ /* Task's CPU becomes available */
+
+ ops.dispatch(); /* task->scx.slice can be refilled */
}
ops.quiescent(); /* Task releases its assigned CPU (wait) */
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 08b6bc9a315c..9d6a337755f4 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -86,13 +86,16 @@ Domain statistics
-----------------
One of these is produced per domain for each cpu described. (Note that if
CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output. <name> is an extension to the domain field
-that prints the name of the corresponding sched domain. It can appear in
-schedstat version 17 and above.
+will not appear in the output.)
domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
-The first field is a bit mask indicating what cpus this domain operates over.
+The <name> field is the name of the sched domain and is only reported from
+schedstat version 17 onwards. On previous versions, <cpumask> is the first
+field.
+
+The <cpumask> field is a bit mask indicating what cpus this domain operates
+over. For example, a line beginning ``domain0 MC 0f`` describes a domain
+named MC that spans CPUs 0-3.
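+
+For a quick look at only the per-domain statistics, the relevant lines can be
+filtered out of ``/proc/schedstat``; a minimal sketch::
+
+    #include <stdio.h>
+    #include <string.h>
+
+    int main(void)
+    {
+            char line[4096];
+            FILE *f = fopen("/proc/schedstat", "r");
+
+            if (!f)
+                    return 1;
+            /* per-domain statistics lines start with "domain" */
+            while (fgets(line, sizeof(line), f))
+                    if (!strncmp(line, "domain", 6))
+                            fputs(line, stdout);
+            fclose(f);
+            return 0;
+    }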
The next 33 are a variety of sched_balance_rq() statistics grouped into types
of idleness (busy, idle and newly idle):
@@ -103,12 +106,13 @@ of idleness (busy, idle and newly idle):
load did not require balancing when busy
3) # of times in this domain sched_balance_rq() tried to move one or
more tasks and failed, when the cpu was busy
- 4) Total imbalance in load when the cpu was busy
- 5) Total imbalance in utilization when the cpu was busy
- 6) Total imbalance in number of tasks when the cpu was busy
- 7) Total imbalance due to misfit tasks when the cpu was busy
- 8) # of times in this domain pull_task() was called when busy
- 9) # of times in this domain pull_task() was called even though the
+ 4) Total imbalance in load in this domain when the cpu was busy
+ 5) Total imbalance in utilization in this domain when the cpu was busy
+ 6) Total imbalance in number of tasks in this domain when the cpu was busy
+ 7) Total imbalance due to misfit tasks in this domain when the cpu was
+ busy
+ 8) # of times in this domain detach_task() was called when busy
+ 9) # of times in this domain detach_task() was called even though the
target task was cache-hot when busy
10) # of times in this domain sched_balance_rq() was called but did not
find a busier queue while the cpu was busy
@@ -121,13 +125,14 @@ of idleness (busy, idle and newly idle):
the load did not require balancing when the cpu was idle
14) # of times in this domain sched_balance_rq() tried to move one or
more tasks and failed, when the cpu was idle
- 15) Total imbalance in load when the cpu was idle
- 16) Total imbalance in utilization when the cpu was idle
- 17) Total imbalance in number of tasks when the cpu was idle
- 18) Total imbalance due to misfit tasks when the cpu was idle
- 19) # of times in this domain pull_task() was called when the cpu
+ 15) Total imbalance in load in this domain when the cpu was idle
+ 16) Total imbalance in utilization in this domain when the cpu was idle
+ 17) Total imbalance in number of tasks in this domain when the cpu was idle
+ 18) Total imbalance due to misfit tasks in this domain when the cpu was
+ idle
+ 19) # of times in this domain detach_task() was called when the cpu
was idle
- 20) # of times in this domain pull_task() was called even though
+ 20) # of times in this domain detach_task() was called even though
the target task was cache-hot when idle
21) # of times in this domain sched_balance_rq() was called but did
not find a busier queue while the cpu was idle
@@ -135,17 +140,21 @@ of idleness (busy, idle and newly idle):
cpu was idle but no busier group was found
23) # of times in this domain sched_balance_rq() was called when the
- was just becoming idle
+ cpu was just becoming idle
24) # of times in this domain sched_balance_rq() checked but found the
load did not require balancing when the cpu was just becoming idle
25) # of times in this domain sched_balance_rq() tried to move one or more
tasks and failed, when the cpu was just becoming idle
- 26) Total imbalance in load when the cpu was just becoming idle
- 27) Total imbalance in utilization when the cpu was just becoming idle
- 28) Total imbalance in number of tasks when the cpu was just becoming idle
- 29) Total imbalance due to misfit tasks when the cpu was just becoming idle
- 30) # of times in this domain pull_task() was called when newly idle
- 31) # of times in this domain pull_task() was called even though the
+ 26) Total imbalance in load in this domain when the cpu was just becoming
+ idle
+ 27) Total imbalance in utilization in this domain when the cpu was just
+ becoming idle
+ 28) Total imbalance in number of tasks in this domain when the cpu was just
+ becoming idle
+ 29) Total imbalance due to misfit tasks in this domain when the cpu was
+ just becoming idle
+ 30) # of times in this domain detach_task() was called when newly idle
+ 31) # of times in this domain detach_task() was called even though the
target task was cache-hot when just becoming idle
32) # of times in this domain sched_balance_rq() was called but did not
find a busier queue while the cpu was just becoming idle