From d6a3b247627a3bc0551504eb305d624cc6fb5453 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Wed, 12 Jun 2019 14:53:03 -0300
Subject: docs: scheduler: convert docs to ReST and rename to *.rst

In order to prepare to add them to the Kernel API book,
convert the files to ReST format.

The conversion is actually:
  - add blank lines and identation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Jonathan Corbet
---
 Documentation/scheduler/sched-rt-group.txt | 183 -----------------------------
 1 file changed, 183 deletions(-)
 delete mode 100644 Documentation/scheduler/sched-rt-group.txt

diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
deleted file mode 100644
index d8fce3e78457..000000000000
--- a/Documentation/scheduler/sched-rt-group.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-                        Real-Time group scheduling
-                        --------------------------
-
-CONTENTS
-========
-
-0. WARNING
-1. Overview
-  1.1 The problem
-  1.2 The solution
-2. The interface
-  2.1 System-wide settings
-  2.2 Default behaviour
-  2.3 Basis for grouping tasks
-3. Future plans
-
-
-0. WARNING
-==========
-
- Fiddling with these settings can result in an unstable system, the knobs are
- root only and assumes root knows what he is doing.
-
-Most notable:
-
- * very small values in sched_rt_period_us can result in an unstable
-   system when the period is smaller than either the available hrtimer
-   resolution, or the time it takes to handle the budget refresh itself.
-
- * very small values in sched_rt_runtime_us can result in an unstable
-   system when the runtime is so small the system has difficulty making
-   forward progress (NOTE: the migration thread and kstopmachine both
-   are real-time processes).
-
-1. Overview
-===========
-
-
-1.1 The problem
----------------
-
-Realtime scheduling is all about determinism, a group has to be able to rely on
-the amount of bandwidth (eg. CPU time) being constant. In order to schedule
-multiple groups of realtime tasks, each group must be assigned a fixed portion
-of the CPU time available. Without a minimum guarantee a realtime group can
-obviously fall short. A fuzzy upper limit is of no use since it cannot be
-relied upon. Which leaves us with just the single fixed portion.
-
-1.2 The solution
-----------------
-
-CPU time is divided by means of specifying how much time can be spent running
-in a given period. We allocate this "run time" for each realtime group which
-the other realtime groups will not be permitted to use.
-
-Any time not allocated to a realtime group will be used to run normal priority
-tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
-SCHED_OTHER.
-
-Let's consider an example: a frame fixed realtime renderer must deliver 25
-frames a second, which yields a period of 0.04s per frame. Now say it will also
-have to play some music and respond to input, leaving it with around 80% CPU
-time dedicated for the graphics. We can then give this group a run time of 0.8
-* 0.04s = 0.032s.
-
-This way the graphics group will have a 0.04s period with a 0.032s run time
-limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
-needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
-0.00015s. So this group can be scheduled with a period of 0.005s and a run time
-of 0.00015s.
-
-The remaining CPU time will be used for user input and other tasks.
-Because realtime tasks have explicitly allocated the CPU time they need to perform
-their tasks, buffer underruns in the graphics or audio can be eliminated.
-
-NOTE: the above example is not fully implemented yet. We still
-lack an EDF scheduler to make non-uniform periods usable.
-
-
-2. The Interface
-================
-
-
-2.1 System wide settings
-------------------------
-
-The system wide settings are configured under the /proc virtual file system:
-
-/proc/sys/kernel/sched_rt_period_us:
-  The scheduling period that is equivalent to 100% CPU bandwidth
-
-/proc/sys/kernel/sched_rt_runtime_us:
-  A global limit on how much time realtime scheduling may use. Even without
-  CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
-  processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
-  available to all realtime groups.
-
-  * Time is specified in us because the interface is s32. This gives an
-    operating range from 1us to about 35 minutes.
-  * sched_rt_period_us takes values from 1 to INT_MAX.
-  * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
-  * A run time of -1 specifies runtime == period, ie. no limit.
-
-
-2.2 Default behaviour
----------------------
-
-The default values for sched_rt_period_us (1000000 or 1s) and
-sched_rt_runtime_us (950000 or 0.95s). This gives 0.05s to be used by
-SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away
-realtime tasks will not lock up the machine but leave a little time to recover
-it. By setting runtime to -1 you'd get the old behaviour back.
-
-By default all bandwidth is assigned to the root group and new groups get the
-period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
-want to assign bandwidth to another group, reduce the root group's bandwidth
-and assign some or all of the difference to another group.
-
-Realtime group scheduling means you have to assign a portion of total CPU
-bandwidth to the group before it will accept realtime tasks. Therefore you will
-not be able to run realtime tasks as any user other than root until you have
-done that, even if the user has the rights to run processes with realtime
-priority!
-
-
-2.3 Basis for grouping tasks
-----------------------------
-
-Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
-CPU bandwidth to task groups.
-
-This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
-to control the CPU time reserved for each control group.
-
-For more information on working with control groups, you should read
-Documentation/cgroup-v1/cgroups.txt as well.
-
-Group settings are checked against the following limits in order to keep the
-configuration schedulable:
-
-   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
-
-For now, this can be simplified to just the following (but see Future plans):
-
-   \Sum_{i} runtime_{i} <= global_runtime
-
-
-3. Future plans
-===============
-
-There is work in progress to make the scheduling period for each group
-("<cgroup>/cpu.rt_period_us") configurable as well.
-
-The constraint on the period is that a subgroup must have a smaller or
-equal period to its parent. But realistically its not very useful _yet_
-as its prone to starvation without deadline scheduling.
-
-Consider two sibling groups A and B; both have 50% bandwidth, but A's
-period is twice the length of B's.
-
-* group A: period=100000us, runtime=50000us
-        - this runs for 0.05s once every 0.1s
-
-* group B: period= 50000us, runtime=25000us
-        - this runs for 0.025s twice every 0.1s (or once every 0.05 sec).
-
-This means that currently a while (1) loop in A will run for the full period of
-B and can starve B's tasks (assuming they are of lower priority) for a whole
-period.
-
-The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
-full deadline scheduling to the linux kernel. Deadline scheduling the above
-groups and treating end of the period as a deadline will ensure that they both
-get their allocated time.
-
-Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
-the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-99. With deadline scheduling you need to
-do deadline inheritance (since priority is inversely proportional to the
-deadline delta (deadline - now)).
-
-This means the whole PI machinery will have to be reworked - and that is one of
-the most complex pieces of code we have.
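
Reader's note (not part of the patch): the arithmetic in section 1.2 and the
simplified admission check in section 2.3 of the removed text can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: RtGroup, runtime_for and is_schedulable are hypothetical names
made up for this note, not kernel interfaces, and every group is assumed to
share the global period, as the simplified formula requires.

```python
# Illustrative sketch only: the names here are hypothetical, not kernel APIs.
from dataclasses import dataclass

US_PER_SEC = 1_000_000


@dataclass(frozen=True)
class RtGroup:
    """A realtime group with a period and a run time, both in microseconds."""
    name: str
    period_us: int
    runtime_us: int


def runtime_for(period_s: float, cpu_share: float) -> int:
    """Run time in microseconds for a group needing cpu_share of each period."""
    return round(period_s * cpu_share * US_PER_SEC)


def is_schedulable(groups, global_period_us: int, global_runtime_us: int) -> bool:
    """Simplified check from section 2.3: Sum_i runtime_i <= global_runtime.

    Assumes every group uses the global period. A global run time of -1
    means runtime == period, i.e. no limit below the full period.
    """
    if global_runtime_us == -1:
        global_runtime_us = global_period_us
    return sum(g.runtime_us for g in groups) <= global_runtime_us


# Section 1.2 example: graphics gets 80% of a 0.04s period,
# audio gets 3% of a 0.005s period.
graphics_runtime_us = runtime_for(0.04, 0.80)
audio_runtime_us = runtime_for(0.005, 0.03)

# Section 2.2 defaults: period 1000000us, runtime 950000us.
groups = [
    RtGroup("a", 1_000_000, 500_000),
    RtGroup("b", 1_000_000, 400_000),
]
fits = is_schedulable(groups, 1_000_000, 950_000)
too_much = is_schedulable(
    groups + [RtGroup("c", 1_000_000, 100_000)], 1_000_000, 950_000
)
```

With the default limits, groups "a" and "b" together request 900000us and
fit; adding the hypothetical group "c" pushes the total to 1000000us, which
exceeds the 950000us global run time and would be rejected by this check.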