summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-03-30 17:01:51 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-03-30 17:01:51 -0700
commit642e53ead6aea8740a219ede509a5d138fd4f780 (patch)
tree5c4680d0c07315dab24fe7333c62f56bc19ec4e4 /Documentation
parent9b82f05f869a823d43ea4186f5f732f2924d3693 (diff)
parent313f16e2e35abb833eab5bdebc6ae30699adca18 (diff)
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are: - Various NUMA scheduling updates: harmonize the load-balancer and NUMA placement logic to not work against each other. The intended result is better locality, better utilization and fewer migrations. - Introduce Thermal Pressure tracking and optimizations, to improve task placement on thermally overloaded systems. - Implement frequency invariant scheduler accounting on (some) x86 CPUs. This is done by observing and sampling the 'recent' CPU frequency average at ~tick boundaries. The CPU provides this data via the APERF/MPERF MSRs. This hopefully makes our capacity estimates more precise and keeps tasks on the same CPU better even if it might seem overloaded at a lower momentary frequency. (As usual, turbo mode is a complication that we resolve by observing the maximum frequency and renormalizing to it.) - Add asymmetric CPU capacity wakeup scan to improve capacity utilization on asymmetric topologies. (big.LITTLE systems) - PSI fixes and optimizations. - RT scheduling capacity awareness fixes & improvements. - Optimize the CONFIG_RT_GROUP_SCHED constraints code. - Misc fixes, cleanups and optimizations - see the changelog for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) threads: Update PID limit comment according to futex UAPI change sched/fair: Fix condition of avg_load calculation sched/rt: cpupri_find: Trigger a full search as fallback kthread: Do not preempt current task if it is going to call schedule() sched/fair: Improve spreading of utilization sched: Avoid scale real weight down to zero psi: Move PF_MEMSTALL out of task->flags MAINTAINERS: Add maintenance information for psi psi: Optimize switching tasks inside shared cgroups psi: Fix cpu.pressure for cpu.max and competing cgroups sched/core: Distribute tasks within affinity masks sched/fair: Fix enqueue_task_fair warning thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code sched/rt: Remove unnecessary push for unfit tasks sched/rt: Allow pulling unfitting task sched/rt: Optimize cpupri_find() on non-heterogenous systems sched/rt: Re-instate old behavior in select_task_rq_rt() sched/rt: cpupri_find: Implement fallback mechanism for !fit case sched/fair: Fix reordering of enqueue/dequeue_task_fair() sched/fair: Fix runnable_avg for throttled cfs ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt16
-rw-r--r--Documentation/robust-futex-ABI.txt14
2 files changed, 22 insertions, 8 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 74e75a7b4b24..9024fb2dc418 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4428,6 +4428,22 @@
incurs a small amount of overhead in the scheduler
but is useful for debugging and performance tuning.
+ sched_thermal_decay_shift=
+ [KNL, SMP] Set a decay shift for scheduler thermal
+ pressure signal. Thermal pressure signal follows the
+ default decay period of other scheduler pelt
+ signals(usually 32 ms but configurable). Setting
+ sched_thermal_decay_shift will left shift the decay
+ period for the thermal pressure signal by the shift
+ value.
+ i.e. with the default pelt decay period of 32 ms
+ sched_thermal_decay_shift thermal pressure decay pr
+ 1 64 ms
+ 2 128 ms
+ and so on.
+ Format: integer between 0 and 10
+ Default is 0.
+
skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
diff --git a/Documentation/robust-futex-ABI.txt b/Documentation/robust-futex-ABI.txt
index 8a5d34abf726..f24904f1c16f 100644
--- a/Documentation/robust-futex-ABI.txt
+++ b/Documentation/robust-futex-ABI.txt
@@ -61,8 +61,8 @@ setup that list.
address of the associated 'lock entry', plus or minus, of what will
be called the 'lock word', from that 'lock entry'. The 'lock word'
is always a 32 bit word, unlike the other words above. The 'lock
- word' holds 3 flag bits in the upper 3 bits, and the thread id (TID)
- of the thread holding the lock in the bottom 29 bits. See further
+ word' holds 2 flag bits in the upper 2 bits, and the thread id (TID)
+ of the thread holding the lock in the bottom 30 bits. See further
below for a description of the flag bits.
The third word, called 'list_op_pending', contains transient copy of
@@ -128,7 +128,7 @@ that thread's robust_futex linked lock list a given time.
A given futex lock structure in a user shared memory region may be held
at different times by any of the threads with access to that region. The
thread currently holding such a lock, if any, is marked with the threads
-TID in the lower 29 bits of the 'lock word'.
+TID in the lower 30 bits of the 'lock word'.
When adding or removing a lock from its list of held locks, in order for
the kernel to correctly handle lock cleanup regardless of when the task
@@ -141,7 +141,7 @@ On insertion:
1) set the 'list_op_pending' word to the address of the 'lock entry'
to be inserted,
2) acquire the futex lock,
- 3) add the lock entry, with its thread id (TID) in the bottom 29 bits
+ 3) add the lock entry, with its thread id (TID) in the bottom 30 bits
of the 'lock word', to the linked list starting at 'head', and
4) clear the 'list_op_pending' word.
@@ -155,7 +155,7 @@ On removal:
On exit, the kernel will consider the address stored in
'list_op_pending' and the address of each 'lock word' found by walking
-the list starting at 'head'. For each such address, if the bottom 29
+the list starting at 'head'. For each such address, if the bottom 30
bits of the 'lock word' at offset 'offset' from that address equals the
exiting threads TID, then the kernel will do two things:
@@ -180,7 +180,5 @@ any point:
future kernel configuration changes) elements.
When the kernel sees a list entry whose 'lock word' doesn't have the
-current threads TID in the lower 29 bits, it does nothing with that
+current threads TID in the lower 30 bits, it does nothing with that
entry, and goes on to the next entry.
-
-Bit 29 (0x20000000) of the 'lock word' is reserved for future use.