diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-03-30 17:01:51 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-03-30 17:01:51 -0700 |
commit | 642e53ead6aea8740a219ede509a5d138fd4f780 (patch) | |
tree | 5c4680d0c07315dab24fe7333c62f56bc19ec4e4 /Documentation | |
parent | 9b82f05f869a823d43ea4186f5f732f2924d3693 (diff) | |
parent | 313f16e2e35abb833eab5bdebc6ae30699adca18 (diff) |
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- Various NUMA scheduling updates: harmonize the load-balancer and
NUMA placement logic to not work against each other. The intended
result is better locality, better utilization and fewer migrations.
- Introduce Thermal Pressure tracking and optimizations, to improve
task placement on thermally overloaded systems.
- Implement frequency invariant scheduler accounting on (some) x86
CPUs. This is done by observing and sampling the 'recent' CPU
frequency average at ~tick boundaries. The CPU provides this data
via the APERF/MPERF MSRs. This hopefully makes our capacity
estimates more precise and keeps tasks on the same CPU better even
if it might seem overloaded at a lower momentary frequency. (As
usual, turbo mode is a complication that we resolve by observing
the maximum frequency and renormalizing to it.)
- Add asymmetric CPU capacity wakeup scan to improve capacity
utilization on asymmetric topologies. (big.LITTLE systems)
- PSI fixes and optimizations.
- RT scheduling capacity awareness fixes & improvements.
- Optimize the CONFIG_RT_GROUP_SCHED constraints code.
- Misc fixes, cleanups and optimizations - see the changelog for
details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
threads: Update PID limit comment according to futex UAPI change
sched/fair: Fix condition of avg_load calculation
sched/rt: cpupri_find: Trigger a full search as fallback
kthread: Do not preempt current task if it is going to call schedule()
sched/fair: Improve spreading of utilization
sched: Avoid scale real weight down to zero
psi: Move PF_MEMSTALL out of task->flags
MAINTAINERS: Add maintenance information for psi
psi: Optimize switching tasks inside shared cgroups
psi: Fix cpu.pressure for cpu.max and competing cgroups
sched/core: Distribute tasks within affinity masks
sched/fair: Fix enqueue_task_fair warning
thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
sched/rt: Remove unnecessary push for unfit tasks
sched/rt: Allow pulling unfitting task
sched/rt: Optimize cpupri_find() on non-heterogenous systems
sched/rt: Re-instate old behavior in select_task_rq_rt()
sched/rt: cpupri_find: Implement fallback mechanism for !fit case
sched/fair: Fix reordering of enqueue/dequeue_task_fair()
sched/fair: Fix runnable_avg for throttled cfs
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 16 | ||||
-rw-r--r-- | Documentation/robust-futex-ABI.txt | 14 |
2 files changed, 22 insertions, 8 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 74e75a7b4b24..9024fb2dc418 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4428,6 +4428,22 @@ incurs a small amount of overhead in the scheduler but is useful for debugging and performance tuning. + sched_thermal_decay_shift= + [KNL, SMP] Set a decay shift for scheduler thermal + pressure signal. Thermal pressure signal follows the + default decay period of other scheduler pelt + signals(usually 32 ms but configurable). Setting + sched_thermal_decay_shift will left shift the decay + period for the thermal pressure signal by the shift + value. + i.e. with the default pelt decay period of 32 ms + sched_thermal_decay_shift thermal pressure decay pr + 1 64 ms + 2 128 ms + and so on. + Format: integer between 0 and 10 + Default is 0. + skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate xtime_lock contention on larger systems, and/or RCU lock contention on all systems with CONFIG_MAXSMP set. diff --git a/Documentation/robust-futex-ABI.txt b/Documentation/robust-futex-ABI.txt index 8a5d34abf726..f24904f1c16f 100644 --- a/Documentation/robust-futex-ABI.txt +++ b/Documentation/robust-futex-ABI.txt @@ -61,8 +61,8 @@ setup that list. address of the associated 'lock entry', plus or minus, of what will be called the 'lock word', from that 'lock entry'. The 'lock word' is always a 32 bit word, unlike the other words above. The 'lock - word' holds 3 flag bits in the upper 3 bits, and the thread id (TID) - of the thread holding the lock in the bottom 29 bits. See further + word' holds 2 flag bits in the upper 2 bits, and the thread id (TID) + of the thread holding the lock in the bottom 30 bits. See further below for a description of the flag bits. The third word, called 'list_op_pending', contains transient copy of @@ -128,7 +128,7 @@ that thread's robust_futex linked lock list a given time. A given futex lock structure in a user shared memory region may be held at different times by any of the threads with access to that region. The thread currently holding such a lock, if any, is marked with the threads -TID in the lower 29 bits of the 'lock word'. +TID in the lower 30 bits of the 'lock word'. When adding or removing a lock from its list of held locks, in order for the kernel to correctly handle lock cleanup regardless of when the task @@ -141,7 +141,7 @@ On insertion: 1) set the 'list_op_pending' word to the address of the 'lock entry' to be inserted, 2) acquire the futex lock, - 3) add the lock entry, with its thread id (TID) in the bottom 29 bits + 3) add the lock entry, with its thread id (TID) in the bottom 30 bits of the 'lock word', to the linked list starting at 'head', and 4) clear the 'list_op_pending' word. @@ -155,7 +155,7 @@ On removal: On exit, the kernel will consider the address stored in 'list_op_pending' and the address of each 'lock word' found by walking -the list starting at 'head'. For each such address, if the bottom 29 +the list starting at 'head'. For each such address, if the bottom 30 bits of the 'lock word' at offset 'offset' from that address equals the exiting threads TID, then the kernel will do two things: @@ -180,7 +180,5 @@ any point: future kernel configuration changes) elements. When the kernel sees a list entry whose 'lock word' doesn't have the -current threads TID in the lower 29 bits, it does nothing with that +current threads TID in the lower 30 bits, it does nothing with that entry, and goes on to the next entry. - -Bit 29 (0x20000000) of the 'lock word' is reserved for future use. |