summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorChengming Zhou <zhouchengming@bytedance.com>2022-09-07 17:03:32 +0800
committerPeter Zijlstra <peterz@infradead.org>2022-09-09 11:08:33 +0200
commit34f26a15611afb03c33df6819359d36f5b382589 (patch)
tree38765bb0112e0ba6c8f173d8a6a129bf0b9f92f0 /Documentation
parentdc86aba751e2867244411adda1562f6664747019 (diff)
sched/psi: Per-cgroup PSI accounting disable/re-enable interface
PSI accounts stalls for each cgroup separately and aggregates it at each level of the hierarchy. This may cause non-negligible overhead for some workloads when under deep level of the hierarchy. commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable") make PSI to skip per-cgroup stall accounting, only account system-wide to avoid this each level overhead. But for our use case, we also want leaf cgroup PSI stats accounted for userspace adjustment on that cgroup, apart from only system-wide adjustment. So this patch introduce a per-cgroup PSI accounting disable/re-enable interface "cgroup.pressure", which is a read-write single value file that allowed values are "0" and "1", the defaults is "1" so per-cgroup PSI stats is enabled by default. Implementation details: It should be relatively straight-forward to disable and re-enable state aggregation, time tracking, averaging on a per-cgroup level, if we can live with losing history from while it was disabled. I.e. the avgs will restart from 0, total= will have gaps. But it's hard or complex to stop/restart groupc->tasks[] updates, which is not implemented in this patch. So we always update groupc->tasks[] and PSI_ONCPU bit in psi_group_change() even when the cgroup PSI stats is disabled. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20220907090332.2078-1-zhouchengming@bytedance.com
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst17
1 files changed, 17 insertions, 0 deletions
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 971c418bc778..4cad4e2b31ec 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -976,6 +976,23 @@ All cgroup core files are prefixed with "cgroup."
killing cgroups is a process directed operation, i.e. it affects
the whole thread-group.
+ cgroup.pressure
+ A read-write single value file that allowed values are "0" and "1".
+ The default is "1".
+
+ Writing "0" to the file will disable the cgroup PSI accounting.
+ Writing "1" to the file will re-enable the cgroup PSI accounting.
+
+ This control attribute is not hierarchical, so disable or enable PSI
+ accounting in a cgroup does not affect PSI accounting in descendants
+ and doesn't need pass enablement via ancestors from root.
+
+ The reason this control attribute exists is that PSI accounts stalls for
+ each cgroup separately and aggregates it at each level of the hierarchy.
+ This may cause non-negligible overhead for some workloads when under
+ deep level of the hierarchy, in which case this control attribute can
+ be used to disable PSI accounting in the non-leaf cgroups.
+
irq.pressure
A read-write nested-keyed file.