Diffstat (limited to 'Documentation/admin-guide/cgroup-v1')
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/cgroups.rst    |   2
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/cpusets.rst    |   9
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/hugetlb.rst    |  20
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/memcg_test.rst |   2
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/memory.rst     | 116
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/pids.rst       |   3
6 files changed, 53 insertions, 99 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/cgroups.rst b/Documentation/admin-guide/cgroup-v1/cgroups.rst
index 9343148ee993..a3e2edb3d274 100644
--- a/Documentation/admin-guide/cgroup-v1/cgroups.rst
+++ b/Documentation/admin-guide/cgroup-v1/cgroups.rst
@@ -570,7 +570,7 @@ visible to cgroup_for_each_child/descendant_*() iterators. The
 subsystem may choose to fail creation by returning -errno. This
 callback can be used to implement reliable state sharing and
 propagation along the hierarchy. See the comment on
-cgroup_for_each_descendant_pre() for details.
+cgroup_for_each_live_descendant_pre() for details.

 ``void css_offline(struct cgroup *cgrp);``
 (cgroup_mutex held by caller)
diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
index ae646d621a8a..f401af5e2f09 100644
--- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -179,7 +179,7 @@ files describing that cpuset:
  - cpuset.mem_hardwall flag: is memory allocation hardwalled
  - cpuset.memory_pressure: measure of how much paging pressure in cpuset
  - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes
- - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
+ - cpuset.memory_spread_slab flag: OBSOLETE. Doesn't have any function.
  - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset
  - cpuset.sched_relax_domain_level: the searching range when migrating tasks

@@ -568,7 +568,7 @@ on the next tick. For some applications in special situation, waiting

 The 'cpuset.sched_relax_domain_level' file allows you to request changing
 this searching range as you like. This file takes int value which
-indicates size of searching range in levels ideally as follows,
+indicates size of searching range in levels approximately as follows,
 otherwise initial value -1 that indicates the cpuset has no request.

 ====== ===========================================================
@@ -581,6 +581,11 @@ otherwise initial value -1 that indicates the cpuset has no request.
    5   search system wide [on NUMA system]
 ====== ===========================================================

+Not all levels can be present and values can change depending on the
+system architecture and kernel configuration. Check
+/sys/kernel/debug/sched/domains/cpu*/domain*/ for system-specific
+details.
+
 The system default is architecture dependent. The system default
 can be changed using the relax_domain_level= boot parameter.

diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
index 0fa724d82abb..493a8e386700 100644
--- a/Documentation/admin-guide/cgroup-v1/hugetlb.rst
+++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
@@ -65,10 +65,12 @@ files include::

 1. Page fault accounting

-hugetlb.<hugepagesize>.limit_in_bytes
-hugetlb.<hugepagesize>.max_usage_in_bytes
-hugetlb.<hugepagesize>.usage_in_bytes
-hugetlb.<hugepagesize>.failcnt
+::
+
+  hugetlb.<hugepagesize>.limit_in_bytes
+  hugetlb.<hugepagesize>.max_usage_in_bytes
+  hugetlb.<hugepagesize>.usage_in_bytes
+  hugetlb.<hugepagesize>.failcnt

 The HugeTLB controller allows users to limit the HugeTLB usage (page fault) per
 control group and enforces the limit during page fault. Since HugeTLB
@@ -82,10 +84,12 @@ getting SIGBUS.

 2. Reservation accounting

-hugetlb.<hugepagesize>.rsvd.limit_in_bytes
-hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
-hugetlb.<hugepagesize>.rsvd.usage_in_bytes
-hugetlb.<hugepagesize>.rsvd.failcnt
+::
+
+  hugetlb.<hugepagesize>.rsvd.limit_in_bytes
+  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
+  hugetlb.<hugepagesize>.rsvd.usage_in_bytes
+  hugetlb.<hugepagesize>.rsvd.failcnt

 The HugeTLB controller allows to limit the HugeTLB reservations per control
 group and enforces the controller limit at reservation time and at the fault of
diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
index 1f128458ddea..9f8e27355cba 100644
--- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst
+++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -102,7 +102,7 @@ Under below explanation, we assume CONFIG_SWAP=y.
         The logic is very clear. (About migration, see below)

         Note:
-          __remove_from_page_cache() is called by remove_from_page_cache()
+          __filemap_remove_folio() is called by filemap_remove_folio()
           and __remove_mapping().

 6. Shmem(tmpfs) Page Cache
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index ca7d9402f6be..286d16fc22eb 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -78,18 +78,22 @@ Brief summary of control files.
  memory.memsw.max_usage_in_bytes     show max memory+Swap usage recorded
  memory.soft_limit_in_bytes          set/show soft limit of memory usage
                                      This knob is not available on CONFIG_PREEMPT_RT systems.
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.stat                         show various statistics
  memory.use_hierarchy                set/show hierarchical account enabled
                                      This knob is deprecated and shouldn't be
                                      used.
  memory.force_empty                  trigger forced page reclaim
  memory.pressure_level               set memory pressure notifications
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.swappiness                   set/show swappiness parameter of vmscan
                                      (See sysctl's vm.swappiness)
- memory.move_charge_at_immigrate     set/show controls of moving charges
+ memory.move_charge_at_immigrate     This knob is deprecated.
+ memory.oom_control                  set/show oom controls.
                                      This knob is deprecated and shouldn't be
                                      used.
- memory.oom_control                  set/show oom controls.
  memory.numa_stat                    show the number of memory usage per numa
                                      node
  memory.kmem.limit_in_bytes          Deprecated knob to set and read the kernel
@@ -105,10 +109,18 @@ Brief summary of control files.
  memory.kmem.max_usage_in_bytes      show max kernel memory usage recorded

  memory.kmem.tcp.limit_in_bytes      set/show hard limit for tcp buf memory
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.kmem.tcp.usage_in_bytes      show current tcp buf memory allocation
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.kmem.tcp.failcnt             show the number of tcp buf memory usage
                                      hits limits
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.kmem.tcp.max_usage_in_bytes  show max tcp buf memory usage recorded
+                                     This knob is deprecated and shouldn't be
+                                     used.
 ==================================== ==========================================

 1. History
@@ -229,10 +241,6 @@ behind this approach is that a cgroup that aggressively uses a shared
 page will eventually get charged for it (once it is uncharged from
 the cgroup that brought it in -- this will happen on memory pressure).

-But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
-task to another cgroup, its pages may be recharged to the new cgroup, if
-move_charge_at_immigrate has been chosen.
-
 2.4 Swap Extension
 --------------------------------------

@@ -300,14 +308,14 @@ When oom event notifier is registered, event will be delivered.

 Lock order is as follows::

-  Page lock (PG_locked bit of page->flags)
+  folio_lock
     mm->page_table_lock or split pte_lock
       folio_memcg_lock (memcg->move_lock)
         mapping->i_pages lock
           lruvec->lru_lock.

 Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
-lruvec->lru_lock; PG_lru bit of page->flags is cleared before
+lruvec->lru_lock; the folio LRU flag is cleared before
 isolating a page from its LRU under lruvec->lru_lock.

 .. _cgroup-v1-memory-kernel-extension:
@@ -693,8 +701,10 @@ For compatibility reasons writing 1 to memory.use_hierarchy will always pass::

         # echo 1 > memory.use_hierarchy

-7. Soft limits
-==============
+7. Soft limits (DEPRECATED)
+===========================
+
+THIS IS DEPRECATED!

 Soft limits allow for greater sharing of memory. The idea behind soft limits
 is to allow control groups to use as much of the memory as needed, provided
@@ -740,78 +750,8 @@ If we want to change this to 1G, we can at any time use::

 THIS IS DEPRECATED!

-It's expensive and unreliable! It's better practice to launch workload
-tasks directly from inside their target cgroup. Use dedicated workload
-cgroups to allow fine-grained policy adjustments without having to
-move physical pages between control domains.
-
-Users can move charges associated with a task along with task migration, that
-is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
-This feature is not supported in !CONFIG_MMU environments because of lack of
-page tables.
-
-8.1 Interface
--------------
-
-This feature is disabled by default. It can be enabled (and disabled again) by
-writing to memory.move_charge_at_immigrate of the destination cgroup.
-
-If you want to enable it::
-
-  # echo (some positive value) > memory.move_charge_at_immigrate
-
-.. note::
-      Each bits of move_charge_at_immigrate has its own meaning about what type
-      of charges should be moved. See :ref:`section 8.2
-      <cgroup-v1-memory-movable-charges>` for details.
-
-.. note::
-      Charges are moved only when you move mm->owner, in other words,
-      a leader of a thread group.
-
-.. note::
-      If we cannot find enough space for the task in the destination cgroup, we
-      try to make space by reclaiming memory. Task migration may fail if we
-      cannot make enough space.
-
-.. note::
-      It can take several seconds if you move charges much.
-
-And if you want disable it again::
-
-  # echo 0 > memory.move_charge_at_immigrate
-
-.. _cgroup-v1-memory-movable-charges:
-
-8.2 Type of charges which can be moved
---------------------------------------
-
-Each bit in move_charge_at_immigrate has its own meaning about what type of
-charges should be moved. But in any case, it must be noted that an account of
-a page or a swap can be moved only when it is charged to the task's current
-(old) memory cgroup.
-
-+---+--------------------------------------------------------------------------+
-|bit| what type of charges would be moved ?                                    |
-+===+==========================================================================+
-| 0 | A charge of an anonymous page (or swap of it) used by the target task.   |
-|   | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
-+---+--------------------------------------------------------------------------+
-| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
-|   | and swaps of tmpfs file) mmapped by the target task. Unlike the case of  |
-|   | anonymous pages, file pages (and swaps) in the range mmapped by the task |
-|   | will be moved even if the task hasn't done page fault, i.e. they might   |
-|   | not be the task's "RSS", but other task's "RSS" that maps the same file. |
-|   | And mapcount of the page is ignored (the page can be moved even if       |
-|   | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to    |
-|   | enable move of swap charges.                                             |
-+---+--------------------------------------------------------------------------+
-
-8.3 TODO
---------
-
-- All of moving charge operations are done under cgroup_mutex. It's not good
-  behavior to hold the mutex too long, so we may need some trick.
+Reading memory.move_charge_at_immigrate will always return 0 and writing
+to it will always return -EINVAL.

 9. Memory thresholds
 ====================
@@ -834,8 +774,10 @@ It's applicable for root and non-root cgroup.

 .. _cgroup-v1-memory-oom-control:

-10. OOM Control
-===============
+10. OOM Control (DEPRECATED)
+============================
+
+THIS IS DEPRECATED!

 memory.oom_control file is for OOM notification and other controls.

@@ -882,8 +824,10 @@ At reading, current status of OOM is shown.
           The number of processes belonging to this cgroup killed by any
           kind of OOM killer.

-11. Memory Pressure
-===================
+11. Memory Pressure (DEPRECATED)
+================================
+
+THIS IS DEPRECATED!

 The pressure level notifications can be used to monitor the memory
 allocation cost; based on the pressure, applications can implement
diff --git a/Documentation/admin-guide/cgroup-v1/pids.rst b/Documentation/admin-guide/cgroup-v1/pids.rst
index 6acebd9e72c8..0f9f9a7b1f6c 100644
--- a/Documentation/admin-guide/cgroup-v1/pids.rst
+++ b/Documentation/admin-guide/cgroup-v1/pids.rst
@@ -36,7 +36,8 @@ superset of parent/child/pids.current.

 The pids.events file contains event counters:

-  - max: Number of times fork failed because limit was hit.
+  - max: Number of times fork failed in the cgroup because limit was hit in
+    self or ancestors.

 Example
 -------