summaryrefslogtreecommitdiff
path: root/mm/Kconfig
diff options
context:
space:
mode:
authorNhat Pham <nphamcs@gmail.com>2023-11-30 11:40:23 -0800
committerAndrew Morton <akpm@linux-foundation.org>2023-12-12 10:57:02 -0800
commitb5ba474f3f518701249598b35c581b92a3c95b48 (patch)
tree74b6e95f9fdfe7a41cea4692ca8ff1b1948c6b8f /mm/Kconfig
parenta697dc2be925d4814f26d7588347ccdd2c5525ed (diff)
zswap: shrink zswap pool based on memory pressure
Currently, we only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages). This patch implements a memcg- and NUMA-aware shrinker for zswap, that is initiated when there is memory pressure. The shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis. Furthermore, to make it more robust for many workloads and prevent overshrinking (i.e evicting warm pages that might be refaulted into memory), we build in the following heuristics: * Estimate the number of warm pages residing in zswap, and attempt to protect this region of the zswap LRU. * Scale the number of freeable objects by an estimate of the memory saving factor. The better zswap compresses the data, the fewer pages we will evict to swap (as we will otherwise incur IO for relatively small memory saving). * During reclaim, if the shrinker encounters a page that is also being brought into memory, the shrinker will cautiously terminate its shrinking action, as this is a sign that it is touching the warmer region of the zswap LRU. As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improved the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds. [nphamcs@gmail.com: check shrinker enablement early, use less costly stat flushing] Link: https://lkml.kernel.org/r/20231206194456.3234203-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231130194023.4102148-7-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/Kconfig')
-rw-r--r--mm/Kconfig14
1 files changed, 14 insertions, 0 deletions
diff --git a/mm/Kconfig b/mm/Kconfig
index 57cd378c73d6..ca87cdb72f11 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -61,6 +61,20 @@ config ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON
The cost is that if the page was never dirtied and needs to be
swapped out again, it will be re-compressed.
+config ZSWAP_SHRINKER_DEFAULT_ON
+ bool "Shrink the zswap pool on memory pressure"
+ depends on ZSWAP
+ default n
+ help
+ If selected, the zswap shrinker will be enabled, and the pages
+ stored in the zswap pool will become available for reclaim (i.e
+ written back to the backing swap device) on memory pressure.
+
+ This means that zswap writeback could happen even if the pool is
+ not yet full, or the cgroup zswap limit has not been reached,
+ reducing the chance that cold pages will reside in the zswap pool
+ and consume memory indefinitely.
+
choice
prompt "Default compressor"
depends on ZSWAP