Diffstat (limited to 'Documentation/admin-guide/sysctl')
-rw-r--r-- Documentation/admin-guide/sysctl/fs.rst 25
-rw-r--r-- Documentation/admin-guide/sysctl/kernel.rst 60
-rw-r--r-- Documentation/admin-guide/sysctl/vm.rst 46
3 files changed, 94 insertions, 37 deletions
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 08e89e031714..6c54718c9d04 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -347,3 +347,28 @@ filesystems:
``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
setting/getting the maximum number of pages that can be used for servicing
requests in FUSE.
+
+``/proc/sys/fs/fuse/default_request_timeout`` is a read/write file for
+setting/getting the default timeout (in seconds) for a fuse server to
+reply to a kernel-issued request in the event where the server did not
+specify a timeout at mount. If the server set a timeout,
+then default_request_timeout will be ignored. The default
+"default_request_timeout" is set to 0. 0 indicates no default timeout.
+The maximum value that can be set is 65535.
+
+``/proc/sys/fs/fuse/max_request_timeout`` is a read/write file for
+setting/getting the maximum timeout (in seconds) for a fuse server to
+reply to a kernel-issued request. A value greater than 0 automatically opts
+the server into a timeout that will be set to at most "max_request_timeout",
+even if the server did not specify a timeout and default_request_timeout is
+set to 0. If max_request_timeout is greater than 0 and the server set a timeout
+greater than max_request_timeout or default_request_timeout is set to a value
+greater than max_request_timeout, the system will use max_request_timeout as the
+timeout. 0 indicates no max request timeout. The maximum value that can be set
+is 65535.
+
+For timeouts, if the server does not respond to the request by the time
+the set timeout elapses, then the connection to the fuse server will be aborted.
+Please note that the timeouts are not 100% precise (e.g., you may set 60 seconds but
+the timeout may kick in after 70 seconds). The upper margin of error for the
+timeout is roughly FUSE_TIMEOUT_TIMER_FREQ seconds.
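Reviewer's sketch (not part of the patch): the interaction between the server-supplied timeout and the two sysctls, as described in the added text, can be modelled roughly as below. This is a minimal sketch under stated assumptions, not the kernel's implementation. The function name `effective_request_timeout` is hypothetical, and a server that "did not specify a timeout" is assumed to be represented by 0.

```python
def effective_request_timeout(server_timeout: int,
                              default_request_timeout: int,
                              max_request_timeout: int) -> int:
    """Model of the documented rules; returns seconds, 0 meaning no timeout.

    Assumption: server_timeout == 0 means the server set no timeout at mount.
    """
    # A server-specified timeout takes precedence over the default.
    timeout = server_timeout if server_timeout > 0 else default_request_timeout

    # A non-zero maximum opts every server into a timeout and caps it.
    if max_request_timeout > 0 and (timeout == 0 or timeout > max_request_timeout):
        timeout = max_request_timeout

    return timeout
```

For example, with `default_request_timeout` at 0 and `max_request_timeout` at 120, a server that set no timeout would end up with 120 seconds under this model, and a server that asked for 300 seconds would be capped to 120.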
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index dd49a89a62d3..8b49eab937d0 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -177,6 +177,7 @@ core_pattern
%E executable path
%c maximum size of core file by resource limit RLIMIT_CORE
%C CPU the task ran on
+ %F pidfd number
%<OTHER> both are dropped
======== ==========================================
@@ -889,7 +890,7 @@ bit 1 print system memory info
bit 2 print timer info
bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
-bit 5 print all printk messages in buffer
+bit 5 replay all messages on consoles at the end of panic
bit 6 print all CPUs backtrace (if available in the arch)
bit 7 print only tasks in uninterruptible (blocked) state
===== ============================================
@@ -899,6 +900,24 @@ So for example to print tasks and memory info on panic, user can::
echo 3 > /proc/sys/kernel/panic_print
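Reviewer's sketch (not part of the patch): the value written to ``panic_print`` is a bitmask built from the bit numbers in the table above. The helper below shows the arithmetic only. The key names and the function `panic_print_mask` are illustrative, not kernel identifiers, and bit 0 ("tasks") is inferred from the ``echo 3`` example because that table row is outside the hunk context.

```python
# Bit positions as listed in the panic_print table. Bit 0 is inferred from
# the `echo 3` example; the key names are illustrative labels only.
PANIC_PRINT_BITS = {
    "tasks": 0,
    "mem": 1,
    "timers": 2,
    "locks": 3,
    "ftrace": 4,
    "replay_consoles": 5,
    "all_cpu_backtrace": 6,
    "blocked_tasks": 7,
}


def panic_print_mask(*names: str) -> int:
    """Combine the selected bits into the integer to write to panic_print."""
    mask = 0
    for name in names:
        mask |= 1 << PANIC_PRINT_BITS[name]
    return mask
```

Here `panic_print_mask("tasks", "mem")` yields 3, matching the example above.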
+panic_sys_info
+==============
+
+A comma separated list of extra information to be dumped on panic,
+for example, "tasks,mem,timers,...". It is a human readable alternative
+to 'panic_print'. Possible values are:
+
+============= ===================================================
+tasks print all tasks info
+mem print system memory info
+timer print timers info
+lock print locks info if CONFIG_LOCKDEP is on
+ftrace print ftrace buffer
+all_bt print all CPUs backtrace (if available in the arch)
+blocked_tasks print only tasks in uninterruptible (blocked) state
+============= ===================================================
+
+
panic_on_rcu_stall
==================
@@ -1014,30 +1033,26 @@ perf_user_access (arm64 and riscv only)
Controls user space access for reading perf event counters.
-arm64
-=====
-
-The default value is 0 (access disabled).
+* for arm64
+ The default value is 0 (access disabled).
-When set to 1, user space can read performance monitor counter registers
-directly.
+ When set to 1, user space can read performance monitor counter registers
+ directly.
-See Documentation/arch/arm64/perf.rst for more information.
-
-riscv
-=====
+ See Documentation/arch/arm64/perf.rst for more information.
-When set to 0, user space access is disabled.
+* for riscv
+ When set to 0, user space access is disabled.
-The default value is 1, user space can read performance monitor counter
-registers through perf, any direct access without perf intervention will trigger
-an illegal instruction.
+ The default value is 1, user space can read performance monitor counter
+ registers through perf, any direct access without perf intervention will trigger
+ an illegal instruction.
-When set to 2, which enables legacy mode (user space has direct access to cycle
-and insret CSRs only). Note that this legacy value is deprecated and will be
-removed once all user space applications are fixed.
+ When set to 2, which enables legacy mode (user space has direct access to cycle
+ and insret CSRs only). Note that this legacy value is deprecated and will be
+ removed once all user space applications are fixed.
-Note that the time CSR is always directly accessible to all modes.
+ Note that the time CSR is always directly accessible to all modes.
pid_max
=======
@@ -1110,7 +1125,8 @@ printk_ratelimit_burst
While long term we enforce one message per `printk_ratelimit`_
seconds, we do allow a burst of messages to pass through.
``printk_ratelimit_burst`` specifies the number of messages we can
-send before ratelimiting kicks in.
+send before ratelimiting kicks in. After `printk_ratelimit`_ seconds
+have elapsed, another burst of messages may be sent.
The default value is 10 messages.
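Reviewer's sketch (not part of the patch): the burst behaviour described here can be illustrated with a simplified window-based limiter. This is a model written for illustration and makes no claim to match the kernel's ratelimit code. The class name `BurstLimiter` is hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class BurstLimiter:
    """Allow up to `burst` messages per `interval` seconds (simplified model)."""

    def __init__(self, interval: float, burst: int) -> None:
        self.interval = interval
        self.burst = burst
        self.window_start = None  # time at which the current window began
        self.printed = 0          # messages let through in the current window

    def allow(self, now: float) -> bool:
        # Open a new window once `interval` seconds have elapsed,
        # which permits another burst.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        return False
```

With a burst of 10, the first ten calls inside a window return `True`, later calls in the same window return `False`, and the first call after the interval has elapsed starts a new burst.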
@@ -1465,7 +1481,7 @@ stack_erasing
=============
This parameter can be used to control kernel stack erasing at the end
-of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``.
+of syscalls for kernels built with ``CONFIG_KSTACK_ERASE``.
That erasing reduces the information which kernel stack leak bugs
can reveal and blocks some uninitialized stack variable attacks.
@@ -1473,7 +1489,7 @@ The tradeoff is the performance impact: on a single CPU system kernel
compilation sees a 1% slowdown, other systems and workloads may vary.
= ====================================================================
-0 Kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
+0 Kernel stack erasing is disabled, KSTACK_ERASE_METRICS are not updated.
1 Kernel stack erasing is enabled (default), it is performed before
returning to the userspace at the end of syscalls.
= ====================================================================
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 8290177b4f75..4d71211fdad8 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -75,6 +75,7 @@ Currently, these files are in /proc/sys/vm:
- unprivileged_userfaultfd
- user_reserve_kbytes
- vfs_cache_pressure
+- vfs_cache_pressure_denom
- watermark_boost_factor
- watermark_scale_factor
- zone_reclaim_mode
@@ -131,6 +132,12 @@ to latency spikes in unsuspecting applications. The kernel employs
various heuristics to avoid wasting CPU cycles if it detects that
proactive compaction is not being effective.
+Setting the value above 80 will, in addition to lowering the acceptable level
+of fragmentation, make the compaction code more sensitive to increases in
+fragmentation, i.e. compaction will trigger more often, but reduce
+fragmentation by a smaller amount.
+This makes the fragmentation level more stable over time.
+
Be careful when setting it to extreme values like 100, as that may
cause excessive background compaction activity.
@@ -458,8 +465,8 @@ The minimum value is 1 (1/1 -> 100%). The value less than 1 completely
disables protection of the pages.
-max_map_count:
-==============
+max_map_count
+=============
This file contains the maximum number of memory map areas a process
may have. Memory map areas are used as a side-effect of calling
@@ -488,8 +495,8 @@ memory allocations.
The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
-memory_failure_early_kill:
-==========================
+memory_failure_early_kill
+=========================
Control how to kill processes when uncorrected memory error (typically
a 2bit error in a memory module) is detected in the background by hardware
@@ -1017,19 +1024,28 @@ vfs_cache_pressure
This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
+At the default value of vfs_cache_pressure=vfs_cache_pressure_denom the kernel
+will attempt to reclaim dentries and inodes at a "fair" rate with respect to
+pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the
+kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0,
+the kernel will never reclaim dentries and inodes due to memory pressure and
+this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure
+beyond vfs_cache_pressure_denom causes the kernel to prefer to reclaim dentries
+and inodes.
-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
+Increasing vfs_cache_pressure significantly beyond vfs_cache_pressure_denom may
+have negative performance impact. Reclaim code needs to take various locks to
+find freeable directory and inode objects. When vfs_cache_pressure equals
+(10 * vfs_cache_pressure_denom), it will look for ten times more freeable
+objects than there are.
+
+Note: This setting should always be used together with vfs_cache_pressure_denom.
+
+vfs_cache_pressure_denom
+========================
+Defaults to 100 (minimum allowed value). Requires corresponding
+vfs_cache_pressure setting to take effect.
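Reviewer's sketch (not part of the patch): the added text implies that reclaim scales the number of freeable objects by the ratio of the two settings, so that a value equal to the denominator is "fair" and ten times the denominator looks for ten times as many objects. A minimal sketch of that arithmetic follows, assuming simple integer scaling; the function name `scaled_freeable` is hypothetical.

```python
def scaled_freeable(freeable: int,
                    vfs_cache_pressure: int,
                    vfs_cache_pressure_denom: int = 100) -> int:
    """Scale the freeable object count by pressure / denom (integer math)."""
    if vfs_cache_pressure_denom < 100:
        # The documentation gives 100 as the minimum allowed denominator.
        raise ValueError("vfs_cache_pressure_denom must be at least 100")
    return freeable * vfs_cache_pressure // vfs_cache_pressure_denom
```

Under this model a larger denominator gives finer-grained control: with `vfs_cache_pressure_denom` at 1000, a `vfs_cache_pressure` of 5 corresponds to 0.5%, which cannot be expressed with the default denominator of 100.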
watermark_boost_factor
======================