summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide/sysctl
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide/sysctl')
-rw-r--r--Documentation/admin-guide/sysctl/fs.rst15
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst72
-rw-r--r--Documentation/admin-guide/sysctl/net.rst6
-rw-r--r--Documentation/admin-guide/sysctl/vm.rst54
4 files changed, 138 insertions, 9 deletions
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 47499a1742bd..08e89e031714 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -38,6 +38,11 @@ requests. ``aio-max-nr`` allows you to change the maximum value
``aio-max-nr`` does not result in the
pre-allocation or re-sizing of any kernel data structures.
+dentry-negative
+----------------------------
+
+Policy for negative dentries. Set to 1 to always delete the dentry when a
+file is removed, and 0 to disable it. By default, this behavior is disabled.
dentry-state
------------
@@ -332,3 +337,13 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
on a 64-bit one.
The current default value for ``max_user_watches`` is 4% of the
available low memory, divided by the "watch" cost in bytes.
+
+5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
+=====================================================================
+
+This directory contains the following configuration options for FUSE
+filesystems:
+
+``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
+setting/getting the maximum number of pages that can be used for servicing
+requests in FUSE.
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 6584a1f9bfe3..dd49a89a62d3 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -212,6 +212,17 @@ pid>/``).
This value defaults to 0.
+core_sort_vma
+=============
+
+The default coredump writes VMAs in address order. By setting
+``core_sort_vma`` to 1, VMAs will be written from smallest size
+to largest size. This is known to break at least elfutils, but
+can be handy when dealing with very large (and truncated)
+coredumps where the more useful debugging details are included
+in the smaller VMAs.
+
+
core_uses_pid
=============
@@ -296,12 +307,30 @@ kernel panic). This will output the contents of the ftrace buffers to
the console. This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.
-= ===================================================
-0 Disabled (default).
-1 Dump buffers of all CPUs.
-2 Dump the buffer of the CPU that triggered the oops.
-= ===================================================
+======================= ===========================================
+0 Disabled (default).
+1 Dump buffers of all CPUs.
+2(orig_cpu) Dump the buffer of the CPU that triggered the
+ oops.
+<instance> Dump the specific instance buffer on all CPUs.
+<instance>=2(orig_cpu) Dump the specific instance buffer on the CPU
+ that triggered the oops.
+======================= ===========================================
+
+Multiple instance dump is also supported, and instances are separated
+by commas. If global buffer also needs to be dumped, please specify
+the dump mode (1/2/orig_cpu) first for global buffer.
+So for example to dump "foo" and "bar" instance buffer on all CPUs,
+user can::
+
+ echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops
+
+To dump global buffer and "foo" instance buffer on all
+CPUs along with the "bar" instance buffer on CPU that triggered the
+oops, user can::
+
+ echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops
ftrace_enabled, stack_tracer_enabled
====================================
@@ -383,6 +412,15 @@ The upper bound on the number of tasks that are checked.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
+hung_task_detect_count
+======================
+
+Indicates the total number of tasks that have been detected as hung since
+the system boot.
+
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
+
+
hung_task_timeout_secs
======================
@@ -436,7 +474,7 @@ ignore-unaligned-usertrap
On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
-currently, ``arc`` and ``loongarch``), controls whether all
+currently, ``arc``, ``parisc`` and ``loongarch``), controls whether all
unaligned traps are logged.
= =============================================================
@@ -594,6 +632,9 @@ default (``MSGMNB``).
``msgmni`` is the maximum number of IPC queues. 32000 by default
(``MSGMNI``).
+All of these parameters are set per ipc namespace. The maximum number of bytes
+in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is
+respected hierarchically in the each user namespace.
msg_next_id, sem_next_id, and shm_next_id (System V IPC)
========================================================
@@ -850,6 +891,7 @@ bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
bit 5 print all printk messages in buffer
bit 6 print all CPUs backtrace (if available in the arch)
+bit 7 print only tasks in uninterruptible (blocked) state
===== ============================================
So for example to print tasks and memory info on panic, user can::
@@ -1274,15 +1316,20 @@ are doing anyway :)
shmall
======
-This parameter sets the total amount of shared memory pages that
-can be used system wide. Hence, ``shmall`` should always be at least
-``ceil(shmmax/PAGE_SIZE)``.
+This parameter sets the total amount of shared memory pages that can be used
+inside ipc namespace. The shared memory pages counting occurs for each ipc
+namespace separately and is not inherited. Hence, ``shmall`` should always be at
+least ``ceil(shmmax/PAGE_SIZE)``.
If you are not sure what the default ``PAGE_SIZE`` is on your Linux
system, you can run the following command::
# getconf PAGE_SIZE
+To reduce or disable the ability to allocate shared memory, you must create a
+new ipc namespace, set this parameter to the required value and prohibit the
+creation of a new ipc namespace in the current user namespace or cgroups can
+be used.
shmmax
======
@@ -1508,6 +1555,13 @@ constant ``FUTEX_TID_MASK`` (0x3fffffff).
If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs.
+timer_migration
+===============
+
+When set to a non-zero value, attempt to migrate timers away from idle cpus to
+allow them to remain in low power states longer.
+
+Default is set (1).
traceoff_on_warning
===================
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 396091651955..7b0c4291c686 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -72,6 +72,7 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
- riscv64
- riscv32
- loongarch64
+ - arc
And the older cBPF JIT supported on the following archs:
@@ -206,6 +207,11 @@ Will increase power usage.
Default: 0 (off)
+mem_pcpu_rsv
+------------
+
+Per-cpu reserved forward alloc cache size in page units. Default 1MB per CPU.
+
rmem_default
------------
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index c59889de122b..f48eaa98d22d 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -36,6 +36,7 @@ Currently, these files are in /proc/sys/vm:
- dirtytime_expire_seconds
- dirty_writeback_centisecs
- drop_caches
+- enable_soft_offline
- extfrag_threshold
- highmem_is_dirtyable
- hugetlb_shm_group
@@ -43,6 +44,7 @@ Currently, these files are in /proc/sys/vm:
- legacy_va_layout
- lowmem_reserve_ratio
- max_map_count
+- mem_profiling (only if CONFIG_MEM_ALLOC_PROFILING=y)
- memory_failure_early_kill
- memory_failure_recovery
- min_free_kbytes
@@ -266,6 +268,43 @@ used::
These are informational only. They do not mean that anything is wrong
with your system. To disable them, echo 4 (bit 2) into drop_caches.
+enable_soft_offline
+===================
+Correctable memory errors are very common on servers. Soft-offline is kernel's
+solution for memory pages having (excessive) corrected memory errors.
+
+For different types of page, soft-offline has different behaviors / costs.
+
+- For a raw error page, soft-offline migrates the in-use page's content to
+ a new raw page.
+
+- For a page that is part of a transparent hugepage, soft-offline splits the
+ transparent hugepage into raw pages, then migrates only the raw error page.
+ As a result, user is transparently backed by 1 less hugepage, impacting
+ memory access performance.
+
+- For a page that is part of a HugeTLB hugepage, soft-offline first migrates
+ the entire HugeTLB hugepage, during which a free hugepage will be consumed
+ as migration target. Then the original hugepage is dissolved into raw
+ pages without compensation, reducing the capacity of the HugeTLB pool by 1.
+
+It is user's call to choose between reliability (staying away from fragile
+physical memory) vs performance / capacity implications in transparent and
+HugeTLB cases.
+
+For all architectures, enable_soft_offline controls whether to soft offline
+memory pages. When set to 1, kernel attempts to soft offline the pages
+whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
+the request to soft offline the pages. Its default value is 1.
+
+It is worth mentioning that after setting enable_soft_offline to 0, the
+following requests to soft offline pages will not be performed:
+
+- Request to soft offline pages from RAS Correctable Errors Collector.
+
+- On ARM, the request to soft offline pages from GHES driver.
+
+- On PARISC, the request to soft offline pages from Page Deallocation Table.
extfrag_threshold
=================
@@ -425,6 +464,21 @@ e.g., up to one or two maps per allocation.
The default value is 65530.
+mem_profiling
+==============
+
+Enable memory profiling (when CONFIG_MEM_ALLOC_PROFILING=y)
+
+1: Enable memory profiling.
+
+0: Disable memory profiling.
+
+Enabling memory profiling introduces a small performance overhead for all
+memory allocations.
+
+The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
+
+
memory_failure_early_kill:
==========================