diff options
Diffstat (limited to 'Documentation/admin-guide/sysctl')
-rw-r--r-- | Documentation/admin-guide/sysctl/fs.rst | 15 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/kernel.rst | 72 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/net.rst | 6 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/vm.rst | 54 |
4 files changed, 138 insertions, 9 deletions
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst index 47499a1742bd..08e89e031714 100644 --- a/Documentation/admin-guide/sysctl/fs.rst +++ b/Documentation/admin-guide/sysctl/fs.rst @@ -38,6 +38,11 @@ requests. ``aio-max-nr`` allows you to change the maximum value ``aio-max-nr`` does not result in the pre-allocation or re-sizing of any kernel data structures. +dentry-negative +---------------------------- + +Policy for negative dentries. Set to 1 to always delete the dentry when a +file is removed, and 0 to disable it. By default, this behavior is disabled. dentry-state ------------ @@ -332,3 +337,13 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes on a 64-bit one. The current default value for ``max_user_watches`` is 4% of the available low memory, divided by the "watch" cost in bytes. + +5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems +===================================================================== + +This directory contains the following configuration options for FUSE +filesystems: + +``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for +setting/getting the maximum number of pages that can be used for servicing +requests in FUSE. diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 6584a1f9bfe3..dd49a89a62d3 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -212,6 +212,17 @@ pid>/``). This value defaults to 0. +core_sort_vma +============= + +The default coredump writes VMAs in address order. By setting +``core_sort_vma`` to 1, VMAs will be written from smallest size +to largest size. This is known to break at least elfutils, but +can be handy when dealing with very large (and truncated) +coredumps where the more useful debugging details are included +in the smaller VMAs. + + core_uses_pid ============= @@ -296,12 +307,30 @@ kernel panic). This will output the contents of the ftrace buffers to the console. This is very useful for capturing traces that lead to crashes and outputting them to a serial console. -= =================================================== -0 Disabled (default). -1 Dump buffers of all CPUs. -2 Dump the buffer of the CPU that triggered the oops. -= =================================================== +======================= =========================================== +0 Disabled (default). +1 Dump buffers of all CPUs. +2(orig_cpu) Dump the buffer of the CPU that triggered the + oops. +<instance> Dump the specific instance buffer on all CPUs. +<instance>=2(orig_cpu) Dump the specific instance buffer on the CPU + that triggered the oops. +======================= =========================================== + +Multiple instance dump is also supported, and instances are separated +by commas. If global buffer also needs to be dumped, please specify +the dump mode (1/2/orig_cpu) first for global buffer. +So for example to dump "foo" and "bar" instance buffer on all CPUs, +user can:: + + echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops + +To dump global buffer and "foo" instance buffer on all +CPUs along with the "bar" instance buffer on CPU that triggered the +oops, user can:: + + echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops ftrace_enabled, stack_tracer_enabled ==================================== @@ -383,6 +412,15 @@ The upper bound on the number of tasks that are checked. This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. +hung_task_detect_count +====================== + +Indicates the total number of tasks that have been detected as hung since +the system boot. + +This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. + + hung_task_timeout_secs ====================== @@ -436,7 +474,7 @@ ignore-unaligned-usertrap On architectures where unaligned accesses cause traps, and where this feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``; -currently, ``arc`` and ``loongarch``), controls whether all +currently, ``arc``, ``parisc`` and ``loongarch``), controls whether all unaligned traps are logged. = ============================================================= @@ -594,6 +632,9 @@ default (``MSGMNB``). ``msgmni`` is the maximum number of IPC queues. 32000 by default (``MSGMNI``). +All of these parameters are set per ipc namespace. The maximum number of bytes +in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is +respected hierarchically in the each user namespace. msg_next_id, sem_next_id, and shm_next_id (System V IPC) ======================================================== @@ -850,6 +891,7 @@ bit 3 print locks info if ``CONFIG_LOCKDEP`` is on bit 4 print ftrace buffer bit 5 print all printk messages in buffer bit 6 print all CPUs backtrace (if available in the arch) +bit 7 print only tasks in uninterruptible (blocked) state ===== ============================================ So for example to print tasks and memory info on panic, user can:: @@ -1274,15 +1316,20 @@ are doing anyway :) shmall ====== -This parameter sets the total amount of shared memory pages that -can be used system wide. Hence, ``shmall`` should always be at least -``ceil(shmmax/PAGE_SIZE)``. +This parameter sets the total amount of shared memory pages that can be used +inside ipc namespace. The shared memory pages counting occurs for each ipc +namespace separately and is not inherited. Hence, ``shmall`` should always be at +least ``ceil(shmmax/PAGE_SIZE)``. If you are not sure what the default ``PAGE_SIZE`` is on your Linux system, you can run the following command:: # getconf PAGE_SIZE +To reduce or disable the ability to allocate shared memory, you must create a +new ipc namespace, set this parameter to the required value and prohibit the +creation of a new ipc namespace in the current user namespace or cgroups can +be used. shmmax ====== @@ -1508,6 +1555,13 @@ constant ``FUTEX_TID_MASK`` (0x3fffffff). If a value outside of this range is written to ``threads-max`` an ``EINVAL`` error occurs. +timer_migration +=============== + +When set to a non-zero value, attempt to migrate timers away from idle cpus to +allow them to remain in low power states longer. + +Default is set (1). traceoff_on_warning =================== diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 396091651955..7b0c4291c686 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -72,6 +72,7 @@ two flavors of JITs, the newer eBPF JIT currently supported on: - riscv64 - riscv32 - loongarch64 + - arc And the older cBPF JIT supported on the following archs: @@ -206,6 +207,11 @@ Will increase power usage. Default: 0 (off) +mem_pcpu_rsv +------------ + +Per-cpu reserved forward alloc cache size in page units. Default 1MB per CPU. + rmem_default ------------ diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index c59889de122b..f48eaa98d22d 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -36,6 +36,7 @@ Currently, these files are in /proc/sys/vm: - dirtytime_expire_seconds - dirty_writeback_centisecs - drop_caches +- enable_soft_offline - extfrag_threshold - highmem_is_dirtyable - hugetlb_shm_group @@ -43,6 +44,7 @@ Currently, these files are in /proc/sys/vm: - legacy_va_layout - lowmem_reserve_ratio - max_map_count +- mem_profiling (only if CONFIG_MEM_ALLOC_PROFILING=y) - memory_failure_early_kill - memory_failure_recovery - min_free_kbytes @@ -266,6 +268,43 @@ used:: These are informational only. They do not mean that anything is wrong with your system. To disable them, echo 4 (bit 2) into drop_caches. +enable_soft_offline +=================== +Correctable memory errors are very common on servers. Soft-offline is kernel's +solution for memory pages having (excessive) corrected memory errors. + +For different types of page, soft-offline has different behaviors / costs. + +- For a raw error page, soft-offline migrates the in-use page's content to + a new raw page. + +- For a page that is part of a transparent hugepage, soft-offline splits the + transparent hugepage into raw pages, then migrates only the raw error page. + As a result, user is transparently backed by 1 less hugepage, impacting + memory access performance. + +- For a page that is part of a HugeTLB hugepage, soft-offline first migrates + the entire HugeTLB hugepage, during which a free hugepage will be consumed + as migration target. Then the original hugepage is dissolved into raw + pages without compensation, reducing the capacity of the HugeTLB pool by 1. + +It is user's call to choose between reliability (staying away from fragile +physical memory) vs performance / capacity implications in transparent and +HugeTLB cases. + +For all architectures, enable_soft_offline controls whether to soft offline +memory pages. When set to 1, kernel attempts to soft offline the pages +whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to +the request to soft offline the pages. Its default value is 1. + +It is worth mentioning that after setting enable_soft_offline to 0, the +following requests to soft offline pages will not be performed: + +- Request to soft offline pages from RAS Correctable Errors Collector. + +- On ARM, the request to soft offline pages from GHES driver. + +- On PARISC, the request to soft offline pages from Page Deallocation Table. extfrag_threshold ================= @@ -425,6 +464,21 @@ e.g., up to one or two maps per allocation. The default value is 65530. +mem_profiling +============== + +Enable memory profiling (when CONFIG_MEM_ALLOC_PROFILING=y) + +1: Enable memory profiling. + +0: Disable memory profiling. + +Enabling memory profiling introduces a small performance overhead for all +memory allocations. + +The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT. + + memory_failure_early_kill: ========================== |