Diffstat (limited to 'Documentation/filesystems/proc.rst')
-rw-r--r-- | Documentation/filesystems/proc.rst | 187 |
1 files changed, 170 insertions, 17 deletions
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index c6a6b9df2104..2a17865dfe39 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -48,6 +48,7 @@ fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009
   3.11 /proc/<pid>/patch_state - Livepatch patch operation state
   3.12 /proc/<pid>/arch_status - Task architecture specific information
   3.13 /proc/<pid>/fd - List of symlinks to open files
+  3.14 /proc/<pid>/ksm_stat - Information about the process's ksm status.
 
   4 Configuring procfs
   4.1 Mount options
@@ -127,6 +128,16 @@ process running on the system, which is named after the process ID (PID).
 The link 'self' points to the process reading the file system. Each process
 subdirectory has the entries listed in Table 1-1.
 
+A process can read its own information from /proc/PID/* with no extra
+permissions. When reading /proc/PID/* information for other processes, the
+reading process is required to have either CAP_SYS_PTRACE capability with
+PTRACE_MODE_READ access permissions, or, alternatively, CAP_PERFMON
+capability. This applies to all read-only information like `maps`, `environ`,
+`pagemap`, etc. The only exception is `mem` file due to its read-write nature,
+which requires CAP_SYS_PTRACE capabilities with more elevated
+PTRACE_MODE_ATTACH permissions; CAP_PERFMON capability does not grant access
+to /proc/PID/mem for other processes.
+
 Note that an open file descriptor to /proc/<pid> or to any of its
 contained files or subdirectories does not prevent <pid> being reused
 for some other process in the event that <pid> exits. Operations on
@@ -443,6 +454,15 @@ is not associated with a file:
 
 or if empty, the mapping is anonymous.
 
+Starting with 6.11 kernel, /proc/PID/maps provides an alternative
+ioctl()-based API that gives ability to flexibly and efficiently query and
+filter individual VMAs. This interface is binary and is meant for more
+efficient and easy programmatic use. `struct procmap_query`, defined in
+linux/fs.h UAPI header, serves as an input/output argument to the
+`PROCMAP_QUERY` ioctl() command. See comments in linux/fs.h UAPI header for
+details on query semantics, supported flags, data returned, and general API
+usage information.
+
 The /proc/PID/smaps is an extension based on maps, showing the memory
 consumption for each of the process's mappings. For each mapping (aka Virtual
 Memory Area, or VMA) there is a series of lines such as the following::
@@ -475,14 +495,15 @@ Memory Area, or VMA) there is a series of lines such as the following::
     THPeligible: 0
     VmFlags: rd ex mr mw me dw
 
-The first of these lines shows the same information as is displayed for the
-mapping in /proc/PID/maps. Following lines show the size of the mapping
-(size); the size of each page allocated when backing a VMA (KernelPageSize),
-which is usually the same as the size in the page table entries; the page size
-used by the MMU when backing a VMA (in most cases, the same as KernelPageSize);
-the amount of the mapping that is currently resident in RAM (RSS); the
-process' proportional share of this mapping (PSS); and the number of clean and
-dirty shared and private pages in the mapping.
+The first of these lines shows the same information as is displayed for
+the mapping in /proc/PID/maps. Following lines show the size of the
+mapping (size); the size of each page allocated when backing a VMA
+(KernelPageSize), which is usually the same as the size in the page table
+entries; the page size used by the MMU when backing a VMA (in most cases,
+the same as KernelPageSize); the amount of the mapping that is currently
+resident in RAM (RSS); the process's proportional share of this mapping
+(PSS); and the number of clean and dirty shared and private pages in the
+mapping.
 
 The "proportional set size" (PSS) of a process is the count of pages it has in
 memory, where each page is divided by the number of processes sharing it.
@@ -491,9 +512,25 @@ process, its PSS will be 1500. "Pss_Dirty" is the portion of PSS which
 consists of dirty pages. ("Pss_Clean" is not included, but it can be
 calculated by subtracting "Pss_Dirty" from "Pss".)
 
-Note that even a page which is part of a MAP_SHARED mapping, but has only
-a single pte mapped, i.e. is currently used by only one process, is accounted
-as private and not as shared.
+Traditionally, a page is accounted as "private" if it is mapped exactly once,
+and a page is accounted as "shared" when mapped multiple times, even when
+mapped in the same process multiple times. Note that this accounting is
+independent of MAP_SHARED.
+
+In some kernel configurations, the semantics of pages part of a larger
+allocation (e.g., THP) can differ: a page is accounted as "private" if all
+pages part of the corresponding large allocation are *certainly* mapped in the
+same process, even if the page is mapped multiple times in that process. A
+page is accounted as "shared" if any page part of the larger allocation
+is *maybe* mapped in a different process. In some cases, a large allocation
+might be treated as "maybe mapped by multiple processes" even though this
+is no longer the case.
+
+Some kernel configurations do not track the precise number of times a page part
+of a larger allocation is mapped. In this case, when calculating the PSS, the
+average number of mappings per page in this larger allocation might be used
+as an approximation for the number of mappings of a page. The PSS calculation
+will be imprecise in this case.
 
 "Referenced" indicates the amount of memory currently marked as referenced or
 accessed.
@@ -570,7 +607,8 @@ encoded manner. The codes are the following:
     mt    arm64 MTE allocation tags are enabled
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
-    ss    shadow stack page
+    ss    shadow/guarded control stack page
+    sl    sealed
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will
@@ -674,6 +712,11 @@ Where:
 node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page
 size, in KB, that is backing the mapping up.
 
+Note that some kernel configurations do not track the precise number of times
+a page part of a larger allocation (e.g., THP) is mapped. In these
+configurations, "mapmax" might correspond to the average number of mappings
+per page in such a larger allocation instead.
+
 1.2 Kernel data
 ---------------
 
@@ -688,6 +731,7 @@ files are there, and which are missing.
  ============ ===============================================================
  File         Content
  ============ ===============================================================
+ allocinfo    Memory allocations profiling information
  apm          Advanced power management info
  bootconfig   Kernel command line obtained from boot config,
               and, if there were kernel parameters from the
@@ -953,6 +997,35 @@ also be allocatable although a lot of filesystem metadata may have to be
 reclaimed to achieve this.
 
 
+allocinfo
+~~~~~~~~~
+
+Provides information about memory allocations at all locations in the code
+base. Each allocation in the code is identified by its source file, line
+number, module (if it originates from a loadable module) and the function calling
+the allocation. The number of bytes allocated and number of calls at each
+location are reported. The first line indicates the version of the file, the
+second line is the header listing fields in the file.
+
+Example output.
+
+::
+
+    > tail -n +3 /proc/allocinfo | sort -rn
+   127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
+    56373248     4737 mm/slub.c:2259 func:alloc_slab_page
+    14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
+    14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
+    13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
+    11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
+     9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
+     4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
+     4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
+     3940352      962 mm/memory.c:4214 func:alloc_anon_folio
+     2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
+    ...
+
+
 meminfo
 ~~~~~~~
 
@@ -1018,6 +1091,8 @@ Example output. You may not have all of these fields.
     FilePmdMapped:         0 kB
     CmaTotal:              0 kB
     CmaFree:               0 kB
+    Unaccepted:            0 kB
+    Balloon:               0 kB
     HugePages_Total:       0
     HugePages_Free:        0
     HugePages_Rsvd:        0
@@ -1090,9 +1165,15 @@ Dirty
 Writeback
               Memory which is actively being written back to the disk
 AnonPages
-              Non-file backed pages mapped into userspace page tables
+              Non-file backed pages mapped into userspace page tables. Note that
+              some kernel configurations might consider all pages part of a
+              larger allocation (e.g., THP) as "mapped", as soon as a single
+              page is mapped.
 Mapped
-              files which have been mmapped, such as libraries
+              files which have been mmapped, such as libraries. Note that some
+              kernel configurations might consider all pages part of a larger
+              allocation (e.g., THP) as "mapped", as soon as a single page is
+              mapped.
 Shmem
               Total memory used by shared memory (shmem) and tmpfs
 KReclaimable
@@ -1110,8 +1191,8 @@ KernelStack
 PageTables
               Memory consumed by userspace page tables
 SecPageTables
-              Memory consumed by secondary page tables, this currently
-              currently includes KVM mmu allocations on x86 and arm64.
+              Memory consumed by secondary page tables, this currently includes
+              KVM mmu and IOMMU allocations on x86 and arm64.
 NFS_Unstable
               Always zero. Previous counted pages which had been written to
               the server, but has not been committed to stable storage.
@@ -1186,6 +1267,10 @@ CmaTotal
               Memory reserved for the Contiguous Memory Allocator (CMA)
 CmaFree
               Free remaining memory in the CMA reserves
+Unaccepted
+              Memory that has not been accepted by the guest
+Balloon
+              Memory returned to Host by VM Balloon Drivers
 HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
               See Documentation/admin-guide/mm/hugetlbpage.rst.
 DirectMap4k, DirectMap2M, DirectMap1G
@@ -2192,6 +2277,74 @@ The number of open files for the process is stored in 'size' member
 of stat() output for /proc/<pid>/fd for fast access.
 -------------------------------------------------------
 
+3.14 /proc/<pid>/ksm_stat - Information about the process's ksm status
+----------------------------------------------------------------------
+When CONFIG_KSM is enabled, each process has this file which displays
+the information of ksm merging status.
+
+Example
+~~~~~~~
+
+::
+
+    / # cat /proc/self/ksm_stat
+    ksm_rmap_items 0
+    ksm_zero_pages 0
+    ksm_merging_pages 0
+    ksm_process_profit 0
+    ksm_merge_any: no
+    ksm_mergeable: no
+
+Description
+~~~~~~~~~~~
+
+ksm_rmap_items
+^^^^^^^^^^^^^^
+
+The number of ksm_rmap_item structures in use. The structure
+ksm_rmap_item stores the reverse mapping information for virtual
+addresses. KSM will generate a ksm_rmap_item for each ksm-scanned page of
+the process.
+
+ksm_zero_pages
+^^^^^^^^^^^^^^
+
+When /sys/kernel/mm/ksm/use_zero_pages is enabled, it represents how many
+empty pages are merged with kernel zero pages by KSM.
+
+ksm_merging_pages
+^^^^^^^^^^^^^^^^^
+
+It represents how many pages of this process are involved in KSM merging
+(not including ksm_zero_pages). It is the same as what
+/proc/<pid>/ksm_merging_pages shows.
+
+ksm_process_profit
+^^^^^^^^^^^^^^^^^^
+
+The profit that KSM brings (Saved bytes). KSM can save memory by merging
+identical pages, but also can consume additional memory, because it needs
+to generate a number of rmap_items to save each scanned page's brief rmap
+information. Some of these pages may be merged, but some may not be able
+to be merged after being checked several times, which are unprofitable
+memory consumed.
+
+ksm_merge_any
+^^^^^^^^^^^^^
+
+It specifies whether the process's mm is added by prctl() into the
+candidate list of KSM or not, and if KSM scanning is fully enabled at
+process level.
+
+ksm_mergeable
+^^^^^^^^^^^^^
+
+It specifies whether any VMAs of the process's mm are currently
+applicable to KSM.
+
+More information about KSM can be found in
+Documentation/admin-guide/mm/ksm.rst.
+
 Chapter 4: Configuring procfs
 =============================
 
@@ -2221,7 +2374,7 @@ arguments are now protected against local eavesdroppers.
 hidepid=invisible or hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be
 fully invisible to other users. It doesn't mean that it hides a fact whether a
 process with a specific pid value exists (it can be learned by other means, e.g.
-by "kill -0 $PID"), but it hides process' uid and gid, which may be learned by
+by "kill -0 $PID"), but it hides process's uid and gid, which may be learned by
 stat()'ing /proc/<pid>/ otherwise. It greatly complicates an intruder's task of
 gathering information about running processes, whether some daemon runs with
 elevated privileges, whether other user runs some sensitive program, whether