diff options
Diffstat (limited to 'Documentation/admin-guide/mm')
-rw-r--r-- | Documentation/admin-guide/mm/damon/index.rst | 1 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/damon/stat.rst | 69 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/damon/usage.rst | 46 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/transhuge.rst | 19 |
4 files changed, 124 insertions, 11 deletions
diff --git a/Documentation/admin-guide/mm/damon/index.rst b/Documentation/admin-guide/mm/damon/index.rst index bc7e976120e0..3ce3164480c7 100644 --- a/Documentation/admin-guide/mm/damon/index.rst +++ b/Documentation/admin-guide/mm/damon/index.rst @@ -14,3 +14,4 @@ access monitoring and access-aware system operations. usage reclaim lru_sort + stat diff --git a/Documentation/admin-guide/mm/damon/stat.rst b/Documentation/admin-guide/mm/damon/stat.rst new file mode 100644 index 000000000000..4c517c2c219a --- /dev/null +++ b/Documentation/admin-guide/mm/damon/stat.rst @@ -0,0 +1,69 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Data Access Monitoring Results Stat +=================================== + +Data Access Monitoring Results Stat (DAMON_STAT) is a static kernel module that +is aimed to be used for simple access pattern monitoring. It monitors accesses +on the system's entire physical memory using DAMON, and provides simplified +access monitoring results statistics, namely idle time percentiles and +estimated memory bandwidth. + +Monitoring Accuracy and Overhead +================================ + +DAMON_STAT uses monitoring intervals :ref:`auto-tuning +<damon_design_monitoring_intervals_autotuning>` to make its accuracy high and +overhead minimum. It auto-tunes the intervals aiming 4 % of observable access +events to be captured in each snapshot, while limiting the resulting sampling +events to be 5 milliseconds in minimum and 10 seconds in maximum. On a few +production server systems, it resulted in consuming only 0.x % single CPU time, +while capturing reasonable quality of access patterns. + +Interface: Module Parameters +============================ + +To use this feature, you should first ensure your system is running on a kernel +that is built with ``CONFIG_DAMON_STAT=y``. The feature can be enabled by +default at build time, by setting ``CONFIG_DAMON_STAT_ENABLED_DEFAULT`` true. + +To let sysadmins enable or disable it at boot and/or runtime, and read the +monitoring results, DAMON_STAT provides module parameters. Following +sections are descriptions of the parameters. + +enabled +------- + +Enable or disable DAMON_STAT. + +You can enable DAMON_STAT by setting the value of this parameter as ``Y``. +Setting it as ``N`` disables DAMON_STAT. The default value is set by +``CONFIG_DAMON_STAT_ENABLED_DEFAULT`` build config option. + +estimated_memory_bandwidth +-------------------------- + +Estimated memory bandwidth consumption (bytes per second) of the system. + +DAMON_STAT reads observed access events on the current DAMON results snapshot +and converts it to memory bandwidth consumption estimation in bytes per second. +The resulting metric is exposed to user via this read-only parameter. Because +DAMON uses sampling, this is only an estimation of the access intensity rather +than accurate memory bandwidth. + +memory_idle_ms_percentiles +-------------------------- + +Per-byte idle time (milliseconds) percentiles of the system. + +DAMON_STAT calculates how long each byte of the memory was not accessed until +now (idle time), based on the current DAMON results snapshot. If DAMON found a +region of access frequency (nr_accesses) larger than zero, every byte of the +region gets zero idle time. If a region has zero access frequency +(nr_accesses), how long the region was keeping the zero access frequency (age) +becomes the idle time of every byte of the region. Then, DAMON_STAT exposes +the percentiles of the idle time values via this read-only parameter. Reading +the parameter returns 101 idle time values in milliseconds, separated by comma. +Each value represents 0-th, 1st, 2nd, 3rd, ..., 99th and 100th percentile idle +times. diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index d960aba72b82..ff3a2dda1f02 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -59,7 +59,7 @@ comma (","). :ref:`/sys/kernel/mm/damon <sysfs_root>`/admin │ :ref:`kdamonds <sysfs_kdamonds>`/nr_kdamonds - │ │ :ref:`0 <sysfs_kdamond>`/state,pid + │ │ :ref:`0 <sysfs_kdamond>`/state,pid,refresh_ms │ │ │ :ref:`contexts <sysfs_contexts>`/nr_contexts │ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations │ │ │ │ │ :ref:`monitoring_attrs <sysfs_monitoring_attrs>`/ @@ -85,6 +85,8 @@ comma (","). │ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low │ │ │ │ │ │ │ :ref:`{core_,ops_,}filters <sysfs_filters>`/nr_filters │ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max + │ │ │ │ │ │ │ :ref:`dests <damon_sysfs_dests>`/nr_dests + │ │ │ │ │ │ │ │ 0/id,weight │ │ │ │ │ │ │ :ref:`stats <sysfs_schemes_stats>`/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds │ │ │ │ │ │ │ :ref:`tried_regions <sysfs_schemes_tried_regions>`/total_bytes │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age,sz_filter_passed @@ -121,8 +123,8 @@ kdamond. kdamonds/<N>/ ------------- -In each kdamond directory, two files (``state`` and ``pid``) and one directory -(``contexts``) exist. +In each kdamond directory, three files (``state``, ``pid`` and ``refresh_ms``) +and one directory (``contexts``) exist. Reading ``state`` returns ``on`` if the kdamond is currently running, or ``off`` if it is not running. @@ -159,6 +161,13 @@ Users can write below commands for the kdamond to the ``state`` file. If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread. +Users can ask the kernel to periodically update files showing auto-tuned +parameters and DAMOS stats instead of manually writing +``update_tuned_intervals`` like keywords to ``state`` file. For this, users +should write the desired update time interval in milliseconds to ``refresh_ms`` +file. If the interval is zero, the periodic update is disabled. Reading the +file shows currently set time interval. + ``contexts`` directory contains files for controlling the monitoring contexts that this kdamond will execute. @@ -307,10 +316,10 @@ to ``N-1``. Each directory represents each DAMON-based operation scheme. schemes/<N>/ ------------ -In each scheme directory, seven directories (``access_pattern``, ``quotas``, -``watermarks``, ``core_filters``, ``ops_filters``, ``filters``, ``stats``, and -``tried_regions``) and three files (``action``, ``target_nid`` and -``apply_interval``) exist. +In each scheme directory, eight directories (``access_pattern``, ``quotas``, +``watermarks``, ``core_filters``, ``ops_filters``, ``filters``, ``dests``, +``stats``, and ``tried_regions``) and three files (``action``, ``target_nid`` +and ``apply_interval``) exist. The ``action`` file is for setting and getting the scheme's :ref:`action <damon_design_damos_action>`. The keywords that can be written to and read @@ -484,6 +493,29 @@ Refer to the :ref:`DAMOS filters design documentation of different ``allow`` works, when each of the filters are supported, and differences on stats. +.. _damon_sysfs_dests: + +schemes/<N>/dests/ +------------------ + +Directory for specifying the destinations of given DAMON-based operation +scheme's action. This directory is ignored if the action of the given scheme +is not supporting multiple destinations. Only ``DAMOS_MIGRATE_{HOT,COLD}`` +actions are supporting multiple destinations. + +In the beginning, the directory has only one file, ``nr_dests``. Writing a +number (``N``) to the file creates the number of child directories named ``0`` +to ``N-1``. Each directory represents each action destination. + +Each destination directory contains two files, namely ``id`` and ``weight``. +Users can write and read the identifier of the destination to ``id`` file. +For ``DAMOS_MIGRATE_{HOT,COLD}`` actions, the migrate destination node's node +id should be written to ``id`` file. Users can write and read the weight of +the destination among the given destinations to the ``weight`` file. The +weight can be an arbitrary integer. When DAMOS apply the action to each entity +of the memory region, it will select the destination of the action based on the +relative weights of the destinations. + .. _sysfs_schemes_stats: schemes/<N>/stats/ diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index dff8d5985f0f..370fba113460 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -107,7 +107,7 @@ sysfs Global THP controls ------------------- -Transparent Hugepage Support for anonymous memory can be entirely disabled +Transparent Hugepage Support for anonymous memory can be disabled (mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE regions (to avoid the risk of consuming more memory resources) or enabled system wide. This can be achieved per-supported-THP-size with one of:: @@ -119,6 +119,11 @@ system wide. This can be achieved per-supported-THP-size with one of:: where <size> is the hugepage size being addressed, the available sizes for which vary by system. +.. note:: Setting "never" in all sysfs THP controls does **not** disable + Transparent Huge Pages globally. This is because ``madvise(..., + MADV_COLLAPSE)`` ignores these settings and collapses ranges to + PMD-sized huge pages unconditionally. + For example:: echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled @@ -187,7 +192,9 @@ madvise behaviour. never - should be self-explanatory. + should be self-explanatory. Note that ``madvise(..., + MADV_COLLAPSE)`` can still cause transparent huge pages to be + obtained even if this mode is specified everywhere. By default kernel tries to use huge, PMD-mappable zero page on read page fault to anonymous mapping. It's possible to disable huge zero @@ -378,7 +385,9 @@ always Attempt to allocate huge pages every time we need a new page; never - Do not allocate huge pages; + Do not allocate huge pages. Note that ``madvise(..., MADV_COLLAPSE)`` + can still cause transparent huge pages to be obtained even if this mode + is specified everywhere; within_size Only allocate huge page if it will be fully within i_size. @@ -434,7 +443,9 @@ inherit have enabled="inherit" and all other hugepage sizes have enabled="never"; never - Do not allocate <size> huge pages; + Do not allocate <size> huge pages. Note that ``madvise(..., + MADV_COLLAPSE)`` can still cause transparent huge pages to be obtained + even if this mode is specified everywhere; within_size Only allocate <size> huge page if it will be fully within i_size. |