block: make iolatency avg_lat exponentially decay

Currently, avg_lat is calculated by accumulating the mean of every window in a long running cumulative average. As time goes on, the metric becomes less and less useful due to the accumulated history.

This patch reuses the same calculation done in load averages to make the avg_lat metric more lively. Unlike load averages, the avg only advances when a window elapses (due to an io). Idle periods extend the most recent window. Bucketing is used to limit the history of avg_lat by binding it to the window size. So, the window range for 1/exp (decay rate) is [1 min, 2.5 min) when windows elapse immediately.

The current sample window size is exposed in the debug info to enable calculation of the window range.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
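
The decaying average described in the commit message can be illustrated with a small floating-point sketch. This is not the kernel code: the patch itself is not shown here, and the function names `ema_update` and `decay_interval_ms`, as well as the sample count of 600 and the 100 ms window, are hypothetical values chosen only for illustration. The sketch assumes the load-average style update `avg = avg * decay + sample * (1 - decay)`, with `decay` chosen so that a sample's weight falls to 1/e after a given number of elapsed windows.

```python
import math


def ema_update(avg, sample, n_samples):
    """Advance the average by one elapsed window.

    decay is chosen so that, after n_samples updates, the weight of the
    starting value has fallen to 1/e (assumed form, not the kernel's
    fixed-point implementation).
    """
    decay = math.exp(-1.0 / n_samples)
    return avg * decay + sample * (1.0 - decay)


def decay_interval_ms(win_ms, n_samples):
    """Time for history to decay to 1/e when windows elapse back to back."""
    return win_ms * n_samples


# Hypothetical numbers: start at 1000 us, then observe n windows of 0 us.
n = 600
avg = 1000.0
for _ in range(n):
    avg = ema_update(avg, 0.0, n)

# The starting value now contributes 1000 / e to the average.
remaining = avg
interval = decay_interval_ms(100, n)  # 100 ms windows, 600 samples
```

Because the average only advances when a window elapses, an idle group performs no updates at all, so the interval computed this way is a lower bound on how long history persists in wall-clock time.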
author: Dennis Zhou (Facebook) <dennisszhou@gmail.com> 2018-08-01 23:15:41 -0700
committer: Jens Axboe <axboe@kernel.dk> 2018-08-02 09:58:14 -0600
commit: c480bcf97b186a67ea6f0f6cab70ba430bcd5613 (patch)
tree: b793725441cda9321981e427a5e68947656f1e58 /Documentation/admin-guide
parent: 2c323017e381c55c5ce2a603b8305bb18c1162cc (diff)
1 files changed, 12 insertions, 9 deletions
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3afe10fa82bc..1746131bc9cb 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1474,11 +1474,9 @@ So the ideal way to configure this is to set io.latency in groups A, B, and C.
 Generally you do not want to set a value lower than the latency your device
 supports.  Experiment to find the value that works best for your workload.
 Start at higher than the expected latency for your device and watch the
-total_lat_avg value in io.stat for your workload group to get an idea of the
-latency you see during normal operation.  Use this value as a basis for your
-real setting, setting at 10-15% higher than the value in io.stat.
-Experimentation is key here because total_lat_avg is a running total, so is the
-"statistics" portion of "lies, damned lies, and statistics."
+avg_lat value in io.stat for your workload group to get an idea of the
+latency you see during normal operation.  Use the avg_lat value as a basis for
+your real setting, setting at 10-15% higher than the value in io.stat.
 
 How IO Latency Throttling Works
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1522,10 +1520,15 @@ IO Latency Interface Files
 		This is the current queue depth for the group.
 
 	  avg_lat
-		The running average IO latency for this group in microseconds.
-		Running average is generally flawed, but will give an
-		administrator a general idea of the overall latency they can
-		expect for their workload on the given disk.
+		This is an exponential moving average with a decay rate of 1/exp
+		bound by the sampling interval.  The decay rate interval can be
+		calculated by multiplying the win value in io.stat by the
+		corresponding number of samples based on the win value.
+
+	  win
+		The sampling window size in milliseconds.  This is the minimum
+		duration of time between evaluation events.  Windows only elapse
+		with IO activity.  Idle periods extend the most recent window.
 
 PID
 ---