summaryrefslogtreecommitdiff
path: root/drivers/base/node.c
diff options
context:
space:
mode:
authorJoshua Hahn <joshua.hahnjy@gmail.com>2025-05-05 11:23:28 -0700
committerAndrew Morton <akpm@linux-foundation.org>2025-05-21 09:55:15 -0700
commite341f9c3c8412e57fe0042a33a2640245ecdf619 (patch)
tree19b584fb0cd9a37a27f7a46300f0223af00abac6 /drivers/base/node.c
parent9e619cd4fefd19cdce16e169d5827bc64ae01aa1 (diff)
mm/mempolicy: Weighted Interleave Auto-tuning
On machines with multiple memory nodes, interleaving page allocations across nodes allows for better utilization of each node's bandwidth. Previous work by Gregory Price [1] introduced weighted interleave, which allowed for pages to be allocated across nodes according to user-set ratios. Ideally, these weights should be proportional to their bandwidth, so that under bandwidth pressure, each node uses its maximal efficient bandwidth and prevents latency from increasing exponentially. Previously, weighted interleave's default weights were just 1s -- which would be equivalent to the (unweighted) interleave mempolicy, which goes through the nodes in a round-robin fashion, ignoring bandwidth information. This patch has two main goals: First, it makes weighted interleave easier to use for users who wish to relieve bandwidth pressure when using nodes with varying bandwidth (CXL). By providing a set of "real" default weights that just work out of the box, users who might not have the capability (or wish to) perform experimentation to find the most optimal weights for their system can still take advantage of bandwidth-informed weighted interleave. Second, it allows for weighted interleave to dynamically adjust to hotplugged memory with new bandwidth information. Instead of manually updating node weights every time new bandwidth information is reported or taken off, weighted interleave adjusts and provides a new set of default weights for weighted interleave to use when there is a change in bandwidth information. To meet these goals, this patch introduces an auto-configuration mode for the interleave weights that provides a reasonable set of default weights, calculated using bandwidth data reported by the system. In auto mode, weights are dynamically adjusted based on whatever the current bandwidth information reports (and responds to hotplug events). This patch still supports users manually writing weights into the nodeN sysfs interface by entering into manual mode. When a user enters manual mode, the system stops dynamically updating any of the node weights, even during hotplug events that shift the optimal weight distribution. A new sysfs interface "auto" is introduced, which allows users to switch between the auto (writing 1 or Y) and manual (writing 0 or N) modes. The system also automatically enters manual mode when a nodeN interface is manually written to. There is one functional change that this patch makes to the existing weighted_interleave ABI: previously, writing 0 directly to a nodeN interface was said to reset the weight to the system default. Before this patch, the default for all weights were 1, which meant that writing 0 and 1 were functionally equivalent. With this patch, writing 0 is invalid. Link: https://lkml.kernel.org/r/20250520141236.2987309-1-joshua.hahnjy@gmail.com [joshua.hahnjy@gmail.com: wordsmithing changes, simplification, fixes] Link: https://lkml.kernel.org/r/20250511025840.2410154-1-joshua.hahnjy@gmail.com [joshua.hahnjy@gmail.com: remove auto_kobj_attr field from struct sysfs_wi_group] Link: https://lkml.kernel.org/r/20250512142511.3959833-1-joshua.hahnjy@gmail.com https://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@memverge.com/ [1] Link: https://lkml.kernel.org/r/20250505182328.4148265-1-joshua.hahnjy@gmail.com Co-developed-by: Gregory Price <gourry@gourry.net> Signed-off-by: Gregory Price <gourry@gourry.net> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Suggested-by: Yunjeong Mun <yunjeong.mun@sk.com> Suggested-by: Oscar Salvador <osalvador@suse.de> Suggested-by: Ying Huang <ying.huang@linux.alibaba.com> Suggested-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Reviewed-by: Honggyu Kim <honggyu.kim@sk.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'drivers/base/node.c')
-rw-r--r--drivers/base/node.c9
1 files changed, 9 insertions, 0 deletions
diff --git a/drivers/base/node.c b/drivers/base/node.c
index cd13ef287011..25ab9ec14eb8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -7,6 +7,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/memory.h>
+#include <linux/mempolicy.h>
#include <linux/vmstat.h>
#include <linux/notifier.h>
#include <linux/node.h>
@@ -214,6 +215,14 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
break;
}
}
+
+ /* When setting CPU access coordinates, update mempolicy */
+ if (access == ACCESS_COORDINATE_CPU) {
+ if (mempolicy_set_node_perf(nid, coord)) {
+ pr_info("failed to set mempolicy attrs for node %d\n",
+ nid);
+ }
+ }
}
EXPORT_SYMBOL_GPL(node_set_perf_attrs);