From d6c3af7d8a2ba5602c28841248c551a712ac50f5 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:09 -0600 Subject: mm: multi-gen LRU: debugfs interface MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add /sys/kernel/debug/lru_gen for working set estimation and proactive reclaim. These techniques are commonly used to optimize job scheduling (bin packing) in data centers [1][2]. Compared with the page table-based approach and the PFN-based approach, this lruvec-based approach has the following advantages: 1. It offers better choices because it is aware of memcgs, NUMA nodes, shared mappings and unmapped page cache. 2. It is more scalable because it is O(nr_hot_pages), whereas the PFN-based approach is O(nr_total_pages). Add /sys/kernel/debug/lru_gen_full for debugging. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 Link: https://lkml.kernel.org/r/20220918080010.2920238-13-yuzhao@google.com Signed-off-by: Yu Zhao Reviewed-by: Qi Zheng Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nodemask.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux/nodemask.h') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 4b71a96190a8..3a0eec9f2faa 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -493,6 +493,7 @@ static inline int num_node_state(enum node_states state) #define first_online_node 0 #define first_memory_node 0 #define next_online_node(nid) (MAX_NUMNODES) +#define next_memory_node(nid) (MAX_NUMNODES) #define nr_node_ids 1U #define nr_online_nodes 1U -- cgit From 3e061d924fe9c7b487ca45935f92fcd414ce6f1a Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:42 +0530 Subject: lib/nodemask: optimize node_random for nodemask with single NUMA node The most common case for certain node_random usage (demotion nodemask) is with nodemask weight 1. We can avoid calling get_random_init() in that case and always return the only node set in the nodemask. A simple test as below before = rdtsc_ordered(); for (i= 0; i < 100; i++) { rand = node_random(&nmask); } after = rdtsc_ordered(); Without fix after - before : 16438 With fix after - before : 816 Link: https://lkml.kernel.org/r/20220818131042.113280-11-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/nodemask.h | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) (limited to 'include/linux/nodemask.h') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 3a0eec9f2faa..e66742db741c 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -505,12 +505,21 @@ static inline int num_node_state(enum node_states state) static inline int node_random(const nodemask_t *maskp) { #if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1) - int w, bit = NUMA_NO_NODE; + int w, bit; w = nodes_weight(*maskp); - if (w) + switch (w) { + case 0: + bit = NUMA_NO_NODE; + break; + case 1: + bit = first_node(*maskp); + break; + default: bit = bitmap_ord_to_pos(maskp->bits, - get_random_int() % w, MAX_NUMNODES); + get_random_int() % w, MAX_NUMNODES); + break; + } return bit; #else return 0; -- cgit