From 7ad3f188aac15772c97523dc4ca3e8e5b6294b9c Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Wed, 15 Nov 2017 17:31:59 -0800 Subject: tools: slabinfo: add "-U" option to show unreclaimable slabs only Patch series "oom: capture unreclaimable slab info in oom message", v10. Recently we ran into a oom issue, kernel panic due to no killable process. The dmesg shows huge unreclaimable slabs used almost 100% memory, but kdump doesn't capture vmcore due to some reason. So, it may sound better to capture unreclaimable slab info in oom message when kernel panic to aid trouble shooting and cover the corner case. Since kernel already panic, so capturing more information sounds worthy and doesn't bother normal oom killer. With the patchset, tools/vm/slabinfo has a new option, "-U", to show unreclaimable slab only. And, oom will print all non zero (num_objs * size != 0) unreclaimable slabs in oom killer message. This patch (of 3): Add "-U" option to show unreclaimable slabs only. "-U" and "-S" together can tell us what unreclaimable slabs use the most memory to help debug huge unreclaimable slabs issue. Link: http://lkml.kernel.org/r/1507152550-46205-2-git-send-email-yang.s@alibaba-inc.com Signed-off-by: Yang Shi Acked-by: Christoph Lameter Acked-by: David Rientjes Cc: Pekka Enberg Cc: Joonsoo Kim Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/vm/slabinfo.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'tools') diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c index b0b7ef6d0de1..f82c2eaa859d 100644 --- a/tools/vm/slabinfo.c +++ b/tools/vm/slabinfo.c @@ -84,6 +84,7 @@ int output_lines = -1; int sort_loss; int extended_totals; int show_bytes; +int unreclaim_only; /* Debug options */ int sanity; @@ -133,6 +134,7 @@ static void usage(void) "-L|--Loss Sort by loss\n" "-X|--Xtotals Show extended summary information\n" "-B|--Bytes Show size in bytes\n" + "-U|--Unreclaim Show unreclaimable slabs only\n" "\nValid debug options (FZPUT may be combined)\n" "a / A Switch on all debug options (=FZUP)\n" "- Switch off all debug options\n" @@ -569,6 +571,9 @@ static void slabcache(struct slabinfo *s) if (strcmp(s->name, "*") == 0) return; + if (unreclaim_only && s->reclaim_account) + return; + if (actual_slabs == 1) { report(s); return; @@ -1347,6 +1352,7 @@ struct option opts[] = { { "Loss", no_argument, NULL, 'L'}, { "Xtotals", no_argument, NULL, 'X'}, { "Bytes", no_argument, NULL, 'B'}, + { "Unreclaim", no_argument, NULL, 'U'}, { NULL, 0, NULL, 0 } }; @@ -1358,7 +1364,7 @@ int main(int argc, char *argv[]) page_size = getpagesize(); - while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXB", + while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTSN:LXBU", opts, NULL)) != -1) switch (c) { case '1': @@ -1439,6 +1445,9 @@ int main(int argc, char *argv[]) case 'B': show_bytes = 1; break; + case 'U': + unreclaim_only = 1; + break; default: fatal("%s: Invalid option '%c'\n", argv[0], optopt); -- cgit From d8be75663cec0069b85f80191abd2682ce4a512f Mon Sep 17 00:00:00 2001 From: "Levin, Alexander (Sasha Levin)" Date: Wed, 15 Nov 2017 17:35:58 -0800 Subject: kmemcheck: remove whats left of NOTRACK flags Now that kmemcheck is gone, we don't need the NOTRACK flags. Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com Signed-off-by: Sasha Levin Cc: Alexander Potapenko Cc: Eric W. Biederman Cc: Michal Hocko Cc: Pekka Enberg Cc: Steven Rostedt Cc: Tim Hansen Cc: Vegard Nossum Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/perf/builtin-kmem.c | 1 - 1 file changed, 1 deletion(-) (limited to 'tools') diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index 557d391f564a..cbf70738ef5f 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -655,7 +655,6 @@ static const struct { { "__GFP_RECLAIMABLE", "RC" }, { "__GFP_MOVABLE", "M" }, { "__GFP_ACCOUNT", "AC" }, - { "__GFP_NOTRACK", "NT" }, { "__GFP_WRITE", "WR" }, { "__GFP_RECLAIM", "R" }, { "__GFP_DIRECT_RECLAIM", "DR" }, -- cgit From 4675ff05de2d76d167336b368bd07f3fef6ed5a6 Mon Sep 17 00:00:00 2001 From: "Levin, Alexander (Sasha Levin)" Date: Wed, 15 Nov 2017 17:36:02 -0800 Subject: kmemcheck: rip it out Fix up makefiles, remove references, and git rm kmemcheck. Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com Signed-off-by: Sasha Levin Cc: Steven Rostedt Cc: Vegard Nossum Cc: Pekka Enberg Cc: Michal Hocko Cc: Eric W. Biederman Cc: Alexander Potapenko Cc: Tim Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/include/linux/kmemcheck.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'tools') diff --git a/tools/include/linux/kmemcheck.h b/tools/include/linux/kmemcheck.h index 2bccd2c7b897..ea32a7d3cf1b 100644 --- a/tools/include/linux/kmemcheck.h +++ b/tools/include/linux/kmemcheck.h @@ -1,9 +1 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LIBLOCKDEP_LINUX_KMEMCHECK_H_ -#define _LIBLOCKDEP_LINUX_KMEMCHECK_H_ - -static inline void kmemcheck_mark_initialized(void *address, unsigned int n) -{ -} - -#endif -- cgit From c7df8ad2910e965a6241b6d8f52fd122e26b0315 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 15 Nov 2017 17:37:41 -0800 Subject: mm, truncate: do not check mapping for every page being truncated During truncation, the mapping has already been checked for shmem and dax so it's known that workingset_update_node is required. This patch avoids the checks on mapping for each page being truncated. In all other cases, a lookup helper is used to determine if workingset_update_node() needs to be called. The one danger is that the API is slightly harder to use as calling workingset_update_node directly without checking for dax or shmem mappings could lead to surprises. However, the API rarely needs to be used and hopefully the comment is enough to give people the hint. sparsetruncate (tiny) 4.14.0-rc4 4.14.0-rc4 oneirq-v1r1 pickhelper-v1r1 Min Time 141.00 ( 0.00%) 140.00 ( 0.71%) 1st-qrtle Time 142.00 ( 0.00%) 141.00 ( 0.70%) 2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%) 3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%) Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%) Max-95% Time 147.00 ( 0.00%) 145.00 ( 1.36%) Max-99% Time 195.00 ( 0.00%) 191.00 ( 2.05%) Max Time 230.00 ( 0.00%) 205.00 ( 10.87%) Amean Time 144.37 ( 0.00%) 143.82 ( 0.38%) Stddev Time 10.44 ( 0.00%) 9.00 ( 13.74%) Coeff Time 7.23 ( 0.00%) 6.26 ( 13.41%) Best99%Amean Time 143.72 ( 0.00%) 143.34 ( 0.26%) Best95%Amean Time 142.37 ( 0.00%) 142.00 ( 0.26%) Best90%Amean Time 142.19 ( 0.00%) 141.85 ( 0.24%) Best75%Amean Time 141.92 ( 0.00%) 141.58 ( 0.24%) Best50%Amean Time 141.69 ( 0.00%) 141.31 ( 0.27%) Best25%Amean Time 141.38 ( 0.00%) 140.97 ( 0.29%) As you'd expect, the gain is marginal but it can be detected. The differences in bonnie are all within the noise which is not surprising given the impact on the microbenchmark. radix_tree_update_node_t is a callback for some radix operations that optionally passes in a private field. The only user of the callback is workingset_update_node and as it no longer requires a mapping, the private field is removed. Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman Acked-by: Johannes Weiner Reviewed-by: Jan Kara Cc: Andi Kleen Cc: Dave Chinner Cc: Dave Hansen Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/testing/radix-tree/multiorder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'tools') diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c index 06c71178d07d..59245b3d587c 100644 --- a/tools/testing/radix-tree/multiorder.c +++ b/tools/testing/radix-tree/multiorder.c @@ -618,7 +618,7 @@ static void multiorder_account(void) __radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12); __radix_tree_lookup(&tree, 1 << 5, &node, &slot); assert(node->count == node->exceptional * 2); - __radix_tree_replace(&tree, node, slot, NULL, NULL, NULL); + __radix_tree_replace(&tree, node, slot, NULL, NULL); assert(node->exceptional == 0); item_kill_tree(&tree); -- cgit From 453f85d43fa9ee243f0fc3ac4e1be45615301e3f Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 15 Nov 2017 17:38:03 -0800 Subject: mm: remove __GFP_COLD As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka Cc: Andi Kleen Cc: Dave Chinner Cc: Dave Hansen Cc: Jan Kara Cc: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- tools/perf/builtin-kmem.c | 1 - 1 file changed, 1 deletion(-) (limited to 'tools') diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index cbf70738ef5f..ae11e4c3516a 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -641,7 +641,6 @@ static const struct { { "__GFP_ATOMIC", "_A" }, { "__GFP_IO", "I" }, { "__GFP_FS", "F" }, - { "__GFP_COLD", "CO" }, { "__GFP_NOWARN", "NWR" }, { "__GFP_RETRY_MAYFAIL", "R" }, { "__GFP_NOFAIL", "NF" }, -- cgit