mm: don't raise MEMCG_OOM event due to failed high-order allocation

It was reported that on some of our machines containers were restarted with OOM symptoms without an obvious reason. Despite there were almost no memory pressure and plenty of page cache, MEMCG_OOM event was raised occasionally, causing the container management software to think, that OOM has happened. However, no tasks have been killed. The following investigation showed that the problem is caused by a failing attempt to charge a high-order page. In such case, the OOM killer is never invoked. As shown below, it can happen under conditions, which are very far from a real OOM: e.g. there is plenty of clean page cache and no memory pressure. There is no sense in raising an OOM event in this case, as it might confuse a user and lead to wrong and excessive actions (e.g. restart the workload, as in my case). Let's look at the charging path in try_charge(). If the memory usage is about memory.max, which is absolutely natural for most memory cgroups, we try to reclaim some pages. Even if we were able to reclaim enough memory for the allocation, the following check can fail due to a race with another concurrent allocation: if (mem_cgroup_margin(mem_over_limit) >= nr_pages) goto retry; For regular pages the following condition will save us from triggering the OOM: if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER)) goto retry; But for high-order allocation this condition will intentionally fail. The reason behind is that we'll likely fall to regular pages anyway, so it's ok and even preferred to return ENOMEM. In this case the idea of raising MEMCG_OOM looks dubious. Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after allocation order check, so that the event won't be raised for high order allocations. This change doesn't affect regular pages allocation and charging. Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Roman Gushchin <guro@fb.com> 2018-10-26 15:09:48 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2018-10-26 16:38:14 -0700
commit: 7a1adfddaf0d11a39fdcaf6e82a88e9c0586e08b (patch)
tree: b72fc5d5a53fa688aaba31c81ac81cef5e5fa2bc /mm/memcontrol.c
parent: 64081362e8ff4587b4554087f3cfc73d3e0a4cd7 (diff)
1 files changed, 2 insertions, 2 deletions
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 92d38c88250f..10a9b554d69f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1669,6 +1669,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	if (order > PAGE_ALLOC_COSTLY_ORDER)
 		return OOM_SKIPPED;
 
+	memcg_memory_event(memcg, MEMCG_OOM);
+
 	/*
 	 * We are in the middle of the charge context here, so we
 	 * don't want to block when potentially sitting on a callstack
@@ -2250,8 +2252,6 @@ retry:
 	if (fatal_signal_pending(current))
 		goto force;
 
-	memcg_memory_event(mem_over_limit, MEMCG_OOM);
-
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer
author	Roman Gushchin <guro@fb.com>	2018-10-26 15:09:48 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2018-10-26 16:38:14 -0700
commit	7a1adfddaf0d11a39fdcaf6e82a88e9c0586e08b (patch)
tree	b72fc5d5a53fa688aaba31c81ac81cef5e5fa2bc /mm/memcontrol.c
parent	64081362e8ff4587b4554087f3cfc73d3e0a4cd7 (diff)