summaryrefslogtreecommitdiff
path: root/lib/mpi/mpi-internal.h
diff options
context:
space:
mode:
authorSeongJae Park <sjpark@amazon.de>2020-12-11 12:24:05 +0100
committerJakub Kicinski <kuba@kernel.org>2020-12-12 15:08:54 -0800
commit0b9b241406818a871c6d25390aa487dba966d548 (patch)
tree8d1b4990106eed407ef7704833fe892584c37221 /lib/mpi/mpi-internal.h
parente0a64d1dffca048a99546993322bd1fb5c728ee8 (diff)
inet: frags: batch fqdir destroy works
On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls make the number of active slab objects including 'sock_inode_cache' type rapidly and continuously increase. As a result, memory pressure occurs. In more detail, I made an artificial reproducer that resembles the workload that we found the problem and reproduce the problem faster. It merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop. It takes about 2 minutes. On 40 CPU cores / 70GB DRAM machine, the available memory continuously reduced in a fast speed (about 120MB per second, 15GB in total within the 2 minutes). Note that the issue don't reproduce on every machine. On my 6 CPU cores machine, the problem didn't reproduce. 'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the relevant memory objects. They are asynchronously invoked by the work queues and internally use 'rcu_barrier()' to ensure safe destructions. 'cleanup_net()' works in a batched maneer in a single thread worker, while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the 'system_wq'. Therefore, 'fqdir_work_fn()' called frequently under the workload and made the contention for 'rcu_barrier()' high. In more detail, the global mutex, 'rcu_state.barrier_mutex' became the bottleneck. This commit avoids such contention by doing the 'rcu_barrier()' and subsequent lightweight works in a batched manner, as similar to that of 'cleanup_net()'. The fqdir hashtable destruction, which is done before the 'rcu_barrier()', is still allowed to run in parallel for fast processing, but this commit makes it to use a dedicated work queue instead of the 'system_wq', to make sure that the number of threads is bounded. Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201211112405.31158-1-sjpark@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'lib/mpi/mpi-internal.h')
0 files changed, 0 insertions, 0 deletions