From d256d1cd8da1cbc4615de69df71c87ce623fec2f Mon Sep 17 00:00:00 2001 From: Tong Tiangen Date: Mon, 28 Aug 2023 10:25:27 +0800 Subject: mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We found a softlock issue in our test, analyzed the logs, and found that the relevant CPU call trace as follows: CPU0: _do_fork -> copy_process() -> write_lock_irq(&tasklist_lock) //Disable irq,waiting for //tasklist_lock CPU1: wp_page_copy() ->pte_offset_map_lock() -> spin_lock(&page->ptl); //Hold page->ptl -> ptep_clear_flush() -> flush_tlb_others() ... -> smp_call_function_many() -> arch_send_call_function_ipi_mask() -> csd_lock_wait() //Waiting for other CPUs respond //IPI CPU2: collect_procs_anon() -> read_lock(&tasklist_lock) //Hold tasklist_lock ->for_each_process(tsk) -> page_mapped_in_vma() -> page_vma_mapped_walk() -> map_pte() ->spin_lock(&page->ptl) //Waiting for page->ptl We can see that CPU1 waiting for CPU0 respond IPI,CPU0 waiting for CPU2 unlock tasklist_lock, CPU2 waiting for CPU1 unlock page->ptl. As a result, softlockup is triggered. For collect_procs_anon(), what we're doing is task list iteration, during the iteration, with the help of call_rcu(), the task_struct object is freed only after one or more grace periods elapse. the logic as follows: release_task() -> __exit_signal() -> __unhash_process() -> list_del_rcu() -> put_task_struct_rcu_user() -> call_rcu(&task->rcu, delayed_put_task_struct) delayed_put_task_struct() -> put_task_struct() -> if (refcount_sub_and_test()) __put_task_struct() -> free_task() Therefore, under the protection of the rcu lock, we can safely use get_task_struct() to ensure a safe reference to task_struct during the iteration. By removing the use of tasklist_lock in task list iteration, we can break the softlock chain above. The same logic can also be applied to: - collect_procs_file() - collect_procs_fsdax() - collect_procs_ksm() Link: https://lkml.kernel.org/r/20230828022527.241693-1-tongtiangen@huawei.com Signed-off-by: Tong Tiangen Acked-by: Naoya Horiguchi Cc: Kefeng Wang Cc: Matthew Wilcox (Oracle) Cc: Miaohe Lin Cc: Paul E. McKenney Signed-off-by: Andrew Morton --- mm/filemap.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'mm/filemap.c') diff --git a/mm/filemap.c b/mm/filemap.c index bf6219d9aaac..582f5317ff71 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -121,9 +121,6 @@ * bdi.wb->list_lock (zap_pte_range->set_page_dirty) * ->inode->i_lock (zap_pte_range->set_page_dirty) * ->private_lock (zap_pte_range->block_dirty_folio) - * - * ->i_mmap_rwsem - * ->tasklist_lock (memory_failure, collect_procs_ao) */ static void page_cache_delete(struct address_space *mapping, -- cgit