summaryrefslogtreecommitdiff
path: root/mm/memory-failure.c
diff options
context:
space:
mode:
authorShiyang Ruan <ruansy.fnst@fujitsu.com>2023-10-23 15:20:46 +0800
committerChandan Babu R <chandanbabu@kernel.org>2023-12-07 14:34:26 +0530
commitfa422b353d212373fb2b2857a5ea5a6fa4876f9c (patch)
treee3cf6995ef6ac33ee3ad9e5b223a0bafaf9a355a /mm/memory-failure.c
parent49391d1349da4cd752437a8aab77d3304be75d9b (diff)
mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind
Now, if we suddenly remove a PMEM device(by calling unbind) which contains FSDAX while programs are still accessing data in this device, e.g.: ``` $FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 & # $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 & echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind ``` it could come into an unacceptable state: 1. device has gone but mount point still exists, and umount will fail with "target is busy" 2. programs will hang and cannot be killed 3. may crash with NULL pointer dereference To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we are going to remove the whole device, and make sure all related processes could be notified so that they could end up gracefully. This patch is inspired by Dan's "mm, dax, pmem: Introduce dev_pagemap_failure()"[1]. With the help of dax_holder and ->notify_failure() mechanism, the pmem driver is able to ask filesystem on it to unmap all files in use, and notify processes who are using those files. Call trace: trigger unbind -> unbind_store() -> ... (skip) -> devres_release_all() -> kill_dax() -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE) -> xfs_dax_notify_failure() `-> freeze_super() // freeze (kernel call) `-> do xfs rmap ` -> mf_dax_kill_procs() ` -> collect_procs_fsdax() // all associated processes ` -> unmap_and_kill() ` -> invalidate_inode_pages2_range() // drop file's cache `-> thaw_super() // thaw (both kernel & user call) Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove event. Use the exclusive freeze/thaw[2] to lock the filesystem to prevent new dax mapping from being created. Do not shutdown filesystem directly if configuration is not supported, or if failure range includes metadata area. Make sure all files and processes(not only the current progress) are handled correctly. Also drop the cache of associated files before pmem is removed. [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/ [2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/ Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Diffstat (limited to 'mm/memory-failure.c')
-rw-r--r--mm/memory-failure.c21
1 files changed, 17 insertions, 4 deletions
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 660c21859118..cff3bda60691 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -679,7 +679,7 @@ static void add_to_kill_fsdax(struct task_struct *tsk, struct page *p,
*/
static void collect_procs_fsdax(struct page *page,
struct address_space *mapping, pgoff_t pgoff,
- struct list_head *to_kill)
+ struct list_head *to_kill, bool pre_remove)
{
struct vm_area_struct *vma;
struct task_struct *tsk;
@@ -687,8 +687,15 @@ static void collect_procs_fsdax(struct page *page,
i_mmap_lock_read(mapping);
rcu_read_lock();
for_each_process(tsk) {
- struct task_struct *t = task_early_kill(tsk, true);
+ struct task_struct *t = tsk;
+ /*
+ * Search for all tasks while MF_MEM_PRE_REMOVE is set, because
+ * the current may not be the one accessing the fsdax page.
+ * Otherwise, search for the current task.
+ */
+ if (!pre_remove)
+ t = task_early_kill(tsk, true);
if (!t)
continue;
vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
@@ -1795,6 +1802,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
dax_entry_t cookie;
struct page *page;
size_t end = index + count;
+ bool pre_remove = mf_flags & MF_MEM_PRE_REMOVE;
mf_flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
@@ -1806,9 +1814,14 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
if (!page)
goto unlock;
- SetPageHWPoison(page);
+ if (!pre_remove)
+ SetPageHWPoison(page);
- collect_procs_fsdax(page, mapping, index, &to_kill);
+ /*
+ * The pre_remove case is revoking access, the memory is still
+ * good and could theoretically be put back into service.
+ */
+ collect_procs_fsdax(page, mapping, index, &to_kill, pre_remove);
unmap_and_kill(&to_kill, page_to_pfn(page), mapping,
index, mf_flags);
unlock: