From a3cf0e4342b6af9e6b34a4b913c630fbd03a82ea Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Tue, 14 Jul 2020 09:12:20 +0800 Subject: btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped [BUG] Sometime fsstress could lead to qgroup warning for case like generic/013: BTRFS warning (device dm-3): qgroup 0/259 has unreleased space, type 1 rsv 81920 ------------[ cut here ]------------ WARNING: CPU: 9 PID: 24535 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 6c341cdf9b6cc3c1 ]--- BTRFS error (device dm-3): qgroup reserved space leaked While that subvolume 259 is no longer in that filesystem. [CAUSE] Normally per-trans qgroup reserved space is freed when a transaction is committed, in commit_fs_roots(). However for completely dropped subvolume, that subvolume is completely gone, thus is no longer in the fs_roots_radix, and its per-trans reserved qgroup will never be freed. Since the subvolume is already gone, leaked per-trans space won't cause any trouble for end users. [FIX] Just call btrfs_qgroup_free_meta_all_pertrans() before a subvolume is completely dropped. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index c0bc35f932bf..96223813b618 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5466,6 +5466,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc) } } + /* + * This subvolume is going to be completely dropped, and won't be + * recorded as dirty roots, thus pertrans meta rsv will not be freed at + * commit transaction time. So free it here manually. + */ + btrfs_qgroup_convert_reserved_meta(root, INT_MAX); + btrfs_qgroup_free_meta_all_pertrans(root); + if (test_bit(BTRFS_ROOT_IN_RADIX, &root->state)) btrfs_add_dropped_root(trans, root); else -- cgit From f3e3d9cc35252a70a2fd698762c9687718268ec6 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Mon, 13 Jul 2020 09:03:20 +0800 Subject: btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree [BUG] There is a bug report about bad signal timing could lead to read-only fs during balance: BTRFS info (device xvdb): balance: start -d -m -s BTRFS info (device xvdb): relocating block group 73001861120 flags metadata BTRFS info (device xvdb): found 12236 extents, stage: move data extents BTRFS info (device xvdb): relocating block group 71928119296 flags data BTRFS info (device xvdb): found 3 extents, stage: move data extents BTRFS info (device xvdb): found 3 extents, stage: update data pointers BTRFS info (device xvdb): relocating block group 60922265600 flags metadata BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown BTRFS info (device xvdb): forced readonly BTRFS info (device xvdb): balance: ended with status: -4 [CAUSE] The direct cause is the -EINTR from the following call chain when a fatal signal is pending: relocate_block_group() |- clean_dirty_subvols() |- btrfs_drop_snapshot() |- btrfs_start_transaction() |- btrfs_delayed_refs_rsv_refill() |- btrfs_reserve_metadata_bytes() |- __reserve_metadata_bytes() |- wait_reserve_ticket() |- prepare_to_wait_event(); |- ticket->error = -EINTR; Normally this behavior is fine for most btrfs_start_transaction() callers, as they need to catch any other error, same for the signal, and exit ASAP. However for balance, especially for the clean_dirty_subvols() case, we're already doing cleanup works, getting -EINTR from btrfs_drop_snapshot() could cause a lot of unexpected problems. From the mentioned forced read-only report, to later balance error due to half dropped reloc trees. [FIX] Fix this problem by using btrfs_join_transaction() if btrfs_drop_snapshot() is called from relocation context. Since btrfs_join_transaction() won't get interrupted by signal, we can continue the cleanup. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba 3 Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 96223813b618..61ede335f6c3 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5298,7 +5298,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc) goto out; } - trans = btrfs_start_transaction(tree_root, 0); + /* + * Use join to avoid potential EINTR from transaction start. See + * wait_reserve_ticket and the whole reservation callchain. + */ + if (for_reloc) + trans = btrfs_join_transaction(tree_root); + else + trans = btrfs_start_transaction(tree_root, 0); if (IS_ERR(trans)) { err = PTR_ERR(trans); goto out_free; -- cgit From c57dd1f2f6a7cd1bb61802344f59ccdc5278c983 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Fri, 31 Jul 2020 19:29:11 +0800 Subject: btrfs: trim: fix underflow in trim length to prevent access beyond device boundary [BUG] The following script can lead to tons of beyond device boundary access: mkfs.btrfs -f $dev -b 10G mount $dev $mnt trimfs $mnt btrfs filesystem resize 1:-1G $mnt trimfs $mnt [CAUSE] Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit"), we try to avoid trimming ranges that's already trimmed. So we check device->alloc_state by finding the first range which doesn't have CHUNK_TRIMMED and CHUNK_ALLOCATED not set. But if we shrunk the device, that bits are not cleared, thus we could easily got a range starts beyond the shrunk device size. This results the returned @start and @end are all beyond device size, then we call "end = min(end, device->total_bytes -1);" making @end smaller than device size. Then finally we goes "len = end - start + 1", totally underflow the result, and lead to the beyond-device-boundary access. [FIX] This patch will fix the problem in two ways: - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device This is the root fix - Add extra safety check when trimming free device extents We check and warn if the returned range is already beyond current device. Link: https://github.com/kdave/btrfs-progs/issues/282 Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Qu Wenruo Reviewed-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'fs/btrfs/extent-tree.c') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 61ede335f6c3..de6fe176fdfb 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -33,6 +33,7 @@ #include "delalloc-space.h" #include "block-group.h" #include "discard.h" +#include "rcu-string.h" #undef SCRAMBLE_DELAYED_REFS @@ -5668,6 +5669,19 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed) &start, &end, CHUNK_TRIMMED | CHUNK_ALLOCATED); + /* Check if there are any CHUNK_* bits left */ + if (start > device->total_bytes) { + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); + btrfs_warn_in_rcu(fs_info, +"ignoring attempt to trim beyond device size: offset %llu length %llu device %s device size %llu", + start, end - start + 1, + rcu_str_deref(device->name), + device->total_bytes); + mutex_unlock(&fs_info->chunk_mutex); + ret = 0; + break; + } + /* Ensure we skip the reserved area in the first 1M */ start = max_t(u64, start, SZ_1M); -- cgit