summaryrefslogtreecommitdiff
path: root/fs/btrfs/relocation.c
diff options
context:
space:
mode:
authorQu Wenruo <wqu@suse.com>2019-01-23 15:15:17 +0800
committerDavid Sterba <dsterba@suse.com>2019-02-25 14:13:26 +0100
commitf616f5cd9da7fceb7d884812da380b26040cd083 (patch)
tree35b1d1f6912235e37e2a84579d8a00041d3a6411 /fs/btrfs/relocation.c
parent370a11b8114bcca3738fe6a5d7ed8babcc212f39 (diff)
btrfs: qgroup: Use delayed subtree rescan for balance
Before this patch, qgroup code traces the whole subtree of subvolume and reloc trees unconditionally. This makes qgroup numbers consistent, but it could cause tons of unnecessary extent tracing, which causes a lot of overhead. However for subtree swap of balance, just swap both subtrees because they contain the same contents and tree structure, so qgroup numbers won't change. It's the race window between subtree swap and transaction commit could cause qgroup number change. This patch will delay the qgroup subtree scan until COW happens for the subtree root. So if there is no other operations for the fs, balance won't cause extra qgroup overhead. (best case scenario) Depending on the workload, most of the subtree scan can still be avoided. Only for worst case scenario, it will fall back to old subtree swap overhead. (scan all swapped subtrees) [[Benchmark]] Hardware: VM 4G vRAM, 8 vCPUs, disk is using 'unsafe' cache mode, backing device is SAMSUNG 850 evo SSD. Host has 16G ram. Mkfs parameter: --nodesize 4K (To bump up tree size) Initial subvolume contents: 4G data copied from /usr and /lib. (With enough regular small files) Snapshots: 16 snapshots of the original subvolume. each snapshot has 3 random files modified. balance parameter: -m So the content should be pretty similar to a real world root fs layout. And after file system population, there is no other activity, so it should be the best case scenario. | v4.20-rc1 | w/ patchset | diff ----------------------------------------------------------------------- relocated extents | 22615 | 22457 | -0.1% qgroup dirty extents | 163457 | 121606 | -25.6% time (sys) | 22.884s | 18.842s | -17.6% time (real) | 27.724s | 22.884s | -17.5% Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/relocation.c')
-rw-r--r--fs/btrfs/relocation.c14
1 files changed, 5 insertions, 9 deletions
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0c528918c844..e04cf11059b5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1889,16 +1889,12 @@ again:
* If not traced, we will leak data numbers
* 2) Fs subtree
* If not traced, we will double count old data
- * and tree block numbers, if current trans doesn't free
- * data reloc tree inode.
+ *
+ * We don't scan the subtree right now, but only record
+ * the swapped tree blocks.
+ * The real subtree rescan is delayed until we have new
+ * CoW on the subtree root node before transaction commit.
*/
- ret = btrfs_qgroup_trace_subtree_swap(trans, rc->block_group,
- parent, slot, path->nodes[level],
- path->slots[level], last_snapshot);
- if (ret < 0)
- break;
-
- btrfs_node_key_to_cpu(parent, &first_key, slot);
ret = btrfs_qgroup_add_swapped_blocks(trans, dest,
rc->block_group, parent, slot,
path->nodes[level], path->slots[level],