From 432cd2a10f1c10cead91fe706ff5dc52f06d642a Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Mon, 8 Jun 2020 13:32:55 +0100 Subject: btrfs: fix data block group relocation failure due to concurrent scrub When running relocation of a data block group while scrub is running in parallel, it is possible that the relocation will fail and abort the current transaction with an -EINVAL error: [134243.988595] BTRFS info (device sdc): found 14 extents, stage: move data extents [134243.999871] ------------[ cut here ]------------ [134244.000741] BTRFS: Transaction aborted (error -22) [134244.001692] WARNING: CPU: 0 PID: 26954 at fs/btrfs/ctree.c:1071 __btrfs_cow_block+0x6a7/0x790 [btrfs] [134244.003380] Modules linked in: btrfs blake2b_generic xor raid6_pq (...) [134244.012577] CPU: 0 PID: 26954 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 [134244.014162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [134244.016184] RIP: 0010:__btrfs_cow_block+0x6a7/0x790 [btrfs] [134244.017151] Code: 48 c7 c7 (...) [134244.020549] RSP: 0018:ffffa41607863888 EFLAGS: 00010286 [134244.021515] RAX: 0000000000000000 RBX: ffff9614bdfe09c8 RCX: 0000000000000000 [134244.022822] RDX: 0000000000000001 RSI: ffffffffb3d63980 RDI: 0000000000000001 [134244.024124] RBP: ffff961589e8c000 R08: 0000000000000000 R09: 0000000000000001 [134244.025424] R10: ffffffffc0ae5955 R11: 0000000000000000 R12: ffff9614bd530d08 [134244.026725] R13: ffff9614ced41b88 R14: ffff9614bdfe2a48 R15: 0000000000000000 [134244.028024] FS: 00007f29b63c08c0(0000) GS:ffff9615ba600000(0000) knlGS:0000000000000000 [134244.029491] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [134244.030560] CR2: 00007f4eb339b000 CR3: 0000000130d6e006 CR4: 00000000003606f0 [134244.031997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [134244.033153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [134244.034484] Call Trace: [134244.034984] btrfs_cow_block+0x12b/0x2b0 [btrfs] [134244.035859] do_relocation+0x30b/0x790 [btrfs] [134244.036681] ? do_raw_spin_unlock+0x49/0xc0 [134244.037460] ? _raw_spin_unlock+0x29/0x40 [134244.038235] relocate_tree_blocks+0x37b/0x730 [btrfs] [134244.039245] relocate_block_group+0x388/0x770 [btrfs] [134244.040228] btrfs_relocate_block_group+0x161/0x2e0 [btrfs] [134244.041323] btrfs_relocate_chunk+0x36/0x110 [btrfs] [134244.041345] btrfs_balance+0xc06/0x1860 [btrfs] [134244.043382] ? btrfs_ioctl_balance+0x27c/0x310 [btrfs] [134244.045586] btrfs_ioctl_balance+0x1ed/0x310 [btrfs] [134244.045611] btrfs_ioctl+0x1880/0x3760 [btrfs] [134244.049043] ? do_raw_spin_unlock+0x49/0xc0 [134244.049838] ? _raw_spin_unlock+0x29/0x40 [134244.050587] ? __handle_mm_fault+0x11b3/0x14b0 [134244.051417] ? ksys_ioctl+0x92/0xb0 [134244.052070] ksys_ioctl+0x92/0xb0 [134244.052701] ? trace_hardirqs_off_thunk+0x1a/0x1c [134244.053511] __x64_sys_ioctl+0x16/0x20 [134244.054206] do_syscall_64+0x5c/0x280 [134244.054891] entry_SYSCALL_64_after_hwframe+0x49/0xbe [134244.055819] RIP: 0033:0x7f29b51c9dd7 [134244.056491] Code: 00 00 00 (...) [134244.059767] RSP: 002b:00007ffcccc1dd08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [134244.061168] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f29b51c9dd7 [134244.062474] RDX: 00007ffcccc1dda0 RSI: 00000000c4009420 RDI: 0000000000000003 [134244.063771] RBP: 0000000000000003 R08: 00005565cea4b000 R09: 0000000000000000 [134244.065032] R10: 0000000000000541 R11: 0000000000000202 R12: 00007ffcccc2060a [134244.066327] R13: 00007ffcccc1dda0 R14: 0000000000000002 R15: 00007ffcccc1dec0 [134244.067626] irq event stamp: 0 [134244.068202] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [134244.069351] hardirqs last disabled at (0): [] copy_process+0x74f/0x2020 [134244.070909] softirqs last enabled at (0): [] copy_process+0x74f/0x2020 [134244.072392] softirqs last disabled at (0): [<0000000000000000>] 0x0 [134244.073432] ---[ end trace bd7c03622e0b0a99 ]--- The -EINVAL error comes from the following chain of function calls: __btrfs_cow_block() <-- aborts the transaction btrfs_reloc_cow_block() replace_file_extents() get_new_location() <-- returns -EINVAL When relocating a data block group, for each allocated extent of the block group, we preallocate another extent (at prealloc_file_extent_cluster()), associated with the data relocation inode, and then dirty all its pages. These preallocated extents have, and must have, the same size that extents from the data block group being relocated have. Later before we start the relocation stage that updates pointers (bytenr field of file extent items) to point to the the new extents, we trigger writeback for the data relocation inode. The expectation is that writeback will write the pages to the previously preallocated extents, that it follows the NOCOW path. That is generally the case, however, if a scrub is running it may have turned the block group that contains those extents into RO mode, in which case writeback falls back to the COW path. However in the COW path instead of allocating exactly one extent with the expected size, the allocator may end up allocating several smaller extents due to free space fragmentation - because we tell it at cow_file_range() that the minimum allocation size can match the filesystem's sector size. This later breaks the relocation's expectation that an extent associated to a file extent item in the data relocation inode has the same size as the respective extent pointed by a file extent item in another tree - in this case the extent to which the relocation inode poins to is smaller, causing relocation.c:get_new_location() to return -EINVAL. For example, if we are relocating a data block group X that has a logical address of X and the block group has an extent allocated at the logical address X + 128KiB with a size of 64KiB: 1) At prealloc_file_extent_cluster() we allocate an extent for the data relocation inode with a size of 64KiB and associate it to the file offset 128KiB (X + 128KiB - X) of the data relocation inode. This preallocated extent was allocated at block group Z; 2) A scrub running in parallel turns block group Z into RO mode and starts scrubing its extents; 3) Relocation triggers writeback for the data relocation inode; 4) When running delalloc (btrfs_run_delalloc_range()), we try first the NOCOW path because the data relocation inode has BTRFS_INODE_PREALLOC set in its flags. However, because block group Z is in RO mode, the NOCOW path (run_delalloc_nocow()) falls back into the COW path, by calling cow_file_range(); 5) At cow_file_range(), in the first iteration of the while loop we call btrfs_reserve_extent() to allocate a 64KiB extent and pass it a minimum allocation size of 4KiB (fs_info->sectorsize). Due to free space fragmentation, btrfs_reserve_extent() ends up allocating two extents of 32KiB each, each one on a different iteration of that while loop; 6) Writeback of the data relocation inode completes; 7) Relocation proceeds and ends up at relocation.c:replace_file_extents(), with a leaf which has a file extent item that points to the data extent from block group X, that has a logical address (bytenr) of X + 128KiB and a size of 64KiB. Then it calls get_new_location(), which does a lookup in the data relocation tree for a file extent item starting at offset 128KiB (X + 128KiB - X) and belonging to the data relocation inode. It finds a corresponding file extent item, however that item points to an extent that has a size of 32KiB, which doesn't match the expected size of 64KiB, resuling in -EINVAL being returned from this function and propagated up to __btrfs_cow_block(), which aborts the current transaction. To fix this make sure that at cow_file_range() when we call the allocator we pass it a minimum allocation size corresponding the desired extent size if the inode belongs to the data relocation tree, otherwise pass it the filesystem's sector size as the minimum allocation size. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/inode.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 12b5d61f23bb..62c3f4972ff6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -985,6 +985,7 @@ static noinline int cow_file_range(struct inode *inode, u64 num_bytes; unsigned long ram_size; u64 cur_alloc_size = 0; + u64 min_alloc_size; u64 blocksize = fs_info->sectorsize; struct btrfs_key ins; struct extent_map *em; @@ -1035,10 +1036,26 @@ static noinline int cow_file_range(struct inode *inode, btrfs_drop_extent_cache(BTRFS_I(inode), start, start + num_bytes - 1, 0); + /* + * Relocation relies on the relocated extents to have exactly the same + * size as the original extents. Normally writeback for relocation data + * extents follows a NOCOW path because relocation preallocates the + * extents. However, due to an operation such as scrub turning a block + * group to RO mode, it may fallback to COW mode, so we must make sure + * an extent allocated during COW has exactly the requested size and can + * not be split into smaller extents, otherwise relocation breaks and + * fails during the stage where it updates the bytenr of file extent + * items. + */ + if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID) + min_alloc_size = num_bytes; + else + min_alloc_size = fs_info->sectorsize; + while (num_bytes > 0) { cur_alloc_size = num_bytes; ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size, - fs_info->sectorsize, 0, alloc_hint, + min_alloc_size, 0, alloc_hint, &ins, 1, 1); if (ret < 0) goto out_unlock; -- cgit From 6bd335b469f945f75474c11e3f577f85409f39c3 Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Mon, 8 Jun 2020 13:33:05 +0100 Subject: btrfs: fix bytes_may_use underflow when running balance and scrub in parallel When balance and scrub are running in parallel it is possible to end up with an underflow of the bytes_may_use counter of the data space_info object, which triggers a warning like the following: [134243.793196] BTRFS info (device sdc): relocating block group 1104150528 flags data [134243.806891] ------------[ cut here ]------------ [134243.807561] WARNING: CPU: 1 PID: 26884 at fs/btrfs/space-info.h:125 btrfs_add_reserved_bytes+0x1da/0x280 [btrfs] [134243.808819] Modules linked in: btrfs blake2b_generic xor (...) [134243.815779] CPU: 1 PID: 26884 Comm: kworker/u8:8 Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 [134243.816944] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [134243.818389] Workqueue: writeback wb_workfn (flush-btrfs-108483) [134243.819186] RIP: 0010:btrfs_add_reserved_bytes+0x1da/0x280 [btrfs] [134243.819963] Code: 0b f2 85 (...) [134243.822271] RSP: 0018:ffffa4160aae7510 EFLAGS: 00010287 [134243.822929] RAX: 000000000000c000 RBX: ffff96159a8c1000 RCX: 0000000000000000 [134243.823816] RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffff96158067a810 [134243.824742] RBP: ffff96158067a800 R08: 0000000000000001 R09: 0000000000000000 [134243.825636] R10: ffff961501432a40 R11: 0000000000000000 R12: 000000000000c000 [134243.826532] R13: 0000000000000001 R14: ffffffffffff4000 R15: ffff96158067a810 [134243.827432] FS: 0000000000000000(0000) GS:ffff9615baa00000(0000) knlGS:0000000000000000 [134243.828451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [134243.829184] CR2: 000055bd7e414000 CR3: 00000001077be004 CR4: 00000000003606e0 [134243.830083] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [134243.830975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [134243.831867] Call Trace: [134243.832211] find_free_extent+0x4a0/0x16c0 [btrfs] [134243.832846] btrfs_reserve_extent+0x91/0x180 [btrfs] [134243.833487] cow_file_range+0x12d/0x490 [btrfs] [134243.834080] fallback_to_cow+0x82/0x1b0 [btrfs] [134243.834689] ? release_extent_buffer+0x121/0x170 [btrfs] [134243.835370] run_delalloc_nocow+0x33f/0xa30 [btrfs] [134243.836032] btrfs_run_delalloc_range+0x1ea/0x6d0 [btrfs] [134243.836725] ? find_lock_delalloc_range+0x221/0x250 [btrfs] [134243.837450] writepage_delalloc+0xe8/0x150 [btrfs] [134243.838059] __extent_writepage+0xe8/0x4c0 [btrfs] [134243.838674] extent_write_cache_pages+0x237/0x530 [btrfs] [134243.839364] extent_writepages+0x44/0xa0 [btrfs] [134243.839946] do_writepages+0x23/0x80 [134243.840401] __writeback_single_inode+0x59/0x700 [134243.841006] writeback_sb_inodes+0x267/0x5f0 [134243.841548] __writeback_inodes_wb+0x87/0xe0 [134243.842091] wb_writeback+0x382/0x590 [134243.842574] ? wb_workfn+0x4a2/0x6c0 [134243.843030] wb_workfn+0x4a2/0x6c0 [134243.843468] process_one_work+0x26d/0x6a0 [134243.843978] worker_thread+0x4f/0x3e0 [134243.844452] ? process_one_work+0x6a0/0x6a0 [134243.844981] kthread+0x103/0x140 [134243.845400] ? kthread_create_worker_on_cpu+0x70/0x70 [134243.846030] ret_from_fork+0x3a/0x50 [134243.846494] irq event stamp: 0 [134243.846892] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [134243.847682] hardirqs last disabled at (0): [] copy_process+0x74f/0x2020 [134243.848687] softirqs last enabled at (0): [] copy_process+0x74f/0x2020 [134243.849913] softirqs last disabled at (0): [<0000000000000000>] 0x0 [134243.850698] ---[ end trace bd7c03622e0b0a96 ]--- [134243.851335] ------------[ cut here ]------------ When relocating a data block group, for each extent allocated in the block group we preallocate another extent with the same size for the data relocation inode (we do it at prealloc_file_extent_cluster()). We reserve space by calling btrfs_check_data_free_space(), which ends up incrementing the data space_info's bytes_may_use counter, and then call btrfs_prealloc_file_range() to allocate the extent, which always decrements the bytes_may_use counter by the same amount. The expectation is that writeback of the data relocation inode always follows a NOCOW path, by writing into the preallocated extents. However, when starting writeback we might end up falling back into the COW path, because the block group that contains the preallocated extent was turned into RO mode by a scrub running in parallel. The COW path then calls the extent allocator which ends up calling btrfs_add_reserved_bytes(), and this function decrements the bytes_may_use counter of the data space_info object by an amount corresponding to the size of the allocated extent, despite we haven't previously incremented it. When the counter currently has a value smaller then the allocated extent we reset the counter to 0 and emit a warning, otherwise we just decrement it and slowly mess up with this counter which is crucial for space reservation, the end result can be granting reserved space to tasks when there isn't really enough free space, and having the tasks fail later in critical places where error handling consists of a transaction abort or hitting a BUG_ON(). Fix this by making sure that if we fallback to the COW path for a data relocation inode, we increment the bytes_may_use counter of the data space_info object. The COW path will then decrement it at btrfs_add_reserved_bytes() on success or through its error handling part by a call to extent_clear_unlock_delalloc() (which ends up calling btrfs_clear_delalloc_extent() that does the decrement operation) in case of an error. Test case btrfs/061 from fstests could sporadically trigger this. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/inode.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 62c3f4972ff6..62b49d2db928 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1378,6 +1378,8 @@ static int fallback_to_cow(struct inode *inode, struct page *locked_page, int *page_started, unsigned long *nr_written) { const bool is_space_ino = btrfs_is_free_space_inode(BTRFS_I(inode)); + const bool is_reloc_ino = (BTRFS_I(inode)->root->root_key.objectid == + BTRFS_DATA_RELOC_TREE_OBJECTID); const u64 range_bytes = end + 1 - start; struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; u64 range_start = start; @@ -1408,18 +1410,23 @@ static int fallback_to_cow(struct inode *inode, struct page *locked_page, * data space info, which we incremented in the step above. * * If we need to fallback to cow and the inode corresponds to a free - * space cache inode, we must also increment bytes_may_use of the data - * space_info for the same reason. Space caches always get a prealloc + * space cache inode or an inode of the data relocation tree, we must + * also increment bytes_may_use of the data space_info for the same + * reason. Space caches and relocated data extents always get a prealloc * extent for them, however scrub or balance may have set the block - * group that contains that extent to RO mode. + * group that contains that extent to RO mode and therefore force COW + * when starting writeback. */ count = count_range_bits(io_tree, &range_start, end, range_bytes, EXTENT_NORESERVE, 0); - if (count > 0 || is_space_ino) { - const u64 bytes = is_space_ino ? range_bytes : count; + if (count > 0 || is_space_ino || is_reloc_ino) { + u64 bytes = count; struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; struct btrfs_space_info *sinfo = fs_info->data_sinfo; + if (is_space_ino || is_reloc_ino) + bytes = range_bytes; + spin_lock(&sinfo->lock); btrfs_space_info_update_bytes_may_use(fs_info, sinfo, bytes); spin_unlock(&sinfo->lock); -- cgit From 4b1946284dd6641afdb9457101056d9e6ee6204c Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Mon, 15 Jun 2020 18:48:58 +0100 Subject: btrfs: fix failure of RWF_NOWAIT write into prealloc extent beyond eof If we attempt to write to prealloc extent located after eof using a RWF_NOWAIT write, we always fail with -EAGAIN. We do actually check if we have an allocated extent for the write at the start of btrfs_file_write_iter() through a call to check_can_nocow(), but later when we go into the actual direct IO write path we simply return -EAGAIN if the write starts at or beyond EOF. Trivial to reproduce: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/foo $ chattr +C /mnt/foo $ xfs_io -d -c "pwrite -S 0xab 0 64K" /mnt/foo wrote 65536/65536 bytes at offset 0 64 KiB, 16 ops; 0.0004 sec (135.575 MiB/sec and 34707.1584 ops/sec) $ xfs_io -c "falloc -k 64K 1M" /mnt/foo $ xfs_io -d -c "pwrite -N -V 1 -S 0xfe -b 64K 64K 64K" /mnt/foo pwrite: Resource temporarily unavailable On xfs and ext4 the write succeeds, as expected. Fix this by removing the wrong check at btrfs_direct_IO(). Fixes: edf064e7c6fec3 ("btrfs: nowait aio support") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 62b49d2db928..cfa863d2d97c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7889,9 +7889,6 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) dio_data.overwrite = 1; inode_unlock(inode); relock = true; - } else if (iocb->ki_flags & IOCB_NOWAIT) { - ret = -EAGAIN; - goto out; } ret = btrfs_delalloc_reserve_space(inode, &data_reserved, offset, count); -- cgit From 7999096fa9cfd0253497c8d2ed9a5a1537521d25 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Fri, 12 Jun 2020 16:57:37 +1000 Subject: iov_iter: Move unnecessary inclusion of crypto/hash.h The header file linux/uio.h includes crypto/hash.h which pulls in most of the Crypto API. Since linux/uio.h is used throughout the kernel this means that every tiny bit of change to the Crypto API causes the entire kernel to get rebuilt. This patch fixes this by moving it into lib/iov_iter.c instead where it is actually used. This patch also fixes the ifdef to use CRYPTO_HASH instead of just CRYPTO which does not guarantee the existence of ahash. Unfortunately a number of drivers were relying on linux/uio.h to provide access to linux/slab.h. This patch adds inclusions of linux/slab.h as detected by build failures. Also skbuff.h was relying on this to provide a declaration for ahash_request. This patch adds a forward declaration instead. Signed-off-by: Herbert Xu Signed-off-by: Al Viro --- fs/btrfs/inode.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d04c82c88418..d901d53e4f03 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3,6 +3,7 @@ * Copyright (C) 2007 Oracle. All rights reserved. */ +#include #include #include #include -- cgit From 230ed397435e85b54f055c524fcb267ae2ce3bc4 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Mon, 6 Jul 2020 09:14:12 -0400 Subject: btrfs: fix double put of block group with nocow While debugging a patch that I wrote I was hitting use-after-free panics when accessing block groups on unmount. This turned out to be because in the nocow case if we bail out of doing the nocow for whatever reason we need to call btrfs_dec_nocow_writers() if we called the inc. This puts our block group, but a few error cases does if (nocow) { btrfs_dec_nocow_writers(); goto error; } unfortunately, error is error: if (nocow) btrfs_dec_nocow_writers(); so we get a double put on our block group. Fix this by dropping the error cases calling of btrfs_dec_nocow_writers(), as it's handled at the error label now. Fixes: 762bf09893b4 ("btrfs: improve error handling in run_delalloc_nocow") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana Signed-off-by: Josef Bacik Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index cfa863d2d97c..11f81a148350 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1690,12 +1690,8 @@ out_check: ret = fallback_to_cow(inode, locked_page, cow_start, found_key.offset - 1, page_started, nr_written); - if (ret) { - if (nocow) - btrfs_dec_nocow_writers(fs_info, - disk_bytenr); + if (ret) goto error; - } cow_start = (u64)-1; } @@ -1711,9 +1707,6 @@ out_check: ram_bytes, BTRFS_COMPRESS_NONE, BTRFS_ORDERED_PREALLOC); if (IS_ERR(em)) { - if (nocow) - btrfs_dec_nocow_writers(fs_info, - disk_bytenr); ret = PTR_ERR(em); goto error; } -- cgit From fa91e4aa1716004ea8096d5185ec0451e206aea0 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Fri, 17 Jul 2020 15:12:05 +0800 Subject: btrfs: qgroup: fix data leak caused by race between writeback and truncate [BUG] When running tests like generic/013 on test device with btrfs quota enabled, it can normally lead to data leak, detected at unmount time: BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096 ------------[ cut here ]------------ WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace caf08beafeca2392 ]--- BTRFS error (device dm-3): qgroup reserved space leaked [CAUSE] In the offending case, the offending operations are: 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0 The following sequence of events could happen after the writev(): CPU1 (writeback) | CPU2 (truncate) ----------------------------------------------------------------- btrfs_writepages() | |- extent_write_cache_pages() | |- Got page for 1003520 | | 1003520 is Dirty, no writeback | | So (!clear_page_dirty_for_io()) | | gets called for it | |- Now page 1003520 is Clean. | | | btrfs_setattr() | | |- btrfs_setsize() | | |- truncate_setsize() | | New i_size is 18388 |- __extent_writepage() | | |- page_offset() > i_size | |- btrfs_invalidatepage() | |- Page is clean, so no qgroup | callback executed This means, the qgroup reserved data space is not properly released in btrfs_invalidatepage() as the page is Clean. [FIX] Instead of checking the dirty bit of a page, call btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage(). As qgroup rsv are completely bound to the QGROUP_RESERVED bit of io_tree, not bound to page status, thus we won't cause double freeing anyway. Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/inode.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 11f81a148350..b7dd5124941e 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8136,20 +8136,17 @@ again: /* * Qgroup reserved space handler * Page here will be either - * 1) Already written to disk - * In this case, its reserved space is released from data rsv map - * and will be freed by delayed_ref handler finally. - * So even we call qgroup_free_data(), it won't decrease reserved - * space. - * 2) Not written to disk - * This means the reserved space should be freed here. However, - * if a truncate invalidates the page (by clearing PageDirty) - * and the page is accounted for while allocating extent - * in btrfs_check_data_free_space() we let delayed_ref to - * free the entire extent. + * 1) Already written to disk or ordered extent already submitted + * Then its QGROUP_RESERVED bit in io_tree is already cleaned. + * Qgroup will be handled by its qgroup_record then. + * btrfs_qgroup_free_data() call will do nothing here. + * + * 2) Not written to disk yet + * Then btrfs_qgroup_free_data() call will clear the QGROUP_RESERVED + * bit of its io_tree, and free the qgroup reserved data space. + * Since the IO will never happen for this page. */ - if (PageDirty(page)) - btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE); + btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE); if (!inode_evicting) { clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | -- cgit From 46d4dac888ebe083d61f18acb16a6988e9062268 Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Tue, 9 Jun 2020 11:19:33 +0100 Subject: btrfs: remove the start argument from btrfs_free_reserved_data_space_noquota() The start argument for btrfs_free_reserved_data_space_noquota() is only used to make sure the amount of bytes we decrement from the bytes_may_use counter of the data space_info object is aligned to the filesystem's sector size. It serves no other purpose. All its current callers always pass a length argument that is already aligned to the sector size, so we can make the start argument go away. In fact its presence makes it impossible to use it in a context where we just want to free a number of bytes for a range for which either we do not know its start offset or for freeing multiple ranges at once (which are not contiguous). This change is preparatory work for a patch (third patch in this series) that makes relocation of data block groups that are not full reserve less data space. Reviewed-by: Anand Jain Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6862cd7e21a9..07d20f634467 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2087,7 +2087,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, (*bits & EXTENT_CLEAR_DATA_RESV)) btrfs_free_reserved_data_space_noquota( &inode->vfs_inode, - state->start, len); + len); percpu_counter_add_batch(&fs_info->delalloc_bytes, -len, fs_info->delalloc_batch); @@ -7278,8 +7278,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, * use the existing or preallocated extent, so does not * need to adjust btrfs_space_info's bytes_may_use. */ - btrfs_free_reserved_data_space_noquota(inode, start, - len); + btrfs_free_reserved_data_space_noquota(inode, len); goto skip_cow; } } -- cgit From 203f44c51982b80c437b49b32c843597c112f287 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 10 Jun 2020 09:04:40 +0800 Subject: btrfs: inode: refactor the parameters of insert_reserved_file_extent() Function insert_reserved_file_extent() takes a long list of parameters, which are all for btrfs_file_extent_item, even including two reserved members, encryption and other_encoding. This makes the parameter list unnecessary long for a function which only gets called twice. This patch will refactor the parameter list, by using btrfs_file_extent_item as parameter directly to hugely reduce the number of parameters. Also, since there are only two callers, one in btrfs_finish_ordered_io() which inserts file extent for ordered extent, and one __btrfs_prealloc_file_range(). These two call sites have completely different context, where ordered extent can be compressed, but will always be regular extent, while the preallocated one is never going to be compressed and always has PREALLOC type. So use two small wrapper for these two different call sites to improve readability. Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 94 +++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 63 insertions(+), 31 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 07d20f634467..0c4f5f796ea6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2467,16 +2467,16 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end) static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, struct inode *inode, u64 file_pos, - u64 disk_bytenr, u64 disk_num_bytes, - u64 num_bytes, u64 ram_bytes, - u8 compression, u8 encryption, - u16 other_encoding, int extent_type) + struct btrfs_file_extent_item *stack_fi) { struct btrfs_root *root = BTRFS_I(inode)->root; - struct btrfs_file_extent_item *fi; struct btrfs_path *path; struct extent_buffer *leaf; struct btrfs_key ins; + u64 disk_num_bytes = btrfs_stack_file_extent_disk_num_bytes(stack_fi); + u64 disk_bytenr = btrfs_stack_file_extent_disk_bytenr(stack_fi); + u64 num_bytes = btrfs_stack_file_extent_num_bytes(stack_fi); + u64 ram_bytes = btrfs_stack_file_extent_ram_bytes(stack_fi); u64 qg_released; int extent_inserted = 0; int ret; @@ -2496,7 +2496,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, */ ret = __btrfs_drop_extents(trans, root, inode, path, file_pos, file_pos + num_bytes, NULL, 0, - 1, sizeof(*fi), &extent_inserted); + 1, sizeof(*stack_fi), &extent_inserted); if (ret) goto out; @@ -2507,23 +2507,15 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, path->leave_spinning = 1; ret = btrfs_insert_empty_item(trans, root, path, &ins, - sizeof(*fi)); + sizeof(*stack_fi)); if (ret) goto out; } leaf = path->nodes[0]; - fi = btrfs_item_ptr(leaf, path->slots[0], - struct btrfs_file_extent_item); - btrfs_set_file_extent_generation(leaf, fi, trans->transid); - btrfs_set_file_extent_type(leaf, fi, extent_type); - btrfs_set_file_extent_disk_bytenr(leaf, fi, disk_bytenr); - btrfs_set_file_extent_disk_num_bytes(leaf, fi, disk_num_bytes); - btrfs_set_file_extent_offset(leaf, fi, 0); - btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes); - btrfs_set_file_extent_ram_bytes(leaf, fi, ram_bytes); - btrfs_set_file_extent_compression(leaf, fi, compression); - btrfs_set_file_extent_encryption(leaf, fi, encryption); - btrfs_set_file_extent_other_encoding(leaf, fi, other_encoding); + btrfs_set_stack_file_extent_generation(stack_fi, trans->transid); + write_extent_buffer(leaf, stack_fi, + btrfs_item_ptr_offset(leaf, path->slots[0]), + sizeof(struct btrfs_file_extent_item)); btrfs_mark_buffer_dirty(leaf); btrfs_release_path(path); @@ -2571,7 +2563,33 @@ static void btrfs_release_delalloc_bytes(struct btrfs_fs_info *fs_info, btrfs_put_block_group(cache); } -/* as ordered data IO finishes, this gets called so we can finish +static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, + struct inode *inode, + struct btrfs_ordered_extent *oe) +{ + struct btrfs_file_extent_item stack_fi; + u64 logical_len; + + memset(&stack_fi, 0, sizeof(stack_fi)); + btrfs_set_stack_file_extent_type(&stack_fi, BTRFS_FILE_EXTENT_REG); + btrfs_set_stack_file_extent_disk_bytenr(&stack_fi, oe->disk_bytenr); + btrfs_set_stack_file_extent_disk_num_bytes(&stack_fi, + oe->disk_num_bytes); + if (test_bit(BTRFS_ORDERED_TRUNCATED, &oe->flags)) + logical_len = oe->truncated_len; + else + logical_len = oe->num_bytes; + btrfs_set_stack_file_extent_num_bytes(&stack_fi, logical_len); + btrfs_set_stack_file_extent_ram_bytes(&stack_fi, logical_len); + btrfs_set_stack_file_extent_compression(&stack_fi, oe->compress_type); + /* Encryption and other encoding is reserved and all 0 */ + + return insert_reserved_file_extent(trans, inode, oe->file_offset, + &stack_fi); +} + +/* + * As ordered data IO finishes, this gets called so we can finish * an ordered extent if the range of bytes in the file it covers are * fully written. */ @@ -2673,12 +2691,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) logical_len); } else { BUG_ON(root == fs_info->tree_root); - ret = insert_reserved_file_extent(trans, inode, start, - ordered_extent->disk_bytenr, - ordered_extent->disk_num_bytes, - logical_len, logical_len, - compress_type, 0, 0, - BTRFS_FILE_EXTENT_REG); + ret = insert_ordered_extent_file_extent(trans, inode, + ordered_extent); if (!ret) { clear_reserved_extent = false; btrfs_release_delalloc_bytes(fs_info, @@ -9583,6 +9597,27 @@ out_unlock: return err; } +static int insert_prealloc_file_extent(struct btrfs_trans_handle *trans, + struct inode *inode, struct btrfs_key *ins, + u64 file_offset) +{ + struct btrfs_file_extent_item stack_fi; + u64 start = ins->objectid; + u64 len = ins->offset; + + memset(&stack_fi, 0, sizeof(stack_fi)); + + btrfs_set_stack_file_extent_type(&stack_fi, BTRFS_FILE_EXTENT_PREALLOC); + btrfs_set_stack_file_extent_disk_bytenr(&stack_fi, start); + btrfs_set_stack_file_extent_disk_num_bytes(&stack_fi, len); + btrfs_set_stack_file_extent_num_bytes(&stack_fi, len); + btrfs_set_stack_file_extent_ram_bytes(&stack_fi, len); + btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE); + /* Encryption and other encoding is reserved and all 0 */ + + return insert_reserved_file_extent(trans, inode, file_offset, + &stack_fi); +} static int __btrfs_prealloc_file_range(struct inode *inode, int mode, u64 start, u64 num_bytes, u64 min_size, loff_t actual_len, u64 *alloc_hint, @@ -9641,11 +9676,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode, btrfs_dec_block_group_reservations(fs_info, ins.objectid); last_alloc = ins.offset; - ret = insert_reserved_file_extent(trans, inode, - cur_offset, ins.objectid, - ins.offset, ins.offset, - ins.offset, 0, 0, 0, - BTRFS_FILE_EXTENT_PREALLOC); + ret = insert_prealloc_file_extent(trans, inode, &ins, + cur_offset); if (ret) { btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 0); -- cgit From 9729f10a608f235779f060636f32c87766ec615d Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 10 Jun 2020 09:04:41 +0800 Subject: btrfs: inode: move qgroup reserved space release to the callers of insert_reserved_file_extent() This is to prepare for the incoming timing change of qgroup reserved data space and ordered extent. Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0c4f5f796ea6..516192eccf52 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2467,7 +2467,8 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end) static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, struct inode *inode, u64 file_pos, - struct btrfs_file_extent_item *stack_fi) + struct btrfs_file_extent_item *stack_fi, + u64 qgroup_reserved) { struct btrfs_root *root = BTRFS_I(inode)->root; struct btrfs_path *path; @@ -2477,7 +2478,6 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, u64 disk_bytenr = btrfs_stack_file_extent_disk_bytenr(stack_fi); u64 num_bytes = btrfs_stack_file_extent_num_bytes(stack_fi); u64 ram_bytes = btrfs_stack_file_extent_ram_bytes(stack_fi); - u64 qg_released; int extent_inserted = 0; int ret; @@ -2531,17 +2531,9 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, if (ret) goto out; - /* - * Release the reserved range from inode dirty range map, as it is - * already moved into delayed_ref_head - */ - ret = btrfs_qgroup_release_data(inode, file_pos, ram_bytes); - if (ret < 0) - goto out; - qg_released = ret; ret = btrfs_alloc_reserved_file_extent(trans, root, btrfs_ino(BTRFS_I(inode)), - file_pos, qg_released, &ins); + file_pos, qgroup_reserved, &ins); out: btrfs_free_path(path); @@ -2569,6 +2561,7 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, { struct btrfs_file_extent_item stack_fi; u64 logical_len; + int ret; memset(&stack_fi, 0, sizeof(stack_fi)); btrfs_set_stack_file_extent_type(&stack_fi, BTRFS_FILE_EXTENT_REG); @@ -2584,8 +2577,11 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, btrfs_set_stack_file_extent_compression(&stack_fi, oe->compress_type); /* Encryption and other encoding is reserved and all 0 */ + ret = btrfs_qgroup_release_data(inode, oe->file_offset, logical_len); + if (ret < 0) + return ret; return insert_reserved_file_extent(trans, inode, oe->file_offset, - &stack_fi); + &stack_fi, ret); } /* @@ -9604,6 +9600,7 @@ static int insert_prealloc_file_extent(struct btrfs_trans_handle *trans, struct btrfs_file_extent_item stack_fi; u64 start = ins->objectid; u64 len = ins->offset; + int ret; memset(&stack_fi, 0, sizeof(stack_fi)); @@ -9615,8 +9612,11 @@ static int insert_prealloc_file_extent(struct btrfs_trans_handle *trans, btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE); /* Encryption and other encoding is reserved and all 0 */ + ret = btrfs_qgroup_release_data(inode, file_offset, len); + if (ret < 0) + return ret; return insert_reserved_file_extent(trans, inode, file_offset, - &stack_fi); + &stack_fi, ret); } static int __btrfs_prealloc_file_range(struct inode *inode, int mode, u64 start, u64 num_bytes, u64 min_size, -- cgit From 7dbeaad0af7d0a1a2a8e41d04e90964368ddfcc5 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 10 Jun 2020 09:04:43 +0800 Subject: btrfs: change timing for qgroup reserved space for ordered extents to fix reserved space leak [BUG] The following simple workload from fsstress can lead to qgroup reserved data space leak: 0/0: creat f0 x:0 0 0 0/0: creat add id=0,parent=-1 0/1: write f0[259 1 0 0 0 0] [600030,27288] 0 0/4: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 64 627318] return 25, fallback to stat() 0/4: dwrite f0[259 1 0 0 64 627318] [610304,106496] 0 This would cause btrfs qgroup to leak 20480 bytes for data reserved space. If btrfs qgroup limit is enabled, such leak can lead to unexpected early EDQUOT and unusable space. [CAUSE] When doing direct IO, kernel will try to writeback existing buffered page cache, then invalidate them: generic_file_direct_write() |- filemap_write_and_wait_range(); |- invalidate_inode_pages2_range(); However for btrfs, the bi_end_io hook doesn't finish all its heavy work right after bio ends. In fact, it delays its work further: submit_extent_page(end_io_func=end_bio_extent_writepage); end_bio_extent_writepage() |- btrfs_writepage_endio_finish_ordered() |- btrfs_init_work(finish_ordered_fn); <<< Work queue execution >>> finish_ordered_fn() |- btrfs_finish_ordered_io(); |- Clear qgroup bits This means, when filemap_write_and_wait_range() returns, btrfs_finish_ordered_io() is not guaranteed to be executed, thus the qgroup bits for related range are not cleared. Now into how the leak happens, this will only focus on the overlapping part of buffered and direct IO part. 1. After buffered write The inode had the following range with QGROUP_RESERVED bit: 596 616K |///////////////| Qgroup reserved data space: 20K 2. Writeback part for range [596K, 616K) Write back finished, but btrfs_finish_ordered_io() not get called yet. So we still have: 596K 616K |///////////////| Qgroup reserved data space: 20K 3. Pages for range [596K, 616K) get released This will clear all qgroup bits, but don't update the reserved data space. So we have: 596K 616K | | Qgroup reserved data space: 20K That number doesn't match the qgroup bit range anymore. 4. Dio prepare space for range [596K, 700K) Qgroup reserved data space for that range, we got: 596K 616K 700K |///////////////|///////////////////////| Qgroup reserved data space: 20K + 104K = 124K 5. btrfs_finish_ordered_range() gets executed for range [596K, 616K) Qgroup free reserved space for that range, we got: 596K 616K 700K | |///////////////////////| We need to free that range of reserved space. Qgroup reserved data space: 124K - 20K = 104K 6. btrfs_finish_ordered_range() gets executed for range [596K, 700K) However qgroup bit for range [596K, 616K) is already cleared in previous step, so we only free 84K for qgroup reserved space. 596K 616K 700K | | | We need to free that range of reserved space. Qgroup reserved data space: 104K - 84K = 20K Now there is no way to release that 20K unless disabling qgroup or unmounting the fs. [FIX] This patch will change the timing of btrfs_qgroup_release/free_data() call. Here it uses buffered COW write as an example. The new timing | The old timing ----------------------------------------+--------------------------------------- btrfs_buffered_write() | btrfs_buffered_write() |- btrfs_qgroup_reserve_data() | |- btrfs_qgroup_reserve_data() | btrfs_run_delalloc_range() | btrfs_run_delalloc_range() |- btrfs_add_ordered_extent() | |- btrfs_qgroup_release_data() | The reserved is passed into | btrfs_ordered_extent structure | | btrfs_finish_ordered_io() | btrfs_finish_ordered_io() |- The reserved space is passed to | |- btrfs_qgroup_release_data() btrfs_qgroup_record | The resereved space is passed | to btrfs_qgroup_recrod | btrfs_qgroup_account_extents() | btrfs_qgroup_account_extents() |- btrfs_qgroup_free_refroot() | |- btrfs_qgroup_free_refroot() The point of such change is to ensure, when ordered extents are submitted, the qgroup reserved space is already released, to keep the timing aligned with file_write_and_wait_range(). So that qgroup data reserved space is all bound to btrfs_ordered_extent and solve the timing mismatch. Fixes: f695fdcef83a ("btrfs: qgroup: Introduce functions to release/free qgroup reserve data space") Suggested-by: Josef Bacik Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/inode.c | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 516192eccf52..cdd9872b66f8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2561,7 +2561,6 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, { struct btrfs_file_extent_item stack_fi; u64 logical_len; - int ret; memset(&stack_fi, 0, sizeof(stack_fi)); btrfs_set_stack_file_extent_type(&stack_fi, BTRFS_FILE_EXTENT_REG); @@ -2577,11 +2576,8 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, btrfs_set_stack_file_extent_compression(&stack_fi, oe->compress_type); /* Encryption and other encoding is reserved and all 0 */ - ret = btrfs_qgroup_release_data(inode, oe->file_offset, logical_len); - if (ret < 0) - return ret; return insert_reserved_file_extent(trans, inode, oe->file_offset, - &stack_fi, ret); + &stack_fi, oe->qgroup_rsv); } /* @@ -2636,13 +2632,6 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) if (test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags)) { BUG_ON(!list_empty(&ordered_extent->list)); /* Logic error */ - /* - * For mwrite(mmap + memset to write) case, we still reserve - * space for NOCOW range. - * As NOCOW won't cause a new delayed ref, just free the space - */ - btrfs_qgroup_free_data(inode, NULL, start, - ordered_extent->num_bytes); btrfs_inode_safe_disk_i_size_write(inode, 0); if (freespace_inode) trans = btrfs_join_transaction_spacecache(root); @@ -2679,8 +2668,6 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) compress_type = ordered_extent->compress_type; if (test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags)) { BUG_ON(compress_type); - btrfs_qgroup_free_data(inode, NULL, start, - ordered_extent->num_bytes); ret = btrfs_mark_extent_written(trans, BTRFS_I(inode), ordered_extent->file_offset, ordered_extent->file_offset + -- cgit From 43c69849ae78d83adc2a9ed077bc4c6353b09bc5 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:02 +0300 Subject: btrfs: make get_extent_allocation_hint take btrfs_inode It doesn't use the vfs inode for anything, can just as easily take btrfs_inode. Follow up patches will convert callers as well. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index cdd9872b66f8..690549dc4073 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -929,10 +929,10 @@ out_free: goto again; } -static u64 get_extent_allocation_hint(struct inode *inode, u64 start, +static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start, u64 num_bytes) { - struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree; + struct extent_map_tree *em_tree = &inode->extent_tree; struct extent_map *em; u64 alloc_hint = 0; @@ -1032,7 +1032,8 @@ static noinline int cow_file_range(struct inode *inode, } } - alloc_hint = get_extent_allocation_hint(inode, start, num_bytes); + alloc_hint = get_extent_allocation_hint(BTRFS_I(inode), start, + num_bytes); btrfs_drop_extent_cache(BTRFS_I(inode), start, start + num_bytes - 1, 0); @@ -6893,7 +6894,7 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode, u64 alloc_hint; int ret; - alloc_hint = get_extent_allocation_hint(inode, start, len); + alloc_hint = get_extent_allocation_hint(BTRFS_I(inode), start, len); ret = btrfs_reserve_extent(root, len, len, fs_info->sectorsize, 0, alloc_hint, &ins, 1, 1); if (ret) -- cgit From c3504372699bff6daeda207b4e30256c39f584c1 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:03 +0300 Subject: btrfs: make btrfs_lookup_ordered_extent take btrfs_inode It doesn't use the generic vfs inode for anything use btrfs_inode directly. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 690549dc4073..bd69367452c8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4558,7 +4558,7 @@ again: lock_extent_bits(io_tree, block_start, block_end, &cached_state); set_page_extent_mapped(page); - ordered = btrfs_lookup_ordered_extent(inode, block_start); + ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), block_start); if (ordered) { unlock_extent_cached(io_tree, block_start, block_end, &cached_state); -- cgit From 7bfa9535019b1ca0696d0a0590a3fd657224ae2f Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:04 +0300 Subject: btrfs: make btrfs_reloc_clone_csums take btrfs_inode It really wants btrfs_inode and not a vfs inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index bd69367452c8..ba7d2043fb96 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1085,7 +1085,7 @@ static noinline int cow_file_range(struct inode *inode, if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID) { - ret = btrfs_reloc_clone_csums(inode, start, + ret = btrfs_reloc_clone_csums(BTRFS_I(inode), start, cur_alloc_size); /* * Only drop cache here, and process as normal. @@ -1743,7 +1743,7 @@ out_check: * extent_clear_unlock_delalloc() in error handler * from freeing metadata of created ordered extent. */ - ret = btrfs_reloc_clone_csums(inode, cur_offset, + ret = btrfs_reloc_clone_csums(BTRFS_I(inode), cur_offset, num_bytes); extent_clear_unlock_delalloc(inode, cur_offset, -- cgit From 4b67c11dd19cd6443944d888b027017bb7872514 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:05 +0300 Subject: btrfs: make create_io_em take btrfs_inode It really wants a btrfs_inode and will allow submit_compressed_extents to be completely converted to btrfs_inode in follow up patches. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ba7d2043fb96..7015bf1a65c0 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -84,8 +84,8 @@ static noinline int cow_file_range(struct inode *inode, struct page *locked_page, u64 start, u64 end, int *page_started, unsigned long *nr_written, int unlock); -static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len, - u64 orig_start, u64 block_start, +static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start, + u64 len, u64 orig_start, u64 block_start, u64 block_len, u64 orig_block_len, u64 ram_bytes, int compress_type, int type); @@ -845,7 +845,7 @@ retry: * here we're doing allocation and writeback of the * compressed pages */ - em = create_io_em(inode, async_extent->start, + em = create_io_em(BTRFS_I(inode), async_extent->start, async_extent->ram_size, /* len */ async_extent->start, /* orig_start */ ins.objectid, /* block_start */ @@ -1064,7 +1064,7 @@ static noinline int cow_file_range(struct inode *inode, extent_reserved = true; ram_size = ins.offset; - em = create_io_em(inode, start, ins.offset, /* len */ + em = create_io_em(BTRFS_I(inode), start, ins.offset, /* len */ start, /* orig_start */ ins.objectid, /* block_start */ ins.offset, /* block_len */ @@ -1700,7 +1700,7 @@ out_check: u64 orig_start = found_key.offset - extent_offset; struct extent_map *em; - em = create_io_em(inode, cur_offset, num_bytes, + em = create_io_em(BTRFS_I(inode), cur_offset, num_bytes, orig_start, disk_bytenr, /* block_start */ num_bytes, /* block_len */ @@ -6861,7 +6861,7 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode, int ret; if (type != BTRFS_ORDERED_NOCOW) { - em = create_io_em(inode, start, len, orig_start, + em = create_io_em(BTRFS_I(inode), start, len, orig_start, block_start, block_len, orig_block_len, ram_bytes, BTRFS_COMPRESS_NONE, /* compress_type */ @@ -7140,8 +7140,8 @@ static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend, } /* The callers of this must take lock_extent() */ -static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len, - u64 orig_start, u64 block_start, +static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start, + u64 len, u64 orig_start, u64 block_start, u64 block_len, u64 orig_block_len, u64 ram_bytes, int compress_type, int type) @@ -7155,7 +7155,7 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len, type == BTRFS_ORDERED_NOCOW || type == BTRFS_ORDERED_REGULAR); - em_tree = &BTRFS_I(inode)->extent_tree; + em_tree = &inode->extent_tree; em = alloc_extent_map(); if (!em) return ERR_PTR(-ENOMEM); @@ -7177,8 +7177,8 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len, } do { - btrfs_drop_extent_cache(BTRFS_I(inode), em->start, - em->start + em->len - 1, 0); + btrfs_drop_extent_cache(inode, em->start, + em->start + em->len - 1, 0); write_lock(&em_tree->lock); ret = add_extent_mapping(em_tree, em, 1); write_unlock(&em_tree->lock); -- cgit From ad7ff17b65a0567b826c88009d3ea080431816c3 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:06 +0300 Subject: btrfs: make extent_clear_unlock_delalloc take btrfs_inode It has one VFS and 1 btrfs inode usages but converting it to btrfs_inode interface will allow seamless conversion of its callers. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7015bf1a65c0..cab56c37ec39 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -642,7 +642,8 @@ cont: * our outstanding extent for clearing delalloc for this * range. */ - extent_clear_unlock_delalloc(inode, start, end, NULL, + extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, + NULL, clear_flags, PAGE_UNLOCK | PAGE_CLEAR_DIRTY | @@ -878,7 +879,7 @@ retry: /* * clear dirty, set writeback and unlock the pages. */ - extent_clear_unlock_delalloc(inode, async_extent->start, + extent_clear_unlock_delalloc(BTRFS_I(inode), async_extent->start, async_extent->start + async_extent->ram_size - 1, NULL, EXTENT_LOCKED | EXTENT_DELALLOC, @@ -900,7 +901,7 @@ retry: btrfs_writepage_endio_finish_ordered(p, start, end, 0); p->mapping = NULL; - extent_clear_unlock_delalloc(inode, start, end, + extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, NULL, 0, PAGE_END_WRITEBACK | PAGE_SET_ERROR); @@ -915,7 +916,7 @@ out_free_reserve: btrfs_dec_block_group_reservations(fs_info, ins.objectid); btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1); out_free: - extent_clear_unlock_delalloc(inode, async_extent->start, + extent_clear_unlock_delalloc(BTRFS_I(inode), async_extent->start, async_extent->start + async_extent->ram_size - 1, NULL, EXTENT_LOCKED | EXTENT_DELALLOC | @@ -1017,7 +1018,8 @@ static noinline int cow_file_range(struct inode *inode, * our outstanding extent for clearing delalloc for this * range. */ - extent_clear_unlock_delalloc(inode, start, end, NULL, + extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, + NULL, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING, PAGE_UNLOCK | @@ -1115,7 +1117,7 @@ static noinline int cow_file_range(struct inode *inode, page_ops = unlock ? PAGE_UNLOCK : 0; page_ops |= PAGE_SET_PRIVATE2; - extent_clear_unlock_delalloc(inode, start, + extent_clear_unlock_delalloc(BTRFS_I(inode), start, start + ram_size - 1, locked_page, EXTENT_LOCKED | EXTENT_DELALLOC, @@ -1160,7 +1162,7 @@ out_unlock: * it the flag EXTENT_CLEAR_DATA_RESV. */ if (extent_reserved) { - extent_clear_unlock_delalloc(inode, start, + extent_clear_unlock_delalloc(BTRFS_I(inode), start, start + cur_alloc_size - 1, locked_page, clear_bits, @@ -1169,7 +1171,7 @@ out_unlock: if (start >= end) goto out; } - extent_clear_unlock_delalloc(inode, start, end, locked_page, + extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, locked_page, clear_bits | EXTENT_CLEAR_DATA_RESV, page_ops); goto out; @@ -1277,8 +1279,8 @@ static int cow_file_range_async(struct inode *inode, PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK | PAGE_SET_ERROR; - extent_clear_unlock_delalloc(inode, start, end, locked_page, - clear_bits, page_ops); + extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, + locked_page, clear_bits, page_ops); return -ENOMEM; } @@ -1468,7 +1470,8 @@ static noinline int run_delalloc_nocow(struct inode *inode, path = btrfs_alloc_path(); if (!path) { - extent_clear_unlock_delalloc(inode, start, end, locked_page, + extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, + locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, PAGE_UNLOCK | @@ -1746,7 +1749,7 @@ out_check: ret = btrfs_reloc_clone_csums(BTRFS_I(inode), cur_offset, num_bytes); - extent_clear_unlock_delalloc(inode, cur_offset, + extent_clear_unlock_delalloc(BTRFS_I(inode), cur_offset, cur_offset + num_bytes - 1, locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | @@ -1783,7 +1786,7 @@ error: btrfs_dec_nocow_writers(fs_info, disk_bytenr); if (ret && cur_offset < end) - extent_clear_unlock_delalloc(inode, cur_offset, end, + extent_clear_unlock_delalloc(BTRFS_I(inode), cur_offset, end, locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING, PAGE_UNLOCK | -- cgit From bd242a08a690e98d9e9eb7ab51580d4a86b76c6c Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:07 +0300 Subject: btrfs: make btrfs_csum_one_bio takae btrfs_inode Will enable converting btrfs_submit_compressed_write to btrfs_inode more easily. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index cab56c37ec39..d0af2027f898 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2167,7 +2167,7 @@ static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio *bio, struct inode *inode = private_data; blk_status_t ret = 0; - ret = btrfs_csum_one_bio(inode, bio, 0, 0); + ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); BUG_ON(ret); /* -ENOMEM */ return 0; } @@ -2232,7 +2232,7 @@ static blk_status_t btrfs_submit_bio_hook(struct inode *inode, struct bio *bio, 0, inode, btrfs_submit_bio_start); goto out; } else if (!skip_sum) { - ret = btrfs_csum_one_bio(inode, bio, 0, 0); + ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); if (ret) goto out; } @@ -7572,7 +7572,7 @@ static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data, { struct inode *inode = private_data; blk_status_t ret; - ret = btrfs_csum_one_bio(inode, bio, offset, 1); + ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, offset, 1); BUG_ON(ret); /* -ENOMEM */ return 0; } @@ -7633,7 +7633,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, * If we aren't doing async submit, calculate the csum of the * bio now. */ - ret = btrfs_csum_one_bio(inode, bio, file_offset, 1); + ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, 1); if (ret) goto err; } else { -- cgit From 906c448c3dc3189d83bf644ec453d49737371b00 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:08 +0300 Subject: btrfs: make __btrfs_drop_extents take btrfs_inode It has only 4 uses of a vfs_inode for inode_sub_bytes but unifies the interface with the non __ prefixed version. Will also makes converting its callers to btrfs_inode easier. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d0af2027f898..6f8c7d65c428 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -323,7 +323,7 @@ static noinline int cow_file_range_inline(struct inode *inode, u64 start, extent_item_size = btrfs_file_extent_calc_inline_size( inline_len); - ret = __btrfs_drop_extents(trans, root, inode, path, + ret = __btrfs_drop_extents(trans, root, BTRFS_I(inode), path, start, aligned_end, NULL, 1, 1, extent_item_size, &extent_inserted); if (ret) { @@ -2498,7 +2498,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, * the caller is expected to unpin it and allow it to be merged * with the others. */ - ret = __btrfs_drop_extents(trans, root, inode, path, file_pos, + ret = __btrfs_drop_extents(trans, root, BTRFS_I(inode), path, file_pos, file_pos + num_bytes, NULL, 0, 1, sizeof(*stack_fi), &extent_inserted); if (ret) -- cgit From bab16e21e8bbd644067289cfa328f8a67f3e333d Mon Sep 17 00:00:00 2001 From: David Sterba Date: Tue, 23 Jun 2020 20:56:12 +0200 Subject: btrfs: don't use UAPI types for fiemap callback The fiemap callback is not part of UAPI interface and the prototypes don't have the __u64 types either. Reviewed-by: Nikolay Borisov Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6f8c7d65c428..e5feefa20b28 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7946,7 +7946,7 @@ out: } static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, - __u64 start, __u64 len) + u64 start, u64 len) { int ret; -- cgit From 6d4572a9d71d5fc2affee0258d8582d39859188c Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 24 Jun 2020 07:23:50 +0800 Subject: btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation [BUG] When the data space is exhausted, even if the inode has NOCOW attribute, we will still refuse to truncate unaligned range due to ENOSPC. The following script can reproduce it pretty easily: #!/bin/bash dev=/dev/test/test mnt=/mnt/btrfs umount $dev &> /dev/null umount $mnt &> /dev/null mkfs.btrfs -f $dev -b 1G mount -o nospace_cache $dev $mnt touch $mnt/foobar chattr +C $mnt/foobar xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null sync xfs_io -c "fpunch 0 2k" $mnt/foobar umount $mnt Currently this will fail at the fpunch part. [CAUSE] Because btrfs_truncate_block() always reserves space without checking the NOCOW attribute. Since the writeback path follows NOCOW bit, we only need to bother the space reservation code in btrfs_truncate_block(). [FIX] Make btrfs_truncate_block() follow btrfs_buffered_write() to try to reserve data space first, and fall back to NOCOW check only when we don't have enough space. Such always-try-reserve is an optimization introduced in btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call. This patch will export check_can_nocow() as btrfs_check_can_nocow(), and use it in btrfs_truncate_block() to fix the problem. Reported-by: Martin Doucha Reviewed-by: Filipe Manana Reviewed-by: Anand Jain Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 44 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 37 insertions(+), 7 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e5feefa20b28..a0388b21c5cc 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4512,11 +4512,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, struct extent_state *cached_state = NULL; struct extent_changeset *data_reserved = NULL; char *kaddr; + bool only_release_metadata = false; u32 blocksize = fs_info->sectorsize; pgoff_t index = from >> PAGE_SHIFT; unsigned offset = from & (blocksize - 1); struct page *page; gfp_t mask = btrfs_alloc_write_mask(mapping); + size_t write_bytes = blocksize; int ret = 0; u64 block_start; u64 block_end; @@ -4528,11 +4530,27 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, block_start = round_down(from, blocksize); block_end = block_start + blocksize - 1; - ret = btrfs_delalloc_reserve_space(inode, &data_reserved, - block_start, blocksize); - if (ret) - goto out; + ret = btrfs_check_data_free_space(inode, &data_reserved, block_start, + blocksize); + if (ret < 0) { + if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | + BTRFS_INODE_PREALLOC)) && + btrfs_check_can_nocow(BTRFS_I(inode), block_start, + &write_bytes, false) > 0) { + /* For nocow case, no need to reserve data space */ + only_release_metadata = true; + } else { + goto out; + } + } + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize); + if (ret < 0) { + if (!only_release_metadata) + btrfs_free_reserved_data_space(inode, data_reserved, + block_start, blocksize); + goto out; + } again: page = find_or_create_page(mapping, index, mask); if (!page) { @@ -4601,14 +4619,26 @@ again: set_page_dirty(page); unlock_extent_cached(io_tree, block_start, block_end, &cached_state); + if (only_release_metadata) + set_extent_bit(&BTRFS_I(inode)->io_tree, block_start, + block_end, EXTENT_NORESERVE, NULL, NULL, + GFP_NOFS); + out_unlock: - if (ret) - btrfs_delalloc_release_space(inode, data_reserved, block_start, - blocksize, true); + if (ret) { + if (only_release_metadata) + btrfs_delalloc_release_metadata(BTRFS_I(inode), + blocksize, true); + else + btrfs_delalloc_release_space(inode, data_reserved, + block_start, blocksize, true); + } btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize); unlock_page(page); put_page(page); out: + if (only_release_metadata) + btrfs_drew_write_unlock(&BTRFS_I(inode)->root->snapshot_lock); extent_changeset_free(data_reserved); return ret; } -- cgit From e4ecaf90bc13e2a9c351d5cd86d4094844d7d7bd Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 24 Jun 2020 07:23:51 +0800 Subject: btrfs: add comments for btrfs_check_can_nocow() and can_nocow_extent() These two functions have extra conditions that their callers need to meet, and some not-that-common parameters used for return value. So adding some comments may save reviewers some time. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a0388b21c5cc..37c864e6b5bc 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6945,8 +6945,25 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode, } /* - * returns 1 when the nocow is safe, < 1 on error, 0 if the - * block must be cow'd + * Check if we can do nocow write into the range [@offset, @offset + @len) + * + * @offset: File offset + * @len: The length to write, will be updated to the nocow writeable + * range + * @orig_start: (optional) Return the original file offset of the file extent + * @orig_len: (optional) Return the original on-disk length of the file extent + * @ram_bytes: (optional) Return the ram_bytes of the file extent + * + * This function will flush ordered extents in the range to ensure proper + * nocow checks for (nowait == false) case. + * + * Return: + * >0 and update @len if we can do nocow write + * 0 if we can't do nocow write + * <0 if error happened + * + * NOTE: This only checks the file extents, caller is responsible to wait for + * any ordered extents. */ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len, u64 *orig_start, u64 *orig_block_len, -- cgit From 38d37aa9c32938214ca071fe02762f55b89937fd Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Wed, 24 Jun 2020 07:23:52 +0800 Subject: btrfs: refactor btrfs_check_can_nocow() into two variants The function btrfs_check_can_nocow() now has two completely different call patterns. For nowait variant, callers don't need to do any cleanup. While for wait variant, callers need to release the lock if they can do nocow write. This is somehow confusing, and is already a problem for the exported btrfs_check_can_nocow(). So this patch will separate the different patterns into different functions. For nowait variant, the function will be called check_nocow_nolock(). For wait variant, the function pair will be btrfs_check_nocow_lock() btrfs_check_nocow_unlock(). Reviewed-by: Anand Jain Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 37c864e6b5bc..bd51365e53fb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4534,10 +4534,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, ret = btrfs_check_data_free_space(inode, &data_reserved, block_start, blocksize); if (ret < 0) { - if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | - BTRFS_INODE_PREALLOC)) && - btrfs_check_can_nocow(BTRFS_I(inode), block_start, - &write_bytes, false) > 0) { + if (btrfs_check_nocow_lock(BTRFS_I(inode), block_start, + &write_bytes) > 0) { /* For nocow case, no need to reserve data space */ only_release_metadata = true; } else { @@ -4638,7 +4636,7 @@ out_unlock: put_page(page); out: if (only_release_metadata) - btrfs_drew_write_unlock(&BTRFS_I(inode)->root->snapshot_lock); + btrfs_check_nocow_unlock(BTRFS_I(inode)); extent_changeset_free(data_reserved); return ret; } -- cgit From 8b8a979f1fc69054f99abab80daeae89aba3f19b Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:11 +0300 Subject: btrfs: make btrfs_qgroup_free_data take btrfs_inode It passes btrfs_inode to its callee so change the interface. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index bd51365e53fb..ec3d13303ebc 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -354,7 +354,7 @@ out: * And at reserve time, it's always aligned to page size, so * just free one page here. */ - btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE); + btrfs_qgroup_free_data(BTRFS_I(inode), NULL, 0, PAGE_SIZE); btrfs_free_path(path); btrfs_end_transaction(trans); return ret; @@ -4994,7 +4994,8 @@ static void evict_inode_truncate_pages(struct inode *inode) * Note, end is the bytenr of last byte, so we need + 1 here. */ if (state_flags & EXTENT_DELALLOC) - btrfs_qgroup_free_data(inode, NULL, start, end - start + 1); + btrfs_qgroup_free_data(BTRFS_I(inode), NULL, start, + end - start + 1); clear_extent_bit(io_tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC | @@ -8178,7 +8179,7 @@ again: * bit of its io_tree, and free the qgroup reserved data space. * Since the IO will never happen for this page. */ - btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE); + btrfs_qgroup_free_data(BTRFS_I(inode), NULL, page_start, PAGE_SIZE); if (!inode_evicting) { clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | -- cgit From a0349401c14f507990bbe052406498fd527f7df7 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:12 +0300 Subject: btrfs: make cow_file_range_inline take btrfs_inode It has only 2 uses for the vfs_inode - insert_inline_extent and i_size_read. On the flipside it will allow converting its callers to btrfs_inode, so convert it to taking btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ec3d13303ebc..7a815f1d9e0d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -274,15 +274,15 @@ fail: * does the checks required to make sure the data is small enough * to fit as an inline extent. */ -static noinline int cow_file_range_inline(struct inode *inode, u64 start, +static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start, u64 end, size_t compressed_size, int compress_type, struct page **compressed_pages) { - struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_trans_handle *trans; - u64 isize = i_size_read(inode); + u64 isize = i_size_read(&inode->vfs_inode); u64 actual_end = min(end + 1, isize); u64 inline_len = actual_end - start; u64 aligned_end = ALIGN(end, fs_info->sectorsize); @@ -314,7 +314,7 @@ static noinline int cow_file_range_inline(struct inode *inode, u64 start, btrfs_free_path(path); return PTR_ERR(trans); } - trans->block_rsv = &BTRFS_I(inode)->block_rsv; + trans->block_rsv = &inode->block_rsv; if (compressed_size && compressed_pages) extent_item_size = btrfs_file_extent_calc_inline_size( @@ -323,9 +323,9 @@ static noinline int cow_file_range_inline(struct inode *inode, u64 start, extent_item_size = btrfs_file_extent_calc_inline_size( inline_len); - ret = __btrfs_drop_extents(trans, root, BTRFS_I(inode), path, - start, aligned_end, NULL, - 1, 1, extent_item_size, &extent_inserted); + ret = __btrfs_drop_extents(trans, root, inode, path, start, aligned_end, + NULL, 1, 1, extent_item_size, + &extent_inserted); if (ret) { btrfs_abort_transaction(trans, ret); goto out; @@ -334,7 +334,7 @@ static noinline int cow_file_range_inline(struct inode *inode, u64 start, if (isize > actual_end) inline_len = min_t(u64, isize, actual_end); ret = insert_inline_extent(trans, path, extent_inserted, - root, inode, start, + root, &inode->vfs_inode, start, inline_len, compressed_size, compress_type, compressed_pages); if (ret && ret != -ENOSPC) { @@ -345,8 +345,8 @@ static noinline int cow_file_range_inline(struct inode *inode, u64 start, goto out; } - set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags); - btrfs_drop_extent_cache(BTRFS_I(inode), start, aligned_end - 1, 0); + set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags); + btrfs_drop_extent_cache(inode, start, aligned_end - 1, 0); out: /* * Don't forget to free the reserved space, as for inlined extent @@ -354,7 +354,7 @@ out: * And at reserve time, it's always aligned to page size, so * just free one page here. */ - btrfs_qgroup_free_data(BTRFS_I(inode), NULL, 0, PAGE_SIZE); + btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE); btrfs_free_path(path); btrfs_end_transaction(trans); return ret; @@ -616,11 +616,12 @@ cont: /* we didn't compress the entire range, try * to make an uncompressed inline extent. */ - ret = cow_file_range_inline(inode, start, end, 0, - BTRFS_COMPRESS_NONE, NULL); + ret = cow_file_range_inline(BTRFS_I(inode), start, end, + 0, BTRFS_COMPRESS_NONE, + NULL); } else { /* try making a compressed inline extent */ - ret = cow_file_range_inline(inode, start, end, + ret = cow_file_range_inline(BTRFS_I(inode), start, end, total_compressed, compress_type, pages); } @@ -1009,7 +1010,7 @@ static noinline int cow_file_range(struct inode *inode, if (start == 0) { /* lets try to make an inline extent */ - ret = cow_file_range_inline(inode, start, end, 0, + ret = cow_file_range_inline(BTRFS_I(inode), start, end, 0, BTRFS_COMPRESS_NONE, NULL); if (ret == 0) { /* -- cgit From e7fbf60453a7eae2a36ac9096c84ccb1067dabdf Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:13 +0300 Subject: btrfs: make btrfs_add_ordered_extent take btrfs_inode Preparation to converting its callers to taking btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7a815f1d9e0d..2d053c33c380 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1081,8 +1081,9 @@ static noinline int cow_file_range(struct inode *inode, } free_extent_map(em); - ret = btrfs_add_ordered_extent(inode, start, ins.objectid, - ram_size, cur_alloc_size, 0); + ret = btrfs_add_ordered_extent(BTRFS_I(inode), start, + ins.objectid, ram_size, + cur_alloc_size, 0); if (ret) goto out_drop_extent_cache; @@ -1716,7 +1717,7 @@ out_check: goto error; } free_extent_map(em); - ret = btrfs_add_ordered_extent(inode, cur_offset, + ret = btrfs_add_ordered_extent(BTRFS_I(inode), cur_offset, disk_bytenr, num_bytes, num_bytes, BTRFS_ORDERED_PREALLOC); @@ -1728,7 +1729,7 @@ out_check: goto error; } } else { - ret = btrfs_add_ordered_extent(inode, cur_offset, + ret = btrfs_add_ordered_extent(BTRFS_I(inode), cur_offset, disk_bytenr, num_bytes, num_bytes, BTRFS_ORDERED_NOCOW); -- cgit From 6e26c442233b9e2ecdebdfb5b75fd114b15884df Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:14 +0300 Subject: btrfs: make cow_file_range take btrfs_inode All its children functions take btrfs_inode so convert it to taking btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 52 ++++++++++++++++++++++++---------------------------- 1 file changed, 24 insertions(+), 28 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2d053c33c380..f1b66901dc55 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -80,7 +80,7 @@ struct kmem_cache *btrfs_free_space_bitmap_cachep; static int btrfs_setsize(struct inode *inode, struct iattr *attr); static int btrfs_truncate(struct inode *inode, bool skip_writeback); static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent); -static noinline int cow_file_range(struct inode *inode, +static noinline int cow_file_range(struct btrfs_inode *inode, struct page *locked_page, u64 start, u64 end, int *page_started, unsigned long *nr_written, int unlock); @@ -789,7 +789,8 @@ retry: unsigned long nr_written = 0; /* allocate blocks */ - ret = cow_file_range(inode, async_chunk->locked_page, + ret = cow_file_range(BTRFS_I(inode), + async_chunk->locked_page, async_extent->start, async_extent->start + async_extent->ram_size - 1, @@ -976,13 +977,13 @@ static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start, * required to start IO on it. It may be clean and already done with * IO when we return. */ -static noinline int cow_file_range(struct inode *inode, +static noinline int cow_file_range(struct btrfs_inode *inode, struct page *locked_page, u64 start, u64 end, int *page_started, unsigned long *nr_written, int unlock) { - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_root *root = inode->root; + struct btrfs_fs_info *fs_info = root->fs_info; u64 alloc_hint = 0; u64 num_bytes; unsigned long ram_size; @@ -996,7 +997,7 @@ static noinline int cow_file_range(struct inode *inode, bool extent_reserved = false; int ret = 0; - if (btrfs_is_free_space_inode(BTRFS_I(inode))) { + if (btrfs_is_free_space_inode(inode)) { WARN_ON_ONCE(1); ret = -EINVAL; goto out_unlock; @@ -1006,11 +1007,11 @@ static noinline int cow_file_range(struct inode *inode, num_bytes = max(blocksize, num_bytes); ASSERT(num_bytes <= btrfs_super_total_bytes(fs_info->super_copy)); - inode_should_defrag(BTRFS_I(inode), start, end, num_bytes, SZ_64K); + inode_should_defrag(inode, start, end, num_bytes, SZ_64K); if (start == 0) { /* lets try to make an inline extent */ - ret = cow_file_range_inline(BTRFS_I(inode), start, end, 0, + ret = cow_file_range_inline(inode, start, end, 0, BTRFS_COMPRESS_NONE, NULL); if (ret == 0) { /* @@ -1019,8 +1020,7 @@ static noinline int cow_file_range(struct inode *inode, * our outstanding extent for clearing delalloc for this * range. */ - extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, - NULL, + extent_clear_unlock_delalloc(inode, start, end, NULL, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW | EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING, PAGE_UNLOCK | @@ -1035,10 +1035,8 @@ static noinline int cow_file_range(struct inode *inode, } } - alloc_hint = get_extent_allocation_hint(BTRFS_I(inode), start, - num_bytes); - btrfs_drop_extent_cache(BTRFS_I(inode), start, - start + num_bytes - 1, 0); + alloc_hint = get_extent_allocation_hint(inode, start, num_bytes); + btrfs_drop_extent_cache(inode, start, start + num_bytes - 1, 0); /* * Relocation relies on the relocated extents to have exactly the same @@ -1067,7 +1065,7 @@ static noinline int cow_file_range(struct inode *inode, extent_reserved = true; ram_size = ins.offset; - em = create_io_em(BTRFS_I(inode), start, ins.offset, /* len */ + em = create_io_em(inode, start, ins.offset, /* len */ start, /* orig_start */ ins.objectid, /* block_start */ ins.offset, /* block_len */ @@ -1081,15 +1079,14 @@ static noinline int cow_file_range(struct inode *inode, } free_extent_map(em); - ret = btrfs_add_ordered_extent(BTRFS_I(inode), start, - ins.objectid, ram_size, - cur_alloc_size, 0); + ret = btrfs_add_ordered_extent(inode, start, ins.objectid, + ram_size, cur_alloc_size, 0); if (ret) goto out_drop_extent_cache; if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID) { - ret = btrfs_reloc_clone_csums(BTRFS_I(inode), start, + ret = btrfs_reloc_clone_csums(inode, start, cur_alloc_size); /* * Only drop cache here, and process as normal. @@ -1103,7 +1100,7 @@ static noinline int cow_file_range(struct inode *inode, * skip current ordered extent. */ if (ret) - btrfs_drop_extent_cache(BTRFS_I(inode), start, + btrfs_drop_extent_cache(inode, start, start + ram_size - 1, 0); } @@ -1119,8 +1116,7 @@ static noinline int cow_file_range(struct inode *inode, page_ops = unlock ? PAGE_UNLOCK : 0; page_ops |= PAGE_SET_PRIVATE2; - extent_clear_unlock_delalloc(BTRFS_I(inode), start, - start + ram_size - 1, + extent_clear_unlock_delalloc(inode, start, start + ram_size - 1, locked_page, EXTENT_LOCKED | EXTENT_DELALLOC, page_ops); @@ -1144,7 +1140,7 @@ out: return ret; out_drop_extent_cache: - btrfs_drop_extent_cache(BTRFS_I(inode), start, start + ram_size - 1, 0); + btrfs_drop_extent_cache(inode, start, start + ram_size - 1, 0); out_reserve: btrfs_dec_block_group_reservations(fs_info, ins.objectid); btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1); @@ -1164,7 +1160,7 @@ out_unlock: * it the flag EXTENT_CLEAR_DATA_RESV. */ if (extent_reserved) { - extent_clear_unlock_delalloc(BTRFS_I(inode), start, + extent_clear_unlock_delalloc(inode, start, start + cur_alloc_size - 1, locked_page, clear_bits, @@ -1173,7 +1169,7 @@ out_unlock: if (start >= end) goto out; } - extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, locked_page, + extent_clear_unlock_delalloc(inode, start, end, locked_page, clear_bits | EXTENT_CLEAR_DATA_RESV, page_ops); goto out; @@ -1441,8 +1437,8 @@ static int fallback_to_cow(struct inode *inode, struct page *locked_page, 0, 0, NULL); } - return cow_file_range(inode, locked_page, start, end, page_started, - nr_written, 1); + return cow_file_range(BTRFS_I(inode), locked_page, start, end, + page_started, nr_written, 1); } /* @@ -1838,7 +1834,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, page_started, 0, nr_written); } else if (!inode_can_compress(inode) || !inode_need_compress(inode, start, end)) { - ret = cow_file_range(inode, locked_page, start, end, + ret = cow_file_range(BTRFS_I(inode), locked_page, start, end, page_started, nr_written, 1); } else { set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, -- cgit From 4cc612090ba5828fb301623fa8cdf0d7a165f91c Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:15 +0300 Subject: btrfs: make btrfs_add_ordered_extent_compress take btrfs_inode It simpy forwards its inode argument to __btrfs_add_ordered_extent which already takes btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f1b66901dc55..982fe4cca715 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -862,7 +862,7 @@ retry: goto out_free_reserve; free_extent_map(em); - ret = btrfs_add_ordered_extent_compress(inode, + ret = btrfs_add_ordered_extent_compress(BTRFS_I(inode), async_extent->start, ins.objectid, async_extent->ram_size, -- cgit From c7ee1819dc7169348eb93a088970ae143aa27435 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:16 +0300 Subject: btrfs: make btrfs_submit_compressed_write take btrfs_inode Majority of its uses are for btrfs_inode so take it as an argument directly. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 982fe4cca715..e019800eaab2 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -887,7 +887,7 @@ retry: NULL, EXTENT_LOCKED | EXTENT_DELALLOC, PAGE_UNLOCK | PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK); - if (btrfs_submit_compressed_write(inode, + if (btrfs_submit_compressed_write(BTRFS_I(inode), async_extent->start, async_extent->ram_size, ins.objectid, -- cgit From a0ff10dcc4a5185e9df2e9d5349f8b03cc909b23 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:17 +0300 Subject: btrfs: make submit_compressed_extents take btrfs_inode All but 3 uses require vfs_inode so convert the logic to have btrfs_inode be the main inode struct. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 34 +++++++++++++++------------------- 1 file changed, 15 insertions(+), 19 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e019800eaab2..7aeaa9fe18a6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -764,14 +764,14 @@ static void free_async_extent_pages(struct async_extent *async_extent) */ static noinline void submit_compressed_extents(struct async_chunk *async_chunk) { - struct inode *inode = async_chunk->inode; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_inode *inode = BTRFS_I(async_chunk->inode); + struct btrfs_fs_info *fs_info = inode->root->fs_info; struct async_extent *async_extent; u64 alloc_hint = 0; struct btrfs_key ins; struct extent_map *em; - struct btrfs_root *root = BTRFS_I(inode)->root; - struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; + struct btrfs_root *root = inode->root; + struct extent_io_tree *io_tree = &inode->io_tree; int ret = 0; again: @@ -789,8 +789,7 @@ retry: unsigned long nr_written = 0; /* allocate blocks */ - ret = cow_file_range(BTRFS_I(inode), - async_chunk->locked_page, + ret = cow_file_range(inode, async_chunk->locked_page, async_extent->start, async_extent->start + async_extent->ram_size - 1, @@ -805,7 +804,7 @@ retry: * all those pages down to the drive. */ if (!page_started && !ret) - extent_write_locked_range(inode, + extent_write_locked_range(&inode->vfs_inode, async_extent->start, async_extent->start + async_extent->ram_size - 1, @@ -835,7 +834,7 @@ retry: * will not submit these pages down to lower * layers. */ - extent_range_redirty_for_io(inode, + extent_range_redirty_for_io(&inode->vfs_inode, async_extent->start, async_extent->start + async_extent->ram_size - 1); @@ -848,7 +847,7 @@ retry: * here we're doing allocation and writeback of the * compressed pages */ - em = create_io_em(BTRFS_I(inode), async_extent->start, + em = create_io_em(inode, async_extent->start, async_extent->ram_size, /* len */ async_extent->start, /* orig_start */ ins.objectid, /* block_start */ @@ -862,7 +861,7 @@ retry: goto out_free_reserve; free_extent_map(em); - ret = btrfs_add_ordered_extent_compress(BTRFS_I(inode), + ret = btrfs_add_ordered_extent_compress(inode, async_extent->start, ins.objectid, async_extent->ram_size, @@ -870,8 +869,7 @@ retry: BTRFS_ORDERED_COMPRESSED, async_extent->compress_type); if (ret) { - btrfs_drop_extent_cache(BTRFS_I(inode), - async_extent->start, + btrfs_drop_extent_cache(inode, async_extent->start, async_extent->start + async_extent->ram_size - 1, 0); goto out_free_reserve; @@ -881,14 +879,13 @@ retry: /* * clear dirty, set writeback and unlock the pages. */ - extent_clear_unlock_delalloc(BTRFS_I(inode), async_extent->start, + extent_clear_unlock_delalloc(inode, async_extent->start, async_extent->start + async_extent->ram_size - 1, NULL, EXTENT_LOCKED | EXTENT_DELALLOC, PAGE_UNLOCK | PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK); - if (btrfs_submit_compressed_write(BTRFS_I(inode), - async_extent->start, + if (btrfs_submit_compressed_write(inode, async_extent->start, async_extent->ram_size, ins.objectid, ins.offset, async_extent->pages, @@ -899,12 +896,11 @@ retry: const u64 start = async_extent->start; const u64 end = start + async_extent->ram_size - 1; - p->mapping = inode->i_mapping; + p->mapping = inode->vfs_inode.i_mapping; btrfs_writepage_endio_finish_ordered(p, start, end, 0); p->mapping = NULL; - extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, - NULL, 0, + extent_clear_unlock_delalloc(inode, start, end, NULL, 0, PAGE_END_WRITEBACK | PAGE_SET_ERROR); free_async_extent_pages(async_extent); @@ -918,7 +914,7 @@ out_free_reserve: btrfs_dec_block_group_reservations(fs_info, ins.objectid); btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1); out_free: - extent_clear_unlock_delalloc(BTRFS_I(inode), async_extent->start, + extent_clear_unlock_delalloc(inode, async_extent->start, async_extent->start + async_extent->ram_size - 1, NULL, EXTENT_LOCKED | EXTENT_DELALLOC | -- cgit From 72b7d15bf1e1ee7b71da81e6c5d9afcd2e86c426 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:18 +0300 Subject: btrfs: make btrfs_qgroup_release_data take btrfs_inode It just forwards its argument to __btrfs_qgroup_release_data. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7aeaa9fe18a6..49d0d3d528b5 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9643,7 +9643,7 @@ static int insert_prealloc_file_extent(struct btrfs_trans_handle *trans, btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE); /* Encryption and other encoding is reserved and all 0 */ - ret = btrfs_qgroup_release_data(inode, file_offset, len); + ret = btrfs_qgroup_release_data(BTRFS_I(inode), file_offset, len); if (ret < 0) return ret; return insert_reserved_file_extent(trans, inode, file_offset, -- cgit From c553f94df4d1c5f37ec253b0ff40a2362af03fc1 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:19 +0300 Subject: btrfs: make insert_reserved_file_extent take btrfs_inode Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba c Signed-off-by: David Sterba --- fs/btrfs/inode.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 49d0d3d528b5..f3b711e39dbe 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2464,11 +2464,11 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end) } static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, - struct inode *inode, u64 file_pos, + struct btrfs_inode *inode, u64 file_pos, struct btrfs_file_extent_item *stack_fi, u64 qgroup_reserved) { - struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_root *root = inode->root; struct btrfs_path *path; struct extent_buffer *leaf; struct btrfs_key ins; @@ -2492,14 +2492,14 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, * the caller is expected to unpin it and allow it to be merged * with the others. */ - ret = __btrfs_drop_extents(trans, root, BTRFS_I(inode), path, file_pos, + ret = __btrfs_drop_extents(trans, root, inode, path, file_pos, file_pos + num_bytes, NULL, 0, 1, sizeof(*stack_fi), &extent_inserted); if (ret) goto out; if (!extent_inserted) { - ins.objectid = btrfs_ino(BTRFS_I(inode)); + ins.objectid = btrfs_ino(inode); ins.offset = file_pos; ins.type = BTRFS_EXTENT_DATA_KEY; @@ -2518,19 +2518,17 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, btrfs_mark_buffer_dirty(leaf); btrfs_release_path(path); - inode_add_bytes(inode, num_bytes); + inode_add_bytes(&inode->vfs_inode, num_bytes); ins.objectid = disk_bytenr; ins.offset = disk_num_bytes; ins.type = BTRFS_EXTENT_ITEM_KEY; - ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), file_pos, - ram_bytes); + ret = btrfs_inode_set_file_extent_range(inode, file_pos, ram_bytes); if (ret) goto out; - ret = btrfs_alloc_reserved_file_extent(trans, root, - btrfs_ino(BTRFS_I(inode)), + ret = btrfs_alloc_reserved_file_extent(trans, root, btrfs_ino(inode), file_pos, qgroup_reserved, &ins); out: btrfs_free_path(path); @@ -2574,7 +2572,7 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, btrfs_set_stack_file_extent_compression(&stack_fi, oe->compress_type); /* Encryption and other encoding is reserved and all 0 */ - return insert_reserved_file_extent(trans, inode, oe->file_offset, + return insert_reserved_file_extent(trans, BTRFS_I(inode), oe->file_offset, &stack_fi, oe->qgroup_rsv); } @@ -9646,7 +9644,7 @@ static int insert_prealloc_file_extent(struct btrfs_trans_handle *trans, ret = btrfs_qgroup_release_data(BTRFS_I(inode), file_offset, len); if (ret < 0) return ret; - return insert_reserved_file_extent(trans, inode, file_offset, + return insert_reserved_file_extent(trans, BTRFS_I(inode), file_offset, &stack_fi, ret); } static int __btrfs_prealloc_file_range(struct inode *inode, int mode, @@ -9707,8 +9705,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode, btrfs_dec_block_group_reservations(fs_info, ins.objectid); last_alloc = ins.offset; - ret = insert_prealloc_file_extent(trans, inode, &ins, - cur_offset); + ret = insert_prealloc_file_extent(trans, inode, &ins, cur_offset); if (ret) { btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 0); -- cgit From 8ba96f3dd6a0b2dac1a3c48ece76885aa5e40e66 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:20 +0300 Subject: btrfs: make fallback_to_cow take btrfs_inode It really wants btrfs_inode and is prepration to converting run_delalloc_nocow to taking btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f3b711e39dbe..329b5ad7fe59 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1370,15 +1370,15 @@ static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info, return 1; } -static int fallback_to_cow(struct inode *inode, struct page *locked_page, +static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page, const u64 start, const u64 end, int *page_started, unsigned long *nr_written) { - const bool is_space_ino = btrfs_is_free_space_inode(BTRFS_I(inode)); - const bool is_reloc_ino = (BTRFS_I(inode)->root->root_key.objectid == + const bool is_space_ino = btrfs_is_free_space_inode(inode); + const bool is_reloc_ino = (inode->root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID); const u64 range_bytes = end + 1 - start; - struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; + struct extent_io_tree *io_tree = &inode->io_tree; u64 range_start = start; u64 count; @@ -1418,7 +1418,7 @@ static int fallback_to_cow(struct inode *inode, struct page *locked_page, EXTENT_NORESERVE, 0); if (count > 0 || is_space_ino || is_reloc_ino) { u64 bytes = count; - struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; + struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_space_info *sinfo = fs_info->data_sinfo; if (is_space_ino || is_reloc_ino) @@ -1433,8 +1433,8 @@ static int fallback_to_cow(struct inode *inode, struct page *locked_page, 0, 0, NULL); } - return cow_file_range(BTRFS_I(inode), locked_page, start, end, - page_started, nr_written, 1); + return cow_file_range(inode, locked_page, start, end, page_started, + nr_written, 1); } /* @@ -1685,8 +1685,8 @@ out_check: * NOCOW, following one which needs to be COW'ed */ if (cow_start != (u64)-1) { - ret = fallback_to_cow(inode, locked_page, cow_start, - found_key.offset - 1, + ret = fallback_to_cow(BTRFS_I(inode), locked_page, + cow_start, found_key.offset - 1, page_started, nr_written); if (ret) goto error; @@ -1769,8 +1769,8 @@ out_check: if (cow_start != (u64)-1) { cur_offset = end; - ret = fallback_to_cow(inode, locked_page, cow_start, end, - page_started, nr_written); + ret = fallback_to_cow(BTRFS_I(inode), locked_page, cow_start, + end, page_started, nr_written); if (ret) goto error; } -- cgit From 968322c8c6d593fb15f26f73cb630c820c1a20f5 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:21 +0300 Subject: btrfs: make run_delalloc_nocow take btrfs_inode It only really uses btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 38 ++++++++++++++++++-------------------- 1 file changed, 18 insertions(+), 20 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 329b5ad7fe59..c0f8db1bca9c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1444,28 +1444,27 @@ static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page, * If no cow copies or snapshots exist, we write directly to the existing * blocks on disk */ -static noinline int run_delalloc_nocow(struct inode *inode, +static noinline int run_delalloc_nocow(struct btrfs_inode *inode, struct page *locked_page, const u64 start, const u64 end, int *page_started, int force, unsigned long *nr_written) { - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct btrfs_root *root = inode->root; struct btrfs_path *path; u64 cow_start = (u64)-1; u64 cur_offset = start; int ret; bool check_prev = true; - const bool freespace_inode = btrfs_is_free_space_inode(BTRFS_I(inode)); - u64 ino = btrfs_ino(BTRFS_I(inode)); + const bool freespace_inode = btrfs_is_free_space_inode(inode); + u64 ino = btrfs_ino(inode); bool nocow = false; u64 disk_bytenr = 0; path = btrfs_alloc_path(); if (!path) { - extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, - locked_page, + extent_clear_unlock_delalloc(inode, start, end, locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, PAGE_UNLOCK | @@ -1685,7 +1684,7 @@ out_check: * NOCOW, following one which needs to be COW'ed */ if (cow_start != (u64)-1) { - ret = fallback_to_cow(BTRFS_I(inode), locked_page, + ret = fallback_to_cow(inode, locked_page, cow_start, found_key.offset - 1, page_started, nr_written); if (ret) @@ -1697,7 +1696,7 @@ out_check: u64 orig_start = found_key.offset - extent_offset; struct extent_map *em; - em = create_io_em(BTRFS_I(inode), cur_offset, num_bytes, + em = create_io_em(inode, cur_offset, num_bytes, orig_start, disk_bytenr, /* block_start */ num_bytes, /* block_len */ @@ -1709,19 +1708,18 @@ out_check: goto error; } free_extent_map(em); - ret = btrfs_add_ordered_extent(BTRFS_I(inode), cur_offset, + ret = btrfs_add_ordered_extent(inode, cur_offset, disk_bytenr, num_bytes, num_bytes, BTRFS_ORDERED_PREALLOC); if (ret) { - btrfs_drop_extent_cache(BTRFS_I(inode), - cur_offset, + btrfs_drop_extent_cache(inode, cur_offset, cur_offset + num_bytes - 1, 0); goto error; } } else { - ret = btrfs_add_ordered_extent(BTRFS_I(inode), cur_offset, + ret = btrfs_add_ordered_extent(inode, cur_offset, disk_bytenr, num_bytes, num_bytes, BTRFS_ORDERED_NOCOW); @@ -1740,10 +1738,10 @@ out_check: * extent_clear_unlock_delalloc() in error handler * from freeing metadata of created ordered extent. */ - ret = btrfs_reloc_clone_csums(BTRFS_I(inode), cur_offset, + ret = btrfs_reloc_clone_csums(inode, cur_offset, num_bytes); - extent_clear_unlock_delalloc(BTRFS_I(inode), cur_offset, + extent_clear_unlock_delalloc(inode, cur_offset, cur_offset + num_bytes - 1, locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | @@ -1769,8 +1767,8 @@ out_check: if (cow_start != (u64)-1) { cur_offset = end; - ret = fallback_to_cow(BTRFS_I(inode), locked_page, cow_start, - end, page_started, nr_written); + ret = fallback_to_cow(inode, locked_page, cow_start, end, + page_started, nr_written); if (ret) goto error; } @@ -1780,7 +1778,7 @@ error: btrfs_dec_nocow_writers(fs_info, disk_bytenr); if (ret && cur_offset < end) - extent_clear_unlock_delalloc(BTRFS_I(inode), cur_offset, end, + extent_clear_unlock_delalloc(inode, cur_offset, end, locked_page, EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING, PAGE_UNLOCK | @@ -1823,10 +1821,10 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, int force_cow = need_force_cow(inode, start, end); if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) { - ret = run_delalloc_nocow(inode, locked_page, start, end, + ret = run_delalloc_nocow(BTRFS_I(inode), locked_page, start, end, page_started, 1, nr_written); } else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) { - ret = run_delalloc_nocow(inode, locked_page, start, end, + ret = run_delalloc_nocow(BTRFS_I(inode), locked_page, start, end, page_started, 0, nr_written); } else if (!inode_can_compress(inode) || !inode_need_compress(inode, start, end)) { -- cgit From 751b64318d4c6b09f6bb9be4313ed742ec2293f9 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:22 +0300 Subject: btrfs: make cow_file_range_async take btrfs_inode It only uses vfs inode for assigning it to the async_chunk function. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c0f8db1bca9c..94ba5248e201 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1233,13 +1233,13 @@ static noinline void async_cow_free(struct btrfs_work *work) kvfree(async_chunk->pending); } -static int cow_file_range_async(struct inode *inode, +static int cow_file_range_async(struct btrfs_inode *inode, struct writeback_control *wbc, struct page *locked_page, u64 start, u64 end, int *page_started, unsigned long *nr_written) { - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_fs_info *fs_info = inode->root->fs_info; struct cgroup_subsys_state *blkcg_css = wbc_blkcg_css(wbc); struct async_cow *ctx; struct async_chunk *async_chunk; @@ -1251,9 +1251,9 @@ static int cow_file_range_async(struct inode *inode, unsigned nofs_flag; const unsigned int write_flags = wbc_to_write_flags(wbc); - unlock_extent(&BTRFS_I(inode)->io_tree, start, end); + unlock_extent(&inode->io_tree, start, end); - if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS && + if (inode->flags & BTRFS_INODE_NOCOMPRESS && !btrfs_test_opt(fs_info, FORCE_COMPRESS)) { num_chunks = 1; should_compress = false; @@ -1273,8 +1273,8 @@ static int cow_file_range_async(struct inode *inode, PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK | PAGE_SET_ERROR; - extent_clear_unlock_delalloc(BTRFS_I(inode), start, end, - locked_page, clear_bits, page_ops); + extent_clear_unlock_delalloc(inode, start, end, locked_page, + clear_bits, page_ops); return -ENOMEM; } @@ -1291,9 +1291,9 @@ static int cow_file_range_async(struct inode *inode, * igrab is called higher up in the call chain, take only the * lightweight reference for the callback lifetime */ - ihold(inode); + ihold(&inode->vfs_inode); async_chunk[i].pending = &ctx->num_chunks; - async_chunk[i].inode = inode; + async_chunk[i].inode = &inode->vfs_inode; async_chunk[i].start = start; async_chunk[i].end = cur_end; async_chunk[i].write_flags = write_flags; @@ -1833,7 +1833,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, } else { set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &BTRFS_I(inode)->runtime_flags); - ret = cow_file_range_async(inode, wbc, locked_page, start, end, + ret = cow_file_range_async(BTRFS_I(inode), wbc, locked_page, start, end, page_started, nr_written); } if (ret) -- cgit From 7095821ee1f57d9d4b179463738776e0fcd28d38 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:23 +0300 Subject: btrfs: make btrfs_dec_test_first_ordered_pending take btrfs_inode It doesn't really need vfs_inode but btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 94ba5248e201..20bb3a4072fe 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7578,10 +7578,10 @@ static void __endio_write_update_ordered(struct inode *inode, while (ordered_offset < offset + bytes) { last_offset = ordered_offset; - if (btrfs_dec_test_first_ordered_pending(inode, &ordered, - &ordered_offset, - ordered_bytes, - uptodate)) { + if (btrfs_dec_test_first_ordered_pending(BTRFS_I(inode), &ordered, + &ordered_offset, + ordered_bytes, + uptodate)) { btrfs_init_work(&ordered->work, finish_ordered_fn, NULL, NULL); btrfs_queue_work(wq, &ordered->work); -- cgit From b672b5c1563094d890e8a9c8545bce36d98baf12 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:24 +0300 Subject: btrfs: make __endio_write_update_ordered take btrfs_inode It really wants btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 20bb3a4072fe..c718fdd57020 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -90,7 +90,7 @@ static struct extent_map *create_io_em(struct btrfs_inode *inode, u64 start, u64 ram_bytes, int compress_type, int type); -static void __endio_write_update_ordered(struct inode *inode, +static void __endio_write_update_ordered(struct btrfs_inode *inode, const u64 offset, const u64 bytes, const bool uptodate); @@ -134,7 +134,8 @@ static inline void btrfs_cleanup_ordered_extents(struct inode *inode, bytes -= PAGE_SIZE; } - return __endio_write_update_ordered(inode, offset, bytes, false); + return __endio_write_update_ordered(BTRFS_I(inode), offset, bytes, + false); } static int btrfs_dirty_inode(struct inode *inode); @@ -7474,7 +7475,8 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip) return; if (bio_op(dip->dio_bio) == REQ_OP_WRITE) { - __endio_write_update_ordered(dip->inode, dip->logical_offset, + __endio_write_update_ordered(BTRFS_I(dip->inode), + dip->logical_offset, dip->bytes, !dip->dio_bio->bi_status); } else { @@ -7560,25 +7562,25 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, return err; } -static void __endio_write_update_ordered(struct inode *inode, +static void __endio_write_update_ordered(struct btrfs_inode *inode, const u64 offset, const u64 bytes, const bool uptodate) { - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_ordered_extent *ordered = NULL; struct btrfs_workqueue *wq; u64 ordered_offset = offset; u64 ordered_bytes = bytes; u64 last_offset; - if (btrfs_is_free_space_inode(BTRFS_I(inode))) + if (btrfs_is_free_space_inode(inode)) wq = fs_info->endio_freespace_worker; else wq = fs_info->endio_write_workers; while (ordered_offset < offset + bytes) { last_offset = ordered_offset; - if (btrfs_dec_test_first_ordered_pending(BTRFS_I(inode), &ordered, + if (btrfs_dec_test_first_ordered_pending(inode, &ordered, &ordered_offset, ordered_bytes, uptodate)) { @@ -7961,7 +7963,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) */ if (dio_data.unsubmitted_oe_range_start < dio_data.unsubmitted_oe_range_end) - __endio_write_update_ordered(inode, + __endio_write_update_ordered(BTRFS_I(inode), dio_data.unsubmitted_oe_range_start, dio_data.unsubmitted_oe_range_end - dio_data.unsubmitted_oe_range_start, -- cgit From 64e1db566deb5fd5bd9ece981b609de7c540c12a Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:25 +0300 Subject: btrfs: make btrfs_cleanup_ordered_extents take btrfs_inode Preparation to converting btrfs_run_delalloc_range to using btrfs_inode without BTRFS_I() calls. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c718fdd57020..445085cd58be 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -104,7 +104,7 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode, * to be released, which we want to happen only when finishing the ordered * extent (btrfs_finish_ordered_io()). */ -static inline void btrfs_cleanup_ordered_extents(struct inode *inode, +static inline void btrfs_cleanup_ordered_extents(struct btrfs_inode *inode, struct page *locked_page, u64 offset, u64 bytes) { @@ -116,7 +116,7 @@ static inline void btrfs_cleanup_ordered_extents(struct inode *inode, struct page *page; while (index <= end_index) { - page = find_get_page(inode->i_mapping, index); + page = find_get_page(inode->vfs_inode.i_mapping, index); index++; if (!page) continue; @@ -134,8 +134,7 @@ static inline void btrfs_cleanup_ordered_extents(struct inode *inode, bytes -= PAGE_SIZE; } - return __endio_write_update_ordered(BTRFS_I(inode), offset, bytes, - false); + return __endio_write_update_ordered(inode, offset, bytes, false); } static int btrfs_dirty_inode(struct inode *inode); @@ -1838,7 +1837,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, page_started, nr_written); } if (ret) - btrfs_cleanup_ordered_extents(inode, locked_page, start, + btrfs_cleanup_ordered_extents(BTRFS_I(inode), locked_page, start, end - start + 1); return ret; } -- cgit From 99c88dc71cae52050892798e20c63d5d7053bb94 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:26 +0300 Subject: btrfs: make inode_can_compress take btrfs_inode Gets rid of superfluous BTRFS_I() calls. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 445085cd58be..0d3f0b3c1621 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -412,10 +412,10 @@ static noinline int add_async_extent(struct async_chunk *cow, /* * Check if the inode has flags compatible with compression */ -static inline bool inode_can_compress(struct inode *inode) +static inline bool inode_can_compress(struct btrfs_inode *inode) { - if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || - BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) + if (inode->flags & BTRFS_INODE_NODATACOW || + inode->flags & BTRFS_INODE_NODATASUM) return false; return true; } @@ -428,7 +428,7 @@ static inline int inode_need_compress(struct inode *inode, u64 start, u64 end) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - if (!inode_can_compress(inode)) { + if (!inode_can_compress(BTRFS_I(inode))) { WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR "BTRFS: unexpected compression for ino %llu\n", btrfs_ino(BTRFS_I(inode))); @@ -1826,7 +1826,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, } else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) { ret = run_delalloc_nocow(BTRFS_I(inode), locked_page, start, end, page_started, 0, nr_written); - } else if (!inode_can_compress(inode) || + } else if (!inode_can_compress(BTRFS_I(inode)) || !inode_need_compress(inode, start, end)) { ret = cow_file_range(BTRFS_I(inode), locked_page, start, end, page_started, nr_written, 1); -- cgit From 808a12923203ee1b992a278b6f8d4e53b69d1508 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:27 +0300 Subject: btrfs: make inode_need_compress take btrfs_inode Simply gets rid of superfluous BTRFS_I() calls. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0d3f0b3c1621..64ac763d4a03 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -424,29 +424,30 @@ static inline bool inode_can_compress(struct btrfs_inode *inode) * Check if the inode needs to be submitted to compression, based on mount * options, defragmentation, properties or heuristics. */ -static inline int inode_need_compress(struct inode *inode, u64 start, u64 end) +static inline int inode_need_compress(struct btrfs_inode *inode, u64 start, + u64 end) { - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_fs_info *fs_info = inode->root->fs_info; - if (!inode_can_compress(BTRFS_I(inode))) { + if (!inode_can_compress(inode)) { WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR "BTRFS: unexpected compression for ino %llu\n", - btrfs_ino(BTRFS_I(inode))); + btrfs_ino(inode)); return 0; } /* force compress */ if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) return 1; /* defrag ioctl */ - if (BTRFS_I(inode)->defrag_compress) + if (inode->defrag_compress) return 1; /* bad compression ratios */ - if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) + if (inode->flags & BTRFS_INODE_NOCOMPRESS) return 0; if (btrfs_test_opt(fs_info, COMPRESS) || - BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS || - BTRFS_I(inode)->prop_compress) - return btrfs_compress_heuristic(inode, start, end); + inode->flags & BTRFS_INODE_COMPRESS || + inode->prop_compress) + return btrfs_compress_heuristic(&inode->vfs_inode, start, end); return 0; } @@ -552,7 +553,7 @@ again: * inode has not been flagged as nocompress. This flag can * change at any time if we discover bad compression ratios. */ - if (inode_need_compress(inode, start, end)) { + if (inode_need_compress(BTRFS_I(inode), start, end)) { WARN_ON(pages); pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS); if (!pages) { @@ -1827,7 +1828,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, ret = run_delalloc_nocow(BTRFS_I(inode), locked_page, start, end, page_started, 0, nr_written); } else if (!inode_can_compress(BTRFS_I(inode)) || - !inode_need_compress(inode, start, end)) { + !inode_need_compress(BTRFS_I(inode), start, end)) { ret = cow_file_range(BTRFS_I(inode), locked_page, start, end, page_started, nr_written, 1); } else { -- cgit From 0c4942258cc1de265619b48a50a29d81f85b86eb Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:28 +0300 Subject: btrfs: make need_force_cow take btrfs_inode Gets rid of superfulous BTRFS_I() calls and prepare for converting btrfs_run_delalloc_range to using btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 64ac763d4a03..eaa8e71288c8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1790,11 +1790,11 @@ error: return ret; } -static inline int need_force_cow(struct inode *inode, u64 start, u64 end) +static inline int need_force_cow(struct btrfs_inode *inode, u64 start, u64 end) { - if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) && - !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) + if (!(inode->flags & BTRFS_INODE_NODATACOW) && + !(inode->flags & BTRFS_INODE_PREALLOC)) return 0; /* @@ -1802,9 +1802,8 @@ static inline int need_force_cow(struct inode *inode, u64 start, u64 end) * if is not zero, it means the file is defragging. * Force cow if given extent needs to be defragged. */ - if (BTRFS_I(inode)->defrag_bytes && - test_range_bit(&BTRFS_I(inode)->io_tree, start, end, - EXTENT_DEFRAG, 0, NULL)) + if (inode->defrag_bytes && + test_range_bit(&inode->io_tree, start, end, EXTENT_DEFRAG, 0, NULL)) return 1; return 0; @@ -1819,7 +1818,7 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, struct writeback_control *wbc) { int ret; - int force_cow = need_force_cow(inode, start, end); + int force_cow = need_force_cow(BTRFS_I(inode), start, end); if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) { ret = run_delalloc_nocow(BTRFS_I(inode), locked_page, start, end, -- cgit From 98456b9c46c140275b0fe81970ddfad250c68ed4 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:29 +0300 Subject: btrfs: make btrfs_run_delalloc_range take btrfs_inode All children now take btrfs_inode so convert it to taking it as a parameter as well. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index eaa8e71288c8..aa48452e395f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1813,31 +1813,30 @@ static inline int need_force_cow(struct btrfs_inode *inode, u64 start, u64 end) * Function to process delayed allocation (create CoW) for ranges which are * being touched for the first time. */ -int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page, +int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page, u64 start, u64 end, int *page_started, unsigned long *nr_written, struct writeback_control *wbc) { int ret; - int force_cow = need_force_cow(BTRFS_I(inode), start, end); + int force_cow = need_force_cow(inode, start, end); - if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) { - ret = run_delalloc_nocow(BTRFS_I(inode), locked_page, start, end, + if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) { + ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 1, nr_written); - } else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) { - ret = run_delalloc_nocow(BTRFS_I(inode), locked_page, start, end, + } else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) { + ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); - } else if (!inode_can_compress(BTRFS_I(inode)) || - !inode_need_compress(BTRFS_I(inode), start, end)) { - ret = cow_file_range(BTRFS_I(inode), locked_page, start, end, - page_started, nr_written, 1); + } else if (!inode_can_compress(inode) || + !inode_need_compress(inode, start, end)) { + ret = cow_file_range(inode, locked_page, start, end, + page_started, nr_written, 1); } else { - set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, - &BTRFS_I(inode)->runtime_flags); - ret = cow_file_range_async(BTRFS_I(inode), wbc, locked_page, start, end, + set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags); + ret = cow_file_range_async(inode, wbc, locked_page, start, end, page_started, nr_written); } if (ret) - btrfs_cleanup_ordered_extents(BTRFS_I(inode), locked_page, start, + btrfs_cleanup_ordered_extents(inode, locked_page, start, end - start + 1); return ret; } -- cgit From c1e095202caae87ac4a5c353691a492692d61857 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:30 +0300 Subject: btrfs: make btrfs_add_ordered_extent_dio take btrfs_inode Simply forwards its argument so let's get rid of one extra BTRFS_I by taking btrfs_inode directly. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index aa48452e395f..412693947484 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6890,7 +6890,7 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode, if (IS_ERR(em)) goto out; } - ret = btrfs_add_ordered_extent_dio(inode, start, block_start, + ret = btrfs_add_ordered_extent_dio(BTRFS_I(inode), start, block_start, len, block_len, type); if (ret) { if (em) { -- cgit From 64f54188ea4309a22eaf09df169ea5d0f0021640 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:31 +0300 Subject: btrfs: make btrfs_create_dio_extent take btrfs_inode Take btrfs_inode directly and stop using superfulous BTRFS_I calls. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 412693947484..7415a6486072 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6868,7 +6868,7 @@ out: return em; } -static struct extent_map *btrfs_create_dio_extent(struct inode *inode, +static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode, const u64 start, const u64 len, const u64 orig_start, @@ -6882,21 +6882,19 @@ static struct extent_map *btrfs_create_dio_extent(struct inode *inode, int ret; if (type != BTRFS_ORDERED_NOCOW) { - em = create_io_em(BTRFS_I(inode), start, len, orig_start, - block_start, block_len, orig_block_len, - ram_bytes, + em = create_io_em(inode, start, len, orig_start, block_start, + block_len, orig_block_len, ram_bytes, BTRFS_COMPRESS_NONE, /* compress_type */ type); if (IS_ERR(em)) goto out; } - ret = btrfs_add_ordered_extent_dio(BTRFS_I(inode), start, block_start, - len, block_len, type); + ret = btrfs_add_ordered_extent_dio(inode, start, block_start, len, + block_len, type); if (ret) { if (em) { free_extent_map(em); - btrfs_drop_extent_cache(BTRFS_I(inode), start, - start + len - 1, 0); + btrfs_drop_extent_cache(inode, start, start + len - 1, 0); } em = ERR_PTR(ret); } @@ -6921,7 +6919,7 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode, if (ret) return ERR_PTR(ret); - em = btrfs_create_dio_extent(inode, start, ins.offset, start, + em = btrfs_create_dio_extent(BTRFS_I(inode), start, ins.offset, start, ins.objectid, ins.offset, ins.offset, ins.offset, BTRFS_ORDERED_REGULAR); btrfs_dec_block_group_reservations(fs_info, ins.objectid); @@ -7295,7 +7293,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, btrfs_inc_nocow_writers(fs_info, block_start)) { struct extent_map *em2; - em2 = btrfs_create_dio_extent(inode, start, len, + em2 = btrfs_create_dio_extent(BTRFS_I(inode), start, len, orig_start, block_start, len, orig_block_len, ram_bytes, type); -- cgit From 9fc6f911a014fab57ed661b6918b7f730fa60c59 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:32 +0300 Subject: btrfs: make btrfs_new_extent_direct take btrfs_inode This function really needs a btrfs_inode and not a generic vfs one. Take it as a parameter and get rid of superfluous BTRFS_I() calls. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7415a6486072..5abb6d0a8cac 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6903,29 +6903,29 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode, return em; } -static struct extent_map *btrfs_new_extent_direct(struct inode *inode, +static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode, u64 start, u64 len) { - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_root *root = inode->root; + struct btrfs_fs_info *fs_info = root->fs_info; struct extent_map *em; struct btrfs_key ins; u64 alloc_hint; int ret; - alloc_hint = get_extent_allocation_hint(BTRFS_I(inode), start, len); + alloc_hint = get_extent_allocation_hint(inode, start, len); ret = btrfs_reserve_extent(root, len, len, fs_info->sectorsize, 0, alloc_hint, &ins, 1, 1); if (ret) return ERR_PTR(ret); - em = btrfs_create_dio_extent(BTRFS_I(inode), start, ins.offset, start, + em = btrfs_create_dio_extent(inode, start, ins.offset, start, ins.objectid, ins.offset, ins.offset, ins.offset, BTRFS_ORDERED_REGULAR); btrfs_dec_block_group_reservations(fs_info, ins.objectid); if (IS_ERR(em)) - btrfs_free_reserved_extent(fs_info, ins.objectid, - ins.offset, 1); + btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, + 1); return em; } @@ -7320,7 +7320,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, /* this will cow the extent */ len = bh_result->b_size; free_extent_map(em); - *map = em = btrfs_new_extent_direct(inode, start, len); + *map = em = btrfs_new_extent_direct(BTRFS_I(inode), start, len); if (IS_ERR(em)) { ret = PTR_ERR(em); goto out; -- cgit From c2566f22893c8b8cdf443505c043b2ca9f5023f6 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:35 +0300 Subject: btrfs: make btrfs_set_extent_delalloc take btrfs_inode Preparation to make btrfs_dirty_pages take btrfs_inode as parameter. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5abb6d0a8cac..0d711e525dbb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2260,13 +2260,13 @@ static noinline int add_pending_csums(struct btrfs_trans_handle *trans, return 0; } -int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, +int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end, unsigned int extra_bits, struct extent_state **cached_state) { WARN_ON(PAGE_ALIGNED(end)); - return set_extent_delalloc(&BTRFS_I(inode)->io_tree, start, end, - extra_bits, cached_state); + return set_extent_delalloc(&inode->io_tree, start, end, extra_bits, + cached_state); } /* see btrfs_writepage_start_hook for details on why this is required */ @@ -2363,7 +2363,7 @@ again: goto again; } - ret = btrfs_set_extent_delalloc(inode, page_start, page_end, 0, + ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, page_end, 0, &cached_state); if (ret) goto out_reserved; @@ -4581,7 +4581,7 @@ again: EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, &cached_state); - ret = btrfs_set_extent_delalloc(inode, block_start, block_end, 0, + ret = btrfs_set_extent_delalloc(BTRFS_I(inode), block_start, block_end, 0, &cached_state); if (ret) { unlock_extent_cached(io_tree, block_start, block_end, @@ -8296,7 +8296,7 @@ again: EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, &cached_state); - ret2 = btrfs_set_extent_delalloc(inode, page_start, end, 0, + ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0, &cached_state); if (ret2) { unlock_extent_cached(io_tree, page_start, page_end, -- cgit From 9db5d510ac5bfeaedd1e05e6afa300dda1ea7f4f Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:38 +0300 Subject: btrfs: make btrfs_free_reserved_data_space_noquota take btrfs_fs_info No point in taking an inode only to get btrfs_fs_info from it, instead take btrfs_fs_info directly. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0d711e525dbb..f700e3897937 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2080,9 +2080,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID && do_list && !(state->state & EXTENT_NORESERVE) && (*bits & EXTENT_CLEAR_DATA_RESV)) - btrfs_free_reserved_data_space_noquota( - &inode->vfs_inode, - len); + btrfs_free_reserved_data_space_noquota(fs_info, len); percpu_counter_add_batch(&fs_info->delalloc_bytes, -len, fs_info->delalloc_batch); @@ -7312,7 +7310,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, * use the existing or preallocated extent, so does not * need to adjust btrfs_space_info's bytes_may_use. */ - btrfs_free_reserved_data_space_noquota(inode, len); + btrfs_free_reserved_data_space_noquota(fs_info, len); goto skip_cow; } } -- cgit From 25ce28caaa1ddc2ef8848c5a09e63a9bc0a5d455 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:39 +0300 Subject: btrfs: make btrfs_free_reserved_data_space take btrfs_inode It only uses btrfs_inode internally so take it as a parameter. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f700e3897937..59e04bec4b35 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4532,8 +4532,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize); if (ret < 0) { if (!only_release_metadata) - btrfs_free_reserved_data_space(inode, data_reserved, - block_start, blocksize); + btrfs_free_reserved_data_space(BTRFS_I(inode), + data_reserved, block_start, blocksize); goto out; } again: @@ -9772,7 +9772,7 @@ next: btrfs_end_transaction(trans); } if (clear_offset < end) - btrfs_free_reserved_data_space(inode, NULL, clear_offset, + btrfs_free_reserved_data_space(BTRFS_I(inode), NULL, clear_offset, end - clear_offset + 1); return ret; } -- cgit From 86d52921a2ba51a78e5bfb71c75aedcbd9e61a5c Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:40 +0300 Subject: btrfs: make btrfs_delalloc_release_space take btrfs_inode It needs btrfs_inode so take it as a parameter directly. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 59e04bec4b35..5a1dfbb45734 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2328,7 +2328,8 @@ again: if (!ret) { btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); - btrfs_delalloc_release_space(inode, data_reserved, + btrfs_delalloc_release_space(BTRFS_I(inode), + data_reserved, page_start, PAGE_SIZE, true); } @@ -2378,8 +2379,8 @@ again: out_reserved: btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); if (free_delalloc_space) - btrfs_delalloc_release_space(inode, data_reserved, page_start, - PAGE_SIZE, true); + btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, + page_start, PAGE_SIZE, true); unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, page_end, &cached_state); out_page: @@ -4539,7 +4540,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, again: page = find_or_create_page(mapping, index, mask); if (!page) { - btrfs_delalloc_release_space(inode, data_reserved, + btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, block_start, blocksize, true); btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize); ret = -ENOMEM; @@ -4615,7 +4616,7 @@ out_unlock: btrfs_delalloc_release_metadata(BTRFS_I(inode), blocksize, true); else - btrfs_delalloc_release_space(inode, data_reserved, + btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, block_start, blocksize, true); } btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize); @@ -7947,8 +7948,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) current->journal_info = NULL; if (ret < 0 && ret != -EIOCBQUEUED) { if (dio_data.reserve) - btrfs_delalloc_release_space(inode, data_reserved, - offset, dio_data.reserve, true); + btrfs_delalloc_release_space(BTRFS_I(inode), + data_reserved, offset, dio_data.reserve, + true); /* * On error we might have left some ordered extents * without submitting corresponding bios for them, so @@ -7963,7 +7965,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) dio_data.unsubmitted_oe_range_start, false); } else if (ret >= 0 && (size_t)ret < count) - btrfs_delalloc_release_space(inode, data_reserved, + btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, offset, count - (size_t)ret, true); btrfs_delalloc_release_extents(BTRFS_I(inode), count); } @@ -8277,9 +8279,9 @@ again: fs_info->sectorsize); if (reserved_space < PAGE_SIZE) { end = page_start + reserved_space - 1; - btrfs_delalloc_release_space(inode, data_reserved, - page_start, PAGE_SIZE - reserved_space, - true); + btrfs_delalloc_release_space(BTRFS_I(inode), + data_reserved, page_start, + PAGE_SIZE - reserved_space, true); } } @@ -8334,7 +8336,7 @@ out_unlock: unlock_page(page); out: btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); - btrfs_delalloc_release_space(inode, data_reserved, page_start, + btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start, reserved_space, (ret != 0)); out_noreserve: sb_end_pagefault(inode->i_sb); -- cgit From 36ea6f3e931391c2adbb38af8c5dd4a043d26ac5 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:41 +0300 Subject: btrfs: make btrfs_check_data_free_space take btrfs_inode Instead of calling BTRFS_I on the passed vfs_inode take btrfs_inode directly. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5a1dfbb45734..1cc2c68206a7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4519,8 +4519,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, block_end = block_start + blocksize - 1; - ret = btrfs_check_data_free_space(inode, &data_reserved, block_start, - blocksize); + ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, + block_start, blocksize); if (ret < 0) { if (btrfs_check_nocow_lock(BTRFS_I(inode), block_start, &write_bytes) > 0) { -- cgit From e5b7231e2009967b63a39ce0ec6a022307616b82 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:42 +0300 Subject: btrfs: make btrfs_delalloc_reserve_space take btrfs_inode All of its children take btrfs_inode so bubble up this requirement to btrfs_delalloc_reserve_space's interface and stop calling BTRFS_I internally. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1cc2c68206a7..cd092a7a69cc 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2297,8 +2297,8 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) * This is similar to page_mkwrite, we need to reserve the space before * we take the page lock. */ - ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start, - PAGE_SIZE); + ret = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved, + page_start, PAGE_SIZE); again: lock_page(page); @@ -4518,7 +4518,6 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, block_start = round_down(from, blocksize); block_end = block_start + blocksize - 1; - ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, block_start, blocksize); if (ret < 0) { @@ -7916,7 +7915,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) inode_unlock(inode); relock = true; } - ret = btrfs_delalloc_reserve_space(inode, &data_reserved, + ret = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved, offset, count); if (ret) goto out; @@ -8231,8 +8230,8 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) * end up waiting indefinitely to get a lock on the page currently * being processed by btrfs_page_mkwrite() function. */ - ret2 = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start, - reserved_space); + ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved, + page_start, reserved_space); if (!ret2) { ret2 = file_update_time(vmf->vma->vm_file); reserved = 1; -- cgit From 65d87f7918ef11a4040f87393cba7b8dff0d9fc8 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Fri, 5 Jun 2020 10:51:51 +0300 Subject: btrfs: remove BTRFS_I calls in btrfs_writepage_fixup_worker All of its children functions use btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 38 +++++++++++++++++--------------------- 1 file changed, 17 insertions(+), 21 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index cd092a7a69cc..0a038834bf71 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2281,7 +2281,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) struct extent_state *cached_state = NULL; struct extent_changeset *data_reserved = NULL; struct page *page; - struct inode *inode; + struct btrfs_inode *inode; u64 page_start; u64 page_end; int ret = 0; @@ -2289,7 +2289,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) fixup = container_of(work, struct btrfs_writepage_fixup, work); page = fixup->page; - inode = fixup->inode; + inode = BTRFS_I(fixup->inode); page_start = page_offset(page); page_end = page_offset(page) + PAGE_SIZE - 1; @@ -2297,8 +2297,8 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) * This is similar to page_mkwrite, we need to reserve the space before * we take the page lock. */ - ret = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved, - page_start, PAGE_SIZE); + ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start, + PAGE_SIZE); again: lock_page(page); @@ -2326,10 +2326,8 @@ again: * when the page was already properly dealt with. */ if (!ret) { - btrfs_delalloc_release_extents(BTRFS_I(inode), - PAGE_SIZE); - btrfs_delalloc_release_space(BTRFS_I(inode), - data_reserved, + btrfs_delalloc_release_extents(inode, PAGE_SIZE); + btrfs_delalloc_release_space(inode, data_reserved, page_start, PAGE_SIZE, true); } @@ -2344,25 +2342,23 @@ again: if (ret) goto out_page; - lock_extent_bits(&BTRFS_I(inode)->io_tree, page_start, page_end, - &cached_state); + lock_extent_bits(&inode->io_tree, page_start, page_end, &cached_state); /* already ordered? We're done */ if (PagePrivate2(page)) goto out_reserved; - ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start, - PAGE_SIZE); + ordered = btrfs_lookup_ordered_range(inode, page_start, PAGE_SIZE); if (ordered) { - unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, - page_end, &cached_state); + unlock_extent_cached(&inode->io_tree, page_start, page_end, + &cached_state); unlock_page(page); - btrfs_start_ordered_extent(inode, ordered, 1); + btrfs_start_ordered_extent(&inode->vfs_inode, ordered, 1); btrfs_put_ordered_extent(ordered); goto again; } - ret = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, page_end, 0, + ret = btrfs_set_extent_delalloc(inode, page_start, page_end, 0, &cached_state); if (ret) goto out_reserved; @@ -2377,11 +2373,11 @@ again: BUG_ON(!PageDirty(page)); free_delalloc_space = false; out_reserved: - btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE); + btrfs_delalloc_release_extents(inode, PAGE_SIZE); if (free_delalloc_space) - btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, - page_start, PAGE_SIZE, true); - unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, page_end, + btrfs_delalloc_release_space(inode, data_reserved, page_start, + PAGE_SIZE, true); + unlock_extent_cached(&inode->io_tree, page_start, page_end, &cached_state); out_page: if (ret) { @@ -2404,7 +2400,7 @@ out_page: * that could need flushing space. Recursing back to fixup worker would * deadlock. */ - btrfs_add_delayed_iput(inode); + btrfs_add_delayed_iput(&inode->vfs_inode); } /* -- cgit From d90944141b4a7704fac7f2b8fcc566511176e4b4 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Fri, 5 Jun 2020 10:41:13 +0300 Subject: btrfs: make btrfs_set_inode_last_trans take btrfs_inode Instead of making multiple calls to BTRFS_I simply take btrfs_inode as an input paramter. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0a038834bf71..07dd8aa4f708 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3481,7 +3481,7 @@ static noinline int btrfs_update_inode_item(struct btrfs_trans_handle *trans, fill_inode_item(trans, leaf, inode_item, inode); btrfs_mark_buffer_dirty(leaf); - btrfs_set_inode_last_trans(trans, inode); + btrfs_set_inode_last_trans(trans, BTRFS_I(inode)); ret = 0; failed: btrfs_free_path(path); @@ -3511,7 +3511,7 @@ noinline int btrfs_update_inode(struct btrfs_trans_handle *trans, ret = btrfs_delayed_update_inode(trans, root, inode); if (!ret) - btrfs_set_inode_last_trans(trans, inode); + btrfs_set_inode_last_trans(trans, BTRFS_I(inode)); return ret; } @@ -6053,7 +6053,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, inode_tree_add(inode); trace_btrfs_inode_new(inode); - btrfs_set_inode_last_trans(trans, inode); + btrfs_set_inode_last_trans(trans, BTRFS_I(inode)); btrfs_update_root_times(trans, root); -- cgit From cfdd45921571eb24073e0737fa0bd44b4218f914 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Wed, 3 Jun 2020 08:55:46 +0300 Subject: btrfs: make btrfs_qgroup_check_reserved_leak take btrfs_inode vfs_inode is used only for the inode number everything else requires btrfs_inode. Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba [ use btrfs_ino ] Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 07dd8aa4f708..5d2ce8092531 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8631,7 +8631,7 @@ void btrfs_destroy_inode(struct inode *inode) btrfs_put_ordered_extent(ordered); } } - btrfs_qgroup_check_reserved_leak(inode); + btrfs_qgroup_check_reserved_leak(BTRFS_I(inode)); inode_tree_del(inode); btrfs_drop_extent_cache(BTRFS_I(inode), 0, (u64)-1, 0); btrfs_inode_clear_file_extent_range(BTRFS_I(inode), 0, (u64)-1); -- cgit From 082b6c970f02fefd278c7833880cda29691a5f34 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Tue, 16 Jun 2020 10:17:37 +0800 Subject: btrfs: free anon block device right after subvolume deletion [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted caused by slow anonymous block device reclaim: BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] The anonymous device pool is shared and its size is 1M. It's possible to hit that limit if the subvolume deletion is not fast enough and the subvolumes to be cleaned keep the ids allocated. [WORKAROUND] We can't avoid the anon device pool exhaustion but we can shorten the time the id is attached to the subvolume root once the subvolume becomes invisible to the user. Reported-by: Greed Rong Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5d2ce8092531..f066cad2d039 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4026,6 +4026,8 @@ int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) } } + free_anon_bdev(dest->anon_dev); + dest->anon_dev = 0; out_end_trans: trans->block_rsv = NULL; trans->bytes_reserved = 0; -- cgit From 814723e0a55a9576b1d17f4fa8811086a24dd3e8 Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Thu, 2 Jul 2020 15:23:32 +0300 Subject: btrfs: increment device corruption error in case of checksum error Now that btrfs_io_bio have access to btrfs_device we can safely increment the device corruption counter on error. There is one notable exception - repair bios for raid. Since those don't go through the normal submit_stripe_bio callpath but through raid56_parity_recover thus repair bios won't have their device set. Scrub increments the corruption counter for checksum mismatch as well but does not call this function. Link: https://lore.kernel.org/linux-btrfs/4857863.FCrPRfMyHP@liv/ Reviewed-by: Josef Bacik Signed-off-by: Nikolay Borisov Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f066cad2d039..0fa4f7007ff9 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2815,6 +2815,9 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, zeroit: btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected, io_bio->mirror_num); + if (io_bio->device) + btrfs_dev_stat_inc_and_print(io_bio->device, + BTRFS_DEV_STAT_CORRUPTION_ERRS); memset(kaddr + pgoff, 1, len); flush_dcache_page(page); kunmap_atomic(kaddr); -- cgit From 3ebac17ce593490bff48d8eb0b4b97b97d8609fa Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Wed, 15 Jul 2020 12:30:43 +0100 Subject: btrfs: reduce contention on log trees when logging checksums The possibility of extents being shared (through clone and deduplication operations) requires special care when logging data checksums, to avoid having a log tree with different checksum items that cover ranges which overlap (which resulted in missing checksums after replaying a log tree). Such problems were fixed in the past by the following commits: commit 40e046acbd2f ("Btrfs: fix missing data checksums after replaying a log tree") commit e289f03ea79b ("btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents") Test case generic/588 exercises the scenario solved by the first commit (purely sequential and deterministic) while test case generic/457 often triggered the case fixed by the second commit (not deterministic, requires specific timings under concurrency). The problems were addressed by deleting, from the log tree, any existing checksums before logging the new ones. And also by doing the deletion and logging of the cheksums while locking the checksum range in an extent io tree (root->log_csum_range), to deal with the case where we have concurrent fsyncs against files with shared extents. That however causes more contention on the leaves of a log tree where we store checksums (and all the nodes in the paths leading to them), even when we do not have shared extents, or all the shared extents were created by past transactions. It also adds a bit of contention on the spin lock of the log_csums_range extent io tree of the log root. This change adds a 'last_reflink_trans' field to the inode to keep track of the last transaction where a new extent was shared between inodes (through clone and deduplication operations). It is updated for both the source and destination inodes of reflink operations whenever a new extent (created in the current transaction) becomes shared by the inodes. This field is kept in memory only, not persisted in the inode item, similar to other existing fields (last_unlink_trans, logged_trans). When logging checksums for an extent, if the value of 'last_reflink_trans' is smaller then the current transaction's generation/id, we skip locking the extent range and deletion of checksums from the log tree, since we know we do not have new shared extents. This reduces contention on the log tree's leaves where checksums are stored. The following script, which uses fio, was used to measure the impact of this change: $ cat test-fsync.sh #!/bin/bash DEV=/dev/sdk MNT=/mnt/sdk MOUNT_OPTIONS="-o ssd" MKFS_OPTIONS="-d single -m single" if [ $# -ne 3 ]; then echo "Use $0 NUM_JOBS FILE_SIZE FSYNC_FREQ" exit 1 fi NUM_JOBS=$1 FILE_SIZE=$2 FSYNC_FREQ=$3 cat < /tmp/fio-job.ini [writers] rw=write fsync=$FSYNC_FREQ fallocate=none group_reporting=1 direct=0 bs=64k ioengine=sync size=$FILE_SIZE directory=$MNT numjobs=$NUM_JOBS EOF echo "Using config:" echo cat /tmp/fio-job.ini echo mkfs.btrfs -f $MKFS_OPTIONS $DEV mount $MOUNT_OPTIONS $DEV $MNT fio /tmp/fio-job.ini umount $MNT The tests were performed for different numbers of jobs, file sizes and fsync frequency. A qemu VM using kvm was used, with 8 cores (the host has 12 cores, with cpu governance set to performance mode on all cores), 16GiB of ram (the host has 64GiB) and using a NVMe device directly (without an intermediary filesystem in the host). While running the tests, the host was not used for anything else, to avoid disturbing the tests. The obtained results were the following (the last line of fio's output was pasted). Starting with 16 jobs is where a significant difference is observable in this particular setup and hardware (differences highlighted below). The very small differences for tests with less than 16 jobs are possibly just noise and random. **** 1 job, file size 1G, fsync frequency 1 **** before this change: WRITE: bw=23.8MiB/s (24.9MB/s), 23.8MiB/s-23.8MiB/s (24.9MB/s-24.9MB/s), io=1024MiB (1074MB), run=43075-43075msec after this change: WRITE: bw=24.4MiB/s (25.6MB/s), 24.4MiB/s-24.4MiB/s (25.6MB/s-25.6MB/s), io=1024MiB (1074MB), run=41938-41938msec **** 2 jobs, file size 1G, fsync frequency 1 **** before this change: WRITE: bw=37.7MiB/s (39.5MB/s), 37.7MiB/s-37.7MiB/s (39.5MB/s-39.5MB/s), io=2048MiB (2147MB), run=54351-54351msec after this change: WRITE: bw=37.7MiB/s (39.5MB/s), 37.6MiB/s-37.6MiB/s (39.5MB/s-39.5MB/s), io=2048MiB (2147MB), run=54428-54428msec **** 4 jobs, file size 1G, fsync frequency 1 **** before this change: WRITE: bw=67.5MiB/s (70.8MB/s), 67.5MiB/s-67.5MiB/s (70.8MB/s-70.8MB/s), io=4096MiB (4295MB), run=60669-60669msec after this change: WRITE: bw=68.6MiB/s (71.0MB/s), 68.6MiB/s-68.6MiB/s (71.0MB/s-71.0MB/s), io=4096MiB (4295MB), run=59678-59678msec **** 8 jobs, file size 1G, fsync frequency 1 **** before this change: WRITE: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s), io=8192MiB (8590MB), run=64048-64048msec after this change: WRITE: bw=129MiB/s (135MB/s), 129MiB/s-129MiB/s (135MB/s-135MB/s), io=8192MiB (8590MB), run=63405-63405msec **** 16 jobs, file size 1G, fsync frequency 1 **** before this change: WRITE: bw=78.5MiB/s (82.3MB/s), 78.5MiB/s-78.5MiB/s (82.3MB/s-82.3MB/s), io=16.0GiB (17.2GB), run=208676-208676msec after this change: WRITE: bw=110MiB/s (115MB/s), 110MiB/s-110MiB/s (115MB/s-115MB/s), io=16.0GiB (17.2GB), run=149295-149295msec (+40.1% throughput, -28.5% runtime) **** 32 jobs, file size 1G, fsync frequency 1 **** before this change: WRITE: bw=58.8MiB/s (61.7MB/s), 58.8MiB/s-58.8MiB/s (61.7MB/s-61.7MB/s), io=32.0GiB (34.4GB), run=557134-557134msec after this change: WRITE: bw=76.1MiB/s (79.8MB/s), 76.1MiB/s-76.1MiB/s (79.8MB/s-79.8MB/s), io=32.0GiB (34.4GB), run=430550-430550msec (+29.4% throughput, -22.7% runtime) **** 64 jobs, file size 512M, fsync frequency 1 **** before this change: WRITE: bw=65.8MiB/s (68.0MB/s), 65.8MiB/s-65.8MiB/s (68.0MB/s-68.0MB/s), io=32.0GiB (34.4GB), run=498055-498055msec after this change: WRITE: bw=85.1MiB/s (89.2MB/s), 85.1MiB/s-85.1MiB/s (89.2MB/s-89.2MB/s), io=32.0GiB (34.4GB), run=385116-385116msec (+29.3% throughput, -22.7% runtime) **** 128 jobs, file size 256M, fsync frequency 1 **** before this change: WRITE: bw=54.7MiB/s (57.3MB/s), 54.7MiB/s-54.7MiB/s (57.3MB/s-57.3MB/s), io=32.0GiB (34.4GB), run=599373-599373msec after this change: WRITE: bw=121MiB/s (126MB/s), 121MiB/s-121MiB/s (126MB/s-126MB/s), io=32.0GiB (34.4GB), run=271907-271907msec (+121.2% throughput, -54.6% runtime) **** 256 jobs, file size 256M, fsync frequency 1 **** before this change: WRITE: bw=69.2MiB/s (72.5MB/s), 69.2MiB/s-69.2MiB/s (72.5MB/s-72.5MB/s), io=64.0GiB (68.7GB), run=947536-947536msec after this change: WRITE: bw=121MiB/s (127MB/s), 121MiB/s-121MiB/s (127MB/s-127MB/s), io=64.0GiB (68.7GB), run=541916-541916msec (+74.9% throughput, -42.8% runtime) **** 512 jobs, file size 128M, fsync frequency 1 **** before this change: WRITE: bw=85.4MiB/s (89.5MB/s), 85.4MiB/s-85.4MiB/s (89.5MB/s-89.5MB/s), io=64.0GiB (68.7GB), run=767734-767734msec after this change: WRITE: bw=141MiB/s (147MB/s), 141MiB/s-141MiB/s (147MB/s-147MB/s), io=64.0GiB (68.7GB), run=466022-466022msec (+65.1% throughput, -39.3% runtime) **** 1024 jobs, file size 128M, fsync frequency 1 **** before this change: WRITE: bw=115MiB/s (120MB/s), 115MiB/s-115MiB/s (120MB/s-120MB/s), io=128GiB (137GB), run=1143775-1143775msec after this change: WRITE: bw=171MiB/s (180MB/s), 171MiB/s-171MiB/s (180MB/s-180MB/s), io=128GiB (137GB), run=764843-764843msec (+48.7% throughput, -33.1% runtime) Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba --- fs/btrfs/inode.c | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0fa4f7007ff9..611b3412fbfd 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3336,6 +3336,14 @@ cache_index: */ BTRFS_I(inode)->last_unlink_trans = BTRFS_I(inode)->last_trans; + /* + * Same logic as for last_unlink_trans. We don't persist the generation + * of the last transaction where this inode was used for a reflink + * operation, so after eviction and reloading the inode we must be + * pessimistic and assume the last transaction that modified the inode. + */ + BTRFS_I(inode)->last_reflink_trans = BTRFS_I(inode)->last_trans; + path->slots[0]++; if (inode->i_nlink != 1 || path->slots[0] >= btrfs_header_nritems(leaf)) @@ -8550,6 +8558,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) ei->index_cnt = (u64)-1; ei->dir_index = 0; ei->last_unlink_trans = 0; + ei->last_reflink_trans = 0; ei->last_log_commit = 0; spin_lock_init(&ei->lock); -- cgit From 1e6e238c3002ea3611465ce5f32777ddd6a40126 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Tue, 28 Jul 2020 16:39:26 +0800 Subject: btrfs: inode: fix NULL pointer dereference if inode doesn't need compression [BUG] There is a bug report of NULL pointer dereference caused in compress_file_extent(): Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs] NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs] LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] Call Trace: [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable) [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs] [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs] [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0 [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660 [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0 [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c ---[ end trace f16954aa20d822f6 ]--- [CAUSE] For the following execution route of compress_file_range(), it's possible to hit NULL pointer dereference: compress_file_extent() |- pages = NULL; |- start = async_chunk->start = 0; |- end = async_chunk = 4095; |- nr_pages = 1; |- inode_need_compress() == false; <<< Possible, see later explanation | Now, we have nr_pages = 1, pages = NULL |- cont: |- ret = cow_file_range_inline(); |- if (ret <= 0) { |- for (i = 0; i < nr_pages; i++) { |- WARN_ON(pages[i]->mapping); <<< Crash To enter above call execution branch, we need the following race: Thread 1 (chattr) | Thread 2 (writeback) --------------------------+------------------------------ | btrfs_run_delalloc_range | |- inode_need_compress = true | |- cow_file_range_async() btrfs_ioctl_set_flag() | |- binode_flags |= | BTRFS_INODE_NOCOMPRESS | | compress_file_range() | |- inode_need_compress = false | |- nr_page = 1 while pages = NULL | | Then hit the crash [FIX] This patch will fix it by checking @pages before doing accessing it. This patch is only designed as a hot fix and easy to backport. More elegant fix may make btrfs only check inode_need_compress() once to avoid such race, but that would be another story. Reported-by: Luciano Chavez Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages") CC: stable@vger.kernel.org # 4.14.x: cecc8d9038d16: btrfs: Move free_pages_out label in inline extent handling branch in compress_file_range CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/inode.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 611b3412fbfd..9988d754e465 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -653,12 +653,18 @@ cont: page_error_op | PAGE_END_WRITEBACK); - for (i = 0; i < nr_pages; i++) { - WARN_ON(pages[i]->mapping); - put_page(pages[i]); + /* + * Ensure we only free the compressed pages if we have + * them allocated, as we can still reach here with + * inode_need_compress() == false. + */ + if (pages) { + for (i = 0; i < nr_pages; i++) { + WARN_ON(pages[i]->mapping); + put_page(pages[i]); + } + kfree(pages); } - kfree(pages); - return 0; } } -- cgit From 881a3a11c2b858fe9b69ef79ac5ee9978a266dc9 Mon Sep 17 00:00:00 2001 From: Pavel Machek Date: Mon, 3 Aug 2020 11:35:06 +0200 Subject: btrfs: fix return value mixup in btrfs_get_extent btrfs_get_extent() sets variable ret, but out: error path expect error to be in variable err so the error code is lost. Fixes: 6bf9e4bd6a27 ("btrfs: inode: Verify inode mode to avoid NULL pointer dereference") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Nikolay Borisov Signed-off-by: Pavel Machek (CIP) Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/btrfs/inode.c') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 9988d754e465..16ce82039dcf 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6627,7 +6627,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, extent_type == BTRFS_FILE_EXTENT_PREALLOC) { /* Only regular file could have regular/prealloc extent */ if (!S_ISREG(inode->vfs_inode.i_mode)) { - ret = -EUCLEAN; + err = -EUCLEAN; btrfs_crit(fs_info, "regular/prealloc extent found for non-regular inode %llu", btrfs_ino(inode)); -- cgit