summaryrefslogtreecommitdiff
path: root/fs/btrfs/extent-io-tree.h
AgeCommit message (Collapse)Author
2025-05-15btrfs: make extent unpinning more efficient when committing transactionFilipe Manana
At btrfs_finish_extent_commit() we have this loop that keeps finding an extent range to unpin in the transaction's pinned_extents io tree, caches the extent state and then passes that cached extent state to btrfs_clear_extent_dirty(), which will free that extent state since we clear the only bit it can have set. So on each loop iteration we do a full io tree search and the cached state is used only to avoid having a tree search done by btrfs_clear_extent_dirty(). During the lifetime of a transaction we can pin many thousands of extents, resulting in a large and deep rb tree that backs the io tree. For example, for the following fs_mark run on a 12 cores boxes: $ cat test.sh #!/bin/bash DEV=/dev/nullb0 MNT=/mnt/nullb0 FILES=100000 THREADS=$(nproc --all) echo "performance" | \ tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor mkfs.btrfs -f $DEV mount $DEV $MNT OPTS="-S 0 -L 8 -n $FILES -s 0 -t $THREADS -k" for ((i = 1; i <= $THREADS; i++)); do OPTS="$OPTS -d $MNT/d$i" done fs_mark $OPTS umount $MNT an histogram for the number of ranges (elements) in the pinned extents io tree of a transaction was the following: Count: 76 Range: 5440.000 - 51088.000; Mean: 27354.368; Median: 28312.000; Stddev: 9800.767 Percentiles: 90th: 40486.000; 95th: 43322.000; 99th: 51088.000 5440.000 - 6805.809: 1 ### 6805.809 - 10652.034: 1 ### 10652.034 - 13326.178: 3 ######## 13326.178 - 16671.590: 8 ###################### 16671.590 - 20856.773: 7 #################### 20856.773 - 26092.528: 13 #################################### 26092.528 - 32642.571: 19 ##################################################### 32642.571 - 40836.818: 17 ############################################### 40836.818 - 51088.000: 7 #################### We can improve on this by grabbing the next state before calling btrfs_clear_extent_dirty(), avoiding a full tree search on the next iteration which always has an O(log n) complexity while grabbing the next element (rb_next() rbtree operation) is in the worst case O(log n) too, but very often much less than that, making it more efficient. Here follow histograms for the execution times, in nanoseconds, of btrfs_finish_extent_commit() before and after applying this patch and all the other patches in the same patchset. Before patchset: Count: 32 Range: 3925691.000 - 269990635.000; Mean: 133814526.906; Median: 122758052.000; Stddev: 65776550.375 Percentiles: 90th: 228672087.000; 95th: 265187000.000; 99th: 269990635.000 3925691.000 - 5993208.660: 1 #### 5993208.660 - 75878537.656: 4 ################## 75878537.656 - 115840974.514: 12 ##################################################### 115840974.514 - 176850157.761: 6 ########################### 176850157.761 - 269990635.000: 9 ######################################## After patchset: Count: 32 Range: 1849393.000 - 231491064.000; Mean: 126978584.625; Median: 123732897.000; Stddev: 58007821.806 Percentiles: 90th: 203055491.000; 95th: 219952699.000; 99th: 231491064.000 1849393.000 - 2997642.092: 1 #### 2997642.092 - 88111637.071: 9 ##################################### 88111637.071 - 142818264.414: 9 ##################################### 142818264.414 - 231491064.000: 13 ##################################################### Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: make btrfs_find_contiguous_extent_bit() return bool instead of intFilipe Manana
The function needs only to return true or false, so there's no need to return an integer. Currently it returns 0 when a range with the given bits is set and 1 when not found, which is a bit counter intuitive too. So change the function to return a bool instead, returning true when a range is found and false otherwise. Update the function's documentation to mention the return value too. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename remaining exported functions from extent-io-tree.hFilipe Manana
Rename the remaning exported functions that don't have a 'btrfs_' prefix. By convention exported functions should have such prefix to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename free_extent_state() to include a btrfs prefixFilipe Manana
This is an exported function so it should have a 'btrfs_' prefix by convention, to make it clear it's btrfs specific and to avoid collisions with functions from elsewhere in the kernel. Rename the function to add 'btrfs_' prefix to it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename the functions to count, test and get bit ranges in io treesFilipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So add a 'btrfs_' prefix to their names to make it clear they are from btrfs. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename the functions to init and release an extent io treeFilipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So add a 'btrfs_' prefix to their name to make it clear they are from btrfs. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename the functions to get inode and fs_info from an extent io treeFilipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So add a 'btrfs_' prefix to their name to make it clear they are from btrfs. Also remove the 'const' suffix from extent_io_tree_to_inode_const() since there's no non-const variant anymore and makes the naming consistent with extent_io_tree_to_fs_info() (no 'const' suffix and returns a const pointer). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename the functions to search for bits in extent rangesFilipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So add a 'btrfs_' prefix to their name to make it clear they are from btrfs. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename set_extent_bit() to include a btrfs prefixFilipe Manana
This is an exported function so it should have a 'btrfs_' prefix by convention, to make it clear it's btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So rename it to btrfs_set_extent_bit(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename the functions to clear bits for an extent rangeFilipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. One of them has a double underscore prefix which is also discouraged. So remove double underscore prefix where applicable and add a 'btrfs_' prefix to their name to make it clear they are from btrfs. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: rename __lock_extent() and __try_lock_extent()Filipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. Their double underscore prefix is also discouraged. So remove their double underscore prefix, add a 'btrfs_' prefix to their name to make it clear they are from btrfs and a '_bits' suffix to avoid collision with btrfs_lock_extent() and btrfs_try_lock_extent(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: add btrfs prefix to dio lock and unlock extent functionsFilipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So add a prefix to their name. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: add btrfs prefix to main lock, try lock and unlock extent functionsFilipe Manana
These functions are exported so they should have a 'btrfs_' prefix by convention, to make it clear they are btrfs specific and to avoid collisions with functions from elsewhere in the kernel. So add a prefix to their name. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: remove extent_io_tree_to_inode() and is_inode_io_tree()Filipe Manana
These functions aren't used outside extent-io-tree.c, but yet one of them (extent_io_tree_to_inode()) is unnecessarily exported in the header. Furthermore their single use is in a pattern like this: if (is_inode_io_tree(tree)) foo(extent_io_tree_to_inode(tree), ...); So we're effectively unnecessarily adding more indirection, checking twice if tree->owner == IO_TREE_INODE_IO before getting the inode and doing a non-inline function call to get tree->inode. Simplify this by removing these helper functions and instead doing thing like this: if (tree->owner == IO_TREE_INODE_IO) foo(tree->inode, ...); Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: pass a pointer to get_range_bits() to cache first search resultFilipe Manana
Allow get_range_bits() to take an extent state pointer to pointer argument so that we can cache the first extent state record in the target range, so that a caller can use it for subsequent operations without doing a full tree search. Currently the only user is try_release_extent_state(), which then does a call to __clear_extent_bit() which can use such a cached state record. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: allow folios to be released while ordered extent is finishingFilipe Manana
When the release_folio callback (from struct address_space_operations) is invoked we don't allow the folio to be released if its range is currently locked in the inode's io_tree, as it may indicate the folio may be needed by the task that locked the range. However if the range is locked because an ordered extent is finishing, then we can safely allow the folio to be released because ordered extent completion doesn't need to use the folio at all. When we are under memory pressure, the kernel starts writeback of dirty pages (folios) with the goal of releasing the pages from the page cache after writeback completes, however this often is not possible on btrfs because: * Once the writeback completes we queue the ordered extent completion; * Once the ordered extent completion starts, we lock the range in the inode's io_tree (at btrfs_finish_one_ordered()); * If the release_folio callback is called while the folio's range is locked in the inode's io_tree, we don't allow the folio to be released, so the kernel has to try to release memory elsewhere, which may result in triggering more writeback or releasing other pages from the page cache which may be more useful to have around for applications. In contrast, when the release_folio callback is invoked after writeback finishes and before ordered extent completion starts or locks the range, we allow the folio to be released, as well as when the release_folio callback is invoked after ordered extent completion unlocks the range. Improve on this by detecting if the range is locked for ordered extent completion and if it is, allow the folio to be released. This detection is achieved by adding a new extent flag in the io_tree that is set when the range is locked during ordered extent completion. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: remove EXTENT_UPTODATE io tree flagFilipe Manana
The EXTENT_UPTODATE io tree flag is now used only to mark ranges in the fs_info->excluded_extents as used by super blocks and not available for extent allocation (to prevent adding those ranges as free space in the in memory space caches). As we can use any flag for that purpose, and we are using EXTENT_DIRTY for the pinned extents io tree for example, remove the EXTENT_UPTODATE flag and use instead EXTENT_DIRTY for the excluded extents io tree. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: remove leftover EXTENT_UPTODATE clear from an inode's io_treeFilipe Manana
After commit 52b029f42751 ("btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path") we never set EXTENT_UPTODATE in an inode's io_tree anymore, but we still have some code attempting to clear that bit from an inode's io_tree. Remove that code as it doesn't do anything anymore. The sole use of the EXTENT_UPTODATE bit is for the excluded extents io_tree (fs_info->excluded_extents), which is used to track the locations of super blocks, so that their ranges are never marked as free, making them unavailable for extent allocation. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-10btrfs: introduce EXTENT_DIO_LOCKEDJosef Bacik
In order to support dropping the extent lock during a read we need a way to make sure that direct reads and direct writes for overlapping ranges are protected from each other. To accomplish this introduce another lock bit specifically for direct io. Subsequent patches will utilize this to protect direct IO operations. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04btrfs: add forward declarations and headers, part 2David Sterba
Do a cleanup in more headers: - add forward declarations for types referenced by pointers - add includes when types need them This fixes potential compilation problems if the headers are reordered or the missing includes are not provided indirectly. Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: always set extent_io_tree::inode and drop fs_infoDavid Sterba
The extent_io_tree is embedded in several structures, notably in struct btrfs_inode. The fs_info is only used for reporting errors and for reference in trace points. We can get to the pointer through the inode, but not all io trees set it. However, we always know the owner and can recognize if inode is valid. For access helpers are provided, const variant for the trace points. This reduces size of extent_io_tree by 8 bytes and following structures in turn: - btrfs_inode 1104 -> 1088 - btrfs_device 520 -> 512 - btrfs_root 1360 -> 1344 - btrfs_transaction 456 -> 440 - btrfs_fs_info 3600 -> 3592 - reloc_control 1520 -> 1512 Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: make wait_extent_bit() staticFilipe Manana
The function wait_extent_bit() is not used outside extent-io-tree.c so make it static. Furthermore the function doesn't have the 'btrfs_' prefix. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: change test_range_bit to scan the whole rangeDavid Sterba
The semantics of test_range_bit() with filled == 0 is now in it's own helper so test_range_bit will check the whole range unconditionally. The detection logic is flipped and assumes success by default and catches exceptions. Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: add specific helper for range bit test existsDavid Sterba
The existing helper test_range_bit works in two ways, checks if the whole range contains all the bits, or stop on the first occurrence. By adding a specific helper for the latter case, the inner loop can be simplified and contains fewer conditionals, making it a bit faster. There's no caller that uses the cached state pointer so this reduces the argument count further. Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-21btrfs: make find_first_extent_bit() return a booleanFilipe Manana
Currently find_first_extent_bit() returns a 0 if it found a range in the given io tree and 1 if it didn't find any. There's no need to return any errors, so make the return value a boolean and invert the logic to make more sense: return true if it found a range and false if it didn't find any range. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: drop gfp from parameter extent state helpersDavid Sterba
Now that all extent state bit helpers effectively take the GFP_NOFS mask (and GFP_NOWAIT is encoded in the bits) we can remove the parameter. This reduces stack consumption in many functions and simplifies a lot of code. Net effect on module on a release build: text data bss dec hex filename 1250432 20985 16088 1287505 13a551 pre/btrfs.ko 1247074 20985 16088 1284147 139833 post/btrfs.ko DELTA: -3358 Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: pass NOWAIT for set/clear extent bits as another bitDavid Sterba
The only flags we now pass to set_extent_bit/__clear_extent_bit are GFP_NOFS and GFP_NOWAIT (a few functions handling mappings). This requires an extra parameter to be passed everywhere but is almost always the same. Encode the GFP_NOWAIT as an artificial extent bit and extract the real bits and gfp mask in the lowest level helpers. Now the passed gfp mask is not actually used and can be removed. Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: open code set_extent_bitsDavid Sterba
This helper calls set_extent_bit with two more parameters set to default values, but otherwise it's purpose is not clear. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: open code set_extent_bits_nowaitDavid Sterba
The helper only passes GFP_NOWAIT as gfp flags and is used two times. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: open code set_extent_dirtyDavid Sterba
The helper is used a few times, that it's setting the DIRTY extent bit is still clear. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: open code set_extent_newDavid Sterba
The helper is used only once. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: open code set_extent_delallocDavid Sterba
The helper is used once in fs code and a few times in the self test code. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: open code set_extent_defragDavid Sterba
The helper is used only once. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: remove the io_failure_record infrastructureChristoph Hellwig
struct io_failure_record and the io_failure_tree tree are unused now, so remove them. This in turn makes struct btrfs_inode smaller by 16 bytes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: allow passing a cached state record to count_range_bits()Filipe Manana
An inode's io_tree can be quite large and there are cases where due to delalloc it can have thousands of extent state records, which makes the red black tree have a depth of 10 or more, making the operation of count_range_bits() slow if we repeatedly call it for a range that starts where, or after, the previous one we called it for. Such use cases are when searching for delalloc in a file range that corresponds to a hole or a prealloc extent, which is done during lseek SEEK_HOLE/DATA and fiemap. So introduce a cached state parameter to count_range_bits() which we use to store the last extent state record we visited, and then allow the caller to pass it again on its next call to count_range_bits(). The next patches in the series will make fiemap and lseek use the new parameter. This change is part of a patchset that has the goal to make performance better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to iterate over the extents of a file. Two examples are the cp program from coreutils 9.0+ and the tar program (when using its --sparse / -S option). A sample test and results are listed in the changelog of the last patch in the series: 1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree 2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap 3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap 4/9 btrfs: search for delalloc more efficiently during lseek/fiemap 5/9 btrfs: remove no longer used btrfs_next_extent_map() 6/9 btrfs: allow passing a cached state record to count_range_bits() 7/9 btrfs: update stale comment for count_range_bits() 8/9 btrfs: use cached state when looking for delalloc ranges with fiemap 9/9 btrfs: use cached state when looking for delalloc ranges with lseek Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_treeFilipe Manana
We don't need to set the EXTENT_UPDATE bit in an inode's io_tree to mark a range as uptodate, we rely on the pages themselves being uptodate - page reading is not triggered for already uptodate pages. Recently we removed most use of the EXTENT_UPTODATE for buffered IO with commit 52b029f42751 ("btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path"), but there were a few leftovers, namely when reading from holes and successfully finishing read repair. These leftovers are unnecessarily making an inode's tree larger and deeper, slowing down searches on it. So remove all the leftovers. This change is part of a patchset that has the goal to make performance better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to iterate over the extents of a file. Two examples are the cp program from coreutils 9.0+ and the tar program (when using its --sparse / -S option). A sample test and results are listed in the changelog of the last patch in the series: 1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree 2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap 3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap 4/9 btrfs: search for delalloc more efficiently during lseek/fiemap 5/9 btrfs: remove no longer used btrfs_next_extent_map() 6/9 btrfs: allow passing a cached state record to count_range_bits() 7/9 btrfs: update stale comment for count_range_bits() 8/9 btrfs: use cached state when looking for delalloc ranges with fiemap 9/9 btrfs: use cached state when looking for delalloc ranges with lseek Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: switch extent_io_tree::private_data to btrfs_inode and renameDavid Sterba
The extent_io_tree::private_data was meant to be a preparatory work for the metadata inode rework but that never materialized. Now it's used only for an inode so it's better to change the appropriate type and rename it. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: drop private_data parameter from extent_io_tree_initDavid Sterba
All callers except one pass NULL, so the parameter can be dropped and the inode::io_tree initialization can be open coded. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: remove unused unlock_extent_atomicJosef Bacik
As of "btrfs: do not use GFP_ATOMIC in the read endio" we no longer have any users of unlock_extent_atomic, remove it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: convert EXTENT_* bits to enumsDavid Sterba
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: cache the failed state when locking extentsJosef Bacik
Currently if we fail to lock a range we'll return the start of the range that we failed to lock. We'll then search down to this range and wait on any extent states in this range. However we can avoid this search altogether if we simply cache the extent_state that had the contention. We can pass this into wait_extent_bit() and start from that extent_state without doing the search. In the most optimistic case we can avoid all searches, more likely we'll avoid the initial search and have to perform the search after we wait on the failed state, or worst case we must search both times which is what currently happens. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: add a cached_state to try_lock_extentJosef Bacik
With nowait becoming more pervasive throughout our codebase go ahead and add a cached_state to try_lock_extent(). This allows us to be faster about clearing the locked area if we have contention, and then gives us the same optimization for unlock if we are able to lock the range. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: stop tracking failed reads in the I/O treeChristoph Hellwig
There is a separate I/O failure tree to track the fail reads, so remove the extra EXTENT_DAMAGED bit in the I/O tree as it's set but never used. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: replace delete argument with EXTENT_CLEAR_ALL_BITSJosef Bacik
Instead of taking up a whole argument to indicate we're clearing everything in a range, simply add another EXTENT bit to control this, and then update all the callers to drop this argument from the clear_extent_bit variants. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: get rid of extent_io_tree::dirty_bytesJosef Bacik
This was used as an optimization for count_range_bits(EXTENT_DIRTY), which was used by the failed record code. However this was removed in this series by patch "btrfs: convert the io_failure_tree to a plain rb_tree" which was the last user of this optimization. Remove the ->dirty_bytes as nobody cares anymore. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: remove extent_io_tree::track_uptodateJosef Bacik
Since commit 78361f64ff42 ("btrfs: remove unnecessary EXTENT_UPTODATE state in buffered I/O path") we no longer check ->track_uptodate, remove it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: unify the lock/unlock extent variantsJosef Bacik
We have two variants of lock/unlock extent, one set that takes a cached state, another that does not. This is slightly annoying, and generally speaking there are only a few places where we don't have a cached state. Simplify this by making lock_extent/unlock_extent the only variant and make it take a cached state, then convert all the callers appropriately. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: drop extent_changeset from set_extent_bitJosef Bacik
The only places that set extent_changeset is set_record_extent_bits, everywhere else sets it to NULL. Drop this argument from set_extent_bit. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: remove failed_start argument from set_extent_bitJosef Bacik
This is only used for internal locking related helpers, everybody else just passes in NULL. I've changed set_extent_bit to __set_extent_bit and made it static, removed failed_start from set_extent_bit and have it call __set_extent_bit with a NULL failed_start, and I've moved some code down below the now static __set_extent_bit. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: remove the wake argument from clear_extent_bitsJosef Bacik
This is only used in the case that we are clearing EXTENT_LOCKED, so infer this value from the bits passed in instead of taking it as an argument. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>