summaryrefslogtreecommitdiff
path: root/fs/btrfs/extent_io.c
AgeCommit message (Collapse)Author
2023-08-17btrfs: only subtract from len_to_oe_boundary when it is tracking an extentChris Mason
bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone as we submit bios for writes. Every time we add a page to the bio, we decrement those bytes from len_to_oe_boundary, and then we submit the bio if we happen to hit zero. Most of the time, len_to_oe_boundary gets set to U32_MAX. submit_extent_page() adds pages into our bio, and the size of the bio ends up limited by: - Are we contiguous on disk? - Does bio_add_page() allow us to stuff more in? - is len_to_oe_boundary > 0? The len_to_oe_boundary math starts with U32_MAX, which isn't page or sector aligned, and subtracts from it until it hits zero. In the non-zoned case, the last IO we submit before we hit zero is going to be unaligned, triggering BUGs. This is hard to trigger because bio_add_page() isn't going to make a bio of U32_MAX size unless you give it a perfect set of pages and fully contiguous extents on disk. We can hit it pretty reliably while making large swapfiles during provisioning because the machine is freshly booted, mostly idle, and the disk is freshly formatted. It's also possible to trigger with reads when read_ahead_kb is set to 4GB. The code has been clean up and shifted around a few times, but this flaw has been lurking since the counter was added. I think the commit 24e6c8082208 ("btrfs: simplify main loop in submit_extent_page") ended up exposing the bug. The fix used here is to skip doing math on len_to_oe_boundary unless we've changed it from the default U32_MAX value. bio_add_page() is the real limit we want, and there's no reason to do extra math when block layer is doing it for us. Sample reproducer, note you'll need to change the path to the bdi and device: SUBVOL=/btrfs/swapvol SWAPFILE=$SUBVOL/swapfile SZMB=8192 mkfs.btrfs -f /dev/vdb mount /dev/vdb /btrfs btrfs subvol create $SUBVOL chattr +C $SUBVOL dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB sync echo 4 > /proc/sys/vm/drop_caches echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb while true; do echo 1 > /proc/sys/vm/drop_caches echo 1 > /proc/sys/vm/drop_caches dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock done Fixes: 24e6c8082208 ("btrfs: simplify main loop in submit_extent_page") CC: stable@vger.kernel.org # 6.4+ Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-10btrfs: don't wait for writeback on clean pages in extent_write_cache_pagesChristoph Hellwig
__extent_writepage could have started on more pages than the one it was called for. This happens regularly for zoned file systems, and in theory could happen for compressed I/O if the worker thread was executed very quickly. For such pages extent_write_cache_pages waits for writeback to complete before moving on to the next page, which is highly inefficient as it blocks the flusher thread. Port over the PageDirty check that was added to write_cache_pages in commit 515f4a037fb ("mm: write_cache_pages optimise page cleaning") to fix this. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-10btrfs: don't stop integrity writeback too earlyChristoph Hellwig
extent_write_cache_pages stops writing pages as soon as nr_to_write hits zero. That is the right thing for opportunistic writeback, but incorrect for data integrity writeback, which needs to ensure that no dirty pages are left in the range. Thus only stop the writeback for WB_SYNC_NONE if nr_to_write hits 0. This is a port of write_cache_pages changes in commit 05fe478dd04e ("mm: write_cache_pages integrity fix"). Note that I've only trigger the problem with other changes to the btrfs writeback code, but this condition seems worthwhile fixing anyway. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ updated comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: use btrfs_finish_ordered_extent to complete buffered writesChristoph Hellwig
Use the btrfs_finish_ordered_extent helper to complete compressed writes using the bbio->ordered pointer instead of requiring an rbtree lookup in the otherwise equivalent btrfs_mark_ordered_io_finished called from btrfs_writepage_endio_finish_ordered. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: open code end_extent_writepage in end_bio_extent_writepageChristoph Hellwig
This prepares for switching to more efficient ordered_extent processing and already removes the forth and back conversion from len to end back to len. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: add an ordered_extent pointer to struct btrfs_bioChristoph Hellwig
Add a pointer to the ordered_extent to the existing union in struct btrfs_bio, so all code dealing with data write bios can just use a pointer dereference to retrieve the ordered_extent instead of doing multiple rbtree lookups per I/O. The reference to this ordered_extent is dropped at end I/O time, which implies that an extra one must be acquired when the bio is split. This also requires moving the btrfs_extract_ordered_extent call into btrfs_split_bio so that the invariant of always having a valid ordered_extent reference for the btrfs_bio is kept. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: limit write bios to a single ordered extentChristoph Hellwig
Currently buffered writeback bios are allowed to span multiple ordered_extents, although that basically never actually happens since commit 4a445b7b6178 ("btrfs: don't merge pages into bio if their page offset is not contiguous"). Supporting bios than span ordered_extents complicates the file checksumming code, and prevents us from adding an ordered_extent pointer to the btrfs_bio structure. Use the existing code to limit a bio to single ordered_extent for zoned device writes for all writes. This allows to remove the REQ_BTRFS_ONE_ORDERED flags, and the handling of multiple ordered_extents in btrfs_csum_one_bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: don't treat zoned writeback as being from an async helper threadChristoph Hellwig
When extent_write_locked_range was originally added, it was only used writing back compressed pages from an async helper thread. But it is now also used for writing back pages on zoned devices, where it is called directly from the ->writepage context. In this case we want to be able to pass on the writeback_control instead of creating a new one, and more importantly want to use all the normal cgroup interaction instead of potentially deferring writeback to another helper. Fixes: 898793d992c2 ("btrfs: zoned: write out partially allocated region") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: only call __extent_writepage_io from extent_write_locked_rangeChristoph Hellwig
__extent_writepage does a lot of things that make no sense for extent_write_locked_range, given that extent_write_locked_range itself is called from __extent_writepage either directly or through a workqueue, and all this work has already been done in the first invocation and the pages haven't been unlocked since. Call __extent_writepage_io directly instead and open code the logic tracked in btrfs_bio_ctrl::extent_locked. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: move writeback_control::nr_to_write update to __extent_writepageChristoph Hellwig
Move the nr_to_write accounting from __extent_writepage_io to __extent_writepage_io as we'll grow another __extent_writepage_io that doesn't want this accounting soon. Also drop the obsolete comment - decrementing a counter in the on-stack writeback_control data structure doesn't need the page lock. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: remove non-standard extent handling in __extent_writepage_ioChristoph Hellwig
__extent_writepage_io is never called for compressed or inline extents, or holes. Remove the not quite working code for them and replace it with asserts that these cases don't happen. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: remove PAGE_SET_ERRORChristoph Hellwig
Now that the btrfs writeback code has stopped using PageError, using PAGE_SET_ERROR to just set the per-address_space error flag is confusing. Open code the mapping_set_error calls in the callers and remove the PAGE_SET_ERROR flag. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: stop setting PageError in the data I/O pathChristoph Hellwig
PageError is not used by the VFS/MM and deprecated because it uses up a page bit and has no coherent rules. Instead read errors are usually propagated by not setting or clearing the uptodate bit, and write errors are propagated through the address_space. Btrfs now only sets the flag and never clears it for data pages, so just remove all places setting it, and the subpage error bit. Note that the error propagation for superblock writes that work on the block device mapping still uses PageError for now, but that will be addressed in a separate series. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: don't check PageError in __extent_writepageChristoph Hellwig
__extent_writepage currenly sets PageError whenever any error happens, and the also checks for PageError to decide if to call error handling. This leads to very unclear responsibility for cleaning up on errors. In the VM and generic writeback helpers the basic idea is that once I/O is fired off all error handling responsibility is delegated to the end I/O handler. But if that end I/O handler sets the PageError bit, and the submitter checks it, the bit could in some cases leak into the submission context for fast enough I/O. Fix this by simply not checking PageError and just using the local ret variable to check for submission errors. This also fundamentally solves the long problem documented in a comment in __extent_writepage by never leaking the error bit into the submission context. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: don't check PageError in btrfs_verify_pageChristoph Hellwig
btrfs_verify_page is called from the readpage completion handler, which is only used to read pages, or parts of pages that aren't uptodate yet. The only case where PageError could be set on a page in btrfs is if we had a previous writeback error, but in that case we won't called readpage on it, as it has previously been marked uptodate. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: fix fsverify read error handling in end_page_readChristoph Hellwig
Also clear the uptodate bit to make sure the page isn't seen as uptodate in the page cache if fsverity verification fails. Fixes: 146054090b08 ("btrfs: initial fsverity support") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: factor out a btrfs_verify_page helperChristoph Hellwig
Split all the conditionals for the fsverity calls in end_page_read into a btrfs_verify_page helper to keep the code readable and make additional refactoring easier. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: fix range_end calculation in extent_write_locked_rangeChristoph Hellwig
The range_end field in struct writeback_control is inclusive, just like the end parameter passed to extent_write_locked_range. Not doing this could cause extra writeout, which is harmless but suboptimal. Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads") CC: stable@vger.kernel.org # 5.9+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: use the same uptodate variable for end_bio_extent_readpage()Qu Wenruo
In function end_bio_extent_readpage() we call endio_readpage_release_extent() to unlock the extent io tree. However we pass PageUptodate(page) as @uptodate parameter for it, while for previous end_page_read() call, we use a dedicated @uptodate local variable. This is not a big deal, as even for subpage cases, either the bio only covers part of the page, then the @uptodate is always false, and the subpage ranges can still be merged. But for the sake of consistency, always use @uptodate variable when possible. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: subpage: make alloc_extent_buffer() handle previously uptodate range ↵Qu Wenruo
efficiently Currently alloc_extent_buffer() would make the extent buffer uptodate if the corresponding pages are also uptodate. But this check is only checking PageUptodate, which is fine for regular cases, but not for subpage cases, as we can have multiple extent buffers in the same page. So here we go btrfs_page_test_uptodate() instead. The old code doesn't cause any problem, but is not efficient, as it would cause extra metadata read even if the range is already uptodate. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: subpage: dump extra subpage bitmaps for debugQu Wenruo
There is a bug report that assert_eb_page_uptodate() gets triggered for free space tree metadata. Without proper dump for the subpage bitmaps it's much harder to debug. Thus this patch would dump all the subpage bitmaps (split them into their own bitmaps) for a easier debugging. The output would look like this: (Dumped after a tree block got read from disk) page:000000006e34bf49 refcount:4 mapcount:0 mapping:0000000067661ac4 index:0x1d1 pfn:0x110e9 memcg:ffff0000d7d62000 aops:btree_aops [btrfs] ino:1 flags: 0x8000000000002002(referenced|private|zone=2) page_type: 0xffffffff() raw: 8000000000002002 0000000000000000 dead000000000122 ffff00000188bed0 raw: 00000000000001d1 ffff0000c7992700 00000004ffffffff ffff0000d7d62000 page dumped because: btrfs subpage dump BTRFS warning (device dm-1): start=30490624 len=16384 page=30474240 bitmaps: uptodate=4-7 error= dirty= writeback= ordered= checked= Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: drop gfp from parameter extent state helpersDavid Sterba
Now that all extent state bit helpers effectively take the GFP_NOFS mask (and GFP_NOWAIT is encoded in the bits) we can remove the parameter. This reduces stack consumption in many functions and simplifies a lot of code. Net effect on module on a release build: text data bss dec hex filename 1250432 20985 16088 1287505 13a551 pre/btrfs.ko 1247074 20985 16088 1284147 139833 post/btrfs.ko DELTA: -3358 Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: merge write_one_subpage_eb into write_one_ebChristoph Hellwig
Most of the code in write_one_subpage_eb and write_one_eb is shared, so merge the two functions into one. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: use per-buffer locking for extent_buffer readingChristoph Hellwig
Instead of locking and unlocking every page or the extent, just add a new EXTENT_BUFFER_READING bit that mirrors EXTENT_BUFFER_WRITEBACK for synchronizing threads trying to read an extent_buffer and to wait for I/O completion. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: don't check for uptodate pages in read_extent_buffer_pagesChristoph Hellwig
The only place that reads in pages and thus marks them uptodate for the btree inode is read_extent_buffer_pages. Which means that either pages are already uptodate from an old buffer when creating a new one in alloc_extent_buffer, or they will be updated by ca call to read_extent_buffer_pages. This means the checks for uptodate pages in read_extent_buffer_pages and read_extent_buffer_subpage are superfluous and can be removed. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: stop using PageError for extent_buffersChristoph Hellwig
PageError is only used to limit the uptodate check in assert_eb_page_uptodate. But we have a much more useful flag indicating the exact condition we are about with the EXTENT_BUFFER_WRITE_ERR flag, so use that instead and help the kernel toward eventually removing PageError. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: remove the io_pages field in struct extent_bufferChristoph Hellwig
No need to track the number of pages under I/O now that each extent_buffer is read and written using a single bio. For the read side we need to grab an extra reference for the duration of the I/O to prevent eviction, though. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: use a separate end_io handler for extent_buffer writingChristoph Hellwig
Now that we always use a single bio to write an extent_buffer, the buffer can be passed to the end_io handler as private data. This allows to simplify the metadata write end I/O handler, and merge the subpage end_io handler into the main one. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: don't use btrfs_bio_ctrl for extent buffer writingChristoph Hellwig
The btrfs_bio_ctrl machinery is overkill for writing extent_buffers as we always operate on PAGE_SIZE chunks (or one smaller one for the subpage case) that are contiguous and are guaranteed to fit into a single bio. Replace it with open coded btrfs_bio_alloc, __bio_add_page and btrfs_submit_bio calls. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: move page locking from lock_extent_buffer_for_io to write_one_ebChristoph Hellwig
Locking the pages in lock_extent_buffer_for_io only for the non-subpage case is very confusing. Move it to write_one_eb to mirror the subpage case and simplify the code. Now lock_extent_buffer_for_io does not leave all the pages locked and each is individually locked/unlocked in write_one_eb. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: submit a writeback bio per extent_bufferChristoph Hellwig
Stop trying to cluster writes of multiple extent_buffers into a single bio. There is no need for that as the blk_plug mechanism used all the way up in writeback_inodes_wb gives us the same I/O pattern even with multiple bios. Removing the clustering simplifies lock_extent_buffer_for_io a lot and will also allow passing the eb as private data to the end I/O handler. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: return bool from lock_extent_buffer_for_ioChristoph Hellwig
lock_extent_buffer_for_io never returns a negative error value, so switch the return value to a simple bool. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ keep noinline_for_stack ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: do not try to unlock the extent for non-subpage metadata readsChristoph Hellwig
Only subpage metadata reads lock the extent. Don't try to unlock it and waste cycles in the extent tree lookup for PAGE_SIZE or larger metadata. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: use a separate end_io handler for read_extent_bufferChristoph Hellwig
Now that we always use a single bio to read an extent_buffer, the buffer can be passed to the end_io handler as private data. This allows implementing a much simplified dedicated end I/O handler for metadata reads. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: remove the mirror_num argument to btrfs_submit_compressed_readChristoph Hellwig
Given that read recovery for data I/O is handled in the storage layer, the mirror_num argument to btrfs_submit_compressed_read is always 0, so remove it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: don't use btrfs_bio_ctrl for extent buffer readingChristoph Hellwig
The btrfs_bio_ctrl machinery is overkill for reading extent_buffers as we always operate on PAGE_SIZE chunks (or one smaller one for the subpage case) that are contiguous and are guaranteed to fit into a single bio. Replace it with open coded btrfs_bio_alloc, __bio_add_page and btrfs_submit_bio calls in a helper function shared between the subpage and node size >= PAGE_SIZE cases. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: always read the entire extent_bufferChristoph Hellwig
Currently read_extent_buffer_pages skips pages that are already uptodate when reading in an extent_buffer. While this reduces the amount of data read, it increases the number of I/O operations as we now need to do multiple I/Os when reading an extent buffer with one or more uptodate pages in the middle of it. On any modern storage device, be that hard drives or SSDs this actually decreases I/O performance. Fortunately this case is pretty rare as the pages are always initially read together and then aged the same way. Besides simplifying the code a bit as-is this will allow for major simplifications to the I/O completion handler later on. Note that the case where all pages are uptodate is still handled by an optimized fast path that does not read any data from disk. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: subpage: fix error handling in end_bio_subpage_eb_writepageChristoph Hellwig
Call btrfs_page_clear_uptodate instead of ClearPageUptodate to properly manage the uptodate bit for the subpage case. Reported-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: mark extent_buffer_under_io staticChristoph Hellwig
extent_buffer_under_io is only used in extent_io.c, so mark it static. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: don't hold an extra reference for redirtied buffersChristoph Hellwig
When btrfs_redirty_list_add redirties a buffer, it also acquires an extra reference that is released on transaction commit. But this is not required as buffers that are dirty or under writeback are never freed (look for calls to extent_buffer_under_io())). Remove the extra reference and the infrastructure used to drop it again. History behind redirty logic: In the first place, it used releasing_list to hold all the to-be-released extent buffers, and decided which buffers to re-dirty at the commit time. Then, in a later version, the behaviour got changed to re-dirty a necessary buffer and add re-dirtied one to the list in btrfs_free_tree_block(). In short, the list was there mostly for the patch series' historical reason. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [ add Naohiro's comment regarding history ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: fix dirty_metadata_bytes for redirtied buffersChristoph Hellwig
dirty_metadata_bytes is decremented in both places that clear the dirty bit in a buffer, but only incremented in btrfs_mark_buffer_dirty, which means that a buffer that is redirtied using btrfs_redirty_list_add won't be added to dirty_metadata_bytes, but it will be subtracted when written out, leading an inconsistency in the counter. Move the dirty_metadata_bytes from btrfs_mark_buffer_dirty into set_extent_buffer_dirty to also account for the redirty case, and remove the now unused set_extent_buffer_dirty return value. Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: introduce btrfs_bio::fs_info memberQu Wenruo
Currently we're doing a lot of work for btrfs_bio: - Checksum verification for data read bios - Bio splits if it crosses stripe boundary - Read repair for data read bios However for the incoming scrub patches, we don't want this extra functionality at all, just plain logical + mirror -> physical mapping ability. Thus here we do the following changes: - Introduce btrfs_bio::fs_info This is for the new scrub specific btrfs_bio, which would not populate btrfs_bio::inode. Thus we need such new member to grab a fs_info This new member will always be populated. - Replace @inode argument with @fs_info for btrfs_bio_init() and its caller Since @inode is no longer a mandatory member, replace it with @fs_info, and let involved users populate @inode. - Skip checksum verification and generation if @bbio->inode is NULL - Add extra ASSERT()s To make sure: * bbio->inode is properly set for involved read repair path * if @file_offset is set, bbio->inode is also populated - Grab @fs_info from @bbio directly We can no longer go @bbio->inode->root->fs_info, as bbio->inode can be NULL. This involves: * btrfs_simple_end_io() * should_async_write() * btrfs_wq_submit_bio() * btrfs_use_zone_append() Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs, block: move REQ_CGROUP_PUNT to btrfsChristoph Hellwig
REQ_CGROUP_PUNT is a bit annoying as it is hard to follow and adds a branch to the bio submission hot path. To fix this, export blkcg_punt_bio_submit and let btrfs call it directly. Add a new REQ_FS_PRIVATE flag for btrfs to indicate to it's own low-level bio submission code that a punt to the cgroup submission helper is required. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs, mm: remove the punt_to_cgroup field in struct writeback_controlChristoph Hellwig
punt_to_cgroup is only used by extent_write_locked_range, but that function also directly controls the bio flags for the actual submission. Remove th punt_to_cgroup field, and just set REQ_CGROUP_PUNT directly in extent_write_locked_range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: return a btrfs_bio from btrfs_bio_allocChristoph Hellwig
Return the containing struct btrfs_bio instead of the less type safe struct bio from btrfs_bio_alloc. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: store a pointer to a btrfs_bio in struct btrfs_bio_ctrlChristoph Hellwig
The bio in struct btrfs_bio_ctrl must be a btrfs_bio, so store a pointer to the btrfs_bio for better type checking. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: simplify finding the inode in submit_one_bioChristoph Hellwig
struct btrfs_bio now has an always valid inode pointer that can be used to find the inode in submit_one_bio, so use that and initialize all variables for which it is possible at declaration time. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: pass a btrfs_bio to btrfs_submit_compressed_readChristoph Hellwig
btrfs_submit_compressed_read expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: pass a btrfs_bio to btrfs_submit_bioChristoph Hellwig
btrfs_submit_bio expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: sink calc_bio_boundaries into its only callerJohannes Thumshirn
Nowadays calc_bio_boundaries() is a relatively simple function that only guarantees the one bio equals to one ordered extent rule for uncompressed Zone Append bios. Sink it into it's only caller alloc_new_bio(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>