summaryrefslogtreecommitdiff
path: root/fs/bcachefs/fs-io.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: fix stack corruptionYuxuan Shui
When a bkey_on_stack is passed to bch_read_indirect_extent, there is no guarantee that it will be big enough to hold the bkey. And bch_read_indirect_extent is not aware of bkey_on_stack to call realloc on it. This cause a stack corruption. This commit makes bch_read_indirect_extent aware of bkey_on_stack so it can call realloc when appropriate. Tested-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't issue writes that are more than 1 MBKent Overstreet
the bcachefs io path in io.c can't bounce writes larger than that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix fallocate FL_INSERT_RANGEKent Overstreet
This was another bug because of bch2_btree_iter_set_pos() invalidating iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a use after free in dio write pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERSKent Overstreet
All iterators should be released now with bch2_trans_iter_put(), so TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is always used. Also convert more code to __bch2_trans_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Sort & deduplicate updates in bch2_trans_update()Kent Overstreet
Previously, when doing multiple update in the same transaction commit that overwrote each other, we relied on doing the updates in the same order as the bch2_trans_update() calls in order to get the correct result. But that wasn't correct for triggers; bch2_trans_mark_update() when marking overwrites would do the wrong thing because it hadn't seen the update that was being overwritten. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out btree_trigger_flagsKent Overstreet
The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_INSERT_ATOMICKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMICKent Overstreet
BTREE_INSERT_ATOMIC should really be the default mode, and there's not that much code that doesn't need it - so this is prep work for getting rid of the flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_reset() calls should be at the tops of loopsKent Overstreet
It needs to be called when we get -EINTR due to e.g. lock restart - this fixes a transaction iterators overflow bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for an assertion on filesystem errorKent Overstreet
Normally the in memory i_size is always greater than or equal to i_size on disk; this doesn't hold on filesystem error. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bkey_on_stack_reassemble()Kent Overstreet
Small helper function. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Reorganize extents.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inline data extentsKent Overstreet
This implements extents that have their data inline, in the value, instead of the bkey value being pointers to the data - and the read and write paths are updated to read from these new extent types and write them out, when the write size is small enough. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out extent_update.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rework of cut_front & cut_backKent Overstreet
This changes bch2_cut_front and bch2_cut_back so that they're able to shorten the size of the value, and it also changes the extent update path to update the accounting in the btree node when this happens. When the size of the value is shortened, they zero out the space that's no longer used, so it's interpreted as noops (as implemented in the last patch). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bkey_on_stackKent Overstreet
This implements code for storing small bkeys on the stack and allocating out of a mempool if they're too big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use wbc_to_write_flags()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Some reflink fixesKent Overstreet
len might fit into a loff_t when aligned_len does not - make sure we use a u64 for aligned_len. Also, we weren't always extending the inode correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Eliminate function calls in DIO fastpathsKent Overstreet
We can assume that usually buffered and O_DIRECT IO won't be mixed, and the calls to flush the page cache won't be needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: DIO write path only needs to shoot down pagecache once, not twiceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add pagecache_add lock to buffered IO path, fault pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't hold inode lock longer than necessary in dio write pathKent Overstreet
In theory we should be able to do (non appending/extending) dio writes without taking the inode lock at all - but this gets us most of the way there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Avoid atomics in write fast pathKent Overstreet
This adds some horrible hacks, but the atomic ops for closures were getting to be a pretty expensive part of the write path. We don't want to rip out closures entirely from the write path, because they're used for e.g. waiting on the allocator, or waiting on the journal flush, and that stuff would get really ugly without closures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an error path raceKent Overstreet
On IO error, bch2_writepages_io_done() will set the page state to indicate nothing's already reserved (since the write didn't happen, we don't know what's already reserved). This can race with the buffered IO path, in between getting a disk reservation and calling bch2_set_page_dirty(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor bch2_trans_commit() pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Limit bios in writepages path to 256MKent Overstreet
This works around a bug where bio_full() doesn't check for bio->bi_iter.bi_size overflowing - and, we don't really want to build bios that are that big anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bchfs_extent_update()Kent Overstreet
The generic IO path now handles inode updates for i_size and i_sectors - this means we can drop a fair amount of code from fs-io.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert bch2_fpunch to bch2_extent_update()Kent Overstreet
As before - we're moving non Linux specific code out of fs-io.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out bchfs_extent_update()Kent Overstreet
The next few patches are going to be more moving the logic around i_size/i_sectors updates to io.c, and better separating the Linux VFS specific code from core bcachefs code, to better support the fuse port. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill some dependencies on ei_inodeKent Overstreet
Moving bch2_extent_update() to io.c will be greatly simplified if we no longer have to keep ei_inode.bi_size/bi_sectors up to date. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check if extending inode differentlyKent Overstreet
In bch2_extent_update(), we have to update the inode if i_size is changing (the file is being extend) or if i_sectors is changing, but we want to avoid touching the inode if it's not necessary. Change sum_sector_overwrites() to also check if there's already data above where we're writing to - this means we're definitely not extending the file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a lock to bch_page_stateKent Overstreet
We can't use the page lock to protect it, because on writeback IO error we need to access the page state before calling end_page_writeback() and the page lock semantics are completely insane so that deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_extent_atomic_end() now traverses iterKent Overstreet
This fixes a bug in io.c bch2_write_index_default() - it was missing the traverse call, but bch2_extent_atomic_end returns an error now and can just call it itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_inode_peek()/bch2_inode_write()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix __bch2_buffered_write() returning -ENOMEMKent Overstreet
When grab_cache_page_write_begin() fails but we did pin some pages, we shouldn't return -ENOMEM, we should do a partial write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rework btree iterator lifetimesKent Overstreet
The btree_trans struct needs to memoize/cache btree iterators, so that on transaction restart we don't have to completely redo btree lookups, and so that we can do them all at once in the correct order when the transaction had to restart to avoid a deadlock. This switches the btree iterator lookups to work based on iterator position, instead of trying to match them up based on the stack trace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill deferred btree updatesKent Overstreet
Will be replaced by cached btree iterators Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for partial buffered writesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BTREE_ITER_SLOTS isn't a type of btree iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Trivial cleanupKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert a BUG_ON() to a warningKent Overstreet
We shouldn't ever be writing past i_size - but, apparently there's still a bug to track down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Handle bio_iov_iter_get_pages() returning unaligned bioKent Overstreet
If the user buffer isn't aligned to the filesystem block size, on a large enough IO - where it won't fit into a single bio - bio_iov_iter_get_pages() won't necessarily return a bio with the proper alignment. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add support for FALLOC_FL_INSERT_RANGEKent Overstreet
Somewhat tricky and ugly, because iterating over extents backwards is a pain. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't write past eofKent Overstreet
When converting from PAGE_SIZE to block_size, the .mkwrite path was missed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved bch2_fcollapse()Kent Overstreet
Move extents instead of copying them - this way, we can iterate over only live extents, not the entire keyspace. Also, this means we can mostly skip running triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inline some fast pathsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check alignment in write pathKent Overstreet
Also - fix alignment in bch2_set_page_dirty() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: ReflinkKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor bch2_extent_trim_atomic() for reflinkKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>