summaryrefslogtreecommitdiff
path: root/fs/bcachefs/fs-io.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Fix i_sectors_leak in bch2_truncate_pageKent Overstreet
When bch2_truncate_page() discards dirty sectors in the page cache, we need to account for that - we don't need to account for allocated sectors because that'll be done by the bch2_fpunch() call when it updates the btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix an i_sectors accounting bugKent Overstreet
We weren't checking for errors before calling i_sectors_acct() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't check for -ENOSPC in page writebackKent Overstreet
If at all possible we'd prefer to not fail page writeback unless the filesystem has been shutdown; allowing errors in page writeback means things we'd like to assert about i_size consistency between the VFS and the btree go out the window. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fallocate fixesKent Overstreet
- fpunch wasn't always correctly updating i_size - when we drop buffered writes that were extending a file, we become responsible for writing i_size. - fzero was sometimes zeroing out more data that it should have - block_start and block_end were being rounded in the wrong directions Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Switch fsync to use bi_journal_seqKent Overstreet
Now that we're recording in each inode the journal sequence number of the most recent update, fsync becomes a lot simpler and we can delete all the plumbing for ei_journal_seq. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix restart handling in for_each_btree_key()Kent Overstreet
Code that uses for_each_btree_key often wants transaction restarts to be handled locally and not returned. Originally, we wouldn't return transaction restarts if there was a single iterator in the transaction - the reasoning being if there weren't other iterators being invalidated, and the current iterator was being advanced/retraversed, there weren't any locks or iterators we were required to preserve. But with the btree_path conversion that approach doesn't work anymore - even when we're using for_each_btree_key() with a single iterator there will still be two paths in the transaction, since we now always preserve the path at the pos the iterator was initialized at - the reason being that on restart we often restart from the same place. And it turns out there's now a lot of for_each_btree_key() uses that _do not_ want transaction restarts handled locally, and should be returning them. This patch splits out for_each_btree_key_norestart() and for_each_btree_key_continue_norestart(), and converts existing users as appropriate. for_each_btree_key(), for_each_btree_key_continue(), and for_each_btree_node() now handle transaction restarts themselves by calling bch2_trans_begin() when necessary - and the old hack to not return transaction restarts when there's a single path in the transaction has been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_trans_exit() no longer returns errorsKent Overstreet
Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert io paths for snapshotsKent Overstreet
This plumbs around the subvolume ID as was done previously for other filesystem code, but now for the IO paths - the control flow in the IO paths is trickier so the changes in this patch are more involved. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Plumb through subvolume idKent Overstreet
To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_pathKent Overstreet
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Reduce iter->trans usageKent Overstreet
Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix an unhandled transaction restartKent Overstreet
__bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may cause a transaction restart, which we don't return an error for because it doesn't prevent us from making forward progress on the read we're submitting. Instead, change __bch2_read() and bchfs_read() to check for transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use bch2_trans_begin() more consistentlyKent Overstreet
Upcoming patch will require that a transaction restart is always immediately followed by bch2_trans_begin(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Always check for transaction restartsKent Overstreet
On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Use bch2_inode_find_by_inum() in truncateKent Overstreet
This is needed for snapshots because we need to start handling lock restarts even when just calling bch2_inode_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a memory leak in the dio write pathKent Overstreet
There were some error paths where we were leaking page refs - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix truncate without a size changeDan Robertson
Do not attempt to shortcut a truncate when the given new size is the same as the current size. There may be blocks allocated to the file that extend beyond the i_size. The ctime and mtime should not be updated in this case. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix truncate with ATTR_MODEKent Overstreet
After the v5.12 rebase, we started oopsing when truncate was passed ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This refactors things so that truncate/extend finish by using bch2_setattr_nonsize(), which solves the problem. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve iter->should_be_lockedKent Overstreet
Adding iter->should_be_locked introduced a regression where it ended up not being set on the iterator passed to bch2_btree_update_start(), which is definitely not what we want. This patch requires it to be set when calling bch2_trans_update(), and adds various fixups to make that happen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix a memory leak in dio write pathKent Overstreet
Commit c42bca92be928ce7dece5fc04cf68d0e37ee6718 "bio: don't copy bvec for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_iovec at the incoming biovec, meaning if we already allocated one, it'll be leaked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Preallocate transaction memKent Overstreet
This helps avoid transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Don't use bch_write_op->cl for delivering completionsKent Overstreet
We already had op->end_io as an alternative mechanism to op->cl.parent for delivering write completions; this switches all code paths to using op->end_io. Two reasons: - op->end_io is more efficient, due to fewer atomic ops, this completes the conversion that was originally only done for the direct IO path. - We'll be restructing the write path to use a different mechanism for punting to process context, refactoring to not use op->cl will make that easier. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for buffered writes getting -ENOSPCKent Overstreet
Buffered writes may have to increase their disk reservation at btree update time, due to compression and erasure coding being unpredictable: O_DIRECT writes should be checking for -ENOSPC, but buffered writes have already been accepted and should not. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Make bch2_remap_range respect O_SYNCKent Overstreet
Caught by xfstest generic/628 Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Ratelimiting for writeback IOsKent Overstreet
Writeback throttling is a kernel config option and not always enabled. When it's not enabled we need a fallback, to avoid unbounded memory pinning and work item backlogs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure that fpunch updates inode timestampsKent Overstreet
Fixes xfstests generic/059 Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stackKent Overstreet
Upcoming patch is going to disallow multiple btree_trans on the stack. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Require all btree iterators to be freedKent Overstreet
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill reflink optionKent Overstreet
An option was added to control whether reflink support was on or off because for a long time, reflink + inline data extent support was missing - but that's since been fixed, so we can drop the option now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix read retry path for indirect extentsKent Overstreet
In the read path, for retry of indirect extents to work we need to differentiate between the location in the btree the read was for, vs. the location where we found the data. This patch adds that plumbing to bch_read_bio. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_btree_iter_peek_prev()Kent Overstreet
This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev() consistent with peek() and next(), w.r.t. iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix loopback in dio modeKent Overstreet
We had a deadlock on page_lock, because buffered reads signal completion by unlocking the page, but the dio read path normally dirties the pages it's reading to with set_page_dirty_lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix .splice_writeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Reduce/kill BKEY_PADDED useKent Overstreet
With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change when we allow overwritesKent Overstreet
Originally, we'd check for -ENOSPC when getting a disk reservation whenever the new extent took up more space on disk than the old extent. Erasure coding screwed this up, because with erasure coding writes are initially replicated, and then in the background the extra replicas are dropped when the stripe is created. This means that with erasure coding enabled, writes will always take up more space on disk than the data they're overwriting - but, according to posix, overwrites aren't supposed to return ENOSPC. So, in this patch we fudge things: if the new extent has more replicas than the _effective_ replicas of the old extent, or if the old extent is compressed and the new one isn't, we check for ENOSPC when getting the disk reservation - otherwise, we don't. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't write bucket IO time lazilyKent Overstreet
With the btree key cache code, we don't need to update the alloc btree lazily - and this will mean we can remove the bch2_alloc_write() call in the shutdown path. Future work: we really need to expend the bucket IO clocks from 16 to 64 bits, so that we don't have to rescale them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Flag inodes that had btree update errorsKent Overstreet
On write error, the vfs inode's i_size may be inconsistent with the btree inode's i_size - flag this so we don't have spurious assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve some IO error messagesKent Overstreet
it's useful to know whether an error was for a read or a write - this also standardizes error messages a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_get_iter() no longer returns errorsKent Overstreet
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for __readahead_batch getting partial batchKent Overstreet
We were incorrectly ignoring the return value of __readahead_batch, leading to a null ptr deref in __bch2_page_state_create(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Deadlock prevention for ei_pagecache_lockKent Overstreet
In the dio write path, when get_user_pages() invokes the fault handler we have a recursive locking situation - we have to handle the lock ordering ourselves or we have a deadlock: this patch addresses that by checking for locking ordering violations and doing the unlock/relock dance if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use attach_page_private and detach_page_privateMatthew Wilcox (Oracle)
These recently added helpers simplify the code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Remove page_state_init_for_readMatthew Wilcox (Oracle)
This is dead code; delete the function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix rare use after free in read pathKent Overstreet
If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent() reallocates the buffer, k in bch2_read - which we pointed at the bkey_on_stack buffer - will now point to a stale buffer. Whoops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix __bch2_truncate_page()Kent Overstreet
__bch2_truncate_page() will mark some of the blocks in a page as unallocated. But, if the page is mmapped (and writable), every block in the page needs to be marked dirty, else those blocks won't be written by __bch2_writepage(). The solution is to change those userspace mappings to RO, so that we force bch2_page_mkwrite() to be called again. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix short buffered writesKent Overstreet
In the buffered write path, we have to check for short writes that write to the full page, where the page wasn't UpToDate; when this happens, the page is partly garbage, so we have to zero it out and revert that part of the write. This check was wrong - we reverted total from copied, but didn't revert the iov_iter, probably also leading to corrupted writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't cap ios in dio write path at 2 MBKent Overstreet
It appears this was erronious, a different bug was responsible Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor dio write code to reinit bch_write_opKent Overstreet
This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT was set incorrectly, causing the completion to be delivered multiple times. oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an option to disable reflink supportKent Overstreet
Reflink might be buggy, so we're adding an option so users can help bisect what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>