summaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Add a kmem_cache for btree_key_cache objectsKent Overstreet
We allocate a lot of these, and we're seeing sporading OOMs - this will help with tracking those down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Be more precise with journal error reportingKent Overstreet
We were incorrectly detecting a journal deadlock - the journal filling up - when only the journal pin fifo had filled up; if the journal pin fifo is full that just means we need to wait on reclaim. This plumbs through better error reporting so we can better discriminate in the journal_res_get path what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add btree cache stats to sysfsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an ioctl for resizing journal on a deviceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add more debug checksKent Overstreet
tracking down a bug where we see a btree node pointer in the wrong node Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Dump journal state when the journal deadlocksKent Overstreet
Currently tracking down one of these bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Dont' use percpu btree_iter buf in userspaceKent Overstreet
bcachefs-tools doesn't have a real percpu (per thread) implementation yet Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Set preallocated transaction mem to avoid restartsKent Overstreet
this will reduce transaction restarts, from observation of tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert tracepoints to use %ps, not %pfKent Overstreet
Symbol decoding was changed from %pf to %ps Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix journal entry repair codeKent Overstreet
When we detect bad keys in the journal that have to be dropped, the flow control was wrong - we ended up not checking the next key in that entry. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a shrinker for the btree key cacheKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Take a SRCU lock in btree transactionsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check for errors from register_shrinker()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Assorted journal refactoringKent Overstreet
Improved the way we track various state by adding j->err_seq, which records the first journal sequence number that encountered an error being written, and j->last_empty_seq, which records the most recent journal entry that was completely empty. Also, use the low bits of the journal sequence number to index the corresponding journal_buf. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Delete dead journalling codeKent Overstreet
Usage of the journal has gotten somewhat simpler over time - neat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve journal error messagesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Be more careful in bch2_bkey_to_text()Kent Overstreet
This is used to print keys that failed bch2_bkey_invalid(), so be more careful with k->type. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inode delete doesn't need to flush key cache anymoreKent Overstreet
Inode create checks to make sure the slot doesn't exist in the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a btree transaction iter overflowKent Overstreet
extent_replay_key dates from before putting iterators was required - fixed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a 64 bit divideKent Overstreet
this fixes builds on 32 bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve journal entry validate codeKent Overstreet
Previously, the journal entry read code was changed so that if we got a journal entry that failed validation, we'd try to use it, preferring to use a good version from another device if available. But this left a bug where if an earlier validation check (say, checksum) failed, the later checks (for last_seq) wouldn't run and we'd end up using a journal entry with a garbage last_seq field. This fixes that so that the later validation checks run and if necessary change those fields to something sensible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Deadlock prevention for ei_pagecache_lockKent Overstreet
In the dio write path, when get_user_pages() invokes the fault handler we have a recursive locking situation - we have to handle the lock ordering ourselves or we have a deadlock: this patch addresses that by checking for locking ordering violations and doing the unlock/relock dance if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Hack around bch2_varint_decode invalid readsKent Overstreet
bch2_varint_decode can do reads up to 7 bytes past the end ptr, for the sake of performance - these extra bytes are always masked off. This won't be a problem in practice if we make sure to burn 8 bytes in any buffer that has bkeys in it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix missing memalloc_nofs_restore()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree key cache shutdownKent Overstreet
On emergency shutdown, we might still have dirty keys in the btree key cache that need to be cleaned up properly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add accounting for dirty btree nodes/keysKent Overstreet
This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree iterator leakKent Overstreet
this fixes an occasonial btree transaction iterators overflow. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inline make_bfloat() into __build_ro_aux_tree()Kent Overstreet
This is a fast path - also, lift out the checks/init for min/max key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: use a radix tree for inum bitmap in fsckKent Overstreet
The change to use the cpu nr for the high bits of new inode numbers means that inode numbers are very space - we see -ENOMEM during fsck without this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New varintsKent Overstreet
Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix build warning when CONFIG_BCACHEFS_DEBUG=nKent Overstreet
this function is only used by debug code, but we'd like to always build it so we know that it does build. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop typechecking from bkey_cmp_packed()Kent Overstreet
This only did anything in two places, and those can just be replaced wiht bkey_cmp_left_packed()). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More inlinining in the btree key cache codeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix spurious transaction restartsKent Overstreet
The checks for lock ordering violations weren't quite right. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a single slot percpu buf for btree itersKent Overstreet
Allocating our array of btree iters is a big enough allocation that it hits the buddy allocator, and we're seeing lots of lock contention. Sticking a single element buffer in front of it should help. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use attach_page_private and detach_page_privateMatthew Wilcox (Oracle)
These recently added helpers simplify the code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Remove page_state_init_for_readMatthew Wilcox (Oracle)
This is dead code; delete the function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Build fixes for 32bit x86Kent Overstreet
PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying... also switch to atomic64_cmpxchg instead of cmpxchg() for journal_seq_copy, as atomic64_cmpxchg has a fallback that uses spinlocks for when it's not supported. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improved inode create optimizationKent Overstreet
This shards new inodes into different btree nodes by using the processor ID for the high bits of the new inode number. Much faster than the previous inode create optimization - this also helps with sharding in the other btrees that index by inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Report inode counts via statfsKent Overstreet
Took awhile to figure out exactly what statfs wanted... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: add const annotations to bset.cKent Overstreet
perhaps a bit silly, but some debug assertions we want to add need const propagated a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't embed btree iters in btree_transKent Overstreet
These haven't been in used since reallocing iterators has been disabled, and saves us a lot of stack if we get rid of it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out debug_check_btree_accountingKent Overstreet
This check is very expensive Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop sysfs interface to debug parametersKent Overstreet
It's not used much anymore, the module paramter interface is better. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Minor journal reclaim improvementKent Overstreet
With the btree key cache code, journal reclaim now has a lot more work to do. It could be the case that after journal reclaim has finished one iteration there's already more work to do, so put it in a loop to check for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inode create optimizationKent Overstreet
On workloads that do a lot of multithreaded creates all at once, lock contention on the inodes btree turns out to still be an issue. This patch adds a small buffer of inode numbers that are known to be free, so that we can avoid touching the btree on every create. Also, this changes inode creates to update via the btree key cache for the initial create. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve check for when bios are physically contiguousKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix spurious transaction restartsKent Overstreet
The check for whether locking a btree node would deadlock was wrong - we have to check that interior nodes are locked before descendents, but this check was wrong when consider cached vs. non cached iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve tracing for transaction restartsKent Overstreet
We have a bug where we can get stuck with a process spinning in transaction restarts - need more information. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix stack corruptionKent Overstreet
A bkey_on_stack_realloc() call was in the wrong place, and broken for indirect extents Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>