path: root/fs
Age | Commit message | Author
2024-05-08 | bcachefs: bch2_trans_unlock() must always be followed by relock() or begin() | Kent Overstreet
We're about to add new asserts for btree_trans locking consistency, and part of that requires that we aren't using the btree_trans while it's unlocked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
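A toy model may make the rule in the subject line concrete: once the transaction is unlocked, it must not be used again until relock() or begin() has run. This is a hypothetical sketch (every `toy_` name is invented for illustration, not bcachefs code); it reports misuse as an error return rather than an assert so the rule can be tested, and it ignores the fact that a real relock can fail and force a transaction restart.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the rule "after unlock, call relock() or begin() before
 * using the transaction again". Every toy_ name is hypothetical.
 */
struct toy_trans {
	bool	locked;		/* true while the transaction may be used */
	int	nr_ops;		/* operations that ran while locked */
};

static void toy_trans_begin(struct toy_trans *t)
{
	t->locked = true;
}

static void toy_trans_unlock(struct toy_trans *t)
{
	t->locked = false;
}

/* A real relock can fail and force a restart; this toy always succeeds. */
static bool toy_trans_relock(struct toy_trans *t)
{
	t->locked = true;
	return true;
}

/*
 * Stand-in for any btree operation. Instead of asserting, it reports
 * use of an unlocked transaction by returning -1.
 */
static int toy_trans_op(struct toy_trans *t)
{
	if (!t->locked)
		return -1;
	t->nr_ops++;
	return 0;
}
```

A correct sequence is begin, op, unlock, relock, op; calling an op between unlock and relock is the kind of use the new asserts are meant to rule out.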
2024-05-08 | bcachefs: Use bch2_btree_path_upgrade() in key cache traverse | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_btree_path_upgrade() checks nodes_locked, not uptodate | Kent Overstreet
In the key cache fill path, we use path_upgrade() on a path that isn't uptodate yet but should be locked.
This change makes bch2_btree_path_upgrade() slightly looser so we can use it in key cache upgrade, instead of the __ version.
Also, make the related assert (that path->uptodate implies nodes_locked) slightly clearer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: maintain lock invariants in btree_iter_next_node() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_trans_commit_flags_to_text() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: prefer drop_locks_do() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: get_unlocked_mut_path -> bch2_path_get_unlocked_mut | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: fix typo in reference to BCACHEFS_DEBUG | Lukas Bulwahn
Commit ec9cc18fc2e6 ("bcachefs: Add checks for invalid snapshot IDs") intends to check the sanity of a snapshot and panic when BCACHEFS_DEBUG is set, but that conditional has a typo. Fix the typo to refer to the actual existing Kconfig symbol.
This was found with ./scripts/checkkconfigsymbols.py.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: chardev: make bch_chardev_class constant | Ricardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the bch_chardev_class structure to be declared at build time, placing it into read-only memory, instead of having to be dynamically allocated at boot time.
Also, correctly clean up after failing paths in bch2_chardev_init().
Cc: Hongbo Li <lihongbo22@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: member helper cleanups | Kent Overstreet
Some renaming for better consistency:
- bch2_member_exists -> bch2_member_alive
- bch2_dev_exists -> bch2_member_exists
- bch2_dev_exists2 -> bch2_dev_exists
- bch_dev_locked -> bch2_dev_locked
- bch_dev_bkey_exists -> bch2_dev_bkey_exists
New helper: bch2_dev_safe
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bucket_valid() | Kent Overstreet
Cut out a branch compared to doing it the obvious way.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
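The commit message doesn't show the new code. One common way to remove a branch from a range check is to fold both bounds into a single unsigned comparison; the sketch below assumes that is the kind of change meant, and its `toy_dev` fields are hypothetical stand-ins for the device geometry, not the real `struct bch_dev` layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device geometry: valid buckets are [first_bucket, nbuckets). */
struct toy_dev {
	uint64_t first_bucket;
	uint64_t nbuckets;
};

/* The obvious way: two comparisons joined by &&, usually two branches. */
static inline bool bucket_valid_obvious(const struct toy_dev *ca, uint64_t b)
{
	return b >= ca->first_bucket && b < ca->nbuckets;
}

/*
 * One comparison: if b < first_bucket, the unsigned subtraction wraps
 * around to a huge value, so a single "< (nbuckets - first_bucket)" test
 * rejects both out-of-range cases. Assumes first_bucket <= nbuckets.
 */
static inline bool bucket_valid_onecmp(const struct toy_dev *ca, uint64_t b)
{
	return b - ca->first_bucket < ca->nbuckets - ca->first_bucket;
}
```

Both functions accept exactly the same buckets; the second relies on well-defined unsigned wraparound in C rather than a separate lower-bound test.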
2024-05-08 | bcachefs: bch2_trans_relock_fail() - factor out slowpath | Kent Overstreet
Factor out the slowpath into a separate helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_dir_emit() - drop_locks_do() conversion | Kent Overstreet
Add a new helper that calls dir_emit() and updates ctx->pos on success; this lets us convert bch2_readdir() to drop_locks_do().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
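A minimal userspace sketch of the helper described above. It assumes a simplified, hypothetical `toy_dir_context` in place of the kernel's `struct dir_context`, and a hypothetical rule that `pos` advances to `entry_pos + 1`. What it illustrates is that `ctx->pos` only moves when the caller's actor accepted the entry, so an interrupted readdir resumes at the first entry that was not delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct dir_context: actor() returns false when full. */
struct toy_dir_context {
	bool (*actor)(struct toy_dir_context *ctx, const char *name, size_t len,
		      long long pos, unsigned long long ino);
	long long pos;
};

static bool toy_dir_emit(struct toy_dir_context *ctx, const char *name,
			 unsigned long long ino)
{
	assert(ctx->actor);
	return ctx->actor(ctx, name, strlen(name), ctx->pos, ino);
}

/* Hypothetical helper: emit one entry; advance ctx->pos only if accepted. */
static bool toy_dir_emit_and_advance(struct toy_dir_context *ctx, const char *name,
				     unsigned long long ino, long long entry_pos)
{
	bool ret = toy_dir_emit(ctx, name, ino);

	if (ret)
		ctx->pos = entry_pos + 1;
	return ret;
}

/* Example caller buffer that fills up after `limit` entries. */
struct toy_filler {
	struct toy_dir_context ctx;	/* must be first: the actor casts back */
	int nr, limit;
};

static bool toy_fill_actor(struct toy_dir_context *ctx, const char *name, size_t len,
			   long long pos, unsigned long long ino)
{
	struct toy_filler *f = (struct toy_filler *) ctx;

	(void) name; (void) len; (void) pos; (void) ino;
	if (f->nr >= f->limit)
		return false;
	f->nr++;
	return true;
}
```

Keeping the position update inside the helper means every caller gets the same "advance only on success" behaviour, which is what makes it safe to drop and retake locks between entries.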
2024-05-08 | bcachefs: bch2_btree_insert_trans() no longer specifies BTREE_ITER_cached | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: iter/update/trigger/str_hash flag cleanup | Kent Overstreet
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later.
These flags are all for a specific iter/key/update context, so it makes sense to group them together; iter/update/trigger flags were already given distinct bits, and this cleans up and unifies that handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
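The x-macro pattern mentioned above keeps one list of flag names and expands it several times: once for bit numbers, once for masks, and once for the strings a to_text() function prints. A sketch with hypothetical flag names (not the actual bcachefs flag set):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag list; each x(name) entry is expanded several times below. */
#define TOY_ITER_FLAGS()	\
	x(slots)		\
	x(intent)		\
	x(prefetch)		\
	x(cached)		\
	x(nopreserve)

/* First expansion: bit numbers. */
enum toy_iter_flags_bits {
#define x(n)	TOY_ITER_##n##_bit,
	TOY_ITER_FLAGS()
#undef x
	TOY_ITER_NR_BITS
};

/* Second expansion: the flag masks. */
enum toy_iter_flags {
#define x(n)	TOY_ITER_##n = 1U << TOY_ITER_##n##_bit,
	TOY_ITER_FLAGS()
#undef x
};

/* Third expansion: names, automatically in the same order as the bits. */
static const char * const toy_iter_flag_names[] = {
#define x(n)	#n,
	TOY_ITER_FLAGS()
#undef x
};

/* Write the names of set flags, comma-separated, into buf (truncating if needed). */
static void toy_iter_flags_to_text(char *buf, size_t size, unsigned flags)
{
	size_t pos = 0;

	if (size)
		buf[0] = '\0';
	for (unsigned i = 0; i < TOY_ITER_NR_BITS; i++) {
		if (!(flags & (1U << i)))
			continue;
		int n = snprintf(buf + pos, size - pos, "%s%s",
				 pos ? "," : "", toy_iter_flag_names[i]);
		if (n < 0 || (size_t) n >= size - pos)
			break;
		pos += n;
	}
}
```

Adding a flag means adding one x() line; the bit number, the mask and the printed name cannot drift out of sync.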
2024-05-08 | bcachefs: __BTREE_ITER_ALL_SNAPSHOTS -> BTREE_ITER_SNAPSHOT_FIELD | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: mark_superblock cleanup | Kent Overstreet
Consolidate mark_superblock() and trans_mark_superblock(), like we did with the other trigger paths.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: gc_btree_init_recurse() uses gc_mark_node() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: move root node topo checks to node_check_topology() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: move topology repair kick to gc_btrees() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: kill metadata only gc | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: Finish converting reconstruct_alloc to errors_silent | Kent Overstreet
With errors_silent, reconstruct_alloc no longer requires fsck and fix_errors to work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_gc() is now private to btree_gc.c | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: for_each_btree_key_continue() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: kill for_each_btree_key_old() | Kent Overstreet
Dead code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: Optimize eytzinger0_sort() with bottom-up heapsort | Kuan-Wei Chiu
This optimization reduces the average number of comparisons required from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is sufficiently large, it results in approximately 50% fewer comparisons.
Currently, eytzinger0_sort employs the textbook version of heapsort, where during the heapify process, each level requires two comparisons to determine the maximum among three elements. In contrast, the bottom-up heapsort, during heapify, only compares two children at each level until reaching a leaf node. Then, it backtracks from the leaf node to find the correct position. Since heapify typically continues until very close to the leaf node, the standard heapify requires about 2*log2(n) comparisons, while the bottom-up variant only needs log2(n) comparisons.
The experimental data presented below is based on an array generated by get_random_u32().
|   N    | comparisons (old) | comparisons (new) | time (old) | time (new) |
|--------|-------------------|-------------------|------------|------------|
|  10000 |            235381 |            136615 |   25545 us |   20366 us |
|  20000 |            510694 |            293425 |   31336 us |   18312 us |
|  30000 |            800384 |            457412 |   35042 us |   27386 us |
|  40000 |           1101617 |            626831 |   48779 us |   38253 us |
|  50000 |           1409762 |            799637 |   62238 us |   46950 us |
|  60000 |           1721191 |            974521 |   75588 us |   58367 us |
|  70000 |           2038536 |           1152171 |   90823 us |   68778 us |
|  80000 |           2362958 |           1333472 |  104165 us |   78625 us |
|  90000 |           2690900 |           1516065 |  116111 us |   89573 us |
| 100000 |           3019413 |           1699879 |  133638 us |  100998 us |
Refs:
BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small)
Ingo Wegener
Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
https://doi.org/10.1016/0304-3975(93)90364-Y
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
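A sketch of the two sift-down strategies described above, on a plain `int` array with a global comparison counter. It is not eytzinger0_sort() itself (which sorts an array stored in Eytzinger layout), and all names are hypothetical; it only illustrates why the bottom-up variant spends about one comparison per level on the way down and then backtracks from the leaf.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long nr_cmp;	/* comparisons performed so far */

static int cmp_int(int x, int y)
{
	nr_cmp++;
	return (x > y) - (x < y);
}

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

static size_t parent(size_t i) { return (i - 1) / 2; }

/* Textbook sift-down: up to two comparisons per level. */
static void sift_down_textbook(int *a, size_t i, size_t n)
{
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, max = i;

		if (l < n && cmp_int(a[l], a[max]) > 0)
			max = l;
		if (r < n && cmp_int(a[r], a[max]) > 0)
			max = r;
		if (max == i)
			return;
		swap_int(&a[i], &a[max]);
		i = max;
	}
}

/*
 * Bottom-up sift-down: walk to a leaf along the larger child (one
 * comparison per level), then climb back to where a[i] belongs.
 */
static void sift_down_bottom_up(int *a, size_t i, size_t n)
{
	size_t b = i, c, d;

	while ((d = (c = 2 * b + 1) + 1) < n)
		b = cmp_int(a[c], a[d]) >= 0 ? c : d;
	if (d == n)		/* last leaf has no sibling */
		b = c;

	/* Backtrack while the sifted element is >= the value on the path. */
	while (b != i && cmp_int(a[i], a[b]) >= 0)
		b = parent(b);

	c = b;			/* final position of the element from a[i] */
	while (b != i) {	/* shift path values up one level, drop a[i] at c */
		b = parent(b);
		swap_int(&a[b], &a[c]);
	}
}

/* Heapsort (ascending) parameterized by the sift-down strategy. */
static void heapsort_with(int *a, size_t n, void (*sift)(int *, size_t, size_t))
{
	if (n < 2)
		return;
	for (size_t i = n / 2; i-- > 0;)
		sift(a, i, n);
	for (size_t end = n - 1; end > 0; end--) {
		swap_int(&a[0], &a[end]);
		sift(a, 0, end);
	}
}
```

Running both variants on the same random input and comparing `nr_cmp` afterwards should show the bottom-up variant doing fewer comparisons; the exact counts depend on the input.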
2024-05-08 | bcachefs: When traversing to interior nodes, propagate result to paths to same leaf node | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: Don't read journal just for fsck | Kent Overstreet
Reading the journal can take a decent amount of time compared to the rest of fsck; let's only read it when required.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: allow for custom action in fsck error messages | Kent Overstreet
Be more explicit to the user about what we're doing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: New assertion for writing to the journal after shutdown | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_btree_path_to_text() | Kent Overstreet
Long form version of bch2_btree_path_to_text(), useful in error messages and tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: add btree_node_merging_disabled debug param | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_hash_lookup() now returns bkey_s_c | Kent Overstreet
Small cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_journal_keys_dump() | Kent Overstreet
Debug helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_btree_node_header_to_text() | Kent Overstreet
Better btree node read path error messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: prt_printf() now respects \r\n\t | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: printbufs: prt_printf() now handles \t\r\n | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: printbuf improvements | Kent Overstreet
- fix assorted (harmless) off-by-one errors
- we were inconsistent on whether out->pos stays <= out->size on overflow; now it does, and printbuf.overflow exists to indicate if a printbuf has overflowed
- factor out printbuf_advance_pos()
- printbuf_nul_terminate_reserved(); use this to reduce the number of printbuf_make_room() calls
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
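A minimal fixed-size sketch of the overflow behaviour described above: output is truncated rather than overrunning the buffer, `pos` is clamped, and an `overflow` flag records that truncation happened. This is a simplification under stated assumptions: every `toy_` name is hypothetical, the real printbuf can also grow its buffer, and this sketch clamps `pos` to `size - 1` so the terminating NUL always fits.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct toy_printbuf {
	char		*buf;
	unsigned	size;		/* total bytes, including room for the NUL */
	unsigned	pos;		/* bytes of text written; always <= size - 1 */
	bool		overflow;	/* set once any output had to be truncated */
};

static void toy_printbuf_init(struct toy_printbuf *out, char *buf, unsigned size)
{
	assert(size > 0);
	out->buf = buf;
	out->size = size;
	out->pos = 0;
	out->overflow = false;
	buf[0] = '\0';
}

static void toy_prt_printf(struct toy_printbuf *out, const char *fmt, ...)
{
	unsigned space = out->size - out->pos;	/* >= 1: room for at least the NUL */
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(out->buf + out->pos, space, fmt, args);
	va_end(args);

	if (len < 0)
		return;
	if ((unsigned) len >= space) {
		/* vsnprintf wrote space - 1 chars plus NUL; clamp instead of overrunning */
		out->overflow = true;
		out->pos = out->size - 1;
	} else {
		out->pos += len;
	}
}
```

Callers can keep printing after an overflow without checking return values; they only need to test `overflow` once at the end.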
2024-05-08 | bcachefs: Run upgrade/downgrade even in -o nochanges mode | Kent Overstreet
We need to be able to test these paths in dry run mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: Better write_super() error messages | Kent Overstreet
When a superblock write is silently dropped, or the superblock has been modified by another process, we need to know which device it was.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: Fix xattr_to_text() unsafety | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | bcachefs: bch2_bkey_format_field_overflows() | Kent Overstreet
Fix another shift-by-64 by factoring out a common helper for bch2_bkey_format_invalid() and bformat_needs_redo() (where it was already fixed).
Reported-by: syzbot+9833a1d29d4a44361e2c@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
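The commit message doesn't include the helper, so the following is only a hedged sketch of the general idea: build bit masks without ever shifting a 64-bit value by 64 (undefined behaviour in C), then check whether a packed field's largest value, offset plus mask, still fits in the unpacked field. All names are hypothetical; this is not the bcachefs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask of the low `bits` bits, for bits in [0, 64]. The naive
 * (1ULL << bits) - 1 is undefined for bits == 64.
 */
static inline uint64_t low_bits_mask(unsigned bits)
{
	assert(bits <= 64);
	return bits ? ~0ULL >> (64 - bits) : 0;
}

/*
 * Hypothetical check: a packed field stores (value - offset) in packed_bits
 * bits and must unpack into an unpacked_bits-wide field. Returns true if
 * the largest representable value, offset + mask, would not fit.
 */
static bool field_overflows(unsigned packed_bits, unsigned unpacked_bits,
			    uint64_t offset)
{
	uint64_t packed_max = low_bits_mask(packed_bits);
	uint64_t unpacked_max = low_bits_mask(unpacked_bits);

	if (packed_bits > unpacked_bits || offset > unpacked_max)
		return true;
	/* offset + packed_max <= unpacked_max, written to avoid wraparound */
	return packed_max > unpacked_max - offset;
}
```

Sharing one helper between the validation and the "needs redo" paths means the 64-bit edge case only has to be handled correctly in one place.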
2024-05-08 | bcachefs: Fix needs_whiteout BUG_ON() in bkey_sort() | Kent Overstreet
Btree nodes are log structured; thus, we need to emit whiteouts when we're deleting a key that's been written out to disk. k->needs_whiteout tracks whether a key will need a whiteout when it's deleted, and this requires some careful handling; e.g. the key we're deleting may not have been written out to disk, but it may have overwritten a key that was; thus we need to carry this flag around on overwrites.
Invariants: There may be multiple keys for the same position in a given node (because of overwrites), but only one of them will be a live (non-deleted) key, and only one key for a given position will have the needs_whiteout flag set.
Additionally, we don't want to carry around whiteouts that need to be written in the main searchable part of a btree node: btree_iter_peek() would have to skip past them, and this can lead to O(n^2) behavior when doing sequential deletions (e.g. inode rm/truncate). So there's a separate region in the btree node buffer for unwritten whiteouts; these are merge sorted with the rest of the keys we're writing in the btree node write path.
The unwritten whiteouts area was a later optimization that bch2_sort_keys() didn't take into account; it means that we never have deleted keys with needs_whiteout set in the main searchable part of a btree node. That means we can simplify and optimize some sort paths, and eliminate an assertion that syzbot was hitting:
- Unless we're in the btree node write path, it's always ok to drop whiteouts when sorting
- When sorting for a btree node write, we drop the whiteout if it's not from the unwritten whiteouts area, or if it's overwritten by a real key at the same position.
This completely eliminates some tricky logic for propagating the needs_whiteout flag: syzbot was able to hit the assertion that checked that there shouldn't be more than one key at the same pos with needs_whiteout set, likely due to the combination of setting needs_whiteout on all written keys (they need whiteouts if overwritten), not always dropping unneeded whiteouts, and the tricky logic in the sort path for preserving needs_whiteout that wasn't really needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
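The two sorting rules listed above can be written as a small predicate. This sketch is hypothetical: `toy_key` reduces a key to the three properties the rules mention, and the sketch ignores everything else the real sort does (such as dropping older live keys that have been overwritten).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of a key as seen by the sort. */
struct toy_key {
	bool deleted;			/* the key is a whiteout */
	bool from_unwritten_whiteouts;	/* came from the unwritten-whiteouts area */
	bool overwritten_by_live_key;	/* a live key exists at the same position */
};

/* Returns true if this whiteout should be dropped from the sort output. */
static bool toy_should_drop(const struct toy_key *k, bool for_node_write)
{
	if (!k->deleted)
		return false;	/* only whiteouts are modelled here */
	if (!for_node_write)
		return true;	/* outside the write path, always ok to drop */
	/* write path: keep only unwritten whiteouts not shadowed by a live key */
	return !k->from_unwritten_whiteouts || k->overwritten_by_live_key;
}
```

Because the decision depends only on where the whiteout came from and whether a live key shadows it, the sort no longer needs to propagate needs_whiteout between keys.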
2024-05-08 | bcachefs: Fix sb_clean_validate endianness conversion | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 | Merge tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd | Linus Torvalds
Pull smb server fixes from Steve French:
"Five ksmbd server fixes, all also for stable
 - Three fixes related to SMB3 leases (fixes two xfstests, and a locking issue)
 - Uninitialized variable fix
 - Socket creation fix when bindv6only is set"
* tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: do not grant v2 lease if parent lease key and epoch are not set
  ksmbd: use rwsem instead of rwlock for lease break
  ksmbd: avoid to send duplicate lease break notifications
  ksmbd: off ipv6only for both ipv4/ipv6 binding
  ksmbd: fix uninitialized symbol 'share' in smb2_tree_connect()
2024-05-08 | Merge tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds
Pull fuse fixes from Miklos Szeredi:
"Two one-liner fixes for issues introduced in -rc1"
* tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  virtiofs: include a newline in sysfs tag
  fuse: verify zero padding in fuse_backing_map
2024-05-08 | Merge tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat | Linus Torvalds
Pull exfat fixes from Namjae Jeon:
 - Fix xfstests generic/013 test failure with dirsync mount option
 - Initialize the reserved fields of deleted file and stream extension dentries to zero
* tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: zero the reserved fields of file and stream extension dentries
  exfat: fix timing of synchronizing bitmap and inode
2024-05-08 | fscrypt: try to avoid refing parent dentry in fscrypt_file_open | Mateusz Guzik
Merely checking whether the directory is encrypted happens for every open when using ext4, and currently involves refing and unrefing the parent, costing 2 atomics and serializing opens of different files.
The most common case of encryption not being used can be checked for with RCU instead.
Sample result from open1_processes -t 20 ("Separate file open/close") from will-it-scale on Sapphire Rapids (ops/s):
before: 12539898
after:  25575494 (+103%)
v2:
- add a comment justifying rcu usage, submitted by Eric Biggers
- whack spurious IS_ENCRYPTED check from the refed case
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240508081400.422212-1-mjguzik@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-05-08 | Merge tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs | Linus Torvalds
Pull bcachefs fixes from Kent Overstreet:
 - Various syzbot fixes; mainly small gaps in validation
 - Fix an integer overflow in fiemap() which was preventing filefrag from returning the full list of extents
 - Fix a refcounting bug on the device refcount, turned up by new assertions in the development branch
 - Fix a device removal/readd bug; write_super() was repeatedly dropping and retaking bch_dev->io_ref references
* tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async()
  bcachefs: Fix race in bch2_write_super()
  bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX
  bcachefs: Add missing skcipher_request_set_callback() call
  bcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode()
  bcachefs: Fix shift-by-64 in bformat_needs_redo()
  bcachefs: Guard against unknown k.k->type in __bkey_invalid()
  bcachefs: Add missing validation for superblock section clean
  bcachefs: Fix assert in bch2_alloc_v4_invalid()
  bcachefs: fix overflow in fiemap
  bcachefs: Add a better limit for maximum number of buckets
  bcachefs: Fix lifetime issue in device iterator helpers
  bcachefs: Fix bch2_dev_lookup() refcounting
  bcachefs: Initialize bch_write_op->failed in inline data path
  bcachefs: Fix refcount put in sb_field_resize error path
  bcachefs: Inodes need extra padding for varint_decode_fast()
  bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()
  bcachefs: bucket_pos_to_bp_noerror()
  bcachefs: don't free error pointers
  bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()
2024-05-08 | fs/coredump: Enable dynamic configuration of max file note size | Allen Pais
Introduce the capability to dynamically configure the maximum file note size for ELF core dumps via sysctl.
Why is this being done?
We have observed that during a crash when there are more than 65k mmaps in memory, the existing fixed limit on the size of the ELF notes section becomes a bottleneck. The notes section quickly reaches its capacity, leading to incomplete memory segment information in the resulting coredump. This truncation compromises the utility of the coredumps, as crucial information about the memory state at the time of the crash might be omitted.
This enhancement replaces the previous static limit of 4MB with a tunable (default 4MB, ceiling 16MB), allowing system administrators to adjust the size based on system-specific requirements or constraints.
E.g.:
$ sysctl -a | grep core_file_note_size_limit
kernel.core_file_note_size_limit = 4194304
$ sysctl -n kernel.core_file_note_size_limit
4194304
$ echo 519304 > /proc/sys/kernel/core_file_note_size_limit
$ sysctl -n kernel.core_file_note_size_limit
519304
Attempting to write beyond the ceiling value of 16MB:
$ echo 17194304 > /proc/sys/kernel/core_file_note_size_limit
bash: echo: write error: Invalid argument
Signed-off-by: Vijay Nag <nagvijay@microsoft.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240506193700.7884-1-apais@linux.microsoft.com
Signed-off-by: Kees Cook <keescook@chromium.org>