Age | Commit message (Collapse) | Author |
|
Some renaming for better consistency
bch2_member_exists -> bch2_member_alive
bch2_dev_exists -> bch2_member_exists
bch2_dev_exsits2 -> bch2_dev_exists
bch_dev_locked -> bch2_dev_locked
bch_dev_bkey_exists -> bch2_dev_bkey_exists
new helper - bch2_dev_safe
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
cut out a branch from doing it the obvious way
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out slowpath into a separate helper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new helper that calls dir_emit() and updates ctx->pos on success;
this lets us convert bch2_readdir() to drop_locks_do().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.
These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Consolidate mark_superblock() and trans_mark_superblock(), like we did
with the other trigger paths.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
with errors_silent, reconstruct_alloc no longer requires fsck and
fix_errors to work
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Dead code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This optimization reduces the average number of comparisons required
from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is
sufficiently large, it results in approximately 50% fewer comparisons.
Currently, eytzinger0_sort employs the textbook version of heapsort,
where during the heapify process, each level requires two comparisons
to determine the maximum among three elements. In contrast, the
bottom-up heapsort, during heapify, only compares two children at each
level until reaching a leaf node. Then, it backtracks from the leaf
node to find the correct position. Since heapify typically continues
until very close to the leaf node, the standard heapify requires about
2*log2(n) comparisons, while the bottom-up variant only needs log2(n)
comparisons.
The experimental data presented below is based on an array generated
by get_random_u32().
| N | comparisons(old) | comparisons(new) | time(old) | time(new) |
|-------|------------------|------------------|-----------|-----------|
| 10000 | 235381 | 136615 | 25545 us | 20366 us |
| 20000 | 510694 | 293425 | 31336 us | 18312 us |
| 30000 | 800384 | 457412 | 35042 us | 27386 us |
| 40000 | 1101617 | 626831 | 48779 us | 38253 us |
| 50000 | 1409762 | 799637 | 62238 us | 46950 us |
| 60000 | 1721191 | 974521 | 75588 us | 58367 us |
| 70000 | 2038536 | 1152171 | 90823 us | 68778 us |
| 80000 | 2362958 | 1333472 | 104165 us | 78625 us |
| 90000 | 2690900 | 1516065 | 116111 us | 89573 us |
| 100000| 3019413 | 1699879 | 133638 us | 100998 us |
Refs:
BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
QUICKSORT (if n is not very small)
Ingo Wegener
Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
https://doi.org/10.1016/0304-3975(93)90364-Y
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
same leaf node
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
reading the journal can take a decent amount of time compared to the
rest of fsck, let's only read it when required.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Be more explicit to the user about what we're doing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Long form version of bch2_btree_path_to_text() - useful in error
messages and tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
small cleanup
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
debug helper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
better btree node read path error messages
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- fix assorted (harmless) off-by-one errors
- we were inconsistent on whether out->pos stays <= out->size on
overflow; now it does, and printbuf.overflow exists to indicate if a
printbuf has overflowed
- factor out printbuf_advance_pos()
- printbuf_nul_terminate_reserved(); use this to reduce the number of
printbuf_make_room() calls
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We need to be able to test these paths in dry run mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When a superblock write is silently dropped or it's been modified by
another process we need to know which device it was.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix another shift-by-64 by factoring out a common helper for
bch2_bkey_format_invalid() and bformat_needs_redo() (where it was
already fixed).
Reported-by: syzbot+9833a1d29d4a44361e2c@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Btree nodes are log structured; thus, we need to emit whiteouts when
we're deleting a key that's been written out to disk.
k->needs_whiteout tracks whether a key will need a whiteout when it's
deleted, and this requires some careful handling; e.g. the key we're
deleting may not have been written out to disk, but it may have
overwritten a key that was - thus we need to carry this flag around on
overwrites.
Invariants:
There may be multiple key for the same position in a given node (because
of overwrites), but only one of them will be a live (non deleted) key,
and only one key for a given position will have the needs_whiteout flag
set.
Additionally, we don't want to carry around whiteouts that need to be
written in the main searchable part of a btree node - btree_iter_peek()
will have to skip past them, and this can lead to an O(n^2) issues when
doing sequential deletions (e.g. inode rm/truncate). So there's a
separate region in the btree node buffer for unwritten whiteouts; these
are merge sorted with the rest of the keys we're writing in the btree
node write path.
The unwritten whiteouts was a later optimization that bch2_sort_keys()
didn't take into account; the unwritten whiteouts area means that we
never have deleted keys with needs_whiteout set in the main searchable
part of a btree node.
That means we can simplify and optimize some sort paths, and eliminate
an assertion that syzbot found:
- Unless we're in the btree node write path, it's always ok to drop
whiteouts when sorting
- When sorting for a btree node write, we drop the whiteout if it's not
from the unwritten whiteouts area, or if it's overwritten by a real
key at the same position.
This completely eliminates some tricky logic for propagating the
needs_whiteout flag: syzbot was able to hit the assertion that checked
that there shouldn't be more than one key at the same pos with
needs_whiteout set, likely due to a combination of flipping on
needs_whiteout on all written keys (they need whiteouts if overwritten),
combined with not always dropping unneeded whiteouts, and the tricky
logic in the sort path for preserving needs_whiteout that wasn't really
needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull smb server fixes from Steve French:
"Five ksmbd server fixes, all also for stable
- Three fixes related to SMB3 leases (fixes two xfstests, and a
locking issue)
- Unitialized variable fix
- Socket creation fix when bindv6only is set"
* tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: do not grant v2 lease if parent lease key and epoch are not set
ksmbd: use rwsem instead of rwlock for lease break
ksmbd: avoid to send duplicate lease break notifications
ksmbd: off ipv6only for both ipv4/ipv6 binding
ksmbd: fix uninitialized symbol 'share' in smb2_tree_connect()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Two one-liner fixes for issues introduced in -rc1"
* tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
virtiofs: include a newline in sysfs tag
fuse: verify zero padding in fuse_backing_map
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fixes from Namjae Jeon:
- Fix xfstests generic/013 test failure with dirsync mount option
- Initialize the reserved fields of deleted file and stream extension
dentries to zero
* tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: zero the reserved fields of file and stream extension dentries
exfat: fix timing of synchronizing bitmap and inode
|
|
Merely checking if the directory is encrypted happens for every open
when using ext4, at the moment refing and unrefing the parent, costing 2
atomics and serializing opens of different files.
The most common case of encryption not being used can be checked for
with RCU instead.
Sample result from open1_processes -t 20 ("Separate file open/close")
from will-it-scale on Sapphire Rapids (ops/s):
before: 12539898
after: 25575494 (+103%)
v2:
- add a comment justifying rcu usage, submitted by Eric Biggers
- whack spurious IS_ENCRYPTED check from the refed case
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240508081400.422212-1-mjguzik@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Pull bcachefs fixes from Kent Overstreet:
- Various syzbot fixes; mainly small gaps in validation
- Fix an integer overflow in fiemap() which was preventing filefrag
from returning the full list of extents
- Fix a refcounting bug on the device refcount, turned up by new
assertions in the development branch
- Fix a device removal/readd bug; write_super() was repeatedly dropping
and retaking bch_dev->io_ref references
* tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async()
bcachefs: Fix race in bch2_write_super()
bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX
bcachefs: Add missing skcipher_request_set_callback() call
bcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode()
bcachefs: Fix shift-by-64 in bformat_needs_redo()
bcachefs: Guard against unknown k.k->type in __bkey_invalid()
bcachefs: Add missing validation for superblock section clean
bcachefs: Fix assert in bch2_alloc_v4_invalid()
bcachefs: fix overflow in fiemap
bcachefs: Add a better limit for maximum number of buckets
bcachefs: Fix lifetime issue in device iterator helpers
bcachefs: Fix bch2_dev_lookup() refcounting
bcachefs: Initialize bch_write_op->failed in inline data path
bcachefs: Fix refcount put in sb_field_resize error path
bcachefs: Inodes need extra padding for varint_decode_fast()
bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()
bcachefs: bucket_pos_to_bp_noerror()
bcachefs: don't free error pointers
bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()
|
|
Introduce the capability to dynamically configure the maximum file
note size for ELF core dumps via sysctl.
Why is this being done?
We have observed that during a crash when there are more than 65k mmaps
in memory, the existing fixed limit on the size of the ELF notes section
becomes a bottleneck. The notes section quickly reaches its capacity,
leading to incomplete memory segment information in the resulting coredump.
This truncation compromises the utility of the coredumps, as crucial
information about the memory state at the time of the crash might be
omitted.
This enhancement removes the previous static limit of 4MB, allowing
system administrators to adjust the size based on system-specific
requirements or constraints.
Eg:
$ sysctl -a | grep core_file_note_size_limit
kernel.core_file_note_size_limit = 4194304
$ sysctl -n kernel.core_file_note_size_limit
4194304
$echo 519304 > /proc/sys/kernel/core_file_note_size_limit
$sysctl -n kernel.core_file_note_size_limit
519304
Attempting to write beyond the ceiling value of 16MB
$echo 17194304 > /proc/sys/kernel/core_file_note_size_limit
bash: echo: write error: Invalid argument
Signed-off-by: Vijay Nag <nagvijay@microsoft.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240506193700.7884-1-apais@linux.microsoft.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Upon running sparse, "warning: dubious: x & !y" is output at an array
index calculation within nilfs_load_super_block().
The calculation is not wrong, but to eliminate the sparse warning, replace
it with an equivalent calculation.
Also, add a comment to make it easier to understand what the unintuitive
array index calculation is doing and whether it's correct.
Link: https://lkml.kernel.org/r/20240430080019.4242-3-konishi.ryusuke@gmail.com
Fixes: e339ad31f599 ("nilfs2: introduce secondary super block")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Nobody checks the error flag on squashfs folios, so stop setting it.
Link: https://lkml.kernel.org/r/20240420025029.2166544-24-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove use of page APIs, return the errno instead of 0, switch from
kmap_atomic to kmap_local and use folio_end_read() to unify the two exit
paths.
Link: https://lkml.kernel.org/r/20240420025029.2166544-23-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Variable status is being assigned and error code that is never read, it is
being assigned inside of a do-while loop. The assignment is redundant and
can be removed.
Cleans up clang scan build warning:
fs/ocfs2/dlm/dlmdomain.c:1530:2: warning: Value stored to 'status' is never
read [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/20240423223018.1573213-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Convert nilfs2 to use the new mount API.
[sandeen@redhat.com: v2]
Link: https://lkml.kernel.org/r/33d078a7-9072-4d8e-a3a9-dec23d4191da@redhat.com
Link: https://lkml.kernel.org/r/20240425190526.10905-1-konishi.ryusuke@gmail.com
[konishi.ryusuke: fixed missing SB_RDONLY flag repair in nilfs_reconfigure]
Link: https://lkml.kernel.org/r/33d078a7-9072-4d8e-a3a9-dec23d4191da@redhat.com
Link: https://lkml.kernel.org/r/20240424182716.6024-1-konishi.ryusuke@gmail.com
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Only four lcluster types here, remove redundant code.
No real logic changes.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240508123357.3266173-1-hsiangkao@linux.alibaba.com
|
|
Use the superblock's UUID to generate the fsid when it's non-null.
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240409113022.74720-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
This adds a special global buffer pool (in the end) for reserved pages.
Using a reserved pool for LZ4 decompression significantly reduces the
time spent on extra temporary page allocation for the extreme cases in
low memory scenarios.
The table below shows the reduction in time spent on page allocation for
LZ4 decompression when using a reserved pool. The results were obtained
from multi-app launch benchmarks on ARM64 Android devices running the
5.15 kernel with an 8-core CPU and 8GB of memory. In the benchmark, we
launched 16 frequently-used apps, and the camera app was the last one in
each round. The data in the table is the average time of camera app for
each round.
After using the reserved pool, there was an average improvement of 150ms
in the overall launch time of our camera app, which was obtained from
the systrace log.
+--------------+---------------+--------------+---------+
| | w/o page pool | w/ page pool | diff |
+--------------+---------------+--------------+---------+
| Average (ms) | 3434 | 21 | -99.38% |
+--------------+---------------+--------------+---------+
Based on the benchmark logs, 64 pages are sufficient for 95% of
scenarios. This value can be adjusted with a module parameter
`reserved_pages`. The default value is 0.
This pool is currently only used for the LZ4 decompressor, but it can be
applied to more decompressors if needed.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240402131523.2703948-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Let's use alloc_pages_bulk_array() for simplicity and get rid of
unnecessary pagepool.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240402092757.2635257-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|