Age | Commit message (Collapse) | Author |
|
The is_mgtime test checks whether the FS_MGTIME flag is set in the
fstype. To get there from the inode though, we have to dereference 3
pointers.
Add a new IOP_MGTIME flag, and have inode_init_always() set that flag
when the fstype flag is set. Then, make is_mgtime test for IOP_MGTIME
instead.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241113-mgtime-v1-1-84e256980e11@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It literally directly follows a spin_lock() call.
This whacks an explicit barrier on x86-64.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20241113155103.4194099-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
As the size of a directory increases item creation slows down.
Optimizing access to s_children removes this bottleneck.
dirents are already pinned into the cache, there is no need to scan the
s_children list looking for duplicate Items. The configfs_dirent_exists
check is moved to a location where it is called only during subsystem
initialization.
d_lookup will only need to call configfs_lookup in the case where the
item in question is not pinned to dcache. The only items not pinned to
dcache are attributes. These are placed at the front of the s_children
list, whilst pinned items are inserted at the back. configfs_lookup
stops scanning when it encounters the first pinned entry in s_children.
The assumption of the above optimizations is that there will be few
attributes, but potentially many Items in a given directory.
Signed-off-by: Seamus Connor <sconnor@purestorage.com>
Reviewed-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
configfs_hash_and_remove() has been unused since it was added in 2005
by commit
7063fbf22611 ("[PATCH] configfs: User-driven configuration filesystem")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Call blk_validate_limits on the queue limits used for zone append
splitting so that calculated values get filled in and any stacking
conflicts get cought.
Without this there isn't a max_zone_append_sectors limits as of commit
559218d43ec9 ("block: pre-calculate max_zone_append_sectors").
Fixes: 559218d43ec9 ("block: pre-calculate max_zone_append_sectors")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20241113084541.34315-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The code indicates that journal_init_common() fills the journal_t object
it returns while the comment incorrectly states that only a few fields are
initialised. Also, the comment claims that journal structures could be
created from scratch which isn't possible as journal_init_common() calls
journal_load_superblock() which loads and checks journal superblock from
disk.
Signed-off-by: Daniel Martín Gómez <dalme@riseup.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20241107144538.3544-1-dalme@riseup.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use ktime_get_ns instead of ktime_get_real_ns when computing the lr_timeout
not to be affected by system time jumps.
Use a boolean instead of the MAX_JIFFY_OFFSET value to determine whether
the next_wakeup value has been set. Comparing elr->lr_next_sched to
MAX_JIFFY_OFFSET can cause the lazyinit thread to loop indefinitely.
Co-developed-by: Lukas Skupinski <lukas.skupinski@landisgyr.com>
Signed-off-by: Lukas Skupinski <lukas.skupinski@landisgyr.com>
Signed-off-by: Mathieu Othacehe <othacehe@gnu.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241106134741.26948-2-othacehe@gnu.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Inline and use struct_size() to calculate the number of bytes to
allocate for new_fn and remove the local variable len.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20241105103353.11590-2-thorsten.blum@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Add the __counted_by compiler attribute to the flexible array member
name to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20241105101813.10864-2-thorsten.blum@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Remove hard-coded strings by using the str_yes_no() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20241021100056.5521-2-thorsten.blum@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got a report that adding a fanotify filsystem watch prevents tail -f
from receiving events.
Reproducer:
1. Create 3 windows / login sessions. Become root in each session.
2. Choose a mounted filesystem that is pretty quiet; I picked /boot.
3. In the first window, run: fsnotifywait -S -m /boot
4. In the second window, run: echo data >> /boot/foo
5. In the third window, run: tail -f /boot/foo
6. Go back to the second window and run: echo more data >> /boot/foo
7. Observe that the tail command doesn't show the new data.
8. In the first window, hit control-C to interrupt fsnotifywait.
9. In the second window, run: echo still more data >> /boot/foo
10. Observe that the tail command in the third window has now printed
the missing data.
When stracing tail, we observed that when fanotify filesystem mark is
set, tail does get the inotify event, but the event is receieved with
the filename:
read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0foo\0\0\0\0\0\0\0\0\0\0\0\0\0",
50) = 32
This is unexpected, because tail is watching the file itself and not its
parent and is inconsistent with the inotify event received by tail when
fanotify filesystem mark is not set:
read(4, "\1\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 50) = 16
The inteference between different fsnotify groups was caused by the fact
that the mark on the sb requires the filename, so the filename is passed
to fsnotify(). Later on, fsnotify_handle_event() tries to take care of
not passing the filename to groups (such as inotify) that are interested
in the filename only when the parent is watching.
But the logic was incorrect for the case that no group is watching the
parent, some groups are watching the sb and some watching the inode.
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: 7372e79c9eb9 ("fanotify: fix logic of reporting name info with watched parent")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All
singletons"
* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: swapfile: fix cluster reclaim work crash on rotational devices
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
mm/thp: fix deferred split queue not partially_mapped: fix
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
nommu: pass NULL argument to vma_iter_prealloc()
ocfs2: fix UBSAN warning in ocfs2_verify_volume()
nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
mm: count zeromap read and set for swapout and swapin
|
|
It's used only to initialize ->getattr in one inode_operations instance
(empty_dir_inode_operations) and its behaviour had always been equivalent
to what we get with NULL ->getattr.
Just remove that initializer, along with empty_dir_getattr() itself.
While we are at it, the same instance has ->permission initialized to
generic_permission, which is what NULL ->permission ends up doing.
Again, no point keeping it.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function")' introduced the AT_GETATTR_NOSEC flag to ensure that the
call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr.
Now, simplify the getattr interface functions of filesystems where the flag
AT_GETATTR_NOSEC is checked.
There is only a single caller of inode_operations getattr function and it
is located in fs/stat.c in vfs_getattr_nosec. The caller there is the only
one from which the AT_GETATTR_NOSEC flag is passed from.
Two filesystems are checking this flag in .getattr and the flag is always
passed to them unconditionally from only vfs_getattr_nosec:
- ecryptfs: Simplify by always calling vfs_getattr_nosec in
ecryptfs_getattr. From there the flag is passed to no other
function and this function is not called otherwise.
- overlayfs: Simplify by always calling vfs_getattr_nosec in
ovl_getattr. From there the flag is passed to no other
function and this function is not called otherwise.
The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from
any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not
important inside this function. Also, since no filesystem is checking the
flag anymore, remove the flag entirely now, including the BUG_ON check that
never triggered.
The net change of the changes here combined with the original commit is
that ecryptfs and overlayfs do not call vfs_getattr but only
vfs_getattr_nosec.
Fixes: 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/linux-fsdevel/20241101011724.GN1350452@ZenIV/T/#u
Cc: Tyler Hicks <code@tyhicks.com>
Cc: ecryptfs@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: linux-unionfs@vger.kernel.org
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and use fd_empty() consistently
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
LOOKUP_EMPTY is ignored by the only remaining user, and without
that 'getname_' prefix makes no sense.
Remove LOOKUP_EMPTY part, rename to statx_lookup_flags() and make
static. It most likely is _not_ statx() specific, either, but
that's the next step.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Filesystem options can be retrieved with STATMOUNT_MNT_OPTS, which
returns a string of comma separated options, where some characters are
escaped using the \OOO notation.
Add a new flag, STATMOUNT_OPT_ARRAY, which instead returns the raw
option values separated with '\0' charaters.
Since escaped charaters are rare, this inteface is preferable for
non-libmount users which likley don't want to deal with option
de-escaping.
Example code:
if (st->mask & STATMOUNT_OPT_ARRAY) {
const char *opt = st->str + st->opt_array;
for (unsigned int i = 0; i < st->opt_num; i++) {
printf("opt_array[%i]: <%s>\n", i, opt);
opt += strlen(opt) + 1;
}
}
Example ouput:
(1) mnt_opts: <lowerdir+=/l\054w\054r,lowerdir+=/l\054w\054r1,upperdir=/upp\054r,workdir=/w\054rk,redirect_dir=nofollow,uuid=null>
(2) opt_array[0]: <lowerdir+=/l,w,r>
opt_array[1]: <lowerdir+=/l,w,r1>
opt_array[2]: <upperdir=/upp,r>
opt_array[3]: <workdir=/w,rk>
opt_array[4]: <redirect_dir=nofollow>
opt_array[5]: <uuid=null>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20241112101006.30715-1-mszeredi@redhat.com
Acked-by: Jeff Layton <jlayton@kernel.org>
[brauner: tweak variable naming and parsing add example output]
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Ugh, indeed - and not needed nearly a decade. It had been
added for the sake of inode_sb_list_lock and that spinlock had become
a per-superblock (->s_inode_list_lock) in March 2015...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241112213842.GC3387508@ZenIV
|
|
Christoph Hellwig <hch@lst.de> says:
This fixes one (of multiple) sparse warnings in fs-writeback.c, and
then reshuffles the code a bit that only the proper high level API
instead of low-level helpers is exported.
* patches from https://lore.kernel.org/r/20241112054403.1470586-1-hch@lst.de:
writeback: wbc_attach_fdatawrite_inode out of line
writeback: add a __releases annoation to wbc_attach_and_unlock_inode
Link: https://lore.kernel.org/r/20241112054403.1470586-1-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
/proc/self/mountinfo displays the source for the mount, but statmount()
doesn't yet have a way to return it. Add a new STATMOUNT_SB_SOURCE flag,
claim the 32-bit __spare1 field to hold the offset into the str[] array.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241111-statmount-v4-3-2eaf35d07a80@kernel.org
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Implemented the suggested solution mentioned in the bug
https://bugzilla.kernel.org/show_bug.cgi?id=218820
Preventing the disabling of delayed allocation mode on remount.
delalloc to nodelalloc not permitted anymore
nodelalloc to delalloc permitted, not affected
Signed-off-by: Nicolas Bretz <bretznic@gmail.com>
Link: https://patch.msgid.link/20241014034143.59779-1-bretznic@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The b_frozen_data allocation should not be failed during journal
committing process, otherwise jbd2 will abort.
Since commit 490c1b444ce653d("jbd2: do not fail journal because of
frozen_buffer allocation failure") already added '__GFP_NOFAIL' flag
in do_get_write_access(), just add '__GFP_NOFAIL' flag for all allocations
in jbd2_journal_write_metadata_buffer(), like 'new_bh' allocation does.
Besides, remove all error handling branches for do_get_write_access().
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20241012085530.2147846-1-chengzhihao@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The variables "&EXT4_SB(inode->i_sb)->s_fc_lock" and "&sbi->s_fc_lock"
are the same lock. This function uses a mix of both, which is a bit
unsightly and confuses Smatch.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/96008557-8ff4-44cc-b5e3-ce242212f1a3@stanley.mountain
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use string choice helpers for better readability and to fix cocci warning
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202410062256.BoynX3c2-lkp@intel.com/
Signed-off-by: R Sundar <prosunofficial@gmail.com>
Link: https://patch.msgid.link/20241007172006.83339-1-prosunofficial@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Keep 'success' internally to track if any error happened and then
return it at the end in do_one_pass(). If jbd2_do_replay() return
-ENOMEM then stop replay journal.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240930005942.626942-7-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The judgement 'if (block_error && success == 0)' is never valid. Just
remove useless 'block_error' variable.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-6-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Factor out jbd2_do_replay() no funtional change.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240930005942.626942-5-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
To make JBD2_COMMIT_BLOCK process more clean, no functional change.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-4-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Now buffer_head free is very fragmented in do_one_pass(), unified release
of buffer_head in do_one_pass()
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-3-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
'need_check_commit_time' is only used by v2/v3 checksum, so there isn't
need to add 'need_check_commit_time' judegement for v1 checksum logic.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240930005942.626942-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Instead of directly casting and returning an error-valued pointer,
use ERR_CAST to make the error handling more explicit and improve
code clarity.
Signed-off-by: Yu Jiaoliang <yujiaoliang@vivo.com>
Link: https://patch.msgid.link/20240920021440.1959243-1-yujiaoliang@vivo.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Using mapped writes, it's technically possible to expose stale
post-eof data on a truncate up operation. Consider the following
example:
$ xfs_io -fc "pwrite 0 2k" -c "mmap 0 4k" -c "mwrite 2k 2k" \
-c "truncate 8k" -c "pread -v 2k 16" <file>
...
00000800: 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 XXXXXXXXXXXXXXXX
...
This shows that the post-eof data written via mwrite lands within
EOF after a truncate up. While this is deliberate of the test case,
behavior is somewhat unpredictable because writeback does post-eof
zeroing, and writeback can occur at any time in the background. For
example, an fsync inserted between the mwrite and truncate causes
the subsequent read to instead return zeroes. This basically means
that there is a race window in this situation between any subsequent
extending operation and writeback that dictates whether post-eof
data is exposed to the file or zeroed.
To prevent this problem, perform partial block zeroing as part of
the various inode size extending operations that are susceptible to
it. For truncate extension, zero around the original eof similar to
how truncate down does partial zeroing of the new eof. For extension
via writes and fallocate related operations, zero the newly exposed
range of the file to cover any partial zeroing that must occur at
the original and new eof blocks.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://patch.msgid.link/20240919160741.208162-2-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The commit 91562895f803 ("ext4: properly sync file size update after O_SYNC
direct IO") causes confusion about the meaning of the return value of
ext4_dio_write_end_io().
Specifically, when the ext4_handle_inode_extension() operation succeeds,
ext4_dio_write_end_io() directly returns count instead of 0.
This does not cause a bug in the current kernel, but the semantics of the
return value of the ext4_dio_write_end_io() function are wrong, which is
likely to introduce bugs in the future code evolution.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20240919082539.381626-1-alexjlzheng@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Commit 449813515d3e ("block, fs: Restore the per-bio/request data
lifetime fields") restored write-hint support in ext4. But that is
applicable only for direct IO. This patch supports passing
write-hint for buffered IO from ext4 file system to block layer
by filling bi_write_hint of struct bio in io_submit_add_bh().
Signed-off-by: j.xia <j.xia@samsung.com>
Link: https://patch.msgid.link/20240919020341.2657646-1-j.xia@samsung.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When I enabled ext4 debug for fault injection testing, I encountered the
following warning:
EXT4-fs error (device sda): ext4_read_inode_bitmap:201: comm fsstress:
Cannot read inode bitmap - block_group = 8, inode_bitmap = 1051
WARNING: CPU: 0 PID: 511 at fs/buffer.c:1181 mark_buffer_dirty+0x1b3/0x1d0
The root cause of the issue lies in the improper implementation of ext4's
buffer_head read fault injection. The actual completion of buffer_head
read and the buffer_head fault injection are not atomic, which can lead
to the uptodate flag being cleared on normally used buffer_heads in race
conditions.
[CPU0] [CPU1] [CPU2]
ext4_read_inode_bitmap
ext4_read_bh()
<bh read complete>
ext4_read_inode_bitmap
if (buffer_uptodate(bh))
return bh
jbd2_journal_commit_transaction
__jbd2_journal_refile_buffer
__jbd2_journal_unfile_buffer
__jbd2_journal_temp_unlink_buffer
ext4_simulate_fail_bh()
clear_buffer_uptodate
mark_buffer_dirty
<report warning>
WARN_ON_ONCE(!buffer_uptodate(bh))
The best approach would be to perform fault injection in the IO completion
callback function, rather than after IO completion. However, the IO
completion callback function cannot get the fault injection code in sb.
Fix it by passing the result of fault injection into the bh read function,
we simulate faults within the bh read function itself. This requires adding
an extra parameter to the bh read functions that need fault injection.
Fixes: 46f870d690fe ("ext4: simulate various I/O and checksum errors when reading metadata")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Link: https://patch.msgid.link/20240906091746.510163-1-leo.lilong@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When converting a delalloc extent in ext4_es_insert_extent(), since we
only want to pass the info of whether the quota has already been claimed
if the allocation is a direct allocation from ext4_map_create_blocks(),
there is no need to pass full mapping flags, so changes to just pass
whether the EXT4_GET_BLOCKS_DELALLOC_RESERVE bit is set.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240906061401.2980330-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When ctx_set_flags() is unused, it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:
.../ext4/super.c:2120:1: error: unused function 'ctx_set_flags' [-Werror,-Wunused-function]
2120 | EXT4_SET_CTX(flags); /* set only */
| ^~~~~~~~~~~~~~~~~~~
Fix this by marking ctx_*_flags() with __maybe_unused
(mark both for the sake of symmetry).
See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20240905163229.140522-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This is the logic behavior and one that we would like to verify
using a generic fstest similar to xfs/546.
Link: https://lore.kernel.org/fstests/20240830152648.GE6216@frogsfrogsfrogs/
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240904084657.1062243-1-amir73il@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
cma_alloc() keep failed in our system which thanks to a jh->bh->b_page
can not be migrated out of CMA area[1] as the jh has one cp_transaction
pending on it because of j_free > j_max_transaction_buffers[2][3][4][5][6].
We temporarily solve this by launching jbd2_log_do_checkpoint forcefully
somewhere. Since journal is common mechanism to all JFSs and
cp_transaction has a little fewer opportunity to be launched, the
cma_alloc() could be affected under the same scenario. This patch
would like to have buffer_head of ext4 not use CMA pages when doing
sb_getblk.
[1]
crash_arm64_v8.0.4++> kmem -p|grep ffffff808f0aa150(sb->s_bdev->bd_inode->i_mapping)
fffffffe01a51c00 e9470000 ffffff808f0aa150 3 2 8000000008020 lru,private
fffffffe03d189c0 174627000 ffffff808f0aa150 4 2 2004000000008020 lru,private
fffffffe03d88e00 176238000 ffffff808f0aa150 3f9 2 2008000000008020 lru,private
fffffffe03d88e40 176239000 ffffff808f0aa150 6 2 2008000000008020 lru,private
fffffffe03d88e80 17623a000 ffffff808f0aa150 5 2 2008000000008020 lru,private
fffffffe03d88ec0 17623b000 ffffff808f0aa150 1 2 2008000000008020 lru,private
fffffffe03d88f00 17623c000 ffffff808f0aa150 0 2 2008000000008020 lru,private
fffffffe040e6540 183995000 ffffff808f0aa150 3f4 2 2004000000008020 lru,private
[2] page -> buffer_head
crash_arm64_v8.0.4++> struct page.private fffffffe01a51c00 -x
private = 0xffffff802fca0c00
[3] buffer_head -> journal_head
crash_arm64_v8.0.4++> struct buffer_head.b_private 0xffffff802fca0c00
b_private = 0xffffff8041338e10,
[4] journal_head -> b_cp_transaction
crash_arm64_v8.0.4++> struct journal_head.b_cp_transaction 0xffffff8041338e10 -x
b_cp_transaction = 0xffffff80410f1900,
[5] transaction_t -> journal
crash_arm64_v8.0.4++> struct transaction_t.t_journal 0xffffff80410f1900 -x
t_journal = 0xffffff80e70f3000,
[6] j_free & j_max_transaction_buffers
crash_arm64_v8.0.4++> struct journal_t.j_free,j_max_transaction_buffers 0xffffff80e70f3000 -x
j_free = 0x3f1,
j_max_transaction_buffers = 0x100,
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240904075300.1148836-1-zhaoyang.huang@unisoc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The if condition !A || A && B can be simplified to !A || B.
./fs/ext4/fast_commit.c:362:21-23: WARNING !A || A && B is equivalent to !A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9837
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240830071713.40565-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The original implementation ext4's FS_IOC_GETFSMAP handling only
worked when the range of queried blocks included at least one free
(unallocated) block range. This is because how the metadata blocks
were emitted was as a side effect of ext4_mballoc_query_range()
calling ext4_getfsmap_datadev_helper(), and that function was only
called when a free block range was identified. As a result, this
caused generic/365 to fail.
Fix this by creating a new function ext4_getfsmap_meta_helper() which
gets called so that blocks before the first free block range in a
block group can get properly reported.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
The maximum length of a filename is 255 and the minimum block size is 1024,
so it is always guaranteed that the number of entries is greater than or
equal to 2 when do_split() is called. So unless ext4_dx_add_entry() and
make_indexed_dir() or some other functions are buggy, 'split == 0' will
not occur.
Setting 'continued' to 0 in this case masks the problem that the file
system has become corrupted, even though it prevents possible out-of-bounds
access. Hence WARN_ON_ONCE() is used to check if 'split' is 0, and if it is
then warns and returns an error to abort split.
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20240823160518.GA424729@mit.edu
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241008121152.3771906-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
After commit 21175ca434c5 ("ext4: make prefetch_block_bitmaps default"),
we enable 'prefetch_block_bitmaps' by default, but this is not shown in
the '/proc/fs/ext4/sdx/options' procfs interface.
This makes it impossible to distinguish whether the feature is enabled by
default or not, so 'prefetch_block_bitmaps' is shown in the 'options'
procfs interface when prefetch_block_bitmaps is enabled by default.
This makes it easy to notice changes to the default mount options between
versions through the '/proc/fs/ext4/sdx/options' procfs interface.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241008120134.3758097-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
These two fields are populated and stored as a "frequently used value"
in ufs_fill_super, but are not used afterwards in the driver.
Moreover, one of the shifts triggers UBSAN: shift-out-of-bounds when
apbshift is 12 because 12 * 3 = 36 and 1 << 36 does not fit in the 32
bit integer used to store the value.
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2087853
Signed-off-by: Agathe Porte <agathe.porte@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This will eliminate the need for index recalculation during the fast
path.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20241006184341.9081-1-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Enhance the device probing process by adding a log message when a new
virtio-fs tag is successfully discovered. This improvement provides
better visibility into the initialization of virtio-fs devices.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20241006184324.8497-1-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This allows exporting this high-level interface only while keeping
wbc_attach_and_unlock_inode private in fs-writeback.c and unexporting
__inode_attach_wb.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241112054403.1470586-3-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This shuts up a sparse lock context tracking warning.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241112054403.1470586-2-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
/proc/self/mountinfo prints out the sb->s_subtype after the type. This
is particularly useful for disambiguating FUSE mounts (at least when the
userland driver bothers to set it). Add STATMOUNT_FS_SUBTYPE and claim
one of the __spare2 fields to point to the offset into the str[] array.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241111-statmount-v4-2-2eaf35d07a80@kernel.org
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When one of the statmount_string() handlers doesn't emit anything to
seq, the kernel currently sets the corresponding flag and emits an empty
string.
Given that statmount() returns a mask of accessible fields, just leave
the bit unset in this case, and skip any NULL termination. If nothing
was emitted to the seq, then the EOVERFLOW and EAGAIN cases aren't
applicable and the function can just return immediately.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241111-statmount-v4-1-2eaf35d07a80@kernel.org
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|