Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fileattr updates from Christian Brauner:
"This introduces the new file_getattr() and file_setattr() system calls
after lengthy discussions.
Both system calls serve as successors and extensible companions to
the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have
started to show their age in addition to being named in a way that
makes it easy to conflate them with extended attribute related
operations.
These syscalls allow userspace to set filesystem inode attributes on
special files. One of the usage examples is the XFS quota projects.
XFS has project quotas which could be attached to a directory. All new
inodes in these directories inherit project ID set on parent
directory.
The project is created from userspace by opening and calling
FS_IOC_FSSETXATTR on each inode. This is not possible for special
files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
with empty project ID. Those inodes then are not shown in the quota
accounting but still exist in the directory. This is not critical but
in the case when special files are created in the directory with
already existing project quota, these new inodes inherit extended
attributes. This creates a mix of special files with and without
attributes. Moreover, special files with attributes don't have a
possibility to become clear or change the attributes. This, in turn,
prevents userspace from re-creating quota project on these existing
files.
In addition, these new system calls allow the implementation of
additional attributes that we couldn't or didn't want to fit into the
legacy ioctls anymore"
* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: tighten a sanity check in file_attr_to_fileattr()
tree-wide: s/struct fileattr/struct file_kattr/g
fs: introduce file_getattr and file_setattr syscalls
fs: prepare for extending file_get/setattr()
fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
selinux: implement inode_file_[g|s]etattr hooks
lsm: introduce new hooks for setting/getting inode fsxattr
fs: split fileattr related helpers into separate file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull mmap_prepare updates from Christian Brauner:
"Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm:
introduce new .mmap_prepare() file callback").
This is preferred to the existing f_op->mmap() hook as it does require
a VMA to be established yet, thus allowing the mmap logic to invoke
this hook far, far earlier, prior to inserting a VMA into the virtual
address space, or performing any other heavy handed operations.
This allows for much simpler unwinding on error, and for there to be a
single attempt at merging a VMA rather than having to possibly
reattempt a merge based on potentially altered VMA state.
Far more importantly, it prevents inappropriate manipulation of
incompletely initialised VMA state, which is something that has been
the cause of bugs and complexity in the past.
The intent is to gradually deprecate f_op->mmap, and in that vein this
series coverts the majority of file systems to using f_op->mmap_prepare.
Prerequisite steps are taken - firstly ensuring all checks for mmap
capabilities use the file_has_valid_mmap_hooks() helper rather than
directly checking for f_op->mmap (which is now not a valid check) and
secondly updating daxdev_mapping_supported() to not require a VMA
parameter to allow ext4 and xfs to be converted.
Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for
nested file systems") handles the nasty edge-case of nested file
systems like overlayfs, which introduces a compatibility shim to allow
f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.
This allows for nested filesystems to continue to function correctly
with all file systems regardless of which callback is used. Once we
finally convert all file systems, this shim can be removed.
As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
can nest all other file systems.
We additionally do not update resctl - as this requires an update to
remap_pfn_range() (or an alternative to it) which we defer to a later
series, equally we do not update cramfs which needs a mixed mapping
insertion with the same issue, nor do we update procfs, hugetlbfs,
syfs or kernfs all of which require VMAs for internal state and hooks.
We shall return to all of these later"
* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
doc: update porting, vfs documentation to describe mmap_prepare()
fs: replace mmap hook with .mmap_prepare for simple mappings
fs: convert most other generic_file_*mmap() users to .mmap_prepare()
fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
mm/filemap: introduce generic_file_*_mmap_prepare() helpers
fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
fs/dax: make it possible to check dev dax support without a VMA
fs: consistently use can_mmap_file() helper
mm/nommu: use file_has_valid_mmap_hooks() helper
mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc VFS updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add ext4 IOCB_DONTCACHE support
This refactors the address_space_operations write_begin() and
write_end() callbacks to take const struct kiocb * as their first
argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
to the filesystem's buffered I/O path.
Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
and advertises support via the FOP_DONTCACHE file operation flag.
Additionally, the i915 driver's shmem write paths are updated to
bypass the legacy write_begin/write_end interface in favor of
directly calling write_iter() with a constructed synchronous kiocb.
Another i915 change replaces a manual write loop with
kernel_write() during GEM shmem object creation.
Cleanups:
- don't duplicate vfs_open() in kernel_file_open()
- proc_fd_getattr(): don't bother with S_ISDIR() check
- fs/ecryptfs: replace snprintf with sysfs_emit in show function
- vfs: Remove unnecessary list_for_each_entry_safe() from
evict_inodes()
- filelock: add new locks_wake_up_waiter() helper
- fs: Remove three arguments from block_write_end()
- VFS: change old_dir and new_dir in struct renamedata to dentrys
- netfs: Remove unused declaration netfs_queue_write_request()
Fixes:
- eventpoll: Fix semi-unbounded recursion
- eventpoll: fix sphinx documentation build warning
- fs/read_write: Fix spelling typo
- fs: annotate data race between poll_schedule_timeout() and
pollwake()
- fs/pipe: set FMODE_NOWAIT in create_pipe_files()
- docs/vfs: update references to i_mutex to i_rwsem
- fs/buffer: remove comment about hard sectorsize
- fs/buffer: remove the min and max limit checks in __getblk_slow()
- fs/libfs: don't assume blocksize <= PAGE_SIZE in
generic_check_addressable
- fs_context: fix parameter name in infofc() macro
- fs: Prevent file descriptor table allocations exceeding INT_MAX"
* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
netfs: Remove unused declaration netfs_queue_write_request()
eventpoll: fix sphinx documentation build warning
ext4: support uncached buffered I/O
mm/pagemap: add write_begin_get_folio() helper function
fs: change write_begin/write_end interface to take struct kiocb *
drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
drm/i915: Use kernel_write() in shmem object create
eventpoll: Fix semi-unbounded recursion
vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
fs/buffer: remove the min and max limit checks in __getblk_slow()
fs: Prevent file descriptor table allocations exceeding INT_MAX
fs: Remove three arguments from block_write_end()
fs/ecryptfs: replace snprintf with sysfs_emit in show function
fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
docs/vfs: update references to i_mutex to i_rwsem
fs/buffer: remove comment about hard sectorsize
fs_context: fix parameter name in infofc() macro
VFS: change old_dir and new_dir in struct renamedata to dentrys
proc_fd_getattr(): don't bother with S_ISDIR() check
...
|
|
To prevent inodes with invalid file types from tripping through the vfs
and causing malfunctions or assertion failures, add a missing sanity check
when reading an inode from a block device. If the file type is not valid,
treat it as a filesystem error.
Link: https://lkml.kernel.org/r/20250710134952.29862-1-konishi.ryusuke@gmail.com
Fixes: 05fe58fdc10d ("nilfs2: inode operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.
Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that we expose struct file_attr as our uapi struct rename all the
internal struct to struct file_kattr to clearly communicate that it is a
kernel internal struct. This is similar to struct mount_{k}attr and
others.
Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
block_write_end() looks like it can be used as a ->write_end()
implementation. However, it can't as it does not unlock nor put
the folio. Since it does not use the 'file', 'mapping' nor 'fsdata'
arguments, remove them.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/20250624132130.1590285-1-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
This callback is invoked in the mmap() logic far earlier, so error handling
can be performed more safely without complicated and bug-prone state
unwinding required should an error arise.
This hook also avoids passing a pointer to a not-yet-correctly-established
VMA avoiding any issues with referencing this data structure.
It rather provides a pointer to the new struct vm_area_desc descriptor type
which contains all required state and allows easy setting of required
parameters without any consideration needing to be paid to locking or
reference counts.
Note that nested filesystems like overlayfs are compatible with an
.mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare()
compatibility layer for nested file systems").
In this patch we apply this change to file systems with relatively simple
mmap() hook logic - exfat, ceph, f2fs, bcachefs, zonefs, btrfs, ocfs2,
orangefs, nilfs2, romfs, ramfs and aio.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/f528ac4f35b9378931bd800920fee53fc0c5c74d.1750099179.git.lorenzo.stoakes@oracle.com
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
Since commit 013a07052a1a ("nilfs2: convert metadata aops from writepage
to writepages"), nilfs_mdt_write_folio can't be called from reclaim
context any more. Remove the code keyed of the wbc->for_reclaim flag,
which is now only set for writing out swap or shmem pages inside the swap
code, but never passed to file systems.
Link: https://lkml.kernel.org/r/20250508054938.15894-7-hch@lst.de
Link: https://lkml.kernel.org/r/20250516123417.6779-1-konishi.ryusuke@gmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In preparation for writing logs, in nilfs_btree_propagate(), which makes
parent and ancestor node blocks dirty starting from a modified data block
or b-tree node block, if the starting block does not belong to the b-tree,
i.e. is isolated, nilfs_btree_do_lookup() called within the function
fails with -ENOENT.
In this case, even though -ENOENT is an internal code, it is propagated to
the log writer via nilfs_bmap_propagate() and may be erroneously returned
to system calls such as fsync().
Fix this issue by changing the error code to -EINVAL in this case, and
having the bmap layer detect metadata corruption and convert the error
code appropriately.
Link: https://lkml.kernel.org/r/20250428173808.6452-3-konishi.ryusuke@gmail.com
Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: improve sanity checks in dirty state propagation".
This fixes one missed check for block mapping anomalies and one improper
return of an error code during a preparation step for log writing, thereby
improving checking for filesystem corruption on writeback.
This patch (of 2):
In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr()
need to be checked to ensure it is not an invalid pointer.
If the pointer value obtained by nilfs_direct_get_ptr() is
NILFS_BMAP_INVALID_PTR, means that the metadata (in this case, i_bmap in
the nilfs_inode_info struct) that should point to the data block at the
buffer head of the argument is corrupted and the data block is orphaned,
meaning that the file system has lost consistency.
Add a value check and return -EINVAL when it is an invalid pointer.
Link: https://lkml.kernel.org/r/20250428173808.6452-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250428173808.6452-2-konishi.ryusuke@gmail.com
Fixes: 36a580eb489f ("nilfs2: direct block mapping")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After commit c0e473a0d226 ("block: fix race between set_blocksize and read
paths") was merged, set_blocksize() called by sb_set_blocksize() now locks
the inode of the backing device file. As a result of this change, syzbot
started reporting deadlock warnings due to a circular dependency involving
the semaphore "ns_sem" of the nilfs object, the inode lock of the backing
device file, and the locks that this inode lock is transitively dependent
on.
This is caused by a new lock dependency added by the above change, since
init_nilfs() calls sb_set_blocksize() in the lock section of "ns_sem".
However, these warnings are false positives because init_nilfs() is called
in the early stage of the mount operation and the filesystem has not yet
started.
The reason why "ns_sem" is locked in init_nilfs() was to avoid a race
condition in nilfs_fill_super() caused by sharing a nilfs object among
multiple filesystem instances (super block structures) in the early
implementation. However, nilfs objects and super block structures have
long ago become one-to-one, and there is no longer any need to use the
semaphore there.
So, fix this issue by removing the use of the semaphore "ns_sem" in
init_nilfs().
Link: https://lkml.kernel.org/r/20250503053327.12294-1-konishi.ryusuke@gmail.com
Fixes: c0e473a0d226 ("block: fix race between set_blocksize and read paths")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00f7f5b884b117ee6773
Tested-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Reported-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f30591e72bfc24d4715b
Tested-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g. on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns. For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.
This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes. Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.
This lookup-after-mkdir requires that the directory remains locked to
avoid races. Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.
To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
NULL - the directory was created and no other dentry was used
ERR_PTR() - an error occurred
non-NULL - this other dentry was spliced in
This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations. Subsequent patches will make
further changes to some file-systems to return a correct dentry.
Not all filesystems reliably result in a positive hashed dentry:
- NFS, cifs, hostfs will sometimes need to perform a lookup of
the name to get inode information. Races could result in this
returning something different. Note that this lookup is
non-atomic which is what we are trying to avoid. Placing the
lookup in filesystem code means it only happens when the filesystem
has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
operation ensures that lookup will be called to correctly populate
the dentry. This could be fixed but I don't think it is important
to any of the users of vfs_mkdir() which look at the dentry.
The recommendation to use
d_drop();d_splice_alias()
is ugly but fits with current practice. A planned future patch will
change this.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.
While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.
Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.
Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a89 ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A minor issue with nilfs_rename, originating from an old ext2
implementation, is that the mtime is updated even if the rename target is
a directory and it is renamed within the same directory, rather than moved
to a different directory.
In this case, the child directory being renamed does not change in any
way, so changing its mtime is unnecessary according to the specification,
and can unnecessarily confuse backup tools.
In ext2, this issue was fixed by commit 39fe7557b4d6 ("ext2: Do not update
mtime of a moved directory") and a few subsequent fixes, but it remained
in nilfs2.
Fix this issue by not calling nilfs_set_link(), which rewrites the inode
number of the directory entry that refers to the parent directory, when
the move target is a directory and the source and destination are the same
directory.
Here, the directory to be moved only needs to be read if the inode number
of the parent directory is rewritten with nilfs_set_link, so also adjust
the execution conditions of the preparation work to avoid unnecessary
directory reads.
Link: https://lkml.kernel.org/r/20250111143518.7901-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix issues with rename operations".
This series fixes BUG_ON check failures reported by syzbot around rename
operations, and a minor behavioral issue where the mtime of a child
directory changes when it is renamed instead of moved.
This patch (of 2):
The directory manipulation routines nilfs_set_link() and
nilfs_delete_entry() rewrite the directory entry in the folio/page
previously read by nilfs_find_entry(), so error handling is omitted on the
assumption that nilfs_prepare_chunk(), which prepares the buffer for
rewriting, will always succeed for these. And if an error is returned, it
triggers the legacy BUG_ON() checks in each routine.
This assumption is wrong, as proven by syzbot: the buffer layer called by
nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may
fail due to metadata corruption or other reasons. This has been there all
along, but improved sanity checks and error handling may have made it more
reproducible in fuzzing tests.
Fix this issue by adding missing error paths in nilfs_set_link(),
nilfs_delete_entry(), and their caller nilfs_rename().
Link: https://lkml.kernel.org/r/20250111143518.7901-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250111143518.7901-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+32c3706ebf5d95046ea1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1
Reported-by: syzbot+1097e95f134f37d9395c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Also for comments that do not cause kernel-doc warnings (those that list
multiple error codes), revise the return value description style to match
Brian G.'s suggestion of "..., or one of the following negative error
codes on failure:".
Link: https://lkml.kernel.org/r/CAAq45aNh1qV8P6XgDhKeNstT=PvcPUaCXsAF-f9rvmzznsZL5A@mail.gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-8-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are a number of kernel-doc comments for functions that are missing
return values, which also causes a number of warnings when the kernel-doc
script is run with the "-Wall" option.
Fix this issue by adding proper return value descriptions, and improve
code maintainability.
Link: https://lkml.kernel.org/r/20250110010530.21872-7-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for the
rest.
Link: https://lkml.kernel.org/r/20250110010530.21872-6-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions for metadata file
functions other than sufile.
Link: https://lkml.kernel.org/r/20250110010530.21872-5-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for
sufile-related functions, eliminating a dozen warnings emitted by the
kernel-doc script.
Link: https://lkml.kernel.org/r/20250110010530.21872-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous patch to fix the ioctl return value descriptions,
this fixes the format of the return value descriptions for bmap (and
btree)-related functions, which was causing the kernel-doc script to emit
a number of warnings.
Link: https://lkml.kernel.org/r/20250110010530.21872-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix kernel-doc comments for function return values",
v2.
This series fixes the inadequacies in the return value descriptions in
nilfs2's kernel-doc comments (mainly incorrect formatting), as well as the
lack of return value descriptions themselves, and fixes most of the
remaining warnings that are output when the kernel-doc script is run with
the "-Wall" option.
This patch (of 7):
In the kernel-doc comments for functions, there are many cases where the
format of the return value description is inaccurate, such as "Return
Value: ...", which causes many warnings to be output when the kernel-doc
script is executed with the "-Wall" option.
This fixes such incorrectly formatted return value descriptions for ioctl
functions.
Link: https://lkml.kernel.org/r/20250110010530.21872-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
nilfs_lookup_dirty_data_buffers(), which iterates through the buffers
attached to dirty data folios/pages, accesses the attached buffers without
locking the folios/pages.
For data cache, nilfs_clear_folio_dirty() may be called asynchronously
when the file system degenerates to read only, so
nilfs_lookup_dirty_data_buffers() still has the potential to cause use
after free issues when buffers lose the protection of their dirty state
midway due to this asynchronous clearing and are unintentionally freed by
try_to_free_buffers().
Eliminate this race issue by adjusting the lock section in this function.
Link: https://lkml.kernel.org/r/20250107200202.6432-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: protect busy buffer heads from being force-cleared".
This series fixes the buffer head state inconsistency issues reported by
syzbot that occurs when the filesystem is corrupted and falls back to
read-only, and the associated buffer head use-after-free issue.
This patch (of 2):
Syzbot has reported that after nilfs2 detects filesystem corruption and
falls back to read-only, inconsistencies in the buffer state may occur.
One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty()
to set a data or metadata buffer as dirty, but it detects that the buffer
is not in the uptodate state:
WARNING: CPU: 0 PID: 6049 at fs/buffer.c:1177 mark_buffer_dirty+0x2e5/0x520
fs/buffer.c:1177
...
Call Trace:
<TASK>
nilfs_palloc_commit_alloc_entry+0x4b/0x160 fs/nilfs2/alloc.c:598
nilfs_ifile_create_inode+0x1dd/0x3a0 fs/nilfs2/ifile.c:73
nilfs_new_inode+0x254/0x830 fs/nilfs2/inode.c:344
nilfs_mkdir+0x10d/0x340 fs/nilfs2/namei.c:218
vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257
do_mkdirat+0x264/0x3a0 fs/namei.c:4280
__do_sys_mkdirat fs/namei.c:4295 [inline]
__se_sys_mkdirat fs/namei.c:4293 [inline]
__x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The other is when nilfs_btree_propagate(), which propagates the dirty
state to the ancestor nodes of a b-tree that point to a dirty buffer,
detects that the origin buffer is not dirty, even though it should be:
WARNING: CPU: 0 PID: 5245 at fs/nilfs2/btree.c:2089
nilfs_btree_propagate+0xc79/0xdf0 fs/nilfs2/btree.c:2089
...
Call Trace:
<TASK>
nilfs_bmap_propagate+0x75/0x120 fs/nilfs2/bmap.c:345
nilfs_collect_file_data+0x4d/0xd0 fs/nilfs2/segment.c:587
nilfs_segctor_apply_buffers+0x184/0x340 fs/nilfs2/segment.c:1006
nilfs_segctor_scan_file+0x28c/0xa50 fs/nilfs2/segment.c:1045
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1216 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1540 [inline]
nilfs_segctor_do_construct+0x1c28/0x6b90 fs/nilfs2/segment.c:2115
nilfs_segctor_construct+0x181/0x6b0 fs/nilfs2/segment.c:2479
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2587 [inline]
nilfs_segctor_thread+0x69e/0xe80 fs/nilfs2/segment.c:2701
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Both of these issues are caused by the callbacks that handle the
page/folio write requests, forcibly clear various states, including the
working state of the buffers they hold, at unexpected times when they
detect read-only fallback.
Fix these issues by checking if the buffer is referenced before clearing
the page/folio state, and skipping the clear if it is.
Link: https://lkml.kernel.org/r/20250107200202.6432-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250107200202.6432-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b2b14916b77acf8626d7
Reported-by: syzbot+d98fd19acd08b36ff422@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Tested-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When block_invalidatepage was converted to block_invalidate_folio, the
fallback to block_invalidatepage in folio_invalidate() if the
address_space_operations method invalidatepage (currently
invalidate_folio) was not set, was removed.
Unfortunately, some pseudo-inodes in nilfs2 use empty_aops set by
inode_init_always_gfp() as is, or explicitly set it to
address_space_operations. Therefore, with this change,
block_invalidatepage() is no longer called from folio_invalidate(), and as
a result, the buffer_head structures attached to these pages/folios are no
longer freed via try_to_free_buffers().
Thus, these buffer heads are now leaked by truncate_inode_pages(), which
cleans up the page cache from inode evict(), etc.
Three types of caches use empty_aops: gc inode caches and the DAT shadow
inode used by GC, and b-tree node caches. Of these, b-tree node caches
explicitly call invalidate_mapping_pages() during cleanup, which involves
calling try_to_free_buffers(), so the leak was not visible during normal
operation but worsened when GC was performed.
Fix this issue by using address_space_operations with invalidate_folio set
to block_invalidate_folio instead of empty_aops, which will ensure the
same behavior as before.
Link: https://lkml.kernel.org/r/20241212164556.21338-1-konishi.ryusuke@gmail.com
Fixes: 7ba13abbd31e ("fs: Turn block_invalidatepage into block_invalidate_folio")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org> [5.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
syzbot reported a WARNING in nilfs_rmdir. [1]
Because the inode bitmap is corrupted, an inode with an inode number that
should exist as a ".nilfs" file was reassigned by nilfs_mkdir for "file0",
causing an inode duplication during execution. And this causes an
underflow of i_nlink in rmdir operations.
The inode is used twice by the same task to unmount and remove directories
".nilfs" and "file0", it trigger warning in nilfs_rmdir.
Avoid to this issue, check i_nlink in nilfs_iget(), if it is 0, it means
that this inode has been deleted, and iput is executed to reclaim it.
[1]
WARNING: CPU: 1 PID: 5824 at fs/inode.c:407 drop_nlink+0xc4/0x110 fs/inode.c:407
...
Call Trace:
<TASK>
nilfs_rmdir+0x1b0/0x250 fs/nilfs2/namei.c:342
vfs_rmdir+0x3a3/0x510 fs/namei.c:4394
do_rmdir+0x3b5/0x580 fs/namei.c:4453
__do_sys_rmdir fs/namei.c:4472 [inline]
__se_sys_rmdir fs/namei.c:4470 [inline]
__x64_sys_rmdir+0x47/0x50 fs/namei.c:4470
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Link: https://lkml.kernel.org/r/20241209065759.6781-1-konishi.ryusuke@gmail.com
Fixes: d25006523d0b ("nilfs2: pathname operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+9260555647a5132edd48@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9260555647a5132edd48
Tested-by: syzbot+9260555647a5132edd48@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzbot reported that when searching for records in a directory where the
inode's i_size is corrupted and has a large value, memory access outside
the folio/page range may occur, or a use-after-free bug may be detected if
KASAN is enabled.
This is because nilfs_last_byte(), which is called by nilfs_find_entry()
and others to calculate the number of valid bytes of directory data in a
page from i_size and the page index, loses the upper 32 bits of the 64-bit
size information due to an inappropriate type of local variable to which
the i_size value is assigned.
This caused a large byte offset value due to underflow in the end address
calculation in the calling nilfs_find_entry(), resulting in memory access
that exceeds the folio/page size.
Fix this issue by changing the type of the local variable causing the bit
loss from "unsigned int" to "u64". The return value of nilfs_last_byte()
is also of type "unsigned int", but it is truncated so as not to exceed
PAGE_SIZE and no bit loss occurs, so no change is required.
Link: https://lkml.kernel.org/r/20241119172403.9292-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624
Tested-by: syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of
task_struct.comm[]
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification
- The series "add detect count for hung tasks" from Lance Yang adds
more userspace visibility into the hung-task detector's activity
- Apart from that, singelton patches in many places - please see the
individual changelogs for details
* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
gdb: lx-symbols: do not error out on monolithic build
kernel/reboot: replace sprintf() with sysfs_emit()
lib: util_macros_kunit: add kunit test for util_macros.h
util_macros.h: fix/rework find_closest() macros
Improve consistency of '#error' directive messages
ocfs2: fix uninitialized value in ocfs2_file_read_iter()
hung_task: add docs for hung_task_detect_count
hung_task: add detect count for hung tasks
dma-buf: use atomic64_inc_return() in dma_buf_getfile()
fs/proc/kcore.c: fix coccinelle reported ERROR instances
resource: avoid unnecessary resource tree walking in __region_intersects()
ocfs2: remove unused errmsg function and table
ocfs2: cluster: fix a typo
lib/scatterlist: use sg_phys() helper
checkpatch: always parse orig_commit in fixes tag
nilfs2: convert metadata aops from writepage to writepages
nilfs2: convert nilfs_recovery_copy_block() to take a folio
nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
nilfs2: remove nilfs_writepage
nilfs2: convert checkpoint file to be folio-based
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pagecache updates from Christian Brauner:
"Cleanup filesystem page flag usage: This continues the work to make
the mappedtodisk/owner_2 flag available to filesystems which don't use
buffer heads. Further patches remove uses of Private2. This brings us
very close to being rid of it entirely"
* tag 'vfs-6.13.pagecache' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
migrate: Remove references to Private2
ceph: Remove call to PagePrivate2()
btrfs: Switch from using the private_2 flag to owner_2
mm: Remove PageMappedToDisk
nilfs2: Convert nilfs_copy_buffer() to use folios
fs: Move clearing of mappedtodisk to buffer.c
|
|
When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty()
may cause a NULL pointer dereference, or a general protection fault when
KASAN is enabled.
This happens because, since the tracepoint was added in
mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev
regardless of whether the buffer head has a pointer to a block_device
structure.
In the current implementation, nilfs_grab_buffer(), which grabs a buffer
to read (or create) a block of metadata, including b-tree node blocks,
does not set the block device, but instead does so only if the buffer is
not in the "uptodate" state for each of its caller block reading
functions. However, if the uptodate flag is set on a folio/page, and the
buffer heads are detached from it by try_to_free_buffers(), and new buffer
heads are then attached by create_empty_buffers(), the uptodate flag may
be restored to each buffer without the block device being set to
bh->b_bdev, and mark_buffer_dirty() may be called later in that state,
resulting in the bug mentioned above.
Fix this issue by making nilfs_grab_buffer() always set the block device
of the super block structure to the buffer head, regardless of the state
of the buffer's uptodate flag.
Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com
Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ubisectech Sirius <bugreport@valiantsec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints".
This series fixes null pointer dereference bugs that occur when using
nilfs2 and two block-related tracepoints.
This patch (of 2):
It has been reported that when using "block:block_touch_buffer"
tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a
NULL pointer dereference, or a general protection fault when KASAN is
enabled.
This happens because since the tracepoint was added in touch_buffer(), it
references the dev_t member bh->b_bdev->bd_dev regardless of whether the
buffer head has a pointer to a block_device structure. In the current
implementation, the block_device structure is set after the function
returns to the caller.
Here, touch_buffer() is used to mark the folio/page that owns the buffer
head as accessed, but the common search helper for folio/page used by the
caller function was optimized to mark the folio/page as accessed when it
was reimplemented a long time ago, eliminating the need to call
touch_buffer() here in the first place.
So this solves the issue by eliminating the touch_buffer() call itself.
Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com
Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Ubisectech Sirius <bugreport@valiantsec.com>
Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@valiantsec.com
Reported-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2
Tested-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
By implementing ->writepages instead of ->writepage, we remove a layer of
indirect function calls from the writeback path and the last use of struct
page in nilfs2.
[konishi.ryusuke@gmail.com: fixed panic by using buffer_migrate_folio_norefs]
Link: https://lkml.kernel.org/r/20241002150036.1339475-5-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-13-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use memcpy_to_folio() instead of open-coding it, and use offset_in_folio()
in case anybody wants to use nilfs2 on a device with large blocks.
[konishi.ryusuke@gmail.com: added label name change]
Link: https://lkml.kernel.org/r/20241002150036.1339475-4-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-12-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Both callers have a folio, so pass it in and use it directly.
[konishi.ryusuke@gmail.com: fixed a checkpatch warning about function declaration]
Link: https://lkml.kernel.org/r/20241002150036.1339475-3-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-11-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since nilfs2 has a ->writepages operation already, ->writepage is only
called by the migration code. If we add a ->migrate_folio operation, it
won't even be used for that and so it can be deleted.
[konishi.ryusuke@gmail.com: fixed panic by using buffer_migrate_folio_norefs]
Link: https://lkml.kernel.org/r/20241002150036.1339475-2-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-10-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Regarding the cpfile, a metadata file that manages checkpoints, convert
the page-based implementation to a folio-based implementation.
This change involves some helper functions to calculate byte offsets on
folios and removing a few helper functions that are no longer needed.
Link: https://lkml.kernel.org/r/20241024092602.13395-9-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
All calls to nilfs_palloc_block_get_entry() are now gone, so remove it.
Link: https://lkml.kernel.org/r/20241024092602.13395-8-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Regarding the DAT, a metadata file that manages virtual block addresses,
convert the page-based implementation to a folio-based implementation.
Link: https://lkml.kernel.org/r/20241024092602.13395-7-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Convert the page-based implementation of ifile, a metadata file that
manages inodes, to folio-based.
Link: https://lkml.kernel.org/r/20241024092602.13395-6-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Regarding the persistent oject allocator, a common mechanism for
allocating objects in metadata files such as inodes and DAT entries,
convert the page-based implementation to a folio-based implementation.
In this conversion, helper functions nilfs_palloc_group_desc_offset() and
nilfs_palloc_bitmap_offset() are added and used to calculate the byte
offset within a folio of a group descriptor structure and bitmap,
respectively, to replace kmap_local_page with kmap_local_folio.
In addition, a helper function called nilfs_palloc_entry_offset() is
provided to facilitate common calculation of the byte offset within a
folio of metadata file entries managed in the persistent object allocator
format.
Link: https://lkml.kernel.org/r/20241024092602.13395-5-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For the sufile, which is a metadata file that holds information about
managing segments, convert the page-based implementation to a folio-based
implementation.
kmap_local_page() is changed to use kmap_local_folio(), and where offsets
within a page are calculated using bh_offset(), are replaced with
calculations using offset_in_folio() with an additional helper function
nilfs_sufile_segment_usage_offset().
Link: https://lkml.kernel.org/r/20241024092602.13395-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the common routines for metadata files, nilfs_mdt_insert_new_block(),
which inserts a new block buffer into the cache, is still page-based, and
there are two places where bh_offset() is used. Convert these to
page-based.
Link: https://lkml.kernel.org/r/20241024092602.13395-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: Finish folio conversion".
This series converts all remaining page structure references in nilfs2 to
folio-based, except for nilfs_copy_buffer function, which was converted to
use folios in advance for cross-fs page flags cleanup.
This prioritizes folio conversion, and does not include buffer head
reference reduction, nor does it support for block sizes larger than the
system page size.
The first eight patches in this series mainly convert each of the
nilfs2-specific metadata implementations to use folios. The last four
patches, by Matthew Wilcox, eliminate aops writepage callbacks and convert
the remaining page structure references to folio-based. This part
reflects some corrections to the patch series posted by Matthew.
This patch (of 12):
In the segment buffer (log buffer) implementation, two parts of the block
buffer, CRC calculation and bio preparation, are still page-based, so
convert them to folio-based.
Link: https://lkml.kernel.org/r/20241024092602.13395-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20241024092602.13395-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzbot reported that page_symlink(), called by nilfs_symlink(), triggers
memory reclamation involving the filesystem layer, which can result in
circular lock dependencies among the reader/writer semaphore
nilfs->ns_segctor_sem, s_writers percpu_rwsem (intwrite) and the
fs_reclaim pseudo lock.
This is because after commit 21fc61c73c39 ("don't put symlink bodies in
pagecache into highmem"), the gfp flags of the page cache for symbolic
links are overwritten to GFP_KERNEL via inode_nohighmem().
This is not a problem for symlinks read from the backing device, because
the __GFP_FS flag is dropped after inode_nohighmem() is called. However,
when a new symlink is created with nilfs_symlink(), the gfp flags remain
overwritten to GFP_KERNEL. Then, memory allocation called from
page_symlink() etc. triggers memory reclamation including the FS layer,
which may call nilfs_evict_inode() or nilfs_dirty_inode(). And these can
cause a deadlock if they are called while nilfs->ns_segctor_sem is held:
Fix this issue by dropping the __GFP_FS flag from the page cache GFP flags
of newly created symlinks in the same way that nilfs_new_inode() and
__nilfs_read_inode() do, as a workaround until we adopt nofs allocation
scope consistently or improve the locking constraints.
Link: https://lkml.kernel.org/r/20241020050003.4308-1-konishi.ryusuke@gmail.com
Fixes: 21fc61c73c39 ("don't put symlink bodies in pagecache into highmem")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9ef37ac20608f4836256
Tested-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"afs:
- Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock
netfs:
- Drop the references to a folio immediately after the folio has been
extracted to prevent races with future I/O collection
- Fix a documenation build error
- Downgrade the i_rwsem for buffered writes to fix a cifs reported
performance regression when switching to netfslib
vfs:
- Explicitly return -E2BIG from openat2() if the specified size is
unexpectedly large. This aligns openat2() with other extensible
struct based system calls
- When copying a mount namespace ensure that we only try to remove
the new copy from the mount namespace rbtree if it has already been
added to it
nilfs:
- Clear the buffer delay flag when clearing the buffer state clags
when a buffer head is discarded to prevent a kernel OOPs
ocfs2:
- Fix an unitialized value warning in ocfs2_setattr()
proc:
- Fix a kernel doc warning"
* tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
proc: Fix W=1 build kernel-doc warning
afs: Fix lock recursion
fs: Fix uninitialized value issue in from_kuid and from_kgid
fs: don't try and remove empty rbtree node
netfs: Downgrade i_rwsem for a buffered write
nilfs2: fix kernel bug due to missing clearing of buffer delay flag
openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
netfs: fix documentation build error
netfs: In readahead, put the folio refs as soon extracted
|
|
Syzbot reported that a task hang occurs in vcs_open() during a fuzzing
test for nilfs2.
The root cause of this problem is that in nilfs_find_entry(), which
searches for directory entries, ignores errors when loading a directory
page/folio via nilfs_get_folio() fails.
If the filesystem images is corrupted, and the i_size of the directory
inode is large, and the directory page/folio is successfully read but
fails the sanity check, for example when it is zero-filled,
nilfs_check_folio() may continue to spit out error messages in bursts.
Fix this issue by propagating the error to the callers when loading a
page/folio fails in nilfs_find_entry().
The current interface of nilfs_find_entry() and its callers is outdated
and cannot propagate error codes such as -EIO and -ENOMEM returned via
nilfs_find_entry(), so fix it together.
Link: https://lkml.kernel.org/r/20241004033640.6841-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Lizhi Xu <lizhi.xu@windriver.com>
Closes: https://lkml.kernel.org/r/20240927013806.3577931-1-lizhi.xu@windriver.com
Reported-by: syzbot+8a192e8d090fa9a31135@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8a192e8d090fa9a31135
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|