summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-07-19ocfs2: avoid NULL pointer dereference in dx_dir_lookup_rec()Ivan Pravdin
When a directory entry is not found, ocfs2_dx_dir_lookup_rec() prints an error message that unconditionally dereferences the 'rec' pointer. However, if 'rec' is NULL, this leads to a NULL pointer dereference and a kernel panic. Add an explicit check empty extent list to avoid dereferencing NULL 'rec' pointer. Link: https://lkml.kernel.org/r/20250708001009.372263-1-ipravdin.official@gmail.com Reported-by: syzbot+20282c1b2184a857ac4c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67cd7e29.050a0220.e1a89.0007.GAE@google.com/ Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19ocfs2/dlm: fix "take a while" typoAhelenia Ziemiańska
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm, vmstat: remove the NR_WRITEBACK_TEMP node_stat_item counterVlastimil Babka
The only user of the counter (FUSE) was removed in commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") so follow the established pattern of removing the counter and hardcoding 0 in meminfo output, as done recently with NR_BOUNCE. Update documentation for procfs, including for the value for Bounce that was missed when removing its counter. Also remove the mention of NR_WRITEBACK_TEMP implications from a comment in wb_position_ratio(). The rest of the comment there about fuse setting bdi->max_ratio to 1% is still correct. [vbabka@suse.cz: v2] Link: https://lkml.kernel.org/r/5a848e15-6a57-4ecb-a015-d4f358b8a5d3@suse.cz Link: https://lkml.kernel.org/r/20250625-nr_writeback_removal-v1-1-7f2a0df70faa@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joanne Koong <joannelkoong@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Maxim Patlasov <mpatlasov@parallels.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19Merge tag 'efi-fixes-for-v6.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: - Fix potential memory leak reported by kmemleak * tag 'efi-fixes-for-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efivarfs: Fix memory leak of efivarfs_fs_info in fs_context error paths
2025-07-19Merge tag 'vfs-6.16-rc7.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a memory leak in fcntl_dirnotify() - Raise SB_I_NOEXEC on secrement superblock instead of messing with flags on the mount - Add fsdevel and block mailing lists to uio entry. We had a few instances were very questionable stuff was added without either block or the VFS being aware of it - Fix netfs copy-to-cache so that it performs collection with ceph+fscache - Fix netfs race between cache write completion and ALL_QUEUED being set - Verify the inode mode when loading entries from disk in isofs - Avoid state_lock in iomap_set_range_uptodate() - Fix PIDFD_INFO_COREDUMP check in PIDFD_GET_INFO ioctl - Fix the incorrect return value in __cachefiles_write() * tag 'vfs-6.16-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: add block and fsdevel lists to iov_iter netfs: Fix race between cache write completion and ALL_QUEUED being set netfs: Fix copy-to-cache so that it performs collection with ceph+fscache fix a leak in fcntl_dirnotify() iomap: avoid unnecessary ifs_set_range_uptodate() with locks isofs: Verify inode mode when loading from disk cachefiles: Fix the incorrect return value in __cachefiles_write() secretmem: use SB_I_NOEXEC coredump: fix PIDFD_INFO_COREDUMP ioctl check
2025-07-18Merge tag 'v6.16-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - fix creating special files to Samba when using SMB3.1.1 POSIX Extensions - fix incorrect caching on new file creation with directory leases enabled - two use after free fixes: one in oplock_break and one in async decryption * tag 'v6.16-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: Fix SMB311 posix special file creation to servers which do not advertise reparse support smb: invalidate and close cached directory when creating child entries smb: client: fix use-after-free in crypt_message when using async crypto smb: client: fix use-after-free in cifs_oplock_break
2025-07-18net: s/dev_get_flags/netif_get_flags/Stanislav Fomichev
Commit cc34acd577f1 ("docs: net: document new locking reality") introduced netif_ vs dev_ function semantics: the former expects locked netdev, the latter takes care of the locking. We don't strictly follow this semantics on either side, but there are more dev_xxx handlers now that don't fit. Rename them to netif_xxx where appropriate. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250717172333.1288349-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6Alexei Starovoitov
Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-18btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmdCaleb Sander Mateos
btrfs is the only user of struct io_uring_cmd_data and its op_data field. Switch its ->uring_cmd() implementations to store the struct btrfs_uring_encoded_data * in the struct io_btrfs_cmd, overlayed with io_uring_cmd's pdu field. This avoids having to touch another cache line to access the struct btrfs_uring_encoded_data *, and allows op_data and struct io_uring_cmd_data to be removed. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Acked-by: David Sterba <dsterba@suse.com> Link: https://lore.kernel.org/r/20250708202212.2851548-4-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-18Fix SMB311 posix special file creation to servers which do not advertise ↵Steve French
reparse support Some servers (including Samba), support the SMB3.1.1 POSIX Extensions (which use reparse points for handling special files) but do not properly advertise file system attribute FILE_SUPPORTS_REPARSE_POINTS. Although we don't check for this attribute flag when querying special file information, we do check it when creating special files which causes them to fail unnecessarily. If we have negotiated SMB3.1.1 POSIX Extensions with the server we can expect the server to support creating special files via reparse points, and even if the server fails the operation due to really forbidding creating special files, then it should be no problem and is more likely to return a more accurate rc in any case (e.g. EACCES instead of EOPNOTSUPP). Allow creating special files as long as the server supports either reparse points or the SMB3.1.1 POSIX Extensions (note that if the "sfu" mount option is specified it uses a different way of storing special files that does not rely on reparse points). Cc: <stable@vger.kernel.org> Fixes: 6c06be908ca19 ("cifs: Check if server supports reparse points before using them") Acked-by: Ralph Boehme <slow@samba.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-07-18Merge tag 'xfs-fixes-6.16-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: "This contains mostly code clean up, refactoring and comments modification. The most important patch in this series is the last one that removes an unnecessary data structure allocation of xfs busy extents which might lead to a memory leak on the zoned allocator code" * tag 'xfs-fixes-6.16-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't allocate the xfs_extent_busy structure for zoned RTGs xfs: remove the bt_bdev_file buftarg field xfs: rename the bt_bdev_* buftarg fields xfs: refactor xfs_calc_atomic_write_unit_max xfs: add a xfs_group_type_buftarg helper xfs: remove the call to sync_blockdev in xfs_configure_buftarg xfs: clean up the initial read logic in xfs_readsb xfs: replace strncpy with memcpy in xattr listing
2025-07-18xfs: don't allocate the xfs_extent_busy structure for zoned RTGsChristoph Hellwig
Busy extent tracking is primarily used to ensure that freed blocks are not reused for data allocations before the transaction that deleted them has been committed to stable storage, and secondarily to drive online discard. None of the use cases applies to zoned RTGs, as the zoned allocator can't overwrite blocks before resetting the zone, which already flushes out all transactions touching the RTGs. So the busy extent tracking is not needed for zoned RTGs, and also not called for zoned RTGs. But somehow the code to skip allocating and freeing the structure got lost during the zoned XFS upstreaming process. This not only causes these structures to unnecessarily allocated, but can also lead to memory leaks as the xg_busy_extents pointer in the xfs_group structure is overlayed with the pointer for the linked list of to be reset zones. Stop allocating and freeing the structure to not pointlessly allocate memory which is then leaked when the zone is reset. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v6.15 [cem: Fix type and add stable tag] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-18efivarfs: Fix memory leak of efivarfs_fs_info in fs_context error pathsBreno Leitao
When processing mount options, efivarfs allocates efivarfs_fs_info (sfi) early in fs_context initialization. However, sfi is associated with the superblock and typically freed when the superblock is destroyed. If the fs_context is released (final put) before fill_super is called—such as on error paths or during reconfiguration—the sfi structure would leak, as ownership never transfers to the superblock. Implement the .free callback in efivarfs_context_ops to ensure any allocated sfi is properly freed if the fs_context is torn down before fill_super, preventing this memory leak. Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com> Fixes: 5329aa5101f73c ("efivarfs: Add uid/gid mount options") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-07-18ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()NeilBrown
The only remaining user of ovl_cleanup() is ovl_cleanup_locked(), so we no longer need both. This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it static. ovl_cleanup_unlocked() is renamed to ovl_cleanup(). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_create_real() to receive dentry parentNeilBrown
Instead of passing an inode *dir, pass a dentry *parent. This makes the calling slightly cleaner. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-21-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_check_rename_whiteout()NeilBrown
ovl_check_rename_whiteout() now only holds the directory lock when needed, and takes it again if necessary. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-20-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_whiteout()NeilBrown
ovl_whiteout() relies on the workdir i_rwsem to provide exclusive access to ofs->whiteout which it manipulates. Rather than depending on this, add a new mutex, "whiteout_lock" to explicitly provide the required locking. Use guard(mutex) for this so that we can return without needing to explicitly unlock. Then take the lock on workdir only when needed - to lookup the temp name and to do the whiteout or link. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-19-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_cleanup_and_whiteout() to take rename lock as neededNeilBrown
Rather than locking the directory(s) before calling ovl_cleanup_and_whiteout(), change it (and ovl_whiteout()) to do the locking, so the locking can be fine grained as will be needed for proposed locking changes. Sometimes this is called to whiteout something in the index dir, in which case only that dir must be locked. In one case it is called on something in an upperdir, so two directories must be locked. We use ovl_lock_rename_workdir() for this and remove the restriction that upperdir cannot be indexdir - because now sometimes it is. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-18-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking on ovl_remove_and_whiteout()NeilBrown
This code: performs a lookup_upper creates a whiteout object renames the whiteout over the result of the lookup The create and the rename must be locked separately for proposed directory locking changes. This patch takes a first step of moving the lookup out of the locked region. A subsequent patch will separate the create from the rename. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-17-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_workdir_cleanup() to take dir lock as needed.NeilBrown
Rather than calling ovl_workdir_cleanup() with the dir already locked, change it to take the dir lock only when needed. Also change ovl_workdir_cleanup() to take a dentry for the parent rather than an inode. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-16-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_workdir_cleanup_recurse()NeilBrown
Only take the dir lock when needed, rather than for the whole loop. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-15-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_indexdir_cleanup()NeilBrown
Instead of taking the directory lock for the whole cleanup, only take it when needed. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-14-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_workdir_create()NeilBrown
In ovl_workdir_create() don't hold the dir lock for the whole time, but only take it when needed. It now gets taken separately for ovl_workdir_cleanup(). A subsequent patch will move the locking into that function. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-13-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_cleanup_index()NeilBrown
ovl_cleanup_index() takes a lock on the directory and then does a lookup and possibly one of two different cleanups. This patch narrows the locking to use the _unlocked() versions of the lookup and one cleanup, and just takes the lock for the other cleanup. A subsequent patch will take the lock into the cleanup. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-12-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_cleanup_whiteouts()NeilBrown
Rather than lock the directory for the whole operation, use ovl_lookup_upper_unlocked() and ovl_cleanup_unlocked() to take the lock only when needed. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-11-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_rename()NeilBrown
Drop the rename lock immediately after the rename, and use ovl_cleanup_unlocked() for cleanup. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-10-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: simplify gotos in ovl_rename()NeilBrown
Rather than having three separate goto label: out_unlock, out_dput_old, and out_dput, make use of that fact that dput() happily accepts a NULL pointer to reduce this to just one goto label: out_unlock. olddentry and newdentry are initialised to NULL and only set once a value dentry is found. They are then dput() late in the function. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-9-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_create_over_whiteout()NeilBrown
Unlock the parents immediately after the rename, and use ovl_cleanup_unlocked() for cleanup, which takes a separate lock. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-8-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_clear_empty()NeilBrown
Drop the locks immediately after rename, and use a separate lock for cleanup. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Note that ovl_cleanup_whiteouts() operates on "upper", a child of "upperdir" and does not require upperdir or workdir to be locked. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-7-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_create_upper()NeilBrown
Drop the directory lock immediately after the ovl_create_real() call and take a separate lock later for cleanup in ovl_cleanup_unlocked() - if needed. This makes way for future changes where locks are taken on individual dentries rather than the whole directory. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-6-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow the locked region in ovl_copy_up_workdir()NeilBrown
In ovl_copy_up_workdir() unlock immediately after the rename. There is nothing else in the function that needs the lock. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-5-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: Call ovl_create_temp() without lock held.NeilBrown
ovl currently locks a directory or two and then performs multiple actions in one or both directories. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves calls to ovl_create_temp() out of the locked regions and has it take and release the relevant lock itself. The lock that was taken before this function was called is now taken after. This means that any code between where the lock was taken and ovl_create_temp() is now unlocked. This necessitates the use of ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked(). These will be used more widely in future patches. Now that the file is created before the lock is taken for rename, we need to ensure the parent wasn't changed before the lock was gained. ovl_lock_rename_workdir() is changed to optionally receive the dentries that will be involved in the rename. If either is present but has the wrong parent, an error is returned. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_create_index() to take dir locksNeilBrown
ovl_copy_up_workdir() currently take a rename lock on two directories, then use the lock to both create a file in one directory, perform a rename, and possibly unlink the file for cleanup. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves the call to ovl_create_index() earlier in ovl_copy_up_workdir() to before the lock is taken. ovl_create_index() then takes the required lock only when needed. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-3-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: simplify an error path in ovl_copy_up_workdir()NeilBrown
If ovl_copy_up_data() fails the error is not immediately handled but the code continues on to call ovl_start_write() and lock_rename(), presumably because both of these locks are needed for the cleanup. Only then (if the lock was successful) is the error checked. This makes the code a little hard to follow and could be fragile. This patch changes to handle the error after the ovl_start_write() (which cannot fail, so there aren't multiple errors to deail with). A new ovl_cleanup_unlocked() is created which takes the required directory lock. This will be used extensively in later patches. In general we need to check the parent is still correct after taking the lock (as ovl_copy_up_workdir() does after a successful lock_rename()) so that is included in ovl_cleanup_unlocked() using new ovl_parent_lock() and ovl_parent_unlock() calls (it is planned to move this API into VFS code eventually, though in a slightly different form). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-2-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: support layers on case-folding capable filesystemsAmir Goldstein
Case folding is often applied to subtrees and not on an entire filesystem. Disallowing layers from filesystems that support case folding is over limiting. Replace the rule that case-folding capable are not allowed as layers with a rule that case folded directories are not allowed in a merged directory stack. Should case folding be enabled on an underlying directory while overlayfs is mounted the outcome is generally undefined. Specifically in ovl_lookup(), we check the base underlying directory and fail with -ESTALE and write a warning to kmsg if an underlying directory case folding is enabled. Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/linux-fsdevel/20250520051600.1903319-1-kent.overstreet@linux.dev/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250602171702.1941891-1-amir73il@gmail.com Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: remove unneeded non-const conversionAmir Goldstein
file_user_path() now takes a const file ptr. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250607115304.2521155-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18fs: constify file ptr in backing_file accessor helpersAmir Goldstein
Add internal helper backing_file_set_user_path() for the only two cases that need to modify backing_file fields. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250607115304.2521155-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-17ext4: refactor the inline directory conversion and new directory codepathsTheodore Ts'o
There was a lot of common code in the codepaths used to convert an inline directory and to creaet a new directory. To address this, rename ext4_init_dot_dotdot() to ext4_init_dirblock() and then move common code into that function. This reduces the lines of code count in fs/ext4/inline.c and fs/ext4/namei.c, as well as reducing the size of their object files. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20250712181249.434530-3-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-17ext4: use memcpy() instead of strcpy()Theodore Ts'o
The strcpy() function is considered dangerous and eeeevil by people who are using sophisticated code analysis tools such as "grep". This is true even when a quick inspection would show that the source is a constant string ("." or "..") and the destination is a fixed array which is guaranteed to have enough space. Make the "grep" code analysis tool happy by using memcpy() isstead of strcpy(). :-) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20250712181249.434530-2-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-17ext4: replace strcmp with direct comparison for '.' and '..'Theodore Ts'o
In a discussion over a proposed patch, "ext4: replace strcpy() with '.' assignment"[1], I had asserted that directory entries in ext4 were not NUL terminated, and hence it was safe to replace strcpy() with a direct assignment. As it turns out, this was incorrect. It's true for all all directory entries *except* for '.' and '..' where the kernel was using strcmp() and where e2fsck actually checks and offers to fix things if '.' and '..' are not NUL terminated. [1] https://lore.kernel.org/r/202505191316.JJMnPobO-lkp@intel.com We can't change this without breaking old kernel versions, but in the spirit of "be liberal in what you receive", use direct comparison of de->name_len and de->name[0,1] instead of strcmp(). This has the side benefit of reducing the compiled text size by 96 bytes on x86_64. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20250712181249.434530-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-17ext4: Make sure BH_New bit is cleared in ->write_end handlerJan Kara
Currently we clear BH_New bit in case of error and also in the standard ext4_write_end() handler (in block_commit_write()). However ext4_journalled_write_end() misses this clearing and thus we are leaving stale BH_New bits behind. Generally ext4_block_write_begin() clears these bits before any harm can be done but in case blocksize < pagesize and we hit some error when processing a page with these stale bits, we'll try to zero buffers with these stale BH_New bits and jbd2 will complain (as buffers were not prepared for writing in this transaction). Fix the problem by clearing BH_New bits in ext4_journalled_write_end() and WARN if ext4_block_write_begin() sees stale BH_New bits. Reported-by: Baolin Liu <liubaolin12138@163.com> Reported-by: Zhi Long <longzhi@sangfor.com.cn> Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250709084831.23876-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-17ext4: fix inode use after free in ext4_end_io_rsv_work()Baokun Li
In ext4_io_end_defer_completion(), check if io_end->list_vec is empty to avoid adding an io_end that requires no conversion to the i_rsv_conversion_list, which in turn prevents starting an unnecessary worker. An ext4_emergency_state() check is also added to avoid attempting to abort the journal in an emergency state. Additionally, ext4_put_io_end_defer() is refactored to call ext4_io_end_defer_completion() directly instead of being open-coded. This also prevents starting an unnecessary worker when EXT4_IO_END_FAILED is set but data_err=abort is not enabled. This ensures that the check in ext4_put_io_end_defer() is consistent with the check in ext4_end_bio(). Otherwise, we might add an io_end to the i_rsv_conversion_list and then call ext4_finish_bio(), after which the inode could be freed before ext4_end_io_rsv_work() is called, triggering a use-after-free issue. Fixes: ce51afb8cc5e ("ext4: abort journal on data writeback failure if in data_err=abort mode") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250708111504.3208660-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-17binfmt_elf: remove the 4k limitation of program header sizeYin Fengwei
We have assembly code generated by a script. GCC successfully compiles it. However, the kernel cannot load it on an ARM64 platform with a 4K page size. In contrast, the same ELF file loads correctly on the same platform with a 64K page size. The root cause is the Linux kernel's ELF_MIN_ALIGN limitation on the program headers of ELF files. The ELF file contains 78 program headers (the script inserts many holes when generating the assembly code). On ARM64 with a 4K page size, the ELF_MIN_ALLIGN enforces a maximum of 74 program headers, causing the ELF file to fail. However, with a 64K page size, the ELF_MIN_ALIGN is relaxed to over 1,184 program headers, allowing the file to run correctly. Cook kindly identified[1] that this limitation was introduced in Linux-0.99.15f without an explanation for its purpose. The ELF specification does not impose such a restriction on program headers. Removing the ELF_MIN_ALIGN limitation on program headers to align with the ELF spec. After removing ELF_MIN_ALIGN limitation, 64K size limitation still exist which should be sufficient. Suggested-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/linux-mm/202506270854.A729825@keescook/ [1] Signed-off-by: Yin Fengwei <fengwei_yin@linux.alibaba.com> Link: https://lore.kernel.org/r/20250717110108.55586-1-fengwei_yin@linux.alibaba.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") af52020fc599 ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c a44312d58e78 ("net: phy: Don't register LEDs for genphy") f0f2b992d818 ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c 5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17ext4: Refactor breaking condition for xattr_find_entry()I Hsin Cheng
Refactor the condition for breaking the loop within xattr_find_entry(). Elimate the usage of "<=" and take condition shortcut when "!cmp" is true. Originally, the condition was "(cmp <= 0 && (sorted || cmp == 0))", which means after it knows "cmp <= 0" is true, it has to check the value of "sorted" and "cmp". The checking of "cmp" here would be redundant since it has already checked it. Observing from the logic, when "cmp == 0" the branch is going to be true, no need to check "cmp == 0" again, so we only need to take shortcut when "cmp == 0", on the other hand, we'll check "sorted" when "cmp < 0". The refactor can shrink the generated code size by 44 bytes. Numerous instructions can be saved thus should also benefit execution efficiency as well. $ ./scripts/bloat-o-meter vmlinux_old vmlinux_new add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44) Function old new delta xattr_find_entry 300 256 -44 Total: Before=22989434, After=22989390, chg -0.00% The test is done on kernel version 6.16 with x86_64 defconfig and gcc 13.3.0. Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Link: https://patch.msgid.link/20250708020013.175728-1-richard120310@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-17ilog2: add max_pow_of_two_factor()John Garry
Relocate the function max_pow_of_two_factor() to common ilog2.h from the xfs code, as it will be used elsewhere. Also simplify the function, as advised by Mikulas Patocka. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250711105258.3135198-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-17fuse: refactor writeback to use iomap_writepage_ctx inodeJoanne Koong
struct iomap_writepage_ctx includes a pointer to the file inode. In writeback, use that instead of also passing the inode into fuse_fill_wb_data. No functional changes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/20250715202122.2282532-6-joannelkoong@gmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-17fuse: hook into iomap for invalidating and checking partial uptodatenessJoanne Koong
Hook into iomap_invalidate_folio() so that if the entire folio is being invalidated during truncation, the dirty state is cleared and the folio doesn't get written back. As well the folio's corresponding ifs struct will get freed. Hook into iomap_is_partially_uptodate() since iomap tracks uptodateness granularly when it does buffered writes. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/20250715202122.2282532-5-joannelkoong@gmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-17fuse: use iomap for folio launderingJoanne Koong
Use iomap for folio laundering, which will do granular dirty writeback when laundering a large folio. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/20250715202122.2282532-4-joannelkoong@gmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-17fuse: use iomap for writebackJoanne Koong
Use iomap for dirty folio writeback in ->writepages(). This allows for granular dirty writeback of large folios. Only the dirty portions of the large folio will be written instead of having to write out the entire folio. For example if there is a 1 MB large folio and only 2 bytes in it are dirty, only the page for those dirty bytes will be written out. .dirty_folio needs to be set to iomap_dirty_folio so that the bitmap iomap uses for dirty tracking correctly reflects dirty regions that need to be written back. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/20250715202122.2282532-3-joannelkoong@gmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>