summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-03-10Merge branch 'vfs-6.15.iomap' of ↵Carlos Maiolino
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs into xfs-6.15-merge XFS code for 6.15 depends on patches within iomap. Merge them before pulling in XFS code. Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: Use abs_diff instead of XFS_ABSDIFFMatthew Wilcox (Oracle)
We have a central definition for this function since 2023, used by a number of different parts of the kernel. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10Merge patch series "pipe: Trivial cleanups"Christian Brauner
K Prateek Nayak <kprateek.nayak@amd.com> says: Based on the suggestion on the RFC, the treewide conversion of references to pipe->{head,tail} from unsigned int to pipe_index_t has been dropped for now. The series contains trivial cleanup suggested to limit the nr_slots in pipe_resize_ring() to be covered between pipe_index_t limits of pipe->{head,tail} and using pipe_buf() to remove the open-coded usage of masks to access pipe buffer building on Linus' cleanup of fs/fuse/dev.c in commit ebb0f38bb47f ("fs/pipe: fix pipe buffer index use in FUSE") * patches from https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com: fs/splice: Use pipe_buf() helper to retrieve pipe buffer fs/pipe: Use pipe_buf() helper to retrieve pipe buffer kernel/watch_queue: Use pipe_buf() to retrieve the pipe buffer fs/pipe: Limit the slots in pipe_resize_ring() Link: https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10fs/splice: Use pipe_buf() helper to retrieve pipe bufferK Prateek Nayak
Use pipe_buf() helper to retrieve the pipe buffer throughout the file replacing the open-coded the logic. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250307052919.34542-5-kprateek.nayak@amd.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10fs/pipe: Use pipe_buf() helper to retrieve pipe bufferK Prateek Nayak
Use pipe_buf() helper to retrieve the pipe buffer throughout the file replacing the open-coded the logic. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250307052919.34542-4-kprateek.nayak@amd.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10fs/pipe: Limit the slots in pipe_resize_ring()K Prateek Nayak
Limit the number of slots in pipe_resize_ring() to the maximum value representable by pipe->{head,tail}. Values beyond the max limit can lead to incorrect pipe occupancy related calculations where the pipe will never appear full. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250307052919.34542-2-kprateek.nayak@amd.com Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10Merge mainline pipe changesChristian Brauner
Mainline now contains various changes to pipes that are relevant for other pipe work this cycle. So merge them into the respective VFS tree. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "33 hotfixes. 24 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 26 are for MM and 7 are for non-MM. - "mm: memory_failure: unmap poisoned folio during migrate properly" from Ma Wupeng fixes a couple of two year old bugs involving the migration of hwpoisoned folios. - "selftests/damon: three fixes for false results" from SeongJae Park fixes three one year old bugs in the SAMON selftest code. The remainder are singletons and doubletons. Please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits) mm/page_alloc: fix uninitialized variable rapidio: add check for rio_add_net() in rio_scan_alloc_net() rapidio: fix an API misues when rio_add_net() fails MAINTAINERS: .mailmap: update Sumit Garg's email address Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone" mm: fix finish_fault() handling for large folios mm: don't skip arch_sync_kernel_mappings() in error paths mm: shmem: remove unnecessary warning in shmem_writepage() userfaultfd: fix PTE unmapping stack-allocated PTE copies userfaultfd: do not block on locking a large folio with raised refcount mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages mm: shmem: fix potential data corruption during shmem swapin mm: fix kernel BUG when userfaultfd_move encounters swapcache selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms selftests/damon/damos_quota: make real expectation of quota exceeds include/linux/log2.h: mark is_power_of_2() with __always_inline NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback mm, swap: avoid BUG_ON in relocate_cluster() mm: swap: use correct step in loop to wait all clusters in wait_for_allocation() ...
2025-03-08f2fs: control nat_bits feature via mount optionChao Yu
Introduce a new mount option "nat_bits" to control nat_bits feature, by default nat_bits feature is disabled. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-08vfs: Remove invalidate_inodes()Jan Kara
The function can be replaced by evict_inodes. The only difference is that evict_inodes() skips the inodes with positive refcount without touching ->i_lock, but they are equivalent as evict_inodes() repeats the refcount check after having grabbed ->i_lock. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-07binfmt_elf_fdpic: fix variable set but not used warningsunliming
Fix below kernel warning: fs/binfmt_elf_fdpic.c:1024:52: warning: variable 'excess1' set but not used [-Wunused-but-set-variable] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: sunliming <sunliming@kylinos.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250308022754.75013-1-sunliming@linux.dev Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-07Merge tag 'execve-v6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull core dumping fix from Kees Cook: - Only sort VMAs when core_sort_vma sysctl is set * tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: coredump: Only sort VMAs when core_sort_vma sysctl is set
2025-03-07Merge tag 'for-6.14-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix leaked extent map after error when reading chunks - replace use of deprecated strncpy - in zoned mode, fixed range when ulocking extent range, causing a hang * tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix a leaked chunk map issue in read_one_chunk() btrfs: replace deprecated strncpy() with strscpy() btrfs: zoned: fix extent range end unlock in cow_file_range()
2025-03-07bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()Luis Chamberlain
The commit titled "block/bdev: lift block size restrictions to 64k" lifted the block layer's max supported block size to 64k inside the helper blk_validate_block_size() now that we support large folios. However in lifting the block size we also removed the silly use cases many filesystems have to use sb_set_blocksize() to *verify* that the block size <= PAGE_SIZE. The call to sb_set_blocksize() was used to check the block size <= PAGE_SIZE since historically we've always supported userspace to create for example 64k block size filesystems even on 4k page size systems, but what we didn't allow was mounting them. Older filesystems have been using the check with sb_set_blocksize() for years. While, we could argue that such checks should be filesystem specific, there are much more users of sb_set_blocksize() than LBS enabled filesystem on upstream, so just do the easier thing and bring back the PAGE_SIZE check for sb_set_blocksize() users and only skip it for LBS enabled filesystems. This will ensure that tests such as generic/466 when run in a loop against say, ext4, won't try to try to actually mount a filesystem with a block size larger than your filesystem supports given your PAGE_SIZE and in the worst case crash. Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20250307020403.3068567-1-mcgrof@kernel.org Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-07efivarfs: Revert "allow creation of zero length files"Ard Biesheuvel
As agreed with the fwupd/LVFS maintainer, this reverts commit fc20737d8b85691ecabab3739ed7d06c9b7bc00f again for the v6.15 cycle, leaving them sufficient time to roll out a fix for the issue that the reverted commit works around. Link: https://lore.kernel.org/all/63837c36eceaf8cf2af7933dccca54ff4dd9f30d.camel@HansenPartnership.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-03-06fs/pipe: add simpler helpers for common casesLinus Torvalds
The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the involved types. It turns out that we don't have _that_ many places that access these fields directly and were affected, but we have more than we strictly should have, because our low-level helper functions have been designed to have intimate knowledge of how the pipes work. And as a result, that random noise of direct 'pipe->head' and 'pipe->tail' accesses makes it harder to pinpoint any actual potential problem spots remaining. For example, we didn't have a "is the pipe full" helper function, but instead had a "given these pipe buffer indexes and this pipe size, is the pipe full". That's because some low-level pipe code does actually want that much more complicated interface. But most other places literally just want a "is the pipe full" helper, and not having it meant that those places ended up being unnecessarily much too aware of this all. It would have been much better if only the very core pipe code that cared had been the one aware of this all. So let's fix it - better late than never. This just introduces the trivial wrappers for "is this pipe full or empty" and to get how many pipe buffers are used, so that instead of writing if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) the places that literally just want to know if a pipe is full can just say if (pipe_is_full(pipe)) instead. The existing trivial cases were converted with a 'sed' script. This cuts down on the places that access pipe->head and pipe->tail directly outside of the pipe code (and core splice code) quite a lot. The splice code in particular still revels in doing the direct low-level accesses, and the fuse fuse_dev_splice_write() code also seems a bit unnecessarily eager to go very low-level, but it's at least a bit better than it used to be. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06Merge tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - Fix a compatibility issue: we shouldn't be setting incompat feature bits unless explicitly requested - Fix another bug where the journal alloc/resize path could spuriously fail with -BCH_ERR_open_buckets_empty - Copygc shouldn't run on read-only devices: fragmentation isn't an issue if we're not currently writing to a given device, and it may not have anywhere to move the data to * tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefs: bcachefs: copygc now skips non-rw devices bcachefs: Fix bch2_dev_journal_alloc() spuriously failing bcachefs: Don't set BCH_FEATURE_incompat_version_field unless requested
2025-03-06bcachefs: copygc now skips non-rw devicesKent Overstreet
There's no point in doing copygc on non-rw devices: the fragmentation doesn't matter if we're not writing to them, and we may not have anywhere to put the data on our other devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-06bcachefs: Fix bch2_dev_journal_alloc() spuriously failingKent Overstreet
Previously, we fixed journal resize spuriousl failing with -BCH_ERR_open_buckets_empty, but initial journal allocation was missed because it didn't invoke the "block on allocator" loop at all. Factor out the "loop on allocator" code to fix that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: net/ethtool/cabletest.c 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock") 637399bf7e77 ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device") No Adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06Merge tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb fixes from Steve French: "Five SMB server fixes, two related client fixes, and minor MAINTAINERS update: - Two SMB3 lock fixes fixes (including use after free and bug on fix) - Fix to race condition that can happen in processing IPC responses - Four ACL related fixes: one related to endianness of num_aces, and two related fixes to the checks for num_aces (for both client and server), and one fixing missing check for num_subauths which can cause memory corruption - And minor update to email addresses in MAINTAINERS file" * tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbd: cifs: fix incorrect validation for num_aces field of smb_acl ksmbd: fix incorrect validation for num_aces field of smb_acl smb: common: change the data type of num_aces to le16 ksmbd: fix bug on trap in smb2_lock ksmbd: fix use-after-free in smb2_lock ksmbd: fix type confusion via race condition when using ipc_msg_send_request ksmbd: fix out-of-bounds in parse_sec_desc() MAINTAINERS: update email address in cifs and ksmbd entry
2025-03-06Merge tag 'exfat-for-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - Optimize new cluster allocation by correctly find empty entry slot - Add a check to prevent excessive bitmap clearing due to invalid data size of file/dir entry - Fix incorrect error return for zero-byte writes * tag 'exfat-for-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: add a check for invalid data size exfat: short-circuit zero-byte writes in exfat_file_write_iter exfat: fix soft lockup in exfat_clear_bitmap exfat: fix just enough dentries but allocate a new cluster to dir
2025-03-06fs/pipe: fix pipe buffer index use in FUSELinus Torvalds
This was another case that Rasmus pointed out where the direct access to the pipe head and tail pointers broke on 32-bit configurations due to the type changes. As with the pipe FIONREAD case, fix it by using the appropriate helper functions that deal with the right pipe index sizing. Reported-by: Rasmus Villemoes <ravi@prevas.dk> Link: https://lore.kernel.org/all/878qpi5wz4.fsf@prevas.dk/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg > Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06fs/pipe: do not open-code pipe head/tail logic in FIONREADLinus Torvalds
Rasmus points out that we do indeed have other cases of breakage from the type changes that were introduced on 32-bit targets in order to read the pipe head and tail values atomically (commit 3d252160b818: "fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex"). Fix it up by using the proper helper functions that now deal with the pipe buffer index types properly. This makes the code simpler and more obvious. The compiler does the CSE and loop hoisting of the pipe ring size masking that we used to do manually, so open-coding this was never a good idea. Reported-by: Rasmus Villemoes <ravi@prevas.dk> Link: https://lore.kernel.org/all/87cyeu5zgk.fsf@prevas.dk/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06fs/ntfs3: Remove unused ntfs_flush_inodesDr. David Alan Gilbert
ntfs_flush_inodes() was added in 2021 by commit 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") but has remained unused. Remove it, and it's helper function. Note: There is a commented out call to ntfs_flush_inodes in ntfs_truncate() - I've left that in place. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Remove unused ntfs_sb_readDr. David Alan Gilbert
ntfs_sb_read() was added in 2021 by commit 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Remove unused ni_load_attrDr. David Alan Gilbert
ni_load_attr() was added in 2021 by commit 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Prevent integer overflow in hdr_first_de()Dan Carpenter
The "de_off" and "used" variables come from the disk so they both need to check. The problem is that on 32bit systems if they're both greater than UINT_MAX - 16 then the check does work as intended because of an integer overflow. Fixes: 60ce8dfde035 ("fs/ntfs3: Fix wrong if in hdr_first_de") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Fix a couple integer overflows on 32bit systemsDan Carpenter
On 32bit systems the "off + sizeof(struct NTFS_DE)" addition can have an integer wrapping issue. Fix it by using size_add(). Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06btrfs: fix a leaked chunk map issue in read_one_chunk()Haoxiang Li
Add btrfs_free_chunk_map() to free the memory allocated by btrfs_alloc_chunk_map() if btrfs_add_chunk_map() fails. Fixes: 7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps") CC: stable@vger.kernel.org Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-06iomap: Lift blocksize restriction on atomic writesRitesh Harjani (IBM)
Filesystems like ext4 can submit writes in multiples of blocksizes. But we still can't allow the writes to be split. Hence let's check if the iomap_length() is same as iter->len or not. It is the role of the FS to ensure that a single mapping may be created for an atomic write. The FS will also continue to check size and alignment legality. Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> jpg: Tweak commit message Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-7-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06iomap: Support SW-based atomic writesJohn Garry
Currently atomic write support requires dedicated HW support. This imposes a restriction on the filesystem that disk blocks need to be aligned and contiguously mapped to FS blocks to issue atomic writes. XFS has no method to guarantee FS block alignment for regular, non-RT files. As such, atomic writes are currently limited to 1x FS block there. To deal with the scenario that we are issuing an atomic write over misaligned or discontiguous data blocks - and raise the atomic write size limit - support a SW-based software emulated atomic write mode. For XFS, this SW-based atomic writes would use CoW support to issue emulated untorn writes. It is the responsibility of the FS to detect discontiguous atomic writes and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed, SW-based atomic writes could be used always when the mounted bdev does not support HW offload, but this strategy is not initially expected to be used. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HWJohn Garry
In future xfs will support a SW-based atomic write, so rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to be clear which mode is being used. Also relocate setting of IOMAP_ATOMIC_HW to the write path in __iomap_dio_rw(), to be clear that this flag is only relevant to writes. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06Merge branch 'vfs-6.15.shared.iomap' of ↵Christian Brauner
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Bring in iomap changes that xfs relies on. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Convert orangefs_writepages to contain an array of foliosMatthew Wilcox (Oracle)
The pages being passed in are always folios (since they come from the page cache). This eliminates several hidden calls to compound_head(), and uses of legacy APIs. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-10-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Simplify bvec setup in orangefs_writepages_work()Matthew Wilcox (Oracle)
This produces a bvec which is slightly different as the last page is added in its entirety rather than only the portion which is being written back. However we don't use this information anywhere; the iovec has its own length parameter. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-9-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Unify error & success paths in orangefs_writepages_work()Matthew Wilcox (Oracle)
Both arms of this conditional now have the same loop, so sink it out of the conditional. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-8-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Pass mapping to orangefs_writepages_work()Matthew Wilcox (Oracle)
Remove two accesses to page->mapping by passing the mapping from orangefs_writepages() to orangefs_writepages_callback() and then orangefs_writepages_work(). That makes it obvious that all folios come from the same mapping, so we can hoist the call to mapping_set_error() outside the loop. While I'm here, switch from write_cache_pages() to writeback_iter() which removes an indirect function call. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-7-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Convert orangefs_writepage_locked() to take a folioMatthew Wilcox (Oracle)
Both callers have a folio, pass it in and use it inside orangefs_writepage_locked(). Removes a few hidden calls to compound_head() and accesses to page->mapping. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-6-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Remove orangefs_writepage()Matthew Wilcox (Oracle)
If we add a migrate_folio operation, we can remove orangefs_writepage (as there is already a writepages operation). filemap_migrate_folio() will do fine as struct orangefs_write_range does not need to be adjusted when the folio is migrated. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-5-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: make open_for_read and open_for_write booleanMatthew Wilcox (Oracle)
sparse currently warns: fs/orangefs/file.c:119:32: warning: incorrect type in assignment (different base types) fs/orangefs/file.c:119:32: expected int open_for_write fs/orangefs/file.c:119:32: got restricted fmode_t Turning open_for_write and open_for_read into booleans (which is how they're used) removes this warning. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-4-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Move s_kmod_keyword_mask_map to orangefs-debugfs.cMatthew Wilcox (Oracle)
Attempting to build orangefs with W=1 currently reports errors like: In file included from ../fs/orangefs/protocol.h:287, from ../fs/orangefs/waitqueue.c:16: ../fs/orangefs/orangefs-debug.h:86:18: error: ‘num_kmod_keyword_mask_map’ defined but not used [-Werror=unused-const-variable=] Move num_kmod_keyword_mask_map, s_kmod_keyword_mask_map and struct __keyword_mask_s to orangefs-debugfs.c which is the only file they're used in. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-3-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Do not truncate file sizeMatthew Wilcox (Oracle)
'len' is used to store the result of i_size_read(), so making 'len' a size_t results in truncation to 4GiB on 32-bit systems. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-2-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05NFS: fix nfs_release_folio() to not deadlock via kcompactd writebackMike Snitzer
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for it so nfs_release_folio() can skip calling nfs_wb_folio() from kcompactd. Otherwise NFS can deadlock waiting for kcompactd enduced writeback which recurses back to NFS (which triggers writeback to NFSD via NFS loopback mount on the same host, NFSD blocks waiting for XFS's call to __filemap_get_folio): 6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds. {--- [58] "kcompactd0" [<0>] folio_wait_bit+0xe8/0x200 [<0>] folio_wait_writeback+0x2b/0x80 [<0>] nfs_wb_folio+0x80/0x1b0 [nfs] [<0>] nfs_release_folio+0x68/0x130 [nfs] [<0>] split_huge_page_to_list_to_order+0x362/0x840 [<0>] migrate_pages_batch+0x43d/0xb90 [<0>] migrate_pages_sync+0x9a/0x240 [<0>] migrate_pages+0x93c/0x9f0 [<0>] compact_zone+0x8e2/0x1030 [<0>] compact_node+0xdb/0x120 [<0>] kcompactd+0x121/0x2e0 [<0>] kthread+0xcf/0x100 [<0>] ret_from_fork+0x31/0x40 [<0>] ret_from_fork_asm+0x1a/0x30 ---} [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/20250225022002.26141-1-snitzer@kernel.org Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05ext4: protect ext4_release_dquot against freezingOjaswin Mujoo
Protect ext4_release_dquot against freezing so that we don't try to start a transaction when FS is frozen, leading to warnings. Further, avoid taking the freeze protection if a transaction is already running so that we don't need end up in a deadlock as described in 46e294efc355 ext4: fix deadlock with fs freezing and EA inodes Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241121123855.645335-3-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-05fs: use fput_close() in path_openat()Mateusz Guzik
This bumps failing open rate by 1.7% on Sapphire Rapids by avoiding one atomic. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250305123644.554845-5-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05fs: use fput_close() in filp_close()Mateusz Guzik
When tracing a kernel build over refcounts seen this is a wash: @[kprobe:filp_close]: [0] 32195 |@@@@@@@@@@ | [1] 164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| I verified vast majority of the skew comes from do_close_on_exec() which could be changed to use a different variant instead. Even without changing that, the 19.5% of calls which got here still can save the extra atomic. Calls here are borderline non-existent compared to fput (over 3.2 mln!), so they should not negatively affect scalability. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250305123644.554845-4-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05fs: use fput_close_sync() in close()Mateusz Guzik
This bumps open+close rate by 1% on Sapphire Rapids by eliding one atomic. It would be higher if it was not for several other slowdowns of the same nature. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250305123644.554845-3-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05file: add fput and file_ref_put routines optimized for use when closing a fdMateusz Guzik
Vast majority of the time closing a file descriptor also operates on the last reference, where a regular fput usage will result in 2 atomics. This can be changed to only suffer 1. See commentary above file_ref_put_close() for more information. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250305123644.554845-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05fs: predict no error in close()Mateusz Guzik
Vast majority of the time the system call returns 0. Letting the compiler know shortens the routine (119 -> 116) and the fast path. Disasm starting at the call to __fput_sync(): before: <+55>: call 0xffffffff816b0da0 <__fput_sync> <+60>: lea 0x201(%rbx),%eax <+66>: cmp $0x1,%eax <+69>: jbe 0xffffffff816ab707 <__x64_sys_close+103> <+71>: mov %ebx,%edx <+73>: movslq %ebx,%rax <+76>: and $0xfffffffd,%edx <+79>: cmp $0xfffffdfc,%edx <+85>: mov $0xfffffffffffffffc,%rdx <+92>: cmove %rdx,%rax <+96>: pop %rbx <+97>: pop %rbp <+98>: jmp 0xffffffff82242fa0 <__x86_return_thunk> <+103>: mov $0xfffffffffffffffc,%rax <+110>: jmp 0xffffffff816ab700 <__x64_sys_close+96> <+112>: mov $0xfffffffffffffff7,%rax <+119>: jmp 0xffffffff816ab700 <__x64_sys_close+96> after: <+56>: call 0xffffffff816b0da0 <__fput_sync> <+61>: xor %eax,%eax <+63>: test %ebp,%ebp <+65>: jne 0xffffffff816ab6ea <__x64_sys_close+74> <+67>: pop %rbx <+68>: pop %rbp <+69>: jmp 0xffffffff82242fa0 <__x86_return_thunk> # the jmp out <+74>: lea 0x201(%rbp),%edx <+80>: mov $0xfffffffffffffffc,%rax <+87>: cmp $0x1,%edx <+90>: jbe 0xffffffff816ab6e3 <__x64_sys_close+67> <+92>: mov %ebp,%edx <+94>: and $0xfffffffd,%edx <+97>: cmp $0xfffffdfc,%edx <+103>: cmovne %rbp,%rax <+107>: jmp 0xffffffff816ab6e3 <__x64_sys_close+67> <+109>: mov $0xfffffffffffffff7,%rax <+116>: jmp 0xffffffff816ab6e3 <__x64_sys_close+67> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250301104356.246031-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>