summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-03-10Merge afs RCU pathwalk fixChristian Brauner
Bring in the fix for afs_atcell_get_link() to handle RCU pathwalk from the afs branch for this cycle. This fix has to go upstream now. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10afs: Simplify cell record handlingDavid Howells
Simplify afs_cell record handling to avoid very occasional races that cause module removal to hang (it waits for all cell records to be removed). There are two things that particularly contribute to the difficulty: firstly, the code tries to pass a ref on the cell to the cell's maintenance work item (which gets awkward if the work item is already queued); and, secondly, there's an overall cell manager that tries to use just one timer for the entire cell collection (to avoid having loads of timers). However, both of these are probably unnecessarily restrictive. To simplify this, the following changes are made: (1) The cell record collection manager is removed. Each cell record manages itself individually. (2) Each afs_cell is given a second work item (cell->destroyer) that is queued when its refcount reaches zero. This is not done in the context of the putting thread as it might be in an inconvenient place to sleep. (3) Each afs_cell is given its own timer. The timer is used to expire the cell record after a period of unuse if not otherwise pinned and can also be used for other maintenance tasks if necessary (of which there are currently none as DNS refresh is triggered by filesystem operations). (4) The afs_cell manager work item (cell->manager) is no longer given a ref on the cell when queued; rather, the manager must be deleted. This does away with the need to deal with the consequences of losing a race to queue cell->manager. Clean up of extra queuing is deferred to the destroyer. (5) The cell destroyer work item makes sure the cell timer is removed and that the normal cell work is cancelled before farming the actual destruction off to RCU. (6) When a network namespace is destroyed or the kafs module is unloaded, it's now a simple matter of marking the namespace as dead then just waking up all the cell work items. They will then remove and destroy themselves once all remaining activity counts and/or a ref counts are dropped. This makes sure that all server records are dropped first. (7) The cell record state set is reduced to just four states: SETTING_UP, ACTIVE, REMOVING and DEAD. The record persists in the active state even when it's not being used until the time comes to remove it rather than downgrading it to an inactive state from whence it can be restored. This means that the cell still appears in /proc and /afs when not in use until it switches to the REMOVING state - at which point it is removed. Note that the REMOVING state is included so that someone wanting to resurrect the cell record is forced to wait whilst the cell is torn down in that state. Once it's in the DEAD state, it has been removed from net->cells tree and is no longer findable and can be replaced. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-03-10afs: Fix afs_server ref accountingDavid Howells
The current way that afs_server refs are accounted and cleaned up sometimes cause rmmod to hang when it is waiting for cell records to be removed. The problem is that the cell cleanup might occasionally happen before the server cleanup and then there's nothing that causes the cell to garbage-collect the remaining servers as they become inactive. Partially fix this by: (1) Give each afs_server record its own management timer that rather than relying on the cell manager's central timer to drive each individual cell's maintenance work item to garbage collect servers. This timer is set when afs_unuse_server() reduces a server's activity count to zero and will schedule the server's destroyer work item upon firing. (2) Give each afs_server record its own destroyer work item that removes the record from the cell's database, shuts down the timer, cancels any pending work for itself, sends an RPC to the server to cancel outstanding callbacks. This change, in combination with the timer, obviates the need to try and coordinate so closely between the cell record and a bunch of other server records to try and tear everything down in a coordinated fashion. With this, the cell record is pinned until the server RCU is complete and namespace/module removal will wait until all the cell records are removed. (3) Now that incoming calls are mapped to servers (and thus cells) using data attached to an rxrpc_peer, the UUID-to-server mapping tree is moved from the namespace to the cell (cell->fs_servers). This means there can no longer be duplicates therein - and that allows the mapping tree to be simpler as there doesn't need to be a chain of same-UUID servers that are in different cells. (4) The lock protecting the UUID mapping tree is switched to an rw_semaphore on the cell rather than a seqlock on the namespace as it's now only used during mounting in contexts in which we're allowed to sleep. (5) When it comes time for a cell that is being removed to purge its set of servers, it just needs to iterate over them and wake them up. Once a server becomes inactive, its destroyer work item will observe the state of the cell and immediately remove that record. (6) When a server record is removed, it is marked AFS_SERVER_FL_EXPIRED to prevent reattempts at removal. The record will be dispatched to RCU for destruction once its refcount reaches 0. (7) The AFS_SERVER_FL_UNCREATED/CREATING flags are used to synchronise simultaneous creation attempts. If one attempt fails, it will abandon the attempt and allow another to try again. Note that the record can't just be abandoned when dead as it's bound into a server list attached to a volume and only subject to replacement if the server list obtained for the volume from the VLDB changes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-15-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-11-dhowells@redhat.com/ # v4
2025-03-10afs: Use the per-peer app data provided by rxrpcDavid Howells
Make use of the per-peer application data that rxrpc now allows the application to store on the rxrpc_peer struct to hold a back pointer to the afs_server record that peer represents an endpoint for. Then, when a call comes in to the AFS cache manager, this can be used to map it to the correct server record rather than having to use a UUID-to-server mapping table and having to do an additional lookup. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-14-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-10-dhowells@redhat.com/ # v4
2025-03-10afs: Drop the net parameter from afs_unuse_cell()David Howells
Remove the redundant net parameter to afs_unuse_cell() as cell->net can be used instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-12-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-8-dhowells@redhat.com/ # v4
2025-03-10afs: Make afs_lookup_cell() take a trace noteDavid Howells
Pass a note to be added to the afs_cell tracepoint to afs_lookup_cell() so that different callers can be distinguished. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-11-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-7-dhowells@redhat.com/ # v4
2025-03-10afs: Improve server refcount/active count tracingDavid Howells
Improve server refcount/active count tracing to distinguish between simply getting/putting a ref and using/unusing the server record (which changes the activity count as well as the refcount). This makes it a bit easier to work out what's going on. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-10-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-6-dhowells@redhat.com/ # v4
2025-03-10afs: Improve afs_volume tracing to display a debug IDDavid Howells
Improve the tracing of afs_volume objects to include displaying a debug ID so that different instances of volumes with the same "vid" can be distinguished. Also be consistent about displaying the volume's refcount (and not the cell's). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-9-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-5-dhowells@redhat.com/ # v4
2025-03-10afs: Change dynroot to create contents on demandDavid Howells
Change the AFS dynamic root to do things differently: (1) Rather than having the creation of cell records create inodes and dentries for cell mountpoints, create them on demand during lookup. This simplifies cell management and locking as we no longer have to create these objects in advance *and* on speculative lookup by the user for a cell that isn't precreated. (2) Rather than using the libfs dentry-based readdir (the dentries now no longer exist until accessed from (1)), have readdir generate the contents by reading the list of cells. The @cell symlinks get pushed in positions 2 and 3 if rootcell has been configured. (3) Make the @cell symlink dentries persist for the life of the superblock or until reclaimed, but make cell mountpoints disappear immediately if unused. It's not perfect as someone doing an "ls -l /afs" may create a whole bunch of dentries which will be garbage collected immediately. But any dentry that gets automounted will be pinned by the mount, so it shouldn't be too bad. (4) Allocate the inode numbers for the cell mountpoints from an IDR to prevent duplicates appearing in the event it cycles round. The number allocated from the IDR is doubled to provide two inode numbers - one for the normal cell name (RO) and one for the dotted cell name (RW). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-8-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-4-dhowells@redhat.com/ # v4
2025-03-10afs: Remove the "autocell" mount optionDavid Howells
Remove the "autocell" mount option. It was an attempt to do automounting of arbitrary cells based on what the user looked up but within the root directory of a mounted volume. This isn't really the right thing to do, and using the "dyn" mount option to get the dynamic root is the right way to do it. The kafs-client package uses "-o dyn" when mounting /afs, so it should be safe to drop "-o autocell". Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-7-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-3-dhowells@redhat.com/ # v4
2025-03-10afs: Fix afs_atcell_get_link() to handle RCU pathwalkDavid Howells
The ->get_link() method may be entered under RCU pathwalk conditions (in which case, the dentry pointer is NULL). This is not taken account of by afs_atcell_get_link() and lockdep will complain when it tries to lock an rwsem. Fix this by marking net->ws_cell as __rcu and using RCU access macros on it and by making afs_atcell_get_link() just return a pointer to the name in RCU pathwalk without taking net->cells_lock or a ref on the cell as RCU will protect the name storage (the cell is already freed via call_rcu()). Fixes: 30bca65bbbae ("afs: Make /afs/@cell and /afs/.@cell symlinks") Reported-by: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250310094206.801057-2-dhowells@redhat.com/ # v4
2025-03-10Merge branch 'xfs-6.15-merge' into for-nextCarlos Maiolino
XFS code for 6.15 to be merged into linux-next Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10Merge branch 'xfs-6.15-zoned_devices' into xfs-6.15-mergeCarlos Maiolino
Merge Zoned devices support for XFS Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10Merge branch 'vfs-6.15.iomap' of ↵Carlos Maiolino
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs into xfs-6.15-merge XFS code for 6.15 depends on patches within iomap. Merge them before pulling in XFS code. Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: Use abs_diff instead of XFS_ABSDIFFMatthew Wilcox (Oracle)
We have a central definition for this function since 2023, used by a number of different parts of the kernel. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10Merge patch series "pipe: Trivial cleanups"Christian Brauner
K Prateek Nayak <kprateek.nayak@amd.com> says: Based on the suggestion on the RFC, the treewide conversion of references to pipe->{head,tail} from unsigned int to pipe_index_t has been dropped for now. The series contains trivial cleanup suggested to limit the nr_slots in pipe_resize_ring() to be covered between pipe_index_t limits of pipe->{head,tail} and using pipe_buf() to remove the open-coded usage of masks to access pipe buffer building on Linus' cleanup of fs/fuse/dev.c in commit ebb0f38bb47f ("fs/pipe: fix pipe buffer index use in FUSE") * patches from https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com: fs/splice: Use pipe_buf() helper to retrieve pipe buffer fs/pipe: Use pipe_buf() helper to retrieve pipe buffer kernel/watch_queue: Use pipe_buf() to retrieve the pipe buffer fs/pipe: Limit the slots in pipe_resize_ring() Link: https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10fs/splice: Use pipe_buf() helper to retrieve pipe bufferK Prateek Nayak
Use pipe_buf() helper to retrieve the pipe buffer throughout the file replacing the open-coded the logic. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250307052919.34542-5-kprateek.nayak@amd.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10fs/pipe: Use pipe_buf() helper to retrieve pipe bufferK Prateek Nayak
Use pipe_buf() helper to retrieve the pipe buffer throughout the file replacing the open-coded the logic. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250307052919.34542-4-kprateek.nayak@amd.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10fs/pipe: Limit the slots in pipe_resize_ring()K Prateek Nayak
Limit the number of slots in pipe_resize_ring() to the maximum value representable by pipe->{head,tail}. Values beyond the max limit can lead to incorrect pipe occupancy related calculations where the pipe will never appear full. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250307052919.34542-2-kprateek.nayak@amd.com Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10Merge mainline pipe changesChristian Brauner
Mainline now contains various changes to pipes that are relevant for other pipe work this cycle. So merge them into the respective VFS tree. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-08Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "33 hotfixes. 24 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 26 are for MM and 7 are for non-MM. - "mm: memory_failure: unmap poisoned folio during migrate properly" from Ma Wupeng fixes a couple of two year old bugs involving the migration of hwpoisoned folios. - "selftests/damon: three fixes for false results" from SeongJae Park fixes three one year old bugs in the SAMON selftest code. The remainder are singletons and doubletons. Please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits) mm/page_alloc: fix uninitialized variable rapidio: add check for rio_add_net() in rio_scan_alloc_net() rapidio: fix an API misues when rio_add_net() fails MAINTAINERS: .mailmap: update Sumit Garg's email address Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone" mm: fix finish_fault() handling for large folios mm: don't skip arch_sync_kernel_mappings() in error paths mm: shmem: remove unnecessary warning in shmem_writepage() userfaultfd: fix PTE unmapping stack-allocated PTE copies userfaultfd: do not block on locking a large folio with raised refcount mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages mm: shmem: fix potential data corruption during shmem swapin mm: fix kernel BUG when userfaultfd_move encounters swapcache selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms selftests/damon/damos_quota: make real expectation of quota exceeds include/linux/log2.h: mark is_power_of_2() with __always_inline NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback mm, swap: avoid BUG_ON in relocate_cluster() mm: swap: use correct step in loop to wait all clusters in wait_for_allocation() ...
2025-03-08f2fs: control nat_bits feature via mount optionChao Yu
Introduce a new mount option "nat_bits" to control nat_bits feature, by default nat_bits feature is disabled. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-08vfs: Remove invalidate_inodes()Jan Kara
The function can be replaced by evict_inodes. The only difference is that evict_inodes() skips the inodes with positive refcount without touching ->i_lock, but they are equivalent as evict_inodes() repeats the refcount check after having grabbed ->i_lock. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250307144318.28120-2-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-07binfmt_elf_fdpic: fix variable set but not used warningsunliming
Fix below kernel warning: fs/binfmt_elf_fdpic.c:1024:52: warning: variable 'excess1' set but not used [-Wunused-but-set-variable] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: sunliming <sunliming@kylinos.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250308022754.75013-1-sunliming@linux.dev Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-07Merge tag 'execve-v6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull core dumping fix from Kees Cook: - Only sort VMAs when core_sort_vma sysctl is set * tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: coredump: Only sort VMAs when core_sort_vma sysctl is set
2025-03-07Merge tag 'for-6.14-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix leaked extent map after error when reading chunks - replace use of deprecated strncpy - in zoned mode, fixed range when ulocking extent range, causing a hang * tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix a leaked chunk map issue in read_one_chunk() btrfs: replace deprecated strncpy() with strscpy() btrfs: zoned: fix extent range end unlock in cow_file_range()
2025-03-07bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()Luis Chamberlain
The commit titled "block/bdev: lift block size restrictions to 64k" lifted the block layer's max supported block size to 64k inside the helper blk_validate_block_size() now that we support large folios. However in lifting the block size we also removed the silly use cases many filesystems have to use sb_set_blocksize() to *verify* that the block size <= PAGE_SIZE. The call to sb_set_blocksize() was used to check the block size <= PAGE_SIZE since historically we've always supported userspace to create for example 64k block size filesystems even on 4k page size systems, but what we didn't allow was mounting them. Older filesystems have been using the check with sb_set_blocksize() for years. While, we could argue that such checks should be filesystem specific, there are much more users of sb_set_blocksize() than LBS enabled filesystem on upstream, so just do the easier thing and bring back the PAGE_SIZE check for sb_set_blocksize() users and only skip it for LBS enabled filesystems. This will ensure that tests such as generic/466 when run in a loop against say, ext4, won't try to try to actually mount a filesystem with a block size larger than your filesystem supports given your PAGE_SIZE and in the worst case crash. Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20250307020403.3068567-1-mcgrof@kernel.org Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-07efivarfs: Revert "allow creation of zero length files"Ard Biesheuvel
As agreed with the fwupd/LVFS maintainer, this reverts commit fc20737d8b85691ecabab3739ed7d06c9b7bc00f again for the v6.15 cycle, leaving them sufficient time to roll out a fix for the issue that the reverted commit works around. Link: https://lore.kernel.org/all/63837c36eceaf8cf2af7933dccca54ff4dd9f30d.camel@HansenPartnership.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-03-06fs/pipe: add simpler helpers for common casesLinus Torvalds
The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the involved types. It turns out that we don't have _that_ many places that access these fields directly and were affected, but we have more than we strictly should have, because our low-level helper functions have been designed to have intimate knowledge of how the pipes work. And as a result, that random noise of direct 'pipe->head' and 'pipe->tail' accesses makes it harder to pinpoint any actual potential problem spots remaining. For example, we didn't have a "is the pipe full" helper function, but instead had a "given these pipe buffer indexes and this pipe size, is the pipe full". That's because some low-level pipe code does actually want that much more complicated interface. But most other places literally just want a "is the pipe full" helper, and not having it meant that those places ended up being unnecessarily much too aware of this all. It would have been much better if only the very core pipe code that cared had been the one aware of this all. So let's fix it - better late than never. This just introduces the trivial wrappers for "is this pipe full or empty" and to get how many pipe buffers are used, so that instead of writing if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) the places that literally just want to know if a pipe is full can just say if (pipe_is_full(pipe)) instead. The existing trivial cases were converted with a 'sed' script. This cuts down on the places that access pipe->head and pipe->tail directly outside of the pipe code (and core splice code) quite a lot. The splice code in particular still revels in doing the direct low-level accesses, and the fuse fuse_dev_splice_write() code also seems a bit unnecessarily eager to go very low-level, but it's at least a bit better than it used to be. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06Merge tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - Fix a compatibility issue: we shouldn't be setting incompat feature bits unless explicitly requested - Fix another bug where the journal alloc/resize path could spuriously fail with -BCH_ERR_open_buckets_empty - Copygc shouldn't run on read-only devices: fragmentation isn't an issue if we're not currently writing to a given device, and it may not have anywhere to move the data to * tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefs: bcachefs: copygc now skips non-rw devices bcachefs: Fix bch2_dev_journal_alloc() spuriously failing bcachefs: Don't set BCH_FEATURE_incompat_version_field unless requested
2025-03-06bcachefs: copygc now skips non-rw devicesKent Overstreet
There's no point in doing copygc on non-rw devices: the fragmentation doesn't matter if we're not writing to them, and we may not have anywhere to put the data on our other devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-06bcachefs: Fix bch2_dev_journal_alloc() spuriously failingKent Overstreet
Previously, we fixed journal resize spuriousl failing with -BCH_ERR_open_buckets_empty, but initial journal allocation was missed because it didn't invoke the "block on allocator" loop at all. Factor out the "loop on allocator" code to fix that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: net/ethtool/cabletest.c 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock") 637399bf7e77 ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device") No Adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06Merge tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb fixes from Steve French: "Five SMB server fixes, two related client fixes, and minor MAINTAINERS update: - Two SMB3 lock fixes fixes (including use after free and bug on fix) - Fix to race condition that can happen in processing IPC responses - Four ACL related fixes: one related to endianness of num_aces, and two related fixes to the checks for num_aces (for both client and server), and one fixing missing check for num_subauths which can cause memory corruption - And minor update to email addresses in MAINTAINERS file" * tag 'v6.14-rc5-smb3-fixes' of git://git.samba.org/ksmbd: cifs: fix incorrect validation for num_aces field of smb_acl ksmbd: fix incorrect validation for num_aces field of smb_acl smb: common: change the data type of num_aces to le16 ksmbd: fix bug on trap in smb2_lock ksmbd: fix use-after-free in smb2_lock ksmbd: fix type confusion via race condition when using ipc_msg_send_request ksmbd: fix out-of-bounds in parse_sec_desc() MAINTAINERS: update email address in cifs and ksmbd entry
2025-03-06Merge tag 'exfat-for-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - Optimize new cluster allocation by correctly find empty entry slot - Add a check to prevent excessive bitmap clearing due to invalid data size of file/dir entry - Fix incorrect error return for zero-byte writes * tag 'exfat-for-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: add a check for invalid data size exfat: short-circuit zero-byte writes in exfat_file_write_iter exfat: fix soft lockup in exfat_clear_bitmap exfat: fix just enough dentries but allocate a new cluster to dir
2025-03-06fs/pipe: fix pipe buffer index use in FUSELinus Torvalds
This was another case that Rasmus pointed out where the direct access to the pipe head and tail pointers broke on 32-bit configurations due to the type changes. As with the pipe FIONREAD case, fix it by using the appropriate helper functions that deal with the right pipe index sizing. Reported-by: Rasmus Villemoes <ravi@prevas.dk> Link: https://lore.kernel.org/all/878qpi5wz4.fsf@prevas.dk/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg > Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06fs/pipe: do not open-code pipe head/tail logic in FIONREADLinus Torvalds
Rasmus points out that we do indeed have other cases of breakage from the type changes that were introduced on 32-bit targets in order to read the pipe head and tail values atomically (commit 3d252160b818: "fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex"). Fix it up by using the proper helper functions that now deal with the pipe buffer index types properly. This makes the code simpler and more obvious. The compiler does the CSE and loop hoisting of the pipe ring size masking that we used to do manually, so open-coding this was never a good idea. Reported-by: Rasmus Villemoes <ravi@prevas.dk> Link: https://lore.kernel.org/all/87cyeu5zgk.fsf@prevas.dk/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06fs/ntfs3: Remove unused ntfs_flush_inodesDr. David Alan Gilbert
ntfs_flush_inodes() was added in 2021 by commit 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") but has remained unused. Remove it, and it's helper function. Note: There is a commented out call to ntfs_flush_inodes in ntfs_truncate() - I've left that in place. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Remove unused ntfs_sb_readDr. David Alan Gilbert
ntfs_sb_read() was added in 2021 by commit 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Remove unused ni_load_attrDr. David Alan Gilbert
ni_load_attr() was added in 2021 by commit 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Prevent integer overflow in hdr_first_de()Dan Carpenter
The "de_off" and "used" variables come from the disk so they both need to check. The problem is that on 32bit systems if they're both greater than UINT_MAX - 16 then the check does work as intended because of an integer overflow. Fixes: 60ce8dfde035 ("fs/ntfs3: Fix wrong if in hdr_first_de") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06fs/ntfs3: Fix a couple integer overflows on 32bit systemsDan Carpenter
On 32bit systems the "off + sizeof(struct NTFS_DE)" addition can have an integer wrapping issue. Fix it by using size_add(). Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-03-06btrfs: fix a leaked chunk map issue in read_one_chunk()Haoxiang Li
Add btrfs_free_chunk_map() to free the memory allocated by btrfs_alloc_chunk_map() if btrfs_add_chunk_map() fails. Fixes: 7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps") CC: stable@vger.kernel.org Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-03-06iomap: Lift blocksize restriction on atomic writesRitesh Harjani (IBM)
Filesystems like ext4 can submit writes in multiples of blocksizes. But we still can't allow the writes to be split. Hence let's check if the iomap_length() is same as iter->len or not. It is the role of the FS to ensure that a single mapping may be created for an atomic write. The FS will also continue to check size and alignment legality. Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> jpg: Tweak commit message Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-7-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06iomap: Support SW-based atomic writesJohn Garry
Currently atomic write support requires dedicated HW support. This imposes a restriction on the filesystem that disk blocks need to be aligned and contiguously mapped to FS blocks to issue atomic writes. XFS has no method to guarantee FS block alignment for regular, non-RT files. As such, atomic writes are currently limited to 1x FS block there. To deal with the scenario that we are issuing an atomic write over misaligned or discontiguous data blocks - and raise the atomic write size limit - support a SW-based software emulated atomic write mode. For XFS, this SW-based atomic writes would use CoW support to issue emulated untorn writes. It is the responsibility of the FS to detect discontiguous atomic writes and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed, SW-based atomic writes could be used always when the mounted bdev does not support HW offload, but this strategy is not initially expected to be used. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HWJohn Garry
In future xfs will support a SW-based atomic write, so rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to be clear which mode is being used. Also relocate setting of IOMAP_ATOMIC_HW to the write path in __iomap_dio_rw(), to be clear that this flag is only relevant to writes. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06Merge branch 'vfs-6.15.shared.iomap' of ↵Christian Brauner
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Bring in iomap changes that xfs relies on. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Convert orangefs_writepages to contain an array of foliosMatthew Wilcox (Oracle)
The pages being passed in are always folios (since they come from the page cache). This eliminates several hidden calls to compound_head(), and uses of legacy APIs. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-10-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Simplify bvec setup in orangefs_writepages_work()Matthew Wilcox (Oracle)
This produces a bvec which is slightly different as the last page is added in its entirety rather than only the portion which is being written back. However we don't use this information anywhere; the iovec has its own length parameter. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-9-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06orangefs: Unify error & success paths in orangefs_writepages_work()Matthew Wilcox (Oracle)
Both arms of this conditional now have the same loop, so sink it out of the conditional. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250305204734.1475264-8-willy@infradead.org Tested-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Christian Brauner <brauner@kernel.org>