summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-05-019p: Use alternative invalidation to using launder_folioDavid Howells
Use writepages-based flushing invalidation instead of invalidate_inode_pages2() and ->launder_folio(). This will allow ->launder_folio() to be removed eventually. Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Jeff Layton <jlayton@kernel.org> cc: v9fs@lists.linux.dev cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01mm: Provide a means of invalidation without using launder_folioDavid Howells
Implement a replacement for launder_folio. The key feature of invalidate_inode_pages2() is that it locks each folio individually, unmaps it to prevent mmap'd accesses interfering and calls the ->launder_folio() address_space op to flush it. This has problems: firstly, each folio is written individually as one or more small writes; secondly, adjacent folios cannot be added so easily into the laundry; thirdly, it's yet another op to implement. Instead, use the invalidate lock to cause anyone wanting to add a folio to the inode to wait, then unmap all the folios if we have mmaps, then, conditionally, use ->writepages() to flush any dirty data back and then discard all pages. The invalidate lock prevents ->read_iter(), ->write_iter() and faulting through mmap all from adding pages for the duration. This is then used from netfslib to handle the flusing in unbuffered and direct writes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Christoph Hellwig <hch@lst.de> cc: Andrew Morton <akpm@linux-foundation.org> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: devel@lists.orangefs.org
2024-04-30btrfs: set correct ram_bytes when splitting ordered extentQu Wenruo
[BUG] When running generic/287, the following file extent items can be generated: item 16 key (258 EXTENT_DATA 2682880) itemoff 15305 itemsize 53 generation 9 type 1 (regular) extent data disk byte 1378414592 nr 462848 extent data offset 0 nr 462848 ram 2097152 extent compression 0 (none) Note that file extent item is not a compressed one, but its ram_bytes is way larger than its disk_num_bytes. According to btrfs on-disk scheme, ram_bytes should match disk_num_bytes if it's not a compressed one. [CAUSE] Since commit b73a6fd1b1ef ("btrfs: split partial dio bios before submit"), for partial dio writes, we would split the ordered extent. However the function btrfs_split_ordered_extent() doesn't update the ram_bytes even it has already shrunk the disk_num_bytes. Originally the function btrfs_split_ordered_extent() is only introduced for zoned devices in commit d22002fd37bd ("btrfs: zoned: split ordered extent when bio is sent"), but later commit b73a6fd1b1ef ("btrfs: split partial dio bios before submit") makes non-zoned btrfs affected. Thankfully for un-compressed file extent, we do not really utilize the ram_bytes member, thus it won't cause any real problem. [FIX] Also update btrfs_ordered_extent::ram_bytes inside btrfs_split_ordered_extent(). Fixes: d22002fd37bd ("btrfs: zoned: split ordered extent when bio is sent") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-30xfs: do not allocate the entire delalloc extent in xfs_bmapi_writeChristoph Hellwig
While trying to convert the entire delalloc extent is a good decision for regular writeback as it leads to larger contigous on-disk extents, but for other callers of xfs_bmapi_write is is rather questionable as it forced them to loop creating new transactions just in case there is no large enough contiguous extent to cover the whole delalloc reservation. Change xfs_bmapi_write to only allocate the passed in range instead, whіle the writeback path through xfs_bmapi_convert_delalloc and xfs_bmapi_allocate still always converts the full extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30xfs: fix xfs_bmap_add_extent_delay_real for partial conversionsChristoph Hellwig
xfs_bmap_add_extent_delay_real takes parts or all of a delalloc extent and converts them to a real extent. It is written to deal with any potential overlap of the to be converted range with the delalloc extent, but it turns out that currently only converting the entire extents, or a part starting at the beginning is actually exercised, as the only caller always tries to convert the entire delalloc extent, and either succeeds or at least progresses partially from the start. If it only converts a tiny part of a delalloc extent, the indirect block calculation for the new delalloc extent (da_new) might be equivalent to that of the existing delalloc extent (da_old). If this extent conversion now requires allocating an indirect block that gets accounted into da_new, leading to the assert that da_new must be smaller or equal to da_new unless we split the extent to trigger. Except for the assert that case is actually handled by just trying to allocate more space, as that already handled for the split case (which currently can't be reached at all), so just reusing it should be fine. Except that without dipping into the reserved block pool that would make it a bit too easy to trigger a fs shutdown due to ENOSPC. So in addition to adjusting the assert, also dip into the reserved block pool. Note that I could only reproduce the assert with a change to only convert the actually asked range instead of the full delalloc extent from xfs_bmapi_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocateChristoph Hellwig
Both callers of xfs_bmapi_allocate already initialize bma->prev, don't redo that in xfs_bmapi_allocate. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30xfs: pass the actual offset and len to allocate to xfs_bmapi_allocateChristoph Hellwig
xfs_bmapi_allocate currently overwrites offset and len when converting delayed allocations, and duplicates the length cap done for non-delalloc allocations. Move all that logic into the callers to avoid duplication and to make the calling conventions more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_writeChristoph Hellwig
XFS_FILBLKS_MIN uses min_t and thus does the comparison using the correct xfs_filblks_t type. Use it in xfs_bmapi_write and slightly adjust the comment document th potential pitfall to take account of this Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30xfs: lift a xfs_valid_startblock into xfs_bmapi_allocateChristoph Hellwig
xfs_bmapi_convert_delalloc has a xfs_valid_startblock check on the block allocated by xfs_bmapi_allocate. Lift it into xfs_bmapi_allocate as we should assert the same for xfs_bmapi_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocateChristoph Hellwig
tmp_logflags is initialized to 0 and then ORed into bma->logflags, which isn't actually doing anything. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30xfs: fix error returns from xfs_bmapi_writeChristoph Hellwig
xfs_bmapi_write can return 0 without actually returning a mapping in mval in two different cases: 1) when there is absolutely no space available to do an allocation 2) when converting delalloc space, and the allocation is so small that it only covers parts of the delalloc extent before the range requested by the caller Callers at best can handle one of these cases, but in many cases can't cope with either one. Switch xfs_bmapi_write to always return a mapping or return an error code instead. For case 1) above ENOSPC is the obvious choice which is very much what the callers expect anyway. For case 2) there is no really good error code, so pick a funky one from the SysV streams portfolio. This fixes the reproducer here: https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@mail.gmail.com0/ which uses reserved blocks to create file systems that are gravely out of space and thus cause at least xfs_file_alloc_space to hang and trigger the lack of ENOSPC handling in xfs_dquot_disk_alloc. Note that this patch does not actually make any caller but xfs_alloc_file_space deal intelligently with case 2) above. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: 刘通 <lyutoon@gmail.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-29Merge tag 'nfsd-6.9-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Avoid freeing unallocated memory (v6.7 regression) * tag 'nfsd-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix nfsd4_encode_fattr4() crasher
2024-04-29Merge tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: - Fix an Oops in xs_tcp_tls_setup_socket - Fix an Oops due to missing error handling in nfs_net_init() * tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Handle error of rpc_proc_register() in nfs_net_init(). SUNRPC: add a missing rpc_stat for TCP TLS
2024-04-29Merge tag 'bcachefs-2024-04-29' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Tiny set of fixes this time" * tag 'bcachefs-2024-04-29' of https://evilpiepirate.org/git/bcachefs: bcachefs: fix integer conversion bug bcachefs: btree node scan now fills in sectors_written bcachefs: Remove accidental debug assert
2024-04-29f2fs: zone: fix to don't trigger OPU on pinfile for direct IOChao Yu
Otherwise, it breaks pinfile's sematics. Cc: Daeho Jeong <daeho43@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-29f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()Chao Yu
syzbot reports a kernel bug as below: F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ================================================================== BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 Read of size 1 at addr ffff88807a58c76c by task syz-executor280/5076 CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline] f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838 __do_sys_ioctl fs/ioctl.c:902 [inline] __se_sys_ioctl+0x81/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is we missed to do sanity check on i_xattr_nid during f2fs_iget(), so that in fiemap() path, current_nat_addr() will access nat_bitmap w/ offset from invalid i_xattr_nid, result in triggering kasan bug report, fix it. Reported-and-tested-by: syzbot+3694e283cf5c40df6d14@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/00000000000094036c0616e72a1d@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-29f2fs: fix to avoid allocating WARM_DATA segment for direct IOChao Yu
If active_log is not 6, we never use WARM_DATA segment, let's avoid allocating WARM_DATA segment for direct IO. Signed-off-by: Yunlei He <heyunlei@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-29f2fs: remove redundant parameter in is_next_segment_free()Yifan Zhao
is_next_segment_free() takes a redundant `type` parameter. Remove it. Signed-off-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-29Merge tag 'erofs-for-6.9-rc7-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Three fixes related to EROFS fscache mode. The most important two patches fix calling kill_block_super() in bdev-based mode instead of kill_anon_super(). The remaining patch is an informative one. Summary: - Better error message when prepare_ondemand_read failed - Fix unmount of bdev-based mode if CONFIG_EROFS_FS_ONDEMAND is on" * tag 'erofs-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: reliably distinguish block based and fscache mode erofs: get rid of erofs_fs_context erofs: modify the error message when prepare_ondemand_read failed
2024-04-29netfs: Use subreq_counter to allocate subreq debug_index valuesDavid Howells
Use the subreq_counter in netfs_io_request to allocate subrequest debug_index values in read ops as well as write ops. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-04-29netfs: Make netfs_io_request::subreq_counter an atomic_tDavid Howells
Make the netfs_io_request::subreq_counter, used to generate values for netfs_io_subrequest::debug_index, into an atomic_t so that it can be called from the retry thread at the same time as the app thread issuing writes. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-04-29netfs: Remove deprecated use of PG_private_2 as a second writeback flagDavid Howells
Remove the deprecated use of PG_private_2 in netfslib. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-04-29mm: Remove the PG_fscache alias for PG_private_2David Howells
Remove the PG_fscache alias for PG_private_2 and use the latter directly. Use of this flag for marking pages undergoing writing to the cache should be considered deprecated and the folios should be marked dirty instead and the write done in ->writepages(). Note that PG_private_2 itself should be considered deprecated and up for future removal by the MM folks too. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: Bharath SM <bharathsm@microsoft.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: netfs@lists.linux.dev cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-04-29netfs: Replace PG_fscache by setting folio->private and marking dirtyDavid Howells
When dirty data is being written to the cache, setting/waiting on/clearing the fscache flag is always done in tandem with setting/waiting on/clearing the writeback flag. The netfslib buffered write routines wait on and set both flags and the write request cleanup clears both flags, so the fscache flag is almost superfluous. The reason it isn't superfluous is because the fscache flag is also used to indicate that data just read from the server is being written to the cache. The flag is used to prevent a race involving overlapping direct-I/O writes to the cache. Change this to indicate that a page is in need of being copied to the cache by placing a magic value in folio->private and marking the folios dirty. Then when the writeback code sees a folio marked in this way, it only writes it to the cache and not to the server. If a folio that has this magic value set is modified, the value is just replaced and the folio will then be uplodaded too. With this, PG_fscache is no longer required by the netfslib core, 9p and afs. Ceph and nfs, however, still need to use the old PG_fscache-based tracking. To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set on the flags in the netfs_inode struct for those filesystems. This reenables the use of PG_fscache in that inode. 9p and afs use the netfslib write helpers so get switched over; cifs, for the moment, does page-by-page manual access to the cache, so doesn't use PG_fscache and is unaffected. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: Bharath SM <bharathsm@microsoft.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-04-29netfs: Update i_blocks when write committed to pagecacheDavid Howells
Update i_blocks when i_size is updated when we finish making a write to the pagecache to reflect the amount of space we think will be consumed. This maintains cifs commit dbfdff402d89854126658376cbcb08363194d3cd ("smb3: update allocation size more accurately on write completion") which would otherwise be removed by the cifs part of the netfs writeback rewrite. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-04-29xfs: convert delayed extents to unwritten when zeroing post eof blocksZhang Yi
Current clone operation could be non-atomic if the destination of a file is beyond EOF, user could get a file with corrupted (zeroed) data on crash. The problem is about preallocations. If you write some data into a file: [A...B) and XFS decides to preallocate some post-eof blocks, then it can create a delayed allocation reservation: [A.........D) The writeback path tries to convert delayed extents to real ones by allocating blocks. If there aren't enough contiguous free space, we can end up with two extents, the first real and the second still delalloc: [A....C)[C.D) After that, both the in-memory and the on-disk file sizes are still B. If we clone into the range [E...F) from another file: [A....C)[C.D) [E...F) then xfs_reflink_zero_posteof() calls iomap_zero_range() to zero out the range [B, E) beyond EOF and flush it. Since [C, D) is still a delalloc extent, its pagecache will be zeroed and both the in-memory and on-disk size will be updated to D after flushing but before cloning. This is wrong, because the user can see the size change and read the zeroes while the clone operation is ongoing. We need to keep the in-memory and on-disk size before the clone operation starts, so instead of writing zeroes through the page cache for delayed ranges beyond EOF, we convert these ranges to unwritten and invalidate any cached data over that range beyond EOF. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-29xfs: make xfs_bmapi_convert_delalloc() to allocate the target offsetZhang Yi
Since xfs_bmapi_convert_delalloc() only attempts to allocate the entire delalloc extent and require multiple invocations to allocate the target offset. So xfs_convert_blocks() add a loop to do this job and we call it in the write back path, but xfs_convert_blocks() isn't a common helper. Let's do it in xfs_bmapi_convert_delalloc() and drop xfs_convert_blocks(), preparing for the post EOF delalloc blocks converting in the buffered write begin path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-29xfs: make the seq argument to xfs_bmapi_convert_delalloc() optionalZhang Yi
Allow callers to pass a NULLL seq argument if they don't care about the fork sequence number. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-29xfs: match lock mode in xfs_buffered_write_iomap_begin()Zhang Yi
Commit 1aa91d9c9933 ("xfs: Add async buffered write support") replace xfs_ilock(XFS_ILOCK_EXCL) with xfs_ilock_for_iomap() when locking the writing inode, and a new variable lockmode is used to indicate the lock mode. Although the lockmode should always be XFS_ILOCK_EXCL, it's still better to use this variable instead of useing XFS_ILOCK_EXCL directly when unlocking the inode. Fixes: 1aa91d9c9933 ("xfs: Add async buffered write support") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-29gfs2: Convert gfs2_page_mkwrite() to use a folioMatthew Wilcox (Oracle)
Convert the incoming page to a folio and use it throughout, saving several calls to compound_head(). Also use 'pos' for file position rather than the ambiguous 'offset' and convert 'length' to type size_t in case we get some truly ridiculous sized folios in the future. This function should now be large-folio safe, but I may have missed something. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-29gfs2: gfs2_freeze_unlock cleanupAndreas Gruenbacher
Function gfs2_freeze_unlock() is always called with &sdp->sd_freeze_gh as its argument, so clean up the code by passing in sdp instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-28bcachefs: fix integer conversion bugKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-28bcachefs: btree node scan now fills in sectors_writtenKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-28bcachefs: Remove accidental debug assertKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-28ksmbd: fix uninitialized symbol 'share' in smb2_tree_connect()Namjae Jeon
Fix uninitialized symbol 'share' in smb2_tree_connect(). Fixes: e9d8c2f95ab8 ("ksmbd: add continuous availability share parameter") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-28erofs: reliably distinguish block based and fscache modeChristian Brauner
When erofs_kill_sb() is called in block dev based mode, s_bdev may not have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled, it will be mistaken for fscache mode, and then attempt to free an anon_dev that has never been allocated, triggering the following warning: ============================================ ida_free called for id=0 which is not allocated. WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140 Modules linked in: CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630 RIP: 0010:ida_free+0x134/0x140 Call Trace: <TASK> erofs_kill_sb+0x81/0x90 deactivate_locked_super+0x35/0x80 get_tree_bdev+0x136/0x1e0 vfs_get_tree+0x2c/0xf0 do_new_mount+0x190/0x2f0 [...] ============================================ Now when erofs_kill_sb() is called, erofs_sb_info must have been initialised, so use sbi->fsid to distinguish between the two modes. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240419123611.947084-3-libaokun1@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-28erofs: get rid of erofs_fs_contextBaokun Li
Instead of allocating the erofs_sb_info in fill_super() allocate it during erofs_init_fs_context() and ensure that erofs can always have the info available during erofs_kill_sb(). After this erofs_fs_context is no longer needed, replace ctx with sbi, no functional changes. Suggested-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240419123611.947084-2-libaokun1@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-28erofs: modify the error message when prepare_ondemand_read failedHongbo Li
When prepare_ondemand_read failed, wrong error message is printed. The prepare_read is also implemented in cachefiles, so we amend it. Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240424084247.759432-1-lihongbo22@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-27Merge tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: "Three smb3 client fixes, all also for stable: - two small locking fixes spotted by Coverity - FILE_ALL_INFO and network_open_info packing fix" * tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix lock ordering potential deadlock in cifs_sync_mid_result smb3: missing lock when picking channel smb: client: Fix struct_group() usage in __packed structs
2024-04-26Merge tag 'mm-hotfixes-stable-2024-04-26-13-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address post-6.8 issues or aren't considered suitable for backporting. All except one of these are for MM. I see no particular theme - it's singletons all over" * tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio() selftests: mm: protection_keys: save/restore nr_hugepages value from launch script stackdepot: respect __GFP_NOLOCKDEP allocation flag hugetlb: check for anon_vma prior to folio allocation mm: zswap: fix shrinker NULL crash with cgroup_disable=memory mm: turn folio_test_hugetlb into a PageType mm: support page_mapcount() on page_has_type() pages mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros mm/hugetlb: fix missing hugetlb_lock for resv uncharge selftests: mm: fix unused and uninitialized variable warning selftests/harness: remove use of LINE_MAX
2024-04-26Merge tag 'vfs-6.9-rc6.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains a few small fixes for this merge window and the attempt to handle the ntfs removal regression that was reported a little while ago: - After the removal of the legacy ntfs driver we received reports about regressions for some people that do mount "ntfs" explicitly and expect the driver to be available. Since ntfs3 is a drop-in for legacy ntfs we alias legacy ntfs to ntfs3 just like ext3 is aliased to ext4. We also enforce legacy ntfs is always mounted read-only and give it custom file operations to ensure that ioctl()'s can't be abused to perform write operations. - Fix an unbalanced module_get() in bdev_open(). - Two smaller fixes for the netfs work done earlier in this cycle. - Fix the errno returned from the new FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH ioctls. Both commands just pull information out of the superblock so there's no need to call into the actual ioctl handlers. So instead of returning ENOIOCTLCMD to indicate to fallback we just return ENOTTY directly avoiding that indirection" * tag 'vfs-6.9-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs: Fix the pre-flush when appending to a file in writethrough mode netfs: Fix writethrough-mode error handling ntfs3: add legacy ntfs file operations ntfs3: enforce read-only when used as legacy ntfs driver ntfs3: serve as alias for the legacy ntfs driver block: fix module reference leakage from bdev_open_by_dev error path fs: Return ENOTTY directly if FS_IOC_GETUUID or FS_IOC_GETFSSYSFSPATH fail
2024-04-26netfs: Fix the pre-flush when appending to a file in writethrough modeDavid Howells
In netfs_perform_write(), when the file is marked NETFS_ICTX_WRITETHROUGH or O_*SYNC or RWF_*SYNC was specified, write-through caching is performed on a buffered file. When setting up for write-through, we flush any conflicting writes in the region and wait for the write to complete, failing if there's a write error to return. The issue arises if we're writing at or above the EOF position because we skip the flush and - more importantly - the wait. This becomes a problem if there's a partial folio at the end of the file that is being written out and we want to make a write to it too. Both the already-running write and the write we start both want to clear the writeback mark, but whoever is second causes a warning looking something like: ------------[ cut here ]------------ R=00000012: folio 11 is not under writeback WARNING: CPU: 34 PID: 654 at fs/netfs/write_collect.c:105 ... CPU: 34 PID: 654 Comm: kworker/u386:27 Tainted: G S ... ... Workqueue: events_unbound netfs_write_collection_worker ... RIP: 0010:netfs_writeback_lookup_folio Fix this by making the flush-and-wait unconditional. It will do nothing if there are no folios in the pagecache and will return quickly if there are no folios in the region specified. Further, move the WBC attachment above the flush call as the flush is going to attach a WBC and detach it again if it is not present - and since we need one anyway we might as well share it. Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404161031.468b84f-oliver.sang@intel.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2150448.1714130115@warthog.procyon.org.uk Reviewed-by: Jeffrey Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-26fs: Create anon_inode_getfile_fmode()Dawid Osuchowski
Creates an anon_inode_getfile_fmode() function that works similarly to anon_inode_getfile() with the addition of being able to set the fmode member. Signed-off-by: Dawid Osuchowski <linux@osuchow.ski> Link: https://lore.kernel.org/r/20240426075854.4723-1-linux@osuchow.ski Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-26xfs: refactor dir format helpersChristoph Hellwig
Add a new enum and a xfs_dir2_format helper that returns it to allow the code to switch on the format of a directory in a single operation and switch all helpers of xfs_dir2_isblock and xfs_dir2_isleaf to it. This also removes the explicit xfs_iread_extents call in a few of the call sites given that xfs_bmap_last_offset already takes care of it underneath. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26xfs: factor out a xfs_dir_replace_args helperChristoph Hellwig
Add a helper to switch between the different directory formats for removing a directory entry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26xfs: factor out a xfs_dir_removename_args helperChristoph Hellwig
Add a helper to switch between the different directory formats for removing a directory entry. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26xfs: factor out a xfs_dir_createname_args helperChristoph Hellwig
Add a helper to switch between the different directory formats for creating a directory entry and to handle the XFS_DA_OP_JUSTCHECK flag based on the passed in ino number field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26xfs: factor out a xfs_dir_lookup_args helperChristoph Hellwig
Add a helper to switch between the different directory formats for lookup and to handle the -EEXIST return for a successful lookup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-25nilfs2: add kernel-doc comments to nilfs_remove_all_gcinodes()Yang Li
This commit adds kernel-doc style comments with complete parameter descriptions for the function nilfs_remove_all_gcinodes. Link: https://lkml.kernel.org/r/20240410075629.3441-4-konishi.ryusuke@gmail.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25nilfs2: add kernel-doc comments to nilfs_btree_convert_and_insert()Yang Li
This commit adds kernel-doc style comments with complete parameter descriptions for the function nilfs_btree_convert_and_insert. Link: https://lkml.kernel.org/r/20240410075629.3441-3-konishi.ryusuke@gmail.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>