path: root/fs
Age         Commit message                                              Author

2025-07-14  ext4: limit the maximum folio order  (Zhang Yi)

In environments with a page size of 64KB, the maximum size of a folio can reach up to 128MB. Consequently, during the write-back of folios, 'rsv_blocks' will be overestimated to 1,577, which puts pressure on the journal space when the journal is small. This can easily exceed the limit of a single transaction. Besides, an excessively large folio provides no benefit and instead increases the overhead of traversing the buffer heads (bhs) within the folio. Therefore, limit the maximum order of a folio to 2048 filesystem blocks.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Joseph Qi <jiangqi903@gmail.com>
Closes: https://lore.kernel.org/linux-ext4/CA+G9fYsyYQ3ZL4xaSg1-Tt5Evto7Zd+hgNWZEa9cQLbahA1+xg@mail.gmail.com/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250707140814.542883-12-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

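For illustration, a minimal, self-contained sketch of the arithmetic behind the new cap (the macro and helper names below are made up for this example, not the identifiers used in the ext4 patch): with 64KB pages the page cache would otherwise allow order-11 folios (64KB << 11 = 128MB), whereas capping a folio at 2048 filesystem blocks limits it to 2048 << blkbits bytes, i.e. 8MB with 4KB blocks.

    #include <stdio.h>

    /* Illustrative userspace model of the cap described above; the names
     * EXT4_MAX_FOLIO_BLOCKS and max_folio_order() are hypothetical. */
    #define EXT4_MAX_FOLIO_BLOCKS 2048ULL

    static int max_folio_order(int page_shift, int blkbits)
    {
            unsigned long long max_bytes = EXT4_MAX_FOLIO_BLOCKS << blkbits;
            int order = 0;

            /* Largest order whose folio size still fits in 2048 fs blocks. */
            while (((1ULL << (order + 1)) << page_shift) <= max_bytes)
                    order++;
            return order;
    }

    int main(void)
    {
            /* 64KB pages (shift 16), 4KB blocks (blkbits 12): order 7 -> 8MB
             * folios instead of unrestricted order 11 -> 128MB folios. */
            printf("max folio order = %d\n", max_folio_order(16, 12));
            return 0;
    }
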
2025-07-15  gfs2: a minor finish_xmote cleanup  (Andreas Gruenbacher)

As a minor clean-up to commit 1fc05c8d8426 ("gfs2: cancel timed-out glock requests"), when a demote request is in progress in finish_xmote(), there is no point in waking up the glock holder at the head of the queue because the reply from dlm cannot be on behalf of that glock holder.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>

2025-07-15  gfs2: simplify finish_xmote  (Andreas Gruenbacher)

As a follow-up to commit a431d49243a0 ("gfs2: Fix request cancelation bug"), it turns out that any call to finish_xmote() is always followed by a call to run_queue(), either

 * directly, when glock_work_func() calls finish_xmote() before calling run_queue(), or
 * indirectly, when do_xmote() calls finish_xmote() before calling gfs2_glock_queue_work(), which queues a call to glock_work_func() in work queue context,

so remove the code in finish_xmote() that duplicates the functionality of run_queue().

In addition, the code this commit removes is missing a check for the GLF_DEMOTE flag, which indicates that no further promotes should be performed, so if that code didn't get removed, that check would have to be added.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>

2025-07-15  gfs2: sanitize the gdlm_ast -> finish_xmote interface  (Andreas Gruenbacher)

When gdlm_ast() is called with a non-zero status code, this means that the requested operation did not succeed and the current lock state didn't change. Turn that into a non-zero LM_OUT_* status code (with (ret & ~LM_OUT_ST_MASK) != 0) instead of pretending that dlm returned the current lock state. That way, we can easily change finish_xmote() to only update gl->gl_state when the state has actually changed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>

2025-07-14  jfs: stop using write_cache_pages  (Christoph Hellwig)

Stop using the obsolete write_cache_pages and use writeback_iter directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

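For reference, the usual shape of such a conversion is an explicit folio loop; the sketch below is a generic illustration (jfs_write_one_folio() is a hypothetical stand-in for the filesystem's per-folio writer, not the actual jfs helper), assuming the standard writeback_iter() interface from linux/writeback.h:

    #include <linux/writeback.h>
    #include <linux/pagemap.h>

    /* Hypothetical per-folio writer; the real one would write the folio's
     * metapages.  It must unlock the folio before returning. */
    static int jfs_write_one_folio(struct folio *folio,
                                   struct writeback_control *wbc)
    {
            folio_unlock(folio);
            return 0;
    }

    static int example_writepages(struct address_space *mapping,
                                  struct writeback_control *wbc)
    {
            struct folio *folio = NULL;
            int error = 0;

            /* writeback_iter() walks the dirty folios itself; the caller only
             * supplies the per-folio work and passes errors back via 'error'. */
            while ((folio = writeback_iter(mapping, wbc, folio, &error)))
                    error = jfs_write_one_folio(folio, wbc);

            return error;
    }
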
2025-07-14  jfs: truncate good inode pages when hard link is 0  (Lizhi Xu)

The fileset value of the inode copied from disk by the reproducer is AGGR_RESERVED_I. When executing evict, its hard link count is 0, so its inode pages are not truncated. This causes the BUG_ON() in clear_inode() to trigger because nrpages is greater than 0.

Reported-by: syzbot+6e516bb515d93230bc7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6e516bb515d93230bc7b
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

2025-07-14  jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage()  (Suchit Karunakaran)

Replace the legacy XT_GETPAGE macro with an inline function that returns an xtpage_t pointer, and update all instances of XT_GETPAGE in jfs_xtree.c.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Simplified xt_getpage by removing size and rc arguments.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

2025-07-14  jfs: Regular file corruption check  (Edward Adam Davis)

The reproducer builds a corrupted file on disk with a negative i_size value. Add a check when opening this file to avoid subsequent operation failures.

Reported-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=630f6d40b3ccabc8e96e
Tested-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

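A hedged sketch of the kind of sanity check described (the function name, its placement in the open path, and the error code are assumptions made for illustration; the actual patch may hook a different spot):

    #include <linux/fs.h>
    #include <linux/errno.h>

    /* Illustrative check: reject a regular file whose on-disk i_size decoded
     * to a negative value before any further operations trip over it. */
    static int jfs_check_isize(struct inode *inode)
    {
            if (S_ISREG(inode->i_mode) && inode->i_size < 0)
                    return -EIO;    /* corrupted on-disk metadata */
            return 0;
    }
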
2025-07-14  jfs: upper bound check of tree index in dbAllocAG  (Arnaud Lecomte)

When computing the tree index in dbAllocAG, we never check whether it is out of bounds relative to the size of the stree. This could happen in a scenario where the filesystem metadata is corrupted.

Reported-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cffd18309153948f3c3e
Tested-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

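A generic sketch of the guard being described (the parameter names and the error code are stand-ins, not the actual jfs_dmap.c identifiers):

    #include <linux/errno.h>

    /* Validate a tree index computed from on-disk data before using it to
     * index the fixed-size stree array; 'ti' and 'tree_size' are stand-ins. */
    static int check_stree_index(int ti, int tree_size)
    {
            if (ti < 0 || ti >= tree_size)
                    return -EIO;    /* corrupted filesystem metadata */
            return 0;
    }
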
2025-07-14  fsverity: Switch from crypto_shash to SHA-2 library  (Eric Biggers)

fsverity supports two hash algorithms: SHA-256 and SHA-512. Since both of these have a library API now, just use the library API instead of crypto_shash. Even with multiple algorithms, the library-based code still ends up being quite a bit simpler, due to how clumsy the old-school crypto API is. The library-based code is also more efficient, since it avoids overheads such as indirect calls.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250630172224.46909-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

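To illustrate the difference in call-site complexity, here is a hedged, generic comparison (not the actual fsverity code; the helpers shown follow the long-standing sha256_state-based library API from <crypto/sha2.h>, whose exact type and function names may differ between kernel versions): the library call needs no tfm allocation and no shash descriptor.

    #include <crypto/sha2.h>

    /* One-shot library API: no tfm, no descriptor, no allocation error paths. */
    static void hash_block_lib(const u8 *data, unsigned int len,
                               u8 digest[SHA256_DIGEST_SIZE])
    {
            sha256(data, len, digest);
    }

    /* Incremental variant, e.g. when a salt is prepended (illustrative only). */
    static void hash_salted_block_lib(const u8 *salt, unsigned int salt_len,
                                      const u8 *data, unsigned int len,
                                      u8 digest[SHA256_DIGEST_SIZE])
    {
            struct sha256_state state;

            sha256_init(&state);
            sha256_update(&state, salt, salt_len);
            sha256_update(&state, data, len);
            sha256_final(&state, digest);
    }
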
2025-07-14  fsverity: Explicitly include <linux/export.h>  (Eric Biggers)

Fix build warnings with W=1 that started appearing after commit a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1").

Link: https://lore.kernel.org/r/20250614221723.131827-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

2025-07-14  nfsd: Drop dprintk in blocklayout xdr functions  (Sergey Bashirov)

Minor clean-up. Appropriate error codes are returned instead of relying on dprintk messages.

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Simplify struct knfsd_fh  (Chuck Lever)

Compilers are allowed to insert padding and reorder the fields in a struct, so using a union of an array and a struct in struct knfsd_fh is not reliable. The position of elements in an array is more reliable.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Access a knfsd_fh's fsid by pointer  (Chuck Lever)

I'm about to remove the union in struct knfsd_fh. First step is to add an accessor function for the file handle's fsid portion.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous"  (Chuck Lever)

In the past several kernel releases, we've made NFSv4.2 async copy reliable:

 - The Linux NFS client and server now both implement and use the NFSv4.2 OFFLOAD_STATUS operation
 - The Linux NFS server keeps copy stateids around longer
 - The Linux NFS client and server now both implement referring call lists

And resilient against DoS:

 - The Linux NFS server limits the number of concurrent async copy operations

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings  (Gustavo A. R. Silva)

Replace the flexible-array member with a fixed-size array. With this change, fix many instances of the following type of warning:

fs/nfsd/nfsfh.h:79:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
fs/nfsd/state.h:763:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
fs/nfsd/state.h:669:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
fs/nfsd/state.h:549:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
fs/nfsd/xdr4.h:705:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
fs/nfsd/xdr4.h:678:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

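For context, a generic illustration of what triggers the warning and of the fix pattern (the struct names and the size bound are made up, not NFSD's actual definitions):

    /* A struct ending in a flexible array member has no fixed size, so GCC
     * warns when it is embedded anywhere but at the end of another struct. */
    struct fh_flex {
            unsigned int size;
            unsigned char data[];           /* flexible array member */
    };

    struct state_bad {
            struct fh_flex fh;              /* -Wflex-array-member-not-at-end */
            unsigned long flags;
    };

    /* Fix pattern used by the commit: give the member a fixed maximum size so
     * it can sit anywhere inside an enclosing structure. */
    #define FH_MAX_DATA 64                  /* illustrative bound */

    struct fh_fixed {
            unsigned int size;
            unsigned char data[FH_MAX_DATA];
    };

    struct state_good {
            struct fh_fixed fh;             /* no warning */
            unsigned long flags;
    };
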
2025-07-14  NFSD: Use vfs_iocb_iter_write()  (Chuck Lever)

Refactor: Enable the use of IOCB flags to control NFSD's individual write operations. This allows the eventual use of atomic, uncached, direct, or asynchronous writes.

Suggested-by: NeilBrown <neil@brown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Use vfs_iocb_iter_read()  (Chuck Lever)

Refactor: Enable the use of IOCB flags to control NFSD's individual read operations (when not using splice). This allows the eventual use of atomic, uncached, direct, or asynchronous reads.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

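As a generic illustration of why the iocb-based helpers make per-call flags possible (a sketch, not NFSD's actual read path; the flag choice shown is a hypothetical example):

    #include <linux/fs.h>
    #include <linux/uio.h>

    static ssize_t example_read(struct file *file, struct kvec *vec, int nvecs,
                                size_t len, loff_t offset)
    {
            struct kiocb kiocb;
            struct iov_iter iter;

            init_sync_kiocb(&kiocb, file);
            kiocb.ki_pos = offset;
            /* Per-call behaviour can now be selected via ki_flags, e.g.
             * IOCB_NOWAIT for a non-blocking read (illustrative). */
            iov_iter_kvec(&iter, ITER_DEST, vec, nvecs, len);

            return vfs_iocb_iter_read(file, &kiocb, &iter);
    }
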
2025-07-14  NFSD: Clean up kdoc for nfsd_open_local_fh()  (Chuck Lever)

Sparse reports that the synopsis of nfsd_open_local_fh() does not match its kdoc comment. Introduced by commit e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for getting nfsd_file").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Clean up kdoc for nfsd_file_put_local()  (Chuck Lever)

Sparse reports that the synopsis of nfsd_file_put_local() does not match its kdoc comment. Introduced by commit c25a89770d1f ("nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer").

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Remove definition for trace_nfsd_ctl_maxconn  (Chuck Lever)

The call to trace_nfsd_ctl_maxconn() was removed by commit a4b853f183a1 ("sunrpc: remove all connection limit configuration"), but that commit did not remove the event definition.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/linux-nfs/5ccae2f9-1560-4ac5-b506-b235ed4e4f4f@oracle.com/T/#t
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Remove definition for trace_nfsd_file_gc_recent  (Chuck Lever)

Event nfsd_file_gc_recent was added by commit 64912122a4f8 ("nfsd: filecache: introduce NFSD_FILE_RECENT") but never used.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/linux-nfs/5ccae2f9-1560-4ac5-b506-b235ed4e4f4f@oracle.com/T/#t
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Remove definitions for unused trace_nfsd_file_lru trace points  (Chuck Lever)

Events nfsd_file_lru_add_disposed and nfsd_file_lru_del_disposed were added by commit 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU") but they were never used.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/linux-nfs/5ccae2f9-1560-4ac5-b506-b235ed4e4f4f@oracle.com/T/#t
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Remove definition for trace_nfsd_file_unhash_and_queue  (Chuck Lever)

The call to trace_nfsd_file_unhash_and_queue() was removed by commit ac3a2585f018 ("nfsd: rework refcounting in filecache"), but the event definition remained.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Closes: https://lore.kernel.org/linux-nfs/5ccae2f9-1560-4ac5-b506-b235ed4e4f4f@oracle.com/T/#t
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  nfsd: Use correct error code when decoding extents  (Sergey Bashirov)

Update error codes in the decoding functions of the block and scsi layout drivers to match the core nfsd code. NFS4ERR_INVAL means that the server was able to decode the request, but the decoded values are invalid; use NFS4ERR_BADXDR instead to indicate a decoding error. ENOMEM is changed to the NFS code NFS4ERR_DELAY.

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

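As a generic illustration of the convention described (the decoder below is hypothetical; the __be32 status constants are the ones nfsd defines in its internal fs/nfsd/nfsd.h header):

    #include <linux/sunrpc/xdr.h>

    /* Hypothetical extent decoder showing the error-code convention:
     * - failure to pull bytes off the XDR stream  -> nfserr_bad_xdr
     * - bytes decoded, but the values are invalid -> nfserr_inval
     * - transient resource shortage (e.g. ENOMEM) -> nfserr_jukebox (DELAY)
     */
    static __be32 decode_one_extent(struct xdr_stream *xdr, u64 *len)
    {
            __be32 *p;

            p = xdr_inline_decode(xdr, 8);
            if (!p)
                    return nfserr_bad_xdr;  /* truncated or malformed XDR */

            xdr_decode_hyper(p, len);
            if (*len == 0)
                    return nfserr_inval;    /* decoded fine, but value invalid */

            return nfs_ok;
    }
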
2025-07-14  NFSD: Remove the cap on number of operations per NFSv4 COMPOUND  (Chuck Lever)

This limit has always been a sanity check; in nearly all cases a large COMPOUND is a sign of a malfunctioning client. The only real limit on COMPOUND size and complexity is the size of NFSD's send and receive buffers.

However, there are a few cases where a large COMPOUND is sane. For example, when a client implementation wants to walk down a long file pathname in a single round trip.

A small risk is that now a client can construct a COMPOUND request that can keep a single nfsd thread busy for quite some time.

Suggested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Make nfsd_genl_rqstp::rq_ops array best-effort  (Chuck Lever)

To enable NFSD to handle NFSv4 COMPOUNDs of unrestricted size, resize the array in struct nfsd_genl_rqstp so it saves only up to 16 operations per COMPOUND.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Rename a function parameter  (Chuck Lever)

Clean up: A function parameter called "rqstp" typically refers to an object of type "struct svc_rqst", so it's confusing when such a parameter refers to a different struct type with field names that are very similar to svc_rqst.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: detect mismatch of file handle and delegation stateid in OPEN op  (Dai Ngo)

When the client sends an OPEN with claim type CLAIM_DELEG_CUR_FH or CLAIM_DELEGATION_CUR, the delegation stateid and the file handle must belong to the same file; otherwise return NFS4ERR_INVAL.

Note that RFC 8881, section 8.2.4, mandates the server to return NFS4ERR_BAD_STATEID if the selected table entry does not match the current filehandle. However, returning NFS4ERR_BAD_STATEID in the OPEN causes the client to retry the operation and therefore puts the client into a loop. To avoid this situation we return NFS4ERR_INVAL instead.

Reported-by: Petro Pavlov <petro.pavlov@vastdata.com>
Fixes: c44c5eeb2c02 ("[PATCH] nfsd4: add open state code for CLAIM_DELEGATE_CUR")
Cc: stable@vger.kernel.org
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  nfsd: handle get_client_locked() failure in nfsd4_setclientid_confirm()  (Jeff Layton)

Lei Lu recently reported that nfsd4_setclientid_confirm() did not check the return value from get_client_locked(). A SETCLIENTID_CONFIRM could race with a confirmed client expiring and fail to get a reference. That could later lead to a use-after-free (UAF).

Fix this by getting a reference early in the case where there is an extant confirmed client. If that fails, then treat it as if there were no confirmed client found at all. In the case where the unconfirmed client is expiring, just fail and return the result from get_client_locked().

Reported-by: lei lu <llfamsec@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/CAEBF3_b=UvqzNKdnfD_52L05Mqrqui9vZ2eFamgAbV0WG+FNWQ@mail.gmail.com/
Fixes: d20c11d86d8f ("nfsd: Protect session creation and client confirm using client_lock")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  nfsd: Change the type of ek_fsidtype from int to u8 and use kstrtou8  (Su Hui)

The valid values for ek_fsidtype are actually 0-7, so it's better to change the type to u8. Also use kstrtou8() to replace simple_strtoul(); kstrtou8() is safer and more suitable for u8.

Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

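A hedged illustration of the parsing change (a generic sketch, not the exact export.c hunk): kstrtou8() rejects trailing junk and out-of-range values that simple_strtoul() silently tolerates.

    #include <linux/kernel.h>
    #include <linux/errno.h>

    /* Illustrative parse of an fsid type token into a u8; valid values are 0-7. */
    static int parse_fsidtype(const char *buf, u8 *fsidtype)
    {
            int err;

            err = kstrtou8(buf, 10, fsidtype);      /* -EINVAL/-ERANGE on bad input */
            if (err)
                    return err;
            if (*fsidtype > 7)                      /* only fsid types 0-7 exist */
                    return -EINVAL;
            return 0;
    }
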
2025-07-14  sunrpc: simplify xdr_init_encode_pages  (Christoph Hellwig)

The rqst argument to xdr_init_encode_pages is set to NULL by all callers, and pages is always set to buf->pages. Remove the two arguments and hardcode the assignments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: release read access of nfs4_file when a write delegation is returned  (Dai Ngo)

When a write delegation is returned, check whether read access was added to the nfs4_file when the client opened the file with WRONLY, and release it.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE  (Dai Ngo)

RFC 8881, section 9.1.2 says:

"In the case of READ, the server may perform the corresponding check on the access mode, or it may choose to allow READ for OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE implementation may unavoidably do reads (e.g., due to buffer cache constraints)."

and in section 10.4.1:

"Similarly, when closing a file opened for OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH and if an OPEN_DELEGATE_WRITE delegation is in effect"

This patch allows READ using a write delegation stateid granted on OPENs with OPEN4_SHARE_ACCESS_WRITE only, to accommodate clients whose WRITE implementation may unavoidably do reads (e.g., due to buffer cache constraints).

For a write delegation granted for an OPEN with OPEN4_SHARE_ACCESS_WRITE, a new nfsd_file and a struct file are allocated to use for reads. The nfsd_file is freed when the file is closed by release_all_access.

Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2025-07-14  netfs: Fix race between cache write completion and ALL_QUEUED being set  (David Howells)

When netfslib is issuing subrequests, the subrequests start processing immediately and may complete before we reach the end of the issuing function. At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED to indicate to the collector that we aren't going to issue any more subreqs and that it can do the final notifications and cleanup.

Now, this isn't a problem if the request is synchronous (NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be done in-thread and we're guaranteed an opportunity to run the collector.

However, if the request is asynchronous, collection is primarily triggered by the termination of subrequests queuing it on a workqueue. Now, a race can occur here if the app thread sets ALL_QUEUED after the last subrequest terminates.

This can happen most easily with the copy2cache code (as used by Ceph) where, in the collection routine of a read request, an asynchronous write request is spawned to copy data to the cache. Folios are added to the write request as they're unlocked, but there may be a delay before ALL_QUEUED is set as the write subrequests may complete before we get there.

If all the write subreqs have finished by the ALL_QUEUED point, no further events happen and the collection never happens, leaving the request hanging.

Fix this by queuing the collector after setting ALL_QUEUED. This is a bit heavy-handed and it may be sufficient to do it only if there are no extant subreqs.

Also add a tracepoint to cross-reference both requests in a copy-to-request operation and add a trace to the netfs_rreq tracepoint to indicate the setting of ALL_QUEUED.

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250711151005.2956810-3-dhowells@redhat.com
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  netfs: Fix copy-to-cache so that it performs collection with ceph+fscache  (David Howells)

The netfs copy-to-cache that is used by Ceph with local caching sets up a new request to write data just read to the cache. The request is started and then left to look after itself whilst the app continues. The request gets notified by the backing fs upon completion of the async DIO write, but then tries to wake up the app because NETFS_RREQ_OFFLOAD_COLLECTION isn't set - but the app isn't waiting there, and so the request just hangs.

Fix this by setting NETFS_RREQ_OFFLOAD_COLLECTION which causes the notification from the backing filesystem to put the collection onto a work queue instead.

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250711151005.2956810-2-dhowells@redhat.com
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: build the writeback code without CONFIG_BLOCK  (Christoph Hellwig)

Allow fuse to use the iomap writeback code even when CONFIG_BLOCK is not enabled. Do this with an ifdef instead of a separate file to keep the iomap_folio_state local to buffered-io.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-15-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: add read_folio_range() handler for buffered writes  (Christoph Hellwig)

Add a read_folio_range() handler for buffered writes that filesystems may pass in if they wish to provide a custom handler for synchronously reading in the contents of a folio.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: renamed to read_folio_range, pass less arguments]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-14-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: improve argument passing to iomap_read_folio_sync  (Christoph Hellwig)

Pass the iomap_iter and derive the map inside iomap_read_folio_sync instead of in the caller, and use the more descriptive srcmap name for the source iomap. Stop passing the offset-into-folio argument as it can be derived from the folio and the file offset. Rename the variables for the offset into the file and the length to be more descriptive and match the rest of the code. Rename the function itself to iomap_read_folio_range to make the use more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-13-hch@lst.de
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: replace iomap_folio_ops with iomap_write_ops  (Christoph Hellwig)

The iomap_folio_ops are only used for buffered writes, including the zero and unshare variants. Rename them to iomap_write_ops to better describe the usage, and pass them through the call chain like the other operation-specific methods instead of through the iomap.

xfs_iomap_valid grows an IOMAP_HOLE check to keep the existing behavior that never attached the folio_ops to an iomap representing a hole.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-12-hch@lst.de
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: export iomap_writeback_folio  (Christoph Hellwig)

Allow fuse to use iomap_writeback_folio for folio laundering. Note that the caller needs to manually submit the pending writeback context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-11-hch@lst.de
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: move folio_unlock out of iomap_writeback_folio  (Joanne Koong)

Move unlocking the folio out of iomap_writeback_folio into the caller. This means the end-writeback machinery is now run with the folio locked when no writeback happened, or writeback completed extremely fast.

Note that having the folio locked over the call to folio_end_writeback in iomap_writeback_folio means that the dropbehind handling there will never run because the trylock fails. The only way this can happen is if the writepage either never wrote back any dirty data at all, in which case the dropbehind handling isn't needed, or if all writeback finished instantly, which is rather unlikely. Even in the latter case the dropbehind handling is an optional optimization so skipping it will not cause correctness issues.

This prepares for exporting iomap_writeback_folio for use in folio laundering.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-10-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: rename iomap_writepage_map to iomap_writeback_folio  (Christoph Hellwig)

->writepage is gone, and our naming wasn't always that great to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-9-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: move all ioend handling to ioend.c  (Christoph Hellwig)

Now that the writeback code has the proper abstractions, all the ioend code can be self-contained in ioend.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-8-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: add public helpers for uptodate state manipulation  (Joanne Koong)

Add a new iomap_start_folio_write helper to abstract away the write_bytes_pending handling, and export it and the existing iomap_finish_folio_write for non-iomap writeback in fuse.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-7-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: hide ioends from the generic writeback code  (Christoph Hellwig)

Replace the ioend pointer in iomap_writeback_ctx with a void *wb_ctx one to facilitate non-block, non-ioend writeback for filesystems like fuse.

Rename the submit_ioend method to writeback_submit and make it mandatory so that the generic writeback code stops seeing ioends and bios.

Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-6-hch@lst.de
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: refactor the writeback interface  (Christoph Hellwig)

Replace ->map_blocks with a new ->writeback_range, which differs in the following ways:

 - it must also queue up the I/O for writeback, that is, it calls into the slightly refactored and extended-in-scope iomap_add_to_ioend for each region
 - it can handle only a part of the requested region, that is, the retry loop for partial mappings moves to the caller
 - it handles cleanup on failures as well, and thus also replaces the discard_folio method only implemented by XFS

This will allow the iomap writeback code to also be used by file systems that are not block based, like fuse.

Co-developed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-5-hch@lst.de
Acked-by: Damien Le Moal <dlemoal@kernel.org> # zonefs
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks  (Joanne Koong)

We don't care about the count of outstanding ioends, just whether there is one. Replace the count variable passed to iomap_writepage_map_blocks with a boolean to make that more clear.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
[hch: rename the variable, update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-4-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: pass more arguments using the iomap writeback context  (Christoph Hellwig)

Add inode and wpc fields to pass the inode and writeback context that are needed in the entire writeback call chain, and let the callers initialize all fields in the writeback context before calling iomap_writepages to simplify the argument passing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-3-hch@lst.de
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

2025-07-14  iomap: header diet  (Christoph Hellwig)

Drop various unused #include statements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250710133343.399917-2-hch@lst.de
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>