path: root/fs
Age  Commit message  Author
2022-05-09  f2fs: Convert f2fs to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
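A minimal sketch of what such a "weak" conversion typically looks like: the new read_folio method just unwraps the folio and reuses the existing page-based code (function names here are illustrative, not the exact ones touched by the patch):

    /* Illustrative "weak" read_folio conversion: unwrap the folio and call
     * the old page-based implementation unchanged. */
    static int example_read_folio(struct file *file, struct folio *folio)
    {
            struct page *page = &folio->page;

            return example_do_readpage(file, page);
    }

    static const struct address_space_operations example_aops = {
            .read_folio     = example_read_folio,
            /* other methods unchanged */
    };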
2022-05-09  ext4: Convert ext4 to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  erofs: Convert erofs zdata to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  efs: Convert efs symlinks to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  ecryptfs: Convert ecryptfs to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  cramfs: Convert cramfs to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  coda: Convert coda to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  cifs: Convert cifs to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. CIFS should probably be converted to use netfs_read_folio() by someone familiar with it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  btrfs: Convert btrfs to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  befs: Convert befs to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  afs: Convert afs_symlink_readpage to afs_symlink_read_folio  [Matthew Wilcox (Oracle)]
This function mostly used folios already, and only a few minor changes were needed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  affs: Convert affs to read_folio  [Matthew Wilcox (Oracle)]
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  fs: Convert simple_readpage to simple_read_folio  [Matthew Wilcox (Oracle)]
This is a full folio conversion; it is prepared to handle folios of arbitrary size. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
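Since the old simple_readpage() only zeroed the page and marked it up to date, the folio version can work on an arbitrarily sized folio directly; roughly (a sketch, not necessarily the exact upstream body):

    int simple_read_folio(struct file *file, struct folio *folio)
    {
            /* Zero the whole folio, whatever its size, and mark it ready. */
            folio_zero_range(folio, 0, folio_size(folio));
            flush_dcache_folio(folio);
            folio_mark_uptodate(folio);
            folio_unlock(folio);
            return 0;
    }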
2022-05-09  fs: Convert mpage_readpage to mpage_read_folio  [Matthew Wilcox (Oracle)]
mpage_readpage still works in terms of pages, and has not been audited for correctness with large folios, so include an assertion that the filesystem is not passing it large folios. Convert all the filesystems to call mpage_read_folio() instead of mpage_readpage(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
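On the filesystem side the change is mechanical; an illustrative example for a block-based filesystem (the get_block callback name is hypothetical), with mpage_read_folio() itself carrying an assertion along the lines of VM_BUG_ON_FOLIO(folio_test_large(folio), folio):

    /* Illustrative caller: hand the folio straight to the mpage helper. */
    static int example_read_folio(struct file *file, struct folio *folio)
    {
            return mpage_read_folio(folio, example_get_block);
    }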
2022-05-09  fs: Convert block_read_full_page() to block_read_full_folio()  [Matthew Wilcox (Oracle)]
This function is NOT converted to handle large folios, so include an assert that the filesystem isn't passing one in. Otherwise, use the folio functions instead of the page functions, where they exist. Convert all filesystems which use block_read_full_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  fs: Convert iomap_readpage to iomap_read_folio  [Matthew Wilcox (Oracle)]
A straightforward conversion as iomap_readpage already worked in folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  fs: Convert netfs_readpage to netfs_read_folio  [Matthew Wilcox (Oracle)]
This is straightforward because netfs already worked in terms of folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-09  fs: Introduce aops->read_folio  [Matthew Wilcox (Oracle)]
Change all the callers of ->readpage to call ->read_folio in preference, if it exists. This is a transitional duplication, and will be removed by the end of the series. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
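The transitional dispatch amounts to preferring the new method when it is present; a hedged sketch of the pattern (the helper name is made up; the real callers are spread across the page cache code):

    /* Illustrative transitional dispatch: prefer ->read_folio if the
     * filesystem provides it, otherwise fall back to legacy ->readpage. */
    static int example_call_read_folio(struct address_space *mapping,
                                       struct file *file, struct folio *folio)
    {
            if (mapping->a_ops->read_folio)
                    return mapping->a_ops->read_folio(file, folio);
            return mapping->a_ops->readpage(file, &folio->page);
    }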
2022-05-09  nfs: fix broken handling of the softreval mount option  [Dan Aloni]
It turns out that ever since this mount option was added, passing `softreval` in the NFS mount options cancelled all other mount flags while leaving the underlying flag `NFS_MOUNT_SOFTREVAL` unaffected. Fixes: c74dfe97c104 ("NFS: Add mount option 'softreval'") Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
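The symptom described (every other flag cleared, NFS_MOUNT_SOFTREVAL itself untouched) matches an AND-assignment used where an OR-assignment was intended; a hypothetical illustration of that failure mode, not the literal patch:

    /* Hypothetical: &= clears all other bits and only preserves
     * NFS_MOUNT_SOFTREVAL if it happened to be set already. */
    static void set_softreval_buggy(unsigned int *flags)
    {
            *flags &= NFS_MOUNT_SOFTREVAL;  /* BUG */
    }

    static void set_softreval_fixed(unsigned int *flags)
    {
            *flags |= NFS_MOUNT_SOFTREVAL;  /* sets the bit, keeps the rest */
    }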
2022-05-09  f2fs: stop allocating pinned sections if EAGAIN happens  [Jaegeuk Kim]
EAGAIN does not guarantee that a free section is available. Let's report it instead. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-09  f2fs: skip GC if possible when checkpoint disabling  [Weichao Guo]
If the number of unusable blocks is not larger than the unusable capacity, we can skip GC while checkpointing is disabled. Signed-off-by: Weichao Guo <guoweichao@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> [Jaegeuk Kim: Fix missing gc_mode assignment] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-09  io_uring: support CQE32 for nop operation  [Stefan Roesch]
This adds support for filling the extra1 and extra2 fields for large CQEs. Co-developed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-13-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: enable CQE32  [Stefan Roesch]
This enables large CQEs in the uring setup. Co-developed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-12-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: support CQE32 in /proc info  [Stefan Roesch]
This exposes the extra1 and extra2 fields in the /proc output. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-11-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: add tracing for additional CQE32 fields  [Stefan Roesch]
This adds tracing for the extra1 and extra2 fields. Co-developed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-10-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: overflow processing for CQE32  [Stefan Roesch]
This adds the overflow processing for large CQEs. It adds two parameters to the io_cqring_event_overflow function and uses these fields to initialize the large CQE fields. Allocate enough space for large CQEs in the overflow structure. If no large CQEs are used, the size of the allocation is unchanged. The cqe field can have a different size depending on whether it is a large CQE or not. To be able to allocate different sizes, the two fields in the structure are re-ordered. Co-developed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-9-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
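A sketch of what that implies for the allocation (structure and flag names here approximate the upstream ones but are illustrative):

    /* Illustrative overflow entry: the CQE sits last so it can simply be
     * over-allocated when the ring uses 32-byte CQEs. */
    struct example_overflow_cqe {
            struct list_head        list;
            struct io_uring_cqe     cqe;    /* 16 extra bytes follow for CQE32 */
    };

    static size_t example_overflow_alloc_size(unsigned int ring_flags)
    {
            size_t size = sizeof(struct example_overflow_cqe);

            if (ring_flags & IORING_SETUP_CQE32)
                    size += sizeof(struct io_uring_cqe);    /* extra1/extra2 */
            return size;
    }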
2022-05-09  io_uring: flush completions for CQE32  [Stefan Roesch]
This flushes the completions according to their CQE type: the same processing is done for the default CQE size, but for large CQEs the extra1 and extra2 fields are filled in. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-8-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: modify io_get_cqe for CQE32  [Stefan Roesch]
Modify accesses to the CQE array to take large CQEs into account. The index needs to be shifted by one for large CQEs. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-7-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
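Put differently, a 32-byte CQE occupies two consecutive slots of the regular 16-byte array, so the computed index is doubled; an illustrative sketch:

    /* Illustrative: compute the slot for the next CQE, doubling the offset
     * when the ring was set up with 32-byte CQEs. */
    static struct io_uring_cqe *example_get_cqe(struct io_uring_cqe *cqes,
                                                unsigned int tail,
                                                unsigned int mask,
                                                unsigned int ring_flags)
    {
            unsigned int off = tail & mask;

            if (ring_flags & IORING_SETUP_CQE32)
                    off <<= 1;      /* shift by one for large CQEs */
            return &cqes[off];
    }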
2022-05-09  io_uring: add CQE32 completion processing  [Stefan Roesch]
This adds the completion processing for large CQEs and makes sure that the extra1 and extra2 fields are passed through. Co-developed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-6-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: add CQE32 setup processing  [Stefan Roesch]
This adds two new functions to set up and fill the CQE32 result structure. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-5-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: change ring size calculation for CQE32  [Stefan Roesch]
This changes the function rings_size to take large CQEs into account. Co-developed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-4-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
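The CQ-array part of that calculation boils down to doubling the per-entry footprint; a simplified sketch (the real rings_size() also covers the SQ array offset and overflow checking):

    /* Simplified: bytes needed for the CQ entry array alone. */
    static size_t example_cq_array_bytes(unsigned int cq_entries,
                                         unsigned int ring_flags)
    {
            size_t cqe_size = sizeof(struct io_uring_cqe);  /* 16 bytes */

            if (ring_flags & IORING_SETUP_CQE32)
                    cqe_size *= 2;                          /* 32-byte CQEs */
            return (size_t)cq_entries * cqe_size;
    }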
2022-05-09  io_uring: store add. return values for CQE32  [Stefan Roesch]
This reuses the hash list node for the storage we need to hold the two 64-bit values that must be passed back. Co-developed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220426182134.136504-3-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: add support for 128-byte SQEs  [Jens Axboe]
Normal SQEs are 64 bytes in length, which is fine for all the commands we support. However, in preparation for supporting passthrough IO, provide an option for setting up a ring with 128-byte SQEs. We continue to use the same type for io_uring_sqe; it's marked and commented with a zero-sized array pad at the end. This provides up to 80 bytes of data for a passthrough command: 64 bytes for the extra added data, and 16 bytes available at the end of the existing SQE. Signed-off-by: Jens Axboe <axboe@kernel.dk>
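From userspace, opting into the bigger SQEs is a single setup flag; a minimal sketch using the raw syscall (error handling omitted, requires a kernel and headers that provide IORING_SETUP_SQE128):

    #include <linux/io_uring.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Minimal sketch: create a ring whose SQEs are 128 bytes instead of 64. */
    static int setup_sqe128_ring(unsigned int entries)
    {
            struct io_uring_params p;

            memset(&p, 0, sizeof(p));
            p.flags = IORING_SETUP_SQE128;
            return syscall(__NR_io_uring_setup, entries, &p);
    }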
2022-05-09  Merge branch 'for-5.19/io_uring-socket' into for-5.19/io_uring-passthrough  [Jens Axboe]
* for-5.19/io_uring-socket:
  io_uring: use the text representation of ops in trace
  io_uring: rename op -> opcode
  io_uring: add io_uring_get_opcode
  io_uring: add type to op enum
  io_uring: add socket(2) support
  net: add __sys_socket_file()
  io_uring: fix trace for reduced sqe padding
  io_uring: add fgetxattr and getxattr support
  io_uring: add fsetxattr and setxattr support
  fs: split off do_getxattr from getxattr
  fs: split off setxattr_copy and do_setxattr function from setxattr
2022-05-09  Merge branch 'for-5.19/io_uring' into for-5.19/io_uring-passthrough  [Jens Axboe]
* for-5.19/io_uring: (85 commits)
  io_uring: don't clear req->kbuf when buffer selection is done
  io_uring: eliminate the need to track provided buffer ID separately
  io_uring: move provided buffer state closer to submit state
  io_uring: move provided and fixed buffers into the same io_kiocb area
  io_uring: abstract out provided buffer list selection
  io_uring: never call io_buffer_select() for a buffer re-select
  io_uring: get rid of hashed provided buffer groups
  io_uring: always use req->buf_index for the provided buffer group
  io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set
  io_uring: kill io_rw_buffer_select() wrapper
  io_uring: make io_buffer_select() return the user address directly
  io_uring: kill io_recv_buffer_select() wrapper
  io_uring: use 'sr' vs 'req->sr_msg' consistently
  io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg
  io_uring: check IOPOLL/ioprio support upfront
  io_uring: replace smp_mb() with smp_mb__after_atomic() in io_sq_thread()
  io_uring: add IORING_SETUP_TASKRUN_FLAG
  io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
  io_uring: set task_work notify method at init time
  io-wq: use __set_notify_signal() to wake workers
  ...
2022-05-09  io_uring: don't clear req->kbuf when buffer selection is done  [Jens Axboe]
It's not needed as the REQ_F_BUFFER_SELECTED flag tracks the state of whether or not kbuf is valid, so just drop it. Suggested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: eliminate the need to track provided buffer ID separately  [Jens Axboe]
We have io_kiocb->buf_index which is used for either fixed buffers, or for provided buffers. For the latter, it's used to hold the buffer group ID for buffer selection. Post selection, req->kbuf->bid is used to get the buffer ID. Store the buffer ID, when selected, in req->buf_index. If we do end up recycling the buffer, reset it back to the buffer group ID. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: move provided buffer state closer to submit state  [Jens Axboe]
The timeout and other items that follow are less hot, so let's move the provided buffer state above that. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: move provided and fixed buffers into the same io_kiocb area  [Jens Axboe]
These are mutually exclusive - if you use provided buffers, then you cannot use fixed buffers and vice versa. Move them into the same spot in the io_kiocb, which is also advantageous for provided buffers as they get near the submit side hot cacheline. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: abstract out provided buffer list selection  [Jens Axboe]
In preparation for providing another way to select a buffer, move the existing logic into a helper. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: never call io_buffer_select() for a buffer re-select  [Jens Axboe]
Callers already have room to store the addr and length information, clean it up by having the caller just assign the previously provided data. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: get rid of hashed provided buffer groups  [Jens Axboe]
Use a plain array for any group ID that's less than 64, and punt anything beyond that to an xarray. 64 fits in a page even for 4KB page sizes and with the planned additions. This makes the expected group usage faster by avoiding a hash and lookup to find our list, and it uses less memory upfront by not allocating any memory for provided buffers unless it's actually being used. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
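The resulting lookup is a bounds check plus either a direct array index or an xarray load; an illustrative sketch (field names approximate the upstream ones):

    #define EXAMPLE_BGID_ARRAY      64      /* group IDs mapped to a flat array */

    /* Illustrative: small group IDs hit the array, larger ones fall back to
     * the xarray, so the common case avoids hashing entirely. */
    static struct io_buffer_list *example_get_buffer_list(struct io_ring_ctx *ctx,
                                                          unsigned int bgid)
    {
            if (bgid < EXAMPLE_BGID_ARRAY && ctx->io_bl)
                    return &ctx->io_bl[bgid];
            return xa_load(&ctx->io_bl_xa, bgid);
    }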
2022-05-09  io_uring: always use req->buf_index for the provided buffer group  [Jens Axboe]
The read/write opcodes use it already, but the recv/recvmsg do not. If we switch them over and read and validate this at init time while we're checking if the opcode supports it anyway, then we can do it in one spot and we don't have to pass in a separate group ID for io_buffer_select(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set  [Jens Axboe]
There's no point in validity checking buf_index if the request doesn't have REQ_F_BUFFER_SELECT set, as we will never use it for that case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: kill io_rw_buffer_select() wrapper  [Jens Axboe]
After the recent changes, this is a direct call to io_buffer_select() anyway. With this change, there are no wrappers left for provided buffer selection. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  io_uring: make io_buffer_select() return the user address directly  [Jens Axboe]
There's no point in having callers provide a kbuf, we're just returning the address anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-09  fanotify: do not allow setting dirent events in mask of non-dir  [Amir Goldstein]
Dirent events (create/delete/move) are only reported on watched directory inodes, but in fanotify, as well as in legacy inotify, it was always allowed to set them on a non-dir inode, which does not result in any meaningful outcome.

Until kernel v5.17, dirent events in fanotify also differed from events "on child" (e.g. FAN_OPEN) in the information provided in the event. For example, FAN_OPEN could be set in the mask of a non-dir or the mask of its parent and the event would report the fid of the child regardless of the marked object. By contrast, FAN_DELETE is not reported if the child is marked, and the child fid was not reported in the events.

Since kernel v5.17, with the fanotify group flag FAN_REPORT_TARGET_FID, the fid of the child is reported with dirent events, like events "on child", which may create confusion for users expecting the same behavior as events "on child" when setting events in the mask on a child.

The desired semantics of setting dirent events in the mask of a child are not clear, so for now, deny this action for a group initialized with the flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME. We may relax this restriction in the future if we decide on the semantics and implement them.

Fixes: d61fd650e9d2 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID") Fixes: 8cc3b1ccd930 ("fanotify: wire up FAN_RENAME event") Link: https://lore.kernel.org/linux-fsdevel/20220505133057.zm5t6vumc4xdcnsg@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220507080028.219826-1-amir73il@gmail.com
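Concretely, for a group created with target-fid reporting, putting a dirent event in the mark mask of a regular file is what now gets rejected; a hedged userspace illustration (the exact errno is not stated in the log):

    #include <fcntl.h>
    #include <sys/fanotify.h>

    /* Illustration: with FAN_REPORT_TARGET_FID (part of
     * FAN_REPORT_DFID_NAME_TARGET), marking a non-directory with a dirent
     * event such as FAN_DELETE is now denied. */
    static int mark_regular_file(const char *path)
    {
            int fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_DFID_NAME_TARGET, 0);

            if (fd < 0)
                    return -1;
            /* Expected to fail on kernels carrying this change. */
            return fanotify_mark(fd, FAN_MARK_ADD, FAN_DELETE, AT_FDCWD, path);
    }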
2022-05-09  xfs: Skip flip flags for delayed attrs  [Allison Henderson]
This is a cleanup patch that skips the flip-flag logic for delayed attr renames. Since the log replay keeps the inode locked, we do not need to worry about race windows with attr lookups, so we can skip flipping the flag and the extra transaction roll for it. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-09  xfs: Implement attr logging and replay  [Allison Henderson]
This patch adds the needed routines to create, log and recover logged extended attribute intents. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-08  iomap: iomap_write_end cleanup  [Andreas Gruenbacher]
In iomap_write_end(), only call iomap_write_failed() on the byte range that has failed. This should improve code readability, but doesn't fix an actual bug, because iomap_write_failed() is called after the file size has been updated here and it only affects memory beyond the end of the file. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
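In other words, after a short write only the unwritten tail is handed to the failure path; a hedged sketch of the shape of the change (not the upstream function):

    /* Illustrative: `ret` bytes out of `len` were written at `pos`; report
     * only the tail that was not written. */
    static void example_write_end_cleanup(struct inode *inode, loff_t pos,
                                          size_t len, size_t ret)
    {
            if (ret < len)
                    iomap_write_failed(inode, pos + ret, len - ret);
    }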