summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2022-12-12exfat: remove unnecessary arguments from exfat_find_dir_entry()Yuezhang Mo
This commit removes argument 'num_entries' and 'type' from exfat_find_dir_entry(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: remove unneeded codes from __exfat_rename()Yuezhang Mo
The code gets the dentry, but the dentry is not used, remove the code. Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: remove call ilog2() from exfat_readdir()Yuezhang Mo
There is no need to call ilog2() for the conversions between cluster and dentry in exfat_readdir(), because these conversions can be replaced with EXFAT_DEN_TO_CLU()/EXFAT_CLU_TO_DEN(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: replace magic numbers with MacrosYuezhang Mo
Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: rename exfat_free_dentry_set() to exfat_put_dentry_set()Yuezhang Mo
Since struct exfat_entry_set_cache is allocated from stack, no need to free, so rename exfat_free_dentry_set() to exfat_put_dentry_set(). After renaming, the new function pair is exfat_get_dentry_set()/exfat_put_dentry_set(). Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: move exfat_entry_set_cache from heap to stackYuezhang Mo
The size of struct exfat_entry_set_cache is only 56 bytes on 64-bit system, and allocating from stack is more efficient than allocating from heap. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: support dynamic allocate bh for exfat_entry_set_cacheYuezhang Mo
In special cases, a file or a directory may occupied more than 19 directory entries, pre-allocating 3 bh is not enough. Such as - Support vendor secondary directory entry in the future. - Since file directory entry is damaged, the SecondaryCount field is bigger than 18. So this commit supports dynamic allocation of bh. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: reduce the size of exfat_entry_set_cacheYuezhang Mo
In normal, there are 19 directory entries at most for a file or a directory. - A file directory entry - A stream extension directory entry - 1~17 file name directory entry So the directory entries are in 3 sectors at most, it is enough for struct exfat_entry_set_cache to pre-allocate 3 bh. This commit changes the size of struct exfat_entry_set_cache as: Before After 32-bit system 88 32 bytes 64-bit system 168 48 bytes Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: hint the empty entry which at the end of cluster chainYuezhang Mo
After traversing all directory entries, hint the empty directory entry no matter whether or not there are enough empty directory entries. After this commit, hint the empty directory entries like this: 1. Hint the deleted directory entries if enough; 2. Hint the deleted and unused directory entries which at the end of the cluster chain no matter whether enough or not(Add by this commit); 3. If no any empty directory entries, hint the empty directory entries in the new cluster(Add by this commit). This avoids repeated traversal of directory entries, reduces CPU usage, and improves the performance of creating files and directories(especially on low-performance CPUs). Test create 5000 files in a class 4 SD card on imx6q-sabrelite with: for ((i=0;i<5;i++)); do sync time (for ((j=1;j<=1000;j++)); do touch file$((i*1000+j)); done) done The more files, the more performance improvements. Before After Improvement 1~1000 25.360s 22.168s 14.40% 1001~2000 38.242s 28.72ss 33.15% 2001~3000 49.134s 35.037s 40.23% 3001~4000 62.042s 41.624s 49.05% 4001~5000 73.629s 46.772s 57.42% Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: simplify empty entry hintYuezhang Mo
This commit adds exfat_set_empty_hint()/exfat_reset_empty_hint() to reduce code complexity and make code more readable. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-11nfsd: rework refcounting in filecacheJeff Layton
The filecache refcounting is a bit non-standard for something searchable by RCU, in that we maintain a sentinel reference while it's hashed. This in turn requires that we have to do things differently in the "put" depending on whether its hashed, which we believe to have led to races. There are other problems in here too. nfsd_file_close_inode_sync can end up freeing an nfsd_file while there are still outstanding references to it, and there are a number of subtle ToC/ToU races. Rework the code so that the refcount is what drives the lifecycle. When the refcount goes to zero, then unhash and rcu free the object. A task searching for a nfsd_file is allowed to bump its refcount, but only if it's not already 0. Ensure that we don't make any other changes to it until a reference is held. With this change, the LRU carries a reference. Take special care to deal with it when removing an entry from the list, and ensure that we only repurpose the nf_lru list_head when the refcount is 0 to ensure exclusive access to it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-11ksmbd: Convert to use sysfs_emit()/sysfs_emit_at() APIsye xingchen
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-12-11ksmbd: Fix resource leak in smb2_lock()Marios Makassikis
"flock" is leaked if an error happens before smb2_lock_init(), as the lock is not added to the lock_list to be cleaned up. Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-12-11ksmbd: Fix resource leak in ksmbd_session_rpc_open()Xiu Jianfeng
When ksmbd_rpc_open() fails then it must call ksmbd_rpc_id_free() to undo the result of ksmbd_ipc_id_alloc(). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-12-11ksmbd: replace one-element arrays with flexible-array membersGustavo A. R. Silva
One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element arrays with flexible-array members in multiple structs in fs/ksmbd/smb_common.h and one in fs/ksmbd/smb2pdu.h. Important to mention is that doing a build before/after this patch results in no binary output differences. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [1]. Link: https://github.com/KSPP/linux/issues/242 Link: https://github.com/KSPP/linux/issues/79 Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-12-11ksmbd: use F_SETLK when unlocking a fileJeff Layton
ksmbd seems to be trying to use a cmd value of 0 when unlocking a file. That activity requires a type of F_UNLCK with a cmd of F_SETLK. For local POSIX locking, it doesn't matter much since vfs_lock_file ignores @cmd, but filesystems that define their own ->lock operation expect to see it set sanely. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-12-11ksmbd: set SMB2_SESSION_FLAG_ENCRYPT_DATA when enforcing data encryption for ↵Namjae Jeon
this share Currently, SMB2_SESSION_FLAG_ENCRYPT_DATA is always set session setup response. Since this forces data encryption from the client, there is a problem that data is always encrypted regardless of the use of the cifs seal mount option. SMB2_SESSION_FLAG_ENCRYPT_DATA should be set according to KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION flags, and in case of KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION_OFF, encryption mode is turned off for all connections. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-12-10fs: sysv: Fix sysv_nblocks() returns wrong valueChen Zhongjin
sysv_nblocks() returns 'blocks' rather than 'res', which only counting the number of triple-indirect blocks and causing sysv_getattr() gets a wrong result. [AV: this is actually a sysv counterpart of minixfs fix - 0fcd426de9d0 "[PATCH] minix block usage counting fix" in historical tree; mea culpa, should've thought to check fs/sysv back then...] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-12-10NFSv4.2: Change the default KConfig value for READ_PLUSAnna Schumaker
Now that we've worked out performance issues and have a server patch addressing the failed xfstests, we can safely enable this feature by default. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-10NFSD: Avoid clashing function prototypesKees Cook
When built with Control Flow Integrity, function prototypes between caller and function declaration must match. These mismatches are visible at compile time with the new -Wcast-function-type-strict in Clang[1]. There were 97 warnings produced by NFS. For example: fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict] [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The enc/dec callbacks were defined as passing "void *" as the second argument, but were being implicitly cast to a new type. Replace the argument with union nfsd4_op_u, and perform explicit member selection in the function body. There are no resulting binary differences. Changes were made mechanically using the following Coccinelle script, with minor by-hand fixes for members that didn't already match their existing argument name: @find@ identifier func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = (T) func, }; @already_void@ identifier find.func; identifier name; @@ func(..., -void +union nfsd4_op_u *name) { ... } @proto depends on !already_void@ identifier find.func; type T; identifier name; position p; @@ func@p(..., T name ) { ... } @script:python get_member@ type_name << proto.T; member; @@ coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0]) @convert@ identifier find.func; type proto.T; identifier proto.name; position proto.p; identifier get_member.member; @@ func@p(..., - T name + union nfsd4_op_u *u ) { + T name = &u->member; ... } @cast@ identifier find.func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = - (T) func, }; Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: Use only RQ_DROPME to signal the need to drop a replyChuck Lever
Clean up: NFSv2 has the only two usages of rpc_drop_reply in the NFSD code base. Since NFSv2 is going away at some point, replace these in order to simplify the "drop this reply?" check in nfsd_dispatch(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10NFSD: add CB_RECALL_ANY tracepointsDai Ngo
Add tracepoints to trace start and end of CB_RECALL_ANY operation. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> [ cel: added show_rca_mask() macro ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: add delegation reaper to react to low memory conditionDai Ngo
The delegation reaper is called by nfsd memory shrinker's on the 'count' callback. It scans the client list and sends the courtesy CB_RECALL_ANY to the clients that hold delegations. To avoid flooding the clients with CB_RECALL_ANY requests, the delegation reaper sends only one CB_RECALL_ANY request to each client per 5 seconds. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: add support for sending CB_RECALL_ANYDai Ngo
Add XDR encode and decode function for CB_RECALL_ANY. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: refactoring courtesy_client_reaper to a generic low memory shrinkerDai Ngo
Refactoring courtesy_client_reaper to generic low memory shrinker so it can be used for other purposes. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10trace: Relocate event helper filesChuck Lever
Steven Rostedt says: > The include/trace/events/ directory should only hold files that > are to create events, not headers that hold helper functions. > > Can you please move them out of include/trace/events/ as that > directory is "special" in the creation of events. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-12-10NFSD: pass range end to vfs_fsync_range() instead of countBrian Foster
_nfsd_copy_file_range() calls vfs_fsync_range() with an offset and count (bytes written), but the former wants the start and end bytes of the range to sync. Fix it up. Fixes: eac0b17a77fb ("NFSD add vfs_fsync after async copy is done") Signed-off-by: Brian Foster <bfoster@redhat.com> Tested-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10lockd: fix file selection in nlmsvc_cancel_blockedJeff Layton
We currently do a lock_to_openmode call based on the arguments from the NLM_UNLOCK call, but that will always set the fl_type of the lock to F_UNLCK, and the O_RDONLY descriptor is always chosen. Fix it to use the file_lock from the block instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10lockd: ensure we use the correct file descriptor when unlockingJeff Layton
Shared locks are set on O_RDONLY descriptors and exclusive locks are set on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once for each descriptor, but it doesn't reset fl_file. Ensure that it does. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10lockd: set missing fl_flags field when retrieving argsJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: Use struct_size() helper in alloc_session()Xiu Jianfeng
Use struct_size() helper to simplify the code, no functional changes. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10nfsd: return error if nfs4_setacl failsJeff Layton
With the addition of POSIX ACLs to struct nfsd_attrs, we no longer return an error if setting the ACL fails. Ensure we return the na_aclerr error on SETATTR if there is one. Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Neil Brown <neilb@suse.de> Reported-by: Yongcheng Yang <yoyang@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10lockd: set other missing fields when unlocking filesTrond Myklebust
vfs_lock_file() expects the struct file_lock to be fully initialised by the caller. Re-exported NFSv3 has been seen to Oops if the fl_file field is NULL. Fixes: aec158242b87 ("lockd: set fl_owner when unlocking files") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216582 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: Add an nfsd_file_fsync tracepointChuck Lever
Add a tracepoint to capture the number of filecache-triggered fsync calls and which files needed it. Also, record when an fsync triggers a write verifier reset. Examples: <...>-97 [007] 262.505611: nfsd_file_free: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 <...>-97 [007] 262.505612: nfsd_file_fsync: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0 <...>-97 [007] 262.505623: nfsd_file_free: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 <...>-97 [007] 262.505624: nfsd_file_fsync: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-12-10nfsd: fix up the filecache laundrette schedulingJeff Layton
We don't really care whether there are hashed entries when it comes to scheduling the laundrette. They might all be non-gc entries, after all. We only want to schedule it if there are entries on the LRU. Switch to using list_lru_count, and move the check into nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to count entries, since it only schedules the laundrette after adding an entry to the LRU. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10gfs2: Minor gfs2_try_evict cleanupAndreas Gruenbacher
In gfs2_try_evict(), when an inode can't be evicted, we are grabbing a temporary reference on the inode glock to poke that glock. That should be safe, but it's easier to just grab an inode reference as we already do earlier in this function. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-12-09udf: Fix extending file within last blockJan Kara
When extending file within last block it can happen that the extent is already rounded to the blocksize and thus contains the offset we want to grow up to. In such case we would mistakenly expand the last extent and make it one block longer than it should be, exposing unallocated block in a file and causing data corruption. Fix the problem by properly detecting this case and bailing out. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2022-12-09udf: Discard preallocation before extending file with a holeJan Kara
When extending file with a hole, we tried to preserve existing preallocation for the file. However that is not very useful and complicates code because the previous extent may need to be rounded to block boundary as well (which we forgot to do thus causing data corruption for sequence like: xfs_io -f -c "pwrite 0x75e63 11008" -c "truncate 0x7b24b" \ -c "truncate 0xabaa3" -c "pwrite 0xac70b 22954" \ -c "pwrite 0x93a43 11358" -c "pwrite 0xb8e65 52211" file with 512-byte block size. Just discard preallocation before extending file to simplify things and also fix this data corruption. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2022-12-09udf: Do not bother looking for prealloc extents if i_lenExtents matches i_sizeJan Kara
If rounded block-rounded i_lenExtents matches block rounded i_size, there are no preallocation extents. Do not bother walking extent linked list. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2022-12-09udf: Fix preallocation discarding at indirect extent boundaryJan Kara
When preallocation extent is the first one in the extent block, the code would corrupt extent tree header instead. Fix the problem and use udf_delete_aext() for deleting extent to avoid some code duplication. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2022-12-09ext4: fix reserved cluster accounting in __es_remove_extent()Ye Bin
When bigalloc is enabled, reserved cluster accounting for delayed allocation is handled in extent_status.c. With a corrupted file system, it's possible for this accounting to be incorrect, dsicovered by Syzbot: EXT4-fs error (device loop0): ext4_validate_block_bitmap:398: comm rep: bg 0: block 5: invalid block bitmap EXT4-fs (loop0): Delayed block allocation failed for inode 18 at logical offset 0 with max blocks 32 with error 28 EXT4-fs (loop0): This should not happen!! Data will be lost EXT4-fs (loop0): Total free blocks count 0 EXT4-fs (loop0): Free/Dirty block details EXT4-fs (loop0): free_blocks=0 EXT4-fs (loop0): dirty_blocks=32 EXT4-fs (loop0): Block reservation details EXT4-fs (loop0): i_reserved_data_blocks=2 EXT4-fs (loop0): Inode 18 (00000000845cd634): i_reserved_data_blocks (1) not cleared! Above issue happens as follows: Assume: sbi->s_cluster_ratio = 16 Step1: Insert delay block [0, 31] -> ei->i_reserved_data_blocks=2 Step2: ext4_writepages mpage_map_and_submit_extent -> return failed mpage_release_unused_pages -> to release [0, 30] ext4_es_remove_extent -> remove lblk=0 end=30 __es_remove_extent -> len1=0 len2=31-30=1 __es_remove_extent: ... if (len2 > 0) { ... if (len1 > 0) { ... } else { es->es_lblk = end + 1; es->es_len = len2; ... } if (count_reserved) count_rsvd(inode, lblk, ...); goto out; -> will return but didn't calculate 'reserved' ... Step3: ext4_destroy_inode -> trigger "i_reserved_data_blocks (1) not cleared!" To solve above issue if 'len2>0' call 'get_rsvd()' before goto out. Reported-by: syzbot+05a0f0ccab4a25626e38@syzkaller.appspotmail.com Fixes: 8fcc3a580651 ("ext4: rework reserved cluster accounting when invalidating pages") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20221208033426.1832460-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-12-09ext4: fix inode leak in ext4_xattr_inode_create() on an error pathYe Bin
There is issue as follows when do setxattr with inject fault: [localhost]# fsck.ext4 -fn /dev/sda e2fsck 1.46.6-rc1 (12-Sep-2022) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached zero-length inode 15. Clear? no Unattached inode 15 Connect to /lost+found? no Pass 5: Checking group summary information /dev/sda: ********** WARNING: Filesystem still has errors ********** /dev/sda: 15/655360 files (0.0% non-contiguous), 66755/2621440 blocks This occurs in 'ext4_xattr_inode_create()'. If 'ext4_mark_inode_dirty()' fails, dropping i_nlink of the inode is needed. Or will lead to inode leak. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221208023233.1231330-5-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-12-09ext4: allocate extended attribute value in vmalloc areaYe Bin
Now, extended attribute value maximum length is 64K. The memory requested here does not need continuous physical addresses, so it is appropriate to use kvmalloc to request memory. At the same time, it can also cope with the situation that the extended attribute will become longer in the future. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221208023233.1231330-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-12-08ext4: avoid unaccounted block allocation when expanding inodeJan Kara
When expanding inode space in ext4_expand_extra_isize_ea() we may need to allocate external xattr block. If quota is not initialized for the inode, the block allocation will not be accounted into quota usage. Make sure the quota is initialized before we try to expand inode space. Reported-by: Pengfei Xu <pengfei.xu@intel.com> Link: https://lore.kernel.org/all/Y5BT+k6xWqthZc1P@xpf.sh.intel.com Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20221207115937.26601-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-12-08ext4: initialize quota before expanding inode in setproject ioctlJan Kara
Make sure we initialize quotas before possibly expanding inode space (and thus maybe needing to allocate external xattr block) in ext4_ioctl_setproject(). This prevents not accounting the necessary block allocation. Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20221207115937.26601-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-12-08ext4: stop providing .writepage hookJan Kara
Now we don't need .writepage hook for anything anymore. Reclaim is fine with relying on .writepages to clean pages and we often couldn't do much from the .writepage callback anyway. We only need to provide .migrate_folio callback for the ext4_journalled_aops - let's use buffer_migrate_page_norefs() there so that buffers cannot be modified under jdb2's hands as that can cause data corruption. For example when commit code does writeout of transaction buffers in jbd2_journal_write_metadata_buffer(), we don't hold page lock or have page writeback bit set or have the buffer locked. So page migration code would go and happily migrate the page elsewhere while the copy is running thus corrupting data. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-12-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-12-08ext4: switch to using write_cache_pages() for data=journal writeoutJan Kara
Instead of using generic_writepages(), let's use write_cache_pages() for writeout of journalled data. It will allow us to stop providing .writepage callback. Our data=journal writeback path would benefit from a larger cleanup and refactoring but that's for a separate cleanup series. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-10-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-12-08jbd2: switch jbd2_submit_inode_data() to use fs-provided hook for data writeoutJan Kara
jbd2_submit_inode_data() hardcoded use of jbd2_journal_submit_inode_data_buffers() for submission of data pages. Make it use j_submit_inode_data_buffers hook instead. This effectively switches ext4 fastcommits to use ext4_writepages() for data writeout instead of generic_writepages(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-9-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-12-08ext4: switch to using ext4_do_writepages() for ordered data writeoutJan Kara
Use the standard writepages method (ext4_do_writepages()) to perform writeout of ordered data during journal commit. Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-8-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-12-08ext4: move percpu_rwsem protection into ext4_writepages()Jan Kara
Move protection by percpu_rwsem from ext4_do_writepages() to ext4_writepages(). We will not want to grab this protection during transaction commits as that would be prone to deadlocks and the protection is not needed. Move the shutdown state checking as well since we want to be able to complete commit while the shutdown is in progress. Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>