summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-04-25btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()Dominique Martinet
The previous patch that replaced BUG_ON by error handling forgot to unlock the mutex in the error path. Link: https://lore.kernel.org/all/Zh%2fHpAGFqa7YAFuM@duo.ucw.cz Reported-by: Pavel Machek <pavel@denx.de> Fixes: 7411055db5ce ("btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()") CC: stable@vger.kernel.org Reviewed-by: Pavel Machek <pavel@denx.de> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-25exfat: zero the reserved fields of file and stream extension dentriesYuezhang Mo
From exFAT specification, the reserved fields should initialize to zero and should not use for any purpose. If create a new dentry set in the UNUSED dentries, all fields had been zeroed when allocating cluster to parent directory. But if create a new dentry set in the DELETED dentries, the reserved fields in file and stream extension dentries may be non-zero. Because only the valid bit of the type field of the dentry is cleared in exfat_remove_entries(), if the type of dentry is different from the original(For example, a dentry that was originally a file name dentry, then set to deleted dentry, and then set as a file dentry), the reserved fields is non-zero. So this commit initializes the dentry to 0 before createing file dentry and stream extension dentry. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-04-25iomap: do some small logical cleanup in buffered writeZhang Yi
Since iomap_write_end() can never return a partial write length, the comparison between written, copied and bytes becomes useless, just merge them with the unwritten branch. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-10-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-25iomap: make iomap_write_end() return a booleanZhang Yi
For now, we can make sure iomap_write_end() always return 0 or copied bytes, so instead of return written bytes, convert to return a boolean to indicate the copied bytes have been written to the pagecache. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-9-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-25iomap: use a new variable to handle the written bytes in iomap_write_iter()Zhang Yi
In iomap_write_iter(), the status variable used to receive the return value from iomap_write_end() is confusing, replace it with a new written variable to represent the written bytes in each cycle, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-8-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-25iomap: don't increase i_size if it's not a write operationZhang Yi
Increase i_size in iomap_zero_range() and iomap_unshare_iter() is not needed, the caller should handle it. Especially, when truncate partial block, we should not increase i_size beyond the new EOF here. It doesn't affect xfs and gfs2 now because they set the new file size after zero out, it doesn't matter that a transient increase in i_size, but it will affect ext4 because it set file size before truncate. So move the i_size updating logic to iomap_write_iter(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-7-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-25iomap: drop the write failure handles when unsharing and zeroingZhang Yi
Unsharing and zeroing can only happen within EOF, so there is never a need to perform posteof pagecache truncation if write begin fails, also partial write could never theoretically happened from iomap_write_end(), so remove both of them. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-6-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-25erofs_buf: store address_space instead of inodeAl Viro
... seeing that ->i_mapping is the only thing we want from the inode. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-24mm: support page_mapcount() on page_has_type() pagesMatthew Wilcox (Oracle)
Return 0 for pages which can't be mapped. This matches how page_mapped() works. It is more convenient for users to not have to filter out these pages. Link: https://lkml.kernel.org/r/20240321142448.1645400-5-willy@infradead.org Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-24fs: ecryptfs: replace deprecated strncpy with strscpyJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. A good alternative is strscpy() as it guarantees NUL-termination on the destination buffer. In crypto.c: We expect cipher_name to be NUL-terminated based on its use with the C-string format specifier %s and with other string apis like strlen(): | printk(KERN_ERR "Error attempting to initialize key TFM " | "cipher with name = [%s]; rc = [%d]\n", | tmp_tfm->cipher_name, rc); and | int cipher_name_len = strlen(cipher_name); In main.c: We can remove the manual NUL-byte assignments as well as the pointers to destinations (which I assume only existed to trim down on line length?) in favor of directly using the destination buffer which allows the compiler to get size information -- enabling the usage of the new 2-argument strscpy(). Note that this patch relies on the _new_ 2-argument versions of strscpy() and strscpy_pad() introduced in Commit e6584c3964f2f ("string: Allow 2-argument strscpy()"). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: <linux-hardening@vger.kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240321-strncpy-fs-ecryptfs-crypto-c-v1-1-d78b74c214ac@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24hfsplus: refactor copy_name to not use strncpyJustin Stitt
strncpy() is deprecated with NUL-terminated destination strings [1]. The copy_name() method does a lot of manual buffer manipulation to eventually arrive with its desired string. If we don't know the namespace this attr has or belongs to we want to prepend "osx." to our final string. Following this, we're copying xattr_name and doing a bizarre manual NUL-byte assignment with a memset where n=1. Really, we can use some more obvious string APIs to acomplish this, improving readability and security. Following the same control flow as before: if we don't know the namespace let's use scnprintf() to form our prefix + xattr_name pairing (while NUL-terminating too!). Otherwise, use strscpy() to return the number of bytes copied into our buffer. Additionally, for non-empty strings, include the NUL-byte in the length -- matching the behavior of the previous implementation. Note that strscpy() _can_ return -E2BIG but this is already handled by all callsites: In both hfsplus_listxattr_finder_info() and hfsplus_listxattr(), ret is already type ssize_t so we can change the return type of copy_name() to match (understanding that scnprintf()'s return type is different yet fully representable by ssize_t). Furthermore, listxattr() in fs/xattr.c is well-equipped to handle a potential -E2BIG return result from vfs_listxattr(): | ssize_t error; ... | error = vfs_listxattr(d, klist, size); | if (error > 0) { | if (size && copy_to_user(list, klist, error)) | error = -EFAULT; | } else if (error == -ERANGE && size >= XATTR_LIST_MAX) { | /* The file system tried to returned a list bigger | than XATTR_LIST_MAX bytes. Not possible. */ | error = -E2BIG; | } ... the error can potentially already be -E2BIG, skipping this else-if and ending up at the same state as other errors. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html Link: https://github.com/KSPP/linux/issues/90 Cc: <linux-hardening@vger.kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240401-strncpy-fs-hfsplus-xattr-c-v2-1-6e089999355e@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24reiserfs: replace deprecated strncpy with scnprintfJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. Our goal here is to get @namebuf populated with @name's contents but surrounded with quotes. There is some careful handling done to ensure we properly truncate @name so that we have room for a literal quote as well as a NUL-term. All this careful handling can be done with scnprintf using the dynamic string width specifier %.*s which allows us to pass in the max size for a source string. Doing this, we can put literal quotes in our format specifier and ensure @name is truncated to fit inbetween these quotes (-3 is from 2 quotes + 1 NUL-byte). All in all, we get to remove a deprecated use of strncpy and clean up this code nicely! Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: <linux-hardening@vger.kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240328-strncpy-fs-reiserfs-item_ops-c-v1-1-2dab6d22a996@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24binfmt_elf_fdpic: fix /proc/<pid>/auxvMax Filippov
Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are empty because there's no code that initializes mm->saved_auxv in the FDPIC ELF loader. Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry aux vector copying to userspace with initialization of mm->saved_auxv first and then copying it to userspace as a whole. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Link: https://lore.kernel.org/r/20240322195418.2160164-1-jcmvbkbc@gmail.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24binfmt_elf: Leave a gap between .bss and brkKees Cook
Currently the brk starts its randomization immediately after .bss, which means there is a chance that when the random offset is 0, linear overflows from .bss can reach into the brk area. Leave at least a single page gap between .bss and brk (when it has not already been explicitly relocated into the mmap range). Reported-by: <y0un9n132@gmail.com> Closes: https://lore.kernel.org/linux-hardening/CA+2EKTVLvc8hDZc+2Yhwmus=dzOUG5E4gV7ayCbu0MPJTZzWkw@mail.gmail.com/ Link: https://lore.kernel.org/r/20240217062545.1631668-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24gfs2: Remove and replace gfs2_glock_queue_workAndreas Gruenbacher
There are no more callers of gfs2_glock_queue_work() left, so remove that helper. With that, we can now rename __gfs2_glock_queue_work() back to gfs2_glock_queue_work() to get rid of some unnecessary clutter. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24gfs2: do_xmote fixesAndreas Gruenbacher
Function do_xmote() is called with the glock spinlock held. Commit 86934198eefa added a 'goto skip_inval' statement at the beginning of the function to further below where the glock spinlock is expected not to be held anymore. Then it added code there that requires the glock spinlock to be held. This doesn't make sense; fix this up by dropping and retaking the spinlock where needed. In addition, when ->lm_lock() returned an error, do_xmote() didn't fail the locking operation, and simply left the glock hanging; fix that as well. (This is a much older error.) Fixes: 86934198eefa ("gfs2: Clear flags when withdraw prevents xmote") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24gfs2: finish_xmote cleanupAndreas Gruenbacher
Currently, function finish_xmote() takes and releases the glock spinlock. However, all of its callers immediately take that spinlock again, so it makes more sense to take the spin lock before calling finish_xmote() already. With that, thaw_glock() is the only place that sets the GLF_HAVE_REPLY flag outside of the glock spinlock, but it also takes that spinlock immediately thereafter. Change that to set the bit when the spinlock is already held. This allows to switch from test_and_clear_bit() to test_bit() and clear_bit() in glock_work_func(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24gfs2: Unlock fewer glocks on unmountAndreas Gruenbacher
At unmount time, we would generally like to explicitly unlock as few glocks as possible for efficiency. We are already skipping glocks that don't have a lock value block (LVB), but we can also skip glocks which are not held in DLM_LOCK_EX or DLM_LOCK_PW mode (of which gfs2 only uses DLM_LOCK_EX under the name LM_ST_EXCLUSIVE). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: David Teigland <teigland@redhat.com>
2024-04-24gfs2: Fix potential glock use-after-free on unmountAndreas Gruenbacher
When a DLM lockspace is released and there ares still locks in that lockspace, DLM will unlock those locks automatically. Commit fb6791d100d1b started exploiting this behavior to speed up filesystem unmount: gfs2 would simply free glocks it didn't want to unlock and then release the lockspace. This didn't take the bast callbacks for asynchronous lock contention notifications into account, which remain active until until a lock is unlocked or its lockspace is released. To prevent those callbacks from accessing deallocated objects, put the glocks that should not be unlocked on the sd_dead_glocks list, release the lockspace, and only then free those glocks. As an additional measure, ignore unexpected ast and bast callbacks if the receiving glock is dead. Fixes: fb6791d100d1b ("GFS2: skip dlm_unlock calls in unmount") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: David Teigland <teigland@redhat.com>
2024-04-24gfs2: Remove ill-placed consistency checkAndreas Gruenbacher
This consistency check was originally added by commit 9287c6452d2b1 ("gfs2: Fix occasional glock use-after-free"). It is ill-placed in gfs2_glock_free() because if it holds there, it must equally hold in __gfs2_glock_put() already. Either way, the check doesn't seem necessary anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24gfs2: Fix lru_count accountingAndreas Gruenbacher
Currently, gfs2_scan_glock_lru() decrements lru_count when a glock is moved onto the dispose list. When such a glock is then stolen from the dispose list while gfs2_dispose_glock_lru() doesn't hold the lru_lock, lru_count will be decremented again, so the counter will eventually go negative. This bug has existed in one form or another since at least commit 97cc1025b1a91 ("GFS2: Kill two daemons with one patch"). Fix this by only decrementing lru_count when we actually remove a glock and schedule for it to be unlocked and dropped. We also don't need to remove and then re-add glocks when we can just as well move them back onto the lru_list when necessary. In addition, return the number of glocks freed as we should, not the number of glocks moved onto the dispose list. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24Merge tag 'for-6.9-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix information leak by the buffer returned from LOGICAL_INO ioctl - fix flipped condition in scrub when tracking sectors in zoned mode - fix calculation when dropping extent range - reinstate fallback to write uncompressed data in case of fragmented space that could not store the entire compressed chunk - minor fix to message formatting style to make it conforming to the commonly used style * tag 'for-6.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range() btrfs: fix information leak in btrfs_ioctl_logical_to_ino() btrfs: fallback if compressed IO fails for ENOSPC btrfs: scrub: run relocation repair when/only needed btrfs: remove colon from messages with state
2024-04-24xfs: don't call xfs_file_open from xfs_dir_openChristoph Hellwig
Directories do not support direct I/O and thus no non-blocking direct I/O either. Open code the shutdown check and call to generic_file_open instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240423124608.537794-4-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-24xfs: drop fop_flags for directoriesChristoph Hellwig
Directories have non of the capabilities, so drop the flags. Note that the current state is harmless as no one actually checks for the flags either. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240423124608.537794-3-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-24xfs: fix overly long line in the file_operationsChristoph Hellwig
Re-wrap the newly added fop_flags fields to not go over 80 characters. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240423124608.537794-2-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-24sysctl: drop now unnecessary out-of-bounds checkThomas Weißschuh
Remove the now unneeded check for ctl_table_size; it is safe to do so as sysctl_set_perm_empty_ctl_header() does not access the ctl_table member anymore. This also makes the element of sysctl_mount_point unnecessary, so drop it at the same time. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-04-24sysctl: move sysctl type to ctl_table_headerThomas Weißschuh
Move the SYSCTL_TABLE_TYPE_{DEFAULT,PERMANENTLY_EMPTY} enums from ctl_table to ctl_table_header. Removing the mutable member is necessary to constify static instances of struct ctl_table. Move the initialization of the sysctl_mount_point type into init_header() where all the other header fields are also initialized. As a side-effect the memory usage of the sysctl core is reduced. Each ctl_table_header instance can manage multiple ctl_table instances and is only allocated when the table is actually registered. This saves 8 bytes of memory per ctl_table on 64bit, 4 due to the enum field itself and 4 due to padding. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-04-24sysctl: drop sysctl_is_perm_empty_ctl_tableThomas Weißschuh
It is used only twice and those callers are simpler with sysctl_is_perm_empty_ctl_header(). So use this sibling function. This is part of an effort to constify definition of struct ctl_table. For this effort the mutable member 'type' is moved from struct ctl_table to struct ctl_table_header. Unifying the macros sysctl_is_perm_empty_ctl_* makes this easier. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-04-24sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)Thomas Weißschuh
Remove the 'table' argument from set_ownership as it is never used. This change is a step towards putting "struct ctl_table" into .rodata and eventually having sysctl core only use "const struct ctl_table". The patch was created with the following coccinelle script: @@ identifier func, head, table, uid, gid; @@ void func( struct ctl_table_header *head, - struct ctl_table *table, kuid_t *uid, kgid_t *gid) { ... } No additional occurrences of 'set_ownership' were found after doing a tree-wide search. Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-04-24xfs: Remove unused function xrep_dir_self_parentJiapeng Chong
The function are defined in the dir_repair.c file, but not called elsewhere, so delete the unused function. fs/xfs/scrub/dir_repair.c:186:1: warning: unused function 'xrep_dir_self_parent'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8867 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-24smb: client: Fix struct_group() usage in __packed structsGustavo A. R. Silva
Use struct_group_attr() in __packed structs, instead of struct_group(). Below you can see the pahole output before/after changes: pahole -C smb2_file_network_open_info fs/smb/client/smb2ops.o struct smb2_file_network_open_info { union { struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le64 AllocationSize; /* 32 8 */ __le64 EndOfFile; /* 40 8 */ __le32 Attributes; /* 48 4 */ }; /* 0 56 */ struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le64 AllocationSize; /* 32 8 */ __le64 EndOfFile; /* 40 8 */ __le32 Attributes; /* 48 4 */ } network_open_info; /* 0 56 */ }; /* 0 56 */ __le32 Reserved; /* 56 4 */ /* size: 60, cachelines: 1, members: 2 */ /* last cacheline: 60 bytes */ } __attribute__((__packed__)); pahole -C smb2_file_network_open_info fs/smb/client/smb2ops.o struct smb2_file_network_open_info { union { struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le64 AllocationSize; /* 32 8 */ __le64 EndOfFile; /* 40 8 */ __le32 Attributes; /* 48 4 */ } __attribute__((__packed__)); /* 0 52 */ struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le64 AllocationSize; /* 32 8 */ __le64 EndOfFile; /* 40 8 */ __le32 Attributes; /* 48 4 */ } __attribute__((__packed__)) network_open_info; /* 0 52 */ }; /* 0 52 */ __le32 Reserved; /* 52 4 */ /* size: 56, cachelines: 1, members: 2 */ /* last cacheline: 56 bytes */ }; pahole -C smb_com_open_rsp fs/smb/client/cifssmb.o struct smb_com_open_rsp { ... union { struct { __le64 CreationTime; /* 48 8 */ __le64 LastAccessTime; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ __le64 LastWriteTime; /* 64 8 */ __le64 ChangeTime; /* 72 8 */ __le32 FileAttributes; /* 80 4 */ }; /* 48 40 */ struct { __le64 CreationTime; /* 48 8 */ __le64 LastAccessTime; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ __le64 LastWriteTime; /* 64 8 */ __le64 ChangeTime; /* 72 8 */ __le32 FileAttributes; /* 80 4 */ } common_attributes; /* 48 40 */ }; /* 48 40 */ ... /* size: 111, cachelines: 2, members: 14 */ /* last cacheline: 47 bytes */ } __attribute__((__packed__)); pahole -C smb_com_open_rsp fs/smb/client/cifssmb.o struct smb_com_open_rsp { ... union { struct { __le64 CreationTime; /* 48 8 */ __le64 LastAccessTime; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ __le64 LastWriteTime; /* 64 8 */ __le64 ChangeTime; /* 72 8 */ __le32 FileAttributes; /* 80 4 */ } __attribute__((__packed__)); /* 48 36 */ struct { __le64 CreationTime; /* 48 8 */ __le64 LastAccessTime; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ __le64 LastWriteTime; /* 64 8 */ __le64 ChangeTime; /* 72 8 */ __le32 FileAttributes; /* 80 4 */ } __attribute__((__packed__)) common_attributes; /* 48 36 */ }; /* 48 36 */ ... /* size: 107, cachelines: 2, members: 14 */ /* last cacheline: 43 bytes */ } __attribute__((__packed__)); pahole -C FILE_ALL_INFO fs/smb/client/cifssmb.o typedef struct { union { struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le32 Attributes; /* 32 4 */ }; /* 0 40 */ struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le32 Attributes; /* 32 4 */ } common_attributes; /* 0 40 */ }; /* 0 40 */ ... /* size: 113, cachelines: 2, members: 17 */ /* last cacheline: 49 bytes */ } __attribute__((__packed__)) FILE_ALL_INFO; pahole -C FILE_ALL_INFO fs/smb/client/cifssmb.o typedef struct { union { struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le32 Attributes; /* 32 4 */ } __attribute__((__packed__)); /* 0 36 */ struct { __le64 CreationTime; /* 0 8 */ __le64 LastAccessTime; /* 8 8 */ __le64 LastWriteTime; /* 16 8 */ __le64 ChangeTime; /* 24 8 */ __le32 Attributes; /* 32 4 */ } __attribute__((__packed__)) common_attributes; /* 0 36 */ }; /* 0 36 */ ... /* size: 109, cachelines: 2, members: 17 */ /* last cacheline: 45 bytes */ } __attribute__((__packed__)) FILE_ALL_INFO; Fixes: 0015eb6e1238 ("smb: client, common: fix fortify warnings") Cc: stable@vger.kernel.org Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-23Revert "NFSD: Convert the callback workqueue to use delayed_work"Chuck Lever
This commit was a pre-requisite for commit c1ccfcf1a9bf ("NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"), which has already been reverted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-23Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"Chuck Lever
The reverted commit attempted to enable NFSD to retransmit pending callback operations if an NFS client disconnects, but unintentionally introduces a hazardous behavior regression if the client becomes permanently unreachable while callback operations are still pending. A disconnect can occur due to network partition or if the NFS server needs to force the NFS client to retransmit (for example, if a GSS window under-run occurs). Reverting the commit will make NFSD behave the same as it did in v6.8 and before. Pending callback operations are permanently lost if the client connection is terminated before the client receives them. For some callback operations, this loss is not harmful. However, for CB_RECALL, the loss means a delegation might be revoked unnecessarily. For CB_OFFLOAD, pending COPY operations will never complete unless the NFS client subsequently sends an OFFLOAD_STATUS operation, which the Linux NFS client does not currently implement. These issues still need to be addressed somehow. Reported-by: Dai Ngo <dai.ngo@oracle.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-23xfs: invalidate dentries for a file before moving it to the orphanageDarrick J. Wong
Invalidate the cached dentries that point to the file that we're moving to lost+found before we actually move it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: exchange-range for repairs is no longer dynamicDarrick J. Wong
The atomic file exchange-range functionality is now a permanent filesystem feature instead of a dynamic log-incompat feature. It cannot be turned on at runtime, so we no longer need the XCHK_FSGATES flags and whatnot that supported it. Remove the flag and the enable function, and move the xfs_has_exchange_range checks to the start of the repair functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: fix iunlock calls in xrep_adoption_trans_allocDarrick J. Wong
If the transaction allocation in xrep_adoption_trans_alloc fails, we should drop only the locks that we took. In this case this is ILOCK_EXCL of both the orphanage and the file being repaired. Dropping any IOLOCK here is incorrect. Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: drop the scrub file's iolock when transaction allocation failsDarrick J. Wong
If the transaction allocation in the !orphanage_available case of xrep_nlinks_repair_inode fails, we need to drop the IOLOCK of the file being scrubbed before exiting. Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: only iget the file once when doing vectored scrub-by-handleDarrick J. Wong
If a program wants us to perform a scrub on a file handle and the fd passed to ioctl() is not the file referenced in the handle, iget the file once and pass it into the scrub code. This amortizes the untrusted iget lookup over /all/ the scrubbers mentioned in the scrubv call. When running fstests in "rebuild all metadata after each test" mode, I observed a 10% reduction in runtime on account of avoiding repeated inobt lookups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: use dontcache for grabbing inodes during scrubDarrick J. Wong
Back when I wrote commit a03297a0ca9f2, I had thought that we'd be doing users a favor by only marking inodes dontcache at the end of a scrub operation, and only if there's only one reference to that inode. This was more or less true back when I_DONTCACHE was an XFS iflag and the only thing it did was change the outcome of xfs_fs_drop_inode to 1. Note: If there are dentries pointing to the inode when scrub finishes, the inode will have positive i_count and stay around in cache until dentry reclaim. But now we have d_mark_dontcache, which cause the inode *and* the dentries attached to it all to be marked I_DONTCACHE, which means that we drop the dentries ASAP, which drops the inode ASAP. This is bad if scrub found problems with the inode, because now they can be scheduled for inactivation, which can cause inodegc to trip on it and shut down the filesystem. Even if the inode isn't bad, this is still suboptimal because phases 3-7 each initiate inode scans. Dropping the inode immediately during phase 3 is silly because phase 5 will reload it and drop it immediately, etc. It's fine to mark the inodes dontcache, but if there have been accesses to the file that set up dentries, we should keep them. I validated this by setting up ftrace to capture xfs_iget_recycle* tracepoints and ran xfs/285 for 30 seconds. With current djwong-wtf I saw ~30,000 recycle events. I then dropped the d_mark_dontcache calls and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per 30 seconds. Therefore, grab the inode with XFS_IGET_DONTCACHE, which only has the effect of setting I_DONTCACHE for cache misses. Remove the d_mark_dontcache call that can happen in xchk_irele. Fixes: a03297a0ca9f2 ("xfs: manage inode DONTCACHE status at irele time") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: introduce vectored scrub modeDarrick J. Wong
Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored mode. The caller specifies the principal metadata object that they want to scrub (allocation group, inode, etc.) once, followed by an array of scrub types they want called on that object. The kernel runs the scrub operations and writes the output flags and errno code to the corresponding array element. A new pseudo scrub type BARRIER is introduced to force the kernel to return to userspace if any corruptions have been found when scrubbing the previous scrub types in the array. This enables userspace to schedule, for example, the sequence: 1. data fork 2. barrier 3. directory If the data fork scrub is clean, then the kernel will perform the directory scrub. If not, the barrier in 2 will exit back to userspace. The alternative would have been an interface where userspace passes a pointer to an empty buffer, and the kernel formats that with xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome was. With that the kernel would have to communicate that the buffer needed to have been at least X size, even though for our cases XFS_SCRUB_TYPE_NR + 2 would always be enough. Compared to that, this design keeps all the dependency policy and ordering logic in userspace where it already resides instead of duplicating it in the kernel. The downside of that is that it needs the barrier logic. When running fstests in "rebuild all metadata after each test" mode, I observed a 10% reduction in runtime due to fewer transitions across the system call boundary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: move xfs_ioc_scrub_metadata to scrub.cDarrick J. Wong
Move the scrub ioctl handler to scrub.c to keep the code together and to reduce unnecessary code when CONFIG_XFS_ONLINE_SCRUB=n. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: reduce the rate of cond_resched calls inside scrubDarrick J. Wong
We really don't want to call cond_resched every single time we go through a loop in scrub -- there may be billions of records, and probing into the scheduler itself has overhead. Reduce this overhead by only calling cond_resched 10x per second; and add a counter so that we only check jiffies once every 1000 records or so. Surprisingly, this reduces scrub-only fstests runtime by about 2%. I used the bmapinflate xfs_db command to produce a billion-extent file and this stupid gadget reduced the scrub runtime by about 4%. From a stupid microbenchmark of calling these things 1 billion times, I estimate that cond_resched costs about 5.5ns per call; jiffes costs about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per call. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: fix corruptions in the directory treeDarrick J. Wong
Repair corruptions in the directory tree itself. Cycles are broken by removing an incoming parent->child link. Multiply-owned directories are fixed by pruning the extra parent -> child links Disconnected subtrees are reconnected to the lost and found. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: report directory tree corruption in the health informationDarrick J. Wong
Report directories that are the source of corruption in the directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: invalidate dirloop scrub path data when concurrent updates happenDarrick J. Wong
Add a dirent update hook so that we can detect directory tree updates that affect any of the paths found by this scrubber and force it to rescan. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: teach online scrub to find directory tree structure problemsDarrick J. Wong
Create a new scrubber that detects corruptions within the directory tree structure itself. It can detect directories with multiple parents; loops within the directory tree; and directory loops not accessible from the root. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: inode repair should ensure there's an attr fork to store parent pointersDarrick J. Wong
The runtime parent pointer update code expects that any file being moved around the directory tree already has an attr fork. However, if we had to rebuild an inode core record, there's a chance that we zeroed forkoff as part of the inode to pass the iget verifiers. Therefore, if we performed any repairs on an inode core, ensure that the inode has a nonzero forkoff before unlocking the inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: repair link count of nondirectories after rebuilding parent pointersDarrick J. Wong
Since the parent pointer scrubber does not exhaustively search the filesystem for missing parent pointers, it doesn't have a good way to determine that there are pointers missing from an otherwise uncorrupt xattr structure. Instead, for nondirectories it employs a heuristic of comparing the file link count to the number of parent pointers found. However, we don't want this heuristic flagging a false corruption after a repair has actually scanned the entire filesystem to rebuild the parent pointers. Therefore, reset the file link count in this one case because we actually know the correct link count. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: adapt the orphanage code to handle parent pointersDarrick J. Wong
Adapt the orphanage's adoption code to update the child file's parent pointers as part of the reparenting process. Also ensure that the child has an attr fork to receive the parent pointer update, since the runtime code assumes one exists. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: actually rebuild the parent pointer xattrsDarrick J. Wong
Once we've assembled all the parent pointers for a file, we need to commit the new dataset atomically to that file. Parent pointer records are embedded in the xattr structure, which means that we must write a new extended attribute structure, again, atomically. Therefore, we must copy the non-parent-pointer attributes from the file being repaired into the temporary file's extended attributes and then call the atomic extent swap mechanism to exchange the blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>