Age | Commit message (Collapse) | Author |
|
The previous patch that replaced BUG_ON by error handling forgot to
unlock the mutex in the error path.
Link: https://lore.kernel.org/all/Zh%2fHpAGFqa7YAFuM@duo.ucw.cz
Reported-by: Pavel Machek <pavel@denx.de>
Fixes: 7411055db5ce ("btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()")
CC: stable@vger.kernel.org
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
From exFAT specification, the reserved fields should initialize
to zero and should not use for any purpose.
If create a new dentry set in the UNUSED dentries, all fields
had been zeroed when allocating cluster to parent directory.
But if create a new dentry set in the DELETED dentries, the
reserved fields in file and stream extension dentries may be
non-zero. Because only the valid bit of the type field of the
dentry is cleared in exfat_remove_entries(), if the type of
dentry is different from the original(For example, a dentry that
was originally a file name dentry, then set to deleted dentry,
and then set as a file dentry), the reserved fields is non-zero.
So this commit initializes the dentry to 0 before createing file
dentry and stream extension dentry.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Since iomap_write_end() can never return a partial write length, the
comparison between written, copied and bytes becomes useless, just
merge them with the unwritten branch.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-10-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
For now, we can make sure iomap_write_end() always return 0 or copied
bytes, so instead of return written bytes, convert to return a boolean
to indicate the copied bytes have been written to the pagecache.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-9-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In iomap_write_iter(), the status variable used to receive the return
value from iomap_write_end() is confusing, replace it with a new written
variable to represent the written bytes in each cycle, no logic changes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-8-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Increase i_size in iomap_zero_range() and iomap_unshare_iter() is not
needed, the caller should handle it. Especially, when truncate partial
block, we should not increase i_size beyond the new EOF here. It doesn't
affect xfs and gfs2 now because they set the new file size after zero
out, it doesn't matter that a transient increase in i_size, but it will
affect ext4 because it set file size before truncate. So move the i_size
updating logic to iomap_write_iter().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-7-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Unsharing and zeroing can only happen within EOF, so there is never a
need to perform posteof pagecache truncation if write begin fails, also
partial write could never theoretically happened from iomap_write_end(),
so remove both of them.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20240320110548.2200662-6-yi.zhang@huaweicloud.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
... seeing that ->i_mapping is the only thing we want from the inode.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Return 0 for pages which can't be mapped. This matches how page_mapped()
works. It is more convenient for users to not have to filter out these
pages.
Link: https://lkml.kernel.org/r/20240321142448.1645400-5-willy@infradead.org
Fixes: 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces. A good alternative is strscpy() as it guarantees
NUL-termination on the destination buffer.
In crypto.c:
We expect cipher_name to be NUL-terminated based on its use with
the C-string format specifier %s and with other string apis like
strlen():
| printk(KERN_ERR "Error attempting to initialize key TFM "
| "cipher with name = [%s]; rc = [%d]\n",
| tmp_tfm->cipher_name, rc);
and
| int cipher_name_len = strlen(cipher_name);
In main.c:
We can remove the manual NUL-byte assignments as well as the pointers to
destinations (which I assume only existed to trim down on line length?)
in favor of directly using the destination buffer which allows the
compiler to get size information -- enabling the usage of the new
2-argument strscpy().
Note that this patch relies on the _new_ 2-argument versions of
strscpy() and strscpy_pad() introduced in Commit e6584c3964f2f ("string:
Allow 2-argument strscpy()").
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: <linux-hardening@vger.kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240321-strncpy-fs-ecryptfs-crypto-c-v1-1-d78b74c214ac@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
strncpy() is deprecated with NUL-terminated destination strings [1].
The copy_name() method does a lot of manual buffer manipulation to
eventually arrive with its desired string. If we don't know the
namespace this attr has or belongs to we want to prepend "osx." to our
final string. Following this, we're copying xattr_name and doing a
bizarre manual NUL-byte assignment with a memset where n=1.
Really, we can use some more obvious string APIs to acomplish this,
improving readability and security. Following the same control flow as
before: if we don't know the namespace let's use scnprintf() to form our
prefix + xattr_name pairing (while NUL-terminating too!). Otherwise, use
strscpy() to return the number of bytes copied into our buffer.
Additionally, for non-empty strings, include the NUL-byte in the length
-- matching the behavior of the previous implementation.
Note that strscpy() _can_ return -E2BIG but this is already handled by
all callsites:
In both hfsplus_listxattr_finder_info() and hfsplus_listxattr(), ret is
already type ssize_t so we can change the return type of copy_name() to
match (understanding that scnprintf()'s return type is different yet
fully representable by ssize_t). Furthermore, listxattr() in fs/xattr.c
is well-equipped to handle a potential -E2BIG return result from
vfs_listxattr():
| ssize_t error;
...
| error = vfs_listxattr(d, klist, size);
| if (error > 0) {
| if (size && copy_to_user(list, klist, error))
| error = -EFAULT;
| } else if (error == -ERANGE && size >= XATTR_LIST_MAX) {
| /* The file system tried to returned a list bigger
| than XATTR_LIST_MAX bytes. Not possible. */
| error = -E2BIG;
| }
... the error can potentially already be -E2BIG, skipping this else-if
and ending up at the same state as other errors.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Cc: <linux-hardening@vger.kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240401-strncpy-fs-hfsplus-xattr-c-v2-1-6e089999355e@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
Our goal here is to get @namebuf populated with @name's contents but
surrounded with quotes. There is some careful handling done to ensure we
properly truncate @name so that we have room for a literal quote as well
as a NUL-term. All this careful handling can be done with scnprintf
using the dynamic string width specifier %.*s which allows us to pass in
the max size for a source string. Doing this, we can put literal quotes
in our format specifier and ensure @name is truncated to fit inbetween
these quotes (-3 is from 2 quotes + 1 NUL-byte).
All in all, we get to remove a deprecated use of strncpy and clean up
this code nicely!
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: <linux-hardening@vger.kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240328-strncpy-fs-reiserfs-item_ops-c-v1-1-2dab6d22a996@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
empty because there's no code that initializes mm->saved_auxv in the
FDPIC ELF loader.
Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
aux vector copying to userspace with initialization of mm->saved_auxv
first and then copying it to userspace as a whole.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Link: https://lore.kernel.org/r/20240322195418.2160164-1-jcmvbkbc@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Currently the brk starts its randomization immediately after .bss,
which means there is a chance that when the random offset is 0, linear
overflows from .bss can reach into the brk area. Leave at least a single
page gap between .bss and brk (when it has not already been explicitly
relocated into the mmap range).
Reported-by: <y0un9n132@gmail.com>
Closes: https://lore.kernel.org/linux-hardening/CA+2EKTVLvc8hDZc+2Yhwmus=dzOUG5E4gV7ayCbu0MPJTZzWkw@mail.gmail.com/
Link: https://lore.kernel.org/r/20240217062545.1631668-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
There are no more callers of gfs2_glock_queue_work() left, so remove
that helper. With that, we can now rename __gfs2_glock_queue_work()
back to gfs2_glock_queue_work() to get rid of some unnecessary clutter.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function do_xmote() is called with the glock spinlock held. Commit
86934198eefa added a 'goto skip_inval' statement at the beginning of the
function to further below where the glock spinlock is expected not to be
held anymore. Then it added code there that requires the glock spinlock
to be held. This doesn't make sense; fix this up by dropping and
retaking the spinlock where needed.
In addition, when ->lm_lock() returned an error, do_xmote() didn't fail
the locking operation, and simply left the glock hanging; fix that as
well. (This is a much older error.)
Fixes: 86934198eefa ("gfs2: Clear flags when withdraw prevents xmote")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Currently, function finish_xmote() takes and releases the glock
spinlock. However, all of its callers immediately take that spinlock
again, so it makes more sense to take the spin lock before calling
finish_xmote() already.
With that, thaw_glock() is the only place that sets the GLF_HAVE_REPLY
flag outside of the glock spinlock, but it also takes that spinlock
immediately thereafter. Change that to set the bit when the spinlock is
already held. This allows to switch from test_and_clear_bit() to
test_bit() and clear_bit() in glock_work_func().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
At unmount time, we would generally like to explicitly unlock as few
glocks as possible for efficiency. We are already skipping glocks that
don't have a lock value block (LVB), but we can also skip glocks which
are not held in DLM_LOCK_EX or DLM_LOCK_PW mode (of which gfs2 only uses
DLM_LOCK_EX under the name LM_ST_EXCLUSIVE).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Teigland <teigland@redhat.com>
|
|
When a DLM lockspace is released and there ares still locks in that
lockspace, DLM will unlock those locks automatically. Commit
fb6791d100d1b started exploiting this behavior to speed up filesystem
unmount: gfs2 would simply free glocks it didn't want to unlock and then
release the lockspace. This didn't take the bast callbacks for
asynchronous lock contention notifications into account, which remain
active until until a lock is unlocked or its lockspace is released.
To prevent those callbacks from accessing deallocated objects, put the
glocks that should not be unlocked on the sd_dead_glocks list, release
the lockspace, and only then free those glocks.
As an additional measure, ignore unexpected ast and bast callbacks if
the receiving glock is dead.
Fixes: fb6791d100d1b ("GFS2: skip dlm_unlock calls in unmount")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Teigland <teigland@redhat.com>
|
|
This consistency check was originally added by commit 9287c6452d2b1
("gfs2: Fix occasional glock use-after-free"). It is ill-placed in
gfs2_glock_free() because if it holds there, it must equally hold in
__gfs2_glock_put() already. Either way, the check doesn't seem
necessary anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Currently, gfs2_scan_glock_lru() decrements lru_count when a glock is
moved onto the dispose list. When such a glock is then stolen from the
dispose list while gfs2_dispose_glock_lru() doesn't hold the lru_lock,
lru_count will be decremented again, so the counter will eventually go
negative.
This bug has existed in one form or another since at least commit
97cc1025b1a91 ("GFS2: Kill two daemons with one patch").
Fix this by only decrementing lru_count when we actually remove a glock
and schedule for it to be unlocked and dropped. We also don't need to
remove and then re-add glocks when we can just as well move them back
onto the lru_list when necessary.
In addition, return the number of glocks freed as we should, not the
number of glocks moved onto the dispose list.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix information leak by the buffer returned from LOGICAL_INO ioctl
- fix flipped condition in scrub when tracking sectors in zoned mode
- fix calculation when dropping extent range
- reinstate fallback to write uncompressed data in case of fragmented
space that could not store the entire compressed chunk
- minor fix to message formatting style to make it conforming to the
commonly used style
* tag 'for-6.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
btrfs: fallback if compressed IO fails for ENOSPC
btrfs: scrub: run relocation repair when/only needed
btrfs: remove colon from messages with state
|
|
Directories do not support direct I/O and thus no non-blocking direct
I/O either. Open code the shutdown check and call to generic_file_open
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240423124608.537794-4-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Directories have non of the capabilities, so drop the flags. Note that
the current state is harmless as no one actually checks for the flags
either.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240423124608.537794-3-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Re-wrap the newly added fop_flags fields to not go over 80 characters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240423124608.537794-2-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Remove the now unneeded check for ctl_table_size; it is safe
to do so as sysctl_set_perm_empty_ctl_header() does not access the
ctl_table member anymore.
This also makes the element of sysctl_mount_point unnecessary, so drop
it at the same time.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
Move the SYSCTL_TABLE_TYPE_{DEFAULT,PERMANENTLY_EMPTY} enums from
ctl_table to ctl_table_header.
Removing the mutable member is necessary to constify static instances
of struct ctl_table.
Move the initialization of the sysctl_mount_point type into
init_header() where all the other header fields are also initialized.
As a side-effect the memory usage of the sysctl core is reduced.
Each ctl_table_header instance can manage multiple ctl_table instances
and is only allocated when the table is actually registered.
This saves 8 bytes of memory per ctl_table on 64bit, 4 due to the enum
field itself and 4 due to padding.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
It is used only twice and those callers are simpler with
sysctl_is_perm_empty_ctl_header().
So use this sibling function.
This is part of an effort to constify definition of struct ctl_table.
For this effort the mutable member 'type' is moved from
struct ctl_table to struct ctl_table_header.
Unifying the macros sysctl_is_perm_empty_ctl_* makes this easier.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
Remove the 'table' argument from set_ownership as it is never used. This
change is a step towards putting "struct ctl_table" into .rodata and
eventually having sysctl core only use "const struct ctl_table".
The patch was created with the following coccinelle script:
@@
identifier func, head, table, uid, gid;
@@
void func(
struct ctl_table_header *head,
- struct ctl_table *table,
kuid_t *uid, kgid_t *gid)
{ ... }
No additional occurrences of 'set_ownership' were found after doing a
tree-wide search.
Reviewed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Joel Granados <j.granados@samsung.com>
|
|
The function are defined in the dir_repair.c file, but not called
elsewhere, so delete the unused function.
fs/xfs/scrub/dir_repair.c:186:1: warning: unused function 'xrep_dir_self_parent'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8867
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Use struct_group_attr() in __packed structs, instead of struct_group().
Below you can see the pahole output before/after changes:
pahole -C smb2_file_network_open_info fs/smb/client/smb2ops.o
struct smb2_file_network_open_info {
union {
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le64 AllocationSize; /* 32 8 */
__le64 EndOfFile; /* 40 8 */
__le32 Attributes; /* 48 4 */
}; /* 0 56 */
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le64 AllocationSize; /* 32 8 */
__le64 EndOfFile; /* 40 8 */
__le32 Attributes; /* 48 4 */
} network_open_info; /* 0 56 */
}; /* 0 56 */
__le32 Reserved; /* 56 4 */
/* size: 60, cachelines: 1, members: 2 */
/* last cacheline: 60 bytes */
} __attribute__((__packed__));
pahole -C smb2_file_network_open_info fs/smb/client/smb2ops.o
struct smb2_file_network_open_info {
union {
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le64 AllocationSize; /* 32 8 */
__le64 EndOfFile; /* 40 8 */
__le32 Attributes; /* 48 4 */
} __attribute__((__packed__)); /* 0 52 */
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le64 AllocationSize; /* 32 8 */
__le64 EndOfFile; /* 40 8 */
__le32 Attributes; /* 48 4 */
} __attribute__((__packed__)) network_open_info; /* 0 52 */
}; /* 0 52 */
__le32 Reserved; /* 52 4 */
/* size: 56, cachelines: 1, members: 2 */
/* last cacheline: 56 bytes */
};
pahole -C smb_com_open_rsp fs/smb/client/cifssmb.o
struct smb_com_open_rsp {
...
union {
struct {
__le64 CreationTime; /* 48 8 */
__le64 LastAccessTime; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
__le64 LastWriteTime; /* 64 8 */
__le64 ChangeTime; /* 72 8 */
__le32 FileAttributes; /* 80 4 */
}; /* 48 40 */
struct {
__le64 CreationTime; /* 48 8 */
__le64 LastAccessTime; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
__le64 LastWriteTime; /* 64 8 */
__le64 ChangeTime; /* 72 8 */
__le32 FileAttributes; /* 80 4 */
} common_attributes; /* 48 40 */
}; /* 48 40 */
...
/* size: 111, cachelines: 2, members: 14 */
/* last cacheline: 47 bytes */
} __attribute__((__packed__));
pahole -C smb_com_open_rsp fs/smb/client/cifssmb.o
struct smb_com_open_rsp {
...
union {
struct {
__le64 CreationTime; /* 48 8 */
__le64 LastAccessTime; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
__le64 LastWriteTime; /* 64 8 */
__le64 ChangeTime; /* 72 8 */
__le32 FileAttributes; /* 80 4 */
} __attribute__((__packed__)); /* 48 36 */
struct {
__le64 CreationTime; /* 48 8 */
__le64 LastAccessTime; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
__le64 LastWriteTime; /* 64 8 */
__le64 ChangeTime; /* 72 8 */
__le32 FileAttributes; /* 80 4 */
} __attribute__((__packed__)) common_attributes; /* 48 36 */
}; /* 48 36 */
...
/* size: 107, cachelines: 2, members: 14 */
/* last cacheline: 43 bytes */
} __attribute__((__packed__));
pahole -C FILE_ALL_INFO fs/smb/client/cifssmb.o
typedef struct {
union {
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le32 Attributes; /* 32 4 */
}; /* 0 40 */
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le32 Attributes; /* 32 4 */
} common_attributes; /* 0 40 */
}; /* 0 40 */
...
/* size: 113, cachelines: 2, members: 17 */
/* last cacheline: 49 bytes */
} __attribute__((__packed__)) FILE_ALL_INFO;
pahole -C FILE_ALL_INFO fs/smb/client/cifssmb.o
typedef struct {
union {
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le32 Attributes; /* 32 4 */
} __attribute__((__packed__)); /* 0 36 */
struct {
__le64 CreationTime; /* 0 8 */
__le64 LastAccessTime; /* 8 8 */
__le64 LastWriteTime; /* 16 8 */
__le64 ChangeTime; /* 24 8 */
__le32 Attributes; /* 32 4 */
} __attribute__((__packed__)) common_attributes; /* 0 36 */
}; /* 0 36 */
...
/* size: 109, cachelines: 2, members: 17 */
/* last cacheline: 45 bytes */
} __attribute__((__packed__)) FILE_ALL_INFO;
Fixes: 0015eb6e1238 ("smb: client, common: fix fortify warnings")
Cc: stable@vger.kernel.org
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This commit was a pre-requisite for commit c1ccfcf1a9bf ("NFSD:
Reschedule CB operations when backchannel rpc_clnt is shut down"),
which has already been reverted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The reverted commit attempted to enable NFSD to retransmit pending
callback operations if an NFS client disconnects, but
unintentionally introduces a hazardous behavior regression if the
client becomes permanently unreachable while callback operations are
still pending.
A disconnect can occur due to network partition or if the NFS server
needs to force the NFS client to retransmit (for example, if a GSS
window under-run occurs).
Reverting the commit will make NFSD behave the same as it did in
v6.8 and before. Pending callback operations are permanently lost if
the client connection is terminated before the client receives them.
For some callback operations, this loss is not harmful.
However, for CB_RECALL, the loss means a delegation might be revoked
unnecessarily. For CB_OFFLOAD, pending COPY operations will never
complete unless the NFS client subsequently sends an OFFLOAD_STATUS
operation, which the Linux NFS client does not currently implement.
These issues still need to be addressed somehow.
Reported-by: Dai Ngo <dai.ngo@oracle.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Invalidate the cached dentries that point to the file that we're moving
to lost+found before we actually move it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The atomic file exchange-range functionality is now a permanent
filesystem feature instead of a dynamic log-incompat feature. It cannot
be turned on at runtime, so we no longer need the XCHK_FSGATES flags and
whatnot that supported it. Remove the flag and the enable function, and
move the xfs_has_exchange_range checks to the start of the repair
functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If the transaction allocation in xrep_adoption_trans_alloc fails, we
should drop only the locks that we took. In this case this is
ILOCK_EXCL of both the orphanage and the file being repaired. Dropping
any IOLOCK here is incorrect.
Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If the transaction allocation in the !orphanage_available case of
xrep_nlinks_repair_inode fails, we need to drop the IOLOCK of the file
being scrubbed before exiting.
Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If a program wants us to perform a scrub on a file handle and the fd
passed to ioctl() is not the file referenced in the handle, iget the
file once and pass it into the scrub code. This amortizes the untrusted
iget lookup over /all/ the scrubbers mentioned in the scrubv call.
When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime on account of avoiding repeated
inobt lookups.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Back when I wrote commit a03297a0ca9f2, I had thought that we'd be doing
users a favor by only marking inodes dontcache at the end of a scrub
operation, and only if there's only one reference to that inode. This
was more or less true back when I_DONTCACHE was an XFS iflag and the
only thing it did was change the outcome of xfs_fs_drop_inode to 1.
Note: If there are dentries pointing to the inode when scrub finishes,
the inode will have positive i_count and stay around in cache until
dentry reclaim.
But now we have d_mark_dontcache, which cause the inode *and* the
dentries attached to it all to be marked I_DONTCACHE, which means that
we drop the dentries ASAP, which drops the inode ASAP.
This is bad if scrub found problems with the inode, because now they can
be scheduled for inactivation, which can cause inodegc to trip on it and
shut down the filesystem.
Even if the inode isn't bad, this is still suboptimal because phases 3-7
each initiate inode scans. Dropping the inode immediately during phase
3 is silly because phase 5 will reload it and drop it immediately, etc.
It's fine to mark the inodes dontcache, but if there have been accesses
to the file that set up dentries, we should keep them.
I validated this by setting up ftrace to capture xfs_iget_recycle*
tracepoints and ran xfs/285 for 30 seconds. With current djwong-wtf I
saw ~30,000 recycle events. I then dropped the d_mark_dontcache calls
and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per
30 seconds.
Therefore, grab the inode with XFS_IGET_DONTCACHE, which only has the
effect of setting I_DONTCACHE for cache misses. Remove the
d_mark_dontcache call that can happen in xchk_irele.
Fixes: a03297a0ca9f2 ("xfs: manage inode DONTCACHE status at irele time")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored
mode. The caller specifies the principal metadata object that they want
to scrub (allocation group, inode, etc.) once, followed by an array of
scrub types they want called on that object. The kernel runs the scrub
operations and writes the output flags and errno code to the
corresponding array element.
A new pseudo scrub type BARRIER is introduced to force the kernel to
return to userspace if any corruptions have been found when scrubbing
the previous scrub types in the array. This enables userspace to
schedule, for example, the sequence:
1. data fork
2. barrier
3. directory
If the data fork scrub is clean, then the kernel will perform the
directory scrub. If not, the barrier in 2 will exit back to userspace.
The alternative would have been an interface where userspace passes a
pointer to an empty buffer, and the kernel formats that with
xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome
was. With that the kernel would have to communicate that the buffer
needed to have been at least X size, even though for our cases
XFS_SCRUB_TYPE_NR + 2 would always be enough.
Compared to that, this design keeps all the dependency policy and
ordering logic in userspace where it already resides instead of
duplicating it in the kernel. The downside of that is that it needs the
barrier logic.
When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime due to fewer transitions across the
system call boundary.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Move the scrub ioctl handler to scrub.c to keep the code together and to
reduce unnecessary code when CONFIG_XFS_ONLINE_SCRUB=n.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
We really don't want to call cond_resched every single time we go
through a loop in scrub -- there may be billions of records, and probing
into the scheduler itself has overhead. Reduce this overhead by only
calling cond_resched 10x per second; and add a counter so that we only
check jiffies once every 1000 records or so.
Surprisingly, this reduces scrub-only fstests runtime by about 2%. I
used the bmapinflate xfs_db command to produce a billion-extent file and
this stupid gadget reduced the scrub runtime by about 4%.
From a stupid microbenchmark of calling these things 1 billion times, I
estimate that cond_resched costs about 5.5ns per call; jiffes costs
about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per
call.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Repair corruptions in the directory tree itself. Cycles are broken by
removing an incoming parent->child link. Multiply-owned directories are
fixed by pruning the extra parent -> child links Disconnected subtrees
are reconnected to the lost and found.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Report directories that are the source of corruption in the directory
tree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Add a dirent update hook so that we can detect directory tree updates
that affect any of the paths found by this scrubber and force it to
rescan.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Create a new scrubber that detects corruptions within the directory tree
structure itself. It can detect directories with multiple parents;
loops within the directory tree; and directory loops not accessible from
the root.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The runtime parent pointer update code expects that any file being moved
around the directory tree already has an attr fork. However, if we had
to rebuild an inode core record, there's a chance that we zeroed forkoff
as part of the inode to pass the iget verifiers.
Therefore, if we performed any repairs on an inode core, ensure that the
inode has a nonzero forkoff before unlocking the inode.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Since the parent pointer scrubber does not exhaustively search the
filesystem for missing parent pointers, it doesn't have a good way to
determine that there are pointers missing from an otherwise uncorrupt
xattr structure. Instead, for nondirectories it employs a heuristic of
comparing the file link count to the number of parent pointers found.
However, we don't want this heuristic flagging a false corruption after
a repair has actually scanned the entire filesystem to rebuild the
parent pointers. Therefore, reset the file link count in this one case
because we actually know the correct link count.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Adapt the orphanage's adoption code to update the child file's parent
pointers as part of the reparenting process. Also ensure that the child
has an attr fork to receive the parent pointer update, since the runtime
code assumes one exists.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Once we've assembled all the parent pointers for a file, we need to
commit the new dataset atomically to that file. Parent pointer records
are embedded in the xattr structure, which means that we must write a
new extended attribute structure, again, atomically. Therefore, we must
copy the non-parent-pointer attributes from the file being repaired into
the temporary file's extended attributes and then call the atomic extent
swap mechanism to exchange the blocks.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|