Age | Commit message (Collapse) | Author |
|
virtqueue_enable_cb() will call virtqueue_poll() which will check if
queue is broken at beginning, so remove the virtqueue_is_broken() call
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
...when calling fuse_iget().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The root inode is assumed to be always hashed. Do not unhash the root
inode even if it is marked BAD.
Fixes: 5d069dbe8aaf ("fuse: fix bad inode")
Cc: <stable@vger.kernel.org> # v5.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The root inode has a fixed nodeid and generation (1, 0).
Prior to the commit 15db16837a35 ("fuse: fix illegal access to inode with
reused nodeid") generation number on lookup was ignored. After this commit
lookup with the wrong generation number resulted in the inode being
unhashed. This is correct for non-root inodes, but replacing the root
inode is wrong and results in weird behavior.
Fix by reverting to the old behavior if ignoring the generation for the
root inode, but issuing a warning in dmesg.
Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Closes: https://lore.kernel.org/all/CAOQ4uxhek5ytdN8Yz2tNEOg5ea4NkBb4nk0FGPjPk_9nz-VG3g@mail.gmail.com/
Fixes: 15db16837a35 ("fuse: fix illegal access to inode with reused nodeid")
Cc: <stable@vger.kernel.org> # v5.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_do_statx() was added with the wrong helper.
Fixes: d3045530bdd2 ("fuse: implement statx")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
virtio_fs_sysfs_exit() is called by:
- static int __init virtio_fs_init(void)
- static void __exit virtio_fs_exit(void)
Remove __exit from virtio_fs_sysfs_exit() since virtio_fs_init() is not
an __exit function.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402270649.GYjNX0yw-lkp@intel.com/
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
An mmap request for a file open in passthrough mode, maps the memory
directly to the backing file.
An mmap of a file in direct io mode, usually uses cached mmap and puts
the inode in caching io mode, which denies new passthrough opens of that
inode, because caching io mode is conflicting with passthrough io mode.
For the same reason, trying to mmap a direct io file, while there is
a passthrough file open on the same inode will fail with -ENODEV.
An mmap of a file in direct io mode, also needs to wait for parallel
dio writes in-progress to complete.
If a passthrough file is opened, while an mmap of another direct io
file is waiting for parallel dio writes to complete, the wait is aborted
and mmap fails with -ENODEV.
A FUSE server that uses passthrough and direct io opens on the same inode
that may also be mmaped, is advised to provide a backing fd also for the
files that are open in direct io mode (i.e. use the flags combination
FOPEN_DIRECT_IO | FOPEN_PASSTHROUGH), so that mmap will always use the
backing file, even if read/write do not passthrough.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This allows passing fstests generic/249 and generic/591.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Use the backing file read/write helpers to implement read/write
passthrough to a backing file.
After read/write, we invalidate a/c/mtime/size attributes.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
After getting a backing file id with FUSE_DEV_IOC_BACKING_OPEN ioctl,
a FUSE server can reply to an OPEN request with flag FOPEN_PASSTHROUGH
and the backing file id.
The FUSE server should reuse the same backing file id for all the open
replies of the same FUSE inode and open will fail (with -EIO) if a the
server attempts to open the same inode with conflicting io modes or to
setup passthrough to two different backing files for the same FUSE inode.
Using the same backing file id for several different inodes is allowed.
Opening a new file with FOPEN_DIRECT_IO for an inode that is already
open for passthrough is allowed, but only if the FOPEN_PASSTHROUGH flag
and correct backing file id are specified as well.
The read/write IO of such files will not use passthrough operations to
the backing file, but mmap, which does not support direct_io, will use
the backing file insead of using the page cache as it always did.
Even though all FUSE passthrough files of the same inode use the same
backing file as a backing inode reference, each FUSE file opens a unique
instance of a backing_file object to store the FUSE path that was used
to open the inode and the open flags of the specific open file.
The per-file, backing_file object is released along with the FUSE file.
The inode associated fuse_backing object is released when the last FUSE
passthrough file of that inode is released AND when the backing file id
is closed by the server using the FUSE_DEV_IOC_BACKING_CLOSE ioctl.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In preparation for opening file in passthrough mode, store the
fuse_open_out argument in ff->args to be passed into fuse_file_io_open()
with the optional backing_id member.
This will be used for setting up passthrough to backing file on open
reply with FOPEN_PASSTHROUGH flag and a valid backing_id.
Opening a file in passthrough mode may fail for several reasons, such as
missing capability, conflicting open flags or inode in caching mode.
Return EIO from fuse_file_io_open() in those cases.
The combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO is allowed -
it mean that read/write operations will go directly to the server,
but mmap will be done to the backing file.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file
descriptor. If the call succeeds, a backing file identifier is returned.
A later change will be using this backing file id in a reply to OPEN
request with the flag FOPEN_PASSTHROUGH to setup passthrough of file
operations on the open FUSE file to the backing file.
The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the
backing file by its id.
This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH
flag is still in progress, the open may fail if the backing file is
closed before the fuse file was opened.
Setting up backing files requires a server with CAP_SYS_ADMIN privileges.
For the backing file to be successfully setup, the backing file must
implement both read_iter and write_iter file operations.
The limitation on the level of filesystem stacking allowed for the
backing file is enforced before setting up the backing file.
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The first kiocb_set_cancel_fn() argument may point at a struct kiocb
that is not embedded inside struct aio_kiocb. With the current code,
depending on the compiler, the req->ki_ctx read happens either before
the IOCB_AIO_RW test or after that test. Move the req->ki_ctx read such
that it is guaranteed that the IOCB_AIO_RW test happens first.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Cc: Benjamin LaHaise <ben@communityfibre.ca>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: stable@vger.kernel.org
Fixes: b820de741ae4 ("fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240304235715.3790858-1-bvanassche@acm.org
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Patch "fs/aio: Make io_cancel() generate completions again" is based on the
assumption that calling kiocb->ki_cancel() does not complete R/W requests.
This is incorrect: the two drivers that call kiocb_set_cancel_fn() callers
set a cancellation function that calls usb_ep_dequeue(). According to its
documentation, usb_ep_dequeue() calls the completion routine with status
-ECONNRESET. Hence this revert.
Cc: Benjamin LaHaise <ben@communityfibre.ca>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: stable@vger.kernel.org
Reported-by: syzbot+b91eb2ed18f599dd3c31@syzkaller.appspotmail.com
Fixes: 54cbc058d86b ("fs/aio: Make io_cancel() generate completions again")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240304182945.3646109-1-bvanassche@acm.org
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Just cleanup, no functional change.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs_gc_range may return error, so its caller
f2fs_allocate_pinning_section should determine whether
to do retry based on ist return value.
Also just do f2fs_gc_range when f2fs_allocate_new_section
return -EAGAIN, and check cp error case in f2fs_gc_range.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
__allocate_new_segment may return error when get_new_segment
fails, so its caller should check its return value.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If GET_SEGNO return NULL_SEGNO for some unecpected case,
update_sit_entry will access invalid memory address,
cause system crash. It is better to do sanity check about
GET_SEGNO just like update_segment_mtime & locate_dirty_segment.
Also remove some redundant judgment code.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_allocate_data_block(), before skip allocating new segment
for DATA_PINNED log header, it needs to tag log header as unloaded
one to avoid skipping logic in locate_dirty_segment() and
__f2fs_save_inmem_curseg().
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Move f2fs_valid_pinned_area() check logic from new_curseg() to
get_new_segment(), it can avoid calling __set_free() if it fails
to find free segment in conventional zone for pinned file.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch exchangs position of f2fs_precache_extents() and
filemap_fdatawrite(), so that f2fs_precache_extents() can load
extent info after physical addresses of all data are fixed.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_migrate_blocks(), when traversing blocks in last section,
blkofs_end should be (start_blk + blkcnt - 1) % blk_per_sec, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
[ 9299.893835] F2FS-fs (vdd): Allow to mount readonly mode only
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
root@qemu:/ ps -ef|grep f2fs
root 94 2 0 03:46 ? 00:00:00 [kworker/u17:0-f2fs_post_read_wq]
root 6282 2 0 06:21 ? 00:00:00 [f2fs_discard-253:48]
There will be no deletion in readonly image, let's skip starting
discard thread to save system resources.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Mapping info from dump.f2fs:
i_addr[0x2d] cluster flag [0xfffffffe : 4294967294]
i_addr[0x2e] [0x 10428 : 66600]
i_addr[0x2f] [0x 10429 : 66601]
i_addr[0x30] [0x 1042a : 66602]
f2fs_io fiemap 37 1 /mnt/f2fs/disk-58390c8c.raw
Previsouly, it missed to align fofs and ofs_in_node to cluster_size,
result in adding incorrect read extent cache, fix it.
Before:
f2fs_update_read_extent_tree_range: dev = (253,48), ino = 5, pgofs = 37, len = 4, blkaddr = 66600, c_len = 3
After:
f2fs_update_read_extent_tree_range: dev = (253,48), ino = 5, pgofs = 36, len = 4, blkaddr = 66600, c_len = 3
Fixes: 94afd6d6e525 ("f2fs: extent cache: support unaligned extent")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_insert_range(), it missed to check return value of
filemap_write_and_wait_range(), fix it.
Meanwhile, just return error number once __exchange_data_block()
fails.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
@type in f2fs_allocate_data_block() indicates log header's type, it
can be CURSEG_COLD_DATA_PINNED or CURSEG_ALL_DATA_ATGC, rather than
type of data/node, however IS_DATASEG()/IS_NODESEG() only accept later
one, fix it.
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull write hint fix from Christian Brauner:
UFS devices are widely used in mobile applications, e.g. in smartphones.
UFS vendors need data lifetime information to achieve good performance.
Providing data lifetime information to UFS devices can result in up to
40% lower write amplification. Hence this patch series that restores the
bi_write_hint member in struct bio. After this patch series has been
merged, patches that implement data lifetime support in the SCSI disk
(sd) driver will be sent to the Linux kernel SCSI maintainer.
The following changes are included in this patch series:
- Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls.
- Move enum rw_hint into a new header file.
- Support F_SET_RW_HINT for block devices to make it easy to test data
lifetime support.
- Restore the bio.bi_write_hint member and restore support in the VFS
layer and also in the block layer for data lifetime information.
The shell script that has been used to test the patch series combined
with the SCSI patches is available at the end of this cover letter.
* tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
block, fs: Restore the per-bio/request data lifetime fields
fs: Propagate write hints to the struct block_device inode
fs: Move enum rw_hint into a new header file
fs: Split fcntl_rw_hint()
fs: Verify write lifetime constants at compile time
fs: Fix rw_hint validation
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Don't block mounting the partition, if cap is 100%.
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
No functional change, but add some more logs.
Note, it includes the spelling mistakes pointed by Colin Ian King.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The helpers btrfs_del_delalloc_inode() and __btrfs_del_delalloc_inode()
don't follow the pattern when the "__" helper does a special case and
are in fact reversed regarding the naming. We can merge them into one as
there's only one place that needs to be open coded.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Replace the two parameters bdev and name by one that can be used to get
them both.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Other errors in flush_reservations() are handled and also in the caller.
Ignoring commit might make some sense as it's called right after join so
it's to poke the whole commit machinery to free space.
However for consistency return the error. The caller
btrfs_quota_disable() would try to start the transaction which would
in turn fail too so there's no effective change.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches related to delayed refs when the default
values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The helpers are doing an initialization or release work, none of which
is performance critical that it would require a static inline, so move
them to the .c file.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The helper is really trivial, reading a cache size can be done directly.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The helpers are doing an initialization or release work, none of which
is performance critical that it would require a static inline, so move
them to the .c file.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Using static inline in a .c file should be justified, e.g. when
functions are on a hot path but none of the affected functions seem to
be. As it's all in one compilation unit let the compiler decide.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This is a simple initializer and not on any hot path, it does not need
to be static inline.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There are many helpers doing simple things but not simple enough to
justify the static inline. None of them seems to be on a hot path so
move them to .c.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The helper is trivial, we can inline it. It's safe to remove the 'if' as
the iterator is always valid when used, the potential NULL was never
checked anyway.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The helper is trivial and used only once, open code it. It's safe to
remove the 'if', the pointer is validated in build_backref_tree().
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The from/to CPU/disk helpers for balance args are used only in volumes,
no need to define them in accessors.h.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We disable offloading checksum to workqueues and do it synchronously when
the checksum algorithm is fast. However, as reported in the link below,
RAID0 with multiple devices may suffer from the sync checksum, because
"fast checksum" is still not fast enough to catch up with RAID0 writing.
We don't have an effective way to determine whether to offload or not,
for now add a sysfs knob so this can be debugged. This is intentionally
under CONFIG_BTRFS_DEBUG so ti's not exposed to users as it may be
removed in the future agin.
Introduce fs_devices->offload_csum_mode, so that a btrfs developer can
change the behavior by writing to /sys/fs/btrfs/<uuid>/offload_csum. The
default is "auto" which is the same as the previous behavior. Or, you
can set "on" or "off" (or "y" or "n" whatever kstrtobool() accepts) to
always/never offload checksum.
More benchmark need to be collected with this knob to implement a proper
criteria to enable/disable checksum offloading.
Link: https://lore.kernel.org/linux-btrfs/20230731152223.4EFB.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/p3vo3g7pqn664mhmdhlotu5dzcna6vjtcoc2hb2lsgo2fwct7k@xzaxclba5tae/
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
I have got at least two crash report for RAID6 syndrome generation, no
matter if it's AVX2 or SSE2, they all seems to have a similar
calltrace with corrupted RAX:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
Workqueue: btrfs-rmw rmw_rbio_work [btrfs]
RIP: 0010:raid6_sse21_gen_syndrome+0x9e/0x130 [raid6_pq]
RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
RDX: 0000000000000000 RSI: ffffa0f74cfa3238 RDI: 0000000000000000
Call Trace:
<TASK>
rmw_rbio+0x5c8/0xa80 [btrfs]
process_one_work+0x1c7/0x3d0
worker_thread+0x4d/0x380
kthread+0xf3/0x120
ret_from_fork+0x2c/0x50
</TASK>
[CAUSE]
The cause is not known. Recently I also hit this in AVX512 path, and
that's even in v5.15 backport, which doesn't have any of my RAID56
rework.
Furthermore according to the registers:
RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
The RAX register is showing the number of stripes (including PQ), which
is not correct (0). But the remaining two registers are all sane.
- RBX is the sectorsize
For x86_64 it should always be 4K and matches the output.
- RCX is the pointers array
Which is from rbio->finish_pointers, and it looks like a sane
kernel address.
[WORKAROUND]
For now, I can only add extra debug ASSERT()s before we call raid6
gen_syndrome() helper and hopes to catch the problem.
The debug requires both CONFIG_BTRFS_DEBUG and CONFIG_BTRFS_ASSERT
enabled.
My current guess is some use-after-free, but every report is only having
corrupted RAX but seemingly valid pointers doesn't make much sense.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At btrfs_free_tree_block(), we are always initializing a delayed reference
to drop the given extent buffer but we only use if it does not belong to a
log root tree. So we are doing unnecessary work here and increasing the
duration of a critical section as this is normally called while holding a
lock on the parent tree block (if any) and while holding a log transaction
open.
So initialize the delayed reference only if the extent buffer is not from
a log tree, avoiding unnecessary work and making the code also a bit
easier to follow.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|