Age | Commit message | Author |
|
If we free a metadata buffer whose asynchronous background write-out
has failed, the jbd2 checkpoint procedure will not detect this
failure in jbd2_log_do_checkpoint(), so it may lead to filesystem
inconsistency after the journal tail is cleaned up. This patch aborts
the journal when a buffer with the write_io_error flag set is freed, to
prevent potential further inconsistency.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200620025427.1756360-5-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
There is a risk of filesystem inconsistency if we fail to asynchronously
write back a metadata buffer in the background. Because the buffer's end
io procedure is currently handled by end_buffer_async_write() in the block
layer, which only clears the buffer's uptodate flag and sets the
write_io_error flag, ext4 cannot detect such a failure immediately. In most
cases of getting a metadata buffer (e.g. ext4_read_inode_bitmap()), although
the buffer's data is actually up to date, ext4 may still read data from disk
because the buffer's uptodate flag has been cleared. Finally, this may
lead to on-disk filesystem inconsistency if old data is read from the
disk successfully and then written out again.
This patch checks the bdev's mapping->wb_err when getting the journal's
write access and marks the filesystem as having an error if the bdev's
mapping->wb_err has increased; this prevents further writes and potential
inconsistency.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200620025427.1756360-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Pull xfs updates from Darrick Wong:
"There are quite a few changes in this release, the most notable of
which is that we've made inode flushing fully asynchronous, and we no
longer block memory reclaim on this.
Furthermore, we have fixed a long-standing bug in the quota code where
soft limit warnings and inode limits were never tracked properly.
Moving further down the line, the reflink control loops have been
redesigned to behave more efficiently; and numerous small bugs have
been fixed (see below). The xattr and quota code have been extensively
refactored in preparation for more new features coming down the line.
Finally, the behavior of DAX between ext4 and xfs has been stabilized,
which gets us a step closer to removing the experimental tag from that
feature.
We have a few new contributors this time around. Welcome, all!
I anticipate a second pull request next week for a few small bugfixes
that have been trickling in, but this is it for big changes.
Summary:
- Fix some btree block pingponging problems when swapping extents
- Redesign the reflink copy loop so that we only run one remapping
operation per transaction. This helps us avoid running out of block
reservation on highly deduped filesystems.
- Take the MMAPLOCK around filemap_map_pages.
- Make inode reclaim fully async so that we avoid stalling processes
on flushing inodes to disk.
- Reduce inode cluster buffer RMW cycles by attaching the buffer to
dirty inodes so we won't let go of the cluster buffer when we know
we're going to need it soon.
- Add some more checks to the realtime bitmap file scrubber.
- Don't trip false lockdep warnings in fs freeze.
- Remove various redundant lines of code.
- Remove unnecessary calls to xfs_perag_{get,put}.
- Preserve I_VERSION state across remounts.
- Fix an unmount hang due to AIL going to sleep with a non-empty
delwri buffer list.
- Fix an error in the inode allocation space reservation macro that
caused regressions in generic/531.
- Fix a potential livelock when dquot flush fails because the dquot
buffer is locked.
- Fix a miscalculation when reserving inode quota that could cause
users to exceed a hardlimit.
- Refactor struct xfs_dquot to use native types for incore fields
instead of abusing the ondisk struct for this purpose. This will
eventually enable proper y2038+ support, but for now it merely
cleans up the quota function declarations.
- Actually increment the quota softlimit warning counter so that soft
failures turn into hard(er) failures when they exceed the softlimit
warning counter limits set by the administrator.
- Split incore dquot state flags into their own field and namespace,
to avoid mixing them with quota type flags.
- Create a new quota type flags namespace so that we can make it
obvious when a quota function takes a quota type (user, group,
project) as an argument.
- Rename the ondisk dquot flags field to type, as that more
accurately represents what we store in it.
- Drop our bespoke memory allocation flags in favor of GFP_*.
- Rearrange the xattr functions so that we no longer mix metadata
updates and transaction management (e.g. rolling complex
transactions) in the same functions. This work will prepare us for
atomic xattr operations (itself a prerequisite for directory
backrefs) in future release cycles.
- Support FS_DAX_FL (aka FS_XFLAG_DAX) via GETFLAGS/SETFLAGS"
* tag 'xfs-5.9-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (117 commits)
fs/xfs: Support that ioctl(SETXFLAGS/GETXFLAGS) can set/get inode DAX on XFS.
xfs: Lift -ENOSPC handler from xfs_attr_leaf_addname
xfs: Simplify xfs_attr_node_addname
xfs: Simplify xfs_attr_leaf_addname
xfs: Add helper function xfs_attr_node_removename_rmt
xfs: Add helper function xfs_attr_node_removename_setup
xfs: Add remote block helper functions
xfs: Add helper function xfs_attr_leaf_mark_incomplete
xfs: Add helpers xfs_attr_is_shortform and xfs_attr_set_shortform
xfs: Remove xfs_trans_roll in xfs_attr_node_removename
xfs: Remove unneeded xfs_trans_roll_inode calls
xfs: Add helper function xfs_attr_node_shrink
xfs: Pull up xfs_attr_rmtval_invalidate
xfs: Refactor xfs_attr_rmtval_remove
xfs: Pull up trans roll in xfs_attr3_leaf_clearflag
xfs: Factor out xfs_attr_rmtval_invalidate
xfs: Pull up trans roll from xfs_attr3_leaf_setflag
xfs: Refactor xfs_attr_try_sf_addname
xfs: Split apart xfs_attr_leaf_addname
xfs: Pull up trans handling in xfs_attr3_leaf_flipflags
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull init and set_fs() cleanups from Al Viro:
"Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series"
* 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits)
init: add an init_dup helper
init: add an init_utimes helper
init: add an init_stat helper
init: add an init_mknod helper
init: add an init_mkdir helper
init: add an init_symlink helper
init: add an init_link helper
init: add an init_eaccess helper
init: add an init_chmod helper
init: add an init_chown helper
init: add an init_chroot helper
init: add an init_chdir helper
init: add an init_rmdir helper
init: add an init_unlink helper
init: add an init_umount helper
init: add an init_mount helper
init: mark create_dev as __init
init: mark console_on_rootfs as __init
init: initialize ramdisk_execute_command at compile time
devtmpfs: refactor devtmpfsd()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ptrace regset updates from Al Viro:
"Internal regset API changes:
- regularize copy_regset_{to,from}_user() callers
- switch to saner calling conventions for ->get()
- kill user_regset_copyout()
The ->put() side of things will have to wait for the next cycle,
unfortunately.
The balance is about -1KLoC and replacements for ->get() instances are
a lot saner"
* 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
regset: kill user_regset_copyout{,_zero}()
regset(): kill ->get_size()
regset: kill ->get()
csky: switch to ->regset_get()
xtensa: switch to ->regset_get()
parisc: switch to ->regset_get()
nds32: switch to ->regset_get()
nios2: switch to ->regset_get()
hexagon: switch to ->regset_get()
h8300: switch to ->regset_get()
openrisc: switch to ->regset_get()
riscv: switch to ->regset_get()
c6x: switch to ->regset_get()
ia64: switch to ->regset_get()
arc: switch to ->regset_get()
arm: switch to ->regset_get()
sh: convert to ->regset_get()
arm64: switch to ->regset_get()
mips: switch to ->regset_get()
sparc: switch to ->regset_get()
...
|
|
Before this patch, if function gfs2_dirty_inode got an error when
trying to lock the inode glock, it complained, but it didn't say
what glock or inode had the problem.
In this case, it almost always means that dinode_in found an error
with the dinode in the file system. So it makes sense to dump the
glock, which tells us the location of the dinode in the file system.
That will allow us to analyze the corruption from the metadata.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Before this patch, some functions started transactions then they called
gfs2_block_zero_range. However, gfs2_block_zero_range, like writes, can
start transactions, which results in a recursive transaction error.
For example:
do_shrink
trunc_start
gfs2_trans_begin <------------------------------------------------
gfs2_block_zero_range
iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops);
iomap_apply ... iomap_zero_range_actor
iomap_begin
gfs2_iomap_begin
gfs2_iomap_begin_write
actor (iomap_zero_range_actor)
iomap_zero
iomap_write_begin
gfs2_iomap_page_prepare
gfs2_trans_begin <------------------------
This patch reorders the callers of gfs2_block_zero_range so that they
only start their transactions after the call. It also adds a BUG_ON to
ensure this doesn't happen again.
Fixes: 2257e468a63b ("gfs2: implement gfs2_block_zero_range using iomap_zero_range")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
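The reordering described in the gfs2_block_zero_range commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every demo_* name is hypothetical, a single flag stands in for the task's active transaction, and a nested begin returns -1 here where the kernel would BUG.

```c
#include <stdbool.h>

/* Hypothetical model: one flag stands in for the task's active transaction. */
struct demo_task {
    bool in_trans;
};

/* Refuse to nest; the kernel would BUG here, the model just returns -1. */
static int demo_trans_begin(struct demo_task *t)
{
    if (t->in_trans)
        return -1;
    t->in_trans = true;
    return 0;
}

static void demo_trans_end(struct demo_task *t)
{
    t->in_trans = false;
}

/* Like a write, zeroing a range starts and ends its own transaction. */
static int demo_zero_range(struct demo_task *t)
{
    if (demo_trans_begin(t))
        return -1;
    demo_trans_end(t);
    return 0;
}

/* Old order: zeroing inside the caller's transaction nests and fails. */
static int demo_shrink_old(struct demo_task *t)
{
    int ret;

    if (demo_trans_begin(t))
        return -1;
    ret = demo_zero_range(t);
    demo_trans_end(t);
    return ret;
}

/* New order: zero first, then start the caller's own transaction. */
static int demo_shrink_new(struct demo_task *t)
{
    if (demo_zero_range(t))
        return -1;
    if (demo_trans_begin(t))
        return -1;
    demo_trans_end(t);
    return 0;
}
```

In this model demo_shrink_old() fails on the nested begin, while demo_shrink_new() completes because the two transactions never overlap.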
|
|
If function gfs2_trans_begin is called with another transaction active
it BUGs out, but it doesn't give any details about the duplicate.
This patch moves function gfs2_print_trans and calls it when this
situation arises for better debugging.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The comment regarding journal flush thresholds is wrong. This patch fixes it.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The error handling calls kfree(full_path) so we can't let it be a NULL
pointer. There used to be a NULL assignment here but we accidentally
deleted it. Add it back.
Fixes: 7efd08158261 ("cifs: document and cleanup dfs mount")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set includes some improvements to the dlm networking layer:
improving the ability to trace dlm messages for debugging, and
improved handling of bad messages or disrupted connections"
* tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: implement tcp graceful shutdown
fs: dlm: change handling of reconnects
fs: dlm: don't close socket on invalid message
fs: dlm: set skb mark per peer socket
fs: dlm: set skb mark for listen socket
net: sock: add sock_set_mark
dlm: Fix kobject memleak
|
|
Pull iomap updates from Darrick Wong:
"The most notable changes are:
- iomap no longer invalidates the page cache when performing a direct
read, since doing so is unnecessary and the old directio code
doesn't do that either.
- iomap embraced the use of returning ENOTBLK from a direct write to
trigger falling back to a buffered write since ext4 already did
this and btrfs wants it for their port.
- iomap falls back to buffered writes if we're doing a direct write
and the page cache invalidation after the flush fails; this was
necessary to handle a corner case in the btrfs port.
- Remove email virus scanner detritus that was accidentally included
in yesterday's pull request. Clearly I need(ed) to update my git
branch checker scripts. :("
* tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: fall back to buffered writes for invalidation failures
xfs: use ENOTBLK for direct I/O to buffered I/O fallback
iomap: Only invalidate page cache pages on direct IO writes
iomap: Make sure iomap_end is called after iomap_begin
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- fanotify fix for softlockups when there are many queued events
- performance improvement to reduce fsnotify overhead when not used
- Amir's implementation of fanotify events with names. With these you
can now efficiently monitor a whole filesystem, e.g. to mirror changes
to another machine.
* tag 'fsnotify_for_v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (37 commits)
fanotify: compare fsid when merging name event
fsnotify: create method handle_inode_event() in fsnotify_operations
fanotify: report parent fid + child fid
fanotify: report parent fid + name + child fid
fanotify: add support for FAN_REPORT_NAME
fanotify: report events with parent dir fid to sb/mount/non-dir marks
fanotify: add basic support for FAN_REPORT_DIR_FID
fsnotify: remove check that source dentry is positive
fsnotify: send event with parent/name info to sb/mount/non-dir marks
audit: do not set FS_EVENT_ON_CHILD in audit marks mask
inotify: do not set FS_EVENT_ON_CHILD in non-dir mark mask
fsnotify: pass dir and inode arguments to fsnotify()
fsnotify: create helper fsnotify_inode()
fsnotify: send event to parent and child with single callback
inotify: report both events on parent and child with single callback
dnotify: report both events on parent and child with single callback
fanotify: no external fh buffer in fanotify_name_event
fanotify: use struct fanotify_info to parcel the variable size buffer
fsnotify: add object type "child" to object type iterator
fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, udf, reiserfs, quota cleanups and minor fixes from Jan Kara:
"A few ext2 fixups and then several (mostly comment and documentation)
cleanups in ext2, udf, reiserfs, and quota"
* tag 'for_v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: delete duplicated words
udf: osta_udf.h: delete a duplicated word
reiserfs: reiserfs.h: delete a duplicated word
ext2: ext2.h: fix duplicated word + typos
udf: Replace HTTP links with HTTPS ones
quota: Fixup http links in quota doc
Replace HTTP links with HTTPS ones: DISKQUOTA
ext2: initialize quota info in ext2_xattr_set()
ext2: fix some incorrect comments in inode.c
ext2: remove nocheck option
ext2: fix missing percpu_counter_inc
ext2: ext2_find_entry() return -ENOENT if no entry found
ext2: propagate errors up to ext2_find_entry()'s callers
ext2: fix improper assignment for e_value_offs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"This cycle mainly addresses an issue with some extended inodes at
designated locations, which are not generated by the current mkfs but
need to be handled at runtime anyway. The others are quite trivial ones.
- use HTTPS links instead of insecure HTTP ones;
- fix crossing page boundary on specific extended inodes;
- remove useless WQ_CPU_INTENSIVE flag for unbound wq;
- minor cleanup"
* tag 'erofs-for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: remove WQ_CPU_INTENSIVE flag from unbound wq's
erofs: fold in used-once helper erofs_workgroup_unfreeze_final()
erofs: fix extended inode could cross boundary
erofs: Replace HTTP links with HTTPS ones
|
|
Pull cifs updates from Steve French:
"16 cifs/smb3 fixes, about half DFS related, two fixes for stable.
Still working on and testing an additional set of fixes (including
updates to mount, and some fallocate scenario improvements) for later
in the merge window"
* tag '5.9-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6:
cifs: document and cleanup dfs mount
cifs: only update prefix path of DFS links in cifs_tree_connect()
cifs: fix double free error on share and prefix
cifs: handle RESP_GET_DFS_REFERRAL.PathConsumed in reconnect
cifs: handle empty list of targets in cifs_reconnect()
cifs: rename reconn_inval_dfs_target()
cifs: reduce number of referral requests in DFS link lookups
cifs: merge __{cifs,smb2}_reconnect[_tcon]() into cifs_tree_connect()
cifs: convert to use be32_add_cpu()
cifs: delete duplicated words in header files
cifs: Remove the superfluous break
cifs: smb1: Try failing back to SetFileInfo if SetPathInfo fails
cifs`: handle ERRBaduid for SMB1
cifs: remove unused variable 'server'
smb3: warn on confusing error scenario with sec=krb5
cifs: Fix leak when handling lease break for cached root fid
|
|
During my code inspection I saw that there is no implementation of a
graceful shutdown for tcp. This patch introduces a graceful shutdown for
tcp connections. The shutdown is performed synchronously when
dlm_lowcomms_stop() is called to end all dlm communication. After the
shutdown is done, a lot of flush and close functionality will be called.
However, I don't see a problem with that.
The waitqueue used to synchronize the shutdown has a timeout of 10 seconds;
if it times out, a force close will be executed.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch changes the handling of reconnects. At first we only close
the connection related to the communication failure. If we get a new
connection for an already existing connection, we close the existing
connection and take the new one.
This patch significantly improves the stability of tcp connections while
running "tcpkill -9 -i $IFACE port 21064" and generating a lot of dlm
messages, e.g. on a gfs2 mount with many files. My test setup shows that a
deadlock is "more" unlikely. Before this patch I always hit a deadlock
after 5 seconds. After this patch my observation is that the connection
is more likely to survive 5 seconds and longer, but a deadlock still
occurs after a certain time. My guess is that there are still
"segments" inside the tcp writequeue or retransmit queue which get dropped
when receiving a tcp reset [1]. This is hard to reproduce because the right
messages need to be inside these queues, which might even happen within the
first 5 seconds with this patch.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp_input.c?h=v5.8-rc6#n4122
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
With this patch, sockets are no longer closed when an invalid dlm message
is received. The connection would probably reconnect anyway, so not
closing it reduces the number of possible failures.
As we don't have a different strategy for reacting to such a scenario,
just keep the connection going and ignore the message.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch adds support for setting the skb mark value for the DLM tcp
and sctp sockets per peer. The mark value is exposed as a per-comm value
in configfs. At creation time of the peer socket it is set as a
socket option.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch adds support for setting the skb mark value for the DLM listen
tcp and sctp sockets. The mark value is exposed as a cluster
configuration value. At creation time of the listen socket it is set as a
socket option.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Currently the error return path from kobject_init_and_add() is not
followed by a call to kobject_put() - which means we are leaking
the kobject.
Set do_unreg = 1 before kobject_init_and_add() to ensure that
kobject_put() can be called in its error path.
Fixes: 901195ed7f4b ("Kobject: change GFS2 to use kobject_init_and_add")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David Teigland <teigland@redhat.com>
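The leak described in the kobject commit above follows from the rule that a failed kobject_init_and_add() still leaves an initialized object holding a reference, which only kobject_put() drops. A minimal user-space sketch of that rule, with hypothetical demo_* names and a plain integer standing in for the kobject refcount:

```c
/* Hypothetical stand-in for a kobject: a refcount and a "released" marker. */
struct demo_kobj {
    int refs;
    int released;
};

/* Initialization always takes the first reference, even if the add fails. */
static int demo_init_and_add(struct demo_kobj *k, int fail)
{
    k->refs = 1;
    k->released = 0;
    return fail ? -1 : 0;
}

/* Dropping the last reference releases the object. */
static void demo_put(struct demo_kobj *k)
{
    if (--k->refs == 0)
        k->released = 1;
}

/* The error path must drop the reference, otherwise the object leaks. */
static int demo_register(struct demo_kobj *k, int fail)
{
    int err = demo_init_and_add(k, fail);

    if (err) {
        demo_put(k);
        return err;
    }
    return 0;
}
```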
|
|
The tear down path will always unaccount the memory, so ensure that we
have accounted it before hitting any of them.
Reported-by: Tomáš Chaloupka <chalucha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we hit an earlier error path in io_uring_create(), then we will have
accounted memory, but not set ctx->{sq,cq}_entries yet. Then when the
ring is torn down in error, we use those values to unaccount the memory.
Ensure we set the ctx entries before we're able to hit a potential error
path.
Cc: stable@vger.kernel.org
Reported-by: Tomáš Chaloupka <chalucha@gmail.com>
Tested-by: Tomáš Chaloupka <chalucha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
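The ordering problem in the io_uring_create() commit above can be sketched in user space. This is an illustrative model, not io_uring code: the demo_* names, the global counter, and the "one page per entry" cost function are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical context: teardown derives the accounted size from these. */
struct demo_ctx {
    unsigned sq_entries;
    unsigned cq_entries;
};

/* Hypothetical global counter of accounted pages. */
static long demo_accounted;

/* Assumed cost model: one page per queue entry. */
static long demo_pages(unsigned sq, unsigned cq)
{
    return (long)sq + (long)cq;
}

/* Teardown always unaccounts, using the sizes stored in the context. */
static void demo_teardown(struct demo_ctx *ctx)
{
    demo_accounted -= demo_pages(ctx->sq_entries, ctx->cq_entries);
}

static int demo_create(struct demo_ctx *ctx, unsigned sq, unsigned cq,
                       bool fail_early)
{
    demo_accounted += demo_pages(sq, cq);

    /* Store the sizes before any error path can run teardown. */
    ctx->sq_entries = sq;
    ctx->cq_entries = cq;

    if (fail_early) {
        demo_teardown(ctx);
        return -1;
    }
    return 0;
}
```

If the two assignments were moved below the early failure, demo_teardown() would subtract zero and the counter would stay inflated, which is the imbalance the commit describes.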
|
|
Pick up the full seqlock series PeterZ is working on.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
cr=0 is supposed to be an optimization to save CPU cycles, but if
buddy data (in memory) is not initialized then all this makes no sense
as we have to do sync IO taking a lot of cycles. Also, at cr=0
mballoc doesn't choose any available chunk. cr=1 also skips groups
using heuristic based on avg. fragment size. It's more useful to skip
such groups and switch to cr=2 where groups will be scanned for
available chunks. However, we always read the first block group in a
flex_bg so metadata blocks will get read into the first flex_bg if
possible.
Using a sparse image and dm-slow, a virtual device of 120TB was
simulated; the image was then formatted and filled using debugfs to
mark ~85% of available space as busy. Without the patch, the mount
process couldn't complete in half an hour (according to vmstat it would
take ~10-11 hours). With the patch applied, the mount took ~20 seconds.
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988
Signed-off-by: Alex Zhuravlev <azhuravlev@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
|
|
This should significantly improve bitmap loading, especially for flex
groups, as it tries to load all bitmaps within a flex group instead of
loading them one by one synchronously.
Prefetching is done in 8 * flex_bg groups, so it should be 8
read-ahead reads for a single allocating thread. At the end of
allocation the thread waits for read-ahead completion and initializes
buddy information so that read-aheads are not lost in case of memory
pressure.
At cr=0 the number of prefetching IOs is limited per allocation
context to prevent a situation when mballoc loads thousands of bitmaps
looking for a perfect group and ignoring groups with good chunks.
Together with the patch "ext4: limit scanning of uninitialized groups",
the mount time (which includes a few tiny allocations) of a 1PB
filesystem is reduced significantly:
               0% full    50%-full unpatched    patched
  mount time       33s                 9279s       563s
[ Restructured by tytso; removed the state flags in the allocation
context, so it can be used to lazily prefetch the allocation bitmaps
immediately after the file system is mounted. Skip prefetching
block groups which are uninitialized. Finally pass in the
REQ_RAHEAD flag to the block layer while prefetching. ]
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Ext4 uses blkdev_get_by_dev() to get the block_device for the journal
device, which does not check whether a read-only block device was opened
read-only.
As a result ext4 will happily proceed to mount the file system with an
external journal on a read-only device. This is bad, as we would not be
able to use the journal, leading to errors later on.
Instead of simply failing to mount the file system in this case, treat it
in a similar way to how we treat an internal journal on a read-only
device: allow mounting with -o noload in read-only mode.
This can be reproduced easily like this:
mke2fs -F -O journal_dev $JOURNAL_DEV 100M
mkfs.$FSTYPE -F -J device=$JOURNAL_DEV $FS_DEV
blockdev --setro $JOURNAL_DEV
mount $FS_DEV $MNT
touch $MNT/file
umount $MNT
leading to an error like this:
[ 1307.318713] ------------[ cut here ]------------
[ 1307.323362] generic_make_request: Trying to write to read-only block-device dm-2 (partno 0)
[ 1307.331741] WARNING: CPU: 36 PID: 3224 at block/blk-core.c:855 generic_make_request_checks+0x2c3/0x580
[ 1307.341041] Modules linked in: ext4 mbcache jbd2 rfkill intel_rapl_msr intel_rapl_common isst_if_commd
[ 1307.419445] CPU: 36 PID: 3224 Comm: jbd2/dm-2 Tainted: G W I 5.8.0-rc5 #2
[ 1307.427359] Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 2.3.10 08/15/2019
[ 1307.434932] RIP: 0010:generic_make_request_checks+0x2c3/0x580
[ 1307.440676] Code: 94 03 00 00 48 89 df 48 8d 74 24 08 c6 05 cf 2b 18 01 01 e8 7f a4 ff ff 48 c7 c7 50e
[ 1307.459420] RSP: 0018:ffffc0d70eb5fb48 EFLAGS: 00010286
[ 1307.464646] RAX: 0000000000000000 RBX: ffff9b33b2978300 RCX: 0000000000000000
[ 1307.471780] RDX: ffff9b33e12a81e0 RSI: ffff9b33e1298000 RDI: ffff9b33e1298000
[ 1307.478913] RBP: ffff9b7b9679e0c0 R08: 0000000000000837 R09: 0000000000000024
[ 1307.486044] R10: 0000000000000000 R11: ffffc0d70eb5f9f0 R12: 0000000000000400
[ 1307.493177] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 1307.500308] FS: 0000000000000000(0000) GS:ffff9b33e1280000(0000) knlGS:0000000000000000
[ 1307.508396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1307.514142] CR2: 000055eaf4109000 CR3: 0000003dee40a006 CR4: 00000000007606e0
[ 1307.521273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1307.528407] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1307.535538] PKRU: 55555554
[ 1307.538250] Call Trace:
[ 1307.540708] generic_make_request+0x30/0x340
[ 1307.544985] submit_bio+0x43/0x190
[ 1307.548393] ? bio_add_page+0x62/0x90
[ 1307.552068] submit_bh_wbc+0x16a/0x190
[ 1307.555833] jbd2_write_superblock+0xec/0x200 [jbd2]
[ 1307.560803] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
[ 1307.566557] jbd2_journal_commit_transaction+0x2ae/0x1860 [jbd2]
[ 1307.572566] ? check_preempt_curr+0x7a/0x90
[ 1307.576756] ? update_curr+0xe1/0x1d0
[ 1307.580421] ? account_entity_dequeue+0x7b/0xb0
[ 1307.584955] ? newidle_balance+0x231/0x3d0
[ 1307.589056] ? __switch_to_asm+0x42/0x70
[ 1307.592986] ? __switch_to_asm+0x36/0x70
[ 1307.596918] ? lock_timer_base+0x67/0x80
[ 1307.600851] kjournald2+0xbd/0x270 [jbd2]
[ 1307.604873] ? finish_wait+0x80/0x80
[ 1307.608460] ? commit_timeout+0x10/0x10 [jbd2]
[ 1307.612915] kthread+0x114/0x130
[ 1307.616152] ? kthread_park+0x80/0x80
[ 1307.619816] ret_from_fork+0x22/0x30
[ 1307.623400] ---[ end trace 27490236265b1630 ]---
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200717090605.2612-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fix spelling typos in ext4_mb_initialize_context.
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/883b523c-58ec-7f38-0bb8-cd2ea4393684@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Don't define EXT4_IOC_* aliases to ioctls that already have a generic
FS_IOC_* name. These aliases are unnecessary, and they make it unclear
which ioctls are ext4-specific and which are generic.
Exception: leave EXT4_IOC_GETVERSION_OLD and EXT4_IOC_SETVERSION_OLD
as-is for now, since renaming them to FS_IOC_GETVERSION and
FS_IOC_SETVERSION would probably make them more likely to be confused
with EXT4_IOC_GETVERSION and EXT4_IOC_SETVERSION which also exist.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200714230909.56349-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Define the EXT4_FL_USER_* constants by OR-ing together the appropriate
flags, rather than hard-coding a numeric value. This makes it much
easier to see which flags are listed.
No change in the actual values.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200713031012.192440-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
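The style change in the EXT4_FL_USER_* commit above can be shown with a small sketch. The DEMO_* names and bit values below are hypothetical and chosen only for illustration; they are not the real ext4 flag definitions.

```c
/* Hypothetical flag bits for illustration; not the real EXT4_*_FL values. */
#define DEMO_SYNC_FL      0x00000008u
#define DEMO_IMMUTABLE_FL 0x00000010u
#define DEMO_APPEND_FL    0x00000020u
#define DEMO_NODUMP_FL    0x00000040u

/* Hard-coded mask: the reader has to decode the bits by hand. */
#define DEMO_USER_MODIFIABLE_OLD 0x00000078u

/* OR-ed mask: same value, but the member flags are listed explicitly. */
#define DEMO_USER_MODIFIABLE (DEMO_SYNC_FL | DEMO_IMMUTABLE_FL | \
                              DEMO_APPEND_FL | DEMO_NODUMP_FL)

/* Compile-time check (C11) that the rewrite did not change the value. */
_Static_assert(DEMO_USER_MODIFIABLE == DEMO_USER_MODIFIABLE_OLD,
               "mask value must be unchanged");
```

A static assertion of this kind is one way to back up a "no change in the actual values" claim while such a conversion is being reviewed.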
|
|
A customer has reported a BUG_ON in ext4_clear_journal_err() being hit
during LTP testing. Either this was caused by a test setup issue where
the filesystem was being overwritten while LTP was mounting it, or the
journal replay overwrote the superblock with invalid data. In either
case it is preferable that we don't take the machine down with a BUG_ON.
So handle the situation of an unexpectedly missing has_journal feature
more gracefully. We issue a warning and fail the mount in the cases
where the race window is narrow and the failed check is most likely a
programming error. In cases where fs corruption is more likely, we do
full ext4_error() handling before failing the mount / remount.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200710140759.18031-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since commit 378f32bab371 ("ext4: introduce direct I/O write using iomap
infrastructure") we don't properly bail out of RWF_NOWAIT direct IO
write if underlying blocks are not allocated. Also
ext4_dio_write_checks() does not honor RWF_NOWAIT when re-acquiring
i_rwsem. Fix both issues.
Fixes: 378f32bab371 ("ext4: introduce direct I/O write using iomap infrastructure")
Cc: stable@kernel.org
Reported-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200708153516.9507-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions
          return 200 OK and serve the same content:
            Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Link: https://lore.kernel.org/r/20200706190339.20709-1-grandmaster@al2klimov.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If dquot_initialize() returns non-zero while the ext4_unlink_enter/exit
tracepoints are enabled, the matching trace_exit event is lost from the log.
Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200629122621.129953-1-zhuangyi1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4_truncate() should call trace exit on all of its return paths.
Signed-off-by: zhengliang <zhengliang6@huawei.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20200701083027.45996-1-zhengliang6@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
jbd2_write_superblock()
jbd2_write_superblock() holds the buffer lock of the journal superblock
until that superblock write ends, so add the missing unlock_buffer() in
the error path that is taken before the buffer is submitted.
Fixes: 742b06b5628f ("jbd2: check superblock mapped prior to committing")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200620061948.2049579-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
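The missing unlock in the jbd2_write_superblock() commit above is an instance of a general rule: if the I/O completion normally releases a lock, every early return taken before submission has to release it explicitly. A user-space sketch with hypothetical demo_* names, where a boolean models the buffer lock and the "completion" runs inline:

```c
#include <stdbool.h>

/* Hypothetical buffer: a lock flag and a "mapped" state checked before I/O. */
struct demo_bh {
    bool locked;
    bool mapped;
};

static void demo_lock(struct demo_bh *bh)
{
    bh->locked = true;
}

static void demo_unlock(struct demo_bh *bh)
{
    bh->locked = false;
}

/* Returns 0 on success and -1 if the buffer cannot be written. */
static int demo_write_sb(struct demo_bh *bh)
{
    demo_lock(bh);

    if (!bh->mapped) {
        /* No I/O was submitted, so no completion will unlock for us. */
        demo_unlock(bh);
        return -1;
    }

    /* In this model the I/O completion runs inline and drops the lock. */
    demo_unlock(bh);
    return 0;
}
```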
|
|
If for any reason a directory passed to do_split() does not have enough
active entries to exceed half the size of the block, we can end up
iterating over all "count" entries without finding a split point.
In this case, count == move, and split will be zero, and we will
attempt a negative index into map[].
Guard against this by detecting this case, and falling back to
split-to-half-of-count instead; in this case we will still have
plenty of space (> half blocksize) in each split block.
Fixes: ef2b02d3e617 ("ext34: ensure do_split leaves enough free space in both blocks")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/f53e246b-647c-64bb-16ec-135383c70ad7@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
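
The guard described in the message above can be sketched in plain C. This is a
minimal illustration, not the ext4 source: struct entry_size and pick_split()
are hypothetical names, and the loop condition is an assumption about how the
split point is searched (walking from the last entry until half the block is
covered).

```c
#include <stddef.h>

/* Hypothetical stand-in for one element of the map[] array. */
struct entry_size {
	unsigned int size;	/* bytes used by one active directory entry */
};

/*
 * Return the map index at which to split. Entries are accumulated from
 * the end of the map until they would cross half of the block.
 */
static unsigned int pick_split(const struct entry_size *map, int count,
			       unsigned int blocksize)
{
	unsigned int size = 0;
	int move = 0;
	int i;

	for (i = count - 1; i >= 0; i--) {
		/* would more than half of this entry land past the midpoint? */
		if (size + map[i].size / 2 > blocksize / 2)
			break;
		size += map[i].size;
		move++;
	}

	/*
	 * If the loop consumed every entry (or stopped at index 0), the
	 * computed split would be zero and split - 1 would index map[-1].
	 * Fall back to splitting by count instead.
	 */
	if (i > 0)
		return (unsigned int)(count - move);
	return (unsigned int)(count / 2);
}
```

With four 100-byte entries in a 4096-byte block the loop never breaks, so the
fallback returns count / 2 rather than zero.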
|
|
Callers of __jbd2_journal_unfile_buffer() and
__jbd2_journal_refile_buffer() assume that the b_transaction is set. In
fact, if it's not, we can end up with journal_head refcounting errors
leading to a crash much later that might be very hard to track down. Add
asserts to make sure that is the case.
We also make sure that b_next_transaction is NULL in
__jbd2_journal_unfile_buffer() since the callers expect that as well and
we should not get into that stage in this state anyway, leading to
problems later on if we do.
Tested with fstests.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200617092549.6712-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Fixed a few coding style issues in file.c
Signed-off-by: Dio Putra <dioput12@gmail.com>
Link: https://lore.kernel.org/r/239fcd8f-d33f-8621-9e82-0416dd3f9c94@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The brelse() function tests whether its argument is NULL
and then returns immediately.
Thus remove the tests which are not needed around the shown calls.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/0d713702-072f-a89c-20ec-ca70aa83a432@web.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Pull networking updates from David Miller:
1) Support 6GHz band in ath11k driver, from Rajkumar Manoharan.
2) Support UDP segmentation in code TSO code, from Eric Dumazet.
3) Allow flashing different flash images in cxgb4 driver, from Vishal
Kulkarni.
4) Add drop frames counter and flow status to tc flower offloading,
from Po Liu.
5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.
6) Various new indirect call avoidance, from Eric Dumazet and Brian
Vazquez.
7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
Yonghong Song.
8) Support querying and setting hardware address of a port function via
devlink, use this in mlx5, from Parav Pandit.
9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.
10) Switch qca8k driver over to phylink, from Jonathan McDowell.
11) In bpftool, show list of processes holding BPF FD references to
maps, programs, links, and btf objects. From Andrii Nakryiko.
12) Several conversions over to generic power management, from Vaibhav
Gupta.
13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
Yakunin.
14) Various https url conversions, from Alexander A. Klimov.
15) Timestamping and PHC support for mscc PHY driver, from Antoine
Tenart.
16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.
17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.
18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.
19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
drivers. From Luc Van Oostenryck.
20) XDP support for xen-netfront, from Denis Kirjanov.
21) Support receive buffer autotuning in MPTCP, from Florian Westphal.
22) Support EF100 chip in sfc driver, from Edward Cree.
23) Add XDP support to mvpp2 driver, from Matteo Croce.
24) Support MPTCP in sock_diag, from Paolo Abeni.
25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
infrastructure, from Jakub Kicinski.
26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.
27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.
28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.
29) Refactor a lot of networking socket option handling code in order to
avoid set_fs() calls, from Christoph Hellwig.
30) Add rfc4884 support to icmp code, from Willem de Bruijn.
31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.
32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.
33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.
34) Support TCP syncookies in MPTCP, from Florian Westphal.
35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
Brivio.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
net: thunderx: initialize VF's mailbox mutex before first usage
usb: hso: remove bogus check for EINPROGRESS
usb: hso: no complaint about kmalloc failure
hso: fix bailout in error case of probe
ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
selftests/net: relax cpu affinity requirement in msg_zerocopy test
mptcp: be careful on subflow creation
selftests: rtnetlink: make kci_test_encap() return sub-test result
selftests: rtnetlink: correct the final return value for the test
net: dsa: sja1105: use detected device id instead of DT one on mismatch
tipc: set ub->ifindex for local ipv6 address
ipv6: add ipv6_dev_find()
net: openvswitch: silence suspicious RCU usage warning
Revert "vxlan: fix tos value before xmit"
ptp: only allow phase values lower than 1 period
farsync: switch from 'pci_' to 'dma_' API
wan: wanxl: switch from 'pci_' to 'dma_' API
hv_netvsc: do not use VF device if link is down
dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
net: macb: Properly handle phylink on at91sam9x
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of changes to the driver core, and some drivers
using the changes, for 5.9-rc1.
"Biggest" thing in here is the device link exposure in sysfs, to help
to tame the madness that is SoC device tree representations and driver
interactions with it.
Other stuff in here that is interesting is:
- device probe log helper so that drivers can report problems in a
unified way easier.
- devres functions added
- DEVICE_ATTR_ADMIN_* macro added to make it harder to write
incorrect sysfs file permissions
- documentation cleanups
- ability for debugfs to be present in the kernel, yet not exposed to
userspace. Needed for systems that want it enabled, but do not
trust users, so they can still use some kernel functions that were
otherwise disabled.
- other minor fixes and cleanups
The patches outside of drivers/base/ all have acks from the respective
subsystem maintainers to go through this tree instead of theirs.
All of these have been in linux-next with no reported issues"
* tag 'driver-core-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits)
drm/bridge: lvds-codec: simplify error handling
drm/bridge/sii8620: fix resource acquisition error handling
driver core: add deferring probe reason to devices_deferred property
driver core: add device probe log helper
driver core: Avoid binding drivers to dead devices
Revert "test_firmware: Test platform fw loading on non-EFI systems"
firmware_loader: EFI firmware loader must handle pre-allocated buffer
selftest/firmware: Add selftest timeout in settings
test_firmware: Test platform fw loading on non-EFI systems
driver core: Change delimiter in devlink device's name to "--"
debugfs: Add access restriction option
tracefs: Remove unnecessary debug_fs checks.
driver core: Fix probe_count imbalance in really_probe()
kobject: remove unused KOBJ_MAX action
driver core: Fix sleeping in invalid context during device link deletion
driver core: Add waiting_for_supplier sysfs file for devices
driver core: Add state_synced sysfs file for devices that support it
driver core: Expose device link details in sysfs
driver core: Drop mention of obsolete bus rwsem from kernel-doc
debugfs: file: Remove unnecessary cast in kfree()
...
|
|
Failing to invalidate the page cache means data is incoherent, which is
a very bad state for the system. Always fall back to buffered I/O
through the page cache if we can't invalidate mappings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
|
|
This is what the classic fs/direct-io.c implementation and thus other
file systems use.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
The historic requirement for XFS to invalidate cached pages on
direct IO reads has been lost in the twisty pages of history - it was
inherited from Irix, which implemented page cache invalidation on
read as a method of working around problems synchronising page
cache state with uncached IO.
XFS has carried this ever since. In the initial linux ports it was
necessary to get mmap and DIO to play "ok" together and not
immediately corrupt data. This was the state of play until the linux
kernel had infrastructure to track unwritten extents and synchronise
page faults with allocations and unwritten extent conversions
(->page_mkwrite infrastructure). IOWs, the page cache invalidation
on DIO read was necessary to prevent trivial data corruptions. This
didn't solve all the problems, though.
There were performance problems if we didn't invalidate the entire
page cache over the file on read - we couldn't easily determine if
the cached pages were over the range of the IO, and invalidation
required taking a serialising lock (i_mutex) on the inode. This
serialising lock was an issue for XFS, as it was the only exclusive
lock in the direct IO read path.
Hence if there were any cached pages, we'd just invalidate the
entire file in one go so that subsequent IOs didn't need to take the
serialising lock. This was a problem that prevented ranged
invalidation from being particularly useful for avoiding the
remaining coherency issues. This was solved with the conversion of
i_mutex to i_rwsem and the conversion of the XFS inode IO lock to
use i_rwsem. Hence we could now just do ranged invalidation and the
performance problem went away.
However, page cache invalidation was still needed to serialise
sub-page/sub-block zeroing via direct IO against buffered IO because
bufferhead state attached to the cached page could get out of whack
when direct IOs were issued. We've removed bufferheads from the
XFS code, and we don't carry any extent state on the cached pages
anymore, and so this problem has gone away, too.
IOWs, it would appear that we don't have any good reason to be
invalidating the page cache on DIO reads anymore. Hence remove the
invalidation on read because it is unnecessary overhead,
not needed to maintain coherency between mmap/buffered access and
direct IO anymore, and prevents anyone from using direct IO reads
to intentionally invalidate the page cache of a file.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Delete repeated words in fs/xfs/.
{we, that, the, a, to, fork}
Change "it it" to "it is" in one location.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
To: linux-fsdevel@vger.kernel.org
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Most session messages contain a feature mask, but the MDS will
routinely send a REJECT message with one that is zero-length.
Commit 0fa8263367db ("ceph: fix endianness bug when handling MDS
session feature bits") fixed the decoding of the feature mask,
but failed to account for the MDS sending a zero-length feature
mask. This causes REJECT message decoding to fail.
Skip trying to decode a feature mask if the word count is zero.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46823
Fixes: 0fa8263367db ("ceph: fix endianness bug when handling MDS session feature bits")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
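
The "skip if zero" rule in the message above can be sketched as a small
bounds-checked decoder. This is an illustration under assumptions, not the
ceph client code: the wire layout (a 32-bit little-endian length followed by
that many bytes, of which the first eight hold the mask) and the names
get_le32(), get_le64() and decode_feature_mask() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, independent of host order. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_le64(const uint8_t *p)
{
	return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/*
 * Decode an optional feature mask from [p, end).
 * Returns 0 on success (with *features set, 0 if the mask is empty)
 * and -1 if the message is truncated or malformed.
 */
static int decode_feature_mask(const uint8_t *p, const uint8_t *end,
			       uint64_t *features)
{
	uint32_t len;

	*features = 0;
	if ((size_t)(end - p) < sizeof(len))
		return -1;
	len = get_le32(p);
	p += sizeof(len);

	/* A zero-length mask is valid: there is simply nothing to decode. */
	if (len == 0)
		return 0;

	if (len < sizeof(*features) || (size_t)(end - p) < len)
		return -1;
	*features = get_le64(p);
	return 0;
}
```

Treating the empty mask as success, rather than as a short read, is what lets
a message that carries no feature bits decode cleanly.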
|
|
Virtio fs is modern-only. Use LE accessors for config space.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
loop_rw_iter() does not check whether the file has a read or
write function. This can lead to NULL pointer dereference
when the user passes in a file descriptor that does not have
read or write function.
The crash log looks like this:
[ 99.834071] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 99.835364] #PF: supervisor instruction fetch in kernel mode
[ 99.836522] #PF: error_code(0x0010) - not-present page
[ 99.837771] PGD 8000000079d62067 P4D 8000000079d62067 PUD 79d8c067 PMD 0
[ 99.839649] Oops: 0010 [#2] SMP PTI
[ 99.840591] CPU: 1 PID: 333 Comm: io_wqe_worker-0 Tainted: G D 5.8.0 #2
[ 99.842622] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 99.845140] RIP: 0010:0x0
[ 99.845840] Code: Bad RIP value.
[ 99.846672] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202
[ 99.848018] RAX: 0000000000000000 RBX: ffff92363bd67300 RCX: ffff92363d461208
[ 99.849854] RDX: 0000000000000010 RSI: 00007ffdbf696bb0 RDI: ffff92363bd67300
[ 99.851743] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000
[ 99.853394] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010
[ 99.855148] R13: 0000000000000000 R14: ffff92363d461208 R15: ffffa1c7c01ebc68
[ 99.856914] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000
[ 99.858651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.860032] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0
[ 99.861979] Call Trace:
[ 99.862617] loop_rw_iter.part.0+0xad/0x110
[ 99.863838] io_write+0x2ae/0x380
[ 99.864644] ? kvm_sched_clock_read+0x11/0x20
[ 99.865595] ? sched_clock+0x9/0x10
[ 99.866453] ? sched_clock_cpu+0x11/0xb0
[ 99.867326] ? newidle_balance+0x1d4/0x3c0
[ 99.868283] io_issue_sqe+0xd8f/0x1340
[ 99.869216] ? __switch_to+0x7f/0x450
[ 99.870280] ? __switch_to_asm+0x42/0x70
[ 99.871254] ? __switch_to_asm+0x36/0x70
[ 99.872133] ? lock_timer_base+0x72/0xa0
[ 99.873155] ? switch_mm_irqs_off+0x1bf/0x420
[ 99.874152] io_wq_submit_work+0x64/0x180
[ 99.875192] ? kthread_use_mm+0x71/0x100
[ 99.876132] io_worker_handle_work+0x267/0x440
[ 99.877233] io_wqe_worker+0x297/0x350
[ 99.878145] kthread+0x112/0x150
[ 99.878849] ? __io_worker_unuse+0x100/0x100
[ 99.879935] ? kthread_park+0x90/0x90
[ 99.880874] ret_from_fork+0x22/0x30
[ 99.881679] Modules linked in:
[ 99.882493] CR2: 0000000000000000
[ 99.883324] ---[ end trace 4453745f4673190b ]---
[ 99.884289] RIP: 0010:0x0
[ 99.884837] Code: Bad RIP value.
[ 99.885492] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202
[ 99.886851] RAX: 0000000000000000 RBX: ffff92363acd7f00 RCX: ffff92363d461608
[ 99.888561] RDX: 0000000000000010 RSI: 00007ffe040d9e10 RDI: ffff92363acd7f00
[ 99.890203] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000
[ 99.891907] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010
[ 99.894106] R13: 0000000000000000 R14: ffff92363d461608 R15: ffffa1c7c01ebc68
[ 99.896079] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000
[ 99.898017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.899197] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0
Fixes: 32960613b7c3 ("io_uring: correctly handle non ->{read,write}_iter() file_operations")
Cc: stable@vger.kernel.org
Signed-off-by: Guoyu Huang <hgy5945@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
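
The missing check described above comes down to testing a function pointer
before calling through it. The sketch below is a userspace analogy, not the
io_uring code: struct demo_file_ops, demo_count_write() and demo_write() are
hypothetical, and returning -EINVAL for a missing handler is an assumption
made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical operations table; either handler may be NULL. */
struct demo_file_ops {
	ssize_t (*read)(char *buf, size_t len);
	ssize_t (*write)(const char *buf, size_t len);
};

/* Trivial handler used for illustration: pretends every byte was written. */
static ssize_t demo_count_write(const char *buf, size_t len)
{
	(void)buf;
	return (ssize_t)len;
}

/*
 * Dispatch a write through the table. Calling a NULL handler would jump
 * to address 0, which matches the "RIP: 0010:0x0" seen in the crash log,
 * so reject the request up front instead.
 */
static ssize_t demo_write(const struct demo_file_ops *ops,
			  const char *buf, size_t len)
{
	if (!ops->write)
		return -EINVAL;
	return ops->write(buf, len);
}
```

A table with no write handler now yields an error code to the caller, while a
table that provides one behaves as before.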
|