This patch takes advantage of the new glock holder sharing feature for
resource groups. We have already introduced local resource group
locking in a previous patch, so competing accesses by local processes
are already under control.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
set, the exclusive lock is shared among all local processes that are
holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
From the point of view of other nodes, the lock is still held
exclusively.
A future patch will start using this flag to improve performance with
rgrp sharing.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Prepare for treating resource group glocks as exclusive among nodes but
shared among all tasks running on a node: introduce another layer of
node-specific locking that the local tasks can use to coordinate their
accesses.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Add a rs_reserved field to struct gfs2_blkreserv to keep track of the number of
blocks reserved by this particular reservation, and a rd_reserved field to
struct gfs2_rgrpd to keep track of the total number of reserved blocks in the
resource group. Those blocks are exclusively reserved, as opposed to the
rs_requested / rd_requested blocks which are tracked in the reservation tree
(rd_rstree) and which can be stolen if necessary.
When making a reservation with gfs2_inplace_reserve, rs_reserved is set to
somewhere between ap->min_target and ap->target depending on the number of free
blocks in the resource group. When allocating blocks with gfs2_alloc_blocks,
rs_reserved is decremented accordingly. Eventually, any reserved but not
consumed blocks are returned to the resource group by gfs2_inplace_release.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
We keep track of what we've so far been referring to as reservations in
rd_rstree: the nodes in that tree indicate where in a resource group we'd
like to allocate the next couple of blocks for a particular inode. Local
processes take those as hints, but they may still "steal" blocks from those
extents, so when actually allocating a block, we must double check in the
bitmap whether that block is actually still free. Likewise, other cluster
nodes may "steal" such blocks as well.
One of the following patches introduces resource group glock sharing, i.e.,
sharing of an exclusively locked resource group glock among local processes to
speed up allocations. To make that work, we'll need to keep track of how many
blocks we've actually reserved for each inode, so we end up with two different
kinds of reservations.
Distinguish these two kinds by referring to blocks which are reserved but may
still be "stolen" as "requested". This rename also makes it more obvious that
rs_requested and rd_requested are strongly related.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_release, check if the inode has an active reservation to avoid
unnecessary lock taking.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
If gfs2_inplace_reserve has chosen a resource group but it couldn't make a
reservation there, there are too many other reservations in that resource
group. In that case, don't even try to respect existing reservations in
gfs2_alloc_blocks.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Only pass the current reservation down to gfs2_rbm_find rather than the entire
inode; we don't need any of the other information.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Pass a non-NULL minext to gfs2_rbm_find even for single-block allocations. In
gfs2_rbm_find, also set rgd->rd_extfail_pt when a single-block allocation
fails in a resource group: there is no reason for treating that case
differently. In gfs2_reservation_check_and_update, only check how many free
blocks we have if more than one block is requested; we already know there's at
least one free block.
In addition, when allocating N blocks fails in gfs2_rbm_find, we need to set
rd_extfail_pt to N - 1 rather than N: rd_extfail_pt defines the biggest
allocation that might still succeed.
Finally, reset rd_extfail_pt when updating the resource group statistics in
update_rgrp_lvb, as we already do in gfs2_rgrp_bh_get.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
We add task_work from any context, hence we need to ensure that we can
tolerate it being from IRQ context as well.
Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If the Fb cap is in use, the current inode is still flushing its
dirty data to the OSD, so just defer flushing the capsnap.
URL: https://tracker.ceph.com/issues/48640
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Testing with the fscache overhaul has triggered some lockdep warnings
about circular lock dependencies involving page_mkwrite and the
mmap_lock. It'd be better to do the "real work" without the mmap lock
being held.
Change the skip_checking_caps parameter in __ceph_put_cap_refs to an
enum, and use that to determine whether to queue check_caps, do it
synchronously or not at all. Change ceph_page_mkwrite to do a
ceph_put_cap_refs_async().
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Add a generic function for taking an inode reference, setting the I_WORK
bit and queueing i_work. Turn the ceph_queue_* functions into static
inline wrappers that pass in the right bit.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
A primary reason for skipping ceph_check_caps after putting the
references was to avoid the locking in ceph_check_caps during a
reconnect. __ceph_put_cap_refs can still call ceph_flush_snaps in that
case though, and that takes many of the same inconvenient locks.
Fix the logic in __ceph_put_cap_refs to skip flushing snaps when the
skip_checking_caps flag is set.
Fixes: e64f44a88465 ("ceph: skip checking caps when session reconnecting and releasing reqs")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Userspace has discovered the functionality offered by SYS_kcmp and has
started to depend upon it. In particular, Mesa uses SYS_kcmp for
os_same_file_description() in order to identify when two fds (e.g. device
or dmabuf) point to the same struct file. Since they depend on it for
core functionality, lift SYS_kcmp out of the non-default
CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to
deduplicate the per-service file descriptor store.
Note that some distributions such as Ubuntu are already enabling
CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # DRM depends on kcmp
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> # systemd uses kcmp
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk
|
|
Add tracepoints for file I/O operations to aid in debugging of I/O errors
with zonefs.
The added tracepoints are in:
- zonefs_zone_mgmt() for tracing zone management operations
- zonefs_iomap_begin() for tracing regular file I/O
- zonefs_file_dio_append() for tracing zone-append operations
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
|
|
If this is attempted by an io-wq kthread, then return -EOPNOTSUPP as we
don't currently support that. Once we can get task_pid_ptr() doing the
right thing, then this can go away again.
Use PF_IO_WORKER for this to specifically target the io_uring workers.
Modify the /proc/self/ check to use PF_IO_WORKER as well.
Cc: stable@vger.kernel.org
Fixes: 8d4c3e76e3be ("proc: don't allow async path resolution of /proc/self components")
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It can be useful to the interpreter to know which flags are in use.
For instance, knowing whether preserve-argv[0] is in use would
allow it to skip the pathname argument.
This patch uses an unused auxiliary vector entry, AT_FLAGS, to add a
flag informing the interpreter whether preserve-argv[0] is enabled.
Note by Helge Deller:
The real-world user of this patch is qemu-user, which needs to know
if it has to preserve the argv[0]. See Debian bug #970460.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: YunQiang Su <ysu@wavecomp.com>
URL: http://bugs.debian.org/970460
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
These pernet operations may depend on stuff set up or torn down in the
module init/exit functions. And they may be called at any time in
between. So it makes more sense for them to be the last to be
registered in the init function, and the first to be unregistered in the
exit function.
In particular, without this, the drc slab is being destroyed before all
the per-net drcs are shut down, resulting in an "Objects remaining in
nfsd_drc on __kmem_cache_shutdown()" warning in exit_nfsd.
Reported-by: Zhi Li <yieli@redhat.com>
Fixes: 3ba75830ce17 "nfsd4: drc containerization"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Fix to return the PTR_ERR() error code from the error handling case instead
of 0 in function alloc_wbufs(), as done elsewhere in this function.
Fixes: 6a98bc4614de ("ubifs: Add authentication nodes to journal")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"A regression fix caused by a refactoring in 5.11.
A corrupted superblock wouldn't be detected by checksum verification
due to wrongly placed initialization of the checksum length, thus
making memcmp always work"
* tag 'for-5.11-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: initialize fs_info::csum_size earlier in open_ctree
|
|
s390 and alpha are the only 64-bit architectures with a 32-bit ino_t.
Since this is quite unusual, it causes bugs from time to time.
See e.g. commit ebce3eb2f7ef ("ceph: fix inode number handling on
arches with 32-bit ino_t") for an example.
This (obviously) also prevents s390 and alpha from using 64-bit ino_t for
tmpfs. See commit b85a7a8bb573 ("tmpfs: disallow CONFIG_TMPFS_INODE64
on s390").
Therefore switch both s390 and alpha to 64-bit ino_t. This should only
have an effect on the ustat system call. To prevent ABI breakage,
define struct ustat compatible with the old layout and change
sys_ustat() accordingly.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Be nice and prune these upfront, in case the ring is being shared and
one of the tasks is going away. This is a bit more important now that
we account the allocations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have three different ones, put it in a helper for easy calling. This
is in preparation for doing it outside of ring freeing as well. With
that in mind, also ensure that we do the proper locking for safe calling
from a context where the ring is still live.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
No functional changes in this patch; it just allows a caller to pass in a
targeted task that we must match for freeing requests in the cache.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull cifs fixes from Steve French:
"Four small smb3 fixes to the new mount API (including a particularly
important one for DFS links).
These were found in testing this week of additional DFS scenarios, and
a user testing of an apache container problem"
* tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernel:
cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath.
cifs: In the new mount api we get the full devname as source=
cifs: do not disable noperm if multiuser mount option is not provided
cifs: fix dfs-links
|
|
Let's allow mounting a read-only partition. We can recover later, once we
have it back as read-write.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
By default, kernel threads have init_fs and init_files assigned. In the
past, this has triggered security problems, as commands that don't ask
for (and hence don't get assigned) fs/files from the originating task
can then attempt path resolution etc with access to parts of the system
they should not be able to.
Rather than add checks in the fs code for misuse, just set these to
NULL. If we do attempt to use them, then the resulting code will oops
rather than provide access to something that it should not permit.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
KASAN reports a BUG when downloading a file to a jffs2 filesystem. It is
because when dstlen == 1, jffs2_rtime_compress() writes cpage_out out of
bounds. Note that jffs2_zlib_compress() already refuses to compress data
whose length is less than 4.
[ 393.799778] BUG: KASAN: slab-out-of-bounds in jffs2_rtime_compress+0x214/0x2f0 at addr ffff800062e3b281
[ 393.809166] Write of size 1 by task tftp/2918
[ 393.813526] CPU: 3 PID: 2918 Comm: tftp Tainted: G B 4.9.115-rt93-EMBSYS-CGEL-6.1.R6-dirty #1
[ 393.823173] Hardware name: LS1043A RDB Board (DT)
[ 393.827870] Call trace:
[ 393.830322] [<ffff20000808c700>] dump_backtrace+0x0/0x2f0
[ 393.835721] [<ffff20000808ca04>] show_stack+0x14/0x20
[ 393.840774] [<ffff2000086ef700>] dump_stack+0x90/0xb0
[ 393.845829] [<ffff20000827b19c>] kasan_object_err+0x24/0x80
[ 393.851402] [<ffff20000827b404>] kasan_report_error+0x1b4/0x4d8
[ 393.857323] [<ffff20000827bae8>] kasan_report+0x38/0x40
[ 393.862548] [<ffff200008279d44>] __asan_store1+0x4c/0x58
[ 393.867859] [<ffff2000084ce2ec>] jffs2_rtime_compress+0x214/0x2f0
[ 393.873955] [<ffff2000084bb3b0>] jffs2_selected_compress+0x178/0x2a0
[ 393.880308] [<ffff2000084bb530>] jffs2_compress+0x58/0x478
[ 393.885796] [<ffff2000084c5b34>] jffs2_write_inode_range+0x13c/0x450
[ 393.892150] [<ffff2000084be0b8>] jffs2_write_end+0x2a8/0x4a0
[ 393.897811] [<ffff2000081f3008>] generic_perform_write+0x1c0/0x280
[ 393.903990] [<ffff2000081f5074>] __generic_file_write_iter+0x1c4/0x228
[ 393.910517] [<ffff2000081f5210>] generic_file_write_iter+0x138/0x288
[ 393.916870] [<ffff20000829ec1c>] __vfs_write+0x1b4/0x238
[ 393.922181] [<ffff20000829ff00>] vfs_write+0xd0/0x238
[ 393.927232] [<ffff2000082a1ba8>] SyS_write+0xa0/0x110
[ 393.932283] [<ffff20000808429c>] __sys_trace_return+0x0/0x4
[ 393.937851] Object at ffff800062e3b280, in cache kmalloc-64 size: 64
[ 393.944197] Allocated:
[ 393.946552] PID = 2918
[ 393.948913] save_stack_trace_tsk+0x0/0x220
[ 393.953096] save_stack_trace+0x18/0x20
[ 393.956932] kasan_kmalloc+0xd8/0x188
[ 393.960594] __kmalloc+0x144/0x238
[ 393.963994] jffs2_selected_compress+0x48/0x2a0
[ 393.968524] jffs2_compress+0x58/0x478
[ 393.972273] jffs2_write_inode_range+0x13c/0x450
[ 393.976889] jffs2_write_end+0x2a8/0x4a0
[ 393.980810] generic_perform_write+0x1c0/0x280
[ 393.985251] __generic_file_write_iter+0x1c4/0x228
[ 393.990040] generic_file_write_iter+0x138/0x288
[ 393.994655] __vfs_write+0x1b4/0x238
[ 393.998228] vfs_write+0xd0/0x238
[ 394.001543] SyS_write+0xa0/0x110
[ 394.004856] __sys_trace_return+0x0/0x4
[ 394.008684] Freed:
[ 394.010691] PID = 2918
[ 394.013051] save_stack_trace_tsk+0x0/0x220
[ 394.017233] save_stack_trace+0x18/0x20
[ 394.021069] kasan_slab_free+0x88/0x188
[ 394.024902] kfree+0x6c/0x1d8
[ 394.027868] jffs2_sum_write_sumnode+0x2c4/0x880
[ 394.032486] jffs2_do_reserve_space+0x198/0x598
[ 394.037016] jffs2_reserve_space+0x3f8/0x4d8
[ 394.041286] jffs2_write_inode_range+0xf0/0x450
[ 394.045816] jffs2_write_end+0x2a8/0x4a0
[ 394.049737] generic_perform_write+0x1c0/0x280
[ 394.054179] __generic_file_write_iter+0x1c4/0x228
[ 394.058968] generic_file_write_iter+0x138/0x288
[ 394.063583] __vfs_write+0x1b4/0x238
[ 394.067157] vfs_write+0xd0/0x238
[ 394.070470] SyS_write+0xa0/0x110
[ 394.073783] __sys_trace_return+0x0/0x4
[ 394.077612] Memory state around the buggy address:
[ 394.082404] ffff800062e3b180: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 394.089623] ffff800062e3b200: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 394.096842] >ffff800062e3b280: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 394.104056] ^
[ 394.107283] ffff800062e3b300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 394.114502] ffff800062e3b380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 394.121718] ==================================================================
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
An inode is allowed to have ubifs_xattr_max_cnt() xattrs, so we must
complain only when an inode has more xattrs, having exactly
ubifs_xattr_max_cnt() xattrs is fine.
With this the maximum number of xattrs can be created without hitting
the "has too many xattrs" warning when removing it.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
An earlier commit moved out some functions to not be inlined by gcc, but
after some other rework to remove one of those, clang started inlining
the other one and ran into the same problem as gcc did before:
fs/ubifs/replay.c:1174:5: error: stack frame size of 1152 bytes in function 'ubifs_replay_journal' [-Werror,-Wframe-larger-than=]
Mark the function as noinline_for_stack to ensure it doesn't happen
again.
Fixes: f80df3851246 ("ubifs: use crypto_shash_tfm_digest()")
Fixes: eb66eff6636d ("ubifs: replay: Fix high stack usage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
When crypto_shash_digestsize() fails, c->hmac_tfm
has not been freed before returning, which leads
to a memory leak.
Fixes: 49525e5eecca5 ("ubifs: Add helper functions for authentication support")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
clang static analysis reports this problem
fs/jffs2/summary.c:794:31: warning: Use of memory after it is freed
c->summary->sum_list_head = temp->u.next;
^~~~~~~~~~~~
In jffs2_sum_write_data(), summary data is handled in a loop, one node at
a time. Once a node has been written out, it is removed from the summary
list and deleted. In the corner case when a
JFFS2_FEATURE_RWCOMPAT_COPY node is seen, a call is made to
jffs2_sum_disable_collecting(). jffs2_sum_disable_collecting() deletes
the whole list, which conflicts with the loop deleting the list piece by piece.
To preserve the old behavior of stopping the write midway, bail out of
the loop after disabling summary collection.
Fixes: 6171586a7ae5 ("[JFFS2] Correct handling of JFFS2_FEATURE_RWCOMPAT_COPY nodes.")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This collects all of them together and makes it possible to
e.g. exclude it from slub debugging.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Pull io_uring fix from Jens Axboe:
"Revert of a patch from this release that caused a regression"
* tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-block:
Revert "io_uring: don't take fs for recvmsg/sendmsg"
|
|
Invalid req->flags are tolerated well by free/put, so avoid needlessly
presetting the flags to zero only to later modify them with "|=" rather
than simply setting them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_req_find_next() is called, indirectly, for every request, so optimise
the check by testing flags, as was done long before. __io_req_find_next()
tolerates false positives well (i.e. link==NULL), and those should be
really rare.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_sq_thread_acquire_mm_files() can find a PF_EXITING task only when
it's called from task_work context. Don't check it in all other cases,
i.e. when we're in io_uring_enter().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.12
- Make the nVHE EL2 object relocatable, resulting in much more
maintainable code
- Handle concurrent translation faults hitting the same page
in a more elegant way
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Allow the disabling of symbol export from assembly code
- Simplification of the early init hypercall handling
|
|
User reported that btrfs-progs misc-tests/028-superblock-recover fails:
[TEST/misc] 028-superblock-recover
unexpected success: mounted fs with corrupted superblock
test failed for case 028-superblock-recover
The test case expects that a broken image with a bad superblock will
fail to mount. However, the test image passed the csum check of the
superblock and was successfully mounted.
Commit 55fc29bed8dd ("btrfs: use cached value of fs_info::csum_size
everywhere") replaced all calls to btrfs_super_csum_size with
fs_info::csum_size, including one place where fs_info->csum_size
is not yet initialized. So btrfs_check_super_csum() passes because memcmp()
with len 0 always returns 0.
Fix it by caching csum size in btrfs_fs_info::csum_size once we know the
csum type in superblock is valid in open_ctree().
Link: https://github.com/kdave/btrfs-progs/issues/250
Fixes: 55fc29bed8dd ("btrfs: use cached value of fs_info::csum_size everywhere")
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Remove io_consume_sqe() and inline it back into io_get_sqe(). It
requires req dealloc on error, but in exchange we get cleaner
io_submit_sqes() and better locality for cached_sq_head.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Do a little trick in io_ring_ctx_free(): briefly take uring_lock, which
waits for everyone currently holding it. That way we can skip pinning ctx
with ctx->refs for __io_req_task_submit(), which is executed and releases
its refs/reqs while holding the lock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't hand code io_req_task_queue() inside of io_async_buf_func(), just
call it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are two reasons for this. The first is to optimise
io_sq_thread_acquire_mm_files() for the non-SQPOLL case, which currently does
too many checks and function calls in the hot path, e.g. in
io_init_req().
The second is to not grab mm/files when they are not needed. As
__io_queue_sqe() issues only one request now, we can reuse
io_sq_thread_acquire_mm_files() instead of unconditionally acquiring
mm/files.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__io_queue_sqe() tries to issue as many requests of a link as it can,
and uses io_put_req_find_next() to extract the next one, targeting inline
completed requests. Now that __io_queue_sqe() is always used together with
struct io_comp_state, next propagation is left with only a small window,
and only for async reqs, which doesn't justify its existence.
Remove it and make __io_queue_sqe() issue only the head request. This
simplifies the code and will allow other optimisations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Completion and submission states are now coupled together, so it's weird
to get one from an argument and the other from ctx; do it consistently for
io_req_free_batch(). It's also faster, as we already have @state cached
in registers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__io_complete_rw() casts the request to a kiocb only for it to be immediately
container_of()'ed by io_complete_rw_common(). The latter function's name
also doesn't do a great job of illuminating its purpose, so just inline it
into its only user.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We pass the return code into io_rw_reissue() only to be able to check if it's
-EAGAIN. That's not the cleanest approach and may prevent inlining of the
non-EAGAIN fast path, so do it at call sites.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't stash -EAGAIN'ed iopoll requests into a list to reissue them later;
do it eagerly. This removes the overhead of keeping and checking that list,
and, in case of failure, allows these requests to be completed through the
normal iopoll completion path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_req_free_batch_finish() is final and does not permit struct req_batch
to be reused without re-init. To be more consistent don't clear ->task
there.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|