In __ubifs_node_verify_hmac(), 'hmac' is allocated through kmalloc().
However, it is not deallocated in the following execution if
ubifs_node_calc_hmac() fails, leading to a memory leak bug. To fix this
issue, free 'hmac' before returning the error.
Fixes: 49525e5eecca ("ubifs: Add helper functions for authentication support")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Richard Weinberger <richard@nod.at>
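The leak fix follows the usual single-exit cleanup pattern. Here is a minimal userspace sketch. It assumes malloc()/free() stand in for kmalloc()/kfree(). calc_hmac(), verify_hmac() and the live_allocs counter are hypothetical names used only for illustration, not the UBIFS functions.

```c
#include <stdlib.h>
#include <string.h>

#define HMAC_LEN 32

/* Hypothetical stand-in for ubifs_node_calc_hmac(): fails when asked to. */
static int calc_hmac(const void *node, unsigned char *out, int fail)
{
	(void)node;
	if (fail)
		return -5; /* -EIO */
	memset(out, 0xab, HMAC_LEN);
	return 0;
}

/* Counts live allocations so that a leak is observable in this sketch. */
static int live_allocs;

static int verify_hmac(const void *node, const unsigned char *expected, int fail)
{
	unsigned char *hmac;
	int err;

	hmac = malloc(HMAC_LEN);
	if (!hmac)
		return -12; /* -ENOMEM */
	live_allocs++;

	err = calc_hmac(node, hmac, fail);
	if (err)
		goto out; /* the buggy version returned here and leaked hmac */

	err = memcmp(hmac, expected, HMAC_LEN) ? -1 : 0; /* mismatch -> error */
out:
	free(hmac);
	live_allocs--;
	return err;
}
```

The same shape applies to the read_znode() fix that follows: every exit after the allocation goes through the label that frees the buffer.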
|
|
In read_znode(), the indexing node 'idx' is allocated by kmalloc().
However, it is not deallocated in the following execution if
ubifs_node_check_hash() fails, leading to a memory leak bug. To fix this
issue, free 'idx' before returning the error.
Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The pointer fname is being assigned a value that is never
read because the function returns after the assignment. The assignment
is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7.
This is sad, and done for all the wrong reasons. Because that commit is
good, and does exactly what it says: avoids a lot of small disk requests
for the inode table read-ahead.
However, it turns out that it causes an entirely unrelated problem: the
getrandom() system call was introduced back in 2014 by commit
c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people
use it as a convenient source of good random numbers.
But part of the current semantics for getrandom() is that it waits for
the entropy pool to fill at least partially (unlike /dev/urandom). And
at least ArchLinux apparently has a systemd that uses getrandom() at
boot time, and the improvements in IO patterns means that existing
installations suddenly start hanging, waiting for entropy that will
never happen.
It seems to be an unlucky combination of not _quite_ enough entropy,
together with a particular systemd version and configuration. Lennart
says that the systemd-random-seed process (which is what does this early
access) is supposed to not block any other boot activity, but sadly that
doesn't actually seem to be the case (possibly due to bogus dependencies on
cryptsetup for encrypted swapspace).
The correct fix is to fix getrandom() to not block when it's not
appropriate, but that fix is going to take a lot more discussion. Do we
just make it act like /dev/urandom by default, and add a new flag for
"wait for entropy"? Do we add a boot-time option? Or do we just limit
the amount of time it will wait for entropy?
So in the meantime, we do the revert to give us time to discuss the
eventual fix for the fundamental problem, at which point we can re-apply
the ext4 inode table access optimization.
Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
No point reinventing it (with a wrong ->read(), BTW).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Some workloads can require far more than 4K outstanding entries. For
example memcached can have ~300K sockets over ~40 cores. Bumping the max
to 32K seems to work pretty well.
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To resolve dependencies in the following patches
mlx5_ib.h conflict resolved by keeping both hunks
Linux 5.3-rc8
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Here are two fixes, one of them urgent fixing a bug introduced in 5.2
and reported by many users. It took time to identify the root cause,
catching the 5.3 release is highly desired also to push the fix to 5.2
stable tree.
The bug is a mess-up of return values after adding proper error
handling and honestly the kind of bug that can cause sleeping
disorders until it's caught. My apologies to everybody who was
affected.
Summary of what could happen:
1) either a hang when committing a transaction, if this happens
there's no risk of corruption, still the hang is very inconvenient
and can't be resolved without a reboot
2) writeback for some btree nodes may never be started and we end up
committing a transaction without noticing that, this is really
serious and that will lead to the "parent transid verify failed"
messages"
* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
Btrfs: fix assertion failure during fsync and use of stale transaction
|
|
Make fs_parse() handle fs_param_is_fd-type parameters that are passed a
string by converting it to an integer (in addition to handling direct fd
specification).
Also range check the integer.
[fix from Yin Fengwei folded]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
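This is a userspace sketch of converting a string-valued fd parameter to an integer with a range check. parse_fd_string() is a hypothetical helper, not the fs_parse() API. The INT_MAX upper bound is an assumption about what the range check enforces.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn an "fd=" value given as a string into a file
 * descriptor number, rejecting signs, trailing junk and out-of-range values.
 */
static int parse_fd_string(const char *s, int *fd_out)
{
	char *end;
	unsigned long v;

	/* strtoul() would silently accept "-1", so insist on a leading digit. */
	if (!s || *s < '0' || *s > '9')
		return -EINVAL;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (*end != '\0')
		return -EINVAL; /* trailing garbage such as "12abc" */
	if (errno == ERANGE || v > INT_MAX)
		return -EINVAL; /* range check: must fit in an int */

	*fd_out = (int)v;
	return 0;
}
```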
|
|
Convert the ramfs, shmem, tmpfs, devtmpfs and rootfs filesystems to the new
internal mount API as the old one will be obsoleted and removed. This
allows greater flexibility in communication of mount parameters between
userspace, the VFS and the filesystem.
See Documentation/filesystems/mount_api.txt for more information.
Note that tmpfs is slightly tricky as it can contain embedded commas, so it
can't be trivially split up using strsep() to break on commas in
generic_parse_monolithic(). Instead, tmpfs has to supply its own generic
parser.
However, if tmpfs changes, then devtmpfs and rootfs, which are wrappers
around tmpfs or ramfs, must change too - and thus so must ramfs, so these
had to be converted also.
[AV: rewritten]
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hugh Dickins <hughd@google.com>
cc: linux-mm@kvack.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
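A plain comma split fails for tmpfs when an option value itself contains commas, as a memory-policy nodelist such as "mpol=bind:0,2" can. The following sketch assumes that a comma immediately followed by a digit belongs to the previous option. split_options() is a hypothetical name, and this illustrates the idea rather than reproducing the actual tmpfs parser.

```c
#include <ctype.h>
#include <string.h>

#define MAX_OPTS 16

/*
 * Hypothetical splitter: break a mount-option string on commas, except that
 * a comma followed by a digit is treated as part of the current option
 * (e.g. the nodelist in "mpol=bind:0,2").  Modifies 'buf' in place and
 * returns the number of options stored in 'opts'.
 */
static int split_options(char *buf, char *opts[], int max)
{
	int n = 0;
	char *p = buf;

	while (p && *p && n < max) {
		char *start = p;
		char *comma = strchr(p, ',');

		/* Skip commas that continue a numeric list. */
		while (comma && isdigit((unsigned char)comma[1]))
			comma = strchr(comma + 1, ',');

		if (comma) {
			*comma = '\0';
			p = comma + 1;
		} else {
			p = NULL;
		}
		opts[n++] = start;
	}
	return n;
}
```

With strsep() on "," the same input would yield a stray "2" option and a truncated "mpol=bind:0".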
|
|
The way the logic is set up in io_uring_enter() means that you can't wake
up the SQ poller thread while at the same time waiting (or polling) for
completions afterwards. There's no reason for that to be the case.
Reported-by: Lewis Baker <lbaker@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently merge async work items if we see a strict sequential hit.
This helps avoid unnecessary workqueue switches when we don't need
them. We can extend this merging to cover cases where it's not a strict
sequential hit, but the IO still fits within the same page. If an
application is doing multiple requests within the same page, we don't
want separate workers waiting on the same page to complete IO. It's much
faster to let the first worker bring in the page, then operate on that
page from the same worker to complete the next request(s).
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
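Here is a sketch of the extended merge test, assuming 4 KiB pages and a hypothetical can_merge() helper. A new request is merged if it is a strict sequential hit, or if it starts inside the page-aligned range covered by the previous request. The real io_uring check may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */

/*
 * Hypothetical merge test: the previous request covered
 * [last_off, last_off + last_len); may a request at 'off' be handed to the
 * same worker?
 */
static bool can_merge(uint64_t last_off, uint64_t last_len, uint64_t off)
{
	uint64_t start = last_off & ~(SKETCH_PAGE_SIZE - 1);
	uint64_t end = (last_off + last_len + SKETCH_PAGE_SIZE - 1) &
		       ~(SKETCH_PAGE_SIZE - 1);

	if (off == last_off + last_len)
		return true; /* strict sequential hit (the old rule) */
	return off >= start && off < end; /* same page(s) as the last IO */
}
```

For example, two 512-byte reads at offsets 0 and 2048 share a page, so the second one piggybacks on the worker that is already bringing that page in.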
|
|
Variable err is initialized to a value that is never read and it
is re-assigned later. The initialization is redundant and can
be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
|
|
This patch adds the missing leading zero to the mode 755 specification,
which is required to express it in the octal numeral system.
Reported-by: Łukasz Wrochna <l.wrochna@samsung.com>
Signed-off-by: Artur Świgoń <a.swigon@partner.samsung.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
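In C, a leading 0 makes an integer literal octal, so dropping it silently produces a very different mode. The names mode_wrong, mode_right and mode_symbolic are illustrative.

```c
#include <sys/stat.h>

/*
 * Without the leading zero, 755 is a decimal literal.  755 decimal is 01363
 * octal: the sticky bit (01000) plus permissions 0363 (-wxrw--wx).
 */
static const unsigned int mode_wrong = 755;

/* With the leading zero it is octal 0755 (rwxr-xr-x), i.e. 493 decimal. */
static const unsigned int mode_right = 0755;

/* The same mode spelled with the symbolic constants from <sys/stat.h>. */
static const unsigned int mode_symbolic =
	S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
```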
|
|
virtio-fs does not support aborting requests which are being
processed, that is, requests which have been sent to the fuse daemon on the host.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Allow virtio-fs to also send DESTROY request.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Don't hold onto a dentry in the LRU list if we need to look it up again
anyway at the next access. Only do this if explicitly enabled, otherwise it
could result in a performance regression.
A more advanced version of this patch would periodically flush out dentries
from the LRU which have gone stale.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
As of now fuse_dev_alloc() both allocates a fuse device and installs it in
the fuse_conn list. fuse_dev_alloc() can fail if allocating the fuse device fails.
virtio-fs needs to initialize multiple fuse devices (one per virtio queue).
It initializes one fuse device as part of the call to fuse_fill_super_common()
and the rest of the devices are allocated and installed after that.
But we can't afford to fail after calling fuse_fill_super_common(), as we
don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.
This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
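The allocate-then-install split can be sketched in userspace C. struct dev, struct conn, dev_alloc(), dev_install() and setup_queues() are hypothetical stand-ins for the fuse structures and helpers. The point is that everything that can fail happens before the step that cannot be undone.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct fuse_dev / struct fuse_conn. */
struct dev {
	struct dev *next;
	int id;
};

struct conn {
	struct dev *devices; /* list of installed devices */
	int ndevs;
};

/* Allocation only: may fail, but nothing is visible to the connection yet. */
static struct dev *dev_alloc(int id)
{
	struct dev *d = calloc(1, sizeof(*d));

	if (d)
		d->id = id;
	return d;
}

/* Installation only: cannot fail. */
static void dev_install(struct conn *c, struct dev *d)
{
	d->next = c->devices;
	c->devices = d;
	c->ndevs++;
}

static int setup_queues(struct conn *c, int nqueues)
{
	struct dev **devs = calloc(nqueues, sizeof(*devs));
	int i;

	if (!devs)
		return -1;
	for (i = 0; i < nqueues; i++) {
		devs[i] = dev_alloc(i);
		if (!devs[i])
			goto fail; /* nothing installed yet: trivial to undo */
	}

	/* ...the point of no return (fuse_fill_super_common()) goes here... */

	for (i = 0; i < nqueues; i++)
		dev_install(c, devs[i]);
	free(devs);
	return 0;
fail:
	while (i--)
		free(devs[i]);
	free(devs);
	return -1;
}
```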
|
|
The /dev/fuse device uses fiq->waitq and fasync to signal that requests are
available. These mechanisms do not apply to virtio-fs. This patch
introduces callbacks so alternative behavior can be used.
Note that queue_interrupt() changes along these lines:
spin_lock(&fiq->waitq.lock);
wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock, this should be safe.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
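This sketch shows pluggable notification callbacks. The names struct iqueue, struct iqueue_ops, queue_request() and queue_forget() are hypothetical, and the real fuse callback structure may be laid out differently. Generic queueing code calls through the ops table, and each transport decides how to signal that work is available.

```c
/* Hypothetical input queue whose "new work" notification is pluggable. */
struct iqueue;

enum notifier { IQ_NONE, IQ_DEVFUSE, IQ_VIRTIO };

struct iqueue_ops {
	void (*wake_pending)(struct iqueue *q); /* a request was queued */
	void (*wake_forget)(struct iqueue *q);  /* a FORGET was queued */
};

struct iqueue {
	const struct iqueue_ops *ops;
	int pending;
	enum notifier notified_by; /* records which backend was woken */
};

/* Generic code: queue the item, then let the backend decide how to signal. */
static void queue_request(struct iqueue *q)
{
	q->pending++;
	q->ops->wake_pending(q);
}

static void queue_forget(struct iqueue *q)
{
	q->pending++;
	q->ops->wake_forget(q);
}

/* /dev/fuse flavour: real code would wake fiq->waitq and call kill_fasync(). */
static void devfuse_wake(struct iqueue *q) { q->notified_by = IQ_DEVFUSE; }

/* virtio-fs flavour (assumption): hand the work to a virtqueue instead. */
static void virtio_wake(struct iqueue *q) { q->notified_by = IQ_VIRTIO; }

static const struct iqueue_ops devfuse_ops = { devfuse_wake, devfuse_wake };
static const struct iqueue_ops virtio_ops = { virtio_wake, virtio_wake };
```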
|
|
fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file. In virtio-fs there is no file
descriptor because /dev/fuse is not used.
This patch extracts fuse_fill_super_common() so that both classic fuse and
virtio-fs can share the code to initialize a mount.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
File systems like virtio-fs do not need to play directly with the forget
list data structures. There is a helper function; use that instead.
Rename dequeue_forget() to fuse_dequeue_forget() and export it so that
stacked filesystems can use it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
virtio-fs will need unique IDs for FORGET requests from outside
fs/fuse/dev.c. Make the symbol visible.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This will be used by virtio-fs to send the init request to the fuse server
after the virtqueues have been initialized.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
virtio-fs will need to query the length of fuse_arg lists. Make the symbol
visible.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
virtio-fs will need to complete requests from outside fs/fuse/dev.c. Make
the symbol visible.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The size of struct fuse_req was reduced from 392B to 144B on a non-debug
config, thus the sanitize_global_limit() helper was setting a larger
default limit. This doesn't really reflect a reduction in the memory used by
requests, since the fields removed from fuse_req were added to fuse_args
derived structs; e.g. sizeof(struct fuse_writepages_args) is 248B, thus
resulting in slightly more memory being used for writepage requests
overall (due to using 256B slabs).
Make the calculation ignore the size of fuse_req and use the old 392B
value.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
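Here is a worked version of the arithmetic. It assumes the default budget is 1/2^13 of total RAM divided by the per-request size, capped below 2^16. default_limit() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Sketch of a default-limit calculation (assumed formula): spend 1/2^13 of
 * total RAM on requests of 'req_size' bytes each, capped below 2^16.
 */
static unsigned int default_limit(uint64_t total_ram_bytes, unsigned int req_size)
{
	uint64_t limit = (total_ram_bytes >> 13) / req_size;

	if (limit >= (1u << 16))
		limit = (1u << 16) - 1;
	return (unsigned int)limit;
}
```

On an 8 GiB machine the budget is 1 MiB, which gives 2674 requests at 392 B but 7281 at 144 B. Keeping the old 392 B figure therefore avoids a roughly 2.7x jump in the default that the actual memory use would not justify.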
|
|
lock_extent_buffer_for_io() returns 1 to tell the caller that everything
went fine and that it needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.
When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 with 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, in which case flush_write_bio() returns 0 (success), or it might have
a bio for another extent buffer with a lower index (logical address).
Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such a hang is reported with a trace like the following:
[49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
[49887.347059] Not tainted 5.2.13-gentoo #2
[49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[49887.347062] btrfs-transacti D 0 1752 2 0x80004000
[49887.347064] Call Trace:
[49887.347069] ? __schedule+0x265/0x830
[49887.347071] ? bit_wait+0x50/0x50
[49887.347072] ? bit_wait+0x50/0x50
[49887.347074] schedule+0x24/0x90
[49887.347075] io_schedule+0x3c/0x60
[49887.347077] bit_wait_io+0x8/0x50
[49887.347079] __wait_on_bit+0x6c/0x80
[49887.347081] ? __lock_release.isra.29+0x155/0x2d0
[49887.347083] out_of_line_wait_on_bit+0x7b/0x80
[49887.347084] ? var_wake_function+0x20/0x20
[49887.347087] lock_extent_buffer_for_io+0x28c/0x390
[49887.347089] btree_write_cache_pages+0x18e/0x340
[49887.347091] do_writepages+0x29/0xb0
[49887.347093] ? kmem_cache_free+0x132/0x160
[49887.347095] ? convert_extent_bit+0x544/0x680
[49887.347097] filemap_fdatawrite_range+0x70/0x90
[49887.347099] btrfs_write_marked_extents+0x53/0x120
[49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
[49887.347102] btrfs_commit_transaction+0x6bb/0x990
[49887.347103] ? start_transaction+0x33e/0x500
[49887.347105] transaction_kthread+0x139/0x15c
So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise any future
attempts to write back the extent buffer will hang, and we need to undo
all work done before (set back EXTENT_BUFFER_DIRTY, etc).
This is a regression introduced in the 5.2 kernel.
Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
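Here is a simplified model of the bug and the fix, using hypothetical names. struct eb reduces the extent buffer to its two flag bits, and flush_ret stands in for the result of flush_write_bio().

```c
#include <stdbool.h>

/* Hypothetical extent buffer reduced to the two flags that matter here. */
struct eb {
	bool writeback; /* models EXTENT_BUFFER_WRITEBACK */
	bool dirty;     /* models EXTENT_BUFFER_DIRTY */
};

/*
 * Return convention from the text: 1 = caller must start writeback,
 * 0 = nothing to do, < 0 = error.  'trylock_fails' models a page trylock
 * failing and 'flush_ret' the result of flush_write_bio().
 */
static int lock_for_io_buggy(struct eb *eb, bool trylock_fails, int flush_ret)
{
	int ret = 1;

	eb->dirty = false;
	eb->writeback = true;
	if (trylock_fails)
		ret = flush_ret; /* 1 becomes 0: writeback is never started */
	return ret;
}

static int lock_for_io_fixed(struct eb *eb, bool trylock_fails, int flush_ret)
{
	int ret = 1;

	eb->dirty = false;
	eb->writeback = true;
	if (trylock_fails) {
		int err = flush_ret; /* kept separate from ret */

		if (err < 0) {
			/* Undo, so later writeback attempts do not wait forever. */
			eb->writeback = false;
			eb->dirty = true;
			return err;
		}
		/* On success, go on to lock the pages; ret stays 1. */
	}
	return ret;
}
```

In the buggy version, the writeback bit stays set while the caller is told not to start writeback. That is the state the hung task in the trace is waiting on.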
|
|
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handler, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction is unpredictable: we can have a
use-after-free on the transaction handle itself, since the transaction
commit will call kmem_cache_free() against the handle regardless of its
->use_count value, we can silently lose all the updates to the log tree
made after that iput() in the logging path, or we can end up using a
transaction handle that in the meantime was allocated to another task
for a new transaction, etc.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode; that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which led to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
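Here is the delayed-iput idea reduced to a userspace sketch with hypothetical names. struct inode is a toy, and add_delayed_iput()/run_delayed_iputs() only mimic the behaviour described above: the logging path never drops the last reference itself, so eviction cannot run inside its transaction.

```c
#include <stddef.h>

/* Toy inode: a reference count, a link count and an eviction marker. */
struct inode {
	int refcount;
	int nlink;
	int evicted;
	struct inode *delayed_next;
};

static struct inode *delayed_list; /* drained later by a "cleaner" */

static void evict(struct inode *inode)
{
	inode->evicted = 1; /* would join a transaction and drop items */
}

/* Immediate put: the final reference runs eviction in the caller. */
static void iput(struct inode *inode)
{
	if (--inode->refcount == 0 && inode->nlink == 0)
		evict(inode);
}

/* Deferred put: drop a non-final reference now, queue the final one. */
static void add_delayed_iput(struct inode *inode)
{
	if (inode->refcount > 1) {
		inode->refcount--;
		return;
	}
	inode->delayed_next = delayed_list;
	delayed_list = inode;
}

/* Runs from a safe context with no transaction handle attached. */
static void run_delayed_iputs(void)
{
	while (delayed_list) {
		struct inode *inode = delayed_list;

		delayed_list = inode->delayed_next;
		iput(inode);
	}
}
```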
|
|
When filtering the xattr list for reading, the presence of a trusted xattr
results in a security audit log. However, if there is other content
no errno will be set, and if there isn't, the errno will be -ENODATA
and not -EPERM as is usually associated with a lack of capability.
The check does not block the request to list the xattrs present.
Switch to ns_capable_noaudit to reflect a more appropriate check.
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: stable@vger.kernel.org # v3.18+
Fixes: a082c6f680da ("ovl: filter trusted xattr for non-admin")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If ovl_encode_real_fh() fails, no memory was allocated,
and the error in the error-valued pointer should be returned.
Fixes: 9b6faee07470 ("ovl: check ERR_PTR() return value from ovl_encode_fh()")
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
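Userspace re-creations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers make the fix concrete. encode_real_fh() and encode_fh() are hypothetical stand-ins for the overlayfs functions. An error pointer carries the errno in the pointer value itself, so it must be decoded and propagated, never freed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for ovl_encode_real_fh(). */
static void *encode_real_fh(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : malloc(16);
}

static int encode_fh(int fail)
{
	void *fh = encode_real_fh(fail);

	if (IS_ERR(fh))
		return PTR_ERR(fh); /* nothing was allocated: just propagate */
	free(fh);
	return 0;
}
```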
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Get rid of the callback, deal with that and dentry in callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Lots of duplicated code that benefits from a little consolidation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Configfs abuses symlink(2). Unlike the normal filesystems, it
wants the target resolved at symlink(2) time, like link(2) would've
done. The problem is that ->symlink() is called with the parent
directory locked exclusive, so resolving the target inside the
->symlink() is easily deadlocked.
Short of really ugly games in sys_symlink() itself, all we can
do is to unlock the parent before resolving the target and
relock it after. However, that invalidates the checks done
by the caller of ->symlink(), so we have to
* check that dentry is still where it used to be
(it couldn't have been moved, but it could've been unhashed)
* recheck that it's still negative (somebody else
might've successfully created a symlink with the same name
while we were looking the target up)
* recheck the permissions on the parent directory.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
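The unlock, resolve, relock and recheck sequence can be modelled in plain C. All names are hypothetical, the lock is a flag rather than a real mutex, and the errno values are illustrative. A race callback simulates what another task could do while the parent is unlocked.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parent directory: 'locked' models the exclusive lock. */
struct dir {
	bool locked;
	bool writable; /* models the permission check on the parent */
};

/* Hypothetical dentry for the name being created. */
struct dentry {
	bool hashed;   /* still reachable under the parent */
	bool negative; /* no inode attached yet */
};

/* Simulates what another task might do while the parent is unlocked. */
typedef void (*race_fn)(struct dir *parent, struct dentry *d);

static void race_someone_created_it(struct dir *p, struct dentry *d)
{
	(void)p;
	d->negative = false;
}

static void race_unhashed(struct dir *p, struct dentry *d)
{
	(void)p;
	d->hashed = false;
}

/* Called with parent->locked held, as ->symlink() would be. */
static int symlink_resolving_target(struct dir *parent, struct dentry *d,
				    race_fn race)
{
	/* Drop the lock so that resolving the target cannot deadlock on it. */
	parent->locked = false;
	if (race)
		race(parent, d); /* the target lookup would happen here */
	parent->locked = true;

	/* Redo every check the caller made before the lock was dropped. */
	if (!d->hashed)
		return -ENOENT;
	if (!d->negative)
		return -EEXIST;
	if (!parent->writable)
		return -EACCES;
	return 0; /* safe to create the symlink */
}
```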
|
|
All the popular filesystems need to grab the inode lock for buffered
writes. With io_uring punting buffered writes to async context, we
observe a lot of contention with all workers hammering this mutex.
For buffered writes, we generally don't need a lot of parallelism on
the submission side, as the flushing will take care of that for us.
Hence we don't need a deep queue on the write side, as long as we
can safely punt from the original submission context.
Add a workqueue with a limit of 2 that we can use for buffered writes.
This greatly improves the performance and efficiency of higher queue
depth buffered async writes with io_uring.
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a helper for queueing a request for async execution, in preparation
for optimizing it.
No functional change in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The page array pointers are also duplicated across fuse_args_pages and
fuse_req. Get rid of the fuse_req ones.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
No need to duplicate the argument arrays in fuse_req, so just dereference
req->args instead of copying to the fuse_req internal ones.
This allows further cleanup of the fuse_req structure.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Get rid of request specific fields in fuse_req that are not used anymore.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Page arrays are not allocated together with the request anymore. Get rid
of the dead code.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
All requests are now sent with one of the fuse_simple_... helpers. Get rid
of the old api from the fuse internal header.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Rename fuse_request_send_notify_reply() to fuse_simple_notify_reply() and
convert to passing fuse_args instead of fuse_req.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Since we cannot reserve the request structure up-front, make sure that the
request allocation doesn't fail using __GFP_NOFAIL.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This is a straightforward conversion.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Bypass the fc->initialized check by setting the force flag.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Derive fuse_writepage_args from fuse_io_args.
Sending the request is tricky since it was done with fi->lock held, hence
we must either use atomic allocation or release the lock. Both are
possible so try atomic first and if it fails, release the lock and do the
regular allocation with GFP_NOFS and __GFP_NOFAIL. Both flags are
necessary for correct operation.
Move the page realloc function from dev.c to file.c and convert to using
fuse_writepage_args.
The last caller of fuse_write_fill() is gone, so get rid of it.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
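This sketch shows the "try atomic first, else drop the lock" allocation strategy. A flag models the spinlock and a counter forces the atomic path to fail. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a spinlock such as fi->lock. */
struct lock {
	bool held;
};

/* Test knob: how many upcoming atomic allocations should fail. */
static int atomic_failures;

/* Models a GFP_ATOMIC allocation: legal under a spinlock, but may fail. */
static void *alloc_atomic(size_t n)
{
	if (atomic_failures > 0) {
		atomic_failures--;
		return NULL;
	}
	return malloc(n);
}

/*
 * Models GFP_NOFS | __GFP_NOFAIL: may sleep, so it must never run with the
 * spinlock held.  (In the kernel __GFP_NOFAIL does not return NULL; plain
 * malloc() gives no such guarantee.)
 */
static void *alloc_may_sleep(const struct lock *l, size_t n)
{
	if (l->held)
		return NULL; /* would be a sleep-in-atomic bug */
	return malloc(n);
}

/* Try the cheap path first; otherwise drop the lock around the slow path. */
static void *alloc_under_lock(struct lock *l, size_t n)
{
	void *p = alloc_atomic(n);

	if (p)
		return p;

	l->held = false;
	p = alloc_may_sleep(l, n);
	l->held = true; /* state guarded by the lock must be rechecked now */
	return p;
}
```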
|
|
The old fuse_read_fill() helper can be deleted, now that the last user is
gone.
The fuse_io_args struct is moved to fuse_i.h so it can be shared between
readdir/read code.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Need to extend fuse_io_args with 'attr_ver' and 'ff' members, which take over
the functionality of the same-named members in fuse_req.
fuse_short_read() can now take struct fuse_args_pages.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Change of semantics in fuse_async_req_send/fuse_send_(read|write): these
can now return error, in which case the 'end' callback isn't called, so the
fuse_io_args object needs to be freed.
Added verification that the return value is sane (less than or equal to the
requested read/write size).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Create a helper named fuse_simple_background() that is similar to
fuse_simple_request(). Unlike the latter, it returns immediately and calls
the supplied 'end' callback when the reply is received.
The supplied 'args' pointer is stored in 'fuse_req' which allows the
callback to interpret the output arguments decoded from the reply.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
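This sketch shows the background-request shape: the call queues the request and returns immediately, and a completion callback later receives the args with the decoded reply. struct args, struct req, simple_background(), deliver_reply() and my_end() are hypothetical simplifications, not the fuse API.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified shapes of fuse_args / fuse_req. */
struct args;
typedef void (*end_fn)(struct args *args, int error);

struct args {
	int in_value;  /* request payload */
	int out_value; /* filled in from the reply */
	end_fn end;    /* completion callback */
};

struct req {
	struct args *args; /* kept so the reply can be decoded into it */
	struct req *next;
};

static struct req *in_flight; /* requests awaiting a reply */

/* Queue the request and return at once; 'end' runs when the reply arrives. */
static int simple_background(struct args *args)
{
	struct req *r = malloc(sizeof(*r));

	if (!r)
		return -12; /* -ENOMEM: 'end' will not be called */
	r->args = args;
	r->next = in_flight;
	in_flight = r;
	return 0;
}

/* Simulated reply path: decode output into req->args, then call 'end'. */
static void deliver_reply(int reply_value, int error)
{
	struct req *r = in_flight;

	if (!r)
		return;
	in_flight = r->next;
	r->args->out_value = reply_value;
	r->args->end(r->args, error);
	free(r);
}

/* Example callback: interprets the output argument decoded from the reply. */
static int completed;
static void my_end(struct args *args, int error)
{
	if (!error)
		completed = args->out_value;
}
```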
|