Age | Commit message (Collapse) | Author |
|
The page array pointers are also duplicated across fuse_args_pages and
fuse_req. Get rid of the fuse_req ones.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
No need to duplicate the argument arrays in fuse_req, so just dereference
req->args instead of copying to the fuse_req internal ones.
This allows further cleanup of the fuse_req structure.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Get rid of request specific fields in fuse_req that are not used anymore.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Page arrays are not allocated together with the request anymore. Get rid
of the dead code
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
All requests are now sent with one of the fuse_simple_... helpers. Get rid
of the old api from the fuse internal header.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Rename fuse_request_send_notify_reply() to fuse_simple_notify_reply() and
convert to passing fuse_args instead of fuse_req.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Since we cannot reserve the request structure up-front, make sure that the
request allocation doesn't fail using __GFP_NOFAIL.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This is a straightforward conversion.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Bypass the fc->initialized check by setting the force flag.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Derive fuse_writepage_args from fuse_io_args.
Sending the request is tricky since it was done with fi->lock held, hence
we must either use atomic allocation or release the lock. Both are
possible so try atomic first and if it fails, release the lock and do the
regular allocation with GFP_NOFS and __GFP_NOFAIL. Both flags are
necessary for correct operation.
Move the page realloc function from dev.c to file.c and convert to using
fuse_writepage_args.
The last caller of fuse_write_fill() is gone, so get rid of it.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The old fuse_read_fill() helper can be deleted, now that the last user is
gone.
The fuse_io_args struct is moved to fuse_i.h so it can be shared between
readdir/read code.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the
functionality of the same named members in fuse_req.
fuse_short_read() can now take struct fuse_args_pages.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Change of semantics in fuse_async_req_send/fuse_send_(read|write): these
can now return error, in which case the 'end' callback isn't called, so the
fuse_io_args object needs to be freed.
Added verification that the return value is sane (less than or equal to the
requested read/write size).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Create a helper named fuse_simple_background() that is similar to
fuse_simple_request(). Unlike the latter, it returns immediately and calls
the supplied 'end' callback when the reply is received.
The supplied 'args' pointer is stored in 'fuse_req' which allows the
callback to interpret the output arguments decoded from the reply.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Extract a fuse_write_flags() helper that converts ki_flags relevant write
to open flags.
The other parts of fuse_send_write() aren't used in the
fuse_perform_write() case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Derive fuse_io_args from struct fuse_args_pages. This will be used for
both synchronous and asynchronous read/write requests.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This will allow the use of this function when converting to the simple api
(which doesn't use fuse_req).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_simple_request() is converted to return length of last (instead of
single) out arg, since FUSE_IOCTL_OUT has two out args, the second of which
is variable length.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_req_pages_alloc() is moved to file.c, since its internal use by the
device code will eventually be removed.
Rename to fuse_pages_alloc() to signify that it's not only usable for
fuse_req page array.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Also turn BUG_ON into gracefully recovered WARN_ON.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Derive fuse_args_pages from fuse_args. This is used to handle requests
which use pages for input or output. The related flags are added to
fuse_args.
New FR_ALLOC_PAGES flags is added to indicate whether the page arrays in
fuse_req need to be freed by fuse_put_request() or not.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We can use the "force" flag to make sure the DESTROY request is always sent
to userspace. So no need to keep it allocated during the lifetime of the
filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In some cases it makes no sense to set pid/uid/gid fields in the request
header. Allow fuse_simple_background() to omit these. This is only
required in the "force" case, so for now just WARN if set otherwise.
Fold fuse_get_req_nofail_nopages() into its only caller. Comment is
obsolete anyway.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Move this function to the readdir.c where its only caller resides.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This will be used by fuse_force_forget().
We can expand fuse_request_send() into fuse_simple_request(). The
FR_WAITING bit has already been set, no need to check.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add 'force' to fuse_args and use fuse_get_req_nofail_nopages() to allocate
the request in that case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Instead of complex games with a reserved request, just use __GFP_NOFAIL.
Both calers (flush, readdir) guarantee that connection was already
initialized, so no need to wait for fc->initialized.
Also remove unneeded clearing of FR_BACKGROUND flag.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This makes the structure better packed.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
...to make future expansion simpler. The hiearachical structure is a
historical thing that does not serve any practical purpose.
The generated code is excatly the same before and after the patch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
This may have to wait for fuse_iqueue::waitq.lock to be released by one
of many places that take it with IRQs enabled. Since the IRQ handler
may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.
Fix it by protecting the state of struct fuse_iqueue with a separate
spinlock, and only accessing fuse_iqueue::waitq using the versions of
the waitqueue functions which do IRQ-safe locking internally.
Reproducer:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/aio_abi.h>
int main()
{
char opts[128];
int fd = open("/dev/fuse", O_RDWR);
aio_context_t ctx = 0;
struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
struct iocb *cbp = &cb;
sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
mkdir("mnt", 0700);
mount("foo", "mnt", "fuse", 0, opts);
syscall(__NR_io_setup, 1, &ctx);
syscall(__NR_io_submit, ctx, 1, &cbp);
}
Beginning of lockdep output:
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.3.0-rc5 #9 Not tainted
-----------------------------------------------------
syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
and this task is already holding:
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
which would create a new lock dependency:
(&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&(&ctx->ctx_lock)->rlock){..-.}
[...]
Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The unused vfs code can be removed. Don't pass empty subtype (same as if
->parse callback isn't called).
The bits that are left involve determining whether it's permitted to split the
filesystem type string passed in to mount(2). Consequently, this means that we
cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
with a dot in it always indicates a subtype specification.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert the fuse filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.
See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into HEAD
Mount API convertion of fuse needs get_tree_bdev().
|
|
Provide a function, get_tree_mtd(), to replace mount_mtd(), using an
fs_context struct to hold the parameters.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Woodhouse <dwmw2@infradead.org>
cc: Brian Norris <computersforpeace@gmail.com>
cc: Boris Brezillon <bbrezillon@kernel.org>
cc: Marek Vasut <marek.vasut@gmail.com>
cc: Richard Weinberger <richard@nod.at>
cc: linux-mtd@lists.infradead.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Create a function, get_tree_bdev(), that is fs_context-aware and a
->get_tree() counterpart of mount_bdev().
It caches the block device pointer in the fs_context struct so that this
information can be passed into sget_fc()'s test and set functions.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-block@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
For vfs_get_keyed_super users.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
fs_context::user_ns is used by fuse_parse_param(), even during remount,
so it needs to be set to the existing value for reconfigure.
Reproducer:
#include <fcntl.h>
#include <sys/mount.h>
int main()
{
char opts[128];
int fd = open("/dev/fuse", O_RDWR);
sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
mkdir("mnt", 0777);
mount("foo", "mnt", "fuse.foo", 0, opts);
mount("foo", "mnt", "fuse.foo", MS_REMOUNT, opts);
}
Crash:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 129 Comm: syz_make_kuid Not tainted 5.3.0-rc5-next-20190821 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
RIP: 0010:map_id_range_down+0xb/0xc0 kernel/user_namespace.c:291
[...]
Call Trace:
map_id_down kernel/user_namespace.c:312 [inline]
make_kuid+0xe/0x10 kernel/user_namespace.c:389
fuse_parse_param+0x116/0x210 fs/fuse/inode.c:523
vfs_parse_fs_param+0xdb/0x1b0 fs/fs_context.c:145
vfs_parse_fs_string+0x6a/0xa0 fs/fs_context.c:188
generic_parse_monolithic+0x85/0xc0 fs/fs_context.c:228
parse_monolithic_mount_data+0x1b/0x20 fs/fs_context.c:708
do_remount fs/namespace.c:2525 [inline]
do_mount+0x39a/0xa60 fs/namespace.c:3107
ksys_mount+0x7d/0xd0 fs/namespace.c:3325
__do_sys_mount fs/namespace.c:3339 [inline]
__se_sys_mount fs/namespace.c:3336 [inline]
__x64_sys_mount+0x20/0x30 fs/namespace.c:3336
do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: syzbot+7d6a57304857423318a5@syzkaller.appspotmail.com
Fixes: 408cbe695350 ("vfs: Convert fuse to use the new mount API")
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The inode parameter in cuse_release() is likely *not* a fuse inode. It's a
small wonder it didn't blow up until now.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_wait_on_page_writeback() always returns zero and nobody cares.
Let's make it void.
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
[ This retries commit d4b13963f217 ("fuse: require /dev/fuse reads to have
enough buffer capacity"), which was reverted. In this version we require
only `sizeof(fuse_in_header) + sizeof(fuse_write_in)` instead of 4K for
FUSE request header room, because, contrary to libfuse and kernel client
behaviour, GlusterFS actually provides only so much room for request
header. ]
A FUSE filesystem server queues /dev/fuse sys_read calls to get filesystem
requests to handle. It does not know in advance what would be that request
as it can be anything that client issues - LOOKUP, READ, WRITE, ... Many
requests are short and retrieve data from the filesystem. However WRITE and
NOTIFY_REPLY write data into filesystem.
Before getting into operation phase, FUSE filesystem server and kernel
client negotiate what should be the maximum write size the client will ever
issue. After negotiation the contract in between server/client is that the
filesystem server then should queue /dev/fuse sys_read calls with enough
buffer capacity to receive any client request - WRITE in particular, while
FUSE client should not, in particular, send WRITE requests with >
negotiated max_write payload. FUSE client in kernel and libfuse
historically reserve 4K for request header. However an existing filesystem
server - GlusterFS - was found which reserves only 80 bytes for header room
(= `sizeof(fuse_in_header) + sizeof(fuse_write_in)`).
Since
`sizeof(fuse_in_header) + sizeof(fuse_write_in)` ==
`sizeof(fuse_in_header) + sizeof(fuse_read_in)` ==
`sizeof(fuse_in_header) + sizeof(fuse_notify_retrieve_in)`
is the absolute minimum any sane filesystem should be using for header
room, the contract is that filesystem server should queue sys_reads with
`sizeof(fuse_in_header) + sizeof(fuse_write_in)` + max_write buffer.
If the filesystem server does not follow this contract, what can happen
is that fuse_dev_do_read will see that request size is > buffer size,
and then it will return EIO to client who issued the request but won't
indicate in any way that there is a problem to filesystem server.
This can be hard to diagnose because for some requests, e.g. for
NOTIFY_REPLY which mimics WRITE, there is no client thread that is
waiting for request completion and that EIO goes nowhere, while on
filesystem server side things look like the kernel is not replying back
after successful NOTIFY_RETRIEVE request made by the server.
We can make the problem easy to diagnose if we indicate via error return to
filesystem server when it is violating the contract. This should not
practically cause problems because if a filesystem server is using shorter
buffer, writes to it were already very likely to cause EIO, and if the
filesystem is read-only it should be too following FUSE_MIN_READ_BUFFER
minimum buffer size.
Please see [1] for context where the problem of stuck filesystem was hit
for real (because kernel client was incorrectly sending more than
max_write data with NOTIFY_REPLY; see also previous patch), how the
situation was traced and for more involving patch that did not make it
into the tree.
[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
|
|
Pull auxdisplay cleanup from Miguel Ojeda:
"Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta)"
* tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux:
auxdisplay: ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fix from Richard Weinberger:
"Fix time travel mode"
* tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix time travel mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBIFS and JFFS2 fixes from Richard Weinberger:
"UBIFS:
- Don't block too long in writeback_inodes_sb()
- Fix for a possible overrun of the log head
- Fix double unlock in orphan_delete()
JFFS2:
- Remove C++ style from UAPI header and unbreak picky toolchains"
* tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: Limit the number of pages in shrink_liability
ubifs: Correctly initialize c->min_log_bytes
ubifs: Fix double unlock around orphan_delete()
jffs2: Remove C++ style comments from uapi header
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A few fixes for x86:
- Fix a boot regression caused by the recent bootparam sanitizing
change, which escaped the attention of all people who reviewed that
code.
- Address a boot problem on machines with broken E820 tables caused
by an underflow which ended up placing the trampoline start at
physical address 0.
- Handle machines which do not advertise a legacy timer of any form,
but need calibration of the local APIC timer gracefully by making
the calibration routine independent from the tick interrupt. Marked
for stable as well as there seems to be quite some new laptops
rolled out which expose this.
- Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are
affected by broken firmware which does not initialize RDRAND
correctly after resume. Add a command line parameter to override
this for machine which either do not use suspend/resume or have a
fixed BIOS. Unfortunately there is no way to detect this on boot,
so the only safe decision is to turn it off by default.
- Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which
caused fast KVM instruction emulation to break.
- Explain the Intel CPU model naming convention so that the repeating
discussions come to an end"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
x86/boot: Fix boot regression caused by bootparam sanitizing
x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
x86/boot/compressed/64: Fix boot on machines with broken E820 table
x86/apic: Handle missing global clockevent gracefully
x86/cpu: Explain Intel model naming convention
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping fix from Thomas Gleixner:
"A single fix for a regression caused by the generic VDSO
implementation where a math overflow causes CLOCK_BOOTTIME to become a
random number generator"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"Handle the worker management in situations where a task is scheduled
out on a PI lock contention correctly and schedule a new worker if
possible"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Schedule new worker even if PI-blocked
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two small fixes for kprobes and perf:
- Prevent a deadlock in kprobe_optimizer() causes by reverse lock
ordering
- Fix a comment typo"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Fix potential deadlock in kprobe_optimizer()
perf/x86: Fix typo in comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a imbalanced kobject operation in the irq decriptor
code which was unearthed by the new warnings in the kobject code"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Properly pair kobject_del() with kobject_add()
|
|
Mergr misc fixes from Andrew Morton:
"11 fixes"
Mostly VM fixes, one psi polling fix, and one parisc build fix.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
mm/zsmalloc.c: fix race condition in zs_destroy_pool
mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
mm, page_owner: handle THP splits correctly
userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
psi: get poll_work to run when calling poll syscall next time
mm: memcontrol: flush percpu vmevents before releasing memcg
mm: memcontrol: flush percpu vmstats before releasing memcg
parisc: fix compilation errrors
mm, page_alloc: move_freepages should not examine struct page of reserved memory
mm/z3fold.c: fix race between migration and destruction
|