summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-09-15ubifs: Fix memory leak in __ubifs_node_verify_hmac error pathWenwen Wang
In __ubifs_node_verify_hmac(), 'hmac' is allocated through kmalloc(). However, it is not deallocated in the following execution if ubifs_node_calc_hmac() fails, leading to a memory leak bug. To fix this issue, free 'hmac' before returning the error. Fixes: 49525e5eecca ("ubifs: Add helper functions for authentication support") Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15ubifs: Fix memory leak in read_znode() error pathWenwen Wang
In read_znode(), the indexing node 'idx' is allocated by kmalloc(). However, it is not deallocated in the following execution if ubifs_node_check_hash() fails, leading to a memory leak bug. To fix this issue, free 'idx' before returning the error. Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes") Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15ubifs: Remove redundant assignment to pointer fnameColin Ian King
The pointer fname is being assigned with a value that is never read because the function returns after the assignment. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15Revert "ext4: make __ext4_get_inode_loc plug"Linus Torvalds
This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7. This is sad, and done for all the wrong reasons. Because that commit is good, and does exactly what it says: avoids a lot of small disk requests for the inode table read-ahead. However, it turns out that it causes an entirely unrelated problem: the getrandom() system call was introduced back in 2014 by commit c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people use it as a convenient source of good random numbers. But part of the current semantics for getrandom() is that it waits for the entropy pool to fill at least partially (unlike /dev/urandom). And at least ArchLinux apparently has a systemd that uses getrandom() at boot time, and the improvements in IO patterns means that existing installations suddenly start hanging, waiting for entropy that will never happen. It seems to be an unlucky combination of not _quite_ enough entropy, together with a particular systemd version and configuration. Lennart says that the systemd-random-seed process (which is what does this early access) is supposed to not block any other boot activity, but sadly that doesn't actually seem to be the case (possibly due bogus dependencies on cryptsetup for encrypted swapspace). The correct fix is to fix getrandom() to not block when it's not appropriate, but that fix is going to take a lot more discussion. Do we just make it act like /dev/urandom by default, and add a new flag for "wait for entropy"? Do we add a boot-time option? Or do we just limit the amount of time it will wait for entropy? So in the meantime, we do the revert to give us time to discuss the eventual fix for the fundamental problem, at which point we can re-apply the ext4 inode table access optimization. Reported-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Alexander E. Patrakov <patrakov@gmail.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-15afs dynroot: switch to simple_dir_operationsAl Viro
no point reinventing it (with wrong ->read(), BTW). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-14io_uring: increase IORING_MAX_ENTRIES to 32KDaniel Xu
Some workloads can require far more than 4K oustanding entries. For example memcached can have ~300K sockets over ~40 cores. Bumping the max to 32K seems to work pretty well. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-13Merge tag 'v5.3-rc8' into rdma.git for-nextJason Gunthorpe
To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13Merge tag 'for-5.3-rc8-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Here are two fixes, one of them urgent fixing a bug introduced in 5.2 and reported by many users. It took time to identify the root cause, catching the 5.3 release is higly desired also to push the fix to 5.2 stable tree. The bug is a mess up of return values after adding proper error handling and honestly the kind of bug that can cause sleeping disorders until it's caught. My appologies to everybody who was affected. Summary of what could happen: 1) either a hang when committing a transaction, if this happens there's no risk of corruption, still the hang is very inconvenient and can't be resolved without a reboot 2) writeback for some btree nodes may never be started and we end up committing a transaction without noticing that, this is really serious and that will lead to the "parent transid verify failed" messages" * tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix unwritten extent buffers and hangs on future writeback attempts Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12vfs: Make fs_parse() handle fs_param_is_fd-type params betterDavid Howells
Make fs_parse() handle fs_param_is_fd-type parameters that are passed a string by converting it to an integer (in addition to handling direct fd specification). Also range check the integer. [fix from Yin Fengwei folded] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-12vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount APIDavid Howells
Convert the ramfs, shmem, tmpfs, devtmpfs and rootfs filesystems to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Note that tmpfs is slightly tricky as it can contain embedded commas, so it can't be trivially split up using strsep() to break on commas in generic_parse_monolithic(). Instead, tmpfs has to supply its own generic parser. However, if tmpfs changes, then devtmpfs and rootfs, which are wrappers around tmpfs or ramfs, must change too - and thus so must ramfs, so these had to be converted also. [AV: rewritten] Signed-off-by: David Howells <dhowells@redhat.com> cc: Hugh Dickins <hughd@google.com> cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-12io_uring: make sqpoll wakeup possible with geteventsJens Axboe
The way the logic is setup in io_uring_enter() means that you can't wake up the SQ poller thread while at the same time waiting (or polling) for completions afterwards. There's no reason for that to be the case. Reported-by: Lewis Baker <lbaker@fb.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12io_uring: extend async work mergingJens Axboe
We currently merge async work items if we see a strict sequential hit. This helps avoid unnecessary workqueue switches when we don't need them. We can extend this merging to cover cases where it's not a strict sequential hit, but the IO still fits within the same page. If an application is doing multiple requests within the same page, we don't want separate workers waiting on the same page to complete IO. It's much faster to let the first worker bring in the page, then operate on that page from the same worker to complete the next request(s). Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12orangefs: remove redundant assignment to errColin Ian King
Variable err is initialized to a value that is never read and it is re-assigned later. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-09-12orangefs: Add octal zero prefixArtur Świgoń
This patch adds a missing zero to mode 755 specification required to express it in octal numeral system. Reported-by: Łukasz Wrochna <l.wrochna@samsung.com> Signed-off-by: Artur Świgoń <a.swigon@partner.samsung.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-09-12fuse: allow skipping control interface and forced unmountVivek Goyal
virtio-fs does not support aborting requests which are being processed. That is requests which have been sent to fuse daemon on host. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: dissociate DESTROY from fuseblkMiklos Szeredi
Allow virtio-fs to also send DESTROY request. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: delete dentry if timeout is zeroMiklos Szeredi
Don't hold onto dentry in lru list if need to re-lookup it anyway at next access. Only do this if explicitly enabled, otherwise it could result in performance regression. More advanced version of this patch would periodically flush out dentries from the lru which have gone stale. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: separate fuse device allocation and installation in fuse_connVivek Goyal
As of now fuse_dev_alloc() both allocates a fuse device and installs it in fuse_conn list. fuse_dev_alloc() can fail if fuse_device allocation fails. virtio-fs needs to initialize multiple fuse devices (one per virtio queue). It initializes one fuse device as part of call to fuse_fill_super_common() and rest of the devices are allocated and installed after that. But, we can't afford to fail after calling fuse_fill_super_common() as we don't have a way to undo all the actions done by fuse_fill_super_common(). So to avoid failures after the call to fuse_fill_super_common(), pre-allocate all fuse devices early and install them into fuse connection later. This patch provides two separate helpers for fuse device allocation and fuse device installation in fuse_conn. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: add fuse_iqueue_ops callbacksStefan Hajnoczi
The /dev/fuse device uses fiq->waitq and fasync to signal that requests are available. These mechanisms do not apply to virtio-fs. This patch introduces callbacks so alternative behavior can be used. Note that queue_interrupt() changes along these lines: spin_lock(&fiq->waitq.lock); wake_up_locked(&fiq->waitq); + kill_fasync(&fiq->fasync, SIGIO, POLL_IN); spin_unlock(&fiq->waitq.lock); - kill_fasync(&fiq->fasync, SIGIO, POLL_IN); Since queue_request() and queue_forget() also call kill_fasync() inside the spinlock this should be safe. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: extract fuse_fill_super_common()Stefan Hajnoczi
fuse_fill_super() includes code to process the fd= option and link the struct fuse_dev to the fd's struct file. In virtio-fs there is no file descriptor because /dev/fuse is not used. This patch extracts fuse_fill_super_common() so that both classic fuse and virtio-fs can share the code to initialize a mount. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: export fuse_dequeue_forget() functionVivek Goyal
File systems like virtio-fs need to do not have to play directly with forget list data structures. There is a helper function use that instead. Rename dequeue_forget() to fuse_dequeue_forget() and export it so that stacked filesystems can use it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: export fuse_get_unique()Stefan Hajnoczi
virtio-fs will need unique IDs for FORGET requests from outside fs/fuse/dev.c. Make the symbol visible. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: export fuse_send_init_request()Vivek Goyal
This will be used by virtio-fs to send init request to fuse server after initialization of virt queues. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: export fuse_len_args()Stefan Hajnoczi
virtio-fs will need to query the length of fuse_arg lists. Make the symbol visible. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: export fuse_end_request()Stefan Hajnoczi
virtio-fs will need to complete requests from outside fs/fuse/dev.c. Make the symbol visible. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12fuse: fix request limitMiklos Szeredi
The size of struct fuse_req was reduced from 392B to 144B on a non-debug config, thus the sanitize_global_limit() helper was setting a larger default limit. This doesn't really reflect reduction in the memory used by requests, since the fields removed from fuse_req were added to fuse_args derived structs; e.g. sizeof(struct fuse_writepages_args) is 248B, thus resulting in slightly more memory being used for writepage requests overalll (due to using 256B slabs). Make the calculatation ignore the size of fuse_req and use the old 392B value. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-12Btrfs: fix unwritten extent buffers and hangs on future writeback attemptsFilipe Manana
The lock_extent_buffer_io() returns 1 to the caller to tell it everything went fine and the callers needs to start writeback for the extent buffer (submit a bio, etc), 0 to tell the caller everything went fine but it does not need to start writeback for the extent buffer, and a negative value if some error happened. When it's about to return 1 it tries to lock all pages, and if a try lock on a page fails, and we didn't flush any existing bio in our "epd", it calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or an error. The page might have been locked elsewhere, not with the goal of starting writeback of the extent buffer, and even by some code other than btrfs, like page migration for example, so it does not mean the writeback of the extent buffer was already started by some other task, so returning a 0 tells the caller (btree_write_cache_pages()) to not start writeback for the extent buffer. Note that epd might currently have either no bio, so flush_write_bio() returns 0 (success) or it might have a bio for another extent buffer with a lower index (logical address). Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the extent buffer and writeback is never started for the extent buffer, future attempts to writeback the extent buffer will hang forever waiting on that bit to be cleared, since it can only be cleared after writeback completes. Such hang is reported with a trace like the following: [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds. [49887.347059] Not tainted 5.2.13-gentoo #2 [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [49887.347062] btrfs-transacti D 0 1752 2 0x80004000 [49887.347064] Call Trace: [49887.347069] ? __schedule+0x265/0x830 [49887.347071] ? bit_wait+0x50/0x50 [49887.347072] ? bit_wait+0x50/0x50 [49887.347074] schedule+0x24/0x90 [49887.347075] io_schedule+0x3c/0x60 [49887.347077] bit_wait_io+0x8/0x50 [49887.347079] __wait_on_bit+0x6c/0x80 [49887.347081] ? __lock_release.isra.29+0x155/0x2d0 [49887.347083] out_of_line_wait_on_bit+0x7b/0x80 [49887.347084] ? var_wake_function+0x20/0x20 [49887.347087] lock_extent_buffer_for_io+0x28c/0x390 [49887.347089] btree_write_cache_pages+0x18e/0x340 [49887.347091] do_writepages+0x29/0xb0 [49887.347093] ? kmem_cache_free+0x132/0x160 [49887.347095] ? convert_extent_bit+0x544/0x680 [49887.347097] filemap_fdatawrite_range+0x70/0x90 [49887.347099] btrfs_write_marked_extents+0x53/0x120 [49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0 [49887.347102] btrfs_commit_transaction+0x6bb/0x990 [49887.347103] ? start_transaction+0x33e/0x500 [49887.347105] transaction_kthread+0x139/0x15c So fix this by not overwriting the return value (ret) with the result from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK bit in case flush_write_bio() returns an error, otherwise it will hang any future attempts to writeback the extent buffer, and undo all work done before (set back EXTENT_BUFFER_DIRTY, etc). This is a regression introduced in the 5.2 kernel. Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()") Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up") Reported-by: Zdenek Sojka <zsojka@seznam.cz> Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t Reported-by: Drazen Kacar <drazen.kacar@oradian.com> Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377 Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-12Btrfs: fix assertion failure during fsync and use of stale transactionFilipe Manana
Sometimes when fsync'ing a file we need to log that other inodes exist and when we need to do that we acquire a reference on the inodes and then drop that reference using iput() after logging them. That generally is not a problem except if we end up doing the final iput() (dropping the last reference) on the inode and that inode has a link count of 0, which can happen in a very short time window if the logging path gets a reference on the inode while it's being unlinked. In that case we end up getting the eviction callback, btrfs_evict_inode(), invoked through the iput() call chain which needs to drop all of the inode's items from its subvolume btree, and in order to do that, it needs to join a transaction at the helper function evict_refill_and_join(). However because the task previously started a transaction at the fsync handler, btrfs_sync_file(), it has current->journal_info already pointing to a transaction handle and therefore evict_refill_and_join() will get that transaction handle from btrfs_join_transaction(). From this point on, two different problems can happen: 1) evict_refill_and_join() will often change the transaction handle's block reserve (->block_rsv) and set its ->bytes_reserved field to a value greater than 0. If evict_refill_and_join() never commits the transaction, the eviction handler ends up decreasing the reference count (->use_count) of the transaction handle through the call to btrfs_end_transaction(), and after that point we have a transaction handle with a NULL ->block_rsv (which is the value prior to the transaction join from evict_refill_and_join()) and a ->bytes_reserved value greater than 0. If after the eviction/iput completes the inode logging path hits an error or it decides that it must fallback to a transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a non-zero value from btrfs_log_dentry_safe(), and because of that non-zero value it tries to commit the transaction using a handle with a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes the transaction commit hit an assertion failure at btrfs_trans_release_metadata() because ->bytes_reserved is not zero but the ->block_rsv is NULL. The produced stack trace for that is like the following: [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816 [192922.917553] ------------[ cut here ]------------ [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1 [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] (...) [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286 [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000 [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838 [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000 [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980 [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8 [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000 [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0 [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [192922.925105] Call Trace: [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] [192922.926731] do_fsync+0x38/0x60 [192922.927138] __x64_sys_fdatasync+0x13/0x20 [192922.927543] do_syscall_64+0x60/0x1c0 [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe (...) [192922.934077] ---[ end trace f00808b12068168f ]--- 2) If evict_refill_and_join() decides to commit the transaction, it will be able to do it, since the nested transaction join only increments the transaction handle's ->use_count reference counter and it does not prevent the transaction from getting committed. This means that after eviction completes, the fsync logging path will be using a transaction handle that refers to an already committed transaction. What happens when using such a stale transaction can be unpredictable, we are at least having a use-after-free on the transaction handle itself, since the transaction commit will call kmem_cache_free() against the handle regardless of its ->use_count value, or we can end up silently losing all the updates to the log tree after that iput() in the logging path, or using a transaction handle that in the meanwhile was allocated to another task for a new transaction, etc, pretty much unpredictable what can happen. In order to fix both of them, instead of using iput() during logging, use btrfs_add_delayed_iput(), so that the logging path of fsync never drops the last reference on an inode, that step is offloaded to a safe context (usually the cleaner kthread). The assertion failure issue was sporadically triggered by the test case generic/475 from fstests, which loads the dm error target while fsstress is running, which lead to fsync failing while logging inodes with -EIO errors and then trying later to commit the transaction, triggering the assertion failure. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-11ovl: filter of trusted xattr results in auditMark Salyzyn
When filtering xattr list for reading, presence of trusted xattr results in a security audit log. However, if there is other content no errno will be set, and if there isn't, the errno will be -ENODATA and not -EPERM as is usually associated with a lack of capability. The check does not block the request to list the xattrs present. Switch to ns_capable_noaudit to reflect a more appropriate check. Signed-off-by: Mark Salyzyn <salyzyn@android.com> Cc: linux-security-module@vger.kernel.org Cc: kernel-team@android.com Cc: stable@vger.kernel.org # v3.18+ Fixes: a082c6f680da ("ovl: filter trusted xattr for non-admin") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-11ovl: Fix dereferencing possible ERR_PTR()Ding Xiang
if ovl_encode_real_fh() fails, no memory was allocated and the error in the error-valued pointer should be returned. Fixes: 9b6faee07470 ("ovl: check ERR_PTR() return value from ovl_encode_fh()") Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com> Cc: <stable@vger.kernel.org> # v4.16+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-11configfs: calculate the symlink target only onceAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-11configfs: make configfs_create() return inodeAl Viro
Get rid of the callback, deal with that and dentry in callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-11configfs: factor dirent removal into helpersChristoph Hellwig
Lots of duplicated code that benefits from a little consolidation. Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-11configfs: fix a deadlock in configfs_symlink()Al Viro
Configfs abuses symlink(2). Unlike the normal filesystems, it wants the target resolved at symlink(2) time, like link(2) would've done. The problem is that ->symlink() is called with the parent directory locked exclusive, so resolving the target inside the ->symlink() is easily deadlocked. Short of really ugly games in sys_symlink() itself, all we can do is to unlock the parent before resolving the target and relock it after. However, that invalidates the checks done by the caller of ->symlink(), so we have to * check that dentry is still where it used to be (it couldn't have been moved, but it could've been unhashed) * recheck that it's still negative (somebody else might've successfully created a symlink with the same name while we were looking the target up) * recheck the permissions on the parent directory. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-10io_uring: limit parallelism of buffered writesJens Axboe
All the popular filesystems need to grab the inode lock for buffered writes. With io_uring punting buffered writes to async context, we observe a lot of contention with all workers hamming this mutex. For buffered writes, we generally don't need a lot of parallelism on the submission side, as the flushing will take care of that for us. Hence we don't need a deep queue on the write side, as long as we can safely punt from the original submission context. Add a workqueue with a limit of 2 that we can use for buffered writes. This greatly improves the performance and efficiency of higher queue depth buffered async writes with io_uring. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10io_uring: add io_queue_async_work() helperJens Axboe
Add a helper for queueing a request for async execution, in preparation for optimizing it. No functional change in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10fuse: stop copying pages to fuse_reqMiklos Szeredi
The page array pointers are also duplicated across fuse_args_pages and fuse_req. Get rid of the fuse_req ones. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: stop copying args to fuse_reqMiklos Szeredi
No need to duplicate the argument arrays in fuse_req, so just dereference req->args instead of copying to the fuse_req internal ones. This allows further cleanup of the fuse_req structure. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: clean up fuse_reqMiklos Szeredi
Get rid of request specific fields in fuse_req that are not used anymore. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: simplify request allocationMiklos Szeredi
Page arrays are not allocated together with the request anymore. Get rid of the dead code Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: unexport request opsMiklos Szeredi
All requests are now sent with one of the fuse_simple_... helpers. Get rid of the old api from the fuse internal header. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert retrieve to simple apiMiklos Szeredi
Rename fuse_request_send_notify_reply() to fuse_simple_notify_reply() and convert to passing fuse_args instead of fuse_req. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert release to simple apiMiklos Szeredi
Since we cannot reserve the request structure up-front, make sure that the request allocation doesn't fail using __GFP_NOFAIL. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10cuse: convert init to simple apiMiklos Szeredi
This is a straightforward conversion. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert init to simple apiMiklos Szeredi
Bypass the fc->initialized check by setting the force flag. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert writepages to simple apiMiklos Szeredi
Derive fuse_writepage_args from fuse_io_args. Sending the request is tricky since it was done with fi->lock held, hence we must either use atomic allocation or release the lock. Both are possible so try atomic first and if it fails, release the lock and do the regular allocation with GFP_NOFS and __GFP_NOFAIL. Both flags are necessary for correct operation. Move the page realloc function from dev.c to file.c and convert to using fuse_writepage_args. The last caller of fuse_write_fill() is gone, so get rid of it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert readdir to simple apiMiklos Szeredi
The old fuse_read_fill() helper can be deleted, now that the last user is gone. The fuse_io_args struct is moved to fuse_i.h so it can be shared between readdir/read code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert readpages to simple apiMiklos Szeredi
Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the functionality of the same named members in fuse_req. fuse_short_read() can now take struct fuse_args_pages. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: convert direct_io to simple apiMiklos Szeredi
Change of semantics in fuse_async_req_send/fuse_send_(read|write): these can now return error, in which case the 'end' callback isn't called, so the fuse_io_args object needs to be freed. Added verification that the return value is sane (less than or equal to the requested read/write size). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-10fuse: add simple background helperMiklos Szeredi
Create a helper named fuse_simple_background() that is similar to fuse_simple_request(). Unlike the latter, it returns immediately and calls the supplied 'end' callback when the reply is received. The supplied 'args' pointer is stored in 'fuse_req' which allows the callback to interpret the output arguments decoded from the reply. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>