path: root/fs
Age | Commit message | Author
2020-06-30 | io_uring: do grab_env() just before punting | Pavel Begunkov
Currently io_steal_work() is disabled, and every linked request should go through task_work for initialisation. Do io_req_work_grab_env() just before io-wq punting and for the whole link, so any request reachable by io_steal_work() is prepared. This is also interesting for another reason -- it localises io_req_work_grab_env() into one place just before io-wq punting, helping to better manage req->work lifetime and add some neat cleanup/optimisations later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-30 | io_uring: factor out grab_env() from defer_prep() | Pavel Begunkov
Remove the io_req_work_grab_env() call from io_req_defer_prep(), and just call it when necessary. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-30 | io_uring: do init work in grab_env() | Pavel Begunkov
Place io_req_init_async() in io_req_work_grab_env() so it won't be forgotten. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-30 | io_uring: don't pass def into io_req_work_grab_env | Pavel Begunkov
Remove the struct io_op_def *def parameter from io_req_work_grab_env(); it's trivially and cheaply deducible from req->opcode. The API is cleaner this way, and it also helps the compiler understand that it's a real constant that could be register-cached. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-30 | io_uring: fix potential use after free on fallback request free | Pavel Begunkov
After __io_free_req() puts a ctx ref, it should be assumed that the ctx may already be gone. However, it can be accessed when putting the fallback req. Free the req first and then put the ctx. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
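The fix is purely about ordering: everything that touches the ctx (including releasing the fallback request it owns) must happen before the final ctx reference is dropped. A minimal userspace sketch of that rule, assuming hypothetical `struct ctx`, `struct req`, `free_req()` and `put_ctx()` stand-ins rather than the real io_uring types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct req;

/* Hypothetical stand-ins for io_ring_ctx and io_kiocb; not the kernel types. */
struct ctx {
    int refs;
    bool fallback_in_use;   /* models the flag guarding the preallocated fallback req */
    struct req *fallback;   /* request owned by the ctx, freed together with it */
};

struct req {
    struct ctx *ctx;
};

static int ctx_frees; /* how many contexts were released; lets a test observe it */

static void put_ctx(struct ctx *c)
{
    assert(c->refs > 0);
    if (--c->refs == 0) {
        free(c->fallback);
        free(c);
        ctx_frees++;
    }
}

/* Finish everything that touches the ctx, then drop the ctx reference last. */
static void free_req(struct req *r)
{
    struct ctx *c = r->ctx;

    if (r == c->fallback)
        c->fallback_in_use = false; /* safe: we still hold a ctx reference */
    else
        free(r);

    put_ctx(c); /* c may be freed here and must not be touched afterwards */
}
```

Swapping the two steps in `free_req()` would reproduce the bug class the commit describes: clearing `fallback_in_use` after `put_ctx()` may write into freed memory.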
2020-06-30 | io_uring: kill REQ_F_TIMEOUT_NOSEQ | Pavel Begunkov
There are too many useless flags, kill REQ_F_TIMEOUT_NOSEQ, which can be easily inferred from req.timeout itself. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-30 | io_uring: kill REQ_F_TIMEOUT | Pavel Begunkov
Now REQ_F_TIMEOUT is set but never used, kill it Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-30 | io_uring: replace find_next() out param with ret | Pavel Begunkov
Generally, it's better to return a value directly than having out parameter. It's cleaner and saves from some kinds of ugly bugs. May also be faster. Return next request from io_req_find_next() and friends directly instead of passing out parameter. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
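To illustrate the "ugly bugs" an out parameter invites, here is a small sketch with a hypothetical `struct req` and two hypothetical lookup helpers (not the io_uring functions). The out-parameter version only writes `*nxt` on one branch, so correctness silently depends on every caller pre-initialising it; the returning version has no such trap:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request type; only the link pointer matters here. */
struct req {
    int id;
    struct req *link; /* next request in the chain, or NULL */
};

/* Before: the result comes back through an out parameter. If there is no
 * next request, *nxt is left untouched, so the caller must initialise it. */
static void find_next_outparam(struct req *r, struct req **nxt)
{
    if (r->link)
        *nxt = r->link;
}

/* After: the next request is simply the return value (NULL if none). */
static struct req *find_next(struct req *r)
{
    assert(r != NULL);
    return r->link;
}
```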
2020-06-30 | io_uring: deduplicate freeing linked timeouts | Pavel Begunkov
Linked timeout cancellation code is repeated in io_req_link_next() and io_fail_links(), and the two copies differ in details even though they shouldn't. Based on the fact that there is at most one armed linked timeout in a link, and that it immediately follows the head, extract a function that checks for it and defuses it.

Justification:
- DRY and cleaner
- better inlining for io_req_link_next() (just 1 call site now)
- isolates linked_timeouts from common path
- reduces time under spinlock for failed links
- actually less code

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: fold in locking fix for io_fail_links()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
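Because of the stated invariant (at most one armed linked timeout, always directly after the head), the shared helper only needs to inspect one position. A sketch of that idea, assuming hypothetical opcodes, a hypothetical `struct req` and a hypothetical `kill_linked_timeout()`; the real io_uring helper also deals with hrtimer cancellation and locking, which are omitted here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_READ, OP_WRITE, OP_LINK_TIMEOUT }; /* hypothetical opcodes */

struct req {
    enum op opcode;
    bool armed;          /* models an active linked-timeout timer */
    struct req *link;    /* next request in the chain */
};

/*
 * Only the request right after the head can be an armed linked timeout,
 * so that is the only place checked. Returns true if one was found,
 * disarmed and unlinked. Both the "next link" and "fail links" paths
 * would call this single helper instead of open-coding the logic.
 */
static bool kill_linked_timeout(struct req *head)
{
    struct req *nxt;

    assert(head != NULL);
    nxt = head->link;
    if (nxt == NULL || nxt->opcode != OP_LINK_TIMEOUT || !nxt->armed)
        return false;

    nxt->armed = false;        /* defuse the timer */
    head->link = nxt->link;    /* splice it out of the chain */
    nxt->link = NULL;
    return true;
}
```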
2020-06-30 | iov_iter: Move unnecessary inclusion of crypto/hash.h | Herbert Xu
The header file linux/uio.h includes crypto/hash.h which pulls in most of the Crypto API. Since linux/uio.h is used throughout the kernel this means that every tiny bit of change to the Crypto API causes the entire kernel to get rebuilt. This patch fixes this by moving it into lib/iov_iter.c instead where it is actually used. This patch also fixes the ifdef to use CRYPTO_HASH instead of just CRYPTO which does not guarantee the existence of ahash. Unfortunately a number of drivers were relying on linux/uio.h to provide access to linux/slab.h. This patch adds inclusions of linux/slab.h as detected by build failures. Also skbuff.h was relying on this to provide a declaration for ahash_request. This patch adds a forward declaration instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-30 | gfs2: Don't sleep during glock hash walk | Andreas Gruenbacher
In flush_delete_work, instead of flushing each individual pending delayed work item, cancel and re-queue them for immediate execution. The waiting isn't needed here because we're already waiting for all queued work items to complete in gfs2_flush_delete_work. This makes the code more efficient, but more importantly, it avoids sleeping during a rhashtable walk, inside rcu_read_lock(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-30 | gfs2: fix trans slab error when withdraw occurs inside log_flush | Bob Peterson
Log flush operations (gfs2_log_flush()) can target a specific transaction. But if the function encountered errors (e.g. I/O errors) and withdrew, the transaction was only freed if it was queued to one of the ail lists. If the withdraw occurred before the transaction was queued to the ail1 list, function ail_drain never freed it. The result was:

BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown()

This patch makes log_flush() add the targeted transaction to the ail1 list so that function ail_drain() will find and free it properly.

Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-30 | gfs2: Don't return NULL from gfs2_inode_lookup | Andreas Gruenbacher
Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error). Commit b66648ad6dcf caused it to return NULL instead of ERR_PTR(-ESTALE) in some cases. Fix that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: b66648ad6dcf ("gfs2: Move inode generation number check into gfs2_inode_lookup") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
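The contract here is the kernel's ERR_PTR convention: a function returns either a valid pointer or an errno encoded in the top 4095 values of the address space, and callers test only `IS_ERR()`. A `NULL` slips past that test and gets dereferenced. The sketch below re-creates the idea in userspace; `err_ptr()`, `is_err()`, `ptr_err()` and `inode_lookup()` are hypothetical lower-case stand-ins, not the kernel macros or the gfs2 function:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* errors live in the last 4095 pointer values */

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct inode { unsigned long ino; unsigned generation; };

/* Hypothetical lookup: a generation mismatch yields ERR_PTR(-ESTALE), never NULL. */
static struct inode *inode_lookup(struct inode *cached, unsigned want_gen)
{
    assert(cached != NULL);
    if (cached->generation != want_gen)
        return err_ptr(-ESTALE);
    return cached;
}
```

Note that `is_err(NULL)` is false, which is exactly why returning `NULL` from such a function breaks callers.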
2020-06-29 | fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls | Paul E. McKenney
Very large I/Os can cause the following RCU CPU stall warning:

RIP: 0010:rb_prev+0x8/0x50
Code: 49 89 c0 49 89 d1 48 89 c2 48 89 f8 e9 e5 fd ff ff 4c 89 48 10 c3 4c 89 06 c3 4c 89 40 10 c3 0f 1f 00 48 8b 0f 48 39 cf 74 38 <48> 8b 47 10 48 85 c0 74 22 48 8b 50 08 48 85 d2 74 0c 48 89 d0 48
RSP: 0018:ffffc9002212bab0 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
RAX: ffff888821f93630 RBX: ffff888821f93630 RCX: ffff888821f937e0
RDX: 0000000000000000 RSI: 0000000000102000 RDI: ffff888821f93630
RBP: 0000000000103000 R08: 000000000006c000 R09: 0000000000000238
R10: 0000000000102fff R11: ffffc9002212bac8 R12: 0000000000000001
R13: ffffffffffffffff R14: 0000000000102000 R15: ffff888821f937e0
 __lookup_extent_mapping+0xa0/0x110
 try_release_extent_mapping+0xdc/0x220
 btrfs_releasepage+0x45/0x70
 shrink_page_list+0xa39/0xb30
 shrink_inactive_list+0x18f/0x3b0
 shrink_lruvec+0x38e/0x6b0
 shrink_node+0x14d/0x690
 do_try_to_free_pages+0xc6/0x3e0
 try_to_free_mem_cgroup_pages+0xe6/0x1e0
 reclaim_high.constprop.73+0x87/0xc0
 mem_cgroup_handle_over_high+0x66/0x150
 exit_to_usermode_loop+0x82/0xd0
 do_syscall_64+0xd4/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

On a PREEMPT=n kernel, the try_release_extent_mapping() function's "while" loop might run for a very long time on a large I/O. This commit therefore adds a cond_resched() to this loop, providing RCU any needed quiescent states.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 | nfsd: fix nfsdfs inode reference count leak | J. Bruce Fields
I don't understand this code well, but I'm seeing a warning about a still-referenced inode on unmount, and every other similar filesystem does a dput() here. Fixes: e8a79fb14f6b ("nfsd: add nfsd/clients directory") Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-06-29 | nfsd4: fix nfsdfs reference count loop | J. Bruce Fields
We don't drop the reference on the nfsdfs filesystem with mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called until the nfsd module's unloaded, and we can't unload the module as long as there's a reference on nfsdfs. So this prevents module unloading. Fixes: 2c830dd7209b ("nfsd: persist nfsd filesystem across mounts") Reported-and-Tested-by: Luo Xiaogang <lxgrxd@163.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-06-29 | Revert "fs: Do not check if there is a fsnotify watcher on pseudo inodes" | Mel Gorman
This reverts commit e9c15badbb7b ("fs: Do not check if there is a fsnotify watcher on pseudo inodes"). The commit intended to eliminate fsnotify-related overhead for pseudo inodes but it is broken in concept. inotify can receive events of pipe files under /proc/X/fd and chromium relies on close and open events for sandboxing.

Maxim Levitsky reported the following:

  Chromium starts as a white rectangle, shows few white rectangles that resemble its notifications and then crashes.

  The stdout output from chromium:

  [mlevitsk@starship ~]$chromium-freeworld
  mesa: for the --simplifycfg-sink-common option: may only occur zero or one times!
  mesa: for the --global-isel-abort option: may only occur zero or one times!
  [3379:3379:0628/135151.440930:ERROR:browser_switcher_service.cc(238)] XXX Init()
  ../../sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc:**CRASHING**:seccomp-bpf failure in syscall 0072
  Received signal 11 SEGV_MAPERR 0000004a9048

Crashes are not universal but even if chromium does not crash, it certainly does not work properly. While filtering just modify and access might be safe, the benefit is not worth the risk hence the revert.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: e9c15badbb7b ("fs: Do not check if there is a fsnotify watcher on pseudo inodes")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-29 | io_uring: fix missing wake_up io_rw_reissue() | Pavel Begunkov
Don't forget to wake up a process to which io_rw_reissue() added task_work. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-29 | exfat: flush dirty metadata in fsync | Sungjong Seo
The generic_file_fsync() that exfat used could not guarantee the consistency of a file, because it flushed only the file's dirty data pages, not its dirty metadata. Instead, use exfat_file_fsync() for files and directories, so that both the metadata and the data pages of a file are guaranteed to be committed. Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29 | exfat: move setting VOL_DIRTY over exfat_remove_entries() | Namjae Jeon
Move the setting of VOL_DIRTY down to just before exfat_remove_entries(), to avoid needlessly leaving VOL_DIRTY set when the operation fails with -ENOTEMPTY. Fixes: 5f2aa075070c ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7 Reported-by: Tetsuhiro Kohada <kohada.t2@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29 | exfat: call sync_filesystem for read-only remount | Hyunchul Lee
We need to commit dirty metadata and pages to disk before remounting exfat as read-only. This fixes a failure in xfstests generic/452, which does the following:

  cp something <exfat>/
  mount -o remount,ro <exfat>

Afterwards, <exfat>/something is corrupted, because while exfat is being remounted as read-only, exfat doesn't get a chance to commit metadata, and the VFS invalidates the page caches of the block device.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-06-29 | exfat: add missing brelse() calls on error paths | Dan Carpenter
If the second exfat_get_dentry() call fails then we need to release "old_bh" before returning. There is a similar bug in exfat_move_file(). Fixes: 5f2aa075070c ("exfat: add inode operations") Reported-by: Markus Elfring <Markus.Elfring@web.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
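This is the classic "acquire two resources, fail on the second" leak, usually fixed with a goto-based unwind so that every path releases what it already holds. A userspace sketch of the pattern; `struct buf`, `get_entry()`, `release_buf()` and `rename_entries()` are hypothetical stand-ins for a buffer_head, exfat_get_dentry(), brelse() and the rename path, with a counter so a test can check for leaks:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer_head; live_bufs counts unreleased ones. */
struct buf { int blk; };
static int live_bufs;

/* Hypothetical stand-in for exfat_get_dentry(): NULL on a negative block. */
static struct buf *get_entry(int blk)
{
    struct buf *b;

    if (blk < 0)
        return NULL;              /* simulated read failure */
    b = malloc(sizeof *b);
    if (b) {
        b->blk = blk;
        live_bufs++;
    }
    return b;
}

/* Hypothetical stand-in for brelse(). */
static void release_buf(struct buf *b)
{
    if (b) {
        assert(live_bufs > 0);
        free(b);
        live_bufs--;
    }
}

static int rename_entries(int old_blk, int new_blk)
{
    struct buf *old_b, *new_b;
    int ret = 0;

    old_b = get_entry(old_blk);
    if (!old_b)
        return -EIO;

    new_b = get_entry(new_blk);
    if (!new_b) {
        ret = -EIO;
        goto release_old;         /* the release that was missing on this path */
    }

    /* ... update both entries here ... */
    release_buf(new_b);
release_old:
    release_buf(old_b);
    return ret;
}
```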
2020-06-29 | exfat: Set the unused characters of FileName field to the value 0000h | Hyeongseok.Kim
Some fsck tools complain that the padding part of the FileName field is not set to the value 0000h. So keep the filesystem cleaner by following the exFAT specification's recommendation. Signed-off-by: Hyeongseok.Kim <Hyeongseok@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
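An exFAT File Name directory entry carries 15 UTF-16 code units (30 bytes) of the name, so the last entry of a name is usually only partly used. A sketch of filling one such field so that the unused tail is 0000h; `NAME_CHARS_PER_ENTRY` and `fill_name_entry()` are hypothetical names, and this is not the exfat driver code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_CHARS_PER_ENTRY 15 /* FileName field: 30 bytes = 15 UTF-16 units */

/*
 * Copy the part of a UTF-16 name that starts at `offset` into one entry's
 * FileName field. The field is zeroed first, so any characters past the
 * end of the name are 0x0000. Returns the number of units copied.
 */
static size_t fill_name_entry(uint16_t field[NAME_CHARS_PER_ENTRY],
                              const uint16_t *name, size_t name_len,
                              size_t offset)
{
    size_t n = 0;

    assert(offset <= name_len);
    memset(field, 0, NAME_CHARS_PER_ENTRY * sizeof(field[0]));
    while (n < NAME_CHARS_PER_ENTRY && offset + n < name_len) {
        field[n] = name[offset + n];
        n++;
    }
    return n;
}
```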
2020-06-28 | Merge tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull EFI fixes from Ingo Molnar:

 - Fix build regression on v4.8 and older
 - Robustness fix for TPM log parsing code
 - kobject refcount fix for the ESRT parsing code
 - Two efivarfs fixes to make it behave more like an ordinary file system
 - Style fixup for zero length arrays
 - Fix a regression in path separator handling in the initrd loader
 - Fix a missing prototype warning
 - Add some kerneldoc headers for newly introduced stub routines
 - Allow support for SSDT overrides via EFI variables to be disabled
 - Report CPU mode and MMU state upon entry for 32-bit ARM
 - Use the correct stack pointer alignment when entering from mixed mode

* tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: arm: Print CPU boot mode and MMU state at boot
  efi/libstub: arm: Omit arch specific config table matching array on arm64
  efi/x86: Setup stack correctly for efi_pe_entry
  efi: Make it possible to disable efivar_ssdt entirely
  efi/libstub: Descriptions for stub helper functions
  efi/libstub: Fix path separator regression
  efi/libstub: Fix missing-prototype warning for skip_spaces()
  efi: Replace zero-length array and use struct_size() helper
  efivarfs: Don't return -EINTR when rate-limiting reads
  efivarfs: Update inode modification time for successful writes
  efi/esrt: Fix reference count leak in esre_create_sysfs_entry.
  efi/tpm: Verify event log header before parsing
  efi/x86: Fix build with gcc 4
2020-06-28 | io_uring: fix iopoll -EAGAIN handling | Pavel Begunkov
req->iopoll() is not necessarily called by the task that submitted a request. Because of that, it's dangerous to grab_env() and punt async on -EAGAIN, potentially grabbing another task's mm and corrupting its memory. Do the resubmit from the submitter task's context. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: do task_work_run() during iopoll | Pavel Begunkov
There are a lot of new users of task_work, and some task_work_add() calls may happen while we do I/O polling. So make iopoll call task_work_run() from time to time, so that it doesn't keep polling for requests that are just sitting there in task_work. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: clean up req->result setting by rw | Pavel Begunkov
Assign req->result to io_size early in io_{read,write}(), it's enough and makes it more straightforward. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: kill REQ_F_LINK_NEXT | Pavel Begunkov
After pulling nxt from a request, it's no longer a link's head, so clear REQ_F_LINK_HEAD. Absence of this flag also indicates that there are no linked requests, so it can replace REQ_F_LINK_NEXT, which can then be killed. Linked timeouts are handled as well, leaving the flag intact when necessary. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: cosmetic changes for batch free | Pavel Begunkov
Move all batch free bits close to each other and rename in a consistent way. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: batch-free linked requests as well | Pavel Begunkov
There is no reason to not batch deallocation of linked requests. Take away its next req first and handle it as everything else in io_req_multi_free(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
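The idea is that a linked request needs no special path: detach its next request first, then queue it for batched freeing like any other request. A userspace sketch, assuming a hypothetical `struct free_batch` with a hypothetical batch size of 8; the loop of `free()` calls in `flush_batch()` stands in for a single bulk free (the kernel side uses a slab bulk-free, which this sketch does not model):

```c
#include <assert.h>
#include <stdlib.h>

#define BATCH 8 /* hypothetical batch size */

struct req { struct req *link; };

struct free_batch {
    void *reqs[BATCH];
    int to_free;
};

static int bulk_calls; /* counts non-empty batch flushes, for testing */

static void flush_batch(struct free_batch *fb)
{
    for (int i = 0; i < fb->to_free; i++)
        free(fb->reqs[i]);    /* stands in for one bulk free call */
    if (fb->to_free)
        bulk_calls++;
    fb->to_free = 0;
}

/*
 * Detach the next request first, then batch the request itself. Returns
 * the detached next request (or NULL) so the caller can continue the chain.
 */
static struct req *batch_free_req(struct free_batch *fb, struct req *r)
{
    struct req *nxt = r->link;

    r->link = NULL;
    assert(fb->to_free < BATCH);
    fb->reqs[fb->to_free++] = r;
    if (fb->to_free == BATCH)
        flush_batch(fb);
    return nxt;
}
```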
2020-06-28 | io_uring: dismantle req early and remove need_iter | Pavel Begunkov
Every request in io_req_multi_free() has ->file set. Instead of pointlessly deferring and counting reqs with a file, dismantle each one in place and save it for batch dealloc. It also saves us from potentially skipping io_cleanup_req(), put_task(), etc. That never happens though, because ->file is always there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: remove inflight batching in free_many() | Pavel Begunkov
io_free_req_many() is used only for iopoll requests, i.e. reads/writes. Hence no need to batch inflight unhooking. For safety, it'll be done by io_dismantle_req(), which replaces __io_req_aux_free(), and looks more solid and cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: fix refs underflow in io_iopoll_queue() | Pavel Begunkov
Now io_complete_rw_common() puts a ref, so the extra io_req_put() in io_iopoll_queue() causes an underflow. Remove it.

[ 455.998620] refcount_t: underflow; use-after-free.
[ 455.998743] WARNING: CPU: 6 PID: 285394 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
[ 455.998772] CPU: 6 PID: 285394 Comm: read-write2 Tainted: G I E 5.8.0-rc2-00048-g1b1aa738f167-dirty #509
[ 455.998772] RIP: 0010:refcount_warn_saturate+0xae/0xf0
...
[ 455.998778] Call Trace:
[ 455.998778] io_put_req+0x44/0x50
[ 455.998778] io_iopoll_complete+0x245/0x370
[ 455.998779] io_iopoll_getevents+0x12f/0x1a0
[ 455.998779] io_iopoll_reap_events.part.0+0x5e/0xa0
[ 455.998780] io_ring_ctx_wait_and_kill+0x132/0x1c0
[ 455.998780] io_uring_release+0x20/0x30
[ 455.998780] __fput+0xcd/0x230
[ 455.998781] ____fput+0xe/0x10
[ 455.998781] task_work_run+0x67/0xa0
[ 455.998781] do_exit+0x35d/0xb70
[ 455.998782] do_group_exit+0x43/0xa0
[ 455.998783] get_signal+0x140/0x900
[ 455.998783] do_signal+0x37/0x780
[ 455.998784] __prepare_exit_to_usermode+0x126/0x1c0
[ 455.998785] __syscall_return_slowpath+0x3b/0x1c0
[ 455.998785] do_syscall_64+0x5f/0xa0
[ 455.998785] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a1d7c393c47 ("io_uring: enable READ/WRITE to use deferred completions")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: fix missing io_grab_files() | Pavel Begunkov
We won't have valid ring_fd, ring_file in task work. Grab files early. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: don't mark link's head for_async | Pavel Begunkov
There is no reason to mark the head of a link as for-async in io_req_defer_prep(). grab_env(), etc. will be done later during submission if necessary. Mark for_async=false, saving an extra grab_env() in many cases. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: fix feeding io-wq with uninit reqs | Pavel Begunkov
io_steal_work() can't be sure that @nxt has req->work properly set, so we can't pass it to io-wq as is. A dirty quick fix -- drag it through io_req_task_queue(), and always return NULL from io_steal_work().

e.g.

[ 50.770161] BUG: kernel NULL pointer dereference, address: 00000000
[ 50.770164] #PF: supervisor write access in kernel mode
[ 50.770164] #PF: error_code(0x0002) - not-present page
[ 50.770168] CPU: 1 PID: 1448 Comm: io_wqe_worker-0 Tainted: G I 5.8.0-rc2-00035-g2237d76530eb-dirty #494
[ 50.770172] RIP: 0010:override_creds+0x19/0x30
...
[ 50.770183] io_worker_handle_work+0x25c/0x430
[ 50.770185] io_wqe_worker+0x2a0/0x350
[ 50.770190] kthread+0x136/0x180
[ 50.770194] ret_from_fork+0x22/0x30

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: fix punting req w/o grabbed env | Pavel Begunkov
It's not enough to check for REQ_F_WORK_INITIALIZED and punt async assuming that io_req_work_grab_env() was called, it may not have been. E.g. io_close_prep() and personality path set the flag without further async init. As a quick fix, always pass next work through io_req_task_queue(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-28 | io_uring: fix req->work corruption | Pavel Begunkov
req->work and req->task_work are in a union, so io_req_task_queue() screws up everything that was in work. De-union them for now.

[ 704.367253] BUG: unable to handle page fault for address: ffffffffaf7330d0
[ 704.367256] #PF: supervisor write access in kernel mode
[ 704.367256] #PF: error_code(0x0003) - permissions violation
[ 704.367261] CPU: 6 PID: 1654 Comm: io_wqe_worker-0 Tainted: G I 5.8.0-rc2-00038-ge28d0bdc4863-dirty #498
[ 704.367265] RIP: 0010:_raw_spin_lock+0x1e/0x36
...
[ 704.367276] __alloc_fd+0x35/0x150
[ 704.367279] __get_unused_fd_flags+0x25/0x30
[ 704.367280] io_openat2+0xcb/0x1b0
[ 704.367283] io_issue_sqe+0x36a/0x1320
[ 704.367294] io_wq_submit_work+0x58/0x160
[ 704.367295] io_worker_handle_work+0x2a3/0x430
[ 704.367296] io_wqe_worker+0x2a0/0x350
[ 704.367301] kthread+0x136/0x180
[ 704.367304] ret_from_fork+0x22/0x30

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
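Members of a union share storage, so initialising one view overwrites whatever the other view held; "de-unioning" gives each member its own bytes at the cost of a larger struct. A layout sketch with hypothetical, simplified `struct work` and `struct task_work` shapes (not the real `io_wq_work` or `callback_head`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified shapes; not the real kernel structures. */
struct work      { void *creds; uintptr_t flags; };
struct task_work { void (*func)(void *); void *next; };

/* Before: both views start at the same offset, so writing one clobbers the other. */
struct req_union {
    union {
        struct work work;
        struct task_work task_work;
    };
};

/* After ("de-union"): each member has its own storage. */
struct req_split {
    struct work work;
    struct task_work task_work;
};

static_assert(offsetof(struct req_split, task_work) >= sizeof(struct work),
              "split layout gives task_work its own storage");

static void noop(void *arg) { (void)arg; }
```

With `struct req_split`, queueing task_work (writing `func`/`next`) leaves `work.creds` intact; with `struct req_union` the same writes land on top of it.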
2020-06-27 | afs: Fix storage of cell names | David Howells
The cell name stored in the afs_cell struct is a 64-char + NUL buffer - when it needs to be able to handle up to AFS_MAXCELLNAME (256 chars) + NUL. Fix this by changing the array to a pointer and allocating the string. Found using Coverity. Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
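The fix replaces a fixed 64-char + NUL array with a pointer to a string allocated at the exact length, which can then accept names up to AFS_MAXCELLNAME (256) characters. A userspace sketch of that shape; `struct cell` and `cell_set_name()` are hypothetical, and the `-EINVAL` return for an empty or over-long name is an assumption of this sketch, not necessarily what the afs code returns:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define AFS_MAXCELLNAME 256 /* limit quoted in the commit message */

/* Hypothetical cut-down cell: the name is heap-allocated, not a fixed array. */
struct cell {
    char *name;
    size_t name_len;
};

static int cell_set_name(struct cell *cell, const char *name, size_t namelen)
{
    assert(cell != NULL && name != NULL);
    if (namelen == 0 || namelen > AFS_MAXCELLNAME)
        return -EINVAL;

    cell->name = malloc(namelen + 1); /* room for the terminating NUL */
    if (!cell->name)
        return -ENOMEM;
    memcpy(cell->name, name, namelen);
    cell->name[namelen] = '\0';
    cell->name_len = namelen;
    return 0;
}
```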
2020-06-27 | Merge tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds
Pull cifs fixes from Steve French:
 "Six cifs/smb3 fixes, three of them for stable. Fixes xfstests 451, 313 and 316"

* tag '5.8-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: misc: Use array_size() in if-statement controlling expression
  cifs: update ctime and mtime during truncate
  cifs/smb3: Fix data inconsistent when punch hole
  cifs/smb3: Fix data inconsistent when zero file range
  cifs: Fix double add page to memcg when cifs_readpages
  cifs: Fix cached_fid refcnt leak in open_shroot
2020-06-27 | Merge tag 'nfs-for-5.8-2' of git://git.linux-nfs.org/projects/anna/linux-nfs | Linus Torvalds
Pull NFS client bugfixes from Anna Schumaker:
 "Stable Fixes:
   - xprtrdma: Fix handling of RDMA_ERROR replies
   - sunrpc: Fix rollback in rpc_gssd_dummy_populate()
   - pNFS/flexfiles: Fix list corruption if the mirror count changes
   - NFSv4: Fix CLOSE not waiting for direct IO completion
   - SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()

  Other Fixes:
   - xprtrdma: Fix a use-after-free with r_xprt->rx_ep
   - Fix other xprtrdma races during disconnect
   - NFS: Fix memory leak of export_path"

* tag 'nfs-for-5.8-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()
  NFSv4 fix CLOSE not waiting for direct IO compeletion
  pNFS/flexfiles: Fix list corruption if the mirror count changes
  nfs: Fix memory leak of export_path
  sunrpc: fixed rollback in rpc_gssd_dummy_populate()
  xprtrdma: Fix handling of RDMA_ERROR replies
  xprtrdma: Clean up disconnect
  xprtrdma: Clean up synopsis of rpcrdma_flush_disconnect()
  xprtrdma: Use re_connect_status safely in rpcrdma_xprt_connect()
  xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed
2020-06-27 | Merge tag 'io_uring-5.8-2020-06-26' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull io_uring fixes from Jens Axboe:
 "Three small fixes:

   - Close a corner case for polled IO resubmission (Pavel)

   - Toss commands when exiting (Pavel)

   - Fix SQPOLL conditional reschedule on perpetually busy submit (Xuan)"

* tag 'io_uring-5.8-2020-06-26' of git://git.kernel.dk/linux-block:
  io_uring: fix current->mm NULL dereference on exit
  io_uring: fix hanging iopoll in case of -EAGAIN
  io_uring: fix io_sq_thread no schedule when busy
2020-06-26 | io_uring: fix function args for !CONFIG_NET | Randy Dunlap
Fix build errors when CONFIG_NET is not set/enabled:

../fs/io_uring.c:5472:10: error: too many arguments to function ‘io_sendmsg’
../fs/io_uring.c:5474:10: error: too many arguments to function ‘io_send’
../fs/io_uring.c:5484:10: error: too many arguments to function ‘io_recvmsg’
../fs/io_uring.c:5486:10: error: too many arguments to function ‘io_recv’
../fs/io_uring.c:5510:9: error: too many arguments to function ‘io_accept’
../fs/io_uring.c:5518:9: error: too many arguments to function ‘io_connect’

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-26 | Merge branch 'io_uring-5.8' into for-5.9/io_uring | Jens Axboe
Merge in changes that went into 5.8-rc3. GIT will silently do the merge, but we still need a tweak on top of that since io_complete_rw_common() was modified to take an io_comp_state pointer. The auto-merge fails on that, and we end up with something that doesn't compile.

* io_uring-5.8:
  io_uring: fix current->mm NULL dereference on exit
  io_uring: fix hanging iopoll in case of -EAGAIN
  io_uring: fix io_sq_thread no schedule when busy

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-26 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds
Merge misc fixes from Andrew Morton:
 "31 patches.

  Subsystems affected by this patch series: hotfixes, mm/pagealloc, kexec, ocfs2, lib, mm/slab, mm/slab, mm/slub, mm/swap, mm/pagemap, mm/vmalloc, mm/memcg, mm/gup, mm/thp, mm/vmscan, x86, mm/memory-hotplug, MAINTAINERS"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  MAINTAINERS: update info for sparse
  mm/memory_hotplug.c: fix false softlockup during pfn range removal
  mm: remove vmalloc_exec
  arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page
  x86/hyperv: allocate the hypercall page with only read and execute bits
  mm/memory: fix IO cost for anonymous page
  mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages"
  mm: workingset: age nonresident information alongside anonymous pages
  doc: THP CoW fault no longer allocate THP
  docs: mm/gup: minor documentation update
  mm/memcontrol.c: prevent missed memory.low load tears
  mm/memcontrol.c: add missed css_put()
  mm: memcontrol: handle div0 crash race condition in memory.low
  mm/vmalloc.c: fix a warning while make xmldocs
  media: omap3isp: remove cacheflush.h
  make asm-generic/cacheflush.h more standalone
  mm/debug_vm_pgtable: fix build failure with powerpc 8xx
  mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages()
  mm: fix swap cache node allocation mask
  slub: cure list_slab_objects() from double fix
  ...
2020-06-26 | io-wq: return next work from ->do_work() directly | Pavel Begunkov
It's easier to return next work from ->do_work() than having an in-out argument. Looks nicer and easier to compile. Also, merge io_wq_assign_next() into its only user. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-26 | io-wq: compact io-wq flags numbers | Pavel Begunkov
Renumber IO_WQ flags, so they take adjacent bits. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-26 | io_uring: use task_work for links if possible | Jens Axboe
Currently links are always done in an async fashion, unless we catch them inline after we successfully complete a request without having to resort to blocking. This isn't necessarily the most efficient approach; it'd be more ideal if we could just use the task_work handling for this.

Outside of saving an async jump, we can also do less prep work for these kinds of requests.

Running dependent links from the task_work handler yields some nice performance benefits. As an example, examples/link-cp from the liburing repository uses read+write links to implement a copy operation. Without this patch, a cache cold 4G file read from a VM runs in about 3 seconds:

$ time examples/link-cp /data/file /dev/null

real    0m2.986s
user    0m0.051s
sys     0m2.843s

and a subsequent cache hot run looks like this:

$ time examples/link-cp /data/file /dev/null

real    0m0.898s
user    0m0.069s
sys     0m0.797s

With this patch in place, the cold case takes about 2.4 seconds:

$ time examples/link-cp /data/file /dev/null

real    0m2.400s
user    0m0.020s
sys     0m2.366s

and the cache hot case looks like this:

$ time examples/link-cp /data/file /dev/null

real    0m0.676s
user    0m0.010s
sys     0m0.665s

As expected, the (mostly) cache hot case yields the biggest improvement, running about 25% faster with this change, while the cache cold case yields about a 20% increase in performance. Outside of the performance increase, we're using less CPU as well, as we're not using the async offload threads at all for this anymore.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
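The quoted percentages can be checked from the `real` times above. Measured as the fraction of wall-clock time saved, the hot case drops from 0.898 s to 0.676 s (about 24.7%) and the cold case from 2.986 s to 2.400 s (about 19.6%), matching "about 25%" and "about 20%". Expressed instead as a throughput ratio (before/after), the same runs are roughly 1.33x and 1.24x. A small sketch of the arithmetic; `time_reduction()` and `report()` are illustrative helpers, not part of liburing:

```c
#include <assert.h>
#include <stdio.h>

/* Fraction of wall-clock time saved going from `before` to `after` seconds. */
static double time_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before;
}

/* Print both views of the same measurement: time saved and speed ratio. */
static void report(const char *label, double before, double after)
{
    printf("%s: %.1f%% less time, %.2fx throughput\n",
           label, 100.0 * time_reduction(before, after), before / after);
}
```

For example, `report("hot", 0.898, 0.676)` and `report("cold", 2.986, 2.400)` print the two views side by side; which one "N% faster" refers to is a matter of convention.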
2020-06-26 | NFSv4 fix CLOSE not waiting for direct IO compeletion | Olga Kornievskaia
Figuring out the root cause for the REMOVE/CLOSE race and suggesting the solution was done by Neil Brown. Currently what happens is that direct IO calls hold a reference on the open context which is decremented as an asynchronous task in nfs_direct_complete(). Before the reference is decremented, control is returned to the application, which is free to close the file. When the close is being processed, it decrements its reference on the open_context, but since direct IO still holds one, it doesn't send a close on the wire. It returns control to the application, which is free to do other operations. For instance, it can delete a file. Direct IO then finally releases its reference and triggers an asynchronous close, which races with the REMOVE. On the server, REMOVE can be processed before the CLOSE, failing the REMOVE with EACCES as the file is still open. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Suggested-by: Neil Brown <neilb@suse.com> CC: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-26 | pNFS/flexfiles: Fix list corruption if the mirror count changes | Trond Myklebust
If the mirror count changes in the new layout we pick up inside ff_layout_pg_init_write(), then we can end up adding the request to the wrong mirror and corrupting the mirror->pg_list. Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>