linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-05-17	io_uring: initialize ctx->sqo_wait earlier	Jens Axboe
	Ensure that ctx->sqo_wait is initialized as soon as the ctx is allocated, instead of deferring it to the offload setup. This fixes a syzbot reported lockdep complaint, which is really due to trying to wake_up on an uninitialized wait queue: RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441319 RDX: 0000000000000001 RSI: 0000000020000140 RDI: 000000000000047b RBP: 0000000000010475 R08: 0000000000000001 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402260 R13: 00000000004022f0 R14: 0000000000000000 R15: 0000000000000000 INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 7090 Comm: syz-executor222 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 assign_lock_key kernel/locking/lockdep.c:913 [inline] register_lock_class+0x1664/0x1760 kernel/locking/lockdep.c:1225 __lock_acquire+0x104/0x4c50 kernel/locking/lockdep.c:4234 lock_acquire+0x1f2/0x8f0 kernel/locking/lockdep.c:4934 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x8c/0xbf kernel/locking/spinlock.c:159 __wake_up_common_lock+0xb4/0x130 kernel/sched/wait.c:122 io_cqring_ev_posted+0xa5/0x1e0 fs/io_uring.c:1160 io_poll_remove_all fs/io_uring.c:4357 [inline] io_ring_ctx_wait_and_kill+0x2bc/0x5a0 fs/io_uring.c:7305 io_uring_create fs/io_uring.c:7843 [inline] io_uring_setup+0x115e/0x22b0 fs/io_uring.c:7870 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x441319 Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffb1fb9aa8 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9 Reported-by: syzbot+8c91f5d054e998721c57@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-16	Merge tag '5.7-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds
	Pull cifs fixes from Steve French: "Three small cifs/smb3 fixes, one for stable" * tag '5.7-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix leaked reference on requeued write cifs: Fix null pointer check in cifs_read CIFS: Spelling s/EACCESS/EACCES/
2020-05-16	Merge tag 'io_uring-5.7-2020-05-15' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull io_uring fixes from Jens Axboe: "Two small fixes that should go into this release: - Check and handle zero length splice (Pavel) - Fix a regression in this merge window for fixed files used with polled block IO" * tag 'io_uring-5.7-2020-05-15' of git://git.kernel.dk/linux-block: io_uring: polled fixed file must go through free iteration io_uring: fix zero len do_splice()
2020-05-15	Merge tag 'nfs-for-5.7-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs	Linus Torvalds
	Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - nfs: fix NULL deference in nfs4_get_valid_delegation Bugfixes: - Fix corruption of the return value in cachefiles_read_or_alloc_pages() - Fix several fscache cookie issues - Fix a fscache queuing race that can trigger a BUG_ON - NFS: Fix two use-after-free regressions due to the RPC_TASK_CRED_NOREF flag - SUNRPC: Fix a use-after-free regression in rpc_free_client_work() - SUNRPC: Fix a race when tearing down the rpc client debugfs directory - SUNRPC: Signalled ASYNC tasks need to exit - NFSv3: fix rpc receive buffer size for MOUNT call" * tag 'nfs-for-5.7-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv3: fix rpc receive buffer size for MOUNT call SUNRPC: 'Directory with parent 'rpc_clnt' already present!' NFS/pnfs: Don't use RPC_TASK_CRED_NOREF with pnfs NFS: Don't use RPC_TASK_CRED_NOREF with delegreturn SUNRPC: Signalled ASYNC tasks need to exit nfs: fix NULL deference in nfs4_get_valid_delegation SUNRPC: fix use-after-free in rpc_free_client_work() cachefiles: Fix race between read_waiter and read_copier involving op->to_do NFSv4: Fix fscache cookie aux_data to ensure change_attr is included NFS: Fix fscache super_cookie allocation NFS: Fix fscache super_cookie index_key from changing after umount cachefiles: Fix corruption of the return value in cachefiles_read_or_alloc_pages()
2020-05-15	fscrypt: add fscrypt_add_test_dummy_key()	Eric Biggers
	Currently, the test_dummy_encryption mount option (which is used for encryption I/O testing with xfstests) uses v1 encryption policies, and it relies on userspace inserting a test key into the session keyring. We need test_dummy_encryption to support v2 encryption policies too. Requiring userspace to add the test key doesn't work well with v2 policies, since v2 policies only support the filesystem keyring (not the session keyring), and keys in the filesystem keyring are lost when the filesystem is unmounted. Hooking all test code that unmounts and re-mounts the filesystem would be difficult. Instead, let's make the filesystem automatically add the test key to its keyring when test_dummy_encryption is enabled. That puts the responsibility for choosing the test key on the kernel. We could just hard-code a key. But out of paranoia, let's first try using a per-boot random key, to prevent this code from being misused. A per-boot key will work as long as no one expects dummy-encrypted files to remain accessible after a reboot. (gce-xfstests doesn't.) Therefore, this patch adds a function fscrypt_add_test_dummy_key() which implements the above. The next patch will use it. Link: https://lore.kernel.org/r/20200512233251.118314-3-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-15	io_uring: file registration list and lock optimization	Jens Axboe
	There's no point in using list_del_init() on entries that are going away, and the associated lock is always used in process context so let's not use the IRQ disabling+saving variant of the spinlock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-15	io_uring: add IORING_CQ_EVENTFD_DISABLED to the CQ ring flags	Stefano Garzarella
	This new flag should be set/clear from the application to disable/enable eventfd notifications when a request is completed and queued to the CQ ring. Before this patch, notifications were always sent if an eventfd is registered, so IORING_CQ_EVENTFD_DISABLED is not set during the initialization. It will be up to the application to set the flag after initialization if no notifications are required at the beginning. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-15	io_uring: add 'cq_flags' field for the CQ ring	Stefano Garzarella
	This patch adds the new 'cq_flags' field that should be written by the application and read by the kernel. This new field is available to the userspace application through 'cq_off.flags'. We are using 4-bytes previously reserved and set to zero. This means that if the application finds this field to zero, then the new functionality is not supported. In the next patch we will introduce the first flag available. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-15	io_uring: allow POLL_ADD with double poll_wait() users	Jens Axboe
	Some file descriptors use separate waitqueues for their f_ops->poll() handler, most commonly one for read and one for write. The io_uring poll implementation doesn't work with that, as the 2nd poll_wait() call will cause the io_uring poll request to -EINVAL. This affects (at least) tty devices and /dev/random as well. This is a big problem for event loops where some file descriptors work, and others don't. With this fix, io_uring handles multiple waitqueues. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-15	io_uring: batch reap of dead file registrations	Jens Axboe
	We currently embed and queue a work item per fixed_file_ref_node that we update, but if the workload does a lot of these, then the associated kworker-events overhead can become quite noticeable. Since we rarely need to wait on these, batch them at 1 second intervals instead. If we do need to wait for them, we just flush the pending delayed work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-15	scs: Add page accounting for shadow call stack allocations	Sami Tolvanen
	This change adds accounting for the memory allocated for shadow stacks. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2020-05-14	io_uring: name sq thread and ref completions	Jens Axboe
	We used to have three completions, now we just have two. With the two, let's not allocate them dynamically, just embed then in the ctx and name them appropriately. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-14	cifs: fix leaked reference on requeued write	Adam McCoy
	Failed async writes that are requeued may not clean up a refcount on the file, which can result in a leaked open. This scenario arises very reliably when using persistent handles and a reconnect occurs while writing. cifs_writev_requeue only releases the reference if the write fails (rc != 0). The server->ops->async_writev operation will take its own reference, so the initial reference can always be released. Signed-off-by: Adam McCoy <adam@forsedomani.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-05-14	NFSv3: fix rpc receive buffer size for MOUNT call	Olga Kornievskaia
	Prior to commit e3d3ab64dd66 ("SUNRPC: Use au_rslack when computing reply buffer size"), there was enough slack in the reply buffer to commodate filehandles of size 60bytes. However, the real problem was that the reply buffer size for the MOUNT operation was not correctly calculated. Received buffer size used the filehandle size for NFSv2 (32bytes) which is much smaller than the allowed filehandle size for the v3 mounts. Fix the reply buffer size (decode arguments size) for the MNT command. Fixes: 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply buffer size") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-14	epoll: call final ep_events_available() check under the lock	Roman Penyaev
	There is a possible race when ep_scan_ready_list() leaves ->rdllist and ->obflist empty for a short period of time although some events are pending. It is quite likely that ep_events_available() observes empty lists and goes to sleep. Since commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") we are conservative in wakeups (there is only one place for wakeup and this is ep_poll_callback()), thus ep_events_available() must always observe correct state of two lists. The easiest and correct way is to do the final check under the lock. This does not impact the performance, since lock is taken anyway for adding a wait entry to the wait queue. The discussion of the problem can be found here: https://lore.kernel.org/linux-fsdevel/a2f22c3c-c25a-4bda-8339-a7bdaf17849e@akamai.com/ In this patch barrierless __set_current_state() is used. This is safe since waitqueue_active() is called under the same lock on wakeup side. Short-circuit for fatal signals (i.e. fatal_signal_pending() check) is moved to the line just before actual events harvesting routine. This is fully compliant to what is said in the comment of the patch where the actual fatal_signal_pending() check was added: c257a340ede0 ("fs, epoll: short circuit fetching events if thread has been killed"). Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Reported-by: Jason Baron <jbaron@akamai.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jason Baron <jbaron@akamai.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200505145609.1865152-1-rpenyaev@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-14	cifs: Fix null pointer check in cifs_read	Steve French
	Coverity scan noted a redundant null check Coverity-id: 728517 Reported-by: Coverity <scan-admin@coverity.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
2020-05-14	vfs: add faccessat2 syscall	Miklos Szeredi
	POSIX defines faccessat() as having a fourth "flags" argument, while the linux syscall doesn't have it. Glibc tries to emulate AT_EACCESS and AT_SYMLINK_NOFOLLOW, but AT_EACCESS emulation is broken. Add a new faccessat(2) syscall with the added flags argument and implement both flags. The value of AT_EACCESS is defined in glibc headers to be the same as AT_REMOVEDIR. Use this value for the kernel interface as well, together with the explanatory comment. Also add AT_EMPTY_PATH support, which is not documented by POSIX, but can be useful and is trivial to implement. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-14	vfs: don't parse "silent" option	Miklos Szeredi
	Parsing "silent" and clearing SB_SILENT makes zero sense. Parsing "silent" and setting SB_SILENT would make a bit more sense, but apparently nobody cares. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	vfs: don't parse "posixacl" option	Miklos Szeredi
	Unlike the others, this is _not_ a standard option accepted by mount(8). In fact SB_POSIXACL is an internal flag, and accepting MS_POSIXACL on the mount(2) interface is possibly a bug. The only filesystem that apparently wants to handle the "posixacl" option is 9p, but it has special handling of that option besides setting SB_POSIXACL. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	vfs: don't parse forbidden flags	Miklos Szeredi
	Makes little sense to keep this blacklist synced with what mount(8) parses and what it doesn't. E.g. it has various forms of "*atime" options, but not "atime"... Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	statx: add mount_root	Miklos Szeredi
	Determining whether a path or file descriptor refers to a mountpoint (or more precisely a mount root) is not trivial using current tools. Add a flag to statx that indicates whether the path or fd refers to the root of a mount or not. Cc: linux-api@vger.kernel.org Cc: linux-man@vger.kernel.org Reported-by: Lennart Poettering <mzxreary@0pointer.de> Reported-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	statx: add mount ID	Miklos Szeredi
	Systemd is hacking around to get it and it's trivial to add to statx, so... Cc: linux-api@vger.kernel.org Cc: linux-man@vger.kernel.org Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	statx: don't clear STATX_ATIME on SB_RDONLY	Miklos Szeredi
	IS_NOATIME(inode) is defined as __IS_FLG(inode, SB_RDONLY\|SB_NOATIME), so generic_fillattr() will clear STATX_ATIME from the result_mask if the super block is marked read only. This was probably not the intention, so fix to only clear STATX_ATIME if the fs doesn't support atime at all. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	uapi: deprecate STATX_ALL	Miklos Szeredi
	Constants of the *_ALL type can be actively harmful due to the fact that developers will usually fail to consider the possible effects of future changes to the definition. Deprecate STATX_ALL in the uapi, while no damage has been done yet. We could keep something like this around in the kernel, but there's actually no point, since all filesystems should be explicitly checking flags that they support and not rely on the VFS masking unknown ones out: a flag could be known to the VFS, yet not known to the filesystem. Cc: David Howells <dhowells@redhat.com> Cc: linux-api@vger.kernel.org Cc: linux-man@vger.kernel.org Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	utimensat: AT_EMPTY_PATH support	Miklos Szeredi
	This makes it possible to use utimensat on an O_PATH file (including symlinks). It supersedes the nonstandard utimensat(fd, NULL, ...) form. Cc: linux-api@vger.kernel.org Cc: linux-man@vger.kernel.org Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	vfs: split out access_override_creds()	Miklos Szeredi
	Split out a helper that overrides the credentials in preparation for actually doing the access check. This prepares for the next patch that optionally disables the creds override. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-14	proc/mounts: add cursor	Miklos Szeredi
	If mounts are deleted after a read(2) call on /proc/self/mounts (or its kin), the subsequent read(2) could miss a mount that comes after the deleted one in the list. This is because the file position is interpreted as the number mount entries from the start of the list. E.g. first read gets entries #0 to #9; the seq file index will be 10. Then entry #5 is deleted, resulting in #10 becoming #9 and #11 becoming #10, etc... The next read will continue from entry #10, and #9 is missed. Solve this by adding a cursor entry for each open instance. Taking the global namespace_sem for write seems excessive, since we are only dealing with a per-namespace list. Instead add a per-namespace spinlock and use that together with namespace_sem taken for read to protect against concurrent modification of the mount list. This may reduce parallelism of is_local_mountpoint(), but it's hardly a big contention point. We could also use RCU freeing of cursors to make traversal not need additional locks, if that turns out to be neceesary. Only move the cursor once for each read (cursor is not added on open) to minimize cacheline invalidation. When EOF is reached, the cursor is taken off the list, in order to prevent an excessive number of cursors due to inactive open file descriptors. Reported-by: Karel Zak <kzak@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-14	aio: fix async fsync creds	Miklos Szeredi
	Avi Kivity reports that on fuse filesystems running in a user namespace asyncronous fsync fails with EOVERFLOW. The reason is that f_ops->fsync() is called with the creds of the kthread performing aio work instead of the creds of the process originally submitting IOCB_CMD_FSYNC. Fuse sends the creds of the caller in the request header and it needs to translate the uid and gid into the server's user namespace. Since the kthread is running in init_user_ns, the translation will fail and the operation returns an error. It can be argued that fsync doesn't actually need any creds, but just zeroing out those fields in the header (as with requests that currently don't take creds) is a backward compatibility risk. Instead of working around this issue in fuse, solve the core of the problem by calling the filesystem with the proper creds. Reported-by: Avi Kivity <avi@scylladb.com> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Fixes: c9582eb0ff7d ("fuse: Fail all requests with invalid uids or gids") Cc: stable@vger.kernel.org # 4.18+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14	vfs: allow unprivileged whiteout creation	Miklos Szeredi
	Whiteouts, unlike real device node should not require privileges to create. The general concern with device nodes is that opening them can have side effects. The kernel already avoids zero major (see Documentation/admin-guide/devices.txt). To be on the safe side the patch explicitly forbids registering a char device with 0/0 number (see cdev_add()). This guarantees that a non-O_PATH open on a whiteout will fail with ENODEV; i.e. it won't have any side effect. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-13	io_uring: polled fixed file must go through free iteration	Jens Axboe
	When we changed the file registration handling, it became important to iterate the bulk request freeing list for fixed files as well, or we miss dropping the fixed file reference. If not, we're leaking references, and we'll get a kworker stuck waiting for file references to disappear. This also means we can remove the special casing of fixed vs non-fixed files, we need to iterate for both and we can just rely on __io_req_aux_free() doing io_put_file() instead of doing it manually. Fixes: 055895537302 ("io_uring: refactor file register/unregister/update handling") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-13	NFS/pnfs: Don't use RPC_TASK_CRED_NOREF with pnfs	Trond Myklebust
	When we're doing pnfs then the credential being used for the RPC call is not necessarily the same as the one used in the open context, so don't use RPC_TASK_CRED_NOREF. Fixes: 612965072020 ("NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-13	ovl: potential crash in ovl_fid_to_fh()	Dan Carpenter
	The "buflen" value comes from the user and there is a potential that it could be zero. In do_handle_to_path() we know that "handle->handle_bytes" is non-zero and we do: handle_dwords = handle->handle_bytes >> 2; So values 1-3 become zero. Then in ovl_fh_to_dentry() we do: int len = fh_len << 2; So now len is in the "0,4-128" range and a multiple of 4. But if "buflen" is zero it will try to copy negative bytes when we do the memcpy in ovl_fid_to_fh(). memcpy(&fh->fb, fid, buflen - OVL_FH_WIRE_OFFSET); And that will lead to a crash. Thanks to Amir Goldstein for his help with this patch. Fixes: cbe7fba8edfc ("ovl: make sure that real fid is 32bit aligned in memory") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Cc: <stable@vger.kernel.org> # v5.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-05-12	zonefs: use REQ_OP_ZONE_APPEND for sync DIO	Johannes Thumshirn
	Synchronous direct I/O to a sequential write only zone can be issued using the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple BIOs can potentially result in reordering, we cannot support asynchronous IO via this interface. We also can only dispatch up to queue_max_zone_append_sectors() via the new zone-append method and have to return a short write back to user-space in case an IO larger than queue_max_zone_append_sectors() has been issued. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-12	block: add blk_io_schedule() for avoiding task hung in sync dio	Ming Lei
	Sync dio could be big, or may take long time in discard or in case of IO failure. We have prevented task hung in submit_bio_wait() and blk_execute_rq(), so apply the same trick for prevent task hung from happening in sync dio. Add helper of blk_io_schedule() and use io_schedule_timeout() to prevent task hung warning. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Cc: Salman Qazi <sqazi@google.com> Cc: Jesse Barnes <jsbarnes@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-12	fs-verity: remove unnecessary extern keywords	Eric Biggers
	Remove the unnecessary 'extern' keywords from function declarations. This makes it so that we don't have a mix of both styles, so it won't be ambiguous what to use in new fs-verity patches. This also makes the code shorter and matches the 'checkpatch --strict' expectation. Link: https://lore.kernel.org/r/20200511192118.71427-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-12	fs-verity: fix all kerneldoc warnings	Eric Biggers
	Fix all kerneldoc warnings in fs/verity/ and include/linux/fsverity.h. Most of these were due to missing documentation for function parameters. Detected with: scripts/kernel-doc -v -none fs/verity/*.{c,h} include/linux/fsverity.h This cleanup makes it possible to check new patches for kerneldoc warnings without having to filter out all the existing ones. Link: https://lore.kernel.org/r/20200511192118.71427-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-12	fscrypt: remove unnecessary extern keywords	Eric Biggers
	Remove the unnecessary 'extern' keywords from function declarations. This makes it so that we don't have a mix of both styles, so it won't be ambiguous what to use in new fscrypt patches. This also makes the code shorter and matches the 'checkpatch --strict' expectation. Link: https://lore.kernel.org/r/20200511191358.53096-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-12	fscrypt: fix all kerneldoc warnings	Eric Biggers
	Fix all kerneldoc warnings in fs/crypto/ and include/linux/fscrypt.h. Most of these were due to missing documentation for function parameters. Detected with: scripts/kernel-doc -v -none fs/crypto/*.{c,h} include/linux/fscrypt.h This cleanup makes it possible to check new patches for kerneldoc warnings without having to filter out all the existing ones. For consistency, also adjust some function "brief descriptions" to include the parentheses and to wrap at 80 characters. (The latter matches the checkpatch expectation.) Link: https://lore.kernel.org/r/20200511191358.53096-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-05-12	Merge tag 'gfs2-v5.7-rc1.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Various gfs2 fixes. Fixes for bugs prior to v5.7: - Fix random block reads when reading fragmented journals (v5.2) - Fix a possible random memory access in gfs2_walk_metadata (v5.3) Fixes for v5.7: - Fix several overlooked gfs2_qa_get / gfs2_qa_put imbalances - Fix several bugs in the new filesystem withdraw logic" * tag 'gfs2-v5.7-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Don't demote a glock until its revokes are written" gfs2: If go_sync returns error, withdraw but skip invalidate gfs2: Grab glock reference sooner in gfs2_add_revoke gfs2: don't call quota_unhold if quotas are not locked gfs2: move privileged user check to gfs2_quota_lock_check gfs2: remove check for quotas on in gfs2_quota_check gfs2: Change BUG_ON to an assert_withdraw in gfs2_quota_change gfs2: Fix problems regarding gfs2_qa_get and _put gfs2: More gfs2_find_jhead fixes gfs2: Another gfs2_walk_metadata fix gfs2: Fix use-after-free in gfs2_logd after withdraw gfs2: Fix BUG during unmount after file system withdraw gfs2: Fix error exit in do_xmote gfs2: fix withdraw sequence deadlock
2020-05-12	pstore: Refactor pstorefs record list removal	Kees Cook
	The "unlink" handling should perform list removal (which can also make sure records don't get double-erased), and the "evict" handling should be responsible only for memory freeing. Link: https://lore.kernel.org/lkml/20200506152114.50375-8-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-12	pstore: Add proper unregister lock checking	Kees Cook
	The pstore backend lock wasn't being used during pstore_unregister(). Add sanity check and locking. Link: https://lore.kernel.org/lkml/20200506152114.50375-7-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-12	pstore: Convert "records_list" locking to mutex	Kees Cook
	The pstorefs internal list lock doesn't need to be a spinlock and will create problems when trying to access the list in the subsequent patch that will walk the pstorefs records during pstore_unregister(). Change this to a mutex to avoid may_sleep() warnings when unregistering devices. Link: https://lore.kernel.org/lkml/20200506152114.50375-6-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-12	pstore: Rename "allpstore" to "records_list"	Kees Cook
	The name "allpstore" doesn't carry much meaning, so rename it to what it actually is: the list of all records present in the filesystem. The lock is also renamed accordingly. Link: https://lore.kernel.org/lkml/20200506152114.50375-5-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-12	pstore: Convert "psinfo" locking to mutex	Kees Cook
	Currently pstore can only have a single backend attached at a time, and it tracks the active backend via "psinfo", under a lock. The locking for this does not need to be a spinlock, and in order to avoid may_sleep() issues during future changes to pstore_unregister(), switch to a mutex instead. Link: https://lore.kernel.org/lkml/20200506152114.50375-4-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-12	pstore: Rename "pstore_lock" to "psinfo_lock"	Kees Cook
	The name "pstore_lock" sounds very global, but it is only supposed to be used for managing changes to "psinfo", so rename it accordingly. Link: https://lore.kernel.org/lkml/20200506152114.50375-3-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-12	pstore: Drop useless try_module_get() for backend	Kees Cook
	There is no reason to be doing a module get/put in pstore_register(), since the module calling pstore_register() cannot be unloaded since it hasn't finished its initialization. Remove it so there is no confusion about how registration ordering works. Link: https://lore.kernel.org/lkml/20200506152114.50375-2-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org>
2020-05-11	NFS: Don't use RPC_TASK_CRED_NOREF with delegreturn	Trond Myklebust
	We are not guaranteed that the credential will remain pinned. Fixes: 612965072020 ("NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-11	Merge tag 'fscache-fixes-20200508-2' of ↵	Trond Myklebust
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs (1) The reorganisation of bmap() use accidentally caused the return value of cachefiles_read_or_alloc_pages() to get corrupted. (2) The NFS superblock index key accidentally got changed to include a number of kernel pointers - meaning that the key isn't matchable after a reboot. (3) A redundant check in nfs_fscache_get_super_cookie(). (4) The NFS change_attr sometimes set in the auxiliary data for the caching of an file and sometimes not, which causes the cache to get discarded when it shouldn't. (5) There's a race between cachefiles_read_waiter() and cachefiles_read_copier() that causes an occasional assertion failure.
2020-05-11	nfs: fix NULL deference in nfs4_get_valid_delegation	J. Bruce Fields
	We add the new state to the nfsi->open_states list, making it potentially visible to other threads, before we've finished initializing it. That wasn't a problem when all the readers were also taking the i_lock (as we do here), but since we switched to RCU, there's now a possibility that a reader could see the partially initialized state. Symptoms observed were a crash when another thread called nfs4_get_valid_delegation() on a NULL inode, resulting in an oops like: BUG: unable to handle page fault for address: ffffffffffffffb0 ... RIP: 0010:nfs4_get_valid_delegation+0x6/0x30 [nfsv4] ... Call Trace: nfs4_open_prepare+0x80/0x1c0 [nfsv4] __rpc_execute+0x75/0x390 [sunrpc] ? finish_task_switch+0x75/0x260 rpc_async_schedule+0x29/0x40 [sunrpc] process_one_work+0x1ad/0x370 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x10c/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 Fixes: 9ae075fdd190 "NFSv4: Convert open state lookup to use RCU" Reviewed-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com> Tested-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-11	rxrpc: Fix the excessive initial retransmission timeout	David Howells
	rxrpc currently uses a fixed 4s retransmission timeout until the RTT is sufficiently sampled. This can cause problems with some fileservers with calls to the cache manager in the afs filesystem being dropped from the fileserver because a packet goes missing and the retransmission timeout is greater than the call expiry timeout. Fix this by: (1) Copying the RTT/RTO calculation code from Linux's TCP implementation and altering it to fit rxrpc. (2) Altering the various users of the RTT to make use of the new SRTT value. (3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO value instead (which is needed in jiffies), along with a backoff. Notes: (1) rxrpc provides RTT samples by matching the serial numbers on outgoing DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets against the reference serial number in incoming REQUESTED ACK and PING-RESPONSE ACK packets. (2) Each packet that is transmitted on an rxrpc connection gets a new per-connection serial number, even for retransmissions, so an ACK can be cross-referenced to a specific trigger packet. This allows RTT information to be drawn from retransmitted DATA packets also. (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than on an rxrpc_call because many RPC calls won't live long enough to generate more than one sample. (4) The calculated SRTT value is in units of 8ths of a microsecond rather than nanoseconds. The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers. Fixes: 17926a79320a ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"") Signed-off-by: David Howells <dhowells@redhat.com>