summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-07-12Merge tag 'io_uring-5.8-2020-07-12' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Two late fixes again: - Fix missing msg_name assignment in certain cases (Pavel) - Correct a previous fix for full coverage (Pavel)" * tag 'io_uring-5.8-2020-07-12' of git://git.kernel.dk/linux-block: io_uring: fix not initialised work->flags io_uring: fix missing msg_name assignment
2020-07-12Merge tag 'for-5.8-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two refcounting fixes and one prepartory patch for upcoming splice cleanup: - fix double put of block group with nodatacow - fix missing block group put when remounting with discard=async - explicitly set splice callback (no functional change), to ease integrating splice cleanup patches" * tag 'for-5.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: wire up iter_file_splice_write btrfs: fix double put of block group with nocow btrfs: discard: add missing put when grabbing block group from unused list
2020-07-12io_uring: fix not initialised work->flagsPavel Begunkov
59960b9deb535 ("io_uring: fix lazy work init") tried to fix missing io_req_init_async(), but left out work.flags and hash. Do it earlier. Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-12io_uring: fix missing msg_name assignmentPavel Begunkov
Ensure to set msg.msg_name for the async portion of send/recvmsg, as the header copy will copy to/from it. Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
All conflicts seemed rather trivial, with some guidance from Saeed Mameed on the tc_ct.c one. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10Merge tag '5.8-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Four cifs/smb3 fixes: the three for stable fix problems found recently with change notification including a reference count leak" * tag '5.8-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number cifs: fix reference leak for tlink smb3: fix unneeded error message on change notify cifs: remove the retry in cifs_poxis_lock_set smb3: fix access denied on change notify request to some servers
2020-07-10seccomp: Report number of loaded filters in /proc/$pid/statusKees Cook
A common question asked when debugging seccomp filters is "how many filters are attached to your process?" Provide a way to easily answer this question through /proc/$pid/status with a "Seccomp_filters" line. Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10debugfs: make sure we can remove u32_array files cleanlyJakub Kicinski
debugfs_create_u32_array() allocates a small structure to wrap the data and size information about the array. If users ever try to remove the file this leads to a leak since nothing ever frees this wrapper. That said there are no upstream users of debugfs_create_u32_array() that'd remove a u32 array file (we only have one u32 array user in CMA), so there is no real bug here. Make callers pass a wrapper they allocated. This way the lifetime management of the wrapper is on the caller, and we can avoid the potential leak in debugfs. CC: Chucheng Luo <luochucheng@vivo.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10Merge tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Fix memleak for error path in registered files (Yang) - Export CQ overflow state in flags, necessary to fix a case where liburing doesn't know if it needs to enter the kernel (Xiaoguang) - Fix for a regression in when user memory is accounted freed, causing issues with back-to-back ring exit + init if the ulimit -l setting is very tight. * tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block: io_uring: account user memory freed when exit has been queued io_uring: fix memleak in io_sqe_files_register() io_uring: fix memleak in __io_sqe_files_update() io_uring: export cq overflow status to userspace
2020-07-10Merge tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/miscLinus Torvalds
Pull in-kernel read and write op cleanups from Christoph Hellwig: "Cleanup in-kernel read and write operations Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure all users of in-kernel file I/O use them if they don't use iov_iter based methods already. The new WARN_ONs in combination with syzcaller already found a missing input validation in 9p. The fix should be on your way through the maintainer ASAP". [ This is prep-work for the real changes coming 5.9 ] * tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc: fs: remove __vfs_read fs: implement kernel_read using __kernel_read integrity/ima: switch to using __kernel_read fs: add a __kernel_read helper fs: remove __vfs_write fs: implement kernel_write using __kernel_write fs: check FMODE_WRITE in __kernel_write fs: unexport __kernel_write bpfilter: switch to kernel_write autofs: switch to kernel_write cachefiles: switch to kernel_write
2020-07-10Merge tag 'gfs2-v5.8-rc4.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Fix gfs2 readahead deadlocks by adding a IOCB_NOIO flag that allows gfs2 to use the generic fiel read iterator functions without having to worry about being called back while holding locks". * tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Rework read and page fault locking fs: Add IOCB_NOIO flag for generic_file_read_iter
2020-07-10io_uring: account user memory freed when exit has been queuedJens Axboe
We currently account the memory after the exit work has been run, but that leaves a gap where a process has closed its ring and until the memory has been accounted as freed. If the memlocked ulimit is borderline, then that can introduce spurious setup errors returning -ENOMEM because the free work hasn't been run yet. Account this as freed when we close the ring, as not to expose a tiny gap where setting up a new ring can fail. Fixes: 85faa7b8346e ("io_uring: punt final io_ring_ctx wait-and-free to workqueue") Cc: stable@vger.kernel.org # v5.7 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-10io_uring: fix memleak in io_sqe_files_register()Yang Yingliang
I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0x607eeac06e78 (size 8): comm "test", pid 295, jiffies 4294735835 (age 31.745s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0 [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0 [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480 [<00000000591b89a6>] do_syscall_64+0x56/0xa0 [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Call percpu_ref_exit() on error path to avoid refcount memleak. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-10debugfs: file: Remove unnecessary cast in kfree()Xu Wang
Remove unnecassary casts in the argument to kfree. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Link: https://lore.kernel.org/r/20200709054033.30148-1-vulab@iscas.ac.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-09io_uring: remove dead 'ctx' argument and move forward declarationJens Axboe
We don't use 'ctx' at all in io_sq_thread_drop_mm(), it just works on the mm of the current task. Drop the argument. Move io_file_put_work() to where we have the other forward declarations of functions. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-09btrfs: wire up iter_file_splice_writeChristoph Hellwig
btrfs implements the iter_write op and thus can use the more efficient iov_iter based splice implementation. For now falling back to the less efficient default is pretty harmless, but I have a pending series that removes the default, and thus would cause btrfs to not support splice at all. Reported-by: Andy Lavr <andy.lavr@gmail.com> Tested-by: Andy Lavr <andy.lavr@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-09xfs: Fix false positive lockdep warning with sb_internal & fs_reclaimWaiman Long
Depending on the workloads, the following circular locking dependency warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo lock) may show up: ====================================================== WARNING: possible circular locking dependency detected 5.0.0-rc1+ #60 Tainted: G W ------------------------------------------------------ fsfreeze/4346 is trying to acquire lock: 0000000026f1d784 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.19+0x5/0x30 but task is already holding lock: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650 which lock already depends on the new lock. : Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_internal); lock(fs_reclaim); lock(sb_internal); lock(fs_reclaim); *** DEADLOCK *** 4 locks held by fsfreeze/4346: #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650 #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290 #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650 #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650 stack backtrace: Call Trace: dump_stack+0xe0/0x19a print_circular_bug.isra.10.cold.34+0x2f4/0x435 check_prev_add.constprop.19+0xca1/0x15f0 validate_chain.isra.14+0x11af/0x3b50 __lock_acquire+0x728/0x1200 lock_acquire+0x269/0x5a0 fs_reclaim_acquire.part.19+0x29/0x30 fs_reclaim_acquire+0x19/0x20 kmem_cache_alloc+0x3e/0x3f0 kmem_zone_alloc+0x79/0x150 xfs_trans_alloc+0xfa/0x9d0 xfs_sync_sb+0x86/0x170 xfs_log_sbcount+0x10f/0x140 xfs_quiesce_attr+0x134/0x270 xfs_fs_freeze+0x4a/0x70 freeze_super+0x1af/0x290 do_vfs_ioctl+0xedc/0x16c0 ksys_ioctl+0x41/0x80 __x64_sys_ioctl+0x73/0xa9 do_syscall_64+0x18f/0xd23 entry_SYSCALL_64_after_hwframe+0x49/0xbe This is a false positive as all the dirty pages are flushed out before the filesystem can be frozen. One way to avoid this splat is to add GFP_NOFS to the affected allocation calls by using the memalloc_nofs_save()/memalloc_nofs_restore() pair. This shouldn't matter unless the system is really running out of memory. In that particular case, the filesystem freeze operation may fail while it was succeeding previously. Without this patch, the command sequence below will show that the lock dependency chain sb_internal -> fs_reclaim exists. # fsfreeze -f /home # fsfreeze --unfreeze /home # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal After applying the patch, such sb_internal -> fs_reclaim lock dependency chain can no longer be found. Because of that, the locking dependency warning will not be shown. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-09btrfs: fix double put of block group with nocowJosef Bacik
While debugging a patch that I wrote I was hitting use-after-free panics when accessing block groups on unmount. This turned out to be because in the nocow case if we bail out of doing the nocow for whatever reason we need to call btrfs_dec_nocow_writers() if we called the inc. This puts our block group, but a few error cases does if (nocow) { btrfs_dec_nocow_writers(); goto error; } unfortunately, error is error: if (nocow) btrfs_dec_nocow_writers(); so we get a double put on our block group. Fix this by dropping the error cases calling of btrfs_dec_nocow_writers(), as it's handled at the error label now. Fixes: 762bf09893b4 ("btrfs: improve error handling in run_delalloc_nocow") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-09io_uring: get rid of __req_need_defer()Jens Axboe
We just have one caller of this, req_need_defer(), just inline the code in there instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-09cifs: update internal module version numberSteve French
To 2.28 Signed-off-by: Steve French <stfrench@microsoft.com>
2020-07-09cifs: fix reference leak for tlinkRonnie Sahlberg
Don't leak a reference to tlink during the NOTIFY ioctl Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> # v5.6+
2020-07-09efi/efivars: Expose RT service availability via efivars abstractionArd Biesheuvel
Commit bf67fad19e493b ("efi: Use more granular check for availability for variable services") introduced a check into the efivarfs, efi-pstore and other drivers that aborts loading of the module if not all three variable runtime services (GetVariable, SetVariable and GetNextVariable) are supported. However, this results in efivarfs being unavailable entirely if only SetVariable support is missing, which is only needed if you want to make any modifications. Also, efi-pstore and the sysfs EFI variable interface could be backed by another implementation of the 'efivars' abstraction, in which case it is completely irrelevant which services are supported by the EFI firmware. So make the generic 'efivars' abstraction dependent on the availibility of the GetVariable and GetNextVariable EFI runtime services, and add a helper 'efivar_supports_writes()' to find out whether the currently active efivars abstraction supports writes (and wire it up to the availability of SetVariable for the generic one). Then, use the efivar_supports_writes() helper to decide whether to permit efivarfs to be mounted read-write, and whether to enable efi-pstore or the sysfs EFI variable interface altogether. Fixes: bf67fad19e493b ("efi: Use more granular check for availability for variable services") Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-07-09Replace HTTP links with HTTPS ones: DISKQUOTAAlexander A. Klimov
Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Link: https://lore.kernel.org/r/20200708171905.15396-1-grandmaster@al2klimov.de Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-09ext2: initialize quota info in ext2_xattr_set()Chengguang Xu
In order to correctly account/limit space usage, should initialize quota info before calling quota related functions. Link: https://lore.kernel.org/r/20200626054959.114177-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-09ext2: fix some incorrect comments in inode.cChengguang Xu
There are some incorrect comments in inode.c, so fix them properly. Link: https://lore.kernel.org/r/20200703124411.24085-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-09ext2: remove nocheck optionChengguang Xu
Remove useless nocheck option. Link: https://lore.kernel.org/r/20200619073144.4701-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-09ext2: fix missing percpu_counter_incMikulas Patocka
sbi->s_freeinodes_counter is only decreased by the ext2 code, it is never increased. This patch fixes it. Note that sbi->s_freeinodes_counter is only used in the algorithm that tries to find the group for new allocations, so this bug is not easily visible (the only visibility is that the group finding algorithm selects inoptinal result). Link: https://lore.kernel.org/r/alpine.LRH.2.02.2004201538300.19436@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-09ext2: ext2_find_entry() return -ENOENT if no entry foundzhangyi (F)
Almost all callers of ext2_find_entry() transform NULL return value to -ENOENT, so just let ext2_find_entry() retuen -ENOENT instead of NULL if no valid entry found, and also switch to check the return value of ext2_inode_by_name() in ext2_lookup() and ext2_get_parent(). Link: https://lore.kernel.org/r/20200608034043.10451-2-yi.zhang@huawei.com Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-09ext2: propagate errors up to ext2_find_entry()'s callerszhangyi (F)
The same to commit <36de928641ee4> (ext4: propagate errors up to ext4_find_entry()'s callers') in ext4, also return error instead of NULL pointer in case of some error happens in ext2_find_entry() (e.g. -ENOMEM or -EIO). This could avoid a negative dentry cache entry installed even it failed to read directory block due to IO error. Link: https://lore.kernel.org/r/20200608034043.10451-1-yi.zhang@huawei.com Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-09ext2: fix improper assignment for e_value_offsChengguang Xu
In the process of changing value for existing EA, there is an improper assignment of e_value_offs(setting to 0), because it will be reset to incorrect value in the following loop(shifting EA values before target). Delayed assignment can avoid this issue. Link: https://lore.kernel.org/r/20200603084429.25344-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-08f2fs: don't keep meta inode pages used for compressed block migrationChao Yu
meta inode's pages are used for encrypted, verity and compressed blocks, so the meta inode's cache invalidation condition in do_checkpoint() should consider compression as well, not just for verity and encryption, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-08io_uring: fix memleak in __io_sqe_files_update()Yang Yingliang
I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0xffff888113e02300 (size 488): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`....... backtrace: [<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 BUG: memory leak unreferenced object 0xffff8881152dd5e0 (size 16): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 16 bytes): 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline] [<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440 [<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 If io_sqe_file_register() failed, we need put the file that get by fget() to avoid the memleak. Fixes: c3a31e605620 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08io_uring: export cq overflow status to userspaceXiaoguang Wang
For those applications which are not willing to use io_uring_enter() to reap and handle cqes, they may completely rely on liburing's io_uring_peek_cqe(), but if cq ring has overflowed, currently because io_uring_peek_cqe() is not aware of this overflow, it won't enter kernel to flush cqes, below test program can reveal this bug: static void test_cq_overflow(struct io_uring *ring) { struct io_uring_cqe *cqe; struct io_uring_sqe *sqe; int issued = 0; int ret = 0; do { sqe = io_uring_get_sqe(ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); break;; } ret = io_uring_submit(ring); if (ret <= 0) { if (ret != -EBUSY) fprintf(stderr, "sqe submit failed: %d\n", ret); break; } issued++; } while (ret > 0); assert(ret == -EBUSY); printf("issued requests: %d\n", issued); while (issued) { ret = io_uring_peek_cqe(ring, &cqe); if (ret) { if (ret != -EAGAIN) { fprintf(stderr, "peek completion failed: %s\n", strerror(ret)); break; } printf("left requets: %d\n", issued); continue; } io_uring_cqe_seen(ring, cqe); issued--; printf("left requets: %d\n", issued); } } int main(int argc, char *argv[]) { int ret; struct io_uring ring; ret = io_uring_queue_init(16, &ring, 0); if (ret) { fprintf(stderr, "ring setup failed: %d\n", ret); return 1; } test_cq_overflow(&ring); return 0; } To fix this issue, export cq overflow status to userspace by adding new IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08writeback: remove bdi->congested_fnChristoph Hellwig
Except for pktdvd, the only places setting congested bits are file systems that allocate their own backing_dev_info structures. And pktdvd is a deprecated driver that isn't useful in stack setup either. So remove the dead congested_fn stacking infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [axboe: fixup unused variables in bcache/request.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08isofs: remove a stale commentChristoph Hellwig
check_disk_change isn't for consumers of the block layer, so remove the comment mentioning it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08block: remove flush_diskChristoph Hellwig
flush_disk has only two callers, so open code it there. That also helps clarifying the error message for the particular case, and allows to remove setting bd_invalidated in check_disk_size_change, which will be cleared again instantly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08io_uring: only call kfree() for a non-zero pointerJens Axboe
It's safe to call kfree() with a NULL pointer, but it's also pointless. Most of the time we don't have any data to free, and at millions of requests per second, the redundant function call adds noticeable overhead (about 1.3% of the runtime). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08io_uring: fix a use after free in io_async_task_func()Dan Carpenter
The "apoll" variable is freed and then used on the next line. We need to move the free down a few lines. Fixes: 0be0b0e33b0b ("io_uring: simplify io_async_task_func()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08ext4: add inline encryption supportEric Biggers
Wire up ext4 to support inline encryption via the helper functions which fs/crypto/ now provides. This includes: - Adding a mount option 'inlinecrypt' which enables inline encryption on encrypted files where it can be used. - Setting the bio_crypt_ctx on bios that will be submitted to an inline-encrypted file. Note: submit_bh_wbc() in fs/buffer.c also needed to be patched for this part, since ext4 sometimes uses ll_rw_block() on file data. - Not adding logically discontiguous data to bios that will be submitted to an inline-encrypted file. - Not doing filesystem-layer crypto on inline-encrypted files. Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20200702015607.1215430-5-satyat@google.com Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-08f2fs: add inline encryption supportSatya Tangirala
Wire up f2fs to support inline encryption via the helper functions which fs/crypto/ now provides. This includes: - Adding a mount option 'inlinecrypt' which enables inline encryption on encrypted files where it can be used. - Setting the bio_crypt_ctx on bios that will be submitted to an inline-encrypted file. - Not adding logically discontiguous data to bios that will be submitted to an inline-encrypted file. - Not doing filesystem-layer crypto on inline-encrypted files. This patch includes a fix for a race during IPU by Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Satya Tangirala <satyat@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200702015607.1215430-4-satyat@google.com Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-08fscrypt: add inline encryption supportSatya Tangirala
Add support for inline encryption to fs/crypto/. With "inline encryption", the block layer handles the decryption/encryption as part of the bio, instead of the filesystem doing the crypto itself via Linux's crypto API. This model is needed in order to take advantage of the inline encryption hardware present on most modern mobile SoCs. To use inline encryption, the filesystem needs to be mounted with '-o inlinecrypt'. Blk-crypto will then be used instead of the traditional filesystem-layer crypto whenever possible to encrypt the contents of any encrypted files in that filesystem. Fscrypt still provides the key and IV to use, and the actual ciphertext on-disk is still the same; therefore it's testable using the existing fscrypt ciphertext verification tests. Note that since blk-crypto has a fallback to Linux's crypto API, and also supports all the encryption modes currently supported by fscrypt, this feature is usable and testable even without actual inline encryption hardware. Per-filesystem changes will be needed to set encryption contexts when submitting bios and to implement the 'inlinecrypt' mount option. This patch just adds the common code. Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20200702015607.1215430-3-satyat@google.com Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-08f2fs: fix error path in do_recover_data()Chao Yu
- don't panic kernel if f2fs_get_node_page() fails in f2fs_recover_inline_data() or f2fs_recover_inline_xattr(); - return error number of f2fs_truncate_blocks() to f2fs_recover_inline_data()'s caller; Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-08f2fs: fix to wait GCed compressed page writebackChao Yu
like we did for encrypted page. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-08f2fs: remove write attribute of main_blkaddr sysfs nodeDehe Gu
Fuzzing main_blkaddr sysfs node will corrupt this field's value, causing kernel panic, remove its write attribute to avoid potential security risk. [Chao Yu: add description] Signed-off-by: Dehe Gu <gudehe@huawei.com> Signed-off-by: Daiyue Zhang <zhangdaiyue1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-08smb3: fix unneeded error message on change notifySteve French
We should not be logging a warning repeatedly on change notify. CC: Stable <stable@vger.kernel.org> # v5.6+ Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-07-08fs: remove __vfs_readChristoph Hellwig
Fold it into the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08fs: implement kernel_read using __kernel_readChristoph Hellwig
Consolidate the two in-kernel read helpers to make upcoming changes easier. The only difference are the missing call to rw_verify_area in kernel_read, and an access_ok check that doesn't make sense for kernel buffers to start with. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08fs: add a __kernel_read helperChristoph Hellwig
This is the counterpart to __kernel_write, and skip the rw_verify_area call compared to kernel_read. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08fs: remove __vfs_writeChristoph Hellwig
Fold it into the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08fs: implement kernel_write using __kernel_writeChristoph Hellwig
Consolidate the two in-kernel write helpers to make upcoming changes easier. The only difference are the missing call to rw_verify_area in kernel_write, and an access_ok check that doesn't make sense for kernel buffers to start with. Signed-off-by: Christoph Hellwig <hch@lst.de>