summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-05-01openpromfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ocfs2: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01dlmfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01nilfs2: switch to ->free_inode()Al Viro
kill an extern that went stale 9 years ago, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01nfs{,4}: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01minix: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01jffs2: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01isofs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01hpfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01hostfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01hfsplus: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01hfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01gfs2: switch to ->free_inode()Al Viro
... and use GFS2_I() to get the containing gfs2_inode by inode; yes, we can feed the address of the first member of structure to kmem_cache_free(), but let's do it in an obviously safe way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01freevxfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01fat: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01f2fs: switch to ->free_inode()Al Viro
Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ext2: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01efs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01debugfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01cifs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01bdev: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01bfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01befs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01affs: switch to ->free_inode()Al Viro
Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01adfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-019p: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01new inode method: ->free_inode()Al Viro
A lot of ->destroy_inode() instances end with call_rcu() of a callback that does RCU-delayed part of freeing. Introduce a new method for doing just that, with saner signature. Rules: ->destroy_inode ->free_inode f g immediate call of f(), RCU-delayed call of g() f NULL immediate call of f(), no RCU-delayed calls NULL g RCU-delayed call of g() NULL NULL RCU-delayed default freeing IOW, NULL ->free_inode gives the same behaviour as now. Note that NULL, NULL is equivalent to NULL, free_inode_nonrcu; we could mandate the latter form, but that would have very little benefit beyond making rules a bit more symmetric. It would break backwards compatibility, require extra boilerplate and expected semantics for (NULL, NULL) pair would have no use whatsoever... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01gcc-9: don't warn about uninitialized btrfs extent_type variableLinus Torvalds
The 'extent_type' variable does seem to be reliably initialized, but it's _very_ non-obvious, since there's a "goto next" case that jumps over the normal initialization. That will then always trigger the "start >= extent_end" test, which will end up never falling through to the use of that variable. But the code is certainly not obvious, and the compiler warning looks reasonable. Make 'extent_type' an int, and initialize it to an invalid negative value, which seems to be the common pattern in other places. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-01fsnotify: Clarify connector assignment in fsnotify_add_mark_list()Jan Kara
Add a comment explaining why WRITE_ONCE() is enough when setting mark->connector which can get dereferenced by RCU protected readers. Signed-off-by: Jan Kara <jack@suse.cz>
2019-05-01io_uring: avoid page allocation warningsMark Rutland
In io_sqe_buffer_register() we allocate a number of arrays based on the iov_len from the user-provided iov. While we limit iov_len to SZ_1G, we can still attempt to allocate arrays exceeding MAX_ORDER. On a 64-bit system with 4KiB pages, for an iov where iov_base = 0x10 and iov_len = SZ_1G, we'll calculate that nr_pages = 262145. When we try to allocate a corresponding array of (16-byte) bio_vecs, requiring 4194320 bytes, which is greater than 4MiB. This results in SLUB warning that we're trying to allocate greater than MAX_ORDER, and failing the allocation. Avoid this by using kvmalloc() for allocations dependent on the user-provided iov_len. At the same time, fix a leak of imu->bvec when registration fails. Full splat from before this patch: WARNING: CPU: 1 PID: 2314 at mm/page_alloc.c:4595 __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 2314 Comm: syz-executor326 Not tainted 5.1.0-rc7-dirty #4 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193 show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x190 lib/dump_stack.c:113 panic+0x384/0x68c kernel/panic.c:214 __warn+0x2bc/0x2c0 kernel/panic.c:571 report_bug+0x228/0x2d8 lib/bug.c:186 bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956 call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline] brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316 do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831 el1_dbg+0x18/0x8c __alloc_pages_nodemask+0x7ac/0x2938 mm/page_alloc.c:4595 alloc_pages_current+0x164/0x278 mm/mempolicy.c:2132 alloc_pages include/linux/gfp.h:509 [inline] kmalloc_order+0x20/0x50 mm/slab_common.c:1231 kmalloc_order_trace+0x30/0x2b0 mm/slab_common.c:1243 kmalloc_large include/linux/slab.h:480 [inline] __kmalloc+0x3dc/0x4f0 mm/slub.c:3791 kmalloc_array include/linux/slab.h:670 [inline] io_sqe_buffer_register fs/io_uring.c:2472 [inline] __io_uring_register fs/io_uring.c:2962 [inline] __do_sys_io_uring_register fs/io_uring.c:3008 [inline] __se_sys_io_uring_register fs/io_uring.c:2990 [inline] __arm64_sys_io_uring_register+0x9e0/0x1bc8 fs/io_uring.c:2990 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948 SMP: stopping secondary CPUs Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled CPU features: 0x002,23000438 Memory Limit: none Rebooting in 1 seconds.. Fixes: edafccee56ff3167 ("io_uring: add support for pre-mapped user IO buffers") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-01iomap: Add a page_prepare callbackAndreas Gruenbacher
Move the page_done callback into a separate iomap_page_ops structure and add a page_prepare calback to be called before the next page is written to. In gfs2, we'll want to start a transaction in page_prepare and end it in page_done. Other filesystems that implement data journaling will require the same kind of mechanism. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-01iomap: Fix use-after-free error in page_done callbackAndreas Gruenbacher
In iomap_write_end, we're not holding a page reference anymore when calling the page_done callback, but the callback needs that reference to access the page. To fix that, move the put_page call in __generic_write_end into the callers of __generic_write_end. Then, in iomap_write_end, put the page after calling the page_done callback. Reported-by: Jan Kara <jack@suse.cz> Fixes: 63899c6f8851 ("iomap: add a page_done callback") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-01fs: Turn __generic_write_end into a void functionAndreas Gruenbacher
The VFS-internal __generic_write_end helper always returns the value of its @copied argument. This can be confusing, and it isn't very useful anyway, so turn __generic_write_end into a function returning void instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-01iomap: Clean up __generic_write_end callingChristoph Hellwig
Move the call to __generic_write_end into iomap_write_end instead of duplicating it in each of the three branches. This requires open coding the generic_write_end for the buffer_head case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-01block: fix handling for BIO_NO_PAGE_REFMing Lei
Commit 399254aaf489211 ("block: add BIO_NO_PAGE_REF flag") introduces BIO_NO_PAGE_REF, and once this flag is set for one bio, all pages in the bio won't be get/put during IO. However, if one bio is submitted via __blkdev_direct_IO_simple(), even though BIO_NO_PAGE_REF is set, pages still may be put. Fixes this issue by avoiding to put pages if BIO_NO_PAGE_REF is set. Fixes: 399254aaf489211 ("block: add BIO_NO_PAGE_REF flag") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-01io_uring: drop req submit reference always in async puntJens Axboe
If we don't end up actually calling submit in io_sq_wq_submit_work(), we still need to drop the submit reference to the request. If we don't, then we can leak the request. This can happen if we race with ring shutdown while flushing the workqueue for requests that require use of the mm_struct. Fixes: e65ef56db494 ("io_uring: use regular request ref counts") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-01io_uring: free allocated io_memory onceMark Rutland
If io_allocate_scq_urings() fails to allocate an sq_* region, it will call io_mem_free() for any previously allocated regions, but leave dangling pointers to these regions in the ctx. Any regions which have not yet been allocated are left NULL. Note that when returning -EOVERFLOW, the previously allocated sq_ring is not freed, which appears to be an unintentional leak. When io_allocate_scq_urings() fails, io_uring_create() will call io_ring_ctx_wait_and_kill(), which calls io_mem_free() on all the sq_* regions, assuming the pointers are valid and not NULL. This can result in pages being freed multiple times, which has been observed to corrupt the page state, leading to subsequent fun. This can also result in virt_to_page() on NULL, resulting in the use of bogus page addresses, and yet more subsequent fun. The latter can be detected with CONFIG_DEBUG_VIRTUAL on arm64. Adding a cleanup path to io_allocate_scq_urings() complicates the logic, so let's leave it to io_ring_ctx_free() to consistently free these pointers, and simplify the io_allocate_scq_urings() error paths. Full splats from before this patch below. Note that the pointer logged by the DEBUG_VIRTUAL "non-linear address" warning has been hashed, and is actually NULL. [ 26.098129] page:ffff80000e949a00 count:0 mapcount:-128 mapping:0000000000000000 index:0x0 [ 26.102976] flags: 0x63fffc000000() [ 26.104373] raw: 000063fffc000000 ffff80000e86c188 ffff80000ea3df08 0000000000000000 [ 26.108917] raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000 [ 26.137235] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0) [ 26.143960] ------------[ cut here ]------------ [ 26.146020] kernel BUG at include/linux/mm.h:547! [ 26.147586] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 26.149163] Modules linked in: [ 26.150287] Process syz-executor.21 (pid: 20204, stack limit = 0x000000000e9cefeb) [ 26.153307] CPU: 2 PID: 20204 Comm: syz-executor.21 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #18 [ 26.156566] Hardware name: linux,dummy-virt (DT) [ 26.158089] pstate: 40400005 (nZcv daif +PAN -UAO) [ 26.159869] pc : io_mem_free+0x9c/0xa8 [ 26.161436] lr : io_mem_free+0x9c/0xa8 [ 26.162720] sp : ffff000013003d60 [ 26.164048] x29: ffff000013003d60 x28: ffff800025048040 [ 26.165804] x27: 0000000000000000 x26: ffff800025048040 [ 26.167352] x25: 00000000000000c0 x24: ffff0000112c2820 [ 26.169682] x23: 0000000000000000 x22: 0000000020000080 [ 26.171899] x21: ffff80002143b418 x20: ffff80002143b400 [ 26.174236] x19: ffff80002143b280 x18: 0000000000000000 [ 26.176607] x17: 0000000000000000 x16: 0000000000000000 [ 26.178997] x15: 0000000000000000 x14: 0000000000000000 [ 26.181508] x13: 00009178a5e077b2 x12: 0000000000000001 [ 26.183863] x11: 0000000000000000 x10: 0000000000000980 [ 26.186437] x9 : ffff000013003a80 x8 : ffff800025048a20 [ 26.189006] x7 : ffff8000250481c0 x6 : ffff80002ffe9118 [ 26.191359] x5 : ffff80002ffe9118 x4 : 0000000000000000 [ 26.193863] x3 : ffff80002ffefe98 x2 : 44c06ddd107d1f00 [ 26.196642] x1 : 0000000000000000 x0 : 000000000000003e [ 26.198892] Call trace: [ 26.199893] io_mem_free+0x9c/0xa8 [ 26.201155] io_ring_ctx_wait_and_kill+0xec/0x180 [ 26.202688] io_uring_setup+0x6c4/0x6f0 [ 26.204091] __arm64_sys_io_uring_setup+0x18/0x20 [ 26.205576] el0_svc_common.constprop.0+0x7c/0xe8 [ 26.207186] el0_svc_handler+0x28/0x78 [ 26.208389] el0_svc+0x8/0xc [ 26.209408] Code: aa0203e0 d0006861 9133a021 97fcdc3c (d4210000) [ 26.211995] ---[ end trace bdb81cd43a21e50d ]--- [ 81.770626] ------------[ cut here ]------------ [ 81.825015] virt_to_phys used for non-linear address: 000000000d42f2c7 ( (null)) [ 81.827860] WARNING: CPU: 1 PID: 30171 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x48/0x68 [ 81.831202] Modules linked in: [ 81.832212] CPU: 1 PID: 30171 Comm: syz-executor.20 Not tainted 5.1.0-rc7-00004-g7d30b2ea43d6 #19 [ 81.835616] Hardware name: linux,dummy-virt (DT) [ 81.836863] pstate: 60400005 (nZCv daif +PAN -UAO) [ 81.838727] pc : __virt_to_phys+0x48/0x68 [ 81.840572] lr : __virt_to_phys+0x48/0x68 [ 81.842264] sp : ffff80002cf67c70 [ 81.843858] x29: ffff80002cf67c70 x28: ffff800014358e18 [ 81.846463] x27: 0000000000000000 x26: 0000000020000080 [ 81.849148] x25: 0000000000000000 x24: ffff80001bb01f40 [ 81.851986] x23: ffff200011db06c8 x22: ffff2000127e3c60 [ 81.854351] x21: ffff800014358cc0 x20: ffff800014358d98 [ 81.856711] x19: 0000000000000000 x18: 0000000000000000 [ 81.859132] x17: 0000000000000000 x16: 0000000000000000 [ 81.861586] x15: 0000000000000000 x14: 0000000000000000 [ 81.863905] x13: 0000000000000000 x12: ffff1000037603e9 [ 81.866226] x11: 1ffff000037603e8 x10: 0000000000000980 [ 81.868776] x9 : ffff80002cf67840 x8 : ffff80001bb02920 [ 81.873272] x7 : ffff1000037603e9 x6 : ffff80001bb01f47 [ 81.875266] x5 : ffff1000037603e9 x4 : dfff200000000000 [ 81.876875] x3 : ffff200010087528 x2 : ffff1000059ecf58 [ 81.878751] x1 : 44c06ddd107d1f00 x0 : 0000000000000000 [ 81.880453] Call trace: [ 81.881164] __virt_to_phys+0x48/0x68 [ 81.882919] io_mem_free+0x18/0x110 [ 81.886585] io_ring_ctx_wait_and_kill+0x13c/0x1f0 [ 81.891212] io_uring_setup+0xa60/0xad0 [ 81.892881] __arm64_sys_io_uring_setup+0x2c/0x38 [ 81.894398] el0_svc_common.constprop.0+0xac/0x150 [ 81.896306] el0_svc_handler+0x34/0x88 [ 81.897744] el0_svc+0x8/0xc [ 81.898715] ---[ end trace b4a703802243cbba ]--- Fixes: 2b188cc1bb857a9d ("Add io_uring IO interface") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-01io_uring: fix SQPOLL cpu validationMark Rutland
In io_sq_offload_start(), we call cpu_possible() on an unbounded cpu value from userspace. On v5.1-rc7 on arm64 with CONFIG_DEBUG_PER_CPU_MAPS, this results in a splat: WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline] There was an attempt to fix this in commit: 917257daa0fea7a0 ("io_uring: only test SQPOLL cpu after we've verified it") ... by adding a check after the cpu value had been limited to NR_CPU_IDS using array_index_nospec(). However, this left an unbound check at the start of the function, for which the warning still fires. Let's fix this correctly by checking that the cpu value is bound by nr_cpu_ids before passing it to cpu_possible(). Note that only nr_cpu_ids of a cpumask are guaranteed to exist at runtime, and nr_cpu_ids can be significantly smaller than NR_CPUs. For example, an arm64 defconfig has NR_CPUS=256, while my test VM has 4 vCPUs. Following the intent from the commit message for 917257daa0fea7a0, the check is moved under the SQ_AFF branch, which is the only branch where the cpu values is consumed. The check is performed before bounding the value with array_index_nospec() so that we don't silently accept bogus cpu values from userspace, where array_index_nospec() would force these values to 0. I suspect we can remove the array_index_nospec() call entirely, but I've conservatively left that in place, updated to use nr_cpu_ids to match the prior check. Tested on arm64 with the Syzkaller reproducer: https://syzkaller.appspot.com/bug?extid=cd714a07c6de2bc34293 https://syzkaller.appspot.com/x/repro.syz?x=15d8b397200000 Full splat from before this patch: WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpu_max_bits_warn include/linux/cpumask.h:121 [inline] WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_check include/linux/cpumask.h:128 [inline] WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 cpumask_test_cpu include/linux/cpumask.h:344 [inline] WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_sq_offload_start fs/io_uring.c:2244 [inline] WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_create fs/io_uring.c:2864 [inline] WARNING: CPU: 1 PID: 27601 at include/linux/cpumask.h:121 io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 27601 Comm: syz-executor.0 Not tainted 5.1.0-rc7 #3 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2f0 include/linux/compiler.h:193 show_stack+0x20/0x30 arch/arm64/kernel/traps.c:158 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x190 lib/dump_stack.c:113 panic+0x384/0x68c kernel/panic.c:214 __warn+0x2bc/0x2c0 kernel/panic.c:571 report_bug+0x228/0x2d8 lib/bug.c:186 bug_handler+0xa0/0x1a0 arch/arm64/kernel/traps.c:956 call_break_hook arch/arm64/kernel/debug-monitors.c:301 [inline] brk_handler+0x1d4/0x388 arch/arm64/kernel/debug-monitors.c:316 do_debug_exception+0x1a0/0x468 arch/arm64/mm/fault.c:831 el1_dbg+0x18/0x8c cpu_max_bits_warn include/linux/cpumask.h:121 [inline] cpumask_check include/linux/cpumask.h:128 [inline] cpumask_test_cpu include/linux/cpumask.h:344 [inline] io_sq_offload_start fs/io_uring.c:2244 [inline] io_uring_create fs/io_uring.c:2864 [inline] io_uring_setup+0x1108/0x15a0 fs/io_uring.c:2916 __do_sys_io_uring_setup fs/io_uring.c:2929 [inline] __se_sys_io_uring_setup fs/io_uring.c:2926 [inline] __arm64_sys_io_uring_setup+0x50/0x70 fs/io_uring.c:2926 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall arch/arm64/kernel/syscall.c:47 [inline] el0_svc_common.constprop.0+0x148/0x2e0 arch/arm64/kernel/syscall.c:83 el0_svc_handler+0xdc/0x100 arch/arm64/kernel/syscall.c:129 el0_svc+0x8/0xc arch/arm64/kernel/entry.S:948 SMP: stopping secondary CPUs Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled CPU features: 0x002,23000438 Memory Limit: none Rebooting in 1 seconds.. Fixes: 917257daa0fea7a0 ("io_uring: only test SQPOLL cpu after we've verified it") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Simplied the logic Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-01io_uring: have submission side sqe errors post a cqeJens Axboe
Currently we only post a cqe if we get an error OUTSIDE of submission. For submission, we return the error directly through io_uring_enter(). This is a bit awkward for applications, and it makes more sense to always post a cqe with an error, if the error happens on behalf of an sqe. This changes submission behavior a bit. io_uring_enter() returns -ERROR for an error, and > 0 for number of sqes submitted. Before this change, if you wanted to submit 8 entries and had an error on the 5th entry, io_uring_enter() would return 4 (for number of entries successfully submitted) and rewind the sqring. The application would then have to peek at the sqring and figure out what was wrong with the head sqe, and then skip it itself. With this change, we'll return 5 since we did consume 5 sqes, and the last sqe (with the error) will result in a cqe being posted with the error. This makes the logic easier to handle in the application, and it cleans up the submission part. Suggested-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30libfs: document simple_get_link()Eric Biggers
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-30ext4: fix ext4_show_options for file systems w/o journalDebabrata Banerjee
Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as I assume was intended, all other options were blown away leading to _ext4_show_options() output being incorrect. Fixes: 1e381f60dad9 ("ext4: do not allow journal_opts for fs w/o journal") Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
2019-04-30Merge tag 'fsnotify_for_v5.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fix from Jan Kara: "A fix of user trigerable NULL pointer dereference syzbot has recently spotted. The problem was introduced in this merge window so no CC stable is needed" * tag 'fsnotify_for_v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: Fix NULL ptr deref in fanotify_get_fsid()
2019-04-30quota: check time limit when back out space/inode changeChengguang Xu
When we fail from allocating inode/space, we back out the change we already did. In a special case which has exceeded soft limit by the change, we should also check time limit and reset it properly. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-04-30io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUPStefan Bühler
There is no operation to order with afterwards, and removing the flag is not critical in any way. There will always be a "race condition" where the application will trigger IORING_ENTER_SQ_WAKEUP when it isn't actually needed. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30io_uring: remove unnecessary barrier after incrementing dropped counterStefan Bühler
smp_store_release in io_commit_sqring already orders the store to dropped before the update to SQ head. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30io_uring: remove unnecessary barrier before reading SQ tailStefan Bühler
There is no operation before to order with. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30io_uring: remove unnecessary barrier after updating SQ headStefan Bühler
There is no operation afterwards to order with. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30io_uring: remove unnecessary barrier before reading cq headStefan Bühler
The memory operations before reading cq head are unrelated and we don't care about their order. Document that the control dependency in combination with READ_ONCE and WRITE_ONCE forms a barrier we need. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30io_uring: remove unnecessary barrier before wq_has_sleeperStefan Bühler
wq_has_sleeper has a full barrier internally. The smp_rmb barrier in io_uring_poll synchronizes with it. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30io_uring: fix notes on barriersStefan Bühler
The application reading the CQ ring needs a barrier to pair with the smp_store_release in io_commit_cqring, not the barrier after it. Also a write barrier *after* writing something (but not *before* writing anything interesting) doesn't order anything, so an smp_wmb() after writing SQ tail is not needed. Additionally consider reading SQ head and writing CQ tail in the notes. Also add some clarifications how the various other fields in the ring buffers are used. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>