path: root/fs
2020-03-10  io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_READV  (Jens Axboe)
This adds support for the vectored read. This is limited to supporting just 1 segment in the iov, and is provided just for convenience for applications that use IORING_OP_READV already. The iov helpers will be used for IORING_OP_RECVMSG as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
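A minimal sketch of what using this looks like from userspace, assuming liburing and a buffer group BGID already registered with IORING_OP_PROVIDE_BUFFERS (names and constants here are illustrative):

    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    struct iovec iov = { .iov_base = NULL, .iov_len = BUF_LEN };

    /* Single-segment readv; the kernel picks the data buffer from group
     * BGID and reports the chosen buffer ID in cqe->flags. */
    io_uring_prep_readv(sqe, fd, &iov, 1, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;
    sqe->buf_group = BGID;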
2020-03-10  io_uring: support buffer selection for OP_READ and OP_RECV  (Jens Axboe)
If a server process has tons of pending socket connections, it generally uses epoll to wait for activity. When the socket is ready for reading (or writing), the task can select a buffer and issue a recv/send on the given fd. Now that we have fast (non-async thread) support, a task can have tons of pending reads or writes. But that means they need buffers to back that data, and if the number of connections is high enough, having them preallocated for all possible connections is unfeasible. With IORING_OP_PROVIDE_BUFFERS, an application can register buffers to use for any request. The request then sets IOSQE_BUFFER_SELECT in the sqe, and a given group ID in sqe->buf_group. When the fd becomes ready, a free buffer from the specified group is selected. If none are available, the request is terminated with -ENOBUFS. If successful, the CQE on completion will contain the buffer ID chosen in the cqe->flags member, encoded as: (buffer_id << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER; Once a buffer has been consumed by a request, it is no longer available and must be registered again with IORING_OP_PROVIDE_BUFFERS. Requests need to support this feature. For now, IORING_OP_READ and IORING_OP_RECV support it; this is checked on SQE submission, and a CQE with res == -EOPNOTSUPP will be posted if it is attempted on an unsupported request. Signed-off-by: Jens Axboe <axboe@kernel.dk>
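For illustration, a minimal sketch of decoding the selected buffer on the completion side, following the encoding described above (assumes the uapi io_uring.h definitions; the helper name is made up for the example):

    #include <linux/io_uring.h>

    /* Returns the selected buffer ID, or -1U if no buffer was picked. */
    static unsigned int cqe_buffer_id(const struct io_uring_cqe *cqe)
    {
            if (!(cqe->flags & IORING_CQE_F_BUFFER))
                    return -1U;
            return cqe->flags >> IORING_CQE_BUFFER_SHIFT;
    }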
2020-03-10  io_uring: add IORING_OP_PROVIDE_BUFFERS  (Jens Axboe)
IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to support passing in an addr/len that is associated with a buffer ID and buffer group ID. The group ID is used to index and look up the buffers, while the buffer ID can be used to notify the application which buffer in the group was used. The addr passed in is the starting buffer address, and length is each buffer's length. The number of buffers to add can be specified, in which case addr is incremented by length for each addition, and the buffer ID is incremented for each buffer added. No validation is done of the buffer ID. If the application provides buffers within the same group with identical buffer IDs, then it'll have a hard time telling which buffer ID was used. The only restriction is that the buffer ID can be a max of 16 bits in size, so USHRT_MAX is the maximum ID that can be used. Signed-off-by: Jens Axboe <axboe@kernel.dk>
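A minimal userspace sketch of registering a group of buffers, assuming a liburing version that ships io_uring_prep_provide_buffers() (the constants and helper name are only for the example):

    #include <liburing.h>

    #define NR_BUFS 64
    #define BUF_LEN 4096
    #define BGID    1

    /* base points at NR_BUFS * BUF_LEN bytes; buffer IDs 0..NR_BUFS-1
     * are assigned in order, each BUF_LEN bytes apart. */
    static int provide_buffers(struct io_uring *ring, void *base)
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

            io_uring_prep_provide_buffers(sqe, base, BUF_LEN, NR_BUFS, BGID, 0);
            return io_uring_submit(ring);
    }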
2020-03-09  pstore/ram: Replace zero-length array with flexible-array member  (Gustavo A. R. Silva)
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200309202327.GA8813@embeddedor Signed-off-by: Kees Cook <keescook@chromium.org>
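As a side-by-side sketch of the conversion described above (struct names are illustrative, not taken from pstore/ram):

    struct boo { int x; };

    struct foo_old {
            int stuff;
            struct boo array[0];    /* GNU zero-length array extension */
    };

    struct foo_new {
            int stuff;
            struct boo array[];     /* C99 flexible array member */
    };

    /* Allocation is unchanged: sizeof(struct foo_new) still excludes the
     * array, e.g. kzalloc(struct_size(p, array, n), GFP_KERNEL). */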
2020-03-09  Merge 5.6-rc5 into driver-core-next  (Greg Kroah-Hartman)
We need the driver core and debugfs changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-08  io_uring: ensure RCU callback ordering with rcu_barrier()  (Jens Axboe)
After more careful studying, Paul informs me that we cannot rely on ordering of RCU callbacks in the way that the tagged commit did. The current construct looks like this:

    void C(struct rcu_head *rhp)
    {
            do_something(rhp);
            call_rcu(&p->rh, B);
    }

    call_rcu(&p->rh, A);
    call_rcu(&p->rh, C);

and we're relying on ordering between A and B, which isn't guaranteed. Make this explicit instead, and have a work item issue the rcu_barrier() to ensure that A has run before we manually execute B. While thorough testing never showed this issue, it's dependent on the per-cpu load in terms of RCU callbacks. The updated method simplifies the code as well, and eliminates the need to maintain an rcu_head in the fileset data. Fixes: c1e2148f8ecb ("io_uring: free fixed_file_data after RCU grace period") Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
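An illustrative sketch of the scheme described above (struct and function names are made up; only rcu_barrier(), the workqueue API, and call_rcu() are real kernel interfaces):

    #include <linux/workqueue.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct fileset_data {                   /* illustrative stand-in */
            struct work_struct free_work;
            /* ... */
    };

    static void fileset_free_work(struct work_struct *work)
    {
            struct fileset_data *d = container_of(work, struct fileset_data,
                                                   free_work);

            /* Wait for every RCU callback queued so far (including "A"
             * above), then do what callback "B" used to do. */
            rcu_barrier();
            kfree(d);
    }

    /* Instead of chaining call_rcu(A) -> call_rcu(B):
     *   INIT_WORK(&d->free_work, fileset_free_work);
     *   schedule_work(&d->free_work);
     */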
2020-03-08  Merge tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core  (Linus Torvalds)
Pull driver core and debugfs fixes from Greg KH:
 "Here are four small driver core / debugfs patches for 5.6-rc3:
  - debugfs api cleanup now that all debugfs_create_regset32() callers have been fixed up. This was waiting until after the -rc1 merge as these fixes came in through different trees
  - driver core sync state fixes based on reports of minor issues found in the feature
  All of these have been in linux-next with no reported issues"
* tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Skip unnecessary work when device doesn't have sync_state()
  driver core: Add dev_has_sync_state()
  driver core: Call sync_state() even if supplier has no consumers
  debugfs: remove return value of debugfs_create_regset32()
2020-03-08  Merge branch 'efi/urgent' into efi/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-07  fscrypt: don't evict dirty inodes after removing key  (Eric Biggers)
After FS_IOC_REMOVE_ENCRYPTION_KEY removes a key, it syncs the filesystem and tries to get and put all inodes that were unlocked by the key so that unused inodes get evicted via fscrypt_drop_inode(). Normally, the inodes are all clean due to the sync. However, after the filesystem is sync'ed, userspace can modify and close one of the files. (Userspace is *supposed* to close the files before removing the key. But it doesn't always happen, and the kernel can't assume it.) This causes the inode to be dirtied and have i_count == 0. Then, fscrypt_drop_inode() failed to consider this case and indicated that the inode can be dropped, causing the write to be lost. On f2fs, other problems such as a filesystem freeze could occur due to the inode being freed while still on f2fs's dirty inode list. Fix this bug by making fscrypt_drop_inode() only drop clean inodes. I've written an xfstest which detects this bug on ext4, f2fs, and ubifs. Fixes: b1c0ec3599f4 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200305084138.653498-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
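An illustrative sketch of the rule the fix enforces (not the literal fscrypt diff; the helper name is made up):

    #include <linux/fs.h>

    /* Only report an inode as droppable when it is clean; a dirty inode
     * must stay around so writeback can complete. */
    static int drop_inode_if_clean(struct inode *inode)
    {
            if (inode->i_state & I_DIRTY_ALL)
                    return 0;
            return 1;
    }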
2020-03-07  Merge tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block  (Linus Torvalds)
Pull io_uring fixes from Jens Axboe:
 "Here are a few io_uring fixes that should go into this release. This contains:
  - Removal of (now) unused io_wq_flush() and associated flag (Pavel)
  - Fix cancelation lockup with linked timeouts (Pavel)
  - Fix for potential use-after-free when freeing percpu ref for fixed file sets
  - io-wq cancelation fixups (Pavel)"
* tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
  io_uring: fix lockup with timeouts
  io_uring: free fixed_file_data after RCU grace period
  io-wq: remove io_wq_flush and IO_WQ_WORK_INTERNAL
  io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
2020-03-07  io_uring: fix lockup with timeouts  (Pavel Begunkov)
There is a recipe to deadlock the kernel: submit a timeout sqe with a linked_timeout (e.g. test_single_link_timeout_ception() from liburing), and SIGKILL the process. Then, io_kill_timeouts() takes @ctx->completion_lock, but the timeout isn't flagged with REQ_F_COMP_LOCKED, and will try to double grab it during io_put_free() to cancel the linked timeout. Probably, the same can happen with another io_kill_timeout() call site, that is io_commit_cqring(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-06  Merge tag 'for-5.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux  (Linus Torvalds)
Pull btrfs fix from David Sterba:
 "One fixup for DIO when in use with the new checksums, a missed case where the checksum size was still assuming u32"
* tag 'for-5.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix RAID direct I/O reads with alternate csums
2020-03-06  Merge tag 'filelock-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux  (Linus Torvalds)
Pull file locking fixes from Jeff Layton:
 "Just a couple of late-breaking patches for the file locking code. The second patch (from yangerkun) fixes a rather nasty looking potential use-after-free that should go to stable. The other patch could technically wait for 5.7, but it's fairly innocuous so I figured we might as well take it"
* tag 'filelock-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: fix a potential use-after-free problem when wakeup a waiter
  fcntl: Distribute switch variables for initialization
2020-03-06  io_uring: free fixed_file_data after RCU grace period  (Jens Axboe)
The percpu refcount protects this structure, and we can have an atomic switch in progress when exiting. This makes it unsafe to just free the struct normally, and can trigger the following KASAN warning:

    BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
    Read of size 1 at addr ffff888181a19a30 by task swapper/0/0
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    Call Trace:
     <IRQ>
     dump_stack+0x76/0xa0
     print_address_description.constprop.0+0x3b/0x60
     ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
     ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
     __kasan_report.cold+0x1a/0x3d
     ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
     percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
     rcu_core+0x370/0x830
     ? percpu_ref_exit+0x50/0x50
     ? rcu_note_context_switch+0x7b0/0x7b0
     ? run_rebalance_domains+0x11d/0x140
     __do_softirq+0x10a/0x3e9
     irq_exit+0xd5/0xe0
     smp_apic_timer_interrupt+0x86/0x200
     apic_timer_interrupt+0xf/0x20
     </IRQ>
    RIP: 0010:default_idle+0x26/0x1f0

Fix this by punting the final exit and free of the struct to RCU, then we know that it's safe to do so. Jann suggested the approach of using a double rcu callback to achieve this. It's important that we do a nested call_rcu() callback, as otherwise the free could be ordered before the atomic switch, even if the latter was already queued. Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com Suggested-by: Jann Horn <jannh@google.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
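A compact sketch of the nested-callback idea mentioned above (names are illustrative; the point is that the second callback is queued from inside the first, which is what provides the ordering):

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct ffd {                            /* illustrative stand-in */
            struct rcu_head rcu;
            /* ... */
    };

    static void ffd_free_rcu(struct rcu_head *rcu)
    {
            kfree(container_of(rcu, struct ffd, rcu));
    }

    static void ffd_switch_rcu(struct rcu_head *rcu)
    {
            /* Queueing the free from within the first callback, rather
             * than queueing both up front, enforces the ordering. */
            call_rcu(rcu, ffd_free_rcu);
    }

    /* call_rcu(&data->rcu, ffd_switch_rcu); */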
2020-03-06  locks: fix a potential use-after-free problem when wakeup a waiter  (yangerkun)
Commit 16306a61d3b7 ("fs/locks: always delete_block after waiting.") added logic to check waiter->fl_blocker without blocked_lock_lock, and it can trigger a UAF when we try to wake up a waiter: Thread 1 has created a write flock a on a file; now thread 2 tries to unlock and delete flock a, and thread 3 tries to add flock b on the same file.

    Thread2                                 Thread3
                                            flock syscall(create flock b)
                                            ...flock_lock_inode_wait
                                                flock_lock_inode(will insert
                                                our fl_blocked_member list
                                                to flock a's fl_blocked_requests)
                                               sleep
    flock syscall(unlock)
    ...flock_lock_inode_wait
        locks_delete_lock_ctx
        ...__locks_wake_up_blocks
            __locks_delete_blocks(
            b->fl_blocker = NULL)
            ...
                                               break by a signal
                                               locks_delete_block
                                                b->fl_blocker == NULL &&
                                                list_empty(&b->fl_blocked_requests)
                                                success, return directly
                                             locks_free_lock b
            wake_up(&b->fl_waiter)
            trigger UAF

Fix it by removing this logic; this patch may also fix CVE-2019-19769. Cc: stable@vger.kernel.org Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2020-03-06  gfs2: Additional information when gfs2_ail1_flush withdraws  (Bob Peterson)
Before this patch, if gfs2_ail1_flush gets an error from function gfs2_ail1_start_one (which comes indirectly from generic_writepages) the file system is withdrawn, but without any explanation why. This patch adds an error message if gfs2_ail1_flush gets an error from gfs2_ail1_start_one. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-06  fat: fix uninit-memory access for partial initialized inode  (OGAWA Hirofumi)
When an error occurs in the middle of reading an inode, some fields in the inode might still be uninitialized, and the evict_inode path may then access those fields via iput(). To fix this, make sure that inode fields are initialized. Reported-by: syzbot+9d82b8de2992579da5d0@syzkaller.appspotmail.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/871rqnreqx.fsf@mail.parknet.co.jp Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06  futex: Fix inode life-time issue  (Peter Zijlstra)
As reported by Jann, ihold() does not in fact guarantee inode persistence. And instead of making it so, replace the usage of inode pointers with a per boot, machine wide, unique inode identifier. This sequence number is global, but shared (file backed) futexes are rare enough that this should not become a performance issue. Reported-by: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-03-05  ext4: fix a data race at inode->i_disksize  (Qiujun Huang)
KCSAN found that inode->i_disksize could be accessed concurrently:

    BUG: KCSAN: data-race in ext4_mark_iloc_dirty / ext4_write_end

    write (marked) to 0xffff8b8932f40090 of 8 bytes by task 66792 on cpu 0:
     ext4_write_end+0x53f/0x5b0
     ext4_da_write_end+0x237/0x510
     generic_perform_write+0x1c4/0x2a0
     ext4_buffered_write_iter+0x13a/0x210
     ext4_file_write_iter+0xe2/0x9b0
     new_sync_write+0x29c/0x3a0
     __vfs_write+0x92/0xa0
     vfs_write+0xfc/0x2a0
     ksys_write+0xe8/0x140
     __x64_sys_write+0x4c/0x60
     do_syscall_64+0x8a/0x2a0
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    read to 0xffff8b8932f40090 of 8 bytes by task 14414 on cpu 1:
     ext4_mark_iloc_dirty+0x716/0x1190
     ext4_mark_inode_dirty+0xc9/0x360
     ext4_convert_unwritten_extents+0x1bc/0x2a0
     ext4_convert_unwritten_io_end_vec+0xc5/0x150
     ext4_put_io_end+0x82/0x130
     ext4_writepages+0xae7/0x16f0
     do_writepages+0x64/0x120
     __writeback_single_inode+0x7d/0x650
     writeback_sb_inodes+0x3a4/0x860
     __writeback_inodes_wb+0xc4/0x150
     wb_writeback+0x43f/0x510
     wb_workfn+0x3b2/0x8a0
     process_one_work+0x39b/0x7e0
     worker_thread+0x88/0x650
     kthread+0x1d4/0x1f0
     ret_from_fork+0x35/0x40

The plain read is outside of the inode->i_data_sem critical section, which results in a data race. Fix it by adding READ_ONCE(). Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Link: https://lore.kernel.org/r/1582556566-3909-1-git-send-email-hqjagain@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
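A generic sketch of the annotation pattern the fix uses: the writer updates the field under its lock, while the lockless reader wraps the access in READ_ONCE() so the racy read is explicit (the struct here is illustrative, not ext4's):

    #include <linux/spinlock.h>
    #include <linux/compiler.h>

    struct example {
            spinlock_t lock;
            loff_t disksize;
    };

    static void writer(struct example *e, loff_t new_size)
    {
            spin_lock(&e->lock);
            e->disksize = new_size;         /* write under the lock */
            spin_unlock(&e->lock);
    }

    static loff_t lockless_reader(struct example *e)
    {
            return READ_ONCE(e->disksize);  /* annotated lockless read */
    }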
2020-03-05  ext4: fix a data race at inode->i_blocks  (Qian Cai)
inode->i_blocks could be accessed concurrently as noticed by KCSAN:

    BUG: KCSAN: data-race in ext4_do_update_inode [ext4] / inode_add_bytes

    write to 0xffff9a00d4b982d0 of 8 bytes by task 22100 on cpu 118:
     inode_add_bytes+0x65/0xf0
      __inode_add_bytes at fs/stat.c:689
      (inlined by) inode_add_bytes at fs/stat.c:702
     ext4_mb_new_blocks+0x418/0xca0 [ext4]
     ext4_ext_map_blocks+0x1a6b/0x27b0 [ext4]
     ext4_map_blocks+0x1a9/0x950 [ext4]
     _ext4_get_block+0xfc/0x270 [ext4]
     ext4_get_block_unwritten+0x33/0x50 [ext4]
     __block_write_begin_int+0x22e/0xae0
     __block_write_begin+0x39/0x50
     ext4_write_begin+0x388/0xb50 [ext4]
     ext4_da_write_begin+0x35f/0x8f0 [ext4]
     generic_perform_write+0x15d/0x290
     ext4_buffered_write_iter+0x11f/0x210 [ext4]
     ext4_file_write_iter+0xce/0x9e0 [ext4]
     new_sync_write+0x29c/0x3b0
     __vfs_write+0x92/0xa0
     vfs_write+0x103/0x260
     ksys_write+0x9d/0x130
     __x64_sys_write+0x4c/0x60
     do_syscall_64+0x91/0xb05
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

    read to 0xffff9a00d4b982d0 of 8 bytes by task 8 on cpu 65:
     ext4_do_update_inode+0x4a0/0xf60 [ext4]
      ext4_inode_blocks_set at fs/ext4/inode.c:4815
     ext4_mark_iloc_dirty+0xaf/0x160 [ext4]
     ext4_mark_inode_dirty+0x129/0x3e0 [ext4]
     ext4_convert_unwritten_extents+0x253/0x2d0 [ext4]
     ext4_convert_unwritten_io_end_vec+0xc5/0x150 [ext4]
     ext4_end_io_rsv_work+0x22c/0x350 [ext4]
     process_one_work+0x54f/0xb90
     worker_thread+0x80/0x5f0
     kthread+0x1cd/0x1f0
     ret_from_fork+0x27/0x50

    4 locks held by kworker/u256:0/8:
     #0: ffff9a025abc4328 ((wq_completion)ext4-rsv-conversion){+.+.}, at: process_one_work+0x443/0xb90
     #1: ffffab5a862dbe20 ((work_completion)(&ei->i_rsv_conversion_work)){+.+.}, at: process_one_work+0x443/0xb90
     #2: ffff9a025a9d0f58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
     #3: ffff9a00d4b985d8 (&(&ei->i_raw_lock)->rlock){+.+.}, at: ext4_do_update_inode+0xaa/0xf60 [ext4]

    irq event stamp: 3009267
    hardirqs last enabled at (3009267): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
    hardirqs last disabled at (3009266): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
    softirqs last enabled at (3009230): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
    softirqs last disabled at (3009223): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 65 PID: 8 Comm: kworker/u256:0 Tainted: G L 5.6.0-rc2-next-20200221+ #7
    Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
    Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work [ext4]

The plain read is outside of inode->i_lock critical section which results in a data race. Fix it by adding READ_ONCE() there. Link: https://lore.kernel.org/r/20200222043258.2279-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2020-03-05  exfat: update file system parameter handling  (Valdis Kletnieks)
Al Viro recently reworked the way file system parameters are handled. Update super.c to work with it in linux-next 20200203. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add Kconfig and Makefile  (Namjae Jeon)
This adds the Kconfig and Makefile for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add nls operations  (Namjae Jeon)
This adds the implementation of nls operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add misc operations  (Namjae Jeon)
This adds the implementation of misc operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add exfat cache  (Namjae Jeon)
This adds the implementation of exfat cache. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add bitmap operations  (Namjae Jeon)
This adds the implementation of bitmap operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add fat entry operations  (Namjae Jeon)
This adds the implementation of fat entry operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add file operations  (Namjae Jeon)
This adds the implementation of file operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add directory operations  (Namjae Jeon)
This adds the implementation of directory operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add inode operations  (Namjae Jeon)
This adds the implementation of inode operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add super block operations  (Namjae Jeon)
This adds the implementation of superblock operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  exfat: add in-memory and on-disk structures and headers  (Namjae Jeon)
This adds in-memory and on-disk structures and headers. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-05  ext4: clean up error return for convert_initialized_extent()  (Eric Whitney)
Although convert_initialized_extent() can potentially return an error code with a negative value, its returned value is assigned to an unsigned variable containing a block count in ext4_ext_map_blocks() and then returned to that function's caller. The code currently works, though the way this happens is obscure. The code would be more readable if it followed the error handling convention used elsewhere in ext4_ext_map_blocks(). This patch does not address any known test failure or bug report - it's simply a cleanup. It also addresses a nearby coding standard issue. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200218202656.21561-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-05  jbd2: improve comments about freeing data buffers whose page mapping is NULL  (zhangyi (F))
Improve comments in jbd2_journal_commit_transaction() to describe why we don't need to clear the buffer_mapped bit for freeing file mapping buffers whose page mapping is NULL. Link: https://lore.kernel.org/r/20200217112706.20085-1-yi.zhang@huawei.com Fixes: c96dceeabf76 ("jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer") Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-05  ext4: use flexible-array members in struct dx_node and struct dx_root  (Gustavo A. R. Silva)
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200213160648.GA7054@embeddedor Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-05  ext4: use built-in RCU list checking in mballoc  (Madhuparna Bhowmik)
list_for_each_entry_rcu() has built-in RCU and lock checking. Pass the cond argument to list_for_each_entry_rcu() to silence a false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Link: https://lore.kernel.org/r/20200213152558.7070-1-madhuparnabhowmik10@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
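A generic sketch of the optional fourth argument: the cond expression tells the RCU list-checking code which lock makes the traversal safe when it is not done under rcu_read_lock() (the struct and function names are illustrative, not mballoc's):

    #include <linux/rculist.h>
    #include <linux/lockdep.h>

    struct item {
            struct list_head node;
            int value;
    };

    static int sum_items(struct list_head *head, spinlock_t *lock)
    {
            struct item *it;
            int sum = 0;

            /* The cond argument documents (and lets lockdep verify) that
             * holding *lock legitimizes walking this RCU-protected list. */
            list_for_each_entry_rcu(it, head, node, lockdep_is_held(lock))
                    sum += it->value;
            return sum;
    }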
2020-03-05  ext4: delete declaration for ext4_split_extent()  (Eric Whitney)
There are no forward references for ext4_split_extent() in extents.c, so delete its unnecessary declaration. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200212162141.22381-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-05  ext4: remove EXT4_EOFBLOCKS_FL and associated code  (Eric Whitney)
The EXT4_EOFBLOCKS_FL inode flag is used to indicate whether a file contains unwritten blocks past i_size. It's set when ext4_fallocate is called with the KEEP_SIZE flag to extend a file with an unwritten extent. However, this flag hasn't been useful functionally since March, 2012, when a decision was made to remove it from ext4. All traces of EXT4_EOFBLOCKS_FL were removed from e2fsprogs version 1.42.2 by commit 010dc7b90d97 ("e2fsck: remove EXT4_EOFBLOCKS_FL flag handling") at that time. Now that enough time has passed to make e2fsprogs versions containing this modification common, this patch now removes the code associated with EXT4_EOFBLOCKS_FL from the kernel as well. This change has two implications. First, because pre-1.42.2 e2fsck versions only look for a problem if EXT4_EOFBLOCKS_FL is set, and because that bit will never be set by newer kernels containing this patch, old versions of e2fsck won't have a compatibility problem with files created by newer kernels. Second, newer kernels will not clear EXT4_EOFBLOCKS_FL inode flag bits belonging to a file written by an older kernel. If set, it will remain in that state until the file is deleted. Because e2fsck versions since 1.42.2 don't check the flag at all, no adverse effect is expected. However, pre-1.42.2 e2fsck versions that do check the flag may report that it is set when it ought not to be after a file has been truncated or had its unwritten blocks written. In this case, the old version of e2fsck will offer to clear the flag. No adverse effect would then occur whether the user chooses to clear the flag or not. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200211210216.24960-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-05  ext4: code cleanup for ext4_statfs_project()  (Chengguang Xu)
Call min_not_zero() to simplify the complicated prjquota limit comparison in ext4_statfs_project(). Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Link: https://lore.kernel.org/r/20200210082445.2379-1-cgxu519@mykernel.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
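A sketch of what the simplification amounts to: a zero quota limit means "no limit", so min_not_zero() picks the effective bound (the helper function here is made up; the dquot fields are the standard struct mem_dqblk ones):

    #include <linux/quota.h>
    #include <linux/kernel.h>       /* min_not_zero() */

    static qsize_t effective_block_limit(const struct dquot *dq)
    {
            return min_not_zero(dq->dq_dqb.dqb_bsoftlimit,
                                dq->dq_dqb.dqb_bhardlimit);
    }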
2020-03-05  ext4: start to support iopoll method  (Xiaoguang Wang)
Since commit "b1b4705d54ab ext4: introduce direct I/O read using iomap infrastructure", we can easily make ext4 support the iopoll method by just using iomap_dio_iopoll(). Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20200207120758.2411-1-xiaoguang.wang@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
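Wiring this up essentially means pointing the ->iopoll hook of ext4's file_operations at the generic iomap helper; a sketch (the struct shown is illustrative, only the .iopoll line is the point):

    static const struct file_operations ext4_file_ops_sketch = {
            .read_iter      = ext4_file_read_iter,
            .write_iter     = ext4_file_write_iter,
            .iopoll         = iomap_dio_iopoll,     /* generic iomap DIO polling */
            /* ... */
    };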
2020-03-05  ext4: force buffer up-to-date while marking it dirty  (Harshad Shirwadkar)
Writeback errors can leave a buffer in a not-up-to-date state when there are errors during background writes. Force the buffer up to date while marking it dirty. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20191224190940.157952-1-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
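An illustrative sketch of the rule (not the literal ext4 diff; the helper name is made up): when redirtying a metadata buffer, also mark it up to date so a previous writeback failure cannot leave stale !uptodate state behind:

    #include <linux/buffer_head.h>

    static void dirty_metadata_buffer(struct buffer_head *bh)
    {
            set_buffer_uptodate(bh);
            mark_buffer_dirty(bh);
    }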
2020-03-05  iomap: Remove pgoff from tracepoints  (Matthew Wilcox (Oracle))
The 'pgoff' displayed by the tracepoints wasn't a pgoff at all; it was a byte offset from the start of the file. We already emit that in the form of the 'offset', so we can just remove pgoff. That means we can remove 'page' as an argument to the tracepoint, and rename this type of tracepoint from being a page class to being a range class. Fixes: 0b1b213fcf3a ("xfs: event tracing support") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-04  io_uring: buffer registration infrastructure  (Jens Axboe)
This just prepares the ring for having lists of buffers associated with it, that the application can provide for SQEs to consume instead of providing their own. The buffers are organized by group ID. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04  io_uring/io-wq: forward submission ref to async  (Pavel Begunkov)
First, it changes the io-wq interfaces: it replaces {get,put}_work() with free_work(), which is guaranteed to be called exactly once, and it enforces that the free_work() callback is non-NULL. io_uring follows the changes and, instead of putting a submission reference in io_put_req_async_completion(), it will be done in io_free_work(). As it removes io_get_work() together with its corresponding refcount_inc(), the ref balance is maintained. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04  io-wq: optimise out *next_work() double lock  (Pavel Begunkov)
When executing non-linked hashed work, io_worker_handle_work() will lock-unlock wqe->lock to update hash, and then immediately lock-unlock to get next work. Optimise this case and do lock/unlock only once. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04  io-wq: optimise locking in io_worker_handle_work()  (Pavel Begunkov)
There are 2 optimisations:
- Currently, io_worker_handle_work() does io_assign_current_work() twice per request, and each call adds a lock/unlock(worker->lock) pair. The first is to reset worker->cur_work to NULL, and the second to set a real work shortly after. If there is a dependent work, set it immediately, which effectively removes the extra NULL'ing.
- There is no use in taking wqe->lock for linked works, as they are not hashed now. Optimise it out.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-04  io-wq: shuffle io_worker_handle_work() code  (Pavel Begunkov)
This is a preparation patch; it adds some helpers and makes the next patches cleaner:
- extract io_impersonate_work() and io_assign_current_work()
- replace the @next label with a nested do-while
- move put_work() right after NULL'ing cur_work
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-03  io_uring: get next work with submission ref drop  (Pavel Begunkov)
If after dropping the submission reference req->refs == 1, the request is done, because this one is for io_put_work() and will be dropped synchronously shortly after. In this case it's safe to steal a next work from the request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-03  io_uring: remove @nxt from handlers  (Pavel Begunkov)
There will be no use for @nxt in the handlers, and it doesn't work anyway, so purge it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-03  io_uring: make submission ref putting consistent  (Pavel Begunkov)
The rule is simple: any async handler gets a submission ref and should put it at the end. Make them all follow it, and so be more consistent. This is a preparation patch, and as io_wq_assign_next() currently won't ever work, this doesn't care to use io_put_req_find_next() instead of io_put_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [refcount_inc_not_zero() -> refcount_inc() fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>