summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-05-02req->error only used for iopollStefan Bühler
No need to set it in io_poll_add; io_poll_complete doesn't use it to set the result in the CQE. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for eventfd notificationsJens Axboe
Allow registration of an eventfd, which will trigger an event every time a completion event happens for this io_uring instance. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for IORING_OP_SYNC_FILE_RANGEJens Axboe
This behaves just like sync_file_range(2) does. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02fs: add sync_file_range() helperJens Axboe
This just pulls out the ksys_sync_file_range() code to work on a struct file instead of an fd, so we can use it elsewhere. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02io_uring: add support for marking commands as drainingJens Axboe
There are no ordering constraints between the submission and completion side of io_uring. But sometimes that would be useful to have. One common example is doing an fsync, for instance, and have it ordered with previous writes. Without support for that, the application must do this tracking itself. This adds a general SQE flag, IOSQE_IO_DRAIN. If a command is marked with this flag, then it will not be issued before previous commands have completed, and subsequent commands submitted after the drain will not be issued before the drain is started.. If there are no pending commands, setting this flag will not change the behavior of the issue of the command. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-02Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "This is mostly io_uring fixes/tweaks. Most of these were actually done in time for the last -rc, but I wanted to ensure that everything tested out great before including them. The code delta looks larger than it really is, as it's mostly just comment additions/changes. Outside of the comment additions/changes, this is mostly removal of unnecessary barriers. In all, this pull request contains: - Tweak to how we handle errors at submission time. We now post a completion event if the error occurs on behalf of an sqe, instead of returning it through the system call. If the error happens outside of a specific sqe, we return the error through the system call. This makes it nicer to use and makes the "normal" use case behave the same as the offload cases. (me) - Fix for a missing req reference drop from async context (me) - If an sqe is submitted with RWF_NOWAIT, don't punt it to async context. Return -EAGAIN directly, instead of using it as a hint to do async punt. (Stefan) - Fix notes on barriers (Stefan) - Remove unnecessary barriers (Stefan) - Fix potential double free of memory in setup error (Mark) - Further improve sq poll CPU validation (Mark) - Fix page allocation warning and leak on buffer registration error (Mark) - Fix iov_iter_type() for new no-ref flag (Ming) - Fix a case where dio doesn't honor bio no-page-ref (Ming)" * tag 'for-linus-20190502' of git://git.kernel.dk/linux-block: io_uring: avoid page allocation warnings iov_iter: fix iov_iter_type block: fix handling for BIO_NO_PAGE_REF io_uring: drop req submit reference always in async punt io_uring: free allocated io_memory once io_uring: fix SQPOLL cpu validation io_uring: have submission side sqe errors post a cqe io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP io_uring: remove unnecessary barrier after incrementing dropped counter io_uring: remove unnecessary barrier before reading SQ tail io_uring: remove unnecessary barrier after updating SQ head io_uring: remove unnecessary barrier before reading cq head io_uring: remove unnecessary barrier before wq_has_sleeper io_uring: fix notes on barriers io_uring: fix handling SQEs requesting NOWAIT
2019-05-02btrfs: Use kvmalloc for allocating compressed path contextNikolay Borisov
Recent refactoring of cow_file_range_async means it's now possible to request a rather large physically contiguous memory via kmalloc. The size is dependent on the number of 512k chunks that the compressed range consists of. David reported multiple OOM messages on such large allocations. Fix it by switching to using kvmalloc. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Factor out common extent locking code in submit_compressed_extentsNikolay Borisov
Irrespective of whether the compress code fell back to uncompressed or a compressed extent has to be submitted, the extent range is always locked. So factor out the common lock_extent call at the beginning of the loop. No functional changes just removes one duplicate lock_extent call. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Set io_tree only once in submit_compressed_extentsNikolay Borisov
The inode never changes so it's sufficient to dereference it and get the iotree only once, before the execution of the main loop. No functional changes, only the size of the function is decreased: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44) Function old new delta submit_compressed_extents 1240 1196 -44 Total: Before=88476, After=88432, chg -0.05% Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Replace clear_extent_bit with unlock_extentNikolay Borisov
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Make compress_file_range take only struct async_chunkNikolay Borisov
All context this function needs is held within struct async_chunk. Currently we not only pass the struct but also every individual member. This is redundant, simplify it by only passing struct async_chunk and leaving it to compress_file_range to extract the values it requires. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Remove fs_info from struct async_chunkNikolay Borisov
The associated btrfs_work already contains a reference to the fs_info so use that instead of passing it via async_chunk. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Rename async_cow to async_chunkNikolay Borisov
Now that we have an explicit async_chunk struct rename references to variables of this type to async_chunk. No functional changes. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: Preallocate chunks in cow_file_range_asyncNikolay Borisov
This commit changes the implementation of cow_file_range_async in order to get rid of the BUG_ON in the middle of the loop. Additionally it reworks the inner loop in the hopes of making it more understandable. The idea is to make async_cow be a top-level structured, shared amongst all chunks being sent for compression. This allows to perform one memory allocation at the beginning and gracefully fail the IO if there isn't enough memory. Now, each chunk is going to be described by an async_chunk struct. It's the responsibility of the final chunk to actually free the memory. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02btrfs: reserve delalloc metadata differentlyJosef Bacik
With the per-inode block reserves we started refilling the reserve based on the calculated size of the outstanding csum bytes and extents for the inode, including the amount we were adding with the new operation. However, generic/224 exposed a problem with this approach. With 1000 files all writing at the same time we ended up with a bunch of bytes being reserved but unusable. When you write to a file we reserve space for the csum leaves for those bytes, the number of extent items required to cover those bytes, and a single transaction item for updating the inode at ordered extent finish for that range of bytes. This is held until the ordered extent finishes and we release all of the reserved space. If a second write comes in at this point we would add a single reservation for the new outstanding extent and however many reservations for the csum leaves. At this point we find the delta of how much we have reserved and how much outstanding size this is and attempt to reserve this delta. If the first write finishes it will not release any space, because the space it had reserved for the initial write is still needed for the second write. However some space would have been used, as we have added csums, extent items, and dirtied the inode. Our reserved space would be > 0 but less than the total needed reserved space. This is just for a single inode, now consider generic/224. This has 1000 inodes writing in parallel to a very small file system, 1GiB. In my testing this usually means we get about a 120MiB metadata area to work with, more than enough to allow the writes to continue, but not enough if all of the inodes are stuck trying to reserve the slack space while continuing to hold their leftovers from their initial writes. Fix this by pre-reserved _only_ for the space we are currently trying to add. Then once that is successful modify our inodes csum count and outstanding extents, and then add the newly reserved space to the inodes block_rsv. This allows us to actually pass generic/224 without running out of metadata space. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavourAl Viro
To choose whether to pick the GID from the old (16bit) or new (32bit) field, we should check if the old gid field is set to 0xffff. Mainline checks the old *UID* field instead - cut'n'paste from the corresponding code in ufs_get_inode_uid(). Fixes: 252e211e90ce Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01xfs: change some error-less functions to void typesEric Sandeen
There are several functions which have no opportunity to return an error, and don't contain any ASSERTs which could be argued to be better constructed as error cases. So, make them voids to simplify the callers. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-05-01iomap: move iomap_read_inline_data aroundChristoph Hellwig
iomap_read_inline_data ended up being placed in the middle of the bio based read I/O completion handling, which tends to confuse the heck out of me whenever I follow the code. Move it to a more suitable place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-01orangefs: make use of ->free_inode()Al Viro
Acked-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01hugetlb: make use of ->free_inode()Al Viro
moving synchronous parts of ->destroy_inode() to ->evict_inode() is not possible here - they are balancing the stuff done in ->alloc_inode(), not the things acquired while using it or sanity checks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01overlayfs: make use of ->free_inode()Al Viro
synchronous parts are left in ->destroy_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01jfs: switch to ->free_inode()Al Viro
synchronous part can be moved to ->evict_inode(), the rest - ->free_inode() fodder Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01fuse: switch to ->free_inode()Al Viro
fuse_destroy_inode() is gone - sanity checks that need the stack trace of the caller get moved into ->evict_inode(), the rest joins the RCU-delayed part which becomes ->free_inode(). While we are at it, don't just pass the address of what happens to be the first member of structure to kmem_cache_free() - get_fuse_inode() is there for purpose and it gives the proper container_of() use. No behaviour change, but verifying correctness is easier that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ext4: make use of ->free_inode()Al Viro
the rest of this ->destroy_inode() instance could probably be folded into ext4_evict_inode() Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ecryptfs: make use of ->free_inode()Al Viro
no idea if crypto destruction could be moved there as well Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ceph: use ->free_inode()Al Viro
a lot of non-delayed work in this case; all of that is left in ->destroy_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01btrfs: use ->free_inode()Al Viro
a lot of stuff remains in ->destroy_inode() Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01afs: switch to use of ->free_inode()Al Viro
debugging printks left in ->destroy_inode() and so's the update of inode count; we could take the latter to RCU-delayed part (would take only moving the check on module exit past rcu_barrier() there), but debugging output ought to either stay where it is or go into ->evict_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ntfs: switch to ->free_inode()Al Viro
move the synchronous stuff from ->destroy_inode() to ->evict_inode(), turn the RCU-delayed part into ->free_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ufs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01coda: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01sysv: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01udf: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ubifs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01squashfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01romfs: convert to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01reiserfs: convert to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01qnx6: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01qnx4: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01procfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01openpromfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01ocfs2: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01dlmfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01nilfs2: switch to ->free_inode()Al Viro
kill an extern that went stale 9 years ago, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01nfs{,4}: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01minix: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01jffs2: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01isofs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01hpfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01hostfs: switch to ->free_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>