summaryrefslogtreecommitdiff
path: root/block/fops.c
AgeCommit message (Collapse)Author
2022-02-22block: clear iocb->private in blkdev_bio_end_io_async()Stefano Garzarella
iocb_bio_iopoll() expects iocb->private to be cleared before releasing the bio. We already do this in blkdev_bio_end_io(), but we forgot in the recently added blkdev_bio_end_io_async(). Fixes: 54a88eb838d3 ("block: add single bio async direct IO helper") Cc: asml.silence@gmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220211090136.44471-1-sgarzare@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: fix DIO handling regressions in blkdev_read_iter()Ilya Dryomov
Commit ceaa762527f4 ("block: move direct_IO into our own read_iter handler") introduced several regressions for bdev DIO: 1. read spanning EOF always returns 0 instead of the number of bytes read. This is because "count" is assigned early and isn't updated when the iterator is truncated: $ lsblk -o name,size /dev/vdb NAME SIZE vdb 1G $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 0/4194304 bytes at offset 1070596096 0.000000 bytes, 0 ops; 0.0007 sec (0.000000 bytes/sec and 0.0000 ops/sec) instead of $ xfs_io -d -c 'pread -b 4M 1021M 4M' /dev/vdb read 3145728/4194304 bytes at offset 1070596096 3 MiB, 1 ops; 0.0007 sec (3.865 GiB/sec and 1319.2612 ops/sec) 2. truncated iterator isn't reexpanded 3. iterator isn't reverted on blkdev_direct_IO() error 4. zero size read no longer skips atime update Fixes: ceaa762527f4 ("block: move direct_IO into our own read_iter handler") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220201100420.25875-1-idryomov@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-12Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ...
2021-12-11Merge tag 'block-5.16-2021-12-10' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few block fixes that should go into this release: - NVMe pull request: - set ana_log_size to 0 after freeing ana_log_buf (Hou Tao) - show subsys nqn for duplicate cntlids (Keith Busch) - disable namespace access for unsupported metadata (Keith Busch) - report write pointer for a full zone as zone start + zone len (Niklas Cassel) - fix use after free when disconnecting a reconnecting ctrl (Ruozhu Li) - fix a list corruption in nvmet-tcp (Sagi Grimberg) - Fix for a regression on DIO single bio async IO (Pavel) - ioprio seteuid fix (Davidlohr) - mtd fix that subsequently got reverted as it was broken, will get re-done and submitted for the next round - Two MD fixes via Song (Markus, zhangyue)" * tag 'block-5.16-2021-12-10' of git://git.kernel.dk/linux-block: Revert "mtd_blkdevs: don't scan partitions for plain mtdblock" block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) md: fix double free of mddev->private in autorun_array() md: fix update super 1.0 on rdev size change nvmet-tcp: fix possible list corruption for unexpected command failure block: fix single bio async DIO error handling nvme: fix use after free when disconnecting a reconnecting ctrl nvme-multipath: set ana_log_size to 0 after free ana_log_buf mtd_blkdevs: don't scan partitions for plain mtdblock nvme: report write pointer for a full zone as zone start + zone len nvme: disable namespace access for unsupported metadata nvme: show subsys nqn for duplicate cntlids
2021-12-07block: fix single bio async DIO error handlingPavel Begunkov
BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882 CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: [...] refcount_dec_and_test include/linux/refcount.h:333 [inline] iocb_put fs/aio.c:1161 [inline] io_submit_one+0x496/0x2fe0 fs/aio.c:1882 __do_sys_io_submit fs/aio.c:1938 [inline] __se_sys_io_submit fs/aio.c:1908 [inline] __x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae __blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages() directly, in which case upper layers won't be expecting ->ki_complete to be called by the block layer and will terminate the request. However, there is also bio_endio() leading to a second ->ki_complete and a double free. Fixes: 54a88eb838d37 ("block: add single bio async direct IO helper") Reported-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03block: move direct_IO into our own read_iter handlerJens Axboe
Don't call into generic_file_read_iter() if we know it's O_DIRECT, just set it up ourselves and call our own handler. This avoids an indirect call for O_DIRECT. Fall back to filemap_read() if we fail. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03treewide: Add missing includes masked by cgroup -> bpf dependencyJakub Kicinski
cgroup.h (therefore swap.h, therefore half of the universe) includes bpf.h which in turn includes module.h and slab.h. Since we're about to get rid of that dependency we need to clean things up. v2: drop the cpu.h include from cacheinfo.h, it's not necessary and it makes riscv sensitive to ordering of include files. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Krzysztof WilczyƄski <kw@linux.com> Acked-by: Peter Chen <peter.chen@kernel.org> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/ # v1 Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org
2021-11-09Merge tag 'for-5.16/bdev-size-2021-11-09' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more bdev size updates from Jens Axboe: "Two followup changes for the bdev-size series from this merge window: - Add loff_t cast to bdev_nr_bytes() (Christoph) - Use bdev_nr_bytes() consistently for the block parts at least (me)" * tag 'for-5.16/bdev-size-2021-11-09' of git://git.kernel.dk/linux-block: block: use new bdev_nr_bytes() helper for blkdev_{read,write}_iter() block: add a loff_t cast to bdev_nr_bytes
2021-11-05block: use new bdev_nr_bytes() helper for blkdev_{read,write}_iter()Jens Axboe
We have new helpers for this, use them rather than the slower inode size reads. This makes the read/write path consistent with most of the rest of block as well. Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/a72767cd-3c6d-47f7-80f4-aa025a17b2cb@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-01Merge tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull kiocb->ki_complete() cleanup from Jens Axboe: "This removes the res2 argument from kiocb->ki_complete(). Only the USB gadget code used it, everybody else passes 0. The USB guys checked the user gadget code they could find, and everybody just uses res as expected for the async interface" * tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block: fs: get rid of the res2 iocb->ki_complete argument usb: remove res2 argument from gadget code completions
2021-11-01Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull bdev size cleanups from Jens Axboe: "Clean up the bdev size handling with new bdev_nr_bytes() helper" * tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits) partitions/ibm: use bdev_nr_sectors instead of open coding it partitions/efi: use bdev_nr_bytes instead of open coding it block/ioctl: use bdev_nr_sectors and bdev_nr_bytes block: cache inode size in bdev udf: use sb_bdev_nr_blocks reiserfs: use sb_bdev_nr_blocks ntfs: use sb_bdev_nr_blocks jfs: use sb_bdev_nr_blocks ext4: use sb_bdev_nr_blocks block: add a sb_bdev_nr_blocks helper block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate squashfs: use bdev_nr_bytes instead of open coding it reiserfs: use bdev_nr_bytes instead of open coding it pstore/blk: use bdev_nr_bytes instead of open coding it ntfs3: use bdev_nr_bytes instead of open coding it nilfs2: use bdev_nr_bytes instead of open coding it nfs/blocklayout: use bdev_nr_bytes instead of open coding it jfs: use bdev_nr_bytes instead of open coding it hfsplus: use bdev_nr_sectors instead of open coding it hfs: use bdev_nr_sectors instead of open coding it ...
2021-10-27block: add async version of bio_set_polledPavel Begunkov
If we know that a iocb is async we can optimise bio_set_polled() a bit, add a new helper bio_set_polled_async(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8fa137885164a5d05fadcff4c3521da8d5a83d00.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: kill DIO_MULTI_BIOPavel Begunkov
Now __blkdev_direct_IO() serves only multi-bio I/O, thus remove not used anymore single bio refcounting optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/88eb488aae9ed4852a30f3a7132f296f56e43b80.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: kill unused polling bits in __blkdev_direct_IO()Pavel Begunkov
With addition of __blkdev_direct_IO_async(), __blkdev_direct_IO() now serves only multio-bio I/O, which we don't poll. Now we can remove anything related to I/O polling from it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8c597a6b7ee612df394853bfd24726aee5b898e.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: avoid extra iter advance with async iocbPavel Begunkov
Nobody cares about iov iterators state if we return -EIOCBQUEUED, so as the we now have __blkdev_direct_IO_async(), which gets pages only once, we can skip expensive iov_iter_advance(). It's around 1-2% of all CPU spent. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6158edfbfa2ae3bc24aed29a72f035df18fad2f.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25fs: get rid of the res2 iocb->ki_complete argumentJens Axboe
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25block: add single bio async direct IO helperPavel Begunkov
As with __blkdev_direct_IO_simple(), we can implement direct IO more efficiently if there is only one bio. Add __blkdev_direct_IO_async() and blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS with nullblk to 4.7+. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0ae4109b7a6934adede490f84d188d53b97051b.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21block: convert fops.c magic constants to SHIFT_SECTORPavel Begunkov
Don't use shifting by a magic number 9 but replace with a more descriptive SHIFT_SECTOR. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/068782b9f7e97569fb59a99529b23bb17ea4c5e2.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21block: optimise boundary blkdev_read_iter's checksPavel Begunkov
Combine pos and len checks and mark unlikely. Also, don't reexpand if it's not truncated. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fff34e613aeaae1ad12977dc4592cb1a1f5d3190.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: align blkdev_dio inlined bio to a cachelineJens Axboe
We get all sorts of unreliable and funky results since the bio is designed to align on a cacheline, which it does not when inlined like this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: use bdev_nr_bytes instead of open coding it in blkdev_fallocateChristoph Hellwig
Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211018101130.1838532-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: add a struct io_comp_batch argument to fops->iopoll()Jens Axboe
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: use flags instead of bit fields for blkdev_dioJens Axboe
This generates a lot better code for me, and bumps performance from 7650K IOPS to 7750K IOPS. Looking at profiles for the run and running perf diff, it confirms that we're now sending a lot less time there: 6.38% -2.80% [kernel.vmlinux] [k] blkdev_direct_IO Taking it from the 2nd most cycle consumer to only the 9th most at 3.35% of the CPU time. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: cache bdev in struct file for raw bdev IOPavel Begunkov
bdev = &BDEV_I(file->f_mapping->host)->bdev Getting struct block_device from a file requires 2 memory dereferences as illustrated above, that takes a toll on performance, so cache it in yet unused file->private_data. That gives a noticeable peak performance improvement. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/8415f9fe12e544b9da89593dfbca8de2b52efe03.1634115360.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: switch polling to be bio basedChristoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: replace the spin argument to blk_iopoll with a flags argumentChristoph Hellwig
Switch the boolean spin argument to blk_poll to passing a set of flags instead. This will allow to control polling behavior in a more fine grained way. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de [axboe: adapt to changed io_uring iopoll] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: don't try to poll multi-bio I/Os in __blkdev_direct_IOChristoph Hellwig
If an iocb is split into multiple bios we can't poll for both. So don't even bother to try to poll in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012111226.760968-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: merge block_ioctl into blkdev_ioctlChristoph Hellwig
Simplify the ioctl path and match the code structure on the compat side. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012104450.659013-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24block: hold ->invalidate_lock in blkdev_fallocateMing Lei
When running ->fallocate(), blkdev_fallocate() should hold mapping->invalidate_lock to prevent page cache from being accessed, otherwise stale data may be read in page cache. Without this patch, blktests block/009 fails sometimes. With this patch, block/009 can pass always. Also as Jan pointed out, no pages can be created in the discarded area while you are holding the invalidate_lock, so remove the 2nd truncate_bdev_range(). Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-07block: split out operations on block special filesChristoph Hellwig
Add a new block/fops.c for all the file and address_space operations that provide the block special file support. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210907141303.1371844-2-hch@lst.de [axboe: correct trailing whitespace while at it] Signed-off-by: Jens Axboe <axboe@kernel.dk>