linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2021-12-23	nvme: drop unused variable ctrl in nvme_setup_cmd	Geliang Tang
	The variable 'ctrl' became useless since the code using it was dropped from nvme_setup_cmd() in the commit 292ddf67bbd5 ("nvme: increment request genctr on completion"). Fix it to get rid of this compilation warning in the nvme-5.17 branch: drivers/nvme/host/core.c: In function ‘nvme_setup_cmd’: drivers/nvme/host/core.c:993:20: warning: unused variable ‘ctrl’ [-Wunused-variable] struct nvme_ctrl *ctrl = nvme_req(req)->ctrl; ^~~~ Fixes: 292ddf67bbd5 ("nvme: increment request genctr on completion") Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-23	nvme: increment request genctr on completion	Keith Busch
	The nvme request generation counter is intended to catch duplicate completions. Incrementing the counter on submission means duplicates can only be caught if the request tag is reallocated and dispatched prior to the driver observing the corrupted CQE. Incrementing on completion removes this window, making it possible to detect duplicate completions in consecutive entries. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-23	nvme-fabrics: print out valid arguments when reading from /dev/nvme-fabrics	Hannes Reinecke
	Currently applications have a hard time figuring out which nvme-over-fabrics arguments are supported for any given kernel; the ioctl will return an error code on failure, and the application has to guess whether this was due to an invalid argument or due to a connection or controller error. With this patch applications can read a list of supported arguments by simply reading from /dev/nvme-fabrics, allowing them to validate the connection string. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-16	block: remove the rsxx driver	Christoph Hellwig
	This driver was for rare and shortlived high end enterprise hardware and hasn't been maintained since 2014, which also means it never got converted to use blk-mq. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-14	rsxx: Drop PCI legacy power management	Bjorn Helgaas
	The rsxx driver doesn't support device suspend, so remove rsxx_pci_suspend(), the legacy PCI .suspend() method, completely. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-5-helgaas@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-14	mtip32xx: convert to generic power management	Vaibhav Gupta
	Convert mtip32xx from legacy PCI power management to the generic power management framework. Previously, mtip32xx used legacy PCI power management, where mtip_pci_suspend() and mtip_pci_resume() were responsible for both device-specific things and generic PCI things: mtip_pci_suspend mtip_block_suspend(dd) <-- device-specific pci_save_state(pdev) <-- generic PCI pci_set_power_state(pdev, pci_choose_state(pdev, state)) mtip_pci_resume pci_set_power_state(PCI_D0) <-- generic PCI pci_restore_state(pdev) <-- generic PCI pcim_enable_device(pdev) <-- generic PCI pci_set_master(pdev) <-- generic PCI mtip_block_resume(dd) <-- device-specific With generic power management, the PCI bus PM methods do the generic PCI things, and the driver needs only the device-specific part, i.e., suspend_devices_and_enter dpm_suspend_start(PMSG_SUSPEND) pci_pm_suspend # PCI bus .suspend() method mtip_pci_suspend # dev->driver->pm->suspend mtip_block_suspend <-- device-specific suspend_enter dpm_suspend_noirq(PMSG_SUSPEND) pci_pm_suspend_noirq # PCI bus .suspend_noirq() method pci_save_state <-- generic PCI pci_prepare_to_sleep <-- generic PCI pci_set_power_state ... dpm_resume_end(PMSG_RESUME) pci_pm_resume # PCI bus .resume() method pci_restore_standard_config pci_set_power_state(PCI_D0) <-- generic PCI pci_restore_state <-- generic PCI mtip_pci_resume # dev->driver->pm->resume mtip_block_resume <-- device-specific [bhelgaas: commit log] Link: https://lore.kernel.org/r/20210114115423.52414-2-vaibhavgupta40@gmail.com Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-4-helgaas@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-14	mtip32xx: remove pointless drvdata lookups	Bjorn Helgaas
	Previously we passed a struct pci_dev * to mtip_check_surprise_removal(), which immediately looked up the driver_data. But all callers already have the driver_data pointer, so just pass it directly and skip the extra lookup. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-3-helgaas@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-14	mtip32xx: remove pointless drvdata checking	Bjorn Helgaas
	The .suspend() and .resume() methods are only called after the .probe() method (mtip_pci_probe()) has set the drvdata and returned success. Therefore, if we get to mtip_pci_suspend() or mtip_pci_resume(), the drvdata must be valid. Drop the unnecessary checking. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-2-helgaas@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-13	drbd: Use struct_group() to zero algs	Kees Cook
	In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add a struct_group() for the algs so that memset() can correctly reason about the size. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20211118203712.1288866-1-keescook@chromium.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-13	loop: make autoclear operation asynchronous	Tetsuo Handa
	syzbot is reporting circular locking problem at __loop_clr_fd() [1], for commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker") is calling destroy_workqueue() with disk->open_mutex held. This circular dependency cannot be broken unless we call __loop_clr_fd() without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from lo_release() to a WQ context. Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1] Reported-by: syzbot <syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Tested-by: syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-10	null_blk: cast command status to integer	Jens Axboe
	kernel test robot reports that sparse now triggers a warning on null_blk: >> drivers/block/null_blk/main.c:1577:55: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int ioerror @@ got restricted blk_status_t [usertype] error @@ drivers/block/null_blk/main.c:1577:55: sparse: expected int ioerror drivers/block/null_blk/main.c:1577:55: sparse: got restricted blk_status_t [usertype] error because blk_mq_add_to_batch() takes an integer instead of a blk_status_t. Just cast this to an integer to silence it, null_blk is the odd one out here since the command status is the "right" type. If we change the function type, then we'll have do that for other callers too (existing and future ones). Fixes: 2385ebf38f94 ("block: null_blk: batched complete poll requests") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-09	pktdvd: stop using bdi congestion framework.	NeilBrown
	The bdi congestion framework isn't widely used and should be deprecated. pktdvd makes use of it to track congestion, but this can be done entirely internally to pktdvd, so it doesn't need to use the framework. So introduce a "congested" flag. When waiting for bio_queue_size to drop, set this flag and a var_waitqueue() to wait for it. When bio_queue_size does drop and this flag is set, clear the flag and call wake_up_var(). We don't use a wait_var_event macro for the waiting as we need to set the flag and drop the spinlock before calling schedule() and while that is possible with __wait_var_event(), result is not easy to read. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/163910843527.9928.857338663717630212@noble.neil.brown.name Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03	block: null_blk: batched complete poll requests	Ming Lei
	Complete poll requests via blk_mq_add_to_batch() and blk_mq_end_request_batch(), so that we can cover batched complete code path by running null_blk test. Meantime this way shows ~14% IOPS boost on 't/io_uring /dev/nullb0' in my test. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203081703.3506020-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03	floppy: Add max size check for user space request	Xiongwei Song
	We need to check the max request size that is from user space before allocating pages. If the request size exceeds the limit, return -EINVAL. This check can avoid the warning below from page allocator. WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 current_gfp_context include/linux/sched/mm.h:195 [inline] WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 __alloc_pages+0x45d/0x500 mm/page_alloc.c:5356 Modules linked in: CPU: 3 PID: 16525 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:__alloc_pages+0x45d/0x500 mm/page_alloc.c:5344 Code: be c9 00 00 00 48 c7 c7 20 4a 97 89 c6 05 62 32 a7 0b 01 e8 74 9a 42 07 e9 6a ff ff ff 0f 0b e9 a0 fd ff ff 40 80 e5 3f eb 88 <0f> 0b e9 18 ff ff ff 4c 89 ef 44 89 e6 45 31 ed e8 1e 76 ff ff e9 RSP: 0018:ffffc90023b87850 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff92004770f0b RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000033 RDI: 0000000000010cc1 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff81bb4686 R11: 0000000000000001 R12: ffffffff902c1960 R13: 0000000000000033 R14: 0000000000000000 R15: ffff88804cf64a30 FS: 0000000000000000(0000) GS:ffff88802cd00000(0063) knlGS:00000000f44b4b40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 000000002c921000 CR3: 000000004f507000 CR4: 0000000000150ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191 __get_free_pages+0x8/0x40 mm/page_alloc.c:5418 raw_cmd_copyin drivers/block/floppy.c:3113 [inline] raw_cmd_ioctl drivers/block/floppy.c:3160 [inline] fd_locked_ioctl+0x12e5/0x2820 drivers/block/floppy.c:3528 fd_ioctl drivers/block/floppy.c:3555 [inline] fd_compat_ioctl+0x891/0x1b60 drivers/block/floppy.c:3869 compat_blkdev_ioctl+0x3b8/0x810 block/ioctl.c:662 __do_compat_sys_ioctl+0x1c7/0x290 fs/ioctl.c:972 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Reported-by: syzbot+23a02c7df2cf2bc93fa2@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20211116131033.27685-1-sxwjean@me.com Signed-off-by: Xiongwei Song <sxwjean@gmail.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03	floppy: Fix hang in watchdog when disk is ejected	Tasos Sahanidis
	When the watchdog detects a disk change, it calls cancel_activity(), which in turn tries to cancel the fd_timer delayed work. In the above scenario, fd_timer_fn is set to fd_watchdog(), meaning it is trying to cancel its own work. This results in a hang as cancel_delayed_work_sync() is waiting for the watchdog (itself) to return, which never happens. This can be reproduced relatively consistently by attempting to read a broken floppy, and ejecting it while IO is being attempted and retried. To resolve this, this patch calls cancel_delayed_work() instead, which cancels the work without waiting for the watchdog to return and finish. Before this regression was introduced, the code in this section used del_timer(), and not del_timer_sync() to delete the watchdog timer. Link: https://lore.kernel.org/r/399e486c-6540-db27-76aa-7a271b061f76@tasossah.com Fixes: 070ad7e793dc ("floppy: convert to delayed work and single-thread wq") Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-02	null_blk: allow zero poll queues	Ming Lei
	There isn't any reason to not allow zero poll queues from user viewpoint. Also sometimes we need to compare io poll between poll mode and irq mode, so not allowing poll queues is bad. Fixes: 15dfc662ef31 ("null_blk: Fix handling of submit_queues and poll_queues attributes") Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203023935.3424042-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	loop: don't hold lo_mutex during __loop_clr_fd()	Tetsuo Handa
	syzbot is reporting circular locking problem at __loop_clr_fd() [1], for commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker") is calling destroy_workqueue() with lo->lo_mutex held. Since all functions where lo->lo_state matters are already checking lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g. ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex inside __loop_clr_fd() only when asserting/updating lo->lo_state. Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test, and convert __loop_clr_fd() into a void function. Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1] Reported-by: syzbot <syzbot+63614029dfb79abd4383@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	scsi: remove the gendisk argument to scsi_ioctl	Christoph Hellwig
	Now that blk_execute_rq does not take a gendisk argument there is no need to pass it through the scsi_ioctl callchain either. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: remove the gendisk argument to blk_execute_rq	Christoph Hellwig
	Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: remove the ->rq_disk field in struct request	Christoph Hellwig
	Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: don't check ->rq_disk in merges	Christoph Hellwig
	There is a 1:1 relationship between request_queues and gendisks now, so no need for these extra checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	mtd_blkdevs: remove the sector out of range check in do_blktrans_request	Christoph Hellwig
	The block layer already performs this check, no need to duplicate it in the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: Remove redundant initialization of variable ret	Colin Ian King
	The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20211126230652.1175636-1-colin.i.king@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: simplify ioc_lookup_icq	Christoph Hellwig
	Remove the ioc argument as it always points to current->io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: simplify ioc_create_icq	Christoph Hellwig
	Remove the ioc and gfp_mask argument, which are hard coded by the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: return the io_context from create_task_io_context	Christoph Hellwig
	Grab a reference to the newly allocated or existing io_context in create_task_io_context and return it. This simplifies the callers and removes the need for double lookups. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: use alloc_io_context in __copy_io	Christoph Hellwig
	In __copy_io we know that the newly allocate task_struct does not have an I/O context yet and is not exiting. So just allocate the I/O context struct and install it directly. There is no need to lock the task either as it is just being created. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: factor out a alloc_io_context helper	Christoph Hellwig
	Factor out a helper that just allocate an I/O context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: remove get_io_context_active	Christoph Hellwig
	Fold it into it's only caller, and remove a lof of the debug checks that are not needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: move the remaining elv.icq handling to the I/O scheduler	Christoph Hellwig
	After the prepare side has been moved to the only I/O scheduler that cares, do the same for the cleanup and the NULL initialization. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: move blk_mq_sched_assign_ioc to blk-ioc.c	Christoph Hellwig
	Move blk_mq_sched_assign_ioc so that many interfaces from the file can be marked static. Rename the function to ioc_find_get_icq as well and return the icq to simplify the interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: mark put_io_context_active static	Christoph Hellwig
	Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	Revert "block: Provide blk_mq_sched_get_icq()"	Christoph Hellwig
	This reverts commit 4896c4e64ba5d5d5acdbcf68c5910dd4f6d8fa62. The helper is not needed any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: use bfq_bic_lookup in bfq_limit_depth	Christoph Hellwig
	No need to create a new I/O context if there is none present yet in ->limit_depth. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: simplify bfq_bic_lookup	Christoph Hellwig
	Remove the unused bfqd argument, and hardcode ioc to current->io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	fork: move copy_io to block/blk-ioc.c	Christoph Hellwig
	Move the copying of the I/O context to the block layer as that is where we can use the proper low-level interfaces. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	RDMA/qib: rename copy_io to qib_copy_io	Christoph Hellwig
	Add the proper module prefix to avoid conflicts with a function in the scheduler. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	blk-mq: use bio->bi_opf after bio is checked	Ming Lei
	bio->bi_opf isn't finalized before checking the bio, so use it after submit_bio_checks() returns. Fixes: 5b13bc8a3fd5 ("blk-mq: cleanup request allocation") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: Do not let waker requests skip proper accounting	Jan Kara
	Commit 7cc4ffc55564 ("block, bfq: put reqs of waker and woken in dispatch list") added a condition to bfq_insert_request() which added waker's requests directly to dispatch list. The rationale was that completing waker's IO is needed to get more IO for the current queue. Although this rationale is valid, there is a hole in it. The waker does not necessarily serve the IO only for the current queue and maybe it's current IO is not needed for current queue to make progress. Furthermore injecting IO like this completely bypasses any service accounting within bfq and thus we do not properly track how much service is waker's queue getting or that the waker is actually doing any IO. Depending on the conditions this can result in the waker getting too much or too few service. Consider for example the following job file: [global] directory=/mnt/repro/ rw=write size=8g time_based runtime=30 ramp_time=10 blocksize=1m direct=0 ioengine=sync [slowwriter] numjobs=1 prioclass=2 prio=7 fsync=200 [fastwriter] numjobs=1 prioclass=2 prio=0 fsync=200 Despite processes have very different IO priorities, they get the same about of service. The reason is that bfq identifies these processes as having waker-wakee relationship and once that happens, IO from fastwriter gets injected during slowwriter's time slice. As a result bfq is not aware that fastwriter has any IO to do and constantly schedules only slowwriter's queue. Thus fastwriter is forced to compete with slowwriter's IO all the time instead of getting its share of time based on IO priority. Drop the special injection condition from bfq_insert_request(). As a result, requests will be tracked and queued in a normal way and on next dispatch bfq_select_queue() can decide whether the waker's inserted requests should be injected during the current queue's timeslice or not. Fixes: 7cc4ffc55564 ("block, bfq: put reqs of waker and woken in dispatch list") Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-8-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: Log waker detections	Jan Kara
	Waker - wakee relationships are important in deciding whether one queue can preempt the other one. Print information about detected waker-wakee relationships so that scheduling decisions can be better understood from block traces. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-7-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: Provide helper to generate bfqq name	Jan Kara
	Instead of having helper formating bfqq pid, provide a helper to generate full bfqq name as used in the traces. It saves some code duplication and will save more in the coming tracepoints. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: Limit waker detection in time	Jan Kara
	Currently, when process A starts issuing requests shortly after process B has completed some IO three times in a row, we decide that B is a "waker" of A meaning that completing IO of B is needed for A to make progress and generally stop separating A's and B's IO much. This logic is useful to avoid unnecessary idling and thus throughput loss for cases where workload needs to switch e.g. between the process and the journaling thread doing IO. However the detection heuristic tends to frequently give false positives when A and B are fighting IO bandwidth and other processes aren't doing much IO as we are basically deemed to eventually accumulate three occurences of a situation where one process starts issuing requests after the other has completed some IO. To reduce these false positives, cancel the waker detection also if we didn't accumulate three detected wakeups within given timeout. The rationale is that if wakeups are really rare, the pointless idling doesn't hurt throughput that much anyway. This significantly reduces false waker detection for workload like: [global] directory=/mnt/repro/ rw=write size=8g time_based runtime=30 ramp_time=10 blocksize=1m direct=0 ioengine=sync [slowwriter] numjobs=1 fsync=200 [fastwriter] numjobs=1 fsync=200 Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-5-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: Limit number of requests consumed by each cgroup	Jan Kara
	When cgroup IO scheduling is used with BFQ it does not really provide service differentiation if the cgroup drives a big IO depth. That for example happens with writeback which asynchronously submits lots of IO but it can happen with AIO as well. The problem is that if we have two cgroups that submit IO with different weights, the cgroup with higher weight properly gets more IO time and is able to dispatch more IO. However this causes lower weight cgroup to accumulate more requests inside BFQ and eventually lower weight cgroup consumes most of IO scheduler tags. At that point higher weight cgroup stops getting better service as it is mostly blocked waiting for a scheduler tag while its queues inside BFQ are empty and thus lower weight cgroup gets served. Check how many requests submitting cgroup has allocated in bfq_limit_depth() and if it consumes more requests than what would correspond to its weight limit available depth to 1 so that the cgroup cannot consume many more requests. With this limitation the higher weight cgroup gets proper service even with writeback. Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: Store full bitmap depth in bfq_data	Jan Kara
	Store bitmap depth shift inside bfq_data so that we can use it in bfq_limit_depth() for proportioning when limiting number of available request tags for a cgroup. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	bfq: Track number of allocated requests in bfq_entity	Jan Kara
	When we want to limit number of requests used by each bfqq and also cgroup, we need to track also number of requests used by each cgroup. So track number of allocated requests for each bfq_entity. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-2-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	block: Provide blk_mq_sched_get_icq()	Jan Kara
	Currently we lookup ICQ only after the request is allocated. However BFQ will want to decide how many scheduler tags it allows a given bfq queue (effectively a process) to consume based on cgroup weight. So provide a function blk_mq_sched_get_icq() so that BFQ can lookup ICQ earlier. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	mmc: core: Use blk_mq_complete_request_direct().	Sebastian Andrzej Siewior
	The completion callback for the sdhci-pci device is invoked from a kworker. I couldn't identify in which context is mmc_blk_mq_req_done() invoke but the remaining caller are from invoked from preemptible context. Here it would make sense to complete the request directly instead scheduling ksoftirqd for its completion. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20211025070658.1565848-3-bigeasy@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	blk-mq: Add blk_mq_complete_request_direct()	Sebastian Andrzej Siewior
	Add blk_mq_complete_request_direct() which completes the block request directly instead deferring it to softirq for single queue devices. This is useful for devices which complete the requests in preemptible context and raising softirq from means scheduling ksoftirqd. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211025070658.1565848-2-bigeasy@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	blk-crypto: remove blk_crypto_unregister()	Eric Biggers
	This function is trivial and is only used in one place. Having this function is misleading because it implies that blk_crypto_register() needs to be paired with blk_crypto_unregister(), which is not the case. Just set disk->queue->crypto_profile to NULL directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211124013733.347612-1-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29	blk-mq: cleanup request allocation	Christoph Hellwig
	Refactor the request alloction so that blk_mq_get_cached_request tries to find a cached request first, and the entirely separate and now self contained blk_mq_get_new_requests allocates one or more requests if that is not possible. There is a small change in behavior as submit_bio_checks is called twice now if a cached request is present but can't be used, but that is a small price to pay for unwinding this code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211124062856.1444266-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>