summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-26Merge tag 'nvme-6.2-2023-01-26' of git://git.infradead.org/nvme into block-6.2Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - flush initial scan_work for async probe (Keith Busch) - fix passthrough csi check (Keith Busch) - fix nvme-fc initialization order (Ross Lagerwall)" * tag 'nvme-6.2-2023-01-26' of git://git.infradead.org/nvme: nvme: fix passthrough csi check nvme-pci: flush initial scan_work for async probe nvme-fc: fix initialization order
2023-01-26block: ublk: move ublk_chr_class destroying after devices are removedMing Lei
The 'ublk_chr_class' is needed when deleting ublk char devices in ublk_exit(), so move it after devices(idle) are removed. Fixes the following warning reported by Harris, James R: [ 859.178950] sysfs group 'power' not found for kobject 'ublkc0' [ 859.178962] WARNING: CPU: 3 PID: 1109 at fs/sysfs/group.c:278 sysfs_remove_group+0x9c/0xb0 Reported-by: "Harris, James R" <james.r.harris@intel.com> Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Link: https://lore.kernel.org/linux-block/Y9JlFmSgDl3+zy3N@T590/T/#t Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Jim Harris <james.r.harris@intel.com> Link: https://lore.kernel.org/r/20230126115346.263344-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-25nvme: fix passthrough csi checkKeith Busch
The namespace head saves the Command Set Indicator enum, so use that instead of the Command Set Selected. The two values are not the same. Fixes: 831ed60c2aca2d ("nvme: also return I/O command effects from nvme_command_effects") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-24nvme-pci: flush initial scan_work for async probeKeith Busch
The nvme device may have a namespace with the root partition, so make sure we've completed scanning before returning from the async probe. Fixes: eac3ef262941 ("nvme-pci: split the initial probe from the rest path") Reported-by: Klaus Jensen <its@irrelevant.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-23nvme-fc: fix initialization orderRoss Lagerwall
ctrl->ops is used by nvme_alloc_admin_tag_set() but set by nvme_init_ctrl() so reorder the calls to avoid a NULL pointer dereference. Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-20Merge tag 'nvme-6.2-2023-01-20' of git://git.infradead.org/nvme into block-6.2Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - fix controller shutdown regression in nvme-apple (Janne Grunau) - fix a polling on timeout regression in nvme-pci (Keith Busch)" * tag 'nvme-6.2-2023-01-20' of git://git.infradead.org/nvme: nvme-pci: fix timeout request state check nvme-apple: only reset the controller when RTKit is running nvme-apple: reset controller during shutdown
2023-01-19nvme-pci: fix timeout request state checkKeith Busch
Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT. This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides. Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-19nvme-apple: only reset the controller when RTKit is runningJanne Grunau
NVMe controller register access hangs indefinitely when the co-processor is not running. A missed reset is preferable over a hanging thread since it could be recoverable. Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-19nvme-apple: reset controller during shutdownJanne Grunau
This is a functional revert of c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable"). The commit broke suspend/resume since apple_nvme_reset_work() tries to disable the controller on resume. This does not work for the apple NVMe controller since register access only works while the co-processor firmware is running. Disabling the NVMe controller in the shutdown path is also required for shutting the co-processor down. The original code was appropriate for this hardware. Add a comment to prevent a similar breaking changes in the future. Fixes: c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable") Reported-by: Janne Grunau <j@jannau.net> Link: https://lore.kernel.org/all/20230110174745.GA3576@jannau.net/ Signed-off-by: Janne Grunau <j@jannau.net> [hch: updated with a more descriptive comment from Hector Martin] Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-17block: fix hctx checks for batch allocationPavel Begunkov
When there are no read queues read requests will be assigned a default queue on allocation. However, blk_mq_get_cached_request() is not prepared for that and will fail all attempts to grab read requests from the cache. Worst case it doubles the number of requests allocated, roughly half of which will be returned by blk_mq_free_plug_rqs(). It only affects batched allocations and so is io_uring specific. For reference, QD8 t/io_uring benchmark improves by 20-35%. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/80d4511011d7d4751b4cf6375c4e38f237d935e3.1673955390.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-17block/rnbd-clt: fix wrong max ID in ida_alloc_maxGuoqing Jiang
We need to pass 'end - 1' to ida_alloc_max after switch from ida_simple_get to ida_alloc_max. Otherwise smatch warns. drivers/block/rnbd/rnbd-clt.c:1460 init_dev() error: Calling ida_alloc_max() with a 'max' argument which is a power of 2. -1 missing? Fixes: 24afc15dbe21 ("block/rnbd: Remove a useless mutex") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20221230010926.32243-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-16blk-cgroup: fix missing pd_online_fn() while activating policyYu Kuai
If the policy defines pd_online_fn(), it should be called after pd_init_fn(), like blkg_create(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230103112833.2013432-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-16pktcdvd: check for NULL returna fter calling bio_split_to_limits()Jens Axboe
The revert of the removal of this driver happened after we fixed up the split limits for NOWAIT issue, hence it got missed. Ensure that we check for a NULL bio after splitting, in case it should be retried. Marking this as fixing both commits, so that stable backport will do this correctly. Cc: stable@vger.kernel.org Fixes: 9cea62b2cbab ("block: don't allow splitting of a REQ_NOWAIT bio") Fixes: 4b83e99ee709 ("Revert "pktcdvd: remove driver."") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-15block, bfq: switch 'bfqg->ref' to use atomic refcount apisYu Kuai
The updating of 'bfqg->ref' should be protected by 'bfqd->lock', however, during code review, we found that bfq_pd_free() update 'bfqg->ref' without holding the lock, which is problematic: 1) bfq_pd_free() triggered by removing cgroup is called asynchronously; 2) bfqq will grab bfqg reference, and exit bfqq will drop the reference, which can concurrent with 1). Unfortunately, 'bfqd->lock' can't be held here because 'bfqd' might already be freed in bfq_pd_free(). Fix the problem by using atomic refcount apis. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230103084755.1256479-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-13Merge branch 'md-fixes' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.2 Pull MD fix from Song: "It fixes an issue introduced by recent code refactor." * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix incorrect declaration about claim_rdev in md_import_device
2023-01-12md: fix incorrect declaration about claim_rdev in md_import_deviceAdrian Huang
Commit fb541ca4c365 ("md: remove lock_bdev / unlock_bdev") removes wrappers for blkdev_get/blkdev_put. However, the uninitialized local static variable of pointer type 'claim_rdev' in md_import_device() is NULL, which leads to the following warning call trace: WARNING: CPU: 22 PID: 1037 at block/bdev.c:577 bd_prepare_to_claim+0x131/0x150 CPU: 22 PID: 1037 Comm: mdadm Not tainted 6.2.0-rc3+ #69 .. RIP: 0010:bd_prepare_to_claim+0x131/0x150 .. Call Trace: <TASK> ? _raw_spin_unlock+0x15/0x30 ? iput+0x6a/0x220 blkdev_get_by_dev.part.0+0x4b/0x300 md_import_device+0x126/0x1d0 new_dev_store+0x184/0x240 md_attr_store+0x80/0xf0 kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x2be/0x3c0 ksys_write+0x5f/0xe0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc It turns out the md device cannot be used: md: could not open device unknown-block(259,0). md: md127 stopped. Fix the issue by declaring the local static variable of struct type and passing the pointer of the variable to blkdev_get_by_dev(). Fixes: fb541ca4c365 ("md: remove lock_bdev / unlock_bdev") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2023-01-12Merge tag 'nvme-6.2-2023-01-12' of git://git.infradead.org/nvme into block-6.2Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - Identify quirks for Apple controllers (Hector Martin) - fix error handling in nvme_pci_enable (Tong Zhang) - refuse unprivileged passthrough on partitions (Christoph Hellwig) - fix MAINTAINERS to not match nvmem subsystem headers (Russell King)" * tag 'nvme-6.2-2023-01-12' of git://git.infradead.org/nvme: MAINTAINERS: stop nvme matching for nvmem files nvme: don't allow unprivileged passthrough on partitions nvme: replace the "bool vec" arguments with flags in the ioctl path nvme: remove __nvme_ioctl nvme-pci: fix error handling in nvme_pci_enable() nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
2023-01-10MAINTAINERS: stop nvme matching for nvmem filesRussell King (Oracle)
The nvme patterns detect all include files starting with nvme, which also picks up the nvmem subsystem header files. Fix this by using a more specific pattern. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [hch: switched to a purely inclusive pattern instead of excluding nvmem*] Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-10nvme: don't allow unprivileged passthrough on partitionsChristoph Hellwig
Passthrough commands can always access the entire device, and thus submitting them on partitions is an privelege escalation. In hindsight we should have never allowed any passthrough commands on partitions, but it's probably too late to change that decision now. Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-01-10nvme: replace the "bool vec" arguments with flags in the ioctl pathChristoph Hellwig
To prepare for passing down more information, replace the boolean vec argument with a more extensible flags one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-01-10nvme: remove __nvme_ioctlChristoph Hellwig
Open code __nvme_ioctl in the two callers to make future changes that pass down additional paramters in the ioctl path easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-01-10nvme-pci: fix error handling in nvme_pci_enable()Tong Zhang
There are two issues in nvme_pci_enable(): 1) If pci_alloc_irq_vectors() fails, device is left enabled. Fix this by adding a goto disable statement. 2) nvme_pci_configure_admin_queue could return -ENODEV, in this case, we will need to free IRQ properly. Otherwise the following warning could be triggered: [ 5.286752] WARNING: CPU: 0 PID: 33 at kernel/irq/irqdomain.c:253 irq_domain_remove+0x12d/0x140 [ 5.290547] Call Trace: [ 5.290626] <TASK> [ 5.290695] msi_remove_device_irq_domain+0xc9/0xf0 [ 5.290843] msi_device_data_release+0x15/0x80 [ 5.290978] release_nodes+0x58/0x90 [ 5.293788] WARNING: CPU: 0 PID: 33 at kernel/irq/msi.c:276 msi_device_data_release+0x76/0x80 [ 5.297573] Call Trace: [ 5.297651] <TASK> [ 5.297719] release_nodes+0x58/0x90 [ 5.297831] devres_release_all+0xef/0x140 [ 5.298339] device_unbind_cleanup+0x11/0xc0 [ 5.298479] really_probe+0x296/0x320 Fixes: a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable") Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-10nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllersHector Martin
This mirrors the quirk added to Apple Silicon controllers in apple.c. These controllers do not support the Active NS ID List command and behave identically to the SoC version judging by existing user reports/syslogs, so will need the same fix. This quirk reverts back to NVMe 1.0 behavior and disables the broken commands. Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues") Signed-off-by: Hector Martin <marcan@marcan.st> Tested-by: Orlando Chamberlain <orlandoch.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-10nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regressionHector Martin
From the get-go, this driver and the ANS syslog have been complaining about namespace identification. In 6.2-rc1, commit 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues") regressed the driver by no longer allowing fallback to sequential namespace scans, leaving us with no namespaces. It turns out that the real problem is that this controller claiming NVMe 1.1 compat is treating the CNS field as a binary field, as in NVMe 1.0. This already has a quirk, NVME_QUIRK_IDENTIFY_CNS, so set it for the controller to fix all this nonsense (including other errors triggered by other CNS commands). Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues") Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver") Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-08block: Drop spurious might_sleep() from blk_put_queue()Tejun Heo
Dan reports the following smatch detected the following: block/blk-cgroup.c:1863 blkcg_schedule_throttle() warn: sleeping in atomic context caused by blkcg_schedule_throttle() calling blk_put_queue() in an non-sleepable context. blk_put_queue() acquired might_sleep() in 63f93fd6fa57 ("block: mark blk_put_queue as potentially blocking") which transferred the might_sleep() from blk_free_queue(). blk_free_queue() acquired might_sleep() in e8c7d14ac6c3 ("block: revert back to synchronous request_queue removal") while turning request_queue removal synchronous. However, this isn't necessary as nothing in the free path actually requires sleeping. It's pretty unusual to require a sleeping context in a put operation and it's not needed in the first place. Let's drop it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lkml.kernel.org/r/Y7g3L6fntnTtOm63@kili Cc: Christoph Hellwig <hch@lst.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Fixes: e8c7d14ac6c3 ("block: revert back to synchronous request_queue removal") # v5.9+ Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/Y7iFwjN+XzWvLv3y@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-05block: Remove "select SRCU"Paul E. McKenney
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-04Revert "pktcdvd: remove driver."Jens Axboe
This reverts commit f40eb99897af665f11858dd7b56edcb62c3f3c67. There are apparently still users out there of this driver. While we'd love to remove it to ease the maintenance burden, let's reinstate it for now until better (userspace) solutions can be developed. Link: https://lore.kernel.org/lkml/20230104190115.ceglfefco475ev6c@pali/ Reported-by: Pali Rohár <pali@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-04Revert "block: remove devnode callback from struct block_device_operations"Jens Axboe
This reverts commit 85d6ce58e493ac8b7122e2fbe3f41b94d6ebdc11. We're reinstating the pktcdvd driver, which needs this API. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-04Revert "block: bio_copy_data_iter"Jens Axboe
This reverts commit db1c7d77976775483a8ef240b4c705f113e13ea1. We're reinstating the pktcdvd driver, which needs this API. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-04ublk: honor IO_URING_F_NONBLOCK for handling control commandMing Lei
Most of control command handlers may sleep, so return -EAGAIN in case of IO_URING_F_NONBLOCK to defer the handling into io wq context. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230104133235.836536-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-04block: don't allow splitting of a REQ_NOWAIT bioJens Axboe
If we split a bio marked with REQ_NOWAIT, then we can trigger spurious EAGAIN if constituent parts of that split bio end up failing request allocations. Parts will complete just fine, but just a single failure in one of the chained bios will yield an EAGAIN final result for the parent bio. Return EAGAIN early if we end up needing to split such a bio, which allows for saner recovery handling. Cc: stable@vger.kernel.org # 5.15+ Link: https://github.com/axboe/liburing/issues/766 Reported-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-04block: handle bio_split_to_limits() NULL returnJens Axboe
This can't happen right now, but in preparation for allowing bio_split_to_limits() returning NULL if it ended the bio, check for it in all the callers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-29Merge tag 'nvme-6.2-2022-12-29' of git://git.infradead.org/nvme into block-6.2Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - fix various problems in handling the Command Supported and Effects log (Christoph Hellwig) - don't allow unprivileged passthrough of commands that don't transfer data but modify logical block content (Christoph Hellwig) - add a features and quirks policy document (Christoph Hellwig) - fix some really nasty code that was correct but made smatch complain (Sagi Grimberg)" * tag 'nvme-6.2-2022-12-29' of git://git.infradead.org/nvme: nvme-auth: fix smatch warning complaints nvme: consult the CSE log page for unprivileged passthrough nvme: also return I/O command effects from nvme_command_effects nvmet: don't defer passthrough commands with trivial effects to the workqueue nvmet: set the LBCC bit for commands that modify data nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition docs, nvme: add a feature and quirk policy document
2022-12-28nvme-auth: fix smatch warning complaintsSagi Grimberg
When initializing auth context, there may be no secrets passed by the user. Make return code explicit when returning successfully. smatch warnings: drivers/nvme/host/auth.c:950 nvme_auth_init_ctrl() warn: missing error code? 'ret' Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-28nvme: consult the CSE log page for unprivileged passthroughChristoph Hellwig
Commands like Write Zeros can change the contents of a namespaces without actually transferring data. To protect against this, check the Commands Supported and Effects log is supported by the controller for any unprivileg command passthrough and refuse unprivileged passthrough if the command has any effects that can change data or metadata. Note: While the Commands Support and Effects log page has only been mandatory since NVMe 2.0, it is widely supported because Windows requires it for any command passthrough from userspace. Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28nvme: also return I/O command effects from nvme_command_effectsChristoph Hellwig
To be able to use the Commands Supported and Effects Log for allowing unprivileged passtrough, it needs to be corretly reported for I/O commands as well. Return the I/O command effects from nvme_command_effects, and also add a default list of effects for the NVM command set. For other command sets, the Commands Supported and Effects log is required to be present already. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28nvmet: don't defer passthrough commands with trivial effects to the workqueueChristoph Hellwig
Mask out the "Command Supported" and "Logical Block Content Change" bits and only defer execution of commands that have non-trivial effects to the workqueue for synchronous execution. This allows to execute admin commands asynchronously on controllers that provide a Command Supported and Effects log page, and will keep allowing to execute Write commands asynchronously once command effects on I/O commands are taken into account. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28nvmet: set the LBCC bit for commands that modify dataChristoph Hellwig
Write, Write Zeroes, Zone append and a Zone Reset through Zone Management Send modify the logical block content of a namespace, so make sure the LBCC bit is reported for them. Fixes: b5d0b38c0475 ("nvmet: add Command Set Identifier support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-12-28nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding itChristoph Hellwig
Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a single value to multiple array entries instead of repeated assignments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-12-28nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definitionChristoph Hellwig
3 << 16 does not generate the correct mask for bits 16, 17 and 18. Use the GENMASK macro to generate the correct mask instead. Fixes: 84fef62d135b ("nvme: check admin passthru command effects") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28docs, nvme: add a feature and quirk policy documentChristoph Hellwig
This adds a document about what specification features are supported by the Linux NVMe driver, and what qualifies for a quirk if an implementation has problems following the specification. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
2022-12-26nvme-pci: update sqsize when adjusting the queue depthChristoph Hellwig
Update the core sqsize field in addition to the PCIe-specific q_depth field as the core tagset allocation helpers rely on it. Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers") Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/20221225103234.226794-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-26nvme: fix setting the queue depth in nvme_alloc_io_tag_setChristoph Hellwig
While the CAP.MQES field in NVMe is a 0s based filed with a natural one off, we also need to account for the queue wrap condition and fix undo the one off again in nvme_alloc_io_tag_set. This was never properly done by the fabrics drivers, but they don't seem to care because there is no actual physical queue that can wrap around, but it became a problem when converting over the PCIe driver. Also add back the BLK_MQ_MAX_DEPTH check that was lost in the same commit. Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/20221225103234.226794-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-26block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqqYu Kuai
Commit 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") will access 'bic->bfqq' in bic_set_bfqq(), however, bfq_exit_icq_bfqq() can free bfqq first, and then call bic_set_bfqq(), which will cause uaf. Fix the problem by moving bfq_exit_bfqq() behind bic_set_bfqq(). Fixes: 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221226030605.1437081-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-22Merge tag 'nvme-6.2-2022-12-22' of git://git.infradead.org/nvme into block-6.2Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - fix doorbell buffer value endianness (Klaus Jensen) - fix Linux vs NVMe page size mismatch (Keith Busch) - fix a potential use memory access beyong the allocation limit (Keith Busch) - fix a multipath vs blktrace NULL pointer dereference (Yanjun Zhang)" * tag 'nvme-6.2-2022-12-22' of git://git.infradead.org/nvme: nvme: fix multipath crash caused by flush request when blktrace is enabled nvme-pci: fix page size checks nvme-pci: fix mempool alloc size nvme-pci: fix doorbell buffer value endianness
2022-12-22nvme: fix multipath crash caused by flush request when blktrace is enabledYanjun Zhang
The flush request initialized by blk_kick_flush has NULL bio, and it may be dealt with nvme_end_req during io completion. When blktrace is enabled, nvme_trace_bio_complete with multipath activated trying to access NULL pointer bio from flush request results in the following crash: [ 2517.831677] BUG: kernel NULL pointer dereference, address: 000000000000001a [ 2517.835213] #PF: supervisor read access in kernel mode [ 2517.838724] #PF: error_code(0x0000) - not-present page [ 2517.842222] PGD 7b2d51067 P4D 0 [ 2517.845684] Oops: 0000 [#1] SMP NOPTI [ 2517.849125] CPU: 2 PID: 732 Comm: kworker/2:1H Kdump: loaded Tainted: G S 5.15.67-0.cl9.x86_64 #1 [ 2517.852723] Hardware name: XFUSION 2288H V6/BC13MBSBC, BIOS 1.13 07/27/2022 [ 2517.856358] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] [ 2517.859993] RIP: 0010:blk_add_trace_bio_complete+0x6/0x30 [ 2517.863628] Code: 1f 44 00 00 48 8b 46 08 31 c9 ba 04 00 10 00 48 8b 80 50 03 00 00 48 8b 78 50 e9 e5 fe ff ff 0f 1f 44 00 00 41 54 49 89 f4 55 <0f> b6 7a 1a 48 89 d5 e8 3e 1c 2b 00 48 89 ee 4c 89 e7 5d 89 c1 ba [ 2517.871269] RSP: 0018:ff7f6a008d9dbcd0 EFLAGS: 00010286 [ 2517.875081] RAX: ff3d5b4be00b1d50 RBX: 0000000002040002 RCX: ff3d5b0a270f2000 [ 2517.878966] RDX: 0000000000000000 RSI: ff3d5b0b021fb9f8 RDI: 0000000000000000 [ 2517.882849] RBP: ff3d5b0b96a6fa00 R08: 0000000000000001 R09: 0000000000000000 [ 2517.886718] R10: 000000000000000c R11: 000000000000000c R12: ff3d5b0b021fb9f8 [ 2517.890575] R13: 0000000002000000 R14: ff3d5b0b021fb1b0 R15: 0000000000000018 [ 2517.894434] FS: 0000000000000000(0000) GS:ff3d5b42bfc80000(0000) knlGS:0000000000000000 [ 2517.898299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2517.902157] CR2: 000000000000001a CR3: 00000004f023e005 CR4: 0000000000771ee0 [ 2517.906053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2517.909930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2517.913761] PKRU: 55555554 [ 2517.917558] Call Trace: [ 2517.921294] <TASK> [ 2517.924982] nvme_complete_rq+0x1c3/0x1e0 [nvme_core] [ 2517.928715] nvme_tcp_recv_pdu+0x4d7/0x540 [nvme_tcp] [ 2517.932442] nvme_tcp_recv_skb+0x4f/0x240 [nvme_tcp] [ 2517.936137] ? nvme_tcp_recv_pdu+0x540/0x540 [nvme_tcp] [ 2517.939830] tcp_read_sock+0x9c/0x260 [ 2517.943486] nvme_tcp_try_recv+0x65/0xa0 [nvme_tcp] [ 2517.947173] nvme_tcp_io_work+0x64/0x90 [nvme_tcp] [ 2517.950834] process_one_work+0x1e8/0x390 [ 2517.954473] worker_thread+0x53/0x3c0 [ 2517.958069] ? process_one_work+0x390/0x390 [ 2517.961655] kthread+0x10c/0x130 [ 2517.965211] ? set_kthread_struct+0x40/0x40 [ 2517.968760] ret_from_fork+0x1f/0x30 [ 2517.972285] </TASK> To avoid this situation, add a NULL check for req->bio before calling trace_block_bio_complete. Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-21nvme-pci: fix page size checksKeith Busch
The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE, which may be smaller than the PAGE_SIZE. Fixes: c61b82c7b7134 ("nvme-pci: fix PRP pool size") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-21nvme-pci: fix mempool alloc sizeKeith Busch
Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries. The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool. While unlikely to occur (you'd need a 4MB in exactly 127 phys segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence. Cc: Jens Axboe <axboe@kernel.dk> Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-21nvme-pci: fix doorbell buffer value endiannessKlaus Jensen
When using shadow doorbells, the event index and the doorbell values are written to host memory. Prior to this patch, the values written would erroneously be written in host endianness. This causes trouble on big-endian platforms. Fix this by adding missing endian conversions. This issue was noticed by Guenter while testing various big-endian platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is up for review as well[2]. [1]: https://lore.kernel.org/qemu-devel/20221209110022.GA3396194@roeck-us.net/ [2]: https://lore.kernel.org/qemu-devel/20221212114409.34972-4-its@irrelevant.dk/ Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-16block: don't clear REQ_ALLOC_CACHE for non-polled requestsJens Axboe
Since commit: b99182c501c3 ("bio: add pcpu caching for non-polling bio_put") we support bio caching for IRQ based IO as well, hence there's no need to manually clear REQ_ALLOC_CACHE if we disable polling on a request. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>