summaryrefslogtreecommitdiff
path: root/drivers/nvme/target
AgeCommit message (Collapse)Author
2025-05-31Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores - "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts - "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter - "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count - "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c - "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts * tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits) llist: make llist_add_batch() a static inline delayacct: remove redundant code and adjust indentation squashfs: add optional full compressed block caching crash_dump, nvme: select CONFIGFS_FS as built-in scripts/gdb/symbols: determine KASLR offset on s390 during early boot scripts/gdb/symbols: factor out pagination_off() scripts/gdb/symbols: factor out get_vmlinux() kernel/panic.c: format kernel-doc comments mailmap: update and consolidate Casey Connolly's name and email nilfs2: remove wbc->for_reclaim handling fork: define a local GFP_VMAP_STACK fork: check charging success before zeroing stack fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code fork: clean-up ifdef logic around stack allocation kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count x86/crash: make the page that stores the dm crypt keys inaccessible x86/crash: pass dm crypt keys to kdump kernel Revert "x86/mm: Remove unused __set_memory_prot()" crash_dump: retrieve dm crypt keys in kdump kernel ...
2025-05-28Merge tag 'net-next-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x times faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is a prep work to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter: - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF: - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols: - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum thoughput on 200Gbs link by over 60%. - Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API: - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling: - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers: - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the steering table handling to significantly reduce the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. - WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature" * tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits) selftests/bpf: Fix bpf selftest build warning selftests: netfilter: Fix skip of wildcard interface test net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames net: openvswitch: Fix the dead loop of MPLS parse calipso: Don't call calipso functions for AF_INET sk. selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem net_sched: hfsc: Address reentrant enqueue adding class to eltree twice octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback octeontx2-pf: QOS: Perform cache sync on send queue teardown net: mana: Add support for Multi Vports on Bare metal net: devmem: ncdevmem: remove unused variable net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add ipv4 support net: devmem: preserve sockc_err page_pool: fix ugly page_pool formatting net: devmem: move list_add to net_devmem_bind_dmabuf. selftests: netfilter: nft_queue.sh: include file transfer duration in log message net: phy: mscc: Fix memory leak when using one step timestamping ...
2025-05-27crash_dump, nvme: select CONFIGFS_FS as built-inArnd Bergmann
Configfs can be configured as a loadable module, which causes a link-time failure for dm-crypt crash dump support: crash_dump_dm_crypt.c:(.text+0x3a4): undefined reference to `config_item_init_type_name' aarch64-linux-ld: kernel/crash_dump_dm_crypt.o: in function `configfs_dmcrypt_keys_init': crash_dump_dm_crypt.c:(.init.text+0x90): undefined reference to `config_group_init' aarch64-linux-ld: crash_dump_dm_crypt.c:(.init.text+0xb4): undefined reference to `configfs_register_subsystem' aarch64-linux-ld: crash_dump_dm_crypt.c:(.init.text+0xd8): undefined reference to `configfs_unregister_subsystem' This could be avoided with a dependency on CONFIGFS_FS=y, but the dependency has an additional problem of causing Kconfig dependency loops since most other uses select the symbol. Using a simple 'select CONFIGFS_FS' here in turn fails with CONFIG_DM_CRYPT=m, because that still only causes configfs to be a loadable module. The only version I found that fixes this reliably uses an additional Kconfig symbol to ensure the 'select' actually turns on configfs as builtin, with two additional changes to avoid dependency loops with nvme and sysfs. There is no compile-time dependency between configfs and sysfs, so selecting configfs from a driver with sysfs disabled does not cause link failures, only the default /sys/kernel/config mount point will not be created. Link: https://lkml.kernel.org/r/20250521160359.2132363-1-arnd@kernel.org Fixes: 6b23858fd63b ("crash_dump: make dm crypt keys persist for the kdump kernel") Fixes: 1fb470408497 ("nvme-loop: add configfs dependency") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Breno Leitao <leitao@debian.org> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Coiby Xu <coxu@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-27net: core: Convert inet_addr_is_any() to sockaddr_storageKees Cook
All the callers of inet_addr_is_any() have a sockaddr_storage-backed sockaddr. Avoid casts and switch prototype to the actual object being used. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20250521204619.2301870-1-kees@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - ublk updates: - Add support for updating the size of a ublk instance - Zero-copy improvements - Auto-registering of buffers for zero-copy - Series simplifying and improving GET_DATA and request lookup - Series adding quiesce support - Lots of selftests additions - Various cleanups - NVMe updates via Christoph: - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch) - nvme-fcloop refcounting fixes (Daniel Wagner) - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff) - support shared CQs in the PCI endpoint target code (Wilfred Mallawa) - support admin-queue only authentication (Hannes Reinecke) - use the crc32c library instead of the crypto API (Eric Biggers) - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. Silva) - MD updates via Yu: - Fix that normal IO can be starved by sync IO, found by mkfs on newly created large raid5, with some clean up patches for bdev inflight counters - Clean up brd, getting rid of atomic kmaps and bvec poking - Add loop driver specifically for zoned IO testing - Eliminate blk-rq-qos calls with a static key, if not enabled - Improve hctx locking for when a plug has IO for multiple queues pending - Remove block layer bouncing support, which in turn means we can remove the per-node bounce stat as well - Improve blk-throttle support - Improve delay support for blk-throttle - Improve brd discard support - Unify IO scheduler switching. This should also fix a bunch of lockdep warnings we've been seeing, after enabling lockdep support for queue freezing/unfreezeing - Add support for block write streams via FDP (flexible data placement) on NVMe - Add a bunch of block helpers, facilitating the removal of a bunch of duplicated boilerplate code - Remove obsolete BLK_MQ pci and virtio Kconfig options - Add atomic/untorn write support to blktrace - Various little cleanups and fixes * tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits) selftests: ublk: add test for UBLK_F_QUIESCE ublk: add feature UBLK_F_QUIESCE selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE traceevent/block: Add REQ_ATOMIC flag to block trace events ublk: run auto buf unregisgering in same io_ring_ctx with registering io_uring: add helper io_uring_cmd_ctx_handle() ublk: remove io argument from ublk_auto_buf_reg_fallback() ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch() selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK selftests: ublk: support UBLK_F_AUTO_BUF_REG ublk: support UBLK_AUTO_BUF_REG_FALLBACK ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG ublk: prepare for supporting to register request buffer automatically ublk: convert to refcount_t selftests: ublk: make IO & device removal test more stressful nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk nvme: introduce multipath_always_on module param nvme-multipath: introduce delayed removal of the multipath head node nvme-pci: derive and better document max segments limits nvme-pci: use struct_size for allocation struct nvme_dev ...
2025-05-20nvmet-fcloop: don't wait for lport cleanupDaniel Wagner
The lifetime of the fcloop_lsreq is not tight to the lifetime of the host or target port, thus there is no need anymore to synchronize the cleanup path anymore. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: add missing fcloop_callback_host_doneDaniel Wagner
Add the missing fcloop_call_host_done calls so that the caller frees resources when something goes wrong. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fc: take tgtport refs for portentryDaniel Wagner
Ensure that the tgtport is not going away as long portentry has a pointer on it. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fc: free pending reqs on tgtport unregisterDaniel Wagner
When nvmet_fc_unregister_targetport is called by the LLDD, it's not possible to communicate with the host, thus all pending request will not be process. Thus explicitly free them. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: drop response if targetport is goneDaniel Wagner
When the target port is gone, the lsrsp pointer is invalid. Thus don't call the done function anymore instead just drop the response. This happens when the target sends a disconnect association. After this the target starts tearing down all resources and doesn't expect any response. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: allocate/free fcloop_lsreq directlyDaniel Wagner
fcloop depends on the host or the target to allocate the fcloop_lsreq object. This means that the lifetime of the fcloop_lsreq is tied to either the host or the target. Consequently, the host or the target must cooperate during shutdown. Unfortunately, this approach does not work well when the target forces a shutdown, as there are dependencies that are difficult to resolve in a clean way. The simplest solution is to decouple the lifetime of the fcloop_lsreq object by managing them directly within fcloop. Since this is not a performance-critical path and only a small number of LS objects are used during setup and cleanup, it does not significantly impact performance to allocate them during normal operation. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: prevent double port deletionDaniel Wagner
The delete callback can be called either via the unregister function or from the transport directly. Thus it is necessary ensure resources are not freed multiple times. Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: access fcpreq only when holding reqlockDaniel Wagner
The abort handling logic expects that the state and the fcpreq are only accessed when holding the reqlock lock. While at it, only handle the aborts in the abort handler. Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: update refs on tfcp_reqDaniel Wagner
Track the lifetime of the in-flight tfcp_req to ensure the object is not freed too early. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: refactor fcloop_delete_local_portDaniel Wagner
Use the newly introduced fcloop_lport_lookup instead of the open coded version. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: refactor fcloop_nport_alloc and track lportDaniel Wagner
The checks for a valid input values are mixed with the logic to insert a newly allocated nport. Refactor the function so that first the checks are done. This allows to untangle the setup steps into a more linear form which reduces the complexity of the functions. Also start tracking lport when a lport is assigned to a nport. This ensures, that the lport is not going away as long it is still referenced by a nport. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: remove nport from list on last userDaniel Wagner
The nport object has an association with the rport and lport object, that means we can only remove an nport object from the global nport_list after the last user of an rport or lport is gone. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: track ref counts for nportsDaniel Wagner
A nport object is always used in association with targerport, remoteport, tport and rport objects. Add explicit references for any of the associated object. This ensures that nport is not removed too early on shutdown sequences. Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-auth: use SHASH_DESC_ON_STACKHannes Reinecke
Use SHASH_DESC_ON_STACK to avoid explicit allocation. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet: simplify the nvmet_req_init() interfaceWilfred Mallawa
Now that a submission queue holds a reference to its completion queue, there is no need to pass the cq argument to nvmet_req_init(), so remove it. Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet: support completion queue sharingWilfred Mallawa
The NVMe PCI transport specification allows for completion queues to be shared by different submission queues. This patch allows a submission queue to keep track of the completion queue it is using with reference counting. As such, it can be ensured that a completion queue is not deleted while a submission queue is actively using it. This patch enables completion queue sharing in the pci-epf target driver. For fabrics drivers, completion queue sharing is not enabled as it is not possible as per the fabrics specification. However, this patch modifies the fabrics drivers to correctly integrate the new API that supports completion queue sharing. Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet: fabrics: add CQ init and destroyWilfred Mallawa
With struct nvmet_cq now having a reference count, this patch amends the target fabrics call chain to initialize and destroy/put a completion queue. Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet: cq: prepare for completion queue sharingWilfred Mallawa
For the PCI transport, the NVMe specification allows submission queues to share completion queues, however, this is not supported in the current NVMe target implementation. This is a preparatory patch to allow for completion queue (CQ) sharing between different submission queues (SQ). To support queue sharing, reference counting completion queues is required. This patch adds the refcount_t field ref to struct nvmet_cq coupled with respective nvmet_cq_init(), nvmet_cq_get(), nvmet_cq_put(), nvmet_cq_is_deletable() and nvmet_cq_destroy() functions. A CQ reference count is initialized with nvmet_cq_init() when a CQ is created. Using nvmet_cq_get(), a reference to a CQ is taken when an SQ is created that uses the respective CQ. Similarly. when an SQ is destroyed, the reference count to the respective CQ from the SQ being destroyed is decremented with nvmet_cq_put(). The last reference to a CQ is dropped on a CQ deletion using nvmet_cq_put(), which invokes nvmet_cq_destroy() to fully cleanup after the CQ. The helper function nvmet_cq_in_use() is used to determine if any SQs are still using the CQ pending deletion. In which case, the CQ must not be deleted. This should protect scenarios where a bad host may attempt to delete a CQ without first having deleted SQ(s) using that CQ. Additionally, this patch adds an array of struct nvmet_cq to the nvmet_ctrl structure. This allows for the controller to keep track of CQs as they are created and destroyed, similar to the current tracking done for SQs. The memory for this array is freed when the controller is freed. A struct nvmet_ctrl reference is also added to the nvmet_cq structure to allow for CQs to be removed from the controller whilst keeping the new API similar to the existing API for SQs. Sample callchain with CQ refcounting for the PCI endpoint target (pci-epf): i. nvmet_execute_create_cq -> nvmet_pci_epf_create_cq -> nvmet_cq_create -> nvmet_cq_init [cq refcount=1] ii. nvmet_execute_create_sq -> nvmet_pci_epf_create_sq -> nvmet_sq_create -> nvmet_sq_init -> nvmet_cq_get [cq refcount=2] iii. nvmet_execute_delete_sq - > nvmet_pci_epf_delete_sq -> -> nvmet_sq_destroy -> nvmet_cq_put [cq refcount 1] iv. nvmet_execute_delete_cq -> nvmet_pci_epf_delete_cq -> nvmet_cq_put [cq refcount 0] Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet: add a helper function for cqid checkingWilfred Mallawa
This patch adds a new helper function nvmet_check_io_cqid(). It is to be used when parsing host commands for IO CQ creation/deletion and IO SQ creation to ensure that the specified IO completion queue identifier (CQID) is not 0 (Admin queue ID). This is a check that already occurs in the nvmet_execute_x() functions prior to nvmet_check_cqid. With the addition of this helper function, the CQ ID checks in the nvmet_execute_x() function can be removed, and instead simply call nvmet_check_io_cqid() in place of nvmet_check_cqid(). Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-auth: authenticate on admin queue onlyHannes Reinecke
Do not start authentication on I/O queues as it doesn't really add value, and secure concatenation disallows it anyway. Authentication commands on I/O queues are not aborted, so the host may still run the authentication protocol on I/O queues. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-tcp: switch to using the crc32c libraryEric Biggers
Now that the crc32c() library function directly takes advantage of architecture-specific optimizations, it is unnecessary to go through the crypto API. Just use crc32c(). This is much simpler, and it improves performance due to eliminating the crypto API overhead. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet: replace strncpy with strscpyMarcelo Moreira
The strncpy() function is deprecated for NUL-terminated strings as explained in the "strncpy() on NUL-terminated strings" section of Documentation/process/deprecated.rst. The key issues are: - strncpy() fails to guarantee NULL-termination when source > destination - it unnecessarily zero-pads short strings, causing performance overhead strscpy() is the proper replacement because: - it guarantees NULL-termination - it avoids redundant zero-padding - it aligns with current kernel string-copying best practice memcpy() was rejected because: - NQN buffers (subsysnqn/hostnqn) are treated as NULL-terminated strings: - strcmp() usage in nvmet_host_allowed() (discovery.c) - strscpy() to copy subsysnqn in nvmet_execute_disc_identify() seq_buf wasn't used because: - this is a simple fixed-size buffer copy - there is no need for progressive string construction features Signed-off-by: Marcelo Moreira <marcelomoreira1905@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-loop: avoid -Wflex-array-member-not-at-end warningGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Move the conflicting declaration to the end of the structure. Notice that `struct nvme_loop_iod` is a flexible structure --a structure that contains a flexible-array member. Fix the following warning: drivers/nvme/target/loop.c:36:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-13nvmet: pci-epf: remove NVMET_PCI_EPF_Q_IS_SQDamien Le Moal
The flag NVMET_PCI_EPF_Q_IS_SQ is set but never used. Remove it. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-13nvmet: pci-epf: improve debug messageDamien Le Moal
Improve the debug message of nvmet_pci_epf_create_cq() to indicate if a completion queue IRQ is disabled. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-13nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq()Damien Le Moal
There is no point in taking the controller irq_lock and calling nvmet_pci_epf_should_raise_irq() for a completion queue which does not have IRQ enabled (NVMET_PCI_EPF_Q_IRQ_ENABLED flag is not set). Move the test for the NVMET_PCI_EPF_Q_IRQ_ENABLED flag out of nvmet_pci_epf_should_raise_irq() to the top of nvmet_pci_epf_raise_irq() to return early when no IRQ should be raised. Also, use dev_err_ratelimited() to avoid a message storm under load when raising IRQs is failing. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-13nvmet: pci-epf: do not fall back to using INTX if not supportedDamien Le Moal
Some endpoint PCIe controllers do not support raising legacy INTX interrupts. This support is indicated by the intx_capable field of struct pci_epc_features. Modify nvmet_pci_epf_raise_irq() to not automatically fallback to trying raising an INTX interrupt after an MSI or MSI-X error if the controller does not support INTX. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-13nvmet: pci-epf: clear completion queue IRQ flag on deleteDamien Le Moal
The function nvmet_pci_epf_delete_cq() unconditionally calls nvmet_pci_epf_remove_irq_vector() even for completion queues that do not have interrupts enabled. Furthermore, for completion queues that do have IRQ enabled, deleting and re-creating the completion queue leaves the flag NVMET_PCI_EPF_Q_IRQ_ENABLED set, even if the completion queue is being re-created with IRQ disabled. Fix these issues by calling nvmet_pci_epf_remove_irq_vector() only if NVMET_PCI_EPF_Q_IRQ_ENABLED is set and make sure to always clear that flag. Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvmet-auth: always free derived key dataHannes Reinecke
After calling nvme_auth_derive_tls_psk() we need to free the resulting psk data, as either TLS is disable (and we don't need the data anyway) or the psk data is copied into the resulting key (and can be free, too). Fixes: fa2e0f8bbc68 ("nvmet-tcp: support secure channel concatenation") Reported-by: Yi Zhang <yi.zhang@redhat.com> Suggested-by: Maurizio Lombardi <mlombard@bsdbackstore.eu> Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvmet-tcp: don't restore null sk_state_changeAlistair Francis
queue->state_change is set as part of nvmet_tcp_set_queue_sock(), but if the TCP connection isn't established when nvmet_tcp_set_queue_sock() is called then queue->state_change isn't set and sock->sk->sk_state_change isn't replaced. As such we don't need to restore sock->sk->sk_state_change if queue->state_change is NULL. This avoids NULL pointer dereferences such as this: [ 286.462026][ C0] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 286.462814][ C0] #PF: supervisor instruction fetch in kernel mode [ 286.463796][ C0] #PF: error_code(0x0010) - not-present page [ 286.464392][ C0] PGD 8000000140620067 P4D 8000000140620067 PUD 114201067 PMD 0 [ 286.465086][ C0] Oops: Oops: 0010 [#1] SMP KASAN PTI [ 286.465559][ C0] CPU: 0 UID: 0 PID: 1628 Comm: nvme Not tainted 6.15.0-rc2+ #11 PREEMPT(voluntary) [ 286.466393][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 286.467147][ C0] RIP: 0010:0x0 [ 286.467420][ C0] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 286.467977][ C0] RSP: 0018:ffff8883ae008580 EFLAGS: 00010246 [ 286.468425][ C0] RAX: 0000000000000000 RBX: ffff88813fd34100 RCX: ffffffffa386cc43 [ 286.469019][ C0] RDX: 1ffff11027fa68b6 RSI: 0000000000000008 RDI: ffff88813fd34100 [ 286.469545][ C0] RBP: ffff88813fd34160 R08: 0000000000000000 R09: ffffed1027fa682c [ 286.470072][ C0] R10: ffff88813fd34167 R11: 0000000000000000 R12: ffff88813fd344c3 [ 286.470585][ C0] R13: ffff88813fd34112 R14: ffff88813fd34aec R15: ffff888132cdd268 [ 286.471070][ C0] FS: 00007fe3c04c7d80(0000) GS:ffff88840743f000(0000) knlGS:0000000000000000 [ 286.471644][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 286.472543][ C0] CR2: ffffffffffffffd6 CR3: 000000012daca000 CR4: 00000000000006f0 [ 286.473500][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 286.474467][ C0] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 286.475453][ C0] Call Trace: [ 286.476102][ C0] <IRQ> [ 286.476719][ C0] tcp_fin+0x2bb/0x440 [ 286.477429][ C0] tcp_data_queue+0x190f/0x4e60 [ 286.478174][ C0] ? __build_skb_around+0x234/0x330 [ 286.478940][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.479659][ C0] ? __pfx_tcp_data_queue+0x10/0x10 [ 286.480431][ C0] ? tcp_try_undo_loss+0x640/0x6c0 [ 286.481196][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.482046][ C0] ? kvm_clock_get_cycles+0x14/0x30 [ 286.482769][ C0] ? ktime_get+0x66/0x150 [ 286.483433][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.484146][ C0] tcp_rcv_established+0x6e4/0x2050 [ 286.484857][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.485523][ C0] ? ipv4_dst_check+0x160/0x2b0 [ 286.486203][ C0] ? __pfx_tcp_rcv_established+0x10/0x10 [ 286.486917][ C0] ? lock_release+0x217/0x2c0 [ 286.487595][ C0] tcp_v4_do_rcv+0x4d6/0x9b0 [ 286.488279][ C0] tcp_v4_rcv+0x2af8/0x3e30 [ 286.488904][ C0] ? raw_local_deliver+0x51b/0xad0 [ 286.489551][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.490198][ C0] ? __pfx_tcp_v4_rcv+0x10/0x10 [ 286.490813][ C0] ? __pfx_raw_local_deliver+0x10/0x10 [ 286.491487][ C0] ? __pfx_nf_confirm+0x10/0x10 [nf_conntrack] [ 286.492275][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.492900][ C0] ip_protocol_deliver_rcu+0x8f/0x370 [ 286.493579][ C0] ip_local_deliver_finish+0x297/0x420 [ 286.494268][ C0] ip_local_deliver+0x168/0x430 [ 286.494867][ C0] ? __pfx_ip_local_deliver+0x10/0x10 [ 286.495498][ C0] ? __pfx_ip_local_deliver_finish+0x10/0x10 [ 286.496204][ C0] ? ip_rcv_finish_core+0x19a/0x1f20 [ 286.496806][ C0] ? lock_release+0x217/0x2c0 [ 286.497414][ C0] ip_rcv+0x455/0x6e0 [ 286.497945][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.498550][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.499137][ C0] ? __pfx_ip_rcv_finish+0x10/0x10 [ 286.499763][ C0] ? lock_release+0x217/0x2c0 [ 286.500327][ C0] ? dl_scaled_delta_exec+0xd1/0x2c0 [ 286.500922][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.501480][ C0] __netif_receive_skb_one_core+0x166/0x1b0 [ 286.502173][ C0] ? __pfx___netif_receive_skb_one_core+0x10/0x10 [ 286.502903][ C0] ? lock_acquire+0x2b2/0x310 [ 286.503487][ C0] ? process_backlog+0x372/0x1350 [ 286.504087][ C0] ? lock_release+0x217/0x2c0 [ 286.504642][ C0] process_backlog+0x3b9/0x1350 [ 286.505214][ C0] ? process_backlog+0x372/0x1350 [ 286.505779][ C0] __napi_poll.constprop.0+0xa6/0x490 [ 286.506363][ C0] net_rx_action+0x92e/0xe10 [ 286.506889][ C0] ? __pfx_net_rx_action+0x10/0x10 [ 286.507437][ C0] ? timerqueue_add+0x1f0/0x320 [ 286.507977][ C0] ? sched_clock_cpu+0x68/0x540 [ 286.508492][ C0] ? lock_acquire+0x2b2/0x310 [ 286.509043][ C0] ? kvm_sched_clock_read+0xd/0x20 [ 286.509607][ C0] ? handle_softirqs+0x1aa/0x7d0 [ 286.510187][ C0] handle_softirqs+0x1f2/0x7d0 [ 286.510754][ C0] ? __pfx_handle_softirqs+0x10/0x10 [ 286.511348][ C0] ? irqtime_account_irq+0x181/0x290 [ 286.511937][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.512510][ C0] do_softirq.part.0+0x89/0xc0 [ 286.513100][ C0] </IRQ> [ 286.513548][ C0] <TASK> [ 286.513953][ C0] __local_bh_enable_ip+0x112/0x140 [ 286.514522][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.515072][ C0] __dev_queue_xmit+0x872/0x3450 [ 286.515619][ C0] ? nft_do_chain+0xe16/0x15b0 [nf_tables] [ 286.516252][ C0] ? __pfx___dev_queue_xmit+0x10/0x10 [ 286.516817][ C0] ? selinux_ip_postroute+0x43c/0xc50 [ 286.517433][ C0] ? __pfx_selinux_ip_postroute+0x10/0x10 [ 286.518061][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.518606][ C0] ? ip_output+0x164/0x4a0 [ 286.519149][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.519671][ C0] ? ip_finish_output2+0x17d5/0x1fb0 [ 286.520258][ C0] ip_finish_output2+0xb4b/0x1fb0 [ 286.520787][ C0] ? __pfx_ip_finish_output2+0x10/0x10 [ 286.521355][ C0] ? __ip_finish_output+0x15d/0x750 [ 286.521890][ C0] ip_output+0x164/0x4a0 [ 286.522372][ C0] ? __pfx_ip_output+0x10/0x10 [ 286.522872][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.523402][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.524031][ C0] ? __pfx_ip_finish_output+0x10/0x10 [ 286.524605][ C0] ? __ip_queue_xmit+0x999/0x2260 [ 286.525200][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.525744][ C0] ? ipv4_dst_check+0x16a/0x2b0 [ 286.526279][ C0] ? lock_release+0x217/0x2c0 [ 286.526793][ C0] __ip_queue_xmit+0x1883/0x2260 [ 286.527324][ C0] ? __skb_clone+0x54c/0x730 [ 286.527827][ C0] __tcp_transmit_skb+0x209b/0x37a0 [ 286.528374][ C0] ? __pfx___tcp_transmit_skb+0x10/0x10 [ 286.528952][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.529472][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.530152][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.530691][ C0] tcp_write_xmit+0xb81/0x88b0 [ 286.531224][ C0] ? mod_memcg_state+0x4d/0x60 [ 286.531736][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.532253][ C0] __tcp_push_pending_frames+0x90/0x320 [ 286.532826][ C0] tcp_send_fin+0x141/0xb50 [ 286.533352][ C0] ? __pfx_tcp_send_fin+0x10/0x10 [ 286.533908][ C0] ? __local_bh_enable_ip+0xab/0x140 [ 286.534495][ C0] inet_shutdown+0x243/0x320 [ 286.535077][ C0] nvme_tcp_alloc_queue+0xb3b/0x2590 [nvme_tcp] [ 286.535709][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.536314][ C0] ? __pfx_nvme_tcp_alloc_queue+0x10/0x10 [nvme_tcp] [ 286.536996][ C0] ? do_raw_spin_unlock+0x54/0x1e0 [ 286.537550][ C0] ? _raw_spin_unlock+0x29/0x50 [ 286.538127][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.538664][ C0] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 286.539249][ C0] ? nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.539892][ C0] ? __wake_up+0x40/0x60 [ 286.540392][ C0] nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.541047][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.541589][ C0] nvme_tcp_setup_ctrl+0x8b/0x7a0 [nvme_tcp] [ 286.542254][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.542887][ C0] ? __pfx_nvme_tcp_setup_ctrl+0x10/0x10 [nvme_tcp] [ 286.543568][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.544166][ C0] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 286.544792][ C0] ? nvme_change_ctrl_state+0x196/0x2e0 [nvme_core] [ 286.545477][ C0] nvme_tcp_create_ctrl+0x839/0xb90 [nvme_tcp] [ 286.546126][ C0] nvmf_dev_write+0x3db/0x7e0 [nvme_fabrics] [ 286.546775][ C0] ? rw_verify_area+0x69/0x520 [ 286.547334][ C0] vfs_write+0x218/0xe90 [ 286.547854][ C0] ? do_syscall_64+0x9f/0x190 [ 286.548408][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.549037][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.549659][ C0] ? __pfx_vfs_write+0x10/0x10 [ 286.550259][ C0] ? do_syscall_64+0x9f/0x190 [ 286.550840][ C0] ? syscall_exit_to_user_mode+0x8e/0x280 [ 286.551516][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.552180][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.552834][ C0] ? ksys_read+0xf5/0x1c0 [ 286.553386][ C0] ? __pfx_ksys_read+0x10/0x10 [ 286.553964][ C0] ksys_write+0xf5/0x1c0 [ 286.554499][ C0] ? __pfx_ksys_write+0x10/0x10 [ 286.555072][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.555698][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.556319][ C0] ? do_syscall_64+0x54/0x190 [ 286.556866][ C0] do_syscall_64+0x93/0x190 [ 286.557420][ C0] ? rcu_read_unlock+0x17/0x60 [ 286.557986][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.558526][ C0] ? lock_release+0x217/0x2c0 [ 286.559087][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.559659][ C0] ? count_memcg_events.constprop.0+0x4a/0x60 [ 286.560476][ C0] ? exc_page_fault+0x7a/0x110 [ 286.561064][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.561647][ C0] ? lock_release+0x217/0x2c0 [ 286.562257][ C0] ? do_user_addr_fault+0x171/0xa00 [ 286.562839][ C0] ? do_user_addr_fault+0x4a2/0xa00 [ 286.563453][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.564112][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.564677][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.565317][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.565922][ C0] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 286.566542][ C0] RIP: 0033:0x7fe3c05e6504 [ 286.567102][ C0] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 286.568931][ C0] RSP: 002b:00007fff76444f58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 286.569807][ C0] RAX: ffffffffffffffda RBX: 000000003b40d930 RCX: 00007fe3c05e6504 [ 286.570621][ C0] RDX: 00000000000000cf RSI: 000000003b40d930 RDI: 0000000000000003 [ 286.571443][ C0] RBP: 0000000000000003 R08: 00000000000000cf R09: 000000003b40d930 [ 286.572246][ C0] R10: 0000000000000000 R11: 0000000000000202 R12: 000000003b40cd60 [ 286.573069][ C0] R13: 00000000000000cf R14: 00007fe3c07417f8 R15: 00007fe3c073502e [ 286.573886][ C0] </TASK> Closes: https://lore.kernel.org/linux-nvme/5hdonndzoqa265oq3bj6iarwtfk5dewxxjtbjvn5uqnwclpwt6@a2n6w3taxxex/ Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLSAlistair Francis
Ensure that TLS support is enabled in the kernel when CONFIG_NVME_TARGET_TCP_TLS is enabled. Without this the code compiles, but does not actually work unless something else enables CONFIG_TLS. Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall") Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-22nvmet: fix out-of-bounds access in nvmet_enable_portRichard Weinberger
When trying to enable a port that has no transport configured yet, nvmet_enable_port() uses NVMF_TRTYPE_MAX (255) to query the transports array, causing an out-of-bounds access: [ 106.058694] BUG: KASAN: global-out-of-bounds in nvmet_enable_port+0x42/0x1da [ 106.058719] Read of size 8 at addr ffffffff89dafa58 by task ln/632 [...] [ 106.076026] nvmet: transport type 255 not supported Since commit 200adac75888, NVMF_TRTYPE_MAX is the default state as configured by nvmet_ports_make(). Avoid this by checking for NVMF_TRTYPE_MAX before proceeding. Fixes: 200adac75888 ("nvme: Add PCI transport type") Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
2025-04-16nvmet: pci-epf: cleanup link state managementDamien Le Moal
Since the link_up boolean field of struct nvmet_pci_epf_ctrl is always set to true when nvmet_pci_epf_start_ctrl() is called, assign true to this field in nvmet_pci_epf_start_ctrl(). Conversely, since this field is set to false when nvmet_pci_epf_stop_ctrl() is called, set this field to false directly inside that function. While at it, also add information messages to notify the user of the PCI link state changes to help troubleshoot any link stability issues without needing to enable debug messages. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvmet: pci-epf: clear CC and CSTS when disabling the controllerDamien Le Moal
When a host shuts down the controller when shutting down but does so without first disabling the controller, the enable bit remains set in the controller configuration register. When the host restarts and attempts to enable the controller again, the nvmet_pci_epf_poll_cc_work() function is unable to detect the change from 0 to 1 of the enable bit, and thus the controller is not enabled again, which result in a device scan timeout on the host. This problem also occurs if the host shuts down uncleanly or if the PCIe link goes down: as the CC.EN value is not reset, the controller is not enabled again when the host restarts. Fix this by introducing the function nvmet_pci_epf_clear_ctrl_config() to clear the CC and CSTS registers of the controller when the PCIe link is lost (nvmet_pci_epf_stop_ctrl() function), or when starting the controller fails (nvmet_pci_epf_enable_ctrl() fails). Also use this function in nvmet_pci_epf_init_bar() to simplify the initialization of the CC and CSTS registers. Furthermore, modify the function nvmet_pci_epf_disable_ctrl() to clear the CC.EN bit and write this updated value to the BAR register when the controller is shutdown by the host, to ensure that upon restart, we can detect the host setting CC.EN. Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvmet: pci-epf: always fully initialize completion entriesDamien Le Moal
For a command that is normally processed through the command request execute() function, the completion entry for the command is initialized by __nvmet_req_complete() and nvmet_pci_epf_cq_work() only needs to set the status field and the phase of the completion entry before posting the entry to the completion queue. However, for commands that are failed due to an internal error (e.g. the command data buffer allocation fails), the command request execute() function is not called and __nvmet_req_complete() is never executed for the command, leaving the command completion entry uninitialized. For such command failed before calling req->execute(), the host ends up seeing completion entries with an invalid submission queue ID and command ID. Avoid such issue by always fully initilizing a command completion entry in nvmet_pci_epf_cq_work(), setting the entry submission queue head, ID and command ID. Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-16nvmet: auth: use NULL to clear a pointer in nvmet_auth_sq_free()Damien Le Moal
When compiling with C=1, the following sparse warning is generated: auth.c:243:23: warning: Using plain integer as NULL pointer Avoid this warning by using NULL to instead of 0 to set the sq tls_key pointer. Fixes: fa2e0f8bbc68 ("nvmet-tcp: support secure channel concatenation") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fc: put ref when assoc->del_work is already scheduledDaniel Wagner
Do not leak the tgtport reference when the work is already scheduled. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fc: take tgtport reference only onceDaniel Wagner
The reference counting code can be simplified. Instead taking a tgtport refrerence at the beginning of nvmet_fc_alloc_hostport and put it back if not a new hostport object is allocated, only take it when a new hostport object is allocated. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fc: update tgtport ref per assocDaniel Wagner
We need to take for each unique association a reference. nvmet_fc_alloc_hostport for each newly created association. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fc: inline nvmet_fc_free_hostportDaniel Wagner
No need for this tiny helper with only one user, let's inline it. And since the hostport ref counter needs to stay in sync, it's not optional anymore to give back the reference. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fc: inline nvmet_fc_delete_assocDaniel Wagner
No need for this tiny helper with only one user, just inline it. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fcloop: add ref counting to lportDaniel Wagner
The fcloop_lport objects live time is controlled by the user interface add_local_port and del_local_port. nport, rport and tport objects are pointing to the lport objects but here is no clear tracking. Let's introduce an explicit ref counter for the lport objects and prepare the stage for restructuring how lports are used. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fcloop: replace kref with refcountDaniel Wagner
The kref wrapper is not really adding any value ontop of refcount. Thus replace the kref API with the refcount API. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-09nvmet-fcloop: swap list_add_tail argumentsDaniel Wagner
The newly element to be added to the list is the first argument of list_add_tail. This fix is missing dcfad4ab4d67 ("nvmet-fcloop: swap the list_add_tail arguments"). Fixes: 437c0b824dbd ("nvme-fcloop: add target to host LS request support") Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-03Merge tag 'block-6.15-20250403' of git://git.kernel.dk/linuxLinus Torvalds
Pull more block updates from Jens Axboe: - NVMe pull request via Keith: - PCI endpoint target cleanup (Damien) - Early import for uring_cmd fixed buffer (Caleb) - Multipath documentation and notification improvements (John) - Invalid pci sq doorbell write fix (Maurizio) - Queue init locking fix - Remove dead nsegs parameter from blk_mq_get_new_requests() * tag 'block-6.15-20250403' of git://git.kernel.dk/linux: block: don't grab elevator lock during queue initialization nvme-pci: skip nvme_write_sq_db on empty rqlist nvme-multipath: change the NVME_MULTIPATH config option nvme: update the multipath warning in nvme_init_ns_head nvme/ioctl: move fixed buffer lookup to nvme_uring_cmd_io() nvme/ioctl: move blk_mq_free_request() out of nvme_map_user_request() nvme/ioctl: don't warn on vectorized uring_cmd with fixed buffer nvmet: pci-epf: Keep completion queues mapped block: remove unused nseg parameter