summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-20ata: libata-eh: Update DIPM comments to reflect realityNiklas Cassel
The comments describing which LPM policies that has DIPM enabled predates the introduction of the LPM policies ATA_LPM_MIN_POWER_WITH_PARTIAL and ATA_LPM_MED_POWER_WITH_DIPM. Update the DIPM comments to reflect reality. Also remove the sentence that claims that "Order device and link configurations such that the host always allows DIPM requests." This comment is written before 24e0e61db3cb ("ata: libata: disallow dev-initiated LPM transitions to unsupported states"). Even though the set_lpm() call is done before enabling DIPM, the host will not always allow DIPM requests. For all LPM polcies where DIPM is enabled, only DIPM requests to LPM states that are supported by the HBA will be allowed. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2025-05-20PCI: Remove hybrid-devres usage warnings from kernel-docPhilipp Stanner
pci/iomap.c still contains warnings about those functions not behaving in a managed manner if pcim_enable_device() was called. Since all hybrid behavior that users could know about has been removed by now, those explicit warnings are no longer necessary. Remove the hybrid-devres usage warnings from the kernel-doc. Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-8-phasta@kernel.org
2025-05-20PCI: Remove redundant set of request functionsPhilipp Stanner
When the demangling of the hybrid devres functions within PCI was implemented, it was necessary to implement several PCI functions a second time to avoid cyclic calls, since the hybrid functions in pci.c call the managed functions in devres.c, which in turn can be directly used outside of PCI and needed request infrastructure, too. Therefore, __pcim_request_region_range(), __pci_release_region_range() and wrappers around them were implemented. The hybrid nature has recently been removed from all functions in pci.c. Therefore, the functions in devres.c can now directly use their counterparts in pci.c without causing a call-cycle. Remove __pcim_request_region_range(), __pcim_request_region_range() and the wrappers. Use the corresponding request functions from pci.c in devres.c Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-7-phasta@kernel.org
2025-05-20PCI: Remove exclusive requests flags from _pcim_request_region()Philipp Stanner
pcim_request_region_exclusive(), the only user in PCI devres that needed exclusive region requests, has been removed. All features related to exclusive requests can, therefore, be removed, too. Remove them. Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-6-phasta@kernel.org
2025-05-20arm64: dts: renesas: white-hawk-ard-audio: Fix TPU0 groupsThuan Nguyen
White Hawk ARD audio uses a clock generated by the TPU, but commit 3d144ef10a44 ("pinctrl: renesas: r8a779g0: Fix TPU suffixes") renamed pin group "tpu_to0_a" to "tpu_to0_b". Update DTS accordingly otherwise the sound driver does not receive a clock signal. Fixes: 3d144ef10a448f89 ("pinctrl: renesas: r8a779g0: Fix TPU suffixes") Signed-off-by: Thuan Nguyen <thuan.nguyen-hong@banvien.com.vn> Signed-off-by: Duy Nguyen <duy.nguyen.rh@renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://lore.kernel.org/TYCPR01MB8740608B675365215ADB0374B49CA@TYCPR01MB8740.jpnprd01.prod.outlook.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-05-20MAINTAINERS: Add entry for newly added EcoNet platform.Caleb James DeLisle
Add a MAINTAINERS entry as part of integration of the EcoNet MIPS platform. Signed-off-by: Caleb James DeLisle <cjd@cjdns.fr> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20mips: dts: Add EcoNet DTS with EN751221 and SmartFiber XP8421-B boardCaleb James DeLisle
Add DTS files in support of EcoNet platform, including SmartFiber XP8421-B, a low cost commercially available board based on EN751221. Signed-off-by: Caleb James DeLisle <cjd@cjdns.fr> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20dt-bindings: vendor-prefixes: Add SmartFiberCaleb James DeLisle
Add "smartfiber" vendor prefix for manufactorer of EcoNet based boards. Signed-off-by: Caleb James DeLisle <cjd@cjdns.fr> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20mips: Add EcoNet MIPS platform supportCaleb James DeLisle
Add platform support for EcoNet MIPS SoCs. Signed-off-by: Caleb James DeLisle <cjd@cjdns.fr> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20dt-bindings: mips: Add EcoNet platform bindingCaleb James DeLisle
Document the top-level device tree binding for EcoNet MIPS-based SoCs. Signed-off-by: Caleb James DeLisle <cjd@cjdns.fr> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20gpiolib: remove unneeded #ifdefBartosz Golaszewski
We are already within another `#ifdef CONFIG_GPIOLIB_IRQCHIP` in gpiochip_to_irq() so there's no need for another guard. Remove it. Acked-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20250519-gpio-irq-kconfig-fixes-v1-3-fe6ba1c6116d@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-20gpio: mpc8xxx: select GPIOLIB_IRQCHIPBartosz Golaszewski
This driver uses gpiochip_irq_reqres() and gpiochip_irq_relres() which are only built with GPIOLIB_IRQCHIP=y. Add the missing Kconfig select. Fixes: 7688a54d5b53 ("gpio: mpc8xxx: Make irq_chip immutable") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505180309.1nosQMkI-lkp@intel.com/ Acked-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20250519-gpio-irq-kconfig-fixes-v1-2-fe6ba1c6116d@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-20gpio: pxa: select GPIOLIB_IRQCHIPBartosz Golaszewski
This driver uses gpiochip_irq_reqres() and gpiochip_irq_relres() which are only built with GPIOLIB_IRQCHIP=y. Add the missing Kconfig select. Fixes: 20117cf426b6 ("gpio: pxa: Make irq_chip immutable") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505181429.mzyIatOU-lkp@intel.com/ Acked-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20250519-gpio-irq-kconfig-fixes-v1-1-fe6ba1c6116d@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-20MIPS: bcm63xx: nvram: avoid inefficient use of crc32_le_combine()Eric Biggers
bcm963xx_nvram_checksum() was using crc32_le_combine() to update a CRC with four zero bytes. However, this is about 5x slower than just CRC'ing four zero bytes in the normal way. Just do that instead. (We could instead make crc32_le_combine() faster on short lengths. But all its callers do just fine without it, so I'd like to just remove it.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20mips: dts: pic32: pic32mzda: Rename the sdhci nodename to match with common ↵Charan Pedumuru
mmc-controller binding Rename the sdhci nodename from "sdhci@" to "mmc@" to align with linux common mmc-controller binding. Signed-off-by: Charan Pedumuru <charan.pedumuru@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20MIPS: SMP: Move the AP sync point before the non-parallel aware functionsGregory CLEMENT
When CONFIG_HOTPLUG_PARALLEL is enabled, the code executing before cpuhp_ap_sync_alive() is executed in parallel, while after it is serialized. The functions set_cpu_sibling_map() and set_cpu_core_map() were not designed to be executed in parallel, so by moving the cpuhp_ap_sync_alive() before cpuhp_ap_sync_alive(), we then ensure they will be called serialized. The measurement done on EyeQ5 did not show any relevant boot time increase after applying this patch. Fixes: 76c43eb507bc ("MIPS: SMP: Implement parallel CPU bring up for EyeQ") Reported-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-05-20ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ASP10Ed Burcher
Lenovo Yoga Pro 7 (gen 10) with Realtek ALC3306 and combined CS35L56 amplifiers need quirk ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN to enable bass Signed-off-by: Ed Burcher <git@edburcher.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20250519224907.31265-2-git@edburcher.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-05-20xfrm: use kfree_sensitive() for SA secret zeroizationZilin Guan
High-level copy_to_user_* APIs already redact SA secret fields when redaction is enabled, but the state teardown path still freed aead, aalg and ealg structs with plain kfree(), which does not clear memory before deallocation. This can leave SA keys and other confidential data in memory, risking exposure via post-free vulnerabilities. Since this path is outside the packet fast path, the cost of zeroization is acceptable and prevents any residual key material. This patch replaces those kfree() calls unconditionally with kfree_sensitive(), which zeroizes the entire buffer before freeing. Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-05-20cpufreq: scmi: Skip SCMI devices that aren't used by the CPUsMike Tipton
Currently, all SCMI devices with performance domains attempt to register a cpufreq driver, even if their performance domains aren't used to control the CPUs. The cpufreq framework only supports registering a single driver, so only the first device will succeed. And if that device isn't used for the CPUs, then cpufreq will scale the wrong domains. To avoid this, return early from scmi_cpufreq_probe() if the probing SCMI device isn't referenced by the CPU device phandles. This keeps the existing assumption that all CPUs are controlled by a single SCMI device. Signed-off-by: Mike Tipton <quic_mdtipton@quicinc.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20Merge branch 'rust/cpufreq-dt' into cpufreq/arm/linux-nextViresh Kumar
2025-05-20cpufreq: Add Rust-based cpufreq-dt driverViresh Kumar
Introduce a Rust-based implementation of the cpufreq-dt driver, covering most of the functionality provided by the existing C version. Some features, such as retrieving platform data from `cpufreq-dt-platdev.c`, are still pending. The driver has been tested with QEMU, and frequency scaling works as expected. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: opp: Extend OPP abstractions with cpufreq supportViresh Kumar
Extend the OPP abstractions to include support for interacting with the cpufreq core, including the ability to retrieve frequency tables from OPP table. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: cpufreq: Extend abstractions for driver registrationViresh Kumar
Extend the cpufreq abstractions to support driver registration from Rust. Reviewed-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: cpufreq: Extend abstractions for policy and driver opsViresh Kumar
Extend the cpufreq abstractions to include support for policy handling and driver operations. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: cpufreq: Add initial abstractions for cpufreq frameworkViresh Kumar
Introduce initial Rust abstractions for the cpufreq core. This includes basic representations for cpufreq flags, relation types, and the cpufreq table. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-19hwmon: (isl28022) Fix current reading calculationYikai Tsai
According to the ISL28022 datasheet, bit15 of the current register is representing -32768. Fix the calculation to properly handle this bit, ensuring correct measurements for negative values. Signed-off-by: Yikai Tsai <yikai.tsai.wiwynn@gmail.com> Link: https://lore.kernel.org/r/20250519084055.3787-2-yikai.tsai.wiwynn@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-05-20rust: opp: Add abstractions for the configuration optionsViresh Kumar
Introduce Rust abstractions for the OPP core configuration options, enabling safe access to various configurable aspects of the OPP framework. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: opp: Add abstractions for the OPP tableViresh Kumar
Introduce Rust abstractions for `struct opp_table`, enabling access to OPP tables from Rust. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: opp: Add initial abstractions for OPP frameworkViresh Kumar
Introduce initial Rust abstractions for the Operating Performance Points (OPP) framework. This includes bindings for `struct dev_pm_opp` and `struct dev_pm_opp_data`, laying the groundwork for further OPP integration. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: cpu: Add from_cpu()Viresh Kumar
This implements cpu::from_cpu(), which returns a reference to Device for a CPU. The C struct is created at initialization time for CPUs and is never freed and so ARef isn't returned from this function. The new helper will be used by Rust based cpufreq drivers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20rust: macros: enable use of hyphens in module namesAnisse Astier
Some modules might need naming that contains hyphens "-" to match the auto-probing by name in the platform devices that comes from the device tree. But Rust identifier cannot contain hyphens, so replace them with underscores. Signed-off-by: Anisse Astier <anisse@astier.eu> Reviewed-by: Alice Ryhl <aliceryhl@google.com> [ Viresh: Replace "-" with '-', minor commit log fix, rename variable and fix line length checkpatch warnings ] Reviewed-by: Benno Lossin <lossin@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_diskNilay Shroff
In the NVMe context, the term "shutdown" has a specific technical meaning. To avoid confusion, this commit renames the nvme_mpath_ shutdown_disk function to nvme_mpath_remove_disk to better reflect its purpose (i.e. removing the disk from the system). However, nvme_mpath_remove_disk was already in use, and its functionality is related to releasing or putting the head node disk. To resolve this naming conflict and improve clarity, the existing nvme_mpath_ remove_disk function is also renamed to nvme_mpath_put_disk. This renaming improves code readability and better aligns function names with their actual roles. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme: introduce multipath_always_on module paramNilay Shroff
Currently, a multipath head disk node is not created for single- ported NVMe adapters or private namespaces with non-unique NSID. However, creating a head node in these cases can help transparently handle transient PCIe link failures. Without a head node, features like delayed removal cannot be leveraged, making it difficult to tolerate such link failures. To address this, this commit introduces nvme_core module parameter multipath_always_on. When multipath_always_on is set to true, it forces the creation of a multipath head node regardless NVMe disk or namespace type. So this option allows the use of delayed removal of head node functionality even for single-ported NVMe disks and private namespaces with a unique NSID and thus helps transparently handle transient PCIe link failures. By default multipath_always_on is set to false, thus preserving the existing behavior. Setting it to true enables improved fault tolerance in PCIe setups. Moreover, please note that enabling this option would also implicitly enable nvme_core.multipath. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-multipath: introduce delayed removal of the multipath head nodeNilay Shroff
Currently, the multipath head node of an NVMe disk is removed immediately as soon as all paths of the disk are removed. However, this can cause issues in scenarios where: - The disk hot-removal followed by re-addition. - Transient PCIe link failures that trigger re-enumeration, temporarily removing and then restoring the disk. In these cases, removing the head node prematurely may lead to a head disk node name change upon re-addition, requiring applications to reopen their handles if they were performing I/O during the failure. To address this, introduce a delayed removal mechanism of head disk node. During transient failure, instead of immediate removal of head disk node, the system waits for a configurable timeout, allowing the disk to recover. During transient disk failure, if application sends any IO then we queue it instead of failing such IO immediately. If the disk comes back online within the timeout, the queued IOs are resubmitted to the disk ensuring seamless operation. In case disk couldn't recover from the failure then queued IOs are failed to its completion and application receives the error. So this way, if disk comes back online within the configured period, the head node remains unchanged, ensuring uninterrupted workloads without requiring applications to reopen device handles. A new sysfs attribute, named "delayed_removal_secs" is added under head disk blkdev for user who wish to configure time for the delayed removal of head disk node. The default value of this attribute is set to zero second ensuring no behavior change unless explicitly configured. Link: https://lore.kernel.org/linux-nvme/Y9oGTKCFlOscbPc2@infradead.org/ Link: https://lore.kernel.org/linux-nvme/Y+1aKcQgbskA2tra@kbusch-mbp.dhcp.thefacebook.com/ Suggested-by: Keith Busch <kbusch@kernel.org> Suggested-by: Christoph Hellwig <hch@infradead.org> [nilay: reworked based on the original idea/POC from Christoph and Keith] Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-pci: derive and better document max segments limitsChristoph Hellwig
Redefine the max segments and max integrity limits based on the limiting factors. This keeps exactly the same values for 4k PAGE_SIZE systems, but increases the number of segments for larger page size as it properly derives the scatterlist allocation based limit for them instead of assuming a 4k PAGE_SIZE. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2025-05-20nvme-pci: use struct_size for allocation struct nvme_devChristoph Hellwig
This avoids open coding the variable size array arithmetics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: add a symolic name for the small pool sizeLeon Romanovsky
Open coding magic numbers in multiple places is never a good idea. Signed-off-by: Leon Romanovsky <leon@kernel.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
2025-05-20nvme-pci: use a better encoding for small prp pool allocationsChristoph Hellwig
Add a separate flag to encode that the transfer is using the small page sized pool, and use a normal 0..n count for the number of descriptors. Contains improvements and suggestions from Kanchan Joshi <joshi.k@samsung.com> and Leon Romanovsky <leon@kernel.org>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: rename the descriptor poolsChristoph Hellwig
They are used for both PRPs and SGLs, and we use descriptor elsewhere when referring to their allocations, so use that name here as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: remove struct nvme_descriptorChristoph Hellwig
There is no real point in having a union of two pointer types here, just use a void pointer as we mix and match types between the arms of the union between the allocation and freeing side already. Also rename the nr_allocations field to nr_descriptors to better describe what it does. Signed-off-by: Christoph Hellwig <hch@lst.de> [leon: ported forward to include metadata SGL support] Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2025-05-20nvme-pci: store aborted state in flags variableLeon Romanovsky
Instead of keeping dedicated "bool aborted" variable, switch to a flags flags that can be used for other flags as well. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
2025-05-20nvme-pci: don't try to use SGLs for metadata on the admin queueChristoph Hellwig
No admin command defined in an NVMe specification supports metadata, but to protect against vendor specific commands using metadata ensure that we don't try to use SGLs for metadata on the admin queue, as NVMe does not support SGLs on the admin queue for the PCI transport. Do this by checking if the data transfer has been setup using SGLs as that is required for using SGLs for metadata. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: make PRP list DMA pools per-NUMA-nodeCaleb Sander Mateos
NVMe commands with over 8 KB of discontiguous data allocate PRP list pages from the per-nvme_device dma_pool prp_page_pool or prp_small_pool. Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool spinlock. These device-global spinlocks are a significant source of contention when many CPUs are submitting to the same NVMe devices. On a workload issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2 NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in _raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free. Ideally, the dma_pools would be per-hctx to minimize contention. But that could impose considerable resource costs in a system with many NVMe devices and CPUs. As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each nvme_queue to the set of DMA pools corresponding to its device and its hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by about half, to 1.2%. Preventing the sharing of PRP list pages across NUMA nodes also makes them cheaper to initialize. Link: https://lore.kernel.org/linux-nvme/CADUfDZqa=OOTtTTznXRDmBQo1WrFcDw1hBA7XwM7hzJ-hpckcA@mail.gmail.com/T/#u Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-pci: factor out a nvme_init_hctx_common() helperCaleb Sander Mateos
nvme_init_hctx() and nvme_admin_init_hctx() are very similar. In preparation for adding more logic, factor out a nvme_init_hctx-common() helper. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20dmapool: add NUMA affinity supportKeith Busch
Introduce dma_pool_create_node(), like dma_pool_create() but taking an additional NUMA node argument. Allocate struct dma_pool on the desired node, and store the node on dma_pool for allocating struct dma_page. Make dma_pool_create() an alias for dma_pool_create_node() with node set to NUMA_NO_NODE. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-fc: do not reference lsrsp after failureDaniel Wagner
The lsrsp object is maintained by the LLDD. The lifetime of the lsrsp object is implicit. Because there is no explicit cleanup/free call into the LLDD, it is not safe to assume after xml_rsp_fails, that the lsrsp is still valid. The LLDD could have freed the object already. With the recent changes how fcloop tracks the resources, this is the case. Thus don't access lsrsp after xml_rsp_fails. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: don't wait for lport cleanupDaniel Wagner
The lifetime of the fcloop_lsreq is not tight to the lifetime of the host or target port, thus there is no need anymore to synchronize the cleanup path anymore. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: add missing fcloop_callback_host_doneDaniel Wagner
Add the missing fcloop_call_host_done calls so that the caller frees resources when something goes wrong. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fc: take tgtport refs for portentryDaniel Wagner
Ensure that the tgtport is not going away as long portentry has a pointer on it. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fc: free pending reqs on tgtport unregisterDaniel Wagner
When nvmet_fc_unregister_targetport is called by the LLDD, it's not possible to communicate with the host, thus all pending request will not be process. Thus explicitly free them. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>