linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-05-21	Merge tag 'clk-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Fixes for some SoC clk drivers: - Define the gate clk for the OTG PHY on Rockchip RK3576 so the nvmem driver actually works - Initialize clk_hw_onecell_data::num before accessing the 'hws' array to keep UBSAN happy - Fix a perf degradation on the Allwinner D1 MMC clk that was making things half bad - Fix the Allwinner SNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT macro to have proper order of arguments" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sunxi-ng: d1: Add missing divider for MMC mod clocks clk: s2mps11: initialise clk_hw_onecell_data::num before accessing ::hws[] in probe() clk: sunxi-ng: fix order of arguments in clock macro clk: rockchip: rk3576: define clk_otp_phy_g
2025-05-21	net: hibmcge: fix wrong ndo.open() after reset fail issue.	Jijie Shao
	If the driver reset fails, it may not work properly. Therefore, the ndo.open() operation should be rejected. In this patch, the driver calls netif_device_detach() before the reset and calls netif_device_attach() after the reset succeeds. If the reset fails, netif_device_attach() is not called. Therefore, netdev does not present and cannot be opened. If reset fails, only the PCI reset (via sysfs) can be used to attempt recovery. Fixes: 3f5a61f6d504 ("net: hibmcge: Add reset supported in this module") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250517095828.1763126-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21	net: hibmcge: fix incorrect statistics update issue	Jijie Shao
	When the user dumps statistics, the hibmcge driver automatically updates all statistics. If the driver is performing the reset operation, the error data of 0xFFFFFFFF is updated. Therefore, if the driver is resetting, the hbg_update_stats_by_info() needs to return directly. Fixes: c0bf9bf31e79 ("net: hibmcge: Add support for dump statistics") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250517095828.1763126-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21	ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()	Ming Lei
	If ublk_set_auto_buf_reg() fails, we need to unlock and return, otherwise `ub->mutex` is leaked. Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reported-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250521025502.71041-2-ming.lei@redhat.com Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-21	PCI/MSI: Use bool for MSI enable state tracking	Hans Zhang
	Convert pci_msi_enable and pci_msi_enabled() to use bool type for clarity. No functional changes, only code cleanup. Signed-off-by: Hans Zhang <hans.zhang@cixtech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250516165223.125083-2-18255117159@163.com
2025-05-21	Merge tag 'samsung-fixes-6.15' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/fixes Samsung SoC driver fixes for v6.15 1. Exynos ACPM driver (used on Google GS101): Fix timeout due to missing responses from the firmware part. 2. Samsung USI (serial engines) driver: Correct ineffective unconfiguring of the interface during probe removal. * tag 'samsung-fixes-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: soc: samsung: usi: prevent wrong bits inversion during unconfiguring firmware: exynos-acpm: check saved RX before bailing out on empty RX queue Link: https://lore.kernel.org/r/20250513101023.21552-5-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21	nvme: avoid creating multipath sysfs group under namespace path devices	Nilay Shroff
	Commit 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy") introduced the creation of the multipath sysfs group under the NVMe head gendisk device node. However, it also inadvertently added the same sysfs group under each namespace path device which head node refers to and that is incorrect. The multipath sysfs group should only be exposed through the namespace head gendisk node. This is sufficient, as the head device already provides symbolic links to the individual namespace paths it manages. This patch fixes the issue by preventing the creation of the multipath sysfs group under namespace path devices, ensuring it only appears under the head disk node. Fixes: 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-21	perf/apple_m1: Remove driver-specific throttle support	Kan Liang
	The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-10-kan.liang@linux.intel.com
2025-05-21	perf/arm: Remove driver-specific throttle support	Kan Liang
	The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250520181644.2673067-9-kan.liang@linux.intel.com
2025-05-21	can: slcan: allow reception of short error messages	Carlos Sanchez
	Allows slcan to receive short messages (typically errors) from the serial interface. When error support was added to slcan protocol in b32ff4668544e1333b694fcc7812b2d7397b4d6a ("can: slcan: extend the protocol with error info") the minimum valid message size changed from 5 (minimum standard can frame tIII0) to 3 ("e1a" is a valid protocol message, it is one of the examples given in the comments for slcan_bump_err() ), but the check for minimum message length prodicating all decoding was not adjusted. This makes short error messages discarded and error frames not being generated. This patch changes the minimum length to the new minimum (3 characters, excluding terminator, is now a valid message). Signed-off-by: Carlos Sanchez <carlossanchez@geotab.com> Fixes: b32ff4668544 ("can: slcan: extend the protocol with error info") Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250520102305.1097494-1-carlossanchez@geotab.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-21	Merge tag 'v6.15-rc7' into x86/core, to pick up fixes	Ingo Molnar
	Pick up build fixes from upstream to make this tree more testable. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-21	crypto: marvell/cesa - Do not chain submitted requests	Herbert Xu
	This driver tries to chain requests together before submitting them to hardware in order to reduce completion interrupts. However, it even extends chains that have already been submitted to hardware. This is dangerous because there is no way of knowing whether the hardware has already read the DMA memory in question or not. Fix this by splitting the chain list into two. One for submitted requests and one for requests that have not yet been submitted. Only extend the latter. Reported-by: Klaus Kudielka <klaus.kudielka@gmail.com> Fixes: 85030c5168f1 ("crypto: marvell - Add support for chaining crypto requests in TDMA mode") Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-20	Merge tag 'rproc-v6.15-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull remoteproc fix from Bjorn Andersson: "Address a regression preventing the wireless subsystem remoteproc on some Qualcomm platforms (e.g. SDM632) from working" * tag 'rproc-v6.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: remoteproc: qcom_wcnss: Fix on platforms without fallback regulators
2025-05-20	net: lan743x: Restore SGMII CTRL register on resume	Thangaraj Samynathan
	SGMII_CTRL register, which specifies the active interface, was not properly restored when resuming from suspend. This led to incorrect interface selection after resume particularly in scenarios involving the FPGA. To fix this: - Move the SGMII_CTRL setup out of the probe function. - Initialize the register in the hardware initialization helper function, which is called during both device initialization and resume. This ensures the interface configuration is consistently restored after suspend/resume cycles. Fixes: a46d9d37c4f4f ("net: lan743x: Add support for SGMII interface") Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Link: https://patch.msgid.link/20250516035719.117960-1-thangaraj.s@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20	bnxt_en: Fix netdev locking in ULP IRQ functions	Michael Chan
	netdev_lock is already held when calling bnxt_ulp_irq_stop() and bnxt_ulp_irq_restart(). When converting rtnl_lock to netdev_lock, the original code was rtnl_dereference() to indicate that rtnl_lock was already held. rcu_dereference_protected() is the correct conversion after replacing rtnl_lock with netdev_lock. Add a new helper netdev_lock_dereference() similar to rtnl_dereference(). Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250519204130.3097027-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20	net: dwmac-sun8i: Use parsed internal PHY address instead of 1	Paul Kocialkowski
	While the MDIO address of the internal PHY on Allwinner sun8i chips is generally 1, of_mdio_parse_addr is used to cleanly parse the address from the device-tree instead of hardcoding it. A commit reworking the code ditched the parsed value and hardcoded the value 1 instead, which didn't really break anything but is more fragile and not future-proof. Restore the initial behavior using the parsed address returned from the helper. Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs") Signed-off-by: Paul Kocialkowski <paulk@sys-base.io> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Corentin LABBE <clabbe.montjoie@gmail.com> Tested-by: Corentin LABBE <clabbe.montjoie@gmail.com> Link: https://patch.msgid.link/20250519164936.4172658-1-paulk@sys-base.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20	Merge branch '100GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-05-19 (ice, idpf) For ice: Jake removes incorrect incrementing of MAC filter count. Dave adds check for, prerequisite, switchdev mode before setting up LAG. For idpf: Pavan stores max_tx_hdr_size to prevent NULL pointer dereference during reset. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: fix null-ptr-deref in idpf_features_check ice: Fix LACP bonds without SRIOV environment ice: fix vf->num_mac count with port representors ==================== Link: https://patch.msgid.link/20250519210523.1866503-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20	net: ethernet: ti: am65-cpsw: Lower random mac address error print to info	Nishanth Menon
	Using random mac address is not an error since the driver continues to function, it should be informative that the system has not assigned a MAC address. This is inline with other drivers such as ax88796c, dm9051 etc. Drop the error level to info level. Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250516122655.442808-1-nm@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20	pinctrl: qcom: switch to devm_register_sys_off_handler()	Dmitry Baryshkov
	Error-handling paths in msm_pinctrl_probe() don't call a function required to unroll restart handler registration, unregister_restart_handler(). Instead of adding calls to this function, switch the msm pinctrl code into using devm_register_sys_off_handler(). Fixes: cf1fc1876289 ("pinctrl: qcom: use restart_notifier mechanism for ps_hold") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/20250513-pinctrl-msm-fix-v2-2-249999af0fc1@oss.qualcomm.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-05-20	gpiolib: don't crash on enabling GPIO HOG pins	Dmitry Baryshkov
	On Qualcomm platforms if the board uses GPIO hogs msm_pinmux_request() calls gpiochip_line_is_valid(). After commit 8015443e24e7 ("gpio: Hide valid_mask from direct assignments") gpiochip_line_is_valid() uses gc->gpiodev, which is NULL when GPIO hog pins are being processed. Thus after this commit using GPIO hogs causes the following crash. In order to fix this, verify that gc->gpiodev is not NULL. Note: it is not possible to reorder calls (e.g. by calling msm_gpio_init() before pinctrl registration or by splitting pinctrl_register() into _and_init() and pinctrl_enable() and calling the latter function after msm_gpio_init()) because GPIO chip registration would fail with EPROBE_DEFER if pinctrl is not enabled at the time of registration. pc : gpiochip_line_is_valid+0x4/0x28 lr : msm_pinmux_request+0x24/0x40 sp : ffff8000808eb870 x29: ffff8000808eb870 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: ffff726240f9d040 x24: 0000000000000000 x23: ffff7262438c0510 x22: 0000000000000080 x21: ffff726243ea7000 x20: ffffab13f2c4e698 x19: 0000000000000080 x18: 00000000ffffffff x17: ffff726242ba6000 x16: 0000000000000100 x15: 0000000000000028 x14: 0000000000000000 x13: 0000000000002948 x12: 0000000000000003 x11: 0000000000000078 x10: 0000000000002948 x9 : ffffab13f50eb5e8 x8 : 0000000003ecb21b x7 : 000000000000002d x6 : 0000000000000b68 x5 : 0000007fffffffff x4 : ffffab13f52f84a8 x3 : ffff8000808eb804 x2 : ffffab13f1de8190 x1 : 0000000000000080 x0 : 0000000000000000 Call trace: gpiochip_line_is_valid+0x4/0x28 (P) pin_request+0x208/0x2c0 pinmux_enable_setting+0xa0/0x2e0 pinctrl_commit_state+0x150/0x26c pinctrl_enable+0x6c/0x2a4 pinctrl_register+0x3c/0xb0 devm_pinctrl_register+0x58/0xa0 msm_pinctrl_probe+0x2a8/0x584 sdm845_pinctrl_probe+0x20/0x88 platform_probe+0x68/0xc0 really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x3c/0x160 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x84/0xe0 __device_attach+0x9c/0x188 device_initial_probe+0x14/0x20 bus_probe_device+0xac/0xb0 deferred_probe_work_func+0x8c/0xc8 process_one_work+0x208/0x5e8 worker_thread+0x1b4/0x35c kthread+0x144/0x220 ret_from_fork+0x10/0x20 Code: b5fffba0 17fffff2 9432ec27 f9400400 (f9428800) Fixes: 8015443e24e7 ("gpio: Hide valid_mask from direct assignments") Reported-by: Doug Anderson <dianders@chromium.org> Closes: https://lore.kernel.org/r/CAD=FV=Vg8_ZOLgLoC4WhFPzhVsxXFC19NrF38W6cW_W_3nFjbw@mail.gmail.com Tested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/20250513-pinctrl-msm-fix-v2-1-249999af0fc1@oss.qualcomm.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-05-20	can: kvaser_pciefd: Continue parsing DMA buf after dropped RX	Axel Forsman
	Going bus-off on a channel doing RX could result in dropped packets. As netif_running() gets cleared before the channel abort procedure, the handling of any last RDATA packets would see netif_rx() return non-zero to signal a dropped packet. kvaser_pciefd_read_buffer() dealt with this "error" by breaking out of processing the remaining DMA RX buffer. Only return an error from kvaser_pciefd_read_buffer() due to packet corruption, otherwise handle it internally. Cc: stable@vger.kernel.org Signed-off-by: Axel Forsman <axfo@kvaser.com> Tested-by: Jimmy Assarsson <extja@kvaser.com> Reviewed-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250520114332.8961-4-axfo@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-20	can: kvaser_pciefd: Fix echo_skb race	Axel Forsman
	The functions kvaser_pciefd_start_xmit() and kvaser_pciefd_handle_ack_packet() raced to stop/wake TX queues and get/put echo skbs, as kvaser_pciefd_can->echo_lock was only ever taken when transmitting and KCAN_TX_NR_PACKETS_CURRENT gets decremented prior to handling of ACKs. E.g., this caused the following error: can_put_echo_skb: BUG! echo_skb 5 is occupied! Instead, use the synchronization helpers in netdev_queues.h. As those piggyback on BQL barriers, start updating in-flight packets and bytes counts as well. Cc: stable@vger.kernel.org Signed-off-by: Axel Forsman <axfo@kvaser.com> Tested-by: Jimmy Assarsson <extja@kvaser.com> Reviewed-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250520114332.8961-3-axfo@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-20	can: kvaser_pciefd: Force IRQ edge in case of nested IRQ	Axel Forsman
	Avoid the driver missing IRQs by temporarily masking IRQs in the ISR to enforce an edge even if a different IRQ is signalled before handled IRQs are cleared. Fixes: 48f827d4f48f ("can: kvaser_pciefd: Move reset of DMA RX buffers to the end of the ISR") Cc: stable@vger.kernel.org Signed-off-by: Axel Forsman <axfo@kvaser.com> Tested-by: Jimmy Assarsson <extja@kvaser.com> Reviewed-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250520114332.8961-2-axfo@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-20	ublk: support UBLK_AUTO_BUF_REG_FALLBACK	Ming Lei
	For UBLK_F_AUTO_BUF_REG, buffer is registered to uring_cmd context automatically with the provided buffer index. User may provide one wrong buffer index, or the specified buffer is registered by application already. Add UBLK_AUTO_BUF_REG_FALLBACK for supporting to auto buffer registering fallback by completing the uring_cmd and telling ublk server the register failure via UBLK_AUTO_BUF_REG_FALLBACK, then ublk server still can register the buffer from userspace. So we can provide reliable way for supporting auto buffer register. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20	ublk: register buffer to local io_uring with provided buf index via ↵	Ming Lei
	UBLK_F_AUTO_BUF_REG Add UBLK_F_AUTO_BUF_REG for supporting to register buffer automatically to local io_uring context with provided buffer index. Add UAPI structure `struct ublk_auto_buf_reg` for holding user parameter to register request buffer automatically, one 'flags' field is defined, and there is still 32bit available for future extension, such as, adding one io_ring FD field for registering buffer to external io_uring. `struct ublk_auto_buf_reg` is populated from ublk uring_cmd's sqe->addr, and all existing ublk commands are data-less, so it is just fine to reuse sqe->addr for this purpose. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20	ublk: prepare for supporting to register request buffer automatically	Ming Lei
	UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer register/unregister uring_cmd for each IO, this way is not only inefficient, but also introduce dependency between buffer consumer and buffer register/ unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c in which backing file IO has to be issued one by one by IOSQE_IO_LINK. Prepare for adding feature UBLK_F_AUTO_BUF_REG for addressing the existing zero copy limitation: - register request buffer automatically to ublk uring_cmd's io_uring context before delivering io command to ublk server - unregister request buffer automatically from the ublk uring_cmd's io_uring context when completing the request - io_uring will unregister the buffer automatically when uring is exiting, so we needn't worry about accident exit For using this feature, ublk server has to create one sparse buffer table Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20	ublk: convert to refcount_t	Ming Lei
	Convert to refcount_t and prepare for supporting to register bvec buffer automatically, which needs to initialize reference counter as 2, and kref doesn't provide this interface, so convert to refcount_t. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20	loop: don't require ->write_iter for writable files in loop_configure	Christoph Hellwig
	Block devices can be opened read-write even if they can't be written to for historic reasons. Remove the check requiring file->f_op->write_iter when the block devices was opened in loop_configure. The call to loop_check_backing_file just below ensures the ->write_iter is present for backing files opened for writing, which is the only check that is actually needed. Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter") Reported-by: Christian Hesse <mail@eworm.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250520135420.1177312-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20	platform/x86: think-lmi: Fix attribute name usage for non-compliant items	Mark Pearson
	A few, quite rare, WMI attributes have names that are not compatible with filenames, e.g. "Intel VT for Directed I/O (VT-d)". For these cases the '/' gets replaced with '\' for display, but doesn't get switched again when doing the WMI access. Fix this by keeping the original attribute name and using that for sending commands to the BIOS Fixes: a40cd7ef22fb ("platform/x86: think-lmi: Add WMI interface support on Lenovo platforms") Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250520005027.3840705-1-mpearson-lenovo@squebb.ca Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-20	platform/x86: thinkpad_acpi: Ignore battery threshold change event notification	Mark Pearson
	If user modifies the battery charge threshold an ACPI event is generated. Confirmed with Lenovo FW team this is only generated on user event. As no action is needed, ignore the event and prevent spurious kernel logs. Reported-by: Derek Barbosa <debarbos@redhat.com> Closes: https://lore.kernel.org/platform-driver-x86/7e9a1c47-5d9c-4978-af20-3949d53fb5dc@app.fastmail.com/T/#m5f5b9ae31d3fbf30d7d9a9d76c15fb3502dfd903 Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20250517023348.2962591-1-mpearson-lenovo@squebb.ca Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-20	nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk	Nilay Shroff
	In the NVMe context, the term "shutdown" has a specific technical meaning. To avoid confusion, this commit renames the nvme_mpath_ shutdown_disk function to nvme_mpath_remove_disk to better reflect its purpose (i.e. removing the disk from the system). However, nvme_mpath_remove_disk was already in use, and its functionality is related to releasing or putting the head node disk. To resolve this naming conflict and improve clarity, the existing nvme_mpath_ remove_disk function is also renamed to nvme_mpath_put_disk. This renaming improves code readability and better aligns function names with their actual roles. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvme: introduce multipath_always_on module param	Nilay Shroff
	Currently, a multipath head disk node is not created for single- ported NVMe adapters or private namespaces with non-unique NSID. However, creating a head node in these cases can help transparently handle transient PCIe link failures. Without a head node, features like delayed removal cannot be leveraged, making it difficult to tolerate such link failures. To address this, this commit introduces nvme_core module parameter multipath_always_on. When multipath_always_on is set to true, it forces the creation of a multipath head node regardless NVMe disk or namespace type. So this option allows the use of delayed removal of head node functionality even for single-ported NVMe disks and private namespaces with a unique NSID and thus helps transparently handle transient PCIe link failures. By default multipath_always_on is set to false, thus preserving the existing behavior. Setting it to true enables improved fault tolerance in PCIe setups. Moreover, please note that enabling this option would also implicitly enable nvme_core.multipath. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvme-multipath: introduce delayed removal of the multipath head node	Nilay Shroff
	Currently, the multipath head node of an NVMe disk is removed immediately as soon as all paths of the disk are removed. However, this can cause issues in scenarios where: - The disk hot-removal followed by re-addition. - Transient PCIe link failures that trigger re-enumeration, temporarily removing and then restoring the disk. In these cases, removing the head node prematurely may lead to a head disk node name change upon re-addition, requiring applications to reopen their handles if they were performing I/O during the failure. To address this, introduce a delayed removal mechanism of head disk node. During transient failure, instead of immediate removal of head disk node, the system waits for a configurable timeout, allowing the disk to recover. During transient disk failure, if application sends any IO then we queue it instead of failing such IO immediately. If the disk comes back online within the timeout, the queued IOs are resubmitted to the disk ensuring seamless operation. In case disk couldn't recover from the failure then queued IOs are failed to its completion and application receives the error. So this way, if disk comes back online within the configured period, the head node remains unchanged, ensuring uninterrupted workloads without requiring applications to reopen device handles. A new sysfs attribute, named "delayed_removal_secs" is added under head disk blkdev for user who wish to configure time for the delayed removal of head disk node. The default value of this attribute is set to zero second ensuring no behavior change unless explicitly configured. Link: https://lore.kernel.org/linux-nvme/Y9oGTKCFlOscbPc2@infradead.org/ Link: https://lore.kernel.org/linux-nvme/Y+1aKcQgbskA2tra@kbusch-mbp.dhcp.thefacebook.com/ Suggested-by: Keith Busch <kbusch@kernel.org> Suggested-by: Christoph Hellwig <hch@infradead.org> [nilay: reworked based on the original idea/POC from Christoph and Keith] Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvme-pci: derive and better document max segments limits	Christoph Hellwig
	Redefine the max segments and max integrity limits based on the limiting factors. This keeps exactly the same values for 4k PAGE_SIZE systems, but increases the number of segments for larger page size as it properly derives the scatterlist allocation based limit for them instead of assuming a 4k PAGE_SIZE. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2025-05-20	nvme-pci: use struct_size for allocation struct nvme_dev	Christoph Hellwig
	This avoids open coding the variable size array arithmetics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20	nvme-pci: add a symolic name for the small pool size	Leon Romanovsky
	Open coding magic numbers in multiple places is never a good idea. Signed-off-by: Leon Romanovsky <leon@kernel.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
2025-05-20	nvme-pci: use a better encoding for small prp pool allocations	Christoph Hellwig
	Add a separate flag to encode that the transfer is using the small page sized pool, and use a normal 0..n count for the number of descriptors. Contains improvements and suggestions from Kanchan Joshi <joshi.k@samsung.com> and Leon Romanovsky <leon@kernel.org>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20	nvme-pci: rename the descriptor pools	Christoph Hellwig
	They are used for both PRPs and SGLs, and we use descriptor elsewhere when referring to their allocations, so use that name here as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20	nvme-pci: remove struct nvme_descriptor	Christoph Hellwig
	There is no real point in having a union of two pointer types here, just use a void pointer as we mix and match types between the arms of the union between the allocation and freeing side already. Also rename the nr_allocations field to nr_descriptors to better describe what it does. Signed-off-by: Christoph Hellwig <hch@lst.de> [leon: ported forward to include metadata SGL support] Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2025-05-20	nvme-pci: store aborted state in flags variable	Leon Romanovsky
	Instead of keeping dedicated "bool aborted" variable, switch to a flags flags that can be used for other flags as well. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
2025-05-20	nvme-pci: don't try to use SGLs for metadata on the admin queue	Christoph Hellwig
	No admin command defined in an NVMe specification supports metadata, but to protect against vendor specific commands using metadata ensure that we don't try to use SGLs for metadata on the admin queue, as NVMe does not support SGLs on the admin queue for the PCI transport. Do this by checking if the data transfer has been setup using SGLs as that is required for using SGLs for metadata. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20	nvme-pci: make PRP list DMA pools per-NUMA-node	Caleb Sander Mateos
	NVMe commands with over 8 KB of discontiguous data allocate PRP list pages from the per-nvme_device dma_pool prp_page_pool or prp_small_pool. Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool spinlock. These device-global spinlocks are a significant source of contention when many CPUs are submitting to the same NVMe devices. On a workload issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2 NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in _raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free. Ideally, the dma_pools would be per-hctx to minimize contention. But that could impose considerable resource costs in a system with many NVMe devices and CPUs. As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each nvme_queue to the set of DMA pools corresponding to its device and its hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by about half, to 1.2%. Preventing the sharing of PRP list pages across NUMA nodes also makes them cheaper to initialize. Link: https://lore.kernel.org/linux-nvme/CADUfDZqa=OOTtTTznXRDmBQo1WrFcDw1hBA7XwM7hzJ-hpckcA@mail.gmail.com/T/#u Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvme-pci: factor out a nvme_init_hctx_common() helper	Caleb Sander Mateos
	nvme_init_hctx() and nvme_admin_init_hctx() are very similar. In preparation for adding more logic, factor out a nvme_init_hctx-common() helper. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvme-fc: do not reference lsrsp after failure	Daniel Wagner
	The lsrsp object is maintained by the LLDD. The lifetime of the lsrsp object is implicit. Because there is no explicit cleanup/free call into the LLDD, it is not safe to assume after xml_rsp_fails, that the lsrsp is still valid. The LLDD could have freed the object already. With the recent changes how fcloop tracks the resources, this is the case. Thus don't access lsrsp after xml_rsp_fails. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvmet-fcloop: don't wait for lport cleanup	Daniel Wagner
	The lifetime of the fcloop_lsreq is not tight to the lifetime of the host or target port, thus there is no need anymore to synchronize the cleanup path anymore. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvmet-fcloop: add missing fcloop_callback_host_done	Daniel Wagner
	Add the missing fcloop_call_host_done calls so that the caller frees resources when something goes wrong. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvmet-fc: take tgtport refs for portentry	Daniel Wagner
	Ensure that the tgtport is not going away as long portentry has a pointer on it. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvmet-fc: free pending reqs on tgtport unregister	Daniel Wagner
	When nvmet_fc_unregister_targetport is called by the LLDD, it's not possible to communicate with the host, thus all pending request will not be process. Thus explicitly free them. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvmet-fcloop: drop response if targetport is gone	Daniel Wagner
	When the target port is gone, the lsrsp pointer is invalid. Thus don't call the done function anymore instead just drop the response. This happens when the target sends a disconnect association. After this the target starts tearing down all resources and doesn't expect any response. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20	nvmet-fcloop: allocate/free fcloop_lsreq directly	Daniel Wagner
	fcloop depends on the host or the target to allocate the fcloop_lsreq object. This means that the lifetime of the fcloop_lsreq is tied to either the host or the target. Consequently, the host or the target must cooperate during shutdown. Unfortunately, this approach does not work well when the target forces a shutdown, as there are dependencies that are difficult to resolve in a clean way. The simplest solution is to decouple the lifetime of the fcloop_lsreq object by managing them directly within fcloop. Since this is not a performance-critical path and only a small number of LS objects are used during setup and cleanup, it does not significantly impact performance to allocate them during normal operation. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>