linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-08-20	LoongArch: Add cpuhotplug hooks to fix high cpu usage of vCPU threads	Xianglai Li
	When the CPU is offline, the timer of LoongArch is not correctly closed. This is harmless for real machines, but resulting in an excessively high cpu usage rate of the offline vCPU thread in the virtual machines. To correctly close the timer, we have made the following modifications: Register the cpu hotplug event (CPUHP_AP_LOONGARCH_ARCH_TIMER_STARTING) for LoongArch. This event's hooks will be called to close the timer when the CPU is offline. Clear the timer interrupt when the timer is turned off. Since before the timer is turned off, there may be a timer interrupt that has already been in the pending state due to the interruption of the disabled, which also affects the halt state of the offline vCPU. Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-20	LoongArch: Increase COMMAND_LINE_SIZE up to 4096	Ming Wang
	The default COMMAND_LINE_SIZE of 512, inherited from asm-generic, is too small for modern use cases. For example, kdump configurations or extensive debugging parameters can easily exceed this limit. Therefore, increase the command line size to 4096 bytes, aligning LoongArch with the MIPS architecture. This change follows a broader trend among architectures to raise this limit to support modern needs; for instance, PowerPC increased its value for similar reasons in the commit a5980d064fe2 ("powerpc: Bump COMMAND_LINE_SIZE to 2048"). Similar to the change made for RISC-V in the commit 61fc1ee8be26 ("riscv: Bump COMMAND_LINE_SIZE value to 1024"), this is considered a safe change. The broader kernel community has reached a consensus that modifying COMMAND_LINE_SIZE from UAPI headers does not constitute a uABI breakage, as well-behaved userspace applications should not rely on this macro. Suggested-by: Huang Cun <cunhuang@tencent.com> Signed-off-by: Ming Wang <wangming01@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-20	LoongArch: Pass annotate-tablejump option if LTO is enabled	Tiezhu Yang
	When compiling with LLVM and CONFIG_LTO_CLANG is set, there exist many objtool warnings "sibling call from callable instruction with modified stack frame". For this special case, the related object file shows that there is no generated relocation section '.rela.discard.tablejump_annotate' for the table jump instruction jirl, thus objtool can not know that what is the actual destination address. It needs to do something on the LLVM side to make sure that there is the relocation section '.rela.discard.tablejump_annotate' if LTO is enabled, but in order to maintain compatibility for the current LLVM compiler, this can be done in the kernel Makefile for now. Ensure it is aware of linker with LTO, '--loongarch-annotate-tablejump' needs to be passed via '-mllvm' to ld.lld. Note that it should also pass the compiler option -mannotate-tablejump rather than only pass '-mllvm --loongarch-annotate-tablejump' to ld.lld if LTO is enabled, otherwise there are no jump info for some table jump instructions. Fixes: e20ab7d454ee ("LoongArch: Enable jump table for objtool") Closes: https://lore.kernel.org/loongarch/20250731175655.GA1455142@ax162/ Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Co-developed-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-20	objtool/LoongArch: Get table size correctly if LTO is enabled	Tiezhu Yang
	When compiling with LLVM and CONFIG_LTO_CLANG is set, there exist many objtool warnings "sibling call from callable instruction with modified stack frame". For this special case, the related object file shows that there is no generated relocation section '.rela.discard.tablejump_annotate' for the table jump instruction jirl, thus objtool can not know that what is the actual destination address. It needs to do something on the LLVM side to make sure that there is the relocation section '.rela.discard.tablejump_annotate' if LTO is enabled, but in order to maintain compatibility for the current LLVM compiler, this can be done in the kernel Makefile for now. Ensure it is aware of linker with LTO, '--loongarch-annotate-tablejump' needs to be passed via '-mllvm' to ld.lld. Before doing the above changes, it should handle the special case of the relocation section '.rela.discard.tablejump_annotate' to get the correct table size first, otherwise there are many objtool warnings and errors if LTO is enabled. There are many different rodata for each function if LTO is enabled, it is necessary to enhance get_rodata_table_size_by_table_annotate(). Fixes: b95f852d3af2 ("objtool/LoongArch: Add support for switch table") Closes: https://lore.kernel.org/loongarch/20250731175655.GA1455142@ax162/ Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-08-20	Merge drm/drm-fixes into drm-misc-fixes	Maxime Ripard
	Update drm-misc-fixes to -rc2. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-08-20	drivers/xen/xenbus: remove quirk for Xen 3.x	Juergen Gross
	The kernel is not supported to run as a Xen guest on Xen versions older than 4.0. Remove xen_strict_xenbus_quirk() which is testing the Xen version to be at least 4.0. Acked-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250815074052.13792-1-jgross@suse.com>
2025-08-20	ACPI: pfr_update: Fix the driver update version check	Chen Yu
	The security-version-number check should be used rather than the runtime version check for driver updates. Otherwise, the firmware update would fail when the update binary had a lower runtime version number than the current one. Fixes: 0db89fa243e5 ("ACPI: Introduce Platform Firmware Runtime Update device driver") Cc: 5.17+ <stable@vger.kernel.org> # 5.17+ Reported-by: "Govindarajulu, Hariganesh" <hariganesh.govindarajulu@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Link: https://patch.msgid.link/20250722143233.3970607-1-yu.c.chen@intel.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-08-20	ALSA: hda: tas2781: Fix wrong reference of tasdevice_priv	Takashi Iwai
	During the conversion to unify the calibration data management, the reference to tasdevice_priv was wrongly set to h->hda_priv instead of h->priv. This resulted in memory corruption and crashes eventually. Unfortunately it's a void pointer, hence the compiler couldn't know that it's wrong. Fixes: 4fe238513407 ("ALSA: hda/tas2781: Move and unified the calibrated-data getting function for SPI and I2C into the tas2781_hda lib") Link: https://bugzilla.suse.com/show_bug.cgi?id=1248270 Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20250820051902.4523-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-19	Merge branch 'fixes-on-the-microchip-s-lan865x-driver'	Jakub Kicinski
	Parthiban Veerasooran says: ==================== Fixes on the Microchip's LAN865x driver This patch series includes two bug fixes for the LAN865x Ethernet MAC-PHY driver: 1. Fix missing transmit queue restart on device reopen This patch addresses an issue where the transmit queue is not restarted when the network interface is brought back up after being taken down (e.g., via ip or ifconfig). As a result, packet transmission hangs after the first down/up cycle. The fix ensures netif_start_queue() is explicitly called in lan865x_net_open() to properly restart the queue on every reopen. 2. Fix missing configuration in the Microchip LAN865x driver for silicon revisions B0 and B1, as documented in Microchip Application Note AN1760 (Rev F, June 2024). These revisions require the MAC to be configured for timestamping at the end of the Start of Frame Delimiter (SFD) and the Timer Increment register to be set to 40 ns, corresponding to a 25 MHz internal clock. Both patches address issues introduced with the initial driver support and are marked with the appropriate Fixes: tag. ==================== Link: https://patch.msgid.link/20250818060514.52795-1-parthiban.veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	microchip: lan865x: fix missing Timer Increment config for Rev.B0/B1	Parthiban Veerasooran
	Fix missing configuration for LAN865x silicon revisions B0 and B1 as per Microchip Application Note AN1760 (Rev F, June 2024). The Timer Increment register was not being set, which is required for accurate timestamping. As per the application note, configure the MAC to set timestamping at the end of the Start of Frame Delimiter (SFD), and set the Timer Increment register to 40 ns (corresponding to a 25 MHz internal clock). Link: https://www.microchip.com/en-us/application-notes/an1760 Fixes: 5cd2340cb6a3 ("microchip: lan865x: add driver support for Microchip's LAN865X MAC-PHY") Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250818060514.52795-3-parthiban.veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	microchip: lan865x: fix missing netif_start_queue() call on device open	Parthiban Veerasooran
	This fixes an issue where the transmit queue is started implicitly only the very first time the device is registered. When the device is taken down and brought back up again (using `ip` or `ifconfig`), the transmit queue is not restarted, causing packet transmission to hang. Adding an explicit call to netif_start_queue() in lan865x_net_open() ensures the transmit queue is properly started every time the device is reopened. Fixes: 5cd2340cb6a3 ("microchip: lan865x: add driver support for Microchip's LAN865X MAC-PHY") Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Link: https://patch.msgid.link/20250818060514.52795-2-parthiban.veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	Merge branch 'mlx5-hws-fixes-2025-08-17'	Jakub Kicinski
	Mark Bloch says: ==================== mlx5 HWS fixes 2025-08-17 The following patch set focuses on hardware steering fixes found by the team. ==================== Link: https://patch.msgid.link/20250817202323.308604-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/mlx5: CT: Use the correct counter offset	Vlad Dogaru
	Specifying the counter action is not enough, as it is used by multiple counters that were allocated in a bulk. By omitting the offset, rules will be associated with a different counter from the same bulk. Subsequently, the CT subsystem checks the correct counter, assumes that no traffic has triggered the rule, and ages out the rule. The end result is intermittent offloading of long lived connections, as rules are aged out then promptly re-added. Fix this by specifying the correct offset along with the counter rule. Fixes: 34eea5b12a10 ("net/mlx5e: CT: Add initial support for Hardware Steering") Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250817202323.308604-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/mlx5: HWS, Fix table creation UID	Alex Vesker
	During table creation, caller passes a UID using ft_attr. The UID value was ignored, which leads to problems when the caller sets the UID to a non-zero value, such as SHARED_RESOURCE_UID (0xffff) - the internal FT objects will be created with UID=0. Fixes: 0869701cba3d ("net/mlx5: HWS, added FW commands handling") Signed-off-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250817202323.308604-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/mlx5: HWS, don't rehash on every kind of insertion failure	Yevgeny Kliteynik
	If rule creation failed due to a full queue, due to timeout in polling for completion, or due to matcher being in resize, don't try to initiate rehash sequence - rehash would have failed anyway. Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250817202323.308604-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/mlx5: HWS, prevent rehash from filling up the queues	Yevgeny Kliteynik
	While moving the rules during rehash, CQ is not drained. The flush and drain happens only when all the rules of a certain queue have been moved. This behaviour can lead to accumulating large quantity of rules that haven't got their completion yet, and eventually will fill up the queue and will cause the rehash to fail. Fix this problem by requiring drain once the number of outstanding completions reaches a certain threshold. Fixes: ef94799a8741 ("net/mlx5: HWS, rework rehash loop") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250817202323.308604-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/mlx5: HWS, fix complex rules rehash error flow	Yevgeny Kliteynik
	Moving rules from matcher to matcher should not fail. However, if it does fail due to various reasons, the error flow should allow the kernel to continue functioning (albeit with broken steering rules) instead of going into series of soft lock-ups or some other problematic behaviour. Similar to the simple rules, complex rules rehash logic suffers from the same problems. This patch fixes the error flow for moving complex rules: - If new rule creation fails before it was even enqeued, do not poll for completion - If TIMEOUT happened while moving the rule, no point trying to poll for completions for other rules. Something is broken, completion won't come, just abort the rehash sequence. - If some other completion with error received, don't give up. Continue handling rest of the rules to minimize the damage. - Make sure that the first error code that was received will be actually returned to the caller instead of replacing it with the generic error code. All the aforementioned issues stem from the same bad error flow, so no point fixing them one by one and leaving partially broken code - fixing them in one patch. Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250817202323.308604-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/mlx5: HWS, fix simple rules rehash error flow	Yevgeny Kliteynik
	Moving rules from matcher to matcher should not fail. However, if it does fail due to various reasons, the error flow should allow the kernel to continue functioning (albeit with broken steering rules) instead of going into series of soft lock-ups or some other problematic behaviour. This patch fixes the error flow for moving simple rules: - If new rule creation fails before it was even enqeued, do not poll for completion - If TIMEOUT happened while moving the rule, no point trying to poll for completions for other rules. Something is broken, completion won't come, just abort the rehash sequence. - If some other completion with error received, don't give up. Continue handling rest of the rules to minimize the damage. - Make sure that the first error code that was received will be actually returned to the caller instead of replacing it with the generic error code. All the aforementioned issues stem from the same bad error flow, so no point fixing them one by one and leaving partially broken code - fixing them in one patch. Fixes: ef94799a8741 ("net/mlx5: HWS, rework rehash loop") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250817202323.308604-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/mlx5: HWS, fix bad parameter in CQ creation	Yevgeny Kliteynik
	'cqe_sz' valid value should be 0 for 64-byte CQE. Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250817202323.308604-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/smc: fix UAF on smcsk after smc_listen_out()	D. Wythe
	BPF CI testing report a UAF issue: [ 16.446633] BUG: kernel NULL pointer dereference, address: 000000000000003 0 [ 16.447134] #PF: supervisor read access in kernel mod e [ 16.447516] #PF: error_code(0x0000) - not-present pag e [ 16.447878] PGD 0 P4D 0 [ 16.448063] Oops: Oops: 0000 [#1] PREEMPT SMP NOPT I [ 16.448409] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Tainted: G OE 6.13.0-rc3-g89e8a75fda73-dirty #4 2 [ 16.449124] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODUL E [ 16.449502] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/201 4 [ 16.450201] Workqueue: smc_hs_wq smc_listen_wor k [ 16.450531] RIP: 0010:smc_listen_work+0xc02/0x159 0 [ 16.452158] RSP: 0018:ffffb5ab40053d98 EFLAGS: 0001024 6 [ 16.452526] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 000000000000030 0 [ 16.452994] RDX: 0000000000000280 RSI: 00003513840053f0 RDI: 000000000000000 0 [ 16.453492] RBP: ffffa097808e3800 R08: ffffa09782dba1e0 R09: 000000000000000 5 [ 16.453987] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0978274640 0 [ 16.454497] R13: 0000000000000000 R14: 0000000000000000 R15: ffffa09782d4092 0 [ 16.454996] FS: 0000000000000000(0000) GS:ffffa097bbc00000(0000) knlGS:000000000000000 0 [ 16.455557] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003 3 [ 16.455961] CR2: 0000000000000030 CR3: 0000000102788004 CR4: 0000000000770ef 0 [ 16.456459] PKRU: 5555555 4 [ 16.456654] Call Trace : [ 16.456832] <TASK > [ 16.456989] ? __die+0x23/0x7 0 [ 16.457215] ? page_fault_oops+0x180/0x4c 0 [ 16.457508] ? __lock_acquire+0x3e6/0x249 0 [ 16.457801] ? exc_page_fault+0x68/0x20 0 [ 16.458080] ? asm_exc_page_fault+0x26/0x3 0 [ 16.458389] ? smc_listen_work+0xc02/0x159 0 [ 16.458689] ? smc_listen_work+0xc02/0x159 0 [ 16.458987] ? lock_is_held_type+0x8f/0x10 0 [ 16.459284] process_one_work+0x1ea/0x6d 0 [ 16.459570] worker_thread+0x1c3/0x38 0 [ 16.459839] ? __pfx_worker_thread+0x10/0x1 0 [ 16.460144] kthread+0xe0/0x11 0 [ 16.460372] ? __pfx_kthread+0x10/0x1 0 [ 16.460640] ret_from_fork+0x31/0x5 0 [ 16.460896] ? __pfx_kthread+0x10/0x1 0 [ 16.461166] ret_from_fork_asm+0x1a/0x3 0 [ 16.461453] </TASK > [ 16.461616] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE) ] [ 16.462134] CR2: 000000000000003 0 [ 16.462380] ---[ end trace 0000000000000000 ]--- [ 16.462710] RIP: 0010:smc_listen_work+0xc02/0x1590 The direct cause of this issue is that after smc_listen_out_connected(), newclcsock->sk may be NULL since it will releases the smcsk. Therefore, if the application closes the socket immediately after accept, newclcsock->sk can be NULL. A possible execution order could be as follows: smc_listen_work \| userspace ----------------------------------------------------------------- lock_sock(sk) \| smc_listen_out_connected() \| \| \- smc_listen_out \| \| \| \- release_sock \| \| \|- sk->sk_data_ready() \| \| fd = accept(); \| close(fd); \| \- socket->sk = NULL; /* newclcsock->sk is NULL now */ SMC_STAT_SERV_SUCC_INC(sock_net(newclcsock->sk)) Since smc_listen_out_connected() will not fail, simply swapping the order of the code can easily fix this issue. Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Link: https://patch.msgid.link/20250818054618.41615-1-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net: stmmac: thead: Enable TX clock before MAC initialization	Yao Zi
	The clk_tx_i clock must be supplied to the MAC for successful initialization. On TH1520 SoC, the clock is provided by an internal divider configured through GMAC_PLLCLK_DIV register when using RGMII interface. However, currently we don't setup the divider before initialization of the MAC, resulting in DMA reset failures if the bootloader/firmware doesn't enable the divider, [ 7.839601] thead-dwmac ffe7060000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0 [ 7.938338] thead-dwmac ffe7060000.ethernet eth0: PHY [stmmac-0:02] driver [RTL8211F Gigabit Ethernet] (irq=POLL) [ 8.160746] thead-dwmac ffe7060000.ethernet eth0: Failed to reset the dma [ 8.170118] thead-dwmac ffe7060000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed [ 8.179384] thead-dwmac ffe7060000.ethernet eth0: __stmmac_open: Hw setup failed Let's simply write GMAC_PLLCLK_DIV_EN to GMAC_PLLCLK_DIV to enable the divider before MAC initialization. Note that for reconfiguring the divisor, the divider must be disabled first and re-enabled later to make sure the new divisor take effect. The exact clock rate doesn't affect MAC's initialization according to my test. It's set to the speed required by RGMII when the linkspeed is 1Gbps and could be reclocked later after link is up if necessary. Fixes: 33a1a01e3afa ("net: stmmac: Add glue layer for T-HEAD TH1520 SoC") Signed-off-by: Yao Zi <ziyao@disroot.org> Reviewed-by: Drew Fustini <fustini@kernel.org> Link: https://patch.msgid.link/20250815104803.55294-1-ziyao@disroot.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	gve: prevent ethtool ops after shutdown	Jordan Rhee
	A crash can occur if an ethtool operation is invoked after shutdown() is called. shutdown() is invoked during system shutdown to stop DMA operations without performing expensive deallocations. It is discouraged to unregister the netdev in this path, so the device may still be visible to userspace and kernel helpers. In gve, shutdown() tears down most internal data structures. If an ethtool operation is dispatched after shutdown(), it will dereference freed or NULL pointers, leading to a kernel panic. While graceful shutdown normally quiesces userspace before invoking the reboot syscall, forced shutdowns (as observed on GCP VMs) can still trigger this path. Fix by calling netif_device_detach() in shutdown(). This marks the device as detached so the ethtool ioctl handler will skip dispatching operations to the driver. Fixes: 974365e51861 ("gve: Implement suspend/resume/shutdown") Signed-off-by: Jordan Rhee <jordanrhee@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250818211245.1156919-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net: usb: asix_devices: Fix PHY address mask in MDIO bus initialization	Yuichiro Tsuji
	Syzbot reported shift-out-of-bounds exception on MDIO bus initialization. The PHY address should be masked to 5 bits (0-31). Without this mask, invalid PHY addresses could be used, potentially causing issues with MDIO bus operations. Fix this by masking the PHY address with 0x1f (31 decimal) to ensure it stays within the valid range. Fixes: 4faff70959d5 ("net: usb: asix_devices: add phy_mask for ax88772 mdio bus") Reported-by: syzbot+20537064367a0f98d597@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=20537064367a0f98d597 Tested-by: syzbot+20537064367a0f98d597@syzkaller.appspotmail.com Signed-off-by: Yuichiro Tsuji <yuichtsu@amazon.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250818084541.1958-1-yuichtsu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	phy: mscc: Fix timestamping for vsc8584	Horatiu Vultur
	There was a problem when we received frames and the frames were timestamped. The driver is configured to store the nanosecond part of the timestmap in the ptp reserved bits and it would take the second part by reading the LTC. The problem is that when reading the LTC we are in atomic context and to read the second part will go over mdio bus which might sleep, so we get an error. The fix consists in actually put all the frames in a queue and start the aux work and in that work to read the LTC and then calculate the full received time. Fixes: 7d272e63e0979d ("net: phy: mscc: timestamping and PHC support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250818081029.1300780-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	net/sched: sch_dualpi2: Run prob update timer in softirq to avoid deadlock	Victor Nogueira
	When a user creates a dualpi2 qdisc it automatically sets a timer. This timer will run constantly and update the qdisc's probability field. The issue is that the timer acquires the qdisc root lock and runs in hardirq. The qdisc root lock is also acquired in dev.c whenever a packet arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq, it may interrupt the packet processing running in softirq. If that happens and it runs on the same CPU, it will acquire the same lock and cause a deadlock. The following splat shows up when running a kernel compiled with lock debugging: [ +0.000224] WARNING: inconsistent lock state [ +0.000224] 6.16.0+ #10 Not tainted [ +0.000169] -------------------------------- [ +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes: [ +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140 [ +0.000000] {IN-HARDIRQ-W} state was registered at: [ +0.000000] lock_acquire.part.0+0xb6/0x220 [ +0.000000] _raw_spin_lock+0x31/0x80 [ +0.000000] dualpi2_timer+0x6f/0x270 [ +0.000000] __hrtimer_run_queues+0x1c5/0x360 [ +0.000000] hrtimer_interrupt+0x115/0x260 [ +0.000000] __sysvec_apic_timer_interrupt+0x6d/0x1a0 [ +0.000000] sysvec_apic_timer_interrupt+0x6e/0x80 [ +0.000000] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ +0.000000] pv_native_safe_halt+0xf/0x20 [ +0.000000] default_idle+0x9/0x10 [ +0.000000] default_idle_call+0x7e/0x1e0 [ +0.000000] do_idle+0x1e8/0x250 [ +0.000000] cpu_startup_entry+0x29/0x30 [ +0.000000] rest_init+0x151/0x160 [ +0.000000] start_kernel+0x6f3/0x700 [ +0.000000] x86_64_start_reservations+0x24/0x30 [ +0.000000] x86_64_start_kernel+0xc8/0xd0 [ +0.000000] common_startup_64+0x13e/0x148 [ +0.000000] irq event stamp: 6884 [ +0.000000] hardirqs last enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270 [ +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270 [ +0.000000] softirqs last enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270 [ +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140 [ +0.000000] other info that might help us debug this: [ +0.000000] Possible unsafe locking scenario: [ +0.000000] CPU0 [ +0.000000] ---- [ +0.000000] lock(&sch->root_lock_key); [ +0.000000] <Interrupt> [ +0.000000] lock(&sch->root_lock_key); [ +0.000000] * DEADLOCK * [ +0.000000] 4 locks held by ping/156: [ +0.000000] #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40 [ +0.000000] #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190 [ +0.000000] #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950 [ +0.000000] #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140 I am able to reproduce it consistently when running the following: tc qdisc add dev lo handle 1: root dualpi2 ping -f 127.0.0.1 To fix it, make the timer run in softirq. Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250815135317.664993-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	cdc_ncm: Flag Intel OEM version of Fibocom L850-GL as WWAN	Lubomir Rintel
	This lets NetworkManager/ModemManager know that this is a modem and needs to be connected first. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Link: https://patch.msgid.link/20250814154214.250103-1-lkundrak@v3.sk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	mm/mremap: fix WARN with uffd that has remap events disabled	David Hildenbrand
	Registering userfaultd on a VMA that spans at least one PMD and then mremap()'ing that VMA can trigger a WARN when recovering from a failed page table move due to a page table allocation error. The code ends up doing the right thing (recurse, avoiding moving actual page tables), but triggering that WARN is unpleasant: WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_normal_pmd mm/mremap.c:357 [inline] WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_pgt_entry mm/mremap.c:595 [inline] WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_page_tables+0x3832/0x44a0 mm/mremap.c:852 Modules linked in: CPU: 2 UID: 0 PID: 6133 Comm: syz.0.19 Not tainted 6.17.0-rc1-syzkaller-00004-g53e760d89498 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:move_normal_pmd mm/mremap.c:357 [inline] RIP: 0010:move_pgt_entry mm/mremap.c:595 [inline] RIP: 0010:move_page_tables+0x3832/0x44a0 mm/mremap.c:852 Code: ... RSP: 0018:ffffc900037a76d8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000032930007 RCX: ffffffff820c6645 RDX: ffff88802e56a440 RSI: ffffffff820c7201 RDI: 0000000000000007 RBP: ffff888037728fc0 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000032930007 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc900037a79a8 R14: 0000000000000001 R15: dffffc0000000000 FS: 000055556316a500(0000) GS:ffff8880d68bc000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b30863fff CR3: 0000000050171000 CR4: 0000000000352ef0 Call Trace: <TASK> copy_vma_and_data+0x468/0x790 mm/mremap.c:1215 move_vma+0x548/0x1780 mm/mremap.c:1282 mremap_to+0x1b7/0x450 mm/mremap.c:1406 do_mremap+0xfad/0x1f80 mm/mremap.c:1921 __do_sys_mremap+0x119/0x170 mm/mremap.c:1977 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f00d0b8ebe9 Code: ... RSP: 002b:00007ffe5ea5ee98 EFLAGS: 00000246 ORIG_RAX: 0000000000000019 RAX: ffffffffffffffda RBX: 00007f00d0db5fa0 RCX: 00007f00d0b8ebe9 RDX: 0000000000400000 RSI: 0000000000c00000 RDI: 0000200000000000 RBP: 00007ffe5ea5eef0 R08: 0000200000c00000 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000002 R13: 00007f00d0db5fa0 R14: 00007f00d0db5fa0 R15: 0000000000000005 </TASK> The underlying issue is that we recurse during the original page table move, but not during the recovery move. Fix it by checking for both VMAs and performing the check before the pmd_none() sanity check. Add a new helper where we perform+document that check for the PMD and PUD level. Thanks to Harry for bisecting. Link: https://lkml.kernel.org/r/20250818175358.1184757-1-david@redhat.com Fixes: 0cef0bb836e3 ("mm: clear uffd-wp PTE/PMD state on mremap()") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+4d9a13f0797c46a29e42@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/689bb893.050a0220.7f033.013a.GAE@google.com Tested-by: Harry Yoo <harry.yoo@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/damon/sysfs-schemes: put damos dests dir after removing its files	SeongJae Park
	damon_sysfs_scheme_rm_dirs() puts dests directory kobject before removing its internal files. Sincee putting the kobject frees its container struct, and the internal files removal accesses the container, use-after-free happens. Fix it by putting the reference _after_ removing the files. Link: https://lkml.kernel.org/r/20250816165559.2601-1-sj@kernel.org Fixes: 2cd0bf85a203 ("mm/damon/sysfs-schemes: implement DAMOS action destinations directory") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Alexandre Ghiti <alex@ghiti.fr> Closes: https://lore.kernel.org/2d39a734-320d-4341-8f8a-4019eec2dbf2@ghiti.fr Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/migrate: fix NULL movable_ops if CONFIG_ZSMALLOC=m	Huacai Chen
	After commit 84caf98838a3e5f4bdb34 ("mm: stop storing migration_ops in page->mapping") we get such an error message if CONFIG_ZSMALLOC=m: WARNING: CPU: 3 PID: 42 at mm/migrate.c:142 isolate_movable_ops_page+0xa8/0x1c0 CPU: 3 UID: 0 PID: 42 Comm: kcompactd0 Not tainted 6.16.0-rc5+ #2133 PREEMPT pc 9000000000540bd8 ra 9000000000540b84 tp 9000000100420000 sp 9000000100423a60 a0 9000000100193a80 a1 000000000000000c a2 000000000000001b a3 ffffffffffffffff a4 ffffffffffffffff a5 0000000000000267 a6 0000000000000000 a7 9000000100423ae0 t0 00000000000000f1 t1 00000000000000f6 t2 0000000000000000 t3 0000000000000001 t4 ffffff00010eb834 t5 0000000000000040 t6 900000010c89d380 t7 90000000023fcc70 t8 0000000000000018 u0 0000000000000000 s9 ffffff00010eb800 s0 ffffff00010eb800 s1 000000000000000c s2 0000000000043ae0 s3 0000800000000000 s4 900000000219cc40 s5 0000000000000000 s6 ffffff00010eb800 s7 0000000000000001 s8 90000000025b4000 ra: 9000000000540b84 isolate_movable_ops_page+0x54/0x1c0 ERA: 9000000000540bd8 isolate_movable_ops_page+0xa8/0x1c0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 00000004 (PPLV0 +PIE -PWE) EUEN: 00000000 (-FPE -SXE -ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) PRID: 0014c010 (Loongson-64bit, Loongson-3A5000) CPU: 3 UID: 0 PID: 42 Comm: kcompactd0 Not tainted 6.16.0-rc5+ #2133 PREEMPT Stack : 90000000021fd000 0000000000000000 9000000000247720 9000000100420000 90000001004236a0 90000001004236a8 0000000000000000 90000001004237e8 90000001004237e0 90000001004237e0 9000000100423550 0000000000000001 0000000000000001 90000001004236a8 725a84864a19e2d9 90000000023fcc58 9000000100420000 90000000024c6848 9000000002416848 0000000000000001 0000000000000000 000000000000000a 0000000007fe0000 ffffff00010eb800 0000000000000000 90000000021fd000 0000000000000000 900000000205cf30 000000000000008e 0000000000000009 ffffff00010eb800 0000000000000001 90000000025b4000 0000000000000000 900000000024773c 00007ffff103d748 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<900000000024773c>] show_stack+0x5c/0x190 [<90000000002415e0>] dump_stack_lvl+0x70/0x9c [<90000000004abe6c>] isolate_migratepages_block+0x3bc/0x16e0 [<90000000004af408>] compact_zone+0x558/0x1000 [<90000000004b0068>] compact_node+0xa8/0x1e0 [<90000000004b0aa4>] kcompactd+0x394/0x410 [<90000000002b3c98>] kthread+0x128/0x140 [<9000000001779148>] ret_from_kernel_thread+0x28/0xc0 [<9000000000245528>] ret_from_kernel_thread_asm+0x10/0x88 The reason is that defined(CONFIG_ZSMALLOC) evaluates to 1 only when CONFIG_ZSMALLOC=y, we should use IS_ENABLED(CONFIG_ZSMALLOC) instead. But when I use IS_ENABLED(CONFIG_ZSMALLOC), page_movable_ops() cannot access zsmalloc_mops because zsmalloc_mops is in a module. To solve this problem, we define a set_movable_ops() interface to register and unregister offline_movable_ops / zsmalloc_movable_ops in mm/migrate.c, and call them at mm/balloon_compaction.c & mm/zsmalloc.c. Since offline_movable_ops / zsmalloc_movable_ops are always accessible, all #ifdef / #endif are removed in page_movable_ops(). Link: https://lkml.kernel.org/r/20250817151759.2525174-1-chenhuacai@loongson.cn Fixes: 84caf98838a3 ("mm: stop storing migration_ops in page->mapping") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/damon/core: fix damos_commit_filter not changing allow	Sang-Heon Jeon
	Current damos_commit_filter() does not persist the `allow' value of the filter. As a result, changing the `allow' value of a filter and committing doesn't change the `allow' value. Add the missing `allow' value update, so committing the filter persistently changes the `allow' value well. Link: https://lkml.kernel.org/r/20250816015116.194589-1-ekffu200098@gmail.com Fixes: fe6d7fdd6249 ("mm/damon/core: add damos_filter->allow field") Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.14.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn	Jinjiang Tu
	When memory_failure() is called for a already hwpoisoned pfn, kill_accessing_process() will be called to kill current task. However, if the vma of the accessing vaddr is VM_PFNMAP, walk_page_range() will skip the vma in walk_page_test() and return 0. Before commit aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes with recovered clean pages"), kill_accessing_process() will return EFAULT. For x86, the current task will be killed in kill_me_maybe(). However, after this commit, kill_accessing_process() simplies return 0, that means UCE is handled properly, but it doesn't actually. In such case, the user task will trigger UCE infinitely. To fix it, add .test_walk callback for hwpoison_walk_ops to scan all vmas. Link: https://lkml.kernel.org/r/20250815073209.1984582-1-tujinjiang@huawei.com Fixes: aaf99ac2ceb7 ("mm/hwpoison: do not send SIGBUS to processes with recovered clean pages") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	MAINTAINERS: mark MGLRU as maintained	Axel Rasmussen
	The three folks being added here are actively working on MGLRU within Google, so we can review patches for this feature and plan to contribute some improvements / extensions to it on an ongoing basis. With three of us we may have some hope filling Yu Zhao's shoes, since he has moved on to other projects these days. Link: https://lkml.kernel.org/r/20250815215914.3671925-1-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm: rust: add page.rs to MEMORY MANAGEMENT - RUST	Alice Ryhl
	The page.rs file currently isn't included anywhere, and I think it's a good fit for the MEMORY MANAGEMENT - RUST entry. The file was originally added for use by Rust Binder, but I believe there is also work to use it in the upcoming scatterlist abstractions. Link: https://lkml.kernel.org/r/20250814075454.1596482-1-aliceryhl@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	iov_iter: iterate_folioq: fix handling of offset >= folio size	Dominique Martinet
	It's apparently possible to get an iov advanced all the way up to the end of the current page we're looking at, e.g. (gdb) p *iter $24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = { iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000, bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000, ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}} Where iov_offset is 4k with 4k-sized folios This should have been fine because we're only in the 2nd slot and there's another one after this, but iterate_folioq should not try to map a folio that skips the whole size, and more importantly part here does not end up zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE and not zero..), so skip forward to the "advance to next folio" code Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-0-a0ffad2b665a@codewreck.org Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-1-a0ffad2b665a@codewreck.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Reported-by: Maximilian Bosch <maximilian@mbosch.me> Reported-by: Ryan Lahfa <ryan@lahfa.xyz> Reported-by: Christian Theune <ct@flyingcircus.io> Reported-by: Arnout Engelen <arnout@bzzt.net> Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/ Acked-by: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> [6.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	selftests/damon: fix selftests by installing drgn related script	Sang-Heon Jeon
	drgn_dump_damon_status is not installed during kselftest setup. It can break other tests which depend on drgn_dump_damon_status. Install drgn_dump_damon_status files to fix broken test. Link: https://lkml.kernel.org/r/20250812140046.660486-1-ekffu200098@gmail.com Fixes: f3e8e1e51362 ("selftests/damon: add drgn script for extracting damon status") Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Honggyu Kim <honggyu.kim@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	.mailmap: add entry for Easwar Hariharan	Easwar Hariharan
	Map my old, obsolete work email address to my current one. Link: https://lkml.kernel.org/r/20250812180218.92755-1-easwar.hariharan@linux.microsoft.com Signed-off-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Shannon Nelson <sln@onemain.com> Cc: Dmitry Baryshkov <lumag@kernel.org> Cc: Hans Verkuil <hverkuil@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	selftests/mm: add test for invalid multi VMA operations	Lorenzo Stoakes
	We can use UFFD to easily assert invalid multi VMA moves, so do so, asserting expected behaviour when VMAs invalid for a multi VMA operation are encountered. We assert both that such operations are not permitted, and that we do not even attempt to move the first VMA under these circumstances. We also assert that we can still move a single VMA regardless. We then assert that a partial failure can occur if the invalid VMA appears later in the range of multiple VMAs, both at the very next VMA, and also at the end of the range. As part of this change, we are using the is_range_valid() helper more aggressively. Therefore, fix a bug where stale buffered data would hang around on success, causing subsequent calls to is_range_valid() to potentially give invalid results. We simply have to fflush() the stream on success to resolve this issue. Link: https://lkml.kernel.org/r/c4fb86dd5ba37610583ad5fc0e0c2306ddf318b9.1754218667.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/mremap: catch invalid multi VMA moves earlier	Lorenzo Stoakes
	Previously, any attempt to solely move a VMA would require that the span specified reside within the span of that single VMA, with no gaps before or afterwards. After commit d23cb648e365 ("mm/mremap: permit mremap() move of multiple VMAs"), the multi VMA move permitted a gap to exist only after VMAs. This was done to provide maximum flexibility. However, We have consequently permitted this behaviour for the move of a single VMA including those not eligible for multi VMA move. The change introduced here means that we no longer permit non-eligible VMAs from being moved in this way. This is consistent, as it means all eligible VMA moves are treated the same, and all non-eligible moves are treated as they were before. This change does not break previous behaviour, which equally would have disallowed such a move (only in all cases). [lorenzo.stoakes@oracle.com: do not incorrectly reference invalid VMA in VM_WARN_ON_ONCE()] Link: https://lkml.kernel.org/r/b6dbda20-667e-4053-abae-8ed4fa84bb6c@lucifer.local Link: https://lkml.kernel.org/r/2b5aad5681573be85b5b8fac61399af6fb6b68b6.1754218667.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/mremap: allow multi-VMA move when filesystem uses thp_get_unmapped_area	Lorenzo Stoakes
	The multi-VMA move functionality introduced in commit d23cb648e365 ("mm/mremap: permit mremap() move of multiple VMA") doesn't allow moves of file-backed mappings which specify a custom f_op->get_unmapped_area handler excepting hugetlb and shmem. We expand this to include thp_get_unmapped_area to support file-backed mappings for filesystems which use large folios. Additionally, when the first VMA in a range is not compatible with a multi-VMA move, instead of moving the first VMA and returning an error, this series results in us not moving anything and returning an error immediately. Examining this second change in detail: The semantics of multi-VMA moves in mremap() very clearly indicate that a failure can result in a partial move of VMAs. This is in line with other aggregate operations within the kernel, which share these semantics. There are two classes of failures we're concerned with - eligiblity for mutli-VMA move, and transient failures that would occur even if the user individually moved each VMA. The latter is due to out-of-memory conditions (which, given the allocations involved are small, would likely be fatal in any case), or hitting the mapping limit. Regardless of the cause, transient issues would be fatal anyway, so it isn't really material which VMAs succeeded at being moved or not. However with when it comes to multi-VMA move eligiblity, we face another issue - we must allow a single VMA to succeed regardless of this eligiblity (as, of course, it is not a multi-VMA move) - but we must then fail multi-VMA operations. The two means by which VMAs may fail the eligbility test are - the VMAs being UFFD-armed, or the VMA being file-backed and providing its own f_op->get_unmapped_area() helper (because this may result in MREMAP_FIXED being disregarded), excepting those known to correctly handle MREMAP_FIXED. It is therefore conceivable that a user could erroneously try to use this functionality in these instances, and would prefer to not perform any move at all should that occur. This series therefore avoids any move of subsequent VMAs should the first be multi-VMA move ineligble and the input span exceeds that of the first VMA. We also add detailed test logic to assert that multi VMA move with ineligible VMAs functions as expected. This patch (of 3): We currently restrict multi-VMA move to avoid filesystems or drivers which provide a custom f_op->get_unmapped_area handler unless it is known to correctly handle MREMAP_FIXED. We do this so we do not get unexpected result when moving from one area to another (for instance, if the handler would align things resulting in the moved VMAs having different gaps than the original mapping). More and more filesystems are moving to using large folios, and typically do so (in part) by setting f_op->get_unmapped_area to thp_get_unmapped_area. When mremap() invokes the file system's get_unmapped MREMAP_FIXED, it does so via get_unmapped_area(), called in vrm_set_new_addr(). In order to do so, it converts the MREMAP_FIXED flag to a MAP_FIXED flag and passes this to the unmapped area handler. The __get_unmapped_area() function (called by get_unmapped_area()) in turn invokes the filesystem or driver's f_op->get_unmapped_area() handler. Therefore this is a point at which thp_get_unmapped_area() may be called (also, this is the case for anonymous mappings where the size is huge page aligned). thp_get_unmapped_area() calls thp_get_unmapped_area_vmflags() and __thp_get_unmapped_area() in turn (falling back to mm_get_unmapped_area_vm_flags() which is known to handle MAP_FIXED correctly). The __thp_get_unmapped_area() function in turn does nothing to change the address hint, nor the MAP_FIXED flag, only adjusting alignment parameters. It hten calls mm_get_unmapped_area_vmflags(), and in turn arch-specific unmapped area functions, all of which honour MAP_FIXED correctly. Therefore, we can safely add thp_get_unmapped_area to the known-good handlers. Link: https://lkml.kernel.org/r/cover.1754218667.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/4f2542340c29c84d3d470b0c605e916b192f6c81.1754218667.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/damon/core: fix commit_ops_filters by using correct nth function	Sang-Heon Jeon
	damos_commit_ops_filters() incorrectly uses damos_nth_filter() which iterates core_filters. As a result, performing a commit unintentionally corrupts ops_filters. Add damos_nth_ops_filter() which iterates ops_filters. Use this function to fix issues caused by wrong iteration. Link: https://lkml.kernel.org/r/20250810124201.15743-1-ekffu200098@gmail.com Fixes: 3607cc590f18 ("mm/damon/core: support committing ops_filters") # 6.15.x Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	tools/testing: add linux/args.h header and fix radix, VMA tests	Lorenzo Stoakes
	Commit 857d18f23ab1 ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks") accidentally broke the radix tree, VMA userland tests by including linux/args.h which is not present in the tools/include directory. This patch copies this over and adds an #ifdef block to avoid duplicate __CONCAT declaration in conflict with system headers when we ultimately include this. Link: https://lkml.kernel.org/r/20250811052654.33286-1-lorenzo.stoakes@oracle.com Fixes: 857d18f23ab1 ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	mm/debug_vm_pgtable: clear page table entries at destroy_args()	Herton R. Krzesinski
	The mm/debug_vm_pagetable test allocates manually page table entries for the tests it runs, using also its manually allocated mm_struct. That in itself is ok, but when it exits, at destroy_args() it fails to clear those entries with the *_clear functions. The problem is that leaves stale entries. If another process allocates an mm_struct with a pgd at the same address, it may end up running into the stale entry. This is happening in practice on a debug kernel with CONFIG_DEBUG_VM_PGTABLE=y, for example this is the output with some extra debugging I added (it prints a warning trace if pgtables_bytes goes negative, in addition to the warning at check_mm() function): [ 2.539353] debug_vm_pgtable: [get_random_vaddr ]: random_vaddr is 0x7ea247140000 [ 2.539366] kmem_cache info [ 2.539374] kmem_cachep 0x000000002ce82385 - freelist 0x0000000000000000 - offset 0x508 [ 2.539447] debug_vm_pgtable: [init_args ]: args->mm is 0x000000002267cc9e (...) [ 2.552800] WARNING: CPU: 5 PID: 116 at include/linux/mm.h:2841 free_pud_range+0x8bc/0x8d0 [ 2.552816] Modules linked in: [ 2.552843] CPU: 5 UID: 0 PID: 116 Comm: modprobe Not tainted 6.12.0-105.debug_vm2.el10.ppc64le+debug #1 VOLUNTARY [ 2.552859] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW910.00 (VL910_062) hv:phyp pSeries [ 2.552872] NIP: c0000000007eef3c LR: c0000000007eef30 CTR: c0000000003d8c90 [ 2.552885] REGS: c0000000622e73b0 TRAP: 0700 Not tainted (6.12.0-105.debug_vm2.el10.ppc64le+debug) [ 2.552899] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 0000000a [ 2.552954] CFAR: c0000000008f03f0 IRQMASK: 0 [ 2.552954] GPR00: c0000000007eef30 c0000000622e7650 c000000002b1ac00 0000000000000001 [ 2.552954] GPR04: 0000000000000008 0000000000000000 c0000000007eef30 ffffffffffffffff [ 2.552954] GPR08: 00000000ffff00f5 0000000000000001 0000000000000048 0000000000004000 [ 2.552954] GPR12: 00000003fa440000 c000000017ffa300 c0000000051d9f80 ffffffffffffffdb [ 2.552954] GPR16: 0000000000000000 0000000000000008 000000000000000a 60000000000000e0 [ 2.552954] GPR20: 4080000000000000 c0000000113af038 00007fffcf130000 0000700000000000 [ 2.552954] GPR24: c000000062a6a000 0000000000000001 8000000062a68000 0000000000000001 [ 2.552954] GPR28: 000000000000000a c000000062ebc600 0000000000002000 c000000062ebc760 [ 2.553170] NIP [c0000000007eef3c] free_pud_range+0x8bc/0x8d0 [ 2.553185] LR [c0000000007eef30] free_pud_range+0x8b0/0x8d0 [ 2.553199] Call Trace: [ 2.553207] [c0000000622e7650] [c0000000007eef30] free_pud_range+0x8b0/0x8d0 (unreliable) [ 2.553229] [c0000000622e7750] [c0000000007f40b4] free_pgd_range+0x284/0x3b0 [ 2.553248] [c0000000622e7800] [c0000000007f4630] free_pgtables+0x450/0x570 [ 2.553274] [c0000000622e78e0] [c0000000008161c0] exit_mmap+0x250/0x650 [ 2.553292] [c0000000622e7a30] [c0000000001b95b8] __mmput+0x98/0x290 [ 2.558344] [c0000000622e7a80] [c0000000001d1018] exit_mm+0x118/0x1b0 [ 2.558361] [c0000000622e7ac0] [c0000000001d141c] do_exit+0x2ec/0x870 [ 2.558376] [c0000000622e7b60] [c0000000001d1ca8] do_group_exit+0x88/0x150 [ 2.558391] [c0000000622e7bb0] [c0000000001d1db8] sys_exit_group+0x48/0x50 [ 2.558407] [c0000000622e7be0] [c00000000003d810] system_call_exception+0x1e0/0x4c0 [ 2.558423] [c0000000622e7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec (...) [ 2.558892] ---[ end trace 0000000000000000 ]--- [ 2.559022] BUG: Bad rss-counter state mm:000000002267cc9e type:MM_ANONPAGES val:1 [ 2.559037] BUG: non-zero pgtables_bytes on freeing mm: -6144 Here the modprobe process ended up with an allocated mm_struct from the mm_struct slab that was used before by the debug_vm_pgtable test. That is not a problem, since the mm_struct is initialized again etc., however, if it ends up using the same pgd table, it bumps into the old stale entry when clearing/freeing the page table entries, so it tries to free an entry already gone (that one which was allocated by the debug_vm_pgtable test), which also explains the negative pgtables_bytes since it's accounting for not allocated entries in the current process. As far as I looked pgd_{alloc,free} etc. does not clear entries, and clearing of the entries is explicitly done in the free_pgtables-> free_pgd_range->free_p4d_range->free_pud_range->free_pmd_range-> free_pte_range path. However, the debug_vm_pgtable test does not call free_pgtables, since it allocates mm_struct and entries manually for its test and eg. not goes through page faults. So it also should clear manually the entries before exit at destroy_args(). This problem was noticed on a reboot X number of times test being done on a powerpc host, with a debug kernel with CONFIG_DEBUG_VM_PGTABLE enabled. Depends on the system, but on a 100 times reboot loop the problem could manifest once or twice, if a process ends up getting the right mm->pgd entry with the stale entries used by mm/debug_vm_pagetable. After using this patch, I couldn't reproduce/experience the problems anymore. I was able to reproduce the problem as well on latest upstream kernel (6.16). I also modified destroy_args() to use mmput() instead of mmdrop(), there is no reason to hold mm_users reference and not release the mm_struct entirely, and in the output above with my debugging prints I already had patched it to use mmput, it did not fix the problem, but helped in the debugging as well. Link: https://lkml.kernel.org/r/20250731214051.4115182-1-herton@redhat.com Fixes: 3c9b84f044a9 ("mm/debug_vm_pgtable: introduce struct pgtable_debug_args") Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	squashfs: fix memory leak in squashfs_fill_super	Phillip Lougher
	If sb_min_blocksize returns 0, squashfs_fill_super exits without freeing allocated memory (sb->s_fs_info). Fix this by moving the call to sb_min_blocksize to before memory is allocated. Link: https://lkml.kernel.org/r/20250811223740.110392-1-phillip@squashfs.org.uk Fixes: 734aa85390ea ("Squashfs: check return result of sb_min_blocksize") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: Scott GUO <scottzhguo@tencent.com> Closes: https://lore.kernel.org/all/20250811061921.3807353-1-scott_gzh@163.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	kho: warn if KHO is disabled due to an error	Pasha Tatashin
	During boot scratch area is allocated based on command line parameters or auto calculated. However, scratch area may fail to allocate, and in that case KHO is disabled. Currently, no warning is printed that KHO is disabled, which makes it confusing for the end user to figure out why KHO is not available. Add the missing warning message. Link: https://lkml.kernel.org/r/20250808201804.772010-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	kho: mm: don't allow deferred struct page with KHO	Pasha Tatashin
	KHO uses struct pages for the preserved memory early in boot, however, with deferred struct page initialization, only a small portion of memory has properly initialized struct pages. This problem was detected where vmemmap is poisoned, and illegal flag combinations are detected. Don't allow them to be enabled together, and later we will have to teach KHO to work properly with deferred struct page init kernel feature. Link: https://lkml.kernel.org/r/20250808201804.772010-3-pasha.tatashin@soleen.com Fixes: 4e1d010e3bda ("kexec: add config option for KHO") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	kho: init new_physxa->phys_bits to fix lockdep	Pasha Tatashin
	Patch series "Several KHO Hotfixes". Three unrelated fixes for Kexec Handover. This patch (of 3): Lockdep shows the following warning: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. [<ffffffff810133a6>] dump_stack_lvl+0x66/0xa0 [<ffffffff8136012c>] assign_lock_key+0x10c/0x120 [<ffffffff81358bb4>] register_lock_class+0xf4/0x2f0 [<ffffffff813597ff>] __lock_acquire+0x7f/0x2c40 [<ffffffff81360cb0>] ? __pfx_hlock_conflict+0x10/0x10 [<ffffffff811707be>] ? native_flush_tlb_global+0x8e/0xa0 [<ffffffff8117096e>] ? __flush_tlb_all+0x4e/0xa0 [<ffffffff81172fc2>] ? __kernel_map_pages+0x112/0x140 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff81359556>] lock_acquire+0xe6/0x280 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff8100b9e0>] _raw_spin_lock+0x30/0x40 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff813ec327>] xa_load_or_alloc+0x67/0xe0 [<ffffffff813eb4c0>] kho_preserve_folio+0x90/0x100 [<ffffffff813ebb7f>] __kho_finalize+0xcf/0x400 [<ffffffff813ebef4>] kho_finalize+0x34/0x70 This is becase xa has its own lock, that is not initialized in xa_load_or_alloc. Modifiy __kho_preserve_order(), to properly call xa_init(&new_physxa->phys_bits); Link: https://lkml.kernel.org/r/20250808201804.772010-2-pasha.tatashin@soleen.com Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19	NFS: Fix a race when updating an existing write	Trond Myklebust
	After nfs_lock_and_join_requests() tests for whether the request is still attached to the mapping, nothing prevents a call to nfs_inode_remove_request() from succeeding until we actually lock the page group. The reason is that whoever called nfs_inode_remove_request() doesn't necessarily have a lock on the page group head. So in order to avoid races, let's take the page group lock earlier in nfs_lock_and_join_requests(), and hold it across the removal of the request in nfs_inode_remove_request(). Reported-by: Jeff Layton <jlayton@kernel.org> Tested-by: Joe Quanaim <jdq@meta.com> Tested-by: Andrew Steffen <aksteffen@meta.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: bd37d6fce184 ("NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request()") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-19	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull mount fixes from Al Viro: "Fixes for several recent mount-related regressions" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: change_mnt_propagation(): calculate propagation source only if we'll need it use uniform permission checks for all mount propagation changes propagate_umount(): only surviving overmounts should be reparented fix the softlockups in attach_recursive_mnt()
2025-08-19	Merge tag 'ovl-fixes-6.17-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs fixes from Amir Goldstein: "Fixes for two fallouts from Neil's directory locking changes" * tag 'ovl-fixes-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: fix possible double unlink ovl: use I_MUTEX_PARENT when locking parent in ovl_create_temp()
2025-08-19	Merge tag 'vfs-6.17-rc3.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix two memory leaks in pidfs - Prevent changing the idmapping of an already idmapped mount without OPEN_TREE_CLONE through open_tree_attr() - Don't fail listing extended attributes in kernfs when no extended attributes are set - Fix the return value in coredump_parse() - Fix the error handling for unbuffered writes in netfs - Fix broken data integrity guarantees for O_SYNC writes via iomap - Fix UAF in __mark_inode_dirty() - Keep inode->i_blkbits constant in fuse - Fix coredump selftests - Fix get_unused_fd_flags() usage in do_handle_open() - Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES - Fix use-after-free in bh_read() - Fix incorrect lflags value in the move_mount() syscall * tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signal: Fix memory leak for PIDFD_SELF* sentinels kernfs: don't fail listing extended attributes coredump: Fix return value in coredump_parse() fs/buffer: fix use-after-free when call bh_read() helper pidfs: Fix memory leak in pidfd_info() netfs: Fix unbuffered write error handling fhandle: do_handle_open() should get FD with user flags module: Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES fs: fix incorrect lflags value in the move_mount syscall selftests/coredump: Remove the read() that fails the test fuse: keep inode->i_blkbits constant iomap: Fix broken data integrity guarantees for O_SYNC writes selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE fs: writeback: fix use-after-free in __mark_inode_dirty()