linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-05-26	Merge branch ↵	Paolo Abeni
	'add-the-capability-to-consume-sram-for-hwfd-descriptor-queue-in-airoha_eth-driver' Lorenzo Bianconi says: ==================== Add the capability to consume SRAM for hwfd descriptor queue in airoha_eth driver In order to improve packet processing and packet forwarding performances, EN7581 SoC supports consuming SRAM instead of DRAM for hw forwarding descriptors queue. For downlink hw accelerated traffic request to consume SRAM memory for hw forwarding descriptors queue. Moreover, in some configurations QDMA blocks require a contiguous block of system memory for hwfd buffers queue. Introduce the capability to allocate hw buffers forwarding queue via the reserved-memory DTS property instead of running dmam_alloc_coherent(). v2: https://lore.kernel.org/r/20250509-airopha-desc-sram-v2-0-9dc3d8076dfb@kernel.org v1: https://lore.kernel.org/r/20250507-airopha-desc-sram-v1-0-d42037431bfa@kernel.org ==================== Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-0-a6e9b085b4f0@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: airoha: Add the capability to allocate hfwd descriptors in SRAM	Lorenzo Bianconi
	In order to improve packet processing and packet forwarding performances, EN7581 SoC supports consuming SRAM instead of DRAM for hw forwarding descriptors queue. For downlink hw accelerated traffic request to consume SRAM memory for hw forwarding descriptors queue. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-4-a6e9b085b4f0@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: airoha: Add the capability to allocate hwfd buffers via reserved-memory	Lorenzo Bianconi
	In some configurations QDMA blocks require a contiguous block of system memory for hwfd buffers queue. Introduce the capability to allocate hw buffers forwarding queue via the reserved-memory DTS property instead of running dmam_alloc_coherent(). Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-3-a6e9b085b4f0@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: airoha: Do not store hfwd references in airoha_qdma struct	Lorenzo Bianconi
	Since hfwd descriptor and buffer queues are allocated via dmam_alloc_coherent() we do not need to store their references in airoha_qdma struct. This patch does not introduce any logical changes, just code clean-up. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-2-a6e9b085b4f0@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	dt-bindings: net: airoha: Add EN7581 memory-region property	Lorenzo Bianconi
	Introduce memory-region and memory-region-names properties for the ethernet node available on EN7581 SoC in order to reserve system memory for hw forwarding buffers queue used by the QDMA modules. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250521-airopha-desc-sram-v3-1-a6e9b085b4f0@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	Merge branch 'add-functions-for-txgbe-aml-devices'	Paolo Abeni
	Jiawen Wu says: ==================== Support phylink and link/gpio irqs for AML 25G/10G devices, and complete PTP and SRIOV. ==================== Link: https://patch.msgid.link/ Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Implement SRIOV for AML devices	Jiawen Wu
	Since .mac_link_up and .mac_link_down are changed for AML 25G/10G NICs, the SR-IOV related function should be invoked in these new functions, to bring VFs link up. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/BA8B302B7AAB6EA6+20250521064402.22348-10-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Implement PTP for AML devices	Jiawen Wu
	Support PTP clock and 1PPS output signal for AML devices. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/F2F6E5E8899D2C20+20250521064402.22348-9-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Restrict the use of mismatched FW versions	Jiawen Wu
	The new added mailbox commands require a new released firmware version. Otherwise, a lot of logs "Unknown FW command" would be printed. And the devices may not work properly. So add the test command in the probe function. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/18283F17BE0FA335+20250521064402.22348-8-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Correct the currect link settings	Jiawen Wu
	For AML 25G/10G devices, some of the information returned from phylink_ethtool_ksettings_get() is not correct, since there is a fixed-link mode. So add additional corrections. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/C94BF867617C544D+20250521064402.22348-7-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Support to handle GPIO IRQs for AML devices	Jiawen Wu
	The driver needs to handle GPIO interrupts to identify SFP module and configure PHY by sending mailbox messages to firmware. Since the SFP module needs to wait for ready to get information when it is inserted, workqueue is added to handle delayed tasks. And each SW-FW interaction takes time to wait, so they are processed in the workqueue instead of IRQ handler function. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/399624AF221E8E28+20250521064402.22348-6-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Implement PHYLINK for AML 25G/10G devices	Jiawen Wu
	There is a new PHY attached to AML 25G/10G NIC, which is different from SP 10G/1G NIC. But the PHY configuration is handed over to firmware, and also I2C is controlled by firmware. So the different PHYLINK fixed-link mode is added for these devices. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/987B973A5929CD48+20250521064402.22348-5-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Distinguish between 40G and 25G devices	Jiawen Wu
	For the following patches to support PHYLINK for AML 25G devices, separate MAC type wx_mac_aml40 to maintain the driver of 40G devices. Because 40G devices will complete support later, not now. And this patch makes the 25G devices use some PHYLINK interfaces, but it is not yet create PHYLINK and cannot be used on its own. It is just preparation for the next patches. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/592B1A6920867D0C+20250521064402.22348-4-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: wangxun: Use specific flag bit to simplify the code	Jiawen Wu
	Most of the different code that requires MAC type in the common library is due to NGBE only supports a few queues and pools, unlike TXGBE, which supports 128 queues and 64 pools. This difference accounts for most of the hardware configuration differences in the driver code. So add a flag bit "WX_FLAG_MULTI_64_FUNC" for them to clean-up the driver code. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/C731132E124D75E5+20250521064402.22348-3-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	net: txgbe: Remove specified SP type	Jiawen Wu
	Since AML devices are going to reuse some definitions, remove the "SP" qualifier from these definitions. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8EF712EC14B8FF70+20250521064402.22348-2-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	Merge tag 'vfs-6.16-rc1.writepage' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull final writepage conversion from Christian Brauner: "This converts vboxfs from ->writepage() to ->writepages(). This was the last user of the ->writepage() method. So remove ->writepage() completely and all references to it" * tag 'vfs-6.16-rc1.writepage' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Remove aops->writepage mm: Remove swap_writepage() and shmem_writepage() ttm: Call shmem_writeout() from ttm_backup_backup_page() i915: Use writeback_iter() shmem: Add shmem_writeout() writeback: Remove writeback_use_writepage() migrate: Remove call to ->writepage vboxsf: Convert to writepages 9p: Add a migrate_folio method
2025-05-26	net: dsa: microchip: Add SGMII port support to KSZ9477 switch	Tristram Ha
	The KSZ9477 switch driver uses the XPCS driver to operate its SGMII port. However there are some hardware bugs in the KSZ9477 SGMII module so workarounds are needed. There was a proposal to update the XPCS driver to accommodate KSZ9477, but the new code is not generic enough to be used by other vendors. It is better to do all these workarounds inside the KSZ9477 driver instead of modifying the XPCS driver. There are 3 hardware issues. The first is the MII_ADVERTISE register needs to be write once after reset for the correct code word to be sent. The XPCS driver disables auto-negotiation first before configuring the SGMII/1000BASE-X mode and then enables it back. The KSZ9477 driver then writes the MII_ADVERTISE register before enabling auto-negotiation. In 1000BASE-X mode the MII_ADVERTISE register will be set, so KSZ9477 driver does not need to write it. The second issue is the MII_BMCR register needs to set the exact speed and duplex mode when running in SGMII mode. During link polling the KSZ9477 will check the speed and duplex mode are different from previous ones and update the MII_BMCR register accordingly. The last issue is 1000BASE-X mode does not work with auto-negotiation on. The cause is the local port hardware does not know the link is up and so network traffic is not forwarded. The workaround is to write 2 additional bits when 1000BASE-X mode is configured. Note the SGMII interrupt in the port cannot be masked. As that interrupt is not handled in the KSZ9477 driver the SGMII interrupt bit will not be set even when the XPCS driver sets it. Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20250520230720.23425-1-Tristram.Ha@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	Merge tag 'vfs-6.16-rc1.async.dir' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rater a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles do* have a vfsmount but don't use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefile to use the lookup_one family of functions and to explictily pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup function to take a qstr instead of separate name and len. In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions
2025-05-26	net: usb: aqc111: fix error handling of usbnet read calls	Nikita Zhandarovich
	Syzkaller, courtesy of syzbot, identified an error (see report [1]) in aqc111 driver, caused by incomplete sanitation of usb read calls' results. This problem is quite similar to the one fixed in commit 920a9fa27e78 ("net: asix: add proper error handling of usb read errors"). For instance, usbnet_read_cmd() may read fewer than 'size' bytes, even if the caller expected the full amount, and aqc111_read_cmd() will not check its result properly. As [1] shows, this may lead to MAC address in aqc111_bind() being only partly initialized, triggering KMSAN warnings. Fix the issue by verifying that the number of bytes read is as expected and not less. [1] Partial syzbot report: BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:208 [inline] BUG: KMSAN: uninit-value in usbnet_probe+0x2e57/0x4390 drivers/net/usb/usbnet.c:1830 is_valid_ether_addr include/linux/etherdevice.h:208 [inline] usbnet_probe+0x2e57/0x4390 drivers/net/usb/usbnet.c:1830 usb_probe_interface+0xd01/0x1310 drivers/usb/core/driver.c:396 call_driver_probe drivers/base/dd.c:-1 [inline] really_probe+0x4d1/0xd90 drivers/base/dd.c:658 __driver_probe_device+0x268/0x380 drivers/base/dd.c:800 ... Uninit was stored to memory at: dev_addr_mod+0xb0/0x550 net/core/dev_addr_lists.c:582 __dev_addr_set include/linux/netdevice.h:4874 [inline] eth_hw_addr_set include/linux/etherdevice.h:325 [inline] aqc111_bind+0x35f/0x1150 drivers/net/usb/aqc111.c:717 usbnet_probe+0xbe6/0x4390 drivers/net/usb/usbnet.c:1772 usb_probe_interface+0xd01/0x1310 drivers/usb/core/driver.c:396 ... Uninit was stored to memory at: ether_addr_copy include/linux/etherdevice.h:305 [inline] aqc111_read_perm_mac drivers/net/usb/aqc111.c:663 [inline] aqc111_bind+0x794/0x1150 drivers/net/usb/aqc111.c:713 usbnet_probe+0xbe6/0x4390 drivers/net/usb/usbnet.c:1772 usb_probe_interface+0xd01/0x1310 drivers/usb/core/driver.c:396 call_driver_probe drivers/base/dd.c:-1 [inline] ... Local variable buf.i created at: aqc111_read_perm_mac drivers/net/usb/aqc111.c:656 [inline] aqc111_bind+0x221/0x1150 drivers/net/usb/aqc111.c:713 usbnet_probe+0xbe6/0x4390 drivers/net/usb/usbnet.c:1772 Reported-by: syzbot+3b6b9ff7b80430020c7b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=3b6b9ff7b80430020c7b Tested-by: syzbot+3b6b9ff7b80430020c7b@syzkaller.appspotmail.com Fixes: df2d59a2ab6c ("net: usb: aqc111: Add support for getting and setting of MAC address") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://patch.msgid.link/20250520113240.2369438-1-n.zhandarovich@fintech.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26	thermal/drivers/acerhdf: Constify struct thermal_zone_device_ops	Christophe JAILLET
	'struct thermal_zone_device_ops' could be left unmodified in this driver. Constifying this structure moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. While at it, also constify a struct thermal_zone_params. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 26422 12584 512 39518 9a5e drivers/platform/x86/acerhdf.o After: ===== text data bss dec hex filename 26646 12360 512 39518 9a5e drivers/platform/x86/acerhdf.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/e502fadf2c6b24fc4ec3a7880533f7ca68429720.1748177235.git.christophe.jaillet@wanadoo.fr Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-26	exfat: do not clear volume dirty flag during sync	Yuezhang Mo
	xfstests generic/482 tests the file system consistency after each FUA operation. It fails when run on exfat. exFAT clears the volume dirty flag with a FUA operation during sync. Since s_lock is not held when data is being written to a file, sync can be executed at the same time. When data is being written to a file, the FAT chain is updated first, and then the file size is updated. If sync is executed between updating them, the length of the FAT chain may be inconsistent with the file size. To avoid the situation where the file system is inconsistent but the volume dirty flag is cleared, this commit moves the clearing of the volume dirty flag from exfat_fs_sync() to exfat_put_super(), so that the volume dirty flag is not cleared until unmounting. After the move, there is no additional action during sync, so exfat_fs_sync() can be deleted. Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-05-26	exfat: fix double free in delayed_free	Namjae Jeon
	The double free could happen in the following path. exfat_create_upcase_table() exfat_create_upcase_table() : return error exfat_free_upcase_table() : free ->vol_utbl exfat_load_default_upcase_table : return error exfat_kill_sb() delayed_free() exfat_free_upcase_table() <--------- double free This patch set ->vol_util as NULL after freeing it. Reported-by: Jianzhou Zhao <xnxc22xnxc22@qq.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-05-26	rust: opp: Make the doctest example depend on CONFIG_OF	Viresh Kumar
	The doctest example uses a function only available for CONFIG_OF and so the build with doc tests fails when it isn't enabled. error[E0599]: no function or associated item named `from_of_cpumask` found for struct `rust_doctest_kernel_alloc_kbox_rs_4::kernel::opp::Table` in the current scope Fix this by making the doctest depend on CONFIG_OF. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505260856.ZQWHW2xT-lkp@intel.com/ Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/a80bfedcb4d94531dc27d3b48062db5042078e88.1748237646.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-26	Merge tag 'opp-updates-6.16' of ↵	Rafael J. Wysocki
	git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Merge operating performance points (OPP) updates for 6.16 from Viresh Kumar: "- OPP: Add dev_pm_opp_set_level() (Praveen Talari). - Introduce scope-based cleanup headers and mutex locking guards in OPP core (Viresh Kumar). - switch to use kmemdup_array() (Zhang Enpei)." * tag 'opp-updates-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: OPP: switch to use kmemdup_array() OPP: Add dev_pm_opp_set_level() OPP: Use mutex locking guards OPP: Define and use scope-based cleanup helpers OPP: Use scope-based OF cleanup helpers OPP: Return opp_table from dev_pm_opp_get_opp_table_ref() OPP: Return opp from dev_pm_opp_get() OPP: Remove _get_opp_table_kref()
2025-05-26	arm64: dts: renesas: rzg3e-smarc-som: Reduce I2C2 clock frequency	John Madieu
	Lower the I2C2 bus clock frequency on the RZ/G3E SMARC SoM from 1MHz to 400kHz to improve compatibility with a wider range of I2C peripherals. As the GreenPAK device is programmed to operate at 400kHz, the previous 1MHz setting was too aggressive, causing it to experience timing issues. Fixes: f7a98e256ee3 ("arm64: dts: renesas: rzg3e-smarc-som: Add I2C2 device pincontrol") Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/20250518220812.1480696-1-john.madieu.xa@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-05-26	net: neigh: use kfree_skb_reason() in neigh_resolve_output() and ↵	Qiu Yutan
	neigh_connected_output() Replace kfree_skb() used in neigh_resolve_output() and neigh_connected_output() with kfree_skb_reason(). Following new skb drop reason is added: /* failed to fill the device hard header */ SKB_DROP_REASON_NEIGH_HH_FILLFAIL Signed-off-by: Qiu Yutan <qiu.yutan@zte.com.cn> Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Xu Xin <xu.xin16@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-26	selftests: ncdevmem: add tx test with multiple IOVs	Stanislav Fomichev
	Use prime 3 for length to make offset slowly drift away. Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Acked-by: Mina Almasry <almasrymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-26	selftests: ncdevmem: make chunking optional	Stanislav Fomichev
	Add new -z argument to specify max IOV size. By default, use single large IOV. Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-26	net: devmem: support single IOV with sendmsg	Stanislav Fomichev
	sendmsg() with a single iov becomes ITER_UBUF, sendmsg() with multiple iovs becomes ITER_IOVEC. iter_iov_len does not return correct value for UBUF, so teach to treat UBUF differently. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Mina Almasry <almasrymina@google.com> Fixes: bd61848900bf ("net: devmem: Implement TX path") Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Acked-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-26	x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining	Eric Biggers
	irq_fpu_usable() incorrectly returned true before the FPU is initialized. The x86 CPU onlining code can call sha256() to checksum AMD microcode images, before the FPU is initialized. Since sha256() recently gained a kernel-mode FPU optimized code path, a crash occurred in kernel_fpu_begin_mask() during hotplug CPU onlining. (The crash did not occur during boot-time CPU onlining, since the optimized sha256() code is not enabled until subsys_initcalls run.) Fix this by making irq_fpu_usable() return false before fpu__init_cpu() has run. To do this without adding any additional overhead to irq_fpu_usable(), replace the existing per-CPU bool in_kernel_fpu with kernel_fpu_allowed which tracks both initialization and usage rather than just usage. The initial state is false; FPU initialization sets it to true; kernel-mode FPU sections toggle it to false and then back to true; and CPU offlining restores it to the initial state of false. Fixes: 11d7956d526f ("crypto: x86/sha256 - implement library instead of shash") Reported-by: Ayush Jain <Ayush.Jain3@amd.com> Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Ayush Jain <Ayush.Jain3@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-25	Linux 6.15v6.15	Linus Torvalds

2025-05-25	Disable FOP_DONTCACHE for now due to bugs	Linus Torvalds
	This is kind of last-minute, but Al Viro reported that the new FOP_DONTCACHE flag causes memory corruption due to use-after-free issues. This was triggered by commit 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE"), but that is not the underlying bug - it is just the first user of the flag. Vlastimil Babka suspects the underlying problem stems from the folio_end_writeback() logic introduced in commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached pages when writeback completes"). The most straightforward fix would be to just revert the commit that exposed this, but Matthew Wilcox points out that other filesystems are also starting to enable the FOP_DONTCACHE logic, so this instead disables that bit globally for now. The fix will hopefully end up being trivial and we can just re-enable this logic after more testing, but until such a time we'll have to disable the new FOP_DONTCACHE flag. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20250525083209.GS2023217@ZenIV/ Triggered-by: 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE") Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-05-25	platform/x86/amd/hsmp: fix building with CONFIG_HWMON=m	Arnd Bergmann
	When CONFIG_HWMON is built as a loadable module, the HSMP drivers cannot be built-in: ERROR: modpost: "hsmp_create_sensor" [drivers/platform/x86/amd/hsmp/amd_hsmp.ko] undefined! ERROR: modpost: "hsmp_create_sensor" [drivers/platform/x86/amd/hsmp/hsmp_acpi.ko] undefined! Enforce that through the usual Kconfig dependnecy trick. Fixes: 92c025db52bb ("platform/x86/amd/hsmp: Report power via hwmon sensors") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250522144422.2824083-1-arnd@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-25	platform/x86: asus-wmi: fix build without CONFIG_SUSPEND	Luke Jones
	The patch "Refactor Ally suspend/resume" introduced an acpi_s2idle_dev_ops for use with ROG Ally which caused a build error if CONFIG_SUSPEND was not defined. Signed-off-by: Luke Jones <luke@ljones.dev> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505090418.DaeaXe4i-lkp@intel.com/ Fixes: feea7bd6b02d ("platform/x86: asus-wmi: Refactor Ally suspend/resume") Link: https://lore.kernel.org/r/20250524101343.57883-2-luke@ljones.dev Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-25	Merge tag 'mm-hotfixes-stable-2025-05-25-00-58' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. 19 are for MM" * tag 'mm-hotfixes-stable-2025-05-25-00-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mailmap: add Jarkko's employer email address mm: fix copy_vma() error handling for hugetlb mappings memcg: always call cond_resched() after fn() mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb folios mm: vmalloc: only zero-init on vrealloc shrink mm: vmalloc: actually use the in-place vrealloc region alloc_tag: allocate percpu counters for module tags dynamically module: release codetag section when module load fails mm/cma: make detection of highmem_start more robust MAINTAINERS: add mm memory policy section MAINTAINERS: add mm ksm section kasan: avoid sleepable page allocation from atomic context highmem: add folio_test_partial_kmap() MAINTAINERS: add hung-task detector section taskstats: fix struct taskstats breaks backward compatibility since version 15 mm/truncate: fix out-of-bounds when doing a right-aligned split MAINTAINERS: add mm reclaim section MAINTAINERS: update page allocator section mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled ...
2025-05-25	net: ethernet: mtk_eth_soc: Correct spelling	Simon Horman
	Correct spelling of platforms, various, and initial. As flagged by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-25	net: dlink: Correct endian treatment of t_SROM data	Simon Horman
	As it's name suggests, parse_eeprom() parses EEPROM data. This is done by reading data, 16 bits at a time as follows: for (i = 0; i < 128; i++) ((__le16 *) sromdata)[i] = cpu_to_le16(read_eeprom(np, i)); sromdata is at the same memory location as psrom. And the type of psrom is a pointer to struct t_SROM. As can be seen in the loop above, data is stored in sromdata, and thus psrom, as 16-bit little-endian values. However, the integer fields of t_SROM are host byte order. In the case of the led_mode field this results in a but which has been addressed by commit e7e5ae71831c ("net: dlink: Correct endianness handling of led_mode"). In the case of the remaining fields, which are updated by this patch, I do not believe this does not result in any bugs. But it does seem best to correctly annotate the endianness of integers. Flagged by Sparse as: .../dl2k.c:344:35: warning: restricted __le32 degrades to integer Compile tested only. No run-time change intended. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-25	octeontx2-af: NPC: Clear Unicast rule on nixlf detach	Hariprasad Kelam
	The AF driver assigns reserved MCAM entries (for unicast, broadcast, etc.) based on the NIXLF number. When a NIXLF is detached, these entries are disabled. For example, PF NIXLF -------------------- PF0 0 SDP-VF0 1 If the user unbinds both PF0 and SDP-VF0 interfaces and then binds them in reverse order PF NIXLF --------------------- SDP-VF0 0 PF0 1 In this scenario, the PF0 unicast entry is getting corrupted because the MCAM entry contains stale data (SDP-VF0 ucast data) This patch resolves the issue by clearing the unicast MCAM entry during NIXLF detach Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-25	RDMA/bnxt_re: Support extended stats for Thor2 VF	Ajit Khaparde
	The driver currently checks if the user is querying VF RoCE statistics. It will not send the query_roce_stats_ext HWRM command if it is for a VF. But Thor2 VF can support extended statistics. Allow query of extended stats for Thor2 VFs. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Link: https://patch.msgid.link/20250523075952.1267827-1-kalesh-anakkur.purayil@broadcom.com Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-25	RDMA/hns: Fix endian issue in trace events	Junxian Huang
	Avoid using __le32 directly in trace events to fix sparse complains: drivers/infiniband/hw/hns/./hns_roce_trace.h:48:1: sparse: sparse: cast to restricted __le32 drivers/infiniband/hw/hns/./hns_roce_trace.h:48:1: sparse: sparse: restricted __le32 degrades to integer drivers/infiniband/hw/hns/./hns_roce_trace.h:48:1: sparse: sparse: restricted __le32 degrades to integer drivers/infiniband/hw/hns/./hns_roce_trace.h:90:1: sparse: sparse: cast to restricted __le32 drivers/infiniband/hw/hns/./hns_roce_trace.h:90:1: sparse: sparse: restricted __le32 degrades to integer drivers/infiniband/hw/hns/./hns_roce_trace.h:90:1: sparse: sparse: restricted __le32 degrades to integer drivers/infiniband/hw/hns/./hns_roce_trace.h:173:1: sparse: sparse: cast to restricted __le32 drivers/infiniband/hw/hns/./hns_roce_trace.h:173:1: sparse: sparse: restricted __le32 degrades to integer drivers/infiniband/hw/hns/./hns_roce_trace.h:173:1: sparse: sparse: restricted __le32 degrades to integer Fixes: 6c98c8670806 ("RDMA/hns: Add trace for WQE dumping") Fixes: 1e63e2f96613 ("RDMA/hns: Add trace for AEQE dumping") Fixes: 6bd18dabf1c9 ("RDMA/hns: Add trace for CMDQ dumping") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505170327.TNOpreil-lkp@intel.com/ Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250523023433.2171003-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-25	RDMA/mlx5: Avoid flexible array warning	Leon Romanovsky
	The following warning is reported by sparse tool: drivers/infiniband/hw/mlx5/fs.c:1664:26: warning: array of flexible structures Avoid it by simply splitting array into two separate structs. Link: https://patch.msgid.link/7b891b96a9fc053d01284c184d25ae98d35db2d4.1747827041.git.leon@kernel.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2025-05-25	IB/cm: Remove dead code and adjust naming	Vlad Dumitrescu
	Drop ib_send_cm_mra parameters which are always constant. Remove branch which is never taken. Adjust name to ib_prepare_cm_mra, which better reflects its functionality - no MRA is actually sent. Adjust name of related tracepoints. Push setting of the constant service timeout to cm.c and drop IB_CM_MRA_FLAG_DELAY. Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Reviewed-by: Sean Hefty <shefty@nvidia.com> Link: https://patch.msgid.link/cdd2a237acf2b495c19ce02e4b1c42c41c6751c2.1747827207.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-25	RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices	Daisuke Matsuda
	Drivers such as rxe, which use virtual DMA, must not call into the DMA mapping core since they lack physical DMA capabilities. Otherwise, a NULL pointer dereference is observed as shown below. This patch ensures the RDMA core handles virtual and physical DMA paths appropriately. This fixes the following kernel oops: BUG: kernel NULL pointer dereference, address: 00000000000002fc #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1028eb067 P4D 1028eb067 PUD 105da0067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 3 UID: 1000 PID: 1854 Comm: python3 Tainted: G W 6.15.0-rc1+ #11 PREEMPT(voluntary) Tainted: [W]=WARN Hardware name: Trigkey Key N/Key N, BIOS KEYN101 09/02/2024 RIP: 0010:hmm_dma_map_alloc+0x25/0x100 Code: 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 d6 49 c1 e6 0c 41 55 41 54 53 49 39 ce 0f 82 c6 00 00 00 49 89 fc <f6> 87 fc 02 00 00 20 0f 84 af 00 00 00 49 89 f5 48 89 d3 49 89 cf RSP: 0018:ffffd3d3420eb830 EFLAGS: 00010246 RAX: 0000000000001000 RBX: ffff8b727c7f7400 RCX: 0000000000001000 RDX: 0000000000000001 RSI: ffff8b727c7f74b0 RDI: 0000000000000000 RBP: ffffd3d3420eb858 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007262a622a000 R14: 0000000000001000 R15: ffff8b727c7f74b0 FS: 00007262a62a1080(0000) GS:ffff8b762ac3e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002fc CR3: 000000010a1f0004 CR4: 0000000000f72ef0 PKRU: 55555554 Call Trace: <TASK> ib_init_umem_odp+0xb6/0x110 [ib_uverbs] ib_umem_odp_get+0xf0/0x150 [ib_uverbs] rxe_odp_mr_init_user+0x71/0x170 [rdma_rxe] rxe_reg_user_mr+0x217/0x2e0 [rdma_rxe] ib_uverbs_reg_mr+0x19e/0x2e0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd9/0x150 [ib_uverbs] ib_uverbs_cmd_verbs+0xd19/0xee0 [ib_uverbs] ? mmap_region+0x63/0xd0 ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs] ib_uverbs_ioctl+0xba/0x130 [ib_uverbs] __x64_sys_ioctl+0xa4/0xe0 x64_sys_call+0x1178/0x2660 do_syscall_64+0x7e/0x170 ? syscall_exit_to_user_mode+0x4e/0x250 ? do_syscall_64+0x8a/0x170 ? do_syscall_64+0x8a/0x170 ? syscall_exit_to_user_mode+0x4e/0x250 ? do_syscall_64+0x8a/0x170 ? syscall_exit_to_user_mode+0x4e/0x250 ? do_syscall_64+0x8a/0x170 ? do_user_addr_fault+0x1d2/0x8d0 ? irqentry_exit_to_user_mode+0x43/0x250 ? irqentry_exit+0x43/0x50 ? exc_page_fault+0x93/0x1d0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7262a6124ded Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 RSP: 002b:00007fffd08c3960 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fffd08c39f0 RCX: 00007262a6124ded RDX: 00007fffd08c3a10 RSI: 00000000c0181b01 RDI: 0000000000000007 RBP: 00007fffd08c39b0 R08: 0000000014107820 R09: 00007fffd08c3b44 R10: 000000000000000c R11: 0000000000000246 R12: 00007fffd08c3b44 R13: 000000000000000c R14: 00007fffd08c3b58 R15: 0000000014107960 </TASK> Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage") Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/ Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com> Link: https://patch.msgid.link/20250524144328.4361-1-dskmtsd@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-25	Merge branch 'locking/futex' into locking/core, to pick up pending futex changes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-25	mailmap: add Jarkko's employer email address	Jarkko Sakkinen
	Add the current employer email address to mailmap. Link: https://lkml.kernel.org/r/20250523121105.15850-1-jarkko@kernel.org Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com> Cc: Antonio Quartulli <antonio@openvpn.net> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25	mm: fix copy_vma() error handling for hugetlb mappings	Ricardo Cañuelo Navarro
	If, during a mremap() operation for a hugetlb-backed memory mapping, copy_vma() fails after the source vma has been duplicated and opened (ie. vma_link() fails), the error is handled by closing the new vma. This updates the hugetlbfs reservation counter of the reservation map which at this point is referenced by both the source vma and the new copy. As a result, once the new vma has been freed and copy_vma() returns, the reservation counter for the source vma will be incorrect. This patch addresses this corner case by clearing the hugetlb private page reservation reference for the new vma and decrementing the reference before closing the vma, so that vma_close() won't update the reservation counter. This is also what copy_vma_and_data() does with the source vma if copy_vma() succeeds, so a helper function has been added to do the fixup in both functions. The issue was reported by a private syzbot instance and can be reproduced using the C reproducer in [1]. It's also a possible duplicate of public syzbot report [2]. The WARNING report is: ============================================================ page_counter underflow: -1024 nr_pages=1024 WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120 Modules linked in: CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:page_counter_cancel+0xf6/0x120 Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81 RSP: 0018:ffffc900025df6a0 EFLAGS: 00010246 RAX: 2edfc409ebb44e00 RBX: fffffffffffffc00 RCX: ffff8880155f0000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff81c4a23c R09: 1ffff1100330482a R10: dffffc0000000000 R11: ffffed100330482b R12: 0000000000000000 R13: ffff888058a882c0 R14: ffff888058a882c0 R15: 0000000000000400 FS: 0000000000000000(0000) GS:ffff88808fc53000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b33e0 CR3: 00000000076d6000 CR4: 00000000000006f0 Call Trace: <TASK> page_counter_uncharge+0x33/0x80 hugetlb_cgroup_uncharge_counter+0xcb/0x120 hugetlb_vm_op_close+0x579/0x960 ? __pfx_hugetlb_vm_op_close+0x10/0x10 remove_vma+0x88/0x130 exit_mmap+0x71e/0xe00 ? __pfx_exit_mmap+0x10/0x10 ? __mutex_unlock_slowpath+0x22e/0x7f0 ? __pfx_exit_aio+0x10/0x10 ? __up_read+0x256/0x690 ? uprobe_clear_state+0x274/0x290 ? mm_update_next_owner+0xa9/0x810 __mmput+0xc9/0x370 exit_mm+0x203/0x2f0 ? __pfx_exit_mm+0x10/0x10 ? taskstats_exit+0x32b/0xa60 do_exit+0x921/0x2740 ? do_raw_spin_lock+0x155/0x3b0 ? __pfx_do_exit+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? _raw_spin_lock_irq+0xc5/0x100 do_group_exit+0x20c/0x2c0 get_signal+0x168c/0x1720 ? __pfx_get_signal+0x10/0x10 ? schedule+0x165/0x360 arch_do_signal_or_restart+0x8e/0x7d0 ? __pfx_arch_do_signal_or_restart+0x10/0x10 ? __pfx___se_sys_futex+0x10/0x10 syscall_exit_to_user_mode+0xb8/0x2c0 do_syscall_64+0x75/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x422dcd Code: Unable to access opcode bytes at 0x422da3. RSP: 002b:00007ff266cdb208 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: 0000000000000001 RBX: 00007ff266cdbcdc RCX: 0000000000422dcd RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004c7bec RBP: 00007ff266cdb220 R08: 203a6362696c6720 R09: 203a6362696c6720 R10: 0000200000c00000 R11: 0000000000000246 R12: ffffffffffffffd0 R13: 0000000000000002 R14: 00007ffe1cb5f520 R15: 00007ff266cbb000 </TASK> ============================================================ Link: https://lkml.kernel.org/r/20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c [1] Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/ [2] Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Florent Revest <revest@google.com> Cc: Jann Horn <jannh@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25	memcg: always call cond_resched() after fn()	Breno Leitao
	I am seeing soft lockup on certain machine types when a cgroup OOMs. This is happening because killing the process in certain machine might be very slow, which causes the soft lockup and RCU stalls. This happens usually when the cgroup has MANY processes and memory.oom.group is set. Example I am seeing in real production: [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) .... .... [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) .... [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982] .... Quick look at why this is so slow, it seems to be related to serial flush for certain machine types. For all the crashes I saw, the target CPU was at console_flush_all(). In the case above, there are thousands of processes in the cgroup, and it is soft locking up before it reaches the 1024 limit in the code (which would call the cond_resched()). So, cond_resched() in 1024 blocks is not sufficient. Remove the counter-based conditional rescheduling logic and call cond_resched() unconditionally after each task iteration, after fn() is called. This avoids the lockup independently of how slow fn() is. Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process") Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Rik van Riel <riel@surriel.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Michael van der Westhuizen <rmikey@meta.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25	mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb ↵	Ge Yang
	folios A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25	mm: vmalloc: only zero-init on vrealloc shrink	Kees Cook
	The common case is to grow reallocations, and since init_on_alloc will have already zeroed the whole allocation, we only need to zero when shrinking the allocation. Link: https://lkml.kernel.org/r/20250515214217.619685-2-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25	mm: vmalloc: actually use the in-place vrealloc region	Kees Cook
	Patch series "mm: vmalloc: Actually use the in-place vrealloc region". This fixes a performance regression[1] with vrealloc()[1]. The refactoring to not build a new vmalloc region only actually worked when shrinking. Actually return the resized area when it grows. Ugh. Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Closes: https://lore.kernel.org/all/20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1] Tested-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>