linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-04-17	fs: move the bdex_statx call to vfs_getattr_nosec	Christoph Hellwig
	Currently bdex_statx is only called from the very high-level vfs_statx_path function, and thus bypassing it for in-kernel calls to vfs_getattr or vfs_getattr_nosec. This breaks querying the block ѕize of the underlying device in the loop driver and also is a pitfall for any other new kernel caller. Move the call into the lowest level helper to ensure all callers get the right results. Fixes: 2d985f8c6b91 ("vfs: support STATX_DIOALIGN on block devices") Fixes: f4774e92aab8 ("loop: take the file system minimum dio alignment into account") Reported-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/20250417064042.712140-1-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-17	dma-mapping: avoid potential unused data compilation warning	Marek Szyprowski
	When CONFIG_NEED_DMA_MAP_STATE is not defined, dma-mapping clients might report unused data compilation warnings for dma_unmap_*() calls arguments. Redefine macros for those calls to let compiler to notice that it is okay when the provided arguments are not used. Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250415075659.428549-1-m.szyprowski@samsung.com
2025-04-16	Merge tag 'mm-hotfixes-stable-2025-04-16-19-59' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "31 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 22 patches are for MM, 9 are otherwise" * tag 'mm-hotfixes-stable-2025-04-16-19-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits) MAINTAINERS: update HUGETLB reviewers mm: fix apply_to_existing_page_range() selftests/mm: fix compiler -Wmaybe-uninitialized warning alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate mailmap: add entry for Jean-Michel Hautbois mm: (un)track_pfn_copy() fix + doc improvements mm: fix filemap_get_folios_contig returning batches of identical folios mm/hugetlb: add a line break at the end of the format string selftests: mincore: fix tmpfs mincore test failure mm/hugetlb: fix set_max_huge_pages() when there are surplus pages mm/cma: report base address of single range correctly mm: page_alloc: speed up fallbacks in rmqueue_bulk() kunit: slub: add module description mm/kasan: add module decription ucs2_string: add module description zlib: add module description fpga: tests: add module descriptions samples/livepatch: add module descriptions ASN.1: add module description mm/vma: add give_up_on_oom option on modify/merge, use in uffd release ...
2025-04-16	sched/topology: Introduce sched_update_asym_prefer_cpu()	K Prateek Nayak
	A subset of AMD Processors supporting Preferred Core Rankings also feature the ability to dynamically switch these rankings at runtime to bias load balancing towards or away from the LLC domain with larger cache. To support dynamically updating "sg->asym_prefer_cpu" without needing to rebuild the sched domain, introduce sched_update_asym_prefer_cpu() which recomutes the "asym_prefer_cpu" when the core-ranking of a CPU changes. sched_update_asym_prefer_cpu() swaps the "sg->asym_prefer_cpu" with the CPU whose ranking has changed if the new ranking is greater than that of the "asym_prefer_cpu". If CPU whose ranking has changed is the current "asym_prefer_cpu", it scans the CPUs of the sched groups to find the new "asym_prefer_cpu" and sets it accordingly. get_group() for non-overlapping sched domains returns the sched group for the first CPU in the sched_group_span() which ensures all CPUs in the group see the updated value of "asym_prefer_cpu". Overlapping groups are allocated differently and will require moving the "asym_prefer_cpu" to "sg->sgc" but since the current implementations do not set "SD_ASYM_PACKING" at NUMA domains, skip additional indirection and place a SCHED_WARN_ON() to alert any future users. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409053446.23367-3-kprateek.nayak@amd.com
2025-04-16	i2c: core: Deprecate of_node in struct i2c_board_info	Andy Shevchenko
	Two members of the same or quite similar semantics is quite confusing to begin with. Moreover, fwnode covers all possible firmware descriptions that Linux kernel supports. Deprecate of_node in struct i2c_board_info, so users will be warned and in the future there is a plan to convert the users and remove it completely. Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-04-16	kernel: globalize lookup_or_create_module_kobject()	Shyam Saini
	lookup_or_create_module_kobject() is marked as static and __init, to make it global drop static keyword. Since this function can be called from non-init code, use __modinit instead of __init, __modinit marker will make it __init if CONFIG_MODULES is not defined. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com> Link: https://lore.kernel.org/r/20250227184930.34163-4-shyamsaini@linux.microsoft.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-04-16	xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}	Cosmin Ratiu
	Previously, device driver IPSec offload implementations would fall into two categories: 1. Those that used xso.dev to determine the offload device. 2. Those that used xso.real_dev to determine the offload device. The first category didn't work with bonding while the second did. In a non-bonding setup the two pointers are the same. This commit adds explicit pointers for the offload netdevice to .xdo_dev_state_add() / .xdo_dev_state_delete() / .xdo_dev_state_free() which eliminates the confusion and allows drivers from the first category to work with bonding. xso.real_dev now becomes a private pointer managed by the bonding driver. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-04-16	crypto: api - Add support for duplicating algorithms before registration	Herbert Xu
	If the bit CRYPTO_ALG_DUP_FIRST is set, an algorithm will be duplicated by kmemdup before registration. This is inteded for hardware-based algorithms that may be unplugged at will. Do not use this if the algorithm data structure is embedded in a bigger data structure. Perform the duplication in the driver instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16	Merge branch 'x86/cpu' into x86/fpu, to pick up dependent commits	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-16	crypto: api - Add reqsize to crypto_alg	Herbert Xu
	Add a reqsize field to crypto_alg with the intention of replacing the type-specific reqsize field currently used by ahash and acomp. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16	crypto: api - Mark cra_init/cra_exit as deprecated	Herbert Xu
	These functions have been obsoleted by the type-specific init/exit functions. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16	crypto: api - Add helpers to manage request flags	Herbert Xu
	Add helpers so that the ON_STACK request flag management is not duplicated all over the place. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16	crypto: ahash - Remove request chaining	Herbert Xu
	Request chaining requires the user to do too much book keeping. Remove it from ahash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-15	net: ptp: introduce .supported_perout_flags to ptp_clock_info	Jacob Keller
	The PTP_PEROUT_REQUEST2 ioctl has gained support for flags specifying specific output behavior including PTP_PEROUT_ONE_SHOT, PTP_PEROUT_DUTY_CYCLE, PTP_PEROUT_PHASE. Driver authors are notorious for not checking the flags of the request. This results in misinterpreting the request, generating an output signal that does not match the requested value. It is anticipated that even more flags will be added in the future, resulting in even more broken requests. Expecting these issues to be caught during review or playing whack-a-mole after the fact is not a great solution. Instead, introduce the supported_perout_flags field in the ptp_clock_info structure. Update the core character device logic to explicitly reject any request which has a flag not on this list. This ensures that drivers must 'opt in' to the flags they support. Drivers which don't set the .supported_perout_flags field will not need to check that unsupported flags aren't passed, as the core takes care of this. Update the drivers which do support flags to set this new field. Note the following driver files set n_per_out to a non-zero value but did not check the flags at all: • drivers/ptp/ptp_clockmatrix.c • drivers/ptp/ptp_idt82p33.c • drivers/ptp/ptp_fc3.c • drivers/net/ethernet/ti/am65-cpts.c • drivers/net/ethernet/aquantia/atlantic/aq_ptp.c • drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c • drivers/net/dsa/sja1105/sja1105_ptp.c • drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c • drivers/net/ethernet/mscc/ocelot_vsc7514.c • drivers/net/ethernet/intel/i40e/i40e_ptp.c Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-2-f6b17d15475c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15	net: ptp: introduce .supported_extts_flags to ptp_clock_info	Jacob Keller
	The PTP_EXTTS_REQUEST(2) ioctl has a flags field which specifies how the external timestamp request should behave. This includes which edge of the signal to timestamp, as well as a specialized "offset" mode. It is expected that more flags will be added in the future. Driver authors routinely do not check the flags, often accepting requests with flags which they do not support. Even drivers which do check flags may not be future-proofed to reject flags not yet defined. Thus, any future flag additions often require manually updating drivers to reject these flags. This approach of hoping we catch flag checks during review, or playing whack-a-mole after the fact is the wrong approach. Introduce the "supported_extts_flags" field to the ptp_clock_info structure. This field defines the set of flags the device actually supports. Update the core character device logic to check this field and reject unsupported requests. Getting this right is somewhat tricky. First, to avoid unnecessary repetition and make basic functionality work when .supported_extts_flags is 0, the core always accepts the PTP_ENABLE_FEATURE flag. This flag is used to set the 'on' parameter to the .enable function and is thus always 'supported' by all drivers. For backwards compatibility, the PTP_RISING_EDGE and PTP_FALLING_EDGE flags are merely "hints" when using the old PTP_EXTTS_REQUEST ioctl, and are not expected to be enforced. If the user issues PTP_EXTTS_REQUEST2, the PTP_STRICT_FLAGS flag is added which is supposed to inform the driver to strictly validate the flags and reject unsupported requests. To handle this, first check if the driver reports PTP_STRICT_FLAGS support. If it does not, then always allow the PTP_RISING_EDGE and PTP_FALLING_EDGE flags. This keeps backwards compatibility with the original PTP_EXTTS_REQUEST ioctl where these flags are not guaranteed to be honored. This way, drivers which do not set the supported_extts_flags will continue to accept requests for the original PTP_EXTTS_REQUEST ioctl. The core will automatically reject requests with new flags, and correctly reject requests with PTP_STRICT_FLAGS, where the driver is supposed to strictly validate the flags. Update the various drivers, refactoring their validation logic into the .supported_extts_flags field. For consistency and readability, PTP_ENABLE_FEATURE is not set in the supported flags list, and PTP_EXTTS_EDGES is expanded to PTP_RISING_EDGE \| PTP_FALLING_EDGE in all cases. Note the following driver files set n_ext_ts to a non-zero value but did not check flags at all: • drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c • drivers/net/ethernet/freescale/enetc/enetc_ptp.c • drivers/net/ethernet/intel/i40e/i40e_ptp.c • drivers/net/ethernet/marvell/octeontx2/nic/otx2_ptp.c • drivers/net/ethernet/renesas/ravb_ptp.c • drivers/net/ethernet/renesas/rtsn.c • drivers/net/ethernet/renesas/rtsn.h • drivers/net/ethernet/ti/am65-cpts.c • drivers/net/ethernet/ti/cpts.h • drivers/net/ethernet/ti/icssg/icss_iep.c • drivers/net/ethernet/xscale/ptp_ixp46x.c • drivers/net/phy/bcm-phy-ptp.c • drivers/ptp/ptp_ocp.c • drivers/ptp/ptp_pch.c • drivers/ptp/ptp_qoriq.c These drivers behavior does change slightly: they will now reject the PTP_EXTTS_REQUEST2 ioctl, because they do not strictly validate their flags. This also makes them no longer incorrectly accept PTP_EXT_OFFSET. Also note that the renesas ravb driver does not support PTP_STRICT_FLAGS. We could leave the .supported_extts_flags as 0, but I added the PTP_RISING_EDGE \| PTP_FALLING_EDGE since the driver previously manually validated these flags. This is equivalent to 0 because the core will allow these flags regardless unless PTP_STRICT_FLAGS is also set. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250414-jk-supported-perout-flags-v2-1-f6b17d15475c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15	PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset	Lukas Wunner
	When a Secondary Bus Reset is issued at a hotplug port, it causes a Data Link Layer State Changed event as a side effect. On hotplug ports using in-band presence detect, it additionally causes a Presence Detect Changed event. These spurious events should not result in teardown and re-enumeration of the device in the slot. Hence commit 2e35afaefe64 ("PCI: pciehp: Add reset_slot() method") masked the Presence Detect Changed Enable bit in the Slot Control register during a Secondary Bus Reset. Commit 06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset") additionally masked the Data Link Layer State Changed Enable bit. However masking those bits only disables interrupt generation (PCIe r6.2 sec 6.7.3.1). The events are still visible in the Slot Status register and picked up by the IRQ handler if it runs during a Secondary Bus Reset. This can happen if the interrupt is shared or if an unmasked hotplug event occurs, e.g. Attention Button Pressed or Power Fault Detected. The likelihood of this happening used to be small, so it wasn't much of a problem in practice. That has changed with the recent introduction of bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller"): Bandwidth control shares the interrupt with PCIe hotplug. A Secondary Bus Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler runs, picks up the masked events and tears down the device in the slot. As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo root-caused to the incorrect handling of masked hotplug events. Clearly, a more reliable way is needed to ignore spurious hotplug events. For Downstream Port Containment, a new ignore mechanism was introduced by commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC"). It has been working reliably for the past four years. Adapt it for Secondary Bus Resets. Introduce two helpers to annotate code sections which cause spurious link changes: pci_hp_ignore_link_change() and pci_hp_unignore_link_change() Use those helpers in lieu of masking interrupts in the Slot Control register. Introduce a helper to check whether such a code section is executing concurrently and if so, await it: pci_hp_spurious_link_change() Invoke the helper in the hotplug IRQ thread pciehp_ist(). Re-use the IRQ thread's existing code which ignores DPC-induced link changes unless the link is unexpectedly down after reset recovery or the device was replaced during the bus reset. That code block in pciehp_ist() was previously only executed if a Data Link Layer State Changed event has occurred. Additionally execute it for Presence Detect Changed events. That's necessary for compatibility with PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist before PCIe r1.1. DPC was added with PCIe r3.1 and thus DPC-capable hotplug ports always support Data Link Layer State Changed events. But the same cannot be assumed for Secondary Bus Reset, which already existed in PCIe r1.0. Secondary Bus Reset is only one of many causes of spurious link changes. Others include runtime suspend to D3cold, firmware updates or FPGA reconfiguration. The new pci_hp_{,un}ignore_link_change() helpers may be used by all kinds of drivers to annotate such code sections, hence their declarations are publicly visible in <linux/pci.h>. A case in point is the Mellanox Ethernet driver which disables a firmware reset feature if the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7 ("net/mlx5: Nack sync reset request when HotPlug is enabled"). Going forward, PCIe hotplug will be able to cope gracefully with all such use cases once the code sections are properly annotated. The new helpers internally use two bits in struct pci_dev's priv_flags as well as a wait_queue. This mirrors what was done for DPC by commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC"). That may be insufficient if spurious link changes are caused by multiple sources simultaneously. An example might be a Secondary Bus Reset issued by AER during FPGA reconfiguration. If this turns out to happen in real life, support for it can easily be added by replacing the PCI_LINK_CHANGING flag with an atomic_t counter incremented by pci_hp_ignore_link_change() and decremented by pci_hp_unignore_link_change(). Instead of awaiting a zero PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would then simply await a zero counter. Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Joel Mathew Thomas <proxy0@tutamail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/d04deaf49d634a2edf42bf3c06ed81b4ca54d17b.1744298239.git.lukas@wunner.de
2025-04-15	sysfs: constify attribute_group::bin_attrs	Thomas Weißschuh
	All users of this field have been migrated to bin_attrs_new. It can now be constified. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250313-sysfs-const-bin_attr-final-v2-2-96284e1e88ce@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-15	sysfs: constify bin_attribute argument of bin_attribute::read/write()	Thomas Weißschuh
	All callback implementers have been moved to the const variant of the callbacks. The signature of the original callbacks can now be changed. Also remove the now unnecessary transition machinery inside __BIN_ATTR(). Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250313-sysfs-const-bin_attr-final-v2-1-96284e1e88ce@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-15	device property: Add a note to the fwnode.h	Andy Shevchenko
	Add a note to the fwnode.h that the header should not be used directly in the leaf drivers, they all should use the higher level APIs and the respective headers. The purpose of this note is to give guidance to driver writers to avoid repeating a common mistake. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20250408095229.1298005-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-15	net: phy: remove device_phy_find_device	Heiner Kallweit
	AFAICS this function has never had a user. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/ab7b8094-2eea-4e82-a047-fd60117f220b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-15	platform/mellanox: Rename field to improve code readability	Vadim Pasternak
	Rename field 'counter' in 'mlxreg_core_hotplug_platform_data' to count. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20250412091843.33943-2-vadimp@nvidia.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-04-15	driver core: auxiliary bus: add device creation helpers	Jerome Brunet
	Add helper functions to create a device on the auxiliary bus. This is meant for fairly simple usage of the auxiliary bus, to avoid having the same code repeated in the different drivers. Suggested-by: Stephen Boyd <sboyd@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20250218-aux-device-create-helper-v4-1-c3d7dfdea2e6@baylibre.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-15	mm: skip folio reclaim in legacy memcg contexts for deadlockable mappings	Joanne Koong
	Currently in shrink_folio_list(), reclaim for folios under writeback falls into 3 different cases: 1) Reclaim is encountering an excessive number of folios under writeback and this folio has both the writeback and reclaim flags set 2) Dirty throttling is enabled (this happens if reclaim through cgroup is not enabled, if reclaim through cgroupv2 memcg is enabled, or if reclaim is on the root cgroup), or if the folio is not marked for immediate reclaim, or if the caller does not have __GFP_FS (or __GFP_IO if it's going to swap) set 3) Legacy cgroupv1 encounters a folio that already has the reclaim flag set and the caller did not have __GFP_FS (or __GFP_IO if swap) set In cases 1) and 2), we activate the folio and skip reclaiming it while in case 3), we wait for writeback to finish on the folio and then try to reclaim the folio again. In case 3, we wait on writeback because cgroupv1 does not have dirty folio throttling, as such this is a mitigation against the case where there are too many folios in writeback with nothing else to reclaim. If a filesystem (eg fuse) may deadlock due to reclaim waiting on writeback, then the filesystem needs to add inefficient messy workarounds to prevent this. To improve the performance of these filesystems, this commit adds two things: a) a AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM mapping flag that filesystems may set to indicate that reclaim should not wait on writeback b) if legacy memcg encounters a folio with this AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM flag set (eg case 3), the folio will be activated and skip reclaim (eg default to behavior in case 2) instead. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-04-15	interconnect: core: Add dynamic id allocation support	Raviteja Laggyshetty
	The current interconnect framework relies on static IDs for node creation and registration, which limits topologies with multiple instances of the same interconnect provider. To address this, introduce icc_node_create_dyn() and icc_link_nodes() APIs to dynamically allocate IDs for interconnect nodes during creation and link. This change removes the dependency on static IDs, allowing multiple instances of the same hardware, such as EPSS L3. Signed-off-by: Raviteja Laggyshetty <quic_rlaggysh@quicinc.com> Link: https://lore.kernel.org/r/20250415095343.32125-3-quic_rlaggysh@quicinc.com Signed-off-by: Georgi Djakov <djakov@kernel.org>
2025-04-15	Merge drm/drm-next into drm-misc-next	Thomas Zimmermann
	Backmerging to get fixes from v6.15-rc2 into drm-misc-next. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-04-15	fs: add kern_path_locked_negative()	Christian Brauner
	The audit code relies on the fact that kern_path_locked() returned a path even for a negative dentry. If it doesn't find a valid dentry it immediately calls: audit_find_parent(d_backing_inode(parent_path.dentry)); which assumes that parent_path.dentry is still valid. But it isn't since kern_path_locked() has been changed to path_put() also for a negative dentry. Fix this by adding a helper that implements the required audit semantics and allows us to fix the immediate bleeding. We can find a unified solution for this afterwards. Link: https://lore.kernel.org/20250414-rennt-wimmeln-f186c3a780f1@brauner Fixes: 1c3cb50b58c3 ("VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry") Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-15	PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads	Jonathan Currier
	Commit 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries") introduced a readl() from ENTRY_VECTOR_CTRL before the writel() to ENTRY_DATA. This is correct, however some hardware, like the Sun Neptune chips, the NIU module, will cause an error and/or fatal trap if any MSIX table entry is read before the corresponding ENTRY_DATA field is written to. Add an optional early writel() in msix_prepare_msi_desc(). Fixes: 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: Jonathan Currier <dullfire@yahoo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241117234843.19236-2-dullfire@yahoo.com
2025-04-14	net: stmmac: remove eee_usecs_rate	Russell King (Oracle)
	plat_dat->eee_users_rate is now unused, so remove this member. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u3Vuv-000E7y-9k@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14	net: ethtool: fix get_ts_stats() documentation	Russell King (Oracle)
	Commit 0e9c127729be ("ethtool: add interface to read Tx hardware timestamping statistics") added documentation for timestamping statistics, but added the detailed explanation for this method to the get_ts_info() rather than get_ts_stats(). Move it to the correct entry. Cc: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u3MTz-000Crx-IW@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14	page_pool: Track DMA-mapped pages and unmap them when destroying the pool	Toke Høiland-Jørgensen
	When enabling DMA mapping in page_pool, pages are kept DMA mapped until they are released from the pool, to avoid the overhead of re-mapping the pages every time they are used. This causes resource leaks and/or crashes when there are pages still outstanding while the device is torn down, because page_pool will attempt an unmap through a non-existent DMA device on the subsequent page return. To fix this, implement a simple tracking of outstanding DMA-mapped pages in page pool using an xarray. This was first suggested by Mina[0], and turns out to be fairly straight forward: We simply store pointers to pages directly in the xarray with xa_alloc() when they are first DMA mapped, and remove them from the array on unmap. Then, when a page pool is torn down, it can simply walk the xarray and unmap all pages still present there before returning, which also allows us to get rid of the get/put_device() calls in page_pool. Using xa_cmpxchg(), no additional synchronisation is needed, as a page will only ever be unmapped once. To avoid having to walk the entire xarray on unmap to find the page reference, we stash the ID assigned by xa_alloc() into the page structure itself, using the upper bits of the pp_magic field. This requires a couple of defines to avoid conflicting with the POINTER_POISON_DELTA define, but this is all evaluated at compile-time, so does not affect run-time performance. The bitmap calculations in this patch gives the following number of bits for different architectures: - 23 bits on 32-bit architectures - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE) - 32 bits on other 64-bit architectures Stashing a value into the unused bits of pp_magic does have the effect that it can make the value stored there lie outside the unmappable range (as governed by the mmap_min_addr sysctl), for architectures that don't define ILLEGAL_POINTER_VALUE. This means that if one of the pointers that is aliased to the pp_magic field (such as page->lru.next) is dereferenced while the page is owned by page_pool, that could lead to a dereference into userspace, which is a security concern. The risk of this is mitigated by the fact that (a) we always clear pp_magic before releasing a page from page_pool, and (b) this would need a use-after-free bug for struct page, which can have many other risks since page->lru.next is used as a generic list pointer in multiple places in the kernel. As such, with this patch we take the position that this risk is negligible in practice. For more discussion, see[1]. Since all the tracking added in this patch is performed on DMA map/unmap, no additional code is needed in the fast path, meaning the performance overhead of this tracking is negligible there. A micro-benchmark shows that the total overhead of the tracking itself is about 400 ns (39 cycles(tsc) 395.218 ns; sum for both map and unmap[2]). Since this cost is only paid on DMA map and unmap, it seems like an acceptable cost to fix the late unmap issue. Further optimisation can narrow the cases where this cost is paid (for instance by eliding the tracking when DMA map/unmap is a no-op). The extra memory needed to track the pages is neatly encapsulated inside xarray, which uses the 'struct xa_node' structure to track items. This structure is 576 bytes long, with slots for 64 items, meaning that a full node occurs only 9 bytes of overhead per slot it tracks (in practice, it probably won't be this efficient, but in any case it should be an acceptable overhead). [0] https://lore.kernel.org/all/CAHS8izPg7B5DwKfSuzz-iOop_YRbk3Sd6Y4rX7KBG9DcVJcyWg@mail.gmail.com/ [1] https://lore.kernel.org/r/20250320023202.GA25514@openwall.com [2] https://lore.kernel.org/r/ae07144c-9295-4c9d-a400-153bb689fe9e@huawei.com Reported-by: Yonglong Liu <liuyonglong@huawei.com> Closes: https://lore.kernel.org/r/8743264a-9700-4227-a556-5f931c720211@huawei.com Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code") Suggested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> Tested-by: Jesper Dangaard Brouer <hawk@kernel.org> Tested-by: Qiuling Ren <qren@redhat.com> Tested-by: Yuying Ma <yuma@redhat.com> Tested-by: Yonglong Liu <liuyonglong@huawei.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-2-6a9ef2e0cba8@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14	page_pool: Move pp_magic check into helper functions	Toke Høiland-Jørgensen
	Since we are about to stash some more information into the pp_magic field, let's move the magic signature checks into a pair of helper functions so it can be changed in one place. Reviewed-by: Mina Almasry <almasrymina@google.com> Tested-by: Yonglong Liu <liuyonglong@huawei.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-1-6a9ef2e0cba8@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14	net: convert dev->rtnl_link_state to a bool	Jakub Kicinski
	netdevice reg_state was split into two 16 bit enums back in 2010 in commit a2835763e130 ("rtnetlink: handle rtnl_link netlink notifications manually"). Since the split the fields have been moved apart, and last year we converted reg_state to a normal u8 in commit 4d42b37def70 ("net: convert dev->reg_state to u8"). rtnl_link_state being a 16 bitfield makes no sense. Convert it to a single bool, it seems very unlikely after 15 years that we'll need more values in it. We could drop dev->rtnl_link_ops from the conditions but feels like having it there more clearly points at the reason for this hack. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250410014246.780885-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-14	Merge tag 'vfs-6.15-rc3.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix NULL pointer dereference in virtiofs - Fix slab OOB access in hfs/hfsplus - Only create /proc/fs/netfs when CONFIG_PROC_FS is set - Fix getname_flags() to initialize pointer correctly - Convert dentry flags to enum - Don't allow datadir without lowerdir in overlayfs - Use namespace_{lock,unlock} helpers in dissolve_on_fput() instead of plain namespace_sem so unmounted mounts are properly cleaned up - Skip unnecessary ifs_block_is_uptodate check in iomap - Remove an unused forward declaration in overlayfs - Fix devpts uid/gid handling after converting to the new mount api - Fix afs_dynroot_readdir() to not use the RCU read lock - Fix mount_setattr() and open_tree_attr() to not pointlessly do path lookup or walk the mount tree if no mount option change has been requested * tag 'vfs-6.15-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: use namespace_{lock,unlock} in dissolve_on_fput() iomap: skip unnecessary ifs_block_is_uptodate check fs: Fix filename init after recent refactoring netfs: Only create /proc/fs/netfs with CONFIG_PROC_FS mount: ensure we don't pointlessly walk the mount tree dcache: convert dentry flag macros to enum afs: Fix afs_dynroot_readdir() to not use the RCU read lock hfs/hfsplus: fix slab-out-of-bounds in hfs_bnode_read_key virtiofs: add filesystem context source name check devpts: Fix type for uid and gid params ovl: remove unused forward declaration ovl: don't allow datadir only
2025-04-14	firmware: imx: Add i.MX95 SCMI CPU driver	Peng Fan
	The i.MX95 System manager exports SCMI CPU protocol for linux to manage cpu cores. The driver is to use the cpu Protocol interface to start, stop a cpu cores (eg, M7). Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Message-Id: <20250408-imx-lmm-cpu-v4-6-4c5f4a456e49@nxp.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2025-04-14	firmware: imx: Add i.MX95 SCMI LMM driver	Peng Fan
	The i.MX95 System manager exports SCMI LMM protocol for linux to manage Logical Machines. The driver is to use the LMM Protocol interface to boot, shutdown a LM. Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Message-Id: <20250408-imx-lmm-cpu-v4-5-4c5f4a456e49@nxp.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2025-04-14	ASoC: tas27{64,70}: improve support for Apple codec	Mark Brown
	Merge series from James Calligeros <jcalligeros99@gmail.com>: This series introduces a number of changes to the drivers for the Texas Instruments TAS2764 and TAS2770 amplifiers in order to introduce (and improve in the case of TAS2770) support for the variants of these amps found in Apple Silicon Macs. Apple's variant of TAS2764 is known as SN012776, and as always with Apple is a subtly incompatible variant with a number of quirks. It is not publicly available. The TAS2770 variant is known as TAS5770L, and does not require incompatible handling. Much as with the Cirrus codec patches, I do not expect that we will get any official acknowledgement that these parts exist from TI, however I would be delighted to be proven wrong. This series has been living in the downstream Asahi kernel tree[1] for over two years, and has been tested by many thousands of users by this point[2]. v4 drops the TDM idle TX slot behaviour patches. I experimented with the API discussed in v3, however this did not work on any of the machines I tested it with. More tweaking is probably needed. [1] https://github.com/AsahiLinux/linux/tree/asahi-wip [2] https://stats.asahilinux.org/
2025-04-14	Fix up building KUnit tests for Cirrus Logic modules	Mark Brown
	Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>: This series fixes the KConfig for cs_dsp and cs-amp-lib tests so that CONFIG_KUNIT_ALL_TESTS doesn't cause them to add modules to the build.
2025-04-14	firmware: arm_scmi: imx: Add i.MX95 CPU Protocol	Peng Fan
	This protocol allows an agent to start, stop a CPU or set reset vector. It is used to manage auxiliary CPUs in an LM (e.g. additional cores in an AP cluster). Signed-off-by: Peng Fan <peng.fan@nxp.com> Message-Id: <20250408-imx-lmm-cpu-v4-4-4c5f4a456e49@nxp.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2025-04-14	firmware: arm_scmi: imx: Add i.MX95 LMM protocol	Peng Fan
	Add Logical Machine Management(LMM) protocol which is intended for boot, shutdown, and reset of other logical machines (LM). It is usually used to allow one LM to manager another used as an offload or accelerator engine. Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Message-Id: <20250408-imx-lmm-cpu-v4-3-4c5f4a456e49@nxp.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2025-04-14	x86/fpu: Make task_struct::thread constant size	Ingo Molnar
	Turn thread.fpu into a pointer. Since most FPU code internals work by passing around the FPU pointer already, the code generation impact is small. This allows us to remove the old kludge of task_struct being variable size: struct task_struct { ... /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. / randomized_struct_fields_end / CPU-specific state of this task: / struct thread_struct thread; / * WARNING: on x86, 'thread_struct' contains a variable-sized * structure. It MUST be at the end of 'task_struct'. * * Do not put anything below here! */ }; ... which creates a number of problems, such as requiring thread_struct to be the last member of the struct - not allowing it to be struct-randomized, etc. But the primary motivation is to allow the decoupling of task_struct from hardware details (<asm/processor.h> in particular), and to eventually allow the per-task infrastructure: DECLARE_PER_TASK(type, name); ... per_task(current, name) = val; ... which requires task_struct to be a constant size struct. The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed, now that the FPU structure is not embedded in the task struct anymore, which reduces text footprint a bit. Fixed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250409211127.3544993-4-mingo@kernel.org
2025-04-14	pwm: Make chip parameter to pwmchip_get_drvdata() a const pointer	Uwe Kleine-König
	dev_get_drvdata()'s parameter is a const pointer, so the chip passed to pwmchip_get_drvdata() can be const, too. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://lore.kernel.org/r/20250403151134.266388-2-u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2025-04-14	Merge tag 'drm-misc-next-2025-04-09' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.16-rc1: UAPI Changes: - Add ASAHI uapi header! - Add apple fourcc modifiers. - Add capset virtio definitions to UAPI. - Extend EXPORT_SYNC_FILE for timeline syncobjs. Cross-subsystem Changes: - Adjust DMA-BUF sg handling to not cache map on attach. - Update drm/ci, hlcdc, virtio, maintainers. - Update fbdev todo. - Allow setting dma-device for dma-buf import. - Export efi_mem_desc_lookup to make efidrm build as a module. Core Changes: - Update drm scheduler docs. - Use the correct resv object in TTM delayed destroy. - Fix compiler warning with panic qr code, and other small fixes. - drm/ci updates. - Add debugfs file for listing all bridges. - Small fixes to drm/client, ttm tests. - Add documentation to display/hdmi. - Add kunit tests for bridges. - Dont fail managed device probing if connector polling fails. - Create Kconfig.debug for drm core. - Add tests for the drm scheduler. - Add and use new access helpers for DPCPD. - Add generic and optimized conversions for format-helper. - Begin refcounting panel for improving lifetime handling. - Unify simpledrm and ofdrm sysfb, and add extra features. - Split hdmi audio in bridge to make DP audio work. Driver Changes: - Convert drivers to use devm_platform_ioremap_resource(). - Assorted small fixes to imx/legacy-bridg, gma500, pl111, nouveau, vc4, vmwgfx, ast, mxsfb, xlnx, accel/qaic, v3d, bridge/imx8qxp-ldb, ofdrm, bridge/fsl-ldb, udl, bridge/ti-sn65dsi86, bridge/anx7625, cirrus-qemu, bridge/cdns-dsi, panel/sharp, panel/himax, bridge/sil902x, renesas, imagination, various panels. - Allow attaching more display to vkms. - Add Powertip PH128800T004-ZZA01 panel. - Add rotation quirk for ZOTAC panel. - Convert bridge/tc358775 to atomic. - Remove deprecated panel calls from synaptics, novatek, samsung panels. - Refactor shmem helper page pinning and accel drivers using it. - Add dmabuf support to accel/amdxdna. - Use 4k page table format for panfrost/mediatek. - Add common powerup/down dp link helper and use it. - Assorted compiler warning fixes. - Support dma-buf import for renesas Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # include/drm/drm_kunit_helpers.h From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/e147ff95-697b-4067-9e2e-7cbd424e162a@linux.intel.com
2025-04-13	nfs: add missing selections of CONFIG_CRC32	Eric Biggers
	nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available only when CONFIG_CRC32 is enabled. But the only NFS kconfig option that selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and did not actually guard the use of crc32_le() even on the client. The code worked around this bug by only actually calling crc32_le() when CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases. This avoided randconfig build errors, and in real kernels the fallback code was unlikely to be reached since CONFIG_CRC32 is 'default y'. But, this really needs to just be done properly, especially now that I'm planning to update CONFIG_CRC32 to not be 'default y'. Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select CONFIG_CRC32. Then remove the fallback code that becomes unnecessary, as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG. Fixes: 1264a2f053a3 ("NFS: refactor code for calculating the crc32 hash of a filehandle") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-13	spi: Add support for Double Transfer Rate (DTR) mode	Mukesh Kumar Savaliya
	Introduce support for protocol drivers to specify whether a transfer should use single or dual transfer mode. Currently, the SPI controller cannot determine this information from the user, leading to potential limitations in transfer capabilities. Add a new field `dtr_mode` in the `spi_transfer` structure. The `dtr_mode` field allows protocol drivers to indicate if Double Transfer Rate (DTR) mode is supported for a given transfer. When `dtr_mode` is set to true, the SPI controller will use DTR mode; otherwise, it will default to single transfer mode. Introduce another field `dtr_caps` to indicate if the QSPI controller is capable of supporting DTR mode (SDR and DDR). By default, both `dtr_caps` and `dtr_mode` will be false. These flags manage the QSPI controller's DTR mode capabilities within the SPI framework. The QSPI controller driver uses these flags to configure single or double transfer rates using the controller register. The existing spi-mem driver helps configure the DTR mode but is limited to memory devices. There is no support available to set DTR mode for non-memory devices, e.g., touch or any generic SPI sensor. This change is backward compatible and doesn't break existing SPI or QSPI drivers. Changes include: - Addition of `dtr_mode` and `dtr_caps` fields in the `spi_transfer` structure. - Documentation updates to reflect the new `dtr_mode` and `dtr_caps` fields. Signed-off-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com> Link: https://patch.msgid.link/20250404135427.313825-1-quic_msavaliy@quicinc.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-11	net: Retire DCCP socket.	Kuniyuki Iwashima
	DCCP was orphaned in 2021 by commit 054c4610bd05 ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"), which noted that the last maintainer had been inactive for five years. In recent years, it has become a playground for syzbot, and most changes to DCCP have been odd bug fixes triggered by syzbot. Apart from that, the only changes have been driven by treewide or networking API updates or adjustments related to TCP. Thus, in 2023, we announced we would remove DCCP in 2025 via commit b144fcaf46d4 ("dccp: Print deprecation notice."). Since then, only one individual has contacted the netdev mailing list. [0] There is ongoing research for Multipath DCCP. The repository is hosted on GitHub [1], and development is not taking place through the upstream community. While the repository is published under the GPLv2 license, the scheduling part remains proprietary, with a LICENSE file [2] stating: "This is not Open Source software." The researcher mentioned a plan to address the licensing issue, upstream the patches, and step up as a maintainer, but there has been no further communication since then. Maintaining DCCP for a decade without any real users has become a burden. Therefore, it's time to remove it. Removing DCCP will also provide significant benefits to TCP. It allows us to freely reorganize the layout of struct inet_connection_sock, which is currently shared with DCCP, and optimize it to reduce the number of cachelines accessed in the TCP fast path. Note that we keep DCCP netfilter modules as requested. [3] Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0] Link: https://github.com/telekom/mp-dccp #[1] Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2] Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux) Acked-by: Casey Schaufler <casey@schaufler-ca.com> Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-11	mm: (un)track_pfn_copy() fix + doc improvements	David Hildenbrand
	We got a late smatch warning and some additional review feedback. smatch warnings: mm/memory.c:1428 copy_page_range() error: uninitialized symbol 'pfn'. We actually use the pfn only when it is properly initialized; however, we may pass an uninitialized value to a function -- although it will not use it that likely still is UB in C. So let's just fix it by always initializing pfn in the caller of track_pfn_copy(), and improving the documentation of track_pfn_copy(). While at it, clarify the doc of untrack_pfn_copy(), that internal checks make sure if we actually have to untrack anything. Link: https://lkml.kernel.org/r/20250408085950.976103-1-david@redhat.com Fixes: dc84bc2aba85 ("x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202503270941.IFILyNCX-lkp@intel.com/ Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11	locking/local_lock, mm: replace localtry_ helpers with local_trylock_t type	Alexei Starovoitov
	Partially revert commit 0aaddfb06882 ("locking/local_lock: Introduce localtry_lock_t"). Remove localtry_() helpers, since localtry_lock() name might be misinterpreted as "try lock". Introduce local_trylock[_irqsave]() helpers that only work with newly introduced local_trylock_t type. Note that attempt to use local_trylock[_irqsave]() with local_lock_t will cause compilation failure. Usage and behavior in !PREEMPT_RT: local_lock_t lock; // sizeof(lock) == 0 local_lock(&lock); // preempt disable local_lock_irqsave(&lock, ...); // irq save if (local_trylock_irqsave(&lock, ...)) // compilation error local_trylock_t lock; // sizeof(lock) == 4 local_lock(&lock); // preempt disable, acquired = 1 local_lock_irqsave(&lock, ...); // irq save, acquired = 1 if (local_trylock(&lock)) // if (!acquired) preempt disable, acquired = 1 if (local_trylock_irqsave(&lock, ...)) // if (!acquired) irq save, acquired = 1 The existing local_lock_() macros can be used either with local_lock_t or local_trylock_t. With local_trylock_t they set acquired = 1 while local_unlock_() clears it. In !PREEMPT_RT local_lock_irqsave(local_lock_t ) disables interrupts to protect critical section, but it doesn't prevent NMI, so the fully reentrant code cannot use local_lock_irqsave(local_lock_t ) for exclusive access. The local_lock_irqsave(local_trylock_t ) helper disables interrupts and sets acquired=1, so local_trylock_irqsave(local_trylock_t *) from NMI attempting to acquire the same lock will return false. In PREEMPT_RT local_lock_irqsave() maps to preemptible spin_lock(). Map local_trylock_irqsave() to preemptible spin_trylock(). When in hard IRQ or NMI return false right away, since spin_trylock() is not safe due to explicit locking in the underneath rt_spin_trylock() implementation. Removing this explicit locking and attempting only "trylock" is undesired due to PI implications. The local_trylock() without _irqsave can be used to avoid the cost of disabling/enabling interrupts by only disabling preemption, so local_trylock() in an interrupt attempting to acquire the same lock will return false. Note there is no need to use local_inc for acquired variable, since it's a percpu variable with strict nesting scopes. Note that guard(local_lock)(&lock) works only for "local_lock_t lock". The patch also makes sure that local_lock_release(l) is called before WRITE_ONCE(l->acquired, 0). Though IRQs are disabled at this point the local_trylock() from NMI will succeed and local_lock_acquire(l) will warn. Link: https://lkml.kernel.org/r/20250403025514.41186-1-alexei.starovoitov@gmail.com Fixes: 0aaddfb06882 ("locking/local_lock: Introduce localtry_lock_t") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11	vt: create ucs_recompose.c using gen_ucs_recompose.py	Nicolas Pitre
	This provides ucs_recompose() to recompose two Unicode characters into a single character if possible. This is needed for the VT to properly display decomposed UTF8 sequences. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Link: https://lore.kernel.org/r/20250410011839.64418-8-nico@fluxnic.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-11	vt: update ucs_width.c using gen_ucs_width.py	Nicolas Pitre
	This replaces ucs_width.c with the code generated by gen_ucs_width.py providing comprehensive tables for double-width and zero-width Unicode code points. Also make ucs_is_zero_width() effective. Note: scripts/checkpatch.pl complains about "... exceeds 100 columns". Please ignore. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Link: https://lore.kernel.org/r/20250410011839.64418-6-nico@fluxnic.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-11	vt: properly support zero-width Unicode code points	Nicolas Pitre
	Zero-width Unicode code points are causing misalignment in vertically aligned content, disrupting the visual layout. Let's handle zero-width code points more intelligently. Double-width code points are stored in the screen grid followed by a white space code point to create the expected screen layout. When a double-width code point is followed by a zero-width code point in the console incoming bytestream (e.g., an emoji with a presentation selector) then we may replace the white space padding by that zero-width code point instead of dropping it. This maximize screen content information while preserving proper layout. If a zero-width code point is preceded by a single-width code point then the above trick is not possible and such zero-width code point must be dropped. VS16 (Variation Selector 16, U+FE0F) is special as it doubles the width of the preceding single-width code point. We handle that case by giving VS16 a width of 1 when that happens. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Link: https://lore.kernel.org/r/20250410011839.64418-4-nico@fluxnic.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>