linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-03-21	mm: remove references to folio in __memcg_kmem_uncharge_page()	Matthew Wilcox (Oracle)
	This use of folios is misleading because these pages are not part of a folio. Remove an unnecessary call to page_folio(), saving 58 bytes of text in a Debian kernel build. Link: https://lkml.kernel.org/r/20250314133617.138071-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	mm: simplify folio_memcg_charged()	Matthew Wilcox (Oracle)
	There's no need to check which kind of pointer is in the memcg_data field, all we actually care about is whether it's zero or not. Saves 70 bytes in workingset_activation() with the Debian config. Link: https://lkml.kernel.org/r/20250314133617.138071-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	mm: remove references to folio in split_page_memcg()	Matthew Wilcox (Oracle)
	We know that the passed in page is not part of a folio (it's a plain page allocated with GFP_ACCOUNT), so we should get rid of the misleading references to folios. Introduce page_objcg() and page_set_objcg() helpers to make things more clear. Link: https://lkml.kernel.org/r/20250314133617.138071-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	mm: simplify split_page_memcg()	Matthew Wilcox (Oracle)
	The last argument to split_page_memcg() is now always 0, so remove it, effectively reverting commit b8791381d7ed. Link: https://lkml.kernel.org/r/20250314133617.138071-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	mm: separate folio_split_memcg_refs() from split_page_memcg()	Matthew Wilcox (Oracle)
	Patch series "Minor memcg cleanups & prep for memdescs", v2. Separate the handling of accounted folios and GFP_ACCOUNT pages for easier to understand code. For more detail, see https://lore.kernel.org/linux-mm/Z9LwTOudOlCGny3f@casper.infradead.org/ This patch (of 5): Folios always use memcg_data to refer to the mem_cgroup while pages allocated with GFP_ACCOUNT have a pointer to the obj_cgroup. Since the caller already knows what it has, split the function into two and then we don't need to check. Move the assignment of split folio memcg_data to the point where we set up the other parts of the new folio. That leaves folio_split_memcg_refs() just handling the memcg accounting. Link: https://lkml.kernel.org/r/20250314133617.138071-1-willy@infradead.org Link: https://lkml.kernel.org/r/20250314133617.138071-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	memcg: move do_memsw_account() to CONFIG_MEMCG_V1	Shakeel Butt
	The do_memsw_account() is used to enable or disable legacy memory+swap accounting in memory cgroup. However with disabled CONFIG_MEMCG_V1, we don't need to keep checking it. So, let's always return false for !CONFIG_MEMCG_V1 configs. Before the patch: $ size mm/memcontrol.o text data bss dec hex filename 49928 10736 4172 64836 fd44 mm/memcontrol.o After the patch: $ size mm/memcontrol.o text data bss dec hex filename 49430 10480 4172 64082 fa52 mm/memcontrol.o Link: https://lkml.kernel.org/r/20250312222552.3284173-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	memcg: avoid refill_stock for root memcg	Shakeel Butt
	We never charge the page counters of root memcg, so there is no need to put root memcg in the memcg stock. At the moment, refill_stock() can be called from try_charge_memcg(), obj_cgroup_uncharge_pages() and mem_cgroup_uncharge_skmem(). The try_charge_memcg() and mem_cgroup_uncharge_skmem() are never called with root memcg, so those are fine. However obj_cgroup_uncharge_pages() can potentially call refill_stock() with root memcg if the objcg object has been reparented over to the root memcg. Let's just avoid refill_stock() from obj_cgroup_uncharge_pages() for root memcg. Link: https://lkml.kernel.org/r/20250313054812.2185900-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhockoc@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	mm/mm_init: rename init_reserved_page to init_deferred_page	Mike Rapoport (Microsoft)
	When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, init_reserved_page() function performs initialization of a struct page that would have been deferred normally. Rename it to init_deferred_page() to better reflect what the function does. Link: https://lkml.kernel.org/r/20250225083017.567649-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Frank van der Linden <fvdl@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Changyuan Lyu <changyuanl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	mm/mm_init: rename __init_reserved_page_zone to __init_page_from_nid	Mike Rapoport (Microsoft)
	__init_reserved_page_zone() function finds the zone for pfn and nid and performs initialization of a struct page with that zone and nid. There is nothing in that function about reserved pages and it is misnamed. Rename it to __init_page_from_nid() to better reflect what the function does. Link: https://lkml.kernel.org/r/20250225083017.567649-2-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Frank van der Linden <fvdl@google.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	mm/cma: using per-CMA locks to improve concurrent allocation performance	Ge Yang
	For different CMAs, concurrent allocation of CMA memory ideally should not require synchronization using locks. Currently, a global cma_mutex lock is employed to synchronize all CMA allocations, which can impact the performance of concurrent allocations across different CMAs. To test the performance impact, follow these steps: 1. Boot the kernel with the command line argument hugetlb_cma=30G to allocate a 30GB CMA area specifically for huge page allocations. (note: on my machine, which has 3 nodes, each node is initialized with 10G of CMA) 2. Use the dd command with parameters if=/dev/zero of=/dev/shm/file bs=1G count=30 to fully utilize the CMA area by writing zeroes to a file in /dev/shm. 3. Open three terminals and execute the following commands simultaneously: (Note: Each of these commands attempts to allocate 10GB [2621440 * 4KB pages] of CMA memory.) On Terminal 1: time echo 2621440 > /sys/kernel/debug/cma/hugetlb1/alloc On Terminal 2: time echo 2621440 > /sys/kernel/debug/cma/hugetlb2/alloc On Terminal 3: time echo 2621440 > /sys/kernel/debug/cma/hugetlb3/alloc We attempt to allocate pages through the CMA debug interface and use the time command to measure the duration of each allocation. Performance comparison: Without this patch With this patch Terminal1 ~7s ~7s Terminal2 ~14s ~8s Terminal3 ~21s ~7s To solve problem above, we could use per-CMA locks to improve concurrent allocation performance. This would allow each CMA to be managed independently, reducing the need for a global lock and thus improving scalability and performance. Link: https://lkml.kernel.org/r/1739152566-744-1-git-send-email-yangge1116@126.com Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Aisheng Dong <aisheng.dong@nxp.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21	hwmon: emc2305: Use devm_thermal_of_cooling_device_register	Florin Leotescu
	Prepare the emc2305 driver to use configuration from Device Tree nodes. Switch to devm_thermal_of_cooling_device_register to simplify the cleanup procedure, allowing the removal of emc2305_unset_tz and emc2305_remove, which are no longer needed. Signed-off-by: Florin Leotescu <florin.leotescu@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250321143308.4008623-4-florin.leotescu@oss.nxp.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-03-21	hwmon: emc2305: Add OF support	Florin Leotescu
	Introduce OF support for Microchip emc2305 pwm fan controller. Signed-off-by: Florin Leotescu <florin.leotescu@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250321143308.4008623-3-florin.leotescu@oss.nxp.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-03-21	dt-bindings: hwmon: Add Microchip emc2305 support	Florin Leotescu
	Introduce yaml schema for Microchip emc2305 pwm fan controller. Signed-off-by: Florin Leotescu <florin.leotescu@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250321143308.4008623-2-florin.leotescu@oss.nxp.com [groeck: Fixed comment line length, added 'maxItems: 1' Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-03-21	pds_fwctl: add Documentation entries	Shannon Nelson
	Add pds_fwctl to the driver and fwctl documentation pages. Link: https://patch.msgid.link/r/20250320194412.67983-7-shannon.nelson@amd.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-21	pds_fwctl: add rpc and query support	Brett Creeley
	The pds_fwctl driver doesn't know what RPC operations are available in the firmware, so also doesn't know what scope they might have. The userland utility supplies the firmware "endpoint" and "operation" id values and this driver queries the firmware for endpoints and their available operations. The operation descriptions include the scope information which the driver uses for scope testing. Link: https://patch.msgid.link/r/20250320194412.67983-6-shannon.nelson@amd.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-21	pds_fwctl: initial driver framework	Shannon Nelson
	Initial files for adding a new fwctl driver for the AMD/Pensando PDS devices. This sets up a simple auxiliary_bus driver that registers with fwctl subsystem. It expects that a pds_core device has set up the auxiliary_device pds_core.fwctl Link: https://patch.msgid.link/r/20250320194412.67983-5-shannon.nelson@amd.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-22	crypto: testmgr - Add multibuffer acomp testing	Herbert Xu
	Add rudimentary multibuffer acomp testing. Testing coverage is extended to compression vectors only. However, as the compression vectors are compressed and then decompressed, this covers both compression and decompression. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-22	crypto: acomp - Fix synchronous acomp chaining fallback	Herbert Xu
	The synchronous acomp fallback code path is broken because the completion code path assumes that the state object is always set but this is only done for asynchronous algorithms. First of all remove the assumption on the completion code path by passing in req0 instead of the state. However, also remove the conditional setting of the state since it's always in the request object anyway. Fixes: b67a02600372 ("crypto: acomp - Add request chaining and virtual addresses") Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-22	crypto: testmgr - Add multibuffer hash testing	Herbert Xu
	This is based on a patch by Eric Biggers <ebiggers@google.com>. Add limited self-test for multibuffer hash code path. This tests only a single request in chain of a random length. The other requests are either all of the same length as the one being tested, or random lengths between 0 and PAGE_SIZE * 2 * XBUFSIZE. Potential extension include testing all requests rather than just the single one. Link: https://lore.kernel.org/all/20241001153718.111665-3-ebiggers@kernel.org/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-22	crypto: hash - Fix synchronous ahash chaining fallback	Herbert Xu
	The synchronous ahash fallback code paths are broken because the ahash_restore_req assumes there is always a state object. Fix this by removing the state from ahash_restore_req and localising it to the asynchronous completion callback. Also add a missing synchronous finish call in ahash_def_digest_finish. Fixes: f2ffe5a9183d ("crypto: hash - Add request chaining API") Fixes: 439963cdc3aa ("crypto: ahash - Add virtual address support") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-22	crypto: arm/ghash-ce - Remove SIMD fallback code path	Herbert Xu
	Remove the obsolete fallback code path for SIMD and remove the cryptd-based ghash-ce algorithm. Rename the shash algorithm to ghash-ce. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21	x86/hyperv: fix an indentation issue in mshyperv.h	Wei Liu
	Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503220640.hjiacW2C-lkp@intel.com/ Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-03-21	hwspinlock: Remove unused hwspin_lock_get_id()	Dr. David Alan Gilbert
	hwspin_lock_get_id() has been unused since the original 2011 commit bd9a4c7df256 ("drivers: hwspinlock: add framework") Remove it and the corresponding docs. Note that the of_hwspin_lock_get_id() version is still in use, so leave that alone. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20241215022023.181435-1-linux@treblig.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-03-21	hwspinlock: Remove unused (devm_)hwspin_lock_request()	Dr. David Alan Gilbert
	devm_hwspin_lock_request() was added by 2018's commit 4f1acd758b08 ("hwspinlock: Add devm_xxx() APIs to request/free hwlock") however, it's never been used, everyone uses the devm_hwspin_lock_request_specific() call instead. Remove it. Similarly, the none-devm variant isn't used. Remove it, and the referring documentation. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20241027205445.239108-1-linux@treblig.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-03-21	PCI/DOE: Allow enabling DOE without CXL	Alistair Francis
	PCIe devices (not CXL) can support DOE as well, so allow DOE to be enabled even if CXL isn't. Link: https://lore.kernel.org/r/20250306075211.1855177-4-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21	PCI/DOE: Expose DOE features via sysfs	Alistair Francis
	PCIe r6.0 added support for Data Object Exchange (DOE). When DOE is supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec 6.30.1.1. DOE allows a requester to obtain information about the other DOE features supported by the device. The kernel already queries the DOE features supported and caches the values. Expose the values in sysfs to allow user space to determine which DOE features are supported by the PCIe device. By exposing the information to userspace, tools like lspci can relay the information to users. By listing all of the supported features we can allow userspace to parse the list, which might include vendor specific features as well as yet to be supported features. As the DOE Discovery feature must always be supported we treat it as a special named attribute case. This allows the usual PCI attribute_group handling to correctly create the doe_features directory when registering pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group() will seg fault). After this patch is supported you can see something like this when attaching a DOE device: $ ls /sys/devices/pci0000:00/0000:00:02.0//doe* 0001:01 0001:02 doe_discovery Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me> [bhelgaas: drop pci_doe_sysfs_init() stub return, make DEVICE_ATTR_RO(doe_discovery) static] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21	media: dt-bindings: mediatek,vcodec-encoder: Drop assigned-clock properties	Rob Herring (Arm)
	The assigned-clock properties are always allowed on nodes with 'clocks' and generally not required. Additionally the mt8183 doesn't define them, so they must not be required in that case. Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250317214621.794674-1-robh@kernel.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-03-21	net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.	Kuniyuki Iwashima
	SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to br_ioctl_call(), which causes unnecessary RTNL dance and the splat below [0] under RTNL pressure. Let's say Thread A is trying to detach a device from a bridge and Thread B is trying to remove the bridge. In dev_ioctl(), Thread A bumps the bridge device's refcnt by netdev_hold() and releases RTNL because the following br_ioctl_call() also re-acquires RTNL. In the race window, Thread B could acquire RTNL and try to remove the bridge device. Then, rtnl_unlock() by Thread B will release RTNL and wait for netdev_put() by Thread A. Thread A, however, must hold RTNL after the unlock in dev_ifsioc(), which may take long under RTNL pressure, resulting in the splat by Thread B. Thread A (SIOCBRDELIF) Thread B (SIOCBRDELBR) ---------------------- ---------------------- sock_ioctl sock_ioctl `- sock_do_ioctl `- br_ioctl_call `- dev_ioctl `- br_ioctl_stub \|- rtnl_lock \| \|- dev_ifsioc ' ' \|- dev = __dev_get_by_name(...) \|- netdev_hold(dev, ...) . / \|- rtnl_unlock ------. \| \| \|- br_ioctl_call `---> \|- rtnl_lock Race \| \| `- br_ioctl_stub \|- br_del_bridge Window \| \| \| \|- dev = __dev_get_by_name(...) \| \| \| May take long \| `- br_dev_delete(dev, ...) \| \| \| under RTNL pressure \| `- unregister_netdevice_queue(dev, ...) \| \| \| \| `- rtnl_unlock \ \| \|- rtnl_lock <-' `- netdev_run_todo \| \|- ... `- netdev_run_todo \| `- rtnl_unlock \|- __rtnl_unlock \| \|- netdev_wait_allrefs_any \|- netdev_put(dev, ...) <----------------' Wait refcnt decrement and log splat below To avoid blocking SIOCBRDELBR unnecessarily, let's not call dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF. In the dev_ioctl() path, we do the following: 1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl() 2. Check CAP_NET_ADMIN in dev_ioctl() 3. Call dev_load() in dev_ioctl() 4. Fetch the master dev from ifr.ifr_name in dev_ifsioc() 3. can be done by request_module() in br_ioctl_call(), so we move 1., 2., and 4. to br_ioctl_stub(). Note that 2. is also checked later in add_del_if(), but it's better performed before RTNL. SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since the pre-git era, and there seems to be no specific reason to process them there. [0]: unregister_netdevice: waiting for wpan3 to become free. Usage count = 2 ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline] netdev_hold include/linux/netdevice.h:4311 [inline] dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624 dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826 sock_do_ioctl+0x1ca/0x260 net/socket.c:1213 sock_ioctl+0x23a/0x6c0 net/socket.c:1318 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl fs/ioctl.c:892 [inline] __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 893b19587534 ("net: bridge: fix ioctl locking") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: yan kang <kangyan91@outlook.com> Reported-by: yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250316192851.19781-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	Merge tag 'spi-fix-v6.14-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "This is a straightforward fix for a reference count leak in the rarely used SPI device mode functionality" * tag 'spi-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: Fix reference count leak in slave_show()
2025-03-21	libbpf: Add namespace for errstr making it libbpf_errstr	Ian Rogers
	When statically linking symbols can be replaced with those from other statically linked libraries depending on the link order and the hoped for "multiple definition" error may not appear. To avoid conflicts it is good practice to namespace symbols, this change renames errstr to libbpf_errstr. To avoid churn a #define is used to turn use of errstr(err) to libbpf_errstr(err). Fixes: 1633a83bf993 ("libbpf: Introduce errstr() for stringifying errno") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250320222439.1350187-1-irogers@google.com
2025-03-21	Merge tag 'regulator-fix-v6.14-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "More fixes than I'd like at this point, some of which is due to me cooking things in -next for a bit and resetting that cooking time as more fixes came in. - Christian Eggers fixed some race conditions with the dummy regulator not being available very early in boot due to the use of asynchronous probing, both the provider side (ensuring that it's availalbe) and consumer side (handling things if that goes wrong) are fixed - Ludvig Pärsson fixed some lockdep issues with the debugfs registration for regulators holding more locks than it really needs causing issues later when looking at the resulting debugfs.boot - Some device specific fixes for incorrect descriptions of the RTQ2208 from ChiYuan Huang" * tag 'regulator-fix-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: rtq2208: Fix the LDO DVS capability regulator: rtq2208: Fix incorrect buck converter phase mapping regulator: check that dummy regulator has been probed before using it regulator: dummy: force synchronous probing regulator: core: Fix deadlock in create_regulator()
2025-03-21	net: phy: realtek: disable PHY-mode EEE	Russell King (Oracle)
	Realtek RTL8211F has a "PHY-mode" EEE support which interferes with an IEEE 802.3 compliant implementation. This mode defaults to enabled, and results in the MAC receive path not seeing the link transition to LPI state. Fix this by disabling PHY-mode EEE. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1ttnHW-00785s-Uq@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net: phy: fix genphy_c45_eee_is_active() for disabled EEE	Russell King (Oracle)
	Commit 809265fe96fe ("net: phy: c45: remove local advertisement parameter from genphy_c45_eee_is_active") stopped reading the local advertisement from the PHY earlier in this development cycle, which broke "ethtool --set-eee ethX eee off". When ethtool is used to set EEE off, genphy_c45_eee_is_active() indicates that EEE was active if the link partner reported an advertisement, which causes phylib to set phydev->enable_tx_lpi on link up, despite our local advertisement in hardware being empty. However, phydev->advertising_eee is preserved while EEE is turned off, which leads to genphy_c45_eee_is_active() incorrectly reporting that EEE is active. Fix it by checking phydev->eee_cfg.eee_enabled, and if clear, immediately indicate that EEE is not active. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1ttmWN-0077Mb-Q6@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	Merge branch 'mlx5e-support-recovery-counter-in-reset'	Paolo Abeni
	Tariq Toukan says: ==================== mlx5e: Support recovery counter in reset This series by Yael adds a recovery counter in ethtool, for any recovery type during port reset cycle. Series starts with some cleanup and refactoring patches. New counter is added and exposed to ethtool stats in patch #4. ==================== Link: https://patch.msgid.link/1742112876-2890-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net/mlx5e: Expose port reset cycle recovery counter via ethtool	Yael Chemla
	Display recovery event of PPCNT recovery counters group. Counts (per link) the number of total successful recovery events of any recovery types during port reset cycle. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net/mlx5e: Get counter group size by FW capability	Yael Chemla
	Retrieve the number of fields supported by each PPCNT counter group based on the FW capability for this group. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742112876-2890-4-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net/mlx5e: Access PHY layer counter group as other counter groups	Yael Chemla
	Adjust the way physical layer counters group is accessed to match the generic method used for accessing other PPCNT counter groups. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1742112876-2890-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net/mlx5e: Ensure each counter group uses its PCAM bit	Yael Chemla
	The code was incorrectly relying on PCAM bit of ppcnt_statistical_group for accessing per_lane_error_counters. If ppcnt_statistical_group PCAM bit was not set, we would not read per_lane_error_counters, even when its PCAM bit is set. Given the existing device capabilities, it seems to cause no harm, so this change primarily serves as cleanup. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1742112876-2890-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	selftests: ublk: fix starting ublk device	Ming Lei
	Firstly ublk char device node may not be created by udev yet, so wait a while until it can be opened or timeout. Secondly delete created ublk device in case of start failure, otherwise the device becomes zombie. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250321135324.259677-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21	Merge tag 'pinctrl-v6.14-4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fix from Linus Walleij: - A single patch for Spacemit K1 fixing up the Kconfig to not default to "y" * tag 'pinctrl-v6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: spacemit: PINCTRL_SPACEMIT_K1 should not default to y unconditionally
2025-03-21	s390/pci: Support mmap() of PCI resources except for ISM devices	Niklas Schnelle
	So far s390 does not allow mmap() of PCI resources to user-space via the usual mechanisms, though it does use it for RDMA. For the PCI sysfs resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars for all devices. This is partly because access to mapped PCI resources from user-space requires special PCI load/store memory-I/O (MIO) instructions, or the special MMIO syscalls when these are not available. Still, such access is possible and useful not just for RDMA, in fact not being able to mmap() PCI resources has previously caused extra work when testing devices. One thing that doesn't work with PCI resources mapped to user-space though is the s390 specific virtual ISM device. Not only because the BAR size of 256 TiB prevents mapping the whole BAR but also because access requires use of the legacy PCI instructions which are not accessible to user-space on systems with the newer MIO PCI instructions. Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping its resources while making this functionality available for all other PCI devices. To this end introduce a minimal implementation of PCI_QUIRKS and use that to set pdev->non_mappable_bars for ISM devices only. Then also set ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic implementation of pci_mmap_resource_range() enabling only the newer sysfs mmap() interface. This follows the recommendation in Documentation/PCI/sysfs-pci.rst. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-3-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21	s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP	Niklas Schnelle
	The ability to map PCI resources to user-space is controlled by global defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and controls mapping of PCI resources using vfio-pci with a fallback option via the pread()/pwrite() interface. For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a generic implementation for mapping PCI resources plus the newer sysfs interface. Then there is HAVE_PCI_MMAP which can be used with custom definitions of pci_mmap_resource_range() and the historical /proc/bus/pci interface. Both mechanisms are all or nothing. For s390 mapping PCI resources is possible and useful for testing and certain applications such as QEMU's vfio-pci based user-space NVMe driver. For certain devices, however access to PCI resources via mappings to user-space is not possible and these must be excluded from the general PCI resource mapping mechanisms. Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can not be accessed via mappings to user-space. In the future this enables per-device restrictions of PCI resource mapping. For now, set this flag for all PCI devices on s390 in line with the existing, general disable of PCI resource mapping. As s390 is the only user of the VFI_PCI_MMAP Kconfig options this can already be replaced with a check of this new flag. Also add similar checks in the other code protected by HAVE_PCI_MMAP respectively ARCH_GENERIC_PCI_MMAP in preparation for enabling these for supported devices. Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/ Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21	s390/pci: Fix s390_mmio_read/write syscall page fault handling	Niklas Schnelle
	The s390 MMIO syscalls when using the classic PCI instructions do not cause a page fault when follow_pfnmap_start() fails due to the page not being present. Besides being a general deficiency this breaks vfio-pci's mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on first access. Fix this by following a failed follow_pfnmap_start() with fixup_user_page() and retrying the follow_pfnmap_start(). Also fix a VM_READ vs VM_WRITE mixup in the read syscall. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-1-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
2025-03-21	PCI: Fix NULL dereference in SR-IOV VF creation error path	Shay Drory
	Clean up when virtfn setup fails to prevent NULL pointer dereference during device removal. The kernel oops below occurred due to incorrect error handling flow when pci_setup_device() fails. Add pci_iov_scan_device(), which handles virtfn allocation and setup and cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need to call pci_stop_and_remove_bus_device(). This prevents accessing partially initialized virtfn devices during removal. BUG: kernel NULL pointer dereference, address: 00000000000000d0 RIP: 0010:device_del+0x3d/0x3d0 Call Trace: pci_remove_bus_device+0x7c/0x100 pci_iov_add_virtfn+0xfa/0x200 sriov_enable+0x208/0x420 mlx5_core_sriov_configure+0x6a/0x160 [mlx5_core] sriov_numvfs_store+0xae/0x1a0 Link: https://lore.kernel.org/r/20250310084524.599225-1-shayd@nvidia.com Fixes: e3f30d563a38 ("PCI: Make pci_destroy_dev() concurrent safe") Signed-off-by: Shay Drory <shayd@nvidia.com> [bhelgaas: commit log, return ERR_PTR(-ENOMEM) directly] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Keith Busch <kbusch@kernel.org>
2025-03-21	tracepoint: Print the function symbol when tracepoint_debug is set	Huang Shijie
	When tracepoint_debug is set, we may get the output in kernel log: [ 380.013843] Probe 0 : 00000000f0d68cda It is not readable, so change to print the function symbol. After this patch, the output may becomes: [ 55.225555] Probe 0 : perf_trace_sched_wakeup_template+0x0/0x20 Link: https://lore.kernel.org/20250307033858.4134-1-shijie@os.amperecomputing.com Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-21	io_uring/net: only import send_zc buffer once	Caleb Sander Mateos
	io_send_zc() guards its call to io_send_zc_import() with if (!done_io) in an attempt to avoid calling it redundantly on the same req. However, if the initial non-blocking issue returns -EAGAIN, done_io will stay 0. This causes the subsequent issue to unnecessarily re-import the buffer. Add an explicit flag "imported" to io_sr_msg to track if its buffer has already been imported. Clear the flag in io_send_zc_prep(). Call io_send_zc_import() and set the flag in io_send_zc() if it is unset. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 54cdcca05abd ("io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr") Link: https://lore.kernel.org/r/20250321184819.3847386-2-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21	io_uring/cmd: introduce io_uring_cmd_import_fixed_vec	Pavel Begunkov
	io_uring_cmd_import_fixed_vec() is a cmd helper around vectored registered buffer import functions, which caches the memory under the hood. The lifetime of the vectore and hence the iterator is bound to the request. Furthermore, the user is not allowed to call it multiple times for a single request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/97487a80dec3fb8cf8aeedf1f9026ef6d503fe4b.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21	io_uring/cmd: add iovec cache for commands	Pavel Begunkov
	Add iou_vec to commands and wire caching for it, but don't expose it to users just yet. We need the vec cleared on initial alloc, but since we can't place it at the beginning at the moment, zero the entire async_data. It's cached, and the performance effects only the initial allocation, and it might be not a bad idea since we're exposing those bits to outside drivers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c0f2145b75791bc6106eb4e72add2cf6a2c72a7a.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21	x86/hyperv: Add comments about hv_vpset and var size hypercall input args	Michael Kelley
	Current code varies in how the size of the variable size input header for hypercalls is calculated when the input contains struct hv_vpset. Surprisingly, this variation is correct, as different hypercalls make different choices for what portion of struct hv_vpset is treated as part of the variable size input header. The Hyper-V TLFS is silent on these details, but the behavior has been confirmed with Hyper-V developers. To avoid future confusion about these differences, add comments to struct hv_vpset, and to hypercall call sites with input that contains a struct hv_vpset. The comments describe the overall situation and the calculation that should be used at each particular call site. No functional change as only comments are updated. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250318214919.958953-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250318214919.958953-1-mhklinux@outlook.com>
2025-03-21	Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs	Nuno Das Neves
	Provide a set of IOCTLs for creating and managing child partitions when running as root partition on Hyper-V. The new driver is enabled via CONFIG_MSHV_ROOT. A brief overview of the interface: MSHV_CREATE_PARTITION is the entry point, returning a file descriptor representing a child partition. IOCTLs on this fd can be used to map memory, create VPs, etc. Creating a VP returns another file descriptor representing that VP which in turn has another set of corresponding IOCTLs for running the VP, getting/setting state, etc. MSHV_ROOT_HVCALL is a generic "passthrough" hypercall IOCTL which can be used for a number of partition or VP hypercalls. This is for hypercalls that do not affect any state in the kernel driver, such as getting and setting VP registers and partition properties, translating addresses, etc. It is "passthrough" because the binary input and output for the hypercall is only interpreted by the VMM - the kernel driver does nothing but insert the VP and partition id where necessary (which are always in the same place), and execute the hypercall. Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Co-developed-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Co-developed-by: Muminul Islam <muislam@microsoft.com> Signed-off-by: Muminul Islam <muislam@microsoft.com> Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Co-developed-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com>