Age | Commit message | Author
2025-06-13 | drm/nouveau/gsp: Fix potential integer overflow on integer shifts | Colin Ian King
The left shift of the 32-bit integer constant 1 is evaluated using 32-bit arithmetic and then assigned to a 64-bit unsigned integer. If the shift is 32 or more, this can overflow. Avoid this by shifting using the BIT_ULL macro instead. Fixes: 6c3ac7bcfcff ("drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250522131512.2768310-1-colin.i.king@gmail.com
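As a minimal userspace sketch of the pattern this commit describes: BIT_ULL() is redefined locally so the code builds outside the kernel tree, and level_size() is a hypothetical helper, not a function from the nouveau driver.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT_ULL(); defined here only so the
 * sketch compiles outside the kernel tree. */
#define BIT_ULL(nr) (1ULL << (nr))

/*
 * Hypothetical helper: number of bytes covered by a page-table level with
 * the given address shift.
 *
 * Writing "1 << shift" here would perform the shift in 32-bit int
 * arithmetic, which is undefined for shift >= 32, and only then widen the
 * result to 64 bits. BIT_ULL() makes the constant 64 bits wide first.
 */
static inline uint64_t level_size(unsigned int shift)
{
	return BIT_ULL(shift);
}
```

With this form, level_size(40) yields 2^40, which a 32-bit shift cannot represent.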
2025-06-13 | genirq/irq_sim: Initialize work context pointers properly | Gyeyoung Baek
Initialize `ops` member's pointers properly by using kzalloc() instead of kmalloc() when allocating the simulation work context. Otherwise the pointers contain random content leading to invalid dereferencing. Signed-off-by: Gyeyoung Baek <gye976@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250612124827.63259-1-gye976@gmail.com
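The same idea can be sketched in userspace, with calloc() standing in for kzalloc() and malloc() for kmalloc(). The struct layout and the names sim_ops, sim_work_ctx, sim_ctx_alloc() and sim_ctx_request() are illustrative assumptions, not the irq_sim definitions.

```c
#include <stdlib.h>

/* Hypothetical callback table, loosely modelled on an "ops" member. */
struct sim_ops {
	void (*request)(void *data);
	void (*release)(void *data);
};

/* Hypothetical work context that embeds the callback table. */
struct sim_work_ctx {
	struct sim_ops ops;
	void *user_data;
};

/*
 * Allocate a zero-filled context. With malloc() the two function pointers
 * would hold indeterminate values, so the NULL check in sim_ctx_request()
 * could pass by accident and jump to a garbage address.
 */
static struct sim_work_ctx *sim_ctx_alloc(void)
{
	return calloc(1, sizeof(struct sim_work_ctx));
}

/* Invoke the optional callback only if the caller installed one. */
static void sim_ctx_request(struct sim_work_ctx *ctx)
{
	if (ctx->ops.request)
		ctx->ops.request(ctx->user_data);
}
```

The sketch relies on all-bits-zero being a null pointer, which holds on the platforms Linux supports.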
2025-06-13 | genirq/cpuhotplug: Restore affinity even for suspended IRQ | Brian Norris
Commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug") tried to make managed shutdown/startup properly reference counted, but it missed the fact that the unplug and hotplug code has an intentional imbalance by skipping IRQS_SUSPENDED interrupts on the "restore" path. This means that if a managed-affinity interrupt was both suspended and managed-shutdown (such as may happen during system suspend / S3), resume skips calling irq_startup_managed(), leaving the depth unbalanced again, this time with a positive value (i.e., remaining unexpectedly masked). This IRQS_SUSPENDED check was introduced in commit a60dd06af674 ("genirq/cpuhotplug: Skip suspended interrupts when restoring affinity") for essentially the same reason as commit 788019eb559f, to prevent irq_startup() from unconditionally re-enabling an interrupt too early. Because irq_startup_managed() now respects the disable-depth count, the IRQS_SUSPENDED check is no longer needed, and instead, it causes harm. Thus, drop the IRQS_SUSPENDED check, and restore balance. This effectively reverts commit a60dd06af674 ("genirq/cpuhotplug: Skip suspended interrupts when restoring affinity"), because it is replaced by commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug"). Fixes: 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug") Reported-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com> Link: https://lore.kernel.org/all/20250612183303.3433234-3-briannorris@chromium.org Closes: https://lore.kernel.org/lkml/24ec4adc-7c80-49e9-93ee-19908a97ab84@gmail.com/
2025-06-13 | genirq/cpuhotplug: Rebalance managed interrupts across multi-CPU hotplug | Brian Norris
Commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug") intended to only decrement the disable depth once per managed shutdown, but instead it decrements for each CPU hotplug in the affinity mask, until its depth reaches a point where it finally gets re-started. For example, consider: 1. Interrupt is affine to CPU {M,N} 2. disable_irq() -> depth is 1 3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1 4. CPU N goes offline -> irq_shutdown() / depth is 2 5. CPU N goes online -> irq_restore_affinity_of_irq() -> irqd_is_managed_and_shutdown()==true -> irq_startup_managed() -> depth is 1 6. CPU M goes online -> irq_restore_affinity_of_irq() -> irqd_is_managed_and_shutdown()==true -> irq_startup_managed() -> depth is 0 *** BUG: driver expects the interrupt is still disabled *** -> irq_startup() -> irqd_clr_managed_shutdown() 7. enable_irq() -> depth underflow / unbalanced enable_irq() warning This should clear the managed-shutdown flag at step 6, so that further hotplugs don't cause further imbalance. Note: It might be cleaner to also remove the irqd_clr_managed_shutdown() invocation from __irq_startup_managed(). But this is currently not possible because of irq_update_affinity_desc() as it sets IRQD_MANAGED_SHUTDOWN and expects irq_startup() to clear it. Fixes: 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug") Reported-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com> Link: https://lore.kernel.org/all/20250612183303.3433234-2-briannorris@chromium.org
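The seven steps above can be replayed with a toy model. This is a heavily simplified sketch, not the genirq code: struct irq_model, the model_*() helpers and the "fixed" switch are all made up for illustration, and only the depth counter and the managed-shutdown flag are modelled.

```c
#include <stdbool.h>

#define CPU_M 0x1u
#define CPU_N 0x2u

/* Toy state; every field name here is an assumption for illustration. */
struct irq_model {
	unsigned int affinity;	/* CPUs the interrupt may target */
	unsigned int online;	/* CPUs currently online */
	int depth;		/* disable depth, 0 means enabled */
	bool managed_shutdown;	/* stands in for IRQD_MANAGED_SHUTDOWN */
	bool unbalanced;	/* set when enable is called at depth 0 */
};

static void model_disable_irq(struct irq_model *irq)
{
	irq->depth++;
}

static void model_enable_irq(struct irq_model *irq)
{
	if (irq->depth == 0) {
		irq->unbalanced = true;	/* "unbalanced enable" warning */
		return;
	}
	irq->depth--;
}

static void model_cpu_offline(struct irq_model *irq, unsigned int cpu)
{
	irq->online &= ~cpu;
	/* Last CPU of the affinity mask went away: managed shutdown. */
	if (!(irq->affinity & irq->online) && !irq->managed_shutdown) {
		irq->managed_shutdown = true;
		irq->depth++;
	}
}

static void model_cpu_online(struct irq_model *irq, unsigned int cpu, bool fixed)
{
	irq->online |= cpu;
	if (!(irq->affinity & cpu) || !irq->managed_shutdown)
		return;
	/* Fixed behaviour: clear the flag on the first managed startup. */
	if (fixed)
		irq->managed_shutdown = false;
	irq->depth--;
	/* Old behaviour: the flag is only cleared once depth reaches 0. */
	if (irq->depth == 0)
		irq->managed_shutdown = false;
}

/* Replays steps 1-7; returns true if enable_irq() ends up unbalanced. */
static bool model_run(bool fixed)
{
	struct irq_model irq = {
		.affinity = CPU_M | CPU_N,
		.online = CPU_M | CPU_N,
	};

	model_disable_irq(&irq);		/* step 2: depth 1 */
	model_cpu_offline(&irq, CPU_M);		/* step 3: depth 1 */
	model_cpu_offline(&irq, CPU_N);		/* step 4: depth 2 */
	model_cpu_online(&irq, CPU_N, fixed);	/* step 5: depth 1 */
	model_cpu_online(&irq, CPU_M, fixed);	/* step 6 */
	model_enable_irq(&irq);			/* step 7 */
	return irq.unbalanced;
}
```

In this model, model_run(false) reaches depth 0 at step 6 and trips the unbalanced-enable path at step 7, while model_run(true) leaves the depth at 1 until the driver's own enable call.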
2025-06-13 | ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard | Niklas Cassel
A user has bisected a regression which causes graphical corruption on his screen to commit 7627a0edef54 ("ata: ahci: Drop low power policy board type"). Simply reverting commit 7627a0edef54 ("ata: ahci: Drop low power policy board type") makes the graphical corruption on his screen go away. (Note: there are no visible messages in dmesg that indicate a problem with AHCI.) The user also reports that the problem occurs regardless of whether an HDD or an SSD is connected via AHCI, so the problem is not device related. The devices also work fine on other motherboards, so it seems specific to the ASUSPRO-D840SA motherboard. While enabling low power modes for AHCI is not supposed to affect completely unrelated hardware, like a graphics card, it does however allow the system to enter deeper PC-states, which could expose ACPI issues that were previously not visible (because the system never entered these lower power states before). There are previous examples where enabling LPM exposed serious BIOS/ACPI bugs, see e.g. commit 240630e61870 ("ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS"). Since there hasn't been any BIOS update in years for the ASUSPRO-D840SA motherboard, disable LPM for this board, in order to avoid entering lower PC-states, which triggers graphical corruption. Cc: stable@vger.kernel.org Reported-by: Andy Yang <andyybtc79@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220111 Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans de Goede <hansg@kernel.org> Link: https://lore.kernel.org/r/20250612141750.2108342-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-06-13 | block: Fix bvec_set_folio() for very large folios | Matthew Wilcox (Oracle)
Similarly to 26064d3e2b4d ("block: fix adding folio to bio"), if we attempt to add a folio that is larger than 4GB, we'll silently truncate the offset and len. Widen the parameters to size_t, assert that the length is less than 4GB and set the first page that contains the interesting data rather than the first page of the folio. Fixes: 26db5ee15851 (block: add a bvec_set_folio helper) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20250612144255.2850278-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
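The arithmetic behind this fix can be sketched in userspace. The helper names, struct span_start, and the 4 KiB MODEL_PAGE_SIZE are assumptions for illustration and do not reproduce the kernel's bvec_set_folio().

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumed page size for the example */

/* Where the interesting data starts, expressed per page. */
struct span_start {
	uint64_t page_index;		/* index of the page inside the folio */
	uint32_t offset_in_page;	/* always below MODEL_PAGE_SIZE */
};

/* What a 32-bit offset parameter would have received: the low 32 bits. */
static uint32_t truncated_offset(uint64_t offset)
{
	return (uint32_t)offset;
}

/*
 * Split a wide byte offset into (page, offset within page). Pointing at
 * the page that holds the data keeps the stored offset small, so it fits
 * in 32 bits no matter how large the folio is.
 */
static struct span_start locate(uint64_t offset)
{
	struct span_start s = {
		.page_index = offset / MODEL_PAGE_SIZE,
		.offset_in_page = (uint32_t)(offset % MODEL_PAGE_SIZE),
	};
	return s;
}
```

For an offset of 5 GiB + 100 bytes, truncated_offset() silently returns 1 GiB + 100, whereas locate() returns page 1310720 with an in-page offset of 100.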
2025-06-13 | bio: Fix bio_first_folio() for SPARSEMEM without VMEMMAP | Matthew Wilcox (Oracle)
It is possible for physically contiguous folios to have discontiguous struct pages if SPARSEMEM is enabled and SPARSEMEM_VMEMMAP is not. This is correctly handled by folio_page_idx(), so remove this open-coded implementation. Fixes: 640d1930bef4 (block: Add bio_for_each_folio_all()) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20250612144126.2849931-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-13 | Revert "platform/x86: alienware-wmi-wmax: Add G-Mode support to Alienware m16 R1" | Kurt Borja
This reverts commit 5ff79cabb23a2f14d2ed29e9596aec908905a0e6. Although the Alienware m16 R1 AMD model supports G-Mode, it actually has a lower power ceiling than plain "performance" profile, which results in lower performance. Reported-by: Cihan Ozakca <cozakca@outlook.com> Cc: stable@vger.kernel.org # 6.15.x Signed-off-by: Kurt Borja <kuurtb@gmail.com> Link: https://lore.kernel.org/r/20250611-m16-rev-v1-1-72d13bad03c9@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-13 | platform/x86/amd/pmc: Add PCSpecialist Lafite Pro V 14M to 8042 quirks list | Mario Limonciello
Every other s2idle cycle fails to reach hardware sleep when keyboard wakeup is enabled. This appears to be an EC bug, but the vendor refuses to fix it. It was confirmed that turning off i8042 wakeup avoids this issue (albeit keyboard wakeup is disabled). Take the lesser of two evils and add it to the i8042 quirk list. Reported-by: Raoul <ein4rth@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220116 Tested-by: Raoul <ein4rth@gmail.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250611203341.3733478-1-superm1@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-13 | spi: spi-pci1xxxx: Drop MSI-X usage as unsupported by DMA engine | Thangaraj Samynathan
Removes MSI-X from the interrupt request path, as the DMA engine used by the SPI controller does not support MSI-X interrupts. Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Link: https://patch.msgid.link/20250612023059.71726-1-thangaraj.s@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-06-13 | ASoC: apple: mca: Drop default ARCH_APPLE in Kconfig | Sven Peter
When the first driver for Apple Silicon was upstreamed we accidentally included `default ARCH_APPLE` in its Kconfig which then spread to almost every subsequent driver. As soon as ARCH_APPLE is set to y this will pull in many drivers as built-ins which is not what we want. Thus, drop `default ARCH_APPLE` from Kconfig. Signed-off-by: Sven Peter <sven@kernel.org> Reviewed-by: Janne Grunau <j@jannau.net> Link: https://patch.msgid.link/20250612-apple-kconfig-defconfig-v1-10-0e6f9cb512c1@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-06-13 | crypto: testmgr - reinstate kconfig control over full self-tests | Eric Biggers
Commit 698de822780f ("crypto: testmgr - make it easier to enable the full set of tests") removed support for building kernels that run only the "fast" set of crypto self-tests by default. This assumed that nearly everyone actually wanted the full set of tests, *if* they had already chosen to enable the tests at all. Unfortunately, it turns out that both Debian and Fedora intentionally have the crypto self-tests enabled in their production kernels. And for production kernels we do need to keep the testing time down, which implies just running the "fast" tests, not the full set of tests. For Fedora, a reason for enabling the tests in production is that they are being (mis)used to meet the FIPS 140-3 pre-operational testing requirement. However, the other reason for enabling the tests in production, which applies to both distros, is that they provide some value in protecting users from buggy drivers. Unfortunately, the crypto/ subsystem has many buggy and untested drivers for off-CPU hardware accelerators on rare platforms. These broken drivers get shipped to users, and there have been multiple examples of the tests preventing these buggy drivers from being used. So effectively, the tests are being relied on in production kernels. I think this is kind of crazy (untested drivers should just not be enabled at all), but that seems to be how things work currently. Thus, reintroduce a kconfig option that controls the level of testing. Call it CRYPTO_SELFTESTS_FULL instead of the original name CRYPTO_MANAGER_EXTRA_TESTS, which was slightly misleading. Moreover, given the "production kernel" use case, make CRYPTO_SELFTESTS depend on EXPERT instead of DEBUG_KERNEL. I also haven't reinstated all the #ifdefs in crypto/testmgr.c. Instead, just rely on the compiler to optimize out unused code. 
Fixes: 40b9969796bf ("crypto: testmgr - replace CRYPTO_MANAGER_DISABLE_TESTS with CRYPTO_SELFTESTS") Fixes: 698de822780f ("crypto: testmgr - make it easier to enable the full set of tests") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-06-13 | ALSA: usb-audio: Rename ALSA kcontrol PCM and PCM1 for the KTMicro sound card | wangdicheng
PCM1 is not in PulseAudio's control list; standardize the control names to "Speaker" and "Headphone". Signed-off-by: wangdicheng <wangdicheng@kylinos.cn> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20250613063636.239683-1-wangdich9700@163.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-06-13 | perf/x86/intel: Fix crash in icl_update_topdown_event() | Kan Liang
The perf_fuzzer found a hard-lockup crash on a RaptorLake machine: Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000 CPU: 23 UID: 0 PID: 0 Comm: swapper/23 Tainted: [W]=WARN Hardware name: Dell Inc. Precision 9660/0VJ762 RIP: 0010:native_read_pmc+0x7/0x40 Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ... RSP: 000:fffb03100273de8 EFLAGS: 00010046 .... Call Trace: <TASK> icl_update_topdown_event+0x165/0x190 ? ktime_get+0x38/0xd0 intel_pmu_read_event+0xf9/0x210 __perf_event_read+0xf9/0x210 CPUs 16-23 are E-core CPUs that don't support the perf metrics feature. The icl_update_topdown_event() should not be invoked on these CPUs. It's a regression of commit: f9bdf1f95339 ("perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample read") The bug introduced by that commit is that the is_topdown_event() function is mistakenly used to replace the is_topdown_count() call to check if the topdown functions for the perf metrics feature should be invoked. Fix it. Fixes: f9bdf1f95339 ("perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample read") Closes: https://lore.kernel.org/lkml/352f0709-f026-cd45-e60c-60dfd97f73f3@maine.edu/ Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org # v6.15+ Link: https://lore.kernel.org/r/20250612143818.2889040-1-kan.liang@linux.intel.com
2025-06-13 | powerpc: dts: mpc8315erdb: Add GPIO controller node | J. Neuschäfer
The MPC8315E SoC and variants have a GPIO controller at IMMR + 0xc00. This node was previously missing from the device tree. Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250611-mpc-gpio-v1-1-02d1f75336e2@posteo.net
2025-06-13 | powerpc/microwatt: Fix model property in device tree | J. Neuschäfer
The standard property for the model name is called "model". Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250611-microwatt-v2-1-80847bbc5f9c@posteo.net
2025-06-13 | powerpc/eeh: Fix missing PE bridge reconfiguration during VFIO EEH recovery | Narayana Murty N
VFIO EEH recovery for PCI passthrough devices fails on PowerNV and pseries platforms due to missing host-side PE bridge reconfiguration. In the current implementation, eeh_pe_configure() only performs RTAS or OPAL-based bridge reconfiguration for native host devices, but skips it entirely for PEs managed through VFIO in guest passthrough scenarios. This leads to incomplete EEH recovery when a PCI error affects a passthrough device assigned to a QEMU/KVM guest. Although VFIO triggers the EEH recovery flow through VFIO_EEH_PE_ENABLE ioctl, the platform-specific bridge reconfiguration step is silently bypassed. As a result, the PE's config space is not fully restored, causing subsequent config space access failures or EEH freeze-on-access errors inside the guest. This patch fixes the issue by ensuring that eeh_pe_configure() always invokes the platform's configure_bridge() callback (e.g., pseries_eeh_phb_configure_bridge) even for VFIO-managed PEs. This ensures that RTAS or OPAL calls to reconfigure the PE bridge are correctly issued on the host side, restoring the PE's configuration space after an EEH event. This fix is essential for reliable EEH recovery in QEMU/KVM guests using VFIO PCI passthrough on PowerNV and pseries systems. Tested with: - QEMU/KVM guest using VFIO passthrough (IBM Power9,(lpar)Power11 host) - Injected EEH errors with pseries EEH errinjct tool on host, recovery verified on qemu guest. - Verified successful config space access and CAP_EXP DevCtl restoration after recovery Fixes: 212d16cdca2d ("powerpc/eeh: EEH support for VFIO PCI device") Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250508062928.146043-1-nnmlinux@linux.ibm.com
2025-06-13 | powerpc/vdso: Fix build of VDSO32 with pcrel | Christophe Leroy
Building vdso32 on power10 with pcrel leads to following errors: VDSO32A arch/powerpc/kernel/vdso/gettimeofday-32.o arch/powerpc/kernel/vdso/gettimeofday.S: Assembler messages: arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: syntax error; found `@', expected `,' arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here arch/powerpc/kernel/vdso/gettimeofday.S:40: Error: junk at end of line: `@notoc' arch/powerpc/kernel/vdso/gettimeofday.S:71: Info: macro invoked from here ... make[2]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/gettimeofday-32.o] Error 1 make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2 Once the above is fixed, the following happens: VDSO32C arch/powerpc/kernel/vdso/vgettimeofday-32.o cc1: error: '-mpcrel' requires '-mcmodel=medium' make[2]: *** [arch/powerpc/kernel/vdso/Makefile:89: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1 make[1]: *** [arch/powerpc/Makefile:388: vdso_prepare] Error 2 make: *** [Makefile:251: __sub-make] Error 2 Make sure pcrel version of CFUNC() macro is used only for powerpc64 builds and remove -mpcrel for powerpc32 builds. Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/1fa3453f07d42a50a70114da9905bf7b73304fca.1747073669.git.christophe.leroy@csgroup.eu
2025-06-13 | drm/mgag200: Do not include <linux/export.h> | Thomas Zimmermann
Fix the compile-time warning drivers/gpu/drm/mgag200/mgag200_ddc.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250612085308.203861-1-tzimmermann@suse.de
2025-06-13 | drm/ast: Do not include <linux/export.h> | Thomas Zimmermann
Fix the compile-time warning drivers/gpu/drm/ast/ast_mode.c: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20250612084257.200907-1-tzimmermann@suse.de
2025-06-13 | Merge tag 'drm-misc-fixes-2025-06-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes | Dave Airlie
drm-misc-fixes for v6.16-rc2: - Fix infinite EPROBE_DEFER loop in vc4 probing. - Fix amdxdna firmware size. - mode fixes for meson. - Kconfig fix for st7171-i2c. - Fix -EBUSY WARN_ON_ONCE in dma-buf - Use dma_sync_sgtable_for_cpu in udmabuf. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/62c06195-8bc1-4dae-8777-e86d94e4d9d9@linux.intel.com
2025-06-12 | mm: add mmap_prepare() compatibility layer for nested file systems | Lorenzo Stoakes
Nested file systems, that is those which invoke call_mmap() within their own f_op->mmap() handlers, may encounter underlying file systems which provide the f_op->mmap_prepare() hook introduced by commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). We have a chicken-and-egg scenario here - until all file systems are converted to using .mmap_prepare(), we cannot convert these nested handlers, as we can't call f_op->mmap from an .mmap_prepare() hook. So we have to do it the other way round - invoke the .mmap_prepare() hook from an .mmap() one. In order to do so, we need to convert VMA state into a struct vm_area_desc descriptor, invoking the underlying file system's f_op->mmap_prepare() callback passing a pointer to this, and then setting VMA state accordingly and safely. This patch achieves this via the compat_vma_mmap_prepare() function, which we invoke from call_mmap() if f_op->mmap_prepare() is specified in the passed in file pointer. We place the fundamental logic into mm/vma.h where VMA manipulation belongs. We also update the VMA userland tests to accommodate the changes. The compat_vma_mmap_prepare() function and its associated machinery is temporary, and will be removed once the conversion of file systems is complete. We carefully place this code so it can be used with CONFIG_MMU and also with cutting edge nommu silicon. [akpm@linux-foundation.org: export compat_vma_mmap_prepare to fix build] [lorenzo.stoakes@oracle.com: remove unused declarations] Link: https://lkml.kernel.org/r/ac3ae324-4c65-432a-8c6d-2af988b18ac8@lucifer.local Link: https://lkml.kernel.org/r/20250609165749.344976-1-lorenzo.stoakes@oracle.com Fixes: c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Jann Horn <jannh@google.com> Closes: https://lore.kernel.org/linux-mm/CAG48ez04yOEVx1ekzOChARDDBZzAKwet8PEoPM4Ln3_rk91AzQ@mail.gmail.com/ Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-12 | tools/bpf_jit_disasm: Fix potential negative tpath index in get_exec_path() | Ruslan Semchenko
If readlink() fails, len will be -1, which can cause negative indexing and undefined behavior. This patch ensures that len is set to 0 on readlink failure, preventing such issues. Signed-off-by: Ruslan Semchenko <uncleruc2075@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250612131816.1870-1-uncleruc2075@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
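A minimal sketch of the guarded pattern, using the POSIX readlink() call. The wrapper read_link_target() is a hypothetical name and is not the get_exec_path() function from the tool.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: copy the target of a symlink into buf and
 * NUL-terminate it. Returns the number of bytes stored, 0 on failure.
 */
static size_t read_link_target(const char *path, char *buf, size_t size)
{
	ssize_t len;

	if (size == 0)
		return 0;

	/* readlink() does not NUL-terminate, so leave room for one byte. */
	len = readlink(path, buf, size - 1);

	/* On failure len is -1; using it as an index would write buf[-1]. */
	if (len < 0)
		len = 0;

	buf[len] = '\0';
	return (size_t)len;
}
```

Called on a path that does not exist, the wrapper returns 0 and leaves an empty string in the buffer instead of writing before its start.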
2025-06-12 | Merge branch 'bpf-fix-a-few-test-failures-with-64k-page-size' | Alexei Starovoitov
Yonghong Song says: ==================== bpf: Fix a few test failures with 64K page size Changelog: v2 -> v3: - v2: https://lore.kernel.org/bpf/20250611171519.2033193-1-yonghong.song@linux.dev/ - Add additional comments for xdp_adjust_tail test. - Use actual kernel page size to set new_len for Patch 2 tests. v1 -> v2: - v1: https://lore.kernel.org/bpf/20250608165534.1019914-1-yonghong.song@linux.dev/ - For xdp_adjust_tail, let kernel test_run can handle various page sizes for xdp progs. - For two change_tail tests, make code easier to understand. - Resolved a new test failure (xdp_do_redirect). ==================== Link: https://patch.msgid.link/20250612035027.2207299-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 | selftests/bpf: Fix xdp_do_redirect failure with 64KB page size | Yonghong Song
On arm64 with 64KB page size, the selftest xdp_do_redirect failed like below: ... test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec test_max_pkt_size:PASS:prog_run_max_size 0 nsec test_max_pkt_size:FAIL:prog_run_too_big unexpected prog_run_too_big: actual -28 != expected -22 With 64KB page size, the xdp frame size will be much bigger so the existing test will fail. Adjust various parameters so the test can also work on 64K page size. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612035042.2208630-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 | selftests/bpf: Fix two net related test failures with 64K page size | Yonghong Song
When running BPF selftests on arm64 with a 64K page size, I encountered the following two test failures: sockmap_basic/sockmap skb_verdict change tail:FAIL tc_change_tail:FAIL With further debugging, I identified the root cause in the following kernel code within __bpf_skb_change_tail(): u32 max_len = BPF_SKB_MAX_LEN; u32 min_len = __bpf_skb_min_len(skb); int ret; if (unlikely(flags || new_len > max_len || new_len < min_len)) return -EINVAL; With a 4K page size, new_len = 65535 and max_len = 16064, the function returns -EINVAL. However, with a 64K page size, max_len increases to 261824, allowing execution to proceed further in the function. This is because BPF_SKB_MAX_LEN scales with the page size and larger page sizes result in higher max_len values. Updating the new_len parameter in both tests based on actual kernel page size resolved both failures. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612035037.2207911-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
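The bounds check quoted above can be modelled in userspace. Two things here are assumptions rather than kernel facts: the formula 4 * page_size minus a fixed 320-byte overhead is inferred only from the two max_len values given in the message (16064 and 261824), and the model_*() helper names are made up.

```c
#include <errno.h>
#include <stdint.h>

/* Overhead inferred from the quoted numbers: 4 * 4096 - 16064 = 320. */
#define MODEL_SKB_OVERHEAD 320u

/* Modelled upper bound for a given page size (assumed formula). */
static uint32_t model_skb_max_len(uint32_t page_size)
{
	return (page_size << 2) - MODEL_SKB_OVERHEAD;
}

/* Same shape as the quoted check, minus the flags test. */
static int model_change_tail_check(uint32_t new_len, uint32_t min_len,
				   uint32_t page_size)
{
	uint32_t max_len = model_skb_max_len(page_size);

	if (new_len > max_len || new_len < min_len)
		return -EINVAL;
	return 0;
}

/* A length that is rejected regardless of page size: one past the bound. */
static uint32_t model_too_long_len(uint32_t page_size)
{
	return model_skb_max_len(page_size) + 1;
}
```

A fixed new_len of 65535 is rejected with 4K pages but accepted with 64K pages, which is why a test that expects -EINVAL has to derive its length from the running kernel's page size.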
2025-06-12 | bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K | Yonghong Song
The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on arm64 with 64KB page: xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on when constructing frags, with 64K page size, the frag data_len could be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail(). To fix the failure, set xdp->frame_sz to PAGE_SIZE so the kernel can test different page sizes properly. With the kernel change, the user space code and the bpf prog need adjustment. Currently, the MAX_SKB_FRAGS default value is 17, so for a 4K page, the maximum packet size will be less than 68K. To test a 64K page, a maximum packet size bigger than 68K is desired. So two different functions are implemented for subtest xdp_adjust_frags_tail_grow. Depending on the page size, different data input/output sizes are used. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 | ionic: Prevent driver/fw getting out of sync on devcmd(s) | Brett Creeley
Some stress/negative firmware testing around devcmd(s) returning EAGAIN found that the done bit could get out of sync in the firmware when it wasn't cleared in a retry case. While here, change the type of the local done variable to a bool to match the return type from ionic_dev_cmd_done(). Fixes: ec8ee714736e ("ionic: stretch heartbeat detection") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250609212827.53842-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-13 | Merge tag 'drm-xe-fixes-2025-06-12' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes | Dave Airlie
Driver Changes: - Fix regression disallowing 64K SVM migration (Maarten) - Use a bounce buffer for WA BB (Lucas) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/aEsBQoh5Si3ouPgE@fedora
2025-06-12 | SUNRPC: Cleanup/fix initial rq_pages allocation | Benjamin Coddington
While investigating some reports of memory-constrained NUMA machines failing to mount v3 and v4.0 nfs mounts, we found that svc_init_buffer() was not attempting to retry allocations from the bulk page allocator. Typically, this results in a single page allocation being returned and the mount attempt fails with -ENOMEM. A retry would have allowed the mount to succeed. Additionally, it seems that the bulk allocation in svc_init_buffer() is redundant because svc_alloc_arg() will perform the required allocation and does the correct thing to retry the allocations. The call to allocate memory in svc_alloc_arg() drops the preferred node argument, but I expect we'll still allocate on the preferred node because the allocation call happens within the svc thread context, which chooses the node with memory closest to the current thread's execution. This patch cleans out the bulk allocation in svc_init_buffer() to allow svc_alloc_arg() to handle the allocation/retry logic for rq_pages. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: ed603bcf4fea ("sunrpc: Replace the rq_pages array with dynamically-allocated memory") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-06-12 | NFSD: Avoid corruption of a referring call list | Chuck Lever
The new code neglects to remove a freshly-allocated RCL from the callback's referring call list when no matching referring call is found. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202505171002.cE46sdj5-lkp@intel.com/ Fixes: 4f3c8d8c9e10 ("NFSD: Implement CB_SEQUENCE referring call lists") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-06-12 | bpf: Initialize used but uninit variable in propagate_liveness() | Song Liu
With input changed == NULL, a local variable is used for "changed". Initialize tmp properly, so that it can be used in the following: *changed |= err > 0; Otherwise, UBSAN will complain: UBSAN: invalid-load in kernel/bpf/verifier.c:18924:4 load of value <some random value> is not a valid value for type '_Bool' Fixes: dfb2d4c64b82 ("bpf: set 'changed' status if propagate_liveness() did any updates") Signed-off-by: Song Liu <song@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250612221100.2153401-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
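The pattern described here can be sketched as follows. propagate_model() is a made-up stand-in that keeps only the optional out-parameter handling; it does not reproduce the verifier's propagate_liveness().

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in: "err" plays the role of an inner call's result,
 * where a positive value means "something was updated".
 */
static int propagate_model(int err, bool *changed)
{
	/*
	 * Must be initialized: "*changed |= ..." below reads the old value
	 * first, and loading an indeterminate bool is what UBSAN flags.
	 */
	bool tmp = false;

	/* Callers that do not care about the result pass NULL. */
	if (!changed)
		changed = &tmp;

	*changed |= err > 0;
	return err < 0 ? err : 0;
}
```

Because the update is a read-modify-write, an uninitialized fallback variable is read even though its final value is never used by the caller.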
2025-06-12 | docs/bpf: Default cpu version changed from v1 to v3 in llvm 20 | Yonghong Song
The default cpu version is changed from v1 to v3 in llvm version 20. See [1] for more detailed reasoning. Update bpf_devel_QA.rst so developers can find such information easily. [1] https://github.com/llvm/llvm-project/pull/107008 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250612043049.2411989-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 | selftests/bpf: fix signedness bug in redir_partial()
 | Fushuai Wang
When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly treats it as success due to unsigned promotion. Explicitly check for -1 first. Fixes: a4b7193d8efd ("selftests/bpf: Add sockmap test for redirecting partial skb data") Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Link: https://lore.kernel.org/r/20250612084208.27722-1-wangfushuai@baidu.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
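A small sketch of the comparison in question. Both helper names are hypothetical, and the explicit cast in the first one spells out the conversion that C applies implicitly when a signed ssize_t is compared with the size_t result of sizeof on common 64-bit targets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Buggy form: -1 converted to size_t becomes SIZE_MAX, so the error
 * return is never "less than" the buffer size and looks like success.
 */
static bool short_send_buggy(ssize_t n, size_t want)
{
	return (size_t)n < want;
}

/* Fixed form: test for the error return before the unsigned comparison. */
static bool send_failed_or_short(ssize_t n, size_t want)
{
	return n == -1 || (size_t)n < want;
}
```

For n = -1 and a 16-byte buffer, short_send_buggy() reports no problem while send_failed_or_short() reports the failure.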
2025-06-12 | bpf: Fix state use-after-free on push_stack() err | Luis Gerhorst
Without this, `state->speculative` is used after the cleanup cycles in push_stack() or push_async_cb() freed `env->cur_state` (i.e., `state`). Avoid this by relying on the short-circuit logic to only access `state` if the error is recoverable (and make sure it never is after push_*() failed). push_*() callers must always return an error for which error_recoverable_with_nospec(err) is false if push_*() returns NULL, otherwise we try to recover and access the stale `state`. This is only violated by sanitize_ptr_alu(), thus also fix this case to return -ENOMEM. state->speculative does not make sense if the error path of push_*() ran. In that case, `state->speculative && error_recoverable_with_nospec(err)` as a whole should already never evaluate to true (because all cases where push_stack() fails must return -ENOMEM/-EFAULT). As mentioned, this is only violated by the push_stack() call in sanitize_speculative_path() which returns -EACCES without [1] (through REASON_STACK in sanitize_err() after sanitize_ptr_alu()). To fix this, return -ENOMEM for REASON_STACK (which is also the behavior we will have after [1]). Checked that it fixes the syzbot reproducer as expected. [1] https://lore.kernel.org/all/20250603213232.339242-1-luis.gerhorst@fau.de/ Fixes: d6f1c85f2253 ("bpf: Fall back to nospec for Spectre v1") Reported-by: syzbot+b5eb72a560b8149a1885@syzkaller.appspotmail.com Reported-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/all/38862a832b91382cddb083dddd92643bed0723b8.camel@gmail.com/ Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611210728.266563-1-luis.gerhorst@fau.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12Merge branch 'bpf-propagate-read-precision-marks-over-state-graph-backedges'Alexei Starovoitov
Eduard Zingerman says:

====================
bpf: propagate read/precision marks over state graph backedges

Current loop_entry-based states comparison logic does not handle the
following case:

 .-> A --.  Assume the states are visited in the order A, B, C.
 |   |   |  Assume that state B reaches a state equivalent to state A.
 |   v   v  At this point, state C is not processed yet, so state A
 '-- B   C  has not received any read or precision marks from C.
            As a result, these marks won't be propagated to B.

If B has incomplete marks, it is unsafe to use it in states_equal()
checks.

This issue was first reported in [1].

This patch-set
--------------

Here is the gist of the algorithm implemented by this patch-set:
- Compute strongly connected components (SCCs) in the program CFG.
- When a verifier state enters an SCC, that state is recorded as the
  SCC's entry point.
- When a verifier state is found to be equivalent to another
  (e.g., B to A in the example above), it is recorded as a
  states-graph backedge.
- Backedges are accumulated per SCC (*).
- When an SCC entry state reaches `branches == 0`, propagate read and
  precision marks through the backedges until a fixed point is reached
  (e.g., from A to B, from C to A, and then again from A to B).

(*) This is an oversimplification, see patch #8 for details.

Unfortunately, this means that commit [2] needs to be reverted, as
precision propagation requires access to jump history, and backedges
represent history not belonging to `env->cur_state`.

Details are provided in patch #8; a comment in `is_state_visited()`
explains most of the mechanics.

Patch #2 adds a `compute_scc()` function, which computes SCCs in the
program CFG. This function was tested using property-based testing in
[3], but it is not included in selftests.

Previous attempt
----------------

A previous attempt to fix this is described in [4]:
1. Within the states loop, `states_equal(... RANGE_WITHIN)` ignores
   read and precision marks.
2. For states outside the loop, all registers for states within the
   loop are marked as read and precise.

This approach led to an 86x regression on the `cond_break1` selftest.
In that test, one loop was followed by another, and a certain variable
was incremented in the second loop. This variable was marked as
precise due to rule (2), which hindered convergence in the first loop.

After some off-list discussion, it was decided that this might be a
typical case and such regressions are undesirable.

This patch-set avoids such eager precision markings.

Alternatives
------------

Another option is to associate a mask of read/written/precise stack
slots with each instruction. This mask can be populated during
verifier states exploration. Upon reaching an `EXIT` instruction or an
equivalent state, the accumulated masks can be used to propagate
read/written/precise bits across the program's control flow graph
using an analysis similar to use-def.

Unfortunately, a naive implementation of this approach [5] results in
a 10x regression in `veristat` for some `sched_ext` programs due to
the inability to express the must-write property. This issue requires
further investigation.

Changes in verification performance
-----------------------------------

There are some veristat regressions when comparing with master using
selftests and sched_ext BPF binaries. The comparison is done using
master from [6] and this patch-set from [7] where memory accounting
logic is added to veristat.

========= selftests: master vs patch-set =========

File                  Program                              Insns                            Peak memory (KiB)
                                                           master  patch  diff              master  patch  diff
--------------------  -----------------------------------  ------  -----  ----------------  ------  -----  ----------------
bpf_qdisc_fq.bpf.o    bpf_fq_dequeue                         1187   1645    +458 (+38.58%)     768   1240    +472 (+61.46%)
dynptr_success.bpf.o  test_copy_from_user_str_dynptr          208    279     +71 (+34.13%)     512   1024   +512 (+100.00%)
dynptr_success.bpf.o  test_copy_from_user_task_str_dynptr     205    263     +58 (+28.29%)     512   1024   +512 (+100.00%)
dynptr_success.bpf.o  test_probe_read_kernel_str_dynptr       686    857    +171 (+24.93%)     992   1724    +732 (+73.79%)
dynptr_success.bpf.o  test_probe_read_user_str_dynptr         689    860    +171 (+24.82%)    1016   1744    +728 (+71.65%)
iters.bpf.o           checkpoint_states_deletion             1211   1216       +5 (+0.41%)     512   1280   +768 (+150.00%)
pyperf600_iter.bpf.o  on_event                               2591   5929  +3338 (+128.83%)    4744  11176  +6432 (+135.58%)
verifier_gotol.bpf.o  gotol_large_imm                       40004  40004       +0 (+0.00%)    1024   1536    +512 (+50.00%)

Total progs: 3725
Old success: 2157
New success: 2157
total_insns diff min:    0.00%
total_insns diff max:  128.83%
0 -> value: 0
value -> 0: 0
total_insns abs max old: 837,487
total_insns abs max new: 837,487
   0 .. 5    %: 3710
   5 .. 15   %: 6
  20 .. 30   %: 6
  30 .. 40   %: 2
 125 .. 130  %: 1

mem_peak diff min:  -27.78%
mem_peak diff max:  198.44%
mem_peak abs max old: 269,312 KiB
mem_peak abs max new: 269,312 KiB
 -30 .. -20  %: 1
  -5 .. 0    %: 18
   0 .. 5    %: 3568
   5 .. 15   %: 4
  15 .. 25   %: 3
  45 .. 55   %: 4
  60 .. 70   %: 1
  70 .. 80   %: 2
 100 .. 110  %: 3
 135 .. 145  %: 1
 150 .. 160  %: 1
 195 .. 200  %: 1

========= scx: master vs patch-set =========

Program                   Insns                             Peak memory (KiB)
                          master  patch   diff              master  patch   diff
------------------------  ------  ------  ----------------  ------  ------  -----------------
arena_topology_node_init    2133    2395    +262 (+12.28%)     768     768        +0 (+0.00%)
chaos_dispatch              2835    2868      +33 (+1.16%)    1972    1720     -252 (-12.78%)
chaos_init                  4324    5210    +886 (+20.49%)    2528    3028     +500 (+19.78%)
lavd_cpu_offline            5107    5726    +619 (+12.12%)    4188    6304    +2116 (+50.53%)
lavd_cpu_online             5107    5726    +619 (+12.12%)    4188    6304    +2116 (+50.53%)
lavd_dispatch              41775   47601   +5826 (+13.95%)    6196   29192  +22996 (+371.14%)
lavd_enqueue               20238   24188   +3950 (+19.52%)   22084   42156   +20072 (+90.89%)
lavd_init                   6974    7685    +711 (+10.20%)    5428    6928    +1500 (+27.63%)
lavd_select_cpu            22138   26088   +3950 (+17.84%)   24448   43688   +19240 (+78.70%)
layered_dispatch           17847   26581   +8734 (+48.94%)   11728   28740  +17012 (+145.05%)
layered_dump                1891    2098    +207 (+10.95%)    2036    3048    +1012 (+49.71%)
layered_runnable            2606    2634      +28 (+1.07%)     748    1240     +492 (+65.78%)
p2dq_init                   3691    4554    +863 (+23.38%)    2016    2528     +512 (+25.40%)
rusty_enqueue              28853   28853       +0 (+0.00%)    2072    1824     -248 (-11.97%)
rusty_init_task            31128   31128       +0 (+0.00%)    2176    2560     +384 (+17.65%)

Total progs: 148
Old success: 135
New success: 135
total_insns diff min:    0.00%
total_insns diff max:   48.94%
0 -> value: 0
value -> 0: 0
total_insns abs max old: 41,775
total_insns abs max new: 47,601
   0 .. 5    %: 133
   5 .. 15   %: 7
  15 .. 25   %: 4
  35 .. 45   %: 3
  45 .. 50   %: 1

mem_peak diff min:  -12.78%
mem_peak diff max:  371.14%
mem_peak abs max old: 24,448 KiB
mem_peak abs max new: 43,688 KiB
 -15 .. -5   %: 2
  -5 .. 0    %: 2
   0 .. 5    %: 129
   5 .. 15   %: 1
  15 .. 25   %: 2
  25 .. 35   %: 2
  45 .. 55   %: 3
  65 .. 75   %: 1
  75 .. 85   %: 1
  90 .. 100  %: 1
 145 .. 155  %: 1
 195 .. 205  %: 1
 370 .. 375  %: 1

Changelog
---------

v1: https://lore.kernel.org/bpf/20250524191932.389444-1-eddyz87@gmail.com/
v1 -> v2:
- Rebase
- added mem_peak statistics (Alexei)
- selftests: fixed comments and removed useless r7 assignments (Yonghong)

v2: https://lore.kernel.org/bpf/20250606210352.1692944-1-eddyz87@gmail.com/
v2 -> v3:
- Rebase

Links
-----

[1] https://lore.kernel.org/bpf/20250312031344.3735498-1-eddyz87@gmail.com/
[2] commit 96a30e469ca1 ("bpf: use common instruction history across all states")
[3] https://github.com/eddyz87/scc-test
[4] https://lore.kernel.org/bpf/20250426104634.744077-1-eddyz87@gmail.com/
[5] https://github.com/eddyz87/bpf/tree/propagate-read-and-precision-in-cfg
[6] https://github.com/eddyz87/bpf/tree/veristat-memory-accounting
[7] https://github.com/eddyz87/bpf/tree/scc-accumulate-backedges
====================

Link: https://patch.msgid.link/20250611200546.4120963-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12selftests/bpf: tests with a loop state missing read/precision markEduard Zingerman
The test case absent_mark_in_the_middle_state is equivalent to the
following C program:

   1: r8 = bpf_get_prandom_u32();
   2: r6 = -32;
   3: bpf_iter_num_new(&fp[-8], 0, 10);
   4: if (unlikely(bpf_get_prandom_u32()))
   5:   r6 = -31;
   6: for (;;) {
   7:   if (!bpf_iter_num_next(&fp[-8]))
   8:     break;
   9:   if (unlikely(bpf_get_prandom_u32()))
  10:     *(u64 *)(fp + r6) = 7;
  11: }
  12: bpf_iter_num_destroy(&fp[-8]);
  13: return 0;

W/o a fix that instructs verifier to ignore branches count for loop
entries verification proceeds as follows:
- 1-4, state is {r6=-32,fp-8=active};
- 6, checkpoint A is created with {r6=-32,fp-8=active};
- 7, checkpoint B is created with {r6=-32,fp-8=active},
     push state {r6=-32,fp-8=active} from 7 to 9;
- 8,12,13, {r6=-32,fp-8=drained}, exit;
- pop state with {r6=-32,fp-8=active} from 7 to 9;
- 9, push state {r6=-32,fp-8=active} from 9 to 10;
- 6, checkpoint C is created with {r6=-32,fp-8=active};
- 7, checkpoint A is hit, no precision propagated for r6 to C;
- pop state {r6=-32,fp-8=active} from 9 to 10;
- 10, state is {r6=-31,fp-8=active}, r6 is marked as read and precise,
      these marks are propagated to checkpoints A and B (but not C, as
      it is not the parent of current state);
- 6, {r6=-31,fp-8=active} checkpoint C is hit, because r6 is not
     marked precise for this checkpoint;
- the program is accepted, despite a possibility of unaligned u64
  stack access at offset -31.

The test case absent_mark_in_the_middle_state2 is similar except the
following change:

       r8 = bpf_get_prandom_u32();
       r6 = -32;
       bpf_iter_num_new(&fp[-8], 0, 10);
       if (unlikely(bpf_get_prandom_u32())) {
         r6 = -31;
  +  jump_into_loop:
  +    goto +0;
  +    goto loop;
  +  }
  +  if (unlikely(bpf_get_prandom_u32()))
  +    goto jump_into_loop;
  +  loop:
       for (;;) {
         if (!bpf_iter_num_next(&fp[-8]))
           break;
         if (unlikely(bpf_get_prandom_u32()))
           *(u64 *)(fp + r6) = 7;
       }
       bpf_iter_num_destroy(&fp[-8])
       return 0

The goal is to check that read/precision marks are propagated to the
checkpoint created at 'goto +0', which resides outside of the loop.

The test case absent_mark_in_the_middle_state3 is a bit different and
is equivalent to the C program below:

    int absent_mark_in_the_middle_state3(void)
    {
      bpf_iter_num_new(&fp[-8], 0, 10)
      loop1(-32, &fp[-8])
      loop1_wrapper(&fp[-8])
      bpf_iter_num_destroy(&fp[-8])
    }

    int loop1(num, iter)
    {
      while (bpf_iter_num_next(iter)) {
        if (unlikely(bpf_get_prandom_u32()))
          *(fp + num) = 7;
      }
      return 0
    }

    int loop1_wrapper(iter)
    {
      r6 = -32;
      if (unlikely(bpf_get_prandom_u32()))
        r6 = -31;
      loop1(r6, iter);
      return 0;
    }

The unsafe state is reached in a similar manner, but the loop is
located inside a subprogram that is called from two locations in the
main subprogram. This detail is important for exercising
bpf_scc_visit->backedges memory management.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12bpf: include backedges in peak_states statEduard Zingerman
Count states accumulated in bpf_scc_visit->backedges in env->peak_states. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-10-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12bpf: remove {update,get}_loop_entry functionsEduard Zingerman
The previous patch switched read and precision tracking for iterator-based loops from state-graph-based loop tracking to control-flow-graph-based loop tracking. This patch removes the now-unused `update_loop_entry()` and `get_loop_entry()` functions, which were part of the state-graph-based logic. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12bpf: propagate read/precision marks over state graph backedgesEduard Zingerman
Current loop_entry-based exact states comparison logic does not handle
the following case:

 .-> A --.  Assume the states are visited in the order A, B, C.
 |   |   |  Assume that state B reaches a state equivalent to state A.
 |   v   v  At this point, state C is not processed yet, so state A
 '-- B   C  has not received any read or precision marks from C.
            As a result, these marks won't be propagated to B.

If B has incomplete marks, it is unsafe to use it in states_equal()
checks.

This commit replaces the existing logic with the following:
- Strongly connected components (SCCs) are computed over the program's
  control flow graph (intraprocedurally).
- When a verifier state enters an SCC, that state is recorded as the
  SCC entry point.
- When a verifier state is found equivalent to another (e.g., B to A
  in the example), it is recorded as a states graph backedge.
  Backedges are accumulated per SCC.
- When an SCC entry state reaches `branches == 0`, read and precision
  marks are propagated through the backedges (e.g., from A to B, from
  C to A, and then again from A to B).

To support nested subprogram calls, the entry state and backedge list
are associated not with the SCC itself but with an object called
`bpf_scc_callchain`. A callchain is a tuple `(callsite*, scc_id)`,
where `callsite` is the index of a call instruction for each frame
except the last.

See the comments added in `is_state_visited()` and
`compute_scc_callchain()` for more details.

Fixes: 2a0992829ea3 ("bpf: correct loop detection for iterators convergence")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
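The "propagate until nothing changes" step can be pictured with a toy model. This is a sketch under strong assumptions and not the verifier's data structures: each state is reduced to a bitmask of marks, every link along which marks flow is a hypothetical `sketch_edge`, and no distinction is made between parent links and backedges.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: marks flow along an edge from state `src` to `dst`. */
struct sketch_edge {
	int src;
	int dst;
};

/* Merge the marks of e->src into e->dst; set *changed if new bits appeared. */
void sketch_propagate(unsigned int *marks, const struct sketch_edge *e,
		      bool *changed)
{
	unsigned int merged = marks[e->dst] | marks[e->src];

	if (merged != marks[e->dst]) {
		marks[e->dst] = merged;
		*changed = true;
	}
}

/* Sweep all edges until a full pass adds nothing; returns the pass count. */
int sketch_propagate_to_fixed_point(unsigned int *marks,
				    const struct sketch_edge *edges, size_t n)
{
	int passes = 0;
	bool changed;

	do {
		changed = false;
		for (size_t i = 0; i < n; i++)
			sketch_propagate(marks, &edges[i], &changed);
		passes++;
	} while (changed);

	return passes;
}

enum { SK_A, SK_B, SK_C };

/* The A/B/C example: only C starts with a mark (bit 2). */
unsigned int sketch_demo_marks(int state)
{
	unsigned int marks[3] = { 0, 0, 1u << 2 };
	const struct sketch_edge edges[] = {
		{ SK_A, SK_B },	/* B was found equivalent to A */
		{ SK_C, SK_A },	/* C was explored from A */
	};

	sketch_propagate_to_fixed_point(marks, edges, 2);
	return marks[state];
}
```

In this toy run the first sweep moves C's mark to A, the second moves it from A to B, and a third sweep confirms that nothing changes, which matches the "from A to B, from C to A, and then again from A to B" order in the message.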
2025-06-12bpf: move REG_LIVE_DONE check to clean_live_states()Eduard Zingerman
The next patch adds a relatively heavy-weight operation to clean_live_states(); this operation can be skipped if REG_LIVE_DONE is set. Move the check from clean_verifier_state() to clean_live_states() as a small refactoring commit. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12bpf: set 'changed' status if propagate_liveness() did any updatesEduard Zingerman
Add an out parameter to `propagate_liveness()` to record whether any new liveness bits were set during its execution. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12bpf: set 'changed' status if propagate_precision() did any updatesEduard Zingerman
Add an out parameter to `propagate_precision()` to record whether any new precision bits were set during its execution. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12bpf: starting_state parameter for __mark_chain_precision()Eduard Zingerman
Allow `mark_chain_precision()` to run from an arbitrary starting state by replacing direct references to `env->cur_state` with a parameter. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12bpf: frame_insn_idx() utility functionEduard Zingerman
A function to return IP for a given frame in a call stack of a state. Will be used by a next patch. The `state->insn_idx = env->insn_idx;` assignment in the do_check() allows to use frame_insn_idx with env->cur_state. At the moment bpf_verifier_state->insn_idx is set when new cached state is added in is_state_visited() and accessed only in the contexts when the state is already in the cache. Hence this assignment does not change verifier behaviour. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
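One way to picture such a helper is sketched below. The structures are hypothetical simplifications, not the kernel's definitions, and the logic is an assumption about how an instruction pointer per frame can be derived: the innermost frame uses the state's current instruction index, and every outer frame is paused at the call instruction that created the next frame.

```c
#include <stdint.h>

#define SKETCH_MAX_FRAMES 8

/* Hypothetical, simplified stand-ins for verifier structures. */
struct sketch_frame {
	uint32_t callsite;	/* index of the call insn that created this frame */
};

struct sketch_vstate {
	struct sketch_frame frame[SKETCH_MAX_FRAMES];
	uint32_t curframe;	/* index of the innermost frame */
	uint32_t insn_idx;	/* current instruction in the innermost frame */
};

uint32_t sketch_frame_insn_idx(const struct sketch_vstate *st, uint32_t frame)
{
	/* Outer frames are paused at the call that created the next frame. */
	return frame == st->curframe ? st->insn_idx
				     : st->frame[frame + 1].callsite;
}

/* Frame 0 calls at insn 5, frame 1 calls at insn 12, frame 2 is at insn 20. */
uint32_t sketch_demo_ip(uint32_t frame)
{
	const struct sketch_vstate st = {
		.frame = { { 0 }, { 5 }, { 12 } },
		.curframe = 2,
		.insn_idx = 20,
	};

	return sketch_frame_insn_idx(&st, frame);
}
```

This also shows why the state needs an up-to-date `insn_idx` before such a helper can be used on the current state.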
2025-06-12bpf: compute SCCs in program control flow graphEduard Zingerman
Compute strongly connected components in the program CFG.
Assign an SCC number to each instruction, recorded in
env->insn_aux[*].scc. Use Tarjan's algorithm for SCC computation
adapted to run non-recursively.

For debug purposes print out computed SCCs as a part of full program
dump in compute_live_registers() at log level 2, e.g.:

  func#0 @0
  Live regs before insn:
        0: .......... (b4) w6 = 10
    2   1: ......6... (18) r1 = 0xffff88810bbb5565
    2   3: .1....6... (b4) w2 = 2
    2   4: .12...6... (85) call bpf_trace_printk#6
    2   5: ......6... (04) w6 += -1
    2   6: ......6... (56) if w6 != 0x0 goto pc-6
        7: .......... (b4) w6 = 5
    1   8: ......6... (18) r1 = 0xffff88810bbb5567
    1  10: .1....6... (b4) w2 = 2
    1  11: .12...6... (85) call bpf_trace_printk#6
    1  12: ......6... (04) w6 += -1
    1  13: ......6... (56) if w6 != 0x0 goto pc-6
       14: .......... (b4) w0 = 0
       15: 0......... (95) exit
    ^^^
    SCC number for the instruction

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
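For readers unfamiliar with the technique, here is a minimal sketch of Tarjan's algorithm driven by an explicit DFS stack instead of recursion. It is not the kernel's compute_scc(): the graph type, the fixed node limit, and all `sk_` names are assumptions made for this example. Like the dump, it numbers only nodes that lie on a cycle and leaves the others at 0.

```c
#include <stdbool.h>

#define SK_MAX_NODES 32

/* CFG-like graph: every node has at most two successors, -1 means "none". */
struct sk_graph {
	int n;
	int succ[SK_MAX_NODES][2];
};

/*
 * Tarjan's algorithm with an explicit DFS stack instead of recursion.
 * scc[v] becomes a component id >= 1 if v lies on a cycle, 0 otherwise.
 * Returns the number of cyclic components.
 */
int sk_compute_scc(const struct sk_graph *g, int *scc)
{
	int index[SK_MAX_NODES], low[SK_MAX_NODES], next_edge[SK_MAX_NODES];
	int stack[SK_MAX_NODES], dfs[SK_MAX_NODES];
	bool on_stack[SK_MAX_NODES];
	int sp = 0, dp = 0, counter = 0, nr_scc = 0;

	for (int v = 0; v < g->n; v++) {
		index[v] = -1;
		next_edge[v] = 0;
		on_stack[v] = false;
		scc[v] = 0;
	}

	for (int root = 0; root < g->n; root++) {
		if (index[root] != -1)
			continue;
		dfs[dp++] = root;
		while (dp > 0) {
			int v = dfs[dp - 1];
			int member;
			bool is_loop;

			if (index[v] == -1) {
				/* First visit: number the node, put it on the SCC stack. */
				index[v] = low[v] = counter++;
				stack[sp++] = v;
				on_stack[v] = true;
			}
			if (next_edge[v] < 2) {
				int w = g->succ[v][next_edge[v]++];

				if (w >= 0 && index[w] == -1)
					dfs[dp++] = w;	/* descend instead of recursing */
				else if (w >= 0 && on_stack[w] && index[w] < low[v])
					low[v] = index[w];
				continue;
			}
			/* All successors handled: leave v, update its DFS parent. */
			dp--;
			if (dp > 0 && low[v] < low[dfs[dp - 1]])
				low[dfs[dp - 1]] = low[v];
			if (low[v] != index[v])
				continue;
			/* v is a component root; a lone node counts only with a self-loop. */
			is_loop = stack[sp - 1] != v ||
				  g->succ[v][0] == v || g->succ[v][1] == v;
			if (is_loop)
				nr_scc++;
			do {
				member = stack[--sp];
				on_stack[member] = false;
				scc[member] = is_loop ? nr_scc : 0;
			} while (member != v);
		}
	}
	return nr_scc;
}

/* Two loops in sequence, {1,2} and {4,5}, similar in shape to the dump. */
int sk_demo_scc_of(int node)
{
	const struct sk_graph g = {
		.n = 7,
		.succ = { {1, -1}, {2, -1}, {3, 1}, {4, -1}, {5, -1}, {6, 4}, {-1, -1} },
	};
	int scc[SK_MAX_NODES];

	sk_compute_scc(&g, scc);
	return scc[node];
}
```

In the demo graph the later loop {4,5} is completed first and gets id 1, while the earlier loop {1,2} gets id 2. The dump in the commit message shows the same ordering, with the second loop numbered 1 and the first numbered 2.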
2025-06-12Revert "bpf: use common instruction history across all states"Eduard Zingerman
This reverts commit 96a30e469ca1d2b8cc7811b40911f8614b558241. Next patches in the series modify propagate_precision() to allow arbitrary starting state. Precision propagation requires access to jump history, and arbitrary states represent history not belonging to `env->cur_state`. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250611200836.4135542-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12Merge tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap fix from Yury Norov: "Fix for __GENMASK() and __GENMASK_ULL() in UAPI" * tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linux: uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again
2025-06-12smb: improve directory cache reuse for readdir operationsBharath SM
Currently, cached directory contents are not reused across subsequent 'ls' operations because the cache validity check relies on comparing the ctx pointer, which changes with each readdir invocation. As a result, the cached dir entries are not marked as valid and the cache is not utilized for subsequent 'ls' operations.

This change uses the file pointer, which remains consistent across all readdir calls for a given directory instance, to associate and validate the cache. As a result, cached directory contents can now be correctly reused, improving performance for repeated directory listings.

Performance gains with local Windows SMB server:

Without the patch and default actimeo=1:
 1000 directory enumeration operations on dir with 10k files took 135.0s

With this patch and actimeo=0:
 1000 directory enumeration operations on dir with 10k files took just 5.1s

Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-12smb: client: fix perf regression with deferred closesPaulo Alcantara
Customer reported that one of their applications started failing to
open files with STATUS_INSUFFICIENT_RESOURCES due to NetApp server
hitting the maximum number of opens to same file that it would allow
for a single client connection.

It turned out the client was failing to reuse open handles with
deferred closes because matching ->f_flags directly without masking
off O_CREAT|O_EXCL|O_TRUNC bits first broke the comparison, and the
client then ended up with thousands of deferred closes to the same
file. Those bits are already satisfied on the original open, so there
is no need to check them against existing open handles.

Reproducer:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <pthread.h>

    #define NR_THREADS      4
    #define NR_ITERATIONS   2500
    #define TEST_FILE       "/mnt/1/test/dir/foo"

    static char buf[64];

    /* Repeatedly open, append to and close the same file. */
    static void *worker(void *arg)
    {
            int i, j;
            int fd;

            for (i = 0; i < NR_ITERATIONS; i++) {
                    fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_APPEND, 0666);
                    for (j = 0; j < 16; j++)
                            write(fd, buf, sizeof(buf));
                    close(fd);
            }
            return NULL;
    }

    int main(int argc, char *argv[])
    {
            pthread_t t[NR_THREADS];
            int fd;
            int i;

            fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0666);
            close(fd);
            memset(buf, 'a', sizeof(buf));
            for (i = 0; i < NR_THREADS; i++)
                    pthread_create(&t[i], NULL, worker, NULL);
            for (i = 0; i < NR_THREADS; i++)
                    pthread_join(t[i], NULL);
            return 0;
    }

Before patch:

    $ mount.cifs //srv/share /mnt/1 -o ...
    $ mkdir -p /mnt/1/test/dir
    $ gcc repro.c && ./a.out
    ...
    number of opens: 1391

After patch:

    $ mount.cifs //srv/share /mnt/1 -o ...
    $ mkdir -p /mnt/1/test/dir
    $ gcc repro.c && ./a.out
    ...
    number of opens: 1

Cc: linux-cifs@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Pierguido Lambri <plambri@redhat.com>
Fixes: b8ea3b1ff544 ("smb: enable reuse of deferred file handles for write operations")
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
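The flag comparison described in this commit can be sketched in userspace C. This is an illustration only, not the client code: `sketch_flags_match()` and `SKETCH_CREATE_BITS` are hypothetical names, and masking both sides is an assumption made for the example.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Creation-time bits: already satisfied by the original open. */
#define SKETCH_CREATE_BITS (O_CREAT | O_EXCL | O_TRUNC)

/* Compare open flags while ignoring the creation-time bits. */
bool sketch_flags_match(unsigned int cached, unsigned int wanted)
{
	return (cached & ~SKETCH_CREATE_BITS) == (wanted & ~SKETCH_CREATE_BITS);
}
```

With this helper, a handle cached with O_WRONLY|O_APPEND matches a new request for O_WRONLY|O_CREAT|O_APPEND, whereas a direct equality test on the raw flags would reject it and force a new open.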