summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-04Merge tag 'edac_urgent_for_v6.15_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Test the correct structure member when handling correctable errors and avoid spurious interrupts, in altera_edac * tag 'edac_urgent_for_v6.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/altera: Set DDR and SDMMC interrupt mask before registration EDAC/altera: Test the correct error reg offset
2025-05-04io_uring: always arm linked timeouts prior to issueJens Axboe
There are a few spots where linked timeouts are armed, and not all of them adhere to the pre-arm, attempt issue, post-arm pattern. This can be problematic if the linked request returns that it will trigger a callback later, and does so before the linked timeout is fully armed. Consolidate all the linked timeout handling into __io_issue_sqe(), rather than have it spread throughout the various issue entry points. Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/1390 Reported-by: Chase Hiltz <chase@path.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-04Merge tag 'x86-urgent-2025-05-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix SEV-SNP memory acceptance from the EFI stub for guests running at VMPL >0" * tag 'x86-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/sev: Support memory acceptance in the EFI stub under SVSM
2025-05-04Merge tag 'perf-urgent-2025-05-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc perf fixes from Ingo Molnar: - Require group events for branch counter groups and PEBS counter snapshotting groups to be x86 events. - Fix the handling of counter-snapshotting of non-precise events, where counter values may move backwards a bit, temporarily, confusing the code. - Restrict perf/KVM PEBS to guest-owned events. * tag 'perf-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. perf/x86/intel/ds: Fix counter backwards of non-precise events counters-snapshotting perf/x86/intel: Check the X86 leader for pebs_counter_event_group perf/x86/intel: Only check the group flag for X86 leader
2025-05-04Merge tag 'irq-urgent-2025-05-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: - Prevent NULL pointer dereference in msi_domain_debug_show() - Fix crash in the qcom-mpm irqchip driver when configuring interrupts for non-wake GPIOs * tag 'irq-urgent-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/qcom-mpm: Prevent crash when trying to handle non-wake GPIOs genirq/msi: Prevent NULL pointer dereference in msi_domain_debug_show()
2025-05-04x86/boot/sev: Support memory acceptance in the EFI stub under SVSMArd Biesheuvel
Commit: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") provided a fix for SEV-SNP memory acceptance from the EFI stub when running at VMPL #0. However, that fix was insufficient for SVSM SEV-SNP guests running at VMPL >0, as those rely on a SVSM calling area, which is a shared buffer whose address is programmed into a SEV-SNP MSR, and the SEV init code that sets up this calling area executes much later during the boot. Given that booting via the EFI stub at VMPL >0 implies that the firmware has configured this calling area already, reuse it for performing memory acceptance in the EFI stub. Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250428174322.2780170-2-ardb+git@google.com
2025-05-03Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Add missing sentinels to the arm64 Spectre-BHB MIDR arrays, otherwise is_midr_in_range_list() reads beyond the end of these arrays" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: errata: Add missing sentinels to Spectre-BHB MIDR arrays
2025-05-03Merge tag 'i2c-for-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: - imx-lpi2c: fix clock error handling sequence in probe * tag 'i2c-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx-lpi2c: Fix clock count when probe defers
2025-05-03Merge tag 'sound-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A bunch of small fixes. Mostly driver specific. - An OOB access fix in core UMP rawmidi conversion code - Fix for ASoC DAPM hw_params widget sequence - Make retry of usb_set_interface() errors for flaky devices - Fix redundant USB MIDI name strings - Quirks for various HP and ASUS models with HD-audio, and Jabra Evolve 65 USB-audio - Cirrus Kunit test fixes - Various fixes for ASoC Intel, stm32, renesas, imx-card, and simple-card" * tag 'sound-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits) ASoC: amd: ps: fix for irq handler return status ASoC: simple-card-utils: Fix pointer check in graph_util_parse_link_direction ASoC: intel/sdw_utils: Add volume limit to cs35l56 speakers ASoC: intel/sdw_utils: Add volume limit to cs42l43 speakers ASoC: stm32: sai: add a check on minimal kernel frequency ASoC: stm32: sai: skip useless iterations on kernel rate loop ALSA: hda/realtek - Add more HP laptops which need mute led fixup ALSA: hda/realtek: Fix built-mic regression on other ASUS models ASoC: Intel: catpt: avoid type mismatch in dev_dbg() format ALSA: usb-audio: Fix duplicated name in MIDI substream names ALSA: ump: Fix buffer overflow at UMP SysEx message conversion ALSA: usb-audio: Add second USB ID for Jabra Evolve 65 headset ALSA: hda/realtek: Add quirk for HP Spectre x360 15-df1xxx ALSA: hda: Apply volume control on speaker+lineout for HP EliteStudio AIO ASoC: Intel: bytcr_rt5640: Add DMI quirk for Acer Aspire SW3-013 ASoC: amd: acp: Fix devm_snd_soc_register_card(acp-pdm-mach) failure ASoC: amd: acp: Fix NULL pointer deref in acp_i2s_set_tdm_slot ASoC: amd: acp: Fix NULL pointer deref on acp resume path ASoC: renesas: rz-ssi: Use NOIRQ_SYSTEM_SLEEP_PM_OPS() ASoC: soc-acpi-intel-ptl-match: add empty item to ptl_cs42l43_l3[] ...
2025-05-03landlock: Remove KUnit test that triggers a warningMickaël Salaün
A KUnit test checking boundaries triggers a canary warning, which may be disturbing. Let's remove this test for now. Hopefully, KUnit will soon get support for suppressing warning backtraces [1]. Cc: Alessandro Carminati <acarmina@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Günther Noack <gnoack@google.com> Reported-by: Tingmao Wang <m@maowtm.org> Closes: https://lore.kernel.org/r/20250327213807.12964-1-m@maowtm.org Link: https://lore.kernel.org/r/20250425193249.78b45d2589575c15f483c3d8@linux-foundation.org [1] Link: https://lore.kernel.org/r/20250503065359.3625407-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-05-02Merge tag 'spi-fix-v6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A fairly small pile of fixes, plus one new compatible string addition to the Synopsis driver for a new platform. The most notable thing is the fix for divide by zeros in spi-mem if an operation has no dummy bytes" * tag 'spi-fix-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: tegra114: Don't fail set_cs_timing when delays are zero spi: spi-qpic-snand: fix NAND_READ_LOCATION_2 register handling spi: spi-mem: Add fix to avoid divide error spi: dt-bindings: snps,dw-apb-ssi: Add compatible for SOPHGO SG2042 SoC spi: dt-bindings: snps,dw-apb-ssi: Merge duplicate compatible entry spi: spi-qpic-snand: propagate errors from qcom_spi_block_erase() spi: stm32-ospi: Fix an error handling path in stm32_ospi_probe()
2025-05-02Merge tag 'pm-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix three recent regressions, two in cpufreq and one in the Intel Soundwire driver, and an unchecked MSR access in the intel_pstate driver: - Fix a recent regression causing systems where frequency tables are used by cpufreq to have issues with setting frequency limits (Rafael Wysocki) - Fix a recent regressions causing frequency boost settings to become out-of-sync if platform firmware updates the registers associated with frequency boost during system resume (Viresh Kumar) - Fix a recent regression causing resume failures to occur in the Intel Soundwire driver if the device handled by it is in runtime suspend before a system-wide suspend (Rafael Wysocki) - Fix an unchecked MSR aceess in the intel_pstate driver occurring when CPUID indicates no turbo, but the driver attempts to enable turbo frequencies due to a misleading value read from an MSR (Srinivas Pandruvada)" * tag 'pm-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode soundwire: intel_auxdevice: Fix system suspend/resume handling cpufreq: Fix setting policy limits when frequency tables are used cpufreq: ACPI: Re-sync CPU boost state on system resume
2025-05-02Merge tag '6.15-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - fix posix mkdir error to ksmbd (also avoids crash in cifs_destroy_request_bufs) - two smb1 fixes: fixing querypath info and setpathinfo to old servers - fix rsize/wsize when not multiple of page size to address DIO reads/writes * tag '6.15-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: ensure aligned IO sizes cifs: Fix changing times and read-only attr over SMB1 smb_set_file_info() function cifs: Fix and improve cifs_query_path_info() and cifs_query_file_info() smb: client: fix zero length for mkdir POSIX create context
2025-05-02Merge tag 'drm-fixes-2025-05-03' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly drm fixes, amdgpu and xe as usual, the new adp driver has a bunch of vblank fixes, then a bunch of small fixes across the board. Seems about the right level for this time in the release cycle. ttm: - docs warning fix kunit - fix leak in shmem tests fdinfo: - driver unbind race fix amdgpu: - Fix possible UAF in HDCP - XGMI dma-buf fix - NBIO 7.11 fix - VCN 5.0.1 fix xe: - EU stall locking fix and disabling on VF - Documentation fix kernel version supporting hwmon entries - SVM fixes on error handling i915: - Fix build for CONFIG_DRM_I915_PXP=n nouveau: - fix race condition in fence handling ivpu: - interrupt handling fix - D0i2 test mode fix adp: - vblank fixes mipi-dbi: - timing fix" * tag 'drm-fixes-2025-05-03' of https://gitlab.freedesktop.org/drm/kernel: (23 commits) drm/gpusvm: set has_dma_mapping inside mapping loop drm/xe/hwmon: Fix kernel version documentation for temperature drm/xe/eustall: Do not support EU stall on SRIOV VF drm/xe/eustall: Resolve a possible circular locking dependency drm/amdgpu: Add DPG pause for VCN v5.0.1 drm/amdgpu: Fix offset for HDP remap in nbio v7.11 drm/amdgpu: Fail DMABUF map of XGMI-accessible memory drm/amd/display: Fix slab-use-after-free in hdcp drm/mipi-dbi: Fix blanking for non-16 bit formats drm/tests: shmem: Fix memleak drm/xe/guc: Fix capture of steering registers drm/xe/svm: fix dereferencing error pointer in drm_gpusvm_range_alloc() drm: Select DRM_KMS_HELPER from DRM_DEBUG_DP_MST_TOPOLOGY_REFS drm: adp: Remove pointless irq_lock spin lock drm: adp: Enable vblank interrupts in crtc's .atomic_enable drm: adp: Handle drm_crtc_vblank_get() errors drm: adp: Use spin_lock_irqsave for drm device event_lock drm/fdinfo: Protect against driver unbind drm/ttm: fix the warning for hit_low and evict_low accel/ivpu: Fix the D0i2 disable test mode ...
2025-05-02KVM: x86/mmu: Prevent installing hugepages when mem attributes are changingSean Christopherson
When changing memory attributes on a subset of a potential hugepage, add the hugepage to the invalidation range tracking to prevent installing a hugepage until the attributes are fully updated. Like the actual hugepage tracking updates in kvm_arch_post_set_memory_attributes(), process only the head and tail pages, as any potential hugepages that are entirely covered by the range will already be tracked. Note, only hugepage chunks whose current attributes are NOT mixed need to be added to the invalidation set, as mixed attributes already prevent installing a hugepage, and it's perfectly safe to install a smaller mapping for a gfn whose attributes aren't changing. Fixes: 8dd2eee9d526 ("KVM: x86/mmu: Handle page fault for private memory") Cc: stable@vger.kernel.org Reported-by: Michael Roth <michael.roth@amd.com> Tested-by: Michael Roth <michael.roth@amd.com> Link: https://lore.kernel.org/r/20250430220954.522672-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fieldsTom Lendacky
Commit 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it") updated the SEV code to take a snapshot of the GHCB before using it. But the dump_ghcb() function wasn't updated to use the snapshot locations. This results in incorrect output from dump_ghcb() for the "is_valid" and "valid_bitmap" fields. Update dump_ghcb() to use the proper locations. Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/8f03878443681496008b1b37b7c4bf77a342b459.1745866531.git.thomas.lendacky@amd.com [sean: add comment and snapshot qualifier] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq fixes for 6.15-rc5: - Fix a recent regression causing systems where frequency tables are used by cpufreq to have issues with setting frequency limits (Rafael Wysocki). - Fix a recent regressions causing frequency boost settings to become out-of-sync if platform firmware updates the registers associated with them during system resume (Viresh Kumar). - Fix an unchecked MSR aceess in the intel_pstate driver occurring when CPUID indicates no turbo, but the driver attempts to enable turbo frequencies due to a misleading value read from an MSR (Srinivas Pandruvada). * pm-cpufreq: cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode cpufreq: Fix setting policy limits when frequency tables are used cpufreq: ACPI: Re-sync CPU boost state on system resume
2025-05-02drm/v3d: Add job to pending list if the reset was skippedMaíra Canal
When a CL/CSD job times out, we check if the GPU has made any progress since the last timeout. If so, instead of resetting the hardware, we skip the reset and let the timer get rearmed. This gives long-running jobs a chance to complete. However, when `timedout_job()` is called, the job in question is removed from the pending list, which means it won't be automatically freed through `free_job()`. Consequently, when we skip the reset and keep the job running, the job won't be freed when it finally completes. This situation leads to a memory leak, as exposed in [1] and [2]. Similarly to commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active"), this patch ensures the job is put back on the pending list when extending the timeout. Cc: stable@vger.kernel.org # 6.0 Reported-by: Daivik Bhatia <dtgs1208@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227 [1] Closes: https://github.com/raspberrypi/linux/issues/6817 [2] Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20250430210643.57924-1-mcanal@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-05-02irqchip/qcom-mpm: Prevent crash when trying to handle non-wake GPIOsStephan Gerhold
On Qualcomm chipsets not all GPIOs are wakeup capable. Those GPIOs do not have a corresponding MPM pin and should not be handled inside the MPM driver. The IRQ domain hierarchy is always applied, so it's required to explicitly disconnect the hierarchy for those. The pinctrl-msm driver marks these with GPIO_NO_WAKE_IRQ. qcom-pdc has a check for this, but irq-qcom-mpm is currently missing the check. This is causing crashes when setting up interrupts for non-wake GPIOs: root@rb1:~# gpiomon -c gpiochip1 10 irq: IRQ159: trimming hierarchy from :soc@0:interrupt-controller@f200000-1 Unable to handle kernel paging request at virtual address ffff8000a1dc3820 Hardware name: Qualcomm Technologies, Inc. Robotics RB1 (DT) pc : mpm_set_type+0x80/0xcc lr : mpm_set_type+0x5c/0xcc Call trace: mpm_set_type+0x80/0xcc (P) qcom_mpm_set_type+0x64/0x158 irq_chip_set_type_parent+0x20/0x38 msm_gpio_irq_set_type+0x50/0x530 __irq_set_trigger+0x60/0x184 __setup_irq+0x304/0x6bc request_threaded_irq+0xc8/0x19c edge_detector_setup+0x260/0x364 linereq_create+0x420/0x5a8 gpio_ioctl+0x2d4/0x6c0 Fix this by copying the check for GPIO_NO_WAKE_IRQ from qcom-pdc.c, so that MPM is removed entirely from the hierarchy for non-wake GPIOs. Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexey Klimov <alexey.klimov@linaro.org> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250502-irq-qcom-mpm-fix-no-wake-v1-1-8a1eafcd28d4@linaro.org
2025-05-02arm64: vdso: Work around invalid absolute relocations from GCCThomas Weißschuh
All vDSO code needs to be completely position independent. Symbol references are marked as hidden so the compiler emits PC-relative relocations. However GCC emits absolute relocations for symbol-relative references with an offset >= 64KiB. After recent refactorings in the vDSO code this is the case in __arch_get_vdso_u_timens_data() with a page size of 64KiB. Work around the issue by preventing the optimizer from seeing the offsets. Fixes: 83a2a6b8cfc5 ("vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock") Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/all/20250430-vdso-absolute-reloc-v2-1-5efcc3bc4b26@linutronix.de Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120002 Closes: https://lore.kernel.org/lkml/aApGPAoctq_eoE2g@t14ultra/
2025-05-02perf symbol-minimal: Fix double free in filename__read_build_idIan Rogers
Running the "perf script task-analyzer tests" with address sanitizer showed a double free: ``` FAIL: "test_csv_extended_times" Error message: "Failed to find required string:'Out-Out;'." ================================================================= ==19190==ERROR: AddressSanitizer: attempting double-free on 0x50b000017b10 in thread T0: #0 0x55da9601c78a in free (perf+0x26078a) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da96640c63 in filename__read_build_id tools/perf/util/symbol-minimal.c:221:2 0x50b000017b10 is located 0 bytes inside of 112-byte region [0x50b000017b10,0x50b000017b80) freed by thread T0 here: #0 0x55da9601ce40 in realloc (perf+0x260e40) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da96640ad6 in filename__read_build_id tools/perf/util/symbol-minimal.c:204:10 previously allocated by thread T0 here: #0 0x55da9601ca23 in malloc (perf+0x260a23) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da966407e7 in filename__read_build_id tools/perf/util/symbol-minimal.c:181:9 SUMMARY: AddressSanitizer: double-free (perf+0x26078a) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) in free ==19190==ABORTING FAIL: "invocation of perf script report task-analyzer --csv-summary csvsummary --summary-extended command failed" Error message: "" FAIL: "test_csvsummary_extended" Error message: "Failed to find required string:'Out-Out;'." ---- end(-1) ---- 132: perf script task-analyzer tests : FAILED! ``` The buf_size if always set to phdr->p_filesz, but that may be 0 causing a free and realloc to return NULL. This is treated in filename__read_build_id like a failure and the buffer is freed again. To avoid this problem only grow buf, meaning the buf_size will never be 0. This also reduces the number of memory (re)allocations. Fixes: b691f64360ecec49 ("perf symbols: Implement poor man's ELF parser") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250501070003.22251-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'dtlb' output fieldNamhyung Kim
This is a breakdown of perf_mem_data_src.mem_dtlb values. It assumes PMU drivers would set PERF_MEM_TLB_HIT bit with an appropriate level. And having PERF_MEM_TLB_MISS means that it failed to find one in any levels of TLB. For now, it doesn't use PERF_MEM_TLB_{WK,OS} bits. Also it seems Intel machines don't distinguish L1 or L2 precisely. So I added ANY_HIT (printed as "L?-Hit") to handle the case. $ perf mem report -F overhead,dtlb,dso --stdio ... # --- D-TLB ---- # Overhead L?-Hit Miss Shared Object # ........ .............. ................. # 67.03% 99.5% 0.5% [unknown] 31.23% 99.2% 0.8% [kernel.kallsyms] 1.08% 97.8% 2.2% [i915] 0.36% 100.0% 0.0% [JIT] tid 6853 0.12% 100.0% 0.0% [drm] 0.05% 100.0% 0.0% [drm_kms_helper] 0.05% 100.0% 0.0% [ext4] 0.02% 100.0% 0.0% [aesni_intel] 0.02% 100.0% 0.0% [crc32c_intel] 0.02% 100.0% 0.0% [dm_crypt] ... Committer testing: # perf report --header | grep cpudesc # cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor # perf mem report -F overhead,dtlb,dso --stdio | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:P' # Total weight : 2637 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc # # ---------- D-TLB ----------- # Overhead L1-Hit L2-Hit Miss Other Shared Object # ........ ............................ ................................. # 77.47% 18.4% 0.1% 0.6% 80.9% [kernel.kallsyms] 5.61% 36.5% 0.7% 1.4% 61.5% libxul.so 2.77% 39.7% 0.0% 12.3% 47.9% libc.so.6 2.01% 34.0% 1.9% 1.9% 62.3% libglib-2.0.so.0.8400.1 1.93% 31.4% 2.0% 2.0% 64.7% [amdgpu] 1.63% 48.8% 0.0% 0.0% 51.2% [JIT] tid 60168 1.14% 3.3% 0.0% 0.0% 96.7% [vdso] # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'snoop' output fieldNamhyung Kim
This is a breakdown of perf_mem_data_src.mem_snoop values. For now, it doesn't use mem_snoopx values like FWD and PEER. $ perf mem report -F overhead,snoop,comm --stdio ... # ---------- Snoop ----------- # Overhead Hit HitM Miss Other Command # ........ ............................ ............... # 34.24% 0.6% 0.0% 0.0% 99.4% gnome-shell 12.02% 1.0% 0.0% 0.0% 99.0% chrome 9.32% 1.0% 0.0% 0.3% 98.7% Isolated Web Co 6.85% 1.0% 0.3% 0.0% 98.6% swapper 6.30% 0.8% 0.8% 0.0% 98.5% Xorg 3.02% 2.4% 0.0% 0.0% 97.6% VizCompositorTh 2.35% 0.0% 0.0% 0.0% 100.0% firefox-esr 2.04% 0.0% 0.0% 0.0% 100.0% JS Helper 1.51% 3.2% 0.0% 0.0% 96.8% threaded-ml 1.44% 0.0% 0.0% 0.0% 100.0% AudioIP~allback ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'cache' and 'memory' output fieldsNamhyung Kim
This is a breakdown of perf_mem_data_src.mem_lvl_num. But it's also divided into two parts because the combination is bigger than 8. Since there are many entries for different cache levels, 'cache' field focuses on them. I generalized buffers like LFB, MAB and MHB to L1-buf and L2-buf. The rest goes to 'memory' field which can be RAM, CXL, PMEM, IO, etc. $ perf mem report -F cache,mem,dso --stdio ... # # -------------- Cache -------------- --- Memory --- # L1 L2 L3 L1-buf Other RAM Other Shared Object # ................................... .............. .................................... # 53.9% 3.6% 16.2% 21.6% 4.8% 4.8% 95.2% [kernel.kallsyms] 64.7% 1.7% 3.5% 17.4% 12.8% 12.8% 87.2% chrome (deleted) 78.3% 2.8% 0.0% 1.0% 17.9% 17.9% 82.1% libc.so.6 39.6% 1.5% 0.0% 5.7% 53.2% 53.2% 46.8% libxul.so 26.2% 0.0% 0.0% 0.0% 73.8% 73.8% 26.2% [unknown] 85.5% 0.0% 0.0% 14.5% 0.0% 0.0% 100.0% libspa-audioconvert.so 66.3% 4.4% 0.0% 29.4% 0.0% 0.0% 100.0% libglib-2.0.so.0.8200.1 (deleted) 1.9% 0.0% 0.0% 0.0% 98.1% 98.1% 1.9% libmutter-cogl-15.so.0.0.0 (deleted) 10.6% 0.0% 0.0% 89.4% 0.0% 0.0% 100.0% libpulsecommon-16.1.so 0.0% 0.0% 0.0% 100.0% 0.0% 0.0% 100.0% libfreeblpriv3.so (deleted) ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Hide unused mem stat columnsNamhyung Kim
Some mem_stat types don't use all 8 columns. And there are cases only samples in certain kinds of mem_stat types are available only. For that case hide columns which has no samples. The new output for the previous data would be: $ perf mem report -F overhead,op,comm --stdio ... # ------ Mem Op ------- # Overhead Load Store Other Command # ........ ..................... ............... # 44.85% 21.1% 30.7% 48.3% swapper 26.82% 98.8% 0.3% 0.9% netsli-prober 7.19% 51.7% 13.7% 34.6% perf 5.81% 89.7% 2.2% 8.1% qemu-system-ppc 4.77% 100.0% 0.0% 0.0% notifications_c 1.77% 95.9% 1.2% 3.0% MemoryReleaser 0.77% 71.6% 4.1% 24.3% DefaultEventMan 0.19% 66.7% 22.2% 11.1% gnome-shell ... On Intel machines, the event is only for loads or stores so it'll have only one column: # Mem Op # Overhead Load Command # ........ ....... ............... # 20.55% 100.0% swapper 17.13% 100.0% chrome 9.02% 100.0% data-loop.0 6.26% 100.0% pipewire-pulse 5.63% 100.0% threaded-ml 5.47% 100.0% GraphRunner 5.37% 100.0% AudioIP~allback 5.30% 100.0% Chrome_ChildIOT 3.17% 100.0% Isolated Web Co ... Committer testing: # grep "model name" -m1 /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processo # perf mem report -F overhead,op,comm --stdio # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:P' # Total weight : 2637 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc # # ------ Mem Op ------- # Overhead Load Store Other Command # ........ ..................... ............... # 61.02% 14.4% 25.5% 60.1% swapper 5.61% 26.4% 13.5% 60.1% Isolated Web Co 5.50% 21.4% 29.7% 49.0% perf 4.74% 27.2% 15.2% 57.6% gnome-shell 4.63% 33.6% 11.5% 54.9% mdns_service 4.29% 28.3% 12.4% 59.3% ptyxis 2.16% 24.6% 19.3% 56.1% DOM Worker 0.99% 23.1% 34.6% 42.3% firefox 0.72% 26.3% 15.8% 57.9% IPC I/O Parent 0.61% 12.5% 12.5% 75.0% kworker/u130:20 0.61% 37.5% 18.8% 43.8% podman 0.57% 33.3% 6.7% 60.0% Timer 0.53% 14.3% 7.1% 78.6% KMS thread 0.49% 30.8% 7.7% 61.5% kworker/u130:3- 0.46% 41.7% 33.3% 25.0% IPDL Background Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'op' output fieldNamhyung Kim
This is an actual example of the he_mem_stat based sample breakdown. It uses 'mem_op' field of union perf_mem_data_src which means memory operations. It'd have basically 'load' or 'store' which can be useful if PMU doesn't have separate events for them like IBS or SPE. In addition, there's an entry in case load and store happen at the same time. Also adds entries for prefetching and execution. $ perf mem report -F +op -s comm --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4K of event 'ibs_op//' # Total weight : 9559 # Sort order : comm # # --------------------- Mem Op ---------------------- # Overhead Samples Load Store Ld+St Pfetch Exec Other N/A N/A Command # ........ ....... ................................................... ............... # 44.85% 4077 21.1% 30.7% 0.0% 0.0% 0.0% 48.3% 0.0% 0.0% swapper 26.82% 45 98.8% 0.3% 0.0% 0.0% 0.0% 0.9% 0.0% 0.0% netsli-prober 7.19% 442 51.7% 13.7% 0.0% 0.0% 0.0% 34.6% 0.0% 0.0% perf 5.81% 75 89.7% 2.2% 0.0% 0.0% 0.0% 8.1% 0.0% 0.0% qemu-system-ppc 4.77% 1 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% notifications_c 1.77% 10 95.9% 1.2% 0.0% 0.0% 0.0% 3.0% 0.0% 0.0% MemoryReleaser 0.77% 32 71.6% 4.1% 0.0% 0.0% 0.0% 24.3% 0.0% 0.0% DefaultEventMan 0.19% 10 66.7% 22.2% 0.0% 0.0% 0.0% 11.1% 0.0% 0.0% gnome-shell Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Implement output fields for mem statsNamhyung Kim
This is a preparation for later changes to support mem_stat output. The new fields will need two lines for the header - the first line will show type of mem stat and the second line will show the name of each item which is returned by mem_stat_name(). Each element in the mem_stat array will be printed in percentage for the hist_entry and their sum would be 100%. Add new output field dimension only for SORT_MODE__MEM using mem_stat. To handle possible name conflict with existing sort keys, move the order of checking output field dimensions after the sort dimensions when it looks for sort keys. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Basic support for mem_stat accountingNamhyung Kim
Add a logic to account he->mem_stat based on mem_stat_type in hists. Each mem_stat entry will have different meaning based on the type so the index in the array is calculated at runtime using the corresponding value in the sample.data_src. Still hists has no mem_stat_types yet so this code won't work for now. Later hists->mem_stat_types will be allocated based on what users want in the output actually. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Add struct he_mem_statNamhyung Kim
The 'struct he_mem_stat' is to save detailed information about memory instruction. It'll be used to show breakdown of various data from PERF_SAMPLE_DATA_SRC. Note that this structure is generic and the contents will be different depending on actual data it'll use later. The information about the actual data will be saved in 'struct hists' and its length is in nr_mem_stats. This commit just adds ground works and does nothing since hists->nr_mem_stats is 0 for now. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Support multi-line headerNamhyung Kim
This is a preparation to support multi-line headers in 'perf mem report'. Normal sort keys and output fields that don't have contents for multi- line will print the header string at the last line only. As we don't use multi-line headers normally, it should not have any changes in the output. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf record: Add --sample-mem-info optionNamhyung Kim
There's no way to enable PERF_SAMPLE_DATA_SRC without PERF_SAMPLE_ADDR which brings a lot of overhead due to the number of MMAP[2] records. Let's add a new option to enable this information separately. Committer testing: # perf record -a --sample-mem-info ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.815 MB perf.data (2637 samples) ] # # perf evlist -v cycles:P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER|DATA_SRC, read_format: ID|LOST, disabled: 1, freq: 1, precise_ip: 2, sample_id_all: 1 dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER|DATA_SRC, read_format: ID|LOST, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # # perf report -D |& grep -w PERF_RECORD_SAMPLE -A3 -m1 0 44675164447282 0x1a7590 [0x40]: PERF_RECORD_SAMPLE(IP, 0x4001): 107299/107299: 0xffffffffac4a5e11 period: 144 addr: 0 . data_src: 0x229080142 ... thread: perf:107299 ...... dso: /lib/modules/6.15.0-rc4+/build/vmlinux # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Remove output field from sort-list properlyNamhyung Kim
When it removes an output format for cancelled children or latency, it should delete itself from the sort list as well. Otherwise assertion in fmt_free() will fire. $ perf report -H --stdio perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. Aborted (core dumped) Also convert to perf_hpp__column_unregister() for the same open codes. Committer notes: Before this patch: # perf test hierarchy 83: perf report --hierarchy : FAILED! # perf test -v hierarchy --- start --- test child forked, pid 102242 perf report --hierarchy Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.025 MB /tmp/perf-test-report.HX0N85TlPq/perf-report-hierarchy-perf.data (6 samples) ] perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. /home/acme/libexec/perf-core/tests/shell/perf-report-hierarchy.sh: line 34: 102250 Aborted (core dumped) perf report --hierarchy > /dev/null --- Cleaning up --- ---- end(-1) ---- 83: perf report --hierarchy : FAILED! # After: # perf test hierarchy 83: perf report --hierarchy : Ok # Fixes: dbd11b6bdab12f60 ("perf hist: Remove formats in hierarchy when cancel children") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250430180321.736939-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf test perf-report-hierarchy: Add new testArnaldo Carvalho de Melo
Super simple test to check that at least we're not segfaulting when trying to use 'perf report --hierarchy', more subtests should be added to make sure the output is the expected one. This is being merged right before a fix for that that this test detects: # perf test hierarchy 83: perf report --hierarchy : FAILED! # perf test -v hierarchy --- start --- test child forked, pid 102242 perf report --hierarchy Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.025 MB /tmp/perf-test-report.HX0N85TlPq/perf-report-hierarchy-perf.data (6 samples) ] perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. /home/acme/libexec/perf-core/tests/shell/perf-report-hierarchy.sh: line 34: 102250 Aborted (core dumped) perf report --hierarchy > /dev/null --- Cleaning up --- ---- end(-1) ---- 83: perf report --hierarchy : FAILED! # Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20250430180321.736939-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two minor updates, both in drivers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Remove redundant query_complete trace scsi: myrb: Fix spelling mistake "statux" -> "status"
2025-05-02Merge tag 'block-6.15-20250502' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - fix queue unquiesce check on PCI slot_reset (Keith Busch) - fix premature queue removal and I/O failover in nvme-tcp (Michael Liang) - don't restore null sk_state_change (Alistair Francis) - select CONFIG_TLS where needed (Alistair Francis) - always free derived key data (Hannes Reinecke) - more quirks (Wentao Guan) - ublk zero copy fix - ublk selftest fix for UBLK_F_NEED_GET_DATA * tag 'block-6.15-20250502' of git://git.kernel.dk/linux: nvmet-auth: always free derived key data nvmet-tcp: don't restore null sk_state_change nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLS nvme-tcp: select CONFIG_TLS from CONFIG_NVME_TCP_TLS nvme-tcp: fix premature queue removal and I/O failover nvme-pci: add quirks for WDC Blue SN550 15b7:5009 nvme-pci: add quirks for device 126f:1001 nvme-pci: fix queue unquiesce check on slot_reset ublk: remove the check of ublk_need_req_ref() from __ublk_check_and_get_req ublk: enhance check for register/unregister io buffer command ublk: decouple zero copy from user copy selftests: ublk: fix UBLK_F_NEED_GET_DATA
2025-05-02Merge tag 'io_uring-6.15-20250502' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a single fix, annotating the fdinfo side SQ/CQ head/tail reads with data_race() as they are known racy. Only serves to silence syzbot testing, by definition these debug outputs are going to be racy as they may change as soon as we've read them" * tag 'io_uring-6.15-20250502' of git://git.kernel.dk/linux: io_uring/fdinfo: annotate racy sq/cq head/tail reads
2025-05-02Merge tag 'bcachefs-2025-05-01' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Lots of assorted small fixes... - Some repair path fixes, a fix for -ENOMEM when reconstructing lots of alloc info on large filesystems, upgrade for ancient 0.14 filesystems, etc. - Various assert tweaks; assert -> ERO, ERO -> log the error in the superblock and continue - casefolding now uses d_ops like on other casefolding filesystems - fix device label create on device add, fix bucket array resize on filesystem resize - fix xattrs with FORTIFY_SOURCE builds with gcc-15/clang" * tag 'bcachefs-2025-05-01' of git://evilpiepirate.org/bcachefs: (22 commits) bcachefs: Remove incorrect __counted_by annotation bcachefs: add missing sched_annotate_sleep() bcachefs: Fix __bch2_dev_group_set() bcachefs: Kill ERO for i_blocks check in truncate bcachefs: check for inode.bi_sectors underflow bcachefs: Kill ERO in __bch2_i_sectors_acct() bcachefs: readdir fixes bcachefs: improve missing journal write device error message bcachefs: Topology error after insert is now an ERO bcachefs: Use bch2_kvmalloc() for journal keys array bcachefs: More informative error message when shutting down due to error bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data btrees bcachefs: btree_node_data_missing is now autofix bcachefs: Don't generate alloc updates to invalid buckets bcachefs: Improve bch2_dev_bucket_missing() bcachefs: fix bch2_dev_buckets_resize() bcachefs: Add upgrade table entry from 0.14 bcachefs: Run BCH_RECOVERY_PASS_reconstruct_snapshots on missing subvol -> snapshot bcachefs: Add missing utf8_unload() bcachefs: Emit unicode version message on startup ...
2025-05-02Merge tag 'pinctrl-v6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix potential NULL dereference in the i.MX driver - Fix the pull up/down resistor values in the Meson driver - Fix the mapping of the PHY LED pins in the Airhoa driver - Fix EINT interrupts on older controllers and a debounce value issue in the Mediatek driver - Fix an erronoeus PINGROUP define in the Qualcomm driver * tag 'pinctrl-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: qcom: Fix PINGROUP definition for sm8750 pinctrl: mediatek: common-v1: Fix error checking in mtk_eint_init() pinctrl: mediatek: Fix new design debounce issue pinctrl: mediatek: common-v1: Fix EINT breakage on older controllers pinctrl: airoha: fix wrong PHY LED mapping and PHY2 LED defines pinctrl: meson: define the pull up/down resistor value as 60 kOhm pinctrl: imx: Return NULL if no group is matched and found
2025-05-02Merge tag 'iommu-fixes-v6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: "ARM-SMMU fixes: - Fix broken detection of the S2FWB feature - Ensure page-size bitmap is initialised for SVA domains - Fix handling of SMMU client devices with duplicate Stream IDs - Don't fail SMMU probe if Stream IDs are aliased across clients Intel VT-d fixes: - Add quirk for IGFX device - Revert an ATS change to fix a boot failure AMD IOMMU: - Fix potential buffer overflow Core: - Fix for iommu_copy_struct_from_user()" * tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57) iommu/vt-d: Revert ATS timing change to fix boot failure iommu: Fix two issues in iommu_copy_struct_from_user() iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids iommu/arm-smmu-v3: Fix pgsize_bit for sva domains iommu/arm-smmu-v3: Add missing S2FWB feature detection
2025-05-02Merge tag 'slab-for-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - Stable fix to avoid bugs due to leftover obj_ext after allocation profiling is disabled at runtime (Zhenhua Huang) * tag 'slab-for-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm, slab: clean up slab->obj_exts always
2025-05-02Merge tag 'i2c-host-fixes-6.15-rc5' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host-fixes for v6.15-rc5 - imx-lpi2c: fix error handling sequence in probe
2025-05-02drm: Fix potential overflow issue in event_string arrayFeng Jiang
When calling scnprintf() to append recovery method to event_string, the second argument should be `sizeof(event_string) - len`, otherwise there is a potential overflow problem. Fixes: b7cf9f4ac1b8 ("drm: Introduce device wedged event") Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn> Reviewed-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250409014633.31303-1-jiangfeng@kylinos.cn Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-02fs/eventpoll: fix endless busy loop after timeout has expiredMax Kellermann
After commit 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future"), the following program would immediately enter a busy loop in the kernel: ``` int main() { int e = epoll_create1(0); struct epoll_event event = {.events = EPOLLIN}; epoll_ctl(e, EPOLL_CTL_ADD, 0, &event); const struct timespec timeout = {.tv_nsec = 1}; epoll_pwait2(e, &event, 1, &timeout, 0); } ``` This happens because the given (non-zero) timeout of 1 nanosecond usually expires before ep_poll() is entered and then ep_schedule_timeout() returns false, but `timed_out` is never set because the code line that sets it is skipped. This quickly turns into a soft lockup, RCU stalls and deadlocks, inflicting severe headaches to the whole system. When the timeout has expired, we don't need to schedule a hrtimer, but we should set the `timed_out` variable. Therefore, I suggest moving the ep_schedule_timeout() check into the `timed_out` expression instead of skipping it. brauner: Note that there was an earlier fix by Joe Damato in response to my bug report in [1]. Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future") Cc: Joe Damato <jdamato@fastly.com> Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Link: https://lore.kernel.org/20250429153419.94723-1-jdamato@fastly.com [1] Link: https://lore.kernel.org/20250429185827.3564438-1-max.kellermann@ionos.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-02Drivers: hv: Make the sysfs node size for the ring buffer dynamicNaman Jain
The ring buffer size varies across VMBus channels. The size of sysfs node for the ring buffer is currently hardcoded to 4 MB. Userspace clients either use fstat() or hardcode this size for doing mmap(). To address this, make the sysfs node size dynamic to reflect the actual ring buffer size for each channel. This will ensure that fstat() on ring sysfs node always returns the correct size of ring buffer. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20250502074811.2022-3-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02uio_hv_generic: Fix sysfs creation path for ring bufferNaman Jain
On regular bootup, devices get registered to VMBus first, so when uio_hv_generic driver for a particular device type is probed, the device is already initialized and added, so sysfs creation in hv_uio_probe() works fine. However, when the device is removed and brought back, the channel gets rescinded and the device again gets registered to VMBus. However this time, the uio_hv_generic driver is already registered to probe for that device and in this case sysfs creation is tried before the device's kobject gets initialized completely. Fix this by moving the core logic of sysfs creation of ring buffer, from uio_hv_generic to HyperV's VMBus driver, where the rest of the sysfs attributes for the channels are defined. While doing that, make use of attribute groups and macros, instead of creating sysfs directly, to ensure better error handling and code flow. Problematic path: vmbus_process_offer (A new offer comes for the VMBus device) vmbus_add_channel_work vmbus_device_register |-> device_register | |... | |-> hv_uio_probe | |... | |-> sysfs_create_bin_file (leads to a warning as | the primary channel's kobject, which is used to | create the sysfs file, is not yet initialized) |-> kset_create_and_add |-> vmbus_add_channel_kobj (initialization of the primary channel's kobject happens later) Above code flow is sequential and the warning is always reproducible in this path. Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel") Cc: stable@kernel.org Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Suggested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02btrfs: open code folio_index() in btree_clear_folio_dirty_tag()Kairui Song
The folio_index() helper is only needed for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use folio->index instead. It can't be a swap cache folio here. Swap mapping may only call into fs through 'swap_rw' but btrfs does not use that method for swap. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02Revert "btrfs: canonicalize the device path before adding it"Qu Wenruo
This reverts commit 7e06de7c83a746e58d4701e013182af133395188. Commit 7e06de7c83a7 ("btrfs: canonicalize the device path before adding it") tries to make btrfs to use "/dev/mapper/*" name first, then any filename inside "/dev/" as the device path. This is mostly fine when there is only the root namespace involved, but when multiple namespace are involved, things can easily go wrong for the d_path() usage. As d_path() returns a file path that is namespace dependent, the resulted string may not make any sense in another namespace. Furthermore, the "/dev/" prefix checks itself is not reliable, one can still make a valid initramfs without devtmpfs, and fill all needed device nodes manually. Overall the userspace has all its might to pass whatever device path for mount, and we are not going to win the war trying to cover every corner case. So just revert that commit, and do no extra d_path() based file path sanity check. CC: stable@vger.kernel.org # 6.12+ Link: https://lore.kernel.org/linux-fsdevel/20250115185608.GA2223535@zen.localdomain/ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02btrfs: avoid NULL pointer dereference if no valid csum treeQu Wenruo
[BUG] When trying read-only scrub on a btrfs with rescue=idatacsums mount option, it will crash with the following call trace: BUG: kernel NULL pointer dereference, address: 0000000000000208 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G O 6.15.0-rc3-custom+ #236 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs] Call Trace: <TASK> scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs] scrub_simple_mirror+0x175/0x290 [btrfs] scrub_stripe+0x5f7/0x6f0 [btrfs] scrub_chunk+0x9a/0x150 [btrfs] scrub_enumerate_chunks+0x333/0x660 [btrfs] btrfs_scrub_dev+0x23e/0x600 [btrfs] btrfs_ioctl+0x1dcf/0x2f80 [btrfs] __x64_sys_ioctl+0x97/0xc0 do_syscall_64+0x4f/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e [CAUSE] Mount option "rescue=idatacsums" will completely skip loading the csum tree, so that any data read will not find any data csum thus we will ignore data checksum verification. Normally call sites utilizing csum tree will check the fs state flag NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all. This results in scrub to call btrfs_search_slot() on a NULL pointer and triggered above crash. [FIX] Check both extent and csum tree root before doing any tree search. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02btrfs: handle empty eb->folios in num_extent_folios()Boris Burkov
num_extent_folios() unconditionally calls folio_order() on eb->folios[0]. If that is NULL this will be a segfault. It is reasonable for it to return 0 as the number of folios in the eb when the first entry is NULL, so do that instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-02btrfs: correct the order of prelim_ref arguments in btrfs__prelim_refGoldwyn Rodrigues
btrfs_prelim_ref() calls the old and new reference variables in the incorrect order. This causes a NULL pointer dereference because oldref is passed as NULL to trace_btrfs_prelim_ref_insert(). Note, trace_btrfs_prelim_ref_insert() is being called with newref as oldref (and oldref as NULL) on purpose in order to print out the values of newref. To reproduce: echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_prelim_ref_insert/enable Perform some writeback operations. Backtrace: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 115949067 P4D 115949067 PUD 11594a067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 1188 Comm: fsstress Not tainted 6.15.0-rc2-tester+ #47 PREEMPT(voluntary) 7ca2cef72d5e9c600f0c7718adb6462de8149622 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:trace_event_raw_event_btrfs__prelim_ref+0x72/0x130 Code: e8 43 81 9f ff 48 85 c0 74 78 4d 85 e4 0f 84 8f 00 00 00 49 8b 94 24 c0 06 00 00 48 8b 0a 48 89 48 08 48 8b 52 08 48 89 50 10 <49> 8b 55 18 48 89 50 18 49 8b 55 20 48 89 50 20 41 0f b6 55 28 88 RSP: 0018:ffffce44820077a0 EFLAGS: 00010286 RAX: ffff8c6b403f9014 RBX: ffff8c6b55825730 RCX: 304994edf9cf506b RDX: d8b11eb7f0fdb699 RSI: ffff8c6b403f9010 RDI: ffff8c6b403f9010 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000010 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8c6b4e8fb000 R13: 0000000000000000 R14: ffffce44820077a8 R15: ffff8c6b4abd1540 FS: 00007f4dc6813740(0000) GS:ffff8c6c1d378000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000010eb42000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> prelim_ref_insert+0x1c1/0x270 find_parent_nodes+0x12a6/0x1ee0 ? __entry_text_end+0x101f06/0x101f09 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 btrfs_is_data_extent_shared+0x167/0x640 ? fiemap_process_hole+0xd0/0x2c0 extent_fiemap+0xa5c/0xbc0 ? __entry_text_end+0x101f05/0x101f09 btrfs_fiemap+0x7e/0xd0 do_vfs_ioctl+0x425/0x9d0 __x64_sys_ioctl+0x75/0xc0 Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>