summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-14drivers/perf: hisi: Relax the event number check of v2 PMUsJunhao He
The supported event number range of each Uncore PMUs is provided by each driver in hisi_pmu::check_event and out of range events will be rejected. A later version with expanded event number range needs to register the PMU with updated hisi_pmu::check_event even if it's the only update, which means the expanded events cannot be used unless the driver's updated. However the unsupported events won't be counted by the hardware so we can relax the event number check to allow the use the expanded events. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Junhao He <hejunhao3@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-6-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driverJunhao He
SLLC v3 PMU has the following changes compared to previous version: a) update the register layout b) update the definition of SRCID_CTRL and TGTID_CTRL registers. To be compatible with v2, we use maximum width (11 bits) and mask the extra length for themselves. c) remove latency events (driver does not need to be adapted). SLLC v3 PMU is identified with HID HISI0264. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-5-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU informationJunhao He
Make use of struct acpi_device_id::driver_data for version specific information rather than judge the version register. This will help to simplify the probe process and also a bit easier for extension. Factor out SLLC register definition to struct hisi_sllc_pmu_regs. No functional changes intended. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Junhao He <hejunhao3@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-4-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driverJunhao He
HiSilicon DDRC v3 PMU has the different interrupt register offset compared to the v2. Add device information of v3 PMU with ACPI HID HISI0235. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-3-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14drivers/perf: hisi: Simplify the probe process for each DDRC versionJunhao He
Version 1 and 2 of DDRC PMU also use different HID. Make use of struct acpi_device_id::driver_data for version specific information rather than judge the version register. This will help to simplify the probe process and also a bit easier for extension. In order to support this extend struct hisi_pmu_dev_info for version specific counter bits and event range. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250619125557.57372-2-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/arm-ni: Support sharing IRQs within an NI instanceShouping Wang
NI-700 has a distinct PMU interrupt output for each Clock Domain, however some integrations may still combine these together externally. The initial driver didn't attempt to support this, in anticipation of a more general solution for IRQ sharing between system PMU instances, but that's still a way off, so let's make this intermediate step for now to at least allow sharing IRQs within an individual NI instance. Now that CPU affinity and migration are cleaned up, it's fairly straightforward to adopt similar logic to arm-cmn, to identify CDs with a common interrupt and loop over them directly in the handler. Signed-off-by: Shouping Wang <allen.wang@hj-micro.com> [ rm: Rework for affinity handling, cosmetics, new commit message ] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/f62db639d3b54c959ec477db7b8ccecbef1ca310.1752256072.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/arm-ni: Consolidate CPU affinity handlingRobin Murphy
Since overflow interrupts from the individual PMUs are infrequent and unlikely to coincide, and we make no attempt to balance them across CPUs anyway, there's really not much point tracking a separate CPU affinity per PMU. Move the CPU affinity and hotplug migration up to the NI instance level. Tested-by: Shouping Wang <allen.wang@hj-micro.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/00b622872006c2f0c89485e343b1cb8caaa79c47.1752256072.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentationAlok Tiwari
Fix several minor typo errors in comments: - Remove duplicated word "a" in "a a VID / GroupID". - Correct "Opcopdes" to "Opcodes" in CXL spec reference. - Fix spelling of "implemnted" to "implemented". Improves code readability and documentation consistency. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/r/20250624194350.109790-4-alok.a.tiwari@oracle.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/cxlpmu: Remove unintended newline from IRQ name format stringAlok Tiwari
The IRQ name format string used in devm_kasprintf() mistakenly included a newline character "\n". This could lead to confusing log output or misformatted names in sysfs or debug messages. This fix removes the newline to ensure proper IRQ naming. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/r/20250624194350.109790-3-alok.a.tiwari@oracle.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe()Alok Tiwari
The previous code mistakenly swapped the count and size parameters. This fix corrects the argument order in devm_kcalloc() to follow the conventional count, size form, avoiding potential confusion or bugs. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/r/20250624194350.109790-2-alok.a.tiwari@oracle.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08perf: arm_spe: Relax period restrictionLeo Yan
The minimum interval specified the PMSIDR_EL1.Interval field is a hardware recommendation. However, this value is set by hardware designer before the production. It is not actual hardware limitation but tools currently have no way to test shorter periods. This change relaxes the limitation by allowing any non-zero periods, with simplifying code with clamp_t(). The downside is that small periods may increase the risk of AUX ring buffer overruns. When an overrun occurs, the perf core layer will trigger an irq work to disable the event and wake up the tool in user space to read the trace data. After the tool finishes reading, it will re-enable the AUX event. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250627163028.3503122-1-leo.yan@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)Rob Herring (Arm)
The ARMv9.2 architecture introduces the optional Branch Record Buffer Extension (BRBE), which records information about branches as they are executed into set of branch record registers. BRBE is similar to x86's Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer (BHRB). BRBE supports filtering by exception level and can filter just the source or target address if excluded to avoid leaking privileged addresses. The h/w filter would be sufficient except when there are multiple events with disjoint filtering requirements. In this case, BRBE is configured with a union of all the events' desired branches, and then the recorded branches are filtered based on each event's filter. For example, with one event capturing kernel events and another event capturing user events, BRBE will be configured to capture both kernel and user branches. When handling event overflow, the branch records have to be filtered by software to only include kernel or user branch addresses for that event. In contrast, x86 simply configures LBR using the last installed event which seems broken. It is possible on x86 to configure branch filter such that no branches are ever recorded (e.g. -j save_type). For BRBE, events with a configuration that will result in no samples are rejected. Recording branches in KVM guests is not supported like x86. However, perf on x86 allows requesting branch recording in guests. The guest events are recorded, but the resulting branches are all from the host. For BRBE, events with branch recording and "exclude_host" set are rejected. Requiring "exclude_guest" to be set did not work. The default for the perf tool does set "exclude_guest" if no exception level options are specified. However, specifying kernel or user events defaults to including both host and guest. In this case, only host branches are recorded. BRBE can support some additional exception branch types compared to x86. On x86, all exceptions other than syscalls are recorded as IRQ. With BRBE, it is possible to better categorize these exceptions. One limitation relative to x86 is we cannot distinguish a syscall return from other exception returns. So all exception returns are recorded as ERET type. The FIQ branch type is omitted as the only FIQ user is Apple platforms which don't support BRBE. The debug branch types are omitted as there is no clear need for them. BRBE records are invalidated whenever events are reconfigured, a new task is scheduled in, or after recording is paused (and the records have been recorded for the event). The architecture allows branch records to be invalidated by the PE under implementation defined conditions. It is expected that these conditions are rare. Cc: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-4-e7775563036e@kernel.org [will: Fix sparse warnings about mixed declarations and code. Fix C99 comment syntax.] Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08KVM: arm64: nvhe: Disable branch generation in nVHE guestsAnshuman Khandual
While BRBE can record branches within guests, the host recording branches in guests is not supported by perf (though events are). Support for BRBE in guests will supported by providing direct access to BRBE within the guests. That is how x86 LBR works for guests. Therefore, BRBE needs to be disabled on guest entry and restored on exit. For nVHE, this requires explicit handling for guests. Before entering a guest, save the BRBE state and disable the it. When returning to the host, restore the state. For VHE, it is not necessary. We initialize BRBCR_EL1.{E1BRE,E0BRE}=={0,0} at boot time, and HCR_EL2.TGE==1 while running in the host. We configure BRBCR_EL2.{E2BRE,E0HBRE} to enable branch recording in the host. When entering the guest, we set HCR_EL2.TGE==0 which means BRBCR_EL1 is used instead of BRBCR_EL2. Consequently for VHE, BRBE recording is disabled at EL1 and EL0 when running a guest. Should recording in guests (by the host) ever be desired, the perf ABI will need to be extended to distinguish guest addresses (struct perf_branch_entry.priv) for starters. BRBE records would also need to be invalidated on guest entry/exit as guest/host EL1 and EL0 records can't be distinguished. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Co-developed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-3-e7775563036e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: Handle BRBE booting requirementsAnshuman Khandual
To use the Branch Record Buffer Extension (BRBE), some configuration is necessary at EL3 and EL2. This patch documents the requirements and adds the initial EL2 setup code, which largely consists of configuring the fine-grained traps and initializing a couple of BRBE control registers. Before this patch, __init_el2_fgt() would initialize HDFGRTR_EL2 and HDFGWTR_EL2 with the same value, relying on the read/write trap controls for a register occupying the same bit position in either register. The 'nBRBIDR' trap control only exists in bit 59 of HDFGRTR_EL2, while bit 59 of HDFGWTR_EL2 is RES0, and so this assumption no longer holds. To handle HDFGRTR_EL2 and HDFGWTR_EL2 having (slightly) different bit layouts, __init_el2_fgt() is changed to accumulate the HDFGRTR_EL2 and HDFGWTR_EL2 control bits separately. While making this change the open-coded value (1 << 62) is replaced with HDFG{R,W}TR_EL2_nPMSNEVFR_EL1_MASK. The BRBCR_EL1 and BRBCR_EL2 registers are unusual and require special initialisation: even though they are subject to E2H renaming, both have an effect regardless of HCR_EL2.TGE, even when running at EL2. So we must initialize BRBCR_EL2 in case we run in nVHE mode. This is handled in __init_el2_brbe() with a comment to explain the situation. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> [Mark: rewrite commit message, fix typo in comment] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Co-developed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-2-e7775563036e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64/sysreg: Add BRBE registers and fieldsAnshuman Khandual
This patch adds definitions related to the Branch Record Buffer Extension (BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver in subsequent patches. Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent generated definitions. Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: James Clark <james.clark@linaro.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-1-e7775563036e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm: Add missing .suppress_bind_attrsRobin Murphy
PMU drivers should set .suppress_bind_attrs so that userspace is denied the opportunity to pull the driver out from underneath an in-use PMU (with predictably unpleasant consequences). Somehow both the CMN and NI drivers have managed to miss this; put that right. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/acd48c341b33b96804a3969ee00b355d40c546e2.1751465293.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm-cmn: Reduce stack usage during discoveryRobin Murphy
Arnd reports that Clang's aggressive inlining of arm_cmn_discover() can lead to stack frame size warnings, and while we could simply prevent such inlining to hide the issue, it seems more productive to actually heed the warning and do something about the overall stack footprint. The xp_region array is already rather large, and CMN_MAX_XPS might only grow larger in future, however it only serves as a convenience to save repeating the first level's worth of register reads in the second pass of discovery. There's no performance concern here, and it only takes a small tweak to the flow to re-extract the offsets instead of stashing them, so let's just do that and save several hundred bytes of stack. Reported-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/e7dd41bf0f1b098e2e4b01ef91318a4b272abff8.1751046159.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf: imx9_perf: make the read-only array mask static constColin Ian King
Don't populate the read-only array mask on the stack at run time, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250611133917.170888-1-colin.i.king@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm-cmn: Broaden module description for wider interconnect supportZhiyuan Dai
The current MODULE_DESCRIPTION only mentions CMN-600, but this driver now supports several Arm mesh interconnects including CMN-650, CMN-700, CI-700, and CMN-S3. Update the MODULE_DESCRIPTION to reflect the expanded scope. Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn> Link: https://lore.kernel.org/r/20250522032122.949373-1-daizhiyuan@phytium.com.cn Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04perf/arm-ni: Set initial IRQ affinityRobin Murphy
While we do request our IRQs with the right flags to stop their affinity changing unexpectedly, we forgot to actually set it to start with. Oops. Cc: stable@vger.kernel.org Fixes: 4d5a7680f2b4 ("perf: Add driver for Arm NI-700 interconnect PMU") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Shouping Wang <allen.wang@hj-micro.com> Link: https://lore.kernel.org/r/614ced9149ee8324e58930862bd82cbf46228d27.1747149165.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04arm64/watchdog_hld: Add a cpufreq notifier for update watchdog threshYicong Yang
arm64 depends on the cpufreq driver to gain the maximum cpu frequency to convert the watchdog_thresh to perf event period. cpufreq drivers like cppc_cpufreq will be initialized lately after the initializing of the hard lockup detector so just use a safe cpufreq which will be inaccurency. Use a cpufreq notifier to adjust the event's period to a more accurate one. Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250701110214.27242-3-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04watchdog/perf: Provide function for adjusting the event periodYicong Yang
Architecture's using perf events for hard lockup detection needs to convert the watchdog_thresh to the event's period, some architecture for example arm64 perform this conversion using the CPU's maximum frequency which will be acquired by cpufreq. However by the time the lockup detector's initialized the cpufreq driver may not be initialized, thus launch a watchdog with inaccurate period. Provide a function hardlockup_detector_perf_adjust_period() to allowing adjust the event period. Then architecture can update with more accurate period if cpufreq is initialized. Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250701110214.27242-2-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2025-06-08Linux 6.16-rc1v6.16-rc1Linus Torvalds
2025-06-08Merge tag 'turbostat-2025.06.08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - Add initial DMR support, which required smarter RAPL probe - Fix AMD MSR RAPL energy reporting - Add RAPL power limit configuration output - Minor fixes * tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: version 2025.06.08 tools/power turbostat: Add initial support for BartlettLake tools/power turbostat: Add initial support for DMR tools/power turbostat: Dump RAPL sysfs info tools/power turbostat: Avoid probing the same perf counters tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared tools/power turbostat: Clean up add perf/msr counter logic tools/power turbostat: Introduce add_msr_counter() tools/power turbostat: Remove add_msr_perf_counter_() tools/power turbostat: Remove add_cstate_perf_counter_() tools/power turbostat: Remove add_rapl_perf_counter_() tools/power turbostat: Quit early for unsupported RAPL counters tools/power turbostat: Always check rapl_joules flag tools/power turbostat: Fix AMD package-energy reporting tools/power turbostat: Fix RAPL_GFX_ALL typo tools/power turbostat: Add Android support for MSR device handling tools/power turbostat.8: pm_domain wording fix tools/power turbostat.8: fix typo: idle_pct should be pct_idle
2025-06-08Merge tag 'timers-cleanups-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanup from Thomas Gleixner: "The delayed from_timer() API cleanup: The renaming to the timer_*() namespace was delayed due massive conflicts against Linux-next. Now that everything is upstream finish the conversion" * tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename from_timer() to timer_container_of()
2025-06-08Merge tag 'x86-urgent-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - Cure IO bitmap inconsistencies A failed fork cleans up all resources of the newly created thread via exit_thread(). exit_thread() invokes io_bitmap_exit() which does the IO bitmap cleanups, which unfortunately assume that the cleanup is related to the current task, which is obviously bogus. Make it work correctly - A lockdep fix in the resctrl code removed the clearing of the command buffer in two places, which keeps stale error messages around. Bring them back. - Remove unused trace events" * tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex x86/iopl: Cure TIF_IO_BITMAP inconsistencies x86/fpu: Remove unused trace events
2025-06-08Merge tag 'timers-urgent-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Add the missing seq_file forward declaration in the timer namespace header" * tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timens: Add struct seq_file forward declaration
2025-06-08tools/power turbostat: version 2025.06.08Len Brown
Add initial DMR support, which required smarter RAPL probe Fix AMD MSR RAPL energy reporting Add RAPL power limit configuration output Minor fixes Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Add initial support for BartlettLakeZhang Rui
Add initial support for BartlettLake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Add initial support for DMRZhang Rui
Add initial support for DMR. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Dump RAPL sysfs infoZhang Rui
for example: intel-rapl:1: psys 28.0s:100W 976.0us:100W intel-rapl:0: package-0 28.0s:57W,max:15W 2.4ms:57W intel-rapl:0/intel-rapl:0:0: core disabled intel-rapl:0/intel-rapl:0:1: uncore disabled intel-rapl-mmio:0: package-0 28.0s:28W,max:15W 2.4ms:57W [lenb: simplified format] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> squish me Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Avoid probing the same perf countersZhang Rui
For the RAPL package energy status counter, Intel and AMD share the same perf_subsys and perf_name, but with different MSR addresses. Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are introduced to describe this counter for different Vendors. As a result, the perf counter is probed twice, and causes a failure in in get_rapl_counters() because expected_read_size and actual_read_size don't match. Fix the problem by skipping the already probed counter. Note, this is not a perfect fix. For example, if different vendors/platforms use the same MSR value for different purpose, the code can be fooled when it probes a rapl_counter_arch_infos[] entry that does not belong to the running Vendor/Platform. In a long run, better to put rapl_counter_arch_infos[] into the platform_features so that this becomes Vendor/Platform specific. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs ↵Zhang Rui
cleared platform_features->rapl_msrs describes the RAPL MSRs supported. While RAPL Perf counters can be exposed from different kernel backend drivers, e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver. Thus, turbostat should first blindly probe all the available RAPL Perf counters, and falls back to the RAPL MSR counters if they are listed in platform_features->rapl_msrs. With this, platforms that don't have RAPL MSRs can clear the platform_features->rapl_msrs bits and use RAPL Perf counters only. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Clean up add perf/msr counter logicZhang Rui
Increase the code readability by moving the no_perf/no_msr flag and the cai->perf_name/cai->msr sanity checks into the counter probe functions. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Introduce add_msr_counter()Zhang Rui
probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR counters and MPERF/APERF/SMI MSR counters, thus its name is misleading. Similar to add_perf_counter(), introduce add_msr_counter() to probe a counter via MSR. Introduce wrapper function add_rapl_msr_counter() at the same time to add extra check for Zero return value for specified RAPL counters. No functional change intended. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Remove add_msr_perf_counter_()Zhang Rui
As the only caller of add_msr_perf_counter_(), add_msr_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_msr_perf_counter_() and move all the logic to add_msr_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Remove add_cstate_perf_counter_()Zhang Rui
As the only caller of add_cstate_perf_counter_(), add_cstate_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_cstate_perf_counter_() and move all the logic to add_cstate_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Remove add_rapl_perf_counter_()Zhang Rui
As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter() just gives extra debug output on top. There is no need to keep both functions. Remove add_rapl_perf_counter_() and move all the logic to add_rapl_perf_counter(). No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Quit early for unsupported RAPL countersZhang Rui
Quit early for unsupported RAPL counters. No functional change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Always check rapl_joules flagZhang Rui
rapl_joules bit should always be checked even if platform_features->rapl_msrs is not set or no_msr flag is used. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Fix AMD package-energy reportingGautham R. Shenoy
commit 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf") that adds support to read RAPL counters via perf defines the notion of a RAPL domain_id which is set to physical_core_id on platforms which support per_core_rapl counters (Eg: AMD processors Family 17h onwards) and is set to the physical_package_id on all the other platforms. However, the physical_core_id is only unique within a package and on platforms with multiple packages more than one core can have the same physical_core_id and thus the same domain_id. (For eg, the first cores of each package have the physical_core_id = 0). This results in all these cores with the same physical_core_id using the same entry in the rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the perf-initialization for cores whose domain_ids have already been visited, cores that have the same physical_core_id always read the perf file corresponding to the physical_core_id of the first package and thus the package-energy is incorrectly reported to be the same value for different packages. Note: This issue only arises when RAPL counters are read via perf and not when they are read via MSRs since in the latter case the MSRs are read separately on each core. Fix this issue by associating each CPU with rapl_core_id which is unique across all the packages in the system. Fixes: 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf") Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Fix RAPL_GFX_ALL typoKaushlendra Kumar
Fix typo in the currently unused RAPL_GFX_ALL macro definition. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat: Add Android support for MSR device handlingKaushlendra Kumar
It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr, updates error messages and permission checks to reflect the Android device path, and wraps platform-specific code with #if defined(ANDROID) to ensure correct behavior on both Android and non-Android systems. These changes improve compatibility and usability of turbostat on Android devices. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat.8: pm_domain wording fixLen Brown
turbostat.8: clarify that uncore "domains" are Power Management domains, aka pm_domains. Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08tools/power turbostat.8: fix typo: idle_pct should be pct_idleLen Brown
idle_pct should be pct_idle Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08Merge tag 'perf-urgent-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fix from Thomas Gleixner: "A single fix for the x86 performance counters on Intel CPUs: The MSR offset calculations for fixed performance counters are stored at the wrong index in the configuration array causing the general purpose counter MSR offset to be overwritten, so both the general purpose and the fixed counters offsets are incorrect. Correct the array index calculation to fix that" * tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()
2025-06-08Merge tag 'irq-urgent-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for the PCI/MSI code: The conversion to per device MSI domains created a MSI domain with size 1 instead of sizing it to the maximum possible number of MSI interrupts for the device. This "worked" as the subsequent allocations resized the domain, but the recent change to move the prepare() call into the domain creation path broke this works by chance mechanism. Size the domain properly at creation time" * tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Size device MSI domain with the maximum number of vectors
2025-06-08Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull mount fixes from Al Viro: "Various mount-related bugfixes: - split the do_move_mount() checks in subtree-of-our-ns and entire-anon cases and adapt detached mount propagation selftest for mount_setattr - allow clone_private_mount() for a path on real rootfs - fix a race in call of has_locked_children() - fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP - make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right userns - avoid false negatives in path_overmount() - don't leak MNT_LOCKED from parent to child in finish_automount() - do_change_type(): refuse to operate on unmounted/not ours mounts" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_change_type(): refuse to operate on unmounted/not ours mounts clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns selftests/mount_setattr: adapt detached mount propagation test do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases fs: allow clone_private_mount() for a path on real rootfs fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) finish_automount(): don't leak MNT_LOCKED from parent to child path_overmount(): avoid false negatives fs/fhandle.c: fix a race in call of has_locked_children()
2025-06-08Merge tag '6.16-rc-part2-smb3-client-fixes' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - multichannel/reconnect fixes - move smbdirect (smb over RDMA) defines to fs/smb/common so they will be able to be used in the future more broadly, and a documentation update explaining setting up smbdirect mounts - update email address for Paulo * tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal version number MAINTAINERS, mailmap: Update Paulo Alcantara's email address cifs: add documentation for smbdirect setup cifs: do not disable interface polling on failure cifs: serialize other channels when query server interfaces is pending cifs: deal with the channel loading lag while picking channels smb: client: make use of common smbdirect_socket_parameters smb: smbdirect: introduce smbdirect_socket_parameters smb: client: make use of common smbdirect_socket smb: smbdirect: add smbdirect_socket.h smb: client: make use of common smbdirect.h smb: smbdirect: add smbdirect.h with public structures smb: client: make use of common smbdirect_pdu.h smb: smbdirect: add smbdirect_pdu.h with protocol definitions
2025-06-08Merge tag 'trace-v6.16-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing fixes from Steven Rostedt: - Fix regression of waiting a long time on updating trace event filters When the faultable trace points were added, it needed task trace RCU synchronization. This was added to the tracepoint_synchronize_unregister() function. The filter logic always called this function whenever it updated the trace event filters before freeing the old filters. This increased the time of "trace-cmd record" from taking 13 seconds to running over 2 minutes to complete. Move the freeing of the filters to call_rcu*() logic, which brings the time back down to 13 seconds. - Fix ring_buffer_subbuf_order_set() error path lock protection The error path of the ring_buffer_subbuf_order_set() released the mutex too early and allowed subsequent accesses to setting the subbuffer size to corrupt the data and cause a bug. By moving the mutex locking to the end of the error path, it prevents the reentrant access to the critical data and also allows the function to convert the taking of the mutex over to the guard() logic. - Remove unused power management clock events The clock events were added in 2010 for power management. In 2011 arm used them. In 2013 the code they were used in was removed. These events have been wasting memory since then. - Fix sparse warnings There was a few places that sparse warned about trace_events_filter.c where file->filter was referenced directly, but it is annotated with an __rcu tag. Use the helper functions and fix them up to use rcu_dereference() properly. * tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Add rcu annotation around file->filter accesses tracing: PM: Remove unused clock events ring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set() tracing: Fix regression of filter waiting a long time on RCU synchronization