Age | Commit message (Collapse) | Author |
|
The supported event number range of each Uncore PMUs is provided by
each driver in hisi_pmu::check_event and out of range events
will be rejected. A later version with expanded event number range
needs to register the PMU with updated hisi_pmu::check_event
even if it's the only update, which means the expanded events
cannot be used unless the driver's updated. However the unsupported
events won't be counted by the hardware so we can relax the event
number check to allow the use the expanded events.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250619125557.57372-6-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
SLLC v3 PMU has the following changes compared to previous version:
a) update the register layout
b) update the definition of SRCID_CTRL and TGTID_CTRL registers.
To be compatible with v2, we use maximum width (11 bits)
and mask the extra length for themselves.
c) remove latency events (driver does not need to be adapted).
SLLC v3 PMU is identified with HID HISI0264.
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250619125557.57372-5-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Make use of struct acpi_device_id::driver_data for version specific
information rather than judge the version register. This will help
to simplify the probe process and also a bit easier for extension.
Factor out SLLC register definition to struct hisi_sllc_pmu_regs.
No functional changes intended.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250619125557.57372-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
HiSilicon DDRC v3 PMU has the different interrupt register offset
compared to the v2. Add device information of v3 PMU with ACPI
HID HISI0235.
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250619125557.57372-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Version 1 and 2 of DDRC PMU also use different HID. Make use of
struct acpi_device_id::driver_data for version specific information
rather than judge the version register. This will help to
simplify the probe process and also a bit easier for extension.
In order to support this extend struct hisi_pmu_dev_info for version
specific counter bits and event range.
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250619125557.57372-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
NI-700 has a distinct PMU interrupt output for each Clock Domain,
however some integrations may still combine these together externally.
The initial driver didn't attempt to support this, in anticipation of a
more general solution for IRQ sharing between system PMU instances, but
that's still a way off, so let's make this intermediate step for now to
at least allow sharing IRQs within an individual NI instance.
Now that CPU affinity and migration are cleaned up, it's fairly
straightforward to adopt similar logic to arm-cmn, to identify CDs with
a common interrupt and loop over them directly in the handler.
Signed-off-by: Shouping Wang <allen.wang@hj-micro.com>
[ rm: Rework for affinity handling, cosmetics, new commit message ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/f62db639d3b54c959ec477db7b8ccecbef1ca310.1752256072.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Since overflow interrupts from the individual PMUs are infrequent and
unlikely to coincide, and we make no attempt to balance them across
CPUs anyway, there's really not much point tracking a separate CPU
affinity per PMU. Move the CPU affinity and hotplug migration up to
the NI instance level.
Tested-by: Shouping Wang <allen.wang@hj-micro.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/00b622872006c2f0c89485e343b1cb8caaa79c47.1752256072.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Fix several minor typo errors in comments:
- Remove duplicated word "a" in "a a VID / GroupID".
- Correct "Opcopdes" to "Opcodes" in CXL spec reference.
- Fix spelling of "implemnted" to "implemented".
Improves code readability and documentation consistency.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://lore.kernel.org/r/20250624194350.109790-4-alok.a.tiwari@oracle.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The IRQ name format string used in devm_kasprintf() mistakenly included
a newline character "\n".
This could lead to confusing log output or misformatted names in sysfs
or debug messages.
This fix removes the newline to ensure proper IRQ naming.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://lore.kernel.org/r/20250624194350.109790-3-alok.a.tiwari@oracle.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The previous code mistakenly swapped the count and size parameters.
This fix corrects the argument order in devm_kcalloc() to follow the
conventional count, size form, avoiding potential confusion or bugs.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://lore.kernel.org/r/20250624194350.109790-2-alok.a.tiwari@oracle.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The minimum interval specified the PMSIDR_EL1.Interval field is a
hardware recommendation. However, this value is set by hardware designer
before the production. It is not actual hardware limitation but tools
currently have no way to test shorter periods.
This change relaxes the limitation by allowing any non-zero periods,
with simplifying code with clamp_t().
The downside is that small periods may increase the risk of AUX ring
buffer overruns. When an overrun occurs, the perf core layer will
trigger an irq work to disable the event and wake up the tool in user
space to read the trace data. After the tool finishes reading, it will
re-enable the AUX event.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250627163028.3503122-1-leo.yan@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The ARMv9.2 architecture introduces the optional Branch Record Buffer
Extension (BRBE), which records information about branches as they are
executed into set of branch record registers. BRBE is similar to x86's
Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer
(BHRB).
BRBE supports filtering by exception level and can filter just the
source or target address if excluded to avoid leaking privileged
addresses. The h/w filter would be sufficient except when there are
multiple events with disjoint filtering requirements. In this case, BRBE
is configured with a union of all the events' desired branches, and then
the recorded branches are filtered based on each event's filter. For
example, with one event capturing kernel events and another event
capturing user events, BRBE will be configured to capture both kernel
and user branches. When handling event overflow, the branch records have
to be filtered by software to only include kernel or user branch
addresses for that event. In contrast, x86 simply configures LBR using
the last installed event which seems broken.
It is possible on x86 to configure branch filter such that no branches
are ever recorded (e.g. -j save_type). For BRBE, events with a
configuration that will result in no samples are rejected.
Recording branches in KVM guests is not supported like x86. However,
perf on x86 allows requesting branch recording in guests. The guest
events are recorded, but the resulting branches are all from the host.
For BRBE, events with branch recording and "exclude_host" set are
rejected. Requiring "exclude_guest" to be set did not work. The default
for the perf tool does set "exclude_guest" if no exception level
options are specified. However, specifying kernel or user events
defaults to including both host and guest. In this case, only host
branches are recorded.
BRBE can support some additional exception branch types compared to
x86. On x86, all exceptions other than syscalls are recorded as IRQ.
With BRBE, it is possible to better categorize these exceptions. One
limitation relative to x86 is we cannot distinguish a syscall return
from other exception returns. So all exception returns are recorded as
ERET type. The FIQ branch type is omitted as the only FIQ user is Apple
platforms which don't support BRBE. The debug branch types are omitted
as there is no clear need for them.
BRBE records are invalidated whenever events are reconfigured, a new
task is scheduled in, or after recording is paused (and the records
have been recorded for the event). The architecture allows branch
records to be invalidated by the PE under implementation defined
conditions. It is expected that these conditions are rare.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
tested-by: Adam Young <admiyo@os.amperecomputing.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-4-e7775563036e@kernel.org
[will: Fix sparse warnings about mixed declarations and code.
Fix C99 comment syntax.]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
While BRBE can record branches within guests, the host recording
branches in guests is not supported by perf (though events are).
Support for BRBE in guests will supported by providing direct access
to BRBE within the guests. That is how x86 LBR works for guests.
Therefore, BRBE needs to be disabled on guest entry and restored on
exit.
For nVHE, this requires explicit handling for guests. Before
entering a guest, save the BRBE state and disable the it. When
returning to the host, restore the state.
For VHE, it is not necessary. We initialize
BRBCR_EL1.{E1BRE,E0BRE}=={0,0} at boot time, and HCR_EL2.TGE==1 while
running in the host. We configure BRBCR_EL2.{E2BRE,E0HBRE} to enable
branch recording in the host. When entering the guest, we set
HCR_EL2.TGE==0 which means BRBCR_EL1 is used instead of BRBCR_EL2.
Consequently for VHE, BRBE recording is disabled at EL1 and EL0 when
running a guest.
Should recording in guests (by the host) ever be desired, the perf ABI
will need to be extended to distinguish guest addresses (struct
perf_branch_entry.priv) for starters. BRBE records would also need to be
invalidated on guest entry/exit as guest/host EL1 and EL0 records can't
be distinguished.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-3-e7775563036e@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To use the Branch Record Buffer Extension (BRBE), some configuration is
necessary at EL3 and EL2. This patch documents the requirements and adds
the initial EL2 setup code, which largely consists of configuring the
fine-grained traps and initializing a couple of BRBE control registers.
Before this patch, __init_el2_fgt() would initialize HDFGRTR_EL2 and
HDFGWTR_EL2 with the same value, relying on the read/write trap controls
for a register occupying the same bit position in either register. The
'nBRBIDR' trap control only exists in bit 59 of HDFGRTR_EL2, while bit
59 of HDFGWTR_EL2 is RES0, and so this assumption no longer holds.
To handle HDFGRTR_EL2 and HDFGWTR_EL2 having (slightly) different bit
layouts, __init_el2_fgt() is changed to accumulate the HDFGRTR_EL2 and
HDFGWTR_EL2 control bits separately. While making this change the
open-coded value (1 << 62) is replaced with
HDFG{R,W}TR_EL2_nPMSNEVFR_EL1_MASK.
The BRBCR_EL1 and BRBCR_EL2 registers are unusual and require special
initialisation: even though they are subject to E2H renaming, both have
an effect regardless of HCR_EL2.TGE, even when running at EL2. So we
must initialize BRBCR_EL2 in case we run in nVHE mode. This is handled
in __init_el2_brbe() with a comment to explain the situation.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
[Mark: rewrite commit message, fix typo in comment]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
tested-by: Adam Young <admiyo@os.amperecomputing.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-2-e7775563036e@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch adds definitions related to the Branch Record Buffer Extension
(BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver
in subsequent patches.
Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent
generated definitions.
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
tested-by: Adam Young <admiyo@os.amperecomputing.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-1-e7775563036e@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
PMU drivers should set .suppress_bind_attrs so that userspace is denied
the opportunity to pull the driver out from underneath an in-use PMU
(with predictably unpleasant consequences). Somehow both the CMN and NI
drivers have managed to miss this; put that right.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/acd48c341b33b96804a3969ee00b355d40c546e2.1751465293.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Arnd reports that Clang's aggressive inlining of arm_cmn_discover() can
lead to stack frame size warnings, and while we could simply prevent
such inlining to hide the issue, it seems more productive to actually
heed the warning and do something about the overall stack footprint.
The xp_region array is already rather large, and CMN_MAX_XPS might only
grow larger in future, however it only serves as a convenience to save
repeating the first level's worth of register reads in the second pass
of discovery. There's no performance concern here, and it only takes a
small tweak to the flow to re-extract the offsets instead of stashing
them, so let's just do that and save several hundred bytes of stack.
Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/e7dd41bf0f1b098e2e4b01ef91318a4b272abff8.1751046159.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Don't populate the read-only array mask on the stack at run time,
instead make it static const.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250611133917.170888-1-colin.i.king@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The current MODULE_DESCRIPTION only mentions CMN-600, but this driver
now supports several Arm mesh interconnects including CMN-650, CMN-700,
CI-700, and CMN-S3.
Update the MODULE_DESCRIPTION to reflect the expanded scope.
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Link: https://lore.kernel.org/r/20250522032122.949373-1-daizhiyuan@phytium.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
|
|
While we do request our IRQs with the right flags to stop their affinity
changing unexpectedly, we forgot to actually set it to start with. Oops.
Cc: stable@vger.kernel.org
Fixes: 4d5a7680f2b4 ("perf: Add driver for Arm NI-700 interconnect PMU")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Shouping Wang <allen.wang@hj-micro.com>
Link: https://lore.kernel.org/r/614ced9149ee8324e58930862bd82cbf46228d27.1747149165.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
arm64 depends on the cpufreq driver to gain the maximum cpu frequency
to convert the watchdog_thresh to perf event period. cpufreq drivers
like cppc_cpufreq will be initialized lately after the initializing of
the hard lockup detector so just use a safe cpufreq which will be
inaccurency. Use a cpufreq notifier to adjust the event's period to
a more accurate one.
Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250701110214.27242-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Architecture's using perf events for hard lockup detection needs to
convert the watchdog_thresh to the event's period, some architecture
for example arm64 perform this conversion using the CPU's maximum
frequency which will be acquired by cpufreq. However by the time
the lockup detector's initialized the cpufreq driver may not be
initialized, thus launch a watchdog with inaccurate period. Provide
a function hardlockup_detector_perf_adjust_period() to allowing
adjust the event period. Then architecture can update with more
accurate period if cpufreq is initialized.
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250701110214.27242-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- Add initial DMR support, which required smarter RAPL probe
- Fix AMD MSR RAPL energy reporting
- Add RAPL power limit configuration output
- Minor fixes
* tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2025.06.08
tools/power turbostat: Add initial support for BartlettLake
tools/power turbostat: Add initial support for DMR
tools/power turbostat: Dump RAPL sysfs info
tools/power turbostat: Avoid probing the same perf counters
tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared
tools/power turbostat: Clean up add perf/msr counter logic
tools/power turbostat: Introduce add_msr_counter()
tools/power turbostat: Remove add_msr_perf_counter_()
tools/power turbostat: Remove add_cstate_perf_counter_()
tools/power turbostat: Remove add_rapl_perf_counter_()
tools/power turbostat: Quit early for unsupported RAPL counters
tools/power turbostat: Always check rapl_joules flag
tools/power turbostat: Fix AMD package-energy reporting
tools/power turbostat: Fix RAPL_GFX_ALL typo
tools/power turbostat: Add Android support for MSR device handling
tools/power turbostat.8: pm_domain wording fix
tools/power turbostat.8: fix typo: idle_pct should be pct_idle
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanup from Thomas Gleixner:
"The delayed from_timer() API cleanup:
The renaming to the timer_*() namespace was delayed due massive
conflicts against Linux-next. Now that everything is upstream finish
the conversion"
* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename from_timer() to timer_container_of()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Cure IO bitmap inconsistencies
A failed fork cleans up all resources of the newly created thread
via exit_thread(). exit_thread() invokes io_bitmap_exit() which
does the IO bitmap cleanups, which unfortunately assume that the
cleanup is related to the current task, which is obviously bogus.
Make it work correctly
- A lockdep fix in the resctrl code removed the clearing of the
command buffer in two places, which keeps stale error messages
around. Bring them back.
- Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
x86/iopl: Cure TIF_IO_BITMAP inconsistencies
x86/fpu: Remove unused trace events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"Add the missing seq_file forward declaration in the timer namespace
header"
* tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timens: Add struct seq_file forward declaration
|
|
Add initial DMR support, which required smarter RAPL probe
Fix AMD MSR RAPL energy reporting
Add RAPL power limit configuration output
Minor fixes
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Add initial support for BartlettLake.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Add initial support for DMR.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
for example:
intel-rapl:1: psys 28.0s:100W 976.0us:100W
intel-rapl:0: package-0 28.0s:57W,max:15W 2.4ms:57W
intel-rapl:0/intel-rapl:0:0: core disabled
intel-rapl:0/intel-rapl:0:1: uncore disabled
intel-rapl-mmio:0: package-0 28.0s:28W,max:15W 2.4ms:57W
[lenb: simplified format]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
squish me
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
For the RAPL package energy status counter, Intel and AMD share the same
perf_subsys and perf_name, but with different MSR addresses.
Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are
introduced to describe this counter for different Vendors.
As a result, the perf counter is probed twice, and causes a failure in
in get_rapl_counters() because expected_read_size and actual_read_size
don't match.
Fix the problem by skipping the already probed counter.
Note, this is not a perfect fix. For example, if different
vendors/platforms use the same MSR value for different purpose, the code
can be fooled when it probes a rapl_counter_arch_infos[] entry that does
not belong to the running Vendor/Platform.
In a long run, better to put rapl_counter_arch_infos[] into the
platform_features so that this becomes Vendor/Platform specific.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
cleared
platform_features->rapl_msrs describes the RAPL MSRs supported. While
RAPL Perf counters can be exposed from different kernel backend drivers,
e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver.
Thus, turbostat should first blindly probe all the available RAPL Perf
counters, and falls back to the RAPL MSR counters if they are listed in
platform_features->rapl_msrs.
With this, platforms that don't have RAPL MSRs can clear the
platform_features->rapl_msrs bits and use RAPL Perf counters only.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Increase the code readability by moving the no_perf/no_msr flag and the
cai->perf_name/cai->msr sanity checks into the counter probe functions.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR
counters and MPERF/APERF/SMI MSR counters, thus its name is misleading.
Similar to add_perf_counter(), introduce add_msr_counter() to probe a
counter via MSR. Introduce wrapper function add_rapl_msr_counter() at
the same time to add extra check for Zero return value for specified
RAPL counters.
No functional change intended.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
As the only caller of add_msr_perf_counter_(), add_msr_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.
Remove add_msr_perf_counter_() and move all the logic to
add_msr_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
As the only caller of add_cstate_perf_counter_(),
add_cstate_perf_counter() just gives extra debug output on top. There is
no need to keep both functions.
Remove add_cstate_perf_counter_() and move all the logic to
add_cstate_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.
Remove add_rapl_perf_counter_() and move all the logic to
add_rapl_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Quit early for unsupported RAPL counters.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
rapl_joules bit should always be checked even if
platform_features->rapl_msrs is not set or no_msr flag is used.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
commit 05a2f07db888 ("tools/power turbostat: read RAPL counters via
perf") that adds support to read RAPL counters via perf defines the
notion of a RAPL domain_id which is set to physical_core_id on
platforms which support per_core_rapl counters (Eg: AMD processors
Family 17h onwards) and is set to the physical_package_id on all the
other platforms.
However, the physical_core_id is only unique within a package and on
platforms with multiple packages more than one core can have the same
physical_core_id and thus the same domain_id. (For eg, the first cores
of each package have the physical_core_id = 0). This results in all
these cores with the same physical_core_id using the same entry in the
rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the
perf-initialization for cores whose domain_ids have already been
visited, cores that have the same physical_core_id always read the
perf file corresponding to the physical_core_id of the first package
and thus the package-energy is incorrectly reported to be the same
value for different packages.
Note: This issue only arises when RAPL counters are read via perf and
not when they are read via MSRs since in the latter case the MSRs are
read separately on each core.
Fix this issue by associating each CPU with rapl_core_id which is
unique across all the packages in the system.
Fixes: 05a2f07db888 ("tools/power turbostat: read RAPL counters via perf")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Fix typo in the currently unused RAPL_GFX_ALL macro definition.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr,
updates error messages and permission checks to reflect the Android
device path, and wraps platform-specific code with #if defined(ANDROID)
to ensure correct behavior on both Android and non-Android systems.
These changes improve compatibility and usability of turbostat on
Android devices.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
turbostat.8: clarify that uncore "domains" are Power Management domains,
aka pm_domains.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
idle_pct should be pct_idle
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Thomas Gleixner:
"A single fix for the x86 performance counters on Intel CPUs:
The MSR offset calculations for fixed performance counters are stored
at the wrong index in the configuration array causing the general
purpose counter MSR offset to be overwritten, so both the general
purpose and the fixed counters offsets are incorrect.
Correct the array index calculation to fix that"
* tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for the PCI/MSI code:
The conversion to per device MSI domains created a MSI domain with
size 1 instead of sizing it to the maximum possible number of MSI
interrupts for the device. This "worked" as the subsequent allocations
resized the domain, but the recent change to move the prepare() call
into the domain creation path broke this works by chance mechanism.
Size the domain properly at creation time"
* tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Size device MSI domain with the maximum number of vectors
|
|
Pull mount fixes from Al Viro:
"Various mount-related bugfixes:
- split the do_move_mount() checks in subtree-of-our-ns and
entire-anon cases and adapt detached mount propagation selftest for
mount_setattr
- allow clone_private_mount() for a path on real rootfs
- fix a race in call of has_locked_children()
- fix move_mount propagation graph breakage by MOVE_MOUNT_SET_GROUP
- make sure clone_private_mnt() caller has CAP_SYS_ADMIN in the right
userns
- avoid false negatives in path_overmount()
- don't leak MNT_LOCKED from parent to child in finish_automount()
- do_change_type(): refuse to operate on unmounted/not ours mounts"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_change_type(): refuse to operate on unmounted/not ours mounts
clone_private_mnt(): make sure that caller has CAP_SYS_ADMIN in the right userns
selftests/mount_setattr: adapt detached mount propagation test
do_move_mount(): split the checks in subtree-of-our-ns and entire-anon cases
fs: allow clone_private_mount() for a path on real rootfs
fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
finish_automount(): don't leak MNT_LOCKED from parent to child
path_overmount(): avoid false negatives
fs/fhandle.c: fix a race in call of has_locked_children()
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- multichannel/reconnect fixes
- move smbdirect (smb over RDMA) defines to fs/smb/common so they will
be able to be used in the future more broadly, and a documentation
update explaining setting up smbdirect mounts
- update email address for Paulo
* tag '6.16-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
MAINTAINERS, mailmap: Update Paulo Alcantara's email address
cifs: add documentation for smbdirect setup
cifs: do not disable interface polling on failure
cifs: serialize other channels when query server interfaces is pending
cifs: deal with the channel loading lag while picking channels
smb: client: make use of common smbdirect_socket_parameters
smb: smbdirect: introduce smbdirect_socket_parameters
smb: client: make use of common smbdirect_socket
smb: smbdirect: add smbdirect_socket.h
smb: client: make use of common smbdirect.h
smb: smbdirect: add smbdirect.h with public structures
smb: client: make use of common smbdirect_pdu.h
smb: smbdirect: add smbdirect_pdu.h with protocol definitions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing fixes from Steven Rostedt:
- Fix regression of waiting a long time on updating trace event filters
When the faultable trace points were added, it needed task trace RCU
synchronization.
This was added to the tracepoint_synchronize_unregister() function.
The filter logic always called this function whenever it updated the
trace event filters before freeing the old filters. This increased
the time of "trace-cmd record" from taking 13 seconds to running over
2 minutes to complete.
Move the freeing of the filters to call_rcu*() logic, which brings
the time back down to 13 seconds.
- Fix ring_buffer_subbuf_order_set() error path lock protection
The error path of the ring_buffer_subbuf_order_set() released the
mutex too early and allowed subsequent accesses to setting the
subbuffer size to corrupt the data and cause a bug.
By moving the mutex locking to the end of the error path, it prevents
the reentrant access to the critical data and also allows the
function to convert the taking of the mutex over to the guard()
logic.
- Remove unused power management clock events
The clock events were added in 2010 for power management. In 2011 arm
used them. In 2013 the code they were used in was removed. These
events have been wasting memory since then.
- Fix sparse warnings
There was a few places that sparse warned about trace_events_filter.c
where file->filter was referenced directly, but it is annotated with
an __rcu tag. Use the helper functions and fix them up to use
rcu_dereference() properly.
* tag 'trace-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Add rcu annotation around file->filter accesses
tracing: PM: Remove unused clock events
ring-buffer: Fix buffer locking in ring_buffer_subbuf_order_set()
tracing: Fix regression of filter waiting a long time on RCU synchronization
|