Age | Commit message (Collapse) | Author |
|
In an effort to move ARM64 away from the legacy MSI setup, convert the
Apple PCIe driver to the MSI-parent infrastructure and let each device have
its own MSI domain.
[ tglx: Moved the struct out of the function call argument ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://lore.kernel.org/all/20250513172819.2216709-8-maz@kernel.org
|
|
Bad MSI implementations multiplex MSIs onto a single downstream interrupt,
meaning they have no concept of individual affinity.
The old MSI code did a reasonable job at this by honouring the
MSI_FLAG_NO_AFFINITY, but the new shiny device MSI code doesn't.
Teach it about the sad reality of existing hardware.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-7-maz@kernel.org
|
|
Switch the MVEBU family of interrupt chip drivers over to the common helper
function to create the interrupt domains.
[ tglx: Moved the struct out of the function call argument and fix up
the of_node_to_fwnode() instances ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-5-maz@kernel.org
|
|
Switch the GIC family of interrupt chip drivers over to the common helper
function to create the interrupt domains.
[ tglx: Moved the struct out of the function call argument ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-4-maz@kernel.org
|
|
Creating an irq domain that serves as an MSI parent requires
a substantial amount of esoteric boiler-plate code, some of
which is often provided twice (such as the bus token).
To make things a bit simpler for the unsuspecting MSI tinkerer,
provide a helper that does it for them, and serves as documentation
of what needs to be provided.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-3-maz@kernel.org
|
|
Move irq-msi-lib.h into include/linux/irqchip, making it available
to compilation units outside of drivers/irqchip.
This requires some churn in drivers to fetch it from the new location,
generated using this script:
git grep -l -w \"irq-msi-lib.h\" | \
xargs sed -i -e 's:"irq-msi-lib.h":\<linux/irqchip/irq-msi-lib.h\>:'
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-2-maz@kernel.org
|
|
Pick up the PCI changes to avoid conflicts.
|
|
Now that .msi_prepare() gets called at the right time and not
with semi-random parameters, remove the ugly hack that tried
to fix up the number of allocated vectors.
It is now correct by construction.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-6-maz@kernel.org
|
|
Kindly inform the MSI driver that the domain is torn down, providing the
allocation context previously populated on domain creation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-5-maz@kernel.org
|
|
The current device MSI infrastructure is subtly broken, as it will issue an
.msi_prepare() callback into the MSI controller driver every time it needs
to allocate an MSI. That's pretty wrong, as the contract (or unwarranted
assumption, depending who you ask) between the MSI controller and the core
code is that .msi_prepare() is called exactly once per device.
This leads to some subtle breakage in some MSI controller drivers, as it
gives the impression that there are multiple endpoints sharing a bus
identifier (RID in PCI parlance, DID for GICv3+). It implies that whatever
allocation the ITS driver (for example) has done on behalf of these devices
cannot be undone, as there is no way to track the shared state. This is
particularly bad for wire-MSI devices, for which .msi_prepare() is called
for each input line.
To address this issue, move the call to .msi_prepare() to take place at the
point of irq domain allocation, which is the only place that makes
sense. The msi_alloc_info_t structure is made part of the
msi_domain_template, so that its life-cycle is that of the domain as well.
Finally, the msi_info::alloc_data field is made to point at this allocation
tracking structure, ensuring that it is carried around the block.
This is all pretty straightforward, except for the non-device-MSI
leftovers, which still have to call .msi_prepare() at the old spot. One
day...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-4-maz@kernel.org
|
|
The ITS driver currently nukes the structure representing an endpoint
device translating via an ITS on freeing the last LPI allocated for it.
That's an unfortunate state of affair, as it is pretty common for a driver
to allocate a single MSI, do something clever, teardown this MSI, and
reallocate a whole bunch of them. The NVME driver does exactly that,
amongst others.
What happens in that case is that the core code is accidentaly issuing
another .msi_prepare() call, even if it shouldn't. This luckily cancels
the above behaviour and hides the problem.
In order to fix the core code, start by implementing the new
.msi_teardown() callback. Nothing calls it yet, so a side effect is that
the its_dev structure will not be freed and that the DID will stay
mapped. Not a big deal, and this will be solved in following patches.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-3-maz@kernel.org
|
|
While the MSI ops do have a .msi_prepare() callback that is responsible for
setting up the relevant (usually per-device) allocation, there is no
callback reversing this setup.
For this purpose, add .msi_teardown() callback.
In order to avoid breaking the ITS driver that suffers from related issues,
do not call the callback just yet.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-2-maz@kernel.org
|
|
Pull in the platform MSI/GIC changes which are seperate for the PCI
endpoint driver updates.
|
|
Some platform devices create child devices dynamically and require the
parent device's msi-map to map device IDs to actual sideband information.
A typical use case is using ITS as a PCIe Endpoint Controller(EPC)'s
doorbell function, where PCI hosts send TLP memory writes to the EP
controller. The EP controller converts these writes to AXI transactions
and appends platform-specific sideband information.
EPC's DTS will provide such information by msi-map and msi-mask. A
simplified dts as
pcie-ep@10000000 {
...
msi-map = <0 &its 0xc 8>;
^^^ 0xc is implement defined sideband information,
which append to AXI write transaction.
^ 0 is function index.
msi-mask = <0x7>
}
Check msi-map if msi-parent missed to keep compatility with existing systems.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-5-f69b49917464@nxp.com
|
|
Document the use of (msi|iommu)-map for PCI Endpoint (EP) controllers,
which can use MSI as a doorbell mechanism. Each EP controller can support
up to 8 physical functions and 65,536 virtual functions.
Define how to construct device IDs using function bits [2:0] and virtual
function index bits [31:3], enabling (msi|iommu)-map to associate each
child device with a specific (msi|iommu)-specifier.
The EP cannot rely on PCI Requester ID (RID) because the RID is determined
by the PCI topology of the host system. Since the EP may be connected to
different PCI hosts, the RID can vary between systems and is therefore not
a reliable identifier.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-4-f69b49917464@nxp.com
|
|
Set the IRQ_DOMAIN_FLAG_MSI_IMMUTABLE flag for ITS, as it does not change
the address/data pair after setup.
Ensure compatibility with MSI users, such as PCIe Endpoint Doorbell, which
require the address/data pair to remain unchanged. Enable PCIe endpoints to
use ITS for triggering doorbells from the PCIe Root Complex (RC) side.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-3-f69b49917464@nxp.com
|
|
Add the flag IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and the API function
irq_domain_is_msi_immutable() to check if the MSI controller retains an
immutable address/data pair during irq_set_affinity().
Ensure compatibility with MSI users like PCIe Endpoint Doorbell, which
require the address/data pair to remain unchanged after setup. Use this
function to verify if the MSI controller is immutable.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-2-f69b49917464@nxp.com
|
|
platform_device_msi_free_irqs_all()
platform_device_msi_init_and_alloc_irqs() performs two tasks: allocating
the MSI domain for a platform device, and allocate a number of MSIs in that
domain.
platform_device_msi_free_irqs_all() only frees the MSIs, and leaves the MSI
domain alive.
Given that platform_device_msi_init_and_alloc_irqs() is the sole tool a
platform device has to allocate platform MSIs, it makes sense for
platform_device_msi_free_irqs_all() to teardown the MSI domain at the same
time as the MSIs.
This avoids warnings and unexpected behaviours when a driver repeatedly
allocates and frees MSIs.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-1-f69b49917464@nxp.com
|
|
Now that all abuse is gone and the legit users are converted to
guard(msi_descs_lock), rename the lock functions and document them as
internal.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com>
Link: https://lore.kernel.org/all/20250319105506.864699741@linutronix.de
|
|
The driver abuses the MSI descriptors for internal purposes. Aside of core
code and MSI providers nothing has to care about their existence. They have
been encapsulated with a lot of effort because this kind of abuse caused
all sorts of issues including a maintainability nightmare.
Rewrite the code so it uses dedicated storage to hand the required
information to the interrupt handler and use a custom cleanup function to
simplify the error path.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/all/20250319105506.805529593@linutronix.de
|
|
The driver walks the MSI descriptors to test whether a descriptor exists
for a given index. That's just abuse of the MSI internals.
The same test can be done with a single function call by looking up whether
there is a Linux interrupt number assigned at the index.
What's worse is that the function is completely unserialized against
modifications of the MSI-X control by operations issued from the interrupt
core. It also brings the PCI/MSI-X internal cached control word out of
sync.
Remove the trainwreck and invoke the function provided by the PCI/MSI core
to update it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.744271447@linutronix.de
|
|
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.
Provide a function, which has the required serialization and keeps the
control word cache in sync.
Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.683663807@linutronix.de
|
|
Convert the code to use the new guard(msi_descs_lock).
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.624838146@linutronix.de
|
|
Split the lock protected functionality of msix_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.
Simplify the error path in the helper function by utilizing a custom
cleanup to get rid of the remaining gotos.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.564105011@linutronix.de
|
|
Split the lock protected functionality of msi_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.504992208@linutronix.de
|
|
Let cleanup handle the freeing of the affinity mask. That prepares for
switching the MSI descriptor locking to a guard().
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.444764312@linutronix.de
|
|
The comment claiming that pci_dev::msi_enabled has to be set across setup
is a leftover from ancient code versions. Nothing in the setup code
requires the flag to be set anymore.
Set it in the success path and remove the extra goto label.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.383222333@linutronix.de
|
|
Convert the trivial cases of msi_desc_lock/unlock() pairs.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.322536126@linutronix.de
|
|
Convert the code to use the new guard(msi_descs_lock).
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/all/20250319105506.263456735@linutronix.de
|
|
Convert the code to use the new guard(msi_descs_lock).
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://lore.kernel.org/all/20250319105506.203802081@linutronix.de
|
|
Provide a lock guard for MSI descriptor locking and update the core code
accordingly.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/all/20250319105506.144672678@linutronix.de
|
|
In cases where an allocation is consumed by another function, the
allocation needs to be retained on success or freed on failure. The code
pattern is usually:
struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);
struct bar *b;
,,,
// Initialize f
...
if (ret)
goto free;
...
bar = bar_create(f);
if (!bar) {
ret = -ENOMEM;
goto free;
}
...
return 0;
free:
kfree(f);
return ret;
This prevents using __free(kfree) on @f because there is no canonical way
to tell the cleanup code that the allocation should not be freed.
Abusing no_free_ptr() by force ignoring the return value is not really a
sensible option either.
Provide an explicit macro retain_and_null_ptr(), which NULLs the cleanup
pointer. That makes it easy to analyze and reason about.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/all/20250319105506.083538907@linutronix.de
|
|
of_node_to_fwnode() is irqdomain's reimplementation of the "officially"
defined of_fwnode_handle(). The former is in the process of being removed,
so use the latter instead.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250319092951.37667-8-jirislaby@kernel.org
|
|
|
|
The "real" linux/types.h UAPI header gracefully degrades to a NOOP when
included from assembly code.
Mirror this behaviour in the tools/ variant.
Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the
toolchain automatically.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/
Fixes: c9fbaa879508 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
- support up to 8192 processors
- add cpuidle governor debug telemetry, disabled by default
- update default output to exclude cpuidle invocation counts
- bug fixes
* tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: v2025.05.06
tools/power turbostat: disable "cpuidle" invocation counters, by default
tools/power turbostat: re-factor sysfs code
tools/power turbostat: Restore GFX sysfs fflush() call
tools/power turbostat: Document GNR UncMHz domain convention
tools/power turbostat: report CoreThr per measurement interval
tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192
tools/power turbostat: Add idle governor statistics reporting
tools/power turbostat: Fix names matching
tools/power turbostat: Allow Zero return value for some RAPL registers
tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:
- add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc
driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT
* tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE
|
|
Support up to 8192 processors
Add cpuidle governor debug telemetry, disabled by default
Update default output to exclude cpuidle invocation counts
Bug fixes
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Create "pct_idle" counter group, the sofware notion of residency
so it can now be singled out, independent of other counter groups.
Create "cpuidle" group, the cpuidle invocation counts.
Disable "cpuidle", by default.
Create "swidle" = "cpuidle" + "pct_idle".
Undocument "sysfs", the old name for "swidle", but keep it working
for backwards compatibilty.
Create "hwidle", all the HW idle counters
Modify "idle", enabled by default
"idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle")
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
"Fix a perf events time accounting bug"
* tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix child_total_time_enabled accounting bug at task exit
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix a nonsensical Kconfig combination
- Remove an unnecessary rseq-notification
* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Eliminate useless task_work on execve
sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
|
|
... and don't error out so hard on missing module descriptions.
Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnigns (ie 'W=1').
After that commit the warning became an unconditional hard error.
And it turns out not all modules have been converted despite the claims
to the contrary. As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.
The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself. And so anybody doing full build tests
didn't actually see this failre.
So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage. Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.
Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Probe cpuidle "sysfs" residency and counts separately,
since soon we will make one disabled on, and the
other disabled off.
Clarify that some BIC (build-in-counters) are actually "groups".
since we're about to re-name some of those groups.
no functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Do fflush() to discard the buffered data, before each read of the
graphics sysfs knobs.
Fixes: ba99a4fc8c24 ("tools/power turbostat: Remove unnecessary fflush() call")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Document that on Intel Granite Rapids Systems,
Uncore domains 0-2 are CPU domains, and
uncore domains 3-4 are IO domains.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
The CoreThr column displays total thermal throttling events
since boot time.
Change it to report events during the measurement interval.
This is more useful for showing a user the current conditions.
Total events since boot time are still available to the user via
/sys/devices/system/cpu/cpu*/thermal_throttle/*
Document CoreThr on turbostat.8
Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print")
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
|
|
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output:
"turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151"
A similar error appears with the use of turbostat --cpu when the inputted cpu
range contains a cpu number >= 1024:
# turbostat -c 1100-1151
"--cpu 1100-1151" malformed
...
Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS.
It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low.
For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS
brings support for parsing cpu numbers >= 1024.
Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64.
Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A set of final cleanups for the timer subsystem:
- Convert all del_timer[_sync]() instances over to the new
timer_delete[_sync]() API and remove the legacy wrappers.
Conversion was done with coccinelle plus some manual fixups as
coccinelle chokes on scoped_guard().
- The final cleanup of the hrtimer_init() to hrtimer_setup()
conversion.
This has been delayed to the end of the merge window, so that all
patches which have been merged through other trees are in mainline
and all new users are catched.
Doing this right before rc1 ensures that new code which is merged post
rc1 is not introducing new instances of the original functionality"
* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/timers: Rename the hrtimer_init event to hrtimer_setup
hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
hrtimers: Rename debug_init() to debug_setup()
hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
hrtimers: Make callback function pointer private
hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
hrtimers: Switch to use __htimer_setup()
hrtimers: Delete hrtimer_init()
treewide: Convert new and leftover hrtimer_init() users
treewide: Switch/rename to timer_delete[_sync]()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more irq updates from Thomas Gleixner:
"A set of updates for the interrupt subsystem:
- A treewide cleanup for the irq_domain code, which makes the naming
consistent and gets rid of the original oddity of naming domains
'host'.
This is a trivial mechanical change and is done late to ensure that
all instances have been catched and new code merged post rc1 wont
reintroduce new instances.
- A trivial consistency fix in the migration code
The recent introduction of irq_force_complete_move() in the core
code, causes a problem for the nostalgia crowd who maintains ia64
out of tree.
The code assumes that hierarchical interrupt domains are enabled
and dereferences irq_data::parent_data unconditionally. That works
in mainline because both architectures which enable that code have
hierarchical domains enabled. Though it breaks the ia64 build,
which enables the functionality, but does not have hierarchical
domains.
While it's not really a problem for mainline today, this
unconditional dereference is inconsistent and trivially fixable by
using the existing helper function irqd_get_parent_data(), which
has the appropriate #ifdeffery in place"
* tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
irqdomain: Stop using 'host' for domain
irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A revert to fix a adjtimex() regression:
The recent change to prevent that time goes backwards for the coarse
time getters due to immediate multiplier adjustments via adjtimex(),
changed the way how the timekeeping core treats that.
That change result in a regression on the adjtimex() side, which is
user space visible:
1) The forwarding of the base time moves the update out of the
original period and establishes a new one. That's changing the
behaviour of the [PF]LL control, which user space expects to be
applied periodically.
2) The clearing of the accumulated NTP error due to #1, changes the
behaviour as well.
An attempt to delay the multiplier/frequency update to the next tick
did not solve the problem as userspace expects that the multiplier or
frequency updates are in effect, when the syscall returns.
There is a different solution for the coarse time problem available,
so revert the offending commit to restore the existing adjtimex()
behaviour"
* tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
|