Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are dominated by cpufreq updates which in turn are dominated by
updates related to boost support in the core and drivers and
amd-pstate driver optimizations.
Apart from the above, there are some cpuidle updates including a
rework of the most recent idle intervals handling in the venerable
menu governor that leads to significant improvements in some
performance benchmarks, as the governor is now more likely to predict
a shorter idle duration in some cases, and there are updates of the
core device power management code, mostly related to system suspend
and resume, that should help to avoid potential issues arising when
the drivers of devices depending on one another want to use different
optimizations.
There is also a usual collection of assorted fixes and cleanups,
including removal of some unused code.
Specifics:
- Manage sysfs attributes and boost frequencies efficiently from
cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)
- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, zuoqian)
- Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
Bai)
- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)
- Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)
- Optimize the amd-pstate driver to avoid cases where call paths end
up calling the same writes multiple times and needlessly caching
variables through code reorganization, locking overhaul and tracing
adjustments (Mario Limonciello, Dhananjay Ugwekar)
- Make it possible to avoid enabling capacity-aware scheduling (CAS)
in the intel_pstate driver and relocate a check for out-of-band
(OOB) platform handling in it to make it detect OOB before checking
HWP availability (Rafael Wysocki)
- Fix dbs_update() to avoid inadvertent conversions of negative
integer values to unsigned int which causes CPU frequency selection
to be inaccurate in some cases when the "conservative" cpufreq
governor is in use (Jie Zhan)
- Update the handling of the most recent idle intervals in the menu
cpuidle governor to prevent useful information from being discarded
by it in some cases and improve the prediction accuracy (Rafael
Wysocki)
- Make it possible to tell the intel_idle driver to ignore its
built-in table of idle states for the given processor, clean up the
handling of auto-demotion disabling on Baytrail and Cherrytrail
chips in it, and update its MAINTAINERS entry (David Arcari, Artem
Bityutskiy, Rafael Wysocki)
- Make some cpuidle drivers use for_each_present_cpu() instead of
for_each_possible_cpu() during initialization to avoid issues
occurring when nosmp or maxcpus=0 are used (Jacky Bai)
- Clean up the Energy Model handling code somewhat (Rafael Wysocki)
- Use kfree_rcu() to simplify the handling of runtime Energy Model
updates (Li RongQing)
- Add an entry for the Energy Model framework to MAINTAINERS as
properly maintained (Lukasz Luba)
- Address RCU-related sparse warnings in the Energy Model code
(Rafael Wysocki)
- Remove ENERGY_MODEL dependency on SMP and allow it to be selected
when DEVFREQ is set without CPUFREQ so it can be used on a wider
range of systems (Jeson Gao)
- Unify error handling during runtime suspend and runtime resume in
the core to help drivers to implement more consistent runtime PM
error handling (Rafael Wysocki)
- Drop a redundant check from pm_runtime_force_resume() and rearrange
documentation related to __pm_runtime_disable() (Rafael Wysocki)
- Rework the handling of the "smart suspend" driver flag in the PM
core to avoid issues hat may occur when drivers using it depend on
some other drivers and clean up the related PM core code (Rafael
Wysocki, Colin Ian King)
- Fix the handling of devices with the power.direct_complete flag set
if device_suspend() returns an error for at least one device to
avoid situations in which some of them may not be resumed (Rafael
Wysocki)
- Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
possible deadlock that may occur if the "compressor" hibernation
module parameter is accessed during the registration of a new
ieee80211 device (Lizhi Xu)
- Suppress sleeping parent warning in device_pm_add() in the case
when new children are added under a device with the
power.direct_complete set after it has been processed by
device_resume() (Xu Yang)
- Remove needless return in three void functions related to system
wakeup (Zijun Hu)
- Replace deprecated kmap_atomic() with kmap_local_page() in the
hibernation core code (David Reaver)
- Remove unused helper functions related to system sleep (David Alan
Gilbert)
- Clean up s2idle_enter() so it does not lock and unlock CPU offline
in vain and update comments in it (Ulf Hansson)
- Clean up broken white space in dpm_wait_for_children() (Geert
Uytterhoeven)
- Update the cpupower utility to fix lib version-ing in it and memory
leaks in error legs, remove hard-coded values, and implement CPU
physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
Khan, Yiwei Lin, Zhongqiu Han)"
* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
PM: sleep: Fix bit masking operation
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
PM: sleep: Fix handling devices with direct_complete set on errors
cpuidle: Init cpuidle only for present CPUs
PM: clk: Remove unused pm_clk_remove()
PM: sleep: core: Fix indentation in dpm_wait_for_children()
PM: s2idle: Extend comment in s2idle_enter()
PM: s2idle: Drop redundant locks when entering s2idle
PM: sleep: Remove unused pm_generic_ wrappers
cpufreq: tegra186: Share policy per cluster
cpupower: Make lib versioning scheme more obvious and fix version link
PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
PM: EM: Address RCU-related sparse warnings
cpupower: Implement CPU physical core querying
pm: cpupower: remove hard-coded topology depth values
pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge ARM cpufreq updates for 6.15 from Viresh Kumar:
"- manage sysfs attributes and boost frequencies efficiently from cpufreq
core to reduce boilerplate code from drivers (Viresh Kumar).
- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, and zuoqian).
- Migrate to using for_each_present_cpu (Jacky Bai).
- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).
- Use str_enable_disable() helper (Lifeng Zheng)."
* tag 'cpufreq-arm-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (59 commits)
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
cpufreq: tegra186: Share policy per cluster
cpufreq: tegra194: Allow building for Tegra234
cpufreq: enable 1200Mhz clock speed for armada-37xx
cpufreq: Remove cpufreq_enable_boost_support()
cpufreq: staticize policy_has_boost_freq()
cpufreq: qcom: Set .set_boost directly
cpufreq: dt: Set .set_boost directly
cpufreq: scmi: Set .set_boost directly
cpufreq: powernv: Set .set_boost directly
cpufreq: loongson: Set .set_boost directly
cpufreq: apple: Set .set_boost directly
cpufreq: Restrict enabling boost on policies with no boost frequencies
cpufreq: cppc: Set policy->boost_supported
cpufreq: amd: Set policy->boost_supported
cpufreq: acpi: Set policy->boost_supported
...
|
|
There have been instances in past where refcount decrementing is missed
while exiting a function. Use automatic scope based cleanup to avoid
such errors.
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-12-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
|
|
Allow arch_freq_get_on_cpu to return an error for cases when retrieving
current CPU frequency is not possible, whether that being due to lack of
required arch support or due to other circumstances when the current
frequency cannot be determined at given point of time.
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250131162439.3843071-2-beata.michalska@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Remove the now unused helper, cpufreq_enable_boost_support().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
policy_has_boost_freq() isn't used outside of freq_table.c now, mark it
static.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
It is possible to have a scenario where not all cpufreq policies support
boost frequencies. And letting sysfs (or other parts of the kernel)
enable boost feature for that policy isn't correct.
Add a new flag, boost_supported, which will be set to true by the
cpufreq core only if the freq table contains valid boost frequencies.
Some cpufreq drivers though don't have boost frequencies in the
freq-table, they can set this flag from their ->init() callbacks.
Once all the drivers are updated to set the flag correctly, we can check
it before enabling boost feature for a policy.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
This will be used directly by cpufreq driver going forward, export it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
cpufreq_boost_trigger_state() is only used by cpufreq core, mark it
static.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
All users of cpufreq_generic_attr are migrated now, remove it. While at
it, also stop exporting attributes for available and boost frequencies
as they are only used by cpufreq core now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
|
|
In the parse_perf_domain function, if the call to
of_parse_phandle_with_args returns an error, then the reference to the
CPU device node that was acquired at the start of the function would not
be properly decremented.
Address this by declaring the variable with the __free(device_node)
cleanup attribute.
Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The current LATENCY_MULTIPLIER which has been around for nearly 20 years
causes rate_limit_us to be always in ms range.
On M1 mac mini I get 50 and 56us transition latency, but due to the 1000
multiplier we end up setting rate_limit_us to 50 and 56ms, which gets
capped into 2ms and was 10ms before e13aa799c2a6 ("cpufreq: Change
default transition delay to 2ms")
On Intel I5 system transition latency is 20us but due to the multiplier
we end up with 20ms that again is capped to 2ms.
Given how good modern hardware and how modern workloads require systems
to be more responsive to cater for sudden changes in workload (tasks
sleeping/wakeup/migrating, uclamp causing a sudden boost or cap) and
that 2ms is quarter of the time of 120Hz refresh rate system, drop the
old logic in favour of providing 50% headroom.
rate_limit_us = 1.5 * latency.
I considered not adding any headroom which could mean that we can end up
with infinite back-to-back requests.
I also considered providing a constant headroom (e.g: 100us) assuming
that any h/w or f/w dealing with the request shouldn't require a large
headroom when transition_latency is actually high.
But for both cases I wasn't sure if h/w or f/w can end up being
overwhelmed dealing with the freq requests in a potentially busy system.
So I opted for providing 50% breathing room.
This is expected to impact schedutil only as the other user,
dbs_governor, takes the max(2*tick, transition_delay_us) and the former
was at least 2ms on 1ms TICK, which is equivalent to the max_delay_us
before applying this patch. For systems with TICK of 4ms, this value
would have almost always ended up with 8ms sampling rate.
For systems that report 0 transition latency, we still default to
returning 1ms as transition delay.
This helps in eliminating a source of latency for applying requests as
mentioned in [1]. For example if we have a 1ms tick, most systems will
miss sending an update at tick when updating the util_avg for a task/CPU
(rate_limit_us will be 2ms for most systems).
Link: https://lore.kernel.org/lkml/20240724212255.mfr2ybiv2j2uqek7@airbuntu/ # [1]
Link: https://lore.kernel.org/lkml/20240205022500.2232124-1-qyousef@layalina.io/
Signed-off-by: Qais Yousef <qyousef@layalina.io>
Link: https://patch.msgid.link/20240728192659.58115-1-qyousef@layalina.io
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge ARM cpufreq updates for 6.11 from Viresh Kumar:
"- cpufreq: Add Loongson-3 CPUFreq driver support (Huacai Chen).
- Make exit() callback return void (Lizhe and Viresh Kumar).
- Minor cleanups and fixes in several drivers (Bryan Brattlof,
Javier Carrasco, Jagadeesh Kona, Jeff Johnson, Nícolas F. R. A. Prado,
Primoz Fiser, Raphael Gallais-Pou, and Riwen Lu)."
* tag 'cpufreq-arm-updates-6.11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (21 commits)
cpufreq: sti: fix build warning
cpufreq: mediatek: Use dev_err_probe in every error path in probe
cpufreq: Add Loongson-3 CPUFreq driver support
cpufreq: Make cpufreq_driver->exit() return void
cpufreq: pcc: Remove empty exit() callback
cpufreq: loongson2: Remove empty exit() callback
cpufreq: nforce2: Remove empty exit() callback
cpufreq: sti: add missing MODULE_DEVICE_TABLE entry for stih418
cpufreq: ti: update OPP table for AM62Px SoCs
cpufreq: ti: update OPP table for AM62Ax SoCs
cpufreq: sun50i: add Allwinner H700 speed bin
cpufreq/cppc: Don't compare desired_perf in target()
OPP: ti: Fix ti_opp_supply_probe wrong return values
cpufreq: ti-cpufreq: Handle deferred probe with dev_err_probe()
cpufreq: dt-platdev: add missing MODULE_DESCRIPTION() macro
cpufreq: longhaul: Fix kernel-doc param for longhaul_setstate
cpufreq: qcom-nvmem: eliminate uses of of_node_put()
cpufreq: qcom-nvmem: fix memory leaks in probe error paths
cpufreq: scmi: Avoid overflow of target_freq in fast switch
cpufreq: sun50i: replace of_node_put() with automatic cleanup handler
...
|
|
The cpufreq core doesn't check the return type of the exit() callback
and there is not much the core can do on failures at that point. Just
drop the returned value and make it return void.
Signed-off-by: Lizhe <sensor1010@163.com>
[ Viresh: Reworked the patches to fix all missing changes together. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # Mediatek
Acked-by: Sudeep Holla <sudeep.holla@arm.com> # scpi, scmi, vexpress
Acked-by: Mario Limonciello <mario.limonciello@amd.com> # amd
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> # bmips
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com> # omap
|
|
Since this function is supposed to return boost_enabled which is anyway
a bool type make sure that it's return value is also marked as bool.
This helps maintain better consistency in data types being used.
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20240627060117.1809477-1-d-gole@ti.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Provide to the scheduler a feedback about the temporary max available
capacity. Unlike arch_update_thermal_pressure(), this doesn't need to be
filtered as the pressure will happen for dozens of ms or more.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Qais Yousef <qyousef@layalina.io>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20240326091616.3696851-2-vincent.guittot@linaro.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull more ARM SoC updates from Arnd Bergmann:
"These are changes that for some reason ended up not making it into the
first four branches but that should still make it into 6.9:
- A rework of the omap clock support that touches both drivers and
device tree files
- The reset controller branch changes that had a dependency on late
bugfixes. Merging them here avoids a backmerge of 6.8-rc5 into the
drivers branch
- The RISC-V/starfive, RISC-V/microchip and ARM/Broadcom devicetree
changes that got delayed and needed some extra time in linux-next
for wider testing"
* tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits)
soc: fsl: dpio: fix kcalloc() argument order
bus: ts-nbus: Improve error reporting
bus: ts-nbus: Convert to atomic pwm API
riscv: dts: starfive: jh7110: Add camera subsystem nodes
ARM: bcm: stop selecing CONFIG_TICK_ONESHOT
ARM: dts: omap3: Update clksel clocks to use reg instead of ti,bit-shift
ARM: dts: am3: Update clksel clocks to use reg instead of ti,bit-shift
clk: ti: Improve clksel clock bit parsing for reg property
clk: ti: Handle possible address in the node name
dt-bindings: pwm: opencores: Add compatible for StarFive JH8100
dt-bindings: riscv: cpus: reg matches hart ID
reset: Instantiate reset GPIO controller for shared reset-gpios
reset: gpio: Add GPIO-based reset controller
cpufreq: do not open-code of_phandle_args_equal()
of: Add of_phandle_args_equal() helper
reset: simple: add support for Sophgo SG2042
dt-bindings: reset: sophgo: support SG2042
riscv: dts: microchip: add specific compatible for mpfs pdma
riscv: dts: microchip: add missing CAN bus clocks
ARM: brcmstb: Add debug UART entry for 74165
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm
Merge OPP (operating performance points) updates for 6.9 from Viresh
Kumar:
"- Fix couple of warnings related to W=1 builds. (Viresh Kumar).
- Move Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar).
- Extend dev_pm_opp_data with turbo support (Sibi Sankar).
- dt-bindings: drop maxItems from inner items (David Heidelberg)."
* tag 'opp-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
dt-bindings: opp: drop maxItems from inner items
OPP: debugfs: Fix warning around icc_get_name()
OPP: debugfs: Fix warning with W=1 builds
cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
OPP: Extend dev_pm_opp_data with turbo support
|
|
Move the declaration of functions defined in the OPP core to pm_opp.h.
These were added to cpufreq.h as it was the only user of the APIs, but
that was a mistake perhaps. Fix it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Resolving a frequency to an efficient one should not transgress
policy->max (which can be set for thermal reason) and policy->min.
Currently, there is possibility where scaling_cur_freq can exceed
scaling_max_freq when scaling_max_freq is an inefficient frequency.
Add a check to ensure that resolving a frequency will respect
policy->min/max.
Cc: All applicable <stable@vger.kernel.org>
Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E")
Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com>
[ rjw: Whitespace adjustment, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
A minimum sampling rate value of 10ms was introduced in:
commit cef9615a853e ("[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case")
The use of this value was removed in:
commit ed4676e25463 ("cpufreq: Replace "max_transition_latency" with "dynamic_switching"")
Remove:
- a comment referencing this value
- an unused macro associated to this value
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Use newly added of_phandle_args_equal() helper to compare two
of_phandle_args.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240129115216.96479-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
|
|
Platform firmware sends notify 0x85 to inform the OS that the highest
performance of a CPU has changed.
This will be used by the AMD P-state driver to update the ranking of
preferred cores and set the priority of cores accordingly.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-device-notification-values
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
cpuinfo.max_freq can change at runtime because of boost as an example. This
implies that the value could be different from the frequency that has been
used to compute the capacity of a CPU.
The new arch_scale_freq_ref() returns a fixed and coherent frequency
that can be used to compute the capacity for a given frequency.
[ Also fix a arch_set_freq_scale() newline style wart in <linux/cpufreq.h>. ]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20231211104855.558096-3-vincent.guittot@linaro.org
|
|
The Energy Aware Scheduler (EAS) relies on the schedutil governor.
When moving to/from the schedutil governor, sched domains must be
rebuilt to allow re-evaluating the enablement conditions of EAS.
This is done through sched_cpufreq_governor_change().
Having a cpufreq governor assumes a cpufreq driver is running.
Inserting/removing a cpufreq driver should trigger a re-evaluation
of EAS enablement conditions, avoiding to see EAS enabled when
removing a running cpufreq driver.
Rebuild the sched domains in schedutil's sugov_init()/sugov_exit(),
allowing to check EAS's enablement condition whenever schedutil
governor is initialized/exited from.
Move relevant code up in schedutil.c to avoid a split and conditional
function declaration.
Rename sched_cpufreq_governor_change() to sugov_eas_rebuild_sd().
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The boost control currently applies to the whole system. However, users
may prefer to boost a subset of cores in order to provide prioritized
performance to workloads running on the boosted cores.
Enable per-policy boost by adding a 'boost' sysfs interface under each
policy path. This can be found at:
/sys/devices/system/cpu/cpufreq/policy<*>/boost
Same to the global boost switch, writing 1/0 to the per-policy 'boost'
enables/disables boost on a cpufreq policy respectively.
The user view of global and per-policy boost controls should be:
1. Enabling global boost initially enables boost on all policies, and
per-policy boost can then be enabled or disabled individually, given that
the platform does support so.
2. Disabling global boost makes the per-policy boost interface illegal.
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Wei Xu <xuwei5@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 6.6 from Viresh Kumar:
"- Migrate various platforms to use remove callback returning void
(Yangtao Li).
- Add online/offline/exit hooks for Tegra driver (Sumit Gupta).
- Explicitly include correct DT includes (Rob Herring).
- Frequency domain updates for qcom-hw driver (Neil Armstrong).
- Modify AMD pstate driver return the highest_perf value (Meng Li).
- Generic cleanups for cppc, mediatek and powernow driver (Liao Chang
and Konrad Dybcio).
- Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino
Del Regno and Konrad Dybcio).
- brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)."
* tag 'cpufreq-arm-updates-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (33 commits)
cpufreq: tegra194: remove opp table in exit hook
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
cpufreq: tegra194: add online/offline hooks
cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message
cpufreq: amd-pstate-ut: Modify the function to get the highest_perf value
cpufreq: mediatek-hw: Remove unused define
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug
cpufreq: blocklist MSM8998 in cpufreq-dt-platdev
cpufreq: omap: Convert to platform remove callback returning void
cpufreq: qoriq: Convert to platform remove callback returning void
cpufreq: acpi: Convert to platform remove callback returning void
cpufreq: tegra186: Convert to platform remove callback returning void
cpufreq: qcom-nvmem: Convert to platform remove callback returning void
cpufreq: kirkwood: Convert to platform remove callback returning void
cpufreq: pcc-cpufreq: Convert to platform remove callback returning void
...
|
|
The valid values of policy.{min, max} should be between 'min' and 'max',
so use clamp() helper macro to makes cpufreq_verify_within_limits() easier
to follow.
Signed-off-by: Liao Chang <liaochang1@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The cpufreq framework used to use the zero of return value to reflect
the cppc_cpufreq_get_rate() had failed to get current frequecy and treat
all positive integer to be succeed. Since cppc_get_perf_ctrs() returns a
negative integer in error case, so it is better to convert the value to
zero as the return value of cppc_cpufreq_get_rate().
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
If fast_switch_possible flag is set by the scaling driver, the governor
is free to select fast_switch function even if adjust_perf is set. Some
scaling drivers which use adjust_perf don't set fast_switch thinking
that the governor would never fall back to fast_switch. But the governor
can fall back to fast_switch even in runtime if frequency invariance is
disabled due to some reason. This could crash the kernel if the driver
didn't set the fast_switch function pointer.
Therefore, fail driver registration if it has adjust_perf without
fast_switch.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull more devicetree updates from Rob Herring:
- First part of DT header detangling dropping cpu.h from of_device.h
and replacing some includes with forward declarations. A handful of
drivers needed some adjustment to their includes as a result.
- Refactor of_device.h to be used by bus drivers rather than various
device drivers. This moves non-bus related functions out of
of_device.h. The end goal is for of_platform.h and of_device.h to
stop including each other.
- Refactor open coded parsing of "ranges" in some bus drivers to use DT
address parsing functions
- Add some new address parsing functions of_property_read_reg(),
of_range_count(), and of_range_to_resource() in preparation to
convert more open coded parsing of DT addresses to use them.
- Treewide clean-ups to use of_property_read_bool() and
of_property_present() as appropriate. The ones here are the ones that
didn't get picked up elsewhere.
* tag 'devicetree-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (34 commits)
bus: tegra-gmi: Replace of_platform.h with explicit includes
hte: Use of_property_present() for testing DT property presence
w1: w1-gpio: Use of_property_read_bool() for boolean properties
virt: fsl: Use of_property_present() for testing DT property presence
soc: fsl: Use of_property_present() for testing DT property presence
sbus: display7seg: Use of_property_read_bool() for boolean properties
sparc: Use of_property_read_bool() for boolean properties
sparc: Use of_property_present() for testing DT property presence
bus: mvebu-mbus: Remove open coded "ranges" parsing
of/address: Add of_property_read_reg() helper
of/address: Add of_range_count() helper
of/address: Add support for 3 address cell bus
of/address: Add of_range_to_resource() helper
of: unittest: Add bus address range parsing tests
of: Drop cpu.h include from of_device.h
OPP: Adjust includes to remove of_device.h
irqchip: loongson-eiointc: Add explicit include for cpuhotplug.h
cpuidle: Adjust includes to remove of_device.h
cpufreq: sun50i: Add explicit include for cpu.h
cpufreq: Adjust includes to remove of_device.h
...
|
|
Now that of_cpu_device_node_get() is defined in of.h, of_device.h is just
implicitly including other includes, and is no longer needed. Adjust the
include files with what was implicitly included by of_device.h (cpu.h and
of.h) and drop including of_device.h.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-14-581e2605fe47@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Since the cpufreq core directly uses freq_table, for cpufreq drivers
that set their target_index() callback, make it mandatory for them to
set the same.
Since this is set per policy and normally from policy->init(), do this
from cpufreq_table_validate_and_sort() which gets called right after
->init().
Reported-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
All but a few drivers ignore the return value of
cpufreq_unregister_driver(). Those few that don't only call it after
cpufreq_register_driver() succeeded, in which case the call doesn't
fail.
Make the function return no value and add a WARN_ON for the case that
the function is called in an invalid situation (i.e. without a previous
successful call to cpufreq_register_driver()).
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com> # brcmstb-avs-cpufreq.c
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
of_perf_domain_get_sharing_cpumask currently assumes a 1-argument
phandle format, and directly returns the argument. Generalize this to
return the full of_phandle_args, so it can be used by drivers which use
other phandle styles (e.g. separate nodes). This also requires changing
the CPU sharing match to compare the full args structure.
Also, make sure to of_node_put(args.np) (the original code was leaking a
reference).
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
The frequency invariance infrastructure provides the APERF/MPERF samples
already. Utilize them for the cpu frequency display in /proc/cpuinfo.
The sample is considered valid for 20ms. So for idle or isolated NOHZ full
CPUs the function returns 0, which is matching the previous behaviour.
This gets rid of the mass IPIs and a delay of 20ms for stabilizing observed
by Eric when reading /proc/cpuinfo.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220415161206.875029458@linutronix.de
|
|
|
|
This effectively revert '4bf8e582119e ("cpufreq: Remove ready()
callback")', in order to reintroduce the ready callback.
This is needed in order to be able to leave the thermal pressure
interrupts in the Qualcomm CPUfreq driver disabled during
initialization, so that it doesn't fire while related_cpus are still 0.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[ Viresh: Added the Chinese translation as well and updated commit msg ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
So it can be reused by other codes.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Merge Energy Model and power capping updates for 5.16-rc1:
- Add support for inefficient operating performance points to the
Energy Model and modify cpufreq to use them properly (Vincent
Donnefort).
- Rearrange the DTPM framework code to simplify it and make it easier
to follow (Daniel Lezcano).
- Fix power intialization in DTPM (Daniel Lezcano).
- Add CPU load consideration when estimating the instaneous power
consumption in DTPM (Daniel Lezcano).
* pm-em:
cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
PM: EM: Mark inefficiencies in CPUFreq
cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
cpufreq: Introducing CPUFREQ_RELATION_E
cpufreq: Add an interface to mark inefficient frequencies
cpufreq: Make policy min/max hard requirements
PM: EM: Allow skipping inefficient states
PM: EM: Extend em_perf_domain with a flag field
PM: EM: Mark inefficient states
PM: EM: Fix inefficient states detection
* powercap:
powercap/drivers/dtpm: Fix power limit initialization
powercap/drivers/dtpm: Scale the power with the load
powercap/drivers/dtpm: Use container_of instead of a private data field
powercap/drivers/dtpm: Simplify the dtpm table
powercap/drivers/dtpm: Encapsulate even more the code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 5.16-rc1 from Viresh Kumar:
"- Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).
- Fix the parameter usage of the newly added perf-domain API (Hector
Yuan).
- Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
Guenter Roeck, and Arnd Bergmann)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: Fix parameter in parse_perf_domain()
cpufreq: tegra186/tegra194: Handle errors in BPMP response
cpufreq: remove useless INIT_LIST_HEAD()
cpufreq: s3c244x: add fallthrough comments for switch
cpufreq: vexpress: Drop unused variable
|
|
Pass cpu to parse_perf_domain() instead of pcpu.
Fixes: 8486a32dd484 ("cpufreq: Add of_perf_domain_get_sharing_cpumask")
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: Massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
s/internale/internal/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.
Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Some SoCs such as the sd855 have OPPs within the same policy whose cost is
higher than others with a higher frequency. Those OPPs are inefficients
and it might be interesting for a governor to not use them.
cpufreq_table_set_inefficient() allows the caller to identify a specified
frequency as being inefficient. Inefficient frequencies are only supported
on sorted tables.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar:
"This adds a new cpufreq driver for Mediatek, which had been going
through reviews since last one year."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
|
|
Add of_perf_domain_get_sharing_cpumask function to group cpu
to specific performance domain.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: create separate routine parse_perf_domain() and always set the
cpumask. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
This isn't used anymore, get rid of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Many cpufreq drivers register with the energy model for each policy and
do exactly the same thing. Follow the footsteps of thermal-cooling, to
get it done from the cpufreq core itself.
Provide a new callback, which will be called, if present, by the cpufreq
core at the right moment (more on that in the code's comment). Also
provide a generic implementation that uses dev_pm_opp_of_register_em().
This also allows us to register with the EM at a later point of time,
compared to ->init(), from where the EM core can access cpufreq policy
directly using cpufreq_cpu_get() type of helpers and perform other work,
like marking few frequencies inefficient, this will be done separately.
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|