diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-11-26 19:06:44 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-11-26 19:06:44 -0800 |
commit | 9e7a03233e02afd3ee061e373355f34d7254f1e6 (patch) | |
tree | 62e978438049432c52dddb8329aea8090a8a8243 /drivers/cpuidle/governors/teo.c | |
parent | c2da5bdc66a377f0b82ee959f19f5a6774706b83 (diff) | |
parent | e350b60f4e0f089f585d738e27213c8133fe9093 (diff) |
Merge tag 'pm-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These include cpuidle changes to use nanoseconds (instead of
microseconds) as the unit of time and to simplify checks for disabled
idle states in the idle loop, some cpuidle fixes and governor updates,
assorted cpufreq updates (driver updates mostly and a few core fixes
and cleanups), devfreq updates (dominated by the tegra30 driver
changes), new CPU IDs for the RAPL power capping driver, relatively
minor updates of the generic power domains (genpd) and operation
performance points (OPP) frameworks, and assorted fixes and cleanups.
There are also two maintainer information updates: Chanwoo Choi will
be maintaining the devfreq subsystem going forward and Todd Brandt is
going to maintain the pm-graph utility (created by him).
Specifics:
- Use nanoseconds (instead of microseconds) as the unit of time in
the cpuidle core and simplify checks for disabled idle states in
the idle loop (Rafael Wysocki)
- Fix and clean up the teo cpuidle governor (Rafael Wysocki)
- Fix the cpuidle registration error code path (Zhenzhong Duan)
- Avoid excessive vmexits in the ACPI cpuidle driver (Yin Fengwei)
- Extend the idle injection infrastructure to be able to measure the
requested duration in nanoseconds and to allow an exit latency
limit for idle states to be specified (Daniel Lezcano)
- Fix cpufreq driver registration and clarify a comment in the
cpufreq core (Viresh Kumar)
- Add NULL checks to the show() and store() methods of sysfs
attributes exposed by cpufreq (Kai Shen)
- Update cpufreq drivers:
* Fix for a plain int as pointer warning from sparse in
intel_pstate (Jamal Shareef)
* Fix for a hardcoded number of CPUs and stack bloat in the
powernv driver (John Hubbard)
* Updates to the ti-cpufreq driver and DT files to support new
platforms and migrate bindings from opp-v1 to opp-v2 (Adam Ford,
H. Nikolaus Schaller)
* Merging of the arm_big_little and vexpress-spc drivers and
related cleanup (Sudeep Holla)
* Fix for imx's default speed grade value (Anson Huang)
* Minor cleanup of the s3c64xx driver (Nathan Chancellor)
* CPU speed bin detection fix for sun50i (Ondrej Jirman)
- Appoint Chanwoo Choi as the new devfreq maintainer.
- Update the devfreq core:
* Check NULL governor in available_governors_show sysfs to prevent
showing wrong governor information and fix a race condition
between devfreq_update_status() and trans_stat_show() (Leonard
Crestez)
* Add new 'interrupt-driven' flag for devfreq governors to allow
interrupt-driven governors to prevent the devfreq core from
polling devices for status (Dmitry Osipenko)
* Improve an error message in devfreq_add_device() (Matthias
Kaehlcke)
- Update devfreq drivers:
* tegra30 driver fixes and cleanups (Dmitry Osipenko)
* Removal of unused property from dt-binding documentation for the
exynos-bus driver (Kamil Konieczny)
* exynos-ppmu cleanup and DT bindings update (Lukasz Luba, Marek
Szyprowski)
- Add new CPU IDs for CometLake Mobile and Desktop to the Intel RAPL
power capping driver (Zhang Rui)
- Allow device initialization in the generic power domains (genpd)
framework to be more straightforward and clean it up (Ulf Hansson)
- Add support for adjusting OPP voltages at run time to the OPP
framework (Stephen Boyd)
- Avoid freeing memory that has never been allocated in the
hibernation core (Andy Whitcroft)
- Clean up function headers in a header file and coding style in the
wakeup IRQs handling code (Ulf Hansson, Xiaofei Tan)
- Clean up the SmartReflex adaptive voltage scaling (AVS) driver for
ARM (Ben Dooks, Geert Uytterhoeven)
- Wrap power management documentation to fit in 80 columns (Bjorn
Helgaas)
- Add pm-graph utility entry to MAINTAINERS (Todd Brandt)
- Update the cpupower utility:
* Fix the handling of set and info subcommands (Abhishek Goel)
* Fix build warnings (Nathan Chancellor)
* Improve mperf_monitor handling (Janakarajan Natarajan)"
* tag 'pm-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits)
PM: Wrap documentation to fit in 80 columns
cpuidle: Pass exit latency limit to cpuidle_use_deepest_state()
cpuidle: Allow idle injection to apply exit latency limit
cpuidle: Introduce cpuidle_driver_state_disabled() for driver quirks
cpuidle: teo: Avoid code duplication in conditionals
cpufreq: Register drivers only after CPU devices have been registered
cpuidle: teo: Avoid using "early hits" incorrectly
cpuidle: teo: Exclude cpuidle overhead from computations
PM / Domains: Convert to dev_to_genpd_safe() in genpd_syscore_switch()
mmc: tmio: Avoid boilerplate code in ->runtime_suspend()
PM / Domains: Implement the ->start() callback for genpd
PM / Domains: Introduce dev_pm_domain_start()
ARM: OMAP2+: SmartReflex: add omap_sr_pdata definition
PM / wakeirq: remove unnecessary parentheses
power: avs: smartreflex: Remove superfluous cast in debugfs_create_file() call
cpuidle: Use nanoseconds as the unit of time
PM / OPP: Support adjusting OPP voltages at runtime
PM / core: Clean up some function headers in power.h
cpufreq: Add NULL checks to show() and store() methods of cpufreq
cpufreq: intel_pstate: Fix plain int as pointer warning from sparse
...
Diffstat (limited to 'drivers/cpuidle/governors/teo.c')
-rw-r--r-- | drivers/cpuidle/governors/teo.c | 182 |
1 files changed, 120 insertions, 62 deletions
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index b5a0e498f798..de7e706efd46 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -104,7 +104,7 @@ struct teo_cpu { u64 sleep_length_ns; struct teo_idle_state states[CPUIDLE_STATE_MAX]; int interval_idx; - unsigned int intervals[INTERVALS]; + u64 intervals[INTERVALS]; }; static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); @@ -117,9 +117,8 @@ static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); - unsigned int sleep_length_us = ktime_to_us(cpu_data->sleep_length_ns); int i, idx_hit = -1, idx_timer = -1; - unsigned int measured_us; + u64 measured_ns; if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) { /* @@ -127,23 +126,28 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * enough to the closest timer event expected at the idle state * selection time to be discarded. */ - measured_us = UINT_MAX; + measured_ns = U64_MAX; } else { - unsigned int lat; + u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns; - lat = drv->states[dev->last_state_idx].exit_latency; - - measured_us = ktime_to_us(cpu_data->time_span_ns); + /* + * The computations below are to determine whether or not the + * (saved) time till the next timer event and the measured idle + * duration fall into the same "bin", so use last_residency_ns + * for that instead of time_span_ns which includes the cpuidle + * overhead. + */ + measured_ns = dev->last_residency_ns; /* * The delay between the wakeup and the first instruction * executed by the CPU is not likely to be worst-case every * time, so take 1/2 of the exit latency as a very rough * approximation of the average of it. */ - if (measured_us >= lat) - measured_us -= lat / 2; + if (measured_ns >= lat_ns) + measured_ns -= lat_ns / 2; else - measured_us /= 2; + measured_ns /= 2; } /* @@ -155,9 +159,9 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) cpu_data->states[i].early_hits -= early_hits >> DECAY_SHIFT; - if (drv->states[i].target_residency <= sleep_length_us) { + if (drv->states[i].target_residency_ns <= cpu_data->sleep_length_ns) { idx_timer = i; - if (drv->states[i].target_residency <= measured_us) + if (drv->states[i].target_residency_ns <= measured_ns) idx_hit = i; } } @@ -193,30 +197,35 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * Save idle duration values corresponding to non-timer wakeups for * pattern detection. */ - cpu_data->intervals[cpu_data->interval_idx++] = measured_us; + cpu_data->intervals[cpu_data->interval_idx++] = measured_ns; if (cpu_data->interval_idx > INTERVALS) cpu_data->interval_idx = 0; } +static bool teo_time_ok(u64 interval_ns) +{ + return !tick_nohz_tick_stopped() || interval_ns >= TICK_NSEC; +} + /** * teo_find_shallower_state - Find shallower idle state matching given duration. * @drv: cpuidle driver containing state data. * @dev: Target CPU. * @state_idx: Index of the capping idle state. - * @duration_us: Idle duration value to match. + * @duration_ns: Idle duration value to match. */ static int teo_find_shallower_state(struct cpuidle_driver *drv, struct cpuidle_device *dev, int state_idx, - unsigned int duration_us) + u64 duration_ns) { int i; for (i = state_idx - 1; i >= 0; i--) { - if (drv->states[i].disabled || dev->states_usage[i].disable) + if (dev->states_usage[i].disable) continue; state_idx = i; - if (drv->states[i].target_residency <= duration_us) + if (drv->states[i].target_residency_ns <= duration_ns) break; } return state_idx; @@ -232,9 +241,10 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, bool *stop_tick) { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); - int latency_req = cpuidle_governor_latency_req(dev->cpu); - unsigned int duration_us, count; - int max_early_idx, constraint_idx, idx, i; + s64 latency_req = cpuidle_governor_latency_req(dev->cpu); + u64 duration_ns; + unsigned int hits, misses, early_hits; + int max_early_idx, prev_max_early_idx, constraint_idx, idx, i; ktime_t delta_tick; if (dev->last_state_idx >= 0) { @@ -244,50 +254,92 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, cpu_data->time_span_ns = local_clock(); - cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick); - duration_us = ktime_to_us(cpu_data->sleep_length_ns); + duration_ns = tick_nohz_get_sleep_length(&delta_tick); + cpu_data->sleep_length_ns = duration_ns; - count = 0; + hits = 0; + misses = 0; + early_hits = 0; max_early_idx = -1; + prev_max_early_idx = -1; constraint_idx = drv->state_count; idx = -1; for (i = 0; i < drv->state_count; i++) { struct cpuidle_state *s = &drv->states[i]; - struct cpuidle_state_usage *su = &dev->states_usage[i]; - if (s->disabled || su->disable) { + if (dev->states_usage[i].disable) { + /* + * Ignore disabled states with target residencies beyond + * the anticipated idle duration. + */ + if (s->target_residency_ns > duration_ns) + continue; + + /* + * This state is disabled, so the range of idle duration + * values corresponding to it is covered by the current + * candidate state, but still the "hits" and "misses" + * metrics of the disabled state need to be used to + * decide whether or not the state covering the range in + * question is good enough. + */ + hits = cpu_data->states[i].hits; + misses = cpu_data->states[i].misses; + + if (early_hits >= cpu_data->states[i].early_hits || + idx < 0) + continue; + /* - * If the "early hits" metric of a disabled state is - * greater than the current maximum, it should be taken - * into account, because it would be a mistake to select - * a deeper state with lower "early hits" metric. The - * index cannot be changed to point to it, however, so - * just increase the max count alone and let the index - * still point to a shallower idle state. + * If the current candidate state has been the one with + * the maximum "early hits" metric so far, the "early + * hits" metric of the disabled state replaces the + * current "early hits" count to avoid selecting a + * deeper state with lower "early hits" metric. */ - if (max_early_idx >= 0 && - count < cpu_data->states[i].early_hits) - count = cpu_data->states[i].early_hits; + if (max_early_idx == idx) { + early_hits = cpu_data->states[i].early_hits; + continue; + } + + /* + * The current candidate state is closer to the disabled + * one than the current maximum "early hits" state, so + * replace the latter with it, but in case the maximum + * "early hits" state index has not been set so far, + * check if the current candidate state is not too + * shallow for that role. + */ + if (teo_time_ok(drv->states[idx].target_residency_ns)) { + prev_max_early_idx = max_early_idx; + early_hits = cpu_data->states[i].early_hits; + max_early_idx = idx; + } continue; } - if (idx < 0) + if (idx < 0) { idx = i; /* first enabled state */ + hits = cpu_data->states[i].hits; + misses = cpu_data->states[i].misses; + } - if (s->target_residency > duration_us) + if (s->target_residency_ns > duration_ns) break; - if (s->exit_latency > latency_req && constraint_idx > i) + if (s->exit_latency_ns > latency_req && constraint_idx > i) constraint_idx = i; idx = i; + hits = cpu_data->states[i].hits; + misses = cpu_data->states[i].misses; - if (count < cpu_data->states[i].early_hits && - !(tick_nohz_tick_stopped() && - drv->states[i].target_residency < TICK_USEC)) { - count = cpu_data->states[i].early_hits; + if (early_hits < cpu_data->states[i].early_hits && + teo_time_ok(drv->states[i].target_residency_ns)) { + prev_max_early_idx = max_early_idx; + early_hits = cpu_data->states[i].early_hits; max_early_idx = i; } } @@ -300,10 +352,19 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * "early hits" metric, but if that cannot be determined, just use the * state selected so far. */ - if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses && - max_early_idx >= 0) { - idx = max_early_idx; - duration_us = drv->states[idx].target_residency; + if (hits <= misses) { + /* + * The current candidate state is not suitable, so take the one + * whose "early hits" metric is the maximum for the range of + * shallower states. + */ + if (idx == max_early_idx) + max_early_idx = prev_max_early_idx; + + if (max_early_idx >= 0) { + idx = max_early_idx; + duration_ns = drv->states[idx].target_residency_ns; + } } /* @@ -316,18 +377,17 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (idx < 0) { idx = 0; /* No states enabled. Must use 0. */ } else if (idx > 0) { + unsigned int count = 0; u64 sum = 0; - count = 0; - /* * Count and sum the most recent idle duration values less than * the current expected idle duration value. */ for (i = 0; i < INTERVALS; i++) { - unsigned int val = cpu_data->intervals[i]; + u64 val = cpu_data->intervals[i]; - if (val >= duration_us) + if (val >= duration_ns) continue; count++; @@ -339,17 +399,17 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * values are in the interesting range. */ if (count > INTERVALS / 2) { - unsigned int avg_us = div64_u64(sum, count); + u64 avg_ns = div64_u64(sum, count); /* * Avoid spending too much time in an idle state that * would be too shallow. */ - if (!(tick_nohz_tick_stopped() && avg_us < TICK_USEC)) { - duration_us = avg_us; - if (drv->states[idx].target_residency > avg_us) + if (teo_time_ok(avg_ns)) { + duration_ns = avg_ns; + if (drv->states[idx].target_residency_ns > avg_ns) idx = teo_find_shallower_state(drv, dev, - idx, avg_us); + idx, avg_ns); } } } @@ -359,9 +419,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * expected idle duration is shorter than the tick period length. */ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || - duration_us < TICK_USEC) && !tick_nohz_tick_stopped()) { - unsigned int delta_tick_us = ktime_to_us(delta_tick); - + duration_ns < TICK_NSEC) && !tick_nohz_tick_stopped()) { *stop_tick = false; /* @@ -370,8 +428,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * till the closest timer including the tick, try to correct * that. */ - if (idx > 0 && drv->states[idx].target_residency > delta_tick_us) - idx = teo_find_shallower_state(drv, dev, idx, delta_tick_us); + if (idx > 0 && drv->states[idx].target_residency_ns > delta_tick) + idx = teo_find_shallower_state(drv, dev, idx, delta_tick); } return idx; @@ -415,7 +473,7 @@ static int teo_enable_device(struct cpuidle_driver *drv, memset(cpu_data, 0, sizeof(*cpu_data)); for (i = 0; i < INTERVALS; i++) - cpu_data->intervals[i] = UINT_MAX; + cpu_data->intervals[i] = U64_MAX; return 0; } |