summaryrefslogtreecommitdiff
path: root/drivers/cpufreq/cpufreq.c
AgeCommit message (Collapse)Author
2017-10-03cpufreq: provide default frequency-invariance setter functionDietmar Eggemann
Frequency-invariant accounting support based on the ratio of current frequency and maximum supported frequency is an optional feature an arch can implement. Since there are cpufreq drivers (e.g. cpufreq-dt) which can be build for different arch's a default implementation of the frequency-invariance setter function arch_set_freq_scale() is needed. This default implementation is an empty weak function which will be overwritten by a strong function in case the arch provides one. The setter function passes the cpumask of related (to the frequency change) cpus (online and offline cpus), the (new) current frequency and the maximum supported frequency. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-09-04Merge branch 'pm-cpufreq-sched'Rafael J. Wysocki
* pm-cpufreq-sched: cpufreq: schedutil: Always process remote callback with slow switching cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily cpufreq: Return 0 from ->fast_switch() on errors cpufreq: Simplify cpufreq_can_do_remote_dvfs() cpufreq: Process remote callbacks from any CPU if the platform permits sched: cpufreq: Allow remote cpufreq callbacks cpufreq: schedutil: Use unsigned int for iowait boost cpufreq: schedutil: Make iowait boost more energy efficient
2017-08-22cpufreq: Cap the default transition delay value to 10 msViresh Kumar
If transition_delay_us isn't defined by the cpufreq driver, the default value of transition delay (time after which the cpufreq governor will try updating the frequency again) is currently calculated by multiplying transition_latency (nsec) with LATENCY_MULTIPLIER (1000) and then converting this time to usec. That gives the exact same value as transition_latency, just that the time unit is usec instead of nsec. With acpi-cpufreq for example, transition_latency is set to around 10 usec and we get transition delay as 10 ms. Which seems to be a reasonable amount of time to reevaluate the frequency again. But for platforms where frequency switching isn't that fast (like ARM), the transition_latency varies from 500 usec to 3 ms, and the transition delay becomes 500 ms to 3 seconds. Of course, that is a pretty bad default value to start with. We can try to come across a better formula (instead of multiplying with LATENCY_MULTIPLIER) to solve this problem, but will that be worth it ? This patch tries a simple approach and caps the maximum value of default transition delay to 10 ms. Of course, userspace can still come in and change this value anytime or individual drivers can rather provide transition_delay_us instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-10cpufreq: Return 0 from ->fast_switch() on errorsViresh Kumar
CPUFREQ_ENTRY_INVALID is a special symbol which is used to specify that an entry in the cpufreq table is invalid. But using it outside of the scope of the cpufreq table looks a bit incorrect. We can represent an invalid frequency by writing it as 0 instead if we need. Note that it is already done that way for the return value of the ->get() callback. Lets do the same for ->fast_switch() and not use CPUFREQ_ENTRY_INVALID outside of the scope of cpufreq table. Also update the comment over cpufreq_driver_fast_switch() to clearly mention what this returns. None of the drivers return CPUFREQ_ENTRY_INVALID as of now from ->fast_switch() callback and so we don't need to update any of those. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-26cpufreq: Allow dynamic switching with CPUFREQ_ETERNAL latencyViresh Kumar
With the recent updates, CPUFREQ_ETERNAL is only used by the drivers which don't know their transition latency but want to use dynamic switching. Anyway, the routine cpufreq_policy_transition_delay_us() caps the value of transition latency to 10 ms now and that can be used safely with such platforms. Remove the check from cpufreq_init_governor() and allow dynamic switching for such configurations as well. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-26cpufreq: Add CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING cpufreq driver flagViresh Kumar
The policy->transition_latency field is used for multiple purposes today and its not straight forward at all. This is how it is used: A. Set the correct transition_latency value. B. Set it to CPUFREQ_ETERNAL because: 1. We don't want automatic dynamic switching (with ondemand/conservative) to happen at all. 2. We don't know the transition latency. This patch handles the B.1. case in a more readable way. A new flag for the cpufreq drivers is added to disallow use of cpufreq governors which have dynamic_switching flag set. All the current cpufreq drivers which are setting transition_latency unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't need to set transition_latency anymore. There shouldn't be any functional change after this patch. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-26cpufreq: Replace "max_transition_latency" with "dynamic_switching"Viresh Kumar
There is no limitation in the ondemand or conservative governors which disallow the transition_latency to be greater than 10 ms. The max_transition_latency field is rather used to disallow automatic dynamic frequency switching for platforms which didn't wanted these governors to run. Replace max_transition_latency with a boolean (dynamic_switching) and check for transition_latency == CPUFREQ_ETERNAL along with that. This makes it pretty straight forward to read/understand now. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-22cpufreq: Use transition_delay_us for legacy governors as wellViresh Kumar
The policy->transition_delay_us field is used only by the schedutil governor currently, and this field describes how fast the driver wants the cpufreq governor to change CPUs frequency. It should rather be a common thing across all governors, as it doesn't have any schedutil dependency here. Create a new helper cpufreq_policy_transition_delay_us() to get the transition delay across all governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-04Merge tag 'pm-4.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The big ticket items here are the rework of suspend-to-idle in order to add proper support for power button wakeup from it on recent Dell laptops and the rework of interfaces exporting the current CPU frequency on x86. In addition to that, support for a few new pieces of hardware is added, the PCI/ACPI device wakeup infrastructure is simplified significantly and the wakeup IRQ framework is fixed to unbreak the IRQ bus locking infrastructure. Also, there are some functional improvements for intel_pstate, tools updates and small fixes and cleanups all over. Specifics: - Rework suspend-to-idle to allow it to take wakeup events signaled by the EC into account on ACPI-based platforms in order to properly support power button wakeup from suspend-to-idle on recent Dell laptops (Rafael Wysocki). That includes the core suspend-to-idle code rework, support for the Low Power S0 _DSM interface, and support for the ACPI INT0002 Virtual GPIO device from Hans de Goede (required for USB keyboard wakeup from suspend-to-idle to work on some machines). - Stop trying to export the current CPU frequency via /proc/cpuinfo on x86 as that is inaccurate and confusing (Len Brown). - Rework the way in which the current CPU frequency is exported by the kernel (over the cpufreq sysfs interface) on x86 systems with the APERF and MPERF registers by always using values read from these registers, when available, to compute the current frequency regardless of which cpufreq driver is in use (Len Brown). - Rework the PCI/ACPI device wakeup infrastructure to remove the questionable and artificial distinction between "devices that can wake up the system from sleep states" and "devices that can generate wakeup signals in the working state" from it, which allows the code to be simplified quite a bit (Rafael Wysocki). - Fix the wakeup IRQ framework by making it use SRCU instead of RCU which doesn't allow sleeping in the read-side critical sections, but which in turn is expected to be allowed by the IRQ bus locking infrastructure (Thomas Gleixner). - Modify some computations in the intel_pstate driver to avoid rounding errors resulting from them (Srinivas Pandruvada). - Reduce the overhead of the intel_pstate driver in the HWP (hardware-managed P-states) mode and when the "performance" P-state selection algorithm is in use by making it avoid registering scheduler callbacks in those cases (Len Brown). - Rework the energy_performance_preference sysfs knob in intel_pstate by changing the values that correspond to different symbolic hint names used by it (Len Brown). - Make it possible to use more than one cpuidle driver at the same time on ARM (Daniel Lezcano). - Make it possible to prevent the cpuidle menu governor from using the 0 state by disabling it via sysfs (Nicholas Piggin). - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on AMD systems (Yazen Ghannam). - Make the CPPC cpufreq driver take the lowest nonlinear performance information into account (Prashanth Prakash). - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q driver and clean up the sfi, exynos5440 and intel_pstate drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael Wysocki, Tao Wang). - Fix a few minor issues in the generic power domains (genpd) framework and clean it up somewhat (Krzysztof Kozlowski, Mikko Perttunen, Viresh Kumar). - Fix a couple of minor issues in the operating performance points (OPP) framework and clean it up somewhat (Viresh Kumar). - Fix a CONFIG dependency in the hibernation core and clean it up slightly (Balbir Singh, Arvind Yadav, BaoJun Luo). - Add rk3228 support to the rockchip-io adaptive voltage scaling (AVS) driver (David Wu). - Fix an incorrect bit shift operation in the RAPL power capping driver (Adam Lessnau). - Add support for the EPP field in the HWP (hardware managed P-states) control register, HWP.EPP, to the x86_energy_perf_policy tool and update msr-index.h with HWP.EPP values (Len Brown). - Fix some minor issues in the turbostat tool (Len Brown). - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a minor issue in it (Sherry Hurwitz). - Assorted cleanups, mostly related to the constification of some data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof Kozlowski)" * tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits) cpufreq: Update scaling_cur_freq documentation cpufreq: intel_pstate: Clean up after performance governor changes PM: hibernate: constify attribute_group structures. cpuidle: menu: allow state 0 to be disabled intel_idle: Use more common logging style PM / Domains: Fix missing default_power_down_ok comment PM / Domains: Fix unsafe iteration over modified list of domains PM / Domains: Fix unsafe iteration over modified list of domain providers PM / Domains: Fix unsafe iteration over modified list of device links PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device PM / Domains: Call driver's noirq callbacks PM / core: Drop run_wake flag from struct dev_pm_info PCI / PM: Simplify device wakeup settings code PCI / PM: Drop pme_interrupt flag from struct pci_dev ACPI / PM: Consolidate device wakeup settings code ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags PM / QoS: constify *_attribute_group. PM / AVS: rockchip-io: add io selectors and supplies for rk3228 powercap/RAPL: prevent overridding bits outside of the mask PM / sysfs: Constify attribute groups ...
2017-07-03Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug updates from Thomas Gleixner: "This update is primarily a cleanup of the CPU hotplug locking code. The hotplug locking mechanism is an open coded RWSEM, which allows recursive locking. The main problem with that is the recursive nature as it evades the full lockdep coverage and hides potential deadlocks. The rework replaces the open coded RWSEM with a percpu RWSEM and establishes full lockdep coverage that way. The bulk of the changes fix up recursive locking issues and address the now fully reported potential deadlocks all over the place. Some of these deadlocks have been observed in the RT tree, but on mainline the probability was low enough to hide them away." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) cpu/hotplug: Constify attribute_group structures powerpc: Only obtain cpu_hotplug_lock if called by rtasd ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init cpu/hotplug: Remove unused check_for_tasks() function perf/core: Don't release cred_guard_mutex if not taken cpuhotplug: Link lock stacks for hotplug callbacks acpi/processor: Prevent cpu hotplug deadlock sched: Provide is_percpu_thread() helper cpu/hotplug: Convert hotplug locking to percpu rwsem s390: Prevent hotplug rwsem recursion arm: Prevent hotplug rwsem recursion arm64: Prevent cpu hotplug rwsem recursion kprobes: Cure hotplug lock ordering issues jump_label: Reorder hotplug lock and jump_label_lock perf/tracing/cpuhotplug: Fix locking order ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() PCI: Replace the racy recursion prevention PCI: Use cpu_hotplug_disable() instead of get_online_cpus() perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() x86/perf: Drop EXPORT of perf_check_microcode ...
2017-06-27x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERFLen Brown
The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-05-30cpufreq: cpufreq_register_driver() should return -ENODEV if init failsDavid Arcari
For a driver that does not set the CPUFREQ_STICKY flag, if all of the ->init() calls fail, cpufreq_register_driver() should return an error. This will prevent the driver from loading. Fixes: ce1bcfe94db8 (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs) Cc: 4.0+ <stable@vger.kernel.org> # 4.0+ Signed-off-by: David Arcari <darcari@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-05-26cpufreq: Use cpuhp_setup_state_nocalls_cpuslocked()Sebastian Andrzej Siewior
cpufreq holds get_online_cpus() while invoking cpuhp_setup_state_nocalls() to make subsys_interface_register() and the registration of hotplug calls atomic versus cpu hotplug. cpuhp_setup_state_nocalls() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use cpuhp_setup/remove_state_nocalls_cpuslocked() to avoid the nested call. Convert *_online_cpus() to the new interfaces while at it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-pm@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081547.731628408@linutronix.de
2017-04-13cpufreq: Bring CPUs up even if cpufreq_online() failedChen Yu
There is a report that after commit 27622b061eb4 ("cpufreq: Convert to hotplug state machine"), the normal CPU offline/online cycle fails on some platforms. According to the ftrace result, this problem was triggered on platforms using acpi-cpufreq as the default cpufreq driver, and due to the lack of some ACPI freq method (eg. _PCT), cpufreq_online() failed and returned a negative value, so the CPU hotplug state machine rolled back the CPU online process. Actually, from the user's perspective, the failure of cpufreq_online() should not prevent that CPU from being brought up, although cpufreq might not work on that CPU. BTW, during system startup cpufreq_online() is not invoked via CPU online but by the cpufreq device creation process, so the APs can be brought up even though cpufreq_online() fails in that stage. This patch ignores the return value of cpufreq_online/offline() and lets the cpufreq framework deal with the failure. cpufreq_online() itself will do a proper rollback in that case and if _PCT is missing, the ACPI cpufreq driver will print a warning if the corresponding debug options have been enabled. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581 Fixes: 27622b061eb4 ("cpufreq: Convert to hotplug state machine") Reported-and-tested-by: Tomasz Maciej Nowak <tmn505@gmail.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.9+ <stable@vger.kernel.org> # 4.9+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-27cpufreq: Fix creation of symbolic links to policy directoriesRafael J. Wysocki
The cpufreq core only tries to create symbolic links from CPU directories in sysfs to policy directories in cpufreq_add_dev(), either when a given CPU is registered or when the cpufreq driver is registered, whichever happens first. That is not sufficient, however, because cpufreq_add_dev() may be called for an offline CPU whose policy object has not been created yet and, quite obviously, the symbolic cannot be added in that case. Fix that by making cpufreq_online() attempt to add symbolic links to policy objects for the CPUs in the related_cpus mask of every new policy object created by it. The cpufreq_driver_lock locking around the for_each_cpu() loop in cpufreq_online() is dropped, because it is not necessary and the code is somewhat simpler without it. Moreover, failures to create a symbolic link will not be regarded as hard errors any more and the CPUs without those links will not be taken offline automatically, but that should not be problematic in practice. Reported-and-tested-by: Prashanth Prakash <pprakash@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
2017-03-22cpufreq: Restore policy min/max limits on CPU onlineViresh Kumar
On CPU online the cpufreq core restores the previous governor (or the previous "policy" setting for ->setpolicy drivers), but it does not restore the min/max limits at the same time, which is confusing, inconsistent and real pain for users who set the limits and then suspend/resume the system (using full suspend), in which case the limits are reset on all CPUs except for the boot one. Fix this by making cpufreq_online() restore the limits when an inactive policy is brought online. The commit log and patch are inspired from Rafael's earlier work. Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.3+ <stable@vger.kernel.org> # 4.3+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-16cpufreq: Fix and clean up show_cpuinfo_cur_freq()Rafael J. Wysocki
There is a missing newline in show_cpuinfo_cur_freq(), so add it, but while at it clean that function up somewhat too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org>
2017-03-09Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy cpufreq: intel_pstate: Fix intel_pstate_verify_policy() cpufreq: intel_pstate: Fix global settings in active mode cpufreq: Add the "cpufreq.off=1" cmdline option cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy() cpufreq: intel_pstate: Do not use performance_limits in passive mode
2017-03-06cpufreq: Add the "cpufreq.off=1" cmdline optionLen Brown
Add the "cpufreq.off=1" cmdline option. At boot-time, this allows a user to request CONFIG_CPU_FREQ=n behavior from a kernel built with CONFIG_CPU_FREQ=y. This is analogous to the existing "cpuidle.off=1" option and CONFIG_CPU_IDLE=y This capability is valuable when we need to debug end-user issues in the BIOS or in Linux. It is also convenient for enabling comparisons, which may otherwise require a new kernel, or help from BIOS SETUP, which may be buggy or unavailable. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-20Merge tag 'pm-4.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The majority of changes go into the Operating Performance Points (OPP) framework and cpufreq this time, followed by devfreq and some scattered updates all over. The OPP changes are mostly related to switching over from RCU-based synchronization, that turned out to be overly complicated and problematic, to reference counting using krefs. In the cpufreq land there are core cleanups, documentation updates, a new driver for Broadcom BMIPS SoCs, a new cpufreq-dt sub-driver for TI SoCs that require special handling, ARM64 SoCs support for the qoriq driver, intel_pstate updates, powernv driver update and assorted fixes. The devfreq changes are mostly fixes related to the sysfs interface and some Exynos drivers updates. Apart from that, the cpuidle menu governor will support per-CPU PM QoS constraints for the wakeup latency now, some bugs in the wakeup IRQs framework are fixed, the generic power domains framework should handle asynchronous invocations of *noirq suspend/resume callbacks from now on, the analyze_suspend.py script is updated and there is a new tool for intel_pstate diagnostics. Specifics: - Operating Performance Points (OPP) framework fixes, cleanups and switch over from RCU-based synchronization to reference counting using krefs (Viresh Kumar, Wei Yongjun, Dave Gerlach) - cpufreq core cleanups and documentation updates (Viresh Kumar, Rafael Wysocki) - New cpufreq driver for Broadcom BMIPS SoCs (Markus Mayer) - New cpufreq-dt sub-driver for TI SoCs requiring special handling, like in the AM335x, AM437x, DRA7x, and AM57x families, along with new DT bindings for it (Dave Gerlach, Paul Gortmaker) - ARM64 SoCs support for the qoriq cpufreq driver (Tang Yuantian) - intel_pstate driver updates including a new sysfs knob to control the driver's operation mode and fixes related to the no_turbo sysfs knob and the hardware-managed P-states feature support (Rafael Wysocki, Srinivas Pandruvada) - New interface to export ultra-turbo frequencies for the powernv cpufreq driver (Shilpasri Bhat) - Assorted fixes for cpufreq drivers (Arnd Bergmann, Dan Carpenter, Wei Yongjun) - devfreq core fixes, mostly related to the sysfs interface exported by it (Chanwoo Choi, Chris Diamand) - Updates of the exynos-bus and exynos-ppmu devfreq drivers (Chanwoo Choi) - Device PM QoS extension to support CPUs and support for per-CPU wakeup (device resume) latency constraints in the cpuidle menu governor (Alex Shi) - Wakeup IRQs framework fixes (Grygorii Strashko) - Generic power domains framework update including a fix to make it handle asynchronous invocations of *noirq suspend/resume callbacks correctly (Ulf Hansson, Geert Uytterhoeven) - Assorted fixes and cleanups in the core suspend/hibernate code, PM QoS framework and x86 ACPI idle support code (Corentin Labbe, Geert Uytterhoeven, Geliang Tang, John Keeping, Nick Desaulniers) - Update of the analyze_suspend.py script is updated to version 4.5 offering multiple improvements (Todd Brandt) - New tool for intel_pstate diagnostics using the pstate_sample tracepoint (Doug Smythies)" * tag 'pm-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (85 commits) MAINTAINERS: cpufreq: add bmips-cpufreq.c PM / QoS: Fix memory leak on resume_latency.notifiers PM / Documentation: Spelling s/wrtie/write/ PM / sleep: Fix test_suspend after sleep state rework cpufreq: CPPC: add ACPI_PROCESSOR dependency cpufreq: make ti-cpufreq explicitly non-modular cpufreq: Do not clear real_cpus mask on policy init tools/power/x86: Debug utility for intel_pstate driver AnalyzeSuspend: fix drag and zoom bug in javascript PM / wakeirq: report a wakeup_event on dedicated wekup irq PM / wakeirq: Fix spurious wake-up events for dedicated wakeirqs PM / wakeirq: Enable dedicated wakeirq for suspend cpufreq: dt: Don't use generic platdev driver for ti-cpufreq platforms cpufreq: ti: Add cpufreq driver to determine available OPPs at runtime Documentation: dt: add bindings for ti-cpufreq PM / OPP: Expose _of_get_opp_desc_node as dev_pm_opp API cpufreq: qoriq: Don't look at clock implementation details cpufreq: qoriq: add ARM64 SoCs support PM / Domains: Provide dummy governors if CONFIG_PM_GENERIC_DOMAINS=n cpufreq: brcmstb-avs-cpufreq: remove unnecessary platform_set_drvdata() ...
2017-02-16cpufreq: Do not clear real_cpus mask on policy initRafael J. Wysocki
If new_policy is set in cpufreq_online(), the policy object has just been created and its real_cpus mask has been zeroed on allocation, and the driver's ->init() callback should not touch it. It doesn't need to be cleared again, so don't do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-02-04cpufreq: Remove CPUFREQ_START notifier eventViresh Kumar
Its not used anymore, remove it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-03cpufreq: Remove policy create/remove notifiersViresh Kumar
Those were added by: commit fcd7af917abb ("cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly") but aren't used anymore since: commit 1aefc75b2449 ("cpufreq: stats: Make the stats code non-modular"). Remove them. Also remove the redundant parameter to the respective routines. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-01sched/cputime: Convert kcpustat to nsecsFrederic Weisbecker
Kernel CPU stats are stored in cputime_t which is an architecture defined type, and hence a bit opaque and requiring accessors and mutators for any operation. Converting them to nsecs simplifies the code and is one step toward the removal of cputime_t in the core code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21cpufreq: Make cpufreq_update_policy() voidRafael J. Wysocki
The return value of cpufreq_update_policy() is never used, so make it void. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-11-21cpufreq: Avoid using inactive policiesRafael J. Wysocki
There are two places in the cpufreq core in which low-level driver callbacks may be invoked for an inactive cpufreq policy, which isn't guaranteed to work in general. Both are due to possible races with CPU offline. First, in cpufreq_get(), the policy may become inactive after the check against policy->cpus in cpufreq_cpu_get() and before policy->rwsem is acquired, in which case using it going forward may not be correct. Second, an analogous situation is possible in cpufreq_update_policy(). Avoid using inactive policies by adding policy_is_inactive() checks to the code in the above places. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-10-03Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "Yet another batch of cpu hotplug core updates and conversions: - Provide core infrastructure for multi instance drivers so the drivers do not have to keep custom lists. - Convert custom lists to the new infrastructure. The block-mq custom list conversion comes through the block tree and makes the diffstat tip over to more lines removed than added. - Handle unbalanced hotplug enable/disable calls more gracefully. - Remove the obsolete CPU_STARTING/DYING notifier support. - Convert another batch of notifier users. The relayfs changes which conflicted with the conversion have been shipped to me by Andrew. The remaining lot is targeted for 4.10 so that we finally can remove the rest of the notifiers" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) cpufreq: Fix up conversion to hotplug state machine blk/mq: Reserve hotplug states for block multiqueue x86/apic/uv: Convert to hotplug state machine s390/mm/pfault: Convert to hotplug state machine mips/loongson/smp: Convert to hotplug state machine mips/octeon/smp: Convert to hotplug state machine fault-injection/cpu: Convert to hotplug state machine padata: Convert to hotplug state machine cpufreq: Convert to hotplug state machine ACPI/processor: Convert to hotplug state machine virtio scsi: Convert to hotplug state machine oprofile/timer: Convert to hotplug state machine block/softirq: Convert to hotplug state machine lib/irq_poll: Convert to hotplug state machine x86/microcode: Convert to hotplug state machine sh/SH-X3 SMP: Convert to hotplug state machine ia64/mca: Convert to hotplug state machine ARM/OMAP/wakeupgen: Convert to hotplug state machine ARM/shmobile: Convert to hotplug state machine arm64/FP/SIMD: Convert to hotplug state machine ...
2016-09-20cpufreq: Fix up conversion to hotplug state machineSebastian Andrzej Siewior
The function cpufreq_register_driver() returns zero on success and since commit 27622b061eb4 ("cpufreq: Convert to hotplug state machine") erroneously a positive number. Due to the "if (x) assume_error" construct all callers assumed an error and as a consequence the cpu freq kworker crashes with a NULL pointer dereference. Reset the return value back to zero in the success case. Fixes: 27622b061eb4 ("cpufreq: Convert to hotplug state machine") Reported-by: Borislav Petkov <bp@alien8.de> Reported-and-tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: peterz@infradead.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/20160920145628.lp2bmq72ip3oiash@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-19cpufreq: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Viresh Kumar <viresh.kumar@linaro.or Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160906170457.32393-13-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13cpufreq: create link to policy only for registered CPUsViresh Kumar
If a cpufreq driver is registered very early in the boot stage (e.g. registered from postcore_initcall()), then cpufreq core may generate kernel warnings for it. In this case, the CPUs are brought online, then the cpufreq driver is registered, and then the CPU topology devices are registered. However, by the time cpufreq_add_dev() gets called, the cpu device isn't stored in the per-cpu variable (cpu_sys_devices,) which is read by get_cpu_device(). So the cpufreq core fails to get device for the CPU, for which cpufreq_add_dev() was called in the first place and we will hit a WARN_ON(!cpu_dev). Even if we reuse the 'dev' parameter passed to cpufreq_add_dev() to avoid that warning, there might be other CPUs online that share the policy with the cpu for which cpufreq_add_dev() is called. Eventually get_cpu_device() will return NULL for them as well, and we will hit the same WARN_ON() again. In order to fix these issues, change cpufreq core to create links to the policy for a cpu only when cpufreq_add_dev() is called for that CPU. Reuse the 'real_cpus' mask to track that as well. Note that cpufreq_remove_dev() already handles removal of the links for individual CPUs and cpufreq_add_dev() has aligned with that now. Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-01cpufreq: Drop unnecessary check from cpufreq_policy_alloc()Rafael J. Wysocki
Since cpufreq_policy_alloc() doesn't use its dev variable for anything useful, drop that variable from there along with the NULL check against it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-07-22cpufreq: export cpufreq_driver_resolve_freq()Steve Muckle
Export cpufreq_driver_resolve_freq() since governors may be compiled as modules. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steve Muckle <smuckle@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()Viresh Kumar
The handlers provided by cpufreq core are sufficient for resolving the frequency for drivers providing ->target_index(), as the core already has the frequency table and so ->resolve_freq() isn't required for such platforms. This patch disallows drivers with ->target_index() callback to use the ->resolve_freq() callback. Also, it fixes a potential kernel crash for drivers providing ->target() but no ->resolve_freq(). Fixes: e3c062360870 "cpufreq: add cpufreq_driver_resolve_freq()" Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21cpufreq: add cpufreq_driver_resolve_freq()Steve Muckle
Cpufreq governors may need to know what a particular target frequency maps to in the driver without necessarily wanting to set the frequency. Support this operation via a new cpufreq API, cpufreq_driver_resolve_freq(). This API returns the lowest driver frequency equal or greater than the target frequency (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver limitations. The mapping is also cached in the policy so that a subsequent fast_switch operation can avoid repeating the same lookup. The API will call a new cpufreq driver callback, resolve_freq(), if it has been registered by the driver. Otherwise the frequency is resolved via cpufreq_frequency_table_target(). Rather than require ->target() style drivers to provide a resolve_freq() callback it is left to the caller to ensure that the driver implements this callback if necessary to use cpufreq_driver_resolve_freq(). Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Steve Muckle <smuckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-04cpufreq: Drop redundant check from cpufreq_update_current_freq()Rafael J. Wysocki
Both callers of cpufreq_update_current_freq(), cpufreq_update_policy() and cpufreq_start_governor(), check cpufreq_suspended before calling that function, so drop the redundant cpufreq_suspended check from it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-07-04Merge back earlier cpufreq material for v4.8.Rafael J. Wysocki
2016-06-28cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy()Rafael J. Wysocki
CPU notifications from the firmware coming in when cpufreq is suspended cause cpufreq_update_current_freq() to return 0 which triggers the WARN_ON() in cpufreq_update_policy() for no reason. Avoid that by checking cpufreq_suspended before calling cpufreq_update_current_freq(). Fixes: c9d9c929e674 (cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
2016-06-13Merge back earlier cpufreq changes for v4.8.Rafael J. Wysocki
2016-06-09cpufreq: Return index from cpufreq_frequency_table_target()Viresh Kumar
This routine can't fail unless the frequency table is invalid and doesn't contain any valid entries. Make it return the index and WARN() in case it is used for an invalid table. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-09cpufreq: Drop 'freq_table' argument of __target_index()Viresh Kumar
It is already present as part of the policy and so no need to pass it from the caller. Also, 'freq_table' is guaranteed to be valid in this function and so no need to check it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-09cpufreq: Drop freq-table param to cpufreq_frequency_table_target()Viresh Kumar
The policy already has this pointer set, use it instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-09cpufreq: Remove cpufreq_frequency_get_table()Viresh Kumar
Most of the callers of cpufreq_frequency_get_table() already have the pointer to a valid 'policy' structure and they don't really need to go through the per-cpu variable first and then a check to validate the frequency, in order to find the freq-table for the policy. Directly use the policy->freq_table field instead for them. Only one user of that API is left after above changes, cpu_cooling.c and it accesses the freq_table in a racy way as the policy can get freed in between. Fix it by using cpufreq_cpu_get() properly. Since there are no more users of cpufreq_frequency_get_table() left, get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Javi Merino <javi.merino@arm.com> (cpu_cooling.c) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-02cpufreq: stats: Make the stats code non-modularRafael J. Wysocki
The modularity of cpufreq_stats is quite problematic. First off, the usage of policy notifiers for the initialization and cleanup in the cpufreq_stats module is inherently racy with respect to CPU offline/online and the initialization and cleanup of the cpufreq driver. Second, fast frequency switching (used by the schedutil governor) cannot be enabled if any transition notifiers are registered, so if the cpufreq_stats module (that registers a transition notifier for updating transition statistics) is loaded, the schedutil governor cannot use fast frequency switching. On the other hand, allowing cpufreq_stats to be built as a module doesn't really add much value. Arguably, there's not much reason for that code to be modular at all. For the above reasons, make the cpufreq stats code non-modular, modify the core to invoke functions provided by that code directly and drop the notifiers from it. Make the stats sysfs attributes appear empty if fast frequency switching is enabled as the statistics will not be updated in that case anyway (and returning -EBUSY from those attributes breaks powertop). While at it, clean up Kconfig help for the CPU_FREQ_STAT and CPU_FREQ_STAT_DETAILS options. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-06-02cpufreq: Use clamp_val() in __cpufreq_driver_target()Viresh Kumar
Use clamp_val() instead of open coding it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-02cpufreq: Send START policy notification after sending CREATEViresh Kumar
The sequence got a bit wrong as we are sending CPUFREQ_START notifications even before we have sent CPUFREQ_CREATE_POLICY. Fix it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-02cpufreq: Drop the 'initialized' field from struct cpufreq_governorRafael J. Wysocki
The 'initialized' field in struct cpufreq_governor is only used by the conservative governor (as a usage counter) and the way that happens is far from straightforward and arguably incorrect. Namely, the value of 'initialized' is checked by cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and the results of those checks are passed (as the second argument) to the ->init() and ->exit() callbacks in struct dbs_governor. Those callbacks are only implemented by the ondemand and conservative governors and ondemand doesn't use their second argument at all. In turn, the conservative governor uses it to decide whether or not to either register or unregister a transition notifier. That whole mechanism is not only unnecessarily convoluted, but also racy, because the 'initialized' field of struct cpufreq_governor is updated in cpufreq_init_governor() and cpufreq_exit_governor() under policy->rwsem which doesn't help if one of these functions is run twice in parallel for different policies (which isn't impossible in principle), for example. Instead of it, add a proper usage counter to the conservative governor and update it from cs_init() and cs_exit() which is guaranteed to be non-racy, as those functions are only called under gov_dbs_data_mutex which is global. With that in place, drop the 'initialized' field from struct cpufreq_governor as it is not used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-06-02cpufreq: governor: Get rid of governor eventsRafael J. Wysocki
The design of the cpufreq governor API is not very straightforward, as struct cpufreq_governor provides only one callback to be invoked from different code paths for different purposes. The purpose it is invoked for is determined by its second "event" argument, causing it to act as a "callback multiplexer" of sorts. Unfortunately, that leads to extra complexity in governors, some of which implement the ->governor() callback as a switch statement that simply checks the event argument and invokes a separate function to handle that specific event. That extra complexity can be eliminated by replacing the all-purpose ->governor() callback with a family of callbacks to carry out specific governor operations: initialization and exit, start and stop and policy limits updates. That also turns out to reduce the code size too, so do it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-06-01cpufreq: Fix clamp_val() usage in cpufreq_driver_fast_switch()Rafael J. Wysocki
The return value of clamp_val() has to be stored actually. Fixes: b7898fda5bc7 (cpufreq: Support for fast frequency switching) Reported-by: Steve Muckle <steve.muckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-05-30cpufreq: Split cpufreq_governor() into simpler functionsRafael J. Wysocki
The cpufreq_governor() routine is used by the cpufreq core to invoke the current governor's ->governor() callback with appropriate arguments and do some housekeeping related to that. Unfortunately, the way it mixes different governor events in one code path makes it rather hard to follow the code. For this reason, split cpufreq_governor() into five simpler functions that each will handle just one specific governor event and put all of the code related to the given event into its own function. This change is a prerequisite for a redesign of the cpufreq governor API that will be done subsequently. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-05-30cpufreq: governor: Check transition latecy at init time onlyRafael J. Wysocki
It is not necessary to check the governor's max_transition_latency attribute every time cpufreq_governor() runs, so check it only if the event argument is CPUFREQ_GOV_POLICY_INIT. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>