summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2017-11-13 19:43:50 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2017-11-13 19:43:50 -0800
commitbd2cd7d5a8f83ddc761025f42a3ca8e56351a6cc (patch)
tree6ea70f09f32544f895020e198dac632145332cc2 /Documentation
parentb29c6ef7bb1257853c1e31616d84f55e561cf631 (diff)
parent990a848d537e4da966907c8ccec95bc568f2911c (diff)
Merge tag 'pm-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki: "There are no real big ticket items here this time. The most noticeable change is probably the relocation of the OPP (Operating Performance Points) framework to its own directory under drivers/ as it has grown big enough for that. Also Viresh is now going to maintain it and send pull requests for it to me, so you will see this change in the git history going forward (but still not right now). Another noticeable set of changes is the modifications of the PM core, the PCI subsystem and the ACPI PM domain to allow of more integration between system-wide suspend/resume and runtime PM. For now it's just a way to avoid resuming devices from runtime suspend unnecessarily during system suspend (if the driver sets a flag to indicate its readiness for that) and in the works is an analogous mechanism to allow devices to stay suspended after system resume. In addition to that, we have some changes related to supporting frequency-invariant CPU utilization metrics in the scheduler and in the schedutil cpufreq governor on ARM and changes to add support for device performance states to the generic power domains (genpd) framework. The rest is mostly fixes and cleanups of various sorts. Specifics: - Relocate the OPP (Operating Performance Points) framework to its own directory under drivers/ and add support for power domain performance states to it (Viresh Kumar). - Modify the PM core, the PCI bus type and the ACPI PM domain to support power management driver flags allowing device drivers to specify their capabilities and preferences regarding the handling of devices with enabled runtime PM during system suspend/resume and clean up that code somewhat (Rafael Wysocki, Ulf Hansson). - Add frequency-invariant accounting support to the task scheduler on ARM and ARM64 (Dietmar Eggemann). - Fix PM QoS device resume latency framework to prevent "no restriction" requests from overriding requests with specific requirements and drop the confusing PM_QOS_FLAG_REMOTE_WAKEUP device PM QoS flag (Rafael Wysocki). - Drop legacy class suspend/resume operations from the PM core and drop legacy bus type suspend and resume callbacks from ARM/locomo (Rafael Wysocki). - Add min/max frequency support to devfreq and clean it up somewhat (Chanwoo Choi). - Rework wakeup support in the generic power domains (genpd) framework and update some of its users accordingly (Geert Uytterhoeven). - Convert timers in the PM core to use timer_setup() (Kees Cook). - Add support for exposing the SLP_S0 (Low Power S0 Idle) residency counter based on the LPIT ACPI table on Intel platforms (Srinivas Pandruvada). - Add per-CPU PM QoS resume latency support to the ladder cpuidle governor (Ramesh Thomas). - Fix a deadlock between the wakeup notify handler and the notifier removal in the ACPI core (Ville Syrjälä). - Fix a cpufreq schedutil governor issue causing it to use stale cached frequency values sometimes (Viresh Kumar). - Fix an issue in the system suspend core support code causing wakeup events detection to fail in some cases (Rajat Jain). - Fix the generic power domains (genpd) framework to prevent the PM core from using the direct-complete optimization with it as that is guaranteed to fail (Ulf Hansson). - Fix a minor issue in the cpuidle core and clean it up a bit (Gaurav Jindal, Nicholas Piggin). - Fix and clean up the intel_idle and ARM cpuidle drivers (Jason Baron, Len Brown, Leo Yan). - Fix a couple of minor issues in the OPP framework and clean it up (Arvind Yadav, Fabio Estevam, Sudeep Holla, Tobias Jordan). - Fix and clean up some cpufreq drivers and fix a minor issue in the cpufreq statistics code (Arvind Yadav, Bhumika Goyal, Fabio Estevam, Gautham Shenoy, Gustavo Silva, Marek Szyprowski, Masahiro Yamada, Robert Jarzmik, Zumeng Chen). - Fix minor issues in the system suspend and hibernation core, in power management documentation and in the AVS (Adaptive Voltage Scaling) framework (Helge Deller, Himanshu Jha, Joe Perches, Rafael Wysocki). - Fix some issues in the cpupower utility and document that Shuah Khan is going to maintain it going forward (Prarit Bhargava, Shuah Khan)" * tag 'pm-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (88 commits) tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore tools/power/cpupower: Add 64 bit library detection intel_idle: Graceful probe failure when MWAIT is disabled cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq freezer: Fix typo in freezable_schedule_timeout() comment PM / s2idle: Clear the events_check_enabled flag cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const cpufreq: arm_big_little: make function arguments and structure pointer const cpuidle: Avoid assignment in if () argument cpuidle: Clean up cpuidle_enable_device() error handling a bit ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare() cpuidle: ladder: Add per CPU PM QoS resume latency support PM / QoS: Fix device resume latency framework PM / domains: Rework governor code to be more consistent PM / Domains: Remove gpd_dev_ops.active_wakeup() callback soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-devices-power20
-rw-r--r--Documentation/acpi/lpit.txt25
-rw-r--r--Documentation/cpu-freq/cpufreq-stats.txt3
-rw-r--r--Documentation/driver-api/pm/devices.rst61
-rw-r--r--Documentation/power/pci.txt33
-rw-r--r--Documentation/power/pm_qos_interface.txt13
6 files changed, 129 insertions, 26 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power
index 676fdf5f2a99..80a00f7b6667 100644
--- a/Documentation/ABI/testing/sysfs-devices-power
+++ b/Documentation/ABI/testing/sysfs-devices-power
@@ -211,7 +211,9 @@ Description:
device, after it has been suspended at run time, from a resume
request to the moment the device will be ready to process I/O,
in microseconds. If it is equal to 0, however, this means that
- the PM QoS resume latency may be arbitrary.
+ the PM QoS resume latency may be arbitrary and the special value
+ "n/a" means that user space cannot accept any resume latency at
+ all for the given device.
Not all drivers support this attribute. If it isn't supported,
it is not present.
@@ -258,19 +260,3 @@ Description:
This attribute has no effect on system-wide suspend/resume and
hibernation.
-
-What: /sys/devices/.../power/pm_qos_remote_wakeup
-Date: September 2012
-Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
-Description:
- The /sys/devices/.../power/pm_qos_remote_wakeup attribute
- is used for manipulating the PM QoS "remote wakeup required"
- flag. If set, this flag indicates to the kernel that the
- device is a source of user events that have to be signaled from
- its low-power states.
-
- Not all drivers support this attribute. If it isn't supported,
- it is not present.
-
- This attribute has no effect on system-wide suspend/resume and
- hibernation.
diff --git a/Documentation/acpi/lpit.txt b/Documentation/acpi/lpit.txt
new file mode 100644
index 000000000000..b426398d2e97
--- /dev/null
+++ b/Documentation/acpi/lpit.txt
@@ -0,0 +1,25 @@
+To enumerate platform Low Power Idle states, Intel platforms are using
+“Low Power Idle Table” (LPIT). More details about this table can be
+downloaded from:
+http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
+
+Residencies for each low power state can be read via FFH
+(Function fixed hardware) or a memory mapped interface.
+
+On platforms supporting S0ix sleep states, there can be two types of
+residencies:
+- CPU PKG C10 (Read via FFH interface)
+- Platform Controller Hub (PCH) SLP_S0 (Read via memory mapped interface)
+
+The following attributes are added dynamically to the cpuidle
+sysfs attribute group:
+ /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
+ /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
+
+The "low_power_idle_cpu_residency_us" attribute shows time spent
+by the CPU package in PKG C10
+
+The "low_power_idle_system_residency_us" attribute shows SLP_S0
+residency, or system time spent with the SLP_S0# signal asserted.
+This is the lowest possible system power state, achieved only when CPU is in
+PKG C10 and all functional blocks in PCH are in a low power state.
diff --git a/Documentation/cpu-freq/cpufreq-stats.txt b/Documentation/cpu-freq/cpufreq-stats.txt
index 2bbe207354ed..a873855c811d 100644
--- a/Documentation/cpu-freq/cpufreq-stats.txt
+++ b/Documentation/cpu-freq/cpufreq-stats.txt
@@ -90,6 +90,9 @@ Freq_i to Freq_j. Freq_i is in descending order with increasing rows and
Freq_j is in descending order with increasing columns. The output here also
contains the actual freq values for each row and column for better readability.
+If the transition table is bigger than PAGE_SIZE, reading this will
+return an -EFBIG error.
+
--------------------------------------------------------------------------------
<mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat trans_table
From : To
diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst
index a0dc2879a152..53c1b0b06da5 100644
--- a/Documentation/driver-api/pm/devices.rst
+++ b/Documentation/driver-api/pm/devices.rst
@@ -274,7 +274,7 @@ sleep states and the hibernation state ("suspend-to-disk"). Each phase involves
executing callbacks for every device before the next phase begins. Not all
buses or classes support all these callbacks and not all drivers use all the
callbacks. The various phases always run after tasks have been frozen and
-before they are unfrozen. Furthermore, the ``*_noirq phases`` run at a time
+before they are unfrozen. Furthermore, the ``*_noirq`` phases run at a time
when IRQ handlers have been disabled (except for those marked with the
IRQF_NO_SUSPEND flag).
@@ -328,7 +328,10 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
After the ``->prepare`` callback method returns, no new children may be
registered below the device. The method may also prepare the device or
driver in some way for the upcoming system power transition, but it
- should not put the device into a low-power state.
+ should not put the device into a low-power state. Moreover, if the
+ device supports runtime power management, the ``->prepare`` callback
+ method must not update its state in case it is necessary to resume it
+ from runtime suspend later on.
For devices supporting runtime power management, the return value of the
prepare callback can be used to indicate to the PM core that it may
@@ -351,11 +354,35 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
is because all such devices are initially set to runtime-suspended with
runtime PM disabled.
+ This feature also can be controlled by device drivers by using the
+ ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
+ management flags. [Typically, they are set at the time the driver is
+ probed against the device in question by passing them to the
+ :c:func:`dev_pm_set_driver_flags` helper function.] If the first of
+ these flags is set, the PM core will not apply the direct-complete
+ procedure described above to the given device and, consequenty, to any
+ of its ancestors. The second flag, when set, informs the middle layer
+ code (bus types, device types, PM domains, classes) that it should take
+ the return value of the ``->prepare`` callback provided by the driver
+ into account and it may only return a positive value from its own
+ ``->prepare`` callback if the driver's one also has returned a positive
+ value.
+
2. The ``->suspend`` methods should quiesce the device to stop it from
performing I/O. They also may save the device registers and put it into
the appropriate low-power state, depending on the bus type the device is
on, and they may enable wakeup events.
+ However, for devices supporting runtime power management, the
+ ``->suspend`` methods provided by subsystems (bus types and PM domains
+ in particular) must follow an additional rule regarding what can be done
+ to the devices before their drivers' ``->suspend`` methods are called.
+ Namely, they can only resume the devices from runtime suspend by
+ calling :c:func:`pm_runtime_resume` for them, if that is necessary, and
+ they must not update the state of the devices in any other way at that
+ time (in case the drivers need to resume the devices from runtime
+ suspend in their ``->suspend`` methods).
+
3. For a number of devices it is convenient to split suspend into the
"quiesce device" and "save device state" phases, in which cases
``suspend_late`` is meant to do the latter. It is always executed after
@@ -729,6 +756,36 @@ state temporarily, for example so that its system wakeup capability can be
disabled. This all depends on the hardware and the design of the subsystem and
device driver in question.
+If it is necessary to resume a device from runtime suspend during a system-wide
+transition into a sleep state, that can be done by calling
+:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its
+couterpart for transitions related to hibernation) of either the device's driver
+or a subsystem responsible for it (for example, a bus type or a PM domain).
+That is guaranteed to work by the requirement that subsystems must not change
+the state of devices (possibly except for resuming them from runtime suspend)
+from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
+invoking device drivers' ``->suspend`` callbacks (or equivalent).
+
+Some bus types and PM domains have a policy to resume all devices from runtime
+suspend upfront in their ``->suspend`` callbacks, but that may not be really
+necessary if the driver of the device can cope with runtime-suspended devices.
+The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
+:c:member:`power.driver_flags` at the probe time, by passing it to the
+:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code
+(bus types, PM domains etc.) to skip the ``->suspend_late`` and
+``->suspend_noirq`` callbacks provided by the driver if the device remains in
+runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
+suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
+has been disabled for it, under the assumption that its state should not change
+after that point until the system-wide transition is over. If that happens, the
+driver's system-wide resume callbacks, if present, may still be invoked during
+the subsequent system-wide resume transition and the device's runtime power
+management status may be set to "active" before enabling runtime PM for it,
+so the driver must be prepared to cope with the invocation of its system-wide
+resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
+intervening ``->runtime_resume`` and so on) and the final state of the device
+must reflect the "active" status for runtime PM in that case.
+
During system-wide resume from a sleep state it's easiest to put devices into
the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
Refer to that document for more information regarding this particular issue as
diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt
index d17fdf8f45ef..704cd36079b8 100644
--- a/Documentation/power/pci.txt
+++ b/Documentation/power/pci.txt
@@ -961,6 +961,39 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the
.suspend(), .freeze(), and .poweroff() members and one resume routine is to
be pointed to by the .resume(), .thaw(), and .restore() members.
+3.1.19. Driver Flags for Power Management
+
+The PM core allows device drivers to set flags that influence the handling of
+power management for the devices by the core itself and by middle layer code
+including the PCI bus type. The flags should be set once at the driver probe
+time with the help of the dev_pm_set_driver_flags() function and they should not
+be updated directly afterwards.
+
+The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
+mechanism allowing device suspend/resume callbacks to be skipped if the device
+is in runtime suspend when the system suspend starts. That also affects all of
+the ancestors of the device, so this flag should only be used if absolutely
+necessary.
+
+The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
+positive value from pci_pm_prepare() if the ->prepare callback provided by the
+driver of the device returns a positive value. That allows the driver to opt
+out from using the direct-complete mechanism dynamically.
+
+The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
+perspective the device can be safely left in runtime suspend during system
+suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
+to skip resuming the device from runtime suspend unless there are PCI-specific
+reasons for doing that. Also, it causes pci_pm_suspend_late/noirq(),
+pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early
+if the device remains in runtime suspend in the beginning of the "late" phase
+of the system-wide transition under way. Moreover, if the device is in
+runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime
+power management status will be changed to "active" (as it is going to be put
+into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(),
+the function will set the power.direct_complete flag for it (to make the PM core
+skip the subsequent "thaw" callbacks for it) and return.
+
3.2. Device Runtime Power Management
------------------------------------
In addition to providing device power management callbacks PCI device drivers
diff --git a/Documentation/power/pm_qos_interface.txt b/Documentation/power/pm_qos_interface.txt
index 21d2d48f87a2..19c5f7b1a7ba 100644
--- a/Documentation/power/pm_qos_interface.txt
+++ b/Documentation/power/pm_qos_interface.txt
@@ -98,8 +98,7 @@ Values are updated in response to changes of the request list.
The target values of resume latency and active state latency tolerance are
simply the minimum of the request values held in the parameter list elements.
The PM QoS flags aggregate value is a gather (bitwise OR) of all list elements'
-values. Two device PM QoS flags are defined currently: PM_QOS_FLAG_NO_POWER_OFF
-and PM_QOS_FLAG_REMOTE_WAKEUP.
+values. One device PM QoS flag is defined currently: PM_QOS_FLAG_NO_POWER_OFF.
Note: The aggregated target values are implemented in such a way that reading
the aggregated value does not require any locking mechanism.
@@ -153,14 +152,14 @@ PM QoS list of resume latency constraints and remove sysfs attribute
pm_qos_resume_latency_us from the device's power directory.
int dev_pm_qos_expose_flags(device, value)
-Add a request to the device's PM QoS list of flags and create sysfs attributes
-pm_qos_no_power_off and pm_qos_remote_wakeup under the device's power directory
-allowing user space to change these flags' value.
+Add a request to the device's PM QoS list of flags and create sysfs attribute
+pm_qos_no_power_off under the device's power directory allowing user space to
+change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.
void dev_pm_qos_hide_flags(device)
Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
-of flags and remove sysfs attributes pm_qos_no_power_off and pm_qos_remote_wakeup
-under the device's power directory.
+of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
+directory.
Notification mechanisms:
The per-device PM QoS framework has a per-device notification tree.