summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/device-mapper/dm-integrity.rst15
-rw-r--r--Documentation/core-api/printk-formats.rst14
-rw-r--r--Documentation/devicetree/bindings/dma/fsl-edma.txt3
-rw-r--r--Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml7
-rw-r--r--Documentation/devicetree/bindings/net/dsa/b53.txt3
-rw-r--r--Documentation/driver-api/pm/devices.rst199
-rw-r--r--Documentation/networking/devlink/ice.rst4
-rw-r--r--Documentation/power/pci.rst58
-rw-r--r--Documentation/usb/raw-gadget.rst37
-rw-r--r--Documentation/virt/kvm/index.rst2
-rw-r--r--Documentation/virt/kvm/running-nested-guests.rst276
11 files changed, 496 insertions, 122 deletions
diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst
index c00f9f11e3f3..8439d2ae689b 100644
--- a/Documentation/admin-guide/device-mapper/dm-integrity.rst
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -182,12 +182,15 @@ fix_padding
space-efficient. If this option is not present, large padding is
used - that is for compatibility with older kernels.
-
-The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
-be changed when reloading the target (load an inactive table and swap the
-tables with suspend and resume). The other arguments should not be changed
-when reloading the target because the layout of disk data depend on them
-and the reloaded target would be non-functional.
+allow_discards
+ Allow block discard requests (a.k.a. TRIM) for the integrity device.
+ Discards are only allowed to devices using internal hash.
+
+The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
+allow_discards can be changed when reloading the target (load an inactive
+table and swap the tables with suspend and resume). The other arguments
+should not be changed when reloading the target because the layout of disk
+data depend on them and the reloaded target would be non-functional.
The layout of the formatted block device:
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 8ebe46b1af39..5dfcc4592b23 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -112,6 +112,20 @@ used when printing stack backtraces. The specifier takes into
consideration the effect of compiler optimisations which may occur
when tail-calls are used and marked with the noreturn GCC attribute.
+Probed Pointers from BPF / tracing
+----------------------------------
+
+::
+
+ %pks kernel string
+ %pus user string
+
+The ``k`` and ``u`` specifiers are used for printing prior probed memory from
+either kernel memory (k) or user memory (u). The subsequent ``s`` specifier
+results in printing a string. For direct use in regular vsnprintf() the (k)
+and (u) annotation is ignored, however, when used out of BPF's bpf_trace_printk(),
+for example, it reads the memory it is pointing to without faulting.
+
Kernel Pointers
---------------
diff --git a/Documentation/devicetree/bindings/dma/fsl-edma.txt b/Documentation/devicetree/bindings/dma/fsl-edma.txt
index e77b08ebcd06..ee1754739b4b 100644
--- a/Documentation/devicetree/bindings/dma/fsl-edma.txt
+++ b/Documentation/devicetree/bindings/dma/fsl-edma.txt
@@ -10,7 +10,8 @@ Required properties:
- compatible :
- "fsl,vf610-edma" for eDMA used similar to that on Vybrid vf610 SoC
- "fsl,imx7ulp-edma" for eDMA2 used similar to that on i.mx7ulp
- - "fsl,fsl,ls1028a-edma" for eDMA used similar to that on Vybrid vf610 SoC
+ - "fsl,ls1028a-edma" followed by "fsl,vf610-edma" for eDMA used on the
+ LS1028A SoC.
- reg : Specifies base physical address(s) and size of the eDMA registers.
The 1st region is eDMA control register's address and size.
The 2nd and the 3rd regions are programmable channel multiplexing
diff --git a/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml b/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml
index 86cfb599256e..371f18773198 100644
--- a/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml
+++ b/Documentation/devicetree/bindings/dma/socionext,uniphier-xdmac.yaml
@@ -22,9 +22,7 @@ properties:
const: socionext,uniphier-xdmac
reg:
- items:
- - description: XDMAC base register region (offset and length)
- - description: XDMAC extension register region (offset and length)
+ maxItems: 1
interrupts:
maxItems: 1
@@ -49,12 +47,13 @@ required:
- reg
- interrupts
- "#dma-cells"
+ - dma-channels
examples:
- |
xdmac: dma-controller@5fc10000 {
compatible = "socionext,uniphier-xdmac";
- reg = <0x5fc10000 0x1000>, <0x5fc20000 0x800>;
+ reg = <0x5fc10000 0x5300>;
interrupts = <0 188 4>;
#dma-cells = <2>;
dma-channels = <16>;
diff --git a/Documentation/devicetree/bindings/net/dsa/b53.txt b/Documentation/devicetree/bindings/net/dsa/b53.txt
index 5201bc15fdd6..cfd1afdc6e94 100644
--- a/Documentation/devicetree/bindings/net/dsa/b53.txt
+++ b/Documentation/devicetree/bindings/net/dsa/b53.txt
@@ -110,6 +110,9 @@ Ethernet switch connected via MDIO to the host, CPU port wired to eth0:
#size-cells = <0>;
ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
port0@0 {
reg = <0>;
label = "lan1";
diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst
index f66c7b9126ea..946ad0b94e31 100644
--- a/Documentation/driver-api/pm/devices.rst
+++ b/Documentation/driver-api/pm/devices.rst
@@ -349,7 +349,7 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
PM core will skip the ``suspend``, ``suspend_late`` and
``suspend_noirq`` phases as well as all of the corresponding phases of
the subsequent device resume for all of these devices. In that case,
- the ``->complete`` callback will be invoked directly after the
+ the ``->complete`` callback will be the next one invoked after the
``->prepare`` callback and is entirely responsible for putting the
device into a consistent state as appropriate.
@@ -361,9 +361,9 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
runtime PM disabled.
This feature also can be controlled by device drivers by using the
- ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
- management flags. [Typically, they are set at the time the driver is
- probed against the device in question by passing them to the
+ ``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
+ power management flags. [Typically, they are set at the time the driver
+ is probed against the device in question by passing them to the
:c:func:`dev_pm_set_driver_flags` helper function.] If the first of
these flags is set, the PM core will not apply the direct-complete
procedure described above to the given device and, consequenty, to any
@@ -383,11 +383,15 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
``->suspend`` methods provided by subsystems (bus types and PM domains
in particular) must follow an additional rule regarding what can be done
to the devices before their drivers' ``->suspend`` methods are called.
- Namely, they can only resume the devices from runtime suspend by
- calling :c:func:`pm_runtime_resume` for them, if that is necessary, and
+ Namely, they may resume the devices from runtime suspend by
+ calling :c:func:`pm_runtime_resume` for them, if that is necessary, but
they must not update the state of the devices in any other way at that
time (in case the drivers need to resume the devices from runtime
- suspend in their ``->suspend`` methods).
+ suspend in their ``->suspend`` methods). In fact, the PM core prevents
+ subsystems or drivers from putting devices into runtime suspend at
+ these times by calling :c:func:`pm_runtime_get_noresume` before issuing
+ the ``->prepare`` callback (and calling :c:func:`pm_runtime_put` after
+ issuing the ``->complete`` callback).
3. For a number of devices it is convenient to split suspend into the
"quiesce device" and "save device state" phases, in which cases
@@ -459,22 +463,22 @@ When resuming from freeze, standby or memory sleep, the phases are:
Note, however, that new children may be registered below the device as
soon as the ``->resume`` callbacks occur; it's not necessary to wait
- until the ``complete`` phase with that.
+ until the ``complete`` phase runs.
Moreover, if the preceding ``->prepare`` callback returned a positive
number, the device may have been left in runtime suspend throughout the
- whole system suspend and resume (the ``suspend``, ``suspend_late``,
- ``suspend_noirq`` phases of system suspend and the ``resume_noirq``,
- ``resume_early``, ``resume`` phases of system resume may have been
- skipped for it). In that case, the ``->complete`` callback is entirely
+ whole system suspend and resume (its ``->suspend``, ``->suspend_late``,
+ ``->suspend_noirq``, ``->resume_noirq``,
+ ``->resume_early``, and ``->resume`` callbacks may have been
+ skipped). In that case, the ``->complete`` callback is entirely
responsible for putting the device into a consistent state after system
suspend if necessary. [For example, it may need to queue up a runtime
resume request for the device for this purpose.] To check if that is
the case, the ``->complete`` callback can consult the device's
- ``power.direct_complete`` flag. Namely, if that flag is set when the
- ``->complete`` callback is being run, it has been called directly after
- the preceding ``->prepare`` and special actions may be required
- to make the device work correctly afterward.
+ ``power.direct_complete`` flag. If that flag is set when the
+ ``->complete`` callback is being run then the direct-complete mechanism
+ was used, and special actions may be required to make the device work
+ correctly afterward.
At the end of these phases, drivers should be as functional as they were before
suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
@@ -575,10 +579,12 @@ and the phases are similar.
The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks
should do essentially the same things as the ``->suspend``, ``->suspend_late``
-and ``->suspend_noirq`` callbacks, respectively. The only notable difference is
+and ``->suspend_noirq`` callbacks, respectively. A notable difference is
that they need not store the device register values, because the registers
should already have been stored during the ``freeze``, ``freeze_late`` or
-``freeze_noirq`` phases.
+``freeze_noirq`` phases. Also, on many machines the firmware will power-down
+the entire system, so it is not necessary for the callback to put the device in
+a low-power state.
Leaving Hibernation
@@ -764,70 +770,119 @@ device driver in question.
If it is necessary to resume a device from runtime suspend during a system-wide
transition into a sleep state, that can be done by calling
-:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its
-couterpart for transitions related to hibernation) of either the device's driver
-or a subsystem responsible for it (for example, a bus type or a PM domain).
-That is guaranteed to work by the requirement that subsystems must not change
-the state of devices (possibly except for resuming them from runtime suspend)
+:c:func:`pm_runtime_resume` from the ``->suspend`` callback (or the ``->freeze``
+or ``->poweroff`` callback for transitions related to hibernation) of either the
+device's driver or its subsystem (for example, a bus type or a PM domain).
+However, subsystems must not otherwise change the runtime status of devices
from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
invoking device drivers' ``->suspend`` callbacks (or equivalent).
+.. _smart_suspend_flag:
+
+The ``DPM_FLAG_SMART_SUSPEND`` Driver Flag
+------------------------------------------
+
Some bus types and PM domains have a policy to resume all devices from runtime
suspend upfront in their ``->suspend`` callbacks, but that may not be really
-necessary if the driver of the device can cope with runtime-suspended devices.
-The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
-:c:member:`power.driver_flags` at the probe time, by passing it to the
-:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code
+necessary if the device's driver can cope with runtime-suspended devices.
+The driver can indicate this by setting ``DPM_FLAG_SMART_SUSPEND`` in
+:c:member:`power.driver_flags` at probe time, with the assistance of the
+:c:func:`dev_pm_set_driver_flags` helper routine.
+
+Setting that flag causes the PM core and middle-layer code
(bus types, PM domains etc.) to skip the ``->suspend_late`` and
``->suspend_noirq`` callbacks provided by the driver if the device remains in
-runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
-suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
-has been disabled for it, under the assumption that its state should not change
-after that point until the system-wide transition is over (the PM core itself
-does that for devices whose "noirq", "late" and "early" system-wide PM callbacks
-are executed directly by it). If that happens, the driver's system-wide resume
-callbacks, if present, may still be invoked during the subsequent system-wide
-resume transition and the device's runtime power management status may be set
-to "active" before enabling runtime PM for it, so the driver must be prepared to
-cope with the invocation of its system-wide resume callbacks back-to-back with
-its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
-so on) and the final state of the device must reflect the "active" runtime PM
-status in that case.
+runtime suspend throughout those phases of the system-wide suspend (and
+similarly for the "freeze" and "poweroff" parts of system hibernation).
+[Otherwise the same driver
+callback might be executed twice in a row for the same device, which would not
+be valid in general.] If the middle-layer system-wide PM callbacks are present
+for the device then they are responsible for skipping these driver callbacks;
+if not then the PM core skips them. The subsystem callback routines can
+determine whether they need to skip the driver callbacks by testing the return
+value from the :c:func:`dev_pm_skip_suspend` helper function.
+
+In addition, with ``DPM_FLAG_SMART_SUSPEND`` set, the driver's ``->thaw_noirq``
+and ``->thaw_early`` callbacks are skipped in hibernation if the device remained
+in runtime suspend throughout the preceding "freeze" transition. Again, if the
+middle-layer callbacks are present for the device, they are responsible for
+doing this, otherwise the PM core takes care of it.
+
+
+The ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag
+--------------------------------------------
During system-wide resume from a sleep state it's easiest to put devices into
the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
[Refer to that document for more information regarding this particular issue as
well as for information on the device runtime power management framework in
-general.]
-
-However, it often is desirable to leave devices in suspend after system
-transitions to the working state, especially if those devices had been in
+general.] However, it often is desirable to leave devices in suspend after
+system transitions to the working state, especially if those devices had been in
runtime suspend before the preceding system-wide suspend (or analogous)
-transition. Device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag to
-indicate to the PM core (and middle-layer code) that they prefer the specific
-devices handled by them to be left suspended and they have no problems with
-skipping their system-wide resume callbacks for this reason. Whether or not the
-devices will actually be left in suspend may depend on their state before the
-given system suspend-resume cycle and on the type of the system transition under
-way. In particular, devices are not left suspended if that transition is a
-restore from hibernation, as device states are not guaranteed to be reflected
-by the information stored in the hibernation image in that case.
-
-The middle-layer code involved in the handling of the device is expected to
-indicate to the PM core if the device may be left in suspend by setting its
-:c:member:`power.may_skip_resume` status bit which is checked by the PM core
-during the "noirq" phase of the preceding system-wide suspend (or analogous)
-transition. The middle layer is then responsible for handling the device as
-appropriate in its "noirq" resume callback, which is executed regardless of
-whether or not the device is left suspended, but the other resume callbacks
-(except for ``->complete``) will be skipped automatically by the PM core if the
-device really can be left in suspend.
-
-For devices whose "noirq", "late" and "early" driver callbacks are invoked
-directly by the PM core, all of the system-wide resume callbacks are skipped if
-``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
-the ``suspend_noirq`` (or analogous) phase or the transition under way is a
-proper system suspend (rather than anything related to hibernation) and the
-device's wakeup settings are suitable for runtime PM (that is, it cannot
-generate wakeup signals at all or it is allowed to wake up the system from
-sleep).
+transition.
+
+To that end, device drivers can use the ``DPM_FLAG_MAY_SKIP_RESUME`` flag to
+indicate to the PM core and middle-layer code that they allow their "noirq" and
+"early" resume callbacks to be skipped if the device can be left in suspend
+after system-wide PM transitions to the working state. Whether or not that is
+the case generally depends on the state of the device before the given system
+suspend-resume cycle and on the type of the system transition under way.
+In particular, the "thaw" and "restore" transitions related to hibernation are
+not affected by ``DPM_FLAG_MAY_SKIP_RESUME`` at all. [All callbacks are
+issued during the "restore" transition regardless of the flag settings,
+and whether or not any driver callbacks
+are skipped during the "thaw" transition depends whether or not the
+``DPM_FLAG_SMART_SUSPEND`` flag is set (see `above <smart_suspend_flag_>`_).
+In addition, a device is not allowed to remain in runtime suspend if any of its
+children will be returned to full power.]
+
+The ``DPM_FLAG_MAY_SKIP_RESUME`` flag is taken into account in combination with
+the :c:member:`power.may_skip_resume` status bit set by the PM core during the
+"suspend" phase of suspend-type transitions. If the driver or the middle layer
+has a reason to prevent the driver's "noirq" and "early" resume callbacks from
+being skipped during the subsequent system resume transition, it should
+clear :c:member:`power.may_skip_resume` in its ``->suspend``, ``->suspend_late``
+or ``->suspend_noirq`` callback. [Note that the drivers setting
+``DPM_FLAG_SMART_SUSPEND`` need to clear :c:member:`power.may_skip_resume` in
+their ``->suspend`` callback in case the other two are skipped.]
+
+Setting the :c:member:`power.may_skip_resume` status bit along with the
+``DPM_FLAG_MAY_SKIP_RESUME`` flag is necessary, but generally not sufficient,
+for the driver's "noirq" and "early" resume callbacks to be skipped. Whether or
+not they should be skipped can be determined by evaluating the
+:c:func:`dev_pm_skip_resume` helper function.
+
+If that function returns ``true``, the driver's "noirq" and "early" resume
+callbacks should be skipped and the device's runtime PM status will be set to
+"suspended" by the PM core. Otherwise, if the device was runtime-suspended
+during the preceding system-wide suspend transition and its
+``DPM_FLAG_SMART_SUSPEND`` is set, its runtime PM status will be set to
+"active" by the PM core. [Hence, the drivers that do not set
+``DPM_FLAG_SMART_SUSPEND`` should not expect the runtime PM status of their
+devices to be changed from "suspended" to "active" by the PM core during
+system-wide resume-type transitions.]
+
+If the ``DPM_FLAG_MAY_SKIP_RESUME`` flag is not set for a device, but
+``DPM_FLAG_SMART_SUSPEND`` is set and the driver's "late" and "noirq" suspend
+callbacks are skipped, its system-wide "noirq" and "early" resume callbacks, if
+present, are invoked as usual and the device's runtime PM status is set to
+"active" by the PM core before enabling runtime PM for it. In that case, the
+driver must be prepared to cope with the invocation of its system-wide resume
+callbacks back-to-back with its ``->runtime_suspend`` one (without the
+intervening ``->runtime_resume`` and system-wide suspend callbacks) and the
+final state of the device must reflect the "active" runtime PM status in that
+case. [Note that this is not a problem at all if the driver's
+``->suspend_late`` callback pointer points to the same function as its
+``->runtime_suspend`` one and its ``->resume_early`` callback pointer points to
+the same function as the ``->runtime_resume`` one, while none of the other
+system-wide suspend-resume callbacks of the driver are present, for example.]
+
+Likewise, if ``DPM_FLAG_MAY_SKIP_RESUME`` is set for a device, its driver's
+system-wide "noirq" and "early" resume callbacks may be skipped while its "late"
+and "noirq" suspend callbacks may have been executed (in principle, regardless
+of whether or not ``DPM_FLAG_SMART_SUSPEND`` is set). In that case, the driver
+needs to be able to cope with the invocation of its ``->runtime_resume``
+callback back-to-back with its "late" and "noirq" suspend ones. [For instance,
+that is not a concern if the driver sets both ``DPM_FLAG_SMART_SUSPEND`` and
+``DPM_FLAG_MAY_SKIP_RESUME`` and uses the same pair of suspend/resume callback
+functions for runtime PM and system-wide suspend/resume.]
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 5b58fc4e1268..4574352d6ff4 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -61,8 +61,8 @@ The ``ice`` driver reports the following versions
- running
- ICE OS Default Package
- The name of the DDP package that is active in the device. The DDP
- package is loaded by the driver during initialization. Each varation
- of DDP package shall have a unique name.
+ package is loaded by the driver during initialization. Each
+ variation of the DDP package has a unique name.
* - ``fw.app``
- running
- 1.3.1.0
diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
index 0924d29636ad..1831e431f725 100644
--- a/Documentation/power/pci.rst
+++ b/Documentation/power/pci.rst
@@ -1004,41 +1004,39 @@ including the PCI bus type. The flags should be set once at the driver probe
time with the help of the dev_pm_set_driver_flags() function and they should not
be updated directly afterwards.
-The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
-mechanism allowing device suspend/resume callbacks to be skipped if the device
-is in runtime suspend when the system suspend starts. That also affects all of
-the ancestors of the device, so this flag should only be used if absolutely
-necessary.
-
-The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
-positive value from pci_pm_prepare() if the ->prepare callback provided by the
+The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
+direct-complete mechanism allowing device suspend/resume callbacks to be skipped
+if the device is in runtime suspend when the system suspend starts. That also
+affects all of the ancestors of the device, so this flag should only be used if
+absolutely necessary.
+
+The DPM_FLAG_SMART_PREPARE flag causes the PCI bus type to return a positive
+value from pci_pm_prepare() only if the ->prepare callback provided by the
driver of the device returns a positive value. That allows the driver to opt
-out from using the direct-complete mechanism dynamically.
+out from using the direct-complete mechanism dynamically (whereas setting
+DPM_FLAG_NO_DIRECT_COMPLETE means permanent opt-out).
The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
perspective the device can be safely left in runtime suspend during system
suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
-to skip resuming the device from runtime suspend unless there are PCI-specific
-reasons for doing that. Also, it causes pci_pm_suspend_late/noirq(),
-pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early
-if the device remains in runtime suspend in the beginning of the "late" phase
-of the system-wide transition under way. Moreover, if the device is in
-runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime
-power management status will be changed to "active" (as it is going to be put
-into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(),
-the function will set the power.direct_complete flag for it (to make the PM core
-skip the subsequent "thaw" callbacks for it) and return.
-
-Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
-device to be left in suspend after system-wide transitions to the working state.
-This flag is checked by the PM core, but the PCI bus type informs the PM core
-which devices may be left in suspend from its perspective (that happens during
-the "noirq" phase of system-wide suspend and analogous transitions) and next it
-uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
-pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
-callbacks for the device during the transition under way and will set its
-runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
-it.
+to avoid resuming the device from runtime suspend unless there are PCI-specific
+reasons for doing that. Also, it causes pci_pm_suspend_late/noirq() and
+pci_pm_poweroff_late/noirq() to return early if the device remains in runtime
+suspend during the "late" phase of the system-wide transition under way.
+Moreover, if the device is in runtime suspend in pci_pm_resume_noirq() or
+pci_pm_restore_noirq(), its runtime PM status will be changed to "active" (as it
+is going to be put into D0 going forward).
+
+Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its
+"noirq" and "early" resume callbacks to be skipped if the device can be left
+in suspend after a system-wide transition into the working state. This flag is
+taken into consideration by the PM core along with the power.may_skip_resume
+status bit of the device which is set by pci_pm_suspend_noirq() in certain
+situations. If the PM core determines that the driver's "noirq" and "early"
+resume callbacks should be skipped, the dev_pm_skip_resume() helper function
+will return "true" and that will cause pci_pm_resume_noirq() and
+pci_pm_resume_early() to return upfront without touching the device and
+executing the driver callbacks.
3.2. Device Runtime Power Management
------------------------------------
diff --git a/Documentation/usb/raw-gadget.rst b/Documentation/usb/raw-gadget.rst
index 9e78cb858f86..68d879a8009e 100644
--- a/Documentation/usb/raw-gadget.rst
+++ b/Documentation/usb/raw-gadget.rst
@@ -27,9 +27,8 @@ differences are:
3. Raw Gadget provides a way to select a UDC device/driver to bind to,
while GadgetFS currently binds to the first available UDC.
-4. Raw Gadget uses predictable endpoint names (handles) across different
- UDCs (as long as UDCs have enough endpoints of each required transfer
- type).
+4. Raw Gadget explicitly exposes information about endpoints addresses and
+ capabilities allowing a user to write UDC-agnostic gadgets.
5. Raw Gadget has ioctl-based interface instead of a filesystem-based one.
@@ -50,12 +49,36 @@ The typical usage of Raw Gadget looks like:
Raw Gadget and react to those depending on what kind of USB device
needs to be emulated.
+Note, that some UDC drivers have fixed addresses assigned to endpoints, and
+therefore arbitrary endpoint addresses can't be used in the descriptors.
+Nevertheles, Raw Gadget provides a UDC-agnostic way to write USB gadgets.
+Once a USB_RAW_EVENT_CONNECT event is received via USB_RAW_IOCTL_EVENT_FETCH,
+the USB_RAW_IOCTL_EPS_INFO ioctl can be used to find out information about
+endpoints that the UDC driver has. Based on that information, the user must
+chose UDC endpoints that will be used for the gadget being emulated, and
+properly assign addresses in endpoint descriptors.
+
+You can find usage examples (along with a test suite) here:
+
+https://github.com/xairy/raw-gadget
+
+Internal details
+~~~~~~~~~~~~~~~~
+
+Currently every endpoint read/write ioctl submits a USB request and waits until
+its completion. This is the desired mode for coverage-guided fuzzing (as we'd
+like all USB request processing happen during the lifetime of a syscall),
+and must be kept in the implementation. (This might be slow for real world
+applications, thus the O_NONBLOCK improvement suggestion below.)
+
Potential future improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Implement ioctl's for setting/clearing halt status on endpoints.
-
-- Reporting more events (suspend, resume, etc.) through
- USB_RAW_IOCTL_EVENT_FETCH.
+- Report more events (suspend, resume, etc.) through USB_RAW_IOCTL_EVENT_FETCH.
- Support O_NONBLOCK I/O.
+
+- Support USB 3 features (accept SS endpoint companion descriptor when
+ enabling endpoints; allow providing stream_id for bulk transfers).
+
+- Support ISO transfer features (expose frame_number for completed requests).
diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst
index dcc252634cf9..b6833c7bb474 100644
--- a/Documentation/virt/kvm/index.rst
+++ b/Documentation/virt/kvm/index.rst
@@ -28,3 +28,5 @@ KVM
arm/index
devices/index
+
+ running-nested-guests
diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
new file mode 100644
index 000000000000..d0a1fc754c84
--- /dev/null
+++ b/Documentation/virt/kvm/running-nested-guests.rst
@@ -0,0 +1,276 @@
+==============================
+Running nested guests with KVM
+==============================
+
+A nested guest is the ability to run a guest inside another guest (it
+can be KVM-based or a different hypervisor). The straightforward
+example is a KVM guest that in turn runs on a KVM guest (the rest of
+this document is built on this example)::
+
+ .----------------. .----------------.
+ | | | |
+ | L2 | | L2 |
+ | (Nested Guest) | | (Nested Guest) |
+ | | | |
+ |----------------'--'----------------|
+ | |
+ | L1 (Guest Hypervisor) |
+ | KVM (/dev/kvm) |
+ | |
+ .------------------------------------------------------.
+ | L0 (Host Hypervisor) |
+ | KVM (/dev/kvm) |
+ |------------------------------------------------------|
+ | Hardware (with virtualization extensions) |
+ '------------------------------------------------------'
+
+Terminology:
+
+- L0 – level-0; the bare metal host, running KVM
+
+- L1 – level-1 guest; a VM running on L0; also called the "guest
+ hypervisor", as it itself is capable of running KVM.
+
+- L2 – level-2 guest; a VM running on L1, this is the "nested guest"
+
+.. note:: The above diagram is modelled after the x86 architecture;
+ s390x, ppc64 and other architectures are likely to have
+ a different design for nesting.
+
+ For example, s390x always has an LPAR (LogicalPARtition)
+ hypervisor running on bare metal, adding another layer and
+ resulting in at least four levels in a nested setup — L0 (bare
+ metal, running the LPAR hypervisor), L1 (host hypervisor), L2
+ (guest hypervisor), L3 (nested guest).
+
+ This document will stick with the three-level terminology (L0,
+ L1, and L2) for all architectures; and will largely focus on
+ x86.
+
+
+Use Cases
+---------
+
+There are several scenarios where nested KVM can be useful, to name a
+few:
+
+- As a developer, you want to test your software on different operating
+ systems (OSes). Instead of renting multiple VMs from a Cloud
+ Provider, using nested KVM lets you rent a large enough "guest
+ hypervisor" (level-1 guest). This in turn allows you to create
+ multiple nested guests (level-2 guests), running different OSes, on
+ which you can develop and test your software.
+
+- Live migration of "guest hypervisors" and their nested guests, for
+ load balancing, disaster recovery, etc.
+
+- VM image creation tools (e.g. ``virt-install``, etc) often run
+ their own VM, and users expect these to work inside a VM.
+
+- Some OSes use virtualization internally for security (e.g. to let
+ applications run safely in isolation).
+
+
+Enabling "nested" (x86)
+-----------------------
+
+From Linux kernel v4.19 onwards, the ``nested`` KVM parameter is enabled
+by default for Intel and AMD. (Though your Linux distribution might
+override this default.)
+
+In case you are running a Linux kernel older than v4.19, to enable
+nesting, set the ``nested`` KVM module parameter to ``Y`` or ``1``. To
+persist this setting across reboots, you can add it in a config file, as
+shown below:
+
+1. On the bare metal host (L0), list the kernel modules and ensure that
+ the KVM modules::
+
+ $ lsmod | grep -i kvm
+ kvm_intel 133627 0
+ kvm 435079 1 kvm_intel
+
+2. Show information for ``kvm_intel`` module::
+
+ $ modinfo kvm_intel | grep -i nested
+ parm: nested:bool
+
+3. For the nested KVM configuration to persist across reboots, place the
+ below in ``/etc/modprobed/kvm_intel.conf`` (create the file if it
+ doesn't exist)::
+
+ $ cat /etc/modprobe.d/kvm_intel.conf
+ options kvm-intel nested=y
+
+4. Unload and re-load the KVM Intel module::
+
+ $ sudo rmmod kvm-intel
+ $ sudo modprobe kvm-intel
+
+5. Verify if the ``nested`` parameter for KVM is enabled::
+
+ $ cat /sys/module/kvm_intel/parameters/nested
+ Y
+
+For AMD hosts, the process is the same as above, except that the module
+name is ``kvm-amd``.
+
+
+Additional nested-related kernel parameters (x86)
+-------------------------------------------------
+
+If your hardware is sufficiently advanced (Intel Haswell processor or
+higher, which has newer hardware virt extensions), the following
+additional features will also be enabled by default: "Shadow VMCS
+(Virtual Machine Control Structure)", APIC Virtualization on your bare
+metal host (L0). Parameters for Intel hosts::
+
+ $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
+ Y
+
+ $ cat /sys/module/kvm_intel/parameters/enable_apicv
+ Y
+
+ $ cat /sys/module/kvm_intel/parameters/ept
+ Y
+
+.. note:: If you suspect your L2 (i.e. nested guest) is running slower,
+ ensure the above are enabled (particularly
+ ``enable_shadow_vmcs`` and ``ept``).
+
+
+Starting a nested guest (x86)
+-----------------------------
+
+Once your bare metal host (L0) is configured for nesting, you should be
+able to start an L1 guest with::
+
+ $ qemu-kvm -cpu host [...]
+
+The above will pass through the host CPU's capabilities as-is to the
+gues); or for better live migration compatibility, use a named CPU
+model supported by QEMU. e.g.::
+
+ $ qemu-kvm -cpu Haswell-noTSX-IBRS,vmx=on
+
+then the guest hypervisor will subsequently be capable of running a
+nested guest with accelerated KVM.
+
+
+Enabling "nested" (s390x)
+-------------------------
+
+1. On the host hypervisor (L0), enable the ``nested`` parameter on
+ s390x::
+
+ $ rmmod kvm
+ $ modprobe kvm nested=1
+
+.. note:: On s390x, the kernel parameter ``hpage`` is mutually exclusive
+ with the ``nested`` paramter — i.e. to be able to enable
+ ``nested``, the ``hpage`` parameter *must* be disabled.
+
+2. The guest hypervisor (L1) must be provided with the ``sie`` CPU
+ feature — with QEMU, this can be done by using "host passthrough"
+ (via the command-line ``-cpu host``).
+
+3. Now the KVM module can be loaded in the L1 (guest hypervisor)::
+
+ $ modprobe kvm
+
+
+Live migration with nested KVM
+------------------------------
+
+Migrating an L1 guest, with a *live* nested guest in it, to another
+bare metal host, works as of Linux kernel 5.3 and QEMU 4.2.0 for
+Intel x86 systems, and even on older versions for s390x.
+
+On AMD systems, once an L1 guest has started an L2 guest, the L1 guest
+should no longer be migrated or saved (refer to QEMU documentation on
+"savevm"/"loadvm") until the L2 guest shuts down. Attempting to migrate
+or save-and-load an L1 guest while an L2 guest is running will result in
+undefined behavior. You might see a ``kernel BUG!`` entry in ``dmesg``, a
+kernel 'oops', or an outright kernel panic. Such a migrated or loaded L1
+guest can no longer be considered stable or secure, and must be restarted.
+Migrating an L1 guest merely configured to support nesting, while not
+actually running L2 guests, is expected to function normally even on AMD
+systems but may fail once guests are started.
+
+Migrating an L2 guest is always expected to succeed, so all the following
+scenarios should work even on AMD systems:
+
+- Migrating a nested guest (L2) to another L1 guest on the *same* bare
+ metal host.
+
+- Migrating a nested guest (L2) to another L1 guest on a *different*
+ bare metal host.
+
+- Migrating a nested guest (L2) to a bare metal host.
+
+Reporting bugs from nested setups
+-----------------------------------
+
+Debugging "nested" problems can involve sifting through log files across
+L0, L1 and L2; this can result in tedious back-n-forth between the bug
+reporter and the bug fixer.
+
+- Mention that you are in a "nested" setup. If you are running any kind
+ of "nesting" at all, say so. Unfortunately, this needs to be called
+ out because when reporting bugs, people tend to forget to even
+ *mention* that they're using nested virtualization.
+
+- Ensure you are actually running KVM on KVM. Sometimes people do not
+ have KVM enabled for their guest hypervisor (L1), which results in
+ them running with pure emulation or what QEMU calls it as "TCG", but
+ they think they're running nested KVM. Thus confusing "nested Virt"
+ (which could also mean, QEMU on KVM) with "nested KVM" (KVM on KVM).
+
+Information to collect (generic)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following is not an exhaustive list, but a very good starting point:
+
+ - Kernel, libvirt, and QEMU version from L0
+
+ - Kernel, libvirt and QEMU version from L1
+
+ - QEMU command-line of L1 -- when using libvirt, you'll find it here:
+ ``/var/log/libvirt/qemu/instance.log``
+
+ - QEMU command-line of L2 -- as above, when using libvirt, get the
+ complete libvirt-generated QEMU command-line
+
+ - ``cat /sys/cpuinfo`` from L0
+
+ - ``cat /sys/cpuinfo`` from L1
+
+ - ``lscpu`` from L0
+
+ - ``lscpu`` from L1
+
+ - Full ``dmesg`` output from L0
+
+ - Full ``dmesg`` output from L1
+
+x86-specific info to collect
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Both the below commands, ``x86info`` and ``dmidecode``, should be
+available on most Linux distributions with the same name:
+
+ - Output of: ``x86info -a`` from L0
+
+ - Output of: ``x86info -a`` from L1
+
+ - Output of: ``dmidecode`` from L0
+
+ - Output of: ``dmidecode`` from L1
+
+s390x-specific info to collect
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Along with the earlier mentioned generic details, the below is
+also recommended:
+
+ - ``/proc/sysinfo`` from L1; this will also include the info from L0