11 files changed, 1204 insertions, 13 deletions
diff --git a/Documentation/driver-api/80211/cfg80211.rst b/Documentation/driver-api/80211/cfg80211.rst
index 8ffac57e1f5b..eeab91b59457 100644
--- a/Documentation/driver-api/80211/cfg80211.rst
+++ b/Documentation/driver-api/80211/cfg80211.rst
@@ -300,9 +300,6 @@ Data path helpers
    :functions: ieee80211_data_to_8023
 
 .. kernel-doc:: include/net/cfg80211.h
-   :functions: ieee80211_data_from_8023
-
-.. kernel-doc:: include/net/cfg80211.h
    :functions: ieee80211_amsdu_to_8023s
 
 .. kernel-doc:: include/net/cfg80211.h
diff --git a/Documentation/driver-api/dmaengine/client.rst b/Documentation/driver-api/dmaengine/client.rst
new file mode 100644
index 000000000000..6245c99af8c1
--- /dev/null
+++ b/Documentation/driver-api/dmaengine/client.rst
@@ -0,0 +1,275 @@
+====================
+DMA Engine API Guide
+====================
+
+Vinod Koul <vinod dot koul at intel.com>
+
+.. note:: For DMA Engine usage in async_tx please see:
+          ``Documentation/crypto/async-tx-api.txt``
+
+
+Below is a guide to device driver writers on how to use the Slave-DMA API of the
+DMA Engine. This is applicable only for slave DMA usage only.
+
+DMA usage
+=========
+
+The slave DMA usage consists of following steps:
+
+- Allocate a DMA slave channel
+
+- Set slave and controller specific parameters
+
+- Get a descriptor for transaction
+
+- Submit the transaction
+
+- Issue pending requests and wait for callback notification
+
+The details of these operations are:
+
+1. Allocate a DMA slave channel
+
+   Channel allocation is slightly different in the slave DMA context,
+   client drivers typically need a channel from a particular DMA
+   controller only and even in some cases a specific channel is desired.
+   To request a channel dma_request_chan() API is used.
+
+   Interface:
+
+   .. code-block:: c
+
+      struct dma_chan *dma_request_chan(struct device *dev, const char *name);
+
+   Which will find and return the ``name`` DMA channel associated with the 'dev'
+   device. The association is done via DT, ACPI or board file based
+   dma_slave_map matching table.
+
+   A channel allocated via this interface is exclusive to the caller,
+   until dma_release_channel() is called.
+
+2. Set slave and controller specific parameters
+
+   Next step is always to pass some specific information to the DMA
+   driver. Most of the generic information which a slave DMA can use
+   is in struct dma_slave_config. This allows the clients to specify
+   DMA direction, DMA addresses, bus widths, DMA burst lengths etc
+   for the peripheral.
+
+   If some DMA controllers have more parameters to be sent then they
+   should try to embed struct dma_slave_config in their controller
+   specific structure. That gives flexibility to client to pass more
+   parameters, if required.
+
+   Interface:
+
+   .. code-block:: c
+
+      int dmaengine_slave_config(struct dma_chan *chan,
+			struct dma_slave_config *config)
+
+   Please see the dma_slave_config structure definition in dmaengine.h
+   for a detailed explanation of the struct members. Please note
+   that the 'direction' member will be going away as it duplicates the
+   direction given in the prepare call.
+
+3. Get a descriptor for transaction
+
+  For slave usage the various modes of slave transfers supported by the
+  DMA-engine are:
+
+  - slave_sg: DMA a list of scatter gather buffers from/to a peripheral
+
+  - dma_cyclic: Perform a cyclic DMA operation from/to a peripheral till the
+    operation is explicitly stopped.
+
+  - interleaved_dma: This is common to Slave as well as M2M clients. For slave
+    address of devices' fifo could be already known to the driver.
+    Various types of operations could be expressed by setting
+    appropriate values to the 'dma_interleaved_template' members.
+
+  A non-NULL return of this transfer API represents a "descriptor" for
+  the given transaction.
+
+  Interface:
+
+  .. code-block:: c
+
+     struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
+		struct dma_chan *chan, struct scatterlist *sgl,
+		unsigned int sg_len, enum dma_data_direction direction,
+		unsigned long flags);
+
+     struct dma_async_tx_descriptor *dmaengine_prep_dma_cyclic(
+		struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
+		size_t period_len, enum dma_data_direction direction);
+
+     struct dma_async_tx_descriptor *dmaengine_prep_interleaved_dma(
+		struct dma_chan *chan, struct dma_interleaved_template *xt,
+		unsigned long flags);
+
+  The peripheral driver is expected to have mapped the scatterlist for
+  the DMA operation prior to calling dmaengine_prep_slave_sg(), and must
+  keep the scatterlist mapped until the DMA operation has completed.
+  The scatterlist must be mapped using the DMA struct device.
+  If a mapping needs to be synchronized later, dma_sync_*_for_*() must be
+  called using the DMA struct device, too.
+  So, normal setup should look like this:
+
+  .. code-block:: c
+
+     nr_sg = dma_map_sg(chan->device->dev, sgl, sg_len);
+	if (nr_sg == 0)
+		/* error */
+
+	desc = dmaengine_prep_slave_sg(chan, sgl, nr_sg, direction, flags);
+
+  Once a descriptor has been obtained, the callback information can be
+  added and the descriptor must then be submitted. Some DMA engine
+  drivers may hold a spinlock between a successful preparation and
+  submission so it is important that these two operations are closely
+  paired.
+
+  .. note::
+
+     Although the async_tx API specifies that completion callback
+     routines cannot submit any new operations, this is not the
+     case for slave/cyclic DMA.
+
+     For slave DMA, the subsequent transaction may not be available
+     for submission prior to callback function being invoked, so
+     slave DMA callbacks are permitted to prepare and submit a new
+     transaction.
+
+     For cyclic DMA, a callback function may wish to terminate the
+     DMA via dmaengine_terminate_async().
+
+     Therefore, it is important that DMA engine drivers drop any
+     locks before calling the callback function which may cause a
+     deadlock.
+
+     Note that callbacks will always be invoked from the DMA
+     engines tasklet, never from interrupt context.
+
+4. Submit the transaction
+
+   Once the descriptor has been prepared and the callback information
+   added, it must be placed on the DMA engine drivers pending queue.
+
+   Interface:
+
+   .. code-block:: c
+
+      dma_cookie_t dmaengine_submit(struct dma_async_tx_descriptor *desc)
+
+   This returns a cookie can be used to check the progress of DMA engine
+   activity via other DMA engine calls not covered in this document.
+
+   dmaengine_submit() will not start the DMA operation, it merely adds
+   it to the pending queue. For this, see step 5, dma_async_issue_pending.
+
+5. Issue pending DMA requests and wait for callback notification
+
+   The transactions in the pending queue can be activated by calling the
+   issue_pending API. If channel is idle then the first transaction in
+   queue is started and subsequent ones queued up.
+
+   On completion of each DMA operation, the next in queue is started and
+   a tasklet triggered. The tasklet will then call the client driver
+   completion callback routine for notification, if set.
+
+   Interface:
+
+   .. code-block:: c
+
+      void dma_async_issue_pending(struct dma_chan *chan);
+
+Further APIs:
+------------
+
+1. Terminate APIs
+
+   .. code-block:: c
+
+      int dmaengine_terminate_sync(struct dma_chan *chan)
+      int dmaengine_terminate_async(struct dma_chan *chan)
+      int dmaengine_terminate_all(struct dma_chan *chan) /* DEPRECATED */
+
+   This causes all activity for the DMA channel to be stopped, and may
+   discard data in the DMA FIFO which hasn't been fully transferred.
+   No callback functions will be called for any incomplete transfers.
+
+   Two variants of this function are available.
+
+   dmaengine_terminate_async() might not wait until the DMA has been fully
+   stopped or until any running complete callbacks have finished. But it is
+   possible to call dmaengine_terminate_async() from atomic context or from
+   within a complete callback. dmaengine_synchronize() must be called before it
+   is safe to free the memory accessed by the DMA transfer or free resources
+   accessed from within the complete callback.
+
+   dmaengine_terminate_sync() will wait for the transfer and any running
+   complete callbacks to finish before it returns. But the function must not be
+   called from atomic context or from within a complete callback.
+
+   dmaengine_terminate_all() is deprecated and should not be used in new code.
+
+2. Pause API
+
+   .. code-block:: c
+
+      int dmaengine_pause(struct dma_chan *chan)
+
+   This pauses activity on the DMA channel without data loss.
+
+3. Resume API
+
+   .. code-block:: c
+
+       int dmaengine_resume(struct dma_chan *chan)
+
+   Resume a previously paused DMA channel. It is invalid to resume a
+   channel which is not currently paused.
+
+4. Check Txn complete
+
+   .. code-block:: c
+
+      enum dma_status dma_async_is_tx_complete(struct dma_chan *chan,
+		dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+
+   This can be used to check the status of the channel. Please see
+   the documentation in include/linux/dmaengine.h for a more complete
+   description of this API.
+
+   This can be used in conjunction with dma_async_is_complete() and
+   the cookie returned from dmaengine_submit() to check for
+   completion of a specific DMA transaction.
+
+   .. note::
+
+      Not all DMA engine drivers can return reliable information for
+      a running DMA channel. It is recommended that DMA engine users
+      pause or stop (via dmaengine_terminate_all()) the channel before
+      using this API.
+
+5. Synchronize termination API
+
+   .. code-block:: c
+
+      void dmaengine_synchronize(struct dma_chan *chan)
+
+   Synchronize the termination of the DMA channel to the current context.
+
+   This function should be used after dmaengine_terminate_async() to synchronize
+   the termination of the DMA channel to the current context. The function will
+   wait for the transfer and any running complete callbacks to finish before it
+   returns.
+
+   If dmaengine_terminate_async() is used to stop the DMA channel this function
+   must be called before it is safe to free memory accessed by previously
+   submitted descriptors or to free any resources accessed within the complete
+   callback of previously submitted descriptors.
+
+   The behavior of this function is undefined if dma_async_issue_pending() has
+   been called between dmaengine_terminate_async() and this function.
diff --git a/Documentation/driver-api/dmaengine/dmatest.rst b/Documentation/driver-api/dmaengine/dmatest.rst
new file mode 100644
index 000000000000..3922c0a3f0c0
--- /dev/null
+++ b/Documentation/driver-api/dmaengine/dmatest.rst
@@ -0,0 +1,110 @@
+==============
+DMA Test Guide
+==============
+
+Andy Shevchenko <andriy.shevchenko@linux.intel.com>
+
+This small document introduces how to test DMA drivers using dmatest module.
+
+Part 1 - How to build the test module
+=====================================
+
+The menuconfig contains an option that could be found by following path:
+	Device Drivers -> DMA Engine support -> DMA Test client
+
+In the configuration file the option called CONFIG_DMATEST. The dmatest could
+be built as module or inside kernel. Let's consider those cases.
+
+Part 2 - When dmatest is built as a module
+==========================================
+
+Example of usage: ::
+
+    % modprobe dmatest channel=dma0chan0 timeout=2000 iterations=1 run=1
+
+...or: ::
+
+    % modprobe dmatest
+    % echo dma0chan0 > /sys/module/dmatest/parameters/channel
+    % echo 2000 > /sys/module/dmatest/parameters/timeout
+    % echo 1 > /sys/module/dmatest/parameters/iterations
+    % echo 1 > /sys/module/dmatest/parameters/run
+
+...or on the kernel command line: ::
+
+    dmatest.channel=dma0chan0 dmatest.timeout=2000 dmatest.iterations=1 dmatest.run=1
+
+..hint:: available channel list could be extracted by running the following
+         command:
+
+::
+
+    % ls -1 /sys/class/dma/
+
+Once started a message like "dmatest: Started 1 threads using dma0chan0" is
+emitted. After that only test failure messages are reported until the test
+stops.
+
+Note that running a new test will not stop any in progress test.
+
+The following command returns the state of the test. ::
+
+    % cat /sys/module/dmatest/parameters/run
+
+To wait for test completion userpace can poll 'run' until it is false, or use
+the wait parameter. Specifying 'wait=1' when loading the module causes module
+initialization to pause until a test run has completed, while reading
+/sys/module/dmatest/parameters/wait waits for any running test to complete
+before returning. For example, the following scripts wait for 42 tests
+to complete before exiting. Note that if 'iterations' is set to 'infinite' then
+waiting is disabled.
+
+Example: ::
+
+    % modprobe dmatest run=1 iterations=42 wait=1
+    % modprobe -r dmatest
+
+...or: ::
+
+    % modprobe dmatest run=1 iterations=42
+    % cat /sys/module/dmatest/parameters/wait
+    % modprobe -r dmatest
+
+Part 3 - When built-in in the kernel
+====================================
+
+The module parameters that is supplied to the kernel command line will be used
+for the first performed test. After user gets a control, the test could be
+re-run with the same or different parameters. For the details see the above
+section "Part 2 - When dmatest is built as a module..."
+
+In both cases the module parameters are used as the actual values for the test
+case. You always could check them at run-time by running ::
+
+    % grep -H . /sys/module/dmatest/parameters/*
+
+Part 4 - Gathering the test results
+===================================
+
+Test results are printed to the kernel log buffer with the format: ::
+
+    "dmatest: result <channel>: <test id>: '<error msg>' with src_off=<val> dst_off=<val> len=<val> (<err code>)"
+
+Example of output: ::
+
+
+    % dmesg | tail -n 1
+    dmatest: result dma0chan0-copy0: #1: No errors with src_off=0x7bf dst_off=0x8ad len=0x3fea (0)
+
+The message format is unified across the different types of errors. A number in
+the parens represents additional information, e.g. error code, error counter,
+or status. A test thread also emits a summary line at completion listing the
+number of tests executed, number that failed, and a result code.
+
+Example: ::
+
+    % dmesg | tail -n 1
+    dmatest: dma0chan0-copy0: summary 1 test, 0 failures 1000 iops 100000 KB/s (0)
+
+The details of a data miscompare error are also emitted, but do not follow the
+above format.
diff --git a/Documentation/driver-api/dmaengine/index.rst b/Documentation/driver-api/dmaengine/index.rst
new file mode 100644
index 000000000000..3026fa975937
--- /dev/null
+++ b/Documentation/driver-api/dmaengine/index.rst
@@ -0,0 +1,55 @@
+=======================
+DMAEngine documentation
+=======================
+
+DMAEngine documentation provides documents for various aspects of DMAEngine
+framework.
+
+DMAEngine documentation
+-----------------------
+
+This book helps with DMAengine internal APIs and guide for DMAEngine device
+driver writers.
+
+.. toctree::
+   :maxdepth: 1
+
+   provider
+
+DMAEngine client documentation
+------------------------------
+
+This book is a guide to device driver writers on how to use the Slave-DMA
+API of the DMAEngine. This is applicable only for slave DMA usage only.
+
+.. toctree::
+   :maxdepth: 1
+
+   client
+
+DMA Test documentation
+----------------------
+
+This book introduces how to test DMA drivers using dmatest module.
+
+.. toctree::
+   :maxdepth: 1
+
+   dmatest
+
+PXA DMA documentation
+----------------------
+
+This book adds some notes about PXA DMA
+
+.. toctree::
+   :maxdepth: 1
+
+   pxa_dma
+
+.. only::  subproject
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/dmaengine/provider.rst b/Documentation/driver-api/dmaengine/provider.rst
new file mode 100644
index 000000000000..814acb4d2294
--- /dev/null
+++ b/Documentation/driver-api/dmaengine/provider.rst
@@ -0,0 +1,508 @@
+==================================
+DMAengine controller documentation
+==================================
+
+Hardware Introduction
+=====================
+
+Most of the Slave DMA controllers have the same general principles of
+operations.
+
+They have a given number of channels to use for the DMA transfers, and
+a given number of requests lines.
+
+Requests and channels are pretty much orthogonal. Channels can be used
+to serve several to any requests. To simplify, channels are the
+entities that will be doing the copy, and requests what endpoints are
+involved.
+
+The request lines actually correspond to physical lines going from the
+DMA-eligible devices to the controller itself. Whenever the device
+will want to start a transfer, it will assert a DMA request (DRQ) by
+asserting that request line.
+
+A very simple DMA controller would only take into account a single
+parameter: the transfer size. At each clock cycle, it would transfer a
+byte of data from one buffer to another, until the transfer size has
+been reached.
+
+That wouldn't work well in the real world, since slave devices might
+require a specific number of bits to be transferred in a single
+cycle. For example, we may want to transfer as much data as the
+physical bus allows to maximize performances when doing a simple
+memory copy operation, but our audio device could have a narrower FIFO
+that requires data to be written exactly 16 or 24 bits at a time. This
+is why most if not all of the DMA controllers can adjust this, using a
+parameter called the transfer width.
+
+Moreover, some DMA controllers, whenever the RAM is used as a source
+or destination, can group the reads or writes in memory into a buffer,
+so instead of having a lot of small memory accesses, which is not
+really efficient, you'll get several bigger transfers. This is done
+using a parameter called the burst size, that defines how many single
+reads/writes it's allowed to do without the controller splitting the
+transfer into smaller sub-transfers.
+
+Our theoretical DMA controller would then only be able to do transfers
+that involve a single contiguous block of data. However, some of the
+transfers we usually have are not, and want to copy data from
+non-contiguous buffers to a contiguous buffer, which is called
+scatter-gather.
+
+DMAEngine, at least for mem2dev transfers, require support for
+scatter-gather. So we're left with two cases here: either we have a
+quite simple DMA controller that doesn't support it, and we'll have to
+implement it in software, or we have a more advanced DMA controller,
+that implements in hardware scatter-gather.
+
+The latter are usually programmed using a collection of chunks to
+transfer, and whenever the transfer is started, the controller will go
+over that collection, doing whatever we programmed there.
+
+This collection is usually either a table or a linked list. You will
+then push either the address of the table and its number of elements,
+or the first item of the list to one channel of the DMA controller,
+and whenever a DRQ will be asserted, it will go through the collection
+to know where to fetch the data from.
+
+Either way, the format of this collection is completely dependent on
+your hardware. Each DMA controller will require a different structure,
+but all of them will require, for every chunk, at least the source and
+destination addresses, whether it should increment these addresses or
+not and the three parameters we saw earlier: the burst size, the
+transfer width and the transfer size.
+
+The one last thing is that usually, slave devices won't issue DRQ by
+default, and you have to enable this in your slave device driver first
+whenever you're willing to use DMA.
+
+These were just the general memory-to-memory (also called mem2mem) or
+memory-to-device (mem2dev) kind of transfers. Most devices often
+support other kind of transfers or memory operations that dmaengine
+support and will be detailed later in this document.
+
+DMA Support in Linux
+====================
+
+Historically, DMA controller drivers have been implemented using the
+async TX API, to offload operations such as memory copy, XOR,
+cryptography, etc., basically any memory to memory operation.
+
+Over time, the need for memory to device transfers arose, and
+dmaengine was extended. Nowadays, the async TX API is written as a
+layer on top of dmaengine, and acts as a client. Still, dmaengine
+accommodates that API in some cases, and made some design choices to
+ensure that it stayed compatible.
+
+For more information on the Async TX API, please look the relevant
+documentation file in Documentation/crypto/async-tx-api.txt.
+
+DMAEngine APIs
+==============
+
+``struct dma_device`` Initialization
+------------------------------------
+
+Just like any other kernel framework, the whole DMAEngine registration
+relies on the driver filling a structure and registering against the
+framework. In our case, that structure is dma_device.
+
+The first thing you need to do in your driver is to allocate this
+structure. Any of the usual memory allocators will do, but you'll also
+need to initialize a few fields in there:
+
+- channels: should be initialized as a list using the
+  INIT_LIST_HEAD macro for example
+
+- src_addr_widths:
+  should contain a bitmask of the supported source transfer width
+
+- dst_addr_widths:
+  should contain a bitmask of the supported destination transfer width
+
+- directions:
+  should contain a bitmask of the supported slave directions
+  (i.e. excluding mem2mem transfers)
+
+- residue_granularity:
+
+  - Granularity of the transfer residue reported to dma_set_residue.
+    This can be either:
+
+  - Descriptor
+
+    - Your device doesn't support any kind of residue
+      reporting. The framework will only know that a particular
+      transaction descriptor is done.
+
+      - Segment
+
+        - Your device is able to report which chunks have been transferred
+
+      - Burst
+
+        - Your device is able to report which burst have been transferred
+
+  - dev: should hold the pointer to the ``struct device`` associated
+    to your current driver instance.
+
+Supported transaction types
+---------------------------
+
+The next thing you need is to set which transaction types your device
+(and driver) supports.
+
+Our ``dma_device structure`` has a field called cap_mask that holds the
+various types of transaction supported, and you need to modify this
+mask using the dma_cap_set function, with various flags depending on
+transaction types you support as an argument.
+
+All those capabilities are defined in the ``dma_transaction_type enum``,
+in ``include/linux/dmaengine.h``
+
+Currently, the types available are:
+
+- DMA_MEMCPY
+
+  - The device is able to do memory to memory copies
+
+- DMA_XOR
+
+  - The device is able to perform XOR operations on memory areas
+
+  - Used to accelerate XOR intensive tasks, such as RAID5
+
+- DMA_XOR_VAL
+
+  - The device is able to perform parity check using the XOR
+    algorithm against a memory buffer.
+
+- DMA_PQ
+
+  - The device is able to perform RAID6 P+Q computations, P being a
+    simple XOR, and Q being a Reed-Solomon algorithm.
+
+- DMA_PQ_VAL
+
+  - The device is able to perform parity check using RAID6 P+Q
+    algorithm against a memory buffer.
+
+- DMA_INTERRUPT
+
+  - The device is able to trigger a dummy transfer that will
+    generate periodic interrupts
+
+  - Used by the client drivers to register a callback that will be
+    called on a regular basis through the DMA controller interrupt
+
+- DMA_PRIVATE
+
+  - The devices only supports slave transfers, and as such isn't
+    available for async transfers.
+
+- DMA_ASYNC_TX
+
+  - Must not be set by the device, and will be set by the framework
+    if needed
+
+  - TODO: What is it about?
+
+- DMA_SLAVE
+
+  - The device can handle device to memory transfers, including
+    scatter-gather transfers.
+
+  - While in the mem2mem case we were having two distinct types to
+    deal with a single chunk to copy or a collection of them, here,
+    we just have a single transaction type that is supposed to
+    handle both.
+
+  - If you want to transfer a single contiguous memory buffer,
+    simply build a scatter list with only one item.
+
+- DMA_CYCLIC
+
+  - The device can handle cyclic transfers.
+
+  - A cyclic transfer is a transfer where the chunk collection will
+    loop over itself, with the last item pointing to the first.
+
+  - It's usually used for audio transfers, where you want to operate
+    on a single ring buffer that you will fill with your audio data.
+
+- DMA_INTERLEAVE
+
+  - The device supports interleaved transfer.
+
+  - These transfers can transfer data from a non-contiguous buffer
+    to a non-contiguous buffer, opposed to DMA_SLAVE that can
+    transfer data from a non-contiguous data set to a continuous
+    destination buffer.
+
+  - It's usually used for 2d content transfers, in which case you
+    want to transfer a portion of uncompressed data directly to the
+    display to print it
+
+These various types will also affect how the source and destination
+addresses change over time.
+
+Addresses pointing to RAM are typically incremented (or decremented)
+after each transfer. In case of a ring buffer, they may loop
+(DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO)
+are typically fixed.
+
+Device operations
+-----------------
+
+Our dma_device structure also requires a few function pointers in
+order to implement the actual logic, now that we described what
+operations we were able to perform.
+
+The functions that we have to fill in there, and hence have to
+implement, obviously depend on the transaction types you reported as
+supported.
+
+- ``device_alloc_chan_resources``
+
+- ``device_free_chan_resources``
+
+  - These functions will be called whenever a driver will call
+    ``dma_request_channel`` or ``dma_release_channel`` for the first/last
+    time on the channel associated to that driver.
+
+  - They are in charge of allocating/freeing all the needed
+    resources in order for that channel to be useful for your driver.
+
+  - These functions can sleep.
+
+- ``device_prep_dma_*``
+
+  - These functions are matching the capabilities you registered
+    previously.
+
+  - These functions all take the buffer or the scatterlist relevant
+    for the transfer being prepared, and should create a hardware
+    descriptor or a list of hardware descriptors from it
+
+  - These functions can be called from an interrupt context
+
+  - Any allocation you might do should be using the GFP_NOWAIT
+    flag, in order not to potentially sleep, but without depleting
+    the emergency pool either.
+
+  - Drivers should try to pre-allocate any memory they might need
+    during the transfer setup at probe time to avoid putting to
+    much pressure on the nowait allocator.
+
+  - It should return a unique instance of the
+    ``dma_async_tx_descriptor structure``, that further represents this
+    particular transfer.
+
+  - This structure can be initialized using the function
+    ``dma_async_tx_descriptor_init``.
+
+  - You'll also need to set two fields in this structure:
+
+    - flags:
+      TODO: Can it be modified by the driver itself, or
+      should it be always the flags passed in the arguments
+
+    - tx_submit: A pointer to a function you have to implement,
+      that is supposed to push the current transaction descriptor to a
+      pending queue, waiting for issue_pending to be called.
+
+  - In this structure the function pointer callback_result can be
+    initialized in order for the submitter to be notified that a
+    transaction has completed. In the earlier code the function pointer
+    callback has been used. However it does not provide any status to the
+    transaction and will be deprecated. The result structure defined as
+    ``dmaengine_result`` that is passed in to callback_result
+    has two fields:
+
+    - result: This provides the transfer result defined by
+      ``dmaengine_tx_result``. Either success or some error condition.
+
+    - residue: Provides the residue bytes of the transfer for those that
+      support residue.
+
+- ``device_issue_pending``
+
+  - Takes the first transaction descriptor in the pending queue,
+    and starts the transfer. Whenever that transfer is done, it
+    should move to the next transaction in the list.
+
+  - This function can be called in an interrupt context
+
+- ``device_tx_status``
+
+  - Should report the bytes left to go over on the given channel
+
+  - Should only care about the transaction descriptor passed as
+    argument, not the currently active one on a given channel
+
+  - The tx_state argument might be NULL
+
+  - Should use dma_set_residue to report it
+
+  - In the case of a cyclic transfer, it should only take into
+    account the current period.
+
+  - This function can be called in an interrupt context.
+
+- device_config
+
+  - Reconfigures the channel with the configuration given as argument
+
+  - This command should NOT perform synchronously, or on any
+    currently queued transfers, but only on subsequent ones
+
+  - In this case, the function will receive a ``dma_slave_config``
+    structure pointer as an argument, that will detail which
+    configuration to use.
+
+  - Even though that structure contains a direction field, this
+    field is deprecated in favor of the direction argument given to
+    the prep_* functions
+
+  - This call is mandatory for slave operations only. This should NOT be
+    set or expected to be set for memcpy operations.
+    If a driver support both, it should use this call for slave
+    operations only and not for memcpy ones.
+
+- device_pause
+
+  - Pauses a transfer on the channel
+
+  - This command should operate synchronously on the channel,
+    pausing right away the work of the given channel
+
+- device_resume
+
+  - Resumes a transfer on the channel
+
+  - This command should operate synchronously on the channel,
+    resuming right away the work of the given channel
+
+- device_terminate_all
+
+  - Aborts all the pending and ongoing transfers on the channel
+
+  - For aborted transfers the complete callback should not be called
+
+  - Can be called from atomic context or from within a complete
+    callback of a descriptor. Must not sleep. Drivers must be able
+    to handle this correctly.
+
+  - Termination may be asynchronous. The driver does not have to
+    wait until the currently active transfer has completely stopped.
+    See device_synchronize.
+
+- device_synchronize
+
+  - Must synchronize the termination of a channel to the current
+    context.
+
+  - Must make sure that memory for previously submitted
+    descriptors is no longer accessed by the DMA controller.
+
+  - Must make sure that all complete callbacks for previously
+    submitted descriptors have finished running and none are
+    scheduled to run.
+
+  - May sleep.
+
+
+Misc notes
+==========
+
+(stuff that should be documented, but don't really know
+where to put them)
+
+``dma_run_dependencies``
+
+- Should be called at the end of an async TX transfer, and can be
+  ignored in the slave transfers case.
+
+- Makes sure that dependent operations are run before marking it
+  as complete.
+
+dma_cookie_t
+
+- it's a DMA transaction ID that will increment over time.
+
+- Not really relevant any more since the introduction of ``virt-dma``
+  that abstracts it away.
+
+DMA_CTRL_ACK
+
+- If clear, the descriptor cannot be reused by provider until the
+  client acknowledges receipt, i.e. has has a chance to establish any
+  dependency chains
+
+- This can be acked by invoking async_tx_ack()
+
+- If set, does not mean descriptor can be reused
+
+DMA_CTRL_REUSE
+
+- If set, the descriptor can be reused after being completed. It should
+  not be freed by provider if this flag is set.
+
+- The descriptor should be prepared for reuse by invoking
+  ``dmaengine_desc_set_reuse()`` which will set DMA_CTRL_REUSE.
+
+- ``dmaengine_desc_set_reuse()`` will succeed only when channel support
+  reusable descriptor as exhibited by capabilities
+
+- As a consequence, if a device driver wants to skip the
+  ``dma_map_sg()`` and ``dma_unmap_sg()`` in between 2 transfers,
+  because the DMA'd data wasn't used, it can resubmit the transfer right after
+  its completion.
+
+- Descriptor can be freed in few ways
+
+  - Clearing DMA_CTRL_REUSE by invoking
+    ``dmaengine_desc_clear_reuse()`` and submitting for last txn
+
+  - Explicitly invoking ``dmaengine_desc_free()``, this can succeed only
+    when DMA_CTRL_REUSE is already set
+
+  - Terminating the channel
+
+- DMA_PREP_CMD
+
+  - If set, the client driver tells DMA controller that passed data in DMA
+    API is command data.
+
+  - Interpretation of command data is DMA controller specific. It can be
+    used for issuing commands to other peripherals/register reads/register
+    writes for which the descriptor should be in different format from
+    normal data descriptors.
+
+General Design Notes
+====================
+
+Most of the DMAEngine drivers you'll see are based on a similar design
+that handles the end of transfer interrupts in the handler, but defer
+most work to a tasklet, including the start of a new transfer whenever
+the previous transfer ended.
+
+This is a rather inefficient design though, because the inter-transfer
+latency will be not only the interrupt latency, but also the
+scheduling latency of the tasklet, which will leave the channel idle
+in between, which will slow down the global transfer rate.
+
+You should avoid this kind of practice, and instead of electing a new
+transfer in your tasklet, move that part to the interrupt handler in
+order to have a shorter idle window (that we can't really avoid
+anyway).
+
+Glossary
+========
+
+- Burst: A number of consecutive read or write operations that
+  can be queued to buffers before being flushed to memory.
+
+- Chunk: A contiguous collection of bursts
+
+- Transfer: A collection of chunks (be it contiguous or not)
diff --git a/Documentation/driver-api/dmaengine/pxa_dma.rst b/Documentation/driver-api/dmaengine/pxa_dma.rst
new file mode 100644
index 000000000000..442ee691a190
--- /dev/null
+++ b/Documentation/driver-api/dmaengine/pxa_dma.rst
@@ -0,0 +1,190 @@
+==============================
+PXA/MMP - DMA Slave controller
+==============================
+
+Constraints
+===========
+
+a) Transfers hot queuing
+A driver submitting a transfer and issuing it should be granted the transfer
+is queued even on a running DMA channel.
+This implies that the queuing doesn't wait for the previous transfer end,
+and that the descriptor chaining is not only done in the irq/tasklet code
+triggered by the end of the transfer.
+A transfer which is submitted and issued on a phy doesn't wait for a phy to
+stop and restart, but is submitted on a "running channel". The other
+drivers, especially mmp_pdma waited for the phy to stop before relaunching
+a new transfer.
+
+b) All transfers having asked for confirmation should be signaled
+Any issued transfer with DMA_PREP_INTERRUPT should trigger a callback call.
+This implies that even if an irq/tasklet is triggered by end of tx1, but
+at the time of irq/dma tx2 is already finished, tx1->complete() and
+tx2->complete() should be called.
+
+c) Channel running state
+A driver should be able to query if a channel is running or not. For the
+multimedia case, such as video capture, if a transfer is submitted and then
+a check of the DMA channel reports a "stopped channel", the transfer should
+not be issued until the next "start of frame interrupt", hence the need to
+know if a channel is in running or stopped state.
+
+d) Bandwidth guarantee
+The PXA architecture has 4 levels of DMAs priorities : high, normal, low.
+The high priorities get twice as much bandwidth as the normal, which get twice
+as much as the low priorities.
+A driver should be able to request a priority, especially the real-time
+ones such as pxa_camera with (big) throughputs.
+
+Design
+======
+a) Virtual channels
+Same concept as in sa11x0 driver, ie. a driver was assigned a "virtual
+channel" linked to the requestor line, and the physical DMA channel is
+assigned on the fly when the transfer is issued.
+
+b) Transfer anatomy for a scatter-gather transfer
+
+::
+
+   +------------+-----+---------------+----------------+-----------------+
+   | desc-sg[0] | ... | desc-sg[last] | status updater | finisher/linker |
+   +------------+-----+---------------+----------------+-----------------+
+
+This structure is pointed by dma->sg_cpu.
+The descriptors are used as follows :
+
+    - desc-sg[i]: i-th descriptor, transferring the i-th sg
+      element to the video buffer scatter gather
+
+    - status updater
+      Transfers a single u32 to a well known dma coherent memory to leave
+      a trace that this transfer is done. The "well known" is unique per
+      physical channel, meaning that a read of this value will tell which
+      is the last finished transfer at that point in time.
+
+    - finisher: has ddadr=DADDR_STOP, dcmd=ENDIRQEN
+
+    - linker: has ddadr= desc-sg[0] of next transfer, dcmd=0
+
+c) Transfers hot-chaining
+Suppose the running chain is:
+
+::
+
+   Buffer 1              Buffer 2
+   +---------+----+---+  +----+----+----+---+
+   | d0 | .. | dN | l |  | d0 | .. | dN | f |
+   +---------+----+-|-+  ^----+----+----+---+
+                    |    |
+                    +----+
+
+After a call to dmaengine_submit(b3), the chain will look like:
+
+::
+
+   Buffer 1              Buffer 2              Buffer 3
+   +---------+----+---+  +----+----+----+---+  +----+----+----+---+
+   | d0 | .. | dN | l |  | d0 | .. | dN | l |  | d0 | .. | dN | f |
+   +---------+----+-|-+  ^----+----+----+-|-+  ^----+----+----+---+
+                    |    |                |    |
+                    +----+                +----+
+                                         new_link
+
+If while new_link was created the DMA channel stopped, it is _not_
+restarted. Hot-chaining doesn't break the assumption that
+dma_async_issue_pending() is to be used to ensure the transfer is actually started.
+
+One exception to this rule :
+
+- if Buffer1 and Buffer2 had all their addresses 8 bytes aligned
+
+- and if Buffer3 has at least one address not 4 bytes aligned
+
+- then hot-chaining cannot happen, as the channel must be stopped, the
+  "align bit" must be set, and the channel restarted As a consequence,
+  such a transfer tx_submit() will be queued on the submitted queue, and
+  this specific case if the DMA is already running in aligned mode.
+
+d) Transfers completion updater
+Each time a transfer is completed on a channel, an interrupt might be
+generated or not, up to the client's request. But in each case, the last
+descriptor of a transfer, the "status updater", will write the latest
+transfer being completed into the physical channel's completion mark.
+
+This will speed up residue calculation, for large transfers such as video
+buffers which hold around 6k descriptors or more. This also allows without
+any lock to find out what is the latest completed transfer in a running
+DMA chain.
+
+e) Transfers completion, irq and tasklet
+When a transfer flagged as "DMA_PREP_INTERRUPT" is finished, the dma irq
+is raised. Upon this interrupt, a tasklet is scheduled for the physical
+channel.
+
+The tasklet is responsible for :
+
+- reading the physical channel last updater mark
+
+- calling all the transfer callbacks of finished transfers, based on
+  that mark, and each transfer flags.
+
+If a transfer is completed while this handling is done, a dma irq will
+be raised, and the tasklet will be scheduled once again, having a new
+updater mark.
+
+f) Residue
+Residue granularity will be descriptor based. The issued but not completed
+transfers will be scanned for all of their descriptors against the
+currently running descriptor.
+
+g) Most complicated case of driver's tx queues
+The most tricky situation is when :
+
+ - there are not "acked" transfers (tx0)
+
+ - a driver submitted an aligned tx1, not chained
+
+ - a driver submitted an aligned tx2 => tx2 is cold chained to tx1
+
+ - a driver issued tx1+tx2 => channel is running in aligned mode
+
+ - a driver submitted an aligned tx3 => tx3 is hot-chained
+
+ - a driver submitted an unaligned tx4 => tx4 is put in submitted queue,
+   not chained
+
+ - a driver issued tx4 => tx4 is put in issued queue, not chained
+
+ - a driver submitted an aligned tx5 => tx5 is put in submitted queue, not
+   chained
+
+ - a driver submitted an aligned tx6 => tx6 is put in submitted queue,
+   cold chained to tx5
+
+ This translates into (after tx4 is issued) :
+
+ - issued queue
+
+ ::
+
+      +-----+ +-----+ +-----+ +-----+
+      | tx1 | | tx2 | | tx3 | | tx4 |
+      +---|-+ ^---|-+ ^-----+ +-----+
+          |   |   |   |
+          +---+   +---+
+        - submitted queue
+      +-----+ +-----+
+      | tx5 | | tx6 |
+      +---|-+ ^-----+
+          |   |
+          +---+
+
+- completed queue : empty
+
+- allocated queue : tx0
+
+It should be noted that after tx3 is completed, the channel is stopped, and
+restarted in "unaligned mode" to handle tx4.
+
+Author: Robert Jarzmik <robert.jarzmik@free.fr>
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 9c20624842b7..d17a9876b473 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -46,6 +46,7 @@ available subsections can be seen below.
    pinctl
    gpio
    misc_devices
+   dmaengine/index
 
 .. only::  subproject and html
 
diff --git a/Documentation/driver-api/pinctl.rst b/Documentation/driver-api/pinctl.rst
index 48f15b4f9d3e..6cb68d67fa75 100644
--- a/Documentation/driver-api/pinctl.rst
+++ b/Documentation/driver-api/pinctl.rst
@@ -757,8 +757,8 @@ that your datasheet calls "GPIO mode", but actually is just an electrical
 configuration for a certain device. See the section below named
 "GPIO mode pitfalls" for more details on this scenario.
 
-The public pinmux API contains two functions named pinctrl_request_gpio()
-and pinctrl_free_gpio(). These two functions shall *ONLY* be called from
+The public pinmux API contains two functions named pinctrl_gpio_request()
+and pinctrl_gpio_free(). These two functions shall *ONLY* be called from
 gpiolib-based drivers as part of their gpio_request() and
 gpio_free() semantics. Likewise the pinctrl_gpio_direction_[input|output]
 shall only be called from within respective gpio_direction_[input|output]
@@ -790,7 +790,7 @@ gpiolib driver and the affected GPIO range, pin offset and desired direction
 will be passed along to this function.
 
 Alternatively to using these special functions, it is fully allowed to use
-named functions for each GPIO pin, the pinctrl_request_gpio() will attempt to
+named functions for each GPIO pin, the pinctrl_gpio_request() will attempt to
 obtain the function "gpioN" where "N" is the global GPIO pin number if no
 special GPIO-handler is registered.
 
diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst
index bedd32388dac..53c1b0b06da5 100644
--- a/Documentation/driver-api/pm/devices.rst
+++ b/Documentation/driver-api/pm/devices.rst
@@ -274,7 +274,7 @@ sleep states and the hibernation state ("suspend-to-disk").  Each phase involves
 executing callbacks for every device before the next phase begins.  Not all
 buses or classes support all these callbacks and not all drivers use all the
 callbacks.  The various phases always run after tasks have been frozen and
-before they are unfrozen.  Furthermore, the ``*_noirq phases`` run at a time
+before they are unfrozen.  Furthermore, the ``*_noirq`` phases run at a time
 when IRQ handlers have been disabled (except for those marked with the
 IRQF_NO_SUSPEND flag).
 
@@ -328,7 +328,10 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
 	After the ``->prepare`` callback method returns, no new children may be
 	registered below the device.  The method may also prepare the device or
 	driver in some way for the upcoming system power transition, but it
-	should not put the device into a low-power state.
+	should not put the device into a low-power state.  Moreover, if the
+	device supports runtime power management, the ``->prepare`` callback
+	method must not update its state in case it is necessary to resume it
+	from runtime suspend later on.
 
 	For devices supporting runtime power management, the return value of the
 	prepare callback can be used to indicate to the PM core that it may
@@ -351,11 +354,35 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
 	is because all such devices are initially set to runtime-suspended with
 	runtime PM disabled.
 
+	This feature also can be controlled by device drivers by using the
+	``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
+	management flags.  [Typically, they are set at the time the driver is
+	probed against the device in question by passing them to the
+	:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
+	these flags is set, the PM core will not apply the direct-complete
+	procedure described above to the given device and, consequenty, to any
+	of its ancestors.  The second flag, when set, informs the middle layer
+	code (bus types, device types, PM domains, classes) that it should take
+	the return value of the ``->prepare`` callback provided by the driver
+	into account and it may only return a positive value from its own
+	``->prepare`` callback if the driver's one also has returned a positive
+	value.
+
     2.	The ``->suspend`` methods should quiesce the device to stop it from
 	performing I/O.  They also may save the device registers and put it into
 	the appropriate low-power state, depending on the bus type the device is
 	on, and they may enable wakeup events.
 
+	However, for devices supporting runtime power management, the
+	``->suspend`` methods provided by subsystems (bus types and PM domains
+	in particular) must follow an additional rule regarding what can be done
+	to the devices before their drivers' ``->suspend`` methods are called.
+	Namely, they can only resume the devices from runtime suspend by
+	calling :c:func:`pm_runtime_resume` for them, if that is necessary, and
+	they must not update the state of the devices in any other way at that
+	time (in case the drivers need to resume the devices from runtime
+	suspend in their ``->suspend`` methods).
+
     3.	For a number of devices it is convenient to split suspend into the
 	"quiesce device" and "save device state" phases, in which cases
 	``suspend_late`` is meant to do the latter.  It is always executed after
@@ -675,7 +702,7 @@ sub-domain of the parent domain.
 
 Support for power domains is provided through the :c:member:`pm_domain` field of
 |struct device|.  This field is a pointer to an object of type
-|struct dev_pm_domain|, defined in :file:`include/linux/pm.h``, providing a set
+|struct dev_pm_domain|, defined in :file:`include/linux/pm.h`, providing a set
 of power management callbacks analogous to the subsystem-level and device driver
 callbacks that are executed for the given device during all power transitions,
 instead of the respective subsystem-level callbacks.  Specifically, if a
@@ -729,6 +756,36 @@ state temporarily, for example so that its system wakeup capability can be
 disabled.  This all depends on the hardware and the design of the subsystem and
 device driver in question.
 
+If it is necessary to resume a device from runtime suspend during a system-wide
+transition into a sleep state, that can be done by calling
+:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its
+couterpart for transitions related to hibernation) of either the device's driver
+or a subsystem responsible for it (for example, a bus type or a PM domain).
+That is guaranteed to work by the requirement that subsystems must not change
+the state of devices (possibly except for resuming them from runtime suspend)
+from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
+invoking device drivers' ``->suspend`` callbacks (or equivalent).
+
+Some bus types and PM domains have a policy to resume all devices from runtime
+suspend upfront in their ``->suspend`` callbacks, but that may not be really
+necessary if the driver of the device can cope with runtime-suspended devices.
+The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
+:c:member:`power.driver_flags` at the probe time, by passing it to the
+:c:func:`dev_pm_set_driver_flags` helper.  That also may cause middle-layer code
+(bus types, PM domains etc.) to skip the ``->suspend_late`` and
+``->suspend_noirq`` callbacks provided by the driver if the device remains in
+runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
+suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
+has been disabled for it, under the assumption that its state should not change
+after that point until the system-wide transition is over.  If that happens, the
+driver's system-wide resume callbacks, if present, may still be invoked during
+the subsequent system-wide resume transition and the device's runtime power
+management status may be set to "active" before enabling runtime PM for it,
+so the driver must be prepared to cope with the invocation of its system-wide
+resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
+intervening ``->runtime_resume`` and so on) and the final state of the device
+must reflect the "active" status for runtime PM in that case.
+
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
 Refer to that document for more information regarding this particular issue as
diff --git a/Documentation/driver-api/scsi.rst b/Documentation/driver-api/scsi.rst
index 5a2aa7a377d9..9ae03171daca 100644
--- a/Documentation/driver-api/scsi.rst
+++ b/Documentation/driver-api/scsi.rst
@@ -28,7 +28,7 @@ SCSI commands can be transported over just about any kind of bus, and
 are the default protocol for storage devices attached to USB, SATA, SAS,
 Fibre Channel, FireWire, and ATAPI devices. SCSI packets are also
 commonly exchanged over Infiniband,
-`I20 <http://i2o.shadowconnect.com/faq.php>`__, TCP/IP
+`I2O <http://i2o.shadowconnect.com/faq.php>`__, TCP/IP
 (`iSCSI <https://en.wikipedia.org/wiki/ISCSI>`__), even `Parallel
 ports <http://cyberelk.net/tim/parport/parscsi.html>`__.
 
diff --git a/Documentation/driver-api/usb/usb.rst b/Documentation/driver-api/usb/usb.rst
index dba0f876b36f..078e981e2b16 100644
--- a/Documentation/driver-api/usb/usb.rst
+++ b/Documentation/driver-api/usb/usb.rst
@@ -690,9 +690,7 @@ The USB devices are now exported via debugfs:
 This file is handy for status viewing tools in user mode, which can scan
 the text format and ignore most of it. More detailed device status
 (including class and vendor status) is available from device-specific
-files. For information about the current format of this file, see the
-``Documentation/usb/proc_usb_info.txt`` file in your Linux kernel
-sources.
+files. For information about the current format of this file, see below.
 
 This file, in combination with the poll() system call, can also be used
 to detect when devices are added or removed::