Diffstat (limited to 'Documentation/core-api')
-rw-r--r--  Documentation/core-api/dma-api-howto.rst   |  36
-rw-r--r--  Documentation/core-api/dma-api.rst          | 197
-rw-r--r--  Documentation/core-api/entry.rst            |   6
-rw-r--r--  Documentation/core-api/index.rst            |   1
-rw-r--r--  Documentation/core-api/kernel-api.rst       |  24
-rw-r--r--  Documentation/core-api/list.rst             | 776
-rw-r--r--  Documentation/core-api/memory-hotplug.rst   |  91
-rw-r--r--  Documentation/core-api/mm-api.rst           |   6
-rw-r--r--  Documentation/core-api/packing.rst          |   2
-rw-r--r--  Documentation/core-api/workqueue.rst        |   6
10 files changed, 971 insertions, 174 deletions
diff --git a/Documentation/core-api/dma-api-howto.rst b/Documentation/core-api/dma-api-howto.rst
index 0bf31b6c4383..96fce2a9aa90 100644
--- a/Documentation/core-api/dma-api-howto.rst
+++ b/Documentation/core-api/dma-api-howto.rst
@@ -155,7 +155,7 @@ a device with limitations, it needs to be decreased.
Special note about PCI: PCI-X specification requires PCI-X devices to support
64-bit addressing (DAC) for all transactions. And at least one platform (SGI
-SN2) requires 64-bit consistent allocations to operate correctly when the IO
+SN2) requires 64-bit coherent allocations to operate correctly when the IO
bus is in PCI-X mode.
For correct operation, you must set the DMA mask to inform the kernel about
@@ -174,7 +174,7 @@ used instead:
int dma_set_mask(struct device *dev, u64 mask);
- The setup for consistent allocations is performed via a call
+ The setup for coherent allocations is performed via a call
to dma_set_coherent_mask()::
int dma_set_coherent_mask(struct device *dev, u64 mask);
@@ -241,7 +241,7 @@ it would look like this::
The coherent mask can always be set to the same or a smaller mask than
the streaming mask. However for the rare case that a device driver only
-uses consistent allocations, one would have to check the return value from
+uses coherent allocations, one would have to check the return value from
dma_set_coherent_mask().
Finally, if your device can only drive the low 24-bits of
@@ -298,20 +298,20 @@ Types of DMA mappings
There are two types of DMA mappings:
-- Consistent DMA mappings which are usually mapped at driver
+- Coherent DMA mappings which are usually mapped at driver
initialization, unmapped at the end and for which the hardware should
guarantee that the device and the CPU can access the data
in parallel and will see updates made by each other without any
explicit software flushing.
- Think of "consistent" as "synchronous" or "coherent".
+ Think of "coherent" as "synchronous".
- The current default is to return consistent memory in the low 32
+ The current default is to return coherent memory in the low 32
bits of the DMA space. However, for future compatibility you should
- set the consistent mask even if this default is fine for your
+ set the coherent mask even if this default is fine for your
driver.
- Good examples of what to use consistent mappings for are:
+ Good examples of what to use coherent mappings for are:
- Network card DMA ring descriptors.
- SCSI adapter mailbox command data structures.
@@ -320,13 +320,13 @@ There are two types of DMA mappings:
The invariant these examples all require is that any CPU store
to memory is immediately visible to the device, and vice
- versa. Consistent mappings guarantee this.
+ versa. Coherent mappings guarantee this.
.. important::
- Consistent DMA memory does not preclude the usage of
+ Coherent DMA memory does not preclude the usage of
proper memory barriers. The CPU may reorder stores to
- consistent memory just as it may normal memory. Example:
+ coherent memory just as it may normal memory. Example:
if it is important for the device to see the first word
of a descriptor updated before the second, you must do
something like::
@@ -365,10 +365,10 @@ Also, systems with caches that aren't DMA-coherent will work better
when the underlying buffers don't share cache lines with other data.
-Using Consistent DMA mappings
-=============================
+Using Coherent DMA mappings
+===========================
-To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
+To allocate and map large (PAGE_SIZE or so) coherent DMA regions,
you should do::
dma_addr_t dma_handle;
@@ -385,10 +385,10 @@ __get_free_pages() (but takes size instead of a page order). If your
driver needs regions sized smaller than a page, you may prefer using
the dma_pool interface, described below.
-The consistent DMA mapping interfaces, will by default return a DMA address
+The coherent DMA mapping interfaces will by default return a DMA address
which is 32-bit addressable. Even if the device indicates (via the DMA mask)
-that it may address the upper 32-bits, consistent allocation will only
-return > 32-bit addresses for DMA if the consistent DMA mask has been
+that it may address the upper 32-bits, coherent allocation will only
+return > 32-bit addresses for DMA if the coherent DMA mask has been
explicitly changed via dma_set_coherent_mask(). This is true of the
dma_pool interface as well.
@@ -497,7 +497,7 @@ program address space. Such platforms can and do report errors in the
kernel logs when the DMA controller hardware detects violation of the
permission setting.
-Only streaming mappings specify a direction, consistent mappings
+Only streaming mappings specify a direction, coherent mappings
implicitly have a direction attribute setting of
DMA_BIDIRECTIONAL.
diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 2ad08517e626..3087bea715ed 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -8,15 +8,15 @@ This document describes the DMA API. For a more gentle introduction
of the API (and actual examples), see Documentation/core-api/dma-api-howto.rst.
This API is split into two pieces. Part I describes the basic API.
-Part II describes extensions for supporting non-consistent memory
+Part II describes extensions for supporting non-coherent memory
machines. Unless you know that your driver absolutely has to support
-non-consistent platforms (this is usually only legacy platforms) you
+non-coherent platforms (this is usually only legacy platforms) you
should only use the API described in part I.
-Part I - dma_API
+Part I - DMA API
----------------
-To get the dma_API, you must #include <linux/dma-mapping.h>. This
+To get the DMA API, you must #include <linux/dma-mapping.h>. This
provides dma_addr_t and the interfaces described below.
A dma_addr_t can hold any valid DMA address for the platform. It can be
@@ -33,13 +33,13 @@ Part Ia - Using large DMA-coherent buffers
dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag)
-Consistent memory is memory for which a write by either the device or
+Coherent memory is memory for which a write by either the device or
the processor can immediately be read by the processor or device
without having to worry about caching effects. (You may however need
to make sure to flush the processor's write buffers before telling
devices to read that memory.)
-This routine allocates a region of <size> bytes of consistent memory.
+This routine allocates a region of <size> bytes of coherent memory.
It returns a pointer to the allocated region (in the processor's virtual
address space) or NULL if the allocation failed.
@@ -48,15 +48,14 @@ It also returns a <dma_handle> which may be cast to an unsigned integer the
same width as the bus and given to the device as the DMA address base of
the region.
-Note: consistent memory can be expensive on some platforms, and the
+Note: coherent memory can be expensive on some platforms, and the
minimum allocation length may be as big as a page, so you should
-consolidate your requests for consistent memory as much as possible.
+consolidate your requests for coherent memory as much as possible.
The simplest way to do that is to use the dma_pool calls (see below).
-The flag parameter (dma_alloc_coherent() only) allows the caller to
-specify the ``GFP_`` flags (see kmalloc()) for the allocation (the
-implementation may choose to ignore flags that affect the location of
-the returned memory, like GFP_DMA).
+The flag parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation (the implementation may ignore flags that affect
+the location of the returned memory, like GFP_DMA).
::
@@ -64,19 +63,18 @@ the returned memory, like GFP_DMA).
dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_handle)
-Free a region of consistent memory you previously allocated. dev,
-size and dma_handle must all be the same as those passed into
-dma_alloc_coherent(). cpu_addr must be the virtual address returned by
-the dma_alloc_coherent().
+Free a previously allocated region of coherent memory. dev, size and dma_handle
+must all be the same as those passed into dma_alloc_coherent(). cpu_addr must
+be the virtual address returned by dma_alloc_coherent().
-Note that unlike their sibling allocation calls, these routines
-may only be called with IRQs enabled.
+Note that unlike the sibling allocation call, this routine may only be called
+with IRQs enabled.
Part Ib - Using small DMA-coherent buffers
------------------------------------------
-To get this part of the dma_API, you must #include <linux/dmapool.h>
+To get this part of the DMA API, you must #include <linux/dmapool.h>
Many drivers need lots of small DMA-coherent memory regions for DMA
descriptors or I/O buffers. Rather than allocating in units of a page
@@ -85,78 +83,29 @@ much like a struct kmem_cache, except that they use the DMA-coherent allocator,
not __get_free_pages(). Also, they understand common hardware constraints
for alignment, like queue heads needing to be aligned on N-byte boundaries.
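+
+A typical usage sketch follows (an illustration only; ``pool``, ``vaddr`` and
+``dma`` are placeholder names, and the exact semantics are in the kernel-doc
+below)::
+
+	pool = dma_pool_create("descs", dev, 64, 8, 0);	/* 64-byte blocks */
+	if (!pool)
+		return -ENOMEM;
+
+	vaddr = dma_pool_alloc(pool, GFP_KERNEL, &dma);	/* CPU + DMA address */
+	/* ... use the buffer ... */
+	dma_pool_free(pool, vaddr, dma);
+	dma_pool_destroy(pool);		/* all blocks must be freed first */
+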
+.. kernel-doc:: mm/dmapool.c
+ :export:
-::
-
- struct dma_pool *
- dma_pool_create(const char *name, struct device *dev,
- size_t size, size_t align, size_t alloc);
-
-dma_pool_create() initializes a pool of DMA-coherent buffers
-for use with a given device. It must be called in a context which
-can sleep.
-
-The "name" is for diagnostics (like a struct kmem_cache name); dev and size
-are like what you'd pass to dma_alloc_coherent(). The device's hardware
-alignment requirement for this type of data is "align" (which is expressed
-in bytes, and must be a power of two). If your device has no boundary
-crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
-from this pool must not cross 4KByte boundaries.
-
-::
-
- void *
- dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags,
- dma_addr_t *handle)
-
-Wraps dma_pool_alloc() and also zeroes the returned memory if the
-allocation attempt succeeded.
-
-
-::
-
- void *
- dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
- dma_addr_t *dma_handle);
-
-This allocates memory from the pool; the returned memory will meet the
-size and alignment requirements specified at creation time. Pass
-GFP_ATOMIC to prevent blocking, or if it's permitted (not
-in_interrupt, not holding SMP locks), pass GFP_KERNEL to allow
-blocking. Like dma_alloc_coherent(), this returns two values: an
-address usable by the CPU, and the DMA address usable by the pool's
-device.
-
-::
-
- void
- dma_pool_free(struct dma_pool *pool, void *vaddr,
- dma_addr_t addr);
-
-This puts memory back into the pool. The pool is what was passed to
-dma_pool_alloc(); the CPU (vaddr) and DMA addresses are what
-were returned when that routine allocated the memory being freed.
-
-::
-
- void
- dma_pool_destroy(struct dma_pool *pool);
-
-dma_pool_destroy() frees the resources of the pool. It must be
-called in a context which can sleep. Make sure you've freed all allocated
-memory back to the pool before you destroy it.
+.. kernel-doc:: include/linux/dmapool.h
Part Ic - DMA addressing limitations
------------------------------------
+The DMA mask is a bit mask of the addressable region for the device. In other
+words, if applying the DMA mask (a bitwise AND operation) to the DMA address of
+a memory region does not clear any bits in the address, then the device can
+perform DMA to that memory region.
+
+Any of the functions below which set a DMA mask may fail if the requested mask
+cannot be used with the device, or if the device is not capable of doing DMA.
+
::
int
dma_set_mask_and_coherent(struct device *dev, u64 mask)
-Checks to see if the mask is possible and updates the device
-streaming and coherent DMA mask parameters if it is.
+Updates both streaming and coherent DMA masks.
Returns: 0 if successful and a negative error if not.
@@ -165,8 +114,7 @@ Returns: 0 if successful and a negative error if not.
int
dma_set_mask(struct device *dev, u64 mask)
-Checks to see if the mask is possible and updates the device
-parameters if it is.
+Updates only the streaming DMA mask.
Returns: 0 if successful and a negative error if not.
@@ -175,8 +123,7 @@ Returns: 0 if successful and a negative error if not.
int
dma_set_coherent_mask(struct device *dev, u64 mask)
-Checks to see if the mask is possible and updates the device
-parameters if it is.
+Updates only the coherent DMA mask.
Returns: 0 if successful and a negative error if not.
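+
+A common probe-time pattern is to try a wide mask first and fall back to a
+narrower one (a sketch only; the fallback policy is up to the driver)::
+
+	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)) &&
+	    dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)))
+		return -EIO;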
@@ -231,12 +178,32 @@ transfer memory ownership. Returns %false if those calls can be skipped.
unsigned long
dma_get_merge_boundary(struct device *dev);
-Returns the DMA merge boundary. If the device cannot merge any the DMA address
+Returns the DMA merge boundary. If the device cannot merge any DMA address
segments, the function returns 0.
Part Id - Streaming DMA mappings
--------------------------------
+Streaming DMA allows an existing buffer to be mapped for DMA transfers and
+then unmapped when finished. Mapping functions are not guaranteed to succeed,
+so the return value must be checked.
+
+.. note::
+
+ In particular, mapping may fail for memory not addressable by the
+ device, e.g. if it is not within the DMA mask of the device and/or a
+ connecting bus bridge. Streaming DMA functions try to overcome such
+ addressing constraints, either by using an IOMMU (a device which maps
+ I/O DMA addresses to physical memory addresses), or by copying the
+ data to/from a bounce buffer if the kernel is configured with a
+ :doc:`SWIOTLB <swiotlb>`. However, these methods are not always
+ available, and even if they are, they may still fail for a number of
+ reasons.
+
+ In short, a device driver may need to be wary of where buffers are
+ located in physical memory, especially if the DMA mask is less than 32
+ bits.
+
::
dma_addr_t
@@ -246,9 +213,7 @@ Part Id - Streaming DMA mappings
Maps a piece of processor virtual memory so it can be accessed by the
device and returns the DMA address of the memory.
-The direction for both APIs may be converted freely by casting.
-However the dma_API uses a strongly typed enumerator for its
-direction:
+The DMA API uses a strongly typed enumerator for its direction:
======================= =============================================
DMA_NONE no direction (used for debugging)
@@ -259,31 +224,13 @@ DMA_BIDIRECTIONAL direction isn't known
.. note::
- Not all memory regions in a machine can be mapped by this API.
- Further, contiguous kernel virtual space may not be contiguous as
+ Contiguous kernel virtual space may not be contiguous in
physical memory. Since this API does not provide any scatter/gather
capability, it will fail if the user tries to map a non-physically
contiguous piece of memory. For this reason, memory to be mapped by
this API should be obtained from sources which guarantee it to be
physically contiguous (like kmalloc).
- Further, the DMA address of the memory must be within the
- dma_mask of the device (the dma_mask is a bit mask of the
- addressable region for the device, i.e., if the DMA address of
- the memory ANDed with the dma_mask is still equal to the DMA
- address, then the device can perform DMA to the memory). To
- ensure that the memory allocated by kmalloc is within the dma_mask,
- the driver may specify various platform-dependent flags to restrict
- the DMA address range of the allocation (e.g., on x86, GFP_DMA
- guarantees to be within the first 16MB of available DMA addresses,
- as required by ISA devices).
-
- Note also that the above constraints on physical contiguity and
- dma_mask may not apply if the platform has an IOMMU (a device which
- maps an I/O DMA address to a physical memory address). However, to be
- portable, device driver writers may *not* assume that such an IOMMU
- exists.
-
.. warning::
Memory coherency operates at a granularity called the cache
@@ -325,8 +272,7 @@ DMA_BIDIRECTIONAL direction isn't known
enum dma_data_direction direction)
Unmaps the region previously mapped. All the parameters passed in
-must be identical to those passed in (and returned) by the mapping
-API.
+must be identical to those passed to (and returned by) dma_map_single().
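+
+A sketch of the full map/check/unmap cycle (``dev``, ``addr`` and ``size``
+are placeholders)::
+
+	dma_addr_t dma_handle;
+
+	dma_handle = dma_map_single(dev, addr, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, dma_handle))
+		goto map_error_handling;
+	/* ... the device performs the DMA transfer ... */
+	dma_unmap_single(dev, dma_handle, size, DMA_TO_DEVICE);
+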
::
@@ -376,10 +322,10 @@ action (e.g. reduce current DMA mapping usage or delay and try again later).
dma_map_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction direction)
-Returns: the number of DMA address segments mapped (this may be shorter
-than <nents> passed in if some elements of the scatter/gather list are
-physically or virtually adjacent and an IOMMU maps them with a single
-entry).
+Maps a scatter/gather list for DMA. Returns the number of DMA address segments
+mapped, which may be smaller than <nents> passed in if several consecutive
+sglist entries are merged (e.g. with an IOMMU, or if some adjacent segments
+just happen to be physically contiguous).
Please note that the sg cannot be mapped again if it has been mapped once.
The mapping process is allowed to destroy information in the sg.
@@ -403,9 +349,8 @@ With scatterlists, you use the resulting mapping like this::
where nents is the number of entries in the sglist.
The implementation is free to merge several consecutive sglist entries
-into one (e.g. with an IOMMU, or if several pages just happen to be
-physically contiguous) and returns the actual number of sg entries it
-mapped them to. On failure 0, is returned.
+into one. The returned value is the actual number of sg entries the input
+was mapped to. On failure, 0 is returned.
Then you should loop count times (note: this can be less than nents times)
and use sg_dma_address() and sg_dma_len() macros where you previously
@@ -775,19 +720,19 @@ memory or doing partial flushes.
of two for easy alignment.
-Part III - Debug drivers use of the DMA-API
+Part III - Debug drivers use of the DMA API
-------------------------------------------
-The DMA-API as described above has some constraints. DMA addresses must be
+The DMA API as described above has some constraints. For example, DMA addresses
must be released with the corresponding function, using the same size. With
the advent of hardware IOMMUs it becomes more and more important that drivers
do not violate those constraints. In the worst case such a violation can
result in data corruption up to destroyed filesystems.
-To debug drivers and find bugs in the usage of the DMA-API checking code can
+To debug drivers and find bugs in the usage of the DMA API, checking code can
be compiled into the kernel which will tell the developer about those
violations. If your architecture supports it you can select the "Enable
-debugging of DMA-API usage" option in your kernel configuration. Enabling this
+debugging of DMA API usage" option in your kernel configuration. Enabling this
option has a performance impact. Do not enable it in production kernels.
If you boot it, the resulting kernel will contain code which does some bookkeeping
@@ -826,7 +771,7 @@ example warning message may look like this::
<EOI> <4>---[ end trace f6435a98e2a38c0e ]---
The driver developer can find the driver and the device including a stacktrace
-of the DMA-API call which caused this warning.
+of the DMA API call which caused this warning.
By default, only the first error will result in a warning message. All other
errors will only be silently counted. This limitation exists to prevent the code
@@ -834,7 +779,7 @@ from flooding your kernel log. To support debugging a device driver this can
be disabled via debugfs. See the debugfs interface documentation below for
details.
-The debugfs directory for the DMA-API debugging code is called dma-api/. In
+The debugfs directory for the DMA API debugging code is called dma-api/. In
this directory the following files can currently be found:
=============================== ===============================================
@@ -882,7 +827,7 @@ dma-api/driver_filter You can write a name of a driver into this file
If you have this code compiled into your kernel it will be enabled by default.
If you want to boot without the bookkeeping anyway you can provide
-'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
+'dma_debug=off' as a boot parameter. This will disable DMA API debugging.
Notice that you can not enable it again at runtime. You have to reboot to do
so.
@@ -915,3 +860,9 @@ the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
this flag is still set, prints a warning message that includes the call trace
that leads up to the unmap. This interface can be called from
dma_mapping_error() routines to enable DMA mapping error check debugging.
+
+Functions and structures
+========================
+
+.. kernel-doc:: include/linux/scatterlist.h
+.. kernel-doc:: lib/scatterlist.c
diff --git a/Documentation/core-api/entry.rst b/Documentation/core-api/entry.rst
index a15f9b1767a2..71d8eedc0549 100644
--- a/Documentation/core-api/entry.rst
+++ b/Documentation/core-api/entry.rst
@@ -105,7 +105,7 @@ has to do extra work between the various steps. In such cases it has to
ensure that enter_from_user_mode() is called first on entry and
exit_to_user_mode() is called last on exit.
-Do not nest syscalls. Nested systcalls will cause RCU and/or context tracking
+Do not nest syscalls. Nested syscalls will cause RCU and/or context tracking
to print a warning.
KVM
@@ -115,8 +115,8 @@ Entering or exiting guest mode is very similar to syscalls. From the host
kernel point of view the CPU goes off into user space when entering the
guest and returns to the kernel on exit.
-kvm_guest_enter_irqoff() is a KVM-specific variant of exit_to_user_mode()
-and kvm_guest_exit_irqoff() is the KVM variant of enter_from_user_mode().
+guest_state_enter_irqoff() is a KVM-specific variant of exit_to_user_mode()
+and guest_state_exit_irqoff() is the KVM variant of enter_from_user_mode().
The state operations have the same ordering.
Task work handling is done separately for guest at the boundary of the
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 7a4ca18ca6e2..a03a99c2cac5 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -54,6 +54,7 @@ Library functionality that is used throughout the kernel.
union_find
min_heap
parser
+ list
Low level entry and exit
========================
diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index ae92a2571388..e8211c4ca662 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -3,12 +3,6 @@ The Linux Kernel API
====================
-List Management Functions
-=========================
-
-.. kernel-doc:: include/linux/list.h
- :internal:
-
Basic C Library Functions
=========================
@@ -136,26 +130,28 @@ Arithmetic Overflow Checking
CRC Functions
-------------
-.. kernel-doc:: lib/crc4.c
+.. kernel-doc:: lib/crc/crc4.c
:export:
-.. kernel-doc:: lib/crc7.c
+.. kernel-doc:: lib/crc/crc7.c
:export:
-.. kernel-doc:: lib/crc8.c
+.. kernel-doc:: lib/crc/crc8.c
:export:
-.. kernel-doc:: lib/crc16.c
+.. kernel-doc:: lib/crc/crc16.c
:export:
-.. kernel-doc:: lib/crc32.c
-
-.. kernel-doc:: lib/crc-ccitt.c
+.. kernel-doc:: lib/crc/crc-ccitt.c
:export:
-.. kernel-doc:: lib/crc-itu-t.c
+.. kernel-doc:: lib/crc/crc-itu-t.c
:export:
+.. kernel-doc:: include/linux/crc32.h
+
+.. kernel-doc:: include/linux/crc64.h
+
Base 2 log and power Functions
------------------------------
diff --git a/Documentation/core-api/list.rst b/Documentation/core-api/list.rst
new file mode 100644
index 000000000000..86873ce9adbf
--- /dev/null
+++ b/Documentation/core-api/list.rst
@@ -0,0 +1,776 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=====================
+Linked Lists in Linux
+=====================
+
+:Author: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
+
+.. contents::
+
+Introduction
+============
+
+Linked lists are one of the most basic data structures used in many programs.
+The Linux kernel implements several different flavours of linked lists. The
+purpose of this document is not to explain linked lists in general, but to show
+new kernel developers how to use the Linux kernel implementations of linked
+lists.
+
+Please note that while linked lists certainly are ubiquitous, they are rarely
+the best data structure to use in cases where a simple array doesn't already
+suffice. In particular, due to their poor data locality, linked lists are a bad
+choice in situations where performance is a consideration. Familiarizing
+oneself with other in-kernel generic data structures, especially for concurrent
+accesses, is highly encouraged.
+
+Linux implementation of doubly linked lists
+===========================================
+
+Linux's linked list implementations can be used by including the header file
+``<linux/list.h>``.
+
+The doubly-linked list will likely be the most familiar to many readers. It's a
+list that can efficiently be traversed forwards and backwards.
+
+The Linux kernel's doubly-linked list is circular in nature. This means that to
+get from the head node to the tail, we can just travel one edge backwards.
+Similarly, to get from the tail node to the head, we can simply travel forwards
+"beyond" the tail and arrive back at the head.
+
+Declaring a node
+----------------
+
+A node in a doubly-linked list is declared by adding a struct list_head
+member to the data structure you wish to be contained in the list:
+
+.. code-block:: c
+
+ struct clown {
+ unsigned long long shoe_size;
+ const char *name;
+ struct list_head node; /* the aforementioned member */
+ };
+
+This may be an unfamiliar approach to some, as the classical explanation of a
+linked list is a list node data structure with pointers to the previous and next
+list node, as well as the payload data. Linux chooses this approach because it
+allows for generic list modification code regardless of what data structure is
+contained within the list. Since the struct list_head member is not a pointer
+but part of the data structure proper, the container_of() pattern can be used by
+the list implementation to access the payload data regardless of its type, while
+staying oblivious to what said type actually is.
+
+Declaring and initializing a list
+---------------------------------
+
+A doubly-linked list can then be declared as just another struct list_head,
+and initialized with the LIST_HEAD_INIT() macro during initial assignment, or
+with the INIT_LIST_HEAD() function later:
+
+.. code-block:: c
+
+ struct clown_car {
+ int tyre_pressure[4];
+ struct list_head clowns; /* Looks like a node! */
+ };
+
+ /* ... Somewhere later in our driver ... */
+
+ static int circus_init(struct circus_priv *circus)
+ {
+ struct clown_car other_car = {
+ .tyre_pressure = {10, 12, 11, 9},
+ .clowns = LIST_HEAD_INIT(other_car.clowns)
+ };
+
+ INIT_LIST_HEAD(&circus->car.clowns);
+
+ return 0;
+ }
+
+A further point of confusion to some may be that the list itself doesn't really
+have its own type. The concept of the entire linked list and a
+struct list_head member that points to other entries in the list are one and
+the same.
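+
+When the list head is a static or global variable rather than a structure
+member, the LIST_HEAD() macro combines the declaration and initialization in
+one step:
+
+.. code-block:: c
+
+   static LIST_HEAD(traveling_circus); /* an empty list pointing at itself */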
+
+Adding nodes to the list
+------------------------
+
+Adding a node to the linked list is done through the list_add() macro.
+
+We'll return to our clown car example to illustrate how nodes get added to the
+list:
+
+.. code-block:: c
+
+ static int circus_fill_car(struct circus_priv *circus)
+ {
+ struct clown_car *car = &circus->car;
+ struct clown *grock;
+ struct clown *dimitri;
+
+ /* State 1 */
+
+ grock = kzalloc(sizeof(*grock), GFP_KERNEL);
+ if (!grock)
+ return -ENOMEM;
+ grock->name = "Grock";
+ grock->shoe_size = 1000;
+
+ /* Note that we're adding the "node" member */
+ list_add(&grock->node, &car->clowns);
+
+ /* State 2 */
+
+ dimitri = kzalloc(sizeof(*dimitri), GFP_KERNEL);
+ if (!dimitri)
+ return -ENOMEM;
+ dimitri->name = "Dimitri";
+ dimitri->shoe_size = 50;
+
+ list_add(&dimitri->node, &car->clowns);
+
+ /* State 3 */
+
+ return 0;
+ }
+
+In State 1, our list of clowns is still empty::
+
+ .------.
+ v |
+ .--------. |
+ | clowns |--'
+ '--------'
+
+This diagram shows the singular "clowns" node pointing at itself. In this
+diagram, and all following diagrams, only the forward edges are shown, to aid in
+clarity.
+
+In State 2, we've added Grock after the list head::
+
+ .--------------------.
+ v |
+ .--------. .-------. |
+ | clowns |---->| Grock |--'
+ '--------' '-------'
+
+This diagram shows the "clowns" node pointing at a new node labeled "Grock".
+The Grock node is pointing back at the "clowns" node.
+
+In State 3, we've added Dimitri after the list head, resulting in the following::
+
+ .------------------------------------.
+ v |
+ .--------. .---------. .-------. |
+ | clowns |---->| Dimitri |---->| Grock |--'
+ '--------' '---------' '-------'
+
+This diagram shows the "clowns" node pointing at a new node labeled "Dimitri",
+which then points at the node labeled "Grock". The "Grock" node still points
+back at the "clowns" node.
+
+If we wanted to have Dimitri inserted at the end of the list instead, we'd use
+list_add_tail(). Our code would then look like this:
+
+.. code-block:: c
+
+ static int circus_fill_car(struct circus_priv *circus)
+ {
+ /* ... */
+
+ list_add_tail(&dimitri->node, &car->clowns);
+
+ /* State 3b */
+
+ return 0;
+ }
+
+This results in the following list::
+
+ .------------------------------------.
+ v |
+ .--------. .-------. .---------. |
+ | clowns |---->| Grock |---->| Dimitri |--'
+ '--------' '-------' '---------'
+
+This diagram shows the "clowns" node pointing at the node labeled "Grock",
+which points at the new node labeled "Dimitri". The node labeled "Dimitri"
+points back at the "clowns" node.
+
+Traversing the list
+-------------------
+
+To iterate the list, we can loop through all nodes within the list with
+list_for_each().
+
+In our clown example, this results in the following somewhat awkward code:
+
+.. code-block:: c
+
+ static unsigned long long circus_get_max_shoe_size(struct circus_priv *circus)
+ {
+ unsigned long long res = 0;
+ struct clown *e;
+ struct list_head *cur;
+
+ list_for_each(cur, &circus->car.clowns) {
+ e = list_entry(cur, struct clown, node);
+ if (e->shoe_size > res)
+ res = e->shoe_size;
+ }
+
+ return res;
+ }
+
+The list_entry() macro internally uses the aforementioned container_of() to
+retrieve the data structure instance that ``node`` is a member of.
+
+Note how the additional list_entry() call is a little awkward here. It's only
+there because we're iterating through the ``node`` members, but we really want
+to iterate through the payload, i.e. the ``struct clown`` that contains each
+node's struct list_head. For this reason, there is a second macro:
+list_for_each_entry().
+
+Using it would change our code to something like this:
+
+.. code-block:: c
+
+ static unsigned long long circus_get_max_shoe_size(struct circus_priv *circus)
+ {
+ unsigned long long res = 0;
+ struct clown *e;
+
+ list_for_each_entry(e, &circus->car.clowns, node) {
+ if (e->shoe_size > res)
+ res = e->shoe_size;
+ }
+
+ return res;
+ }
+
+This eliminates the need for the list_entry() step, and our loop cursor is now
+of the type of our payload. The macro is given the member name that corresponds
+to the list's struct list_head within the clown data structure so that it can
+still walk the list.
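+
+Since the list is doubly linked, it can just as well be walked backwards. A
+brief sketch using list_for_each_entry_reverse():
+
+.. code-block:: c
+
+   struct clown *e;
+
+   /* walks the list from the tail towards the head */
+   list_for_each_entry_reverse(e, &circus->car.clowns, node) {
+           pr_info("%s\n", e->name);
+   }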
+
+Removing nodes from the list
+----------------------------
+
+The list_del() function can be used to remove entries from the list. It not
+only removes the given entry from the list, but also poisons the entry's
+``prev`` and ``next`` pointers, so that unintended use of the entry after
+removal does not go unnoticed.
+
+We can extend our previous example to remove one of the entries:
+
+.. code-block:: c
+
+ static int circus_fill_car(struct circus_priv *circus)
+ {
+ /* ... */
+
+ list_add(&dimitri->node, &car->clowns);
+
+ /* State 3 */
+
+ list_del(&dimitri->node);
+
+ /* State 4 */
+
+ return 0;
+ }
+
+The result of this would be this::
+
+ .--------------------.
+ v |
+ .--------. .-------. | .---------.
+ | clowns |---->| Grock |--' | Dimitri |
+ '--------' '-------' '---------'
+
+This diagram shows the "clowns" node pointing at the node labeled "Grock",
+which points back at the "clowns" node. Off to the side is a lone node labeled
+"Dimitri", which has no arrows pointing anywhere.
+
+Note how the Dimitri node does not point to itself; its pointers are
+intentionally set to a "poison" value that the list code refuses to traverse.
+
+If we instead wanted to reinitialize the removed node to make it point at
+itself again, like an empty list head, we can use list_del_init():
+
+.. code-block:: c
+
+ static int circus_fill_car(struct circus_priv *circus)
+ {
+ /* ... */
+
+ list_add(&dimitri->node, &car->clowns);
+
+ /* State 3 */
+
+ list_del_init(&dimitri->node);
+
+ /* State 4b */
+
+ return 0;
+ }
+
+This results in the deleted node pointing to itself again::
+
+ .--------------------. .-------.
+ v | v |
+ .--------. .-------. | .---------. |
+ | clowns |---->| Grock |--' | Dimitri |--'
+ '--------' '-------' '---------'
+
+This diagram shows the "clowns" node pointing at the node labeled "Grock",
+which points back at the "clowns" node. Off to the side is a lone node labeled
+"Dimitri", which points to itself.
+
+Traversing whilst removing nodes
+--------------------------------
+
+Deleting entries while we're traversing the list will cause problems if we use
+list_for_each() or list_for_each_entry(), as deleting the current entry would
+modify its ``next`` pointer, which means the traversal can't properly advance
+to the next list entry.
+
+There is a solution to this however: list_for_each_safe() and
+list_for_each_entry_safe(). These take an additional cursor parameter (a
+struct list_head pointer for the former, a pointer to the entry type for the
+latter) used as temporary storage for the next entry during iteration,
+solving the issue.
+
+An example of how to use it:
+
+.. code-block:: c
+
+ static void circus_eject_insufficient_clowns(struct circus_priv *circus)
+ {
+ struct clown *e;
+ struct clown *n; /* temporary storage for safe iteration */
+
+ list_for_each_entry_safe(e, n, &circus->car.clowns, node) {
+ if (e->shoe_size < 500)
+ list_del(&e->node);
+ }
+ }
+
+Proper memory management (i.e. freeing the deleted node while making sure
+nothing still references it) in this case is left as an exercise to the reader.
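+
+For the curious, one possible approach, assuming each ejected clown was
+allocated with kzalloc() and nothing else still references it:
+
+.. code-block:: c
+
+   list_for_each_entry_safe(e, n, &circus->car.clowns, node) {
+           if (e->shoe_size < 500) {
+                   list_del(&e->node);
+                   kfree(e); /* e is no longer reachable via the list */
+           }
+   }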
+
+Cutting a list
+--------------
+
+There are two helper functions to cut lists with. Both take elements from the
+list ``head``, and replace the contents of the list ``list``.
+
+The first such function is list_cut_position(). It removes all list entries from
+``head`` up to and including ``entry``, placing them in ``list`` instead.
+
+In this example, it's assumed we start with the following list::
+
+ .----------------------------------------------------------------.
+ v |
+ .--------. .-------. .---------. .-----. .---------. |
+ | clowns |---->| Grock |---->| Dimitri |---->| Pic |---->| Alfredo |--'
+ '--------' '-------' '---------' '-----' '---------'
+
+With the following code, every clown up to and including "Pic" is moved from
+the "clowns" list head to a separate struct list_head, the local stack
+variable ``retirement``:
+
+.. code-block:: c
+
+ static void circus_retire_clowns(struct circus_priv *circus)
+ {
+ struct list_head retirement = LIST_HEAD_INIT(retirement);
+ struct clown *grock, *dimitri, *pic, *alfredo;
+ struct clown_car *car = &circus->car;
+
+ /* ... clown initialization, list adding ... */
+
+ list_cut_position(&retirement, &car->clowns, &pic->node);
+
+ /* State 1 */
+ }
+
+The resulting ``car->clowns`` list would be this::
+
+ .----------------------.
+ v |
+ .--------. .---------. |
+ | clowns |---->| Alfredo |--'
+ '--------' '---------'
+
+Meanwhile, the ``retirement`` list is transformed to the following::
+
+ .--------------------------------------------------.
+ v |
+ .------------. .-------. .---------. .-----. |
+ | retirement |---->| Grock |---->| Dimitri |---->| Pic |--'
+ '------------' '-------' '---------' '-----'
+
+The second function, list_cut_before(), is much the same, except it cuts before
+the ``entry`` node, i.e. it removes all list entries from ``head`` up to but
+excluding ``entry``, placing them in ``list`` instead. This example assumes the
+same initial starting list as the previous example:
+
+.. code-block:: c
+
+ static void circus_retire_clowns(struct circus_priv *circus)
+ {
+ struct list_head retirement = LIST_HEAD_INIT(retirement);
+ struct clown *grock, *dimitri, *pic, *alfredo;
+ struct clown_car *car = &circus->car;
+
+ /* ... clown initialization, list adding ... */
+
+ list_cut_before(&retirement, &car->clowns, &pic->node);
+
+ /* State 1b */
+ }
+
+The resulting ``car->clowns`` list would be this::
+
+ .----------------------------------.
+ v |
+ .--------. .-----. .---------. |
+ | clowns |---->| Pic |---->| Alfredo |--'
+ '--------' '-----' '---------'
+
+Meanwhile, the ``retirement`` list is transformed to the following::
+
+ .--------------------------------------.
+ v |
+ .------------. .-------. .---------. |
+ | retirement |---->| Grock |---->| Dimitri |--'
+ '------------' '-------' '---------'
+
+It should be noted that both functions will destroy links to any existing nodes
+in the destination ``struct list_head *list``.
+
+Moving entries and partial lists
+--------------------------------
+
+The list_move() and list_move_tail() functions can be used to move an entry
+from one list to another, to either the start or end respectively.
+
+In the following example, we'll assume we start with two lists ("clowns" and
+"sidewalk") in the following initial state "State 0"::
+
+ .----------------------------------------------------------------.
+ v |
+ .--------. .-------. .---------. .-----. .---------. |
+ | clowns |---->| Grock |---->| Dimitri |---->| Pic |---->| Alfredo |--'
+ '--------' '-------' '---------' '-----' '---------'
+
+ .-------------------.
+ v |
+ .----------. .-----. |
+ | sidewalk |---->| Pio |--'
+ '----------' '-----'
+
+We apply the following example code to the two lists:
+
+.. code-block:: c
+
+ static void circus_clowns_exit_car(struct circus_priv *circus)
+ {
+ struct list_head sidewalk = LIST_HEAD_INIT(sidewalk);
+ struct clown *grock, *dimitri, *pic, *alfredo, *pio;
+ struct clown_car *car = &circus->car;
+
+ /* ... clown initialization, list adding ... */
+
+ /* State 0 */
+
+ list_move(&pic->node, &sidewalk);
+
+ /* State 1 */
+
+ list_move_tail(&dimitri->node, &sidewalk);
+
+ /* State 2 */
+ }
+
+In State 1, we arrive at the following situation::
+
+ .-----------------------------------------------------.
+ | |
+ v |
+ .--------. .-------. .---------. .---------. |
+ | clowns |---->| Grock |---->| Dimitri |---->| Alfredo |--'
+ '--------' '-------' '---------' '---------'
+
+ .-------------------------------.
+ v |
+ .----------. .-----. .-----. |
+ | sidewalk |---->| Pic |---->| Pio |--'
+ '----------' '-----' '-----'
+
+In State 2, after we've moved Dimitri to the tail of sidewalk, the situation
+changes as follows::
+
+ .-------------------------------------.
+ | |
+ v |
+ .--------. .-------. .---------. |
+ | clowns |---->| Grock |---->| Alfredo |--'
+ '--------' '-------' '---------'
+
+ .-----------------------------------------------.
+ v |
+ .----------. .-----. .-----. .---------. |
+ | sidewalk |---->| Pic |---->| Pio |---->| Dimitri |--'
+ '----------' '-----' '-----' '---------'
+
+As long as the source and destination list heads are part of the same list, we
+can also efficiently bulk move a segment of the list to its tail end. We
+continue the previous example by adding a list_bulk_move_tail() call after
+State 2, moving Pic and Pio to the tail end of the sidewalk list.
+
+.. code-block:: c
+
+ static void circus_clowns_exit_car(struct circus_priv *circus)
+ {
+ struct list_head sidewalk = LIST_HEAD_INIT(sidewalk);
+ struct clown *grock, *dimitri, *pic, *alfredo, *pio;
+ struct clown_car *car = &circus->car;
+
+ /* ... clown initialization, list adding ... */
+
+ /* State 0 */
+
+ list_move(&pic->node, &sidewalk);
+
+ /* State 1 */
+
+ list_move_tail(&dimitri->node, &sidewalk);
+
+ /* State 2 */
+
+ list_bulk_move_tail(&sidewalk, &pic->node, &pio->node);
+
+ /* State 3 */
+ }
+
+For the sake of brevity, only the altered "sidewalk" list at State 3 is depicted
+in the following diagram::
+
+ .-----------------------------------------------.
+ v |
+ .----------. .---------. .-----. .-----. |
+ | sidewalk |---->| Dimitri |---->| Pic |---->| Pio |--'
+ '----------' '---------' '-----' '-----'
+
+Do note that list_bulk_move_tail() does not do any checking as to whether all
+three supplied ``struct list_head *`` parameters really do belong to the same
+list. If you use it outside the constraints the documentation gives, then the
+result is a matter between you and the implementation.
+
+Rotating entries
+----------------
+
+A common write operation on a list, especially when using it as a queue, is
+to rotate it. A list rotation means entries at the front are sent to the back.
+
+For rotation, Linux provides us with two functions: list_rotate_left() and
+list_rotate_to_front(). The former can be pictured like a bicycle chain, taking
+the entry after the supplied ``struct list_head *`` and moving it to the tail,
+which in essence means the entire list, due to its circular nature, rotates by
+one position.
+
+The latter, list_rotate_to_front(), takes the same concept one step further:
+instead of advancing the list by one entry, it advances it *until* the specified
+entry is the new front.
+
+In the following example, our starting state, State 0, is the following::
+
+ .-----------------------------------------------------------------.
+ v |
+ .--------. .-------. .---------. .-----. .---------. .-----. |
+ | clowns |-->| Grock |-->| Dimitri |-->| Pic |-->| Alfredo |-->| Pio |-'
+ '--------' '-------' '---------' '-----' '---------' '-----'
+
+The example code being used to demonstrate list rotations is the following:
+
+.. code-block:: c
+
+ static void circus_clowns_rotate(struct circus_priv *circus)
+ {
+ struct clown *grock, *dimitri, *pic, *alfredo, *pio;
+ struct clown_car *car = &circus->car;
+
+ /* ... clown initialization, list adding ... */
+
+ /* State 0 */
+
+ list_rotate_left(&car->clowns);
+
+ /* State 1 */
+
+ list_rotate_to_front(&alfredo->node, &car->clowns);
+
+ /* State 2 */
+ }
+
+In State 1, we arrive at the following situation::
+
+ .-----------------------------------------------------------------.
+ v |
+ .--------. .---------. .-----. .---------. .-----. .-------. |
+ | clowns |-->| Dimitri |-->| Pic |-->| Alfredo |-->| Pio |-->| Grock |-'
+ '--------' '---------' '-----' '---------' '-----' '-------'
+
+Next, after the list_rotate_to_front() call, we arrive in the following
+State 2::
+
+ .-----------------------------------------------------------------.
+ v |
+ .--------. .---------. .-----. .-------. .---------. .-----. |
+ | clowns |-->| Alfredo |-->| Pio |-->| Grock |-->| Dimitri |-->| Pic |-'
+ '--------' '---------' '-----' '-------' '---------' '-----'
+
+As is hopefully evident from the diagrams, the entries in front of "Alfredo"
+were cycled to the tail end of the list.
+
+Swapping entries
+----------------
+
+Another common operation is swapping two entries with each other.
+
+For this, Linux provides us with list_swap().
+
+In the following example, we have a list with three entries, and swap two of
+them. This is our starting state in "State 0"::
+
+ .-----------------------------------------.
+ v |
+ .--------. .-------. .---------. .-----. |
+ | clowns |-->| Grock |-->| Dimitri |-->| Pic |-'
+ '--------' '-------' '---------' '-----'
+
+.. code-block:: c
+
+ static void circus_clowns_swap(struct circus_priv *circus)
+ {
+ struct clown *grock, *dimitri, *pic;
+ struct clown_car *car = &circus->car;
+
+ /* ... clown initialization, list adding ... */
+
+ /* State 0 */
+
+ list_swap(&dimitri->node, &pic->node);
+
+ /* State 1 */
+ }
+
+The resulting list at State 1 is the following::
+
+ .-----------------------------------------.
+ v |
+ .--------. .-------. .-----. .---------. |
+ | clowns |-->| Grock |-->| Pic |-->| Dimitri |-'
+ '--------' '-------' '-----' '---------'
+
+As is evident by comparing the diagrams, the "Pic" and "Dimitri" nodes have
+traded places.
+
+Splicing two lists together
+---------------------------
+
+Say we have two lists, in the following example one represented by a list head
+we call "knie" and one we call "stey". In a hypothetical circus acquisition,
+the two lists of clowns should be spliced together. The following is our
+situation in "State 0"::
+
+ .-----------------------------------------.
+ | |
+ v |
+ .------. .-------. .---------. .-----. |
+ | knie |-->| Grock |-->| Dimitri |-->| Pic |--'
+ '------' '-------' '---------' '-----'
+
+ .-----------------------------.
+ v |
+ .------. .---------. .-----. |
+ | stey |-->| Alfredo |-->| Pio |--'
+ '------' '---------' '-----'
+
+The function to splice these two lists together is list_splice(). Our example
+code is as follows:
+
+.. code-block:: c
+
+ static void circus_clowns_splice(void)
+ {
+ struct clown *grock, *dimitri, *pic, *alfredo, *pio;
+ struct list_head knie = LIST_HEAD_INIT(knie);
+ struct list_head stey = LIST_HEAD_INIT(stey);
+
+ /* ... Clown allocation and initialization here ... */
+
+ list_add_tail(&grock->node, &knie);
+ list_add_tail(&dimitri->node, &knie);
+ list_add_tail(&pic->node, &knie);
+ list_add_tail(&alfredo->node, &stey);
+ list_add_tail(&pio->node, &stey);
+
+ /* State 0 */
+
+ list_splice(&stey, &dimitri->node);
+
+ /* State 1 */
+ }
+
+The list_splice() call here inserts all the entries of ``stey`` into the list
+that ``dimitri``'s ``node`` is part of, right after the ``node`` of
+``dimitri``. A somewhat surprising diagram of the resulting "State 1" follows::
+
+ .-----------------------------------------------------------------.
+ | |
+ v |
+ .------. .-------. .---------. .---------. .-----. .-----. |
+ | knie |-->| Grock |-->| Dimitri |-->| Alfredo |-->| Pio |-->| Pic |--'
+ '------' '-------' '---------' '---------' '-----' '-----'
+ ^
+ .-------------------------------'
+ |
+ .------. |
+ | stey |--'
+ '------'
+
+Traversing the ``stey`` list no longer results in correct behavior. A call of
+list_for_each() on ``stey`` results in an infinite loop, as the traversal
+never gets back to the ``stey`` list head.
+
+This is because list_splice() did not reinitialize the list_head it took
+entries from, leaving its pointer pointing into what is now a different list.
+
+If we want to avoid this situation, list_splice_init() can be used. It does
+the same thing as list_splice(), except it reinitializes the donor list_head
+after the transplant.
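+
+In our example, only the call changes:
+
+.. code-block:: c
+
+   /* stey is a valid, empty list head again after this call */
+   list_splice_init(&stey, &dimitri->node);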
+
+Concurrency considerations
+--------------------------
+
+Concurrent access and modification of a list needs to be protected with a lock
+in most cases. Alternatively and preferably, one may use the RCU primitives for
+lists in read-mostly use-cases, where read accesses to the list are common but
+modifications to the list less so. See Documentation/RCU/listRCU.rst for more
+details.
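+
+As a minimal sketch of lock-protected modification (the lock, list and helper
+names here are hypothetical):
+
+.. code-block:: c
+
+   static LIST_HEAD(clown_list);
+   static DEFINE_SPINLOCK(clown_lock);
+
+   static void clown_arrive(struct clown *c)
+   {
+           spin_lock(&clown_lock);
+           list_add_tail(&c->node, &clown_list);
+           spin_unlock(&clown_lock);
+   }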
+
+Further reading
+---------------
+
+* `How does the kernel implements Linked Lists? - KernelNewbies <https://kernelnewbies.org/FAQ/LinkedLists>`_
+
+Full List API
+=============
+
+.. kernel-doc:: include/linux/list.h
+ :internal:
diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index 682259ee633a..8fc97c2379de 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -9,6 +9,9 @@ Memory hotplug event notifier
Hotplugging events are sent to a notification queue.
+Memory notifier
+----------------
+
There are six types of notification defined in ``include/linux/memory.h``:
MEM_GOING_ONLINE
@@ -56,20 +59,18 @@ The third argument (arg) passes a pointer of struct memory_notify::
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
- int status_change_nid_normal;
- int status_change_nid;
}
- start_pfn is the first pfn of the memory being onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.
-- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
- is (will be) set/clear, if this is -1, then nodemask status is not changed.
-- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
- set/clear. It means a new(memoryless) node gets new memory by online and a
- node loses all memory. If this is -1, then nodemask status is not changed.
- If status_changed_nid* >= 0, callback should create/discard structures for the
- node if necessary.
+It is possible to get notified for MEM_CANCEL_ONLINE without having been
+notified for MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
+MEM_GOING_OFFLINE.
+This can happen when a consumer fails: the callchain is broken and the
+remaining consumers of the notifier are no longer called.
+It is therefore important that users of memory_notify make no assumptions and
+be prepared to handle such cases.
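+
+A sketch of a callback that copes with this (the function name and the
+per-range bookkeeping are hypothetical)::
+
+	static int example_mem_callback(struct notifier_block *self,
+					unsigned long action, void *arg)
+	{
+		struct memory_notify *mn = arg;
+
+		switch (action) {
+		case MEM_GOING_ONLINE:
+			/* prepare structures for mn->start_pfn/mn->nr_pages;
+			 * returning NOTIFY_BAD here cancels the onlining
+			 */
+			break;
+		case MEM_CANCEL_ONLINE:
+			/* may arrive without a prior MEM_GOING_ONLINE;
+			 * tear down only what actually exists
+			 */
+			break;
+		}
+		return NOTIFY_OK;
+	}
+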
The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
@@ -83,6 +84,78 @@ further processing of the notification queue.
NOTIFY_STOP stops further processing of the notification queue.
+NUMA node notifier
+------------------
+
+There are six types of notification defined in ``include/linux/node.h``:
+
+NODE_ADDING_FIRST_MEMORY
+ Generated before memory becomes available to this node for the first time.
+
+NODE_CANCEL_ADDING_FIRST_MEMORY
+ Generated if NODE_ADDING_FIRST_MEMORY fails.
+
+NODE_ADDED_FIRST_MEMORY
+ Generated when memory has become available to this node for the first time.
+
+NODE_REMOVING_LAST_MEMORY
+ Generated when the last memory available to this node is about to be offlined.
+
+NODE_CANCEL_REMOVING_LAST_MEMORY
+ Generated if NODE_REMOVING_LAST_MEMORY fails.
+
+NODE_REMOVED_LAST_MEMORY
+ Generated when the last memory available to this node has been offlined.
+
+A callback routine can be registered by calling::
+
+ hotplug_node_notifier(callback_func, priority)
+
+Callback functions with higher values of priority are called before callback
+functions with lower values.
+
+A callback function must have the following prototype::
+
+	int callback_func(struct notifier_block *self,
+			  unsigned long action, void *arg);
+
+The first argument of the callback function (self) is a pointer to the block
+of the notifier chain that points to the callback function itself.
+The second argument (action) is one of the event types described above.
+The third argument (arg) passes a pointer of struct node_notify::
+
+ struct node_notify {
+ int nid;
+ }
+
+- nid is the node that memory is being added to or removed from.
+
+It is possible to get notified for NODE_CANCEL_ADDING_FIRST_MEMORY without
+having been notified for NODE_ADDING_FIRST_MEMORY, and the same applies to
+NODE_CANCEL_REMOVING_LAST_MEMORY and NODE_REMOVING_LAST_MEMORY.
+This can happen when a consumer fails: the callchain is broken and the
+remaining consumers of the notifier are no longer called.
+It is therefore important that users of node_notify make no assumptions and
+be prepared to handle such cases.
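+
+A minimal sketch of a node notifier callback (the function name and the
+per-node bookkeeping are hypothetical)::
+
+	static int example_node_callback(struct notifier_block *self,
+					 unsigned long action, void *arg)
+	{
+		struct node_notify *nn = arg;
+
+		switch (action) {
+		case NODE_ADDING_FIRST_MEMORY:
+			/* allocate per-node structures for nn->nid;
+			 * returning NOTIFY_BAD here vetoes the hotplug
+			 */
+			break;
+		case NODE_CANCEL_ADDING_FIRST_MEMORY:
+		case NODE_REMOVED_LAST_MEMORY:
+			/* free per-node structures for nn->nid */
+			break;
+		}
+		return NOTIFY_OK;
+	}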
+
+The callback routine shall return one of the values
+NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
+defined in ``include/linux/notifier.h``
+
+NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
+
+NOTIFY_BAD is used as response to the NODE_ADDING_FIRST_MEMORY,
+NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY or
+NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
+It stops further processing of the notification queue.
+
+NOTIFY_STOP stops further processing of the notification queue.
+
+Please note that we should not fail for NODE_ADDED_FIRST_MEMORY /
+NODE_REMOVED_LAST_MEMORY, as the memory_hotplug code cannot roll back at that
+point anymore.
+
Locking Internals
=================
diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
index af8151db88b2..50cfc7842930 100644
--- a/Documentation/core-api/mm-api.rst
+++ b/Documentation/core-api/mm-api.rst
@@ -91,12 +91,6 @@ Memory pools
.. kernel-doc:: mm/mempool.c
:export:
-DMA pools
-=========
-
-.. kernel-doc:: mm/dmapool.c
- :export:
-
More Memory Management Functions
================================
diff --git a/Documentation/core-api/packing.rst b/Documentation/core-api/packing.rst
index 0ce2078c8e13..f68f1e08fef9 100644
--- a/Documentation/core-api/packing.rst
+++ b/Documentation/core-api/packing.rst
@@ -319,7 +319,7 @@ Here is an example of how to use the fields APIs:
#define SIZE 13
- typdef struct __packed { u8 buf[SIZE]; } packed_buf_t;
+ typedef struct __packed { u8 buf[SIZE]; } packed_buf_t;
static const struct packed_field_u8 fields[] = {
PACKED_FIELD(100, 90, struct data, field1),
diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index e295835fc116..165ca73e8351 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -183,6 +183,12 @@ resources, scheduled and executed.
BH work items cannot sleep. All other features such as delayed queueing,
flushing and canceling are supported.
+``WQ_PERCPU``
+ Work items queued to a per-cpu wq are bound to a specific CPU.
+ This flag is the right choice when CPU locality is important.
+
+ This flag is the complement of ``WQ_UNBOUND``.
+
``WQ_UNBOUND``
Work items queued to an unbound wq are served by the special
worker-pools which host workers which are not bound to any