From 34e75cf4beb1a88a61b7c76b5fdc99c43cff8594 Mon Sep 17 00:00:00 2001
From: "Daniel W. S. Almeida" <dwlsalmeida@gmail.com>
Date: Wed, 29 Jan 2020 01:49:13 -0300
Subject: Documentation: nfs: convert pnfs.txt to ReST

Convert pnfs.txt to ReST. Content remains mostly unchanged.

Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Link: https://lore.kernel.org/r/20200129044917.566906-2-dwlsalmeida@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst     |  1 +
 Documentation/filesystems/nfs/index.rst |  9 ++++
 Documentation/filesystems/nfs/pnfs.rst  | 78 +++++++++++++++++++++++++++++++++
 Documentation/filesystems/nfs/pnfs.txt  | 73 ------------------------------
 4 files changed, 88 insertions(+), 73 deletions(-)
 create mode 100644 Documentation/filesystems/nfs/index.rst
 create mode 100644 Documentation/filesystems/nfs/pnfs.rst
 delete mode 100644 Documentation/filesystems/nfs/pnfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 386eaad008b2..45d791905e91 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -51,3 +51,4 @@ Documentation for filesystem implementations.
    overlayfs
    virtiofs
    vfat
+   nfs/index
diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
new file mode 100644
index 000000000000..d19ba592779a
--- /dev/null
+++ b/Documentation/filesystems/nfs/index.rst
@@ -0,0 +1,9 @@
+===============================
+NFS
+===============================
+
+
+.. toctree::
+   :maxdepth: 1
+
+   pnfs
diff --git a/Documentation/filesystems/nfs/pnfs.rst b/Documentation/filesystems/nfs/pnfs.rst
new file mode 100644
index 000000000000..7c470ecdc3a9
--- /dev/null
+++ b/Documentation/filesystems/nfs/pnfs.rst
@@ -0,0 +1,78 @@
+==========================
+Reference counting in pnfs
+==========================
+
+There are several inter-related caches.  We have layouts which can
+reference multiple devices, each of which can reference multiple data servers.
+Each data server can be referenced by multiple devices.  Each device
+can be referenced by multiple layouts. To keep all of this straight,
+we need to reference count.
+
+
+struct pnfs_layout_hdr
+======================
+
+The on-the-wire command LAYOUTGET corresponds to struct
+pnfs_layout_segment, usually referred to by the variable name lseg.
+Each nfs_inode may hold a pointer to a cache of these layout
+segments in nfsi->layout, of type struct pnfs_layout_hdr.
+
+We reference the header for the inode pointing to it, across each
+outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN,
+LAYOUTCOMMIT), and for each lseg held within.
+
+Each header is also (when non-empty) put on a list associated with
+struct nfs_client (cl_layouts).  Being put on this list does not bump
+the reference count, as the layout is kept around by the lseg that
+keeps it in the list.
+
+deviceid_cache
+==============
+
+lsegs reference device ids, which are resolved per nfs_client and
+layout driver type.  The device ids are held in an RCU cache (struct
+nfs4_deviceid_cache).  The cache itself is referenced across each
+mount.  The entries (struct nfs4_deviceid) themselves are held across
+the lifetime of each lseg referencing them.
+
+RCU is used because the deviceid is basically a write once, read many
+data structure.  The hlist size of 32 buckets needs better
+justification, but seems reasonable given that we can have multiple
+deviceids per filesystem, and multiple filesystems per nfs_client.
+
+The hash code is copied from the nfsd code base.  A discussion of
+hashing and variations of this algorithm can be found `here.
+<http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809>`_
+
+data server cache
+=================
+
+File driver devices refer to data servers, which are kept in a module
+level cache.  A reference to each data server is held over the lifetime
+of the deviceid pointing to it.
+
+lseg
+====
+
+lseg maintains an extra reference corresponding to the NFS_LSEG_VALID
+bit which holds it in the pnfs_layout_hdr's list.  When the final lseg
+is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED
+bit is set, preventing any new lsegs from being added.
+
+layout drivers
+==============
+
+pNFS uses what are called layout drivers. The standard defines 4 basic
+layout types: "files", "objects", "blocks", and "flexfiles". For each
+of these types there is a layout driver with a common table of function
+vectors, which are called by the nfs-client pnfs-core to implement the
+different layout types.
+
+- Files-layout-driver code is in: fs/nfs/filelayout/.. directory
+- Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory
+- Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory
+
+blocks-layout setup
+===================
+
+TODO: Document the setup needs of the blocks layout driver
diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
deleted file mode 100644
index 80dc0bdc302a..000000000000
--- a/Documentation/filesystems/nfs/pnfs.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-Reference counting in pnfs:
-==========================
-
-The are several inter-related caches.  We have layouts which can
-reference multiple devices, each of which can reference multiple data servers.
-Each data server can be referenced by multiple devices.  Each device
-can be referenced by multiple layouts.  To keep all of this straight,
-we need to reference count.
-
-
-struct pnfs_layout_hdr
-----------------------
-The on-the-wire command LAYOUTGET corresponds to struct
-pnfs_layout_segment, usually referred to by the variable name lseg.
-Each nfs_inode may hold a pointer to a cache of these layout
-segments in nfsi->layout, of type struct pnfs_layout_hdr.
-
-We reference the header for the inode pointing to it, across each
-outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN,
-LAYOUTCOMMIT), and for each lseg held within.
-
-Each header is also (when non-empty) put on a list associated with
-struct nfs_client (cl_layouts).  Being put on this list does not bump
-the reference count, as the layout is kept around by the lseg that
-keeps it in the list.
-
-deviceid_cache
---------------
-lsegs reference device ids, which are resolved per nfs_client and
-layout driver type.  The device ids are held in a RCU cache (struct
-nfs4_deviceid_cache).  The cache itself is referenced across each
-mount.  The entries (struct nfs4_deviceid) themselves are held across
-the lifetime of each lseg referencing them.
-
-RCU is used because the deviceid is basically a write once, read many
-data structure.  The hlist size of 32 buckets needs better
-justification, but seems reasonable given that we can have multiple
-deviceid's per filesystem, and multiple filesystems per nfs_client.
-
-The hash code is copied from the nfsd code base.  A discussion of
-hashing and variations of this algorithm can be found at:
-http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809
-
-data server cache
------------------
-file driver devices refer to data servers, which are kept in a module
-level cache.  Its reference is held over the lifetime of the deviceid
-pointing to it.
-
-lseg
-----
-lseg maintains an extra reference corresponding to the NFS_LSEG_VALID
-bit which holds it in the pnfs_layout_hdr's list.  When the final lseg
-is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED
-bit is set, preventing any new lsegs from being added.
-
-layout drivers
---------------
-
-PNFS utilizes what is called layout drivers. The STD defines 4 basic
-layout types: "files", "objects", "blocks", and "flexfiles". For each
-of these types there is a layout-driver with a common function-vectors
-table which are called by the nfs-client pnfs-core to implement the
-different layout types.
-
-Files-layout-driver code is in: fs/nfs/filelayout/.. directory
-Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory
-Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory
-
-blocks-layout setup
--------------------
-
-TODO: Document the setup needs of the blocks layout driver
-- 
cgit 


From f0bf8a988b26e75cc6fc28a44a745cb354a2b5a6 Mon Sep 17 00:00:00 2001
From: "Daniel W. S. Almeida" <dwlsalmeida@gmail.com>
Date: Wed, 29 Jan 2020 01:49:14 -0300
Subject: Documentation: nfs: rpc-cache: convert to ReST

Convert rpc-cache.txt to ReST. Changes aim to improve presentation
but the content itself remains mostly the same.

Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Link: https://lore.kernel.org/r/20200129044917.566906-3-dwlsalmeida@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/nfs/index.rst     |   1 +
 Documentation/filesystems/nfs/rpc-cache.rst | 220 ++++++++++++++++++++++++++++
 Documentation/filesystems/nfs/rpc-cache.txt | 202 -------------------------
 3 files changed, 221 insertions(+), 202 deletions(-)
 create mode 100644 Documentation/filesystems/nfs/rpc-cache.rst
 delete mode 100644 Documentation/filesystems/nfs/rpc-cache.txt

diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
index d19ba592779a..52f4956e7770 100644
--- a/Documentation/filesystems/nfs/index.rst
+++ b/Documentation/filesystems/nfs/index.rst
@@ -7,3 +7,4 @@ NFS
    :maxdepth: 1
 
    pnfs
+   rpc-cache
diff --git a/Documentation/filesystems/nfs/rpc-cache.rst b/Documentation/filesystems/nfs/rpc-cache.rst
new file mode 100644
index 000000000000..bb164eea969b
--- /dev/null
+++ b/Documentation/filesystems/nfs/rpc-cache.rst
@@ -0,0 +1,220 @@
+=========
+RPC Cache
+=========
+
+This document gives a brief introduction to the caching
+mechanisms in the sunrpc layer that are used, in particular,
+for NFS authentication.
+
+Caches
+======
+
+The caching replaces the old exports table and allows for
+a wide variety of values to be cached.
+
+There are a number of caches that are similar in structure though
+quite possibly very different in content and use.  There is a corpus
+of common code for managing these caches.
+
+Examples of caches that are likely to be needed are:
+
+  - mapping from IP address to client name
+  - mapping from client name and filesystem to export options
+  - mapping from UID to list of GIDs, to work around NFS's limitation
+    of 16 gids.
+  - mappings between local UID/GID and remote UID/GID for sites that
+    do not have uniform uid assignment
+  - mapping from network identity to public key for crypto authentication.
+
+The common code handles such things as:
+
+   - general cache lookup with correct locking
+   - supporting 'NEGATIVE' as well as positive entries
+   - allowing an EXPIRED time on cache items, and removing
+     items after they expire and are no longer in use.
+   - making requests to user-space to fill in cache entries
+   - allowing user-space to directly set entries in the cache
+   - delaying RPC requests that depend on as-yet incomplete
+     cache entries, and replaying those requests when the cache entry
+     is complete.
+   - cleaning out old entries as they expire.
+
+Creating a Cache
+----------------
+
+-  A cache needs a datum to store.  This is in the form of a
+   structure definition that must contain a struct cache_head
+   as an element, usually the first.
+   It will also contain a key and some content.
+   Each cache element is reference counted and contains
+   expiry and update times for use in cache management.
+-  A cache needs a "cache_detail" structure that
+   describes the cache.  This stores the hash table, some
+   parameters for cache management, and some operations detailing how
+   to work with particular cache items.
+
+   The operations are:
+
+    struct cache_head \*alloc(void)
+      This simply allocates appropriate memory and returns
+      a pointer to the cache_head embedded within the
+      structure.
+
+    void cache_put(struct kref \*)
+      This is called when the last reference to an item is
+      dropped.  The pointer passed is to the 'ref' field
+      in the cache_head.  cache_put should release any
+      references created by 'cache_init' and, if CACHE_VALID
+      is set, any references created by cache_update.
+      It should then release the memory allocated by
+      'alloc'.
+
+    int match(struct cache_head \*orig, struct cache_head \*new)
+      Test if the keys in the two structures match.  Return
+      1 if they do, 0 if they don't.
+
+    void init(struct cache_head \*orig, struct cache_head \*new)
+      Set the 'key' fields in 'new' from 'orig'.  This may
+      include taking references to shared objects.
+
+    void update(struct cache_head \*orig, struct cache_head \*new)
+      Set the 'content' fields in 'new' from 'orig'.
+
+    int cache_show(struct seq_file \*m, struct cache_detail \*cd, struct cache_head \*h)
+      Optional.  Used to provide a /proc file that lists the
+      contents of a cache.  This should show one item,
+      usually on just one line.
+
+    int cache_request(struct cache_detail \*cd, struct cache_head \*h, char \*\*bpp, int \*blen)
+      Format a request to be sent to user-space for an item
+      to be instantiated.  \*bpp is a buffer of size \*blen.
+      \*bpp should be moved forward over the encoded message,
+      and \*blen should be reduced to show how much free
+      space remains.  Return 0 on success or <0 if there is
+      not enough room or some other problem occurs.
+
+    int cache_parse(struct cache_detail \*cd, char \*buf, int len)
+      A message from user space has arrived to fill out a
+      cache entry.  It is in 'buf' of length 'len'.
+      cache_parse should parse this, find the item in the
+      cache with sunrpc_cache_lookup_rcu, and update the item
+      with sunrpc_cache_update.
+
+
+-  A cache needs to be registered using cache_register().  This
+   includes it on a list of caches that will be regularly
+   cleaned to discard old data.
+
+Using a cache
+-------------
+
+To find a value in a cache, call sunrpc_cache_lookup_rcu passing a pointer
+to the cache_head in a sample item with the 'key' fields filled in.
+This will be passed to ->match to identify the target entry.  If no
+entry is found, a new entry will be created, added to the cache, and
+marked as not containing valid data.
+
+The item returned is typically passed to cache_check which will check
+if the data is valid, and may initiate an up-call to get fresh data.
+cache_check will return -ENOENT if the entry is negative or if an
+upcall is needed but not possible, -EAGAIN if an upcall is pending,
+or 0 if the data is valid.
+
+cache_check can be passed a "struct cache_req\*".  This structure is
+typically embedded in the actual request and can be used to create a
+deferred copy of the request (struct cache_deferred_req).  This is
+done when the found cache item is not up to date, but there is reason to
+believe that userspace might provide information soon.  When the cache
+item does become valid, the deferred copy of the request will be
+revisited (->revisit).  It is expected that this method will
+reschedule the request for processing.
+
+The value returned by sunrpc_cache_lookup_rcu can also be passed to
+sunrpc_cache_update to set the content for the item.  A second item is
+passed which should hold the content.  If the item found by _lookup
+has valid data, then it is discarded and a new item is created.  This
+saves any user of an item from worrying about content changing while
+it is being inspected.  If the item found by _lookup does not contain
+valid data, then the content is copied across and CACHE_VALID is set.
+
+Populating a cache
+------------------
+
+Each cache has a name, and when the cache is registered, a directory
+with that name is created in /proc/net/rpc.
+
+This directory contains a file called 'channel' which is a channel
+for communicating between kernel and user for populating the cache.
+This directory may later contain other files for interacting
+with the cache.
+
+The 'channel' works a bit like a datagram socket. Each 'write' is
+passed as a whole to the cache for parsing and interpretation.
+Each cache can treat the write requests differently, but it is
+expected that a message written will contain:
+
+  - a key
+  - an expiry time
+  - a content.
+
+with the intention that an item in the cache with the given key
+should be created or updated to have the given content, and the
+expiry time should be set on that item.
+
+Reading from a channel is a bit more interesting.  When a cache
+lookup fails, or when it succeeds but finds an entry that may soon
+expire, a request is lodged for that cache item to be updated by
+user-space.  These requests appear in the channel file.
+
+Successive reads will return successive requests.
+If there are no more requests to return, read will return EOF, but a
+select or poll for read will block waiting for another request to be
+added.
+
+Thus a user-space helper is likely to::
+
+  open the channel.
+    select for readable
+    read a request
+    write a response
+  loop.
+
+If it dies and needs to be restarted, any requests that have not been
+answered will still appear in the file and will be read by the new
+instance of the helper.
+
+Each cache should define a "cache_parse" method which takes a message
+written from user-space and processes it.  It should return an error
+(which propagates back to the write syscall) or 0.
+
+Each cache should also define a "cache_request" method which
+takes a cache item and encodes a request into the buffer
+provided.
+
+.. note::
+  If a cache has no active readers on the channel, and has had no
+  active readers for more than 60 seconds, further requests will not be
+  added to the channel but instead all lookups that do not find a valid
+  entry will fail.  This is partly for backward compatibility: The
+  previous nfs exports table was deemed to be authoritative and a
+  failed lookup meant a definite 'no'.
+
+request/response format
+-----------------------
+
+While each cache is free to use its own format for requests
+and responses over the channel, the following is recommended as
+appropriate and support routines are available to help:
+Each request or response record should be printable ASCII
+with precisely one newline character which should be at the end.
+Fields within the record should be separated by spaces, normally one.
+If spaces, newlines, or nul characters are needed in a field they
+must be quoted.  Two mechanisms are available:
+
+-  If a field begins '\\x' then it must contain an even number of
+   hex digits, and pairs of these digits provide the bytes in the
+   field.
+-  Otherwise a \\ in the field must be followed by 3 octal digits
+   which give the code for a byte.  Other characters are treated
+   as themselves.  At the very least, space, newline, nul, and
+   '\\' must be quoted in this way.
diff --git a/Documentation/filesystems/nfs/rpc-cache.txt b/Documentation/filesystems/nfs/rpc-cache.txt
deleted file mode 100644
index c4dac829db0f..000000000000
--- a/Documentation/filesystems/nfs/rpc-cache.txt
+++ /dev/null
@@ -1,202 +0,0 @@
-	This document gives a brief introduction to the caching
-mechanisms in the sunrpc layer that is used, in particular,
-for NFS authentication.
-
-CACHES
-======
-The caching replaces the old exports table and allows for
-a wide variety of values to be caches.
-
-There are a number of caches that are similar in structure though
-quite possibly very different in content and use.  There is a corpus
-of common code for managing these caches.
-
-Examples of caches that are likely to be needed are:
-  - mapping from IP address to client name
-  - mapping from client name and filesystem to export options
-  - mapping from UID to list of GIDs, to work around NFS's limitation
-    of 16 gids.
-  - mappings between local UID/GID and remote UID/GID for sites that
-    do not have uniform uid assignment
-  - mapping from network identify to public key for crypto authentication.
-
-The common code handles such things as:
-   - general cache lookup with correct locking
-   - supporting 'NEGATIVE' as well as positive entries
-   - allowing an EXPIRED time on cache items, and removing
-     items after they expire, and are no longer in-use.
-   - making requests to user-space to fill in cache entries
-   - allowing user-space to directly set entries in the cache
-   - delaying RPC requests that depend on as-yet incomplete
-     cache entries, and replaying those requests when the cache entry
-     is complete.
-   - clean out old entries as they expire.
-
-Creating a Cache
-----------------
-
-1/ A cache needs a datum to store.  This is in the form of a
-   structure definition that must contain a
-     struct cache_head
-   as an element, usually the first.
-   It will also contain a key and some content.
-   Each cache element is reference counted and contains
-   expiry and update times for use in cache management.
-2/ A cache needs a "cache_detail" structure that
-   describes the cache.  This stores the hash table, some
-   parameters for cache management, and some operations detailing how
-   to work with particular cache items.
-   The operations requires are:
-   	struct cache_head *alloc(void)
-		This simply allocates appropriate memory and returns
-   		a pointer to the cache_detail embedded within the
-		structure
-	void cache_put(struct kref *)
-		This is called when the last reference to an item is
-		dropped.  The pointer passed is to the 'ref' field
-		in the cache_head.  cache_put should release any
-		references create by 'cache_init' and, if CACHE_VALID
-		is set, any references created by cache_update.
-		It should then release the memory allocated by
-   		'alloc'.
-        int match(struct cache_head *orig, struct cache_head *new)
-		test if the keys in the two structures match.  Return
-		1 if they do, 0 if they don't.
-	void init(struct cache_head *orig, struct cache_head *new)
-		Set the 'key' fields in 'new' from 'orig'.  This may
-		include taking references to shared objects.
-	void update(struct cache_head *orig, struct cache_head *new)
-		Set the 'content' fileds in 'new' from 'orig'.
-	int cache_show(struct seq_file *m, struct cache_detail *cd,
-			struct cache_head *h)
-		Optional.  Used to provide a /proc file that lists the
-		contents of a cache.  This should show one item,
-   		usually on just one line.
-	int cache_request(struct cache_detail *cd, struct cache_head *h,
-   		char **bpp, int *blen)
-		Format a request to be send to user-space for an item
-   		to be instantiated.  *bpp is a buffer of size *blen.
-		bpp should be moved forward over the encoded message,
-		and  *blen should be reduced to show how much free
-		space remains.  Return 0 on success or <0 if not
-		enough room or other problem.
-	int cache_parse(struct cache_detail *cd, char *buf, int len)
-		A message from user space has arrived to fill out a
-		cache entry.  It is in 'buf' of length 'len'.
-		cache_parse should parse this, find the item in the
-		cache with sunrpc_cache_lookup_rcu, and update the item
-		with sunrpc_cache_update.
-
-
-3/ A cache needs to be registered using cache_register().  This
-   includes it on a list of caches that will be regularly
-   cleaned to discard old data.
-
-Using a cache
--------------
-
-To find a value in a cache, call sunrpc_cache_lookup_rcu passing a pointer
-to the cache_head in a sample item with the 'key' fields filled in.
-This will be passed to ->match to identify the target entry.  If no
-entry is found, a new entry will be create, added to the cache, and
-marked as not containing valid data.
-
-The item returned is typically passed to cache_check which will check
-if the data is valid, and may initiate an up-call to get fresh data.
-cache_check will return -ENOENT in the entry is negative or if an up
-call is needed but not possible, -EAGAIN if an upcall is pending,
-or 0 if the data is valid;
-
-cache_check can be passed a "struct cache_req *".  This structure is
-typically embedded in the actual request and can be used to create a
-deferred copy of the request (struct cache_deferred_req).  This is
-done when the found cache item is not uptodate, but the is reason to
-believe that userspace might provide information soon.  When the cache
-item does become valid, the deferred copy of the request will be
-revisited (->revisit).  It is expected that this method will
-reschedule the request for processing.
-
-The value returned by sunrpc_cache_lookup_rcu can also be passed to
-sunrpc_cache_update to set the content for the item.  A second item is
-passed which should hold the content.  If the item found by _lookup
-has valid data, then it is discarded and a new item is created.  This
-saves any user of an item from worrying about content changing while
-it is being inspected.  If the item found by _lookup does not contain
-valid data, then the content is copied across and CACHE_VALID is set.
-
-Populating a cache
-------------------
-
-Each cache has a name, and when the cache is registered, a directory
-with that name is created in /proc/net/rpc
-
-This directory contains a file called 'channel' which is a channel
-for communicating between kernel and user for populating the cache.
-This directory may later contain other files of interacting
-with the cache.
-
-The 'channel' works a bit like a datagram socket. Each 'write' is
-passed as a whole to the cache for parsing and interpretation.
-Each cache can treat the write requests differently, but it is
-expected that a message written will contain:
-  - a key
-  - an expiry time
-  - a content.
-with the intention that an item in the cache with the give key
-should be create or updated to have the given content, and the
-expiry time should be set on that item.
-
-Reading from a channel is a bit more interesting.  When a cache
-lookup fails, or when it succeeds but finds an entry that may soon
-expire, a request is lodged for that cache item to be updated by
-user-space.  These requests appear in the channel file.
-
-Successive reads will return successive requests.
-If there are no more requests to return, read will return EOF, but a
-select or poll for read will block waiting for another request to be
-added.
-
-Thus a user-space helper is likely to:
-  open the channel.
-    select for readable
-    read a request
-    write a response
-  loop.
-
-If it dies and needs to be restarted, any requests that have not been
-answered will still appear in the file and will be read by the new
-instance of the helper.
-
-Each cache should define a "cache_parse" method which takes a message
-written from user-space and processes it.  It should return an error
-(which propagates back to the write syscall) or 0.
-
-Each cache should also define a "cache_request" method which
-takes a cache item and encodes a request into the buffer
-provided.
-
-Note: If a cache has no active readers on the channel, and has had not
-active readers for more than 60 seconds, further requests will not be
-added to the channel but instead all lookups that do not find a valid
-entry will fail.  This is partly for backward compatibility: The
-previous nfs exports table was deemed to be authoritative and a
-failed lookup meant a definite 'no'.
-
-request/response format
------------------------
-
-While each cache is free to use its own format for requests
-and responses over channel, the following is recommended as
-appropriate and support routines are available to help:
-Each request or response record should be printable ASCII
-with precisely one newline character which should be at the end.
-Fields within the record should be separated by spaces, normally one.
-If spaces, newlines, or nul characters are needed in a field they
-much be quoted.  two mechanisms are available:
-1/ If a field begins '\x' then it must contain an even number of
-   hex digits, and pairs of these digits provide the bytes in the
-   field.
-2/ otherwise a \ in the field must be followed by 3 octal digits
-   which give the code for a byte.  Other characters are treated
-   as them selves.  At the very least, space, newline, nul, and
-   '\' must be quoted in this way.
-- 
cgit 


From 250baf06aacf4eafb5641c86c91f2b1df4cf7d86 Mon Sep 17 00:00:00 2001
From: "Daniel W. S. Almeida" <dwlsalmeida@gmail.com>
Date: Wed, 29 Jan 2020 01:49:15 -0300
Subject: Documentation: nfs: rpc-server-gss: convert to ReST

Convert rpc-server-gss.txt to ReST. Content remains mostly unchanged.

Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Link: https://lore.kernel.org/r/20200129044917.566906-4-dwlsalmeida@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/nfs/index.rst          |  1 +
 Documentation/filesystems/nfs/rpc-server-gss.rst | 94 ++++++++++++++++++++++++
 Documentation/filesystems/nfs/rpc-server-gss.txt | 91 -----------------------
 3 files changed, 95 insertions(+), 91 deletions(-)
 create mode 100644 Documentation/filesystems/nfs/rpc-server-gss.rst
 delete mode 100644 Documentation/filesystems/nfs/rpc-server-gss.txt

diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
index 52f4956e7770..9d5365cbe2c3 100644
--- a/Documentation/filesystems/nfs/index.rst
+++ b/Documentation/filesystems/nfs/index.rst
@@ -8,3 +8,4 @@ NFS
 
    pnfs
    rpc-cache
+   rpc-server-gss
diff --git a/Documentation/filesystems/nfs/rpc-server-gss.rst b/Documentation/filesystems/nfs/rpc-server-gss.rst
new file mode 100644
index 000000000000..812754576845
--- /dev/null
+++ b/Documentation/filesystems/nfs/rpc-server-gss.rst
@@ -0,0 +1,94 @@
+=========================================
+rpcsec_gss support for kernel RPC servers
+=========================================
+
+This document gives references to the standards and protocols used to
+implement RPCGSS authentication in kernel RPC servers such as the NFS
+server and the NFS client's NFSv4.0 callback server.  (But note that
+NFSv4.1 and higher don't require the client to act as a server for the
+purposes of authentication.)
+
+RPCGSS is specified in a few IETF documents:
+
+ - RFC2203 v1: http://tools.ietf.org/rfc/rfc2203.txt
+ - RFC5403 v2: http://tools.ietf.org/rfc/rfc5403.txt
+
+and there is a 3rd version being proposed:
+
+ - http://tools.ietf.org/id/draft-williams-rpcsecgssv3.txt
+   (At draft n. 02 at the time of writing)
+
+Background
+==========
+
+The RPCGSS Authentication method describes a way to perform GSSAPI
+Authentication for NFS.  Although GSSAPI is itself completely mechanism
+agnostic, in many cases only the KRB5 mechanism is supported by NFS
+implementations.
+
+The Linux kernel, at the moment, supports only the KRB5 mechanism, and
+depends on GSSAPI extensions that are KRB5 specific.
+
+GSSAPI is a complex library, and implementing it completely in kernel is
+unwarranted. However, GSSAPI operations are fundamentally separable into 2
+parts:
+
+- initial context establishment
+- integrity/privacy protection (signing and encrypting of individual
+  packets)
+
+The former is more complex and policy-independent, but less
+performance-sensitive.  The latter is simpler and needs to be very fast.
+
+Therefore, we perform per-packet integrity and privacy protection in the
+kernel, but leave the initial context establishment to userspace.  We
+need upcalls to request userspace to perform context establishment.
+
+NFS Server Legacy Upcall Mechanism
+==================================
+
+The classic upcall mechanism uses a custom text-based upcall mechanism
+to talk to a custom daemon called rpc.svcgssd that is provided by the
+nfs-utils package.
+
+This upcall mechanism has 2 limitations:
+
+A) It can handle tokens that are no bigger than 2KiB
+
+In some Kerberos deployments, GSSAPI tokens can be quite big, up to and
+beyond 64KiB in size, due to various authorization extensions attached to
+the Kerberos tickets that need to be sent through the GSS layer in
+order to perform context establishment.
+
+B) It does not properly handle creds where the user is a member of more
+than a few thousand groups (the current hard limit in the kernel is 65K
+groups) due to a limitation on the size of the buffer that can be sent
+back to the kernel (4KiB).
+
+NFS Server New RPC Upcall Mechanism
+===================================
+
+The newer upcall mechanism uses RPC over a unix socket to a daemon
+called gss-proxy, implemented by a userspace program called Gssproxy.
+
+The gss_proxy RPC protocol is currently documented `here
+<https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation>`_.
+
+This upcall mechanism uses the kernel rpc client and connects to the gssproxy
+userspace program over a regular unix socket. The gssproxy protocol does not
+suffer from the size limitations of the legacy protocol.
+
+Negotiating Upcall Mechanisms
+=============================
+
+To provide backward compatibility, the kernel defaults to using the
+legacy mechanism.  To switch to the new mechanism, gss-proxy must bind
+to /var/run/gssproxy.sock and then write "1" to
+/proc/net/rpc/use-gss-proxy.  If gss-proxy dies, it must repeat both
+steps.
+
+Once the upcall mechanism is chosen, it cannot be changed.  To prevent
+locking into the legacy mechanisms, the above steps must be performed
+before starting nfsd.  Whoever starts nfsd can guarantee this by reading
+from /proc/net/rpc/use-gss-proxy and checking that it contains a
+"1"--the read will block until gss-proxy has done its write to the file.
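A minimal sketch of the pre-start check described in "Negotiating Upcall Mechanisms", assuming a Python helper run by whatever starts nfsd. The control-file path comes from the text above; the function names and the returned messages are hypothetical and are not part of nfs-utils or gss-proxy.

```python
from pathlib import Path

# Path taken from the documentation text; everything else here is illustrative.
USE_GSS_PROXY = Path("/proc/net/rpc/use-gss-proxy")


def gss_proxy_selected(control_file: Path = USE_GSS_PROXY) -> bool:
    """Return True if the control file reports that gss-proxy is in use.

    According to the text, reading the real file blocks until gss-proxy
    has done its write, so this call also acts as a wait.
    """
    return control_file.read_text().strip() == "1"


def start_nfsd_if_ready(control_file: Path = USE_GSS_PROXY) -> str:
    """Decide what to do before starting nfsd (the actual start is omitted)."""
    if gss_proxy_selected(control_file):
        return "start nfsd with gss-proxy upcalls"
    return "refuse to start: the legacy upcall would be locked in"
```

Because the mechanism cannot be changed once chosen, the sketch refuses to continue rather than falling back silently; a real init script might prefer a different policy.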
diff --git a/Documentation/filesystems/nfs/rpc-server-gss.txt b/Documentation/filesystems/nfs/rpc-server-gss.txt
deleted file mode 100644
index 310bbbaf9080..000000000000
--- a/Documentation/filesystems/nfs/rpc-server-gss.txt
+++ /dev/null
@@ -1,91 +0,0 @@
-
-rpcsec_gss support for kernel RPC servers
-=========================================
-
-This document gives references to the standards and protocols used to
-implement RPCGSS authentication in kernel RPC servers such as the NFS
-server and the NFS client's NFSv4.0 callback server.  (But note that
-NFSv4.1 and higher don't require the client to act as a server for the
-purposes of authentication.)
-
-RPCGSS is specified in a few IETF documents:
- - RFC2203 v1: http://tools.ietf.org/rfc/rfc2203.txt
- - RFC5403 v2: http://tools.ietf.org/rfc/rfc5403.txt
-and there is a 3rd version  being proposed:
- - http://tools.ietf.org/id/draft-williams-rpcsecgssv3.txt
-   (At draft n. 02 at the time of writing)
-
-Background
-----------
-
-The RPCGSS Authentication method describes a way to perform GSSAPI
-Authentication for NFS.  Although GSSAPI is itself completely mechanism
-agnostic, in many cases only the KRB5 mechanism is supported by NFS
-implementations.
-
-The Linux kernel, at the moment, supports only the KRB5 mechanism, and
-depends on GSSAPI extensions that are KRB5 specific.
-
-GSSAPI is a complex library, and implementing it completely in kernel is
-unwarranted. However GSSAPI operations are fundementally separable in 2
-parts:
-- initial context establishment
-- integrity/privacy protection (signing and encrypting of individual
-  packets)
-
-The former is more complex and policy-independent, but less
-performance-sensitive.  The latter is simpler and needs to be very fast.
-
-Therefore, we perform per-packet integrity and privacy protection in the
-kernel, but leave the initial context establishment to userspace.  We
-need upcalls to request userspace to perform context establishment.
-
-NFS Server Legacy Upcall Mechanism
-----------------------------------
-
-The classic upcall mechanism uses a custom text based upcall mechanism
-to talk to a custom daemon called rpc.svcgssd that is provide by the
-nfs-utils package.
-
-This upcall mechanism has 2 limitations:
-
-A) It can handle tokens that are no bigger than 2KiB
-
-In some Kerberos deployment GSSAPI tokens can be quite big, up and
-beyond 64KiB in size due to various authorization extensions attacked to
-the Kerberos tickets, that needs to be sent through the GSS layer in
-order to perform context establishment.
-
-B) It does not properly handle creds where the user is member of more
-than a few thousand groups (the current hard limit in the kernel is 65K
-groups) due to limitation on the size of the buffer that can be send
-back to the kernel (4KiB).
-
-NFS Server New RPC Upcall Mechanism
------------------------------------
-
-The newer upcall mechanism uses RPC over a unix socket to a daemon
-called gss-proxy, implemented by a userspace program called Gssproxy.
-
-The gss_proxy RPC protocol is currently documented here:
-
-	https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation
-
-This upcall mechanism uses the kernel rpc client and connects to the gssproxy
-userspace program over a regular unix socket. The gssproxy protocol does not
-suffer from the size limitations of the legacy protocol.
-
-Negotiating Upcall Mechanisms
------------------------------
-
-To provide backward compatibility, the kernel defaults to using the
-legacy mechanism.  To switch to the new mechanism, gss-proxy must bind
-to /var/run/gssproxy.sock and then write "1" to
-/proc/net/rpc/use-gss-proxy.  If gss-proxy dies, it must repeat both
-steps.
-
-Once the upcall mechanism is chosen, it cannot be changed.  To prevent
-locking into the legacy mechanisms, the above steps must be performed
-before starting nfsd.  Whoever starts nfsd can guarantee this by reading
-from /proc/net/rpc/use-gss-proxy and checking that it contains a
-"1"--the read will block until gss-proxy has done its write to the file.
-- 
cgit 


From 04f81fb08d067f79c59fe132929a9c81eb9cb74b Mon Sep 17 00:00:00 2001
From: "Daniel W. S. Almeida" <dwlsalmeida@gmail.com>
Date: Wed, 29 Jan 2020 01:49:16 -0300
Subject: Documentation: nfs: nfs41-server: convert to ReST

Convert nfs41-server.txt to ReST. ASCII tables were converted to ReST grid
table format.

Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Link: https://lore.kernel.org/r/20200129044917.566906-5-dwlsalmeida@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/nfs/index.rst        |   1 +
 Documentation/filesystems/nfs/nfs41-server.rst | 256 +++++++++++++++++++++++++
 Documentation/filesystems/nfs/nfs41-server.txt | 173 -----------------
 3 files changed, 257 insertions(+), 173 deletions(-)
 create mode 100644 Documentation/filesystems/nfs/nfs41-server.rst
 delete mode 100644 Documentation/filesystems/nfs/nfs41-server.txt

diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
index 9d5365cbe2c3..a0a678af921b 100644
--- a/Documentation/filesystems/nfs/index.rst
+++ b/Documentation/filesystems/nfs/index.rst
@@ -9,3 +9,4 @@ NFS
    pnfs
    rpc-cache
    rpc-server-gss
+   nfs41-server
diff --git a/Documentation/filesystems/nfs/nfs41-server.rst b/Documentation/filesystems/nfs/nfs41-server.rst
new file mode 100644
index 000000000000..16b5f02f81c3
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfs41-server.rst
@@ -0,0 +1,256 @@
+=============================
+NFSv4.1 Server Implementation
+=============================
+
+Server support for minorversion 1 can be controlled using the
+/proc/fs/nfsd/versions control file.  The string output returned
+by reading this file will contain either "+4.1" or "-4.1",
+depending on whether minorversion 1 is enabled or disabled.
+
+Currently, server support for minorversion 1 is enabled by default.
+It can be disabled at run time by writing the string "-4.1" to
+the /proc/fs/nfsd/versions control file.  Note that to write this
+control file, the nfsd service must be taken down.  You can use rpc.nfsd
+for this; see rpc.nfsd(8).
+
+(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
+"-4", respectively.  Therefore, code meant to work on both new and old
+kernels must turn 4.1 on or off *before* turning support for version 4
+on or off; rpc.nfsd does this correctly.)
+
+The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
+on RFC 5661.
+
+Of the many new features in NFSv4.1, the current implementation
+focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
+"exactly once" semantics and better control and throttling of the
+resources allocated for each client.
+
+The table below, taken from the NFSv4.1 document, lists
+the operations that are mandatory to implement (REQ), optional
+(OPT), and NFSv4.0 operations that must not be implemented (MNI)
+in minor version 1.  The first column indicates the implementation
+status of each operation in the Linux server.
+
+The OPTIONAL features identified and their abbreviations are as follows:
+
+- **pNFS**	Parallel NFS
+- **FDELG**	File Delegations
+- **DDELG**	Directory Delegations
+
+The following abbreviations indicate the Linux server implementation status.
+
+- **I**	Implemented NFSv4.1 operations.
+- **NS**	Not Supported.
+- **NS\***	Unimplemented optional feature.
+
+Operations
+==========
+
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| Implementation status | Operation            | REQ,REC, OPT or MNI | Feature (REQ, REC or OPT) | Definition     |
++=======================+======================+=====================+===========================+================+
+|                       | ACCESS               | REQ                 |                           | Section 18.1   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | BACKCHANNEL_CTL      | REQ                 |                           | Section 18.33  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | BIND_CONN_TO_SESSION | REQ                 |                           | Section 18.34  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | CLOSE                | REQ                 |                           | Section 18.2   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | COMMIT               | REQ                 |                           | Section 18.3   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | CREATE               | REQ                 |                           | Section 18.4   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | CREATE_SESSION       | REQ                 |                           | Section 18.36  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | DELEGPURGE           | OPT                 | FDELG (REQ)               | Section 18.5   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | DELEGRETURN          | OPT                 | FDELG,                    | Section 18.6   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       |                      |                     | DDELG, pNFS               |                |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       |                      |                     | (REQ)                     |                |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | DESTROY_CLIENTID     | REQ                 |                           | Section 18.50  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | DESTROY_SESSION      | REQ                 |                           | Section 18.37  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | EXCHANGE_ID          | REQ                 |                           | Section 18.35  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | FREE_STATEID         | REQ                 |                           | Section 18.38  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | GETATTR              | REQ                 |                           | Section 18.7   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | GETDEVICEINFO        | OPT                 | pNFS (REQ)                | Section 18.40  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | GETDEVICELIST        | OPT                 | pNFS (OPT)                | Section 18.41  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | GETFH                | REQ                 |                           | Section 18.8   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | GET_DIR_DELEGATION   | OPT                 | DDELG (REQ)               | Section 18.39  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | LAYOUTCOMMIT         | OPT                 | pNFS (REQ)                | Section 18.42  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | LAYOUTGET            | OPT                 | pNFS (REQ)                | Section 18.43  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | LAYOUTRETURN         | OPT                 | pNFS (REQ)                | Section 18.44  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LINK                 | OPT                 |                           | Section 18.9   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOCK                 | REQ                 |                           | Section 18.10  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOCKT                | REQ                 |                           | Section 18.11  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOCKU                | REQ                 |                           | Section 18.12  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOOKUP               | REQ                 |                           | Section 18.13  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOOKUPP              | REQ                 |                           | Section 18.14  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | NVERIFY              | REQ                 |                           | Section 18.15  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | OPEN                 | REQ                 |                           | Section 18.16  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | OPENATTR             | OPT                 |                           | Section 18.17  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | OPEN_CONFIRM         | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | OPEN_DOWNGRADE       | REQ                 |                           | Section 18.18  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | PUTFH                | REQ                 |                           | Section 18.19  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | PUTPUBFH             | REQ                 |                           | Section 18.20  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | PUTROOTFH            | REQ                 |                           | Section 18.21  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | READ                 | REQ                 |                           | Section 18.22  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | READDIR              | REQ                 |                           | Section 18.23  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | READLINK             | OPT                 |                           | Section 18.24  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RECLAIM_COMPLETE     | REQ                 |                           | Section 18.51  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RELEASE_LOCKOWNER    | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | REMOVE               | REQ                 |                           | Section 18.25  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RENAME               | REQ                 |                           | Section 18.26  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RENEW                | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RESTOREFH            | REQ                 |                           | Section 18.27  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SAVEFH               | REQ                 |                           | Section 18.28  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SECINFO              | REQ                 |                           | Section 18.29  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | SECINFO_NO_NAME      | REC                 | pNFS files                | Section 18.45, |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       |                      |                     | layout (REQ)              | Section 13.12  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | SEQUENCE             | REQ                 |                           | Section 18.46  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SETATTR              | REQ                 |                           | Section 18.30  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SETCLIENTID          | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SETCLIENTID_CONFIRM  | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS                    | SET_SSV              | REQ                 |                           | Section 18.47  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | TEST_STATEID         | REQ                 |                           | Section 18.48  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | VERIFY               | REQ                 |                           | Section 18.31  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | WANT_DELEGATION      | OPT                 | FDELG (OPT)               | Section 18.49  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | WRITE                | REQ                 |                           | Section 18.32  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+
+
+Callback Operations
+===================
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| Implementation status | Operation               | REQ,REC, OPT or MNI | Feature (REQ, REC or OPT) | Definition    |
++=======================+=========================+=====================+===========================+===============+
+|                       | CB_GETATTR              | OPT                 | FDELG (REQ)               | Section 20.1  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| I                     | CB_LAYOUTRECALL         | OPT                 | pNFS (REQ)                | Section 20.3  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_NOTIFY               | OPT                 | DDELG (REQ)               | Section 20.4  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_NOTIFY_DEVICEID      | OPT                 | pNFS (OPT)                | Section 20.12 |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_NOTIFY_LOCK          | OPT                 |                           | Section 20.11 |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_PUSH_DELEG           | OPT                 | FDELG (OPT)               | Section 20.5  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       | CB_RECALL               | OPT                 | FDELG,                    | Section 20.2  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_RECALL_ANY           | OPT                 | FDELG,                    | Section 20.6  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS                    | CB_RECALL_SLOT          | REQ                 |                           | Section 20.8  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_RECALLABLE_OBJ_AVAIL | OPT                 | DDELG, pNFS               | Section 20.7  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| I                     | CB_SEQUENCE             | OPT                 | FDELG,                    | Section 20.9  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_WANTS_CANCELLED      | OPT                 | FDELG,                    | Section 20.10 |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+
+
+Implementation notes:
+=====================
+
+SSV:
+  The spec claims this is mandatory, but we don't actually know of any
+  implementations, so we're ignoring it for now.  The server returns
+  NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof.
+
+GSS on the backchannel:
+  Again, theoretically required but not widely implemented (in
+  particular, the current Linux client doesn't request it).  We return
+  NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION.
+
+DELEGPURGE:
+  mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
+  CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
+  persist across client reboots).  Thus we need not implement this for
+  now.
+
+EXCHANGE_ID:
+  implementation ids are ignored
+
+CREATE_SESSION:
+  backchannel attributes are ignored
+
+SEQUENCE:
+  no support for dynamic slot table renegotiation (optional)
+
+Nonstandard compound limitations:
+  No support for a session's fore channel RPC compound that requires both a
+  ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
+  fail to live up to the promise we made in CREATE_SESSION fore channel
+  negotiation.
+
+See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
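The control-file conventions described at the top of nfs41-server.rst (a "+4.1" or "-4.1" token when reading, and toggling 4.1 before 4 when writing) can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and the sample strings in the usage note are an assumed shape of the file's contents, not captured output. rpc.nfsd remains the tool the document recommends for the actual write.

```python
from typing import List, Optional


def minorversion_enabled(versions: str, minor: str = "4.1") -> Optional[bool]:
    """Interpret a string read from /proc/fs/nfsd/versions.

    Returns True for "+4.1", False for "-4.1", and None if the kernel
    did not report that minor version at all.
    """
    tokens = versions.split()
    if "+" + minor in tokens:
        return True
    if "-" + minor in tokens:
        return False
    return None


def _sign(enabled: bool) -> str:
    return "+" if enabled else "-"


def version_writes(enable_v4: bool, enable_v41: bool) -> List[str]:
    """Order the tokens so 4.1 is toggled before 4.

    Older servers read "+4.1"/"-4.1" as "+4"/"-4", so writing the
    version 4 token last lets it take effect on both old and new kernels.
    """
    return [_sign(enable_v41) + "4.1", _sign(enable_v4) + "4"]
```

For example, `minorversion_enabled("+3 +4 -4.1")` reports that minorversion 1 is disabled, and `version_writes(True, False)` yields the tokens in the order `-4.1`, `+4`.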
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
deleted file mode 100644
index 682a59fabe3f..000000000000
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ /dev/null
@@ -1,173 +0,0 @@
-NFSv4.1 Server Implementation
-
-Server support for minorversion 1 can be controlled using the
-/proc/fs/nfsd/versions control file.  The string output returned
-by reading this file will contain either "+4.1" or "-4.1"
-correspondingly.
-
-Currently, server support for minorversion 1 is enabled by default.
-It can be disabled at run time by writing the string "-4.1" to
-the /proc/fs/nfsd/versions control file.  Note that to write this
-control file, the nfsd service must be taken down.  You can use rpc.nfsd
-for this; see rpc.nfsd(8).
-
-(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
-"-4", respectively.  Therefore, code meant to work on both new and old
-kernels must turn 4.1 on or off *before* turning support for version 4
-on or off; rpc.nfsd does this correctly.)
-
-The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
-on RFC 5661.
-
-From the many new features in NFSv4.1 the current implementation
-focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
-"exactly once" semantics and better control and throttling of the
-resources allocated for each client.
-
-The table below, taken from the NFSv4.1 document, lists
-the operations that are mandatory to implement (REQ), optional
-(OPT), and NFSv4.0 operations that are required not to implement (MNI)
-in minor version 1.  The first column indicates the operations that
-are not supported yet by the linux server implementation.
-
-The OPTIONAL features identified and their abbreviations are as follows:
-	pNFS	Parallel NFS
-	FDELG	File Delegations
-	DDELG	Directory Delegations
-
-The following abbreviations indicate the linux server implementation status.
-	I	Implemented NFSv4.1 operations.
-	NS	Not Supported.
-	NS*	Unimplemented optional feature.
-
-Operations
-
-   +----------------------+------------+--------------+----------------+
-   | Operation            | REQ, REC,  | Feature      | Definition     |
-   |                      | OPT, or    | (REQ, REC,   |                |
-   |                      | MNI        | or OPT)      |                |
-   +----------------------+------------+--------------+----------------+
-   | ACCESS               | REQ        |              | Section 18.1   |
-I  | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  |
-I  | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  |
-   | CLOSE                | REQ        |              | Section 18.2   |
-   | COMMIT               | REQ        |              | Section 18.3   |
-   | CREATE               | REQ        |              | Section 18.4   |
-I  | CREATE_SESSION       | REQ        |              | Section 18.36  |
-NS*| DELEGPURGE           | OPT        | FDELG (REQ)  | Section 18.5   |
-   | DELEGRETURN          | OPT        | FDELG,       | Section 18.6   |
-   |                      |            | DDELG, pNFS  |                |
-   |                      |            | (REQ)        |                |
-I  | DESTROY_CLIENTID     | REQ        |              | Section 18.50  |
-I  | DESTROY_SESSION      | REQ        |              | Section 18.37  |
-I  | EXCHANGE_ID          | REQ        |              | Section 18.35  |
-I  | FREE_STATEID         | REQ        |              | Section 18.38  |
-   | GETATTR              | REQ        |              | Section 18.7   |
-I  | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  |
-NS*| GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  |
-   | GETFH                | REQ        |              | Section 18.8   |
-NS*| GET_DIR_DELEGATION   | OPT        | DDELG (REQ)  | Section 18.39  |
-I  | LAYOUTCOMMIT         | OPT        | pNFS (REQ)   | Section 18.42  |
-I  | LAYOUTGET            | OPT        | pNFS (REQ)   | Section 18.43  |
-I  | LAYOUTRETURN         | OPT        | pNFS (REQ)   | Section 18.44  |
-   | LINK                 | OPT        |              | Section 18.9   |
-   | LOCK                 | REQ        |              | Section 18.10  |
-   | LOCKT                | REQ        |              | Section 18.11  |
-   | LOCKU                | REQ        |              | Section 18.12  |
-   | LOOKUP               | REQ        |              | Section 18.13  |
-   | LOOKUPP              | REQ        |              | Section 18.14  |
-   | NVERIFY              | REQ        |              | Section 18.15  |
-   | OPEN                 | REQ        |              | Section 18.16  |
-NS*| OPENATTR             | OPT        |              | Section 18.17  |
-   | OPEN_CONFIRM         | MNI        |              | N/A            |
-   | OPEN_DOWNGRADE       | REQ        |              | Section 18.18  |
-   | PUTFH                | REQ        |              | Section 18.19  |
-   | PUTPUBFH             | REQ        |              | Section 18.20  |
-   | PUTROOTFH            | REQ        |              | Section 18.21  |
-   | READ                 | REQ        |              | Section 18.22  |
-   | READDIR              | REQ        |              | Section 18.23  |
-   | READLINK             | OPT        |              | Section 18.24  |
-   | RECLAIM_COMPLETE     | REQ        |              | Section 18.51  |
-   | RELEASE_LOCKOWNER    | MNI        |              | N/A            |
-   | REMOVE               | REQ        |              | Section 18.25  |
-   | RENAME               | REQ        |              | Section 18.26  |
-   | RENEW                | MNI        |              | N/A            |
-   | RESTOREFH            | REQ        |              | Section 18.27  |
-   | SAVEFH               | REQ        |              | Section 18.28  |
-   | SECINFO              | REQ        |              | Section 18.29  |
-I  | SECINFO_NO_NAME      | REC        | pNFS files   | Section 18.45, |
-   |                      |            | layout (REQ) | Section 13.12  |
-I  | SEQUENCE             | REQ        |              | Section 18.46  |
-   | SETATTR              | REQ        |              | Section 18.30  |
-   | SETCLIENTID          | MNI        |              | N/A            |
-   | SETCLIENTID_CONFIRM  | MNI        |              | N/A            |
-NS | SET_SSV              | REQ        |              | Section 18.47  |
-I  | TEST_STATEID         | REQ        |              | Section 18.48  |
-   | VERIFY               | REQ        |              | Section 18.31  |
-NS*| WANT_DELEGATION      | OPT        | FDELG (OPT)  | Section 18.49  |
-   | WRITE                | REQ        |              | Section 18.32  |
-
-Callback Operations
-
-   +-------------------------+-----------+-------------+---------------+
-   | Operation               | REQ, REC, | Feature     | Definition    |
-   |                         | OPT, or   | (REQ, REC,  |               |
-   |                         | MNI       | or OPT)     |               |
-   +-------------------------+-----------+-------------+---------------+
-   | CB_GETATTR              | OPT       | FDELG (REQ) | Section 20.1  |
-I  | CB_LAYOUTRECALL         | OPT       | pNFS (REQ)  | Section 20.3  |
-NS*| CB_NOTIFY               | OPT       | DDELG (REQ) | Section 20.4  |
-NS*| CB_NOTIFY_DEVICEID      | OPT       | pNFS (OPT)  | Section 20.12 |
-NS*| CB_NOTIFY_LOCK          | OPT       |             | Section 20.11 |
-NS*| CB_PUSH_DELEG           | OPT       | FDELG (OPT) | Section 20.5  |
-   | CB_RECALL               | OPT       | FDELG,      | Section 20.2  |
-   |                         |           | DDELG, pNFS |               |
-   |                         |           | (REQ)       |               |
-NS*| CB_RECALL_ANY           | OPT       | FDELG,      | Section 20.6  |
-   |                         |           | DDELG, pNFS |               |
-   |                         |           | (REQ)       |               |
-NS | CB_RECALL_SLOT          | REQ       |             | Section 20.8  |
-NS*| CB_RECALLABLE_OBJ_AVAIL | OPT       | DDELG, pNFS | Section 20.7  |
-   |                         |           | (REQ)       |               |
-I  | CB_SEQUENCE             | OPT       | FDELG,      | Section 20.9  |
-   |                         |           | DDELG, pNFS |               |
-   |                         |           | (REQ)       |               |
-NS*| CB_WANTS_CANCELLED      | OPT       | FDELG,      | Section 20.10 |
-   |                         |           | DDELG, pNFS |               |
-   |                         |           | (REQ)       |               |
-   +-------------------------+-----------+-------------+---------------+
-
-Implementation notes:
-
-SSV:
-* The spec claims this is mandatory, but we don't actually know of any
-  implementations, so we're ignoring it for now.  The server returns
-  NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof.
-
-GSS on the backchannel:
-* Again, theoretically required but not widely implemented (in
-  particular, the current Linux client doesn't request it).  We return
-  NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION.
-
-DELEGPURGE:
-* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
-  CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
-  persist across client reboots).  Thus we need not implement this for
-  now.
-
-EXCHANGE_ID:
-* implementation ids are ignored
-
-CREATE_SESSION:
-* backchannel attributes are ignored
-
-SEQUENCE:
-* no support for dynamic slot table renegotiation (optional)
-
-Nonstandard compound limitations:
-* No support for a sessions fore channel RPC compound that requires both a
-  ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
-  fail to live up to the promise we made in CREATE_SESSION fore channel
-  negotiation.
-
-See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
-- 
cgit 
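
Note on the text removed above: the /proc/fs/nfsd/versions control file and the ordering warning can be illustrated with a small sketch. It assumes the file holds whitespace-separated tokens, each a "+" or "-" followed by a version number; the removed text only guarantees that "+4.1" or "-4.1" appears, so the rest of the layout is an assumption. The function names are hypothetical and the sketch only builds strings; it does not touch the running server.

```python
def parse_nfsd_versions(text):
    """Map each version token (e.g. "4.1") to True if enabled, False if not.

    Assumes whitespace-separated tokens that each start with '+' or '-'.
    """
    versions = {}
    for token in text.split():
        sign, name = token[0], token[1:]
        if sign not in "+-" or not name:
            raise ValueError(f"unexpected token: {token!r}")
        versions[name] = (sign == "+")
    return versions


def version_writes(enable_v4, enable_v41):
    """Return the tokens to write, minor version 4.1 first.

    Older kernels read "+4.1"/"-4.1" as "+4"/"-4", so the 4.1 token has to
    come before the 4 token for the final state to be right on both.
    """
    def sign(enabled):
        return "+" if enabled else "-"

    return [sign(enable_v41) + "4.1", sign(enable_v4) + "4"]
```

On an old kernel the list returned by version_writes(True, False) first disables version 4 (via the misread "-4.1") and then re-enables it, which is why the order matters.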


From cb63032b1233e03ac20fc2b60820a50d605b9bc0 Mon Sep 17 00:00:00 2001
From: "Daniel W. S. Almeida" <dwlsalmeida@gmail.com>
Date: Wed, 29 Jan 2020 01:49:17 -0300
Subject: Documentation: nfs: knfsd-stats: convert to ReST

Convert knfsd-stats.txt to ReST. Content remains mostly the same.

Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Link: https://lore.kernel.org/r/20200129044917.566906-6-dwlsalmeida@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/nfs/index.rst       |   1 +
 Documentation/filesystems/nfs/knfsd-stats.rst | 122 +++++++++++++++++++++++++
 Documentation/filesystems/nfs/knfsd-stats.txt | 123 --------------------------
 3 files changed, 123 insertions(+), 123 deletions(-)
 create mode 100644 Documentation/filesystems/nfs/knfsd-stats.rst
 delete mode 100644 Documentation/filesystems/nfs/knfsd-stats.txt

diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
index a0a678af921b..65805624e39b 100644
--- a/Documentation/filesystems/nfs/index.rst
+++ b/Documentation/filesystems/nfs/index.rst
@@ -10,3 +10,4 @@ NFS
    rpc-cache
    rpc-server-gss
    nfs41-server
+   knfsd-stats
diff --git a/Documentation/filesystems/nfs/knfsd-stats.rst b/Documentation/filesystems/nfs/knfsd-stats.rst
new file mode 100644
index 000000000000..80bcf13550de
--- /dev/null
+++ b/Documentation/filesystems/nfs/knfsd-stats.rst
@@ -0,0 +1,122 @@
+============================
+Kernel NFS Server Statistics
+============================
+
+:Authors: Greg Banks <gnb@sgi.com> - 26 Mar 2009
+
+This document describes the format and semantics of the statistics
+which the kernel NFS server makes available to userspace.  These
+statistics are available in several text form pseudo files, each of
+which is described separately below.
+
+In most cases you don't need to know these formats, as the nfsstat(8)
+program from the nfs-utils distribution provides a helpful command-line
+interface for extracting and printing them.
+
+All the files described here are formatted as a sequence of text lines,
+separated by newline '\n' characters.  Lines beginning with a hash
+'#' character are comments intended for humans and should be ignored
+by parsing routines.  All other lines contain a sequence of fields
+separated by whitespace.
+
+/proc/fs/nfsd/pool_stats
+========================
+
+This file is available in kernels from 2.6.30 onwards, if the
+/proc/fs/nfsd filesystem is mounted (it almost always should be).
+
+The first line is a comment which describes the fields present in
+all the other lines.  The other lines present the following data as
+a sequence of unsigned decimal numeric fields.  One line is shown
+for each NFS thread pool.
+
+All counters are 64 bits wide and wrap naturally.  There is no way
+to zero these counters; instead, applications should do their own
+rate conversion.
+
+pool
+	The id number of the NFS thread pool to which this line applies.
+	This number does not change.
+
+	Thread pool ids are a contiguous set of small integers starting
+	at zero.  The maximum value depends on the thread pool mode, but
+	currently cannot be larger than the number of CPUs in the system.
+	Note that in the default case there will be a single thread pool
+	which contains all the nfsd threads and all the CPUs in the system,
+	and thus this file will have a single line with a pool id of "0".
+
+packets-arrived
+	Counts how many NFS packets have arrived.  More precisely, this
+	is the number of times that the network stack has notified the
+	sunrpc server layer that new data may be available on a transport
+	(e.g. a TCP or UDP socket or an NFS/RDMA endpoint).
+
+	Depending on the NFS workload patterns and various network stack
+	effects (such as Large Receive Offload) which can combine packets
+	on the wire, this may be either more or less than the number
+	of NFS calls received (which statistic is available elsewhere).
+	However this is a more accurate and less workload-dependent measure
+	of how much CPU load is being placed on the sunrpc server layer
+	due to NFS network traffic.
+
+sockets-enqueued
+	Counts how many times an NFS transport is enqueued to wait for
+	an nfsd thread to service it, i.e. no nfsd thread was considered
+	available.
+
+	The circumstance this statistic tracks indicates that there was NFS
+	network-facing work to be done but it couldn't be done immediately,
+	thus introducing a small delay in servicing NFS calls.  The ideal
+	rate of change for this counter is zero; significantly non-zero
+	values may indicate a performance limitation.
+
+	This can happen because there are too few nfsd threads in the thread
+	pool for the NFS workload (the workload is thread-limited), in which
+	case configuring more nfsd threads will probably improve the
+	performance of the NFS workload.
+
+threads-woken
+	Counts how many times an idle nfsd thread is woken to try to
+	receive some data from an NFS transport.
+
+	This statistic tracks the circumstance where incoming
+	network-facing NFS work is being handled quickly, which is a good
+	thing.  The ideal rate of change for this counter will be close
+	to but less than the rate of change of the packets-arrived counter.
+
+threads-timedout
+	Counts how many times an nfsd thread triggered an idle timeout,
+	i.e. was not woken to handle any incoming network packets for
+	some time.
+
+	This statistic counts a circumstance where there are more nfsd
+	threads configured than can be used by the NFS workload.  This is
+	a clue that the number of nfsd threads can be reduced without
+	affecting performance.  Unfortunately, it's only a clue and not
+	a strong indication, for a couple of reasons:
+
+	 - Currently the rate at which the counter is incremented is quite
+	   slow; the idle timeout is 60 minutes.  Unless the NFS workload
+	   remains constant for hours at a time, this counter is unlikely
+	   to be providing information that is still useful.
+
+	 - It is usually a wise policy to provide some slack,
+	   i.e. configure a few more nfsds than are currently needed,
+	   to allow for future spikes in load.
+
+
+Note that incoming packets on NFS transports will be dealt with in
+one of three ways.  An nfsd thread can be woken (threads-woken counts
+this case), or the transport can be enqueued for later attention
+(sockets-enqueued counts this case), or the packet can be temporarily
+deferred because the transport is currently being used by an nfsd
+thread.  This last case is not very interesting and is not explicitly
+counted, but can be inferred from the other counters thus::
+
+	packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
+
+
+More
+====
+
+Descriptions of the other statistics files should go here.
diff --git a/Documentation/filesystems/nfs/knfsd-stats.txt b/Documentation/filesystems/nfs/knfsd-stats.txt
deleted file mode 100644
index 1a5d82180b84..000000000000
--- a/Documentation/filesystems/nfs/knfsd-stats.txt
+++ /dev/null
@@ -1,123 +0,0 @@
-
-Kernel NFS Server Statistics
-============================
-
-This document describes the format and semantics of the statistics
-which the kernel NFS server makes available to userspace.  These
-statistics are available in several text form pseudo files, each of
-which is described separately below.
-
-In most cases you don't need to know these formats, as the nfsstat(8)
-program from the nfs-utils distribution provides a helpful command-line
-interface for extracting and printing them.
-
-All the files described here are formatted as a sequence of text lines,
-separated by newline '\n' characters.  Lines beginning with a hash
-'#' character are comments intended for humans and should be ignored
-by parsing routines.  All other lines contain a sequence of fields
-separated by whitespace.
-
-/proc/fs/nfsd/pool_stats
-------------------------
-
-This file is available in kernels from 2.6.30 onwards, if the
-/proc/fs/nfsd filesystem is mounted (it almost always should be).
-
-The first line is a comment which describes the fields present in
-all the other lines.  The other lines present the following data as
-a sequence of unsigned decimal numeric fields.  One line is shown
-for each NFS thread pool.
-
-All counters are 64 bits wide and wrap naturally.  There is no way
-to zero these counters, instead applications should do their own
-rate conversion.
-
-pool
-	The id number of the NFS thread pool to which this line applies.
-	This number does not change.
-
-	Thread pool ids are a contiguous set of small integers starting
-	at zero.  The maximum value depends on the thread pool mode, but
-	currently cannot be larger than the number of CPUs in the system.
-	Note that in the default case there will be a single thread pool
-	which contains all the nfsd threads and all the CPUs in the system,
-	and thus this file will have a single line with a pool id of "0".
-
-packets-arrived
-	Counts how many NFS packets have arrived.  More precisely, this
-	is the number of times that the network stack has notified the
-	sunrpc server layer that new data may be available on a transport
-	(e.g. an NFS or UDP socket or an NFS/RDMA endpoint).
-
-	Depending on the NFS workload patterns and various network stack
-	effects (such as Large Receive Offload) which can combine packets
-	on the wire, this may be either more or less than the number
-	of NFS calls received (which statistic is available elsewhere).
-	However this is a more accurate and less workload-dependent measure
-	of how much CPU load is being placed on the sunrpc server layer
-	due to NFS network traffic.
-
-sockets-enqueued
-	Counts how many times an NFS transport is enqueued to wait for
-	an nfsd thread to service it, i.e. no nfsd thread was considered
-	available.
-
-	The circumstance this statistic tracks indicates that there was NFS
-	network-facing work to be done but it couldn't be done immediately,
-	thus introducing a small delay in servicing NFS calls.  The ideal
-	rate of change for this counter is zero; significantly non-zero
-	values may indicate a performance limitation.
-
-	This can happen because there are too few nfsd threads in the thread
-	pool for the NFS workload (the workload is thread-limited), in which
-	case configuring more nfsd threads will probably improve the
-	performance of the NFS workload.
-
-threads-woken
-	Counts how many times an idle nfsd thread is woken to try to
-	receive some data from an NFS transport.
-
-	This statistic tracks the circumstance where incoming
-	network-facing NFS work is being handled quickly, which is a good
-	thing.  The ideal rate of change for this counter will be close
-	to but less than the rate of change of the packets-arrived counter.
-
-threads-timedout
-	Counts how many times an nfsd thread triggered an idle timeout,
-	i.e. was not woken to handle any incoming network packets for
-	some time.
-
-	This statistic counts a circumstance where there are more nfsd
-	threads configured than can be used by the NFS workload.  This is
-	a clue that the number of nfsd threads can be reduced without
-	affecting performance.  Unfortunately, it's only a clue and not
-	a strong indication, for a couple of reasons:
-
-	 - Currently the rate at which the counter is incremented is quite
-	   slow; the idle timeout is 60 minutes.  Unless the NFS workload
-	   remains constant for hours at a time, this counter is unlikely
-	   to be providing information that is still useful.
-
-	 - It is usually a wise policy to provide some slack,
-	   i.e. configure a few more nfsds than are currently needed,
-	   to allow for future spikes in load.
-
-
-Note that incoming packets on NFS transports will be dealt with in
-one of three ways.  An nfsd thread can be woken (threads-woken counts
-this case), or the transport can be enqueued for later attention
-(sockets-enqueued counts this case), or the packet can be temporarily
-deferred because the transport is currently being used by an nfsd
-thread.  This last case is not very interesting and is not explicitly
-counted, but can be inferred from the other counters thus:
-
-packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
-
-
-More
-----
-Descriptions of the other statistics file should go here.
-
-
-Greg Banks <gnb@sgi.com>
-26 Mar 2009
-- 
cgit 
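
Note on the pool_stats format documented above: a minimal parsing sketch, assuming the numeric fields appear in the order in which the document lists them (pool, packets-arrived, sockets-enqueued, threads-woken, threads-timedout). The sample line in the usage note is invented for illustration. Because the counters are 64 bits wide, wrap naturally and cannot be zeroed, rates are taken from differences between two samples, modulo 2**64.

```python
COUNTER_MODULUS = 1 << 64  # counters are 64 bits wide and wrap naturally

# Assumed field order, following the order of the descriptions above.
FIELDS = (
    "pool",
    "packets_arrived",
    "sockets_enqueued",
    "threads_woken",
    "threads_timedout",
)


def parse_pool_stats(text):
    """Return one dict per thread pool, skipping '#' comment lines."""
    pools = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = [int(field) for field in line.split()]
        pools.append(dict(zip(FIELDS, values)))
    return pools


def counter_delta(old, new):
    """Difference between two samples of one counter, allowing for wrap."""
    return (new - old) % COUNTER_MODULUS


def packets_deferred(sample):
    """packets-arrived - (sockets-enqueued + threads-woken), modulo 2**64."""
    handled = sample["sockets_enqueued"] + sample["threads_woken"]
    return (sample["packets_arrived"] - handled) % COUNTER_MODULUS
```

For a line such as "0 1000 50 900 2", packets_deferred() gives 50. Dividing counter_delta() by the sampling interval gives the rate of change that the descriptions above refer to.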


From 56e6b3b0b381abd0484802828764d01552ff76ab Mon Sep 17 00:00:00 2001
From: Yue Hu <zbestahu@163.com>
Date: Thu, 6 Feb 2020 19:10:31 +0800
Subject: Documentation: zram: fix the description about orig_data_size of
 mm_stat

orig_data_size has counted same_pages since commit 51f9f82c855d ("zram:
count same page write as page_stored"), so drop the stale note saying
that it excludes them.

Signed-off-by: Yue Hu <zbestahu@163.com>
Link: https://lore.kernel.org/r/20200206111031.9524-1-zbestahu@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/blockdev/zram.rst | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 27c77d853028..a6fd1f9b5faf 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -251,8 +251,6 @@ line of text and contains the following stats separated by whitespace:
 
  ================ =============================================================
  orig_data_size   uncompressed size of data stored in this disk.
-		  This excludes same-element-filled pages (same_pages) since
-		  no memory is allocated for them.
                   Unit: bytes
  compr_data_size  compressed size of data stored in this disk
  mem_used_total   the amount of memory allocated for this disk. This
-- 
cgit 


From 895f2c20a88a343d12c387dab9d785ff665cb4ac Mon Sep 17 00:00:00 2001
From: "d.hatayama@fujitsu.com" <d.hatayama@fujitsu.com>
Date: Thu, 13 Feb 2020 02:51:49 +0000
Subject: docs: admin-guide: Add description of %c corename format

There is no description of the %c corename format specifier for
/proc/sys/kernel/core_pattern. The %c specifier is used by user-space
applications such as systemd-coredump, so it should be documented.

To find where %c is handled in the kernel source code, look at
function format_corename() in fs/coredump.c.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
Link: https://lore.kernel.org/r/TYAPR01MB4014714BB2ACE425BB6EC6B7951A0@TYAPR01MB4014.jpnprd01.prod.outlook.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index def074807cee..b08ba4e63291 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -213,6 +213,7 @@ core_pattern is used to specify a core dumpfile pattern name.
 	%h	hostname
 	%e	executable filename (may be shortened)
 	%E	executable path
+	%c	maximum size of core file by resource limit RLIMIT_CORE
 	%<OTHER> both are dropped
 
 * If the first character of the pattern is a '|', the kernel will treat
-- 
cgit 
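
Note on %c: a hedged sketch of how a user-space core handler might consume the value. It assumes the handler receives the %c expansion as a decimal string argument (for instance through a piped core_pattern) and that the value is the soft RLIMIT_CORE limit in bytes. Both function names are hypothetical.

```python
import resource


def core_limit_from_arg(arg):
    """Convert the decimal string substituted for %c into a byte count."""
    value = int(arg)
    if value < 0:
        raise ValueError(f"core size limit cannot be negative: {arg!r}")
    return value


def current_soft_core_limit():
    """Soft RLIMIT_CORE of the calling process, for comparison in tests."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft
```

A handler could compare core_limit_from_arg() against the amount of data it is about to store and truncate or skip the dump accordingly; that policy is left to the handler.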


From 3b82a112ce594889742164b242d8f213938a443f Mon Sep 17 00:00:00 2001
From: Wang Long <w@laoqinren.net>
Date: Fri, 7 Feb 2020 21:42:10 +0800
Subject: Documentation/ABI: move sysfs-kernel-uids to removed directory

Commit 7c9414385ebf ("sched: Remove USER_SCHED") removed the
USER_SCHED feature, so move the ABI doc to the removed directory.

Signed-off-by: Wang Long <w@laoqinren.net>
Link: https://lore.kernel.org/r/1581082930-30441-1-git-send-email-w@laoqinren.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/ABI/removed/sysfs-kernel-uids | 14 ++++++++++++++
 Documentation/ABI/testing/sysfs-kernel-uids | 14 --------------
 2 files changed, 14 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/ABI/removed/sysfs-kernel-uids
 delete mode 100644 Documentation/ABI/testing/sysfs-kernel-uids

diff --git a/Documentation/ABI/removed/sysfs-kernel-uids b/Documentation/ABI/removed/sysfs-kernel-uids
new file mode 100644
index 000000000000..dc4463f190a7
--- /dev/null
+++ b/Documentation/ABI/removed/sysfs-kernel-uids
@@ -0,0 +1,14 @@
+What:		/sys/kernel/uids/<uid>/cpu_shares
+Date:		December 2007, finally removed in kernel v2.6.34-rc1
+Contact:	Dhaval Giani <dhaval@linux.vnet.ibm.com>
+		Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
+Description:
+		The /sys/kernel/uids/<uid>/cpu_shares tunable is used
+		to set the cpu bandwidth a user is allowed. This is a
+		proportional value. What that means is that if there
+		are two users logged in, each with an equal number of
+		shares, then they will get equal CPU bandwidth. Another
+		example would be, if User A has shares = 1024 and user
+		B has shares = 2048, User B will get twice the CPU
+		bandwidth user A will. For more details refer to
+		Documentation/scheduler/sched-design-CFS.rst
diff --git a/Documentation/ABI/testing/sysfs-kernel-uids b/Documentation/ABI/testing/sysfs-kernel-uids
deleted file mode 100644
index 4182b7061816..000000000000
--- a/Documentation/ABI/testing/sysfs-kernel-uids
+++ /dev/null
@@ -1,14 +0,0 @@
-What:		/sys/kernel/uids/<uid>/cpu_shares
-Date:		December 2007
-Contact:	Dhaval Giani <dhaval@linux.vnet.ibm.com>
-		Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
-Description:
-		The /sys/kernel/uids/<uid>/cpu_shares tunable is used
-		to set the cpu bandwidth a user is allowed. This is a
-		propotional value. What that means is that if there
-		are two users logged in, each with an equal number of
-		shares, then they will get equal CPU bandwidth. Another
-		example would be, if User A has shares = 1024 and user
-		B has shares = 2048, User B will get twice the CPU
-		bandwidth user A will. For more details refer
-		Documentation/scheduler/sched-design-CFS.rst
-- 
cgit 
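
Note on the cpu_shares description above: the proportional rule can be worked through with a few lines of arithmetic. This is only an illustration of the ratio described in the text, assuming every listed user is competing for the CPU at the same time; the function name is hypothetical and nothing here reads the removed sysfs files.

```python
from fractions import Fraction


def cpu_fractions(shares):
    """Return each user's fraction of CPU bandwidth, given per-user shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one user needs a positive share value")
    return {user: Fraction(value, total) for user, value in shares.items()}
```

With shares of 1024 for user A and 2048 for user B, the result is 1/3 for A and 2/3 for B, so B gets twice the bandwidth of A, matching the example in the description.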


From 473da2f0d80aa7240dd0a2be5015fdfd93543ca2 Mon Sep 17 00:00:00 2001
From: Alexandre Belloni <alexandre.belloni@bootlin.com>
Date: Sun, 9 Feb 2020 21:33:04 +0100
Subject: docs: userspace: ioctl-number: remove mc146818rtc conflict

In 2.3.43pre2, the RTC ioctl definitions were moved from
linux/mc146818rtc.h to linux/rtc.h.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20200209203304.66004-1-alexandre.belloni@bootlin.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/userspace-api/ioctl/ioctl-number.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 2e91370dc159..f759edafd938 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -266,7 +266,6 @@ Code  Seq#    Include File                                           Comments
 'o'   01-A1  `linux/dvb/*.h`                                         DVB
 'p'   00-0F  linux/phantom.h                                         conflict! (OpenHaptics needs this)
 'p'   00-1F  linux/rtc.h                                             conflict!
-'p'   00-3F  linux/mc146818rtc.h                                     conflict!
 'p'   40-7F  linux/nvram.h
 'p'   80-9F  linux/ppdev.h                                           user-space parport
                                                                      <mailto:tim@cyberelk.net>
-- 
cgit 


From 2e5b1886e9bab6c29c5e5c3ce4e373bb9e9eaa8b Mon Sep 17 00:00:00 2001
From: Randy Dunlap <rdunlap@infradead.org>
Date: Sun, 9 Feb 2020 19:53:17 -0800
Subject: Documentation: bootconfig: fix Sphinx block warning

Fix Sphinx format warning:

lnx-56-rc1/Documentation/admin-guide/bootconfig.rst:26: WARNING: Literal block expected; none found.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/07b3e31f-9b1e-1876-aa60-4436e4dd6da0@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/bootconfig.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
index b342a6796392..e603ebb5bdda 100644
--- a/Documentation/admin-guide/bootconfig.rst
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -23,7 +23,7 @@ of dot-connected-words, and key and value are connected by ``=``. The value
 has to be terminated by semi-colon (``;``) or newline (``\n``).
 For array value, array entries are separated by comma (``,``). ::
 
-KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
+  KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
 
 Unlike the kernel command line syntax, spaces are OK around the comma and ``=``.
 
-- 
cgit 
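
Note on the syntax line fixed above: the form KEY[.WORD[...]] = VALUE[, VALUE2[...]][;] can be illustrated with a deliberately small sketch. It handles a single entry on one line and assumes no quoting, comments or brace blocks, all of which a real bootconfig parser has to deal with; the function name is hypothetical.

```python
def parse_bootconfig_entry(line):
    """Split one 'KEY.WORD = VALUE, VALUE2;' entry into (words, values)."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in entry: {line!r}")
    # Key words are connected by dots; spaces around '=' are allowed.
    words = [word.strip() for word in key.strip().split(".")]
    value = value.strip()
    # The value may be terminated by a semicolon.
    if value.endswith(";"):
        value = value[:-1]
    # Array entries are separated by commas; spaces around them are allowed.
    values = [item.strip() for item in value.split(",")]
    return words, values
```

For example, "foo.bar = 1, 2;" yields the key words foo and bar and the array values 1 and 2.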


From 874ddbce487f077c46957e44e4115b3d82f62c92 Mon Sep 17 00:00:00 2001
From: Alexandre Ghiti <alex@ghiti.fr>
Date: Wed, 19 Feb 2020 01:59:53 -0500
Subject: documentation: vm: Advertise support for pte_special in riscv

The RISC-V architecture has supported pte_special since it was merged
upstream, so add this information to the documentation.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/features/vm/pte_special/arch-support.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/features/vm/pte_special/arch-support.txt b/Documentation/features/vm/pte_special/arch-support.txt
index 2dc5df6a1cf5..3d492a34c8ee 100644
--- a/Documentation/features/vm/pte_special/arch-support.txt
+++ b/Documentation/features/vm/pte_special/arch-support.txt
@@ -23,7 +23,7 @@
     |    openrisc: | TODO |
     |      parisc: | TODO |
     |     powerpc: |  ok  |
-    |       riscv: | TODO |
+    |       riscv: |  ok  |
     |        s390: |  ok  |
     |          sh: |  ok  |
     |       sparc: |  ok  |
-- 
cgit 


From 2d5dfb5911cb0eed0a9a91ea404ad963f18e5aaf Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Tue, 18 Feb 2020 17:38:25 +0100
Subject: docs: arm: tcm: Fix a few typos
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/arm/tcm.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/arm/tcm.rst b/Documentation/arm/tcm.rst
index effd9c7bc968..b256f9783883 100644
--- a/Documentation/arm/tcm.rst
+++ b/Documentation/arm/tcm.rst
@@ -4,18 +4,18 @@ ARM TCM (Tightly-Coupled Memory) handling in Linux
 
 Written by Linus Walleij <linus.walleij@stericsson.com>
 
-Some ARM SoC:s have a so-called TCM (Tightly-Coupled Memory).
+Some ARM SoCs have a so-called TCM (Tightly-Coupled Memory).
 This is usually just a few (4-64) KiB of RAM inside the ARM
 processor.
 
-Due to being embedded inside the CPU The TCM has a
+Due to being embedded inside the CPU, the TCM has a
 Harvard-architecture, so there is an ITCM (instruction TCM)
 and a DTCM (data TCM). The DTCM can not contain any
 instructions, but the ITCM can actually contain data.
 The size of DTCM or ITCM is minimum 4KiB so the typical
 minimum configuration is 4KiB ITCM and 4KiB DTCM.
 
-ARM CPU:s have special registers to read out status, physical
+ARM CPUs have special registers to read out status, physical
 location and size of TCM memories. arch/arm/include/asm/cputype.h
 defines a CPUID_TCM register that you can read out from the
 system control coprocessor. Documentation from ARM can be found
-- 
cgit 


From fb2511247dc4061fd122d0195838278a4a0b7b59 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Tue, 18 Feb 2020 16:02:19 +0100
Subject: docs: Fix path to MTD command line partition parser
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

cmdlinepart.c has been moved to drivers/mtd/parsers/.

Fixes: a3f12a35c91d ("mtd: parsers: Move CMDLINE parser")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index dbc22d684627..47cd55e339a5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2791,7 +2791,7 @@
 			<name>,<region-number>[,<base>,<size>,<buswidth>,<altbuswidth>]
 
 	mtdparts=	[MTD]
-			See drivers/mtd/cmdlinepart.c.
+			See drivers/mtd/parsers/cmdlinepart.c
 
 	multitce=off	[PPC]  This parameter disables the use of the pSeries
 			firmware feature for updating multiple TCE entries
-- 
cgit 


From a3cb66a508528e9082cba8303b4f31767e7743a2 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Tue, 18 Feb 2020 13:59:16 +0100
Subject: docs: pretty up sysctl/kernel.rst

This updates sysctl/kernel.rst to use ReStructured Text more fully:
* the list of files is now the table of contents (old entries with no
  corresponding sections are added as empty sections for now);
* code references and commands are formatted as code, except for
  function names which end up linked to the appropriate documentation;
* links are used to point to other documentation and other sections;
* tables are used to make lists of values more readable (as already
  done for some sections);
* in heavily-reworked paragraphs, sentences are wrapped individually,
  to make future diffs easier to read.

The first mention of the kernel version is dropped. The second
mention, saying that the document is accurate for 2.2, is preserved
for now; I will update that once the document really is accurate for a
current kernel release.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 987 ++++++++++++++--------------
 1 file changed, 492 insertions(+), 495 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index b08ba4e63291..4872610cc491 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -2,263 +2,187 @@
 Documentation for /proc/sys/kernel/
 ===================================
 
-kernel version 2.2.10
-
 Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
 
 Copyright (c) 2009,        Shen Feng<shen@cn.fujitsu.com>
 
-For general info and legal blurb, please look in index.rst.
+For general info and legal blurb, please look in :doc:`index`.
 
 ------------------------------------------------------------------------------
 
 This file contains documentation for the sysctl files in
-/proc/sys/kernel/ and is valid for Linux kernel version 2.2.
+``/proc/sys/kernel/`` and is valid for Linux kernel version 2.2.
 
 The files in this directory can be used to tune and monitor
 miscellaneous and general things in the operation of the Linux
-kernel. Since some of the files _can_ be used to screw up your
+kernel. Since some of the files *can* be used to screw up your
 system, it is advisable to read both documentation and source
 before actually making adjustments.
 
 Currently, these files might (depending on your configuration)
-show up in /proc/sys/kernel:
-
-- acct
-- acpi_video_flags
-- auto_msgmni
-- bootloader_type	     [ X86 only ]
-- bootloader_version	     [ X86 only ]
-- cap_last_cap
-- core_pattern
-- core_pipe_limit
-- core_uses_pid
-- ctrl-alt-del
-- dmesg_restrict
-- domainname
-- hostname
-- hotplug
-- hardlockup_all_cpu_backtrace
-- hardlockup_panic
-- hung_task_panic
-- hung_task_check_count
-- hung_task_timeout_secs
-- hung_task_check_interval_secs
-- hung_task_warnings
-- hyperv_record_panic_msg
-- kexec_load_disabled
-- kptr_restrict
-- l2cr                        [ PPC only ]
-- modprobe                    ==> Documentation/debugging-modules.txt
-- modules_disabled
-- msg_next_id		      [ sysv ipc ]
-- msgmax
-- msgmnb
-- msgmni
-- nmi_watchdog
-- osrelease
-- ostype
-- overflowgid
-- overflowuid
-- panic
-- panic_on_oops
-- panic_on_stackoverflow
-- panic_on_unrecovered_nmi
-- panic_on_warn
-- panic_print
-- panic_on_rcu_stall
-- perf_cpu_time_max_percent
-- perf_event_paranoid
-- perf_event_max_stack
-- perf_event_mlock_kb
-- perf_event_max_contexts_per_stack
-- pid_max
-- powersave-nap               [ PPC only ]
-- printk
-- printk_delay
-- printk_ratelimit
-- printk_ratelimit_burst
-- pty                         ==> Documentation/filesystems/devpts.txt
-- randomize_va_space
-- real-root-dev               ==> Documentation/admin-guide/initrd.rst
-- reboot-cmd                  [ SPARC only ]
-- rtsig-max
-- rtsig-nr
-- sched_energy_aware
-- seccomp/                    ==> Documentation/userspace-api/seccomp_filter.rst
-- sem
-- sem_next_id		      [ sysv ipc ]
-- sg-big-buff                 [ generic SCSI device (sg) ]
-- shm_next_id		      [ sysv ipc ]
-- shm_rmid_forced
-- shmall
-- shmmax                      [ sysv ipc ]
-- shmmni
-- softlockup_all_cpu_backtrace
-- soft_watchdog
-- stack_erasing
-- stop-a                      [ SPARC only ]
-- sysrq                       ==> Documentation/admin-guide/sysrq.rst
-- sysctl_writes_strict
-- tainted                     ==> Documentation/admin-guide/tainted-kernels.rst
-- threads-max
-- unknown_nmi_panic
-- watchdog
-- watchdog_thresh
-- version
-
-
-acct:
-=====
+show up in ``/proc/sys/kernel``:
+
+.. contents:: :local:
+
+
+acct
+====
+
+::
 
-highwater lowwater frequency
+    highwater lowwater frequency
 
 If BSD-style process accounting is enabled these values control
 its behaviour. If free space on filesystem where the log lives
-goes below <lowwater>% accounting suspends. If free space gets
-above <highwater>% accounting resumes. <Frequency> determines
+goes below ``lowwater``\ % accounting suspends. If free space gets
+above ``highwater``\ % accounting resumes. ``frequency`` determines
 how often do we check the amount of free space (value is in
 seconds). Default:
-4 2 30
-That is, suspend accounting if there left <= 2% free; resume it
-if we got >=4%; consider information about amount of free space
-valid for 30 seconds.
 
+::
 
-acpi_video_flags:
-=================
+    4 2 30
 
-flags
+That is, suspend accounting if free space drops below 2%; resume it
+if it increases to at least 4%; consider information about amount of
+free space valid for 30 seconds.
 
-See Doc*/kernel/power/video.txt, it allows mode of video boot to be
-set during run time.
 
+acpi_video_flags
+================
+
+See Documentation/kernel/power/video.txt, which allows the video boot
+mode to be set at run time.
 
-auto_msgmni:
-============
+
+auto_msgmni
+===========
 
 This variable has no effect and may be removed in future kernel
 releases. Reading it always returns 0.
-Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni
-upon memory add/remove or upon ipc namespace creation/removal.
+Up to Linux 3.17, it enabled/disabled automatic recomputing of
+`msgmni`_
+upon memory add/remove or upon IPC namespace creation/removal.
 Echoing "1" into this file enabled msgmni automatic recomputing.
-Echoing "0" turned it off. auto_msgmni default value was 1.
-
+Echoing "0" turned it off. The default value was 1.
 
-bootloader_type:
-================
 
-x86 bootloader identification
+bootloader_type (x86 only)
+==========================
 
 This gives the bootloader type number as indicated by the bootloader,
 shifted left by 4, and OR'd with the low four bits of the bootloader
 version.  The reason for this encoding is that this used to match the
-type_of_loader field in the kernel header; the encoding is kept for
+``type_of_loader`` field in the kernel header; the encoding is kept for
 backwards compatibility.  That is, if the full bootloader type number
 is 0x15 and the full version number is 0x234, this file will contain
 the value 340 = 0x154.
 
-See the type_of_loader and ext_loader_type fields in
-Documentation/x86/boot.rst for additional information.
-
+See the ``type_of_loader`` and ``ext_loader_type`` fields in
+:doc:`/x86/boot` for additional information.
 
-bootloader_version:
-===================
 
-x86 bootloader version
+bootloader_version (x86 only)
+=============================
 
 The complete bootloader version number.  In the example above, this
 file will contain the value 564 = 0x234.
 
-See the type_of_loader and ext_loader_ver fields in
-Documentation/x86/boot.rst for additional information.
+See the ``type_of_loader`` and ``ext_loader_ver`` fields in
+:doc:`/x86/boot` for additional information.
 
 
-cap_last_cap:
-=============
+cap_last_cap
+============
 
 Highest valid capability of the running kernel.  Exports
-CAP_LAST_CAP from the kernel.
+``CAP_LAST_CAP`` from the kernel.
 
 
-core_pattern:
-=============
+core_pattern
+============
 
-core_pattern is used to specify a core dumpfile pattern name.
+``core_pattern`` is used to specify a core dumpfile pattern name.
 
 * max length 127 characters; default value is "core"
-* core_pattern is used as a pattern template for the output filename;
-  certain string patterns (beginning with '%') are substituted with
-  their actual values.
-* backward compatibility with core_uses_pid:
+* ``core_pattern`` is used as a pattern template for the output
+  filename; certain string patterns (beginning with '%') are
+  substituted with their actual values.
+* backward compatibility with ``core_uses_pid``:
 
-	If core_pattern does not include "%p" (default does not)
-	and core_uses_pid is set, then .PID will be appended to
+	If ``core_pattern`` does not include "%p" (default does not)
+	and ``core_uses_pid`` is set, then .PID will be appended to
 	the filename.
 
-* corename format specifiers::
-
-	%<NUL>	'%' is dropped
-	%%	output one '%'
-	%p	pid
-	%P	global pid (init PID namespace)
-	%i	tid
-	%I	global tid (init PID namespace)
-	%u	uid (in initial user namespace)
-	%g	gid (in initial user namespace)
-	%d	dump mode, matches PR_SET_DUMPABLE and
-		/proc/sys/fs/suid_dumpable
-	%s	signal number
-	%t	UNIX time of dump
-	%h	hostname
-	%e	executable filename (may be shortened)
-	%E	executable path
-       %c      maximum size of core file by resource limit RLIMIT_CORE
-	%<OTHER> both are dropped
+* corename format specifiers
+
+	========	==========================================
+	%<NUL>		'%' is dropped
+	%%		output one '%'
+	%p		pid
+	%P		global pid (init PID namespace)
+	%i		tid
+	%I		global tid (init PID namespace)
+	%u		uid (in initial user namespace)
+	%g		gid (in initial user namespace)
+	%d		dump mode, matches ``PR_SET_DUMPABLE`` and
+			``/proc/sys/fs/suid_dumpable``
+	%s		signal number
+	%t		UNIX time of dump
+	%h		hostname
+	%e		executable filename (may be shortened)
+	%E		executable path
+	%c		maximum size of core file by resource limit RLIMIT_CORE
+	%<OTHER>	both are dropped
+	========	==========================================
 
 * If the first character of the pattern is a '|', the kernel will treat
   the rest of the pattern as a command to run.  The core dump will be
   written to the standard input of that program instead of to a file.
 
 
-core_pipe_limit:
-================
+core_pipe_limit
+===============
 
-This sysctl is only applicable when core_pattern is configured to pipe
-core files to a user space helper (when the first character of
-core_pattern is a '|', see above).  When collecting cores via a pipe
-to an application, it is occasionally useful for the collecting
-application to gather data about the crashing process from its
-/proc/pid directory.  In order to do this safely, the kernel must wait
-for the collecting process to exit, so as not to remove the crashing
-processes proc files prematurely.  This in turn creates the
-possibility that a misbehaving userspace collecting process can block
-the reaping of a crashed process simply by never exiting.  This sysctl
-defends against that.  It defines how many concurrent crashing
-processes may be piped to user space applications in parallel.  If
-this value is exceeded, then those crashing processes above that value
-are noted via the kernel log and their cores are skipped.  0 is a
-special value, indicating that unlimited processes may be captured in
-parallel, but that no waiting will take place (i.e. the collecting
-process is not guaranteed access to /proc/<crashing pid>/).  This
-value defaults to 0.
-
-
-core_uses_pid:
-==============
+This sysctl is only applicable when `core_pattern`_ is configured to
+pipe core files to a user space helper (when the first character of
+``core_pattern`` is a '|', see above).
+When collecting cores via a pipe to an application, it is occasionally
+useful for the collecting application to gather data about the
+crashing process from its ``/proc/pid`` directory.
+In order to do this safely, the kernel must wait for the collecting
+process to exit, so as not to remove the crashing process's proc files
+prematurely.
+This in turn creates the possibility that a misbehaving userspace
+collecting process can block the reaping of a crashed process simply
+by never exiting.
+This sysctl defends against that.
+It defines how many concurrent crashing processes may be piped to user
+space applications in parallel.
+If this value is exceeded, then those crashing processes above that
+value are noted via the kernel log and their cores are skipped.
+0 is a special value, indicating that unlimited processes may be
+captured in parallel, but that no waiting will take place (i.e. the
+collecting process is not guaranteed access to ``/proc/<crashing
+pid>/``).
+This value defaults to 0.
+
+
+core_uses_pid
+=============
 
 The default coredump filename is "core".  By setting
-core_uses_pid to 1, the coredump filename becomes core.PID.
-If core_pattern does not include "%p" (default does not)
-and core_uses_pid is set, then .PID will be appended to
+``core_uses_pid`` to 1, the coredump filename becomes core.PID.
+If `core_pattern`_ does not include "%p" (default does not)
+and ``core_uses_pid`` is set, then .PID will be appended to
 the filename.
 
 
-ctrl-alt-del:
-=============
+ctrl-alt-del
+============
 
 When the value in this file is 0, ctrl-alt-del is trapped and
-sent to the init(1) program to handle a graceful restart.
+sent to the ``init(1)`` program to handle a graceful restart.
 When, however, the value is > 0, Linux's reaction to a Vulcan
 Nerve Pinch (tm) will be an immediate reboot, without even
 syncing its dirty buffers.
@@ -270,21 +194,22 @@ Note:
   to decide what to do with it.
 
 
-dmesg_restrict:
-===============
+dmesg_restrict
+==============
 
 This toggle indicates whether unprivileged users are prevented
-from using dmesg(8) to view messages from the kernel's log buffer.
-When dmesg_restrict is set to (0) there are no restrictions. When
-dmesg_restrict is set set to (1), users must have CAP_SYSLOG to use
-dmesg(8).
+from using ``dmesg(8)`` to view messages from the kernel's log
+buffer.
+When ``dmesg_restrict`` is set to 0 there are no restrictions.
+When ``dmesg_restrict`` is set to 1, users must have
+``CAP_SYSLOG`` to use ``dmesg(8)``.
 
-The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
-default value of dmesg_restrict.
+The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
+default value of ``dmesg_restrict``.
 
 
-domainname & hostname:
-======================
+domainname & hostname
+=====================
 
 These files can be used to set the NIS/YP domainname and the
 hostname of your box in exactly the same way as the commands
@@ -303,167 +228,192 @@ hostname "darkstar" and DNS (Internet Domain Name Server)
 domainname "frop.org", not to be confused with the NIS (Network
 Information Service) or YP (Yellow Pages) domainname. These two
 domain names are in general different. For a detailed discussion
-see the hostname(1) man page.
+see the ``hostname(1)`` man page.
 
 
-hardlockup_all_cpu_backtrace:
-=============================
+hardlockup_all_cpu_backtrace
+============================
 
 This value controls the hard lockup detector behavior when a hard
 lockup condition is detected as to whether or not to gather further
 debug information. If enabled, arch-specific all-CPU stack dumping
 will be initiated.
 
-0: do nothing. This is the default behavior.
-
-1: on detection capture more debug information.
+= ============================================
+0 Do nothing. This is the default behavior.
+1 On detection capture more debug information.
+= ============================================
 
 
-hardlockup_panic:
-=================
+hardlockup_panic
+================
 
 This parameter can be used to control whether the kernel panics
 when a hard lockup is detected.
 
-   0 - don't panic on hard lockup
-   1 - panic on hard lockup
+= ===========================
+0 Don't panic on hard lockup.
+1 Panic on hard lockup.
+= ===========================
 
-See Documentation/admin-guide/lockup-watchdogs.rst for more information.  This can
-also be set using the nmi_watchdog kernel parameter.
+See :doc:`/admin-guide/lockup-watchdogs` for more information.
+This can also be set using the nmi_watchdog kernel parameter.
 
 
-hotplug:
-========
+hotplug
+=======
 
 Path for the hotplug policy agent.
-Default value is "/sbin/hotplug".
+Default value is "``/sbin/hotplug``".
 
 
-hung_task_panic:
-================
+hung_task_panic
+===============
 
 Controls the kernel's behavior when a hung task is detected.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
-0: continue operation. This is the default behavior.
+= =================================================
+0 Continue operation. This is the default behavior.
+1 Panic immediately.
+= =================================================
 
-1: panic immediately.
 
-
-hung_task_check_count:
-======================
+hung_task_check_count
+=====================
 
 The upper bound on the number of tasks that are checked.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
 
-hung_task_timeout_secs:
-=======================
+hung_task_timeout_secs
+======================
 
 When a task in D state did not get scheduled
 for more than this value report a warning.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
-0: means infinite timeout - no checking done.
+0 means infinite timeout, no checking is done.
 
-Possible values to set are in range {0..LONG_MAX/HZ}.
+Possible values to set are in range {0:``LONG_MAX``/``HZ``}.
 
 
-hung_task_check_interval_secs:
-==============================
+hung_task_check_interval_secs
+=============================
 
 Hung task check interval. If hung task checking is enabled
-(see hung_task_timeout_secs), the check is done every
-hung_task_check_interval_secs seconds.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+(see `hung_task_timeout_secs`_), the check is done every
+``hung_task_check_interval_secs`` seconds.
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
-0 (default): means use hung_task_timeout_secs as checking interval.
-Possible values to set are in range {0..LONG_MAX/HZ}.
+0 (default) means use ``hung_task_timeout_secs`` as checking
+interval.
 
+Possible values to set are in range {0:``LONG_MAX``/``HZ``}.
 
-hung_task_warnings:
-===================
+
+hung_task_warnings
+==================
 
 The maximum number of warnings to report. During a check interval
 if a hung task is detected, this value is decreased by 1.
 When this value reaches 0, no more warnings will be reported.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
 -1: report an infinite number of warnings.
 
 
-hyperv_record_panic_msg:
-========================
+hyperv_record_panic_msg
+=======================
 
 Controls whether the panic kmsg data should be reported to Hyper-V.
 
-0: do not report panic kmsg data.
+= =========================================================
+0 Do not report panic kmsg data.
+1 Report the panic kmsg data. This is the default behavior.
+= =========================================================
 
-1: report the panic kmsg data. This is the default behavior.
 
+kexec_load_disabled
+===================
 
-kexec_load_disabled:
-====================
-
-A toggle indicating if the kexec_load syscall has been disabled. This
-value defaults to 0 (false: kexec_load enabled), but can be set to 1
-(true: kexec_load disabled). Once true, kexec can no longer be used, and
-the toggle cannot be set back to false. This allows a kexec image to be
-loaded before disabling the syscall, allowing a system to set up (and
-later use) an image without it being altered. Generally used together
-with the "modules_disabled" sysctl.
+A toggle indicating if the ``kexec_load`` syscall has been disabled.
+This value defaults to 0 (false: ``kexec_load`` enabled), but can be
+set to 1 (true: ``kexec_load`` disabled).
+Once true, kexec can no longer be used, and the toggle cannot be set
+back to false.
+This allows a kexec image to be loaded before disabling the syscall,
+allowing a system to set up (and later use) an image without it being
+altered.
+Generally used together with the `modules_disabled`_ sysctl.
 
 
-kptr_restrict:
-==============
+kptr_restrict
+=============
 
 This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces.
-
-When kptr_restrict is set to 0 (the default) the address is hashed before
-printing. (This is the equivalent to %p.)
+exposing kernel addresses via ``/proc`` and other interfaces.
+
+When ``kptr_restrict`` is set to 0 (the default) the address is hashed
+before printing.
+(This is equivalent to %p.)
+
+When ``kptr_restrict`` is set to 1, kernel pointers printed using the
+%pK format specifier will be replaced with 0s unless the user has
+``CAP_SYSLOG`` and effective user and group ids are equal to the real
+ids.
+This is because %pK checks are done at read() time rather than open()
+time, so if permissions are elevated between the open() and the read()
+(e.g. via a setuid binary) then %pK will not leak kernel pointers to
+unprivileged users.
+Note, this is a temporary solution only.
+The correct long-term solution is to do the permission checks at
+open() time.
+Consider removing world read permissions from files that use %pK, and
+using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)``
+if leaking kernel pointer values to unprivileged users is a concern.
+
+When ``kptr_restrict`` is set to 2, kernel pointers printed using
+%pK will be replaced with 0s regardless of privileges.
+
+
+l2cr (PPC only)
+===============
 
-When kptr_restrict is set to (1), kernel pointers printed using the %pK
-format specifier will be replaced with 0's unless the user has CAP_SYSLOG
-and effective user and group ids are equal to the real ids. This is
-because %pK checks are done at read() time rather than open() time, so
-if permissions are elevated between the open() and the read() (e.g via
-a setuid binary) then %pK will not leak kernel pointers to unprivileged
-users. Note, this is a temporary solution only. The correct long-term
-solution is to do the permission checks at open() time. Consider removing
-world read permissions from files that use %pK, and using dmesg_restrict
-to protect against uses of %pK in dmesg(8) if leaking kernel pointer
-values to unprivileged users is a concern.
+This flag controls the L2 cache of G3 processor boards. If
+0, the cache is disabled. Enabled if nonzero.
 
-When kptr_restrict is set to (2), kernel pointers printed using
-%pK will be replaced with 0's regardless of privileges.
 
+modprobe
+========
 
-l2cr: (PPC only)
-================
-
-This flag controls the L2 cache of G3 processor boards. If
-0, the cache is disabled. Enabled if nonzero.
+See Documentation/debugging-modules.txt.
 
 
-modules_disabled:
-=================
+modules_disabled
+================
 
 A toggle value indicating if modules are allowed to be loaded
 in an otherwise modular kernel.  This toggle defaults to off
 (0), but can be set true (1).  Once true, modules can be
 neither loaded nor unloaded, and the toggle cannot be set back
-to false.  Generally used with the "kexec_load_disabled" toggle.
+to false.  Generally used with the `kexec_load_disabled`_ toggle.
+
 
+.. _msgmni:
 
-msg_next_id, sem_next_id, and shm_next_id:
-==========================================
+msgmax, msgmnb, and msgmni
+==========================
+
+
+msg_next_id, sem_next_id, and shm_next_id (System V IPC)
+========================================================
 
 These three toggles allows to specify desired id for next allocated IPC
 object: message, semaphore or shared memory respectively.
 
 By default they are equal to -1, which means generic allocation logic.
-Possible values to set are in range {0..INT_MAX}.
+Possible values to set are in range {0:``INT_MAX``}.
 
 Notes:
   1) kernel doesn't guarantee, that new object will have desired id. So,
@@ -473,15 +423,16 @@ Notes:
      fails, it is undefined if the value remains unmodified or is reset to -1.
 
 
-nmi_watchdog:
-=============
+nmi_watchdog
+============
 
 This parameter can be used to control the NMI watchdog
 (i.e. the hard lockup detector) on x86 systems.
 
-0 - disable the hard lockup detector
-
-1 - enable the hard lockup detector
+= =================================
+0 Disable the hard lockup detector.
+1 Enable the hard lockup detector.
+= =================================
 
 The hard lockup detector monitors each CPU for its ability to respond to
 timer interrupts. The mechanism utilizes CPU performance counter registers
@@ -493,11 +444,11 @@ in a KVM virtual machine. This default can be overridden by adding::
 
    nmi_watchdog=1
 
-to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst).
+to the guest kernel command line (see :doc:`/admin-guide/kernel-parameters`).
 
 
-numa_balancing:
-===============
+numa_balancing
+==============
 
 Enables/disables automatic page fault based NUMA memory
 balancing. Memory is moved automatically to nodes
@@ -515,9 +466,10 @@ ideally is offset by improved memory locality but there is no universal
 guarantee. If the target workload is already bound to NUMA nodes then this
 feature should be disabled. Otherwise, if the system overhead from the
 feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the numa_balancing_scan_period_min_ms,
+faults may be controlled by the `numa_balancing_scan_period_min_ms,
 numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls.
+numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
+
 
 numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
 ===============================================================================================================================
@@ -543,23 +495,23 @@ workload pattern changes and minimises performance impact due to remote
 memory accesses. These sysctls control the thresholds for scan delays and
 the number of pages scanned.
 
-numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
+``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
 scan a tasks virtual memory. It effectively controls the maximum scanning
 rate for each task.
 
-numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
+``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
 when it initially forks.
 
-numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
+``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
 scan a tasks virtual memory. It effectively controls the minimum scanning
 rate for each task.
 
-numa_balancing_scan_size_mb is how many megabytes worth of pages are
+``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
 scanned for a given scan.
 
 
-osrelease, ostype & version:
-============================
+osrelease, ostype & version
+===========================
 
 ::
 
@@ -570,15 +522,16 @@ osrelease, ostype & version:
   # cat version
   #5 Wed Feb 25 21:49:24 MET 1998
 
-The files osrelease and ostype should be clear enough. Version
+The files ``osrelease`` and ``ostype`` should be clear enough.
+``version``
 needs a little more clarification however. The '#5' means that
 this is the fifth kernel built from this source base and the
 date behind it indicates the time the kernel was built.
 The only way to tune these values is to rebuild the kernel :-)
 
 
-overflowgid & overflowuid:
-==========================
+overflowgid & overflowuid
+=========================
 
 if your architecture did not always support 32-bit UIDs (i.e. arm,
 i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
@@ -589,108 +542,113 @@ These sysctls allow you to change the value of the fixed UID and GID.
 The default is 65534.
 
 
-panic:
-======
+panic
+=====
 
 The value in this file represents the number of seconds the kernel
 waits before rebooting on a panic. When you use the software watchdog,
 the recommended setting is 60.
 
 
-panic_on_io_nmi:
-================
+panic_on_io_nmi
+===============
 
 Controls the kernel's behavior when a CPU receives an NMI caused by
 an IO error.
 
-0: try to continue operation (default)
-
-1: panic immediately. The IO error triggered an NMI. This indicates a
-   serious system condition which could result in IO data corruption.
-   Rather than continuing, panicking might be a better choice. Some
-   servers issue this sort of NMI when the dump button is pushed,
-   and you can use this option to take a crash dump.
+= ==================================================================
+0 Try to continue operation (default).
+1 Panic immediately. The IO error triggered an NMI. This indicates a
+  serious system condition which could result in IO data corruption.
+  Rather than continuing, panicking might be a better choice. Some
+  servers issue this sort of NMI when the dump button is pushed,
+  and you can use this option to take a crash dump.
+= ==================================================================
 
 
-panic_on_oops:
-==============
+panic_on_oops
+=============
 
 Controls the kernel's behaviour when an oops or BUG is encountered.
 
-0: try to continue operation
-
-1: panic immediately.  If the `panic` sysctl is also non-zero then the
-   machine will be rebooted.
+= ===================================================================
+0 Try to continue operation.
+1 Panic immediately.  If the `panic` sysctl is also non-zero then the
+  machine will be rebooted.
+= ===================================================================
 
 
-panic_on_stackoverflow:
-=======================
+panic_on_stackoverflow
+======================
 
 Controls the kernel's behavior when detecting the overflows of
 kernel, IRQ and exception stacks except a user stack.
-This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
-
-0: try to continue operation.
+This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled.
 
-1: panic immediately.
+= ==========================
+0 Try to continue operation.
+1 Panic immediately.
+= ==========================
 
 
-panic_on_unrecovered_nmi:
-=========================
+panic_on_unrecovered_nmi
+========================
 
 The default Linux behaviour on an NMI of either memory or unknown is
 to continue operation. For many environments such as scientific
 computing it is preferable that the box is taken out and the error
 dealt with than an uncorrected parity/ECC error get propagated.
 
-A small number of systems do generate NMI's for bizarre random reasons
+A small number of systems do generate NMIs for bizarre random reasons
 such as power management so the default is off. That sysctl works like
 the existing panic controls already in that directory.
 
 
-panic_on_warn:
-==============
+panic_on_warn
+=============
 
 Calls panic() in the WARN() path when set to 1.  This is useful to avoid
 a kernel rebuild when attempting to kdump at the location of a WARN().
 
-0: only WARN(), default behaviour.
-
-1: call panic() after printing out WARN() location.
+= ================================================
+0 Only WARN(), default behaviour.
+1 Call panic() after printing out WARN() location.
+= ================================================
 
 
-panic_print:
-============
+panic_print
+===========
 
 Bitmask for printing system info when panic happens. User can chose
 combination of the following bits:
 
-=====  ========================================
+=====  ============================================
 bit 0  print all tasks info
 bit 1  print system memory info
 bit 2  print timer info
-bit 3  print locks info if CONFIG_LOCKDEP is on
+bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
 bit 4  print ftrace buffer
-=====  ========================================
+=====  ============================================
 
 So for example to print tasks and memory info on panic, user can::
 
   echo 3 > /proc/sys/kernel/panic_print
 
 
-panic_on_rcu_stall:
-===================
+panic_on_rcu_stall
+==================
 
 When set to 1, calls panic() after RCU stall detection messages. This
 is useful to define the root cause of RCU stalls using a vmcore.
 
-0: do not panic() when RCU stall takes place, default behavior.
-
-1: panic() after printing RCU stall messages.
+= ============================================================
+0 Do not panic() when RCU stall takes place, default behavior.
+1 panic() after printing RCU stall messages.
+= ============================================================
 
 
-perf_cpu_time_max_percent:
-==========================
+perf_cpu_time_max_percent
+=========================
 
 Hints to the kernel how much CPU time it should be allowed to
 use to handle perf sampling events.  If the perf subsystem
@@ -703,171 +661,179 @@ unexpectedly take too long to execute, the NMIs can become
 stacked up next to each other so much that nothing else is
 allowed to execute.
 
-0:
-   disable the mechanism.  Do not monitor or correct perf's
-   sampling rate no matter how CPU time it takes.
+===== ========================================================
+0     Disable the mechanism.  Do not monitor or correct perf's
+      sampling rate no matter how much CPU time it takes.
 
-1-100:
-   attempt to throttle perf's sample rate to this
-   percentage of CPU.  Note: the kernel calculates an
-   "expected" length of each sample event.  100 here means
-   100% of that expected length.  Even if this is set to
-   100, you may still see sample throttling if this
-   length is exceeded.  Set to 0 if you truly do not care
-   how much CPU is consumed.
+1-100 Attempt to throttle perf's sample rate to this
+      percentage of CPU.  Note: the kernel calculates an
+      "expected" length of each sample event.  100 here means
+      100% of that expected length.  Even if this is set to
+      100, you may still see sample throttling if this
+      length is exceeded.  Set to 0 if you truly do not care
+      how much CPU is consumed.
+===== ========================================================
 
 
-perf_event_paranoid:
-====================
+perf_event_paranoid
+===================
 
 Controls use of the performance events system by unprivileged
 users (without CAP_SYS_ADMIN).  The default value is 2.
 
 ===  ==================================================================
- -1  Allow use of (almost) all events by all users
+ -1  Allow use of (almost) all events by all users.
 
-     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
+     Ignore mlock limit after perf_event_mlock_kb without
+     ``CAP_IPC_LOCK``.
 
->=0  Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
+>=0  Disallow ftrace function tracepoint by users without
+     ``CAP_SYS_ADMIN``.
 
-     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
+     Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``.
 
->=1  Disallow CPU event access by users without CAP_SYS_ADMIN
+>=1  Disallow CPU event access by users without ``CAP_SYS_ADMIN``.
 
->=2  Disallow kernel profiling by users without CAP_SYS_ADMIN
+>=2  Disallow kernel profiling by users without ``CAP_SYS_ADMIN``.
 ===  ==================================================================
 
 
-perf_event_max_stack:
-=====================
+perf_event_max_stack
+====================
 
-Controls maximum number of stack frames to copy for (attr.sample_type &
-PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
-'perf record -g' or 'perf trace --call-graph fp'.
+Controls maximum number of stack frames to copy for (``attr.sample_type &
+PERF_SAMPLE_CALLCHAIN``) configured events, for instance, when using
+'``perf record -g``' or '``perf trace --call-graph fp``'.
 
 This can only be done when no events are in use that have callchains
-enabled, otherwise writing to this file will return -EBUSY.
+enabled, otherwise writing to this file will return ``-EBUSY``.
 
 The default value is 127.
 
 
-perf_event_mlock_kb:
-====================
+perf_event_mlock_kb
+===================
 
 Control size of per-cpu ring buffer not counted agains mlock limit.
 
 The default value is 512 + 1 page
 
 
-perf_event_max_contexts_per_stack:
-==================================
+perf_event_max_contexts_per_stack
+=================================
 
 Controls maximum number of stack frame context entries for
-(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
-instance, when using 'perf record -g' or 'perf trace --call-graph fp'.
+(``attr.sample_type & PERF_SAMPLE_CALLCHAIN``) configured events, for
+instance, when using '``perf record -g``' or '``perf trace --call-graph fp``'.
 
 This can only be done when no events are in use that have callchains
-enabled, otherwise writing to this file will return -EBUSY.
+enabled, otherwise writing to this file will return ``-EBUSY``.
 
 The default value is 8.
 
 
-pid_max:
-========
+pid_max
+=======
 
 PID allocation wrap value.  When the kernel's next PID value
 reaches this value, it wraps back to a minimum PID value.
-PIDs of value pid_max or larger are not allocated.
+PIDs of value ``pid_max`` or larger are not allocated.
 
 
-ns_last_pid:
-============
+ns_last_pid
+===========
 
 The last pid allocated in the current (the one task using this sysctl
 lives in) pid namespace. When selecting a pid for a next task on fork
 kernel tries to allocate a number starting from this one.
 
 
-powersave-nap: (PPC only)
-=========================
+powersave-nap (PPC only)
+========================
 
 If set, Linux-PPC will use the 'nap' mode of powersaving,
 otherwise the 'doze' mode will be used.
 
+
 ==============================================================
 
-printk:
-=======
+printk
+======
 
-The four values in printk denote: console_loglevel,
-default_message_loglevel, minimum_console_loglevel and
-default_console_loglevel respectively.
+The four values in printk denote: ``console_loglevel``,
+``default_message_loglevel``, ``minimum_console_loglevel`` and
+``default_console_loglevel`` respectively.
 
 These values influence printk() behavior when printing or
-logging error messages. See 'man 2 syslog' for more info on
+logging error messages. See '``man 2 syslog``' for more info on
 the different loglevels.
 
-- console_loglevel:
-	messages with a higher priority than
-	this will be printed to the console
-- default_message_loglevel:
-	messages without an explicit priority
-	will be printed with this priority
-- minimum_console_loglevel:
-	minimum (highest) value to which
-	console_loglevel can be set
-- default_console_loglevel:
-	default value for console_loglevel
+======================== =====================================
+console_loglevel         messages with a higher priority than
+                         this will be printed to the console
+default_message_loglevel messages without an explicit priority
+                         will be printed with this priority
+minimum_console_loglevel minimum (highest) value to which
+                         console_loglevel can be set
+default_console_loglevel default value for console_loglevel
+======================== =====================================
 
 
-printk_delay:
-=============
+printk_delay
+============
 
-Delay each printk message in printk_delay milliseconds
+Delay each printk message by ``printk_delay`` milliseconds.
 
 Value from 0 - 10000 is allowed.
 
 
-printk_ratelimit:
-=================
+printk_ratelimit
+================
 
-Some warning messages are rate limited. printk_ratelimit specifies
+Some warning messages are rate limited. ``printk_ratelimit`` specifies
 the minimum length of time between these messages (in seconds).
 The default value is 5 seconds.
 
 A value of 0 will disable rate limiting.
 
 
-printk_ratelimit_burst:
-=======================
+printk_ratelimit_burst
+======================
 
-While long term we enforce one message per printk_ratelimit
+While long term we enforce one message per `printk_ratelimit`_
 seconds, we do allow a burst of messages to pass through.
-printk_ratelimit_burst specifies the number of messages we can
+``printk_ratelimit_burst`` specifies the number of messages we can
 send before ratelimiting kicks in.
 
 The default value is 10 messages.
 
 
-printk_devkmsg:
-===============
-
-Control the logging to /dev/kmsg from userspace:
-
-ratelimit:
-	default, ratelimited
+printk_devkmsg
+==============
 
-on: unlimited logging to /dev/kmsg from userspace
+Control the logging to ``/dev/kmsg`` from userspace:
 
-off: logging to /dev/kmsg disabled
+========= =============================================
+ratelimit default, ratelimited
+on        unlimited logging to /dev/kmsg from userspace
+off       logging to /dev/kmsg disabled
+========= =============================================
 
-The kernel command line parameter printk.devkmsg= overrides this and is
+The kernel command line parameter ``printk.devkmsg=`` overrides this and is
 a one-time setting until next reboot: once set, it cannot be changed by
 this sysctl interface anymore.
 
+==============================================================
 
-randomize_va_space:
-===================
+
+pty
+===
+
+See Documentation/filesystems/devpts.txt.
+
+
+randomize_va_space
+==================
 
 This option can be used to select the type of process address
 space randomization that is used in the system, for architectures
@@ -882,10 +848,10 @@ that support this feature.
     This, among other things, implies that shared libraries will be
     loaded to random addresses.  Also for PIE-linked binaries, the
     location of code start is randomized.  This is the default if the
-    CONFIG_COMPAT_BRK option is enabled.
+    ``CONFIG_COMPAT_BRK`` option is enabled.
 
 2   Additionally enable heap randomization.  This is the default if
-    CONFIG_COMPAT_BRK is disabled.
+    ``CONFIG_COMPAT_BRK`` is disabled.
 
     There are a few legacy applications out there (such as some ancient
     versions of libc.so.5 from 1996) that assume that brk area starts
@@ -895,21 +861,27 @@ that support this feature.
     systems it is safe to choose full randomization.
 
     Systems with ancient and/or broken binaries should be configured
-    with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
+    with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
     address space randomization.
 ==  ===========================================================================
 
 
-reboot-cmd: (Sparc only)
-========================
+real-root-dev
+=============
+
+See :doc:`/admin-guide/initrd`.
+
+
+reboot-cmd (SPARC only)
+=======================
 
 ??? This seems to be a way to give an argument to the Sparc
 ROM/Flash boot loader. Maybe to tell it what to do after
 rebooting. ???
 
 
-rtsig-max & rtsig-nr:
-=====================
+rtsig-max & rtsig-nr
+====================
 
 The file rtsig-max can be used to tune the maximum number
 of POSIX realtime (queued) signals that can be outstanding
@@ -918,8 +890,8 @@ in the system.
 rtsig-nr shows the number of RT signals currently queued.
 
 
-sched_energy_aware:
-===================
+sched_energy_aware
+==================
 
 Enables/disables Energy Aware Scheduling (EAS). EAS starts
 automatically on platforms where it can run (that is,
@@ -929,75 +901,85 @@ requirements for EAS but you do not want to use it, change
 this value to 0.
 
 
-sched_schedstats:
-=================
+sched_schedstats
+================
 
 Enables/disables scheduler statistics. Enabling this feature
 incurs a small amount of overhead in the scheduler but is
 useful for debugging and performance tuning.
 
 
-sg-big-buff:
-============
+seccomp
+=======
+
+See :doc:`/userspace-api/seccomp_filter`.
+
+
+sg-big-buff
+===========
 
 This file shows the size of the generic SCSI (sg) buffer.
 You can't tune it just yet, but you could change it on
-compile time by editing include/scsi/sg.h and changing
-the value of SG_BIG_BUFF.
+compile time by editing ``include/scsi/sg.h`` and changing
+the value of ``SG_BIG_BUFF``.
 
 There shouldn't be any reason to change this value. If
 you can come up with one, you probably know what you
 are doing anyway :)
 
 
-shmall:
-=======
+shmall
+======
 
 This parameter sets the total amount of shared memory pages that
-can be used system wide. Hence, SHMALL should always be at least
-ceil(shmmax/PAGE_SIZE).
+can be used system wide. Hence, ``shmall`` should always be at least
+``ceil(shmmax/PAGE_SIZE)``.
 
-If you are not sure what the default PAGE_SIZE is on your Linux
-system, you can run the following command:
+If you are not sure what the default ``PAGE_SIZE`` is on your Linux
+system, you can run the following command::
 
 	# getconf PAGE_SIZE
 
 
-shmmax:
-=======
+shmmax
+======
 
 This value can be used to query and set the run time limit
 on the maximum shared memory segment size that can be created.
 Shared memory segments up to 1Gb are now supported in the
-kernel.  This value defaults to SHMMAX.
+kernel.  This value defaults to ``SHMMAX``.
 
 
-shm_rmid_forced:
-================
+shmmni
+======
+
+
+shm_rmid_forced
+===============
 
 Linux lets you set resource limits, including how much memory one
-process can consume, via setrlimit(2).  Unfortunately, shared memory
+process can consume, via ``setrlimit(2)``.  Unfortunately, shared memory
 segments are allowed to exist without association with any process, and
 thus might not be counted against any resource limits.  If enabled,
 shared memory segments are automatically destroyed when their attach
 count becomes zero after a detach or a process termination.  It will
 also destroy segments that were created, but never attached to, on exit
-from the process.  The only use left for IPC_RMID is to immediately
+from the process.  The only use left for ``IPC_RMID`` is to immediately
 destroy an unattached segment.  Of course, this breaks the way things are
 defined, so some applications might stop working.  Note that this
 feature will do you no good unless you also configure your resource
-limits (in particular, RLIMIT_AS and RLIMIT_NPROC).  Most systems don't
+limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``).  Most systems don't
 need this.
 
 Note that if you change this from 0 to 1, already created segments
 without users and with a dead originative process will be destroyed.
 
 
-sysctl_writes_strict:
-=====================
+sysctl_writes_strict
+====================
 
 Control how file position affects the behavior of updating sysctl values
-via the /proc/sys interface:
+via the ``/proc/sys`` interface:
 
   ==   ======================================================================
   -1   Legacy per-write sysctl value handling, with no printk warnings.
@@ -1014,8 +996,8 @@ via the /proc/sys interface:
   ==   ======================================================================
 
 
-softlockup_all_cpu_backtrace:
-=============================
+softlockup_all_cpu_backtrace
+============================
 
 This value controls the soft lockup detector thread's behavior
 when a soft lockup condition is detected as to whether or not
@@ -1025,43 +1007,56 @@ be issued an NMI and instructed to capture stack trace.
 This feature is only applicable for architectures which support
 NMI.
 
-0: do nothing. This is the default behavior.
-
-1: on detection capture more debug information.
+= ============================================
+0 Do nothing. This is the default behavior.
+1 On detection capture more debug information.
+= ============================================
 
 
-soft_watchdog:
-==============
+soft_watchdog
+=============
 
 This parameter can be used to control the soft lockup detector.
 
-   0 - disable the soft lockup detector
-
-   1 - enable the soft lockup detector
+= =================================
+0 Disable the soft lockup detector.
+1 Enable the soft lockup detector.
+= =================================
 
 The soft lockup detector monitors CPUs for threads that are hogging the CPUs
 without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
 from running. The mechanism depends on the CPUs ability to respond to timer
 interrupts which are needed for the 'watchdog/N' threads to be woken up by
-the watchdog timer function, otherwise the NMI watchdog - if enabled - can
+the watchdog timer function, otherwise the NMI watchdog — if enabled — can
 detect a hard lockup condition.
 
 
-stack_erasing:
-==============
+stack_erasing
+=============
 
 This parameter can be used to control kernel stack erasing at the end
-of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.
+of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``.
 
 That erasing reduces the information which kernel stack leak bugs
 can reveal and blocks some uninitialized stack variable attacks.
 The tradeoff is the performance impact: on a single CPU system kernel
 compilation sees a 1% slowdown, other systems and workloads may vary.
 
-  0: kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
+= ====================================================================
+0 Kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
+1 Kernel stack erasing is enabled (default), it is performed before
+  returning to the userspace at the end of syscalls.
+= ====================================================================
+
+
+stop-a (SPARC only)
+===================
 
-  1: kernel stack erasing is enabled (default), it is performed before
-     returning to the userspace at the end of syscalls.
+
+sysrq
+=====
+
+See :doc:`/admin-guide/sysrq`.
 
 
 tainted
@@ -1091,30 +1086,30 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
 131072  `(T)`  The kernel was built with the struct randomization plugin
 ======  =====  ==============================================================
 
-See Documentation/admin-guide/tainted-kernels.rst for more information.
+See :doc:`/admin-guide/tainted-kernels` for more information.
 
 
-threads-max:
-============
+threads-max
+===========
 
 This value controls the maximum number of threads that can be created
-using fork().
+using ``fork()``.
 
 During initialization the kernel sets this value such that even if the
 maximum number of threads is created, the thread structures occupy only
 a part (1/8th) of the available RAM pages.
 
-The minimum value that can be written to threads-max is 1.
+The minimum value that can be written to ``threads-max`` is 1.
 
-The maximum value that can be written to threads-max is given by the
-constant FUTEX_TID_MASK (0x3fffffff).
+The maximum value that can be written to ``threads-max`` is given by the
+constant ``FUTEX_TID_MASK`` (0x3fffffff).
 
-If a value outside of this range is written to threads-max an error
-EINVAL occurs.
+If a value outside of this range is written to ``threads-max`` an
+``EINVAL`` error occurs.
 
 
-unknown_nmi_panic:
-==================
+unknown_nmi_panic
+=================
 
 The value in this file affects behavior of handling NMI. When the
 value is non-zero, unknown NMI is trapped and then panic occurs. At
@@ -1124,37 +1119,39 @@ NMI switch that most IA32 servers have fires unknown NMI up, for
 example.  If a system hangs up, try pressing the NMI switch.
 
 
-watchdog:
-=========
+watchdog
+========
 
 This parameter can be used to disable or enable the soft lockup detector
-_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
+*and* the NMI watchdog (i.e. the hard lockup detector) at the same time.
 
-   0 - disable both lockup detectors
-
-   1 - enable both lockup detectors
+= ==============================
+0 Disable both lockup detectors.
+1 Enable both lockup detectors.
+= ==============================
 
 The soft lockup detector and the NMI watchdog can also be disabled or
-enabled individually, using the soft_watchdog and nmi_watchdog parameters.
-If the watchdog parameter is read, for example by executing::
+enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog``
+parameters.
+If the ``watchdog`` parameter is read, for example by executing::
 
    cat /proc/sys/kernel/watchdog
 
-the output of this command (0 or 1) shows the logical OR of soft_watchdog
-and nmi_watchdog.
+the output of this command (0 or 1) shows the logical OR of
+``soft_watchdog`` and ``nmi_watchdog``.
 
 
-watchdog_cpumask:
-=================
+watchdog_cpumask
+================
 
 This value can be used to control on which cpus the watchdog may run.
-The default cpumask is all possible cores, but if NO_HZ_FULL is
+The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is
 enabled in the kernel config, and cores are specified with the
-nohz_full= boot argument, those cores are excluded by default.
+``nohz_full=`` boot argument, those cores are excluded by default.
 Offline cores can be included in this mask, and if the core is later
 brought online, the watchdog will be started based on the mask value.
 
-Typically this value would only be touched in the nohz_full case
+Typically this value would only be touched in the ``nohz_full`` case
 to re-enable cores that by default were not running the watchdog,
 if a kernel lockup was suspected on those cores.
 
@@ -1165,12 +1162,12 @@ might say::
   echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
 
 
-watchdog_thresh:
-================
+watchdog_thresh
+===============
 
 This value can be used to control the frequency of hrtimer and NMI
 events and the soft and hard lockup thresholds. The default threshold
 is 10 seconds.
 
-The softlockup threshold is (2 * watchdog_thresh). Setting this
+The softlockup threshold is (``2 * watchdog_thresh``). Setting this
 tunable to zero will disable lockup detection altogether.
-- 
cgit 
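
The ``panic_print`` table in the patch above describes a bitmask. As a rough
illustration only (not part of the patch), the value to write can be composed
from the documented bit positions. The names ``PANIC_PRINT_BITS``,
``panic_print_mask`` and ``panic_print_decode`` are hypothetical helpers
introduced for this sketch; only the bit numbers come from the table.

```python
# Bit positions as listed in the panic_print table (bit 0 .. bit 4).
# The dictionary keys are hypothetical labels chosen for this sketch.
PANIC_PRINT_BITS = {
    "tasks": 0,    # print all tasks info
    "memory": 1,   # print system memory info
    "timers": 2,   # print timer info
    "locks": 3,    # print locks info (only if CONFIG_LOCKDEP is on)
    "ftrace": 4,   # print ftrace buffer
}


def panic_print_mask(*names):
    """Return the integer one would write to /proc/sys/kernel/panic_print."""
    mask = 0
    for name in names:
        mask |= 1 << PANIC_PRINT_BITS[name]
    return mask


def panic_print_decode(mask):
    """Return the labels whose bits are set in mask, lowest bit first."""
    return [name for name, bit in PANIC_PRINT_BITS.items() if mask & (1 << bit)]


if __name__ == "__main__":
    # Tasks and memory info, matching the "echo 3" example in the document.
    print(panic_print_mask("tasks", "memory"))
```

Writing the resulting value to ``/proc/sys/kernel/panic_print`` still requires
root and is left out of the sketch on purpose.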


From 0317c5371e6a9b71a2e25b47013dd5c62d55d1a6 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Tue, 18 Feb 2020 13:59:17 +0100
Subject: docs: merge debugging-modules.txt into sysctl/kernel.rst

This fits nicely in sysctl/kernel.rst, merge it (and rephrase it)
instead of linking to it.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 14 +++++++++++++-
 Documentation/debugging-modules.txt         | 22 ----------------------
 2 files changed, 13 insertions(+), 23 deletions(-)
 delete mode 100644 Documentation/debugging-modules.txt

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 4872610cc491..bb56ff25d947 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -387,7 +387,19 @@ This flag controls the L2 cache of G3 processor boards. If
 modprobe
 ========
 
-See Documentation/debugging-modules.txt.
+This gives the full path of the modprobe command which the kernel will
+use to load modules. This can be used to debug module loading
+requests::
+
+    echo '#! /bin/sh' > /tmp/modprobe
+    echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
+    echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
+    chmod a+x /tmp/modprobe
+    echo /tmp/modprobe > /proc/sys/kernel/modprobe
+
+This only applies when the *kernel* is requesting that the module be
+loaded; it won't have any effect if the module is being loaded
+explicitly using ``modprobe`` from userspace.
 
 
 modules_disabled
diff --git a/Documentation/debugging-modules.txt b/Documentation/debugging-modules.txt
deleted file mode 100644
index 172ad4aec493..000000000000
--- a/Documentation/debugging-modules.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-Debugging Modules after 2.6.3
------------------------------
-
-In almost all distributions, the kernel asks for modules which don't
-exist, such as "net-pf-10" or whatever.  Changing "modprobe -q" to
-"succeed" in this case is hacky and breaks some setups, and also we
-want to know if it failed for the fallback code for old aliases in
-fs/char_dev.c, for example.
-
-In the past a debugging message which would fill people's logs was
-emitted.  This debugging message has been removed.  The correct way
-of debugging module problems is something like this:
-
-echo '#! /bin/sh' > /tmp/modprobe
-echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
-echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
-chmod a+x /tmp/modprobe
-echo /tmp/modprobe > /proc/sys/kernel/modprobe
-
-Note that the above applies only when the *kernel* is requesting
-that the module be loaded -- it won't have any effect if that module
-is being loaded explicitly using "modprobe" from userspace.
-- 
cgit 
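
The wrapper script described in the modprobe patch above could also be
generated programmatically. The following is a minimal sketch, assuming the
same three script lines as the documented ``echo`` commands; the function name
``write_modprobe_wrapper`` and its parameters are hypothetical. It only writes
the wrapper and marks it executable; pointing ``/proc/sys/kernel/modprobe`` at
it remains a separate, privileged step.

```python
import os
import stat


def write_modprobe_wrapper(wrapper_path,
                           log_path="/tmp/modprobe.log",
                           real_modprobe="/sbin/modprobe"):
    """Write a logging modprobe wrapper and make it executable.

    The generated script mirrors the documented one: log the arguments,
    then exec the real modprobe with the same arguments.
    """
    script = (
        "#! /bin/sh\n"
        f'echo "$@" >> {log_path}\n'
        f'exec {real_modprobe} "$@"\n'
    )
    with open(wrapper_path, "w") as f:
        f.write(script)
    # Equivalent of "chmod a+x": add execute permission for user/group/other.
    mode = os.stat(wrapper_path).st_mode
    os.chmod(wrapper_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return script
```

As the patch notes, such a wrapper only sees requests that originate from the
kernel, not explicit ``modprobe`` invocations from userspace.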


From a474105bb6a6fe85ea30d7fe0a087184da32c751 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Tue, 18 Feb 2020 13:59:18 +0100
Subject: docs: drop l2cr from sysctl/kernel.rst

The l2cr sysctl entry was removed in commit c2f3dabefa73 ("sysctl:
kill binary sysctl KERN_PPC_L2CR"), this removes the corresponding
documentation.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index bb56ff25d947..99569a26f93e 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -377,13 +377,6 @@ When ``kptr_restrict`` is set to 2, kernel pointers printed using
 %pK will be replaced with 0s regardless of privileges.
 
 
-l2cr (PPC only)
-===============
-
-This flag controls the L2 cache of G3 processor boards. If
-0, the cache is disabled. Enabled if nonzero.
-
-
 modprobe
 ========
 
-- 
cgit 


From fa5b526411bb5afe7736ce14bab18c0b68db4251 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Tue, 18 Feb 2020 13:59:19 +0100
Subject: docs: add missing IPC documentation in sysctl/kernel.rst

This adds short descriptions of msgmax, msgmnb, msgmni, and shmmni,
which were previously listed in kernel.rst but not described.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 99569a26f93e..0ae52156db75 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -410,6 +410,15 @@ to false.  Generally used with the `kexec_load_disabled`_ toggle.
 msgmax, msgmnb, and msgmni
 ==========================
 
+``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by
+default (``MSGMAX``).
+
+``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by
+default (``MSGMNB``).
+
+``msgmni`` is the maximum number of IPC queues. 32000 by default
+(``MSGMNI``).
+
 
 msg_next_id, sem_next_id, and shm_next_id (System V IPC)
 ========================================================
@@ -958,6 +967,9 @@ kernel.  This value defaults to ``SHMMAX``.
 shmmni
 ======
 
+This value determines the maximum number of shared memory segments.
+4096 by default (``SHMMNI``).
+
 
 shm_rmid_forced
 ===============
-- 
cgit 


From a1ad4f15054b58636aa58f0df2961259f8781746 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Tue, 18 Feb 2020 13:59:20 +0100
Subject: docs: document stop-a in sysctl/kernel.rst

This describes the SPARC-specific stop-a sysctl entry, which was
previously listed in kernel.rst but not documented.

Based on the implementation in arch/sparc/kernel/setup_{32,64}.c and
kernel/panic.c.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0ae52156db75..3cbbe4502e18 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1069,6 +1069,16 @@ compilation sees a 1% slowdown, other systems and workloads may vary.
 stop-a (SPARC only)
 ===================
 
+Controls Stop-A:
+
+= ====================================
+0 Stop-A has no effect.
+1 Stop-A breaks to the PROM (default).
+= ====================================
+
+Stop-A is always enabled on a panic, so that the user can return to
+the boot PROM.
+
 
 sysrq
 =====
-- 
cgit 


From 404347e68aeb81b89dc440135ed23fcabff104f9 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Tue, 18 Feb 2020 13:59:21 +0100
Subject: docs: document panic fully in sysctl/kernel.rst
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The description of panic doesn’t cover all the supported scenarios;
this patch fixes that, describing the three possibilities (no reboot,
immediate reboot, reboot after a delay).

Based on the implementation in kernel/panic.c.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 3cbbe4502e18..60c97a79ff26 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -559,9 +559,15 @@ The default is 65534.
 panic
 =====
 
-The value in this file represents the number of seconds the kernel
-waits before rebooting on a panic. When you use the software watchdog,
-the recommended setting is 60.
+The value in this file determines the behaviour of the kernel on a
+panic:
+
+* if zero, the kernel will loop forever;
+* if negative, the kernel will reboot immediately;
+* if positive, the kernel will reboot after the corresponding number
+  of seconds.
+
+When you use the software watchdog, the recommended setting is 60.
 
 
 panic_on_io_nmi
-- 
cgit 
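
The three cases listed in the panic patch above can be summarised in a few
lines. This is an illustrative sketch only; ``describe_panic_setting`` is a
hypothetical helper and does not reflect how kernel/panic.c is structured.

```python
def describe_panic_setting(value):
    """Describe what the kernel does on panic for a given kernel.panic value."""
    if value == 0:
        return "loop forever"
    if value < 0:
        return "reboot immediately"
    return f"reboot after {value} seconds"


if __name__ == "__main__":
    # 60 is the setting the document recommends with the software watchdog.
    for setting in (0, -1, 60):
        print(setting, "->", describe_panic_setting(setting))
```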


From 8f21f54b8a9517e0213948088aca757a0f122447 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Tue, 18 Feb 2020 13:59:23 +0100
Subject: docs: sysctl/kernel: remove rtsig entries

These have no corresponding code in the kernel.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 60c97a79ff26..6c0d8c55101c 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -900,16 +900,6 @@ ROM/Flash boot loader. Maybe to tell it what to do after
 rebooting. ???
 
 
-rtsig-max & rtsig-nr
-====================
-
-The file rtsig-max can be used to tune the maximum number
-of POSIX realtime (queued) signals that can be outstanding
-in the system.
-
-rtsig-nr shows the number of RT signals currently queued.
-
-
 sched_energy_aware
 ==================
 
-- 
cgit 


From dff2c2e69f308c1c7d296d49d2b0467e9675b58e Mon Sep 17 00:00:00 2001
From: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Date: Tue, 18 Feb 2020 15:10:13 +0530
Subject: Replace dead urls with active urls for Mutt

This patch replaces stale/dead URLs with active URLs for Mutt.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/email-clients.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/process/email-clients.rst b/Documentation/process/email-clients.rst
index 5273d06c8ff6..c9e4ce2613c0 100644
--- a/Documentation/process/email-clients.rst
+++ b/Documentation/process/email-clients.rst
@@ -237,9 +237,9 @@ using Mutt to send patches through Gmail::
 
 The Mutt docs have lots more information:
 
-    http://dev.mutt.org/trac/wiki/UseCases/Gmail
+    https://gitlab.com/muttmua/mutt/-/wikis/UseCases/Gmail
 
-    http://dev.mutt.org/doc/manual.html
+    http://www.mutt.org/doc/manual/
 
 Pine (TUI)
 **********
-- 
cgit 


From fb0e0ffe7fc8e0e91481e67665f1d646bfd071f2 Mon Sep 17 00:00:00 2001
From: Tony Fischetti <tony.fischetti@gmail.com>
Date: Sun, 16 Feb 2020 19:08:26 -0500
Subject: Documentation: bring process docs up to date

The guide to the kernel dev process documentation, for example, contains
references to older kernels and their timelines. In addition, one of the
"long term support kernels" listed has since reached EOL, and a new one
has been named. This patch brings information/tables up to date.

Additionally, some very trivial grammatical errors, unclear sentences,
and potentially unsavory diction have been edited.

Signed-off-by: Tony Fischetti <tony.fischetti@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/2.Process.rst    | 108 +++++++++++++++++----------------
 Documentation/process/coding-style.rst |  18 +++---
 Documentation/process/howto.rst        |  17 +++---
 3 files changed, 73 insertions(+), 70 deletions(-)
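
Reviewer note: the reworded howto.rst paragraph says that each release in
a stable series increments the third part of the version number and keeps
the first two parts unchanged. A minimal sketch of that rule follows; the
helper name `next_stable` is hypothetical and the function only models
the numbering, not how the stable team actually cuts releases.

```python
def next_stable(version: str) -> str:
    """Return the next stable update number for a 2- or 3-part version.

    Hypothetical helper illustrating the x.y.z numbering scheme: the
    first two parts are kept, the third part is incremented.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) == 2:
        # A mainline release such as 5.2 is followed by 5.2.1.
        major, minor = parts
        return f"{major}.{minor}.1"
    if len(parts) == 3:
        major, minor, update = parts
        return f"{major}.{minor}.{update + 1}"
    raise ValueError(f"expected a 2- or 3-part version, got {version!r}")


if __name__ == "__main__":
    print(next_stable("5.2"))     # first stable update of 5.2
    print(next_stable("5.2.20"))  # last update listed in the 5.2 table
```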

diff --git a/Documentation/process/2.Process.rst b/Documentation/process/2.Process.rst
index ae020d84d7c4..b21b5b245d13 100644
--- a/Documentation/process/2.Process.rst
+++ b/Documentation/process/2.Process.rst
@@ -18,18 +18,18 @@ major kernel release happening every two or three months.  The recent
 release history looks like this:
 
 	======  =================
-	4.11	April 30, 2017
-	4.12	July 2, 2017
-	4.13	September 3, 2017
-	4.14	November 12, 2017
-	4.15	January 28, 2018
-	4.16	April 1, 2018
+	5.0	March 3, 2019
+	5.1	May 5, 2019
+	5.2	July 7, 2019
+	5.3	September 15, 2019
+	5.4	November 24, 2019
+	5.5	January 26, 2020
 	======  =================
 
-Every 4.x release is a major kernel release with new features, internal
-API changes, and more.  A typical 4.x release contain about 13,000
-changesets with changes to several hundred thousand lines of code.  4.x is
-thus the leading edge of Linux kernel development; the kernel uses a
+Every 5.x release is a major kernel release with new features, internal
+API changes, and more.  A typical release can contain about 13,000
+changesets with changes to several hundred thousand lines of code.  5.x is
+the leading edge of Linux kernel development; the kernel uses a
 rolling development model which is continually integrating major changes.
 
 A relatively straightforward discipline is followed with regard to the
@@ -48,9 +48,9 @@ detail later on).
 
 The merge window lasts for approximately two weeks.  At the end of this
 time, Linus Torvalds will declare that the window is closed and release the
-first of the "rc" kernels.  For the kernel which is destined to be 2.6.40,
+first of the "rc" kernels.  For the kernel which is destined to be 5.6,
 for example, the release which happens at the end of the merge window will
-be called 2.6.40-rc1.  The -rc1 release is the signal that the time to
+be called 5.6-rc1.  The -rc1 release is the signal that the time to
 merge new features has passed, and that the time to stabilize the next
 kernel has begun.
 
@@ -67,22 +67,23 @@ add at any time).
 As fixes make their way into the mainline, the patch rate will slow over
 time.  Linus releases new -rc kernels about once a week; a normal series
 will get up to somewhere between -rc6 and -rc9 before the kernel is
-considered to be sufficiently stable and the final 2.6.x release is made.
+considered to be sufficiently stable and the final release is made.
 At that point the whole process starts over again.
 
-As an example, here is how the 4.16 development cycle went (all dates in
-2018):
+As an example, here is how the 5.4 development cycle went (all dates in
+2019):
 
 	==============  ===============================
-	January 28	4.15 stable release
-	February 11	4.16-rc1, merge window closes
-	February 18	4.16-rc2
-	February 25	4.16-rc3
-	March 4		4.16-rc4
-	March 11	4.16-rc5
-	March 18	4.16-rc6
-	March 25	4.16-rc7
-	April 1		4.16 stable release
+	September 15	5.3 stable release
+	September 30	5.4-rc1, merge window closes
+	October 6	5.4-rc2
+	October 13	5.4-rc3
+	October 20	5.4-rc4
+	October 27	5.4-rc5
+	November 3	5.4-rc6
+	November 10	5.4-rc7
+	November 17	5.4-rc8
+	November 24	5.4 stable release
 	==============  ===============================
 
 How do the developers decide when to close the development cycle and create
@@ -98,43 +99,44 @@ release is made.  In the real world, this kind of perfection is hard to
 achieve; there are just too many variables in a project of this size.
 There comes a point where delaying the final release just makes the problem
 worse; the pile of changes waiting for the next merge window will grow
-larger, creating even more regressions the next time around.  So most 4.x
+larger, creating even more regressions the next time around.  So most 5.x
 kernels go out with a handful of known regressions though, hopefully, none
 of them are serious.
 
 Once a stable release is made, its ongoing maintenance is passed off to the
-"stable team," currently consisting of Greg Kroah-Hartman.  The stable team
-will release occasional updates to the stable release using the 4.x.y
-numbering scheme.  To be considered for an update release, a patch must (1)
-fix a significant bug, and (2) already be merged into the mainline for the
-next development kernel.  Kernels will typically receive stable updates for
-a little more than one development cycle past their initial release.  So,
-for example, the 4.13 kernel's history looked like:
+"stable team," currently Greg Kroah-Hartman. The stable team will release
+occasional updates to the stable release using the 5.x.y numbering scheme.
+To be considered for an update release, a patch must (1) fix a significant
+bug, and (2) already be merged into the mainline for the next development
+kernel. Kernels will typically receive stable updates for a little more
+than one development cycle past their initial release. So, for example, the
+5.2 kernel's history looked like this (all dates in 2019):
 
 	==============  ===============================
-	September 3 	4.13 stable release
-	September 13	4.13.1
-	September 20	4.13.2
-	September 27	4.13.3
-	October 5	4.13.4
-	October 12  	4.13.5
+	July 7		5.2 stable release
+	July 14		5.2.1
+	July 21		5.2.2
+	July 26		5.2.3
+	July 28		5.2.4
+	July 31  	5.2.5
 	...		...
-	November 24	4.13.16
+	October 11	5.2.21
 	==============  ===============================
 
-4.13.16 was the final stable update of the 4.13 release.
+5.2.21 was the final stable update of the 5.2 release.
 
 Some kernels are designated "long term" kernels; they will receive support
 for a longer period.  As of this writing, the current long term kernels
 and their maintainers are:
 
-	======  ======================  ==============================
-	3.16	Ben Hutchings		(very long-term stable kernel)
-	4.1	Sasha Levin
-	4.4	Greg Kroah-Hartman	(very long-term stable kernel)
-	4.9	Greg Kroah-Hartman
-	4.14	Greg Kroah-Hartman
-	======  ======================  ==============================
+	======  ================================	=======================
+	3.16	Ben Hutchings				(very long-term kernel)
+	4.4	Greg Kroah-Hartman & Sasha Levin	(very long-term kernel)
+	4.9	Greg Kroah-Hartman & Sasha Levin
+	4.14	Greg Kroah-Hartman & Sasha Levin
+	4.19	Greg Kroah-Hartman & Sasha Levin
+	5.4	Greg Kroah-Hartman & Sasha Levin
+	======  ================================	=======================
 
 The selection of a kernel for long-term support is purely a matter of a
 maintainer having the need and the time to maintain that release.  There
@@ -215,12 +217,12 @@ How patches get into the Kernel
 -------------------------------
 
 There is exactly one person who can merge patches into the mainline kernel
-repository: Linus Torvalds.  But, of the over 9,500 patches which went
-into the 2.6.38 kernel, only 112 (around 1.3%) were directly chosen by Linus
-himself.  The kernel project has long since grown to a size where no single
-developer could possibly inspect and select every patch unassisted.  The
-way the kernel developers have addressed this growth is through the use of
-a lieutenant system built around a chain of trust.
+repository: Linus Torvalds. But, for example, of the over 9,500 patches
+which went into the 2.6.38 kernel, only 112 (around 1.3%) were directly
+chosen by Linus himself. The kernel project has long since grown to a size
+where no single developer could possibly inspect and select every patch
+unassisted. The way the kernel developers have addressed this growth is
+through the use of a lieutenant system built around a chain of trust.
 
 The kernel code base is logically broken down into a set of subsystems:
 networking, specific architecture support, memory management, video
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index edb296c52f61..acb2f1b36350 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -284,9 +284,9 @@ context lines.
 4) Naming
 ---------
 
-C is a Spartan language, and so should your naming be.  Unlike Modula-2
-and Pascal programmers, C programmers do not use cute names like
-ThisVariableIsATemporaryCounter.  A C programmer would call that
+C is a Spartan language, and your naming conventions should follow suit.
+Unlike Modula-2 and Pascal programmers, C programmers do not use cute
+names like ThisVariableIsATemporaryCounter. A C programmer would call that
 variable ``tmp``, which is much easier to write, and not the least more
 difficult to understand.
 
@@ -300,9 +300,9 @@ that counts the number of active users, you should call that
 ``count_active_users()`` or similar, you should **not** call it ``cntusr()``.
 
 Encoding the type of a function into the name (so-called Hungarian
-notation) is brain damaged - the compiler knows the types anyway and can
-check those, and it only confuses the programmer.  No wonder MicroSoft
-makes buggy programs.
+notation) is asinine - the compiler knows the types anyway and can check
+those, and it only confuses the programmer. No wonder Microsoft makes buggy
+programs.
 
 LOCAL variable names should be short, and to the point.  If you have
 some random integer loop counter, it should probably be called ``i``.
@@ -806,9 +806,9 @@ covers RTL which is used frequently with assembly language in the kernel.
 ----------------------------
 
 Kernel developers like to be seen as literate. Do mind the spelling
-of kernel messages to make a good impression. Do not use crippled
-words like ``dont``; use ``do not`` or ``don't`` instead.  Make the messages
-concise, clear, and unambiguous.
+of kernel messages to make a good impression. Do not use incorrect
+contractions like ``dont``; use ``do not`` or ``don't`` instead. Make the
+messages concise, clear, and unambiguous.
 
 Kernel messages do not have to be terminated with a period.
 
diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst
index b6f5a379ad6c..70791e153de1 100644
--- a/Documentation/process/howto.rst
+++ b/Documentation/process/howto.rst
@@ -243,10 +243,10 @@ branches.  These different branches are:
 Mainline tree
 ~~~~~~~~~~~~~
 
-Mainline tree are maintained by Linus Torvalds, and can be found at
+The mainline tree is maintained by Linus Torvalds, and can be found at
 https://kernel.org or in the repo.  Its development process is as follows:
 
-  - As soon as a new kernel is released a two weeks window is open,
+  - As soon as a new kernel is released a two-week window is open,
     during this period of time maintainers can submit big diffs to
     Linus, usually the patches that have already been included in the
     linux-next for a few weeks.  The preferred way to submit big changes
@@ -281,8 +281,9 @@ Various stable trees with multiple major numbers
 
 Kernels with 3-part versions are -stable kernels. They contain
 relatively small and critical fixes for security problems or significant
-regressions discovered in a given major mainline release, with the first
-2-part of version number are the same correspondingly.
+regressions discovered in a given major mainline release. Each release
+in a major stable series increments the third part of the version
+number, keeping the first two parts the same.
 
 This is the recommended branch for users who want the most recent stable
 kernel and are not interested in helping test development/experimental
@@ -359,10 +360,10 @@ Managing bug reports
 
 One of the best ways to put into practice your hacking skills is by fixing
 bugs reported by other people. Not only you will help to make the kernel
-more stable, you'll learn to fix real world problems and you will improve
-your skills, and other developers will be aware of your presence. Fixing
-bugs is one of the best ways to get merits among other developers, because
-not many people like wasting time fixing other people's bugs.
+more stable, but you'll also learn to fix real world problems and you will
+improve your skills, and other developers will be aware of your presence.
+Fixing bugs is one of the best ways to get merits among other developers,
+because not many people like wasting time fixing other people's bugs.
 
 To work in the already reported bug reports, go to https://bugzilla.kernel.org.
 
-- 
cgit 


From 965fc39f73932041441e03730db31516e285b61a Mon Sep 17 00:00:00 2001
From: Randy Dunlap <rdunlap@infradead.org>
Date: Sat, 15 Feb 2020 23:26:06 -0800
Subject: Documentation: sort _SPHINXDIRS for 'make help'

Sort the _SPHINXDIRS so that the 'make help' output is easier to read &
search and in a predictable order instead of some unknown pseudo-random
order.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
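
Reviewer note: for readers less familiar with GNU make, the new
expression can be approximated as below. This is a rough sketch under
assumptions: the helper name `sphinx_dirs` and the sample paths are made
up for illustration, and the input list stands in for what $(wildcard)
would return. GNU make's $(sort) also removes duplicate words, which the
sketch mirrors with a set.

```python
def sphinx_dirs(index_files, doc_root="Documentation"):
    """Approximate $(sort $(patsubst <root>/%/index.rst,%,<files>)).

    Hypothetical helper: strips the leading '<root>/' and trailing
    '/index.rst' from each matching path, then sorts the result and
    drops duplicates, as make's sort function does.
    """
    prefix = doc_root + "/"
    suffix = "/index.rst"
    names = set()
    for path in index_files:
        matches = (
            path.startswith(prefix)
            and path.endswith(suffix)
            and len(path) > len(prefix) + len(suffix)
        )
        if matches:
            names.add(path[len(prefix):-len(suffix)])
        else:
            # patsubst leaves words that do not match the pattern unchanged.
            names.add(path)
    return sorted(names)


if __name__ == "__main__":
    unsorted = [
        "Documentation/process/index.rst",
        "Documentation/admin-guide/index.rst",
        "Documentation/driver-api/index.rst",
    ]
    print(sphinx_dirs(unsorted))
```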

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d77bb607aea4..79ecee62d597 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -13,7 +13,7 @@ endif
 SPHINXBUILD   = sphinx-build
 SPHINXOPTS    =
 SPHINXDIRS    = .
-_SPHINXDIRS   = $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst))
+_SPHINXDIRS   = $(sort $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)))
 SPHINX_CONF   = conf.py
 PAPER         =
 BUILDDIR      = $(obj)/output
-- 
cgit 


From 1733ec77d34059cd67a7b9677fe2fd3ef977afb3 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Fri, 14 Feb 2020 18:41:32 +0100
Subject: docs: driver-api: edid: Fix list formatting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Without the empty lines, Sphinx renders the list as part of the running
text.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/edid.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/driver-api/edid.rst b/Documentation/driver-api/edid.rst
index b1b5acd501ed..7dc07942ceb2 100644
--- a/Documentation/driver-api/edid.rst
+++ b/Documentation/driver-api/edid.rst
@@ -11,11 +11,13 @@ Today, with the advent of Kernel Mode Setting, a graphics board is
 either correctly working because all components follow the standards -
 or the computer is unusable, because the screen remains dark after
 booting or it displays the wrong area. Cases when this happens are:
+
 - The graphics board does not recognize the monitor.
 - The graphics board is unable to detect any EDID data.
 - The graphics board incorrectly forwards EDID data to the driver.
 - The monitor sends no or bogus EDID data.
 - A KVM sends its own EDID data instead of querying the connected monitor.
+
 Adding the kernel parameter "nomodeset" helps in most cases, but causes
 restrictions later on.
 
-- 
cgit 


From 320bfd91a985f2b945bad611c43add8a3a359845 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Fri, 14 Feb 2020 18:41:33 +0100
Subject: docs: admin-guide: Move edid.rst from driver-api
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This document describes actions that an admin can do, rather than
interfaces available to driver developers, so admin-guide seems to
be a more appropriate place for it.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/edid.rst  | 60 +++++++++++++++++++++++++++++++++++++
 Documentation/admin-guide/index.rst |  1 +
 Documentation/driver-api/edid.rst   | 60 -------------------------------------
 Documentation/driver-api/index.rst  |  1 -
 4 files changed, 61 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/admin-guide/edid.rst
 delete mode 100644 Documentation/driver-api/edid.rst
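
Reviewer note: the X11-to-EDID mapping described at the end of edid.rst
can be checked with a few lines of arithmetic. The sketch below is
illustrative only; the function name `x11_to_edid` is hypothetical, and
the sample timings are the commonly used 1280x1024 60 Hz modeline,
supplied here as an assumed input rather than taken from the document.

```python
def x11_to_edid(hdisp, hsyncstart, hsyncend, htotal,
                vdisp, vsyncstart, vsyncend, vtotal):
    """Convert X11-style timings to the values the EDID sources expect.

    Hypothetical helper that applies the #define relations listed in
    edid.rst and returns them keyed by macro name.
    """
    return {
        "XPIX": hdisp,
        "XBLANK": htotal - hdisp,
        "XOFFSET": hsyncstart - hdisp,
        "XPULSE": hsyncend - hsyncstart,
        "YPIX": vdisp,
        "YBLANK": vtotal - vdisp,
        "YOFFSET": vsyncstart - vdisp,
        "YPULSE": vsyncend - vsyncstart,
    }


if __name__ == "__main__":
    # Assumed example input: 1280x1024 at 60 Hz.
    edid = x11_to_edid(1280, 1328, 1440, 1688, 1024, 1025, 1028, 1066)
    for name, value in edid.items():
        print(f"#define {name} {value}")
```

With this input the computed values agree with the XBLANK, YBLANK,
XOFFSET, XPULSE, YOFFSET and YPULSE defines in 1280x1024.S.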

diff --git a/Documentation/admin-guide/edid.rst b/Documentation/admin-guide/edid.rst
new file mode 100644
index 000000000000..7dc07942ceb2
--- /dev/null
+++ b/Documentation/admin-guide/edid.rst
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+EDID
+====
+
+In the good old days when graphics parameters were configured explicitly
+in a file called xorg.conf, even broken hardware could be managed.
+
+Today, with the advent of Kernel Mode Setting, a graphics board is
+either correctly working because all components follow the standards -
+or the computer is unusable, because the screen remains dark after
+booting or it displays the wrong area. Cases when this happens are:
+
+- The graphics board does not recognize the monitor.
+- The graphics board is unable to detect any EDID data.
+- The graphics board incorrectly forwards EDID data to the driver.
+- The monitor sends no or bogus EDID data.
+- A KVM sends its own EDID data instead of querying the connected monitor.
+
+Adding the kernel parameter "nomodeset" helps in most cases, but causes
+restrictions later on.
+
+As a remedy for such situations, the kernel configuration item
+CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows to provide an
+individually prepared or corrected EDID data set in the /lib/firmware
+directory from where it is loaded via the firmware interface. The code
+(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
+commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
+1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
+not contain code to create these data. In order to elucidate the origin
+of the built-in binary EDID blobs and to facilitate the creation of
+individual data for a specific misbehaving monitor, commented sources
+and a Makefile environment are given here.
+
+To create binary EDID and C source code files from the existing data
+material, simply type "make".
+
+If you want to create your own EDID file, copy the file 1024x768.S,
+replace the settings with your own data and add a new target to the
+Makefile. Please note that the EDID data structure expects the timing
+values in a different way as compared to the standard X11 format.
+
+X11:
+  HTimings:
+    hdisp hsyncstart hsyncend htotal
+  VTimings:
+    vdisp vsyncstart vsyncend vtotal
+
+EDID::
+
+  #define XPIX hdisp
+  #define XBLANK htotal-hdisp
+  #define XOFFSET hsyncstart-hdisp
+  #define XPULSE hsyncend-hsyncstart
+
+  #define YPIX vdisp
+  #define YBLANK vtotal-vdisp
+  #define YOFFSET vsyncstart-vdisp
+  #define YPULSE vsyncend-vsyncstart
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index f1d0ccffbe72..5a6269fb8593 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -75,6 +75,7 @@ configure specific aspects of kernel behavior to your liking.
    cputopology
    dell_rbu
    device-mapper/index
+   edid
    efi-stub
    ext4
    nfs/index
diff --git a/Documentation/driver-api/edid.rst b/Documentation/driver-api/edid.rst
deleted file mode 100644
index 7dc07942ceb2..000000000000
--- a/Documentation/driver-api/edid.rst
+++ /dev/null
@@ -1,60 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-====
-EDID
-====
-
-In the good old days when graphics parameters were configured explicitly
-in a file called xorg.conf, even broken hardware could be managed.
-
-Today, with the advent of Kernel Mode Setting, a graphics board is
-either correctly working because all components follow the standards -
-or the computer is unusable, because the screen remains dark after
-booting or it displays the wrong area. Cases when this happens are:
-
-- The graphics board does not recognize the monitor.
-- The graphics board is unable to detect any EDID data.
-- The graphics board incorrectly forwards EDID data to the driver.
-- The monitor sends no or bogus EDID data.
-- A KVM sends its own EDID data instead of querying the connected monitor.
-
-Adding the kernel parameter "nomodeset" helps in most cases, but causes
-restrictions later on.
-
-As a remedy for such situations, the kernel configuration item
-CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows to provide an
-individually prepared or corrected EDID data set in the /lib/firmware
-directory from where it is loaded via the firmware interface. The code
-(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
-commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
-1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
-not contain code to create these data. In order to elucidate the origin
-of the built-in binary EDID blobs and to facilitate the creation of
-individual data for a specific misbehaving monitor, commented sources
-and a Makefile environment are given here.
-
-To create binary EDID and C source code files from the existing data
-material, simply type "make".
-
-If you want to create your own EDID file, copy the file 1024x768.S,
-replace the settings with your own data and add a new target to the
-Makefile. Please note that the EDID data structure expects the timing
-values in a different way as compared to the standard X11 format.
-
-X11:
-  HTimings:
-    hdisp hsyncstart hsyncend htotal
-  VTimings:
-    vdisp vsyncstart vsyncend vtotal
-
-EDID::
-
-  #define XPIX hdisp
-  #define XBLANK htotal-hdisp
-  #define XOFFSET hsyncstart-hdisp
-  #define XPULSE hsyncend-hsyncstart
-
-  #define YPIX vdisp
-  #define YBLANK vtotal-vdisp
-  #define YOFFSET vsyncstart-vdisp
-  #define YPULSE vsyncend-vsyncstart
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 0ebe205efd0c..ea3003b3c5e5 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -74,7 +74,6 @@ available subsections can be seen below.
    connector
    console
    dcdbas
-   edid
    eisa
    ipmb
    isa
-- 
cgit 


From b4ce545f349b711351ec4b0df7a3302d91c3dd45 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Fri, 14 Feb 2020 18:41:35 +0100
Subject: docs: admin-guide: edid: Clarify where to run "make"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When both the documentation and the data files lived in
Documentation/EDID, this wasn't necessary, but both have
been moved to other directories in the meantime.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/edid.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/edid.rst b/Documentation/admin-guide/edid.rst
index 7dc07942ceb2..80deeb21a265 100644
--- a/Documentation/admin-guide/edid.rst
+++ b/Documentation/admin-guide/edid.rst
@@ -34,7 +34,7 @@ individual data for a specific misbehaving monitor, commented sources
 and a Makefile environment are given here.
 
 To create binary EDID and C source code files from the existing data
-material, simply type "make".
+material, simply type "make" in tools/edid/.
 
 If you want to create your own EDID file, copy the file 1024x768.S,
 replace the settings with your own data and add a new target to the
-- 
cgit 


From e2c79ab7d75b4c6ed827e8078e5ebe2d059edafc Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Fri, 14 Feb 2020 18:41:34 +0100
Subject: tools/edid: Move EDID data sets from Documentation/
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The EDID files are not really documentation.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/EDID/1024x768.S  |  43 -------
 Documentation/EDID/1280x1024.S |  43 -------
 Documentation/EDID/1600x1200.S |  43 -------
 Documentation/EDID/1680x1050.S |  43 -------
 Documentation/EDID/1920x1080.S |  43 -------
 Documentation/EDID/800x600.S   |  40 ------
 Documentation/EDID/Makefile    |  37 ------
 Documentation/EDID/edid.S      | 274 -----------------------------------------
 Documentation/EDID/hex         |   1 -
 tools/edid/1024x768.S          |  43 +++++++
 tools/edid/1280x1024.S         |  43 +++++++
 tools/edid/1600x1200.S         |  43 +++++++
 tools/edid/1680x1050.S         |  43 +++++++
 tools/edid/1920x1080.S         |  43 +++++++
 tools/edid/800x600.S           |  40 ++++++
 tools/edid/Makefile            |  37 ++++++
 tools/edid/edid.S              | 274 +++++++++++++++++++++++++++++++++++++++++
 tools/edid/hex                 |   1 +
 18 files changed, 567 insertions(+), 567 deletions(-)
 delete mode 100644 Documentation/EDID/1024x768.S
 delete mode 100644 Documentation/EDID/1280x1024.S
 delete mode 100644 Documentation/EDID/1600x1200.S
 delete mode 100644 Documentation/EDID/1680x1050.S
 delete mode 100644 Documentation/EDID/1920x1080.S
 delete mode 100644 Documentation/EDID/800x600.S
 delete mode 100644 Documentation/EDID/Makefile
 delete mode 100644 Documentation/EDID/edid.S
 delete mode 100644 Documentation/EDID/hex
 create mode 100644 tools/edid/1024x768.S
 create mode 100644 tools/edid/1280x1024.S
 create mode 100644 tools/edid/1600x1200.S
 create mode 100644 tools/edid/1680x1050.S
 create mode 100644 tools/edid/1920x1080.S
 create mode 100644 tools/edid/800x600.S
 create mode 100644 tools/edid/Makefile
 create mode 100644 tools/edid/edid.S
 create mode 100644 tools/edid/hex

diff --git a/Documentation/EDID/1024x768.S b/Documentation/EDID/1024x768.S
deleted file mode 100644
index 4aed3f9ab88a..000000000000
--- a/Documentation/EDID/1024x768.S
+++ /dev/null
@@ -1,43 +0,0 @@
-/*
-   1024x768.S: EDID data set for standard 1024x768 60 Hz monitor
-
-   Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org>
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License
-   as published by the Free Software Foundation; either version 2
-   of the License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
-*/
-
-/* EDID */
-#define VERSION 1
-#define REVISION 3
-
-/* Display */
-#define CLOCK 65000 /* kHz */
-#define XPIX 1024
-#define YPIX 768
-#define XY_RATIO XY_RATIO_4_3
-#define XBLANK 320
-#define YBLANK 38
-#define XOFFSET 8
-#define XPULSE 144
-#define YOFFSET 3
-#define YPULSE 6
-#define DPI 72
-#define VFREQ 60 /* Hz */
-#define TIMING_NAME "Linux XGA"
-#define ESTABLISHED_TIMING2_BITS 0x08 /* Bit 3 -> 1024x768 @60 Hz */
-#define HSYNC_POL 0
-#define VSYNC_POL 0
-
-#include "edid.S"
diff --git a/Documentation/EDID/1280x1024.S b/Documentation/EDID/1280x1024.S
deleted file mode 100644
index b26dd424cad7..000000000000
--- a/Documentation/EDID/1280x1024.S
+++ /dev/null
@@ -1,43 +0,0 @@
-/*
-   1280x1024.S: EDID data set for standard 1280x1024 60 Hz monitor
-
-   Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org>
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License
-   as published by the Free Software Foundation; either version 2
-   of the License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
-*/
-
-/* EDID */
-#define VERSION 1
-#define REVISION 3
-
-/* Display */
-#define CLOCK 108000 /* kHz */
-#define XPIX 1280
-#define YPIX 1024
-#define XY_RATIO XY_RATIO_5_4
-#define XBLANK 408
-#define YBLANK 42
-#define XOFFSET 48
-#define XPULSE 112
-#define YOFFSET 1
-#define YPULSE 3
-#define DPI 72
-#define VFREQ 60 /* Hz */
-#define TIMING_NAME "Linux SXGA"
-/* No ESTABLISHED_TIMINGx_BITS */
-#define HSYNC_POL 1
-#define VSYNC_POL 1
-
-#include "edid.S"
diff --git a/Documentation/EDID/1600x1200.S b/Documentation/EDID/1600x1200.S
deleted file mode 100644
index 0d091b282768..000000000000
--- a/Documentation/EDID/1600x1200.S
+++ /dev/null
@@ -1,43 +0,0 @@
-/*
-   1600x1200.S: EDID data set for standard 1600x1200 60 Hz monitor
-
-   Copyright (C) 2013 Carsten Emde <C.Emde@osadl.org>
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License
-   as published by the Free Software Foundation; either version 2
-   of the License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
-*/
-
-/* EDID */
-#define VERSION 1
-#define REVISION 3
-
-/* Display */
-#define CLOCK 162000 /* kHz */
-#define XPIX 1600
-#define YPIX 1200
-#define XY_RATIO XY_RATIO_4_3
-#define XBLANK 560
-#define YBLANK 50
-#define XOFFSET 64
-#define XPULSE 192
-#define YOFFSET 1
-#define YPULSE 3
-#define DPI 72
-#define VFREQ 60 /* Hz */
-#define TIMING_NAME "Linux UXGA"
-/* No ESTABLISHED_TIMINGx_BITS */
-#define HSYNC_POL 1
-#define VSYNC_POL 1
-
-#include "edid.S"
diff --git a/Documentation/EDID/1680x1050.S b/Documentation/EDID/1680x1050.S
deleted file mode 100644
index 7dfed9a33eab..000000000000
--- a/Documentation/EDID/1680x1050.S
+++ /dev/null
@@ -1,43 +0,0 @@
-/*
-   1680x1050.S: EDID data set for standard 1680x1050 60 Hz monitor
-
-   Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org>
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License
-   as published by the Free Software Foundation; either version 2
-   of the License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
-*/
-
-/* EDID */
-#define VERSION 1
-#define REVISION 3
-
-/* Display */
-#define CLOCK 146250 /* kHz */
-#define XPIX 1680
-#define YPIX 1050
-#define XY_RATIO XY_RATIO_16_10
-#define XBLANK 560
-#define YBLANK 39
-#define XOFFSET 104
-#define XPULSE 176
-#define YOFFSET 3
-#define YPULSE 6
-#define DPI 96
-#define VFREQ 60 /* Hz */
-#define TIMING_NAME "Linux WSXGA"
-/* No ESTABLISHED_TIMINGx_BITS */
-#define HSYNC_POL 1
-#define VSYNC_POL 1
-
-#include "edid.S"
diff --git a/Documentation/EDID/1920x1080.S b/Documentation/EDID/1920x1080.S
deleted file mode 100644
index d6ffbba28e95..000000000000
--- a/Documentation/EDID/1920x1080.S
+++ /dev/null
@@ -1,43 +0,0 @@
-/*
-   1920x1080.S: EDID data set for standard 1920x1080 60 Hz monitor
-
-   Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org>
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License
-   as published by the Free Software Foundation; either version 2
-   of the License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
-*/
-
-/* EDID */
-#define VERSION 1
-#define REVISION 3
-
-/* Display */
-#define CLOCK 148500 /* kHz */
-#define XPIX 1920
-#define YPIX 1080
-#define XY_RATIO XY_RATIO_16_9
-#define XBLANK 280
-#define YBLANK 45
-#define XOFFSET 88
-#define XPULSE 44
-#define YOFFSET 4
-#define YPULSE 5
-#define DPI 96
-#define VFREQ 60 /* Hz */
-#define TIMING_NAME "Linux FHD"
-/* No ESTABLISHED_TIMINGx_BITS */
-#define HSYNC_POL 1
-#define VSYNC_POL 1
-
-#include "edid.S"
diff --git a/Documentation/EDID/800x600.S b/Documentation/EDID/800x600.S
deleted file mode 100644
index a5616588de08..000000000000
--- a/Documentation/EDID/800x600.S
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
-   800x600.S: EDID data set for standard 800x600 60 Hz monitor
-
-   Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org>
-   Copyright (C) 2014 Linaro Limited
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License
-   as published by the Free Software Foundation; either version 2
-   of the License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-*/
-
-/* EDID */
-#define VERSION 1
-#define REVISION 3
-
-/* Display */
-#define CLOCK 40000 /* kHz */
-#define XPIX 800
-#define YPIX 600
-#define XY_RATIO XY_RATIO_4_3
-#define XBLANK 256
-#define YBLANK 28
-#define XOFFSET 40
-#define XPULSE 128
-#define YOFFSET 1
-#define YPULSE 4
-#define DPI 72
-#define VFREQ 60 /* Hz */
-#define TIMING_NAME "Linux SVGA"
-#define ESTABLISHED_TIMING1_BITS 0x01 /* Bit 0: 800x600 @ 60Hz */
-#define HSYNC_POL 1
-#define VSYNC_POL 1
-
-#include "edid.S"
diff --git a/Documentation/EDID/Makefile b/Documentation/EDID/Makefile
deleted file mode 100644
index 85a927dfab02..000000000000
--- a/Documentation/EDID/Makefile
+++ /dev/null
@@ -1,37 +0,0 @@
-
-SOURCES	:= $(wildcard [0-9]*x[0-9]*.S)
-
-BIN	:= $(patsubst %.S, %.bin, $(SOURCES))
-
-IHEX	:= $(patsubst %.S, %.bin.ihex, $(SOURCES))
-
-CODE	:= $(patsubst %.S, %.c, $(SOURCES))
-
-all:	$(BIN) $(IHEX) $(CODE)
-
-clean:
-	@rm -f *.o *.bin.ihex *.bin *.c
-
-%.o:	%.S
-	@cc -c $^
-
-%.bin.nocrc:	%.o
-	@objcopy -Obinary $^ $@
-
-%.crc:	%.bin.nocrc
-	@list=$$(for i in `seq 1 127`; do head -c$$i $^ | tail -c1 \
-		| hexdump -v -e '/1 "%02X+"'; done); \
-		echo "ibase=16;100-($${list%?})%100" | bc >$@
-
-%.p:	%.crc %.S
-	@cc -c -DCRC="$$(cat $*.crc)" -o $@ $*.S
-
-%.bin:	%.p
-	@objcopy -Obinary $^ $@
-
-%.bin.ihex:	%.p
-	@objcopy -Oihex $^ $@
-	@dos2unix $@ 2>/dev/null
-
-%.c:	%.bin
-	@echo "{" >$@; hexdump -f hex $^ >>$@; echo "};" >>$@
diff --git a/Documentation/EDID/edid.S b/Documentation/EDID/edid.S
deleted file mode 100644
index c3d13815526d..000000000000
--- a/Documentation/EDID/edid.S
+++ /dev/null
@@ -1,274 +0,0 @@
-/*
-   edid.S: EDID data template
-
-   Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org>
-
-   This program is free software; you can redistribute it and/or
-   modify it under the terms of the GNU General Public License
-   as published by the Free Software Foundation; either version 2
-   of the License, or (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
-*/
-
-
-/* Manufacturer */
-#define MFG_LNX1 'L'
-#define MFG_LNX2 'N'
-#define MFG_LNX3 'X'
-#define SERIAL 0
-#define YEAR 2012
-#define WEEK 5
-
-/* EDID 1.3 standard definitions */
-#define XY_RATIO_16_10	0b00
-#define XY_RATIO_4_3	0b01
-#define XY_RATIO_5_4	0b10
-#define XY_RATIO_16_9	0b11
-
-/* Provide defaults for the timing bits */
-#ifndef ESTABLISHED_TIMING1_BITS
-#define ESTABLISHED_TIMING1_BITS 0x00
-#endif
-#ifndef ESTABLISHED_TIMING2_BITS
-#define ESTABLISHED_TIMING2_BITS 0x00
-#endif
-#ifndef ESTABLISHED_TIMING3_BITS
-#define ESTABLISHED_TIMING3_BITS 0x00
-#endif
-
-#define mfgname2id(v1,v2,v3) \
-	((((v1-'@')&0x1f)<<10)+(((v2-'@')&0x1f)<<5)+((v3-'@')&0x1f))
-#define swap16(v1) ((v1>>8)+((v1&0xff)<<8))
-#define lsbs2(v1,v2) (((v1&0x0f)<<4)+(v2&0x0f))
-#define msbs2(v1,v2) ((((v1>>8)&0x0f)<<4)+((v2>>8)&0x0f))
-#define msbs4(v1,v2,v3,v4) \
-	((((v1>>8)&0x03)<<6)+(((v2>>8)&0x03)<<4)+\
-	(((v3>>4)&0x03)<<2)+((v4>>4)&0x03))
-#define pixdpi2mm(pix,dpi) ((pix*25)/dpi)
-#define xsize pixdpi2mm(XPIX,DPI)
-#define ysize pixdpi2mm(YPIX,DPI)
-
-		.data
-
-/* Fixed header pattern */
-header:		.byte	0x00,0xff,0xff,0xff,0xff,0xff,0xff,0x00
-
-mfg_id:		.hword	swap16(mfgname2id(MFG_LNX1, MFG_LNX2, MFG_LNX3))
-
-prod_code:	.hword	0
-
-/* Serial number. 32 bits, little endian. */
-serial_number:	.long	SERIAL
-
-/* Week of manufacture */
-week:		.byte	WEEK
-
-/* Year of manufacture, less 1990. (1990-2245)
-   If week=255, it is the model year instead */
-year:		.byte	YEAR-1990
-
-version:	.byte	VERSION 	/* EDID version, usually 1 (for 1.3) */
-revision:	.byte	REVISION	/* EDID revision, usually 3 (for 1.3) */
-
-/* If Bit 7=1	Digital input. If set, the following bit definitions apply:
-     Bits 6-1	Reserved, must be 0
-     Bit 0	Signal is compatible with VESA DFP 1.x TMDS CRGB,
-		  1 pixel per clock, up to 8 bits per color, MSB aligned,
-   If Bit 7=0	Analog input. If clear, the following bit definitions apply:
-     Bits 6-5	Video white and sync levels, relative to blank
-		  00=+0.7/-0.3 V; 01=+0.714/-0.286 V;
-		  10=+1.0/-0.4 V; 11=+0.7/0 V
-   Bit 4	Blank-to-black setup (pedestal) expected
-   Bit 3	Separate sync supported
-   Bit 2	Composite sync (on HSync) supported
-   Bit 1	Sync on green supported
-   Bit 0	VSync pulse must be serrated when somposite or
-		  sync-on-green is used. */
-video_parms:	.byte	0x6d
-
-/* Maximum horizontal image size, in centimetres
-   (max 292 cm/115 in at 16:9 aspect ratio) */
-max_hor_size:	.byte	xsize/10
-
-/* Maximum vertical image size, in centimetres.
-   If either byte is 0, undefined (e.g. projector) */
-max_vert_size:	.byte	ysize/10
-
-/* Display gamma, minus 1, times 100 (range 1.00-3.5 */
-gamma:		.byte	120
-
-/* Bit 7	DPMS standby supported
-   Bit 6	DPMS suspend supported
-   Bit 5	DPMS active-off supported
-   Bits 4-3	Display type: 00=monochrome; 01=RGB colour;
-		  10=non-RGB multicolour; 11=undefined
-   Bit 2	Standard sRGB colour space. Bytes 25-34 must contain
-		  sRGB standard values.
-   Bit 1	Preferred timing mode specified in descriptor block 1.
-   Bit 0	GTF supported with default parameter values. */
-dsp_features:	.byte	0xea
-
-/* Chromaticity coordinates. */
-/* Red and green least-significant bits
-   Bits 7-6	Red x value least-significant 2 bits
-   Bits 5-4	Red y value least-significant 2 bits
-   Bits 3-2	Green x value lst-significant 2 bits
-   Bits 1-0	Green y value least-significant 2 bits */
-red_green_lsb:	.byte	0x5e
-
-/* Blue and white least-significant 2 bits */
-blue_white_lsb:	.byte	0xc0
-
-/* Red x value most significant 8 bits.
-   0-255 encodes 0-0.996 (255/256); 0-0.999 (1023/1024) with lsbits */
-red_x_msb:	.byte	0xa4
-
-/* Red y value most significant 8 bits */
-red_y_msb:	.byte	0x59
-
-/* Green x and y value most significant 8 bits */
-green_x_y_msb:	.byte	0x4a,0x98
-
-/* Blue x and y value most significant 8 bits */
-blue_x_y_msb:	.byte	0x25,0x20
-
-/* Default white point x and y value most significant 8 bits */
-white_x_y_msb:	.byte	0x50,0x54
-
-/* Established timings */
-/* Bit 7	720x400 @ 70 Hz
-   Bit 6	720x400 @ 88 Hz
-   Bit 5	640x480 @ 60 Hz
-   Bit 4	640x480 @ 67 Hz
-   Bit 3	640x480 @ 72 Hz
-   Bit 2	640x480 @ 75 Hz
-   Bit 1	800x600 @ 56 Hz
-   Bit 0	800x600 @ 60 Hz */
-estbl_timing1:	.byte	ESTABLISHED_TIMING1_BITS
-
-/* Bit 7	800x600 @ 72 Hz
-   Bit 6	800x600 @ 75 Hz
-   Bit 5	832x624 @ 75 Hz
-   Bit 4	1024x768 @ 87 Hz, interlaced (1024x768)
-   Bit 3	1024x768 @ 60 Hz
-   Bit 2	1024x768 @ 72 Hz
-   Bit 1	1024x768 @ 75 Hz
-   Bit 0	1280x1024 @ 75 Hz */
-estbl_timing2:	.byte	ESTABLISHED_TIMING2_BITS
-
-/* Bit 7	1152x870 @ 75 Hz (Apple Macintosh II)
-   Bits 6-0 	Other manufacturer-specific display mod */
-estbl_timing3:	.byte	ESTABLISHED_TIMING3_BITS
-
-/* Standard timing */
-/* X resolution, less 31, divided by 8 (256-2288 pixels) */
-std_xres:	.byte	(XPIX/8)-31
-/* Y resolution, X:Y pixel ratio
-   Bits 7-6	X:Y pixel ratio: 00=16:10; 01=4:3; 10=5:4; 11=16:9.
-   Bits 5-0	Vertical frequency, less 60 (60-123 Hz) */
-std_vres:	.byte	(XY_RATIO<<6)+VFREQ-60
-		.fill	7,2,0x0101	/* Unused */
-
-descriptor1:
-/* Pixel clock in 10 kHz units. (0.-655.35 MHz, little-endian) */
-clock:		.hword	CLOCK/10
-
-/* Horizontal active pixels 8 lsbits (0-4095) */
-x_act_lsb:	.byte	XPIX&0xff
-/* Horizontal blanking pixels 8 lsbits (0-4095)
-   End of active to start of next active. */
-x_blk_lsb:	.byte	XBLANK&0xff
-/* Bits 7-4 	Horizontal active pixels 4 msbits
-   Bits 3-0	Horizontal blanking pixels 4 msbits */
-x_msbs:		.byte	msbs2(XPIX,XBLANK)
-
-/* Vertical active lines 8 lsbits (0-4095) */
-y_act_lsb:	.byte	YPIX&0xff
-/* Vertical blanking lines 8 lsbits (0-4095) */
-y_blk_lsb:	.byte	YBLANK&0xff
-/* Bits 7-4 	Vertical active lines 4 msbits
-   Bits 3-0 	Vertical blanking lines 4 msbits */
-y_msbs:		.byte	msbs2(YPIX,YBLANK)
-
-/* Horizontal sync offset pixels 8 lsbits (0-1023) From blanking start */
-x_snc_off_lsb:	.byte	XOFFSET&0xff
-/* Horizontal sync pulse width pixels 8 lsbits (0-1023) */
-x_snc_pls_lsb:	.byte	XPULSE&0xff
-/* Bits 7-4 	Vertical sync offset lines 4 lsbits (0-63)
-   Bits 3-0 	Vertical sync pulse width lines 4 lsbits (0-63) */
-y_snc_lsb:	.byte	lsbs2(YOFFSET, YPULSE)
-/* Bits 7-6 	Horizontal sync offset pixels 2 msbits
-   Bits 5-4 	Horizontal sync pulse width pixels 2 msbits
-   Bits 3-2 	Vertical sync offset lines 2 msbits
-   Bits 1-0 	Vertical sync pulse width lines 2 msbits */
-xy_snc_msbs:	.byte	msbs4(XOFFSET,XPULSE,YOFFSET,YPULSE)
-
-/* Horizontal display size, mm, 8 lsbits (0-4095 mm, 161 in) */
-x_dsp_size:	.byte	xsize&0xff
-
-/* Vertical display size, mm, 8 lsbits (0-4095 mm, 161 in) */
-y_dsp_size:	.byte	ysize&0xff
-
-/* Bits 7-4 	Horizontal display size, mm, 4 msbits
-   Bits 3-0 	Vertical display size, mm, 4 msbits */
-dsp_size_mbsb:	.byte	msbs2(xsize,ysize)
-
-/* Horizontal border pixels (each side; total is twice this) */
-x_border:	.byte	0
-/* Vertical border lines (each side; total is twice this) */
-y_border:	.byte	0
-
-/* Bit 7 	Interlaced
-   Bits 6-5 	Stereo mode: 00=No stereo; other values depend on bit 0:
-   Bit 0=0: 01=Field sequential, sync=1 during right; 10=similar,
-     sync=1 during left; 11=4-way interleaved stereo
-   Bit 0=1 2-way interleaved stereo: 01=Right image on even lines;
-     10=Left image on even lines; 11=side-by-side
-   Bits 4-3 	Sync type: 00=Analog composite; 01=Bipolar analog composite;
-     10=Digital composite (on HSync); 11=Digital separate
-   Bit 2 	If digital separate: Vertical sync polarity (1=positive)
-   Other types: VSync serrated (HSync during VSync)
-   Bit 1 	If analog sync: Sync on all 3 RGB lines (else green only)
-   Digital: HSync polarity (1=positive)
-   Bit 0 	2-way line-interleaved stereo, if bits 4-3 are not 00. */
-features:	.byte	0x18+(VSYNC_POL<<2)+(HSYNC_POL<<1)
-
-descriptor2:	.byte	0,0	/* Not a detailed timing descriptor */
-		.byte	0	/* Must be zero */
-		.byte	0xff	/* Descriptor is monitor serial number (text) */
-		.byte	0	/* Must be zero */
-start1:		.ascii	"Linux #0"
-end1:		.byte	0x0a	/* End marker */
-		.fill	12-(end1-start1), 1, 0x20 /* Padded spaces */
-descriptor3:	.byte	0,0	/* Not a detailed timing descriptor */
-		.byte	0	/* Must be zero */
-		.byte	0xfd	/* Descriptor is monitor range limits */
-		.byte	0	/* Must be zero */
-start2:		.byte	VFREQ-1	/* Minimum vertical field rate (1-255 Hz) */
-		.byte	VFREQ+1	/* Maximum vertical field rate (1-255 Hz) */
-		.byte	(CLOCK/(XPIX+XBLANK))-1 /* Minimum horizontal line rate
-						    (1-255 kHz) */
-		.byte	(CLOCK/(XPIX+XBLANK))+1 /* Maximum horizontal line rate
-						    (1-255 kHz) */
-		.byte	(CLOCK/10000)+1	/* Maximum pixel clock rate, rounded up
-					   to 10 MHz multiple (10-2550 MHz) */
-		.byte	0	/* No extended timing information type */
-end2:		.byte	0x0a	/* End marker */
-		.fill	12-(end2-start2), 1, 0x20 /* Padded spaces */
-descriptor4:	.byte	0,0	/* Not a detailed timing descriptor */
-		.byte	0	/* Must be zero */
-		.byte	0xfc	/* Descriptor is text */
-		.byte	0	/* Must be zero */
-start3:		.ascii	TIMING_NAME
-end3:		.byte	0x0a	/* End marker */
-		.fill	12-(end3-start3), 1, 0x20 /* Padded spaces */
-extensions:	.byte	0	/* Number of extensions to follow */
-checksum:	.byte	CRC	/* Sum of all bytes must be 0 */
diff --git a/Documentation/EDID/hex b/Documentation/EDID/hex
deleted file mode 100644
index 8873ebb618af..000000000000
--- a/Documentation/EDID/hex
+++ /dev/null
@@ -1 +0,0 @@
-"\t" 8/1 "0x%02x, " "\n"
diff --git a/tools/edid/1024x768.S b/tools/edid/1024x768.S
new file mode 100644
index 000000000000..4aed3f9ab88a
--- /dev/null
+++ b/tools/edid/1024x768.S
@@ -0,0 +1,43 @@
+/*
+   1024x768.S: EDID data set for standard 1024x768 60 Hz monitor
+
+   Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org>
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
+*/
+
+/* EDID */
+#define VERSION 1
+#define REVISION 3
+
+/* Display */
+#define CLOCK 65000 /* kHz */
+#define XPIX 1024
+#define YPIX 768
+#define XY_RATIO XY_RATIO_4_3
+#define XBLANK 320
+#define YBLANK 38
+#define XOFFSET 8
+#define XPULSE 144
+#define YOFFSET 3
+#define YPULSE 6
+#define DPI 72
+#define VFREQ 60 /* Hz */
+#define TIMING_NAME "Linux XGA"
+#define ESTABLISHED_TIMING2_BITS 0x08 /* Bit 3 -> 1024x768 @60 Hz */
+#define HSYNC_POL 0
+#define VSYNC_POL 0
+
+#include "edid.S"
diff --git a/tools/edid/1280x1024.S b/tools/edid/1280x1024.S
new file mode 100644
index 000000000000..b26dd424cad7
--- /dev/null
+++ b/tools/edid/1280x1024.S
@@ -0,0 +1,43 @@
+/*
+   1280x1024.S: EDID data set for standard 1280x1024 60 Hz monitor
+
+   Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org>
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
+*/
+
+/* EDID */
+#define VERSION 1
+#define REVISION 3
+
+/* Display */
+#define CLOCK 108000 /* kHz */
+#define XPIX 1280
+#define YPIX 1024
+#define XY_RATIO XY_RATIO_5_4
+#define XBLANK 408
+#define YBLANK 42
+#define XOFFSET 48
+#define XPULSE 112
+#define YOFFSET 1
+#define YPULSE 3
+#define DPI 72
+#define VFREQ 60 /* Hz */
+#define TIMING_NAME "Linux SXGA"
+/* No ESTABLISHED_TIMINGx_BITS */
+#define HSYNC_POL 1
+#define VSYNC_POL 1
+
+#include "edid.S"
diff --git a/tools/edid/1600x1200.S b/tools/edid/1600x1200.S
new file mode 100644
index 000000000000..0d091b282768
--- /dev/null
+++ b/tools/edid/1600x1200.S
@@ -0,0 +1,43 @@
+/*
+   1600x1200.S: EDID data set for standard 1600x1200 60 Hz monitor
+
+   Copyright (C) 2013 Carsten Emde <C.Emde@osadl.org>
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
+*/
+
+/* EDID */
+#define VERSION 1
+#define REVISION 3
+
+/* Display */
+#define CLOCK 162000 /* kHz */
+#define XPIX 1600
+#define YPIX 1200
+#define XY_RATIO XY_RATIO_4_3
+#define XBLANK 560
+#define YBLANK 50
+#define XOFFSET 64
+#define XPULSE 192
+#define YOFFSET 1
+#define YPULSE 3
+#define DPI 72
+#define VFREQ 60 /* Hz */
+#define TIMING_NAME "Linux UXGA"
+/* No ESTABLISHED_TIMINGx_BITS */
+#define HSYNC_POL 1
+#define VSYNC_POL 1
+
+#include "edid.S"
diff --git a/tools/edid/1680x1050.S b/tools/edid/1680x1050.S
new file mode 100644
index 000000000000..7dfed9a33eab
--- /dev/null
+++ b/tools/edid/1680x1050.S
@@ -0,0 +1,43 @@
+/*
+   1680x1050.S: EDID data set for standard 1680x1050 60 Hz monitor
+
+   Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org>
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
+*/
+
+/* EDID */
+#define VERSION 1
+#define REVISION 3
+
+/* Display */
+#define CLOCK 146250 /* kHz */
+#define XPIX 1680
+#define YPIX 1050
+#define XY_RATIO XY_RATIO_16_10
+#define XBLANK 560
+#define YBLANK 39
+#define XOFFSET 104
+#define XPULSE 176
+#define YOFFSET 3
+#define YPULSE 6
+#define DPI 96
+#define VFREQ 60 /* Hz */
+#define TIMING_NAME "Linux WSXGA"
+/* No ESTABLISHED_TIMINGx_BITS */
+#define HSYNC_POL 1
+#define VSYNC_POL 1
+
+#include "edid.S"
diff --git a/tools/edid/1920x1080.S b/tools/edid/1920x1080.S
new file mode 100644
index 000000000000..d6ffbba28e95
--- /dev/null
+++ b/tools/edid/1920x1080.S
@@ -0,0 +1,43 @@
+/*
+   1920x1080.S: EDID data set for standard 1920x1080 60 Hz monitor
+
+   Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org>
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
+*/
+
+/* EDID */
+#define VERSION 1
+#define REVISION 3
+
+/* Display */
+#define CLOCK 148500 /* kHz */
+#define XPIX 1920
+#define YPIX 1080
+#define XY_RATIO XY_RATIO_16_9
+#define XBLANK 280
+#define YBLANK 45
+#define XOFFSET 88
+#define XPULSE 44
+#define YOFFSET 4
+#define YPULSE 5
+#define DPI 96
+#define VFREQ 60 /* Hz */
+#define TIMING_NAME "Linux FHD"
+/* No ESTABLISHED_TIMINGx_BITS */
+#define HSYNC_POL 1
+#define VSYNC_POL 1
+
+#include "edid.S"
diff --git a/tools/edid/800x600.S b/tools/edid/800x600.S
new file mode 100644
index 000000000000..a5616588de08
--- /dev/null
+++ b/tools/edid/800x600.S
@@ -0,0 +1,40 @@
+/*
+   800x600.S: EDID data set for standard 800x600 60 Hz monitor
+
+   Copyright (C) 2011 Carsten Emde <C.Emde@osadl.org>
+   Copyright (C) 2014 Linaro Limited
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+*/
+
+/* EDID */
+#define VERSION 1
+#define REVISION 3
+
+/* Display */
+#define CLOCK 40000 /* kHz */
+#define XPIX 800
+#define YPIX 600
+#define XY_RATIO XY_RATIO_4_3
+#define XBLANK 256
+#define YBLANK 28
+#define XOFFSET 40
+#define XPULSE 128
+#define YOFFSET 1
+#define YPULSE 4
+#define DPI 72
+#define VFREQ 60 /* Hz */
+#define TIMING_NAME "Linux SVGA"
+#define ESTABLISHED_TIMING1_BITS 0x01 /* Bit 0: 800x600 @ 60Hz */
+#define HSYNC_POL 1
+#define VSYNC_POL 1
+
+#include "edid.S"
diff --git a/tools/edid/Makefile b/tools/edid/Makefile
new file mode 100644
index 000000000000..85a927dfab02
--- /dev/null
+++ b/tools/edid/Makefile
@@ -0,0 +1,37 @@
+
+SOURCES	:= $(wildcard [0-9]*x[0-9]*.S)
+
+BIN	:= $(patsubst %.S, %.bin, $(SOURCES))
+
+IHEX	:= $(patsubst %.S, %.bin.ihex, $(SOURCES))
+
+CODE	:= $(patsubst %.S, %.c, $(SOURCES))
+
+all:	$(BIN) $(IHEX) $(CODE)
+
+clean:
+	@rm -f *.o *.bin.ihex *.bin *.c
+
+%.o:	%.S
+	@cc -c $^
+
+%.bin.nocrc:	%.o
+	@objcopy -Obinary $^ $@
+
+%.crc:	%.bin.nocrc
+	@list=$$(for i in `seq 1 127`; do head -c$$i $^ | tail -c1 \
+		| hexdump -v -e '/1 "%02X+"'; done); \
+		echo "ibase=16;100-($${list%?})%100" | bc >$@
+
+%.p:	%.crc %.S
+	@cc -c -DCRC="$$(cat $*.crc)" -o $@ $*.S
+
+%.bin:	%.p
+	@objcopy -Obinary $^ $@
+
+%.bin.ihex:	%.p
+	@objcopy -Oihex $^ $@
+	@dos2unix $@ 2>/dev/null
+
+%.c:	%.bin
+	@echo "{" >$@; hexdump -f hex $^ >>$@; echo "};" >>$@
diff --git a/tools/edid/edid.S b/tools/edid/edid.S
new file mode 100644
index 000000000000..c3d13815526d
--- /dev/null
+++ b/tools/edid/edid.S
@@ -0,0 +1,274 @@
+/*
+   edid.S: EDID data template
+
+   Copyright (C) 2012 Carsten Emde <C.Emde@osadl.org>
+
+   This program is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License
+   as published by the Free Software Foundation; either version 2
+   of the License, or (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software
+   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA.
+*/
+
+
+/* Manufacturer */
+#define MFG_LNX1 'L'
+#define MFG_LNX2 'N'
+#define MFG_LNX3 'X'
+#define SERIAL 0
+#define YEAR 2012
+#define WEEK 5
+
+/* EDID 1.3 standard definitions */
+#define XY_RATIO_16_10	0b00
+#define XY_RATIO_4_3	0b01
+#define XY_RATIO_5_4	0b10
+#define XY_RATIO_16_9	0b11
+
+/* Provide defaults for the timing bits */
+#ifndef ESTABLISHED_TIMING1_BITS
+#define ESTABLISHED_TIMING1_BITS 0x00
+#endif
+#ifndef ESTABLISHED_TIMING2_BITS
+#define ESTABLISHED_TIMING2_BITS 0x00
+#endif
+#ifndef ESTABLISHED_TIMING3_BITS
+#define ESTABLISHED_TIMING3_BITS 0x00
+#endif
+
+#define mfgname2id(v1,v2,v3) \
+	((((v1-'@')&0x1f)<<10)+(((v2-'@')&0x1f)<<5)+((v3-'@')&0x1f))
+#define swap16(v1) ((v1>>8)+((v1&0xff)<<8))
+#define lsbs2(v1,v2) (((v1&0x0f)<<4)+(v2&0x0f))
+#define msbs2(v1,v2) ((((v1>>8)&0x0f)<<4)+((v2>>8)&0x0f))
+#define msbs4(v1,v2,v3,v4) \
+	((((v1>>8)&0x03)<<6)+(((v2>>8)&0x03)<<4)+\
+	(((v3>>4)&0x03)<<2)+((v4>>4)&0x03))
+#define pixdpi2mm(pix,dpi) ((pix*25)/dpi)
+#define xsize pixdpi2mm(XPIX,DPI)
+#define ysize pixdpi2mm(YPIX,DPI)
+
+		.data
+
+/* Fixed header pattern */
+header:		.byte	0x00,0xff,0xff,0xff,0xff,0xff,0xff,0x00
+
+mfg_id:		.hword	swap16(mfgname2id(MFG_LNX1, MFG_LNX2, MFG_LNX3))
+
+prod_code:	.hword	0
+
+/* Serial number. 32 bits, little endian. */
+serial_number:	.long	SERIAL
+
+/* Week of manufacture */
+week:		.byte	WEEK
+
+/* Year of manufacture, less 1990. (1990-2245)
+   If week=255, it is the model year instead */
+year:		.byte	YEAR-1990
+
+version:	.byte	VERSION 	/* EDID version, usually 1 (for 1.3) */
+revision:	.byte	REVISION	/* EDID revision, usually 3 (for 1.3) */
+
+/* If Bit 7=1	Digital input. If set, the following bit definitions apply:
+     Bits 6-1	Reserved, must be 0
+     Bit 0	Signal is compatible with VESA DFP 1.x TMDS CRGB,
+		  1 pixel per clock, up to 8 bits per color, MSB aligned,
+   If Bit 7=0	Analog input. If clear, the following bit definitions apply:
+     Bits 6-5	Video white and sync levels, relative to blank
+		  00=+0.7/-0.3 V; 01=+0.714/-0.286 V;
+		  10=+1.0/-0.4 V; 11=+0.7/0 V
+   Bit 4	Blank-to-black setup (pedestal) expected
+   Bit 3	Separate sync supported
+   Bit 2	Composite sync (on HSync) supported
+   Bit 1	Sync on green supported
+   Bit 0	VSync pulse must be serrated when composite or
+		  sync-on-green is used. */
+video_parms:	.byte	0x6d
+
+/* Maximum horizontal image size, in centimetres
+   (max 292 cm/115 in at 16:9 aspect ratio) */
+max_hor_size:	.byte	xsize/10
+
+/* Maximum vertical image size, in centimetres.
+   If either byte is 0, undefined (e.g. projector) */
+max_vert_size:	.byte	ysize/10
+
+/* Display gamma, minus 1, times 100 (range 1.00-3.54) */
+gamma:		.byte	120
+
+/* Bit 7	DPMS standby supported
+   Bit 6	DPMS suspend supported
+   Bit 5	DPMS active-off supported
+   Bits 4-3	Display type: 00=monochrome; 01=RGB colour;
+		  10=non-RGB multicolour; 11=undefined
+   Bit 2	Standard sRGB colour space. Bytes 25-34 must contain
+		  sRGB standard values.
+   Bit 1	Preferred timing mode specified in descriptor block 1.
+   Bit 0	GTF supported with default parameter values. */
+dsp_features:	.byte	0xea
+
+/* Chromaticity coordinates. */
+/* Red and green least-significant bits
+   Bits 7-6	Red x value least-significant 2 bits
+   Bits 5-4	Red y value least-significant 2 bits
+   Bits 3-2	Green x value least-significant 2 bits
+   Bits 1-0	Green y value least-significant 2 bits */
+red_green_lsb:	.byte	0x5e
+
+/* Blue and white least-significant 2 bits */
+blue_white_lsb:	.byte	0xc0
+
+/* Red x value most significant 8 bits.
+   0-255 encodes 0-0.996 (255/256); 0-0.999 (1023/1024) with lsbits */
+red_x_msb:	.byte	0xa4
+
+/* Red y value most significant 8 bits */
+red_y_msb:	.byte	0x59
+
+/* Green x and y value most significant 8 bits */
+green_x_y_msb:	.byte	0x4a,0x98
+
+/* Blue x and y value most significant 8 bits */
+blue_x_y_msb:	.byte	0x25,0x20
+
+/* Default white point x and y value most significant 8 bits */
+white_x_y_msb:	.byte	0x50,0x54
+
+/* Established timings */
+/* Bit 7	720x400 @ 70 Hz
+   Bit 6	720x400 @ 88 Hz
+   Bit 5	640x480 @ 60 Hz
+   Bit 4	640x480 @ 67 Hz
+   Bit 3	640x480 @ 72 Hz
+   Bit 2	640x480 @ 75 Hz
+   Bit 1	800x600 @ 56 Hz
+   Bit 0	800x600 @ 60 Hz */
+estbl_timing1:	.byte	ESTABLISHED_TIMING1_BITS
+
+/* Bit 7	800x600 @ 72 Hz
+   Bit 6	800x600 @ 75 Hz
+   Bit 5	832x624 @ 75 Hz
+   Bit 4	1024x768 @ 87 Hz, interlaced (1024x768)
+   Bit 3	1024x768 @ 60 Hz
+   Bit 2	1024x768 @ 72 Hz
+   Bit 1	1024x768 @ 75 Hz
+   Bit 0	1280x1024 @ 75 Hz */
+estbl_timing2:	.byte	ESTABLISHED_TIMING2_BITS
+
+/* Bit 7	1152x870 @ 75 Hz (Apple Macintosh II)
+   Bits 6-0 	Other manufacturer-specific display modes */
+estbl_timing3:	.byte	ESTABLISHED_TIMING3_BITS
+
+/* Standard timing */
+/* X resolution, divided by 8, less 31 (256-2288 pixels) */
+std_xres:	.byte	(XPIX/8)-31
+/* Y resolution, X:Y pixel ratio
+   Bits 7-6	X:Y pixel ratio: 00=16:10; 01=4:3; 10=5:4; 11=16:9.
+   Bits 5-0	Vertical frequency, less 60 (60-123 Hz) */
+std_vres:	.byte	(XY_RATIO<<6)+VFREQ-60
+		.fill	7,2,0x0101	/* Unused */
+
+descriptor1:
+/* Pixel clock in 10 kHz units. (0.01-655.35 MHz, little-endian) */
+clock:		.hword	CLOCK/10
+
+/* Horizontal active pixels 8 lsbits (0-4095) */
+x_act_lsb:	.byte	XPIX&0xff
+/* Horizontal blanking pixels 8 lsbits (0-4095)
+   End of active to start of next active. */
+x_blk_lsb:	.byte	XBLANK&0xff
+/* Bits 7-4 	Horizontal active pixels 4 msbits
+   Bits 3-0	Horizontal blanking pixels 4 msbits */
+x_msbs:		.byte	msbs2(XPIX,XBLANK)
+
+/* Vertical active lines 8 lsbits (0-4095) */
+y_act_lsb:	.byte	YPIX&0xff
+/* Vertical blanking lines 8 lsbits (0-4095) */
+y_blk_lsb:	.byte	YBLANK&0xff
+/* Bits 7-4 	Vertical active lines 4 msbits
+   Bits 3-0 	Vertical blanking lines 4 msbits */
+y_msbs:		.byte	msbs2(YPIX,YBLANK)
+
+/* Horizontal sync offset pixels 8 lsbits (0-1023) From blanking start */
+x_snc_off_lsb:	.byte	XOFFSET&0xff
+/* Horizontal sync pulse width pixels 8 lsbits (0-1023) */
+x_snc_pls_lsb:	.byte	XPULSE&0xff
+/* Bits 7-4 	Vertical sync offset lines 4 lsbits (0-63)
+   Bits 3-0 	Vertical sync pulse width lines 4 lsbits (0-63) */
+y_snc_lsb:	.byte	lsbs2(YOFFSET, YPULSE)
+/* Bits 7-6 	Horizontal sync offset pixels 2 msbits
+   Bits 5-4 	Horizontal sync pulse width pixels 2 msbits
+   Bits 3-2 	Vertical sync offset lines 2 msbits
+   Bits 1-0 	Vertical sync pulse width lines 2 msbits */
+xy_snc_msbs:	.byte	msbs4(XOFFSET,XPULSE,YOFFSET,YPULSE)
+
+/* Horizontal display size, mm, 8 lsbits (0-4095 mm, 161 in) */
+x_dsp_size:	.byte	xsize&0xff
+
+/* Vertical display size, mm, 8 lsbits (0-4095 mm, 161 in) */
+y_dsp_size:	.byte	ysize&0xff
+
+/* Bits 7-4 	Horizontal display size, mm, 4 msbits
+   Bits 3-0 	Vertical display size, mm, 4 msbits */
+dsp_size_mbsb:	.byte	msbs2(xsize,ysize)
+
+/* Horizontal border pixels (each side; total is twice this) */
+x_border:	.byte	0
+/* Vertical border lines (each side; total is twice this) */
+y_border:	.byte	0
+
+/* Bit 7 	Interlaced
+   Bits 6-5 	Stereo mode: 00=No stereo; other values depend on bit 0:
+   Bit 0=0: 01=Field sequential, sync=1 during right; 10=similar,
+     sync=1 during left; 11=4-way interleaved stereo
+   Bit 0=1 2-way interleaved stereo: 01=Right image on even lines;
+     10=Left image on even lines; 11=side-by-side
+   Bits 4-3 	Sync type: 00=Analog composite; 01=Bipolar analog composite;
+     10=Digital composite (on HSync); 11=Digital separate
+   Bit 2 	If digital separate: Vertical sync polarity (1=positive)
+   Other types: VSync serrated (HSync during VSync)
+   Bit 1 	If analog sync: Sync on all 3 RGB lines (else green only)
+   Digital: HSync polarity (1=positive)
+   Bit 0 	2-way line-interleaved stereo, if bits 4-3 are not 00. */
+features:	.byte	0x18+(VSYNC_POL<<2)+(HSYNC_POL<<1)
+
+descriptor2:	.byte	0,0	/* Not a detailed timing descriptor */
+		.byte	0	/* Must be zero */
+		.byte	0xff	/* Descriptor is monitor serial number (text) */
+		.byte	0	/* Must be zero */
+start1:		.ascii	"Linux #0"
+end1:		.byte	0x0a	/* End marker */
+		.fill	12-(end1-start1), 1, 0x20 /* Padded spaces */
+descriptor3:	.byte	0,0	/* Not a detailed timing descriptor */
+		.byte	0	/* Must be zero */
+		.byte	0xfd	/* Descriptor is monitor range limits */
+		.byte	0	/* Must be zero */
+start2:		.byte	VFREQ-1	/* Minimum vertical field rate (1-255 Hz) */
+		.byte	VFREQ+1	/* Maximum vertical field rate (1-255 Hz) */
+		.byte	(CLOCK/(XPIX+XBLANK))-1 /* Minimum horizontal line rate
+						    (1-255 kHz) */
+		.byte	(CLOCK/(XPIX+XBLANK))+1 /* Maximum horizontal line rate
+						    (1-255 kHz) */
+		.byte	(CLOCK/10000)+1	/* Maximum pixel clock rate, rounded up
+					   to 10 MHz multiple (10-2550 MHz) */
+		.byte	0	/* No extended timing information type */
+end2:		.byte	0x0a	/* End marker */
+		.fill	12-(end2-start2), 1, 0x20 /* Padded spaces */
+descriptor4:	.byte	0,0	/* Not a detailed timing descriptor */
+		.byte	0	/* Must be zero */
+		.byte	0xfc	/* Descriptor is text */
+		.byte	0	/* Must be zero */
+start3:		.ascii	TIMING_NAME
+end3:		.byte	0x0a	/* End marker */
+		.fill	12-(end3-start3), 1, 0x20 /* Padded spaces */
+extensions:	.byte	0	/* Number of extensions to follow */
+checksum:	.byte	CRC	/* Sum of all bytes must be 0 (mod 256) */
diff --git a/tools/edid/hex b/tools/edid/hex
new file mode 100644
index 000000000000..8873ebb618af
--- /dev/null
+++ b/tools/edid/hex
@@ -0,0 +1 @@
+"\t" 8/1 "0x%02x, " "\n"
-- 
cgit 


From 43e96ef8b70c50f6054f20b8c357ee5881592082 Mon Sep 17 00:00:00 2001
From: Michael Ellerman <mpe@ellerman.id.au>
Date: Fri, 21 Feb 2020 11:48:43 +1100
Subject: docs/core-api: Add Fedora instructions for GCC plugins

Add an example of how to install the necessary packages for GCC
plugins on Fedora.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/gcc-plugins.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/core-api/gcc-plugins.rst b/Documentation/core-api/gcc-plugins.rst
index 8502f24396fb..4b1c10f88e30 100644
--- a/Documentation/core-api/gcc-plugins.rst
+++ b/Documentation/core-api/gcc-plugins.rst
@@ -72,6 +72,10 @@ e.g., on Ubuntu for gcc-4.9::
 
 	apt-get install gcc-4.9-plugin-dev
 
+Or on Fedora::
+
+	dnf install gcc-plugin-devel
+
 Enable a GCC plugin based feature in the kernel config::
 
 	CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
-- 
cgit 


From 290d5388993eb40b9d5632aefb864cf1012a2bcc Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Sat, 22 Feb 2020 10:00:01 +0100
Subject: scripts: documentation-file-ref-check: improve :doc: handling

The script has some issues with regard to :doc:
tags:

- It doesn't skip files under Documentation/sphinx,
  leading to false positives;
- It doesn't handle root URLs, like :doc:`/x86/boot`;
- It doesn't output the file with a bad reference.

Address those things, in order to remove false positives
from the list of problems.
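
The path handling described above can be sketched outside the script.
This is a rough Python model, not the Perl code from the patch; the
helper names resolve_doc_ref() and should_check() are hypothetical and
exist only for illustration:

```python
import posixpath


def should_check(source_file: str) -> bool:
    """Skip files under a sphinx/ directory, as the patch does."""
    return "sphinx/" not in source_file


def resolve_doc_ref(source_file: str, doc_ref: str) -> str:
    """Map a :doc: target to the .rst path it is expected to name."""
    # Keep only the target of a "Title <target>" style reference.
    if "<" in doc_ref and doc_ref.endswith(">"):
        doc_ref = doc_ref[doc_ref.rindex("<") + 1:-1]
    if doc_ref.startswith("/"):
        # Root reference: relative to the top of Documentation/.
        return "Documentation/" + doc_ref[1:] + ".rst"
    # Relative reference: relative to the referring file's directory.
    return posixpath.join(posixpath.dirname(source_file), doc_ref + ".rst")
```

With this model, a reference to /x86/boot found in any file resolves to
Documentation/x86/boot.rst, while a bare name resolves next to the file
that contains it.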

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/documentation-file-ref-check | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index 7784c54aa38b..997202a18ddb 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -51,7 +51,9 @@ open IN, "git grep ':doc:\`' Documentation/|"
      or die "Failed to run git grep";
 while (<IN>) {
 	next if (!m,^([^:]+):.*\:doc\:\`([^\`]+)\`,);
+	next if (m,sphinx/,);
 
+	my $file = $1;
 	my $d = $1;
 	my $doc_ref = $2;
 
@@ -60,7 +62,12 @@ while (<IN>) {
 	$d =~ s,(.*/).*,$1,;
 	$f =~ s,.*\<([^\>]+)\>,$1,;
 
-	$f ="$d$f.rst";
+	if ($f =~ m,^/,) {
+		$f = "$f.rst";
+		$f =~ s,^/,Documentation/,;
+	} else {
+		$f = "$d$f.rst";
+	}
 
 	next if (grep -e, glob("$f"));
 
@@ -69,7 +76,7 @@ while (<IN>) {
 	}
 	$doc_fix++;
 
-	print STDERR "$f: :doc:`$doc_ref`\n";
+	print STDERR "$file: :doc:`$doc_ref`\n";
 }
 close IN;
 
-- 
cgit 


From a3aead706dac19ca504c31ed5d6b3e141addbaec Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Sat, 22 Feb 2020 10:00:07 +0100
Subject: docs: gpu: i915.rst: fix warnings due to file renames

Fix two warnings caused by a file rename:

	WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -function csr support for dmc ./drivers/gpu/drm/i915/intel_csr.c' failed with return code 1
	WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -internal ./drivers/gpu/drm/i915/intel_csr.c' failed with return code 2

Fixes: 06d3ff6e7451 ("drm/i915: move intel_csr.[ch] under display/")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/gpu/i915.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index e539c42a3e78..cc74e24ca3b5 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -207,10 +207,10 @@ DPIO
 CSR firmware support for DMC
 ----------------------------
 
-.. kernel-doc:: drivers/gpu/drm/i915/intel_csr.c
+.. kernel-doc:: drivers/gpu/drm/i915/display/intel_csr.c
    :doc: csr support for dmc
 
-.. kernel-doc:: drivers/gpu/drm/i915/intel_csr.c
+.. kernel-doc:: drivers/gpu/drm/i915/display/intel_csr.c
    :internal:
 
 Video BIOS Table (VBT)
-- 
cgit 


From 2bd49cb581ed5a5fbd43811b952fe9552b737408 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Fri, 21 Feb 2020 17:55:02 +0100
Subject: docs: sysctl/kernel: document acpi_video_flags

Based on the implementation in arch/x86/kernel/acpi/sleep.c, in
particular the acpi_sleep_setup() function.
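
The three values documented by this patch combine as a bitmask. A small
sketch of how a value could be composed or decoded, using only the
numbers from the table added below; the helper names encode() and
decode() are hypothetical and not part of any kernel interface:

```python
# Values taken from the table added to sysctl/kernel.rst by this patch.
ACPI_VIDEO_FLAGS = {1: "s3_bios", 2: "s3_mode", 4: "s3_beep"}


def decode(value: int) -> list:
    """Return the option names whose bits are set in value."""
    return [name for bit, name in sorted(ACPI_VIDEO_FLAGS.items()) if value & bit]


def encode(names) -> int:
    """Combine option names into a single acpi_video_flags value."""
    by_name = {name: bit for bit, name in ACPI_VIDEO_FLAGS.items()}
    value = 0
    for name in names:
        value |= by_name[name]
    return value
```

For instance, a value of 3 decodes to s3_bios plus s3_mode.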

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 6c0d8c55101c..6586e0e0c11f 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -51,8 +51,15 @@ free space valid for 30 seconds.
 acpi_video_flags
 ================
 
-See Documentation/kernel/power/video.txt, it allows mode of video boot
-to be set during run time.
+See :doc:`/power/video`. This allows the video resume mode to be set,
+in a similar fashion to the ``acpi_sleep`` kernel parameter, by
+combining the following values:
+
+= =======
+1 s3_bios
+2 s3_mode
+4 s3_beep
+= =======
 
 
 auto_msgmni
-- 
cgit 


From bf347b9da9bbba14b4af845b00d443f24d17d46d Mon Sep 17 00:00:00 2001
From: Alex Hung <alex.hung@canonical.com>
Date: Wed, 19 Feb 2020 12:21:33 -0700
Subject: Documentation: fix a typo for intel_iommu=nobounce

"untrusted" was mis-spelled as "unstrusted"

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 47cd55e339a5..56bf9b2a9ddf 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1775,7 +1775,7 @@
 			provided by tboot because it makes the system
 			vulnerable to DMA attacks.
 		nobounce [Default off]
-			Disable bounce buffer for unstrusted devices such as
+			Disable bounce buffer for untrusted devices such as
 			the Thunderbolt devices. This will treat the untrusted
 			devices as the trusted ones, hence might expose security
 			risks of DMA attacks.
-- 
cgit 


From 021622df556b7213cffec1c0713f093fc7d045e3 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Wed, 19 Feb 2020 16:34:42 +0100
Subject: docs: add a script to check sysctl docs

This script allows sysctl documentation to be checked against the
kernel source code, to identify missing or obsolete entries. Running
it against 5.5 shows for example that sysctl/kernel.rst has two
obsolete entries and is missing 52 entries.
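
The comparison boils down to two set differences. A minimal Python
sketch of that idea, assuming the documented and implemented entry
names have already been collected; compare_entries() is a hypothetical
name, and the gawk script itself gathers and reports the data
differently:

```python
def compare_entries(documented, implemented):
    """Return (obsolete, missing) entry names, each sorted.

    obsolete: documented, but not found in the source
    missing:  found in the source, but not documented
    """
    documented = set(documented)
    implemented = set(implemented)
    return sorted(documented - implemented), sorted(implemented - documented)
```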

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/sysctl/kernel.rst |   3 +
 scripts/check-sysctl-docs                   | 181 ++++++++++++++++++++++++++++
 2 files changed, 184 insertions(+)
 create mode 100755 scripts/check-sysctl-docs

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 6586e0e0c11f..1c48ab4bfe30 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -2,6 +2,9 @@
 Documentation for /proc/sys/kernel/
 ===================================
 
+.. See scripts/check-sysctl-docs to keep this up to date
+
+
 Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
 
 Copyright (c) 2009,        Shen Feng<shen@cn.fujitsu.com>
diff --git a/scripts/check-sysctl-docs b/scripts/check-sysctl-docs
new file mode 100755
index 000000000000..8bcb9e26c7bc
--- /dev/null
+++ b/scripts/check-sysctl-docs
@@ -0,0 +1,181 @@
+#!/usr/bin/gawk -f
+# SPDX-License-Identifier: GPL-2.0
+
+# Script to check sysctl documentation against source files
+#
+# Copyright (c) 2020 Stephen Kitt
+
+# Example invocation:
+#	scripts/check-sysctl-docs -vtable="kernel" \
+#		Documentation/admin-guide/sysctl/kernel.rst \
+#		$(git grep -l register_sysctl_)
+#
+# Specify -vdebug=1 to see debugging information
+
+BEGIN {
+    if (!table) {
+	print "Please specify the table to look for using the table variable" > "/dev/stderr"
+	exit 1
+    }
+}
+
+# The following globals are used:
+# children: maps ctl_table names and procnames to child ctl_table names
+# documented: maps documented entries (each key is an entry)
+# entries: maps ctl_table names and procnames to counts (so
+#          enumerating the subkeys for a given ctl_table lists its
+#          procnames)
+# file: maps procnames to source file names
+# paths: maps ctl_path names to paths
+# curpath: the name of the current ctl_path struct
+# curtable: the name of the current ctl_table struct
+# curentry: the name of the current proc entry (procname when parsing
+#           a ctl_table, constructed path when parsing a ctl_path)
+
+
+# Remove punctuation from the given value
+function trimpunct(value) {
+    while (value ~ /^["&]/) {
+	value = substr(value, 2)
+    }
+    while (value ~ /[]["&,}]$/) {
+	value = substr(value, 1, length(value) - 1)
+    }
+    return value
+}
+
+# Print the information for the given entry
+function printentry(entry) {
+    seen[entry]++
+    printf "* %s from %s", entry, file[entry]
+    if (documented[entry]) {
+	printf " (documented)"
+    }
+    print ""
+}
+
+
+# Stage 1: build the list of documented entries
+FNR == NR && /^=+$/ {
+    if (prevline ~ /Documentation for/) {
+	# This is the main title
+	next
+    }
+
+    # The previous line is a section title, parse it
+    $0 = prevline
+    if (debug) print "Parsing " $0
+    inbrackets = 0
+    for (i = 1; i <= NF; i++) {
+	if (length($i) == 0) {
+	    continue
+	}
+	if (!inbrackets && substr($i, 1, 1) == "(") {
+	    inbrackets = 1
+	}
+	if (!inbrackets) {
+	    token = trimpunct($i)
+	    if (length(token) > 0 && token != "and") {
+		if (debug) print trimpunct($i)
+		documented[trimpunct($i)]++
+	    }
+	}
+	if (inbrackets && substr($i, length($i), 1) == ")") {
+	    inbrackets = 0
+	}
+    }
+}
+
+FNR == NR {
+    prevline = $0
+    next
+}
+
+
+# Stage 2: process each file and find all sysctl tables
+BEGINFILE {
+    delete children
+    delete entries
+    delete paths
+    curpath = ""
+    curtable = ""
+    curentry = ""
+    if (debug) print "Processing file " FILENAME
+}
+
+/^static struct ctl_path/ {
+    match($0, /static struct ctl_path ([^][]+)/, tables)
+    curpath = tables[1]
+    if (debug) print "Processing path " curpath
+}
+
+/^static struct ctl_table/ {
+    match($0, /static struct ctl_table ([^][]+)/, tables)
+    curtable = tables[1]
+    if (debug) print "Processing table " curtable
+}
+
+/^};$/ {
+    curpath = ""
+    curtable = ""
+    curentry = ""
+}
+
+curpath && /\.procname[\t ]*=[\t ]*".+"/ {
+    match($0, /.procname[\t ]*=[\t ]*"([^"]+)"/, names)
+    if (curentry) {
+	curentry = curentry "/" names[1]
+    } else {
+	curentry = names[1]
+    }
+    if (debug) print "Setting path " curpath " to " curentry
+    paths[curpath] = curentry
+}
+
+curtable && /\.procname[\t ]*=[\t ]*".+"/ {
+    match($0, /.procname[\t ]*=[\t ]*"([^"]+)"/, names)
+    curentry = names[1]
+    if (debug) print "Adding entry " curentry " to table " curtable
+    entries[curtable][curentry]++
+    file[curentry] = FILENAME
+}
+
+/\.child[\t ]*=/ {
+    child = trimpunct($NF)
+    if (debug) print "Linking child " child " to table " curtable " entry " curentry
+    children[curtable][curentry] = child
+}
+
+/register_sysctl_table\(.*\)/ {
+    match($0, /register_sysctl_table\(([^)]+)\)/, tables)
+    if (debug) print "Registering table " tables[1]
+    if (children[tables[1]][table]) {
+	for (entry in entries[children[tables[1]][table]]) {
+	    printentry(entry)
+	}
+    }
+}
+
+/register_sysctl_paths\(.*\)/ {
+    match($0, /register_sysctl_paths\(([^)]+), ([^)]+)\)/, tables)
+    if (debug) print "Attaching table " tables[2] " to path " tables[1]
+    if (paths[tables[1]] == table) {
+	for (entry in entries[tables[2]]) {
+	    printentry(entry)
+	}
+    }
+    split(paths[tables[1]], components, "/")
+    if (length(components) > 1 && components[1] == table) {
+	# Count the first subdirectory as seen
+	seen[components[2]]++
+    }
+}
+
+
+END {
+    for (entry in documented) {
+	if (!seen[entry]) {
+	    print "No implementation for " entry
+	}
+    }
+}
-- 
cgit 


From ef45e78fdc11ac1794940c2ff4a6bf3bc4c45372 Mon Sep 17 00:00:00 2001
From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Date: Thu, 13 Feb 2020 18:23:11 +0530
Subject: docs: kref: Clarify the use of two kref_put() in example code

Even though the current documentation explains that the reference count
gets incremented by both kref_init() and kref_get(), it is often
misunderstood that only one instance of kref_put() is needed in the
example code. So let's clarify that a bit.
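
The counting can be followed with a toy model. This Python class only
imitates the arithmetic of kref_init(), kref_get() and kref_put(); it is
an illustration written for this message, not the kernel API:

```python
class ToyKref:
    """Toy stand-in for struct kref, tracking only the count."""

    def __init__(self):
        # Models kref_init(): the creator holds the first reference.
        self.refcount = 1

    def get(self):
        # Models kref_get(): a second holder takes a reference.
        self.refcount += 1

    def put(self, release):
        # Models kref_put(): drop one reference, release on the last one.
        self.refcount -= 1
        if self.refcount == 0:
            release()
            return True
        return False


released = []
ref = ToyKref()   # count is 1
ref.get()         # count is 2
first = ref.put(lambda: released.append("data"))   # count is 1
second = ref.put(lambda: released.append("data"))  # count is 0, released
```

Only the second put() reaches zero and runs the release callback, which
is why one put() is not enough in the documented example.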

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/kref.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/kref.txt b/Documentation/kref.txt
index 3af384156d7e..c61eea6f1bf2 100644
--- a/Documentation/kref.txt
+++ b/Documentation/kref.txt
@@ -128,6 +128,10 @@ since we already have a valid pointer that we own a refcount for.  The
 put needs no lock because nothing tries to get the data without
 already holding a pointer.
 
+In the above example, kref_put() is called twice in both the success
+and error paths. This is necessary because the reference count was
+incremented twice, once by kref_init() and once by kref_get().
+
 Note that the "before" in rule 1 is very important.  You should never
 do something like::
 
-- 
cgit 


From 0a464ea4dc1248e8e640ae0f7eee90b99732eaf0 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Sat, 29 Feb 2020 18:35:14 +0100
Subject: docs: dev-tools: gcov: Remove a stray single-quote
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20200229173515.13868-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/dev-tools/gcov.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/gcov.rst b/Documentation/dev-tools/gcov.rst
index 46aae52a41d0..7bd013596217 100644
--- a/Documentation/dev-tools/gcov.rst
+++ b/Documentation/dev-tools/gcov.rst
@@ -203,7 +203,7 @@ Cause
     may not correctly copy files from sysfs.
 
 Solution
-    Use ``cat``' to read ``.gcda`` files and ``cp -d`` to copy links.
+    Use ``cat`` to read ``.gcda`` files and ``cp -d`` to copy links.
     Alternatively use the mechanism shown in Appendix B.
 
 
-- 
cgit 


From 7fe068dba8335ff0f9ec608db9589b1fce4663e0 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Sat, 29 Feb 2020 14:27:48 +0100
Subject: docs: admin-guide: kernel-parameters: Document earlycon options for
 i.MX UARTs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

drivers/tty/serial/imx.c implements these earlycon options.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20200229132750.2783-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 56bf9b2a9ddf..3e3fd0d19e53 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1095,6 +1095,12 @@
 			A valid base address must be provided, and the serial
 			port must already be setup and configured.
 
+		ec_imx21,<addr>
+		ec_imx6q,<addr>
+			Start an early, polled-mode, output-only console on the
+			Freescale i.MX UART at the specified address. The UART
+			must already be set up and configured.
+
 		ar3700_uart,<addr>
 			Start an early, polled-mode console on the
 			Armada 3700 serial port at the specified
-- 
cgit 


From adf3f38a87bbd8b8b69487988c7e8392d141f558 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Fri, 28 Feb 2020 21:41:45 +0100
Subject: docs: kernel-docs: Remove "Here is its" at the end of lines
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Before commit 9e03ea7f683e ("Documentation/kernel-docs.txt: convert it
to ReST markup"), it read:

       Description: Linux Journal Kernel Korner article. Here is its
       abstract: "..."

In Sphinx' HTML formatting, however, the "Here is its" doesn't make
sense anymore, because the "Abstract:" is clearly separated.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20200228204147.8622-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/kernel-docs.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst
index 7a45a8e36ea7..9d6d0ac4fca9 100644
--- a/Documentation/process/kernel-docs.rst
+++ b/Documentation/process/kernel-docs.rst
@@ -313,7 +313,7 @@ On-line docs
       :URL: http://www.linuxjournal.com/article.php?sid=2391
       :Date: 1997
       :Keywords: RAID, MD driver.
-      :Description: Linux Journal Kernel Korner article. Here is its
+      :Description: Linux Journal Kernel Korner article.
       :Abstract: *A description of the implementation of the RAID-1,
         RAID-4 and RAID-5 personalities of the MD device driver in the
         Linux kernel, providing users with high performance and reliable,
@@ -338,7 +338,7 @@ On-line docs
       :Date: 1996
       :Keywords: device driver, module, loading/unloading modules,
         allocating resources.
-      :Description: Linux Journal Kernel Korner article. Here is its
+      :Description: Linux Journal Kernel Korner article.
       :Abstract: *This is the first of a series of four articles
         co-authored by Alessandro Rubini and Georg Zezchwitz which present
         a practical approach to writing Linux device drivers as kernel
@@ -354,7 +354,7 @@ On-line docs
       :Keywords: character driver, init_module, clean_up module,
         autodetection, mayor number, minor number, file operations,
         open(), close().
-      :Description: Linux Journal Kernel Korner article. Here is its
+      :Description: Linux Journal Kernel Korner article.
       :Abstract: *This article, the second of four, introduces part of
         the actual code to create custom module implementing a character
         device driver. It describes the code for module initialization and
@@ -367,7 +367,7 @@ On-line docs
       :Date: 1996
       :Keywords: read(), write(), select(), ioctl(), blocking/non
         blocking mode, interrupt handler.
-      :Description: Linux Journal Kernel Korner article. Here is its
+      :Description: Linux Journal Kernel Korner article.
       :Abstract: *This article, the third of four on writing character
         device drivers, introduces concepts of reading, writing, and using
         ioctl-calls*.
@@ -378,7 +378,7 @@ On-line docs
       :URL: http://www.linuxjournal.com/article.php?sid=1222
       :Date: 1996
       :Keywords: interrupts, irqs, DMA, bottom halves, task queues.
-      :Description: Linux Journal Kernel Korner article. Here is its
+      :Description: Linux Journal Kernel Korner article.
       :Abstract: *This is the fourth in a series of articles about
         writing character device drivers as loadable kernel modules. This
         month, we further investigate the field of interrupt handling.
-- 
cgit 


From d0c3bacb3e37488b00feb307c0ce43105a6fd23e Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 27 Feb 2020 16:06:49 -0800
Subject: doc: cgroup: improve formatting

Fix a tabs vs spaces issue which causes the line to be considered
a new list entry.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20200228000653.1572553-2-kuba@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/cgroup-v2.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3f801461f0f3..723c8bd422cc 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1103,7 +1103,7 @@ PAGE_SIZE multiple when read back.
 	proportionally to the overage, reducing reclaim pressure for
 	smaller overages.
 
-       Effective min boundary is limited by memory.min values of
+	Effective min boundary is limited by memory.min values of
 	all ancestor cgroups. If there is memory.min overcommitment
 	(child cgroup or cgroups are requiring more protected memory
 	than parent will allow), then each child cgroup will get
-- 
cgit 


From 2551cab59927e3b50f45f3f04f7ce0c9708eb5fb Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 27 Feb 2020 16:06:50 -0800
Subject: doc: cgroup: improve formatting of mem stats

If there is an empty line between an item and its description,
Sphinx does not emphasize the item. The first half of the
list does not have the empty line and is emphasized
correctly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20200228000653.1572553-3-kuba@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/cgroup-v2.rst | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 723c8bd422cc..ab8b91014afb 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1313,53 +1313,41 @@ PAGE_SIZE multiple when read back.
 		Number of major page faults incurred
 
 	  workingset_refault
-
 		Number of refaults of previously evicted pages
 
 	  workingset_activate
-
 		Number of refaulted pages that were immediately activated
 
 	  workingset_nodereclaim
-
 		Number of times a shadow node has been reclaimed
 
 	  pgrefill
-
 		Amount of scanned pages (in an active LRU list)
 
 	  pgscan
-
 		Amount of scanned pages (in an inactive LRU list)
 
 	  pgsteal
-
 		Amount of reclaimed pages
 
 	  pgactivate
-
 		Amount of pages moved to the active LRU list
 
 	  pgdeactivate
-
 		Amount of pages moved to the inactive LRU list
 
 	  pglazyfree
-
 		Amount of pages postponed to be freed under memory pressure
 
 	  pglazyfreed
-
 		Amount of reclaimed lazyfree pages
 
 	  thp_fault_alloc
-
 		Number of transparent hugepages which were allocated to satisfy
 		a page fault, including COW faults. This counter is not present
 		when CONFIG_TRANSPARENT_HUGEPAGE is not set.
 
 	  thp_collapse_alloc
-
 		Number of transparent hugepages which were allocated to allow
 		collapsing an existing range of pages. This counter is not
 		present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
-- 
cgit 


From 69654d37cfa66bd0483f172d174180daf4527fea Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 27 Feb 2020 16:06:51 -0800
Subject: doc: cgroup: improve formatting of io example

We need a literal section, like the one a few paragraphs below.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20200228000653.1572553-4-kuba@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/cgroup-v2.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index ab8b91014afb..9d16fbc5df63 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1466,7 +1466,7 @@ IO Interface Files
 	  dios		Number of discard IOs
 	  ======	=====================
 
-	An example read output follows:
+	An example read output follows::
 
 	  8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0
 	  8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 dbytes=50331648 dios=3021
-- 
cgit 


From f3431ba715b5e7ecf6ae9634c0aa84305339e286 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 27 Feb 2020 16:06:52 -0800
Subject: doc: cgroup: improve formatting of cpuset examples

We need literal sections; otherwise the entire example is rendered
as a single line.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20200228000653.1572553-5-kuba@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 9d16fbc5df63..308d096af071 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1841,7 +1841,7 @@ Cpuset Interface Files
 	from the requested CPUs.
 
 	The CPU numbers are comma-separated numbers or ranges.
-	For example:
+	For example::
 
 	  # cat cpuset.cpus
 	  0-4,6,8-10
@@ -1880,7 +1880,7 @@ Cpuset Interface Files
 	from the requested memory nodes.
 
 	The memory node numbers are comma-separated numbers or ranges.
-	For example:
+	For example::
 
 	  # cat cpuset.mems
 	  0-1,3
-- 
cgit 


From 373e8ffafd665ad114b96c547decce54b9621af4 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 27 Feb 2020 16:06:53 -0800
Subject: doc: cgroup: improve formatting of references

Annotate references to other documents to make them clickable.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20200228000653.1572553-6-kuba@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/accounting/psi.rst              | 2 ++
 Documentation/admin-guide/cgroup-v1/index.rst | 2 ++
 Documentation/admin-guide/cgroup-v2.rst       | 8 ++++----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst
index 621111ce5740..f2b3439edcc2 100644
--- a/Documentation/accounting/psi.rst
+++ b/Documentation/accounting/psi.rst
@@ -1,3 +1,5 @@
+.. _psi:
+
 ================================
 PSI - Pressure Stall Information
 ================================
diff --git a/Documentation/admin-guide/cgroup-v1/index.rst b/Documentation/admin-guide/cgroup-v1/index.rst
index 10bf48bae0b0..226f64473e8e 100644
--- a/Documentation/admin-guide/cgroup-v1/index.rst
+++ b/Documentation/admin-guide/cgroup-v1/index.rst
@@ -1,3 +1,5 @@
+.. _cgroup-v1:
+
 ========================
 Control Groups version 1
 ========================
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 308d096af071..fbb111616705 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -9,7 +9,7 @@ This is the authoritative documentation on the design, interface and
 conventions of cgroup v2.  It describes all userland-visible aspects
 of cgroup including core and specific controller behaviors.  All
 future changes must be reflected in this document.  Documentation for
-v1 is available under Documentation/admin-guide/cgroup-v1/.
+v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
 
 .. CONTENTS
 
@@ -1023,7 +1023,7 @@ All time durations are in microseconds.
 	A read-only nested-key file which exists on non-root cgroups.
 
 	Shows pressure stall information for CPU. See
-	Documentation/accounting/psi.rst for details.
+	:ref:`Documentation/accounting/psi.rst <psi>` for details.
 
   cpu.uclamp.min
         A read-write single value file which exists on non-root cgroups.
@@ -1391,7 +1391,7 @@ PAGE_SIZE multiple when read back.
 	A read-only nested-key file which exists on non-root cgroups.
 
 	Shows pressure stall information for memory. See
-	Documentation/accounting/psi.rst for details.
+	:ref:`Documentation/accounting/psi.rst <psi>` for details.
 
 
 Usage Guidelines
@@ -1631,7 +1631,7 @@ IO Interface Files
 	A read-only nested-key file which exists on non-root cgroups.
 
 	Shows pressure stall information for IO. See
-	Documentation/accounting/psi.rst for details.
+	:ref:`Documentation/accounting/psi.rst <psi>` for details.
 
 
 Writeback
-- 
cgit 


From 669a5cc8c5d997147a0551c809d0e5f795867341 Mon Sep 17 00:00:00 2001
From: Sameer Rahmani <lxsameer@gnu.org>
Date: Tue, 25 Feb 2020 22:21:24 +0000
Subject: Documentation: Converted the `kobject.txt` to rst format

Reviewed and converted the `kobject.txt` format to rst in place.

Signed-off-by: Sameer Rahmani <lxsameer@gnu.org>
Link: https://lore.kernel.org/r/20200225222125.61874-1-lxsameer@gnu.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/kobject.txt | 78 +++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
index ff4c25098119..1f62d4d7d966 100644
--- a/Documentation/kobject.txt
+++ b/Documentation/kobject.txt
@@ -25,7 +25,7 @@ some terms we will be working with.
    usually embedded within some other structure which contains the stuff
    the code is really interested in.
 
-   No structure should EVER have more than one kobject embedded within it.
+   No structure should **EVER** have more than one kobject embedded within it.
    If it does, the reference counting for the object is sure to be messed
    up and incorrect, and your code will be buggy.  So do not do this.
 
@@ -55,7 +55,7 @@ a larger, domain-specific object.  To this end, kobjects will be found
 embedded in other structures.  If you are used to thinking of things in
 object-oriented terms, kobjects can be seen as a top-level, abstract class
 from which other classes are derived.  A kobject implements a set of
-capabilities which are not particularly useful by themselves, but which are
+capabilities which are not particularly useful by themselves, but are
 nice to have in other objects.  The C language does not allow for the
 direct expression of inheritance, so other techniques - such as structure
 embedding - must be used.
@@ -65,12 +65,12 @@ this is analogous as to how "list_head" structs are rarely useful on
 their own, but are invariably found embedded in the larger objects of
 interest.)
 
-So, for example, the UIO code in drivers/uio/uio.c has a structure that
+So, for example, the UIO code in ``drivers/uio/uio.c`` has a structure that
 defines the memory region associated with a uio device::
 
     struct uio_map {
-	struct kobject kobj;
-	struct uio_mem *mem;
+            struct kobject kobj;
+            struct uio_mem *mem;
     };
 
 If you have a struct uio_map structure, finding its embedded kobject is
@@ -78,30 +78,30 @@ just a matter of using the kobj member.  Code that works with kobjects will
 often have the opposite problem, however: given a struct kobject pointer,
 what is the pointer to the containing structure?  You must avoid tricks
 (such as assuming that the kobject is at the beginning of the structure)
-and, instead, use the container_of() macro, found in <linux/kernel.h>::
+and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
 
     container_of(pointer, type, member)
 
 where:
 
-  * "pointer" is the pointer to the embedded kobject,
-  * "type" is the type of the containing structure, and
-  * "member" is the name of the structure field to which "pointer" points.
+  * ``pointer`` is the pointer to the embedded kobject,
+  * ``type`` is the type of the containing structure, and
+  * ``member`` is the name of the structure field to which ``pointer`` points.
 
 The return value from container_of() is a pointer to the corresponding
-container type. So, for example, a pointer "kp" to a struct kobject
-embedded *within* a struct uio_map could be converted to a pointer to the
-*containing* uio_map structure with::
+container type. So, for example, a pointer ``kp`` to a struct kobject
+embedded **within** a struct uio_map could be converted to a pointer to the
+**containing** uio_map structure with::
 
     struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
 
-For convenience, programmers often define a simple macro for "back-casting"
+For convenience, programmers often define a simple macro for **back-casting**
 kobject pointers to the containing type.  Exactly this happens in the
-earlier drivers/uio/uio.c, as you can see here::
+earlier ``drivers/uio/uio.c``, as you can see here::
 
     struct uio_map {
-        struct kobject kobj;
-        struct uio_mem *mem;
+            struct kobject kobj;
+            struct uio_mem *mem;
     };
 
     #define to_map(map) container_of(map, struct uio_map, kobj)
@@ -125,7 +125,7 @@ must have an associated kobj_type.  After calling kobject_init(), to
 register the kobject with sysfs, the function kobject_add() must be called::
 
     int kobject_add(struct kobject *kobj, struct kobject *parent,
-		    const char *fmt, ...);
+                    const char *fmt, ...);
 
 This sets up the parent of the kobject and the name for the kobject
 properly.  If the kobject is to be associated with a specific kset,
@@ -172,13 +172,13 @@ call to kobject_uevent()::
 
     int kobject_uevent(struct kobject *kobj, enum kobject_action action);
 
-Use the KOBJ_ADD action for when the kobject is first added to the kernel.
+Use the **KOBJ_ADD** action for when the kobject is first added to the kernel.
 This should be done only after any attributes or children of the kobject
 have been initialized properly, as userspace will instantly start to look
 for them when this call happens.
 
 When the kobject is removed from the kernel (details on how to do that are
-below), the uevent for KOBJ_REMOVE will be automatically created by the
+below), the uevent for **KOBJ_REMOVE** will be automatically created by the
 kobject core, so the caller does not have to worry about doing that by
 hand.
 
@@ -238,7 +238,7 @@ Both types of attributes used here, with a kobject that has been created
 with the kobject_create_and_add(), can be of type kobj_attribute, so no
 special custom attribute is needed to be created.
 
-See the example module, samples/kobject/kobject-example.c for an
+See the example module, ``samples/kobject/kobject-example.c`` for an
 implementation of a simple kobject and attributes.
 
 
@@ -270,10 +270,10 @@ such a method has a form like::
 
     void my_object_release(struct kobject *kobj)
     {
-    	    struct my_object *mine = container_of(kobj, struct my_object, kobj);
+            struct my_object *mine = container_of(kobj, struct my_object, kobj);
 
-	    /* Perform any additional cleanup on this object, then... */
-	    kfree(mine);
+            /* Perform any additional cleanup on this object, then... */
+            kfree(mine);
     }
 
 One important point cannot be overstated: every kobject must have a
@@ -297,11 +297,11 @@ instead, it is associated with the ktype. So let us introduce struct
 kobj_type::
 
     struct kobj_type {
-	    void (*release)(struct kobject *kobj);
-	    const struct sysfs_ops *sysfs_ops;
-	    struct attribute **default_attrs;
-	    const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
-	    const void *(*namespace)(struct kobject *kobj);
+            void (*release)(struct kobject *kobj);
+            const struct sysfs_ops *sysfs_ops;
+            struct attribute **default_attrs;
+            const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
+            const void *(*namespace)(struct kobject *kobj);
     };
 
 This structure is used to describe a particular type of kobject (or, more
@@ -352,8 +352,8 @@ created and never declared statically or on the stack.  To create a new
 kset use::
 
   struct kset *kset_create_and_add(const char *name,
-				   struct kset_uevent_ops *u,
-				   struct kobject *parent);
+                                   struct kset_uevent_ops *u,
+                                   struct kobject *parent);
 
 When you are finished with the kset, call::
 
@@ -365,16 +365,16 @@ Because other references to the kset may still exist, the release may happen
 after kset_unregister() returns.
 
 An example of using a kset can be seen in the
-samples/kobject/kset-example.c file in the kernel tree.
+``samples/kobject/kset-example.c`` file in the kernel tree.
 
 If a kset wishes to control the uevent operations of the kobjects
 associated with it, it can use the struct kset_uevent_ops to handle it::
 
   struct kset_uevent_ops {
-        int (*filter)(struct kset *kset, struct kobject *kobj);
-        const char *(*name)(struct kset *kset, struct kobject *kobj);
-        int (*uevent)(struct kset *kset, struct kobject *kobj,
-                      struct kobj_uevent_env *env);
+          int (*filter)(struct kset *kset, struct kobject *kobj);
+          const char *(*name)(struct kset *kset, struct kobject *kobj);
+          int (*uevent)(struct kset *kset, struct kobject *kobj,
+                        struct kobj_uevent_env *env);
   };
 
 
@@ -408,8 +408,8 @@ Kobject removal
 After a kobject has been registered with the kobject core successfully, it
 must be cleaned up when the code is finished with it.  To do that, call
 kobject_put().  By doing this, the kobject core will automatically clean up
-all of the memory allocated by this kobject.  If a KOBJ_ADD uevent has been
-sent for the object, a corresponding KOBJ_REMOVE uevent will be sent, and
+all of the memory allocated by this kobject.  If a ``KOBJ_ADD`` uevent has been
+sent for the object, a corresponding ``KOBJ_REMOVE`` uevent will be sent, and
 any other sysfs housekeeping will be handled for the caller properly.
 
 If you need to do a two-stage delete of the kobject (say you are not
@@ -430,5 +430,5 @@ Example code to copy from
 =========================
 
 For a more complete example of using ksets and kobjects properly, see the
-example programs samples/kobject/{kobject-example.c,kset-example.c},
-which will be built as loadable modules if you select CONFIG_SAMPLE_KOBJECT.
+example programs ``samples/kobject/{kobject-example.c,kset-example.c}``,
+which will be built as loadable modules if you select ``CONFIG_SAMPLE_KOBJECT``.
-- 
cgit 


From 5fed00dcaca8bbd428742a6db1980753290eb204 Mon Sep 17 00:00:00 2001
From: Sameer Rahmani <lxsameer@gnu.org>
Date: Tue, 25 Feb 2020 22:21:25 +0000
Subject: Documentation: kobject.txt has been moved to core-api/kobject.rst

Moved the `kobject.txt` to `core-api/kobject.rst` and updated the
`core-api` index to point to it.

Signed-off-by: Sameer Rahmani <lxsameer@gnu.org>
[jc: moved it down from the top of core-api/index.rst]
Link: https://lore.kernel.org/r/20200225222125.61874-2-lxsameer@gnu.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/kobject.rst | 434 +++++++++++++++++++++++++++++++++++++
 Documentation/kobject.txt          | 434 -------------------------------------
 3 files changed, 435 insertions(+), 434 deletions(-)
 create mode 100644 Documentation/core-api/kobject.rst
 delete mode 100644 Documentation/kobject.txt

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index a501dc1c90d0..d02b26917931 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -12,6 +12,7 @@ Core utilities
    :maxdepth: 1
 
    kernel-api
+   kobject
    assoc_array
    atomic_ops
    cachetlb
diff --git a/Documentation/core-api/kobject.rst b/Documentation/core-api/kobject.rst
new file mode 100644
index 000000000000..1f62d4d7d966
--- /dev/null
+++ b/Documentation/core-api/kobject.rst
@@ -0,0 +1,434 @@
+=====================================================================
+Everything you never wanted to know about kobjects, ksets, and ktypes
+=====================================================================
+
+:Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+:Last updated: December 19, 2007
+
+Based on an original article by Jon Corbet for lwn.net written October 1,
+2003 and located at http://lwn.net/Articles/51437/
+
+Part of the difficulty in understanding the driver model - and the kobject
+abstraction upon which it is built - is that there is no obvious starting
+place. Dealing with kobjects requires understanding a few different types,
+all of which make reference to each other. In an attempt to make things
+easier, we'll take a multi-pass approach, starting with vague terms and
+adding detail as we go. To that end, here are some quick definitions of
+some terms we will be working with.
+
+ - A kobject is an object of type struct kobject.  Kobjects have a name
+   and a reference count.  A kobject also has a parent pointer (allowing
+   objects to be arranged into hierarchies), a specific type, and,
+   usually, a representation in the sysfs virtual filesystem.
+
+   Kobjects are generally not interesting on their own; instead, they are
+   usually embedded within some other structure which contains the stuff
+   the code is really interested in.
+
+   No structure should **EVER** have more than one kobject embedded within it.
+   If it does, the reference counting for the object is sure to be messed
+   up and incorrect, and your code will be buggy.  So do not do this.
+
+ - A ktype is the type of object that embeds a kobject.  Every structure
+   that embeds a kobject needs a corresponding ktype.  The ktype controls
+   what happens to the kobject when it is created and destroyed.
+
+ - A kset is a group of kobjects.  These kobjects can be of the same ktype
+   or belong to different ktypes.  The kset is the basic container type for
+   collections of kobjects. Ksets contain their own kobjects, but you can
+   safely ignore that implementation detail as the kset core code handles
+   this kobject automatically.
+
+   When you see a sysfs directory full of other directories, generally each
+   of those directories corresponds to a kobject in the same kset.
+
+We'll look at how to create and manipulate all of these types. A bottom-up
+approach will be taken, so we'll go back to kobjects.
+
+
+Embedding kobjects
+==================
+
+It is rare for kernel code to create a standalone kobject, with one major
+exception explained below.  Instead, kobjects are used to control access to
+a larger, domain-specific object.  To this end, kobjects will be found
+embedded in other structures.  If you are used to thinking of things in
+object-oriented terms, kobjects can be seen as a top-level, abstract class
+from which other classes are derived.  A kobject implements a set of
+capabilities which are not particularly useful by themselves, but are
+nice to have in other objects.  The C language does not allow for the
+direct expression of inheritance, so other techniques - such as structure
+embedding - must be used.
+
+(As an aside, for those familiar with the kernel linked list implementation,
+this is analogous as to how "list_head" structs are rarely useful on
+their own, but are invariably found embedded in the larger objects of
+interest.)
+
+So, for example, the UIO code in ``drivers/uio/uio.c`` has a structure that
+defines the memory region associated with a uio device::
+
+    struct uio_map {
+            struct kobject kobj;
+            struct uio_mem *mem;
+    };
+
+If you have a struct uio_map structure, finding its embedded kobject is
+just a matter of using the kobj member.  Code that works with kobjects will
+often have the opposite problem, however: given a struct kobject pointer,
+what is the pointer to the containing structure?  You must avoid tricks
+(such as assuming that the kobject is at the beginning of the structure)
+and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
+
+    container_of(pointer, type, member)
+
+where:
+
+  * ``pointer`` is the pointer to the embedded kobject,
+  * ``type`` is the type of the containing structure, and
+  * ``member`` is the name of the structure field to which ``pointer`` points.
+
+The return value from container_of() is a pointer to the corresponding
+container type. So, for example, a pointer ``kp`` to a struct kobject
+embedded **within** a struct uio_map could be converted to a pointer to the
+**containing** uio_map structure with::
+
+    struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
+
+For convenience, programmers often define a simple macro for **back-casting**
+kobject pointers to the containing type.  Exactly this happens in the
+earlier ``drivers/uio/uio.c``, as you can see here::
+
+    struct uio_map {
+            struct kobject kobj;
+            struct uio_mem *mem;
+    };
+
+    #define to_map(map) container_of(map, struct uio_map, kobj)
+
+where the macro argument "map" is a pointer to the struct kobject in
+question.  That macro is subsequently invoked with::
+
+    struct uio_map *map = to_map(kobj);
+
+
+Initialization of kobjects
+==========================
+
+Code which creates a kobject must, of course, initialize that object. Some
+of the internal fields are set up with a (mandatory) call to kobject_init()::
+
+    void kobject_init(struct kobject *kobj, struct kobj_type *ktype);
+
+The ktype is required for a kobject to be created properly, as every kobject
+must have an associated kobj_type.  After calling kobject_init(), to
+register the kobject with sysfs, the function kobject_add() must be called::
+
+    int kobject_add(struct kobject *kobj, struct kobject *parent,
+                    const char *fmt, ...);
+
+This sets up the parent of the kobject and the name for the kobject
+properly.  If the kobject is to be associated with a specific kset,
+kobj->kset must be assigned before calling kobject_add().  If a kset is
+associated with a kobject, then the parent for the kobject can be set to
+NULL in the call to kobject_add() and then the kobject's parent will be the
+kset itself.
+
+As the name of the kobject is set when it is added to the kernel, the name
+of the kobject should never be manipulated directly.  If you must change
+the name of the kobject, call kobject_rename()::
+
+    int kobject_rename(struct kobject *kobj, const char *new_name);
+
+kobject_rename() does not perform any locking or have a solid notion of
+what names are valid, so the caller must provide their own sanity checking
+and serialization.
+
+There is a function called kobject_set_name() but that is legacy cruft and
+is being removed.  If your code needs to call this function, it is
+incorrect and needs to be fixed.
+
+To properly access the name of the kobject, use the function
+kobject_name()::
+
+    const char *kobject_name(const struct kobject * kobj);
+
+There is a helper function to both initialize and add the kobject to the
+kernel at the same time, called surprisingly enough kobject_init_and_add()::
+
+    int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
+                             struct kobject *parent, const char *fmt, ...);
+
+The arguments are the same as the individual kobject_init() and
+kobject_add() functions described above.
+
+
+Uevents
+=======
+
+After a kobject has been registered with the kobject core, you need to
+announce to the world that it has been created.  This can be done with a
+call to kobject_uevent()::
+
+    int kobject_uevent(struct kobject *kobj, enum kobject_action action);
+
+Use the **KOBJ_ADD** action for when the kobject is first added to the kernel.
+This should be done only after any attributes or children of the kobject
+have been initialized properly, as userspace will instantly start to look
+for them when this call happens.
+
+When the kobject is removed from the kernel (details on how to do that are
+below), the uevent for **KOBJ_REMOVE** will be automatically created by the
+kobject core, so the caller does not have to worry about doing that by
+hand.
+
+
+Reference counts
+================
+
+One of the key functions of a kobject is to serve as a reference counter
+for the object in which it is embedded. As long as references to the object
+exist, the object (and the code which supports it) must continue to exist.
+The low-level functions for manipulating a kobject's reference counts are::
+
+    struct kobject *kobject_get(struct kobject *kobj);
+    void kobject_put(struct kobject *kobj);
+
+A successful call to kobject_get() will increment the kobject's reference
+counter and return the pointer to the kobject.
+
+When a reference is released, the call to kobject_put() will decrement the
+reference count and, possibly, free the object. Note that kobject_init()
+sets the reference count to one, so the code which sets up the kobject will
+need to do a kobject_put() eventually to release that reference.
+
+Because kobjects are dynamic, they must not be declared statically or on
+the stack, but instead, always allocated dynamically.  Future versions of
+the kernel will contain a run-time check for kobjects that are created
+statically and will warn the developer of this improper usage.
+
+If all that you want to use a kobject for is to provide a reference counter
+for your structure, please use the struct kref instead; a kobject would be
+overkill.  For more information on how to use struct kref, please see the
+file Documentation/kref.txt in the Linux kernel source tree.
+
+
+Creating "simple" kobjects
+==========================
+
+Sometimes all that a developer wants is a way to create a simple directory
+in the sysfs hierarchy, and not have to mess with the whole complication of
+ksets, show and store functions, and other details.  This is the one
+exception where a single kobject should be created.  To create such an
+entry, use the function::
+
+    struct kobject *kobject_create_and_add(const char *name, struct kobject *parent);
+
+This function will create a kobject and place it in sysfs in the location
+underneath the specified parent kobject.  To create simple attributes
+associated with this kobject, use::
+
+    int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
+
+or::
+
+    int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
+
+Both types of attributes used here, with a kobject that has been created
+with the kobject_create_and_add(), can be of type kobj_attribute, so no
+special custom attribute is needed to be created.
+
+See the example module, ``samples/kobject/kobject-example.c`` for an
+implementation of a simple kobject and attributes.
+
+
+
+ktypes and release methods
+==========================
+
+One important thing still missing from the discussion is what happens to a
+kobject when its reference count reaches zero. The code which created the
+kobject generally does not know when that will happen; if it did, there
+would be little point in using a kobject in the first place. Even
+predictable object lifecycles become more complicated when sysfs is brought
+in as other portions of the kernel can get a reference on any kobject that
+is registered in the system.
+
+The end result is that a structure protected by a kobject cannot be freed
+before its reference count goes to zero. The reference count is not under
+the direct control of the code which created the kobject. So that code must
+be notified asynchronously whenever the last reference to one of its
+kobjects goes away.
+
+Once you have registered your kobject via kobject_add(), you must never use
+kfree() to free it directly. The only safe way is to use kobject_put(). It
+is good practice to always use kobject_put() after kobject_init() to avoid
+errors creeping in.
+
+This notification is done through a kobject's release() method. Usually
+such a method has a form like::
+
+    void my_object_release(struct kobject *kobj)
+    {
+            struct my_object *mine = container_of(kobj, struct my_object, kobj);
+
+            /* Perform any additional cleanup on this object, then... */
+            kfree(mine);
+    }
+
+One important point cannot be overstated: every kobject must have a
+release() method, and the kobject must persist (in a consistent state)
+until that method is called. If these constraints are not met, the code is
+flawed. Note that the kernel will warn you if you forget to provide a
+release() method.  Do not try to get rid of this warning by providing an
+"empty" release function.
+
+If all your cleanup function needs to do is call kfree(), then you must
+create a wrapper function which uses container_of() to upcast to the correct
+type (as shown in the example above) and then calls kfree() on the overall
+structure.
+
+Note, the name of the kobject is available in the release function, but it
+must NOT be changed within this callback.  Otherwise there will be a memory
+leak in the kobject core, which makes people unhappy.
+
+Interestingly, the release() method is not stored in the kobject itself;
+instead, it is associated with the ktype. So let us introduce struct
+kobj_type::
+
+    struct kobj_type {
+            void (*release)(struct kobject *kobj);
+            const struct sysfs_ops *sysfs_ops;
+            struct attribute **default_attrs;
+            const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
+            const void *(*namespace)(struct kobject *kobj);
+    };
+
+This structure is used to describe a particular type of kobject (or, more
+correctly, of containing object). Every kobject needs to have an associated
+kobj_type structure; a pointer to that structure must be specified when you
+call kobject_init() or kobject_init_and_add().
+
+The release field in struct kobj_type is, of course, a pointer to the
+release() method for this type of kobject. The sysfs_ops and default_attrs
+fields control how objects of this type are represented in sysfs; they are
+beyond the scope of this document.
+
+The default_attrs pointer is a list of default attributes that will be
+automatically created for any kobject that is registered with this ktype.
+
+
+ksets
+=====
+
+A kset is merely a collection of kobjects that want to be associated with
+each other.  There is no restriction that they be of the same ktype, but be
+very careful if they are not.
+
+A kset serves these functions:
+
+ - It serves as a bag containing a group of objects. A kset can be used by
+   the kernel to track "all block devices" or "all PCI device drivers."
+
+ - A kset is also a subdirectory in sysfs, where the kobjects associated
+   with the kset can show up.  Every kset contains a kobject which can be
+   set up to be the parent of other kobjects; the top-level directories of
+   the sysfs hierarchy are constructed in this way.
+
+ - Ksets can support the "hotplugging" of kobjects and influence how
+   uevent events are reported to user space.
+
+In object-oriented terms, "kset" is the top-level container class; ksets
+contain their own kobject, but that kobject is managed by the kset code and
+should not be manipulated by any other user.
+
+A kset keeps its children in a standard kernel linked list.  Kobjects point
+back to their containing kset via their kset field. In almost all cases,
+the kobjects belonging to a kset have that kset (or, strictly, its embedded
+kobject) in their parent.
+
+As a kset contains a kobject within it, it should always be dynamically
+created and never declared statically or on the stack.  To create a new
+kset use::
+
+  struct kset *kset_create_and_add(const char *name,
+                                   struct kset_uevent_ops *u,
+                                   struct kobject *parent);
+
+When you are finished with the kset, call::
+
+  void kset_unregister(struct kset *kset);
+
+to destroy it.  This removes the kset from sysfs and decrements its reference
+count.  When the reference count goes to zero, the kset will be released.
+Because other references to the kset may still exist, the release may happen
+after kset_unregister() returns.
+
+An example of using a kset can be seen in the
+``samples/kobject/kset-example.c`` file in the kernel tree.
+
+If a kset wishes to control the uevent operations of the kobjects
+associated with it, it can use the struct kset_uevent_ops to handle it::
+
+  struct kset_uevent_ops {
+          int (*filter)(struct kset *kset, struct kobject *kobj);
+          const char *(*name)(struct kset *kset, struct kobject *kobj);
+          int (*uevent)(struct kset *kset, struct kobject *kobj,
+                        struct kobj_uevent_env *env);
+  };
+
+
+The filter function allows a kset to prevent a uevent from being emitted to
+userspace for a specific kobject.  If the function returns 0, the uevent
+will not be emitted.
+
+The name function will be called to override the default name of the kset
+that the uevent sends to userspace.  By default, the name will be the same
+as the kset itself, but this function, if present, can override that name.
+
+The uevent function will be called when the uevent is about to be sent to
+userspace to allow more environment variables to be added to the uevent.
+
+One might ask how, exactly, a kobject is added to a kset, given that no
+functions which do so have been presented.  The answer is
+that this task is handled by kobject_add().  When a kobject is passed to
+kobject_add(), its kset member should point to the kset to which the
+kobject will belong.  kobject_add() will handle the rest.
+
+If the kobject belonging to a kset has no parent kobject set, it will be
+added to the kset's directory.  Not all members of a kset necessarily
+live in the kset directory.  If an explicit parent kobject is assigned
+before the kobject is added, the kobject is registered with the kset, but
+added below the parent kobject.
+
+
+Kobject removal
+===============
+
+After a kobject has been registered with the kobject core successfully, it
+must be cleaned up when the code is finished with it.  To do that, call
+kobject_put().  By doing this, the kobject core will automatically clean up
+all of the memory allocated by this kobject.  If a ``KOBJ_ADD`` uevent has been
+sent for the object, a corresponding ``KOBJ_REMOVE`` uevent will be sent, and
+any other sysfs housekeeping will be handled for the caller properly.
+
+If you need to do a two-stage delete of the kobject (say you are not
+allowed to sleep when you need to destroy the object), then call
+kobject_del() which will unregister the kobject from sysfs.  This makes the
+kobject "invisible", but it is not cleaned up, and the reference count of
+the object is still the same.  At a later time call kobject_put() to finish
+the cleanup of the memory associated with the kobject.
+
+kobject_del() can be used to drop the reference to the parent object, if
+circular references are constructed.  It is valid in some cases for a
+parent object to reference a child.  Circular references **must** be broken
+with an explicit call to kobject_del(), so that the release functions will be
+called, and the objects in the former circle release each other.
+
+
+Example code to copy from
+=========================
+
+For a more complete example of using ksets and kobjects properly, see the
+example programs ``samples/kobject/{kobject-example.c,kset-example.c}``,
+which will be built as loadable modules if you select ``CONFIG_SAMPLE_KOBJECT``.
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
deleted file mode 100644
index 1f62d4d7d966..000000000000
--- a/Documentation/kobject.txt
+++ /dev/null
@@ -1,434 +0,0 @@
-=====================================================================
-Everything you never wanted to know about kobjects, ksets, and ktypes
-=====================================================================
-
-:Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-:Last updated: December 19, 2007
-
-Based on an original article by Jon Corbet for lwn.net written October 1,
-2003 and located at http://lwn.net/Articles/51437/
-
-Part of the difficulty in understanding the driver model - and the kobject
-abstraction upon which it is built - is that there is no obvious starting
-place. Dealing with kobjects requires understanding a few different types,
-all of which make reference to each other. In an attempt to make things
-easier, we'll take a multi-pass approach, starting with vague terms and
-adding detail as we go. To that end, here are some quick definitions of
-some terms we will be working with.
-
- - A kobject is an object of type struct kobject.  Kobjects have a name
-   and a reference count.  A kobject also has a parent pointer (allowing
-   objects to be arranged into hierarchies), a specific type, and,
-   usually, a representation in the sysfs virtual filesystem.
-
-   Kobjects are generally not interesting on their own; instead, they are
-   usually embedded within some other structure which contains the stuff
-   the code is really interested in.
-
-   No structure should **EVER** have more than one kobject embedded within it.
-   If it does, the reference counting for the object is sure to be messed
-   up and incorrect, and your code will be buggy.  So do not do this.
-
- - A ktype is the type of object that embeds a kobject.  Every structure
-   that embeds a kobject needs a corresponding ktype.  The ktype controls
-   what happens to the kobject when it is created and destroyed.
-
- - A kset is a group of kobjects.  These kobjects can be of the same ktype
-   or belong to different ktypes.  The kset is the basic container type for
-   collections of kobjects. Ksets contain their own kobjects, but you can
-   safely ignore that implementation detail as the kset core code handles
-   this kobject automatically.
-
-   When you see a sysfs directory full of other directories, generally each
-   of those directories corresponds to a kobject in the same kset.
-
-We'll look at how to create and manipulate all of these types. A bottom-up
-approach will be taken, so we'll go back to kobjects.
-
-
-Embedding kobjects
-==================
-
-It is rare for kernel code to create a standalone kobject, with one major
-exception explained below.  Instead, kobjects are used to control access to
-a larger, domain-specific object.  To this end, kobjects will be found
-embedded in other structures.  If you are used to thinking of things in
-object-oriented terms, kobjects can be seen as a top-level, abstract class
-from which other classes are derived.  A kobject implements a set of
-capabilities which are not particularly useful by themselves, but are
-nice to have in other objects.  The C language does not allow for the
-direct expression of inheritance, so other techniques - such as structure
-embedding - must be used.
-
-(As an aside, for those familiar with the kernel linked list implementation,
-this is analogous as to how "list_head" structs are rarely useful on
-their own, but are invariably found embedded in the larger objects of
-interest.)
-
-So, for example, the UIO code in ``drivers/uio/uio.c`` has a structure that
-defines the memory region associated with a uio device::
-
-    struct uio_map {
-            struct kobject kobj;
-            struct uio_mem *mem;
-    };
-
-If you have a struct uio_map structure, finding its embedded kobject is
-just a matter of using the kobj member.  Code that works with kobjects will
-often have the opposite problem, however: given a struct kobject pointer,
-what is the pointer to the containing structure?  You must avoid tricks
-(such as assuming that the kobject is at the beginning of the structure)
-and, instead, use the container_of() macro, found in ``<linux/kernel.h>``::
-
-    container_of(pointer, type, member)
-
-where:
-
-  * ``pointer`` is the pointer to the embedded kobject,
-  * ``type`` is the type of the containing structure, and
-  * ``member`` is the name of the structure field to which ``pointer`` points.
-
-The return value from container_of() is a pointer to the corresponding
-container type. So, for example, a pointer ``kp`` to a struct kobject
-embedded **within** a struct uio_map could be converted to a pointer to the
-**containing** uio_map structure with::
-
-    struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
-
-For convenience, programmers often define a simple macro for **back-casting**
-kobject pointers to the containing type.  Exactly this happens in the
-earlier ``drivers/uio/uio.c``, as you can see here::
-
-    struct uio_map {
-            struct kobject kobj;
-            struct uio_mem *mem;
-    };
-
-    #define to_map(map) container_of(map, struct uio_map, kobj)
-
-where the macro argument "map" is a pointer to the struct kobject in
-question.  That macro is subsequently invoked with::
-
-    struct uio_map *map = to_map(kobj);
-
-
-Initialization of kobjects
-==========================
-
-Code which creates a kobject must, of course, initialize that object. Some
-of the internal fields are setup with a (mandatory) call to kobject_init()::
-
-    void kobject_init(struct kobject *kobj, struct kobj_type *ktype);
-
-The ktype is required for a kobject to be created properly, as every kobject
-must have an associated kobj_type.  After calling kobject_init(), to
-register the kobject with sysfs, the function kobject_add() must be called::
-
-    int kobject_add(struct kobject *kobj, struct kobject *parent,
-                    const char *fmt, ...);
-
-This sets up the parent of the kobject and the name for the kobject
-properly.  If the kobject is to be associated with a specific kset,
-kobj->kset must be assigned before calling kobject_add().  If a kset is
-associated with a kobject, then the parent for the kobject can be set to
-NULL in the call to kobject_add() and then the kobject's parent will be the
-kset itself.
-
-As the name of the kobject is set when it is added to the kernel, the name
-of the kobject should never be manipulated directly.  If you must change
-the name of the kobject, call kobject_rename()::
-
-    int kobject_rename(struct kobject *kobj, const char *new_name);
-
-kobject_rename does not perform any locking or have a solid notion of
-what names are valid so the caller must provide their own sanity checking
-and serialization.
-
-There is a function called kobject_set_name() but that is legacy cruft and
-is being removed.  If your code needs to call this function, it is
-incorrect and needs to be fixed.
-
-To properly access the name of the kobject, use the function
-kobject_name()::
-
-    const char *kobject_name(const struct kobject * kobj);
-
-There is a helper function to both initialize and add the kobject to the
-kernel at the same time, called surprisingly enough kobject_init_and_add()::
-
-    int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
-                             struct kobject *parent, const char *fmt, ...);
-
-The arguments are the same as the individual kobject_init() and
-kobject_add() functions described above.
-
-
-Uevents
-=======
-
-After a kobject has been registered with the kobject core, you need to
-announce to the world that it has been created.  This can be done with a
-call to kobject_uevent()::
-
-    int kobject_uevent(struct kobject *kobj, enum kobject_action action);
-
-Use the **KOBJ_ADD** action for when the kobject is first added to the kernel.
-This should be done only after any attributes or children of the kobject
-have been initialized properly, as userspace will instantly start to look
-for them when this call happens.
-
-When the kobject is removed from the kernel (details on how to do that are
-below), the uevent for **KOBJ_REMOVE** will be automatically created by the
-kobject core, so the caller does not have to worry about doing that by
-hand.
-
-
-Reference counts
-================
-
-One of the key functions of a kobject is to serve as a reference counter
-for the object in which it is embedded. As long as references to the object
-exist, the object (and the code which supports it) must continue to exist.
-The low-level functions for manipulating a kobject's reference counts are::
-
-    struct kobject *kobject_get(struct kobject *kobj);
-    void kobject_put(struct kobject *kobj);
-
-A successful call to kobject_get() will increment the kobject's reference
-counter and return the pointer to the kobject.
-
-When a reference is released, the call to kobject_put() will decrement the
-reference count and, possibly, free the object. Note that kobject_init()
-sets the reference count to one, so the code which sets up the kobject will
-need to do a kobject_put() eventually to release that reference.
-
-Because kobjects are dynamic, they must not be declared statically or on
-the stack, but instead, always allocated dynamically.  Future versions of
-the kernel will contain a run-time check for kobjects that are created
-statically and will warn the developer of this improper usage.
-
-If all that you want to use a kobject for is to provide a reference counter
-for your structure, please use the struct kref instead; a kobject would be
-overkill.  For more information on how to use struct kref, please see the
-file Documentation/kref.txt in the Linux kernel source tree.
-
-
-Creating "simple" kobjects
-==========================
-
-Sometimes all that a developer wants is a way to create a simple directory
-in the sysfs hierarchy, and not have to mess with the whole complication of
-ksets, show and store functions, and other details.  This is the one
-exception where a single kobject should be created.  To create such an
-entry, use the function::
-
-    struct kobject *kobject_create_and_add(char *name, struct kobject *parent);
-
-This function will create a kobject and place it in sysfs in the location
-underneath the specified parent kobject.  To create simple attributes
-associated with this kobject, use::
-
-    int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
-
-or::
-
-    int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
-
-Both types of attributes used here, with a kobject that has been created
-with the kobject_create_and_add(), can be of type kobj_attribute, so no
-special custom attribute is needed to be created.
-
-See the example module, ``samples/kobject/kobject-example.c`` for an
-implementation of a simple kobject and attributes.
-
-
-
-ktypes and release methods
-==========================
-
-One important thing still missing from the discussion is what happens to a
-kobject when its reference count reaches zero. The code which created the
-kobject generally does not know when that will happen; if it did, there
-would be little point in using a kobject in the first place. Even
-predictable object lifecycles become more complicated when sysfs is brought
-in as other portions of the kernel can get a reference on any kobject that
-is registered in the system.
-
-The end result is that a structure protected by a kobject cannot be freed
-before its reference count goes to zero. The reference count is not under
-the direct control of the code which created the kobject. So that code must
-be notified asynchronously whenever the last reference to one of its
-kobjects goes away.
-
-Once you registered your kobject via kobject_add(), you must never use
-kfree() to free it directly. The only safe way is to use kobject_put(). It
-is good practice to always use kobject_put() after kobject_init() to avoid
-errors creeping in.
-
-This notification is done through a kobject's release() method. Usually
-such a method has a form like::
-
-    void my_object_release(struct kobject *kobj)
-    {
-            struct my_object *mine = container_of(kobj, struct my_object, kobj);
-
-            /* Perform any additional cleanup on this object, then... */
-            kfree(mine);
-    }
-
-One important point cannot be overstated: every kobject must have a
-release() method, and the kobject must persist (in a consistent state)
-until that method is called. If these constraints are not met, the code is
-flawed. Note that the kernel will warn you if you forget to provide a
-release() method.  Do not try to get rid of this warning by providing an
-"empty" release function.
-
-If all your cleanup function needs to do is call kfree(), then you must
-create a wrapper function which uses container_of() to upcast to the correct
-type (as shown in the example above) and then calls kfree() on the overall
-structure.
-
-Note, the name of the kobject is available in the release function, but it
-must NOT be changed within this callback.  Otherwise there will be a memory
-leak in the kobject core, which makes people unhappy.
-
-Interestingly, the release() method is not stored in the kobject itself;
-instead, it is associated with the ktype. So let us introduce struct
-kobj_type::
-
-    struct kobj_type {
-            void (*release)(struct kobject *kobj);
-            const struct sysfs_ops *sysfs_ops;
-            struct attribute **default_attrs;
-            const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
-            const void *(*namespace)(struct kobject *kobj);
-    };
-
-This structure is used to describe a particular type of kobject (or, more
-correctly, of containing object). Every kobject needs to have an associated
-kobj_type structure; a pointer to that structure must be specified when you
-call kobject_init() or kobject_init_and_add().
-
-The release field in struct kobj_type is, of course, a pointer to the
-release() method for this type of kobject. The other two fields (sysfs_ops
-and default_attrs) control how objects of this type are represented in
-sysfs; they are beyond the scope of this document.
-
-The default_attrs pointer is a list of default attributes that will be
-automatically created for any kobject that is registered with this ktype.
-
-
-ksets
-=====
-
-A kset is merely a collection of kobjects that want to be associated with
-each other.  There is no restriction that they be of the same ktype, but be
-very careful if they are not.
-
-A kset serves these functions:
-
- - It serves as a bag containing a group of objects. A kset can be used by
-   the kernel to track "all block devices" or "all PCI device drivers."
-
- - A kset is also a subdirectory in sysfs, where the associated kobjects
-   with the kset can show up.  Every kset contains a kobject which can be
-   set up to be the parent of other kobjects; the top-level directories of
-   the sysfs hierarchy are constructed in this way.
-
- - Ksets can support the "hotplugging" of kobjects and influence how
-   uevent events are reported to user space.
-
-In object-oriented terms, "kset" is the top-level container class; ksets
-contain their own kobject, but that kobject is managed by the kset code and
-should not be manipulated by any other user.
-
-A kset keeps its children in a standard kernel linked list.  Kobjects point
-back to their containing kset via their kset field. In almost all cases,
-the kobjects belonging to a kset have that kset (or, strictly, its embedded
-kobject) in their parent.
-
-As a kset contains a kobject within it, it should always be dynamically
-created and never declared statically or on the stack.  To create a new
-kset use::
-
-  struct kset *kset_create_and_add(const char *name,
-                                   struct kset_uevent_ops *u,
-                                   struct kobject *parent);
-
-When you are finished with the kset, call::
-
-  void kset_unregister(struct kset *kset);
-
-to destroy it.  This removes the kset from sysfs and decrements its reference
-count.  When the reference count goes to zero, the kset will be released.
-Because other references to the kset may still exist, the release may happen
-after kset_unregister() returns.
-
-An example of using a kset can be seen in the
-``samples/kobject/kset-example.c`` file in the kernel tree.
-
-If a kset wishes to control the uevent operations of the kobjects
-associated with it, it can use the struct kset_uevent_ops to handle it::
-
-  struct kset_uevent_ops {
-          int (*filter)(struct kset *kset, struct kobject *kobj);
-          const char *(*name)(struct kset *kset, struct kobject *kobj);
-          int (*uevent)(struct kset *kset, struct kobject *kobj,
-                        struct kobj_uevent_env *env);
-  };
-
-
-The filter function allows a kset to prevent a uevent from being emitted to
-userspace for a specific kobject.  If the function returns 0, the uevent
-will not be emitted.
-
-The name function will be called to override the default name of the kset
-that the uevent sends to userspace.  By default, the name will be the same
-as the kset itself, but this function, if present, can override that name.
-
-The uevent function will be called when the uevent is about to be sent to
-userspace to allow more environment variables to be added to the uevent.
-
-One might ask how, exactly, a kobject is added to a kset, given that no
-functions which perform that function have been presented.  The answer is
-that this task is handled by kobject_add().  When a kobject is passed to
-kobject_add(), its kset member should point to the kset to which the
-kobject will belong.  kobject_add() will handle the rest.
-
-If the kobject belonging to a kset has no parent kobject set, it will be
-added to the kset's directory.  Not all members of a kset do necessarily
-live in the kset directory.  If an explicit parent kobject is assigned
-before the kobject is added, the kobject is registered with the kset, but
-added below the parent kobject.
-
-
-Kobject removal
-===============
-
-After a kobject has been registered with the kobject core successfully, it
-must be cleaned up when the code is finished with it.  To do that, call
-kobject_put().  By doing this, the kobject core will automatically clean up
-all of the memory allocated by this kobject.  If a ``KOBJ_ADD`` uevent has been
-sent for the object, a corresponding ``KOBJ_REMOVE`` uevent will be sent, and
-any other sysfs housekeeping will be handled for the caller properly.
-
-If you need to do a two-stage delete of the kobject (say you are not
-allowed to sleep when you need to destroy the object), then call
-kobject_del() which will unregister the kobject from sysfs.  This makes the
-kobject "invisible", but it is not cleaned up, and the reference count of
-the object is still the same.  At a later time call kobject_put() to finish
-the cleanup of the memory associated with the kobject.
-
-kobject_del() can be used to drop the reference to the parent object, if
-circular references are constructed.  It is valid in some cases, that a
-parent objects references a child.  Circular references _must_ be broken
-with an explicit call to kobject_del(), so that a release functions will be
-called, and the objects in the former circle release each other.
-
-
-Example code to copy from
-=========================
-
-For a more complete example of using ksets and kobjects properly, see the
-example programs ``samples/kobject/{kobject-example.c,kset-example.c}``,
-which will be built as loadable modules if you select ``CONFIG_SAMPLE_KOBJECT``.
-- 
cgit 


From ae5977765acb25c1eafb348f81a6597cb7a88eba Mon Sep 17 00:00:00 2001
From: Zenghui Yu <yuzenghui@huawei.com>
Date: Tue, 25 Feb 2020 20:40:52 +0800
Subject: Documentation: kthread: Fix WQ_SYSFS workqueues path name

The set of WQ_SYSFS workqueues should be displayed using
"ls /sys/devices/virtual/workqueue"; add the missing '/'.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200225124052.1506-1-yuzenghui@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-per-CPU-kthreads.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
index baeeba8762ae..21818aca4708 100644
--- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
+++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
@@ -234,7 +234,7 @@ To reduce its OS jitter, do any of the following:
 	Such a workqueue can be confined to a given subset of the
 	CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
 	files.	The set of WQ_SYSFS workqueues can be displayed using
-	"ls sys/devices/virtual/workqueue".  That said, the workqueues
+	"ls /sys/devices/virtual/workqueue".  That said, the workqueues
 	maintainer would like to caution people against indiscriminately
 	sprinkling WQ_SYSFS across all the workqueues.	The reason for
 	caution is that it is easy to add WQ_SYSFS, but because sysfs is
-- 
cgit 


From c428cd52282dcc967b2a936d80f1eec4cb80d6d5 Mon Sep 17 00:00:00 2001
From: Tim Bird <tim.bird@sony.com>
Date: Mon, 24 Feb 2020 18:34:41 -0700
Subject: scripts/sphinx-pre-install: add '-p python3' to virtualenv

With Ubuntu 16.04 (and presumably Debian distros of the same age),
the instructions for setting up a python virtual environment should
do so with the python 3 interpreter.  On these older distros, the
default python (and virtualenv command) might be python2 based.

Some of the packages that sphinx relies on are now only available
for python3.  If you don't specify the python3 interpreter for
the virtualenv, you get errors when doing the pip installs for
various packages.

Fix this by adding '-p python3' to the virtualenv recommendation
line.

Signed-off-by: Tim Bird <tim.bird@sony.com>
Link: https://lore.kernel.org/r/1582594481-23221-1-git-send-email-tim.bird@sony.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/sphinx-pre-install | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install
index a8f0c002a340..fa3fb05cd54b 100755
--- a/scripts/sphinx-pre-install
+++ b/scripts/sphinx-pre-install
@@ -701,11 +701,26 @@ sub check_needs()
 		} else {
 			my $rec_activate = "$virtenv_dir/bin/activate";
 			my $virtualenv = findprog("virtualenv-3");
+			my $rec_python3 = "";
 			$virtualenv = findprog("virtualenv-3.5") if (!$virtualenv);
 			$virtualenv = findprog("virtualenv") if (!$virtualenv);
 			$virtualenv = "virtualenv" if (!$virtualenv);
 
-			printf "\t$virtualenv $virtenv_dir\n";
+			my $rel = "";
+			if (index($system_release, "Ubuntu") != -1) {
+				$rel = $1 if ($system_release =~ /Ubuntu\s+(\d+)[.]/);
+				if ($rel && $rel >= 16) {
+					$rec_python3 = " -p python3";
+				}
+			}
+			if (index($system_release, "Debian") != -1) {
+				$rel = $1 if ($system_release =~ /Debian\s+(\d+)/);
+				if ($rel && $rel >= 7) {
+					$rec_python3 = " -p python3";
+				}
+			}
+
+			printf "\t$virtualenv$rec_python3 $virtenv_dir\n";
 			printf "\t. $rec_activate\n";
 			printf "\tpip install -r $requirement_file\n";
 			deactivate_help();
-- 
cgit 


From 3eb30c51a6dda26d0c5b8824b7c0515502f1c161 Mon Sep 17 00:00:00 2001
From: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Date: Wed, 12 Feb 2020 19:13:32 +0100
Subject: Documentation: nfsroot.rst: Fix references to nfsroot.rst
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When nfsroot.txt was converted and moved to nfsroot.rst, the references to
the old text file were not updated to match the change.  Fix this.

Fixes: f9a9349846f92b2d ("Documentation: nfsroot.txt: convert to ReST")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200212181332.520545-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt | 8 ++++----
 Documentation/filesystems/cifs/cifsroot.txt     | 2 +-
 fs/nfs/Kconfig                                  | 2 +-
 net/ipv4/Kconfig                                | 6 +++---
 net/ipv4/ipconfig.c                             | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3e3fd0d19e53..4220477079bd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1885,7 +1885,7 @@
 			No delay
 
 	ip=		[IP_PNP]
-			See Documentation/filesystems/nfs/nfsroot.txt.
+			See Documentation/admin-guide/nfs/nfsroot.rst.
 
 	ipcmni_extend	[KNL] Extend the maximum number of unique System V
 			IPC identifiers from 32,768 to 16,777,216.
@@ -2855,13 +2855,13 @@
 			Default value is 0.
 
 	nfsaddrs=	[NFS] Deprecated.  Use ip= instead.
-			See Documentation/filesystems/nfs/nfsroot.txt.
+			See Documentation/admin-guide/nfs/nfsroot.rst.
 
 	nfsroot=	[NFS] nfs root filesystem for disk-less boxes.
-			See Documentation/filesystems/nfs/nfsroot.txt.
+			See Documentation/admin-guide/nfs/nfsroot.rst.
 
 	nfsrootdebug	[NFS] enable nfsroot debugging messages.
-			See Documentation/filesystems/nfs/nfsroot.txt.
+			See Documentation/admin-guide/nfs/nfsroot.rst.
 
 	nfs.callback_nr_threads=
 			[NFSv4] set the total number of threads that the
diff --git a/Documentation/filesystems/cifs/cifsroot.txt b/Documentation/filesystems/cifs/cifsroot.txt
index 0fa1a2c36a40..947b7ec6ce9e 100644
--- a/Documentation/filesystems/cifs/cifsroot.txt
+++ b/Documentation/filesystems/cifs/cifsroot.txt
@@ -13,7 +13,7 @@ network by utilizing SMB or CIFS protocol.
 
 In order to mount, the network stack will also need to be set up by
 using 'ip=' config option. For more details, see
-Documentation/filesystems/nfs/nfsroot.txt.
+Documentation/admin-guide/nfs/nfsroot.rst.
 
 A CIFS root mount currently requires the use of SMB1+UNIX Extensions
 which is only supported by the Samba server. SMB1 is the older
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 40b6c5ac46c0..88e1763e02f3 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -164,7 +164,7 @@ config ROOT_NFS
 	  If you want your system to mount its root file system via NFS,
 	  choose Y here.  This is common practice for managing systems
 	  without local permanent storage.  For details, read
-	  <file:Documentation/filesystems/nfs/nfsroot.txt>.
+	  <file:Documentation/admin-guide/nfs/nfsroot.rst>.
 
 	  Most people say N here.
 
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index f96bd489b362..fb1dc8d02f6d 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -129,7 +129,7 @@ config IP_PNP_DHCP
 
 	  If unsure, say Y. Note that if you want to use DHCP, a DHCP server
 	  must be operating on your network.  Read
-	  <file:Documentation/filesystems/nfs/nfsroot.txt> for details.
+	  <file:Documentation/admin-guide/nfs/nfsroot.rst> for details.
 
 config IP_PNP_BOOTP
 	bool "IP: BOOTP support"
@@ -144,7 +144,7 @@ config IP_PNP_BOOTP
 	  does BOOTP itself, providing all necessary information on the kernel
 	  command line, you can say N here. If unsure, say Y. Note that if you
 	  want to use BOOTP, a BOOTP server must be operating on your network.
-	  Read <file:Documentation/filesystems/nfs/nfsroot.txt> for details.
+	  Read <file:Documentation/admin-guide/nfs/nfsroot.rst> for details.
 
 config IP_PNP_RARP
 	bool "IP: RARP support"
@@ -157,7 +157,7 @@ config IP_PNP_RARP
 	  older protocol which is being obsoleted by BOOTP and DHCP), say Y
 	  here. Note that if you want to use RARP, a RARP server must be
 	  operating on your network. Read
-	  <file:Documentation/filesystems/nfs/nfsroot.txt> for details.
+	  <file:Documentation/admin-guide/nfs/nfsroot.rst> for details.
 
 config NET_IPIP
 	tristate "IP: tunneling"
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 4438f6b12335..561f15b5a944 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1621,7 +1621,7 @@ late_initcall(ip_auto_config);
 
 /*
  *  Decode any IP configuration options in the "ip=" or "nfsaddrs=" kernel
- *  command line parameter.  See Documentation/filesystems/nfs/nfsroot.txt.
+ *  command line parameter.  See Documentation/admin-guide/nfs/nfsroot.rst.
  */
 static int __init ic_proto_name(char *name)
 {
-- 
cgit 


From 07d241fd66ba99111d43a0a4c4abeeb972468d1d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:47 +0100
Subject: docs: filesystems: convert 9p.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/96a060b7b5c0c3838ab1751addfe4d6d3bc37bd6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/9p.rst    | 185 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/9p.txt    | 161 -------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 186 insertions(+), 161 deletions(-)
 create mode 100644 Documentation/filesystems/9p.rst
 delete mode 100644 Documentation/filesystems/9p.txt

diff --git a/Documentation/filesystems/9p.rst b/Documentation/filesystems/9p.rst
new file mode 100644
index 000000000000..f054d1c45e86
--- /dev/null
+++ b/Documentation/filesystems/9p.rst
@@ -0,0 +1,185 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+v9fs: Plan 9 Resource Sharing for Linux
+=======================================
+
+About
+=====
+
+v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
+
+This software was originally developed by Ron Minnich <rminnich@sandia.gov>
+and Maya Gokhale.  Additional development by Greg Watson
+<gwatson@lanl.gov> and most recently Eric Van Hensbergen
+<ericvh@gmail.com>, Latchesar Ionkov <lucho@ionkov.net> and Russ Cox
+<rsc@swtch.com>.
+
+The best detailed explanation of the Linux implementation and applications of
+the 9p client is available in the form of a USENIX paper:
+
+   http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html
+
+Other applications are described in the following papers:
+
+	* XCPU & Clustering
+	  http://xcpu.org/papers/xcpu-talk.pdf
+	* KVMFS: control file system for KVM
+	  http://xcpu.org/papers/kvmfs.pdf
+	* CellFS: A New Programming Model for the Cell BE
+	  http://xcpu.org/papers/cellfs-talk.pdf
+	* PROSE I/O: Using 9p to enable Application Partitions
+	  http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
+	* VirtFS: A Virtualization Aware File System pass-through
+	  http://goo.gl/3WPDg
+
+Usage
+=====
+
+For remote file server::
+
+	mount -t 9p 10.10.1.2 /mnt/9
+
+For Plan 9 From User Space applications (http://swtch.com/plan9)::
+
+	mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER
+
+For server running on QEMU host with virtio transport::
+
+	mount -t 9p -o trans=virtio <mount_tag> /mnt/9
+
+where mount_tag is the tag that the server associates with each of the
+exported mount points. Each 9P export is seen by the client as a virtio
+device with an associated "mount_tag" property. Available mount tags can be
+seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.
+
+Options
+=======
+
+  ============= ===============================================================
+  trans=name	select an alternative transport.  Valid options are
+  		currently:
+
+			========  ============================================
+			unix 	  specifying a named pipe mount point
+			tcp	  specifying a normal TCP/IP connection
+			fd   	  use passed file descriptors for connection
+                                  (see rfdno and wfdno)
+			virtio	  connect to the next virtio channel available
+				  (from QEMU with trans_virtio module)
+			rdma	  connect to a specified RDMA channel
+			========  ============================================
+
+  uname=name	user name to attempt mount as on the remote server.  The
+  		server may override or ignore this value.  Certain user
+		names may require authentication.
+
+  aname=name	aname specifies the file tree to access when the server is
+  		offering several exported file systems.
+
+  cache=mode	specifies a caching policy.  By default, no caches are used.
+
+                        none
+				default no cache policy, metadata and data
+                                alike are synchronous.
+			loose
+				no attempts are made at consistency,
+                                intended for exclusive, read-only mounts
+                        fscache
+				use FS-Cache for a persistent, read-only
+				cache backend.
+                        mmap
+				minimal cache that is only used for read-write
+                                mmap.  Nothing else is cached, like cache=none
+
+  debug=n	specifies debug level.  The debug level is a bitmask.
+
+			=====   ================================
+			0x01    display verbose error messages
+			0x02    developer debug (DEBUG_CURRENT)
+			0x04    display 9p trace
+			0x08    display VFS trace
+			0x10    display Marshalling debug
+			0x20    display RPC debug
+			0x40    display transport debug
+			0x80    display allocation debug
+			0x100   display protocol message debug
+			0x200   display Fid debug
+			0x400   display packet debug
+			0x800   display fscache tracing debug
+			=====   ================================
+
+  rfdno=n	the file descriptor for reading with trans=fd
+
+  wfdno=n	the file descriptor for writing with trans=fd
+
+  msize=n	the number of bytes to use for 9p packet payload
+
+  port=n	port to connect to on the remote server
+
+  noextend	force legacy mode (no 9p2000.u or 9p2000.L semantics)
+
+  version=name	Select 9P protocol version. Valid options are:
+
+			========        ==============================
+			9p2000          Legacy mode (same as noextend)
+			9p2000.u        Use 9P2000.u protocol
+			9p2000.L        Use 9P2000.L protocol
+			========        ==============================
+
+  dfltuid	attempt to mount as a particular uid
+
+  dfltgid	attempt to mount with a particular gid
+
+  afid		security channel - used by Plan 9 authentication protocols
+
+  nodevmap	do not map special files - represent them as normal files.
+  		This can be used to share devices/named pipes/sockets between
+		hosts.  This functionality will be expanded in later versions.
+
+  access	there are four access modes.
+			user
+				if a user tries to access a file on v9fs
+			        filesystem for the first time, v9fs sends an
+			        attach command (Tattach) for that user.
+				This is the default mode.
+			<uid>
+				allows only user with uid=<uid> to access
+				the files on the mounted filesystem
+			any
+				v9fs does single attach and performs all
+				operations as one user
+			client
+				 ACL based access check on the 9p client
+			         side for access validation
+
+  cachetag	cache tag to use the specified persistent cache.
+		cache tags for existing cache sessions can be listed at
+		/sys/fs/9p/caches. (applies only to cache=fscache)
+  ============= ===============================================================
+
+Resources
+=========
+
+Protocol specifications are maintained on github:
+http://ericvh.github.com/9p-rfc/
+
+9p client and server implementations are listed on
+http://9p.cat-v.org/implementations
+
+A 9p2000.L server is being developed by LLNL and can be found
+at http://code.google.com/p/diod/
+
+There are user and developer mailing lists available through the v9fs project
+on sourceforge (http://sourceforge.net/projects/v9fs).
+
+News and other information is maintained on a Wiki
+(http://sf.net/apps/mediawiki/v9fs/index.php).
+
+Bug reports are best issued via the mailing list.
+
+For more information on the Plan 9 Operating System check out
+http://plan9.bell-labs.com/plan9
+
+For information on Plan 9 from User Space (Plan 9 applications and libraries
+ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
deleted file mode 100644
index fec7144e817c..000000000000
--- a/Documentation/filesystems/9p.txt
+++ /dev/null
@@ -1,161 +0,0 @@
-	  	    v9fs: Plan 9 Resource Sharing for Linux
-		    =======================================
-
-ABOUT
-=====
-
-v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
-
-This software was originally developed by Ron Minnich <rminnich@sandia.gov>
-and Maya Gokhale.  Additional development by Greg Watson
-<gwatson@lanl.gov> and most recently Eric Van Hensbergen
-<ericvh@gmail.com>, Latchesar Ionkov <lucho@ionkov.net> and Russ Cox
-<rsc@swtch.com>.
-
-The best detailed explanation of the Linux implementation and applications of
-the 9p client is available in the form of a USENIX paper:
-   http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html
-
-Other applications are described in the following papers:
-	* XCPU & Clustering
-		http://xcpu.org/papers/xcpu-talk.pdf
-	* KVMFS: control file system for KVM
-		http://xcpu.org/papers/kvmfs.pdf
-	* CellFS: A New Programming Model for the Cell BE
-		http://xcpu.org/papers/cellfs-talk.pdf
-	* PROSE I/O: Using 9p to enable Application Partitions
-		http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
-	* VirtFS: A Virtualization Aware File System pass-through
-		http://goo.gl/3WPDg
-
-USAGE
-=====
-
-For remote file server:
-
-	mount -t 9p 10.10.1.2 /mnt/9
-
-For Plan 9 From User Space applications (http://swtch.com/plan9)
-
-	mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER
-
-For server running on QEMU host with virtio transport:
-
-	mount -t 9p -o trans=virtio <mount_tag> /mnt/9
-
-where mount_tag is the tag associated by the server to each of the exported
-mount points. Each 9P export is seen by the client as a virtio device with an
-associated "mount_tag" property. Available mount tags can be
-seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.
-
-OPTIONS
-=======
-
-  trans=name	select an alternative transport.  Valid options are
-  		currently:
-			unix 	- specifying a named pipe mount point
-			tcp	- specifying a normal TCP/IP connection
-			fd   	- used passed file descriptors for connection
-                                (see rfdno and wfdno)
-			virtio	- connect to the next virtio channel available
-				(from QEMU with trans_virtio module)
-			rdma	- connect to a specified RDMA channel
-
-  uname=name	user name to attempt mount as on the remote server.  The
-  		server may override or ignore this value.  Certain user
-		names may require authentication.
-
-  aname=name	aname specifies the file tree to access when the server is
-  		offering several exported file systems.
-
-  cache=mode	specifies a caching policy.  By default, no caches are used.
-                        none = default no cache policy, metadata and data
-                                alike are synchronous.
-			loose = no attempts are made at consistency,
-                                intended for exclusive, read-only mounts
-                        fscache = use FS-Cache for a persistent, read-only
-				cache backend.
-                        mmap = minimal cache that is only used for read-write
-                                mmap.  Northing else is cached, like cache=none
-
-  debug=n	specifies debug level.  The debug level is a bitmask.
-			0x01  = display verbose error messages
-			0x02  = developer debug (DEBUG_CURRENT)
-			0x04  = display 9p trace
-			0x08  = display VFS trace
-			0x10  = display Marshalling debug
-			0x20  = display RPC debug
-			0x40  = display transport debug
-			0x80  = display allocation debug
-			0x100 = display protocol message debug
-			0x200 = display Fid debug
-			0x400 = display packet debug
-			0x800 = display fscache tracing debug
-
-  rfdno=n	the file descriptor for reading with trans=fd
-
-  wfdno=n	the file descriptor for writing with trans=fd
-
-  msize=n	the number of bytes to use for 9p packet payload
-
-  port=n	port to connect to on the remote server
-
-  noextend	force legacy mode (no 9p2000.u or 9p2000.L semantics)
-
-  version=name	Select 9P protocol version. Valid options are:
-			9p2000          - Legacy mode (same as noextend)
-			9p2000.u        - Use 9P2000.u protocol
-			9p2000.L        - Use 9P2000.L protocol
-
-  dfltuid	attempt to mount as a particular uid
-
-  dfltgid	attempt to mount with a particular gid
-
-  afid		security channel - used by Plan 9 authentication protocols
-
-  nodevmap	do not map special files - represent them as normal files.
-  		This can be used to share devices/named pipes/sockets between
-		hosts.  This functionality will be expanded in later versions.
-
-  access	there are four access modes.
-			user  = if a user tries to access a file on v9fs
-			        filesystem for the first time, v9fs sends an
-			        attach command (Tattach) for that user.
-				This is the default mode.
-			<uid> = allows only user with uid=<uid> to access
-				the files on the mounted filesystem
-			any   = v9fs does single attach and performs all
-				operations as one user
-			client = ACL based access check on the 9p client
-			         side for access validation
-
-  cachetag	cache tag to use the specified persistent cache.
-		cache tags for existing cache sessions can be listed at
-		/sys/fs/9p/caches. (applies only to cache=fscache)
-
-RESOURCES
-=========
-
-Protocol specifications are maintained on github:
-http://ericvh.github.com/9p-rfc/
-
-9p client and server implementations are listed on
-http://9p.cat-v.org/implementations
-
-A 9p2000.L server is being developed by LLNL and can be found
-at http://code.google.com/p/diod/
-
-There are user and developer mailing lists available through the v9fs project
-on sourceforge (http://sourceforge.net/projects/v9fs).
-
-News and other information is maintained on a Wiki.
-(http://sf.net/apps/mediawiki/v9fs/index.php).
-
-Bug reports are best issued via the mailing list.
-
-For more information on the Plan 9 Operating System check out
-http://plan9.bell-labs.com/plan9
-
-For information on Plan 9 from User Space (Plan 9 applications and libraries
-ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 45d791905e91..a9330c3f8c2e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -46,6 +46,7 @@ Documentation for filesystem implementations.
 .. toctree::
    :maxdepth: 2
 
+   9p
    autofs
    fuse
    overlayfs
-- 
cgit 
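
Note on the debug= option documented in 9p.rst above: the level is a
bitmask, so several trace classes are enabled at once by OR-ing their
values. A minimal Python sketch of that arithmetic follows. The values are
taken from the table in the patch; the short flag names are labels invented
for this sketch, not kernel identifiers, and writing the result with a 0x
prefix simply mirrors the table's notation.

```python
# Values come from the debug= table in 9p.rst. The dictionary keys are
# illustrative labels chosen for this sketch, not kernel identifiers.
DEBUG_FLAGS = {
    "error": 0x01,      # verbose error messages
    "current": 0x02,    # developer debug (DEBUG_CURRENT)
    "9p": 0x04,         # 9p trace
    "vfs": 0x08,        # VFS trace
    "marshal": 0x10,    # marshalling debug
    "rpc": 0x20,        # RPC debug
    "transport": 0x40,  # transport debug
    "alloc": 0x80,      # allocation debug
    "protocol": 0x100,  # protocol message debug
    "fid": 0x200,       # fid debug
    "packet": 0x400,    # packet debug
    "fscache": 0x800,   # fscache tracing debug
}


def debug_level(*names):
    """OR the selected flags together into a single debug level."""
    level = 0
    for name in names:
        level |= DEBUG_FLAGS[name]
    return level


def debug_option(*names):
    """Format the combined level as a mount option string."""
    return "debug=" + hex(debug_level(*names))
```

For example, debug_option("error", "rpc") combines 0x01 and 0x20 into
"debug=0x21", which could then be appended to the -o option list of a
mount -t 9p command.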


From 348739003d4f7e777ef935a44a91e7494f8ab786 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:48 +0100
Subject: docs: filesystems: convert adfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/15ee92f03ec917e5d26bd7b863565dec88c843f6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/adfs.rst  | 108 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/adfs.txt  |  99 ---------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 109 insertions(+), 99 deletions(-)
 create mode 100644 Documentation/filesystems/adfs.rst
 delete mode 100644 Documentation/filesystems/adfs.txt

diff --git a/Documentation/filesystems/adfs.rst b/Documentation/filesystems/adfs.rst
new file mode 100644
index 000000000000..5b22cae38e5e
--- /dev/null
+++ b/Documentation/filesystems/adfs.rst
@@ -0,0 +1,108 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Acorn Disc Filing System - ADFS
+===============================
+
+Filesystems supported by ADFS
+-----------------------------
+
+The ADFS module supports the following Filecore formats which have:
+
+- new maps
+- new directories or big directories
+
+In terms of the named formats, this means we support:
+
+- E and E+, with or without boot block
+- F and F+
+
+We fully support reading files from these filesystems, and writing to
+existing files within their existing allocation.  Essentially, we do
+not support changing any of the filesystem metadata.
+
+This is intended to support loopback mounted Linux native filesystems
+on a RISC OS Filecore filesystem, but will allow the data within files
+to be changed.
+
+If write support (ADFS_FS_RW) is configured, we allow rudimentary
+directory updates, specifically updating the access mode and timestamp.
+
+Mount options for ADFS
+----------------------
+
+  ============  ======================================================
+  uid=nnn	All files in the partition will be owned by
+		user id nnn.  Default 0 (root).
+  gid=nnn	All files in the partition will be in group
+		nnn.  Default 0 (root).
+  ownmask=nnn	The permission mask for ADFS 'owner' permissions
+		will be nnn.  Default 0700.
+  othmask=nnn	The permission mask for ADFS 'other' permissions
+		will be nnn.  Default 0077.
+  ftsuffix=n	When ftsuffix=0, no file type suffix will be applied.
+		When ftsuffix=1, a hexadecimal suffix corresponding to
+		the RISC OS file type will be added.  Default 0.
+  ============  ======================================================
+
+Mapping of ADFS permissions to Linux permissions
+------------------------------------------------
+
+  ADFS permissions consist of the following:
+
+	- Owner read
+	- Owner write
+	- Other read
+	- Other write
+
+  (In older versions, an 'execute' permission did exist, but this
+  does not hold the same meaning as the Linux 'execute' permission
+  and is now obsolete).
+
+  The mapping is performed as follows::
+
+	Owner read				-> -r--r--r--
+	Owner write				-> --w--w--w-
+	Owner read and filetype UnixExec	-> ---x--x--x
+    These are then masked by ownmask, eg 700	-> -rwx------
+	Possible owner mode permissions		-> -rwx------
+
+	Other read				-> -r--r--r--
+	Other write				-> --w--w--w-
+	Other read and filetype UnixExec	-> ---x--x--x
+    These are then masked by othmask, eg 077	-> ----rwxrwx
+	Possible other mode permissions		-> ----rwxrwx
+
+  Hence, with the default masks, if a file is owner read/write, and
+  not a UnixExec filetype, then the permissions will be::
+
+			-rw-------
+
+  However, if the masks were ownmask=0770,othmask=0007, then this would
+  be modified to::
+
+			-rw-rw----
+
+  There is no restriction on what you can do with these masks.  You may
+  wish that either read bits give read access to the file for all, but
+  keep the default write protection (ownmask=0755,othmask=0577)::
+
+			-rw-r--r--
+
+  You can therefore tailor the permission translation to whatever you
+  desire the permissions should be under Linux.
+
+RISC OS file type suffix
+------------------------
+
+  RISC OS file types are stored in bits 19..8 of the file load address.
+
+  To enable non-RISC OS systems to be used to store files without losing
+  file type information, a file naming convention was devised (initially
+  for use with NFS) such that a hexadecimal suffix of the form ,xyz
+  denoted the file type: e.g. BasicFile,ffb is a BASIC (0xffb) file.  This
+  naming convention is now also used by RISC OS emulators such as RPCEmu.
+
+  Mounting an ADFS disc with option ftsuffix=1 will cause appropriate file
+  type suffixes to be appended to file names read from a directory.  If the
+  ftsuffix option is zero or omitted, no file type suffixes will be added.
diff --git a/Documentation/filesystems/adfs.txt b/Documentation/filesystems/adfs.txt
deleted file mode 100644
index 0baa8e8c1fc1..000000000000
--- a/Documentation/filesystems/adfs.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-Filesystems supported by ADFS
------------------------------
-
-The ADFS module supports the following Filecore formats which have:
-
-- new maps
-- new directories or big directories
-
-In terms of the named formats, this means we support:
-
-- E and E+, with or without boot block
-- F and F+
-
-We fully support reading files from these filesystems, and writing to
-existing files within their existing allocation.  Essentially, we do
-not support changing any of the filesystem metadata.
-
-This is intended to support loopback mounted Linux native filesystems
-on a RISC OS Filecore filesystem, but will allow the data within files
-to be changed.
-
-If write support (ADFS_FS_RW) is configured, we allow rudimentary
-directory updates, specifically updating the access mode and timestamp.
-
-Mount options for ADFS
-----------------------
-
-  uid=nnn	All files in the partition will be owned by
-		user id nnn.  Default 0 (root).
-  gid=nnn	All files in the partition will be in group
-		nnn.  Default 0 (root).
-  ownmask=nnn	The permission mask for ADFS 'owner' permissions
-		will be nnn.  Default 0700.
-  othmask=nnn	The permission mask for ADFS 'other' permissions
-		will be nnn.  Default 0077.
-  ftsuffix=n	When ftsuffix=0, no file type suffix will be applied.
-		When ftsuffix=1, a hexadecimal suffix corresponding to
-		the RISC OS file type will be added.  Default 0.
-
-Mapping of ADFS permissions to Linux permissions
-------------------------------------------------
-
-  ADFS permissions consist of the following:
-
-	Owner read
-	Owner write
-	Other read
-	Other write
-
-  (In older versions, an 'execute' permission did exist, but this
-   does not hold the same meaning as the Linux 'execute' permission
-   and is now obsolete).
-
-  The mapping is performed as follows:
-
-	Owner read				-> -r--r--r--
-	Owner write				-> --w--w---w
-	Owner read and filetype UnixExec	-> ---x--x--x
-    These are then masked by ownmask, eg 700	-> -rwx------
-	Possible owner mode permissions		-> -rwx------
-
-	Other read				-> -r--r--r--
-	Other write				-> --w--w--w-
-	Other read and filetype UnixExec	-> ---x--x--x
-    These are then masked by othmask, eg 077	-> ----rwxrwx
-	Possible other mode permissions		-> ----rwxrwx
-
-  Hence, with the default masks, if a file is owner read/write, and
-  not a UnixExec filetype, then the permissions will be:
-
-			-rw-------
-
-  However, if the masks were ownmask=0770,othmask=0007, then this would
-  be modified to:
-			-rw-rw----
-
-  There is no restriction on what you can do with these masks.  You may
-  wish that either read bits give read access to the file for all, but
-  keep the default write protection (ownmask=0755,othmask=0577):
-
-			-rw-r--r--
-
-  You can therefore tailor the permission translation to whatever you
-  desire the permissions should be under Linux.
-
-RISC OS file type suffix
-------------------------
-
-  RISC OS file types are stored in bits 19..8 of the file load address.
-
-  To enable non-RISC OS systems to be used to store files without losing
-  file type information, a file naming convention was devised (initially
-  for use with NFS) such that a hexadecimal suffix of the form ,xyz
-  denoted the file type: e.g. BasicFile,ffb is a BASIC (0xffb) file.  This
-  naming convention is now also used by RISC OS emulators such as RPCEmu.
-
-  Mounting an ADFS disc with option ftsuffix=1 will cause appropriate file
-  type suffixes to be appended to file names read from a directory.  If the
-  ftsuffix option is zero or omitted, no file type suffixes will be added.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index a9330c3f8c2e..14dc89c94822 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -47,6 +47,7 @@ Documentation for filesystem implementations.
    :maxdepth: 2
 
    9p
+   adfs
    autofs
    fuse
    overlayfs
-- 
cgit 
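
Note on the permission mapping described in adfs.rst above: each ADFS
attribute is first expanded to the same bit in all three Linux classes, and
the owner and other results are then masked by ownmask and othmask. The
Python sketch below follows that description only; it is not derived from
the kernel source, and the function and parameter names are hypothetical.
It assumes the final mode is the OR of the two masked results.

```python
import stat


def adfs_mode(owner_read, owner_write, other_read, other_write,
              unix_exec=False, ownmask=0o700, othmask=0o077):
    """Map ADFS attributes to Linux permission bits as described in adfs.rst."""
    owner_bits = 0
    if owner_read:
        owner_bits |= 0o444          # -r--r--r--
        if unix_exec:
            owner_bits |= 0o111      # ---x--x--x
    if owner_write:
        owner_bits |= 0o222          # --w--w--w-

    other_bits = 0
    if other_read:
        other_bits |= 0o444
        if unix_exec:
            other_bits |= 0o111
    if other_write:
        other_bits |= 0o222

    # Each group is masked separately, then the results are combined
    # (the combination by OR is an assumption of this sketch).
    return (owner_bits & ownmask) | (other_bits & othmask)


def mode_string(mode):
    """Render permission bits in ls -l style for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)
```

With the default masks, a file that is owner read/write and not UnixExec
gives mode_string(adfs_mode(True, True, False, False)) == "-rw-------",
matching the first example in the document. Passing ownmask=0o770 and
othmask=0o007 gives "-rw-rw----", and ownmask=0o755 with othmask=0o577
gives "-rw-r--r--".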


From 7627216830d808572fff8225964e9209249ba196 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:49 +0100
Subject: docs: filesystems: convert affs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Add table markups;
- Mark literal blocks as such;
- Some whitespace fixes and new line breaks;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Link: https://lore.kernel.org/r/b44c56befe0e28cbc0eb1b3e281ad7d99737ff16.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/affs.rst  | 246 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/affs.txt  | 222 --------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 247 insertions(+), 222 deletions(-)
 create mode 100644 Documentation/filesystems/affs.rst
 delete mode 100644 Documentation/filesystems/affs.txt

diff --git a/Documentation/filesystems/affs.rst b/Documentation/filesystems/affs.rst
new file mode 100644
index 000000000000..7f1a40dce6d3
--- /dev/null
+++ b/Documentation/filesystems/affs.rst
@@ -0,0 +1,246 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================
+Overview of Amiga Filesystems
+=============================
+
+Not all varieties of the Amiga filesystems are supported for reading and
+writing. The Amiga currently knows six different filesystems:
+
+==============	===============================================================
+DOS\0		The old or original filesystem, not really suited for
+		hard disks and normally not used on them, either.
+		Supported read/write.
+
+DOS\1		The original Fast File System. Supported read/write.
+
+DOS\2		The old "international" filesystem. International means that
+		a bug has been fixed so that accented ("international") letters
+		in file names are case-insensitive, as they ought to be.
+		Supported read/write.
+
+DOS\3		The "international" Fast File System.  Supported read/write.
+
+DOS\4		The original filesystem with directory cache. The directory
+		cache speeds up directory accesses on floppies considerably,
+		but slows down file creation/deletion. Doesn't make much
+		sense on hard disks. Supported read only.
+
+DOS\5		The Fast File System with directory cache. Supported read only.
+==============	===============================================================
+
+All of the above filesystems allow block sizes from 512 to 32K bytes.
+Supported block sizes are: 512, 1024, 2048 and 4096 bytes. Larger blocks
+speed up almost everything at the expense of wasted disk space. The speed
+gain above 4K seems not really worth the price, so you don't lose too
+much here, either.
+
+The muFS (multi user File System) equivalents of the above file systems
+are supported, too.
+
+Mount options for the AFFS
+==========================
+
+protect
+		If this option is set, the protection bits cannot be altered.
+
+setuid[=uid]
+		This sets the owner of all files and directories in the file
+		system to uid or the uid of the current user, respectively.
+
+setgid[=gid]
+		Same as above, but for gid.
+
+mode=mode
+		Sets the mode flags to the given (octal) value, regardless
+		of the original permissions. Directories will get an x
+		permission if the corresponding r bit is set.
+		This is useful since most of the plain AmigaOS files
+		will map to 600.
+
+nofilenametruncate
+		The file system will return an error when a filename exceeds
+		the standard maximum filename length (30 characters).
+
+reserved=num
+		Sets the number of reserved blocks at the start of the
+		partition to num. You should never need this option.
+		Default is 2.
+
+root=block
+		Sets the block number of the root block. This should never
+		be necessary.
+
+bs=blksize
+		Sets the blocksize to blksize. Valid block sizes are 512,
+		1024, 2048 and 4096. Like the root option, this should
+		never be necessary, as the affs can figure it out itself.
+
+quiet
+		The file system will not return an error for disallowed
+		mode changes.
+
+verbose
+		The volume name, file system type and block size will
+		be written to the syslog when the filesystem is mounted.
+
+mufs
+		The filesystem is really a muFS, although it doesn't
+		identify itself as one. This option is necessary if
+		the filesystem wasn't formatted as muFS, but is used
+		as one.
+
+prefix=path
+		Path will be prefixed to every absolute path name of
+		symbolic links on an AFFS partition. Default = "/".
+		(See below.)
+
+volume=name
+		When symbolic links with an absolute path are created
+		on an AFFS partition, name will be prepended as the
+		volume name. Default = "" (empty string).
+		(See below.)
+
+Handling of the Users/Groups and protection flags
+=================================================
+
+Amiga -> Linux:
+
+The Amiga protection flags RWEDRWEDHSPARWED are handled as follows:
+
+  - R maps to r for user, group and others. On directories, R implies x.
+
+  - If both W and D are allowed, w will be set.
+
+  - E maps to x.
+
+  - H and P are always retained and ignored under Linux.
+
+  - A is always reset when a file is written to.
+
+User id and group id will be used unless set[gu]id are given as mount
+options. Since most of the Amiga file systems are single user systems
+they will be owned by root. The root directory (the mount point) of the
+Amiga filesystem will be owned by the user who actually mounts the
+filesystem (the root directory doesn't have uid/gid fields).
+
+Linux -> Amiga:
+
+The Linux rwxrwxrwx file mode is handled as follows:
+
+  - r permission will set R for user, group and others.
+
+  - w permission will set W and D for user, group and others.
+
+  - x permission of the user will set E for plain files.
+
+  - All other flags (suid, sgid, ...) are ignored and will
+    not be retained.
+
+Newly created files and directories will get the user and group ID
+of the current user and a mode according to the umask.
+
+Symbolic links
+==============
+
+Although the Amiga and Linux file systems resemble each other, there
+are some, not always subtle, differences. One of them becomes apparent
+with symbolic links. While Linux has a file system with exactly one
+root directory, the Amiga has a separate root directory for each
+file system (for example, partition, floppy disk, ...). With the Amiga,
+these entities are called "volumes". They have symbolic names which
+can be used to access them. Thus, symbolic links can point to a
+different volume. AFFS turns the volume name into a directory name
+and prepends the prefix path (see prefix option) to it.
+
+Example:
+You mount all your Amiga partitions under /amiga/<volume> (where
+<volume> is the name of the volume), and you give the option
+"prefix=/amiga/" when mounting all your AFFS partitions. (They
+might be "User", "WB" and "Graphics", the mount points /amiga/User,
+/amiga/WB and /amiga/Graphics). A symbolic link referring to
+"User:sc/include/dos/dos.h" will be followed to
+"/amiga/User/sc/include/dos/dos.h".
+
+Examples
+========
+
+Command line::
+
+    mount  Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
+    mount  /dev/sda3 /Amiga -t affs
+
+/etc/fstab entry::
+
+    /dev/sdb5	/amiga/Workbench    affs    noauto,user,exec,verbose 0 0
+
+IMPORTANT NOTE
+==============
+
+If you boot Windows 95 (don't know about 3.x, 98 and NT) while you
+have an Amiga harddisk connected to your PC, it will overwrite
+the bytes 0x00dc..0x00df of block 0 with garbage, thus invalidating
+the Rigid Disk Block. Sheer luck has it that this is an unused
+area of the RDB, so only the checksum doesn't match anymore.
+Linux will ignore this garbage and recognize the RDB anyway, but
+before you connect that drive to your Amiga again, you must
+restore or repair your RDB. So please do make a backup copy of it
+before booting Windows!
+
+If the damage is already done, the following should fix the RDB
+(where <disk> is the device name).
+
+DO AT YOUR OWN RISK::
+
+  dd if=/dev/<disk> of=rdb.tmp count=1
+  cp rdb.tmp rdb.fixed
+  dd if=/dev/zero of=rdb.fixed bs=1 seek=220 count=4
+  dd if=rdb.fixed of=/dev/<disk>
+
+Bugs, Restrictions, Caveats
+===========================
+
+Quite a few things may not work as advertised. Not everything is
+tested, though several hundred MB have been read and written using
+this fs. For the most up-to-date list of bugs please consult
+fs/affs/Changes.
+
+By default, filenames are truncated to 30 characters without warning.
+The 'nofilenametruncate' mount option can change that behavior.
+
+Case is ignored by the affs in filename matching, but Linux shells
+do care about the case. Example (with /wb being an affs mounted fs)::
+
+    rm /wb/WRONGCASE
+
+will remove /wb/wrongcase, but::
+
+    rm /wb/WR*
+
+will not since the names are matched by the shell.
+
+The block allocation is designed for hard disk partitions. If more
+than 1 process writes to a (small) diskette, the blocks are allocated
+in an ugly way (but the real AFFS doesn't do much better). This
+is also true when space gets tight.
+
+You cannot execute programs on an OFS (Old File System), since the
+program files cannot be memory mapped due to the 488 byte blocks.
+For the same reason you cannot mount an image on such a filesystem
+via the loopback device.
+
+The bitmap valid flag in the root block may not be accurate when the
+system crashes while an affs partition is mounted. There's currently
+no way to fix a garbled filesystem without an Amiga (disk validator)
+or manually (who would do this?). Maybe later.
+
+If you mount affs partitions on system startup, you may want to tell
+fsck that the fs should not be checked (place a '0' in the sixth field
+of /etc/fstab).
+
+It's not possible to read floppy disks with a normal PC or workstation
+due to an incompatibility with the Amiga floppy controller.
+
+If you are interested in an Amiga Emulator for Linux, look at
+
+http://web.archive.org/web/%2E/http://www.freiburg.linux.de/~uae/
diff --git a/Documentation/filesystems/affs.txt b/Documentation/filesystems/affs.txt
deleted file mode 100644
index 71b63c2b9841..000000000000
--- a/Documentation/filesystems/affs.txt
+++ /dev/null
@@ -1,222 +0,0 @@
-Overview of Amiga Filesystems
-=============================
-
-Not all varieties of the Amiga filesystems are supported for reading and
-writing. The Amiga currently knows six different filesystems:
-
-DOS\0		The old or original filesystem, not really suited for
-		hard disks and normally not used on them, either.
-		Supported read/write.
-
-DOS\1		The original Fast File System. Supported read/write.
-
-DOS\2		The old "international" filesystem. International means that
-		a bug has been fixed so that accented ("international") letters
-		in file names are case-insensitive, as they ought to be.
-		Supported read/write.
-
-DOS\3		The "international" Fast File System.  Supported read/write.
-
-DOS\4		The original filesystem with directory cache. The directory
-		cache speeds up directory accesses on floppies considerably,
-		but slows down file creation/deletion. Doesn't make much
-		sense on hard disks. Supported read only.
-
-DOS\5		The Fast File System with directory cache. Supported read only.
-
-All of the above filesystems allow block sizes from 512 to 32K bytes.
-Supported block sizes are: 512, 1024, 2048 and 4096 bytes. Larger blocks
-speed up almost everything at the expense of wasted disk space. The speed
-gain above 4K seems not really worth the price, so you don't lose too
-much here, either.
-
-The muFS (multi user File System) equivalents of the above file systems
-are supported, too.
-
-Mount options for the AFFS
-==========================
-
-protect		If this option is set, the protection bits cannot be altered.
-
-setuid[=uid]	This sets the owner of all files and directories in the file
-		system to uid or the uid of the current user, respectively.
-
-setgid[=gid]	Same as above, but for gid.
-
-mode=mode	Sets the mode flags to the given (octal) value, regardless
-		of the original permissions. Directories will get an x
-		permission if the corresponding r bit is set.
-		This is useful since most of the plain AmigaOS files
-		will map to 600.
-
-nofilenametruncate
-		The file system will return an error when filename exceeds
-		standard maximum filename length (30 characters).
-
-reserved=num	Sets the number of reserved blocks at the start of the
-		partition to num. You should never need this option.
-		Default is 2.
-
-root=block	Sets the block number of the root block. This should never
-		be necessary.
-
-bs=blksize	Sets the blocksize to blksize. Valid block sizes are 512,
-		1024, 2048 and 4096. Like the root option, this should
-		never be necessary, as the affs can figure it out itself.
-
-quiet		The file system will not return an error for disallowed
-		mode changes.
-
-verbose		The volume name, file system type and block size will
-		be written to the syslog when the filesystem is mounted.
-
-mufs		The filesystem is really a muFS, also it doesn't
-		identify itself as one. This option is necessary if
-		the filesystem wasn't formatted as muFS, but is used
-		as one.
-
-prefix=path	Path will be prefixed to every absolute path name of
-		symbolic links on an AFFS partition. Default = "/".
-		(See below.)
-
-volume=name	When symbolic links with an absolute path are created
-		on an AFFS partition, name will be prepended as the
-		volume name. Default = "" (empty string).
-		(See below.)
-
-Handling of the Users/Groups and protection flags
-=================================================
-
-Amiga -> Linux:
-
-The Amiga protection flags RWEDRWEDHSPARWED are handled as follows:
-
-  - R maps to r for user, group and others. On directories, R implies x.
-
-  - If both W and D are allowed, w will be set.
-
-  - E maps to x.
-
-  - H and P are always retained and ignored under Linux.
-
-  - A is always reset when a file is written to.
-
-User id and group id will be used unless set[gu]id are given as mount
-options. Since most of the Amiga file systems are single user systems
-they will be owned by root. The root directory (the mount point) of the
-Amiga filesystem will be owned by the user who actually mounts the
-filesystem (the root directory doesn't have uid/gid fields).
-
-Linux -> Amiga:
-
-The Linux rwxrwxrwx file mode is handled as follows:
-
-  - r permission will set R for user, group and others.
-
-  - w permission will set W and D for user, group and others.
-
-  - x permission of the user will set E for plain files.
-
-  - All other flags (suid, sgid, ...) are ignored and will
-    not be retained.
-    
-Newly created files and directories will get the user and group ID
-of the current user and a mode according to the umask.
-
-Symbolic links
-==============
-
-Although the Amiga and Linux file systems resemble each other, there
-are some, not always subtle, differences. One of them becomes apparent
-with symbolic links. While Linux has a file system with exactly one
-root directory, the Amiga has a separate root directory for each
-file system (for example, partition, floppy disk, ...). With the Amiga,
-these entities are called "volumes". They have symbolic names which
-can be used to access them. Thus, symbolic links can point to a
-different volume. AFFS turns the volume name into a directory name
-and prepends the prefix path (see prefix option) to it.
-
-Example:
-You mount all your Amiga partitions under /amiga/<volume> (where
-<volume> is the name of the volume), and you give the option
-"prefix=/amiga/" when mounting all your AFFS partitions. (They
-might be "User", "WB" and "Graphics", the mount points /amiga/User,
-/amiga/WB and /amiga/Graphics). A symbolic link referring to
-"User:sc/include/dos/dos.h" will be followed to
-"/amiga/User/sc/include/dos/dos.h".
-
-Examples
-========
-
-Command line:
-    mount  Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
-    mount  /dev/sda3 /Amiga -t affs
-
-/etc/fstab entry:
-    /dev/sdb5	/amiga/Workbench    affs    noauto,user,exec,verbose 0 0
-
-IMPORTANT NOTE
-==============
-
-If you boot Windows 95 (don't know about 3.x, 98 and NT) while you
-have an Amiga harddisk connected to your PC, it will overwrite
-the bytes 0x00dc..0x00df of block 0 with garbage, thus invalidating
-the Rigid Disk Block. Sheer luck has it that this is an unused
-area of the RDB, so only the checksum doesn't match anymore.
-Linux will ignore this garbage and recognize the RDB anyway, but
-before you connect that drive to your Amiga again, you must
-restore or repair your RDB. So please do make a backup copy of it
-before booting Windows!
-
-If the damage is already done, the following should fix the RDB
-(where <disk> is the device name).
-DO AT YOUR OWN RISK:
-
-  dd if=/dev/<disk> of=rdb.tmp count=1
-  cp rdb.tmp rdb.fixed
-  dd if=/dev/zero of=rdb.fixed bs=1 seek=220 count=4
-  dd if=rdb.fixed of=/dev/<disk>
-
-Bugs, Restrictions, Caveats
-===========================
-
-Quite a few things may not work as advertised. Not everything is
-tested, though several hundred MB have been read and written using
-this fs. For a most up-to-date list of bugs please consult
-fs/affs/Changes.
-
-By default, filenames are truncated to 30 characters without warning.
-'nofilenametruncate' mount option can change that behavior.
-
-Case is ignored by the affs in filename matching, but Linux shells
-do care about the case. Example (with /wb being an affs mounted fs):
-    rm /wb/WRONGCASE
-will remove /mnt/wrongcase, but
-    rm /wb/WR*
-will not since the names are matched by the shell.
-
-The block allocation is designed for hard disk partitions. If more
-than 1 process writes to a (small) diskette, the blocks are allocated
-in an ugly way (but the real AFFS doesn't do much better). This
-is also true when space gets tight.
-
-You cannot execute programs on an OFS (Old File System), since the
-program files cannot be memory mapped due to the 488 byte blocks.
-For the same reason you cannot mount an image on such a filesystem
-via the loopback device.
-
-The bitmap valid flag in the root block may not be accurate when the
-system crashes while an affs partition is mounted. There's currently
-no way to fix a garbled filesystem without an Amiga (disk validator)
-or manually (who would do this?). Maybe later.
-
-If you mount affs partitions on system startup, you may want to tell
-fsck that the fs should not be checked (place a '0' in the sixth field
-of /etc/fstab).
-
-It's not possible to read floppy disks with a normal PC or workstation
-due to an incompatibility with the Amiga floppy controller.
-
-If you are interested in an Amiga Emulator for Linux, look at
-
-http://web.archive.org/web/*/http://www.freiburg.linux.de/~uae/
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 14dc89c94822..273d802ad5fb 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -48,6 +48,7 @@ Documentation for filesystem implementations.
 
    9p
    adfs
+   affs
    autofs
    fuse
    overlayfs
-- 
cgit 


From ca6e9049a0934fe72ffea6990c889205aff0a2cf Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:50 +0100
Subject: docs: filesystems: convert afs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Comment out text-only ToC;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/d77f5afdb5da0f8b0ec3dbe720aef23f1ce73bb5.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/afs.rst   | 251 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/afs.txt   | 258 ------------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 252 insertions(+), 258 deletions(-)
 create mode 100644 Documentation/filesystems/afs.rst
 delete mode 100644 Documentation/filesystems/afs.txt
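
Note on the volume-name syntax described in the Usage section of afs.rst:
the following is only a minimal sketch of how such a "device name" could
be split into its parts. It is not the kernel's parser. The helper name
parse_afs_device() and the AfsDevice type are hypothetical, and treating a
single trailing dot as a plain terminator is an assumption taken from the
examples in the text.

```python
from typing import NamedTuple, Optional


class AfsDevice(NamedTuple):
    want_rw: bool            # True for '%' (R/W wanted), False for '#'
    cell: Optional[str]      # None: fall back to the cell given at modprobe
    volume: str              # volume name without any type suffix
    suffix: Optional[str]    # "backup", "readonly" or None


def parse_afs_device(name: str) -> AfsDevice:
    """Split a name such as "%cambridge.redhat.com:root.afs." (hypothetical helper)."""
    if not name or name[0] not in "%#":
        raise ValueError("device name must start with '%' or '#'")
    want_rw = name[0] == "%"
    rest = name[1:]
    # Assumption: one trailing dot only terminates the name, as in the examples.
    if rest.endswith("."):
        rest = rest[:-1]
    cell, sep, volume = rest.partition(":")
    if not sep:
        # No colon present: the whole string is the volume name.
        cell, volume = "", rest
    suffix = None
    for candidate in ("backup", "readonly"):
        if volume.endswith("." + candidate):
            volume = volume[: -len(candidate) - 1]
            suffix = candidate
            break
    if not volume:
        raise ValueError("missing volume name")
    return AfsDevice(want_rw, cell or None, volume, suffix)
```

For instance, "#root.cell." would yield no cell (so the root cell applies)
and the volume "root.cell", with a R/O volume preferred.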

diff --git a/Documentation/filesystems/afs.rst b/Documentation/filesystems/afs.rst
new file mode 100644
index 000000000000..c4ec39a5966e
--- /dev/null
+++ b/Documentation/filesystems/afs.rst
@@ -0,0 +1,251 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+kAFS: AFS FILESYSTEM
+====================
+
+.. Contents:
+
+ - Overview.
+ - Usage.
+ - Mountpoints.
+ - Dynamic root.
+ - Proc filesystem.
+ - The cell database.
+ - Security.
+ - The @sys substitution.
+
+
+Overview
+========
+
+This filesystem provides a fairly simple secure AFS filesystem driver. It is
+under development and does not yet provide the full feature set.  The features
+it does support include:
+
+ (*) Security (currently only AFS kaserver and KerberosIV tickets).
+
+ (*) File reading and writing.
+
+ (*) Automounting.
+
+ (*) Local caching (via fscache).
+
+It does not yet support the following AFS features:
+
+ (*) pioctl() system call.
+
+
+Compilation
+===========
+
+The filesystem should be enabled by turning on the kernel configuration
+options::
+
+	CONFIG_AF_RXRPC		- The RxRPC protocol transport
+	CONFIG_RXKAD		- The RxRPC Kerberos security handler
+	CONFIG_AFS		- The AFS filesystem
+
+Additionally, the following can be turned on to aid debugging::
+
+	CONFIG_AF_RXRPC_DEBUG	- Permit AF_RXRPC debugging to be enabled
+	CONFIG_AFS_DEBUG	- Permit AFS debugging to be enabled
+
+They permit the debugging messages to be turned on dynamically by manipulating
+the masks in the following files::
+
+	/sys/module/af_rxrpc/parameters/debug
+	/sys/module/kafs/parameters/debug
+
+
+Usage
+=====
+
+When inserting the driver modules the root cell must be specified along with a
+list of volume location server IP addresses::
+
+	modprobe rxrpc
+	modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+
+The first module is the AF_RXRPC network protocol driver.  This provides the
+RxRPC remote operation protocol and may also be accessed from userspace.  See:
+
+	Documentation/networking/rxrpc.txt
+
+The second module is the actual filesystem driver for the AFS filesystem.  The
+Kerberos RxRPC security handler is enabled by CONFIG_RXKAD (see above).
+
+Once the module has been loaded, more cells can be added by the following
+procedure::
+
+	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
+
+Where the parameters to the "add" command are the name of a cell and a list of
+volume location servers within that cell, with the latter separated by colons.
+
+Filesystems can be mounted anywhere by commands similar to the following::
+
+	mount -t afs "%cambridge.redhat.com:root.afs." /afs
+	mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
+	mount -t afs "#root.afs." /afs
+	mount -t afs "#root.cell." /afs/cambridge
+
+Where the initial character is either a hash or a percent symbol depending on
+whether you definitely want a R/W volume (percent) or whether you'd prefer a
+R/O volume, but are willing to use a R/W volume instead (hash).
+
+The name of the volume can be suffixed with ".backup" or ".readonly" to
+specify connection to only volumes of those types.
+
+The name of the cell is optional, and if not given during a mount, then the
+named volume will be looked up in the cell specified during modprobe.
+
+Additional cells can be added through /proc (see later section).
+
+
+Mountpoints
+===========
+
+AFS has a concept of mountpoints. In AFS terms, these are specially formatted
+symbolic links (of the same form as the "device name" passed to mount).  kAFS
+presents these to the user as directories that have a follow-link capability
+(ie: symbolic link semantics).  If anyone attempts to access them, they will
+automatically cause the target volume to be mounted (if possible) on that site.
+
+Automatically mounted filesystems will be automatically unmounted approximately
+twenty minutes after they were last used.  Alternatively they can be unmounted
+directly with the umount() system call.
+
+Manually unmounting an AFS volume will cause any idle submounts upon it to be
+culled first.  If all are culled, then the requested volume will also be
+unmounted, otherwise error EBUSY will be returned.
+
+This can be used by the administrator to attempt to unmount the whole AFS tree
+mounted on /afs in one go by doing::
+
+	umount /afs
+
+
+Dynamic Root
+============
+
+A mount option is available to create a serverless mount that is only usable
+for dynamic lookup.  Creating such a mount can be done by, for example::
+
+	mount -t afs none /afs -o dyn
+
+This creates a mount that just has an empty directory at the root.  Attempting
+to look up a name in this directory will cause a mountpoint to be created that
+looks up a cell of the same name, for example::
+
+	ls /afs/grand.central.org/
+
+
+Proc Filesystem
+===============
+
+The AFS module creates a "/proc/fs/afs/" directory and populates it:
+
+  (*) A "cells" file that lists cells currently known to the afs module and
+      their usage counts::
+
+	[root@andromeda ~]# cat /proc/fs/afs/cells
+	USE NAME
+	  3 cambridge.redhat.com
+
+  (*) A directory per cell that contains files that list volume location
+      servers, volumes, and active servers known within that cell::
+
+	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
+	USE ADDR            STATE
+	  4 172.16.18.91        0
+	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/vlservers
+	ADDRESS
+	172.16.18.91
+	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/volumes
+	USE STT VLID[0]  VLID[1]  VLID[2]  NAME
+	  1 Val 20000000 20000001 20000002 root.afs
+
+
+The Cell Database
+=================
+
+The filesystem maintains an internal database of all the cells it knows and the
+IP addresses of the volume location servers for those cells.  The cell to which
+the system belongs is added to the database when modprobe is performed by the
+"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
+the kernel command line.
+
+Further cells can be added by commands similar to the following::
+
+	echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
+	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
+
+No other cell database operations are available at this time.
+
+
+Security
+========
+
+Secure operations are initiated by acquiring a key using the klog program.  A
+very primitive klog program is available at:
+
+	http://people.redhat.com/~dhowells/rxrpc/klog.c
+
+This should be compiled by::
+
+	make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"
+
+And then run as::
+
+	./klog
+
+Assuming it's successful, this adds a key of type RxRPC, named for the service
+and cell, eg: "afs@<cellname>".  This can be viewed with the keyctl program or
+by cat'ing /proc/keys::
+
+	[root@andromeda ~]# keyctl show
+	Session Keyring
+	       -3 --alswrv      0     0  keyring: _ses.3268
+		2 --alswrv      0     0   \_ keyring: _uid.0
+	111416553 --als--v      0     0   \_ rxrpc: afs@CAMBRIDGE.REDHAT.COM
+
+Currently the username, realm, password and proposed ticket lifetime are
+compiled in to the program.
+
+It is not required to acquire a key before using AFS facilities, but if one is
+not acquired then all operations will be governed by the anonymous user parts
+of the ACLs.
+
+If a key is acquired, then all AFS operations, including mounts and automounts,
+made by a possessor of that key will be secured with that key.
+
+If a file is opened with a particular key and then the file descriptor is
+passed to a process that doesn't have that key (perhaps over an AF_UNIX
+socket), then the operations on the file will be made with the key that was
+used to open the file.
+
+
+The @sys Substitution
+=====================
+
+The list of up to 16 @sys substitutions for the current network namespace can
+be configured by writing a list to /proc/fs/afs/sysname::
+
+	[root@andromeda ~]# echo foo amd64_linux_26 >/proc/fs/afs/sysname
+
+or cleared entirely by writing an empty list::
+
+	[root@andromeda ~]# echo >/proc/fs/afs/sysname
+
+The current list for the current network namespace can be retrieved by::
+
+	[root@andromeda ~]# cat /proc/fs/afs/sysname
+	foo
+	amd64_linux_26
+
+When @sys is being substituted for, each element of the list is tried in the
+order given.
+
+By default, the list will contain one item that conforms to the pattern
+"<arch>_linux_26", amd64 being the name for x86_64.
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt
deleted file mode 100644
index 8c6ea7b41048..000000000000
--- a/Documentation/filesystems/afs.txt
+++ /dev/null
@@ -1,258 +0,0 @@
-			     ====================
-			     kAFS: AFS FILESYSTEM
-			     ====================
-
-Contents:
-
- - Overview.
- - Usage.
- - Mountpoints.
- - Dynamic root.
- - Proc filesystem.
- - The cell database.
- - Security.
- - The @sys substitution.
-
-
-========
-OVERVIEW
-========
-
-This filesystem provides a fairly simple secure AFS filesystem driver. It is
-under development and does not yet provide the full feature set.  The features
-it does support include:
-
- (*) Security (currently only AFS kaserver and KerberosIV tickets).
-
- (*) File reading and writing.
-
- (*) Automounting.
-
- (*) Local caching (via fscache).
-
-It does not yet support the following AFS features:
-
- (*) pioctl() system call.
-
-
-===========
-COMPILATION
-===========
-
-The filesystem should be enabled by turning on the kernel configuration
-options:
-
-	CONFIG_AF_RXRPC		- The RxRPC protocol transport
-	CONFIG_RXKAD		- The RxRPC Kerberos security handler
-	CONFIG_AFS		- The AFS filesystem
-
-Additionally, the following can be turned on to aid debugging:
-
-	CONFIG_AF_RXRPC_DEBUG	- Permit AF_RXRPC debugging to be enabled
-	CONFIG_AFS_DEBUG	- Permit AFS debugging to be enabled
-
-They permit the debugging messages to be turned on dynamically by manipulating
-the masks in the following files:
-
-	/sys/module/af_rxrpc/parameters/debug
-	/sys/module/kafs/parameters/debug
-
-
-=====
-USAGE
-=====
-
-When inserting the driver modules the root cell must be specified along with a
-list of volume location server IP addresses:
-
-	modprobe rxrpc
-	modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
-
-The first module is the AF_RXRPC network protocol driver.  This provides the
-RxRPC remote operation protocol and may also be accessed from userspace.  See:
-
-	Documentation/networking/rxrpc.txt
-
-The second module is the kerberos RxRPC security driver, and the third module
-is the actual filesystem driver for the AFS filesystem.
-
-Once the module has been loaded, more modules can be added by the following
-procedure:
-
-	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
-
-Where the parameters to the "add" command are the name of a cell and a list of
-volume location servers within that cell, with the latter separated by colons.
-
-Filesystems can be mounted anywhere by commands similar to the following:
-
-	mount -t afs "%cambridge.redhat.com:root.afs." /afs
-	mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
-	mount -t afs "#root.afs." /afs
-	mount -t afs "#root.cell." /afs/cambridge
-
-Where the initial character is either a hash or a percent symbol depending on
-whether you definitely want a R/W volume (percent) or whether you'd prefer a
-R/O volume, but are willing to use a R/W volume instead (hash).
-
-The name of the volume can be suffixes with ".backup" or ".readonly" to
-specify connection to only volumes of those types.
-
-The name of the cell is optional, and if not given during a mount, then the
-named volume will be looked up in the cell specified during modprobe.
-
-Additional cells can be added through /proc (see later section).
-
-
-===========
-MOUNTPOINTS
-===========
-
-AFS has a concept of mountpoints. In AFS terms, these are specially formatted
-symbolic links (of the same form as the "device name" passed to mount).  kAFS
-presents these to the user as directories that have a follow-link capability
-(ie: symbolic link semantics).  If anyone attempts to access them, they will
-automatically cause the target volume to be mounted (if possible) on that site.
-
-Automatically mounted filesystems will be automatically unmounted approximately
-twenty minutes after they were last used.  Alternatively they can be unmounted
-directly with the umount() system call.
-
-Manually unmounting an AFS volume will cause any idle submounts upon it to be
-culled first.  If all are culled, then the requested volume will also be
-unmounted, otherwise error EBUSY will be returned.
-
-This can be used by the administrator to attempt to unmount the whole AFS tree
-mounted on /afs in one go by doing:
-
-	umount /afs
-
-
-============
-DYNAMIC ROOT
-============
-
-A mount option is available to create a serverless mount that is only usable
-for dynamic lookup.  Creating such a mount can be done by, for example:
-
-	mount -t afs none /afs -o dyn
-
-This creates a mount that just has an empty directory at the root.  Attempting
-to look up a name in this directory will cause a mountpoint to be created that
-looks up a cell of the same name, for example:
-
-	ls /afs/grand.central.org/
-
-
-===============
-PROC FILESYSTEM
-===============
-
-The AFS modules creates a "/proc/fs/afs/" directory and populates it:
-
-  (*) A "cells" file that lists cells currently known to the afs module and
-      their usage counts:
-
-	[root@andromeda ~]# cat /proc/fs/afs/cells
-	USE NAME
-	  3 cambridge.redhat.com
-
-  (*) A directory per cell that contains files that list volume location
-      servers, volumes, and active servers known within that cell.
-
-	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
-	USE ADDR            STATE
-	  4 172.16.18.91        0
-	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/vlservers
-	ADDRESS
-	172.16.18.91
-	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/volumes
-	USE STT VLID[0]  VLID[1]  VLID[2]  NAME
-	  1 Val 20000000 20000001 20000002 root.afs
-
-
-=================
-THE CELL DATABASE
-=================
-
-The filesystem maintains an internal database of all the cells it knows and the
-IP addresses of the volume location servers for those cells.  The cell to which
-the system belongs is added to the database when modprobe is performed by the
-"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
-the kernel command line.
-
-Further cells can be added by commands similar to the following:
-
-	echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
-	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
-
-No other cell database operations are available at this time.
-
-
-========
-SECURITY
-========
-
-Secure operations are initiated by acquiring a key using the klog program.  A
-very primitive klog program is available at:
-
-	http://people.redhat.com/~dhowells/rxrpc/klog.c
-
-This should be compiled by:
-
-	make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"
-
-And then run as:
-
-	./klog
-
-Assuming it's successful, this adds a key of type RxRPC, named for the service
-and cell, eg: "afs@<cellname>".  This can be viewed with the keyctl program or
-by cat'ing /proc/keys:
-
-	[root@andromeda ~]# keyctl show
-	Session Keyring
-	       -3 --alswrv      0     0  keyring: _ses.3268
-		2 --alswrv      0     0   \_ keyring: _uid.0
-	111416553 --als--v      0     0   \_ rxrpc: afs@CAMBRIDGE.REDHAT.COM
-
-Currently the username, realm, password and proposed ticket lifetime are
-compiled in to the program.
-
-It is not required to acquire a key before using AFS facilities, but if one is
-not acquired then all operations will be governed by the anonymous user parts
-of the ACLs.
-
-If a key is acquired, then all AFS operations, including mounts and automounts,
-made by a possessor of that key will be secured with that key.
-
-If a file is opened with a particular key and then the file descriptor is
-passed to a process that doesn't have that key (perhaps over an AF_UNIX
-socket), then the operations on the file will be made with key that was used to
-open the file.
-
-
-=====================
-THE @SYS SUBSTITUTION
-=====================
-
-The list of up to 16 @sys substitutions for the current network namespace can
-be configured by writing a list to /proc/fs/afs/sysname:
-
-	[root@andromeda ~]# echo foo amd64_linux_26 >/proc/fs/afs/sysname
-
-or cleared entirely by writing an empty list:
-
-	[root@andromeda ~]# echo >/proc/fs/afs/sysname
-
-The current list for current network namespace can be retrieved by:
-
-	[root@andromeda ~]# cat /proc/fs/afs/sysname
-	foo
-	amd64_linux_26
-
-When @sys is being substituted for, each element of the list is tried in the
-order given.
-
-By default, the list will contain one item that conforms to the pattern
-"<arch>_linux_26", amd64 being the name for x86_64.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 273d802ad5fb..0598bc52abdc 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -49,6 +49,7 @@ Documentation for filesystem implementations.
    9p
    adfs
    affs
+   afs
    autofs
    fuse
    overlayfs
-- 
cgit 


From c64d3dc69f38a08d082813f1c0425d7a108ef950 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:51 +0100
Subject: docs: filesystems: convert autofs-mount-control.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8cae057ae244d0f5b58d3c209bcdae5ed82bc52c.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/autofs-mount-control.rst | 410 +++++++++++++++++++++
 Documentation/filesystems/autofs-mount-control.txt | 408 --------------------
 Documentation/filesystems/index.rst                |   1 +
 3 files changed, 411 insertions(+), 408 deletions(-)
 create mode 100644 Documentation/filesystems/autofs-mount-control.rst
 delete mode 100644 Documentation/filesystems/autofs-mount-control.txt
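
Note on the multi-mount map entries shown in autofs-mount-control.rst:
the sketch below illustrates how one entry with offsets could be read into
a key plus an offset-to-location table. It is an illustration only, not
the automount daemon's parser. The function name parse_map_entry() is
hypothetical, mount options are not handled, and a continuation is assumed
to be a backslash immediately followed by a newline.

```python
from typing import Dict, Tuple


def parse_map_entry(text: str) -> Tuple[str, Dict[str, str]]:
    """Return (key, {offset: location}) for one map entry (hypothetical helper)."""
    # Assumption: a backslash directly before the newline continues the entry.
    joined = text.replace("\\\n", " ")
    tokens = joined.split()
    if len(tokens) < 2:
        raise ValueError("entry needs a key and at least one location")
    key, rest = tokens[0], tokens[1:]
    if len(rest) == 1:
        # Simple entry: a single location, recorded here under offset "/".
        return key, {"/": rest[0]}
    if len(rest) % 2:
        raise ValueError("offsets and locations must come in pairs")
    offsets: Dict[str, str] = {}
    for offset, location in zip(rest[0::2], rest[1::2]):
        if not offset.startswith("/"):
            raise ValueError("offset %r must start with '/'" % offset)
        offsets[offset] = location
    return key, offsets
```

The same function accepts a direct map entry, in which case the key is the
full path (for example /automount/dparse/g1) rather than a directory name.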

diff --git a/Documentation/filesystems/autofs-mount-control.rst b/Documentation/filesystems/autofs-mount-control.rst
new file mode 100644
index 000000000000..2903aed92316
--- /dev/null
+++ b/Documentation/filesystems/autofs-mount-control.rst
@@ -0,0 +1,410 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================================================
+Miscellaneous Device control operations for the autofs kernel module
+====================================================================
+
+The problem
+===========
+
+There is a problem with active restarts in autofs (that is to say
+restarting autofs when there are busy mounts).
+
+During normal operation autofs uses a file descriptor opened on the
+directory that is being managed in order to be able to issue control
+operations. Using a file descriptor gives ioctl operations access to
+autofs specific information stored in the super block. The operations
+are things such as setting an autofs mount catatonic, setting the
+expire timeout and requesting expire checks. As is explained below,
+certain types of autofs triggered mounts can end up covering an autofs
+mount itself which prevents us being able to use open(2) to obtain a
+file descriptor for these operations if we don't already have one open.
+
+Currently autofs uses "umount -l" (lazy umount) to clear active mounts
+at restart. While using lazy umount works for most cases, anything that
+needs to walk back up the mount tree to construct a path, such as
+getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works
+because the point from which the path is constructed has been detached
+from the mount tree.
+
+The actual problem with autofs is that it can't reconnect to existing
+mounts. One immediately thinks that just adding the ability to remount
+autofs file systems would solve it, but alas, that can't work. This is
+because autofs direct mounts and the implementation of "on demand mount
+and expire" of nested mount trees have the file system mounted directly
+on top of the mount trigger directory dentry.
+
+For example, there are two types of automount maps, direct (in the kernel
+module source you will see a third type called an offset, which is just
+a direct mount in disguise) and indirect.
+
+Here is a master map with direct and indirect map entries::
+
+    /-      /etc/auto.direct
+    /test   /etc/auto.indirect
+
+and the corresponding map files are shown below.
+
+/etc/auto.direct::
+
+    /automount/dparse/g6  budgie:/autofs/export1
+    /automount/dparse/g1  shark:/autofs/export1
+    and so on.
+
+/etc/auto.indirect::
+
+    g1    shark:/autofs/export1
+    g6    budgie:/autofs/export1
+    and so on.
+
+For the above indirect map an autofs file system is mounted on /test and
+mounts are triggered for each sub-directory key by the inode lookup
+operation. So we see a mount of shark:/autofs/export1 on /test/g1, for
+example.
+
+The way that direct mounts are handled is by making an autofs mount on
+each full path, such as /automount/dparse/g1, and using it as a mount
+trigger. So when we walk on the path we mount shark:/autofs/export1 "on
+top of this mount point". Since these are always directories we can
+use the follow_link inode operation to trigger the mount.
+
+But, each entry in direct and indirect maps can have offsets (making
+them multi-mount map entries).
+
+For example, an indirect mount map entry could also be::
+
+    g1  \
+    /        shark:/autofs/export5/testing/test \
+    /s1      shark:/autofs/export/testing/test/s1 \
+    /s2      shark:/autofs/export5/testing/test/s2 \
+    /s1/ss1  shark:/autofs/export1 \
+    /s2/ss2  shark:/autofs/export2
+
+and similarly a direct mount map entry could also be::
+
+    /automount/dparse/g1 \
+	/       shark:/autofs/export5/testing/test \
+	/s1     shark:/autofs/export/testing/test/s1 \
+	/s2     shark:/autofs/export5/testing/test/s2 \
+	/s1/ss1 shark:/autofs/export2 \
+	/s2/ss2 shark:/autofs/export2
+
+One of the issues with version 4 of autofs was that, when mounting an
+entry with a large number of offsets, possibly with nesting, we needed
+to mount and umount all of the offsets as a single unit. Not really a
+problem, except for people with a large number of offsets in map entries.
+This mechanism is used for the well known "hosts" map and we have seen
+cases (in 2.4) where the available number of mounts is exhausted or
+where the number of privileged ports available is exhausted.
+
+In version 5 we mount only as we go down the tree of offsets and
+similarly for expiring them which resolves the above problem. There is
+somewhat more detail to the implementation but it isn't needed for the
+sake of the problem explanation. The one important detail is that these
+offsets are implemented using the same mechanism as the direct mounts
+above and so the mount points can be covered by a mount.
+
+The current autofs implementation uses an ioctl file descriptor opened
+on the mount point for control operations. The references held by the
+descriptor are accounted for in checks made to determine if a mount is
+in use, and the descriptor is also used to access autofs file system
+information held in the mount super block. So the use of a file handle
+needs to be retained.
+
+
+The Solution
+============
+
+To be able to restart autofs leaving existing direct, indirect and
+offset mounts in place we need to be able to obtain a file handle
+for these potentially covered autofs mount points. Rather than just
+implement an isolated operation it was decided to re-implement the
+existing ioctl interface and add new operations to provide this
+functionality.
+
+In addition, to be able to reconstruct a mount tree that has busy mounts,
+the uid and gid of the last user that triggered the mount need to be
+available because these can be used as macro substitution variables in
+autofs maps. They are recorded at mount request time and an operation
+has been added to retrieve them.
+
+Since we're re-implementing the control interface, a couple of other
+problems with the existing interface have been addressed. First, when
+a mount or expire operation completes, a status is returned to the
+kernel by either a "send ready" or a "send fail" operation. The
+"send fail" operation of the ioctl interface could only ever send
+ENOENT so the re-implementation allows user space to send an actual
+status. Another expensive operation in user space, for those using
+very large maps, is discovering if a mount is present. Usually this
+involves scanning /proc/mounts and since it needs to be done quite
+often it can introduce significant overhead when there are many entries
+in the mount table. An operation to look up the mount status of a mount
+point dentry (covered or not) has also been added.
+
+Current kernel development policy recommends avoiding the use of the
+ioctl mechanism in favor of systems such as Netlink. An implementation
+using this system was attempted to evaluate its suitability and it was
+found to be inadequate, in this case. The Generic Netlink system was
+used for this as raw Netlink would lead to a significant increase in
+complexity. There's no question that the Generic Netlink system is an
+elegant solution for common case ioctl functions but it's not a complete
+replacement probably because its primary purpose in life is to be a
+message bus implementation rather than specifically an ioctl replacement.
+While it would be possible to work around this, there is one concern
+that led to the decision not to use it. This is that the autofs
+expire in the daemon has become far too complex because umount
+candidates are enumerated, almost for no other reason than to "count"
+the number of times to call the expire ioctl. This involves scanning
+the mount table which has proved to be a big overhead for users with
+large maps. The best way to improve this is to try and get back to the
+way the expire was done long ago. That is, when an expire request is
+issued for a mount (file handle) we should continually call back to
+the daemon until we can't umount any more mounts, then return the
+appropriate status to the daemon. At the moment we just expire one
+mount at a time. A Generic Netlink implementation would exclude this
+possibility for future development due to the requirements of the
+message bus architecture.
+
+
+autofs Miscellaneous Device mount control interface
+====================================================
+
+The control interface is accessed by opening a device node, typically /dev/autofs.
+
+All the ioctls use a common structure to pass the needed parameter
+information and return operation results::
+
+    struct autofs_dev_ioctl {
+	    __u32 ver_major;
+	    __u32 ver_minor;
+	    __u32 size;             /* total size of data passed in
+				    * including this struct */
+	    __s32 ioctlfd;          /* automount command fd */
+
+	    /* Command parameters */
+	    union {
+		    struct args_protover		protover;
+		    struct args_protosubver		protosubver;
+		    struct args_openmount		openmount;
+		    struct args_ready		ready;
+		    struct args_fail		fail;
+		    struct args_setpipefd		setpipefd;
+		    struct args_timeout		timeout;
+		    struct args_requester		requester;
+		    struct args_expire		expire;
+		    struct args_askumount		askumount;
+		    struct args_ismountpoint	ismountpoint;
+	    };
+
+	    char path[0];
+    };
+
+The ioctlfd field is a mount point file descriptor of an autofs mount
+point. It is returned by the open call and is used by all calls except
+the check for whether a given path is a mount point, where it may
+optionally be used to check a specific mount corresponding to a given
+mount point file descriptor, and when requesting the uid and gid of the
+last successful mount on a directory within the autofs file system.
+
+The union is used to communicate parameters and results of calls made
+as described below.
+
+The path field is used to pass a path where it is needed and the size field
+is used to account for the increased structure length when translating the
+structure sent from user space.
+
+This structure can be initialized before setting specific fields by using
+the void function call init_autofs_dev_ioctl(``struct autofs_dev_ioctl *``).
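+
+As a minimal sketch (not taken from the autofs daemon), a request that
+carries a path might be built as follows. The helper name alloc_param()
+is hypothetical; only the structure and init_autofs_dev_ioctl() are
+assumed to come from <linux/auto_dev-ioctl.h>::
+
+    #include <stdlib.h>
+    #include <string.h>
+    #include <linux/auto_dev-ioctl.h>
+
+    /* Hypothetical helper: allocate a request with room for path. */
+    static struct autofs_dev_ioctl *alloc_param(const char *path)
+    {
+        size_t len = path ? strlen(path) + 1 : 0;  /* include the NUL */
+        struct autofs_dev_ioctl *param;
+
+        param = malloc(sizeof(*param) + len);
+        if (!param)
+            return NULL;
+
+        /* Sets the version fields, size and ioctlfd = -1. */
+        init_autofs_dev_ioctl(param);
+        if (len) {
+            memcpy(param->path, path, len);
+            param->size += len;     /* account for the appended path */
+        }
+        return param;
+    }
+
+The caller is expected to free() the returned structure once the ioctl
+has been issued.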
+
+All of the ioctls perform a copy of this structure from user space to
+kernel space and return -EINVAL if the size parameter is smaller than
+the structure size itself, -ENOMEM if the kernel memory allocation fails
+or -EFAULT if the copy itself fails. Other checks include a version check
+of the compiled in user space version against the module version and a
+mismatch results in a -EINVAL return. If the size field is greater than
+the structure size then a path is assumed to be present and is checked to
+ensure it begins with a "/" and is NUL terminated, otherwise -EINVAL is
+returned. Following these checks, for all ioctl commands except
+AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and
+AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is
+not a valid descriptor or doesn't correspond to an autofs mount point
+an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is
+returned.
+
+
+The ioctls
+==========
+
+An example of an implementation which uses this interface can be seen
+in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the
+distribution tar available for download from kernel.org in directory
+/pub/linux/daemons/autofs/v5.
+
+The device node ioctl operations implemented by this interface are:
+
+
+AUTOFS_DEV_IOCTL_VERSION
+------------------------
+
+Get the major and minor version of the autofs device ioctl kernel module
+implementation. It requires an initialized struct autofs_dev_ioctl as an
+input parameter and sets the version information in the passed in structure.
+It returns 0 on success or the error -EINVAL if a version mismatch is
+detected.
+
+
+AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD
+------------------------------------------------------------------
+
+Get the major and minor version of the autofs protocol version understood
+by the loaded module. This call requires an initialized struct autofs_dev_ioctl
+with the ioctlfd field set to a valid autofs mount point descriptor and
+sets the requested version number in the version field of struct args_protover
+or the sub_version field of struct args_protosubver. These commands return
+0 on success or one of the negative error codes if validation fails.
+
+
+AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT
+----------------------------------------------------------
+
+Obtain and release a file descriptor for an autofs managed mount point
+path. The open call requires an initialized struct autofs_dev_ioctl with
+the path field set and the size field adjusted appropriately as well
+as the devid field of struct args_openmount set to the device number of
+the autofs mount. The device number can be obtained from the mount options
+shown in /proc/mounts. The close call requires an initialized struct
+autofs_dev_ioctl with the ioctlfd field set to the descriptor obtained
+from the open call. The release of the file descriptor can also be done
+with close(2) so any open descriptors will also be closed at process exit.
+The close call is included in the implemented operations largely for
+completeness and to provide for a consistent user space implementation.
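+
+A hedged sketch of the open call follows. The helper name open_mount()
+is hypothetical, and it assumes that devfd is an open descriptor on the
+control device and that param already carries the path with the size
+field adjusted::
+
+    #include <string.h>
+    #include <sys/ioctl.h>
+    #include <linux/auto_dev-ioctl.h>
+
+    /*
+     * Hypothetical helper: returns the mount point descriptor, or -1
+     * with errno set if the ioctl fails.
+     */
+    static int open_mount(int devfd, struct autofs_dev_ioctl *param,
+                          __u32 devid)
+    {
+        /* Device number of the autofs mount, e.g. from /proc/mounts. */
+        param->openmount.devid = devid;
+
+        if (ioctl(devfd, AUTOFS_DEV_IOCTL_OPENMOUNT, param) == -1)
+            return -1;
+
+        return param->ioctlfd;      /* filled in by the open call */
+    }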
+
+
+AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD
+--------------------------------------------------------
+
+Return mount and expire result status from user space to the kernel.
+Both of these calls require an initialized struct autofs_dev_ioctl
+with the ioctlfd field set to the descriptor obtained from the open
+call and the token field of struct args_ready or struct args_fail set
+to the wait queue token number, received by user space in the foregoing
+mount or expire request. The status field of struct args_fail is set to
+the errno of the operation. It is set to 0 on success.
+
+
+AUTOFS_DEV_IOCTL_SETPIPEFD_CMD
+------------------------------
+
+Set the pipe file descriptor used for kernel communication to the daemon.
+Normally this is set at mount time using an option but when reconnecting
+to an existing mount we need to use this to tell the autofs mount about
+the new kernel pipe descriptor. In order to protect mounts against
+incorrectly setting the pipe descriptor we also require that the autofs
+mount be catatonic (see next call).
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call and
+the pipefd field of struct args_setpipefd set to the descriptor of the pipe.
+On success the call also sets the process group id used to identify the
+controlling process (e.g. the owning automount(8) daemon) to the process
+group of the caller.
+
+
+AUTOFS_DEV_IOCTL_CATATONIC_CMD
+------------------------------
+
+Make the autofs mount point catatonic. The autofs mount will no longer
+issue mount requests, the kernel communication pipe descriptor is released
+and any remaining waits in the queue are released.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call.
+
+
+AUTOFS_DEV_IOCTL_TIMEOUT_CMD
+----------------------------
+
+Set the expire timeout for mounts within an autofs mount point.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call.
+
+
+AUTOFS_DEV_IOCTL_REQUESTER_CMD
+------------------------------
+
+Return the uid and gid of the last process to successfully trigger a
+mount on the given path dentry.
+
+The call requires an initialized struct autofs_dev_ioctl with the path
+field set to the mount point in question and the size field adjusted
+appropriately. Upon return the uid field of struct args_requester contains
+the uid and the gid field contains the gid.
+
+When reconstructing an autofs mount tree with active mounts we need to
+re-connect to mounts that may have used the original process uid and
+gid (or string variations of them) for mount lookups within the map entry.
+This call provides the ability to obtain this uid and gid so they may be
+used by user space for the mount map lookups.
+
+
+AUTOFS_DEV_IOCTL_EXPIRE_CMD
+---------------------------
+
+Issue an expire request to the kernel for an autofs mount. Typically
+this ioctl is called until no further expire candidates are found.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call. In
+addition an immediate expire that's independent of the mount timeout,
+and a forced expire that's independent of whether the mount is busy,
+can be requested by setting the how field of struct args_expire to
+AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED, respectively. If no
+expire candidates can be found the ioctl returns -1 with errno set to
+EAGAIN.
+
+This call causes the kernel module to check the mount corresponding
+to the given ioctlfd for mounts that can be expired, issues an expire
+request back to the daemon and waits for completion.
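+
+The "call until no further candidates are found" pattern could be
+sketched as below. The helper name expire_all() is hypothetical, and
+devfd is assumed to be an open descriptor on the control device::
+
+    #include <errno.h>
+    #include <string.h>
+    #include <sys/ioctl.h>
+    #include <linux/auto_dev-ioctl.h>
+
+    /*
+     * Hypothetical helper: returns the number of successful expires,
+     * or -1 if the ioctl fails with anything other than EAGAIN.
+     */
+    static int expire_all(int devfd, int ioctlfd, __u32 how)
+    {
+        struct autofs_dev_ioctl param;
+        int count = 0;
+
+        for (;;) {
+            init_autofs_dev_ioctl(&param);
+            param.ioctlfd = ioctlfd;
+            param.expire.how = how;
+
+            if (ioctl(devfd, AUTOFS_DEV_IOCTL_EXPIRE, &param) == -1) {
+                if (errno == EAGAIN)
+                    return count;   /* no expire candidates left */
+                return -1;
+            }
+            count++;
+        }
+    }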
+
+AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD
+------------------------------
+
+Checks if an autofs mount point is in use.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call and
+it returns the result in the may_umount field of struct args_askumount,
+1 for busy and 0 otherwise.
+
+
+AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+---------------------------------
+
+Check if the given path is a mountpoint.
+
+The call requires an initialized struct autofs_dev_ioctl. There are two
+possible variations. Both use the path field set to the path of the mount
+point to check and the size field adjusted appropriately. One uses the
+ioctlfd field to identify a specific mount point to check while the other
+variation uses the path and optionally the in.type field of struct args_ismountpoint
+set to an autofs mount type. The call returns 1 if this is a mount point
+and sets out.devid field to the device number of the mount and out.magic
+field to the relevant super block magic number (described below) or 0 if
+it isn't a mountpoint. In both cases the device number (as returned
+by new_encode_dev()) is returned in the out.devid field.
+
+If supplied with a file descriptor we're looking for a specific mount,
+not necessarily at the top of the mounted stack. In this case the path
+the descriptor corresponds to is considered a mountpoint if it is itself
+a mountpoint or contains a mount, such as a multi-mount without a root
+mount. In this case we return 1 if the descriptor corresponds to a mount
+point and also return the super magic of the covering mount if there
+is one, or 0 if it isn't a mountpoint.
+
+If a path is supplied (and the ioctlfd field is set to -1) then the path
+is looked up and is checked to see if it is the root of a mount. If a
+type is also given we are looking for a particular autofs mount and if
+a match isn't found a fail is returned. If the located path is the
+root of a mount 1 is returned along with the super magic of the mount
+or 0 otherwise.
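+
+A minimal sketch of the path based variation is shown below. The helper
+name path_is_mountpoint() is hypothetical, and it assumes that devfd is
+an open descriptor on the control device and that param already carries
+the path with the size field adjusted::
+
+    #include <string.h>
+    #include <sys/ioctl.h>
+    #include <linux/auto_dev-ioctl.h>
+
+    /*
+     * Hypothetical helper: returns 1 for a mount point, 0 if it is not
+     * one, or -1 with errno set if the ioctl fails.
+     */
+    static int path_is_mountpoint(int devfd, struct autofs_dev_ioctl *param,
+                                  __u32 type, __u32 *magic)
+    {
+        int ret;
+
+        param->ioctlfd = -1;                    /* look up by path */
+        param->ismountpoint.in.type = type;     /* optional mount type */
+
+        ret = ioctl(devfd, AUTOFS_DEV_IOCTL_ISMOUNTPOINT, param);
+        if (ret == 1 && magic)
+            *magic = param->ismountpoint.out.magic;
+        return ret;
+    }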
diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.txt
deleted file mode 100644
index acc02fc57993..000000000000
--- a/Documentation/filesystems/autofs-mount-control.txt
+++ /dev/null
@@ -1,408 +0,0 @@
-
-Miscellaneous Device control operations for the autofs kernel module
-====================================================================
-
-The problem
-===========
-
-There is a problem with active restarts in autofs (that is to say
-restarting autofs when there are busy mounts).
-
-During normal operation autofs uses a file descriptor opened on the
-directory that is being managed in order to be able to issue control
-operations. Using a file descriptor gives ioctl operations access to
-autofs specific information stored in the super block. The operations
-are things such as setting an autofs mount catatonic, setting the
-expire timeout and requesting expire checks. As is explained below,
-certain types of autofs triggered mounts can end up covering an autofs
-mount itself which prevents us being able to use open(2) to obtain a
-file descriptor for these operations if we don't already have one open.
-
-Currently autofs uses "umount -l" (lazy umount) to clear active mounts
-at restart. While using lazy umount works for most cases, anything that
-needs to walk back up the mount tree to construct a path, such as
-getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works
-because the point from which the path is constructed has been detached
-from the mount tree.
-
-The actual problem with autofs is that it can't reconnect to existing
-mounts. Immediately one thinks of just adding the ability to remount
-autofs file systems would solve it, but alas, that can't work. This is
-because autofs direct mounts and the implementation of "on demand mount
-and expire" of nested mount trees have the file system mounted directly
-on top of the mount trigger directory dentry.
-
-For example, there are two types of automount maps, direct (in the kernel
-module source you will see a third type called an offset, which is just
-a direct mount in disguise) and indirect.
-
-Here is a master map with direct and indirect map entries:
-
-/-      /etc/auto.direct
-/test   /etc/auto.indirect
-
-and the corresponding map files:
-
-/etc/auto.direct:
-
-/automount/dparse/g6  budgie:/autofs/export1
-/automount/dparse/g1  shark:/autofs/export1
-and so on.
-
-/etc/auto.indirect:
-
-g1    shark:/autofs/export1
-g6    budgie:/autofs/export1
-and so on.
-
-For the above indirect map an autofs file system is mounted on /test and
-mounts are triggered for each sub-directory key by the inode lookup
-operation. So we see a mount of shark:/autofs/export1 on /test/g1, for
-example.
-
-The way that direct mounts are handled is by making an autofs mount on
-each full path, such as /automount/dparse/g1, and using it as a mount
-trigger. So when we walk on the path we mount shark:/autofs/export1 "on
-top of this mount point". Since these are always directories we can
-use the follow_link inode operation to trigger the mount.
-
-But, each entry in direct and indirect maps can have offsets (making
-them multi-mount map entries).
-
-For example, an indirect mount map entry could also be:
-
-g1  \
-   /        shark:/autofs/export5/testing/test \
-   /s1      shark:/autofs/export/testing/test/s1 \
-   /s2      shark:/autofs/export5/testing/test/s2 \
-   /s1/ss1  shark:/autofs/export1 \
-   /s2/ss2  shark:/autofs/export2
-
-and a similarly a direct mount map entry could also be:
-
-/automount/dparse/g1 \
-    /       shark:/autofs/export5/testing/test \
-    /s1     shark:/autofs/export/testing/test/s1 \
-    /s2     shark:/autofs/export5/testing/test/s2 \
-    /s1/ss1 shark:/autofs/export2 \
-    /s2/ss2 shark:/autofs/export2
-
-One of the issues with version 4 of autofs was that, when mounting an
-entry with a large number of offsets, possibly with nesting, we needed
-to mount and umount all of the offsets as a single unit. Not really a
-problem, except for people with a large number of offsets in map entries.
-This mechanism is used for the well known "hosts" map and we have seen
-cases (in 2.4) where the available number of mounts are exhausted or
-where the number of privileged ports available is exhausted.
-
-In version 5 we mount only as we go down the tree of offsets and
-similarly for expiring them which resolves the above problem. There is
-somewhat more detail to the implementation but it isn't needed for the
-sake of the problem explanation. The one important detail is that these
-offsets are implemented using the same mechanism as the direct mounts
-above and so the mount points can be covered by a mount.
-
-The current autofs implementation uses an ioctl file descriptor opened
-on the mount point for control operations. The references held by the
-descriptor are accounted for in checks made to determine if a mount is
-in use and is also used to access autofs file system information held
-in the mount super block. So the use of a file handle needs to be
-retained.
-
-
-The Solution
-============
-
-To be able to restart autofs leaving existing direct, indirect and
-offset mounts in place we need to be able to obtain a file handle
-for these potentially covered autofs mount points. Rather than just
-implement an isolated operation it was decided to re-implement the
-existing ioctl interface and add new operations to provide this
-functionality.
-
-In addition, to be able to reconstruct a mount tree that has busy mounts,
-the uid and gid of the last user that triggered the mount needs to be
-available because these can be used as macro substitution variables in
-autofs maps. They are recorded at mount request time and an operation
-has been added to retrieve them.
-
-Since we're re-implementing the control interface, a couple of other
-problems with the existing interface have been addressed. First, when
-a mount or expire operation completes a status is returned to the
-kernel by either a "send ready" or a "send fail" operation. The
-"send fail" operation of the ioctl interface could only ever send
-ENOENT so the re-implementation allows user space to send an actual
-status. Another expensive operation in user space, for those using
-very large maps, is discovering if a mount is present. Usually this
-involves scanning /proc/mounts and since it needs to be done quite
-often it can introduce significant overhead when there are many entries
-in the mount table. An operation to lookup the mount status of a mount
-point dentry (covered or not) has also been added.
-
-Current kernel development policy recommends avoiding the use of the
-ioctl mechanism in favor of systems such as Netlink. An implementation
-using this system was attempted to evaluate its suitability and it was
-found to be inadequate, in this case. The Generic Netlink system was
-used for this as raw Netlink would lead to a significant increase in
-complexity. There's no question that the Generic Netlink system is an
-elegant solution for common case ioctl functions but it's not a complete
-replacement probably because its primary purpose in life is to be a
-message bus implementation rather than specifically an ioctl replacement.
-While it would be possible to work around this there is one concern
-that lead to the decision to not use it. This is that the autofs
-expire in the daemon has become far to complex because umount
-candidates are enumerated, almost for no other reason than to "count"
-the number of times to call the expire ioctl. This involves scanning
-the mount table which has proved to be a big overhead for users with
-large maps. The best way to improve this is try and get back to the
-way the expire was done long ago. That is, when an expire request is
-issued for a mount (file handle) we should continually call back to
-the daemon until we can't umount any more mounts, then return the
-appropriate status to the daemon. At the moment we just expire one
-mount at a time. A Generic Netlink implementation would exclude this
-possibility for future development due to the requirements of the
-message bus architecture.
-
-
-autofs Miscellaneous Device mount control interface
-====================================================
-
-The control interface is opening a device node, typically /dev/autofs.
-
-All the ioctls use a common structure to pass the needed parameter
-information and return operation results:
-
-struct autofs_dev_ioctl {
-	__u32 ver_major;
-	__u32 ver_minor;
-	__u32 size;             /* total size of data passed in
-				 * including this struct */
-	__s32 ioctlfd;          /* automount command fd */
-
-	/* Command parameters */
-	union {
-		struct args_protover		protover;
-		struct args_protosubver		protosubver;
-		struct args_openmount		openmount;
-		struct args_ready		ready;
-		struct args_fail		fail;
-		struct args_setpipefd		setpipefd;
-		struct args_timeout		timeout;
-		struct args_requester		requester;
-		struct args_expire		expire;
-		struct args_askumount		askumount;
-		struct args_ismountpoint	ismountpoint;
-	};
-
-	char path[0];
-};
-
-The ioctlfd field is a mount point file descriptor of an autofs mount
-point. It is returned by the open call and is used by all calls except
-the check for whether a given path is a mount point, where it may
-optionally be used to check a specific mount corresponding to a given
-mount point file descriptor, and when requesting the uid and gid of the
-last successful mount on a directory within the autofs file system.
-
-The union is used to communicate parameters and results of calls made
-as described below.
-
-The path field is used to pass a path where it is needed and the size field
-is used account for the increased structure length when translating the
-structure sent from user space.
-
-This structure can be initialized before setting specific fields by using
-the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *).
-
-All of the ioctls perform a copy of this structure from user space to
-kernel space and return -EINVAL if the size parameter is smaller than
-the structure size itself, -ENOMEM if the kernel memory allocation fails
-or -EFAULT if the copy itself fails. Other checks include a version check
-of the compiled in user space version against the module version and a
-mismatch results in a -EINVAL return. If the size field is greater than
-the structure size then a path is assumed to be present and is checked to
-ensure it begins with a "/" and is NULL terminated, otherwise -EINVAL is
-returned. Following these checks, for all ioctl commands except
-AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and
-AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is
-not a valid descriptor or doesn't correspond to an autofs mount point
-an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is
-returned.
-
-
-The ioctls
-==========
-
-An example of an implementation which uses this interface can be seen
-in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the
-distribution tar available for download from kernel.org in directory
-/pub/linux/daemons/autofs/v5.
-
-The device node ioctl operations implemented by this interface are:
-
-
-AUTOFS_DEV_IOCTL_VERSION
-------------------------
-
-Get the major and minor version of the autofs device ioctl kernel module
-implementation. It requires an initialized struct autofs_dev_ioctl as an
-input parameter and sets the version information in the passed in structure.
-It returns 0 on success or the error -EINVAL if a version mismatch is
-detected.
-
-
-AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD
-------------------------------------------------------------------
-
-Get the major and minor version of the autofs protocol version understood
-by loaded module. This call requires an initialized struct autofs_dev_ioctl
-with the ioctlfd field set to a valid autofs mount point descriptor
-and sets the requested version number in version field of struct args_protover
-or sub_version field of struct args_protosubver. These commands return
-0 on success or one of the negative error codes if validation fails.
-
-
-AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT
-----------------------------------------------------------
-
-Obtain and release a file descriptor for an autofs managed mount point
-path. The open call requires an initialized struct autofs_dev_ioctl with
-the path field set and the size field adjusted appropriately as well
-as the devid field of struct args_openmount set to the device number of
-the autofs mount. The device number can be obtained from the mount options
-shown in /proc/mounts. The close call requires an initialized struct
-autofs_dev_ioct with the ioctlfd field set to the descriptor obtained
-from the open call. The release of the file descriptor can also be done
-with close(2) so any open descriptors will also be closed at process exit.
-The close call is included in the implemented operations largely for
-completeness and to provide for a consistent user space implementation.
-
-
-AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD
---------------------------------------------------------
-
-Return mount and expire result status from user space to the kernel.
-Both of these calls require an initialized struct autofs_dev_ioctl
-with the ioctlfd field set to the descriptor obtained from the open
-call and the token field of struct args_ready or struct args_fail set
-to the wait queue token number, received by user space in the foregoing
-mount or expire request. The status field of struct args_fail is set to
-the errno of the operation. It is set to 0 on success.
-
-
-AUTOFS_DEV_IOCTL_SETPIPEFD_CMD
-------------------------------
-
-Set the pipe file descriptor used for kernel communication to the daemon.
-Normally this is set at mount time using an option but when reconnecting
-to a existing mount we need to use this to tell the autofs mount about
-the new kernel pipe descriptor. In order to protect mounts against
-incorrectly setting the pipe descriptor we also require that the autofs
-mount be catatonic (see next call).
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call and
-the pipefd field of struct args_setpipefd set to descriptor of the pipe.
-On success the call also sets the process group id used to identify the
-controlling process (eg. the owning automount(8) daemon) to the process
-group of the caller.
-
-
-AUTOFS_DEV_IOCTL_CATATONIC_CMD
-------------------------------
-
-Make the autofs mount point catatonic. The autofs mount will no longer
-issue mount requests, the kernel communication pipe descriptor is released
-and any remaining waits in the queue released.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call.
-
-
-AUTOFS_DEV_IOCTL_TIMEOUT_CMD
-----------------------------
-
-Set the expire timeout for mounts within an autofs mount point.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call.
-
-
-AUTOFS_DEV_IOCTL_REQUESTER_CMD
-------------------------------
-
-Return the uid and gid of the last process to successfully trigger a the
-mount on the given path dentry.
-
-The call requires an initialized struct autofs_dev_ioctl with the path
-field set to the mount point in question and the size field adjusted
-appropriately. Upon return the uid field of struct args_requester contains
-the uid and gid field the gid.
-
-When reconstructing an autofs mount tree with active mounts we need to
-re-connect to mounts that may have used the original process uid and
-gid (or string variations of them) for mount lookups within the map entry.
-This call provides the ability to obtain this uid and gid so they may be
-used by user space for the mount map lookups.
-
-
-AUTOFS_DEV_IOCTL_EXPIRE_CMD
----------------------------
-
-Issue an expire request to the kernel for an autofs mount. Typically
-this ioctl is called until no further expire candidates are found.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call. In
-addition an immediate expire that's independent of the mount timeout,
-and a forced expire that's independent of whether the mount is busy,
-can be requested by setting the how field of struct args_expire to
-AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED, respectively . If no
-expire candidates can be found the ioctl returns -1 with errno set to
-EAGAIN.
-
-This call causes the kernel module to check the mount corresponding
-to the given ioctlfd for mounts that can be expired, issues an expire
-request back to the daemon and waits for completion.
-
-AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD
-------------------------------
-
-Checks if an autofs mount point is in use.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call and
-it returns the result in the may_umount field of struct args_askumount,
-1 for busy and 0 otherwise.
-
-
-AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
----------------------------------
-
-Check if the given path is a mountpoint.
-
-The call requires an initialized struct autofs_dev_ioctl. There are two
-possible variations. Both use the path field set to the path of the mount
-point to check and the size field adjusted appropriately. One uses the
-ioctlfd field to identify a specific mount point to check while the other
-variation uses the path and optionally in.type field of struct args_ismountpoint
-set to an autofs mount type. The call returns 1 if this is a mount point
-and sets out.devid field to the device number of the mount and out.magic
-field to the relevant super block magic number (described below) or 0 if
-it isn't a mountpoint. In both cases the the device number (as returned
-by new_encode_dev()) is returned in out.devid field.
-
-If supplied with a file descriptor we're looking for a specific mount,
-not necessarily at the top of the mounted stack. In this case the path
-the descriptor corresponds to is considered a mountpoint if it is itself
-a mountpoint or contains a mount, such as a multi-mount without a root
-mount. In this case we return 1 if the descriptor corresponds to a mount
-point and and also returns the super magic of the covering mount if there
-is one or 0 if it isn't a mountpoint.
-
-If a path is supplied (and the ioctlfd field is set to -1) then the path
-is looked up and is checked to see if it is the root of a mount. If a
-type is also given we are looking for a particular autofs mount and if
-a match isn't found a fail is returned. If the the located path is the
-root of a mount 1 is returned along with the super magic of the mount
-or 0 otherwise.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 0598bc52abdc..c9480138d47e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -51,6 +51,7 @@ Documentation for filesystem implementations.
    affs
    afs
    autofs
+   autofs-mount-control
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From c54ad9a4e8faf080e6b395cc4b8298dfc5170255 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:52 +0100
Subject: docs: filesystems: convert befs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/3e29ea6df6cd569021cfa953ccb8ed7dfc146f3d.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/befs.rst  | 128 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/befs.txt  | 117 --------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 129 insertions(+), 117 deletions(-)
 create mode 100644 Documentation/filesystems/befs.rst
 delete mode 100644 Documentation/filesystems/befs.txt

diff --git a/Documentation/filesystems/befs.rst b/Documentation/filesystems/befs.rst
new file mode 100644
index 000000000000..79f9740d76ff
--- /dev/null
+++ b/Documentation/filesystems/befs.rst
@@ -0,0 +1,128 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+BeOS filesystem for Linux
+=========================
+
+Document last updated: Dec 6, 2001
+
+Warning
+=======
+Make sure you understand that this is alpha software.  This means that the
+implementation is neither complete nor well-tested.
+
+I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!
+
+License
+=======
+This software is covered by the GNU General Public License.
+See the file COPYING for the complete text of the license.
+Or the GNU website: <http://www.gnu.org/licenses/licenses.html>
+
+Author
+======
+The largest part of the code was written by Will Dyson <will_dyson@pobox.com>.
+He has been working on the code since Aug 13, 2001. See the changelog for
+details.
+
+Original Author: Makoto Kato <m_kato@ga2.so-net.ne.jp>
+
+His original code can still be found at:
+<http://hp.vector.co.jp/authors/VA008030/bfs/>
+
+Does anyone know of a more current email address for Makoto? He doesn't
+respond to the address given above...
+
+This filesystem doesn't have a maintainer.
+
+What is this Driver?
+====================
+This module implements the native filesystem of BeOS http://www.beincorporated.com/
+for the linux 2.4.1 and later kernels. Currently it is a read-only
+implementation.
+
+Which is it, BFS or BEFS?
+=========================
+Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS".
+But the UnixWare Boot Filesystem is also called bfs, and it is already in
+the kernel. Because of this naming conflict, on Linux the BeOS
+filesystem is called befs.
+
+How to Install
+==============
+step 1.  Install the BeFS patch into the source code tree of linux.
+
+Apply the patchfile to your kernel source tree.
+Assuming that your kernel source is in /foo/bar/linux and the patchfile
+is called patch-befs-xxx, you would do the following::
+
+	cd /foo/bar/linux
+	patch -p1 < /path/to/patch-befs-xxx
+
+If the patching step fails (i.e. there are rejected hunks), you can try to
+figure it out yourself (it shouldn't be hard), or mail the maintainer
+(Will Dyson <will_dyson@pobox.com>) for help.
+
+step 2.  Configuration & make kernel
+
+The linux kernel has many compile-time options. Most of them are beyond the
+scope of this document. I suggest the Kernel-HOWTO document as a good general
+reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html
+
+However, to use the BeFS module, you must enable it at configure time::
+
+	cd /foo/bar/linux
+	make menuconfig (or xconfig)
+
+The BeFS module is not a standard part of the linux kernel, so you must first
+enable support for experimental code under the "Code maturity level" menu.
+
+Then, under the "Filesystems" menu will be an option called "BeFS
+filesystem (experimental)", or something like that. Enable that option
+(it is fine to make it a module).
+
+Save your kernel configuration and then build your kernel.
+
+step 3.  Install
+
+See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
+instructions on this critical step.
+
+Using BFS
+=========
+To use the BeOS filesystem, use filesystem type 'befs'.
+
+ex::
+
+    mount -t befs /dev/fd0 /beos
+
+Mount Options
+=============
+
+=============  ===========================================================
+uid=nnn        All files in the partition will be owned by user id nnn.
+gid=nnn        All files in the partition will be in group nnn.
+iocharset=xxx  Use xxx as the name of the NLS translation table.
+debug          The driver will output debugging information to the syslog.
+=============  ===========================================================
+
+How to Get Latest Version
+=========================
+
+The latest version is currently available at:
+<http://befs-driver.sourceforge.net/>
+
+Any Known Bugs?
+===============
+As of Jan 20, 2002:
+
+	None
+
+Special Thanks
+==============
+Dominic Giampaolo ... Writing "Practical File System Design with the Be File System"
+
+Hiroyuki Yamada  ... Testing LinuxPPC.
+
+
+
diff --git a/Documentation/filesystems/befs.txt b/Documentation/filesystems/befs.txt
deleted file mode 100644
index da45e6c842b8..000000000000
--- a/Documentation/filesystems/befs.txt
+++ /dev/null
@@ -1,117 +0,0 @@
-BeOS filesystem for Linux
-
-Document last updated: Dec 6, 2001
-
-WARNING
-=======
-Make sure you understand that this is alpha software.  This means that the
-implementation is neither complete nor well-tested. 
-
-I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!
-
-LICENSE
-=====
-This software is covered by the GNU General Public License. 
-See the file COPYING for the complete text of the license.
-Or the GNU website: <http://www.gnu.org/licenses/licenses.html>
-
-AUTHOR
-=====
-The largest part of the code written by Will Dyson <will_dyson@pobox.com>
-He has been working on the code since Aug 13, 2001. See the changelog for
-details.
-
-Original Author: Makoto Kato <m_kato@ga2.so-net.ne.jp>
-His original code can still be found at:
-<http://hp.vector.co.jp/authors/VA008030/bfs/>
-Does anyone know of a more current email address for Makoto? He doesn't
-respond to the address given above...
-
-This filesystem doesn't have a maintainer.
-
-WHAT IS THIS DRIVER?
-==================
-This module implements the native filesystem of BeOS http://www.beincorporated.com/ 
-for the linux 2.4.1 and later kernels. Currently it is a read-only
-implementation.
-
-Which is it, BFS or BEFS?
-================
-Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS". 
-But Unixware Boot Filesystem is called bfs, too. And they are already in
-the kernel. Because of this naming conflict, on Linux the BeOS
-filesystem is called befs.
-
-HOW TO INSTALL
-==============
-step 1.  Install the BeFS  patch into the source code tree of linux.
-
-Apply the patchfile to your kernel source tree.
-Assuming that your kernel source is in /foo/bar/linux and the patchfile
-is called patch-befs-xxx, you would do the following:
-
-	cd /foo/bar/linux
-	patch -p1 < /path/to/patch-befs-xxx
-
-if the patching step fails (i.e. there are rejected hunks), you can try to
-figure it out yourself (it shouldn't be hard), or mail the maintainer 
-(Will Dyson <will_dyson@pobox.com>) for help.
-
-step 2.  Configuration & make kernel
-
-The linux kernel has many compile-time options. Most of them are beyond the
-scope of this document. I suggest the Kernel-HOWTO document as a good general
-reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html 
-
-However, to use the BeFS module, you must enable it at configure time.
-
-	cd /foo/bar/linux
-	make menuconfig (or xconfig)
-
-The BeFS module is not a standard part of the linux kernel, so you must first
-enable support for experimental code under the "Code maturity level" menu.
-
-Then, under the "Filesystems" menu will be an option called "BeFS
-filesystem (experimental)", or something like that. Enable that option
-(it is fine to make it a module).
-
-Save your kernel configuration and then build your kernel.
-
-step 3.  Install
-
-See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
-instructions on this critical step.
-
-USING BFS
-=========
-To use the BeOS filesystem, use filesystem type 'befs'.
-
-ex)
-    mount -t befs /dev/fd0 /beos
-
-MOUNT OPTIONS
-=============
-uid=nnn        All files in the partition will be owned by user id nnn.
-gid=nnn	       All files in the partition will be in group nnn.
-iocharset=xxx  Use xxx as the name of the NLS translation table.
-debug          The driver will output debugging information to the syslog.
-
-HOW TO GET LASTEST VERSION
-==========================
-
-The latest version is currently available at:
-<http://befs-driver.sourceforge.net/>
-
-ANY KNOWN BUGS?
-===========
-As of Jan 20, 2002:
-	
-	None
-
-SPECIAL THANKS
-==============
-Dominic Giampalo ... Writing "Practical file system design with Be filesystem"
-Hiroyuki Yamada  ... Testing LinuxPPC.
-
-
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c9480138d47e..98de437f5500 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -52,6 +52,7 @@ Documentation for filesystem implementations.
    afs
    autofs
    autofs-mount-control
+   befs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From ee68f34d7e7e553ffb74f09df0f3764fbfcf5d4b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:53 +0100
Subject: docs: filesystems: convert bfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/93991bcc05e419368ee1e585c81057fb2c7c8d2b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/bfs.rst   | 60 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/bfs.txt   | 57 -----------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 61 insertions(+), 57 deletions(-)
 create mode 100644 Documentation/filesystems/bfs.rst
 delete mode 100644 Documentation/filesystems/bfs.txt

diff --git a/Documentation/filesystems/bfs.rst b/Documentation/filesystems/bfs.rst
new file mode 100644
index 000000000000..ce14b9018807
--- /dev/null
+++ b/Documentation/filesystems/bfs.rst
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+BFS Filesystem for Linux
+========================
+
+The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
+usually contains the kernel image and a few other files required for the
+boot process.
+
+In order to access /stand partition under Linux you obviously need to
+know the partition number and the kernel must support UnixWare disk slices
+(CONFIG_UNIXWARE_DISKLABEL config option). However BFS support does not
+depend on having UnixWare disklabel support because one can also mount
+BFS filesystem via loopback::
+
+    # losetup /dev/loop0 stand.img
+    # mount -t bfs /dev/loop0 /mnt/stand
+
+where stand.img is a file containing the image of BFS filesystem.
+When you have finished using it and umounted you need to also deallocate
+/dev/loop0 device by::
+
+    # losetup -d /dev/loop0
+
+You can simplify mounting by just typing::
+
+    # mount -t bfs -o loop stand.img /mnt/stand
+
+This will allocate the first available loopback device (and load loop.o
+kernel module if necessary) automatically. If the loopback driver is not
+loaded automatically, make sure that you have compiled the module and
+that modprobe is functioning. Beware that umount will not deallocate
+/dev/loopN device if /etc/mtab file on your system is a symbolic link to
+/proc/mounts. You will need to do it manually using "-d" switch of
+losetup(8). Read losetup(8) manpage for more info.
+
+To create the BFS image under UnixWare you need to find out first which
+slice contains it. The command prtvtoc(1M) is your friend::
+
+    # prtvtoc /dev/rdsk/c0b0t0d0s0
+
+(assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
+look for the slice with tag "STAND", which is usually slice 10. With this
+information you can use dd(1) to create the BFS image::
+
+    # umount /stand
+    # dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
+
+Just in case, you can verify that you have done the right thing by checking
+the magic number::
+
+    # od -Ad -tx4 stand.img | more
+
+The first 4 bytes should be 0x1badface.
+
+If you have any patches, questions or suggestions regarding this BFS
+implementation please contact the author:
+
+Tigran Aivazian <aivazian.tigran@gmail.com>
diff --git a/Documentation/filesystems/bfs.txt b/Documentation/filesystems/bfs.txt
deleted file mode 100644
index 843ce91a2e40..000000000000
--- a/Documentation/filesystems/bfs.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-BFS FILESYSTEM FOR LINUX
-========================
-
-The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
-usually contains the kernel image and a few other files required for the
-boot process.
-
-In order to access /stand partition under Linux you obviously need to
-know the partition number and the kernel must support UnixWare disk slices
-(CONFIG_UNIXWARE_DISKLABEL config option). However BFS support does not
-depend on having UnixWare disklabel support because one can also mount
-BFS filesystem via loopback:
-
-# losetup /dev/loop0 stand.img
-# mount -t bfs /dev/loop0 /mnt/stand
-
-where stand.img is a file containing the image of BFS filesystem. 
-When you have finished using it and umounted you need to also deallocate
-/dev/loop0 device by:
-
-# losetup -d /dev/loop0
-
-You can simplify mounting by just typing:
-
-# mount -t bfs -o loop stand.img /mnt/stand
-
-this will allocate the first available loopback device (and load loop.o 
-kernel module if necessary) automatically. If the loopback driver is not
-loaded automatically, make sure that you have compiled the module and
-that modprobe is functioning. Beware that umount will not deallocate
-/dev/loopN device if /etc/mtab file on your system is a symbolic link to
-/proc/mounts. You will need to do it manually using "-d" switch of
-losetup(8). Read losetup(8) manpage for more info.
-
-To create the BFS image under UnixWare you need to find out first which
-slice contains it. The command prtvtoc(1M) is your friend:
-
-# prtvtoc /dev/rdsk/c0b0t0d0s0
-
-(assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
-look for the slice with tag "STAND", which is usually slice 10. With this
-information you can use dd(1) to create the BFS image:
-
-# umount /stand
-# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
-
-Just in case, you can verify that you have done the right thing by checking
-the magic number:
-
-# od -Ad -tx4 stand.img | more
-
-The first 4 bytes should be 0x1badface.
-
-If you have any patches, questions or suggestions regarding this BFS
-implementation please contact the author:
-
-Tigran Aivazian <aivazian.tigran@gmail.com>
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 98de437f5500..f74e6b273d9f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -53,6 +53,7 @@ Documentation for filesystem implementations.
    autofs
    autofs-mount-control
    befs
+   bfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 5d43e1bc2dfccbb07ea662fa4536544f1b6efd43 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:54 +0100
Subject: docs: filesystems: convert btrfs.txt to ReST

Just trivial changes:

- Add a SPDX header;
- Add it to filesystems/index.rst.

While here, adjust document title, just to make it use the same
style of the other docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Link: https://lore.kernel.org/r/1ef76da4ac24a9a6f6187723554733c702ea19ae.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/btrfs.rst | 34 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/btrfs.txt | 31 -------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 35 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/filesystems/btrfs.rst
 delete mode 100644 Documentation/filesystems/btrfs.txt

diff --git a/Documentation/filesystems/btrfs.rst b/Documentation/filesystems/btrfs.rst
new file mode 100644
index 000000000000..d0904f602819
--- /dev/null
+++ b/Documentation/filesystems/btrfs.rst
@@ -0,0 +1,34 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
+BTRFS
+=====
+
+Btrfs is a copy on write filesystem for Linux aimed at implementing advanced
+features while focusing on fault tolerance, repair and easy administration.
+It is jointly developed by several companies, licensed under the GPL and open for
+contribution from anyone.
+
+The main Btrfs features include:
+
+    * Extent based file storage (2^64 max file size)
+    * Space efficient packing of small files
+    * Space efficient indexed directories
+    * Dynamic inode allocation
+    * Writable snapshots
+    * Subvolumes (separate internal filesystem roots)
+    * Object level mirroring and striping
+    * Checksums on data and metadata (multiple algorithms available)
+    * Compression
+    * Integrated multiple device support, with several raid algorithms
+    * Offline filesystem check
+    * Efficient incremental backup and FS mirroring
+    * Online filesystem defragmentation
+
+For more information please refer to the wiki
+
+  https://btrfs.wiki.kernel.org
+
+that maintains information about administration tasks, frequently asked
+questions, use cases, mount options, comprehensible changelogs, features,
+manual pages, source code repositories, contacts etc.
diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
deleted file mode 100644
index f9dad22d95ce..000000000000
--- a/Documentation/filesystems/btrfs.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-BTRFS
-=====
-
-Btrfs is a copy on write filesystem for Linux aimed at implementing advanced
-features while focusing on fault tolerance, repair and easy administration.
-Jointly developed by several companies, licensed under the GPL and open for
-contribution from anyone.
-
-The main Btrfs features include:
-
-    * Extent based file storage (2^64 max file size)
-    * Space efficient packing of small files
-    * Space efficient indexed directories
-    * Dynamic inode allocation
-    * Writable snapshots
-    * Subvolumes (separate internal filesystem roots)
-    * Object level mirroring and striping
-    * Checksums on data and metadata (multiple algorithms available)
-    * Compression
-    * Integrated multiple device support, with several raid algorithms
-    * Offline filesystem check
-    * Efficient incremental backup and FS mirroring
-    * Online filesystem defragmentation
-
-For more information please refer to the wiki
-
-  https://btrfs.wiki.kernel.org
-
-that maintains information about administration tasks, frequently asked
-questions, use cases, mount options, comprehensible changelogs, features,
-manual pages, source code repositories, contacts etc.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f74e6b273d9f..dae862cf167e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -54,6 +54,7 @@ Documentation for filesystem implementations.
    autofs-mount-control
    befs
    bfs
+   btrfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 471379a174aa444b326d1b74e9f96a8b4b766b79 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:55 +0100
Subject: docs: filesystems: convert ceph.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/df2f142b5ca5842e030d8209482dfd62dcbe020f.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ceph.rst  | 190 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ceph.txt  | 186 -----------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 191 insertions(+), 186 deletions(-)
 create mode 100644 Documentation/filesystems/ceph.rst
 delete mode 100644 Documentation/filesystems/ceph.txt

diff --git a/Documentation/filesystems/ceph.rst b/Documentation/filesystems/ceph.rst
new file mode 100644
index 000000000000..b46a7218248f
--- /dev/null
+++ b/Documentation/filesystems/ceph.rst
@@ -0,0 +1,190 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Ceph Distributed File System
+============================
+
+Ceph is a distributed network file system designed to provide good
+performance, reliability, and scalability.
+
+Basic features include:
+
+ * POSIX semantics
+ * Seamless scaling from 1 to many thousands of nodes
+ * High availability and reliability.  No single point of failure.
+ * N-way replication of data across storage nodes
+ * Fast recovery from node failures
+ * Automatic rebalancing of data on node addition/removal
+ * Easy deployment: most FS components are userspace daemons
+
+Also,
+
+ * Flexible snapshots (on any directory)
+ * Recursive accounting (nested files, directories, bytes)
+
+In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
+on symmetric access by all clients to shared block devices, Ceph
+separates data and metadata management into independent server
+clusters, similar to Lustre.  Unlike Lustre, however, metadata and
+storage nodes run entirely as user space daemons.  File data is striped
+across storage nodes in large chunks to distribute workload and
+facilitate high throughputs.  When storage nodes fail, data is
+re-replicated in a distributed fashion by the storage nodes themselves
+(with some minimal coordination from a cluster monitor), making the
+system extremely efficient and scalable.
+
+Metadata servers effectively form a large, consistent, distributed
+in-memory cache above the file namespace that is extremely scalable,
+dynamically redistributes metadata in response to workload changes,
+and can tolerate arbitrary (well, non-Byzantine) node failures.  The
+metadata server takes a somewhat unconventional approach to metadata
+storage to significantly improve performance for common workloads.  In
+particular, inodes with only a single link are embedded in
+directories, allowing entire directories of dentries and inodes to be
+loaded into its cache with a single I/O operation.  The contents of
+extremely large directories can be fragmented and managed by
+independent metadata servers, allowing scalable concurrent access.
+
+The system offers automatic data rebalancing/migration when scaling
+from a small cluster of just a few nodes to many hundreds, without
+requiring an administrator to carve the data set into static volumes or
+go through the tedious process of migrating data between servers.
+When the file system approaches full, new nodes can be easily added
+and things will "just work."
+
+Ceph includes a flexible snapshot mechanism that allows a user to create
+a snapshot on any subdirectory (and its nested contents) in the
+system.  Snapshot creation and deletion are as simple as 'mkdir
+.snap/foo' and 'rmdir .snap/foo'.
+
+Ceph also provides some recursive accounting on directories for nested
+files and bytes.  That is, a 'getfattr -d foo' on any directory in the
+system will reveal the total number of nested regular files and
+subdirectories, and a summation of all nested file sizes.  This makes
+the identification of large disk space consumers relatively quick, as
+no 'du' or similar recursive scan of the file system is required.
+
+Finally, Ceph also allows quotas to be set on any directory in the system.
+The quota can restrict the number of bytes or the number of files stored
+beneath that point in the directory hierarchy.  Quotas can be set using
+extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', e.g.::
+
+ setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir
+ getfattr -n ceph.quota.max_bytes /some/dir
+
+A limitation of the current quotas implementation is that it relies on the
+cooperation of the client mounting the file system to stop writers when a
+limit is reached.  A modified or adversarial client cannot be prevented
+from writing as much data as it needs.
+
+Mount Syntax
+============
+
+The basic mount syntax is::
+
+ # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
+
+You only need to specify a single monitor, as the client will get the
+full list when it connects.  (However, if the monitor you specify
+happens to be down, the mount won't succeed.)  The port can be left
+off if the monitor is using the default.  So if the monitor is at
+1.2.3.4::
+
+ # mount -t ceph 1.2.3.4:/ /mnt/ceph
+
+is sufficient.  If /sbin/mount.ceph is installed, a hostname can be
+used instead of an IP address.
+
+
+
+Mount Options
+=============
+
+  ip=A.B.C.D[:N]
+	Specify the IP and/or port the client should bind to locally.
+	There is normally not much reason to do this.  If the IP is not
+	specified, the client's IP address is determined by looking at the
+	address its connection to the monitor originates from.
+
+  wsize=X
+	Specify the maximum write size in bytes.  Default: 16 MB.
+
+  rsize=X
+	Specify the maximum read size in bytes.  Default: 16 MB.
+
+  rasize=X
+	Specify the maximum readahead size in bytes.  Default: 8 MB.
+
+  mount_timeout=X
+	Specify the timeout value for mount (in seconds), in the case
+	of a non-responsive Ceph file system.  The default is 30
+	seconds.
+
+  caps_max=X
+	Specify the maximum number of caps to hold. Unused caps are released
+	when the number of caps exceeds the limit. The default is 0 (no limit).
+
+  rbytes
+	When stat() is called on a directory, set st_size to 'rbytes',
+	the summation of file sizes over all files nested beneath that
+	directory.  This is the default.
+
+  norbytes
+	When stat() is called on a directory, set st_size to the
+	number of entries in that directory.
+
+  nocrc
+	Disable CRC32C calculation for data writes.  If set, the storage node
+	must rely on TCP's error correction to detect data corruption
+	in the data payload.
+
+  dcache
+        Use the dcache contents to perform negative lookups and
+        readdir when the client has the entire directory contents in
+        its cache.  (This does not change correctness; the client uses
+        cached metadata only when a lease or capability ensures it is
+        valid.)
+
+  nodcache
+        Do not use the dcache as above.  This avoids a significant amount of
+        complex code, sacrificing performance without affecting correctness,
+        and is useful for tracking down bugs.
+
+  noasyncreaddir
+	Do not use the dcache as above for readdir.
+
+  noquotadf
+        Report overall filesystem usage in statfs instead of using the root
+        directory quota.
+
+  nocopyfrom
+        Don't use the RADOS 'copy-from' operation to perform remote object
+        copies.  Currently, it's only used in copy_file_range, which will revert
+        to the default VFS implementation if this option is used.
+
+  recover_session=<no|clean>
+	Set auto reconnect mode in the case where the client is blacklisted. The
+	available modes are "no" and "clean". The default is "no".
+
+	* no: never attempt to reconnect when client detects that it has been
+	  blacklisted. Operations will generally fail after being blacklisted.
+
+	* clean: client reconnects to the ceph cluster automatically when it
+	  detects that it has been blacklisted. During reconnect, client drops
+	  dirty data/metadata, invalidates page caches and writable file handles.
+	  After reconnect, file locks become stale because the MDS loses track
+	  of them. If an inode contains any stale file locks, read/write on the
+	  inode is not allowed until applications release all stale file locks.
+
+More Information
+================
+
+For more information on Ceph, see the home page at
+	https://ceph.com/
+
+The Linux kernel client source tree is available at
+	- https://github.com/ceph/ceph-client.git
+	- git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
+
+and the source for the full system is at
+	https://github.com/ceph/ceph.git
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt
deleted file mode 100644
index b19b6a03f91c..000000000000
--- a/Documentation/filesystems/ceph.txt
+++ /dev/null
@@ -1,186 +0,0 @@
-Ceph Distributed File System
-============================
-
-Ceph is a distributed network file system designed to provide good
-performance, reliability, and scalability.
-
-Basic features include:
-
- * POSIX semantics
- * Seamless scaling from 1 to many thousands of nodes
- * High availability and reliability.  No single point of failure.
- * N-way replication of data across storage nodes
- * Fast recovery from node failures
- * Automatic rebalancing of data on node addition/removal
- * Easy deployment: most FS components are userspace daemons
-
-Also,
- * Flexible snapshots (on any directory)
- * Recursive accounting (nested files, directories, bytes)
-
-In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
-on symmetric access by all clients to shared block devices, Ceph
-separates data and metadata management into independent server
-clusters, similar to Lustre.  Unlike Lustre, however, metadata and
-storage nodes run entirely as user space daemons.  File data is striped
-across storage nodes in large chunks to distribute workload and
-facilitate high throughputs.  When storage nodes fail, data is
-re-replicated in a distributed fashion by the storage nodes themselves
-(with some minimal coordination from a cluster monitor), making the
-system extremely efficient and scalable.
-
-Metadata servers effectively form a large, consistent, distributed
-in-memory cache above the file namespace that is extremely scalable,
-dynamically redistributes metadata in response to workload changes,
-and can tolerate arbitrary (well, non-Byzantine) node failures.  The
-metadata server takes a somewhat unconventional approach to metadata
-storage to significantly improve performance for common workloads.  In
-particular, inodes with only a single link are embedded in
-directories, allowing entire directories of dentries and inodes to be
-loaded into its cache with a single I/O operation.  The contents of
-extremely large directories can be fragmented and managed by
-independent metadata servers, allowing scalable concurrent access.
-
-The system offers automatic data rebalancing/migration when scaling
-from a small cluster of just a few nodes to many hundreds, without
-requiring an administrator carve the data set into static volumes or
-go through the tedious process of migrating data between servers.
-When the file system approaches full, new nodes can be easily added
-and things will "just work."
-
-Ceph includes flexible snapshot mechanism that allows a user to create
-a snapshot on any subdirectory (and its nested contents) in the
-system.  Snapshot creation and deletion are as simple as 'mkdir
-.snap/foo' and 'rmdir .snap/foo'.
-
-Ceph also provides some recursive accounting on directories for nested
-files and bytes.  That is, a 'getfattr -d foo' on any directory in the
-system will reveal the total number of nested regular files and
-subdirectories, and a summation of all nested file sizes.  This makes
-the identification of large disk space consumers relatively quick, as
-no 'du' or similar recursive scan of the file system is required.
-
-Finally, Ceph also allows quotas to be set on any directory in the system.
-The quota can restrict the number of bytes or the number of files stored
-beneath that point in the directory hierarchy.  Quotas can be set using
-extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg:
-
- setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir
- getfattr -n ceph.quota.max_bytes /some/dir
-
-A limitation of the current quotas implementation is that it relies on the
-cooperation of the client mounting the file system to stop writers when a
-limit is reached.  A modified or adversarial client cannot be prevented
-from writing as much data as it needs.
-
-Mount Syntax
-============
-
-The basic mount syntax is:
-
- # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
-
-You only need to specify a single monitor, as the client will get the
-full list when it connects.  (However, if the monitor you specify
-happens to be down, the mount won't succeed.)  The port can be left
-off if the monitor is using the default.  So if the monitor is at
-1.2.3.4,
-
- # mount -t ceph 1.2.3.4:/ /mnt/ceph
-
-is sufficient.  If /sbin/mount.ceph is installed, a hostname can be
-used instead of an IP address.
-
-
-
-Mount Options
-=============
-
-  ip=A.B.C.D[:N]
-	Specify the IP and/or port the client should bind to locally.
-	There is normally not much reason to do this.  If the IP is not
-	specified, the client's IP address is determined by looking at the
-	address its connection to the monitor originates from.
-
-  wsize=X
-	Specify the maximum write size in bytes.  Default: 16 MB.
-
-  rsize=X
-	Specify the maximum read size in bytes.  Default: 16 MB.
-
-  rasize=X
-	Specify the maximum readahead size in bytes.  Default: 8 MB.
-
-  mount_timeout=X
-	Specify the timeout value for mount (in seconds), in the case
-	of a non-responsive Ceph file system.  The default is 30
-	seconds.
-
-  caps_max=X
-	Specify the maximum number of caps to hold. Unused caps are released
-	when number of caps exceeds the limit. The default is 0 (no limit)
-
-  rbytes
-	When stat() is called on a directory, set st_size to 'rbytes',
-	the summation of file sizes over all files nested beneath that
-	directory.  This is the default.
-
-  norbytes
-	When stat() is called on a directory, set st_size to the
-	number of entries in that directory.
-
-  nocrc
-	Disable CRC32C calculation for data writes.  If set, the storage node
-	must rely on TCP's error correction to detect data corruption
-	in the data payload.
-
-  dcache
-        Use the dcache contents to perform negative lookups and
-        readdir when the client has the entire directory contents in
-        its cache.  (This does not change correctness; the client uses
-        cached metadata only when a lease or capability ensures it is
-        valid.)
-
-  nodcache
-        Do not use the dcache as above.  This avoids a significant amount of
-        complex code, sacrificing performance without affecting correctness,
-        and is useful for tracking down bugs.
-
-  noasyncreaddir
-	Do not use the dcache as above for readdir.
-
-  noquotadf
-        Report overall filesystem usage in statfs instead of using the root
-        directory quota.
-
-  nocopyfrom
-        Don't use the RADOS 'copy-from' operation to perform remote object
-        copies.  Currently, it's only used in copy_file_range, which will revert
-        to the default VFS implementation if this option is used.
-
-  recover_session=<no|clean>
-	Set auto reconnect mode in the case where the client is blacklisted. The
-	available modes are "no" and "clean". The default is "no".
-
-	* no: never attempt to reconnect when client detects that it has been
-	blacklisted. Operations will generally fail after being blacklisted.
-
-	* clean: client reconnects to the ceph cluster automatically when it
-	detects that it has been blacklisted. During reconnect, client drops
-	dirty data/metadata, invalidates page caches and writable file handles.
-	After reconnect, file locks become stale because the MDS loses track
-	of them. If an inode contains any stale file locks, read/write on the
-	inode is not allowed until applications release all stale file locks.
-
-More Information
-================
-
-For more information on Ceph, see the home page at
-	https://ceph.com/
-
-The Linux kernel client source tree is available at
-	https://github.com/ceph/ceph-client.git
-	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
-
-and the source for the full system is at
-	https://github.com/ceph/ceph.git
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index dae862cf167e..ddd8f7b2bb25 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -55,6 +55,7 @@ Documentation for filesystem implementations.
    befs
    bfs
    btrfs
+   ceph
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From f1fa0e6028d395c5f0d1a0929a795b8dc0d43295 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:56 +0100
Subject: docs: filesystems: convert cramfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Link: https://lore.kernel.org/r/e87b267e71f99974b7bb3fc0a4a08454ff58165e.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/cramfs.rst | 123 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/cramfs.txt | 118 ---------------------------------
 Documentation/filesystems/index.rst  |   1 +
 3 files changed, 124 insertions(+), 118 deletions(-)
 create mode 100644 Documentation/filesystems/cramfs.rst
 delete mode 100644 Documentation/filesystems/cramfs.txt

diff --git a/Documentation/filesystems/cramfs.rst b/Documentation/filesystems/cramfs.rst
new file mode 100644
index 000000000000..afbdbde98bd2
--- /dev/null
+++ b/Documentation/filesystems/cramfs.rst
@@ -0,0 +1,123 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Cramfs - cram a filesystem onto a small ROM
+===========================================
+
+cramfs is designed to be simple and small, and to compress things well.
+
+It uses the zlib routines to compress a file one page at a time, and
+allows random page access.  The meta-data is not compressed, but is
+expressed in a very terse representation to make it use much less
+diskspace than traditional filesystems.
+
+You can't write to a cramfs filesystem (making it compressible and
+compact also makes it _very_ hard to update on-the-fly), so you have to
+create the disk image with the "mkcramfs" utility.
+
+
+Usage Notes
+-----------
+
+File sizes are limited to less than 16MB.
+
+Maximum filesystem size is a little over 256MB.  (The last file on the
+filesystem is allowed to extend past 256MB.)
+
+Only the low 8 bits of gid are stored.  The current version of
+mkcramfs simply truncates to 8 bits, which is a potential security
+issue.
+
+Hard links are supported, but hard linked files
+will still have a link count of 1 in the cramfs image.
+
+Cramfs directories have no ``.`` or ``..`` entries.  Directories (like
+every other file on cramfs) always have a link count of 1.  (There's
+no need to use -noleaf in ``find``, btw.)
+
+No timestamps are stored in a cramfs, so these default to the epoch
+(1970 GMT).  Recently-accessed files may have updated timestamps, but
+the update lasts only as long as the inode is cached in memory, after
+which the timestamp reverts to 1970, i.e. moves backwards in time.
+
+Currently, cramfs must be written and read with architectures of the
+same endianness, and can be read only by kernels with PAGE_SIZE
+== 4096.  At least the latter of these is a bug, but it hasn't been
+decided what the best fix is.  For the moment if you have larger pages
+you can just change the #define in mkcramfs.c, so long as you don't
+mind the filesystem becoming unreadable to future kernels.
+
+
+Memory Mapped cramfs image
+--------------------------
+
+The CRAMFS_MTD Kconfig option adds support for loading data directly from
+a physical linear memory range (usually non-volatile memory like Flash)
+instead of going through the block device layer. This saves some memory
+since no intermediate buffering is necessary to hold the data before
+decompressing.
+
+And when data blocks are kept uncompressed and properly aligned, they will
+automatically be mapped directly into user space whenever possible providing
+eXecute-In-Place (XIP) from ROM of read-only segments. Data segments mapped
+read-write (hence they have to be copied to RAM) may still be compressed in
+the cramfs image in the same file along with non compressed read-only
+segments. Both MMU and no-MMU systems are supported. This is particularly
+handy for tiny embedded systems with very tight memory constraints.
+
+The location of the cramfs image in memory is system dependent. You must
+know the proper physical address where the cramfs image is located and
+configure an MTD device for it. Also, that MTD device must be supported
+by a map driver that implements the "point" method. Examples of such
+MTD drivers are cfi_cmdset_0001 (Intel/Sharp CFI flash) or physmap
+(Flash device in physical memory map). MTD partitions based on such devices
+are fine too. Then that device should be specified with the "mtd:" prefix
+as the mount device argument. For example, to mount the MTD device named
+"fs_partition" on the /mnt directory::
+
+    $ mount -t cramfs mtd:fs_partition /mnt
+
+To boot a kernel with this as the root filesystem, it suffices to specify
+something like "root=mtd:fs_partition" on the kernel command line.
+
+
+Tools
+-----
+
+A version of mkcramfs that can take advantage of the latest capabilities
+described above can be found here:
+
+https://github.com/npitre/cramfs-tools
+
+
+For /usr/share/magic
+--------------------
+
+=====	=======================	=======================
+0	ulelong	0x28cd3d45	Linux cramfs offset 0
+>4	ulelong	x		size %d
+>8	ulelong	x		flags 0x%x
+>12	ulelong	x		future 0x%x
+>16	string	>\0		signature "%.16s"
+>32	ulelong	x		fsid.crc 0x%x
+>36	ulelong	x		fsid.edition %d
+>40	ulelong	x		fsid.blocks %d
+>44	ulelong	x		fsid.files %d
+>48	string	>\0		name "%.16s"
+512	ulelong	0x28cd3d45	Linux cramfs offset 512
+>516	ulelong	x		size %d
+>520	ulelong	x		flags 0x%x
+>524	ulelong	x		future 0x%x
+>528	string	>\0		signature "%.16s"
+>544	ulelong	x		fsid.crc 0x%x
+>548	ulelong	x		fsid.edition %d
+>552	ulelong	x		fsid.blocks %d
+>556	ulelong	x		fsid.files %d
+>560	string	>\0		name "%.16s"
+=====	=======================	=======================
+
+
+Hacker Notes
+------------
+
+See fs/cramfs/README for filesystem layout and implementation notes.
diff --git a/Documentation/filesystems/cramfs.txt b/Documentation/filesystems/cramfs.txt
deleted file mode 100644
index 8e19a53d648b..000000000000
--- a/Documentation/filesystems/cramfs.txt
+++ /dev/null
@@ -1,118 +0,0 @@
-
-	Cramfs - cram a filesystem onto a small ROM
-
-cramfs is designed to be simple and small, and to compress things well. 
-
-It uses the zlib routines to compress a file one page at a time, and
-allows random page access.  The meta-data is not compressed, but is
-expressed in a very terse representation to make it use much less
-diskspace than traditional filesystems. 
-
-You can't write to a cramfs filesystem (making it compressible and
-compact also makes it _very_ hard to update on-the-fly), so you have to
-create the disk image with the "mkcramfs" utility.
-
-
-Usage Notes
------------
-
-File sizes are limited to less than 16MB.
-
-Maximum filesystem size is a little over 256MB.  (The last file on the
-filesystem is allowed to extend past 256MB.)
-
-Only the low 8 bits of gid are stored.  The current version of
-mkcramfs simply truncates to 8 bits, which is a potential security
-issue.
-
-Hard links are supported, but hard linked files
-will still have a link count of 1 in the cramfs image.
-
-Cramfs directories have no `.' or `..' entries.  Directories (like
-every other file on cramfs) always have a link count of 1.  (There's
-no need to use -noleaf in `find', btw.)
-
-No timestamps are stored in a cramfs, so these default to the epoch
-(1970 GMT).  Recently-accessed files may have updated timestamps, but
-the update lasts only as long as the inode is cached in memory, after
-which the timestamp reverts to 1970, i.e. moves backwards in time.
-
-Currently, cramfs must be written and read with architectures of the
-same endianness, and can be read only by kernels with PAGE_SIZE
-== 4096.  At least the latter of these is a bug, but it hasn't been
-decided what the best fix is.  For the moment if you have larger pages
-you can just change the #define in mkcramfs.c, so long as you don't
-mind the filesystem becoming unreadable to future kernels.
-
-
-Memory Mapped cramfs image
---------------------------
-
-The CRAMFS_MTD Kconfig option adds support for loading data directly from
-a physical linear memory range (usually non volatile memory like Flash)
-instead of going through the block device layer. This saves some memory
-since no intermediate buffering is necessary to hold the data before
-decompressing.
-
-And when data blocks are kept uncompressed and properly aligned, they will
-automatically be mapped directly into user space whenever possible providing
-eXecute-In-Place (XIP) from ROM of read-only segments. Data segments mapped
-read-write (hence they have to be copied to RAM) may still be compressed in
-the cramfs image in the same file along with non compressed read-only
-segments. Both MMU and no-MMU systems are supported. This is particularly
-handy for tiny embedded systems with very tight memory constraints.
-
-The location of the cramfs image in memory is system dependent. You must
-know the proper physical address where the cramfs image is located and
-configure an MTD device for it. Also, that MTD device must be supported
-by a map driver that implements the "point" method. Examples of such
-MTD drivers are cfi_cmdset_0001 (Intel/Sharp CFI flash) or physmap
-(Flash device in physical memory map). MTD partitions based on such devices
-are fine too. Then that device should be specified with the "mtd:" prefix
-as the mount device argument. For example, to mount the MTD device named
-"fs_partition" on the /mnt directory:
-
-$ mount -t cramfs mtd:fs_partition /mnt
-
-To boot a kernel with this as root filesystem, suffice to specify
-something like "root=mtd:fs_partition" on the kernel command line.
-
-
-Tools
------
-
-A version of mkcramfs that can take advantage of the latest capabilities
-described above can be found here:
-
-https://github.com/npitre/cramfs-tools
-
-
-For /usr/share/magic
---------------------
-
-0	ulelong	0x28cd3d45	Linux cramfs offset 0
->4	ulelong	x		size %d
->8	ulelong	x		flags 0x%x
->12	ulelong	x		future 0x%x
->16	string	>\0		signature "%.16s"
->32	ulelong	x		fsid.crc 0x%x
->36	ulelong	x		fsid.edition %d
->40	ulelong	x		fsid.blocks %d
->44	ulelong	x		fsid.files %d
->48	string	>\0		name "%.16s"
-512	ulelong	0x28cd3d45	Linux cramfs offset 512
->516	ulelong	x		size %d
->520	ulelong	x		flags 0x%x
->524	ulelong	x		future 0x%x
->528	string	>\0		signature "%.16s"
->544	ulelong	x		fsid.crc 0x%x
->548	ulelong	x		fsid.edition %d
->552	ulelong	x		fsid.blocks %d
->556	ulelong	x		fsid.files %d
->560	string	>\0		name "%.16s"
-
-
-Hacker Notes
-------------
-
-See fs/cramfs/README for filesystem layout and implementation notes.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index ddd8f7b2bb25..8fe848ea04af 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -56,6 +56,7 @@ Documentation for filesystem implementations.
    bfs
    btrfs
    ceph
+   cramfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 57443789849cd79e66488301a01f01c6340942ce Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:57 +0100
Subject: docs: filesystems: convert debugfs.txt to ReST

- Add a SPDX header;
- Use copyright symbol;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Use footnote markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/42db8f9db17a5d8b619130815ae63d1615951d50.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/debugfs.rst | 247 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/debugfs.txt | 241 ---------------------------------
 Documentation/filesystems/index.rst   |   1 +
 3 files changed, 248 insertions(+), 241 deletions(-)
 create mode 100644 Documentation/filesystems/debugfs.rst
 delete mode 100644 Documentation/filesystems/debugfs.txt

diff --git a/Documentation/filesystems/debugfs.rst b/Documentation/filesystems/debugfs.rst
new file mode 100644
index 000000000000..c89d2d335dfb
--- /dev/null
+++ b/Documentation/filesystems/debugfs.rst
@@ -0,0 +1,247 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=======
+DebugFS
+=======
+
+Copyright |copy| 2009 Jonathan Corbet <corbet@lwn.net>
+
+Debugfs exists as a simple way for kernel developers to make information
+available to user space.  Unlike /proc, which is only meant for information
+about a process, or sysfs, which has strict one-value-per-file rules,
+debugfs has no rules at all.  Developers can put any information they want
+there.  The debugfs filesystem is also intended to not serve as a stable
+ABI to user space; in theory, there are no stability constraints placed on
+files exported there.  The real world is not always so simple, though [1]_;
+even debugfs interfaces are best designed with the idea that they will need
+to be maintained forever.
+
+Debugfs is typically mounted with a command like::
+
+    mount -t debugfs none /sys/kernel/debug
+
+(Or an equivalent /etc/fstab line).
+The debugfs root directory is accessible only to the root user by
+default. To change access to the tree the "uid", "gid" and "mode" mount
+options can be used.
+
+Note that the debugfs API is exported GPL-only to modules.
+
+Code using debugfs should include <linux/debugfs.h>.  Then, the first order
+of business will be to create at least one directory to hold a set of
+debugfs files::
+
+    struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
+
+This call, if successful, will make a directory called name underneath the
+indicated parent directory.  If parent is NULL, the directory will be
+created in the debugfs root.  On success, the return value is a struct
+dentry pointer which can be used to create files in the directory (and to
+clean it up at the end).  An ERR_PTR(-ERROR) return value indicates that
+something went wrong.  If ERR_PTR(-ENODEV) is returned, that is an
+indication that the kernel has been built without debugfs support and none
+of the functions described below will work.
+
+The most general way to create a file within a debugfs directory is with::
+
+    struct dentry *debugfs_create_file(const char *name, umode_t mode,
+				       struct dentry *parent, void *data,
+				       const struct file_operations *fops);
+
+Here, name is the name of the file to create, mode describes the access
+permissions the file should have, parent indicates the directory which
+should hold the file, data will be stored in the i_private field of the
+resulting inode structure, and fops is a set of file operations which
+implement the file's behavior.  At a minimum, the read() and/or write()
+operations should be provided; others can be included as needed.  Again,
+the return value will be a dentry pointer to the created file,
+ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
+missing.
+
+To create a file with an initial size, the following function can be used
+instead::
+
+    struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
+				struct dentry *parent, void *data,
+				const struct file_operations *fops,
+				loff_t file_size);
+
+file_size is the initial file size. The other parameters are the same
+as for debugfs_create_file().
+
+In a number of cases, the creation of a set of file operations is not
+actually necessary; the debugfs code provides a number of helper functions
+for simple situations.  Files containing a single integer value can be
+created with any of::
+
+    void debugfs_create_u8(const char *name, umode_t mode,
+			   struct dentry *parent, u8 *value);
+    void debugfs_create_u16(const char *name, umode_t mode,
+			    struct dentry *parent, u16 *value);
+    struct dentry *debugfs_create_u32(const char *name, umode_t mode,
+				      struct dentry *parent, u32 *value);
+    void debugfs_create_u64(const char *name, umode_t mode,
+			    struct dentry *parent, u64 *value);
+
+These files support both reading and writing the given value; if a specific
+file should not be written to, simply set the mode bits accordingly.  The
+values in these files are in decimal; if hexadecimal is more appropriate,
+the following functions can be used instead::
+
+    void debugfs_create_x8(const char *name, umode_t mode,
+			   struct dentry *parent, u8 *value);
+    void debugfs_create_x16(const char *name, umode_t mode,
+			    struct dentry *parent, u16 *value);
+    void debugfs_create_x32(const char *name, umode_t mode,
+			    struct dentry *parent, u32 *value);
+    void debugfs_create_x64(const char *name, umode_t mode,
+			    struct dentry *parent, u64 *value);
+
+These functions are useful as long as the developer knows the size of the
+value to be exported.  Some types can have different widths on different
+architectures, though, complicating the situation somewhat.  There are
+functions meant to help out in such special cases::
+
+    void debugfs_create_size_t(const char *name, umode_t mode,
+			       struct dentry *parent, size_t *value);
+
+As might be expected, this function will create a debugfs file to represent
+a variable of type size_t.
+
+Similarly, there are helpers for variables of type unsigned long, in decimal
+and hexadecimal::
+
+    struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
+					struct dentry *parent,
+					unsigned long *value);
+    void debugfs_create_xul(const char *name, umode_t mode,
+			    struct dentry *parent, unsigned long *value);
+
+Boolean values can be placed in debugfs with::
+
+    struct dentry *debugfs_create_bool(const char *name, umode_t mode,
+				       struct dentry *parent, bool *value);
+
+A read on the resulting file will yield either Y (for non-zero values) or
+N, followed by a newline.  If written to, it will accept either upper- or
+lower-case values, or 1 or 0.  Any other input will be silently ignored.
+
+Also, atomic_t values can be placed in debugfs with::
+
+    void debugfs_create_atomic_t(const char *name, umode_t mode,
+				 struct dentry *parent, atomic_t *value);
+
+A read of this file will get atomic_t values, and a write of this file
+will set atomic_t values.
+
+Another option is exporting a block of arbitrary binary data, with
+this structure and function::
+
+    struct debugfs_blob_wrapper {
+	void *data;
+	unsigned long size;
+    };
+
+    struct dentry *debugfs_create_blob(const char *name, umode_t mode,
+				       struct dentry *parent,
+				       struct debugfs_blob_wrapper *blob);
+
+A read of this file will return the data pointed to by the
+debugfs_blob_wrapper structure.  Some drivers use "blobs" as a simple way
+to return several lines of (static) formatted text output.  This function
+can be used to export binary information, but there does not appear to be
+any code which does so in the mainline.  Note that all files created with
+debugfs_create_blob() are read-only.
+
+If you want to dump a block of registers (something that happens quite
+often during development, even if little such code reaches mainline),
+debugfs offers two functions: one to make a registers-only file, and
+another to insert a register block in the middle of another sequential
+file::
+
+    struct debugfs_reg32 {
+	char *name;
+	unsigned long offset;
+    };
+
+    struct debugfs_regset32 {
+	struct debugfs_reg32 *regs;
+	int nregs;
+	void __iomem *base;
+    };
+
+    struct dentry *debugfs_create_regset32(const char *name, umode_t mode,
+				     struct dentry *parent,
+				     struct debugfs_regset32 *regset);
+
+    void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
+			 int nregs, void __iomem *base, char *prefix);
+
+The "base" argument may be 0, but you may want to build the reg32 array
+using __stringify, and a number of register names (macros) are actually
+byte offsets over a base for the register block.
+
+If you want to dump a u32 array in debugfs, you can create a file with::
+
+    void debugfs_create_u32_array(const char *name, umode_t mode,
+			struct dentry *parent,
+			u32 *array, u32 elements);
+
+The "array" argument provides data, and the "elements" argument is
+the number of elements in the array. Note: once the array is created, its
+size cannot be changed.
+
+There is a helper function to create a device-related seq_file::
+
+   struct dentry *debugfs_create_devm_seqfile(struct device *dev,
+				const char *name,
+				struct dentry *parent,
+				int (*read_fn)(struct seq_file *s,
+					void *data));
+
+The "dev" argument is the device related to this debugfs file, and
+"read_fn" is a pointer to the function that will be called to print the
+seq_file content.
+
+There are a couple of other directory-oriented helper functions::
+
+    struct dentry *debugfs_rename(struct dentry *old_dir,
+    				  struct dentry *old_dentry,
+		                  struct dentry *new_dir,
+				  const char *new_name);
+
+    struct dentry *debugfs_create_symlink(const char *name,
+                                          struct dentry *parent,
+				      	  const char *target);
+
+A call to debugfs_rename() will give a new name to an existing debugfs
+file, possibly in a different directory.  The new_name must not exist prior
+to the call; the return value is old_dentry with updated information.
+Symbolic links can be created with debugfs_create_symlink().
+
+There is one important thing that all debugfs users must take into account:
+there is no automatic cleanup of any directories created in debugfs.  If a
+module is unloaded without explicitly removing debugfs entries, the result
+will be a lot of stale pointers and no end of highly antisocial behavior.
+So all debugfs users - at least those which can be built as modules - must
+be prepared to remove all files and directories they create there.  A file
+can be removed with::
+
+    void debugfs_remove(struct dentry *dentry);
+
+The dentry value can be NULL or an error value, in which case nothing will
+be removed.
+
+Once upon a time, debugfs users were required to remember the dentry
+pointer for every debugfs file they created so that all files could be
+cleaned up.  We live in more civilized times now, though, and debugfs users
+can call::
+
+    void debugfs_remove_recursive(struct dentry *dentry);
+
+If this function is passed a pointer for the dentry corresponding to the
+top-level directory, the entire hierarchy below that directory will be
+removed.
+
+.. [1] http://lwn.net/Articles/309298/
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
deleted file mode 100644
index dc497b96fa4f..000000000000
--- a/Documentation/filesystems/debugfs.txt
+++ /dev/null
@@ -1,241 +0,0 @@
-Copyright 2009 Jonathan Corbet <corbet@lwn.net>
-
-Debugfs exists as a simple way for kernel developers to make information
-available to user space.  Unlike /proc, which is only meant for information
-about a process, or sysfs, which has strict one-value-per-file rules,
-debugfs has no rules at all.  Developers can put any information they want
-there.  The debugfs filesystem is also intended to not serve as a stable
-ABI to user space; in theory, there are no stability constraints placed on
-files exported there.  The real world is not always so simple, though [1];
-even debugfs interfaces are best designed with the idea that they will need
-to be maintained forever.
-
-Debugfs is typically mounted with a command like:
-
-    mount -t debugfs none /sys/kernel/debug
-
-(Or an equivalent /etc/fstab line).
-The debugfs root directory is accessible only to the root user by
-default. To change access to the tree the "uid", "gid" and "mode" mount
-options can be used.
-
-Note that the debugfs API is exported GPL-only to modules.
-
-Code using debugfs should include <linux/debugfs.h>.  Then, the first order
-of business will be to create at least one directory to hold a set of
-debugfs files:
-
-    struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
-
-This call, if successful, will make a directory called name underneath the
-indicated parent directory.  If parent is NULL, the directory will be
-created in the debugfs root.  On success, the return value is a struct
-dentry pointer which can be used to create files in the directory (and to
-clean it up at the end).  An ERR_PTR(-ERROR) return value indicates that
-something went wrong.  If ERR_PTR(-ENODEV) is returned, that is an
-indication that the kernel has been built without debugfs support and none
-of the functions described below will work.
-
-The most general way to create a file within a debugfs directory is with:
-
-    struct dentry *debugfs_create_file(const char *name, umode_t mode,
-				       struct dentry *parent, void *data,
-				       const struct file_operations *fops);
-
-Here, name is the name of the file to create, mode describes the access
-permissions the file should have, parent indicates the directory which
-should hold the file, data will be stored in the i_private field of the
-resulting inode structure, and fops is a set of file operations which
-implement the file's behavior.  At a minimum, the read() and/or write()
-operations should be provided; others can be included as needed.  Again,
-the return value will be a dentry pointer to the created file,
-ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
-missing.
-
-Create a file with an initial size, the following function can be used
-instead:
-
-    struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
-				struct dentry *parent, void *data,
-				const struct file_operations *fops,
-				loff_t file_size);
-
-file_size is the initial file size. The other parameters are the same
-as the function debugfs_create_file.
-
-In a number of cases, the creation of a set of file operations is not
-actually necessary; the debugfs code provides a number of helper functions
-for simple situations.  Files containing a single integer value can be
-created with any of:
-
-    void debugfs_create_u8(const char *name, umode_t mode,
-			   struct dentry *parent, u8 *value);
-    void debugfs_create_u16(const char *name, umode_t mode,
-			    struct dentry *parent, u16 *value);
-    struct dentry *debugfs_create_u32(const char *name, umode_t mode,
-				      struct dentry *parent, u32 *value);
-    void debugfs_create_u64(const char *name, umode_t mode,
-			    struct dentry *parent, u64 *value);
-
-These files support both reading and writing the given value; if a specific
-file should not be written to, simply set the mode bits accordingly.  The
-values in these files are in decimal; if hexadecimal is more appropriate,
-the following functions can be used instead:
-
-    void debugfs_create_x8(const char *name, umode_t mode,
-			   struct dentry *parent, u8 *value);
-    void debugfs_create_x16(const char *name, umode_t mode,
-			    struct dentry *parent, u16 *value);
-    void debugfs_create_x32(const char *name, umode_t mode,
-			    struct dentry *parent, u32 *value);
-    void debugfs_create_x64(const char *name, umode_t mode,
-			    struct dentry *parent, u64 *value);
-
-These functions are useful as long as the developer knows the size of the
-value to be exported.  Some types can have different widths on different
-architectures, though, complicating the situation somewhat.  There are
-functions meant to help out in such special cases:
-
-    void debugfs_create_size_t(const char *name, umode_t mode,
-			       struct dentry *parent, size_t *value);
-
-As might be expected, this function will create a debugfs file to represent
-a variable of type size_t.
-
-Similarly, there are helpers for variables of type unsigned long, in decimal
-and hexadecimal:
-
-    struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
-					struct dentry *parent,
-					unsigned long *value);
-    void debugfs_create_xul(const char *name, umode_t mode,
-			    struct dentry *parent, unsigned long *value);
-
-Boolean values can be placed in debugfs with:
-
-    struct dentry *debugfs_create_bool(const char *name, umode_t mode,
-				       struct dentry *parent, bool *value);
-
-A read on the resulting file will yield either Y (for non-zero values) or
-N, followed by a newline.  If written to, it will accept either upper- or
-lower-case values, or 1 or 0.  Any other input will be silently ignored.
-
-Also, atomic_t values can be placed in debugfs with:
-
-    void debugfs_create_atomic_t(const char *name, umode_t mode,
-				 struct dentry *parent, atomic_t *value)
-
-A read of this file will get atomic_t values, and a write of this file
-will set atomic_t values.
-
-Another option is exporting a block of arbitrary binary data, with
-this structure and function:
-
-    struct debugfs_blob_wrapper {
-	void *data;
-	unsigned long size;
-    };
-
-    struct dentry *debugfs_create_blob(const char *name, umode_t mode,
-				       struct dentry *parent,
-				       struct debugfs_blob_wrapper *blob);
-
-A read of this file will return the data pointed to by the
-debugfs_blob_wrapper structure.  Some drivers use "blobs" as a simple way
-to return several lines of (static) formatted text output.  This function
-can be used to export binary information, but there does not appear to be
-any code which does so in the mainline.  Note that all files created with
-debugfs_create_blob() are read-only.
-
-If you want to dump a block of registers (something that happens quite
-often during development, even if little such code reaches mainline.
-Debugfs offers two functions: one to make a registers-only file, and
-another to insert a register block in the middle of another sequential
-file.
-
-    struct debugfs_reg32 {
-	char *name;
-	unsigned long offset;
-    };
-
-    struct debugfs_regset32 {
-	struct debugfs_reg32 *regs;
-	int nregs;
-	void __iomem *base;
-    };
-
-    struct dentry *debugfs_create_regset32(const char *name, umode_t mode,
-				     struct dentry *parent,
-				     struct debugfs_regset32 *regset);
-
-    void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
-			 int nregs, void __iomem *base, char *prefix);
-
-The "base" argument may be 0, but you may want to build the reg32 array
-using __stringify, and a number of register names (macros) are actually
-byte offsets over a base for the register block.
-
-If you want to dump an u32 array in debugfs, you can create file with:
-
-    void debugfs_create_u32_array(const char *name, umode_t mode,
-			struct dentry *parent,
-			u32 *array, u32 elements);
-
-The "array" argument provides data, and the "elements" argument is
-the number of elements in the array. Note: Once array is created its
-size can not be changed.
-
-There is a helper function to create device related seq_file:
-
-   struct dentry *debugfs_create_devm_seqfile(struct device *dev,
-				const char *name,
-				struct dentry *parent,
-				int (*read_fn)(struct seq_file *s,
-					void *data));
-
-The "dev" argument is the device related to this debugfs file, and
-the "read_fn" is a function pointer which to be called to print the
-seq_file content.
-
-There are a couple of other directory-oriented helper functions:
-
-    struct dentry *debugfs_rename(struct dentry *old_dir, 
-    				  struct dentry *old_dentry,
-		                  struct dentry *new_dir, 
-				  const char *new_name);
-
-    struct dentry *debugfs_create_symlink(const char *name, 
-                                          struct dentry *parent,
-				      	  const char *target);
-
-A call to debugfs_rename() will give a new name to an existing debugfs
-file, possibly in a different directory.  The new_name must not exist prior
-to the call; the return value is old_dentry with updated information.
-Symbolic links can be created with debugfs_create_symlink().
-
-There is one important thing that all debugfs users must take into account:
-there is no automatic cleanup of any directories created in debugfs.  If a
-module is unloaded without explicitly removing debugfs entries, the result
-will be a lot of stale pointers and no end of highly antisocial behavior.
-So all debugfs users - at least those which can be built as modules - must
-be prepared to remove all files and directories they create there.  A file
-can be removed with:
-
-    void debugfs_remove(struct dentry *dentry);
-
-The dentry value can be NULL or an error value, in which case nothing will
-be removed.
-
-Once upon a time, debugfs users were required to remember the dentry
-pointer for every debugfs file they created so that all files could be
-cleaned up.  We live in more civilized times now, though, and debugfs users
-can call:
-
-    void debugfs_remove_recursive(struct dentry *dentry);
-
-If this function is passed a pointer for the dentry corresponding to the
-top-level directory, the entire hierarchy below that directory will be
-removed.
-
-Notes:
-	[1] http://lwn.net/Articles/309298/
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 8fe848ea04af..ab3b656bbe60 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -57,6 +57,7 @@ Documentation for filesystem implementations.
    btrfs
    ceph
    cramfs
+   debugfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 14a19fa5cf759ea18bc7d692cd8fe326af3c4d0a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:58 +0100
Subject: docs: filesystems: convert dlmfs.txt to ReST

- Add a SPDX header;
- Use copyright symbol;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/efc9e59925723e17d1a4741b11049616c221463e.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/dlmfs.rst | 140 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/dlmfs.txt | 130 ---------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 141 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/filesystems/dlmfs.rst
 delete mode 100644 Documentation/filesystems/dlmfs.txt

diff --git a/Documentation/filesystems/dlmfs.rst b/Documentation/filesystems/dlmfs.rst
new file mode 100644
index 000000000000..68daaa7facf9
--- /dev/null
+++ b/Documentation/filesystems/dlmfs.rst
@@ -0,0 +1,140 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=====
+DLMFS
+=====
+
+A minimal DLM userspace interface implemented via a virtual file
+system.
+
+dlmfs is built with OCFS2 as it requires most of its infrastructure.
+
+:Project web page:    http://ocfs2.wiki.kernel.org
+:Tools web page:      https://github.com/markfasheh/ocfs2-tools
+:OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+Credits
+=======
+
+Some code taken from ramfs which is Copyright |copy| 2000 Linus Torvalds
+and Transmeta Corp.
+
+Mark Fasheh <mark.fasheh@oracle.com>
+
+Caveats
+=======
+- Right now it only works with the OCFS2 DLM, though support for other
+  DLM implementations should not be a major issue.
+
+Mount options
+=============
+None
+
+Usage
+=====
+
+If you're just interested in OCFS2, then please see ocfs2.txt. The
+rest of this document will be geared towards those who want to use
+dlmfs for easy-to-set-up and easy-to-use clustered locking in
+userspace.
+
+Setup
+=====
+
+dlmfs requires that the OCFS2 cluster infrastructure be in
+place. Please download ocfs2-tools from the above url and configure a
+cluster.
+
+You'll want to start heartbeating on a volume which all the nodes in
+your lockspace can access. The easiest way to do this is via
+ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
+that an OCFS2 file system be in place so that it can automatically
+find its heartbeat area, though it will eventually support heartbeat
+against raw disks.
+
+Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
+with ocfs2-tools.
+
+Once you're heartbeating, DLM lock 'domains' can be easily created /
+destroyed and locks within them accessed.
+
+Locking
+=======
+
+Users may access dlmfs via standard file system calls, or they can use
+'libo2dlm' (distributed with ocfs2-tools) which abstracts the file
+system calls and presents a more traditional locking api.
+
+dlmfs handles lock caching automatically for the user, so a lock
+request for an already acquired lock will not generate another DLM
+call. Userspace programs are assumed to handle their own local
+locking.
+
+Two levels of locks are supported - Shared Read, and Exclusive.
+Also supported is a Trylock operation.
+
+For information on the libo2dlm interface, please see o2dlm.h,
+distributed with ocfs2-tools.
+
+Lock value blocks can be read and written to a resource via read(2)
+and write(2) against the fd obtained via your open(2) call. The
+maximum currently supported LVB length is 64 bytes (though that is an
+OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share
+small amounts of data amongst their nodes.
+
+mkdir(2) signals dlmfs to join a domain (which will have the same name
+as the resulting directory).
+
+rmdir(2) signals dlmfs to leave the domain.
+
+Locks for a given domain are represented by regular inodes inside the
+domain directory.  Locking against them is done via the open(2) system
+call.
+
+The open(2) call will not return until your lock has been granted or
+an error has occurred, unless it has been instructed to do a trylock
+operation. If the lock succeeds, you'll get an fd.
+
+Use open(2) with O_CREAT to ensure the resource inode is created; dlmfs
+does not automatically create inodes for existing lock resources.
+
+============  ===========================
+Open Flag     Lock Request Type
+============  ===========================
+O_RDONLY      Shared Read
+O_RDWR        Exclusive
+============  ===========================
+
+
+============  ===========================
+Open Flag     Resulting Locking Behavior
+============  ===========================
+O_NONBLOCK    Trylock operation
+============  ===========================
+
+You must provide exactly one of O_RDONLY or O_RDWR.
+
+If O_NONBLOCK is also provided and the trylock operation was valid but
+could not lock the resource then open(2) will fail with ETXTBSY.
+
+close(2) drops the lock associated with your fd.
+
+Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is
+supported locally as well. This means you can use them to restrict
+access to the resources via dlmfs on your local node only.
+
+The resource LVB may be read from the fd in either Shared Read or
+Exclusive modes via the read(2) system call. It can be written via
+write(2) only when open in Exclusive mode.
+
+Once written, an LVB will be visible to other nodes that obtain Shared
+Read or higher level locks on the resource.
+
+See Also
+========
+http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
+
+For more information on the VMS distributed locking API.
diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.txt
deleted file mode 100644
index fcf4d509d118..000000000000
--- a/Documentation/filesystems/dlmfs.txt
+++ /dev/null
@@ -1,130 +0,0 @@
-dlmfs
-==================
-A minimal DLM userspace interface implemented via a virtual file
-system.
-
-dlmfs is built with OCFS2 as it requires most of its infrastructure.
-
-Project web page:    http://ocfs2.wiki.kernel.org
-Tools web page:      https://github.com/markfasheh/ocfs2-tools
-OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
-
-All code copyright 2005 Oracle except when otherwise noted.
-
-CREDITS
-=======
-
-Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
-and Transmeta Corp.
-
-Mark Fasheh <mark.fasheh@oracle.com>
-
-Caveats
-=======
-- Right now it only works with the OCFS2 DLM, though support for other
-  DLM implementations should not be a major issue.
-
-Mount options
-=============
-None
-
-Usage
-=====
-
-If you're just interested in OCFS2, then please see ocfs2.txt. The
-rest of this document will be geared towards those who want to use
-dlmfs for easy to setup and easy to use clustered locking in
-userspace.
-
-Setup
-=====
-
-dlmfs requires that the OCFS2 cluster infrastructure be in
-place. Please download ocfs2-tools from the above url and configure a
-cluster.
-
-You'll want to start heartbeating on a volume which all the nodes in
-your lockspace can access. The easiest way to do this is via
-ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
-that an OCFS2 file system be in place so that it can automatically
-find its heartbeat area, though it will eventually support heartbeat
-against raw disks.
-
-Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
-with ocfs2-tools.
-
-Once you're heartbeating, DLM lock 'domains' can be easily created /
-destroyed and locks within them accessed.
-
-Locking
-=======
-
-Users may access dlmfs via standard file system calls, or they can use
-'libo2dlm' (distributed with ocfs2-tools) which abstracts the file
-system calls and presents a more traditional locking api.
-
-dlmfs handles lock caching automatically for the user, so a lock
-request for an already acquired lock will not generate another DLM
-call. Userspace programs are assumed to handle their own local
-locking.
-
-Two levels of locks are supported - Shared Read, and Exclusive.
-Also supported is a Trylock operation.
-
-For information on the libo2dlm interface, please see o2dlm.h,
-distributed with ocfs2-tools.
-
-Lock value blocks can be read and written to a resource via read(2)
-and write(2) against the fd obtained via your open(2) call. The
-maximum currently supported LVB length is 64 bytes (though that is an
-OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share
-small amounts of data amongst their nodes.
-
-mkdir(2) signals dlmfs to join a domain (which will have the same name
-as the resulting directory)
-
-rmdir(2) signals dlmfs to leave the domain
-
-Locks for a given domain are represented by regular inodes inside the
-domain directory.  Locking against them is done via the open(2) system
-call.
-
-The open(2) call will not return until your lock has been granted or
-an error has occurred, unless it has been instructed to do a trylock
-operation. If the lock succeeds, you'll get an fd.
-
-open(2) with O_CREAT to ensure the resource inode is created - dlmfs does
-not automatically create inodes for existing lock resources.
-
-Open Flag     Lock Request Type
----------     -----------------
-O_RDONLY      Shared Read
-O_RDWR        Exclusive
-
-Open Flag     Resulting Locking Behavior
----------     --------------------------
-O_NONBLOCK    Trylock operation
-
-You must provide exactly one of O_RDONLY or O_RDWR.
-
-If O_NONBLOCK is also provided and the trylock operation was valid but
-could not lock the resource then open(2) will return ETXTBUSY.
-
-close(2) drops the lock associated with your fd.
-
-Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is
-supported locally as well. This means you can use them to restrict
-access to the resources via dlmfs on your local node only.
-
-The resource LVB may be read from the fd in either Shared Read or
-Exclusive modes via the read(2) system call. It can be written via
-write(2) only when open in Exclusive mode.
-
-Once written, an LVB will be visible to other nodes who obtain Read
-Only or higher level locks on the resource.
-
-See Also
-========
-http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
-
-For more information on the VMS distributed locking API.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index ab3b656bbe60..c6885c7ef781 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -58,6 +58,7 @@ Documentation for filesystem implementations.
    ceph
    cramfs
    debugfs
+   dlmfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From b02a17cb8ae23479c9bf306e96d2dd71422de63f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:59 +0100
Subject: docs: filesystems: convert ecryptfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- use :field: markup;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Tyler Hicks <code@tyhicks.com>
Link: https://lore.kernel.org/r/6e13841ebd00c8d988027115c75c58821bb41a0c.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ecryptfs.rst | 87 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ecryptfs.txt | 77 ------------------------------
 Documentation/filesystems/index.rst    |  1 +
 3 files changed, 88 insertions(+), 77 deletions(-)
 create mode 100644 Documentation/filesystems/ecryptfs.rst
 delete mode 100644 Documentation/filesystems/ecryptfs.txt

diff --git a/Documentation/filesystems/ecryptfs.rst b/Documentation/filesystems/ecryptfs.rst
new file mode 100644
index 000000000000..7236172300ef
--- /dev/null
+++ b/Documentation/filesystems/ecryptfs.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
+eCryptfs: A stacked cryptographic filesystem for Linux
+======================================================
+
+eCryptfs is free software. Please see the file COPYING for details.
+For documentation, please see the files in the doc/ subdirectory.  For
+building and installation instructions please see the INSTALL file.
+
+:Maintainer: Phillip Hellewell
+:Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
+:Developers: Michael C. Thompson
+             Kent Yoder
+:Web Site: http://ecryptfs.sf.net
+
+This software is currently undergoing development. Make sure to
+maintain a backup copy of any data you write into eCryptfs.
+
+eCryptfs requires the userspace tools downloadable from the
+SourceForge site:
+
+http://sourceforge.net/projects/ecryptfs/
+
+Userspace requirements include:
+
+- David Howells' userspace keyring headers and libraries (version
+  1.0 or higher), obtainable from
+  http://people.redhat.com/~dhowells/keyutils/
+- Libgcrypt
+
+
+Notes
+=====
+
+In the beta/experimental releases of eCryptfs, when you upgrade
+eCryptfs, you should copy the files to an unencrypted location and
+then copy the files back into the new eCryptfs mount to migrate the
+files.
+
+
+Mount-wide Passphrase
+=====================
+
+Create a new directory into which eCryptfs will write its encrypted
+files (e.g., /root/crypt).  Then, create the mount point directory
+(e.g., /mnt/crypt).  Now it's time to mount eCryptfs::
+
+    mount -t ecryptfs /root/crypt /mnt/crypt
+
+You should be prompted for a passphrase and a salt (the salt may be
+blank).
+
+Try writing a new file::
+
+    echo "Hello, World" > /mnt/crypt/hello.txt
+
+The operation will complete.  Notice that there is a new file in
+/root/crypt that is at least 12288 bytes in size (depending on your
+host page size).  This is the encrypted underlying file for what you
+just wrote.  To test reading, from start to finish, you need to clear
+the user session keyring::
+
+    keyctl clear @u
+
+Then umount /mnt/crypt and mount again per the instructions given
+above.
+
+::
+
+    cat /mnt/crypt/hello.txt
+
+
+Notes
+=====
+
+eCryptfs version 0.1 should only be mounted on (1) empty directories
+or (2) directories containing files only created by eCryptfs. If you
+mount a directory that has pre-existing files not created by eCryptfs,
+then behavior is undefined. Do not run eCryptfs in higher verbosity
+levels unless you are doing so for the sole purpose of debugging or
+development, since secret values will be written out to the system log
+in that case.
+
+
+Mike Halcrow
+mhalcrow@us.ibm.com
diff --git a/Documentation/filesystems/ecryptfs.txt b/Documentation/filesystems/ecryptfs.txt
deleted file mode 100644
index 01d8a08351ac..000000000000
--- a/Documentation/filesystems/ecryptfs.txt
+++ /dev/null
@@ -1,77 +0,0 @@
-eCryptfs: A stacked cryptographic filesystem for Linux
-
-eCryptfs is free software. Please see the file COPYING for details.
-For documentation, please see the files in the doc/ subdirectory.  For
-building and installation instructions please see the INSTALL file.
-
-Maintainer: Phillip Hellewell
-Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
-Developers: Michael C. Thompson
-            Kent Yoder
-Web Site: http://ecryptfs.sf.net
-
-This software is currently undergoing development. Make sure to
-maintain a backup copy of any data you write into eCryptfs.
-
-eCryptfs requires the userspace tools downloadable from the
-SourceForge site:
-
-http://sourceforge.net/projects/ecryptfs/
-
-Userspace requirements include:
- - David Howells' userspace keyring headers and libraries (version
-   1.0 or higher), obtainable from
-   http://people.redhat.com/~dhowells/keyutils/
- - Libgcrypt
-
-
-NOTES
-
-In the beta/experimental releases of eCryptfs, when you upgrade
-eCryptfs, you should copy the files to an unencrypted location and
-then copy the files back into the new eCryptfs mount to migrate the
-files.
-
-
-MOUNT-WIDE PASSPHRASE
-
-Create a new directory into which eCryptfs will write its encrypted
-files (i.e., /root/crypt).  Then, create the mount point directory
-(i.e., /mnt/crypt).  Now it's time to mount eCryptfs:
-
-mount -t ecryptfs /root/crypt /mnt/crypt
-
-You should be prompted for a passphrase and a salt (the salt may be
-blank).
-
-Try writing a new file:
-
-echo "Hello, World" > /mnt/crypt/hello.txt
-
-The operation will complete.  Notice that there is a new file in
-/root/crypt that is at least 12288 bytes in size (depending on your
-host page size).  This is the encrypted underlying file for what you
-just wrote.  To test reading, from start to finish, you need to clear
-the user session keyring:
-
-keyctl clear @u
-
-Then umount /mnt/crypt and mount again per the instructions given
-above.
-
-cat /mnt/crypt/hello.txt
-
-
-NOTES
-
-eCryptfs version 0.1 should only be mounted on (1) empty directories
-or (2) directories containing files only created by eCryptfs. If you
-mount a directory that has pre-existing files not created by eCryptfs,
-then behavior is undefined. Do not run eCryptfs in higher verbosity
-levels unless you are doing so for the sole purpose of debugging or
-development, since secret values will be written out to the system log
-in that case.
-
-
-Mike Halcrow
-mhalcrow@us.ibm.com
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c6885c7ef781..d6d69f1c9287 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -59,6 +59,7 @@ Documentation for filesystem implementations.
    cramfs
    debugfs
    dlmfs
+   ecryptfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 06dedb45b79c6550b878244879f33b6e614126bd Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:00 +0100
Subject: docs: filesystems: convert efivarfs.txt to ReST

Trivial changes:

- Add a SPDX header;
- Adjust document title;
- Mark a literal block as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/215691d747055c4ccb038ec7d78d8d1fe87fe2c0.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/efivarfs.rst | 26 ++++++++++++++++++++++++++
 Documentation/filesystems/efivarfs.txt | 23 -----------------------
 Documentation/filesystems/index.rst    |  1 +
 3 files changed, 27 insertions(+), 23 deletions(-)
 create mode 100644 Documentation/filesystems/efivarfs.rst
 delete mode 100644 Documentation/filesystems/efivarfs.txt

diff --git a/Documentation/filesystems/efivarfs.rst b/Documentation/filesystems/efivarfs.rst
new file mode 100644
index 000000000000..90ac65683e7e
--- /dev/null
+++ b/Documentation/filesystems/efivarfs.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+efivarfs - a (U)EFI variable filesystem
+=======================================
+
+The efivarfs filesystem was created to address the shortcomings of
+using entries in sysfs to maintain EFI variables. The old sysfs EFI
+variables code only supported variables of up to 1024 bytes. This
+limitation existed in version 0.99 of the EFI specification, but was
+removed before any full releases. Since variables can now be larger
+than a single page, sysfs isn't the best interface for this.
+
+Variables can be created, deleted and modified with the efivarfs
+filesystem.
+
+efivarfs is typically mounted like this::
+
+	mount -t efivarfs none /sys/firmware/efi/efivars
+
+Due to the presence of numerous firmware bugs where removing non-standard
+UEFI variables causes the system firmware to fail to POST, efivarfs
+files that are not well-known standardized variables are created
+as immutable files.  This doesn't prevent removal - "chattr -i" will work -
+but it does prevent this kind of failure from being accomplished
+accidentally.
diff --git a/Documentation/filesystems/efivarfs.txt b/Documentation/filesystems/efivarfs.txt
deleted file mode 100644
index 686a64bba775..000000000000
--- a/Documentation/filesystems/efivarfs.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-
-efivarfs - a (U)EFI variable filesystem
-
-The efivarfs filesystem was created to address the shortcomings of
-using entries in sysfs to maintain EFI variables. The old sysfs EFI
-variables code only supported variables of up to 1024 bytes. This
-limitation existed in version 0.99 of the EFI specification, but was
-removed before any full releases. Since variables can now be larger
-than a single page, sysfs isn't the best interface for this.
-
-Variables can be created, deleted and modified with the efivarfs
-filesystem.
-
-efivarfs is typically mounted like this,
-
-	mount -t efivarfs none /sys/firmware/efi/efivars
-
-Due to the presence of numerous firmware bugs where removing non-standard
-UEFI variables causes the system firmware to fail to POST, efivarfs
-files that are not well-known standardized variables are created
-as immutable files.  This doesn't prevent removal - "chattr -i" will work -
-but it does prevent this kind of failure from being accomplished
-accidentally.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index d6d69f1c9287..4230f49d2732 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -60,6 +60,7 @@ Documentation for filesystem implementations.
    debugfs
    dlmfs
    ecryptfs
+   efivarfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From e66d8631ddb3306bd9f463324c2d9a5d9dc559f7 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:01 +0100
Subject: docs: filesystems: convert erofs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/402d1d2f7252b8a683f7a9c6867bc5428da64026.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/erofs.rst | 240 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/erofs.txt | 211 -------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 241 insertions(+), 211 deletions(-)
 create mode 100644 Documentation/filesystems/erofs.rst
 delete mode 100644 Documentation/filesystems/erofs.txt

diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
new file mode 100644
index 000000000000..bf145171c2bf
--- /dev/null
+++ b/Documentation/filesystems/erofs.rst
@@ -0,0 +1,240 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Enhanced Read-Only File System - EROFS
+======================================
+
+Overview
+========
+
+EROFS stands for Enhanced Read-Only File System. Unlike other read-only
+file systems, it is designed for flexibility and scalability while
+remaining simple and offering high performance.
+
+It is designed as a better filesystem solution for the following scenarios:
+
+ - read-only storage media;
+
+ - part of a fully trusted read-only solution, which needs to be
+   immutable and bit-for-bit identical to the official golden image
+   of a release due to security and other considerations;
+
+ - saving extra storage space with guaranteed end-to-end performance
+   by using reduced metadata and transparent file compression, especially
+   for embedded devices with limited memory (e.g. smartphones).
+
+Here are the main features of EROFS:
+
+ - Little endian on-disk design;
+
+ - Currently 4KB block size (nobh) and therefore maximum 16TB address space;
+
+ - Metadata & data could be mixed by design;
+
+ - 2 inode versions for different requirements:
+
+   =====================  ============  =====================================
+                          compact (v1)  extended (v2)
+   =====================  ============  =====================================
+   Inode metadata size    32 bytes      64 bytes
+   Max file size          4 GB          16 EB (also limited by max. vol size)
+   Max uids/gids          65536         4294967296
+   File change time       no            yes (64 + 32-bit timestamp)
+   Max hardlinks          65536         4294967296
+   Metadata reserved      4 bytes       14 bytes
+   =====================  ============  =====================================
+
+ - Support extended attributes (xattrs) as an option;
+
+ - Support xattr inline and tail-end data inline for all files;
+
+ - Support POSIX.1e ACLs by using xattrs;
+
+ - Support transparent file compression as an option:
+   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
+
+The following git tree provides the file system user-space tools under
+development (e.g. the formatting tool mkfs.erofs):
+
+- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
+
+Bug reports and patches are welcome; please send them to the following
+linux-erofs mailing list:
+
+- linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>
+
+Mount options
+=============
+
+===================    =========================================================
+(no)user_xattr         Set up Extended User Attributes. Note: xattr is enabled
+                       by default if CONFIG_EROFS_FS_XATTR is selected.
+(no)acl                Set up POSIX Access Control List. Note: acl is enabled
+                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
+cache_strategy=%s      Select a strategy for cached decompression from now on:
+
+		       ==========  =============================================
+                         disabled  In-place I/O decompression only;
+                        readahead  Cache the last incomplete compressed physical
+                                   cluster for further reading. It still does
+                                   in-place I/O decompression for the remaining
+                                   compressed physical clusters;
+                       readaround  Cache both ends of incomplete compressed
+                                   physical clusters for further reading.
+                                   It still does in-place I/O decompression
+                                   for the remaining compressed physical clusters.
+		       ==========  =============================================
+===================    =========================================================
+
+On-disk details
+===============
+
+Summary
+-------
+Unlike other read-only file systems, an EROFS volume is designed
+to be as simple as possible::
+
+                                |-> aligned with the block size
+   ____________________________________________________________
+  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
+  |_|__|_|_____|__________|_____|______|__________|_____|______|
+  0 +1K
+
+All data areas should be aligned with the block size, but metadata areas
+need not be. All metadata can now be observed in two different spaces (views):
+
+ 1. Inode metadata space
+
+    Each valid inode should be aligned with an inode slot, which is a fixed
+    value (32 bytes) and designed to be kept in line with compact inode size.
+
+    Each inode can be directly found with the following formula:
+         inode offset = meta_blkaddr * block_size + 32 * nid
+
+    ::
+
+				    |-> aligned with 8B
+					    |-> followed closely
+	+ meta_blkaddr blocks                                      |-> another slot
+	_____________________________________________________________________
+	|  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
+	|________|_______|(optional)|(optional)|__(optional)_|_____|__________
+		|-> aligned with the inode slot size
+		    .                   .
+		    .                         .
+		.                              .
+		.                                    .
+	    .                                         .
+	    .                                              .
+	.____________________________________________________|-> aligned with 4B
+	| xattr_ibody_header | shared xattrs | inline xattrs |
+	|____________________|_______________|_______________|
+	|->    12 bytes    <-|->x * 4 bytes<-|               .
+			    .                .                 .
+			.                      .                   .
+		.                           .                     .
+	    ._______________________________.______________________.
+	    | id | id | id | id |  ... | id | ent | ... | ent| ... |
+	    |____|____|____|____|______|____|_____|_____|____|_____|
+					    |-> aligned with 4B
+							|-> aligned with 4B
+
+    An inode can be 32 or 64 bytes; the two sizes are distinguished by a
+    field common to all inode versions -- i_format::
+
+        __________________               __________________
+       |     i_format     |             |     i_format     |
+       |__________________|             |__________________|
+       |        ...       |             |        ...       |
+       |                  |             |                  |
+       |__________________| 32 bytes    |                  |
+                                        |                  |
+                                        |__________________| 64 bytes
+
+    Xattrs, extents and inline data follow the corresponding inode with
+    proper alignment, and they are optional depending on the data mapping.
+    A total of 4 valid data mappings are *currently* supported:
+
+    ==  ====================================================================
+     0  flat file data without data inline (no extent);
+     1  fixed-sized output data compression (with non-compacted indexes);
+     2  flat file data with tail packing data inline (no extent);
+     3  fixed-sized output data compression (with compacted indexes, v5.3+).
+    ==  ====================================================================
+
+    The size of the optional xattrs is indicated by i_xattr_count in the inode
+    header. Large xattrs or xattrs shared by many different files can be
+    stored in shared xattrs metadata rather than inlined right after the inode.
+
+ 2. Shared xattrs metadata space
+
+    The shared xattrs space is similar to the inode space above: it starts
+    at the block indicated by xattr_blkaddr, and entries are laid out one
+    after another with proper alignment.
+
+    Each shared xattr can also be found directly with the following formula:
+         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
+
+    ::
+
+			    |-> aligned by  4 bytes
+	+ xattr_blkaddr blocks                     |-> aligned with 4 bytes
+	_________________________________________________________________________
+	|  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
+	|________|_____________|_____________|_____|______________|_______________
+
+Directories
+-----------
+All directories are now organized in a compact on-disk format. Note that
+each directory block is divided into index and name areas in order to support
+random file lookup, and all directory entries are *strictly* recorded in
+alphabetical order to support an improved prefix binary search
+algorithm (refer to the related source code for details).
+
+::
+
+		    ___________________________
+		    /                           |
+		/              ______________|________________
+		/              /              | nameoff1       | nameoffN-1
+    ____________.______________._______________v________________v__________
+    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
+    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
+	\                           ^
+	\                          |                           * could have
+	\                         |                             trailing '\0'
+	    \________________________| nameoff0
+
+				Directory block
+
+Note that apart from the offset of the first filename, nameoff0 also indicates
+the total number of directory entries in this block, so there is no need to
+introduce another on-disk field for it.
+
+Compression
+-----------
+Currently, EROFS supports 4KB fixed-sized output transparent file compression,
+as illustrated below::
+
+	    |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
+	    clusterofs                      clusterofs            clusterofs
+	    |                               |                     |   logical data
+    _________v_______________________________v_____________________v_______________
+    ... |    .        |             |        .    |             |  .          | ...
+    ____|____.________|_____________|________.____|_____________|__.__________|____
+	|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
+	    size          size          size          size          size
+	    .                             .                .                   .
+	    .                       .               .                  .
+		.                  .              .                .
+	_______._____________._____________._____________._____________________
+	    ... |             |             |             | ... physical data
+	_______|_____________|_____________|_____________|_____________________
+		|-> cluster <-|-> cluster <-|-> cluster <-|
+		    size          size          size
+
+Currently, each on-disk physical cluster can contain at most 4KB of
+(un)compressed data. For each logical cluster, there is a corresponding
+on-disk index describing its cluster type, physical cluster address, etc.
+
+See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
diff --git a/Documentation/filesystems/erofs.txt b/Documentation/filesystems/erofs.txt
deleted file mode 100644
index db6d39c3ae71..000000000000
--- a/Documentation/filesystems/erofs.txt
+++ /dev/null
@@ -1,211 +0,0 @@
-Overview
-========
-
-EROFS file-system stands for Enhanced Read-Only File System. Different
-from other read-only file systems, it aims to be designed for flexibility,
-scalability, but be kept simple and high performance.
-
-It is designed as a better filesystem solution for the following scenarios:
- - read-only storage media or
-
- - part of a fully trusted read-only solution, which means it needs to be
-   immutable and bit-for-bit identical to the official golden image for
-   their releases due to security and other considerations and
-
- - hope to save some extra storage space with guaranteed end-to-end performance
-   by using reduced metadata and transparent file compression, especially
-   for those embedded devices with limited memory (ex, smartphone);
-
-Here is the main features of EROFS:
- - Little endian on-disk design;
-
- - Currently 4KB block size (nobh) and therefore maximum 16TB address space;
-
- - Metadata & data could be mixed by design;
-
- - 2 inode versions for different requirements:
-                          compact (v1)  extended (v2)
-   Inode metadata size:   32 bytes      64 bytes
-   Max file size:         4 GB          16 EB (also limited by max. vol size)
-   Max uids/gids:         65536         4294967296
-   File change time:      no            yes (64 + 32-bit timestamp)
-   Max hardlinks:         65536         4294967296
-   Metadata reserved:     4 bytes       14 bytes
-
- - Support extended attributes (xattrs) as an option;
-
- - Support xattr inline and tail-end data inline for all files;
-
- - Support POSIX.1e ACLs by using xattrs;
-
- - Support transparent file compression as an option:
-   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
-
-The following git tree provides the file system user-space tools under
-development (ex, formatting tool mkfs.erofs):
->> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
-
-Bugs and patches are welcome, please kindly help us and send to the following
-linux-erofs mailing list:
->> linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>
-
-Mount options
-=============
-
-(no)user_xattr         Setup Extended User Attributes. Note: xattr is enabled
-                       by default if CONFIG_EROFS_FS_XATTR is selected.
-(no)acl                Setup POSIX Access Control List. Note: acl is enabled
-                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
-cache_strategy=%s      Select a strategy for cached decompression from now on:
-                         disabled: In-place I/O decompression only;
-                        readahead: Cache the last incomplete compressed physical
-                                   cluster for further reading. It still does
-                                   in-place I/O decompression for the rest
-                                   compressed physical clusters;
-                       readaround: Cache the both ends of incomplete compressed
-                                   physical clusters for further reading.
-                                   It still does in-place I/O decompression
-                                   for the rest compressed physical clusters.
-
-On-disk details
-===============
-
-Summary
--------
-Different from other read-only file systems, an EROFS volume is designed
-to be as simple as possible:
-
-                                |-> aligned with the block size
-   ____________________________________________________________
-  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
-  |_|__|_|_____|__________|_____|______|__________|_____|______|
-  0 +1K
-
-All data areas should be aligned with the block size, but metadata areas
-may not. All metadatas can be now observed in two different spaces (views):
- 1. Inode metadata space
-    Each valid inode should be aligned with an inode slot, which is a fixed
-    value (32 bytes) and designed to be kept in line with compact inode size.
-
-    Each inode can be directly found with the following formula:
-         inode offset = meta_blkaddr * block_size + 32 * nid
-
-                                |-> aligned with 8B
-                                           |-> followed closely
-    + meta_blkaddr blocks                                      |-> another slot
-     _____________________________________________________________________
-    |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
-    |________|_______|(optional)|(optional)|__(optional)_|_____|__________
-             |-> aligned with the inode slot size
-                  .                   .
-                .                         .
-              .                              .
-            .                                    .
-          .                                         .
-        .                                              .
-      .____________________________________________________|-> aligned with 4B
-      | xattr_ibody_header | shared xattrs | inline xattrs |
-      |____________________|_______________|_______________|
-      |->    12 bytes    <-|->x * 4 bytes<-|               .
-                          .                .                 .
-                    .                      .                   .
-               .                           .                     .
-           ._______________________________.______________________.
-           | id | id | id | id |  ... | id | ent | ... | ent| ... |
-           |____|____|____|____|______|____|_____|_____|____|_____|
-                                           |-> aligned with 4B
-                                                       |-> aligned with 4B
-
-    Inode could be 32 or 64 bytes, which can be distinguished from a common
-    field which all inode versions have -- i_format:
-
-        __________________               __________________
-       |     i_format     |             |     i_format     |
-       |__________________|             |__________________|
-       |        ...       |             |        ...       |
-       |                  |             |                  |
-       |__________________| 32 bytes    |                  |
-                                        |                  |
-                                        |__________________| 64 bytes
-
-    Xattrs, extents, data inline are followed by the corresponding inode with
-    proper alignment, and they could be optional for different data mappings.
-    _currently_ total 4 valid data mappings are supported:
-
-     0  flat file data without data inline (no extent);
-     1  fixed-sized output data compression (with non-compacted indexes);
-     2  flat file data with tail packing data inline (no extent);
-     3  fixed-sized output data compression (with compacted indexes, v5.3+).
-
-    The size of the optional xattrs is indicated by i_xattr_count in inode
-    header. Large xattrs or xattrs shared by many different files can be
-    stored in shared xattrs metadata rather than inlined right after inode.
-
- 2. Shared xattrs metadata space
-    Shared xattrs space is similar to the above inode space, started with
-    a specific block indicated by xattr_blkaddr, organized one by one with
-    proper align.
-
-    Each share xattr can also be directly found by the following formula:
-         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
-
-                           |-> aligned by  4 bytes
-    + xattr_blkaddr blocks                     |-> aligned with 4 bytes
-     _________________________________________________________________________
-    |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
-    |________|_____________|_____________|_____|______________|_______________
-
-Directories
------------
-All directories are now organized in a compact on-disk format. Note that
-each directory block is divided into index and name areas in order to support
-random file lookup, and all directory entries are _strictly_ recorded in
-alphabetical order in order to support improved prefix binary search
-algorithm (could refer to the related source code).
-
-                 ___________________________
-                /                           |
-               /              ______________|________________
-              /              /              | nameoff1       | nameoffN-1
- ____________.______________._______________v________________v__________
-| dirent | dirent | ... | dirent | filename | filename | ... | filename |
-|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
-     \                           ^
-      \                          |                           * could have
-       \                         |                             trailing '\0'
-        \________________________| nameoff0
-
-                             Directory block
-
-Note that apart from the offset of the first filename, nameoff0 also indicates
-the total number of directory entries in this block since it is no need to
-introduce another on-disk field at all.
-
-Compression
------------
-Currently, EROFS supports 4KB fixed-sized output transparent file compression,
-as illustrated below:
-
-         |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
-         clusterofs                      clusterofs            clusterofs
-         |                               |                     |   logical data
-_________v_______________________________v_____________________v_______________
-... |    .        |             |        .    |             |  .          | ...
-____|____.________|_____________|________.____|_____________|__.__________|____
-    |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
-         size          size          size          size          size
-          .                             .                .                   .
-           .                       .               .                  .
-            .                  .              .                .
-      _______._____________._____________._____________._____________________
-         ... |             |             |             | ... physical data
-      _______|_____________|_____________|_____________|_____________________
-             |-> cluster <-|-> cluster <-|-> cluster <-|
-                  size          size          size
-
-Currently each on-disk physical cluster can contain 4KB (un)compressed data
-at most. For each logical cluster, there is a corresponding on-disk index to
-describe its cluster type, physical cluster address, etc.
-
-See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 4230f49d2732..03a493b27920 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -61,6 +61,7 @@ Documentation for filesystem implementations.
    dlmfs
    ecryptfs
    efivarfs
+   erofs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 6e29ad2ea34f63f2b959807370672af569861378 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:02 +0100
Subject: docs: filesystems: convert ext2.txt to ReST

- Add a SPDX header;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use footnote markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/fde6721f0303259d830391e351dbde48f67f3ec7.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ext2.rst  | 399 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ext2.txt  | 388 -----------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 400 insertions(+), 388 deletions(-)
 create mode 100644 Documentation/filesystems/ext2.rst
 delete mode 100644 Documentation/filesystems/ext2.txt

diff --git a/Documentation/filesystems/ext2.rst b/Documentation/filesystems/ext2.rst
new file mode 100644
index 000000000000..d83dbbb162e2
--- /dev/null
+++ b/Documentation/filesystems/ext2.rst
@@ -0,0 +1,399 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+
+The Second Extended Filesystem
+==============================
+
+ext2 was originally released in January 1993.  Written by Rémy Card,
+Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
+Extended Filesystem.  It is currently still (April 2001) the predominant
+filesystem in use by Linux.  There are also implementations available
+for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.
+
+Options
+=======
+
+Most defaults are determined by the filesystem superblock, and can be
+set using tune2fs(8). Kernel-determined defaults are indicated by (*).
+
+====================    ===     ================================================
+bsddf			(*)	Makes ``df`` act like BSD.
+minixdf				Makes ``df`` act like Minix.
+
+check=none, nocheck	(*)	Don't do extra checking of bitmaps on mount
+				(check=normal and check=strict options removed)
+
+dax				Use direct access (no page cache).  See
+				Documentation/filesystems/dax.txt.
+
+debug				Extra debugging information is sent to the
+				kernel syslog.  Useful for developers.
+
+errors=continue			Keep going on a filesystem error.
+errors=remount-ro		Remount the filesystem read-only on an error.
+errors=panic			Panic and halt the machine if an error occurs.
+
+grpid, bsdgroups		Give objects the same group ID as their parent.
+nogrpid, sysvgroups		New objects have the group ID of their creator.
+
+nouid32				Use 16-bit UIDs and GIDs.
+
+oldalloc			Enable the old block allocator. Orlov should
+				have better performance, we'd like to get some
+				feedback if it's the contrary for you.
+orlov			(*)	Use the Orlov block allocator.
+				(See http://lwn.net/Articles/14633/ and
+				http://lwn.net/Articles/14446/.)
+
+resuid=n			The user ID which may use the reserved blocks.
+resgid=n			The group ID which may use the reserved blocks.
+
+sb=n				Use alternate superblock at this location.
+
+user_xattr			Enable "user." POSIX Extended Attributes
+				(requires CONFIG_EXT2_FS_XATTR).
+nouser_xattr			Don't support "user." extended attributes.
+
+acl				Enable POSIX Access Control Lists support
+				(requires CONFIG_EXT2_FS_POSIX_ACL).
+noacl				Don't support POSIX ACLs.
+
+nobh				Do not attach buffer_heads to file pagecache.
+
+quota, usrquota			Enable user disk quota support
+				(requires CONFIG_QUOTA).
+
+grpquota			Enable group disk quota support
+				(requires CONFIG_QUOTA).
+====================    ===     ================================================
+
+The noquota option is silently ignored by ext2.
+
+
+Specification
+=============
+
+ext2 shares many properties with traditional Unix filesystems.  It has
+the concepts of blocks, inodes and directories.  It has space in the
+specification for Access Control Lists (ACLs), fragments, undeletion and
+compression though these are not yet implemented (some are available as
+separate patches).  There is also a versioning mechanism to allow new
+features (such as journalling) to be added in a maximally compatible
+manner.
+
+Blocks
+------
+
+The space in the device or file is split up into blocks.  These are
+a fixed size, of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
+which is decided when the filesystem is created.  Smaller blocks mean
+less wasted space per file, but require slightly more accounting overhead,
+and also impose other limits on the size of files and the filesystem.
+
+Block Groups
+------------
+
+Blocks are clustered into block groups in order to reduce fragmentation
+and minimise the amount of head seeking when reading a large amount
+of consecutive data.  Information about each block group is kept in a
+descriptor table stored in the block(s) immediately after the superblock.
+Two blocks near the start of each group are reserved for the block usage
+bitmap and the inode usage bitmap which show which blocks and inodes
+are in use.  Since each bitmap is limited to a single block, this means
+that the maximum number of blocks in a group is 8 times the block size in bytes.
+
+The block(s) following the bitmaps in each block group are designated
+as the inode table for that block group and the remainder are the data
+blocks.  The block allocation algorithm attempts to allocate data blocks
+in the same block group as the inode which contains them.
+
+The Superblock
+--------------
+
+The superblock contains all the information about the configuration of
+the filing system.  The primary copy of the superblock is stored at an
+offset of 1024 bytes from the start of the device, and it is essential
+to mounting the filesystem.  Since it is so important, backup copies of
+the superblock are stored in block groups throughout the filesystem.
+The first version of ext2 (revision 0) stores a copy at the start of
+every block group, along with backups of the group descriptor block(s).
+Because this can consume a considerable amount of space for large
+filesystems, later revisions can optionally reduce the number of backup
+copies by only putting backups in specific groups (this is the sparse
+superblock feature).  The groups chosen are 0, 1 and powers of 3, 5 and 7.
+
+The information in the superblock contains fields such as the total
+number of inodes and blocks in the filesystem and how many are free,
+how many inodes and blocks are in each block group, when the filesystem
+was mounted (and if it was cleanly unmounted), when it was modified,
+what version of the filesystem it is (see the Revisions section below)
+and which OS created it.
+
+If the filesystem is revision 1 or higher, then there are extra fields,
+such as a volume name, a unique identification number, the inode size,
+and space for optional filesystem features to store configuration info.
+
+All fields in the superblock (as in all other ext2 structures) are stored
+on the disc in little endian format, so a filesystem is portable between
+machines without having to know what machine it was created on.
+
+Inodes
+------
+
+The inode (index node) is a fundamental concept in the ext2 filesystem.
+Each object in the filesystem is represented by an inode.  The inode
+structure contains pointers to the filesystem blocks which contain the
+data held in the object and all of the metadata about an object except
+its name.  The metadata about an object includes the permissions, owner,
+group, flags, size, number of blocks used, access time, change time,
+modification time, deletion time, number of links, fragments, version
+(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).
+
+There are some reserved fields which are currently unused in the inode
+structure and several which are overloaded.  One field is reserved for the
+directory ACL if the inode is a directory and alternately for the top 32
+bits of the file size if the inode is a regular file (allowing file sizes
+larger than 2GB).  The translator field is unused under Linux, but is used
+by the HURD to reference the inode of a program which will be used to
+interpret this object.  Most of the remaining reserved fields have been
+used up for both Linux and the HURD for larger owner and group fields.
+The HURD also has a larger mode field so it uses another of the remaining
+fields to store the extra mode bits.
+
+There are pointers to the first 12 blocks which contain the file's data
+in the inode.  There is a pointer to an indirect block (which contains
+pointers to the next set of blocks), a pointer to a doubly-indirect
+block (which contains pointers to indirect blocks) and a pointer to a
+trebly-indirect block (which contains pointers to doubly-indirect blocks).
+
+The flags field contains some ext2-specific flags which aren't catered
+for by the standard chmod flags.  These flags can be listed with lsattr
+and changed with the chattr command, and allow specific filesystem
+behaviour on a per-file basis.  There are flags for secure deletion,
+undeletable, compression, synchronous updates, immutability, append-only,
+dumpable, no-atime, indexed directories, and data-journaling.  Not all
+of these are supported yet.
+
+Directories
+-----------
+
+A directory is a filesystem object and has an inode just like a file.
+It is a specially formatted file containing records which associate
+each name with an inode number.  Later revisions of the filesystem also
+encode the type of the object (file, directory, symlink, device, fifo,
+socket) to avoid the need to check the inode itself for this information
+(support for taking advantage of this feature does not yet exist in
+Glibc 2.2).
+
+The inode allocation code tries to assign inodes which are in the same
+block group as the directory in which they are first created.
+
+The current implementation of ext2 uses a singly-linked list to store
+the filenames in the directory; a pending enhancement uses hashing of the
+filenames to allow lookup without the need to scan the entire directory.
+
+The current implementation never removes empty directory blocks once they
+have been allocated to hold more files.
+
+Special files
+-------------
+
+Symbolic links are also filesystem objects with inodes.  They deserve
+special mention because the data for them is stored within the inode
+itself if the symlink is less than 60 bytes long.  It uses the fields
+which would normally be used to store the pointers to data blocks.
+This is a worthwhile optimisation as it avoids allocating a full
+block for the symlink, and most symlinks are less than 60 characters long.
+
+Character and block special devices never have data blocks assigned to
+them.  Instead, their device number is stored in the inode, again reusing
+the fields which would be used to point to the data blocks.
+
+Reserved Space
+--------------
+
+In ext2, there is a mechanism for reserving a certain number of blocks
+for a particular user (normally the super-user).  This is intended to
+allow for the system to continue functioning even if non-privileged users
+fill up all the space available to them (this is independent of filesystem
+quotas).  It also keeps the filesystem from filling up entirely which
+helps combat fragmentation.
+
+Filesystem check
+----------------
+
+At boot time, most systems run a consistency check (e2fsck) on their
+filesystems.  The superblock of the ext2 filesystem contains several
+fields which indicate whether fsck should actually run (since checking
+the filesystem at boot can take a long time if it is large).  fsck will
+run if the filesystem was not cleanly unmounted, if the maximum mount
+count has been exceeded or if the maximum time between checks has been
+exceeded.
+
+Feature Compatibility
+---------------------
+
+The compatibility feature mechanism used in ext2 is sophisticated.
+It safely allows features to be added to the filesystem, without
+unnecessarily sacrificing compatibility with older versions of the
+filesystem code.  The feature compatibility mechanism is not supported by
+the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
+revision 1.  There are three 32-bit fields, one for compatible features
+(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
+incompatible (INCOMPAT) features.
+
+These feature flags have specific meanings for the kernel as follows:
+
+A COMPAT flag indicates that a feature is present in the filesystem,
+but the on-disk format is 100% compatible with older on-disk formats, so
+a kernel which didn't know anything about this feature could read/write
+the filesystem without any chance of corrupting the filesystem (or even
+making it inconsistent).  This is essentially just a flag which says
+"this filesystem has a (hidden) feature" that the kernel or e2fsck may
+want to be aware of (more on e2fsck and feature flags later).  The ext3
+HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
+a regular file with data blocks in it so the kernel does not need to
+take any special notice of it if it doesn't understand ext3 journaling.
+
+An RO_COMPAT flag indicates that the on-disk format is 100% compatible
+with older on-disk formats for reading (i.e. the feature does not change
+the visible on-disk format).  However, an old kernel writing to such a
+filesystem would/could corrupt the filesystem, so this is prevented. The
+most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
+sparse groups allow file data blocks where superblock/group descriptor
+backups used to live, and ext2_free_blocks() refuses to free these blocks,
+which would lead to inconsistent bitmaps.  An old kernel would also
+get an error if it tried to free a series of blocks which crossed a group
+boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.
+
+An INCOMPAT flag indicates the on-disk format has changed in some
+way that makes it unreadable by older kernels, or would otherwise
+cause a problem if an old kernel tried to mount it.  FILETYPE is an
+INCOMPAT flag because older kernels would think a filename was longer
+than 256 characters, which would lead to corrupt directory listings.
+The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
+doesn't understand compression, you would just get garbage back from
+read() instead of it automatically decompressing your data.  The ext3
+RECOVER flag is needed to prevent a kernel which does not understand the
+ext3 journal from mounting the filesystem without replaying the journal.
+
+e2fsck needs to be more strict with the handling of these
+flags than the kernel.  If it doesn't understand ANY of the COMPAT,
+RO_COMPAT, or INCOMPAT flags it will refuse to check the filesystem,
+because it has no way of verifying whether a given feature is valid
+or not.  Allowing e2fsck to succeed on a filesystem with an unknown
+feature gives the user a false sense of security.  Refusing to check
+a filesystem with unknown features is a good incentive for the user to
+update to the latest e2fsck.  This also means that anyone adding feature
+flags to ext2 also needs to update e2fsck to verify these features.
+
+Metadata
+--------
+
+It is frequently claimed that the ext2 implementation of writing
+asynchronous metadata is faster than the ffs synchronous metadata
+scheme but less reliable.  Both methods are equally resolvable by their
+respective fsck programs.
+
+If you're exceptionally paranoid, there are 3 ways of making metadata
+writes synchronous on ext2:
+
+- per-file if you have the program source: use the O_SYNC flag to open()
+- per-file if you don't have the source: use "chattr +S" on the file
+- per-filesystem: add the "sync" option to mount (or in /etc/fstab)
+
+The first and last are not ext2-specific but do force the metadata to
+be written synchronously.  See also Journaling below.
+
+Limitations
+-----------
+
+There are various limits imposed by the on-disk layout of ext2.  Other
+limits are imposed by the current implementation of the kernel code.
+Many of the limits are determined at the time the filesystem is first
+created, and depend upon the block size chosen.  The ratio of inodes to
+data blocks is fixed at filesystem creation time, so the only way to
+increase the number of inodes is to increase the size of the filesystem.
+No tools currently exist which can change the ratio of inodes to blocks.
+
+Most of these limits could be overcome with slight changes in the on-disk
+format and using a compatibility flag to signal the format change (at
+the expense of some compatibility).
+
+=====================  =======    =======    =======   ========
+Filesystem block size      1kB        2kB        4kB        8kB
+=====================  =======    =======    =======   ========
+File size limit           16GB      256GB     2048GB     2048GB
+Filesystem size limit   2047GB     8192GB    16384GB    32768GB
+=====================  =======    =======    =======   ========
+
+There is a 2.4 kernel limit of 2048GB for a single block device, so no
+filesystem larger than that can be created at this time.  There is also
+an upper limit on the block size imposed by the page size of the kernel,
+so 8kB blocks are only allowed on Alpha systems (and other architectures
+which support larger pages).
+
+There is an upper limit of 32000 subdirectories in a single directory.
+
+There is a "soft" upper limit of about 10-15k files in a single directory
+with the current linear linked-list directory implementation.  This limit
+stems from performance problems when creating and deleting (and also
+finding) files in such large directories.  Using a hashed directory index
+(under development) allows 100k-1M+ files in a single directory without
+performance problems (although RAM size becomes an issue at this point).
+
+The (meaningless) absolute upper limit of files in a single directory
+(imposed by the file size, the realistic limit is obviously much less)
+is over 130 trillion files.  It would be higher except there are not
+enough 4-character names to make up unique directory entries, so they
+have to be 8-character filenames; even then we are fairly close to
+running out of unique filenames.
+
+Journaling
+----------
+
+A journaling extension to the ext2 code has been developed by Stephen
+Tweedie.  It avoids the risks of metadata corruption and the need to
+wait for e2fsck to complete after a crash, without requiring a change
+to the on-disk ext2 layout.  In a nutshell, the journal is a regular
+file which stores whole metadata (and optionally data) blocks that have
+been modified, prior to writing them into the filesystem.  This means
+it is possible to add a journal to an existing ext2 filesystem without
+the need for data conversion.
+
+Changes to the filesystem (e.g. a file being renamed) are stored in
+a transaction in the journal and can either be complete or incomplete at
+the time of a crash.  If a transaction is complete at the time of a crash
+(or in the normal case where the system does not crash), then any blocks
+in that transaction are guaranteed to represent a valid filesystem state,
+and are copied into the filesystem.  If a transaction is incomplete at
+the time of the crash, then there is no guarantee of consistency for
+the blocks in that transaction so they are discarded (which means any
+filesystem changes they represent are also lost).
+Check Documentation/filesystems/ext4/ if you want to read more about
+ext4 and journaling.
+
+References
+==========
+
+=======================	===============================================
+The kernel source	file:/usr/src/linux/fs/ext2/
+e2fsprogs (e2fsck)	http://e2fsprogs.sourceforge.net/
+Design & Implementation	http://e2fsprogs.sourceforge.net/ext2intro.html
+Journaling (ext3)	ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
+Filesystem Resizing	http://ext2resize.sourceforge.net/
+Compression [1]_	http://e2compr.sourceforge.net/
+=======================	===============================================
+
+Implementations for:
+
+=======================	===========================================================
+Windows 95/98/NT/2000	http://www.chrysocome.net/explore2fs
+Windows 95 [1]_		http://www.yipton.net/content.html#FSDEXT2
+DOS client [1]_		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
+OS/2 [2]_		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
+RISC OS client		http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
+=======================	===========================================================
+
+.. [1] no longer actively developed/supported (as of Apr 2001)
+.. [2] no longer actively developed/supported (as of Mar 2009)
diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt
deleted file mode 100644
index 94c2cf0292f5..000000000000
--- a/Documentation/filesystems/ext2.txt
+++ /dev/null
@@ -1,388 +0,0 @@
-
-The Second Extended Filesystem
-==============================
-
-ext2 was originally released in January 1993.  Written by R\'emy Card,
-Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
-Extended Filesystem.  It is currently still (April 2001) the predominant
-filesystem in use by Linux.  There are also implementations available
-for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.
-
-Options
-=======
-
-Most defaults are determined by the filesystem superblock, and can be
-set using tune2fs(8). Kernel-determined defaults are indicated by (*).
-
-bsddf			(*)	Makes `df' act like BSD.
-minixdf				Makes `df' act like Minix.
-
-check=none, nocheck	(*)	Don't do extra checking of bitmaps on mount
-				(check=normal and check=strict options removed)
-
-dax				Use direct access (no page cache).  See
-				Documentation/filesystems/dax.txt.
-
-debug				Extra debugging information is sent to the
-				kernel syslog.  Useful for developers.
-
-errors=continue			Keep going on a filesystem error.
-errors=remount-ro		Remount the filesystem read-only on an error.
-errors=panic			Panic and halt the machine if an error occurs.
-
-grpid, bsdgroups		Give objects the same group ID as their parent.
-nogrpid, sysvgroups		New objects have the group ID of their creator.
-
-nouid32				Use 16-bit UIDs and GIDs.
-
-oldalloc			Enable the old block allocator. Orlov should
-				have better performance, we'd like to get some
-				feedback if it's the contrary for you.
-orlov			(*)	Use the Orlov block allocator.
-				(See http://lwn.net/Articles/14633/ and
-				http://lwn.net/Articles/14446/.)
-
-resuid=n			The user ID which may use the reserved blocks.
-resgid=n			The group ID which may use the reserved blocks.
-
-sb=n				Use alternate superblock at this location.
-
-user_xattr			Enable "user." POSIX Extended Attributes
-				(requires CONFIG_EXT2_FS_XATTR).
-nouser_xattr			Don't support "user." extended attributes.
-
-acl				Enable POSIX Access Control Lists support
-				(requires CONFIG_EXT2_FS_POSIX_ACL).
-noacl				Don't support POSIX ACLs.
-
-nobh				Do not attach buffer_heads to file pagecache.
-
-quota, usrquota			Enable user disk quota support
-				(requires CONFIG_QUOTA).
-
-grpquota			Enable group disk quota support
-				(requires CONFIG_QUOTA).
-
-noquota option ls silently ignored by ext2.
-
-
-Specification
-=============
-
-ext2 shares many properties with traditional Unix filesystems.  It has
-the concepts of blocks, inodes and directories.  It has space in the
-specification for Access Control Lists (ACLs), fragments, undeletion and
-compression though these are not yet implemented (some are available as
-separate patches).  There is also a versioning mechanism to allow new
-features (such as journalling) to be added in a maximally compatible
-manner.
-
-Blocks
-------
-
-The space in the device or file is split up into blocks.  These are
-a fixed size, of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
-which is decided when the filesystem is created.  Smaller blocks mean
-less wasted space per file, but require slightly more accounting overhead,
-and also impose other limits on the size of files and the filesystem.
-
-Block Groups
-------------
-
-Blocks are clustered into block groups in order to reduce fragmentation
-and minimise the amount of head seeking when reading a large amount
-of consecutive data.  Information about each block group is kept in a
-descriptor table stored in the block(s) immediately after the superblock.
-Two blocks near the start of each group are reserved for the block usage
-bitmap and the inode usage bitmap which show which blocks and inodes
-are in use.  Since each bitmap is limited to a single block, this means
-that the maximum size of a block group is 8 times the size of a block.
-
-The block(s) following the bitmaps in each block group are designated
-as the inode table for that block group and the remainder are the data
-blocks.  The block allocation algorithm attempts to allocate data blocks
-in the same block group as the inode which contains them.
-
-The Superblock
---------------
-
-The superblock contains all the information about the configuration of
-the filing system.  The primary copy of the superblock is stored at an
-offset of 1024 bytes from the start of the device, and it is essential
-to mounting the filesystem.  Since it is so important, backup copies of
-the superblock are stored in block groups throughout the filesystem.
-The first version of ext2 (revision 0) stores a copy at the start of
-every block group, along with backups of the group descriptor block(s).
-Because this can consume a considerable amount of space for large
-filesystems, later revisions can optionally reduce the number of backup
-copies by only putting backups in specific groups (this is the sparse
-superblock feature).  The groups chosen are 0, 1 and powers of 3, 5 and 7.
-
-The information in the superblock contains fields such as the total
-number of inodes and blocks in the filesystem and how many are free,
-how many inodes and blocks are in each block group, when the filesystem
-was mounted (and if it was cleanly unmounted), when it was modified,
-what version of the filesystem it is (see the Revisions section below)
-and which OS created it.
-
-If the filesystem is revision 1 or higher, then there are extra fields,
-such as a volume name, a unique identification number, the inode size,
-and space for optional filesystem features to store configuration info.
-
-All fields in the superblock (as in all other ext2 structures) are stored
-on the disc in little endian format, so a filesystem is portable between
-machines without having to know what machine it was created on.
-
-Inodes
-------
-
-The inode (index node) is a fundamental concept in the ext2 filesystem.
-Each object in the filesystem is represented by an inode.  The inode
-structure contains pointers to the filesystem blocks which contain the
-data held in the object and all of the metadata about an object except
-its name.  The metadata about an object includes the permissions, owner,
-group, flags, size, number of blocks used, access time, change time,
-modification time, deletion time, number of links, fragments, version
-(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).
-
-There are some reserved fields which are currently unused in the inode
-structure and several which are overloaded.  One field is reserved for the
-directory ACL if the inode is a directory and alternately for the top 32
-bits of the file size if the inode is a regular file (allowing file sizes
-larger than 2GB).  The translator field is unused under Linux, but is used
-by the HURD to reference the inode of a program which will be used to
-interpret this object.  Most of the remaining reserved fields have been
-used up for both Linux and the HURD for larger owner and group fields,
-The HURD also has a larger mode field so it uses another of the remaining
-fields to store the extra more bits.
-
-There are pointers to the first 12 blocks which contain the file's data
-in the inode.  There is a pointer to an indirect block (which contains
-pointers to the next set of blocks), a pointer to a doubly-indirect
-block (which contains pointers to indirect blocks) and a pointer to a
-trebly-indirect block (which contains pointers to doubly-indirect blocks).
-
-The flags field contains some ext2-specific flags which aren't catered
-for by the standard chmod flags.  These flags can be listed with lsattr
-and changed with the chattr command, and allow specific filesystem
-behaviour on a per-file basis.  There are flags for secure deletion,
-undeletable, compression, synchronous updates, immutability, append-only,
-dumpable, no-atime, indexed directories, and data-journaling.  Not all
-of these are supported yet.
-
-Directories
------------
-
-A directory is a filesystem object and has an inode just like a file.
-It is a specially formatted file containing records which associate
-each name with an inode number.  Later revisions of the filesystem also
-encode the type of the object (file, directory, symlink, device, fifo,
-socket) to avoid the need to check the inode itself for this information
-(support for taking advantage of this feature does not yet exist in
-Glibc 2.2).
-
-The inode allocation code tries to assign inodes which are in the same
-block group as the directory in which they are first created.
-
-The current implementation of ext2 uses a singly-linked list to store
-the filenames in the directory; a pending enhancement uses hashing of the
-filenames to allow lookup without the need to scan the entire directory.
-
-The current implementation never removes empty directory blocks once they
-have been allocated to hold more files.
-
-Special files
--------------
-
-Symbolic links are also filesystem objects with inodes.  They deserve
-special mention because the data for them is stored within the inode
-itself if the symlink is less than 60 bytes long.  It uses the fields
-which would normally be used to store the pointers to data blocks.
-This is a worthwhile optimisation as it we avoid allocating a full
-block for the symlink, and most symlinks are less than 60 characters long.
-
-Character and block special devices never have data blocks assigned to
-them.  Instead, their device number is stored in the inode, again reusing
-the fields which would be used to point to the data blocks.
-
-Reserved Space
---------------
-
-In ext2, there is a mechanism for reserving a certain number of blocks
-for a particular user (normally the super-user).  This is intended to
-allow for the system to continue functioning even if non-privileged users
-fill up all the space available to them (this is independent of filesystem
-quotas).  It also keeps the filesystem from filling up entirely which
-helps combat fragmentation.
-
-Filesystem check
-----------------
-
-At boot time, most systems run a consistency check (e2fsck) on their
-filesystems.  The superblock of the ext2 filesystem contains several
-fields which indicate whether fsck should actually run (since checking
-the filesystem at boot can take a long time if it is large).  fsck will
-run if the filesystem was not cleanly unmounted, if the maximum mount
-count has been exceeded or if the maximum time between checks has been
-exceeded.
-
-Feature Compatibility
----------------------
-
-The compatibility feature mechanism used in ext2 is sophisticated.
-It safely allows features to be added to the filesystem, without
-unnecessarily sacrificing compatibility with older versions of the
-filesystem code.  The feature compatibility mechanism is not supported by
-the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
-revision 1.  There are three 32-bit fields, one for compatible features
-(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
-incompatible (INCOMPAT) features.
-
-These feature flags have specific meanings for the kernel as follows:
-
-A COMPAT flag indicates that a feature is present in the filesystem,
-but the on-disk format is 100% compatible with older on-disk formats, so
-a kernel which didn't know anything about this feature could read/write
-the filesystem without any chance of corrupting the filesystem (or even
-making it inconsistent).  This is essentially just a flag which says
-"this filesystem has a (hidden) feature" that the kernel or e2fsck may
-want to be aware of (more on e2fsck and feature flags later).  The ext3
-HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
-a regular file with data blocks in it so the kernel does not need to
-take any special notice of it if it doesn't understand ext3 journaling.
-
-An RO_COMPAT flag indicates that the on-disk format is 100% compatible
-with older on-disk formats for reading (i.e. the feature does not change
-the visible on-disk format).  However, an old kernel writing to such a
-filesystem would/could corrupt the filesystem, so this is prevented. The
-most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
-sparse groups allow file data blocks where superblock/group descriptor
-backups used to live, and ext2_free_blocks() refuses to free these blocks,
-which would leading to inconsistent bitmaps.  An old kernel would also
-get an error if it tried to free a series of blocks which crossed a group
-boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.
-
-An INCOMPAT flag indicates the on-disk format has changed in some
-way that makes it unreadable by older kernels, or would otherwise
-cause a problem if an old kernel tried to mount it.  FILETYPE is an
-INCOMPAT flag because older kernels would think a filename was longer
-than 256 characters, which would lead to corrupt directory listings.
-The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
-doesn't understand compression, you would just get garbage back from
-read() instead of it automatically decompressing your data.  The ext3
-RECOVER flag is needed to prevent a kernel which does not understand the
-ext3 journal from mounting the filesystem without replaying the journal.
-
-For e2fsck, it needs to be more strict with the handling of these
-flags than the kernel.  If it doesn't understand ANY of the COMPAT,
-RO_COMPAT, or INCOMPAT flags it will refuse to check the filesystem,
-because it has no way of verifying whether a given feature is valid
-or not.  Allowing e2fsck to succeed on a filesystem with an unknown
-feature is a false sense of security for the user.  Refusing to check
-a filesystem with unknown features is a good incentive for the user to
-update to the latest e2fsck.  This also means that anyone adding feature
-flags to ext2 also needs to update e2fsck to verify these features.
-
-Metadata
---------
-
-It is frequently claimed that the ext2 implementation of writing
-asynchronous metadata is faster than the ffs synchronous metadata
-scheme but less reliable.  Both methods are equally resolvable by their
-respective fsck programs.
-
-If you're exceptionally paranoid, there are 3 ways of making metadata
-writes synchronous on ext2:
-
-per-file if you have the program source: use the O_SYNC flag to open()
-per-file if you don't have the source: use "chattr +S" on the file
-per-filesystem: add the "sync" option to mount (or in /etc/fstab)
-
-the first and last are not ext2 specific but do force the metadata to
-be written synchronously.  See also Journaling below.
-
-Limitations
------------
-
-There are various limits imposed by the on-disk layout of ext2.  Other
-limits are imposed by the current implementation of the kernel code.
-Many of the limits are determined at the time the filesystem is first
-created, and depend upon the block size chosen.  The ratio of inodes to
-data blocks is fixed at filesystem creation time, so the only way to
-increase the number of inodes is to increase the size of the filesystem.
-No tools currently exist which can change the ratio of inodes to blocks.
-
-Most of these limits could be overcome with slight changes in the on-disk
-format and using a compatibility flag to signal the format change (at
-the expense of some compatibility).
-
-Filesystem block size:     1kB        2kB        4kB        8kB
-
-File size limit:          16GB      256GB     2048GB     2048GB
-Filesystem size limit:  2047GB     8192GB    16384GB    32768GB
-
-There is a 2.4 kernel limit of 2048GB for a single block device, so no
-filesystem larger than that can be created at this time.  There is also
-an upper limit on the block size imposed by the page size of the kernel,
-so 8kB blocks are only allowed on Alpha systems (and other architectures
-which support larger pages).
-
-There is an upper limit of 32000 subdirectories in a single directory.
-
-There is a "soft" upper limit of about 10-15k files in a single directory
-with the current linear linked-list directory implementation.  This limit
-stems from performance problems when creating and deleting (and also
-finding) files in such large directories.  Using a hashed directory index
-(under development) allows 100k-1M+ files in a single directory without
-performance problems (although RAM size becomes an issue at this point).
-
-The (meaningless) absolute upper limit of files in a single directory
-(imposed by the file size, the realistic limit is obviously much less)
-is over 130 trillion files.  It would be higher except there are not
-enough 4-character names to make up unique directory entries, so they
-have to be 8 character filenames, even then we are fairly close to
-running out of unique filenames.
-
-Journaling
-----------
-
-A journaling extension to the ext2 code has been developed by Stephen
-Tweedie.  It avoids the risks of metadata corruption and the need to
-wait for e2fsck to complete after a crash, without requiring a change
-to the on-disk ext2 layout.  In a nutshell, the journal is a regular
-file which stores whole metadata (and optionally data) blocks that have
-been modified, prior to writing them into the filesystem.  This means
-it is possible to add a journal to an existing ext2 filesystem without
-the need for data conversion.
-
-When changes to the filesystem (e.g. a file is renamed) they are stored in
-a transaction in the journal and can either be complete or incomplete at
-the time of a crash.  If a transaction is complete at the time of a crash
-(or in the normal case where the system does not crash), then any blocks
-in that transaction are guaranteed to represent a valid filesystem state,
-and are copied into the filesystem.  If a transaction is incomplete at
-the time of the crash, then there is no guarantee of consistency for
-the blocks in that transaction so they are discarded (which means any
-filesystem changes they represent are also lost).
-Check Documentation/filesystems/ext4/ if you want to read more about
-ext4 and journaling.
-
-References
-==========
-
-The kernel source	file:/usr/src/linux/fs/ext2/
-e2fsprogs (e2fsck)	http://e2fsprogs.sourceforge.net/
-Design & Implementation	http://e2fsprogs.sourceforge.net/ext2intro.html
-Journaling (ext3)	ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
-Filesystem Resizing	http://ext2resize.sourceforge.net/
-Compression (*)		http://e2compr.sourceforge.net/
-
-Implementations for:
-Windows 95/98/NT/2000	http://www.chrysocome.net/explore2fs
-Windows 95 (*)		http://www.yipton.net/content.html#FSDEXT2
-DOS client (*)		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
-OS/2 (+)		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
-RISC OS client		http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
-
-(*) no longer actively developed/supported (as of Apr 2001)
-(+) no longer actively developed/supported (as of Mar 2009)
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 03a493b27920..102b3b65486a 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -62,6 +62,7 @@ Documentation for filesystem implementations.
    ecryptfs
    efivarfs
    erofs
+   ext2
    fuse
    overlayfs
    virtiofs
-- 
cgit 
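Illustration for the Limitations table in the ext2.rst text added above:
the file size limits can be roughly reproduced from the block-pointer
layout the document describes (12 direct pointers plus one indirect, one
doubly-indirect and one trebly-indirect block).  This is only a sketch.
It assumes 4-byte block pointers and a 2048GB cap from a 32-bit count
of 512-byte sectors; neither assumption is stated in the document, and the
function name is hypothetical.

```python
def ext2_max_file_size(block_size: int) -> int:
    """Approximate ext2 file size limit in bytes for a given block size."""
    # Assumption: each block pointer is 4 bytes wide.
    ptrs_per_block = block_size // 4
    # 12 direct blocks, then single, double and triple indirection.
    blocks = 12 + ptrs_per_block + ptrs_per_block**2 + ptrs_per_block**3
    limit_from_pointers = blocks * block_size
    # Assumption: the per-file block count is a 32-bit number of
    # 512-byte sectors, which caps a file at 2048GB.
    limit_from_sector_count = 2**32 * 512
    return min(limit_from_pointers, limit_from_sector_count)


if __name__ == "__main__":
    for kb in (1, 2, 4, 8):
        size = ext2_max_file_size(kb * 1024)
        print(f"{kb}kB blocks: about {size // 2**30}GB")
```

Under these assumptions the pointer tree is the limiting factor for 1kB and
2kB blocks, while the sector-count cap takes over for 4kB and 8kB blocks,
which would explain why the last two columns of the table are identical.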

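Illustration for the Feature Compatibility section of the ext2.rst text
added above: the three flag classes amount to a small decision rule.  The
sketch below is not the kernel or e2fsck code; the bit values and function
names are hypothetical and only model the behaviour the text describes.

```python
# Hypothetical "features this implementation understands" masks.
# These are NOT the real ext2 constants; they exist only for illustration.
KNOWN_COMPAT = 0x1
KNOWN_RO_COMPAT = 0x1
KNOWN_INCOMPAT = 0x2


def kernel_mount_mode(compat: int, ro_compat: int, incompat: int) -> str:
    """Decide how a kernel may mount a filesystem with these feature bits."""
    if incompat & ~KNOWN_INCOMPAT:
        # Unknown INCOMPAT feature: the on-disk format cannot be trusted.
        return "refuse"
    if ro_compat & ~KNOWN_RO_COMPAT:
        # Unknown RO_COMPAT feature: reading is safe, writing is not.
        return "read-only"
    # Unknown COMPAT bits are harmless, so they are not even inspected.
    return "read-write"


def e2fsck_will_check(compat: int, ro_compat: int, incompat: int) -> bool:
    """e2fsck is stricter: any unknown bit in any class means refusal."""
    unknown = (
        (compat & ~KNOWN_COMPAT)
        | (ro_compat & ~KNOWN_RO_COMPAT)
        | (incompat & ~KNOWN_INCOMPAT)
    )
    return unknown == 0


if __name__ == "__main__":
    print(kernel_mount_mode(0x8, 0x0, 0x0))
    print(kernel_mount_mode(0x0, 0x2, 0x0))
    print(kernel_mount_mode(0x0, 0x0, 0x1))
    print(e2fsck_will_check(0x8, 0x0, 0x0))
```

The asymmetry is the point: the kernel ignores unknown COMPAT bits, while
the checker refuses on them because it cannot verify a feature it does not
understand.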

From 7dc62406320c4103bbdeeeecd0a7ef03e3e58009 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:03 +0100
Subject: docs: filesystems: convert ext3.txt to ReST

Nothing really required here. Just renaming would be enough.

Yet, while here, let's add an SPDX header and adjust the document title
to meet the same standard we're using on most docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/26960235e3e7c972bd543f5dd59f1ef4f3a877c6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ext3.rst  | 14 ++++++++++++++
 Documentation/filesystems/ext3.txt  | 12 ------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 15 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/filesystems/ext3.rst
 delete mode 100644 Documentation/filesystems/ext3.txt

diff --git a/Documentation/filesystems/ext3.rst b/Documentation/filesystems/ext3.rst
new file mode 100644
index 000000000000..c06cec3a8fdc
--- /dev/null
+++ b/Documentation/filesystems/ext3.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Ext3 Filesystem
+===============
+
+Ext3 was originally released in September 1999. Written by Stephen Tweedie
+for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
+Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
+
+Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
+filesystem is a subset of the ext4 filesystem, so use the ext4 driver to
+access ext3 filesystems.
+
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
deleted file mode 100644
index 58758fbef9e0..000000000000
--- a/Documentation/filesystems/ext3.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-
-Ext3 Filesystem
-===============
-
-Ext3 was originally released in September 1999. Written by Stephen Tweedie
-for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
-Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
-
-Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
-filesystem is a subset of ext4 filesystem so use ext4 driver for accessing
-ext3 filesystems.
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 102b3b65486a..aa2c3d1de3de 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -63,6 +63,7 @@ Documentation for filesystem implementations.
    efivarfs
    erofs
    ext2
+   ext3
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 89272ca1102e000f7dbca724b7b106e688199a5d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:04 +0100
Subject: docs: filesystems: convert f2fs.txt to ReST

- Add an SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8dd156320b0c015dec6d3f848d03ea057042a15b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/f2fs.rst  | 762 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/f2fs.txt  | 730 ----------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 763 insertions(+), 730 deletions(-)
 create mode 100644 Documentation/filesystems/f2fs.rst
 delete mode 100644 Documentation/filesystems/f2fs.txt

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
new file mode 100644
index 000000000000..d681203728d7
--- /dev/null
+++ b/Documentation/filesystems/f2fs.rst
@@ -0,0 +1,762 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+WHAT IS Flash-Friendly File System (F2FS)?
+==========================================
+
+NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
+been deployed in a variety of systems ranging from mobile to server systems.
+Since they are known to have different characteristics from conventional
+rotating disks, a file system, as the layer above the storage device, should
+adapt to those differences from scratch at the design level.
+
+F2FS is a file system designed for NAND flash memory-based storage devices and
+is based on the Log-structured File System (LFS). The design focuses on
+addressing the fundamental issues in LFS: the snowball effect of the wandering
+tree and high cleaning overhead.
+
+Since a NAND flash memory-based storage device shows different characteristics
+depending on its internal geometry or flash memory management scheme (the FTL),
+F2FS and its tools support various parameters not only for configuring on-disk
+layout, but also for selecting allocation and cleaning algorithms.
+
+The following git tree provides the file system formatting tool (mkfs.f2fs),
+a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
+
+- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
+
+For reporting bugs and sending patches, please use the following mailing list:
+
+- linux-f2fs-devel@lists.sourceforge.net
+
+Background and Design issues
+============================
+
+Log-structured File System (LFS)
+--------------------------------
+"A log-structured file system writes all modifications to disk sequentially in
+a log-like structure, thereby speeding up both file writing and crash recovery.
+The log is the only structure on disk; it contains indexing information so that
+files can be read back from the log efficiently. In order to maintain large free
+areas on disk for fast writing, we divide the log into segments and use a
+segment cleaner to compress the live information from heavily fragmented
+segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
+implementation of a log-structured file system", ACM Trans. Computer Systems
+10, 1, 26–52.
+
+Wandering Tree Problem
+----------------------
+In LFS, when file data is updated and written to the end of the log, its direct
+pointer block is updated due to the changed location. Then the indirect pointer
+block is also updated due to the direct pointer block update. In this manner,
+the upper index structures such as inode, inode map, and checkpoint block are
+also updated recursively. This is called the wandering tree problem [1]. To
+improve performance, a file system should eliminate or limit this update
+propagation as much as possible.
+
+[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
+
+Cleaning Overhead
+-----------------
+Since LFS is based on out-of-place writes, it produces many obsolete blocks
+scattered across the whole storage. In order to provide new empty log space, it
+needs to reclaim these obsolete blocks transparently to users. This job is
+called the cleaning process.
+
+The process consists of four operations as follows.
+
+1. A victim segment is selected by referencing the segment usage table.
+2. It loads parent index structures of all the data in the victim identified by
+   segment summary blocks.
+3. It checks the cross-reference between the data and its parent index structure.
+4. It moves valid data selectively.
+
+This cleaning job may cause unexpectedly long delays, so the most important goal
+is to hide the latencies from users. It should also reduce the amount of valid
+data to be moved, and move that data quickly.
+
+Key Features
+============
+
+Flash Awareness
+---------------
+- Enlarge the random write area for better performance, but provide high
+  spatial locality
+- Align FS data structures to the operational units in FTL on a best-effort basis
+
+Wandering Tree Problem
+----------------------
+- Use a term, “node”, that represents inodes as well as various pointer blocks
+- Introduce Node Address Table (NAT) containing the locations of all the “node”
+  blocks; this will cut off the update propagation.
+
+Cleaning Overhead
+-----------------
+- Support a background cleaning process
+- Support greedy and cost-benefit algorithms for victim selection policies
+- Support multi-head logs for static/dynamic hot and cold data separation
+- Introduce adaptive logging for efficient block allocation
+
+Mount Options
+=============
+
+
+====================== ============================================================
+background_gc=%s       Turn on/off cleaning operations, namely garbage
+                       collection, triggered in background when I/O subsystem is
+                       idle. If background_gc=on, it will turn on the garbage
+                       collection and if background_gc=off, garbage collection
+                       will be turned off. If background_gc=sync, it will turn
+                       on synchronous garbage collection running in background.
+                       Default value for this option is on. So garbage
+                       collection is on by default.
+disable_roll_forward   Disable the roll-forward recovery routine
+norecovery             Disable the roll-forward recovery routine, mounted read-
+                       only (i.e., -o ro,disable_roll_forward)
+discard/nodiscard      Enable/disable real-time discard in f2fs. If discard is
+                       enabled, f2fs will issue discard/TRIM commands when a
+                       segment is cleaned.
+no_heap                Disable heap-style segment allocation which finds free
+                       segments for data from the beginning of main area, while
+		       for node from the end of main area.
+nouser_xattr           Disable Extended User Attributes. Note: xattr is enabled
+                       by default if CONFIG_F2FS_FS_XATTR is selected.
+noacl                  Disable POSIX Access Control List. Note: acl is enabled
+                       by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
+active_logs=%u         Support configuring the number of active logs. In the
+                       current design, f2fs supports only 2, 4, and 6 logs.
+                       Default number is 6.
+disable_ext_identify   Disable the extension list configured by mkfs, so f2fs
+                       is not aware of cold files such as media files.
+inline_xattr           Enable the inline xattrs feature.
+noinline_xattr         Disable the inline xattrs feature.
+inline_xattr_size=%u   Support configuring the inline xattr size. It depends on
+                       the flexible inline xattr feature.
+inline_data            Enable the inline data feature: newly created small
+                       (<~3.4k) files can be written into the inode block.
+inline_dentry          Enable the inline dir feature: data in newly created
+                       directory entries can be written into the inode block.
+                       The space of the inode block which is used to store
+                       inline dentries is limited to ~3.4k.
+noinline_dentry        Disable the inline dentry feature.
+flush_merge            Merge concurrent cache_flush commands as much as possible
+                       to eliminate redundant command issues. If the underlying
+                       device handles the cache_flush command relatively slowly,
+                       enabling this option is recommended.
+nobarrier              This option can be used if the underlying storage
+                       guarantees that its cached data will be written to the
+                       non-volatile area. If this option is set, no cache_flush
+                       commands are issued, but f2fs still guarantees the write
+                       ordering of all the data writes.
+fastboot               This option is used when a system wants to reduce mount
+                       time as much as possible, even though normal performance
+		       can be sacrificed.
+extent_cache           Enable an extent cache based on rb-tree. It can cache
+                       many extents per inode, each mapping contiguous logical
+                       addresses to physical addresses, which increases the
+                       cache hit ratio. Set by default.
+noextent_cache         Disable an extent cache based on rb-tree explicitly, see
+                       the above extent_cache mount option.
+noinline_data          Disable the inline data feature. The inline data feature
+                       is enabled by default.
+data_flush             Enable data flushing before checkpoint in order to
+                       persist data of regular files and symlinks.
+reserve_root=%d        Support configuring reserved space which is used for
+                       allocation from a privileged user with specified uid or
+                       gid, unit: 4KB, the default limit is 0.2% of user blocks.
+resuid=%d              The user ID which may use the reserved blocks.
+resgid=%d              The group ID which may use the reserved blocks.
+fault_injection=%d     Enable fault injection in all supported types with
+                       specified injection rate.
+fault_type=%d          Support configuring fault injection type, should be
+                       enabled with fault_injection option, fault type value
+                       is shown below, it supports single or combined type.
+
+                       ===================	===========
+                       Type_Name		Type_Value
+                       ===================	===========
+                       FAULT_KMALLOC		0x000000001
+                       FAULT_KVMALLOC		0x000000002
+                       FAULT_PAGE_ALLOC		0x000000004
+                       FAULT_PAGE_GET		0x000000008
+                       FAULT_ALLOC_BIO		0x000000010
+                       FAULT_ALLOC_NID		0x000000020
+                       FAULT_ORPHAN		0x000000040
+                       FAULT_BLOCK		0x000000080
+                       FAULT_DIR_DEPTH		0x000000100
+                       FAULT_EVICT_INODE	0x000000200
+                       FAULT_TRUNCATE		0x000000400
+                       FAULT_READ_IO		0x000000800
+                       FAULT_CHECKPOINT		0x000001000
+                       FAULT_DISCARD		0x000002000
+                       FAULT_WRITE_IO		0x000004000
+                       ===================	===========
+mode=%s                Control block allocation mode which supports "adaptive"
+                       and "lfs". In "lfs" mode, there should be no random
+                       writes towards main area.
+io_bits=%u             Set the bit size of write IO requests. It should be set
+                       with "mode=lfs".
+usrquota               Enable plain user disk quota accounting.
+grpquota               Enable plain group disk quota accounting.
+prjquota               Enable plain project quota accounting.
+usrjquota=<file>       Appoint specified file and type during mount, so that quota
+grpjquota=<file>       information can be properly updated during recovery flow,
+prjjquota=<file>       <quota file>: must be in root directory;
+jqfmt=<quota type>     <quota type>: [vfsold,vfsv0,vfsv1].
+offusrjquota           Turn off user journalled quota.
+offgrpjquota           Turn off group journalled quota.
+offprjjquota           Turn off project journalled quota.
+quota                  Enable plain user disk quota accounting.
+noquota                Disable all plain disk quota option.
+whint_mode=%s          Control which write hints are passed down to block
+                       layer. This supports "off", "user-based", and
+                       "fs-based".  In "off" mode (default), f2fs does not pass
+                       down hints. In "user-based" mode, f2fs tries to pass
+                       down hints given by users. And in "fs-based" mode, f2fs
+                       passes down hints with its policy.
+alloc_mode=%s          Adjust block allocation policy, which supports "reuse"
+                       and "default".
+fsync_mode=%s          Control the policy of fsync. Currently supports "posix",
+                       "strict", and "nobarrier". In "posix" mode, which is
+                       default, fsync will follow POSIX semantics and does a
+                       light operation to improve the filesystem performance.
+                       In "strict" mode, fsync will be heavy and behaves in line
+                       with xfs, ext4 and btrfs, where xfstest generic/342 will
+                       pass, but the performance will regress. "nobarrier" is
+                       based on "posix", but doesn't issue flush commands for
+                       non-atomic files, like the "nobarrier" mount option.
+test_dummy_encryption  Enable dummy encryption, which provides a fake fscrypt
+                       context. The fake fscrypt context is used by xfstests.
+checkpoint=%s[:%u[%]]  Set to "disable" to turn off checkpointing. Set to "enable"
+                       to reenable checkpointing. Is enabled by default. While
+                       disabled, any unmounting or unexpected shutdowns will cause
+                       the filesystem contents to appear as they did when the
+                       filesystem was mounted with that option.
+                       While mounting with checkpoint=disable, the filesystem must
+                       run garbage collection to ensure that all available space can
+                       be used. If this takes too much time, the mount may return
+                       EAGAIN. You may optionally add a value to indicate how much
+                       of the disk you would be willing to temporarily give up to
+                       avoid additional garbage collection. This can be given as a
+                       number of blocks, or as a percent. For instance, mounting
+                       with checkpoint=disable:100% would always succeed, but it may
+                       hide up to all remaining free space. The actual space that
+                       would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable
+                       This space is reclaimed once checkpoint=enable.
+compress_algorithm=%s  Control the compression algorithm. Currently f2fs supports
+                       the "lzo" and "lz4" algorithms.
+compress_log_size=%u   Support configuring the compress cluster size. The size
+                       will be 4KB * (1 << %u); 16KB is the minimum size and
+                       also the default size.
+compress_extension=%s  Support adding specified extension, so that f2fs can enable
+                       compression on those corresponding files, e.g. if all files
+                       with '.ext' have a high compression rate, we can set '.ext'
+                       on the compression extension list and enable compression on
+                       these files by default rather than enabling it via ioctl.
+                       For other files, we can still enable compression via ioctl.
+====================== ============================================================
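Editor's note: two of the numeric options in the table above lend themselves to a quick calculation. The sketch below is illustrative only: the bit values are copied from the fault_type table, while the helper names (`fault_type_mask`, `compress_cluster_bytes`) are hypothetical and not part of f2fs or its tools.

```python
# Fault type bits copied from the fault_type table above.
FAULT_TYPES = {
    "FAULT_KMALLOC": 0x1,
    "FAULT_KVMALLOC": 0x2,
    "FAULT_PAGE_ALLOC": 0x4,
    "FAULT_PAGE_GET": 0x8,
    "FAULT_ALLOC_BIO": 0x10,
    "FAULT_ALLOC_NID": 0x20,
    "FAULT_ORPHAN": 0x40,
    "FAULT_BLOCK": 0x80,
    "FAULT_DIR_DEPTH": 0x100,
    "FAULT_EVICT_INODE": 0x200,
    "FAULT_TRUNCATE": 0x400,
    "FAULT_READ_IO": 0x800,
    "FAULT_CHECKPOINT": 0x1000,
    "FAULT_DISCARD": 0x2000,
    "FAULT_WRITE_IO": 0x4000,
}


def fault_type_mask(*names):
    """OR the named fault types together into one combined fault_type value."""
    mask = 0
    for name in names:
        mask |= FAULT_TYPES[name]
    return mask


def compress_cluster_bytes(log_size):
    """Cluster size for compress_log_size=%u, i.e. 4KB * (1 << log_size)."""
    size = 4096 * (1 << log_size)
    if size < 16 * 1024:
        raise ValueError("16KB is the minimum cluster size")
    return size


print(hex(fault_type_mask("FAULT_KMALLOC", "FAULT_PAGE_ALLOC")))  # 0x5
print(compress_cluster_bytes(2))  # 16384
```

A combined type is simply the bitwise OR of the single types, which is what "single or combined type" in the table refers to.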
+
+Debugfs Entries
+===============
+
+/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
+f2fs. Each file shows the whole f2fs information.
+
+/sys/kernel/debug/f2fs/status includes:
+
+ - major file system information managed by f2fs currently
+ - average SIT information about whole segments
+ - current memory footprint consumed by f2fs.
+
+Sysfs Entries
+=============
+
+Information about mounted f2fs file systems can be found in
+/sys/fs/f2fs.  Each mounted filesystem will have a directory in
+/sys/fs/f2fs based on its device name (e.g., /sys/fs/f2fs/sda).
+The files in each per-device directory are described in the ABI file below.
+
+Files in /sys/fs/f2fs/<devname>
+(see also Documentation/ABI/testing/sysfs-fs-f2fs)
+
+Usage
+=====
+
+1. Download userland tools and compile them.
+
+2. Skip this step if f2fs was compiled statically into the kernel.
+   Otherwise, insert the f2fs.ko module::
+
+	# insmod f2fs.ko
+
+3. Create a directory to use as the mount point::
+
+	# mkdir /mnt/f2fs
+
+4. Format the block device, and then mount as f2fs::
+
+	# mkfs.f2fs -l label /dev/block_device
+	# mount -t f2fs /dev/block_device /mnt/f2fs
+
+mkfs.f2fs
+---------
+mkfs.f2fs formats a partition as an f2fs filesystem, building a basic on-disk
+layout.
+
+The options consist of:
+
+===============    ===========================================================
+``-l [label]``     Give a volume label: a Unicode name of up to 512 characters.
+``-a [0 or 1]``    Split start location of each area for heap-based allocation.
+
+                   1 is set by default, which performs this.
+``-o [int]``       Set overprovision ratio in percent over volume size.
+
+                   5 is set by default.
+``-s [int]``       Set the number of segments per section.
+
+                   1 is set by default.
+``-z [int]``       Set the number of sections per zone.
+
+                   1 is set by default.
+``-e [str]``       Set basic extension list. e.g. "mp3,gif,mov"
+``-t [0 or 1]``    Disable discard command or not.
+
+                   1 is set by default, which conducts discard.
+===============    ===========================================================
+
+fsck.f2fs
+---------
+fsck.f2fs is a tool to check the consistency of an f2fs-formatted
+partition. It examines whether the filesystem metadata and user-made data
+are cross-referenced correctly.
+Note that the initial version of the tool does not fix any inconsistencies.
+
+The options consist of::
+
+  -d debug level [default:0]
+
+dump.f2fs
+---------
+dump.f2fs shows the information of a specific inode and dumps SSA and SIT to
+files named dump_ssa and dump_sit, respectively.
+
+dump.f2fs is used to debug on-disk data structures of the f2fs filesystem.
+It shows on-disk inode information identified by a given inode number, and is
+able to dump all the SSA and SIT entries into predefined files, ./dump_ssa and
+./dump_sit respectively.
+
+The options consist of::
+
+  -d debug level [default:0]
+  -i inode no (hex)
+  -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
+  -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
+
+Examples::
+
+    # dump.f2fs -i [ino] /dev/sdx
+    # dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
+    # dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
+
+Design
+======
+
+On-disk Layout
+--------------
+
+F2FS divides the whole volume into a number of segments, each of which is fixed
+to 2MB in size. A section is composed of consecutive segments, and a zone
+consists of a set of sections. By default, section and zone sizes are set to one
+segment size identically, but users can easily modify the sizes by mkfs.
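Editor's note: the segment, section and zone relationship described above reduces to two multiplications. This is a minimal sketch, assuming the `-s` and `-z` values of mkfs.f2fs are passed in directly; the function names are hypothetical.

```python
SEGMENT_BYTES = 2 * 1024 * 1024  # each segment is fixed to 2MB


def section_bytes(segs_per_section=1):
    """Size of one section; the default mirrors the mkfs.f2fs -s default of 1."""
    return segs_per_section * SEGMENT_BYTES


def zone_bytes(segs_per_section=1, secs_per_zone=1):
    """Size of one zone; the default mirrors the mkfs.f2fs -z default of 1."""
    return secs_per_zone * section_bytes(segs_per_section)


# With the defaults, section and zone are both one segment in size.
print(section_bytes(), zone_bytes())
# Four segments per section and two sections per zone.
print(zone_bytes(segs_per_section=4, secs_per_zone=2))
```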
+
+F2FS splits the entire volume into six areas, and all the areas except the
+superblock consist of multiple segments as described below::
+
+                                            align with the zone size <-|
+                 |-> align with the segment size
+     _________________________________________________________________________
+    |            |            |   Segment   |    Node     |   Segment  |      |
+    | Superblock | Checkpoint |    Info.    |   Address   |   Summary  | Main |
+    |    (SB)    |   (CP)     | Table (SIT) | Table (NAT) | Area (SSA) |      |
+    |____________|_____2______|______N______|______N______|______N_____|__N___|
+                                                                       .      .
+                                                             .                .
+                                                 .                            .
+                                    ._________________________________________.
+                                    |_Segment_|_..._|_Segment_|_..._|_Segment_|
+                                    .           .
+                                    ._________._________
+                                    |_section_|__...__|_
+                                    .            .
+		                    .________.
+	                            |__zone__|
+
+- Superblock (SB)
+   It is located at the beginning of the partition, and two copies exist
+   to guard against file system crashes. It contains basic partition information
+   and some default parameters of f2fs.
+
+- Checkpoint (CP)
+   It contains file system information, bitmaps for valid NAT/SIT sets, orphan
+   inode lists, and summary entries of current active segments.
+
+- Segment Information Table (SIT)
+   It contains segment information such as valid block count and bitmap for the
+   validity of all the blocks.
+
+- Node Address Table (NAT)
+   It is composed of a block address table for all the node blocks stored in
+   Main area.
+
+- Segment Summary Area (SSA)
+   It contains summary entries which contain the owner information of all the
+   data and node blocks stored in Main area.
+
+- Main Area
+   It contains file and directory data including their indices.
+
+In order to avoid misalignment between file system and flash-based storage, F2FS
+aligns the start block address of CP with the segment size. Also, it aligns the
+start block address of the Main area with the zone size by reserving some
+segments in the SSA area.
+
+Refer to the following survey for additional technical details.
+https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
+
+File System Metadata Structure
+------------------------------
+
+F2FS adopts the checkpointing scheme to maintain file system consistency. At
+mount time, F2FS first looks for the last valid checkpoint data by scanning the
+CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
+One of them always indicates the last valid data; this is called the shadow copy
+mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
+
+For file system consistency, each CP indicates which NAT and SIT copies are
+valid, as shown below::
+
+  +--------+----------+---------+
+  |   CP   |    SIT   |   NAT   |
+  +--------+----------+---------+
+  .         .          .          .
+  .            .              .              .
+  .               .                 .                 .
+  +-------+-------+--------+--------+--------+--------+
+  | CP #0 | CP #1 | SIT #0 | SIT #1 | NAT #0 | NAT #1 |
+  +-------+-------+--------+--------+--------+--------+
+     |             ^                          ^
+     |             |                          |
+     `----------------------------------------'
+
+Index Structure
+---------------
+
+The key data structure to manage the data locations is a "node". Similar to
+traditional file structures, F2FS has three types of node: inode, direct node,
+and indirect node. F2FS assigns 4KB to an inode block which contains 923 data
+block indices, two direct node pointers, two indirect node pointers, and one
+double indirect node pointer as described below. One direct node block points
+to 1018 data blocks, and one indirect node block points to 1018 node blocks.
+Thus, one inode block (i.e., a file) covers::
+
+  4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
+
+   Inode block (4KB)
+     |- data (923)
+     |- direct node (2)
+     |          `- data (1018)
+     |- indirect node (2)
+     |            `- direct node (1018)
+     |                       `- data (1018)
+     `- double indirect node (1)
+                         `- indirect node (1018)
+			              `- direct node (1018)
+	                                         `- data (1018)
+
+Note that all the node blocks are mapped by the NAT, which means the location of
+each node is translated by the NAT. Because of this indirection, F2FS is able to
+cut off the propagation of node updates caused by leaf data writes, which
+addresses the wandering tree problem.
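Editor's note: the 3.94TB figure in the Index Structure formula above can be re-derived by walking the pointer tree level by level. The sketch below uses only the numbers quoted in the text (923 in-inode indices, 1018 entries per node block, 4KB blocks); the constant and function names are illustrative, and "TB" is interpreted here as 2^40 bytes.

```python
BLOCK_BYTES = 4096        # one block is 4KB
ADDRS_PER_INODE = 923     # data block indices held in the inode block
ENTRIES_PER_NODE = 1018   # entries in one direct or indirect node block


def max_file_blocks():
    """Count the data blocks reachable from a single inode block."""
    in_inode = ADDRS_PER_INODE
    direct = 2 * ENTRIES_PER_NODE              # two direct node pointers
    indirect = 2 * ENTRIES_PER_NODE ** 2       # two indirect node pointers
    double_indirect = ENTRIES_PER_NODE ** 3    # one double indirect pointer
    return in_inode + direct + indirect + double_indirect


size_bytes = max_file_blocks() * BLOCK_BYTES
print(max_file_blocks())              # 1057053439
print(round(size_bytes / 2**40, 2))   # 3.94
```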
+
+Directory Structure
+-------------------
+
+A directory entry occupies 11 bytes and consists of the following attributes.
+
+- hash		hash value of the file name
+- ino		inode number
+- len		the length of file name
+- type		file type such as directory, symlink, etc
+
+A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
+used to represent whether each dentry is valid or not. A dentry block occupies
+4KB with the following composition.
+
+::
+
+  Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
+	              dentries(11 * 214 bytes) + file name (8 * 214 bytes)
+
+                         [Bucket]
+             +--------------------------------+
+             |dentry block 1 | dentry block 2 |
+             +--------------------------------+
+             .               .
+       .                             .
+  .       [Dentry Block Structure: 4KB]       .
+  +--------+----------+----------+------------+
+  | bitmap | reserved | dentries | file names |
+  +--------+----------+----------+------------+
+  [Dentry Block: 4KB] .   .
+		 .               .
+            .                          .
+            +------+------+-----+------+
+            | hash | ino  | len | type |
+            +------+------+-----+------+
+            [Dentry Structure: 11 bytes]
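Editor's note: the dentry block composition above adds up to exactly one 4KB block, which is easy to confirm. In the sketch below, the slot count, entry size and slot length come from the text; the per-field byte widths (4 + 4 + 2 + 1) and the `slots_for_name` helper are assumptions added for illustration and are not stated in this document.

```python
import math

NR_DENTRY_IN_BLOCK = 214   # dentry slots per block
DENTRY_BYTES = 11          # assumed split: hash 4 + ino 4 + len 2 + type 1
SLOT_NAME_BYTES = 8        # file name bytes stored per slot
RESERVED_BYTES = 3

# One validity bit per slot, rounded up to whole bytes.
BITMAP_BYTES = math.ceil(NR_DENTRY_IN_BLOCK / 8)


def dentry_block_bytes():
    """Total size of bitmap + reserved + dentries + file names."""
    return (BITMAP_BYTES + RESERVED_BYTES
            + DENTRY_BYTES * NR_DENTRY_IN_BLOCK
            + SLOT_NAME_BYTES * NR_DENTRY_IN_BLOCK)


def slots_for_name(name_len):
    """Consecutive slots needed to hold a file name of name_len bytes."""
    return math.ceil(name_len / SLOT_NAME_BYTES)


print(BITMAP_BYTES)          # 27
print(dentry_block_bytes())  # 4096
print(slots_for_name(20))    # 3
```

A name longer than 8 bytes therefore spans several consecutive slots, which is why creation looks for empty consecutive slots rather than a single free one.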
+
+F2FS implements multi-level hash tables for directory structure. Each level has
+a hash table with a dedicated number of hash buckets as shown below. Note that
+"A(2B)" means a bucket includes 2 data blocks.
+
+::
+
+    ----------------------
+    A : bucket
+    B : block
+    N : MAX_DIR_HASH_DEPTH
+    ----------------------
+
+    level #0   | A(2B)
+               |
+    level #1   | A(2B) - A(2B)
+               |
+    level #2   | A(2B) - A(2B) - A(2B) - A(2B)
+         .     |   .       .       .       .
+    level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
+         .     |   .       .       .       .
+    level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
+
+The number of blocks and buckets are determined by::
+
+                            ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
+  # of blocks in level #n = |
+                            `- 4, Otherwise
+
+                             ,- 2^(n + dir_level),
+			     |        if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
+  # of buckets in level #n = |
+                             `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
+			              Otherwise
+
+When F2FS looks up a file name in a directory, it first calculates a hash value
+of the file name. Then, F2FS scans the hash table in level #0 to find the
+dentry consisting of the file name and its inode number. If not found, F2FS
+scans the next hash table in level #1. In this way, F2FS scans the hash tables
+in each level incrementally from 1 to N. In each level, F2FS needs to scan only
+one bucket determined by the following equation, which shows O(log(# of files))
+complexity::
+
+  bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
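Editor's note: the three formulas above (blocks per level, buckets per level, bucket to scan) translate directly into code. This is a sketch of the formulas as written here, not a copy of the kernel implementation; the value 63 for MAX_DIR_HASH_DEPTH is an assumption, since the text only names the constant.

```python
MAX_DIR_HASH_DEPTH = 63  # assumed value; this document does not state it


def blocks_in_level(n):
    """Blocks per bucket in level n: 2 in the lower half of the levels, else 4."""
    return 2 if n < MAX_DIR_HASH_DEPTH // 2 else 4


def buckets_in_level(n, dir_level=0):
    """Buckets in level n: doubles per level until it reaches a fixed cap."""
    if n + dir_level < MAX_DIR_HASH_DEPTH // 2:
        return 1 << (n + dir_level)
    return 1 << (MAX_DIR_HASH_DEPTH // 2 - 1)


def bucket_to_scan(hash_value, n, dir_level=0):
    """The single bucket a lookup has to scan in level n."""
    return hash_value % buckets_in_level(n, dir_level)


for level in range(4):
    print(level, buckets_in_level(level), blocks_in_level(level))
print(bucket_to_scan(10, 2))  # 10 % 4 = 2
```

Because the bucket count doubles with each level while only one bucket per level is scanned, the number of buckets visited grows with the number of levels rather than with the number of entries.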
+
+In the case of file creation, F2FS finds empty consecutive slots that cover the
+file name. F2FS searches for the empty slots in the hash tables of all levels
+from 1 to N in the same way as the lookup operation.
+
+The following figure shows an example of two cases holding children::
+
+       --------------> Dir <--------------
+       |                                 |
+    child                             child
+
+    child - child                     [hole] - child
+
+    child - child - child             [hole] - [hole] - child
+
+   Case 1:                           Case 2:
+   Number of children = 6,           Number of children = 3,
+   File size = 7                     File size = 7
+
+Default Block Allocation
+------------------------
+
+At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
+and Hot/Warm/Cold data.
+
+- Hot node	contains direct node blocks of directories.
+- Warm node	contains direct node blocks except hot node blocks.
+- Cold node	contains indirect node blocks
+- Hot data	contains dentry blocks
+- Warm data	contains data blocks except hot and cold data blocks
+- Cold data	contains multimedia data or migrated data blocks
+
+LFS has two schemes for free space management: threaded log and
+copy-and-compaction. The copy-and-compaction scheme, which is known as cleaning,
+is well-suited for devices showing very good sequential write performance, since
+free segments are served all the time for writing new data. However, it suffers
+from cleaning overhead under high utilization. In contrast, the threaded log
+scheme suffers from random writes, but no cleaning process is needed. F2FS
+adopts a hybrid scheme where the copy-and-compaction scheme is adopted by
+default, but the policy is dynamically changed to the threaded log scheme
+according to the file system status.
+
+In order to align F2FS with underlying flash-based storage, F2FS allocates a
+segment in a unit of section. F2FS expects that the section size would be the
+same as the unit size of garbage collection in FTL. Furthermore, with respect
+to the mapping granularity in FTL, F2FS allocates each section of the active
+logs from different zones as much as possible, since FTL can write the data in
+the active logs into one allocation unit according to its mapping granularity.
+
+Cleaning process
+----------------
+
+F2FS does cleaning both on demand and in the background. On-demand cleaning is
+triggered when there are not enough free segments to serve VFS calls. The
+background cleaner is run by a kernel thread, and triggers the cleaning job when
+the system is idle.
+
+F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
+In the greedy algorithm, F2FS selects a victim segment having the smallest number
+of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
+according to the segment age and the number of valid blocks in order to address
+the log block thrashing problem in the greedy algorithm. F2FS adopts the greedy
+algorithm for the on-demand cleaner, while the background cleaner adopts the
+cost-benefit algorithm.
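Editor's note: the difference between the two victim selection policies can be seen on a toy segment list. The sketch below is an assumption-laden illustration: the cost-benefit score uses the classic (1 - u) * age / (1 + u) form from the Rosenblum and Ousterhout LFS paper quoted earlier, which may differ from what f2fs actually computes, and the `Segment` type and its `age` field are hypothetical.

```python
from dataclasses import dataclass

BLOCKS_PER_SEGMENT = 512  # 2MB segment / 4KB block


@dataclass
class Segment:
    segno: int
    valid_blocks: int
    age: int  # hypothetical: time since last modification, arbitrary units


def pick_greedy(segments):
    """Greedy: the segment with the smallest number of valid blocks."""
    return min(segments, key=lambda s: s.valid_blocks)


def pick_cost_benefit(segments):
    """Cost-benefit: weigh reclaimable space against age and copy cost."""
    def score(s):
        u = s.valid_blocks / BLOCKS_PER_SEGMENT  # utilization in [0, 1]
        return (1 - u) * s.age / (1 + u)
    return max(segments, key=score)


segments = [
    Segment(segno=0, valid_blocks=100, age=1),    # emptiest, but just written
    Segment(segno=1, valid_blocks=120, age=50),   # nearly as empty, much older
    Segment(segno=2, valid_blocks=400, age=100),  # oldest, but mostly valid
]
print(pick_greedy(segments).segno)        # 0
print(pick_cost_benefit(segments).segno)  # 1
```

Greedy picks the recently written segment 0, whose remaining valid blocks may soon be invalidated anyway; taking age into account steers the cleaner to segment 1 instead.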
+
+In order to identify whether the data in the victim segment are valid or not,
+F2FS manages a bitmap. Each bit represents the validity of a block, and the
+bitmap is composed of a bit stream covering all blocks in the main area.
+
+Write-hint Policy
+-----------------
+
+1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
+
+2) whint_mode=user-based. F2FS tries to pass down hints given by
+users.
+
+===================== ======================== ===================
+User                  F2FS                     Block
+===================== ======================== ===================
+                      META                     WRITE_LIFE_NOT_SET
+                      HOT_NODE                 "
+                      WARM_NODE                "
+                      COLD_NODE                "
+ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+extension list        "                        "
+
+-- buffered io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        "
+WRITE_LIFE_MEDIUM     "                        "
+WRITE_LIFE_LONG       "                        "
+
+-- direct io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
+
+3) whint_mode=fs-based. F2FS passes down hints with its policy.
+
+===================== ======================== ===================
+User                  F2FS                     Block
+===================== ======================== ===================
+                      META                     WRITE_LIFE_MEDIUM
+                      HOT_NODE                 WRITE_LIFE_NOT_SET
+                      WARM_NODE                "
+                      COLD_NODE                WRITE_LIFE_NONE
+ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+extension list        "                        "
+
+-- buffered io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
+WRITE_LIFE_NONE       "                        "
+WRITE_LIFE_MEDIUM     "                        "
+WRITE_LIFE_LONG       "                        "
+
+-- direct io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
+
+Fallocate(2) Policy
+-------------------
+
+The default policy follows the POSIX rule below.
+
+Allocating disk space
+    The default operation (i.e., mode is zero) of fallocate() allocates
+    the disk space within the range specified by offset and len.  The
+    file size (as reported by stat(2)) will be changed if offset+len is
+    greater than the file size.  Any subregion within the range specified
+    by offset and len that did not contain data before the call will be
+    initialized to zero.  This default behavior closely resembles the
+    behavior of the posix_fallocate(3) library function, and is intended
+    as a method of optimally implementing that function.
+
+However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) prior to
+fallocate(fd, DEFAULT_MODE), it allocates on-disk block addresses having
+zero or random data, which is useful in the scenario below:
+
+ 1. create(fd)
+ 2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
+ 3. fallocate(fd, 0, 0, size)
+ 4. address = fibmap(fd, offset)
+ 5. open(blkdev)
+ 6. write(blkdev, address)
+
+Compression implementation
+--------------------------
+
+- A new term, cluster, is defined as the basic unit of compression. A file can
+  be divided into multiple clusters logically. One cluster includes 4 << n
+  (n >= 0) logical pages. The compression size is also the cluster size, and
+  each cluster can be compressed or not.
+
+- In the cluster metadata layout, one special block address indicates whether
+  a cluster is compressed or normal. For a compressed cluster, the following
+  metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in which f2fs
+  stores data including the compress header and compressed data.
+
+- In order to eliminate write amplification during overwrite, F2FS only
+  supports compression on write-once files. Data can be compressed only when
+  all logical blocks in the file are valid and the cluster compress ratio is
+  lower than the specified threshold.
+
+- There are three ways to enable compression on a regular inode:
+
+  * chattr +c file
+  * chattr +c dir; touch dir/file
+  * mount w/ -o compress_extension=ext; touch file.ext
+
+Compress metadata layout::
+
+				[Dnode Structure]
+		+-----------------------------------------------+
+		| cluster 1 | cluster 2 | ......... | cluster N |
+		+-----------------------------------------------+
+		.           .                       .           .
+	.                       .                .                      .
+    .         Compressed Cluster       .        .        Normal Cluster            .
+    +----------+---------+---------+---------+  +---------+---------+---------+---------+
+    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+    +----------+---------+---------+---------+  +---------+---------+---------+---------+
+	    .                             .
+	    .                                           .
+	.                                                           .
+	+-------------+-------------+----------+----------------------------+
+	| data length | data chksum | reserved |      compressed data       |
+	+-------------+-------------+----------+----------------------------+
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
deleted file mode 100644
index 4eb3e2ddd00e..000000000000
--- a/Documentation/filesystems/f2fs.txt
+++ /dev/null
@@ -1,730 +0,0 @@
-================================================================================
-WHAT IS Flash-Friendly File System (F2FS)?
-================================================================================
-
-NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
-been equipped on a variety systems ranging from mobile to server systems. Since
-they are known to have different characteristics from the conventional rotating
-disks, a file system, an upper layer to the storage device, should adapt to the
-changes from the sketch in the design level.
-
-F2FS is a file system exploiting NAND flash memory-based storage devices, which
-is based on Log-structured File System (LFS). The design has been focused on
-addressing the fundamental issues in LFS, which are snowball effect of wandering
-tree and high cleaning overhead.
-
-Since a NAND flash memory-based storage device shows different characteristic
-according to its internal geometry or flash memory management scheme, namely FTL,
-F2FS and its tools support various parameters not only for configuring on-disk
-layout, but also for selecting allocation and cleaning algorithms.
-
-The following git tree provides the file system formatting tool (mkfs.f2fs),
-a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
->> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
-
-For reporting bugs and sending patches, please use the following mailing list:
->> linux-f2fs-devel@lists.sourceforge.net
-
-================================================================================
-BACKGROUND AND DESIGN ISSUES
-================================================================================
-
-Log-structured File System (LFS)
---------------------------------
-"A log-structured file system writes all modifications to disk sequentially in
-a log-like structure, thereby speeding up  both file writing and crash recovery.
-The log is the only structure on disk; it contains indexing information so that
-files can be read back from the log efficiently. In order to maintain large free
-areas on disk for fast writing, we divide  the log into segments and use a
-segment cleaner to compress the live information from heavily fragmented
-segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
-implementation of a log-structured file system", ACM Trans. Computer Systems
-10, 1, 26–52.
-
-Wandering Tree Problem
-----------------------
-In LFS, when a file data is updated and written to the end of log, its direct
-pointer block is updated due to the changed location. Then the indirect pointer
-block is also updated due to the direct pointer block update. In this manner,
-the upper index structures such as inode, inode map, and checkpoint block are
-also updated recursively. This problem is called as wandering tree problem [1],
-and in order to enhance the performance, it should eliminate or relax the update
-propagation as much as possible.
-
-[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
-
-Cleaning Overhead
------------------
-Since LFS is based on out-of-place writes, it produces so many obsolete blocks
-scattered across the whole storage. In order to serve new empty log space, it
-needs to reclaim these obsolete blocks seamlessly to users. This job is called
-as a cleaning process.
-
-The process consists of three operations as follows.
-1. A victim segment is selected through referencing segment usage table.
-2. It loads parent index structures of all the data in the victim identified by
-   segment summary blocks.
-3. It checks the cross-reference between the data and its parent index structure.
-4. It moves valid data selectively.
-
-This cleaning job may cause unexpected long delays, so the most important goal
-is to hide the latencies to users. And also definitely, it should reduce the
-amount of valid data to be moved, and move them quickly as well.
-
-================================================================================
-KEY FEATURES
-================================================================================
-
-Flash Awareness
----------------
-- Enlarge the random write area for better performance, but provide the high
-  spatial locality
-- Align FS data structures to the operational units in FTL as best efforts
-
-Wandering Tree Problem
-----------------------
-- Use a term, “node”, that represents inodes as well as various pointer blocks
-- Introduce Node Address Table (NAT) containing the locations of all the “node”
-  blocks; this will cut off the update propagation.
-
-Cleaning Overhead
------------------
-- Support a background cleaning process
-- Support greedy and cost-benefit algorithms for victim selection policies
-- Support multi-head logs for static/dynamic hot and cold data separation
-- Introduce adaptive logging for efficient block allocation
-
-================================================================================
-MOUNT OPTIONS
-================================================================================
-
-background_gc=%s       Turn on/off cleaning operations, namely garbage
-                       collection, triggered in background when I/O subsystem is
-                       idle. If background_gc=on, it will turn on the garbage
-                       collection and if background_gc=off, garbage collection
-                       will be turned off. If background_gc=sync, it will turn
-                       on synchronous garbage collection running in background.
-                       Default value for this option is on. So garbage
-                       collection is on by default.
-disable_roll_forward   Disable the roll-forward recovery routine
-norecovery             Disable the roll-forward recovery routine, mounted read-
-                       only (i.e., -o ro,disable_roll_forward)
-discard/nodiscard      Enable/disable real-time discard in f2fs, if discard is
-                       enabled, f2fs will issue discard/TRIM commands when a
-		       segment is cleaned.
-no_heap                Disable heap-style segment allocation which finds free
-                       segments for data from the beginning of main area, while
-		       for node from the end of main area.
-nouser_xattr           Disable Extended User Attributes. Note: xattr is enabled
-                       by default if CONFIG_F2FS_FS_XATTR is selected.
-noacl                  Disable POSIX Access Control List. Note: acl is enabled
-                       by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
-active_logs=%u         Support configuring the number of active logs. In the
-                       current design, f2fs supports only 2, 4, and 6 logs.
-                       Default number is 6.
-disable_ext_identify   Disable the extension list configured by mkfs, so f2fs
-                       does not aware of cold files such as media files.
-inline_xattr           Enable the inline xattrs feature.
-noinline_xattr         Disable the inline xattrs feature.
-inline_xattr_size=%u   Support configuring inline xattr size, it depends on
-		       flexible inline xattr feature.
-inline_data            Enable the inline data feature: New created small(<~3.4k)
-                       files can be written into inode block.
-inline_dentry          Enable the inline dir feature: data in new created
-                       directory entries can be written into inode block. The
-                       space of inode block which is used to store inline
-                       dentries is limited to ~3.4k.
-noinline_dentry        Disable the inline dentry feature.
-flush_merge	       Merge concurrent cache_flush commands as much as possible
-                       to eliminate redundant command issues. If the underlying
-		       device handles the cache_flush command relatively slowly,
-		       recommend to enable this option.
-nobarrier              This option can be used if underlying storage guarantees
-                       its cached data should be written to the novolatile area.
-		       If this option is set, no cache_flush commands are issued
-		       but f2fs still guarantees the write ordering of all the
-		       data writes.
-fastboot               This option is used when a system wants to reduce mount
-                       time as much as possible, even though normal performance
-		       can be sacrificed.
-extent_cache           Enable an extent cache based on rb-tree, it can cache
-                       as many as extent which map between contiguous logical
-                       address and physical address per inode, resulting in
-                       increasing the cache hit ratio. Set by default.
-noextent_cache         Disable an extent cache based on rb-tree explicitly, see
-                       the above extent_cache mount option.
-noinline_data          Disable the inline data feature, inline data feature is
-                       enabled by default.
-data_flush             Enable data flushing before checkpoint in order to
-                       persist data of regular and symlink.
-reserve_root=%d        Support configuring reserved space which is used for
-                       allocation from a privileged user with specified uid or
-                       gid, unit: 4KB, the default limit is 0.2% of user blocks.
-resuid=%d              The user ID which may use the reserved blocks.
-resgid=%d              The group ID which may use the reserved blocks.
-fault_injection=%d     Enable fault injection in all supported types with
-                       specified injection rate.
-fault_type=%d          Support configuring fault injection type, should be
-                       enabled with fault_injection option, fault type value
-                       is shown below, it supports single or combined type.
-                       Type_Name		Type_Value
-                       FAULT_KMALLOC		0x000000001
-                       FAULT_KVMALLOC		0x000000002
-                       FAULT_PAGE_ALLOC		0x000000004
-                       FAULT_PAGE_GET		0x000000008
-                       FAULT_ALLOC_BIO		0x000000010
-                       FAULT_ALLOC_NID		0x000000020
-                       FAULT_ORPHAN		0x000000040
-                       FAULT_BLOCK		0x000000080
-                       FAULT_DIR_DEPTH		0x000000100
-                       FAULT_EVICT_INODE	0x000000200
-                       FAULT_TRUNCATE		0x000000400
-                       FAULT_READ_IO		0x000000800
-                       FAULT_CHECKPOINT		0x000001000
-                       FAULT_DISCARD		0x000002000
-                       FAULT_WRITE_IO		0x000004000
-mode=%s                Control block allocation mode which supports "adaptive"
-                       and "lfs". In "lfs" mode, there should be no random
-                       writes towards main area.
-io_bits=%u             Set the bit size of write IO requests. It should be set
-                       with "mode=lfs".
-usrquota               Enable plain user disk quota accounting.
-grpquota               Enable plain group disk quota accounting.
-prjquota               Enable plain project quota accounting.
-usrjquota=<file>       Appoint specified file and type during mount, so that quota
-grpjquota=<file>       information can be properly updated during recovery flow,
-prjjquota=<file>       <quota file>: must be in root directory;
-jqfmt=<quota type>     <quota type>: [vfsold,vfsv0,vfsv1].
-offusrjquota           Turn off user journelled quota.
-offgrpjquota           Turn off group journelled quota.
-offprjjquota           Turn off project journelled quota.
-quota                  Enable plain user disk quota accounting.
-noquota                Disable all plain disk quota option.
-whint_mode=%s          Control which write hints are passed down to block
-                       layer. This supports "off", "user-based", and
-                       "fs-based".  In "off" mode (default), f2fs does not pass
-                       down hints. In "user-based" mode, f2fs tries to pass
-                       down hints given by users. And in "fs-based" mode, f2fs
-                       passes down hints with its policy.
-alloc_mode=%s          Adjust block allocation policy, which supports "reuse"
-                       and "default".
-fsync_mode=%s          Control the policy of fsync. Currently supports "posix",
-                       "strict", and "nobarrier". In "posix" mode, which is
-                       default, fsync will follow POSIX semantics and does a
-                       light operation to improve the filesystem performance.
-                       In "strict" mode, fsync will be heavy and behaves in line
-                       with xfs, ext4 and btrfs, where xfstest generic/342 will
-                       pass, but the performance will regress. "nobarrier" is
-                       based on "posix", but doesn't issue flush command for
-                       non-atomic files likewise "nobarrier" mount option.
-test_dummy_encryption  Enable dummy encryption, which provides a fake fscrypt
-                       context. The fake fscrypt context is used by xfstests.
-checkpoint=%s[:%u[%]]     Set to "disable" to turn off checkpointing. Set to "enable"
-                       to reenable checkpointing. Is enabled by default. While
-                       disabled, any unmounting or unexpected shutdowns will cause
-                       the filesystem contents to appear as they did when the
-                       filesystem was mounted with that option.
-                       While mounting with checkpoint=disabled, the filesystem must
-                       run garbage collection to ensure that all available space can
-                       be used. If this takes too much time, the mount may return
-                       EAGAIN. You may optionally add a value to indicate how much
-                       of the disk you would be willing to temporarily give up to
-                       avoid additional garbage collection. This can be given as a
-                       number of blocks, or as a percent. For instance, mounting
-                       with checkpoint=disable:100% would always succeed, but it may
-                       hide up to all remaining free space. The actual space that
-                       would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable
-                       This space is reclaimed once checkpoint=enable.
-compress_algorithm=%s  Control compress algorithm, currently f2fs supports "lzo"
-                       and "lz4" algorithm.
-compress_log_size=%u   Support configuring compress cluster size, the size will
-                       be 4KB * (1 << %u), 16KB is minimum size, also it's
-                       default size.
-compress_extension=%s  Support adding specified extension, so that f2fs can enable
-                       compression on those corresponding files, e.g. if all files
-                       with '.ext' has high compression rate, we can set the '.ext'
-                       on compression extension list and enable compression on
-                       these file by default rather than to enable it via ioctl.
-                       For other files, we can still enable compression via ioctl.
-
-================================================================================
-DEBUGFS ENTRIES
-================================================================================
-
-/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
-f2fs. Each file shows the whole f2fs information.
-
-/sys/kernel/debug/f2fs/status includes:
- - major file system information managed by f2fs currently
- - average SIT information about whole segments
- - current memory footprint consumed by f2fs.
-
-================================================================================
-SYSFS ENTRIES
-================================================================================
-
-Information about mounted f2fs file systems can be found in
-/sys/fs/f2fs.  Each mounted filesystem will have a directory in
-/sys/fs/f2fs based on its device name (i.e., /sys/fs/f2fs/sda).
-The files in each per-device directory are shown in table below.
-
-Files in /sys/fs/f2fs/<devname>
-(see also Documentation/ABI/testing/sysfs-fs-f2fs)
-
-================================================================================
-USAGE
-================================================================================
-
-1. Download userland tools and compile them.
-
-2. Skip, if f2fs was compiled statically inside kernel.
-   Otherwise, insert the f2fs.ko module.
- # insmod f2fs.ko
-
-3. Create a directory trying to mount
- # mkdir /mnt/f2fs
-
-4. Format the block device, and then mount as f2fs
- # mkfs.f2fs -l label /dev/block_device
- # mount -t f2fs /dev/block_device /mnt/f2fs
-
-mkfs.f2fs
----------
-The mkfs.f2fs is for the use of formatting a partition as the f2fs filesystem,
-which builds a basic on-disk layout.
-
-The options consist of:
--l [label]   : Give a volume label, up to 512 unicode name.
--a [0 or 1]  : Split start location of each area for heap-based allocation.
-               1 is set by default, which performs this.
--o [int]     : Set overprovision ratio in percent over volume size.
-               5 is set by default.
--s [int]     : Set the number of segments per section.
-               1 is set by default.
--z [int]     : Set the number of sections per zone.
-               1 is set by default.
--e [str]     : Set basic extension list. e.g. "mp3,gif,mov"
--t [0 or 1]  : Disable discard command or not.
-               1 is set by default, which conducts discard.
-
-fsck.f2fs
----------
-The fsck.f2fs is a tool to check the consistency of an f2fs-formatted
-partition, which examines whether the filesystem metadata and user-made data
-are cross-referenced correctly or not.
-Note that, initial version of the tool does not fix any inconsistency.
-
-The options consist of:
-  -d debug level [default:0]
-
-dump.f2fs
----------
-The dump.f2fs shows the information of specific inode and dumps SSA and SIT to
-file. Each file is dump_ssa and dump_sit.
-
-The dump.f2fs is used to debug on-disk data structures of the f2fs filesystem.
-It shows on-disk inode information recognized by a given inode number, and is
-able to dump all the SSA and SIT entries into predefined files, ./dump_ssa and
-./dump_sit respectively.
-
-The options consist of:
-  -d debug level [default:0]
-  -i inode no (hex)
-  -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
-  -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
-
-Examples:
-# dump.f2fs -i [ino] /dev/sdx
-# dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
-# dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
-
-================================================================================
-DESIGN
-================================================================================
-
-On-disk Layout
---------------
-
-F2FS divides the whole volume into a number of segments, each of which is fixed
-to 2MB in size. A section is composed of consecutive segments, and a zone
-consists of a set of sections. By default, section and zone sizes are set to one
-segment size identically, but users can easily modify the sizes by mkfs.
-
-F2FS splits the entire volume into six areas, and all the areas except superblock
-consists of multiple segments as described below.
-
-                                            align with the zone size <-|
-                 |-> align with the segment size
-     _________________________________________________________________________
-    |            |            |   Segment   |    Node     |   Segment  |      |
-    | Superblock | Checkpoint |    Info.    |   Address   |   Summary  | Main |
-    |    (SB)    |   (CP)     | Table (SIT) | Table (NAT) | Area (SSA) |      |
-    |____________|_____2______|______N______|______N______|______N_____|__N___|
-                                                                       .      .
-                                                             .                .
-                                                 .                            .
-                                    ._________________________________________.
-                                    |_Segment_|_..._|_Segment_|_..._|_Segment_|
-                                    .           .
-                                    ._________._________
-                                    |_section_|__...__|_
-                                    .            .
-		                    .________.
-	                            |__zone__|
-
-- Superblock (SB)
- : It is located at the beginning of the partition, and there exist two copies
-   to avoid file system crash. It contains basic partition information and some
-   default parameters of f2fs.
-
-- Checkpoint (CP)
- : It contains file system information, bitmaps for valid NAT/SIT sets, orphan
-   inode lists, and summary entries of current active segments.
-
-- Segment Information Table (SIT)
- : It contains segment information such as valid block count and bitmap for the
-   validity of all the blocks.
-
-- Node Address Table (NAT)
- : It is composed of a block address table for all the node blocks stored in
-   Main area.
-
-- Segment Summary Area (SSA)
- : It contains summary entries which contains the owner information of all the
-   data and node blocks stored in Main area.
-
-- Main Area
- : It contains file and directory data including their indices.
-
-In order to avoid misalignment between file system and flash-based storage, F2FS
-aligns the start block address of CP with the segment size. Also, it aligns the
-start block address of Main area with the zone size by reserving some segments
-in SSA area.
-
-Reference the following survey for additional technical details.
-https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
-
-File System Metadata Structure
-------------------------------
-
-F2FS adopts the checkpointing scheme to maintain file system consistency. At
-mount time, F2FS first tries to find the last valid checkpoint data by scanning
-CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
-One of them always indicates the last valid data, which is called as shadow copy
-mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
-
-For file system consistency, each CP points to which NAT and SIT copies are
-valid, as shown as below.
-
-  +--------+----------+---------+
-  |   CP   |    SIT   |   NAT   |
-  +--------+----------+---------+
-  .         .          .          .
-  .            .              .              .
-  .               .                 .                 .
-  +-------+-------+--------+--------+--------+--------+
-  | CP #0 | CP #1 | SIT #0 | SIT #1 | NAT #0 | NAT #1 |
-  +-------+-------+--------+--------+--------+--------+
-     |             ^                          ^
-     |             |                          |
-     `----------------------------------------'
-
-Index Structure
----------------
-
-The key data structure to manage the data locations is a "node". Similar to
-traditional file structures, F2FS has three types of node: inode, direct node,
-indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
-indices, two direct node pointers, two indirect node pointers, and one double
-indirect node pointer as described below. One direct node block contains 1018
-data blocks, and one indirect node block contains also 1018 node blocks. Thus,
-one inode block (i.e., a file) covers:
-
-  4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
-
-   Inode block (4KB)
-     |- data (923)
-     |- direct node (2)
-     |          `- data (1018)
-     |- indirect node (2)
-     |            `- direct node (1018)
-     |                       `- data (1018)
-     `- double indirect node (1)
-                         `- indirect node (1018)
-			              `- direct node (1018)
-	                                         `- data (1018)
-
-Note that, all the node blocks are mapped by NAT which means the location of
-each node is translated by the NAT table. In the consideration of the wandering
-tree problem, F2FS is able to cut off the propagation of node updates caused by
-leaf data writes.
-
-Directory Structure
--------------------
-
-A directory entry occupies 11 bytes, which consists of the following attributes.
-
-- hash		hash value of the file name
-- ino		inode number
-- len		the length of file name
-- type		file type such as directory, symlink, etc
-
-A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
-used to represent whether each dentry is valid or not. A dentry block occupies
-4KB with the following composition.
-
-  Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
-	              dentries(11 * 214 bytes) + file name (8 * 214 bytes)
-
-                         [Bucket]
-             +--------------------------------+
-             |dentry block 1 | dentry block 2 |
-             +--------------------------------+
-             .               .
-       .                             .
-  .       [Dentry Block Structure: 4KB]       .
-  +--------+----------+----------+------------+
-  | bitmap | reserved | dentries | file names |
-  +--------+----------+----------+------------+
-  [Dentry Block: 4KB] .   .
-		 .               .
-            .                          .
-            +------+------+-----+------+
-            | hash | ino  | len | type |
-            +------+------+-----+------+
-            [Dentry Structure: 11 bytes]
-
-F2FS implements multi-level hash tables for directory structure. Each level has
-a hash table with dedicated number of hash buckets as shown below. Note that
-"A(2B)" means a bucket includes 2 data blocks.
-
-----------------------
-A : bucket
-B : block
-N : MAX_DIR_HASH_DEPTH
-----------------------
-
-level #0   | A(2B)
-           |
-level #1   | A(2B) - A(2B)
-           |
-level #2   | A(2B) - A(2B) - A(2B) - A(2B)
-     .     |   .       .       .       .
-level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
-     .     |   .       .       .       .
-level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
-
-The number of blocks and buckets are determined by,
-
-                            ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
-  # of blocks in level #n = |
-                            `- 4, Otherwise
-
-                             ,- 2^(n + dir_level),
-			     |        if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
-  # of buckets in level #n = |
-                             `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
-			              Otherwise
-
-When F2FS finds a file name in a directory, at first a hash value of the file
-name is calculated. Then, F2FS scans the hash table in level #0 to find the
-dentry consisting of the file name and its inode number. If not found, F2FS
-scans the next hash table in level #1. In this way, F2FS scans hash tables in
-each levels incrementally from 1 to N. In each levels F2FS needs to scan only
-one bucket determined by the following equation, which shows O(log(# of files))
-complexity.
-
-  bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
-
-In the case of file creation, F2FS finds empty consecutive slots that cover the
-file name. F2FS searches the empty slots in the hash tables of whole levels from
-1 to N in the same way as the lookup operation.
-
-The following figure shows an example of two cases holding children.
-       --------------> Dir <--------------
-       |                                 |
-    child                             child
-
-    child - child                     [hole] - child
-
-    child - child - child             [hole] - [hole] - child
-
-   Case 1:                           Case 2:
-   Number of children = 6,           Number of children = 3,
-   File size = 7                     File size = 7
-
-Default Block Allocation
-------------------------
-
-At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
-and Hot/Warm/Cold data.
-
-- Hot node	contains direct node blocks of directories.
-- Warm node	contains direct node blocks except hot node blocks.
-- Cold node	contains indirect node blocks
-- Hot data	contains dentry blocks
-- Warm data	contains data blocks except hot and cold data blocks
-- Cold data	contains multimedia data or migrated data blocks
-
-LFS has two schemes for free space management: threaded log and copy-and-compac-
-tion. The copy-and-compaction scheme which is known as cleaning, is well-suited
-for devices showing very good sequential write performance, since free segments
-are served all the time for writing new data. However, it suffers from cleaning
-overhead under high utilization. Contrarily, the threaded log scheme suffers
-from random writes, but no cleaning process is needed. F2FS adopts a hybrid
-scheme where the copy-and-compaction scheme is adopted by default, but the
-policy is dynamically changed to the threaded log scheme according to the file
-system status.
-
-In order to align F2FS with underlying flash-based storage, F2FS allocates a
-segment in a unit of section. F2FS expects that the section size would be the
-same as the unit size of garbage collection in FTL. Furthermore, with respect
-to the mapping granularity in FTL, F2FS allocates each section of the active
-logs from different zones as much as possible, since FTL can write the data in
-the active logs into one allocation unit according to its mapping granularity.
-
-Cleaning process
-----------------
-
-F2FS does cleaning both on demand and in the background. On-demand cleaning is
-triggered when there are not enough free segments to serve VFS calls. Background
-cleaner is operated by a kernel thread, and triggers the cleaning job when the
-system is idle.
-
-F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
-In the greedy algorithm, F2FS selects a victim segment having the smallest number
-of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
-according to the segment age and the number of valid blocks in order to address
-log block thrashing problem in the greedy algorithm. F2FS adopts the greedy
-algorithm for on-demand cleaner, while background cleaner adopts cost-benefit
-algorithm.
-
-In order to identify whether the data in the victim segment are valid or not,
-F2FS manages a bitmap. Each bit represents the validity of a block, and the
-bitmap is composed of a bit stream covering whole blocks in main area.
-
-Write-hint Policy
------------------
-
-1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
-
-2) whint_mode=user-based. F2FS tries to pass down hints given by
-users.
-
-User                  F2FS                     Block
-----                  ----                     -----
-                      META                     WRITE_LIFE_NOT_SET
-                      HOT_NODE                 "
-                      WARM_NODE                "
-                      COLD_NODE                "
-*ioctl(COLD)          COLD_DATA                WRITE_LIFE_EXTREME
-*extension list       "                        "
-
--- buffered io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        "
-WRITE_LIFE_MEDIUM     "                        "
-WRITE_LIFE_LONG       "                        "
-
--- direct io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
-WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
-WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
-
-3) whint_mode=fs-based. F2FS passes down hints with its policy.
-
-User                  F2FS                     Block
-----                  ----                     -----
-                      META                     WRITE_LIFE_MEDIUM;
-                      HOT_NODE                 WRITE_LIFE_NOT_SET
-                      WARM_NODE                "
-                      COLD_NODE                WRITE_LIFE_NONE
-ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
-extension list        "                        "
-
--- buffered io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
-WRITE_LIFE_NONE       "                        "
-WRITE_LIFE_MEDIUM     "                        "
-WRITE_LIFE_LONG       "                        "
-
--- direct io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
-WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
-WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
-
-Fallocate(2) Policy
--------------------
-
-The default policy follows the below posix rule.
-
-Allocating disk space
-    The default operation (i.e., mode is zero) of fallocate() allocates
-    the disk space within the range specified by offset and len.  The
-    file size (as reported by stat(2)) will be changed if offset+len is
-    greater than the file size.  Any subregion within the range specified
-    by offset and len that did not contain data before the call will be
-    initialized to zero.  This default behavior closely resembles the
-    behavior of the posix_fallocate(3) library function, and is intended
-    as a method of optimally implementing that function.
-
-However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) in prior to
-fallocate(fd, DEFAULT_MODE), it allocates on-disk blocks addressess having
-zero or random data, which is useful to the below scenario where:
- 1. create(fd)
- 2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
- 3. fallocate(fd, 0, 0, size)
- 4. address = fibmap(fd, offset)
- 5. open(blkdev)
- 6. write(blkdev, address)
-
-Compression implementation
---------------------------
-
-- New term named cluster is defined as basic unit of compression, file can
-be divided into multiple clusters logically. One cluster includes 4 << n
-(n >= 0) logical pages, compression size is also cluster size, each of
-cluster can be compressed or not.
-
-- In cluster metadata layout, one special block address is used to indicate
-cluster is compressed one or normal one, for compressed cluster, following
-metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
-stores data including compress header and compressed data.
-
-- In order to eliminate write amplification during overwrite, F2FS only
-support compression on write-once file, data can be compressed only when
-all logical blocks in file are valid and cluster compress ratio is lower
-than specified threshold.
-
-- To enable compression on regular inode, there are three ways:
-* chattr +c file
-* chattr +c dir; touch dir/file
-* mount w/ -o compress_extension=ext; touch file.ext
-
-Compress metadata layout:
-                             [Dnode Structure]
-             +-----------------------------------------------+
-             | cluster 1 | cluster 2 | ......... | cluster N |
-             +-----------------------------------------------+
-             .           .                       .           .
-       .                       .                .                      .
-  .         Compressed Cluster       .        .        Normal Cluster            .
-+----------+---------+---------+---------+  +---------+---------+---------+---------+
-|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
-+----------+---------+---------+---------+  +---------+---------+---------+---------+
-           .                             .
-         .                                           .
-       .                                                           .
-      +-------------+-------------+----------+----------------------------+
-      | data length | data chksum | reserved |      compressed data       |
-      +-------------+-------------+----------+----------------------------+
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index aa2c3d1de3de..f69d20406be0 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -64,6 +64,7 @@ Documentation for filesystem implementations.
    erofs
    ext2
    ext3
+   f2fs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 720c2fc1ec7cb36bfc5326603522bc3955534773 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:05 +0100
Subject: docs: filesystems: convert gfs2.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Link: https://lore.kernel.org/r/6d7a296de025bcfed7a229da7f8cc1678944f304.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/gfs2.rst  | 53 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/gfs2.txt  | 45 -------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 54 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/filesystems/gfs2.rst
 delete mode 100644 Documentation/filesystems/gfs2.txt

diff --git a/Documentation/filesystems/gfs2.rst b/Documentation/filesystems/gfs2.rst
new file mode 100644
index 000000000000..8d1ab589ce18
--- /dev/null
+++ b/Documentation/filesystems/gfs2.rst
@@ -0,0 +1,53 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+Global File System
+==================
+
+https://fedorahosted.org/cluster/wiki/HomePage
+
+GFS is a cluster file system. It allows a cluster of computers to
+simultaneously use a block device that is shared between them (with FC,
+iSCSI, NBD, etc).  GFS reads and writes to the block device like a local
+file system, but also uses a lock module to allow the computers to coordinate
+their I/O so file system consistency is maintained.  One of the nifty
+features of GFS is perfect consistency -- changes made to the file system
+on one machine show up immediately on all other machines in the cluster.
+
+GFS uses interchangeable inter-node locking mechanisms. The currently
+supported mechanisms are:
+
+  lock_nolock
+    - allows gfs to be used as a local file system
+
+  lock_dlm
+    - uses a distributed lock manager (dlm) for inter-node locking.
+      The dlm is found at linux/fs/dlm/
+
+Lock_dlm depends on user space cluster management systems found
+at the URL above.
+
+To use gfs as a local file system, no external clustering systems are
+needed, simply::
+
+  $ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
+  $ mount -t gfs2 /dev/block_device /dir
+
+If you are using Fedora, you need to install the gfs2-utils package
+and, for lock_dlm, you will also need to install the cman package
+and write a cluster.conf as per the documentation. For F17 and above
+cman has been replaced by the dlm package.
+
+GFS2 is not on-disk compatible with previous versions of GFS, but it
+is pretty close.
+
+The following man pages can be found at the URL above:
+
+  ============		=============================================
+  fsck.gfs2		to repair a filesystem
+  gfs2_grow		to expand a filesystem online
+  gfs2_jadd		to add journals to a filesystem online
+  tunegfs2		to manipulate, examine and tune a filesystem
+  gfs2_convert		to convert a gfs filesystem to gfs2 in-place
+  mkfs.gfs2		to make a filesystem
+  ============		=============================================
diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.txt
deleted file mode 100644
index cc4f2306609e..000000000000
--- a/Documentation/filesystems/gfs2.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-Global File System
-------------------
-
-https://fedorahosted.org/cluster/wiki/HomePage
-
-GFS is a cluster file system. It allows a cluster of computers to
-simultaneously use a block device that is shared between them (with FC,
-iSCSI, NBD, etc).  GFS reads and writes to the block device like a local
-file system, but also uses a lock module to allow the computers coordinate
-their I/O so file system consistency is maintained.  One of the nifty
-features of GFS is perfect consistency -- changes made to the file system
-on one machine show up immediately on all other machines in the cluster.
-
-GFS uses interchangeable inter-node locking mechanisms, the currently
-supported mechanisms are:
-
-  lock_nolock -- allows gfs to be used as a local file system
-
-  lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
-  The dlm is found at linux/fs/dlm/
-
-Lock_dlm depends on user space cluster management systems found
-at the URL above.
-
-To use gfs as a local file system, no external clustering systems are
-needed, simply:
-
-  $ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
-  $ mount -t gfs2 /dev/block_device /dir
-
-If you are using Fedora, you need to install the gfs2-utils package
-and, for lock_dlm, you will also need to install the cman package
-and write a cluster.conf as per the documentation. For F17 and above
-cman has been replaced by the dlm package.
-
-GFS2 is not on-disk compatible with previous versions of GFS, but it
-is pretty close.
-
-The following man pages can be found at the URL above:
-  fsck.gfs2		to repair a filesystem
-  gfs2_grow		to expand a filesystem online
-  gfs2_jadd		to add journals to a filesystem online
-  tunegfs2		to manipulate, examine and tune a filesystem
-  gfs2_convert	to convert a gfs filesystem to gfs2 in-place
-  mkfs.gfs2		to make a filesystem
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f69d20406be0..f24befe78326 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -65,6 +65,7 @@ Documentation for filesystem implementations.
    ext2
    ext3
    f2fs
+   gfs2
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 5b7ac27a6e2c54cc09f479b616f1076afeae3c1b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:06 +0100
Subject: docs: filesystems: convert gfs2-uevents.txt to ReST

This document is almost in ReST format: all it needs is to have
the titles adjusted and add a SPDX header. In other words:

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Link: https://lore.kernel.org/r/1d1c46b7e86bd0a18d9abbea0de0bc2be84e5e2b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/gfs2-uevents.rst | 112 +++++++++++++++++++++++++++++
 Documentation/filesystems/gfs2-uevents.txt | 100 --------------------------
 Documentation/filesystems/index.rst        |   1 +
 3 files changed, 113 insertions(+), 100 deletions(-)
 create mode 100644 Documentation/filesystems/gfs2-uevents.rst
 delete mode 100644 Documentation/filesystems/gfs2-uevents.txt

diff --git a/Documentation/filesystems/gfs2-uevents.rst b/Documentation/filesystems/gfs2-uevents.rst
new file mode 100644
index 000000000000..f162a2c76c69
--- /dev/null
+++ b/Documentation/filesystems/gfs2-uevents.rst
@@ -0,0 +1,112 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+uevents and GFS2
+================
+
+During the lifetime of a GFS2 mount, a number of uevents are generated.
+This document explains what the events are and what they are used
+for (by gfs_controld in gfs2-utils).
+
+A list of GFS2 uevents
+======================
+
+1. ADD
+------
+
+The ADD event occurs at mount time. It will always be the first
+uevent generated by the newly created filesystem. If the mount
+is successful, an ONLINE uevent will follow.  If it is not successful
+then a REMOVE uevent will follow.
+
+The ADD uevent has two environment variables: SPECTATOR=[0|1]
+and RDONLY=[0|1] that specify the spectator status (a read-only mount
+with no journal assigned), and read-only (with journal assigned) status
+of the filesystem respectively.
+
+2. ONLINE
+---------
+
+The ONLINE uevent is generated after a successful mount or remount. It
+has the same environment variables as the ADD uevent. The ONLINE
+uevent, along with the two environment variables for spectator and
+RDONLY, is a relatively recent addition (2.6.32-rc+) and will not
+be generated by older kernels.
+
+3. CHANGE
+---------
+
+The CHANGE uevent is used in two places. One is when reporting the
+successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
+This is used as a signal by gfs_controld that it is then ok for other
+nodes in the cluster to mount the filesystem.
+
+The other CHANGE uevent is used to inform of the completion
+of journal recovery for one of the filesystem's journals. It has
+two environment variables, JID= which specifies the journal id which
+has just been recovered, and RECOVERY=[Done|Failed] to indicate the
+success (or otherwise) of the operation. These uevents are generated
+for every journal recovered, whether it is during the initial mount
+process or as the result of gfs_controld requesting a specific journal
+recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
+
+Because the CHANGE uevent was used (in early versions of gfs_controld)
+without checking the environment variables to discover the state, we
+cannot add any more functions to it without running the risk of
+someone using an older version of the user tools and breaking their
+cluster. For this reason the ONLINE uevent was used when adding a new
+uevent for a successful mount or remount.
+
+4. OFFLINE
+----------
+
+The OFFLINE uevent is only generated due to filesystem errors and is used
+as part of the "withdraw" mechanism. Currently this doesn't give any
+information about what the error is, which is something that needs to
+be fixed.
+
+5. REMOVE
+---------
+
+The REMOVE uevent is generated at the end of an unsuccessful mount
+or at the end of a umount of the filesystem. All REMOVE uevents will
+have been preceded by at least an ADD uevent for the same filesystem,
+and unlike the other uevents is generated automatically by the kernel's
+kobject subsystem.
+
+
+Information common to all GFS2 uevents (uevent environment variables)
+=====================================================================
+
+1. LOCKTABLE=
+--------------
+
+The LOCKTABLE is a string, as supplied on the mount command
+line (locktable=) or via fstab. It is used as a filesystem label
+as well as providing the information for a lock_dlm mount to be
+able to join the cluster.
+
+2. LOCKPROTO=
+-------------
+
+The LOCKPROTO is a string, and its value depends on what is set
+on the mount command line, or via fstab. It will be either
+lock_nolock or lock_dlm. In the future other lock managers
+may be supported.
+
+3. JOURNALID=
+-------------
+
+If a journal is in use by the filesystem (journals are not
+assigned for spectator mounts) then this will give the
+numeric journal id in all GFS2 uevents.
+
+4. UUID=
+--------
+
+With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
+into the filesystem superblock. If it exists, this will
+be included in every uevent relating to the filesystem.
+
+
+
diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt
deleted file mode 100644
index 19a19ebebc34..000000000000
--- a/Documentation/filesystems/gfs2-uevents.txt
+++ /dev/null
@@ -1,100 +0,0 @@
-                              uevents and GFS2
-                             ==================
-
-During the lifetime of a GFS2 mount, a number of uevents are generated.
-This document explains what the events are and what they are used
-for (by gfs_controld in gfs2-utils).
-
-A list of GFS2 uevents
------------------------
-
-1. ADD
-
-The ADD event occurs at mount time. It will always be the first
-uevent generated by the newly created filesystem. If the mount
-is successful, an ONLINE uevent will follow.  If it is not successful
-then a REMOVE uevent will follow.
-
-The ADD uevent has two environment variables: SPECTATOR=[0|1]
-and RDONLY=[0|1] that specify the spectator status (a read-only mount
-with no journal assigned), and read-only (with journal assigned) status
-of the filesystem respectively.
-
-2. ONLINE
-
-The ONLINE uevent is generated after a successful mount or remount. It
-has the same environment variables as the ADD uevent. The ONLINE
-uevent, along with the two environment variables for spectator and
-RDONLY are a relatively recent addition (2.6.32-rc+) and will not
-be generated by older kernels.
-
-3. CHANGE
-
-The CHANGE uevent is used in two places. One is when reporting the
-successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
-This is used as a signal by gfs_controld that it is then ok for other
-nodes in the cluster to mount the filesystem.
-
-The other CHANGE uevent is used to inform of the completion
-of journal recovery for one of the filesystems journals. It has
-two environment variables, JID= which specifies the journal id which
-has just been recovered, and RECOVERY=[Done|Failed] to indicate the
-success (or otherwise) of the operation. These uevents are generated
-for every journal recovered, whether it is during the initial mount
-process or as the result of gfs_controld requesting a specific journal
-recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
-
-Because the CHANGE uevent was used (in early versions of gfs_controld)
-without checking the environment variables to discover the state, we
-cannot add any more functions to it without running the risk of
-someone using an older version of the user tools and breaking their
-cluster. For this reason the ONLINE uevent was used when adding a new
-uevent for a successful mount or remount.
-
-4. OFFLINE
-
-The OFFLINE uevent is only generated due to filesystem errors and is used
-as part of the "withdraw" mechanism. Currently this doesn't give any
-information about what the error is, which is something that needs to
-be fixed.
-
-5. REMOVE
-
-The REMOVE uevent is generated at the end of an unsuccessful mount
-or at the end of a umount of the filesystem. All REMOVE uevents will
-have been preceded by at least an ADD uevent for the same filesystem,
-and unlike the other uevents is generated automatically by the kernel's
-kobject subsystem.
-
-
-Information common to all GFS2 uevents (uevent environment variables)
-----------------------------------------------------------------------
-
-1. LOCKTABLE=
-
-The LOCKTABLE is a string, as supplied on the mount command
-line (locktable=) or via fstab. It is used as a filesystem label
-as well as providing the information for a lock_dlm mount to be
-able to join the cluster.
-
-2. LOCKPROTO=
-
-The LOCKPROTO is a string, and its value depends on what is set
-on the mount command line, or via fstab. It will be either
-lock_nolock or lock_dlm. In the future other lock managers
-may be supported.
-
-3. JOURNALID=
-
-If a journal is in use by the filesystem (journals are not
-assigned for spectator mounts) then this will give the
-numeric journal id in all GFS2 uevents.
-
-4. UUID=
-
-With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
-into the filesystem superblock. If it exists, this will
-be included in every uevent relating to the filesystem.
-
-
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f24befe78326..c16e517e37c5 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -66,6 +66,7 @@ Documentation for filesystem implementations.
    ext3
    f2fs
    gfs2
+   gfs2-uevents
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From cdded7db3625c98e66316911947bd3a1941992e2 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:07 +0100
Subject: docs: filesystems: convert hfsplus.txt to ReST

Just trivial changes:

- Add a SPDX header;
- Add it to filesystems/index.rst.

While here, adjust document title, just to make it use the same
style of the other docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/4298409da951fbee000201a6c8d9c85e961b2b79.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/hfsplus.rst | 61 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/hfsplus.txt | 59 ---------------------------------
 Documentation/filesystems/index.rst   |  1 +
 3 files changed, 62 insertions(+), 59 deletions(-)
 create mode 100644 Documentation/filesystems/hfsplus.rst
 delete mode 100644 Documentation/filesystems/hfsplus.txt

diff --git a/Documentation/filesystems/hfsplus.rst b/Documentation/filesystems/hfsplus.rst
new file mode 100644
index 000000000000..f02f4f5fc020
--- /dev/null
+++ b/Documentation/filesystems/hfsplus.rst
@@ -0,0 +1,61 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Macintosh HFSPlus Filesystem for Linux
+======================================
+
+HFSPlus is a filesystem first introduced in MacOS 8.1.
+HFSPlus has several extensions to HFS, including 32-bit allocation
+blocks, 255-character unicode filenames, and file sizes of 2^63 bytes.
+
+
+Mount options
+=============
+
+When mounting an HFSPlus filesystem, the following options are accepted:
+
+  creator=cccc, type=cccc
+	Specifies the creator/type values as shown by the MacOS finder
+	used for creating new files.  Default values: '????'.
+
+  uid=n, gid=n
+	Specifies the user/group that owns all files on the filesystem
+	that have uninitialized permissions structures.
+	Default:  user/group id of the mounting process.
+
+  umask=n
+	Specifies the umask (in octal) used for files and directories
+	that have uninitialized permissions structures.
+	Default:  umask of the mounting process.
+
+  session=n
+	Select the CDROM session to mount as HFSPlus filesystem.  Defaults to
+	leaving that decision to the CDROM driver.  This option will fail
+	with anything but a CDROM as underlying devices.
+
+  part=n
+	Select partition number n from the devices.  This option only makes
+	sense for CDROMs because they can't be partitioned under Linux.
+	For disk devices the generic partition parsing code does this
+	for us.  Defaults to not parsing the partition table at all.
+
+  decompose
+	Decompose file name characters.
+
+  nodecompose
+	Do not decompose file name characters.
+
+  force
+	Used to force write access to volumes that are marked as journalled
+	or locked.  Use at your own risk.
+
+  nls=cccc
+	Encoding to use when presenting file names.
+
+
+References
+==========
+
+kernel source:		<file:fs/hfsplus>
+
+Apple Technote 1150	https://developer.apple.com/legacy/library/technotes/tn/tn1150.html
diff --git a/Documentation/filesystems/hfsplus.txt b/Documentation/filesystems/hfsplus.txt
deleted file mode 100644
index 59f7569fc9ed..000000000000
--- a/Documentation/filesystems/hfsplus.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-
-Macintosh HFSPlus Filesystem for Linux
-======================================
-
-HFSPlus is a filesystem first introduced in MacOS 8.1.
-HFSPlus has several extensions to HFS, including 32-bit allocation
-blocks, 255-character unicode filenames, and file sizes of 2^63 bytes.
-
-
-Mount options
-=============
-
-When mounting an HFSPlus filesystem, the following options are accepted:
-
-  creator=cccc, type=cccc
-	Specifies the creator/type values as shown by the MacOS finder
-	used for creating new files.  Default values: '????'.
-
-  uid=n, gid=n
-	Specifies the user/group that owns all files on the filesystem
-	that have uninitialized permissions structures.
-	Default:  user/group id of the mounting process.
-
-  umask=n
-	Specifies the umask (in octal) used for files and directories
-	that have uninitialized permissions structures.
-	Default:  umask of the mounting process.
-
-  session=n
-	Select the CDROM session to mount as HFSPlus filesystem.  Defaults to
-	leaving that decision to the CDROM driver.  This option will fail
-	with anything but a CDROM as underlying devices.
-
-  part=n
-	Select partition number n from the devices.  This option only makes
-	sense for CDROMs because they can't be partitioned under Linux.
-	For disk devices the generic partition parsing code does this
-	for us.  Defaults to not parsing the partition table at all.
-
-  decompose
-	Decompose file name characters.
-
-  nodecompose
-	Do not decompose file name characters.
-
-  force
-	Used to force write access to volumes that are marked as journalled
-	or locked.  Use at your own risk.
-
-  nls=cccc
-	Encoding to use when presenting file names.
-
-
-References
-==========
-
-kernel source:		<file:fs/hfsplus>
-
-Apple Technote 1150	https://developer.apple.com/legacy/library/technotes/tn/tn1150.html
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c16e517e37c5..c351bc8a8c85 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -67,6 +67,7 @@ Documentation for filesystem implementations.
    f2fs
    gfs2
    gfs2-uevents
+   hfsplus
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 5040a0acc8f2300ef35a1d9cc1c50a25235e061d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:08 +0100
Subject: docs: filesystems: convert hfs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Use notes markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8a625d6652d88809730020048d26c3b9333ddbdf.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/hfs.rst   | 87 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/hfs.txt   | 82 ----------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 88 insertions(+), 82 deletions(-)
 create mode 100644 Documentation/filesystems/hfs.rst
 delete mode 100644 Documentation/filesystems/hfs.txt

diff --git a/Documentation/filesystems/hfs.rst b/Documentation/filesystems/hfs.rst
new file mode 100644
index 000000000000..ab17a005e9b1
--- /dev/null
+++ b/Documentation/filesystems/hfs.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Macintosh HFS Filesystem for Linux
+==================================
+
+
+.. Note:: This filesystem doesn't have a maintainer.
+
+
+HFS stands for ``Hierarchical File System`` and is the filesystem used
+by the Mac Plus and all later Macintosh models.  Earlier Macintosh
+models used MFS (``Macintosh File System``), which is not supported.
+MacOS 8.1 and newer support a filesystem called HFS+ that's similar to
+HFS but is extended in various areas.  Use the hfsplus filesystem driver
+to access such filesystems from Linux.
+
+
+Mount options
+=============
+
+When mounting an HFS filesystem, the following options are accepted:
+
+  creator=cccc, type=cccc
+	Specifies the creator/type values as shown by the MacOS finder
+	used for creating new files.  Default values: '????'.
+
+  uid=n, gid=n
+	Specifies the user/group that owns all files on the filesystem.
+	Default:  user/group id of the mounting process.
+
+  dir_umask=n, file_umask=n, umask=n
+	Specifies the umask used for all files, all directories or all
+	files and directories.  Defaults to the umask of the mounting process.
+
+  session=n
+	Select the CDROM session to mount as an HFS filesystem.  Defaults to
+	leaving that decision to the CDROM driver.  This option will fail
+	with anything but a CDROM as the underlying device.
+
+  part=n
+	Select partition number n from the device.  This option only makes
+	sense for CDROMs because they can't be partitioned under Linux.
+	For disk devices the generic partition parsing code does this
+	for us.  Defaults to not parsing the partition table at all.
+
+  quiet
+	Ignore invalid mount options instead of complaining.
+
+
+Writing to HFS Filesystems
+==========================
+
+HFS is not a UNIX filesystem, thus it does not have the usual features you'd
+expect:
+
+ * You can't modify the set-uid, set-gid, sticky or executable bits or the uid
+   and gid of files.
+ * You can't create hard- or symlinks, device files, sockets or FIFOs.
+
+HFS does, on the other hand, have the concept of multiple forks per file.  These
+non-standard forks are represented as hidden additional files in the normal
+filesystem namespace, which is kind of a kludge and makes the semantics for
+them a little strange:
+
+ * You can't create, delete or rename resource forks of files or the
+   Finder's metadata.
+ * They are however created (with default values), deleted and renamed
+   along with the corresponding data fork or directory.
+ * Copying files to a different filesystem will lose those attributes
+   that are essential for MacOS to work.
+
+
+Creating HFS filesystems
+========================
+
+The hfsutils package from Robert Leslie contains a program called
+hformat that can be used to create an HFS filesystem. See
+<http://www.mars.org/home/rob/proj/hfs/> for details.
+
+
+Credits
+=======
+
+The HFS driver was written by Paul H. Hargrove (hargrove@sccm.Stanford.EDU).
+Roman Zippel (roman@ardistech.com) rewrote large parts of the code and brought
+in btree routines derived from Brad Boyer's hfsplus driver.
diff --git a/Documentation/filesystems/hfs.txt b/Documentation/filesystems/hfs.txt
deleted file mode 100644
index d096df6db07a..000000000000
--- a/Documentation/filesystems/hfs.txt
+++ /dev/null
@@ -1,82 +0,0 @@
-Note: This filesystem doesn't have a maintainer.
-
-Macintosh HFS Filesystem for Linux
-==================================
-
-HFS stands for ``Hierarchical File System'' and is the filesystem used
-by the Mac Plus and all later Macintosh models.  Earlier Macintosh
-models used MFS (``Macintosh File System''), which is not supported,
-MacOS 8.1 and newer support a filesystem called HFS+ that's similar to
-HFS but is extended in various areas.  Use the hfsplus filesystem driver
-to access such filesystems from Linux.
-
-
-Mount options
-=============
-
-When mounting an HFS filesystem, the following options are accepted:
-
-  creator=cccc, type=cccc
-	Specifies the creator/type values as shown by the MacOS finder
-	used for creating new files.  Default values: '????'.
-
-  uid=n, gid=n
-  	Specifies the user/group that owns all files on the filesystems.
-	Default:  user/group id of the mounting process.
-
-  dir_umask=n, file_umask=n, umask=n
-	Specifies the umask used for all files , all directories or all
-	files and directories.  Defaults to the umask of the mounting process.
-
-  session=n
-  	Select the CDROM session to mount as HFS filesystem.  Defaults to
-	leaving that decision to the CDROM driver.  This option will fail
-	with anything but a CDROM as underlying devices.
-
-  part=n
-  	Select partition number n from the devices.  Does only makes
-	sense for CDROMS because they can't be partitioned under Linux.
-	For disk devices the generic partition parsing code does this
-	for us.  Defaults to not parsing the partition table at all.
-
-  quiet
-  	Ignore invalid mount options instead of complaining.
-
-
-Writing to HFS Filesystems
-==========================
-
-HFS is not a UNIX filesystem, thus it does not have the usual features you'd
-expect:
-
- o You can't modify the set-uid, set-gid, sticky or executable bits or the uid
-   and gid of files.
- o You can't create hard- or symlinks, device files, sockets or FIFOs.
-
-HFS does on the other have the concepts of multiple forks per file.  These
-non-standard forks are represented as hidden additional files in the normal
-filesystems namespace which is kind of a cludge and makes the semantics for
-the a little strange:
-
- o You can't create, delete or rename resource forks of files or the
-   Finder's metadata.
- o They are however created (with default values), deleted and renamed
-   along with the corresponding data fork or directory.
- o Copying files to a different filesystem will loose those attributes
-   that are essential for MacOS to work.
-
-
-Creating HFS filesystems
-===================================
-
-The hfsutils package from Robert Leslie contains a program called
-hformat that can be used to create HFS filesystem. See
-<http://www.mars.org/home/rob/proj/hfs/> for details.
-
-
-Credits
-=======
-
-The HFS drivers was written by Paul H. Hargrovea (hargrove@sccm.Stanford.EDU).
-Roman Zippel (roman@ardistech.com) rewrote large parts of the code and brought
-in btree routines derived from Brad Boyer's hfsplus driver.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c351bc8a8c85..f776411340cb 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -67,6 +67,7 @@ Documentation for filesystem implementations.
    f2fs
    gfs2
    gfs2-uevents
+   hfs
    hfsplus
    fuse
    overlayfs
-- 
cgit 


From a1ef4bcd1664a9c1ae5191598b769ab37b93aa57 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:09 +0100
Subject: docs: filesystems: convert hpfs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/581019c3120938118aa55ba28902b62083c3f37a.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/hpfs.rst  | 353 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/hpfs.txt  | 296 ------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 354 insertions(+), 296 deletions(-)
 create mode 100644 Documentation/filesystems/hpfs.rst
 delete mode 100644 Documentation/filesystems/hpfs.txt

diff --git a/Documentation/filesystems/hpfs.rst b/Documentation/filesystems/hpfs.rst
new file mode 100644
index 000000000000..0db152278572
--- /dev/null
+++ b/Documentation/filesystems/hpfs.rst
@@ -0,0 +1,353 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Read/Write HPFS 2.09
+====================
+
+1998-2004, Mikulas Patocka
+
+:email: mikulas@artax.karlin.mff.cuni.cz
+:homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
+
+Credits
+=======
+Chris Smith, 1993, original read-only HPFS, some code and hpfs structures file
+	is taken from it
+
+Jacques Gelinas, MSDos mmap, Inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
+
+Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
+
+Mount options
+=============
+uid=xxx,gid=xxx,umask=xxx (default uid=gid=0 umask=default_system_umask)
+	Set owner/group/mode for files that do not have it specified in extended
+	attributes. Mode is inverted umask - for example umask 027 gives owner
+	all permission, group read permission and anybody else no access. Note
+	that for files mode is anded with 0666. If you want files to have 'x'
+	rights, you must use extended attributes.
+case=lower,asis (default asis)
+	File name lowercasing in readdir.
+conv=binary,text,auto (default binary)
+	CR/LF -> LF conversion, if auto, decision is made according to extension
+	- there is a list of text extensions (I think it's better to not convert
+	text file than to damage binary file). If you want to change that list,
+	change it in the source. Original readonly HPFS contained some strange
+	heuristic algorithm that I removed. I think it's dangerous to let the
+	computer decide whether file is text or binary. For example, DJGPP
+	binaries contain small text message at the beginning and they could be
+	misidentified and damaged under some circumstances.
+check=none,normal,strict (default normal)
+	Check level. Selecting none will cause only little speedup and big
+	danger. I tried to write it so that it won't crash if check=normal on
+	corrupted filesystems. check=strict means many superfluous checks -
+	used for debugging (for example it checks if file is allocated in
+	bitmaps when accessing it).
+errors=continue,remount-ro,panic (default remount-ro)
+	Behaviour when filesystem errors found.
+chkdsk=no,errors,always (default errors)
+	When to mark filesystem dirty so that OS/2 checks it.
+eas=no,ro,rw (default rw)
+	What to do with extended attributes. 'no' - ignore them and use always
+	values specified in uid/gid/mode options. 'ro' - read extended
+	attributes but do not create them. 'rw' - create extended attributes
+	when you use chmod/chown/chgrp/mknod/ln -s on the filesystem.
+timeshift=(-)nnn (default 0)
+	Shifts the time by nnn seconds. For example, if times under Linux appear
+	one hour later than under OS/2, use timeshift=-3600.
+
+
+File names
+==========
+
+As in OS/2, filenames are case insensitive. However, shell thinks that names
+are case sensitive, so for example when you create a file FOO, you can use
+'cat FOO', 'cat Foo', 'cat foo' or 'cat F*' but not 'cat f*'. Note, that you
+also won't be able to compile linux kernel (and maybe other things) on HPFS
+because kernel creates different files with names like bootsect.S and
+bootsect.s. When searching for a file whose name has characters >= 128, codepages
+are used - see below.
+OS/2 ignores dots and spaces at the end of file name, so this driver does as
+well. If you create 'a. ...', the file 'a' will be created, but you can still
+access it under names 'a.', 'a..', 'a .  . . ' etc.
+
+
+Extended attributes
+===================
+
+On HPFS partitions, OS/2 can associate with each file special information called
+extended attributes. Extended attributes are pairs of (key,value) where key is
+an ascii string identifying that attribute and value is any string of bytes of
+variable length. OS/2 stores window and icon positions and file types there. So
+why not use it for unix-specific info like file owner or access rights? This
+driver can do it. If you chown/chgrp/chmod on a hpfs partition, extended
+attributes with keys "UID", "GID" or "MODE" and 2-byte values are created. Only
+those extended attributes whose values differ from defaults specified in mount
+options are created. Once created, the extended attributes are never deleted,
+they're just changed. It means that when your default uid=0 and you type
+something like 'chown luser file; chown root file' the file will contain
+extended attribute UID=0. And when you umount the fs and mount it again with
+uid=luser_uid, the file will be still owned by root! If you chmod file to 444,
+extended attribute "MODE" will not be set, this special case is done by setting
+read-only flag. When you mknod a block or char device, besides "MODE", the
+special 4-byte extended attribute "DEV" will be created containing the device
+number. Currently this driver cannot resize extended attributes - it means
+that if somebody (I don't know who?) has set "UID", "GID", "MODE" or "DEV"
+attributes with different sizes, they won't be rewritten and changing these
+values doesn't work.
+
+
+Symlinks
+========
+
+You can do symlinks on HPFS partition, symlinks are achieved by setting extended
+attribute named "SYMLINK" with symlink value. Like on ext2, you can chown and
+chgrp symlinks but I don't know what is it good for. chmoding symlink results
+in chmoding file where symlink points. These symlinks are just for Linux use and
+incompatible with OS/2. OS/2 PmShell symlinks are not supported because they are
+stored in very crazy way. They tried to do it so that link changes when file is
+moved ... sometimes it works. But the link is partly stored in directory
+extended attributes and partly in OS2SYS.INI. I don't want (and don't know how)
+to analyze or change OS2SYS.INI.
+
+
+Codepages
+=========
+
+HPFS can contain several uppercasing tables for several codepages and each
+file has a pointer to codepage its name is in. However OS/2 was created in
+America where people don't care much about codepages and so multiple codepages
+support is quite buggy. I have Czech OS/2 working in codepage 852 on my disk.
+Once I booted English OS/2 working in cp 850 and I created a file on my 852
+partition. It marked file name codepage as 850 - good. But when I again booted
+Czech OS/2, the file was completely inaccessible under any name. It seems that
+OS/2 uppercases the search pattern with its system code page (852) and file
+name it's comparing to with its code page (850). These could never match. Is it
+really what IBM developers wanted? But problems continued. When I created in
+Czech OS/2 another file in that directory, that file was inaccessible too. OS/2
+probably uses different uppercasing method when searching where to place a file
+(note, that files in HPFS directory must be sorted) and when searching for
+a file. Finally when I opened this directory in PmShell, PmShell crashed (the
+funny thing was that, when rebooted, PmShell tried to reopen this directory
+again :-). chkdsk happily ignores these errors and only low-level disk
+modification saved me.  Never mix different language versions of OS/2 on one
+system although HPFS was designed to allow that.
+OK, I could implement complex codepage support to this driver but I think it
+would cause more problems than benefit with such buggy implementation in OS/2.
+So this driver simply uses first codepage it finds for uppercasing and
+lowercasing no matter what's file codepage index. Usually all file names are in
+this codepage - if you don't try to do what I described above :-)
+
+
+Known bugs
+==========
+
+HPFS386 on OS/2 server is not supported. HPFS386 installed on normal OS/2 client
+should work. If you have OS/2 server, use only read-only mode. I don't know how
+to handle some HPFS386 structures like access control list or extended perm
+list, I don't know how to delete them when file is deleted and how to not
+overwrite them with extended attributes. Send me some info on these structures
+and I'll make it. However, this driver should detect presence of HPFS386
+structures, remount read-only and not destroy them (I hope).
+
+When there's not enough space for extended attributes, they will be truncated
+and no error is returned.
+
+OS/2 can't access files if the path is longer than about 256 chars but this
+driver allows you to do it. chkdsk ignores such errors.
+
+Sometimes you won't be able to delete some files on a very full filesystem
+(returning error ENOSPC). That's because file in non-leaf node in directory tree
+(one directory, if it's large, has dirents in tree on HPFS) must be replaced
+with another node when deleted. And that new file might have larger name than
+the old one so the new name doesn't fit in directory node (dnode). And that
+would result in directory tree splitting, that takes disk space. Workaround is
+to delete other files that are leaf (probability that the file is non-leaf is
+about 1/50) or to truncate file first to make some space.
+You encounter this problem only if you have many directories so that
+preallocated directory band is full i.e.::
+
+	number_of_directories / size_of_filesystem_in_mb > 4.
+
+You can't delete open directories.
+
+You can't rename over directories (what is it good for?).
+
+Renaming files so that only case changes doesn't work. This driver supports it
+but vfs doesn't. Something like 'mv file FILE' won't work.
+
+All atimes and directory mtimes are not updated. That's because of performance
+reasons. If you extremely wish to update them, let me know, I'll write it (but
+it will be slow).
+
+When the system is out of memory and swap, it may slightly corrupt filesystem
+(lost files, unbalanced directories). (I guess all filesystems may do it).
+
+When compiled, you get warning: function declaration isn't a prototype. Does
+anybody know what does it mean?
+
+
+What does "unbalanced tree" message mean?
+=========================================
+
+Old versions of this driver created sometimes unbalanced dnode trees. OS/2
+chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
+unbalanced trees too :-) but both HPFS and HPFS386 contain bug that it rarely
+crashes when the tree is not balanced. This driver handles unbalanced trees
+correctly and writes warning if it finds them. If you see this message, this is
+probably because of directories created with old version of this driver.
+Workaround is to move all files from that directory to another and then back
+again. Do it in Linux, not OS/2! If you see this message in a directory that was
+wholly created by this driver, it is a BUG - let me know about it.
+
+
+Bugs in OS/2
+============
+
+When you have two (or more) lost directories pointing each to other, chkdsk
+locks up when repairing filesystem.
+
+Sometimes (I think it's random) when you create a file with one-char name under
+OS/2, OS/2 marks it as 'long'. chkdsk then removes this flag saying "Minor fs
+error corrected".
+
+File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
+marks them as short (and writes "minor fs error corrected"). This bug is not in
+HPFS386.
+
+Codepage bugs described above.
+
+
+If you don't install fixpacks, there are many, many more...
+
+
+History
+=======
+
+====== =========================================================================
+0.90   First public release
+0.91   Fixed bug that caused shooting to memory when write_inode was called on
+       open inode (rarely happened)
+0.92   Fixed a little memory leak in freeing directory inodes
+0.93   Fixed bug that locked up the machine when there were too many filenames
+       with first 15 characters same
+       Fixed write_file to zero file when writing behind file end
+0.94   Fixed a little memory leak when trying to delete busy file or directory
+0.95   Fixed a bug that i_hpfs_parent_dir was not updated when moving files
+1.90   First version for 2.1.1xx kernels
+1.91   Fixed a bug that chk_sectors failed when sectors were at the end of disk
+       Fixed a race-condition when write_inode is called while deleting file
+       Fixed a bug that could possibly happen (with very low probability) when
+       using 0xff in filenames.
+
+       Rewritten locking to avoid race-conditions
+
+       Mount option 'eas' now works
+
+       Fsync no longer returns error
+
+       Files beginning with '.' are marked hidden
+
+       Remount support added
+
+       Alloc is not so slow when filesystem becomes full
+
+       Atimes are no more updated because it slows down operation
+
+       Code cleanup (removed all commented debug prints)
+1.92   Corrected a bug when sync was called just before closing file
+1.93   Modified, so that it works with kernels >= 2.1.131, I don't know if it
+       works with previous versions
+
+       Fixed a possible problem with disks > 64G (but I don't have one, so I can't
+       test it)
+
+       Fixed a file overflow at 2G
+
+       Added new option 'timeshift'
+
+       Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
+       read-only mode
+
+       Fixed a bug that slowed down alloc and prevented allocating 100% space
+       (this bug was not destructive)
+1.94   Added workaround for one bug in Linux
+
+       Fixed one buffer leak
+
+       Fixed some incompatibilities with large extended attributes (but it's still
+       not 100% ok, I have no info on it and OS/2 doesn't want to create them)
+
+       Rewritten allocation
+
+       Fixed a bug with i_blocks (du sometimes didn't display correct values)
+
+       Directories have no longer archive attribute set (some programs don't like
+       it)
+
+       Fixed a bug that it set badly one flag in large anode tree (it was not
+       destructive)
+1.95   Fixed one buffer leak, that could happen on corrupted filesystem
+
+       Fixed one bug in allocation in 1.94
+1.96   Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
+       error sometimes when opening directories in PMSHELL)
+
+       Fixed a possible bitmap race
+
+       Fixed possible problem on large disks
+
+       You can now delete open files
+
+       Fixed a nondestructive race in rename
+1.97   Support for HPFS v3 (on large partitions)
+
+       Fixed a bug that it didn't allow creation of files > 128M
+       (it should be 2G)
+1.97.1 Changed names of global symbols
+
+       Fixed a bug when chmoding or chowning root directory
+1.98   Fixed a deadlock when using old_readdir
+       Better directory handling; workaround for "unbalanced tree" bug in OS/2
+1.99   Corrected a possible problem when there's not enough space while deleting
+       file
+
+       Now it tries to truncate the file if there's not enough space when
+       deleting
+
+       Removed a lot of redundant code
+2.00   Fixed a bug in rename (it was there since 1.96)
+       Better anti-fragmentation strategy
+2.01   Fixed problem with directory listing over NFS
+
+       Directory lseek now checks for proper parameters
+
+       Fixed race-condition in buffer code - it is in all filesystems in Linux;
+       when reading device (cat /dev/hda) while creating files on it, files
+       could be damaged
+2.02   Workaround for bug in breada in Linux. breada could cause accesses beyond
+       end of partition
+2.03   Char, block devices and pipes are correctly created
+
+       Fixed non-crashing race in unlink (Alexander Viro)
+
+       Now it works with Japanese version of OS/2
+2.04   Fixed error when ftruncate used to extend file
+2.05   Fixed crash when got mount parameters without =
+
+       Fixed crash when allocation of anode failed due to full disk
+
+       Fixed some crashes when block io or inode allocation failed
+2.06   Fixed some crash on corrupted disk structures
+
+       Better allocation strategy
+
+       Reschedule points added so that it doesn't lock CPU long time
+
+       It should work in read-only mode on Warp Server
+2.07   More fixes for Warp Server. Now it really works
+2.08   Creating new files is not so slow on large disks
+
+       An attempt to sync deleted file does not generate filesystem error
+2.09   Fixed error on extremely fragmented files
+====== =========================================================================
diff --git a/Documentation/filesystems/hpfs.txt b/Documentation/filesystems/hpfs.txt
deleted file mode 100644
index 74630bd504fb..000000000000
--- a/Documentation/filesystems/hpfs.txt
+++ /dev/null
@@ -1,296 +0,0 @@
-Read/Write HPFS 2.09
-1998-2004, Mikulas Patocka
-
-email: mikulas@artax.karlin.mff.cuni.cz
-homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
-
-CREDITS:
-Chris Smith, 1993, original read-only HPFS, some code and hpfs structures file
-	is taken from it
-Jacques Gelinas, MSDos mmap, Inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
-Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
-
-Mount options
-
-uid=xxx,gid=xxx,umask=xxx (default uid=gid=0 umask=default_system_umask)
-	Set owner/group/mode for files that do not have it specified in extended
-	attributes. Mode is inverted umask - for example umask 027 gives owner
-	all permission, group read permission and anybody else no access. Note
-	that for files mode is anded with 0666. If you want files to have 'x'
-	rights, you must use extended attributes.
-case=lower,asis (default asis)
-	File name lowercasing in readdir.
-conv=binary,text,auto (default binary)
-	CR/LF -> LF conversion, if auto, decision is made according to extension
-	- there is a list of text extensions (I thing it's better to not convert
-	text file than to damage binary file). If you want to change that list,
-	change it in the source. Original readonly HPFS contained some strange
-	heuristic algorithm that I removed. I thing it's danger to let the
-	computer decide whether file is text or binary. For example, DJGPP
-	binaries contain small text message at the beginning and they could be
-	misidentified and damaged under some circumstances.
-check=none,normal,strict (default normal)
-	Check level. Selecting none will cause only little speedup and big
-	danger. I tried to write it so that it won't crash if check=normal on
-	corrupted filesystems. check=strict means many superfluous checks -
-	used for debugging (for example it checks if file is allocated in
-	bitmaps when accessing it).
-errors=continue,remount-ro,panic (default remount-ro)
-	Behaviour when filesystem errors found.
-chkdsk=no,errors,always (default errors)
-	When to mark filesystem dirty so that OS/2 checks it.
-eas=no,ro,rw (default rw)
-	What to do with extended attributes. 'no' - ignore them and use always
-	values specified in uid/gid/mode options. 'ro' - read extended
-	attributes but do not create them. 'rw' - create extended attributes
-	when you use chmod/chown/chgrp/mknod/ln -s on the filesystem.
-timeshift=(-)nnn (default 0)
-	Shifts the time by nnn seconds. For example, if you see under linux
-	one hour more, than under os/2, use timeshift=-3600.
-
-
-File names
-
-As in OS/2, filenames are case insensitive. However, shell thinks that names
-are case sensitive, so for example when you create a file FOO, you can use
-'cat FOO', 'cat Foo', 'cat foo' or 'cat F*' but not 'cat f*'. Note, that you
-also won't be able to compile linux kernel (and maybe other things) on HPFS
-because kernel creates different files with names like bootsect.S and
-bootsect.s. When searching for file thats name has characters >= 128, codepages
-are used - see below.
-OS/2 ignores dots and spaces at the end of file name, so this driver does as
-well. If you create 'a. ...', the file 'a' will be created, but you can still
-access it under names 'a.', 'a..', 'a .  . . ' etc.
-
-
-Extended attributes
-
-On HPFS partitions, OS/2 can associate to each file a special information called
-extended attributes. Extended attributes are pairs of (key,value) where key is
-an ascii string identifying that attribute and value is any string of bytes of
-variable length. OS/2 stores window and icon positions and file types there. So
-why not use it for unix-specific info like file owner or access rights? This
-driver can do it. If you chown/chgrp/chmod on a hpfs partition, extended
-attributes with keys "UID", "GID" or "MODE" and 2-byte values are created. Only
-that extended attributes those value differs from defaults specified in mount
-options are created. Once created, the extended attributes are never deleted,
-they're just changed. It means that when your default uid=0 and you type
-something like 'chown luser file; chown root file' the file will contain
-extended attribute UID=0. And when you umount the fs and mount it again with
-uid=luser_uid, the file will be still owned by root! If you chmod file to 444,
-extended attribute "MODE" will not be set, this special case is done by setting
-read-only flag. When you mknod a block or char device, besides "MODE", the
-special 4-byte extended attribute "DEV" will be created containing the device
-number. Currently this driver cannot resize extended attributes - it means
-that if somebody (I don't know who?) has set "UID", "GID", "MODE" or "DEV"
-attributes with different sizes, they won't be rewritten and changing these
-values doesn't work.
-
-
-Symlinks
-
-You can do symlinks on HPFS partition, symlinks are achieved by setting extended
-attribute named "SYMLINK" with symlink value. Like on ext2, you can chown and
-chgrp symlinks but I don't know what is it good for. chmoding symlink results
-in chmoding file where symlink points. These symlinks are just for Linux use and
-incompatible with OS/2. OS/2 PmShell symlinks are not supported because they are
-stored in very crazy way. They tried to do it so that link changes when file is
-moved ... sometimes it works. But the link is partly stored in directory
-extended attributes and partly in OS2SYS.INI. I don't want (and don't know how)
-to analyze or change OS2SYS.INI.
-
-
-Codepages
-
-HPFS can contain several uppercasing tables for several codepages and each
-file has a pointer to codepage its name is in. However OS/2 was created in
-America where people don't care much about codepages and so multiple codepages
-support is quite buggy. I have Czech OS/2 working in codepage 852 on my disk.
-Once I booted English OS/2 working in cp 850 and I created a file on my 852
-partition. It marked file name codepage as 850 - good. But when I again booted
-Czech OS/2, the file was completely inaccessible under any name. It seems that
-OS/2 uppercases the search pattern with its system code page (852) and file
-name it's comparing to with its code page (850). These could never match. Is it
-really what IBM developers wanted? But problems continued. When I created in
-Czech OS/2 another file in that directory, that file was inaccessible too. OS/2
-probably uses different uppercasing method when searching where to place a file
-(note, that files in HPFS directory must be sorted) and when searching for
-a file. Finally when I opened this directory in PmShell, PmShell crashed (the
-funny thing was that, when rebooted, PmShell tried to reopen this directory
-again :-). chkdsk happily ignores these errors and only low-level disk
-modification saved me.  Never mix different language versions of OS/2 on one
-system although HPFS was designed to allow that.
-OK, I could implement complex codepage support to this driver but I think it
-would cause more problems than benefit with such buggy implementation in OS/2.
-So this driver simply uses first codepage it finds for uppercasing and
-lowercasing no matter what's file codepage index. Usually all file names are in
-this codepage - if you don't try to do what I described above :-)
-
-
-Known bugs
-
-HPFS386 on OS/2 server is not supported. HPFS386 installed on normal OS/2 client
-should work. If you have OS/2 server, use only read-only mode. I don't know how
-to handle some HPFS386 structures like access control list or extended perm
-list, I don't know how to delete them when file is deleted and how to not
-overwrite them with extended attributes. Send me some info on these structures
-and I'll make it. However, this driver should detect presence of HPFS386
-structures, remount read-only and not destroy them (I hope).
-
-When there's not enough space for extended attributes, they will be truncated
-and no error is returned.
-
-OS/2 can't access files if the path is longer than about 256 chars but this
-driver allows you to do it. chkdsk ignores such errors.
-
-Sometimes you won't be able to delete some files on a very full filesystem
-(returning error ENOSPC). That's because file in non-leaf node in directory tree
-(one directory, if it's large, has dirents in tree on HPFS) must be replaced
-with another node when deleted. And that new file might have larger name than
-the old one so the new name doesn't fit in directory node (dnode). And that
-would result in directory tree splitting, that takes disk space. Workaround is
-to delete other files that are leaf (probability that the file is non-leaf is
-about 1/50) or to truncate file first to make some space.
-You encounter this problem only if you have many directories so that
-preallocated directory band is full i.e.
-	number_of_directories / size_of_filesystem_in_mb > 4.
-
-You can't delete open directories.
-
-You can't rename over directories (what is it good for?).
-
-Renaming files so that only case changes doesn't work. This driver supports it
-but vfs doesn't. Something like 'mv file FILE' won't work.
-
-All atimes and directory mtimes are not updated. That's because of performance
-reasons. If you extremely wish to update them, let me know, I'll write it (but
-it will be slow).
-
-When the system is out of memory and swap, it may slightly corrupt filesystem
-(lost files, unbalanced directories). (I guess all filesystem may do it).
-
-When compiled, you get warning: function declaration isn't a prototype. Does
-anybody know what does it mean?
-
-
-What does "unbalanced tree" message mean?
-
-Old versions of this driver created sometimes unbalanced dnode trees. OS/2
-chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
-unbalanced trees too :-) but both HPFS and HPFS386 contain bug that it rarely
-crashes when the tree is not balanced. This driver handles unbalanced trees
-correctly and writes warning if it finds them. If you see this message, this is
-probably because of directories created with old version of this driver.
-Workaround is to move all files from that directory to another and then back
-again. Do it in Linux, not OS/2! If you see this message in directory that is
-whole created by this driver, it is BUG - let me know about it.
-
-
-Bugs in OS/2
-
-When you have two (or more) lost directories pointing each to other, chkdsk
-locks up when repairing filesystem.
-
-Sometimes (I think it's random) when you create a file with one-char name under
-OS/2, OS/2 marks it as 'long'. chkdsk then removes this flag saying "Minor fs
-error corrected".
-
-File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
-marks them as short (and writes "minor fs error corrected"). This bug is not in
-HPFS386.
-
-Codepage bugs described above.
-
-If you don't install fixpacks, there are many, many more...
-
-
-History
-
-0.90 First public release
-0.91 Fixed bug that caused shooting to memory when write_inode was called on
-	open inode (rarely happened)
-0.92 Fixed a little memory leak in freeing directory inodes
-0.93 Fixed bug that locked up the machine when there were too many filenames
-	with first 15 characters same
-     Fixed write_file to zero file when writing behind file end
-0.94 Fixed a little memory leak when trying to delete busy file or directory
-0.95 Fixed a bug that i_hpfs_parent_dir was not updated when moving files
-1.90 First version for 2.1.1xx kernels
-1.91 Fixed a bug that chk_sectors failed when sectors were at the end of disk
-     Fixed a race-condition when write_inode is called while deleting file
-     Fixed a bug that could possibly happen (with very low probability) when
-     	using 0xff in filenames
-     Rewritten locking to avoid race-conditions
-     Mount option 'eas' now works
-     Fsync no longer returns error
-     Files beginning with '.' are marked hidden
-     Remount support added
-     Alloc is not so slow when filesystem becomes full
-     Atimes are no more updated because it slows down operation
-     Code cleanup (removed all commented debug prints)
-1.92 Corrected a bug when sync was called just before closing file
-1.93 Modified, so that it works with kernels >= 2.1.131, I don't know if it
-	works with previous versions
-     Fixed a possible problem with disks > 64G (but I don't have one, so I can't
-     	test it)
-     Fixed a file overflow at 2G
-     Added new option 'timeshift'
-     Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
-     	read-only mode
-     Fixed a bug that slowed down alloc and prevented allocating 100% space
-     	(this bug was not destructive)
-1.94 Added workaround for one bug in Linux
-     Fixed one buffer leak
-     Fixed some incompatibilities with large extended attributes (but it's still
-	not 100% ok, I have no info on it and OS/2 doesn't want to create them)
-     Rewritten allocation
-     Fixed a bug with i_blocks (du sometimes didn't display correct values)
-     Directories have no longer archive attribute set (some programs don't like
-	it)
-     Fixed a bug that it set badly one flag in large anode tree (it was not
-	destructive)
-1.95 Fixed one buffer leak, that could happen on corrupted filesystem
-     Fixed one bug in allocation in 1.94
-1.96 Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
-	error sometimes when opening directories in PMSHELL)
-     Fixed a possible bitmap race
-     Fixed possible problem on large disks
-     You can now delete open files
-     Fixed a nondestructive race in rename
-1.97 Support for HPFS v3 (on large partitions)
-     Fixed a bug that it didn't allow creation of files > 128M (it should be 2G)
-1.97.1 Changed names of global symbols
-       Fixed a bug when chmoding or chowning root directory
-1.98 Fixed a deadlock when using old_readdir
-     Better directory handling; workaround for "unbalanced tree" bug in OS/2
-1.99 Corrected a possible problem when there's not enough space while deleting
-	file
-     Now it tries to truncate the file if there's not enough space when deleting
-     Removed a lot of redundant code
-2.00 Fixed a bug in rename (it was there since 1.96)
-     Better anti-fragmentation strategy
-2.01 Fixed problem with directory listing over NFS
-     Directory lseek now checks for proper parameters
-     Fixed race-condition in buffer code - it is in all filesystems in Linux;
-        when reading device (cat /dev/hda) while creating files on it, files
-        could be damaged
-2.02 Workaround for bug in breada in Linux. breada could cause accesses beyond
-        end of partition
-2.03 Char, block devices and pipes are correctly created
-     Fixed non-crashing race in unlink (Alexander Viro)
-     Now it works with Japanese version of OS/2
-2.04 Fixed error when ftruncate used to extend file
-2.05 Fixed crash when got mount parameters without =
-     Fixed crash when allocation of anode failed due to full disk
-     Fixed some crashes when block io or inode allocation failed
-2.06 Fixed some crash on corrupted disk structures
-     Better allocation strategy
-     Reschedule points added so that it doesn't lock CPU long time
-     It should work in read-only mode on Warp Server
-2.07 More fixes for Warp Server. Now it really works
-2.08 Creating new files is not so slow on large disks
-     An attempt to sync deleted file does not generate filesystem error
-2.09 Fixed error on extremely fragmented files
-
-
- vim: set textwidth=80:
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f776411340cb..3fbe2fa0b5c5 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -69,6 +69,7 @@ Documentation for filesystem implementations.
    gfs2-uevents
    hfs
    hfsplus
+   hpfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From de389cf08d4708d0a03516e5ce0e193f49f0b358 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:10 +0100
Subject: docs: filesystems: convert inotify.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust document title;
- Fix list markups;
- Some whitespace fixes and new line breaks;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/8f846843ecf1914988feb4d001e3a53d27dc1a65.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst   |  1 +
 Documentation/filesystems/inotify.rst | 90 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/inotify.txt | 79 ------------------------------
 3 files changed, 91 insertions(+), 79 deletions(-)
 create mode 100644 Documentation/filesystems/inotify.rst
 delete mode 100644 Documentation/filesystems/inotify.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 3fbe2fa0b5c5..5a737722652c 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -70,6 +70,7 @@ Documentation for filesystem implementations.
    hfs
    hfsplus
    hpfs
+   inotify
    fuse
    overlayfs
    virtiofs
diff --git a/Documentation/filesystems/inotify.rst b/Documentation/filesystems/inotify.rst
new file mode 100644
index 000000000000..7f7ef8af0e1e
--- /dev/null
+++ b/Documentation/filesystems/inotify.rst
@@ -0,0 +1,90 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================================================
+Inotify - A Powerful yet Simple File Change Notification System
+===============================================================
+
+
+
+Document started 15 Mar 2005 by Robert Love <rml@novell.com>
+
+Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>
+
+	- Deleted obsoleted interface, just refer to manpages for user interface.
+
+(i) Rationale
+
+Q:
+   What is the design decision behind not tying the watch to the open fd of
+   the watched object?
+
+A:
+   Watches are associated with an open inotify device, not an open file.
+   This solves the primary problem with dnotify: keeping the file open pins
+   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
+   for use on a desktop system with removable media as the media cannot be
+   unmounted.  Watching a file should not require that it be open.
+
+Q:
+   What is the design decision behind using an-fd-per-instance as opposed to
+   an fd-per-watch?
+
+A:
+   An fd-per-watch quickly consumes more file descriptors than are allowed,
+   more fd's than are feasible to manage, and more fd's than are optimally
+   select()-able.  Yes, root can bump the per-process fd limit and yes, users
+   can use epoll, but requiring both is a silly and extraneous requirement.
+   A watch consumes less memory than an open file, so separating the number
+   spaces is sensible.  The current design is what user-space developers
+   want: Users initialize inotify, once, and add n watches, requiring but one
+   fd and no twiddling with fd limits.  Initializing an inotify instance two
+   thousand times is silly.  If we can implement user-space's preferences
+   cleanly--and we can, the idr layer makes stuff like this trivial--then we
+   should.
+
+   There are other good arguments.  With a single fd, there is a single
+   item to block on, which is mapped to a single queue of events.  The single
+   fd returns all watch events and also any potential out-of-band data.  If
+   every fd was a separate watch,
+
+   - There would be no way to get event ordering.  Events on file foo and
+     file bar would pop poll() on both fd's, but there would be no way to tell
+     which happened first.  A single queue trivially gives you ordering.  Such
+     ordering is crucial to existing applications such as Beagle.  Imagine
+     "mv a b ; mv b a" events without ordering.
+
+   - We'd have to maintain n fd's and n internal queues with state,
+     versus just one.  It is a lot messier in the kernel.  A single, linear
+     queue is the data structure that makes sense.
+
+   - User-space developers prefer the current API.  The Beagle guys, for
+     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
+     to manage and block on 1000 fd's via select?
+
+   - No way to get out of band data.
+
+   - 1024 is still too low.  ;-)
+
+   When you talk about designing a file change notification system that
+   scales to 1000s of directories, juggling 1000s of fd's just does not seem
+   the right interface.  It is too heavy.
+
+   Additionally, it _is_ possible to create more than one instance and
+   juggle more than one queue and thus more than one associated fd.  There
+   need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
+   process can easily want more than one queue.
+
+Q:
+   Why the system call approach?
+
+A:
+   The poor user-space interface is the second biggest problem with dnotify.
+   Signals are a terrible, terrible interface for file notification.  Or for
+   anything, for that matter.  The ideal solution, from all perspectives, is a
+   file descriptor-based one that allows basic file I/O and poll/select.
+   Obtaining the fd and managing the watches could have been done either via a
+   device file or a family of new system calls.  We decided to implement a
+   family of system calls because that is the preferred approach for new kernel
+   interfaces.  The only real difference was whether we wanted to use open(2)
+   and ioctl(2) or a couple of new system calls.  System calls beat ioctls.
+
diff --git a/Documentation/filesystems/inotify.txt b/Documentation/filesystems/inotify.txt
deleted file mode 100644
index 51f61db787fb..000000000000
--- a/Documentation/filesystems/inotify.txt
+++ /dev/null
@@ -1,79 +0,0 @@
-				   inotify
-	    a powerful yet simple file change notification system
-
-
-
-Document started 15 Mar 2005 by Robert Love <rml@novell.com>
-Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>
-	--Deleted obsoleted interface, just refer to manpages for user interface.
-
-(i) Rationale
-
-Q: What is the design decision behind not tying the watch to the open fd of
-   the watched object?
-
-A: Watches are associated with an open inotify device, not an open file.
-   This solves the primary problem with dnotify: keeping the file open pins
-   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
-   for use on a desktop system with removable media as the media cannot be
-   unmounted.  Watching a file should not require that it be open.
-
-Q: What is the design decision behind using an-fd-per-instance as opposed to
-   an fd-per-watch?
-
-A: An fd-per-watch quickly consumes more file descriptors than are allowed,
-   more fd's than are feasible to manage, and more fd's than are optimally
-   select()-able.  Yes, root can bump the per-process fd limit and yes, users
-   can use epoll, but requiring both is a silly and extraneous requirement.
-   A watch consumes less memory than an open file, separating the number
-   spaces is thus sensible.  The current design is what user-space developers
-   want: Users initialize inotify, once, and add n watches, requiring but one
-   fd and no twiddling with fd limits.  Initializing an inotify instance two
-   thousand times is silly.  If we can implement user-space's preferences 
-   cleanly--and we can, the idr layer makes stuff like this trivial--then we 
-   should.
-
-   There are other good arguments.  With a single fd, there is a single
-   item to block on, which is mapped to a single queue of events.  The single
-   fd returns all watch events and also any potential out-of-band data.  If
-   every fd was a separate watch,
-
-   - There would be no way to get event ordering.  Events on file foo and
-     file bar would pop poll() on both fd's, but there would be no way to tell
-     which happened first.  A single queue trivially gives you ordering.  Such
-     ordering is crucial to existing applications such as Beagle.  Imagine
-     "mv a b ; mv b a" events without ordering.
-
-   - We'd have to maintain n fd's and n internal queues with state,
-     versus just one.  It is a lot messier in the kernel.  A single, linear
-     queue is the data structure that makes sense.
-
-   - User-space developers prefer the current API.  The Beagle guys, for
-     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
-     to manage and block on 1000 fd's via select?
-
-   - No way to get out of band data.
-
-   - 1024 is still too low.  ;-)
-
-   When you talk about designing a file change notification system that
-   scales to 1000s of directories, juggling 1000s of fd's just does not seem
-   the right interface.  It is too heavy.
-
-   Additionally, it _is_ possible to  more than one instance  and
-   juggle more than one queue and thus more than one associated fd.  There
-   need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
-   process can easily want more than one queue.
-
-Q: Why the system call approach?
-
-A: The poor user-space interface is the second biggest problem with dnotify.
-   Signals are a terrible, terrible interface for file notification.  Or for
-   anything, for that matter.  The ideal solution, from all perspectives, is a
-   file descriptor-based one that allows basic file I/O and poll/select.
-   Obtaining the fd and managing the watches could have been done either via a
-   device file or a family of new system calls.  We decided to implement a
-   family of system calls because that is the preferred approach for new kernel
-   interfaces.  The only real difference was whether we wanted to use open(2)
-   and ioctl(2) or a couple of new system calls.  System calls beat ioctls.
-
-- 
cgit 


From 76f216855b6bd1027e236b29cd7fece7336c37eb Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:11 +0100
Subject: docs: filesystems: convert isofs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/ec16dc09d0c23bb0c1af3d3f33a96896083a1d36.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |  1 +
 Documentation/filesystems/isofs.rst | 64 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/isofs.txt | 48 ----------------------------
 3 files changed, 65 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/filesystems/isofs.rst
 delete mode 100644 Documentation/filesystems/isofs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 5a737722652c..8c8813ada53f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -71,6 +71,7 @@ Documentation for filesystem implementations.
    hfsplus
    hpfs
    inotify
+   isofs
    fuse
    overlayfs
    virtiofs
diff --git a/Documentation/filesystems/isofs.rst b/Documentation/filesystems/isofs.rst
new file mode 100644
index 000000000000..08fd469091d4
--- /dev/null
+++ b/Documentation/filesystems/isofs.rst
@@ -0,0 +1,64 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+ISO9660 Filesystem
+==================
+
+Mount options that are the same as for msdos and vfat partitions.
+
+  =========	========================================================
+  gid=nnn	All files in the partition will be in group nnn.
+  uid=nnn	All files in the partition will be owned by user id nnn.
+  umask=nnn	The permission mask (see umask(1)) for the partition.
+  =========	========================================================
+
+Mount options that are the same as for vfat partitions. These are only useful
+when using discs encoded using Microsoft's Joliet extensions.
+
+ ==============	=============================================================
+ iocharset=name Character set to use for converting from Unicode to
+		ASCII.  Joliet filenames are stored in Unicode format, but
+		Unix for the most part doesn't know how to deal with Unicode.
+		There is also an option of doing UTF-8 translations with the
+		utf8 option.
+  utf8          Encode Unicode names in UTF-8 format. Default is no.
+ ==============	=============================================================
+
+Mount options unique to the isofs filesystem.
+
+ ================= ============================================================
+  block=512        Set the block size for the disk to 512 bytes
+  block=1024       Set the block size for the disk to 1024 bytes
+  block=2048       Set the block size for the disk to 2048 bytes
+  check=relaxed    Matches filenames with different cases
+  check=strict     Matches only filenames with the exact same case
+  cruft            Try to handle badly formatted CDs.
+  map=off          Do not map non-Rock Ridge filenames to lower case
+  map=normal       Map non-Rock Ridge filenames to lower case
+  map=acorn        As map=normal but also apply Acorn extensions if present
+  mode=xxx         Sets the permissions on files to xxx unless Rock Ridge
+		   extensions set the permissions otherwise
+  dmode=xxx        Sets the permissions on directories to xxx unless Rock Ridge
+		   extensions set the permissions otherwise
+  overriderockperm Set permissions on files and directories according to
+		   'mode' and 'dmode' even though Rock Ridge extensions are
+		   present.
+  nojoliet         Ignore Joliet extensions if they are present.
+  norock           Ignore Rock Ridge extensions if they are present.
+  hide		   Completely strip hidden files from the file system.
+  showassoc	   Show files marked with the 'associated' bit
+  unhide	   Deprecated; showing hidden files is now default;
+		   If given, it is a synonym for 'showassoc' which will
+		   recreate previous unhide behavior
+  session=x        Select number of session on multisession CD
+  sbsector=xxx     Session begins from sector xxx
+ ================= ============================================================
+
+Recommended documents about the ISO 9660 standard are located at:
+
+- http://www.y-adagio.com/
+- ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
+
+Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically
+identical with ISO 9660.", so it is a valid and gratis substitute for the
+official ISO specification.
diff --git a/Documentation/filesystems/isofs.txt b/Documentation/filesystems/isofs.txt
deleted file mode 100644
index ba0a93384de0..000000000000
--- a/Documentation/filesystems/isofs.txt
+++ /dev/null
@@ -1,48 +0,0 @@
-Mount options that are the same as for msdos and vfat partitions.
-
-  gid=nnn	All files in the partition will be in group nnn.
-  uid=nnn	All files in the partition will be owned by user id nnn.
-  umask=nnn	The permission mask (see umask(1)) for the partition.
-
-Mount options that are the same as vfat partitions. These are only useful
-when using discs encoded using Microsoft's Joliet extensions.
-  iocharset=name Character set to use for converting from Unicode to
-		ASCII.  Joliet filenames are stored in Unicode format, but
-		Unix for the most part doesn't know how to deal with Unicode.
-		There is also an option of doing UTF-8 translations with the
-		utf8 option.
-  utf8          Encode Unicode names in UTF-8 format. Default is no.
-
-Mount options unique to the isofs filesystem.
-  block=512     Set the block size for the disk to 512 bytes
-  block=1024    Set the block size for the disk to 1024 bytes
-  block=2048    Set the block size for the disk to 2048 bytes
-  check=relaxed Matches filenames with different cases
-  check=strict  Matches only filenames with the exact same case
-  cruft         Try to handle badly formatted CDs.
-  map=off       Do not map non-Rock Ridge filenames to lower case
-  map=normal    Map non-Rock Ridge filenames to lower case
-  map=acorn     As map=normal but also apply Acorn extensions if present
-  mode=xxx      Sets the permissions on files to xxx unless Rock Ridge
-		extensions set the permissions otherwise
-  dmode=xxx     Sets the permissions on directories to xxx unless Rock Ridge
-		extensions set the permissions otherwise
-  overriderockperm Set permissions on files and directories according to
-		'mode' and 'dmode' even though Rock Ridge extensions are
-		present.
-  nojoliet      Ignore Joliet extensions if they are present.
-  norock        Ignore Rock Ridge extensions if they are present.
-  hide		Completely strip hidden files from the file system.
-  showassoc	Show files marked with the 'associated' bit
-  unhide	Deprecated; showing hidden files is now default;
-		If given, it is a synonym for 'showassoc' which will
-		recreate previous unhide behavior
-  session=x     Select number of session on multisession CD
-  sbsector=xxx  Session begins from sector xxx
-
-Recommended documents about ISO 9660 standard are located at:
-http://www.y-adagio.com/
-ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
-Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically 
-identical with ISO 9660.", so it is a valid and gratis substitute of the
-official ISO specification.
-- 
cgit 


From 2640c19dcab0f6530007dfb4ee5870f5d61b0772 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:12 +0100
Subject: docs: filesystems: convert nilfs2.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust document title;
- Mark literal blocks as such;
- use :field: markup;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/f7989ca501585f5990fffd2d365cfca4fe9fdd6f.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst  |   3 +-
 Documentation/filesystems/nilfs2.rst | 286 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/nilfs2.txt | 276 ---------------------------------
 3 files changed, 288 insertions(+), 277 deletions(-)
 create mode 100644 Documentation/filesystems/nilfs2.rst
 delete mode 100644 Documentation/filesystems/nilfs2.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 8c8813ada53f..01587704fcc9 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -70,9 +70,10 @@ Documentation for filesystem implementations.
    hfs
    hfsplus
    hpfs
+   fuse
    inotify
    isofs
-   fuse
+   nilfs2
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/nilfs2.rst b/Documentation/filesystems/nilfs2.rst
new file mode 100644
index 000000000000..6c49f04e9e0a
--- /dev/null
+++ b/Documentation/filesystems/nilfs2.rst
@@ -0,0 +1,286 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+NILFS2
+======
+
+NILFS2 is a log-structured file system (LFS) supporting continuous
+snapshotting.  In addition to versioning capability of the entire file
+system, users can even restore files mistakenly overwritten or
+destroyed just a few seconds ago.  Since NILFS2 can keep consistency
+like conventional LFS, it achieves quick recovery after system
+crashes.
+
+NILFS2 creates a number of checkpoints every few seconds or on a
+per-synchronous-write basis (unless there is no change).  Users can select
+significant versions among continuously created checkpoints, and can
+change them into snapshots which will be preserved until they are
+changed back to checkpoints.
+
+There is no limit on the number of snapshots until the volume gets
+full.  Each snapshot is mountable as a read-only file system
+concurrently with its writable mount, and this feature is convenient
+for online backup.
+
+The userland tools are included in the nilfs-utils package, which is
+available from the following download page.  At least "mkfs.nilfs2",
+"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (the so-called
+cleaner or garbage collector) are required.  Details on the tools are
+described in the man pages included in the package.
+
+:Project web page:    https://nilfs.sourceforge.io/
+:Download page:       https://nilfs.sourceforge.io/en/download.html
+:List info:           http://vger.kernel.org/vger-lists.html#linux-nilfs
+
+Caveats
+=======
+
+Features which NILFS2 does not support yet:
+
+	- atime
+	- extended attributes
+	- POSIX ACLs
+	- quotas
+	- fsck
+	- defragmentation
+
+Mount options
+=============
+
+NILFS2 supports the following mount options:
+(*) == default
+
+======================= =======================================================
+barrier(*)		This enables/disables the use of write barriers.  This
+nobarrier		requires an IO stack which can support barriers, and
+			if nilfs gets an error on a barrier write, it will
+			disable again with a warning.
+errors=continue		Keep going on a filesystem error.
+errors=remount-ro(*)	Remount the filesystem read-only on an error.
+errors=panic		Panic and halt the machine if an error occurs.
+cp=n			Specify the checkpoint-number of the snapshot to be
+			mounted.  Checkpoints and snapshots are listed by lscp
+			user command.  Only the checkpoints marked as snapshot
+			are mountable with this option.  Snapshot is read-only,
+			so a read-only mount option must be specified together.
+order=relaxed(*)	Apply relaxed order semantics that allow modified data
+			blocks to be written to disk without making a
+			checkpoint if no metadata update is in progress.  This
+			mode is equivalent to the ordered data mode of the ext3
+			filesystem, except that updates to data blocks still
+			preserve atomicity.  This improves synchronous write
+			performance for overwriting.
+order=strict		Apply strict in-order semantics that preserve the
+			sequence of all file operations, including overwriting
+			of data blocks.  This guarantees that no event
+			overtakes an earlier one in the recovered file
+			system after a crash.
+norecovery		Disable recovery of the filesystem on mount.
+			This disables every write access on the device for
+			read-only mounts or snapshots.  This option will fail
+			for r/w mounts on an unclean volume.
+discard			This enables/disables the use of discard/TRIM commands.
+nodiscard(*)		The discard/TRIM commands are sent to the underlying
+			block device when blocks are freed.  This is useful
+			for SSD devices and sparse/thinly-provisioned LUNs.
+======================= =======================================================
+
+Ioctls
+======
+
+Some NILFS2-specific functionality can be accessed by applications
+through the ioctl system call interface.  All NILFS2-specific ioctls are
+listed in the table below.
+
+Table of NILFS2 specific ioctls:
+
+ ============================== ===============================================
+ Ioctl			        Description
+ ============================== ===============================================
+ NILFS_IOCTL_CHANGE_CPMODE      Change mode of given checkpoint between
+			        checkpoint and snapshot state. This ioctl is
+			        used in chcp and mkcp utilities.
+
+ NILFS_IOCTL_DELETE_CHECKPOINT  Remove checkpoint from NILFS2 file system.
+			        This ioctl is used in rmcp utility.
+
+ NILFS_IOCTL_GET_CPINFO         Return info about requested checkpoints. This
+			        ioctl is used in lscp utility and by
+			        nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_CPSTAT         Return checkpoints statistics. This ioctl is
+			        used by lscp, rmcp utilities and by
+			        nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_SUINFO         Return segment usage info about requested
+			        segments. This ioctl is used in lssu,
+			        nilfs_resize utilities and by nilfs_cleanerd
+			        daemon.
+
+ NILFS_IOCTL_SET_SUINFO         Modify segment usage info of requested
+				segments. This ioctl is used by
+				nilfs_cleanerd daemon to skip unnecessary
+				cleaning operation of segments and reduce
+				performance penalty or wear of flash device
+				due to redundant move of in-use blocks.
+
+ NILFS_IOCTL_GET_SUSTAT         Return segment usage statistics. This ioctl
+			        is used in lssu, nilfs_resize utilities and
+			        by nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_VINFO          Return information on virtual block addresses.
+			        This ioctl is used by nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_BDESCS         Return information about descriptors of disk
+			        block numbers. This ioctl is used by
+			        nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_CLEAN_SEGMENTS     Do garbage collection operation in the
+			        environment of requested parameters from
+			        userspace. This ioctl is used by
+			        nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_SYNC               Make a checkpoint. This ioctl is used in
+			        mkcp utility.
+
+ NILFS_IOCTL_RESIZE             Resize NILFS2 volume. This ioctl is used
+			        by nilfs_resize utility.
+
+ NILFS_IOCTL_SET_ALLOC_RANGE    Define lower limit of segments in bytes and
+			        upper limit of segments in bytes. This ioctl
+			        is used by nilfs_resize utility.
+ ============================== ===============================================
+
+NILFS2 usage
+============
+
+To use nilfs2 as a local file system, simply::
+
+ # mkfs -t nilfs2 /dev/block_device
+ # mount -t nilfs2 /dev/block_device /dir
+
+This will also invoke the cleaner through the mount helper program
+(mount.nilfs2).
+
+Checkpoints and snapshots are managed by the following commands.
+Their manpages are included in the nilfs-utils package above.
+
+  ====     ===========================================================
+  lscp     list checkpoints or snapshots.
+  mkcp     make a checkpoint or a snapshot.
+  chcp     change an existing checkpoint to a snapshot or vice versa.
+  rmcp     invalidate specified checkpoint(s).
+  ====     ===========================================================
+
+To mount a snapshot::
+
+ # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
+
+where <cno> is the checkpoint number of the snapshot.
+
+To unmount the NILFS2 mount point or snapshot, simply::
+
+ # umount /dir
+
+Then, the cleaner daemon is automatically shut down by the umount
+helper program (umount.nilfs2).
+
+Disk format
+===========
+
+A nilfs2 volume is equally divided into a number of segments except
+for the super block (SB) and segment #0.  A segment is the container
+of logs.  Each log is composed of summary information blocks, payload
+blocks, and an optional super root block (SR)::
+
+   ______________________________________________________
+  | |SB| | Segment | Segment | Segment | ... | Segment | |
+  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
+  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
+       .             .            (Typical offsets for 4KB-block)
+    .                  .
+  .______________________.
+  | log | log |... | log |
+  |__1__|__2__|____|__m__|
+        .       .
+      .               .
+    .                       .
+  .______________________________.
+  | Summary | Payload blocks  |SR|
+  |_blocks__|_________________|__|
+
+The payload blocks are organized per file, and each file consists of
+data blocks and B-tree node blocks::
+
+    |<---       File-A        --->|<---       File-B        --->|
+   _______________________________________________________________
+    | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
+   _|_____________|_______________|_____________|_______________|_
+
+
+Since only the modified blocks are written in the log, it may have
+files without data blocks or B-tree node blocks.
+
+The organization of the blocks is recorded in the summary information
+blocks, which contain a header structure (nilfs_segment_summary), per-file
+structures (nilfs_finfo), and per-block structures (nilfs_binfo)::
+
+  _________________________________________________________________________
+ | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
+ |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___
+
+
+The logs include regular files, directory files, symbolic link files
+and several metadata files.  The metadata files are the files used
+to maintain file system metadata.  The current version of NILFS2 uses
+the following metadata files::
+
+ 1) Inode file (ifile)             -- Stores on-disk inodes
+ 2) Checkpoint file (cpfile)       -- Stores checkpoints
+ 3) Segment usage file (sufile)    -- Stores allocation state of segments
+ 4) Data address translation file  -- Maps virtual block numbers to usual
+    (DAT)                             block numbers.  This file serves to
+                                      make on-disk blocks relocatable.
+
+The following figure shows a typical organization of the logs::
+
+  _________________________________________________________________________
+ | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
+ |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|
+
+
+To stride over segment boundaries, this sequence of files may be split
+into multiple logs.  A sequence of logs that should be treated as
+logically one log is delimited with flags marked in the segment
+summary.  The recovery code of nilfs2 looks at this boundary information
+to ensure atomicity of updates.
+
+The super root block is inserted for every checkpoint.  It includes
+three special inodes: those of the DAT, cpfile, and sufile.  Inodes
+of regular files, directories, symlinks and other special files are
+included in the ifile.  The inode of the ifile itself is included in the
+corresponding checkpoint entry in the cpfile.  Thus, the hierarchy
+among NILFS2 files can be depicted as follows::
+
+  Super block (SB)
+       |
+       v
+  Super root block (the latest cno=xx)
+       |-- DAT
+       |-- sufile
+       `-- cpfile
+              |-- ifile (cno=c1)
+              |-- ifile (cno=c2) ---- file (ino=i1)
+              :        :          |-- file (ino=i2)
+              `-- ifile (cno=xx)  |-- file (ino=i3)
+                                  :        :
+                                  `-- file (ino=yy)
+                                    ( regular file, directory, or symlink )
+
+For details on the format of each file, please see nilfs2_ondisk.h
+located in the include/uapi/linux directory.
+
+There are no patents or other intellectual property that we protect
+with regard to the design of NILFS2.  It is allowed to replicate the
+design in hopes that other operating systems could share (mount, read,
+write, etc.) data stored in this format.
diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt
deleted file mode 100644
index f2f3f8592a6f..000000000000
--- a/Documentation/filesystems/nilfs2.txt
+++ /dev/null
@@ -1,276 +0,0 @@
-NILFS2
-------
-
-NILFS2 is a log-structured file system (LFS) supporting continuous
-snapshotting.  In addition to versioning capability of the entire file
-system, users can even restore files mistakenly overwritten or
-destroyed just a few seconds ago.  Since NILFS2 can keep consistency
-like conventional LFS, it achieves quick recovery after system
-crashes.
-
-NILFS2 creates a number of checkpoints every few seconds or per
-synchronous write basis (unless there is no change).  Users can select
-significant versions among continuously created checkpoints, and can
-change them into snapshots which will be preserved until they are
-changed back to checkpoints.
-
-There is no limit on the number of snapshots until the volume gets
-full.  Each snapshot is mountable as a read-only file system
-concurrently with its writable mount, and this feature is convenient
-for online backup.
-
-The userland tools are included in nilfs-utils package, which is
-available from the following download page.  At least "mkfs.nilfs2",
-"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (so called
-cleaner or garbage collector) are required.  Details on the tools are
-described in the man pages included in the package.
-
-Project web page:    https://nilfs.sourceforge.io/
-Download page:       https://nilfs.sourceforge.io/en/download.html
-List info:           http://vger.kernel.org/vger-lists.html#linux-nilfs
-
-Caveats
-=======
-
-Features which NILFS2 does not support yet:
-
-	- atime
-	- extended attributes
-	- POSIX ACLs
-	- quotas
-	- fsck
-	- defragmentation
-
-Mount options
-=============
-
-NILFS2 supports the following mount options:
-(*) == default
-
-barrier(*)		This enables/disables the use of write barriers.  This
-nobarrier		requires an IO stack which can support barriers, and
-			if nilfs gets an error on a barrier write, it will
-			disable again with a warning.
-errors=continue		Keep going on a filesystem error.
-errors=remount-ro(*)	Remount the filesystem read-only on an error.
-errors=panic		Panic and halt the machine if an error occurs.
-cp=n			Specify the checkpoint-number of the snapshot to be
-			mounted.  Checkpoints and snapshots are listed by lscp
-			user command.  Only the checkpoints marked as snapshot
-			are mountable with this option.  Snapshot is read-only,
-			so a read-only mount option must be specified together.
-order=relaxed(*)	Apply relaxed order semantics that allows modified data
-			blocks to be written to disk without making a
-			checkpoint if no metadata update is going.  This mode
-			is equivalent to the ordered data mode of the ext3
-			filesystem except for the updates on data blocks still
-			conserve atomicity.  This will improve synchronous
-			write performance for overwriting.
-order=strict		Apply strict in-order semantics that preserves sequence
-			of all file operations including overwriting of data
-			blocks.  That means, it is guaranteed that no
-			overtaking of events occurs in the recovered file
-			system after a crash.
-norecovery		Disable recovery of the filesystem on mount.
-			This disables every write access on the device for
-			read-only mounts or snapshots.  This option will fail
-			for r/w mounts on an unclean volume.
-discard			This enables/disables the use of discard/TRIM commands.
-nodiscard(*)		The discard/TRIM commands are sent to the underlying
-			block device when blocks are freed.  This is useful
-			for SSD devices and sparse/thinly-provisioned LUNs.
-
-Ioctls
-======
-
-There is some NILFS2 specific functionality which can be accessed by applications
-through the system call interfaces. The list of all NILFS2 specific ioctls are
-shown in the table below.
-
-Table of NILFS2 specific ioctls
-..............................................................................
- Ioctl			        Description
- NILFS_IOCTL_CHANGE_CPMODE      Change mode of given checkpoint between
-			        checkpoint and snapshot state. This ioctl is
-			        used in chcp and mkcp utilities.
-
- NILFS_IOCTL_DELETE_CHECKPOINT  Remove checkpoint from NILFS2 file system.
-			        This ioctl is used in rmcp utility.
-
- NILFS_IOCTL_GET_CPINFO         Return info about requested checkpoints. This
-			        ioctl is used in lscp utility and by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_CPSTAT         Return checkpoints statistics. This ioctl is
-			        used by lscp, rmcp utilities and by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_SUINFO         Return segment usage info about requested
-			        segments. This ioctl is used in lssu,
-			        nilfs_resize utilities and by nilfs_cleanerd
-			        daemon.
-
- NILFS_IOCTL_SET_SUINFO         Modify segment usage info of requested
-				segments. This ioctl is used by
-				nilfs_cleanerd daemon to skip unnecessary
-				cleaning operation of segments and reduce
-				performance penalty or wear of flash device
-				due to redundant move of in-use blocks.
-
- NILFS_IOCTL_GET_SUSTAT         Return segment usage statistics. This ioctl
-			        is used in lssu, nilfs_resize utilities and
-			        by nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_VINFO          Return information on virtual block addresses.
-			        This ioctl is used by nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_BDESCS         Return information about descriptors of disk
-			        block numbers. This ioctl is used by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_CLEAN_SEGMENTS     Do garbage collection operation in the
-			        environment of requested parameters from
-			        userspace. This ioctl is used by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_SYNC               Make a checkpoint. This ioctl is used in
-			        mkcp utility.
-
- NILFS_IOCTL_RESIZE             Resize NILFS2 volume. This ioctl is used
-			        by nilfs_resize utility.
-
- NILFS_IOCTL_SET_ALLOC_RANGE    Define lower limit of segments in bytes and
-			        upper limit of segments in bytes. This ioctl
-			        is used by nilfs_resize utility.
-
-NILFS2 usage
-============
-
-To use nilfs2 as a local file system, simply:
-
- # mkfs -t nilfs2 /dev/block_device
- # mount -t nilfs2 /dev/block_device /dir
-
-This will also invoke the cleaner through the mount helper program
-(mount.nilfs2).
-
-Checkpoints and snapshots are managed by the following commands.
-Their manpages are included in the nilfs-utils package above.
-
-  lscp     list checkpoints or snapshots.
-  mkcp     make a checkpoint or a snapshot.
-  chcp     change an existing checkpoint to a snapshot or vice versa.
-  rmcp     invalidate specified checkpoint(s).
-
-To mount a snapshot,
-
- # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
-
-where <cno> is the checkpoint number of the snapshot.
-
-To unmount the NILFS2 mount point or snapshot, simply:
-
- # umount /dir
-
-Then, the cleaner daemon is automatically shut down by the umount
-helper program (umount.nilfs2).
-
-Disk format
-===========
-
-A nilfs2 volume is equally divided into a number of segments except
-for the super block (SB) and segment #0.  A segment is the container
-of logs.  Each log is composed of summary information blocks, payload
-blocks, and an optional super root block (SR):
-
-   ______________________________________________________
-  | |SB| | Segment | Segment | Segment | ... | Segment | |
-  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
-  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
-       .             .            (Typical offsets for 4KB-block)
-    .                  .
-  .______________________.
-  | log | log |... | log |
-  |__1__|__2__|____|__m__|
-        .       .
-      .               .
-    .                       .
-  .______________________________.
-  | Summary | Payload blocks  |SR|
-  |_blocks__|_________________|__|
-
-The payload blocks are organized per file, and each file consists of
-data blocks and B-tree node blocks:
-
-    |<---       File-A        --->|<---       File-B        --->|
-   _______________________________________________________________
-    | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
-   _|_____________|_______________|_____________|_______________|_
-
-
-Since only the modified blocks are written in the log, it may have
-files without data blocks or B-tree node blocks.
-
-The organization of the blocks is recorded in the summary information
-blocks, which contains a header structure (nilfs_segment_summary), per
-file structures (nilfs_finfo), and per block structures (nilfs_binfo):
-
-  _________________________________________________________________________
- | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
- |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___
-
-
-The logs include regular files, directory files, symbolic link files
-and several meta data files.  The mata data files are the files used
-to maintain file system meta data.  The current version of NILFS2 uses
-the following meta data files:
-
- 1) Inode file (ifile)             -- Stores on-disk inodes
- 2) Checkpoint file (cpfile)       -- Stores checkpoints
- 3) Segment usage file (sufile)    -- Stores allocation state of segments
- 4) Data address translation file  -- Maps virtual block numbers to usual
-    (DAT)                             block numbers.  This file serves to
-                                      make on-disk blocks relocatable.
-
-The following figure shows a typical organization of the logs:
-
-  _________________________________________________________________________
- | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
- |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|
-
-
-To stride over segment boundaries, this sequence of files may be split
-into multiple logs.  The sequence of logs that should be treated as
-logically one log, is delimited with flags marked in the segment
-summary.  The recovery code of nilfs2 looks this boundary information
-to ensure atomicity of updates.
-
-The super root block is inserted for every checkpoints.  It includes
-three special inodes, inodes for the DAT, cpfile, and sufile.  Inodes
-of regular files, directories, symlinks and other special files, are
-included in the ifile.  The inode of ifile itself is included in the
-corresponding checkpoint entry in the cpfile.  Thus, the hierarchy
-among NILFS2 files can be depicted as follows:
-
-  Super block (SB)
-       |
-       v
-  Super root block (the latest cno=xx)
-       |-- DAT
-       |-- sufile
-       `-- cpfile
-              |-- ifile (cno=c1)
-              |-- ifile (cno=c2) ---- file (ino=i1)
-              :        :          |-- file (ino=i2)
-              `-- ifile (cno=xx)  |-- file (ino=i3)
-                                  :        :
-                                  `-- file (ino=yy)
-                                    ( regular file, directory, or symlink )
-
-For detail on the format of each file, please see nilfs2_ondisk.h
-located at include/uapi/linux directory.
-
-There are no patents or other intellectual property that we protect
-with regard to the design of NILFS2.  It is allowed to replicate the
-design in hopes that other operating systems could share (mount, read,
-write, etc.) data stored in this format.
-- 
cgit 


From 461f2c8f13fcc0d349e4acac46aacf63dbeb34ca Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:13 +0100
Subject: docs: filesystems: convert ntfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Comment out text-only ToC;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/f09ca6c9bdd4e7aa7208f3dba0b8753080b38d03.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   3 +-
 Documentation/filesystems/ntfs.rst  | 466 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ntfs.txt  | 451 ----------------------------------
 3 files changed, 468 insertions(+), 452 deletions(-)
 create mode 100644 Documentation/filesystems/ntfs.rst
 delete mode 100644 Documentation/filesystems/ntfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 01587704fcc9..62be53c4755d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -74,7 +74,8 @@ Documentation for filesystem implementations.
    inotify
    isofs
    nilfs2
+   nfs/index
+   ntfs
    overlayfs
    virtiofs
    vfat
-   nfs/index
diff --git a/Documentation/filesystems/ntfs.rst b/Documentation/filesystems/ntfs.rst
new file mode 100644
index 000000000000..5bb093a26485
--- /dev/null
+++ b/Documentation/filesystems/ntfs.rst
@@ -0,0 +1,466 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+The Linux NTFS filesystem driver
+================================
+
+
+.. Table of contents
+
+   - Overview
+   - Web site
+   - Features
+   - Supported mount options
+   - Known bugs and (mis-)features
+   - Using NTFS volume and stripe sets
+     - The Device-Mapper driver
+     - The Software RAID / MD driver
+     - Limitations when using the MD driver
+
+
+Overview
+========
+
+Linux-NTFS comes with a number of user-space programs known as ntfsprogs.
+These include mkntfs, a full-featured ntfs filesystem format utility,
+ntfsundelete used for recovering files that were unintentionally deleted
+from an NTFS volume and ntfsresize which is used to resize an NTFS partition.
+See the web site for more information.
+
+To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file
+system type 'ntfs'.  The driver currently supports read-only mode (with no
+fault-tolerance, encryption or journalling) and very limited, but safe, write
+support.
+
+For fault tolerance and raid support (i.e. volume and stripe sets), you can
+use the kernel's Software RAID / MD driver.  See section "Using NTFS volume
+and stripe sets" for details.
+
+
+Web site
+========
+
+There is plenty of additional information on the linux-ntfs web site
+at http://www.linux-ntfs.org/
+
+The web site has a lot of additional information, such as a comprehensive
+FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
+userspace utilities, etc.
+
+
+Features
+========
+
+- This is a complete rewrite of the NTFS driver that used to be in the 2.4 and
+  earlier kernels.  This new driver implements NTFS read support and is
+  functionally equivalent to the old ntfs driver and it also implements limited
+  write support.  The biggest limitation at present is that files/directories
+  cannot be created or deleted.  See below for the list of write features that
+  are so far supported.  Another limitation is that writing to compressed files
+  is not implemented at all.  Also, neither read nor write access to encrypted
+  files is so far implemented.
+- The new driver has full support for sparse files on NTFS 3.x volumes which
+  the old driver isn't happy with.
+- The new driver supports execution of binaries due to mmap() now being
+  supported.
+- The new driver supports loopback mounting of files on NTFS.  Some Linux
+  distributions use this to let the user run Linux from an NTFS partition:
+  a large file is created while in Windows, then loopback mounted while in
+  Linux, and a Linux filesystem is created on it into which Linux is
+  installed.
+- A comparison of the two drivers using::
+
+	time find . -type f -exec md5sum "{}" \;
+
+  run three times in sequence with each driver (after a reboot) on a 1.4GiB
+  NTFS partition, showed the new driver to be 20% faster in total time elapsed
+  (from 9:43 minutes on average down to 7:53).  The time spent in user space
+  was unchanged but the time spent in the kernel was decreased by a factor of
+  2.5 (from 85 CPU seconds down to 33).
+- The driver does not support short file names in general.  For backwards
+  compatibility, we implement access to files using their short file names if
+  they exist.  The driver will not create short file names however, and a
+  rename will discard any existing short file name.
+- The new driver supports exporting of mounted NTFS volumes via NFS.
+- The new driver supports async io (aio).
+- The new driver supports fsync(2), fdatasync(2), and msync(2).
+- The new driver supports readv(2) and writev(2).
+- The new driver supports access time updates (including mtime and ctime).
+- The new driver supports truncate(2) and open(2) with O_TRUNC.  But at present
+  only very limited support for highly fragmented files, i.e. ones which have
+  their data attribute split across multiple extents, is included.  Another
+  limitation is that at present truncate(2) will never create sparse files,
+  since to mark a file sparse we need to modify the directory entry for the
+  file and we do not implement directory modifications yet.
+- The new driver supports write(2) which can both overwrite existing data and
+  extend the file size so that you can write beyond the existing data.  Also,
+  writing into sparse regions is supported and the holes are filled in with
+  clusters.  But at present only limited support for highly fragmented files,
+  i.e. ones which have their data attribute split across multiple extents, is
+  included.  Another limitation is that write(2) will never create sparse
+  files, since to mark a file sparse we need to modify the directory entry for
+  the file and we do not implement directory modifications yet.
+
+Supported mount options
+=======================
+
+In addition to the generic mount options described by the manual page for the
+mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
+following mount options:
+
+======================= =======================================================
+iocharset=name		Deprecated option.  Still supported but please use
+			nls=name in the future.  See description for nls=name.
+
+nls=name		Character set to use when returning file names.
+			Unlike VFAT, NTFS suppresses names that contain
+			unconvertible characters.  Note that most character
+			sets contain insufficient characters to represent all
+			possible Unicode characters that can exist on NTFS.
+			To be sure you are not missing any files, you are
+			advised to use nls=utf8 which is capable of
+			representing all Unicode characters.
+
+utf8=<bool>		Option no longer supported.  Currently mapped to
+			nls=utf8 but please use nls=utf8 in the future and
+			make sure utf8 is compiled either as module or into
+			the kernel.  See description for nls=name.
+
+uid=
+gid=
+umask=			Provide default owner, group, and access mode mask.
+			These options work as documented in mount(8).  By
+			default, the files/directories are owned by root and
+			he/she has read and write permissions, as well as
+			browse permission for directories.  No one else has any
+			access permissions.  I.e. the mode on all files is by
+			default rw------- and for directories rwx------, a
+			consequence of the default fmask=0177 and dmask=0077.
+			Using a umask of zero will grant all permissions to
+			everyone, i.e. all files and directories will have mode
+			rwxrwxrwx.
+
+fmask=
+dmask=			Instead of specifying umask which applies both to
+			files and directories, fmask applies only to files and
+			dmask only to directories.
+
+sloppy=<BOOL>		If sloppy is specified, ignore unknown mount options.
+			Otherwise the default behaviour is to abort mount if
+			any unknown options are found.
+
+show_sys_files=<BOOL>	If show_sys_files is specified, show the system files
+			in directory listings.  Otherwise the default behaviour
+			is to hide the system files.
+			Note that even when show_sys_files is specified, "$MFT"
+			will not be visible due to bugs/mis-features in glibc.
+			Further, note that irrespective of show_sys_files, all
+			files are accessible by name, i.e. you can always do
+			"ls -l \$UpCase" for example to specifically show the
+			system file containing the Unicode upcase table.
+
+case_sensitive=<BOOL>	If case_sensitive is specified, treat all file names as
+			case sensitive and create file names in the POSIX
+			namespace.  Otherwise the default behaviour is to treat
+			file names as case insensitive and to create file names
+			in the WIN32/LONG namespace.  Note, the Linux NTFS
+			driver will never create short file names and will
+			remove them on rename/delete of the corresponding long
+			file name.
+			Note that files remain accessible via their short file
+			name, if it exists.  If case_sensitive is specified, you
+			must provide the correct case of the short file name.
+
+disable_sparse=<BOOL>	If disable_sparse is specified, creation of sparse
+			regions, i.e. holes, inside files is disabled for the
+			volume (for the duration of this mount only).  By
+			default, creation of sparse regions is enabled, which
+			is consistent with the behaviour of traditional Unix
+			filesystems.
+
+errors=opt		What to do when critical filesystem errors are found.
+			The following values can be used for "opt":
+
+			  ========  =========================================
+			  continue  DEFAULT, try to clean up as much as
+				    possible, e.g. marking a corrupt inode as
+				    bad so it is no longer accessed, and then
+				    continue.
+			  recover   At present, only recovery of the boot
+				    sector from the backup copy is supported.
+				    On a read-only mount, the recovery is
+				    done in memory only, not written to disk.
+			  ========  =========================================
+
+			Note that the options are additive, i.e. specifying::
+
+			   errors=continue,errors=recover
+
+			means the driver will attempt to recover and if that
+			fails it will clean-up as much as possible and
+			continue.
+
+mft_zone_multiplier=	Set the MFT zone multiplier for the volume (this
+			setting is not persistent across mounts and can be
+			changed from mount to mount but cannot be changed on
+			remount).  Values of 1 to 4 are allowed, 1 being the
+			default.  The MFT zone multiplier determines how much
+			space is reserved for the MFT on the volume.  If all
+			other space is used up, then the MFT zone will be
+			shrunk dynamically, so this has no impact on the
+			amount of free space.  However, it can have an impact
+			on performance by affecting fragmentation of the MFT.
+			In general use the default.  If you have a lot of small
+			files then use a higher value.  The values have the
+			following meaning:
+
+			      =====	    =================================
+			      Value	     MFT zone size (% of volume size)
+			      =====	    =================================
+				1		12.5%
+				2		25%
+				3		37.5%
+				4		50%
+			      =====	    =================================
+
+			Note this option is irrelevant for read-only mounts.
+======================= =======================================================
+
+
+Known bugs and (mis-)features
+=============================
+
+- The link count on each directory inode entry is set to 1, due to Linux not
+  supporting directory hard links.  This may well confuse some user space
+  applications, since the directory names will have the same inode numbers.
+  This also speeds up ntfs_read_inode() immensely.  And we haven't found any
+  problems with this approach so far.  If you find a problem with this, please
+  let us know.
+
+
+Please send bug reports/comments/feedback/abuse to the Linux-NTFS development
+list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
+
+
+Using NTFS volume and stripe sets
+=================================
+
+For support of volume and stripe sets, you can either use the kernel's
+Device-Mapper driver or the kernel's Software RAID / MD driver.  The former is
+the recommended one to use for linear raid.  But the latter is required for
+raid level 5.  For striping and mirroring, either driver should work fine.
+
+
+The Device-Mapper driver
+------------------------
+
+You will need to create a table of the components of the volume/stripe set and
+how they fit together and load this into the kernel using the dmsetup utility
+(see man 8 dmsetup).
+
+Linear volume sets, i.e. linear raid, have been tested and work fine.  Even
+though untested, there is no reason why stripe sets, i.e. raid level 0, and
+mirrors, i.e. raid level 1, should not work, too.  Stripes with parity, i.e.
+raid level 5, unfortunately cannot work yet because the current version of the
+Device-Mapper driver does not support raid level 5.  You may be able to use the
+Software RAID / MD driver for raid level 5, see the next section for details.
+
+To create the table describing your volume you will need to know each of its
+components and their sizes in sectors, i.e. in units of 512-byte blocks.
+
+For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  For
+example, if one of your partitions is /dev/hda2, you would run::
+
+    $ fdisk -ul /dev/hda
+
+    Disk /dev/hda: 81.9 GB, 81964302336 bytes
+    255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
+    Units = sectors of 1 * 512 = 512 bytes
+
+	Device Boot      Start         End      Blocks   Id  System
+	/dev/hda1   *          63     4209029     2104483+  83  Linux
+	/dev/hda2         4209030    37768814    16779892+  86  NTFS
+	/dev/hda3        37768815    46170809     4200997+  83  Linux
+
+From this you can tell that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
+33559785 sectors.
+
+For Win2k and later dynamic disks, you can for example use the ldminfo utility
+which is part of the Linux LDM tools (the latest version at the time of
+writing is linux-ldm-0.0.8.tar.bz2).  You can download it from:
+
+	http://www.linux-ntfs.org/
+
+Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
+into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You
+will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be
+able to compile this yourself easily so use the binary version!
+
+Then you would use ldminfo in dump mode to obtain the necessary information::
+
+    $ ./ldminfo --dump /dev/hda
+
+This would dump the LDM database found on /dev/hda which describes all of your
+dynamic disks and all the volumes on them.  At the bottom you will see the
+VOLUME DEFINITIONS section which is all you really need.  You may need to look
+further above to determine which of the disks in the volume definitions is
+which device in Linux.  Hint: Run ldminfo on each of your dynamic disks and
+look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
+section).  You can then find these Disk Ids in the VBLK DATABASE section in the
+<Disk> components where you will get the LDM Name for the disk that is found in
+the VOLUME DEFINITIONS section.
+
+Note you will also need to enable the LDM driver in the Linux kernel.  If your
+distribution did not enable it, you will need to recompile the kernel with it
+enabled.  This will create the LDM partitions on each device at boot time.  You
+would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc.)
+in the Device-Mapper table.
+
+You can also bypass using the LDM driver by using the main device (e.g.
+/dev/hda) and then using the offsets of the LDM partitions into this device as
+the "Start sector of device" when creating the table.  Once again ldminfo would
+give you the correct information to do this.
+
+Once you know all your devices and their sizes, building the table is easy.
+
+For a linear raid the table would look like this (note all values are in
+512-byte sectors)::
+
+    # Offset into	Size of this	Raid type	Device		Start sector
+    # volume	device						of device
+    0		1028161		linear		/dev/hda1	0
+    1028161		3903762		linear		/dev/hdb2	0
+    4931923		2103211		linear		/dev/hdc1	0
+
+For a striped volume, i.e. raid level 0, you will need to know the chunk size
+you used when creating the volume.  Windows uses 64KiB as the default, so it
+will probably be this unless you changed the defaults when creating the array.
+
+For a raid level 0 the table would look like this (note all values are in
+512-byte sectors)::
+
+    # Offset   Size	    Raid     Number   Chunk  1st        Start	2nd	  Start
+    # into     of the   type     of	      size   Device	in	Device	  in
+    # volume   volume	     stripes			device		  device
+    0	   2056320  striped  2	      128    /dev/hda1	0	/dev/hdb1 0
+
+If there are more than two devices, just add each of them to the end of the
+line.
+
+Finally, for a mirrored volume, i.e. raid level 1, the table would look like
+this (note all values are in 512-byte sectors)::
+
+    # Ofs Size   Raid   Log  Number Region Should Number Source  Start Target Start
+    # in  of the type   type of log size   sync?  of     Device  in    Device in
+    # vol volume		 params		     mirrors	     Device	  Device
+    0    2056320 mirror core 2	16     nosync 2	   /dev/hda1 0   /dev/hdb1 0
+
+If you are mirroring to multiple devices you can specify further targets at the
+end of the line.
+
+Note the "Should sync?" parameter "nosync" means that the two mirrors are
+already in sync, which will be the case on a clean shutdown of Windows.  If the
+mirrors are not clean, you can specify the "sync" option instead of "nosync"
+and the Device-Mapper driver will then copy the entirety of the "Source Device"
+to the "Target Device" or, if you specified multiple target devices, to all of
+them.
+
+Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
+and hand it over to dmsetup to work with, like so::
+
+    $ dmsetup create myvolume1 /etc/ntfsvolume1
+
+You can obviously replace "myvolume1" with whatever name you like.
+
+If it all worked, you will now have the device /dev/device-mapper/myvolume1
+which you can then just use as an argument to the mount command as usual to
+mount the ntfs volume.  For example::
+
+    $ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
+
+(You need to create the directory /mnt/myvol1 first and of course you can use
+anything you like instead of /mnt/myvol1 as long as it is an existing
+directory.)
+
+It is advisable to do the mount read-only to see if the volume has been set up
+correctly to avoid the possibility of causing damage to the data on the ntfs
+volume.
+
+
+The Software RAID / MD driver
+-----------------------------
+
+An alternative to using the Device-Mapper driver is to use the kernel's
+Software RAID / MD driver.  For this you need to set up your /etc/raidtab
+appropriately (see man 5 raidtab).
+
+Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
+0, have been tested and work fine (though see section "Limitations when using
+the Software RAID / MD driver", especially if you want to use linear raid).
+Even though untested, there is no reason why mirrors, i.e. raid level 1, and
+stripes with parity, i.e. raid level 5, should not work, too.
+
+You have to use the "persistent-superblock 0" option for each raid-disk in the
+NTFS volume/stripe you are configuring in /etc/raidtab as the persistent
+superblock used by the MD driver would damage the NTFS volume.
+
+Windows by default uses a stripe chunk size of 64k, so you probably want the
+"chunk-size 64k" option for each raid-disk, too.
+
+For example, if you have a stripe set consisting of two partitions /dev/hda5
+and /dev/hdb1 your /etc/raidtab would look like this::
+
+    raiddev /dev/md0
+	    raid-level	0
+	    nr-raid-disks	2
+	    nr-spare-disks	0
+	    persistent-superblock	0
+	    chunk-size	64k
+	    device		/dev/hda5
+	    raid-disk	0
+	    device		/dev/hdb1
+	    raid-disk	1
+
+For linear raid, just change the raid-level above to "raid-level linear", for
+mirrors, change it to "raid-level 1", and for stripe sets with parity, change
+it to "raid-level 5".
+
+Note that for stripe sets with parity, you also need to tell the MD driver
+which parity algorithm to use by specifying the option "parity-algorithm
+which", where "which" is replaced with the name of the algorithm to use (see
+man 5 raidtab for available algorithms).  You will have to try the different
+available algorithms until you find one that works.  Make sure you are
+working read-only when experimenting with this, as you may damage your data
+otherwise.  If you find which algorithm works, please let us know (email the
+linux-ntfs developers list linux-ntfs-dev@lists.sourceforge.net or drop in on
+IRC in channel #ntfs on the irc.freenode.net network) so we can update this
+documentation.
+
+Once the raidtab is set up, run for example raid0run -a to start all devices
+or raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
+
+Then just use the mount command as usual to mount the ntfs volume, for
+example::
+
+    mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
+
+It is advisable to do the mount read-only to see if the md volume has been
+set up correctly to avoid the possibility of causing damage to the data on the
+ntfs volume.
+
+
+Limitations when using the Software RAID / MD driver
+-----------------------------------------------------
+
+Using the md driver will not work properly if any of your NTFS partitions has
+an odd number of sectors.  This is especially important for linear raid, as all
+data after the first partition with an odd number of sectors will be offset by
+one or more sectors.  If you mount such a partition with write support, you
+will cause massive damage to the data on the volume, which will only become
+apparent when you try to use the volume again under Windows.
+
+So when using linear raid, make sure that all your partitions have an even
+number of sectors BEFORE attempting to use it.  You have been warned!
+
+Even better is to simply use the Device-Mapper for linear raid and then you do
+not have this problem with odd numbers of sectors.
diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
deleted file mode 100644
index 553f10d03076..000000000000
--- a/Documentation/filesystems/ntfs.txt
+++ /dev/null
@@ -1,451 +0,0 @@
-The Linux NTFS filesystem driver
-================================
-
-
-Table of contents
-=================
-
-- Overview
-- Web site
-- Features
-- Supported mount options
-- Known bugs and (mis-)features
-- Using NTFS volume and stripe sets
-  - The Device-Mapper driver
-  - The Software RAID / MD driver
-  - Limitations when using the MD driver
-
-
-Overview
-========
-
-Linux-NTFS comes with a number of user-space programs known as ntfsprogs.
-These include mkntfs, a full-featured ntfs filesystem format utility,
-ntfsundelete used for recovering files that were unintentionally deleted
-from an NTFS volume and ntfsresize which is used to resize an NTFS partition.
-See the web site for more information.
-
-To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file
-system type 'ntfs'.  The driver currently supports read-only mode (with no
-fault-tolerance, encryption or journalling) and very limited, but safe, write
-support.
-
-For fault tolerance and raid support (i.e. volume and stripe sets), you can
-use the kernel's Software RAID / MD driver.  See section "Using Software RAID
-with NTFS" for details.
-
-
-Web site
-========
-
-There is plenty of additional information on the linux-ntfs web site
-at http://www.linux-ntfs.org/
-
-The web site has a lot of additional information, such as a comprehensive
-FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
-userspace utilities, etc.
-
-
-Features
-========
-
-- This is a complete rewrite of the NTFS driver that used to be in the 2.4 and
-  earlier kernels.  This new driver implements NTFS read support and is
-  functionally equivalent to the old ntfs driver and it also implements limited
-  write support.  The biggest limitation at present is that files/directories
-  cannot be created or deleted.  See below for the list of write features that
-  are so far supported.  Another limitation is that writing to compressed files
-  is not implemented at all.  Also, neither read nor write access to encrypted
-  files is so far implemented.
-- The new driver has full support for sparse files on NTFS 3.x volumes which
-  the old driver isn't happy with.
-- The new driver supports execution of binaries due to mmap() now being
-  supported.
-- The new driver supports loopback mounting of files on NTFS which is used by
-  some Linux distributions to enable the user to run Linux from an NTFS
-  partition by creating a large file while in Windows and then loopback
-  mounting the file while in Linux and creating a Linux filesystem on it that
-  is used to install Linux on it.
-- A comparison of the two drivers using:
-	time find . -type f -exec md5sum "{}" \;
-  run three times in sequence with each driver (after a reboot) on a 1.4GiB
-  NTFS partition, showed the new driver to be 20% faster in total time elapsed
-  (from 9:43 minutes on average down to 7:53).  The time spent in user space
-  was unchanged but the time spent in the kernel was decreased by a factor of
-  2.5 (from 85 CPU seconds down to 33).
-- The driver does not support short file names in general.  For backwards
-  compatibility, we implement access to files using their short file names if
-  they exist.  The driver will not create short file names however, and a
-  rename will discard any existing short file name.
-- The new driver supports exporting of mounted NTFS volumes via NFS.
-- The new driver supports async io (aio).
-- The new driver supports fsync(2), fdatasync(2), and msync(2).
-- The new driver supports readv(2) and writev(2).
-- The new driver supports access time updates (including mtime and ctime).
-- The new driver supports truncate(2) and open(2) with O_TRUNC.  But at present
-  only very limited support for highly fragmented files, i.e. ones which have
-  their data attribute split across multiple extents, is included.  Another
-  limitation is that at present truncate(2) will never create sparse files,
-  since to mark a file sparse we need to modify the directory entry for the
-  file and we do not implement directory modifications yet.
-- The new driver supports write(2) which can both overwrite existing data and
-  extend the file size so that you can write beyond the existing data.  Also,
-  writing into sparse regions is supported and the holes are filled in with
-  clusters.  But at present only limited support for highly fragmented files,
-  i.e. ones which have their data attribute split across multiple extents, is
-  included.  Another limitation is that write(2) will never create sparse
-  files, since to mark a file sparse we need to modify the directory entry for
-  the file and we do not implement directory modifications yet.
-
-Supported mount options
-=======================
-
-In addition to the generic mount options described by the manual page for the
-mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
-following mount options:
-
-iocharset=name		Deprecated option.  Still supported but please use
-			nls=name in the future.  See description for nls=name.
-
-nls=name		Character set to use when returning file names.
-			Unlike VFAT, NTFS suppresses names that contain
-			unconvertible characters.  Note that most character
-			sets contain insufficient characters to represent all
-			possible Unicode characters that can exist on NTFS.
-			To be sure you are not missing any files, you are
-			advised to use nls=utf8 which is capable of
-			representing all Unicode characters.
-
-utf8=<bool>		Option no longer supported.  Currently mapped to
-			nls=utf8 but please use nls=utf8 in the future and
-			make sure utf8 is compiled either as module or into
-			the kernel.  See description for nls=name.
-
-uid=
-gid=
-umask=			Provide default owner, group, and access mode mask.
-			These options work as documented in mount(8).  By
-			default, the files/directories are owned by root and
-			he/she has read and write permissions, as well as
-			browse permission for directories.  No one else has any
-			access permissions.  I.e. the mode on all files is by
-			default rw------- and for directories rwx------, a
-			consequence of the default fmask=0177 and dmask=0077.
-			Using a umask of zero will grant all permissions to
-			everyone, i.e. all files and directories will have mode
-			rwxrwxrwx.
-
-fmask=
-dmask=			Instead of specifying umask which applies both to
-			files and directories, fmask applies only to files and
-			dmask only to directories.
-
-sloppy=<BOOL>		If sloppy is specified, ignore unknown mount options.
-			Otherwise the default behaviour is to abort mount if
-			any unknown options are found.
-
-show_sys_files=<BOOL>	If show_sys_files is specified, show the system files
-			in directory listings.  Otherwise the default behaviour
-			is to hide the system files.
-			Note that even when show_sys_files is specified, "$MFT"
-			will not be visible due to bugs/mis-features in glibc.
-			Further, note that irrespective of show_sys_files, all
-			files are accessible by name, i.e. you can always do
-			"ls -l \$UpCase" for example to specifically show the
-			system file containing the Unicode upcase table.
-
-case_sensitive=<BOOL>	If case_sensitive is specified, treat all file names as
-			case sensitive and create file names in the POSIX
-			namespace.  Otherwise the default behaviour is to treat
-			file names as case insensitive and to create file names
-			in the WIN32/LONG name space.  Note, the Linux NTFS
-			driver will never create short file names and will
-			remove them on rename/delete of the corresponding long
-			file name.
-			Note that files remain accessible via their short file
-			name, if it exists.  If case_sensitive, you will need
-			to provide the correct case of the short file name.
-
-disable_sparse=<BOOL>	If disable_sparse is specified, creation of sparse
-			regions, i.e. holes, inside files is disabled for the
-			volume (for the duration of this mount only).  By
-			default, creation of sparse regions is enabled, which
-			is consistent with the behaviour of traditional Unix
-			filesystems.
-
-errors=opt		What to do when critical filesystem errors are found.
-			Following values can be used for "opt":
-			  continue: DEFAULT, try to clean-up as much as
-				    possible, e.g. marking a corrupt inode as
-				    bad so it is no longer accessed, and then
-				    continue.
-			  recover:  At present only supported is recovery of
-				    the boot sector from the backup copy.
-				    If read-only mount, the recovery is done
-				    in memory only and not written to disk.
-			Note that the options are additive, i.e. specifying:
-			   errors=continue,errors=recover
-			means the driver will attempt to recover and if that
-			fails it will clean-up as much as possible and
-			continue.
-
-mft_zone_multiplier=	Set the MFT zone multiplier for the volume (this
-			setting is not persistent across mounts and can be
-			changed from mount to mount but cannot be changed on
-			remount).  Values of 1 to 4 are allowed, 1 being the
-			default.  The MFT zone multiplier determines how much
-			space is reserved for the MFT on the volume.  If all
-			other space is used up, then the MFT zone will be
-			shrunk dynamically, so this has no impact on the
-			amount of free space.  However, it can have an impact
-			on performance by affecting fragmentation of the MFT.
-			In general use the default.  If you have a lot of small
-			files then use a higher value.  The values have the
-			following meaning:
-			      Value	     MFT zone size (% of volume size)
-				1		12.5%
-				2		25%
-				3		37.5%
-				4		50%
-			Note this option is irrelevant for read-only mounts.
-
-
-Known bugs and (mis-)features
-=============================
-
-- The link count on each directory inode entry is set to 1, due to Linux not
-  supporting directory hard links.  This may well confuse some user space
-  applications, since the directory names will have the same inode numbers.
-  This also speeds up ntfs_read_inode() immensely.  And we haven't found any
-  problems with this approach so far.  If you find a problem with this, please
-  let us know.
-
-
-Please send bug reports/comments/feedback/abuse to the Linux-NTFS development
-list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
-
-
-Using NTFS volume and stripe sets
-=================================
-
-For support of volume and stripe sets, you can either use the kernel's
-Device-Mapper driver or the kernel's Software RAID / MD driver.  The former is
-the recommended one to use for linear raid.  But the latter is required for
-raid level 5.  For striping and mirroring, either driver should work fine.
-
-
-The Device-Mapper driver
-------------------------
-
-You will need to create a table of the components of the volume/stripe set and
-how they fit together and load this into the kernel using the dmsetup utility
-(see man 8 dmsetup).
-
-Linear volume sets, i.e. linear raid, has been tested and works fine.  Even
-though untested, there is no reason why stripe sets, i.e. raid level 0, and
-mirrors, i.e. raid level 1 should not work, too.  Stripes with parity, i.e.
-raid level 5, unfortunately cannot work yet because the current version of the
-Device-Mapper driver does not support raid level 5.  You may be able to use the
-Software RAID / MD driver for raid level 5, see the next section for details.
-
-To create the table describing your volume you will need to know each of its
-components and their sizes in sectors, i.e. multiples of 512-byte blocks.
-
-For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  So for
-example if one of your partitions is /dev/hda2 you would do:
-
-$ fdisk -ul /dev/hda
-
-Disk /dev/hda: 81.9 GB, 81964302336 bytes
-255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
-Units = sectors of 1 * 512 = 512 bytes
-
-   Device Boot      Start         End      Blocks   Id  System
-   /dev/hda1   *          63     4209029     2104483+  83  Linux
-   /dev/hda2         4209030    37768814    16779892+  86  NTFS
-   /dev/hda3        37768815    46170809     4200997+  83  Linux
-
-And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
-33559785 sectors.
-
-For Win2k and later dynamic disks, you can for example use the ldminfo utility
-which is part of the Linux LDM tools (the latest version at the time of
-writing is linux-ldm-0.0.8.tar.bz2).  You can download it from:
-	http://www.linux-ntfs.org/
-Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
-into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You
-will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be
-able to compile this yourself easily so use the binary version!
-
-Then you would use ldminfo in dump mode to obtain the necessary information:
-
-$ ./ldminfo --dump /dev/hda
-
-This would dump the LDM database found on /dev/hda which describes all of your
-dynamic disks and all the volumes on them.  At the bottom you will see the
-VOLUME DEFINITIONS section which is all you really need.  You may need to look
-further above to determine which of the disks in the volume definitions is
-which device in Linux.  Hint: Run ldminfo on each of your dynamic disks and
-look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
-section).  You can then find these Disk Ids in the VBLK DATABASE section in the
-<Disk> components where you will get the LDM Name for the disk that is found in
-the VOLUME DEFINITIONS section.
-
-Note you will also need to enable the LDM driver in the Linux kernel.  If your
-distribution did not enable it, you will need to recompile the kernel with it
-enabled.  This will create the LDM partitions on each device at boot time.  You
-would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc)
-in the Device-Mapper table.
-
-You can also bypass using the LDM driver by using the main device (e.g.
-/dev/hda) and then using the offsets of the LDM partitions into this device as
-the "Start sector of device" when creating the table.  Once again ldminfo would
-give you the correct information to do this.
-
-Assuming you know all your devices and their sizes things are easy.
-
-For a linear raid the table would look like this (note all values are in
-512-byte sectors):
-
---- cut here ---
-# Offset into	Size of this	Raid type	Device		Start sector
-# volume	device						of device
-0		1028161		linear		/dev/hda1	0
-1028161		3903762		linear		/dev/hdb2	0
-4931923		2103211		linear		/dev/hdc1	0
---- cut here ---
-
-For a striped volume, i.e. raid level 0, you will need to know the chunk size
-you used when creating the volume.  Windows uses 64kiB as the default, so it
-will probably be this unless you changes the defaults when creating the array.
-
-For a raid level 0 the table would look like this (note all values are in
-512-byte sectors):
-
---- cut here ---
-# Offset   Size	    Raid     Number   Chunk  1st        Start	2nd	  Start
-# into     of the   type     of	      size   Device	in	Device	  in
-# volume   volume	     stripes			device		  device
-0	   2056320  striped  2	      128    /dev/hda1	0	/dev/hdb1 0
---- cut here ---
-
-If there are more than two devices, just add each of them to the end of the
-line.
-
-Finally, for a mirrored volume, i.e. raid level 1, the table would look like
-this (note all values are in 512-byte sectors):
-
---- cut here ---
-# Ofs Size   Raid   Log  Number Region Should Number Source  Start Target Start
-# in  of the type   type of log size   sync?  of     Device  in    Device in
-# vol volume		 params		     mirrors	     Device	  Device
-0    2056320 mirror core 2	16     nosync 2	   /dev/hda1 0   /dev/hdb1 0
---- cut here ---
-
-If you are mirroring to multiple devices you can specify further targets at the
-end of the line.
-
-Note the "Should sync?" parameter "nosync" means that the two mirrors are
-already in sync which will be the case on a clean shutdown of Windows.  If the
-mirrors are not clean, you can specify the "sync" option instead of "nosync"
-and the Device-Mapper driver will then copy the entirety of the "Source Device"
-to the "Target Device" or if you specified multiple target devices to all of
-them.
-
-Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
-and hand it over to dmsetup to work with, like so:
-
-$ dmsetup create myvolume1 /etc/ntfsvolume1
-
-You can obviously replace "myvolume1" with whatever name you like.
-
-If it all worked, you will now have the device /dev/device-mapper/myvolume1
-which you can then just use as an argument to the mount command as usual to
-mount the ntfs volume.  For example:
-
-$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
-
-(You need to create the directory /mnt/myvol1 first and of course you can use
-anything you like instead of /mnt/myvol1 as long as it is an existing
-directory.)
-
-It is advisable to do the mount read-only to see if the volume has been setup
-correctly to avoid the possibility of causing damage to the data on the ntfs
-volume.
-
-
-The Software RAID / MD driver
------------------------------
-
-An alternative to using the Device-Mapper driver is to use the kernel's
-Software RAID / MD driver.  For which you need to set up your /etc/raidtab
-appropriately (see man 5 raidtab).
-
-Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
-0, have been tested and work fine (though see section "Limitations when using
-the MD driver with NTFS volumes" especially if you want to use linear raid).
-Even though untested, there is no reason why mirrors, i.e. raid level 1, and
-stripes with parity, i.e. raid level 5, should not work, too.
-
-You have to use the "persistent-superblock 0" option for each raid-disk in the
-NTFS volume/stripe you are configuring in /etc/raidtab as the persistent
-superblock used by the MD driver would damage the NTFS volume.
-
-Windows by default uses a stripe chunk size of 64k, so you probably want the
-"chunk-size 64k" option for each raid-disk, too.
-
-For example, if you have a stripe set consisting of two partitions /dev/hda5
-and /dev/hdb1 your /etc/raidtab would look like this:
-
-raiddev /dev/md0
-	raid-level	0
-	nr-raid-disks	2
-	nr-spare-disks	0
-	persistent-superblock	0
-	chunk-size	64k
-	device		/dev/hda5
-	raid-disk	0
-	device		/dev/hdb1
-	raid-disk	1
-
-For linear raid, just change the raid-level above to "raid-level linear", for
-mirrors, change it to "raid-level 1", and for stripe sets with parity, change
-it to "raid-level 5".
-
-Note for stripe sets with parity you will also need to tell the MD driver
-which parity algorithm to use by specifying the option "parity-algorithm
-which", where you need to replace "which" with the name of the algorithm to
-use (see man 5 raidtab for available algorithms) and you will have to try the
-different available algorithms until you find one that works.  Make sure you
-are working read-only when playing with this as you may damage your data
-otherwise.  If you find which algorithm works please let us know (email the
-linux-ntfs developers list linux-ntfs-dev@lists.sourceforge.net or drop in on
-IRC in channel #ntfs on the irc.freenode.net network) so we can update this
-documentation.
-
-Once the raidtab is setup, run for example raid0run -a to start all devices or
-raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
-
-Then just use the mount command as usual to mount the ntfs volume using for
-example:	mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
-
-It is advisable to do the mount read-only to see if the md volume has been
-setup correctly to avoid the possibility of causing damage to the data on the
-ntfs volume.
-
-
-Limitations when using the Software RAID / MD driver
------------------------------------------------------
-
-Using the md driver will not work properly if any of your NTFS partitions have
-an odd number of sectors.  This is especially important for linear raid as all
-data after the first partition with an odd number of sectors will be offset by
-one or more sectors so if you mount such a partition with write support you
-will cause massive damage to the data on the volume which will only become
-apparent when you try to use the volume again under Windows.
-
-So when using linear raid, make sure that all your partitions have an even
-number of sectors BEFORE attempting to use it.  You have been warned!
-
-Even better is to simply use the Device-Mapper for linear raid and then you do
-not have this problem with odd numbers of sectors.
-- 
cgit 


From 3d0c60d004644630f1431ce486e76adcc829e288 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:14 +0100
Subject: docs: filesystems: convert ocfs2-online-filecheck.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6007166acc3252697755836354bd29b5a5fb82aa.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst                |  1 +
 .../filesystems/ocfs2-online-filecheck.rst         | 99 ++++++++++++++++++++++
 .../filesystems/ocfs2-online-filecheck.txt         | 94 --------------------
 3 files changed, 100 insertions(+), 94 deletions(-)
 create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.rst
 delete mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 62be53c4755d..f3a26fdbd04f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -76,6 +76,7 @@ Documentation for filesystem implementations.
    nilfs2
    nfs/index
    ntfs
+   ocfs2-online-filecheck
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ocfs2-online-filecheck.rst b/Documentation/filesystems/ocfs2-online-filecheck.rst
new file mode 100644
index 000000000000..2257bb53edc1
--- /dev/null
+++ b/Documentation/filesystems/ocfs2-online-filecheck.rst
@@ -0,0 +1,99 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+OCFS2 file system - online file check
+=====================================
+
+This document describes the OCFS2 online file check feature.
+
+Introduction
+============
+OCFS2 is often used in high-availability systems. However, OCFS2 usually
+converts the filesystem to read-only when it encounters an error. This may not
+be necessary, since turning the filesystem read-only would affect other running
+processes as well, decreasing availability.
+To address this, a mount option (errors=continue) was introduced, which returns
+the -EIO errno to the calling process and terminates further processing so that
+the filesystem is not corrupted further. The filesystem is not converted to
+read-only, and the problematic file's inode number is reported in the kernel
+log. The user can try to check/fix this file via the online filecheck feature.
+
+Scope
+=====
+This effort is to check/fix small issues which may hinder day-to-day operations
+of a cluster filesystem by turning the filesystem read-only. The scope of
+checking/fixing is at the file level, initially for regular files and eventually
+to all files (including system files) of the filesystem.
+
+If a directory-to-file link is incorrect, the directory inode is
+reported as erroneous.
+
+This feature is not suited for extensive checks which depend on other
+components of the filesystem, such as, but not limited to, checking whether the
+bits for file blocks in the allocation have been set. In case of such an error,
+the offline fsck is recommended.
+
+Finally, such an operation/feature should not be automated, lest the filesystem
+end up with more damage than before the repair attempt. So, this has to
+be performed using user interaction and consent.
+
+User interface
+==============
+When there are errors in the OCFS2 filesystem, they are usually accompanied
+by the inode number which caused the error. This inode number would be the
+input to check/fix the file.
+
+There is a sysfs directory for each mounted OCFS2 file system::
+
+  /sys/fs/ocfs2/<devname>/filecheck
+
+Here, <devname> indicates the name of the OCFS2 volume device which has already
+been mounted. The files in this directory accept input from user space; they
+are used to communicate with kernel space and tell it which file (inode number)
+will be checked or fixed. Currently, three operations are supported: checking an
+inode, fixing an inode and setting the size of the result record history.
+
+1. If you want to know exactly what error happened to <inode> before fixing, do::
+
+    # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
+    # cat /sys/fs/ocfs2/<devname>/filecheck/check
+
+The output is like this::
+
+    INO		DONE	ERROR
+    39502		1	GENERATION
+
+    <INO> lists the inode numbers.
+    <DONE> indicates whether the operation has been finished.
+    <ERROR> says what kind of error was found. For the detailed error numbers,
+    please refer to the file linux/fs/ocfs2/filecheck.h.
+
+2. If you decide to fix this inode, do::
+
+    # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
+    # cat /sys/fs/ocfs2/<devname>/filecheck/fix
+
+The output is like this::
+
+    INO		DONE	ERROR
+    39502		1	SUCCESS
+
+This time, the <ERROR> column indicates whether this fix is successful or not.
+
+3. The record cache is used to store the history of check/fix results. Its
+default size is 10, and it can be adjusted within the range 10 to 100. You can
+adjust the size like this::
+
+  # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
+
+Fixing stuff
+============
+On receiving the inode, the filesystem would read the inode and the
+file metadata. In case of errors, the filesystem would fix the errors
+and report the problems it fixed in the kernel log. As a precautionary measure,
+the inode must first be checked for errors before performing a final fix.
+
+The inode and the result history will be maintained temporarily in a
+small linked list buffer which would contain the last (N) inodes
+fixed/checked. The detailed errors which were fixed/checked are printed in the
+kernel log.
diff --git a/Documentation/filesystems/ocfs2-online-filecheck.txt b/Documentation/filesystems/ocfs2-online-filecheck.txt
deleted file mode 100644
index 139fab175c8a..000000000000
--- a/Documentation/filesystems/ocfs2-online-filecheck.txt
+++ /dev/null
@@ -1,94 +0,0 @@
-		    OCFS2 online file check
-		    -----------------------
-
-This document will describe OCFS2 online file check feature.
-
-Introduction
-============
-OCFS2 is often used in high-availability systems. However, OCFS2 usually
-converts the filesystem to read-only when encounters an error. This may not be
-necessary, since turning the filesystem read-only would affect other running
-processes as well, decreasing availability.
-Then, a mount option (errors=continue) is introduced, which would return the
--EIO errno to the calling process and terminate further processing so that the
-filesystem is not corrupted further. The filesystem is not converted to
-read-only, and the problematic file's inode number is reported in the kernel
-log. The user can try to check/fix this file via online filecheck feature.
-
-Scope
-=====
-This effort is to check/fix small issues which may hinder day-to-day operations
-of a cluster filesystem by turning the filesystem read-only. The scope of
-checking/fixing is at the file level, initially for regular files and eventually
-to all files (including system files) of the filesystem.
-
-In case of directory to file links is incorrect, the directory inode is
-reported as erroneous.
-
-This feature is not suited for extravagant checks which involve dependency of
-other components of the filesystem, such as but not limited to, checking if the
-bits for file blocks in the allocation has been set. In case of such an error,
-the offline fsck should/would be recommended.
-
-Finally, such an operation/feature should not be automated lest the filesystem
-may end up with more damage than before the repair attempt. So, this has to
-be performed using user interaction and consent.
-
-User interface
-==============
-When there are errors in the OCFS2 filesystem, they are usually accompanied
-by the inode number which caused the error. This inode number would be the
-input to check/fix the file.
-
-There is a sysfs directory for each OCFS2 file system mounting:
-
-  /sys/fs/ocfs2/<devname>/filecheck
-
-Here, <devname> indicates the name of OCFS2 volume device which has been already
-mounted. The file above would accept inode numbers. This could be used to
-communicate with kernel space, tell which file(inode number) will be checked or
-fixed. Currently, three operations are supported, which includes checking
-inode, fixing inode and setting the size of result record history.
-
-1. If you want to know what error exactly happened to <inode> before fixing, do
-
-  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
-  # cat /sys/fs/ocfs2/<devname>/filecheck/check
-
-The output is like this:
-  INO		DONE	ERROR
-39502		1	GENERATION
-
-<INO> lists the inode numbers.
-<DONE> indicates whether the operation has been finished.
-<ERROR> says what kind of errors was found. For the detailed error numbers,
-please refer to the file linux/fs/ocfs2/filecheck.h.
-
-2. If you determine to fix this inode, do
-
-  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
-  # cat /sys/fs/ocfs2/<devname>/filecheck/fix
-
-The output is like this:
-  INO		DONE	ERROR
-39502		1	SUCCESS
-
-This time, the <ERROR> column indicates whether this fix is successful or not.
-
-3. The record cache is used to store the history of check/fix results. It's
-default size is 10, and can be adjust between the range of 10 ~ 100. You can
-adjust the size like this:
-
-  # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
-
-Fixing stuff
-============
-On receiving the inode, the filesystem would read the inode and the
-file metadata. In case of errors, the filesystem would fix the errors
-and report the problems it fixed in the kernel log. As a precautionary measure,
-the inode must first be checked for errors before performing a final fix.
-
-The inode and the result history will be maintained temporarily in a
-small linked list buffer which would contain the last (N) inodes
-fixed/checked, the detailed errors which were fixed/checked are printed in the
-kernel log.
-- 
cgit 
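
Reviewer note on the filecheck interface documented in the patch above: the
result table read back from the "check" and "fix" files has three
whitespace-separated columns (INO, DONE, ERROR). The following is a minimal
sketch, in C, of how a user-space tool might parse one data row and validate a
record history size before writing it to "set". The struct and function names
are hypothetical and are not part of OCFS2 or ocfs2-tools; the 32-byte limit
on the error name is an assumption made only for this example.

```c
#include <stdio.h>

/* One row of the filecheck result table (INO, DONE, ERROR columns). */
struct filecheck_row {
	unsigned long long ino;  /* inode number */
	int done;                /* non-zero once the operation has finished */
	char error[32];          /* error name, e.g. "GENERATION" or "SUCCESS" */
};

/*
 * Parse one data row such as "39502\t\t1\tGENERATION".
 * Returns 0 on success, -1 if the line does not hold three columns in the
 * expected form (for example the "INO DONE ERROR" header line).
 */
static int parse_filecheck_row(const char *line, struct filecheck_row *row)
{
	if (sscanf(line, "%llu %d %31s", &row->ino, &row->done, row->error) != 3)
		return -1;
	return 0;
}

/* The record history size is documented as adjustable from 10 to 100. */
static int filecheck_size_is_valid(int size)
{
	return size >= 10 && size <= 100;
}
```

A caller would read the sysfs file line by line, skip any line for which
parse_filecheck_row() returns -1, and only proceed to "fix" once a row for the
inode shows a non-zero DONE column.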


From fa95e087ff69468b4e452c50c3f4c59a45846b8d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:15 +0100
Subject: docs: filesystems: convert ocfs2.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://lore.kernel.org/r/e29a8120bf1d847f23fb68e915f10a7d43bed9e3.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/ocfs2.rst | 117 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ocfs2.txt | 106 --------------------------------
 3 files changed, 118 insertions(+), 106 deletions(-)
 create mode 100644 Documentation/filesystems/ocfs2.rst
 delete mode 100644 Documentation/filesystems/ocfs2.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f3a26fdbd04f..3b2b07491c98 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -76,6 +76,7 @@ Documentation for filesystem implementations.
    nilfs2
    nfs/index
    ntfs
+   ocfs2
    ocfs2-online-filecheck
    overlayfs
    virtiofs
diff --git a/Documentation/filesystems/ocfs2.rst b/Documentation/filesystems/ocfs2.rst
new file mode 100644
index 000000000000..412386bc6506
--- /dev/null
+++ b/Documentation/filesystems/ocfs2.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+OCFS2 filesystem
+================
+
+OCFS2 is a general purpose extent based shared disk cluster file
+system with many similarities to ext3. It supports 64 bit inode
+numbers, and has automatically extending metadata groups which may
+also make it attractive for non-clustered use.
+
+You'll want to install the ocfs2-tools package in order to at least
+get "mount.ocfs2" and "ocfs2_hb_ctl".
+
+- Project web page:    http://ocfs2.wiki.kernel.org
+- Tools git tree:      https://github.com/markfasheh/ocfs2-tools
+- OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+Credits
+=======
+
+Lots of code taken from ext3 and other projects.
+
+Authors in alphabetical order:
+
+- Joel Becker   <joel.becker@oracle.com>
+- Zach Brown    <zach.brown@oracle.com>
+- Mark Fasheh   <mfasheh@suse.com>
+- Kurt Hackel   <kurt.hackel@oracle.com>
+- Tao Ma        <tao.ma@oracle.com>
+- Sunil Mushran <sunil.mushran@oracle.com>
+- Manish Singh  <manish.singh@oracle.com>
+- Tiger Yang    <tiger.yang@oracle.com>
+
+Caveats
+=======
+Features which OCFS2 does not support yet:
+
+	- Directory change notification (F_NOTIFY)
+	- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
+
+Mount options
+=============
+
+OCFS2 supports the following mount options:
+
+(*) == default
+
+======================= ========================================================
+barrier=1		This enables/disables barriers. barrier=0 disables it,
+			barrier=1 enables it.
+errors=remount-ro(*)	Remount the filesystem read-only on an error.
+errors=panic		Panic and halt the machine if an error occurs.
+intr		(*)	Allow signals to interrupt cluster operations.
+nointr			Do not allow signals to interrupt cluster
+			operations.
+noatime			Do not update access time.
+relatime(*)		Update atime if the previous atime is older than
+			mtime or ctime.
+strictatime		Always update atime, but the minimum update interval
+			is specified by atime_quantum.
+atime_quantum=60(*)	OCFS2 will not update atime unless this number
+			of seconds has passed since the last update.
+			Set to zero to always update atime. This option needs
+			to be used with strictatime.
+data=ordered	(*)	All data are forced directly out to the main file
+			system prior to its metadata being committed to the
+			journal.
+data=writeback		Data ordering is not preserved, data may be written
+			into the main file system after its metadata has been
+			committed to the journal.
+preferred_slot=0(*)	During mount, try to use this filesystem slot first. If
+			it is in use by another node, the first empty one found
+			will be chosen. Invalid values will be ignored.
+commit=nrsec	(*)	Ocfs2 can be told to sync all its data and metadata
+			every 'nrsec' seconds. The default value is 5 seconds.
+			This means that if you lose your power, you will lose
+			as much as the latest 5 seconds of work (your
+			filesystem will not be damaged though, thanks to the
+			journaling).  This default value (or any low value)
+			will hurt performance, but it's good for data-safety.
+			Setting it to 0 will have the same effect as leaving
+			it at the default (5 seconds).
+			Setting it to very large values will improve
+			performance.
+localalloc=8(*)		Allows custom localalloc size in MB. If the value is too
+			large, the fs will silently revert it to the default.
+localflocks		This disables cluster aware flock.
+inode64			Indicates that Ocfs2 is allowed to create inodes at
+			any location in the filesystem, including those which
+			will result in inode numbers occupying more than 32
+			bits of significance.
+user_xattr	(*)	Enables Extended User Attributes.
+nouser_xattr		Disables Extended User Attributes.
+acl			Enables POSIX Access Control Lists support.
+noacl		(*)	Disables POSIX Access Control Lists support.
+resv_level=2	(*)	Set how aggressive allocation reservations will be.
+			Valid values range from 0 (reservations off) to 8
+			(maximum space for reservations).
+dir_resv_level=	(*)	By default, directory reservations will scale with file
+			reservations - users should rarely need to change this
+			value. If allocation reservations are turned off, this
+			option will have no effect.
+coherency=full  (*)	Disallow concurrent O_DIRECT writes. The cluster inode
+			lock will be taken to force other nodes to drop cache,
+			therefore full cluster coherency is guaranteed even
+			for O_DIRECT writes.
+coherency=buffered	Allow concurrent O_DIRECT writes without EX lock among
+			nodes, which gains high performance at risk of getting
+			stale data on other nodes.
+journal_async_commit	Commit block can be written to disk without waiting
+			for descriptor blocks. If enabled, older kernels cannot
+			mount the device. This will enable 'journal_checksum'
+			internally.
+======================= ========================================================
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
deleted file mode 100644
index 4c49e5410595..000000000000
--- a/Documentation/filesystems/ocfs2.txt
+++ /dev/null
@@ -1,106 +0,0 @@
-OCFS2 filesystem
-==================
-OCFS2 is a general purpose extent based shared disk cluster file
-system with many similarities to ext3. It supports 64 bit inode
-numbers, and has automatically extending metadata groups which may
-also make it attractive for non-clustered use.
-
-You'll want to install the ocfs2-tools package in order to at least
-get "mount.ocfs2" and "ocfs2_hb_ctl".
-
-Project web page:    http://ocfs2.wiki.kernel.org
-Tools git tree:      https://github.com/markfasheh/ocfs2-tools
-OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
-
-All code copyright 2005 Oracle except when otherwise noted.
-
-CREDITS:
-Lots of code taken from ext3 and other projects.
-
-Authors in alphabetical order:
-Joel Becker   <joel.becker@oracle.com>
-Zach Brown    <zach.brown@oracle.com>
-Mark Fasheh   <mfasheh@suse.com>
-Kurt Hackel   <kurt.hackel@oracle.com>
-Tao Ma        <tao.ma@oracle.com>
-Sunil Mushran <sunil.mushran@oracle.com>
-Manish Singh  <manish.singh@oracle.com>
-Tiger Yang    <tiger.yang@oracle.com>
-
-Caveats
-=======
-Features which OCFS2 does not support yet:
-	- Directory change notification (F_NOTIFY)
-	- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
-
-Mount options
-=============
-
-OCFS2 supports the following mount options:
-(*) == default
-
-barrier=1		This enables/disables barriers. barrier=0 disables it,
-			barrier=1 enables it.
-errors=remount-ro(*)	Remount the filesystem read-only on an error.
-errors=panic		Panic and halt the machine if an error occurs.
-intr		(*)	Allow signals to interrupt cluster operations.
-nointr			Do not allow signals to interrupt cluster
-			operations.
-noatime			Do not update access time.
-relatime(*)		Update atime if the previous atime is older than
-			mtime or ctime
-strictatime		Always update atime, but the minimum update interval
-			is specified by atime_quantum.
-atime_quantum=60(*)	OCFS2 will not update atime unless this number
-			of seconds has passed since the last update.
-			Set to zero to always update atime. This option need
-			work with strictatime.
-data=ordered	(*)	All data are forced directly out to the main file
-			system prior to its metadata being committed to the
-			journal.
-data=writeback		Data ordering is not preserved, data may be written
-			into the main file system after its metadata has been
-			committed to the journal.
-preferred_slot=0(*)	During mount, try to use this filesystem slot first. If
-			it is in use by another node, the first empty one found
-			will be chosen. Invalid values will be ignored.
-commit=nrsec	(*)	Ocfs2 can be told to sync all its data and metadata
-			every 'nrsec' seconds. The default value is 5 seconds.
-			This means that if you lose your power, you will lose
-			as much as the latest 5 seconds of work (your
-			filesystem will not be damaged though, thanks to the
-			journaling).  This default value (or any low value)
-			will hurt performance, but it's good for data-safety.
-			Setting it to 0 will have the same effect as leaving
-			it at the default (5 seconds).
-			Setting it to very large values will improve
-			performance.
-localalloc=8(*)		Allows custom localalloc size in MB. If the value is too
-			large, the fs will silently revert it to the default.
-localflocks		This disables cluster aware flock.
-inode64			Indicates that Ocfs2 is allowed to create inodes at
-			any location in the filesystem, including those which
-			will result in inode numbers occupying more than 32
-			bits of significance.
-user_xattr	(*)	Enables Extended User Attributes.
-nouser_xattr		Disables Extended User Attributes.
-acl			Enables POSIX Access Control Lists support.
-noacl		(*)	Disables POSIX Access Control Lists support.
-resv_level=2	(*)	Set how aggressive allocation reservations will be.
-			Valid values are between 0 (reservations off) to 8
-			(maximum space for reservations).
-dir_resv_level=	(*)	By default, directory reservations will scale with file
-			reservations - users should rarely need to change this
-			value. If allocation reservations are turned off, this
-			option will have no effect.
-coherency=full  (*)	Disallow concurrent O_DIRECT writes, cluster inode
-			lock will be taken to force other nodes drop cache,
-			therefore full cluster coherency is guaranteed even
-			for O_DIRECT writes.
-coherency=buffered	Allow concurrent O_DIRECT writes without EX lock among
-			nodes, which gains high performance at risk of getting
-			stale data on other nodes.
-journal_async_commit	Commit block can be written to disk without waiting
-			for descriptor blocks. If enabled older kernels cannot
-			mount the device. This will enable 'journal_checksum'
-			internally.
-- 
cgit 


From 7cbb468f0c70878fe64d324790ee049c1881af7c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:16 +0100
Subject: docs: filesystems: convert omfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Link: https://lore.kernel.org/r/0c125c7c971d81a557ca954992b8d770a9d1e3e8.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/omfs.rst  | 112 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/omfs.txt  | 106 ----------------------------------
 3 files changed, 113 insertions(+), 106 deletions(-)
 create mode 100644 Documentation/filesystems/omfs.rst
 delete mode 100644 Documentation/filesystems/omfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 3b2b07491c98..fbee77175840 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -78,6 +78,7 @@ Documentation for filesystem implementations.
    ntfs
    ocfs2
    ocfs2-online-filecheck
+   omfs
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/omfs.rst b/Documentation/filesystems/omfs.rst
new file mode 100644
index 000000000000..4c8bb3074169
--- /dev/null
+++ b/Documentation/filesystems/omfs.rst
@@ -0,0 +1,112 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Optimized MPEG Filesystem (OMFS)
+================================
+
+Overview
+========
+
+OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
+and Rio Karma MP3 player.  The filesystem is extent-based, utilizing
+block sizes from 2k to 8k, with hash-based directories.  This
+filesystem driver may be used to read and write disks from these
+devices.
+
+Note, it is not recommended that this FS be used in place of a general
+filesystem for your own streaming media device.  Native Linux filesystems
+will likely perform better.
+
+More information is available at:
+
+    http://linux-karma.sf.net/
+
+Various utilities, including mkomfs and omfsck, are included with
+omfsprogs, available at:
+
+    http://bobcopeland.com/karma/
+
+Instructions are included in its README.
+
+Options
+=======
+
+OMFS supports the following mount-time options:
+
+    ============   ========================================
+    uid=n          make all files owned by specified user
+    gid=n          make all files owned by specified group
+    umask=xxx      set permission umask to xxx
+    fmask=xxx      set umask to xxx for files
+    dmask=xxx      set umask to xxx for directories
+    ============   ========================================
+
+Disk format
+===========
+
+OMFS discriminates between "sysblocks" and normal data blocks.  The sysblock
+group consists of super block information, file metadata, directory structures,
+and extents.  Each sysblock has a header containing CRCs of the entire
+sysblock, and may be mirrored in successive blocks on the disk.  A sysblock may
+have a smaller size than a data block, but since they are both addressed by the
+same 64-bit block number, any remaining space in the smaller sysblock is
+unused.
+
+Sysblock header information::
+
+    struct omfs_header {
+	    __be64 h_self;                  /* FS block where this is located */
+	    __be32 h_body_size;             /* size of useful data after header */
+	    __be16 h_crc;                   /* crc-ccitt of body_size bytes */
+	    char h_fill1[2];
+	    u8 h_version;                   /* version, always 1 */
+	    char h_type;                    /* OMFS_INODE_X */
+	    u8 h_magic;                     /* OMFS_IMAGIC */
+	    u8 h_check_xor;                 /* XOR of header bytes before this */
+	    __be32 h_fill2;
+    };
+
+Files and directories are both represented by omfs_inode::
+
+    struct omfs_inode {
+	    struct omfs_header i_head;      /* header */
+	    __be64 i_parent;                /* parent containing this inode */
+	    __be64 i_sibling;               /* next inode in hash bucket */
+	    __be64 i_ctime;                 /* ctime, in milliseconds */
+	    char i_fill1[35];
+	    char i_type;                    /* OMFS_[DIR,FILE] */
+	    __be32 i_fill2;
+	    char i_fill3[64];
+	    char i_name[OMFS_NAMELEN];      /* filename */
+	    __be64 i_size;                  /* size of file, in bytes */
+    };
+
+Directories in OMFS are implemented as a large hash table.  Filenames are
+hashed then prepended into the bucket list beginning at OMFS_DIR_START.
+Lookup requires hashing the filename, then seeking across i_sibling pointers
+until a match is found on i_name.  Empty buckets are represented by block
+pointers with all-1s (~0).
+
+A file is an omfs_inode structure followed by an extent table beginning at
+OMFS_EXTENT_START::
+
+    struct omfs_extent_entry {
+	    __be64 e_cluster;               /* start location of a set of blocks */
+	    __be64 e_blocks;                /* number of blocks after e_cluster */
+    };
+
+    struct omfs_extent {
+	    __be64 e_next;                  /* next extent table location */
+	    __be32 e_extent_count;          /* total # extents in this table */
+	    __be32 e_fill;
+	    struct omfs_extent_entry e_entry;       /* start of extent entries */
+    };
+
+Each extent holds the block offset followed by the number of blocks allocated
+to the extent.  The final extent in each table is a terminator with e_cluster
+being ~0 and e_blocks being ones'-complement of the total number of blocks
+in the table.
+
+If this table overflows, a continuation inode is written and pointed to by
+e_next.  These have a header but lack the rest of the inode structure.
+
diff --git a/Documentation/filesystems/omfs.txt b/Documentation/filesystems/omfs.txt
deleted file mode 100644
index 1d0d41ff5c65..000000000000
--- a/Documentation/filesystems/omfs.txt
+++ /dev/null
@@ -1,106 +0,0 @@
-Optimized MPEG Filesystem (OMFS)
-
-Overview
-========
-
-OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
-and Rio Karma MP3 player.  The filesystem is extent-based, utilizing
-block sizes from 2k to 8k, with hash-based directories.  This
-filesystem driver may be used to read and write disks from these
-devices.
-
-Note, it is not recommended that this FS be used in place of a general
-filesystem for your own streaming media device.  Native Linux filesystems
-will likely perform better.
-
-More information is available at:
-
-    http://linux-karma.sf.net/
-
-Various utilities, including mkomfs and omfsck, are included with
-omfsprogs, available at:
-
-    http://bobcopeland.com/karma/
-
-Instructions are included in its README.
-
-Options
-=======
-
-OMFS supports the following mount-time options:
-
-    uid=n        - make all files owned by specified user
-    gid=n        - make all files owned by specified group
-    umask=xxx    - set permission umask to xxx
-    fmask=xxx    - set umask to xxx for files
-    dmask=xxx    - set umask to xxx for directories
-
-Disk format
-===========
-
-OMFS discriminates between "sysblocks" and normal data blocks.  The sysblock
-group consists of super block information, file metadata, directory structures,
-and extents.  Each sysblock has a header containing CRCs of the entire
-sysblock, and may be mirrored in successive blocks on the disk.  A sysblock may
-have a smaller size than a data block, but since they are both addressed by the
-same 64-bit block number, any remaining space in the smaller sysblock is
-unused.
-
-Sysblock header information:
-
-struct omfs_header {
-        __be64 h_self;                  /* FS block where this is located */
-        __be32 h_body_size;             /* size of useful data after header */
-        __be16 h_crc;                   /* crc-ccitt of body_size bytes */
-        char h_fill1[2];
-        u8 h_version;                   /* version, always 1 */
-        char h_type;                    /* OMFS_INODE_X */
-        u8 h_magic;                     /* OMFS_IMAGIC */
-        u8 h_check_xor;                 /* XOR of header bytes before this */
-        __be32 h_fill2;
-};
-
-Files and directories are both represented by omfs_inode:
-
-struct omfs_inode {
-        struct omfs_header i_head;      /* header */
-        __be64 i_parent;                /* parent containing this inode */
-        __be64 i_sibling;               /* next inode in hash bucket */
-        __be64 i_ctime;                 /* ctime, in milliseconds */
-        char i_fill1[35];
-        char i_type;                    /* OMFS_[DIR,FILE] */
-        __be32 i_fill2;
-        char i_fill3[64];
-        char i_name[OMFS_NAMELEN];      /* filename */
-        __be64 i_size;                  /* size of file, in bytes */
-};
-
-Directories in OMFS are implemented as a large hash table.  Filenames are
-hashed then prepended into the bucket list beginning at OMFS_DIR_START.
-Lookup requires hashing the filename, then seeking across i_sibling pointers
-until a match is found on i_name.  Empty buckets are represented by block
-pointers with all-1s (~0).
-
-A file is an omfs_inode structure followed by an extent table beginning at
-OMFS_EXTENT_START:
-
-struct omfs_extent_entry {
-        __be64 e_cluster;               /* start location of a set of blocks */
-        __be64 e_blocks;                /* number of blocks after e_cluster */
-};
-
-struct omfs_extent {
-        __be64 e_next;                  /* next extent table location */
-        __be32 e_extent_count;          /* total # extents in this table */
-        __be32 e_fill;
-        struct omfs_extent_entry e_entry;       /* start of extent entries */
-};
-
-Each extent holds the block offset followed by number of blocks allocated to
-the extent.  The final extent in each table is a terminator with e_cluster
-being ~0 and e_blocks being ones'-complement of the total number of blocks
-in the table.
-
-If this table overflows, a continuation inode is written and pointed to by
-e_next.  These have a header but lack the rest of the inode structure.
-
-- 
cgit 


From 18ccb2233fc5f7c27b5be17f5b6585c2fa62d919 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:17 +0100
Subject: docs: filesystems: convert orangefs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f438eeff5b029d229197a602bd9b74004fe9b63.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst    |   1 +
 Documentation/filesystems/orangefs.rst | 554 +++++++++++++++++++++++++++++++++
 Documentation/filesystems/orangefs.txt | 529 -------------------------------
 3 files changed, 555 insertions(+), 529 deletions(-)
 create mode 100644 Documentation/filesystems/orangefs.rst
 delete mode 100644 Documentation/filesystems/orangefs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index fbee77175840..fed53f831192 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -79,6 +79,7 @@ Documentation for filesystem implementations.
    ocfs2
    ocfs2-online-filecheck
    omfs
+   orangefs
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/orangefs.rst b/Documentation/filesystems/orangefs.rst
new file mode 100644
index 000000000000..7d6d4cad73c4
--- /dev/null
+++ b/Documentation/filesystems/orangefs.rst
@@ -0,0 +1,554 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========
+ORANGEFS
+========
+
+OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal
+for large storage problems faced by HPC, BigData, Streaming Video,
+Genomics, and Bioinformatics.
+
+Orangefs, originally called PVFS, was first developed in 1993 by
+Walt Ligon and Eric Blumer as a parallel file system for Parallel
+Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns
+of parallel programs.
+
+Orangefs features include:
+
+  * Distributes file data among multiple file servers
+  * Supports simultaneous access by multiple clients
+  * Stores file data and metadata on servers using local file system
+    and access methods
+  * Userspace implementation is easy to install and maintain
+  * Direct MPI support
+  * Stateless
+
+
+Mailing List Archives
+=====================
+
+http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
+
+
+Mailing List Submissions
+========================
+
+devel@lists.orangefs.org
+
+
+Documentation
+=============
+
+http://www.orangefs.org/documentation/
+
+
+Userspace Filesystem Source
+===========================
+
+http://www.orangefs.org/download
+
+Orangefs versions prior to 2.9.3 are not compatible with the
+upstream version of the kernel client.
+
+
+Running ORANGEFS On a Single Server
+===================================
+
+OrangeFS is usually run in large installations with multiple servers and
+clients, but a complete filesystem can be run on a single machine for
+development and testing.
+
+On Fedora, install orangefs and orangefs-server::
+
+    dnf -y install orangefs orangefs-server
+
+There is an example server configuration file in
+/etc/orangefs/orangefs.conf.  Change localhost to your hostname if
+necessary.
+
+To generate a filesystem to run xfstests against, see below.
+
+There is an example client configuration file in /etc/pvfs2tab.  It is a
+single line.  Uncomment it and change the hostname if necessary.  This
+controls clients which use libpvfs2.  This does not control the
+pvfs2-client-core.
+
+Create the filesystem::
+
+    pvfs2-server -f /etc/orangefs/orangefs.conf
+
+Start the server::
+
+    systemctl start orangefs-server
+
+Test the server::
+
+    pvfs2-ping -m /pvfsmnt
+
+Start the client.  The module must be compiled in or loaded before this
+point::
+
+    systemctl start orangefs-client
+
+Mount the filesystem::
+
+    mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
+
+
+Building ORANGEFS on a Single Server
+====================================
+
+Where OrangeFS cannot be installed from distribution packages, it may be
+built from source.
+
+You can omit --prefix if you don't care that things are sprinkled around
+in /usr/local.  As of version 2.9.6, OrangeFS uses Berkeley DB by
+default; we will probably be changing the default to LMDB soon.
+
+::
+
+    ./configure --prefix=/opt/ofs --with-db-backend=lmdb
+
+    make
+
+    make install
+
+Create an orangefs config file::
+
+    /opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
+
+Create an /etc/pvfs2tab file::
+
+    echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
+	/etc/pvfs2tab
+
+Create the mount point you specified in the tab file if needed::
+
+    mkdir /pvfsmnt
+
+Bootstrap the server::
+
+    /opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
+
+Start the server::
+
+    /opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf
+
+Now the server should be running. Pvfs2-ls is a simple
+test to verify that the server is running::
+
+    /opt/ofs/bin/pvfs2-ls /pvfsmnt
+
+If stuff seems to be working, load the kernel module and
+turn on the client core::
+
+    /opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core
+
+Mount your filesystem::
+
+    mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
+
+
+Running xfstests
+================
+
+It is useful to use a scratch filesystem with xfstests.  This can be
+done with only one server.
+
+Make a second copy of the FileSystem section in the server configuration
+file, which is /etc/orangefs/orangefs.conf.  Change the Name to scratch.
+Change the ID to something other than the ID of the first FileSystem
+section (2 is usually a good choice).
+
+Then there are two FileSystem sections: orangefs and scratch.
+
+This change should be made before creating the filesystem.
+
+::
+
+    pvfs2-server -f /etc/orangefs/orangefs.conf
+
+To run xfstests, create /etc/xfsqa.config::
+
+    TEST_DIR=/orangefs
+    TEST_DEV=tcp://localhost:3334/orangefs
+    SCRATCH_MNT=/scratch
+    SCRATCH_DEV=tcp://localhost:3334/scratch
+
+Then xfstests can be run::
+
+    ./check -pvfs2
+
+
+Options
+=======
+
+The following mount options are accepted:
+
+  acl
+    Allow the use of Access Control Lists on files and directories.
+
+  intr
+    Some operations between the kernel client and the user space
+    filesystem can be interruptible, such as changes in debug levels
+    and the setting of tunable parameters.
+
+  local_lock
+    Enable posix locking from the perspective of "this" kernel. The
+    default file_operations lock action is to return ENOSYS. Posix
+    locking kicks in if the filesystem is mounted with -o local_lock.
+    Distributed locking is being worked on for the future.
+
+
+Debugging
+=========
+
+To send the debug (GOSSIP) statements from a particular
+source file (inode.c for example) to syslog::
+
+  echo inode > /sys/kernel/debug/orangefs/kernel-debug
+
+No debugging (the default)::
+
+  echo none > /sys/kernel/debug/orangefs/kernel-debug
+
+Debugging from several source files::
+
+  echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
+
+All debugging::
+
+  echo all > /sys/kernel/debug/orangefs/kernel-debug
+
+Get a list of all debugging keywords::
+
+  cat /sys/kernel/debug/orangefs/debug-help
+
+
+Protocol between Kernel Module and Userspace
+============================================
+
+Orangefs is a user space filesystem and an associated kernel module.
+We'll just refer to the user space part of Orangefs as "userspace"
+from here on out. Orangefs descends from PVFS, and userspace code
+still uses PVFS for function and variable names. Userspace typedefs
+many of the important structures. Function and variable names in
+the kernel module have been transitioned to "orangefs", and the Linux
+Coding Style avoids typedefs, so kernel module structures that
+correspond to userspace structures are not typedefed.
+
+The kernel module implements a pseudo device that userspace
+can read from and write to. Userspace can also manipulate the
+kernel module through the pseudo device with ioctl.
+
+The Bufmap
+----------
+
+At startup userspace allocates two page-size-aligned (posix_memalign)
+mlocked memory buffers: one is used for IO and one is used for readdir
+operations. The IO buffer is 41943040 bytes and the readdir buffer is
+4194304 bytes. Each buffer contains logical chunks, or partitions, and
+a pointer to each buffer is added to its own PVFS_dev_map_desc structure
+which also describes its total size, as well as the size and number of
+the partitions.
+
+A pointer to the IO buffer's PVFS_dev_map_desc structure is sent to a
+mapping routine in the kernel module with an ioctl. The structure is
+copied from user space to kernel space with copy_from_user and is used
+to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
+then contains:
+
+  * refcnt
+    - a reference counter
+  * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
+    partition size, which represents the filesystem's block size and
+    is used for s_blocksize in super blocks.
+  * desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) - the number of
+    partitions in the IO buffer.
+  * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
+  * total_size - the total size of the IO buffer.
+  * page_count - the number of 4096 byte pages in the IO buffer.
+  * page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes
+    of kcalloced memory. This memory is used as an array of pointers
+    to each of the pages in the IO buffer through a call to get_user_pages.
+  * desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``
+    bytes of kcalloced memory. This memory is further initialized:
+
+      user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
+      structure. user_desc->ptr points to the IO buffer.
+
+      ::
+
+	pages_per_desc = bufmap->desc_size / PAGE_SIZE
+	offset = 0
+
+        bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
+        bufmap->desc_array[0].array_count = pages_per_desc = 1024
+        bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
+        offset += 1024
+                           .
+                           .
+                           .
+        bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
+        bufmap->desc_array[9].array_count = pages_per_desc = 1024
+        bufmap->desc_array[9].uaddr = (user_desc->ptr) +
+                                               (9 * 1024 * 4096)
+        offset += 1024
+
+  * buffer_index_array - a desc_count sized array of ints, used to
+    indicate which of the IO buffer's partitions are available to use.
+  * buffer_index_lock - a spinlock to protect buffer_index_array during update.
+  * readdir_index_array - a five (ORANGEFS_READDIR_DEFAULT_DESC_COUNT) element
+    int array used to indicate which of the readdir buffer's partitions are
+    available to use.
+  * readdir_index_lock - a spinlock to protect readdir_index_array during
+    update.
+
+Operations
+----------
+
+The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
+needs to communicate with userspace. Part of the op contains the "upcall"
+which expresses the request to userspace. Part of the op eventually
+contains the "downcall" which expresses the results of the request.
+
+The slab allocator is used to keep a cache of op structures handy.
+
+At init time the kernel module defines and initializes a request list
+and an in_progress hash table to keep track of all the ops that are
+in flight at any given time.
+
+Ops are stateful:
+
+ * unknown
+	    - op was just initialized
+ * waiting
+	    - op is on request_list (upward bound)
+ * inprogr
+	    - op is in progress (waiting for downcall)
+ * serviced
+	    - op has matching downcall; ok
+ * purged
+	    - op has to start a timer since client-core
+              exited uncleanly before servicing op
+ * given up
+	    - submitter has given up waiting for it
+
+When some arbitrary userspace program needs to perform a
+filesystem operation on Orangefs (readdir, I/O, create, whatever)
+an op structure is initialized and tagged with a distinguishing ID
+number. The upcall part of the op is filled out, and the op is
+passed to the "service_operation" function.
+
+Service_operation changes the op's state to "waiting", puts
+it on the request list, and signals the Orangefs file_operations.poll
+function through a wait queue. Userspace is polling the pseudo-device
+and thus becomes aware of the upcall request that needs to be read.
+
+When the Orangefs file_operations.read function is triggered, the
+request list is searched for an op that seems ready-to-process.
+The op is removed from the request list. The tag from the op and
+the filled-out upcall struct are copy_to_user'ed back to userspace.
+
+If any of these (and some additional protocol) copy_to_users fail,
+the op's state is set to "waiting" and the op is added back to
+the request list. Otherwise, the op's state is changed to "in progress",
+and the op is hashed on its tag and put onto the end of a list in the
+in_progress hash table at the index the tag hashed to.
+
+When userspace has assembled the response to the upcall, it
+writes the response, which includes the distinguishing tag, back to
+the pseudo device in a series of io_vecs. This triggers the Orangefs
+file_operations.write_iter function to find the op with the associated
+tag and remove it from the in_progress hash table. As long as the op's
+state is not "canceled" or "given up", its state is set to "serviced".
+The file_operations.write_iter function returns to the waiting vfs,
+and back to service_operation through wait_for_matching_downcall.
+
+Service operation returns to its caller with the op's downcall
+part (the response to the upcall) filled out.
+
+The "client-core" is the bridge between the kernel module and
+userspace. The client-core is a daemon. The client-core has an
+associated watchdog daemon. If the client-core is ever signaled
+to die, the watchdog daemon restarts the client-core. Even though
+the client-core is restarted "right away", there is a period of
+time during such an event that the client-core is dead. A dead client-core
+can't be triggered by the Orangefs file_operations.poll function.
+Ops that pass through service_operation during a "dead spell" can time out
+on the wait queue and one attempt is made to recycle them. Obviously,
+if the client-core stays dead too long, the arbitrary userspace processes
+trying to use Orangefs will be negatively affected. Waiting ops
+that can't be serviced will be removed from the request list and
+have their states set to "given up". In-progress ops that can't
+be serviced will be removed from the in_progress hash table and
+have their states set to "given up".
+
+Readdir and I/O ops are atypical with respect to their payloads.
+
+  - readdir ops use the smaller of the two pre-allocated pre-partitioned
+    memory buffers. The readdir buffer is only available to userspace.
+    The kernel module obtains an index to a free partition before launching
+    a readdir op. Userspace deposits the results into the indexed partition
+    and then writes them back to the pvfs device.
+
+  - io (read and write) ops use the larger of the two pre-allocated
+    pre-partitioned memory buffers. The IO buffer is accessible from
+    both userspace and the kernel module. The kernel module obtains an
+    index to a free partition before launching an io op. The kernel module
+    deposits write data into the indexed partition, to be consumed
+    directly by userspace. Userspace deposits the results of read
+    requests into the indexed partition, to be consumed directly
+    by the kernel module.
+
+Responses to kernel requests are all packaged in pvfs2_downcall_t
+structs. Besides a few other members, pvfs2_downcall_t contains a
+union of structs, each of which is associated with a particular
+response type.
+
+The several members outside of the union are:
+
+ ``int32_t type``
+    - type of operation.
+ ``int32_t status``
+    - return code for the operation.
+ ``int64_t trailer_size``
+    - 0 unless readdir operation.
+ ``char *trailer_buf``
+    - initialized to NULL, used during readdir operations.
+
+The appropriate member inside the union is filled out for any
+particular response.
+
+  PVFS2_VFS_OP_FILE_IO
+    fill a pvfs2_io_response_t
+
+  PVFS2_VFS_OP_LOOKUP
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_CREATE
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_SYMLINK
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_GETATTR
+    fill in a PVFS_sys_attr_s (tons of stuff the kernel doesn't need)
+    fill in a string with the link target when the object is a symlink.
+
+  PVFS2_VFS_OP_MKDIR
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_STATFS
+    fill a pvfs2_statfs_response_t with useless info <g>. It is hard for
+    us to know, in a timely fashion, these statistics about our
+    distributed network filesystem.
+
+  PVFS2_VFS_OP_FS_MOUNT
+    fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
+    except its members are in a different order and "__pad1" is replaced
+    with "id".
+
+  PVFS2_VFS_OP_GETXATTR
+    fill a pvfs2_getxattr_response_t
+
+  PVFS2_VFS_OP_LISTXATTR
+    fill a pvfs2_listxattr_response_t
+
+  PVFS2_VFS_OP_PARAM
+    fill a pvfs2_param_response_t
+
+  PVFS2_VFS_OP_PERF_COUNT
+    fill a pvfs2_perf_count_response_t
+
+  PVFS2_VFS_OP_FSKEY
+    fill a pvfs2_fs_key_response_t
+
+  PVFS2_VFS_OP_READDIR
+    jam everything needed to represent a pvfs2_readdir_response_t into
+    the readdir buffer descriptor specified in the upcall.
+
+Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
+made by the kernel side.
+
+A buffer_list containing:
+
+  - a pointer to the prepared response to the request from the
+    kernel (struct pvfs2_downcall_t).
+  - and also, in the case of a readdir request, a pointer to a
+    buffer containing descriptors for the objects in the target
+    directory.
+
+... is sent to the function (PINT_dev_write_list) which performs
+the writev.
+
+PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
+
+The first four elements of io_array are initialized like this for all
+responses::
+
+  io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
+  io_array[0].iov_len = sizeof(int32_t)
+
+  io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
+  io_array[1].iov_len = sizeof(int32_t)
+
+  io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
+  io_array[2].iov_len = sizeof(int64_t)
+
+  io_array[3].iov_base = address of out_downcall member (pvfs2_downcall_t)
+                         of global variable vfs_request (vfs_request_t)
+  io_array[3].iov_len = sizeof(pvfs2_downcall_t)
+
+Readdir responses initialize the fifth element of io_array like this::
+
+  io_array[4].iov_base = contents of member trailer_buf (char *)
+                         from out_downcall member of global variable
+                         vfs_request
+  io_array[4].iov_len = contents of member trailer_size (PVFS_size)
+                        from out_downcall member of global variable
+                        vfs_request
+
+Orangefs exploits the dcache in order to avoid sending redundant
+requests to userspace. We keep object inode attributes up-to-date with
+orangefs_inode_getattr. Orangefs_inode_getattr uses two arguments to
+help it decide whether or not to update an inode: "new" and "bypass".
+Orangefs keeps private data in an object's inode that includes a short
+timeout value, getattr_time, which allows any iteration of
+orangefs_inode_getattr to know how long it has been since the inode was
+updated. When the object is not new (new == 0) and the bypass flag is not
+set (bypass == 0) orangefs_inode_getattr returns without updating the inode
+if getattr_time has not timed out. Getattr_time is updated each time the
+inode is updated.
+
+Creation of a new object (file, dir, sym-link) includes the evaluation of
+its pathname, resulting in a negative directory entry for the object.
+A new inode is allocated and associated with the dentry, turning it from
+a negative dentry into a "productive full member of society". Orangefs
+obtains the new inode from Linux with new_inode() and associates
+the inode with the dentry by sending the pair back to Linux with
+d_instantiate().
+
+The evaluation of a pathname for an object resolves to its corresponding
+dentry. If there is no corresponding dentry, one is created for it in
+the dcache. Whenever a dentry is modified or verified Orangefs stores a
+short timeout value in the dentry's d_time, and the dentry will be trusted
+for that amount of time. Orangefs is a network filesystem, and objects
+can potentially change out-of-band with any particular Orangefs kernel module
+instance, so trusting a dentry is risky. The alternative to trusting
+dentries is to always obtain the needed information from userspace - at
+least a trip to the client-core, maybe to the servers. Obtaining information
+from a dentry is cheap, obtaining it from userspace is relatively expensive,
+hence the motivation to use the dentry when possible.
+
+The timeout values d_time and getattr_time are jiffy based, and the
+code is designed to avoid the jiffy-wrap problem::
+
+    "In general, if the clock may have wrapped around more than once, there
+    is no way to tell how much time has elapsed. However, if the times t1
+    and t2 are known to be fairly close, we can reliably compute the
+    difference in a way that takes into account the possibility that the
+    clock may have wrapped between times."
+
+from course notes by instructor Andy Wang
+
diff --git a/Documentation/filesystems/orangefs.txt b/Documentation/filesystems/orangefs.txt
deleted file mode 100644
index f4ba94950e3f..000000000000
--- a/Documentation/filesystems/orangefs.txt
+++ /dev/null
@@ -1,529 +0,0 @@
-ORANGEFS
-========
-
-OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal
-for large storage problems faced by HPC, BigData, Streaming Video,
-Genomics, Bioinformatics.
-
-Orangefs, originally called PVFS, was first developed in 1993 by
-Walt Ligon and Eric Blumer as a parallel file system for Parallel
-Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns
-of parallel programs.
-
-Orangefs features include:
-
-  * Distributes file data among multiple file servers
-  * Supports simultaneous access by multiple clients
-  * Stores file data and metadata on servers using local file system
-    and access methods
-  * Userspace implementation is easy to install and maintain
-  * Direct MPI support
-  * Stateless
-
-
-MAILING LIST ARCHIVES
-=====================
-
-http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
-
-
-MAILING LIST SUBMISSIONS
-========================
-
-devel@lists.orangefs.org
-
-
-DOCUMENTATION
-=============
-
-http://www.orangefs.org/documentation/
-
-
-USERSPACE FILESYSTEM SOURCE
-===========================
-
-http://www.orangefs.org/download
-
-Orangefs versions prior to 2.9.3 would not be compatible with the
-upstream version of the kernel client.
-
-
-RUNNING ORANGEFS ON A SINGLE SERVER
-===================================
-
-OrangeFS is usually run in large installations with multiple servers and
-clients, but a complete filesystem can be run on a single machine for
-development and testing.
-
-On Fedora, install orangefs and orangefs-server.
-
-dnf -y install orangefs orangefs-server
-
-There is an example server configuration file in
-/etc/orangefs/orangefs.conf.  Change localhost to your hostname if
-necessary.
-
-To generate a filesystem to run xfstests against, see below.
-
-There is an example client configuration file in /etc/pvfs2tab.  It is a
-single line.  Uncomment it and change the hostname if necessary.  This
-controls clients which use libpvfs2.  This does not control the
-pvfs2-client-core.
-
-Create the filesystem.
-
-pvfs2-server -f /etc/orangefs/orangefs.conf
-
-Start the server.
-
-systemctl start orangefs-server
-
-Test the server.
-
-pvfs2-ping -m /pvfsmnt
-
-Start the client.  The module must be compiled in or loaded before this
-point.
-
-systemctl start orangefs-client
-
-Mount the filesystem.
-
-mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
-
-
-BUILDING ORANGEFS ON A SINGLE SERVER
-====================================
-
-Where OrangeFS cannot be installed from distribution packages, it may be
-built from source.
-
-You can omit --prefix if you don't care that things are sprinkled around
-in /usr/local.  As of version 2.9.6, OrangeFS uses Berkeley DB by
-default, we will probably be changing the default to LMDB soon.
-
-./configure --prefix=/opt/ofs --with-db-backend=lmdb
-
-make
-
-make install
-
-Create an orangefs config file.
-
-/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
-
-Create an /etc/pvfs2tab file.
-
-echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
-    /etc/pvfs2tab
-
-Create the mount point you specified in the tab file if needed.
-
-mkdir /pvfsmnt
-
-Bootstrap the server.
-
-/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
-
-Start the server.
-
-/opt/osf/sbin/pvfs2-server /etc/pvfs2.conf
-
-Now the server should be running. Pvfs2-ls is a simple
-test to verify that the server is running.
-
-/opt/ofs/bin/pvfs2-ls /pvfsmnt
-
-If stuff seems to be working, load the kernel module and
-turn on the client core.
-
-/opt/ofs/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core
-
-Mount your filesystem.
-
-mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
-
-
-RUNNING XFSTESTS
-================
-
-It is useful to use a scratch filesystem with xfstests.  This can be
-done with only one server.
-
-Make a second copy of the FileSystem section in the server configuration
-file, which is /etc/orangefs/orangefs.conf.  Change the Name to scratch.
-Change the ID to something other than the ID of the first FileSystem
-section (2 is usually a good choice).
-
-Then there are two FileSystem sections: orangefs and scratch.
-
-This change should be made before creating the filesystem.
-
-pvfs2-server -f /etc/orangefs/orangefs.conf
-
-To run xfstests, create /etc/xfsqa.config.
-
-TEST_DIR=/orangefs
-TEST_DEV=tcp://localhost:3334/orangefs
-SCRATCH_MNT=/scratch
-SCRATCH_DEV=tcp://localhost:3334/scratch
-
-Then xfstests can be run
-
-./check -pvfs2
-
-
-OPTIONS
-=======
-
-The following mount options are accepted:
-
-  acl
-    Allow the use of Access Control Lists on files and directories.
-
-  intr
-    Some operations between the kernel client and the user space
-    filesystem can be interruptible, such as changes in debug levels
-    and the setting of tunable parameters.
-
-  local_lock
-    Enable posix locking from the perspective of "this" kernel. The
-    default file_operations lock action is to return ENOSYS. Posix
-    locking kicks in if the filesystem is mounted with -o local_lock.
-    Distributed locking is being worked on for the future.
-
-
-DEBUGGING
-=========
-
-If you want the debug (GOSSIP) statements in a particular
-source file (inode.c for example) go to syslog:
-
-  echo inode > /sys/kernel/debug/orangefs/kernel-debug
-
-No debugging (the default):
-
-  echo none > /sys/kernel/debug/orangefs/kernel-debug
-
-Debugging from several source files:
-
-  echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
-
-All debugging:
-
-  echo all > /sys/kernel/debug/orangefs/kernel-debug
-
-Get a list of all debugging keywords:
-
-  cat /sys/kernel/debug/orangefs/debug-help
-
-
-PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE
-============================================
-
-Orangefs is a user space filesystem and an associated kernel module.
-We'll just refer to the user space part of Orangefs as "userspace"
-from here on out. Orangefs descends from PVFS, and userspace code
-still uses PVFS for function and variable names. Userspace typedefs
-many of the important structures. Function and variable names in
-the kernel module have been transitioned to "orangefs", and The Linux
-Coding Style avoids typedefs, so kernel module structures that
-correspond to userspace structures are not typedefed.
-
-The kernel module implements a pseudo device that userspace
-can read from and write to. Userspace can also manipulate the
-kernel module through the pseudo device with ioctl.
-
-THE BUFMAP:
-
-At startup userspace allocates two page-size-aligned (posix_memalign)
-mlocked memory buffers, one is used for IO and one is used for readdir
-operations. The IO buffer is 41943040 bytes and the readdir buffer is
-4194304 bytes. Each buffer contains logical chunks, or partitions, and
-a pointer to each buffer is added to its own PVFS_dev_map_desc structure
-which also describes its total size, as well as the size and number of
-the partitions.
-
-A pointer to the IO buffer's PVFS_dev_map_desc structure is sent to a
-mapping routine in the kernel module with an ioctl. The structure is
-copied from user space to kernel space with copy_from_user and is used
-to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
-then contains:
-
-  * refcnt - a reference counter
-  * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
-    partition size, which represents the filesystem's block size and
-    is used for s_blocksize in super blocks.
-  * desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) - the number of
-    partitions in the IO buffer.
-  * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
-  * total_size - the total size of the IO buffer.
-  * page_count - the number of 4096 byte pages in the IO buffer.
-  * page_array - a pointer to page_count * (sizeof(struct page*)) bytes
-    of kcalloced memory. This memory is used as an array of pointers
-    to each of the pages in the IO buffer through a call to get_user_pages.
-  * desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc))
-    bytes of kcalloced memory. This memory is further intialized:
-
-      user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
-      structure. user_desc->ptr points to the IO buffer.
-
-      pages_per_desc = bufmap->desc_size / PAGE_SIZE
-      offset = 0
-
-        bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
-        bufmap->desc_array[0].array_count = pages_per_desc = 1024
-        bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
-        offset += 1024
-                           .
-                           .
-                           .
-        bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
-        bufmap->desc_array[9].array_count = pages_per_desc = 1024
-        bufmap->desc_array[9].uaddr = (user_desc->ptr) +
-                                               (9 * 1024 * 4096)
-        offset += 1024
-
-  * buffer_index_array - a desc_count sized array of ints, used to
-    indicate which of the IO buffer's partitions are available to use.
-  * buffer_index_lock - a spinlock to protect buffer_index_array during update.
-  * readdir_index_array - a five (ORANGEFS_READDIR_DEFAULT_DESC_COUNT) element
-    int array used to indicate which of the readdir buffer's partitions are
-    available to use.
-  * readdir_index_lock - a spinlock to protect readdir_index_array during
-    update.
-
-OPERATIONS:
-
-The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
-needs to communicate with userspace. Part of the op contains the "upcall"
-which expresses the request to userspace. Part of the op eventually
-contains the "downcall" which expresses the results of the request.
-
-The slab allocator is used to keep a cache of op structures handy.
-
-At init time the kernel module defines and initializes a request list
-and an in_progress hash table to keep track of all the ops that are
-in flight at any given time.
-
-Ops are stateful:
-
- * unknown  - op was just initialized
- * waiting  - op is on request_list (upward bound)
- * inprogr  - op is in progress (waiting for downcall)
- * serviced - op has matching downcall; ok
- * purged   - op has to start a timer since client-core
-              exited uncleanly before servicing op
- * given up - submitter has given up waiting for it
-
-When some arbitrary userspace program needs to perform a
-filesystem operation on Orangefs (readdir, I/O, create, whatever)
-an op structure is initialized and tagged with a distinguishing ID
-number. The upcall part of the op is filled out, and the op is
-passed to the "service_operation" function.
-
-Service_operation changes the op's state to "waiting", puts
-it on the request list, and signals the Orangefs file_operations.poll
-function through a wait queue. Userspace is polling the pseudo-device
-and thus becomes aware of the upcall request that needs to be read.
-
-When the Orangefs file_operations.read function is triggered, the
-request list is searched for an op that seems ready-to-process.
-The op is removed from the request list. The tag from the op and
-the filled-out upcall struct are copy_to_user'ed back to userspace.
-
-If any of these (and some additional protocol) copy_to_users fail,
-the op's state is set to "waiting" and the op is added back to
-the request list. Otherwise, the op's state is changed to "in progress",
-and the op is hashed on its tag and put onto the end of a list in the
-in_progress hash table at the index the tag hashed to.
-
-When userspace has assembled the response to the upcall, it
-writes the response, which includes the distinguishing tag, back to
-the pseudo device in a series of io_vecs. This triggers the Orangefs
-file_operations.write_iter function to find the op with the associated
-tag and remove it from the in_progress hash table. As long as the op's
-state is not "canceled" or "given up", its state is set to "serviced".
-The file_operations.write_iter function returns to the waiting vfs,
-and back to service_operation through wait_for_matching_downcall.
-
-Service operation returns to its caller with the op's downcall
-part (the response to the upcall) filled out.
-
-The "client-core" is the bridge between the kernel module and
-userspace. The client-core is a daemon. The client-core has an
-associated watchdog daemon. If the client-core is ever signaled
-to die, the watchdog daemon restarts the client-core. Even though
-the client-core is restarted "right away", there is a period of
-time during such an event that the client-core is dead. A dead client-core
-can't be triggered by the Orangefs file_operations.poll function.
-Ops that pass through service_operation during a "dead spell" can timeout
-on the wait queue and one attempt is made to recycle them. Obviously,
-if the client-core stays dead too long, the arbitrary userspace processes
-trying to use Orangefs will be negatively affected. Waiting ops
-that can't be serviced will be removed from the request list and
-have their states set to "given up". In-progress ops that can't
-be serviced will be removed from the in_progress hash table and
-have their states set to "given up".
-
-Readdir and I/O ops are atypical with respect to their payloads.
-
-  - readdir ops use the smaller of the two pre-allocated pre-partitioned
-    memory buffers. The readdir buffer is only available to userspace.
-    The kernel module obtains an index to a free partition before launching
-    a readdir op. Userspace deposits the results into the indexed partition
-    and then writes them to back to the pvfs device.
-
-  - io (read and write) ops use the larger of the two pre-allocated
-    pre-partitioned memory buffers. The IO buffer is accessible from
-    both userspace and the kernel module. The kernel module obtains an
-    index to a free partition before launching an io op. The kernel module
-    deposits write data into the indexed partition, to be consumed
-    directly by userspace. Userspace deposits the results of read
-    requests into the indexed partition, to be consumed directly
-    by the kernel module.
-
-Responses to kernel requests are all packaged in pvfs2_downcall_t
-structs. Besides a few other members, pvfs2_downcall_t contains a
-union of structs, each of which is associated with a particular
-response type.
-
-The several members outside of the union are:
- - int32_t type - type of operation.
- - int32_t status - return code for the operation.
- - int64_t trailer_size - 0 unless readdir operation.
- - char *trailer_buf - initialized to NULL, used during readdir operations.
-
-The appropriate member inside the union is filled out for any
-particular response.
-
-  PVFS2_VFS_OP_FILE_IO
-    fill a pvfs2_io_response_t
-
-  PVFS2_VFS_OP_LOOKUP
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_CREATE
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_SYMLINK
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_GETATTR
-    fill in a PVFS_sys_attr_s (tons of stuff the kernel doesn't need)
-    fill in a string with the link target when the object is a symlink.
-
-  PVFS2_VFS_OP_MKDIR
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_STATFS
-    fill a pvfs2_statfs_response_t with useless info <g>. It is hard for
-    us to know, in a timely fashion, these statistics about our
-    distributed network filesystem.
-
-  PVFS2_VFS_OP_FS_MOUNT
-    fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
-    except its members are in a different order and "__pad1" is replaced
-    with "id".
-
-  PVFS2_VFS_OP_GETXATTR
-    fill a pvfs2_getxattr_response_t
-
-  PVFS2_VFS_OP_LISTXATTR
-    fill a pvfs2_listxattr_response_t
-
-  PVFS2_VFS_OP_PARAM
-    fill a pvfs2_param_response_t
-
-  PVFS2_VFS_OP_PERF_COUNT
-    fill a pvfs2_perf_count_response_t
-
-  PVFS2_VFS_OP_FSKEY
-    file a pvfs2_fs_key_response_t
-
-  PVFS2_VFS_OP_READDIR
-    jamb everything needed to represent a pvfs2_readdir_response_t into
-    the readdir buffer descriptor specified in the upcall.
-
-Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
-made by the kernel side.
-
-A buffer_list containing:
-  - a pointer to the prepared response to the request from the
-    kernel (struct pvfs2_downcall_t).
-  - and also, in the case of a readdir request, a pointer to a
-    buffer containing descriptors for the objects in the target
-    directory.
-... is sent to the function (PINT_dev_write_list) which performs
-the writev.
-
-PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
-
-The first four elements of io_array are initialized like this for all
-responses:
-
-  io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
-  io_array[0].iov_len = sizeof(int32_t)
-
-  io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
-  io_array[1].iov_len = sizeof(int32_t)
-
-  io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
-  io_array[2].iov_len = sizeof(int64_t)
-
-  io_array[3].iov_base = address of out_downcall member (pvfs2_downcall_t)
-                         of global variable vfs_request (vfs_request_t)
-  io_array[3].iov_len = sizeof(pvfs2_downcall_t)
-
-Readdir responses initialize the fifth element io_array like this:
-
-  io_array[4].iov_base = contents of member trailer_buf (char *)
-                         from out_downcall member of global variable
-                         vfs_request
-  io_array[4].iov_len = contents of member trailer_size (PVFS_size)
-                        from out_downcall member of global variable
-                        vfs_request
-
-Orangefs exploits the dcache in order to avoid sending redundant
-requests to userspace. We keep object inode attributes up-to-date with
-orangefs_inode_getattr. Orangefs_inode_getattr uses two arguments to
-help it decide whether or not to update an inode: "new" and "bypass".
-Orangefs keeps private data in an object's inode that includes a short
-timeout value, getattr_time, which allows any iteration of
-orangefs_inode_getattr to know how long it has been since the inode was
-updated. When the object is not new (new == 0) and the bypass flag is not
-set (bypass == 0) orangefs_inode_getattr returns without updating the inode
-if getattr_time has not timed out. Getattr_time is updated each time the
-inode is updated.
-
-Creation of a new object (file, dir, sym-link) includes the evaluation of
-its pathname, resulting in a negative directory entry for the object.
-A new inode is allocated and associated with the dentry, turning it from
-a negative dentry into a "productive full member of society". Orangefs
-obtains the new inode from Linux with new_inode() and associates
-the inode with the dentry by sending the pair back to Linux with
-d_instantiate().
-
-The evaluation of a pathname for an object resolves to its corresponding
-dentry. If there is no corresponding dentry, one is created for it in
-the dcache. Whenever a dentry is modified or verified Orangefs stores a
-short timeout value in the dentry's d_time, and the dentry will be trusted
-for that amount of time. Orangefs is a network filesystem, and objects
-can potentially change out-of-band with any particular Orangefs kernel module
-instance, so trusting a dentry is risky. The alternative to trusting
-dentries is to always obtain the needed information from userspace - at
-least a trip to the client-core, maybe to the servers. Obtaining information
-from a dentry is cheap, obtaining it from userspace is relatively expensive,
-hence the motivation to use the dentry when possible.
-
-The timeout values d_time and getattr_time are jiffy based, and the
-code is designed to avoid the jiffy-wrap problem:
-
-"In general, if the clock may have wrapped around more than once, there
-is no way to tell how much time has elapsed. However, if the times t1
-and t2 are known to be fairly close, we can reliably compute the
-difference in a way that takes into account the possibility that the
-clock may have wrapped between times."
-
-                      from course notes by instructor Andy Wang
-
-- 
cgit 


From c33e97efa9d9de538e5f0afe6cb07f83afcd5b68 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:18 +0100
Subject: docs: filesystems: convert proc.txt to ReST

This document has a nice format! Unfortunately, not exactly
ReST. So, several adjustments were required:

- Add a SPDX header;
- Adjust document and section titles;
- Whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add table captions;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/1d113d860188de416ca3b0b97371dc2195433d5b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |    1 +
 Documentation/filesystems/proc.rst  | 2169 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/proc.txt  | 2047 ---------------------------------
 3 files changed, 2170 insertions(+), 2047 deletions(-)
 create mode 100644 Documentation/filesystems/proc.rst
 delete mode 100644 Documentation/filesystems/proc.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index fed53f831192..671906e2fee6 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -81,5 +81,6 @@ Documentation for filesystem implementations.
    omfs
    orangefs
    overlayfs
+   proc
    virtiofs
    vfat
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
new file mode 100644
index 000000000000..38b606991065
--- /dev/null
+++ b/Documentation/filesystems/proc.rst
@@ -0,0 +1,2169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+The /proc Filesystem
+====================
+
+=====================  =======================================  ================
+/proc/sys              Terrehon Bowden <terrehon@pacbell.net>,  October 7 1999
+                       Bodo Bauer <bb@ricochet.net>
+2.4.x update	       Jorge Nerin <comandante@zaralinux.com>   November 14 2000
+move /proc/sys	       Shen Feng <shen@cn.fujitsu.com>	        April 1 2009
+fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>    June 9 2009
+=====================  =======================================  ================
+
+
+
+.. Table of Contents
+
+  0     Preface
+  0.1	Introduction/Credits
+  0.2	Legal Stuff
+
+  1	Collecting System Information
+  1.1	Process-Specific Subdirectories
+  1.2	Kernel data
+  1.3	IDE devices in /proc/ide
+  1.4	Networking info in /proc/net
+  1.5	SCSI info
+  1.6	Parallel port info in /proc/parport
+  1.7	TTY info in /proc/tty
+  1.8	Miscellaneous kernel statistics in /proc/stat
+  1.9	Ext4 file system parameters
+
+  2	Modifying System Parameters
+
+  3	Per-Process Parameters
+  3.1	/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
+								score
+  3.2	/proc/<pid>/oom_score - Display current oom-killer score
+  3.3	/proc/<pid>/io - Display the IO accounting fields
+  3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
+  3.5	/proc/<pid>/mountinfo - Information about mounts
+  3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
+  3.7   /proc/<pid>/task/<tid>/children - Information about task children
+  3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
+  3.9   /proc/<pid>/map_files - Information about memory mapped files
+  3.10  /proc/<pid>/timerslack_ns - Task timerslack value
+  3.11	/proc/<pid>/patch_state - Livepatch patch operation state
+  3.12	/proc/<pid>/arch_status - Task architecture specific information
+
+  4	Configuring procfs
+  4.1	Mount options
+
+Preface
+=======
+
+0.1 Introduction/Credits
+------------------------
+
+This documentation is  part of a soon (or  so we hope) to be  released book on
+the SuSE  Linux distribution. As  there is  no complete documentation  for the
+/proc file system and we've used  many freely available sources to write these
+chapters, it  seems only fair  to give the work  back to the  Linux community.
+This work is  based on the 2.2.*  kernel version and the  upcoming 2.4.*. I'm
+afraid it's still far from complete, but we  hope it will be useful. As far as
+we know, it is the first 'all-in-one' document about the /proc file system. It
+is focused  on the Intel  x86 hardware,  so if you  are looking for  PPC, ARM,
+SPARC, AXP, etc., features, you probably  won't find what you are looking for.
+It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But
+additions and patches  are welcome and will  be added to this  document if you
+mail them to Bodo.
+
+We'd like  to  thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of
+other people for help compiling this documentation. We'd also like to extend a
+special thank  you to Andi Kleen for documentation, which we relied on heavily
+to create  this  document,  as well as the additional information he provided.
+Thanks to  everybody  else  who contributed source or docs to the Linux kernel
+and helped create a great piece of software... :)
+
+If you  have  any comments, corrections or additions, please don't hesitate to
+contact Bodo  Bauer  at  bb@ricochet.net.  We'll  be happy to add them to this
+document.
+
+The   latest   version    of   this   document   is    available   online   at
+http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
+
+If  the above  direction does  not work  for you,  you could  try the  kernel
+mailing  list  at  linux-kernel@vger.kernel.org  and/or try  to  reach  me  at
+comandante@zaralinux.com.
+
+0.2 Legal Stuff
+---------------
+
+We don't  guarantee  the  correctness  of this document, and if you come to us
+complaining about  how  you  screwed  up  your  system  because  of  incorrect
+documentation, we won't feel responsible...
+
+Chapter 1: Collecting System Information
+========================================
+
+In This Chapter
+---------------
+* Investigating  the  properties  of  the  pseudo  file  system  /proc and its
+  ability to provide information on the running Linux system
+* Examining /proc's structure
+* Uncovering  various  information  about the kernel and the processes running
+  on the system
+
+------------------------------------------------------------------------------
+
+The proc  file  system acts as an interface to internal data structures in the
+kernel. It  can  be  used to obtain information about the system and to change
+certain kernel parameters at runtime (sysctl).
+
+First, we'll  take  a  look  at the read-only parts of /proc. In Chapter 2, we
+show you how you can use /proc/sys to change settings.
+
+1.1 Process-Specific Subdirectories
+-----------------------------------
+
+The directory  /proc  contains  (among other things) one subdirectory for each
+process running on the system, which is named after the process ID (PID).
+
+The link  self  points  to  the  process reading the file system. Each process
+subdirectory has the entries listed in Table 1-1.
+
+Note that an open file descriptor to /proc/<pid> or to any of its
+contained files or subdirectories does not prevent <pid> being reused
+for some other process in the event that <pid> exits. Operations on
+open /proc/<pid> file descriptors corresponding to dead processes
+never act on any new process that the kernel may, through chance, have
+also assigned the process ID <pid>. Instead, operations on these FDs
+usually fail with ESRCH.
+
+.. table:: Table 1-1: Process specific entries in /proc
+
+ =============  ===============================================================
+ File		Content
+ =============  ===============================================================
+ clear_refs	Clears page referenced bits shown in smaps output
+ cmdline	Command line arguments
+ cpu		Current and last cpu in which it was executed	(2.4)(smp)
+ cwd		Link to the current working directory
+ environ	Values of environment variables
+ exe		Link to the executable of this process
+ fd		Directory, which contains all file descriptors
+ maps		Memory maps to executables and library files	(2.4)
+ mem		Memory held by this process
+ root		Link to the root directory of this process
+ stat		Process status
+ statm		Process memory status information
+ status		Process status in human readable form
+ wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
+		symbol the task is blocked in - or "0" if not blocked.
+ pagemap	Page table
+ stack		Report full stack trace, enable via CONFIG_STACKTRACE
+ smaps		An extension based on maps, showing the memory consumption of
+		each mapping and flags associated with it
+ smaps_rollup	Accumulated smaps stats for all mappings of the process.  This
+		can be derived from smaps, but is faster and more convenient
+ numa_maps	An extension based on maps, showing the memory locality and
+		binding policy as well as mem usage (in pages) of each mapping.
+ =============  ===============================================================
+
+For example, to get the status information of a process, all you have to do is
+read the file /proc/PID/status::
+
+  >cat /proc/self/status
+  Name:   cat
+  State:  R (running)
+  Tgid:   5452
+  Pid:    5452
+  PPid:   743
+  TracerPid:      0						(2.4)
+  Uid:    501     501     501     501
+  Gid:    100     100     100     100
+  FDSize: 256
+  Groups: 100 14 16
+  VmPeak:     5004 kB
+  VmSize:     5004 kB
+  VmLck:         0 kB
+  VmHWM:       476 kB
+  VmRSS:       476 kB
+  RssAnon:             352 kB
+  RssFile:             120 kB
+  RssShmem:              4 kB
+  VmData:      156 kB
+  VmStk:        88 kB
+  VmExe:        68 kB
+  VmLib:      1412 kB
+  VmPTE:        20 kB
+  VmSwap:        0 kB
+  HugetlbPages:          0 kB
+  CoreDumping:    0
+  THP_enabled:	  1
+  Threads:        1
+  SigQ:   0/28578
+  SigPnd: 0000000000000000
+  ShdPnd: 0000000000000000
+  SigBlk: 0000000000000000
+  SigIgn: 0000000000000000
+  SigCgt: 0000000000000000
+  CapInh: 00000000fffffeff
+  CapPrm: 0000000000000000
+  CapEff: 0000000000000000
+  CapBnd: ffffffffffffffff
+  CapAmb: 0000000000000000
+  NoNewPrivs:     0
+  Seccomp:        0
+  Speculation_Store_Bypass:       thread vulnerable
+  voluntary_ctxt_switches:        0
+  nonvoluntary_ctxt_switches:     1
+
+This shows you nearly the same information you would get if you viewed it with
+the ps  command.  In  fact,  ps  uses  the  proc  file  system  to  obtain its
+information.  But you get a more detailed  view of the  process by reading the
+file /proc/PID/status. Its fields are described in Table 1-2.
+
+The  statm  file  contains  more  detailed  information about the process
+memory usage. Its seven fields are explained in Table 1-3.  The stat file
+contains detailed information about the process itself.  Its fields are
+explained in Table 1-4.
+
+(for SMP CONFIG users)
+
+To make accounting scalable, RSS-related information is handled in an
+asynchronous manner, so the values may not be very precise. To get a precise
+snapshot of a moment, read the /proc/<pid>/smaps file, which scans the page
+table. It's slow but very precise.
+
+.. table:: Table 1-2: Contents of the status files (as of 4.19)
+
+ ==========================  ===================================================
+ Field                       Content
+ ==========================  ===================================================
+ Name                        filename of the executable
+ Umask                       file mode creation mask
+ State                       state (R is running, S is sleeping, D is sleeping
+                             in an uninterruptible wait, Z is zombie,
+			     T is traced or stopped)
+ Tgid                        thread group ID
+ Ngid                        NUMA group ID (0 if none)
+ Pid                         process id
+ PPid                        process id of the parent process
+ TracerPid                   PID of process tracing this process (0 if not)
+ Uid                         Real, effective, saved set, and file system UIDs
+ Gid                         Real, effective, saved set, and file system GIDs
+ FDSize                      number of file descriptor slots currently allocated
+ Groups                      supplementary group list
+ NStgid                      descendant namespace thread group ID hierarchy
+ NSpid                       descendant namespace process ID hierarchy
+ NSpgid                      descendant namespace process group ID hierarchy
+ NSsid                       descendant namespace session ID hierarchy
+ VmPeak                      peak virtual memory size
+ VmSize                      total program size
+ VmLck                       locked memory size
+ VmPin                       pinned memory size
+ VmHWM                       peak resident set size ("high water mark")
+ VmRSS                       size of memory portions. It contains the three
+                             following parts
+                             (VmRSS = RssAnon + RssFile + RssShmem)
+ RssAnon                     size of resident anonymous memory
+ RssFile                     size of resident file mappings
+ RssShmem                    size of resident shmem memory (includes SysV shm,
+                             mapping of tmpfs and shared anonymous mappings)
+ VmData                      size of private data segments
+ VmStk                       size of stack segments
+ VmExe                       size of text segment
+ VmLib                       size of shared library code
+ VmPTE                       size of page table entries
+ VmSwap                      amount of swap used by anonymous private data
+                             (shmem swap usage is not included)
+ HugetlbPages                size of hugetlb memory portions
+ CoreDumping                 process's memory is currently being dumped
+                             (killing the process may lead to a corrupted core)
+ THP_enabled                 process is allowed to use THP (returns 0 when
+                             PR_SET_THP_DISABLE is set on the process)
+ Threads                     number of threads
+ SigQ                        number of signals queued/max. number for queue
+ SigPnd                      bitmap of pending signals for the thread
+ ShdPnd                      bitmap of shared pending signals for the process
+ SigBlk                      bitmap of blocked signals
+ SigIgn                      bitmap of ignored signals
+ SigCgt                      bitmap of caught signals
+ CapInh                      bitmap of inheritable capabilities
+ CapPrm                      bitmap of permitted capabilities
+ CapEff                      bitmap of effective capabilities
+ CapBnd                      bitmap of capabilities bounding set
+ CapAmb                      bitmap of ambient capabilities
+ NoNewPrivs                  no_new_privs, like prctl(PR_GET_NO_NEW_PRIVS, ...)
+ Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
+ Speculation_Store_Bypass    speculative store bypass mitigation status
+ Cpus_allowed                mask of CPUs on which this process may run
+ Cpus_allowed_list           Same as previous, but in "list format"
+ Mems_allowed                mask of memory nodes allowed to this process
+ Mems_allowed_list           Same as previous, but in "list format"
+ voluntary_ctxt_switches     number of voluntary context switches
+ nonvoluntary_ctxt_switches  number of non-voluntary context switches
+ ==========================  ===================================================
+
+
+.. table:: Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
+
+ ======== ===============================	==============================
+ Field    Content
+ ======== ===============================	==============================
+ size     total program size (pages)		(same as VmSize in status)
+ resident size of memory portions (pages)	(same as VmRSS in status)
+ shared   number of pages that are shared	(i.e. backed by a file, same
+						as RssFile+RssShmem in status)
+ trs      number of pages that are 'code'	(not including libs; broken,
+						includes data segment)
+ lrs      number of pages of library		(always 0 on 2.6)
+ drs      number of pages of data/stack		(including libs; broken,
+						includes library text)
+ dt       number of dirty pages			(always 0 on 2.6)
+ ======== ===============================	==============================
+
+
+.. table:: Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
+
+  ============= ===============================================================
+  Field         Content
+  ============= ===============================================================
+  pid           process id
+  tcomm         filename of the executable
+  state         state (R is running, S is sleeping, D is sleeping in an
+                uninterruptible wait, Z is zombie, T is traced or stopped)
+  ppid          process id of the parent process
+  pgrp          pgrp of the process
+  sid           session id
+  tty_nr        tty the process uses
+  tty_pgrp      pgrp of the tty
+  flags         task flags
+  min_flt       number of minor faults
+  cmin_flt      number of minor faults of waited-for children
+  maj_flt       number of major faults
+  cmaj_flt      number of major faults of waited-for children
+  utime         user mode jiffies
+  stime         kernel mode jiffies
+  cutime        user mode jiffies of waited-for children
+  cstime        kernel mode jiffies of waited-for children
+  priority      priority level
+  nice          nice level
+  num_threads   number of threads
+  it_real_value	(obsolete, always 0)
+  start_time    time the process started after system boot
+  vsize         virtual memory size
+  rss           resident set memory size
+  rsslim        current limit in bytes on the rss
+  start_code    address above which program text can run
+  end_code      address below which program text can run
+  start_stack   address of the start of the main process stack
+  esp           current value of ESP
+  eip           current value of EIP
+  pending       bitmap of pending signals
+  blocked       bitmap of blocked signals
+  sigign        bitmap of ignored signals
+  sigcatch      bitmap of caught signals
+  0		(place holder, used to be the wchan address,
+		use /proc/PID/wchan instead)
+  0             (place holder)
+  0             (place holder)
+  exit_signal   signal to send to parent thread on exit
+  task_cpu      which CPU the task is scheduled on
+  rt_priority   realtime priority
+  policy        scheduling policy (man sched_setscheduler)
+  blkio_ticks   time spent waiting for block IO
+  gtime         guest time of the task in jiffies
+  cgtime        guest time of the task children in jiffies
+  start_data    address above which program data+bss is placed
+  end_data      address below which program data+bss is placed
+  start_brk     address above which program heap can be expanded with brk()
+  arg_start     address above which program command line is placed
+  arg_end       address below which program command line is placed
+  env_start     address above which program environment is placed
+  env_end       address below which program environment is placed
+  exit_code     the thread's exit_code in the form reported by the waitpid
+		system call
+  ============= ===============================================================
+
+The /proc/PID/maps file contains the currently mapped memory regions and
+their access permissions.
+
+The format is::
+
+    address           perms offset  dev   inode      pathname
+
+    08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
+    08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
+    0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
+    a7cb1000-a7cb2000 ---p 00000000 00:00 0
+    a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+    a7eb2000-a7eb3000 ---p 00000000 00:00 0
+    a7eb3000-a7ed5000 rw-p 00000000 00:00 0
+    a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
+    a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
+    a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
+    a800b000-a800e000 rw-p 00000000 00:00 0
+    a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
+    a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
+    a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
+    a8024000-a8027000 rw-p 00000000 00:00 0
+    a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
+    a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
+    a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
+    aff35000-aff4a000 rw-p 00000000 00:00 0          [stack]
+    ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
+
+where "address" is the address space in the process that it occupies, "perms"
+is a set of permissions::
+
+ r = read
+ w = write
+ x = execute
+ s = shared
+ p = private (copy on write)
+
+"offset" is the offset into the mapping, "dev" is the device (major:minor), and
+"inode" is the inode  on that device.  0 indicates that  no inode is associated
+with the memory region, as the case would be with BSS (uninitialized data).
+The "pathname" shows the name of the file associated with this mapping.  If the
+mapping is not associated with a file:
+
+ =======                    ====================================
+ [heap]                     the heap of the program
+ [stack]                    the stack of the main process
+ [vdso]                     the "virtual dynamic shared object",
+                            the kernel system call handler
+ =======                    ====================================
+
+If the field is empty, the mapping is anonymous.
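+
+As a rough illustration of the column layout described above, the following
+Python sketch splits one maps line into its fields.  The names MapsEntry and
+parse_maps_line are hypothetical helpers invented for this example; they are
+not part of any kernel or library interface.
+

```python
from typing import NamedTuple


class MapsEntry(NamedTuple):
    start: int      # first address of the mapping
    end: int        # first address past the mapping
    perms: str      # e.g. "r-xp"
    offset: int     # offset into the mapped object
    dev: str        # "major:minor"
    inode: int      # 0 when no inode is associated
    pathname: str   # "" for anonymous mappings


def parse_maps_line(line: str) -> MapsEntry:
    # The first five fields never contain whitespace; the pathname may,
    # so split at most five times and keep the remainder intact.
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return MapsEntry(start, end, perms, int(offset, 16), dev, int(inode), pathname)


entry = parse_maps_line("08048000-08049000 r-xp 00000000 03:00 8312       /opt/test")
print(entry.pathname, hex(entry.end - entry.start))
```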
+
+The /proc/PID/smaps is an extension based on maps, showing the memory
+consumption for each of the process's mappings. For each mapping (aka Virtual
+Memory Area, or VMA) there is a series of lines such as the following::
+
+    08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
+
+    Size:               1084 kB
+    KernelPageSize:        4 kB
+    MMUPageSize:           4 kB
+    Rss:                 892 kB
+    Pss:                 374 kB
+    Shared_Clean:        892 kB
+    Shared_Dirty:          0 kB
+    Private_Clean:         0 kB
+    Private_Dirty:         0 kB
+    Referenced:          892 kB
+    Anonymous:             0 kB
+    LazyFree:              0 kB
+    AnonHugePages:         0 kB
+    ShmemPmdMapped:        0 kB
+    Shared_Hugetlb:        0 kB
+    Private_Hugetlb:       0 kB
+    Swap:                  0 kB
+    SwapPss:               0 kB
+    Locked:                0 kB
+    THPeligible:           0
+    VmFlags: rd ex mr mw me dw
+
+The first of these lines shows the same information as is displayed for the
+mapping in /proc/PID/maps.  Following lines show the size of the mapping
+(Size); the size of each page allocated when backing a VMA (KernelPageSize),
+which is usually the same as the size in the page table entries; the page size
+used by the MMU when backing a VMA (in most cases, the same as KernelPageSize);
+the amount of the mapping that is currently resident in RAM (Rss); the
+process's proportional share of this mapping (Pss); and the number of clean and
+dirty shared and private pages in the mapping.
+
+The "proportional set size" (PSS) of a process is the count of pages it has
+in memory, where each page is divided by the number of processes sharing it.
+So if a process has 1000 pages all to itself, and 1000 shared with one other
+process, its PSS will be 1500.
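+
+The arithmetic in the example above can be checked with a short Python sketch.
+The function name pss_pages and its input (one sharer count per resident page)
+are assumptions made for illustration only; the kernel does not expose the
+data in this form.
+

```python
from fractions import Fraction


def pss_pages(sharers_per_page):
    """Sum 1/N over all resident pages, where N is the number of
    processes that currently map the page."""
    return sum(Fraction(1, n) for n in sharers_per_page)


# 1000 pages private to the process, 1000 pages shared with one other process.
pages = [1] * 1000 + [2] * 1000
print(pss_pages(pages))
```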
+
+Note that even a page which is part of a MAP_SHARED mapping, but has only
+a single pte mapped, i.e.  is currently used by only one process, is accounted
+as private and not as shared.
+
+"Referenced" indicates the amount of memory currently marked as referenced or
+accessed.
+
+"Anonymous" shows the amount of memory that does not belong to any file.  Even
+a mapping associated with a file may contain anonymous pages: when the mapping
+is MAP_PRIVATE and a page is modified, the file page is replaced by a private
+anonymous copy.
+
+"LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
+The memory isn't freed immediately by madvise(); it is freed under memory
+pressure if it is clean. Please note that the printed value might
+be lower than the real value due to optimizations used in the current
+implementation. If this is not desirable please file a bug report.
+
+"AnonHugePages" shows the amount of memory backed by transparent hugepages.
+
+"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
+huge pages.
+
+"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
+hugetlbfs pages, which are *not* counted in the "RSS" or "PSS" fields for
+historical reasons.  They are also not included in the
+{Shared,Private}_{Clean,Dirty} fields.
+
+"Swap" shows how much would-be-anonymous memory is also used, but out on swap.
+
+For shmem mappings, "Swap" also includes the size of the mapped (and not
+replaced by copy-on-write) part of the underlying shmem object out on swap.
+
+"SwapPss" shows the proportional swap share of this mapping. Unlike "Swap", this
+does not take into account swapped-out pages of underlying shmem objects.
+
+"Locked" indicates whether the mapping is locked in memory or not.
+
+"THPeligible" indicates whether the mapping is eligible for allocating THP
+pages - 1 if true, 0 otherwise. It just shows the current status.
+
+The "VmFlags" field deserves a separate description. This member represents the
+kernel flags associated with the particular virtual memory area in a two-letter
+encoded manner. The codes are the following:
+
+    ==    =======================================
+    rd    readable
+    wr    writeable
+    ex    executable
+    sh    shared
+    mr    may read
+    mw    may write
+    me    may execute
+    ms    may share
+    gd    stack segment grows down
+    pf    pure PFN range
+    dw    disabled write to the mapped file
+    lo    pages are locked in memory
+    io    memory mapped I/O area
+    sr    sequential read advice provided
+    rr    random read advice provided
+    dc    do not copy area on fork
+    de    do not expand area on remapping
+    ac    area is accountable
+    nr    swap space is not reserved for the area
+    ht    area uses huge tlb pages
+    ar    architecture specific flag
+    dd    do not include area into core dump
+    sd    soft dirty flag
+    mm    mixed map area
+    hg    huge page advice flag
+    nh    no huge page advice flag
+    mg    mergeable advice flag
+    ==    =======================================
+
+Note that there is no guarantee that every flag and associated mnemonic will
+be present in all future kernel releases. Things change: flags may
+vanish, or new ones may be added. The interpretation of their meaning
+might change in the future as well. So each consumer of these flags has to
+follow each specific kernel version for the exact semantics.
+
+This file is only present if the CONFIG_MMU kernel configuration option is
+enabled.
+
+Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent
+output can be achieved only in the single read call).
+
+This typically manifests when doing partial reads of these files while the
+memory map is being modified.  Despite the races, we do provide the following
+guarantees:
+
+1) The mapped addresses never go backwards, which implies no two
+   regions will ever overlap.
+2) If there is something at a given vaddr during the entirety of the
+   life of the smaps/maps walk, there will be some output for it.
+
+The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps,
+but their values are the sums of the corresponding values for all mappings of
+the process.  Additionally, it contains these fields:
+
+- Pss_Anon
+- Pss_File
+- Pss_Shmem
+
+They represent the proportional shares of anonymous, file, and shmem pages, as
+described for smaps above.  These fields are omitted in smaps since each
+mapping identifies the type (anon, file, or shmem) of all pages it contains.
+Thus all information in smaps_rollup can be derived from smaps, but at a
+significantly higher cost.
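+
+To make the relationship between smaps and smaps_rollup concrete, the Python
+sketch below sums every "kB" field over all mappings in smaps-formatted text.
+The helper name sum_smaps_fields and the two-mapping sample are invented for
+this example.  Sums of per-mapping constants such as KernelPageSize are
+produced as well but carry no meaning, and the Pss_Anon/Pss_File/Pss_Shmem
+split is not attempted here.
+

```python
import re

# Matches lines such as "Rss:                 892 kB".
_KB_FIELD = re.compile(r"(\w+):\s+(\d+) kB")


def sum_smaps_fields(smaps_text):
    """Return {field name: total in kB} summed over all mappings."""
    totals = {}
    for line in smaps_text.splitlines():
        match = _KB_FIELD.fullmatch(line.strip())
        if match:  # mapping headers and VmFlags lines do not match
            name, value = match.group(1), int(match.group(2))
            totals[name] = totals.get(name, 0) + value
    return totals


# Hypothetical, heavily shortened smaps content with two mappings.
SAMPLE = """\
08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
Size:               1084 kB
Rss:                 892 kB
Pss:                 374 kB
VmFlags: rd ex mr mw me dw
080bc000-080c1000 rw-p 00074000 03:02 13130      /bin/bash
Size:                 20 kB
Rss:                  20 kB
Pss:                  20 kB
VmFlags: rd wr mr mw me
"""

totals = sum_smaps_fields(SAMPLE)
print(totals["Rss"], totals["Pss"])
```

+On a live system the same function could be fed the contents of
+/proc/PID/smaps, subject to the read races described above.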
+
+The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
+bits on both physical and virtual pages associated with a process, and the
+soft-dirty bit on pte (see Documentation/admin-guide/mm/soft-dirty.rst
+for details).
+To clear the bits for all the pages associated with the process::
+
+    > echo 1 > /proc/PID/clear_refs
+
+To clear the bits for the anonymous pages associated with the process::
+
+    > echo 2 > /proc/PID/clear_refs
+
+To clear the bits for the file mapped pages associated with the process::
+
+    > echo 3 > /proc/PID/clear_refs
+
+To clear the soft-dirty bit::
+
+    > echo 4 > /proc/PID/clear_refs
+
+To reset the peak resident set size ("high water mark") to the process's
+current value::
+
+    > echo 5 > /proc/PID/clear_refs
+
+Any other value written to /proc/PID/clear_refs will have no effect.
+
+The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags
+using /proc/kpageflags and number of times a page is mapped using
+/proc/kpagecount. For detailed explanation, see
+Documentation/admin-guide/mm/pagemap.rst.
+
+The /proc/pid/numa_maps is an extension based on maps, showing the memory
+locality and binding policy, as well as the memory usage (in pages) of
+each mapping. The output follows a general format where mapping details are
+summarized, separated by blank spaces, one mapping per line::
+
+    address   policy    mapping details
+
+    00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4
+    00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4
+    320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4
+    320698b000 default file=/lib64/libc-2.12.so
+    3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4
+    3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4
+    7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4
+    7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4
+    7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048
+    7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4
+    7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4
+
+Where:
+
+"address" is the starting address for the mapping;
+
+"policy" reports the NUMA memory policy set for the mapping (see Documentation/admin-guide/mm/numa_memory_policy.rst);
+
+"mapping details" summarizes mapping data such as mapping type, page usage counters,
+node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page
+size, in KB, that is backing the mapping up.
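+
+A minimal Python sketch of how one numa_maps line could be broken into the
+three parts named above.  The function parse_numa_maps_line and the layout of
+the returned dictionary are assumptions of this example, not an established
+interface.
+

```python
def parse_numa_maps_line(line):
    """Split a numa_maps line into address, policy and mapping details."""
    address, policy, *details = line.split()
    entry = {"address": int(address, 16), "policy": policy,
             "flags": [], "nodes": {}}
    for token in details:
        key, sep, value = token.partition("=")
        if not sep:
            # Bare words such as "stack" or "huge" describe the mapping type.
            entry["flags"].append(token)
        elif key.startswith("N") and key[1:].isdigit():
            # N<node>=<pages>: node locality page counter.
            entry["nodes"][int(key[1:])] = int(value)
        elif value.isdigit():
            entry[key] = int(value)
        else:
            entry[key] = value  # e.g. file=/lib64/ld-2.12.so
    return entry


info = parse_numa_maps_line(
    "3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 "
    "N0=24 N3=2 kernelpagesize_kB=4")
print(info["nodes"], info["kernelpagesize_kB"])
```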
+
+1.2 Kernel data
+---------------
+
+Similar to the process entries, the kernel data files give information about
+the running kernel. The files used to obtain this information are contained in
+/proc and are listed in Table 1-5. Not all of these will be present in your
+system; which files are there and which are missing depends on the kernel
+configuration and the loaded modules.
+
+.. table:: Table 1-5: Kernel info in /proc
+
+ ============ ===============================================================
+ File         Content
+ ============ ===============================================================
+ apm          Advanced power management info
+ buddyinfo    Kernel memory allocator information (see text)	(2.5)
+ bus          Directory containing bus specific information
+ cmdline      Kernel command line
+ cpuinfo      Info about the CPU
+ devices      Available devices (block and character)
+ dma          Used DMA channels
+ filesystems  Supported filesystems
+ driver       Various drivers grouped here, currently rtc	(2.4)
+ execdomains  Execdomains, related to security			(2.4)
+ fb 	      Frame Buffer devices				(2.4)
+ fs 	      File system parameters, currently nfs/exports	(2.4)
+ ide          Directory containing info about the IDE subsystem
+ interrupts   Interrupt usage
+ iomem 	      Memory map					(2.4)
+ ioports      I/O port usage
+ irq 	      Masks for irq to cpu affinity			(2.4)(smp?)
+ isapnp       ISA PnP (Plug&Play) Info				(2.4)
+ kcore        Kernel core image (can be ELF or A.OUT(deprecated in 2.4))
+ kmsg         Kernel messages
+ ksyms        Kernel symbol table
+ loadavg      Load average of last 1, 5 & 15 minutes
+ locks        Kernel locks
+ meminfo      Memory info
+ misc         Miscellaneous
+ modules      List of loaded modules
+ mounts       Mounted filesystems
+ net          Networking info (see text)
+ pagetypeinfo Additional page allocator information (see text)  (2.5)
+ partitions   Table of partitions known to the system
+ pci 	      Deprecated info of PCI bus (new way -> /proc/bus/pci/,
+              decoupled by lspci)				(2.4)
+ rtc          Real time clock
+ scsi         SCSI info (see text)
+ slabinfo     Slab pool info
+ softirqs     softirq usage
+ stat         Overall statistics
+ swaps        Swap space utilization
+ sys          See chapter 2
+ sysvipc      Info of SysVIPC Resources (msg, sem, shm)		(2.4)
+ tty 	      Info of tty drivers
+ uptime       Wall clock since boot, combined idle time of all cpus
+ version      Kernel version
+ video 	      bttv info of video resources			(2.4)
+ vmallocinfo  Show vmalloced areas
+ ============ ===============================================================
+
+You can,  for  example,  check  which interrupts are currently in use and what
+they are used for by looking in the file /proc/interrupts::
+
+  > cat /proc/interrupts
+             CPU0
+    0:    8728810          XT-PIC  timer
+    1:        895          XT-PIC  keyboard
+    2:          0          XT-PIC  cascade
+    3:     531695          XT-PIC  aha152x
+    4:    2014133          XT-PIC  serial
+    5:      44401          XT-PIC  pcnet_cs
+    8:          2          XT-PIC  rtc
+   11:          8          XT-PIC  i82365
+   12:     182918          XT-PIC  PS/2 Mouse
+   13:          1          XT-PIC  fpu
+   14:    1232265          XT-PIC  ide0
+   15:          7          XT-PIC  ide1
+  NMI:          0
+
+In 2.4.* a couple of lines were added to this file, LOC & ERR (this time it is
+the output of an SMP machine)::
+
+  > cat /proc/interrupts
+
+             CPU0       CPU1
+    0:    1243498    1214548    IO-APIC-edge  timer
+    1:       8949       8958    IO-APIC-edge  keyboard
+    2:          0          0          XT-PIC  cascade
+    5:      11286      10161    IO-APIC-edge  soundblaster
+    8:          1          0    IO-APIC-edge  rtc
+    9:      27422      27407    IO-APIC-edge  3c503
+   12:     113645     113873    IO-APIC-edge  PS/2 Mouse
+   13:          0          0          XT-PIC  fpu
+   14:      22491      24012    IO-APIC-edge  ide0
+   15:       2183       2415    IO-APIC-edge  ide1
+   17:      30564      30414   IO-APIC-level  eth0
+   18:        177        164   IO-APIC-level  bttv
+  NMI:    2457961    2457959
+  LOC:    2457882    2457881
+  ERR:       2155
+
+NMI is incremented in this case because every timer interrupt generates an NMI
+(Non Maskable Interrupt) which is used by the NMI Watchdog to detect lockups.
+
+LOC is the local interrupt counter of the internal APIC of every CPU.
+
+ERR is incremented in the case of errors in the IO-APIC bus (the bus that
+connects the CPUs in an SMP system). This means that an error has been detected;
+the IO-APIC automatically retries the transmission, so it should not be a big
+problem, but you should read the SMP-FAQ.
+
+In 2.6.2* /proc/interrupts was expanded again.  This time the goal was for
+/proc/interrupts to display every IRQ vector in use by the system, not
+just those considered 'most important'.  The new vectors are:
+
+THR
+  interrupt raised when a machine check threshold counter
+  (typically counting ECC corrected errors of memory or cache) exceeds
+  a configurable threshold.  Only available on some systems.
+
+TRM
+  a thermal event interrupt occurs when a temperature threshold
+  has been exceeded for the CPU.  This interrupt may also be generated
+  when the temperature drops back to normal.
+
+SPU
+  a spurious interrupt is some interrupt that was raised then lowered
+  by some IO device before it could be fully processed by the APIC.  Hence
+  the APIC sees the interrupt but does not know what device it came from.
+  For this case the APIC will generate the interrupt with an IRQ vector
+  of 0xff. This might also be generated by chipset bugs.
+
+RES, CAL, TLB
+  rescheduling, call and TLB flush interrupts are
+  sent from one CPU to another per the needs of the OS.  Typically,
+  their statistics are used by kernel developers and interested users to
+  determine the occurrence of interrupts of the given type.
+
+The above IRQ vectors are displayed only when relevant.  For example,
+the threshold vector does not exist on x86_64 platforms.  Others are
+suppressed when the system is a uniprocessor.  As of this writing, only
+i386 and x86_64 platforms support the new IRQ vector displays.
+
+Of some interest is the introduction of the /proc/irq directory to 2.4.
+It can be used to set IRQ to CPU affinity. This means that you can "hook" an
+IRQ to only one CPU, or exclude a CPU from handling IRQs. The irq subdir
+contains one subdir for each IRQ, and two files: default_smp_affinity and
+prof_cpu_mask.
+
+For example::
+
+  > ls /proc/irq/
+  0  10  12  14  16  18  2  4  6  8  prof_cpu_mask
+  1  11  13  15  17  19  3  5  7  9  default_smp_affinity
+  > ls /proc/irq/0/
+  smp_affinity
+
+smp_affinity is a bitmask in which you can specify which CPUs can handle the
+IRQ. You can set it by doing::
+
+  > echo 1 > /proc/irq/10/smp_affinity
+
+This means that only the first CPU will handle the IRQ, but you can also echo
+5 which means that only the first and third CPU can handle the IRQ.
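+
+The mapping between a hexadecimal mask and CPU numbers can be sketched in
+Python as follows.  The helper names cpus_from_mask and mask_from_cpus are
+invented for this example; the handling of comma-separated groups assumes the
+usual kernel convention of printing wide masks in 32-bit chunks.
+

```python
def cpus_from_mask(mask_hex):
    """Return the CPU numbers whose bits are set in a hex affinity mask."""
    # Assumption: masks wider than 32 bits are comma-separated groups.
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def mask_from_cpus(cpus):
    """Build the hex string to write for a collection of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


print(cpus_from_mask("1"))     # bit 0: the first CPU
print(cpus_from_mask("5"))     # bits 0 and 2: the first and third CPU
print(mask_from_cpus([0, 2]))
```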
+
+The contents of each smp_affinity file are the same by default::
+
+  > cat /proc/irq/0/smp_affinity
+  ffffffff
+
+There is an alternate interface, smp_affinity_list, which allows specifying
+a cpu range instead of a bitmask::
+
+  > cat /proc/irq/0/smp_affinity_list
+  1024-1031
+
+The default_smp_affinity mask applies to all non-active IRQs, which are the
+IRQs which have not yet been allocated/activated, and hence which lack a
+/proc/irq/[0-9]* directory.
+
+The node file on an SMP system shows the node to which the device using the IRQ
+reports itself as being attached. This hardware locality information does not
+include information about any possible driver locality preference.
+
+prof_cpu_mask specifies which CPUs are to be profiled by the system wide
+profiler. Default value is ffffffff (all cpus if there are only 32 of them).
+
+The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
+between all the CPUs which are allowed to handle it. As usual the kernel has
+more info than you and does a better job than you, so the defaults are the
+best choice for almost everyone.  [Note this applies only to those IO-APIC's
+that support "Round Robin" interrupt distribution.]
+
+There are  three  more  important subdirectories in /proc: net, scsi, and sys.
+The general  rule  is  that  the  contents,  or  even  the  existence of these
+directories, depend  on your kernel configuration. If SCSI is not enabled, the
+directory scsi  may  not  exist. The same is true with the net, which is there
+only when networking support is present in the running kernel.
+
+The slabinfo  file  gives  information  about  memory usage at the slab level.
+Linux uses  slab  pools for memory management above page level in version 2.2.
+Commonly used  objects  have  their  own  slab  pool (such as network buffers,
+directory cache, and so on).
+
+::
+
+    > cat /proc/buddyinfo
+
+    Node 0, zone      DMA      0      4      5      4      4      3 ...
+    Node 0, zone   Normal      1      0      0      1    101      8 ...
+    Node 0, zone  HighMem      2      0      0      1      1      0 ...
+
+External fragmentation is a problem under some workloads, and buddyinfo is a
+useful tool for helping diagnose these problems.  Buddyinfo will give you a
+clue as to how big an area you can safely allocate, or why a previous
+allocation failed.
+
+Each column represents the number of pages of a certain order which are
+available.  In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
+ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
+available in ZONE_NORMAL, etc...
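+
+The per-order counts can be turned into a total number of free pages, as in
+the Python sketch below.  The helper names are invented for this example, and
+the input line is a hypothetical complete line: the sample output above is
+truncated with "...", so only its first six orders are used here.
+

```python
def parse_buddyinfo_line(line):
    """Return (node, zone, counts) where counts[order] is the number of
    free chunks of 2^order pages."""
    fields = line.split()
    node = int(fields[1].rstrip(","))   # "0," -> 0
    zone = fields[3]
    counts = [int(value) for value in fields[4:]]
    return node, zone, counts


def free_pages(counts):
    """Total free pages: each chunk of a given order holds 2^order pages."""
    return sum(chunks << order for order, chunks in enumerate(counts))


def largest_free_order(counts):
    """Highest order with at least one free chunk, or None if none is free."""
    orders = [order for order, chunks in enumerate(counts) if chunks]
    return max(orders) if orders else None


node, zone, counts = parse_buddyinfo_line(
    "Node 0, zone   Normal      1      0      0      1    101      8")
print(zone, free_pages(counts), largest_free_order(counts))
```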
+
+More information relevant to external fragmentation can be found in
+pagetypeinfo::
+
+    > cat /proc/pagetypeinfo
+    Page block order: 9
+    Pages per block:  512
+
+    Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
+    Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      1      1      0
+    Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
+    Node    0, zone      DMA, type      Movable      1      1      2      1      2      1      1      0      1      0      2
+    Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
+    Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
+    Node    0, zone    DMA32, type    Unmovable    103     54     77      1      1      1     11      8      7      1      9
+    Node    0, zone    DMA32, type  Reclaimable      0      0      2      1      0      0      0      0      1      0      0
+    Node    0, zone    DMA32, type      Movable    169    152    113     91     77     54     39     13      6      1    452
+    Node    0, zone    DMA32, type      Reserve      1      2      2      2      2      0      1      1      1      1      0
+    Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
+
+    Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
+    Node 0, zone      DMA            2            0            5            1            0
+    Node 0, zone    DMA32           41            6          967            2            0
+
+Fragmentation avoidance in the kernel works by grouping pages of different
+migrate types into the same contiguous regions of memory called page blocks.
+A page block is typically the size of the default hugepage, e.g. 2MB on
+x86-64. By keeping pages grouped based on their ability to move, the kernel
+can reclaim pages within a page block to satisfy a high-order allocation.
+
+The pagetypeinfo begins with information on the size of a page block. It
+then gives the same type of information as buddyinfo except broken down
+by migrate-type and finishes with details on how many page blocks of each
+type exist.
+
+If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
+from libhugetlbfs https://github.com/libhugetlbfs/libhugetlbfs/), one can
+make an estimate of the likely number of huge pages that can be allocated
+at a given point in time. All the "Movable" blocks should be allocatable
+unless memory has been mlock()'d. Some of the Reclaimable blocks should
+also be allocatable although a lot of filesystem metadata may have to be
+reclaimed to achieve this.
+
+
+meminfo
+~~~~~~~
+
+Provides information about distribution and utilization of memory.  This
+varies by architecture and compile options.  The following is from a
+16GB PIII, which has highmem enabled.  You may not have all of these fields.
+
+::
+
+    > cat /proc/meminfo
+
+    MemTotal:     16344972 kB
+    MemFree:      13634064 kB
+    MemAvailable: 14836172 kB
+    Buffers:          3656 kB
+    Cached:        1195708 kB
+    SwapCached:          0 kB
+    Active:         891636 kB
+    Inactive:      1077224 kB
+    HighTotal:    15597528 kB
+    HighFree:     13629632 kB
+    LowTotal:       747444 kB
+    LowFree:          4432 kB
+    SwapTotal:           0 kB
+    SwapFree:            0 kB
+    Dirty:             968 kB
+    Writeback:           0 kB
+    AnonPages:      861800 kB
+    Mapped:         280372 kB
+    Shmem:             644 kB
+    KReclaimable:   168048 kB
+    Slab:           284364 kB
+    SReclaimable:   159856 kB
+    SUnreclaim:     124508 kB
+    PageTables:      24448 kB
+    NFS_Unstable:        0 kB
+    Bounce:              0 kB
+    WritebackTmp:        0 kB
+    CommitLimit:   7669796 kB
+    Committed_AS:   100056 kB
+    VmallocTotal:   112216 kB
+    VmallocUsed:       428 kB
+    VmallocChunk:   111088 kB
+    Percpu:          62080 kB
+    HardwareCorrupted:   0 kB
+    AnonHugePages:   49152 kB
+    ShmemHugePages:      0 kB
+    ShmemPmdMapped:      0 kB
+
+MemTotal
+              Total usable ram (i.e. physical ram minus a few reserved
+              bits and the kernel binary code)
+MemFree
+              The sum of LowFree+HighFree
+MemAvailable
+              An estimate of how much memory is available for starting new
+              applications, without swapping. Calculated from MemFree,
+              SReclaimable, the size of the file LRU lists, and the low
+              watermarks in each zone.
+              The estimate takes into account that the system needs some
+              page cache to function well, and that not all reclaimable
+              slab will be reclaimable, due to items being in use. The
+              impact of those factors will vary from system to system.
+Buffers
+              Relatively temporary storage for raw disk blocks;
+              shouldn't get tremendously large (20MB or so)
+Cached
+              in-memory cache for files read from the disk (the
+              pagecache).  Doesn't include SwapCached
+SwapCached
+              Memory that once was swapped out, is swapped back in but
+              still also is in the swapfile (if memory is needed it
+              doesn't need to be swapped out AGAIN because it is already
+              in the swapfile. This saves I/O)
+Active
+              Memory that has been used more recently and usually not
+              reclaimed unless absolutely necessary.
+Inactive
+              Memory which has been less recently used.  It is more
+              eligible to be reclaimed for other purposes
+HighTotal, HighFree
+              Highmem is all memory above ~860MB of physical memory.
+              Highmem areas are for use by userspace programs, or
+              for the pagecache.  The kernel must use tricks to access
+              this memory, making it slower to access than lowmem.
+LowTotal, LowFree
+              Lowmem is memory which can be used for everything that
+              highmem can be used for, but it is also available for the
+              kernel's use for its own data structures.  Among many
+              other things, it is where everything from the Slab is
+              allocated.  Bad things happen when you're out of lowmem.
+SwapTotal
+              total amount of swap space available
+SwapFree
+              Amount of swap space that is currently unused
+Dirty
+              Memory which is waiting to get written back to the disk
+Writeback
+              Memory which is actively being written back to the disk
+AnonPages
+              Non-file backed pages mapped into userspace page tables
+HardwareCorrupted
+              The amount of RAM/memory, in kB, that the kernel identifies as
+              corrupted.
+AnonHugePages
+              Non-file backed huge pages mapped into userspace page tables
+Mapped
+              files which have been mmapped, such as libraries
+Shmem
+              Total memory used by shared memory (shmem) and tmpfs
+ShmemHugePages
+              Memory used by shared memory (shmem) and tmpfs allocated
+              with huge pages
+ShmemPmdMapped
+              Shared memory mapped into userspace with huge pages
+KReclaimable
+              Kernel allocations that the kernel will attempt to reclaim
+              under memory pressure. Includes SReclaimable (below), and other
+              direct allocations with a shrinker.
+Slab
+              in-kernel data structures cache
+SReclaimable
+              Part of Slab, that might be reclaimed, such as caches
+SUnreclaim
+              Part of Slab, that cannot be reclaimed on memory pressure
+PageTables
+              amount of memory dedicated to the lowest level of page
+              tables.
+NFS_Unstable
+              NFS pages sent to the server, but not yet committed to stable
+              storage
+Bounce
+              Memory used for block device "bounce buffers"
+WritebackTmp
+              Memory used by FUSE for temporary writeback buffers
+CommitLimit
+              Based on the overcommit ratio ('vm.overcommit_ratio'),
+              this is the total amount of memory currently available to
+              be allocated on the system. This limit is only adhered to
+              if strict overcommit accounting is enabled (mode 2 in
+              'vm.overcommit_memory').
+
+              The CommitLimit is calculated with the following formula::
+
+                CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
+                               overcommit_ratio / 100 + [total swap pages]
+
+              For example, on a system with 1G of physical RAM and 7G
+              of swap with a `vm.overcommit_ratio` of 30 it would
+              yield a CommitLimit of 7.3G.
+
+              For more details, see the memory overcommit documentation
+              in vm/overcommit-accounting.
+Committed_AS
+              The amount of memory presently allocated on the system.
+              The committed memory is a sum of all of the memory which
+              has been allocated by processes, even if it has not been
+              "used" by them as of yet. A process which malloc()'s 1G
+              of memory, but only touches 300M of it will show up as
+              using 1G. This 1G is memory which has been "committed" to
+              by the VM and can be used at any time by the allocating
+              application. With strict overcommit enabled on the system
+              (mode 2 in 'vm.overcommit_memory'), allocations which would
+              exceed the CommitLimit (detailed above) will not be permitted.
+              This is useful if one needs to guarantee that processes will
+              not fail due to lack of memory once that memory has been
+              successfully allocated.
+VmallocTotal
+              total size of vmalloc memory area
+VmallocUsed
+              amount of vmalloc area which is used
+VmallocChunk
+              largest contiguous block of vmalloc area which is free
+Percpu
+              Memory allocated to the percpu allocator used to back percpu
+              allocations. This stat excludes the cost of metadata.
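+
+The CommitLimit formula given above can be evaluated with a short Python
+sketch.  The function commit_limit is a hypothetical helper; it assumes all
+three sizes are passed in the same unit (kB here, rather than pages) and uses
+integer division, so the result may differ slightly from the kernel's own
+page-based rounding.
+

```python
def commit_limit(total_ram, total_hugetlb, overcommit_ratio, total_swap):
    """(RAM - huge TLB pages) * overcommit_ratio / 100 + swap."""
    return (total_ram - total_hugetlb) * overcommit_ratio // 100 + total_swap


GIB_KB = 1024 * 1024  # one GiB expressed in kB

# 1G of RAM, no huge TLB pages, vm.overcommit_ratio of 30, 7G of swap.
limit_kb = commit_limit(1 * GIB_KB, 0, 30, 7 * GIB_KB)
print(round(limit_kb / GIB_KB, 1))
```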
+
+vmallocinfo
+~~~~~~~~~~~
+
+Provides information about vmalloced/vmaped areas. One line per area,
+containing the virtual address range of the area, size in bytes,
+caller information of the creator, and optional information depending
+on the kind of area:
+
+ ==========  ===================================================
+ pages=nr    number of pages
+ phys=addr   if a physical address was specified
+ ioremap     I/O mapping (ioremap() and friends)
+ vmalloc     vmalloc() area
+ vmap        vmap()ed pages
+ user        VM_USERMAP area
+ vpages      buffer for pages pointers was vmalloced (huge area)
+ N<node>=nr  (Only on NUMA kernels)
+             Number of pages allocated on memory node <node>
+ ==========  ===================================================
+
+::
+
+    > cat /proc/vmallocinfo
+    0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
+    /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
+    0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
+    /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
+    0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
+    phys=7fee8000 ioremap
+    0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
+    phys=7fee7000 ioremap
+    0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
+    0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
+    /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
+    0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
+    pages=2 vmalloc N1=2
+    0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
+    /0x130 [x_tables] pages=4 vmalloc N0=4
+    0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
+    pages=14 vmalloc N2=14
+    0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
+    pages=4 vmalloc N1=4
+    0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
+    pages=2 vmalloc N1=2
+    0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
+    pages=10 vmalloc N0=10
+
+
+softirqs
+~~~~~~~~
+
+Provides counts of softirq handlers serviced since boot time, for each CPU.
+
+::
+
+    > cat /proc/softirqs
+                   CPU0       CPU1       CPU2       CPU3
+         HI:          0          0          0          0
+      TIMER:      27166      27120      27097      27034
+     NET_TX:          0          0          0         17
+     NET_RX:         42          0          0         39
+      BLOCK:          0          0        107       1121
+    TASKLET:          0          0          0        290
+      SCHED:      27035      26983      26971      26746
+    HRTIMER:          0          0          0          0
+        RCU:       1678       1769       2178       2250
+
+
+1.3 IDE devices in /proc/ide
+----------------------------
+
+The subdirectory /proc/ide contains information about all IDE devices of which
+the kernel  is  aware.  There is one subdirectory for each IDE controller, the
+file drivers  and a link for each IDE device, pointing to the device directory
+in the controller specific subtree.
+
+The file  drivers  contains general information about the drivers used for the
+IDE devices::
+
+  > cat /proc/ide/drivers
+  ide-cdrom version 4.53
+  ide-disk version 1.08
+
+More detailed  information  can  be  found  in  the  controller  specific
+subdirectories. These  are  named  ide0,  ide1  and  so  on.  Each  of  these
+directories contains the files shown in table 1-6.
+
+
+.. table:: Table 1-6: IDE controller info in  /proc/ide/ide?
+
+ ======= =======================================
+ File    Content
+ ======= =======================================
+ channel IDE channel (0 or 1)
+ config  Configuration (only for PCI/IDE bridge)
+ mate    Mate name
+ model   Type/Chipset of IDE controller
+ ======= =======================================
+
+Each device  connected  to  a  controller  has  a separate subdirectory in the
+controllers directory.  The  files  listed in table 1-7 are contained in these
+directories.
+
+
+.. table:: Table 1-7: IDE device information
+
+ ================ ==========================================
+ File             Content
+ ================ ==========================================
+ cache            The cache
+ capacity         Capacity of the medium (in 512Byte blocks)
+ driver           driver and version
+ geometry         physical and logical geometry
+ identify         device identify block
+ media            media type
+ model            device identifier
+ settings         device setup
+ smart_thresholds IDE disk management thresholds
+ smart_values     IDE disk management values
+ ================ ==========================================
+
+The most  interesting  file is ``settings``. This file contains a nice
+overview of the drive parameters::
+
+  # cat /proc/ide/ide0/hda/settings
+  name                    value           min             max             mode
+  ----                    -----           ---             ---             ----
+  bios_cyl                526             0               65535           rw
+  bios_head               255             0               255             rw
+  bios_sect               63              0               63              rw
+  breada_readahead        4               0               127             rw
+  bswap                   0               0               1               r
+  file_readahead          72              0               2097151         rw
+  io_32bit                0               0               3               rw
+  keepsettings            0               0               1               rw
+  max_kb_per_request      122             1               127             rw
+  multcount               0               0               8               rw
+  nice1                   1               0               1               rw
+  nowerr                  0               0               1               rw
+  pio_mode                write-only      0               255             w
+  slow                    0               0               1               rw
+  unmaskirq               0               0               1               rw
+  using_dma               0               0               1               rw
+
+
+1.4 Networking info in /proc/net
+--------------------------------
+
+The subdirectory  /proc/net  follows  the  usual  pattern. Table 1-8 shows the
+additional values  you  get  for  IP  version 6 if you configure the kernel to
+support this. Table 1-9 lists the files and their meaning.
+
+
+.. table:: Table 1-8: IPv6 info in /proc/net
+
+ ========== =====================================================
+ File       Content
+ ========== =====================================================
+ udp6       UDP sockets (IPv6)
+ tcp6       TCP sockets (IPv6)
+ raw6       Raw device statistics (IPv6)
+ igmp6      IP multicast addresses, which this host joined (IPv6)
+ if_inet6   List of IPv6 interface addresses
+ ipv6_route Kernel routing table for IPv6
+ rt6_stats  Global IPv6 routing tables statistics
+ sockstat6  Socket statistics (IPv6)
+ snmp6      Snmp data (IPv6)
+ ========== =====================================================
+
+.. table:: Table 1-9: Network info in /proc/net
+
+ ============= ================================================================
+ File          Content
+ ============= ================================================================
+ arp           Kernel  ARP table
+ dev           network devices with statistics
+ dev_mcast     the Layer2 multicast groups a device is listening to
+               (interface index, label, number of references, number of bound
+               addresses).
+ dev_stat      network device status
+ ip_fwchains   Firewall chain linkage
+ ip_fwnames    Firewall chain names
+ ip_masq       Directory containing the masquerading tables
+ ip_masquerade Major masquerading table
+ netstat       Network statistics
+ raw           raw device statistics
+ route         Kernel routing table
+ rpc           Directory containing rpc info
+ rt_cache      Routing cache
+ snmp          SNMP data
+ sockstat      Socket statistics
+ tcp           TCP  sockets
+ udp           UDP sockets
+ unix          UNIX domain sockets
+ wireless      Wireless interface data (Wavelan etc)
+ igmp          IP multicast addresses, which this host joined
+ psched        Global packet scheduler parameters.
+ netlink       List of PF_NETLINK sockets
+ ip_mr_vifs    List of multicast virtual interfaces
+ ip_mr_cache   List of multicast routing cache
+ ============= ================================================================
+
+You can  use  this  information  to see which network devices are available in
+your system and how much traffic was routed over those devices::
+
+  > cat /proc/net/dev
+  Inter-|Receive                                                   |[...
+   face |bytes    packets errs drop fifo frame compressed multicast|[...
+      lo:  908188   5596     0    0    0     0          0         0 [...
+    ppp0:15475140  20721   410    0    0   410          0         0 [...
+    eth0:  614530   7085     0    0    0     0          0         1 [...
+
+  ...] Transmit
+  ...] bytes    packets errs drop fifo colls carrier compressed
+  ...]  908188     5596    0    0    0     0       0          0
+  ...] 1375103    17405    0    0    0     0       0          0
+  ...] 1703981     5535    0    0    0     3       0          0
+
+In addition, each Channel Bond interface has its own directory.  For
+example, the bond0 device will have a directory called /proc/net/bond0/.
+It will contain information that is specific to that bond, such as the
+current slaves of the bond, the link status of the slaves, and how
+many times each slave's link has failed.
+
+1.5 SCSI info
+-------------
+
+If you  have  a  SCSI  host adapter in your system, you'll find a subdirectory
+named after  the driver for this adapter in /proc/scsi. You'll also see a list
+of all recognized SCSI devices in /proc/scsi/scsi::
+
+  > cat /proc/scsi/scsi
+  Attached devices:
+  Host: scsi0 Channel: 00 Id: 00 Lun: 00
+    Vendor: IBM      Model: DGHS09U          Rev: 03E0
+    Type:   Direct-Access                    ANSI SCSI revision: 03
+  Host: scsi0 Channel: 00 Id: 06 Lun: 00
+    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04
+    Type:   CD-ROM                           ANSI SCSI revision: 02
+
+
+The directory  named  after  the driver has one file for each adapter found in
+the system.  These  files  contain information about the controller, including
+the used  IRQ  and  the  IO  address range. The amount of information shown is
+dependent on  the adapter you use. The example shows the output for an Adaptec
+AHA-2940 SCSI adapter::
+
+  > cat /proc/scsi/aic7xxx/0
+
+  Adaptec AIC7xxx driver version: 5.1.19/3.2.4
+  Compile Options:
+    TCQ Enabled By Default : Disabled
+    AIC7XXX_PROC_STATS     : Disabled
+    AIC7XXX_RESET_DELAY    : 5
+  Adapter Configuration:
+             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
+                             Ultra Wide Controller
+      PCI MMAPed I/O Base: 0xeb001000
+   Adapter SEEPROM Config: SEEPROM found and used.
+        Adaptec SCSI BIOS: Enabled
+                      IRQ: 10
+                     SCBs: Active 0, Max Active 2,
+                           Allocated 15, HW 16, Page 255
+               Interrupts: 160328
+        BIOS Control Word: 0x18b6
+     Adapter Control Word: 0x005b
+     Extended Translation: Enabled
+  Disconnect Enable Flags: 0xffff
+       Ultra Enable Flags: 0x0001
+   Tag Queue Enable Flags: 0x0000
+  Ordered Queue Tag Flags: 0x0000
+  Default Tag Queue Depth: 8
+      Tagged Queue By Device array for aic7xxx host instance 0:
+        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
+      Actual queue depth per device for aic7xxx host instance 0:
+        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
+  Statistics:
+  (scsi0:0:0:0)
+    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8
+    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0)
+    Total transfers 160151 (74577 reads and 85574 writes)
+  (scsi0:0:6:0)
+    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15
+    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0)
+    Total transfers 0 (0 reads and 0 writes)
+
+
+1.6 Parallel port info in /proc/parport
+---------------------------------------
+
+The directory  /proc/parport  contains information about the parallel ports of
+your system.  It  has  one  subdirectory  for  each port, named after the port
+number (0,1,2,...).
+
+These directories contain the four files shown in Table 1-10.
+
+
+.. table:: Table 1-10: Files in /proc/parport
+
+ ========= ====================================================================
+ File      Content
+ ========= ====================================================================
+ autoprobe Any IEEE-1284 device ID information that has been acquired.
+ devices   list of the device drivers using that port. A + will appear by the
+           name of the device currently using the port (it might not appear
+           against any).
+ hardware  Parallel port's base address, IRQ line and DMA channel.
+ irq       IRQ that parport is using for that port. This is in a separate
+           file to allow you to alter it by writing a new value in (IRQ
+           number or none).
+ ========= ====================================================================
+
+1.7 TTY info in /proc/tty
+-------------------------
+
+Information about  the  available  and actually used tty's can be found in the
+directory /proc/tty. You'll  find  entries  for drivers and line disciplines in
+this directory, as shown in Table 1-11.
+
+
+.. table:: Table 1-11: Files in /proc/tty
+
+ ============= ==============================================
+ File          Content
+ ============= ==============================================
+ drivers       list of drivers and their usage
+ ldiscs        registered line disciplines
+ driver/serial usage statistic and status of single tty lines
+ ============= ==============================================
+
+To see  which  tty's  are  currently in use, you can simply look into the file
+/proc/tty/drivers::
+
+  > cat /proc/tty/drivers
+  pty_slave            /dev/pts      136   0-255 pty:slave
+  pty_master           /dev/ptm      128   0-255 pty:master
+  pty_slave            /dev/ttyp       3   0-255 pty:slave
+  pty_master           /dev/pty        2   0-255 pty:master
+  serial               /dev/cua        5   64-67 serial:callout
+  serial               /dev/ttyS       4   64-67 serial
+  /dev/tty0            /dev/tty0       4       0 system:vtmaster
+  /dev/ptmx            /dev/ptmx       5       2 system
+  /dev/console         /dev/console    5       1 system:console
+  /dev/tty             /dev/tty        5       0 system:/dev/tty
+  unknown              /dev/tty        4    1-63 console
+
+
+1.8 Miscellaneous kernel statistics in /proc/stat
+-------------------------------------------------
+
+Various pieces   of  information about  kernel activity  are  available in the
+/proc/stat file.  All  of  the numbers reported  in  this file are  aggregates
+since the system first booted.  For a quick look, simply cat the file::
+
+  > cat /proc/stat
+  cpu  2255 34 2290 22625563 6290 127 456 0 0 0
+  cpu0 1132 34 1441 11311718 3675 127 438 0 0 0
+  cpu1 1123 0 849 11313845 2614 0 18 0 0 0
+  intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
+  ctxt 1990473
+  btime 1062191376
+  processes 2915
+  procs_running 1
+  procs_blocked 0
+  softirq 183433 0 21755 12 39 1137 231 21459 2263
+
+The very first  "cpu" line aggregates the  numbers in all  of the other "cpuN"
+lines.  These numbers identify the amount of time the CPU has spent performing
+different kinds of work.  Time units are in USER_HZ (typically hundredths of a
+second).  The meanings of the columns are as follows, from left to right:
+
+- user: normal processes executing in user mode
+- nice: niced processes executing in user mode
+- system: processes executing in kernel mode
+- idle: twiddling thumbs
+- iowait: In a word, iowait stands for waiting for I/O to complete. But there
+  are several problems:
+
+  1. The CPU does not wait for I/O to complete; iowait is the time that a
+     task is waiting for I/O to complete. When a CPU goes idle because of
+     outstanding task I/O, another task will be scheduled on this CPU.
+  2. In a multi-core CPU, the task waiting for I/O to complete is not running
+     on any CPU, so the iowait of each CPU is difficult to calculate.
+  3. The value of the iowait field in /proc/stat will decrease under certain
+     conditions.
+
+  So the iowait value read from /proc/stat is not reliable.
+- irq: servicing interrupts
+- softirq: servicing softirqs
+- steal: involuntary wait
+- guest: running a normal guest
+- guest_nice: running a niced guest
+
+The "intr" line gives counts of interrupts  serviced since boot time, for each
+of the  possible system interrupts.   The first  column  is the  total of  all
+interrupts serviced  including  unnumbered  architecture specific  interrupts;
+each  subsequent column is the  total for that particular numbered interrupt.
+Unnumbered interrupts are not shown, only summed into the total.
+
+The "ctxt" line gives the total number of context switches across all CPUs.
+
+The "btime" line gives  the time at which the  system booted, in seconds since
+the Unix epoch.
+
+The "processes" line gives the number  of processes and threads created, which
+includes (but  is not limited  to) those  created by  calls to the  fork() and
+clone() system calls.
+
+The "procs_running" line gives the total number of threads that are
+running or ready to run (i.e., the total number of runnable threads).
+
+The   "procs_blocked" line gives  the  number of  processes currently blocked,
+waiting for I/O to complete.
+
+The "softirq" line gives counts of softirqs serviced since boot time, for each
+of the possible system softirqs. The first column is the total of all
+softirqs serviced; each subsequent column is the total for that particular
+softirq.
+
+
+1.9 Ext4 file system parameters
+-------------------------------
+
+Information about mounted ext4 file systems can be found in
+/proc/fs/ext4.  Each mounted filesystem will have a directory in
+/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
+/proc/fs/ext4/dm-0).   The files in each per-device directory are shown
+in Table 1-12, below.
+
+.. table:: Table 1-12: Files in /proc/fs/ext4/<devname>
+
+ ==============  ==========================================================
+ File            Content
+ mb_groups       details of multiblock allocator buddy cache of free blocks
+ ==============  ==========================================================
+
+1.10 /proc/consoles
+-------------------
+Shows registered system console lines.
+
+To see which character device lines are currently used for the system console
+/dev/console, you may simply look into the file /proc/consoles::
+
+  > cat /proc/consoles
+  tty0                 -WU (ECp)       4:7
+  ttyS0                -W- (Ep)        4:64
+
+The columns are:
+
++--------------------+-------------------------------------------------------+
+| device             | name of the device                                    |
++====================+=======================================================+
+| operations         | * R = can do read operations                          |
+|                    | * W = can do write operations                         |
+|                    | * U = can do unblank                                  |
++--------------------+-------------------------------------------------------+
+| flags              | * E = it is enabled                                   |
+|                    | * C = it is preferred console                         |
+|                    | * B = it is primary boot console                      |
+|                    | * p = it is used for printk buffer                    |
+|                    | * b = it is not a TTY but a Braille device            |
+|                    | * a = it is safe to use when cpu is offline           |
++--------------------+-------------------------------------------------------+
+| major:minor        | major and minor number of the device separated by a   |
+|                    | colon                                                 |
++--------------------+-------------------------------------------------------+
+
+Summary
+-------
+
+The /proc file system serves information about the running system. It not only
+allows access to process data but also allows you to request the kernel status
+by reading files in the hierarchy.
+
+The directory  structure  of /proc reflects the types of information and makes
+it easy, if not obvious, where to look for specific data.
+
+Chapter 2: Modifying System Parameters
+======================================
+
+In This Chapter
+---------------
+
+* Modifying kernel parameters by writing into files found in /proc/sys
+* Exploring the files which modify certain parameters
+* Review of the /proc/sys file tree
+
+------------------------------------------------------------------------------
+
+A very  interesting part of /proc is the directory /proc/sys. This is not only
+a source  of  information,  it also allows you to change parameters within the
+kernel. Be  very  careful  when attempting this. You can optimize your system,
+but you  can  also  cause  it  to  crash.  Never  alter kernel parameters on a
+production system.  Set  up  a  development machine and test to make sure that
+everything works  the  way  you want it to. You may have no alternative but to
+reboot the machine once an error has been made.
+
+To change  a  value,  simply  echo  the new value into the file. An example is
+given below  in the section on the file system data. You need to be root to do
+this. You  can  create  your  own  boot script to perform this every time your
+system boots.
+
+The files  in /proc/sys can be used to fine tune and monitor miscellaneous and
+general things  in  the operation of the Linux kernel. Since some of the files
+can inadvertently  disrupt  your  system,  it  is  advisable  to  read  both
+documentation and  source  before actually making adjustments. In any case, be
+very careful  when  writing  to  any  of these files. The entries in /proc may
+change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
+review the kernel documentation in the directory /usr/src/linux/Documentation.
+This chapter  is  heavily  based  on the documentation included in the pre 2.2
+kernels, and became part of it in version 2.2.1 of the Linux kernel.
+
+Please see the Documentation/admin-guide/sysctl/ directory for descriptions of
+these entries.
+
+Summary
+-------
+
+Certain aspects  of  kernel  behavior  can be modified at runtime, without the
+need to  recompile  the kernel, or even to reboot the system. The files in the
+/proc/sys tree  can  not only be read, but also modified. You can use the echo
+command to write values into these files, thereby changing the default settings
+of the kernel.
+
+
+Chapter 3: Per-process Parameters
+=================================
+
+3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
+---------------------------------------------------------------------------------
+
+These files can be used to adjust the badness heuristic used to select which
+process gets killed in out of memory conditions.
+
+The badness heuristic assigns a value to each candidate task ranging from 0
+(never kill) to 1000 (always kill) to determine which process is targeted.  The
+units are roughly a proportion along that range of allowed memory the process
+may allocate from based on an estimation of its current memory and swap use.
+For example, if a task is using all allowed memory, its badness score will be
+1000.  If it is using half of its allowed memory, its score will be 500.
+
+There is an additional factor included in the badness score: the current memory
+and swap usage is discounted by 3% for root processes.
+
+The amount of "allowed" memory depends on the context in which the oom killer
+was called.  If it is due to the memory assigned to the allocating task's cpuset
+being exhausted, the allowed memory represents the set of mems assigned to that
+cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
+memory represents the set of mempolicy nodes.  If it is due to a memory
+limit (or swap limit) being reached, the allowed memory is that configured
+limit.  Finally, if it is due to the entire system being out of memory, the
+allowed memory represents all allocatable resources.
+
+The value of /proc/<pid>/oom_score_adj is added to the badness score before it
+is used to determine which task to kill.  Acceptable values range from -1000
+(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
+polarize the preference for oom killing either by always preferring a certain
+task or completely disabling it.  The lowest possible value, -1000, is
+equivalent to disabling oom killing entirely for that task since it will always
+report a badness score of 0.
+
+Consequently, it is very simple for userspace to define the amount of memory to
+consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
+example, is roughly equivalent to allowing the remainder of tasks sharing the
+same system, cpuset, mempolicy, or memory controller resources to use at least
+50% more memory.  A value of -500, on the other hand, would be roughly
+equivalent to discounting 50% of the task's allowed memory from being considered
+as scoring against the task.
+
+For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
+be used to tune the badness score.  Its acceptable values range from -16
+(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
+(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
+scaled linearly with /proc/<pid>/oom_score_adj.
+
+The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
+value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
+requires CAP_SYS_RESOURCE.
+
+Caveat: when a parent task is selected, the oom killer will sacrifice any first
+generation children with separate address spaces instead, if possible.  This
+prevents servers and important system daemons from being killed and loses the
+minimal amount of work.
+
+
+3.2 /proc/<pid>/oom_score - Display current oom-killer score
+-------------------------------------------------------------
+
+This file can be used to check the current score used by the oom-killer for
+any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
+process should be killed in an out-of-memory situation.
+
+
+3.3 /proc/<pid>/io - Display the IO accounting fields
+-------------------------------------------------------
+
+This file contains IO statistics for each running process.
+
+Example
+~~~~~~~
+
+::
+
+    test:/tmp # dd if=/dev/zero of=/tmp/test.dat &
+    [1] 3828
+
+    test:/tmp # cat /proc/3828/io
+    rchar: 323934931
+    wchar: 323929600
+    syscr: 632687
+    syscw: 632675
+    read_bytes: 0
+    write_bytes: 323932160
+    cancelled_write_bytes: 0
+
+
+Description
+~~~~~~~~~~~
+
+rchar
+^^^^^
+
+I/O counter: chars read.
+The number of bytes which this task has caused to be read from storage. This
+is simply the sum of bytes which this process passed to read() and pread().
+It includes things like tty IO and it is unaffected by whether or not actual
+physical disk IO was required (the read might have been satisfied from
+pagecache).
+
+
+wchar
+^^^^^
+
+I/O counter: chars written.
+The number of bytes which this task has caused, or shall cause, to be written
+to disk. Similar caveats apply here as with rchar.
+
+
+syscr
+^^^^^
+
+I/O counter: read syscalls.
+Attempt to count the number of read I/O operations, i.e. syscalls like read()
+and pread().
+
+
+syscw
+^^^^^
+
+I/O counter: write syscalls.
+Attempt to count the number of write I/O operations, i.e. syscalls like
+write() and pwrite().
+
+
+read_bytes
+^^^^^^^^^^
+
+I/O counter: bytes read.
+Attempt to count the number of bytes which this process really did cause to
+be fetched from the storage layer. Done at the submit_bio() level, so it is
+accurate for block-backed filesystems. <please add status regarding NFS and
+CIFS at a later time>
+
+
+write_bytes
+^^^^^^^^^^^
+
+I/O counter: bytes written.
+Attempt to count the number of bytes which this process caused to be sent to
+the storage layer. This is done at page-dirtying time.
+
+
+cancelled_write_bytes
+^^^^^^^^^^^^^^^^^^^^^
+
+The big inaccuracy here is truncate. If a process writes 1MB to a file and
+then deletes the file, it will in fact perform no writeout. But it will have
+been accounted as having caused 1MB of write.
+In other words: The number of bytes which this process caused to not happen,
+by truncating pagecache. A task can cause "negative" IO too. If this task
+truncates some dirty pagecache, some IO which another task has been accounted
+for (in its write_bytes) will not be happening. We _could_ just subtract that
+from the truncating task's write_bytes, but there is information loss in doing
+that.
+
+
+.. Note::
+
+   At its current implementation state, this is a bit racy on 32-bit machines:
+   if process A reads process B's /proc/pid/io while process B is updating one
+   of those 64-bit counters, process A could see an intermediate result.
+
+
+More information about this can be found within the taskstats documentation in
+Documentation/accounting.
+
+3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
+---------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory or DAX.
+Conversely, sometimes we want to save file-backed memory segments into a core
+file, not only the individual files.
+
+/proc/<pid>/coredump_filter allows you to customize which memory segments
+will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
+of memory types. If a bit of the bitmask is set, memory segments of the
+corresponding memory type are dumped, otherwise they are not dumped.
+
+The following 9 memory types are supported:
+
+  - (bit 0) anonymous private memory
+  - (bit 1) anonymous shared memory
+  - (bit 2) file-backed private memory
+  - (bit 3) file-backed shared memory
+  - (bit 4) ELF header pages in file-backed private memory areas (it is
+    effective only if the bit 2 is cleared)
+  - (bit 5) hugetlb private memory
+  - (bit 6) hugetlb shared memory
+  - (bit 7) DAX private memory
+  - (bit 8) DAX shared memory
+
+  Note that MMIO pages such as frame buffer are never dumped and vDSO pages
+  are always dumped regardless of the bitmask status.
+
+  Note that bits 0-4 don't affect hugetlb or DAX memory. hugetlb memory is
+  only affected by bits 5-6, and DAX is only affected by bits 7-8.
+
+The default value of coredump_filter is 0x33; this means all anonymous memory
+segments, ELF header pages and hugetlb private memory are dumped.
+
+If you don't want to dump all shared memory segments attached to pid 1234,
+write 0x31 to the process's proc file::
+
+  $ echo 0x31 > /proc/1234/coredump_filter
+
+When a new process is created, the process inherits the bitmask status from its
+parent. It is useful to set up coredump_filter before the program runs.
+For example::
+
+  $ echo 0x7 > /proc/self/coredump_filter
+  $ ./some_program
+
+3.5 /proc/<pid>/mountinfo - Information about mounts
+--------------------------------------------------------
+
+This file contains lines of the form::
+
+    36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
+    (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
+
+    (1) mount ID:  unique identifier of the mount (may be reused after umount)
+    (2) parent ID:  ID of parent (or of self for the top of the mount tree)
+    (3) major:minor:  value of st_dev for files on filesystem
+    (4) root:  root of the mount within the filesystem
+    (5) mount point:  mount point relative to the process's root
+    (6) mount options:  per mount options
+    (7) optional fields:  zero or more fields of the form "tag[:value]"
+    (8) separator:  marks the end of the optional fields
+    (9) filesystem type:  name of filesystem of the form "type[.subtype]"
+    (10) mount source:  filesystem specific information or "none"
+    (11) super options:  per super block options
+
+Parsers should ignore all unrecognised optional fields.  Currently the
+possible optional fields are:
+
+================  ==============================================================
+shared:X          mount is shared in peer group X
+master:X          mount is slave to peer group X
+propagate_from:X  mount is slave and receives propagation from peer group X [#]_
+unbindable        mount is unbindable
+================  ==============================================================
+
+.. [#] X is the closest dominant peer group under the process's root.  If
+       X is the immediate master of the mount, or if there's no dominant peer
+       group under the same root, then only the "master:X" field is present
+       and not the "propagate_from:X" field.
+
+For more information on mount propagation see:
+
+  Documentation/filesystems/sharedsubtree.txt
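+
+The field layout described above can be parsed by locating the "-" separator
+first, since the number of optional fields varies. This is a rough sketch, not
+a reference parser: the function name and dictionary keys are hypothetical,
+and it assumes that fields never contain literal whitespace.
+
```python
def parse_mountinfo_line(line):
    """Split one mountinfo line into its documented fields (1) to (11)."""
    fields = line.split()
    # The separator (8) follows zero or more optional fields (7), so search
    # for it starting at the first possible optional-field position.
    sep = fields.index("-", 6)
    major, minor = fields[2].split(":")
    optional = {}
    for item in fields[6:sep]:
        # Optional fields have the form "tag[:value]"; unknown tags are
        # kept as-is so callers can ignore the ones they do not recognise.
        tag, _, value = item.partition(":")
        optional[tag] = value or None
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "dev": (int(major), int(minor)),
        "root": fields[3],
        "mount_point": fields[4],
        "mount_options": fields[5].split(","),
        "optional": optional,
        "fs_type": fields[sep + 1],
        "mount_source": fields[sep + 2],
        "super_options": fields[sep + 3].split(","),
    }


example = ("36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - "
           "ext3 /dev/root rw,errors=continue")
print(parse_mountinfo_line(example))
```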
+
+
+3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
+--------------------------------------------------------
+These files provide a method to access a task's comm value. They also allow
+a task to set its own or one of its thread siblings' comm value. The comm value
+is limited in size compared to the cmdline value, so writing anything longer
+than the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
+comm value.
+
+
+3.7	/proc/<pid>/task/<tid>/children - Information about task children
+-------------------------------------------------------------------------
+This file provides a fast way to retrieve the first level child pids
+of the task identified by the <pid>/<tid> pair. The format is a space
+separated stream of pids.
+
+Note the "first level" here -- if a child has its own children they will
+not be listed here; one needs to read /proc/<children-pid>/task/<tid>/children
+to obtain the descendants.
+
+Since this interface is intended to be fast and cheap it doesn't
+guarantee to provide precise results and some children might be
+skipped, especially if they've exited right after we printed their
+pids, so one needs to either stop or freeze the processes being inspected
+if precise results are needed.
+
+
+3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
+---------------------------------------------------------------
+This file provides information associated with an opened file. Regular
+files have at least three fields -- 'pos', 'flags' and 'mnt_id'. 'pos'
+represents the current offset of the opened file in decimal form [see lseek(2)
+for details], 'flags' denotes the octal O_xxx mask the file was
+created with [see open(2) for details] and 'mnt_id' represents the mount ID of
+the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
+for details].
+
+A typical output is::
+
+	pos:	0
+	flags:	0100002
+	mnt_id:	19
+
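+Such output is straightforward to read from a program. The sketch below is
+an illustration only: the function name and the "extra" key are hypothetical,
+and treating the low two bits of 'flags' as the access mode assumes the usual
+Linux O_ACCMODE layout.
+
```python
def parse_fdinfo(text):
    """Parse the common fdinfo fields; keep type-specific lines verbatim."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "flags":
            info[key] = int(value, 8)   # 'flags' is printed in octal
        elif key in ("pos", "mnt_id"):
            info[key] = int(value)      # both are printed in decimal
        else:
            # Lines specific to eventfd, epoll, etc. are not interpreted.
            info.setdefault("extra", []).append(line.strip())
    return info


sample = "pos:\t0\nflags:\t0100002\nmnt_id:\t19\n"
info = parse_fdinfo(sample)
print(info)
# Assumption: the low two bits hold the access mode (2 means read/write).
print(info["flags"] & 0o3)
```
+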
+All locks associated with a file descriptor are shown in its fdinfo too::
+
+    lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF
+
+Files such as eventfd, fsnotify, signalfd and epoll provide, in addition to
+the regular pos/flags pair, information particular to the objects they
+represent.
+
+Eventfd files
+~~~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	04002
+	mnt_id:	9
+	eventfd-count:	5a
+
+where 'eventfd-count' is the hex value of the counter.
+
+Signalfd files
+~~~~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	04002
+	mnt_id:	9
+	sigmask:	0000000000000200
+
+where 'sigmask' is the hex value of the signal mask associated
+with the file.
+
+Epoll files
+~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	02
+	mnt_id:	9
+	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
+
+where 'tfd' is a target file descriptor number in decimal form,
+'events' is the mask of events being watched and 'data' is the data
+associated with the target [see epoll(7) for more details].
+
+'pos' is the current offset of the target file in decimal form
+[see lseek(2)], 'ino' and 'sdev' are the inode and device numbers
+where the target file resides, all in hex format.
+
+Fsnotify files
+~~~~~~~~~~~~~~
+For inotify files the format is the following::
+
+	pos:	0
+	flags:	02000000
+	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
+
+where 'wd' is a watch descriptor in decimal form, i.e. a target file
+descriptor number, 'ino' and 'sdev' are the inode and device where the
+target file resides and 'mask' is the mask of events, all in hex
+form [see inotify(7) for more details].
+
+If the kernel was built with exportfs support, the path to the target
+file is encoded as a file handle.  The file handle is provided by three
+fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex
+format.
+
+If the kernel is built without exportfs support the file handle won't be
+printed out.
+
+If there is no inotify mark attached yet the 'inotify' line will be omitted.
+
+For fanotify files the format is::
+
+	pos:	0
+	flags:	02
+	mnt_id:	9
+	fanotify flags:10 event-flags:0
+	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
+	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
+
+where fanotify 'flags' and 'event-flags' are the values used in the
+fanotify_init call, 'mnt_id' is the mount point identifier and 'mflags' is
+the value of the flags associated with the mark, which are tracked separately
+from the events mask. 'ino' and 'sdev' are the target inode and device, 'mask'
+is the events mask and 'ignored_mask' is the mask of events which are to be
+ignored. All are in hex format. Together, 'mflags', 'mask' and 'ignored_mask'
+provide information about the flags and mask used in the fanotify_mark
+call [see fsnotify manpage for details].
+
+While the first three lines are mandatory and always printed, the rest is
+optional and may be omitted if no marks have been created yet.
+
+Timerfd files
+~~~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	02
+	mnt_id:	9
+	clockid: 0
+	ticks: 0
+	settime flags: 01
+	it_value: (0, 49406829)
+	it_interval: (1, 0)
+
+where 'clockid' is the clock type and 'ticks' is the number of timer expirations
+that have occurred [see timerfd_create(2) for details]. 'settime flags' are the
+flags, in octal form, that were used to set up the timer [see
+timerfd_settime(2) for details]. 'it_value' is the remaining time until the
+timer expiration. 'it_interval' is the interval for the timer. Note that the
+timer might be set up with the TIMER_ABSTIME option, which will be shown in
+'settime flags', but 'it_value' still shows the timer's remaining time.
+
+3.9	/proc/<pid>/map_files - Information about memory mapped files
+---------------------------------------------------------------------
+This directory contains symbolic links which represent memory mapped files
+the process is maintaining.  Example output::
+
+     | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
+     | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
+     | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
+     | ...
+     | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
+     | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls
+
+The name of a link represents the virtual memory bounds of a mapping, i.e.
+vm_area_struct::vm_start-vm_area_struct::vm_end.
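+
+Because each link name encodes the mapping bounds in hex, the start, end and
+size of a mapping can be recovered from the name alone. A minimal sketch
+(the helper name is hypothetical):
+
```python
def parse_map_files_name(name):
    """Convert a map_files link name '<start>-<end>' into two integers."""
    start, end = (int(part, 16) for part in name.split("-"))
    return start, end


start, end = parse_map_files_name("400000-41a000")
# Prints the bounds in hex followed by the mapping size in bytes.
print(hex(start), hex(end), end - start)
```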
+
+The main purpose of map_files is to retrieve the set of memory mapped
+files quickly, instead of parsing /proc/<pid>/maps or
+/proc/<pid>/smaps, both of which contain many more records.  At the same
+time one can open(2) mappings from the listings of two processes and
+compare their inode numbers to figure out which anonymous memory areas
+are actually shared.
+
+3.10	/proc/<pid>/timerslack_ns - Task timerslack value
+---------------------------------------------------------
+This file provides the task's timerslack value in nanoseconds.
+This value specifies an amount of time that normal timers may be deferred
+in order to coalesce timers and avoid unnecessary wakeups.
+
+This allows a task's interactivity vs power consumption trade off to be
+adjusted.
+
+Writing 0 to the file will set the task's timerslack to the default value.
+
+Valid values are from 0 to ULLONG_MAX.
+
+An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level
+permissions on the task specified to change its timerslack_ns value.
+
+3.11	/proc/<pid>/patch_state - Livepatch patch operation state
+-----------------------------------------------------------------
+When CONFIG_LIVEPATCH is enabled, this file displays the value of the
+patch state for the task.
+
+A value of '-1' indicates that no patch is in transition.
+
+A value of '0' indicates that a patch is in transition and the task is
+unpatched.  If the patch is being enabled, then the task hasn't been
+patched yet.  If the patch is being disabled, then the task has already
+been unpatched.
+
+A value of '1' indicates that a patch is in transition and the task is
+patched.  If the patch is being enabled, then the task has already been
+patched.  If the patch is being disabled, then the task hasn't been
+unpatched yet.
+
+3.12 /proc/<pid>/arch_status - task architecture specific status
+-------------------------------------------------------------------
+When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the
+architecture specific status of the task.
+
+Example
+~~~~~~~
+
+::
+
+ $ cat /proc/6753/arch_status
+ AVX512_elapsed_ms:      8
+
+Description
+~~~~~~~~~~~
+
+x86 specific entries:
+~~~~~~~~~~~~~~~~~~~~~
+
+AVX512_elapsed_ms:
+^^^^^^^^^^^^^^^^^^
+
+  If AVX512 is supported on the machine, this entry shows the milliseconds
+  elapsed since the last time AVX512 usage was recorded. The recording
+  happens on a best effort basis when a task is scheduled out. This means
+  that the value depends on two factors:
+
+    1) The time which the task spent on the CPU without being scheduled
+       out. With CPU isolation and a single runnable task this can take
+       several seconds.
+
+    2) The time since the task was scheduled out last. Depending on the
+       reason for being scheduled out (time slice exhausted, syscall ...)
+       this can be an arbitrarily long time.
+
+  As a consequence the value cannot be considered precise and authoritative
+  information. The application which uses this information has to be aware
+  of the overall scenario on the system in order to determine whether a
+  task is a real AVX512 user or not. Precise information can be obtained
+  with performance counters.
+
+  A special value of '-1' indicates that no AVX512 usage was recorded, so
+  the task is unlikely to be an AVX512 user. Depending on the workload and the
+  scheduling scenario, however, it could also be a false negative as
+  mentioned above.
+
+Configuring procfs
+------------------
+
+4.1	Mount options
+---------------------
+
+The following mount options are supported:
+
+	=========	========================================================
+	hidepid=	Set /proc/<pid>/ access mode.
+	gid=		Set the group authorized to learn processes information.
+	=========	========================================================
+
+hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories
+(default).
+
+hidepid=1 means users may not access any /proc/<pid>/ directories but their
+own.  Sensitive files like cmdline, sched*, status are now protected against
+other users.  This makes it impossible to learn whether any user runs a
+specific program (given the program doesn't reveal itself by its behaviour).
+As an additional bonus, as /proc/<pid>/cmdline is inaccessible to other users,
+poorly written programs passing sensitive information via program arguments are
+now protected against local eavesdroppers.
+
+hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other
+users.  This doesn't hide the fact that a process with a specific
+pid value exists (that can be learned by other means, e.g. by "kill -0 $PID"),
+but it hides a process's uid and gid, which could otherwise be learned by
+stat()'ing /proc/<pid>/.  It greatly complicates an intruder's task of gathering
+information about running processes, whether some daemon runs with elevated
+privileges, whether another user runs some sensitive program, whether other
+users run any program at all, etc.
+
+gid= defines a group authorized to learn processes information otherwise
+prohibited by hidepid=.  If you use some daemon like identd which needs to learn
+information about processes, just add identd to this group.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
deleted file mode 100644
index 99ca040e3f90..000000000000
--- a/Documentation/filesystems/proc.txt
+++ /dev/null
@@ -1,2047 +0,0 @@
-------------------------------------------------------------------------------
-                       T H E  /proc   F I L E S Y S T E M
-------------------------------------------------------------------------------
-/proc/sys         Terrehon Bowden <terrehon@pacbell.net>        October 7 1999
-                  Bodo Bauer <bb@ricochet.net>
-
-2.4.x update	  Jorge Nerin <comandante@zaralinux.com>      November 14 2000
-move /proc/sys	  Shen Feng <shen@cn.fujitsu.com>		  April 1 2009
-------------------------------------------------------------------------------
-Version 1.3                                              Kernel version 2.2.12
-					      Kernel version 2.4.0-test11-pre4
-------------------------------------------------------------------------------
-fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>       June 9 2009
-
-Table of Contents
------------------
-
-  0     Preface
-  0.1	Introduction/Credits
-  0.2	Legal Stuff
-
-  1	Collecting System Information
-  1.1	Process-Specific Subdirectories
-  1.2	Kernel data
-  1.3	IDE devices in /proc/ide
-  1.4	Networking info in /proc/net
-  1.5	SCSI info
-  1.6	Parallel port info in /proc/parport
-  1.7	TTY info in /proc/tty
-  1.8	Miscellaneous kernel statistics in /proc/stat
-  1.9	Ext4 file system parameters
-
-  2	Modifying System Parameters
-
-  3	Per-Process Parameters
-  3.1	/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
-								score
-  3.2	/proc/<pid>/oom_score - Display current oom-killer score
-  3.3	/proc/<pid>/io - Display the IO accounting fields
-  3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
-  3.5	/proc/<pid>/mountinfo - Information about mounts
-  3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
-  3.7   /proc/<pid>/task/<tid>/children - Information about task children
-  3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
-  3.9   /proc/<pid>/map_files - Information about memory mapped files
-  3.10  /proc/<pid>/timerslack_ns - Task timerslack value
-  3.11	/proc/<pid>/patch_state - Livepatch patch operation state
-  3.12	/proc/<pid>/arch_status - Task architecture specific information
-
-  4	Configuring procfs
-  4.1	Mount options
-
-------------------------------------------------------------------------------
-Preface
-------------------------------------------------------------------------------
-
-0.1 Introduction/Credits
-------------------------
-
-This documentation is  part of a soon (or  so we hope) to be  released book on
-the SuSE  Linux distribution. As  there is  no complete documentation  for the
-/proc file system and we've used  many freely available sources to write these
-chapters, it  seems only fair  to give the work  back to the  Linux community.
-This work is  based on the 2.2.*  kernel version and the  upcoming 2.4.*. I'm
-afraid it's still far from complete, but we  hope it will be useful. As far as
-we know, it is the first 'all-in-one' document about the /proc file system. It
-is focused  on the Intel  x86 hardware,  so if you  are looking for  PPC, ARM,
-SPARC, AXP, etc., features, you probably  won't find what you are looking for.
-It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But
-additions and patches  are welcome and will  be added to this  document if you
-mail them to Bodo.
-
-We'd like  to  thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of
-other people for help compiling this documentation. We'd also like to extend a
-special thank  you to Andi Kleen for documentation, which we relied on heavily
-to create  this  document,  as well as the additional information he provided.
-Thanks to  everybody  else  who contributed source or docs to the Linux kernel
-and helped create a great piece of software... :)
-
-If you  have  any comments, corrections or additions, please don't hesitate to
-contact Bodo  Bauer  at  bb@ricochet.net.  We'll  be happy to add them to this
-document.
-
-The   latest   version    of   this   document   is    available   online   at
-http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
-
-If  the above  direction does  not works  for you,  you could  try the  kernel
-mailing  list  at  linux-kernel@vger.kernel.org  and/or try  to  reach  me  at
-comandante@zaralinux.com.
-
-0.2 Legal Stuff
----------------
-
-We don't  guarantee  the  correctness  of this document, and if you come to us
-complaining about  how  you  screwed  up  your  system  because  of  incorrect
-documentation, we won't feel responsible...
-
-------------------------------------------------------------------------------
-CHAPTER 1: COLLECTING SYSTEM INFORMATION
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-In This Chapter
-------------------------------------------------------------------------------
-* Investigating  the  properties  of  the  pseudo  file  system  /proc and its
-  ability to provide information on the running Linux system
-* Examining /proc's structure
-* Uncovering  various  information  about the kernel and the processes running
-  on the system
-------------------------------------------------------------------------------
-
-
-The proc  file  system acts as an interface to internal data structures in the
-kernel. It  can  be  used to obtain information about the system and to change
-certain kernel parameters at runtime (sysctl).
-
-First, we'll  take  a  look  at the read-only parts of /proc. In Chapter 2, we
-show you how you can use /proc/sys to change settings.
-
-1.1 Process-Specific Subdirectories
------------------------------------
-
-The directory  /proc  contains  (among other things) one subdirectory for each
-process running on the system, which is named after the process ID (PID).
-
-The link  self  points  to  the  process reading the file system. Each process
-subdirectory has the entries listed in Table 1-1.
-
-Note that an open a file descriptor to /proc/<pid> or to any of its
-contained files or subdirectories does not prevent <pid> being reused
-for some other process in the event that <pid> exits. Operations on
-open /proc/<pid> file descriptors corresponding to dead processes
-never act on any new process that the kernel may, through chance, have
-also assigned the process ID <pid>. Instead, operations on these FDs
-usually fail with ESRCH.
-
-Table 1-1: Process specific entries in /proc
-..............................................................................
- File		Content
- clear_refs	Clears page referenced bits shown in smaps output
- cmdline	Command line arguments
- cpu		Current and last cpu in which it was executed	(2.4)(smp)
- cwd		Link to the current working directory
- environ	Values of environment variables
- exe		Link to the executable of this process
- fd		Directory, which contains all file descriptors
- maps		Memory maps to executables and library files	(2.4)
- mem		Memory held by this process
- root		Link to the root directory of this process
- stat		Process status
- statm		Process memory status information
- status		Process status in human readable form
- wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
-		symbol the task is blocked in - or "0" if not blocked.
- pagemap	Page table
- stack		Report full stack trace, enable via CONFIG_STACKTRACE
- smaps		An extension based on maps, showing the memory consumption of
-		each mapping and flags associated with it
- smaps_rollup	Accumulated smaps stats for all mappings of the process.  This
-		can be derived from smaps, but is faster and more convenient
- numa_maps	An extension based on maps, showing the memory locality and
-		binding policy as well as mem usage (in pages) of each mapping.
-..............................................................................
-
-For example, to get the status information of a process, all you have to do is
-read the file /proc/PID/status:
-
-  >cat /proc/self/status
-  Name:   cat
-  State:  R (running)
-  Tgid:   5452
-  Pid:    5452
-  PPid:   743
-  TracerPid:      0						(2.4)
-  Uid:    501     501     501     501
-  Gid:    100     100     100     100
-  FDSize: 256
-  Groups: 100 14 16
-  VmPeak:     5004 kB
-  VmSize:     5004 kB
-  VmLck:         0 kB
-  VmHWM:       476 kB
-  VmRSS:       476 kB
-  RssAnon:             352 kB
-  RssFile:             120 kB
-  RssShmem:              4 kB
-  VmData:      156 kB
-  VmStk:        88 kB
-  VmExe:        68 kB
-  VmLib:      1412 kB
-  VmPTE:        20 kb
-  VmSwap:        0 kB
-  HugetlbPages:          0 kB
-  CoreDumping:    0
-  THP_enabled:	  1
-  Threads:        1
-  SigQ:   0/28578
-  SigPnd: 0000000000000000
-  ShdPnd: 0000000000000000
-  SigBlk: 0000000000000000
-  SigIgn: 0000000000000000
-  SigCgt: 0000000000000000
-  CapInh: 00000000fffffeff
-  CapPrm: 0000000000000000
-  CapEff: 0000000000000000
-  CapBnd: ffffffffffffffff
-  CapAmb: 0000000000000000
-  NoNewPrivs:     0
-  Seccomp:        0
-  Speculation_Store_Bypass:       thread vulnerable
-  voluntary_ctxt_switches:        0
-  nonvoluntary_ctxt_switches:     1
-
-This shows you nearly the same information you would get if you viewed it with
-the ps  command.  In  fact,  ps  uses  the  proc  file  system  to  obtain its
-information.  But you get a more detailed  view of the  process by reading the
-file /proc/PID/status. It fields are described in table 1-2.
-
-The  statm  file  contains  more  detailed  information about the process
-memory usage. Its seven fields are explained in Table 1-3.  The stat file
-contains details information about the process itself.  Its fields are
-explained in Table 1-4.
-
-(for SMP CONFIG users)
-For making accounting scalable, RSS related information are handled in an
-asynchronous manner and the value may not be very precise. To see a precise
-snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
-It's slow but very precise.
-
-Table 1-2: Contents of the status files (as of 4.19)
-..............................................................................
- Field                       Content
- Name                        filename of the executable
- Umask                       file mode creation mask
- State                       state (R is running, S is sleeping, D is sleeping
-                             in an uninterruptible wait, Z is zombie,
-			     T is traced or stopped)
- Tgid                        thread group ID
- Ngid                        NUMA group ID (0 if none)
- Pid                         process id
- PPid                        process id of the parent process
- TracerPid                   PID of process tracing this process (0 if not)
- Uid                         Real, effective, saved set, and  file system UIDs
- Gid                         Real, effective, saved set, and  file system GIDs
- FDSize                      number of file descriptor slots currently allocated
- Groups                      supplementary group list
- NStgid                      descendant namespace thread group ID hierarchy
- NSpid                       descendant namespace process ID hierarchy
- NSpgid                      descendant namespace process group ID hierarchy
- NSsid                       descendant namespace session ID hierarchy
- VmPeak                      peak virtual memory size
- VmSize                      total program size
- VmLck                       locked memory size
- VmPin                       pinned memory size
- VmHWM                       peak resident set size ("high water mark")
- VmRSS                       size of memory portions. It contains the three
-                             following parts (VmRSS = RssAnon + RssFile + RssShmem)
- RssAnon                     size of resident anonymous memory
- RssFile                     size of resident file mappings
- RssShmem                    size of resident shmem memory (includes SysV shm,
-                             mapping of tmpfs and shared anonymous mappings)
- VmData                      size of private data segments
- VmStk                       size of stack segments
- VmExe                       size of text segment
- VmLib                       size of shared library code
- VmPTE                       size of page table entries
- VmSwap                      amount of swap used by anonymous private data
-                             (shmem swap usage is not included)
- HugetlbPages                size of hugetlb memory portions
- CoreDumping                 process's memory is currently being dumped
-                             (killing the process may lead to a corrupted core)
- THP_enabled		     process is allowed to use THP (returns 0 when
-			     PR_SET_THP_DISABLE is set on the process
- Threads                     number of threads
- SigQ                        number of signals queued/max. number for queue
- SigPnd                      bitmap of pending signals for the thread
- ShdPnd                      bitmap of shared pending signals for the process
- SigBlk                      bitmap of blocked signals
- SigIgn                      bitmap of ignored signals
- SigCgt                      bitmap of caught signals
- CapInh                      bitmap of inheritable capabilities
- CapPrm                      bitmap of permitted capabilities
- CapEff                      bitmap of effective capabilities
- CapBnd                      bitmap of capabilities bounding set
- CapAmb                      bitmap of ambient capabilities
- NoNewPrivs                  no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
- Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
- Speculation_Store_Bypass    speculative store bypass mitigation status
- Cpus_allowed                mask of CPUs on which this process may run
- Cpus_allowed_list           Same as previous, but in "list format"
- Mems_allowed                mask of memory nodes allowed to this process
- Mems_allowed_list           Same as previous, but in "list format"
- voluntary_ctxt_switches     number of voluntary context switches
- nonvoluntary_ctxt_switches  number of non voluntary context switches
-..............................................................................
-
-Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
-..............................................................................
- Field    Content
- size     total program size (pages)		(same as VmSize in status)
- resident size of memory portions (pages)	(same as VmRSS in status)
- shared   number of pages that are shared	(i.e. backed by a file, same
-						as RssFile+RssShmem in status)
- trs      number of pages that are 'code'	(not including libs; broken,
-							includes data segment)
- lrs      number of pages of library		(always 0 on 2.6)
- drs      number of pages of data/stack		(including libs; broken,
-							includes library text)
- dt       number of dirty pages			(always 0 on 2.6)
-..............................................................................
-
-
-Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
-..............................................................................
- Field          Content
-  pid           process id
-  tcomm         filename of the executable
-  state         state (R is running, S is sleeping, D is sleeping in an
-                uninterruptible wait, Z is zombie, T is traced or stopped)
-  ppid          process id of the parent process
-  pgrp          pgrp of the process
-  sid           session id
-  tty_nr        tty the process uses
-  tty_pgrp      pgrp of the tty
-  flags         task flags
-  min_flt       number of minor faults
-  cmin_flt      number of minor faults with child's
-  maj_flt       number of major faults
-  cmaj_flt      number of major faults with child's
-  utime         user mode jiffies
-  stime         kernel mode jiffies
-  cutime        user mode jiffies with child's
-  cstime        kernel mode jiffies with child's
-  priority      priority level
-  nice          nice level
-  num_threads   number of threads
-  it_real_value	(obsolete, always 0)
-  start_time    time the process started after system boot
-  vsize         virtual memory size
-  rss           resident set memory size
-  rsslim        current limit in bytes on the rss
-  start_code    address above which program text can run
-  end_code      address below which program text can run
-  start_stack   address of the start of the main process stack
-  esp           current value of ESP
-  eip           current value of EIP
-  pending       bitmap of pending signals
-  blocked       bitmap of blocked signals
-  sigign        bitmap of ignored signals
-  sigcatch      bitmap of caught signals
-  0		(place holder, used to be the wchan address, use /proc/PID/wchan instead)
-  0             (place holder)
-  0             (place holder)
-  exit_signal   signal to send to parent thread on exit
-  task_cpu      which CPU the task is scheduled on
-  rt_priority   realtime priority
-  policy        scheduling policy (man sched_setscheduler)
-  blkio_ticks   time spent waiting for block IO
-  gtime         guest time of the task in jiffies
-  cgtime        guest time of the task children in jiffies
-  start_data    address above which program data+bss is placed
-  end_data      address below which program data+bss is placed
-  start_brk     address above which program heap can be expanded with brk()
-  arg_start     address above which program command line is placed
-  arg_end       address below which program command line is placed
-  env_start     address above which program environment is placed
-  env_end       address below which program environment is placed
-  exit_code     the thread's exit_code in the form reported by the waitpid system call
-..............................................................................
-
-The /proc/PID/maps file contains the currently mapped memory regions and
-their access permissions.
-
-The format is:
-
-address           perms offset  dev   inode      pathname
-
-08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
-08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
-0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
-a7cb1000-a7cb2000 ---p 00000000 00:00 0
-a7cb2000-a7eb2000 rw-p 00000000 00:00 0
-a7eb2000-a7eb3000 ---p 00000000 00:00 0
-a7eb3000-a7ed5000 rw-p 00000000 00:00 0
-a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
-a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
-a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
-a800b000-a800e000 rw-p 00000000 00:00 0
-a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
-a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
-a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
-a8024000-a8027000 rw-p 00000000 00:00 0
-a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
-a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
-a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
-aff35000-aff4a000 rw-p 00000000 00:00 0          [stack]
-ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
-
-where "address" is the address space in the process that it occupies, "perms"
-is a set of permissions:
-
- r = read
- w = write
- x = execute
- s = shared
- p = private (copy on write)
-
-"offset" is the offset into the mapping, "dev" is the device (major:minor), and
-"inode" is the inode  on that device.  0 indicates that  no inode is associated
-with the memory region, as the case would be with BSS (uninitialized data).
-The "pathname" shows the name associated file for this mapping.  If the mapping
-is not associated with a file:
-
- [heap]                   = the heap of the program
- [stack]                  = the stack of the main process
- [vdso]                   = the "virtual dynamic shared object",
-                            the kernel system call handler
-
- or if empty, the mapping is anonymous.
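
The field layout described above can be illustrated with a small parser. This
is only a sketch: the names MapsEntry and parse_maps_line are hypothetical
helpers, not part of any kernel interface, and the sample lines are taken from
the listing above.

```python
from typing import NamedTuple


class MapsEntry(NamedTuple):
    """One line of /proc/PID/maps (hypothetical helper type)."""
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    pathname: str


def parse_maps_line(line: str) -> MapsEntry:
    # The pathname is optional and may contain spaces, so split at most
    # five times and keep whatever remains as the pathname.
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return MapsEntry(start, end, perms, int(offset, 16), dev, int(inode), pathname)


entry = parse_maps_line("08048000-08049000 r-xp 00000000 03:00 8312       /opt/test")
print(hex(entry.start), hex(entry.end), entry.perms, entry.pathname)

# An anonymous mapping has inode 0 and an empty pathname.
anon = parse_maps_line("a7cb1000-a7cb2000 ---p 00000000 00:00 0\n")
print(anon.inode, repr(anon.pathname))
```

A "p" in the last position of perms marks a private (copy on write) mapping,
"s" a shared one.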
-
-The /proc/PID/smaps is an extension based on maps, showing the memory
-consumption for each of the process's mappings. For each mapping (aka Virtual
-Memory Area, or VMA) there is a series of lines such as the following:
-
-08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
-
-Size:               1084 kB
-KernelPageSize:        4 kB
-MMUPageSize:           4 kB
-Rss:                 892 kB
-Pss:                 374 kB
-Shared_Clean:        892 kB
-Shared_Dirty:          0 kB
-Private_Clean:         0 kB
-Private_Dirty:         0 kB
-Referenced:          892 kB
-Anonymous:             0 kB
-LazyFree:              0 kB
-AnonHugePages:         0 kB
-ShmemPmdMapped:        0 kB
-Shared_Hugetlb:        0 kB
-Private_Hugetlb:       0 kB
-Swap:                  0 kB
-SwapPss:               0 kB
-Locked:                0 kB
-THPeligible:           0
-VmFlags: rd ex mr mw me dw
-
-The first of these lines shows the same information as is displayed for the
-mapping in /proc/PID/maps.  Following lines show the size of the mapping
-(size); the size of each page allocated when backing a VMA (KernelPageSize),
-which is usually the same as the size in the page table entries; the page size
-used by the MMU when backing a VMA (in most cases, the same as KernelPageSize);
-the amount of the mapping that is currently resident in RAM (RSS); the
-process' proportional share of this mapping (PSS); and the number of clean and
-dirty shared and private pages in the mapping.
-
-The "proportional set size" (PSS) of a process is the count of pages it has
-in memory, where each page is divided by the number of processes sharing it.
-So if a process has 1000 pages all to itself, and 1000 shared with one other
-process, its PSS will be 1500.
-Note that even a page which is part of a MAP_SHARED mapping, but has only
-a single pte mapped, i.e.  is currently used by only one process, is accounted
-as private and not as shared.
-"Referenced" indicates the amount of memory currently marked as referenced or
-accessed.
-"Anonymous" shows the amount of memory that does not belong to any file.  Even
-a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
-and a page is modified, the file page is replaced by a private anonymous copy.
-"LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
-The memory isn't freed immediately with madvise(). It's freed in memory
-pressure if the memory is clean. Please note that the printed value might
-be lower than the real value due to optimizations used in the current
-implementation. If this is not desirable please file a bug report.
-"AnonHugePages" shows the amount of memory backed by transparent hugepages.
-"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
-huge pages.
-"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
-hugetlbfs pages, which are *not* counted in the "RSS" or "PSS" fields for
-historical reasons.  Nor are they included in the
-{Shared,Private}_{Clean,Dirty} fields.
-"Swap" shows how much would-be-anonymous memory is also used, but out on swap.
-For shmem mappings, "Swap" includes also the size of the mapped (and not
-replaced by copy-on-write) part of the underlying shmem object out on swap.
-"SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
-does not take into account swapped out pages of underlying shmem objects.
-"Locked" indicates whether the mapping is locked in memory or not.
-"THPeligible" indicates whether the mapping is eligible for allocating THP
-pages - 1 if true, 0 otherwise. It just shows the current status.
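
The PSS arithmetic described above can be checked with a few lines of Python.
This is a sketch of the accounting idea only; pss_pages is a hypothetical
helper, and the kernel's own bookkeeping is not shown here.

```python
def pss_pages(groups):
    """Sum page counts, dividing each group by the number of sharers.

    groups: iterable of (page_count, processes_sharing_those_pages).
    """
    return sum(count / sharers for count, sharers in groups)


# 1000 pages private to the process, 1000 pages shared with one other process.
print(pss_pages([(1000, 1), (1000, 2)]))  # 1500.0
```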
-
-The "VmFlags" field deserves a separate description. This member represents the
-kernel flags associated with the particular virtual memory area in a two-letter
-encoded manner. The codes are the following:
-    rd  - readable
-    wr  - writeable
-    ex  - executable
-    sh  - shared
-    mr  - may read
-    mw  - may write
-    me  - may execute
-    ms  - may share
-    gd  - stack segment grows down
-    pf  - pure PFN range
-    dw  - disabled write to the mapped file
-    lo  - pages are locked in memory
-    io  - memory mapped I/O area
-    sr  - sequential read advice provided
-    rr  - random read advice provided
-    dc  - do not copy area on fork
-    de  - do not expand area on remapping
-    ac  - area is accountable
-    nr  - swap space is not reserved for the area
-    ht  - area uses huge tlb pages
-    ar  - architecture specific flag
-    dd  - do not include area into core dump
-    sd  - soft-dirty flag
-    mm  - mixed map area
-    hg  - huge page advice flag
-    nh  - no-huge page advice flag
-    mg  - mergeable advice flag
-
-Note that there is no guarantee that every flag and associated mnemonic will
-be present in all future kernel releases.  Flags may vanish and new ones may
-be added, and the interpretation of their meaning may change as well.  Each
-consumer of these flags therefore has to check the specific kernel version
-for the exact semantics.
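
As an illustration, a consumer could decode a VmFlags line with a lookup table
built from the list above. This is a sketch: decode_vmflags is a hypothetical
helper, and because the set of flags is not guaranteed to be stable, unknown
mnemonics are reported rather than treated as errors.

```python
# Mnemonics copied from the table above; future kernels may add or drop some.
VMFLAGS = {
    "rd": "readable", "wr": "writeable", "ex": "executable", "sh": "shared",
    "mr": "may read", "mw": "may write", "me": "may execute", "ms": "may share",
    "gd": "stack segment grows down", "pf": "pure PFN range",
    "dw": "disabled write to the mapped file",
    "lo": "pages are locked in memory", "io": "memory mapped I/O area",
    "sr": "sequential read advice provided",
    "rr": "random read advice provided",
    "dc": "do not copy area on fork",
    "de": "do not expand area on remapping",
    "ac": "area is accountable",
    "nr": "swap space is not reserved for the area",
    "ht": "area uses huge tlb pages", "ar": "architecture specific flag",
    "dd": "do not include area into core dump", "sd": "soft-dirty flag",
    "mm": "mixed map area", "hg": "huge page advice flag",
    "nh": "no-huge page advice flag", "mg": "mergeable advice flag",
}


def decode_vmflags(line):
    """Return (mnemonic, description) pairs for one "VmFlags:" line."""
    _, _, flags = line.partition(":")
    return [(flag, VMFLAGS.get(flag, "unknown (not in this table)"))
            for flag in flags.split()]


for flag, meaning in decode_vmflags("VmFlags: rd ex mr mw me dw"):
    print(flag, "-", meaning)
```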
-
-This file is only present if the CONFIG_MMU kernel configuration option is
-enabled.
-
-Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent
-output can be achieved only within a single read call).
-This typically manifests when doing partial reads of these files while the
-memory map is being modified.  Despite the races, we do provide the following
-guarantees:
-
-1) The mapped addresses never go backwards, which implies no two
-   regions will ever overlap.
-2) If there is something at a given vaddr during the entirety of the
-   life of the smaps/maps walk, there will be some output for it.
-
-The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps,
-but their values are the sums of the corresponding values for all mappings of
-the process.  Additionally, it contains these fields:
-
-Pss_Anon
-Pss_File
-Pss_Shmem
-
-They represent the proportional shares of anonymous, file, and shmem pages, as
-described for smaps above.  These fields are omitted in smaps since each
-mapping identifies the type (anon, file, or shmem) of all pages it contains.
-Thus all information in smaps_rollup can be derived from smaps, but at a
-significantly higher cost.
-
-The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
-bits on both physical and virtual pages associated with a process, and the
-soft-dirty bit on pte (see Documentation/admin-guide/mm/soft-dirty.rst
-for details).
-To clear the bits for all the pages associated with the process
-    > echo 1 > /proc/PID/clear_refs
-
-To clear the bits for the anonymous pages associated with the process
-    > echo 2 > /proc/PID/clear_refs
-
-To clear the bits for the file mapped pages associated with the process
-    > echo 3 > /proc/PID/clear_refs
-
-To clear the soft-dirty bit
-    > echo 4 > /proc/PID/clear_refs
-
-To reset the peak resident set size ("high water mark") to the process's
-current value:
-    > echo 5 > /proc/PID/clear_refs
-
-Any other value written to /proc/PID/clear_refs will have no effect.
-
-The /proc/PID/pagemap file gives the PFN, which can be used to find the
-pageflags using /proc/kpageflags and the number of times a page is mapped using
-/proc/kpagecount. For a detailed explanation, see
-Documentation/admin-guide/mm/pagemap.rst.
-
-The /proc/PID/numa_maps file is an extension based on maps, showing the memory
-locality and binding policy, as well as the memory usage (in pages) of
-each mapping. The output follows a general format where mapping details are
-summarized, separated by blank spaces, with one mapping per line:
-
-address   policy    mapping details
-
-00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4
-00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4
-320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4
-320698b000 default file=/lib64/libc-2.12.so
-3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4
-3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4
-7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4
-7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4
-7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048
-7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4
-7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4
-
-Where:
-"address" is the starting address for the mapping;
-"policy" reports the NUMA memory policy set for the mapping (see Documentation/admin-guide/mm/numa_memory_policy.rst);
-"mapping details" summarizes mapping data such as mapping type, page usage counters,
-node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page
-size, in KB, that is backing the mapping up.
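
The three-part layout (address, policy, mapping details) can be split
mechanically. The following is a sketch only; parse_numa_maps_line and the
dictionary keys it returns are hypothetical names chosen for this example.

```python
def parse_numa_maps_line(line):
    """Split one numa_maps line into address, policy and mapping details."""
    address, policy, *details = line.split()
    parsed = {"address": int(address, 16), "policy": policy,
              "flags": [], "values": {}, "nodes": {}}
    for item in details:
        key, sep, value = item.partition("=")
        if not sep:
            # Bare words such as "huge" or "stack" describe the mapping type.
            parsed["flags"].append(key)
        elif key.startswith("N") and key[1:].isdigit():
            # N<node>=<pages> is a per-node page counter.
            parsed["nodes"][int(key[1:])] = int(value)
        else:
            parsed["values"][key] = value
    return parsed


line = ("3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 "
        "N0=24 N3=2 kernelpagesize_kB=4")
info = parse_numa_maps_line(line)
print(info["policy"], info["nodes"], info["values"]["kernelpagesize_kB"])
```

In this sample line the per-node counters (24 on node 0 plus 2 on node 3) add
up to the mapped=26 counter.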
-
-1.2 Kernel data
----------------
-
-Similar to  the  process entries, the kernel data files give information about
-the running kernel. The files used to obtain this information are contained in
-/proc and  are  listed  in Table 1-5. Not all of these will be present in your
-system. Which files are present depends on the kernel configuration and the
-loaded modules.
-
-Table 1-5: Kernel info in /proc
-..............................................................................
- File        Content                                           
- apm         Advanced power management info                    
- buddyinfo   Kernel memory allocator information (see text)	(2.5)
- bus         Directory containing bus specific information     
- cmdline     Kernel command line                               
- cpuinfo     Info about the CPU                                
- devices     Available devices (block and character)           
- dma         Used DMA channels                                 
- filesystems Supported filesystems                             
- driver	     Various drivers grouped here, currently rtc (2.4)
- execdomains Execdomains, related to security			(2.4)
- fb	     Frame Buffer devices				(2.4)
- fs	     File system parameters, currently nfs/exports	(2.4)
- ide         Directory containing info about the IDE subsystem 
- interrupts  Interrupt usage                                   
- iomem	     Memory map						(2.4)
- ioports     I/O port usage                                    
- irq	     Masks for irq to cpu affinity			(2.4)(smp?)
- isapnp	     ISA PnP (Plug&Play) Info				(2.4)
- kcore       Kernel core image (can be ELF or A.OUT(deprecated in 2.4))   
- kmsg        Kernel messages                                   
- ksyms       Kernel symbol table                               
- loadavg     Load average of last 1, 5 & 15 minutes                
- locks       Kernel locks                                      
- meminfo     Memory info                                       
- misc        Miscellaneous                                     
- modules     List of loaded modules                            
- mounts      Mounted filesystems                               
- net         Networking info (see text)                        
- pagetypeinfo Additional page allocator information (see text)  (2.5)
- partitions  Table of partitions known to the system           
- pci	     Deprecated info of PCI bus (new way -> /proc/bus/pci/,
-             decoupled by lspci)				(2.4)
- rtc         Real time clock                                   
- scsi        SCSI info (see text)                              
- slabinfo    Slab pool info                                    
- softirqs    softirq usage
- stat        Overall statistics                                
- swaps       Swap space utilization                            
- sys         See chapter 2                                     
- sysvipc     Info of SysVIPC Resources (msg, sem, shm)		(2.4)
- tty	     Info of tty drivers
- uptime      Wall clock since boot, combined idle time of all cpus
- version     Kernel version                                    
- video	     bttv info of video resources			(2.4)
- vmallocinfo Show vmalloced areas
-..............................................................................
-
-You can,  for  example,  check  which interrupts are currently in use and what
-they are used for by looking in the file /proc/interrupts:
-
-  > cat /proc/interrupts 
-             CPU0        
-    0:    8728810          XT-PIC  timer 
-    1:        895          XT-PIC  keyboard 
-    2:          0          XT-PIC  cascade 
-    3:     531695          XT-PIC  aha152x 
-    4:    2014133          XT-PIC  serial 
-    5:      44401          XT-PIC  pcnet_cs 
-    8:          2          XT-PIC  rtc 
-   11:          8          XT-PIC  i82365 
-   12:     182918          XT-PIC  PS/2 Mouse 
-   13:          1          XT-PIC  fpu 
-   14:    1232265          XT-PIC  ide0 
-   15:          7          XT-PIC  ide1 
-  NMI:          0 
-
-In 2.4.* a couple of lines were added to this file, LOC & ERR (this time it is
-the output of an SMP machine):
-
-  > cat /proc/interrupts 
-
-             CPU0       CPU1       
-    0:    1243498    1214548    IO-APIC-edge  timer
-    1:       8949       8958    IO-APIC-edge  keyboard
-    2:          0          0          XT-PIC  cascade
-    5:      11286      10161    IO-APIC-edge  soundblaster
-    8:          1          0    IO-APIC-edge  rtc
-    9:      27422      27407    IO-APIC-edge  3c503
-   12:     113645     113873    IO-APIC-edge  PS/2 Mouse
-   13:          0          0          XT-PIC  fpu
-   14:      22491      24012    IO-APIC-edge  ide0
-   15:       2183       2415    IO-APIC-edge  ide1
-   17:      30564      30414   IO-APIC-level  eth0
-   18:        177        164   IO-APIC-level  bttv
-  NMI:    2457961    2457959 
-  LOC:    2457882    2457881 
-  ERR:       2155
-
-NMI is incremented in this case because every timer interrupt generates an NMI
-(Non Maskable Interrupt) which is used by the NMI Watchdog to detect lockups.
-
-LOC is the local interrupt counter of the internal APIC of every CPU.
-
-ERR is incremented in the case of errors in the IO-APIC bus (the bus that
-connects the CPUs in an SMP system). This means that an error has been detected
-and the IO-APIC automatically retries the transmission, so it should not be a
-big problem, but you should read the SMP-FAQ.
-
-In 2.6.2* /proc/interrupts was expanded again.  This time the goal was for
-/proc/interrupts to display every IRQ vector in use by the system, not
-just those considered 'most important'.  The new vectors are:
-
-  THR -- interrupt raised when a machine check threshold counter
-  (typically counting ECC corrected errors of memory or cache) exceeds
-  a configurable threshold.  Only available on some systems.
-
-  TRM -- a thermal event interrupt occurs when a temperature threshold
-  has been exceeded for the CPU.  This interrupt may also be generated
-  when the temperature drops back to normal.
-
-  SPU -- a spurious interrupt is some interrupt that was raised then lowered
-  by some IO device before it could be fully processed by the APIC.  Hence
-  the APIC sees the interrupt but does not know what device it came from.
-  For this case the APIC will generate the interrupt with an IRQ vector
-  of 0xff. This might also be generated by chipset bugs.
-
-  RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are
-  sent from one CPU to another per the needs of the OS.  Typically,
-  their statistics are used by kernel developers and interested users to
-  determine the occurrence of interrupts of the given type.
-
-The above IRQ vectors are displayed only when relevant.  For example,
-the threshold vector does not exist on x86_64 platforms.  Others are
-suppressed when the system is a uniprocessor.  As of this writing, only
-i386 and x86_64 platforms support the new IRQ vector displays.
-
-Of some interest is the introduction of the /proc/irq directory to 2.4.
-It can be used to set IRQ to CPU affinity. This means that you can "hook" an
-IRQ to only one CPU, or exclude a CPU from handling IRQs. The irq subdir
-contains one subdir for each IRQ, and two files: default_smp_affinity and
-prof_cpu_mask.
-
-For example:
-  > ls /proc/irq/
-  0  10  12  14  16  18  2  4  6  8  prof_cpu_mask
-  1  11  13  15  17  19  3  5  7  9  default_smp_affinity
-  > ls /proc/irq/0/
-  smp_affinity
-
-smp_affinity is a bitmask in which you can specify which CPUs can handle the
-IRQ. You can set it by doing:
-
-  > echo 1 > /proc/irq/10/smp_affinity
-
-This means that only the first CPU will handle the IRQ, but you can also echo
-5 which means that only the first and third CPU can handle the IRQ.
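
The bitmask encoding can be worked through in Python: bit N of the hexadecimal
mask stands for CPU N, counting from zero. This is a sketch; cpus_from_mask
and mask_from_cpus are hypothetical helper names, and the code only does the
arithmetic without touching /proc/irq.

```python
def cpus_from_mask(mask_hex):
    """Return the zero-based CPU numbers whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def mask_from_cpus(cpus):
    """Build the hex mask string that selects the given CPUs."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


print(cpus_from_mask("1"))     # [0]     only the first CPU
print(cpus_from_mask("5"))     # [0, 2]  the first and third CPU
print(mask_from_cpus([0, 2]))  # 5
```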
-
-The contents of each smp_affinity file are the same by default:
-
-  > cat /proc/irq/0/smp_affinity
-  ffffffff
-
-There is an alternate interface, smp_affinity_list which allows specifying
-a cpu range instead of a bitmask:
-
-  > cat /proc/irq/0/smp_affinity_list
-  1024-1031
-
-The default_smp_affinity mask applies to all non-active IRQs, which are the
-IRQs which have not yet been allocated/activated, and hence which lack a
-/proc/irq/[0-9]* directory.
-
-The node file on an SMP system shows the node to which the device using the IRQ
-reports itself as being attached. This hardware locality information does not
-include information about any possible driver locality preference.
-
-prof_cpu_mask specifies which CPUs are to be profiled by the system wide
-profiler. Default value is ffffffff (all cpus if there are only 32 of them).
-
-The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
-between all the CPUs which are allowed to handle it. As usual the kernel has
-more info than you and does a better job than you, so the defaults are the
-best choice for almost everyone.  [Note this applies only to those IO-APICs
-that support "Round Robin" interrupt distribution.]
-
-There are  three  more  important subdirectories in /proc: net, scsi, and sys.
-The general  rule  is  that  the  contents,  or  even  the  existence of these
-directories, depend  on your kernel configuration. If SCSI is not enabled, the
-directory scsi  may  not  exist. The same is true of net, which is there
-only when networking support is present in the running kernel.
-
-The slabinfo  file  gives  information  about  memory usage at the slab level.
-Linux uses  slab  pools for memory management above page level in version 2.2.
-Commonly used  objects  have  their  own  slab  pool (such as network buffers,
-directory cache, and so on).
-
-..............................................................................
-
-> cat /proc/buddyinfo
-
-Node 0, zone      DMA      0      4      5      4      4      3 ...
-Node 0, zone   Normal      1      0      0      1    101      8 ...
-Node 0, zone  HighMem      2      0      0      1      1      0 ...
-
-External fragmentation is a problem under some workloads, and buddyinfo is a
-useful tool for helping diagnose these problems.  Buddyinfo will give you a 
-clue as to how big an area you can safely allocate, or why a previous
-allocation failed.
-
-Each column represents the number of pages of a certain order which are 
-available.  In this case, there are 0 chunks of 2^0*PAGE_SIZE available in 
-ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE 
-available in ZONE_NORMAL, etc... 
-
-More information relevant to external fragmentation can be found in
-pagetypeinfo.
-
-> cat /proc/pagetypeinfo
-Page block order: 9
-Pages per block:  512
-
-Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
-Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      1      1      0
-Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
-Node    0, zone      DMA, type      Movable      1      1      2      1      2      1      1      0      1      0      2
-Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
-Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
-Node    0, zone    DMA32, type    Unmovable    103     54     77      1      1      1     11      8      7      1      9
-Node    0, zone    DMA32, type  Reclaimable      0      0      2      1      0      0      0      0      1      0      0
-Node    0, zone    DMA32, type      Movable    169    152    113     91     77     54     39     13      6      1    452
-Node    0, zone    DMA32, type      Reserve      1      2      2      2      2      0      1      1      1      1      0
-Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
-
-Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
-Node 0, zone      DMA            2            0            5            1            0
-Node 0, zone    DMA32           41            6          967            2            0
-
-Fragmentation avoidance in the kernel works by grouping pages of the same
-migrate type together in contiguous regions of memory called page blocks.
-A page block is typically the size of the default hugepage size e.g. 2MB on
-X86-64. By keeping pages grouped based on their ability to move, the kernel
-can reclaim pages within a page block to satisfy a high-order allocation.
-
-The pagetypeinfo begins with information on the size of a page block. It
-then gives the same type of information as buddyinfo except broken down
-by migrate-type and finishes with details on how many page blocks of each
-type exist.
-
-If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
-from libhugetlbfs https://github.com/libhugetlbfs/libhugetlbfs/), one can
-make an estimate of the likely number of huge pages that can be allocated
-at a given point in time. All the "Movable" blocks should be allocatable
-unless memory has been mlock()'d. Some of the Reclaimable blocks should
-also be allocatable although a lot of filesystem metadata may have to be
-reclaimed to achieve this.
-
-..............................................................................
-
-meminfo:
-
-Provides information about distribution and utilization of memory.  This
-varies by architecture and compile options.  The following is from a
-16GB PIII, which has highmem enabled.  You may not have all of these fields.
-
-> cat /proc/meminfo
-
-MemTotal:     16344972 kB
-MemFree:      13634064 kB
-MemAvailable: 14836172 kB
-Buffers:          3656 kB
-Cached:        1195708 kB
-SwapCached:          0 kB
-Active:         891636 kB
-Inactive:      1077224 kB
-HighTotal:    15597528 kB
-HighFree:     13629632 kB
-LowTotal:       747444 kB
-LowFree:          4432 kB
-SwapTotal:           0 kB
-SwapFree:            0 kB
-Dirty:             968 kB
-Writeback:           0 kB
-AnonPages:      861800 kB
-Mapped:         280372 kB
-Shmem:             644 kB
-KReclaimable:   168048 kB
-Slab:           284364 kB
-SReclaimable:   159856 kB
-SUnreclaim:     124508 kB
-PageTables:      24448 kB
-NFS_Unstable:        0 kB
-Bounce:              0 kB
-WritebackTmp:        0 kB
-CommitLimit:   7669796 kB
-Committed_AS:   100056 kB
-VmallocTotal:   112216 kB
-VmallocUsed:       428 kB
-VmallocChunk:   111088 kB
-Percpu:          62080 kB
-HardwareCorrupted:   0 kB
-AnonHugePages:   49152 kB
-ShmemHugePages:      0 kB
-ShmemPmdMapped:      0 kB
-
-
-    MemTotal: Total usable ram (i.e. physical ram minus a few reserved
-              bits and the kernel binary code)
-     MemFree: The sum of LowFree+HighFree
-MemAvailable: An estimate of how much memory is available for starting new
-              applications, without swapping. Calculated from MemFree,
-              SReclaimable, the size of the file LRU lists, and the low
-              watermarks in each zone.
-              The estimate takes into account that the system needs some
-              page cache to function well, and that not all reclaimable
-              slab will be reclaimable, due to items being in use. The
-              impact of those factors will vary from system to system.
-     Buffers: Relatively temporary storage for raw disk blocks; it
-              shouldn't get tremendously large (20MB or so)
-      Cached: in-memory cache for files read from the disk (the
-              pagecache).  Doesn't include SwapCached
-  SwapCached: Memory that once was swapped out, is swapped back in but
-              still also is in the swapfile (if memory is needed it
-              doesn't need to be swapped out AGAIN because it is already
-              in the swapfile. This saves I/O)
-      Active: Memory that has been used more recently and usually not
-              reclaimed unless absolutely necessary.
-    Inactive: Memory which has been less recently used.  It is more
-              eligible to be reclaimed for other purposes
-   HighTotal:
-    HighFree: Highmem is all memory above ~860MB of physical memory
-              Highmem areas are for use by userspace programs, or
-              for the pagecache.  The kernel must use tricks to access
-              this memory, making it slower to access than lowmem.
-    LowTotal:
-     LowFree: Lowmem is memory which can be used for everything that
-              highmem can be used for, but it is also available for the
-              kernel's use for its own data structures.  Among many
-              other things, it is where everything from the Slab is
-              allocated.  Bad things happen when you're out of lowmem.
-   SwapTotal: total amount of swap space available
-    SwapFree: Amount of swap space that is currently unused
-       Dirty: Memory which is waiting to get written back to the disk
-   Writeback: Memory which is actively being written back to the disk
-   AnonPages: Non-file backed pages mapped into userspace page tables
-HardwareCorrupted: The amount of RAM/memory, in KB, that the kernel identifies
-              as corrupted.
-AnonHugePages: Non-file backed huge pages mapped into userspace page tables
-      Mapped: files which have been mmapped, such as libraries
-       Shmem: Total memory used by shared memory (shmem) and tmpfs
-ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
-              with huge pages
-ShmemPmdMapped: Shared memory mapped into userspace with huge pages
-KReclaimable: Kernel allocations that the kernel will attempt to reclaim
-              under memory pressure. Includes SReclaimable (below), and other
-              direct allocations with a shrinker.
-        Slab: in-kernel data structures cache
-SReclaimable: Part of Slab, that might be reclaimed, such as caches
-  SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
-  PageTables: amount of memory dedicated to the lowest level of page
-              tables.
-NFS_Unstable: NFS pages sent to the server, but not yet committed to stable
-	      storage
-      Bounce: Memory used for block device "bounce buffers"
-WritebackTmp: Memory used by FUSE for temporary writeback buffers
- CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
-              this is the total amount of  memory currently available to
-              be allocated on the system. This limit is only adhered to
-              if strict overcommit accounting is enabled (mode 2 in
-              'vm.overcommit_memory').
-              The CommitLimit is calculated with the following formula:
-              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
-                             overcommit_ratio / 100 + [total swap pages]
-              For example, on a system with 1G of physical RAM and 7G
-              of swap with a `vm.overcommit_ratio` of 30 it would
-              yield a CommitLimit of 7.3G.
-              For more details, see the memory overcommit documentation
-              in vm/overcommit-accounting.
-Committed_AS: The amount of memory presently allocated on the system.
-              The committed memory is a sum of all of the memory which
-              has been allocated by processes, even if it has not been
-              "used" by them as of yet. A process which malloc()'s 1G
-              of memory, but only touches 300M of it will show up as
-	      using 1G. This 1G is memory which has been "committed" to
-              by the VM and can be used at any time by the allocating
-              application. With strict overcommit enabled on the system
-              (mode 2 in 'vm.overcommit_memory'), allocations which would
-              exceed the CommitLimit (detailed above) will not be permitted.
-              This is useful if one needs to guarantee that processes will
-              not fail due to lack of memory once that memory has been
-              successfully allocated.
-VmallocTotal: total size of vmalloc memory area
- VmallocUsed: amount of vmalloc area which is used
-VmallocChunk: largest contiguous block of vmalloc area which is free
-      Percpu: Memory allocated to the percpu allocator used to back percpu
-              allocations. This stat excludes the cost of metadata.
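
The CommitLimit formula given above can be checked against its own example.
This is a sketch: commit_limit is a hypothetical helper, and gigabytes are
used in place of pages for readability, which works because the formula is
linear in whatever unit is used consistently.

```python
def commit_limit(total_ram, total_huge_tlb, total_swap, overcommit_ratio):
    """CommitLimit = (RAM - huge TLB pages) * overcommit_ratio / 100 + swap."""
    return (total_ram - total_huge_tlb) * overcommit_ratio / 100 + total_swap


# 1G of physical RAM, no huge TLB pages, 7G of swap, vm.overcommit_ratio = 30.
limit = commit_limit(total_ram=1, total_huge_tlb=0, total_swap=7,
                     overcommit_ratio=30)
print(f"{limit:.1f}G")
```

Note that swap is added in full; only the RAM term is scaled by the ratio.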
-
-..............................................................................
-
-vmallocinfo:
-
-Provides information about vmalloced/vmapped areas. One line per area,
-containing the virtual address range of the area, size in bytes,
-caller information of the creator, and optional information depending
-on the kind of area:
-
- pages=nr    number of pages
- phys=addr   if a physical address was specified
- ioremap     I/O mapping (ioremap() and friends)
- vmalloc     vmalloc() area
- vmap        vmap()ed pages
- user        VM_USERMAP area
- vpages      buffer for page pointers was vmalloced (huge area)
- N<node>=nr  (Only on NUMA kernels)
-             Number of pages allocated on memory node <node>
-
-> cat /proc/vmallocinfo
-0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
-  /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
-0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
-  /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
-0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
-  phys=7fee8000 ioremap
-0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
-  phys=7fee7000 ioremap
-0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
-0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
-  /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
-0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
-  pages=2 vmalloc N1=2
-0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
-  /0x130 [x_tables] pages=4 vmalloc N0=4
-0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
-   pages=14 vmalloc N2=14
-0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
-   pages=4 vmalloc N1=4
-0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
-   pages=2 vmalloc N1=2
-0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
-   pages=10 vmalloc N0=10
-
-..............................................................................
-
-softirqs:
-
-Provides counts of softirq handlers serviced since boot time, for each cpu.
-
-> cat /proc/softirqs
-                CPU0       CPU1       CPU2       CPU3
-      HI:          0          0          0          0
-   TIMER:      27166      27120      27097      27034
-  NET_TX:          0          0          0         17
-  NET_RX:         42          0          0         39
-   BLOCK:          0          0        107       1121
- TASKLET:          0          0          0        290
-   SCHED:      27035      26983      26971      26746
- HRTIMER:          0          0          0          0
-     RCU:       1678       1769       2178       2250
-
-
-1.3 IDE devices in /proc/ide
-----------------------------
-
-The subdirectory /proc/ide contains information about all IDE devices of which
-the kernel  is  aware.  There is one subdirectory for each IDE controller, the
-file drivers  and a link for each IDE device, pointing to the device directory
-in the controller specific subtree.
-
-The file  drivers  contains general information about the drivers used for the
-IDE devices:
-
-  > cat /proc/ide/drivers
-  ide-cdrom version 4.53
-  ide-disk version 1.08
-
-More detailed  information  can  be  found  in  the  controller  specific
-subdirectories. These  are  named  ide0,  ide1  and  so  on.  Each  of  these
-directories contains the files shown in table 1-6.
-
-
-Table 1-6: IDE controller info in  /proc/ide/ide?
-..............................................................................
- File    Content                                 
- channel IDE channel (0 or 1)                    
- config  Configuration (only for PCI/IDE bridge) 
- mate    Mate name                               
- model   Type/Chipset of IDE controller          
-..............................................................................
-
-Each device  connected  to  a  controller  has  a separate subdirectory in the
-controllers directory.  The  files  listed in table 1-7 are contained in these
-directories.
-
-
-Table 1-7: IDE device information
-..............................................................................
- File             Content                                    
- cache            The cache                                  
- capacity         Capacity of the medium (in 512Byte blocks) 
- driver           driver and version                         
- geometry         physical and logical geometry              
- identify         device identify block                      
- media            media type                                 
- model            device identifier                          
- settings         device setup                               
- smart_thresholds IDE disk management thresholds             
- smart_values     IDE disk management values                 
-..............................................................................
-
-The most  interesting  file is settings. This file contains a nice overview of
-the drive parameters:
-
-  # cat /proc/ide/ide0/hda/settings 
-  name                    value           min             max             mode 
-  ----                    -----           ---             ---             ---- 
-  bios_cyl                526             0               65535           rw 
-  bios_head               255             0               255             rw 
-  bios_sect               63              0               63              rw 
-  breada_readahead        4               0               127             rw 
-  bswap                   0               0               1               r 
-  file_readahead          72              0               2097151         rw 
-  io_32bit                0               0               3               rw 
-  keepsettings            0               0               1               rw 
-  max_kb_per_request      122             1               127             rw 
-  multcount               0               0               8               rw 
-  nice1                   1               0               1               rw 
-  nowerr                  0               0               1               rw 
-  pio_mode                write-only      0               255             w 
-  slow                    0               0               1               rw 
-  unmaskirq               0               0               1               rw 
-  using_dma               0               0               1               rw 
-
-
-1.4 Networking info in /proc/net
---------------------------------
-
-The subdirectory  /proc/net  follows  the  usual  pattern. Table 1-8 shows the
-additional values  you  get  for  IP  version 6 if you configure the kernel to
-support this. Table 1-9 lists the files and their meaning.
-
-
-Table 1-8: IPv6 info in /proc/net
-..............................................................................
- File       Content                                               
- udp6       UDP sockets (IPv6)                                    
- tcp6       TCP sockets (IPv6)                                    
- raw6       Raw device statistics (IPv6)                          
- igmp6      IP multicast addresses, which this host joined (IPv6) 
- if_inet6   List of IPv6 interface addresses                      
- ipv6_route Kernel routing table for IPv6                         
- rt6_stats  Global IPv6 routing tables statistics                 
- sockstat6  Socket statistics (IPv6)                              
- snmp6      Snmp data (IPv6)                                      
-..............................................................................
-
-
-Table 1-9: Network info in /proc/net
-..............................................................................
- File          Content                                                         
- arp           Kernel  ARP table                                               
- dev           network devices with statistics                                 
- dev_mcast     the Layer2 multicast groups a device is listening to
-               (interface index, label, number of references, number of bound
-               addresses). 
- dev_stat      network device status                                           
- ip_fwchains   Firewall chain linkage                                          
- ip_fwnames    Firewall chain names                                            
- ip_masq       Directory containing the masquerading tables                    
- ip_masquerade Major masquerading table                                        
- netstat       Network statistics                                              
- raw           raw device statistics                                           
- route         Kernel routing table                                            
- rpc           Directory containing rpc info                                   
- rt_cache      Routing cache                                                   
- snmp          SNMP data                                                       
- sockstat      Socket statistics                                               
- tcp           TCP  sockets                                                    
- udp           UDP sockets                                                     
- unix          UNIX domain sockets                                             
- wireless      Wireless interface data (Wavelan etc)                           
- igmp          IP multicast addresses, which this host joined                  
- psched        Global packet scheduler parameters.                             
- netlink       List of PF_NETLINK sockets                                      
- ip_mr_vifs    List of multicast virtual interfaces                            
- ip_mr_cache   List of multicast routing cache                                 
-..............................................................................
-
-You can  use  this  information  to see which network devices are available in
-your system and how much traffic was routed over those devices:
-
-  > cat /proc/net/dev 
-  Inter-|Receive                                                   |[... 
-   face |bytes    packets errs drop fifo frame compressed multicast|[... 
-      lo:  908188   5596     0    0    0     0          0         0 [...         
-    ppp0:15475140  20721   410    0    0   410          0         0 [...  
-    eth0:  614530   7085     0    0    0     0          0         1 [... 
-   
-  ...] Transmit 
-  ...] bytes    packets errs drop fifo colls carrier compressed 
-  ...]  908188     5596    0    0    0     0       0          0 
-  ...] 1375103    17405    0    0    0     0       0          0 
-  ...] 1703981     5535    0    0    0     3       0          0 
-
-In addition, each Channel Bond interface has its own directory.  For
-example, the bond0 device will have a directory called /proc/net/bond0/.
-It will contain information that is specific to that bond, such as the
-current slaves of the bond, the link status of the slaves, and how
-many times each slave's link has failed.
-
-1.5 SCSI info
--------------
-
-If you  have  a  SCSI  host adapter in your system, you'll find a subdirectory
-named after  the driver for this adapter in /proc/scsi. You'll also see a list
-of all recognized SCSI devices in /proc/scsi:
-
-  >cat /proc/scsi/scsi 
-  Attached devices: 
-  Host: scsi0 Channel: 00 Id: 00 Lun: 00 
-    Vendor: IBM      Model: DGHS09U          Rev: 03E0 
-    Type:   Direct-Access                    ANSI SCSI revision: 03 
-  Host: scsi0 Channel: 00 Id: 06 Lun: 00 
-    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04 
-    Type:   CD-ROM                           ANSI SCSI revision: 02 
-
-
-The directory  named  after  the driver has one file for each adapter found in
-the system.  These  files  contain information about the controller, including
-the used  IRQ  and  the  IO  address range. The amount of information shown is
-dependent on  the adapter you use. The example shows the output for an Adaptec
-AHA-2940 SCSI adapter:
-
-  > cat /proc/scsi/aic7xxx/0 
-   
-  Adaptec AIC7xxx driver version: 5.1.19/3.2.4 
-  Compile Options: 
-    TCQ Enabled By Default : Disabled 
-    AIC7XXX_PROC_STATS     : Disabled 
-    AIC7XXX_RESET_DELAY    : 5 
-  Adapter Configuration: 
-             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter 
-                             Ultra Wide Controller 
-      PCI MMAPed I/O Base: 0xeb001000 
-   Adapter SEEPROM Config: SEEPROM found and used. 
-        Adaptec SCSI BIOS: Enabled 
-                      IRQ: 10 
-                     SCBs: Active 0, Max Active 2, 
-                           Allocated 15, HW 16, Page 255 
-               Interrupts: 160328 
-        BIOS Control Word: 0x18b6 
-     Adapter Control Word: 0x005b 
-     Extended Translation: Enabled 
-  Disconnect Enable Flags: 0xffff 
-       Ultra Enable Flags: 0x0001 
-   Tag Queue Enable Flags: 0x0000 
-  Ordered Queue Tag Flags: 0x0000 
-  Default Tag Queue Depth: 8 
-      Tagged Queue By Device array for aic7xxx host instance 0: 
-        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255} 
-      Actual queue depth per device for aic7xxx host instance 0: 
-        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} 
-  Statistics: 
-  (scsi0:0:0:0) 
-    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8 
-    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0) 
-    Total transfers 160151 (74577 reads and 85574 writes) 
-  (scsi0:0:6:0) 
-    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15 
-    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0) 
-    Total transfers 0 (0 reads and 0 writes) 
-
-
-1.6 Parallel port info in /proc/parport
----------------------------------------
-
-The directory  /proc/parport  contains information about the parallel ports of
-your system.  It  has  one  subdirectory  for  each port, named after the port
-number (0,1,2,...).
-
-These directories contain the four files shown in Table 1-10.
-
-
-Table 1-10: Files in /proc/parport
-..............................................................................
- File      Content                                                             
- autoprobe Any IEEE-1284 device ID information that has been acquired.         
- devices   list of the device drivers using that port. A + will appear by the
-           name of the device currently using the port (it might not appear
-           against any). 
- hardware  Parallel port's base address, IRQ line and DMA channel.             
- irq       IRQ that parport is using for that port. This is in a separate
-           file to allow you to alter it by writing a new value in (IRQ
-           number or none). 
-..............................................................................
-
-1.7 TTY info in /proc/tty
--------------------------
-
-Information about  the  available  and actually used tty's can be found in the
-directory /proc/tty. You'll find entries for drivers and line disciplines in
-this directory, as shown in Table 1-11.
-
-
-Table 1-11: Files in /proc/tty
-..............................................................................
- File          Content                                        
- drivers       list of drivers and their usage                
- ldiscs        registered line disciplines                    
- driver/serial usage statistic and status of single tty lines 
-..............................................................................
-
-To see  which  tty's  are  currently in use, you can simply look into the file
-/proc/tty/drivers:
-
-  > cat /proc/tty/drivers 
-  pty_slave            /dev/pts      136   0-255 pty:slave 
-  pty_master           /dev/ptm      128   0-255 pty:master 
-  pty_slave            /dev/ttyp       3   0-255 pty:slave 
-  pty_master           /dev/pty        2   0-255 pty:master 
-  serial               /dev/cua        5   64-67 serial:callout 
-  serial               /dev/ttyS       4   64-67 serial 
-  /dev/tty0            /dev/tty0       4       0 system:vtmaster 
-  /dev/ptmx            /dev/ptmx       5       2 system 
-  /dev/console         /dev/console    5       1 system:console 
-  /dev/tty             /dev/tty        5       0 system:/dev/tty 
-  unknown              /dev/tty        4    1-63 console 
-
-
-1.8 Miscellaneous kernel statistics in /proc/stat
--------------------------------------------------
-
-Various pieces   of  information about  kernel activity  are  available in the
-/proc/stat file.  All  of  the numbers reported  in  this file are  aggregates
-since the system first booted.  For a quick look, simply cat the file:
-
-  > cat /proc/stat
-  cpu  2255 34 2290 22625563 6290 127 456 0 0 0
-  cpu0 1132 34 1441 11311718 3675 127 438 0 0 0
-  cpu1 1123 0 849 11313845 2614 0 18 0 0 0
-  intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
-  ctxt 1990473
-  btime 1062191376
-  processes 2915
-  procs_running 1
-  procs_blocked 0
-  softirq 183433 0 21755 12 39 1137 231 21459 2263
-
-The very first  "cpu" line aggregates the  numbers in all  of the other "cpuN"
-lines.  These numbers identify the amount of time the CPU has spent performing
-different kinds of work.  Time units are in USER_HZ (typically hundredths of a
-second).  The meanings of the columns are as follows, from left to right:
-
-- user: normal processes executing in user mode
-- nice: niced processes executing in user mode
-- system: processes executing in kernel mode
-- idle: twiddling thumbs
-- iowait: In short, iowait is the time spent waiting for I/O to complete.
-  But there are several problems:
-  1. The CPU does not wait for I/O to complete; iowait is the time that a
-     task is waiting for I/O to complete. When a CPU goes into idle state
-     for outstanding task I/O, another task will be scheduled on this CPU.
-  2. On a multi-core CPU, the task waiting for I/O to complete is not running
-     on any CPU, so the iowait of each CPU is difficult to calculate.
-  3. The value of the iowait field in /proc/stat will decrease in certain
-     conditions.
-  So, the iowait value read from /proc/stat is not reliable.
-- irq: servicing interrupts
-- softirq: servicing softirqs
-- steal: involuntary wait
-- guest: running a normal guest
-- guest_nice: running a niced guest
-
-The "intr" line gives counts of interrupts  serviced since boot time, for each
-of the  possible system interrupts.   The first  column  is the  total of  all
-interrupts serviced  including  unnumbered  architecture specific  interrupts;
-each  subsequent column is the  total for that particular numbered interrupt.
-Unnumbered interrupts are not shown, only summed into the total.
-
-The "ctxt" line gives the total number of context switches across all CPUs.
-
-The "btime" line gives  the time at which the  system booted, in seconds since
-the Unix epoch.
-
-The "processes" line gives the number  of processes and threads created, which
-includes (but  is not limited  to) those  created by  calls to the  fork() and
-clone() system calls.
-
-The "procs_running" line gives the total number of threads that are
-running or ready to run (i.e., the total number of runnable threads).
-
-The   "procs_blocked" line gives  the  number of  processes currently blocked,
-waiting for I/O to complete.
-
-The "softirq" line gives counts of softirqs serviced since boot time, for each
-of the possible system softirqs. The first column is the total of all
-softirqs serviced; each subsequent column is the total for that particular
-softirq.
-
-
-1.9 Ext4 file system parameters
--------------------------------
-
-Information about mounted ext4 file systems can be found in
-/proc/fs/ext4.  Each mounted filesystem will have a directory in
-/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
-/proc/fs/ext4/dm-0).   The files in each per-device directory are shown
-in Table 1-12, below.
-
-Table 1-12: Files in /proc/fs/ext4/<devname>
-..............................................................................
- File            Content                                        
- mb_groups       details of multiblock allocator buddy cache of free blocks
-..............................................................................
-
-1.10 /proc/consoles
--------------------
-Shows registered system console lines.
-
-To see which character device lines are currently used for the system console
-/dev/console, you may simply look into the file /proc/consoles:
-
-  > cat /proc/consoles
-  tty0                 -WU (ECp)       4:7
-  ttyS0                -W- (Ep)        4:64
-
-The columns are:
-
-  device               name of the device
-  operations           R = can do read operations
-                       W = can do write operations
-                       U = can do unblank
-  flags                E = it is enabled
-                       C = it is preferred console
-                       B = it is primary boot console
-                       p = it is used for printk buffer
-                       b = it is not a TTY but a Braille device
-                       a = it is safe to use when cpu is offline
-  major:minor          major and minor number of the device separated by a colon
-
-------------------------------------------------------------------------------
-Summary
-------------------------------------------------------------------------------
-The /proc file system serves information about the running system. It not only
-allows access to process data but also allows you to request the kernel status
-by reading files in the hierarchy.
-
-The directory  structure  of /proc reflects the types of information and makes
-it easy, if not obvious, where to look for specific data.
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-CHAPTER 2: MODIFYING SYSTEM PARAMETERS
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-In This Chapter
-------------------------------------------------------------------------------
-* Modifying kernel parameters by writing into files found in /proc/sys
-* Exploring the files which modify certain parameters
-* Review of the /proc/sys file tree
-------------------------------------------------------------------------------
-
-
-A very  interesting part of /proc is the directory /proc/sys. This is not only
-a source  of  information,  it also allows you to change parameters within the
-kernel. Be  very  careful  when attempting this. You can optimize your system,
-but you  can  also  cause  it  to  crash.  Never  alter kernel parameters on a
-production system.  Set  up  a  development machine and test to make sure that
-everything works  the  way  you want it to. You may have no alternative but to
-reboot the machine once an error has been made.
-
-To change  a  value,  simply  echo  the new value into the file. An example is
-given below  in the section on the file system data. You need to be root to do
-this. You  can  create  your  own  boot script to perform this every time your
-system boots.
-
-The files  in /proc/sys can be used to fine tune and monitor miscellaneous and
-general things  in  the operation of the Linux kernel. Since some of the files
-can inadvertently  disrupt  your  system,  it  is  advisable  to  read  both
-documentation and  source  before actually making adjustments. In any case, be
-very careful  when  writing  to  any  of these files. The entries in /proc may
-change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
-review the kernel documentation in the directory /usr/src/linux/Documentation.
-This chapter  is  heavily  based  on the documentation included in the pre 2.2
-kernels, and became part of it in version 2.2.1 of the Linux kernel.
-
-Please see the Documentation/admin-guide/sysctl/ directory for descriptions of
-these entries.
-
-------------------------------------------------------------------------------
-Summary
-------------------------------------------------------------------------------
-Certain aspects  of  kernel  behavior  can be modified at runtime, without the
-need to  recompile  the kernel, or even to reboot the system. The files in the
-/proc/sys tree  can  not only be read, but also modified. You can use the echo
-command to write values into these files, thereby changing the default settings
-of the kernel.
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-CHAPTER 3: PER-PROCESS PARAMETERS
-------------------------------------------------------------------------------
-
-3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
---------------------------------------------------------------------------------
-
-These files can be used to adjust the badness heuristic used to select which
-process gets killed in out of memory conditions.
-
-The badness heuristic assigns a value to each candidate task ranging from 0
-(never kill) to 1000 (always kill) to determine which process is targeted.  The
-units are roughly a proportion along that range of allowed memory the process
-may allocate from based on an estimation of its current memory and swap use.
-For example, if a task is using all allowed memory, its badness score will be
-1000.  If it is using half of its allowed memory, its score will be 500.
-
-There is an additional factor included in the badness score: the current memory
-and swap usage is discounted by 3% for root processes.
-
-The amount of "allowed" memory depends on the context in which the oom killer
-was called.  If it is due to the memory assigned to the allocating task's cpuset
-being exhausted, the allowed memory represents the set of mems assigned to that
-cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
-memory represents the set of mempolicy nodes.  If it is due to a memory
-limit (or swap limit) being reached, the allowed memory is that configured
-limit.  Finally, if it is due to the entire system being out of memory, the
-allowed memory represents all allocatable resources.
-
-The value of /proc/<pid>/oom_score_adj is added to the badness score before it
-is used to determine which task to kill.  Acceptable values range from -1000
-(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
-polarize the preference for oom killing either by always preferring a certain
-task or completely disabling it.  The lowest possible value, -1000, is
-equivalent to disabling oom killing entirely for that task since it will always
-report a badness score of 0.
-
-Consequently, it is very simple for userspace to define the amount of memory to
-consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
-example, is roughly equivalent to allowing the remainder of tasks sharing the
-same system, cpuset, mempolicy, or memory controller resources to use at least
-50% more memory.  A value of -500, on the other hand, would be roughly
-equivalent to discounting 50% of the task's allowed memory from being considered
-as scoring against the task.
-
-For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
-be used to tune the badness score.  Its acceptable values range from -16
-(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
-(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
-scaled linearly with /proc/<pid>/oom_score_adj.
-
-The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
-value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
-requires CAP_SYS_RESOURCE.
-
-Caveat: when a parent task is selected, the oom killer will sacrifice any first
-generation children with separate address spaces instead, if possible.  This
-prevents servers and important system daemons from being killed and loses the
-minimal amount of work.
-
-
-3.2 /proc/<pid>/oom_score - Display current oom-killer score
--------------------------------------------------------------
-
-This file can be used to check the current score used by the oom-killer for
-any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
-process should be killed in an out-of-memory situation.
-
-
-3.3  /proc/<pid>/io - Display the IO accounting fields
--------------------------------------------------------
-
-This file contains IO statistics for each running process.
-
-Example
--------
-
-test:/tmp # dd if=/dev/zero of=/tmp/test.dat &
-[1] 3828
-
-test:/tmp # cat /proc/3828/io
-rchar: 323934931
-wchar: 323929600
-syscr: 632687
-syscw: 632675
-read_bytes: 0
-write_bytes: 323932160
-cancelled_write_bytes: 0
-
-
-Description
------------
-
-rchar
------
-
-I/O counter: chars read
-The number of bytes which this task has caused to be read from storage. This
-is simply the sum of bytes which this process passed to read() and pread().
-It includes things like tty IO and it is unaffected by whether or not actual
-physical disk IO was required (the read might have been satisfied from
-pagecache).
-
-
-wchar
------
-
-I/O counter: chars written
-The number of bytes which this task has caused, or shall cause to be written
-to disk. Similar caveats apply here as with rchar.
-
-
-syscr
------
-
-I/O counter: read syscalls
-Attempt to count the number of read I/O operations, i.e. syscalls like read()
-and pread().
-
-
-syscw
------
-
-I/O counter: write syscalls
-Attempt to count the number of write I/O operations, i.e. syscalls like
-write() and pwrite().
-
-
-read_bytes
-----------
-
-I/O counter: bytes read
-Attempt to count the number of bytes which this process really did cause to
-be fetched from the storage layer. Done at the submit_bio() level, so it is
-accurate for block-backed filesystems. <please add status regarding NFS and
-CIFS at a later time>
-
-
-write_bytes
------------
-
-I/O counter: bytes written
-Attempt to count the number of bytes which this process caused to be sent to
-the storage layer. This is done at page-dirtying time.
-
-
-cancelled_write_bytes
----------------------
-
-The big inaccuracy here is truncate. If a process writes 1MB to a file and
-then deletes the file, it will in fact perform no writeout. But it will have
-been accounted as having caused 1MB of write.
-In other words: The number of bytes which this process caused to not happen,
-by truncating pagecache. A task can cause "negative" IO too. If this task
-truncates some dirty pagecache, some IO which another task has been accounted
-for (in its write_bytes) will not be happening. We _could_ just subtract that
-from the truncating task's write_bytes, but there is information loss in doing
-that.
-
-
-Note
-----
-
-At its current implementation state, this is a bit racy on 32-bit machines: if
-process A reads process B's /proc/pid/io while process B is updating one of
-those 64-bit counters, process A could see an intermediate result.
-
-
-More information about this can be found within the taskstats documentation in
-Documentation/accounting.
-
-3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
----------------------------------------------------------------
-When a process is dumped, all anonymous memory is written to a core file as
-long as the size of the core file isn't limited. But sometimes we don't want
-to dump some memory segments, for example, huge shared memory or DAX.
-Conversely, sometimes we want to save file-backed memory segments into a core
-file, not only the individual files.
-
-/proc/<pid>/coredump_filter allows you to customize which memory segments
-will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
-of memory types. If a bit of the bitmask is set, memory segments of the
-corresponding memory type are dumped, otherwise they are not dumped.
-
-The following 9 memory types are supported:
-  - (bit 0) anonymous private memory
-  - (bit 1) anonymous shared memory
-  - (bit 2) file-backed private memory
-  - (bit 3) file-backed shared memory
-  - (bit 4) ELF header pages in file-backed private memory areas (it is
-            effective only if bit 2 is cleared)
-  - (bit 5) hugetlb private memory
-  - (bit 6) hugetlb shared memory
-  - (bit 7) DAX private memory
-  - (bit 8) DAX shared memory
-
-  Note that MMIO pages such as frame buffer are never dumped and vDSO pages
-  are always dumped regardless of the bitmask status.
-
-  Note that bits 0-4 don't affect hugetlb or DAX memory. hugetlb memory is
-  only affected by bits 5-6, and DAX is only affected by bits 7-8.
-
-The default value of coredump_filter is 0x33; this means all anonymous memory
-segments, ELF header pages and hugetlb private memory are dumped.
-
-If you don't want to dump all shared memory segments attached to pid 1234,
-write 0x31 to the process's proc file.
-
-  $ echo 0x31 > /proc/1234/coredump_filter
-
-When a new process is created, the process inherits the bitmask status from its
-parent. It is useful to set up coredump_filter before the program runs.
-For example:
-
-  $ echo 0x7 > /proc/self/coredump_filter
-  $ ./some_program
-
-3.5	/proc/<pid>/mountinfo - Information about mounts
---------------------------------------------------------
-
-This file contains lines of the form:
-
-36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
-(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
-
-(1) mount ID:  unique identifier of the mount (may be reused after umount)
-(2) parent ID:  ID of parent (or of self for the top of the mount tree)
-(3) major:minor:  value of st_dev for files on filesystem
-(4) root:  root of the mount within the filesystem
-(5) mount point:  mount point relative to the process's root
-(6) mount options:  per mount options
-(7) optional fields:  zero or more fields of the form "tag[:value]"
-(8) separator:  marks the end of the optional fields
-(9) filesystem type:  name of filesystem of the form "type[.subtype]"
-(10) mount source:  filesystem specific information or "none"
-(11) super options:  per super block options
-
-Parsers should ignore all unrecognised optional fields.  Currently the
-possible optional fields are:
-
-shared:X  mount is shared in peer group X
-master:X  mount is slave to peer group X
-propagate_from:X  mount is slave and receives propagation from peer group X (*)
-unbindable  mount is unbindable
-
-(*) X is the closest dominant peer group under the process's root.  If
-X is the immediate master of the mount, or if there's no dominant peer
-group under the same root, then only the "master:X" field is present
-and not the "propagate_from:X" field.
-
-For more information on mount propagation see:
-
-  Documentation/filesystems/sharedsubtree.txt
-
-
-3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
---------------------------------------------------------
-These files provide a method to access a tasks comm value. It also allows for
-a task to set its own or one of its thread siblings comm value. The comm value
-is limited in size compared to the cmdline value, so writing anything longer
-then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
-comm value.
-
-
-3.7	/proc/<pid>/task/<tid>/children - Information about task children
--------------------------------------------------------------------------
-This file provides a fast way to retrieve first level children pids
-of a task pointed by <pid>/<tid> pair. The format is a space separated
-stream of pids.
-
-Note the "first level" here -- if a child has own children they will
-not be listed here, one needs to read /proc/<children-pid>/task/<tid>/children
-to obtain the descendants.
-
-Since this interface is intended to be fast and cheap it doesn't
-guarantee to provide precise results and some children might be
-skipped, especially if they've exited right after we printed their
-pids, so one need to either stop or freeze processes being inspected
-if precise results are needed.
-
-
-3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
----------------------------------------------------------------
-This file provides information associated with an opened file. The regular
-files have at least three fields -- 'pos', 'flags' and mnt_id. The 'pos'
-represents the current offset of the opened file in decimal form [see lseek(2)
-for details], 'flags' denotes the octal O_xxx mask the file has been
-created with [see open(2) for details] and 'mnt_id' represents mount ID of
-the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
-for details].
-
-A typical output is
-
-	pos:	0
-	flags:	0100002
-	mnt_id:	19
-
-All locks associated with a file descriptor are shown in its fdinfo too.
-
-lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF
-
-The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags
-pair provide additional information particular to the objects they represent.
-
-	Eventfd files
-	~~~~~~~~~~~~~
-	pos:	0
-	flags:	04002
-	mnt_id:	9
-	eventfd-count:	5a
-
-	where 'eventfd-count' is hex value of a counter.
-
-	Signalfd files
-	~~~~~~~~~~~~~~
-	pos:	0
-	flags:	04002
-	mnt_id:	9
-	sigmask:	0000000000000200
-
-	where 'sigmask' is hex value of the signal mask associated
-	with a file.
-
-	Epoll files
-	~~~~~~~~~~~
-	pos:	0
-	flags:	02
-	mnt_id:	9
-	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
-
-	where 'tfd' is a target file descriptor number in decimal form,
-	'events' is events mask being watched and the 'data' is data
-	associated with a target [see epoll(7) for more details].
-
-	The 'pos' is current offset of the target file in decimal form
-	[see lseek(2)], 'ino' and 'sdev' are inode and device numbers
-	where target file resides, all in hex format.
-
-	Fsnotify files
-	~~~~~~~~~~~~~~
-	For inotify files the format is the following
-
-	pos:	0
-	flags:	02000000
-	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
-
-	where 'wd' is a watch descriptor in decimal form, ie a target file
-	descriptor number, 'ino' and 'sdev' are inode and device where the
-	target file resides and the 'mask' is the mask of events, all in hex
-	form [see inotify(7) for more details].
-
-	If the kernel was built with exportfs support, the path to the target
-	file is encoded as a file handle.  The file handle is provided by three
-	fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex
-	format.
-
-	If the kernel is built without exportfs support the file handle won't be
-	printed out.
-
-	If there is no inotify mark attached yet the 'inotify' line will be omitted.
-
-	For fanotify files the format is
-
-	pos:	0
-	flags:	02
-	mnt_id:	9
-	fanotify flags:10 event-flags:0
-	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
-	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
-
-	where fanotify 'flags' and 'event-flags' are values used in fanotify_init
-	call, 'mnt_id' is the mount point identifier, 'mflags' is the value of
-	flags associated with mark which are tracked separately from events
-	mask. 'ino', 'sdev' are target inode and device, 'mask' is the events
-	mask and 'ignored_mask' is the mask of events which are to be ignored.
-	All in hex format. Incorporation of 'mflags', 'mask' and 'ignored_mask'
-	does provide information about flags and mask used in fanotify_mark
-	call [see fsnotify manpage for details].
-
-	While the first three lines are mandatory and always printed, the rest is
-	optional and may be omitted if no marks created yet.
-
-	Timerfd files
-	~~~~~~~~~~~~~
-
-	pos:	0
-	flags:	02
-	mnt_id:	9
-	clockid: 0
-	ticks: 0
-	settime flags: 01
-	it_value: (0, 49406829)
-	it_interval: (1, 0)
-
-	where 'clockid' is the clock type and 'ticks' is the number of the timer expirations
-	that have occurred [see timerfd_create(2) for details]. 'settime flags' are
-	flags in octal form been used to setup the timer [see timerfd_settime(2) for
-	details]. 'it_value' is remaining time until the timer exiration.
-	'it_interval' is the interval for the timer. Note the timer might be set up
-	with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
-	still exhibits timer's remaining time.
-
-3.9	/proc/<pid>/map_files - Information about memory mapped files
----------------------------------------------------------------------
-This directory contains symbolic links which represent memory mapped files
-the process is maintaining.  Example output:
-
-     | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
-     | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
-     | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
-     | ...
-     | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
-     | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls
-
-The name of a link represents the virtual memory bounds of a mapping, i.e.
-vm_area_struct::vm_start-vm_area_struct::vm_end.
-
-The main purpose of the map_files is to retrieve a set of memory mapped
-files in a fast way instead of parsing /proc/<pid>/maps or
-/proc/<pid>/smaps, both of which contain many more records.  At the same
-time one can open(2) mappings from the listings of two processes and
-comparing their inode numbers to figure out which anonymous memory areas
-are actually shared.
-
-3.10	/proc/<pid>/timerslack_ns - Task timerslack value
----------------------------------------------------------
-This file provides the value of the task's timerslack value in nanoseconds.
-This value specifies a amount of time that normal timers may be deferred
-in order to coalesce timers and avoid unnecessary wakeups.
-
-This allows a task's interactivity vs power consumption trade off to be
-adjusted.
-
-Writing 0 to the file will set the tasks timerslack to the default value.
-
-Valid values are from 0 - ULLONG_MAX
-
-An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level
-permissions on the task specified to change its timerslack_ns value.
-
-3.11	/proc/<pid>/patch_state - Livepatch patch operation state
------------------------------------------------------------------
-When CONFIG_LIVEPATCH is enabled, this file displays the value of the
-patch state for the task.
-
-A value of '-1' indicates that no patch is in transition.
-
-A value of '0' indicates that a patch is in transition and the task is
-unpatched.  If the patch is being enabled, then the task hasn't been
-patched yet.  If the patch is being disabled, then the task has already
-been unpatched.
-
-A value of '1' indicates that a patch is in transition and the task is
-patched.  If the patch is being enabled, then the task has already been
-patched.  If the patch is being disabled, then the task hasn't been
-unpatched yet.
-
-3.12 /proc/<pid>/arch_status - task architecture specific status
--------------------------------------------------------------------
-When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the
-architecture specific status of the task.
-
-Example
--------
- $ cat /proc/6753/arch_status
- AVX512_elapsed_ms:      8
-
-Description
------------
-
-x86 specific entries:
----------------------
- AVX512_elapsed_ms:
- ------------------
-  If AVX512 is supported on the machine, this entry shows the milliseconds
-  elapsed since the last time AVX512 usage was recorded. The recording
-  happens on a best effort basis when a task is scheduled out. This means
-  that the value depends on two factors:
-
-    1) The time which the task spent on the CPU without being scheduled
-       out. With CPU isolation and a single runnable task this can take
-       several seconds.
-
-    2) The time since the task was scheduled out last. Depending on the
-       reason for being scheduled out (time slice exhausted, syscall ...)
-       this can be arbitrary long time.
-
-  As a consequence the value cannot be considered precise and authoritative
-  information. The application which uses this information has to be aware
-  of the overall scenario on the system in order to determine whether a
-  task is a real AVX512 user or not. Precise information can be obtained
-  with performance counters.
-
-  A special value of '-1' indicates that no AVX512 usage was recorded, thus
-  the task is unlikely an AVX512 user, but depends on the workload and the
-  scheduling scenario, it also could be a false negative mentioned above.
-
-------------------------------------------------------------------------------
-Configuring procfs
-------------------------------------------------------------------------------
-
-4.1	Mount options
----------------------
-
-The following mount options are supported:
-
-	hidepid=	Set /proc/<pid>/ access mode.
-	gid=		Set the group authorized to learn processes information.
-
-hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories
-(default).
-
-hidepid=1 means users may not access any /proc/<pid>/ directories but their
-own.  Sensitive files like cmdline, sched*, status are now protected against
-other users.  This makes it impossible to learn whether any user runs
-specific program (given the program doesn't reveal itself by its behaviour).
-As an additional bonus, as /proc/<pid>/cmdline is unaccessible for other users,
-poorly written programs passing sensitive information via program arguments are
-now protected against local eavesdroppers.
-
-hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other
-users.  It doesn't mean that it hides a fact whether a process with a specific
-pid value exists (it can be learned by other means, e.g. by "kill -0 $PID"),
-but it hides process' uid and gid, which may be learned by stat()'ing
-/proc/<pid>/ otherwise.  It greatly complicates an intruder's task of gathering
-information about running processes, whether some daemon runs with elevated
-privileges, whether other user runs some sensitive program, whether other users
-run any program at all, etc.
-
-gid= defines a group authorized to learn processes information otherwise
-prohibited by hidepid=.  If you use some daemon like identd which needs to learn
-information about processes information, just add identd to this group.
-- 
cgit 


From d5eefa2c5e567751df74d38d5b8cec7ed6e7a08c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:19 +0100
Subject: docs: filesystems: convert qnx6.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/ccd22c1e1426ce4cb30ece9a71c39ebb41844762.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/qnx6.rst  | 196 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/qnx6.txt  | 174 --------------------------------
 3 files changed, 197 insertions(+), 174 deletions(-)
 create mode 100644 Documentation/filesystems/qnx6.rst
 delete mode 100644 Documentation/filesystems/qnx6.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 671906e2fee6..08883a481a76 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -82,5 +82,6 @@ Documentation for filesystem implementations.
    orangefs
    overlayfs
    proc
+   qnx6
    virtiofs
    vfat
diff --git a/Documentation/filesystems/qnx6.rst b/Documentation/filesystems/qnx6.rst
new file mode 100644
index 000000000000..b71308314070
--- /dev/null
+++ b/Documentation/filesystems/qnx6.rst
@@ -0,0 +1,196 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+The QNX6 Filesystem
+===================
+
+The qnx6fs is used by newer QNX operating system versions (e.g. Neutrino).
+It was introduced in QNX 6.4.0 and has been the default since 6.4.1.
+
+Option
+======
+
+mmi_fs		Mount filesystem as used for example by Audi MMI 3G system
+
+Specification
+=============
+
+qnx6fs shares many properties with traditional Unix filesystems. It has the
+concepts of blocks, inodes and directories.
+
+On QNX it is possible to create little endian and big endian qnx6 filesystems.
+This feature makes it possible to create and use a different endianness fs
+for the target (QNX is used on quite a range of embedded systems) platform
+running on a different endianness.
+
+The Linux driver handles endianness transparently. (LE and BE)
+
+Blocks
+------
+
+The space in the device or file is split up into blocks. These are a fixed
+size of 512, 1024, 2048 or 4096, which is decided when the filesystem is
+created.
+
+Block pointers are 32 bits wide, so the maximum space that can be addressed
+is 2^32 * 4096 bytes, or 16TB.
+
+The superblocks
+---------------
+
+The superblock contains all global information about the filesystem.
+Each qnx6fs has two superblocks, each one having a 64bit serial number.
+That serial number is used to identify the "active" superblock.
+In write mode, with each new snapshot (after each synchronous write), the
+serial of the new master superblock is increased (old superblock serial + 1).
+
+So basically the snapshot functionality is realized by an atomic final
+update of the serial number. Before updating that serial, all modifications
+are done by copying all modified blocks during that specific write request
+(or period) and building up a new (stable) filesystem structure under the
+inactive superblock.
+
+Each superblock holds a set of root inodes for the different filesystem
+parts. (Inode, Bitmap and Longfilenames)
+Each of these root nodes holds information like total size of the stored
+data and the addressing levels in that specific tree.
+If the level value is 0, up to 16 direct blocks can be addressed by each
+node.
+
+Level 1 adds an additional indirect addressing level where each indirect
+addressing block holds up to blocksize / 4 pointers (4 bytes each) to data
+blocks. Level 2 adds an additional indirect addressing block level (so, with
+1024-byte blocks, up to 16 * 256 * 256 = 1048576 blocks can be addressed).
+
+Unused block pointers are always set to ~0 - regardless of root node,
+indirect addressing blocks or inodes.
+
+Data leaves are always on the lowest level. So no data is stored on upper
+tree levels.
+
+The first Superblock is located at 0x2000. (0x2000 is the bootblock size)
+The Audi MMI 3G first superblock directly starts at byte 0.
+
+Second superblock position can either be calculated from the superblock
+information (total number of filesystem blocks) or by taking the highest
+device address, zeroing the last 3 bytes and then subtracting 0x1000 from
+that address.
+
+0x1000 is the size reserved for each superblock - regardless of the
+blocksize of the filesystem.
+
+Inodes
+------
+
+Each object in the filesystem is represented by an inode. (index node)
+The inode structure contains pointers to the filesystem blocks which contain
+the data held in the object and all of the metadata about an object except
+its longname. (filenames longer than 27 characters)
+The metadata about an object includes the permissions, owner, group, flags,
+size, number of blocks used, access time, change time and modification time.
+
+Object mode field is POSIX format. (which makes things easier)
+
+There are also pointers to the first 16 blocks, if the object data can be
+addressed with 16 direct blocks.
+
+For more than 16 blocks an indirect addressing in form of another tree is
+used. (scheme is the same as the one used for the superblock root nodes)
+
+The filesize is stored 64bit. Inode counting starts with 1. (while long
+filename inodes start with 0)
+
+Directories
+-----------
+
+A directory is a filesystem object and has an inode just like a file.
+It is a specially formatted file containing records which associate each
+name with an inode number.
+
+'.' inode number points to the directory inode
+
+'..' inode number points to the parent directory inode
+
+Each filename record additionally has a filename length field.
+
+Long filenames (of files or subdirectories) are a special case.
+
+These have a filename length field of 0xff in the corresponding directory
+record, and the longfile inode number is also stored in that record.
+
+With that longfilename inode number, the longfilename tree can be walked
+starting with the superblock longfilename root node pointers.
+
+Special files
+-------------
+
+Symbolic links are also filesystem objects with inodes. They have a specific
+bit in the inode mode field identifying them as symbolic links.
+
+The directory entry file inode pointer points to the target file inode.
+
+Hard links have an inode and a directory entry, but with a specific mode bit
+set, no block pointers, and the directory file record pointing to the target
+file inode.
+
+Character and block special devices do not exist in QNX as those files
+are handled by the QNX kernel/drivers and created in /dev independent of the
+underlying filesystem.
+
+Long filenames
+--------------
+
+Long filenames are stored in a separate addressing tree. The starting point
+is the longfilename root node in the active superblock.
+
+Each data block (tree leaves) holds one long filename. That filename is
+limited to 510 bytes. The first two starting bytes are used as length field
+for the actual filename.
+
+If that structure shall fit for all allowed blocksizes, it is clear why there
+is a limit of 510 bytes for the actual filename stored.
+
+Bitmap
+------
+
+The qnx6fs filesystem allocation bitmap is stored in a tree under bitmap
+root node in the superblock and each bit in the bitmap represents one
+filesystem block.
+
+The first block is block 0, which starts 0x1000 after superblock start.
+So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical
+address at which block 0 is located.
+
+Bits at the end of the last bitmap block are set to 1, if the device is
+smaller than addressing space in the bitmap.
+
+Bitmap system area
+------------------
+
+The bitmap itself is divided into three parts.
+
+First the system area, that is split into two halves.
+
+Then userspace.
+
+The requirement for a static, fixed preallocated system area comes from how
+qnx6fs deals with writes.
+
+Each superblock has its own half of the system area. So superblock #1
+always uses blocks from the lower half while superblock #2 just writes to
+blocks represented by the upper half bitmap system area bits.
+
+Bitmap blocks, Inode blocks and indirect addressing blocks for those two
+tree structures are treated as system blocks.
+
+The rationale behind that is that a write request can work on a new snapshot
+(system area of the inactive - resp. lower serial numbered superblock) while
+at the same time there is still a complete stable filesystem structure in the
+other half of the system area.
+
+When finished with writing (a sync write is completed, the maximum sync leap
+time is reached or a filesystem sync is requested), the serial of the
+previously inactive superblock is atomically increased and the fs switches
+over to that - then declared stable - superblock.
+
+For all data outside the system area, blocks are just copied while writing.
diff --git a/Documentation/filesystems/qnx6.txt b/Documentation/filesystems/qnx6.txt
deleted file mode 100644
index 48ea68f15845..000000000000
--- a/Documentation/filesystems/qnx6.txt
+++ /dev/null
@@ -1,174 +0,0 @@
-The QNX6 Filesystem
-===================
-
-The qnx6fs is used by newer QNX operating system versions. (e.g. Neutrino)
-It got introduced in QNX 6.4.0 and is used default since 6.4.1.
-
-Option
-======
-
-mmi_fs		Mount filesystem as used for example by Audi MMI 3G system
-
-Specification
-=============
-
-qnx6fs shares many properties with traditional Unix filesystems. It has the
-concepts of blocks, inodes and directories.
-On QNX it is possible to create little endian and big endian qnx6 filesystems.
-This feature makes it possible to create and use a different endianness fs
-for the target (QNX is used on quite a range of embedded systems) platform
-running on a different endianness.
-The Linux driver handles endianness transparently. (LE and BE)
-
-Blocks
-------
-
-The space in the device or file is split up into blocks. These are a fixed
-size of 512, 1024, 2048 or 4096, which is decided when the filesystem is
-created.
-Blockpointers are 32bit, so the maximum space that can be addressed is
-2^32 * 4096 bytes or 16TB
-
-The superblocks
----------------
-
-The superblock contains all global information about the filesystem.
-Each qnx6fs got two superblocks, each one having a 64bit serial number.
-That serial number is used to identify the "active" superblock.
-In write mode with reach new snapshot (after each synchronous write), the
-serial of the new master superblock is increased (old superblock serial + 1)
-
-So basically the snapshot functionality is realized by an atomic final
-update of the serial number. Before updating that serial, all modifications
-are done by copying all modified blocks during that specific write request
-(or period) and building up a new (stable) filesystem structure under the
-inactive superblock.
-
-Each superblock holds a set of root inodes for the different filesystem
-parts. (Inode, Bitmap and Longfilenames)
-Each of these root nodes holds information like total size of the stored
-data and the addressing levels in that specific tree.
-If the level value is 0, up to 16 direct blocks can be addressed by each
-node.
-Level 1 adds an additional indirect addressing level where each indirect
-addressing block holds up to blocksize / 4 bytes pointers to data blocks.
-Level 2 adds an additional indirect addressing block level (so, already up
-to 16 * 256 * 256 = 1048576 blocks that can be addressed by such a tree).
-
-Unused block pointers are always set to ~0 - regardless of root node,
-indirect addressing blocks or inodes.
-Data leaves are always on the lowest level. So no data is stored on upper
-tree levels.
-
-The first Superblock is located at 0x2000. (0x2000 is the bootblock size)
-The Audi MMI 3G first superblock directly starts at byte 0.
-Second superblock position can either be calculated from the superblock
-information (total number of filesystem blocks) or by taking the highest
-device address, zeroing the last 3 bytes and then subtracting 0x1000 from
-that address.
-
-0x1000 is the size reserved for each superblock - regardless of the
-blocksize of the filesystem.
-
-Inodes
-------
-
-Each object in the filesystem is represented by an inode. (index node)
-The inode structure contains pointers to the filesystem blocks which contain
-the data held in the object and all of the metadata about an object except
-its longname. (filenames longer than 27 characters)
-The metadata about an object includes the permissions, owner, group, flags,
-size, number of blocks used, access time, change time and modification time.
-
-Object mode field is POSIX format. (which makes things easier)
-
-There are also pointers to the first 16 blocks, if the object data can be
-addressed with 16 direct blocks.
-For more than 16 blocks an indirect addressing in form of another tree is
-used. (scheme is the same as the one used for the superblock root nodes)
-
-The filesize is stored 64bit. Inode counting starts with 1. (while long
-filename inodes start with 0)
-
-Directories
------------
-
-A directory is a filesystem object and has an inode just like a file.
-It is a specially formatted file containing records which associate each
-name with an inode number.
-'.' inode number points to the directory inode
-'..' inode number points to the parent directory inode
-Eeach filename record additionally got a filename length field.
-
-One special case are long filenames or subdirectory names.
-These got set a filename length field of 0xff in the corresponding directory
-record plus the longfile inode number also stored in that record.
-With that longfilename inode number, the longfilename tree can be walked
-starting with the superblock longfilename root node pointers.
-
-Special files
--------------
-
-Symbolic links are also filesystem objects with inodes. They got a specific
-bit in the inode mode field identifying them as symbolic link.
-The directory entry file inode pointer points to the target file inode.
-
-Hard links got an inode, a directory entry, but a specific mode bit set,
-no block pointers and the directory file record pointing to the target file
-inode.
-
-Character and block special devices do not exist in QNX as those files
-are handled by the QNX kernel/drivers and created in /dev independent of the
-underlaying filesystem.
-
-Long filenames
---------------
-
-Long filenames are stored in a separate addressing tree. The staring point
-is the longfilename root node in the active superblock.
-Each data block (tree leaves) holds one long filename. That filename is
-limited to 510 bytes. The first two starting bytes are used as length field
-for the actual filename.
-If that structure shall fit for all allowed blocksizes, it is clear why there
-is a limit of 510 bytes for the actual filename stored.
-
-Bitmap
-------
-
-The qnx6fs filesystem allocation bitmap is stored in a tree under bitmap
-root node in the superblock and each bit in the bitmap represents one
-filesystem block.
-The first block is block 0, which starts 0x1000 after superblock start.
-So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical
-address at which block 0 is located.
-
-Bits at the end of the last bitmap block are set to 1, if the device is
-smaller than addressing space in the bitmap.
-
-Bitmap system area
-------------------
-
-The bitmap itself is divided into three parts.
-First the system area, that is split into two halves.
-Then userspace.
-
-The requirement for a static, fixed preallocated system area comes from how
-qnx6fs deals with writes.
-Each superblock got it's own half of the system area. So superblock #1
-always uses blocks from the lower half while superblock #2 just writes to
-blocks represented by the upper half bitmap system area bits.
-
-Bitmap blocks, Inode blocks and indirect addressing blocks for those two
-tree structures are treated as system blocks.
-
-The rational behind that is that a write request can work on a new snapshot
-(system area of the inactive - resp. lower serial numbered superblock) while
-at the same time there is still a complete stable filesystem structer in the
-other half of the system area.
-
-When finished with writing (a sync write is completed, the maximum sync leap
-time or a filesystem sync is requested), serial of the previously inactive
-superblock atomically is increased and the fs switches over to that - then
-stable declared - superblock.
-
-For all data outside the system area, blocks are just copied while writing.
-- 
cgit 


From 8979fc9a282441d086ead589528c711d9df3d94a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:20 +0100
Subject: docs: filesystems: convert ramfs-rootfs-initramfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use notes markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/89cbcc99a6371f3bff3ea1668fe497e8a15c226b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst                |   1 +
 .../filesystems/ramfs-rootfs-initramfs.rst         | 369 +++++++++++++++++++++
 .../filesystems/ramfs-rootfs-initramfs.txt         | 359 --------------------
 3 files changed, 370 insertions(+), 359 deletions(-)
 create mode 100644 Documentation/filesystems/ramfs-rootfs-initramfs.rst
 delete mode 100644 Documentation/filesystems/ramfs-rootfs-initramfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 08883a481a76..b8689d082911 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -83,5 +83,6 @@ Documentation for filesystem implementations.
    overlayfs
    proc
    qnx6
+   ramfs-rootfs-initramfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.rst b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
new file mode 100644
index 000000000000..6c576e241d86
--- /dev/null
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
@@ -0,0 +1,369 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Ramfs, rootfs and initramfs
+===========================
+
+October 17, 2005
+
+Rob Landley <rob@landley.net>
+=============================
+
+What is ramfs?
+--------------
+
+Ramfs is a very simple filesystem that exports Linux's disk caching
+mechanisms (the page cache and dentry cache) as a dynamically resizable
+RAM-based filesystem.
+
+Normally all files are cached in memory by Linux.  Pages of data read from
+backing store (usually the block device the filesystem is mounted on) are kept
+around in case they're needed again, but marked as clean (freeable) in case the
+Virtual Memory system needs the memory for something else.  Similarly, data
+written to files is marked clean as soon as it has been written to backing
+store, but kept around for caching purposes until the VM reallocates the
+memory.  A similar mechanism (the dentry cache) greatly speeds up access to
+directories.
+
+With ramfs, there is no backing store.  Files written into ramfs allocate
+dentries and page cache as usual, but there's nowhere to write them to.
+This means the pages are never marked clean, so they can't be freed by the
+VM when it's looking to recycle memory.
+
+The amount of code required to implement ramfs is tiny, because all the
+work is done by the existing Linux caching infrastructure.  Basically,
+you're mounting the disk cache as a filesystem.  Because of this, ramfs is not
+an optional component removable via menuconfig, since there would be negligible
+space savings.
+
+ramfs and ramdisk:
+------------------
+
+The older "ram disk" mechanism created a synthetic block device out of
+an area of RAM and used it as backing store for a filesystem.  This block
+device was of fixed size, so the filesystem mounted on it was of fixed
+size.  Using a ram disk also required unnecessarily copying memory from the
+fake block device into the page cache (and copying changes back out), as well
+as creating and destroying dentries.  Plus it needed a filesystem driver
+(such as ext2) to format and interpret this data.
+
+Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
+unnecessary work for the CPU, and pollutes the CPU caches.  (There are tricks
+to avoid this copying by playing with the page tables, but they're unpleasantly
+complicated and turn out to be about as expensive as the copying anyway.)
+More to the point, all the work ramfs is doing has to happen _anyway_,
+since all file access goes through the page and dentry caches.  The RAM
+disk is simply unnecessary; ramfs is internally much simpler.
+
+Another reason ramdisks are semi-obsolete is that the introduction of
+loopback devices offered a more flexible and convenient way to create
+synthetic block devices, now from files instead of from chunks of memory.
+See losetup(8) for details.
+
+ramfs and tmpfs:
+----------------
+
+One downside of ramfs is you can keep writing data into it until you fill
+up all memory, and the VM can't free it because the VM thinks that files
+should get written to backing store (rather than swap space), but ramfs hasn't
+got any backing store.  Because of this, only root (or a trusted user) should
+be allowed write access to a ramfs mount.
+
+A ramfs derivative called tmpfs was created to add size limits, and the ability
+to write the data to swap space.  Normal users can be allowed write access to
+tmpfs mounts.  See Documentation/filesystems/tmpfs.txt for more information.
+
+What is rootfs?
+---------------
+
+Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
+always present in 2.6 systems.  You can't unmount rootfs for approximately the
+same reason you can't kill the init process; rather than having special code
+to check for and handle an empty list, it's smaller and simpler for the kernel
+to just make sure certain lists can't become empty.
+
+Most systems just mount another filesystem over rootfs and ignore it.  The
+amount of space an empty instance of ramfs takes up is tiny.
+
+If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by
+default.  To force ramfs, add "rootfstype=ramfs" to the kernel command
+line.
+
+What is initramfs?
+------------------
+
+All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
+extracted into rootfs when the kernel boots up.  After extracting, the kernel
+checks to see if rootfs contains a file "init", and if so it executes it as PID
+1.  If found, this init process is responsible for bringing the system the
+rest of the way up, including locating and mounting the real root device (if
+any).  If rootfs does not contain an init program after the embedded cpio
+archive is extracted into it, the kernel will fall through to the older code
+to locate and mount a root partition, then exec some variant of /sbin/init
+out of that.
+
+All this differs from the old initrd in several ways:
+
+  - The old initrd was always a separate file, while the initramfs archive is
+    linked into the linux kernel image.  (The directory ``linux-*/usr`` is
+    devoted to generating this archive during the build.)
+
+  - The old initrd file was a gzipped filesystem image (in some file format,
+    such as ext2, that needed a driver built into the kernel), while the new
+    initramfs archive is a gzipped cpio archive (like tar only simpler,
+    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).
+    The kernel's cpio extraction code is not only extremely small, it's also
+    __init text and data that can be discarded during the boot process.
+
+  - The program run by the old initrd (which was called /initrd, not /init) did
+    some setup and then returned to the kernel, while the init program from
+    initramfs is not expected to return to the kernel.  (If /init needs to hand
+    off control it can overmount / with a new root device and exec another init
+    program.  See the switch_root utility, below.)
+
+  - When switching to another root device, initrd would pivot_root and then
+    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
+    rootfs, nor unmount it.  Instead delete everything out of rootfs to
+    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
+    with the new root (cd /newmount; mount --move . /; chroot .), attach
+    stdin/stdout/stderr to the new /dev/console, and exec the new init.
+
+    Since this is a remarkably persnickety process (and involves deleting
+    commands before you can run them), the klibc package introduced a helper
+    program (utils/run_init.c) to do all this for you.  Most other packages
+    (such as busybox) have named this command "switch_root".
+
+Populating initramfs:
+---------------------
+
+The 2.6 kernel build process always creates a gzipped cpio format initramfs
+archive and links it into the resulting kernel binary.  By default, this
+archive is empty (consuming 134 bytes on x86).
+
+The config option CONFIG_INITRAMFS_SOURCE (in General Setup in menuconfig,
+and living in usr/Kconfig) can be used to specify a source for the
+initramfs archive, which will automatically be incorporated into the
+resulting binary.  This option can point to an existing gzipped cpio
+archive, a directory containing files to be archived, or a text file
+specification such as the following example::
+
+  dir /dev 755 0 0
+  nod /dev/console 644 0 0 c 5 1
+  nod /dev/loop0 644 0 0 b 7 0
+  dir /bin 755 1000 1000
+  slink /bin/sh busybox 777 0 0
+  file /bin/busybox initramfs/busybox 755 0 0
+  dir /proc 755 0 0
+  dir /sys 755 0 0
+  dir /mnt 755 0 0
+  file /init initramfs/init.sh 755 0 0
+
+Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
+documenting the above file format.
+
+One advantage of the configuration file is that root access is not required to
+set permissions or create device nodes in the new archive.  (Note that those
+two example "file" entries expect to find files named "init.sh" and "busybox" in
+a directory called "initramfs", under the linux-2.6.* directory.  See
+Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.)
+
+The kernel does not depend on external cpio tools.  If you specify a
+directory instead of a configuration file, the kernel's build infrastructure
+creates a configuration file from that directory (usr/Makefile calls
+usr/gen_initramfs_list.sh), and proceeds to package up that directory
+using the config file (by feeding it to usr/gen_init_cpio, which is created
+from usr/gen_init_cpio.c).  The kernel's build-time cpio creation code is
+entirely self-contained, and the kernel's boot-time extractor is also
+(obviously) self-contained.
+
+The one thing you might need external cpio utilities installed for is creating
+or extracting your own preprepared cpio files to feed to the kernel build
+(instead of a config file or directory).
+
+The following command line can extract a cpio image (created either by the
+script below or by the kernel build) back into its component files::
+
+  cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
+
+The following shell script can create a prebuilt cpio archive you can
+use in place of the above config file::
+
+  #!/bin/sh
+
+  # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
+  # Licensed under GPL version 2
+
+  if [ $# -ne 2 ]
+  then
+    echo "usage: mkinitramfs directory imagename.cpio.gz"
+    exit 1
+  fi
+
+  if [ -d "$1" ]
+  then
+    echo "creating $2 from $1"
+    (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
+  else
+    echo "First argument must be a directory"
+    exit 1
+  fi
+
+.. Note::
+
+   The cpio man page contains some bad advice that will break your initramfs
+   archive if you follow it.  It says "A typical way to generate the list
+   of filenames is with the find command; you should give find the -depth
+   option to minimize problems with permissions on directories that are
+   unwritable or not searchable."  Don't do this when creating
+   initramfs.cpio.gz images; it won't work.  The Linux kernel cpio extractor
+   won't create files in a directory that doesn't exist, so the directory
+   entries must go before the files that go in those directories.
+   The above script gets them in the right order.
+
+External initramfs images:
+--------------------------
+
+If the kernel has initrd support enabled, an external cpio.gz archive can also
+be passed into a 2.6 kernel in place of an initrd.  In this case, the kernel
+will autodetect the type (initramfs, not initrd) and extract the external cpio
+archive into rootfs before trying to run /init.
+
+This has the memory efficiency advantages of initramfs (no ramdisk block
+device) but the separate packaging of initrd (which is nice if you have
+non-GPL code you'd like to run from initramfs, without conflating it with
+the GPL licensed Linux kernel binary).
+
+It can also be used to supplement the kernel's built-in initramfs image.  The
+files in the external archive will overwrite any conflicting files in
+the built-in initramfs archive.  Some distributors also prefer to customize
+a single kernel image with task-specific initramfs images, without recompiling.
+
+Contents of initramfs:
+----------------------
+
+An initramfs archive is a complete self-contained root filesystem for Linux.
+If you don't already understand what shared libraries, devices, and paths
+you need to get a minimal root filesystem up and running, here are some
+references:
+
+- http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
+- http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
+- http://www.linuxfromscratch.org/lfs/view/stable/
+
+The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
+designed to be a tiny C library to statically link early userspace
+code against, along with some related utilities.  It is BSD licensed.
+
+I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
+myself.  These are LGPL and GPL, respectively.  (A self-contained initramfs
+package is planned for the busybox 1.3 release.)
+
+In theory you could use glibc, but that's not well suited for small embedded
+uses like this.  (A "hello world" program statically linked against glibc is
+over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
+name lookups, even when otherwise statically linked.)
+
+A good first step is to get initramfs to run a statically linked "hello world"
+program as init, and test it under an emulator like qemu (www.qemu.org) or
+User Mode Linux, like so::
+
+  cat > hello.c << EOF
+  #include <stdio.h>
+  #include <unistd.h>
+
+  int main(int argc, char *argv[])
+  {
+    printf("Hello world!\n");
+    sleep(999999999);
+  }
+  EOF
+  gcc -static hello.c -o init
+  echo init | cpio -o -H newc | gzip > test.cpio.gz
+  # Testing external initramfs using the initrd loading mechanism.
+  qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
+
+When debugging a normal root filesystem, it's nice to be able to boot with
+"init=/bin/sh".  The initramfs equivalent is "rdinit=/bin/sh", and it's
+just as useful.
+
+Why cpio rather than tar?
+-------------------------
+
+This decision was made back in December, 2001.  The discussion started here:
+
+  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
+
+And spawned a second thread (specifically on tar vs cpio), starting here:
+
+  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
+
+The quick and dirty summary version (which is no substitute for reading
+the above threads) is:
+
+1) cpio is a standard.  It's decades old (from the AT&T days), and already
+   widely used on Linux (inside RPM, Red Hat's device driver disks).  Here's
+   a Linux Journal article about it from 1996:
+
+      http://www.linuxjournal.com/article/1213
+
+   It's not as popular as tar because the traditional cpio command line tools
+   require _truly_hideous_ command line arguments.  But that says nothing
+   either way about the archive format, and there are alternative tools,
+   such as:
+
+     http://freecode.com/projects/afio
+
+2) The cpio archive format chosen by the kernel is simpler and cleaner (and
+   thus easier to create and parse) than any of the (literally dozens of)
+   various tar archive formats.  The complete initramfs archive format is
+   explained in buffer-format.rst, created in usr/gen_init_cpio.c, and
+   extracted in init/initramfs.c.  All three together come to less than 26k
+   total of human-readable text.
+
+3) The GNU project standardizing on tar is approximately as relevant as
+   Windows standardizing on zip.  Linux is not part of either, and is free
+   to make its own technical decisions.
+
+4) Since this is a kernel internal format, it could easily have been
+   something brand new.  The kernel provides its own tools to create and
+   extract this format anyway.  Using an existing standard was preferable,
+   but not essential.
+
+5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
+   supported on the kernel side"):
+
+      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
+
+   explained his reasoning:
+
+     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
+     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
+
+   and, most importantly, designed and implemented the initramfs code.
+
+Future directions:
+------------------
+
+Today (2.6.16), initramfs is always compiled in, but not always used.  The
+kernel falls back to legacy boot code that is reached only if initramfs does
+not contain an /init program.  The fallback is legacy code, there to ensure a
+smooth transition and to allow early boot functionality to gradually move to
+"early userspace" (i.e. initramfs).
+
+The move to early userspace is necessary because finding and mounting the real
+root device is complex.  Root partitions can span multiple devices (raid or
+separate journal).  They can be out on the network (requiring dhcp, setting a
+specific MAC address, logging into a server, etc).  They can live on removable
+media, with dynamically allocated major/minor numbers and persistent naming
+issues requiring a full udev implementation to sort out.  They can be
+compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
+and so on.
+
+This kind of complexity (which inevitably includes policy) is rightly handled
+in userspace.  Both klibc and busybox/uClibc are working on simple initramfs
+packages to drop into a kernel build.
+
+The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
+The kernel's current early boot code (partition detection, etc) will probably
+be migrated into a default initramfs, automatically created and used by the
+kernel build.
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
deleted file mode 100644
index 97d42ccaa92d..000000000000
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ /dev/null
@@ -1,359 +0,0 @@
-ramfs, rootfs and initramfs
-October 17, 2005
-Rob Landley <rob@landley.net>
-=============================
-
-What is ramfs?
---------------
-
-Ramfs is a very simple filesystem that exports Linux's disk caching
-mechanisms (the page cache and dentry cache) as a dynamically resizable
-RAM-based filesystem.
-
-Normally all files are cached in memory by Linux.  Pages of data read from
-backing store (usually the block device the filesystem is mounted on) are kept
-around in case it's needed again, but marked as clean (freeable) in case the
-Virtual Memory system needs the memory for something else.  Similarly, data
-written to files is marked clean as soon as it has been written to backing
-store, but kept around for caching purposes until the VM reallocates the
-memory.  A similar mechanism (the dentry cache) greatly speeds up access to
-directories.
-
-With ramfs, there is no backing store.  Files written into ramfs allocate
-dentries and page cache as usual, but there's nowhere to write them to.
-This means the pages are never marked clean, so they can't be freed by the
-VM when it's looking to recycle memory.
-
-The amount of code required to implement ramfs is tiny, because all the
-work is done by the existing Linux caching infrastructure.  Basically,
-you're mounting the disk cache as a filesystem.  Because of this, ramfs is not
-an optional component removable via menuconfig, since there would be negligible
-space savings.
-
-ramfs and ramdisk:
-------------------
-
-The older "ram disk" mechanism created a synthetic block device out of
-an area of RAM and used it as backing store for a filesystem.  This block
-device was of fixed size, so the filesystem mounted on it was of fixed
-size.  Using a ram disk also required unnecessarily copying memory from the
-fake block device into the page cache (and copying changes back out), as well
-as creating and destroying dentries.  Plus it needed a filesystem driver
-(such as ext2) to format and interpret this data.
-
-Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
-unnecessary work for the CPU, and pollutes the CPU caches.  (There are tricks
-to avoid this copying by playing with the page tables, but they're unpleasantly
-complicated and turn out to be about as expensive as the copying anyway.)
-More to the point, all the work ramfs is doing has to happen _anyway_,
-since all file access goes through the page and dentry caches.  The RAM
-disk is simply unnecessary; ramfs is internally much simpler.
-
-Another reason ramdisks are semi-obsolete is that the introduction of
-loopback devices offered a more flexible and convenient way to create
-synthetic block devices, now from files instead of from chunks of memory.
-See losetup (8) for details.
-
-ramfs and tmpfs:
-----------------
-
-One downside of ramfs is you can keep writing data into it until you fill
-up all memory, and the VM can't free it because the VM thinks that files
-should get written to backing store (rather than swap space), but ramfs hasn't
-got any backing store.  Because of this, only root (or a trusted user) should
-be allowed write access to a ramfs mount.
-
-A ramfs derivative called tmpfs was created to add size limits, and the ability
-to write the data to swap space.  Normal users can be allowed write access to
-tmpfs mounts.  See Documentation/filesystems/tmpfs.txt for more information.
-
-What is rootfs?
----------------
-
-Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
-always present in 2.6 systems.  You can't unmount rootfs for approximately the
-same reason you can't kill the init process; rather than having special code
-to check for and handle an empty list, it's smaller and simpler for the kernel
-to just make sure certain lists can't become empty.
-
-Most systems just mount another filesystem over rootfs and ignore it.  The
-amount of space an empty instance of ramfs takes up is tiny.
-
-If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by
-default.  To force ramfs, add "rootfstype=ramfs" to the kernel command
-line.
-
-What is initramfs?
-------------------
-
-All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
-extracted into rootfs when the kernel boots up.  After extracting, the kernel
-checks to see if rootfs contains a file "init", and if so it executes it as PID
-1.  If found, this init process is responsible for bringing the system the
-rest of the way up, including locating and mounting the real root device (if
-any).  If rootfs does not contain an init program after the embedded cpio
-archive is extracted into it, the kernel will fall through to the older code
-to locate and mount a root partition, then exec some variant of /sbin/init
-out of that.
-
-All this differs from the old initrd in several ways:
-
-  - The old initrd was always a separate file, while the initramfs archive is
-    linked into the linux kernel image.  (The directory linux-*/usr is devoted
-    to generating this archive during the build.)
-
-  - The old initrd file was a gzipped filesystem image (in some file format,
-    such as ext2, that needed a driver built into the kernel), while the new
-    initramfs archive is a gzipped cpio archive (like tar only simpler,
-    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).  The
-    kernel's cpio extraction code is not only extremely small, it's also
-    __init text and data that can be discarded during the boot process.
-
-  - The program run by the old initrd (which was called /initrd, not /init) did
-    some setup and then returned to the kernel, while the init program from
-    initramfs is not expected to return to the kernel.  (If /init needs to hand
-    off control it can overmount / with a new root device and exec another init
-    program.  See the switch_root utility, below.)
-
-  - When switching another root device, initrd would pivot_root and then
-    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
-    rootfs, nor unmount it.  Instead delete everything out of rootfs to
-    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
-    with the new root (cd /newmount; mount --move . /; chroot .), attach
-    stdin/stdout/stderr to the new /dev/console, and exec the new init.
-
-    Since this is a remarkably persnickety process (and involves deleting
-    commands before you can run them), the klibc package introduced a helper
-    program (utils/run_init.c) to do all this for you.  Most other packages
-    (such as busybox) have named this command "switch_root".
-
-Populating initramfs:
----------------------
-
-The 2.6 kernel build process always creates a gzipped cpio format initramfs
-archive and links it into the resulting kernel binary.  By default, this
-archive is empty (consuming 134 bytes on x86).
-
-The config option CONFIG_INITRAMFS_SOURCE (in General Setup in menuconfig,
-and living in usr/Kconfig) can be used to specify a source for the
-initramfs archive, which will automatically be incorporated into the
-resulting binary.  This option can point to an existing gzipped cpio
-archive, a directory containing files to be archived, or a text file
-specification such as the following example:
-
-  dir /dev 755 0 0
-  nod /dev/console 644 0 0 c 5 1
-  nod /dev/loop0 644 0 0 b 7 0
-  dir /bin 755 1000 1000
-  slink /bin/sh busybox 777 0 0
-  file /bin/busybox initramfs/busybox 755 0 0
-  dir /proc 755 0 0
-  dir /sys 755 0 0
-  dir /mnt 755 0 0
-  file /init initramfs/init.sh 755 0 0
-
-Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
-documenting the above file format.
-
-One advantage of the configuration file is that root access is not required to
-set permissions or create device nodes in the new archive.  (Note that those
-two example "file" entries expect to find files named "init.sh" and "busybox" in
-a directory called "initramfs", under the linux-2.6.* directory.  See
-Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.)
-
-The kernel does not depend on external cpio tools.  If you specify a
-directory instead of a configuration file, the kernel's build infrastructure
-creates a configuration file from that directory (usr/Makefile calls
-usr/gen_initramfs_list.sh), and proceeds to package up that directory
-using the config file (by feeding it to usr/gen_init_cpio, which is created
-from usr/gen_init_cpio.c).  The kernel's build-time cpio creation code is
-entirely self-contained, and the kernel's boot-time extractor is also
-(obviously) self-contained.
-
-The one thing you might need external cpio utilities installed for is creating
-or extracting your own preprepared cpio files to feed to the kernel build
-(instead of a config file or directory).
-
-The following command line can extract a cpio image (either by the above script
-or by the kernel build) back into its component files:
-
-  cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
-
-The following shell script can create a prebuilt cpio archive you can
-use in place of the above config file:
-
-  #!/bin/sh
-
-  # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
-  # Licensed under GPL version 2
-
-  if [ $# -ne 2 ]
-  then
-    echo "usage: mkinitramfs directory imagename.cpio.gz"
-    exit 1
-  fi
-
-  if [ -d "$1" ]
-  then
-    echo "creating $2 from $1"
-    (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
-  else
-    echo "First argument must be a directory"
-    exit 1
-  fi
-
-Note: The cpio man page contains some bad advice that will break your initramfs
-archive if you follow it.  It says "A typical way to generate the list
-of filenames is with the find command; you should give find the -depth option
-to minimize problems with permissions on directories that are unwritable or not
-searchable."  Don't do this when creating initramfs.cpio.gz images, it won't
-work.  The Linux kernel cpio extractor won't create files in a directory that
-doesn't exist, so the directory entries must go before the files that go in
-those directories.  The above script gets them in the right order.
-
-External initramfs images:
---------------------------
-
-If the kernel has initrd support enabled, an external cpio.gz archive can also
-be passed into a 2.6 kernel in place of an initrd.  In this case, the kernel
-will autodetect the type (initramfs, not initrd) and extract the external cpio
-archive into rootfs before trying to run /init.
-
-This has the memory efficiency advantages of initramfs (no ramdisk block
-device) but the separate packaging of initrd (which is nice if you have
-non-GPL code you'd like to run from initramfs, without conflating it with
-the GPL licensed Linux kernel binary).
-
-It can also be used to supplement the kernel's built-in initramfs image.  The
-files in the external archive will overwrite any conflicting files in
-the built-in initramfs archive.  Some distributors also prefer to customize
-a single kernel image with task-specific initramfs images, without recompiling.
-
-Contents of initramfs:
-----------------------
-
-An initramfs archive is a complete self-contained root filesystem for Linux.
-If you don't already understand what shared libraries, devices, and paths
-you need to get a minimal root filesystem up and running, here are some
-references:
-http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
-http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
-http://www.linuxfromscratch.org/lfs/view/stable/
-
-The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
-designed to be a tiny C library to statically link early userspace
-code against, along with some related utilities.  It is BSD licensed.
-
-I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
-myself.  These are LGPL and GPL, respectively.  (A self-contained initramfs
-package is planned for the busybox 1.3 release.)
-
-In theory you could use glibc, but that's not well suited for small embedded
-uses like this.  (A "hello world" program statically linked against glibc is
-over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
-name lookups, even when otherwise statically linked.)
-
-A good first step is to get initramfs to run a statically linked "hello world"
-program as init, and test it under an emulator like qemu (www.qemu.org) or
-User Mode Linux, like so:
-
-  cat > hello.c << EOF
-  #include <stdio.h>
-  #include <unistd.h>
-
-  int main(int argc, char *argv[])
-  {
-    printf("Hello world!\n");
-    sleep(999999999);
-  }
-  EOF
-  gcc -static hello.c -o init
-  echo init | cpio -o -H newc | gzip > test.cpio.gz
-  # Testing external initramfs using the initrd loading mechanism.
-  qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
-
-When debugging a normal root filesystem, it's nice to be able to boot with
-"init=/bin/sh".  The initramfs equivalent is "rdinit=/bin/sh", and it's
-just as useful.
-
-Why cpio rather than tar?
--------------------------
-
-This decision was made back in December, 2001.  The discussion started here:
-
-  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
-
-And spawned a second thread (specifically on tar vs cpio), starting here:
-
-  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
-
-The quick and dirty summary version (which is no substitute for reading
-the above threads) is:
-
-1) cpio is a standard.  It's decades old (from the AT&T days), and already
-   widely used on Linux (inside RPM, Red Hat's device driver disks).  Here's
-   a Linux Journal article about it from 1996:
-
-      http://www.linuxjournal.com/article/1213
-
-   It's not as popular as tar because the traditional cpio command line tools
-   require _truly_hideous_ command line arguments.  But that says nothing
-   either way about the archive format, and there are alternative tools,
-   such as:
-
-     http://freecode.com/projects/afio
-
-2) The cpio archive format chosen by the kernel is simpler and cleaner (and
-   thus easier to create and parse) than any of the (literally dozens of)
-   various tar archive formats.  The complete initramfs archive format is
-   explained in buffer-format.txt, created in usr/gen_init_cpio.c, and
-   extracted in init/initramfs.c.  All three together come to less than 26k
-   total of human-readable text.
-
-3) The GNU project standardizing on tar is approximately as relevant as
-   Windows standardizing on zip.  Linux is not part of either, and is free
-   to make its own technical decisions.
-
-4) Since this is a kernel internal format, it could easily have been
-   something brand new.  The kernel provides its own tools to create and
-   extract this format anyway.  Using an existing standard was preferable,
-   but not essential.
-
-5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
-   supported on the kernel side"):
-
-      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
-
-   explained his reasoning:
-
-      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
-      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
-
-   and, most importantly, designed and implemented the initramfs code.
-
-Future directions:
-------------------
-
-Today (2.6.16), initramfs is always compiled in, but not always used.  The
-kernel falls back to legacy boot code that is reached only if initramfs does
-not contain an /init program.  The fallback is legacy code, there to ensure a
-smooth transition and allowing early boot functionality to gradually move to
-"early userspace" (I.E. initramfs).
-
-The move to early userspace is necessary because finding and mounting the real
-root device is complex.  Root partitions can span multiple devices (raid or
-separate journal).  They can be out on the network (requiring dhcp, setting a
-specific MAC address, logging into a server, etc).  They can live on removable
-media, with dynamically allocated major/minor numbers and persistent naming
-issues requiring a full udev implementation to sort out.  They can be
-compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
-and so on.
-
-This kind of complexity (which inevitably includes policy) is rightly handled
-in userspace.  Both klibc and busybox/uClibc are working on simple initramfs
-packages to drop into a kernel build.
-
-The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
-The kernel's current early boot code (partition detection, etc) will probably
-be migrated into a default initramfs, automatically created and used by the
-kernel build.
-- 
cgit 


From 56e6d5c0eb7b862b4c984107e665821722413008 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:21 +0100
Subject: docs: filesystems: convert relay.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use notes markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/f48bb0fdf64d197f28c6f469adb61a7a091adb75.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/relay.rst | 501 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/relay.txt | 494 -----------------------------------
 3 files changed, 502 insertions(+), 494 deletions(-)
 create mode 100644 Documentation/filesystems/relay.rst
 delete mode 100644 Documentation/filesystems/relay.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index b8689d082911..0aade8146d4d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -84,5 +84,6 @@ Documentation for filesystem implementations.
    proc
    qnx6
    ramfs-rootfs-initramfs
+   relay
    virtiofs
    vfat
diff --git a/Documentation/filesystems/relay.rst b/Documentation/filesystems/relay.rst
new file mode 100644
index 000000000000..04ad083cfe62
--- /dev/null
+++ b/Documentation/filesystems/relay.rst
@@ -0,0 +1,501 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+relay interface (formerly relayfs)
+==================================
+
+The relay interface provides a means for kernel applications to
+efficiently log and transfer large quantities of data from the kernel
+to userspace via user-defined 'relay channels'.
+
+A 'relay channel' is a kernel->user data relay mechanism implemented
+as a set of per-cpu kernel buffers ('channel buffers'), each
+represented as a regular file ('relay file') in user space.  Kernel
+clients write into the channel buffers using efficient write
+functions; these automatically log into the current cpu's channel
+buffer.  User space applications mmap() or read() from the relay files
+and retrieve the data as it becomes available.  The relay files
+themselves are files created in a host filesystem, e.g. debugfs, and
+are associated with the channel buffers using the API described below.
+
+The format of the data logged into the channel buffers is completely
+up to the kernel client; the relay interface does however provide
+hooks which allow kernel clients to impose some structure on the
+buffer data.  The relay interface doesn't implement any form of data
+filtering - this also is left to the kernel client.  The purpose is to
+keep things as simple as possible.
+
+This document provides an overview of the relay interface API.  The
+details of the function parameters are documented along with the
+functions in the relay interface code - please see that for details.
+
+Semantics
+=========
+
+Each relay channel has one buffer per CPU, and each buffer has one or
+more sub-buffers.  Messages are written to the first sub-buffer until it
+is too full to contain a new message, in which case the message is
+written to the next sub-buffer (if available).  Messages are never split
+across sub-buffers.
+At this point, userspace can be notified so it empties the first
+sub-buffer, while the kernel continues writing to the next.
+
+When a sub-buffer becomes full, the kernel knows how many bytes of it
+are padding, i.e. unused space left over because a complete message
+couldn't fit into the sub-buffer.  Userspace can use this knowledge to
+copy only valid data.
+
+After copying it, userspace can notify the kernel that a sub-buffer
+has been consumed.
+
+A relay channel can operate in a mode where it will overwrite data not
+yet collected by userspace, and not wait for it to be consumed.
+
+The relay channel itself does not provide for communication of such
+data between userspace and kernel, allowing the kernel side to remain
+simple and not impose a single interface on userspace.  It does
+provide a set of examples and a separate helper though, described
+below.
+
+The read() interface both removes padding and internally consumes the
+read sub-buffers; thus in cases where read(2) is being used to drain
+the channel buffers, special-purpose communication between kernel and
+user isn't necessary for basic operation.
+
+One of the major goals of the relay interface is to provide a low
+overhead mechanism for conveying kernel data to userspace.  While the
+read() interface is easy to use, it's not as efficient as the mmap()
+approach; the example code attempts to make the tradeoff between the
+two approaches as small as possible.
+
+klog and relay-apps example code
+================================
+
+The relay interface itself is ready to use, but to make things easier,
+a couple of simple utility functions and a set of examples are provided.
+
+The relay-apps example tarball, available on the relay sourceforge
+site, contains a set of self-contained examples, each consisting of a
+pair of .c files containing boilerplate code for each of the user and
+kernel sides of a relay application.  When combined these two sets of
+boilerplate code provide glue to easily stream data to disk, without
+having to bother with mundane housekeeping chores.
+
+The 'klog debugging functions' patch (klog.patch in the relay-apps
+tarball) provides a couple of high-level logging functions to the
+kernel which allow writing formatted text or raw data to a channel,
+regardless of whether a channel to write into exists or not, or even
+whether the relay interface is compiled into the kernel or not.  These
+functions allow you to put unconditional 'trace' statements anywhere
+in the kernel or kernel modules; only when there is a 'klog handler'
+registered will data actually be logged (see the klog and kleak
+examples for details).
+
+It is of course possible to use the relay interface from scratch,
+i.e. without using any of the relay-apps example code or klog, but
+you'll have to implement communication between userspace and kernel,
+allowing both to convey the state of buffers (full, empty, amount of
+padding).  The read() interface both removes padding and internally
+consumes the read sub-buffers; thus in cases where read(2) is being
+used to drain the channel buffers, special-purpose communication
+between kernel and user isn't necessary for basic operation.  Things
+such as buffer-full conditions would still need to be communicated via
+some channel though.
+
+klog and the relay-apps examples can be found in the relay-apps
+tarball on http://relayfs.sourceforge.net
+
+The relay interface user space API
+==================================
+
+The relay interface implements basic file operations for user space
+access to relay channel buffer data.  Here are the file operations
+that are available and some comments regarding their behavior:
+
+=========== ============================================================
+open()	    enables user to open an _existing_ channel buffer.
+
+mmap()      results in channel buffer being mapped into the caller's
+	    memory space. Note that you can't do a partial mmap - you
+	    must map the entire file, which is NRBUF * SUBBUFSIZE.
+
+read()      read the contents of a channel buffer.  The bytes read are
+	    'consumed' by the reader, i.e. they won't be available
+	    again to subsequent reads.  If the channel is being used
+	    in no-overwrite mode (the default), it can be read at any
+	    time even if there's an active kernel writer.  If the
+	    channel is being used in overwrite mode and there are
+	    active channel writers, results may be unpredictable -
+	    users should make sure that all logging to the channel has
+	    ended before using read() with overwrite mode.  Sub-buffer
+	    padding is automatically removed and will not be seen by
+	    the reader.
+
+sendfile()  transfer data from a channel buffer to an output file
+	    descriptor. Sub-buffer padding is automatically removed
+	    and will not be seen by the reader.
+
+poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
+	    notified when sub-buffer boundaries are crossed.
+
+close()     decrements the channel buffer's refcount.  When the refcount
+	    reaches 0, i.e. when no process or kernel client has the
+	    buffer open, the channel buffer is freed.
+=========== ============================================================
+
+In order for a user application to make use of relay files, the
+host filesystem must be mounted.  For example::
+
+	mount -t debugfs debugfs /sys/kernel/debug
+
+.. Note::
+
+	the host filesystem doesn't need to be mounted for kernel
+	clients to create or use channels - it only needs to be
+	mounted when user space applications need access to the buffer
+	data.
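
As an illustration of the read() row in the table above, here is a minimal user space sketch (not part of the relay sources) that drains one relay file into another file descriptor. The helper name drain_fd() is hypothetical. It assumes that a read() return of 0 means no more data is available right now, so a long-running consumer would typically wait in poll() and call it again.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything currently readable from in_fd to out_fd.
 * Returns the number of bytes copied, or -1 on a read/write error.
 * Padding never shows up here: the relay read() path removes it.
 */
static long drain_fd(int in_fd, int out_fd)
{
	char buf[4096];
	long total = 0;

	assert(in_fd >= 0 && out_fd >= 0);

	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));

		if (n == 0)
			return total;		/* nothing (more) to read */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}

		/* write() may be partial, so loop until the chunk is out */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

A caller might open a per-cpu file such as /sys/kernel/debug/cpu0 (the name depends on what the kernel client passed to relay_open()) and pass its descriptor as in_fd.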
+
+
+The relay interface kernel API
+==============================
+
+Here's a summary of the API the relay interface provides to in-kernel clients:
+
+TBD(curr. line MT:/API/)
+  channel management functions::
+
+    relay_open(base_filename, parent, subbuf_size, n_subbufs,
+               callbacks, private_data)
+    relay_close(chan)
+    relay_flush(chan)
+    relay_reset(chan)
+
+  channel management typically called on instigation of userspace::
+
+    relay_subbufs_consumed(chan, cpu, subbufs_consumed)
+
+  write functions::
+
+    relay_write(chan, data, length)
+    __relay_write(chan, data, length)
+    relay_reserve(chan, length)
+
+  callbacks::
+
+    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
+    buf_mapped(buf, filp)
+    buf_unmapped(buf, filp)
+    create_buf_file(filename, parent, mode, buf, is_global)
+    remove_buf_file(dentry)
+
+  helper functions::
+
+    relay_buf_full(buf)
+    subbuf_start_reserve(buf, length)
+
+
+Creating a channel
+------------------
+
+relay_open() is used to create a channel, along with its per-cpu
+channel buffers.  Each channel buffer will have an associated file
+created for it in the host filesystem, which can be mmapped or
+read from in user space.  The files are named basename0...basenameN-1
+where N is the number of online cpus, and by default will be created
+in the root of the filesystem (if the parent param is NULL).  If you
+want a directory structure to contain your relay files, you should
+create it using the host filesystem's directory creation function,
+e.g. debugfs_create_dir(), and pass the parent directory to
+relay_open().  Users are responsible for cleaning up any directory
+structure they create, when the channel is closed - again the host
+filesystem's directory removal functions should be used for that,
+e.g. debugfs_remove().
+
+In order for a channel to be created and the host filesystem's files
+associated with its channel buffers, the user must provide definitions
+for two callback functions, create_buf_file() and remove_buf_file().
+create_buf_file() is called once for each per-cpu buffer from
+relay_open() and allows the user to create the file which will be used
+to represent the corresponding channel buffer.  The callback should
+return the dentry of the file created to represent the channel buffer.
+remove_buf_file() must also be defined; it's responsible for deleting
+the file(s) created in create_buf_file() and is called during
+relay_close().
+
+Here are some typical definitions for these callbacks, in this case
+using debugfs::
+
+    /*
+    * create_buf_file() callback.  Creates relay file in debugfs.
+    */
+    static struct dentry *create_buf_file_handler(const char *filename,
+						struct dentry *parent,
+						umode_t mode,
+						struct rchan_buf *buf,
+						int *is_global)
+    {
+	    return debugfs_create_file(filename, mode, parent, buf,
+				    &relay_file_operations);
+    }
+
+    /*
+    * remove_buf_file() callback.  Removes relay file from debugfs.
+    */
+    static int remove_buf_file_handler(struct dentry *dentry)
+    {
+	    debugfs_remove(dentry);
+
+	    return 0;
+    }
+
+    /*
+    * relay interface callbacks
+    */
+    static struct rchan_callbacks relay_callbacks =
+    {
+	    .create_buf_file = create_buf_file_handler,
+	    .remove_buf_file = remove_buf_file_handler,
+    };
+
+And an example relay_open() invocation using them::
+
+  chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);
+
+If the create_buf_file() callback fails, or isn't defined, channel
+creation and thus relay_open() will fail.
+
+The total size of each per-cpu buffer is calculated by multiplying the
+number of sub-buffers by the sub-buffer size passed into relay_open().
+The idea behind sub-buffers is that they're basically an extension of
+double-buffering to N buffers, and they also allow applications to
+easily implement random-access-on-buffer-boundary schemes, which can
+be important for some high-volume applications.  The number and size
+of sub-buffers is completely dependent on the application and even for
+the same application, different conditions will warrant different
+values for these parameters at different times.  Typically, the right
+values to use are best decided after some experimentation; in general,
+though, it's safe to assume that having only 1 sub-buffer is a bad
+idea - you're guaranteed to either overwrite data or lose events
+depending on the channel mode being used.
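
The naming and sizing rules in this section can be sketched from the user space side as follows. Both helpers are hypothetical illustrations, not part of any relay API: relay_file_path() builds the basenameN name for a given cpu, and relay_map_len() computes the length an mmap() of that file has to cover, namely the number of sub-buffers times the sub-buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Size of one per-cpu buffer, i.e. the only length a relay file can be
 * mmap()ed with.  Returns 0 for a zero argument or on overflow.
 */
static size_t relay_map_len(size_t subbuf_size, size_t n_subbufs)
{
	if (subbuf_size == 0 || n_subbufs == 0)
		return 0;
	if (subbuf_size > (size_t)-1 / n_subbufs)
		return 0;
	return subbuf_size * n_subbufs;
}

/*
 * Build "<dir>/<base><cpu>", e.g. "/sys/kernel/debug/cpu0".
 * Returns 0 on success, -1 if the name did not fit into out.
 */
static int relay_file_path(char *out, size_t outlen, const char *dir,
			   const char *base, unsigned int cpu)
{
	int n;

	assert(out != NULL && outlen > 0);
	n = snprintf(out, outlen, "%s/%s%u", dir, base, cpu);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The user space side has to know subbuf_size and n_subbufs by convention with the kernel client, since the relay interface does not export them itself.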
+
+The create_buf_file() implementation can also be defined in such a way
+as to allow the creation of a single 'global' buffer instead of the
+default per-cpu set.  This can be useful for applications interested
+mainly in seeing the relative ordering of system-wide events without
+the need to bother with saving explicit timestamps for the purpose of
+merging/sorting per-cpu files in a postprocessing step.
+
+To have relay_open() create a global buffer, the create_buf_file()
+implementation should set the value of the is_global outparam to a
+non-zero value in addition to creating the file that will be used to
+represent the single buffer.  In the case of a global buffer,
+create_buf_file() and remove_buf_file() will be called only once.  The
+normal channel-writing functions, e.g. relay_write(), can still be
+used - writes from any cpu will transparently end up in the global
+buffer - but since it is a global buffer, callers should make sure
+they use the proper locking for such a buffer, either by wrapping
+writes in a spinlock, or by copying a write function from relay.h and
+creating a local version that internally does the proper locking.
+
+The private_data passed into relay_open() allows clients to associate
+user-defined data with a channel, and is immediately available
+(including in create_buf_file()) via chan->private_data or
+buf->chan->private_data.
+
+Buffer-only channels
+--------------------
+
+These channels have no files associated and can be created with
+relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
+as when doing early tracing in the kernel, before the VFS is up. In these
+cases, one may open a buffer-only channel and then call
+relay_late_setup_files() when the kernel is ready to handle files,
+to expose the buffered data to userspace.
+
+Channel 'modes'
+---------------
+
+relay channels can be used in either of two modes - 'overwrite' or
+'no-overwrite'.  The mode is entirely determined by the implementation
+of the subbuf_start() callback, as described below.  The default if no
+subbuf_start() callback is defined is 'no-overwrite' mode.  If the
+default mode suits your needs, and you plan to use the read()
+interface to retrieve channel data, you can ignore the details of this
+section, as it pertains mainly to mmap() implementations.
+
+In 'overwrite' mode, also known as 'flight recorder' mode, writes
+continuously cycle around the buffer and will never fail, but will
+unconditionally overwrite old data regardless of whether it's actually
+been consumed.  In no-overwrite mode, writes will fail, i.e. data will
+be lost, if the number of unconsumed sub-buffers equals the total
+number of sub-buffers in the channel.  It should be clear that if
+there is no consumer or if the consumer can't consume sub-buffers fast
+enough, data will be lost in either case; the only difference is
+whether data is lost from the beginning or the end of a buffer.
+
+As explained above, a relay channel is made up of one or more
+per-cpu channel buffers, each implemented as a circular buffer
+subdivided into one or more sub-buffers.  Messages are written into
+the current sub-buffer of the channel's current per-cpu buffer via the
+write functions described below.  Whenever a message can't fit into
+the current sub-buffer, because there's no room left for it, the
+client is notified via the subbuf_start() callback that a switch to a
+new sub-buffer is about to occur.  The client uses this callback to 1)
+initialize the next sub-buffer if appropriate 2) finalize the previous
+sub-buffer if appropriate and 3) return a boolean value indicating
+whether or not to actually move on to the next sub-buffer.
+
+To implement 'no-overwrite' mode, the kernel client would provide
+an implementation of the subbuf_start() callback something like the
+following::
+
+    static int subbuf_start(struct rchan_buf *buf,
+			    void *subbuf,
+			    void *prev_subbuf,
+			    size_t prev_padding)
+    {
+	    if (prev_subbuf)
+		    *((unsigned *)prev_subbuf) = prev_padding;
+
+	    if (relay_buf_full(buf))
+		    return 0;
+
+	    subbuf_start_reserve(buf, sizeof(unsigned int));
+
+	    return 1;
+    }
+
+If the current buffer is full, i.e. all sub-buffers remain unconsumed,
+the callback returns 0 to indicate that the buffer switch should not
+occur yet, i.e. until the consumer has had a chance to read the
+current set of ready sub-buffers.  For the relay_buf_full() function
+to make sense, the consumer is responsible for notifying the relay
+interface when sub-buffers have been consumed via
+relay_subbufs_consumed().  Any subsequent attempts to write into the
+buffer will again invoke the subbuf_start() callback with the same
+parameters; only when the consumer has consumed one or more of the
+ready sub-buffers will relay_buf_full() return 0, in which case the
+buffer switch can continue.
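
The accounting behind the two modes can be illustrated with a small user space model. This is only a sketch: struct model_buf and the model_*() functions are hypothetical names, and the assumption that "full" means produced minus consumed sub-buffers has reached the ring size is inferred from the description above rather than copied from the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one per-cpu buffer; not the kernel's struct rchan_buf. */
struct model_buf {
	size_t n_subbufs;	/* sub-buffers in the ring */
	size_t produced;	/* sub-buffers finished by the writer */
	size_t consumed;	/* sub-buffers released by the consumer */
	int stalled;		/* last switch refused, current one already counted */
};

/* Models relay_buf_full(): every sub-buffer is still unconsumed. */
static int model_buf_full(const struct model_buf *b)
{
	return (b->produced - b->consumed) >= b->n_subbufs;
}

/* Models relay_subbufs_consumed() for a single buffer. */
static void model_consume(struct model_buf *b, size_t n)
{
	assert(b->consumed + n <= b->produced);
	b->consumed += n;
}

/*
 * Models a sub-buffer switch.  Returns 1 if the writer may continue in
 * a fresh sub-buffer, 0 if the switch is refused and the message lost.
 */
static int model_switch(struct model_buf *b, int overwrite)
{
	if (!b->stalled)
		b->produced++;		/* current sub-buffer is now ready */

	if (!overwrite && model_buf_full(b)) {
		b->stalled = 1;		/* retry later without recounting */
		return 0;
	}

	b->stalled = 0;
	return 1;
}
```

With two sub-buffers in no-overwrite mode, the second switch is refused until model_consume() releases a sub-buffer. With overwrite set, every switch succeeds and the consumed count is never looked at.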
+
+The implementation of the subbuf_start() callback for 'overwrite' mode
+would be very similar::
+
+    static int subbuf_start(struct rchan_buf *buf,
+			    void *subbuf,
+			    void *prev_subbuf,
+			    size_t prev_padding)
+    {
+	    if (prev_subbuf)
+		    *((unsigned *)prev_subbuf) = prev_padding;
+
+	    subbuf_start_reserve(buf, sizeof(unsigned int));
+
+	    return 1;
+    }
+
+In this case, the relay_buf_full() check is meaningless and the
+callback always returns 1, causing the buffer switch to occur
+unconditionally.  It's also meaningless for the client to use the
+relay_subbufs_consumed() function in this mode, as it's never
+consulted.
+
+The default subbuf_start() implementation, used if the client doesn't
+define any callbacks, or doesn't define the subbuf_start() callback,
+implements the simplest possible 'no-overwrite' mode, i.e. it returns 0
+if the buffer is full and 1 otherwise, without reserving any header
+space.
+
+Header information can be reserved at the beginning of each sub-buffer
+by calling the subbuf_start_reserve() helper function from within the
+subbuf_start() callback.  This reserved area can be used to store
+whatever information the client wants.  In the example above, room is
+reserved in each sub-buffer to store the padding count for that
+sub-buffer.  This is filled in for the previous sub-buffer in the
+subbuf_start() implementation; the padding value for the previous
+sub-buffer is passed into the subbuf_start() callback along with a
+pointer to the previous sub-buffer, since the padding value isn't
+known until a sub-buffer is filled.  The subbuf_start() callback is
+also called for the first sub-buffer when the channel is opened, to
+give the client a chance to reserve space in it.  In this case the
+previous sub-buffer pointer passed into the callback will be NULL, so
+the client should check the value of the prev_subbuf pointer before
+writing into the previous sub-buffer.
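
For an mmap() consumer, the header convention used in the example callbacks could be decoded along these lines. This is a sketch under stated assumptions: each finished sub-buffer starts with an unsigned int padding count, followed by the payload and then the padding bytes. subbuf_payload() is a hypothetical helper, and the header is only meaningful once the kernel has finalized that sub-buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the valid payload of one finished sub-buffer into out.
 * Returns the number of payload bytes, or -1 if the header is
 * inconsistent or out is too small.
 */
static long subbuf_payload(const void *subbuf, size_t subbuf_size,
			   void *out, size_t outlen)
{
	unsigned int padding;
	const size_t hdr = sizeof(padding);
	size_t len;

	assert(subbuf != NULL && out != NULL);

	if (subbuf_size < hdr)
		return -1;

	/* memcpy() avoids assumptions about the mapping's alignment */
	memcpy(&padding, subbuf, hdr);
	if (padding > subbuf_size - hdr)
		return -1;

	len = subbuf_size - hdr - padding;
	if (len > outlen)
		return -1;

	memcpy(out, (const char *)subbuf + hdr, len);
	return (long)len;
}
```

After copying, the consumer would still have to tell the kernel client, through whatever channel the two sides agreed on, that the sub-buffer can be reused, so that relay_subbufs_consumed() gets called.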
+
+Writing to a channel
+--------------------
+
+Kernel clients write data into the current cpu's channel buffer using
+relay_write() or __relay_write().  relay_write() is the main logging
+function - it uses local_irq_save() to protect the buffer and should be
+used if you might be logging from interrupt context.  If you know
+you'll never be logging from interrupt context, you can use
+__relay_write(), which only disables preemption.  These functions
+don't return a value, so you can't determine whether or not they
+failed - the assumption is that you wouldn't want to check a return
+value in the fast logging path anyway, and that they'll always succeed
+unless the buffer is full and no-overwrite mode is being used, in
+which case you can detect a failed write in the subbuf_start()
+callback by calling the relay_buf_full() helper function.
+
+relay_reserve() is used to reserve a slot in a channel buffer which
+can be written to later.  This would typically be used in applications
+that need to write directly into a channel buffer without having to
+stage data in a temporary buffer beforehand.  Because the actual write
+may not happen immediately after the slot is reserved, applications
+using relay_reserve() can keep a count of the number of bytes actually
+written, either in space reserved in the sub-buffers themselves or as
+a separate array.  See the 'reserve' example in the relay-apps tarball
+at http://relayfs.sourceforge.net for an example of how this can be
+done.  Because the write is under control of the client and is
+separated from the reserve, relay_reserve() doesn't protect the buffer
+at all - it's up to the client to provide the appropriate
+synchronization when using relay_reserve().
+
+Closing a channel
+-----------------
+
+The client calls relay_close() when it's finished using the channel.
+The channel and its associated buffers are destroyed when there are no
+longer any references to any of the channel buffers.  relay_flush()
+forces a sub-buffer switch on all the channel buffers, and can be used
+to finalize and process the last sub-buffers before the channel is
+closed.
+
+Misc
+----
+
+Some applications may want to keep a channel around and re-use it
+rather than open and close a new channel for each use.  relay_reset()
+can be used for this purpose - it resets a channel to its initial
+state without reallocating channel buffer memory or destroying
+existing mappings.  It should however only be called when it's safe to
+do so, i.e. when the channel isn't currently being written to.
+
+Finally, there are a couple of utility callbacks that can be used for
+different purposes.  buf_mapped() is called whenever a channel buffer
+is mmapped from user space and buf_unmapped() is called when it's
+unmapped.  The client can use this notification to trigger actions
+within the kernel application, such as enabling/disabling logging to
+the channel.
+
+
+Resources
+=========
+
+For news, example code, mailing list, etc. see the relay interface homepage:
+
+    http://relayfs.sourceforge.net
+
+
+Credits
+=======
+
+The ideas and specs for the relay interface came about as a result of
+discussions on tracing involving the following:
+
+Michel Dagenais		<michel.dagenais@polymtl.ca>
+Richard Moore		<richardj_moore@uk.ibm.com>
+Bob Wisniewski		<bob@watson.ibm.com>
+Karim Yaghmour		<karim@opersys.com>
+Tom Zanussi		<zanussi@us.ibm.com>
+
+Also thanks to Hubertus Franke for a lot of useful suggestions and bug
+reports.
diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
deleted file mode 100644
index cd709a94d054..000000000000
--- a/Documentation/filesystems/relay.txt
+++ /dev/null
@@ -1,494 +0,0 @@
-relay interface (formerly relayfs)
-==================================
-
-The relay interface provides a means for kernel applications to
-efficiently log and transfer large quantities of data from the kernel
-to userspace via user-defined 'relay channels'.
-
-A 'relay channel' is a kernel->user data relay mechanism implemented
-as a set of per-cpu kernel buffers ('channel buffers'), each
-represented as a regular file ('relay file') in user space.  Kernel
-clients write into the channel buffers using efficient write
-functions; these automatically log into the current cpu's channel
-buffer.  User space applications mmap() or read() from the relay files
-and retrieve the data as it becomes available.  The relay files
-themselves are files created in a host filesystem, e.g. debugfs, and
-are associated with the channel buffers using the API described below.
-
-The format of the data logged into the channel buffers is completely
-up to the kernel client; the relay interface does however provide
-hooks which allow kernel clients to impose some structure on the
-buffer data.  The relay interface doesn't implement any form of data
-filtering - this also is left to the kernel client.  The purpose is to
-keep things as simple as possible.
-
-This document provides an overview of the relay interface API.  The
-details of the function parameters are documented along with the
-functions in the relay interface code - please see that for details.
-
-Semantics
-=========
-
-Each relay channel has one buffer per CPU, each buffer has one or more
-sub-buffers.  Messages are written to the first sub-buffer until it is
-too full to contain a new message, in which case it is written to
-the next (if available).  Messages are never split across sub-buffers.
-At this point, userspace can be notified so it empties the first
-sub-buffer, while the kernel continues writing to the next.
-
-When notified that a sub-buffer is full, the kernel knows how many
-bytes of it are padding i.e. unused space occurring because a complete
-message couldn't fit into a sub-buffer.  Userspace can use this
-knowledge to copy only valid data.
-
-After copying it, userspace can notify the kernel that a sub-buffer
-has been consumed.
-
-A relay channel can operate in a mode where it will overwrite data not
-yet collected by userspace, and not wait for it to be consumed.
-
-The relay channel itself does not provide for communication of such
-data between userspace and kernel, allowing the kernel side to remain
-simple and not impose a single interface on userspace.  It does
-provide a set of examples and a separate helper though, described
-below.
-
-The read() interface both removes padding and internally consumes the
-read sub-buffers; thus in cases where read(2) is being used to drain
-the channel buffers, special-purpose communication between kernel and
-user isn't necessary for basic operation.
-
-One of the major goals of the relay interface is to provide a low
-overhead mechanism for conveying kernel data to userspace.  While the
-read() interface is easy to use, it's not as efficient as the mmap()
-approach; the example code attempts to make the tradeoff between the
-two approaches as small as possible.
-
-klog and relay-apps example code
-================================
-
-The relay interface itself is ready to use, but to make things easier,
-a couple simple utility functions and a set of examples are provided.
-
-The relay-apps example tarball, available on the relay sourceforge
-site, contains a set of self-contained examples, each consisting of a
-pair of .c files containing boilerplate code for each of the user and
-kernel sides of a relay application.  When combined these two sets of
-boilerplate code provide glue to easily stream data to disk, without
-having to bother with mundane housekeeping chores.
-
-The 'klog debugging functions' patch (klog.patch in the relay-apps
-tarball) provides a couple of high-level logging functions to the
-kernel which allow writing formatted text or raw data to a channel,
-regardless of whether a channel to write into exists or not, or even
-whether the relay interface is compiled into the kernel or not.  These
-functions allow you to put unconditional 'trace' statements anywhere
-in the kernel or kernel modules; only when there is a 'klog handler'
-registered will data actually be logged (see the klog and kleak
-examples for details).
-
-It is of course possible to use the relay interface from scratch,
-i.e. without using any of the relay-apps example code or klog, but
-you'll have to implement communication between userspace and kernel,
-allowing both to convey the state of buffers (full, empty, amount of
-padding).  The read() interface both removes padding and internally
-consumes the read sub-buffers; thus in cases where read(2) is being
-used to drain the channel buffers, special-purpose communication
-between kernel and user isn't necessary for basic operation.  Things
-such as buffer-full conditions would still need to be communicated via
-some channel though.
-
-klog and the relay-apps examples can be found in the relay-apps
-tarball on http://relayfs.sourceforge.net
-
-The relay interface user space API
-==================================
-
-The relay interface implements basic file operations for user space
-access to relay channel buffer data.  Here are the file operations
-that are available and some comments regarding their behavior:
-
-open()	    enables user to open an _existing_ channel buffer.
-
-mmap()      results in channel buffer being mapped into the caller's
-	    memory space. Note that you can't do a partial mmap - you
-	    must map the entire file, which is NRBUF * SUBBUFSIZE.
-
-read()      read the contents of a channel buffer.  The bytes read are
-	    'consumed' by the reader, i.e. they won't be available
-	    again to subsequent reads.  If the channel is being used
-	    in no-overwrite mode (the default), it can be read at any
-	    time even if there's an active kernel writer.  If the
-	    channel is being used in overwrite mode and there are
-	    active channel writers, results may be unpredictable -
-	    users should make sure that all logging to the channel has
-	    ended before using read() with overwrite mode.  Sub-buffer
-	    padding is automatically removed and will not be seen by
-	    the reader.
-
-sendfile()  transfer data from a channel buffer to an output file
-	    descriptor. Sub-buffer padding is automatically removed
-	    and will not be seen by the reader.
-
-poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
-	    notified when sub-buffer boundaries are crossed.
-
-close()     decrements the channel buffer's refcount.  When the refcount
-	    reaches 0, i.e. when no process or kernel client has the
-	    buffer open, the channel buffer is freed.
-
-In order for a user application to make use of relay files, the
-host filesystem must be mounted.  For example,
-
-	mount -t debugfs debugfs /sys/kernel/debug
-
-NOTE:   the host filesystem doesn't need to be mounted for kernel
-	clients to create or use channels - it only needs to be
-	mounted when user space applications need access to the buffer
-	data.
-
-
-The relay interface kernel API
-==============================
-
-Here's a summary of the API the relay interface provides to in-kernel clients:
-
-TBD(curr. line MT:/API/)
-  channel management functions:
-
-    relay_open(base_filename, parent, subbuf_size, n_subbufs,
-               callbacks, private_data)
-    relay_close(chan)
-    relay_flush(chan)
-    relay_reset(chan)
-
-  channel management typically called on instigation of userspace:
-
-    relay_subbufs_consumed(chan, cpu, subbufs_consumed)
-
-  write functions:
-
-    relay_write(chan, data, length)
-    __relay_write(chan, data, length)
-    relay_reserve(chan, length)
-
-  callbacks:
-
-    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
-    buf_mapped(buf, filp)
-    buf_unmapped(buf, filp)
-    create_buf_file(filename, parent, mode, buf, is_global)
-    remove_buf_file(dentry)
-
-  helper functions:
-
-    relay_buf_full(buf)
-    subbuf_start_reserve(buf, length)
-
-
-Creating a channel
-------------------
-
-relay_open() is used to create a channel, along with its per-cpu
-channel buffers.  Each channel buffer will have an associated file
-created for it in the host filesystem, which can be and mmapped or
-read from in user space.  The files are named basename0...basenameN-1
-where N is the number of online cpus, and by default will be created
-in the root of the filesystem (if the parent param is NULL).  If you
-want a directory structure to contain your relay files, you should
-create it using the host filesystem's directory creation function,
-e.g. debugfs_create_dir(), and pass the parent directory to
-relay_open().  Users are responsible for cleaning up any directory
-structure they create, when the channel is closed - again the host
-filesystem's directory removal functions should be used for that,
-e.g. debugfs_remove().
-
-In order for a channel to be created and the host filesystem's files
-associated with its channel buffers, the user must provide definitions
-for two callback functions, create_buf_file() and remove_buf_file().
-create_buf_file() is called once for each per-cpu buffer from
-relay_open() and allows the user to create the file which will be used
-to represent the corresponding channel buffer.  The callback should
-return the dentry of the file created to represent the channel buffer.
-remove_buf_file() must also be defined; it's responsible for deleting
-the file(s) created in create_buf_file() and is called during
-relay_close().
-
-Here are some typical definitions for these callbacks, in this case
-using debugfs:
-
-/*
- * create_buf_file() callback.  Creates relay file in debugfs.
- */
-static struct dentry *create_buf_file_handler(const char *filename,
-                                              struct dentry *parent,
-                                              umode_t mode,
-                                              struct rchan_buf *buf,
-                                              int *is_global)
-{
-        return debugfs_create_file(filename, mode, parent, buf,
-	                           &relay_file_operations);
-}
-
-/*
- * remove_buf_file() callback.  Removes relay file from debugfs.
- */
-static int remove_buf_file_handler(struct dentry *dentry)
-{
-        debugfs_remove(dentry);
-
-        return 0;
-}
-
-/*
- * relay interface callbacks
- */
-static struct rchan_callbacks relay_callbacks =
-{
-        .create_buf_file = create_buf_file_handler,
-        .remove_buf_file = remove_buf_file_handler,
-};
-
-And an example relay_open() invocation using them:
-
-  chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);
-
-If the create_buf_file() callback fails, or isn't defined, channel
-creation and thus relay_open() will fail.
-
-The total size of each per-cpu buffer is calculated by multiplying the
-number of sub-buffers by the sub-buffer size passed into relay_open().
-The idea behind sub-buffers is that they're basically an extension of
-double-buffering to N buffers, and they also allow applications to
-easily implement random-access-on-buffer-boundary schemes, which can
-be important for some high-volume applications.  The number and size
-of sub-buffers is completely dependent on the application and even for
-the same application, different conditions will warrant different
-values for these parameters at different times.  Typically, the right
-values to use are best decided after some experimentation; in general,
-though, it's safe to assume that having only 1 sub-buffer is a bad
-idea - you're guaranteed to either overwrite data or lose events
-depending on the channel mode being used.
-
-The create_buf_file() implementation can also be defined in such a way
-as to allow the creation of a single 'global' buffer instead of the
-default per-cpu set.  This can be useful for applications interested
-mainly in seeing the relative ordering of system-wide events without
-the need to bother with saving explicit timestamps for the purpose of
-merging/sorting per-cpu files in a postprocessing step.
-
-To have relay_open() create a global buffer, the create_buf_file()
-implementation should set the value of the is_global outparam to a
-non-zero value in addition to creating the file that will be used to
-represent the single buffer.  In the case of a global buffer,
-create_buf_file() and remove_buf_file() will be called only once.  The
-normal channel-writing functions, e.g. relay_write(), can still be
-used - writes from any cpu will transparently end up in the global
-buffer - but since it is a global buffer, callers should make sure
-they use the proper locking for such a buffer, either by wrapping
-writes in a spinlock, or by copying a write function from relay.h and
-creating a local version that internally does the proper locking.
-
-The private_data passed into relay_open() allows clients to associate
-user-defined data with a channel, and is immediately available
-(including in create_buf_file()) via chan->private_data or
-buf->chan->private_data.
-
-Buffer-only channels
---------------------
-
-These channels have no files associated and can be created with
-relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
-as when doing early tracing in the kernel, before the VFS is up. In these
-cases, one may open a buffer-only channel and then call
-relay_late_setup_files() when the kernel is ready to handle files,
-to expose the buffered data to the userspace.
-
-Channel 'modes'
----------------
-
-relay channels can be used in either of two modes - 'overwrite' or
-'no-overwrite'.  The mode is entirely determined by the implementation
-of the subbuf_start() callback, as described below.  The default if no
-subbuf_start() callback is defined is 'no-overwrite' mode.  If the
-default mode suits your needs, and you plan to use the read()
-interface to retrieve channel data, you can ignore the details of this
-section, as it pertains mainly to mmap() implementations.
-
-In 'overwrite' mode, also known as 'flight recorder' mode, writes
-continuously cycle around the buffer and will never fail, but will
-unconditionally overwrite old data regardless of whether it's actually
-been consumed.  In no-overwrite mode, writes will fail, i.e. data will
-be lost, if the number of unconsumed sub-buffers equals the total
-number of sub-buffers in the channel.  It should be clear that if
-there is no consumer or if the consumer can't consume sub-buffers fast
-enough, data will be lost in either case; the only difference is
-whether data is lost from the beginning or the end of a buffer.
-
-As explained above, a relay channel is made of up one or more
-per-cpu channel buffers, each implemented as a circular buffer
-subdivided into one or more sub-buffers.  Messages are written into
-the current sub-buffer of the channel's current per-cpu buffer via the
-write functions described below.  Whenever a message can't fit into
-the current sub-buffer, because there's no room left for it, the
-client is notified via the subbuf_start() callback that a switch to a
-new sub-buffer is about to occur.  The client uses this callback to 1)
-initialize the next sub-buffer if appropriate 2) finalize the previous
-sub-buffer if appropriate and 3) return a boolean value indicating
-whether or not to actually move on to the next sub-buffer.
-
-To implement 'no-overwrite' mode, the userspace client would provide
-an implementation of the subbuf_start() callback something like the
-following:
-
-static int subbuf_start(struct rchan_buf *buf,
-                        void *subbuf,
-			void *prev_subbuf,
-			unsigned int prev_padding)
-{
-	if (prev_subbuf)
-		*((unsigned *)prev_subbuf) = prev_padding;
-
-	if (relay_buf_full(buf))
-		return 0;
-
-	subbuf_start_reserve(buf, sizeof(unsigned int));
-
-	return 1;
-}
-
-If the current buffer is full, i.e. all sub-buffers remain unconsumed,
-the callback returns 0 to indicate that the buffer switch should not
-occur yet, i.e. until the consumer has had a chance to read the
-current set of ready sub-buffers.  For the relay_buf_full() function
-to make sense, the consumer is responsible for notifying the relay
-interface when sub-buffers have been consumed via
-relay_subbufs_consumed().  Any subsequent attempts to write into the
-buffer will again invoke the subbuf_start() callback with the same
-parameters; only when the consumer has consumed one or more of the
-ready sub-buffers will relay_buf_full() return 0, in which case the
-buffer switch can continue.
-
-The implementation of the subbuf_start() callback for 'overwrite' mode
-would be very similar:
-
-static int subbuf_start(struct rchan_buf *buf,
-                        void *subbuf,
-			void *prev_subbuf,
-			size_t prev_padding)
-{
-	if (prev_subbuf)
-		*((unsigned *)prev_subbuf) = prev_padding;
-
-	subbuf_start_reserve(buf, sizeof(unsigned int));
-
-	return 1;
-}
-
-In this case, the relay_buf_full() check is meaningless and the
-callback always returns 1, causing the buffer switch to occur
-unconditionally.  It's also meaningless for the client to use the
-relay_subbufs_consumed() function in this mode, as it's never
-consulted.
-
-The default subbuf_start() implementation, used if the client doesn't
-define any callbacks, or doesn't define the subbuf_start() callback,
-implements the simplest possible 'no-overwrite' mode, i.e. it does
-nothing but return 0.
-
-Header information can be reserved at the beginning of each sub-buffer
-by calling the subbuf_start_reserve() helper function from within the
-subbuf_start() callback.  This reserved area can be used to store
-whatever information the client wants.  In the example above, room is
-reserved in each sub-buffer to store the padding count for that
-sub-buffer.  This is filled in for the previous sub-buffer in the
-subbuf_start() implementation; the padding value for the previous
-sub-buffer is passed into the subbuf_start() callback along with a
-pointer to the previous sub-buffer, since the padding value isn't
-known until a sub-buffer is filled.  The subbuf_start() callback is
-also called for the first sub-buffer when the channel is opened, to
-give the client a chance to reserve space in it.  In this case the
-previous sub-buffer pointer passed into the callback will be NULL, so
-the client should check the value of the prev_subbuf pointer before
-writing into the previous sub-buffer.
-
-Writing to a channel
---------------------
-
-Kernel clients write data into the current cpu's channel buffer using
-relay_write() or __relay_write().  relay_write() is the main logging
-function - it uses local_irqsave() to protect the buffer and should be
-used if you might be logging from interrupt context.  If you know
-you'll never be logging from interrupt context, you can use
-__relay_write(), which only disables preemption.  These functions
-don't return a value, so you can't determine whether or not they
-failed - the assumption is that you wouldn't want to check a return
-value in the fast logging path anyway, and that they'll always succeed
-unless the buffer is full and no-overwrite mode is being used, in
-which case you can detect a failed write in the subbuf_start()
-callback by calling the relay_buf_full() helper function.
-
-relay_reserve() is used to reserve a slot in a channel buffer which
-can be written to later.  This would typically be used in applications
-that need to write directly into a channel buffer without having to
-stage data in a temporary buffer beforehand.  Because the actual write
-may not happen immediately after the slot is reserved, applications
-using relay_reserve() can keep a count of the number of bytes actually
-written, either in space reserved in the sub-buffers themselves or as
-a separate array.  See the 'reserve' example in the relay-apps tarball
-at http://relayfs.sourceforge.net for an example of how this can be
-done.  Because the write is under control of the client and is
-separated from the reserve, relay_reserve() doesn't protect the buffer
-at all - it's up to the client to provide the appropriate
-synchronization when using relay_reserve().
-
-Closing a channel
------------------
-
-The client calls relay_close() when it's finished using the channel.
-The channel and its associated buffers are destroyed when there are no
-longer any references to any of the channel buffers.  relay_flush()
-forces a sub-buffer switch on all the channel buffers, and can be used
-to finalize and process the last sub-buffers before the channel is
-closed.
-
-Misc
-----
-
-Some applications may want to keep a channel around and re-use it
-rather than open and close a new channel for each use.  relay_reset()
-can be used for this purpose - it resets a channel to its initial
-state without reallocating channel buffer memory or destroying
-existing mappings.  It should however only be called when it's safe to
-do so, i.e. when the channel isn't currently being written to.
-
-Finally, there are a couple of utility callbacks that can be used for
-different purposes.  buf_mapped() is called whenever a channel buffer
-is mmapped from user space and buf_unmapped() is called when it's
-unmapped.  The client can use this notification to trigger actions
-within the kernel application, such as enabling/disabling logging to
-the channel.
-
-
-Resources
-=========
-
-For news, example code, mailing list, etc. see the relay interface homepage:
-
-    http://relayfs.sourceforge.net
-
-
-Credits
-=======
-
-The ideas and specs for the relay interface came about as a result of
-discussions on tracing involving the following:
-
-Michel Dagenais		<michel.dagenais@polymtl.ca>
-Richard Moore		<richardj_moore@uk.ibm.com>
-Bob Wisniewski		<bob@watson.ibm.com>
-Karim Yaghmour		<karim@opersys.com>
-Tom Zanussi		<zanussi@us.ibm.com>
-
-Also thanks to Hubertus Franke for a lot of useful suggestions and bug
-reports.
-- 
cgit 


From 6db0a480aa07ab65b6c7d34d095c714359af3e87 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:22 +0100
Subject: docs: filesystems: convert romfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/d2cc83e7cd6de63c793ccd3f2588ea40f7f1e764.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
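Note for reviewers (text between the "---" marker and the diffstat is not
part of the commit): the on-disk layout described in romfs.rst can be
checked against a small user-space sketch.  The Python below is only an
illustration under stated assumptions, not genromfs or kernel code.  The
helper names (longword_sum, parse_superblock, parse_file_header,
build_demo_image) are hypothetical, the demo image is a toy holding a
single file and no root directory, and the rule that the longword sum over
the checked region comes out to zero is my reading of the kernel source
rather than something the document states.

```python
import struct

MAGIC = b"-rom1fs-"
TYPE_NAMES = ["hard link", "directory", "regular file", "symbolic link",
              "block device", "char device", "socket", "fifo"]


def longword_sum(data):
    """Sum of big-endian 32-bit words, modulo 2**32."""
    data = data + b"\0" * (-len(data) % 4)
    words = struct.unpack(">%dI" % (len(data) // 4), data)
    return sum(words) & 0xFFFFFFFF


def pad16(n):
    """Round n up to the next 16 byte boundary."""
    return (n + 15) & ~15


def read_name(image, offset):
    """Return (name, offset just past the zero-terminated, padded name)."""
    end = image.index(b"\0", offset)
    name = image[offset:end]
    return name, offset + pad16(len(name) + 1)


def parse_superblock(image):
    if image[:8] != MAGIC:
        raise ValueError("not a romfs image")
    full_size, checksum = struct.unpack(">II", image[8:16])
    # Assumption: a valid image sums to zero over the first 512 bytes
    # (or over full_size bytes, whichever is smaller).
    checksum_ok = longword_sum(image[:min(full_size, 512)]) == 0
    volume, first_header = read_name(image, 16)
    return {"full_size": full_size, "checksum": checksum,
            "checksum_ok": checksum_ok, "volume": volume,
            "first_header": first_header}


def parse_file_header(image, offset):
    next_and_mode, spec_info, size, checksum = struct.unpack(
        ">IIII", image[offset:offset + 16])
    name, data_offset = read_name(image, offset + 16)
    return {
        # The lowest four bits carry the mode, the rest is the offset.
        "next": next_and_mode & ~0xF,
        "type": TYPE_NAMES[next_and_mode & 7],   # bits 0..2
        "executable": bool(next_and_mode & 8),   # top bit of the nibble
        "spec_info": spec_info, "size": size, "checksum": checksum,
        "name": name, "data_offset": data_offset,
    }


def build_demo_image():
    """Toy image: volume "demo" plus one executable regular file "hi"."""
    data = b"hello"
    volume = b"demo".ljust(16, b"\0")
    # next = 0 (no more files), type 2 (regular file), executable flag 8.
    # The per-file checksum is left at zero and is not verified here.
    header = struct.pack(">IIII", 0 | 2 | 8, 0, len(data), 0)
    body = volume + header + b"hi".ljust(16, b"\0") + data.ljust(16, b"\0")
    image = MAGIC + struct.pack(">II", 16 + len(body), 0) + body
    # Pick the checksum word so that the longword sum becomes zero.
    fixup = (-longword_sum(image[:512])) & 0xFFFFFFFF
    image = image[:12] + struct.pack(">I", fixup) + image[16:]
    # The whole file system is padded to a 1024 byte boundary.
    return image.ljust(1024, b"\0")
```

Calling parse_superblock(build_demo_image()) and then parse_file_header()
at the returned first_header offset walks the same fields as the two
layout diagrams.  A real image would normally come from genromfs and start
with the root directory entries, which this toy omits.
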
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/romfs.rst | 194 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/romfs.txt | 186 ----------------------------------
 3 files changed, 195 insertions(+), 186 deletions(-)
 create mode 100644 Documentation/filesystems/romfs.rst
 delete mode 100644 Documentation/filesystems/romfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 0aade8146d4d..3b26639517af 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -85,5 +85,6 @@ Documentation for filesystem implementations.
    qnx6
    ramfs-rootfs-initramfs
    relay
+   romfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/romfs.rst b/Documentation/filesystems/romfs.rst
new file mode 100644
index 000000000000..465b11efa9be
--- /dev/null
+++ b/Documentation/filesystems/romfs.rst
@@ -0,0 +1,194 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+ROMFS - ROM File System
+=======================
+
+This is a quite dumb, read only filesystem, mainly for initial RAM
+disks of installation disks.  It has grown up by the need of having
+modules linked at boot time.  Using this filesystem, you get a very
+similar feature, and even the possibility of a small kernel, with a
+file system which doesn't take up useful memory from the router
+functions in the basement of your office.
+
+For comparison, both the older minix and xiafs (the latter is now
+defunct) filesystems, compiled as modules, need more than 20000 bytes,
+while romfs is less than a page, about 4000 bytes (assuming i586
+code).  Under the same conditions, the msdos filesystem would need
+about 30K (and does not support device nodes or symlinks), while the
+nfs module with nfsroot is about 57K.  Furthermore, as a bit unfair
+comparison, an actual rescue disk used up 3202 blocks with ext2, while
+with romfs, it needed 3079 blocks.
+
+To create such a file system, you'll need a user program named
+genromfs. It is available on http://romfs.sourceforge.net/
+
+As the name suggests, romfs could also be used (space-efficiently) on
+various read-only media, like (E)EPROM disks if someone will have the
+motivation.. :)
+
+However, the main purpose of romfs is to have a very small kernel,
+which has only this filesystem linked in, and then can load any module
+later, with the current module utilities.  It can also be used to run
+some program to decide if you need SCSI devices, and even IDE or
+floppy drives can be loaded later if you use the "initrd"--initial
+RAM disk--feature of the kernel.  This would not be really news
+flash, but with romfs, you can even spare off your ext2 or minix or
+maybe even affs filesystem until you really know that you need it.
+
+For example, a distribution boot disk can contain only the cd disk
+drivers (and possibly the SCSI drivers), and the ISO 9660 filesystem
+module.  The kernel can be small enough, since it doesn't have other
+filesystems, like the quite large ext2fs module, which can then be
+loaded off the CD at a later stage of the installation.  Another use
+would be for a recovery disk, when you are reinstalling a workstation
+from the network, and you will have all the tools/modules available
+from a nearby server, so you don't want to carry two disks for this
+purpose, just because it won't fit into ext2.
+
+romfs operates on block devices as you can expect, and the underlying
+structure is very simple.  Every accessible structure begins on 16
+byte boundaries for fast access.  The minimum space a file will take
+is 32 bytes (this is an empty file, with a less than 16 character
+name).  The maximum overhead for any non-empty file is the header, and
+the 16 byte padding for the name and the contents, i.e. 16+14+15 = 45
+bytes.  This is quite rare however, since most file names are longer
+than 3 bytes, and shorter than 15 bytes.
+
+The layout of the filesystem is the following::
+
+ offset	    content
+
+	+---+---+---+---+
+  0	| - | r | o | m |  \
+	+---+---+---+---+	The ASCII representation of those bytes
+  4	| 1 | f | s | - |  /	(i.e. "-rom1fs-")
+	+---+---+---+---+
+  8	|   full size	|	The number of accessible bytes in this fs.
+	+---+---+---+---+
+ 12	|    checksum	|	The checksum of the FIRST 512 BYTES.
+	+---+---+---+---+
+ 16	| volume name	|	The zero terminated name of the volume,
+	:               :	padded to 16 byte boundary.
+	+---+---+---+---+
+ xx	|     file	|
+	:    headers	:
+
+Every multi byte value (32 bit words, I'll use the longwords term from
+now on) must be in big endian order.
+
+The first eight bytes identify the filesystem, even for the casual
+inspector.  After that, in the 3rd longword, it contains the number of
+bytes accessible from the start of this filesystem.  The 4th longword
+is the checksum of the first 512 bytes (or the number of bytes
+accessible, whichever is smaller).  The applied algorithm is the same
+as in the AFFS filesystem, namely a simple sum of the longwords
+(assuming bigendian quantities again).  For details, please consult
+the source.  This algorithm was chosen because although it's not quite
+reliable, it does not require any tables, and it is very simple.
+
+The following bytes are now part of the file system; each file header
+must begin on a 16 byte boundary::
+
+ offset	    content
+
+     	+---+---+---+---+
+  0	| next filehdr|X|	The offset of the next file header
+	+---+---+---+---+	  (zero if no more files)
+  4	|   spec.info	|	Info for directories/hard links/devices
+	+---+---+---+---+
+  8	|     size      |	The size of this file in bytes
+	+---+---+---+---+
+ 12	|   checksum	|	Covering the meta data, including the file
+	+---+---+---+---+	  name, and padding
+ 16	| file name     |	The zero terminated name of the file,
+	:               :	padded to 16 byte boundary
+	+---+---+---+---+
+ xx	| file data	|
+	:		:
+
+Since the file headers begin always at a 16 byte boundary, the lowest
+4 bits would be always zero in the next filehdr pointer.  These four
+bits are used for the mode information.  Bits 0..2 specify the type of
+the file, while bit 3 shows if the file is executable or not.  The
+permissions are assumed to be world readable, if this bit is not set,
+and world executable if it is; except the character and block devices,
+they are never accessible for other than owner.  The owner of every
+file is user and group 0, this should never be a problem for the
+intended use.  The mapping of the 8 possible values to file types is
+the following:
+
+==	=============== ============================================
+	  mapping		spec.info means
+==	=============== ============================================
+ 0	hard link	link destination [file header]
+ 1	directory	first file's header
+ 2	regular file	unused, must be zero [MBZ]
+ 3	symbolic link	unused, MBZ (file data is the link content)
+ 4	block device	16/16 bits major/minor number
+ 5	char device		    - " -
+ 6	socket		unused, MBZ
+ 7	fifo		unused, MBZ
+==	=============== ============================================
+
+Note that hard links are specifically marked in this filesystem, but
+they will behave as you can expect (i.e. share the inode number).
+Note also that it is your responsibility to not create hard link
+loops, and to create all the . and .. links for directories.  This is
+normally done correctly by the genromfs program.  Please refrain from
+using the executable bits for special purposes on the socket and fifo
+special files, they may have other uses in the future.  Additionally,
+please remember that only regular files, and symlinks are supposed to
+have a nonzero size field; they contain the number of bytes available
+directly after the (padded) file name.
+
+Another thing to note is that romfs works on file headers and data
+aligned to 16 byte boundaries, but most hardware devices and the block
+device drivers are unable to cope with smaller than block-sized data.
+To overcome this limitation, the whole size of the file system must be
+padded to a 1024 byte boundary.
+
+If you have any problems or suggestions concerning this file system,
+please contact me.  However, think twice before wanting me to add
+features and code, because the primary and most important advantage of
+this file system is the small code.  On the other hand, don't be
+alarmed, I'm not getting that much romfs related mail.  Now I can
+understand why Avery wrote poems in the ARCnet docs to get some more
+feedback. :)
+
+romfs also has a mailing list, and to date, it hasn't received any
+traffic, so you are welcome to join it to discuss your ideas. :)
+
+It's run by ezmlm, so you can subscribe to it by sending a message
+to romfs-subscribe@shadow.banki.hu, the content is irrelevant.
+
+Pending issues:
+
+- Permissions and owner information are pretty essential features of a
+  Un*x like system, but romfs does not provide the full possibilities.
+  I have never found this limiting, but others might.
+
+- The file system is read only, so it can be very small, but in case
+  one would want to write _anything_ to a file system, he still needs
+  a writable file system, thus negating the size advantages.  Possible
+  solutions: implement write access as a compile-time option, or a new,
+  similarly small writable filesystem for RAM disks.
+
+- Since the files are only required to have alignment on a 16 byte
+  boundary, it is currently possibly suboptimal to read or execute files
+  from the filesystem.  It might be resolved by reordering file data to
+  have most of it (i.e. except the start and the end) lying at "natural"
+  boundaries, thus it would be possible to directly map a big portion of
+  the file contents to the mm subsystem.
+
+- Compression might be a useful feature, but memory is quite a
+  limiting factor in my eyes.
+
+- Where is it used?
+
+- Does it work on other architectures than intel and motorola?
+
+
+Have fun,
+
+Janos Farkas <chexum@shadow.banki.hu>
diff --git a/Documentation/filesystems/romfs.txt b/Documentation/filesystems/romfs.txt
deleted file mode 100644
index e2b07cc9120a..000000000000
--- a/Documentation/filesystems/romfs.txt
+++ /dev/null
@@ -1,186 +0,0 @@
-ROMFS - ROM FILE SYSTEM
-
-This is a quite dumb, read only filesystem, mainly for initial RAM
-disks of installation disks.  It has grown up by the need of having
-modules linked at boot time.  Using this filesystem, you get a very
-similar feature, and even the possibility of a small kernel, with a
-file system which doesn't take up useful memory from the router
-functions in the basement of your office.
-
-For comparison, both the older minix and xiafs (the latter is now
-defunct) filesystems, compiled as module need more than 20000 bytes,
-while romfs is less than a page, about 4000 bytes (assuming i586
-code).  Under the same conditions, the msdos filesystem would need
-about 30K (and does not support device nodes or symlinks), while the
-nfs module with nfsroot is about 57K.  Furthermore, as a bit unfair
-comparison, an actual rescue disk used up 3202 blocks with ext2, while
-with romfs, it needed 3079 blocks.
-
-To create such a file system, you'll need a user program named
-genromfs. It is available on http://romfs.sourceforge.net/
-
-As the name suggests, romfs could be also used (space-efficiently) on
-various read-only media, like (E)EPROM disks if someone will have the
-motivation.. :)
-
-However, the main purpose of romfs is to have a very small kernel,
-which has only this filesystem linked in, and then can load any module
-later, with the current module utilities.  It can also be used to run
-some program to decide if you need SCSI devices, and even IDE or
-floppy drives can be loaded later if you use the "initrd"--initial
-RAM disk--feature of the kernel.  This would not be really news
-flash, but with romfs, you can even spare off your ext2 or minix or
-maybe even affs filesystem until you really know that you need it.
-
-For example, a distribution boot disk can contain only the cd disk
-drivers (and possibly the SCSI drivers), and the ISO 9660 filesystem
-module.  The kernel can be small enough, since it doesn't have other
-filesystems, like the quite large ext2fs module, which can then be
-loaded off the CD at a later stage of the installation.  Another use
-would be for a recovery disk, when you are reinstalling a workstation
-from the network, and you will have all the tools/modules available
-from a nearby server, so you don't want to carry two disks for this
-purpose, just because it won't fit into ext2.
-
-romfs operates on block devices as you can expect, and the underlying
-structure is very simple.  Every accessible structure begins on 16
-byte boundaries for fast access.  The minimum space a file will take
-is 32 bytes (this is an empty file, with a less than 16 character
-name).  The maximum overhead for any non-empty file is the header, and
-the 16 byte padding for the name and the contents, also 16+14+15 = 45
-bytes.  This is quite rare however, since most file names are longer
-than 3 bytes, and shorter than 15 bytes.
-
-The layout of the filesystem is the following:
-
-offset	    content
-
-	+---+---+---+---+
-  0	| - | r | o | m |  \
-	+---+---+---+---+	The ASCII representation of those bytes
-  4	| 1 | f | s | - |  /	(i.e. "-rom1fs-")
-	+---+---+---+---+
-  8	|   full size	|	The number of accessible bytes in this fs.
-	+---+---+---+---+
- 12	|    checksum	|	The checksum of the FIRST 512 BYTES.
-	+---+---+---+---+
- 16	| volume name	|	The zero terminated name of the volume,
-	:               :	padded to 16 byte boundary.
-	+---+---+---+---+
- xx	|     file	|
-	:    headers	:
-
-Every multi byte value (32 bit words, I'll use the longwords term from
-now on) must be in big endian order.
-
-The first eight bytes identify the filesystem, even for the casual
-inspector.  After that, in the 3rd longword, it contains the number of
-bytes accessible from the start of this filesystem.  The 4th longword
-is the checksum of the first 512 bytes (or the number of bytes
-accessible, whichever is smaller).  The applied algorithm is the same
-as in the AFFS filesystem, namely a simple sum of the longwords
-(assuming bigendian quantities again).  For details, please consult
-the source.  This algorithm was chosen because although it's not quite
-reliable, it does not require any tables, and it is very simple.
-
-The following bytes are now part of the file system; each file header
-must begin on a 16 byte boundary.
-
-offset	    content
-
-     	+---+---+---+---+
-  0	| next filehdr|X|	The offset of the next file header
-	+---+---+---+---+	  (zero if no more files)
-  4	|   spec.info	|	Info for directories/hard links/devices
-	+---+---+---+---+
-  8	|     size      |	The size of this file in bytes
-	+---+---+---+---+
- 12	|   checksum	|	Covering the meta data, including the file
-	+---+---+---+---+	  name, and padding
- 16	| file name     |	The zero terminated name of the file,
-	:               :	padded to 16 byte boundary
-	+---+---+---+---+
- xx	| file data	|
-	:		:
-
-Since the file headers begin always at a 16 byte boundary, the lowest
-4 bits would be always zero in the next filehdr pointer.  These four
-bits are used for the mode information.  Bits 0..2 specify the type of
-the file; while bit 4 shows if the file is executable or not.  The
-permissions are assumed to be world readable, if this bit is not set,
-and world executable if it is; except the character and block devices,
-they are never accessible for other than owner.  The owner of every
-file is user and group 0, this should never be a problem for the
-intended use.  The mapping of the 8 possible values to file types is
-the following:
-
-	  mapping		spec.info means
- 0	hard link	link destination [file header]
- 1	directory	first file's header
- 2	regular file	unused, must be zero [MBZ]
- 3	symbolic link	unused, MBZ (file data is the link content)
- 4	block device	16/16 bits major/minor number
- 5	char device		    - " -
- 6	socket		unused, MBZ
- 7	fifo		unused, MBZ
-
-Note that hard links are specifically marked in this filesystem, but
-they will behave as you can expect (i.e. share the inode number).
-Note also that it is your responsibility to not create hard link
-loops, and creating all the . and .. links for directories.  This is
-normally done correctly by the genromfs program.  Please refrain from
-using the executable bits for special purposes on the socket and fifo
-special files, they may have other uses in the future.  Additionally,
-please remember that only regular files, and symlinks are supposed to
-have a nonzero size field; they contain the number of bytes available
-directly after the (padded) file name.
-
-Another thing to note is that romfs works on file headers and data
-aligned to 16 byte boundaries, but most hardware devices and the block
-device drivers are unable to cope with smaller than block-sized data.
-To overcome this limitation, the whole size of the file system must be
-padded to an 1024 byte boundary.
-
-If you have any problems or suggestions concerning this file system,
-please contact me.  However, think twice before wanting me to add
-features and code, because the primary and most important advantage of
-this file system is the small code.  On the other hand, don't be
-alarmed, I'm not getting that much romfs related mail.  Now I can
-understand why Avery wrote poems in the ARCnet docs to get some more
-feedback. :)
-
-romfs has also a mailing list, and to date, it hasn't received any
-traffic, so you are welcome to join it to discuss your ideas. :)
-
-It's run by ezmlm, so you can subscribe to it by sending a message
-to romfs-subscribe@shadow.banki.hu, the content is irrelevant.
-
-Pending issues:
-
-- Permissions and owner information are pretty essential features of a
-Un*x like system, but romfs does not provide the full possibilities.
-I have never found this limiting, but others might.
-
-- The file system is read only, so it can be very small, but in case
-one would want to write _anything_ to a file system, he still needs
-a writable file system, thus negating the size advantages.  Possible
-solutions: implement write access as a compile-time option, or a new,
-similarly small writable filesystem for RAM disks.
-
-- Since the files are only required to have alignment on a 16 byte
-boundary, it is currently possibly suboptimal to read or execute files
-from the filesystem.  It might be resolved by reordering file data to
-have most of it (i.e. except the start and the end) laying at "natural"
-boundaries, thus it would be possible to directly map a big portion of
-the file contents to the mm subsystem.
-
-- Compression might be an useful feature, but memory is quite a
-limiting factor in my eyes.
-
-- Where it is used?
-
-- Does it work on other architectures than intel and motorola?
-
-
-Have fun,
-Janos Farkas <chexum@shadow.banki.hu>
-- 
cgit 


From 31771f45c8e46d9356f1a58329f5cd40ab331e1a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:23 +0100
Subject: docs: filesystems: convert squashfs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/cec30862c7ee7de7f9cd903e35e6c8bf74cc928a.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst    |   1 +
 Documentation/filesystems/squashfs.rst | 265 +++++++++++++++++++++++++++++++++
 Documentation/filesystems/squashfs.txt | 259 --------------------------------
 3 files changed, 266 insertions(+), 259 deletions(-)
 create mode 100644 Documentation/filesystems/squashfs.rst
 delete mode 100644 Documentation/filesystems/squashfs.txt
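
Section 3.2 of the new squashfs.rst describes two small encodings: the
two-byte length that prefixes each metadata block, and the 48-bit inode
reference (<block, offset>).  Below is a minimal userspace sketch of decoding
both; it is not the kernel's implementation.  All names are hypothetical.  It
assumes little-endian on-disk byte order, and assumes the reference keeps the
offset in its low 16 bits with the block location above it; the text states
neither detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: top bit of the two-byte metadata block length. */
#define META_UNCOMPRESSED_BIT 0x8000u

struct meta_header {
	uint16_t length;	/* size of the block body in bytes */
	bool uncompressed;	/* true if the top bit was set */
};

/* Decode the two-byte prefix of a metadata block (assumed little-endian). */
struct meta_header meta_header_decode(const uint8_t raw[2])
{
	uint16_t v = (uint16_t)(raw[0] | (raw[1] << 8));
	struct meta_header h = {
		.length = (uint16_t)(v & 0x7FFFu),
		.uncompressed = (v & META_UNCOMPRESSED_BIT) != 0,
	};

	return h;
}

struct inode_ref {
	uint32_t block;		/* location of the metadata block */
	uint16_t offset;	/* byte offset of the inode in that block */
};

/* Split a 48-bit inode reference (assumed layout: block << 16 | offset). */
struct inode_ref inode_ref_decode(uint64_t ref)
{
	struct inode_ref r = {
		.block = (uint32_t)(ref >> 16),
		.offset = (uint16_t)(ref & 0xFFFFu),
	};

	return r;
}
```

A reader would decode the header first to learn how many bytes to fetch and
whether to decompress them, then use the offset from the inode reference to
index into the resulting block.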

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 3b26639517af..97a5f65ae509 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -86,5 +86,6 @@ Documentation for filesystem implementations.
    ramfs-rootfs-initramfs
    relay
    romfs
+   squashfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/squashfs.rst b/Documentation/filesystems/squashfs.rst
new file mode 100644
index 000000000000..df42106bae71
--- /dev/null
+++ b/Documentation/filesystems/squashfs.rst
@@ -0,0 +1,265 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Squashfs 4.0 Filesystem
+=======================
+
+Squashfs is a compressed read-only filesystem for Linux.
+
+It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
+directories.  Inodes in the system are very small and all blocks are packed to
+minimise data overhead. Block sizes greater than 4K are supported up to a
+maximum of 1Mbytes (default block size 128K).
+
+Squashfs is intended for general read-only filesystem use, for archival
+use (i.e. in cases where a .tar.gz file may be used), and in constrained
+block device/memory systems (e.g. embedded systems) where low overhead is
+needed.
+
+Mailing list: squashfs-devel@lists.sourceforge.net
+Web site: www.squashfs.org
+
+1. Filesystem Features
+----------------------
+
+Squashfs filesystem features versus Cramfs:
+
+============================== 	=========		==========
+				Squashfs		Cramfs
+============================== 	=========		==========
+Max filesystem size		2^64			256 MiB
+Max file size			~ 2 TiB			16 MiB
+Max files			unlimited		unlimited
+Max directories			unlimited		unlimited
+Max entries per directory	unlimited		unlimited
+Max block size			1 MiB			4 KiB
+Metadata compression		yes			no
+Directory indexes		yes			no
+Sparse file support		yes			no
+Tail-end packing (fragments)	yes			no
+Exportable (NFS etc.)		yes			no
+Hard link support		yes			no
+"." and ".." in readdir		yes			no
+Real inode numbers		yes			no
+32-bit uids/gids		yes			no
+File creation time		yes			no
+Xattr support			yes			no
+ACL support			no			no
+============================== 	=========		==========
+
+Squashfs compresses data, inodes and directories.  In addition, inode and
+directory data are highly compacted, and packed on byte boundaries.  Each
+compressed inode is on average 8 bytes in length (the exact length varies with
+file type, i.e. regular file, directory, symbolic link, and block/char device
+inodes have different sizes).
+
+2. Using Squashfs
+-----------------
+
+As squashfs is a read-only filesystem, the mksquashfs program must be used to
+create populated squashfs filesystems.  This and other squashfs utilities
+can be obtained from http://www.squashfs.org.  Usage instructions can be
+obtained from this site also.
+
+The squashfs-tools development tree is now located on kernel.org
+	git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git
+
+3. Squashfs Filesystem Design
+-----------------------------
+
+A squashfs filesystem consists of a maximum of nine parts, packed together on a
+byte alignment::
+
+	 ---------------
+	|  superblock 	|
+	|---------------|
+	|  compression  |
+	|    options    |
+	|---------------|
+	|  datablocks   |
+	|  & fragments  |
+	|---------------|
+	|  inode table	|
+	|---------------|
+	|   directory	|
+	|     table     |
+	|---------------|
+	|   fragment	|
+	|    table      |
+	|---------------|
+	|    export     |
+	|    table      |
+	|---------------|
+	|    uid/gid	|
+	|  lookup table	|
+	|---------------|
+	|     xattr     |
+	|     table	|
+	 ---------------
+
+Compressed data blocks are written to the filesystem as files are read from
+the source directory, and checked for duplicates.  Once all file data has been
+written the completed inode, directory, fragment, export, uid/gid lookup and
+xattr tables are written.
+
+3.1 Compression options
+-----------------------
+
+Compressors can optionally support compression specific options (e.g.
+dictionary size).  If non-default compression options have been used, then
+these are stored here.
+
+3.2 Inodes
+----------
+
+Metadata (inodes and directories) are compressed in 8Kbyte blocks.  Each
+compressed block is prefixed by a two-byte length; the top bit is set if the
+block is uncompressed.  A block will be uncompressed if the -noI option is set,
+or if the compressed block was larger than the uncompressed block.
+
+Inodes are packed into the metadata blocks, and are not aligned to block
+boundaries, therefore inodes may overlap compressed blocks.  Inodes are identified
+by a 48-bit number which encodes the location of the compressed metadata block
+containing the inode, and the byte offset into that block where the inode is
+placed (<block, offset>).
+
+To maximise compression there are different inodes for each file type
+(regular file, directory, device, etc.), the inode contents and length
+varying with the type.
+
+To further maximise compression, two types of regular file inode and
+directory inode are defined: inodes optimised for frequently occurring
+regular files and directories, and extended types where extra
+information has to be stored.
+
+3.3 Directories
+---------------
+
+Like inodes, directories are packed into compressed metadata blocks, stored
+in a directory table.  Directories are accessed using the start address of
+the metablock containing the directory and the offset into the
+decompressed block (<block, offset>).
+
+Directories are organised in a slightly complex way, and are not simply
+a list of file names.  The organisation takes advantage of the
+fact that (in most cases) the inodes of the files will be in the same
+compressed metadata block, and therefore, can share the start block.
+Directories are therefore organised in a two level list, a directory
+header containing the shared start block value, and a sequence of directory
+entries, each of which share the shared start block.  A new directory header
+is written once/if the inode start block changes.  The directory
+header/directory entry list is repeated as many times as necessary.
+
+Directories are sorted, and can contain a directory index to speed up
+file lookup.  Directory indexes store one entry per metablock, each entry
+storing the index/filename mapping to the first directory header
+in each metadata block.  Directories are sorted in alphabetical order,
+and at lookup the index is scanned linearly looking for the first filename
+alphabetically larger than the filename being looked up.  At this point the
+location of the metadata block the filename is in has been found.
+The general idea of the index is to ensure only one metadata block needs to be
+decompressed to do a lookup irrespective of the length of the directory.
+This scheme has the advantage that it doesn't require extra memory overhead
+and doesn't require much extra storage on disk.
+
+3.4 File data
+-------------
+
+Regular files consist of a sequence of contiguous compressed blocks, and/or a
+compressed fragment block (tail-end packed block).   The compressed size
+of each datablock is stored in a block list contained within the
+file inode.
+
+To speed up access to datablocks when reading 'large' files (256 Mbytes or
+larger), the code implements an index cache that caches the mapping from
+block index to datablock location on disk.
+
+The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
+retaining a simple and space-efficient block list on disk.  The cache
+is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
+Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
+The index cache is designed to be memory efficient, and by default uses
+16 KiB.
+
+3.5 Fragment lookup table
+-------------------------
+
+Regular files can contain a fragment index which is mapped to a fragment
+location on disk and compressed size using a fragment lookup table.  This
+fragment lookup table is itself stored compressed into metadata blocks.
+A second index table is used to locate these.  This second index table for
+speed of access (and because it is small) is read at mount time and cached
+in memory.
+
+3.6 Uid/gid lookup table
+------------------------
+
+For space efficiency regular files store uid and gid indexes, which are
+converted to 32-bit uids/gids using an id look up table.  This table is
+stored compressed into metadata blocks.  A second index table is used to
+locate these.  This second index table for speed of access (and because it
+is small) is read at mount time and cached in memory.
+
+3.7 Export table
+----------------
+
+To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems
+can optionally (disabled with the -no-exports Mksquashfs option) contain
+an inode number to inode disk location lookup table.  This is required to
+enable Squashfs to map inode numbers passed in filehandles to the inode
+location on disk, which is necessary when the export code reinstantiates
+expired/flushed inodes.
+
+This table is stored compressed into metadata blocks.  A second index table is
+used to locate these.  This second index table for speed of access (and because
+it is small) is read at mount time and cached in memory.
+
+3.8 Xattr table
+---------------
+
+The xattr table contains extended attributes for each inode.  The xattrs
+for each inode are stored in a list, each list entry containing a type,
+name and value field.  The type field encodes the xattr prefix
+("user.", "trusted." etc) and it also encodes how the name/value fields
+should be interpreted.  Currently the type indicates whether the value
+is stored inline (in which case the value field contains the xattr value),
+or if it is stored out of line (in which case the value field stores a
+reference to where the actual value is stored).  This allows large values
+to be stored out of line improving scanning and lookup performance and it
+also allows values to be de-duplicated, the value being stored once, and
+all other occurrences holding an out of line reference to that value.
+
+The xattr lists are packed into compressed 8K metadata blocks.
+To reduce overhead in inodes, rather than storing the on-disk
+location of the xattr list inside each inode, a 32-bit xattr id
+is stored.  This xattr id is mapped into the location of the xattr
+list using a second xattr id lookup table.
+
+4. TODOs and Outstanding Issues
+-------------------------------
+
+4.1 TODO list
+-------------
+
+Implement ACL support.
+
+4.2 Squashfs Internal Cache
+---------------------------
+
+Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
+recently accessed data Squashfs uses two small metadata and fragment caches.
+
+The cache is not used for file datablocks; these are decompressed and cached in
+the page-cache in the normal way.  The cache is used to temporarily cache
+fragment and metadata blocks which have been read as a result of a metadata
+(i.e. inode or directory) or fragment access.  Because metadata and fragments
+are packed together into blocks (to gain greater compression) the read of a
+particular piece of metadata or fragment will retrieve other metadata/fragments
+which have been packed with it, these because of locality-of-reference may be
+read in the near future. Temporarily caching them ensures they are available
+for near future access without requiring an additional read and decompress.
+
+In the future this internal cache may be replaced with an implementation which
+uses the kernel page cache.  Because the page cache operates on page sized
+units this may introduce additional complexity in terms of locking and
+associated race conditions.
diff --git a/Documentation/filesystems/squashfs.txt b/Documentation/filesystems/squashfs.txt
deleted file mode 100644
index e5274f84dc56..000000000000
--- a/Documentation/filesystems/squashfs.txt
+++ /dev/null
@@ -1,259 +0,0 @@
-SQUASHFS 4.0 FILESYSTEM
-=======================
-
-Squashfs is a compressed read-only filesystem for Linux.
-It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
-directories.  Inodes in the system are very small and all blocks are packed to
-minimise data overhead. Block sizes greater than 4K are supported up to a
-maximum of 1Mbytes (default block size 128K).
-
-Squashfs is intended for general read-only filesystem use, for archival
-use (i.e. in cases where a .tar.gz file may be used), and in constrained
-block device/memory systems (e.g. embedded systems) where low overhead is
-needed.
-
-Mailing list: squashfs-devel@lists.sourceforge.net
-Web site: www.squashfs.org
-
-1. FILESYSTEM FEATURES
-----------------------
-
-Squashfs filesystem features versus Cramfs:
-
-				Squashfs		Cramfs
-
-Max filesystem size:		2^64			256 MiB
-Max file size:			~ 2 TiB			16 MiB
-Max files:			unlimited		unlimited
-Max directories:		unlimited		unlimited
-Max entries per directory:	unlimited		unlimited
-Max block size:			1 MiB			4 KiB
-Metadata compression:		yes			no
-Directory indexes:		yes			no
-Sparse file support:		yes			no
-Tail-end packing (fragments):	yes			no
-Exportable (NFS etc.):		yes			no
-Hard link support:		yes			no
-"." and ".." in readdir:	yes			no
-Real inode numbers:		yes			no
-32-bit uids/gids:		yes			no
-File creation time:		yes			no
-Xattr support:			yes			no
-ACL support:			no			no
-
-Squashfs compresses data, inodes and directories.  In addition, inode and
-directory data are highly compacted, and packed on byte boundaries.  Each
-compressed inode is on average 8 bytes in length (the exact length varies on
-file type, i.e. regular file, directory, symbolic link, and block/char device
-inodes have different sizes).
-
-2. USING SQUASHFS
------------------
-
-As squashfs is a read-only filesystem, the mksquashfs program must be used to
-create populated squashfs filesystems.  This and other squashfs utilities
-can be obtained from http://www.squashfs.org.  Usage instructions can be
-obtained from this site also.
-
-The squashfs-tools development tree is now located on kernel.org
-	git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git
-
-3. SQUASHFS FILESYSTEM DESIGN
------------------------------
-
-A squashfs filesystem consists of a maximum of nine parts, packed together on a
-byte alignment:
-
-	 ---------------
-	|  superblock 	|
-	|---------------|
-	|  compression  |
-	|    options    |
-	|---------------|
-	|  datablocks   |
-	|  & fragments  |
-	|---------------|
-	|  inode table	|
-	|---------------|
-	|   directory	|
-	|     table     |
-	|---------------|
-	|   fragment	|
-	|    table      |
-	|---------------|
-	|    export     |
-	|    table      |
-	|---------------|
-	|    uid/gid	|
-	|  lookup table	|
-	|---------------|
-	|     xattr     |
-	|     table	|
-	 ---------------
-
-Compressed data blocks are written to the filesystem as files are read from
-the source directory, and checked for duplicates.  Once all file data has been
-written the completed inode, directory, fragment, export, uid/gid lookup and
-xattr tables are written.
-
-3.1 Compression options
------------------------
-
-Compressors can optionally support compression specific options (e.g.
-dictionary size).  If non-default compression options have been used, then
-these are stored here.
-
-3.2 Inodes
-----------
-
-Metadata (inodes and directories) are compressed in 8Kbyte blocks.  Each
-compressed block is prefixed by a two byte length, the top bit is set if the
-block is uncompressed.  A block will be uncompressed if the -noI option is set,
-or if the compressed block was larger than the uncompressed block.
-
-Inodes are packed into the metadata blocks, and are not aligned to block
-boundaries, therefore inodes overlap compressed blocks.  Inodes are identified
-by a 48-bit number which encodes the location of the compressed metadata block
-containing the inode, and the byte offset into that block where the inode is
-placed (<block, offset>).
-
-To maximise compression there are different inodes for each file type
-(regular file, directory, device, etc.), the inode contents and length
-varying with the type.
-
-To further maximise compression, two types of regular file inode and
-directory inode are defined: inodes optimised for frequently occurring
-regular files and directories, and extended types where extra
-information has to be stored.
-
-3.3 Directories
----------------
-
-Like inodes, directories are packed into compressed metadata blocks, stored
-in a directory table.  Directories are accessed using the start address of
-the metablock containing the directory and the offset into the
-decompressed block (<block, offset>).
-
-Directories are organised in a slightly complex way, and are not simply
-a list of file names.  The organisation takes advantage of the
-fact that (in most cases) the inodes of the files will be in the same
-compressed metadata block, and therefore, can share the start block.
-Directories are therefore organised in a two level list, a directory
-header containing the shared start block value, and a sequence of directory
-entries, each of which share the shared start block.  A new directory header
-is written once/if the inode start block changes.  The directory
-header/directory entry list is repeated as many times as necessary.
-
-Directories are sorted, and can contain a directory index to speed up
-file lookup.  Directory indexes store one entry per metablock, each entry
-storing the index/filename mapping to the first directory header
-in each metadata block.  Directories are sorted in alphabetical order,
-and at lookup the index is scanned linearly looking for the first filename
-alphabetically larger than the filename being looked up.  At this point the
-location of the metadata block the filename is in has been found.
-The general idea of the index is to ensure only one metadata block needs to be
-decompressed to do a lookup irrespective of the length of the directory.
-This scheme has the advantage that it doesn't require extra memory overhead
-and doesn't require much extra storage on disk.
-
-3.4 File data
--------------
-
-Regular files consist of a sequence of contiguous compressed blocks, and/or a
-compressed fragment block (tail-end packed block).   The compressed size
-of each datablock is stored in a block list contained within the
-file inode.
-
-To speed up access to datablocks when reading 'large' files (256 Mbytes or
-larger), the code implements an index cache that caches the mapping from
-block index to datablock location on disk.
-
-The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
-retaining a simple and space-efficient block list on disk.  The cache
-is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
-Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
-The index cache is designed to be memory efficient, and by default uses
-16 KiB.
-
-3.5 Fragment lookup table
--------------------------
-
-Regular files can contain a fragment index which is mapped to a fragment
-location on disk and compressed size using a fragment lookup table.  This
-fragment lookup table is itself stored compressed into metadata blocks.
-A second index table is used to locate these.  This second index table for
-speed of access (and because it is small) is read at mount time and cached
-in memory.
-
-3.6 Uid/gid lookup table
-------------------------
-
-For space efficiency regular files store uid and gid indexes, which are
-converted to 32-bit uids/gids using an id look up table.  This table is
-stored compressed into metadata blocks.  A second index table is used to
-locate these.  This second index table for speed of access (and because it
-is small) is read at mount time and cached in memory.
-
-3.7 Export table
-----------------
-
-To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems
-can optionally (disabled with the -no-exports Mksquashfs option) contain
-an inode number to inode disk location lookup table.  This is required to
-enable Squashfs to map inode numbers passed in filehandles to the inode
-location on disk, which is necessary when the export code reinstantiates
-expired/flushed inodes.
-
-This table is stored compressed into metadata blocks.  A second index table is
-used to locate these.  This second index table for speed of access (and because
-it is small) is read at mount time and cached in memory.
-
-3.8 Xattr table
----------------
-
-The xattr table contains extended attributes for each inode.  The xattrs
-for each inode are stored in a list, each list entry containing a type,
-name and value field.  The type field encodes the xattr prefix
-("user.", "trusted." etc) and it also encodes how the name/value fields
-should be interpreted.  Currently the type indicates whether the value
-is stored inline (in which case the value field contains the xattr value),
-or if it is stored out of line (in which case the value field stores a
-reference to where the actual value is stored).  This allows large values
-to be stored out of line improving scanning and lookup performance and it
-also allows values to be de-duplicated, the value being stored once, and
-all other occurrences holding an out of line reference to that value.
-
-The xattr lists are packed into compressed 8K metadata blocks.
-To reduce overhead in inodes, rather than storing the on-disk
-location of the xattr list inside each inode, a 32-bit xattr id
-is stored.  This xattr id is mapped into the location of the xattr
-list using a second xattr id lookup table.
-
-4. TODOS AND OUTSTANDING ISSUES
--------------------------------
-
-4.1 Todo list
--------------
-
-Implement ACL support.
-
-4.2 Squashfs internal cache
----------------------------
-
-Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
-recently accessed data Squashfs uses two small metadata and fragment caches.
-
-The cache is not used for file datablocks, these are decompressed and cached in
-the page-cache in the normal way.  The cache is used to temporarily cache
-fragment and metadata blocks which have been read as a result of a metadata
-(i.e. inode or directory) or fragment access.  Because metadata and fragments
-are packed together into blocks (to gain greater compression) the read of a
-particular piece of metadata or fragment will retrieve other metadata/fragments
-which have been packed with it, these because of locality-of-reference may be
-read in the near future. Temporarily caching them ensures they are available
-for near future access without requiring an additional read and decompress.
-
-In the future this internal cache may be replaced with an implementation which
-uses the kernel page cache.  Because the page cache operates on page sized
-units this may introduce additional complexity in terms of locking and
-associated race conditions.
-- 
cgit 


From 86beb976700b26576fe522a94a0b3a4e3d5ce424 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:24 +0100
Subject: docs: filesystems: convert sysfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust document and section titles;
- use :field: markup;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5c480dcb467315b5df6e25372a65e473b585c36d.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/sysfs.rst | 418 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/sysfs.txt | 408 -----------------------------------
 3 files changed, 419 insertions(+), 408 deletions(-)
 create mode 100644 Documentation/filesystems/sysfs.rst
 delete mode 100644 Documentation/filesystems/sysfs.txt
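
The "Attributes" section of the new sysfs.rst asks for ASCII text files,
preferably with one value per file.  Below is a minimal sketch of what that
convention means for a userspace consumer, assuming the attribute holds a
single decimal integer.  read_attr_long() is a hypothetical helper, not a
kernel or libc interface.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read an attribute file that holds one decimal integer.
 * Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper for illustration only.
 */
int read_attr_long(const char *path, long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	/* One value per file: parse the first line and nothing else. */
	errno = 0;
	*out = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	return 0;
}
```

Because each file carries a single value, the reader needs no format-specific
parser; an attribute that mixed types or spanned several lines would need one.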

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 97a5f65ae509..bafe92c72433 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -87,5 +87,6 @@ Documentation for filesystem implementations.
    relay
    romfs
    squashfs
+   sysfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/sysfs.rst b/Documentation/filesystems/sysfs.rst
new file mode 100644
index 000000000000..290891c3fecb
--- /dev/null
+++ b/Documentation/filesystems/sysfs.rst
@@ -0,0 +1,418 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================
+sysfs - _The_ filesystem for exporting kernel objects
+=====================================================
+
+Patrick Mochel	<mochel@osdl.org>
+
+Mike Murphy <mamurph@cs.clemson.edu>
+
+:Revised:    16 August 2011
+:Original:   10 January 2003
+
+
+What it is:
+~~~~~~~~~~~
+
+sysfs is a ram-based filesystem initially based on ramfs. It provides
+a means to export kernel data structures, their attributes, and the
+linkages between them to userspace.
+
+sysfs is tied inherently to the kobject infrastructure. Please read
+Documentation/kobject.txt for more information concerning the kobject
+interface.
+
+
+Using sysfs
+~~~~~~~~~~~
+
+sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
+it by doing::
+
+    mount -t sysfs sysfs /sys
+
+
+Directory Creation
+~~~~~~~~~~~~~~~~~~
+
+For every kobject that is registered with the system, a directory is
+created for it in sysfs. That directory is created as a subdirectory
+of the kobject's parent, expressing internal object hierarchies to
+userspace. Top-level directories in sysfs represent the common
+ancestors of object hierarchies; i.e. the subsystems the objects
+belong to.
+
+Sysfs internally stores a pointer to the kobject that implements a
+directory in the kernfs_node object associated with the directory. In
+the past this kobject pointer has been used by sysfs to do reference
+counting directly on the kobject whenever the file is opened or closed.
+With the current sysfs implementation the kobject reference count is
+only modified directly by the function sysfs_schedule_callback().
+
+
+Attributes
+~~~~~~~~~~
+
+Attributes can be exported for kobjects in the form of regular files in
+the filesystem. Sysfs forwards file I/O operations to methods defined
+for the attributes, providing a means to read and write kernel
+attributes.
+
+Attributes should be ASCII text files, preferably with only one value
+per file. It is noted that it may not be efficient to contain only one
+value per file, so it is socially acceptable to express an array of
+values of the same type.
+
+Mixing types, expressing multiple lines of data, and doing fancy
+formatting of data is heavily frowned upon. Doing these things may get
+you publicly humiliated and your code rewritten without notice.
+
+
+An attribute definition is simply::
+
+    struct attribute {
+	    char                    * name;
+	    struct module		*owner;
+	    umode_t                 mode;
+    };
+
+
+    int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
+    void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
+
+
+A bare attribute contains no means to read or write the value of the
+attribute. Subsystems are encouraged to define their own attribute
+structure and wrapper functions for adding and removing attributes for
+a specific object type.
+
+For example, the driver model defines struct device_attribute like::
+
+    struct device_attribute {
+	    struct attribute	attr;
+	    ssize_t (*show)(struct device *dev, struct device_attribute *attr,
+			    char *buf);
+	    ssize_t (*store)(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count);
+    };
+
+    int device_create_file(struct device *, const struct device_attribute *);
+    void device_remove_file(struct device *, const struct device_attribute *);
+
+It also defines this helper for defining device attributes::
+
+    #define DEVICE_ATTR(_name, _mode, _show, _store) \
+    struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
+
+For example, declaring::
+
+    static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
+
+is equivalent to doing::
+
+    static struct device_attribute dev_attr_foo = {
+	    .attr = {
+		    .name = "foo",
+		    .mode = S_IWUSR | S_IRUGO,
+	    },
+	    .show = show_foo,
+	    .store = store_foo,
+    };
+
+Note: as stated in include/linux/kernel.h, "OTHER_WRITABLE?  Generally
+considered a bad idea.", so trying to make a sysfs file writable for
+everyone will fail, reverting to RO mode for "Others".
+
+For the common cases sysfs.h provides convenience macros to make
+defining attributes easier as well as making code more concise and
+readable. The above case could be shortened to::
+
+    static struct device_attribute dev_attr_foo = __ATTR_RW(foo);
+
+The list of helpers available to define your wrapper function is:
+
+__ATTR_RO(name):
+		 assumes default name_show and mode 0444
+__ATTR_WO(name):
+		 assumes a name_store only and is restricted to mode
+                 0200, that is, root write access only.
+__ATTR_RO_MODE(name, mode):
+	         for more restrictive RO access; currently the
+                 only use case is the EFI System Resource Table
+                 (see drivers/firmware/efi/esrt.c)
+__ATTR_RW(name):
+	         assumes default name_show, name_store and setting
+                 mode to 0644.
+__ATTR_NULL:
+	         which sets the name to NULL and is used as end of list
+                 indicator (see: kernel/workqueue.c)
+
+Subsystem-Specific Callbacks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a subsystem defines a new attribute type, it must implement a
+set of sysfs operations for forwarding read and write calls to the
+show and store methods of the attribute owners::
+
+    struct sysfs_ops {
+	    ssize_t (*show)(struct kobject *, struct attribute *, char *);
+	    ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);
+    };
+
+[ Subsystems should have already defined a struct kobj_type as a
+descriptor for this type, which is where the sysfs_ops pointer is
+stored. See the kobject documentation for more information. ]
+
+When a file is read or written, sysfs calls the appropriate method
+for the type. The method then translates the generic struct kobject
+and struct attribute pointers to the appropriate pointer types, and
+calls the associated methods.
+
+
+To illustrate::
+
+    #define to_dev(obj) container_of(obj, struct device, kobj)
+    #define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
+
+    static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+    {
+	    struct device_attribute *dev_attr = to_dev_attr(attr);
+	    struct device *dev = to_dev(kobj);
+	    ssize_t ret = -EIO;
+
+	    if (dev_attr->show)
+		    ret = dev_attr->show(dev, dev_attr, buf);
+	    if (ret >= (ssize_t)PAGE_SIZE) {
+		    printk("dev_attr_show: %pS returned bad count\n",
+				    dev_attr->show);
+	    }
+	    return ret;
+    }
+
+
+
+Reading/Writing Attribute Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To read or write attributes, show() or store() methods must be
+specified when declaring the attribute. The method types should be as
+simple as those defined for device attributes::
+
+    ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
+    ssize_t (*store)(struct device *dev, struct device_attribute *attr,
+		    const char *buf, size_t count);
+
+In other words, they should take only an object, an attribute, and a buffer (plus its length for store()) as parameters.
+
+
+sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
+method. Sysfs will call the method exactly once for each read or
+write. This forces the following behavior on the method
+implementations:
+
+- On read(2), the show() method should fill the entire buffer.
+  Recall that an attribute should only be exporting one value, or an
+  array of similar values, so this shouldn't be that expensive.
+
+  This allows userspace to do partial reads and forward seeks
+  arbitrarily over the entire file at will. If userspace seeks back to
+  zero or does a pread(2) with an offset of '0' the show() method will
+  be called again, rearmed, to fill the buffer.
+
+- On write(2), sysfs expects the entire buffer to be passed during the
+  first write. Sysfs then passes the entire buffer to the store() method.
+  A terminating null is added after the data on stores. This makes
+  functions like sysfs_streq() safe to use.
+
+  When writing sysfs files, userspace processes should first read the
+  entire file, modify the values it wishes to change, then write the
+  entire buffer back.
+
+  Attribute method implementations should operate on an identical
+  buffer when reading and writing values.
+
+Other notes:
+
+- Writing causes the show() method to be rearmed regardless of current
+  file position.
+
+- The buffer will always be PAGE_SIZE bytes in length. On i386, this
+  is 4096.
+
+- show() methods should return the number of bytes printed into the
+  buffer. This is the return value of scnprintf().
+
+- show() must not use snprintf() when formatting the value to be
+  returned to user space. If you can guarantee that an overflow
+  will never happen, you can use sprintf(); otherwise, you must use
+  scnprintf().
+
+- store() should return the number of bytes used from the buffer. If the
+  entire buffer has been used, just return the count argument.
+
+- show() or store() can always return errors. If a bad value comes
+  through, be sure to return an error.
+
+- The object passed to the methods will be pinned in memory via sysfs
+  reference counting its embedded object. However, the physical
+  entity (e.g. device) the object represents may not be present. Be
+  sure to have a way to check this, if necessary.
+
+
+A very simple (and naive) implementation of a device attribute is::
+
+    static ssize_t show_name(struct device *dev, struct device_attribute *attr,
+			    char *buf)
+    {
+	    return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
+    }
+
+    static ssize_t store_name(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count)
+    {
+	    snprintf(dev->name, sizeof(dev->name), "%.*s",
+		    (int)min(count, sizeof(dev->name) - 1), buf);
+	    return count;
+    }
+
+    static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
+
+
+(Note that the real implementation doesn't allow userspace to set the
+name for a device.)
+
+
+Top Level Directory Layout
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The sysfs directory arrangement exposes the relationship of kernel
+data structures.
+
+The top level sysfs directory looks like::
+
+    block/
+    bus/
+    class/
+    dev/
+    devices/
+    firmware/
+    net/
+    fs/
+
+devices/ contains a filesystem representation of the device tree. It maps
+directly to the internal kernel device tree, which is a hierarchy of
+struct device.
+
+bus/ contains a flat directory layout of the various bus types in the
+kernel. Each bus's directory contains two subdirectories::
+
+	devices/
+	drivers/
+
+devices/ contains symlinks for each device discovered in the system
+that point to the device's directory under root/.
+
+drivers/ contains a directory for each device driver that is loaded
+for devices on that particular bus (this assumes that drivers do not
+span multiple bus types).
+
+fs/ contains a directory for some filesystems.  Currently each
+filesystem wanting to export attributes must create its own hierarchy
+below fs/ (see ./fuse.txt for an example).
+
+dev/ contains two directories char/ and block/. Inside these two
+directories there are symlinks named <major>:<minor>.  These symlinks
+point to the sysfs directory for the given device.  /sys/dev provides a
+quick way to lookup the sysfs interface for a device from the result of
+a stat(2) operation.
+
+More information on driver-model specific features can be found in
+Documentation/driver-api/driver-model/.
+
+
+TODO: Finish this section.
+
+
+Current Interfaces
+~~~~~~~~~~~~~~~~~~
+
+The following interface layers currently exist in sysfs:
+
+
+devices (include/linux/device.h)
+--------------------------------
+Structure::
+
+    struct device_attribute {
+	    struct attribute	attr;
+	    ssize_t (*show)(struct device *dev, struct device_attribute *attr,
+			    char *buf);
+	    ssize_t (*store)(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count);
+    };
+
+Declaring::
+
+    DEVICE_ATTR(_name, _mode, _show, _store);
+
+Creation/Removal::
+
+    int device_create_file(struct device *dev, const struct device_attribute * attr);
+    void device_remove_file(struct device *dev, const struct device_attribute * attr);
+
+
+bus drivers (include/linux/device.h)
+------------------------------------
+Structure::
+
+    struct bus_attribute {
+	    struct attribute        attr;
+	    ssize_t (*show)(struct bus_type *, char * buf);
+	    ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
+    };
+
+Declaring::
+
+    static BUS_ATTR_RW(name);
+    static BUS_ATTR_RO(name);
+    static BUS_ATTR_WO(name);
+
+Creation/Removal::
+
+    int bus_create_file(struct bus_type *, struct bus_attribute *);
+    void bus_remove_file(struct bus_type *, struct bus_attribute *);
+
+
+device drivers (include/linux/device.h)
+---------------------------------------
+
+Structure::
+
+    struct driver_attribute {
+	    struct attribute        attr;
+	    ssize_t (*show)(struct device_driver *, char * buf);
+	    ssize_t (*store)(struct device_driver *, const char * buf,
+			    size_t count);
+    };
+
+Declaring::
+
+    DRIVER_ATTR_RO(_name)
+    DRIVER_ATTR_RW(_name)
+
+Creation/Removal::
+
+    int driver_create_file(struct device_driver *, const struct driver_attribute *);
+    void driver_remove_file(struct device_driver *, const struct driver_attribute *);
+
+
+Documentation
+~~~~~~~~~~~~~
+
+The sysfs directory structure and the attributes in each directory define an
+ABI between the kernel and user space. As for any ABI, it is important that
+this ABI is stable and properly documented. All new sysfs attributes must be
+documented in Documentation/ABI. See also Documentation/ABI/README for more
+information.
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
deleted file mode 100644
index ddf15b1b0d5a..000000000000
--- a/Documentation/filesystems/sysfs.txt
+++ /dev/null
@@ -1,408 +0,0 @@
-
-sysfs - _The_ filesystem for exporting kernel objects. 
-
-Patrick Mochel	<mochel@osdl.org>
-Mike Murphy <mamurph@cs.clemson.edu>
-
-Revised:    16 August 2011
-Original:   10 January 2003
-
-
-What it is:
-~~~~~~~~~~~
-
-sysfs is a ram-based filesystem initially based on ramfs. It provides
-a means to export kernel data structures, their attributes, and the 
-linkages between them to userspace. 
-
-sysfs is tied inherently to the kobject infrastructure. Please read
-Documentation/kobject.txt for more information concerning the kobject
-interface. 
-
-
-Using sysfs
-~~~~~~~~~~~
-
-sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
-it by doing:
-
-    mount -t sysfs sysfs /sys 
-
-
-Directory Creation
-~~~~~~~~~~~~~~~~~~
-
-For every kobject that is registered with the system, a directory is
-created for it in sysfs. That directory is created as a subdirectory
-of the kobject's parent, expressing internal object hierarchies to
-userspace. Top-level directories in sysfs represent the common
-ancestors of object hierarchies; i.e. the subsystems the objects
-belong to. 
-
-Sysfs internally stores a pointer to the kobject that implements a
-directory in the kernfs_node object associated with the directory. In
-the past this kobject pointer has been used by sysfs to do reference
-counting directly on the kobject whenever the file is opened or closed.
-With the current sysfs implementation the kobject reference count is
-only modified directly by the function sysfs_schedule_callback().
-
-
-Attributes
-~~~~~~~~~~
-
-Attributes can be exported for kobjects in the form of regular files in
-the filesystem. Sysfs forwards file I/O operations to methods defined
-for the attributes, providing a means to read and write kernel
-attributes.
-
-Attributes should be ASCII text files, preferably with only one value
-per file. It is noted that it may not be efficient to contain only one
-value per file, so it is socially acceptable to express an array of
-values of the same type. 
-
-Mixing types, expressing multiple lines of data, and doing fancy
-formatting of data is heavily frowned upon. Doing these things may get
-you publicly humiliated and your code rewritten without notice. 
-
-
-An attribute definition is simply:
-
-struct attribute {
-        char                    * name;
-        struct module		*owner;
-        umode_t                 mode;
-};
-
-
-int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
-void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
-
-
-A bare attribute contains no means to read or write the value of the
-attribute. Subsystems are encouraged to define their own attribute
-structure and wrapper functions for adding and removing attributes for
-a specific object type. 
-
-For example, the driver model defines struct device_attribute like:
-
-struct device_attribute {
-	struct attribute	attr;
-	ssize_t (*show)(struct device *dev, struct device_attribute *attr,
-			char *buf);
-	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
-			 const char *buf, size_t count);
-};
-
-int device_create_file(struct device *, const struct device_attribute *);
-void device_remove_file(struct device *, const struct device_attribute *);
-
-It also defines this helper for defining device attributes: 
-
-#define DEVICE_ATTR(_name, _mode, _show, _store) \
-struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
-
-For example, declaring
-
-static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
-
-is equivalent to doing:
-
-static struct device_attribute dev_attr_foo = {
-	.attr = {
-		.name = "foo",
-		.mode = S_IWUSR | S_IRUGO,
-	},
-	.show = show_foo,
-	.store = store_foo,
-};
-
-Note as stated in include/linux/kernel.h "OTHER_WRITABLE?  Generally
-considered a bad idea." so trying to set a sysfs file writable for
-everyone will fail reverting to RO mode for "Others".
-
-For the common cases sysfs.h provides convenience macros to make
-defining attributes easier as well as making code more concise and
-readable. The above case could be shortened to:
-
-static struct device_attribute dev_attr_foo = __ATTR_RW(foo);
-
-the list of helpers available to define your wrapper function is:
-__ATTR_RO(name): assumes default name_show and mode 0444
-__ATTR_WO(name): assumes a name_store only and is restricted to mode
-                 0200 that is root write access only.
-__ATTR_RO_MODE(name, mode): fore more restrictive RO access currently
-                 only use case is the EFI System Resource Table
-                 (see drivers/firmware/efi/esrt.c)
-__ATTR_RW(name): assumes default name_show, name_store and setting
-                 mode to 0644.
-__ATTR_NULL: which sets the name to NULL and is used as end of list
-                 indicator (see: kernel/workqueue.c)
-
-Subsystem-Specific Callbacks
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When a subsystem defines a new attribute type, it must implement a
-set of sysfs operations for forwarding read and write calls to the
-show and store methods of the attribute owners. 
-
-struct sysfs_ops {
-        ssize_t (*show)(struct kobject *, struct attribute *, char *);
-        ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);
-};
-
-[ Subsystems should have already defined a struct kobj_type as a
-descriptor for this type, which is where the sysfs_ops pointer is
-stored. See the kobject documentation for more information. ]
-
-When a file is read or written, sysfs calls the appropriate method
-for the type. The method then translates the generic struct kobject
-and struct attribute pointers to the appropriate pointer types, and
-calls the associated methods. 
-
-
-To illustrate:
-
-#define to_dev(obj) container_of(obj, struct device, kobj)
-#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
-
-static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
-                             char *buf)
-{
-        struct device_attribute *dev_attr = to_dev_attr(attr);
-        struct device *dev = to_dev(kobj);
-        ssize_t ret = -EIO;
-
-        if (dev_attr->show)
-                ret = dev_attr->show(dev, dev_attr, buf);
-        if (ret >= (ssize_t)PAGE_SIZE) {
-                printk("dev_attr_show: %pS returned bad count\n",
-                                dev_attr->show);
-        }
-        return ret;
-}
-
-
-
-Reading/Writing Attribute Data
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-To read or write attributes, show() or store() methods must be
-specified when declaring the attribute. The method types should be as
-simple as those defined for device attributes:
-
-ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
-ssize_t (*store)(struct device *dev, struct device_attribute *attr,
-                 const char *buf, size_t count);
-
-IOW, they should take only an object, an attribute, and a buffer as parameters.
-
-
-sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
-method. Sysfs will call the method exactly once for each read or
-write. This forces the following behavior on the method
-implementations: 
-
-- On read(2), the show() method should fill the entire buffer. 
-  Recall that an attribute should only be exporting one value, or an
-  array of similar values, so this shouldn't be that expensive. 
-
-  This allows userspace to do partial reads and forward seeks
-  arbitrarily over the entire file at will. If userspace seeks back to
-  zero or does a pread(2) with an offset of '0' the show() method will
-  be called again, rearmed, to fill the buffer.
-
-- On write(2), sysfs expects the entire buffer to be passed during the
-  first write. Sysfs then passes the entire buffer to the store() method.
-  A terminating null is added after the data on stores. This makes
-  functions like sysfs_streq() safe to use.
-
-  When writing sysfs files, userspace processes should first read the
-  entire file, modify the values it wishes to change, then write the
-  entire buffer back. 
-
-  Attribute method implementations should operate on an identical
-  buffer when reading and writing values. 
-
-Other notes:
-
-- Writing causes the show() method to be rearmed regardless of current
-  file position.
-
-- The buffer will always be PAGE_SIZE bytes in length. On i386, this
-  is 4096. 
-
-- show() methods should return the number of bytes printed into the
-  buffer. This is the return value of scnprintf().
-
-- show() must not use snprintf() when formatting the value to be
-  returned to user space. If you can guarantee that an overflow
-  will never happen you can use sprintf() otherwise you must use
-  scnprintf().
-
-- store() should return the number of bytes used from the buffer. If the
-  entire buffer has been used, just return the count argument.
-
-- show() or store() can always return errors. If a bad value comes
-  through, be sure to return an error.
-
-- The object passed to the methods will be pinned in memory via sysfs
-  referencing counting its embedded object. However, the physical 
-  entity (e.g. device) the object represents may not be present. Be 
-  sure to have a way to check this, if necessary. 
-
-
-A very simple (and naive) implementation of a device attribute is:
-
-static ssize_t show_name(struct device *dev, struct device_attribute *attr,
-                         char *buf)
-{
-	return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
-}
-
-static ssize_t store_name(struct device *dev, struct device_attribute *attr,
-                          const char *buf, size_t count)
-{
-        snprintf(dev->name, sizeof(dev->name), "%.*s",
-                 (int)min(count, sizeof(dev->name) - 1), buf);
-	return count;
-}
-
-static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
-
-
-(Note that the real implementation doesn't allow userspace to set the 
-name for a device.)
-
-
-Top Level Directory Layout
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The sysfs directory arrangement exposes the relationship of kernel
-data structures. 
-
-The top level sysfs directory looks like:
-
-block/
-bus/
-class/
-dev/
-devices/
-firmware/
-net/
-fs/
-
-devices/ contains a filesystem representation of the device tree. It maps
-directly to the internal kernel device tree, which is a hierarchy of
-struct device. 
-
-bus/ contains flat directory layout of the various bus types in the
-kernel. Each bus's directory contains two subdirectories:
-
-	devices/
-	drivers/
-
-devices/ contains symlinks for each device discovered in the system
-that point to the device's directory under root/.
-
-drivers/ contains a directory for each device driver that is loaded
-for devices on that particular bus (this assumes that drivers do not
-span multiple bus types).
-
-fs/ contains a directory for some filesystems.  Currently each
-filesystem wanting to export attributes must create its own hierarchy
-below fs/ (see ./fuse.txt for an example).
-
-dev/ contains two directories char/ and block/. Inside these two
-directories there are symlinks named <major>:<minor>.  These symlinks
-point to the sysfs directory for the given device.  /sys/dev provides a
-quick way to lookup the sysfs interface for a device from the result of
-a stat(2) operation.
-
-More information can driver-model specific features can be found in
-Documentation/driver-api/driver-model/.
-
-
-TODO: Finish this section.
-
-
-Current Interfaces
-~~~~~~~~~~~~~~~~~~
-
-The following interface layers currently exist in sysfs:
-
-
-- devices (include/linux/device.h)
-----------------------------------
-Structure:
-
-struct device_attribute {
-	struct attribute	attr;
-	ssize_t (*show)(struct device *dev, struct device_attribute *attr,
-			char *buf);
-	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
-			 const char *buf, size_t count);
-};
-
-Declaring:
-
-DEVICE_ATTR(_name, _mode, _show, _store);
-
-Creation/Removal:
-
-int device_create_file(struct device *dev, const struct device_attribute * attr);
-void device_remove_file(struct device *dev, const struct device_attribute * attr);
-
-
-- bus drivers (include/linux/device.h)
---------------------------------------
-Structure:
-
-struct bus_attribute {
-        struct attribute        attr;
-        ssize_t (*show)(struct bus_type *, char * buf);
-        ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
-};
-
-Declaring:
-
-static BUS_ATTR_RW(name);
-static BUS_ATTR_RO(name);
-static BUS_ATTR_WO(name);
-
-Creation/Removal:
-
-int bus_create_file(struct bus_type *, struct bus_attribute *);
-void bus_remove_file(struct bus_type *, struct bus_attribute *);
-
-
-- device drivers (include/linux/device.h)
------------------------------------------
-
-Structure:
-
-struct driver_attribute {
-        struct attribute        attr;
-        ssize_t (*show)(struct device_driver *, char * buf);
-        ssize_t (*store)(struct device_driver *, const char * buf,
-                         size_t count);
-};
-
-Declaring:
-
-DRIVER_ATTR_RO(_name)
-DRIVER_ATTR_RW(_name)
-
-Creation/Removal:
-
-int driver_create_file(struct device_driver *, const struct driver_attribute *);
-void driver_remove_file(struct device_driver *, const struct driver_attribute *);
-
-
-Documentation
-~~~~~~~~~~~~~
-
-The sysfs directory structure and the attributes in each directory define an
-ABI between the kernel and user space. As for any ABI, it is important that
-this ABI is stable and properly documented. All new sysfs attributes must be
-documented in Documentation/ABI. See also Documentation/ABI/README for more
-information.
-- 
cgit 


From 826a613d3f81695022f324a5cb84fe73ec09e51d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:25 +0100
Subject: docs: filesystems: convert sysv-fs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5b96a6efba95773af439ab25a7dbe4d0edf8c867.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst   |   1 +
 Documentation/filesystems/sysv-fs.rst | 264 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/sysv-fs.txt | 197 -------------------------
 3 files changed, 265 insertions(+), 197 deletions(-)
 create mode 100644 Documentation/filesystems/sysv-fs.rst
 delete mode 100644 Documentation/filesystems/sysv-fs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index bafe92c72433..d583b8b35196 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -88,5 +88,6 @@ Documentation for filesystem implementations.
    romfs
    squashfs
    sysfs
+   sysv-fs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/sysv-fs.rst b/Documentation/filesystems/sysv-fs.rst
new file mode 100644
index 000000000000..89e40911ad7c
--- /dev/null
+++ b/Documentation/filesystems/sysv-fs.rst
@@ -0,0 +1,264 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+SystemV Filesystem
+==================
+
+It implements all of
+  - Xenix FS,
+  - SystemV/386 FS,
+  - Coherent FS.
+
+To install:
+
+* Answer the 'System V and Coherent filesystem support' question with 'y'
+  when configuring the kernel.
+* To mount a disk or a partition, use::
+
+    mount [-r] -t sysv device mountpoint
+
+  The file system type names::
+
+               -t sysv
+               -t xenix
+               -t coherent
+
+  may be used interchangeably, but the last two will eventually disappear.
+
+Bugs in the present implementation:
+
+- Coherent FS:
+
+  - The "free list interleave" n:m is currently ignored.
+  - Only file systems with no filesystem name and no pack name are recognized.
+    (See Coherent "man mkfs" for a description of these features.)
+
+- SystemV Release 2 FS:
+
+  The superblock is only searched for in blocks 9, 15 and 18, which
+  correspond to the beginning of track 1 on floppy disks. There is no
+  support for this FS on hard disks yet.
+
+
+These filesystems are rather similar. Here is a comparison with Minix FS:
+
+* Linux fdisk reports on partitions
+
+  - Minix FS     0x81 Linux/Minix
+  - Xenix FS     ??
+  - SystemV FS   ??
+  - Coherent FS  0x08 AIX bootable
+
+* Size of a block or zone (data allocation unit on disk)
+
+  - Minix FS     1024
+  - Xenix FS     1024 (also 512 ??)
+  - SystemV FS   1024 (also 512 and 2048)
+  - Coherent FS   512
+
+* General layout: all have one boot block, one super block and
+  separate areas for inodes and for directories/data.
+  On SystemV Release 2 FS (e.g. Microport) the first track is reserved and
+  all the block numbers (including the super block) are offset by one track.
+
+* Byte ordering of "short" (16 bit entities) on disk:
+
+  - Minix FS     little endian  0 1
+  - Xenix FS     little endian  0 1
+  - SystemV FS   little endian  0 1
+  - Coherent FS  little endian  0 1
+
+  Of course, this affects only the file system, not the data of files on it!
+
+* Byte ordering of "long" (32 bit entities) on disk:
+
+  - Minix FS     little endian  0 1 2 3
+  - Xenix FS     little endian  0 1 2 3
+  - SystemV FS   little endian  0 1 2 3
+  - Coherent FS  PDP-11         2 3 0 1
+
+  Of course, this affects only the file system, not the data of files on it!
+
+* Inode on disk: "short", 0 means non-existent, the root dir ino is:
+
+  =================================  ==
+  Minix FS                            1
+  Xenix FS, SystemV FS, Coherent FS   2
+  =================================  ==
+
+* Maximum number of hard links to a file:
+
+  ===========  =========
+  Minix FS     250
+  Xenix FS     ??
+  SystemV FS   ??
+  Coherent FS  >=10000
+  ===========  =========
+
+* Free inode management:
+
+  - Minix FS
+      a bitmap
+  - Xenix FS, SystemV FS, Coherent FS
+      There is a cache of a certain number of free inodes in the super-block.
+      When it is exhausted, new free inodes are found using a linear search.
+
+* Free block management:
+
+  - Minix FS
+      a bitmap
+  - Xenix FS, SystemV FS, Coherent FS
+      Free blocks are organized in a "free list". Maybe a misleading term,
+      since it is not true that every free block contains a pointer to
+      the next free block. Rather, the free blocks are organized in chunks
+      of limited size, and every now and then a free block contains pointers
+      to the free blocks pertaining to the next chunk; the first of these
+      contains pointers and so on. The list terminates with a "block number"
+      0 on Xenix FS and SystemV FS, with a block zeroed out on Coherent FS.
+
+* Super-block location:
+
+  ===========  ==========================
+  Minix FS     block 1 = bytes 1024..2047
+  Xenix FS     block 1 = bytes 1024..2047
+  SystemV FS   bytes 512..1023
+  Coherent FS  block 1 = bytes 512..1023
+  ===========  ==========================
+
+* Super-block layout:
+
+  - Minix FS::
+
+                    unsigned short s_ninodes;
+                    unsigned short s_nzones;
+                    unsigned short s_imap_blocks;
+                    unsigned short s_zmap_blocks;
+                    unsigned short s_firstdatazone;
+                    unsigned short s_log_zone_size;
+                    unsigned long s_max_size;
+                    unsigned short s_magic;
+
+  - Xenix FS, SystemV FS, Coherent FS::
+
+                    unsigned short s_firstdatazone;
+                    unsigned long  s_nzones;
+                    unsigned short s_fzone_count;
+                    unsigned long  s_fzones[NICFREE];
+                    unsigned short s_finode_count;
+                    unsigned short s_finodes[NICINOD];
+                    char           s_flock;
+                    char           s_ilock;
+                    char           s_modified;
+                    char           s_rdonly;
+                    unsigned long  s_time;
+                    short          s_dinfo[4]; -- SystemV FS only
+                    unsigned long  s_free_zones;
+                    unsigned short s_free_inodes;
+                    short          s_dinfo[4]; -- Xenix FS only
+                    unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only
+                    char           s_fname[6];
+                    char           s_fpack[6];
+
+    then they differ considerably:
+
+        Xenix FS::
+
+                    char           s_clean;
+                    char           s_fill[371];
+                    long           s_magic;
+                    long           s_type;
+
+        SystemV FS::
+
+                    long           s_fill[12 or 14];
+                    long           s_state;
+                    long           s_magic;
+                    long           s_type;
+
+        Coherent FS::
+
+                    unsigned long  s_unique;
+
+    Note that Coherent FS has no magic.
+
+* Inode layout:
+
+  - Minix FS::
+
+                    unsigned short i_mode;
+                    unsigned short i_uid;
+                    unsigned long  i_size;
+                    unsigned long  i_time;
+                    unsigned char  i_gid;
+                    unsigned char  i_nlinks;
+                    unsigned short i_zone[7+1+1];
+
+  - Xenix FS, SystemV FS, Coherent FS::
+
+                    unsigned short i_mode;
+                    unsigned short i_nlink;
+                    unsigned short i_uid;
+                    unsigned short i_gid;
+                    unsigned long  i_size;
+                    unsigned char  i_zone[3*(10+1+1+1)];
+                    unsigned long  i_atime;
+                    unsigned long  i_mtime;
+                    unsigned long  i_ctime;
+
+
+* Regular file data blocks are organized as
+
+  - Minix FS:
+
+             - 7 direct blocks
+             - 1 indirect block (pointers to blocks)
+             - 1 double-indirect block (pointer to pointers to blocks)
+
+  - Xenix FS, SystemV FS, Coherent FS:
+
+             - 10 direct blocks
+             -  1 indirect block (pointers to blocks)
+             -  1 double-indirect block (pointer to pointers to blocks)
+             -  1 triple-indirect block (pointer to pointers to pointers to blocks)
+
+
+  ===========  ==========   ================
+               Inode size   inodes per block
+  ===========  ==========   ================
+  Minix FS        32        32
+  Xenix FS        64        16
+  SystemV FS      64        16
+  Coherent FS     64        8
+  ===========  ==========   ================
+
+* Directory entry on disk
+
+  - Minix FS::
+
+                    unsigned short inode;
+                    char name[14/30];
+
+  - Xenix FS, SystemV FS, Coherent FS::
+
+                    unsigned short inode;
+                    char name[14];
+
+  ===========    ==============    =====================
+                 Dir entry size    dir entries per block
+  ===========    ==============    =====================
+  Minix FS       16/32             64/32
+  Xenix FS       16                64
+  SystemV FS     16                64
+  Coherent FS    16                32
+  ===========    ==============    =====================
+
+* How to implement symbolic links such that the host fsck doesn't scream:
+
+  - Minix FS     normal
+  - Xenix FS     kludge: as regular files with  chmod 1000
+  - SystemV FS   ??
+  - Coherent FS  kludge: as regular files with  chmod 1000
+
+
+Notation: We often speak of a "block" but mean a zone (the allocation unit)
+and not the disk driver's notion of "block".
diff --git a/Documentation/filesystems/sysv-fs.txt b/Documentation/filesystems/sysv-fs.txt
deleted file mode 100644
index 253b50d1328e..000000000000
--- a/Documentation/filesystems/sysv-fs.txt
+++ /dev/null
@@ -1,197 +0,0 @@
-It implements all of
-  - Xenix FS,
-  - SystemV/386 FS,
-  - Coherent FS.
-
-To install:
-* Answer the 'System V and Coherent filesystem support' question with 'y'
-  when configuring the kernel.
-* To mount a disk or a partition, use
-    mount [-r] -t sysv device mountpoint
-  The file system type names
-               -t sysv
-               -t xenix
-               -t coherent
-  may be used interchangeably, but the last two will eventually disappear.
-
-Bugs in the present implementation:
-- Coherent FS:
-  - The "free list interleave" n:m is currently ignored.
-  - Only file systems with no filesystem name and no pack name are recognized.
-  (See Coherent "man mkfs" for a description of these features.)
-- SystemV Release 2 FS:
-  The superblock is only searched in the blocks 9, 15, 18, which
-  corresponds to the beginning of track 1 on floppy disks. No support
-  for this FS on hard disk yet.
-
-
-These filesystems are rather similar. Here is a comparison with Minix FS:
-
-* Linux fdisk reports on partitions
-  - Minix FS     0x81 Linux/Minix
-  - Xenix FS     ??
-  - SystemV FS   ??
-  - Coherent FS  0x08 AIX bootable
-
-* Size of a block or zone (data allocation unit on disk)
-  - Minix FS     1024
-  - Xenix FS     1024 (also 512 ??)
-  - SystemV FS   1024 (also 512 and 2048)
-  - Coherent FS   512
-
-* General layout: all have one boot block, one super block and
-  separate areas for inodes and for directories/data.
-  On SystemV Release 2 FS (e.g. Microport) the first track is reserved and
-  all the block numbers (including the super block) are offset by one track.
-
-* Byte ordering of "short" (16 bit entities) on disk:
-  - Minix FS     little endian  0 1
-  - Xenix FS     little endian  0 1
-  - SystemV FS   little endian  0 1
-  - Coherent FS  little endian  0 1
-  Of course, this affects only the file system, not the data of files on it!
-
-* Byte ordering of "long" (32 bit entities) on disk:
-  - Minix FS     little endian  0 1 2 3
-  - Xenix FS     little endian  0 1 2 3
-  - SystemV FS   little endian  0 1 2 3
-  - Coherent FS  PDP-11         2 3 0 1
-  Of course, this affects only the file system, not the data of files on it!
-
-* Inode on disk: "short", 0 means non-existent, the root dir ino is:
-  - Minix FS                            1
-  - Xenix FS, SystemV FS, Coherent FS   2
-
-* Maximum number of hard links to a file:
-  - Minix FS     250
-  - Xenix FS     ??
-  - SystemV FS   ??
-  - Coherent FS  >=10000
-
-* Free inode management:
-  - Minix FS                             a bitmap
-  - Xenix FS, SystemV FS, Coherent FS
-      There is a cache of a certain number of free inodes in the super-block.
-      When it is exhausted, new free inodes are found using a linear search.
-
-* Free block management:
-  - Minix FS                             a bitmap
-  - Xenix FS, SystemV FS, Coherent FS
-      Free blocks are organized in a "free list". Maybe a misleading term,
-      since it is not true that every free block contains a pointer to
-      the next free block. Rather, the free blocks are organized in chunks
-      of limited size, and every now and then a free block contains pointers
-      to the free blocks pertaining to the next chunk; the first of these
-      contains pointers and so on. The list terminates with a "block number"
-      0 on Xenix FS and SystemV FS, with a block zeroed out on Coherent FS.
-
-* Super-block location:
-  - Minix FS     block 1 = bytes 1024..2047
-  - Xenix FS     block 1 = bytes 1024..2047
-  - SystemV FS   bytes 512..1023
-  - Coherent FS  block 1 = bytes 512..1023
-
-* Super-block layout:
-  - Minix FS
-                    unsigned short s_ninodes;
-                    unsigned short s_nzones;
-                    unsigned short s_imap_blocks;
-                    unsigned short s_zmap_blocks;
-                    unsigned short s_firstdatazone;
-                    unsigned short s_log_zone_size;
-                    unsigned long s_max_size;
-                    unsigned short s_magic;
-  - Xenix FS, SystemV FS, Coherent FS
-                    unsigned short s_firstdatazone;
-                    unsigned long  s_nzones;
-                    unsigned short s_fzone_count;
-                    unsigned long  s_fzones[NICFREE];
-                    unsigned short s_finode_count;
-                    unsigned short s_finodes[NICINOD];
-                    char           s_flock;
-                    char           s_ilock;
-                    char           s_modified;
-                    char           s_rdonly;
-                    unsigned long  s_time;
-                    short          s_dinfo[4]; -- SystemV FS only
-                    unsigned long  s_free_zones;
-                    unsigned short s_free_inodes;
-                    short          s_dinfo[4]; -- Xenix FS only
-                    unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only
-                    char           s_fname[6];
-                    char           s_fpack[6];
-    then they differ considerably:
-        Xenix FS
-                    char           s_clean;
-                    char           s_fill[371];
-                    long           s_magic;
-                    long           s_type;
-        SystemV FS
-                    long           s_fill[12 or 14];
-                    long           s_state;
-                    long           s_magic;
-                    long           s_type;
-        Coherent FS
-                    unsigned long  s_unique;
-    Note that Coherent FS has no magic.
-
-* Inode layout:
-  - Minix FS
-                    unsigned short i_mode;
-                    unsigned short i_uid;
-                    unsigned long  i_size;
-                    unsigned long  i_time;
-                    unsigned char  i_gid;
-                    unsigned char  i_nlinks;
-                    unsigned short i_zone[7+1+1];
-  - Xenix FS, SystemV FS, Coherent FS
-                    unsigned short i_mode;
-                    unsigned short i_nlink;
-                    unsigned short i_uid;
-                    unsigned short i_gid;
-                    unsigned long  i_size;
-                    unsigned char  i_zone[3*(10+1+1+1)];
-                    unsigned long  i_atime;
-                    unsigned long  i_mtime;
-                    unsigned long  i_ctime;
-
-* Regular file data blocks are organized as
-  - Minix FS
-               7 direct blocks
-               1 indirect block (pointers to blocks)
-               1 double-indirect block (pointer to pointers to blocks)
-  - Xenix FS, SystemV FS, Coherent FS
-              10 direct blocks
-               1 indirect block (pointers to blocks)
-               1 double-indirect block (pointer to pointers to blocks)
-               1 triple-indirect block (pointer to pointers to pointers to blocks)
-
-* Inode size, inodes per block
-  - Minix FS        32   32
-  - Xenix FS        64   16
-  - SystemV FS      64   16
-  - Coherent FS     64    8
-
-* Directory entry on disk
-  - Minix FS
-                    unsigned short inode;
-                    char name[14/30];
-  - Xenix FS, SystemV FS, Coherent FS
-                    unsigned short inode;
-                    char name[14];
-
-* Dir entry size, dir entries per block
-  - Minix FS     16/32    64/32
-  - Xenix FS     16       64
-  - SystemV FS   16       64
-  - Coherent FS  16       32
-
-* How to implement symbolic links such that the host fsck doesn't scream:
-  - Minix FS     normal
-  - Xenix FS     kludge: as regular files with  chmod 1000
-  - SystemV FS   ??
-  - Coherent FS  kludge: as regular files with  chmod 1000
-
-
-Notation: We often speak of a "block" but mean a zone (the allocation unit)
-and not the disk driver's notion of "block".
-- 
cgit 


From 7e7cd458b8105b02e69e3af2ef4cd186326d7f84 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:26 +0100
Subject: docs: filesystems: convert tmpfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use :field: markup;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/30397a47a78ca59760fbc0fc5f50c5f1002d487a.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/tmpfs.rst | 163 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/tmpfs.txt | 149 --------------------------------
 3 files changed, 164 insertions(+), 149 deletions(-)
 create mode 100644 Documentation/filesystems/tmpfs.rst
 delete mode 100644 Documentation/filesystems/tmpfs.txt
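
Reviewer note, not part of the patch: the converted tmpfs.rst describes
two small value syntaxes, the k/m/g suffixes on the sizing options and
the NodeList format used by mpol=. The Python sketch below only
illustrates those two syntaxes as the text words them. It is not the
kernel's parser; the function names are hypothetical, accepting
upper-case suffixes is an assumption taken from the size=10G example,
and the % suffix of the size option is deliberately left out.

```python
import re

# Binary multipliers, matching the document's example where
# nr_inodes=10k yields 10240 inodes.
_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Convert a value such as '10G' or '10k' to an integer (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size value: {value!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES.get(suffix, 1)


def parse_nodelist(text):
    """Expand a NodeList such as '0-3,5,7,9-15' into sorted node numbers."""
    nodes = set()
    for part in text.split(","):
        low, sep, high = part.partition("-")
        first = int(low)                      # raises ValueError on empty parts
        last = int(high) if sep else first    # a single number is a one-node range
        if last < first:
            raise ValueError(f"range out of order: {part!r}")
        nodes.update(range(first, last + 1))
    return sorted(nodes)
```

For instance, parse_nodelist("0-3,5,7,9-15") expands the document's own
mpol=bind example into thirteen node numbers, and parse_size("10k")
gives the 10240 quoted in the closing mount example.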

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index d583b8b35196..27d37e7712da 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,5 +89,6 @@ Documentation for filesystem implementations.
    squashfs
    sysfs
    sysv-fs
+   tmpfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
new file mode 100644
index 000000000000..4e95929301a5
--- /dev/null
+++ b/Documentation/filesystems/tmpfs.rst
@@ -0,0 +1,163 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
+Tmpfs
+=====
+
+Tmpfs is a file system which keeps all files in virtual memory.
+
+
+Everything in tmpfs is temporary in the sense that no files will be
+created on your hard drive. If you unmount a tmpfs instance,
+everything stored therein is lost.
+
+tmpfs puts everything into the kernel internal caches and grows and
+shrinks to accommodate the files it contains and is able to swap
+unneeded pages out to swap space. It has maximum size limits which can
+be adjusted on the fly via 'mount -o remount ...'
+
+If you compare it to ramfs (which was the template to create tmpfs)
+you gain swapping and limit checking. Another similar thing is the RAM
+disk (/dev/ram*), which simulates a fixed size hard disk in physical
+RAM, where you have to create an ordinary filesystem on top. Ramdisks
+cannot swap and you do not have the possibility to resize them.
+
+Since tmpfs lives completely in the page cache and on swap, all tmpfs
+pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
+free(1). Notice that these counters also include shared memory
+(shmem, see ipcs(1)). The most reliable way to get the count is
+using df(1) and du(1).
+
+tmpfs has the following uses:
+
+1) There is always a kernel internal mount which you will not see at
+   all. This is used for shared anonymous mappings and SYSV shared
+   memory.
+
+   This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
+   set, the user visible part of tmpfs is not built. But the internal
+   mechanisms are always present.
+
+2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
+   POSIX shared memory (shm_open, shm_unlink). Adding the following
+   line to /etc/fstab should take care of this::
+
+	tmpfs	/dev/shm	tmpfs	defaults	0 0
+
+   Remember to create the directory that you intend to mount tmpfs on
+   if necessary.
+
+   This mount is _not_ needed for SYSV shared memory. The internal
+   mount is used for that. (In the 2.3 kernel versions it was
+   necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
+   shared memory.)
+
+3) Some people (including me) find it very convenient to mount it
+   e.g. on /tmp and /var/tmp and have a big swap partition. And now
+   loop mounts of tmpfs files do work, so mkinitrd shipped by most
+   distributions should succeed with a tmpfs /tmp.
+
+4) And probably a lot more I do not know about :-)
+
+
+tmpfs has three mount options for sizing:
+
+=========  ============================================================
+size       The limit of allocated bytes for this tmpfs instance. The
+           default is half of your physical RAM without swap. If you
+           oversize your tmpfs instances the machine will deadlock
+           since the OOM handler will not be able to free that memory.
+nr_blocks  The same as size, but in blocks of PAGE_SIZE.
+nr_inodes  The maximum number of inodes for this instance. The default
+           is half of the number of your physical RAM pages, or (on a
+           machine with highmem) the number of lowmem RAM pages,
+           whichever is the lower.
+=========  ============================================================
+
+These parameters accept a suffix k, m or g for kilo, mega and giga and
+can be changed on remount.  The size parameter also accepts a suffix %
+to limit this tmpfs instance to that percentage of your physical RAM:
+the default, when neither size nor nr_blocks is specified, is size=50%
+
+If nr_blocks=0 (or size=0), blocks will not be limited in that instance;
+if nr_inodes=0, inodes will not be limited.  It is generally unwise to
+mount with such options, since it allows any user with write access to
+use up all the memory on the machine; but enhances the scalability of
+that instance in a system with many cpus making intensive use of it.
+
+
+tmpfs has a mount option to set the NUMA memory allocation policy for
+all files in that instance (if CONFIG_NUMA is enabled) - which can be
+adjusted on the fly via 'mount -o remount ...'
+
+======================== ==============================================
+mpol=default             use the process allocation policy
+                         (see set_mempolicy(2))
+mpol=prefer:Node         prefers to allocate memory from the given Node
+mpol=bind:NodeList       allocates memory only from nodes in NodeList
+mpol=interleave          prefers to allocate from each node in turn
+mpol=interleave:NodeList allocates from each node of NodeList in turn
+mpol=local               prefers to allocate memory from the local node
+======================== ==============================================
+
+NodeList format is a comma-separated list of decimal numbers and ranges,
+a range being two hyphen-separated decimal numbers, the smallest and
+largest node numbers in the range.  For example, mpol=bind:0-3,5,7,9-15
+
+A memory policy with a valid NodeList will be saved, as specified, for
+use at file creation time.  When a task allocates a file in the file
+system, the mount option memory policy will be applied with a NodeList,
+if any, modified by the calling task's cpuset constraints
+[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags,
+listed below.  If the resulting NodeList is the empty set, the effective
+memory policy for the file will revert to "default" policy.
+
+NUMA memory allocation policies have optional flags that can be used in
+conjunction with their modes.  These optional flags can be specified
+when tmpfs is mounted by appending them to the mode before the NodeList.
+See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of
+all available memory allocation policy mode flags and their effect on
+memory policy.
+
+::
+
+	=static		is equivalent to	MPOL_F_STATIC_NODES
+	=relative	is equivalent to	MPOL_F_RELATIVE_NODES
+
+For example, mpol=bind=static:NodeList is the equivalent of an
+allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES.
+
+Note that trying to mount a tmpfs with an mpol option will fail if the
+running kernel does not support NUMA; and will fail if its nodelist
+specifies a node which is not online.  If your system relies on that
+tmpfs being mounted, but from time to time runs a kernel built without
+NUMA capability (perhaps a safe recovery kernel), or with fewer nodes
+online, then it is advisable to omit the mpol option from automatic
+mount options.  It can be added later, when the tmpfs is already mounted
+on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
+
+
+To specify the initial root directory you can use the following mount
+options:
+
+====	==================================
+mode	The permissions as an octal number
+uid	The user id
+gid	The group id
+====	==================================
+
+These options do not have any effect on remount. You can change these
+parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
+
+
+So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
+will give you a tmpfs instance on /mytmpfs which can allocate 10GB of
+RAM/swap in 10240 inodes and is accessible only by root.
+
+
+:Author:
+   Christoph Rohland <cr@sap.com>, 1.12.01
+:Updated:
+   Hugh Dickins, 4 June 2007
+:Updated:
+   KOSAKI Motohiro, 16 Mar 2010
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
deleted file mode 100644
index 5ecbc03e6b2f..000000000000
--- a/Documentation/filesystems/tmpfs.txt
+++ /dev/null
@@ -1,149 +0,0 @@
-Tmpfs is a file system which keeps all files in virtual memory.
-
-
-Everything in tmpfs is temporary in the sense that no files will be
-created on your hard drive. If you unmount a tmpfs instance,
-everything stored therein is lost.
-
-tmpfs puts everything into the kernel internal caches and grows and
-shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space. It has maximum size limits which can
-be adjusted on the fly via 'mount -o remount ...'
-
-If you compare it to ramfs (which was the template to create tmpfs)
-you gain swapping and limit checking. Another similar thing is the RAM
-disk (/dev/ram*), which simulates a fixed size hard disk in physical
-RAM, where you have to create an ordinary filesystem on top. Ramdisks
-cannot swap and you do not have the possibility to resize them. 
-
-Since tmpfs lives completely in the page cache and on swap, all tmpfs
-pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
-free(1). Notice that these counters also include shared memory
-(shmem, see ipcs(1)). The most reliable way to get the count is
-using df(1) and du(1).
-
-tmpfs has the following uses:
-
-1) There is always a kernel internal mount which you will not see at
-   all. This is used for shared anonymous mappings and SYSV shared
-   memory. 
-
-   This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
-   set, the user visible part of tmpfs is not build. But the internal
-   mechanisms are always present.
-
-2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
-   POSIX shared memory (shm_open, shm_unlink). Adding the following
-   line to /etc/fstab should take care of this:
-
-	tmpfs	/dev/shm	tmpfs	defaults	0 0
-
-   Remember to create the directory that you intend to mount tmpfs on
-   if necessary.
-
-   This mount is _not_ needed for SYSV shared memory. The internal
-   mount is used for that. (In the 2.3 kernel versions it was
-   necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
-   shared memory)
-
-3) Some people (including me) find it very convenient to mount it
-   e.g. on /tmp and /var/tmp and have a big swap partition. And now
-   loop mounts of tmpfs files do work, so mkinitrd shipped by most
-   distributions should succeed with a tmpfs /tmp.
-
-4) And probably a lot more I do not know about :-)
-
-
-tmpfs has three mount options for sizing:
-
-size:      The limit of allocated bytes for this tmpfs instance. The 
-           default is half of your physical RAM without swap. If you
-           oversize your tmpfs instances the machine will deadlock
-           since the OOM handler will not be able to free that memory.
-nr_blocks: The same as size, but in blocks of PAGE_SIZE.
-nr_inodes: The maximum number of inodes for this instance. The default
-           is half of the number of your physical RAM pages, or (on a
-           machine with highmem) the number of lowmem RAM pages,
-           whichever is the lower.
-
-These parameters accept a suffix k, m or g for kilo, mega and giga and
-can be changed on remount.  The size parameter also accepts a suffix %
-to limit this tmpfs instance to that percentage of your physical RAM:
-the default, when neither size nor nr_blocks is specified, is size=50%
-
-If nr_blocks=0 (or size=0), blocks will not be limited in that instance;
-if nr_inodes=0, inodes will not be limited.  It is generally unwise to
-mount with such options, since it allows any user with write access to
-use up all the memory on the machine; but enhances the scalability of
-that instance in a system with many cpus making intensive use of it.
-
-
-tmpfs has a mount option to set the NUMA memory allocation policy for
-all files in that instance (if CONFIG_NUMA is enabled) - which can be
-adjusted on the fly via 'mount -o remount ...'
-
-mpol=default             use the process allocation policy
-                         (see set_mempolicy(2))
-mpol=prefer:Node         prefers to allocate memory from the given Node
-mpol=bind:NodeList       allocates memory only from nodes in NodeList
-mpol=interleave          prefers to allocate from each node in turn
-mpol=interleave:NodeList allocates from each node of NodeList in turn
-mpol=local		 prefers to allocate memory from the local node
-
-NodeList format is a comma-separated list of decimal numbers and ranges,
-a range being two hyphen-separated decimal numbers, the smallest and
-largest node numbers in the range.  For example, mpol=bind:0-3,5,7,9-15
-
-A memory policy with a valid NodeList will be saved, as specified, for
-use at file creation time.  When a task allocates a file in the file
-system, the mount option memory policy will be applied with a NodeList,
-if any, modified by the calling task's cpuset constraints
-[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, listed
-below.  If the resulting NodeLists is the empty set, the effective memory
-policy for the file will revert to "default" policy.
-
-NUMA memory allocation policies have optional flags that can be used in
-conjunction with their modes.  These optional flags can be specified
-when tmpfs is mounted by appending them to the mode before the NodeList.
-See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of
-all available memory allocation policy mode flags and their effect on
-memory policy.
-
-	=static		is equivalent to	MPOL_F_STATIC_NODES
-	=relative	is equivalent to	MPOL_F_RELATIVE_NODES
-
-For example, mpol=bind=static:NodeList, is the equivalent of an
-allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES.
-
-Note that trying to mount a tmpfs with an mpol option will fail if the
-running kernel does not support NUMA; and will fail if its nodelist
-specifies a node which is not online.  If your system relies on that
-tmpfs being mounted, but from time to time runs a kernel built without
-NUMA capability (perhaps a safe recovery kernel), or with fewer nodes
-online, then it is advisable to omit the mpol option from automatic
-mount options.  It can be added later, when the tmpfs is already mounted
-on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
-
-
-To specify the initial root directory you can use the following mount
-options:
-
-mode:	The permissions as an octal number
-uid:	The user id 
-gid:	The group id
-
-These options do not have any effect on remount. You can change these
-parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
-
-
-So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
-will give you tmpfs instance on /mytmpfs which can allocate 10GB
-RAM/SWAP in 10240 inodes and it is only accessible by root.
-
-
-Author:
-   Christoph Rohland <cr@sap.com>, 1.12.01
-Updated:
-   Hugh Dickins, 4 June 2007
-Updated:
-   KOSAKI Motohiro, 16 Mar 2010
-- 
cgit 


From 688f118e3139f81f813ba1896931cf8fad93430d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:27 +0100
Subject: docs: filesystems: convert ubifs-authentication.rst.txt to ReST

- Add a SPDX header;
- Mark some literals as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/0c36091b6660cd372f994bd98e1264491d766c22.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst                |  1 +
 Documentation/filesystems/ubifs-authentication.rst | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 27d37e7712da..bb14738df358 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -90,5 +90,6 @@ Documentation for filesystem implementations.
    sysfs
    sysv-fs
    tmpfs
+   ubifs-authentication.rst
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ubifs-authentication.rst b/Documentation/filesystems/ubifs-authentication.rst
index 6a9584f6ff46..16efd729bf7c 100644
--- a/Documentation/filesystems/ubifs-authentication.rst
+++ b/Documentation/filesystems/ubifs-authentication.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 :orphan:
 
 .. UBIFS Authentication
@@ -92,11 +94,11 @@ UBIFS Index & Tree Node Cache
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types
-of nodes. Eg. data nodes (`struct ubifs_data_node`) which store chunks of file
-contents or inode nodes (`struct ubifs_ino_node`) which represent VFS inodes.
-Almost all types of nodes share a common header (`ubifs_ch`) containing basic
+of nodes, e.g. data nodes (``struct ubifs_data_node``) which store chunks of file
+contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes.
+Almost all types of nodes share a common header (``ubifs_ch``) containing basic
 information like node type, node length, a sequence number, etc. (see
-`fs/ubifs/ubifs-media.h`in kernel source). Exceptions are entries of the LPT
+``fs/ubifs/ubifs-media.h`` in kernel source). Exceptions are entries of the LPT
 and some less important node types like padding nodes which are used to pad
 unusable content at the end of LEBs.
 
-- 
cgit 


From 38e56b4ec44139b5781d6ff13f1b422e4b38f0d4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:28 +0100
Subject: docs: filesystems: convert ubifs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/9043dc2965cafc64e6a521e2317c00ecc8303bf6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/ubifs.rst | 137 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ubifs.txt | 126 ---------------------------------
 3 files changed, 138 insertions(+), 126 deletions(-)
 create mode 100644 Documentation/filesystems/ubifs.rst
 delete mode 100644 Documentation/filesystems/ubifs.txt
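
Reviewer note, not part of the patch: the MTD versus block device
comparison in the converted ubifs.rst (read and write at an offset
within an eraseblock, erase only as a whole, erase before re-write) can
be pictured with a toy model. The Python sketch below is only that, a
simplified illustration. The class name and the 16-byte default size are
hypothetical (the text cites about 128KiB as typical), and it relies on
the general convention that erased flash reads back as 0xFF bytes, which
the document itself does not state.

```python
class ToyEraseblock:
    """Toy model of a single MTD eraseblock; not how any real driver works."""

    def __init__(self, size=16):
        # Assumption: erased flash reads back as all-ones bytes.
        self.data = bytearray(b"\xff" * size)
        self.erase_count = 0  # wear grows with every erase cycle

    def read(self, offset, length):
        """Read from some offset within the eraseblock."""
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        """Write to some offset; the target region must still be erased."""
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise ValueError("write outside the eraseblock")
        if any(byte != 0xFF for byte in self.data[offset:end]):
            raise RuntimeError("region already written; erase the eraseblock first")
        self.data[offset:end] = payload

    def erase(self):
        """Erase the whole eraseblock; there is no partial erase."""
        self.data[:] = b"\xff" * len(self.data)
        self.erase_count += 1
```

A second write to the same offset fails until erase() has been called,
which is the property that separates eraseblocks from plain re-writable
blocks in item 3 of the comparison.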

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index bb14738df358..58d57c9bf922 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -90,6 +90,7 @@ Documentation for filesystem implementations.
    sysfs
    sysv-fs
    tmpfs
+   ubifs
    ubifs-authentication.rst
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ubifs.rst b/Documentation/filesystems/ubifs.rst
new file mode 100644
index 000000000000..e6ee99762534
--- /dev/null
+++ b/Documentation/filesystems/ubifs.rst
@@ -0,0 +1,137 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+UBI File System
+===============
+
+Introduction
+============
+
+UBIFS stands for UBI File System. UBI stands for "Unsorted
+Block Images". UBIFS is a flash file system, which means it is designed
+to work with flash devices. It is important to understand that UBIFS
+is completely different from any traditional file-system in Linux, like
+Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
+which work with MTD devices, not block devices. The other Linux
+file-system of this class is JFFS2.
+
+To make it more clear, here is a small comparison of MTD devices and
+block devices.
+
+1. MTD devices represent flash devices and they consist of eraseblocks of
+   rather large size, typically about 128KiB. Block devices consist of
+   small blocks, typically 512 bytes.
+2. MTD devices support 3 main operations - read from some offset within an
+   eraseblock, write to some offset within an eraseblock, and erase a whole
+   eraseblock. Block devices support 2 main operations - read a whole
+   block and write a whole block.
+3. The whole eraseblock has to be erased before it becomes possible to
+   re-write its contents. Blocks may be just re-written.
+4. Eraseblocks become worn out after some number of erase cycles -
+   typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
+   NAND flashes. Blocks do not have the wear-out property.
+5. Eraseblocks may become bad (only on NAND flashes) and software should
+   deal with this. Blocks on hard drives typically do not become bad,
+   because hardware has mechanisms to substitute bad blocks, at least in
+   modern LBA disks.
+
+It should be quite obvious why UBIFS is very different to traditional
+file-systems.
+
+UBIFS works on top of UBI. UBI is a separate software layer which may be
+found in drivers/mtd/ubi. UBI is basically a volume management and
+wear-leveling layer. It provides so-called UBI volumes, which are a higher
+level abstraction than an MTD device. The programming model of UBI devices
+is very similar to MTD devices - they still consist of large eraseblocks,
+they have read/write/erase operations, but UBI devices are devoid of
+limitations like wear and bad blocks (items 4 and 5 in the above list).
+
+In a sense, UBIFS is the next generation of the JFFS2 file-system, but it is
+very different from and incompatible with JFFS2. The following are the main
+differences.
+
+* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
+  top of UBI volumes.
+* JFFS2 does not have an on-media index and has to build it while mounting,
+  which requires a full media scan. UBIFS maintains the FS indexing
+  information on the flash media and does not require a full media scan,
+  so it mounts many times faster than JFFS2.
+* JFFS2 is a write-through file-system, while UBIFS supports write-back,
+  which makes UBIFS much faster on writes.
+
+Similarly to JFFS2, UBIFS supports on-the-fly compression which makes
+it possible to fit quite a lot of data onto the flash.
+
+Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
+It does not need a tool like fsck.ext2. UBIFS automatically replays its
+journal and recovers from crashes, ensuring that the on-flash data
+structures are consistent.
+
+UBIFS scales logarithmically (most of the data structures it uses are
+trees), so the mount time and memory consumption do not linearly depend
+on the flash size, as they do in the case of JFFS2. This is because UBIFS
+maintains the FS index on the flash media. However, UBIFS depends on
+UBI, which scales linearly. So overall, the UBI/UBIFS stack scales linearly.
+Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
+
+The authors of UBIFS believe that it is possible to develop UBI2 which
+would scale logarithmically as well. UBI2 would support the same API as UBI,
+but it would be binary incompatible with UBI. So UBIFS would not need to be
+changed to use UBI2.
+
+
+Mount options
+=============
+
+(*) == default.
+
+====================	=======================================================
+bulk_read		read more in one go to take advantage of flash
+			media that read faster sequentially
+no_bulk_read (*)	do not bulk-read
+no_chk_data_crc (*)	skip checking of CRCs on data nodes in order to
+			improve read performance. Use this option only
+			if the flash media is highly reliable. The effect
+			of this option is that corruption of the contents
+			of a file can go unnoticed.
+chk_data_crc		do not skip checking CRCs on data nodes
+compr=none              override default compressor and set it to "none"
+compr=lzo               override default compressor and set it to "lzo"
+compr=zlib              override default compressor and set it to "zlib"
+auth_key=		specify the key used for authenticating the filesystem.
+			Passing this option makes authentication mandatory.
+			The passed key must be present in the kernel keyring
+			and must be of type 'logon'
+auth_hash_name=		The hash algorithm used for authentication. Used for
+			both hashing and for creating HMACs. Typical values
+			include "sha256" or "sha512"
+====================	=======================================================
+
+
+Quick usage instructions
+========================
+
+The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
+where "X" is the UBI device number, "Y" is the UBI volume number, and "NAME"
+is the UBI volume name.
+
+Mount volume 0 on UBI device 0 to /mnt/ubifs::
+
+    $ mount -t ubifs ubi0_0 /mnt/ubifs
+
+Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
+name)::
+
+    $ mount -t ubifs ubi0:rootfs /mnt/ubifs
+
+The following is an example of the kernel boot arguments to attach mtd0
+to UBI and mount volume "rootfs":
+``ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs``
+
+References
+==========
+
+UBIFS documentation and FAQ/HOWTO at the MTD web site:
+
+- http://www.linux-mtd.infradead.org/doc/ubifs.html
+- http://www.linux-mtd.infradead.org/faq/ubifs.html
diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt
deleted file mode 100644
index acc80442a3bb..000000000000
--- a/Documentation/filesystems/ubifs.txt
+++ /dev/null
@@ -1,126 +0,0 @@
-Introduction
-=============
-
-UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
-Block Images". UBIFS is a flash file system, which means it is designed
-to work with flash devices. It is important to understand, that UBIFS
-is completely different to any traditional file-system in Linux, like
-Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
-which work with MTD devices, not block devices. The other Linux
-file-system of this class is JFFS2.
-
-To make it more clear, here is a small comparison of MTD devices and
-block devices.
-
-1 MTD devices represent flash devices and they consist of eraseblocks of
-  rather large size, typically about 128KiB. Block devices consist of
-  small blocks, typically 512 bytes.
-2 MTD devices support 3 main operations - read from some offset within an
-  eraseblock, write to some offset within an eraseblock, and erase a whole
-  eraseblock. Block  devices support 2 main operations - read a whole
-  block and write a whole block.
-3 The whole eraseblock has to be erased before it becomes possible to
-  re-write its contents. Blocks may be just re-written.
-4 Eraseblocks become worn out after some number of erase cycles -
-  typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
-  NAND flashes. Blocks do not have the wear-out property.
-5 Eraseblocks may become bad (only on NAND flashes) and software should
-  deal with this. Blocks on hard drives typically do not become bad,
-  because hardware has mechanisms to substitute bad blocks, at least in
-  modern LBA disks.
-
-It should be quite obvious why UBIFS is very different to traditional
-file-systems.
-
-UBIFS works on top of UBI. UBI is a separate software layer which may be
-found in drivers/mtd/ubi. UBI is basically a volume management and
-wear-leveling layer. It provides so called UBI volumes which is a higher
-level abstraction than a MTD device. The programming model of UBI devices
-is very similar to MTD devices - they still consist of large eraseblocks,
-they have read/write/erase operations, but UBI devices are devoid of
-limitations like wear and bad blocks (items 4 and 5 in the above list).
-
-In a sense, UBIFS is a next generation of JFFS2 file-system, but it is
-very different and incompatible to JFFS2. The following are the main
-differences.
-
-* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
-  top of UBI volumes.
-* JFFS2 does not have on-media index and has to build it while mounting,
-  which requires full media scan. UBIFS maintains the FS indexing
-  information on the flash media and does not require full media scan,
-  so it mounts many times faster than JFFS2.
-* JFFS2 is a write-through file-system, while UBIFS supports write-back,
-  which makes UBIFS much faster on writes.
-
-Similarly to JFFS2, UBIFS supports on-the-flight compression which makes
-it possible to fit quite a lot of data to the flash.
-
-Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
-It does not need stuff like fsck.ext2. UBIFS automatically replays its
-journal and recovers from crashes, ensuring that the on-flash data
-structures are consistent.
-
-UBIFS scales logarithmically (most of the data structures it uses are
-trees), so the mount time and memory consumption do not linearly depend
-on the flash size, like in case of JFFS2. This is because UBIFS
-maintains the FS index on the flash media. However, UBIFS depends on
-UBI, which scales linearly. So overall UBI/UBIFS stack scales linearly.
-Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
-
-The authors of UBIFS believe, that it is possible to develop UBI2 which
-would scale logarithmically as well. UBI2 would support the same API as UBI,
-but it would be binary incompatible to UBI. So UBIFS would not need to be
-changed to use UBI2
-
-
-Mount options
-=============
-
-(*) == default.
-
-bulk_read		read more in one go to take advantage of flash
-			media that read faster sequentially
-no_bulk_read (*)	do not bulk-read
-no_chk_data_crc (*)	skip checking of CRCs on data nodes in order to
-			improve read performance. Use this option only
-			if the flash media is highly reliable. The effect
-			of this option is that corruption of the contents
-			of a file can go unnoticed.
-chk_data_crc		do not skip checking CRCs on data nodes
-compr=none              override default compressor and set it to "none"
-compr=lzo               override default compressor and set it to "lzo"
-compr=zlib              override default compressor and set it to "zlib"
-auth_key=		specify the key used for authenticating the filesystem.
-			Passing this option makes authentication mandatory.
-			The passed key must be present in the kernel keyring
-			and must be of type 'logon'
-auth_hash_name=		The hash algorithm used for authentication. Used for
-			both hashing and for creating HMACs. Typical values
-			include "sha256" or "sha512"
-
-
-Quick usage instructions
-========================
-
-The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
-where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
-UBI volume name.
-
-Mount volume 0 on UBI device 0 to /mnt/ubifs:
-$ mount -t ubifs ubi0_0 /mnt/ubifs
-
-Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
-name):
-$ mount -t ubifs ubi0:rootfs /mnt/ubifs
-
-The following is an example of the kernel boot arguments to attach mtd0
-to UBI and mount volume "rootfs":
-ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
-
-References
-==========
-
-UBIFS documentation and FAQ/HOWTO at the MTD web site:
-http://www.linux-mtd.infradead.org/doc/ubifs.html
-http://www.linux-mtd.infradead.org/faq/ubifs.html
-- 
cgit 
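
Note on the ubifs.rst patch above: its "Quick usage instructions" section
names two volume specifier forms, "ubiX_Y" and "ubiX:NAME". The following is
a minimal Python sketch of how such a specifier could be split into its
parts. The function name parse_ubi_spec and the returned tuple layout are
illustrative assumptions and are not taken from the kernel; only the two
forms named in the document are handled, and the kernel's own parser may
accept others.

```python
import re

# Only the two forms named in the document are handled here:
# "ubiX_Y" (device number, volume number) and "ubiX:NAME" (device number, name).
_BY_NUMBER = re.compile(r"^ubi(\d+)_(\d+)$")
_BY_NAME = re.compile(r"^ubi(\d+):(.+)$")


def parse_ubi_spec(spec):
    """Return (device_number, volume); volume is an int id or a str name.

    Hypothetical helper for illustration only, not a kernel interface.
    """
    m = _BY_NUMBER.match(spec)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = _BY_NAME.match(spec)
    if m:
        return int(m.group(1)), m.group(2)
    raise ValueError("unrecognized UBI volume specifier: %r" % (spec,))
```

With this sketch, "ubi0_0" yields device 0 and volume number 0, while
"ubi0:rootfs" yields device 0 and the volume name "rootfs", matching the two
mount examples in the document.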


From c9817ad5d82f04fbc66278eda27bff094dcb3119 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:29 +0100
Subject: docs: filesystems: convert udf.txt to ReST

- Add a SPDX header;
- Add a document title;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/2887f8a3a813a31170389eab687e9f199327dc7d.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |  1 +
 Documentation/filesystems/udf.rst   | 75 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/udf.txt   | 66 --------------------------------
 3 files changed, 76 insertions(+), 66 deletions(-)
 create mode 100644 Documentation/filesystems/udf.rst
 delete mode 100644 Documentation/filesystems/udf.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 58d57c9bf922..ec03cb4d7353 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -92,5 +92,6 @@ Documentation for filesystem implementations.
    tmpfs
    ubifs
    ubifs-authentication.rst
+   udf
    virtiofs
    vfat
diff --git a/Documentation/filesystems/udf.rst b/Documentation/filesystems/udf.rst
new file mode 100644
index 000000000000..d9badbf285b2
--- /dev/null
+++ b/Documentation/filesystems/udf.rst
@@ -0,0 +1,75 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+UDF file system
+===============
+
+If you encounter problems with reading UDF discs using this driver,
+please report them according to the MAINTAINERS file.
+
+Write support requires a block driver which supports writing.  Currently
+dvd+rw drives and media support true random sector writes, and so a udf
+filesystem on such devices can be directly mounted read/write.  CD-RW
+media, however, do not support this.  Instead the media can be formatted
+for packet mode using the utility cdrwtool, then the pktcdvd driver can
+be bound to the underlying cd device to provide the required buffering
+and read-modify-write cycles to allow the filesystem random sector writes
+while providing the hardware with only full packet writes.  While not
+required for dvd+rw media, use of the pktcdvd driver often enhances
+performance due to very poor read-modify-write support supplied internally
+by drive firmware.
+
+-------------------------------------------------------------------------------
+
+The following mount options are supported:
+
+	===========	======================================
+	gid=		Set the default group.
+	umask=		Set the default umask.
+	mode=		Set the default file permissions.
+	dmode=		Set the default directory permissions.
+	uid=		Set the default user.
+	bs=		Set the block size.
+	unhide		Show otherwise hidden files.
+	undelete	Show deleted files in lists.
+	adinicb		Embed data in the inode (default)
+	noadinicb	Don't embed data in the inode
+	shortad		Use short ad's
+	longad		Use long ad's (default)
+	nostrict	Unset strict conformance
+	iocharset=	Set the NLS character set
+	===========	======================================
+
+The uid= and gid= options need a bit more explaining.  They will accept a
+decimal numeric value and all inodes on that mount will then appear as
+belonging to that uid and gid.  Mount options also accept the string "forget".
+The forget option causes all IDs to be written to disk as -1, which is the
+UDF standard's way of indicating that IDs are not supported for these files.
+
+For typical desktop use of removable media, you should set the ID to that of
+the interactively logged on user, and also specify the forget option.  This way
+the interactive user will always see the files on the disk as belonging to them.
+
+The remaining options are for debugging and disaster recovery:
+
+	=====		================================
+	novrs		Skip volume sequence recognition
+	=====		================================
+
+The following expect an offset from 0.
+
+	==========	=================================================
+	session=	Set the CDROM session (default= last session)
+	anchor=		Override standard anchor location. (default= 256)
+	lastblock=	Set the last block of the filesystem.
+	==========	=================================================
+
+-------------------------------------------------------------------------------
+
+
+For the latest version and toolset see:
+	https://github.com/pali/udftools
+
+Documentation on UDF and ECMA 167 is available FREE from:
+	- http://www.osta.org/
+	- http://www.ecma-international.org/
diff --git a/Documentation/filesystems/udf.txt b/Documentation/filesystems/udf.txt
deleted file mode 100644
index e2f2faf32f18..000000000000
--- a/Documentation/filesystems/udf.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-*
-* Documentation/filesystems/udf.txt
-*
-
-If you encounter problems with reading UDF discs using this driver,
-please report them according to MAINTAINERS file.
-
-Write support requires a block driver which supports writing.  Currently
-dvd+rw drives and media support true random sector writes, and so a udf
-filesystem on such devices can be directly mounted read/write.  CD-RW
-media however, does not support this.  Instead the media can be formatted
-for packet mode using the utility cdrwtool, then the pktcdvd driver can
-be bound to the underlying cd device to provide the required buffering
-and read-modify-write cycles to allow the filesystem random sector writes
-while providing the hardware with only full packet writes.  While not
-required for dvd+rw media, use of the pktcdvd driver often enhances
-performance due to very poor read-modify-write support supplied internally
-by drive firmware.
-
--------------------------------------------------------------------------------
-The following mount options are supported:
-
-	gid=		Set the default group.
-	umask=		Set the default umask.
-	mode=		Set the default file permissions.
-	dmode=		Set the default directory permissions.
-	uid=		Set the default user.
-	bs=		Set the block size.
-	unhide		Show otherwise hidden files.
-	undelete	Show deleted files in lists.
-	adinicb		Embed data in the inode (default)
-	noadinicb	Don't embed data in the inode
-	shortad		Use short ad's
-	longad		Use long ad's (default)
-	nostrict	Unset strict conformance
-	iocharset=	Set the NLS character set
-
-The uid= and gid= options need a bit more explaining.  They will accept a
-decimal numeric value and all inodes on that mount will then appear as
-belonging to that uid and gid.  Mount options also accept the string "forget".
-The forget option causes all IDs to be written to disk as -1 which is a way
-of UDF standard to indicate that IDs are not supported for these files .
-
-For typical desktop use of removable media, you should set the ID to that of
-the interactively logged on user, and also specify the forget option.  This way
-the interactive user will always see the files on the disk as belonging to him.
-
-The remaining are for debugging and disaster recovery:
-
-	novrs		Skip volume sequence recognition 
-
-The following expect a offset from 0.
-
-	session=	Set the CDROM session (default= last session)
-	anchor=		Override standard anchor location. (default= 256)
-	lastblock=	Set the last block of the filesystem/
-
--------------------------------------------------------------------------------
-
-
-For the latest version and toolset see:
-	https://github.com/pali/udftools
-
-Documentation on UDF and ECMA 167 is available FREE from:
-	http://www.osta.org/
-	http://www.ecma-international.org/
-- 
cgit 


From 9a6108124c1d27192fee6f058b5de84f51ab62a0 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:30 +0100
Subject: docs: filesystems: convert zonefs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/42a7cfcd19f6b904a9a3188fd4af71bed5050052.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst  |   1 +
 Documentation/filesystems/zonefs.rst | 412 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/zonefs.txt | 404 ----------------------------------
 3 files changed, 413 insertions(+), 404 deletions(-)
 create mode 100644 Documentation/filesystems/zonefs.rst
 delete mode 100644 Documentation/filesystems/zonefs.txt
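
Note for readers of this patch: the "Sequential zone files" section of
zonefs.rst describes three rules (append-only writes, no access beyond the
zone size, truncation only to 0 or to the zone size). The Python sketch
below models those rules in memory so they can be read side by side. It is
a toy model under stated assumptions, not zonefs code: the class name
SeqZoneFileModel and its methods are hypothetical, sizes are in bytes, and
since the document does not name the error returned for a non-append write,
a plain ValueError stands in for it.

```python
import errno


class SeqZoneFileModel:
    """Toy in-memory model of a zonefs sequential zone file (not kernel code)."""

    def __init__(self, zone_size):
        self.zone_size = zone_size  # maximum file size, in bytes
        self.size = 0               # mirrors the zone write pointer offset

    def write(self, offset, length):
        # Accesses beyond the zone size fail with EFBIG, as the document states.
        if offset + length > self.zone_size:
            raise OSError(errno.EFBIG, "write exceeds zone size")
        # Only append writes are accepted. The document does not give the
        # error code for this case, so ValueError is a stand-in.
        if offset != self.size:
            raise ValueError("write must start at the end of the file")
        self.size += length
        return length

    def truncate(self, new_size):
        # Only 0 (zone reset) and zone_size (finish zone) are allowed.
        if new_size not in (0, self.zone_size):
            raise ValueError("truncate only to 0 or to the zone size")
        self.size = new_size
```

In this model, a write at offset 0 of an empty file succeeds and advances the
size, a second write at offset 0 is rejected, truncate(0) plays the role of a
zone reset, and truncate(zone_size) plays the role of the finish zone
operation.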

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index ec03cb4d7353..53f46a88e6ec 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -95,3 +95,4 @@ Documentation for filesystem implementations.
    udf
    virtiofs
    vfat
+   zonefs
diff --git a/Documentation/filesystems/zonefs.rst b/Documentation/filesystems/zonefs.rst
new file mode 100644
index 000000000000..7e733e751e98
--- /dev/null
+++ b/Documentation/filesystems/zonefs.rst
@@ -0,0 +1,412 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================================
+ZoneFS - Zone filesystem for Zoned block devices
+================================================
+
+Introduction
+============
+
+zonefs is a very simple file system exposing each zone of a zoned block device
+as a file. Unlike a regular POSIX-compliant file system with native zoned block
+device support (e.g. f2fs), zonefs does not hide the sequential write
+constraint of zoned block devices from the user. Files representing sequential
+write zones of the device must be written sequentially starting from the end
+of the file (append only writes).
+
+As such, zonefs is in essence closer to a raw block device access interface
+than to a full-featured POSIX file system. The goal of zonefs is to simplify
+the implementation of zoned block device support in applications by replacing
+raw block device file accesses with a richer file API, avoiding relying on
+direct block device file ioctls which may be more obscure to developers. One
+example of this approach is the implementation of LSM (log-structured merge)
+tree structures (such as used in RocksDB and LevelDB) on zoned block devices
+by allowing SSTables to be stored in a zone file similarly to a regular file
+system rather than as a range of sectors of the entire disk. The introduction
+of the higher level construct "one file is one zone" can help reduce the
+amount of changes needed in the application and ease adding support for
+different application programming languages.
+
+Zoned block devices
+-------------------
+
+Zoned storage devices belong to a class of storage devices with an address
+space that is divided into zones. A zone is a group of consecutive LBAs and all
+zones are contiguous (there are no LBA gaps). Zones may have different types.
+
+* Conventional zones: there are no access constraints to LBAs belonging to
+  conventional zones. Any read or write access can be executed, similarly to a
+  regular block device.
+* Sequential zones: these zones accept random reads but must be written
+  sequentially. Each sequential zone has a write pointer maintained by the
+  device that keeps track of the mandatory start LBA position of the next write
+  to the device. As a result of this write constraint, LBAs in a sequential zone
+  cannot be overwritten. Sequential zones must first be erased using a special
+  command (zone reset) before rewriting.
+
+Zoned storage devices can be implemented using various recording and media
+technologies. The most common form of zoned storage today uses the SCSI Zoned
+Block Commands (ZBC) and Zoned ATA Commands (ZAC) interfaces on Shingled
+Magnetic Recording (SMR) HDDs.
+
+Solid State Disk (SSD) storage devices can also implement a zoned interface
+to, for instance, reduce internal write amplification due to garbage collection.
+The NVMe Zoned NameSpace (ZNS) is a technical proposal of the NVMe standard
+committee aiming at adding a zoned storage interface to the NVMe protocol.
+
+Zonefs Overview
+===============
+
+Zonefs exposes the zones of a zoned block device as files. The files
+representing zones are grouped by zone type, which are themselves represented
+by sub-directories. This file structure is built entirely using zone information
+provided by the device and so does not require any complex on-disk metadata
+structure.
+
+On-disk metadata
+----------------
+
+zonefs on-disk metadata is reduced to an immutable super block which
+persistently stores a magic number and optional feature flags and values. On
+mount, zonefs uses blkdev_report_zones() to obtain the device zone configuration
+and populates the mount point with a static file tree solely based on this
+information. File sizes come from the device zone type and write pointer
+position managed by the device itself.
+
+The super block is always written on disk at sector 0. The first zone of the
+device storing the super block is never exposed as a zone file by zonefs. If
+the zone containing the super block is a sequential zone, the mkzonefs format
+tool always "finishes" the zone, that is, it transitions the zone to a full
+state to make it read-only, preventing any data write.
+
+Zone type sub-directories
+-------------------------
+
+Files representing zones of the same type are grouped together under the same
+sub-directory automatically created on mount.
+
+For conventional zones, the sub-directory "cnv" is used. This directory is
+however created if and only if the device has usable conventional zones. If
+the device only has a single conventional zone at sector 0, the zone will not
+be exposed as a file as it will be used to store the zonefs super block. For
+such devices, the "cnv" sub-directory will not be created.
+
+For sequential write zones, the sub-directory "seq" is used.
+
+These two directories are the only directories that exist in zonefs. Users
+cannot create other directories and cannot rename nor delete the "cnv" and
+"seq" sub-directories.
+
+The size of the directories indicated by the st_size field of struct stat,
+obtained with the stat() or fstat() system calls, indicates the number of files
+existing under the directory.
+
+Zone files
+----------
+
+Zone files are named using the number of the zone they represent within the set
+of zones of a particular type. That is, both the "cnv" and "seq" directories
+contain files named "0", "1", "2", ... The file numbers also represent
+increasing zone start sector on the device.
+
+Read and write operations on zone files are not allowed beyond the file
+maximum size, that is, beyond the zone size. Any access exceeding the zone
+size fails with the -EFBIG error.
+
+Creating, deleting, renaming or modifying any attribute of files and
+sub-directories is not allowed.
+
+The number of blocks of a file as reported by stat() and fstat() indicates the
+size of the file zone, or in other words, the maximum file size.
+
+Conventional zone files
+-----------------------
+
+The size of conventional zone files is fixed to the size of the zone they
+represent. Conventional zone files cannot be truncated.
+
+These files can be randomly read and written using any type of I/O operation:
+buffered I/Os, direct I/Os, memory mapped I/Os (mmap), etc. There are no I/O
+constraints for these files beyond the file size limit mentioned above.
+
+Sequential zone files
+---------------------
+
+The size of sequential zone files grouped in the "seq" sub-directory represents
+the file's zone write pointer position relative to the zone start sector.
+
+Sequential zone files can only be written sequentially, starting from the file
+end, that is, write operations can only be append writes. Zonefs makes no
+attempt at accepting random writes and will fail any write request that has a
+start offset not corresponding to the end of the file, or to the end of the last
+write issued and still in-flight (for asynchronous I/O operations).
+
+Since dirty page writeback by the page cache does not guarantee a sequential
+write pattern, zonefs prevents buffered writes and writeable shared mappings
+on sequential files. Only direct I/O writes are accepted for these files.
+zonefs relies on the sequential delivery of write I/O requests to the device
+implemented by the block layer elevator. An elevator implementing the sequential
+write feature for zoned block device (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
+must be used. This type of elevator (e.g. mq-deadline) is set by default
+for zoned block devices on device initialization.
+
+There are no restrictions on the type of I/O used for read operations in
+sequential zone files. Buffered I/Os, direct I/Os and shared read mappings are
+all accepted.
+
+Truncating sequential zone files is allowed only down to 0, in which case, the
+zone is reset to rewind the file zone write pointer position to the start of
+the zone, or up to the zone size, in which case the file's zone is transitioned
+to the FULL state (finish zone operation).
+
+Format options
+--------------
+
+Several optional features of zonefs can be enabled at format time.
+
+* Conventional zone aggregation: ranges of contiguous conventional zones can be
+  aggregated into a single larger file instead of the default one file per zone.
+* File ownership: The owner UID and GID of zone files are by default 0 (root)
+  but can be changed to any valid UID/GID.
+* File access permissions: the default 640 access permissions can be changed.
+
+IO error handling
+-----------------
+
+Zoned block devices may fail I/O requests for reasons similar to regular block
+devices, e.g. due to bad sectors. However, in addition to such known I/O
+failure patterns, the standards governing zoned block device behavior define
+additional conditions that result in I/O errors.
+
+* A zone may transition to the read-only condition (BLK_ZONE_COND_READONLY):
+  While the data already written in the zone is still readable, the zone can
+  no longer be written. No user action on the zone (zone management command or
+  read/write access) can change the zone condition back to a normal read/write
+  state. While the reasons for the device to transition a zone to read-only
+  state are not defined by the standards, a typical cause for such transition
+  would be a defective write head on an HDD (all zones under this head are
+  changed to read-only).
+
+* A zone may transition to the offline condition (BLK_ZONE_COND_OFFLINE):
+  An offline zone cannot be read nor written. No user action can transition an
+  offline zone back to an operational good state. Similarly to zone read-only
+  transitions, the reasons for a drive to transition a zone to the offline
+  condition are undefined. A typical cause would be a defective read-write head
+  on an HDD causing all zones on the platter under the broken head to be
+  inaccessible.
+
+* Unaligned write errors: These errors result from the host issuing write
+  requests with a start sector that does not correspond to a zone write pointer
+  position when the write request is executed by the device. Even though zonefs
+  enforces sequential file write for sequential zones, unaligned write errors
+  may still happen in the case of a partial failure of a very large direct I/O
+  operation split into multiple BIOs/requests or asynchronous I/O operations.
+  If one of the write requests within the set of sequential write requests
+  issued to the device fails, all write requests queued after it will
+  become unaligned and fail.
+
+* Delayed write errors: similarly to regular block devices, if the device side
+  write cache is enabled, write errors may occur in ranges of previously
+  completed writes when the device write cache is flushed, e.g. on fsync().
+  Similarly to the previous immediate unaligned write error case, delayed write
+  errors can propagate through a stream of cached sequential data for a zone
+  causing all data to be dropped after the sector that caused the error.
+
+All I/O errors detected by zonefs are reported to the user with an error code
+returned by the system call that triggered or detected the error. The recovery
+actions taken by zonefs in response to I/O errors depend on the I/O type (read
+vs write) and on the reason for the error (bad sector, unaligned writes or zone
+condition change).
+
+* For read I/O errors, zonefs does not execute any particular recovery action,
+  but only if the file zone is still in a good condition and there is no
+  inconsistency between the file inode size and its zone write pointer position.
+  If a problem is detected, I/O error recovery is executed (see the table below).
+
+* For write I/O errors, zonefs I/O error recovery is always executed.
+
+* A zone condition change to read-only or offline also always triggers zonefs
+  I/O error recovery.
+
+Zonefs minimal I/O error recovery may change a file's size and access
+permissions.
+
+* File size changes:
+  Immediate or delayed write errors in a sequential zone file may cause the file
+  inode size to be inconsistent with the amount of data successfully written in
+  the file zone. For instance, the partial failure of a multi-BIO large write
+  operation will cause the zone write pointer to advance partially, even though
+  the entire write operation will be reported as failed to the user. In such a
+  case, the file inode size must be advanced to reflect the zone write pointer
+  change and eventually allow the user to restart writing at the end of the
+  file.
+  A file size may also be reduced to reflect a delayed write error detected on
+  fsync(): in this case, the amount of data effectively written in the zone may
+  be less than originally indicated by the file inode size. After such an I/O
+  error, zonefs always fixes the file inode size to reflect the amount of data
+  persistently stored in the file zone.
+
+* Access permission changes:
+  A zone condition change to read-only is indicated with a change in the file
+  access permissions to render the file read-only. This disables changes to the
+  file attributes and data modification. For offline zones, all permissions
+  (read and write) to the file are disabled.
+
+Further action taken by zonefs I/O error recovery can be controlled by the user
+with the "errors=xxx" mount option. The table below summarizes the result of
+zonefs I/O error processing depending on the mount option and on the zone
+conditions::
+
+    +--------------+-----------+-----------------------------------------+
+    |              |           |            Post error state             |
+    | "errors=xxx" |  device   |                 access permissions      |
+    |    mount     |   zone    | file         file          device zone  |
+    |    option    | condition | size     read    write    read    write |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      | fixed    yes     no       yes     yes   |
+    | remount-ro   | read-only | fixed    yes     no       yes     no    |
+    | (default)    | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      | fixed    yes     no       yes     yes   |
+    | zone-ro      | read-only | fixed    yes     no       yes     no    |
+    |              | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      |   0      no      no       yes     yes   |
+    | zone-offline | read-only |   0      no      no       yes     no    |
+    |              | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      | fixed    yes     yes      yes     yes   |
+    | repair       | read-only | fixed    yes     no       yes     no    |
+    |              | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
+
+Further notes:
+
+* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
+  error processing if no errors mount option is specified.
+* With the "errors=remount-ro" mount option, the change of the file access
+  permissions to read-only applies to all files. The file system is remounted
+  read-only.
+* Access permission and file size changes due to the device transitioning zones
+  to the offline condition are permanent. Remounting or reformatting the device
+  with mkfs.zonefs (mkzonefs) will not change offline zone files back to a good
+  state.
+* File access permission changes to read-only due to the device transitioning
+  zones to the read-only condition are permanent. Remounting or reformatting
+  the device will not re-enable file write access.
+* File access permission changes implied by the remount-ro, zone-ro and
+  zone-offline mount options are temporary for zones in a good condition.
+  Unmounting and remounting the file system will restore the previous default
+  (format time values) access rights to the files affected.
+* The repair mount option triggers only the minimal set of I/O error recovery
+  actions, that is, file size fixes for zones in a good condition. Zones
+  indicated as being read-only or offline by the device still imply changes to
+  the zone file access permissions as noted in the table above.
+
+Mount options
+-------------
+
+zonefs defines the "errors=<behavior>" mount option to allow the user to specify
+zonefs behavior in response to I/O errors, inode size inconsistencies or zone
+condition changes. The defined behaviors are as follows:
+
+* remount-ro (default)
+* zone-ro
+* zone-offline
+* repair
+
+The I/O error actions defined for each behavior are detailed in the previous
+section.
+
+Zonefs User Space Tools
+=======================
+
+The mkzonefs tool is used to format zoned block devices for use with zonefs.
+This tool is available on GitHub at:
+
+https://github.com/damien-lemoal/zonefs-tools
+
+zonefs-tools also includes a test suite which can be run against any zoned
+block device, including a null_blk block device created in zoned mode.
+
+Examples
+--------
+
+The following formats a 15TB host-managed SMR HDD with 256 MB zones
+with the conventional zones aggregation feature enabled::
+
+    # mkzonefs -o aggr_cnv /dev/sdX
+    # mount -t zonefs /dev/sdX /mnt
+    # ls -l /mnt/
+    total 0
+    dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
+    dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
+
+The sizes of the zone file sub-directories indicate the number of files
+existing for each type of zone. In this example, there is only one
+conventional zone file (all conventional zones are aggregated under a single
+file)::
+
+    # ls -l /mnt/cnv
+    total 137101312
+    -rw-r----- 1 root root 140391743488 Nov 25 13:23 0
+
+This aggregated conventional zone file can be used as a regular file::
+
+    # mkfs.ext4 /mnt/cnv/0
+    # mount -o loop /mnt/cnv/0 /data
+
+The "seq" sub-directory, which groups files for sequential write zones, has
+55356 zone files in this example::
+
+    # ls -lv /mnt/seq
+    total 14511243264
+    -rw-r----- 1 root root 0 Nov 25 13:23 0
+    -rw-r----- 1 root root 0 Nov 25 13:23 1
+    -rw-r----- 1 root root 0 Nov 25 13:23 2
+    ...
+    -rw-r----- 1 root root 0 Nov 25 13:23 55354
+    -rw-r----- 1 root root 0 Nov 25 13:23 55355
+
+For sequential write zone files, the file size changes as data is appended at
+the end of the file, similarly to any regular file system::
+
+    # dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
+    1+0 records in
+    1+0 records out
+    4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s
+
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
+
+The written file can be truncated to the zone size, preventing any further
+write operation::
+
+    # truncate -s 268435456 /mnt/seq/0
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
+
+Truncation to size 0 frees the file zone storage space and allows restarting
+append writes to the file::
+
+    # truncate -s 0 /mnt/seq/0
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
+
+Since files are statically mapped to zones on the disk, the number of blocks of
+a file as reported by stat() and fstat() indicates the size of the file zone::
+
+    # stat /mnt/seq/0
+    File: /mnt/seq/0
+    Size: 0         	Blocks: 524288     IO Block: 4096   regular empty file
+    Device: 870h/2160d	Inode: 50431       Links: 1
+    Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
+    Access: 2019-11-25 13:23:57.048971997 +0900
+    Modify: 2019-11-25 13:52:25.553805765 +0900
+    Change: 2019-11-25 13:52:25.553805765 +0900
+    Birth: -
+
+The number of blocks of the file ("Blocks") in units of 512B blocks gives the
+maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone
+size in this example. Note that the "IO Block" field always indicates the
+minimum I/O size for writes and corresponds to the device physical sector size.
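+
+As a quick check of the arithmetic above, assuming the "256 MB" figure is meant
+in binary units (256 MiB)::
+
+    524288 blocks * 512 B/block = 268435456 B
+    268435456 B / (1024 * 1024) = 256 MiB
+
+This is the same value passed to "truncate -s 268435456" in the earlier example.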
diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
deleted file mode 100644
index 935bf22031ca..000000000000
--- a/Documentation/filesystems/zonefs.txt
+++ /dev/null
@@ -1,404 +0,0 @@
-ZoneFS - Zone filesystem for Zoned block devices
-
-Introduction
-============
-
-zonefs is a very simple file system exposing each zone of a zoned block device
-as a file. Unlike a regular POSIX-compliant file system with native zoned block
-device support (e.g. f2fs), zonefs does not hide the sequential write
-constraint of zoned block devices to the user. Files representing sequential
-write zones of the device must be written sequentially starting from the end
-of the file (append only writes).
-
-As such, zonefs is in essence closer to a raw block device access interface
-than to a full-featured POSIX file system. The goal of zonefs is to simplify
-the implementation of zoned block device support in applications by replacing
-raw block device file accesses with a richer file API, avoiding relying on
-direct block device file ioctls which may be more obscure to developers. One
-example of this approach is the implementation of LSM (log-structured merge)
-tree structures (such as used in RocksDB and LevelDB) on zoned block devices
-by allowing SSTables to be stored in a zone file similarly to a regular file
-system rather than as a range of sectors of the entire disk. The introduction
-of the higher level construct "one file is one zone" can help reducing the
-amount of changes needed in the application as well as introducing support for
-different application programming languages.
-
-Zoned block devices
--------------------
-
-Zoned storage devices belong to a class of storage devices with an address
-space that is divided into zones. A zone is a group of consecutive LBAs and all
-zones are contiguous (there are no LBA gaps). Zones may have different types.
-* Conventional zones: there are no access constraints to LBAs belonging to
-  conventional zones. Any read or write access can be executed, similarly to a
-  regular block device.
-* Sequential zones: these zones accept random reads but must be written
-  sequentially. Each sequential zone has a write pointer maintained by the
-  device that keeps track of the mandatory start LBA position of the next write
-  to the device. As a result of this write constraint, LBAs in a sequential zone
-  cannot be overwritten. Sequential zones must first be erased using a special
-  command (zone reset) before rewriting.
-
-Zoned storage devices can be implemented using various recording and media
-technologies. The most common form of zoned storage today uses the SCSI Zoned
-Block Commands (ZBC) and Zoned ATA Commands (ZAC) interfaces on Shingled
-Magnetic Recording (SMR) HDDs.
-
-Solid State Disks (SSD) storage devices can also implement a zoned interface
-to, for instance, reduce internal write amplification due to garbage collection.
-The NVMe Zoned NameSpace (ZNS) is a technical proposal of the NVMe standard
-committee aiming at adding a zoned storage interface to the NVMe protocol.
-
-Zonefs Overview
-===============
-
-Zonefs exposes the zones of a zoned block device as files. The files
-representing zones are grouped by zone type, which are themselves represented
-by sub-directories. This file structure is built entirely using zone information
-provided by the device and so does not require any complex on-disk metadata
-structure.
-
-On-disk metadata
-----------------
-
-zonefs on-disk metadata is reduced to an immutable super block which
-persistently stores a magic number and optional feature flags and values. On
-mount, zonefs uses blkdev_report_zones() to obtain the device zone configuration
-and populates the mount point with a static file tree solely based on this
-information. File sizes come from the device zone type and write pointer
-position managed by the device itself.
-
-The super block is always written on disk at sector 0. The first zone of the
-device storing the super block is never exposed as a zone file by zonefs. If
-the zone containing the super block is a sequential zone, the mkzonefs format
-tool always "finishes" the zone, that is, it transitions the zone to a full
-state to make it read-only, preventing any data write.
-
-Zone type sub-directories
--------------------------
-
-Files representing zones of the same type are grouped together under the same
-sub-directory automatically created on mount.
-
-For conventional zones, the sub-directory "cnv" is used. This directory is
-however created if and only if the device has usable conventional zones. If
-the device only has a single conventional zone at sector 0, the zone will not
-be exposed as a file as it will be used to store the zonefs super block. For
-such devices, the "cnv" sub-directory will not be created.
-
-For sequential write zones, the sub-directory "seq" is used.
-
-These two directories are the only directories that exist in zonefs. Users
-cannot create other directories and cannot rename nor delete the "cnv" and
-"seq" sub-directories.
-
-The size of the directories indicated by the st_size field of struct stat,
-obtained with the stat() or fstat() system calls, indicates the number of files
-existing under the directory.
-
-Zone files
-----------
-
-Zone files are named using the number of the zone they represent within the set
-of zones of a particular type. That is, both the "cnv" and "seq" directories
-contain files named "0", "1", "2", ... The file numbers also represent
-increasing zone start sector on the device.
-
-All read and write operations to zone files are not allowed beyond the file
-maximum size, that is, beyond the zone size. Any access exceeding the zone
-size is failed with the -EFBIG error.
-
-Creating, deleting, renaming or modifying any attribute of files and
-sub-directories is not allowed.
-
-The number of blocks of a file as reported by stat() and fstat() indicates the
-size of the file zone, or in other words, the maximum file size.
-
-Conventional zone files
------------------------
-
-The size of conventional zone files is fixed to the size of the zone they
-represent. Conventional zone files cannot be truncated.
-
-These files can be randomly read and written using any type of I/O operation:
-buffered I/Os, direct I/Os, memory mapped I/Os (mmap), etc. There are no I/O
-constraint for these files beyond the file size limit mentioned above.
-
-Sequential zone files
----------------------
-
-The size of sequential zone files grouped in the "seq" sub-directory represents
-the file's zone write pointer position relative to the zone start sector.
-
-Sequential zone files can only be written sequentially, starting from the file
-end, that is, write operations can only be append writes. Zonefs makes no
-attempt at accepting random writes and will fail any write request that has a
-start offset not corresponding to the end of the file, or to the end of the last
-write issued and still in-flight (for asynchrnous I/O operations).
-
-Since dirty page writeback by the page cache does not guarantee a sequential
-write pattern, zonefs prevents buffered writes and writeable shared mappings
-on sequential files. Only direct I/O writes are accepted for these files.
-zonefs relies on the sequential delivery of write I/O requests to the device
-implemented by the block layer elevator. An elevator implementing the sequential
-write feature for zoned block device (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
-must be used. This type of elevator (e.g. mq-deadline) is the set by default
-for zoned block devices on device initialization.
-
-There are no restrictions on the type of I/O used for read operations in
-sequential zone files. Buffered I/Os, direct I/Os and shared read mappings are
-all accepted.
-
-Truncating sequential zone files is allowed only down to 0, in which case, the
-zone is reset to rewind the file zone write pointer position to the start of
-the zone, or up to the zone size, in which case the file's zone is transitioned
-to the FULL state (finish zone operation).
-
-Format options
---------------
-
-Several optional features of zonefs can be enabled at format time.
-* Conventional zone aggregation: ranges of contiguous conventional zones can be
-  aggregated into a single larger file instead of the default one file per zone.
-* File ownership: The owner UID and GID of zone files is by default 0 (root)
-  but can be changed to any valid UID/GID.
-* File access permissions: the default 640 access permissions can be changed.
-
-IO error handling
------------------
-
-Zoned block devices may fail I/O requests for reasons similar to regular block
-devices, e.g. due to bad sectors. However, in addition to such known I/O
-failure pattern, the standards governing zoned block devices behavior define
-additional conditions that result in I/O errors.
-
-* A zone may transition to the read-only condition (BLK_ZONE_COND_READONLY):
-  While the data already written in the zone is still readable, the zone can
-  no longer be written. No user action on the zone (zone management command or
-  read/write access) can change the zone condition back to a normal read/write
-  state. While the reasons for the device to transition a zone to read-only
-  state are not defined by the standards, a typical cause for such transition
-  would be a defective write head on an HDD (all zones under this head are
-  changed to read-only).
-
-* A zone may transition to the offline condition (BLK_ZONE_COND_OFFLINE):
-  An offline zone cannot be read nor written. No user action can transition an
-  offline zone back to an operational good state. Similarly to zone read-only
-  transitions, the reasons for a drive to transition a zone to the offline
-  condition are undefined. A typical cause would be a defective read-write head
-  on an HDD causing all zones on the platter under the broken head to be
-  inaccessible.
-
-* Unaligned write errors: These errors result from the host issuing write
-  requests with a start sector that does not correspond to a zone write pointer
-  position when the write request is executed by the device. Even though zonefs
-  enforces sequential file write for sequential zones, unaligned write errors
-  may still happen in the case of a partial failure of a very large direct I/O
-  operation split into multiple BIOs/requests or asynchronous I/O operations.
-  If one of the write request within the set of sequential write requests
-  issued to the device fails, all write requests after queued after it will
-  become unaligned and fail.
-
-* Delayed write errors: similarly to regular block devices, if the device side
-  write cache is enabled, write errors may occur in ranges of previously
-  completed writes when the device write cache is flushed, e.g. on fsync().
-  Similarly to the previous immediate unaligned write error case, delayed write
-  errors can propagate through a stream of cached sequential data for a zone
-  causing all data to be dropped after the sector that caused the error.
-
-All I/O errors detected by zonefs are notified to the user with an error code
-return for the system call that trigered or detected the error. The recovery
-actions taken by zonefs in response to I/O errors depend on the I/O type (read
-vs write) and on the reason for the error (bad sector, unaligned writes or zone
-condition change).
-
-* For read I/O errors, zonefs does not execute any particular recovery action,
-  but only if the file zone is still in a good condition and there is no
-  inconsistency between the file inode size and its zone write pointer position.
-  If a problem is detected, I/O error recovery is executed (see below table).
-
-* For write I/O errors, zonefs I/O error recovery is always executed.
-
-* A zone condition change to read-only or offline also always triggers zonefs
-  I/O error recovery.
-
-Zonefs minimal I/O error recovery may change a file size and a file access
-permissions.
-
-* File size changes:
-  Immediate or delayed write errors in a sequential zone file may cause the file
-  inode size to be inconsistent with the amount of data successfully written in
-  the file zone. For instance, the partial failure of a multi-BIO large write
-  operation will cause the zone write pointer to advance partially, even though
-  the entire write operation will be reported as failed to the user. In such
-  case, the file inode size must be advanced to reflect the zone write pointer
-  change and eventually allow the user to restart writing at the end of the
-  file.
-  A file size may also be reduced to reflect a delayed write error detected on
-  fsync(): in this case, the amount of data effectively written in the zone may
-  be less than originally indicated by the file inode size. After such I/O
-  error, zonefs always fixes a file inode size to reflect the amount of data
-  persistently stored in the file zone.
-
-* Access permission changes:
-  A zone condition change to read-only is indicated with a change in the file
-  access permissions to render the file read-only. This disables changes to the
-  file attributes and data modification. For offline zones, all permissions
-  (read and write) to the file are disabled.
-
-Further action taken by zonefs I/O error recovery can be controlled by the user
-with the "errors=xxx" mount option. The table below summarizes the result of
-zonefs I/O error processing depending on the mount option and on the zone
-conditions.
-
-    +--------------+-----------+-----------------------------------------+
-    |              |           |            Post error state             |
-    | "errors=xxx" |  device   |                 access permissions      |
-    |    mount     |   zone    | file         file          device zone  |
-    |    option    | condition | size     read    write    read    write |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      | fixed    yes     no       yes     yes   |
-    | remount-ro   | read-only | fixed    yes     no       yes     no    |
-    | (default)    | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      | fixed    yes     no       yes     yes   |
-    | zone-ro      | read-only | fixed    yes     no       yes     no    |
-    |              | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      |   0      no      no       yes     yes   |
-    | zone-offline | read-only |   0      no      no       yes     no    |
-    |              | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      | fixed    yes     yes      yes     yes   |
-    | repair       | read-only | fixed    yes     no       yes     no    |
-    |              | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-
-Further notes:
-* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
-  error processing if no errors mount option is specified.
-* With the "errors=remount-ro" mount option, the change of the file access
-  permissions to read-only applies to all files. The file system is remounted
-  read-only.
-* Access permission and file size changes due to the device transitioning zones
-  to the offline condition are permanent. Remounting or reformating the device
-  with mkfs.zonefs (mkzonefs) will not change back offline zone files to a good
-  state.
-* File access permission changes to read-only due to the device transitioning
-  zones to the read-only condition are permanent. Remounting or reformating
-  the device will not re-enable file write access.
-* File access permission changes implied by the remount-ro, zone-ro and
-  zone-offline mount options are temporary for zones in a good condition.
-  Unmounting and remounting the file system will restore the previous default
-  (format time values) access rights to the files affected.
-* The repair mount option triggers only the minimal set of I/O error recovery
-  actions, that is, file size fixes for zones in a good condition. Zones
-  indicated as being read-only or offline by the device still imply changes to
-  the zone file access permissions as noted in the table above.
-
-Mount options
--------------
-
-zonefs define the "errors=<behavior>" mount option to allow the user to specify
-zonefs behavior in response to I/O errors, inode size inconsistencies or zone
-condition chages. The defined behaviors are as follow:
-* remount-ro (default)
-* zone-ro
-* zone-offline
-* repair
-
-The I/O error actions defined for each behavior is detailed in the previous
-section.
-
-Zonefs User Space Tools
-=======================
-
-The mkzonefs tool is used to format zoned block devices for use with zonefs.
-This tool is available on Github at:
-
-https://github.com/damien-lemoal/zonefs-tools
-
-zonefs-tools also includes a test suite which can be run against any zoned
-block device, including null_blk block device created with zoned mode.
-
-Examples
---------
-
-The following formats a 15TB host-managed SMR HDD with 256 MB zones
-with the conventional zones aggregation feature enabled.
-
-# mkzonefs -o aggr_cnv /dev/sdX
-# mount -t zonefs /dev/sdX /mnt
-# ls -l /mnt/
-total 0
-dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
-dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
-
-The size of the zone files sub-directories indicate the number of files
-existing for each type of zones. In this example, there is only one
-conventional zone file (all conventional zones are aggregated under a single
-file).
-
-# ls -l /mnt/cnv
-total 137101312
--rw-r----- 1 root root 140391743488 Nov 25 13:23 0
-
-This aggregated conventional zone file can be used as a regular file.
-
-# mkfs.ext4 /mnt/cnv/0
-# mount -o loop /mnt/cnv/0 /data
-
-The "seq" sub-directory grouping files for sequential write zones has in this
-example 55356 zones.
-
-# ls -lv /mnt/seq
-total 14511243264
--rw-r----- 1 root root 0 Nov 25 13:23 0
--rw-r----- 1 root root 0 Nov 25 13:23 1
--rw-r----- 1 root root 0 Nov 25 13:23 2
-...
--rw-r----- 1 root root 0 Nov 25 13:23 55354
--rw-r----- 1 root root 0 Nov 25 13:23 55355
-
-For sequential write zone files, the file size changes as data is appended at
-the end of the file, similarly to any regular file system.
-
-# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
-1+0 records in
-1+0 records out
-4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s
-
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
-
-The written file can be truncated to the zone size, preventing any further
-write operation.
-
-# truncate -s 268435456 /mnt/seq/0
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
-
-Truncation to 0 size allows freeing the file zone storage space and restart
-append-writes to the file.
-
-# truncate -s 0 /mnt/seq/0
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
-
-Since files are statically mapped to zones on the disk, the number of blocks of
-a file as reported by stat() and fstat() indicates the size of the file zone.
-
-# stat /mnt/seq/0
-  File: /mnt/seq/0
-  Size: 0         	Blocks: 524288     IO Block: 4096   regular empty file
-Device: 870h/2160d	Inode: 50431       Links: 1
-Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
-Access: 2019-11-25 13:23:57.048971997 +0900
-Modify: 2019-11-25 13:52:25.553805765 +0900
-Change: 2019-11-25 13:52:25.553805765 +0900
- Birth: -
-
-The number of blocks of the file ("Blocks") in units of 512B blocks gives the
-maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone
-size in this example. Of note is that the "IO block" field always indicates the
-minimum I/O size for writes and corresponds to the device physical sector size.
-- 
cgit 


From 19796c348ab62287fc6434f296ae279d7b97e39f Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Sun, 8 Mar 2020 22:14:43 +0100
Subject: docs: Move Intel Many Integrated Core documentation (mic) under
 misc-devices
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It doesn't need to be a top-level chapter.

This patch also updates MAINTAINERS and makes sure the F: lines are
properly sorted.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20200308211519.8414-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/index.rst                          |   1 -
 Documentation/mic/index.rst                      |  16 ----
 Documentation/mic/mic_overview.rst               |  85 ------------------
 Documentation/mic/scif_overview.rst              | 108 -----------------------
 Documentation/misc-devices/index.rst             |   1 +
 Documentation/misc-devices/mic/index.rst         |  16 ++++
 Documentation/misc-devices/mic/mic_overview.rst  |  85 ++++++++++++++++++
 Documentation/misc-devices/mic/scif_overview.rst | 108 +++++++++++++++++++++++
 MAINTAINERS                                      |   8 +-
 9 files changed, 214 insertions(+), 214 deletions(-)
 delete mode 100644 Documentation/mic/index.rst
 delete mode 100644 Documentation/mic/mic_overview.rst
 delete mode 100644 Documentation/mic/scif_overview.rst
 create mode 100644 Documentation/misc-devices/mic/index.rst
 create mode 100644 Documentation/misc-devices/mic/mic_overview.rst
 create mode 100644 Documentation/misc-devices/mic/scif_overview.rst

diff --git a/Documentation/index.rst b/Documentation/index.rst
index e99d0bd2589d..6fdad61ee443 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -131,7 +131,6 @@ needed).
    usb/index
    PCI/index
    misc-devices/index
-   mic/index
    scheduler/index
 
 Architecture-agnostic documentation
diff --git a/Documentation/mic/index.rst b/Documentation/mic/index.rst
deleted file mode 100644
index 3a8d06367ef1..000000000000
--- a/Documentation/mic/index.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-=============================================
-Intel Many Integrated Core (MIC) architecture
-=============================================
-
-.. toctree::
-    :maxdepth: 1
-
-    mic_overview
-    scif_overview
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/mic/mic_overview.rst b/Documentation/mic/mic_overview.rst
deleted file mode 100644
index 17d956bdaf7c..000000000000
--- a/Documentation/mic/mic_overview.rst
+++ /dev/null
@@ -1,85 +0,0 @@
-======================================================
-Intel Many Integrated Core (MIC) architecture overview
-======================================================
-
-An Intel MIC X100 device is a PCIe form factor add-in coprocessor
-card based on the Intel Many Integrated Core (MIC) architecture
-that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
-implements the three required standard address spaces i.e. configuration,
-memory and I/O. The host OS loads a device driver as is typical for
-PCIe devices. The card itself runs a bootstrap after reset that
-transfers control to the card OS downloaded from the host driver. The
-host driver supports OSPM suspend and resume operations. It shuts down
-the card during suspend and reboots the card OS during resume.
-The card OS as shipped by Intel is a Linux kernel with modifications
-for the X100 devices.
-
-Since it is a PCIe card, it does not have the ability to host hardware
-devices for networking, storage and console. We provide these devices
-on X100 coprocessors thus enabling a self-bootable equivalent
-environment for applications. A key benefit of our solution is that it
-leverages the standard virtio framework for network, disk and console
-devices, though in our case the virtio framework is used across a PCIe
-bus. A Virtio Over PCIe (VOP) driver allows creating user space
-backends or devices on the host which are used to probe virtio drivers
-for these devices on the MIC card. The existing VRINGH infrastructure
-in the kernel is used to access virtio rings from the host. The card
-VOP driver allows card virtio drivers to communicate with their user
-space backends on the host via a device page. Ring 3 apps on the host
-can add, remove and configure virtio devices. A thin MIC specific
-virtio_config_ops is implemented which is borrowed heavily from
-previous similar implementations in lguest and s390.
-
-MIC PCIe card has a dma controller with 8 channels. These channels are
-shared between the host s/w and the card s/w. 0 to 3 are used by host
-and 4 to 7 by card. As the dma device doesn't show up as PCIe device,
-a virtual bus called mic bus is created and virtual dma devices are
-created on it by the host/card drivers. On host the channels are private
-and used only by the host driver to transfer data for the virtio devices.
-
-The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a
-low level communications API across PCIe currently implemented for MIC.
-More details are available at scif_overview.txt.
-
-The Coprocessor State Management (COSM) driver on the host allows for
-boot, shutdown and reset of Intel MIC devices. It communicates with a COSM
-"client" driver on the MIC cards over SCIF to perform these functions.
-
-Here is a block diagram of the various components described above. The
-virtio backends are situated on the host rather than the card given better
-single threaded performance for the host compared to MIC, the ability of
-the host to initiate DMA's to/from the card using the MIC DMA engine and
-the fact that the virtio block storage backend can only be on the host::
-
-               +----------+           |             +----------+
-               | Card OS  |           |             | Host OS  |
-               +----------+           |             +----------+
-                                      |
-        +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
-        | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
-        | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
-        | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
-        +---+---+ +---+----+ +--+---+ | +---------+  +----+---+ +--------+
-            |         |         |     |      |            |         |
-            |         |         |     |User  |            |         |
-            |         |         |     |------|------------|--+------|-------
-            +---------+---------+     |Kernel                |
-                      |               |                      |
-  +---------+     +---+----+ +------+ | +------+ +------+ +--+---+  +-------+
-  |MIC DMA  |     |  VOP   | | SCIF | | | SCIF | | COSM | | VOP  |  |MIC DMA|
-  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
-      |               |         |     |    |        |                    |
-  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
-  |MIC      |     |  VOP   | |SCIF  | | |SCIF  | | COSM | | VOP  |  | MIC   |
-  |HW Bus   |     |  HW Bus| |HW Bus| | |HW Bus| | Bus  | |HW Bus|  |HW Bus |
-  +---------+     +--------+ +--+---+ | +--+---+ +------+ +------+  +-------+
-      |               |         |     |       |     |                    |
-      |   +-----------+--+      |     |       |    +---------------+     |
-      |   |Intel MIC     |      |     |       |    |Intel MIC      |     |
-      |   |Card Driver   |      |     |       |    |Host Driver    |     |
-      +---+--------------+------+     |       +----+---------------+-----+
-                 |                    |                   |
-             +-------------------------------------------------------------+
-             |                                                             |
-             |                    PCIe Bus                                 |
-             +-------------------------------------------------------------+
diff --git a/Documentation/mic/scif_overview.rst b/Documentation/mic/scif_overview.rst
deleted file mode 100644
index 4c8ad9e43706..000000000000
--- a/Documentation/mic/scif_overview.rst
+++ /dev/null
@@ -1,108 +0,0 @@
-========================================
-Symmetric Communication Interface (SCIF)
-========================================
-
-The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low
-level communications API across PCIe currently implemented for MIC. Currently
-SCIF provides inter-node communication within a single host platform, where a
-node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
-communicating over the PCIe bus while providing an API that is symmetric
-across all the nodes in the PCIe network. An important design objective for SCIF
-is to deliver the maximum possible performance given the communication
-abilities of the hardware. SCIF has been used to implement an offload compiler
-runtime and OFED support for MPI implementations for MIC coprocessors.
-
-SCIF API Components
-===================
-
-The SCIF API has the following parts:
-
-1. Connection establishment using a client server model
-2. Byte stream messaging intended for short messages
-3. Node enumeration to determine online nodes
-4. Poll semantics for detection of incoming connections and messages
-5. Memory registration to pin down pages
-6. Remote memory mapping for low latency CPU accesses via mmap
-7. Remote DMA (RDMA) for high bandwidth DMA transfers
-8. Fence APIs for RDMA synchronization
-
-SCIF exposes the notion of a connection which can be used by peer processes on
-nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
-process in a SCIF node initiates a SCIF connection to a peer process on a
-different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
-which are similar to connection oriented socket APIs. Connected SCIF endpoints
-can also register local memory which is followed by data transfer using either
-DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
-kernel mode clients which are functionally equivalent.
-
-SCIF Performance for MIC
-========================
-
-DMA bandwidth comparison between the TCP (over ethernet over PCIe) stack versus
-SCIF shows the performance advantages of SCIF for HPC applications and
-runtimes::
-
-             Comparison of TCP and SCIF based BW
-
-  Throughput (GB/sec)
-    8 +                                             PCIe Bandwidth ******
-      +                                                        TCP ######
-    7 +    **************************************             SCIF %%%%%%
-      |                       %%%%%%%%%%%%%%%%%%%
-    6 +                   %%%%
-      |                 %%
-      |               %%%
-    5 +              %%
-      |            %%
-    4 +           %%
-      |          %%
-    3 +         %%
-      |        %
-    2 +      %%
-      |     %%
-      |    %
-    1 +
-      +    ######################################
-    0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
-      1       10     100      1000   10000   100000
-                   Transfer Size (KBytes)
-
-SCIF allows memory sharing via mmap(..) between processes on different PCIe
-nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
-latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
-
-SCIF has a user space library which is a thin IOCTL wrapper providing a user
-space API similar to the kernel API in scif.h. The SCIF user space library
-is distributed @ https://software.intel.com/en-us/mic-developer
-
-Here is some pseudo code for an example of how two applications on two PCIe
-nodes would typically use the SCIF API::
-
-  Process A (on node A)			Process B (on node B)
-
-  /* get online node information */
-  scif_get_node_ids(..)			scif_get_node_ids(..)
-  scif_open(..)				scif_open(..)
-  scif_bind(..)				scif_bind(..)
-  scif_listen(..)
-  scif_accept(..)				scif_connect(..)
-  /* SCIF connection established */
-
-  /* Send and receive short messages */
-  scif_send(..)/scif_recv(..)		scif_send(..)/scif_recv(..)
-
-  /* Register memory */
-  scif_register(..)			scif_register(..)
-
-  /* RDMA */
-  scif_readfrom(..)/scif_writeto(..)	scif_readfrom(..)/scif_writeto(..)
-
-  /* Fence DMAs */
-  scif_fence_signal(..)			scif_fence_signal(..)
-
-  mmap(..)				mmap(..)
-
-  /* Access remote registered memory */
-
-  /* Close the endpoints */
-  scif_close(..)				scif_close(..)
diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
index f11c5daeada5..c1dcd2628911 100644
--- a/Documentation/misc-devices/index.rst
+++ b/Documentation/misc-devices/index.rst
@@ -20,4 +20,5 @@ fit into other categories.
    isl29003
    lis3lv02d
    max6875
+   mic/index
    xilinx_sdfec
diff --git a/Documentation/misc-devices/mic/index.rst b/Documentation/misc-devices/mic/index.rst
new file mode 100644
index 000000000000..3a8d06367ef1
--- /dev/null
+++ b/Documentation/misc-devices/mic/index.rst
@@ -0,0 +1,16 @@
+=============================================
+Intel Many Integrated Core (MIC) architecture
+=============================================
+
+.. toctree::
+    :maxdepth: 1
+
+    mic_overview
+    scif_overview
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/misc-devices/mic/mic_overview.rst b/Documentation/misc-devices/mic/mic_overview.rst
new file mode 100644
index 000000000000..17d956bdaf7c
--- /dev/null
+++ b/Documentation/misc-devices/mic/mic_overview.rst
@@ -0,0 +1,85 @@
+======================================================
+Intel Many Integrated Core (MIC) architecture overview
+======================================================
+
+An Intel MIC X100 device is a PCIe form factor add-in coprocessor
+card based on the Intel Many Integrated Core (MIC) architecture
+that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
+implements the three required standard address spaces i.e. configuration,
+memory and I/O. The host OS loads a device driver as is typical for
+PCIe devices. The card itself runs a bootstrap after reset that
+transfers control to the card OS downloaded from the host driver. The
+host driver supports OSPM suspend and resume operations. It shuts down
+the card during suspend and reboots the card OS during resume.
+The card OS as shipped by Intel is a Linux kernel with modifications
+for the X100 devices.
+
+Since it is a PCIe card, it does not have the ability to host hardware
+devices for networking, storage and console. We provide these devices
+on X100 coprocessors thus enabling a self-bootable equivalent
+environment for applications. A key benefit of our solution is that it
+leverages the standard virtio framework for network, disk and console
+devices, though in our case the virtio framework is used across a PCIe
+bus. A Virtio Over PCIe (VOP) driver allows creating user space
+backends or devices on the host which are used to probe virtio drivers
+for these devices on the MIC card. The existing VRINGH infrastructure
+in the kernel is used to access virtio rings from the host. The card
+VOP driver allows card virtio drivers to communicate with their user
+space backends on the host via a device page. Ring 3 apps on the host
+can add, remove and configure virtio devices. A thin MIC specific
+virtio_config_ops is implemented which is borrowed heavily from
+previous similar implementations in lguest and s390.
+
+The MIC PCIe card has a DMA controller with 8 channels. These channels are
+shared between the host and card software: channels 0 to 3 are used by the
+host and 4 to 7 by the card. As the DMA device doesn't show up as a PCIe
+device, a virtual bus called mic bus is created and virtual DMA devices are
+created on it by the host/card drivers. On the host the channels are private
+and used only by the host driver to transfer data for the virtio devices.
+
+The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a
+low level communications API across PCIe currently implemented for MIC.
+More details are available at scif_overview.rst.
+
+The Coprocessor State Management (COSM) driver on the host allows for
+boot, shutdown and reset of Intel MIC devices. It communicates with a COSM
+"client" driver on the MIC cards over SCIF to perform these functions.
+
+Here is a block diagram of the various components described above. The
+virtio backends are situated on the host rather than the card given better
+single threaded performance for the host compared to MIC, the ability of
+the host to initiate DMAs to/from the card using the MIC DMA engine and
+the fact that the virtio block storage backend can only be on the host::
+
+               +----------+           |             +----------+
+               | Card OS  |           |             | Host OS  |
+               +----------+           |             +----------+
+                                      |
+        +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
+        | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
+        | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
+        | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
+        +---+---+ +---+----+ +--+---+ | +---------+  +----+---+ +--------+
+            |         |         |     |      |            |         |
+            |         |         |     |User  |            |         |
+            |         |         |     |------|------------|--+------|-------
+            +---------+---------+     |Kernel                |
+                      |               |                      |
+  +---------+     +---+----+ +------+ | +------+ +------+ +--+---+  +-------+
+  |MIC DMA  |     |  VOP   | | SCIF | | | SCIF | | COSM | | VOP  |  |MIC DMA|
+  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
+      |               |         |     |    |        |                    |
+  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
+  |MIC      |     |  VOP   | |SCIF  | | |SCIF  | | COSM | | VOP  |  | MIC   |
+  |HW Bus   |     |  HW Bus| |HW Bus| | |HW Bus| | Bus  | |HW Bus|  |HW Bus |
+  +---------+     +--------+ +--+---+ | +--+---+ +------+ +------+  +-------+
+      |               |         |     |       |     |                    |
+      |   +-----------+--+      |     |       |    +---------------+     |
+      |   |Intel MIC     |      |     |       |    |Intel MIC      |     |
+      |   |Card Driver   |      |     |       |    |Host Driver    |     |
+      +---+--------------+------+     |       +----+---------------+-----+
+                 |                    |                   |
+             +-------------------------------------------------------------+
+             |                                                             |
+             |                    PCIe Bus                                 |
+             +-------------------------------------------------------------+
diff --git a/Documentation/misc-devices/mic/scif_overview.rst b/Documentation/misc-devices/mic/scif_overview.rst
new file mode 100644
index 000000000000..4c8ad9e43706
--- /dev/null
+++ b/Documentation/misc-devices/mic/scif_overview.rst
@@ -0,0 +1,108 @@
+========================================
+Symmetric Communication Interface (SCIF)
+========================================
+
+The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low
+level communications API across PCIe currently implemented for MIC. Currently
+SCIF provides inter-node communication within a single host platform, where a
+node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
+communicating over the PCIe bus while providing an API that is symmetric
+across all the nodes in the PCIe network. An important design objective for SCIF
+is to deliver the maximum possible performance given the communication
+abilities of the hardware. SCIF has been used to implement an offload compiler
+runtime and OFED support for MPI implementations for MIC coprocessors.
+
+SCIF API Components
+===================
+
+The SCIF API has the following parts:
+
+1. Connection establishment using a client server model
+2. Byte stream messaging intended for short messages
+3. Node enumeration to determine online nodes
+4. Poll semantics for detection of incoming connections and messages
+5. Memory registration to pin down pages
+6. Remote memory mapping for low latency CPU accesses via mmap
+7. Remote DMA (RDMA) for high bandwidth DMA transfers
+8. Fence APIs for RDMA synchronization
+
+SCIF exposes the notion of a connection which can be used by peer processes on
+nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
+process in a SCIF node initiates a SCIF connection to a peer process on a
+different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
+which are similar to connection oriented socket APIs. Connected SCIF endpoints
+can also register local memory which is followed by data transfer using either
+DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
+kernel mode clients which are functionally equivalent.
+
+SCIF Performance for MIC
+========================
+
+A DMA bandwidth comparison between the TCP (over Ethernet over PCIe) stack and
+SCIF shows the performance advantages of SCIF for HPC applications and
+runtimes::
+
+             Comparison of TCP and SCIF based BW
+
+  Throughput (GB/sec)
+    8 +                                             PCIe Bandwidth ******
+      +                                                        TCP ######
+    7 +    **************************************             SCIF %%%%%%
+      |                       %%%%%%%%%%%%%%%%%%%
+    6 +                   %%%%
+      |                 %%
+      |               %%%
+    5 +              %%
+      |            %%
+    4 +           %%
+      |          %%
+    3 +         %%
+      |        %
+    2 +      %%
+      |     %%
+      |    %
+    1 +
+      +    ######################################
+    0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
+      1       10     100      1000   10000   100000
+                   Transfer Size (KBytes)
+
+SCIF allows memory sharing via mmap(..) between processes on different PCIe
+nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
+latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
+
+SCIF has a user space library which is a thin IOCTL wrapper providing a user
+space API similar to the kernel API in scif.h. The SCIF user space library
+is distributed at https://software.intel.com/en-us/mic-developer
+
+Here is some pseudo code for an example of how two applications on two PCIe
+nodes would typically use the SCIF API::
+
+  Process A (on node A)			Process B (on node B)
+
+  /* get online node information */
+  scif_get_node_ids(..)			scif_get_node_ids(..)
+  scif_open(..)				scif_open(..)
+  scif_bind(..)				scif_bind(..)
+  scif_listen(..)
+  scif_accept(..)				scif_connect(..)
+  /* SCIF connection established */
+
+  /* Send and receive short messages */
+  scif_send(..)/scif_recv(..)		scif_send(..)/scif_recv(..)
+
+  /* Register memory */
+  scif_register(..)			scif_register(..)
+
+  /* RDMA */
+  scif_readfrom(..)/scif_writeto(..)	scif_readfrom(..)/scif_writeto(..)
+
+  /* Fence DMAs */
+  scif_fence_signal(..)			scif_fence_signal(..)
+
+  mmap(..)				mmap(..)
+
+  /* Access remote registered memory */
+
+  /* Close the endpoints */
+  scif_close(..)				scif_close(..)
diff --git a/MAINTAINERS b/MAINTAINERS
index 38fe2f3f7b6f..083fcf1a151c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8569,15 +8569,15 @@ M:	Ashutosh Dixit <ashutosh.dixit@intel.com>
 S:	Supported
 W:	https://github.com/sudeepdutt/mic
 W:	http://software.intel.com/en-us/mic-developer
+F:	Documentation/misc-devices/mic/
+F:	drivers/dma/mic_x100_dma.c
+F:	drivers/dma/mic_x100_dma.h
+F:	drivers/misc/mic/
 F:	include/linux/mic_bus.h
 F:	include/linux/scif.h
 F:	include/uapi/linux/mic_common.h
 F:	include/uapi/linux/mic_ioctl.h
 F:	include/uapi/linux/scif_ioctl.h
-F:	drivers/misc/mic/
-F:	drivers/dma/mic_x100_dma.c
-F:	drivers/dma/mic_x100_dma.h
-F:	Documentation/mic/
 
 INTEL PMC CORE DRIVER
 M:	Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com>
-- 
cgit 


From ea6b5370836f995f1cdee45ae03a992e572efa45 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Sun, 8 Mar 2020 22:09:34 +0100
Subject: docs: admin-guide: binfmt-misc: Improve the title
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Trim the title a bit, since it's relatively long. Add `binfmt_misc` to
make it easier to search for the feature by its common name.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20200308210935.7273-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/binfmt-misc.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/binfmt-misc.rst b/Documentation/admin-guide/binfmt-misc.rst
index 97b0d7927078..95c93bbe408a 100644
--- a/Documentation/admin-guide/binfmt-misc.rst
+++ b/Documentation/admin-guide/binfmt-misc.rst
@@ -1,5 +1,5 @@
-Kernel Support for miscellaneous (your favourite) Binary Formats v1.1
-=====================================================================
+Kernel Support for miscellaneous Binary Formats (binfmt_misc)
+=============================================================
 
 This Kernel feature allows you to invoke almost (for restrictions see below)
 every program by simply typing its name in the shell.
-- 
cgit 


From d442bbca36751a4c791a7559cd249f5306f5a23f Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Thu, 5 Mar 2020 21:51:21 +0100
Subject: docs: it_IT: netdev-FAQ: Fix link to original document
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Federico Vaga <federico.vaga@vaga.pv.it>
Link: https://lore.kernel.org/r/20200305205123.8569-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/translations/it_IT/networking/netdev-FAQ.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/translations/it_IT/networking/netdev-FAQ.rst b/Documentation/translations/it_IT/networking/netdev-FAQ.rst
index 8489ead7cff1..7e2456bb7d92 100644
--- a/Documentation/translations/it_IT/networking/netdev-FAQ.rst
+++ b/Documentation/translations/it_IT/networking/netdev-FAQ.rst
@@ -1,6 +1,6 @@
 .. include:: ../disclaimer-ita.rst
 
-:Original: :ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>`
+:Original: :ref:`Documentation/networking/netdev-FAQ.rst <netdev-FAQ>`
 
 .. _it_netdev-FAQ:
 
-- 
cgit 


From d8401f504b49c71280e504e41b3b56876094f081 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Wed, 4 Mar 2020 23:03:47 -0800
Subject: docs: deprecated.rst: Add %p to the list

Once in a while %p usage comes up, and I've needed to have a reference
to point people to. Add %p details to deprecated.rst.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/202003042301.F844A8C0EC@keescook
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/deprecated.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
index 179f2a5625a0..7160a449e6c6 100644
--- a/Documentation/process/deprecated.rst
+++ b/Documentation/process/deprecated.rst
@@ -109,6 +109,28 @@ the given limit of bytes to copy. This is inefficient and can lead to
 linear read overflows if a source string is not NUL-terminated. The
 safe replacement is :c:func:`strscpy`.
 
+%p format specifier
+-------------------
+Traditionally, using "%p" in format strings would lead to regular address
+exposure flaws in dmesg, proc, sysfs, etc. Instead of leaving these to
+be exploitable, all "%p" uses in the kernel are being printed as a hashed
+value, rendering them unusable for addressing. New uses of "%p" should not
+be added to the kernel. For text addresses, using "%pS" is likely better,
+as it produces the more useful symbol name instead. For nearly everything
+else, just do not add "%p" at all.
+
+Paraphrasing Linus's current `guidance <https://lore.kernel.org/lkml/CA+55aFwQEd_d40g4mUCSsVRZzrFPUJt74vc6PPpb675hYNXcKw@mail.gmail.com/>`_:
+
+- If the hashed "%p" value is pointless, ask yourself whether the pointer
+  itself is important. Maybe it should be removed entirely?
+- If you really think the true pointer value is important, why is some
+  system state or user privilege level considered "special"? If you think
+  you can justify it (in comments and commit log) well enough to stand
+  up to Linus's scrutiny, maybe you can use "%px", along with making sure
+  you have sensible permissions.
+
+And finally, know that a toggle for "%p" hashing will `not be accepted <https://lore.kernel.org/lkml/CA+55aFwieC1-nAs+NFq9RTwaR8ef9hWa4MjNBWL41F-8wM49eA@mail.gmail.com/>`_.
+
 Variable Length Arrays (VLAs)
 -----------------------------
 Using stack VLAs produces much worse machine code than statically
-- 
cgit 


From 5e72017279957b764c225f143c16391b3c51f225 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Mon, 2 Mar 2020 15:17:17 -0700
Subject: docs: Organize core-api/index.rst

The core-api manual has become a big, disorganized mess.  Try to bring a
small amount of order to it by organizing the documents into
subcategories.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/index.rst | 95 ++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 22 deletions(-)

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index d02b26917931..b39dae276b57 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -8,42 +8,81 @@ This is the beginning of a manual for core kernel APIs.  The conversion
 Core utilities
 ==============
 
+This section has general and "core core" documentation.  The first is a
+massive grab-bag of kerneldoc info left over from the docbook days; it
+should really be broken up someday when somebody finds the energy to do
+it.
+
 .. toctree::
    :maxdepth: 1
 
    kernel-api
+   workqueue
+   printk-formats
+   symbol-namespaces
+
+Data structures and low-level utilities
+=======================================
+
+Library functionality that is used throughout the kernel.
+
+.. toctree::
+   :maxdepth: 1
+
    kobject
    assoc_array
+   xarray
+   idr
+   circular-buffers
+   generic-radix-tree
+   packing
+   timekeeping
+   errseq
+
+Concurrency primitives
+======================
+
+How Linux keeps everything from happening at the same time.  See
+:doc:`/locking/index` for more related documentation.
+
+.. toctree::
+   :maxdepth: 1
+
    atomic_ops
-   cachetlb
    refcount-vs-atomic
-   cpu_hotplug
-   idr
    local_ops
-   workqueue
+   padata
+   ../RCU/index
+
+Low-level hardware management
+=============================
+
+Cache management, managing CPU hotplug, etc.
+
+.. toctree::
+   :maxdepth: 1
+
+   cachetlb
+   cpu_hotplug
+   memory-hotplug
    genericirq
-   xarray
-   librs
-   genalloc
-   errseq
-   packing
-   printk-formats
-   circular-buffers
-   generic-radix-tree
+   protection-keys
+
+Memory management
+=================
+
+How to allocate and use memory in the kernel.  Note that there is a lot
+more memory-management documentation in :doc:`/vm/index`.
+
+.. toctree::
+   :maxdepth: 1
+
    memory-allocation
    mm-api
+   genalloc
    pin_user_pages
-   gfp_mask-from-fs-io
-   timekeeping
    boot-time-mm
-   memory-hotplug
-   protection-keys
-   ../RCU/index
-   gcc-plugins
-   symbol-namespaces
-   padata
-   ioctl
-
+   gfp_mask-from-fs-io
 
 Interfaces for kernel debugging
 ===============================
@@ -54,6 +93,18 @@ Interfaces for kernel debugging
    debug-objects
    tracepoint
 
+Everything else
+===============
+
+Documents that don't fit elsewhere or which have yet to be categorized.
+
+.. toctree::
+   :maxdepth: 1
+
+   librs
+   gcc-plugins
+   ioctl
+
 .. only:: subproject and html
 
    Indices
-- 
cgit 


From 2b4cbd5c950525b6d4d2cd384dcefdd95fedabe3 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Mon, 2 Mar 2020 15:24:04 -0700
Subject: docs: move gcc-plugins to the kbuild manual

Information about GCC plugins is relevant to kernel building, so move this
document to the kbuild manual.

Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/gcc-plugins.rst | 97 ----------------------------------
 Documentation/core-api/index.rst       |  1 -
 Documentation/kbuild/gcc-plugins.rst   | 97 ++++++++++++++++++++++++++++++++++
 Documentation/kbuild/index.rst         |  1 +
 MAINTAINERS                            |  2 +-
 scripts/gcc-plugins/Kconfig            |  2 +-
 6 files changed, 100 insertions(+), 100 deletions(-)
 delete mode 100644 Documentation/core-api/gcc-plugins.rst
 create mode 100644 Documentation/kbuild/gcc-plugins.rst

diff --git a/Documentation/core-api/gcc-plugins.rst b/Documentation/core-api/gcc-plugins.rst
deleted file mode 100644
index 4b1c10f88e30..000000000000
--- a/Documentation/core-api/gcc-plugins.rst
+++ /dev/null
@@ -1,97 +0,0 @@
-=========================
-GCC plugin infrastructure
-=========================
-
-
-Introduction
-============
-
-GCC plugins are loadable modules that provide extra features to the
-compiler [1]_. They are useful for runtime instrumentation and static analysis.
-We can analyse, change and add further code during compilation via
-callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_.
-
-The GCC plugin infrastructure of the kernel supports all gcc versions from
-4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a
-separate directory.
-Plugin source files have to be compilable by both a C and a C++ compiler as well
-because gcc versions 4.5 and 4.6 are compiled by a C compiler,
-gcc-4.7 can be compiled by a C or a C++ compiler,
-and versions 4.8+ can only be compiled by a C++ compiler.
-
-Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and
-powerpc architectures.
-
-This infrastructure was ported from grsecurity [6]_ and PaX [7]_.
-
---
-
-.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html
-.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API
-.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
-.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html
-.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html
-.. [6] https://grsecurity.net/
-.. [7] https://pax.grsecurity.net/
-
-
-Files
-=====
-
-**$(src)/scripts/gcc-plugins**
-
-	This is the directory of the GCC plugins.
-
-**$(src)/scripts/gcc-plugins/gcc-common.h**
-
-	This is a compatibility header for GCC plugins.
-	It should be always included instead of individual gcc headers.
-
-**$(src)/scripts/gcc-plugin.sh**
-
-	This script checks the availability of the included headers in
-	gcc-common.h and chooses the proper host compiler to build the plugins
-	(gcc-4.7 can be built by either gcc or g++).
-
-**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h**
-
-	These headers automatically generate the registration structures for
-	GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions
-	from 4.5 to 6.0.
-	They should be preferred to creating the structures by hand.
-
-
-Usage
-=====
-
-You must install the gcc plugin headers for your gcc version,
-e.g., on Ubuntu for gcc-4.9::
-
-	apt-get install gcc-4.9-plugin-dev
-
-Or on Fedora::
-
-	dnf install gcc-plugin-devel
-
-Enable a GCC plugin based feature in the kernel config::
-
-	CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
-
-To compile only the plugin(s)::
-
-	make gcc-plugins
-
-or just run the kernel make and compile the whole kernel with
-the cyclomatic complexity GCC plugin.
-
-
-4. How to add a new GCC plugin
-==============================
-
-The GCC plugins are in $(src)/scripts/gcc-plugins/. You can use a file or a directory
-here. It must be added to $(src)/scripts/gcc-plugins/Makefile,
-$(src)/scripts/Makefile.gcc-plugins and $(src)/arch/Kconfig.
-See the cyc_complexity_plugin.c (CONFIG_GCC_PLUGIN_CYC_COMPLEXITY) GCC plugin.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index b39dae276b57..9836a0ac09a3 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -102,7 +102,6 @@ Documents that don't fit elsewhere or which have yet to be categorized.
    :maxdepth: 1
 
    librs
-   gcc-plugins
    ioctl
 
 .. only:: subproject and html
diff --git a/Documentation/kbuild/gcc-plugins.rst b/Documentation/kbuild/gcc-plugins.rst
new file mode 100644
index 000000000000..4b1c10f88e30
--- /dev/null
+++ b/Documentation/kbuild/gcc-plugins.rst
@@ -0,0 +1,97 @@
+=========================
+GCC plugin infrastructure
+=========================
+
+
+Introduction
+============
+
+GCC plugins are loadable modules that provide extra features to the
+compiler [1]_. They are useful for runtime instrumentation and static analysis.
+We can analyse, change and add further code during compilation via
+callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_.
+
+The GCC plugin infrastructure of the kernel supports all gcc versions from
+4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a
+separate directory.
+Plugin source files have to be compilable by both a C and a C++ compiler,
+because gcc versions 4.5 and 4.6 are compiled by a C compiler,
+gcc-4.7 can be compiled by a C or a C++ compiler,
+and versions 4.8+ can only be compiled by a C++ compiler.
+
+Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and
+powerpc architectures.
+
+This infrastructure was ported from grsecurity [6]_ and PaX [7]_.
+
+--
+
+.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html
+.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API
+.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
+.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html
+.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html
+.. [6] https://grsecurity.net/
+.. [7] https://pax.grsecurity.net/
+
+
+Files
+=====
+
+**$(src)/scripts/gcc-plugins**
+
+	This is the directory of the GCC plugins.
+
+**$(src)/scripts/gcc-plugins/gcc-common.h**
+
+	This is a compatibility header for GCC plugins.
+	It should always be included instead of individual gcc headers.
+
+**$(src)/scripts/gcc-plugin.sh**
+
+	This script checks the availability of the included headers in
+	gcc-common.h and chooses the proper host compiler to build the plugins
+	(gcc-4.7 can be built by either gcc or g++).
+
+**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h,
+$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h,
+$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h,
+$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h**
+
+	These headers automatically generate the registration structures for
+	GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions
+	from 4.5 to 6.0.
+	They should be preferred to creating the structures by hand.
+
+
+Usage
+=====
+
+You must install the gcc plugin headers for your gcc version,
+e.g., on Ubuntu for gcc-4.9::
+
+	apt-get install gcc-4.9-plugin-dev
+
+Or on Fedora::
+
+	dnf install gcc-plugin-devel
+
+Enable a GCC plugin based feature in the kernel config::
+
+	CONFIG_GCC_PLUGIN_CYC_COMPLEXITY=y
+
+To compile only the plugin(s)::
+
+	make gcc-plugins
+
+or just run the kernel make and compile the whole kernel with
+the cyclomatic complexity GCC plugin.
+
+
+How to add a new GCC plugin
+===========================
+
+The GCC plugins are in $(src)/scripts/gcc-plugins/. You can use a file or a directory
+here. It must be added to $(src)/scripts/gcc-plugins/Makefile,
+$(src)/scripts/Makefile.gcc-plugins and $(src)/arch/Kconfig.
+See the cyc_complexity_plugin.c (CONFIG_GCC_PLUGIN_CYC_COMPLEXITY) GCC plugin.
diff --git a/Documentation/kbuild/index.rst b/Documentation/kbuild/index.rst
index 0f144fad99a6..82daf2efcb73 100644
--- a/Documentation/kbuild/index.rst
+++ b/Documentation/kbuild/index.rst
@@ -19,6 +19,7 @@ Kernel Build System
 
     issues
     reproducible-builds
+    gcc-plugins
 
 .. only::  subproject and html
 
diff --git a/MAINTAINERS b/MAINTAINERS
index 083fcf1a151c..8c5712079412 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6934,7 +6934,7 @@ S:	Maintained
 F:	scripts/gcc-plugins/
 F:	scripts/gcc-plugin.sh
 F:	scripts/Makefile.gcc-plugins
-F:	Documentation/core-api/gcc-plugins.rst
+F:	Documentation/kbuild/gcc-plugins.rst
 
 GASKET DRIVER FRAMEWORK
 M:	Rob Springer <rspringer@google.com>
diff --git a/scripts/gcc-plugins/Kconfig b/scripts/gcc-plugins/Kconfig
index e3569543bdac..f8ca236d6165 100644
--- a/scripts/gcc-plugins/Kconfig
+++ b/scripts/gcc-plugins/Kconfig
@@ -23,7 +23,7 @@ menuconfig GCC_PLUGINS
 	  GCC plugins are loadable modules that provide extra features to the
 	  compiler. They are useful for runtime instrumentation and static analysis.
 
-	  See Documentation/core-api/gcc-plugins.rst for details.
+	  See Documentation/kbuild/gcc-plugins.rst for details.
 
 if GCC_PLUGINS
 
-- 
cgit 


From 6505a18e66876e0f502dcba5a563bd3048094048 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Mon, 2 Mar 2020 15:26:38 -0700
Subject: docs: move core-api/ioctl.rst to driver-api/

The ioctl() documentation belongs with the rest of the driver-oriented
info, so move it there.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/index.rst   |   1 -
 Documentation/core-api/ioctl.rst   | 253 -------------------------------------
 Documentation/driver-api/index.rst |   1 +
 Documentation/driver-api/ioctl.rst | 253 +++++++++++++++++++++++++++++++++++++
 4 files changed, 254 insertions(+), 254 deletions(-)
 delete mode 100644 Documentation/core-api/ioctl.rst
 create mode 100644 Documentation/driver-api/ioctl.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 9836a0ac09a3..0897ad12c119 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -102,7 +102,6 @@ Documents that don't fit elsewhere or which have yet to be categorized.
    :maxdepth: 1
 
    librs
-   ioctl
 
 .. only:: subproject and html
 
diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst
deleted file mode 100644
index c455db0e1627..000000000000
--- a/Documentation/core-api/ioctl.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-======================
-ioctl based interfaces
-======================
-
-ioctl() is the most common way for applications to interface
-with device drivers. It is flexible and easily extended by adding new
-commands and can be passed through character devices, block devices as
-well as sockets and other special file descriptors.
-
-However, it is also very easy to get ioctl command definitions wrong,
-and hard to fix them later without breaking existing applications,
-so this documentation tries to help developers get it right.
-
-Command number definitions
-==========================
-
-The command number, or request number, is the second argument passed to
-the ioctl system call. While this can be any 32-bit number that uniquely
-identifies an action for a particular driver, there are a number of
-conventions around defining them.
-
-``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
-ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
-``_IOW``, and ``_IOWR``. These should be used for all new commands,
-with the correct parameters:
-
-_IO/_IOR/_IOW/_IOWR
-   The macro name specifies how the argument will be used.  It may be a
-   pointer to data to be passed into the kernel (_IOW), out of the kernel
-   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
-   argument or those passing an integer value instead of a pointer.
-   It is recommended to only use _IO for commands without arguments,
-   and use pointers for passing data.
-
-type
-   An 8-bit number, often a character literal, specific to a subsystem
-   or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`
-
-nr
-  An 8-bit number identifying the specific command, unique for a give
-  value of 'type'
-
-data_type
-  The name of the data type pointed to by the argument, the command number
-  encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
-  leading to a limit of 8191 bytes for the maximum size of the argument.
-  Note: do not pass sizeof(data_type) type into _IOR/_IOW/IOWR, as that
-  will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
-  _IO does not have a data_type parameter.
-
-
-Interface versions
-==================
-
-Some subsystems use version numbers in data structures to overload
-commands with different interpretations of the argument.
-
-This is generally a bad idea, since changes to existing commands tend
-to break existing applications.
-
-A better approach is to add a new ioctl command with a new number. The
-old command still needs to be implemented in the kernel for compatibility,
-but this can be a wrapper around the new implementation.
-
-Return code
-===========
-
-ioctl commands can return negative error codes as documented in errno(3);
-these get turned into errno values in user space. On success, the return
-code should be zero. It is also possible but not recommended to return
-a positive 'long' value.
-
-When the ioctl callback is called with an unknown command number, the
-handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
--ENOTTY being returned from the system call. Some subsystems return
--ENOSYS or -EINVAL here for historic reasons, but this is wrong.
-
-Prior to Linux 5.5, compat_ioctl handlers were required to return
--ENOIOCTLCMD in order to use the fallback conversion into native
-commands. As all subsystems are now responsible for handling compat
-mode themselves, this is no longer needed, but it may be important to
-consider when backporting bug fixes to older kernels.
-
-Timestamps
-==========
-
-Traditionally, timestamps and timeout values are passed as ``struct
-timespec`` or ``struct timeval``, but these are problematic because of
-incompatible definitions of these structures in user space after the
-move to 64-bit time_t.
-
-The ``struct __kernel_timespec`` type can be used instead to be embedded
-in other data structures when separate second/nanosecond values are
-desired, or passed to user space directly. This is still not ideal though,
-as the structure matches neither the kernel's timespec64 nor the user
-space timespec exactly. The get_timespec64() and put_timespec64() helper
-functions can be used to ensure that the layout remains compatible with
-user space and the padding is treated correctly.
-
-As it is cheap to convert seconds to nanoseconds, but the opposite
-requires an expensive 64-bit division, a simple __u64 nanosecond value
-can be simpler and more efficient.
-
-Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
-as returned by ktime_get_ns() or ktime_get_ts64().  Unlike
-CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
-or forwards due to leap second adjustments and clock_settime() calls.
-
-ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
-need to be persistent across a reboot or between multiple machines.
-
-32-bit compat mode
-==================
-
-In order to support 32-bit user space running on a 64-bit machine, each
-subsystem or driver that implements an ioctl callback handler must also
-implement the corresponding compat_ioctl handler.
-
-As long as all the rules for data structures are followed, this is as
-easy as setting the .compat_ioctl pointer to a helper function such as
-compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
-
-compat_ptr()
-------------
-
-On the s390 architecture, 31-bit user space has ambiguous representations
-for data pointers, with the upper bit being ignored. When running such
-a process in compat mode, the compat_ptr() helper must be used to
-clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
-pointer.  On other architectures, this macro only performs a cast to a
-``void __user *`` pointer.
-
-In an compat_ioctl() callback, the last argument is an unsigned long,
-which can be interpreted as either a pointer or a scalar depending on
-the command. If it is a scalar, then compat_ptr() must not be used, to
-ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
-for arguments with the upper bit set.
-
-The compat_ptr_ioctl() helper can be used in place of a custom
-compat_ioctl file operation for drivers that only take arguments that
-are pointers to compatible data structures.
-
-Structure layout
-----------------
-
-Compatible data structures have the same layout on all architectures,
-avoiding all problematic members:
-
-* ``long`` and ``unsigned long`` are the size of a register, so
-  they can be either 32-bit or 64-bit wide and cannot be used in portable
-  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
-  ``__s64`` and ``__u64``.
-
-* Pointers have the same problem, in addition to requiring the
-  use of compat_ptr(). The best workaround is to use ``__u64``
-  in place of pointers, which requires a cast to ``uintptr_t`` in user
-  space, and the use of u64_to_user_ptr() in the kernel to convert
-  it back into a user pointer.
-
-* On the x86-32 (i386) architecture, the alignment of 64-bit variables
-  is only 32-bit, but they are naturally aligned on most other
-  architectures including x86-64. This means a structure like::
-
-    struct foo {
-        __u32 a;
-        __u64 b;
-        __u32 c;
-    };
-
-  has four bytes of padding between a and b on x86-64, plus another four
-  bytes of padding at the end, but no padding on i386, and it needs a
-  compat_ioctl conversion handler to translate between the two formats.
-
-  To avoid this problem, all structures should have their members
-  naturally aligned, or explicit reserved fields added in place of the
-  implicit padding. The ``pahole`` tool can be used for checking the
-  alignment.
-
-* On ARM OABI user space, structures are padded to multiples of 32-bit,
-  making some structs incompatible with modern EABI kernels if they
-  do not end on a 32-bit boundary.
-
-* On the m68k architecture, struct members are not guaranteed to have an
-  alignment greater than 16-bit, which is a problem when relying on
-  implicit padding.
-
-* Bitfields and enums generally work as one would expect them to,
-  but some properties of them are implementation-defined, so it is better
-  to avoid them completely in ioctl interfaces.
-
-* ``char`` members can be either signed or unsigned, depending on
-  the architecture, so the __u8 and __s8 types should be used for 8-bit
-  integer values, though char arrays are clearer for fixed-length strings.
-
-Information leaks
-=================
-
-Uninitialized data must not be copied back to user space, as this can
-cause an information leak, which can be used to defeat kernel address
-space layout randomization (KASLR), helping in an attack.
-
-For this reason (and for compat support) it is best to avoid any
-implicit padding in data structures.  Where there is implicit padding
-in an existing structure, kernel drivers must be careful to fully
-initialize an instance of the structure before copying it to user
-space.  This is usually done by calling memset() before assigning to
-individual members.
-
-Subsystem abstractions
-======================
-
-While some device drivers implement their own ioctl function, most
-subsystems implement the same command for multiple drivers.  Ideally the
-subsystem has an .ioctl() handler that copies the arguments from and
-to user space, passing them into subsystem specific callback functions
-through normal kernel pointers.
-
-This helps in various ways:
-
-* Applications written for one driver are more likely to work for
-  another one in the same subsystem if there are no subtle differences
-  in the user space ABI.
-
-* The complexity of user space access and data structure layout is done
-  in one place, reducing the potential for implementation bugs.
-
-* It is more likely to be reviewed by experienced developers
-  that can spot problems in the interface when the ioctl is shared
-  between multiple drivers than when it is only used in a single driver.
-
-Alternatives to ioctl
-=====================
-
-There are many cases in which ioctl is not the best solution for a
-problem. Alternatives include:
-
-* System calls are a better choice for a system-wide feature that
-  is not tied to a physical device or constrained by the file system
-  permissions of a character device node
-
-* netlink is the preferred way of configuring any network related
-  objects through sockets.
-
-* debugfs is used for ad-hoc interfaces for debugging functionality
-  that does not need to be exposed as a stable interface to applications.
-
-* sysfs is a good way to expose the state of an in-kernel object
-  that is not tied to a file descriptor.
-
-* configfs can be used for more complex configuration than sysfs
-
-* A custom file system can provide extra flexibility with a simple
-  user interface but adds a lot of complexity to the implementation.
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index ea3003b3c5e5..1d8c5599149b 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -17,6 +17,7 @@ available subsections can be seen below.
    driver-model/index
    basics
    infrastructure
+   ioctl
    early-userspace/index
    pm/index
    clk
diff --git a/Documentation/driver-api/ioctl.rst b/Documentation/driver-api/ioctl.rst
new file mode 100644
index 000000000000..c455db0e1627
--- /dev/null
+++ b/Documentation/driver-api/ioctl.rst
@@ -0,0 +1,253 @@
+======================
+ioctl based interfaces
+======================
+
+ioctl() is the most common way for applications to interface
+with device drivers. It is flexible and easily extended by adding new
+commands and can be passed through character devices, block devices as
+well as sockets and other special file descriptors.
+
+However, it is also very easy to get ioctl command definitions wrong,
+and hard to fix them later without breaking existing applications,
+so this documentation tries to help developers get it right.
+
+Command number definitions
+==========================
+
+The command number, or request number, is the second argument passed to
+the ioctl system call. While this can be any 32-bit number that uniquely
+identifies an action for a particular driver, there are a number of
+conventions around defining them.
+
+``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
+ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
+``_IOW``, and ``_IOWR``. These should be used for all new commands,
+with the correct parameters:
+
+_IO/_IOR/_IOW/_IOWR
+   The macro name specifies how the argument will be used.  It may be a
+   pointer to data to be passed into the kernel (_IOW), out of the kernel
+   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
+   argument or those passing an integer value instead of a pointer.
+   It is recommended to only use _IO for commands without arguments,
+   and use pointers for passing data.
+
+type
+   An 8-bit number, often a character literal, specific to a subsystem
+   or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`
+
+nr
+  An 8-bit number identifying the specific command, unique for a given
+  value of 'type'.
+
+data_type
+  The name of the data type pointed to by the argument. The command number
+  encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
+  leading to a limit of 8191 bytes for the maximum size of the argument.
+  Note: do not pass sizeof(data_type) into _IOR/_IOW/_IOWR, as that
+  will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
+  _IO does not have a data_type parameter.
+
+
+Interface versions
+==================
+
+Some subsystems use version numbers in data structures to overload
+commands with different interpretations of the argument.
+
+This is generally a bad idea, since changes to existing commands tend
+to break existing applications.
+
+A better approach is to add a new ioctl command with a new number. The
+old command still needs to be implemented in the kernel for compatibility,
+but this can be a wrapper around the new implementation.
+
+Return code
+===========
+
+ioctl commands can return negative error codes as documented in errno(3);
+these get turned into errno values in user space. On success, the return
+code should be zero. It is also possible but not recommended to return
+a positive 'long' value.
+
+When the ioctl callback is called with an unknown command number, the
+handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
+-ENOTTY being returned from the system call. Some subsystems return
+-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
+
+Prior to Linux 5.5, compat_ioctl handlers were required to return
+-ENOIOCTLCMD in order to use the fallback conversion into native
+commands. As all subsystems are now responsible for handling compat
+mode themselves, this is no longer needed, but it may be important to
+consider when backporting bug fixes to older kernels.
+
+Timestamps
+==========
+
+Traditionally, timestamps and timeout values are passed as ``struct
+timespec`` or ``struct timeval``, but these are problematic because of
+incompatible definitions of these structures in user space after the
+move to 64-bit time_t.
+
+The ``struct __kernel_timespec`` type can be used instead to be embedded
+in other data structures when separate second/nanosecond values are
+desired, or passed to user space directly. This is still not ideal though,
+as the structure matches neither the kernel's timespec64 nor the user
+space timespec exactly. The get_timespec64() and put_timespec64() helper
+functions can be used to ensure that the layout remains compatible with
+user space and the padding is treated correctly.
+
+As it is cheap to convert seconds to nanoseconds, but the opposite
+requires an expensive 64-bit division, a simple __u64 nanosecond value
+can be simpler and more efficient.
+
+Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
+as returned by ktime_get_ns() or ktime_get_ts64().  Unlike
+CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
+or forwards due to leap second adjustments and clock_settime() calls.
+
+ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
+need to be persistent across a reboot or between multiple machines.
+
+32-bit compat mode
+==================
+
+In order to support 32-bit user space running on a 64-bit machine, each
+subsystem or driver that implements an ioctl callback handler must also
+implement the corresponding compat_ioctl handler.
+
+As long as all the rules for data structures are followed, this is as
+easy as setting the .compat_ioctl pointer to a helper function such as
+compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
+
+compat_ptr()
+------------
+
+On the s390 architecture, 31-bit user space has ambiguous representations
+for data pointers, with the upper bit being ignored. When running such
+a process in compat mode, the compat_ptr() helper must be used to
+clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
+pointer.  On other architectures, this macro only performs a cast to a
+``void __user *`` pointer.
+
+In a compat_ioctl() callback, the last argument is an unsigned long,
+which can be interpreted as either a pointer or a scalar depending on
+the command. If it is a scalar, then compat_ptr() must not be used, to
+ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
+for arguments with the upper bit set.
+
+The compat_ptr_ioctl() helper can be used in place of a custom
+compat_ioctl file operation for drivers that only take arguments that
+are pointers to compatible data structures.
+
+Structure layout
+----------------
+
+Compatible data structures have the same layout on all architectures,
+avoiding all problematic members:
+
+* ``long`` and ``unsigned long`` are the size of a register, so
+  they can be either 32-bit or 64-bit wide and cannot be used in portable
+  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
+  ``__s64`` and ``__u64``.
+
+* Pointers have the same problem, in addition to requiring the
+  use of compat_ptr(). The best workaround is to use ``__u64``
+  in place of pointers, which requires a cast to ``uintptr_t`` in user
+  space, and the use of u64_to_user_ptr() in the kernel to convert
+  it back into a user pointer.
+
+* On the x86-32 (i386) architecture, the alignment of 64-bit variables
+  is only 32-bit, but they are naturally aligned on most other
+  architectures including x86-64. This means a structure like::
+
+    struct foo {
+        __u32 a;
+        __u64 b;
+        __u32 c;
+    };
+
+  has four bytes of padding between a and b on x86-64, plus another four
+  bytes of padding at the end, but no padding on i386, and it needs a
+  compat_ioctl conversion handler to translate between the two formats.
+
+  To avoid this problem, all structures should have their members
+  naturally aligned, or explicit reserved fields added in place of the
+  implicit padding. The ``pahole`` tool can be used for checking the
+  alignment.
+
+* On ARM OABI user space, structures are padded to multiples of 32-bit,
+  making some structs incompatible with modern EABI kernels if they
+  do not end on a 32-bit boundary.
+
+* On the m68k architecture, struct members are not guaranteed to have an
+  alignment greater than 16-bit, which is a problem when relying on
+  implicit padding.
+
+* Bitfields and enums generally work as one would expect them to,
+  but some properties of them are implementation-defined, so it is better
+  to avoid them completely in ioctl interfaces.
+
+* ``char`` members can be either signed or unsigned, depending on
+  the architecture, so the __u8 and __s8 types should be used for 8-bit
+  integer values, though char arrays are clearer for fixed-length strings.
+
+Information leaks
+=================
+
+Uninitialized data must not be copied back to user space, as this can
+cause an information leak, which can be used to defeat kernel address
+space layout randomization (KASLR), helping in an attack.
+
+For this reason (and for compat support) it is best to avoid any
+implicit padding in data structures.  Where there is implicit padding
+in an existing structure, kernel drivers must be careful to fully
+initialize an instance of the structure before copying it to user
+space.  This is usually done by calling memset() before assigning to
+individual members.
+
+Subsystem abstractions
+======================
+
+While some device drivers implement their own ioctl function, most
+subsystems implement the same command for multiple drivers.  Ideally the
+subsystem has an .ioctl() handler that copies the arguments from and
+to user space, passing them into subsystem specific callback functions
+through normal kernel pointers.
+
+This helps in various ways:
+
+* Applications written for one driver are more likely to work for
+  another one in the same subsystem if there are no subtle differences
+  in the user space ABI.
+
+* The complexity of user space access and data structure layout is handled
+  in one place, reducing the potential for implementation bugs.
+
+* It is more likely to be reviewed by experienced developers
+  that can spot problems in the interface when the ioctl is shared
+  between multiple drivers than when it is only used in a single driver.
+
+Alternatives to ioctl
+=====================
+
+There are many cases in which ioctl is not the best solution for a
+problem. Alternatives include:
+
+* System calls are a better choice for a system-wide feature that
+  is not tied to a physical device or constrained by the file system
+  permissions of a character device node.
+
+* netlink is the preferred way of configuring any network related
+  objects through sockets.
+
+* debugfs is used for ad-hoc interfaces for debugging functionality
+  that does not need to be exposed as a stable interface to applications.
+
+* sysfs is a good way to expose the state of an in-kernel object
+  that is not tied to a file descriptor.
+
+* configfs can be used for more complex configuration than sysfs.
+
+* A custom file system can provide extra flexibility with a simple
+  user interface but adds a lot of complexity to the implementation.
-- 
cgit 
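
The command-number and structure-layout rules in the ioctl document moved
by this patch can be sketched in a user-space header as follows. This is
only an illustration under stated assumptions: the ``foo`` driver, the
magic value 'f', the command numbers and the members of
``struct foo_config`` are all hypothetical and do not correspond to any
real driver or to an entry in the ioctl-number list.

```c
#include <linux/ioctl.h>  /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */
#include <linux/types.h>  /* __u32, __u64 */
#include <stddef.h>       /* offsetof */

/*
 * Hypothetical argument structure. Only fixed-width types are used, every
 * member is naturally aligned, and the padding is an explicit member, so
 * the layout is the same for 32-bit and 64-bit user space.
 */
struct foo_config {
	__u32 flags;
	__u32 reserved;    /* explicit field instead of implicit padding */
	__u64 buffer;      /* user pointer carried as __u64 */
	__u64 timeout_ns;  /* plain nanosecond value */
};

/* Compile-time layout checks: no hidden padding anywhere. */
_Static_assert(offsetof(struct foo_config, buffer) == 8, "buffer offset");
_Static_assert(sizeof(struct foo_config) == 24, "no implicit padding");

#define FOO_IOC_MAGIC 'f'  /* hypothetical 'type' value */

/* _IO carries no argument; the others pass the data type, not sizeof(). */
#define FOO_IOC_RESET       _IO(FOO_IOC_MAGIC, 0)
#define FOO_IOC_GET_CONFIG  _IOR(FOO_IOC_MAGIC, 1, struct foo_config)
#define FOO_IOC_SET_CONFIG  _IOW(FOO_IOC_MAGIC, 2, struct foo_config)
#define FOO_IOC_XCHG_CONFIG _IOWR(FOO_IOC_MAGIC, 3, struct foo_config)
```

The ``_IOC_SIZE()``, ``_IOC_NR()``, ``_IOC_TYPE()`` and ``_IOC_DIR()``
decoders from the same header can be used to confirm what each command
number encodes.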


From 76136e028d3bc94a84f5404ba0a9afae38db1b8a Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Wed, 4 Mar 2020 11:03:24 -0800
Subject: docs: deprecated.rst: Clean up fall-through details

Add example of fall-through, list-ify the case ending statements, and
adjust the markup for links and readability. While here, adjust
strscpy() details to mention strscpy_pad().

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Link: https://lore.kernel.org/r/202003041102.47A4E4B62@keescook
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/deprecated.rst | 48 ++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
index 7160a449e6c6..8965446f0b71 100644
--- a/Documentation/process/deprecated.rst
+++ b/Documentation/process/deprecated.rst
@@ -94,8 +94,8 @@ and other misbehavior due to the missing termination. It also NUL-pads the
 destination buffer if the source contents are shorter than the destination
 buffer size, which may be a needless performance penalty for callers using
 only NUL-terminated strings. The safe replacement is :c:func:`strscpy`.
-(Users of :c:func:`strscpy` still needing NUL-padding will need an
-explicit :c:func:`memset` added.)
+(Users of :c:func:`strscpy` still needing NUL-padding should instead
+use strscpy_pad().)
 
 If a caller is using non-NUL-terminated strings, :c:func:`strncpy()` can
 still be used, but destinations should be marked with the `__nonstring
@@ -144,27 +144,37 @@ memory adjacent to the stack (when built without `CONFIG_VMAP_STACK=y`)
 
 Implicit switch case fall-through
 ---------------------------------
-The C language allows switch cases to "fall-through" when a "break" statement
-is missing at the end of a case. This, however, introduces ambiguity in the
-code, as it's not always clear if the missing break is intentional or a bug.
+The C language allows switch cases to fall through to the next case
+when a "break" statement is missing at the end of a case. This, however,
+introduces ambiguity in the code, as it's not always clear if the missing
+break is intentional or a bug. For example, it's not obvious just from
+looking at the code if `STATE_ONE` is intentionally designed to fall
+through into `STATE_TWO`::
+
+	switch (value) {
+	case STATE_ONE:
+		do_something();
+	case STATE_TWO:
+		do_other();
+		break;
+	default:
+		WARN("unknown state");
+	}
 
 As there have been a long list of flaws `due to missing "break" statements
 <https://cwe.mitre.org/data/definitions/484.html>`_, we no longer allow
-"implicit fall-through".
-
-In order to identify intentional fall-through cases, we have adopted a
-pseudo-keyword macro 'fallthrough' which expands to gcc's extension
-__attribute__((__fallthrough__)).  `Statement Attributes
-<https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html>`_
-
-When the C17/C18  [[fallthrough]] syntax is more commonly supported by
+implicit fall-through. In order to identify intentional fall-through
+cases, we have adopted a pseudo-keyword macro "fallthrough" which
+expands to gcc's extension `__attribute__((__fallthrough__))
+<https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html>`_.
+(When the C17/C18  `[[fallthrough]]` syntax is more commonly supported by
 C compilers, static analyzers, and IDEs, we can switch to using that syntax
-for the macro pseudo-keyword.
+for the macro pseudo-keyword.)
 
 All switch/case blocks must end in one of:
 
-	break;
-	fallthrough;
-	continue;
-	goto <label>;
-	return [expression];
+* break;
+* fallthrough;
+* continue;
+* goto <label>;
+* return [expression];
-- 
cgit 


From 7929b9836ed0d7c051eed9f223f0f815454c5210 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Tue, 10 Mar 2020 11:27:22 -0600
Subject: docs: Remove :c:func: from process/deprecated.rst

Documentation/process/deprecated.rst has a lot of uses of :c:func:, which
is, well, deprecated.  Emacs query-replace-regexp to the rescue.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/deprecated.rst | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
index 8965446f0b71..e924d3197761 100644
--- a/Documentation/process/deprecated.rst
+++ b/Documentation/process/deprecated.rst
@@ -63,51 +63,51 @@ Instead, use the helper::
 
 	header = kzalloc(struct_size(header, item, count), GFP_KERNEL);
 
-See :c:func:`array_size`, :c:func:`array3_size`, and :c:func:`struct_size`,
-for more details as well as the related :c:func:`check_add_overflow` and
-:c:func:`check_mul_overflow` family of functions.
+See array_size(), array3_size(), and struct_size(),
+for more details as well as the related check_add_overflow() and
+check_mul_overflow() family of functions.
 
 simple_strtol(), simple_strtoll(), simple_strtoul(), simple_strtoull()
 ----------------------------------------------------------------------
-The :c:func:`simple_strtol`, :c:func:`simple_strtoll`,
-:c:func:`simple_strtoul`, and :c:func:`simple_strtoull` functions
+The simple_strtol(), simple_strtoll(),
+simple_strtoul(), and simple_strtoull() functions
 explicitly ignore overflows, which may lead to unexpected results
-in callers. The respective :c:func:`kstrtol`, :c:func:`kstrtoll`,
-:c:func:`kstrtoul`, and :c:func:`kstrtoull` functions tend to be the
+in callers. The respective kstrtol(), kstrtoll(),
+kstrtoul(), and kstrtoull() functions tend to be the
 correct replacements, though note that those require the string to be
 NUL or newline terminated.
 
 strcpy()
 --------
-:c:func:`strcpy` performs no bounds checking on the destination
+strcpy() performs no bounds checking on the destination
 buffer. This could result in linear overflows beyond the
 end of the buffer, leading to all kinds of misbehaviors. While
 `CONFIG_FORTIFY_SOURCE=y` and various compiler flags help reduce the
 risk of using this function, there is no good reason to add new uses of
-this function. The safe replacement is :c:func:`strscpy`.
+this function. The safe replacement is strscpy().
 
 strncpy() on NUL-terminated strings
 -----------------------------------
-Use of :c:func:`strncpy` does not guarantee that the destination buffer
+Use of strncpy() does not guarantee that the destination buffer
 will be NUL terminated. This can lead to various linear read overflows
 and other misbehavior due to the missing termination. It also NUL-pads the
 destination buffer if the source contents are shorter than the destination
 buffer size, which may be a needless performance penalty for callers using
-only NUL-terminated strings. The safe replacement is :c:func:`strscpy`.
-(Users of :c:func:`strscpy` still needing NUL-padding should instead
+only NUL-terminated strings. The safe replacement is strscpy().
+(Users of strscpy() still needing NUL-padding should instead
 use strscpy_pad().)
 
-If a caller is using non-NUL-terminated strings, :c:func:`strncpy()` can
+If a caller is using non-NUL-terminated strings, strncpy() can
 still be used, but destinations should be marked with the `__nonstring
 <https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html>`_
 attribute to avoid future compiler warnings.
 
 strlcpy()
 ---------
-:c:func:`strlcpy` reads the entire source buffer first, possibly exceeding
+strlcpy() reads the entire source buffer first, possibly exceeding
 the given limit of bytes to copy. This is inefficient and can lead to
 linear read overflows if a source string is not NUL-terminated. The
-safe replacement is :c:func:`strscpy`.
+safe replacement is strscpy().
 
 %p format specifier
 -------------------
-- 
cgit 


From b53366a979f74ecdd893aa237e329366e3f028f8 Mon Sep 17 00:00:00 2001
From: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Date: Wed, 4 Mar 2020 12:08:21 +0100
Subject: MAINTAINERS: adjust to kobject doc ReST conversion

Commit 5fed00dcaca8 ("Documentation: kobject.txt has been moved to
core-api/kobject.rst") did not adjust the entry in MAINTAINERS.

Since then, ./scripts/get_maintainer.pl --self-test complains:

  warning: no file matches F: Documentation/kobject.txt

Adjust DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS entry in MAINTAINERS.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20200304110821.7243-1-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8c5712079412..5ddc491bea55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5201,7 +5201,7 @@ M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 R:	"Rafael J. Wysocki" <rafael@kernel.org>
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
 S:	Supported
-F:	Documentation/kobject.txt
+F:	Documentation/core-api/kobject.rst
 F:	drivers/base/
 F:	fs/debugfs/
 F:	fs/sysfs/
-- 
cgit 


From 6480e449646cdbfce239cec0b6cdc66b9617b802 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Tue, 3 Mar 2020 20:42:15 +0100
Subject: docs: dev-tools: kmemleak: Update list of architectures
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Don't list powerpc twice (once as ppc)
* Drop tile, which has been removed from the source tree
* Mention arm64, nds32, arc, and xtensa

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20200303194215.23756-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/dev-tools/kmemleak.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kmemleak.rst b/Documentation/dev-tools/kmemleak.rst
index 3a289e8a1d12..fce262883984 100644
--- a/Documentation/dev-tools/kmemleak.rst
+++ b/Documentation/dev-tools/kmemleak.rst
@@ -8,7 +8,8 @@ with the difference that the orphan objects are not freed but only
 reported via /sys/kernel/debug/kmemleak. A similar method is used by the
 Valgrind tool (``memcheck --leak-check``) to detect the memory leaks in
 user-space applications.
-Kmemleak is supported on x86, arm, powerpc, sparc, sh, microblaze, ppc, mips, s390 and tile.
+Kmemleak is supported on x86, arm, arm64, powerpc, sparc, sh, microblaze, mips,
+s390, nds32, arc and xtensa.
 
 Usage
 -----
-- 
cgit 


From 26f67b4c6e4ceb608f3a9f23023c372456855200 Mon Sep 17 00:00:00 2001
From: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Date: Tue, 3 Mar 2020 20:21:12 +0100
Subject: Documentation: management-style: Fix formatting of emphasized word
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 7f2b3c65b9a1 ("Documentation/ManagementStyle: convert it to ReST
markup") converted _underlined_ to *emphasized* words, but forgot about
an underscore in this case.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/r/20200303192113.20761-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/management-style.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/process/management-style.rst b/Documentation/process/management-style.rst
index 186753ff3d2d..dfbc69bf49d4 100644
--- a/Documentation/process/management-style.rst
+++ b/Documentation/process/management-style.rst
@@ -227,7 +227,7 @@ incompetence will grudgingly admit that you at least didn't try to weasel
 out of it.
 
 Then make the developer who really screwed up (if you can find them) know
-**in_private** that they screwed up.  Not just so they can avoid it in the
+**in private** that they screwed up.  Not just so they can avoid it in the
 future, but so that they know they owe you one.  And, perhaps even more
 importantly, they're also likely the person who can fix it.  Because, let's
 face it, it sure ain't you.
-- 
cgit 


From fcd6807271579c377a5fc43a4dc22fdd9883ba8c Mon Sep 17 00:00:00 2001
From: Pragat Pandya <pragat.pandya@gmail.com>
Date: Tue, 3 Mar 2020 10:33:00 +0530
Subject: Documentation: Add io-mapping.rst to driver-api manual

Add io-mapping.rst under Documentation/driver-api and reference it from
the Sphinx TOC Tree present in Documentation/driver-api/index.rst

Signed-off-by: Pragat Pandya <pragat.pandya@gmail.com>
Link: https://lore.kernel.org/r/20200303050301.5412-2-pragat.pandya@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/index.rst      |  1 +
 Documentation/driver-api/io-mapping.rst | 97 +++++++++++++++++++++++++++++++++
 Documentation/io-mapping.txt            | 97 ---------------------------------
 3 files changed, 98 insertions(+), 97 deletions(-)
 create mode 100644 Documentation/driver-api/io-mapping.rst
 delete mode 100644 Documentation/io-mapping.txt

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 1d8c5599149b..99bdb393f475 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -79,6 +79,7 @@ available subsections can be seen below.
    ipmb
    isa
    isapnp
+   io-mapping
    generic-counter
    lightnvm-pblk
    memory-devices/index
diff --git a/Documentation/driver-api/io-mapping.rst b/Documentation/driver-api/io-mapping.rst
new file mode 100644
index 000000000000..a966239f04e4
--- /dev/null
+++ b/Documentation/driver-api/io-mapping.rst
@@ -0,0 +1,97 @@
+========================
+The io_mapping functions
+========================
+
+API
+===
+
+The io_mapping functions in linux/io-mapping.h provide an abstraction for
+efficiently mapping small regions of an I/O device to the CPU. The initial
+usage is to support the large graphics aperture on 32-bit processors where
+ioremap_wc cannot be used to statically map the entire aperture to the CPU
+as it would consume too much of the kernel address space.
+
+A mapping object is created during driver initialization using::
+
+	struct io_mapping *io_mapping_create_wc(unsigned long base,
+						unsigned long size)
+
+'base' is the bus address of the region to be made
+mappable, while 'size' indicates how large a mapping region to
+enable. Both are in bytes.
+
+This _wc variant provides a mapping which may only be used
+with the io_mapping_map_atomic_wc or io_mapping_map_wc.
+
+With this mapping object, individual pages can be mapped either atomically
+or not, depending on the necessary scheduling environment. Of course, atomic
+maps are more efficient::
+
+	void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
+				       unsigned long offset)
+
+'offset' is the offset within the defined mapping region.
+Accessing addresses beyond the region specified in the
+creation function yields undefined results. Using an offset
+which is not page aligned yields an undefined result. The
+return value points to a single page in CPU address space.
+
+This _wc variant returns a write-combining map to the
+page and may only be used with mappings created by
+io_mapping_create_wc
+
+Note that the task may not sleep while holding this page
+mapped.
+
+::
+
+	void io_mapping_unmap_atomic(void *vaddr)
+
+'vaddr' must be the value returned by the last
+io_mapping_map_atomic_wc call. This unmaps the specified
+page and allows the task to sleep once again.
+
+If you need to sleep while holding the lock, you can use the non-atomic
+variant, although they may be significantly slower.
+
+::
+
+	void *io_mapping_map_wc(struct io_mapping *mapping,
+				unsigned long offset)
+
+This works like io_mapping_map_atomic_wc except it allows
+the task to sleep while holding the page mapped.
+
+
+::
+
+	void io_mapping_unmap(void *vaddr)
+
+This works like io_mapping_unmap_atomic, except it is used
+for pages mapped with io_mapping_map_wc.
+
+At driver close time, the io_mapping object must be freed::
+
+	void io_mapping_free(struct io_mapping *mapping)
+
+Current Implementation
+======================
+
+The initial implementation of these functions uses existing mapping
+mechanisms and so provides only an abstraction layer and no new
+functionality.
+
+On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole
+range, creating a permanent kernel-visible mapping to the resource. The
+map_atomic and map functions add the requested offset to the base of the
+virtual address returned by ioremap_wc.
+
+On 32-bit processors with HIGHMEM defined, io_mapping_map_atomic_wc uses
+kmap_atomic_pfn to map the specified page in an atomic fashion;
+kmap_atomic_pfn isn't really supposed to be used with device pages, but it
+provides an efficient mapping for this usage.
+
+On 32-bit processors without HIGHMEM defined, io_mapping_map_atomic_wc and
+io_mapping_map_wc both use ioremap_wc, a terribly inefficient function which
+performs an IPI to inform all processors about the new mapping. This results
+in a significant performance penalty.
diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt
deleted file mode 100644
index a966239f04e4..000000000000
--- a/Documentation/io-mapping.txt
+++ /dev/null
@@ -1,97 +0,0 @@
-========================
-The io_mapping functions
-========================
-
-API
-===
-
-The io_mapping functions in linux/io-mapping.h provide an abstraction for
-efficiently mapping small regions of an I/O device to the CPU. The initial
-usage is to support the large graphics aperture on 32-bit processors where
-ioremap_wc cannot be used to statically map the entire aperture to the CPU
-as it would consume too much of the kernel address space.
-
-A mapping object is created during driver initialization using::
-
-	struct io_mapping *io_mapping_create_wc(unsigned long base,
-						unsigned long size)
-
-'base' is the bus address of the region to be made
-mappable, while 'size' indicates how large a mapping region to
-enable. Both are in bytes.
-
-This _wc variant provides a mapping which may only be used
-with the io_mapping_map_atomic_wc or io_mapping_map_wc.
-
-With this mapping object, individual pages can be mapped either atomically
-or not, depending on the necessary scheduling environment. Of course, atomic
-maps are more efficient::
-
-	void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
-				       unsigned long offset)
-
-'offset' is the offset within the defined mapping region.
-Accessing addresses beyond the region specified in the
-creation function yields undefined results. Using an offset
-which is not page aligned yields an undefined result. The
-return value points to a single page in CPU address space.
-
-This _wc variant returns a write-combining map to the
-page and may only be used with mappings created by
-io_mapping_create_wc
-
-Note that the task may not sleep while holding this page
-mapped.
-
-::
-
-	void io_mapping_unmap_atomic(void *vaddr)
-
-'vaddr' must be the value returned by the last
-io_mapping_map_atomic_wc call. This unmaps the specified
-page and allows the task to sleep once again.
-
-If you need to sleep while holding the lock, you can use the non-atomic
-variant, although they may be significantly slower.
-
-::
-
-	void *io_mapping_map_wc(struct io_mapping *mapping,
-				unsigned long offset)
-
-This works like io_mapping_map_atomic_wc except it allows
-the task to sleep while holding the page mapped.
-
-
-::
-
-	void io_mapping_unmap(void *vaddr)
-
-This works like io_mapping_unmap_atomic, except it is used
-for pages mapped with io_mapping_map_wc.
-
-At driver close time, the io_mapping object must be freed::
-
-	void io_mapping_free(struct io_mapping *mapping)
-
-Current Implementation
-======================
-
-The initial implementation of these functions uses existing mapping
-mechanisms and so provides only an abstraction layer and no new
-functionality.
-
-On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole
-range, creating a permanent kernel-visible mapping to the resource. The
-map_atomic and map functions add the requested offset to the base of the
-virtual address returned by ioremap_wc.
-
-On 32-bit processors with HIGHMEM defined, io_mapping_map_atomic_wc uses
-kmap_atomic_pfn to map the specified page in an atomic fashion;
-kmap_atomic_pfn isn't really supposed to be used with device pages, but it
-provides an efficient mapping for this usage.
-
-On 32-bit processors without HIGHMEM defined, io_mapping_map_atomic_wc and
-io_mapping_map_wc both use ioremap_wc, a terribly inefficient function which
-performs an IPI to inform all processors about the new mapping. This results
-in a significant performance penalty.
-- 
cgit 


From d1ce350015d86a67d245fad124e37d14b573cac2 Mon Sep 17 00:00:00 2001
From: Pragat Pandya <pragat.pandya@gmail.com>
Date: Tue, 3 Mar 2020 10:33:01 +0530
Subject: Documentation: Add io_ordering.rst to driver-api manual

Add io_ordering.rst under Documentation/driver-api and reference it from
the Sphinx TOC Tree present in Documentation/driver-api/index.rst

Signed-off-by: Pragat Pandya <pragat.pandya@gmail.com>
Link: https://lore.kernel.org/r/20200303050301.5412-3-pragat.pandya@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/index.rst       |  1 +
 Documentation/driver-api/io_ordering.rst | 51 ++++++++++++++++++++++++++++++++
 Documentation/io_ordering.txt            | 51 --------------------------------
 3 files changed, 52 insertions(+), 51 deletions(-)
 create mode 100644 Documentation/driver-api/io_ordering.rst
 delete mode 100644 Documentation/io_ordering.txt

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 99bdb393f475..d4e78cb3ef4d 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -80,6 +80,7 @@ available subsections can be seen below.
    isa
    isapnp
    io-mapping
+   io_ordering
    generic-counter
    lightnvm-pblk
    memory-devices/index
diff --git a/Documentation/driver-api/io_ordering.rst b/Documentation/driver-api/io_ordering.rst
new file mode 100644
index 000000000000..2ab303ce9a0d
--- /dev/null
+++ b/Documentation/driver-api/io_ordering.rst
@@ -0,0 +1,51 @@
+==============================================
+Ordering I/O writes to memory-mapped addresses
+==============================================
+
+On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
+platforms, driver writers are responsible for ensuring that I/O writes to
+memory-mapped addresses on their device arrive in the order intended.  This is
+typically done by reading a 'safe' device or bridge register, causing the I/O
+chipset to flush pending writes to the device before any reads are posted.  A
+driver would usually use this technique immediately prior to the exit of a
+critical section of code protected by spinlocks.  This would ensure that
+subsequent writes to I/O space arrived only after all prior writes (much like a
+memory barrier op, mb(), only with respect to I/O).
+
+A more concrete example from a hypothetical device driver::
+
+		...
+	CPU A:  spin_lock_irqsave(&dev_lock, flags)
+	CPU A:  val = readl(my_status);
+	CPU A:  ...
+	CPU A:  writel(newval, ring_ptr);
+	CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
+		...
+	CPU B:  spin_lock_irqsave(&dev_lock, flags)
+	CPU B:  val = readl(my_status);
+	CPU B:  ...
+	CPU B:  writel(newval2, ring_ptr);
+	CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
+		...
+
+In the case above, the device may receive newval2 before it receives newval,
+which could cause problems.  Fixing it is easy enough though::
+
+		...
+	CPU A:  spin_lock_irqsave(&dev_lock, flags)
+	CPU A:  val = readl(my_status);
+	CPU A:  ...
+	CPU A:  writel(newval, ring_ptr);
+	CPU A:  (void)readl(safe_register); /* maybe a config register? */
+	CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
+		...
+	CPU B:  spin_lock_irqsave(&dev_lock, flags)
+	CPU B:  val = readl(my_status);
+	CPU B:  ...
+	CPU B:  writel(newval2, ring_ptr);
+	CPU B:  (void)readl(safe_register); /* maybe a config register? */
+	CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
+
+Here, the reads from safe_register will cause the I/O chipset to flush any
+pending writes before actually posting the read to the chipset, preventing
+possible data corruption.
diff --git a/Documentation/io_ordering.txt b/Documentation/io_ordering.txt
deleted file mode 100644
index 2ab303ce9a0d..000000000000
--- a/Documentation/io_ordering.txt
+++ /dev/null
@@ -1,51 +0,0 @@
-==============================================
-Ordering I/O writes to memory-mapped addresses
-==============================================
-
-On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
-platforms, driver writers are responsible for ensuring that I/O writes to
-memory-mapped addresses on their device arrive in the order intended.  This is
-typically done by reading a 'safe' device or bridge register, causing the I/O
-chipset to flush pending writes to the device before any reads are posted.  A
-driver would usually use this technique immediately prior to the exit of a
-critical section of code protected by spinlocks.  This would ensure that
-subsequent writes to I/O space arrived only after all prior writes (much like a
-memory barrier op, mb(), only with respect to I/O).
-
-A more concrete example from a hypothetical device driver::
-
-		...
-	CPU A:  spin_lock_irqsave(&dev_lock, flags)
-	CPU A:  val = readl(my_status);
-	CPU A:  ...
-	CPU A:  writel(newval, ring_ptr);
-	CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-		...
-	CPU B:  spin_lock_irqsave(&dev_lock, flags)
-	CPU B:  val = readl(my_status);
-	CPU B:  ...
-	CPU B:  writel(newval2, ring_ptr);
-	CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-		...
-
-In the case above, the device may receive newval2 before it receives newval,
-which could cause problems.  Fixing it is easy enough though::
-
-		...
-	CPU A:  spin_lock_irqsave(&dev_lock, flags)
-	CPU A:  val = readl(my_status);
-	CPU A:  ...
-	CPU A:  writel(newval, ring_ptr);
-	CPU A:  (void)readl(safe_register); /* maybe a config register? */
-	CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-		...
-	CPU B:  spin_lock_irqsave(&dev_lock, flags)
-	CPU B:  val = readl(my_status);
-	CPU B:  ...
-	CPU B:  writel(newval2, ring_ptr);
-	CPU B:  (void)readl(safe_register); /* maybe a config register? */
-	CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-Here, the reads from safe_register will cause the I/O chipset to flush any
-pending writes before actually posting the read to the chipset, preventing
-possible data corruption.
-- 
cgit 


From 8206de7d3887f17aa0034b319ab6ca6c2f925372 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Tue, 3 Mar 2020 16:50:31 +0100
Subject: docs: trace: events.rst: convert some new stuff to ReST format

Some new chapters were added to the documentation. This caused
Sphinx to complain, as the literal blocks there are not properly
tagged as such. Also, a new note added there doesn't follow
the ReST format.

This fixes the following warnings:

    Documentation/trace/events.rst:589: WARNING: Definition list ends without a blank line; unexpected unindent.
    Documentation/trace/events.rst:620: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:623: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:626: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:703: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:697: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:722: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:775: WARNING: Definition list ends without a blank line; unexpected unindent.
    Documentation/trace/events.rst:814: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:817: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:820: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:823: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:826: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:829: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:832: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:844: WARNING: Unexpected indentation.
    Documentation/trace/events.rst:845: WARNING: Block quote ends without a blank line; unexpected unindent.
    Documentation/trace/events.rst:849: WARNING: Unexpected indentation.
    Documentation/trace/events.rst:850: WARNING: Block quote ends without a blank line; unexpected unindent.
    Documentation/trace/events.rst:883: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:886: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:889: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:895: WARNING: Bullet list ends without a blank line; unexpected unindent.
    Documentation/trace/events.rst:895: WARNING: Inline emphasis start-string without end-string.
    Documentation/trace/events.rst:968: WARNING: Inline emphasis start-string without end-string.

Fixes: 34ed63573b66 ("tracing: Documentation for in-kernel synthetic event API")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/afbe367ccb7b9abcb9fab7bc5cb5e0686c105a53.1583250595.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/trace/events.rst | 63 +++++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index ed79b220bd07..4a2ebe0bd19b 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -342,7 +342,8 @@ section of Documentation/trace/ftrace.rst), but there are major
 differences and the implementation isn't currently tied to it in any
 way, so beware about making generalizations between the two.
 
-Note: Writing into trace_marker (See Documentation/trace/ftrace.rst)
+.. Note::
+     Writing into trace_marker (See Documentation/trace/ftrace.rst)
      can also enable triggers that are written into
      /sys/kernel/tracing/events/ftrace/print/trigger
 
@@ -569,14 +570,14 @@ The first creates the event in one step, using synth_event_create().
 In this method, the name of the event to create and an array defining
 the fields is supplied to synth_event_create().  If successful, a
 synthetic event with that name and fields will exist following that
-call.  For example, to create a new "schedtest" synthetic event:
+call.  For example, to create a new "schedtest" synthetic event::
 
   ret = synth_event_create("schedtest", sched_fields,
                            ARRAY_SIZE(sched_fields), THIS_MODULE);
 
 The sched_fields param in this example points to an array of struct
 synth_field_desc, each of which describes an event field by type and
-name:
+name::
 
   static struct synth_field_desc sched_fields[] = {
         { .type = "pid_t",              .name = "next_pid_field" },
@@ -615,7 +616,7 @@ synth_event_gen_cmd_array_start(), the user should create and
 initialize a dynevent_cmd object using synth_event_cmd_init().
 
 For example, to create a new "schedtest" synthetic event with two
-fields:
+fields::
 
   struct dynevent_cmd cmd;
   char *buf;
@@ -631,7 +632,7 @@ fields:
                                   "u64", "ts_ns");
 
 Alternatively, using an array of struct synth_field_desc fields
-containing the same information:
+containing the same information::
 
   ret = synth_event_gen_cmd_array_start(&cmd, "schedtest", THIS_MODULE,
                                         fields, n_fields);
@@ -640,7 +641,7 @@ Once the synthetic event object has been created, it can then be
 populated with more fields.  Fields are added one by one using
 synth_event_add_field(), supplying the dynevent_cmd object, a field
 type, and a field name.  For example, to add a new int field named
-"intfield", the following call should be made:
+"intfield", the following call should be made::
 
   ret = synth_event_add_field(&cmd, "int", "intfield");
 
@@ -649,7 +650,7 @@ the field is considered to be an array.
 
 A group of fields can also be added all at once using an array of
 synth_field_desc with add_synth_fields().  For example, this would add
-just the first four sched_fields:
+just the first four sched_fields::
 
   ret = synth_event_add_fields(&cmd, sched_fields, 4);
 
@@ -658,7 +659,7 @@ synth_event_add_field_str() can be used to add it as-is; it will
 also automatically append a ';' to the string.
 
 Once all the fields have been added, the event should be finalized and
-registered by calling the synth_event_gen_cmd_end() function:
+registered by calling the synth_event_gen_cmd_end() function::
 
   ret = synth_event_gen_cmd_end(&cmd);
 
@@ -691,7 +692,7 @@ trace array)), along with an variable number of u64 args, one for each
 synthetic event field, and the number of values being passed.
 
 So, to trace an event corresponding to the synthetic event definition
-above, code like the following could be used:
+above, code like the following could be used::
 
   ret = synth_event_trace(create_synth_test, 7, /* number of values */
                           444,             /* next_pid_field */
@@ -715,7 +716,7 @@ trace array)), along with an array of u64, one for each synthetic
 event field.
 
 To trace an event corresponding to the synthetic event definition
-above, code like the following could be used:
+above, code like the following could be used::
 
   u64 vals[7];
 
@@ -739,7 +740,7 @@ In order to trace a synthetic event, a pointer to the trace event file
 is needed.  The trace_get_event_file() function can be used to get
 it - it will find the file in the given trace instance (in this case
 NULL since the top trace array is being used) while at the same time
-preventing the instance containing it from going away:
+preventing the instance containing it from going away::
 
        schedtest_event_file = trace_get_event_file(NULL, "synthetic",
                                                    "schedtest");
@@ -751,31 +752,31 @@ To enable a synthetic event from the kernel, trace_array_set_clr_event()
 can be used (which is not specific to synthetic events, so does need
 the "synthetic" system name to be specified explicitly).
 
-To enable the event, pass 'true' to it:
+To enable the event, pass 'true' to it::
 
        trace_array_set_clr_event(schedtest_event_file->tr,
                                  "synthetic", "schedtest", true);
 
-To disable it pass false:
+To disable it pass false::
 
        trace_array_set_clr_event(schedtest_event_file->tr,
                                  "synthetic", "schedtest", false);
 
 Finally, synth_event_trace_array() can be used to actually trace the
-event, which should be visible in the trace buffer afterwards:
+event, which should be visible in the trace buffer afterwards::
 
        ret = synth_event_trace_array(schedtest_event_file, vals,
                                      ARRAY_SIZE(vals));
 
 To remove the synthetic event, the event should be disabled, and the
-trace instance should be 'put' back using trace_put_event_file():
+trace instance should be 'put' back using trace_put_event_file()::
 
        trace_array_set_clr_event(schedtest_event_file->tr,
                                  "synthetic", "schedtest", false);
        trace_put_event_file(schedtest_event_file);
 
 If those have been successful, synth_event_delete() can be called to
-remove the event:
+remove the event::
 
        ret = synth_event_delete("schedtest");
 
@@ -784,7 +785,7 @@ remove the event:
 
 To trace a synthetic using the piecewise method described above, the
 synth_event_trace_start() function is used to 'open' the synthetic
-event trace:
+event trace::
 
        struct synth_trace_state trace_state;
 
@@ -809,7 +810,7 @@ along with the value to set the next field in the event.  After each
 field is set, the 'cursor' points to the next field, which will be set
 by the subsequent call, continuing until all the fields have been set
 in order.  The same sequence of calls as in the above examples using
-this method would be (without error-handling code):
+this method would be (without error-handling code)::
 
        /* next_pid_field */
        ret = synth_event_add_next_val(777, &trace_state);
@@ -837,7 +838,7 @@ used.  Each call is passed the same synth_trace_state object used in
 the synth_event_trace_start(), along with the field name of the field
 to set and the value to set it to.  The same sequence of calls as in
 the above examples using this method would be (without error-handling
-code):
+code)::
 
        ret = synth_event_add_val("next_pid_field", 777, &trace_state);
        ret = synth_event_add_val("next_comm_field", (u64)"silly putty",
@@ -855,7 +856,7 @@ can be used but not both at the same time.
 
 Finally, the event won't be actually traced until it's 'closed',
 which is done using synth_event_trace_end(), which takes only the
-struct synth_trace_state object used in the previous calls:
+struct synth_trace_state object used in the previous calls::
 
        ret = synth_event_trace_end(&trace_state);
 
@@ -878,7 +879,7 @@ function.  Before calling kprobe_event_gen_cmd_start(), the user
 should create and initialize a dynevent_cmd object using
 kprobe_event_cmd_init().
 
-For example, to create a new "schedtest" kprobe event with two fields:
+For example, to create a new "schedtest" kprobe event with two fields::
 
   struct dynevent_cmd cmd;
   char *buf;
@@ -900,18 +901,18 @@ Once the kprobe event object has been created, it can then be
 populated with more fields.  Fields can be added using
 kprobe_event_add_fields(), supplying the dynevent_cmd object along
 with a variable arg list of probe fields.  For example, to add a
-couple additional fields, the following call could be made:
+couple additional fields, the following call could be made::
 
   ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)");
 
 Once all the fields have been added, the event should be finalized and
 registered by calling the kprobe_event_gen_cmd_end() or
 kretprobe_event_gen_cmd_end() functions, depending on whether a kprobe
-or kretprobe command was started:
+or kretprobe command was started::
 
   ret = kprobe_event_gen_cmd_end(&cmd);
 
-or
+or::
 
   ret = kretprobe_event_gen_cmd_end(&cmd);
 
@@ -920,13 +921,13 @@ events.
 
 Similarly, a kretprobe event can be created using
 kretprobe_event_gen_cmd_start() with a probe name and location and
-additional params such as $retval:
+additional params such as $retval::
 
   ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test",
                                       "do_sys_open", "$retval");
 
 Similar to the synthetic event case, code like the following can be
-used to enable the newly created kprobe event:
+used to enable the newly created kprobe event::
 
   gen_kprobe_test = trace_get_event_file(NULL, "kprobes", "gen_kprobe_test");
 
@@ -934,7 +935,7 @@ used to enable the newly created kprobe event:
                                   "kprobes", "gen_kprobe_test", true);
 
 Finally, also similar to synthetic events, the following code can be
-used to give the kprobe event file back and delete the event:
+used to give the kprobe event file back and delete the event::
 
   trace_put_event_file(gen_kprobe_test);
 
@@ -963,7 +964,7 @@ are described below.
 
 The first step in building a new command string is to create and
 initialize an instance of a dynevent_cmd.  Here, for instance, we
-create a dynevent_cmd on the stack and initialize it:
+create a dynevent_cmd on the stack and initialize it::
 
   struct dynevent_cmd cmd;
   char *buf;
@@ -989,7 +990,7 @@ calls to argument-adding functions.
 To add a single argument, define and initialize a struct dynevent_arg
 or struct dynevent_arg_pair object.  Here's an example of the simplest
 possible arg addition, which is simply to append the given string as
-a whitespace-separated argument to the command:
+a whitespace-separated argument to the command::
 
   struct dynevent_arg arg;
 
@@ -1007,7 +1008,7 @@ the arg.
 Here's another more complicated example using an 'arg pair', which is
 used to create an argument that consists of a couple components added
 together as a unit, for example, a 'type field_name;' arg or a simple
-expression arg e.g. 'flags=%cx':
+expression arg e.g. 'flags=%cx'::
 
   struct dynevent_arg_pair arg_pair;
 
@@ -1031,7 +1032,7 @@ Any number of dynevent_*_add() calls can be made to build up the string
 (until its length surpasses cmd->maxlen).  When all the arguments have
 been added and the command string is complete, the only thing left to
 do is run the command, which happens by simply calling
-dynevent_create():
+dynevent_create()::
 
   ret = dynevent_create(&cmd);
 
-- 
cgit 


From 99d1a38a739e1d8e3ae6b2f76a364b21f16d7bd9 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Tue, 3 Mar 2020 16:50:34 +0100
Subject: docs: driver.rst: suppress two ReST warnings

Get rid of those, by marking a literal block as such:

	Documentation/driver-api/gpio/driver.rst:425: WARNING: Unexpected indentation.
	Documentation/driver-api/gpio/driver.rst:423: WARNING: Inline emphasis start-string without end-string.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8356b02547087979f57cb71fbefb5e5f636c78f4.1583250595.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/driver-model/driver.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/driver-api/driver-model/driver.rst b/Documentation/driver-api/driver-model/driver.rst
index baa6a85c8287..63887b813005 100644
--- a/Documentation/driver-api/driver-model/driver.rst
+++ b/Documentation/driver-api/driver-model/driver.rst
@@ -210,7 +210,7 @@ probed.
 While the typical use case for sync_state() is to have the kernel cleanly take
 over management of devices from the bootloader, the usage of sync_state() is
 not restricted to that. Use it whenever it makes sense to take an action after
-all the consumers of a device have probed.
+all the consumers of a device have probed::
 
 	int 	(*remove)	(struct device *dev);
 
-- 
cgit 


From faa71c80a8d5ea7d5e7bc34119e44face190caf2 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Tue, 3 Mar 2020 16:50:36 +0100
Subject: docs: translations: it: avoid duplicate refs at
 programming-language.rst

As the translations document is part of the main body, we can't
keep duplicated references there. So, prefix the Italian ones
with "it-".

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/e733111f3599dff96524ad09ace5204ac6bb496b.1583250595.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 .../it_IT/process/programming-language.rst         | 30 +++++++++++-----------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Documentation/translations/it_IT/process/programming-language.rst b/Documentation/translations/it_IT/process/programming-language.rst
index f4b006395849..c4fc9d394c29 100644
--- a/Documentation/translations/it_IT/process/programming-language.rst
+++ b/Documentation/translations/it_IT/process/programming-language.rst
@@ -8,26 +8,26 @@
 Linguaggio di programmazione
 ============================
 
-Il kernel è scritto nel linguaggio di programmazione C [c-language]_.
-Più precisamente, il kernel viene compilato con ``gcc`` [gcc]_ usando
-l'opzione ``-std=gnu89`` [gcc-c-dialect-options]_: il dialetto GNU
+Il kernel è scritto nel linguaggio di programmazione C [it-c-language]_.
+Più precisamente, il kernel viene compilato con ``gcc`` [it-gcc]_ usando
+l'opzione ``-std=gnu89`` [it-gcc-c-dialect-options]_: il dialetto GNU
 dello standard ISO C90 (con l'aggiunta di alcune funzionalità da C99)
 
-Questo dialetto contiene diverse estensioni al linguaggio [gnu-extensions]_,
+Questo dialetto contiene diverse estensioni al linguaggio [it-gnu-extensions]_,
 e molte di queste vengono usate sistematicamente dal kernel.
 
 Il kernel offre un certo livello di supporto per la compilazione con ``clang``
-[clang]_ e ``icc`` [icc]_ su diverse architetture, tuttavia in questo momento
+[it-clang]_ e ``icc`` [it-icc]_ su diverse architetture, tuttavia in questo momento
 il supporto non è completo e richiede delle patch aggiuntive.
 
 Attributi
 ---------
 
 Una delle estensioni più comuni e usate nel kernel sono gli attributi
-[gcc-attribute-syntax]_. Gli attributi permettono di aggiungere una semantica,
+[it-gcc-attribute-syntax]_. Gli attributi permettono di aggiungere una semantica,
 definita dell'implementazione, alle entità del linguaggio (come le variabili,
 le funzioni o i tipi) senza dover fare importanti modifiche sintattiche al
-linguaggio stesso (come l'aggiunta di nuove parole chiave) [n2049]_.
+linguaggio stesso (come l'aggiunta di nuove parole chiave) [it-n2049]_.
 
 In alcuni casi, gli attributi sono opzionali (ovvero un compilatore che non
 dovesse supportarli dovrebbe produrre comunque codice corretto, anche se
@@ -41,11 +41,11 @@ possono usare e/o per accorciare il codice.
 Per maggiori informazioni consultate il file d'intestazione
 ``include/linux/compiler_attributes.h``.
 
-.. [c-language] http://www.open-std.org/jtc1/sc22/wg14/www/standards
-.. [gcc] https://gcc.gnu.org
-.. [clang] https://clang.llvm.org
-.. [icc] https://software.intel.com/en-us/c-compilers
-.. [gcc-c-dialect-options] https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
-.. [gnu-extensions] https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
-.. [gcc-attribute-syntax] https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
-.. [n2049] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2049.pdf
+.. [it-c-language] http://www.open-std.org/jtc1/sc22/wg14/www/standards
+.. [it-gcc] https://gcc.gnu.org
+.. [it-clang] https://clang.llvm.org
+.. [it-icc] https://software.intel.com/en-us/c-compilers
+.. [it-gcc-c-dialect-options] https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
+.. [it-gnu-extensions] https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
+.. [it-gcc-attribute-syntax] https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
+.. [it-n2049] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2049.pdf
-- 
cgit 


From 3b31589c7d8565725f6e7360337086ef08254766 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Tue, 3 Mar 2020 16:50:37 +0100
Subject: docs: filesystems: fuse.rst: suppress a Sphinx warning

Get rid of this warning:

    Documentation/filesystems/fuse.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent.

Fixes: 8ab13bca428b ("Documentation: filesystems: convert fuse to RST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/cad541ec7d8d220d57bd5d097d60c62da64054ac.1583250595.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/fuse.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/fuse.rst b/Documentation/filesystems/fuse.rst
index 8e455065ce9e..cd717f9bf940 100644
--- a/Documentation/filesystems/fuse.rst
+++ b/Documentation/filesystems/fuse.rst
@@ -1,7 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0
-==============
+
+====
 FUSE
-==============
+====
 
 Definitions
 ===========
-- 
cgit 


From 2b008dc6926c2df04530d9ddca6fa22f339d92b9 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Tue, 3 Mar 2020 16:50:38 +0100
Subject: docs: perf: imx-ddr.rst: get rid of a warning

    Documentation/admin-guide/perf/imx-ddr.rst:47: WARNING: Unexpected indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/b27b54bd4f847032fd33313d6497ff320c0f3d78.1583250595.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/perf/imx-ddr.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/perf/imx-ddr.rst b/Documentation/admin-guide/perf/imx-ddr.rst
index 3726a10a03ba..f05f56c73b7d 100644
--- a/Documentation/admin-guide/perf/imx-ddr.rst
+++ b/Documentation/admin-guide/perf/imx-ddr.rst
@@ -43,7 +43,8 @@ value 1 for supported.
 
   AXI_ID and AXI_MASKING are mapped on DPCR1 register in performance counter.
   When non-masked bits are matching corresponding AXI_ID bits then counter is
-  incremented. Perf counter is incremented if
+  incremented. Perf counter is incremented if::
+
         AxID && AXI_MASKING == AXI_ID && AXI_MASKING
 
   This filter doesn't support filter different AXI ID for axid-read and axid-write
-- 
cgit 


From 23f03fe22032d68719f646558951b0f6618187a8 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Tue, 3 Mar 2020 16:50:39 +0100
Subject: docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref

The virt_mechanism reference there points to a section
that is named differently (Virtualization mitigation). Also,
it is not used anywhere.

Besides that, it conflicts with a label with the same name
inside:

	Documentation/admin-guide/hw-vuln/mds.rst

Perhaps added due to some cut-and-paste?

Anyway, as this is not used, let's just get rid of it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/681c8e2916bf4943ac2277f181668bfbc5fdbc01.1583250595.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/hw-vuln/tsx_async_abort.rst | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
index af6865b822d2..68d96f0e9c95 100644
--- a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
+++ b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
@@ -136,8 +136,6 @@ enables the mitigation by default.
 The mitigation can be controlled at boot time via a kernel command line option.
 See :ref:`taa_mitigation_control_command_line`.
 
-.. _virt_mechanism:
-
 Virtualization mitigation
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-- 
cgit 


From 0a07bef6e5c58874d510f452c72eb12a31200e0f Mon Sep 17 00:00:00 2001
From: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
Date: Tue, 10 Mar 2020 15:36:49 -0300
Subject: Documentation: Better document the softlockup_panic sysctl

Commit 9c44bc03fff4 ("softlockup: allow panic on lockup") added the
softlockup_panic sysctl, but didn't add information about it to the file
Documentation/admin-guide/sysctl/kernel.rst (which at that time certainly
wasn't rst and had another name!).

This patch just adds the respective documentation and references it from
the corresponding entry in Documentation/admin-guide/kernel-parameters.txt.

This patch was strongly based on Scott Wood's commit d22881dc13b6
("Documentation: Better document the hardlockup_panic sysctl").

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Link: https://lore.kernel.org/r/20200310183649.23163-1-gpiccoli@canonical.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++----
 Documentation/admin-guide/sysctl/kernel.rst     | 14 ++++++++++++++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4220477079bd..b3b5aa7408df 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4516,10 +4516,10 @@
 			Format: <integer>
 
 			A nonzero value instructs the soft-lockup detector
-			to panic the machine when a soft-lockup occurs. This
-			is also controlled by CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC
-			which is the respective build-time switch to that
-			functionality.
+			to panic the machine when a soft-lockup occurs. It is
+			also controlled by the kernel.softlockup_panic sysctl
+			and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
+			respective build-time switch to that functionality.
 
 	softlockup_all_cpu_backtrace=
 			[KNL] Should the soft-lockup detector generate
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 1c48ab4bfe30..335696d3360d 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1036,6 +1036,20 @@ NMI.
 = ============================================
 
 
+softlockup_panic
+=================
+
+This parameter can be used to control whether the kernel panics
+when a soft lockup is detected.
+
+= ============================================
+0 Don't panic on soft lockup.
+1 Panic on soft lockup.
+= ============================================
+
+This can also be set using the softlockup_panic kernel parameter.
+
+
 soft_watchdog
 =============
 
-- 
cgit 


From 7d3d3254adaa61cba896f71497f56901deb618e5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Wed, 11 Mar 2020 12:51:17 +0100
Subject: docs: fix pointers to io-mapping.rst and io_ordering.rst files

Those files got moved, but cross-references still point to the
wrong places.

Fixes: fcd680727157 ("Documentation: Add io-mapping.rst to driver-api manual")
Fixes: d1ce350015d8 ("Documentation: Add io_ordering.rst to driver-api manual")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/c0205119db4fef536272cb0a183b6c14c2c8bf4c.1583927470.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/PCI/pci.rst                        | 2 +-
 Documentation/translations/zh_CN/io_ordering.txt | 4 ++--
 arch/unicore32/include/asm/io.h                  | 2 +-
 include/linux/io-mapping.h                       | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
index 6864f9a70f5f..8c016d8c9862 100644
--- a/Documentation/PCI/pci.rst
+++ b/Documentation/PCI/pci.rst
@@ -239,7 +239,7 @@ from the PCI device config space. Use the values in the pci_dev structure
 as the PCI "bus address" might have been remapped to a "host physical"
 address by the arch/chip-set specific kernel support.
 
-See Documentation/io-mapping.txt for how to access device registers
+See Documentation/driver-api/io-mapping.rst for how to access device registers
 or device memory.
 
 The device driver needs to call pci_request_region() to verify
diff --git a/Documentation/translations/zh_CN/io_ordering.txt b/Documentation/translations/zh_CN/io_ordering.txt
index 1f8127bdd415..7bb3086227ae 100644
--- a/Documentation/translations/zh_CN/io_ordering.txt
+++ b/Documentation/translations/zh_CN/io_ordering.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/io_ordering.txt
+Chinese translated version of Documentation/driver-api/io_ordering.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -8,7 +8,7 @@ or if there is a problem with the translation.
 
 Chinese maintainer: Lin Yongting <linyongting@gmail.com>
 ---------------------------------------------------------------------
-Documentation/io_ordering.txt 的中文翻译
+Documentation/driver-api/io_ordering.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/arch/unicore32/include/asm/io.h b/arch/unicore32/include/asm/io.h
index 3ca74e1cde7d..bd4e7c332f85 100644
--- a/arch/unicore32/include/asm/io.h
+++ b/arch/unicore32/include/asm/io.h
@@ -27,7 +27,7 @@ extern void __uc32_iounmap(volatile void __iomem *addr);
  * ioremap and friends.
  *
  * ioremap takes a PCI memory address, as specified in
- * Documentation/io-mapping.txt.
+ * Documentation/driver-api/io-mapping.rst.
  *
  */
 #define ioremap(cookie, size)		__uc32_ioremap(cookie, size)
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index 837058bc1c9f..b336622612f3 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -16,7 +16,7 @@
  * The io_mapping mechanism provides an abstraction for mapping
  * individual pages from an io device to the CPU in an efficient fashion.
  *
- * See Documentation/io-mapping.txt
+ * See Documentation/driver-api/io-mapping.rst
  */
 
 struct io_mapping {
-- 
cgit 


From 58ad30cf91f073a9fab4f8e238b025431343dbf3 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Thu, 19 Mar 2020 12:52:01 -0600
Subject: docs: fix reference to core-api/namespaces.rst

Fix a couple of dangling links to core-api/namespaces.rst by turning them
into proper references.  Enable the autosection extension (available since
Sphinx 1.4) to make this work.

Co-developed-by: Federico Vaga <federico.vaga@vaga.pv.it>
Fixes: fcfacb9f8374 ("doc: move namespaces.rst from kbuild/ to core-api/")
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/conf.py                    | 2 +-
 Documentation/kernel-hacking/hacking.rst | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/conf.py b/Documentation/conf.py
index 3c7bdf4cd31f..fa2bfcd6df1d 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -38,7 +38,7 @@ needs_sphinx = '1.3'
 # ones.
 extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
               'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
-              'maintainers_include']
+              'maintainers_include', 'sphinx.ext.autosectionlabel' ]
 
 # The name of the math extension changed on Sphinx 1.4
 if (major == 1 and minor > 3) or (major > 1):
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst
index d62aacb2822a..d707a0a61cc9 100644
--- a/Documentation/kernel-hacking/hacking.rst
+++ b/Documentation/kernel-hacking/hacking.rst
@@ -601,7 +601,7 @@ Defined in ``include/linux/export.h``
 
 This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol
 namespace. Symbol Namespaces are documented in
-``Documentation/core-api/symbol-namespaces.rst``.
+:ref:`Documentation/core-api/symbol-namespaces.rst <Symbol Namespaces>`
 
 :c:func:`EXPORT_SYMBOL_NS_GPL()`
 --------------------------------
@@ -610,7 +610,7 @@ Defined in ``include/linux/export.h``
 
 This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol
 namespace. Symbol Namespaces are documented in
-``Documentation/core-api/symbol-namespaces.rst``.
+:ref:`Documentation/core-api/symbol-namespaces.rst <Symbol Namespaces>`
 
 Routines and Conventions
 ========================
-- 
cgit 


From c44166fe5f38f0559eff1138cca094f3460e2345 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Fri, 20 Mar 2020 16:11:02 +0100
Subject: docs: prevent warnings due to autosectionlabel

Changeset 58ad30cf91f0 ("docs: fix reference to core-api/namespaces.rst")
enabled a new Sphinx feature: it now generates a reference for each
document title, plus one for each chapter inside it.

There's a drawback, though: one document cannot have two sections
with the same name anymore.

A follow-up patch will change the logic of autosectionlabel to
avoid creating references for every single section title,
but we still need to be able to reference the chapters inside
a document.

There are a few places where there are two chapters with the
same name. This patch renames one of the chapters, in order to
avoid symbol conflict within the same document.

PS.: as I don't speak Chinese, I had some help from a friend
(Wen Liu) with the Chinese translation of "publishing patches"
for this document:

	Documentation/translations/zh_CN/process/5.Posting.rst

Fixes: 58ad30cf91f0 ("docs: fix reference to core-api/namespaces.rst")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/2bffb91e4a63d41bf5fae1c23e1e8b3bba0b8806.1584716446.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
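Note: the duplicate-title problem described above can be checked for
before building the docs. The sketch below is illustrative only and is
not part of this patch; the helper names section_titles() and
duplicate_titles() are hypothetical, and the title detection is a
simplification (a text line followed by an adornment line of one repeated
punctuation character at least as long as the title), not a full
reStructuredText parser.

```python
import re
from collections import Counter

# A line made of a single repeated punctuation character, e.g. "====" or "----".
# Only a subset of the adornment characters allowed by reST is covered here.
ADORNMENT = re.compile(r'^([=\-~^"#*+])\1+$')


def section_titles(rst_text):
    """Return the section titles found in rst_text, in document order."""
    lines = rst_text.splitlines()
    titles = []
    # Look at each pair of consecutive lines: (candidate title, candidate underline).
    for title, underline in zip(lines, lines[1:]):
        text = title.strip()
        mark = underline.rstrip()
        if (text
                and not ADORNMENT.match(text)   # the title itself is not an adornment
                and ADORNMENT.match(mark)       # the next line is an underline
                and len(mark) >= len(text)):    # underline is long enough
            titles.append(text)
    return titles


def duplicate_titles(rst_text):
    """Return the sorted list of titles that occur more than once."""
    counts = Counter(section_titles(rst_text))
    return sorted(title for title, n in counts.items() if n > 1)
```

Passing the contents of a single .rst file to duplicate_titles() lists
the section names that would collide within that document, such as the
two "Inheriting Controls" sections renamed below.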
 Documentation/driver-api/80211/mac80211-advanced.rst   |  8 ++++----
 Documentation/driver-api/dmaengine/index.rst           |  4 ++--
 Documentation/filesystems/ecryptfs.rst                 | 11 +++++------
 Documentation/kernel-hacking/hacking.rst               |  4 ++--
 Documentation/media/kapi/v4l2-controls.rst             |  8 ++++----
 Documentation/networking/snmp_counter.rst              |  4 ++--
 Documentation/powerpc/ultravisor.rst                   |  4 ++--
 Documentation/security/siphash.rst                     |  8 ++++----
 Documentation/target/tcmu-design.rst                   |  6 +++---
 Documentation/translations/zh_CN/process/5.Posting.rst |  2 +-
 Documentation/x86/intel-iommu.rst                      |  3 ++-
 11 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/Documentation/driver-api/80211/mac80211-advanced.rst b/Documentation/driver-api/80211/mac80211-advanced.rst
index 9f1c5bb7ac35..24cb64b3b715 100644
--- a/Documentation/driver-api/80211/mac80211-advanced.rst
+++ b/Documentation/driver-api/80211/mac80211-advanced.rst
@@ -272,8 +272,8 @@ STA information lifetime rules
 .. kernel-doc:: net/mac80211/sta_info.c
    :doc: STA information lifetime rules
 
-Aggregation
-===========
+Aggregation Functions
+=====================
 
 .. kernel-doc:: net/mac80211/sta_info.h
    :functions: sta_ampdu_mlme
@@ -284,8 +284,8 @@ Aggregation
 .. kernel-doc:: net/mac80211/sta_info.h
    :functions: tid_ampdu_rx
 
-Synchronisation
-===============
+Synchronisation Functions
+=========================
 
 TBD
 
diff --git a/Documentation/driver-api/dmaengine/index.rst b/Documentation/driver-api/dmaengine/index.rst
index b9df904d0a79..bdc45d8b4cfb 100644
--- a/Documentation/driver-api/dmaengine/index.rst
+++ b/Documentation/driver-api/dmaengine/index.rst
@@ -5,8 +5,8 @@ DMAEngine documentation
 DMAEngine documentation provides documents for various aspects of DMAEngine
 framework.
 
-DMAEngine documentation
------------------------
+DMAEngine development documentation
+-----------------------------------
 
 This book helps with DMAengine internal APIs and guide for DMAEngine device
 driver writers.
diff --git a/Documentation/filesystems/ecryptfs.rst b/Documentation/filesystems/ecryptfs.rst
index 7236172300ef..1f2edef4c57a 100644
--- a/Documentation/filesystems/ecryptfs.rst
+++ b/Documentation/filesystems/ecryptfs.rst
@@ -30,13 +30,12 @@ Userspace requirements include:
 - Libgcrypt
 
 
-Notes
-=====
+.. note::
 
-In the beta/experimental releases of eCryptfs, when you upgrade
-eCryptfs, you should copy the files to an unencrypted location and
-then copy the files back into the new eCryptfs mount to migrate the
-files.
+   In the beta/experimental releases of eCryptfs, when you upgrade
+   eCryptfs, you should copy the files to an unencrypted location and
+   then copy the files back into the new eCryptfs mount to migrate the
+   files.
 
 
 Mount-wide Passphrase
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst
index d707a0a61cc9..eed2136d847f 100644
--- a/Documentation/kernel-hacking/hacking.rst
+++ b/Documentation/kernel-hacking/hacking.rst
@@ -601,7 +601,7 @@ Defined in ``include/linux/export.h``
 
 This is the variant of `EXPORT_SYMBOL()` that allows specifying a symbol
 namespace. Symbol Namespaces are documented in
-:ref:`Documentation/core-api/symbol-namespaces.rst <Symbol Namespaces>`
+:doc:`../core-api/symbol-namespaces`
 
 :c:func:`EXPORT_SYMBOL_NS_GPL()`
 --------------------------------
@@ -610,7 +610,7 @@ Defined in ``include/linux/export.h``
 
 This is the variant of `EXPORT_SYMBOL_GPL()` that allows specifying a symbol
 namespace. Symbol Namespaces are documented in
-:ref:`Documentation/core-api/symbol-namespaces.rst <Symbol Namespaces>`
+:doc:`../core-api/symbol-namespaces`
 
 Routines and Conventions
 ========================
diff --git a/Documentation/media/kapi/v4l2-controls.rst b/Documentation/media/kapi/v4l2-controls.rst
index b20800cae3f2..5129019afb49 100644
--- a/Documentation/media/kapi/v4l2-controls.rst
+++ b/Documentation/media/kapi/v4l2-controls.rst
@@ -291,8 +291,8 @@ and QUERYMENU. And G/S_CTRL as well as G/TRY/S_EXT_CTRLS are automatically suppo
    In practice the basic usage as described above is sufficient for most drivers.
 
 
-Inheriting Controls
--------------------
+Inheriting Sub-device Controls
+------------------------------
 
 When a sub-device is registered with a V4L2 driver by calling
 v4l2_device_register_subdev() and the ctrl_handler fields of both v4l2_subdev
@@ -757,8 +757,8 @@ attempting to find another control from the same handler will deadlock.
 It is recommended not to use this function from inside the control ops.
 
 
-Inheriting Controls
--------------------
+Preventing Controls inheritance
+-------------------------------
 
 When one control handler is added to another using v4l2_ctrl_add_handler, then
 by default all controls from one are merged to the other. But a subdev might
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index 38a4edc4522b..10e11099e74a 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -908,8 +908,8 @@ A TLP probe packet is sent.
 
 A packet loss is detected and recovered by TLP.
 
-TCP Fast Open
-=============
+TCP Fast Open description
+=========================
 TCP Fast Open is a technology which allows data transfer before the
 3-way handshake complete. Please refer the `TCP Fast Open wiki`_ for a
 general description.
diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
index 363736d7fd36..df136c8f91fa 100644
--- a/Documentation/powerpc/ultravisor.rst
+++ b/Documentation/powerpc/ultravisor.rst
@@ -8,8 +8,8 @@ Protected Execution Facility
 .. contents::
     :depth: 3
 
-Protected Execution Facility
-############################
+Introduction
+############
 
     Protected Execution Facility (PEF) is an architectural change for
     POWER 9 that enables Secure Virtual Machines (SVMs). DD2.3 chips
diff --git a/Documentation/security/siphash.rst b/Documentation/security/siphash.rst
index 9965821ab333..4eba68cdf0a1 100644
--- a/Documentation/security/siphash.rst
+++ b/Documentation/security/siphash.rst
@@ -128,8 +128,8 @@ then when you can be absolutely certain that the outputs will never be
 transmitted out of the kernel. This is only remotely useful over `jhash` as a
 means of mitigating hashtable flooding denial of service attacks.
 
-Generating a key
-================
+Generating a HalfSipHash key
+============================
 
 Keys should always be generated from a cryptographically secure source of
 random numbers, either using get_random_bytes or get_random_once:
@@ -139,8 +139,8 @@ get_random_bytes(&key, sizeof(key));
 
 If you're not deriving your key from here, you're doing it wrong.
 
-Using the functions
-===================
+Using the HalfSipHash functions
+===============================
 
 There are two variants of the function, one that takes a list of integers, and
 one that takes a buffer::
diff --git a/Documentation/target/tcmu-design.rst b/Documentation/target/tcmu-design.rst
index a7b426707bf6..e47047e32e27 100644
--- a/Documentation/target/tcmu-design.rst
+++ b/Documentation/target/tcmu-design.rst
@@ -5,7 +5,7 @@ TCM Userspace Design
 
 .. Contents:
 
-   1) TCM Userspace Design
+   1) Design
      a) Background
      b) Benefits
      c) Design constraints
@@ -23,8 +23,8 @@ TCM Userspace Design
    3) A final note
 
 
-TCM Userspace Design
-====================
+Design
+======
 
 TCM is another name for LIO, an in-kernel iSCSI target (server).
 Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
diff --git a/Documentation/translations/zh_CN/process/5.Posting.rst b/Documentation/translations/zh_CN/process/5.Posting.rst
index 41aba21ff050..9ff9945f918c 100644
--- a/Documentation/translations/zh_CN/process/5.Posting.rst
+++ b/Documentation/translations/zh_CN/process/5.Posting.rst
@@ -5,7 +5,7 @@
 
 .. _cn_development_posting:
 
-发送补丁
+发布补丁
 ========
 
 迟早，当您的工作准备好提交给社区进行审查，并最终包含到主线内核中时。不出所料，
diff --git a/Documentation/x86/intel-iommu.rst b/Documentation/x86/intel-iommu.rst
index 9dae6b47e398..099f13d51d5f 100644
--- a/Documentation/x86/intel-iommu.rst
+++ b/Documentation/x86/intel-iommu.rst
@@ -95,9 +95,10 @@ and any RMRR's processed::
 When DMAR is enabled for use, you will notice..
 
 PCI-DMA: Using DMAR IOMMU
+-------------------------
 
 Fault reporting
----------------
+^^^^^^^^^^^^^^^
 
 ::
 
-- 
cgit 


From 4658b0eb9430e2c228a0a9cc6e66f0b90d3853e1 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Fri, 20 Mar 2020 16:11:03 +0100
Subject: docs: conf.py: avoid thousands of duplicate label warnings on Sphinx

The autosectionlabel extension is nice, as it allows referring to
a section by its name without requiring any extra tag to create
a reference name.

However, with its default settings, it has two serious problems:

1) The namespace is global. So, two files that each have an
   "introduction" section would create labels with the
   same name. This is easily solvable by forcing the extension
   to prepend the file name with:

	autosectionlabel_prefix_document = True

2) It doesn't work hierarchically. So, if there are two level 1
   sections (let's say, one labeled "open" and another one "ioctl")
   and both have a level 2 "synopsis" section, both level 2 sections
   will have the same name.

   Currently, there's no way to tell Sphinx to create a
   hierarchical reference like:

		open / synopsis
		ioctl / synopsis

   This causes around 800 warnings. So, the fix is to
   not let autosectionlabel produce references for anything
   that is not at chapter level within any doc, with:

	autosectionlabel_maxdepth = 2

Fixes: 58ad30cf91f0 ("docs: fix reference to core-api/namespaces.rst")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/74f4d8d91c648d7101c45b4b99cc93532f4dadc6.1584716446.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
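Note: the effect of the two settings can be illustrated with a toy
model. This is only a sketch, not Sphinx code: the section inventory,
make_labels() and duplicates() are hypothetical names made up for the
illustration, and the label format ("docname:title") and the depth
numbering (1 = document title) are assumptions about how the
extension behaves.

```python
from collections import Counter

# Hypothetical section inventory: (docname, depth, title).
# Depth 1 is the document title, depth 2 a chapter, depth 3 a subsection.
SECTIONS = [
    ("driver-a", 1, "Driver A"),
    ("driver-a", 2, "Introduction"),
    ("driver-b", 1, "Driver B"),
    ("driver-b", 2, "Introduction"),
    ("fops", 1, "File operations"),
    ("fops", 2, "open"),
    ("fops", 3, "Synopsis"),
    ("fops", 2, "ioctl"),
    ("fops", 3, "Synopsis"),
]


def make_labels(sections, prefix_document=False, maxdepth=None):
    """Build one label per section, mimicking the two conf.py knobs."""
    labels = []
    for docname, depth, title in sections:
        # Sections deeper than maxdepth get no label at all.
        if maxdepth is not None and depth > maxdepth:
            continue
        name = " ".join(title.lower().split())
        labels.append(f"{docname}:{name}" if prefix_document else name)
    return labels


def duplicates(labels):
    """Return the sorted list of labels that occur more than once."""
    return sorted(label for label, n in Counter(labels).items() if n > 1)


# Defaults: clashes across files and inside one file.
print(duplicates(make_labels(SECTIONS)))
# Prefixing fixes the cross-file clash only.
print(duplicates(make_labels(SECTIONS, prefix_document=True)))
# Limiting the depth drops the remaining per-file clash.
print(duplicates(make_labels(SECTIONS, prefix_document=True, maxdepth=2)))
```

In this model the prefix alone still leaves the two "synopsis"
sections of the same file clashing, which is why both settings are
needed together.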
 Documentation/conf.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/conf.py b/Documentation/conf.py
index fa2bfcd6df1d..9ae8e9abf846 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -40,6 +40,10 @@ extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
               'kfigure', 'sphinx.ext.ifconfig', 'automarkup',
               'maintainers_include', 'sphinx.ext.autosectionlabel' ]
 
+# Ensure that autosectionlabel will produce unique names
+autosectionlabel_prefix_document = True
+autosectionlabel_maxdepth = 2
+
 # The name of the math extension changed on Sphinx 1.4
 if (major == 1 and minor > 3) or (major > 1):
     extensions.append("sphinx.ext.imgmath")
-- 
cgit 


From 6adb7755996f0bf0f5e5f3996b016bc66f95f372 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <swboyd@chromium.org>
Date: Wed, 18 Mar 2020 10:41:32 -0700
Subject: docs: locking: Add 'need' to hardirq section

Add the missing word to make this sentence read properly.

Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200318174133.160206-2-swboyd@chromium.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/kernel-hacking/locking.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index a8518ac0d31d..9850c1e52607 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -263,7 +263,7 @@ by a hardware interrupt on another CPU. This is where
 interrupts on that cpu, then grab the lock.
 :c:func:`spin_unlock_irq()` does the reverse.
 
-The irq handler does not to use :c:func:`spin_lock_irq()`, because
+The irq handler does not need to use :c:func:`spin_lock_irq()`, because
 the softirq cannot run while the irq handler is running: it can use
 :c:func:`spin_lock()`, which is slightly faster. The only exception
 would be if a different hardware irq handler uses the same lock:
-- 
cgit 


From b1735296cef99db66aac22f0e34fb0c88b889744 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <swboyd@chromium.org>
Date: Wed, 18 Mar 2020 10:41:33 -0700
Subject: docs: locking: Drop :c:func: throughout

The kernel doc tooling (automarkup) cross-references function() names
by itself, so drop the :c:func: markup throughout this file to simplify it.

Suggested-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200318174133.160206-3-swboyd@chromium.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
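Note: a conversion like this one can be scripted. The following is a
minimal sketch under assumptions, not the tool used for this patch:
C_FUNC and drop_c_func() are hypothetical names, and the pattern only
handles roles that sit on a single line. It covers both the plain
form and the form with an explicit target in angle brackets, which is
the one that is easy to get wrong by hand.

```python
import re

# Matches :c:func:`name()` as well as :c:func:`text <target>`.
# Group 1 is the visible text; the optional "<target>" part is dropped.
C_FUNC = re.compile(r":c:func:`([^`<]+?)(?:\s*<[^`>]+>)?`")


def drop_c_func(text):
    """Replace every :c:func: role in text with its visible text."""
    return C_FUNC.sub(lambda m: m.group(1).strip(), text)


sample = (
    "Then you can call :c:func:`mutex_lock_interruptible()` to grab the\n"
    "-  :c:func:`kmalloc(GFP_KERNEL) <kmalloc>`\n"
)
print(drop_c_func(sample), end="")
```

Running it on the sample leaves plain "mutex_lock_interruptible()" and
"kmalloc(GFP_KERNEL)" in the text, with no leftover target or
backtick.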
 Documentation/kernel-hacking/locking.rst | 176 +++++++++++++++----------------
 1 file changed, 88 insertions(+), 88 deletions(-)

diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index 9850c1e52607..6ed806e6061b 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -150,17 +150,17 @@ Locking Only In User Context
 If you have a data structure which is only ever accessed from user
 context, then you can use a simple mutex (``include/linux/mutex.h``) to
 protect it. This is the most trivial case: you initialize the mutex.
-Then you can call :c:func:`mutex_lock_interruptible()` to grab the
-mutex, and :c:func:`mutex_unlock()` to release it. There is also a
-:c:func:`mutex_lock()`, which should be avoided, because it will
+Then you can call mutex_lock_interruptible() to grab the
+mutex, and mutex_unlock() to release it. There is also a
+mutex_lock(), which should be avoided, because it will
 not return if a signal is received.
 
 Example: ``net/netfilter/nf_sockopt.c`` allows registration of new
-:c:func:`setsockopt()` and :c:func:`getsockopt()` calls, with
-:c:func:`nf_register_sockopt()`. Registration and de-registration
+setsockopt() and getsockopt() calls, with
+nf_register_sockopt(). Registration and de-registration
 are only done on module load and unload (and boot time, where there is
 no concurrency), and the list of registrations is only consulted for an
-unknown :c:func:`setsockopt()` or :c:func:`getsockopt()` system
+unknown setsockopt() or getsockopt() system
 call. The ``nf_sockopt_mutex`` is perfect to protect this, especially
 since the setsockopt and getsockopt calls may well sleep.
 
@@ -170,19 +170,19 @@ Locking Between User Context and Softirqs
 If a softirq shares data with user context, you have two problems.
 Firstly, the current user context can be interrupted by a softirq, and
 secondly, the critical region could be entered from another CPU. This is
-where :c:func:`spin_lock_bh()` (``include/linux/spinlock.h``) is
+where spin_lock_bh() (``include/linux/spinlock.h``) is
 used. It disables softirqs on that CPU, then grabs the lock.
-:c:func:`spin_unlock_bh()` does the reverse. (The '_bh' suffix is
+spin_unlock_bh() does the reverse. (The '_bh' suffix is
 a historical reference to "Bottom Halves", the old name for software
 interrupts. It should really be called spin_lock_softirq()' in a
 perfect world).
 
-Note that you can also use :c:func:`spin_lock_irq()` or
-:c:func:`spin_lock_irqsave()` here, which stop hardware interrupts
+Note that you can also use spin_lock_irq() or
+spin_lock_irqsave() here, which stop hardware interrupts
 as well: see `Hard IRQ Context <#hard-irq-context>`__.
 
 This works perfectly for UP as well: the spin lock vanishes, and this
-macro simply becomes :c:func:`local_bh_disable()`
+macro simply becomes local_bh_disable()
 (``include/linux/interrupt.h``), which protects you from the softirq
 being run.
 
@@ -216,8 +216,8 @@ Different Tasklets/Timers
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
 If another tasklet/timer wants to share data with your tasklet or timer
-, you will both need to use :c:func:`spin_lock()` and
-:c:func:`spin_unlock()` calls. :c:func:`spin_lock_bh()` is
+, you will both need to use spin_lock() and
+spin_unlock() calls. spin_lock_bh() is
 unnecessary here, as you are already in a tasklet, and none will be run
 on the same CPU.
 
@@ -234,14 +234,14 @@ The same softirq can run on the other CPUs: you can use a per-CPU array
 going so far as to use a softirq, you probably care about scalable
 performance enough to justify the extra complexity.
 
-You'll need to use :c:func:`spin_lock()` and
-:c:func:`spin_unlock()` for shared data.
+You'll need to use spin_lock() and
+spin_unlock() for shared data.
 
 Different Softirqs
 ~~~~~~~~~~~~~~~~~~
 
-You'll need to use :c:func:`spin_lock()` and
-:c:func:`spin_unlock()` for shared data, whether it be a timer,
+You'll need to use spin_lock() and
+spin_unlock() for shared data, whether it be a timer,
 tasklet, different softirq or the same or another softirq: any of them
 could be running on a different CPU.
 
@@ -259,38 +259,38 @@ If a hardware irq handler shares data with a softirq, you have two
 concerns. Firstly, the softirq processing can be interrupted by a
 hardware interrupt, and secondly, the critical region could be entered
 by a hardware interrupt on another CPU. This is where
-:c:func:`spin_lock_irq()` is used. It is defined to disable
+spin_lock_irq() is used. It is defined to disable
 interrupts on that cpu, then grab the lock.
-:c:func:`spin_unlock_irq()` does the reverse.
+spin_unlock_irq() does the reverse.
 
-The irq handler does not need to use :c:func:`spin_lock_irq()`, because
+The irq handler does not need to use spin_lock_irq(), because
 the softirq cannot run while the irq handler is running: it can use
-:c:func:`spin_lock()`, which is slightly faster. The only exception
+spin_lock(), which is slightly faster. The only exception
 would be if a different hardware irq handler uses the same lock:
-:c:func:`spin_lock_irq()` will stop that from interrupting us.
+spin_lock_irq() will stop that from interrupting us.
 
 This works perfectly for UP as well: the spin lock vanishes, and this
-macro simply becomes :c:func:`local_irq_disable()`
+macro simply becomes local_irq_disable()
 (``include/asm/smp.h``), which protects you from the softirq/tasklet/BH
 being run.
 
-:c:func:`spin_lock_irqsave()` (``include/linux/spinlock.h``) is a
+spin_lock_irqsave() (``include/linux/spinlock.h``) is a
 variant which saves whether interrupts were on or off in a flags word,
-which is passed to :c:func:`spin_unlock_irqrestore()`. This means
+which is passed to spin_unlock_irqrestore(). This means
 that the same code can be used inside an hard irq handler (where
 interrupts are already off) and in softirqs (where the irq disabling is
 required).
 
 Note that softirqs (and hence tasklets and timers) are run on return
-from hardware interrupts, so :c:func:`spin_lock_irq()` also stops
-these. In that sense, :c:func:`spin_lock_irqsave()` is the most
+from hardware interrupts, so spin_lock_irq() also stops
+these. In that sense, spin_lock_irqsave() is the most
 general and powerful locking function.
 
 Locking Between Two Hard IRQ Handlers
 -------------------------------------
 
 It is rare to have to share data between two IRQ handlers, but if you
-do, :c:func:`spin_lock_irqsave()` should be used: it is
+do, spin_lock_irqsave() should be used: it is
 architecture-specific whether all interrupts are disabled inside irq
 handlers themselves.
 
@@ -304,11 +304,11 @@ Pete Zaitcev gives the following summary:
    (``copy_from_user*(`` or ``kmalloc(x,GFP_KERNEL)``).
 
 -  Otherwise (== data can be touched in an interrupt), use
-   :c:func:`spin_lock_irqsave()` and
-   :c:func:`spin_unlock_irqrestore()`.
+   spin_lock_irqsave() and
+   spin_unlock_irqrestore().
 
 -  Avoid holding spinlock for more than 5 lines of code and across any
-   function call (except accessors like :c:func:`readb()`).
+   function call (except accessors like readb()).
 
 Table of Minimum Requirements
 -----------------------------
@@ -320,7 +320,7 @@ particular thread can only run on one CPU at a time, but if it needs
 shares data with another thread, locking is required).
 
 Remember the advice above: you can always use
-:c:func:`spin_lock_irqsave()`, which is a superset of all other
+spin_lock_irqsave(), which is a superset of all other
 spinlock primitives.
 
 ============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
@@ -363,13 +363,13 @@ They can be used if you need no access to the data protected with the
 lock when some other thread is holding the lock. You should acquire the
 lock later if you then need access to the data protected with the lock.
 
-:c:func:`spin_trylock()` does not spin but returns non-zero if it
+spin_trylock() does not spin but returns non-zero if it
 acquires the spinlock on the first try or 0 if not. This function can be
-used in all contexts like :c:func:`spin_lock()`: you must have
+used in all contexts like spin_lock(): you must have
 disabled the contexts that might interrupt you and acquire the spin
 lock.
 
-:c:func:`mutex_trylock()` does not suspend your task but returns
+mutex_trylock() does not suspend your task but returns
 non-zero if it could lock the mutex on the first try or 0 if not. This
 function cannot be safely used in hardware or software interrupt
 contexts despite not sleeping.
@@ -490,14 +490,14 @@ easy, since we copy the data for the user, and never let them access the
 objects directly.
 
 There is a slight (and common) optimization here: in
-:c:func:`cache_add()` we set up the fields of the object before
+cache_add() we set up the fields of the object before
 grabbing the lock. This is safe, as no-one else can access it until we
 put it in cache.
 
 Accessing From Interrupt Context
 --------------------------------
 
-Now consider the case where :c:func:`cache_find()` can be called
+Now consider the case where cache_find() can be called
 from interrupt context: either a hardware interrupt or a softirq. An
 example would be a timer which deletes object from the cache.
 
@@ -566,16 +566,16 @@ which are taken away, and the ``+`` are lines which are added.
              return ret;
      }
 
-Note that the :c:func:`spin_lock_irqsave()` will turn off
+Note that the spin_lock_irqsave() will turn off
 interrupts if they are on, otherwise does nothing (if we are already in
 an interrupt handler), hence these functions are safe to call from any
 context.
 
-Unfortunately, :c:func:`cache_add()` calls :c:func:`kmalloc()`
+Unfortunately, cache_add() calls kmalloc()
 with the ``GFP_KERNEL`` flag, which is only legal in user context. I
-have assumed that :c:func:`cache_add()` is still only called in
+have assumed that cache_add() is still only called in
 user context, otherwise this should become a parameter to
-:c:func:`cache_add()`.
+cache_add().
 
 Exposing Objects Outside This File
 ----------------------------------
@@ -592,7 +592,7 @@ This makes locking trickier, as it is no longer all in one place.
 The second problem is the lifetime problem: if another structure keeps a
 pointer to an object, it presumably expects that pointer to remain
 valid. Unfortunately, this is only guaranteed while you hold the lock,
-otherwise someone might call :c:func:`cache_delete()` and even
+otherwise someone might call cache_delete() and even
 worse, add another object, re-using the same address.
 
 As there is only one lock, you can't hold it forever: no-one else would
@@ -693,8 +693,8 @@ Here is the code::
 
 We encapsulate the reference counting in the standard 'get' and 'put'
 functions. Now we can return the object itself from
-:c:func:`cache_find()` which has the advantage that the user can
-now sleep holding the object (eg. to :c:func:`copy_to_user()` to
+cache_find() which has the advantage that the user can
+now sleep holding the object (eg. to copy_to_user() to
 name to userspace).
 
 The other point to note is that I said a reference should be held for
@@ -710,7 +710,7 @@ number of atomic operations defined in ``include/asm/atomic.h``: these
 are guaranteed to be seen atomically from all CPUs in the system, so no
 lock is required. In this case, it is simpler than using spinlocks,
 although for anything non-trivial using spinlocks is clearer. The
-:c:func:`atomic_inc()` and :c:func:`atomic_dec_and_test()`
+atomic_inc() and atomic_dec_and_test()
 are used instead of the standard increment and decrement operators, and
 the lock is no longer used to protect the reference count itself.
 
@@ -802,7 +802,7 @@ name to change, there are three possibilities:
 -  You can make ``cache_lock`` non-static, and tell people to grab that
    lock before changing the name in any object.
 
--  You can provide a :c:func:`cache_obj_rename()` which grabs this
+-  You can provide a cache_obj_rename() which grabs this
    lock and changes the name for the caller, and tell everyone to use
    that function.
 
@@ -861,11 +861,11 @@ Note that I decide that the popularity count should be protected by the
 ``cache_lock`` rather than the per-object lock: this is because it (like
 the :c:type:`struct list_head <list_head>` inside the object)
 is logically part of the infrastructure. This way, I don't need to grab
-the lock of every object in :c:func:`__cache_add()` when seeking
+the lock of every object in __cache_add() when seeking
 the least popular.
 
 I also decided that the id member is unchangeable, so I don't need to
-grab each object lock in :c:func:`__cache_find()` to examine the
+grab each object lock in __cache_find() to examine the
 id: the object lock is only used by a caller who wants to read or write
 the name field.
 
@@ -887,7 +887,7 @@ trivial to diagnose: not a
 stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.
 
 For a slightly more complex case, imagine you have a region shared by a
-softirq and user context. If you use a :c:func:`spin_lock()` call
+softirq and user context. If you use a spin_lock() call
 to protect it, it is possible that the user context will be interrupted
 by the softirq while it holds the lock, and the softirq will then spin
 forever trying to get the same lock.
@@ -985,12 +985,12 @@ you might do the following::
 
 
 Sooner or later, this will crash on SMP, because a timer can have just
-gone off before the :c:func:`spin_lock_bh()`, and it will only get
-the lock after we :c:func:`spin_unlock_bh()`, and then try to free
+gone off before the spin_lock_bh(), and it will only get
+the lock after we spin_unlock_bh(), and then try to free
 the element (which has already been freed!).
 
 This can be avoided by checking the result of
-:c:func:`del_timer()`: if it returns 1, the timer has been deleted.
+del_timer(): if it returns 1, the timer has been deleted.
 If 0, it means (in this case) that it is currently running, so we can
 do::
 
@@ -1012,9 +1012,9 @@ do::
 
 
 Another common problem is deleting timers which restart themselves (by
-calling :c:func:`add_timer()` at the end of their timer function).
+calling add_timer() at the end of their timer function).
 Because this is a fairly common case which is prone to races, you should
-use :c:func:`del_timer_sync()` (``include/linux/timer.h``) to
+use del_timer_sync() (``include/linux/timer.h``) to
 handle this case. It returns the number of times the timer had to be
 deleted before we finally stopped it from adding itself back in.
 
@@ -1086,7 +1086,7 @@ adding ``new`` to a single linked list called ``list``::
             list->next = new;
 
 
-The :c:func:`wmb()` is a write memory barrier. It ensures that the
+The wmb() is a write memory barrier. It ensures that the
 first operation (setting the new element's ``next`` pointer) is complete
 and will be seen by all CPUs, before the second operation is (putting
 the new element into the list). This is important, since modern
@@ -1097,7 +1097,7 @@ rest of the list.
 
 Fortunately, there is a function to do this for standard
 :c:type:`struct list_head <list_head>` lists:
-:c:func:`list_add_rcu()` (``include/linux/list.h``).
+list_add_rcu() (``include/linux/list.h``).
 
 Removing an element from the list is even simpler: we replace the
 pointer to the old element with a pointer to its successor, and readers
@@ -1108,7 +1108,7 @@ will either see it, or skip over it.
             list->next = old->next;
 
 
-There is :c:func:`list_del_rcu()` (``include/linux/list.h``) which
+There is list_del_rcu() (``include/linux/list.h``) which
 does this (the normal version poisons the old object, which we don't
 want).
 
@@ -1116,9 +1116,9 @@ The reader must also be careful: some CPUs can look through the ``next``
 pointer to start reading the contents of the next element early, but
 don't realize that the pre-fetched contents is wrong when the ``next``
 pointer changes underneath them. Once again, there is a
-:c:func:`list_for_each_entry_rcu()` (``include/linux/list.h``)
+list_for_each_entry_rcu() (``include/linux/list.h``)
 to help you. Of course, writers can just use
-:c:func:`list_for_each_entry()`, since there cannot be two
+list_for_each_entry(), since there cannot be two
 simultaneous writers.
 
 Our final dilemma is this: when can we actually destroy the removed
@@ -1127,14 +1127,14 @@ the list right now: if we free this element and the ``next`` pointer
 changes, the reader will jump off into garbage and crash. We need to
 wait until we know that all the readers who were traversing the list
 when we deleted the element are finished. We use
-:c:func:`call_rcu()` to register a callback which will actually
+call_rcu() to register a callback which will actually
 destroy the object once all pre-existing readers are finished.
-Alternatively, :c:func:`synchronize_rcu()` may be used to block
+Alternatively, synchronize_rcu() may be used to block
 until all pre-existing are finished.
 
 But how does Read Copy Update know when the readers are finished? The
 method is this: firstly, the readers always traverse the list inside
-:c:func:`rcu_read_lock()`/:c:func:`rcu_read_unlock()` pairs:
+rcu_read_lock()/rcu_read_unlock() pairs:
 these simply disable preemption so the reader won't go to sleep while
 reading the list.
 
@@ -1223,12 +1223,12 @@ this is the fundamental idea.
      }
 
 Note that the reader will alter the popularity member in
-:c:func:`__cache_find()`, and now it doesn't hold a lock. One
+__cache_find(), and now it doesn't hold a lock. One
 solution would be to make it an ``atomic_t``, but for this usage, we
 don't really care about races: an approximate result is good enough, so
 I didn't change it.
 
-The result is that :c:func:`cache_find()` requires no
+The result is that cache_find() requires no
 synchronization with any other functions, so is almost as fast on SMP as
 it would be on UP.
 
@@ -1240,9 +1240,9 @@ and put the reference count.
 
 Now, because the 'read lock' in RCU is simply disabling preemption, a
 caller which always has preemption disabled between calling
-:c:func:`cache_find()` and :c:func:`object_put()` does not
+cache_find() and object_put() does not
 need to actually get and put the reference count: we could expose
-:c:func:`__cache_find()` by making it non-static, and such
+__cache_find() by making it non-static, and such
 callers could simply call that.
 
 The benefit here is that the reference count is not written to: the
@@ -1260,11 +1260,11 @@ counter. Nice and simple.
 If that was too slow (it's usually not, but if you've got a really big
 machine to test on and can show that it is), you could instead use a
 counter for each CPU, then none of them need an exclusive lock. See
-:c:func:`DEFINE_PER_CPU()`, :c:func:`get_cpu_var()` and
-:c:func:`put_cpu_var()` (``include/linux/percpu.h``).
+DEFINE_PER_CPU(), get_cpu_var() and
+put_cpu_var() (``include/linux/percpu.h``).
 
 Of particular use for simple per-cpu counters is the ``local_t`` type,
-and the :c:func:`cpu_local_inc()` and related functions, which are
+and the cpu_local_inc() and related functions, which are
 more efficient than simple code on some architectures
 (``include/asm/local.h``).
 
@@ -1289,10 +1289,10 @@ irq handler doesn't use a lock, and all other accesses are done as so::
         enable_irq(irq);
         spin_unlock(&lock);
 
-The :c:func:`disable_irq()` prevents the irq handler from running
+The disable_irq() prevents the irq handler from running
 (and waits for it to finish if it's currently running on other CPUs).
 The spinlock prevents any other accesses happening at the same time.
-Naturally, this is slower than just a :c:func:`spin_lock_irq()`
+Naturally, this is slower than just a spin_lock_irq()
 call, so it only makes sense if this type of access happens extremely
 rarely.
 
@@ -1315,22 +1315,22 @@ from user context, and can sleep.
 
 -  Accesses to userspace:
 
-   -  :c:func:`copy_from_user()`
+   -  copy_from_user()
 
-   -  :c:func:`copy_to_user()`
+   -  copy_to_user()
 
-   -  :c:func:`get_user()`
+   -  get_user()
 
-   -  :c:func:`put_user()`
+   -  put_user()
 
--  :c:func:`kmalloc(GFP_KERNEL) <kmalloc>`
+-  kmalloc(GFP_KERNEL)
 
--  :c:func:`mutex_lock_interruptible()` and
-   :c:func:`mutex_lock()`
+-  mutex_lock_interruptible() and
+   mutex_lock()
 
-   There is a :c:func:`mutex_trylock()` which does not sleep.
+   There is a mutex_trylock() which does not sleep.
    Still, it must not be used inside interrupt context since its
-   implementation is not safe for that. :c:func:`mutex_unlock()`
+   implementation is not safe for that. mutex_unlock()
    will also never sleep. It cannot be used in interrupt context either
    since a mutex must be released by the same task that acquired it.
 
@@ -1340,11 +1340,11 @@ Some Functions Which Don't Sleep
 Some functions are safe to call from any context, or holding almost any
 lock.
 
--  :c:func:`printk()`
+-  printk()
 
--  :c:func:`kfree()`
+-  kfree()
 
--  :c:func:`add_timer()` and :c:func:`del_timer()`
+-  add_timer() and del_timer()
 
 Mutex API reference
 ===================
@@ -1400,26 +1400,26 @@ preemption
 
 bh
   Bottom Half: for historical reasons, functions with '_bh' in them often
-  now refer to any software interrupt, e.g. :c:func:`spin_lock_bh()`
+  now refer to any software interrupt, e.g. spin_lock_bh()
   blocks any software interrupt on the current CPU. Bottom halves are
   deprecated, and will eventually be replaced by tasklets. Only one bottom
   half will be running at any time.
 
 Hardware Interrupt / Hardware IRQ
-  Hardware interrupt request. :c:func:`in_irq()` returns true in a
+  Hardware interrupt request. in_irq() returns true in a
   hardware interrupt handler.
 
 Interrupt Context
   Not user context: processing a hardware irq or software irq. Indicated
-  by the :c:func:`in_interrupt()` macro returning true.
+  by the in_interrupt() macro returning true.
 
 SMP
   Symmetric Multi-Processor: kernels compiled for multiple-CPU machines.
   (``CONFIG_SMP=y``).
 
 Software Interrupt / softirq
-  Software interrupt handler. :c:func:`in_irq()` returns false;
-  :c:func:`in_softirq()` returns true. Tasklets and softirqs both
+  Software interrupt handler. in_irq() returns false;
+  in_softirq() returns true. Tasklets and softirqs both
   fall into the category of 'software interrupts'.
 
   Strictly speaking a softirq is one of up to 32 enumerated software
-- 
cgit 


From 9d4ca8c6b9fb65ffb671737851e9c9bb320e54f3 Mon Sep 17 00:00:00 2001
From: Wang Wenhu <wenhu.wang@vivo.com>
Date: Mon, 16 Mar 2020 04:01:31 -0700
Subject: doc: zh_CN: index files in filesystems subdirectory

Add the filesystems subdirectory to the table of contents for zh_CN,
so that all translations residing in it are indexed conveniently.

Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
Link: https://lore.kernel.org/r/20200316110143.97848-1-wenhu.wang@vivo.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst                |  2 ++
 .../translations/zh_CN/filesystems/index.rst       | 25 ++++++++++++++++++++++
 Documentation/translations/zh_CN/index.rst         |  1 +
 3 files changed, 28 insertions(+)
 create mode 100644 Documentation/translations/zh_CN/filesystems/index.rst

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 53f46a88e6ec..e7b46dac7079 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -1,3 +1,5 @@
+.. _filesystems_index:
+
 ===============================
 Filesystems in the Linux kernel
 ===============================
diff --git a/Documentation/translations/zh_CN/filesystems/index.rst b/Documentation/translations/zh_CN/filesystems/index.rst
new file mode 100644
index 000000000000..f5adcdc5fa1c
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/index.rst
@@ -0,0 +1,25 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/filesystems/index.rst <filesystems_index>`
+:Translator: Wang Wenhu <wenhu.wang@vivo.com>
+
+.. _cn_filesystems_index:
+
+========================
+Linux Kernel中的文件系统
+========================
+
+这份正在开发的手册或许在未来某个辉煌的日子里以易懂的形式将Linux虚拟\
+文件系统（VFS）层以及基于其上的各种文件系统如何工作呈现给大家。当前\
+可以看到下面的内容。
+
+文件系统
+========
+
+文件系统实现文档。
+
+.. toctree::
+   :maxdepth: 2
+
diff --git a/Documentation/translations/zh_CN/index.rst b/Documentation/translations/zh_CN/index.rst
index d3165535ec9e..76850a5dd982 100644
--- a/Documentation/translations/zh_CN/index.rst
+++ b/Documentation/translations/zh_CN/index.rst
@@ -14,6 +14,7 @@
    :maxdepth: 2
 
    process/index
+   filesystems/index
 
 目录和表格
 ----------
-- 
cgit 


From 6735c208c1326f2d56bdcd1f41f6062baef73bec Mon Sep 17 00:00:00 2001
From: Wang Wenhu <wenhu.wang@vivo.com>
Date: Mon, 16 Mar 2020 04:01:32 -0700
Subject: doc: zh_CN: add translation for virtiofs

Translate virtiofs.rst in Documentation/filesystems/ into Chinese.

Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com>
Link: https://lore.kernel.org/r/20200316110143.97848-2-wenhu.wang@vivo.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/virtiofs.rst             |  2 +
 .../translations/zh_CN/filesystems/index.rst       |  2 +
 .../translations/zh_CN/filesystems/virtiofs.rst    | 58 ++++++++++++++++++++++
 3 files changed, 62 insertions(+)
 create mode 100644 Documentation/translations/zh_CN/filesystems/virtiofs.rst

diff --git a/Documentation/filesystems/virtiofs.rst b/Documentation/filesystems/virtiofs.rst
index 4f338e3cb3f7..e06e4951cb39 100644
--- a/Documentation/filesystems/virtiofs.rst
+++ b/Documentation/filesystems/virtiofs.rst
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
+.. _virtiofs_index:
+
 ===================================================
 virtiofs: virtio-fs host<->guest shared file system
 ===================================================
diff --git a/Documentation/translations/zh_CN/filesystems/index.rst b/Documentation/translations/zh_CN/filesystems/index.rst
index f5adcdc5fa1c..14f155edaf69 100644
--- a/Documentation/translations/zh_CN/filesystems/index.rst
+++ b/Documentation/translations/zh_CN/filesystems/index.rst
@@ -23,3 +23,5 @@ Linux Kernel中的文件系统
 .. toctree::
    :maxdepth: 2
 
+   virtiofs
+
diff --git a/Documentation/translations/zh_CN/filesystems/virtiofs.rst b/Documentation/translations/zh_CN/filesystems/virtiofs.rst
new file mode 100644
index 000000000000..09bc9e012e2a
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/virtiofs.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: :ref:`Documentation/filesystems/virtiofs.rst <virtiofs_index>`
+
+译者
+::
+
+	中文版维护者： 王文虎 Wang Wenhu <wenhu.wang@vivo.com>
+	中文版翻译者： 王文虎 Wang Wenhu <wenhu.wang@vivo.com>
+	中文版校译者： 王文虎 Wang Wenhu <wenhu.wang@vivo.com>
+
+===========================================
+virtiofs: virtio-fs 主机<->客机共享文件系统
+===========================================
+
+- Copyright (C) 2020 Vivo Communication Technology Co. Ltd.
+
+介绍
+====
+Linux的virtiofs文件系统实现了一个半虚拟化VIRTIO类型“virtio-fs”设备的驱动，通过该\
+类型设备实现客机<->主机文件系统共享。它允许客机挂载一个已经导出到主机的目录。
+
+客机通常需要访问主机或者远程系统上的文件。使用场景包括：在新客机安装时让文件对其\
+可见；从主机上的根文件系统启动；对无状态或临时客机提供持久存储和在客机之间共享目录。
+
+尽管在某些任务可能通过使用已有的网络文件系统完成，但是却需要非常难以自动化的配置\
+步骤，且将存储网络暴露给客机。而virtio-fs设备通过提供不经过网络的文件系统访问文件\
+的设计方式解决了这些问题。
+
+另外，virtio-fs设备发挥了主客机共存的优点提高了性能，并且提供了网络文件系统所不具备
+的一些语义功能。
+
+用法
+====
+以``myfs``标签将文件系统挂载到``/mnt``:
+
+.. code-block:: sh
+
+  guest# mount -t virtiofs myfs /mnt
+
+请查阅 https://virtio-fs.gitlab.io/ 了解配置QEMU和virtiofsd守护程序的详细信息。
+
+内幕
+====
+由于virtio-fs设备将FUSE协议用于文件系统请求，因此Linux的virtiofs文件系统与FUSE文\
+件系统客户端紧密集成在一起。客机充当FUSE客户端而主机充当FUSE服务器，内核与用户空\
+间之间的/dev/fuse接口由virtio-fs设备接口代替。
+
+FUSE请求被置于虚拟队列中由主机处理。主机填充缓冲区中的响应部分，而客机处理请求的完成部分。
+
+将/dev/fuse映射到虚拟队列需要解决/dev/fuse和虚拟队列之间语义上的差异。每次读取\
+/dev/fuse设备时，FUSE客户端都可以选择要传输的请求，从而可以使某些请求优先于其他\
+请求。虚拟队列有其队列语义，无法更改已入队请求的顺序。在虚拟队列已满的情况下尤
+其关键，因为此时不可能加入高优先级的请求。为了解决此差异，virtio-fs设备采用“hiprio”\
+（高优先级）虚拟队列，专门用于有别于普通请求的高优先级请求。
+
-- 
cgit 


From 7af51678b6d367ee93dc3d21e72ecf15be50fcb1 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Sat, 14 Mar 2020 15:29:50 -0700
Subject: docs: deprecated.rst: Add BUG()-family

Linus continues to remind[1] people to stop using the BUG()-family of
functions. We should have this better documented (even if checkpatch.pl
has been warning[2] since 2015), so add more details to deprecated.rst,
as a distinct place to point people to for guidance.
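
As a rough illustration of the recommended pattern, here is a minimal
userspace sketch. The WARN_ON() below is a hypothetical stand-in written
for this example (it is not the kernel macro), and struct ring and
ring_push() are made-up names. The point is only that the "impossible"
condition is reported and then handled by failing the call, rather than
killing the thread as BUG_ON() would.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Userspace stand-in for the kernel's WARN_ON() (assumption for this
 * sketch): report the condition and evaluate to its truth value.
 */
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

static int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}

/* Hypothetical data structure used only for illustration. */
struct ring {
	unsigned int head;
	unsigned int size;
};

static int ring_push(struct ring *r)
{
	/*
	 * "Impossible" state: head must always be below size. Instead of
	 * BUG_ON(), warn and return an error so the caller can unwind.
	 */
	if (WARN_ON(r->head >= r->size))
		return -EINVAL;

	r->head = (r->head + 1) % r->size;
	return 0;
}
```

A caller would check the return value of ring_push() and release any
locks or state it holds before propagating the error.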

[1] https://lore.kernel.org/lkml/CAHk-=whDHsbK3HTOpTF=ue_o04onRwTEaK_ZoJp_fjbqq4+=Jw@mail.gmail.com/
[2] https://git.kernel.org/linus/9d3e3c705eb395528fd8f17208c87581b134da48

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/202003141524.59C619B51A@keescook
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/deprecated.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
index e924d3197761..652e2aa02a66 100644
--- a/Documentation/process/deprecated.rst
+++ b/Documentation/process/deprecated.rst
@@ -29,6 +29,28 @@ a header file, it isn't the full solution. Such interfaces must either
 be fully removed from the kernel, or added to this file to discourage
 others from using them in the future.
 
+BUG() and BUG_ON()
+------------------
+Use WARN() and WARN_ON() instead, and handle the "impossible"
+error condition as gracefully as possible. While the BUG()-family
+of APIs were originally designed to act as an "impossible situation"
+assert and to kill a kernel thread "safely", they turn out to just be
+too risky. (e.g. "In what order do locks need to be released? Have
+various states been restored?") Very commonly, using BUG() will
+destabilize a system or entirely break it, which makes it impossible
+to debug or even get viable crash reports. Linus has `very strong
+<https://lore.kernel.org/lkml/CA+55aFy6jNLsywVYdGp83AMrXBo_P-pkjkphPGrO=82SPKCpLQ@mail.gmail.com/>`_
+feelings `about this
+<https://lore.kernel.org/lkml/CAHk-=whDHsbK3HTOpTF=ue_o04onRwTEaK_ZoJp_fjbqq4+=Jw@mail.gmail.com/>`_.
+
+Note that the WARN()-family should only be used for "expected to
+be unreachable" situations. If you want to warn about "reachable
+but undesirable" situations, please use the pr_warn()-family of
+functions. System owners may have set the *panic_on_warn* sysctl,
+to make sure their systems do not continue running in the face of
+"unreachable" conditions. (For example, see commits like `this one
+<https://git.kernel.org/linus/d4689846881d160a4d12a514e991a740bcb5d65a>`_.)
+
 open-coded arithmetic in allocator arguments
 --------------------------------------------
 Dynamic size calculations (especially multiplication) should not be
-- 
cgit 


From 19e91e543c82c5d7c5f0f1820ed60af1e88956e6 Mon Sep 17 00:00:00 2001
From: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Date: Sat, 14 Mar 2020 18:50:30 +0100
Subject: MAINTAINERS: adjust to filesystem doc ReST conversion

Mauro's patch series <cover.1581955849.git.mchehab+huawei@kernel.org>
("[PATCH 00/44] Manually convert filesystem FS documents to ReST")
converts many Documentation/filesystems/ files to ReST.

Since then, ./scripts/get_maintainer.pl --self-test complains with 27
warnings on Documentation/filesystems/ of this kind:

  warning: no file matches F: Documentation/filesystems/...

Adjust MAINTAINERS entries to all files converted from .txt to .rst in the
patch series and address the 27 warnings.

Link: https://lore.kernel.org/linux-erofs/cover.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20200314175030.10436-1-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 MAINTAINERS | 54 +++++++++++++++++++++++++++---------------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ddc491bea55..38f58b85eb06 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -214,7 +214,7 @@ Q:	http://patchwork.kernel.org/project/v9fs-devel/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git
 T:	git git://github.com/martinetd/linux.git
 S:	Maintained
-F:	Documentation/filesystems/9p.txt
+F:	Documentation/filesystems/9p.rst
 F:	fs/9p/
 F:	net/9p/
 F:	include/net/9p/
@@ -584,7 +584,7 @@ AFFS FILE SYSTEM
 M:	David Sterba <dsterba@suse.com>
 L:	linux-fsdevel@vger.kernel.org
 S:	Odd Fixes
-F:	Documentation/filesystems/affs.txt
+F:	Documentation/filesystems/affs.rst
 F:	fs/affs/
 
 AFS FILESYSTEM
@@ -593,7 +593,7 @@ L:	linux-afs@lists.infradead.org
 S:	Supported
 F:	fs/afs/
 F:	include/trace/events/afs.h
-F:	Documentation/filesystems/afs.txt
+F:	Documentation/filesystems/afs.rst
 W:	https://www.infradead.org/~dhowells/kafs/
 
 AGPGART DRIVER
@@ -3063,7 +3063,7 @@ M:	Luis de Bethencourt <luisbg@kernel.org>
 M:	Salah Triki <salah.triki@gmail.com>
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/luisbg/linux-befs.git
-F:	Documentation/filesystems/befs.txt
+F:	Documentation/filesystems/befs.rst
 F:	fs/befs/
 
 BFQ I/O SCHEDULER
@@ -3077,7 +3077,7 @@ F:	Documentation/block/bfq-iosched.rst
 BFS FILE SYSTEM
 M:	"Tigran A. Aivazian" <aivazian.tigran@gmail.com>
 S:	Maintained
-F:	Documentation/filesystems/bfs.txt
+F:	Documentation/filesystems/bfs.rst
 F:	fs/bfs/
 F:	include/uapi/linux/bfs_fs.h
 
@@ -3610,7 +3610,7 @@ W:	http://btrfs.wiki.kernel.org/
 Q:	http://patchwork.kernel.org/project/linux-btrfs/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
 S:	Maintained
-F:	Documentation/filesystems/btrfs.txt
+F:	Documentation/filesystems/btrfs.rst
 F:	fs/btrfs/
 F:	include/linux/btrfs*
 F:	include/uapi/linux/btrfs*
@@ -3906,7 +3906,7 @@ W:	http://ceph.com/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
 T:	git git://github.com/ceph/ceph-client.git
 S:	Supported
-F:	Documentation/filesystems/ceph.txt
+F:	Documentation/filesystems/ceph.rst
 F:	fs/ceph/
 
 CERTIFICATE HANDLING:
@@ -4423,7 +4423,7 @@ F:	include/linux/cpuidle.h
 CRAMFS FILESYSTEM
 M:	Nicolas Pitre <nico@fluxnic.net>
 S:	Maintained
-F:	Documentation/filesystems/cramfs.txt
+F:	Documentation/filesystems/cramfs.rst
 F:	fs/cramfs/
 
 CREATIVE SB0540
@@ -5938,7 +5938,7 @@ W:	http://ecryptfs.org
 W:	https://launchpad.net/ecryptfs
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs.git
 S:	Supported
-F:	Documentation/filesystems/ecryptfs.txt
+F:	Documentation/filesystems/ecryptfs.rst
 F:	fs/ecryptfs/
 
 EDAC-AMD64
@@ -6254,7 +6254,7 @@ M:	Chao Yu <yuchao0@huawei.com>
 L:	linux-erofs@lists.ozlabs.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git
-F:	Documentation/filesystems/erofs.txt
+F:	Documentation/filesystems/erofs.rst
 F:	fs/erofs/
 F:	include/trace/events/erofs.h
 
@@ -6315,7 +6315,7 @@ EXT2 FILE SYSTEM
 M:	Jan Kara <jack@suse.com>
 L:	linux-ext4@vger.kernel.org
 S:	Maintained
-F:	Documentation/filesystems/ext2.txt
+F:	Documentation/filesystems/ext2.rst
 F:	fs/ext2/
 F:	include/linux/ext2*
 
@@ -6389,7 +6389,7 @@ L:	linux-f2fs-devel@lists.sourceforge.net
 W:	https://f2fs.wiki.kernel.org/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
 S:	Maintained
-F:	Documentation/filesystems/f2fs.txt
+F:	Documentation/filesystems/f2fs.rst
 F:	Documentation/ABI/testing/sysfs-fs-f2fs
 F:	fs/f2fs/
 F:	include/linux/f2fs_fs.h
@@ -7431,13 +7431,13 @@ F:	drivers/infiniband/hw/hfi1
 HFS FILESYSTEM
 L:	linux-fsdevel@vger.kernel.org
 S:	Orphan
-F:	Documentation/filesystems/hfs.txt
+F:	Documentation/filesystems/hfs.rst
 F:	fs/hfs/
 
 HFSPLUS FILESYSTEM
 L:	linux-fsdevel@vger.kernel.org
 S:	Orphan
-F:	Documentation/filesystems/hfsplus.txt
+F:	Documentation/filesystems/hfsplus.rst
 F:	fs/hfsplus/
 
 HGA FRAMEBUFFER DRIVER
@@ -8308,7 +8308,7 @@ M:	Jan Kara <jack@suse.cz>
 R:	Amir Goldstein <amir73il@gmail.com>
 L:	linux-fsdevel@vger.kernel.org
 S:	Maintained
-F:	Documentation/filesystems/inotify.txt
+F:	Documentation/filesystems/inotify.rst
 F:	fs/notify/inotify/
 F:	include/linux/inotify.h
 F:	include/uapi/linux/inotify.h
@@ -11791,7 +11791,7 @@ W:	https://nilfs.sourceforge.io/
 W:	https://nilfs.osdn.jp/
 T:	git git://github.com/konis/nilfs2.git
 S:	Supported
-F:	Documentation/filesystems/nilfs2.txt
+F:	Documentation/filesystems/nilfs2.rst
 F:	fs/nilfs2/
 F:	include/trace/events/nilfs2.h
 F:	include/uapi/linux/nilfs2_api.h
@@ -11900,7 +11900,7 @@ L:	linux-ntfs-dev@lists.sourceforge.net
 W:	http://www.tuxera.com/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs.git
 S:	Supported
-F:	Documentation/filesystems/ntfs.txt
+F:	Documentation/filesystems/ntfs.rst
 F:	fs/ntfs/
 
 NUBUS SUBSYSTEM
@@ -12246,7 +12246,7 @@ OMFS FILESYSTEM
 M:	Bob Copeland <me@bobcopeland.com>
 L:	linux-karma-devel@lists.sourceforge.net
 S:	Maintained
-F:	Documentation/filesystems/omfs.txt
+F:	Documentation/filesystems/omfs.rst
 F:	fs/omfs/
 
 OMNIKEY CARDMAN 4000 DRIVER
@@ -12495,8 +12495,8 @@ M:	Joseph Qi <joseph.qi@linux.alibaba.com>
 L:	ocfs2-devel@oss.oracle.com (moderated for non-subscribers)
 W:	http://ocfs2.wiki.kernel.org
 S:	Supported
-F:	Documentation/filesystems/ocfs2.txt
-F:	Documentation/filesystems/dlmfs.txt
+F:	Documentation/filesystems/ocfs2.rst
+F:	Documentation/filesystems/dlmfs.rst
 F:	fs/ocfs2/
 
 ORANGEFS FILESYSTEM
@@ -12506,7 +12506,7 @@ L:	devel@lists.orangefs.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux.git
 S:	Supported
 F:	fs/orangefs/
-F:	Documentation/filesystems/orangefs.txt
+F:	Documentation/filesystems/orangefs.rst
 
 ORINOCO DRIVER
 L:	linux-wireless@vger.kernel.org
@@ -13469,7 +13469,7 @@ S:	Maintained
 F:	fs/proc/
 F:	include/linux/proc_fs.h
 F:	tools/testing/selftests/proc/
-F:	Documentation/filesystems/proc.txt
+F:	Documentation/filesystems/proc.rst
 
 PROC SYSCTL
 M:	Luis Chamberlain <mcgrof@kernel.org>
@@ -15738,7 +15738,7 @@ L:	squashfs-devel@lists.sourceforge.net (subscribers-only)
 W:	http://squashfs.org.uk
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next.git
 S:	Maintained
-F:	Documentation/filesystems/squashfs.txt
+F:	Documentation/filesystems/squashfs.rst
 F:	fs/squashfs/
 
 SRM (Alpha) environment access
@@ -16181,7 +16181,7 @@ F:	drivers/platform/x86/system76_acpi.c
 SYSV FILESYSTEM
 M:	Christoph Hellwig <hch@infradead.org>
 S:	Maintained
-F:	Documentation/filesystems/sysv-fs.txt
+F:	Documentation/filesystems/sysv-fs.rst
 F:	fs/sysv/
 F:	include/linux/sysv_fs.h
 
@@ -17046,7 +17046,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs.git next
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs.git fixes
 W:	http://www.linux-mtd.infradead.org/doc/ubifs.html
 S:	Supported
-F:	Documentation/filesystems/ubifs.txt
+F:	Documentation/filesystems/ubifs.rst
 F:	fs/ubifs/
 
 UCLINUX (M68KNOMMU AND COLDFIRE)
@@ -17065,7 +17065,7 @@ F:	arch/m68k/include/asm/*_no.*
 UDF FILESYSTEM
 M:	Jan Kara <jack@suse.com>
 S:	Maintained
-F:	Documentation/filesystems/udf.txt
+F:	Documentation/filesystems/udf.rst
 F:	fs/udf/
 
 UDRAW TABLET
@@ -18504,7 +18504,7 @@ L:	linux-fsdevel@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs.git
 S:	Maintained
 F:	fs/zonefs/
-F:	Documentation/filesystems/zonefs.txt
+F:	Documentation/filesystems/zonefs.rst
 
 ZPOOL COMPRESSED PAGE STORAGE API
 M:	Dan Streetman <ddstreet@ieee.org>
-- 
cgit 


From abcb1e021ae5a36374c635eeaba5cec733169b78 Mon Sep 17 00:00:00 2001
From: Nick Desaulniers <ndesaulniers@google.com>
Date: Thu, 26 Mar 2020 17:09:51 -0700
Subject: Documentation: x86: exception-tables: document
 CONFIG_BUILDTIME_TABLE_SORT

Provide more information about __ex_table sorting post link.

The exception tables and fixup tables use a commonly recurring pattern
in the kernel of storing the address of labels as data in custom ELF
sections, then finding these sections, iterating elements within them,
and possibly revisiting them or modifying the data at these addresses.

Sorting readonly arrays to minimize runtime penalties is quite clever.
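
To show why a pre-sorted table helps, here is a minimal userspace sketch
of a binary search over a sorted exception table. The struct ex_entry
layout, the find_fixup() name and the sample addresses are assumptions
made for this example; the kernel's real entry layout and lookup code
differ.

```c
#include <stddef.h>

/* Simplified, hypothetical entry: faulting address and where to resume. */
struct ex_entry {
	unsigned long insn;	/* address of the instruction that may fault */
	unsigned long fixup;	/* address of the fixup code */
};

/* Sample table; it must already be sorted by insn for the search to work. */
static const struct ex_entry sample_table[] = {
	{ 0x1000, 0x9000 },
	{ 0x1010, 0x9010 },
	{ 0x1040, 0x9020 },
};

/* Binary search: O(log n) lookups, but only valid on a sorted table. */
static const struct ex_entry *find_fixup(const struct ex_entry *table,
					 size_t n, unsigned long addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].insn == addr)
			return &table[mid];
		if (table[mid].insn < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;	/* no entry: the fault was not an expected one */
}
```

If the table is sorted at build time, a lookup like this can be used
from the first fault onwards, with no sorting pass needed at boot.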

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20200327000951.84071-1-ndesaulniers@google.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/x86/exception-tables.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/x86/exception-tables.rst b/Documentation/x86/exception-tables.rst
index ed6d4b0cf62c..81a393867f10 100644
--- a/Documentation/x86/exception-tables.rst
+++ b/Documentation/x86/exception-tables.rst
@@ -257,6 +257,9 @@ the fault, in our case the actual value is c0199ff5:
 the original assembly code: > 3:      movl $-14,%eax
 and linked in vmlinux     : > c0199ff5 <.fixup+10b5> movl   $0xfffffff2,%eax
 
+If the fixup was able to handle the exception, control flow may be returned
+to the instruction after the one that triggered the fault, i.e. local label 2b.
+
 The assembly code::
 
  > .section __ex_table,"a"
@@ -344,3 +347,14 @@ pointer which points to one of:
      it as special.
 
 More functions can easily be added.
+
+CONFIG_BUILDTIME_TABLE_SORT allows the __ex_table section to be sorted post
+link of the kernel image, via a host utility scripts/sorttable. It will set the
+symbol main_extable_sort_needed to 0, avoiding sorting the __ex_table section
+at boot time. With the exception table sorted, at runtime when an exception
+occurs we can quickly look up the __ex_table entry via binary search.
+
+This is not just a boot time optimization; some architectures require this
+table to be sorted in order to handle exceptions relatively early in the boot
+process. For example, i386 makes use of this form of exception handling before
+paging support is even enabled!
-- 
cgit