summaryrefslogtreecommitdiff
path: root/Documentation/userspace-api
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/userspace-api')
-rw-r--r--Documentation/userspace-api/gpio/chardev.rst116
-rw-r--r--Documentation/userspace-api/gpio/chardev_v1.rst131
-rw-r--r--Documentation/userspace-api/gpio/error-codes.rst79
-rw-r--r--Documentation/userspace-api/gpio/gpio-get-chipinfo-ioctl.rst41
-rw-r--r--Documentation/userspace-api/gpio/gpio-get-lineevent-ioctl.rst84
-rw-r--r--Documentation/userspace-api/gpio/gpio-get-linehandle-ioctl.rst125
-rw-r--r--Documentation/userspace-api/gpio/gpio-get-lineinfo-ioctl.rst54
-rw-r--r--Documentation/userspace-api/gpio/gpio-get-lineinfo-unwatch-ioctl.rst49
-rw-r--r--Documentation/userspace-api/gpio/gpio-get-lineinfo-watch-ioctl.rst74
-rw-r--r--Documentation/userspace-api/gpio/gpio-handle-get-line-values-ioctl.rst56
-rw-r--r--Documentation/userspace-api/gpio/gpio-handle-set-config-ioctl.rst63
-rw-r--r--Documentation/userspace-api/gpio/gpio-handle-set-line-values-ioctl.rst48
-rw-r--r--Documentation/userspace-api/gpio/gpio-lineevent-data-read.rst84
-rw-r--r--Documentation/userspace-api/gpio/gpio-lineinfo-changed-read.rst87
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-get-line-ioctl.rst152
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-ioctl.rst50
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-watch-ioctl.rst67
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-line-event-read.rst83
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-line-get-values-ioctl.rst51
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-line-set-config-ioctl.rst58
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-line-set-values-ioctl.rst47
-rw-r--r--Documentation/userspace-api/gpio/gpio-v2-lineinfo-changed-read.rst81
-rw-r--r--Documentation/userspace-api/gpio/index.rst18
-rw-r--r--Documentation/userspace-api/gpio/obsolete.rst11
-rw-r--r--Documentation/userspace-api/gpio/sysfs.rst170
-rw-r--r--Documentation/userspace-api/index.rst48
-rw-r--r--Documentation/userspace-api/ioctl/ioctl-number.rst4
-rw-r--r--Documentation/userspace-api/landlock.rst59
-rw-r--r--Documentation/userspace-api/netlink/netlink-raw.rst42
-rw-r--r--Documentation/userspace-api/perf_ring_buffer.rst830
30 files changed, 2844 insertions, 18 deletions
diff --git a/Documentation/userspace-api/gpio/chardev.rst b/Documentation/userspace-api/gpio/chardev.rst
new file mode 100644
index 000000000000..c58dd9771ac9
--- /dev/null
+++ b/Documentation/userspace-api/gpio/chardev.rst
@@ -0,0 +1,116 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+GPIO Character Device Userspace API
+===================================
+
+This is latest version (v2) of the character device API, as defined in
+``include/uapi/linux/gpio.h.``
+
+First added in 5.10.
+
+.. note::
+ Do NOT abuse userspace APIs to control hardware that has proper kernel
+ drivers. There may already be a driver for your use case, and an existing
+ kernel driver is sure to provide a superior solution to bitbashing
+ from userspace.
+
+ Read Documentation/driver-api/gpio/drivers-on-gpio.rst to avoid reinventing
+ kernel wheels in userspace.
+
+ Similarly, for multi-function lines there may be other subsystems, such as
+ Documentation/spi/index.rst, Documentation/i2c/index.rst,
+ Documentation/driver-api/pwm.rst, Documentation/w1/index.rst etc, that
+ provide suitable drivers and APIs for your hardware.
+
+Basic examples using the character device API can be found in ``tools/gpio/*``.
+
+The API is based around two major objects, the :ref:`gpio-v2-chip` and the
+:ref:`gpio-v2-line-request`.
+
+.. _gpio-v2-chip:
+
+Chip
+====
+
+The Chip represents a single GPIO chip and is exposed to userspace using device
+files of the form ``/dev/gpiochipX``.
+
+Each chip supports a number of GPIO lines,
+:c:type:`chip.lines<gpiochip_info>`. Lines on the chip are identified by an
+``offset`` in the range from 0 to ``chip.lines - 1``, i.e. `[0,chip.lines)`.
+
+Lines are requested from the chip using gpio-v2-get-line-ioctl.rst
+and the resulting line request is used to access the GPIO chip's lines or
+monitor the lines for edge events.
+
+Within this documentation, the file descriptor returned by calling `open()`
+on the GPIO device file is referred to as ``chip_fd``.
+
+Operations
+----------
+
+The following operations may be performed on the chip:
+
+.. toctree::
+ :titlesonly:
+
+ Get Line <gpio-v2-get-line-ioctl>
+ Get Chip Info <gpio-get-chipinfo-ioctl>
+ Get Line Info <gpio-v2-get-lineinfo-ioctl>
+ Watch Line Info <gpio-v2-get-lineinfo-watch-ioctl>
+ Unwatch Line Info <gpio-get-lineinfo-unwatch-ioctl>
+ Read Line Info Changed Events <gpio-v2-lineinfo-changed-read>
+
+.. _gpio-v2-line-request:
+
+Line Request
+============
+
+Line requests are created by gpio-v2-get-line-ioctl.rst and provide
+access to a set of requested lines. The line request is exposed to userspace
+via the anonymous file descriptor returned in
+:c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst.
+
+Within this documentation, the line request file descriptor is referred to
+as ``req_fd``.
+
+Operations
+----------
+
+The following operations may be performed on the line request:
+
+.. toctree::
+ :titlesonly:
+
+ Get Line Values <gpio-v2-line-get-values-ioctl>
+ Set Line Values <gpio-v2-line-set-values-ioctl>
+ Read Line Edge Events <gpio-v2-line-event-read>
+ Reconfigure Lines <gpio-v2-line-set-config-ioctl>
+
+Types
+=====
+
+This section contains the structs and enums that are referenced by the API v2,
+as defined in ``include/uapi/linux/gpio.h``.
+
+.. kernel-doc:: include/uapi/linux/gpio.h
+ :identifiers:
+ gpio_v2_line_attr_id
+ gpio_v2_line_attribute
+ gpio_v2_line_changed_type
+ gpio_v2_line_config
+ gpio_v2_line_config_attribute
+ gpio_v2_line_event
+ gpio_v2_line_event_id
+ gpio_v2_line_flag
+ gpio_v2_line_info
+ gpio_v2_line_info_changed
+ gpio_v2_line_request
+ gpio_v2_line_values
+ gpiochip_info
+
+.. toctree::
+ :hidden:
+
+ error-codes
diff --git a/Documentation/userspace-api/gpio/chardev_v1.rst b/Documentation/userspace-api/gpio/chardev_v1.rst
new file mode 100644
index 000000000000..67124b1d0487
--- /dev/null
+++ b/Documentation/userspace-api/gpio/chardev_v1.rst
@@ -0,0 +1,131 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+GPIO Character Device Userspace API (v1)
+========================================
+
+.. warning::
+ This API is obsoleted by chardev.rst (v2).
+
+ New developments should use the v2 API, and existing developments are
+ encouraged to migrate as soon as possible, as this API will be removed
+ in the future. The v2 API is a functional superset of the v1 API so any
+ v1 call can be directly translated to a v2 equivalent.
+
+ This interface will continue to be maintained for the migration period,
+ but new features will only be added to the new API.
+
+First added in 4.8.
+
+The API is based around three major objects, the :ref:`gpio-v1-chip`, the
+:ref:`gpio-v1-line-handle`, and the :ref:`gpio-v1-line-event`.
+
+Where "line event" is used in this document it refers to the request that can
+monitor a line for edge events, not the edge events themselves.
+
+.. _gpio-v1-chip:
+
+Chip
+====
+
+The Chip represents a single GPIO chip and is exposed to userspace using device
+files of the form ``/dev/gpiochipX``.
+
+Each chip supports a number of GPIO lines,
+:c:type:`chip.lines<gpiochip_info>`. Lines on the chip are identified by an
+``offset`` in the range from 0 to ``chip.lines - 1``, i.e. `[0,chip.lines)`.
+
+Lines are requested from the chip using either gpio-get-linehandle-ioctl.rst
+and the resulting line handle is used to access the GPIO chip's lines, or
+gpio-get-lineevent-ioctl.rst and the resulting line event is used to monitor
+a GPIO line for edge events.
+
+Within this documentation, the file descriptor returned by calling `open()`
+on the GPIO device file is referred to as ``chip_fd``.
+
+Operations
+----------
+
+The following operations may be performed on the chip:
+
+.. toctree::
+ :titlesonly:
+
+ Get Line Handle <gpio-get-linehandle-ioctl>
+ Get Line Event <gpio-get-lineevent-ioctl>
+ Get Chip Info <gpio-get-chipinfo-ioctl>
+ Get Line Info <gpio-get-lineinfo-ioctl>
+ Watch Line Info <gpio-get-lineinfo-watch-ioctl>
+ Unwatch Line Info <gpio-get-lineinfo-unwatch-ioctl>
+ Read Line Info Changed Events <gpio-lineinfo-changed-read>
+
+.. _gpio-v1-line-handle:
+
+Line Handle
+===========
+
+Line handles are created by gpio-get-linehandle-ioctl.rst and provide
+access to a set of requested lines. The line handle is exposed to userspace
+via the anonymous file descriptor returned in
+:c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst.
+
+Within this documentation, the line handle file descriptor is referred to
+as ``handle_fd``.
+
+Operations
+----------
+
+The following operations may be performed on the line handle:
+
+.. toctree::
+ :titlesonly:
+
+ Get Line Values <gpio-handle-get-line-values-ioctl>
+ Set Line Values <gpio-handle-set-line-values-ioctl>
+ Reconfigure Lines <gpio-handle-set-config-ioctl>
+
+.. _gpio-v1-line-event:
+
+Line Event
+==========
+
+Line events are created by gpio-get-lineevent-ioctl.rst and provide
+access to a requested line. The line event is exposed to userspace
+via the anonymous file descriptor returned in
+:c:type:`request.fd<gpioevent_request>` by gpio-get-lineevent-ioctl.rst.
+
+Within this documentation, the line event file descriptor is referred to
+as ``event_fd``.
+
+Operations
+----------
+
+The following operations may be performed on the line event:
+
+.. toctree::
+ :titlesonly:
+
+ Get Line Value <gpio-handle-get-line-values-ioctl>
+ Read Line Edge Events <gpio-lineevent-data-read>
+
+Types
+=====
+
+This section contains the structs that are referenced by the ABI v1.
+
+The :c:type:`struct gpiochip_info<gpiochip_info>` is common to ABI v1 and v2.
+
+.. kernel-doc:: include/uapi/linux/gpio.h
+ :identifiers:
+ gpioevent_data
+ gpioevent_request
+ gpiohandle_config
+ gpiohandle_data
+ gpiohandle_request
+ gpioline_info
+ gpioline_info_changed
+
+.. toctree::
+ :hidden:
+
+ error-codes
diff --git a/Documentation/userspace-api/gpio/error-codes.rst b/Documentation/userspace-api/gpio/error-codes.rst
new file mode 100644
index 000000000000..6bf2948990cd
--- /dev/null
+++ b/Documentation/userspace-api/gpio/error-codes.rst
@@ -0,0 +1,79 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _gpio_errors:
+
+*******************
+GPIO Error Codes
+*******************
+
+.. _gpio-errors:
+
+.. tabularcolumns:: |p{2.5cm}|p{15.0cm}|
+
+.. flat-table:: Common GPIO error codes
+ :header-rows: 0
+ :stub-columns: 0
+ :widths: 1 16
+
+ - - ``EAGAIN`` (aka ``EWOULDBLOCK``)
+
+ - The device was opened in non-blocking mode and a read can't
+ be performed as there is no data available.
+
+ - - ``EBADF``
+
+ - The file descriptor is not valid.
+
+ - - ``EBUSY``
+
+ - The ioctl can't be handled because the device is busy. Typically
+ returned when an ioctl attempts something that would require the
+ usage of a resource that was already allocated. The ioctl must not
+ be retried without performing another action to fix the problem
+ first.
+
+ - - ``EFAULT``
+
+ - There was a failure while copying data from/to userspace, probably
+ caused by an invalid pointer reference.
+
+ - - ``EINVAL``
+
+ - One or more of the ioctl parameters are invalid or out of the
+ allowed range. This is a widely used error code.
+
+ - - ``ENODEV``
+
+ - Device not found or was removed.
+
+ - - ``ENOMEM``
+
+ - There's not enough memory to handle the desired operation.
+
+ - - ``EPERM``
+
+ - Permission denied. Typically returned in response to an attempt
+ to perform an action incompatible with the current line
+ configuration.
+
+ - - ``EIO``
+
+ - I/O error. Typically returned when there are problems communicating
+ with a hardware device or requesting features that hardware does not
+ support. This could indicate broken or flaky hardware.
+ It's a 'Something is wrong, I give up!' type of error.
+
+ - - ``ENXIO``
+
+ - Typically returned when a feature requiring interrupt support was
+ requested, but the line does not support interrupts.
+
+.. note::
+
+ #. This list is not exhaustive; ioctls may return other error codes.
+ Since errors may have side effects such as a driver reset,
+ applications should abort on unexpected errors, or otherwise
+ assume that the device is in a bad state.
+
+ #. Request-specific error codes are listed in the individual
+ requests descriptions.
diff --git a/Documentation/userspace-api/gpio/gpio-get-chipinfo-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-chipinfo-ioctl.rst
new file mode 100644
index 000000000000..05f07fdefe2f
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-get-chipinfo-ioctl.rst
@@ -0,0 +1,41 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_GET_CHIPINFO_IOCTL:
+
+***********************
+GPIO_GET_CHIPINFO_IOCTL
+***********************
+
+Name
+====
+
+GPIO_GET_CHIPINFO_IOCTL - Get the publicly available information for a chip.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_GET_CHIPINFO_IOCTL
+
+``int ioctl(int chip_fd, GPIO_GET_CHIPINFO_IOCTL, struct gpiochip_info *info)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``info``
+ The :c:type:`chip_info<gpiochip_info>` to be populated.
+
+Description
+===========
+
+Gets the publicly available information for a particular GPIO chip.
+
+Return Value
+============
+
+On success 0 and ``info`` is populated with the chip info.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-get-lineevent-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineevent-ioctl.rst
new file mode 100644
index 000000000000..09a9254f38cf
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-get-lineevent-ioctl.rst
@@ -0,0 +1,84 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_GET_LINEEVENT_IOCTL:
+
+************************
+GPIO_GET_LINEEVENT_IOCTL
+************************
+
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-get-line-ioctl.rst.
+
+Name
+====
+
+GPIO_GET_LINEEVENT_IOCTL - Request a line with edge detection from the kernel.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_GET_LINEEVENT_IOCTL
+
+``int ioctl(int chip_fd, GPIO_GET_LINEEVENT_IOCTL, struct gpioevent_request *request)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``request``
+ The :c:type:`event_request<gpioevent_request>` specifying the line
+ to request and its configuration.
+
+Description
+===========
+
+Request a line with edge detection from the kernel.
+
+On success, the requesting process is granted exclusive access to the line
+value and may receive events when edges are detected on the line, as
+described in gpio-lineevent-data-read.rst.
+
+The state of a line is guaranteed to remain as requested until the returned
+file descriptor is closed. Once the file descriptor is closed, the state of
+the line becomes uncontrolled from the userspace perspective, and may revert
+to its default state.
+
+Requesting a line already in use is an error (**EBUSY**).
+
+Requesting edge detection on a line that does not support interrupts is an
+error (**ENXIO**).
+
+As with the :ref:`line handle<gpio-get-linehandle-config-support>`, the
+bias configuration is best effort.
+
+Closing the ``chip_fd`` has no effect on existing line events.
+
+Configuration Rules
+-------------------
+
+The following configuration rules apply:
+
+The line event is requested as an input, so no flags specific to output lines,
+``GPIOHANDLE_REQUEST_OUTPUT``, ``GPIOHANDLE_REQUEST_OPEN_DRAIN``, or
+``GPIOHANDLE_REQUEST_OPEN_SOURCE``, may be set.
+
+Only one bias flag, ``GPIOHANDLE_REQUEST_BIAS_xxx``, may be set.
+If no bias flags are set then the bias configuration is not changed.
+
+The edge flags, ``GPIOEVENT_REQUEST_RISING_EDGE`` and
+``GPIOEVENT_REQUEST_FALLING_EDGE``, may be combined to detect both rising
+and falling edges.
+
+Requesting an invalid configuration is an error (**EINVAL**).
+
+Return Value
+============
+
+On success 0 and the :c:type:`request.fd<gpioevent_request>` contains the file
+descriptor for the request.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-get-linehandle-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-linehandle-ioctl.rst
new file mode 100644
index 000000000000..9112a9d31174
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-get-linehandle-ioctl.rst
@@ -0,0 +1,125 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_GET_LINEHANDLE_IOCTL:
+
+*************************
+GPIO_GET_LINEHANDLE_IOCTL
+*************************
+
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-get-line-ioctl.rst.
+
+Name
+====
+
+GPIO_GET_LINEHANDLE_IOCTL - Request a line or lines from the kernel.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_GET_LINEHANDLE_IOCTL
+
+``int ioctl(int chip_fd, GPIO_GET_LINEHANDLE_IOCTL, struct gpiohandle_request *request)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``request``
+ The :c:type:`handle_request<gpiohandle_request>` specifying the lines to
+ request and their configuration.
+
+Description
+===========
+
+Request a line or lines from the kernel.
+
+While multiple lines may be requested, the same configuration applies to all
+lines in the request.
+
+On success, the requesting process is granted exclusive access to the line
+value and write access to the line configuration.
+
+The state of a line, including the value of output lines, is guaranteed to
+remain as requested until the returned file descriptor is closed. Once the
+file descriptor is closed, the state of the line becomes uncontrolled from
+the userspace perspective, and may revert to its default state.
+
+Requesting a line already in use is an error (**EBUSY**).
+
+Closing the ``chip_fd`` has no effect on existing line handles.
+
+.. _gpio-get-linehandle-config-rules:
+
+Configuration Rules
+-------------------
+
+The following configuration rules apply:
+
+The direction flags, ``GPIOHANDLE_REQUEST_INPUT`` and
+``GPIOHANDLE_REQUEST_OUTPUT``, cannot be combined. If neither are set then the
+only other flag that may be set is ``GPIOHANDLE_REQUEST_ACTIVE_LOW`` and the
+line is requested "as-is" to allow reading of the line value without altering
+the electrical configuration.
+
+The drive flags, ``GPIOHANDLE_REQUEST_OPEN_xxx``, require the
+``GPIOHANDLE_REQUEST_OUTPUT`` to be set.
+Only one drive flag may be set.
+If none are set then the line is assumed push-pull.
+
+Only one bias flag, ``GPIOHANDLE_REQUEST_BIAS_xxx``, may be set, and
+it requires a direction flag to also be set.
+If no bias flags are set then the bias configuration is not changed.
+
+Requesting an invalid configuration is an error (**EINVAL**).
+
+
+.. _gpio-get-linehandle-config-support:
+
+Configuration Support
+---------------------
+
+Where the requested configuration is not directly supported by the underlying
+hardware and driver, the kernel applies one of these approaches:
+
+ - reject the request
+ - emulate the feature in software
+ - treat the feature as best effort
+
+The approach applied depends on whether the feature can reasonably be emulated
+in software, and the impact on the hardware and userspace if the feature is not
+supported.
+The approach applied for each feature is as follows:
+
+============== ===========
+Feature Approach
+============== ===========
+Bias best effort
+Direction reject
+Drive emulate
+============== ===========
+
+Bias is treated as best effort to allow userspace to apply the same
+configuration for platforms that support internal bias as those that require
+external bias.
+Worst case the line floats rather than being biased as expected.
+
+Drive is emulated by switching the line to an input when the line should not
+be driven.
+
+In all cases, the configuration reported by gpio-get-lineinfo-ioctl.rst
+is the requested configuration, not the resulting hardware configuration.
+Userspace cannot determine if a feature is supported in hardware, is
+emulated, or is best effort.
+
+Return Value
+============
+
+On success 0 and the :c:type:`request.fd<gpiohandle_request>` contains the
+file descriptor for the request.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-get-lineinfo-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineinfo-ioctl.rst
new file mode 100644
index 000000000000..c895b8910b25
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-get-lineinfo-ioctl.rst
@@ -0,0 +1,54 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_GET_LINEINFO_IOCTL:
+
+***********************
+GPIO_GET_LINEINFO_IOCTL
+***********************
+
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-get-lineinfo-ioctl.rst.
+
+Name
+====
+
+GPIO_GET_LINEINFO_IOCTL - Get the publicly available information for a line.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_GET_LINEINFO_IOCTL
+
+``int ioctl(int chip_fd, GPIO_GET_LINEINFO_IOCTL, struct gpioline_info *info)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``info``
+ The :c:type:`line_info<gpioline_info>` to be populated, with the
+ ``offset`` field set to indicate the line to be collected.
+
+Description
+===========
+
+Get the publicly available information for a line.
+
+This information is available independent of whether the line is in use.
+
+.. note::
+ The line info does not include the line value.
+
+ The line must be requested using gpio-get-linehandle-ioctl.rst or
+ gpio-get-lineevent-ioctl.rst to access its value.
+
+Return Value
+============
+
+On success 0 and ``info`` is populated with the chip info.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-get-lineinfo-unwatch-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineinfo-unwatch-ioctl.rst
new file mode 100644
index 000000000000..a82d0161daf8
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-get-lineinfo-unwatch-ioctl.rst
@@ -0,0 +1,49 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_GET_LINEINFO_UNWATCH_IOCTL:
+
+*******************************
+GPIO_GET_LINEINFO_UNWATCH_IOCTL
+*******************************
+
+Name
+====
+
+GPIO_GET_LINEINFO_UNWATCH_IOCTL - Disable watching a line for changes to its
+requested state and configuration information.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_GET_LINEINFO_UNWATCH_IOCTL
+
+``int ioctl(int chip_fd, GPIO_GET_LINEINFO_UNWATCH_IOCTL, u32 *offset)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``offset``
+ The offset of the line to no longer watch.
+
+Description
+===========
+
+Remove the line from the list of lines being watched on this ``chip_fd``.
+
+This is the reverse of gpio-v2-get-lineinfo-watch-ioctl.rst (v2) and
+gpio-get-lineinfo-watch-ioctl.rst (v1).
+
+Unwatching a line that is not watched is an error (**EBUSY**).
+
+First added in 5.7.
+
+Return Value
+============
+
+On success 0.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-get-lineinfo-watch-ioctl.rst b/Documentation/userspace-api/gpio/gpio-get-lineinfo-watch-ioctl.rst
new file mode 100644
index 000000000000..f5c92b69a496
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-get-lineinfo-watch-ioctl.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_GET_LINEINFO_WATCH_IOCTL:
+
+*****************************
+GPIO_GET_LINEINFO_WATCH_IOCTL
+*****************************
+
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-get-lineinfo-watch-ioctl.rst.
+
+Name
+====
+
+GPIO_GET_LINEINFO_WATCH_IOCTL - Enable watching a line for changes to its
+request state and configuration information.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_GET_LINEINFO_WATCH_IOCTL
+
+``int ioctl(int chip_fd, GPIO_GET_LINEINFO_WATCH_IOCTL, struct gpioline_info *info)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``info``
+ The :c:type:`line_info<gpioline_info>` struct to be populated, with
+ the ``offset`` set to indicate the line to watch
+
+Description
+===========
+
+Enable watching a line for changes to its request state and configuration
+information. Changes to line info include a line being requested, released
+or reconfigured.
+
+.. note::
+ Watching line info is not generally required, and would typically only be
+ used by a system monitoring component.
+
+ The line info does NOT include the line value.
+
+ The line must be requested using gpio-get-linehandle-ioctl.rst or
+ gpio-get-lineevent-ioctl.rst to access its value, and the line event can
+ monitor a line for events using gpio-lineevent-data-read.rst.
+
+By default all lines are unwatched when the GPIO chip is opened.
+
+Multiple lines may be watched simultaneously by adding a watch for each.
+
+Once a watch is set, any changes to line info will generate events which can be
+read from the ``chip_fd`` as described in
+gpio-lineinfo-changed-read.rst.
+
+Adding a watch to a line that is already watched is an error (**EBUSY**).
+
+Watches are specific to the ``chip_fd`` and are independent of watches
+on the same GPIO chip opened with a separate call to `open()`.
+
+First added in 5.7.
+
+Return Value
+============
+
+On success 0 and ``info`` is populated with the current line info.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-handle-get-line-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-handle-get-line-values-ioctl.rst
new file mode 100644
index 000000000000..25263b8f0588
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-handle-get-line-values-ioctl.rst
@@ -0,0 +1,56 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIOHANDLE_GET_LINE_VALUES_IOCTL:
+
+********************************
+GPIOHANDLE_GET_LINE_VALUES_IOCTL
+********************************
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-line-get-values-ioctl.rst.
+
+Name
+====
+
+GPIOHANDLE_GET_LINE_VALUES_IOCTL - Get the values of all requested lines.
+
+Synopsis
+========
+
+.. c:macro:: GPIOHANDLE_GET_LINE_VALUES_IOCTL
+
+``int ioctl(int handle_fd, GPIOHANDLE_GET_LINE_VALUES_IOCTL, struct gpiohandle_data *values)``
+
+Arguments
+=========
+
+``handle_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst.
+
+``values``
+ The :c:type:`line_values<gpiohandle_data>` to be populated.
+
+Description
+===========
+
+Get the values of all requested lines.
+
+The values of both input and output lines may be read.
+
+For output lines, the value returned is driver and configuration dependent and
+may be either the output buffer (the last requested value set) or the input
+buffer (the actual level of the line), and depending on the hardware and
+configuration these may differ.
+
+This ioctl can also be used to read the line value for line events,
+substituting the ``event_fd`` for the ``handle_fd``. As there is only
+one line requested in that case, only the one value is returned in ``values``.
+
+Return Value
+============
+
+On success 0 and ``values`` populated with the values read.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-handle-set-config-ioctl.rst b/Documentation/userspace-api/gpio/gpio-handle-set-config-ioctl.rst
new file mode 100644
index 000000000000..d002a84681ac
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-handle-set-config-ioctl.rst
@@ -0,0 +1,63 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIOHANDLE_SET_CONFIG_IOCTL:
+
+***************************
+GPIOHANDLE_SET_CONFIG_IOCTL
+***************************
+
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-line-set-config-ioctl.rst.
+
+Name
+====
+
+GPIOHANDLE_SET_CONFIG_IOCTL - Update the configuration of previously requested lines.
+
+Synopsis
+========
+
+.. c:macro:: GPIOHANDLE_SET_CONFIG_IOCTL
+
+``int ioctl(int handle_fd, GPIOHANDLE_SET_CONFIG_IOCTL, struct gpiohandle_config *config)``
+
+Arguments
+=========
+
+``handle_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst.
+
+``config``
+ The new :c:type:`configuration<gpiohandle_config>` to apply to the
+ requested lines.
+
+Description
+===========
+
+Update the configuration of previously requested lines, without releasing the
+line or introducing potential glitches.
+
+The configuration applies to all requested lines.
+
+The same :ref:`gpio-get-linehandle-config-rules` and
+:ref:`gpio-get-linehandle-config-support` that apply when requesting the
+lines also apply when updating the line configuration.
+
+The motivating use case for this command is changing direction of
+bi-directional lines between input and output, but it may be used more
+generally to move lines seamlessly from one configuration state to another.
+
+To only change the value of output lines, use
+gpio-handle-set-line-values-ioctl.rst.
+
+First added in 5.5.
+
+Return Value
+============
+
+On success 0.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-handle-set-line-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-handle-set-line-values-ioctl.rst
new file mode 100644
index 000000000000..0aa05e623a6c
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-handle-set-line-values-ioctl.rst
@@ -0,0 +1,48 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_HANDLE_SET_LINE_VALUES_IOCTL:
+
+*********************************
+GPIO_HANDLE_SET_LINE_VALUES_IOCTL
+*********************************
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-line-set-values-ioctl.rst.
+
+Name
+====
+
+GPIO_HANDLE_SET_LINE_VALUES_IOCTL - Set the values of all requested output lines.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_HANDLE_SET_LINE_VALUES_IOCTL
+
+``int ioctl(int handle_fd, GPIO_HANDLE_SET_LINE_VALUES_IOCTL, struct gpiohandle_data *values)``
+
+Arguments
+=========
+
+``handle_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpiohandle_request>` by gpio-get-linehandle-ioctl.rst.
+
+``values``
+ The :c:type:`line_values<gpiohandle_data>` to set.
+
+Description
+===========
+
+Set the values of all requested output lines.
+
+Only the values of output lines may be set.
+Attempting to set the value of input lines is an error (**EPERM**).
+
+Return Value
+============
+
+On success 0.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-lineevent-data-read.rst b/Documentation/userspace-api/gpio/gpio-lineevent-data-read.rst
new file mode 100644
index 000000000000..68b8d4f9f604
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-lineevent-data-read.rst
@@ -0,0 +1,84 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_LINEEVENT_DATA_READ:
+
+************************
+GPIO_LINEEVENT_DATA_READ
+************************
+
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-line-event-read.rst.
+
+Name
+====
+
+GPIO_LINEEVENT_DATA_READ - Read edge detection events from a line event.
+
+Synopsis
+========
+
+``int read(int event_fd, void *buf, size_t count)``
+
+Arguments
+=========
+
+``event_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpioevent_request>` by gpio-get-lineevent-ioctl.rst.
+
+``buf``
+ The buffer to contain the :c:type:`events<gpioevent_data>`.
+
+``count``
+ The number of bytes available in ``buf``, which must be at
+ least the size of a :c:type:`gpioevent_data`.
+
+Description
+===========
+
+Read edge detection events for a line from a line event.
+
+Edge detection must be enabled for the input line using either
+``GPIOEVENT_REQUEST_RISING_EDGE`` or ``GPIOEVENT_REQUEST_FALLING_EDGE``, or
+both. Edge events are then generated whenever edge interrupts are detected on
+the input line.
+
+The kernel captures and timestamps edge events as close as possible to their
+occurrence and stores them in a buffer from where they can be read by
+userspace at its convenience using `read()`.
+
+The source of the clock for :c:type:`event.timestamp<gpioevent_data>` is
+``CLOCK_MONOTONIC``, except for kernels earlier than Linux 5.7 when it was
+``CLOCK_REALTIME``. There is no indication in the :c:type:`gpioevent_data`
+as to which clock source is used, it must be determined from either the kernel
+version or sanity checks on the timestamp itself.
+
+Events read from the buffer are always in the same order that they were
+detected by the kernel.
+
+The size of the kernel event buffer is fixed at 16 events.
+
+The buffer may overflow if bursts of events occur quicker than they are read
+by userspace. If an overflow occurs then the most recent event is discarded.
+Overflow cannot be detected from userspace.
+
+To minimize the number of calls required to copy events from the kernel to
+userspace, `read()` supports copying multiple events. The number of events
+copied is the lower of the number available in the kernel buffer and the
+number that will fit in the userspace buffer (``buf``).
+
+The `read()` will block if no event is available and the ``event_fd`` has not
+been set **O_NONBLOCK**.
+
+The presence of an event can be tested for by checking that the ``event_fd`` is
+readable using `poll()` or an equivalent.
+
+Return Value
+============
+
+On success the number of bytes read, which will be a multiple of the size of
+a :c:type:`gpio_lineevent_data` event.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-lineinfo-changed-read.rst b/Documentation/userspace-api/gpio/gpio-lineinfo-changed-read.rst
new file mode 100644
index 000000000000..c4f5e1787a9d
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-lineinfo-changed-read.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_LINEINFO_CHANGED_READ:
+
+**************************
+GPIO_LINEINFO_CHANGED_READ
+**************************
+
+.. warning::
+ This ioctl is part of chardev_v1.rst and is obsoleted by
+ gpio-v2-lineinfo-changed-read.rst.
+
+Name
+====
+
+GPIO_LINEINFO_CHANGED_READ - Read line info change events for watched lines
+from the chip.
+
+Synopsis
+========
+
+``int read(int chip_fd, void *buf, size_t count)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``buf``
+ The buffer to contain the :c:type:`events<gpioline_info_changed>`.
+
+``count``
+ The number of bytes available in ``buf``, which must be at least the size
+ of a :c:type:`gpioline_info_changed` event.
+
+Description
+===========
+
+Read line info change events for watched lines from the chip.
+
+.. note::
+ Monitoring line info changes is not generally required, and would typically
+ only be performed by a system monitoring component.
+
+ These events relate to changes in a line's request state or configuration,
+ not its value. Use gpio-lineevent-data-read.rst to receive events when a
+ line changes value.
+
+A line must be watched using gpio-get-lineinfo-watch-ioctl.rst to generate
+info changed events. Subsequently, a request, release, or reconfiguration
+of the line will generate an info changed event.
+
+The kernel timestamps events when they occur and stores them in a buffer
+from where they can be read by userspace at its convenience using `read()`.
+
+The size of the kernel event buffer is fixed at 32 events per ``chip_fd``.
+
+The buffer may overflow if bursts of events occur quicker than they are read
+by userspace. If an overflow occurs then the most recent event is discarded.
+Overflow cannot be detected from userspace.
+
+Events read from the buffer are always in the same order that they were
+detected by the kernel, including when multiple lines are being monitored by
+the one ``chip_fd``.
+
+To minimize the number of calls required to copy events from the kernel to
+userspace, `read()` supports copying multiple events. The number of events
+copied is the lower of the number available in the kernel buffer and the
+number that will fit in the userspace buffer (``buf``).
+
+A `read()` will block if no event is available and the ``chip_fd`` has not
+been set **O_NONBLOCK**.
+
+The presence of an event can be tested for by checking that the ``chip_fd`` is
+readable using `poll()` or an equivalent.
+
+First added in 5.7.
+
+Return Value
+============
+
+On success the number of bytes read, which will be a multiple of the size of
+a :c:type:`gpioline_info_changed` event.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-get-line-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-get-line-ioctl.rst
new file mode 100644
index 000000000000..56b975801b6a
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-get-line-ioctl.rst
@@ -0,0 +1,152 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_GET_LINE_IOCTL:
+
+**********************
+GPIO_V2_GET_LINE_IOCTL
+**********************
+
+Name
+====
+
+GPIO_V2_GET_LINE_IOCTL - Request a line or lines from the kernel.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_V2_GET_LINE_IOCTL
+
+``int ioctl(int chip_fd, GPIO_V2_GET_LINE_IOCTL, struct gpio_v2_line_request *request)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``request``
+ The :c:type:`line_request<gpio_v2_line_request>` specifying the lines
+ to request and their configuration.
+
+Description
+===========
+
+On success, the requesting process is granted exclusive access to the line
+value, write access to the line configuration, and may receive events when
+edges are detected on the line, all of which are described in more detail in
+:ref:`gpio-v2-line-request`.
+
+A number of lines may be requested in the one line request, and request
+operations are performed on the requested lines by the kernel as atomically
+as possible. e.g. gpio-v2-line-get-values-ioctl.rst will read all the
+requested lines at once.
+
+The state of a line, including the value of output lines, is guaranteed to
+remain as requested until the returned file descriptor is closed. Once the
+file descriptor is closed, the state of the line becomes uncontrolled from
+the userspace perspective, and may revert to its default state.
+
+Requesting a line already in use is an error (**EBUSY**).
+
+Closing the ``chip_fd`` has no effect on existing line requests.
+
+.. _gpio-v2-get-line-config-rules:
+
+Configuration Rules
+-------------------
+
+For any given requested line, the following configuration rules apply:
+
+The direction flags, ``GPIO_V2_LINE_FLAG_INPUT`` and
+``GPIO_V2_LINE_FLAG_OUTPUT``, cannot be combined. If neither are set then
+the only other flag that may be set is ``GPIO_V2_LINE_FLAG_ACTIVE_LOW``
+and the line is requested "as-is" to allow reading of the line value
+without altering the electrical configuration.
+
+The drive flags, ``GPIO_V2_LINE_FLAG_OPEN_xxx``, require the
+``GPIO_V2_LINE_FLAG_OUTPUT`` to be set.
+Only one drive flag may be set.
+If none are set then the line is assumed push-pull.
+
+Only one bias flag, ``GPIO_V2_LINE_FLAG_BIAS_xxx``, may be set, and it
+requires a direction flag to also be set.
+If no bias flags are set then the bias configuration is not changed.
+
+The edge flags, ``GPIO_V2_LINE_FLAG_EDGE_xxx``, require
+``GPIO_V2_LINE_FLAG_INPUT`` to be set and may be combined to detect both rising
+and falling edges. Requesting edge detection from a line that does not support
+it is an error (**ENXIO**).
+
+Only one event clock flag, ``GPIO_V2_LINE_FLAG_EVENT_CLOCK_xxx``, may be set.
+If none are set then the event clock defaults to ``CLOCK_MONOTONIC``.
+The ``GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE`` flag requires supporting hardware
+and a kernel with ``CONFIG_HTE`` set. Requesting HTE from a device that
+doesn't support it is an error (**EOPNOTSUP**).
+
+The :c:type:`debounce_period_us<gpio_v2_line_attribute>` attribute may only
+be applied to lines with ``GPIO_V2_LINE_FLAG_INPUT`` set. When set, debounce
+applies to both the values returned by gpio-v2-line-get-values-ioctl.rst and
+the edges returned by gpio-v2-line-event-read.rst. If not
+supported directly by hardware, debouncing is emulated in software by the
+kernel. Requesting debounce on a line that supports neither debounce in
+hardware nor interrupts, as required for software emulation, is an error
+(**ENXIO**).
+
+Requesting an invalid configuration is an error (**EINVAL**).
+
+.. _gpio-v2-get-line-config-support:
+
+Configuration Support
+---------------------
+
+Where the requested configuration is not directly supported by the underlying
+hardware and driver, the kernel applies one of these approaches:
+
+ - reject the request
+ - emulate the feature in software
+ - treat the feature as best effort
+
+The approach applied depends on whether the feature can reasonably be emulated
+in software, and the impact on the hardware and userspace if the feature is not
+supported.
+The approach applied for each feature is as follows:
+
+============== ===========
+Feature Approach
+============== ===========
+Bias best effort
+Debounce emulate
+Direction reject
+Drive emulate
+Edge Detection reject
+============== ===========
+
+Bias is treated as best effort to allow userspace to apply the same
+configuration for platforms that support internal bias as those that require
+external bias.
+Worst case the line floats rather than being biased as expected.
+
+Debounce is emulated by applying a filter to hardware interrupts on the line.
+An edge event is generated after an edge is detected and the line remains
+stable for the debounce period.
+The event timestamp corresponds to the end of the debounce period.
+
+Drive is emulated by switching the line to an input when the line should not
+be actively driven.
+
+Edge detection requires interrupt support, and is rejected if that is not
+supported. Emulation by polling can still be performed from userspace.
+
+In all cases, the configuration reported by gpio-v2-get-lineinfo-ioctl.rst
+is the requested configuration, not the resulting hardware configuration.
+Userspace cannot determine if a feature is supported in hardware, is
+emulated, or is best effort.
+
+Return Value
+============
+
+On success 0 and the :c:type:`request.fd<gpio_v2_line_request>` contains the
+file descriptor for the request.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-ioctl.rst
new file mode 100644
index 000000000000..bc4d8df887d4
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-ioctl.rst
@@ -0,0 +1,50 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_GET_LINEINFO_IOCTL:
+
+**************************
+GPIO_V2_GET_LINEINFO_IOCTL
+**************************
+
+Name
+====
+
+GPIO_V2_GET_LINEINFO_IOCTL - Get the publicly available information for a line.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_V2_GET_LINEINFO_IOCTL
+
+``int ioctl(int chip_fd, GPIO_V2_GET_LINEINFO_IOCTL, struct gpio_v2_line_info *info)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``info``
+ The :c:type:`line_info<gpio_v2_line_info>` to be populated, with the
+ ``offset`` field set to indicate the line to be collected.
+
+Description
+===========
+
+Get the publicly available information for a line.
+
+This information is available independent of whether the line is in use.
+
+.. note::
+ The line info does not include the line value.
+
+ The line must be requested using gpio-v2-get-line-ioctl.rst to access its
+ value.
+
+Return Value
+============
+
+On success 0 and ``info`` is populated with the chip info.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-watch-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-watch-ioctl.rst
new file mode 100644
index 000000000000..938ff85a9322
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-get-lineinfo-watch-ioctl.rst
@@ -0,0 +1,67 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_GET_LINEINFO_WATCH_IOCTL:
+
+********************************
+GPIO_V2_GET_LINEINFO_WATCH_IOCTL
+********************************
+
+Name
+====
+
+GPIO_V2_GET_LINEINFO_WATCH_IOCTL - Enable watching a line for changes to its
+request state and configuration information.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_V2_GET_LINEINFO_WATCH_IOCTL
+
+``int ioctl(int chip_fd, GPIO_V2_GET_LINEINFO_WATCH_IOCTL, struct gpio_v2_line_info *info)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``info``
+ The :c:type:`line_info<gpio_v2_line_info>` struct to be populated, with
+ the ``offset`` set to indicate the line to watch
+
+Description
+===========
+
+Enable watching a line for changes to its request state and configuration
+information. Changes to line info include a line being requested, released
+or reconfigured.
+
+.. note::
+ Watching line info is not generally required, and would typically only be
+ used by a system monitoring component.
+
+ The line info does NOT include the line value.
+ The line must be requested using gpio-v2-get-line-ioctl.rst to access
+ its value, and the line request can monitor a line for events using
+ gpio-v2-line-event-read.rst.
+
+By default all lines are unwatched when the GPIO chip is opened.
+
+Multiple lines may be watched simultaneously by adding a watch for each.
+
+Once a watch is set, any changes to line info will generate events which can be
+read from the ``chip_fd`` as described in
+gpio-v2-lineinfo-changed-read.rst.
+
+Adding a watch to a line that is already watched is an error (**EBUSY**).
+
+Watches are specific to the ``chip_fd`` and are independent of watches
+on the same GPIO chip opened with a separate call to `open()`.
+
+Return Value
+============
+
+On success 0 and ``info`` is populated with the current line info.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-event-read.rst b/Documentation/userspace-api/gpio/gpio-v2-line-event-read.rst
new file mode 100644
index 000000000000..6513c23fb7ca
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-line-event-read.rst
@@ -0,0 +1,83 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_LINE_EVENT_READ:
+
+***********************
+GPIO_V2_LINE_EVENT_READ
+***********************
+
+Name
+====
+
+GPIO_V2_LINE_EVENT_READ - Read edge detection events for lines from a request.
+
+Synopsis
+========
+
+``int read(int req_fd, void *buf, size_t count)``
+
+Arguments
+=========
+
+``req_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst.
+
+``buf``
+ The buffer to contain the :c:type:`events<gpio_v2_line_event>`.
+
+``count``
+ The number of bytes available in ``buf``, which must be at
+ least the size of a :c:type:`gpio_v2_line_event`.
+
+Description
+===========
+
+Read edge detection events for lines from a request.
+
+Edge detection must be enabled for the input line using either
+``GPIO_V2_LINE_FLAG_EDGE_RISING`` or ``GPIO_V2_LINE_FLAG_EDGE_FALLING``, or
+both. Edge events are then generated whenever edge interrupts are detected on
+the input line.
+
+The kernel captures and timestamps edge events as close as possible to their
+occurrence and stores them in a buffer from where they can be read by
+userspace at its convenience using `read()`.
+
+Events read from the buffer are always in the same order that they were
+detected by the kernel, including when multiple lines are being monitored by
+the one request.
+
+The size of the kernel event buffer is fixed at the time of line request
+creation, and can be influenced by the
+:c:type:`request.event_buffer_size<gpio_v2_line_request>`.
+The default size is 16 times the number of lines requested.
+
+The buffer may overflow if bursts of events occur quicker than they are read
+by userspace. If an overflow occurs then the oldest buffered event is
+discarded. Overflow can be detected from userspace by monitoring the event
+sequence numbers.
+
+To minimize the number of calls required to copy events from the kernel to
+userspace, `read()` supports copying multiple events. The number of events
+copied is the lower of the number available in the kernel buffer and the
+number that will fit in the userspace buffer (``buf``).
+
+Changing the edge detection flags using gpio-v2-line-set-config-ioctl.rst
+does not remove or modify the events already contained in the kernel event
+buffer.
+
+The `read()` will block if no event is available and the ``req_fd`` has not
+been set **O_NONBLOCK**.
+
+The presence of an event can be tested for by checking that the ``req_fd`` is
+readable using `poll()` or an equivalent.
+
+Return Value
+============
+
+On success the number of bytes read, which will be a multiple of the size of a
+:c:type:`gpio_v2_line_event` event.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-get-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-line-get-values-ioctl.rst
new file mode 100644
index 000000000000..e4e74a1926d8
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-line-get-values-ioctl.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_LINE_GET_VALUES_IOCTL:
+
+*****************************
+GPIO_V2_LINE_GET_VALUES_IOCTL
+*****************************
+
+Name
+====
+
+GPIO_V2_LINE_GET_VALUES_IOCTL - Get the values of requested lines.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_V2_LINE_GET_VALUES_IOCTL
+
+``int ioctl(int req_fd, GPIO_V2_LINE_GET_VALUES_IOCTL, struct gpio_v2_line_values *values)``
+
+Arguments
+=========
+
+``req_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst.
+
+``values``
+ The :c:type:`line_values<gpio_v2_line_values>` to get with the ``mask`` set
+ to indicate the subset of requested lines to get.
+
+Description
+===========
+
+Get the values of requested lines.
+
+The values of both input and output lines may be read.
+
+For output lines, the value returned is driver and configuration dependent and
+may be either the output buffer (the last requested value set) or the input
+buffer (the actual level of the line), and depending on the hardware and
+configuration these may differ.
+
+Return Value
+============
+
+On success 0 and the corresponding :c:type:`values.bits<gpio_v2_line_values>`
+contain the value read.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-set-config-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-line-set-config-ioctl.rst
new file mode 100644
index 000000000000..9b942a8a53ca
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-line-set-config-ioctl.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_LINE_SET_CONFIG_IOCTL:
+
+*****************************
+GPIO_V2_LINE_SET_CONFIG_IOCTL
+*****************************
+
+Name
+====
+
+GPIO_V2_LINE_SET_CONFIG_IOCTL - Update the configuration of previously requested lines.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_V2_LINE_SET_CONFIG_IOCTL
+
+``int ioctl(int req_fd, GPIO_V2_LINE_SET_CONFIG_IOCTL, struct gpio_v2_line_config *config)``
+
+Arguments
+=========
+
+``req_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst.
+
+``config``
+ The new :c:type:`configuration<gpio_v2_line_config>` to apply to the
+ requested lines.
+
+Description
+===========
+
+Update the configuration of previously requested lines, without releasing the
+line or introducing potential glitches.
+
+The new configuration must specify the configuration of all requested lines.
+
+The same :ref:`gpio-v2-get-line-config-rules` and
+:ref:`gpio-v2-get-line-config-support` that apply when requesting the lines
+also apply when updating the line configuration.
+
+The motivating use case for this command is changing direction of
+bi-directional lines between input and output, but it may also be used to
+dynamically control edge detection, or more generally move lines seamlessly
+from one configuration state to another.
+
+To only change the value of output lines, use
+gpio-v2-line-set-values-ioctl.rst.
+
+Return Value
+============
+
+On success 0.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-line-set-values-ioctl.rst b/Documentation/userspace-api/gpio/gpio-v2-line-set-values-ioctl.rst
new file mode 100644
index 000000000000..6d2d1886950b
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-line-set-values-ioctl.rst
@@ -0,0 +1,47 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_LINE_SET_VALUES_IOCTL:
+
+*****************************
+GPIO_V2_LINE_SET_VALUES_IOCTL
+*****************************
+
+Name
+====
+
+GPIO_V2_LINE_SET_VALUES_IOCTL - Set the values of requested output lines.
+
+Synopsis
+========
+
+.. c:macro:: GPIO_V2_LINE_SET_VALUES_IOCTL
+
+``int ioctl(int req_fd, GPIO_V2_LINE_SET_VALUES_IOCTL, struct gpio_v2_line_values *values)``
+
+Arguments
+=========
+
+``req_fd``
+ The file descriptor of the GPIO character device, as returned in the
+ :c:type:`request.fd<gpio_v2_line_request>` by gpio-v2-get-line-ioctl.rst.
+
+``values``
+ The :c:type:`line_values<gpio_v2_line_values>` to set with the ``mask`` set
+ to indicate the subset of requested lines to set and ``bits`` set to
+ indicate the new value.
+
+Description
+===========
+
+Set the values of requested output lines.
+
+Only the values of output lines may be set.
+Attempting to set the value of an input line is an error (**EPERM**).
+
+Return Value
+============
+
+On success 0.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/gpio-v2-lineinfo-changed-read.rst b/Documentation/userspace-api/gpio/gpio-v2-lineinfo-changed-read.rst
new file mode 100644
index 000000000000..24ad325e7253
--- /dev/null
+++ b/Documentation/userspace-api/gpio/gpio-v2-lineinfo-changed-read.rst
@@ -0,0 +1,81 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _GPIO_V2_LINEINFO_CHANGED_READ:
+
+*****************************
+GPIO_V2_LINEINFO_CHANGED_READ
+*****************************
+
+Name
+====
+
+GPIO_V2_LINEINFO_CHANGED_READ - Read line info changed events for watched
+lines from the chip.
+
+Synopsis
+========
+
+``int read(int chip_fd, void *buf, size_t count)``
+
+Arguments
+=========
+
+``chip_fd``
+ The file descriptor of the GPIO character device returned by `open()`.
+
+``buf``
+ The buffer to contain the :c:type:`events<gpio_v2_line_info_changed>`.
+
+``count``
+ The number of bytes available in ``buf``, which must be at least the size
+ of a :c:type:`gpio_v2_line_info_changed` event.
+
+Description
+===========
+
+Read line info changed events for watched lines from the chip.
+
+.. note::
+ Monitoring line info changes is not generally required, and would typically
+ only be performed by a system monitoring component.
+
+ These events relate to changes in a line's request state or configuration,
+ not its value. Use gpio-v2-line-event-read.rst to receive events when a
+ line changes value.
+
+A line must be watched using gpio-v2-get-lineinfo-watch-ioctl.rst to generate
+info changed events. Subsequently, a request, release, or reconfiguration
+of the line will generate an info changed event.
+
+The kernel timestamps events when they occur and stores them in a buffer
+from where they can be read by userspace at its convenience using `read()`.
+
+The size of the kernel event buffer is fixed at 32 events per ``chip_fd``.
+
+The buffer may overflow if bursts of events occur quicker than they are read
+by userspace. If an overflow occurs then the most recent event is discarded.
+Overflow cannot be detected from userspace.
+
+Events read from the buffer are always in the same order that they were
+detected by the kernel, including when multiple lines are being monitored by
+the one ``chip_fd``.
+
+To minimize the number of calls required to copy events from the kernel to
+userspace, `read()` supports copying multiple events. The number of events
+copied is the lower of the number available in the kernel buffer and the
+number that will fit in the userspace buffer (``buf``).
+
+A `read()` will block if no event is available and the ``chip_fd`` has not
+been set **O_NONBLOCK**.
+
+The presence of an event can be tested for by checking that the ``chip_fd`` is
+readable using `poll()` or an equivalent.
+
+Return Value
+============
+
+On success the number of bytes read, which will be a multiple of the size
+of a :c:type:`gpio_v2_line_info_changed` event.
+
+On error -1 and the ``errno`` variable is set appropriately.
+Common error codes are described in error-codes.rst.
diff --git a/Documentation/userspace-api/gpio/index.rst b/Documentation/userspace-api/gpio/index.rst
new file mode 100644
index 000000000000..f258de4ef370
--- /dev/null
+++ b/Documentation/userspace-api/gpio/index.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+GPIO
+====
+
+.. toctree::
+ :maxdepth: 1
+
+ Character Device Userspace API <chardev>
+ Obsolete Userspace APIs <obsolete>
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/userspace-api/gpio/obsolete.rst b/Documentation/userspace-api/gpio/obsolete.rst
new file mode 100644
index 000000000000..c42538b44ec8
--- /dev/null
+++ b/Documentation/userspace-api/gpio/obsolete.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Obsolete GPIO Userspace APIs
+============================
+
+.. toctree::
+ :maxdepth: 1
+
+ Character Device Userspace API (v1) <chardev_v1>
+ Sysfs Interface <sysfs>
diff --git a/Documentation/userspace-api/gpio/sysfs.rst b/Documentation/userspace-api/gpio/sysfs.rst
new file mode 100644
index 000000000000..116921048b18
--- /dev/null
+++ b/Documentation/userspace-api/gpio/sysfs.rst
@@ -0,0 +1,170 @@
+GPIO Sysfs Interface for Userspace
+==================================
+
+.. warning::
+ This API is obsoleted by the chardev.rst and the ABI documentation has
+ been moved to Documentation/ABI/obsolete/sysfs-gpio.
+
+ New developments should use the chardev.rst, and existing developments are
+ encouraged to migrate as soon as possible, as this API will be removed
+ in the future.
+
+ This interface will continue to be maintained for the migration period,
+ but new features will only be added to the new API.
+
+The obsolete sysfs ABI
+----------------------
+Platforms which use the "gpiolib" implementors framework may choose to
+configure a sysfs user interface to GPIOs. This is different from the
+debugfs interface, since it provides control over GPIO direction and
+value instead of just showing a gpio state summary. Plus, it could be
+present on production systems without debugging support.
+
+Given appropriate hardware documentation for the system, userspace could
+know for example that GPIO #23 controls the write protect line used to
+protect boot loader segments in flash memory. System upgrade procedures
+may need to temporarily remove that protection, first importing a GPIO,
+then changing its output state, then updating the code before re-enabling
+the write protection. In normal use, GPIO #23 would never be touched,
+and the kernel would have no need to know about it.
+
+Again depending on appropriate hardware documentation, on some systems
+userspace GPIO can be used to determine system configuration data that
+standard kernels won't know about. And for some tasks, simple userspace
+GPIO drivers could be all that the system really needs.
+
+.. note::
+ Do NOT abuse sysfs to control hardware that has proper kernel drivers.
+ Please read Documentation/driver-api/gpio/drivers-on-gpio.rst
+ to avoid reinventing kernel wheels in userspace.
+
+ I MEAN IT. REALLY.
+
+Paths in Sysfs
+--------------
+There are three kinds of entries in /sys/class/gpio:
+
+ - Control interfaces used to get userspace control over GPIOs;
+
+ - GPIOs themselves; and
+
+ - GPIO controllers ("gpio_chip" instances).
+
+That's in addition to standard files including the "device" symlink.
+
+The control interfaces are write-only:
+
+ /sys/class/gpio/
+
+ "export" ...
+ Userspace may ask the kernel to export control of
+ a GPIO to userspace by writing its number to this file.
+
+ Example: "echo 19 > export" will create a "gpio19" node
+ for GPIO #19, if that's not requested by kernel code.
+
+ "unexport" ...
+ Reverses the effect of exporting to userspace.
+
+ Example: "echo 19 > unexport" will remove a "gpio19"
+ node exported using the "export" file.
+
+GPIO signals have paths like /sys/class/gpio/gpio42/ (for GPIO #42)
+and have the following read/write attributes:
+
+ /sys/class/gpio/gpioN/
+
+ "direction" ...
+ reads as either "in" or "out". This value may
+ normally be written. Writing as "out" defaults to
+ initializing the value as low. To ensure glitch free
+ operation, values "low" and "high" may be written to
+ configure the GPIO as an output with that initial value.
+
+ Note that this attribute *will not exist* if the kernel
+ doesn't support changing the direction of a GPIO, or
+ it was exported by kernel code that didn't explicitly
+ allow userspace to reconfigure this GPIO's direction.
+
+ "value" ...
+ reads as either 0 (inactive) or 1 (active). If the GPIO
+ is configured as an output, this value may be written;
+ any nonzero value is treated as active.
+
+ If the pin can be configured as interrupt-generating interrupt
+ and if it has been configured to generate interrupts (see the
+ description of "edge"), you can poll(2) on that file and
+ poll(2) will return whenever the interrupt was triggered. If
+ you use poll(2), set the events POLLPRI and POLLERR. If you
+ use select(2), set the file descriptor in exceptfds. After
+ poll(2) returns, either lseek(2) to the beginning of the sysfs
+ file and read the new value or close the file and re-open it
+ to read the value.
+
+ "edge" ...
+ reads as either "none", "rising", "falling", or
+ "both". Write these strings to select the signal edge(s)
+ that will make poll(2) on the "value" file return.
+
+ This file exists only if the pin can be configured as an
+ interrupt generating input pin.
+
+ "active_low" ...
+ reads as either 0 (false) or 1 (true). Write
+ any nonzero value to invert the value attribute both
+ for reading and writing. Existing and subsequent
+ poll(2) support configuration via the edge attribute
+ for "rising" and "falling" edges will follow this
+ setting.
+
+GPIO controllers have paths like /sys/class/gpio/gpiochip42/ (for the
+controller implementing GPIOs starting at #42) and have the following
+read-only attributes:
+
+ /sys/class/gpio/gpiochipN/
+
+ "base" ...
+ same as N, the first GPIO managed by this chip
+
+ "label" ...
+ provided for diagnostics (not always unique)
+
+ "ngpio" ...
+ how many GPIOs this manages (N to N + ngpio - 1)
+
+Board documentation should in most cases cover what GPIOs are used for
+what purposes. However, those numbers are not always stable; GPIOs on
+a daughtercard might be different depending on the base board being used,
+or other cards in the stack. In such cases, you may need to use the
+gpiochip nodes (possibly in conjunction with schematics) to determine
+the correct GPIO number to use for a given signal.
+
+
+Exporting from Kernel code
+--------------------------
+Kernel code can explicitly manage exports of GPIOs which have already been
+requested using gpio_request()::
+
+ /* export the GPIO to userspace */
+ int gpiod_export(struct gpio_desc *desc, bool direction_may_change);
+
+ /* reverse gpiod_export() */
+ void gpiod_unexport(struct gpio_desc *desc);
+
+ /* create a sysfs link to an exported GPIO node */
+ int gpiod_export_link(struct device *dev, const char *name,
+ struct gpio_desc *desc);
+
+After a kernel driver requests a GPIO, it may only be made available in
+the sysfs interface by gpiod_export(). The driver can control whether the
+signal direction may change. This helps drivers prevent userspace code
+from accidentally clobbering important system state.
+
+This explicit exporting can help with debugging (by making some kinds
+of experiments easier), or can provide an always-there interface that's
+suitable for documenting as part of a board support package.
+
+After the GPIO has been exported, gpiod_export_link() allows creating
+symlinks from elsewhere in sysfs to the GPIO sysfs node. Drivers can
+use this to provide the interface under their own device in sysfs with
+a descriptive name.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 09f61bd2ac2e..afecfe3cc4a8 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -9,31 +9,59 @@ While much of the kernel's user-space API is documented elsewhere
also be found in the kernel tree itself. This manual is intended to be the
place where this information is gathered.
+
+System calls
+============
+
+.. toctree::
+ :maxdepth: 1
+
+ unshare
+ futex2
+ ebpf/index
+ ioctl/index
+
+Security-related interfaces
+===========================
+
.. toctree::
- :caption: Table of contents
- :maxdepth: 2
+ :maxdepth: 1
no_new_privs
seccomp_filter
landlock
- unshare
+ lsm
spec_ctrl
+ tee
+
+Devices and I/O
+===============
+
+.. toctree::
+ :maxdepth: 1
+
accelerators/ocxl
dma-buf-alloc-exchange
- ebpf/index
- ELF
- ioctl/index
+ gpio/index
iommu
iommufd
media/index
+ dcdbas
+ vduse
+ isapnp
+
+Everything else
+===============
+
+.. toctree::
+ :maxdepth: 1
+
+ ELF
netlink/index
sysfs-platform_profile
vduse
futex2
- lsm
- tee
- isapnp
- dcdbas
+ perf_ring_buffer
.. only:: subproject and html
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 457e16f06e04..c472423412bf 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -82,8 +82,9 @@ Code Seq# Include File Comments
0x10 00-0F drivers/char/s390/vmcp.h
0x10 10-1F arch/s390/include/uapi/sclp_ctl.h
0x10 20-2F arch/s390/include/uapi/asm/hypfs.h
-0x12 all linux/fs.h
+0x12 all linux/fs.h BLK* ioctls
linux/blkpg.h
+0x15 all linux/fs.h FS_IOC_* ioctls
0x1b all InfiniBand Subsystem
<http://infiniband.sourceforge.net/>
0x20 all drivers/cdrom/cm206.h
@@ -309,6 +310,7 @@ Code Seq# Include File Comments
0x89 0B-DF linux/sockios.h
0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range
0x89 F0-FF linux/sockios.h SIOCDEVPRIVATE range
+0x8A 00-1F linux/eventpoll.h
0x8B all linux/wireless.h
0x8C 00-3F WiNRADiO driver
<http://www.winradio.com.au/>
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index 2e3822677061..9dd636aaa829 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -19,11 +19,12 @@ unexpected/malicious behaviors in user space applications. Landlock empowers
any process, including unprivileged ones, to securely restrict themselves.
We can quickly make sure that Landlock is enabled in the running system by
-looking for "landlock: Up and running" in kernel logs (as root): ``dmesg | grep
-landlock || journalctl -kg landlock`` . Developers can also easily check for
-Landlock support with a :ref:`related system call <landlock_abi_versions>`. If
-Landlock is not currently supported, we need to :ref:`configure the kernel
-appropriately <kernel_support>`.
+looking for "landlock: Up and running" in kernel logs (as root):
+``dmesg | grep landlock || journalctl -kb -g landlock`` .
+Developers can also easily check for Landlock support with a
+:ref:`related system call <landlock_abi_versions>`.
+If Landlock is not currently supported, we need to
+:ref:`configure the kernel appropriately <kernel_support>`.
Landlock rules
==============
@@ -499,6 +500,9 @@ access rights.
Kernel support
==============
+Build time configuration
+------------------------
+
Landlock was first introduced in Linux 5.13 but it must be configured at build
time with ``CONFIG_SECURITY_LANDLOCK=y``. Landlock must also be enabled at boot
time as the other security modules. The list of security modules enabled by
@@ -507,11 +511,52 @@ contains ``CONFIG_LSM=landlock,[...]`` with ``[...]`` as the list of other
potentially useful security modules for the running system (see the
``CONFIG_LSM`` help).
+Boot time configuration
+-----------------------
+
If the running kernel does not have ``landlock`` in ``CONFIG_LSM``, then we can
-still enable it by adding ``lsm=landlock,[...]`` to
-Documentation/admin-guide/kernel-parameters.rst thanks to the bootloader
+enable Landlock by adding ``lsm=landlock,[...]`` to
+Documentation/admin-guide/kernel-parameters.rst in the boot loader
configuration.
+For example, if the current built-in configuration is:
+
+.. code-block:: console
+
+ $ zgrep -h "^CONFIG_LSM=" "/boot/config-$(uname -r)" /proc/config.gz 2>/dev/null
+ CONFIG_LSM="lockdown,yama,integrity,apparmor"
+
+...and if the cmdline doesn't contain ``landlock`` either:
+
+.. code-block:: console
+
+ $ sed -n 's/.*\(\<lsm=\S\+\).*/\1/p' /proc/cmdline
+ lsm=lockdown,yama,integrity,apparmor
+
+...we should configure the boot loader to set a cmdline extending the ``lsm``
+list with the ``landlock,`` prefix::
+
+ lsm=landlock,lockdown,yama,integrity,apparmor
+
+After a reboot, we can check that Landlock is up and running by looking at
+kernel logs:
+
+.. code-block:: console
+
+ # dmesg | grep landlock || journalctl -kb -g landlock
+ [ 0.000000] Command line: [...] lsm=landlock,lockdown,yama,integrity,apparmor
+ [ 0.000000] Kernel command line: [...] lsm=landlock,lockdown,yama,integrity,apparmor
+ [ 0.000000] LSM: initializing lsm=lockdown,capability,landlock,yama,integrity,apparmor
+ [ 0.000000] landlock: Up and running.
+
+The kernel may be configured at build time to always load the ``lockdown`` and
+``capability`` LSMs. In that case, these LSMs will appear at the beginning of
+the ``LSM: initializing`` log line as well, even if they are not configured in
+the boot loader.
+
+Network support
+---------------
+
To be able to explicitly allow TCP operations (e.g., adding a network rule with
``LANDLOCK_ACCESS_NET_BIND_TCP``), the kernel must support TCP
(``CONFIG_INET=y``). Otherwise, sys_landlock_add_rule() returns an
diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst
index 1e14f5f22b8e..1990eea772d0 100644
--- a/Documentation/userspace-api/netlink/netlink-raw.rst
+++ b/Documentation/userspace-api/netlink/netlink-raw.rst
@@ -150,3 +150,45 @@ attributes from an ``attribute-set``. For example the following
Note that a selector attribute must appear in a netlink message before any
sub-message attributes that depend on it.
+
+If an attribute such as ``kind`` is defined at more than one nest level, then a
+sub-message selector will be resolved using the value 'closest' to the selector.
+For example, if the same attribute name is defined in a nested ``attribute-set``
+alongside a sub-message selector and also in a top level ``attribute-set``, then
+the selector will be resolved using the value 'closest' to the selector. If the
+value is not present in the message at the same level as defined in the spec
+then this is an error.
+
+Nested struct definitions
+-------------------------
+
+Many raw netlink families such as :doc:`tc<../../networking/netlink_spec/tc>`
+make use of nested struct definitions. The ``netlink-raw`` schema makes it
+possible to embed a struct within a struct definition using the ``struct``
+property. For example, the following struct definition embeds the
+``tc-ratespec`` struct definition for both the ``rate`` and the ``peakrate``
+members of ``struct tc-tbf-qopt``.
+
+.. code-block:: yaml
+
+ -
+ name: tc-tbf-qopt
+ type: struct
+ members:
+ -
+ name: rate
+ type: binary
+ struct: tc-ratespec
+ -
+ name: peakrate
+ type: binary
+ struct: tc-ratespec
+ -
+ name: limit
+ type: u32
+ -
+ name: buffer
+ type: u32
+ -
+ name: mtu
+ type: u32
diff --git a/Documentation/userspace-api/perf_ring_buffer.rst b/Documentation/userspace-api/perf_ring_buffer.rst
new file mode 100644
index 000000000000..bde9d8cbc106
--- /dev/null
+++ b/Documentation/userspace-api/perf_ring_buffer.rst
@@ -0,0 +1,830 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Perf ring buffer
+================
+
+.. CONTENTS
+
+ 1. Introduction
+
+ 2. Ring buffer implementation
+ 2.1 Basic algorithm
+ 2.2 Ring buffer for different tracing modes
+ 2.2.1 Default mode
+ 2.2.2 Per-thread mode
+ 2.2.3 Per-CPU mode
+ 2.2.4 System wide mode
+ 2.3 Accessing buffer
+ 2.3.1 Producer-consumer model
+ 2.3.2 Properties of the ring buffers
+ 2.3.3 Writing samples into buffer
+ 2.3.4 Reading samples from buffer
+ 2.3.5 Memory synchronization
+
+ 3. The mechanism of AUX ring buffer
+ 3.1 The relationship between AUX and regular ring buffers
+ 3.2 AUX events
+ 3.3 Snapshot mode
+
+
+1. Introduction
+===============
+
+The ring buffer is a fundamental mechanism for data transfer. perf uses
+ring buffers to transfer event data from kernel to user space, another
+kind of ring buffer which is so called auxiliary (AUX) ring buffer also
+plays an important role for hardware tracing with Intel PT, Arm
+CoreSight, etc.
+
+The ring buffer implementation is critical but it's also a very
+challenging work. On the one hand, the kernel and perf tool in the user
+space use the ring buffer to exchange data and stores data into data
+file, thus the ring buffer needs to transfer data with high throughput;
+on the other hand, the ring buffer management should avoid significant
+overload to distract profiling results.
+
+This documentation dives into the details for perf ring buffer with two
+parts: firstly it explains the perf ring buffer implementation, then the
+second part discusses the AUX ring buffer mechanism.
+
+2. Ring buffer implementation
+=============================
+
+2.1 Basic algorithm
+-------------------
+
+That said, a typical ring buffer is managed by a head pointer and a tail
+pointer; the head pointer is manipulated by a writer and the tail
+pointer is updated by a reader respectively.
+
+::
+
+ +---------------------------+
+ | | |***|***|***| | |
+ +---------------------------+
+ `-> Tail `-> Head
+
+ * : the data is filled by the writer.
+
+ Figure 1. Ring buffer
+
+Perf uses the same way to manage its ring buffer. In the implementation
+there are two key data structures held together in a set of consecutive
+pages, the control structure and then the ring buffer itself. The page
+with the control structure in is known as the "user page". Being held
+in continuous virtual addresses simplifies locating the ring buffer
+address, it is in the pages after the page with the user page.
+
+The control structure is named as ``perf_event_mmap_page``, it contains a
+head pointer ``data_head`` and a tail pointer ``data_tail``. When the
+kernel starts to fill records into the ring buffer, it updates the head
+pointer to reserve the memory so later it can safely store events into
+the buffer. On the other side, when the user page is a writable mapping,
+the perf tool has the permission to update the tail pointer after consuming
+data from the ring buffer. Yet another case is for the user page's
+read-only mapping, which is to be addressed in the section
+:ref:`writing_samples_into_buffer`.
+
+::
+
+ user page ring buffer
+ +---------+---------+ +---------------------------------------+
+ |data_head|data_tail|...| | |***|***|***|***|***| | | |
+ +---------+---------+ +---------------------------------------+
+ ` `----------------^ ^
+ `----------------------------------------------|
+
+ * : the data is filled by the writer.
+
+ Figure 2. Perf ring buffer
+
+When using the ``perf record`` tool, we can specify the ring buffer size
+with option ``-m`` or ``--mmap-pages=``, the given size will be rounded up
+to a power of two that is a multiple of a page size. Though the kernel
+allocates at once for all memory pages, it's deferred to map the pages
+to VMA area until the perf tool accesses the buffer from the user space.
+In other words, at the first time accesses the buffer's page from user
+space in the perf tool, a data abort exception for page fault is taken
+and the kernel uses this occasion to map the page into process VMA
+(see ``perf_mmap_fault()``), thus the perf tool can continue to access
+the page after returning from the exception.
+
+2.2 Ring buffer for different tracing modes
+-------------------------------------------
+
+The perf profiles programs with different modes: default mode, per thread
+mode, per cpu mode, and system wide mode. This section describes these
+modes and how the ring buffer meets requirements for them. At last we
+will review the race conditions caused by these modes.
+
+2.2.1 Default mode
+^^^^^^^^^^^^^^^^^^
+
+Usually we execute ``perf record`` command followed by a profiling program
+name, like below command::
+
+ perf record test_program
+
+This command doesn't specify any options for CPU and thread modes, the
+perf tool applies the default mode on the perf event. It maps all the
+CPUs in the system and the profiled program's PID on the perf event, and
+it enables inheritance mode on the event so that child tasks inherits
+the events. As a result, the perf event is attributed as::
+
+ evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
+ evsel::threads::map[] = { pid }
+ evsel::attr::inherit = 1
+
+These attributions finally will be reflected on the deployment of ring
+buffers. As shown below, the perf tool allocates individual ring buffer
+for each CPU, but it only enables events for the profiled program rather
+than for all threads in the system. The *T1* thread represents the
+thread context of the 'test_program', whereas *T2* and *T3* are irrelevant
+threads in the system. The perf samples are exclusively collected for
+the *T1* thread and stored in the ring buffer associated with the CPU on
+which the *T1* thread is running.
+
+::
+
+ T1 T2 T1
+ +----+ +-----------+ +----+
+ CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+ +----+--------------+-----------+----------+----+-------->
+ | |
+ v v
+ +-----------------------------------------------------+
+ | Ring buffer 0 |
+ +-----------------------------------------------------+
+
+ T1
+ +-----+
+ CPU1 |xxxxx|
+ -----+-----+--------------------------------------------->
+ |
+ v
+ +-----------------------------------------------------+
+ | Ring buffer 1 |
+ +-----------------------------------------------------+
+
+ T1 T3
+ +----+ +-------+
+ CPU2 |xxxx| |xxxxxxx|
+ --------------------------+----+--------+-------+-------->
+ |
+ v
+ +-----------------------------------------------------+
+ | Ring buffer 2 |
+ +-----------------------------------------------------+
+
+ T1
+ +--------------+
+ CPU3 |xxxxxxxxxxxxxx|
+ -----------+--------------+------------------------------>
+ |
+ v
+ +-----------------------------------------------------+
+ | Ring buffer 3 |
+ +-----------------------------------------------------+
+
+ T1: Thread 1; T2: Thread 2; T3: Thread 3
+ x: Thread is in running state
+
+ Figure 3. Ring buffer for default mode
+
+2.2.2 Per-thread mode
+^^^^^^^^^^^^^^^^^^^^^
+
+By specifying option ``--per-thread`` in perf command, e.g.
+
+::
+
+ perf record --per-thread test_program
+
+The perf event doesn't map to any CPUs and is only bound to the
+profiled process, thus, the perf event's attributions are::
+
+ evsel::cpus::map[0] = { -1 }
+ evsel::threads::map[] = { pid }
+ evsel::attr::inherit = 0
+
+In this mode, a single ring buffer is allocated for the profiled thread;
+if the thread is scheduled on a CPU, the events on that CPU will be
+enabled; and if the thread is scheduled out from the CPU, the events on
+the CPU will be disabled. When the thread is migrated from one CPU to
+another, the events are to be disabled on the previous CPU and enabled
+on the next CPU correspondingly.
+
+::
+
+ T1 T2 T1
+ +----+ +-----------+ +----+
+ CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+ +----+--------------+-----------+----------+----+-------->
+ | |
+ | T1 |
+ | +-----+ |
+ CPU1 | |xxxxx| |
+ --|--+-----+----------------------------------|---------->
+ | | |
+ | | T1 T3 |
+ | | +----+ +---+ |
+ CPU2 | | |xxxx| |xxx| |
+ --|-----|-----------------+----+--------+---+-|---------->
+ | | | |
+ | | T1 | |
+ | | +--------------+ | |
+ CPU3 | | |xxxxxxxxxxxxxx| | |
+ --|-----|--+--------------+-|-----------------|---------->
+ | | | | |
+ v v v v v
+ +-----------------------------------------------------+
+ | Ring buffer |
+ +-----------------------------------------------------+
+
+ T1: Thread 1
+ x: Thread is in running state
+
+ Figure 4. Ring buffer for per-thread mode
+
+When perf runs in per-thread mode, a ring buffer is allocated for the
+profiled thread *T1*. The ring buffer is dedicated for thread *T1*, if the
+thread *T1* is running, the perf events will be recorded into the ring
+buffer; when the thread is sleeping, all associated events will be
+disabled, thus no trace data will be recorded into the ring buffer.
+
+2.2.3 Per-CPU mode
+^^^^^^^^^^^^^^^^^^
+
+The option ``-C`` is used to collect samples on the list of CPUs, for
+example the below perf command receives option ``-C 0,2``::
+
+ perf record -C 0,2 test_program
+
+It maps the perf event to CPUs 0 and 2, and the event is not associated to any
+PID. Thus the perf event attributions are set as::
+
+ evsel::cpus::map[0] = { 0, 2 }
+ evsel::threads::map[] = { -1 }
+ evsel::attr::inherit = 0
+
+This results in the session of ``perf record`` will sample all threads on CPU0
+and CPU2, and be terminated until test_program exits. Even there have tasks
+running on CPU1 and CPU3, since the ring buffer is absent for them, any
+activities on these two CPUs will be ignored. A usage case is to combine the
+options for per-thread mode and per-CPU mode, e.g. the options ``–C 0,2`` and
+``––per–thread`` are specified together, the samples are recorded only when
+the profiled thread is scheduled on any of the listed CPUs.
+
+::
+
+ T1 T2 T1
+ +----+ +-----------+ +----+
+ CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+ +----+--------------+-----------+----------+----+-------->
+ | | |
+ v v v
+ +-----------------------------------------------------+
+ | Ring buffer 0 |
+ +-----------------------------------------------------+
+
+ T1
+ +-----+
+ CPU1 |xxxxx|
+ -----+-----+--------------------------------------------->
+
+ T1 T3
+ +----+ +-------+
+ CPU2 |xxxx| |xxxxxxx|
+ --------------------------+----+--------+-------+-------->
+ | |
+ v v
+ +-----------------------------------------------------+
+ | Ring buffer 1 |
+ +-----------------------------------------------------+
+
+ T1
+ +--------------+
+ CPU3 |xxxxxxxxxxxxxx|
+ -----------+--------------+------------------------------>
+
+ T1: Thread 1; T2: Thread 2; T3: Thread 3
+ x: Thread is in running state
+
+ Figure 5. Ring buffer for per-CPU mode
+
+2.2.4 System wide mode
+^^^^^^^^^^^^^^^^^^^^^^
+
+By using option ``–a`` or ``––all–cpus``, perf collects samples on all CPUs
+for all tasks, we call it as the system wide mode, the command is::
+
+ perf record -a test_program
+
+Similar to the per-CPU mode, the perf event doesn't bind to any PID, and
+it maps to all CPUs in the system::
+
+ evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
+ evsel::threads::map[] = { -1 }
+ evsel::attr::inherit = 0
+
+In the system wide mode, every CPU has its own ring buffer, all threads
+are monitored during the running state and the samples are recorded into
+the ring buffer belonging to the CPU which the events occurred on.
+
+::
+
+ T1 T2 T1
+ +----+ +-----------+ +----+
+ CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+ +----+--------------+-----------+----------+----+-------->
+ | | |
+ v v v
+ +-----------------------------------------------------+
+ | Ring buffer 0 |
+ +-----------------------------------------------------+
+
+ T1
+ +-----+
+ CPU1 |xxxxx|
+ -----+-----+--------------------------------------------->
+ |
+ v
+ +-----------------------------------------------------+
+ | Ring buffer 1 |
+ +-----------------------------------------------------+
+
+ T1 T3
+ +----+ +-------+
+ CPU2 |xxxx| |xxxxxxx|
+ --------------------------+----+--------+-------+-------->
+ | |
+ v v
+ +-----------------------------------------------------+
+ | Ring buffer 2 |
+ +-----------------------------------------------------+
+
+ T1
+ +--------------+
+ CPU3 |xxxxxxxxxxxxxx|
+ -----------+--------------+------------------------------>
+ |
+ v
+ +-----------------------------------------------------+
+ | Ring buffer 3 |
+ +-----------------------------------------------------+
+
+ T1: Thread 1; T2: Thread 2; T3: Thread 3
+ x: Thread is in running state
+
+ Figure 6. Ring buffer for system wide mode
+
+2.3 Accessing buffer
+--------------------
+
+Based on the understanding of how the ring buffer is allocated in
+various modes, this section explains access the ring buffer.
+
+2.3.1 Producer-consumer model
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the Linux kernel, the PMU events can produce samples which are stored
+into the ring buffer; the perf command in user space consumes the
+samples by reading out data from the ring buffer and finally saves the
+data into the file for post analysis. It’s a typical producer-consumer
+model for using the ring buffer.
+
+The perf process polls on the PMU events and sleeps when no events are
+incoming. To prevent frequent exchanges between the kernel and user
+space, the kernel event core layer introduces a watermark, which is
+stored in the ``perf_buffer::watermark``. When a sample is recorded into
+the ring buffer, and if the used buffer exceeds the watermark, the
+kernel wakes up the perf process to read samples from the ring buffer.
+
+::
+
+ Perf
+ / | Read samples
+ Polling / `--------------| Ring buffer
+ v v ;---------------------v
+ +----------------+ +---------+---------+ +-------------------+
+ |Event wait queue| |data_head|data_tail| |***|***| | |***|
+ +----------------+ +---------+---------+ +-------------------+
+ ^ ^ `------------------------^
+ | Wake up tasks | Store samples
+ +-----------------------------+
+ | Kernel event core layer |
+ +-----------------------------+
+
+ * : the data is filled by the writer.
+
+ Figure 7. Writing and reading the ring buffer
+
+When the kernel event core layer notifies the user space, because
+multiple events might share the same ring buffer for recording samples,
+the core layer iterates every event associated with the ring buffer and
+wakes up tasks waiting on the event. This is fulfilled by the kernel
+function ``ring_buffer_wakeup()``.
+
+After the perf process is woken up, it starts to check the ring buffers
+one by one, if it finds any ring buffer containing samples it will read
+out the samples for statistics or saving into the data file. Given the
+perf process is able to run on any CPU, this leads to the ring buffer
+potentially being accessed from multiple CPUs simultaneously, which
+causes race conditions. The race condition handling is described in the
+section :ref:`memory_synchronization`.
+
+2.3.2 Properties of the ring buffers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Linux kernel supports two write directions for the ring buffer: forward and
+backward. The forward writing saves samples from the beginning of the ring
+buffer, the backward writing stores data from the end of the ring buffer with
+the reversed direction. The perf tool determines the writing direction.
+
+Additionally, the tool can map buffers in either read-write mode or read-only
+mode to the user space.
+
+The ring buffer in the read-write mode is mapped with the property
+``PROT_READ | PROT_WRITE``. With the write permission, the perf tool
+updates the ``data_tail`` to indicate the data start position. Combining
+with the head pointer ``data_head``, which works as the end position of
+the current data, the perf tool can easily know where read out the data
+from.
+
+Alternatively, in the read-only mode, only the kernel keeps to update
+the ``data_head`` while the user space cannot access the ``data_tail`` due
+to the mapping property ``PROT_READ``.
+
+As a result, the matrix below illustrates the various combinations of
+direction and mapping characteristics. The perf tool employs two of these
+combinations to support buffer types: the non-overwrite buffer and the
+overwritable buffer.
+
+.. list-table::
+ :widths: 1 1 1
+ :header-rows: 1
+
+ * - Mapping mode
+ - Forward
+ - Backward
+ * - read-write
+ - Non-overwrite ring buffer
+ - Not used
+ * - read-only
+ - Not used
+ - Overwritable ring buffer
+
+The non-overwrite ring buffer uses the read-write mapping with forward
+writing. It starts to save data from the beginning of the ring buffer
+and wrap around when overflow, which is used with the read-write mode in
+the normal ring buffer. When the consumer doesn't keep up with the
+producer, it would lose some data, the kernel keeps how many records it
+lost and generates the ``PERF_RECORD_LOST`` records in the next time
+when it finds a space in the ring buffer.
+
+The overwritable ring buffer uses the backward writing with the
+read-only mode. It saves the data from the end of the ring buffer and
+the ``data_head`` keeps the position of current data, the perf always
+knows where it starts to read and until the end of the ring buffer, thus
+it don't need the ``data_tail``. In this mode, it will not generate the
+``PERF_RECORD_LOST`` records.
+
+.. _writing_samples_into_buffer:
+
+2.3.3 Writing samples into buffer
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When a sample is taken and saved into the ring buffer, the kernel
+prepares sample fields based on the sample type; then it prepares the
+info for writing ring buffer which is stored in the structure
+``perf_output_handle``. In the end, the kernel outputs the sample into
+the ring buffer and updates the head pointer in the user page so the
+perf tool can see the latest value.
+
+The structure ``perf_output_handle`` serves as a temporary context for
+tracking the information related to the buffer. The advantages of it is
+that it enables concurrent writing to the buffer by different events.
+For example, a software event and a hardware PMU event both are enabled
+for profiling, two instances of ``perf_output_handle`` serve as separate
+contexts for the software event and the hardware event respectively.
+This allows each event to reserve its own memory space for populating
+the record data.
+
+2.3.4 Reading samples from buffer
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the user space, the perf tool utilizes the ``perf_event_mmap_page``
+structure to handle the head and tail of the buffer. It also uses
+``perf_mmap`` structure to keep track of a context for the ring buffer, this
+context includes information about the buffer's starting and ending
+addresses. Additionally, the mask value can be utilized to compute the
+circular buffer pointer even for an overflow.
+
+Similar to the kernel, the perf tool in the user space first reads out
+the recorded data from the ring buffer, and then updates the buffer's
+tail pointer ``perf_event_mmap_page::data_tail``.
+
+.. _memory_synchronization:
+
+2.3.5 Memory synchronization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The modern CPUs with relaxed memory model cannot promise the memory
+ordering, this means it’s possible to access the ring buffer and the
+``perf_event_mmap_page`` structure out of order. To assure the specific
+sequence for memory accessing perf ring buffer, memory barriers are
+used to assure the data dependency. The rationale for the memory
+synchronization is as below::
+
+ Kernel User space
+
+ if (LOAD ->data_tail) { LOAD ->data_head
+ (A) smp_rmb() (C)
+ STORE $data LOAD $data
+ smp_wmb() (B) smp_mb() (D)
+ STORE ->data_head STORE ->data_tail
+ }
+
+The comments in tools/include/linux/ring_buffer.h gives nice description
+for why and how to use memory barriers, here we will just provide an
+alternative explanation:
+
+(A) is a control dependency so that CPU assures order between checking
+pointer ``perf_event_mmap_page::data_tail`` and filling sample into ring
+buffer;
+
+(D) pairs with (A). (D) separates the ring buffer data reading from
+writing the pointer ``data_tail``, perf tool first consumes samples and then
+tells the kernel that the data chunk has been released. Since a reading
+operation is followed by a writing operation, thus (D) is a full memory
+barrier.
+
+(B) is a writing barrier in the middle of two writing operations, which
+makes sure that recording a sample must be prior to updating the head
+pointer.
+
+(C) pairs with (B). (C) is a read memory barrier to ensure the head
+pointer is fetched before reading samples.
+
+To implement the above algorithm, the ``perf_output_put_handle()`` function
+in the kernel and two helpers ``ring_buffer_read_head()`` and
+``ring_buffer_write_tail()`` in the user space are introduced, they rely
+on memory barriers as described above to ensure the data dependency.
+
+Some architectures support one-way permeable barrier with load-acquire
+and store-release operations, these barriers are more relaxed with less
+performance penalty, so (C) and (D) can be optimized to use barriers
+``smp_load_acquire()`` and ``smp_store_release()`` respectively.
+
+If an architecture doesn’t support load-acquire and store-release in its
+memory model, it will roll back to the old fashion of memory barrier
+operations. In this case, ``smp_load_acquire()`` encapsulates
+``READ_ONCE()`` + ``smp_mb()``, since ``smp_mb()`` is costly,
+``ring_buffer_read_head()`` doesn't invoke ``smp_load_acquire()`` and it uses
+the barriers ``READ_ONCE()`` + ``smp_rmb()`` instead.
+
+3. The mechanism of AUX ring buffer
+===================================
+
+In this chapter, we will explain the implementation of the AUX ring
+buffer. In the first part it will discuss the connection between the
+AUX ring buffer and the regular ring buffer, then the second part will
+examine how the AUX ring buffer co-works with the regular ring buffer,
+as well as the additional features introduced by the AUX ring buffer for
+the sampling mechanism.
+
+3.1 The relationship between AUX and regular ring buffers
+---------------------------------------------------------
+
+Generally, the AUX ring buffer is an auxiliary for the regular ring
+buffer. The regular ring buffer is primarily used to store the event
+samples and every event format complies with the definition in the
+union ``perf_event``; the AUX ring buffer is for recording the hardware
+trace data and the trace data format is hardware IP dependent.
+
+The general use and advantage of the AUX ring buffer is that it is
+written directly by hardware rather than by the kernel. For example,
+regular profile samples that write to the regular ring buffer cause an
+interrupt. Tracing execution requires a high number of samples and
+using interrupts would be overwhelming for the regular ring buffer
+mechanism. Having an AUX buffer allows for a region of memory more
+decoupled from the kernel and written to directly by hardware tracing.
+
+The AUX ring buffer reuses the same algorithm with the regular ring
+buffer for the buffer management. The control structure
+``perf_event_mmap_page`` extends the new fields ``aux_head`` and ``aux_tail``
+for the head and tail pointers of the AUX ring buffer.
+
+During the initialisation phase, besides the mmap()-ed regular ring
+buffer, the perf tool invokes a second syscall in the
+``auxtrace_mmap__mmap()`` function for the mmap of the AUX buffer with
+non-zero file offset; ``rb_alloc_aux()`` in the kernel allocates pages
+correspondingly, these pages will be deferred to map into VMA when
+handling the page fault, which is the same lazy mechanism with the
+regular ring buffer.
+
+AUX events and AUX trace data are two different things. Let's see an
+example::
+
+ perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2
+
+The above command enables two events: one is the event *cycles* from PMU
+and another is the AUX event *cs_etm* from Arm CoreSight, both are saved
+into the regular ring buffer while the CoreSight's AUX trace data is
+stored in the AUX ring buffer.
+
+As a result, we can see the regular ring buffer and the AUX ring buffer
+are allocated in pairs. The perf in default mode allocates the regular
+ring buffer and the AUX ring buffer per CPU-wise, which is the same as
+the system wide mode, however, the default mode records samples only for
+the profiled program, whereas the latter mode profiles for all programs
+in the system. For per-thread mode, the perf tool allocates only one
+regular ring buffer and one AUX ring buffer for the whole session. For
+the per-CPU mode, the perf allocates two kinds of ring buffers for
+selected CPUs specified by the option ``-C``.
+
+The below figure demonstrates the buffers' layout in the system wide
+mode; if there are any activities on one CPU, the AUX event samples and
+the hardware trace data will be recorded into the dedicated buffers for
+the CPU.
+
+::
+
+ T1 T2 T1
+ +----+ +-----------+ +----+
+ CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
+ +----+--------------+-----------+----------+----+-------->
+ | | |
+ v v v
+ +-----------------------------------------------------+
+ | Ring buffer 0 |
+ +-----------------------------------------------------+
+ | | |
+ v v v
+ +-----------------------------------------------------+
+ | AUX Ring buffer 0 |
+ +-----------------------------------------------------+
+
+ T1
+ +-----+
+ CPU1 |xxxxx|
+ -----+-----+--------------------------------------------->
+ |
+ v
+ +-----------------------------------------------------+
+ | Ring buffer 1 |
+ +-----------------------------------------------------+
+ |
+ v
+ +-----------------------------------------------------+
+ | AUX Ring buffer 1 |
+ +-----------------------------------------------------+
+
+ T1 T3
+ +----+ +-------+
+ CPU2 |xxxx| |xxxxxxx|
+ --------------------------+----+--------+-------+-------->
+ | |
+ v v
+ +-----------------------------------------------------+
+ | Ring buffer 2 |
+ +-----------------------------------------------------+
+ | |
+ v v
+ +-----------------------------------------------------+
+ | AUX Ring buffer 2 |
+ +-----------------------------------------------------+
+
+ T1
+ +--------------+
+ CPU3 |xxxxxxxxxxxxxx|
+ -----------+--------------+------------------------------>
+ |
+ v
+ +-----------------------------------------------------+
+ | Ring buffer 3 |
+ +-----------------------------------------------------+
+ |
+ v
+ +-----------------------------------------------------+
+ | AUX Ring buffer 3 |
+ +-----------------------------------------------------+
+
+ T1: Thread 1; T2: Thread 2; T3: Thread 3
+ x: Thread is in running state
+
+ Figure 8. AUX ring buffer for system wide mode
+
+3.2 AUX events
+--------------
+
+Similar to ``perf_output_begin()`` and ``perf_output_end()``'s working for the
+regular ring buffer, ``perf_aux_output_begin()`` and ``perf_aux_output_end()``
+serve for the AUX ring buffer for processing the hardware trace data.
+
+Once the hardware trace data is stored into the AUX ring buffer, the PMU
+driver will stop hardware tracing by calling the ``pmu::stop()`` callback.
+Similar to the regular ring buffer, the AUX ring buffer needs to apply
+the memory synchronization mechanism as discussed in the section
+:ref:`memory_synchronization`. Since the AUX ring buffer is managed by the
+PMU driver, the barrier (B), which is a writing barrier to ensure the trace
+data is externally visible prior to updating the head pointer, is asked
+to be implemented in the PMU driver.
+
+Then ``pmu::stop()`` can safely call the ``perf_aux_output_end()`` function to
+finish two things:
+
+- It fills an event ``PERF_RECORD_AUX`` into the regular ring buffer, this
+ event delivers the information of the start address and data size for a
+ chunk of hardware trace data has been stored into the AUX ring buffer;
+
+- Since the hardware trace driver has stored new trace data into the AUX
+ ring buffer, the argument *size* indicates how many bytes have been
+ consumed by the hardware tracing, thus ``perf_aux_output_end()`` updates the
+ header pointer ``perf_buffer::aux_head`` to reflect the latest buffer usage.
+
+At the end, the PMU driver will restart hardware tracing. During this
+temporary suspending period, it will lose hardware trace data, which
+will introduce a discontinuity during decoding phase.
+
+The event ``PERF_RECORD_AUX`` presents an AUX event which is handled in the
+kernel, but it lacks the information for saving the AUX trace data in
+the perf file. When the perf tool copies the trace data from AUX ring
+buffer to the perf data file, it synthesizes a ``PERF_RECORD_AUXTRACE``
+event which is not a kernel ABI, it's defined by the perf tool to describe
+which portion of data in the AUX ring buffer is saved. Afterwards, the perf
+tool reads out the AUX trace data from the perf file based on the
+``PERF_RECORD_AUXTRACE`` events, and the ``PERF_RECORD_AUX`` event is used to
+decode a chunk of data by correlating with time order.
+
+3.3 Snapshot mode
+-----------------
+
+Perf supports snapshot mode for AUX ring buffer, in this mode, users
+only record AUX trace data at a specific time point which users are
+interested in. E.g. below gives an example of how to take snapshots
+with 1 second interval with Arm CoreSight::
+
+ perf record -e cs_etm/@tmc_etr0/u -S -a program &
+ PERFPID=$!
+ while true; do
+ kill -USR2 $PERFPID
+ sleep 1
+ done
+
+The main flow for snapshot mode is:
+
+- Before a snapshot is taken, the AUX ring buffer acts in free run mode.
+ During free run mode the perf doesn't record any of the AUX events and
+ trace data;
+
+- Once the perf tool receives the *USR2* signal, it triggers the callback
+ function ``auxtrace_record::snapshot_start()`` to deactivate hardware
+ tracing. The kernel driver then populates the AUX ring buffer with the
+ hardware trace data, and the event ``PERF_RECORD_AUX`` is stored in the
+ regular ring buffer;
+
+- Then perf tool takes a snapshot, ``record__read_auxtrace_snapshot()``
+ reads out the hardware trace data from the AUX ring buffer and saves it
+ into perf data file;
+
+- After the snapshot is finished, ``auxtrace_record::snapshot_finish()``
+ restarts the PMU event for AUX tracing.
+
+The perf only accesses the head pointer ``perf_event_mmap_page::aux_head``
+in snapshot mode and doesn’t touch tail pointer ``aux_tail``, this is
+because the AUX ring buffer can overflow in free run mode, the tail
+pointer is useless in this case. Alternatively, the callback
+``auxtrace_record::find_snapshot()`` is introduced for making the decision
+of whether the AUX ring buffer has been wrapped around or not, at the
+end it fixes up the AUX buffer's head which are used to calculate the
+trace data size.
+
+As we know, the buffers' deployment can be per-thread mode, per-CPU
+mode, or system wide mode, and the snapshot can be applied to any of
+these modes. Below is an example of taking snapshot with system wide
+mode.
+
+::
+
+ Snapshot is taken
+ |
+ v
+ +------------------------+
+ | AUX Ring buffer 0 | <- aux_head
+ +------------------------+
+ v
+ +--------------------------------+
+ | AUX Ring buffer 1 | <- aux_head
+ +--------------------------------+
+ v
+ +--------------------------------------------+
+ | AUX Ring buffer 2 | <- aux_head
+ +--------------------------------------------+
+ v
+ +---------------------------------------+
+ | AUX Ring buffer 3 | <- aux_head
+ +---------------------------------------+
+
+ Figure 9. Snapshot with system wide mode