summaryrefslogtreecommitdiff
path: root/Documentation/driver-api
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/driver-api')
-rw-r--r--Documentation/driver-api/cxl/memory-devices.rst315
-rw-r--r--Documentation/driver-api/device-io.rst9
-rw-r--r--Documentation/driver-api/dma-buf.rst9
-rw-r--r--Documentation/driver-api/driver-model/devres.rst1
-rw-r--r--Documentation/driver-api/gpio/board.rst21
-rw-r--r--Documentation/driver-api/index.rst1
-rw-r--r--Documentation/driver-api/media/drivers/davinci-vpbe-devel.rst20
-rw-r--r--Documentation/driver-api/media/drivers/fimc-devel.rst14
-rw-r--r--Documentation/driver-api/media/v4l2-event.rst2
-rw-r--r--Documentation/driver-api/mtd/index.rst2
-rw-r--r--Documentation/driver-api/mtd/spi-intel.rst (renamed from Documentation/driver-api/mtd/intel-spi.rst)8
-rw-r--r--Documentation/driver-api/nvdimm/nvdimm.rst406
-rw-r--r--Documentation/driver-api/nvmem.rst28
-rw-r--r--Documentation/driver-api/serial/driver.rst2
-rw-r--r--Documentation/driver-api/thermal/index.rst1
-rw-r--r--Documentation/driver-api/thermal/intel_dptf.rst272
-rw-r--r--Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst35
17 files changed, 765 insertions, 381 deletions
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index 3b8f41395f6b..db476bb170b6 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -14,6 +14,303 @@ that optionally define a device's contribution to an interleaved address
range across multiple devices underneath a host-bridge or interleaved
across host-bridges.
+CXL Bus: Theory of Operation
+============================
+Similar to how a RAID driver takes disk objects and assembles them into a new
+logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and
+assemble them into a CXL.mem decode topology. The need for runtime configuration
+of the CXL.mem topology is also similar to RAID in that different environments
+with the same hardware configuration may decide to assemble the topology in
+contrasting ways. One may choose performance (RAID0) striping memory across
+multiple Host Bridges and endpoints while another may opt for fault tolerance
+and disable any striping in the CXL.mem topology.
+
+Platform firmware enumerates a menu of interleave options at the "CXL root port"
+(Linux term for the top of the CXL decode topology). From there, PCIe topology
+dictates which endpoints can participate in which Host Bridge decode regimes.
+Each PCIe Switch in the path between the root and an endpoint introduces a point
+at which the interleave can be split. For example platform firmware may say at a
+given range only decodes to 1 one Host Bridge, but that Host Bridge may in turn
+interleave cycles across multiple Root Ports. An intervening Switch between a
+port and an endpoint may interleave cycles across multiple Downstream Switch
+Ports, etc.
+
+Here is a sample listing of a CXL topology defined by 'cxl_test'. The 'cxl_test'
+module generates an emulated CXL topology of 2 Host Bridges each with 2 Root
+Ports. Each of those Root Ports are connected to 2-way switches with endpoints
+connected to those downstream ports for a total of 8 endpoints::
+
+ # cxl list -BEMPu -b cxl_test
+ {
+ "bus":"root3",
+ "provider":"cxl_test",
+ "ports:root3":[
+ {
+ "port":"port5",
+ "host":"cxl_host_bridge.1",
+ "ports:port5":[
+ {
+ "port":"port8",
+ "host":"cxl_switch_uport.1",
+ "endpoints:port8":[
+ {
+ "endpoint":"endpoint9",
+ "host":"mem2",
+ "memdev":{
+ "memdev":"mem2",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x1",
+ "numa_node":1,
+ "host":"cxl_mem.1"
+ }
+ },
+ {
+ "endpoint":"endpoint15",
+ "host":"mem6",
+ "memdev":{
+ "memdev":"mem6",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x5",
+ "numa_node":1,
+ "host":"cxl_mem.5"
+ }
+ }
+ ]
+ },
+ {
+ "port":"port12",
+ "host":"cxl_switch_uport.3",
+ "endpoints:port12":[
+ {
+ "endpoint":"endpoint17",
+ "host":"mem8",
+ "memdev":{
+ "memdev":"mem8",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x7",
+ "numa_node":1,
+ "host":"cxl_mem.7"
+ }
+ },
+ {
+ "endpoint":"endpoint13",
+ "host":"mem4",
+ "memdev":{
+ "memdev":"mem4",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x3",
+ "numa_node":1,
+ "host":"cxl_mem.3"
+ }
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "port":"port4",
+ "host":"cxl_host_bridge.0",
+ "ports:port4":[
+ {
+ "port":"port6",
+ "host":"cxl_switch_uport.0",
+ "endpoints:port6":[
+ {
+ "endpoint":"endpoint7",
+ "host":"mem1",
+ "memdev":{
+ "memdev":"mem1",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0",
+ "numa_node":0,
+ "host":"cxl_mem.0"
+ }
+ },
+ {
+ "endpoint":"endpoint14",
+ "host":"mem5",
+ "memdev":{
+ "memdev":"mem5",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x4",
+ "numa_node":0,
+ "host":"cxl_mem.4"
+ }
+ }
+ ]
+ },
+ {
+ "port":"port10",
+ "host":"cxl_switch_uport.2",
+ "endpoints:port10":[
+ {
+ "endpoint":"endpoint16",
+ "host":"mem7",
+ "memdev":{
+ "memdev":"mem7",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x6",
+ "numa_node":0,
+ "host":"cxl_mem.6"
+ }
+ },
+ {
+ "endpoint":"endpoint11",
+ "host":"mem3",
+ "memdev":{
+ "memdev":"mem3",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x2",
+ "numa_node":0,
+ "host":"cxl_mem.2"
+ }
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+
+In that listing each "root", "port", and "endpoint" object correspond a kernel
+'struct cxl_port' object. A 'cxl_port' is a device that can decode CXL.mem to
+its descendants. So "root" claims non-PCIe enumerable platform decode ranges and
+decodes them to "ports", "ports" decode to "endpoints", and "endpoints"
+represent the decode from SPA (System Physical Address) to DPA (Device Physical
+Address).
+
+Continuing the RAID analogy, disks have both topology metadata and on device
+metadata that determine RAID set assembly. CXL Port topology and CXL Port link
+status is metadata for CXL.mem set assembly. The CXL Port topology is enumerated
+by the arrival of a CXL.mem device. I.e. unless and until the PCIe core attaches
+the cxl_pci driver to a CXL Memory Expander there is no role for CXL Port
+objects. Conversely for hot-unplug / removal scenarios, there is no need for
+the Linux PCI core to tear down switch-level CXL resources because the endpoint
+->remove() event cleans up the port data that was established to support that
+Memory Expander.
+
+The port metadata and potential decode schemes that a give memory device may
+participate can be determined via a command like::
+
+ # cxl list -BDMu -d root -m mem3
+ {
+ "bus":"root3",
+ "provider":"cxl_test",
+ "decoders:root3":[
+ {
+ "decoder":"decoder3.1",
+ "resource":"0x8030000000",
+ "size":"512.00 MiB (536.87 MB)",
+ "volatile_capable":true,
+ "nr_targets":2
+ },
+ {
+ "decoder":"decoder3.3",
+ "resource":"0x8060000000",
+ "size":"512.00 MiB (536.87 MB)",
+ "pmem_capable":true,
+ "nr_targets":2
+ },
+ {
+ "decoder":"decoder3.0",
+ "resource":"0x8020000000",
+ "size":"256.00 MiB (268.44 MB)",
+ "volatile_capable":true,
+ "nr_targets":1
+ },
+ {
+ "decoder":"decoder3.2",
+ "resource":"0x8050000000",
+ "size":"256.00 MiB (268.44 MB)",
+ "pmem_capable":true,
+ "nr_targets":1
+ }
+ ],
+ "memdevs:root3":[
+ {
+ "memdev":"mem3",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x2",
+ "numa_node":0,
+ "host":"cxl_mem.2"
+ }
+ ]
+ }
+
+...which queries the CXL topology to ask "given CXL Memory Expander with a kernel
+device name of 'mem3' which platform level decode ranges may this device
+participate". A given expander can participate in multiple CXL.mem interleave
+sets simultaneously depending on how many decoder resource it has. In this
+example mem3 can participate in one or more of a PMEM interleave that spans to
+Host Bridges, a PMEM interleave that targets a single Host Bridge, a Volatile
+memory interleave that spans 2 Host Bridges, and a Volatile memory interleave
+that only targets a single Host Bridge.
+
+Conversely the memory devices that can participate in a given platform level
+decode scheme can be determined via a command like the following::
+
+ # cxl list -MDu -d 3.2
+ [
+ {
+ "memdevs":[
+ {
+ "memdev":"mem1",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0",
+ "numa_node":0,
+ "host":"cxl_mem.0"
+ },
+ {
+ "memdev":"mem5",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x4",
+ "numa_node":0,
+ "host":"cxl_mem.4"
+ },
+ {
+ "memdev":"mem7",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x6",
+ "numa_node":0,
+ "host":"cxl_mem.6"
+ },
+ {
+ "memdev":"mem3",
+ "pmem_size":"256.00 MiB (268.44 MB)",
+ "ram_size":"256.00 MiB (268.44 MB)",
+ "serial":"0x2",
+ "numa_node":0,
+ "host":"cxl_mem.2"
+ }
+ ]
+ },
+ {
+ "root decoders":[
+ {
+ "decoder":"decoder3.2",
+ "resource":"0x8050000000",
+ "size":"256.00 MiB (268.44 MB)",
+ "pmem_capable":true,
+ "nr_targets":1
+ }
+ ]
+ }
+ ]
+
+...where the naming scheme for decoders is "decoder<port_id>.<instance_id>".
+
Driver Infrastructure
=====================
@@ -28,6 +325,14 @@ CXL Memory Device
.. kernel-doc:: drivers/cxl/pci.c
:internal:
+.. kernel-doc:: drivers/cxl/mem.c
+ :doc: cxl mem
+
+CXL Port
+--------
+.. kernel-doc:: drivers/cxl/port.c
+ :doc: cxl port
+
CXL Core
--------
.. kernel-doc:: drivers/cxl/cxl.h
@@ -36,10 +341,16 @@ CXL Core
.. kernel-doc:: drivers/cxl/cxl.h
:internal:
-.. kernel-doc:: drivers/cxl/core/bus.c
+.. kernel-doc:: drivers/cxl/core/port.c
:doc: cxl core
-.. kernel-doc:: drivers/cxl/core/bus.c
+.. kernel-doc:: drivers/cxl/core/port.c
+ :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/pci.c
+ :doc: cxl core pci
+
+.. kernel-doc:: drivers/cxl/core/pci.c
:identifiers:
.. kernel-doc:: drivers/cxl/core/pmem.c
diff --git a/Documentation/driver-api/device-io.rst b/Documentation/driver-api/device-io.rst
index e9f04b1815d1..4d2baac0311c 100644
--- a/Documentation/driver-api/device-io.rst
+++ b/Documentation/driver-api/device-io.rst
@@ -502,6 +502,15 @@ pcim_iomap()
Not using these wrappers may make drivers unusable on certain platforms with
stricter rules for mapping I/O memory.
+Generalizing Access to System and I/O Memory
+============================================
+
+.. kernel-doc:: include/linux/iosys-map.h
+ :doc: overview
+
+.. kernel-doc:: include/linux/iosys-map.h
+ :internal:
+
Public Functions Provided
=========================
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 2cd7db82d9fe..55006678394a 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -128,15 +128,6 @@ Kernel Functions and Structures Reference
.. kernel-doc:: include/linux/dma-buf.h
:internal:
-Buffer Mapping Helpers
-~~~~~~~~~~~~~~~~~~~~~~
-
-.. kernel-doc:: include/linux/dma-buf-map.h
- :doc: overview
-
-.. kernel-doc:: include/linux/dma-buf-map.h
- :internal:
-
Reservation Objects
-------------------
diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst
index 148e19381b79..5018403fe82f 100644
--- a/Documentation/driver-api/driver-model/devres.rst
+++ b/Documentation/driver-api/driver-model/devres.rst
@@ -368,6 +368,7 @@ MUX
devm_mux_chip_alloc()
devm_mux_chip_register()
devm_mux_control_get()
+ devm_mux_state_get()
NET
devm_alloc_etherdev()
diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst
index 191fa867826a..4e3adf31c8d1 100644
--- a/Documentation/driver-api/gpio/board.rst
+++ b/Documentation/driver-api/gpio/board.rst
@@ -71,14 +71,14 @@ with the help of _DSD (Device Specific Data), introduced in ACPI 5.1::
Device (FOO) {
Name (_CRS, ResourceTemplate () {
- GpioIo (Exclusive, ..., IoRestrictionOutputOnly,
- "\\_SB.GPI0") {15} // red
- GpioIo (Exclusive, ..., IoRestrictionOutputOnly,
- "\\_SB.GPI0") {16} // green
- GpioIo (Exclusive, ..., IoRestrictionOutputOnly,
- "\\_SB.GPI0") {17} // blue
- GpioIo (Exclusive, ..., IoRestrictionOutputOnly,
- "\\_SB.GPI0") {1} // power
+ GpioIo (Exclusive, PullUp, 0, 0, IoRestrictionOutputOnly,
+ "\\_SB.GPI0", 0, ResourceConsumer) { 15 } // red
+ GpioIo (Exclusive, PullUp, 0, 0, IoRestrictionOutputOnly,
+ "\\_SB.GPI0", 0, ResourceConsumer) { 16 } // green
+ GpioIo (Exclusive, PullUp, 0, 0, IoRestrictionOutputOnly,
+ "\\_SB.GPI0", 0, ResourceConsumer) { 17 } // blue
+ GpioIo (Exclusive, PullNone, 0, 0, IoRestrictionOutputOnly,
+ "\\_SB.GPI0", 0, ResourceConsumer) { 1 } // power
})
Name (_DSD, Package () {
@@ -92,10 +92,7 @@ with the help of _DSD (Device Specific Data), introduced in ACPI 5.1::
^FOO, 2, 0, 1,
}
},
- Package () {
- "power-gpios",
- Package () {^FOO, 3, 0, 0},
- },
+ Package () { "power-gpios", Package () { ^FOO, 3, 0, 0 } },
}
})
}
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index c57c609ad2eb..a7b0223e2886 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -103,6 +103,7 @@ available subsections can be seen below.
sync_file
vfio-mediated-device
vfio
+ vfio-pci-device-specific-driver-acceptance
xilinx/index
xillybus
zorro
diff --git a/Documentation/driver-api/media/drivers/davinci-vpbe-devel.rst b/Documentation/driver-api/media/drivers/davinci-vpbe-devel.rst
index f0961672e6a3..4e87bdbc7ae4 100644
--- a/Documentation/driver-api/media/drivers/davinci-vpbe-devel.rst
+++ b/Documentation/driver-api/media/drivers/davinci-vpbe-devel.rst
@@ -7,22 +7,22 @@ File partitioning
-----------------
V4L2 display device driver
- drivers/media/platform/davinci/vpbe_display.c
- drivers/media/platform/davinci/vpbe_display.h
+ drivers/media/platform/ti/davinci/vpbe_display.c
+ drivers/media/platform/ti/davinci/vpbe_display.h
VPBE display controller
- drivers/media/platform/davinci/vpbe.c
- drivers/media/platform/davinci/vpbe.h
+ drivers/media/platform/ti/davinci/vpbe.c
+ drivers/media/platform/ti/davinci/vpbe.h
VPBE venc sub device driver
- drivers/media/platform/davinci/vpbe_venc.c
- drivers/media/platform/davinci/vpbe_venc.h
- drivers/media/platform/davinci/vpbe_venc_regs.h
+ drivers/media/platform/ti/davinci/vpbe_venc.c
+ drivers/media/platform/ti/davinci/vpbe_venc.h
+ drivers/media/platform/ti/davinci/vpbe_venc_regs.h
VPBE osd driver
- drivers/media/platform/davinci/vpbe_osd.c
- drivers/media/platform/davinci/vpbe_osd.h
- drivers/media/platform/davinci/vpbe_osd_regs.h
+ drivers/media/platform/ti/davinci/vpbe_osd.c
+ drivers/media/platform/ti/davinci/vpbe_osd.h
+ drivers/media/platform/ti/davinci/vpbe_osd_regs.h
To be done
----------
diff --git a/Documentation/driver-api/media/drivers/fimc-devel.rst b/Documentation/driver-api/media/drivers/fimc-devel.rst
index 956e3a9901f8..4c6b7c8be19f 100644
--- a/Documentation/driver-api/media/drivers/fimc-devel.rst
+++ b/Documentation/driver-api/media/drivers/fimc-devel.rst
@@ -12,22 +12,22 @@ Files partitioning
- media device driver
- drivers/media/platform/exynos4-is/media-dev.[ch]
+ drivers/media/platform/samsung/exynos4-is/media-dev.[ch]
- camera capture video device driver
- drivers/media/platform/exynos4-is/fimc-capture.c
+ drivers/media/platform/samsung/exynos4-is/fimc-capture.c
- MIPI-CSI2 receiver subdev
- drivers/media/platform/exynos4-is/mipi-csis.[ch]
+ drivers/media/platform/samsung/exynos4-is/mipi-csis.[ch]
- video post-processor (mem-to-mem)
- drivers/media/platform/exynos4-is/fimc-core.c
+ drivers/media/platform/samsung/exynos4-is/fimc-core.c
- common files
- drivers/media/platform/exynos4-is/fimc-core.h
- drivers/media/platform/exynos4-is/fimc-reg.h
- drivers/media/platform/exynos4-is/regs-fimc.h
+ drivers/media/platform/samsung/exynos4-is/fimc-core.h
+ drivers/media/platform/samsung/exynos4-is/fimc-reg.h
+ drivers/media/platform/samsung/exynos4-is/regs-fimc.h
diff --git a/Documentation/driver-api/media/v4l2-event.rst b/Documentation/driver-api/media/v4l2-event.rst
index 5b8254eba7da..52d4fbc5d819 100644
--- a/Documentation/driver-api/media/v4l2-event.rst
+++ b/Documentation/driver-api/media/v4l2-event.rst
@@ -167,7 +167,7 @@ The first event type in the class is reserved for future use, so the first
available event type is 'class base + 1'.
An example on how the V4L2 events may be used can be found in the OMAP
-3 ISP driver (``drivers/media/platform/omap3isp``).
+3 ISP driver (``drivers/media/platform/ti/omap3isp``).
A subdev can directly send an event to the :c:type:`v4l2_device` notify
function with ``V4L2_DEVICE_NOTIFY_EVENT``. This allows the bridge to map
diff --git a/Documentation/driver-api/mtd/index.rst b/Documentation/driver-api/mtd/index.rst
index 436ba5a851d7..6a4278f409d7 100644
--- a/Documentation/driver-api/mtd/index.rst
+++ b/Documentation/driver-api/mtd/index.rst
@@ -7,6 +7,6 @@ Memory Technology Device (MTD)
.. toctree::
:maxdepth: 1
- intel-spi
+ spi-intel
nand_ecc
spi-nor
diff --git a/Documentation/driver-api/mtd/intel-spi.rst b/Documentation/driver-api/mtd/spi-intel.rst
index 0465f6879262..df854f20ead1 100644
--- a/Documentation/driver-api/mtd/intel-spi.rst
+++ b/Documentation/driver-api/mtd/spi-intel.rst
@@ -1,5 +1,5 @@
==============================
-Upgrading BIOS using intel-spi
+Upgrading BIOS using spi-intel
==============================
Many Intel CPUs like Baytrail and Braswell include SPI serial flash host
@@ -11,12 +11,12 @@ avoid accidental (or on purpose) overwrite of the content.
Not all manufacturers protect the SPI serial flash, mainly because it
allows upgrading the BIOS image directly from an OS.
-The intel-spi driver makes it possible to read and write the SPI serial
+The spi-intel driver makes it possible to read and write the SPI serial
flash, if certain protection bits are not set and locked. If it finds
any of them set, the whole MTD device is made read-only to prevent
partial overwrites. By default the driver exposes SPI serial flash
contents as read-only but it can be changed from kernel command line,
-passing "intel-spi.writeable=1".
+passing "spi_intel.writeable=1".
Please keep in mind that overwriting the BIOS image on SPI serial flash
might render the machine unbootable and requires special equipment like
@@ -32,7 +32,7 @@ Linux.
serial flash. Distros like Debian and Fedora have this prepackaged with
name "mtd-utils".
- 3) Add "intel-spi.writeable=1" to the kernel command line and reboot
+ 3) Add "spi_intel.writeable=1" to the kernel command line and reboot
the board (you can also reload the driver passing "writeable=1" as
module parameter to modprobe).
diff --git a/Documentation/driver-api/nvdimm/nvdimm.rst b/Documentation/driver-api/nvdimm/nvdimm.rst
index 1d8302b89bd4..be8587a558e1 100644
--- a/Documentation/driver-api/nvdimm/nvdimm.rst
+++ b/Documentation/driver-api/nvdimm/nvdimm.rst
@@ -14,10 +14,8 @@ Version 13
Overview
Supporting Documents
Git Trees
- LIBNVDIMM PMEM and BLK
- Why BLK?
- PMEM vs BLK
- BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
+ LIBNVDIMM PMEM
+ PMEM-REGIONs, Atomic Sectors, and DAX
Example NVDIMM Platform
LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
LIBNDCTL: Context
@@ -53,19 +51,12 @@ PMEM:
block device composed of PMEM is capable of DAX. A PMEM address range
may span an interleave of several DIMMs.
-BLK:
- A set of one or more programmable memory mapped apertures provided
- by a DIMM to access its media. This indirection precludes the
- performance benefit of interleaving, but enables DIMM-bounded failure
- modes.
-
DPA:
DIMM Physical Address, is a DIMM-relative offset. With one DIMM in
the system there would be a 1:1 system-physical-address:DPA association.
Once more DIMMs are added a memory controller interleave must be
decoded to determine the DPA associated with a given
- system-physical-address. BLK capacity always has a 1:1 relationship
- with a single-DIMM's DPA range.
+ system-physical-address.
DAX:
File system extensions to bypass the page cache and block layer to
@@ -84,30 +75,30 @@ BTT:
Block Translation Table: Persistent memory is byte addressable.
Existing software may have an expectation that the power-fail-atomicity
of writes is at least one sector, 512 bytes. The BTT is an indirection
- table with atomic update semantics to front a PMEM/BLK block device
+ table with atomic update semantics to front a PMEM block device
driver and present arbitrary atomic sector sizes.
LABEL:
Metadata stored on a DIMM device that partitions and identifies
- (persistently names) storage between PMEM and BLK. It also partitions
- BLK storage to host BTTs with different parameters per BLK-partition.
- Note that traditional partition tables, GPT/MBR, are layered on top of a
- BLK or PMEM device.
+ (persistently names) capacity allocated to different PMEM namespaces. It
+ also indicates whether an address abstraction like a BTT is applied to
+ the namepsace. Note that traditional partition tables, GPT/MBR, are
+ layered on top of a PMEM namespace, or an address abstraction like BTT
+ if present, but partition support is deprecated going forward.
Overview
========
-The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
-PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
-and BLK mode access. These three modes of operation are described by
-the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6. While the LIBNVDIMM
-implementation is generic and supports pre-NFIT platforms, it was guided
-by the superset of capabilities need to support this ACPI 6 definition
-for NVDIMM resources. The bulk of the kernel implementation is in place
-to handle the case where DPA accessible via PMEM is aliased with DPA
-accessible via BLK. When that occurs a LABEL is needed to reserve DPA
-for exclusive access via one mode a time.
+The LIBNVDIMM subsystem provides support for PMEM described by platform
+firmware or a device driver. On ACPI based systems the platform firmware
+conveys persistent memory resource via the ACPI NFIT "NVDIMM Firmware
+Interface Table" in ACPI 6. While the LIBNVDIMM subsystem implementation
+is generic and supports pre-NFIT platforms, it was guided by the
+superset of capabilities need to support this ACPI 6 definition for
+NVDIMM resources. The original implementation supported the
+block-window-aperture capability described in the NFIT, but that support
+has since been abandoned and never shipped in a product.
Supporting Documents
--------------------
@@ -125,107 +116,38 @@ Git Trees
---------
LIBNVDIMM:
- https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
+ https://git.kernel.org/cgit/linux/kernel/git/nvdimm/nvdimm.git
LIBNDCTL:
https://github.com/pmem/ndctl.git
-PMEM:
- https://github.com/01org/prd
-LIBNVDIMM PMEM and BLK
-======================
+LIBNVDIMM PMEM
+==============
Prior to the arrival of the NFIT, non-volatile memory was described to a
system in various ad-hoc ways. Usually only the bare minimum was
provided, namely, a single system-physical-address range where writes
are expected to be durable after a system power loss. Now, the NFIT
specification standardizes not only the description of PMEM, but also
-BLK and platform message-passing entry points for control and
-configuration.
-
-For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
-device driver:
-
- 1. PMEM (nd_pmem.ko): Drives a system-physical-address range. This
- range is contiguous in system memory and may be interleaved (hardware
- memory controller striped) across multiple DIMMs. When interleaved the
- platform may optionally provide details of which DIMMs are participating
- in the interleave.
-
- Note that while LIBNVDIMM describes system-physical-address ranges that may
- alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
- alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
- distinction. The different device-types are an implementation detail
- that userspace can exploit to implement policies like "only interface
- with address ranges from certain DIMMs". It is worth noting that when
- aliasing is present and a DIMM lacks a label, then no block device can
- be created by default as userspace needs to do at least one allocation
- of DPA to the PMEM range. In contrast ND_NAMESPACE_IO ranges, once
- registered, can be immediately attached to nd_pmem.
-
- 2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
- defined apertures. A set of apertures will access just one DIMM.
- Multiple windows (apertures) allow multiple concurrent accesses, much like
- tagged-command-queuing, and would likely be used by different threads or
- different CPUs.
-
- The NFIT specification defines a standard format for a BLK-aperture, but
- the spec also allows for vendor specific layouts, and non-NFIT BLK
- implementations may have other designs for BLK I/O. For this reason
- "nd_blk" calls back into platform-specific code to perform the I/O.
-
- One such implementation is defined in the "Driver Writer's Guide" and "DSM
- Interface Example".
-
-
-Why BLK?
-========
+platform message-passing entry points for control and configuration.
+
+PMEM (nd_pmem.ko): Drives a system-physical-address range. This range is
+contiguous in system memory and may be interleaved (hardware memory controller
+striped) across multiple DIMMs. When interleaved the platform may optionally
+provide details of which DIMMs are participating in the interleave.
+
+It is worth noting that when the labeling capability is detected (a EFI
+namespace label index block is found), then no block device is created
+by default as userspace needs to do at least one allocation of DPA to
+the PMEM range. In contrast ND_NAMESPACE_IO ranges, once registered,
+can be immediately attached to nd_pmem. This latter mode is called
+label-less or "legacy".
+
+PMEM-REGIONs, Atomic Sectors, and DAX
+-------------------------------------
-While PMEM provides direct byte-addressable CPU-load/store access to
-NVDIMM storage, it does not provide the best system RAS (recovery,
-availability, and serviceability) model. An access to a corrupted
-system-physical-address address causes a CPU exception while an access
-to a corrupted address through an BLK-aperture causes that block window
-to raise an error status in a register. The latter is more aligned with
-the standard error model that host-bus-adapter attached disks present.
-
-Also, if an administrator ever wants to replace a memory it is easier to
-service a system at DIMM module boundaries. Compare this to PMEM where
-data could be interleaved in an opaque hardware specific manner across
-several DIMMs.
-
-PMEM vs BLK
------------
-
-BLK-apertures solve these RAS problems, but their presence is also the
-major contributing factor to the complexity of the ND subsystem. They
-complicate the implementation because PMEM and BLK alias in DPA space.
-Any given DIMM's DPA-range may contribute to one or more
-system-physical-address sets of interleaved DIMMs, *and* may also be
-accessed in its entirety through its BLK-aperture. Accessing a DPA
-through a system-physical-address while simultaneously accessing the
-same DPA through a BLK-aperture has undefined results. For this reason,
-DIMMs with this dual interface configuration include a DSM function to
-store/retrieve a LABEL. The LABEL effectively partitions the DPA-space
-into exclusive system-physical-address and BLK-aperture accessible
-regions. For simplicity a DIMM is allowed a PMEM "region" per each
-interleave set in which it is a member. The remaining DPA space can be
-carved into an arbitrary number of BLK devices with discontiguous
-extents.
-
-BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-One of the few
-reasons to allow multiple BLK namespaces per REGION is so that each
-BLK-namespace can be configured with a BTT with unique atomic sector
-sizes. While a PMEM device can host a BTT the LABEL specification does
-not provide for a sector size to be specified for a PMEM namespace.
-
-This is due to the expectation that the primary usage model for PMEM is
-via DAX, and the BTT is incompatible with DAX. However, for the cases
-where an application or filesystem still needs atomic sector update
-guarantees it can register a BTT on a PMEM device or partition. See
+For the cases where an application or filesystem still needs atomic sector
+update guarantees it can register a BTT on a PMEM device or partition. See
LIBNVDIMM/NDCTL: Block Translation Table "btt"
@@ -236,51 +158,40 @@ For the remainder of this document the following diagram will be
referenced for any example sysfs layouts::
- (a) (b) DIMM BLK-REGION
+ (a) (b) DIMM
+-------------------+--------+--------+--------+
- +------+ | pm0.0 | blk2.0 | pm1.0 | blk2.1 | 0 region2
+ +------+ | pm0.0 | free | pm1.0 | free | 0
| imc0 +--+- - - region0- - - +--------+ +--------+
- +--+---+ | pm0.0 | blk3.0 | pm1.0 | blk3.1 | 1 region3
+ +--+---+ | pm0.0 | free | pm1.0 | free | 1
| +-------------------+--------v v--------+
+--+---+ | |
| cpu0 | region1
+--+---+ | |
| +----------------------------^ ^--------+
- +--+---+ | blk4.0 | pm1.0 | blk4.0 | 2 region4
+ +--+---+ | free | pm1.0 | free | 2
| imc1 +--+----------------------------| +--------+
- +------+ | blk5.0 | pm1.0 | blk5.0 | 3 region5
+ +------+ | free | pm1.0 | free | 3
+----------------------------+--------+--------+
In this platform we have four DIMMs and two memory controllers in one
-socket. Each unique interface (BLK or PMEM) to DPA space is identified
-by a region device with a dynamically assigned id (REGION0 - REGION5).
+socket. Each PMEM interleave set is identified by a region device with
+a dynamically assigned id.
1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
single PMEM namespace is created in the REGION0-SPA-range that spans most
of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
- interleaved system-physical-address range is reclaimed as BLK-aperture
- accessed space starting at DPA-offset (a) into each DIMM. In that
- reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
- REGION3 where "blk2.0" and "blk3.0" are just human readable names that
- could be set to any user-desired name in the LABEL.
+ interleaved system-physical-address range is left free for
+ another PMEM namespace to be defined.
2. In the last portion of DIMM0 and DIMM1 we have an interleaved
system-physical-address range, REGION1, that spans those two DIMMs as
well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace
- named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
- each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
- "blk5.0".
-
- 3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
- interleaved system-physical-address range (i.e. the DPA address past
- offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
- Note, that this example shows that BLK-aperture namespaces don't need to
- be contiguous in DPA-space.
+ named "pm1.0".
This bus is provided by the kernel under the device
/sys/devices/platform/nfit_test.0 when the nfit_test.ko module from
- tools/testing/nvdimm is loaded. This not only test LIBNVDIMM but the
- acpi_nfit.ko driver as well.
+ tools/testing/nvdimm is loaded. This module is a unit test for
+ LIBNVDIMM and the acpi_nfit.ko driver.
LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
@@ -469,17 +380,14 @@ identified by an "nfit_handle" a 32-bit value where:
LIBNVDIMM/LIBNDCTL: Region
--------------------------
-A generic REGION device is registered for each PMEM range or BLK-aperture
-set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
-sets on the "nfit_test.0" bus. The primary role of regions are to be a
-container of "mappings". A mapping is a tuple of <DIMM,
-DPA-start-offset, length>.
+A generic REGION device is registered for each PMEM interleave-set /
+range. Per the example there are 2 PMEM regions on the "nfit_test.0"
+bus. The primary role of regions are to be a container of "mappings". A
+mapping is a tuple of <DIMM, DPA-start-offset, length>.
-LIBNVDIMM provides a built-in driver for these REGION devices. This driver
-is responsible for reconciling the aliased DPA mappings across all
-regions, parsing the LABEL, if present, and then emitting NAMESPACE
-devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
-nd_blk device driver to consume.
+LIBNVDIMM provides a built-in driver for REGION devices. This driver
+is responsible for all parsing LABELs, if present, and then emitting NAMESPACE
+devices for the nd_pmem driver to consume.
In addition to the generic attributes of "mapping"s, "interleave_ways"
and "size" the REGION device also exports some convenience attributes.
@@ -493,8 +401,6 @@ LIBNVDIMM: region::
struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc);
- struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
- struct nd_region_desc *ndr_desc);
::
@@ -527,8 +433,9 @@ LIBNDCTL: region enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sample region retrieval routines based on NFIT-unique data like
-"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
-BLK::
+"spa_index" (interleave set id).
+
+::
static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
unsigned int spa_index)
@@ -544,139 +451,23 @@ BLK::
return NULL;
}
- static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
- unsigned int handle)
- {
- struct ndctl_region *region;
-
- ndctl_region_foreach(bus, region) {
- struct ndctl_mapping *map;
-
- if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
- continue;
- ndctl_mapping_foreach(region, map) {
- struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
-
- if (ndctl_dimm_get_handle(dimm) == handle)
- return region;
- }
- }
- return NULL;
- }
-
-
-Why Not Encode the Region Type into the Region Name?
-----------------------------------------------------
-
-At first glance it seems since NFIT defines just PMEM and BLK interface
-types that we should simply name REGION devices with something derived
-from those type names. However, the ND subsystem explicitly keeps the
-REGION name generic and expects userspace to always consider the
-region-attributes for four reasons:
-
- 1. There are already more than two REGION and "namespace" types. For
- PMEM there are two subtypes. As mentioned previously we have PMEM where
- the constituent DIMM devices are known and anonymous PMEM. For BLK
- regions the NFIT specification already anticipates vendor specific
- implementations. The exact distinction of what a region contains is in
- the region-attributes not the region-name or the region-devtype.
-
- 2. A region with zero child-namespaces is a possible configuration. For
- example, the NFIT allows for a DCR to be published without a
- corresponding BLK-aperture. This equates to a DIMM that can only accept
- control/configuration messages, but no i/o through a descendant block
- device. Again, this "type" is advertised in the attributes ('mappings'
- == 0) and the name does not tell you much.
-
- 3. What if a third major interface type arises in the future? Outside
- of vendor specific implementations, it's not difficult to envision a
- third class of interface type beyond BLK and PMEM. With a generic name
- for the REGION level of the device-hierarchy old userspace
- implementations can still make sense of new kernel advertised
- region-types. Userspace can always rely on the generic region
- attributes like "mappings", "size", etc and the expected child devices
- named "namespace". This generic format of the device-model hierarchy
- allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
- future-proof.
-
- 4. There are more robust mechanisms for determining the major type of a
- region than a device name. See the next section, How Do I Determine the
- Major Type of a Region?
-
-How Do I Determine the Major Type of a Region?
-----------------------------------------------
-
-Outside of the blanket recommendation of "use libndctl", or simply
-looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
-"nstype" integer attribute, here are some other options.
-
-1. module alias lookup
-^^^^^^^^^^^^^^^^^^^^^^
-
- The whole point of region/namespace device type differentiation is to
- decide which block-device driver will attach to a given LIBNVDIMM namespace.
- One can simply use the modalias to lookup the resulting module. It's
- important to note that this method is robust in the presence of a
- vendor-specific driver down the road. If a vendor-specific
- implementation wants to supplant the standard nd_blk driver it can with
- minimal impact to the rest of LIBNVDIMM.
-
- In fact, a vendor may also want to have a vendor-specific region-driver
- (outside of nd_region). For example, if a vendor defined its own LABEL
- format it would need its own region driver to parse that LABEL and emit
- the resulting namespaces. The output from module resolution is more
- accurate than a region-name or region-devtype.
-
-2. udev
-^^^^^^^
-
- The kernel "devtype" is registered in the udev database::
-
- # udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
- P: /devices/platform/nfit_test.0/ndbus0/region0
- E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
- E: DEVTYPE=nd_pmem
- E: MODALIAS=nd:t2
- E: SUBSYSTEM=nd
-
- # udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
- P: /devices/platform/nfit_test.0/ndbus0/region4
- E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
- E: DEVTYPE=nd_blk
- E: MODALIAS=nd:t3
- E: SUBSYSTEM=nd
-
- ...and is available as a region attribute, but keep in mind that the
- "devtype" does not indicate sub-type variations and scripts should
- really be understanding the other attributes.
-
-3. type specific attributes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- As it currently stands a BLK-aperture region will never have a
- "nfit/spa_index" attribute, but neither will a non-NFIT PMEM region. A
- BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
- that does not allow I/O. A PMEM region with a "mappings" value of zero
- is a simple system-physical-address range.
-
LIBNVDIMM/LIBNDCTL: Namespace
-----------------------------
-A REGION, after resolving DPA aliasing and LABEL specified boundaries,
-surfaces one or more "namespace" devices. The arrival of a "namespace"
-device currently triggers either the nd_blk or nd_pmem driver to load
-and register a disk/block device.
+A REGION, after resolving DPA aliasing and LABEL specified boundaries, surfaces
+one or more "namespace" devices. The arrival of a "namespace" device currently
+triggers the nd_pmem driver to load and register a disk/block device.
LIBNVDIMM: namespace
^^^^^^^^^^^^^^^^^^^^
-Here is a sample layout from the three major types of NAMESPACE where
-namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
-attribute), namespace2.0 represents a BLK namespace (note it has a
-'sector_size' attribute) that, and namespace6.0 represents an anonymous
-PMEM namespace (note that has no 'uuid' attribute due to not support a
-LABEL)::
+Here is a sample layout from the 2 major types of NAMESPACE where namespace0.0
+represents DIMM-info-backed PMEM (note that it has a 'uuid' attribute), and
+namespace1.0 represents an anonymous PMEM namespace (note that has no 'uuid'
+attribute due to not support a LABEL)
+
+::
/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
|-- alt_name
@@ -691,20 +482,7 @@ LABEL)::
|-- type
|-- uevent
`-- uuid
- /sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
- |-- alt_name
- |-- devtype
- |-- dpa_extents
- |-- force_raw
- |-- modalias
- |-- numa_node
- |-- sector_size
- |-- size
- |-- subsystem -> ../../../../../../bus/nd
- |-- type
- |-- uevent
- `-- uuid
- /sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
+ /sys/devices/platform/nfit_test.1/ndbus1/region1/namespace1.0
|-- block
| `-- pmem0
|-- devtype
@@ -786,9 +564,9 @@ Why the Term "namespace"?
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
-------------------------------------------------
-A BTT (design document: https://pmem.io/2014/09/23/btt.html) is a stacked
-block device driver that fronts either the whole block device or a
-partition of a block device emitted by either a PMEM or BLK NAMESPACE.
+A BTT (design document: https://pmem.io/2014/09/23/btt.html) is a
+personality driver for a namespace that fronts entire namespace as an
+'address abstraction'.
LIBNVDIMM: btt layout
^^^^^^^^^^^^^^^^^^^^^
@@ -815,7 +593,9 @@ LIBNDCTL: btt creation example
Similar to namespaces an idle BTT device is automatically created per
region. Each time this "seed" btt device is configured and enabled a new
seed is created. Creating a BTT configuration involves two steps of
-finding and idle BTT and assigning it to consume a PMEM or BLK namespace::
+finding and idle BTT and assigning it to consume a namespace.
+
+::
static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
{
@@ -863,25 +643,15 @@ For the given example above, here is the view of the objects as seen by the
LIBNDCTL API::
+---+
- |CTX| +---------+ +--------------+ +---------------+
- +-+-+ +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
- | | +---------+ +--------------+ +---------------+
- +-------+ | | +---------+ +--------------+ +---------------+
- | DIMM0 <-+ | +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
- +-------+ | | | +---------+ +--------------+ +---------------+
+ |CTX|
+ +-+-+
+ |
+ +-------+ |
+ | DIMM0 <-+ | +---------+ +--------------+ +---------------+
+ +-------+ | | +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
| DIMM1 <-+ +-v--+ | +---------+ +--------------+ +---------------+
- +-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6 "blk2.0" |
- | DIMM2 <-+ +----+ | +---------+ | +--------------+ +----------------------+
- +-------+ | | +-> NAMESPACE2.1 +--> ND5 "blk2.1" | BTT2 |
- | DIMM3 <-+ | +--------------+ +----------------------+
- +-------+ | +---------+ +--------------+ +---------------+
- +-> REGION3 +-+-> NAMESPACE3.0 +--> ND4 "blk3.0" |
- | +---------+ | +--------------+ +----------------------+
- | +-> NAMESPACE3.1 +--> ND3 "blk3.1" | BTT1 |
- | +--------------+ +----------------------+
- | +---------+ +--------------+ +---------------+
- +-> REGION4 +---> NAMESPACE4.0 +--> ND2 "blk4.0" |
- | +---------+ +--------------+ +---------------+
- | +---------+ +--------------+ +----------------------+
- +-> REGION5 +---> NAMESPACE5.0 +--> ND1 "blk5.0" | BTT0 |
- +---------+ +--------------+ +---------------+------+
+ +-------+ +-+BUS0+-| +---------+ +--------------+ +----------------------+
+ | DIMM2 <-+ +----+ +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" | BTT1 |
+ +-------+ | | +---------+ +--------------+ +---------------+------+
+ | DIMM3 <-+
+ +-------+
diff --git a/Documentation/driver-api/nvmem.rst b/Documentation/driver-api/nvmem.rst
index 287e86819640..e3366322d46c 100644
--- a/Documentation/driver-api/nvmem.rst
+++ b/Documentation/driver-api/nvmem.rst
@@ -26,9 +26,7 @@ was a rather big abstraction leak.
This framework aims at solve these problems. It also introduces DT
representation for consumer devices to go get the data they require (MAC
-Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs. This
-framework is based on regmap, so that most of the abstraction available in
-regmap can be reused, across multiple types of buses.
+Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs.
NVMEM Providers
+++++++++++++++
@@ -45,23 +43,21 @@ nvmem_device pointer.
nvmem_unregister(nvmem) is used to unregister a previously registered provider.
-For example, a simple qfprom case::
+For example, a simple nvram case::
- static struct nvmem_config econfig = {
- .name = "qfprom",
- .owner = THIS_MODULE,
- };
-
- static int qfprom_probe(struct platform_device *pdev)
+ static int brcm_nvram_probe(struct platform_device *pdev)
{
+ struct nvmem_config config = {
+ .name = "brcm-nvram",
+ .reg_read = brcm_nvram_read,
+ };
...
- econfig.dev = &pdev->dev;
- nvmem = nvmem_register(&econfig);
- ...
- }
+ config.dev = &pdev->dev;
+ config.priv = priv;
+ config.size = resource_size(res);
-It is mandatory that the NVMEM provider has a regmap associated with its
-struct device. Failure to do would return error code from nvmem_register().
+ devm_nvmem_register(&config);
+ }
Users of board files can define and register nvmem cells using the
nvmem_cell_table struct::
diff --git a/Documentation/driver-api/serial/driver.rst b/Documentation/driver-api/serial/driver.rst
index 31bd4e16fb1f..06ec04ba086f 100644
--- a/Documentation/driver-api/serial/driver.rst
+++ b/Documentation/driver-api/serial/driver.rst
@@ -311,7 +311,7 @@ hardware.
This call must not sleep
set_ldisc(port,termios)
- Notifier for discipline change. See Documentation/driver-api/serial/tty.rst.
+ Notifier for discipline change. See Documentation/tty/tty_ldisc.rst.
Locking: caller holds tty_port->mutex
diff --git a/Documentation/driver-api/thermal/index.rst b/Documentation/driver-api/thermal/index.rst
index 4cb0b9b6bfb8..030306ffa408 100644
--- a/Documentation/driver-api/thermal/index.rst
+++ b/Documentation/driver-api/thermal/index.rst
@@ -17,3 +17,4 @@ Thermal
intel_powerclamp
nouveau_thermal
x86_pkg_temperature_thermal
+ intel_dptf
diff --git a/Documentation/driver-api/thermal/intel_dptf.rst b/Documentation/driver-api/thermal/intel_dptf.rst
new file mode 100644
index 000000000000..96668dca753a
--- /dev/null
+++ b/Documentation/driver-api/thermal/intel_dptf.rst
@@ -0,0 +1,272 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================================================
+Intel(R) Dynamic Platform and Thermal Framework Sysfs Interface
+===============================================================
+
+:Copyright: |copy| 2022 Intel Corporation
+
+:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+
+Introduction
+------------
+
+Intel(R) Dynamic Platform and Thermal Framework (DPTF) is a platform
+level hardware/software solution for power and thermal management.
+
+As a container for multiple power/thermal technologies, DPTF provides
+a coordinated approach for different policies to effect the hardware
+state of a system.
+
+Since it is a platform level framework, this has several components.
+Some parts of the technology is implemented in the firmware and uses
+ACPI and PCI devices to expose various features for monitoring and
+control. Linux has a set of kernel drivers exposing hardware interface
+to user space. This allows user space thermal solutions like
+"Linux Thermal Daemon" to read platform specific thermal and power
+tables to deliver adequate performance while keeping the system under
+thermal limits.
+
+DPTF ACPI Drivers interface
+----------------------------
+
+:file:`/sys/bus/platform/devices/<N>/uuids`, where <N>
+=INT3400|INTC1040|INTC1041|INTC10A0
+
+``available_uuids`` (RO)
+ A set of UUIDs strings presenting available policies
+ which should be notified to the firmware when the
+ user space can support those policies.
+
+ UUID strings:
+
+ "42A441D6-AE6A-462b-A84B-4A8CE79027D3" : Passive 1
+
+ "3A95C389-E4B8-4629-A526-C52C88626BAE" : Active
+
+ "97C68AE7-15FA-499c-B8C9-5DA81D606E0A" : Critical
+
+ "63BE270F-1C11-48FD-A6F7-3AF253FF3E2D" : Adaptive performance
+
+ "5349962F-71E6-431D-9AE8-0A635B710AEE" : Emergency call
+
+ "9E04115A-AE87-4D1C-9500-0F3E340BFE75" : Passive 2
+
+ "F5A35014-C209-46A4-993A-EB56DE7530A1" : Power Boss
+
+ "6ED722A7-9240-48A5-B479-31EEF723D7CF" : Virtual Sensor
+
+ "16CAF1B7-DD38-40ED-B1C1-1B8A1913D531" : Cooling mode
+
+ "BE84BABF-C4D4-403D-B495-3128FD44dAC1" : HDC
+
+``current_uuid`` (RW)
+ User space can write strings from available UUIDs, one at a
+ time.
+
+:file:`/sys/bus/platform/devices/<N>/`, where <N>
+=INT3400|INTC1040|INTC1041|INTC10A0
+
+``imok`` (WO)
+ User space daemon write 1 to respond to firmware event
+ for sending keep alive notification. User space receives
+ THERMAL_EVENT_KEEP_ALIVE kobject uevent notification when
+ firmware calls for user space to respond with imok ACPI
+ method.
+
+``odvp*`` (RO)
+ Firmware thermal status variable values. Thermal tables
+ calls for different processing based on these variable
+ values.
+
+``data_vault`` (RO)
+ Binary thermal table. Refer to
+ https:/github.com/intel/thermal_daemon for decoding
+ thermal table.
+
+
+ACPI Thermal Relationship table interface
+------------------------------------------
+
+:file:`/dev/acpi_thermal_rel`
+
+ This device provides IOCTL interface to read standard ACPI
+ thermal relationship tables via ACPI methods _TRT and _ART.
+ These IOCTLs are defined in
+ drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.h
+
+ IOCTLs:
+
+ ACPI_THERMAL_GET_TRT_LEN: Get length of TRT table
+
+ ACPI_THERMAL_GET_ART_LEN: Get length of ART table
+
+ ACPI_THERMAL_GET_TRT_COUNT: Number of records in TRT table
+
+ ACPI_THERMAL_GET_ART_COUNT: Number of records in ART table
+
+ ACPI_THERMAL_GET_TRT: Read binary TRT table, length to read is
+ provided via argument to ioctl().
+
+ ACPI_THERMAL_GET_ART: Read binary ART table, length to read is
+ provided via argument to ioctl().
+
+DPTF ACPI Sensor drivers
+-------------------------
+
+DPTF Sensor drivers are presented as standard thermal sysfs thermal_zone.
+
+
+DPTF ACPI Cooling drivers
+--------------------------
+
+DPTF cooling drivers are presented as standard thermal sysfs cooling_device.
+
+
+DPTF Processor thermal PCI Driver interface
+--------------------------------------------
+
+:file:`/sys/bus/pci/devices/0000\:00\:04.0/power_limits/`
+
+Refer to Documentation/power/powercap/powercap.rst for powercap
+ABI.
+
+``power_limit_0_max_uw`` (RO)
+ Maximum powercap sysfs constraint_0_power_limit_uw for Intel RAPL
+
+``power_limit_0_step_uw`` (RO)
+ Power limit increment/decrements for Intel RAPL constraint 0 power limit
+
+``power_limit_0_min_uw`` (RO)
+ Minimum powercap sysfs constraint_0_power_limit_uw for Intel RAPL
+
+``power_limit_0_tmin_us`` (RO)
+ Minimum powercap sysfs constraint_0_time_window_us for Intel RAPL
+
+``power_limit_0_tmax_us`` (RO)
+ Maximum powercap sysfs constraint_0_time_window_us for Intel RAPL
+
+``power_limit_1_max_uw`` (RO)
+ Maximum powercap sysfs constraint_1_power_limit_uw for Intel RAPL
+
+``power_limit_1_step_uw`` (RO)
+ Power limit increment/decrements for Intel RAPL constraint 1 power limit
+
+``power_limit_1_min_uw`` (RO)
+ Minimum powercap sysfs constraint_1_power_limit_uw for Intel RAPL
+
+``power_limit_1_tmin_us`` (RO)
+ Minimum powercap sysfs constraint_1_time_window_us for Intel RAPL
+
+``power_limit_1_tmax_us`` (RO)
+ Maximum powercap sysfs constraint_1_time_window_us for Intel RAPL
+
+:file:`/sys/bus/pci/devices/0000\:00\:04.0/`
+
+``tcc_offset_degree_celsius`` (RW)
+ TCC offset from the critical temperature where hardware will throttle
+ CPU.
+
+:file:`/sys/bus/pci/devices/0000\:00\:04.0/workload_request`
+
+``workload_available_types`` (RO)
+ Available workload types. User space can specify one of the workload type
+ it is currently executing via workload_type. For example: idle, bursty,
+ sustained etc.
+
+``workload_type`` (RW)
+ User space can specify any one of the available workload type using
+ this interface.
+
+DPTF Processor thermal RFIM interface
+--------------------------------------------
+
+RFIM interface allows adjustment of FIVR (Fully Integrated Voltage Regulator)
+and DDR (Double Data Rate)frequencies to avoid RF interference with WiFi and 5G.
+
+Switching voltage regulators (VR) generate radiated EMI or RFI at the
+fundamental frequency and its harmonics. Some harmonics may interfere
+with very sensitive wireless receivers such as Wi-Fi and cellular that
+are integrated into host systems like notebook PCs. One of mitigation
+methods is requesting SOC integrated VR (IVR) switching frequency to a
+small % and shift away the switching noise harmonic interference from
+radio channels. OEM or ODMs can use the driver to control SOC IVR
+operation within the range where it does not impact IVR performance.
+
+DRAM devices of DDR IO interface and their power plane can generate EMI
+at the data rates. Similar to IVR control mechanism, Intel offers a
+mechanism by which DDR data rates can be changed if several conditions
+are met: there is strong RFI interference because of DDR; CPU power
+management has no other restriction in changing DDR data rates;
+PC ODMs enable this feature (real time DDR RFI Mitigation referred to as
+DDR-RFIM) for Wi-Fi from BIOS.
+
+
+FIVR attributes
+
+:file:`/sys/bus/pci/devices/0000\:00\:04.0/fivr/`
+
+``vco_ref_code_lo`` (RW)
+ The VCO reference code is an 11-bit field and controls the FIVR
+ switching frequency. This is the 3-bit LSB field.
+
+``vco_ref_code_hi`` (RW)
+ The VCO reference code is an 11-bit field and controls the FIVR
+ switching frequency. This is the 8-bit MSB field.
+
+``spread_spectrum_pct`` (RW)
+ Set the FIVR spread spectrum clocking percentage
+
+``spread_spectrum_clk_enable`` (RW)
+ Enable/disable of the FIVR spread spectrum clocking feature
+
+``rfi_vco_ref_code`` (RW)
+ This field is a read only status register which reflects the
+ current FIVR switching frequency
+
+``fivr_fffc_rev`` (RW)
+ This field indicated the revision of the FIVR HW.
+
+
+DVFS attributes
+
+:file:`/sys/bus/pci/devices/0000\:00\:04.0/dvfs/`
+
+``rfi_restriction_run_busy`` (RW)
+ Request the restriction of specific DDR data rate and set this
+ value 1. Self reset to 0 after operation.
+
+``rfi_restriction_err_code`` (RW)
+ 0 :Request is accepted, 1:Feature disabled,
+ 2: the request restricts more points than it is allowed
+
+``rfi_restriction_data_rate_Delta`` (RW)
+ Restricted DDR data rate for RFI protection: Lower Limit
+
+``rfi_restriction_data_rate_Base`` (RW)
+ Restricted DDR data rate for RFI protection: Upper Limit
+
+``ddr_data_rate_point_0`` (RO)
+ DDR data rate selection 1st point
+
+``ddr_data_rate_point_1`` (RO)
+ DDR data rate selection 2nd point
+
+``ddr_data_rate_point_2`` (RO)
+ DDR data rate selection 3rd point
+
+``ddr_data_rate_point_3`` (RO)
+ DDR data rate selection 4th point
+
+``rfi_disable (RW)``
+ Disable DDR rate change feature
+
+DPTF Power supply and Battery Interface
+----------------------------------------
+
+Refer to Documentation/ABI/testing/sysfs-platform-dptf
+
+DPTF Fan Control
+----------------------------------------
+
+Refer to Documentation/admin-guide/acpi/fan_performance_states.rst
diff --git a/Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst b/Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
new file mode 100644
index 000000000000..b7b99b876b50
--- /dev/null
+++ b/Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
@@ -0,0 +1,35 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Acceptance criteria for vfio-pci device specific driver variants
+================================================================
+
+Overview
+--------
+The vfio-pci driver exists as a device agnostic driver using the
+system IOMMU and relying on the robustness of platform fault
+handling to provide isolated device access to userspace. While the
+vfio-pci driver does include some device specific support, further
+extensions for yet more advanced device specific features are not
+sustainable. The vfio-pci driver has therefore split out
+vfio-pci-core as a library that may be reused to implement features
+requiring device specific knowledge, ex. saving and loading device
+state for the purposes of supporting migration.
+
+In support of such features, it's expected that some device specific
+variants may interact with parent devices (ex. SR-IOV PF in support of
+a user assigned VF) or other extensions that may not be otherwise
+accessible via the vfio-pci base driver. Authors of such drivers
+should be diligent not to create exploitable interfaces via these
+interactions or allow unchecked userspace data to have an effect
+beyond the scope of the assigned device.
+
+New driver submissions are therefore requested to have approval via
+sign-off/ack/review/etc for any interactions with parent drivers.
+Additionally, drivers should make an attempt to provide sufficient
+documentation for reviewers to understand the device specific
+extensions, for example in the case of migration data, how is the
+device state composed and consumed, which portions are not otherwise
+available to the user via vfio-pci, what safeguards exist to validate
+the data, etc. To that extent, authors should additionally expect to
+require reviews from at least one of the listed reviewers, in addition
+to the overall vfio maintainer.