Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/folio_queue.rst | 3
-rw-r--r-- | Documentation/core-api/index.rst | 1
-rw-r--r-- | Documentation/core-api/kho/bindings/kho.yaml | 43
-rw-r--r-- | Documentation/core-api/kho/bindings/memblock/memblock.yaml | 39
-rw-r--r-- | Documentation/core-api/kho/bindings/memblock/reserve-mem.yaml | 40
-rw-r--r-- | Documentation/core-api/kho/bindings/sub-fdt.yaml | 27
-rw-r--r-- | Documentation/core-api/kho/concepts.rst | 74
-rw-r--r-- | Documentation/core-api/kho/fdt.rst | 80
-rw-r--r-- | Documentation/core-api/kho/index.rst | 13
-rw-r--r-- | Documentation/core-api/symbol-namespaces.rst | 63
10 files changed, 354 insertions, 29 deletions
diff --git a/Documentation/core-api/folio_queue.rst b/Documentation/core-api/folio_queue.rst
index 1fe7a9bc4b8d..83cfbc157e49 100644
--- a/Documentation/core-api/folio_queue.rst
+++ b/Documentation/core-api/folio_queue.rst
@@ -151,19 +151,16 @@ The marks can be set by::
 
 	void folioq_mark(struct folio_queue *folioq, unsigned int slot);
 	void folioq_mark2(struct folio_queue *folioq, unsigned int slot);
-	void folioq_mark3(struct folio_queue *folioq, unsigned int slot);
 
 Cleared by::
 
 	void folioq_unmark(struct folio_queue *folioq, unsigned int slot);
 	void folioq_unmark2(struct folio_queue *folioq, unsigned int slot);
-	void folioq_unmark3(struct folio_queue *folioq, unsigned int slot);
 
 And the marks can be queried by::
 
 	bool folioq_is_marked(const struct folio_queue *folioq, unsigned int slot);
 	bool folioq_is_marked2(const struct folio_queue *folioq, unsigned int slot);
-	bool folioq_is_marked3(const struct folio_queue *folioq, unsigned int slot);
 
 The marks can be used for any purpose and are not interpreted by this API.
 
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index e9789bd381d8..7a4ca18ca6e2 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -115,6 +115,7 @@ more memory-management documentation in Documentation/mm/index.rst.
    pin_user_pages
    boot-time-mm
    gfp_mask-from-fs-io
+   kho/index
 
 Interfaces for kernel debugging
 ===============================
diff --git a/Documentation/core-api/kho/bindings/kho.yaml b/Documentation/core-api/kho/bindings/kho.yaml
new file mode 100644
index 000000000000..11e8ab7b219d
--- /dev/null
+++ b/Documentation/core-api/kho/bindings/kho.yaml
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+title: Kexec HandOver (KHO) root tree
+
+maintainers:
+  - Mike Rapoport <rppt@kernel.org>
+  - Changyuan Lyu <changyuanl@google.com>
+
+description: |
+  System memory preserved by KHO across kexec.
+
+properties:
+  compatible:
+    enum:
+      - kho-v1
+
+  preserved-memory-map:
+    description: |
+      physical address (u64) of an in-memory structure describing all preserved
+      folios and memory ranges.
+
+patternProperties:
+  "$[0-9a-f_]+^":
+    $ref: sub-fdt.yaml#
+    description: physical address of a KHO user's own FDT.
+
+required:
+  - compatible
+  - preserved-memory-map
+
+additionalProperties: false
+
+examples:
+  - |
+    kho {
+        compatible = "kho-v1";
+        preserved-memory-map = <0xf0be16 0x1000000>;
+
+        memblock {
+                fdt = <0x80cc16 0x1000000>;
+        };
+    };
diff --git a/Documentation/core-api/kho/bindings/memblock/memblock.yaml b/Documentation/core-api/kho/bindings/memblock/memblock.yaml
new file mode 100644
index 000000000000..d388c28eb91d
--- /dev/null
+++ b/Documentation/core-api/kho/bindings/memblock/memblock.yaml
@@ -0,0 +1,39 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+title: Memblock reserved memory
+
+maintainers:
+  - Mike Rapoport <rppt@kernel.org>
+
+description: |
+  Memblock can serialize its current memory reservations created with
+  reserve_mem command line option across kexec through KHO.
+  The post-KHO kernel can then consume these reservations and they are
+  guaranteed to have the same physical address.
+
+properties:
+  compatible:
+    enum:
+      - reserve-mem-v1
+
+patternProperties:
+  "$[0-9a-f_]+^":
+    $ref: reserve-mem.yaml#
+    description: reserved memory regions
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+    memblock {
+      compatible = "memblock-v1";
+      n1 {
+        compatible = "reserve-mem-v1";
+        start = <0xc06b 0x4000000>;
+        size = <0x04 0x00>;
+      };
+    };
diff --git a/Documentation/core-api/kho/bindings/memblock/reserve-mem.yaml b/Documentation/core-api/kho/bindings/memblock/reserve-mem.yaml
new file mode 100644
index 000000000000..10282d3d1bcd
--- /dev/null
+++ b/Documentation/core-api/kho/bindings/memblock/reserve-mem.yaml
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+title: Memblock reserved memory regions
+
+maintainers:
+  - Mike Rapoport <rppt@kernel.org>
+
+description: |
+  Memblock can serialize its current memory reservations created with
+  reserve_mem command line option across kexec through KHO.
+  This object describes each such region.
+
+properties:
+  compatible:
+    enum:
+      - reserve-mem-v1
+
+  start:
+    description: |
+      physical address (u64) of the reserved memory region.
+
+  size:
+    description: |
+      size (u64) of the reserved memory region.
+
+required:
+  - compatible
+  - start
+  - size
+
+additionalProperties: false
+
+examples:
+  - |
+    n1 {
+      compatible = "reserve-mem-v1";
+      start = <0xc06b 0x4000000>;
+      size = <0x04 0x00>;
+    };
diff --git a/Documentation/core-api/kho/bindings/sub-fdt.yaml b/Documentation/core-api/kho/bindings/sub-fdt.yaml
new file mode 100644
index 000000000000..b9a3d2d24850
--- /dev/null
+++ b/Documentation/core-api/kho/bindings/sub-fdt.yaml
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+title: KHO users' FDT address
+
+maintainers:
+  - Mike Rapoport <rppt@kernel.org>
+  - Changyuan Lyu <changyuanl@google.com>
+
+description: |
+  Physical address of an FDT blob registered by a KHO user.
+
+properties:
+  fdt:
+    description: |
+      physical address (u64) of an FDT blob.
+
+required:
+  - fdt
+
+additionalProperties: false
+
+examples:
+  - |
+    memblock {
+            fdt = <0x80cc16 0x1000000>;
+    };
diff --git a/Documentation/core-api/kho/concepts.rst b/Documentation/core-api/kho/concepts.rst
new file mode 100644
index 000000000000..36d5c05cfb30
--- /dev/null
+++ b/Documentation/core-api/kho/concepts.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+.. _kho-concepts:
+
+=======================
+Kexec Handover Concepts
+=======================
+
+Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory
+regions, which could contain serialized system states, across kexec.
+
+It introduces multiple concepts:
+
+KHO FDT
+=======
+
+Every KHO kexec carries a KHO specific flattened device tree (FDT) blob
+that describes preserved memory regions. These regions contain either
+serialized subsystem states, or in-memory data that shall not be touched
+across kexec. After KHO, subsystems can retrieve and restore preserved
+memory regions from KHO FDT.
+
+KHO only uses the FDT container format and libfdt library, but does not
+adhere to the same property semantics that normal device trees do: Properties
+are passed in native endianness and standardized properties like ``regs`` and
+``ranges`` do not exist, hence there are no ``#...-cells`` properties.
+
+KHO is still under development. The FDT schema is unstable and would change
+in the future.
+
+Scratch Regions
+===============
+
+To boot into kexec, we need to have a physically contiguous memory range that
+contains no handed over memory. Kexec then places the target kernel and initrd
+into that region. The new kernel exclusively uses this region for memory
+allocations before during boot up to the initialization of the page allocator.
+
+We guarantee that we always have such regions through the scratch regions: On
+first boot KHO allocates several physically contiguous memory regions. Since
+after kexec these regions will be used by early memory allocations, there is a
+scratch region per NUMA node plus a scratch region to satisfy allocations
+requests that do not require particular NUMA node assignment.
+By default, size of the scratch region is calculated based on amount of memory
+allocated during boot. The ``kho_scratch`` kernel command line option may be
+used to explicitly define size of the scratch regions.
+The scratch regions are declared as CMA when page allocator is initialized so
+that their memory can be used during system lifetime. CMA gives us the
+guarantee that no handover pages land in that region, because handover pages
+must be at a static physical memory location and CMA enforces that only
+movable pages can be located inside.
+
+After KHO kexec, we ignore the ``kho_scratch`` kernel command line option and
+instead reuse the exact same region that was originally allocated. This allows
+us to recursively execute any amount of KHO kexecs. Because we used this region
+for boot memory allocations and as target memory for kexec blobs, some parts
+of that memory region may be reserved. These reservations are irrelevant for
+the next KHO, because kexec can overwrite even the original kernel.
+
+.. _kho-finalization-phase:
+
+KHO finalization phase
+======================
+
+To enable user space based kexec file loader, the kernel needs to be able to
+provide the FDT that describes the current kernel's state before
+performing the actual kexec. The process of generating that FDT is
+called serialization. When the FDT is generated, some properties
+of the system may become immutable because they are already written down
+in the FDT. That state is called the KHO finalization phase.
+
+Public API
+==========
+.. kernel-doc:: kernel/kexec_handover.c
+   :export:
diff --git a/Documentation/core-api/kho/fdt.rst b/Documentation/core-api/kho/fdt.rst
new file mode 100644
index 000000000000..62505285d60d
--- /dev/null
+++ b/Documentation/core-api/kho/fdt.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+=======
+KHO FDT
+=======
+
+KHO uses the flattened device tree (FDT) container format and libfdt
+library to create and parse the data that is passed between the
+kernels. The properties in KHO FDT are stored in native format.
+It includes the physical address of an in-memory structure describing
+all preserved memory regions, as well as physical addresses of KHO users'
+own FDTs. Interpreting those sub FDTs is the responsibility of KHO users.
+
+KHO nodes and properties
+========================
+
+Property ``preserved-memory-map``
+---------------------------------
+
+KHO saves a special property named ``preserved-memory-map`` under the root node.
+This node contains the physical address of an in-memory structure for KHO to
+preserve memory regions across kexec.
+
+Property ``compatible``
+-----------------------
+
+The ``compatible`` property determines compatibility between the kernel
+that created the KHO FDT and the kernel that attempts to load it.
+If the kernel that loads the KHO FDT is not compatible with it, the entire
+KHO process will be bypassed.
+
+Property ``fdt``
+----------------
+
+Generally, a KHO user serialize its state into its own FDT and instructs
+KHO to preserve the underlying memory, such that after kexec, the new kernel
+can recover its state from the preserved FDT.
+
+A KHO user thus can create a node in KHO root tree and save the physical address
+of its own FDT in that node's property ``fdt`` .
+
+Examples
+========
+
+The following example demonstrates KHO FDT that preserves two memory
+regions created with ``reserve_mem`` kernel command line parameter::
+
+  /dts-v1/;
+
+  / {
+    compatible = "kho-v1";
+
+    preserved-memory-map = <0x40be16 0x1000000>;
+
+    memblock {
+      fdt = <0x1517 0x1000000>;
+    };
+  };
+
+where the ``memblock`` node contains an FDT that is requested by the
+subsystem memblock for preservation. The FDT contains the following
+serialized data::
+
+  /dts-v1/;
+
+  / {
+    compatible = "memblock-v1";
+
+    n1 {
+      compatible = "reserve-mem-v1";
+      start = <0xc06b 0x4000000>;
+      size = <0x04 0x00>;
+    };
+
+    n2 {
+      compatible = "reserve-mem-v1";
+      start = <0xc067 0x4000000>;
+      size = <0x04 0x00>;
+    };
+  };
diff --git a/Documentation/core-api/kho/index.rst b/Documentation/core-api/kho/index.rst
new file mode 100644
index 000000000000..0c63b0c5c143
--- /dev/null
+++ b/Documentation/core-api/kho/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+========================
+Kexec Handover Subsystem
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   concepts
+   fdt
+
+.. only:: subproject and html
diff --git a/Documentation/core-api/symbol-namespaces.rst b/Documentation/core-api/symbol-namespaces.rst
index 06f766a6aab2..32fc73dc5529 100644
--- a/Documentation/core-api/symbol-namespaces.rst
+++ b/Documentation/core-api/symbol-namespaces.rst
@@ -6,18 +6,8 @@ The following document describes how to use Symbol Namespaces to structure the
 export surface of in-kernel symbols exported through the family of
 EXPORT_SYMBOL() macros.
 
-.. Table of Contents
-
-	=== 1 Introduction
-	=== 2 How to define Symbol Namespaces
-	   --- 2.1 Using the EXPORT_SYMBOL macros
-	   --- 2.2 Using the DEFAULT_SYMBOL_NAMESPACE define
-	=== 3 How to use Symbols exported in Namespaces
-	=== 4 Loading Modules that use namespaced Symbols
-	=== 5 Automatically creating MODULE_IMPORT_NS statements
-
-1. Introduction
-===============
+Introduction
+============
 
 Symbol Namespaces have been introduced as a means to structure the export
 surface of the in-kernel API. It allows subsystem maintainers to partition
@@ -28,15 +18,18 @@ kernel. As of today, modules that make use of symbols exported into namespaces,
 are required to import the namespace. Otherwise the kernel will, depending on
 its configuration, reject loading the module or warn about a missing import.
 
-2. How to define Symbol Namespaces
-==================================
+Additionally, it is possible to put symbols into a module namespace, strictly
+limiting which modules are allowed to use these symbols.
+
+How to define Symbol Namespaces
+===============================
 
 Symbols can be exported into namespace using different methods. All of them are
 changing the way EXPORT_SYMBOL and friends are instrumented to create ksymtab
 entries.
 
-2.1 Using the EXPORT_SYMBOL macros
-==================================
+Using the EXPORT_SYMBOL macros
+------------------------------
 
 In addition to the macros EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL(), that allow
 exporting of kernel symbols to the kernel symbol table, variants of these are
@@ -54,8 +47,8 @@ refer to ``NULL``. There is no default namespace if none is defined. ``modpost``
 and kernel/module/main.c make use the namespace at build time or module load
 time, respectively.
 
-2.2 Using the DEFAULT_SYMBOL_NAMESPACE define
-=============================================
+Using the DEFAULT_SYMBOL_NAMESPACE define
+-----------------------------------------
 
 Defining namespaces for all symbols of a subsystem can be very verbose and may
 become hard to maintain. Therefore a default define (DEFAULT_SYMBOL_NAMESPACE)
@@ -83,8 +76,24 @@ unit as preprocessor statement. The above example would then read::
 within the corresponding compilation unit before the #include for
 <linux/export.h>. Typically it's placed before the first #include statement.
 
-3. How to use Symbols exported in Namespaces
-============================================
+Using the EXPORT_SYMBOL_GPL_FOR_MODULES() macro
+-----------------------------------------------
+
+Symbols exported using this macro are put into a module namespace. This
+namespace cannot be imported.
+
+The macro takes a comma separated list of module names, allowing only those
+modules to access this symbol. Simple tail-globs are supported.
+
+For example::
+
+  EXPORT_SYMBOL_GPL_FOR_MODULES(preempt_notifier_inc, "kvm,kvm-*")
+
+will limit usage of this symbol to modules whoes name matches the given
+patterns.
+
+How to use Symbols exported in Namespaces
+=========================================
 
 In order to use symbols that are exported into namespaces, kernel modules need
 to explicitly import these namespaces. Otherwise the kernel might reject to
@@ -106,11 +115,10 @@ inspected with modinfo::
 
 
 It is advisable to add the MODULE_IMPORT_NS() statement close to other module
-metadata definitions like MODULE_AUTHOR() or MODULE_LICENSE(). Refer to section
-5. for a way to create missing import statements automatically.
+metadata definitions like MODULE_AUTHOR() or MODULE_LICENSE().
 
-4. Loading Modules that use namespaced Symbols
-==============================================
+Loading Modules that use namespaced Symbols
+===========================================
 
 At module loading time (e.g. ``insmod``), the kernel will check each symbol
 referenced from the module for its availability and whether the namespace it
@@ -121,8 +129,8 @@ allow loading of modules that don't satisfy this precondition, a configuration
 option is available: Setting MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS=y will
 enable loading regardless, but will emit a warning.
 
-5. Automatically creating MODULE_IMPORT_NS statements
-=====================================================
+Automatically creating MODULE_IMPORT_NS statements
+==================================================
 
 Missing namespaces imports can easily be detected at build time. In fact,
 modpost will emit a warning if a module uses a symbol from a namespace
@@ -154,3 +162,6 @@ in-tree modules::
 You can also run nsdeps for external module builds. A typical usage is::
 
 	$ make -C <path_to_kernel_src> M=$PWD nsdeps
+
+Note: it will happily generate an import statement for the module namespace;
+which will not work and generates build and runtime failures.