Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/RCU/Design/Requirements/Requirements.rst | 36
-rw-r--r-- Documentation/RCU/whatisRCU.rst | 1
-rw-r--r-- Documentation/admin-guide/bcache.rst | 3
-rw-r--r-- Documentation/admin-guide/cgroup-v2.rst | 42
-rw-r--r-- Documentation/admin-guide/kernel-parameters.txt | 170
-rw-r--r-- Documentation/admin-guide/perf/hisi-pmu.rst | 40
-rw-r--r-- Documentation/arch/x86/resctrl.rst | 7
-rw-r--r-- Documentation/arm64/acpi_object_usage.rst | 81
-rw-r--r-- Documentation/arm64/arm-acpi.rst | 169
-rw-r--r-- Documentation/arm64/booting.rst | 32
-rw-r--r-- Documentation/arm64/cpu-feature-registers.rst | 2
-rw-r--r-- Documentation/arm64/elf_hwcaps.rst | 3
-rw-r--r-- Documentation/arm64/index.rst | 2
-rw-r--r-- Documentation/arm64/kdump.rst | 92
-rw-r--r-- Documentation/arm64/memory.rst | 8
-rw-r--r-- Documentation/arm64/ptdump.rst | 96
-rw-r--r-- Documentation/arm64/silicon-errata.rst | 4
-rw-r--r-- Documentation/core-api/cpu_hotplug.rst | 13
-rw-r--r-- Documentation/core-api/kernel-api.rst | 12
-rw-r--r-- Documentation/core-api/pin_user_pages.rst | 6
-rw-r--r-- Documentation/dev-tools/kunit/architecture.rst | 4
-rw-r--r-- Documentation/dev-tools/kunit/start.rst | 7
-rw-r--r-- Documentation/dev-tools/kunit/usage.rst | 69
-rw-r--r-- Documentation/devicetree/bindings/ata/ahci-common.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/clock/canaan,k210-clk.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/firmware/qcom,scm.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/i2c/opencores,i2c-ocores.yaml | 1
-rw-r--r-- Documentation/devicetree/bindings/i3c/silvaco,i3c-master.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/interrupt-controller/loongson,eiointc.yaml | 59
-rw-r--r-- Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml | 50
-rw-r--r-- Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/net/realtek-bluetooth.yaml | 4
-rw-r--r-- Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml | 3
-rw-r--r-- Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.yaml | 5
-rw-r--r-- Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/riscv/canaan.yaml | 2
-rw-r--r-- Documentation/devicetree/bindings/thermal/armada-thermal.txt | 1
-rw-r--r-- Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.txt | 41
-rw-r--r-- Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.yaml | 48
-rw-r--r-- Documentation/devicetree/bindings/thermal/qcom-tsens.yaml | 32
-rw-r--r-- Documentation/devicetree/bindings/timer/brcm,kona-timer.txt | 25
-rw-r--r-- Documentation/devicetree/bindings/timer/brcm,kona-timer.yaml | 52
-rw-r--r-- Documentation/devicetree/bindings/timer/loongson,ls1x-pwmtimer.yaml | 48
-rw-r--r-- Documentation/devicetree/bindings/timer/ralink,rt2880-timer.yaml | 44
-rw-r--r-- Documentation/devicetree/usage-model.rst | 2
-rw-r--r-- Documentation/driver-api/edac.rst | 120
-rw-r--r-- Documentation/filesystems/directory-locking.rst | 26
-rw-r--r-- Documentation/filesystems/fsverity.rst | 192
-rw-r--r-- Documentation/process/changes.rst | 2
-rw-r--r-- Documentation/process/maintainer-tip.rst | 3
-rw-r--r-- Documentation/riscv/patch-acceptance.rst | 18
-rw-r--r-- Documentation/rust/quick-start.rst | 4
-rw-r--r-- Documentation/trace/user_events.rst | 7
-rw-r--r-- Documentation/translations/zh_CN/devicetree/usage-model.rst | 2
-rw-r--r-- Documentation/virt/paravirt_ops.rst | 16
56 files changed, 1292 insertions, 428 deletions
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 49387d823619..f3b605285a87 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2071,41 +2071,7 @@ call.
Because RCU avoids interrupting idle CPUs, it is illegal to execute an
RCU read-side critical section on an idle CPU. (Kernels built with
-``CONFIG_PROVE_RCU=y`` will splat if you try it.) The RCU_NONIDLE()
-macro and ``_rcuidle`` event tracing is provided to work around this
-restriction. In addition, rcu_is_watching() may be used to test
-whether or not it is currently legal to run RCU read-side critical
-sections on this CPU. I learned of the need for diagnostics on the one
-hand and RCU_NONIDLE() on the other while inspecting idle-loop code.
-Steven Rostedt supplied ``_rcuidle`` event tracing, which is used quite
-heavily in the idle loop. However, there are some restrictions on the
-code placed within RCU_NONIDLE():
-
-#. Blocking is prohibited. In practice, this is not a serious
- restriction given that idle tasks are prohibited from blocking to
- begin with.
-#. Although nesting RCU_NONIDLE() is permitted, they cannot nest
- indefinitely deeply. However, given that they can be nested on the
- order of a million deep, even on 32-bit systems, this should not be a
- serious restriction. This nesting limit would probably be reached
- long after the compiler OOMed or the stack overflowed.
-#. Any code path that enters RCU_NONIDLE() must sequence out of that
- same RCU_NONIDLE(). For example, the following is grossly
- illegal:
-
- ::
-
- 1 RCU_NONIDLE({
- 2 do_something();
- 3 goto bad_idea; /* BUG!!! */
- 4 do_something_else();});
- 5 bad_idea:
-
-
- It is just as illegal to transfer control into the middle of
- RCU_NONIDLE()'s argument. Yes, in theory, you could transfer in
- as long as you also transferred out, but in practice you could also
- expect to get sharply worded review comments.
+``CONFIG_PROVE_RCU=y`` will splat if you try it.)
It is similarly socially unacceptable to interrupt an ``nohz_full`` CPU
running in userspace. RCU must therefore track ``nohz_full`` userspace
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index 8eddef28d3a1..e488c8e557a9 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -1117,7 +1117,6 @@ All: lockdep-checked RCU utility APIs::
RCU_LOCKDEP_WARN
rcu_sleep_check
- RCU_NONIDLE
All: Unchecked RCU-protected pointer access::
diff --git a/Documentation/admin-guide/bcache.rst b/Documentation/admin-guide/bcache.rst
index bb5032a99234..6fdb495ac466 100644
--- a/Documentation/admin-guide/bcache.rst
+++ b/Documentation/admin-guide/bcache.rst
@@ -508,9 +508,6 @@ cache_miss_collisions
cache miss, but raced with a write and data was already present (usually 0
since the synchronization for cache misses was rewritten)
-cache_readaheads
- Count of times readahead occurred.
-
Sysfs - cache set
~~~~~~~~~~~~~~~~~
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index e592a9364473..c63358c38a1d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2022,31 +2022,33 @@ that attribute:
no-change
Do not modify the I/O priority class.
- none-to-rt
- For requests that do not have an I/O priority class (NONE),
- change the I/O priority class into RT. Do not modify
- the I/O priority class of other requests.
+ promote-to-rt
+ For requests that have a non-RT I/O priority class, change it into RT.
+ Also change the priority level of these requests to 4. Do not modify
+ the I/O priority of requests that have priority class RT.
restrict-to-be
For requests that do not have an I/O priority class or that have I/O
- priority class RT, change it into BE. Do not modify the I/O priority
- class of requests that have priority class IDLE.
+ priority class RT, change it into BE. Also change the priority level
+ of these requests to 0. Do not modify the I/O priority class of
+ requests that have priority class IDLE.
idle
Change the I/O priority class of all requests into IDLE, the lowest
I/O priority class.
+ none-to-rt
+ Deprecated. Just an alias for promote-to-rt.
+
The following numerical values are associated with the I/O priority policies:
-+-------------+---+
-| no-change   | 0 |
-+-------------+---+
-| none-to-rt  | 1 |
-+-------------+---+
-| rt-to-be    | 2 |
-+-------------+---+
-| all-to-idle | 3 |
-+-------------+---+
++----------------+---+
+| no-change      | 0 |
++----------------+---+
+| rt-to-be       | 2 |
++----------------+---+
+| all-to-idle    | 3 |
++----------------+---+
The numerical value that corresponds to each I/O priority class is as follows:
@@ -2062,9 +2064,13 @@ The numerical value that corresponds to each I/O priority class is as follows:
The algorithm to set the I/O priority class for a request is as follows:
-- Translate the I/O priority class policy into a number.
-- Change the request I/O priority class into the maximum of the I/O priority
- class policy number and the numerical I/O priority class.
+- If I/O priority class policy is promote-to-rt, change the request I/O
+ priority class to IOPRIO_CLASS_RT and change the request I/O priority
+ level to 4.
+- If I/O priority class policy is not promote-to-rt, translate the I/O priority
+ class policy into a number, then change the request I/O priority class
+ into the maximum of the I/O priority class policy number and the numerical
+ I/O priority class.
PID
---
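
The policy descriptions and the algorithm in the cgroup-v2.rst hunks above can be sketched in a few lines. This is a minimal illustration, not the kernel's blk-ioprio code. The function name `apply_policy` and the `POLICY_NUMBER` dictionary are hypothetical. The class numbers are assumed to be those of IOPRIO_CLASS_NONE/RT/BE/IDLE in include/uapi/linux/ioprio.h, and the table rows rt-to-be and all-to-idle are assumed to correspond to the restrict-to-be and idle policies. Where the policy descriptions are more specific than the algorithm bullets (RT requests are left alone under promote-to-rt), the sketch follows the descriptions.

```python
# I/O priority classes, assumed to match include/uapi/linux/ioprio.h.
IOPRIO_CLASS_NONE = 0
IOPRIO_CLASS_RT = 1
IOPRIO_CLASS_BE = 2
IOPRIO_CLASS_IDLE = 3

# Numerical values from the policy table (hypothetical container name).
POLICY_NUMBER = {"no-change": 0, "restrict-to-be": 2, "idle": 3}

# none-to-rt is documented as a deprecated alias for promote-to-rt.
PROMOTE_POLICIES = {"promote-to-rt", "none-to-rt"}


def apply_policy(policy, ioprio_class, level):
    """Return the (class, level) a request would end up with under `policy`."""
    if policy in PROMOTE_POLICIES:
        if ioprio_class == IOPRIO_CLASS_RT:
            # Requests that are already RT are not modified.
            return ioprio_class, level
        # Every non-RT request becomes RT with priority level 4.
        return IOPRIO_CLASS_RT, 4

    # Other policies: take the maximum of the policy number and the class.
    new_class = max(POLICY_NUMBER[policy], ioprio_class)

    # restrict-to-be also resets the level of the requests it converts to BE.
    if policy == "restrict-to-be" and ioprio_class in (IOPRIO_CLASS_NONE,
                                                       IOPRIO_CLASS_RT):
        level = 0
    return new_class, level
```

Because a numerically larger class means a lower priority, taking the maximum can only demote a request, which is why promotion to RT needs its own branch.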
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..2836780618a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -429,6 +429,9 @@
arm64.nosme [ARM64] Unconditionally disable Scalable Matrix
Extension support
+ arm64.nomops [ARM64] Unconditionally disable Memory Copy and Memory
+ Set instructions support
+
ataflop= [HW,M68k]
atarimouse= [HW,MOUSE] Atari Mouse
@@ -818,20 +821,6 @@
Format:
<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
- cpu0_hotplug [X86] Turn on CPU0 hotplug feature when
- CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
- Some features depend on CPU0. Known dependencies are:
- 1. Resume from suspend/hibernate depends on CPU0.
- Suspend/hibernate will fail if CPU0 is offline and you
- need to online CPU0 before suspend/hibernate.
- 2. PIC interrupts also depend on CPU0. CPU0 can't be
- removed if a PIC interrupt is detected.
- It's said poweroff/reboot may depend on CPU0 on some
- machines although I haven't seen such issues so far
- after CPU0 is offline on a few tested machines.
- If the dependencies are under your control, you can
- turn on cpu0_hotplug.
-
cpuidle.off=1 [CPU_IDLE]
disable the cpuidle sub-system
@@ -852,6 +841,12 @@
on every CPU online, such as boot, and resume from suspend.
Default: 10000
+ cpuhp.parallel=
+ [SMP] Enable/disable parallel bringup of secondary CPUs
+ Format: <bool>
+ Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
+ the parameter has no effect.
+
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
@@ -4736,43 +4731,6 @@
the propagation of recent CPU-hotplug changes up
the rcu_node combining tree.
- rcutree.use_softirq= [KNL]
- If set to zero, move all RCU_SOFTIRQ processing to
- per-CPU rcuc kthreads. Defaults to a non-zero
- value, meaning that RCU_SOFTIRQ is used by default.
- Specify rcutree.use_softirq=0 to use rcuc kthreads.
-
- But note that CONFIG_PREEMPT_RT=y kernels disable
- this kernel boot parameter, forcibly setting it
- to zero.
-
- rcutree.rcu_fanout_exact= [KNL]
- Disable autobalancing of the rcu_node combining
- tree. This is used by rcutorture, and might
- possibly be useful for architectures having high
- cache-to-cache transfer latencies.
-
- rcutree.rcu_fanout_leaf= [KNL]
- Change the number of CPUs assigned to each
- leaf rcu_node structure. Useful for very
- large systems, which will choose the value 64,
- and for NUMA systems with large remote-access
- latencies, which will choose a value aligned
- with the appropriate hardware boundaries.
-
- rcutree.rcu_min_cached_objs= [KNL]
- Minimum number of objects which are cached and
- maintained per one CPU. Object size is equal
- to PAGE_SIZE. The cache allows to reduce the
- pressure to page allocator, also it makes the
- whole algorithm to behave better in low memory
- condition.
-
- rcutree.rcu_delay_page_cache_fill_msec= [KNL]
- Set the page-cache refill delay (in milliseconds)
- in response to low-memory conditions. The range
- of permitted values is in the range 0:100000.
-
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
@@ -4811,21 +4769,6 @@
When RCU_NOCB_CPU is set, also adjust the
priority of NOCB callback kthreads.
- rcutree.rcu_divisor= [KNL]
- Set the shift-right count to use to compute
- the callback-invocation batch limit bl from
- the number of callbacks queued on this CPU.
- The result will be bounded below by the value of
- the rcutree.blimit kernel parameter. Every bl
- callbacks, the softirq handler will exit in
- order to allow the CPU to do other work.
-
- Please note that this callback-invocation batch
- limit applies only to non-offloaded callback
- invocation. Offloaded callbacks are instead
- invoked in the context of an rcuoc kthread, which
- scheduler will preempt as it does any other task.
-
rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
On callback-offloaded (rcu_nocbs) CPUs,
RCU reduces the lock contention that would
@@ -4839,14 +4782,6 @@
the ->nocb_bypass queue. The definition of "too
many" is supplied by this kernel boot parameter.
- rcutree.rcu_nocb_gp_stride= [KNL]
- Set the number of NOCB callback kthreads in
- each group, which defaults to the square root
- of the number of CPUs. Larger numbers reduce
- the wakeup overhead on the global grace-period
- kthread, but increases that same overhead on
- each group's NOCB grace-period kthread.
-
rcutree.qhimark= [KNL]
Set threshold of queued RCU callbacks beyond which
batch limiting is disabled.
@@ -4864,6 +4799,56 @@
on rcutree.qhimark at boot time and to zero to
disable more aggressive help enlistment.
+ rcutree.rcu_delay_page_cache_fill_msec= [KNL]
+ Set the page-cache refill delay (in milliseconds)
+ in response to low-memory conditions. The range
+ of permitted values is in the range 0:100000.
+
+ rcutree.rcu_divisor= [KNL]
+ Set the shift-right count to use to compute
+ the callback-invocation batch limit bl from
+ the number of callbacks queued on this CPU.
+ The result will be bounded below by the value of
+ the rcutree.blimit kernel parameter. Every bl
+ callbacks, the softirq handler will exit in
+ order to allow the CPU to do other work.
+
+ Please note that this callback-invocation batch
+ limit applies only to non-offloaded callback
+ invocation. Offloaded callbacks are instead
+ invoked in the context of an rcuoc kthread, which
+ scheduler will preempt as it does any other task.
+
+ rcutree.rcu_fanout_exact= [KNL]
+ Disable autobalancing of the rcu_node combining
+ tree. This is used by rcutorture, and might
+ possibly be useful for architectures having high
+ cache-to-cache transfer latencies.
+
+ rcutree.rcu_fanout_leaf= [KNL]
+ Change the number of CPUs assigned to each
+ leaf rcu_node structure. Useful for very
+ large systems, which will choose the value 64,
+ and for NUMA systems with large remote-access
+ latencies, which will choose a value aligned
+ with the appropriate hardware boundaries.
+
+ rcutree.rcu_min_cached_objs= [KNL]
+ Minimum number of objects which are cached and
+ maintained per one CPU. Object size is equal
+ to PAGE_SIZE. The cache allows to reduce the
+ pressure to page allocator, also it makes the
+ whole algorithm to behave better in low memory
+ condition.
+
+ rcutree.rcu_nocb_gp_stride= [KNL]
+ Set the number of NOCB callback kthreads in
+ each group, which defaults to the square root
+ of the number of CPUs. Larger numbers reduce
+ the wakeup overhead on the global grace-period
+ kthread, but increases that same overhead on
+ each group's NOCB grace-period kthread.
+
rcutree.rcu_kick_kthreads= [KNL]
Cause the grace-period kthread to get an extra
wake_up() if it sleeps three times longer than
@@ -4871,6 +4856,13 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
+ rcutree.rcu_resched_ns= [KNL]
+ Limit the time spent invoking a batch of RCU
+ callbacks to the specified number of nanoseconds.
+ By default, this limit is checked only once
+ every 32 callbacks in order to limit the pain
+ inflicted by local_clock() overhead.
+
rcutree.rcu_unlock_delay= [KNL]
In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
this specifies an rcu_read_unlock()-time delay
@@ -4885,6 +4877,16 @@
rcu_node tree with an eye towards determining
why a new grace period has not yet started.
+ rcutree.use_softirq= [KNL]
+ If set to zero, move all RCU_SOFTIRQ processing to
+ per-CPU rcuc kthreads. Defaults to a non-zero
+ value, meaning that RCU_SOFTIRQ is used by default.
+ Specify rcutree.use_softirq=0 to use rcuc kthreads.
+
+ But note that CONFIG_PREEMPT_RT=y kernels disable
+ this kernel boot parameter, forcibly setting it
+ to zero.
+
rcuscale.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
@@ -5087,8 +5089,17 @@
rcutorture.stall_cpu_block= [KNL]
Sleep while stalling if set. This will result
- in warnings from preemptible RCU in addition
- to any other stall-related activity.
+ in warnings from preemptible RCU in addition to
+ any other stall-related activity. Note that
+ in kernels built with CONFIG_PREEMPTION=n and
+ CONFIG_PREEMPT_COUNT=y, this parameter will
+ cause the CPU to pass through a quiescent state.
+ Given CONFIG_PREEMPTION=n, this will suppress
+ RCU CPU stall warnings, but will instead result
+ in scheduling-while-atomic splats.
+
+ Use of this module parameter results in splats.
+
rcutorture.stall_cpu_holdoff= [KNL]
Time to wait (s) after boot before inducing stall.
@@ -5452,7 +5463,12 @@
port and the regular usb controller gets disabled.
root= [KNL] Root filesystem
- See name_to_dev_t comment in init/do_mounts.c.
+ Usually this is a block device specifier of some kind,
+ see the early_lookup_bdev comment in
+ block/early-lookup.c for details.
+ Alternatively this can be "ram" for the legacy initial
+ ramdisk, "nfs" and "cifs" for root on a network file
+ system, or "mtd" and "ubi" for mounting from raw flash.
rootdelay= [KNL] Delay (in seconds) to pause before attempting to
mount the root filesystem
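
The rcutree.rcu_divisor text moved by the kernel-parameters.txt hunks above describes a small calculation that is easier to follow with numbers. The sketch below only restates that description: the function name `batch_limit` is hypothetical, and the argument values used in the usage note are illustrative, not claimed defaults.

```python
def batch_limit(queued_callbacks, rcu_divisor, blimit):
    """Callback-invocation batch limit 'bl' as the parameter text describes it.

    The number of queued callbacks is shifted right by rcu_divisor, and the
    result is bounded below by the rcutree.blimit value.
    """
    return max(blimit, queued_callbacks >> rcu_divisor)
```

With 4096 callbacks queued, a shift count of 7 and a lower bound of 10, the limit is 4096 >> 7 = 32. With only 100 callbacks queued the shift yields 0, so the lower bound of 10 applies.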
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 546979360513..e0174d20809a 100644
--- a/Documentation/admin-guide/perf/hisi-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -56,14 +56,14 @@ Example usage of perf::
For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.
-(a) L3C PMU supports filtering by core/thread within the cluster which can be
+1. L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::
$# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5
This will only count the operations from core/thread 0 and 1 in this cluster.
-(b) Tracetag allow the user to chose to count only read, write or atomic
+2. Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameeter in perf. The default value counts all
operations. tt_req is 3bits, 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
@@ -73,14 +73,16 @@ represents write operations, 3'b110 represents atomic store operations and
This will only count the read operations in this cluster.
-(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
+3. Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:
-5'b00001: comes from L3C in this die;
-5'b01000: comes from L3C in the cross-die;
-5'b01001: comes from L3C which is in another socket;
-5'b01110: comes from the local DDR;
-5'b01111: comes from the cross-die DDR;
-5'b10000: comes from cross-socket DDR;
+
+- 5'b00001: comes from L3C in this die;
+- 5'b01000: comes from L3C in the cross-die;
+- 5'b01001: comes from L3C which is in another socket;
+- 5'b01110: comes from the local DDR;
+- 5'b01111: comes from the cross-die DDR;
+- 5'b10000: comes from cross-socket DDR;
+
etc, it is mainly helpful to find that the data source is nearest from the CPU
cores. If datasrc_cfg is used in the multi-chips, the datasrc_skt shall be
configured in perf command::
@@ -88,15 +90,25 @@ configured in perf command::
$# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
-(d)Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
+4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11bits, include a 6-bit SCCL-ID and 5-bit
CCL/ICL-ID. For I/O die, the ICL-ID is followed by:
-5'b00000: I/O_MGMT_ICL;
-5'b00001: Network_ICL;
-5'b00011: HAC_ICL;
-5'b10000: PCIe_ICL;
+
+- 5'b00000: I/O_MGMT_ICL;
+- 5'b00001: Network_ICL;
+- 5'b00011: HAC_ICL;
+- 5'b10000: PCIe_ICL;
+
+5. uring_channel: UC PMU events 0x47~0x59 support filtering by tx request
+uring channel. It is 2 bits. Some important codes are as follows:
+
+- 2'b11: count the events which are sent to the uring_ext (MATA) channel;
+- 2'b01: is the same as 2'b11;
+- 2'b10: count the events which are sent to the uring (non-MATA) channel;
+- 2'b00: default value, count the events which are sent to both the uring and
+ uring_ext channels;
Users could configure IDs to count data come from specific CCL/ICL, by setting
srcid_cmd & srcid_msk, and data desitined for specific CCL/ICL by setting
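
The bit-field codes listed in the hisi-pmu.rst hunks above can be tabulated for quick lookup. The sketch below is only an illustration: the names `DATASRC`, `ICL_NAMES`, `describe_datasrc` and `split_cluster_id` are hypothetical, and placing the 6-bit SCCL-ID in the upper bits of the 11-bit ID is an assumption, since the text gives the field widths but not their order.

```python
# 5-bit datasrc codes listed in the documentation.
DATASRC = {
    0b00001: "L3C in this die",
    0b01000: "L3C in the cross-die",
    0b01001: "L3C in another socket",
    0b01110: "local DDR",
    0b01111: "cross-die DDR",
    0b10000: "cross-socket DDR",
}

# 5-bit ICL-ID codes listed for the I/O die.
ICL_NAMES = {
    0b00000: "I/O_MGMT_ICL",
    0b00001: "Network_ICL",
    0b00011: "HAC_ICL",
    0b10000: "PCIe_ICL",
}


def describe_datasrc(code):
    """Map a datasrc_cfg value to the data source named in the text."""
    return DATASRC.get(code, "not listed")


def split_cluster_id(ident):
    """Split an 11-bit ID into (SCCL-ID, CCL/ICL-ID).

    Assumption: SCCL-ID occupies the upper 6 bits, CCL/ICL-ID the lower 5.
    """
    if not 0 <= ident < (1 << 11):
        raise ValueError("ID must fit in 11 bits")
    return ident >> 5, ident & 0x1F
```

Under this reading, the two events in the perf example, datasrc_cfg=0xE and datasrc_cfg=0xF, select data coming from the local DDR and from the cross-die DDR.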
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 387ccbcb558f..cb05d90111b4 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -287,6 +287,13 @@ Removing a directory will move all tasks and cpus owned by the group it
represents to the parent. Removing one of the created CTRL_MON groups
will automatically remove all MON groups below it.
+Moving MON group directories to a new parent CTRL_MON group is supported
+for the purpose of changing the resource allocations of a MON group
+without impacting its monitoring data or assigned tasks. This operation
+is not allowed for MON groups which monitor CPUs. No other move
+operation is currently allowed other than simply renaming a CTRL_MON or
+MON group.
+
All groups contain the following files:
"tasks":
diff --git a/Documentation/arm64/acpi_object_usage.rst b/Documentation/arm64/acpi_object_usage.rst
index 484ef9676653..1da22200fdf8 100644
--- a/Documentation/arm64/acpi_object_usage.rst
+++ b/Documentation/arm64/acpi_object_usage.rst
@@ -17,16 +17,37 @@ For ACPI on arm64, tables also fall into the following categories:
- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
- - Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IBFT,
- IORT, MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT,
- STAO, TCPA, TPM2, UEFI, XENV
+ - Optional: AGDI, BGRT, CEDT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT,
+ HMAT, IBFT, IORT, MCHI, MPAM, MPST, MSCT, NFIT, PMTT, PPTT, RASF, SBST,
+ SDEI, SLIT, SPMI, SRAT, STAO, TCPA, TPM2, UEFI, XENV
- - Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT, MSDM, OEMx,
- PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
+ - Not supported: AEST, APMT, BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT,
+ MSDM, OEMx, PDTT, PSDT, RAS2, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
====== ========================================================================
Table Usage for ARMv8 Linux
====== ========================================================================
+AEST Signature Reserved (signature == "AEST")
+
+ **Arm Error Source Table**
+
+ This table informs the OS of any error nodes in the system that are
+ compliant with the Arm RAS architecture.
+
+AGDI Signature Reserved (signature == "AGDI")
+
+ **Arm Generic diagnostic Dump and Reset Device Interface Table**
+
+ This table describes a non-maskable event that is used by the platform
+ firmware to request the OS to generate a diagnostic dump and reset the device.
+
+APMT Signature Reserved (signature == "APMT")
+
+ **Arm Performance Monitoring Table**
+
+ This table describes the properties of PMU support implemented by
+ components in the system.
+
BERT Section 18.3 (signature == "BERT")
**Boot Error Record Table**
@@ -47,6 +68,13 @@ BGRT Section 5.2.22 (signature == "BGRT")
Optional, not currently supported, with no real use-case for an
ARM server.
+CEDT Signature Reserved (signature == "CEDT")
+
+ **CXL Early Discovery Table**
+
+ This table allows the OS to discover any CXL Host Bridges and the Host
+ Bridge registers.
+
CPEP Section 5.2.18 (signature == "CPEP")
**Corrected Platform Error Polling table**
@@ -184,6 +212,15 @@ HEST Section 18.3.2 (signature == "HEST")
Must be supplied if RAS support is provided by the platform. It
is recommended this table be supplied.
+HMAT Section 5.2.28 (signature == "HMAT")
+
+ **Heterogeneous Memory Attribute Table**
+
+ This table describes the memory attributes, such as memory side cache
+ attributes and bandwidth and latency details, related to Memory Proximity
+ Domains. The OS uses this information to optimize the system memory
+ configuration.
+
HPET Signature Reserved (signature == "HPET")
**High Precision Event timer Table**
@@ -241,6 +278,13 @@ MCHI Signature Reserved (signature == "MCHI")
Optional, not currently supported.
+MPAM Signature Reserved (signature == "MPAM")
+
+ **Memory Partitioning And Monitoring table**
+
+ This table allows the OS to discover the MPAM controls implemented by
+ the subsystems.
+
MPST Section 5.2.21 (signature == "MPST")
**Memory Power State Table**
@@ -281,18 +325,39 @@ PCCT Section 14.1 (signature == "PCCT)
Recommend for use on arm64; use of PCC is recommended when using CPPC
to control performance and power for platform processors.
+PDTT Section 5.2.29 (signature == "PDTT")
+
+ **Platform Debug Trigger Table**
+
+ This table describes PCC channels used to gather debug logs of
+ non-architectural features.
+
+
PMTT Section 5.2.21.12 (signature == "PMTT")
**Platform Memory Topology Table**
Optional, not currently supported.
+PPTT Section 5.2.30 (signature == "PPTT")
+
+ **Processor Properties Topology Table**
+
+ This table provides the processor and cache topology.
+
PSDT Section 5.2.11.3 (signature == "PSDT")
**Persistent System Description Table**
Obsolete table, will not be supported.
+RAS2 Section 5.2.21 (signature == "RAS2")
+
+ **RAS Features 2 table**
+
+ This table provides interfaces for the RAS capabilities implemented in
+ the platform.
+
RASF Section 5.2.20 (signature == "RASF")
**RAS Feature table**
@@ -318,6 +383,12 @@ SBST Section 5.2.14 (signature == "SBST")
Optional, not currently supported.
+SDEI Signature Reserved (signature == "SDEI")
+
+ **Software Delegated Exception Interface table**
+
+ This table advertises the presence of the SDEI interface.
+
SLIC Signature Reserved (signature == "SLIC")
**Software LIcensing table**
diff --git a/Documentation/arm64/arm-acpi.rst b/Documentation/arm64/arm-acpi.rst
index 47ecb9930dde..37ec5e9b1575 100644
--- a/Documentation/arm64/arm-acpi.rst
+++ b/Documentation/arm64/arm-acpi.rst
@@ -1,40 +1,41 @@
-=====================
-ACPI on ARMv8 Servers
-=====================
-
-ACPI can be used for ARMv8 general purpose servers designed to follow
-the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server
-Base Boot Requirements) [1] specifications. Please note that the SBBR
-can be retrieved simply by visiting [1], but the SBSA is currently only
-available to those with an ARM login due to ARM IP licensing concerns.
-
-The ARMv8 kernel implements the reduced hardware model of ACPI version
+===================
+ACPI on Arm systems
+===================
+
+ACPI can be used for Armv8 and Armv9 systems designed to follow
+the BSA (Arm Base System Architecture) [0] and BBR (Arm
+Base Boot Requirements) [1] specifications. Both BSA and BBR are publicly
+accessible documents.
+Arm Servers, in addition to being BSA compliant, comply with a set
+of rules defined in SBSA (Server Base System Architecture) [2].
+
+The Arm kernel implements the reduced hardware model of ACPI version
5.1 or later. Links to the specification and all external documents
it refers to are managed by the UEFI Forum. The specification is
available at http://www.uefi.org/specifications and documents referenced
by the specification can be found via http://www.uefi.org/acpi.
-If an ARMv8 system does not meet the requirements of the SBSA and SBBR,
+If an Arm system does not meet the requirements of the BSA and BBR,
or cannot be described using the mechanisms defined in the required ACPI
specifications, then ACPI may not be a good fit for the hardware.
While the documents mentioned above set out the requirements for building
-industry-standard ARMv8 servers, they also apply to more than one operating
+industry-standard Arm systems, they also apply to more than one operating
system. The purpose of this document is to describe the interaction between
-ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of
+ACPI and Linux only, on an Arm system -- that is, what Linux expects of
ACPI and what ACPI can expect of Linux.
-Why ACPI on ARM?
+Why ACPI on Arm?
----------------
Before examining the details of the interface between ACPI and Linux, it is
useful to understand why ACPI is being used. Several technologies already
exist in Linux for describing non-enumerable hardware, after all. In this
-section we summarize a blog post [2] from Grant Likely that outlines the
-reasoning behind ACPI on ARMv8 servers. Actually, we snitch a good portion
+section we summarize a blog post [3] from Grant Likely that outlines the
+reasoning behind ACPI on Arm systems. Actually, we snitch a good portion
of the summary text almost directly, to be honest.
-The short form of the rationale for ACPI on ARM is:
+The short form of the rationale for ACPI on Arm is:
- ACPI’s byte code (AML) allows the platform to encode hardware behavior,
while DT explicitly does not support this. For hardware vendors, being
@@ -47,7 +48,7 @@ The short form of the rationale for ACPI on ARM is:
- In the enterprise server environment, ACPI has established bindings (such
as for RAS) which are currently used in production systems. DT does not.
- Such bindings could be defined in DT at some point, but doing so means ARM
+ Such bindings could be defined in DT at some point, but doing so means Arm
and x86 would end up using completely different code paths in both firmware
and the kernel.
@@ -108,7 +109,7 @@ recent version of the kernel.
Relationship with Device Tree
-----------------------------
-ACPI support in drivers and subsystems for ARMv8 should never be mutually
+ACPI support in drivers and subsystems for Arm should never be mutually
exclusive with DT support at compile time.
At boot time the kernel will only use one description method depending on
@@ -121,11 +122,11 @@ time).
Booting using ACPI tables
-------------------------
-The only defined method for passing ACPI tables to the kernel on ARMv8
+The only defined method for passing ACPI tables to the kernel on Arm
is via the UEFI system configuration table. Just so it is explicit, this
means that ACPI is only supported on platforms that boot via UEFI.
-When an ARMv8 system boots, it can either have DT information, ACPI tables,
+When an Arm system boots, it can either have DT information, ACPI tables,
or in some very unusual cases, both. If no command line parameters are used,
the kernel will try to use DT for device enumeration; if there is no DT
present, the kernel will try to use ACPI tables, but only if they are present.
@@ -169,7 +170,7 @@ hardware reduced mode must be set to zero.
For the ACPI core to operate properly, and in turn provide the information
the kernel needs to configure devices, it expects to find the following
-tables (all section numbers refer to the ACPI 6.1 specification):
+tables (all section numbers refer to the ACPI 6.5 specification):
- RSDP (Root System Description Pointer), section 5.2.5
@@ -184,20 +185,76 @@ tables (all section numbers refer to the ACPI 6.1 specification):
- GTDT (Generic Timer Description Table), section 5.2.24
+ - PPTT (Processor Properties Topology Table), section 5.2.30
+
+ - DBG2 (DeBuG port table 2), section 5.2.6, specifically Table 5-6.
+
+ - APMT (Arm Performance Monitoring unit Table), section 5.2.6, specifically Table 5-6.
+
+ - AGDI (Arm Generic diagnostic Dump and Reset Device Interface Table), section 5.2.6, specifically Table 5-6.
+
- If PCI is supported, the MCFG (Memory mapped ConFiGuration
- Table), section 5.2.6, specifically Table 5-31.
+ Table), section 5.2.6, specifically Table 5-6.
- If booting without a console=<device> kernel parameter is
supported, the SPCR (Serial Port Console Redirection table),
- section 5.2.6, specifically Table 5-31.
+ section 5.2.6, specifically Table 5-6.
- If necessary to describe the I/O topology, SMMUs and GIC ITSs,
the IORT (Input Output Remapping Table, section 5.2.6, specifically
- Table 5-31).
+ Table 5-6).
+
+ - If NUMA is supported, the following tables are required:
+
+ - SRAT (System Resource Affinity Table), section 5.2.16
+
+ - SLIT (System Locality distance Information Table), section 5.2.17
+
+ - If NUMA is supported, and the system contains heterogeneous memory,
+ the HMAT (Heterogeneous Memory Attribute Table), section 5.2.28.
+
+ - If the ACPI Platform Error Interfaces are required, the following
+ tables are conditionally required:
+
+ - BERT (Boot Error Record Table, section 18.3.1)
+
+ - EINJ (Error INJection table, section 18.6.1)
+
+ - ERST (Error Record Serialization Table, section 18.5)
+
+ - HEST (Hardware Error Source Table, section 18.3.2)
+
+ - SDEI (Software Delegated Exception Interface table, section 5.2.6,
+ specifically Table 5-6)
+
+ - AEST (Arm Error Source Table, section 5.2.6,
+ specifically Table 5-6)
+
+ - RAS2 (ACPI RAS2 feature table, section 5.2.21)
+
+ - If the system contains controllers using PCC channel, the
+ PCCT (Platform Communications Channel Table), section 14.1
+
+ - If the system contains a controller to capture board-level system state,
+ and communicates with the host via PCC, the PDTT (Platform Debug Trigger
+ Table), section 5.2.29.
+
+ - If NVDIMM is supported, the NFIT (NVDIMM Firmware Interface Table), section 5.2.26
+
+ - If video framebuffer is present, the BGRT (Boot Graphics Resource Table), section 5.2.23
+
+ - If IPMI is implemented, the SPMI (Server Platform Management Interface),
+ section 5.2.6, specifically Table 5-6.
+
+ - If the system contains a CXL Host Bridge, the CEDT (CXL Early Discovery
+ Table), section 5.2.6, specifically Table 5-6.
+
+ - If the system supports MPAM, the MPAM (Memory Partitioning And Monitoring table), section 5.2.6,
+ specifically Table 5-6.
+
+ - If the system lacks persistent storage, the IBFT (ISCSI Boot Firmware
+ Table), section 5.2.6, specifically Table 5-6.
- - If NUMA is supported, the SRAT (System Resource Affinity Table)
- and SLIT (System Locality distance Information Table), sections
- 5.2.16 and 5.2.17, respectively.
If the above tables are not all present, the kernel may or may not be
able to boot properly since it may not be able to configure all of the
@@ -269,16 +326,14 @@ Drivers should look for device properties in the _DSD object ONLY; the _DSD
object is described in the ACPI specification section 6.2.5, but this only
describes how to define the structure of an object returned via _DSD, and
how specific data structures are defined by specific UUIDs. Linux should
-only use the _DSD Device Properties UUID [5]:
+only use the _DSD Device Properties UUID [4]:
- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
- - https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
-
-The UEFI Forum provides a mechanism for registering device properties [4]
-so that they may be used across all operating systems supporting ACPI.
-Device properties that have not been registered with the UEFI Forum should
-not be used.
+Common device properties can be registered by creating a pull request to [4] so
+that they may be used across all operating systems supporting ACPI.
+Device properties that have not been registered with the UEFI Forum can be used
+but not as "uefi-" common properties.
Before creating new device properties, check to be sure that they have not
been defined before and either registered in the Linux kernel documentation
@@ -306,7 +361,7 @@ process.
Once registration and review have been completed, the kernel provides an
interface for looking up device properties in a manner independent of
-whether DT or ACPI is being used. This API should be used [6]; it can
+whether DT or ACPI is being used. This API should be used [5]; it can
eliminate some duplication of code paths in driver probing functions and
discourage divergence between DT bindings and ACPI device properties.
@@ -448,15 +503,15 @@ ASWG
----
The ACPI specification changes regularly. During the year 2014, for instance,
version 5.1 was released and version 6.0 substantially completed, with most of
-the changes being driven by ARM-specific requirements. Proposed changes are
+the changes being driven by Arm-specific requirements. Proposed changes are
presented and discussed in the ASWG (ACPI Specification Working Group) which
is a part of the UEFI Forum. The current version of the ACPI specification
-is 6.1 release in January 2016.
+is 6.5, released in August 2022.
Participation in this group is open to all UEFI members. Please see
http://www.uefi.org/workinggroup for details on group membership.
-It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification
+It is the intent of the Arm ACPI kernel code to follow the ACPI specification
as closely as possible, and to only implement functionality that complies with
the released standards from UEFI ASWG. As a practical matter, there will be
vendors that provide bad ACPI tables or violate the standards in some way.
@@ -470,12 +525,12 @@ likely be willing to assist in submitting ECRs.
Linux Code
----------
-Individual items specific to Linux on ARM, contained in the Linux
+Individual items specific to Linux on Arm, contained in the Linux
source code, are in the list that follows:
ACPI_OS_NAME
This macro defines the string to be returned when
- an ACPI method invokes the _OS method. On ARM64
+ an ACPI method invokes the _OS method. On Arm
systems, this macro will be "Linux" by default.
The command line parameter acpi_os=<string>
can be used to set it to some other value. The
@@ -490,31 +545,23 @@ Documentation/arm64/acpi_object_usage.rst.
References
----------
-[0] http://silver.arm.com
- document ARM-DEN-0029, or newer:
- "Server Base System Architecture", version 2.3, dated 27 Mar 2014
+[0] https://developer.arm.com/documentation/den0094/latest
+ Document Arm-DEN-0094: "Arm Base System Architecture", version 1.0C, dated 6 Oct 2022
+
+[1] https://developer.arm.com/documentation/den0044/latest
+ Document Arm-DEN-0044: "Arm Base Boot Requirements", version 2.0G, dated 15 Apr 2022
-[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf
- Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System
- Software on ARM Platforms", dated 16 Aug 2014
+[2] https://developer.arm.com/documentation/den0029/latest
+ Document Arm-DEN-0029: "Arm Server Base System Architecture", version 7.1, dated 06 Oct 2022
-[2] http://www.secretlab.ca/archives/151,
+[3] http://www.secretlab.ca/archives/151,
10 Jan 2015, Copyright (c) 2015,
Linaro Ltd., written by Grant Likely.
-[3] AMD ACPI for Seattle platform documentation
- http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf
-
-
-[4] http://www.uefi.org/acpi
- please see the link for the "ACPI _DSD Device
- Property Registry Instructions"
-
-[5] http://www.uefi.org/acpi
- please see the link for the "_DSD (Device
- Specific Data) Implementation Guide"
+[4] _DSD (Device Specific Data) Implementation Guide
+ https://github.com/UEFI/DSD-Guide/blob/main/dsd-guide.pdf
-[6] Kernel code for the unified device
+[5] Kernel code for the unified device
property interface can be found in
include/linux/property.h and drivers/base/property.c.
diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index ffeccdd6bdac..b57776a68f15 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -379,6 +379,38 @@ Before jumping into the kernel, the following conditions must be met:
- SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1.
+ For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
+
+ For CPUs with the Extended Translation Control Register feature (FEAT_TCR2):
+
+ - If EL3 is present:
+
+ - SCR_EL3.TCR2En (bit 43) must be initialised to 0b1.
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - HCRX_EL2.TCR2En (bit 14) must be initialised to 0b1.
+
+ For CPUs with the Stage 1 Permission Indirection Extension feature (FEAT_S1PIE):
+
+ - If EL3 is present:
+
+ - SCR_EL3.PIEn (bit 45) must be initialised to 0b1.
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - HFGRTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1.
+
+ - HFGWTR_EL2.nPIR_EL1 (bit 58) must be initialised to 0b1.
+
+ - HFGRTR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
+
+ - HFGWTR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
+
The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level. Where the values documented
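
As an aside to the FEAT_MOPS, FEAT_TCR2 and FEAT_S1PIE requirements added in the hunk above, the bit positions can be written down as masks. This is a minimal user-space sketch, not kernel or firmware code: the macro names and both helper functions are hypothetical, and only the register names and bit numbers come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the requirements listed in the hunk above.
 * The macro names themselves are illustrative, not kernel definitions. */
#define HCRX_EL2_MSCEN         (UINT64_C(1) << 11)
#define HCRX_EL2_TCR2EN        (UINT64_C(1) << 14)
#define SCR_EL3_TCR2EN         (UINT64_C(1) << 43)
#define SCR_EL3_PIEN           (UINT64_C(1) << 45)
#define HFGXTR_EL2_NPIRE0_EL1  (UINT64_C(1) << 57) /* same bit in HFGRTR/HFGWTR */
#define HFGXTR_EL2_NPIR_EL1    (UINT64_C(1) << 58) /* same bit in HFGRTR/HFGWTR */

/* Hypothetical helper: true when every bit in 'required' is set in 'reg'. */
static bool boot_bits_ok(uint64_t reg, uint64_t required)
{
	return (reg & required) == required;
}

/* Hypothetical helper: HCRX_EL2 bits that must be 0b1 when the kernel is
 * entered at EL1 with EL2 present, given which features the CPU has. */
static uint64_t hcrx_el2_required(bool has_mops, bool has_tcr2)
{
	uint64_t required = 0;

	if (has_mops)
		required |= HCRX_EL2_MSCEN;
	if (has_tcr2)
		required |= HCRX_EL2_TCR2EN;
	return required;
}
```

A firmware self-check could, under these assumptions, compare a saved HCRX_EL2 value against hcrx_el2_required() before handing over to the kernel.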
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index c7adc7897df6..4e4625f2455f 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -288,6 +288,8 @@ infrastructure:
+------------------------------+---------+---------+
| Name | bits | visible |
+------------------------------+---------+---------+
+ | MOPS | [19-16] | y |
+ +------------------------------+---------+---------+
| RPRES | [7-4] | y |
+------------------------------+---------+---------+
| WFXT | [3-0] | y |
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index 83e57e4d38e2..8f847d0dcf57 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -302,6 +302,9 @@ HWCAP2_SMEB16B16
HWCAP2_SMEF16F16
Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1
+HWCAP2_MOPS
+ Functionality implied by ID_AA64ISAR2_EL1.MOPS == 0b0001.
+
4. Unused AT_HWCAP bits
-----------------------
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index ae21f8118830..d08e924204bf 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -15,11 +15,13 @@ ARM64 Architecture
cpu-feature-registers
elf_hwcaps
hugetlbpage
+ kdump
legacy_instructions
memory
memory-tagging-extension
perf
pointer-authentication
+ ptdump
silicon-errata
sme
sve
diff --git a/Documentation/arm64/kdump.rst b/Documentation/arm64/kdump.rst
new file mode 100644
index 000000000000..56a89f45df28
--- /dev/null
+++ b/Documentation/arm64/kdump.rst
@@ -0,0 +1,92 @@
+=======================================
+crashkernel memory reservation on arm64
+=======================================
+
+Author: Baoquan He <bhe@redhat.com>
+
+The kdump mechanism is used to capture a corrupted kernel vmcore so that
+it can be subsequently analyzed. To do this, memory must be reserved in
+advance to pre-load the kdump kernel and to boot that kernel if
+corruption happens.
+
+The memory reserved for kdump is sized to be just large enough to
+accommodate the kdump kernel and the user space programs needed for the
+vmcore collection.
+
+Kernel parameter
+================
+
+Through the kernel parameters below, memory can be reserved accordingly
+during the early stage of the first kernel booting so that a contiguous
+large chunk of memory can be found. The low memory reservation needs to
+be considered if the crashkernel is reserved from the high memory area.
+
+- crashkernel=size@offset
+- crashkernel=size
+- crashkernel=size,high crashkernel=size,low
+
+Low memory and high memory
+==========================
+
+For kdump reservations, low memory is the memory area under a specific
+limit, usually decided by the accessible address bits of the DMA-capable
+devices needed by the kdump kernel to run. Those devices not related to
+vmcore dumping can be ignored. On arm64, the low memory upper bound is
+not fixed: it is 1G on the RPi4 platform but 4G on most other systems.
+On special kernels built with CONFIG_ZONE_(DMA|DMA32) disabled, the
+whole system RAM is low memory. Outside of the low memory described
+above, the rest of system RAM is considered high memory.
+
+Implementation
+==============
+
+1) crashkernel=size@offset
+--------------------------
+
+The crashkernel memory must be reserved at the user-specified region or
+fail if already occupied.
+
+
+2) crashkernel=size
+-------------------
+
+The crashkernel memory region will be reserved in any available position
+according to the search order:
+
+Firstly, the kernel searches the low memory area for an available region
+with the specified size.
+
+If searching for low memory fails, the kernel falls back to searching
+the high memory area for an available region of the specified size. If
+the reservation in high memory succeeds, a default size reservation in
+the low memory will be done. Currently the default size is 128M,
+sufficient for the low memory needs of the kdump kernel.
+
+Note: crashkernel=size is the recommended option for crashkernel
+reservations. The user would not need to know the system memory layout
+for a specific platform.
+
+3) crashkernel=size,high crashkernel=size,low
+---------------------------------------------
+
+crashkernel=size,(high|low) are an important supplement to
+crashkernel=size. They allow the user to specify how much memory needs
+to be allocated from the high memory and low memory respectively. On
+many systems the low memory is precious and crashkernel reservations
+from this area should be kept to a minimum.
+
+To reserve memory for crashkernel=size,high, searching is first
+attempted from the high memory region. If the reservation succeeds, the
+low memory reservation will be done subsequently.
+
+If reservation from the high memory fails, the kernel falls back to
+searching the low memory with the specified size in crashkernel=,high.
+If it succeeds, no further reservation for low memory is needed.
+
+Notes:
+
+- If crashkernel=,low is not specified, the default low memory
+ reservation will be done automatically.
+
+- If crashkernel=0,low is specified, it means that the low memory
+ reservation is omitted intentionally.
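
The search order described in the new kdump.rst can be modelled in a few lines. The sketch below is a user-space illustration only, not the kernel's reservation code: the struct, the function names and the idea of passing "largest free contiguous region" sizes as parameters are all assumptions made for the example. It also ignores the case where the follow-up low memory reservation itself fails.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_LOW_SIZE_MB 128 /* default low reservation stated in the text */

/* Hypothetical result type for this model. */
struct crash_plan {
	bool ok;           /* was the main reservation possible? */
	bool main_in_high; /* main region placed in high memory? */
	uint64_t main_mb;  /* size of the main region */
	uint64_t low_mb;   /* additional low memory region, 0 if none */
};

/* crashkernel=size: search low memory first, then fall back to high
 * memory plus a default-sized low memory reservation. */
static struct crash_plan plan_size(uint64_t size_mb, uint64_t free_low_mb,
				   uint64_t free_high_mb)
{
	struct crash_plan p = { false, false, 0, 0 };

	if (size_mb <= free_low_mb) {
		p.ok = true;
		p.main_mb = size_mb;
	} else if (size_mb <= free_high_mb) {
		p.ok = true;
		p.main_in_high = true;
		p.main_mb = size_mb;
		p.low_mb = DEFAULT_LOW_SIZE_MB;
	}
	return p;
}

/* crashkernel=size,high [crashkernel=size,low]: search high memory first.
 * low_arg_mb < 0 models "crashkernel=,low not specified"; 0 models
 * crashkernel=0,low, which omits the low reservation intentionally. */
static struct crash_plan plan_size_high(uint64_t high_mb, int64_t low_arg_mb,
					uint64_t free_low_mb,
					uint64_t free_high_mb)
{
	struct crash_plan p = { false, false, 0, 0 };

	if (high_mb <= free_high_mb) {
		p.ok = true;
		p.main_in_high = true;
		p.main_mb = high_mb;
		p.low_mb = low_arg_mb < 0 ? DEFAULT_LOW_SIZE_MB
					  : (uint64_t)low_arg_mb;
	} else if (high_mb <= free_low_mb) {
		/* Fallback to low memory: no further low reservation needed. */
		p.ok = true;
		p.main_mb = high_mb;
	}
	return p;
}
```

For example, plan_size(2048, 1024, 8192) models a 2G request on a machine with only 1G of free low memory: the main region lands in high memory and a 128M low region is added.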
diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
index 2a641ba7be3b..55a55f30eed8 100644
--- a/Documentation/arm64/memory.rst
+++ b/Documentation/arm64/memory.rst
@@ -33,8 +33,8 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
0000000000000000 0000ffffffffffff 256TB user
ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map
[ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region]
- ffff800000000000 ffff800007ffffff 128MB modules
- ffff800008000000 fffffbffefffffff 124TB vmalloc
+ ffff800000000000 ffff80007fffffff 2GB modules
+ ffff800080000000 fffffbffefffffff 124TB vmalloc
fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down)
fffffbfffe000000 fffffbfffe7fffff 8MB [guard region]
fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space
@@ -50,8 +50,8 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):
0000000000000000 000fffffffffffff 4PB user
fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map
[fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region]
- ffff800000000000 ffff800007ffffff 128MB modules
- ffff800008000000 fffffbffefffffff 124TB vmalloc
+ ffff800000000000 ffff80007fffffff 2GB modules
+ ffff800080000000 fffffbffefffffff 124TB vmalloc
fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down)
fffffbfffe000000 fffffbfffe7fffff 8MB [guard region]
fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space
diff --git a/Documentation/arm64/ptdump.rst b/Documentation/arm64/ptdump.rst
new file mode 100644
index 000000000000..5dcfc5d7cddf
--- /dev/null
+++ b/Documentation/arm64/ptdump.rst
@@ -0,0 +1,96 @@
+======================
+Kernel page table dump
+======================
+
+ptdump is a debugfs interface that provides a detailed dump of the
+kernel page tables. It offers a comprehensive overview of the kernel
+virtual memory layout as well as the attributes associated with the
+various regions in a human-readable format. It is useful to dump the
+kernel page tables to verify permissions and memory types. Examining the
+page table entries and permissions helps identify potential security
+vulnerabilities such as mappings with overly permissive access rights or
+improper memory protections.
+
+Memory hotplug allows dynamic expansion or contraction of available
+memory without requiring a system reboot. To maintain the consistency
+and integrity of the memory management data structures, arm64 makes use
+of the ``mem_hotplug_lock`` semaphore in write mode. Additionally, in
+read mode, ``mem_hotplug_lock`` supports an efficient implementation of
+``get_online_mems()`` and ``put_online_mems()``. These protect the
+offlining of memory being accessed by the ptdump code.
+
+In order to dump the kernel page tables, enable the following
+configurations and mount debugfs::
+
+ CONFIG_GENERIC_PTDUMP=y
+ CONFIG_PTDUMP_CORE=y
+ CONFIG_PTDUMP_DEBUGFS=y
+
+ mount -t debugfs nodev /sys/kernel/debug
+ cat /sys/kernel/debug/kernel_page_tables
+
+On analysing the output of ``cat /sys/kernel/debug/kernel_page_tables``
+one can derive information about the virtual address range of the entry,
+followed by size of the memory region covered by this entry, the
+hierarchical structure of the page tables and finally the attributes
+associated with each page. The page attributes provide information about
+access permissions, execution capability, type of mapping such as leaf
+level PTE or block level PGD, PMD and PUD, and access status of a page
+within the kernel memory. Assessing these attributes can assist in
+understanding the memory layout, access patterns and security
+characteristics of the kernel pages.
+
+Kernel virtual memory layout example::
+
+ start address end address size attributes
+ +---------------------------------------------------------------------------------------+
+ | ---[ Linear Mapping start ]---------------------------------------------------------- |
+ | .................. |
+ | 0xfff0000000000000-0xfff0000000210000 2112K PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED |
+ | 0xfff0000000210000-0xfff0000001c00000 26560K PTE ro NX SHD AF UXN MEM/NORMAL |
+ | .................. |
+ | ---[ Linear Mapping end ]------------------------------------------------------------ |
+ +---------------------------------------------------------------------------------------+
+ | ---[ Modules start ]----------------------------------------------------------------- |
+ | .................. |
+ | 0xffff800000000000-0xffff800008000000 128M PTE |
+ | .................. |
+ | ---[ Modules end ]------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ vmalloc() area ]---------------------------------------------------------------- |
+ | .................. |
+ | 0xffff800008010000-0xffff800008200000 1984K PTE ro x SHD AF UXN MEM/NORMAL |
+ | 0xffff800008200000-0xffff800008e00000 12M PTE ro x SHD AF CON UXN MEM/NORMAL |
+ | .................. |
+ | ---[ vmalloc() end ]----------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ Fixmap start ]------------------------------------------------------------------ |
+ | .................. |
+ | 0xfffffbfffdb80000-0xfffffbfffdb90000 64K PTE ro x SHD AF UXN MEM/NORMAL |
+ | 0xfffffbfffdb90000-0xfffffbfffdba0000 64K PTE ro NX SHD AF UXN MEM/NORMAL |
+ | .................. |
+ | ---[ Fixmap end ]-------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ PCI I/O start ]----------------------------------------------------------------- |
+ | .................. |
+ | 0xfffffbfffe800000-0xfffffbffff800000 16M PTE |
+ | .................. |
+ | ---[ PCI I/O end ]------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+ | ---[ vmemmap start ]----------------------------------------------------------------- |
+ | .................. |
+ | 0xfffffc0002000000-0xfffffc0002200000 2M PTE RW NX SHD AF UXN MEM/NORMAL |
+ | 0xfffffc0002200000-0xfffffc0020000000 478M PTE |
+ | .................. |
+ | ---[ vmemmap end ]------------------------------------------------------------------- |
+ +---------------------------------------------------------------------------------------+
+
+``cat /sys/kernel/debug/kernel_page_tables`` output::
+
+ 0xfff0000001c00000-0xfff0000080000000 2020M PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED
+ 0xfff0000080000000-0xfff0000800000000 30G PMD
+ 0xfff0000800000000-0xfff0000800700000 7M PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED
+ 0xfff0000800700000-0xfff0000800710000 64K PTE ro NX SHD AF UXN MEM/NORMAL-TAGGED
+ 0xfff0000800710000-0xfff0000880000000 2089920K PTE RW NX SHD AF UXN MEM/NORMAL-TAGGED
+ 0xfff0000880000000-0xfff0040000000000 4062G PMD
+ 0xfff0040000000000-0xffff800000000000 3964T PGD
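
The sample output at the end of the new ptdump.rst follows a regular "start-end size level attributes" layout, so the size column can be cross-checked against the address range. The following is a minimal user-space sketch under that assumption: the struct and function names are hypothetical, and the parser only handles the K, M, G and T suffixes that appear in the examples.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for one parsed kernel_page_tables line. */
struct pt_range {
	uint64_t start;
	uint64_t end;
	uint64_t size_bytes; /* size column converted to bytes */
	char level[8];       /* "PTE", "PMD", "PUD" or "PGD" */
};

/* Parse "0x<start>-0x<end> <size><unit> <level> ..." into 'out'.
 * Returns false if the line does not match that layout. */
static bool parse_ptdump_line(const char *line, struct pt_range *out)
{
	uint64_t size;
	char unit;
	unsigned int shift;

	if (sscanf(line, "0x%" SCNx64 "-0x%" SCNx64 " %" SCNu64 "%c %7s",
		   &out->start, &out->end, &size, &unit, out->level) != 5)
		return false;

	switch (unit) {
	case 'K': shift = 10; break;
	case 'M': shift = 20; break;
	case 'G': shift = 30; break;
	case 'T': shift = 40; break;
	default:  return false;
	}
	out->size_bytes = size << shift;
	return true;
}

/* True when the size column equals end - start. */
static bool pt_range_consistent(const struct pt_range *r)
{
	return r->end - r->start == r->size_bytes;
}
```

Feeding each line of /sys/kernel/debug/kernel_page_tables through these two helpers is one way to sanity-check a dump before inspecting the attribute columns.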
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 9e311bc43e05..d6430ade349d 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -214,3 +214,7 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 |
+----------------+-----------------+-----------------+-----------------------------+
+
++----------------+-----------------+-----------------+-----------------------------+
+| ASR | ASR8601 | #8601001 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index f75778d37488..e6f5bc39cf5c 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -127,17 +127,8 @@ bring CPU4 back online::
$ echo 1 > /sys/devices/system/cpu/cpu4/online
smpboot: Booting Node 0 Processor 4 APIC 0x1
-The CPU is usable again. This should work on all CPUs. CPU0 is often special
-and excluded from CPU hotplug. On X86 the kernel option
-*CONFIG_BOOTPARAM_HOTPLUG_CPU0* has to be enabled in order to be able to
-shutdown CPU0. Alternatively the kernel command option *cpu0_hotplug* can be
-used. Some known dependencies of CPU0:
-
-* Resume from hibernate/suspend. Hibernate/suspend will fail if CPU0 is offline.
-* PIC interrupts. CPU0 can't be removed if a PIC interrupt is detected.
-
-Please let Fenghua Yu <fenghua.yu@intel.com> know if you find any dependencies
-on CPU0.
+The CPU is usable again. This should work on all CPUs, but CPU0 is often special
+and excluded from CPU hotplug.
The CPU hotplug coordination
============================
diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index 9b3f3e5f5a95..712e59ad32fa 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -412,3 +412,15 @@ Read-Copy Update (RCU)
.. kernel-doc:: include/linux/rcu_sync.h
.. kernel-doc:: kernel/rcu/sync.c
+
+.. kernel-doc:: kernel/rcu/tasks.h
+
+.. kernel-doc:: kernel/rcu/tree_stall.h
+
+.. kernel-doc:: include/linux/rcupdate_trace.h
+
+.. kernel-doc:: include/linux/rcupdate_wait.h
+
+.. kernel-doc:: include/linux/rcuref.h
+
+.. kernel-doc:: include/linux/rcutree.h
diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index 9fb0b1080d3b..d3c1f6d8c0e0 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -112,6 +112,12 @@ pages:
This also leads to limitations: there are only 31-10==21 bits available for a
counter that increments 10 bits at a time.
+* Because of that limitation, special handling is applied to the zero pages
+ when using FOLL_PIN. We only pretend to pin a zero page - we don't alter its
+ refcount or pincount at all (it is permanent, so there's no need). The
+ unpinning functions also don't do anything to a zero page. This is
+ transparent to the caller.
+
* Callers must specifically request "dma-pinned tracking of pages". In other
words, just calling get_user_pages() will not suffice; a new set of functions,
pin_user_page() and related, must be used.
diff --git a/Documentation/dev-tools/kunit/architecture.rst b/Documentation/dev-tools/kunit/architecture.rst
index e95ab05342bb..f335f883f8f6 100644
--- a/Documentation/dev-tools/kunit/architecture.rst
+++ b/Documentation/dev-tools/kunit/architecture.rst
@@ -119,9 +119,9 @@ All expectations/assertions are formatted as:
terminated immediately.
- Assertions call the function:
- ``void __noreturn kunit_abort(struct kunit *)``.
+ ``void __noreturn __kunit_abort(struct kunit *)``.
- - ``kunit_abort`` calls the function:
+ - ``__kunit_abort`` calls the function:
``void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch)``.
- ``kunit_try_catch_throw`` calls the function:
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index c736613c9b19..a98235326bab 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -250,15 +250,20 @@ Now we are ready to write the test cases.
};
kunit_test_suite(misc_example_test_suite);
+ MODULE_LICENSE("GPL");
+
2. Add the following lines to ``drivers/misc/Kconfig``:
.. code-block:: kconfig
config MISC_EXAMPLE_TEST
tristate "Test for my example" if !KUNIT_ALL_TESTS
- depends on MISC_EXAMPLE && KUNIT=y
+ depends on MISC_EXAMPLE && KUNIT
default KUNIT_ALL_TESTS
+Note: If your test does not support being built as a loadable module (which is
+discouraged), replace tristate by bool, and depend on KUNIT=y instead of KUNIT.
+
3. Add the following lines to ``drivers/misc/Makefile``:
.. code-block:: make
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 9faf2b4153fc..c27e1646ecd9 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -121,6 +121,12 @@ there's an allocation error.
``return`` so they only work from the test function. In KUnit, we stop the
current kthread on failure, so you can call them from anywhere.
+.. note::
+ Warning: There is an exception to the above rule. You shouldn't use assertions
+ in the suite's exit() function, or in the free function for a resource. These
+ run when a test is shutting down, and an assertion here prevents further
+ cleanup code from running, potentially leading to a memory leak.
+
Customizing error messages
--------------------------
@@ -160,7 +166,12 @@ many similar tests. In order to reduce duplication in these closely related
tests, most unit testing frameworks (including KUnit) provide the concept of a
*test suite*. A test suite is a collection of test cases for a unit of code
with optional setup and teardown functions that run before/after the whole
-suite and/or every test case. For example:
+suite and/or every test case.
+
+.. note::
+ A test case will only run if it is associated with a test suite.
+
+For example:
.. code-block:: c
@@ -190,7 +201,10 @@ after everything else. ``kunit_test_suite(example_test_suite)`` registers the
test suite with the KUnit test framework.
.. note::
- A test case will only run if it is associated with a test suite.
+ The ``exit`` and ``suite_exit`` functions will run even if ``init`` or
+ ``suite_init`` fail. Make sure that they can handle any inconsistent
+ state which may result from ``init`` or ``suite_init`` encountering errors
+ or exiting early.
``kunit_test_suite(...)`` is a macro which tells the linker to put the
specified test suite in a special linker section so that it can be run by KUnit
@@ -601,6 +615,57 @@ For example:
KUNIT_ASSERT_STREQ(test, buffer, "");
}
+Registering Cleanup Actions
+---------------------------
+
+If you need to perform some cleanup beyond simple use of ``kunit_kzalloc``,
+you can register a custom "deferred action", which is a cleanup function
+run when the test exits (whether cleanly, or via a failed assertion).
+
+Actions are simple functions with no return value, and a single ``void*``
+context argument, and fulfill the same role as "cleanup" functions in Python
+and Go tests, "defer" statements in languages which support them, and
+(in some cases) destructors in RAII languages.
+
+These are very useful for unregistering things from global lists, closing
+files or other resources, or freeing resources.
+
+For example:
+
+.. code-block:: C
+
+ static void cleanup_device(void *ctx)
+ {
+ struct device *dev = (struct device *)ctx;
+
+ device_unregister(dev);
+ }
+
+ void example_device_test(struct kunit *test)
+ {
+ struct my_device dev;
+
+ device_register(&dev);
+
+ kunit_add_action(test, &cleanup_device, &dev);
+ }
+
+Note that, for functions like device_unregister which only accept a single
+pointer-sized argument, it's possible to directly cast that function to
+a ``kunit_action_t`` rather than writing a wrapper function, for example:
+
+.. code-block:: C
+
+ kunit_add_action(test, (kunit_action_t *)&device_unregister, &dev);
+
+``kunit_add_action`` can fail if, for example, the system is out of memory.
+You can use ``kunit_add_action_or_reset`` instead which runs the action
+immediately if it cannot be deferred.
+
+If you need more control over when the cleanup function is called, you
+can trigger it early using ``kunit_release_action``, or cancel it entirely
+with ``kunit_remove_action``.
+
Testing Static Functions
------------------------
diff --git a/Documentation/devicetree/bindings/ata/ahci-common.yaml b/Documentation/devicetree/bindings/ata/ahci-common.yaml
index 7fdf40954a4c..38770c4c85fd 100644
--- a/Documentation/devicetree/bindings/ata/ahci-common.yaml
+++ b/Documentation/devicetree/bindings/ata/ahci-common.yaml
@@ -8,7 +8,7 @@ title: Common Properties for Serial ATA AHCI controllers
maintainers:
- Hans de Goede <hdegoede@redhat.com>
- - Damien Le Moal <damien.lemoal@opensource.wdc.com>
+ - Damien Le Moal <dlemoal@kernel.org>
description:
This document defines device tree properties for a common AHCI SATA
diff --git a/Documentation/devicetree/bindings/clock/canaan,k210-clk.yaml b/Documentation/devicetree/bindings/clock/canaan,k210-clk.yaml
index 998e5cce652f..380cb6d80025 100644
--- a/Documentation/devicetree/bindings/clock/canaan,k210-clk.yaml
+++ b/Documentation/devicetree/bindings/clock/canaan,k210-clk.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Canaan Kendryte K210 Clock
maintainers:
- - Damien Le Moal <damien.lemoal@wdc.com>
+ - Damien Le Moal <dlemoal@kernel.org>
description: |
Canaan Kendryte K210 SoC clocks driver bindings. The clock
diff --git a/Documentation/devicetree/bindings/firmware/qcom,scm.yaml b/Documentation/devicetree/bindings/firmware/qcom,scm.yaml
index 367d04ad1923..83381f3a1341 100644
--- a/Documentation/devicetree/bindings/firmware/qcom,scm.yaml
+++ b/Documentation/devicetree/bindings/firmware/qcom,scm.yaml
@@ -71,6 +71,8 @@ properties:
minItems: 1
maxItems: 3
+ dma-coherent: true
+
interconnects:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/i2c/opencores,i2c-ocores.yaml b/Documentation/devicetree/bindings/i2c/opencores,i2c-ocores.yaml
index 85d9efb743ee..d9ef86729011 100644
--- a/Documentation/devicetree/bindings/i2c/opencores,i2c-ocores.yaml
+++ b/Documentation/devicetree/bindings/i2c/opencores,i2c-ocores.yaml
@@ -60,6 +60,7 @@ properties:
default: 0
regstep:
+ $ref: /schemas/types.yaml#/definitions/uint32
description: |
deprecated, use reg-shift above
deprecated: true
diff --git a/Documentation/devicetree/bindings/i3c/silvaco,i3c-master.yaml b/Documentation/devicetree/bindings/i3c/silvaco,i3c-master.yaml
index 62f3ca66274f..32c821f97779 100644
--- a/Documentation/devicetree/bindings/i3c/silvaco,i3c-master.yaml
+++ b/Documentation/devicetree/bindings/i3c/silvaco,i3c-master.yaml
@@ -44,7 +44,7 @@ required:
- clock-names
- clocks
-additionalProperties: true
+unevaluatedProperties: false
examples:
- |
diff --git a/Documentation/devicetree/bindings/interrupt-controller/loongson,eiointc.yaml b/Documentation/devicetree/bindings/interrupt-controller/loongson,eiointc.yaml
new file mode 100644
index 000000000000..393c128a41d8
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/loongson,eiointc.yaml
@@ -0,0 +1,59 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/loongson,eiointc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Loongson Extended I/O Interrupt Controller
+
+maintainers:
+ - Binbin Zhou <zhoubinbin@loongson.cn>
+
+description: |
+ This interrupt controller is found on the Loongson-3 family chips and
+ Loongson-2K series chips and is used to distribute interrupts directly to
+ individual cores without forwarding them through the HT's interrupt line.
+
+allOf:
+ - $ref: /schemas/interrupt-controller.yaml#
+
+properties:
+ compatible:
+ enum:
+ - loongson,ls2k0500-eiointc
+ - loongson,ls2k2000-eiointc
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ interrupt-controller: true
+
+ '#interrupt-cells':
+ const: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - interrupt-controller
+ - '#interrupt-cells'
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ eiointc: interrupt-controller@1fe11600 {
+ compatible = "loongson,ls2k0500-eiointc";
+ reg = <0x1fe10000 0x10000>;
+
+ interrupt-controller;
+ #interrupt-cells = <1>;
+
+ interrupt-parent = <&cpuintc>;
+ interrupts = <3>;
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml b/Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml
new file mode 100644
index 000000000000..ac1a5a17749d
--- /dev/null
+++ b/Documentation/devicetree/bindings/memory-controllers/nuvoton,npcm-memory-controller.yaml
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/nuvoton,npcm-memory-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Nuvoton NPCM Memory Controller
+
+maintainers:
+ - Marvin Lin <kflin@nuvoton.com>
+ - Stanley Chu <yschu@nuvoton.com>
+
+description: |
+ The Nuvoton BMC SoC supports DDR4 memory with or without ECC (error correction
+ check).
+
+ The memory controller supports single bit error correction, double bit error
+ detection (in-line ECC in which a section (1/8th) of the memory device used to
+ store data is used for ECC storage).
+
+ Note, the bootloader must configure ECC mode for the memory controller.
+
+properties:
+ compatible:
+ enum:
+ - nuvoton,npcm750-memory-controller
+ - nuvoton,npcm845-memory-controller
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ mc: memory-controller@f0824000 {
+ compatible = "nuvoton,npcm750-memory-controller";
+ reg = <0xf0824000 0x1000>;
+ interrupts = <GIC_SPI 25 IRQ_TYPE_LEVEL_HIGH>;
+ };
diff --git a/Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml b/Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml
index 8459d3642205..3b3beab9db3f 100644
--- a/Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml
+++ b/Documentation/devicetree/bindings/mfd/canaan,k210-sysctl.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Canaan Kendryte K210 System Controller
maintainers:
- - Damien Le Moal <damien.lemoal@wdc.com>
+ - Damien Le Moal <dlemoal@kernel.org>
description:
Canaan Inc. Kendryte K210 SoC system controller which provides a
diff --git a/Documentation/devicetree/bindings/net/realtek-bluetooth.yaml b/Documentation/devicetree/bindings/net/realtek-bluetooth.yaml
index 8cc2b9924680..043e118c605c 100644
--- a/Documentation/devicetree/bindings/net/realtek-bluetooth.yaml
+++ b/Documentation/devicetree/bindings/net/realtek-bluetooth.yaml
@@ -11,7 +11,7 @@ maintainers:
- Alistair Francis <alistair@alistair23.me>
description:
- RTL8723CS/RTL8723CS/RTL8821CS/RTL8822CS is a WiFi + BT chip. WiFi part
+ RTL8723BS/RTL8723CS/RTL8821CS/RTL8822CS is a WiFi + BT chip. WiFi part
is connected over SDIO, while BT is connected over serial. It speaks
H5 protocol with few extra commands to upload firmware and change
module speed.
@@ -27,7 +27,7 @@ properties:
- items:
- enum:
- realtek,rtl8821cs-bt
- - const: realtek,rtl8822cs-bt
+ - const: realtek,rtl8723bs-bt
device-wake-gpios:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml b/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
index 80a92385367e..e9fad4b3de68 100644
--- a/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
+++ b/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
@@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/perf/fsl-imx-ddr.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Freescale(NXP) IMX8 DDR performance monitor
+title: Freescale(NXP) IMX8/9 DDR performance monitor
maintainers:
- Frank Li <frank.li@nxp.com>
@@ -19,6 +19,7 @@ properties:
- fsl,imx8mm-ddr-pmu
- fsl,imx8mn-ddr-pmu
- fsl,imx8mp-ddr-pmu
+ - fsl,imx93-ddr-pmu
- items:
- enum:
- fsl,imx8mm-ddr-pmu
diff --git a/Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml b/Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml
index 7f4f36a58e56..739a08f00467 100644
--- a/Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/canaan,k210-fpioa.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Canaan Kendryte K210 FPIOA
maintainers:
- - Damien Le Moal <damien.lemoal@wdc.com>
+ - Damien Le Moal <dlemoal@kernel.org>
description:
The Canaan Kendryte K210 SoC Fully Programmable IO Array (FPIOA)
diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.yaml b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.yaml
index c91d3e3a094b..80f960671857 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.yaml
@@ -144,8 +144,9 @@ $defs:
enum: [0, 1, 2, 3, 4, 5, 6, 7]
qcom,paired:
- - description:
- Indicates that the pin should be operating in paired mode.
+ type: boolean
+ description:
+ Indicates that the pin should be operating in paired mode.
required:
- pins
diff --git a/Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml b/Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml
index ee8a2dcf5dfa..0c0135964b91 100644
--- a/Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml
+++ b/Documentation/devicetree/bindings/reset/canaan,k210-rst.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Canaan Kendryte K210 Reset Controller
maintainers:
- - Damien Le Moal <damien.lemoal@wdc.com>
+ - Damien Le Moal <dlemoal@kernel.org>
description: |
Canaan Kendryte K210 reset controller driver which supports the SoC
diff --git a/Documentation/devicetree/bindings/riscv/canaan.yaml b/Documentation/devicetree/bindings/riscv/canaan.yaml
index f8f3f286bd55..41fd11f70a49 100644
--- a/Documentation/devicetree/bindings/riscv/canaan.yaml
+++ b/Documentation/devicetree/bindings/riscv/canaan.yaml
@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Canaan SoC-based boards
maintainers:
- - Damien Le Moal <damien.lemoal@wdc.com>
+ - Damien Le Moal <dlemoal@kernel.org>
description:
Canaan Kendryte K210 SoC-based boards
diff --git a/Documentation/devicetree/bindings/thermal/armada-thermal.txt b/Documentation/devicetree/bindings/thermal/armada-thermal.txt
index b0bee7e42038..ab8b8fccc7af 100644
--- a/Documentation/devicetree/bindings/thermal/armada-thermal.txt
+++ b/Documentation/devicetree/bindings/thermal/armada-thermal.txt
@@ -8,6 +8,7 @@ Required properties:
* marvell,armada380-thermal
* marvell,armadaxp-thermal
* marvell,armada-ap806-thermal
+ * marvell,armada-ap807-thermal
* marvell,armada-cp110-thermal
Note: these bindings are deprecated for AP806/CP110 and should instead
diff --git a/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.txt b/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.txt
deleted file mode 100644
index a3e9ec5dc7ac..000000000000
--- a/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.txt
+++ /dev/null
@@ -1,41 +0,0 @@
-Binding for Thermal Sensor driver for BCM2835 SoCs.
-
-Required parameters:
--------------------
-
-compatible: should be one of: "brcm,bcm2835-thermal",
- "brcm,bcm2836-thermal" or "brcm,bcm2837-thermal"
-reg: Address range of the thermal registers.
-clocks: Phandle of the clock used by the thermal sensor.
-#thermal-sensor-cells: should be 0 (see Documentation/devicetree/bindings/thermal/thermal-sensor.yaml)
-
-Example:
-
-thermal-zones {
- cpu_thermal: cpu-thermal {
- polling-delay-passive = <0>;
- polling-delay = <1000>;
-
- thermal-sensors = <&thermal>;
-
- trips {
- cpu-crit {
- temperature = <80000>;
- hysteresis = <0>;
- type = "critical";
- };
- };
-
- coefficients = <(-538) 407000>;
-
- cooling-maps {
- };
- };
-};
-
-thermal: thermal@7e212000 {
- compatible = "brcm,bcm2835-thermal";
- reg = <0x7e212000 0x8>;
- clocks = <&clocks BCM2835_CLOCK_TSENS>;
- #thermal-sensor-cells = <0>;
-};
diff --git a/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.yaml b/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.yaml
new file mode 100644
index 000000000000..2b6026d9fbcf
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/brcm,bcm2835-thermal.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/thermal/brcm,bcm2835-thermal.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Broadcom BCM2835 thermal sensor
+
+maintainers:
+ - Stefan Wahren <stefan.wahren@i2se.com>
+
+allOf:
+ - $ref: thermal-sensor.yaml#
+
+properties:
+ compatible:
+ enum:
+ - brcm,bcm2835-thermal
+ - brcm,bcm2836-thermal
+ - brcm,bcm2837-thermal
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ "#thermal-sensor-cells":
+ const: 0
+
+unevaluatedProperties: false
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - '#thermal-sensor-cells'
+
+examples:
+ - |
+ #include <dt-bindings/clock/bcm2835.h>
+
+ thermal@7e212000 {
+ compatible = "brcm,bcm2835-thermal";
+ reg = <0x7e212000 0x8>;
+ clocks = <&clocks BCM2835_CLOCK_TSENS>;
+ #thermal-sensor-cells = <0>;
+ };
diff --git a/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml b/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml
index d1ec963a6834..27e9e16e6455 100644
--- a/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml
+++ b/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml
@@ -29,6 +29,8 @@ properties:
items:
- enum:
- qcom,mdm9607-tsens
+ - qcom,msm8226-tsens
+ - qcom,msm8909-tsens
- qcom,msm8916-tsens
- qcom,msm8939-tsens
- qcom,msm8974-tsens
@@ -48,6 +50,7 @@ properties:
- qcom,msm8953-tsens
- qcom,msm8996-tsens
- qcom,msm8998-tsens
+ - qcom,qcm2290-tsens
- qcom,sc7180-tsens
- qcom,sc7280-tsens
- qcom,sc8180x-tsens
@@ -56,6 +59,7 @@ properties:
- qcom,sdm845-tsens
- qcom,sm6115-tsens
- qcom,sm6350-tsens
+ - qcom,sm6375-tsens
- qcom,sm8150-tsens
- qcom,sm8250-tsens
- qcom,sm8350-tsens
@@ -67,6 +71,12 @@ properties:
enum:
- qcom,ipq8074-tsens
+ - description: v2 of TSENS with combined interrupt
+ items:
+ - enum:
+ - qcom,ipq9574-tsens
+ - const: qcom,ipq8074-tsens
+
reg:
items:
- description: TM registers
@@ -223,12 +233,7 @@ allOf:
contains:
enum:
- qcom,ipq8064-tsens
- - qcom,mdm9607-tsens
- - qcom,msm8916-tsens
- qcom,msm8960-tsens
- - qcom,msm8974-tsens
- - qcom,msm8976-tsens
- - qcom,qcs404-tsens
- qcom,tsens-v0_1
- qcom,tsens-v1
then:
@@ -244,22 +249,7 @@ allOf:
properties:
compatible:
contains:
- enum:
- - qcom,msm8953-tsens
- - qcom,msm8996-tsens
- - qcom,msm8998-tsens
- - qcom,sc7180-tsens
- - qcom,sc7280-tsens
- - qcom,sc8180x-tsens
- - qcom,sc8280xp-tsens
- - qcom,sdm630-tsens
- - qcom,sdm845-tsens
- - qcom,sm6350-tsens
- - qcom,sm8150-tsens
- - qcom,sm8250-tsens
- - qcom,sm8350-tsens
- - qcom,sm8450-tsens
- - qcom,tsens-v2
+ const: qcom,tsens-v2
then:
properties:
interrupts:
diff --git a/Documentation/devicetree/bindings/timer/brcm,kona-timer.txt b/Documentation/devicetree/bindings/timer/brcm,kona-timer.txt
deleted file mode 100644
index 39adf54b4388..000000000000
--- a/Documentation/devicetree/bindings/timer/brcm,kona-timer.txt
+++ /dev/null
@@ -1,25 +0,0 @@
-Broadcom Kona Family timer
------------------------------------------------------
-This timer is used in the following Broadcom SoCs:
- BCM11130, BCM11140, BCM11351, BCM28145, BCM28155
-
-Required properties:
-- compatible : "brcm,kona-timer"
-- DEPRECATED: compatible : "bcm,kona-timer"
-- reg : Register range for the timer
-- interrupts : interrupt for the timer
-- clocks: phandle + clock specifier pair of the external clock
-- clock-frequency: frequency that the clock operates
-
-Only one of clocks or clock-frequency should be specified.
-
-Refer to clocks/clock-bindings.txt for generic clock consumer properties.
-
-Example:
- timer@35006000 {
- compatible = "brcm,kona-timer";
- reg = <0x35006000 0x1000>;
- interrupts = <0x0 7 0x4>;
- clocks = <&hub_timer_clk>;
- };
-
diff --git a/Documentation/devicetree/bindings/timer/brcm,kona-timer.yaml b/Documentation/devicetree/bindings/timer/brcm,kona-timer.yaml
new file mode 100644
index 000000000000..d6af8383d6fc
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/brcm,kona-timer.yaml
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/brcm,kona-timer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Broadcom Kona family timer
+
+maintainers:
+ - Florian Fainelli <f.fainelli@gmail.com>
+
+properties:
+ compatible:
+ const: brcm,kona-timer
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ clock-frequency: true
+
+oneOf:
+ - required:
+ - clocks
+ - required:
+ - clock-frequency
+
+required:
+ - compatible
+ - reg
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/bcm281xx.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ timer@35006000 {
+ compatible = "brcm,kona-timer";
+ reg = <0x35006000 0x1000>;
+ interrupts = <GIC_SPI 7 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&aon_ccu BCM281XX_AON_CCU_HUB_TIMER>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/timer/loongson,ls1x-pwmtimer.yaml b/Documentation/devicetree/bindings/timer/loongson,ls1x-pwmtimer.yaml
new file mode 100644
index 000000000000..ad61ae55850b
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/loongson,ls1x-pwmtimer.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/loongson,ls1x-pwmtimer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Loongson-1 PWM timer
+
+maintainers:
+ - Keguang Zhang <keguang.zhang@gmail.com>
+
+description:
+ Loongson-1 PWM timer can be used for system clock source
+ and clock event timers.
+
+properties:
+ compatible:
+ const: loongson,ls1b-pwmtimer
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/loongson,ls1x-clk.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ clocksource: timer@1fe5c030 {
+ compatible = "loongson,ls1b-pwmtimer";
+ reg = <0x1fe5c030 0x10>;
+
+ clocks = <&clkc LS1X_CLKID_APB>;
+ interrupt-parent = <&intc0>;
+ interrupts = <20 IRQ_TYPE_LEVEL_HIGH>;
+ };
diff --git a/Documentation/devicetree/bindings/timer/ralink,rt2880-timer.yaml b/Documentation/devicetree/bindings/timer/ralink,rt2880-timer.yaml
new file mode 100644
index 000000000000..daa7832babe3
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/ralink,rt2880-timer.yaml
@@ -0,0 +1,44 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/ralink,rt2880-timer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Timer present in Ralink family SoCs
+
+maintainers:
+ - Sergio Paracuellos <sergio.paracuellos@gmail.com>
+
+properties:
+ compatible:
+ const: ralink,rt2880-timer
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ timer@100 {
+ compatible = "ralink,rt2880-timer";
+ reg = <0x100 0x20>;
+
+ clocks = <&sysc 3>;
+
+ interrupt-parent = <&intc>;
+ interrupts = <1>;
+ };
+...
diff --git a/Documentation/devicetree/usage-model.rst b/Documentation/devicetree/usage-model.rst
index b6a287955ee5..0717426856b2 100644
--- a/Documentation/devicetree/usage-model.rst
+++ b/Documentation/devicetree/usage-model.rst
@@ -415,6 +415,6 @@ When using the DT, this creates problems for of_platform_populate()
because it must decide whether to register each node as either a
platform_device or an amba_device. This unfortunately complicates the
device creation model a little bit, but the solution turns out not to
-be too invasive. If a node is compatible with "arm,amba-primecell", then
+be too invasive. If a node is compatible with "arm,primecell", then
of_platform_populate() will register it as an amba_device instead of a
platform_device.
diff --git a/Documentation/driver-api/edac.rst b/Documentation/driver-api/edac.rst
index b8c742aa0a71..f4f044b95c4f 100644
--- a/Documentation/driver-api/edac.rst
+++ b/Documentation/driver-api/edac.rst
@@ -106,6 +106,16 @@ will occupy those chip-select rows.
This term is avoided because it is unclear when needing to distinguish
between chip-select rows and socket sets.
+* High Bandwidth Memory (HBM)
+
+HBM is a new memory type with low power consumption and ultra-wide
+communication lanes. It uses vertically stacked memory chips (DRAM dies)
+interconnected by microscopic wires called "through-silicon vias," or
+TSVs.
+
+Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
+interconnect called the "interposer". Therefore, HBM's characteristics
+are nearly indistinguishable from on-chip integrated RAM.
Memory Controllers
------------------
@@ -176,3 +186,113 @@ nodes::
the L1 and L2 directories would be "edac_device_block's"
.. kernel-doc:: drivers/edac/edac_device.h
+
+
+Heterogeneous system support
+----------------------------
+
+An AMD heterogeneous system is built by connecting the data fabrics of
+both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
+GPU nodes can be accessed the same way as the data fabric on CPU nodes.
+
+The MI200 accelerators are data center GPUs. They have 2 data fabrics,
+and each GPU data fabric contains four Unified Memory Controllers (UMC).
+Each UMC contains eight channels. Each UMC channel controls one 128-bit
+HBM2e (2GB) channel (equivalent to 8 X 2GB ranks). This creates a DRAM
+data bus that is 4096 bits wide in total.
+
+While the UMC is interfacing a 16GB (8high X 2GB DRAM) HBM stack, each UMC
+channel is interfacing 2GB of DRAM (represented as rank).
+
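The figures quoted above follow from simple multiplication. A minimal sketch of that arithmetic, using only the numbers given in the text (the enum and function names are hypothetical and exist only for illustration):

```c
/* Per-data-fabric (GPU node) figures taken from the description above. */
enum {
	UMCS_PER_GPU_NODE  = 4,		/* UMCs per GPU data fabric */
	CHANNELS_PER_UMC   = 8,		/* channels per UMC */
	CHANNEL_WIDTH_BITS = 128,	/* width of one HBM2e channel */
	CHANNEL_SIZE_GB    = 2,		/* capacity behind one UMC channel */
};

/* Total DRAM data bus width of one GPU data fabric, in bits. */
unsigned int gpu_node_bus_width_bits(void)
{
	return UMCS_PER_GPU_NODE * CHANNELS_PER_UMC * CHANNEL_WIDTH_BITS;
}

/* Capacity of the HBM stack behind one UMC, in GB. */
unsigned int umc_capacity_gb(void)
{
	return CHANNELS_PER_UMC * CHANNEL_SIZE_GB;
}

/* Capacity behind one GPU data fabric (one EDAC MC), in GB. */
unsigned int gpu_node_capacity_gb(void)
{
	return UMCS_PER_GPU_NODE * umc_capacity_gb();
}
```

Here 4 x 8 x 128 gives the 4096-bit bus, 8 x 2GB gives the 16GB stack per UMC, and four such stacks give 64GB per data fabric.
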
+Memory controllers on AMD GPU nodes can be represented in EDAC thusly:
+
+ GPU DF / GPU Node -> EDAC MC
+ GPU UMC -> EDAC CSROW
+ GPU UMC channel -> EDAC CHANNEL
+
+For example, consider a heterogeneous system in which 1 AMD CPU is connected
+to 4 MI200 (Aldebaran) GPUs using xGMI.
+
+Some more heterogeneous hardware details:
+
+- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
+ They have chip selects (csrows) and channels. However, the layouts are different
+ for performance, physical layout, or other reasons.
+- CPU UMCs use 1 channel. In this case, UMC = EDAC channel. This follows the
+ marketing speak: a CPU has X memory channels, etc.
+- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
+- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
+- GPU UMCs use 8 channels, so UMC channel = EDAC channel.
+
+The EDAC subsystem provides a mechanism to handle AMD heterogeneous
+systems by calling system specific ops for both CPUs and GPUs.
+
+AMD GPU nodes are enumerated in sequential order based on the PCI
+hierarchy, and the first GPU node is assumed to have a Node ID value
+following those of the CPU nodes after the latter are fully populated::
+
+ $ ls /sys/devices/system/edac/mc/
+ mc0 - CPU MC node 0
+ mc1 |
+ mc2 |- GPU card[0] => node 0(mc1), node 1(mc2)
+ mc3 |
+ mc4 |- GPU card[1] => node 0(mc3), node 1(mc4)
+ mc5 |
+ mc6 |- GPU card[2] => node 0(mc5), node 1(mc6)
+ mc7 |
+ mc8 |- GPU card[3] => node 0(mc7), node 1(mc8)
+
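The numbering in the listing above can be expressed as a one-line formula. The sketch below assumes, as in the listing, that all CPU nodes are enumerated first, that every GPU card contributes exactly two nodes, and that cards and nodes are counted from zero. The function name is hypothetical and is not part of the EDAC code.

```c
/*
 * Map (GPU card, node within card) to the index N of the mcN directory,
 * given how many CPU nodes were enumerated before the GPU nodes.
 */
int gpu_node_to_mc(int num_cpu_nodes, int card, int node)
{
	const int nodes_per_gpu = 2;	/* each MI200 GPU has 2 nodes/mcs */

	return num_cpu_nodes + card * nodes_per_gpu + node;
}
```

With one CPU node, card 0 maps to mc1 and mc2, and card 3 maps to mc7 and mc8, matching the listing.
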
+For example, a heterogeneous system with one AMD CPU is connected to
+four MI200 (Aldebaran) GPUs using xGMI. This topology can be represented
+via the following sysfs entries::
+
+ /sys/devices/system/edac/mc/..
+
+ CPU # CPU node
+ ├── mc 0
+
+ GPU Nodes are enumerated sequentially after CPU nodes have been populated
+ GPU card 1 # Each MI200 GPU has 2 nodes/mcs
+ ├── mc 1 # GPU node 0 == mc1, Each MC node has 4 UMCs/CSROWs
+ │   ├── csrow 0 # UMC 0
+ │   │   ├── channel 0 # Each UMC has 8 channels
+ │   │   ├── channel 1 # size of each channel is 2 GB, so each UMC has 16 GB
+ │   │   ├── channel 2
+ │   │   ├── channel 3
+ │   │   ├── channel 4
+ │   │   ├── channel 5
+ │   │   ├── channel 6
+ │   │   ├── channel 7
+ │   ├── csrow 1 # UMC 1
+ │   │   ├── channel 0
+ │   │   ├── ..
+ │   │   ├── channel 7
+ │   ├── .. ..
+ │   ├── csrow 3 # UMC 3
+ │   │   ├── channel 0
+ │   │   ├── ..
+ │   │   ├── channel 7
+ │   ├── rank 0
+ │   ├── .. ..
+ │   ├── rank 31 # total 32 ranks/dimms from 4 UMCs
+ ├
+ ├── mc 2 # GPU node 1 == mc2
+ │   ├── .. # each GPU has total 64 GB
+
+ GPU card 2
+ ├── mc 3
+ │   ├── ..
+ ├── mc 4
+ │   ├── ..
+
+ GPU card 3
+ ├── mc 5
+ │   ├── ..
+ ├── mc 6
+ │   ├── ..
+
+ GPU card 4
+ ├── mc 7
+ │   ├── ..
+ ├── mc 8
+ │   ├── ..
diff --git a/Documentation/filesystems/directory-locking.rst b/Documentation/filesystems/directory-locking.rst
index 504ba940c36c..dccd61c7c5c3 100644
--- a/Documentation/filesystems/directory-locking.rst
+++ b/Documentation/filesystems/directory-locking.rst
@@ -22,12 +22,11 @@ exclusive.
3) object removal. Locking rules: caller locks parent, finds victim,
locks victim and calls the method. Locks are exclusive.
-4) rename() that is _not_ cross-directory. Locking rules: caller locks
-the parent and finds source and target. In case of exchange (with
-RENAME_EXCHANGE in flags argument) lock both. In any case,
-if the target already exists, lock it. If the source is a non-directory,
-lock it. If we need to lock both, lock them in inode pointer order.
-Then call the method. All locks are exclusive.
+4) rename() that is _not_ cross-directory. Locking rules: caller locks the
+parent and finds source and target. We lock both (provided they exist). If we
+need to lock two inodes of different type (dir vs non-dir), we lock the directory
+first. If we need to lock two inodes of the same type, lock them in inode
+pointer order. Then call the method. All locks are exclusive.
NB: we might get away with locking the source (and target in exchange
case) shared.
@@ -44,15 +43,17 @@ All locks are exclusive.
rules:
* lock the filesystem
- * lock parents in "ancestors first" order.
+ * lock parents in "ancestors first" order. If neither is an ancestor of
+ the other, lock them in inode pointer order.
* find source and target.
* if old parent is equal to or is a descendent of target
fail with -ENOTEMPTY
* if new parent is equal to or is a descendent of source
fail with -ELOOP
- * If it's an exchange, lock both the source and the target.
- * If the target exists, lock it. If the source is a non-directory,
- lock it. If we need to lock both, do so in inode pointer order.
+ * Lock both the source and the target provided they exist. If we
+ need to lock two inodes of different type (dir vs non-dir), we lock
+ the directory first. If we need to lock two inodes of the same type,
+ lock them in inode pointer order.
* call the method.
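The rule for ordering the source and target locks can be sketched as a small decision function. This is a user-space illustration under stated assumptions: ``struct demo_inode`` and ``demo_lock_first()`` are hypothetical names, a NULL pointer models an object that does not exist, and "inode pointer order" is modelled by comparing addresses as integers. It is not the VFS code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; only the type matters here. */
struct demo_inode {
	bool is_dir;
};

/*
 * Return which of two objects must be locked first (the other one,
 * if any, is locked second). Returns NULL only if neither exists.
 */
const struct demo_inode *demo_lock_first(const struct demo_inode *a,
					 const struct demo_inode *b)
{
	if (!a)
		return b;
	if (!b || a == b)
		return a;
	/* Different types: the directory goes first. */
	if (a->is_dir != b->is_dir)
		return a->is_dir ? a : b;
	/* Same type: lower address goes first. */
	return (uintptr_t)a < (uintptr_t)b ? a : b;
}
```

Because the result depends only on the pair of objects and not on the argument order, two tasks locking the same pair take the locks in the same order.
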
All ->i_rwsem are taken exclusive. Again, we might get away with locking
@@ -66,8 +67,9 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
Proof:
- First of all, at any moment we have a partial ordering of the
- objects - A < B iff A is an ancestor of B.
+ First of all, at any moment we have a linear ordering of the
+ objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
+ of A and ptr(A) < ptr(B)).
That ordering can change. However, the following is true:
diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index ede672dedf11..cb845e8e5435 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -38,20 +38,14 @@ fail at runtime.
Use cases
=========
-By itself, the base fs-verity feature only provides integrity
-protection, i.e. detection of accidental (non-malicious) corruption.
+By itself, fs-verity only provides integrity protection, i.e.
+detection of accidental (non-malicious) corruption.
However, because fs-verity makes retrieving the file hash extremely
efficient, it's primarily meant to be used as a tool to support
authentication (detection of malicious modifications) or auditing
(logging file hashes before use).
-Trusted userspace code (e.g. operating system code running on a
-read-only partition that is itself authenticated by dm-verity) can
-authenticate the contents of an fs-verity file by using the
-`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
-digital signature of it.
-
A standard file hash could be used instead of fs-verity. However,
this is inefficient if the file is large and only a small portion may
be accessed. This is often the case for Android application package
@@ -69,24 +63,31 @@ still be used on read-only filesystems. fs-verity is for files that
must live on a read-write filesystem because they are independently
updated and potentially user-installed, so dm-verity cannot be used.
-The base fs-verity feature is a hashing mechanism only; actually
-authenticating the files may be done by:
-
-* Userspace-only
-
-* Builtin signature verification + userspace policy
-
- fs-verity optionally supports a simple signature verification
- mechanism where users can configure the kernel to require that
- all fs-verity files be signed by a key loaded into a keyring;
- see `Built-in signature verification`_.
-
-* Integrity Measurement Architecture (IMA)
-
- IMA supports including fs-verity file digests and signatures in the
- IMA measurement list and verifying fs-verity based file signatures
- stored as security.ima xattrs, based on policy.
-
+fs-verity does not mandate a particular scheme for authenticating its
+file hashes. (Similarly, dm-verity does not mandate a particular
+scheme for authenticating its block device root hashes.) Options for
+authenticating fs-verity file hashes include:
+
+- Trusted userspace code. Often, the userspace code that accesses
+ files can be trusted to authenticate them. Consider e.g. an
+ application that wants to authenticate data files before using them,
+ or an application loader that is part of the operating system (which
+ is already authenticated in a different way, such as by being loaded
+ from a read-only partition that uses dm-verity) and that wants to
+ authenticate applications before loading them. In these cases, this
+ trusted userspace code can authenticate a file's contents by
+ retrieving its fs-verity digest using `FS_IOC_MEASURE_VERITY`_, then
+ verifying a signature of it using any userspace cryptographic
+ library that supports digital signatures.
+
+- Integrity Measurement Architecture (IMA). IMA supports fs-verity
+ file digests as an alternative to its traditional full file digests.
+ "IMA appraisal" enforces that files contain a valid, matching
+ signature in their "security.ima" extended attribute, as controlled
+ by the IMA policy. For more information, see the IMA documentation.
+
+- Trusted userspace code in combination with `Built-in signature
+ verification`_. This approach should be used only with great care.
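The first option above, trusted userspace code, starts by retrieving the file's fs-verity digest. A minimal sketch of that step follows, assuming the ``<linux/fsverity.h>`` UAPI header is installed. The function name and ``DEMO_MAX_DIGEST_SIZE`` are hypothetical, and verifying a signature over the returned digest is left to whichever cryptographic library the caller uses.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

#define DEMO_MAX_DIGEST_SIZE 64	/* hypothetical bound, enough for SHA-512 */

/*
 * Copy the fs-verity digest of @path into @out and its hash algorithm
 * identifier into @alg. Returns the digest length in bytes, or -1 with
 * errno set (for example if the file does not have verity enabled).
 */
int demo_measure_verity(const char *path, unsigned char *out, size_t out_len,
			unsigned int *alg)
{
	struct fsverity_digest *d;
	int fd, saved, ret = -1;

	d = calloc(1, sizeof(*d) + DEMO_MAX_DIGEST_SIZE);
	if (!d)
		return -1;
	/* On input: space available in d->digest. On output: actual size. */
	d->digest_size = DEMO_MAX_DIGEST_SIZE;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto out_free;
	if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) != 0)
		goto out_close;
	if (d->digest_size > out_len) {
		errno = EOVERFLOW;
		goto out_close;
	}
	memcpy(out, d->digest, d->digest_size);
	*alg = d->digest_algorithm;
	ret = d->digest_size;
out_close:
	saved = errno;
	close(fd);
	errno = saved;
out_free:
	saved = errno;
	free(d);
	errno = saved;
	return ret;
}
```

A caller would then check a signature over the returned digest against a key it trusts before using the file's contents.
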
User API
========
@@ -111,8 +112,7 @@ follows::
};
This structure contains the parameters of the Merkle tree to build for
-the file, and optionally contains a signature. It must be initialized
-as follows:
+the file. It must be initialized as follows:
- ``version`` must be 1.
- ``hash_algorithm`` must be the identifier for the hash algorithm to
@@ -129,12 +129,14 @@ as follows:
file or device. Currently the maximum salt size is 32 bytes.
- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
provided.
-- ``sig_size`` is the size of the signature in bytes, or 0 if no
- signature is provided. Currently the signature is (somewhat
- arbitrarily) limited to 16128 bytes. See `Built-in signature
- verification`_ for more information.
-- ``sig_ptr`` is the pointer to the signature, or NULL if no
- signature is provided.
+- ``sig_size`` is the size of the builtin signature in bytes, or 0 if no
+ builtin signature is provided. Currently the builtin signature is
+ (somewhat arbitrarily) limited to 16128 bytes.
+- ``sig_ptr`` is the pointer to the builtin signature, or NULL if no
+ builtin signature is provided. A builtin signature is only needed
+ if the `Built-in signature verification`_ feature is being used. It
+ is not needed for IMA appraisal, and it is not needed if the file
+ signature is being handled entirely in userspace.
- All reserved fields must be zeroed.
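The initialization rules above can be sketched for the case where no builtin signature is used, that is, when signatures are handled in userspace or by IMA. The sketch assumes the ``<linux/fsverity.h>`` UAPI header is installed; the function name is hypothetical, and SHA-256 with a 4096-byte block size is just one plausible choice, not a recommendation from this text.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/*
 * Enable fs-verity on @path without salt and without a builtin
 * signature. Returns 0 on success, or -1 with errno set.
 */
int demo_enable_verity_unsigned(const char *path)
{
	struct fsverity_enable_arg arg;
	int fd, ret, saved;

	/* The ioctl requires a read-only file descriptor. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Zeroing the struct also zeroes all reserved fields. */
	memset(&arg, 0, sizeof(arg));
	arg.version = 1;
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	arg.block_size = 4096;	/* assumed to suit the filesystem */
	/* salt_size/salt_ptr stay 0: no salt. */
	/* sig_size/sig_ptr stay 0: no builtin signature. */

	ret = ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
	saved = errno;
	close(fd);
	errno = saved;
	return ret;
}
```

On failure, ``errno`` carries one of the error codes that FS_IOC_ENABLE_VERITY is documented to return.
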
FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
@@ -158,7 +160,7 @@ fatal signal), no changes are made to the file.
FS_IOC_ENABLE_VERITY can fail with the following errors:
- ``EACCES``: the process does not have write access to the file
-- ``EBADMSG``: the signature is malformed
+- ``EBADMSG``: the builtin signature is malformed
- ``EBUSY``: this ioctl is already running on the file
- ``EEXIST``: the file already has verity enabled
- ``EFAULT``: the caller provided inaccessible memory
@@ -168,10 +170,10 @@ FS_IOC_ENABLE_VERITY can fail with the following errors:
reserved bits are set; or the file descriptor refers to neither a
regular file nor a directory.
- ``EISDIR``: the file descriptor refers to a directory
-- ``EKEYREJECTED``: the signature doesn't match the file
-- ``EMSGSIZE``: the salt or signature is too long
-- ``ENOKEY``: the fs-verity keyring doesn't contain the certificate
- needed to verify the signature
+- ``EKEYREJECTED``: the builtin signature doesn't match the file
+- ``EMSGSIZE``: the salt or builtin signature is too long
+- ``ENOKEY``: the ".fs-verity" keyring doesn't contain the certificate
+ needed to verify the builtin signature
- ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
available in the kernel's crypto API as currently configured (e.g.
for SHA-512, missing CONFIG_CRYPTO_SHA512).
@@ -180,8 +182,8 @@ FS_IOC_ENABLE_VERITY can fail with the following errors:
support; or the filesystem superblock has not had the 'verity'
feature enabled on it; or the filesystem does not support fs-verity
on this file. (See `Filesystem support`_.)
-- ``EPERM``: the file is append-only; or, a signature is required and
- one was not provided.
+- ``EPERM``: the file is append-only; or, a builtin signature is
+ required and one was not provided.
- ``EROFS``: the filesystem is read-only
- ``ETXTBSY``: someone has the file open for writing. This can be the
caller's file descriptor, another open file descriptor, or the file
@@ -270,9 +272,9 @@ This ioctl takes in a pointer to the following structure::
- ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` reads the fs-verity
descriptor. See `fs-verity descriptor`_.
-- ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the signature which was
- passed to FS_IOC_ENABLE_VERITY, if any. See `Built-in signature
- verification`_.
+- ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the builtin signature
+ which was passed to FS_IOC_ENABLE_VERITY, if any. See `Built-in
+ signature verification`_.
The semantics are similar to those of ``pread()``. ``offset``
specifies the offset in bytes into the metadata item to read from, and
@@ -299,7 +301,7 @@ FS_IOC_READ_VERITY_METADATA can fail with the following errors:
overflowed
- ``ENODATA``: the file is not a verity file, or
FS_VERITY_METADATA_TYPE_SIGNATURE was requested but the file doesn't
- have a built-in signature
+ have a builtin signature
- ``ENOTTY``: this type of filesystem does not implement fs-verity, or
this ioctl is not yet implemented on it
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
@@ -347,8 +349,8 @@ non-verity one, with the following exceptions:
with EIO (for read()) or SIGBUS (for mmap() reads).
- If the sysctl "fs.verity.require_signatures" is set to 1 and the
- file is not signed by a key in the fs-verity keyring, then opening
- the file will fail. See `Built-in signature verification`_.
+ file is not signed by a key in the ".fs-verity" keyring, then
+ opening the file will fail. See `Built-in signature verification`_.
Direct access to the Merkle tree is not supported. Therefore, if a
verity file is copied, or is backed up and restored, then it will lose
@@ -433,20 +435,25 @@ root hash as well as other fields such as the file size::
Built-in signature verification
===============================
-With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
-a portion of an authentication policy (see `Use cases`_) in the
-kernel. Specifically, it adds support for:
+CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y adds support for in-kernel
+verification of fs-verity builtin signatures.
+
+**IMPORTANT**! Please take great care before using this feature.
+It is not the only way to do signatures with fs-verity, and the
+alternatives (such as userspace signature verification, and IMA
+appraisal) can be much better. It's also easy to fall into a trap
+of thinking this feature solves more problems than it actually does.
+
+Enabling this option adds the following:
-1. At fs-verity module initialization time, a keyring ".fs-verity" is
- created. The root user can add trusted X.509 certificates to this
- keyring using the add_key() system call, then (when done)
- optionally use keyctl_restrict_keyring() to prevent additional
- certificates from being added.
+1. At boot time, the kernel creates a keyring named ".fs-verity". The
+ root user can add trusted X.509 certificates to this keyring using
+ the add_key() system call.
2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
detached signature in DER format of the file's fs-verity digest.
- On success, this signature is persisted alongside the Merkle tree.
- Then, any time the file is opened, the kernel will verify the
+ On success, the ioctl persists the signature alongside the Merkle
+ tree. Then, any time the file is opened, the kernel verifies the
file's actual digest against this signature, using the certificates
in the ".fs-verity" keyring.
@@ -454,8 +461,8 @@ kernel. Specifically, it adds support for:
When set to 1, the kernel requires that all verity files have a
correctly signed digest as described in (2).
-fs-verity file digests must be signed in the following format, which
-is similar to the structure used by `FS_IOC_MEASURE_VERITY`_::
+The signature described in (2) must be a signature of the fs-verity
+file digest in the following format::
struct fsverity_formatted_digest {
char magic[8]; /* must be "FSVerity" */
@@ -464,13 +471,66 @@ is similar to the structure used by `FS_IOC_MEASURE_VERITY`_::
__u8 digest[];
};
-fs-verity's built-in signature verification support is meant as a
-relatively simple mechanism that can be used to provide some level of
-authenticity protection for verity files, as an alternative to doing
-the signature verification in userspace or using IMA-appraisal.
-However, with this mechanism, userspace programs still need to check
-that the verity bit is set, and there is no protection against verity
-files being swapped around.
+That's it. It should be emphasized again that fs-verity builtin
+signatures are not the only way to do signatures with fs-verity. See
+`Use cases`_ for an overview of ways in which fs-verity can be used.
+fs-verity builtin signatures have some major limitations that should
+be carefully considered before using them:
+
+- Builtin signature verification does *not* make the kernel enforce
+ that any files actually have fs-verity enabled. Thus, it is not a
+ complete authentication policy. Currently, if it is used, the only
+ way to complete the authentication policy is for trusted userspace
+ code to explicitly check whether files have fs-verity enabled with a
+ signature before they are accessed. (With
+ fs.verity.require_signatures=1, just checking whether fs-verity is
+ enabled suffices.) But, in this case the trusted userspace code
+ could just store the signature alongside the file and verify it
+ itself using a cryptographic library, instead of using this feature.
+
+- A file's builtin signature can only be set at the same time that
+ fs-verity is being enabled on the file. Changing or deleting the
+ builtin signature later requires re-creating the file.
+
+- Builtin signature verification uses the same set of public keys for
+ all fs-verity enabled files on the system. Different keys cannot be
+ trusted for different files; each key is all or nothing.
+
+- The sysctl fs.verity.require_signatures applies system-wide.
+ Setting it to 1 only works when all users of fs-verity on the system
+ agree that it should be set to 1. This limitation can prevent
+ fs-verity from being used in cases where it would be helpful.
+
+- Builtin signature verification can only use signature algorithms
+ that are supported by the kernel. For example, the kernel does not
+ yet support Ed25519, even though this is often the signature
+ algorithm that is recommended for new cryptographic designs.
+
+- fs-verity builtin signatures are in PKCS#7 format, and the public
+ keys are in X.509 format. These formats are commonly used,
+ including by some other kernel features (which is why the fs-verity
+ builtin signatures use them), and are very feature rich.
+ Unfortunately, history has shown that code that parses and handles
+ these formats (which are from the 1990s and are based on ASN.1)
+ often has vulnerabilities as a result of their complexity. This
+ complexity is not inherent to the cryptography itself.
+
+ fs-verity users who do not need advanced features of X.509 and
+ PKCS#7 should strongly consider using simpler formats, such as plain
+ Ed25519 keys and signatures, and verifying signatures in userspace.
+
+ fs-verity users who choose to use X.509 and PKCS#7 anyway should
+ still consider that verifying those signatures in userspace is more
+ flexible (for other reasons mentioned earlier in this document) and
+ eliminates the need to enable CONFIG_FS_VERITY_BUILTIN_SIGNATURES
+ and its associated increase in kernel attack surface. In some cases
+ it can even be necessary, since advanced X.509 and PKCS#7 features
+ do not always work as intended with the kernel. For example, the
+ kernel does not check X.509 certificate validity times.
+
+ Note: IMA appraisal, which supports fs-verity, does not use PKCS#7
+ for its signatures, so it partially avoids the issues discussed
+ here. IMA appraisal does use X.509.
Filesystem support
==================
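As an illustration of the ``fsverity_formatted_digest`` layout referenced in the hunk above, the following is a minimal sketch of how userspace might serialize the bytes that get signed. It is not taken from the patch. The helper name ``example_format_digest`` is hypothetical, and the two little-endian 16-bit fields between ``magic`` and ``digest`` (algorithm identifier and digest size) are not visible in this excerpt; they are assumed here from the kernel's UAPI definition of the structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel or the patch): writes the
 * fsverity_formatted_digest byte layout into `out`.
 *
 * Assumed layout:
 *   bytes 0..7   magic "FSVerity" (no NUL terminator)
 *   bytes 8..9   hash algorithm identifier, little-endian 16-bit
 *   bytes 10..11 digest size in bytes, little-endian 16-bit
 *   bytes 12..   the raw fs-verity file digest
 *
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
size_t example_format_digest(uint16_t alg, const uint8_t *digest,
                             uint16_t digest_size,
                             uint8_t *out, size_t out_len)
{
    size_t total = 8 + 2 + 2 + (size_t)digest_size;

    if (out_len < total)
        return 0;

    memcpy(out, "FSVerity", 8);

    /* Encode the 16-bit fields explicitly so host endianness does not matter. */
    out[8]  = (uint8_t)(alg & 0xff);
    out[9]  = (uint8_t)(alg >> 8);
    out[10] = (uint8_t)(digest_size & 0xff);
    out[11] = (uint8_t)(digest_size >> 8);

    memcpy(out + 12, digest, digest_size);
    return total;
}
```

For a 32-byte SHA-256 digest the helper produces a 44-byte buffer. That buffer is what a PKCS#7 signer (for a builtin signature) or a userspace signing scheme would sign; producing the signature itself is out of scope for this sketch.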
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index ef540865ad22..5cf6a5f8ca57 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -31,7 +31,7 @@ you probably needn't concern yourself with pcmciautils.
====================== =============== ========================================
GNU C 5.1 gcc --version
Clang/LLVM (optional) 11.0.0 clang --version
-Rust (optional) 1.62.0 rustc --version
+Rust (optional) 1.68.2 rustc --version
bindgen (optional) 0.56.0 bindgen --version
GNU make 3.82 make --version
bash 4.2 bash --version
diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst
index 178c95fd17dc..93d8a794bdfc 100644
--- a/Documentation/process/maintainer-tip.rst
+++ b/Documentation/process/maintainer-tip.rst
@@ -421,6 +421,9 @@ allowing themselves a breath. Please respect that.
The release candidate -rc1 is the starting point for new patches to be
applied which are targeted for the next merge window.
+So-called _urgent_ branches are merged into mainline during the
+stabilization phase of each release.
+
Git
^^^
diff --git a/Documentation/riscv/patch-acceptance.rst b/Documentation/riscv/patch-acceptance.rst
index 07d5a5623e2a..634aa222b410 100644
--- a/Documentation/riscv/patch-acceptance.rst
+++ b/Documentation/riscv/patch-acceptance.rst
@@ -16,6 +16,24 @@ tested code over experimental code. We wish to extend these same
principles to the RISC-V-related code that will be accepted for
inclusion in the kernel.
+Patchwork
+---------
+
+RISC-V has a patchwork instance, where the status of patches can be checked:
+
+ https://patchwork.kernel.org/project/linux-riscv/list/
+
+If your patch does not appear in the default view, the RISC-V maintainers have
+likely either requested changes, or expect it to be applied to another tree.
+
+Automation runs against this patchwork instance, building/testing patches as
+they arrive. The automation applies patches against the current HEAD of the
+RISC-V `for-next` and `fixes` branches, depending on whether the patch has been
+detected as a fix. Failing those, it will use the RISC-V `master` branch.
+The exact commit to which a series has been applied will be noted on patchwork.
+Patches for which any of the checks fail are unlikely to be applied and in most
+cases will need to be resubmitted.
+
Submit Checklist Addendum
-------------------------
We'll only accept patches for new modules or extensions if the
diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst
index 13b7744b1e27..a8931512ed98 100644
--- a/Documentation/rust/quick-start.rst
+++ b/Documentation/rust/quick-start.rst
@@ -38,9 +38,9 @@ and run::
rustup override set $(scripts/min-tool-version.sh rustc)
-Otherwise, fetch a standalone installer or install ``rustup`` from:
+Otherwise, fetch a standalone installer from:
- https://www.rust-lang.org
+ https://forge.rust-lang.org/infra/other-installation-methods.html#standalone
Rust standard library source
diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst
index f79987e16cf4..e7b07313550a 100644
--- a/Documentation/trace/user_events.rst
+++ b/Documentation/trace/user_events.rst
@@ -14,10 +14,6 @@ Programs can view status of the events via
/sys/kernel/tracing/user_events_status and can both register and write
data out via /sys/kernel/tracing/user_events_data.
-Programs can also use /sys/kernel/tracing/dynamic_events to register and
-delete user based events via the u: prefix. The format of the command to
-dynamic_events is the same as the ioctl with the u: prefix applied.
-
Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process tells the kernel which address and bit to reflect if any tool has
@@ -144,6 +140,9 @@ its name. Delete will only succeed if there are no references left to the
event (in both user and kernel space). User programs should use a separate file
to request deletes than the one used for registration due to this.
+**NOTE:** By default, events auto-delete when there are no references left
+to the event. Future flags may change this logic.
+
Unregistering
-------------
If after registering an event it is no longer wanted to be updated then it can
diff --git a/Documentation/translations/zh_CN/devicetree/usage-model.rst b/Documentation/translations/zh_CN/devicetree/usage-model.rst
index c6aee82c7e6e..19ba4ae0cd81 100644
--- a/Documentation/translations/zh_CN/devicetree/usage-model.rst
+++ b/Documentation/translations/zh_CN/devicetree/usage-model.rst
@@ -325,6 +325,6 @@ Primecell设备。然而,棘手的一点是,AMBA总线上的所有设备并
当使用DT时,这给of_platform_populate()带来了问题,因为它必须决定是否将
每个节点注册为platform_device或amba_device。不幸的是,这使设备创建模型
-变得有点复杂,但解决方案原来并不是太具有侵略性。如果一个节点与“arm,amba-primecell”
+变得有点复杂,但解决方案原来并不是太具有侵略性。如果一个节点与“arm,primecell”
兼容,那么of_platform_populate()将把它注册为amba_device而不是
platform_device。
diff --git a/Documentation/virt/paravirt_ops.rst b/Documentation/virt/paravirt_ops.rst
index 6b789d27cead..62d867e0d4d6 100644
--- a/Documentation/virt/paravirt_ops.rst
+++ b/Documentation/virt/paravirt_ops.rst
@@ -5,31 +5,31 @@ Paravirt_ops
============
Linux provides support for different hypervisor virtualization technologies.
-Historically different binary kernels would be required in order to support
-different hypervisors, this restriction was removed with pv_ops.
+Historically, different binary kernels would be required in order to support
+different hypervisors; this restriction was removed with pv_ops.
Linux pv_ops is a virtualization API which enables support for different
hypervisors. It allows each hypervisor to override critical operations and
allows a single kernel binary to run on all supported execution environments
including native machine -- without any hypervisors.
pv_ops provides a set of function pointers which represent operations
-corresponding to low level critical instructions and high level
-functionalities in various areas. pv-ops allows for optimizations at run
-time by enabling binary patching of the low-ops critical operations
+corresponding to low-level critical instructions and high-level
+functionalities in various areas. pv_ops allows for optimizations at run
+time by enabling binary patching of the low-level critical operations
at boot time.
pv_ops operations are classified into three categories:
- simple indirect call
- These operations correspond to high level functionality where it is
+ These operations correspond to high-level functionality where it is
known that the overhead of indirect call isn't very important.
- indirect call which allows optimization with binary patch
- Usually these operations correspond to low level critical instructions. They
+ Usually these operations correspond to low-level critical instructions. They
are called frequently and are performance critical. The overhead is
very important.
- a set of macros for hand written assembly code
Hand written assembly codes (.S files) also need paravirtualization
- because they include sensitive instructions or some of code paths in
+ because they include sensitive instructions or some code paths in
them are very performance critical.