summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update14
-rw-r--r--Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst2
-rw-r--r--Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg9
-rw-r--r--Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg4
-rw-r--r--Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg13
-rw-r--r--Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg4
-rw-r--r--Documentation/RCU/Design/Requirements/Requirements.rst4
-rw-r--r--Documentation/RCU/listRCU.rst9
-rw-r--r--Documentation/RCU/whatisRCU.rst4
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst130
-rw-r--r--Documentation/admin-guide/hw-vuln/srso.rst24
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt87
-rw-r--r--Documentation/admin-guide/pm/intel_idle.rst17
-rw-r--r--Documentation/admin-guide/pstore-blk.rst8
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst3
-rw-r--r--Documentation/arch/x86/amd-memory-encryption.rst2
-rw-r--r--Documentation/arch/x86/iommu.rst2
-rw-r--r--Documentation/arch/x86/resctrl.rst38
-rw-r--r--Documentation/arch/x86/topology.rst12
-rw-r--r--Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml3
-rw-r--r--Documentation/devicetree/bindings/iio/dac/adi,ad5758.yaml3
-rw-r--r--Documentation/devicetree/bindings/iio/health/ti,afe4403.yaml3
-rw-r--r--Documentation/devicetree/bindings/iio/health/ti,afe4404.yaml3
-rw-r--r--Documentation/devicetree/bindings/memory-controllers/xlnx,versal-ddrmc-edac.yaml57
-rw-r--r--Documentation/devicetree/bindings/timer/cirrus,ep9301-timer.yaml49
-rw-r--r--Documentation/devicetree/bindings/timer/renesas,rz-mtu3.yaml67
-rw-r--r--Documentation/filesystems/files.rst53
-rw-r--r--Documentation/filesystems/fscrypt.rst121
-rw-r--r--Documentation/filesystems/nfs/exporting.rst7
-rw-r--r--Documentation/filesystems/porting.rst7
-rw-r--r--Documentation/memory-barriers.txt7
-rw-r--r--Documentation/netlink/specs/nfsd.yaml89
-rw-r--r--Documentation/process/changes.rst2
-rw-r--r--Documentation/rust/index.rst19
-rw-r--r--Documentation/scheduler/sched-capacity.rst13
-rw-r--r--Documentation/scheduler/sched-energy.rst29
-rw-r--r--Documentation/scheduler/sched-rt-group.rst40
37 files changed, 708 insertions, 250 deletions
diff --git a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
index 0a41afe0ab4c..9051695d2211 100644
--- a/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
+++ b/Documentation/ABI/testing/sysfs-driver-intel-m10-bmc-sec-update
@@ -1,7 +1,7 @@
What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../security/sr_root_entry_hash
Date: Sep 2022
KernelVersion: 5.20
-Contact: Russ Weight <russell.h.weight@intel.com>
+Contact: Peter Colberg <peter.colberg@intel.com>
Description: Read only. Returns the root entry hash for the static
region if one is programmed, else it returns the
string: "hash not programmed". This file is only
@@ -11,7 +11,7 @@ Description: Read only. Returns the root entry hash for the static
What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../security/pr_root_entry_hash
Date: Sep 2022
KernelVersion: 5.20
-Contact: Russ Weight <russell.h.weight@intel.com>
+Contact: Peter Colberg <peter.colberg@intel.com>
Description: Read only. Returns the root entry hash for the partial
reconfiguration region if one is programmed, else it
returns the string: "hash not programmed". This file
@@ -21,7 +21,7 @@ Description: Read only. Returns the root entry hash for the partial
What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../security/bmc_root_entry_hash
Date: Sep 2022
KernelVersion: 5.20
-Contact: Russ Weight <russell.h.weight@intel.com>
+Contact: Peter Colberg <peter.colberg@intel.com>
Description: Read only. Returns the root entry hash for the BMC image
if one is programmed, else it returns the string:
"hash not programmed". This file is only visible if the
@@ -31,7 +31,7 @@ Description: Read only. Returns the root entry hash for the BMC image
What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../security/sr_canceled_csks
Date: Sep 2022
KernelVersion: 5.20
-Contact: Russ Weight <russell.h.weight@intel.com>
+Contact: Peter Colberg <peter.colberg@intel.com>
Description: Read only. Returns a list of indices for canceled code
signing keys for the static region. The standard bitmap
list format is used (e.g. "1,2-6,9").
@@ -39,7 +39,7 @@ Description: Read only. Returns a list of indices for canceled code
What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../security/pr_canceled_csks
Date: Sep 2022
KernelVersion: 5.20
-Contact: Russ Weight <russell.h.weight@intel.com>
+Contact: Peter Colberg <peter.colberg@intel.com>
Description: Read only. Returns a list of indices for canceled code
signing keys for the partial reconfiguration region. The
standard bitmap list format is used (e.g. "1,2-6,9").
@@ -47,7 +47,7 @@ Description: Read only. Returns a list of indices for canceled code
What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../security/bmc_canceled_csks
Date: Sep 2022
KernelVersion: 5.20
-Contact: Russ Weight <russell.h.weight@intel.com>
+Contact: Peter Colberg <peter.colberg@intel.com>
Description: Read only. Returns a list of indices for canceled code
signing keys for the BMC. The standard bitmap list format
is used (e.g. "1,2-6,9").
@@ -55,7 +55,7 @@ Description: Read only. Returns a list of indices for canceled code
What: /sys/bus/platform/drivers/intel-m10bmc-sec-update/.../security/flash_count
Date: Sep 2022
KernelVersion: 5.20
-Contact: Russ Weight <russell.h.weight@intel.com>
+Contact: Peter Colberg <peter.colberg@intel.com>
Description: Read only. Returns number of times the secure update
staging area has been flashed.
Format: "%u".
diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
index 93d899d53258..414f8a2012d6 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
@@ -181,7 +181,7 @@ operations is carried out at several levels:
of this wait (or series of waits, as the case may be) is to permit a
concurrent CPU-hotplug operation to complete.
#. In the case of RCU-sched, one of the last acts of an outgoing CPU is
- to invoke ``rcu_report_dead()``, which reports a quiescent state for
+ to invoke ``rcutree_report_cpu_dead()``, which reports a quiescent state for
that CPU. However, this is likely paranoia-induced redundancy.
+-----------------------------------------------------------------------+
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
index 7ac6f9269806..63eff867175a 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
@@ -566,15 +566,6 @@
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
<text
xml:space="preserve"
- x="8335.4873"
- y="5357.1006"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-7-9-6-0"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
- <text
- xml:space="preserve"
x="8768.4678"
y="6224.9038"
font-style="normal"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
index 7ddc094d7f28..d82a77d03d8c 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
@@ -1135,7 +1135,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6-5"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
<text
xml:space="preserve"
x="3745.7725"
@@ -1256,7 +1256,7 @@
font-style="normal"
y="3679.27"
x="-3804.9949"
- xml:space="preserve">rcu_cpu_starting()</text>
+ xml:space="preserve">rcutree_report_cpu_starting()</text>
<g
style="fill:none;stroke-width:0.025in"
id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
index 069f6f8371c2..53e0dc2a2c79 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
@@ -1448,15 +1448,6 @@
style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_migrate_callbacks()</text>
<text
xml:space="preserve"
- x="8335.4873"
- y="5357.1006"
- font-style="normal"
- font-weight="bold"
- font-size="192"
- id="text202-7-9-6-0"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_migrate_callbacks()</text>
- <text
- xml:space="preserve"
x="8768.4678"
y="6224.9038"
font-style="normal"
@@ -3274,7 +3265,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6-5"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
<text
xml:space="preserve"
x="3745.7725"
@@ -3395,7 +3386,7 @@
font-style="normal"
y="3679.27"
x="-3804.9949"
- xml:space="preserve">rcu_cpu_starting()</text>
+ xml:space="preserve">rcutree_report_cpu_starting()</text>
<g
style="fill:none;stroke-width:0.025in"
id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
index 2c9310ba29ba..4fa7506082bf 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
@@ -607,7 +607,7 @@
font-weight="bold"
font-size="192"
id="text202-7-5-3-27-6"
- style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+ style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
<text
xml:space="preserve"
x="3745.7725"
@@ -728,7 +728,7 @@
font-style="normal"
y="3679.27"
x="-3804.9949"
- xml:space="preserve">rcu_cpu_starting()</text>
+ xml:space="preserve">rcutree_report_cpu_starting()</text>
<g
style="fill:none;stroke-width:0.025in"
id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index f3b605285a87..cccafdaa1f84 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1955,12 +1955,12 @@ if offline CPUs block an RCU grace period for too long.
An offline CPU's quiescent state will be reported either:
-1. As the CPU goes offline using RCU's hotplug notifier (rcu_report_dead()).
+1. As the CPU goes offline using RCU's hotplug notifier (rcutree_report_cpu_dead()).
2. When grace period initialization (rcu_gp_init()) detects a
race either with CPU offlining or with a task unblocking on a leaf
``rcu_node`` structure whose CPUs are all offline.
-The CPU-online path (rcu_cpu_starting()) should never need to report
+The CPU-online path (rcutree_report_cpu_starting()) should never need to report
a quiescent state for an offline CPU. However, as a debugging measure,
it does emit a warning if a quiescent state was not already reported
for that CPU.
diff --git a/Documentation/RCU/listRCU.rst b/Documentation/RCU/listRCU.rst
index bdc4bcc5289f..ed5c9d8c9afe 100644
--- a/Documentation/RCU/listRCU.rst
+++ b/Documentation/RCU/listRCU.rst
@@ -8,6 +8,15 @@ One of the most common uses of RCU is protecting read-mostly linked lists
that all of the required memory ordering is provided by the list macros.
This document describes several list-based RCU use cases.
+When iterating a list while holding the rcu_read_lock(), writers may
+modify the list. The reader is guaranteed to see all of the elements
+which were added to the list before they acquired the rcu_read_lock()
+and are still on the list when they drop the rcu_read_unlock().
+Elements which are added to, or removed from the list may or may not
+be seen. If the writer calls list_replace_rcu(), the reader may see
+either the old element or the new element; they will not see both,
+nor will they see neither.
+
Example 1: Read-mostly list: Deferred Destruction
-------------------------------------------------
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index e488c8e557a9..60ce02475142 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -59,8 +59,8 @@ experiment with should focus on Section 2. People who prefer to start
with example uses should focus on Sections 3 and 4. People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
-focus on Section 6. Section 7 serves as an index to the docbook API
-documentation, and Section 8 is the traditional answer key.
+focus on Section 6 and 7. Section 8 serves as an index to the docbook
+API documentation, and Section 9 is the traditional answer key.
So, start with the section that makes the most sense to you and your
preferred method of learning. If you need to know everything about
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index b26b5274eaaf..e440aee4fe94 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -364,6 +364,13 @@ constraint, a threaded controller must be able to handle competition
between threads in a non-leaf cgroup and its child cgroups. Each
threaded controller defines how such competitions are handled.
+Currently, the following controllers are threaded and can be enabled
+in a threaded cgroup::
+
+- cpu
+- cpuset
+- perf_event
+- pids
[Un]populated Notification
--------------------------
@@ -2226,6 +2233,49 @@ Cpuset Interface Files
Its value will be affected by memory nodes hotplug events.
+ cpuset.cpus.exclusive
+ A read-write multiple values file which exists on non-root
+ cpuset-enabled cgroups.
+
+ It lists all the exclusive CPUs that are allowed to be used
+ to create a new cpuset partition. Its value is not used
+ unless the cgroup becomes a valid partition root. See the
+ "cpuset.cpus.partition" section below for a description of what
+ a cpuset partition is.
+
+ When the cgroup becomes a partition root, the actual exclusive
+ CPUs that are allocated to that partition are listed in
+ "cpuset.cpus.exclusive.effective" which may be different
+ from "cpuset.cpus.exclusive". If "cpuset.cpus.exclusive"
+ has previously been set, "cpuset.cpus.exclusive.effective"
+ is always a subset of it.
+
+ Users can manually set it to a value that is different from
+ "cpuset.cpus". The only constraint in setting it is that the
+ list of CPUs must be exclusive with respect to its sibling.
+
+ For a parent cgroup, any one of its exclusive CPUs can only
+ be distributed to at most one of its child cgroups. Having an
+ exclusive CPU appearing in two or more of its child cgroups is
+ not allowed (the exclusivity rule). A value that violates the
+ exclusivity rule will be rejected with a write error.
+
+ The root cgroup is a partition root and all its available CPUs
+ are in its exclusive CPU set.
+
+ cpuset.cpus.exclusive.effective
+ A read-only multiple values file which exists on all non-root
+ cpuset-enabled cgroups.
+
+ This file shows the effective set of exclusive CPUs that
+ can be used to create a partition root. The content of this
+ file will always be a subset of "cpuset.cpus" and its parent's
+ "cpuset.cpus.exclusive.effective" if its parent is not the root
+ cgroup. It will also be a subset of "cpuset.cpus.exclusive"
+ if it is set. If "cpuset.cpus.exclusive" is not set, it is
+ treated to have an implicit value of "cpuset.cpus" in the
+ formation of local partition.
+
cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
@@ -2239,26 +2289,41 @@ Cpuset Interface Files
"isolated" Partition root without load balancing
========== =====================================
- The root cgroup is always a partition root and its state
- cannot be changed. All other non-root cgroups start out as
- "member".
+ A cpuset partition is a collection of cpuset-enabled cgroups with
+ a partition root at the top of the hierarchy and its descendants
+ except those that are separate partition roots themselves and
+ their descendants. A partition has exclusive access to the
+ set of exclusive CPUs allocated to it. Other cgroups outside
+ of that partition cannot use any CPUs in that set.
+
+ There are two types of partitions - local and remote. A local
+ partition is one whose parent cgroup is also a valid partition
+ root. A remote partition is one whose parent cgroup is not a
+ valid partition root itself. Writing to "cpuset.cpus.exclusive"
+ is optional for the creation of a local partition as its
+ "cpuset.cpus.exclusive" file will assume an implicit value that
+ is the same as "cpuset.cpus" if it is not set. Writing the
+ proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
+ before the target partition root is mandatory for the creation
+ of a remote partition.
+
+ Currently, a remote partition cannot be created under a local
+ partition. All the ancestors of a remote partition root except
+ the root cgroup cannot be a partition root.
+
+ The root cgroup is always a partition root and its state cannot
+ be changed. All other non-root cgroups start out as "member".
When set to "root", the current cgroup is the root of a new
- partition or scheduling domain that comprises itself and all
- its descendants except those that are separate partition roots
- themselves and their descendants.
+ partition or scheduling domain. The set of exclusive CPUs is
+ determined by the value of its "cpuset.cpus.exclusive.effective".
- When set to "isolated", the CPUs in that partition root will
+ When set to "isolated", the CPUs in that partition will
be in an isolated state without any load balancing from the
scheduler. Tasks placed in such a partition with multiple
CPUs should be carefully distributed and bound to each of the
individual CPUs for optimal performance.
- The value shown in "cpuset.cpus.effective" of a partition root
- is the CPUs that the partition root can dedicate to a potential
- new child partition root. The new child subtracts available
- CPUs from its parent "cpuset.cpus.effective".
-
A partition root ("root" or "isolated") can be in one of the
two possible states - valid or invalid. An invalid partition
root is in a degraded state where some state information may
@@ -2281,37 +2346,33 @@ Cpuset Interface Files
In the case of an invalid partition root, a descriptive string on
why the partition is invalid is included within parentheses.
- For a partition root to become valid, the following conditions
+ For a local partition root to be valid, the following conditions
must be met.
- 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
- are not shared by any of its siblings (exclusivity rule).
- 2) The parent cgroup is a valid partition root.
- 3) The "cpuset.cpus" is not empty and must contain at least
- one of the CPUs from parent's "cpuset.cpus", i.e. they overlap.
- 4) The "cpuset.cpus.effective" cannot be empty unless there is
+ 1) The parent cgroup is a valid partition root.
+ 2) The "cpuset.cpus.exclusive.effective" file cannot be empty,
+ though it may contain offline CPUs.
+ 3) The "cpuset.cpus.effective" cannot be empty unless there is
no task associated with this partition.
- External events like hotplug or changes to "cpuset.cpus" can
- cause a valid partition root to become invalid and vice versa.
- Note that a task cannot be moved to a cgroup with empty
- "cpuset.cpus.effective".
+ For a remote partition root to be valid, all the above conditions
+ except the first one must be met.
- For a valid partition root with the sibling cpu exclusivity
- rule enabled, changes made to "cpuset.cpus" that violate the
- exclusivity rule will invalidate the partition as well as its
- sibling partitions with conflicting cpuset.cpus values. So
- care must be taking in changing "cpuset.cpus".
+ External events like hotplug or changes to "cpuset.cpus" or
+ "cpuset.cpus.exclusive" can cause a valid partition root to
+ become invalid and vice versa. Note that a task cannot be
+ moved to a cgroup with empty "cpuset.cpus.effective".
A valid non-root parent partition may distribute out all its CPUs
- to its child partitions when there is no task associated with it.
+ to its child local partitions when there is no task associated
+ with it.
- Care must be taken to change a valid partition root to
- "member" as all its child partitions, if present, will become
+ Care must be taken to change a valid partition root to "member"
+ as all its child local partitions, if present, will become
invalid causing disruption to tasks running in those child
partitions. These inactivated partitions could be recovered if
their parent is switched back to a partition root with a proper
- set of "cpuset.cpus".
+ value in "cpuset.cpus" or "cpuset.cpus.exclusive".
Poll and inotify events are triggered whenever the state of
"cpuset.cpus.partition" changes. That includes changes caused
@@ -2321,6 +2382,11 @@ Cpuset Interface Files
to "cpuset.cpus.partition" without the need to do continuous
polling.
+ A user can pre-configure certain CPUs to an isolated state
+ with load balancing disabled at boot time with the "isolcpus"
+ kernel boot command line option. If those CPUs are to be put
+ into a partition, they have to be used in an isolated partition.
+
Device controller
-----------------
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index b6cfb51cb0b4..e715bfc09879 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -46,12 +46,22 @@ The possible values in this file are:
The processor is not vulnerable
- * 'Vulnerable: no microcode':
+* 'Vulnerable':
+
+ The processor is vulnerable and no mitigations have been applied.
+
+ * 'Vulnerable: No microcode':
The processor is vulnerable, no microcode extending IBPB
functionality to address the vulnerability has been applied.
- * 'Mitigation: microcode':
+ * 'Vulnerable: Safe RET, no microcode':
+
+ The "Safe RET" mitigation (see below) has been applied to protect the
+ kernel, but the IBPB-extending microcode has not been applied. User
+ space tasks may still be vulnerable.
+
+ * 'Vulnerable: Microcode, no safe RET':
Extended IBPB functionality microcode patch has been applied. It does
not address User->Kernel and Guest->Host transitions protection but it
@@ -72,11 +82,11 @@ The possible values in this file are:
(spec_rstack_overflow=microcode)
- * 'Mitigation: safe RET':
+ * 'Mitigation: Safe RET':
- Software-only mitigation. It complements the extended IBPB microcode
- patch functionality by addressing User->Kernel and Guest->Host
- transitions protection.
+ Combined microcode/software mitigation. It complements the
+ extended IBPB microcode patch functionality by addressing
+ User->Kernel and Guest->Host transitions protection.
Selected by default or by spec_rstack_overflow=safe-ret
@@ -129,7 +139,7 @@ an indrect branch prediction barrier after having applied the required
microcode patch for one's system. This mitigation comes also at
a performance cost.
-Mitigation: safe RET
+Mitigation: Safe RET
--------------------
The mitigation works by ensuring all RET instructions speculate to
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0a1731a0f0ef..758bb25ea3e6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -580,6 +580,10 @@
named mounts. Specifying both "all" and "named" disables
all v1 hierarchies.
+ cgroup_favordynmods= [KNL] Enable or Disable favordynmods.
+ Format: { "true" | "false" }
+ Defaults to the value of CONFIG_CGROUP_FAVOR_DYNMODS.
+
cgroup.memory= [KNL] Pass options to the cgroup memory controller.
Format: <string>
nosocket -- Disable socket memory accounting.
@@ -1893,6 +1897,12 @@
0 -- machine default
1 -- force brightness inversion
+ ia32_emulation= [X86-64]
+ Format: <bool>
+ When true, allows loading 32-bit programs and executing 32-bit
+ syscalls, essentially overriding IA32_EMULATION_DEFAULT_DISABLED at
+ boot time. When false, unconditionally disables IA32 emulation.
+
icn= [HW,ISDN]
Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
@@ -2913,6 +2923,38 @@
to extract confidential information from the kernel
are also disabled.
+ locktorture.acq_writer_lim= [KNL]
+ Set the time limit in jiffies for a lock
+ acquisition. Acquisitions exceeding this limit
+ will result in a splat once they do complete.
+
+ locktorture.bind_readers= [KNL]
+ Specify the list of CPUs to which the readers are
+ to be bound.
+
+ locktorture.bind_writers= [KNL]
+ Specify the list of CPUs to which the writers are
+ to be bound.
+
+ locktorture.call_rcu_chains= [KNL]
+ Specify the number of self-propagating call_rcu()
+ chains to set up. These are used to ensure that
+ there is a high probability of an RCU grace period
+ in progress at any given time. Defaults to 0,
+ which disables these call_rcu() chains.
+
+ locktorture.long_hold= [KNL]
+ Specify the duration in milliseconds for the
+ occasional long-duration lock hold time. Defaults
+ to 100 milliseconds. Select 0 to disable.
+
+ locktorture.nested_locks= [KNL]
+ Specify the maximum lock nesting depth that
+ locktorture is to exercise, up to a limit of 8
+ (MAX_NESTED_LOCKS). Specify zero to disable.
+ Note that this parameter is ineffective on types
+ of locks that do not support nested acquisition.
+
locktorture.nreaders_stress= [KNL]
Set the number of locking read-acquisition kthreads.
Defaults to being automatically set based on the
@@ -2928,6 +2970,25 @@
Set time (s) between CPU-hotplug operations, or
zero to disable CPU-hotplug testing.
+ locktorture.rt_boost= [KNL]
+ Do periodic testing of real-time lock priority
+ boosting. Select 0 to disable, 1 to boost
+ only rt_mutex, and 2 to boost unconditionally.
+ Defaults to 2, which might seem to be an
+ odd choice, but which should be harmless for
+ non-real-time spinlocks, due to their disabling
+ of preemption. Note that non-realtime mutexes
+ disable boosting.
+
+ locktorture.rt_boost_factor= [KNL]
+ Number that determines how often and for how
+ long priority boosting is exercised. This is
+ scaled down by the number of writers, so that the
+ number of boosts per unit time remains roughly
+ constant as the number of writers increases.
+ On the other hand, the duration of each boost
+ increases with the number of writers.
+
locktorture.shuffle_interval= [KNL]
Set task-shuffle interval (jiffies). Shuffling
tasks allows some CPUs to go into dyntick-idle
@@ -2950,13 +3011,13 @@
locktorture.torture_type= [KNL]
Specify the locking implementation to test.
+ locktorture.verbose= [KNL]
+ Enable additional printk() statements.
+
locktorture.writer_fifo= [KNL]
Run the write-side locktorture kthreads at
sched_set_fifo() real-time priority.
- locktorture.verbose= [KNL]
- Enable additional printk() statements.
-
logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver
Format: <irq>
@@ -4769,6 +4830,13 @@
Set maximum number of finished RCU callbacks to
process in one batch.
+ rcutree.do_rcu_barrier= [KNL]
+ Request a call to rcu_barrier(). This is
+ throttled so that userspace tests can safely
+ hammer on the sysfs variable if they so choose.
+ If triggered before the RCU grace-period machinery
+ is fully active, this will error out with EAGAIN.
+
rcutree.dump_tree= [KNL]
Dump the structure of the rcu_node combining tree
out at early boot. This is used for diagnostic
@@ -5422,6 +5490,12 @@
test until boot completes in order to avoid
interference.
+ refscale.lookup_instances= [KNL]
+ Number of data elements to use for the forms of
+ SLAB_TYPESAFE_BY_RCU testing. A negative number
+ is negated and multiplied by nr_cpu_ids, while
+ zero specifies nr_cpu_ids.
+
refscale.loops= [KNL]
Set the number of loops over the synchronization
primitive under test. Increasing this number
@@ -5858,6 +5932,13 @@
This feature may be more efficiently disabled
using the csdlock_debug- kernel parameter.
+ smp.panic_on_ipistall= [KNL]
+ If a csd_lock_timeout extends for more than
+ the specified number of milliseconds, panic the
+ system. By default, let CSD-lock acquisition
+ take as long as they take. Specifying 300,000
+ for this value provides a 5-minute timeout.
+
smsc-ircc2.nopnp [HW] Don't use PNP to discover SMC devices
smsc-ircc2.ircc_cfg= [HW] Device configuration I/O port
smsc-ircc2.ircc_sir= [HW] SIR base I/O port
diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index b799a43da62e..39bd6ecce7de 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -170,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.
-Apart from that there are four module parameters recognized by ``intel_idle``
+Apart from that there are five module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).
@@ -216,6 +216,21 @@ are ignored).
The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.
+The ``ibrs_off`` module parameter is a boolean flag (defaults to
+false). If set, it is used to control if IBRS (Indirect Branch Restricted
+Speculation) should be turned off when the CPU enters an idle state.
+This flag does not affect CPUs that use Enhanced IBRS which can remain
+on with little performance impact.
+
+For some CPUs, IBRS will be selected as mitigation for Spectre v2 and Retbleed
+security vulnerabilities by default. Leaving the IBRS mode on while idling may
+have a performance impact on its sibling CPU. The IBRS mode will be turned off
+by default when the CPU enters into a deep idle state, but not in some
+shallower ones. Setting the ``ibrs_off`` module parameter will force the IBRS
+mode to off when the CPU is in any one of the available idle states. This may
+help performance of a sibling CPU at the expense of a slightly higher wakeup
+latency for the idle CPU.
+
.. _intel-idle-core-and-package-idle-states:
diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
index 2d22ead9520e..1bb2a1c292aa 100644
--- a/Documentation/admin-guide/pstore-blk.rst
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -76,7 +76,7 @@ kmsg_size
~~~~~~~~~
The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4.
-It's optional if you do not care oops/panic log.
+It's optional if you do not care about the oops/panic log.
There are multiple chunks for oops/panic front-end depending on the remaining
space except other pstore front-ends.
@@ -88,7 +88,7 @@ pmsg_size
~~~~~~~~~
The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4.
-It's optional if you do not care pmsg log.
+It's optional if you do not care about the pmsg log.
Unlike oops/panic front-end, there is only one chunk for pmsg front-end.
@@ -100,7 +100,7 @@ console_size
~~~~~~~~~~~~
The chunk size in KB for console front-end. It **MUST** be a multiple of 4.
-It's optional if you do not care console log.
+It's optional if you do not care about the console log.
Similar to pmsg front-end, there is only one chunk for console front-end.
@@ -111,7 +111,7 @@ ftrace_size
~~~~~~~~~~~
The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4.
-It's optional if you do not care console log.
+It's optional if you do not care about the ftrace log.
Similar to oops front-end, there are multiple chunks for ftrace front-end
depending on the count of cpu processors. Each chunk size is equal to
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index cf33de56da27..d89ac2bd8dc4 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1182,7 +1182,8 @@ automatically on platforms where it can run (that is,
platforms with asymmetric CPU topologies and having an Energy
Model available). If your platform happens to meet the
requirements for EAS but you do not want to use it, change
-this value to 0.
+this value to 0. On Non-EAS platforms, write operation fails and
+read doesn't return anything.
task_delayacct
===============
diff --git a/Documentation/arch/x86/amd-memory-encryption.rst b/Documentation/arch/x86/amd-memory-encryption.rst
index 934310ce7258..07caa8fff852 100644
--- a/Documentation/arch/x86/amd-memory-encryption.rst
+++ b/Documentation/arch/x86/amd-memory-encryption.rst
@@ -130,4 +130,4 @@ SNP feature support.
More details in AMD64 APM[1] Vol 2: 15.34.10 SEV_STATUS MSR
-[1] https://www.amd.com/system/files/TechDocs/40332.pdf
+[1] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
diff --git a/Documentation/arch/x86/iommu.rst b/Documentation/arch/x86/iommu.rst
index 42c7a6faa39a..41fbadfe2221 100644
--- a/Documentation/arch/x86/iommu.rst
+++ b/Documentation/arch/x86/iommu.rst
@@ -5,7 +5,7 @@ x86 IOMMU Support
The architecture specs can be obtained from the below locations.
- Intel: http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
-- AMD: https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
+- AMD: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_3_07_PUB.pdf
This guide gives a quick cheat sheet for some basic understanding.
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index cb05d90111b4..a6279df64a9d 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -35,7 +35,7 @@ about the feature from resctrl's info directory.
To use the feature mount the file system::
- # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
+ # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps][,debug]] /sys/fs/resctrl
mount options are:
@@ -46,6 +46,9 @@ mount options are:
"mba_MBps":
Enable the MBA Software Controller(mba_sc) to specify MBA
bandwidth in MBps
+"debug":
+ Make debug files accessible. Available debug files are annotated with
+ "Available only with debug option".
L2 and L3 CDP are controlled separately.
@@ -124,6 +127,13 @@ related to allocation:
"P":
Corresponding region is pseudo-locked. No
sharing allowed.
+"sparse_masks":
+ Indicates if non-contiguous 1s value in CBM is supported.
+
+ "0":
+ Only contiguous 1s value in CBM is supported.
+ "1":
+ Non-contiguous 1s value in CBM is supported.
Memory bandwidth(MB) subdirectory contains the following files
with respect to allocation:
@@ -299,7 +309,14 @@ All groups contain the following files:
"tasks":
Reading this file shows the list of all tasks that belong to
this group. Writing a task id to the file will add a task to the
- group. If the group is a CTRL_MON group the task is removed from
+ group. Multiple tasks can be added by separating the task ids
+ with commas. Tasks will be assigned sequentially. Multiple
+ failures are not supported. A single failure encountered while
+ attempting to assign a task will cause the operation to abort and
+ already added tasks before the failure will remain in the group.
+ Failures will be logged to /sys/fs/resctrl/info/last_cmd_status.
+
+ If the group is a CTRL_MON group the task is removed from
whichever previous CTRL_MON group owned the task and also from
any MON group that owned the task. If the group is a MON group,
then the task must already belong to the CTRL_MON parent of this
@@ -342,6 +359,10 @@ When control is enabled all CTRL_MON groups will also contain:
file. On successful pseudo-locked region creation the mode will
automatically change to "pseudo-locked".
+"ctrl_hw_id":
+ Available only with debug option. The identifier used by hardware
+ for the control group. On x86 this is the CLOSID.
+
When monitoring is enabled all MON groups will also contain:
"mon_data":
@@ -355,6 +376,10 @@ When monitoring is enabled all MON groups will also contain:
the sum for all tasks in the CTRL_MON group and all tasks in
MON groups. Please see example section for more details on usage.
+"mon_hw_id":
+ Available only with debug option. The identifier used by hardware
+ for the monitor group. On x86 this is the RMID.
+
Resource allocation rules
-------------------------
@@ -445,12 +470,13 @@ For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
-the resctrl file system in "info/{resource}/cbm_mask". Intel hardware
+the resctrl file system in "info/{resource}/cbm_mask". Some Intel hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
-and 0xA are not. On a system with a 20-bit mask each bit represents 5%
-of the capacity of the cache. You could partition the cache into four
-equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
+and 0xA are not. Check /sys/fs/resctrl/info/{resource}/sparse_masks
+if non-contiguous 1s value is supported. On a system with a 20-bit mask
+each bit represents 5% of the capacity of the cache. You could partition
+the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
Memory bandwidth Allocation and monitoring
==========================================
diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst
index 7f58010ea86a..08ebf9edbfc1 100644
--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -55,19 +55,19 @@ Package-related topology information in the kernel:
The number of dies in a package. This information is retrieved via CPUID.
- - cpuinfo_x86.cpu_die_id:
+ - cpuinfo_x86.topo.die_id:
The physical ID of the die. This information is retrieved via CPUID.
- - cpuinfo_x86.phys_proc_id:
+ - cpuinfo_x86.topo.pkg_id:
The physical ID of the package. This information is retrieved via CPUID
and deduced from the APIC IDs of the cores in the package.
Modern systems use this value for the socket. There may be multiple
- packages within a socket. This value may differ from cpu_die_id.
+ packages within a socket. This value may differ from topo.die_id.
- - cpuinfo_x86.logical_proc_id:
+ - cpuinfo_x86.topo.logical_pkg_id:
The logical ID of the package. As we do not trust BIOSes to enumerate the
packages in a consistent way, we introduced the concept of logical package
@@ -79,9 +79,7 @@ Package-related topology information in the kernel:
The maximum possible number of packages in the system. Helpful for per
package facilities to preallocate per package information.
- - cpu_llc_id:
-
- A per-CPU variable containing:
+ - cpuinfo_x86.topo.llc_id:
- On Intel, the first APIC ID of the list of CPUs sharing the Last Level
Cache
diff --git a/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml b/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml
index 2594fa192f93..2a04906531fb 100644
--- a/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml
+++ b/Documentation/devicetree/bindings/iio/addac/adi,ad74115.yaml
@@ -32,7 +32,8 @@ properties:
spi-cpol: true
- reset-gpios: true
+ reset-gpios:
+ maxItems: 1
interrupts:
minItems: 1
diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5758.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5758.yaml
index 4e508bfcc9d8..5121685337b5 100644
--- a/Documentation/devicetree/bindings/iio/dac/adi,ad5758.yaml
+++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5758.yaml
@@ -78,7 +78,8 @@ properties:
- const: -1000
- const: 22000
- reset-gpios: true
+ reset-gpios:
+ maxItems: 1
adi,dc-dc-ilim-microamp:
enum: [150000, 200000, 250000, 300000, 350000, 400000]
diff --git a/Documentation/devicetree/bindings/iio/health/ti,afe4403.yaml b/Documentation/devicetree/bindings/iio/health/ti,afe4403.yaml
index b9b5beac33b2..5b6cde86b5a5 100644
--- a/Documentation/devicetree/bindings/iio/health/ti,afe4403.yaml
+++ b/Documentation/devicetree/bindings/iio/health/ti,afe4403.yaml
@@ -23,7 +23,8 @@ properties:
maxItems: 1
description: Connected to ADC_RDY pin.
- reset-gpios: true
+ reset-gpios:
+ maxItems: 1
required:
- compatible
diff --git a/Documentation/devicetree/bindings/iio/health/ti,afe4404.yaml b/Documentation/devicetree/bindings/iio/health/ti,afe4404.yaml
index 2958c4ca75b4..167d10bd60af 100644
--- a/Documentation/devicetree/bindings/iio/health/ti,afe4404.yaml
+++ b/Documentation/devicetree/bindings/iio/health/ti,afe4404.yaml
@@ -23,7 +23,8 @@ properties:
maxItems: 1
description: Connected to ADC_RDY pin.
- reset-gpios: true
+ reset-gpios:
+ maxItems: 1
additionalProperties: false
diff --git a/Documentation/devicetree/bindings/memory-controllers/xlnx,versal-ddrmc-edac.yaml b/Documentation/devicetree/bindings/memory-controllers/xlnx,versal-ddrmc-edac.yaml
new file mode 100644
index 000000000000..12f8e9f350bc
--- /dev/null
+++ b/Documentation/devicetree/bindings/memory-controllers/xlnx,versal-ddrmc-edac.yaml
@@ -0,0 +1,57 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/xlnx,versal-ddrmc-edac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xilinx Versal DDRMC (Integrated DDR Memory Controller)
+
+maintainers:
+ - Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
+ - Sai Krishna Potthuri <sai.krishna.potthuri@amd.com>
+
+description:
+ The integrated DDR Memory Controllers (DDRMCs) support both DDR4 and LPDDR4/
+ 4X memory interfaces. Versal DDR memory controller has an optional ECC support
+ which correct single bit ECC errors and detect double bit ECC errors.
+
+properties:
+ compatible:
+ const: xlnx,versal-ddrmc
+
+ reg:
+ items:
+ - description: DDR Memory Controller registers
+ - description: NOC registers corresponding to DDR Memory Controller
+
+ reg-names:
+ items:
+ - const: base
+ - const: noc
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ memory-controller@f6150000 {
+ compatible = "xlnx,versal-ddrmc";
+ reg = <0x0 0xf6150000 0x0 0x2000>, <0x0 0xf6070000 0x0 0x20000>;
+ reg-names = "base", "noc";
+ interrupt-parent = <&gic>;
+ interrupts = <GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/timer/cirrus,ep9301-timer.yaml b/Documentation/devicetree/bindings/timer/cirrus,ep9301-timer.yaml
new file mode 100644
index 000000000000..e463e11e259d
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/cirrus,ep9301-timer.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/cirrus,ep9301-timer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Cirrus Logic EP93xx timer
+
+maintainers:
+ - Alexander Sverdlin <alexander.sverdlin@gmail.com>
+ - Nikita Shubin <nikita.shubin@maquefel.me>
+
+properties:
+ compatible:
+ oneOf:
+ - const: cirrus,ep9301-timer
+ - items:
+ - enum:
+ - cirrus,ep9302-timer
+ - cirrus,ep9307-timer
+ - cirrus,ep9312-timer
+ - cirrus,ep9315-timer
+ - const: cirrus,ep9301-timer
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ resets:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ timer@80810000 {
+ compatible = "cirrus,ep9301-timer";
+ reg = <0x80810000 0x100>;
+ interrupt-parent = <&vic1>;
+ interrupts = <19>;
+ };
+...
diff --git a/Documentation/devicetree/bindings/timer/renesas,rz-mtu3.yaml b/Documentation/devicetree/bindings/timer/renesas,rz-mtu3.yaml
index bffdab0b0185..3931054b42fb 100644
--- a/Documentation/devicetree/bindings/timer/renesas,rz-mtu3.yaml
+++ b/Documentation/devicetree/bindings/timer/renesas,rz-mtu3.yaml
@@ -11,8 +11,8 @@ maintainers:
description: |
This hardware block consists of eight 16-bit timer channels and one
- 32- bit timer channel. It supports the following specifications:
- - Pulse input/output: 28 lines max.
+ 32-bit timer channel. It supports the following specifications:
+ - Pulse input/output: 28 lines max
- Pulse input 3 lines
- Count clock 11 clocks for each channel (14 clocks for MTU0, 12 clocks
for MTU2, and 10 clocks for MTU5, four clocks for MTU1-MTU2 combination
@@ -23,11 +23,11 @@ description: |
- Input capture function (noise filter setting available)
- Counter-clearing operation
- Simultaneous writing to multiple timer counters (TCNT)
- (excluding MTU8).
+ (excluding MTU8)
- Simultaneous clearing on compare match or input capture
- (excluding MTU8).
+ (excluding MTU8)
- Simultaneous input and output to registers in synchronization with
- counter operations (excluding MTU8).
+ counter operations (excluding MTU8)
- Up to 12-phase PWM output in combination with synchronous operation
(excluding MTU8)
- [MTU0 MTU3, MTU4, MTU6, MTU7, and MTU8]
@@ -40,26 +40,26 @@ description: |
- [MTU3, MTU4, MTU6, and MTU7]
- Through interlocked operation of MTU3/4 and MTU6/7, the positive and
negative signals in six phases (12 phases in total) can be output in
- complementary PWM and reset-synchronized PWM operation.
+ complementary PWM and reset-synchronized PWM operation
- In complementary PWM mode, values can be transferred from buffer
registers to temporary registers at crests and troughs of the timer-
counter values or when the buffer registers (TGRD registers in MTU4
- and MTU7) are written to.
- - Double-buffering selectable in complementary PWM mode.
+ and MTU7) are written to
+ - Double-buffering selectable in complementary PWM mode
- [MTU3 and MTU4]
- Through interlocking with MTU0, a mode for driving AC synchronous
motors (brushless DC motors) by using complementary PWM output and
reset-synchronized PWM output is settable and allows the selection
- of two types of waveform output (chopping or level).
+ of two types of waveform output (chopping or level)
- [MTU5]
- - Capable of operation as a dead-time compensation counter.
+ - Capable of operation as a dead-time compensation counter
- [MTU0/MTU5, MTU1, MTU2, and MTU8]
- 32-bit phase counting mode specifiable by combining MTU1 and MTU2 and
- through interlocked operation with MTU0/MTU5 and MTU8.
+ through interlocked operation with MTU0/MTU5 and MTU8
- Interrupt-skipping function
- In complementary PWM mode, interrupts on crests and troughs of counter
values and triggers to start conversion by the A/D converter can be
- skipped.
+ skipped
- Interrupt sources: 43 sources.
- Buffer operation:
- Automatic transfer of register data (transfer from the buffer
@@ -68,9 +68,9 @@ description: |
- A/D converter start triggers can be generated
- A/D converter start request delaying function enables A/D converter
to be started with any desired timing and to be synchronized with
- PWM output.
+ PWM output
- Low power consumption function
- - The MTU3a can be placed in the module-stop state.
+ - The MTU3a can be placed in the module-stop state
There are two phase counting modes. 16-bit phase counting mode in which
MTU1 and MTU2 operate independently, and cascade connection 32-bit phase
@@ -109,6 +109,7 @@ properties:
compatible:
items:
- enum:
+ - renesas,r9a07g043-mtu3 # RZ/{G2UL,Five}
- renesas,r9a07g044-mtu3 # RZ/G2{L,LC}
- renesas,r9a07g054-mtu3 # RZ/V2L
- const: renesas,rz-mtu3
@@ -169,27 +170,27 @@ properties:
- const: tgib0
- const: tgic0
- const: tgid0
- - const: tgiv0
+ - const: tciv0
- const: tgie0
- const: tgif0
- const: tgia1
- const: tgib1
- - const: tgiv1
- - const: tgiu1
+ - const: tciv1
+ - const: tciu1
- const: tgia2
- const: tgib2
- - const: tgiv2
- - const: tgiu2
+ - const: tciv2
+ - const: tciu2
- const: tgia3
- const: tgib3
- const: tgic3
- const: tgid3
- - const: tgiv3
+ - const: tciv3
- const: tgia4
- const: tgib4
- const: tgic4
- const: tgid4
- - const: tgiv4
+ - const: tciv4
- const: tgiu5
- const: tgiv5
- const: tgiw5
@@ -197,18 +198,18 @@ properties:
- const: tgib6
- const: tgic6
- const: tgid6
- - const: tgiv6
+ - const: tciv6
- const: tgia7
- const: tgib7
- const: tgic7
- const: tgid7
- - const: tgiv7
+ - const: tciv7
- const: tgia8
- const: tgib8
- const: tgic8
- const: tgid8
- - const: tgiv8
- - const: tgiu8
+ - const: tciv8
+ - const: tciu8
clocks:
maxItems: 1
@@ -285,16 +286,16 @@ examples:
<GIC_SPI 211 IRQ_TYPE_EDGE_RISING>,
<GIC_SPI 212 IRQ_TYPE_EDGE_RISING>,
<GIC_SPI 213 IRQ_TYPE_EDGE_RISING>;
- interrupt-names = "tgia0", "tgib0", "tgic0", "tgid0", "tgiv0", "tgie0",
+ interrupt-names = "tgia0", "tgib0", "tgic0", "tgid0", "tciv0", "tgie0",
"tgif0",
- "tgia1", "tgib1", "tgiv1", "tgiu1",
- "tgia2", "tgib2", "tgiv2", "tgiu2",
- "tgia3", "tgib3", "tgic3", "tgid3", "tgiv3",
- "tgia4", "tgib4", "tgic4", "tgid4", "tgiv4",
+ "tgia1", "tgib1", "tciv1", "tciu1",
+ "tgia2", "tgib2", "tciv2", "tciu2",
+ "tgia3", "tgib3", "tgic3", "tgid3", "tciv3",
+ "tgia4", "tgib4", "tgic4", "tgid4", "tciv4",
"tgiu5", "tgiv5", "tgiw5",
- "tgia6", "tgib6", "tgic6", "tgid6", "tgiv6",
- "tgia7", "tgib7", "tgic7", "tgid7", "tgiv7",
- "tgia8", "tgib8", "tgic8", "tgid8", "tgiv8", "tgiu8";
+ "tgia6", "tgib6", "tgic6", "tgid6", "tciv6",
+ "tgia7", "tgib7", "tgic7", "tgid7", "tciv7",
+ "tgia8", "tgib8", "tgic8", "tgid8", "tciv8", "tciu8";
clocks = <&cpg CPG_MOD R9A07G044_MTU_X_MCK_MTU3>;
power-domains = <&cpg>;
resets = <&cpg R9A07G044_MTU_X_PRESET_MTU3>;
diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst
index bcf84459917f..9e38e4c221ca 100644
--- a/Documentation/filesystems/files.rst
+++ b/Documentation/filesystems/files.rst
@@ -62,7 +62,7 @@ the fdtable structure -
be held.
4. To look up the file structure given an fd, a reader
- must use either lookup_fd_rcu() or files_lookup_fd_rcu() APIs. These
+ must use either lookup_fdget_rcu() or files_lookup_fdget_rcu() APIs. These
take care of barrier requirements due to lock-free lookup.
An example::
@@ -70,43 +70,22 @@ the fdtable structure -
struct file *file;
rcu_read_lock();
- file = lookup_fd_rcu(fd);
- if (file) {
- ...
- }
- ....
+ file = lookup_fdget_rcu(fd);
rcu_read_unlock();
-
-5. Handling of the file structures is special. Since the look-up
- of the fd (fget()/fget_light()) are lock-free, it is possible
- that look-up may race with the last put() operation on the
- file structure. This is avoided using atomic_long_inc_not_zero()
- on ->f_count::
-
- rcu_read_lock();
- file = files_lookup_fd_rcu(files, fd);
if (file) {
- if (atomic_long_inc_not_zero(&file->f_count))
- *fput_needed = 1;
- else
- /* Didn't get the reference, someone's freed */
- file = NULL;
+ ...
+ fput(file);
}
- rcu_read_unlock();
....
- return file;
-
- atomic_long_inc_not_zero() detects if refcounts is already zero or
- goes to zero during increment. If it does, we fail
- fget()/fget_light().
-6. Since both fdtable and file structures can be looked up
+5. Since both fdtable and file structures can be looked up
lock-free, they must be installed using rcu_assign_pointer()
API. If they are looked up lock-free, rcu_dereference()
must be used. However it is advisable to use files_fdtable()
- and lookup_fd_rcu()/files_lookup_fd_rcu() which take care of these issues.
+ and lookup_fdget_rcu()/files_lookup_fdget_rcu() which take care of these
+ issues.
-7. While updating, the fdtable pointer must be looked up while
+6. While updating, the fdtable pointer must be looked up while
holding files->file_lock. If ->file_lock is dropped, then
another thread expand the files thereby creating a new
fdtable and making the earlier fdtable pointer stale.
@@ -126,3 +105,19 @@ the fdtable structure -
Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
the fdtable pointer (fdt) must be loaded after locate_fd().
+On newer kernels rcu based file lookup has been switched to rely on
+SLAB_TYPESAFE_BY_RCU instead of call_rcu(). It isn't sufficient anymore
+to just acquire a reference to the file in question under rcu using
+atomic_long_inc_not_zero() since the file might have already been
+recycled and someone else might have bumped the reference. In other
+words, callers might see reference count bumps from newer users. For
+this is reason it is necessary to verify that the pointer is the same
+before and after the reference count increment. This pattern can be seen
+in get_file_rcu() and __files_get_rcu().
+
+In addition, it isn't possible to access or check fields in struct file
+without first aqcuiring a reference on it under rcu lookup. Not doing
+that was always very dodgy and it was only usable for non-pointer data
+in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers
+either first acquire a reference or they must hold the files_lock of the
+fdtable.
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index a624e92f2687..1b84f818e574 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -261,9 +261,9 @@ DIRECT_KEY policies
The Adiantum encryption mode (see `Encryption modes and usage`_) is
suitable for both contents and filenames encryption, and it accepts
-long IVs --- long enough to hold both an 8-byte logical block number
-and a 16-byte per-file nonce. Also, the overhead of each Adiantum key
-is greater than that of an AES-256-XTS key.
+long IVs --- long enough to hold both an 8-byte data unit index and a
+16-byte per-file nonce. Also, the overhead of each Adiantum key is
+greater than that of an AES-256-XTS key.
Therefore, to improve performance and save memory, for Adiantum a
"direct key" configuration is supported. When the user has enabled
@@ -300,8 +300,8 @@ IV_INO_LBLK_32 policies
IV_INO_LBLK_32 policies work like IV_INO_LBLK_64, except that for
IV_INO_LBLK_32, the inode number is hashed with SipHash-2-4 (where the
-SipHash key is derived from the master key) and added to the file
-logical block number mod 2^32 to produce a 32-bit IV.
+SipHash key is derived from the master key) and added to the file data
+unit index mod 2^32 to produce a 32-bit IV.
This format is optimized for use with inline encryption hardware
compliant with the eMMC v5.2 standard, which supports only 32 IV bits
@@ -451,31 +451,62 @@ acceleration is recommended:
Contents encryption
-------------------
-For file contents, each filesystem block is encrypted independently.
-Starting from Linux kernel 5.5, encryption of filesystems with block
-size less than system's page size is supported.
-
-Each block's IV is set to the logical block number within the file as
-a little endian number, except that:
-
-- With CBC mode encryption, ESSIV is also used. Specifically, each IV
- is encrypted with AES-256 where the AES-256 key is the SHA-256 hash
- of the file's data encryption key.
-
-- With `DIRECT_KEY policies`_, the file's nonce is appended to the IV.
- Currently this is only allowed with the Adiantum encryption mode.
-
-- With `IV_INO_LBLK_64 policies`_, the logical block number is limited
- to 32 bits and is placed in bits 0-31 of the IV. The inode number
- (which is also limited to 32 bits) is placed in bits 32-63.
-
-- With `IV_INO_LBLK_32 policies`_, the logical block number is limited
- to 32 bits and is placed in bits 0-31 of the IV. The inode number
- is then hashed and added mod 2^32.
-
-Note that because file logical block numbers are included in the IVs,
-filesystems must enforce that blocks are never shifted around within
-encrypted files, e.g. via "collapse range" or "insert range".
+For contents encryption, each file's contents is divided into "data
+units". Each data unit is encrypted independently. The IV for each
+data unit incorporates the zero-based index of the data unit within
+the file. This ensures that each data unit within a file is encrypted
+differently, which is essential to prevent leaking information.
+
+Note: the encryption depending on the offset into the file means that
+operations like "collapse range" and "insert range" that rearrange the
+extent mapping of files are not supported on encrypted files.
+
+There are two cases for the sizes of the data units:
+
+* Fixed-size data units. This is how all filesystems other than UBIFS
+ work. A file's data units are all the same size; the last data unit
+ is zero-padded if needed. By default, the data unit size is equal
+ to the filesystem block size. On some filesystems, users can select
+ a sub-block data unit size via the ``log2_data_unit_size`` field of
+ the encryption policy; see `FS_IOC_SET_ENCRYPTION_POLICY`_.
+
+* Variable-size data units. This is what UBIFS does. Each "UBIFS
+ data node" is treated as a crypto data unit. Each contains variable
+ length, possibly compressed data, zero-padded to the next 16-byte
+ boundary. Users cannot select a sub-block data unit size on UBIFS.
+
+In the case of compression + encryption, the compressed data is
+encrypted. UBIFS compression works as described above. f2fs
+compression works a bit differently; it compresses a number of
+filesystem blocks into a smaller number of filesystem blocks.
+Therefore a f2fs-compressed file still uses fixed-size data units, and
+it is encrypted in a similar way to a file containing holes.
+
+As mentioned in `Key hierarchy`_, the default encryption setting uses
+per-file keys. In this case, the IV for each data unit is simply the
+index of the data unit in the file. However, users can select an
+encryption setting that does not use per-file keys. For these, some
+kind of file identifier is incorporated into the IVs as follows:
+
+- With `DIRECT_KEY policies`_, the data unit index is placed in bits
+ 0-63 of the IV, and the file's nonce is placed in bits 64-191.
+
+- With `IV_INO_LBLK_64 policies`_, the data unit index is placed in
+ bits 0-31 of the IV, and the file's inode number is placed in bits
+ 32-63. This setting is only allowed when data unit indices and
+ inode numbers fit in 32 bits.
+
+- With `IV_INO_LBLK_32 policies`_, the file's inode number is hashed
+ and added to the data unit index. The resulting value is truncated
+ to 32 bits and placed in bits 0-31 of the IV. This setting is only
+ allowed when data unit indices and inode numbers fit in 32 bits.
+
+The byte order of the IV is always little endian.
+
+If the user selects FSCRYPT_MODE_AES_128_CBC for the contents mode, an
+ESSIV layer is automatically included. In this case, before the IV is
+passed to AES-128-CBC, it is encrypted with AES-256 where the AES-256
+key is the SHA-256 hash of the file's contents encryption key.
Filenames encryption
--------------------
@@ -544,7 +575,8 @@ follows::
__u8 contents_encryption_mode;
__u8 filenames_encryption_mode;
__u8 flags;
- __u8 __reserved[4];
+ __u8 log2_data_unit_size;
+ __u8 __reserved[3];
__u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE];
};
@@ -586,6 +618,29 @@ This structure must be initialized as follows:
The DIRECT_KEY, IV_INO_LBLK_64, and IV_INO_LBLK_32 flags are
mutually exclusive.
+- ``log2_data_unit_size`` is the log2 of the data unit size in bytes,
+ or 0 to select the default data unit size. The data unit size is
+ the granularity of file contents encryption. For example, setting
+ ``log2_data_unit_size`` to 12 causes file contents be passed to the
+ underlying encryption algorithm (such as AES-256-XTS) in 4096-byte
+ data units, each with its own IV.
+
+ Not all filesystems support setting ``log2_data_unit_size``. ext4
+ and f2fs support it since Linux v6.7. On filesystems that support
+ it, the supported nonzero values are 9 through the log2 of the
+ filesystem block size, inclusively. The default value of 0 selects
+ the filesystem block size.
+
+ The main use case for ``log2_data_unit_size`` is for selecting a
+ data unit size smaller than the filesystem block size for
+ compatibility with inline encryption hardware that only supports
+ smaller data unit sizes. ``/sys/block/$disk/queue/crypto/`` may be
+ useful for checking which data unit sizes are supported by a
+ particular system's inline encryption hardware.
+
+ Leave this field zeroed unless you are certain you need it. Using
+ an unnecessarily small data unit size reduces performance.
+
- For v2 encryption policies, ``__reserved`` must be zeroed.
- For v1 encryption policies, ``master_key_descriptor`` specifies how
@@ -1079,8 +1134,8 @@ The caller must zero all input fields, then fill in ``key_spec``:
On success, 0 is returned and the kernel fills in the output fields:
- ``status`` indicates whether the key is absent, present, or
- incompletely removed. Incompletely removed means that the master
- secret has been removed, but some files are still in use; i.e.,
+ incompletely removed. Incompletely removed means that removal has
+ been initiated, but some files are still in use; i.e.,
`FS_IOC_REMOVE_ENCRYPTION_KEY`_ returned 0 but set the informational
status flag FSCRYPT_KEY_REMOVAL_STATUS_FLAG_FILES_BUSY.
diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index 4b30daee399a..198d805d611c 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -241,3 +241,10 @@ following flags are defined:
all of an inode's dirty data on last close. Exports that behave this
way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip
waiting for writeback when closing such files.
+
+ EXPORT_OP_ASYNC_LOCK - Indicates a capable filesystem to do async lock
+ requests from lockd. Only set EXPORT_OP_ASYNC_LOCK if the filesystem has
+ it's own ->lock() functionality as core posix_lock_file() implementation
+ has no async lock request handling yet. For more information about how to
+ indicate an async lock request from a ->lock() file_operations struct, see
+ fs/locks.c and comment for the function vfs_lock_file().
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 4d05b9862451..d69f59700a23 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1045,3 +1045,10 @@ filesystem type is now moved to a later point when the devices are closed:
As this is a VFS level change it has no practical consequences for filesystems
other than that all of them must use one of the provided kill_litter_super(),
kill_anon_super(), or kill_block_super() helpers.
+
+---
+
+**mandatory**
+
+Lock ordering has been changed so that s_umount ranks above open_mutex again.
+All places where s_umount was taken under open_mutex have been fixed up.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 06e14efd8662..d414e145f912 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -396,6 +396,10 @@ Memory barriers come in four basic varieties:
(2) Address-dependency barriers (historical).
+ [!] This section is marked as HISTORICAL: For more up-to-date
+ information, including how compiler transformations related to pointer
+ comparisons can sometimes cause problems, see
+ Documentation/RCU/rcu_dereference.rst.
An address-dependency barrier is a weaker form of read barrier. In the
case where two loads are performed such that the second depends on the
@@ -556,6 +560,9 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
----------------------------------------
+[!] This section is marked as HISTORICAL: For more up-to-date information,
+including how compiler transformations related to pointer comparisons can
+sometimes cause problems, see Documentation/RCU/rcu_dereference.rst.
As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
DEC Alpha, which means that about the only people who need to pay attention
diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
new file mode 100644
index 000000000000..05acc73e2e33
--- /dev/null
+++ b/Documentation/netlink/specs/nfsd.yaml
@@ -0,0 +1,89 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+
+name: nfsd
+protocol: genetlink
+uapi-header: linux/nfsd_netlink.h
+
+doc: NFSD configuration over generic netlink.
+
+attribute-sets:
+ -
+ name: rpc-status
+ attributes:
+ -
+ name: xid
+ type: u32
+ byte-order: big-endian
+ -
+ name: flags
+ type: u32
+ -
+ name: prog
+ type: u32
+ -
+ name: version
+ type: u8
+ -
+ name: proc
+ type: u32
+ -
+ name: service_time
+ type: s64
+ -
+ name: pad
+ type: pad
+ -
+ name: saddr4
+ type: u32
+ byte-order: big-endian
+ display-hint: ipv4
+ -
+ name: daddr4
+ type: u32
+ byte-order: big-endian
+ display-hint: ipv4
+ -
+ name: saddr6
+ type: binary
+ display-hint: ipv6
+ -
+ name: daddr6
+ type: binary
+ display-hint: ipv6
+ -
+ name: sport
+ type: u16
+ byte-order: big-endian
+ -
+ name: dport
+ type: u16
+ byte-order: big-endian
+ -
+ name: compound-ops
+ type: u32
+ multi-attr: true
+
+operations:
+ list:
+ -
+ name: rpc-status-get
+ doc: dump pending nfsd rpc
+ attribute-set: rpc-status
+ dump:
+ pre: nfsd-nl-rpc-status-get-start
+ post: nfsd-nl-rpc-status-get-done
+ reply:
+ attributes:
+ - xid
+ - flags
+ - prog
+ - version
+ - proc
+ - service_time
+ - saddr4
+ - daddr4
+ - saddr6
+ - daddr6
+ - sport
+ - dport
+ - compound-ops
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index b48da698d6f2..bb96ca0f774b 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -31,7 +31,7 @@ you probably needn't concern yourself with pcmciautils.
====================== =============== ========================================
GNU C 5.1 gcc --version
Clang/LLVM (optional) 11.0.0 clang --version
-Rust (optional) 1.71.1 rustc --version
+Rust (optional) 1.73.0 rustc --version
bindgen (optional) 0.65.1 bindgen --version
GNU make 3.82 make --version
bash 4.2 bash --version
diff --git a/Documentation/rust/index.rst b/Documentation/rust/index.rst
index e599be2cec9b..965f2db529e0 100644
--- a/Documentation/rust/index.rst
+++ b/Documentation/rust/index.rst
@@ -6,6 +6,25 @@ Rust
Documentation related to Rust within the kernel. To start using Rust
in the kernel, please read the quick-start.rst guide.
+
+The Rust experiment
+-------------------
+
+The Rust support was merged in v6.1 into mainline in order to help in
+determining whether Rust as a language was suitable for the kernel, i.e. worth
+the tradeoffs.
+
+Currently, the Rust support is primarily intended for kernel developers and
+maintainers interested in the Rust support, so that they can start working on
+abstractions and drivers, as well as helping the development of infrastructure
+and tools.
+
+If you are an end user, please note that there are currently no in-tree
+drivers/modules suitable or intended for production use, and that the Rust
+support is still in development/experimental, especially for certain kernel
+configurations.
+
+
.. only:: rustdoc and html
You can also browse `rustdoc documentation <rustdoc/kernel/index.html>`_.
diff --git a/Documentation/scheduler/sched-capacity.rst b/Documentation/scheduler/sched-capacity.rst
index e2c1cf743158..de414b33dd2a 100644
--- a/Documentation/scheduler/sched-capacity.rst
+++ b/Documentation/scheduler/sched-capacity.rst
@@ -39,14 +39,15 @@ per Hz, leading to::
-------------------
Two different capacity values are used within the scheduler. A CPU's
-``capacity_orig`` is its maximum attainable capacity, i.e. its maximum
-attainable performance level. A CPU's ``capacity`` is its ``capacity_orig`` to
-which some loss of available performance (e.g. time spent handling IRQs) is
-subtracted.
+``original capacity`` is its maximum attainable capacity, i.e. its maximum
+attainable performance level. This original capacity is returned by
+the function arch_scale_cpu_capacity(). A CPU's ``capacity`` is its ``original
+capacity`` to which some loss of available performance (e.g. time spent
+handling IRQs) is subtracted.
Note that a CPU's ``capacity`` is solely intended to be used by the CFS class,
-while ``capacity_orig`` is class-agnostic. The rest of this document will use
-the term ``capacity`` interchangeably with ``capacity_orig`` for the sake of
+while ``original capacity`` is class-agnostic. The rest of this document will use
+the term ``capacity`` interchangeably with ``original capacity`` for the sake of
brevity.
1.3 Platform examples
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
index fc853c8cc346..70e2921ef725 100644
--- a/Documentation/scheduler/sched-energy.rst
+++ b/Documentation/scheduler/sched-energy.rst
@@ -359,32 +359,9 @@ in milli-Watts or in an 'abstract scale'.
6.3 - Energy Model complexity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The task wake-up path is very latency-sensitive. When the EM of a platform is
-too complex (too many CPUs, too many performance domains, too many performance
-states, ...), the cost of using it in the wake-up path can become prohibitive.
-The energy-aware wake-up algorithm has a complexity of:
-
- C = Nd * (Nc + Ns)
-
-with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
-total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
-
-A complexity check is performed at the root domain level, when scheduling
-domains are built. EAS will not start on a root domain if its C happens to be
-higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
-time of writing).
-
-If you really want to use EAS but the complexity of your platform's Energy
-Model is too high to be used with a single root domain, you're left with only
-two possible options:
-
- 1. split your system into separate, smaller, root domains using exclusive
- cpusets and enable EAS locally on each of them. This option has the
- benefit to work out of the box but the drawback of preventing load
- balance between root domains, which can result in an unbalanced system
- overall;
- 2. submit patches to reduce the complexity of the EAS wake-up algorithm,
- hence enabling it to cope with larger EMs in reasonable time.
+EAS does not impose any complexity limit on the number of PDs/OPPs/CPUs but
+restricts the number of CPUs to EM_MAX_NUM_CPUS to prevent overflows during
+the energy estimation.
6.4 - Schedutil governor
diff --git a/Documentation/scheduler/sched-rt-group.rst b/Documentation/scheduler/sched-rt-group.rst
index 655a096ec8fb..d685609ed3d7 100644
--- a/Documentation/scheduler/sched-rt-group.rst
+++ b/Documentation/scheduler/sched-rt-group.rst
@@ -39,10 +39,10 @@ Most notable:
1.1 The problem
---------------
-Realtime scheduling is all about determinism, a group has to be able to rely on
+Real-time scheduling is all about determinism, a group has to be able to rely on
the amount of bandwidth (eg. CPU time) being constant. In order to schedule
-multiple groups of realtime tasks, each group must be assigned a fixed portion
-of the CPU time available. Without a minimum guarantee a realtime group can
+multiple groups of real-time tasks, each group must be assigned a fixed portion
+of the CPU time available. Without a minimum guarantee a real-time group can
obviously fall short. A fuzzy upper limit is of no use since it cannot be
relied upon. Which leaves us with just the single fixed portion.
@@ -50,14 +50,14 @@ relied upon. Which leaves us with just the single fixed portion.
----------------
CPU time is divided by means of specifying how much time can be spent running
-in a given period. We allocate this "run time" for each realtime group which
-the other realtime groups will not be permitted to use.
+in a given period. We allocate this "run time" for each real-time group which
+the other real-time groups will not be permitted to use.
-Any time not allocated to a realtime group will be used to run normal priority
+Any time not allocated to a real-time group will be used to run normal priority
tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
SCHED_OTHER.
-Let's consider an example: a frame fixed realtime renderer must deliver 25
+Let's consider an example: a frame fixed real-time renderer must deliver 25
frames a second, which yields a period of 0.04s per frame. Now say it will also
have to play some music and respond to input, leaving it with around 80% CPU
time dedicated for the graphics. We can then give this group a run time of 0.8
@@ -70,7 +70,7 @@ needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
of 0.00015s.
The remaining CPU time will be used for user input and other tasks. Because
-realtime tasks have explicitly allocated the CPU time they need to perform
+real-time tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphics or audio can be eliminated.
NOTE: the above example is not fully implemented yet. We still
@@ -87,18 +87,20 @@ lack an EDF scheduler to make non-uniform periods usable.
The system wide settings are configured under the /proc virtual file system:
/proc/sys/kernel/sched_rt_period_us:
- The scheduling period that is equivalent to 100% CPU bandwidth
+ The scheduling period that is equivalent to 100% CPU bandwidth.
/proc/sys/kernel/sched_rt_runtime_us:
- A global limit on how much time realtime scheduling may use. Even without
- CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
- processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
- available to all realtime groups.
+ A global limit on how much time real-time scheduling may use. This is always
+ less or equal to the period_us, as it denotes the time allocated from the
+ period_us for the real-time tasks. Even without CONFIG_RT_GROUP_SCHED enabled,
+ this will limit time reserved to real-time processes. With
+ CONFIG_RT_GROUP_SCHED=y it signifies the total bandwidth available to all
+ real-time groups.
* Time is specified in us because the interface is s32. This gives an
operating range from 1us to about 35 minutes.
* sched_rt_period_us takes values from 1 to INT_MAX.
- * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
+ * sched_rt_runtime_us takes values from -1 to sched_rt_period_us.
* A run time of -1 specifies runtime == period, ie. no limit.
@@ -108,7 +110,7 @@ The system wide settings are configured under the /proc virtual file system:
The default values for sched_rt_period_us (1000000 or 1s) and
sched_rt_runtime_us (950000 or 0.95s). This gives 0.05s to be used by
SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away
-realtime tasks will not lock up the machine but leave a little time to recover
+real-time tasks will not lock up the machine but leave a little time to recover
it. By setting runtime to -1 you'd get the old behaviour back.
By default all bandwidth is assigned to the root group and new groups get the
@@ -116,10 +118,10 @@ period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
want to assign bandwidth to another group, reduce the root group's bandwidth
and assign some or all of the difference to another group.
-Realtime group scheduling means you have to assign a portion of total CPU
-bandwidth to the group before it will accept realtime tasks. Therefore you will
-not be able to run realtime tasks as any user other than root until you have
-done that, even if the user has the rights to run processes with realtime
+Real-time group scheduling means you have to assign a portion of total CPU
+bandwidth to the group before it will accept real-time tasks. Therefore you will
+not be able to run real-time tasks as any user other than root until you have
+done that, even if the user has the rights to run processes with real-time
priority!