Diffstat (limited to 'Documentation')
87 files changed, 4107 insertions, 365 deletions
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst index a6ba95fbaa9f..ce63be6d64ad 100644 --- a/Documentation/admin-guide/LSM/index.rst +++ b/Documentation/admin-guide/LSM/index.rst @@ -47,3 +47,4 @@ subdirectories. tomoyo Yama SafeSetID + ipe diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst new file mode 100644 index 000000000000..f38e641df0e9 --- /dev/null +++ b/Documentation/admin-guide/LSM/ipe.rst @@ -0,0 +1,790 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Integrity Policy Enforcement (IPE) +================================== + +.. NOTE:: + + This is the documentation for admins, system builders, or individuals + attempting to use IPE. If you're looking for more developer-focused + documentation about IPE please see :doc:`the design docs </security/ipe>`. + +Overview +-------- + +Integrity Policy Enforcement (IPE) is a Linux Security Module that takes a +complementary approach to access control. Unlike traditional access control +mechanisms that rely on labels and paths for decision-making, IPE focuses +on the immutable security properties inherent to system components. These +properties are fundamental attributes or features of a system component +that cannot be altered, ensuring a consistent and reliable basis for +security decisions. + +To elaborate, in the context of IPE, system components primarily refer to +files or the devices these files reside on. However, this is just a +starting point. The concept of system components is flexible and can be +extended to include new elements as the system evolves. The immutable +properties include the origin of a file, which remains constant and +unchangeable over time. For example, IPE policies can be crafted to trust +files originating from the initramfs. Since initramfs is typically verified +by the bootloader, its files are deemed trustworthy; "file is from +initramfs" becomes an immutable property under IPE's consideration. 
+ +The immutable property concept extends to the security features enabled on +a file's origin, such as dm-verity or fs-verity, which provide a layer of +integrity and trust. For example, IPE allows the definition of policies +that trust files from a dm-verity protected device. dm-verity ensures the +integrity of an entire device by providing a verifiable and immutable state +of its contents. Similarly, fs-verity offers filesystem-level integrity +checks, allowing IPE to enforce policies that trust files protected by +fs-verity. These two features cannot be turned off once established, so +they are considered immutable properties. These examples demonstrate how +IPE leverages immutable properties, such as a file's origin and its +integrity protection mechanisms, to make access control decisions. + +IPE policy, specifically, grants the ability to enforce +stringent access controls by assessing security properties against +reference values defined within the policy. This assessment can be based on +the existence of a security property (e.g., verifying whether a file originates +from initramfs) or on evaluating the internal state of an immutable security +property. The latter includes checking the roothash of a dm-verity +protected device, determining whether dm-verity possesses a valid +signature, assessing the digest of a fs-verity protected file, or +determining whether fs-verity possesses a valid built-in signature. This +nuanced approach to policy enforcement enables a highly secure and +customizable system defense mechanism, tailored to specific security +requirements and trust models. + +To enable IPE, ensure that the ``CONFIG_SECURITY_IPE`` (under +:menuselection:`Security -> Integrity Policy Enforcement (IPE)`) config +option is enabled. + +Use Cases +--------- + +IPE works best in fixed-function devices: devices whose purpose +is clearly defined and not supposed to be changed (e.g.
network firewall +device in a data center, an IoT device, etcetera), where all software and +configuration is built and provisioned by the system owner. + +IPE is a long way off from being usable for general-purpose computing: the Linux +community as a whole tends to follow a decentralized trust model (known as +the web of trust), which IPE does not support yet. Instead, IPE +supports PKI (public key infrastructure), which generally designates a +set of trusted entities that provide a measure of absolute trust. + +Additionally, while most packages are signed today, the files inside +the packages (for instance, the executables) tend to be unsigned. This +makes it difficult to utilize IPE in systems where a package manager is +expected to be functional, without major changes to the package manager +and ecosystem behind it. + +The digest_cache LSM [#digest_cache_lsm]_ is a system that, when combined with IPE, +could be used to enable and support general-purpose computing use cases. + +Known Limitations +----------------- + +IPE cannot verify the integrity of anonymous executable memory, such as +the trampolines created by gcc closures and libffi (<3.4.2), or JIT'd code. +Unfortunately, as this is dynamically generated code, there is no way +for IPE to ensure the integrity of this code to form a trust basis. + +IPE cannot verify the integrity of programs written in interpreted +languages when these scripts are invoked by passing these program files +to the interpreter. This is because of the way interpreters execute these +files: the scripts themselves are not evaluated as executable code +through one of IPE's hooks; they are merely text files that are read +(as opposed to compiled executables) [#interpreters]_. + +Threat Model +------------ + +IPE specifically targets the risk of tampering with user-space executable +code after the kernel has initially booted, including the kernel modules +loaded from userspace via ``modprobe`` or ``insmod``.
+ +To illustrate, consider a scenario where an untrusted binary, possibly +malicious, is downloaded along with all necessary dependencies, including a +loader and libc. The primary function of IPE in this context is to prevent +the execution of such binaries and their dependencies. + +IPE achieves this by verifying the integrity and authenticity of all +executable code before allowing it to run. It conducts a thorough +check to ensure that the code's integrity is intact and that it matches an +authorized reference value (digest, signature, etc.) as per the defined +policy. If a binary does not pass this verification process, either +because its integrity has been compromised or it does not meet the +authorization criteria, IPE will deny its execution. Additionally, IPE +generates audit logs which may be utilized to detect and analyze failures +resulting from policy violations. + +Tampering threat scenarios include modification or replacement of +executable code by a range of actors including: + +- Actors with physical access to the hardware +- Actors with local network access to the system +- Actors with access to the deployment system +- Compromised internal systems under external control +- Malicious end users of the system +- Compromised end users of the system +- Remote (external) compromise of the system + +IPE does not mitigate threats arising from malicious but authorized +developers (with access to a signing certificate), or compromised +developer tools used by them (i.e. return-oriented programming attacks). +Additionally, IPE draws a hard security boundary between userspace and +kernelspace. As a result, kernel-level exploits are considered outside +the scope of IPE and mitigation is left to other mechanisms. + +Policy +------ + +IPE policy is a plain-text [#devdoc]_ policy composed of multiple statements +over several lines.
There is one required line, at the top of the +policy, indicating the policy name and the policy version, for +instance:: + + policy_name=Ex_Policy policy_version=0.0.0 + +The policy name is a unique, human-readable key identifying this +policy. It is used to create nodes under securityfs, as well as to +uniquely identify policies so that deploying a new policy can be +distinguished from updating an existing one. + +The policy version indicates the current version of the policy (NOT the +policy syntax version). This is used to prevent rollback of policy to +potentially insecure previous versions of the policy. + +The next portion of IPE policy consists of rules. Rules are formed by key=value +pairs, known as properties. IPE rules require two properties: ``action``, +which determines what IPE does when it encounters a match against the +rule, and ``op``, which determines when the rule should be evaluated. +The ordering is significant: a rule must start with ``op`` and end with +``action``. Thus, a minimal rule is:: + + op=EXECUTE action=ALLOW + +This example will allow any execution. Additional properties are used to +assess immutable security properties about the files being evaluated. +These properties are intended to be descriptions of systems within the +kernel that can provide a measure of integrity verification, such that IPE +can determine the trust of the resource based on the value of the property. + +Rules are evaluated top-to-bottom, and the first rule that matches determines +the action. As a result, any revocation rules or denies should be placed early +in the file to ensure that these rules are evaluated before a rule with +``action=ALLOW``. + +IPE policy supports comments. The character '#' starts a comment: all +characters to the right of '#' up to the newline are ignored. + +The default behavior of IPE evaluations can also be expressed in policy, +through the ``DEFAULT`` statement.
This can be done at a global level, +or a per-operation level:: + + # Global + DEFAULT action=ALLOW + + # Operation Specific + DEFAULT op=EXECUTE action=ALLOW + +A default must be set for all known operations in IPE. If you want older +policies to remain compatible with newer kernels that may introduce +new operations, set a global default of ``ALLOW``, then override the +defaults on a per-operation basis (as above). + +With configurable policy-based LSMs, there are several issues with +enforcing the configurable policies at startup, related to reading and +parsing the policy: + +1. The kernel *should* not read files from userspace, so directly reading + the policy file is prohibited. +2. The kernel command line has a character limit, and one kernel module + should not reserve the entire character limit for its own + configuration. +3. There are various boot loaders in the kernel ecosystem, so handing + off a memory block would be costly to maintain. + +As a result, IPE addresses this problem through the concept of a "boot +policy". A boot policy is a minimal policy which is compiled into the +kernel. This policy is intended to get the system to a state where +userspace is set up and ready to receive commands, at which point a more +complex policy can be deployed via securityfs. The boot policy can be +specified via the ``SECURITY_IPE_BOOT_POLICY`` config option, which accepts +a path to a plain-text version of the IPE policy to apply. This policy +will be compiled into the kernel. If not specified, IPE will be disabled +until a policy is deployed and activated through securityfs. + +Deploying Policies +~~~~~~~~~~~~~~~~~~ + +Policies can be deployed from userspace through securityfs. These policies +are signed through the PKCS#7 message format to enforce some level of +authorization of the policies (preventing an attacker who has gained +unconstrained root from deploying an "allow all" policy).
These +policies must be signed by a certificate that chains to the +``SYSTEM_TRUSTED_KEYRING``. With openssl, the policy can be signed by:: + + openssl smime -sign \ + -in "$MY_POLICY" \ + -signer "$MY_CERTIFICATE" \ + -inkey "$MY_PRIVATE_KEY" \ + -noattr \ + -nodetach \ + -nosmimecap \ + -outform der \ + -out "$MY_POLICY.p7b" + +Deploying the policies is done through the ``new_policy`` node in +securityfs. To deploy a policy, simply cat the file into the +securityfs node:: + + cat "$MY_POLICY.p7b" > /sys/kernel/security/ipe/new_policy + +Upon success, this will create one subdirectory under +``/sys/kernel/security/ipe/policies/``. The subdirectory will be named after the +``policy_name`` field of the policy deployed, so for the example above, +the directory will be ``/sys/kernel/security/ipe/policies/Ex_Policy``. +Within this directory, there will be seven files: ``pkcs7``, ``policy``, +``name``, ``version``, ``active``, ``update``, and ``delete``. + +The ``pkcs7`` file is read-only. Reading it returns the raw PKCS#7 data +that was provided to the kernel, representing the policy. If the policy being +read is the boot policy, this will return ``ENOENT``, as it is not signed. + +The ``policy`` file is read-only. Reading it returns the PKCS#7 inner +content of the policy, which will be the plain-text policy. + +The ``active`` file is used to set a policy as the currently active policy. +This file is rw, and accepts a value of ``"1"`` to set the policy as active. +Since only a single policy can be active at one time, all other policies +will be marked inactive. The policy being marked active must have a policy +version greater than or equal to the currently-running version. + +The ``update`` file is used to update a policy that is already present +in the kernel. This file is write-only and accepts a PKCS#7 signed +policy. Two checks will always be performed on this policy: First, the +``policy_name`` of the updated version must match that of the existing +version.
Second, the updated policy must have a policy version greater than +or equal to the currently-running version. This is to prevent rollback attacks. + +The ``delete`` file is used to remove a policy that is no longer needed. +This file is write-only and accepts a value of ``1`` to delete the policy. +On deletion, the securityfs node representing the policy will be removed. +However, deleting the currently active policy is not allowed and will return +an operation not permitted error. + +Writing to either ``update`` or ``new_policy`` can also fail with a +bad message error (policy syntax error) or a file exists error. The latter error happens +when trying to deploy a policy with a ``policy_name`` while the kernel already +has a deployed policy with the same ``policy_name``. + +Deploying a policy will *not* cause IPE to start enforcing the policy. IPE will +only enforce the policy marked active. Note that only one policy can be active +at a time. + +Once deployment is successful, the policy can be activated by writing to the file +``/sys/kernel/security/ipe/policies/$policy_name/active``. +For example, the ``Ex_Policy`` can be activated by:: + + echo 1 > "/sys/kernel/security/ipe/policies/Ex_Policy/active" + +From this point on, ``Ex_Policy`` is the enforced policy on the +system. + +IPE also provides a way to delete policies. This can be done via the +``delete`` securityfs node, +``/sys/kernel/security/ipe/policies/$policy_name/delete``. +Writing ``1`` to that file deletes the policy:: + + echo 1 > "/sys/kernel/security/ipe/policies/$policy_name/delete" + +There is only one requirement to delete a policy: the policy being deleted +must be inactive. + +.. NOTE:: + + If a traditional MAC system is enabled (SELinux, apparmor, smack), all + writes to ipe's securityfs nodes require ``CAP_MAC_ADMIN``. + +Modes +~~~~~ + +IPE supports two modes of operation: permissive (similar to SELinux's +permissive mode) and enforced.
In permissive mode, all events are +checked and policy violations are logged, but the policy is not +enforced. This allows users to test policies before enforcing them. + +The default mode is enforced, and it can be changed via the kernel command +line parameter ``ipe.enforce=(0|1)``, or the securityfs node +``/sys/kernel/security/ipe/enforce``. + +.. NOTE:: + + If a traditional MAC system is enabled (SELinux, apparmor, smack, etcetera), + all writes to ipe's securityfs nodes require ``CAP_MAC_ADMIN``. + +Audit Events +~~~~~~~~~~~~ + +1420 AUDIT_IPE_ACCESS +^^^^^^^^^^^^^^^^^^^^^ +Event Examples:: + + type=1420 audit(1653364370.067:61): ipe_op=EXECUTE ipe_hook=MMAP enforcing=1 pid=2241 comm="ld-linux.so" path="/deny/lib/libc.so.6" dev="sda2" ino=14549020 rule="DEFAULT action=DENY" + type=1300 audit(1653364370.067:61): SYSCALL arch=c000003e syscall=9 success=no exit=-13 a0=7f1105a28000 a1=195000 a2=5 a3=812 items=0 ppid=2219 pid=2241 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="ld-linux.so" exe="/tmp/ipe-test/lib/ld-linux.so" subj=unconfined key=(null) + type=1327 audit(1653364370.067:61): 707974686F6E3300746573742F6D61696E2E7079002D6E00 + + type=1420 audit(1653364735.161:64): ipe_op=EXECUTE ipe_hook=MMAP enforcing=1 pid=2472 comm="mmap_test" path=? dev=? ino=? rule="DEFAULT action=DENY" + type=1300 audit(1653364735.161:64): SYSCALL arch=c000003e syscall=9 success=no exit=-13 a0=0 a1=1000 a2=4 a3=21 items=0 ppid=2219 pid=2472 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="mmap_test" exe="/root/overlake_test/upstream_test/vol_fsverity/bin/mmap_test" subj=unconfined key=(null) + type=1327 audit(1653364735.161:64): 707974686F6E3300746573742F6D61696E2E7079002D6E00 + +This event indicates that IPE made an access control decision; the IPE-specific +record (1420) is always emitted in conjunction with an +``AUDITSYSCALL`` record.
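Because the 1420 record is a flat sequence of ``key=value`` fields, individual fields can be pulled out with standard text tools when reviewing logs. The following is a minimal POSIX shell sketch, not part of IPE or the audit userspace tools. The helper name ``ipe_audit_field`` is hypothetical. It assumes each record sits on one line and that quoted values contain no embedded double quotes.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with IPE or auditd): print the value of
# the field named $1 from each audit record line read on stdin.
#   - Quoted values such as rule="DEFAULT action=DENY" are printed without
#     the surrounding quotes.
#   - Unquoted values such as enforcing=1 or path=? are printed up to the
#     next whitespace.
#   - Lines that lack the field produce no output.
#   - Naive: the field is located with a regular expression, so a name that
#     also occurs inside another quoted value could match there instead.
ipe_audit_field() {
    sed -n \
        -e "s/.*[[:space:]]$1=\"\([^\"]*\)\".*/\1/p" \
        -e 't' \
        -e "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}
```

As a usage example, assuming auditd writes to its usual ``/var/log/audit/audit.log``, ``grep 'type=1420' /var/log/audit/audit.log | ipe_audit_field rule | sort | uniq -c`` gives a rough count of how often each policy rule was matched. The sed script tries the quoted form first and uses ``t`` to skip the unquoted form once a substitution has succeeded.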
+ +Whether IPE is in permissive or enforced mode can be determined +from the ``success`` property and exit code of the ``AUDITSYSCALL`` record +(in the enforced examples above, the denied ``mmap`` calls report +``success=no exit=-13``, i.e. ``-EACCES``). + + +Field descriptions: + ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| Field | Value Type | Optional? | Description of Value | ++===========+============+===========+=================================================================================+ +| ipe_op | string | No | The IPE operation name associated with the log | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| ipe_hook | string | No | The name of the LSM hook that triggered the IPE event | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| enforcing | integer | No | The current IPE enforcing state 1 is in enforcing mode, 0 is in permissive mode | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| pid | integer | No | The pid of the process that triggered the IPE event. | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| comm | string | No | The command line program name of the process that triggered the IPE event | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| path | string | Yes | The absolute path to the evaluated file | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| ino | integer | Yes | The inode number of the evaluated file | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| dev | string | 
Yes | The device name of the evaluated file, e.g.
vda | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ +| rule | string | No | The matched policy rule | ++-----------+------------+-----------+---------------------------------------------------------------------------------+ + +1421 AUDIT_IPE_CONFIG_CHANGE +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Event Example:: + + type=1421 audit(1653425583.136:54): old_active_pol_name="Allow_All" old_active_pol_version=0.0.0 old_policy_digest=sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855 new_active_pol_name="boot_verified" new_active_pol_version=0.0.0 new_policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1 + type=1300 audit(1653425583.136:54): SYSCALL arch=c000003e syscall=1 success=yes exit=2 a0=3 a1=5596fcae1fb0 a2=2 a3=2 items=0 ppid=184 pid=229 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="python3" exe="/usr/bin/python3.10" key=(null) + type=1327 audit(1653425583.136:54): PROCTITLE proctitle=707974686F6E3300746573742F6D61696E2E7079002D66002E2 + +This event indicates that IPE switched the active policy from one to another, +along with the version and the hash digest of the two policies. +Note that IPE can only have one policy active at a time; all access decisions +are evaluated against the currently active policy. +The normal procedure to deploy a new policy is to load the new policy +into the kernel first, then switch the active policy to it. + +This record will always be emitted in conjunction with an ``AUDITSYSCALL`` record for the ``write`` syscall. + +Field descriptions: + ++------------------------+------------+-----------+---------------------------------------------------+ +| Field | Value Type | Optional?
| Description of Value | ++========================+============+===========+===================================================+ +| old_active_pol_name | string | Yes | The name of previous active policy | ++------------------------+------------+-----------+---------------------------------------------------+ +| old_active_pol_version | string | Yes | The version of previous active policy | ++------------------------+------------+-----------+---------------------------------------------------+ +| old_policy_digest | string | Yes | The hash of previous active policy | ++------------------------+------------+-----------+---------------------------------------------------+ +| new_active_pol_name | string | No | The name of current active policy | ++------------------------+------------+-----------+---------------------------------------------------+ +| new_active_pol_version | string | No | The version of current active policy | ++------------------------+------------+-----------+---------------------------------------------------+ +| new_policy_digest | string | No | The hash of current active policy | ++------------------------+------------+-----------+---------------------------------------------------+ +| auid | integer | No | The login user ID | ++------------------------+------------+-----------+---------------------------------------------------+ +| ses | integer | No | The login session ID | ++------------------------+------------+-----------+---------------------------------------------------+ +| lsm | string | No | The lsm name associated with the event | ++------------------------+------------+-----------+---------------------------------------------------+ +| res | integer | No | The result of the audited operation(success/fail) | ++------------------------+------------+-----------+---------------------------------------------------+ + +1422 AUDIT_IPE_POLICY_LOAD +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Event Example:: + + type=1422 audit(1653425529.927:53): 
policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1 + type=1300 audit(1653425529.927:53): arch=c000003e syscall=1 success=yes exit=2567 a0=3 a1=5596fcae1fb0 a2=a07 a3=2 items=0 ppid=184 pid=229 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="python3" exe="/usr/bin/python3.10" key=(null) + type=1327 audit(1653425529.927:53): PROCTITLE proctitle=707974686F6E3300746573742F6D61696E2E7079002D66002E2E + +This record indicates a new policy has been loaded into the kernel with the policy name, policy version and policy hash. + +This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record for the ``write`` syscall. + +Field descriptions: + ++----------------+------------+-----------+---------------------------------------------------+ +| Field | Value Type | Optional? | Description of Value | ++================+============+===========+===================================================+ +| policy_name | string | No | The policy_name | ++----------------+------------+-----------+---------------------------------------------------+ +| policy_version | string | No | The policy_version | ++----------------+------------+-----------+---------------------------------------------------+ +| policy_digest | string | No | The policy hash | ++----------------+------------+-----------+---------------------------------------------------+ +| auid | integer | No | The login user ID | ++----------------+------------+-----------+---------------------------------------------------+ +| ses | integer | No | The login session ID | ++----------------+------------+-----------+---------------------------------------------------+ +| lsm | string | No | The lsm name associated with the event | ++----------------+------------+-----------+---------------------------------------------------+ +| res | 
integer | No | The result of the audited operation(success/fail) | ++----------------+------------+-----------+---------------------------------------------------+ + + +1404 AUDIT_MAC_STATUS +^^^^^^^^^^^^^^^^^^^^^ + +Event Examples:: + + type=1404 audit(1653425689.008:55): enforcing=0 old_enforcing=1 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1 + type=1300 audit(1653425689.008:55): arch=c000003e syscall=1 success=yes exit=2 a0=1 a1=55c1065e5c60 a2=2 a3=0 items=0 ppid=405 pid=441 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=) + type=1327 audit(1653425689.008:55): proctitle="-bash" + + type=1404 audit(1653425689.008:55): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1 + type=1300 audit(1653425689.008:55): arch=c000003e syscall=1 success=yes exit=2 a0=1 a1=55c1065e5c60 a2=2 a3=0 items=0 ppid=405 pid=441 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=) + type=1327 audit(1653425689.008:55): proctitle="-bash" + +This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record for the ``write`` syscall. + +Field descriptions: + ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| Field | Value Type | Optional? 
| Description of Value | ++===============+============+===========+=================================================================================================+ +| enforcing | integer | No | The enforcing state IPE is being switched to, 1 is in enforcing mode, 0 is in permissive mode | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| old_enforcing | integer | No | The enforcing state IPE is being switched from, 1 is in enforcing mode, 0 is in permissive mode | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| auid | integer | No | The login user ID | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| ses | integer | No | The login session ID | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| enabled | integer | No | The new TTY audit enabled setting | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| old-enabled | integer | No | The old TTY audit enabled setting | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| lsm | string | No | The lsm name associated with the event | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ +| res | integer | No | The result of the audited operation(success/fail) | ++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+ + + +Success Auditing +^^^^^^^^^^^^^^^^ + +IPE supports success auditing. 
When enabled, all events that pass IPE +policy and are not blocked will emit an audit event. This is disabled by +default, and can be enabled via the kernel command line parameter +``ipe.success_audit=(0|1)`` or the +``/sys/kernel/security/ipe/success_audit`` securityfs file. + +This is *very* noisy, as IPE will check every userspace binary on the +system, but is useful for debugging policies. + +.. NOTE:: + + If a traditional MAC system is enabled (SELinux, apparmor, smack, etcetera), + all writes to ipe's securityfs nodes require ``CAP_MAC_ADMIN``. + +Properties +---------- + +As explained above, IPE properties are ``key=value`` pairs expressed in IPE +policy. Two properties are built into the policy parser: 'op' and 'action'. +The other properties are used to express conditions on the immutable security +properties of the files being evaluated. Currently those properties are: +'``boot_verified``', '``dmverity_signature``', '``dmverity_roothash``', +'``fsverity_signature``', '``fsverity_digest``'. Descriptions of all +properties supported by IPE are listed below: + +op +~~ + +Indicates the operation that a rule applies to. Must be in every rule, +as the first token. IPE supports the following operations: + + ``EXECUTE``: + + Pertains to any file being executed, or loaded as an + executable. + + ``FIRMWARE``: + + Pertains to firmware being loaded via the firmware_class interface. + This covers both the preallocated buffer and the firmware file + itself. + + ``KMODULE``: + + Pertains to loading kernel modules via ``modprobe`` or ``insmod``. + + ``KEXEC_IMAGE``: + + Pertains to kernel images loaded via ``kexec``. + + ``KEXEC_INITRAMFS``: + + Pertains to initrd images loaded via ``kexec --initrd``. + + ``POLICY``: + + Controls loading policies via a kernel-space initiated read.
+ + An example is loading IMA policies by writing the path + to the policy file to ``$securityfs/ima/policy``. + + ``X509_CERT``: + + Controls loading IMA certificates through the Kconfigs, + ``CONFIG_IMA_X509_PATH`` and ``CONFIG_EVM_X509_PATH``. + +action +~~~~~~ + + Determines what IPE should do when a rule matches. Must be in every + rule, as the final clause. Can be one of: + + ``ALLOW``: + + If the rule matches, explicitly allow the access to the resource to proceed + without evaluating any more rules. + + ``DENY``: + + If the rule matches, explicitly prohibit the access to the resource + without evaluating any more rules. + +boot_verified +~~~~~~~~~~~~~ + + This property can be utilized for authorization of files from initramfs. + The format of this property is:: + + boot_verified=(TRUE|FALSE) + + + .. WARNING:: + + This property will trust files from initramfs (rootfs). It should + only be used during the early boot stage. Before mounting the real + rootfs on top of the initramfs, the initramfs script will recursively + remove all files and directories on the initramfs. This is typically + implemented by using switch_root(8) [#switch_root]_. Therefore, the + initramfs will be empty and not accessible after the real + rootfs takes over. It is advised to switch to a different policy + that doesn't rely on this property after this point. + This ensures that the trust policies remain relevant and effective + throughout the system's operation. + +dmverity_roothash +~~~~~~~~~~~~~~~~~ + + This property can be utilized for authorization or revocation of + specific dm-verity volumes, identified via their root hashes. It has a + dependency on the DM_VERITY module. This property is controlled by + the ``IPE_PROP_DM_VERITY`` config option, which will be automatically + selected when ``SECURITY_IPE`` and ``DM_VERITY`` are both enabled.
+ The format of this property is:: + + dmverity_roothash=DigestName:HexadecimalString + + The supported DigestNames for dmverity_roothash are [#dmveritydigests]_ + + + blake2b-512 + + blake2s-256 + + sha256 + + sha384 + + sha512 + + sha3-224 + + sha3-256 + + sha3-384 + + sha3-512 + + sm3 + + rmd160 + +dmverity_signature +~~~~~~~~~~~~~~~~~~ + + This property can be utilized for authorization of all dm-verity + volumes that have a signed roothash that was validated by a keyring + specified by dm-verity's configuration, either the system trusted + keyring, or the secondary keyring. It depends on the + ``DM_VERITY_VERIFY_ROOTHASH_SIG`` config option and is controlled by + the ``IPE_PROP_DM_VERITY_SIGNATURE`` config option, which is automatically + selected when ``SECURITY_IPE``, ``DM_VERITY`` and + ``DM_VERITY_VERIFY_ROOTHASH_SIG`` are all enabled. + The format of this property is:: + + dmverity_signature=(TRUE|FALSE) + +fsverity_digest +~~~~~~~~~~~~~~~ + + This property can be utilized for authorization of specific fsverity + enabled files, identified via their fsverity digests. + It depends on the ``FS_VERITY`` config option and is controlled by + the ``IPE_PROP_FS_VERITY`` config option, which is automatically + selected when ``SECURITY_IPE`` and ``FS_VERITY`` are both enabled. + The format of this property is:: + + fsverity_digest=DigestName:HexadecimalString + + The supported DigestNames for fsverity_digest are [#fsveritydigest]_ + + + sha256 + + sha512 + +fsverity_signature +~~~~~~~~~~~~~~~~~~ + + This property is used to authorize all fs-verity enabled files that have + been verified by fs-verity's built-in signature mechanism. The signature + verification relies on a key stored within the ".fs-verity" keyring. It + depends on the ``FS_VERITY_BUILTIN_SIGNATURES`` config option and + is controlled by the ``IPE_PROP_FS_VERITY`` config option, + which is automatically selected when ``SECURITY_IPE``, ``FS_VERITY`` + and ``FS_VERITY_BUILTIN_SIGNATURES`` are all enabled.
+ The format of this property is:: + + fsverity_signature=(TRUE|FALSE) + +Policy Examples +--------------- + +Allow all +~~~~~~~~~ + +:: + + policy_name=Allow_All policy_version=0.0.0 + DEFAULT action=ALLOW + +Allow only initramfs +~~~~~~~~~~~~~~~~~~~~ + +:: + + policy_name=Allow_Initramfs policy_version=0.0.0 + DEFAULT action=DENY + + op=EXECUTE boot_verified=TRUE action=ALLOW + +Allow any signed and validated dm-verity volume and the initramfs +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + policy_name=Allow_Signed_DMV_And_Initramfs policy_version=0.0.0 + DEFAULT action=DENY + + op=EXECUTE boot_verified=TRUE action=ALLOW + op=EXECUTE dmverity_signature=TRUE action=ALLOW + +Prohibit execution from a specific dm-verity volume +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + policy_name=Deny_DMV_By_Roothash policy_version=0.0.0 + DEFAULT action=DENY + + op=EXECUTE dmverity_roothash=sha256:cd2c5bae7c6c579edaae4353049d58eb5f2e8be0244bf05345bc8e5ed257baff action=DENY + + op=EXECUTE boot_verified=TRUE action=ALLOW + op=EXECUTE dmverity_signature=TRUE action=ALLOW + +Allow only a specific dm-verity volume +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + policy_name=Allow_DMV_By_Roothash policy_version=0.0.0 + DEFAULT action=DENY + + op=EXECUTE dmverity_roothash=sha256:401fcec5944823ae12f62726e8184407a5fa9599783f030dec146938 action=ALLOW + +Allow any fs-verity file with a valid built-in signature +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + policy_name=Allow_Signed_And_Validated_FSVerity policy_version=0.0.0 + DEFAULT action=DENY + + op=EXECUTE fsverity_signature=TRUE action=ALLOW + +Allow execution of a specific fs-verity file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +:: + + policy_name=ALLOW_FSV_By_Digest policy_version=0.0.0 + DEFAULT action=DENY + + op=EXECUTE fsverity_digest=sha256:fd88f2b8824e197f850bf4c5109bea5cf0ee38104f710843bb72da796ba5af9e action=ALLOW + +Additional Information 
+---------------------- + +- `GitHub Repository <https://github.com/microsoft/ipe>`_ +- :doc:`Developer and design docs for IPE </security/ipe>` + +FAQ +--- + +Q: + What's the difference between IPE and other LSMs that provide a measure of + trust-based access control? + +A: + + In general, there are two other LSMs that can provide similar functionality: + IMA and Loadpin. + + IMA and IPE are functionally very similar. The significant difference between + the two is the policy. [#devdoc]_ + + Loadpin and IPE differ fairly dramatically, as Loadpin only covers IPE's + kernel read operations, whereas IPE is capable of controlling execution + on top of kernel reads. The trust model is also different; Loadpin roots its + trust in the initial super-block, whereas trust in IPE stems from the kernel + itself (via ``SYSTEM_TRUSTED_KEYS``). + +----------- + +.. [#digest_cache_lsm] https://lore.kernel.org/lkml/20240415142436.2545003-1-roberto.sassu@huaweicloud.com/ + +.. [#interpreters] There is `some interest in solving this issue <https://lore.kernel.org/lkml/20220321161557.495388-1-mic@digikod.net/>`_. + +.. [#devdoc] Please see :doc:`the design docs </security/ipe>` for more on + this topic. + +.. [#switch_root] https://man7.org/linux/man-pages/man8/switch_root.8.html + +.. [#dmveritydigests] These hash algorithms are based on values accepted by + the Linux crypto API; IPE does not impose any + restrictions on the digest algorithm itself; + thus, this list may be out of date. + +.. [#fsveritydigest] These hash algorithms are based on values accepted by the + kernel's fsverity support; IPE does not impose any + restrictions on the digest algorithm itself; + thus, this list may be out of date.
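The evaluation order described in the IPE ``op`` and ``action`` sections (``op`` first, ``action`` last, the first matching rule decides without evaluating later rules, otherwise the ``DEFAULT`` action applies) can be illustrated with a small userspace model. This is a hypothetical Python sketch, not IPE's kernel parser: it assumes a single global ``DEFAULT`` line, treats a file's properties as a plain dictionary supplied by the caller, and does not validate property names or digest formats. The function names ``parse_policy`` and ``evaluate`` are invented for illustration.

```python
# Hypothetical userspace model of IPE's rule-matching order; it is NOT the
# kernel's policy parser and only understands op, key=value properties, action.


def parse_policy(text):
    """Return (default_action, rules) for a policy in the textual format above."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    default = None
    rules = []
    for line in lines[1:]:  # lines[0] is the policy_name=... policy_version=... header
        pairs = [tok.split("=", 1) for tok in line.split()]
        if pairs[0] == ["DEFAULT"]:
            default = dict(pairs[1:])["action"]
            continue
        if pairs[0][0] != "op" or pairs[-1][0] != "action":
            raise ValueError(f"rule must start with op= and end with action=: {line!r}")
        # (operation, required properties, action)
        rules.append((pairs[0][1], dict(pairs[1:-1]), pairs[-1][1]))
    if default is None:
        raise ValueError("policy has no DEFAULT action")
    return default, rules


def evaluate(policy_text, op, file_props):
    """Return the action of the first rule matching op and file_props, else DEFAULT."""
    default, rules = parse_policy(policy_text)
    for rule_op, props, action in rules:
        if rule_op == op and all(file_props.get(k) == v for k, v in props.items()):
            return action  # first match wins; later rules are not evaluated
    return default


# The "Prohibit execution from a specific dm-verity volume" example policy.
DENY_DMV_BY_ROOTHASH = """
policy_name=Deny_DMV_By_Roothash policy_version=0.0.0
DEFAULT action=DENY

op=EXECUTE dmverity_roothash=sha256:cd2c5bae7c6c579edaae4353049d58eb5f2e8be0244bf05345bc8e5ed257baff action=DENY

op=EXECUTE boot_verified=TRUE action=ALLOW
op=EXECUTE dmverity_signature=TRUE action=ALLOW
"""
```

In this model, a file on the revoked volume is denied even when that volume is also signed, because the ``dmverity_roothash`` rule is listed before the ``dmverity_signature`` rule. An operation with no matching rule, such as ``KMODULE`` here, falls through to ``DEFAULT action=DENY``.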
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst index 4bd3ce3ba171..2ad1c05b8c88 100644 --- a/Documentation/admin-guide/hw-vuln/srso.rst +++ b/Documentation/admin-guide/hw-vuln/srso.rst @@ -158,3 +158,72 @@ poisoned BTB entry and using that safe one for all function returns. In older Zen1 and Zen2, this is accomplished using a reinterpretation technique similar to Retbleed one: srso_untrain_ret() and srso_safe_ret(). + +Checking the safe RET mitigation actually works +----------------------------------------------- + +To validate whether the SRSO safe RET mitigation works on a +kernel, one can use two performance counters: + +* PMC_0xc8 - Count of RET/RET lw retired +* PMC_0xc9 - Count of RET/RET lw retired mispredicted + +and compare the number of retired RETs with the number of retired +RETs that were mispredicted, in kernel mode. The same events are also +available by name:: + + # perf list ex_ret_near_ret + + List of pre-defined events (to be used in -e or -M): + + core: + ex_ret_near_ret + [Retired Near Returns] + ex_ret_near_ret_mispred + [Retired Near Returns Mispredicted] + +Either the command using the event mnemonics:: + + # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s + +or using the raw PMC numbers:: + + # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s + +should report roughly the same count for both events.
I.e., every RET retired should be +mispredicted:: + + [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s + + Performance counter stats for 'sleep 10s': + + 137,167 cpu/event=0xc8,umask=0/k + 137,173 cpu/event=0xc9,umask=0/k + + 10.004110303 seconds time elapsed + + 0.000000000 seconds user + 0.004462000 seconds sys + +Compare this with the case when the mitigation is disabled (spec_rstack_overflow=off) +or not functioning properly, which usually shows a much smaller number of +mispredicted retired RETs than the overall count of retired RETs during +a workload:: + + [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s + + Performance counter stats for 'sleep 10s': + + 201,627 cpu/event=0xc8,umask=0/k + 4,074 cpu/event=0xc9,umask=0/k + + 10.003267252 seconds time elapsed + + 0.002729000 seconds user + 0.000000000 seconds sys + +There is also a selftest that performs the above check; go to +tools/testing/selftests/x86/ and run:: + + make srso + ./srso diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 09126bb8cc9f..0b400aa28482 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2350,6 +2350,18 @@ ipcmni_extend [KNL,EARLY] Extend the maximum number of unique System V IPC identifiers from 32,768 to 16,777,216. + ipe.enforce= [IPE] + Format: <bool> + Determine whether IPE starts in permissive (0) or + enforce (1) mode. The default is enforce. + + ipe.success_audit= + [IPE] + Format: <bool> + Start IPE with success auditing enabled, emitting + an audit event when a binary is allowed. The default + is 0. + irqaffinity= [SMP] Set the default irq affinity mask The argument is a cpu list, as described above.
@@ -4788,6 +4800,16 @@ printk.time= Show timing data prefixed to each printk message line Format: <bool> (1/Y/y=enable, 0/N/n=disable) + proc_mem.force_override= [KNL] + Format: {always | ptrace | never} + Traditionally /proc/pid/mem allows memory permissions to be + overridden without restrictions. This option may be set to + restrict that. Can be one of: + - 'always': traditional behavior always allows mem overrides. + - 'ptrace': only allow mem overrides for active ptracers. + - 'never': never allow mem overrides. + If not specified, the default is the CONFIG_PROC_MEM_* choice. + processor.max_cstate= [HW,ACPI] Limit processor to maximum C-state max_cstate=9 overrides any DMI blacklist limit. diff --git a/Documentation/admin-guide/perf/arm-ni.rst b/Documentation/admin-guide/perf/arm-ni.rst new file mode 100644 index 000000000000..d26a8f697c36 --- /dev/null +++ b/Documentation/admin-guide/perf/arm-ni.rst @@ -0,0 +1,17 @@ +==================================== +Arm Network-on-Chip Interconnect PMU +==================================== + +NI-700 and friends implement a distinct PMU for each clock domain within the +interconnect. Correspondingly, the driver exposes multiple PMU devices named +arm_ni_<x>_cd_<y>, where <x> is an (arbitrary) instance identifier and <y> is +the clock domain ID within that particular instance. If multiple NI instances +exist within a system, the PMU devices can be correlated with the underlying +hardware instance via sysfs parentage. + +Each PMU exposes base event aliases for the interface types present in its clock +domain. These require qualifying with the "eventid" and "nodeid" parameters +to specify the event code to count and the interface at which to count it +(per the configured hardware ID as reflected in the xxNI_NODE_INFO register). +The exception is the "cycles" alias for the PMU cycle counter, which is encoded +with the PMU node type and needs no further qualification.
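The ``dwc_pcie_pmu.rst`` change in this diff names each PMU after the Root Port's SBDF instead of its BDF. The documented names are consistent with the usual PCI encoding ``segment << 16 | bus << 8 | device << 3 | function``, printed in hexadecimal. Under that encoding, 0001:30:03.0 gives 0x13018 (``dwc_rootport_13018``), and the old segment-less 30:03.0 gives 0x3018 (``dwc_rootport_3018``). The sketch below reproduces that mapping. The encoding and the lowercase hex formatting are inferred from those two examples, not taken from the driver source, and ``dwc_rootport_name`` is a hypothetical helper.

```python
import re

# SSSS:BB:DD.F with an optional 4-digit segment (domain); hex fields as in lspci output.
_SBDF = re.compile(r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$")


def dwc_rootport_name(address: str) -> str:
    """Map a PCI address such as '0001:30:03.0' to a dwc_rootport_<sbdf> name.

    Assumes sbdf = segment << 16 | bus << 8 | device << 3 | function,
    formatted as lowercase hex (inferred from the documented examples).
    """
    m = _SBDF.match(address)
    if not m:
        raise ValueError(f"not a PCI address: {address!r}")
    seg = int(m.group(1) or "0", 16)  # a missing segment means segment 0
    bus, dev, fn = (int(g, 16) for g in m.group(2, 3, 4))
    if dev > 0x1F:
        raise ValueError("device number must be in the range 0x00..0x1f")
    sbdf = (seg << 16) | (bus << 8) | (dev << 3) | fn
    return f"dwc_rootport_{sbdf:x}"
```

For example, ``dwc_rootport_name("0001:30:03.0")`` returns ``"dwc_rootport_13018"``, matching the ``perf list`` output shown in the updated document.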
diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst index d47cd229d710..39b8e1fdd0cd 100644 --- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst @@ -46,16 +46,16 @@ Some of the events only exist for specific configurations. DesignWare Cores (DWC) PCIe PMU Driver ======================================= -This driver adds PMU devices for each PCIe Root Port named based on the BDF of +This driver adds PMU devices for each PCIe Root Port named based on the SBDF of the Root Port. For example, - 30:03.0 PCI bridge: Device 1ded:8000 (rev 01) + 0001:30:03.0 PCI bridge: Device 1ded:8000 (rev 01) -the PMU device name for this Root Port is dwc_rootport_3018. +the PMU device name for this Root Port is dwc_rootport_13018. The DWC PCIe PMU driver registers a perf PMU driver, which provides description of available events and configuration options in sysfs, see -/sys/bus/event_source/devices/dwc_rootport_{bdf}. +/sys/bus/event_source/devices/dwc_rootport_{sbdf}. The "format" directory describes format of the config fields of the perf_event_attr structure. 
The "events" directory provides configuration @@ -66,16 +66,16 @@ The "perf list" command shall list the available events from sysfs, e.g.:: $# perf list | grep dwc_rootport <...> - dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event] + dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event] <...> - dwc_rootport_3018/rx_memory_read,lane=?/ [Kernel PMU event] + dwc_rootport_13018/rx_memory_read,lane=?/ [Kernel PMU event] Time Based Analysis Event Usage ------------------------------- Example usage of counting PCIe RX TLP data payload (Units of bytes):: - $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ + $# perf stat -a -e dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/ The average RX/TX bandwidth can be calculated using the following formula: @@ -88,7 +88,7 @@ Lane Event Usage Each lane has the same event set and to avoid generating a list of hundreds of events, the user need to specify the lane ID explicitly, e.g.:: - $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/ + $# perf stat -a -e dwc_rootport_13018/rx_memory_read,lane=4/ The driver does not support sampling, therefore "perf record" will not work. Per-task (without "-a") perf sessions are not supported. diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst index 5541ff40e06a..083ca50de896 100644 --- a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst +++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst @@ -28,7 +28,9 @@ The "identifier" sysfs file allows users to identify the version of the PMU hardware device. The "bus" sysfs file allows users to get the bus number of Root Ports -monitored by PMU. +monitored by PMU. Furthermore, users can get the range of Root Ports, +[bdf_min, bdf_max], from the "bdf_min" and "bdf_max" sysfs attributes +respectively.
Example usage of perf:: diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index 7eb3dcd6f4da..8502bc174640 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -16,6 +16,7 @@ Performance monitor support starfive_starlink_pmu arm-ccn arm-cmn + arm-ni xgene-pmu arm_dsu_pmu thunderx2-pmu diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index d0324d44f548..210a808b74ec 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -251,7 +251,9 @@ performance supported in `AMD CPPC Performance Capability <perf_cap_>`_). In some ASICs, the highest CPPC performance is not the one in the ``_CPC`` table, so we need to expose it to sysfs. If boost is not active, but still supported, this maximum frequency will be larger than the one in -``cpuinfo``. +``cpuinfo``. On systems that support preferred core, the driver will have +different values for some cores than others and this will reflect the values +advertised by the platform at bootup. This attribute is read-only. ``amd_pstate_lowest_nonlinear_freq`` @@ -262,6 +264,17 @@ lowest non-linear performance in `AMD CPPC Performance Capability <perf_cap_>`_.) This attribute is read-only. +``amd_pstate_hw_prefcore`` + +Whether the platform supports the preferred core feature and it has been +enabled. This attribute is read-only. + +``amd_pstate_prefcore_ranking`` + +The performance ranking of the core. This number doesn't have any unit, but +larger numbers are preferred at the time of reading. This can change at +runtime based on platform conditions. This attribute is read-only. 
+ ``energy_performance_available_preferences`` A list of all the supported EPP preferences that could be used for diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst index 448c1664879b..694f67fa07d1 100644 --- a/Documentation/arch/arm64/elf_hwcaps.rst +++ b/Documentation/arch/arm64/elf_hwcaps.rst @@ -365,6 +365,8 @@ HWCAP2_SME_SF8DP2 HWCAP2_SME_SF8DP4 Functionality implied by ID_AA64SMFR0_EL1.SF8DP4 == 0b1. +HWCAP2_POE + Functionality implied by ID_AA64MMFR3_EL1.S1POE == 0b0001. 4. Unused AT_HWCAP bits ----------------------- diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst index 50327c05be8d..9eb5e70b4888 100644 --- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -55,6 +55,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | Ampere | AmpereOne | AC03_CPU_38 | AMPERE_ERRATUM_AC03_CPU_38 | +----------------+-----------------+-----------------+-----------------------------+ +| Ampere | AmpereOne AC04 | AC04_CPU_10 | AMPERE_ERRATUM_AC03_CPU_38 | ++----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A510 | #2457168 | ARM64_ERRATUM_2457168 | +----------------+-----------------+-----------------+-----------------------------+ @@ -249,8 +251,8 @@ stable kernels. 
+----------------+-----------------+-----------------+-----------------------------+ | Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A | +----------------+-----------------+-----------------+-----------------------------+ -| Hisilicon | Hip08 SMMU PMCG | #162001900 | N/A | -| | Hip09 SMMU PMCG | | | +| Hisilicon | Hip{08,09,10,10C| #162001900 | N/A | +| | ,11} SMMU PMCG | | | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 | diff --git a/Documentation/dev-tools/gcov.rst b/Documentation/dev-tools/gcov.rst index 5fce2b06f229..dbd26b02ff3c 100644 --- a/Documentation/dev-tools/gcov.rst +++ b/Documentation/dev-tools/gcov.rst @@ -75,6 +75,17 @@ Only files which are linked to the main kernel image or are compiled as kernel modules are supported by this mechanism. +Module specific configs +----------------------- + +Gcov kernel configs for specific modules are described below: + +CONFIG_GCOV_PROFILE_RDS: + Enables GCOV profiling on RDS for checking which functions or + lines are executed. This config is used by the rds selftest to + generate coverage reports. If left unset the report is omitted. + + Files ----- diff --git a/Documentation/devicetree/bindings/crypto/fsl,sec-v4.0.yaml b/Documentation/devicetree/bindings/crypto/fsl,sec-v4.0.yaml index 0a9ed2848b7c..9c8c9991f29a 100644 --- a/Documentation/devicetree/bindings/crypto/fsl,sec-v4.0.yaml +++ b/Documentation/devicetree/bindings/crypto/fsl,sec-v4.0.yaml @@ -137,7 +137,10 @@ patternProperties: - const: fsl,sec-v4.0-rtic reg: - maxItems: 1 + items: + - description: RTIC control and status register space. + - description: RTIC recoverable error indication register space. 
+ minItems: 1 ranges: maxItems: 1 diff --git a/Documentation/devicetree/bindings/crypto/qcom,prng.yaml b/Documentation/devicetree/bindings/crypto/qcom,prng.yaml index 89c88004b41b..048b769a73c0 100644 --- a/Documentation/devicetree/bindings/crypto/qcom,prng.yaml +++ b/Documentation/devicetree/bindings/crypto/qcom,prng.yaml @@ -17,6 +17,7 @@ properties: - qcom,prng-ee # 8996 and later using EE - items: - enum: + - qcom,sa8255p-trng - qcom,sa8775p-trng - qcom,sc7280-trng - qcom,sm8450-trng diff --git a/Documentation/devicetree/bindings/interrupt-controller/apple,aic.yaml b/Documentation/devicetree/bindings/interrupt-controller/apple,aic.yaml index 698588e9aa86..4be9b596a790 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/apple,aic.yaml +++ b/Documentation/devicetree/bindings/interrupt-controller/apple,aic.yaml @@ -31,13 +31,25 @@ description: | This device also represents the FIQ interrupt sources on platforms using AIC, which do not go through a discrete interrupt controller. + IPIs may be performed via MMIO registers on all variants of AIC. Starting + from A11, system registers may also be used for "fast" IPIs. Starting from + M1, even faster IPIs within the same cluster may be achieved by writing to + a "local" fast IPI register as opposed to using the "global" fast IPI + register. 
+ allOf: - $ref: /schemas/interrupt-controller.yaml# properties: compatible: items: - - const: apple,t8103-aic + - enum: + - apple,s5l8960x-aic + - apple,t7000-aic + - apple,s8000-aic + - apple,t8010-aic + - apple,t8015-aic + - apple,t8103-aic - const: apple,aic interrupt-controller: true diff --git a/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml index ee7a65b528cd..d1e2bca3c503 100644 --- a/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml +++ b/Documentation/devicetree/bindings/net/amlogic,meson-dwmac.yaml @@ -58,18 +58,18 @@ allOf: - const: timing-adjustment amlogic,tx-delay-ns: - $ref: /schemas/types.yaml#/definitions/uint32 + enum: [0, 2, 4, 6] + default: 2 description: - The internal RGMII TX clock delay (provided by this driver) in - nanoseconds. Allowed values are 0ns, 2ns, 4ns, 6ns. - When phy-mode is set to "rgmii" then the TX delay should be - explicitly configured. When not configured a fallback of 2ns is - used. When the phy-mode is set to either "rgmii-id" or "rgmii-txid" - the TX clock delay is already provided by the PHY. In that case - this property should be set to 0ns (which disables the TX clock - delay in the MAC to prevent the clock from going off because both - PHY and MAC are adding a delay). - Any configuration is ignored when the phy-mode is set to "rmii". + The internal RGMII TX clock delay (provided by this driver) + in nanoseconds. When phy-mode is set to "rgmii" then the TX + delay should be explicitly configured. When the phy-mode is + set to either "rgmii-id" or "rgmii-txid" the TX clock delay + is already provided by the PHY. In that case this property + should be set to 0ns (which disables the TX clock delay in + the MAC to prevent the clock from going off because both + PHY and MAC are adding a delay). Any configuration is + ignored when the phy-mode is set to "rmii". 
amlogic,rx-delay-ns: deprecated: true diff --git a/Documentation/devicetree/bindings/net/bluetooth/amlogic,w155s2-bt.yaml b/Documentation/devicetree/bindings/net/bluetooth/amlogic,w155s2-bt.yaml new file mode 100644 index 000000000000..6fd7557039d2 --- /dev/null +++ b/Documentation/devicetree/bindings/net/bluetooth/amlogic,w155s2-bt.yaml @@ -0,0 +1,63 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +# Copyright (C) 2024 Amlogic, Inc. All rights reserved +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/bluetooth/amlogic,w155s2-bt.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Amlogic Bluetooth chips + +description: + The W155S2 is an Amlogic Bluetooth and Wi-Fi combo chip. It works on + the standard H4 protocol via a 4-wire UART interface, with baud rates + up to 4 Mbps. + +maintainers: + - Yang Li <yang.li@amlogic.com> + +properties: + compatible: + oneOf: + - items: + - enum: + - amlogic,w265s1-bt + - amlogic,w265p1-bt + - const: amlogic,w155s2-bt + - enum: + - amlogic,w155s2-bt + - amlogic,w265s2-bt + + clocks: + maxItems: 1 + description: clock provided to the controller (32.768KHz) + + enable-gpios: + maxItems: 1 + + vddio-supply: + description: VDD_IO supply regulator handle + + firmware-name: + maxItems: 1 + description: specify the path of firmware bin to load + +required: + - compatible + - clocks + - enable-gpios + - vddio-supply + - firmware-name + +additionalProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + bluetooth { + compatible = "amlogic,w155s2-bt"; + clocks = <&extclk>; + enable-gpios = <&gpio 17 GPIO_ACTIVE_HIGH>; + vddio-supply = <&wcn_3v3>; + firmware-name = "amlogic/aml_w155s2_bt_uart.bin"; + }; + diff --git a/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml b/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml index 68c5ed111417..64a5c5004862 100644 --- a/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml +++ 
b/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml @@ -172,14 +172,14 @@ allOf: - qcom,wcn6855-bt then: required: - - enable-gpios - - swctrl-gpios - - vddio-supply - - vddbtcxmx-supply - vddrfacmn-supply + - vddaon-supply + - vddwlcx-supply + - vddwlmx-supply + - vddbtcmx-supply - vddrfa0p8-supply - vddrfa1p2-supply - - vddrfa1p7-supply + - vddrfa1p8-supply - if: properties: compatible: diff --git a/Documentation/devicetree/bindings/net/can/fsl,flexcan.yaml b/Documentation/devicetree/bindings/net/can/fsl,flexcan.yaml index f197d9b516bb..97dd1a7c5ed2 100644 --- a/Documentation/devicetree/bindings/net/can/fsl,flexcan.yaml +++ b/Documentation/devicetree/bindings/net/can/fsl,flexcan.yaml @@ -17,6 +17,7 @@ properties: compatible: oneOf: - enum: + - fsl,imx95-flexcan - fsl,imx93-flexcan - fsl,imx8qm-flexcan - fsl,imx8mp-flexcan @@ -39,9 +40,6 @@ properties: - fsl,imx6sx-flexcan - const: fsl,imx6q-flexcan - items: - - const: fsl,imx95-flexcan - - const: fsl,imx93-flexcan - - items: - enum: - fsl,ls1028ar1-flexcan - const: fsl,lx2160ar1-flexcan @@ -80,6 +78,10 @@ properties: node then controller is assumed to be little endian. If this property is present then controller is assumed to be big endian. + can-transceiver: + $ref: can-transceiver.yaml# + unevaluatedProperties: false + fsl,stop-mode: description: | Register bits of stop mode control. 
diff --git a/Documentation/devicetree/bindings/net/can/microchip,mcp2510.yaml b/Documentation/devicetree/bindings/net/can/microchip,mcp2510.yaml new file mode 100644 index 000000000000..db446dde6842 --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/microchip,mcp2510.yaml @@ -0,0 +1,70 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/can/microchip,mcp2510.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Microchip MCP251X stand-alone CAN controller + +maintainers: + - Marc Kleine-Budde <mkl@pengutronix.de> + +properties: + compatible: + enum: + - microchip,mcp2510 + - microchip,mcp2515 + - microchip,mcp25625 + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + + interrupts: + maxItems: 1 + + vdd-supply: + description: Regulator that powers the CAN controller. + + xceiver-supply: + description: Regulator that powers the CAN transceiver. + + gpio-controller: true + + "#gpio-cells": + const: 2 + +required: + - compatible + - reg + - clocks + - interrupts + +allOf: + - $ref: /schemas/spi/spi-peripheral-props.yaml# + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + + spi { + #address-cells = <1>; + #size-cells = <0>; + + can@1 { + compatible = "microchip,mcp2515"; + reg = <1>; + clocks = <&clk24m>; + interrupt-parent = <&gpio4>; + interrupts = <13 IRQ_TYPE_LEVEL_LOW>; + vdd-supply = <&reg5v0>; + xceiver-supply = <&reg5v0>; + gpio-controller; + #gpio-cells = <2>; + }; + }; + diff --git a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt b/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt deleted file mode 100644 index 381f8fb3e865..000000000000 --- a/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt +++ /dev/null @@ -1,30 +0,0 @@ -* Microchip MCP251X stand-alone CAN controller device tree bindings - -Required properties: - - compatible: Should be one of the following: -
"microchip,mcp2510" for MCP2510. - "microchip,mcp2515" for MCP2515. - "microchip,mcp25625" for MCP25625. - reg: SPI chip select. - clocks: The clock feeding the CAN controller. - interrupts: Should contain IRQ line for the CAN controller. - -Optional properties: - - vdd-supply: Regulator that powers the CAN controller. - - xceiver-supply: Regulator that powers the CAN transceiver. - - gpio-controller: Indicates this device is a GPIO controller. - - #gpio-cells: Should be two. The first cell is the pin number and - the second cell is used to specify the gpio polarity. - -Example: - can0: can@1 { - compatible = "microchip,mcp2515"; - reg = <1>; - clocks = <&clk24m>; - interrupt-parent = <&gpio4>; - interrupts = <13 IRQ_TYPE_LEVEL_LOW>; - vdd-supply = <&reg5v0>; - xceiver-supply = <&reg5v0>; - gpio-controller; - #gpio-cells = <2>; - }; diff --git a/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml b/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml index d3f45d29fa0a..7c5ac5d2e880 100644 --- a/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml +++ b/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml @@ -32,6 +32,7 @@ properties: - enum: - renesas,r8a779a0-canfd # R-Car V3U - renesas,r8a779g0-canfd # R-Car V4H + - renesas,r8a779h0-canfd # R-Car V4M - const: renesas,rcar-gen4-canfd # R-Car Gen4 - items: @@ -163,14 +164,23 @@ allOf: maxItems: 1 - if: - not: - properties: - compatible: - contains: - const: renesas,rcar-gen4-canfd + properties: + compatible: + contains: + const: renesas,r8a779h0-canfd then: patternProperties: - "^channel[2-7]$": false + "^channel[5-7]$": false + else: + if: + not: + properties: + compatible: + contains: + const: renesas,rcar-gen4-canfd + then: + patternProperties: + "^channel[2-7]$": false unevaluatedProperties: false diff --git a/Documentation/devicetree/bindings/net/can/rockchip,rk3568v2-canfd.yaml b/Documentation/devicetree/bindings/net/can/rockchip,rk3568v2-canfd.yaml
new file mode 100644 index 000000000000..a077c0330013 --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/rockchip,rk3568v2-canfd.yaml @@ -0,0 +1,74 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/can/rockchip,rk3568v2-canfd.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: + Rockchip CAN-FD controller + +maintainers: + - Marc Kleine-Budde <mkl@pengutronix.de> + +allOf: + - $ref: can-controller.yaml# + +properties: + compatible: + oneOf: + - const: rockchip,rk3568v2-canfd + - items: + - const: rockchip,rk3568v3-canfd + - const: rockchip,rk3568v2-canfd + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + maxItems: 2 + + clock-names: + items: + - const: baud + - const: pclk + + resets: + maxItems: 2 + + reset-names: + items: + - const: core + - const: apb + +required: + - compatible + - reg + - interrupts + - clocks + - resets + +additionalProperties: false + +examples: + - | + #include <dt-bindings/clock/rk3568-cru.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interrupt-controller/irq.h> + + soc { + #address-cells = <2>; + #size-cells = <2>; + + can@fe570000 { + compatible = "rockchip,rk3568v2-canfd"; + reg = <0x0 0xfe570000 0x0 0x1000>; + interrupts = <GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>; + clocks = <&cru CLK_CAN0>, <&cru PCLK_CAN0>; + clock-names = "baud", "pclk"; + resets = <&cru SRST_CAN0>, <&cru SRST_P_CAN0>; + reset-names = "core", "apb"; + }; + }; diff --git a/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml b/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml index 7e405ad96eb2..ea979bcae1d6 100644 --- a/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml +++ b/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml @@ -92,6 +92,10 @@ properties: Built-in switch of the MT7988 SoC const: mediatek,mt7988-switch + - description: + Built-in switch of the Airoha EN7581 SoC + 
const: airoha,en7581-switch + reg: maxItems: 1 @@ -284,7 +288,9 @@ allOf: - if: properties: compatible: - const: mediatek,mt7988-switch + enum: + - mediatek,mt7988-switch + - airoha,en7581-switch then: $ref: "#/$defs/mt7530-dsa-port" properties: diff --git a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml index 52acc15ebcbf..30c0c3e6f37a 100644 --- a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml +++ b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml @@ -22,7 +22,9 @@ properties: - microchip,ksz8794 - microchip,ksz8795 - microchip,ksz8863 + - microchip,ksz8864 # 4-port version of KSZ8895 family switch - microchip,ksz8873 + - microchip,ksz8895 # 5-port version of KSZ8895 family switch - microchip,ksz9477 - microchip,ksz9897 - microchip,ksz9896 @@ -51,6 +53,11 @@ properties: Set if the output SYNCLKO clock should be disabled. Do not mix with microchip,synclko-125. + microchip,pme-active-high: + $ref: /schemas/types.yaml#/definitions/flag + description: + Indicates if the PME pin polarity is active-high. 
+ microchip,io-drive-strength-microamp: description: IO Pad Drive Strength diff --git a/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.yaml b/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.yaml index b99d7a694b70..51cf574249be 100644 --- a/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.yaml +++ b/Documentation/devicetree/bindings/net/dsa/vitesse,vsc73xx.yaml @@ -52,6 +52,25 @@ properties: allOf: - $ref: dsa.yaml#/$defs/ethernet-ports +patternProperties: + "^(ethernet-)?ports$": + additionalProperties: true + patternProperties: + "^(ethernet-)?port@6$": + allOf: + - if: + properties: + phy-mode: + contains: + enum: + - rgmii + then: + properties: + rx-internal-delay-ps: + $ref: "#/$defs/internal-delay-ps" + tx-internal-delay-ps: + $ref: "#/$defs/internal-delay-ps" + # This checks if reg is a chipselect so the device is on an SPI # bus, the if-clause will fail if reg is a tuple such as for a # platform device. @@ -67,6 +86,15 @@ required: - compatible - reg +$defs: + internal-delay-ps: + description: + Disable tunable delay lines using 0 ps, or enable them and select + the phase between 1400 ps and 2000 ps in increments of 300 ps. 
+ default: 2000 + enum: + [0, 1400, 1700, 2000] + unevaluatedProperties: false examples: @@ -108,6 +136,8 @@ examples: reg = <6>; ethernet = <&gmac1>; phy-mode = "rgmii"; + rx-internal-delay-ps = <0>; + tx-internal-delay-ps = <0>; fixed-link { speed = <1000>; full-duplex; @@ -150,6 +180,8 @@ examples: ethernet-port@6 { reg = <6>; ethernet = <&enet0>; + rx-internal-delay-ps = <0>; + tx-internal-delay-ps = <0>; phy-mode = "rgmii"; fixed-link { speed = <1000>; diff --git a/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml b/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml index 42f9843d1868..be8a2163b73e 100644 --- a/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml +++ b/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml @@ -24,20 +24,12 @@ properties: maxItems: 1 description: The DPMAC number - phy-handle: true - - phy-connection-type: true - - phy-mode: true - pcs-handle: maxItems: 1 description: A reference to a node representing a PCS PHY device found on the internal MDIO bus. - managed: true - phys: description: A reference to the SerDes lane(s) maxItems: 1 @@ -45,7 +37,7 @@ properties: required: - reg -additionalProperties: false +unevaluatedProperties: false examples: - | diff --git a/Documentation/devicetree/bindings/net/mdio.yaml b/Documentation/devicetree/bindings/net/mdio.yaml index a266ade918ca..bed3987a8fbf 100644 --- a/Documentation/devicetree/bindings/net/mdio.yaml +++ b/Documentation/devicetree/bindings/net/mdio.yaml @@ -19,7 +19,7 @@ description: properties: $nodename: - pattern: "^mdio(@.*)?" 
+ pattern: '^mdio(-(bus|external))?(@.+|-([0-9]+))?$' "#address-cells": const: 1 diff --git a/Documentation/devicetree/bindings/net/mediatek,net.yaml b/Documentation/devicetree/bindings/net/mediatek,net.yaml index 686b5c2fae40..9e02fd80af83 100644 --- a/Documentation/devicetree/bindings/net/mediatek,net.yaml +++ b/Documentation/devicetree/bindings/net/mediatek,net.yaml @@ -30,8 +30,13 @@ properties: reg: maxItems: 1 - clocks: true - clock-names: true + clocks: + minItems: 2 + maxItems: 24 + + clock-names: + minItems: 2 + maxItems: 24 interrupts: minItems: 1 @@ -127,6 +132,7 @@ allOf: then: properties: interrupts: + minItems: 3 maxItems: 3 clocks: @@ -183,6 +189,7 @@ allOf: then: properties: interrupts: + minItems: 3 maxItems: 3 clocks: @@ -222,6 +229,7 @@ allOf: then: properties: interrupts: + minItems: 3 maxItems: 3 clocks: diff --git a/Documentation/devicetree/bindings/net/microchip,lan8650.yaml b/Documentation/devicetree/bindings/net/microchip,lan8650.yaml new file mode 100644 index 000000000000..61e11d4a07c4 --- /dev/null +++ b/Documentation/devicetree/bindings/net/microchip,lan8650.yaml @@ -0,0 +1,74 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/microchip,lan8650.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Microchip LAN8650/1 10BASE-T1S MACPHY Ethernet Controllers + +maintainers: + - Parthiban Veerasooran <parthiban.veerasooran@microchip.com> + +description: + The LAN8650/1 combines a Media Access Controller (MAC) and an Ethernet + PHY to enable 10BASE‑T1S networks. The Ethernet Media Access Controller + (MAC) module implements a 10 Mbps half duplex Ethernet MAC, compatible + with the IEEE 802.3 standard and a 10BASE-T1S physical layer transceiver + integrated into the LAN8650/1. The communication between the Host and + the MAC-PHY is specified in the OPEN Alliance 10BASE-T1x MACPHY Serial + Interface (TC6). 
+ +allOf: + - $ref: /schemas/net/ethernet-controller.yaml# + - $ref: /schemas/spi/spi-peripheral-props.yaml# + +properties: + compatible: + oneOf: + - const: microchip,lan8650 + - items: + - const: microchip,lan8651 + - const: microchip,lan8650 + + reg: + maxItems: 1 + + interrupts: + description: + Interrupt from MAC-PHY asserted in the event of Receive Chunks + Available, Transmit Chunk Credits Available and Extended Status + Event. + maxItems: 1 + + spi-max-frequency: + minimum: 15000000 + maximum: 25000000 + +required: + - compatible + - reg + - interrupts + - spi-max-frequency + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/gpio/gpio.h> + + spi { + #address-cells = <1>; + #size-cells = <0>; + + ethernet@0 { + compatible = "microchip,lan8651", "microchip,lan8650"; + reg = <0>; + pinctrl-names = "default"; + pinctrl-0 = <&eth0_pins>; + interrupt-parent = <&gpio>; + interrupts = <6 IRQ_TYPE_EDGE_FALLING>; + local-mac-address = [04 05 06 01 02 03]; + spi-max-frequency = <15000000>; + }; + }; diff --git a/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml b/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml index 85bfa45f5122..a754a61adc2d 100644 --- a/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml +++ b/Documentation/devicetree/bindings/net/nxp,tja11xx.yaml @@ -14,8 +14,53 @@ maintainers: description: Bindings for NXP TJA11xx automotive PHYs +properties: + compatible: + enum: + - ethernet-phy-id0180.dc40 + - ethernet-phy-id0180.dc41 + - ethernet-phy-id0180.dc48 + - ethernet-phy-id0180.dd00 + - ethernet-phy-id0180.dd01 + - ethernet-phy-id0180.dd02 + - ethernet-phy-id0180.dc80 + - ethernet-phy-id0180.dc82 + - ethernet-phy-id001b.b010 + - ethernet-phy-id001b.b013 + - ethernet-phy-id001b.b030 + - ethernet-phy-id001b.b031 + allOf: - $ref: ethernet-phy.yaml# + - if: + properties: + compatible: + contains: + enum: + - ethernet-phy-id0180.dc40 + - ethernet-phy-id0180.dc41 + -
ethernet-phy-id0180.dc48 + - ethernet-phy-id0180.dd00 + - ethernet-phy-id0180.dd01 + - ethernet-phy-id0180.dd02 + + then: + properties: + nxp,rmii-refclk-in: + type: boolean + description: | + The REF_CLK is provided for both transmitted and received data + in RMII mode. This clock signal is provided by the PHY and is + typically derived from an external 25MHz crystal. Alternatively, + a 50MHz clock signal generated by an external oscillator can be + connected to pin REF_CLK. A third option is to connect a 25MHz + clock to pin CLK_IN_OUT. So, the REF_CLK should be configured + as input or output according to the actual circuit connection. + If present, indicates that the REF_CLK will be configured as + interface reference clock input when RMII mode enabled. + If not present, the REF_CLK will be configured as interface + reference clock output when RMII mode enabled. + Only supported on TJA1100 and TJA1101. patternProperties: "^ethernet-phy@[0-9a-f]+$": @@ -32,22 +77,6 @@ patternProperties: description: The ID number for the child PHY. Should be +1 of parent PHY. - nxp,rmii-refclk-in: - type: boolean - description: | - The REF_CLK is provided for both transmitted and received data - in RMII mode. This clock signal is provided by the PHY and is - typically derived from an external 25MHz crystal. Alternatively, - a 50MHz clock signal generated by an external oscillator can be - connected to pin REF_CLK. A third option is to connect a 25MHz - clock to pin CLK_IN_OUT. So, the REF_CLK should be configured - as input or output according to the actual circuit connection. - If present, indicates that the REF_CLK will be configured as - interface reference clock input when RMII mode enabled. - If not present, the REF_CLK will be configured as interface - reference clock output when RMII mode enabled. - Only supported on TJA1100 and TJA1101. 
- required: - reg @@ -60,6 +89,7 @@ examples: #size-cells = <0>; tja1101_phy0: ethernet-phy@4 { + compatible = "ethernet-phy-id0180.dc40"; reg = <0x4>; nxp,rmii-refclk-in; }; diff --git a/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml b/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml index 6992d56832bf..d08abcb01211 100644 --- a/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml +++ b/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml @@ -23,6 +23,9 @@ properties: '#pse-cells': const: 1 + reset-gpios: + maxItems: 1 + channels: description: each set of 8 ports can be assigned to one physical channels or two for PoE4. This parameter describes the configuration diff --git a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml index 21a92f179093..1e00ef5b3acd 100644 --- a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml +++ b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml @@ -62,15 +62,27 @@ properties: - renesas,r9a08g045-gbeth # RZ/G3S - const: renesas,rzg2l-gbeth # RZ/{G2L,G2UL,V2L} family - reg: true + reg: + minItems: 1 + items: + - description: MAC register block + - description: Stream buffer - interrupts: true + interrupts: + minItems: 1 + maxItems: 29 - interrupt-names: true + interrupt-names: + minItems: 1 + maxItems: 29 - clocks: true + clocks: + minItems: 1 + maxItems: 3 - clock-names: true + clock-names: + minItems: 1 + maxItems: 3 iommus: maxItems: 1 @@ -150,14 +162,11 @@ allOf: then: properties: reg: - items: - - description: MAC register block - - description: Stream buffer + minItems: 2 else: properties: reg: - items: - - description: MAC register block + maxItems: 1 - if: properties: diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml index 6bbe96e35250..f8a576611d6c 100644 --- 
a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml @@ -25,6 +25,7 @@ select: - rockchip,rk3368-gmac - rockchip,rk3399-gmac - rockchip,rk3568-gmac + - rockchip,rk3576-gmac - rockchip,rk3588-gmac - rockchip,rv1108-gmac - rockchip,rv1126-gmac @@ -52,6 +53,7 @@ properties: - items: - enum: - rockchip,rk3568-gmac + - rockchip,rk3576-gmac - rockchip,rk3588-gmac - rockchip,rv1126-gmac - const: snps,dwmac-4.20a diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml index 3eb65e63fdae..4e2ba1bf788c 100644 --- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml +++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml @@ -80,6 +80,7 @@ properties: - rockchip,rk3328-gmac - rockchip,rk3366-gmac - rockchip,rk3368-gmac + - rockchip,rk3576-gmac - rockchip,rk3588-gmac - rockchip,rk3399-gmac - rockchip,rv1108-gmac diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml index b0ebcef6801c..4eb63b303cff 100644 --- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml +++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml @@ -41,13 +41,17 @@ properties: minItems: 1 maxItems: 4 - clock-names: true + clock-names: + minItems: 1 + maxItems: 4 resets: minItems: 1 maxItems: 2 - reset-names: true + reset-names: + minItems: 1 + maxItems: 2 socionext,syscon-phy-mode: $ref: /schemas/types.yaml#/definitions/phandle-array diff --git a/Documentation/devicetree/bindings/net/wireless/marvell,sd8787.yaml b/Documentation/devicetree/bindings/net/wireless/marvell,sd8787.yaml new file mode 100644 index 000000000000..1715b22e0dcf --- /dev/null +++ b/Documentation/devicetree/bindings/net/wireless/marvell,sd8787.yaml @@ -0,0 +1,93 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: 
http://devicetree.org/schemas/net/wireless/marvell,sd8787.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Marvell 8787/8897/8978/8997 (sd8787/sd8897/sd8978/sd8997/pcie8997) SDIO/PCIE devices + +maintainers: + - Brian Norris <briannorris@chromium.org> + - Frank Li <Frank.Li@nxp.com> + +description: + This node provides properties for describing the Marvell SDIO/PCIE wireless device. + The node is expected to be specified as a child node to the SDIO/PCIE controller that + connects the device to the system. + +properties: + compatible: + enum: + - marvell,sd8787 + - marvell,sd8897 + - marvell,sd8978 + - marvell,sd8997 + - nxp,iw416 + - pci11ab,2b42 + - pci1b4b,2b42 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + wakeup-source: true + + marvell,caldata-txpwrlimit-2g: + $ref: /schemas/types.yaml#/definitions/uint8-array + description: Calibration data for the 2GHz band. + maxItems: 566 + + marvell,caldata-txpwrlimit-5g-sub0: + $ref: /schemas/types.yaml#/definitions/uint8-array + description: Calibration data for sub-band 0 in the 5GHz band. + maxItems: 502 + + marvell,caldata-txpwrlimit-5g-sub1: + $ref: /schemas/types.yaml#/definitions/uint8-array + description: Calibration data for sub-band 1 in the 5GHz band. + maxItems: 688 + + marvell,caldata-txpwrlimit-5g-sub2: + $ref: /schemas/types.yaml#/definitions/uint8-array + description: Calibration data for sub-band 2 in the 5GHz band. + maxItems: 750 + + marvell,caldata-txpwrlimit-5g-sub3: + $ref: /schemas/types.yaml#/definitions/uint8-array + description: Calibration data for sub-band 3 in the 5GHz band. + maxItems: 502 + + marvell,wakeup-pin: + $ref: /schemas/types.yaml#/definitions/uint32 + description: + Provides the pin number for the wakeup pin from the device's point of + view. The wakeup pin is used for the device to wake the host system + from sleep. 
This property is only necessary if the wakeup pin is + wired in a non-standard way, such that the default pin assignments + are invalid. + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + + mmc { + #address-cells = <1>; + #size-cells = <0>; + + wifi@1 { + compatible = "marvell,sd8897"; + reg = <1>; + interrupt-parent = <&pio>; + interrupts = <38 IRQ_TYPE_LEVEL_LOW>; + marvell,wakeup-pin = <3>; + }; + }; + diff --git a/Documentation/devicetree/bindings/net/wireless/marvell-8xxx.txt b/Documentation/devicetree/bindings/net/wireless/marvell-8xxx.txt deleted file mode 100644 index cdc303caf5f4..000000000000 --- a/Documentation/devicetree/bindings/net/wireless/marvell-8xxx.txt +++ /dev/null @@ -1,70 +0,0 @@ -Marvell 8787/8897/8978/8997 (sd8787/sd8897/sd8978/sd8997/pcie8997) SDIO/PCIE devices ------- - -This node provides properties for controlling the Marvell SDIO/PCIE wireless device. -The node is expected to be specified as a child node to the SDIO/PCIE controller that -connects the device to the system. - -Required properties: - - - compatible : should be one of the following: - * "marvell,sd8787" - * "marvell,sd8897" - * "marvell,sd8978" - * "marvell,sd8997" - * "nxp,iw416" - * "pci11ab,2b42" - * "pci1b4b,2b42" - -Optional properties: - - - marvell,caldata* : A series of properties with marvell,caldata prefix, - represent calibration data downloaded to the device during - initialization. This is an array of unsigned 8-bit values. - the properties should follow below property name and - corresponding array length: - "marvell,caldata-txpwrlimit-2g" (length = 566). - "marvell,caldata-txpwrlimit-5g-sub0" (length = 502). - "marvell,caldata-txpwrlimit-5g-sub1" (length = 688). - "marvell,caldata-txpwrlimit-5g-sub2" (length = 750). - "marvell,caldata-txpwrlimit-5g-sub3" (length = 502). - - marvell,wakeup-pin : a wakeup pin number of wifi chip which will be configured - to firmware. 
Firmware will wakeup the host using this pin - during suspend/resume. - - interrupts : interrupt pin number to the cpu. driver will request an irq based on - this interrupt number. during system suspend, the irq will be enabled - so that the wifi chip can wakeup host platform under certain condition. - during system resume, the irq will be disabled to make sure - unnecessary interrupt is not received. - - vmmc-supply: a phandle of a regulator, supplying VCC to the card - - mmc-pwrseq: phandle to the MMC power sequence node. See "mmc-pwrseq-*" - for documentation of MMC power sequence bindings. - -Example: - -Tx power limit calibration data is configured in below example. -The calibration data is an array of unsigned values, the length -can vary between hw versions. -IRQ pin 38 is used as system wakeup source interrupt. wakeup pin 3 is configured -so that firmware can wakeup host using this device side pin. - -&mmc3 { - vmmc-supply = <&wlan_en_reg>; - mmc-pwrseq = <&wifi_pwrseq>; - bus-width = <4>; - cap-power-off-card; - keep-power-in-suspend; - - #address-cells = <1>; - #size-cells = <0>; - mwifiex: wifi@1 { - compatible = "marvell,sd8897"; - reg = <1>; - interrupt-parent = <&pio>; - interrupts = <38 IRQ_TYPE_LEVEL_LOW>; - - marvell,caldata_00_txpwrlimit_2g_cfg_set = /bits/ 8 < - 0x01 0x00 0x06 0x00 0x08 0x02 0x89 0x01>; - marvell,wakeup-pin = <3>; - }; -}; diff --git a/Documentation/devicetree/bindings/opp/operating-points-v2-ti-cpu.yaml b/Documentation/devicetree/bindings/opp/operating-points-v2-ti-cpu.yaml index 02d1d2c17129..fd0c8d5c5f3e 100644 --- a/Documentation/devicetree/bindings/opp/operating-points-v2-ti-cpu.yaml +++ b/Documentation/devicetree/bindings/opp/operating-points-v2-ti-cpu.yaml @@ -19,7 +19,7 @@ description: the hardware description for the scheme mentioned above. 
maintainers: - - Nishanth Menon <nm@ti.com> + - Dhruva Gole <d-gole@ti.com> allOf: - $ref: opp-v2-base.yaml# diff --git a/Documentation/devicetree/bindings/perf/arm,cmn.yaml b/Documentation/devicetree/bindings/perf/arm,cmn.yaml index 2e51072e794a..0e9d665584e6 100644 --- a/Documentation/devicetree/bindings/perf/arm,cmn.yaml +++ b/Documentation/devicetree/bindings/perf/arm,cmn.yaml @@ -16,6 +16,7 @@ properties: - arm,cmn-600 - arm,cmn-650 - arm,cmn-700 + - arm,cmn-s3 - arm,ci-700 reg: diff --git a/Documentation/devicetree/bindings/perf/arm,ni.yaml b/Documentation/devicetree/bindings/perf/arm,ni.yaml new file mode 100644 index 000000000000..d66fffa256d5 --- /dev/null +++ b/Documentation/devicetree/bindings/perf/arm,ni.yaml @@ -0,0 +1,30 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/perf/arm,ni.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Arm NI (Network-on-Chip Interconnect) Performance Monitors + +maintainers: + - Robin Murphy <robin.murphy@arm.com> + +properties: + compatible: + const: arm,ni-700 + + reg: + items: + - description: Complete configuration register space + + interrupts: + minItems: 1 + maxItems: 32 + description: Overflow interrupts, one per clock domain, in order of domain ID + +required: + - compatible + - reg + - interrupts + +additionalProperties: false diff --git a/Documentation/devicetree/bindings/ptp/fsl,ptp.yaml b/Documentation/devicetree/bindings/ptp/fsl,ptp.yaml index 3bb8615e3e91..42ca895f3c4e 100644 --- a/Documentation/devicetree/bindings/ptp/fsl,ptp.yaml +++ b/Documentation/devicetree/bindings/ptp/fsl,ptp.yaml @@ -11,11 +11,14 @@ maintainers: properties: compatible: - enum: - - fsl,etsec-ptp - - fsl,fman-ptp-timer - - fsl,dpaa2-ptp - - fsl,enetc-ptp + oneOf: + - enum: + - fsl,etsec-ptp + - fsl,fman-ptp-timer + - fsl,dpaa2-ptp + - items: + - const: pci1957,ee02 + - const: fsl,enetc-ptp reg: maxItems: 1 @@ -123,6 +126,15 @@ required: - compatible - 
reg +allOf: + - if: + properties: + compatible: + contains: + const: fsl,enetc-ptp + then: + $ref: /schemas/pci/pci-device.yaml + additionalProperties: false examples: diff --git a/Documentation/devicetree/bindings/rng/rockchip,rk3568-rng.yaml b/Documentation/devicetree/bindings/rng/rockchip,rk3568-rng.yaml new file mode 100644 index 000000000000..e0595814a6d9 --- /dev/null +++ b/Documentation/devicetree/bindings/rng/rockchip,rk3568-rng.yaml @@ -0,0 +1,61 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/rng/rockchip,rk3568-rng.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Rockchip RK3568 TRNG + +description: True Random Number Generator on Rockchip RK3568 SoC + +maintainers: + - Aurelien Jarno <aurelien@aurel32.net> + - Daniel Golle <daniel@makrotopia.org> + +properties: + compatible: + enum: + - rockchip,rk3568-rng + + reg: + maxItems: 1 + + clocks: + items: + - description: TRNG clock + - description: TRNG AHB clock + + clock-names: + items: + - const: core + - const: ahb + + resets: + maxItems: 1 + +required: + - compatible + - reg + - clocks + - clock-names + - resets + +additionalProperties: false + +examples: + - | + #include <dt-bindings/clock/rk3568-cru.h> + bus { + #address-cells = <2>; + #size-cells = <2>; + + rng@fe388000 { + compatible = "rockchip,rk3568-rng"; + reg = <0x0 0xfe388000 0x0 0x4000>; + clocks = <&cru CLK_TRNG_NS>, <&cru HCLK_TRNG_NS>; + clock-names = "core", "ahb"; + resets = <&cru SRST_TRNG_NS>; + }; + }; + +... 
diff --git a/Documentation/devicetree/bindings/soc/rockchip/grf.yaml b/Documentation/devicetree/bindings/soc/rockchip/grf.yaml index 78c6d5b64138..35b20e53b513 100644 --- a/Documentation/devicetree/bindings/soc/rockchip/grf.yaml +++ b/Documentation/devicetree/bindings/soc/rockchip/grf.yaml @@ -31,11 +31,17 @@ properties: - rockchip,rk3588-pcie3-pipe-grf - rockchip,rk3588-usb-grf - rockchip,rk3588-usbdpphy-grf - - rockchip,rk3588-vo-grf + - rockchip,rk3588-vo0-grf + - rockchip,rk3588-vo1-grf - rockchip,rk3588-vop-grf - rockchip,rv1108-usbgrf - const: syscon - items: + - const: rockchip,rk3588-vo-grf + - const: syscon + deprecated: true + description: Use rockchip,rk3588-vo{0,1}-grf instead. + - items: - enum: - rockchip,px30-grf - rockchip,px30-pmugrf @@ -262,6 +268,8 @@ allOf: contains: enum: - rockchip,rk3588-vo-grf + - rockchip,rk3588-vo0-grf + - rockchip,rk3588-vo1-grf then: required: diff --git a/Documentation/devicetree/bindings/soc/ti/ti,pruss.yaml b/Documentation/devicetree/bindings/soc/ti/ti,pruss.yaml index c402cb2928e8..3cb1471cc6b6 100644 --- a/Documentation/devicetree/bindings/soc/ti/ti,pruss.yaml +++ b/Documentation/devicetree/bindings/soc/ti/ti,pruss.yaml @@ -278,6 +278,26 @@ patternProperties: additionalProperties: false + ^pa-stats@[a-f0-9]+$: + description: | + PA-STATS sub-module represented as a SysCon. PA_STATS is a set of + registers where different statistics related to ICSSG, are dumped by + ICSSG firmware. This syscon sub-module will help the device to + access/read/write those statistics. + + type: object + + additionalProperties: false + + properties: + compatible: + items: + - const: ti,pruss-pa-st + - const: syscon + + reg: + maxItems: 1 + interrupt-controller@[a-f0-9]+$: description: | PRUSS INTC Node. 
Each PRUSS has a single interrupt controller instance diff --git a/Documentation/devicetree/bindings/thermal/amlogic,thermal.yaml b/Documentation/devicetree/bindings/thermal/amlogic,thermal.yaml index 725303e1a364..70b273271754 100644 --- a/Documentation/devicetree/bindings/thermal/amlogic,thermal.yaml +++ b/Documentation/devicetree/bindings/thermal/amlogic,thermal.yaml @@ -32,6 +32,9 @@ properties: clocks: maxItems: 1 + power-domains: + maxItems: 1 + amlogic,ao-secure: description: phandle to the ao-secure syscon $ref: /schemas/types.yaml#/definitions/phandle diff --git a/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml b/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml index 72048c5a0412..d45690d6a465 100644 --- a/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml +++ b/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml @@ -51,6 +51,7 @@ properties: - qcom,msm8996-tsens - qcom,msm8998-tsens - qcom,qcm2290-tsens + - qcom,sa8255p-tsens - qcom,sa8775p-tsens - qcom,sc7180-tsens - qcom,sc7280-tsens diff --git a/Documentation/driver-api/dpll.rst b/Documentation/driver-api/dpll.rst index ea8d16600e16..e6855cd37e85 100644 --- a/Documentation/driver-api/dpll.rst +++ b/Documentation/driver-api/dpll.rst @@ -214,6 +214,27 @@ offset values are fractional with 3-digit decimal places and shell be divided with ``DPLL_PIN_PHASE_OFFSET_DIVIDER`` to get integer part and modulo divided to get fractional part. +Embedded SYNC +============= + +Device may provide ability to use Embedded SYNC feature. It allows +to embed additional SYNC signal into the base frequency of a pin - a one +special pulse of base frequency signal every time SYNC signal pulse +happens. The user can configure the frequency of Embedded SYNC. +The Embedded SYNC capability is always related to a given base frequency +and HW capabilities. The user is provided a range of Embedded SYNC +frequencies supported, depending on current base frequency configured for +the pin. 
+ + ========================================= ================================= + ``DPLL_A_PIN_ESYNC_FREQUENCY`` current Embedded SYNC frequency + ``DPLL_A_PIN_ESYNC_FREQUENCY_SUPPORTED`` nest available Embedded SYNC + frequency ranges + ``DPLL_A_PIN_FREQUENCY_MIN`` attr minimum value of frequency + ``DPLL_A_PIN_FREQUENCY_MAX`` attr maximum value of frequency + ``DPLL_A_PIN_ESYNC_PULSE`` pulse type of Embedded SYNC + ========================================= ================================= + Configuration commands group ============================ diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst index 978198f8a18b..c803b89b7248 100644 --- a/Documentation/driver-api/thermal/sysfs-api.rst +++ b/Documentation/driver-api/thermal/sysfs-api.rst @@ -58,10 +58,9 @@ temperature) and throttle appropriate devices. ops: thermal zone device call-backs. - .bind: - bind the thermal zone device with a thermal cooling device. - .unbind: - unbind the thermal zone device with a thermal cooling device. + .should_bind: + check whether or not a given cooling device should be bound to + a given trip point in this thermal zone. .get_temp: get the current temperature of the thermal zone. .set_trips: @@ -246,56 +245,6 @@ temperature) and throttle appropriate devices. It deletes the corresponding entry from /sys/class/thermal folder and unbinds itself from all the thermal zone devices using it. -1.3 interface for binding a thermal zone device with a thermal cooling device ------------------------------------------------------------------------------ - - :: - - int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, - int trip, struct thermal_cooling_device *cdev, - unsigned long upper, unsigned long lower, unsigned int weight); - - This interface function binds a thermal cooling device to a particular trip - point of a thermal zone device. - - This function is usually called in the thermal zone device .bind callback. 
- - tz: - the thermal zone device - cdev: - thermal cooling device - trip: - indicates which trip point in this thermal zone the cooling device - is associated with. - upper: - the Maximum cooling state for this trip point. - THERMAL_NO_LIMIT means no upper limit, - and the cooling device can be in max_state. - lower: - the Minimum cooling state can be used for this trip point. - THERMAL_NO_LIMIT means no lower limit, - and the cooling device can be in cooling state 0. - weight: - the influence of this cooling device in this thermal - zone. See 1.4.1 below for more information. - - :: - - int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz, - int trip, struct thermal_cooling_device *cdev); - - This interface function unbinds a thermal cooling device from a particular - trip point of a thermal zone device. This function is usually called in - the thermal zone device .unbind callback. - - tz: - the thermal zone device - cdev: - thermal cooling device - trip: - indicates which trip point in this thermal zone the cooling device - is associated with. - 1.4 Thermal Zone Parameters --------------------------- @@ -366,8 +315,6 @@ Thermal cooling device sys I/F, created once it's registered:: Then next two dynamic attributes are created/removed in pairs. They represent the relationship between a thermal zone and its associated cooling device. -They are created/removed for each successful execution of -thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device. :: @@ -459,14 +406,7 @@ are supposed to implement the callback. If they don't, the thermal framework calculated the trend by comparing the previous and the current temperature values. -4.2. get_thermal_instance -------------------------- - -This function returns the thermal_instance corresponding to a given -{thermal_zone, cooling_device, trip_point} combination. Returns NULL -if such an instance does not exist. - -4.3. thermal_cdev_update +4.2. 
thermal_cdev_update ------------------------ This function serves as an arbitrator to set the state of a cooling diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst index 13e4b18e5dbb..0e2fac7a16da 100644 --- a/Documentation/filesystems/fsverity.rst +++ b/Documentation/filesystems/fsverity.rst @@ -86,6 +86,16 @@ authenticating fs-verity file hashes include: signature in their "security.ima" extended attribute, as controlled by the IMA policy. For more information, see the IMA documentation. +- Integrity Policy Enforcement (IPE). IPE supports enforcing access + control decisions based on immutable security properties of files, + including those protected by fs-verity's built-in signatures. + "IPE policy" specifically allows for the authorization of fs-verity + files using properties ``fsverity_digest`` for identifying + files by their verity digest, and ``fsverity_signature`` to authorize + files with a verified fs-verity's built-in signature. For + details on configuring IPE policies and understanding its operational + modes, please refer to :doc:`IPE admin guide </admin-guide/LSM/ipe>`. + - Trusted userspace code in combination with `Built-in signature verification`_. This approach should be used only with great care. @@ -457,7 +467,11 @@ Enabling this option adds the following: On success, the ioctl persists the signature alongside the Merkle tree. Then, any time the file is opened, the kernel verifies the file's actual digest against this signature, using the certificates - in the ".fs-verity" keyring. + in the ".fs-verity" keyring. This verification happens as long as the + file's signature exists, regardless of the state of the sysctl variable + "fs.verity.require_signatures" described in the next item. The IPE LSM + relies on this behavior to recognize and label fsverity files + that contain a verified built-in fsverity signature. 3. A new sysctl "fs.verity.require_signatures" is made available. 
When set to 1, the kernel requires that all verity files have a @@ -481,7 +495,7 @@ be carefully considered before using them: - Builtin signature verification does *not* make the kernel enforce that any files actually have fs-verity enabled. Thus, it is not a - complete authentication policy. Currently, if it is used, the only + complete authentication policy. Currently, if it is used, one way to complete the authentication policy is for trusted userspace code to explicitly check whether files have fs-verity enabled with a signature before they are accessed. (With @@ -490,6 +504,15 @@ be carefully considered before using them: could just store the signature alongside the file and verify it itself using a cryptographic library, instead of using this feature. +- Another approach is to utilize fs-verity builtin signature + verification in conjunction with the IPE LSM, which supports defining + a kernel-enforced, system-wide authentication policy that allows only + files with a verified fs-verity builtin signature to perform certain + operations, such as execution. Note that IPE doesn't require + fs.verity.require_signatures=1. + Please refer to :doc:`IPE admin guide </admin-guide/LSM/ipe>` for + more details. + - A file's builtin signature can only be set at the same time that fs-verity is being enabled on the file. Changing or deleting the builtin signature later requires re-creating the file. diff --git a/Documentation/filesystems/idmappings.rst b/Documentation/filesystems/idmappings.rst index ac0af679e61e..77930c77fcfe 100644 --- a/Documentation/filesystems/idmappings.rst +++ b/Documentation/filesystems/idmappings.rst @@ -821,7 +821,7 @@ the same idmapping to the mount. We now perform three steps: /* Map the userspace id down into a kernel id in the filesystem's idmapping. */ make_kuid(u0:k20000:r10000, u1000) = k21000 -2. Verify that the caller's kernel ids can be mapped to userspace ids in the +3. 
Verify that the caller's kernel ids can be mapped to userspace ids in the filesystem's idmapping:: from_kuid(u0:k20000:r10000, k21000) = u1000 @@ -854,10 +854,10 @@ The same translation algorithm works with the third example. /* Map the userspace id down into a kernel id in the filesystem's idmapping. */ make_kuid(u0:k0:r4294967295, u1000) = k1000 -2. Verify that the caller's kernel ids can be mapped to userspace ids in the +3. Verify that the caller's kernel ids can be mapped to userspace ids in the filesystem's idmapping:: - from_kuid(u0:k0:r4294967295, k21000) = u1000 + from_kuid(u0:k0:r4294967295, k1000) = u1000 So the ownership that lands on disk will be ``u1000``. @@ -994,7 +994,7 @@ from above::: /* Map the userspace id down into a kernel id in the filesystem's idmapping. */ make_kuid(u0:k0:r4294967295, u1000) = k1000 -2. Verify that the caller's filesystem ids can be mapped to userspace ids in the +3. Verify that the caller's filesystem ids can be mapped to userspace ids in the filesystem's idmapping:: from_kuid(u0:k0:r4294967295, k1000) = u1000 diff --git a/Documentation/filesystems/iomap/design.rst b/Documentation/filesystems/iomap/design.rst index f8ee3427bc1a..37594e1c5914 100644 --- a/Documentation/filesystems/iomap/design.rst +++ b/Documentation/filesystems/iomap/design.rst @@ -142,9 +142,9 @@ Definitions * **pure overwrite**: A write operation that does not require any metadata or zeroing operations to perform during either submission or completion. - This implies that the fileystem must have already allocated space + This implies that the filesystem must have already allocated space on disk as ``IOMAP_MAPPED`` and the filesystem must not place any - constaints on IO alignment or size. + constraints on IO alignment or size. The only constraints on I/O alignment are device level (minimum I/O size and alignment, typically sector size). 
@@ -394,7 +394,7 @@ iomap is concerned: * The **upper** level primitive is provided by the filesystem to coordinate access to different iomap operations. - The exact primitive is specifc to the filesystem and operation, + The exact primitive is specific to the filesystem and operation, but is often a VFS inode, pagecache invalidation, or folio lock. For example, a filesystem might take ``i_rwsem`` before calling ``iomap_file_buffered_write`` and ``iomap_file_unshare`` to prevent diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index e664061ed55d..f5e3676db954 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -251,10 +251,10 @@ prototypes:: void (*readahead)(struct readahead_control *); int (*write_begin)(struct file *, struct address_space *mapping, loff_t pos, unsigned len, - struct page **pagep, void **fsdata); + struct folio **foliop, void **fsdata); int (*write_end)(struct file *, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, - struct page *page, void *fsdata); + struct folio *folio, void *fsdata); sector_t (*bmap)(struct address_space *, sector_t); void (*invalidate_folio) (struct folio *, size_t start, size_t len); bool (*release_folio)(struct folio *, gfp_t); @@ -280,7 +280,7 @@ read_folio: yes, unlocks shared writepages: dirty_folio: maybe readahead: yes, unlocks shared -write_begin: locks the page exclusive +write_begin: locks the folio exclusive write_end: yes, unlocks exclusive bmap: invalidate_folio: yes exclusive diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst index 4cc657d743f7..f0d2cb257bb8 100644 --- a/Documentation/filesystems/netfs_library.rst +++ b/Documentation/filesystems/netfs_library.rst @@ -116,7 +116,7 @@ The following services are provided: * Handle local caching, allowing cached data and server-read data to be interleaved for a single request. 
- * Handle clearing of bufferage that aren't on the server. + * Handle clearing of bufferage that isn't on the server. * Handle retrying of reads that failed, switching reads from the cache to the server as necessary. diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 6e903a903f8f..4f67b5ea0568 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -810,7 +810,7 @@ cache in your filesystem. The following members are defined: struct page **pagep, void **fsdata); int (*write_end)(struct file *, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, - struct page *page, void *fsdata); + struct folio *folio, void *fsdata); sector_t (*bmap)(struct address_space *, sector_t); void (*invalidate_folio) (struct folio *, size_t start, size_t len); bool (*release_folio)(struct folio *, gfp_t); @@ -926,12 +926,12 @@ cache in your filesystem. The following members are defined: (if they haven't been read already) so that the updated blocks can be written out properly. - The filesystem must return the locked pagecache page for the - specified offset, in ``*pagep``, for the caller to write into. + The filesystem must return the locked pagecache folio for the + specified offset, in ``*foliop``, for the caller to write into. It must be able to cope with short writes (where the length passed to write_begin is greater than the number of bytes copied - into the page). + into the folio). A void * may be returned in fsdata, which then gets passed into write_end. @@ -944,8 +944,8 @@ cache in your filesystem. The following members are defined: called. len is the original len passed to write_begin, and copied is the amount that was able to be copied. - The filesystem must take care of unlocking the page and - releasing it refcount, and updating i_size. + The filesystem must take care of unlocking the folio, + decrementing its refcount, and updating i_size. 
Returns < 0 on failure, otherwise the number of bytes (<= 'copied') that were able to be copied into pagecache. diff --git a/Documentation/netlink/specs/dpll.yaml b/Documentation/netlink/specs/dpll.yaml index 94132d30e0e0..f2894ca35de8 100644 --- a/Documentation/netlink/specs/dpll.yaml +++ b/Documentation/netlink/specs/dpll.yaml @@ -345,6 +345,26 @@ attribute-sets: Value is in PPM (parts per million). This may be implemented for example for pin of type PIN_TYPE_SYNCE_ETH_PORT. + - + name: esync-frequency + type: u64 + doc: | + Frequency of Embedded SYNC signal. If provided, the pin is configured + with a SYNC signal embedded into its base clock frequency. + - + name: esync-frequency-supported + type: nest + multi-attr: true + nested-attributes: frequency-range + doc: | + If provided, the pin is capable of embedding a SYNC signal (within the given + range) into its base frequency signal. + - + name: esync-pulse + type: u32 + doc: | + A ratio of high to low state of a SYNC signal pulse embedded + into base clock frequency. Value is in percent.
- name: pin-parent-device subset-of: pin @@ -510,6 +530,9 @@ operations: - phase-adjust-max - phase-adjust - fractional-frequency-offset + - esync-frequency + - esync-frequency-supported + - esync-pulse dump: request: @@ -536,6 +559,7 @@ operations: - parent-device - parent-pin - phase-adjust + - esync-frequency - name: pin-create-ntf doc: Notification about pin appearing diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml index ea21fe135b97..6a050d755b9c 100644 --- a/Documentation/netlink/specs/ethtool.yaml +++ b/Documentation/netlink/specs/ethtool.yaml @@ -39,6 +39,11 @@ definitions: - ovld-detected - power-not-available - short-detected + - + name: phy-upstream-type + enum-name: + type: enum + entries: [ mac, phy ] attribute-sets: - @@ -54,6 +59,9 @@ attribute-sets: name: flags type: u32 enum: header-flags + - + name: phy-index + type: u32 - name: bitset-bit @@ -659,6 +667,9 @@ attribute-sets: - name: code type: u8 + - + name: src + type: u32 - name: cable-fault-length attributes: @@ -668,6 +679,9 @@ attribute-sets: - name: cm type: u32 + - + name: src + type: u32 - name: cable-nest attributes: @@ -1022,12 +1036,16 @@ attribute-sets: - name: indir type: binary + sub-type: u32 - name: hkey type: binary - name: input_xfrm type: u32 + - + name: start-context + type: u32 - name: plca attributes: @@ -1085,6 +1103,35 @@ attribute-sets: - name: total type: uint + - + name: phy + attributes: + - + name: header + type: nest + nested-attributes: header + - + name: index + type: u32 + - + name: drvname + type: string + - + name: name + type: string + - + name: upstream-type + type: u32 + enum: phy-upstream-type + - + name: upstream-index + type: u32 + - + name: upstream-sfp-name + type: string + - + name: downstream-sfp-name + type: string operations: enum-model: directional @@ -1749,12 +1796,12 @@ operations: attribute-set: rss - do: &rss-get-op + do: request: attributes: - header - context - reply: + reply: &rss-reply 
attributes: - header - context @@ -1762,6 +1809,12 @@ operations: - indir - hkey - input_xfrm + dump: + request: + attributes: + - header + - start-context + reply: *rss-reply - name: plca-get-cfg doc: Get PLCA params. @@ -1877,3 +1930,24 @@ operations: - status-msg - done - total + - + name: phy-get + doc: Get PHY devices attached to an interface + + attribute-set: phy + + do: &phy-get-op + request: + attributes: + - header + reply: + attributes: + - header + - index + - drvname + - name + - upstream-type + - upstream-index + - upstream-sfp-name + - downstream-sfp-name + dump: *phy-get-op diff --git a/Documentation/netlink/specs/mptcp_pm.yaml b/Documentation/netlink/specs/mptcp_pm.yaml index af525ed29792..30d8342cacc8 100644 --- a/Documentation/netlink/specs/mptcp_pm.yaml +++ b/Documentation/netlink/specs/mptcp_pm.yaml @@ -109,7 +109,6 @@ attribute-sets: - name: port type: u16 - byte-order: big-endian - name: flags type: u32 diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml index 959755be4d7f..08412c279297 100644 --- a/Documentation/netlink/specs/netdev.yaml +++ b/Documentation/netlink/specs/netdev.yaml @@ -167,6 +167,10 @@ attribute-sets: "re-attached", they are just waiting to disappear. Attribute is absent if Page Pool has not been detached, and can still be used to allocate new memory. + - + name: dmabuf + doc: ID of the dmabuf this page-pool is attached to. + type: u32 - name: page-pool-info subset-of: page-pool @@ -268,6 +272,10 @@ attribute-sets: name: napi-id doc: ID of the NAPI instance which services this queue. type: u32 + - + name: dmabuf + doc: ID of the dmabuf attached to this queue, if any. + type: u32 - name: qstats @@ -457,6 +465,39 @@ attribute-sets: Number of times driver re-started accepting send requests to this queue from the stack. 
type: uint + - + name: queue-id + subset-of: queue + attributes: + - + name: id + - + name: type + - + name: dmabuf + attributes: + - + name: ifindex + doc: netdev ifindex to bind the dmabuf to. + type: u32 + checks: + min: 1 + - + name: queues + doc: receive queues to bind the dmabuf to. + type: nest + nested-attributes: queue-id + multi-attr: true + - + name: fd + doc: dmabuf file descriptor to bind. + type: u32 + - + name: id + doc: id of the dmabuf binding + type: u32 + checks: + min: 1 operations: list: @@ -510,6 +551,7 @@ operations: - inflight - inflight-mem - detach-time + - dmabuf dump: reply: *pp-reply config-cond: page-pool @@ -574,6 +616,7 @@ operations: - type - napi-id - ifindex + - dmabuf dump: request: attributes: @@ -619,6 +662,24 @@ operations: - rx-bytes - tx-packets - tx-bytes + - + name: bind-rx + doc: Bind dmabuf to netdev + attribute-set: dmabuf + flags: [ admin-perm ] + do: + request: + attributes: + - ifindex + - fd + - queues + reply: + attributes: + - id + +kernel-family: + headers: [ "linux/list.h"] + sock-priv: struct list_head mcast-groups: list: diff --git a/Documentation/netlink/specs/nftables.yaml b/Documentation/netlink/specs/nftables.yaml index dff2a18f3d90..bd938bd01b6b 100644 --- a/Documentation/netlink/specs/nftables.yaml +++ b/Documentation/netlink/specs/nftables.yaml @@ -63,6 +63,13 @@ definitions: - sdifname - bri-broute - + name: bitwise-ops + type: enum + entries: + - bool + - lshift + - rshift + - name: cmp-ops type: enum entries: @@ -125,6 +132,99 @@ definitions: - object - concat - expr + - + name: lookup-flags + type: flags + entries: + - invert + - + name: ct-keys + type: enum + entries: + - state + - direction + - status + - mark + - secmark + - expiration + - helper + - l3protocol + - src + - dst + - protocol + - proto-src + - proto-dst + - labels + - pkts + - bytes + - avgpkt + - zone + - eventmask + - src-ip + - dst-ip + - src-ip6 + - dst-ip6 + - ct-id + - + name: ct-direction + type: enum + entries: + - original 
+ - reply + - + name: quota-flags + type: flags + entries: + - invert + - depleted + - + name: verdict-code + type: enum + entries: + - name: continue + value: 0xffffffff + - name: break + value: 0xfffffffe + - name: jump + value: 0xfffffffd + - name: goto + value: 0xfffffffc + - name: return + value: 0xfffffffb + - name: drop + value: 0 + - name: accept + value: 1 + - name: stolen + value: 2 + - name: queue + value: 3 + - name: repeat + value: 4 + - + name: fib-result + type: enum + entries: + - oif + - oifname + - addrtype + - + name: fib-flags + type: flags + entries: + - saddr + - daddr + - mark + - iif + - oif + - present + - + name: reject-types + type: enum + entries: + - icmp-unreach + - tcp-rst + - icmpx-unreach attribute-sets: - @@ -611,9 +711,10 @@ attribute-sets: type: u64 byte-order: big-endian - - name: flags # TODO + name: flags type: u32 byte-order: big-endian + enum: quota-flags - name: pad type: pad @@ -665,6 +766,38 @@ attribute-sets: type: nest nested-attributes: hook-dev-attrs - + name: expr-bitwise-attrs + attributes: + - + name: sreg + type: u32 + byte-order: big-endian + - + name: dreg + type: u32 + byte-order: big-endian + - + name: len + type: u32 + byte-order: big-endian + - + name: mask + type: nest + nested-attributes: data-attrs + - + name: xor + type: nest + nested-attributes: data-attrs + - + name: op + type: u32 + byte-order: big-endian + enum: bitwise-ops + - + name: data + type: nest + nested-attributes: data-attrs + - name: expr-cmp-attrs attributes: - @@ -698,6 +831,7 @@ attribute-sets: name: code type: u32 byte-order: big-endian + enum: verdict-code - name: chain type: string @@ -719,6 +853,43 @@ attribute-sets: name: pad type: pad - + name: expr-fib-attrs + attributes: + - + name: dreg + type: u32 + byte-order: big-endian + - + name: result + type: u32 + byte-order: big-endian + enum: fib-result + - + name: flags + type: u32 + byte-order: big-endian + enum: fib-flags + - + name: expr-ct-attrs + attributes: + - + name: dreg + 
type: u32 + byte-order: big-endian + - + name: key + type: u32 + byte-order: big-endian + enum: ct-keys + - + name: direction + type: u8 + enum: ct-direction + - + name: sreg + type: u32 + byte-order: big-endian + - name: expr-flow-offload-attrs attributes: - @@ -737,6 +908,31 @@ attribute-sets: type: nest nested-attributes: data-attrs - + name: expr-lookup-attrs + attributes: + - + name: set + type: string + doc: Name of set to use + - + name: set id + type: u32 + byte-order: big-endian + doc: ID of set to use + - + name: sreg + type: u32 + byte-order: big-endian + - + name: dreg + type: u32 + byte-order: big-endian + - + name: flags + type: u32 + byte-order: big-endian + enum: lookup-flags + - name: expr-meta-attrs attributes: - @@ -821,6 +1017,30 @@ attribute-sets: type: u32 byte-order: big-endian - + name: expr-reject-attrs + attributes: + - + name: type + type: u32 + byte-order: big-endian + enum: reject-types + - + name: icmp-code + type: u8 + - + name: expr-target-attrs + attributes: + - + name: name + type: string + - + name: rev + type: u32 + byte-order: big-endian + - + name: info + type: binary + - name: expr-tproxy-attrs attributes: - @@ -835,13 +1055,38 @@ attribute-sets: name: reg-port type: u32 byte-order: big-endian + - + name: expr-objref-attrs + attributes: + - + name: imm-type + type: u32 + byte-order: big-endian + - + name: imm-name + type: string + doc: object name + - + name: set-sreg + type: u32 + byte-order: big-endian + - + name: set-name + type: string + doc: name of object map + - + name: set-id + type: u32 + byte-order: big-endian + doc: id of object map sub-messages: - name: expr-ops formats: - - value: bitwise # TODO + value: bitwise + attribute-set: expr-bitwise-attrs - value: cmp attribute-set: expr-cmp-attrs @@ -849,7 +1094,11 @@ sub-messages: value: counter attribute-set: expr-counter-attrs - - value: ct # TODO + value: ct + attribute-set: expr-ct-attrs + - + value: fib + attribute-set: expr-fib-attrs - value: flow_offload 
attribute-set: expr-flow-offload-attrs @@ -857,7 +1106,8 @@ sub-messages: value: immediate attribute-set: expr-immediate-attrs - - value: lookup # TODO + value: lookup + attribute-set: expr-lookup-attrs - value: meta attribute-set: expr-meta-attrs @@ -865,9 +1115,21 @@ sub-messages: value: nat attribute-set: expr-nat-attrs - + value: objref + attribute-set: expr-objref-attrs + - value: payload attribute-set: expr-payload-attrs - + value: quota + attribute-set: quota-attrs + - + value: reject + attribute-set: expr-reject-attrs + - + value: target + attribute-set: expr-target-attrs + - value: tproxy attribute-set: expr-tproxy-attrs - diff --git a/Documentation/netlink/specs/rt_link.yaml b/Documentation/netlink/specs/rt_link.yaml index de08c12fd56f..0c4d5d40cae9 100644 --- a/Documentation/netlink/specs/rt_link.yaml +++ b/Documentation/netlink/specs/rt_link.yaml @@ -903,6 +903,22 @@ definitions: - cfm-config - cfm-status - mst + - + name: netkit-policy + type: enum + entries: + - + name: forward + value: 0 + - + name: blackhole + value: 2 + - + name: netkit-mode + type: enum + entries: + - name: l2 + - name: l3 attribute-sets: - @@ -2109,6 +2125,28 @@ attribute-sets: - name: id type: u32 + - + name: linkinfo-netkit-attrs + name-prefix: ifla-netkit- + attributes: + - + name: peer-info + type: binary + - + name: primary + type: u8 + - + name: policy + type: u32 + enum: netkit-policy + - + name: peer-policy + type: u32 + enum: netkit-policy + - + name: mode + type: u32 + enum: netkit-mode sub-messages: - @@ -2147,6 +2185,9 @@ sub-messages: - value: vrf attribute-set: linkinfo-vrf-attrs + - + value: netkit + attribute-set: linkinfo-netkit-attrs - name: linkinfo-member-data-msg formats: diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst index a4c7d0c65fd7..4561e8ab9e08 100644 --- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst +++ 
b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst @@ -230,6 +230,11 @@ per-queue stats) from the device. In addition the driver logs the stats to syslog upon device reset. +On supported instance types, the statistics will also include the +ENA Express data (fields prefixed with `ena_srd`). For complete +documentation of ENA Express data, refer to +https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ena-express.html#ena-express-monitor + MTU === diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst index 6932d8c043c2..6fc1961492b7 100644 --- a/Documentation/networking/device_drivers/ethernet/index.rst +++ b/Documentation/networking/device_drivers/ethernet/index.rst @@ -44,6 +44,7 @@ Contents: marvell/octeon_ep marvell/octeon_ep_vf mellanox/mlx5/index + meta/fbnic microsoft/netvsc neterion/s2io netronome/nfp diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst index 3bd72577af9a..99d95be4d159 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst @@ -218,6 +218,22 @@ the software port. [#accel]_. - Informative + * - `rx[i]_hds_nosplit_packets` + - Number of packets that were not split in header/data split mode. A + packet will not get split when the hardware does not support its + protocol splitting. An example of such a protocol is ICMPv4/v6. Currently + TCP and UDP with IPv4/IPv6 are supported for header/data split + [#accel]_. + - Informative + + * - `rx[i]_hds_nosplit_bytes` + - Number of bytes for packets that were not split in header/data split + mode. A packet will not get split when the hardware does not support its + protocol splitting. An example of such a protocol is ICMPv4/v6.
Currently + TCP and UDP with IPv4/IPv6 are supported for header/data split + [#accel]_. + - Informative + * - `rx[i]_lro_packets` - The number of LRO packets received on ring i [#accel]_. - Acceleration diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst index 20d3b7e87049..34e911480108 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst @@ -130,6 +130,9 @@ Enabling the driver and kconfig options | Build support for software-managed steering in the NIC. +**CONFIG_MLX5_HW_STEERING=(y/n)** + +| Build support for hardware-managed steering in the NIC. **CONFIG_MLX5_TC_CT=(y/n)** diff --git a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst new file mode 100644 index 000000000000..32ff114f5c26 --- /dev/null +++ b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst @@ -0,0 +1,29 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +===================================== +Meta Platforms Host Network Interface +===================================== + +Firmware Versions +----------------- + +fbnic has three components stored on the flash which are provided in one PLDM +image: + +1. fw - The control firmware used to view and modify firmware settings, request + firmware actions, and retrieve firmware counters outside of the data path. + This is the firmware which fbnic_fw.c interacts with. +2. bootloader - The firmware which validates firmware security and controls basic + operations including loading and updating the firmware. This is also known + as the cmrt firmware. +3. undi - This is the UEFI driver which is based on the Linux driver. + +fbnic stores two copies of these three components on flash.
This allows fbnic +to fall back to an older version of firmware automatically in case firmware +fails to boot. Version information for both is provided as running and stored. +The undi is only reported as stored, as it is not actively running once the Linux +driver takes over. + +devlink dev info provides version information for all three components. In +addition to the version, the hg commit hash of the build is included as a +separate entry. diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst new file mode 100644 index 000000000000..a55bf21f671c --- /dev/null +++ b/Documentation/networking/devmem.rst @@ -0,0 +1,269 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +Device Memory TCP +================= + + +Intro +===== + +Device memory TCP (devmem TCP) enables receiving data directly into device +memory (dmabuf). The feature is currently implemented for TCP sockets. + + +Opportunity +----------- + +A large number of data transfers have device memory as the source and/or +destination. Accelerators have drastically increased the prevalence of such +transfers. Some examples include: + +- Distributed training, where ML accelerators, such as GPUs on different hosts, + exchange data. + +- Distributed raw block storage applications transfer large amounts of data with + remote SSDs. Much of this data does not require host processing. + +Typically, Device-to-Device data transfers in the network are implemented as +the following low-level operations: Device-to-Host copy, Host-to-Host network +transfer, and Host-to-Device copy. + +The flow involving host copies is suboptimal, especially for bulk data transfers, +and can put significant strain on system resources such as host memory +bandwidth and PCIe bandwidth. + +Devmem TCP optimizes this use case by implementing socket APIs that enable +the user to receive incoming network packets directly into device memory. + +Packet payloads go directly from the NIC to device memory.
+ +Packet headers go to host memory and are processed by the TCP/IP stack +normally. The NIC must support header split to achieve this. + +Advantages: + +- Alleviate host memory bandwidth pressure, compared to existing + network-transfer + device-copy semantics. + +- Alleviate PCIe bandwidth pressure, by limiting data transfer to the lowest + level of the PCIe tree, compared to the traditional path which sends data + through the root complex. + + +More Info +--------- + + slides, video + https://netdevconf.org/0x17/sessions/talk/device-memory-tcp.html + + patchset + [PATCH net-next v24 00/13] Device Memory TCP + https://lore.kernel.org/netdev/20240831004313.3713467-1-almasrymina@google.com/ + + +Interface +========= + + +Example +------- + +tools/testing/selftests/net/ncdevmem.c:do_server shows an example of setting up +the RX path of this API. + + +NIC Setup +--------- + +Header split, flow steering, & RSS are required features for devmem TCP. + +Header split is used to split incoming packets into a header buffer in host +memory, and a payload buffer in device memory. + +Flow steering & RSS are used to ensure that only flows targeting devmem land on +an RX queue bound to devmem. 
+ +Enable header split & flow steering:: + + # enable header split + ethtool -G eth1 tcp-data-split on + + + # enable flow steering + ethtool -K eth1 ntuple on + +Configure RSS to steer all traffic away from the target RX queue (queue 15 in +this example):: + + ethtool --set-rxfh-indir eth1 equal 15 + + +The user must bind a dmabuf to any number of RX queues on a given NIC using +the netlink API:: + + /* Bind dmabuf to NIC RX queue 15 */ + struct netdev_queue_id *queues; + queues = malloc(sizeof(*queues) * 1); + + queues[0]._present.type = 1; + queues[0]._present.id = 1; + queues[0].type = NETDEV_QUEUE_TYPE_RX; + queues[0].id = 15; + + *ys = ynl_sock_create(&ynl_netdev_family, &yerr); + + req = netdev_bind_rx_req_alloc(); + netdev_bind_rx_req_set_ifindex(req, 1 /* ifindex */); + netdev_bind_rx_req_set_fd(req, dmabuf_fd); + __netdev_bind_rx_req_set_queues(req, queues, 1 /* number of queues */); + + rsp = netdev_bind_rx(*ys, req); + + dmabuf_id = rsp->id; + + +The netlink API returns a dmabuf_id: a unique ID that refers to this dmabuf +that has been bound. + +The user can unbind the dmabuf from the netdevice by closing the netlink socket +that established the binding. We do this so that the binding is automatically +unbound even if the userspace process crashes. + +Note that any reasonably well-behaved dmabuf from any exporter should work with +devmem TCP, even if the dmabuf is not actually backed by devmem. An example of +this is udmabuf, which wraps user memory (non-devmem) in a dmabuf. + + +Socket Setup +------------ + +The socket must be flow-steered to the dmabuf-bound RX queue:: + + ethtool -N eth1 flow-type tcp4 ... queue 15 + + +Receiving data +-------------- + +The user application must signal to the kernel that it is capable of receiving +devmem data by passing the MSG_SOCK_DEVMEM flag to recvmsg:: + + ret = recvmsg(fd, &msg, MSG_SOCK_DEVMEM); + +Applications that do not specify the MSG_SOCK_DEVMEM flag will receive an EFAULT +on devmem data.
+ +Devmem data is received directly into the dmabuf bound to the NIC in 'NIC +Setup', and the kernel signals this to the user via the SCM_DEVMEM_* cmsgs:: + + for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) { + if (cm->cmsg_level != SOL_SOCKET || + (cm->cmsg_type != SCM_DEVMEM_DMABUF && + cm->cmsg_type != SCM_DEVMEM_LINEAR)) + continue; + + dmabuf_cmsg = (struct dmabuf_cmsg *)CMSG_DATA(cm); + + if (cm->cmsg_type == SCM_DEVMEM_DMABUF) { + /* Frag landed in dmabuf. + * + * dmabuf_cmsg->dmabuf_id is the dmabuf the + * frag landed on. + * + * dmabuf_cmsg->frag_offset is the offset into + * the dmabuf where the frag starts. + * + * dmabuf_cmsg->frag_size is the size of the + * frag. + * + * dmabuf_cmsg->frag_token is a token used to + * refer to this frag for later freeing. + */ + + struct dmabuf_token token; + token.token_start = dmabuf_cmsg->frag_token; + token.token_count = 1; + continue; + } + + if (cm->cmsg_type == SCM_DEVMEM_LINEAR) + /* Frag landed in linear buffer. + * + * dmabuf_cmsg->frag_size is the size of the + * frag. + */ + continue; + + } + +Applications may receive two types of cmsg: + +- SCM_DEVMEM_DMABUF: this indicates the fragment landed in the dmabuf indicated + by dmabuf_id. + +- SCM_DEVMEM_LINEAR: this indicates the fragment landed in the linear buffer. + This typically happens when the NIC is unable to split the packet at the + header boundary, such that part (or all) of the payload landed in host + memory. + +Applications may receive no SCM_DEVMEM_* cmsgs. That indicates non-devmem, +regular TCP data that landed on an RX queue not bound to a dmabuf. + + +Freeing frags +------------- + +Frags received via SCM_DEVMEM_DMABUF are pinned by the kernel while the user +processes the frag. The user must return the frag to the kernel via +SO_DEVMEM_DONTNEED:: + + ret = setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, &token, + sizeof(token)); + +The user must ensure the tokens are returned to the kernel in a timely manner.
+Failure to do so will exhaust the limited dmabuf memory that is bound to the RX queue +and will lead to packet drops. + + +Implementation & Caveats +======================== + +Unreadable skbs +--------------- + +Devmem payloads are inaccessible to the kernel processing the packets. This +results in a few quirks for payloads of devmem skbs: + +- Loopback is not functional. Loopback relies on copying the payload, which is + not possible with devmem skbs. + +- Software checksum calculation fails. + +- tcpdump and BPF can't access devmem packet payloads. + + +Testing +======= + +More realistic example code can be found in the kernel source under +``tools/testing/selftests/net/ncdevmem.c``. + +ncdevmem is a devmem TCP netcat. It works very similarly to netcat, but +receives data directly into a udmabuf. + +To run ncdevmem, run it as a server on the machine under test, and +run netcat on a peer to provide the TX data. + +ncdevmem has a validation mode as well that expects a repeating pattern of +incoming data and validates it as such. For example, you can launch
For example, you can launch +ncdevmem on the server by:: + + ncdevmem -s <server IP> -c <client IP> -f eth1 -d 3 -n 0000:06:00.0 -l \ + -p 5201 -v 7 + +On client side, use regular netcat to send TX data to ncdevmem process +on the server:: + + yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | \ + tr \\n \\0 | head -c 5G | nc <server IP> 5201 -p 5201 diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index d5f246aceb9f..295563e91082 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -57,6 +57,7 @@ Structure of this header is ``ETHTOOL_A_HEADER_DEV_INDEX`` u32 device ifindex ``ETHTOOL_A_HEADER_DEV_NAME`` string device name ``ETHTOOL_A_HEADER_FLAGS`` u32 flags common for all requests + ``ETHTOOL_A_HEADER_PHY_INDEX`` u32 phy device index ============================== ====== ============================= ``ETHTOOL_A_HEADER_DEV_INDEX`` and ``ETHTOOL_A_HEADER_DEV_NAME`` identify the @@ -81,6 +82,12 @@ the behaviour is backward compatible, i.e. requests from old clients not aware of the flag should be interpreted the way the client expects. A client must not set flags it does not understand. +``ETHTOOL_A_HEADER_PHY_INDEX`` identifies the Ethernet PHY the message relates to. +As there are numerous commands that are related to PHY configuration, and because +there may be more than one PHY on the link, the PHY index can be passed in the +request for the commands that needs it. It is, however, not mandatory, and if it +is not passed for commands that target a PHY, the net_device.phydev pointer +is used. Bit sets ======== @@ -934,18 +941,18 @@ Request contents: ==================================== ====== =========================== Kernel checks that requested ring sizes do not exceed limits reported by -driver. Driver may impose additional constraints and may not suspport all +driver. Driver may impose additional constraints and may not support all attributes. 
``ETHTOOL_A_RINGS_CQE_SIZE`` specifies the completion queue event size. -Completion queue events(CQE) are the events posted by NIC to indicate the -completion status of a packet when the packet is sent(like send success or -error) or received(like pointers to packet fragments). The CQE size parameter +Completion queue events (CQE) are the events posted by the NIC to indicate the +completion status of a packet when the packet is sent (like send success or +error) or received (like pointers to packet fragments). The CQE size parameter enables to modify the CQE size other than default size if NIC supports it. -A bigger CQE can have more receive buffer pointers inturn NIC can transfer -a bigger frame from wire. Based on the NIC hardware, the overall completion -queue size can be adjusted in the driver if CQE size is modified. +A bigger CQE can have more receive buffer pointers, and in turn the NIC can +transfer a bigger frame from the wire. Based on the NIC hardware, the overall +completion queue size can be adjusted in the driver if CQE size is modified. CHANNELS_GET ============ @@ -989,7 +996,7 @@ Request contents: ===================================== ====== ========================== Kernel checks that requested channel counts do not exceed limits reported by -driver. Driver may impose additional constraints and may not suspport all +driver. Driver may impose additional constraints and may not support all attributes. @@ -1307,12 +1314,17 @@ information.
+-+-+-----------------------------------------+--------+---------------------+ | | | ``ETHTOOL_A_CABLE_RESULTS_CODE`` | u8 | result code | +-+-+-----------------------------------------+--------+---------------------+ + | | | ``ETHTOOL_A_CABLE_RESULT_SRC`` | u32 | information source | + +-+-+-----------------------------------------+--------+---------------------+ | | ``ETHTOOL_A_CABLE_NEST_FAULT_LENGTH`` | nested | cable length | +-+-+-----------------------------------------+--------+---------------------+ | | | ``ETHTOOL_A_CABLE_FAULT_LENGTH_PAIR`` | u8 | pair number | +-+-+-----------------------------------------+--------+---------------------+ | | | ``ETHTOOL_A_CABLE_FAULT_LENGTH_CM`` | u32 | length in cm | +-+-+-----------------------------------------+--------+---------------------+ + | | | ``ETHTOOL_A_CABLE_FAULT_LENGTH_SRC`` | u32 | information source | + +-+-+-----------------------------------------+--------+---------------------+ + CABLE_TEST TDR ============== @@ -1756,7 +1768,7 @@ Kernel response contents: When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` attribute identifies the operational state of the PoDL PSE functions. The operational state of the PSE function can be changed using the ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` -action. This option is corresponding to ``IEEE 802.3-2018`` 30.15.1.1.2 +action. This attribute corresponds to ``IEEE 802.3-2018`` 30.15.1.1.2 aPoDLPSEAdminState. Possible values are: .. kernel-doc:: include/uapi/linux/ethtool.h @@ -1770,8 +1782,8 @@ The same goes for ``ETHTOOL_A_C33_PSE_ADMIN_STATE`` implementing When set, the optional ``ETHTOOL_A_PODL_PSE_PW_D_STATUS`` attribute identifies the power detection status of the PoDL PSE. The status depend on internal PSE -state machine and automatic PD classification support. This option is -corresponding to ``IEEE 802.3-2018`` 30.15.1.1.3 aPoDLPSEPowerDetectionStatus. +state machine and automatic PD classification support. 
This attribute +corresponds to ``IEEE 802.3-2018`` 30.15.1.1.3 aPoDLPSEPowerDetectionStatus. Possible values are: .. kernel-doc:: include/uapi/linux/ethtool.h @@ -1785,12 +1797,13 @@ The same goes for ``ETHTOOL_A_C33_PSE_ADMIN_PW_D_STATUS`` implementing When set, the optional ``ETHTOOL_A_C33_PSE_PW_CLASS`` attribute identifies the power class of the C33 PSE. It depends on the class negotiated between -the PSE and the PD. This option is corresponding to ``IEEE 802.3-2022`` +the PSE and the PD. This attribute corresponds to ``IEEE 802.3-2022`` 30.9.1.1.8 aPSEPowerClassification. When set, the optional ``ETHTOOL_A_C33_PSE_ACTUAL_PW`` attribute identifies -This option is corresponding to ``IEEE 802.3-2022`` 30.9.1.1.23 aPSEActualPower. -Actual power is reported in mW. +the actual power drawn by the C33 PSE. This attribute corresponds to +``IEEE 802.3-2022`` 30.9.1.1.23 aPSEActualPower. Actual power is reported +in mW. When set, the optional ``ETHTOOL_A_C33_PSE_EXT_STATE`` attribute identifies the extended error state of the C33 PSE. Possible values are: @@ -1839,7 +1852,7 @@ Request contents: ====================================== ====== ============================= When set, the optional ``ETHTOOL_A_PODL_PSE_ADMIN_CONTROL`` attribute is used -to control PoDL PSE Admin functions. This option is implementing +to control PoDL PSE Admin functions. This option implements ``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See ``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values. @@ -1866,10 +1879,18 @@ RSS context of an interface similar to ``ETHTOOL_GRSSH`` ioctl request. 
Request contents: -===================================== ====== ========================== +===================================== ====== ============================ ``ETHTOOL_A_RSS_HEADER`` nested request header ``ETHTOOL_A_RSS_CONTEXT`` u32 context number -===================================== ====== ========================== + ``ETHTOOL_A_RSS_START_CONTEXT`` u32 start context number (dumps) +===================================== ====== ============================ + +``ETHTOOL_A_RSS_CONTEXT`` specifies which RSS context number to query; +if not set, context 0 (the main context) is queried. Dumps can be filtered +by device (only listing contexts of a given netdev). Filtering by a single +context number is not supported, but ``ETHTOOL_A_RSS_START_CONTEXT`` +can be used to start dumping contexts from the given number (primarily +used to skip context 0 and only dump additional contexts). Kernel response contents: @@ -1927,7 +1948,7 @@ When set, the optional ``ETHTOOL_A_PLCA_VERSION`` attribute indicates which standard and version the PLCA management interface complies to. When not set, the interface is vendor-specific and (possibly) supplied by the driver. The OPEN Alliance SIG specifies a standard register map for 10BASE-T1S PHYs -embedding the PLCA Reconcialiation Sublayer. See "10BASE-T1S PLCA Management +embedding the PLCA Reconciliation Sublayer. See "10BASE-T1S PLCA Management Registers" at https://www.opensig.org/about/specifications/. When set, the optional ``ETHTOOL_A_PLCA_ENABLED`` attribute indicates the @@ -1989,7 +2010,7 @@ Request contents: ``ETHTOOL_A_PLCA_ENABLED`` u8 PLCA Admin State ``ETHTOOL_A_PLCA_NODE_ID`` u8 PLCA unique local node ID ``ETHTOOL_A_PLCA_NODE_CNT`` u8 Number of PLCA nodes on the - netkork, including the + network, including the coordinator ``ETHTOOL_A_PLCA_TO_TMR`` u8 Transmit Opportunity Timer value in bit-times (BT) @@ -2176,6 +2197,49 @@ string.
The ``ETHTOOL_A_MODULE_FW_FLASH_DONE`` and ``ETHTOOL_A_MODULE_FW_FLASH_TOTAL`` attributes encode the completed and total amount of work, respectively. +PHY_GET +======= + +Retrieve information about a given Ethernet PHY sitting on the link. The DO +operation returns all available information about dev->phydev. The user can also +specify a PHY_INDEX, in which case the DO request returns information about that +specific PHY. + +As there can be more than one PHY, the DUMP operation can be used to list the PHYs +present on a given interface, by passing an interface index or name in +the dump request. + +For more information, refer to :ref:`phy_link_topology`. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_PHY_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ===================================== ====== =============================== + ``ETHTOOL_A_PHY_HEADER`` nested reply header + ``ETHTOOL_A_PHY_INDEX`` u32 the PHY's unique index, which can + be used for PHY-specific + requests + ``ETHTOOL_A_PHY_DRVNAME`` string the PHY driver name + ``ETHTOOL_A_PHY_NAME`` string the PHY device name + ``ETHTOOL_A_PHY_UPSTREAM_TYPE`` u32 the type of device this PHY is + connected to + ``ETHTOOL_A_PHY_UPSTREAM_INDEX`` u32 the PHY index of the upstream + PHY + ``ETHTOOL_A_PHY_UPSTREAM_SFP_NAME`` string if this PHY is connected to + its parent PHY through an SFP + bus, the name of this SFP bus + ``ETHTOOL_A_PHY_DOWNSTREAM_SFP_NAME`` string if the PHY controls an SFP bus, + the name of the SFP bus + ===================================== ====== =============================== + +When ``ETHTOOL_A_PHY_UPSTREAM_TYPE`` is PHY_UPSTREAM_PHY, the PHY's parent is +another PHY. + Request translation =================== @@ -2283,4 +2347,5 @@ are netlink only.
n/a ``ETHTOOL_MSG_MM_GET`` n/a ``ETHTOOL_MSG_MM_SET`` n/a ``ETHTOOL_MSG_MODULE_FW_FLASH_ACT`` + n/a ``ETHTOOL_MSG_PHY_GET`` =================================== ===================================== diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index d1af04b952f8..803dfc1efb75 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -49,6 +49,7 @@ Contents: cdc_mbim dccp dctcp + devmem dns_resolver driver eql @@ -87,10 +88,12 @@ Contents: nexthop-group-resilient nf_conntrack-sysctl nf_flowtable + oa-tc6-framework openvswitch operstates packet_mmap phonet + phy-link-topology pktgen plip ppp_generic diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 3616389c8c2d..eacf8983e230 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -2362,6 +2362,20 @@ ra_honor_pio_life - BOOLEAN Default: 0 (disabled) +ra_honor_pio_pflag - BOOLEAN + The Prefix Information Option P-flag indicates the network can + allocate a unique IPv6 prefix per client using DHCPv6-PD. + This sysctl can be enabled when a userspace DHCPv6-PD client + is running to cause the P-flag to take effect: i.e. the + P-flag suppresses any effects of the A-flag within the same + PIO. For a given PIO, P=1 and A=1 is treated as A=0. + + - If disabled, the P-flag is ignored. + - If enabled, the P-flag will disable SLAAC autoconfiguration + for the given Prefix Information Option. + + Default: 0 (disabled) + accept_ra_rt_info_min_plen - INTEGER Minimum prefix length of Route Information in RA. diff --git a/Documentation/networking/l2tp.rst b/Documentation/networking/l2tp.rst index 8496b467dea4..e8cf8b3e60ac 100644 --- a/Documentation/networking/l2tp.rst +++ b/Documentation/networking/l2tp.rst @@ -638,9 +638,8 @@ Tunnels are identified by a unique tunnel id. The id is 16-bit for L2TPv2 and 32-bit for L2TPv3. Internally, the id is stored as a 32-bit value. 
-Tunnels are kept in a per-net list, indexed by tunnel id. The tunnel -id namespace is shared by L2TPv2 and L2TPv3. The tunnel context can be -derived from the socket's sk_user_data. +Tunnels are kept in a per-net list, indexed by tunnel id. The +tunnel id namespace is shared by L2TPv2 and L2TPv3. Handling tunnel socket close is perhaps the most tricky part of the L2TP implementation. If userspace closes a tunnel socket, the L2TP @@ -652,9 +651,7 @@ socket's encap_destroy handler is invoked, which L2TP uses to initiate its tunnel close actions. For L2TPIP sockets, the socket's close handler initiates the same tunnel close actions. All sessions are first closed. Each session drops its tunnel ref. When the tunnel ref -reaches zero, the tunnel puts its socket ref. When the socket is -eventually destroyed, its sk_destruct finally frees the L2TP tunnel -context. +reaches zero, the tunnel drops its socket ref. Sessions -------- @@ -667,10 +664,7 @@ pseudowire) or other data types such as PPP, ATM, HDLC or Frame Relay. Linux currently implements only Ethernet and PPP session types. Some L2TP session types also have a socket (PPP pseudowires) while -others do not (Ethernet pseudowires). We can't therefore use the -socket reference count as the reference count for session -contexts. The L2TP implementation therefore has its own internal -reference counts on the session contexts. +others do not (Ethernet pseudowires). Like tunnels, L2TP sessions are identified by a unique session id. Just as with tunnel ids, the session id is 16-bit for @@ -680,21 +674,19 @@ value. Sessions hold a ref on their parent tunnel to ensure that the tunnel stays extant while one or more sessions references it. -Sessions are kept in a per-tunnel list, indexed by session id. L2TPv3 -sessions are also kept in a per-net list indexed by session id, -because L2TPv3 session ids are unique across all tunnels and L2TPv3 -data packets do not contain a tunnel id in the header. 
This list is -therefore needed to find the session context associated with a -received data packet when the tunnel context cannot be derived from -the tunnel socket. +Sessions are kept in a per-net list. L2TPv2 sessions and L2TPv3 +sessions are stored in separate lists. L2TPv2 sessions are keyed +by a 32-bit key made up of the 16-bit tunnel ID and 16-bit +session ID. L2TPv3 sessions are keyed by the 32-bit session ID, since +L2TPv3 session ids are unique across all tunnels. Although the L2TPv3 RFC specifies that L2TPv3 session ids are not -scoped by the tunnel, the kernel does not police this for L2TPv3 UDP -tunnels and does not add sessions of L2TPv3 UDP tunnels into the -per-net session list. In the UDP receive code, we must trust that the -tunnel can be identified using the tunnel socket's sk_user_data and -lookup the session in the tunnel's session list instead of the per-net -session list. +scoped by the tunnel, the Linux implementation has historically +allowed this. Such session id collisions are supported using a per-net +hash table keyed by sk and session ID. When looking up L2TPv3 +sessions, the list entry may link to multiple sessions with that +session ID, in which case the session matching the given sk (tunnel) +is used. PPP --- @@ -714,10 +706,9 @@ The L2TP PPP implementation handles the closing of a PPPoL2TP socket by closing its corresponding L2TP session. This is complicated because it must consider racing with netlink session create/destroy requests and pppol2tp_connect trying to reconnect with a session that is in the -process of being closed. Unlike tunnels, PPP sessions do not hold a -ref on their associated socket, so code must be careful to sock_hold -the socket where necessary. For all the details, see commit -3d609342cc04129ff7568e19316ce3d7451a27e8. +process of being closed. PPP sessions hold a ref on their associated +socket in order that the socket remains extants while the session +references it. 
Ethernet -------- @@ -761,15 +752,10 @@ Limitations The current implementation has a number of limitations: - 1) Multiple UDP sockets with the same 5-tuple address cannot be - used. The kernel's tunnel context is identified using private - data associated with the socket so it is important that each - socket is uniquely identified by its address. - - 2) Interfacing with openvswitch is not yet implemented. It may be + 1) Interfacing with openvswitch is not yet implemented. It may be useful to map OVS Ethernet and VLAN ports into L2TPv3 tunnels. - 3) VLAN pseudowires are implemented using an ``l2tpethN`` interface + 2) VLAN pseudowires are implemented using an ``l2tpethN`` interface configured with a VLAN sub-interface. Since L2TPv3 VLAN pseudowires carry one and only one VLAN, it may be better to use a single netdevice rather than an ``l2tpethN`` and ``l2tpethN``:M diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst index fd514bba8c43..95598c21fc8e 100644 --- a/Documentation/networking/mptcp-sysctl.rst +++ b/Documentation/networking/mptcp-sysctl.rst @@ -34,6 +34,17 @@ available_schedulers - STRING Shows the available schedulers choices that are registered. More packet schedulers may be available, but not loaded. +blackhole_timeout - INTEGER (seconds) + Initial time period in seconds to disable MPTCP on active MPTCP sockets + when an MPTCP firewall blackhole issue is detected. This time period will + grow exponentially when more blackhole issues get detected right after + MPTCP is re-enabled and will reset to the initial value when the + blackhole issue goes away. + + Set to 0 to disable blackhole detection. + + Default: 3600 + checksum_enabled - BOOLEAN Control whether DSS checksum can be enabled.
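The ``blackhole_timeout`` behaviour described above (an initial period that grows exponentially on repeated detections) can be modelled with a short C sketch. Several details are assumptions not stated in the text: the period doubles on each consecutive detection, the exponent is capped at 6, and a count of zero means "no blackhole currently detected". The function and macro names are hypothetical, and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap on the doubling exponent (not taken from the text above). */
#define BH_MAX_SHIFT 6

/* Keep the shifted period comfortably within 64 bits. */
static_assert(BH_MAX_SHIFT < 32, "shift cap too large");

/* Hypothetical model: how long MPTCP stays disabled after `consecutive`
 * back-to-back blackhole detections, given the sysctl value `base_secs`.
 * Returns 0 when detection is disabled (base_secs == 0) or when no
 * blackhole is currently detected (consecutive == 0). */
static uint64_t mptcp_bh_disable_period(unsigned int base_secs,
					unsigned int consecutive)
{
	unsigned int shift;

	if (base_secs == 0 || consecutive == 0)
		return 0;

	shift = consecutive - 1;          /* first detection: base period */
	if (shift > BH_MAX_SHIFT)
		shift = BH_MAX_SHIFT;     /* stop growing past the cap */
	return (uint64_t)base_secs << shift;
}
```

With the default of 3600, this model yields 1 hour after the first detection, 2 hours after the second, and at most 64 hours under the assumed cap. It returns to 1 hour once the count is reset.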
diff --git a/Documentation/networking/multi-pf-netdev.rst b/Documentation/networking/multi-pf-netdev.rst index 268819225866..2cd25d81aaa7 100644 --- a/Documentation/networking/multi-pf-netdev.rst +++ b/Documentation/networking/multi-pf-netdev.rst @@ -111,11 +111,11 @@ The relation between PF, irq, napi, and queue can be observed via netlink spec:: Here you can clearly observe our channels distribution policy:: $ ls /proc/irq/{36,39,40,41,42}/mlx5* -d -1 - /proc/irq/36/mlx5_comp1@pci:0000:08:00.0 - /proc/irq/39/mlx5_comp1@pci:0000:09:00.0 - /proc/irq/40/mlx5_comp2@pci:0000:08:00.0 - /proc/irq/41/mlx5_comp2@pci:0000:09:00.0 - /proc/irq/42/mlx5_comp3@pci:0000:08:00.0 + /proc/irq/36/mlx5_comp0@pci:0000:08:00.0 + /proc/irq/39/mlx5_comp0@pci:0000:09:00.0 + /proc/irq/40/mlx5_comp1@pci:0000:08:00.0 + /proc/irq/41/mlx5_comp1@pci:0000:09:00.0 + /proc/irq/42/mlx5_comp2@pci:0000:08:00.0 Steering ======== diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst index 70c4fb9d4e5c..22b07c814f4a 100644 --- a/Documentation/networking/net_cachelines/net_device.rst +++ b/Documentation/networking/net_cachelines/net_device.rst @@ -7,6 +7,8 @@ net_device struct fast path usage breakdown Type Name fastpath_tx_access fastpath_rx_access Comments ..struct ..net_device +unsigned_long:32 priv_flags read_mostly - __dev_queue_xmit(tx) +unsigned_long:1 lltx read_mostly - HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx) char name[16] - - struct_netdev_name_node* name_node struct_dev_ifalias* ifalias @@ -23,7 +25,6 @@ struct_list_head ptype_specific struct adj_list unsigned_int flags read_mostly read_mostly __dev_queue_xmit,__dev_xmit_skb,ip6_output,__ip6_finish_output(tx);ip6_rcv_core(rx) xdp_features_t xdp_features -unsigned_long_long priv_flags read_mostly - __dev_queue_xmit(tx) struct_net_device_ops* netdev_ops read_mostly - netdev_core_pick_tx,netdev_start_xmit(tx) struct_xdp_metadata_ops* xdp_metadata_ops int ifindex - 
read_mostly ip6_rcv_core @@ -98,7 +99,7 @@ unsigned_int num_rx_queues unsigned_int real_num_rx_queues - read_mostly get_rps_cpu struct_bpf_prog* xdp_prog - read_mostly netif_elide_gro() unsigned_long gro_flush_timeout - read_mostly napi_complete_done -int napi_defer_hard_irqs - read_mostly napi_complete_done +u32 napi_defer_hard_irqs - read_mostly napi_complete_done unsigned_int gro_max_size - read_mostly skb_gro_receive unsigned_int gro_ipv4_max_size - read_mostly skb_gro_receive rx_handler_func_t* rx_handler read_mostly - __netif_receive_skb_core @@ -163,6 +164,10 @@ struct_lock_class_key* qdisc_tx_busylock bool proto_down unsigned:1 wol_enabled unsigned:1 threaded - - napi_poll(napi_enable,dev_set_threaded) +unsigned_long:1 see_all_hwtstamp_requests +unsigned_long:1 change_proto_down +unsigned_long:1 netns_local +unsigned_long:1 fcoe_mtu struct_list_head net_notifier_list struct_macsec_ops* macsec_ops struct_udp_tunnel_nic_info* udp_tunnel_nic_info @@ -176,3 +181,5 @@ netdevice_tracker dev_registered_tracker struct_rtnl_hw_stats64* offload_xstats_l3 struct_devlink_port* devlink_port struct_dpll_pin* dpll_pin +struct hlist_head page_pools +struct dim_irq_moder* irq_moder diff --git a/Documentation/networking/netdev-features.rst b/Documentation/networking/netdev-features.rst index d7b15bb64deb..5014f7cc1398 100644 --- a/Documentation/networking/netdev-features.rst +++ b/Documentation/networking/netdev-features.rst @@ -139,21 +139,6 @@ chained skbs (skb->next/prev list). Features contained in NETIF_F_SOFT_FEATURES are features of networking stack. Driver should not change behaviour based on them. - * LLTX driver (deprecated for hardware drivers) - -NETIF_F_LLTX is meant to be used by drivers that don't need locking at all, -e.g. software tunnels. - -This is also used in a few legacy drivers that implement their -own locking, don't use it for new (hardware) drivers. 
- - * netns-local device - -NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between -network namespaces (e.g. loopback). - -Don't use it in drivers. - * VLAN challenged NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index c2476917a6c3..857c9784f87e 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -258,11 +258,11 @@ ndo_get_stats: ndo_start_xmit: Synchronization: __netif_tx_lock spinlock. - When the driver sets NETIF_F_LLTX in dev->features this will be + When the driver sets dev->lltx, this will be called without holding netif_tx_lock. In this case the driver has to lock by itself when needed. The locking there should also properly protect against - set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated. + set_rx_mode. WARNING: use of dev->lltx is deprecated. Don't use it for new drivers. Context: Process with BHs disabled or BH (timer), diff --git a/Documentation/networking/oa-tc6-framework.rst b/Documentation/networking/oa-tc6-framework.rst new file mode 100644 index 000000000000..fe2aabde923a --- /dev/null +++ b/Documentation/networking/oa-tc6-framework.rst @@ -0,0 +1,497 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +========================================================================= +OPEN Alliance 10BASE-T1x MAC-PHY Serial Interface (TC6) Framework Support +========================================================================= + +Introduction +------------ + +The IEEE 802.3cg project defines two 10 Mbit/s PHYs operating over a +single pair of conductors. The 10BASE-T1L (Clause 146) is a long reach +PHY supporting full duplex point-to-point operation over 1 km of a single +balanced pair of conductors.
The 10BASE-T1S (Clause 147) is a short reach +PHY supporting full / half duplex point-to-point operation over 15 m of a +single balanced pair of conductors, or half duplex multidrop bus +operation over 25 m of a single balanced pair of conductors. + +Furthermore, the IEEE 802.3cg project defines the new Physical Layer +Collision Avoidance (PLCA) Reconciliation Sublayer (Clause 148) meant to +provide improved determinism to the CSMA/CD media access method. PLCA +works in conjunction with the 10BASE-T1S PHY operating in multidrop mode. + +The aforementioned PHYs are intended to cover the low-speed / low-cost +applications in industrial and automotive environments. The large number +of pins (16) required by the MII interface, which is specified by +IEEE 802.3 in Clause 22, is one of the major cost factors that need to be +addressed to fulfil this objective. + +The MAC-PHY solution integrates an IEEE Clause 4 MAC and a 10BASE-T1x PHY +exposing a low pin count Serial Peripheral Interface (SPI) to the host +microcontroller. This also enables the addition of Ethernet functionality +to existing low-end microcontrollers which do not integrate a MAC +controller. + +Overview +-------- + +The MAC-PHY is specified to carry both data (Ethernet frames) and control +(register access) transactions over a single full-duplex serial peripheral +interface. + +Protocol Overview +----------------- + +Two types of transactions are defined in the protocol: data transactions +for Ethernet frame transfers and control transactions for register +read/write transfers. A chunk is the basic element of data transactions +and is composed of 4 bytes of overhead plus 64 bytes of payload for +each chunk. Ethernet frames are transferred over one or more data chunks. +Control transactions consist of one or more register read/write control +commands. + +SPI transactions are initiated by the SPI host with the assertion of CSn +low to the MAC-PHY and end with the deassertion of CSn high.
In between +each SPI transaction, the SPI host may need time for additional +processing and to set up the next SPI data or control transaction. + +SPI data transactions consist of an equal number of transmit (TX) and +receive (RX) chunks. Chunks in both transmit and receive directions may +or may not contain valid frame data independently of each other, allowing +for the simultaneous transmission and reception of different-length +frames. + +Each transmit data chunk begins with a 32-bit data header followed by a +data chunk payload on MOSI. The data header indicates whether transmit +frame data is present and provides the information to determine which +bytes of the payload contain valid frame data. + +In parallel, receive data chunks are received on MISO. Each receive data +chunk consists of a data chunk payload ending with a 32-bit data footer. +The data footer indicates if there is receive frame data present within +the payload or not and provides the information to determine which bytes +of the payload contain valid frame data. + +Reference +--------- + +10BASE-T1x MAC-PHY Serial Interface Specification, + +Link: https://opensig.org/download/document/OPEN_Alliance_10BASET1x_MAC-PHY_Serial_Interface_V1.1.pdf + +Hardware Architecture +--------------------- + +.. code-block:: none + + +----------+ +-------------------------------------+ + | | | MAC-PHY | + | |<---->| +-----------+ +-------+ +-------+ | + | SPI Host | | | SPI Slave | | MAC | | PHY | | + | | | +-----------+ +-------+ +-------+ | + +----------+ +-------------------------------------+ + +Software Architecture +--------------------- + +..
code-block:: none + + +----------------------------------------------------------+ + | Networking Subsystem | + +----------------------------------------------------------+ + / \ / \ + | | + | | + \ / | + +----------------------+ +-----------------------------+ + | MAC Driver |<--->| OPEN Alliance TC6 Framework | + +----------------------+ +-----------------------------+ + / \ / \ + | | + | | + | \ / + +----------------------------------------------------------+ + | SPI Subsystem | + +----------------------------------------------------------+ + / \ + | + | + \ / + +----------------------------------------------------------+ + | 10BASE-T1x MAC-PHY Device | + +----------------------------------------------------------+ + +Implementation +-------------- + +MAC Driver +~~~~~~~~~~ + +- Probed by the SPI subsystem. + +- Initializes the OA TC6 framework for the MAC-PHY. + +- Registers and configures the network device. + +- Sends the TX Ethernet frames from the n/w subsystem to the OA TC6 framework. + +OPEN Alliance TC6 Framework +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- Initializes the PHYLIB interface. + +- Registers the MAC-PHY interrupt. + +- Performs MAC-PHY register read/write operations using the control + transaction protocol specified in the OPEN Alliance 10BASE-T1x MAC-PHY + Serial Interface specification. + +- Performs Ethernet frame transactions using the data transaction protocol + for Ethernet frames specified in the OPEN Alliance 10BASE-T1x MAC-PHY + Serial Interface specification. + +- Forwards the received Ethernet frames from the 10BASE-T1x MAC-PHY to the n/w + subsystem. + +Data Transaction +~~~~~~~~~~~~~~~~ + +The Ethernet frames that are typically transferred from the SPI host to +the MAC-PHY will be converted into multiple transmit data chunks. Each +transmit data chunk will have a 4-byte header which contains the +information needed to determine the validity and the location of the +transmit frame data within the 64-byte data chunk payload. + +..
code-block:: none + + +---------------------------------------------------+ + | Tx Chunk | + | +---------------------------+ +----------------+ | MOSI + | | 64 bytes chunk payload | | 4 bytes header | |------------> + | +---------------------------+ +----------------+ | + +---------------------------------------------------+ + +The 4-byte header contains the following fields: + +DNC (Bit 31) - Data-Not-Control flag. This flag specifies the type of SPI + transaction. For TX data chunks, this bit shall be ’1’. + 0 - Control command + 1 - Data chunk + +SEQ (Bit 30) - Data Chunk Sequence. This bit is used to indicate an + even/odd transmit data chunk sequence to the MAC-PHY. + +NORX (Bit 29) - No Receive flag. The SPI host may set this bit to prevent + the MAC-PHY from conveying RX data on the MISO for the + current chunk (DV = 0 in the footer), indicating that the + host would not process it. Typically, the SPI host should + set NORX = 0 indicating that it will accept and process + any receive frame data within the current chunk. + +RSVD (Bit 28..24) - Reserved: All reserved bits shall be ‘0’. + +VS (Bit 23..22) - Vendor Specific. These bits are implementation specific. + If the MAC-PHY does not implement these bits, the host + shall set them to ‘0’. + +DV (Bit 21) - Data Valid flag. The SPI host uses this bit to indicate + whether the current chunk contains valid transmit frame data + (DV = 1) or not (DV = 0). When ‘0’, the MAC-PHY ignores the + chunk payload. Note that the receive path is unaffected by + the setting of the DV bit in the data header. + +SV (Bit 20) - Start Valid flag. The SPI host shall set this bit when the + beginning of an Ethernet frame is present in the current + transmit data chunk payload. Otherwise, this bit shall be + zero. This bit is not to be confused with the Start-of-Frame + Delimiter (SFD) byte described in IEEE 802.3 [2]. + +SWO (Bit 19..16) - Start Word Offset.
When SV = 1, this field shall + contain the 32-bit word offset into the transmit data + chunk payload that points to the start of a new + Ethernet frame to be transmitted. The host shall write + this field as zero when SV = 0. + +RSVD (Bit 15) - Reserved: All reserved bits shall be ‘0’. + +EV (Bit 14) - End Valid flag. The SPI host shall set this bit when the end + of an Ethernet frame is present in the current transmit data + chunk payload. Otherwise, this bit shall be zero. + +EBO (Bit 13..8) - End Byte Offset. When EV = 1, this field shall contain + the byte offset into the transmit data chunk payload + that points to the last byte of the Ethernet frame to + transmit. This field shall be zero when EV = 0. + +TSC (Bit 7..6) - Timestamp Capture. Request a timestamp capture when the + frame is transmitted onto the network. + 00 - Do not capture a timestamp + 01 - Capture timestamp into timestamp capture register A + 10 - Capture timestamp into timestamp capture register B + 11 - Capture timestamp into timestamp capture register C + +RSVD (Bit 5..1) - Reserved: All reserved bits shall be ‘0’. + +P (Bit 0) - Parity. Parity bit calculated over the transmit data header. + Method used is odd parity. + +The number of buffers available in the MAC-PHY to store the incoming +transmit data chunk payloads is represented as transmit credits. The +available transmit credits in the MAC-PHY can be read either from the +Buffer Status Register or from the footer (see below for the footer details) +received from the MAC-PHY. The SPI host should not write more data chunks +than the available transmit credits, as this will lead to a transmit buffer +overflow error. + +If the previous data footer had no transmit credits available, then +once transmit credits become available for transmitting transmit data +chunks, the MAC-PHY interrupt is asserted to the SPI host.
On reception of the +first data header, this interrupt will be deasserted and the received +footer for the first data chunk will have the transmit credits available +information. + +The Ethernet frames that are typically transferred from the MAC-PHY to the SPI +host will be sent as multiple receive data chunks. Each receive data +chunk will have 64 bytes of data chunk payload followed by a 4-byte footer +which contains the information needed to determine the validity and the +location of the receive frame data within the 64-byte data chunk payload. + +.. code-block:: none + + +---------------------------------------------------+ + | Rx Chunk | + | +----------------+ +---------------------------+ | MISO + | | 4 bytes footer | | 64 bytes chunk payload | |------------> + | +----------------+ +---------------------------+ | + +---------------------------------------------------+ + +The 4-byte footer contains the following fields: + +EXST (Bit 31) - Extended Status. This bit is set when any bit in the + STATUS0 or STATUS1 registers is set and not masked. + +HDRB (Bit 30) - Received Header Bad. When set, indicates that the MAC-PHY + received a control or data header with a parity error. + +SYNC (Bit 29) - Configuration Synchronized flag. This bit reflects the + state of the SYNC bit in the CONFIG0 configuration + register (see Table 12). A zero indicates that the MAC-PHY + configuration may not be as expected by the SPI host. + Following configuration, the SPI host sets the + corresponding bit in the configuration register, which is + reflected in this field. + +RCA (Bit 28..24) - Receive Chunks Available. The RCA field indicates to + the SPI host the minimum number of additional receive + data chunks of frame data that are available for + reading beyond the current receive data chunk. This + field is zero when there is no receive frame data + pending in the MAC-PHY’s buffer for reading. + +VS (Bit 23..22) - Vendor Specific. These bits are implementation specific.
+ If not implemented, the MAC-PHY shall set these bits to + ‘0’. + +DV (Bit 21) - Data Valid flag. The MAC-PHY uses this bit to indicate + whether the current receive data chunk contains valid + receive frame data (DV = 1) or not (DV = 0). When ‘0’, the + SPI host shall ignore the chunk payload. + +SV (Bit 20) - Start Valid flag. The MAC-PHY sets this bit when the current + chunk payload contains the start of an Ethernet frame. + Otherwise, this bit is zero. The SV bit is not to be + confused with the Start-of-Frame Delimiter (SFD) byte + described in IEEE 802.3 [2]. + +SWO (Bit 19..16) - Start Word Offset. When SV = 1, this field contains the + 32-bit word offset into the receive data chunk payload + containing the first byte of a new received Ethernet + frame. When a receive timestamp has been added to the + beginning of the received Ethernet frame (RTSA = 1) + then SWO points to the most significant byte of the + timestamp. This field will be zero when SV = 0. + +FD (Bit 15) - Frame Drop. When set, this bit indicates that the MAC has + detected a condition for which the SPI host should drop the + received Ethernet frame. This bit is only valid at the end + of a received Ethernet frame (EV = 1) and shall be zero at + all other times. + +EV (Bit 14) - End Valid flag. The MAC-PHY sets this bit when the end of a + received Ethernet frame is present in this receive data + chunk payload. + +EBO (Bit 13..8) - End Byte Offset: When EV = 1, this field contains the + byte offset into the receive data chunk payload that + locates the last byte of the received Ethernet frame. + This field is zero when EV = 0. + +RTSA (Bit 7) - Receive Timestamp Added. This bit is set when a 32-bit or + 64-bit timestamp has been added to the beginning of the + received Ethernet frame. The MAC-PHY shall set this bit to + zero when SV = 0. + +RTSP (Bit 6) - Receive Timestamp Parity. 
 Parity bit calculated over the + 32-bit/64-bit timestamp added to the beginning of the + received Ethernet frame. Method used is odd parity. The + MAC-PHY shall set this bit to zero when RTSA = 0. + +TXC (Bit 5..1) - Transmit Credits. This field contains the minimum number + of transmit data chunks of frame data that the SPI host + can write in a single transaction without incurring a + transmit buffer overflow error. + +P (Bit 0) - Parity. Parity bit calculated over the receive data footer. + Method used is odd parity. + +The SPI host initiates the data receive transaction based on the receive +chunks available in the MAC-PHY, as reported in the receive chunk footer +(RCA - Receive Chunks Available). Because SPI is full duplex, each receive +data chunk is clocked in while a transmit data chunk is clocked out, so the +SPI host sends data-invalid transmit data chunks (empty chunks), or +data-valid transmit data chunks if there are Ethernet frames to transmit to +the MAC-PHY. The receive chunks available in the MAC-PHY can be read either +from the Buffer Status Register or from the footer. + +If the previous data footer indicated that no receive data chunks were +available, the MAC-PHY asserts its interrupt to the SPI host once receive +data chunks become available again for reading. On reception of the first +data header this interrupt will be deasserted and the received footer for +the first data chunk will have the receive chunks available information. + +MAC-PHY Interrupt +~~~~~~~~~~~~~~~~~ + +The MAC-PHY interrupt is asserted when any of the following conditions occurs: + +Receive chunks available - This interrupt is asserted when the previous +data footer indicated no receive data chunks available and receive data +chunks then become available for reading. On reception of the first data +header this interrupt will be deasserted. + +Transmit chunk credits available - This interrupt is asserted when the +previous data footer indicated no transmit credits available and transmit +credits then become available for transmitting transmit data chunks.
+On reception of the first data header this interrupt will be deasserted. + +Extended status event - This interrupt is asserted when the previous data +footer indicated no extended status and an extended status event then +becomes available. In this case the host should read the STATUS0 register +to determine the corresponding error/event. On reception of the first data +header this interrupt will be deasserted. + +Control Transaction +~~~~~~~~~~~~~~~~~~~ + +The 4-byte control header contains the following fields: + +DNC (Bit 31) - Data-Not-Control flag. This flag specifies the type of SPI + transaction. For control commands, this bit shall be ‘0’. + 0 - Control command + 1 - Data chunk + +HDRB (Bit 30) - Received Header Bad. When set by the MAC-PHY, indicates + that a header was received with a parity error. The SPI + host should always clear this bit. The MAC-PHY ignores the + HDRB value sent by the SPI host on MOSI. + +WNR (Bit 29) - Write-Not-Read. This bit indicates if data is to be written + to registers (when set) or read from registers + (when clear). + +AID (Bit 28) - Address Increment Disable. When clear, the address will be + automatically post-incremented by one following each + register read or write. When set, address auto increment is + disabled, allowing successive reads and writes to occur at + the same register address. + +MMS (Bit 27..24) - Memory Map Selector. This field selects the specific + register memory map to access. + +ADDR (Bit 23..8) - Address. Address of the first register within the + selected memory map to access. + +LEN (Bit 7..1) - Length. Specifies the number of registers to read/write. + This field is interpreted as the number of registers + minus 1, allowing for up to 128 consecutive registers read + or written starting at the address specified in ADDR. A + length of zero shall read or write a single register. + +P (Bit 0) - Parity. Parity bit calculated over the control command header. + Method used is odd parity.
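This sketch shows how the control header layout above could be packed into a 32-bit word. It also applies odd parity: the P bit is chosen so that the whole header contains an odd number of 1 bits. This is a minimal, standalone model written for this description, not the oa_tc6 library's implementation. The function and macro names are hypothetical, and the byte order used on the wire is deliberately left out. The same parity check applies to the receive data footer described earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions of the control command header (hypothetical names).
 * DNC (bit 31) and HDRB (bit 30) are left at 0: this is a control
 * command, and the SPI host always clears HDRB. */
#define CTRL_WNR_BIT    29
#define CTRL_AID_BIT    28
#define CTRL_MMS_SHIFT  24 /* 4-bit field, bits 27..24 */
#define CTRL_ADDR_SHIFT  8 /* 16-bit field, bits 23..8 */
#define CTRL_LEN_SHIFT   1 /* 7-bit field, bits 7..1 */

/* Count set bits with a portable loop instead of a compiler builtin. */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Odd parity: set P (bit 0) so the total number of 1 bits is odd. */
static uint32_t set_odd_parity(uint32_t word)
{
	word &= ~1u;
	if (popcount32(word) % 2 == 0)
		word |= 1u;
	return word;
}

/* Check a received header or footer: valid if its popcount is odd. */
static bool odd_parity_ok(uint32_t word)
{
	return popcount32(word) % 2 == 1;
}

/*
 * Build a control command header for nregs registers (1..128).
 * LEN carries nregs - 1. Returns 0 for invalid arguments; a real
 * header is never 0, because odd parity forces at least one set bit.
 */
static uint32_t build_ctrl_header(bool is_write, bool aid, uint8_t mms,
				  uint16_t addr, unsigned int nregs)
{
	uint32_t hdr = 0;

	if (mms > 0xF || nregs < 1 || nregs > 128)
		return 0;

	hdr |= (uint32_t)is_write << CTRL_WNR_BIT;
	hdr |= (uint32_t)aid << CTRL_AID_BIT;
	hdr |= (uint32_t)mms << CTRL_MMS_SHIFT;
	hdr |= (uint32_t)addr << CTRL_ADDR_SHIFT;
	hdr |= (uint32_t)(nregs - 1) << CTRL_LEN_SHIFT;

	return set_odd_parity(hdr);
}
```

For example, a single-register write to address 0x0004 in memory map 1 packs to 0x21000400. Its three set bits already give odd parity, so P stays 0.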
+ +Control transactions consist of one or more control commands. Control +commands are used by the SPI host to read and write registers within the +MAC-PHY. Each control command is composed of a 4-byte control command +header, followed by register write data in the case of a control write +command. + +The MAC-PHY ignores the final 4 bytes of data from the SPI host at the end +of the control write command. The control write command is also echoed +from the MAC-PHY back to the SPI host to identify which register write +failed in case of any bus errors. The echoed control write command begins +with 4 bytes of unused data, which the SPI host ignores, followed by the +4-byte echoed control header and then the echoed register write data. +Control write commands can write either a single register or multiple +consecutive registers. When multiple consecutive registers are written, the +address is automatically post-incremented by the MAC-PHY. Writes to any +unimplemented or undefined registers shall be ignored and have no effect. + +The MAC-PHY ignores all data from the SPI host following the control +header for the remainder of the control read command. The control read +command is also echoed from the MAC-PHY back to the SPI host to identify +which register read failed in case of any bus errors. The echoed control +read command begins with 4 bytes of unused data, which the SPI host +ignores, followed by the 4-byte echoed control header and then the +register read data. Control read commands can read either a single +register or multiple consecutive registers. When multiple consecutive +registers are read, the address is automatically post-incremented by the +MAC-PHY. Reading any unimplemented or undefined registers shall return +zero. + +Device drivers API +================== + +The include/linux/oa_tc6.h header defines the following functions: + +..
c:function:: struct oa_tc6 *oa_tc6_init(struct spi_device *spi, \ + struct net_device *netdev) + +Initialize the OA TC6 library. + +.. c:function:: void oa_tc6_exit(struct oa_tc6 *tc6) + +Free the allocated OA TC6 library. + +.. c:function:: int oa_tc6_write_register(struct oa_tc6 *tc6, u32 address, \ + u32 value) + +Write a single register in the MAC-PHY. + +.. c:function:: int oa_tc6_write_registers(struct oa_tc6 *tc6, u32 address, \ + u32 value[], u8 length) + +Write multiple consecutive registers starting from @address in the MAC-PHY. +A maximum of 128 consecutive registers can be written starting at @address. + +.. c:function:: int oa_tc6_read_register(struct oa_tc6 *tc6, u32 address, \ + u32 *value) + +Read a single register in the MAC-PHY. + +.. c:function:: int oa_tc6_read_registers(struct oa_tc6 *tc6, u32 address, \ + u32 value[], u8 length) + +Read multiple consecutive registers starting from @address in the MAC-PHY. +A maximum of 128 consecutive registers can be read starting at @address. + +.. c:function:: netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, \ + struct sk_buff *skb) + +The Ethernet frame in the skb is transmitted, or queued to be transmitted, +through the MAC-PHY. + +.. c:function:: int oa_tc6_zero_align_receive_frame_enable(struct oa_tc6 *tc6) + +Enable the zero align receive frame feature, which aligns the data of all +received Ethernet frames to start at the beginning of a receive data chunk +payload with a start word offset (SWO) of zero. diff --git a/Documentation/networking/phy-link-topology.rst b/Documentation/networking/phy-link-topology.rst new file mode 100644 index 000000000000..4dec5d7d6513 --- /dev/null +++ b/Documentation/networking/phy-link-topology.rst @@ -0,0 +1,121 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _phy_link_topology: + +================= +PHY link topology +================= + +Overview +======== + +The PHY link topology representation in the networking stack aims to represent +the hardware layout for any given Ethernet link.
+ +An Ethernet interface from userspace's point of view is nothing but a +:c:type:`struct net_device <net_device>`, which exposes configuration options +through the legacy ioctls and the ethtool netlink commands. The base assumption +when designing these configuration APIs was that the link looks something like :: + + +-----------------------+ +----------+ +--------------+ + | Ethernet Controller / | | Ethernet | | Connector / | + | MAC | ------ | PHY | ---- | Port | ---... to LP + +-----------------------+ +----------+ +--------------+ + struct net_device struct phy_device + +Commands that need to configure the PHY will go through the net_device.phydev +field to reach the PHY and perform the relevant configuration. + +This assumption falls apart in more complex topologies that can arise when, +for example, using SFP transceivers (although that's not the only case). + +Here, we have 2 basic scenarios. In the first, the MAC is able to output a serialized +interface that can be fed directly to an SFP cage, such as SGMII, 1000BaseX, +10GBaseR, etc. + +The link topology then looks like this (when an SFP module is inserted) :: + + +-----+ SGMII +------------+ + | MAC | ------- | SFP Module | + +-----+ +------------+ + +Knowing that some modules embed a PHY, the actual link is more like :: + + +-----+ SGMII +--------------+ + | MAC | -------- | PHY (on SFP) | + +-----+ +--------------+ + +In this case, the SFP PHY is handled by phylib, and registered by phylink through +its SFP upstream ops. + +Now some Ethernet controllers aren't able to output a serialized interface, so +we can't directly connect them to an SFP cage.
 However, some PHYs can be used +as media-converters, to translate the non-serialized MAC MII interface to a +serialized MII interface fed to the SFP :: + + +-----+ RGMII +-----------------------+ SGMII +--------------+ + | MAC | ------- | PHY (media converter) | ------- | PHY (on SFP) | + +-----+ +-----------------------+ +--------------+ + +This is where the model of having a single net_device.phydev pointer shows its +limitations, as we now have 2 PHYs on the link. + +The phy_link topology framework aims to provide a way to keep track of every +PHY on the link, for use by kernel drivers and subsystems, and also to +report the topology to userspace, allowing individual PHYs to be targeted in +configuration commands. + +API +=== + +The :c:type:`struct phy_link_topology <phy_link_topology>` is a per-netdevice +resource that gets initialized at netdevice creation. Once it's initialized, +it is possible to register PHYs to the topology through: + +:c:func:`phy_link_topo_add_phy` + +Besides registering the PHY to the topology, this call will also assign a unique +index to the PHY, which can then be reported to userspace to refer to this PHY +(akin to the ifindex). This index is a u32, ranging from 1 to U32_MAX. The value +0 is reserved to indicate the PHY doesn't belong to any topology yet. + +The PHY can then be removed from the topology through: + +:c:func:`phy_link_topo_del_phy` + +These functions are already hooked into the phylib subsystem, so all PHYs that +are linked to a net_device through :c:func:`phy_attach_direct` will automatically +join the netdev's topology. + +PHYs that are on an SFP module will also be automatically registered if the SFP +upstream is phylink (so, no media-converter). + +PHY drivers that can be used as an SFP upstream need to call :c:func:`phy_sfp_attach_phy` +and :c:func:`phy_sfp_detach_phy`, which can be used as a +.attach_phy / .detach_phy implementation for the +:c:type:`struct sfp_upstream_ops <sfp_upstream_ops>`.
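The index semantics described above can be modeled in a few lines of userspace C. An index is unique within the topology, is a u32 in the range 1 to U32_MAX, and the value 0 means the PHY is not registered. This is a toy model only. struct toy_topology, the fixed slot count and the toy_topo_* helpers are hypothetical; they are not the kernel's phy_link_topology implementation. The model also allocates indices cyclically so that a freed index is not reused immediately. That is an assumption of this model: the text above only guarantees uniqueness and the reserved value 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TOPO_SLOTS 8 /* arbitrary capacity for this toy model */

struct toy_phy {
	const char *name;
	uint32_t phyindex; /* 0: not part of any topology */
};

struct toy_topology {
	struct toy_phy *slots[TOY_TOPO_SLOTS];
	uint32_t next_index; /* next candidate index */
};

static void toy_topo_init(struct toy_topology *t)
{
	for (size_t i = 0; i < TOY_TOPO_SLOTS; i++)
		t->slots[i] = NULL;
	t->next_index = 1;
}

static int toy_index_in_use(const struct toy_topology *t, uint32_t idx)
{
	for (size_t i = 0; i < TOY_TOPO_SLOTS; i++)
		if (t->slots[i] && t->slots[i]->phyindex == idx)
			return 1;
	return 0;
}

/* Register a PHY and give it a unique, non-zero index. Returns 0 or -1. */
static int toy_topo_add_phy(struct toy_topology *t, struct toy_phy *phy)
{
	size_t slot;
	uint32_t idx = t->next_index;

	for (slot = 0; slot < TOY_TOPO_SLOTS; slot++)
		if (!t->slots[slot])
			break;
	if (slot == TOY_TOPO_SLOTS)
		return -1; /* the toy model is full */

	/* Skip 0 (reserved) and indices in use; a u32 wraps after U32_MAX. */
	while (idx == 0 || toy_index_in_use(t, idx))
		idx++;

	phy->phyindex = idx;
	t->slots[slot] = phy;
	t->next_index = idx + 1; /* cyclic: freed indices are not reused at once */
	return 0;
}

static void toy_topo_del_phy(struct toy_topology *t, struct toy_phy *phy)
{
	for (size_t i = 0; i < TOY_TOPO_SLOTS; i++)
		if (t->slots[i] == phy)
			t->slots[i] = NULL;
	phy->phyindex = 0; /* back to "not in any topology" */
}
```

Adding two PHYs yields indices 1 and 2. Deleting the first resets its phyindex to 0, and the next registration receives 3 rather than reusing 1.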
+ +UAPI +==== + +There exists a set of netlink commands to query the link topology from userspace; +see ``Documentation/networking/ethtool-netlink.rst``. + +The whole point of having a topology representation is to assign the phyindex +field in :c:type:`struct phy_device <phy_device>`. This index is reported to +userspace using the ``ETHTOOL_MSG_PHY_GET`` ethnl command. Performing a DUMP operation +will result in all PHYs from all net_devices being listed. The DUMP command +accepts either an ``ETHTOOL_A_HEADER_DEV_INDEX`` or ``ETHTOOL_A_HEADER_DEV_NAME`` +to be passed in the request to filter the DUMP to a single net_device. + +The retrieved index can then be passed as a request parameter using the +``ETHTOOL_A_HEADER_PHY_INDEX`` field in the following ethnl commands: + +* ``ETHTOOL_MSG_STRSET_GET`` to get the stats string set from a given PHY +* ``ETHTOOL_MSG_CABLE_TEST_ACT`` and ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT``, to perform + cable testing on a given PHY on the link (most likely the outermost PHY) +* ``ETHTOOL_MSG_PSE_SET`` and ``ETHTOOL_MSG_PSE_GET`` for PHY-controlled PoE and PSE settings +* ``ETHTOOL_MSG_PLCA_GET_CFG``, ``ETHTOOL_MSG_PLCA_SET_CFG`` and ``ETHTOOL_MSG_PLCA_GET_STATUS`` + to get and set the PLCA (Physical Layer Collision Avoidance) parameters + +Note that the PHY index can be passed to other requests, which will silently +ignore it if present and irrelevant. diff --git a/Documentation/networking/switchdev.rst b/Documentation/networking/switchdev.rst index 758f1dae3fce..f355f0166f1b 100644 --- a/Documentation/networking/switchdev.rst +++ b/Documentation/networking/switchdev.rst @@ -137,10 +137,10 @@ would be sub-port 0 on port 1 on switch 1.
Port Features ^^^^^^^^^^^^^ -NETIF_F_NETNS_LOCAL +dev->netns_local If the switchdev driver (and device) only supports offloading of the default -network namespace (netns), the driver should set this feature flag to prevent +network namespace (netns), the driver should set this private flag to prevent the port netdev from being moved out of the default netns. A netns-aware driver/device would not set this flag and be responsible for partitioning hardware to preserve netns containment. This means hardware cannot forward diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst index 5e93cd71f99f..8199e6917671 100644 --- a/Documentation/networking/timestamping.rst +++ b/Documentation/networking/timestamping.rst @@ -158,7 +158,8 @@ SOF_TIMESTAMPING_SYS_HARDWARE: SOF_TIMESTAMPING_RAW_HARDWARE: Report hardware timestamps as generated by - SOF_TIMESTAMPING_TX_HARDWARE when available. + SOF_TIMESTAMPING_TX_HARDWARE or SOF_TIMESTAMPING_RX_HARDWARE + when available. 1.3.3 Timestamp Options @@ -266,6 +267,23 @@ SOF_TIMESTAMPING_OPT_TX_SWHW: two separate messages will be looped to the socket's error queue, each containing just one timestamp. +SOF_TIMESTAMPING_OPT_RX_FILTER: + Filter out spurious receive timestamps: report a receive timestamp + only if the matching timestamp generation flag is enabled. + + Receive timestamps are generated early in the ingress path, before a + packet's destination socket is known. If any socket enables receive + timestamps, all sockets will receive timestamped packets, + including those that request timestamp reporting with + SOF_TIMESTAMPING_SOFTWARE and/or SOF_TIMESTAMPING_RAW_HARDWARE but + do not request receive timestamp generation. This can happen when + requesting transmit timestamps only. + + Receiving spurious timestamps is generally benign. A process can + ignore the unexpected non-zero value, but it makes behavior subtly + dependent on other sockets.
 This flag isolates the socket for more + deterministic behavior. + New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate regardless of the setting of sysctl net.core.tstamp_allow_data. diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst index e2c1fb8a569a..9ebecb7b00b2 100644 --- a/Documentation/power/pci.rst +++ b/Documentation/power/pci.rst @@ -979,18 +979,17 @@ subsections can be defined as a separate function, it often is convenient to point two or more members of struct dev_pm_ops to the same routine. There are a few convenience macros that can be used for this purpose. -The SIMPLE_DEV_PM_OPS macro declares a struct dev_pm_ops object with one +The DEFINE_SIMPLE_DEV_PM_OPS() macro declares a struct dev_pm_ops object with one suspend routine pointed to by the .suspend(), .freeze(), and .poweroff() members and one resume routine pointed to by the .resume(), .thaw(), and .restore() members. The other function pointers in this struct dev_pm_ops are unset. -The UNIVERSAL_DEV_PM_OPS macro is similar to SIMPLE_DEV_PM_OPS, but it -additionally sets the .runtime_resume() pointer to the same value as -.resume() (and .thaw(), and .restore()) and the .runtime_suspend() pointer to -the same value as .suspend() (and .freeze() and .poweroff()). +The DEFINE_RUNTIME_DEV_PM_OPS() macro, by contrast, points .runtime_suspend() and +.runtime_resume() at the given routines and the system sleep members at +pm_runtime_force_suspend() and pm_runtime_force_resume(). -The SET_SYSTEM_SLEEP_PM_OPS can be used inside of a declaration of struct +The SYSTEM_SLEEP_PM_OPS() macro can be used inside of a declaration of struct dev_pm_ops to indicate that one suspend routine is to be pointed to by the .suspend(), .freeze(), and .poweroff() members and one resume routine is to be pointed to by the .resume(), .thaw(), and .restore() members.
diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst index 5c4e730f38d0..53d1996460ab 100644 --- a/Documentation/power/runtime_pm.rst +++ b/Documentation/power/runtime_pm.rst @@ -811,8 +811,8 @@ subsystem-level dev_pm_ops structure. Device drivers that wish to use the same function as a system suspend, freeze, poweroff and runtime suspend callback, and similarly for system resume, thaw, -restore, and runtime resume, can achieve this with the help of the -UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its +restore, and runtime resume, can achieve similar behaviour with the help of the +DEFINE_RUNTIME_DEV_PM_OPS() macro defined in include/linux/pm_runtime.h (possibly setting its last argument to NULL). 8. "No-Callback" Devices diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst index 59f8fc106cb0..3e0a7114a862 100644 --- a/Documentation/security/index.rst +++ b/Documentation/security/index.rst @@ -19,3 +19,4 @@ Security Documentation digsig landlock secrets/index + ipe diff --git a/Documentation/security/ipe.rst b/Documentation/security/ipe.rst new file mode 100644 index 000000000000..4a7d953abcdc --- /dev/null +++ b/Documentation/security/ipe.rst @@ -0,0 +1,446 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Integrity Policy Enforcement (IPE) - Kernel Documentation +========================================================= + +.. NOTE:: + + This is documentation targeted at developers, instead of administrators. + If you're looking for documentation on the usage of IPE, please see + :doc:`IPE admin guide </admin-guide/LSM/ipe>`. + +Historical Motivation +--------------------- + +The original issue that prompted IPE's implementation was the creation +of a locked-down system. This system would be born-secure and have +strong integrity guarantees over both the executable code and specific +*data files* on the system that were critical to its function.
 These +specific data files would not be readable unless they passed integrity +policy. A mandatory access control system would be present, and +as a result, xattrs would have to be protected. This led to a selection +of what would provide the integrity claims. At the time, there were two +main mechanisms considered that could guarantee integrity for the system +with these requirements: + + 1. IMA + EVM Signatures + 2. DM-Verity + +Both options were carefully considered; however, the choice to use DM-Verity +over IMA+EVM as the *integrity mechanism* in the original use case of IPE +was made for three main reasons: + + 1. Protection of additional attack vectors: + + * With IMA+EVM, without an encryption solution, the system is vulnerable + to offline attack against the aforementioned specific data files. + + Unlike executables, read operations (like those on the protected data + files) cannot be required to be globally integrity verified. This means + there must be some form of selector to determine whether a read should + enforce the integrity policy or not. + + At the time, this was done with mandatory access control labels. An IMA + policy would indicate what labels required integrity verification, which + presented an issue: EVM would protect the label, but if an attacker could + modify the filesystem offline, the attacker could wipe all the xattrs, + including the SELinux labels that would be used to determine whether the + file should be subject to integrity policy. + + With DM-Verity, as the xattrs are saved as part of the Merkle tree, if + the filesystem protected by dm-verity is modified offline, the + checksum no longer matches and the file fails to be read. + + * As userspace binaries are paged in Linux, dm-verity also offers + additional protection against a hostile block device. In such an attack, + the block device reports the appropriate content for the IMA hash + initially, passing the required integrity check.
 Then, on the page fault + that accesses the real data, it reports the attacker's payload. Since + dm-verity will check the data when the page fault occurs (and the disk + access), this attack is mitigated. + + 2. Performance: + + * dm-verity provides integrity verification on demand as blocks are + read, rather than requiring the entire file to be read into memory for + validation. + + 3. Simplicity of signing: + + * No need for two signatures (IMA, then EVM): one signature covers + an entire block device. + * Signatures can be stored externally to the filesystem metadata. + * The signature supports an x.509-based signing infrastructure. + +The next step was to choose a *policy* to enforce the integrity mechanism. +The minimum requirements for the policy were: + + 1. The policy itself must be integrity verified (preventing trivial + attacks against it). + 2. The policy itself must be resistant to rollback attacks. + 3. The policy enforcement must have a permissive-like mode. + 4. The policy must be able to be updated, in its entirety, without + a reboot. + 5. Policy updates must be atomic. + 6. The policy must support *revocations* of previously authored + components. + 7. The policy must be auditable, at any point in time. + +IMA, as the only integrity policy mechanism at the time, was +considered against this list of requirements, and did not fulfill +all of the minimum requirements. Extending IMA to cover these +requirements was considered, but ultimately discarded for +two reasons: + + 1. Regression risk; many of these changes would result in + dramatic code changes to IMA, which is already present in the + kernel, and therefore might impact users. + + 2. IMA was used in the system for measurement and attestation; + separation of measurement policy from local integrity policy + enforcement was considered favorable. + +For these reasons, it was decided that a new LSM should be created, +whose responsibility would be only the local integrity policy enforcement.
+ +Role and Scope +-------------- + +IPE, as its name implies, is fundamentally an integrity policy enforcement +solution; IPE does not mandate how integrity is provided, but instead +leaves that decision to the system administrator, who sets the security bar +by selecting the mechanisms that suit their individual needs. +There are several integrity solutions that provide different +levels of security guarantees, and IPE allows sysadmins to express policy for +theoretically all of them. + +IPE does not have an inherent mechanism to ensure integrity on its own. +Instead, there are more effective layers available for building systems that +can guarantee integrity. It's important to note that the mechanism for proving +integrity is independent of the policy for enforcing that integrity claim. + +Therefore, IPE was designed around: + + 1. Easy integration with integrity providers. + 2. Ease of use for platform administrators/sysadmins. + +Design Rationale: +----------------- + +IPE was designed after evaluating existing integrity policy solutions +in other operating systems and environments. In this survey of other +implementations, a few pitfalls were identified: + + 1. Policies were not readable by humans, usually requiring a binary + intermediary format. + 2. A single, non-customizable action was implicitly taken as a default. + 3. Debugging the policy required manual steps to determine what rule was violated. + 4. Authoring a policy required in-depth knowledge of the larger system + or operating system. + +IPE attempts to avoid all of these pitfalls. + +Policy +~~~~~~ + +Plain Text +^^^^^^^^^^ + +IPE's policy is plain-text. This introduces slightly larger policy files than +other LSMs, but solves two major problems that occur with some integrity policy +solutions on other platforms. + +The first issue is one of code maintenance and duplication.
 To author policies, +the policy has to be some form of string representation (be it structured, +through XML, JSON, YAML, etcetera), to allow the policy author to understand +what is being written. In a hypothetical binary policy design, a serializer +is necessary to write the policy from the human-readable form to the binary +form, and a deserializer is needed to interpret the binary form into a data +structure in the kernel. + +Eventually, another deserializer will be needed to transform the binary form +back into the human-readable form with as much information preserved as possible. +This is because, without it, a user of this access control system will have to keep +a lookup table of a checksum and the original file itself to try to understand what +policies have been deployed on this system and what policies have not. For a single +user, this may be acceptable, as old policies can be discarded almost immediately +after the update takes hold. For users that manage computer fleets in the thousands, +if not hundreds of thousands, with multiple different operating systems, and multiple +different operational needs, this quickly becomes an issue, as stale policies from +years ago may be present, resulting in the need to recover the policy or fund +extensive infrastructure to track what each policy contains. + +With three separate serializers/deserializers, maintenance becomes costly. If the +policy avoids the binary format, there is only one required serializer: from the +human-readable form to the data structure in the kernel, saving on code maintenance, +and retaining operability. + +The second issue with a binary format is one of transparency. As IPE controls +access based on the trust of the system's resources, changes to its policy must +also be trusted. This is done through signatures, which makes signing +part of the process. Signing, as a process, is typically done with a +high security bar, as anything signed can be used to attack integrity +enforcement systems.
 It is also important that, when signing something, +the signer is aware of what they are signing. A binary policy can +obscure that; what signers see is an opaque binary blob. With a +plain-text policy, on the other hand, the signers see the actual policy +submitted for signing. + +Boot Policy +~~~~~~~~~~~ + +IPE, if configured appropriately, is able to enforce a policy as soon as a +kernel is booted and usermode starts. That implies the policy must be stored +somewhere so that it can be applied the moment usermode starts. Generally, that +storage can be handled in one of three ways: + + 1. The policy file(s) live on disk and the kernel loads the policy prior + to any code path that would result in an enforcement decision. + 2. The policy file(s) are passed by the bootloader to the kernel, which + parses the policy. + 3. There is a policy file that is compiled into the kernel that is + parsed and enforced on initialization. + +The first option has problems: the kernel reading files from userspace +is typically discouraged and very uncommon in the kernel. + +The second option also has problems: Linux supports a variety of bootloaders +across its entire ecosystem; every bootloader would have to support this +new methodology or there must be an independent source. It would likely +result in more drastic changes to the kernel startup than necessary. + +The third option is the best, but it's important to be aware that the policy +will count against the size of the kernel it's compiled into. It's important to +keep this policy generalized enough that userspace can load a new, more +complicated policy, but restrictive enough that it will not overauthorize +and cause security issues. + +The initramfs provides a way to establish this bootup path. The +kernel starts with a minimal policy that trusts the initramfs only.
 Inside +the initramfs, when the real rootfs is mounted but not yet switched to, +it deploys and activates a policy that trusts the new root filesystem. +This prevents overauthorization at any step, and keeps the kernel policy +to a minimal size. + +Startup +^^^^^^^ + +Not every system, however, starts with an initramfs, so the startup policy +compiled into the kernel will need some flexibility to express how trust +is established for the next phase of the bootup. To this end, making the +compiled-in policy a full IPE policy allows system builders +to express the first stage bootup requirements appropriately. + +Updatable, Rebootless Policy +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Requirements change over time (vulnerabilities are found in previously +trusted applications, keys roll, etcetera). Updating a kernel to meet +those changed security goals is not always a suitable option, as updates are not +always risk-free, and blocking a security update leaves systems vulnerable. +This means IPE requires a policy that can be completely updated (allowing +revocations of existing policy) from a source external to the kernel (allowing +policies to be updated without updating the kernel). + +Additionally, since the kernel is stateless between invocations, and reading +policy files off the disk from kernel space is a bad idea(tm), policy +updates have to be applied without a reboot. + +An update from an external source could potentially be malicious, +so this policy needs to have a way to be identified as trusted. This is +done via a signature chained to a trust source in the kernel. Arbitrarily, +this is the ``SYSTEM_TRUSTED_KEYRING``, a keyring that is initially +populated at kernel compile-time, as this matches the expectation that the +author of the compiled-in policy described above is the same entity that can +deploy policy updates.
+ +Anti-Rollback / Anti-Replay +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Over time, vulnerabilities are found and trusted resources may not be +trusted anymore. IPE's policy is no exception to this. There can be +instances where a policy author mistakenly deploys an insecure policy +before correcting it with a secure policy. + +Suppose the insecure policy has been signed and an attacker +acquires it. IPE then needs a way to prevent rollback +from the secure policy update to the insecure policy update. + +Initially, IPE's policy can have a policy_version that states the +minimum required version across all policies that can be active on +the system. This will prevent rollback while the system is live. + +.. WARNING:: + + However, since the kernel is stateless across boots, this policy + version will be reset to 0.0.0 on the next boot. System builders + need to be aware of this, and ensure the new secure policies are + deployed ASAP after a boot to ensure that the window of + opportunity is minimal for an attacker to deploy the insecure policy. + +Implicit Actions: +~~~~~~~~~~~~~~~~~ + +The issue of implicit actions only becomes visible when you consider +a mixed level of security bars across multiple operations in a system. +For example, consider a system that has strong integrity guarantees +over both the executable code and specific *data files* on the system +that are critical to its function. In this system, three types of policies +are possible: + + 1. A policy in which failure to match any rules in the policy results + in the action being denied. + 2. A policy in which failure to match any rules in the policy results + in the action being allowed. + 3. A policy in which the action taken when no rules are matched is + specified by the policy author.
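The three options differ only in what happens when no rule matches. As a hedged illustration of the third option, the following C sketch models first-match evaluation in the order rules are written, with an explicit default per operation. The types, field names, and the boolean ``critical`` flag (standing in for ``label=critical_t``) are hypothetical simplifications, not IPE's internal data structures. The example policy denies execution unless integrity is verified, and allows reads except for unverified critical files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an IPE-style policy (not IPE's types). */
enum op { OP_EXECUTE, OP_READ, OP_COUNT };
enum action { ACTION_DENY, ACTION_ALLOW };

struct file_props {
	bool integrity_verified;
	bool critical; /* stands in for a label such as critical_t */
};

/* Each property is -1 (rule does not care), 0 (must be false) or 1 (true). */
struct rule {
	enum op op;
	int integrity_verified;
	int critical;
	enum action action;
};

struct policy {
	enum action defaults[OP_COUNT]; /* option 3: one explicit default per op */
	const struct rule *rules;
	size_t nrules;
};

static bool prop_matches(int want, bool have)
{
	return want < 0 || (want != 0) == have;
}

/* Evaluate rules in the order written; the first matching rule decides. */
static enum action evaluate(const struct policy *p, enum op op,
			    const struct file_props *f)
{
	assert(op < OP_COUNT);
	for (size_t i = 0; i < p->nrules; i++) {
		const struct rule *r = &p->rules[i];

		if (r->op == op &&
		    prop_matches(r->integrity_verified, f->integrity_verified) &&
		    prop_matches(r->critical, f->critical))
			return r->action;
	}
	/* No rule matched: fall back to the default the author stated. */
	return p->defaults[op];
}

/* Executes default to DENY and reads to ALLOW, each with one exception. */
static const struct rule example_rules[] = {
	{ OP_EXECUTE, 1, -1, ACTION_ALLOW }, /* verified code may run */
	{ OP_READ, 0, 1, ACTION_DENY },      /* unverified critical data */
};

static const struct policy example_policy = {
	.defaults = { [OP_EXECUTE] = ACTION_DENY, [OP_READ] = ACTION_ALLOW },
	.rules = example_rules,
	.nrules = sizeof(example_rules) / sizeof(example_rules[0]),
};
```

In this model, changing what happens on a miss means editing one entry in ``defaults``. It does not require appending a catch-all rule at the end of the list, which is the point of option 3.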
+ +The first option could make a policy like this:: + + op=EXECUTE integrity_verified=YES action=ALLOW + +In the example system, this works well for the executables, as all +executables should have integrity guarantees, without exception. The +issue arises with the second requirement about specific data files. +This would result in a policy like this (assuming each line is +evaluated in order):: + + op=EXECUTE integrity_verified=YES action=ALLOW + + op=READ integrity_verified=NO label=critical_t action=DENY + op=READ action=ALLOW + +This is somewhat clear if you read the docs and understand that the policy +is executed in order and that the default is a denial; however, the +last line effectively changes that default to an ALLOW. This is +required, because in a realistic system, there are some unverified +reads (imagine appending to a log file). + +The second option, in which matching no rules results in an allow, is clearer +for the specific data files:: + + op=READ integrity_verified=NO label=critical_t action=DENY + +And, like the first option, it falls short with the execution scenario, +effectively needing to override the default:: + + op=EXECUTE integrity_verified=YES action=ALLOW + op=EXECUTE action=DENY + + op=READ integrity_verified=NO label=critical_t action=DENY + +This leaves the third option. Instead of making users be clever +and override the default with an empty rule, force the end-user +to consider what the appropriate default should be for their +scenario and explicitly state it:: + + DEFAULT op=EXECUTE action=DENY + op=EXECUTE integrity_verified=YES action=ALLOW + + DEFAULT op=READ action=ALLOW + op=READ integrity_verified=NO label=critical_t action=DENY + +Policy Debugging: +~~~~~~~~~~~~~~~~~ + +When developing a policy, it is useful to know what line of the policy +is being violated, to reduce debugging costs by narrowing the scope of the +investigation to the exact line that resulted in the action.
Some integrity +policy systems do not provide this information, instead providing the +information that was used in the evaluation. This then requires a correlation +with the policy to evaluate what went wrong. + +Instead, IPE just emits the rule that was matched. This limits the scope +of the investigation to the exact policy line (in the case of a specific +rule), or the section (in the case of a DEFAULT). This decreases iteration +and investigation times when policy failures are observed while evaluating +policies. + +IPE's policy engine is also designed so that it is obvious to +a human how to investigate a policy failure. Each line is evaluated in +the order in which it is written, so the algorithm is simple enough +for humans to retrace the steps that could have caused the failure. In other +surveyed systems, optimizations occur (sorting rules, for instance) when loading +the policy. In those systems, debugging requires multiple steps, and the +algorithm may not always be clear to the end-user without reading the code first. + +Simplified Policy: +~~~~~~~~~~~~~~~~~~ + +Finally, IPE's policy is designed for sysadmins, not kernel developers. Instead +of covering individual LSM hooks (or syscalls), IPE covers operations. This means +instead of sysadmins needing to know that the syscalls ``mmap``, ``mprotect``, +``execve``, and ``uselib`` must have rules protecting them, they simply need to know +that they want to restrict code execution. This limits the number of bypasses that +could occur due to a lack of knowledge of the underlying system; instead, the +maintainers of IPE, being kernel developers, can make the correct choice to determine +whether something maps to these operations, and under what conditions. + +Implementation Notes +-------------------- + +Anonymous Memory +~~~~~~~~~~~~~~~~ + +Anonymous memory isn't treated any differently from any other access in IPE.
+When anonymous memory is mapped with ``+X``, it still comes into the ``file_mmap`` +or ``file_mprotect`` hook, but with a ``NULL`` file object. This is submitted to +the evaluation, like any other file. However, all current trust properties will +evaluate to false, as they are all file-based and the operation is not +associated with a file. + +.. WARNING:: + + This also occurs with the ``kernel_load_data`` hook, when the kernel is + loading data from a userspace buffer that is not backed by a file. In this + scenario all current trust properties will also evaluate to false. + +Securityfs Interface +~~~~~~~~~~~~~~~~~~~~ + +The per-policy securityfs tree is somewhat unique. For example, for +a standard securityfs policy tree:: + + MyPolicy + |- active + |- delete + |- name + |- pkcs7 + |- policy + |- update + |- version + +The policy is stored in the ``->i_private`` data of the MyPolicy inode. + +Tests +----- + +IPE has KUnit tests for the policy parser. Recommended kunitconfig:: + + CONFIG_KUNIT=y + CONFIG_SECURITY=y + CONFIG_SECURITYFS=y + CONFIG_PKCS7_MESSAGE_PARSER=y + CONFIG_SYSTEM_DATA_VERIFICATION=y + CONFIG_FS_VERITY=y + CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y + CONFIG_BLOCK=y + CONFIG_MD=y + CONFIG_BLK_DEV_DM=y + CONFIG_DM_VERITY=y + CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y + CONFIG_NET=y + CONFIG_AUDIT=y + CONFIG_AUDITSYSCALL=y + CONFIG_BLK_DEV_INITRD=y + + CONFIG_SECURITY_IPE=y + CONFIG_IPE_PROP_DM_VERITY=y + CONFIG_IPE_PROP_DM_VERITY_SIGNATURE=y + CONFIG_IPE_PROP_FS_VERITY=y + CONFIG_IPE_PROP_FS_VERITY_BUILTIN_SIG=y + CONFIG_SECURITY_IPE_KUNIT_TEST=y + +In addition, IPE has a Python-based integration +`test suite <https://github.com/microsoft/ipe/tree/test-suite>`_ that +can test both user interfaces and enforcement functionalities. diff --git a/Documentation/virt/hyperv/coco.rst b/Documentation/virt/hyperv/coco.rst new file mode 100644 index 000000000000..c15d6fe34b4e --- /dev/null +++ b/Documentation/virt/hyperv/coco.rst @@ -0,0 +1,260 @@ +..
SPDX-License-Identifier: GPL-2.0 + +Confidential Computing VMs +========================== +Hyper-V can create and run Linux guests that are Confidential Computing +(CoCo) VMs. Such VMs cooperate with the physical processor to better protect +the confidentiality and integrity of data in the VM's memory, even in the +face of a hypervisor/VMM that has been compromised and may behave maliciously. +CoCo VMs on Hyper-V share the generic CoCo VM threat model and security +objectives described in Documentation/security/snp-tdx-threat-model.rst. Note +that Hyper-V specific code in Linux refers to CoCo VMs as "isolated VMs" or +"isolation VMs". + +A Linux CoCo VM on Hyper-V requires the cooperation and interaction of the +following: + +* Physical hardware with a processor that supports CoCo VMs + +* The hardware runs a version of Windows/Hyper-V with support for CoCo VMs + +* The VM runs a version of Linux that supports being a CoCo VM + +The physical hardware requirements are as follows: + +* AMD processor with SEV-SNP. Hyper-V does not run guest VMs with AMD SME, + SEV, or SEV-ES encryption, and such encryption is not sufficient for a CoCo + VM on Hyper-V. + +* Intel processor with TDX + +To create a CoCo VM, the "Isolated VM" attribute must be specified to Hyper-V +when the VM is created. A VM cannot be changed from a CoCo VM to a normal VM, +or vice versa, after it is created. + +Operational Modes +----------------- +Hyper-V CoCo VMs can run in two modes. The mode is selected when the VM is +created and cannot be changed during the life of the VM. + +* Fully-enlightened mode. In this mode, the guest operating system is + enlightened to understand and manage all aspects of running as a CoCo VM. + +* Paravisor mode. In this mode, a paravisor layer between the guest and the + host provides some operations needed to run as a CoCo VM. The guest operating + system can have fewer CoCo enlightenments than is required in the + fully-enlightened case. 
+ +Conceptually, fully-enlightened mode and paravisor mode may be treated as +points on a spectrum spanning the degree of guest enlightenment needed to run +as a CoCo VM. Fully-enlightened mode is one end of the spectrum. A full +implementation of paravisor mode is the other end of the spectrum, where all +aspects of running as a CoCo VM are handled by the paravisor, and a normal +guest OS with no knowledge of memory encryption or other aspects of CoCo VMs +can run successfully. However, the Hyper-V implementation of paravisor mode +does not go this far, and is somewhere in the middle of the spectrum. Some +aspects of CoCo VMs are handled by the Hyper-V paravisor while the guest OS +must be enlightened for other aspects. Unfortunately, there is no +standardized enumeration of features/functions that might be provided in the +paravisor, and there is no standardized mechanism for a guest OS to query the +paravisor for the features/functions it provides. The understanding of what +the paravisor provides is hard-coded in the guest OS. + +Paravisor mode has similarities to the `Coconut project`_, which aims to provide +a limited paravisor that offers services to the guest such as a virtual TPM. +However, the Hyper-V paravisor generally handles more aspects of CoCo VMs +than is currently envisioned for Coconut, and so is further toward the "no +guest enlightenments required" end of the spectrum. + +.. _Coconut project: https://github.com/coconut-svsm/svsm + +In the CoCo VM threat model, the paravisor is in the guest security domain +and must be trusted by the guest OS. By implication, the hypervisor/VMM must +protect itself against a potentially malicious paravisor just like it +protects against a potentially malicious guest. + +The hardware architectural approach to fully-enlightened vs. paravisor mode +varies depending on the underlying processor.
+ +* With AMD SEV-SNP processors, in fully-enlightened mode the guest OS runs in + VMPL 0 and has full control of the guest context. In paravisor mode, the + guest OS runs in VMPL 2 and the paravisor runs in VMPL 0. The paravisor + running in VMPL 0 has privileges that the guest OS in VMPL 2 does not have. + Certain operations require the guest to invoke the paravisor. Furthermore, in + paravisor mode the guest OS operates in "virtual Top Of Memory" (vTOM) mode + as defined by the SEV-SNP architecture. This mode simplifies guest management + of memory encryption when a paravisor is used. + +* With Intel TDX processors, in fully-enlightened mode the guest OS runs in an + L1 VM. In paravisor mode, TD partitioning is used. The paravisor runs in the + L1 VM, and the guest OS runs in a nested L2 VM. + +Hyper-V exposes a synthetic MSR to guests that describes the CoCo mode. This +MSR indicates whether the underlying processor uses AMD SEV-SNP or Intel TDX, and +whether a paravisor is being used. It is straightforward to build a single +kernel image that can boot and run properly on either architecture, and in +either mode. + +Paravisor Effects +----------------- +Running in paravisor mode affects the following areas of generic Linux kernel +CoCo VM functionality: + +* Initial guest memory setup. When a new VM is created in paravisor mode, the + paravisor runs first and sets up the guest physical memory as encrypted. The + guest Linux does normal memory initialization, except for explicitly marking + appropriate ranges as decrypted (shared). In paravisor mode, Linux does not + perform the early boot memory setup steps that are particularly tricky with + AMD SEV-SNP in fully-enlightened mode. + +* #VC/#VE exception handling. In paravisor mode, Hyper-V configures the guest + CoCo VM to route #VC and #VE exceptions to VMPL 0 and the L1 VM, + respectively, and not the guest Linux.
Consequently, these exception handlers + do not run in the guest Linux and are not a required enlightenment for a + Linux guest in paravisor mode. + +* CPUID flags. Both AMD SEV-SNP and Intel TDX provide a CPUID flag in the + guest indicating that the VM is operating with the respective hardware + support. While these CPUID flags are visible in fully-enlightened CoCo VMs, + the paravisor filters out these flags and the guest Linux does not see them. + Throughout the Linux kernel, explicitly testing these flags has mostly been + eliminated in favor of the cc_platform_has() function, with the goal of + abstracting the differences between SEV-SNP and TDX. But the + cc_platform_has() abstraction also allows the Hyper-V paravisor configuration + to selectively enable aspects of CoCo VM functionality even when the CPUID + flags are not set. The exception is early boot memory setup on SEV-SNP, which + tests the CPUID SEV-SNP flag. But not having the flag in a Hyper-V paravisor + mode VM achieves the desired effect of not running SEV-SNP specific early + boot memory setup. + +* Device emulation. In paravisor mode, the Hyper-V paravisor provides + emulation of devices such as the IO-APIC and TPM. Because the emulation + happens in the paravisor in the guest context (instead of the hypervisor/VMM + context), MMIO accesses to these devices must be encrypted references instead + of the decrypted references that would be used in a fully-enlightened CoCo + VM. The __ioremap_caller() function has been enhanced to make a callback to + check whether a particular address range should be treated as encrypted + (private). See the "is_private_mmio" callback. + +* Encrypt/decrypt memory transitions. In a CoCo VM, transitioning guest + memory between encrypted and decrypted requires coordinating with the + hypervisor/VMM. This is done via callbacks invoked from + __set_memory_enc_pgtable(). In fully-enlightened mode, the normal SEV-SNP and + TDX implementations of these callbacks are used.
In paravisor mode, a Hyper-V + specific set of callbacks is used. These callbacks invoke the paravisor so + that the paravisor can coordinate the transitions and inform the hypervisor + as necessary. See hv_vtom_init() where these callbacks are set up. + +* Interrupt injection. In fully-enlightened mode, a malicious hypervisor + could inject interrupts into the guest OS at times that violate x86/x64 + architectural rules. For full protection, the guest OS should include + enlightenments that use the interrupt injection management features provided + by CoCo-capable processors. In paravisor mode, the paravisor mediates + interrupt injection into the guest OS, and ensures that the guest OS only + sees interrupts that are "legal". The paravisor uses the interrupt injection + management features provided by the CoCo-capable physical processor, thereby + masking these complexities from the guest OS. + +Hyper-V Hypercalls +------------------ +When in fully-enlightened mode, hypercalls made by the Linux guest are routed +directly to the hypervisor, just as in a non-CoCo VM. But in paravisor mode, +normal hypercalls trap to the paravisor first, which may in turn invoke the +hypervisor. But the paravisor is idiosyncratic in this regard, and a few +hypercalls made by the Linux guest must always be routed directly to the +hypervisor. These hypercall sites test for a paravisor being present, and use +a special invocation sequence. See hv_post_message(), for example. + +Guest communication with Hyper-V +-------------------------------- +Separate from the generic Linux kernel handling of memory encryption in Linux +CoCo VMs, Hyper-V has VMBus and VMBus devices that communicate using memory +shared between the Linux guest and the host. This shared memory must be +marked decrypted to enable communication.
Furthermore, since the threat model +includes a compromised and potentially malicious host, the guest must guard +against leaking any unintended data to the host through this shared memory. + +These Hyper-V and VMBus memory pages are marked as decrypted: + +* VMBus monitor pages + +* Synthetic interrupt controller (synic) related pages (unless supplied by + the paravisor) + +* Per-cpu hypercall input and output pages (unless running with a paravisor) + +* VMBus ring buffers. The direct mapping is marked decrypted in + __vmbus_establish_gpadl(). The secondary mapping created in + hv_ringbuffer_init() must also include the "decrypted" attribute. + +When the guest writes data to memory that is shared with the host, it must +ensure that only the intended data is written. Padding or unused fields must +be initialized to zeros before copying into the shared memory so that random +kernel data is not inadvertently given to the host. + +Similarly, when the guest reads memory that is shared with the host, it must +validate the data before acting on it so that a malicious host cannot induce +the guest to expose unintended data. Doing such validation can be tricky +because the host can modify the shared memory areas even while or after +validation is performed. For messages passed from the host to the guest in a +VMBus ring buffer, the length of the message is validated, and the message is +copied into a temporary (encrypted) buffer for further validation and +processing. The copying adds a small amount of overhead, but is the only way +to protect against a malicious host. See hv_pkt_iter_first(). + +Many drivers for VMBus devices have been "hardened" by adding code to fully +validate messages received over VMBus, instead of assuming that Hyper-V is +acting cooperatively. Such drivers are marked as "allowed_in_isolated" in the +vmbus_devs[] table. 
Other drivers for VMBus devices that are not needed in a +CoCo VM have not been hardened, and they are not allowed to load in a CoCo +VM. See vmbus_is_valid_offer() where such devices are excluded. + +Two VMBus devices depend on the Hyper-V host to do DMA data transfers: +storvsc for disk I/O and netvsc for network I/O. storvsc uses the normal +Linux kernel DMA APIs, and so bounce buffering through decrypted swiotlb +memory is done implicitly. netvsc has two modes for data transfers. The first +mode goes through send and receive buffer space that is explicitly allocated +by the netvsc driver, and is used for most smaller packets. These send and +receive buffers are marked decrypted by __vmbus_establish_gpadl(). Because +the netvsc driver explicitly copies packets to/from these buffers, the +equivalent of bounce buffering between encrypted and decrypted memory is +already part of the data path. The second mode uses the normal Linux kernel +DMA APIs, and is bounce buffered through swiotlb memory implicitly, as in +storvsc. + +Finally, the VMBus virtual PCI driver needs special handling in a CoCo VM. +Linux PCI device drivers access PCI config space using standard APIs provided +by the Linux PCI subsystem. On Hyper-V, these functions directly access MMIO +space, and the access traps to Hyper-V for emulation. But in CoCo VMs, memory +encryption prevents Hyper-V from reading the guest instruction stream to +emulate the access. So in a CoCo VM, these functions must make a hypercall +with arguments explicitly describing the access. See +_hv_pcifront_read_config() and _hv_pcifront_write_config() and the +"use_calls" flag, which indicates that hypercalls are used. + +load_unaligned_zeropad() +------------------------ +When transitioning memory between encrypted and decrypted, the caller of +set_memory_encrypted() or set_memory_decrypted() is responsible for ensuring +the memory isn't in use and isn't referenced while the transition is in +progress.
The transition has multiple steps, and includes interaction with +the Hyper-V host. The memory is in an inconsistent state until all steps are +complete. A reference while the state is inconsistent could result in an +exception that can't be cleanly fixed up. + +However, the kernel load_unaligned_zeropad() mechanism may make stray +references that can't be prevented by the caller of set_memory_encrypted() or +set_memory_decrypted(), so there's specific code in the #VC or #VE exception +handler to fix up this case. But a CoCo VM running on Hyper-V may be +configured to run with a paravisor, with the #VC or #VE exception routed to +the paravisor. There's no architectural way to forward the exceptions back to +the guest kernel, and in such a case, the load_unaligned_zeropad() fixup code +in the #VC/#VE handlers doesn't run. + +To avoid this problem, the Hyper-V specific functions for notifying the +hypervisor of the transition mark pages as "not present" while a transition +is in progress. If load_unaligned_zeropad() causes a stray reference, a +normal page fault is generated instead of #VC or #VE, and the +page-fault-based handlers for load_unaligned_zeropad() fix up the reference. When the +encrypted/decrypted transition is complete, the pages are marked as "present" +again. See hv_vtom_clear_present() and hv_vtom_set_host_visibility(). diff --git a/Documentation/virt/hyperv/index.rst b/Documentation/virt/hyperv/index.rst index de447e11b4a5..79bc4080329e 100644 --- a/Documentation/virt/hyperv/index.rst +++ b/Documentation/virt/hyperv/index.rst @@ -11,3 +11,4 @@ Hyper-V Enlightenments vmbus clocks vpci + coco diff --git a/Documentation/virt/kvm/arm/hypercalls.rst b/Documentation/virt/kvm/arm/hypercalls.rst index 17be111f493f..af7bc2c2e0cb 100644 --- a/Documentation/virt/kvm/arm/hypercalls.rst +++ b/Documentation/virt/kvm/arm/hypercalls.rst @@ -44,3 +44,101 @@ Provides a discovery mechanism for other KVM/arm64 hypercalls.
---------------------------------------- See ptp_kvm.rst + +``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO`` +---------------------------------- + +Query the memory protection parameters for a pKVM protected virtual machine. + ++---------------------+-------------------------------------------------------------+ +| Presence: | Optional; pKVM protected guests only. | ++---------------------+-------------------------------------------------------------+ +| Calling convention: | HVC64 | ++---------------------+----------+--------------------------------------------------+ +| Function ID: | (uint32) | 0xC6000002 | ++---------------------+----------+----+---------------------------------------------+ +| Arguments: | (uint64) | R1 | Reserved / Must be zero | +| +----------+----+---------------------------------------------+ +| | (uint64) | R2 | Reserved / Must be zero | +| +----------+----+---------------------------------------------+ +| | (uint64) | R3 | Reserved / Must be zero | ++---------------------+----------+----+---------------------------------------------+ +| Return Values: | (int64) | R0 | ``INVALID_PARAMETER (-3)`` on error, else | +| | | | memory protection granule in bytes | ++---------------------+----------+----+---------------------------------------------+ + +``ARM_SMCCC_KVM_FUNC_MEM_SHARE`` +-------------------------------- + +Share a region of memory with the KVM host, granting it read, write and execute +permissions. The size of the region is equal to the memory protection granule +advertised by ``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``. + ++---------------------+-------------------------------------------------------------+ +| Presence: | Optional; pKVM protected guests only. 
| ++---------------------+-------------------------------------------------------------+ +| Calling convention: | HVC64 | ++---------------------+----------+--------------------------------------------------+ +| Function ID: | (uint32) | 0xC6000003 | ++---------------------+----------+----+---------------------------------------------+ +| Arguments: | (uint64) | R1 | Base IPA of memory region to share | +| +----------+----+---------------------------------------------+ +| | (uint64) | R2 | Reserved / Must be zero | +| +----------+----+---------------------------------------------+ +| | (uint64) | R3 | Reserved / Must be zero | ++---------------------+----------+----+---------------------------------------------+ +| Return Values: | (int64) | R0 | ``SUCCESS (0)`` | +| | | +---------------------------------------------+ +| | | | ``INVALID_PARAMETER (-3)`` | ++---------------------+----------+----+---------------------------------------------+ + +``ARM_SMCCC_KVM_FUNC_MEM_UNSHARE`` +---------------------------------- + +Revoke access permission from the KVM host to a memory region previously shared +with ``ARM_SMCCC_KVM_FUNC_MEM_SHARE``. The size of the region is equal to the +memory protection granule advertised by ``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``. + ++---------------------+-------------------------------------------------------------+ +| Presence: | Optional; pKVM protected guests only. 
| ++---------------------+-------------------------------------------------------------+ +| Calling convention: | HVC64 | ++---------------------+----------+--------------------------------------------------+ +| Function ID: | (uint32) | 0xC6000004 | ++---------------------+----------+----+---------------------------------------------+ +| Arguments: | (uint64) | R1 | Base IPA of memory region to unshare | +| +----------+----+---------------------------------------------+ +| | (uint64) | R2 | Reserved / Must be zero | +| +----------+----+---------------------------------------------+ +| | (uint64) | R3 | Reserved / Must be zero | ++---------------------+----------+----+---------------------------------------------+ +| Return Values: | (int64) | R0 | ``SUCCESS (0)`` | +| | | +---------------------------------------------+ +| | | | ``INVALID_PARAMETER (-3)`` | ++---------------------+----------+----+---------------------------------------------+ + +``ARM_SMCCC_KVM_FUNC_MMIO_GUARD`` +---------------------------------- + +Request that a given memory region is handled as MMIO by the hypervisor, +allowing accesses to this region to be emulated by the KVM host. The size of the +region is equal to the memory protection granule advertised by +``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``. + ++---------------------+-------------------------------------------------------------+ +| Presence: | Optional; pKVM protected guests only. 
| ++---------------------+-------------------------------------------------------------+ +| Calling convention: | HVC64 | ++---------------------+----------+--------------------------------------------------+ +| Function ID: | (uint32) | 0xC6000007 | ++---------------------+----------+----+---------------------------------------------+ +| Arguments: | (uint64) | R1 | Base IPA of MMIO memory region | +| +----------+----+---------------------------------------------+ +| | (uint64) | R2 | Reserved / Must be zero | +| +----------+----+---------------------------------------------+ +| | (uint64) | R3 | Reserved / Must be zero | ++---------------------+----------+----+---------------------------------------------+ +| Return Values: | (int64) | R0 | ``SUCCESS (0)`` | +| | | +---------------------------------------------+ +| | | | ``INVALID_PARAMETER (-3)`` | ++---------------------+----------+----+---------------------------------------------+ |