Diffstat (limited to 'Documentation')
200 files changed, 7594 insertions, 2103 deletions
diff --git a/Documentation/ABI/testing/debugfs-driver-qat b/Documentation/ABI/testing/debugfs-driver-qat
new file mode 100644
index 000000000000..6731ffacc5f0
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-driver-qat
@@ -0,0 +1,61 @@
+What: /sys/kernel/debug/qat_<device>_<BDF>/qat/fw_counters
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RO) Read returns the number of requests sent to the FW and the number of responses
+ received from the FW for each Acceleration Engine
+ Reported firmware counters::
+
+ <N>: Number of requests sent from Acceleration Engine N to FW and responses
+ Acceleration Engine N received from FW
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/config
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RW) Read returns value of the Heartbeat update period.
+ Write to the file changes this period value.
+
+ This period should reflect planned polling interval of device
+ health status. High frequency Heartbeat monitoring wastes CPU cycles
+ but minimizes the customer’s system downtime. Also, if there are
+ large service requests that take some time to complete, high frequency
+ Heartbeat monitoring could result in false reports of unresponsiveness
+ and in those cases, period needs to be increased.
+
+ This parameter is effective only for c3xxx, c62x, dh895xcc devices.
+ 4xxx has this value internally fixed to 200ms.
+
+ Default value is set to 500. Minimal allowed value is 200.
+ All values are expressed in milliseconds.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/queries_failed
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RO) Read returns the number of times the device became unresponsive.
+
+ Attribute returns value of the counter which is incremented when
+ status query results negative.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/queries_sent
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RO) Read returns the number of times the control process checked
+ if the device is responsive.
+
+ Attribute returns value of the counter which is incremented on
+ every status query.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/status
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RO) Read returns the device health status.
+
+ Returns 0 when device is healthy or -1 when is unresponsive
+ or the query failed to send.
+
+ The driver does not monitor for Heartbeat. It is left for a user
+ to poll the status periodically.
diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy
index 49db0ff288e5..c2385183826c 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -57,9 +57,9 @@ Description:
 stored in security.ima xattr. Requires specifying "digest_type=verity" first.)
- appraise_flag:= [check_blacklist]
- Currently, blacklist check is only for files signed with appended
- signature.
+ appraise_flag:= [check_blacklist] (deprecated)
+ Setting the check_blacklist flag is no longer necessary.
+ All appraisal functions set it by default.
 digest_type:= verity
 Require fs-verity's file digest instead of the regular IMA file hash.
diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-netdev b/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
index 78b62a23b14a..f6d9d72ce77b 100644
--- a/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
+++ b/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
@@ -13,7 +13,7 @@ Description:
 Specifies the duration of the LED blink in milliseconds. Defaults to 50 ms.
- With hw_control ON, the interval value MUST be set to the
+ When offloaded is true, the interval value MUST be set to the
 default value and cannot be changed.
 Trying to set any value in this specific mode will return an EINVAL error.
@@ -44,8 +44,8 @@ Description:
 If set to 1, the LED will blink for the milliseconds specified in interval to signal transmission.
- With hw_control ON, the blink interval is controlled by hardware
- and won't reflect the value set in interval.
+ When offloaded is true, the blink interval is controlled by
+ hardware and won't reflect the value set in interval.
 What: /sys/class/leds/<led>/rx
 Date: Dec 2017
@@ -59,21 +59,21 @@ Description:
 If set to 1, the LED will blink for the milliseconds specified in interval to signal reception.
- With hw_control ON, the blink interval is controlled by hardware
- and won't reflect the value set in interval.
+ When offloaded is true, the blink interval is controlled by
+ hardware and won't reflect the value set in interval.
-What: /sys/class/leds/<led>/hw_control
+What: /sys/class/leds/<led>/offloaded
 Date: Jun 2023
 KernelVersion: 6.5
 Contact: linux-leds@vger.kernel.org
 Description:
- Communicate whether the LED trigger modes are driven by hardware
- or software fallback is used.
+ Communicate whether the LED trigger modes are offloaded to
+ hardware or whether software fallback is used.
 If 0, the LED is using software fallback to blink.
- If 1, the LED is using hardware control to blink and signal the
- requested modes.
+ If 1, the LED blinking in requested mode is offloaded to
+ hardware.
 What: /sys/class/leds/<led>/link_10
 Date: Jun 2023
diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index d8b0f80b9e33..a95e0f17c35a 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -110,3 +110,11 @@ Description:
 link is created for memory section 9 on node0.
 /sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What: /sys/devices/system/memory/crash_hotplug
+Date: Aug 2023
+Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+ (RO) indicates whether or not the kernel directly supports
+ modifying the crash elfcorehdr for memory hot un/plug and/or
+ on/offline changes.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 77942eedf4f6..7ecd5c8161a6 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -556,6 +556,7 @@ Description: Control Symmetric Multi Threading (SMT)
 ================ =========================================
 "on"             SMT is enabled
 "off"            SMT is disabled
+"<N>"            SMT is enabled with N threads per core.
 "forceoff"       SMT is force disabled. Cannot be changed.
 "notsupported"   SMT is not supported by the CPU
 "notimplemented" SMT runtime toggling is not
@@ -687,3 +688,11 @@ Description:
 (RO) the list of CPUs that are isolated and don't participate in load balancing. These CPUs are set by boot parameter "isolcpus=".
+
+What: /sys/devices/system/cpu/crash_hotplug
+Date: Aug 2023
+Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+ (RO) indicates whether or not the kernel directly supports
+ modifying the crash elfcorehdr for CPU hot un/plug and/or
+ on/offline changes.
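Reader's note, not part of the patch: the SMT control table in the sysfs-devices-system-cpu hunk above now lists a numeric "<N>" form next to the named states. A minimal sketch of how a userspace tool might classify such a value is shown here. The function name, the returned tuple shape, and the choice to report the numeric form as "on" with a thread count are assumptions made for illustration; only the six value forms come from the table.

```python
# Illustrative only: classifies a string taken from the SMT control table above.
# The names below (parse_smt_control, _SMT_STATES) are hypothetical helpers,
# not kernel or library APIs.

_SMT_STATES = {"on", "off", "forceoff", "notsupported", "notimplemented"}


def parse_smt_control(raw: str):
    """Return (state, threads) for a value from the SMT control table.

    threads is an int only for the numeric "<N>" form ("SMT is enabled
    with N threads per core"); it is None for the named states.
    """
    value = raw.strip()
    if value in _SMT_STATES:
        return value, None
    # Assumption: the "<N>" form is a positive decimal integer.
    if value.isascii() and value.isdigit() and int(value) > 0:
        return "on", int(value)
    raise ValueError(f"unrecognized SMT control value: {value!r}")
```

A caller would pass in the text it read from the control file; the path is left out on purpose because the hunk shown here does not spell it out.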
diff --git a/Documentation/ABI/testing/sysfs-driver-ccp b/Documentation/ABI/testing/sysfs-driver-ccp index 7aded9b75553..ee6b787eee7a 100644 --- a/Documentation/ABI/testing/sysfs-driver-ccp +++ b/Documentation/ABI/testing/sysfs-driver-ccp @@ -85,3 +85,21 @@ Description: Possible values: 0: Not enforced 1: Enforced + +What: /sys/bus/pci/devices/<BDF>/bootloader_version +Date: June 2023 +KernelVersion: 6.4 +Contact: mario.limonciello@amd.com +Description: + The /sys/bus/pci/devices/<BDF>/bootloader_version + file reports the firmware version of the AMD AGESA + bootloader. + +What: /sys/bus/pci/devices/<BDF>/tee_version +Date: June 2023 +KernelVersion: 6.4 +Contact: mario.limonciello@amd.com +Description: + The /sys/bus/pci/devices/<BDF>/tee_version + file reports the firmware version of the AMD Trusted + Execution Environment (TEE). diff --git a/Documentation/ABI/testing/sysfs-driver-chromeos-acpi b/Documentation/ABI/testing/sysfs-driver-chromeos-acpi index c308926e1568..d46b1c85840d 100644 --- a/Documentation/ABI/testing/sysfs-driver-chromeos-acpi +++ b/Documentation/ABI/testing/sysfs-driver-chromeos-acpi @@ -1,4 +1,5 @@ What: /sys/bus/platform/devices/GGL0001:*/BINF.2 + /sys/bus/platform/devices/GOOG0016:*/BINF.2 Date: May 2022 KernelVersion: 5.19 Description: @@ -10,6 +11,7 @@ Description: == =============================== What: /sys/bus/platform/devices/GGL0001:*/BINF.3 + /sys/bus/platform/devices/GOOG0016:*/BINF.3 Date: May 2022 KernelVersion: 5.19 Description: @@ -23,6 +25,7 @@ Description: == ===================================== What: /sys/bus/platform/devices/GGL0001:*/CHSW + /sys/bus/platform/devices/GOOG0016:*/CHSW Date: May 2022 KernelVersion: 5.19 Description: @@ -38,6 +41,7 @@ Description: ==== =========================================== What: /sys/bus/platform/devices/GGL0001:*/FMAP + /sys/bus/platform/devices/GOOG0016:*/FMAP Date: May 2022 KernelVersion: 5.19 Description: @@ -45,6 +49,7 @@ Description: processor firmware flashmap. 
What: /sys/bus/platform/devices/GGL0001:*/FRID + /sys/bus/platform/devices/GOOG0016:*/FRID Date: May 2022 KernelVersion: 5.19 Description: @@ -52,6 +57,7 @@ Description: main processor firmware. What: /sys/bus/platform/devices/GGL0001:*/FWID + /sys/bus/platform/devices/GOOG0016:*/FWID Date: May 2022 KernelVersion: 5.19 Description: @@ -59,6 +65,7 @@ Description: main processor firmware. What: /sys/bus/platform/devices/GGL0001:*/GPIO.X/GPIO.0 + /sys/bus/platform/devices/GOOG0016:*/GPIO.X/GPIO.0 Date: May 2022 KernelVersion: 5.19 Description: @@ -73,6 +80,7 @@ Description: =========== ================================== What: /sys/bus/platform/devices/GGL0001:*/GPIO.X/GPIO.1 + /sys/bus/platform/devices/GOOG0016:*/GPIO.X/GPIO.1 Date: May 2022 KernelVersion: 5.19 Description: @@ -84,6 +92,7 @@ Description: == ======================= What: /sys/bus/platform/devices/GGL0001:*/GPIO.X/GPIO.2 + /sys/bus/platform/devices/GOOG0016:*/GPIO.X/GPIO.2 Date: May 2022 KernelVersion: 5.19 Description: @@ -91,18 +100,21 @@ Description: controller. What: /sys/bus/platform/devices/GGL0001:*/GPIO.X/GPIO.3 + /sys/bus/platform/devices/GOOG0016:*/GPIO.X/GPIO.3 Date: May 2022 KernelVersion: 5.19 Description: Returns name of the GPIO controller. What: /sys/bus/platform/devices/GGL0001:*/HWID + /sys/bus/platform/devices/GOOG0016:*/HWID Date: May 2022 KernelVersion: 5.19 Description: Returns hardware ID for the Chromebook. What: /sys/bus/platform/devices/GGL0001:*/MECK + /sys/bus/platform/devices/GOOG0016:*/MECK Date: May 2022 KernelVersion: 5.19 Description: @@ -113,6 +125,7 @@ Description: present, or if the firmware was unable to read the extended registers, this buffer size can be zero. What: /sys/bus/platform/devices/GGL0001:*/VBNV.0 + /sys/bus/platform/devices/GOOG0016:*/VBNV.0 Date: May 2022 KernelVersion: 5.19 Description: @@ -122,6 +135,7 @@ Description: clock data). 
What: /sys/bus/platform/devices/GGL0001:*/VBNV.1 + /sys/bus/platform/devices/GOOG0016:*/VBNV.1 Date: May 2022 KernelVersion: 5.19 Description: @@ -129,9 +143,10 @@ Description: storage block. What: /sys/bus/platform/devices/GGL0001:*/VDAT + /sys/bus/platform/devices/GOOG0016:*/VDAT Date: May 2022 KernelVersion: 5.19 Description: Returns the verified boot data block shared between the firmware verification step and the kernel verification step - (binary). + (hex dump). diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-damon b/Documentation/ABI/testing/sysfs-kernel-mm-damon index 2744f21b5a6b..334352d198f8 100644 --- a/Documentation/ABI/testing/sysfs-kernel-mm-damon +++ b/Documentation/ABI/testing/sysfs-kernel-mm-damon @@ -29,8 +29,10 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or file updates contents of schemes stats files of the kdamond. Writing 'update_schemes_tried_regions' to the file updates contents of 'tried_regions' directory of every scheme directory - of this kdamond. Writing 'clear_schemes_tried_regions' to the - file removes contents of the 'tried_regions' directory. + of this kdamond. Writing 'update_schemes_tried_bytes' to the + file updates only '.../tried_regions/total_bytes' files of this + kdamond. Writing 'clear_schemes_tried_regions' to the file + removes contents of the 'tried_regions' directory. What: /sys/kernel/mm/damon/admin/kdamonds/<K>/pid Date: Mar 2022 @@ -269,8 +271,10 @@ What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/ Date: Dec 2022 Contact: SeongJae Park <sj@kernel.org> Description: Writing to and reading from this file sets and gets the type of - the memory of the interest. 'anon' for anonymous pages, or - 'memcg' for specific memory cgroup can be written and read. + the memory of the interest. 
'anon' for anonymous pages, + 'memcg' for specific memory cgroup, 'addr' for address range + (an open-ended interval), or 'target' for DAMON monitoring + target can be written and read. What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/memcg_path Date: Dec 2022 @@ -279,6 +283,27 @@ Description: If 'memcg' is written to the 'type' file, writing to and reading from this file sets and gets the path to the memory cgroup of the interest. +What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/addr_start +Date: Jul 2023 +Contact: SeongJae Park <sj@kernel.org> +Description: If 'addr' is written to the 'type' file, writing to or reading + from this file sets or gets the start address of the address + range for the filter. + +What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/addr_end +Date: Jul 2023 +Contact: SeongJae Park <sj@kernel.org> +Description: If 'addr' is written to the 'type' file, writing to or reading + from this file sets or gets the end address of the address + range for the filter. + +What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/target_idx +Date: Dec 2022 +Contact: SeongJae Park <sj@kernel.org> +Description: If 'target' is written to the 'type' file, writing to or + reading from this file sets or gets the index of the DAMON + monitoring target of the interest. + What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/matching Date: Dec 2022 Contact: SeongJae Park <sj@kernel.org> @@ -317,6 +342,13 @@ Contact: SeongJae Park <sj@kernel.org> Description: Reading this file returns the number of the exceed events of the scheme's quotas. 
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/total_bytes
+Date: Jul 2023
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the total amount of memory that
+ corresponding DAMON-based Operation Scheme's action has tried
+ to be applied.
+
 What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/start
 Date: Oct 2022
 Contact: SeongJae Park <sj@kernel.org>
diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
index e14703f12fdf..00f4e35f916f 100644
--- a/Documentation/ABI/testing/sysfs-memory-page-offline
+++ b/Documentation/ABI/testing/sysfs-memory-page-offline
@@ -10,7 +10,7 @@ Description:
 dropping it if possible. The kernel will then be placed on the bad page list and never be reused.
- The offlining is done in kernel specific granuality.
+ The offlining is done in kernel specific granularity.
 Normally it's the base page size of the kernel, but this might change.
@@ -35,7 +35,7 @@ Description:
 to access this page assuming it's poisoned by the hardware.
- The offlining is done in kernel specific granuality.
+ The offlining is done in kernel specific granularity.
 Normally it's the base page size of the kernel, but this might change.
diff --git a/Documentation/ABI/testing/sysfs-platform-power-on-reason b/Documentation/ABI/testing/sysfs-platform-power-on-reason
new file mode 100644
index 000000000000..c3b29dbc64bf
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-platform-power-on-reason
@@ -0,0 +1,12 @@
+What: /sys/devices/platform/.../power_on_reason
+Date: June 2023
+KernelVersion: 6.5
+Contact: Kamel Bouhara <kamel.bouhara@bootlin.com>
+Description: Shows system power on reason. The following strings/reasons can
+ be read (the list can be extended):
+ "regular power-up", "RTC wakeup", "watchdog timeout",
+ "software reset", "reset button action", "CPU clock failure",
+ "crystal oscillator failure", "brown-out reset",
+ "unknown reason".
+
+ The file is read only.
diff --git a/Documentation/RCU/lockdep-splat.rst b/Documentation/RCU/lockdep-splat.rst
index 2a5c79db57dc..bcbc4b3c88d7 100644
--- a/Documentation/RCU/lockdep-splat.rst
+++ b/Documentation/RCU/lockdep-splat.rst
@@ -10,7 +10,7 @@ misuses of the RCU API, most notably using one of the rcu_dereference()
 family to access an RCU-protected pointer without the proper protection.
 When such misuse is detected, an lockdep-RCU splat is emitted.

-The usual cause of a lockdep-RCU slat is someone accessing an
+The usual cause of a lockdep-RCU splat is someone accessing an
 RCU-protected data structure without either (1) being in the right kind of
 RCU read-side critical section or (2) holding the right update-side lock.
 This problem can therefore be serious: it might result in random memory
diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
index 9a734bf54b76..21e40fcc08de 100644
--- a/Documentation/RCU/rculist_nulls.rst
+++ b/Documentation/RCU/rculist_nulls.rst
@@ -18,7 +18,16 @@ to solve following problem.

 Without 'nulls', a typical RCU linked list managing objects which are
 allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can use the following
-algorithms:
+algorithms. Following examples assume 'obj' is a pointer to such
+objects, which is having below type.
+
+::
+
+  struct object {
+    struct hlist_node obj_node;
+    atomic_t refcnt;
+    unsigned int key;
+  };

 1) Lookup algorithm
 -------------------
@@ -26,11 +35,13 @@ algorithms:
 ::

   begin:
-  rcu_read_lock()
+  rcu_read_lock();
   obj = lockless_lookup(key);
   if (obj) {
-    if (!try_get_ref(obj)) // might fail for free objects
+    if (!try_get_ref(obj)) { // might fail for free objects
+      rcu_read_unlock();
       goto begin;
+    }
     /*
     * Because a writer could delete object, and a writer could
     * reuse these object before the RCU grace period, we
@@ -54,7 +65,7 @@ but a version with an additional memory barrier (smp_rmb())
     struct hlist_node *node, *next;
     for (pos = rcu_dereference((head)->first);
          pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
-         ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+         ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
          pos = rcu_dereference(next))
       if (obj->key == key)
         return obj;
@@ -66,10 +77,10 @@ And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
     struct hlist_node *node;
     for (pos = rcu_dereference((head)->first);
          pos && ({ prefetch(pos->next); 1; }) &&
-         ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+         ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
          pos = rcu_dereference(pos->next))
-    if (obj->key == key)
-      return obj;
+      if (obj->key == key)
+        return obj;
     return NULL;

 Quoting Corey Minyard::
@@ -86,7 +97,7 @@ Quoting Corey Minyard::
 2) Insertion algorithm
 ----------------------

-We need to make sure a reader cannot read the new 'obj->obj_next' value
+We need to make sure a reader cannot read the new 'obj->obj_node.next' value
 and previous value of 'obj->key'. Otherwise, an item could be deleted
 from a chain, and inserted into another chain. If new chain was empty
 before the move, 'next' pointer is NULL, and lockless reader can not
@@ -129,8 +140,7 @@ very very fast (before the end of RCU grace period)
 Avoiding extra smp_rmb()
 ========================

-With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
-and extra _release() in insert function.
+With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup().

 For example, if we choose to store the slot number as the 'nulls'
 end-of-list marker for each slot of the hash table, we can detect
@@ -142,6 +152,9 @@ the beginning. If the object was moved to the same chain,
 then the reader doesn't care: It might occasionally
 scan the list again without harm.

+Note that using hlist_nulls means the type of 'obj_node' field of
+'struct object' becomes 'struct hlist_nulls_node'.
+
 1) lookup algorithm
 -------------------

@@ -151,7 +164,7 @@ scan the list again without harm.
   head = &table[slot];
   begin:
   rcu_read_lock();
-  hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
+  hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
     if (obj->key == key) {
       if (!try_get_ref(obj)) { // might fail for free objects
         rcu_read_unlock();
@@ -182,6 +195,9 @@ scan the list again without harm.
 2) Insert algorithm
 -------------------

+Same to the above one, but uses hlist_nulls_add_head_rcu() instead of
+hlist_add_head_rcu().
+
 ::

   /*
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index fabaad3fd9c2..8d3afeede10e 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,8 +92,6 @@ Brief summary of control files.
 memory.oom_control set/show oom controls.
 memory.numa_stat show the number of memory usage per numa node
- memory.kmem.limit_in_bytes This knob is deprecated and writing to
- it will return -ENOTSUPP.
memory.kmem.usage_in_bytes show current kernel memory allocation memory.kmem.failcnt show the number of kernel memory usage hits limits diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index f8ebb63b6c5d..599e8d3bcbc3 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -141,8 +141,8 @@ nodemask_t The size of a nodemask_t type. Used to compute the number of online nodes. -(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head) -------------------------------------------------------------------------------------------------- +(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head) +---------------------------------------------------------------------------------- User-space tools compute their values based on the offset of these variables. The variables are used when excluding unnecessary pages. @@ -325,8 +325,8 @@ NR_FREE_PAGES On linux-2.6.21 or later, the number of free pages is in vm_stat[NR_FREE_PAGES]. Used to get the number of free pages. -PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask ------------------------------------------------------------------------------- +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb +----------------------------------------------------------------------------------------- Page attributes. These flags are used to filter various unnecessary for dumping pages. @@ -338,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for dumping pages. -HUGETLB_PAGE_DTOR ------------------ - -The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile -excludes these pages. 
- x86_64 ====== diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 722b6eca2e93..93f646e0b4b5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -553,7 +553,7 @@ others). ccw_timeout_log [S390] - See Documentation/s390/common_io.rst for details. + See Documentation/arch/s390/common_io.rst for details. cgroup_disable= [KNL] Disable a particular controller or optional feature Format: {name of the controller(s) or feature(s) to disable} @@ -598,7 +598,7 @@ Setting checkreqprot to 1 is deprecated. cio_ignore= [S390] - See Documentation/s390/common_io.rst for details. + See Documentation/arch/s390/common_io.rst for details. clearcpuid=X[,X...] [X86] Disable CPUID feature X for the kernel. See @@ -696,7 +696,7 @@ kernel/dma/contiguous.c cma_pernuma=nn[MG] - [ARM64,KNL,CMA] + [KNL,CMA] Sets the size of kernel per-numa memory area for contiguous memory allocations. A value of 0 disables per-numa CMA altogether. And If this option is not @@ -706,6 +706,17 @@ which is located in node nid, if the allocation fails, they will fallback to the global default memory area. + numa_cma=<node>:nn[MG][,<node>:nn[MG]] + [KNL,CMA] + Sets the size of kernel numa memory area for + contiguous memory allocations. It will reserve CMA + area for the specified node. + + With numa CMA enabled, DMA users on node nid will + first try to allocate buffer from the numa area + which is located in node nid, if the allocation fails, + they will fallback to the global default memory area. + cmo_free_hint= [PPC] Format: { yes | no } Specify whether pages are marked as being inactive when they are freed. This is used in CMO environments @@ -2938,6 +2949,10 @@ locktorture.torture_type= [KNL] Specify the locking implementation to test. + locktorture.writer_fifo= [KNL] + Run the write-side locktorture kthreads at + sched_set_fifo() real-time priority. 
+ locktorture.verbose= [KNL] Enable additional printk() statements. @@ -4949,6 +4964,15 @@ test until boot completes in order to avoid interference. + rcuscale.kfree_by_call_rcu= [KNL] + In kernels built with CONFIG_RCU_LAZY=y, test + call_rcu() instead of kfree_rcu(). + + rcuscale.kfree_mult= [KNL] + Instead of allocating an object of size kfree_obj, + allocate one of kfree_mult * sizeof(kfree_obj). + Defaults to 1. + rcuscale.kfree_rcu_test= [KNL] Set to measure performance of kfree_rcu() flooding. @@ -4974,6 +4998,12 @@ Number of loops doing rcuscale.kfree_alloc_num number of allocations and frees. + rcuscale.minruntime= [KNL] + Set the minimum test run time in seconds. This + does not affect the data-collection interval, + but instead allows better measurement of things + like CPU consumption. + rcuscale.nreaders= [KNL] Set number of RCU readers. The value -1 selects N, where N is the number of CPUs. A value @@ -4988,7 +5018,7 @@ the same as for rcuscale.nreaders. N, where N is the number of CPUs - rcuscale.perf_type= [KNL] + rcuscale.scale_type= [KNL] Specify the RCU implementation to test. rcuscale.shutdown= [KNL] @@ -5004,6 +5034,11 @@ in microseconds. The default of zero says no holdoff. + rcuscale.writer_holdoff_jiffies= [KNL] + Additional write-side holdoff between grace + periods, but in jiffies. The default of zero + says no holdoff. + rcutorture.fqs_duration= [KNL] Set duration of force_quiescent_state bursts in microseconds. @@ -5285,6 +5320,13 @@ number avoids disturbing real-time workloads, but lengthens grace periods. + rcupdate.rcu_task_lazy_lim= [KNL] + Number of callbacks on a given CPU that will + cancel laziness on that CPU. Use -1 to disable + cancellation of laziness, but be advised that + doing so increases the danger of OOM due to + callback flooding. 
+ rcupdate.rcu_task_stall_info= [KNL] Set initial timeout in jiffies for RCU task stall informational messages, which give some indication @@ -5314,6 +5356,29 @@ A change in value does not take effect until the beginning of the next grace period. + rcupdate.rcu_tasks_lazy_ms= [KNL] + Set timeout in milliseconds RCU Tasks asynchronous + callback batching for call_rcu_tasks(). + A negative value will take the default. A value + of zero will disable batching. Batching is + always disabled for synchronize_rcu_tasks(). + + rcupdate.rcu_tasks_rude_lazy_ms= [KNL] + Set timeout in milliseconds RCU Tasks + Rude asynchronous callback batching for + call_rcu_tasks_rude(). A negative value + will take the default. A value of zero will + disable batching. Batching is always disabled + for synchronize_rcu_tasks_rude(). + + rcupdate.rcu_tasks_trace_lazy_ms= [KNL] + Set timeout in milliseconds RCU Tasks + Trace asynchronous callback batching for + call_rcu_tasks_trace(). A negative value + will take the default. A value of zero will + disable batching. Batching is always disabled + for synchronize_rcu_tasks_trace(). + rcupdate.rcu_self_test= [KNL] Run the RCU early boot self tests @@ -5522,6 +5587,10 @@ Useful for devices that are detected asynchronously (e.g. USB and MMC devices). + rootwait= [KNL] Maximum time (in seconds) to wait for root device + to show up before attempting to mount the root + filesystem. + rproc_mem=nn[KMG][@address] [KNL,ARM,CMA] Remoteproc physical memory block. Memory area to be used by remote processor image, @@ -6275,10 +6344,6 @@ -1: disable all critical trip points in all thermal zones <degrees C>: override all critical trip points - thermal.nocrt= [HW,ACPI] - Set to disable actions on ACPI thermal zone - critical and hot trip points. - thermal.off= [HW,ACPI] 1: disable ACPI thermal control @@ -6340,6 +6405,13 @@ This will guarantee that all the other pcrs are saved. 
+ tpm_tis.interrupts= [HW,TPM] + Enable interrupts for the MMIO based physical layer + for the FIFO interface. By default it is set to false + (0). For more information about TPM hardware interfaces + defined by Trusted Computing Group (TCG) see + https://trustedcomputinggroup.org/resource/pc-client-platform-tpm-profile-ptp-specification/ + tp_printk [FTRACE] Have the tracepoints sent to printk as well as the tracing ring buffer. This is useful for early boot up diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 2d495fa85a0e..084f0a32b421 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -87,7 +87,7 @@ comma (","). :: │ │ │ │ │ │ │ filters/nr_filters │ │ │ │ │ │ │ │ 0/type,matching,memcg_id │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds - │ │ │ │ │ │ │ tried_regions/ + │ │ │ │ │ │ │ tried_regions/total_bytes │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age │ │ │ │ │ │ │ │ ... │ │ │ │ │ │ ... @@ -127,14 +127,18 @@ in the state. Writing ``commit`` to the ``state`` file makes kdamond reads the user inputs in the sysfs files except ``state`` file again. Writing ``update_schemes_stats`` to ``state`` file updates the contents of stats files for each DAMON-based operation scheme of the kdamond. For details of the -stats, please refer to :ref:`stats section <sysfs_schemes_stats>`. Writing -``update_schemes_tried_regions`` to ``state`` file updates the DAMON-based -operation scheme action tried regions directory for each DAMON-based operation -scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state`` -file clears the DAMON-based operating scheme action tried regions directory for -each DAMON-based operation scheme of the kdamond. For details of the -DAMON-based operation scheme action tried regions directory, please refer to -:ref:`tried_regions section <sysfs_schemes_tried_regions>`. 
+stats, please refer to :ref:`stats section <sysfs_schemes_stats>`. + +Writing ``update_schemes_tried_regions`` to ``state`` file updates the +DAMON-based operation scheme action tried regions directory for each +DAMON-based operation scheme of the kdamond. Writing +``update_schemes_tried_bytes`` to ``state`` file updates only +``.../tried_regions/total_bytes`` files. Writing +``clear_schemes_tried_regions`` to ``state`` file clears the DAMON-based +operating scheme action tried regions directory for each DAMON-based operation +scheme of the kdamond. For details of the DAMON-based operation scheme action +tried regions directory, please refer to :ref:`tried_regions section +<sysfs_schemes_tried_regions>`. If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread. @@ -359,15 +363,21 @@ number (``N``) to the file creates the number of child directories named ``0`` to ``N-1``. Each directory represents each filter. The filters are evaluated in the numeric order. -Each filter directory contains three files, namely ``type``, ``matcing``, and -``memcg_path``. You can write one of two special keywords, ``anon`` for -anonymous pages, or ``memcg`` for specific memory cgroup filtering. In case of -the memory cgroup filtering, you can specify the memory cgroup of the interest -by writing the path of the memory cgroup from the cgroups mount point to -``memcg_path`` file. You can write ``Y`` or ``N`` to ``matching`` file to -filter out pages that does or does not match to the type, respectively. Then, -the scheme's action will not be applied to the pages that specified to be -filtered out. +Each filter directory contains six files, namely ``type``, ``matcing``, +``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``. 
To ``type`` +file, you can write one of four special keywords: ``anon`` for anonymous pages, +``memcg`` for specific memory cgroup, ``addr`` for specific address range (an +open-ended interval), or ``target`` for specific DAMON monitoring target +filtering. In case of the memory cgroup filtering, you can specify the memory +cgroup of interest by writing the path of the memory cgroup from the +cgroups mount point to ``memcg_path`` file. In case of the address range +filtering, you can specify the start and end address of the range to +``addr_start`` and ``addr_end`` files, respectively. For the DAMON monitoring +target filtering, you can specify the index of the target in the DAMON +context's monitoring targets list to ``target_idx`` file. You can +write ``Y`` or ``N`` to ``matching`` file to filter out pages that do or do +not match the type, respectively. Then, the scheme's action will not be +applied to the pages that are specified to be filtered out. For example, below restricts a DAMOS action to be applied to only non-anonymous pages of all memory cgroups except ``/having_care_already``.:: @@ -381,8 +391,14 @@ pages of all memory cgroups except ``/having_care_already``.:: echo /having_care_already > 1/memcg_path echo N > 1/matching -Note that filters are currently supported only when ``paddr`` -`implementation <sysfs_contexts>` is being used. +Note that ``anon`` and ``memcg`` filters are currently supported only when +``paddr`` `implementation <sysfs_contexts>` is being used. + +Also, memory regions that are filtered out by ``addr`` or ``target`` filters +are not counted as tried by the scheme, while regions that are filtered +out by other filter types are counted as tried. The +difference is applied to :ref:`stats <damos_stats>` and +:ref:`tried regions <sysfs_schemes_tried_regions>`. ..
_sysfs_schemes_stats: @@ -406,13 +422,21 @@ stats by writing a special keyword, ``update_schemes_stats`` to the relevant schemes/<N>/tried_regions/ -------------------------- +This directory initially has one file, ``total_bytes``. + When a special keyword, ``update_schemes_tried_regions``, is written to the -relevant ``kdamonds/<N>/state`` file, DAMON creates directories named integer -starting from ``0`` under this directory. Each directory contains files -exposing detailed information about each of the memory region that the -corresponding scheme's ``action`` has tried to be applied under this directory, -during next :ref:`aggregation interval <sysfs_monitoring_attrs>`. The -information includes address range, ``nr_accesses``, and ``age`` of the region. +relevant ``kdamonds/<N>/state`` file, DAMON updates the ``total_bytes`` file so +that reading it returns the total size of the scheme tried regions, and creates +directories named integer starting from ``0`` under this directory. Each +directory contains files exposing detailed information about each of the memory +region that the corresponding scheme's ``action`` has tried to be applied under +this directory, during next :ref:`aggregation interval +<sysfs_monitoring_attrs>`. The information includes address range, +``nr_accesses``, and ``age`` of the region. + +Writing ``update_schemes_tried_bytes`` to the relevant ``kdamonds/<N>/state`` +file will only update the ``total_bytes`` file, and will not create the +subdirectories. The directories will be removed when another special keyword, ``clear_schemes_tried_regions``, is written to the relevant diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst index 7626392fe82c..776f244bdae4 100644 --- a/Documentation/admin-guide/mm/ksm.rst +++ b/Documentation/admin-guide/mm/ksm.rst @@ -159,6 +159,8 @@ The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``: general_profit how effective is KSM. 
The calculation is explained below. +pages_scanned + how many pages are being scanned for ksm pages_shared how many shared pages are being used pages_sharing @@ -173,6 +175,13 @@ stable_node_chains the number of KSM pages that hit the ``max_page_sharing`` limit stable_node_dups number of duplicated KSM pages +ksm_zero_pages + how many zero pages that are still mapped into processes were mapped by + KSM when deduplicating. + +When ``use_zero_pages`` is/was enabled, the sum of ``pages_sharing`` + +``ksm_zero_pages`` represents the actual number of pages saved by KSM. +If ``use_zero_pages`` has never been enabled, ``ksm_zero_pages`` is 0. A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing`` @@ -196,21 +205,25 @@ several times, which are unprofitable memory consumed. 1) How to determine whether KSM save memory or consume memory in system-wide range? Here is a simple approximate calculation for reference:: - general_profit =~ pages_sharing * sizeof(page) - (all_rmap_items) * + general_profit =~ ksm_saved_pages * sizeof(page) - (all_rmap_items) * sizeof(rmap_item); - where all_rmap_items can be easily obtained by summing ``pages_sharing``, - ``pages_shared``, ``pages_unshared`` and ``pages_volatile``. + where ksm_saved_pages equals the sum of ``pages_sharing`` + + ``ksm_zero_pages`` of the system, and all_rmap_items can be easily + obtained by summing ``pages_sharing``, ``pages_shared``, ``pages_unshared`` + and ``pages_volatile``. 2) The KSM profit inner a single process can be similarly obtained by the following approximate calculation:: - process_profit =~ ksm_merging_pages * sizeof(page) - + process_profit =~ ksm_saved_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item). - where ksm_merging_pages is shown under the directory ``/proc/<pid>/``, - and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``.
The process profit - is also shown in ``/proc/<pid>/ksm_stat`` as ksm_process_profit. + where ksm_saved_pages equals to the sum of ``ksm_merging_pages`` and + ``ksm_zero_pages``, both of which are shown under the directory + ``/proc/<pid>/ksm_stat``, and ksm_rmap_items is also shown in + ``/proc/<pid>/ksm_stat``. The process profit is also shown in + ``/proc/<pid>/ksm_stat`` as ksm_process_profit. From the perspective of application, a high ratio of ``ksm_rmap_items`` to ``ksm_merging_pages`` means a bad madvise-applied policy, so developers or diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index 1b02fe5807cc..cfe034cf1e87 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -291,6 +291,14 @@ The following files are currently defined: Availability depends on the CONFIG_ARCH_MEMORY_PROBE kernel configuration option. ``uevent`` read-write: generic udev file for device subsystems. +``crash_hotplug`` read-only: when changes to the system memory map + occur due to hot un/plug of memory, this file contains + '1' if the kernel updates the kdump capture kernel memory + map itself (via elfcorehdr), or '0' if userspace must update + the kdump capture kernel memory map. + + Availability depends on the CONFIG_MEMORY_HOTPLUG kernel + configuration option. ====================== ========================================================= .. note:: @@ -433,6 +441,18 @@ The following module parameters are currently defined: memory in a way that huge pages in bigger granularity cannot be formed on hotplugged memory. + + With value "force" it could result in memory + wastage due to memmap size limitations. For + example, if the memmap for a memory block + requires 1 MiB, but the pageblock size is 2 + MiB, 1 MiB of hotplugged memory will be wasted. 
+ Note that there are still cases where the + feature cannot be enforced: for example, if the + memmap is smaller than a single page, or if the + architecture does not support the forced mode + in all configurations. + ``online_policy`` read-write: Set the basic policy used for automatic zone selection when onlining memory blocks without specifying a target zone. @@ -669,7 +689,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE (-> BUG), memory offlining will keep retrying until it eventually succeeds. When offlining is triggered from user space, the offlining context can be -terminated by sending a fatal signal. A timeout based offlining can easily be +terminated by sending a signal. A timeout based offlining can easily be implemented via:: % timeout $TIMEOUT offline_block | failure_handling diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 7c304e432205..4349a8c2b978 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -244,6 +244,21 @@ write-protected (so future writes will also result in a WP fault). These ioctls support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP`` respectively) to configure the mapping this way. +Memory Poisoning Emulation +--------------------------- + +In response to a fault (either missing or minor), an action userspace can +take to "resolve" it is to issue a ``UFFDIO_POISON``. This will cause any +future faulters to either get a SIGBUS, or in KVM's case the guest will +receive an MCE as if there were hardware memory poisoning. + +This is used to emulate hardware memory poisoning. Imagine a VM running on a +machine which experiences a real hardware memory error. Later, we live migrate +the VM to another physical machine.
Since we want the migration to be +transparent to the guest, we want that same address range to act as if it was +still poisoned, even though it's on a new physical host which ostensibly +doesn't have a memory error in the exact same spot. + QEMU/KVM ======== diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst index c5c2c7dbb155..45b98390e938 100644 --- a/Documentation/admin-guide/mm/zswap.rst +++ b/Documentation/admin-guide/mm/zswap.rst @@ -49,7 +49,7 @@ compressed pool. Design ====== -Zswap receives pages for compression through the Frontswap API and is able to +Zswap receives pages for compression from the swap subsystem and is able to evict pages from its own compressed pool on an LRU basis and write them back to the backing swap device in the case that the compressed pool is full. @@ -70,19 +70,19 @@ means the compression ratio will always be 2:1 or worse (because of half-full zbud pages). The zsmalloc type zpool has a more complex compressed page storage method, and it can achieve greater storage densities. -When a swap page is passed from frontswap to zswap, zswap maintains a mapping +When a swap page is passed from swapout to zswap, zswap maintains a mapping of the swap entry, a combination of the swap type and swap offset, to the zpool handle that references that compressed swap page. This mapping is achieved with a red-black tree per swap type. The swap offset is the search key for the tree nodes. -During a page fault on a PTE that is a swap entry, frontswap calls the zswap -load function to decompress the page into the page allocated by the page fault -handler. +During a page fault on a PTE that is a swap entry, the swapin code calls the +zswap load function to decompress the page into the page allocated by the page +fault handler. Once there are no PTEs referencing a swap page stored in zswap (i.e. 
the count -in the swap_map goes to 0) the swap code calls the zswap invalidate function, -via frontswap, to free the compressed entry. +in the swap_map goes to 0) the swap code calls the zswap invalidate function +to free the compressed entry. Zswap seeks to be simple in its policies. Sysfs attributes allow for one user controlled policy: diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst index bedd3a1d7b42..e96f057ea2a0 100644 --- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -63,6 +63,14 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A510 | #1902691 | ARM64_ERRATUM_1902691 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | @@ -109,14 +117,6 @@ stable kernels. 
+----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 | +----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | -+----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 | @@ -198,6 +198,9 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A | +----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip08 SMMU PMCG | #162001900 | N/A | +| | Hip09 SMMU PMCG | | | ++----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 | +----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arch/arm64/sme.rst b/Documentation/arch/arm64/sme.rst index ba529a1dc606..3d0e53ecac4f 100644 --- a/Documentation/arch/arm64/sme.rst +++ b/Documentation/arch/arm64/sme.rst @@ -322,7 +322,7 @@ The regset data starts with struct user_za_header, containing: VL is supported. 
* The size and layout of the payload depends on the header fields. The - SME_PT_ZA_*() macros are provided to facilitate access to the data. + ZA_PT_ZA*() macros are provided to facilitate access to the data. * In either case, for SETREGSET it is permissible to omit the payload, in which case the vector length and flags are changed and PSTATE.ZA is set to 0 diff --git a/Documentation/arch/index.rst b/Documentation/arch/index.rst index 8458b88e9b79..c9a209878cf3 100644 --- a/Documentation/arch/index.rst +++ b/Documentation/arch/index.rst @@ -21,7 +21,7 @@ implementation. parisc/index ../powerpc/index ../riscv/index - ../s390/index + s390/index sh/index sparc/index x86/index diff --git a/Documentation/s390/3270.ChangeLog b/Documentation/arch/s390/3270.ChangeLog index ecaf60b6c381..ecaf60b6c381 100644 --- a/Documentation/s390/3270.ChangeLog +++ b/Documentation/arch/s390/3270.ChangeLog diff --git a/Documentation/s390/3270.rst b/Documentation/arch/s390/3270.rst index e09e77954238..467eace91473 100644 --- a/Documentation/s390/3270.rst +++ b/Documentation/arch/s390/3270.rst @@ -116,7 +116,7 @@ Here are the installation steps in detail: as a 3270, not a 3215. 5. Run the 3270 configuration script config3270. It is - distributed in this same directory, Documentation/s390, as + distributed in this same directory, Documentation/arch/s390, as config3270.sh. Inspect the output script it produces, /tmp/mkdev3270, and then run that script. 
This will create the necessary character special device files and make the necessary @@ -125,7 +125,7 @@ Here are the installation steps in detail: Then notify /sbin/init that /etc/inittab has changed, by issuing the telinit command with the q operand:: - cd Documentation/s390 + cd Documentation/arch/s390 sh config3270.sh sh /tmp/mkdev3270 telinit q diff --git a/Documentation/s390/cds.rst b/Documentation/arch/s390/cds.rst index 7006d8209d2e..bcad2a14244a 100644 --- a/Documentation/s390/cds.rst +++ b/Documentation/arch/s390/cds.rst @@ -39,7 +39,7 @@ some of them are ESA/390 platform specific. Note: In order to write a driver for S/390, you also need to look into the interface - described in Documentation/s390/driver-model.rst. + described in Documentation/arch/s390/driver-model.rst. Note for porting drivers from 2.4: diff --git a/Documentation/s390/common_io.rst b/Documentation/arch/s390/common_io.rst index 846485681ce7..6dcb40cb7145 100644 --- a/Documentation/s390/common_io.rst +++ b/Documentation/arch/s390/common_io.rst @@ -136,5 +136,5 @@ debugfs entries The level of logging can be changed to be more or less verbose by piping to /sys/kernel/debug/s390dbf/cio_*/level a number between 0 and 6; see the - documentation on the S/390 debug feature (Documentation/s390/s390dbf.rst) + documentation on the S/390 debug feature (Documentation/arch/s390/s390dbf.rst) for details. 
diff --git a/Documentation/s390/config3270.sh b/Documentation/arch/s390/config3270.sh index 515e2f431487..515e2f431487 100644 --- a/Documentation/s390/config3270.sh +++ b/Documentation/arch/s390/config3270.sh diff --git a/Documentation/s390/driver-model.rst b/Documentation/arch/s390/driver-model.rst index ad4bc2dbea43..ad4bc2dbea43 100644 --- a/Documentation/s390/driver-model.rst +++ b/Documentation/arch/s390/driver-model.rst diff --git a/Documentation/s390/features.rst b/Documentation/arch/s390/features.rst index 57c296a9d8f3..57c296a9d8f3 100644 --- a/Documentation/s390/features.rst +++ b/Documentation/arch/s390/features.rst diff --git a/Documentation/s390/index.rst b/Documentation/arch/s390/index.rst index 73c79bf586fd..73c79bf586fd 100644 --- a/Documentation/s390/index.rst +++ b/Documentation/arch/s390/index.rst diff --git a/Documentation/s390/monreader.rst b/Documentation/arch/s390/monreader.rst index 21cdfb699b49..21cdfb699b49 100644 --- a/Documentation/s390/monreader.rst +++ b/Documentation/arch/s390/monreader.rst diff --git a/Documentation/s390/pci.rst b/Documentation/arch/s390/pci.rst index a1a72a47dc96..d5755484d8e7 100644 --- a/Documentation/s390/pci.rst +++ b/Documentation/arch/s390/pci.rst @@ -40,7 +40,7 @@ For example: Change the level of logging to be more or less verbose by piping a number between 0 and 6 to /sys/kernel/debug/s390dbf/pci_*/level. For details, see the documentation on the S/390 debug feature at - Documentation/s390/s390dbf.rst. + Documentation/arch/s390/s390dbf.rst. 
Sysfs entries ============= diff --git a/Documentation/s390/qeth.rst b/Documentation/arch/s390/qeth.rst index f02fdaa68de0..f02fdaa68de0 100644 --- a/Documentation/s390/qeth.rst +++ b/Documentation/arch/s390/qeth.rst diff --git a/Documentation/s390/s390dbf.rst b/Documentation/arch/s390/s390dbf.rst index af8bdc3629e7..af8bdc3629e7 100644 --- a/Documentation/s390/s390dbf.rst +++ b/Documentation/arch/s390/s390dbf.rst diff --git a/Documentation/s390/text_files.rst b/Documentation/arch/s390/text_files.rst index c94d05d4fa17..c94d05d4fa17 100644 --- a/Documentation/s390/text_files.rst +++ b/Documentation/arch/s390/text_files.rst diff --git a/Documentation/s390/vfio-ap-locking.rst b/Documentation/arch/s390/vfio-ap-locking.rst index 0dfcdb562e21..0dfcdb562e21 100644 --- a/Documentation/s390/vfio-ap-locking.rst +++ b/Documentation/arch/s390/vfio-ap-locking.rst diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/arch/s390/vfio-ap.rst index bb3f4c4e2885..bb3f4c4e2885 100644 --- a/Documentation/s390/vfio-ap.rst +++ b/Documentation/arch/s390/vfio-ap.rst diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/arch/s390/vfio-ccw.rst index 37026fa18179..42960b7b0d70 100644 --- a/Documentation/s390/vfio-ccw.rst +++ b/Documentation/arch/s390/vfio-ccw.rst @@ -440,6 +440,6 @@ Reference 1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832) 2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204) 3. https://en.wikipedia.org/wiki/Channel_I/O -4. Documentation/s390/cds.rst +4. Documentation/arch/s390/cds.rst 5. Documentation/driver-api/vfio.rst 6. 
Documentation/driver-api/vfio-mediated-device.rst diff --git a/Documentation/s390/zfcpdump.rst b/Documentation/arch/s390/zfcpdump.rst index a61de7aa8778..a61de7aa8778 100644 --- a/Documentation/s390/zfcpdump.rst +++ b/Documentation/arch/s390/zfcpdump.rst diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst index 33520ecdb37a..cdbca15a4fc2 100644 --- a/Documentation/arch/x86/boot.rst +++ b/Documentation/arch/x86/boot.rst @@ -1417,7 +1417,7 @@ execution context provided by the EFI firmware. The function prototype for the handover entry point looks like this:: - efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp) + efi_stub_entry(void *handle, efi_system_table_t *table, struct boot_params *bp) 'handle' is the EFI image handle passed to the boot loader by the EFI firmware, 'table' is the EFI system table - these are the first two diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst index ddb867e0185b..b9dc0c9dbee4 100644 --- a/Documentation/block/biovecs.rst +++ b/Documentation/block/biovecs.rst @@ -134,6 +134,7 @@ Usage of helpers: bio_for_each_bvec_all() bio_first_bvec_all() bio_first_page_all() + bio_first_folio_all() bio_last_bvec_all() * The following helpers iterate over single-page segment. The passed 'struct diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst index 38372a956d65..eb19c945f4d5 100644 --- a/Documentation/bpf/bpf_design_QA.rst +++ b/Documentation/bpf/bpf_design_QA.rst @@ -140,11 +140,6 @@ A: Because if we picked one-to-one relationship to x64 it would have made it more complicated to support on arm64 and other archs. Also it needs div-by-zero runtime check. -Q: Why there is no BPF_SDIV for signed divide operation? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A: Because it would be rarely used. llvm errors in such case and -prints a suggestion to use unsigned divide instead. - Q: Why BPF has implicit prologue and epilogue? 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: Because architectures like sparc have register windows and in general diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst index 609b71f5747d..de27e1620821 100644 --- a/Documentation/bpf/bpf_devel_QA.rst +++ b/Documentation/bpf/bpf_devel_QA.rst @@ -635,12 +635,12 @@ test coverage. Q: clang flag for target bpf? ----------------------------- -Q: In some cases clang flag ``-target bpf`` is used but in other cases the +Q: In some cases clang flag ``--target=bpf`` is used but in other cases the default clang target, which matches the underlying architecture, is used. What is the difference and when I should use which? A: Although LLVM IR generation and optimization try to stay architecture -independent, ``-target <arch>`` still has some impact on generated code: +independent, ``--target=<arch>`` still has some impact on generated code: - BPF program may recursively include header file(s) with file scope inline assembly codes. The default target can handle this well, @@ -658,7 +658,7 @@ independent, ``-target <arch>`` still has some impact on generated code: The clang option ``-fno-jump-tables`` can be used to disable switch table generation. -- For clang ``-target bpf``, it is guaranteed that pointer or long / +- For clang ``--target=bpf``, it is guaranteed that pointer or long / unsigned long types will always have a width of 64 bit, no matter whether underlying clang binary or default target (or kernel) is 32 bit. However, when native clang target is used, then it will @@ -668,7 +668,7 @@ independent, ``-target <arch>`` still has some impact on generated code: while the BPF LLVM back end still operates in 64 bit. The native target is mostly needed in tracing for the case of walking ``pt_regs`` or other kernel structures where CPU's register width matters. - Otherwise, ``clang -target bpf`` is generally recommended. + Otherwise, ``clang --target=bpf`` is generally recommended. 
You should use default target when: @@ -685,7 +685,7 @@ when: into these structures is verified by the BPF verifier and may result in verification failures if the native architecture is not aligned with the BPF architecture, e.g. 64-bit. An example of this is - BPF_PROG_TYPE_SK_MSG require ``-target bpf`` + BPF_PROG_TYPE_SK_MSG require ``--target=bpf`` .. Links diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst index 7cd7c5415a99..f32db1f44ae9 100644 --- a/Documentation/bpf/btf.rst +++ b/Documentation/bpf/btf.rst @@ -990,7 +990,7 @@ format.:: } g2; int main() { return 0; } int test() { return 0; } - -bash-4.4$ clang -c -g -O2 -target bpf t2.c + -bash-4.4$ clang -c -g -O2 --target=bpf t2.c -bash-4.4$ readelf -S t2.o ...... [ 8] .BTF PROGBITS 0000000000000000 00000247 @@ -1000,7 +1000,7 @@ format.:: [10] .rel.BTF.ext REL 0000000000000000 000007e0 0000000000000040 0000000000000010 16 9 8 ...... - -bash-4.4$ clang -S -g -O2 -target bpf t2.c + -bash-4.4$ clang -S -g -O2 --target=bpf t2.c -bash-4.4$ cat t2.s ...... .section .BTF,"",@progbits diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index dbb39e8f9889..1ff177b89d66 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -12,9 +12,9 @@ that goes into great technical depth about the BPF Architecture. .. toctree:: :maxdepth: 1 - instruction-set verifier libbpf/index + standardization/index btf faq syscall_api @@ -29,7 +29,6 @@ that goes into great technical depth about the BPF Architecture. 
bpf_licensing test_debug clang-notes - linux-notes other redirect diff --git a/Documentation/bpf/llvm_reloc.rst b/Documentation/bpf/llvm_reloc.rst index e4a777a6a3a2..450e6403fe3d 100644 --- a/Documentation/bpf/llvm_reloc.rst +++ b/Documentation/bpf/llvm_reloc.rst @@ -28,7 +28,7 @@ For example, for the following code:: return g1 + g2 + l1 + l2; } -Compiled with ``clang -target bpf -O2 -c test.c``, the following is +Compiled with ``clang --target=bpf -O2 -c test.c``, the following is the code with ``llvm-objdump -dr test.o``:: 0: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll @@ -157,7 +157,7 @@ and ``call`` instructions. For example:: return gfunc(a, b) + lfunc(a, b) + global; } -Compiled with ``clang -target bpf -O2 -c test.c``, we will have +Compiled with ``clang --target=bpf -O2 -c test.c``, we will have following code with `llvm-objdump -dr test.o``:: Disassembly of section .text: @@ -203,7 +203,7 @@ The following is an example to show how R_BPF_64_ABS64 could be generated:: int global() { return 0; } struct t { void *g; } gbl = { global }; -Compiled with ``clang -target bpf -O2 -g -c test.c``, we will see a +Compiled with ``clang --target=bpf -O2 -g -c test.c``, we will see a relocation below in ``.data`` section with command ``llvm-readelf -r test.o``:: diff --git a/Documentation/bpf/standardization/index.rst b/Documentation/bpf/standardization/index.rst new file mode 100644 index 000000000000..09c6ba055fd7 --- /dev/null +++ b/Documentation/bpf/standardization/index.rst @@ -0,0 +1,18 @@ +.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) + +=================== +BPF Standardization +=================== + +This directory contains documents that are being iterated on as part of the BPF +standardization effort with the IETF. See the `IETF BPF Working Group`_ page +for the working group charter, documents, and more. + +.. toctree:: + :maxdepth: 1 + + instruction-set + linux-notes + +.. Links: +.. 
_IETF BPF Working Group: https://datatracker.ietf.org/wg/bpf/about/ diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst index 6644842cd3ea..4f73e9dc8d9e 100644 --- a/Documentation/bpf/instruction-set.rst +++ b/Documentation/bpf/standardization/instruction-set.rst @@ -10,9 +10,92 @@ This document specifies version 1.0 of the eBPF instruction set. Documentation conventions ========================= -For brevity, this document uses the type notion "u64", "u32", etc. -to mean an unsigned integer whose width is the specified number of bits, -and "s32", etc. to mean a signed integer of the specified number of bits. +For brevity and consistency, this document refers to families +of types using a shorthand syntax and refers to several expository, +mnemonic functions when describing the semantics of instructions. +The range of valid values for those types and the semantics of those +functions are defined in the following subsections. + +Types +----- +This document refers to integer types with the notation `SN` to specify +a type's signedness (`S`) and bit width (`N`), respectively. + +.. table:: Meaning of signedness notation. + + ==== ========= + `S` Meaning + ==== ========= + `u` unsigned + `s` signed + ==== ========= + +.. table:: Meaning of bit-width notation. + + ===== ========= + `N` Bit width + ===== ========= + `8` 8 bits + `16` 16 bits + `32` 32 bits + `64` 64 bits + `128` 128 bits + ===== ========= + +For example, `u32` is a type whose valid values are all the 32-bit unsigned +numbers and `s16` is a type whose valid values are all the 16-bit signed +numbers. + +Functions +--------- +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and + returns the equivalent number as an unsigned 16-bit number in big-endian + format. +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and + returns the equivalent number as an unsigned 32-bit number in big-endian + format.
+* `htobe64`: Takes an unsigned 64-bit number in host-endian format and + returns the equivalent number as an unsigned 64-bit number in big-endian + format. +* `htole16`: Takes an unsigned 16-bit number in host-endian format and + returns the equivalent number as an unsigned 16-bit number in little-endian + format. +* `htole32`: Takes an unsigned 32-bit number in host-endian format and + returns the equivalent number as an unsigned 32-bit number in little-endian + format. +* `htole64`: Takes an unsigned 64-bit number in host-endian format and + returns the equivalent number as an unsigned 64-bit number in little-endian + format. +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. + + +Definitions +----------- + +.. glossary:: + + Sign Extend + To `sign extend an` ``X`` `-bit number, A, to a` ``Y`` `-bit number, B ,` means to + + #. Copy all ``X`` bits from `A` to the lower ``X`` bits of `B`. + #. Set the value of the remaining ``Y`` - ``X`` bits of `B` to the value of + the most-significant bit of `A`. + +.. admonition:: Example + + Sign extend an 8-bit number ``A`` to a 16-bit number ``B`` on a big-endian platform: + :: + + A: 10000110 + B: 11111111 10000110 Registers and calling convention ================================ @@ -154,24 +237,27 @@ otherwise identical operations. The 'code' field encodes the operation as below, where 'src' and 'dst' refer to the values of the source and destination registers, respectively. 
-======== ===== ========================================================== -code value description -======== ===== ========================================================== -BPF_ADD 0x00 dst += src -BPF_SUB 0x10 dst -= src -BPF_MUL 0x20 dst \*= src -BPF_DIV 0x30 dst = (src != 0) ? (dst / src) : 0 -BPF_OR 0x40 dst \|= src -BPF_AND 0x50 dst &= src -BPF_LSH 0x60 dst <<= (src & mask) -BPF_RSH 0x70 dst >>= (src & mask) -BPF_NEG 0x80 dst = ~src -BPF_MOD 0x90 dst = (src != 0) ? (dst % src) : dst -BPF_XOR 0xa0 dst ^= src -BPF_MOV 0xb0 dst = src -BPF_ARSH 0xc0 sign extending dst >>= (src & mask) -BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below) -======== ===== ========================================================== +========= ===== ======= ========================================================== +code value offset description +========= ===== ======= ========================================================== +BPF_ADD 0x00 0 dst += src +BPF_SUB 0x10 0 dst -= src +BPF_MUL 0x20 0 dst \*= src +BPF_DIV 0x30 0 dst = (src != 0) ? (dst / src) : 0 +BPF_SDIV 0x30 1 dst = (src != 0) ? (dst s/ src) : 0 +BPF_OR 0x40 0 dst \|= src +BPF_AND 0x50 0 dst &= src +BPF_LSH 0x60 0 dst <<= (src & mask) +BPF_RSH 0x70 0 dst >>= (src & mask) +BPF_NEG 0x80 0 dst = -dst +BPF_MOD 0x90 0 dst = (src != 0) ? (dst % src) : dst +BPF_SMOD 0x90 1 dst = (src != 0) ? (dst s% src) : dst +BPF_XOR 0xa0 0 dst ^= src +BPF_MOV 0xb0 0 dst = src +BPF_MOVSX 0xb0 8/16/32 dst = (s8,s16,s32)src +BPF_ARSH 0xc0 0 :term:`sign extending<Sign Extend>` dst >>= (src & mask) +BPF_END 0xd0 0 byte swap operations (see `Byte swap instructions`_ below) +========= ===== ======= ========================================================== Underflow and overflow are allowed during arithmetic operations, meaning the 64-bit or 32-bit value will wrap. If eBPF program execution would @@ -198,47 +284,75 @@ where '(u32)' indicates that the upper 32 bits are zeroed. 
dst = dst ^ imm32 -Also note that the division and modulo operations are unsigned. Thus, for -``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas -for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result -interpreted as an unsigned 64-bit value. There are no instructions for -signed division or modulo. +Note that most instructions have instruction offset of 0. Only three instructions +(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset. + +The division and modulo operations support both unsigned and signed flavors. + +For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``, +'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``, +'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then +interpreted as a 64-bit unsigned value. + +For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``, +'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm' +is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then +interpreted as a 64-bit signed value. + +The ``BPF_MOVSX`` instruction does a move operation with sign extension. +``BPF_ALU | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32 +bit operands, and zeroes the remaining upper 32 bits. +``BPF_ALU64 | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit +operands into 64 bit operands. Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31) for 32-bit operations. Byte swap instructions -~~~~~~~~~~~~~~~~~~~~~~ +---------------------- -The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit -'code' field of ``BPF_END``. +The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64`` +and a 4-bit 'code' field of ``BPF_END``. The byte swap instructions operate on the destination register only and do not use a separate source register or immediate value. 
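The sign-extension rule from the glossary, and the way ``BPF_MOVSX`` applies it for the two ALU classes, can be modelled in a few lines. The Python sketch below is illustrative only: the helper names ``sign_extend``, ``movsx_alu`` and ``movsx_alu64`` are invented for this example and are not part of the specification or of any kernel API.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign extend the low `from_bits` bits of `value` to `to_bits` bits.

    Follows the two-step glossary definition: copy the low X bits, then
    set the remaining Y - X bits to the most-significant bit of A.
    The result is returned as an unsigned bit pattern.
    """
    low = value & ((1 << from_bits) - 1)           # step 1: copy the X bits
    if low >> (from_bits - 1):                     # most-significant bit is 1
        fill = ((1 << (to_bits - from_bits)) - 1) << from_bits
        low |= fill                                # step 2: fill the upper bits
    return low


def movsx_alu(src: int, src_bits: int) -> int:
    """Model of BPF_ALU | BPF_MOVSX: extend to 32 bits, upper 32 bits stay zero."""
    assert src_bits in (8, 16)
    return sign_extend(src, src_bits, 32)


def movsx_alu64(src: int, src_bits: int) -> int:
    """Model of BPF_ALU64 | BPF_MOVSX: extend to the full 64 bits."""
    assert src_bits in (8, 16, 32)
    return sign_extend(src, src_bits, 64)


# The glossary example: 8-bit 10000110 becomes 16-bit 11111111 10000110.
print(format(sign_extend(0b10000110, 8, 16), "016b"))
```

With an 8-bit source of 0x80, the 32-bit flavor yields 0x00000000FFFFFF80 in the 64-bit register, while the 64-bit flavor yields 0xFFFFFFFFFFFFFF80.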
-The 1-bit source operand field in the opcode is used to select what byte -order the operation convert from or to: +For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to +select what byte order the operation converts from or to. For +``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved +and must be set to 0. -========= ===== ================================================= -source value description -========= ===== ================================================= -BPF_TO_LE 0x00 convert between host byte order and little endian -BPF_TO_BE 0x08 convert between host byte order and big endian -========= ===== ================================================= +========= ========= ===== ================================================= +class source value description +========= ========= ===== ================================================= +BPF_ALU BPF_TO_LE 0x00 convert between host byte order and little endian +BPF_ALU BPF_TO_BE 0x08 convert between host byte order and big endian +BPF_ALU64 Reserved 0x00 do byte swap unconditionally +========= ========= ===== ================================================= The 'imm' field encodes the width of the swap operations. The following widths are supported: 16, 32 and 64. 
Examples: -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means:: +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: dst = htole16(dst) + dst = htole32(dst) + dst = htole64(dst) -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means:: +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means:: + dst = htobe16(dst) + dst = htobe32(dst) dst = htobe64(dst) +``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: + + dst = bswap16(dst) + dst = bswap32(dst) + dst = bswap64(dst) + Jump instructions ----------------- @@ -249,7 +363,8 @@ The 'code' field encodes the operation as below: ======== ===== === =========================================== ========================================= code value src description notes ======== ===== === =========================================== ========================================= -BPF_JA 0x0 0x0 PC += offset BPF_JMP only +BPF_JA 0x0 0x0 PC += offset BPF_JMP class +BPF_JA 0x0 0x0 PC += imm BPF_JMP32 class BPF_JEQ 0x1 any PC += offset if dst == src BPF_JGT 0x2 any PC += offset if dst > src unsigned BPF_JGE 0x3 any PC += offset if dst >= src unsigned @@ -278,6 +393,19 @@ Example: where 's>=' indicates a signed '>=' comparison. +``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means:: + + gotol +imm + +where 'imm' means the branch offset comes from insn 'imm' field. + +Note that there are two flavors of ``BPF_JA`` instructions. The +``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset' +field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset +specified by the 'imm' field. A > 16-bit conditional jump may be +converted to a < 16-bit conditional jump plus a 32-bit unconditional +jump. 
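As a rough illustration of the two ``BPF_JA`` flavors, the following Python sketch picks an encoding based on whether the jump distance fits in the 'offset' field. It assumes both fields are treated as signed, and the function name ``encode_ja`` and the dictionary layout are hypothetical, chosen only for this example.

```python
BPF_JMP = 0x05    # class whose BPF_JA takes the distance from 'offset'
BPF_JMP32 = 0x06  # class whose BPF_JA takes the distance from 'imm'
BPF_JA = 0x00
BPF_K = 0x00


def encode_ja(delta: int) -> dict:
    """Pick the BPF_JA flavor whose field can hold the signed distance `delta`."""
    if -(1 << 15) <= delta < (1 << 15):
        # Fits in the 16-bit 'offset' field: PC += offset
        return {"opcode": BPF_JA | BPF_K | BPF_JMP, "offset": delta, "imm": 0}
    if -(1 << 31) <= delta < (1 << 31):
        # Needs the 32-bit 'imm' field: gotol +imm
        return {"opcode": BPF_JA | BPF_K | BPF_JMP32, "offset": 0, "imm": delta}
    raise ValueError("jump distance does not fit in 32 bits")


print(encode_ja(10))
print(encode_ja(40000))
```

A compiler back end could apply the same range check when deciding whether a far conditional jump must be rewritten as a short conditional jump followed by a ``gotol``.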
+ Helper functions ~~~~~~~~~~~~~~~~ @@ -320,6 +448,7 @@ The mode modifier is one of: BPF_ABS 0x20 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_ BPF_IND 0x40 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_ BPF_MEM 0x60 regular load and store operations `Regular load and store operations`_ + BPF_MEMSX 0x80 sign-extension load operations `Sign-extension load operations`_ BPF_ATOMIC 0xc0 atomic operations `Atomic operations`_ ============= ===== ==================================== ============= @@ -350,9 +479,23 @@ instructions that transfer data between a register and memory. ``BPF_MEM | <size> | BPF_LDX`` means:: - dst = *(size *) (src + offset) + dst = *(unsigned size *) (src + offset) + +Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and +'unsigned size' is one of u8, u16, u32 or u64. + +Sign-extension load operations +------------------------------ + +The ``BPF_MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load +instructions that transfer data between a register and memory. + +``BPF_MEMSX | <size> | BPF_LDX`` means:: + + dst = *(signed size *) (src + offset) -Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``. +Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and +'signed size' is one of s8, s16 or s32. Atomic operations ----------------- diff --git a/Documentation/bpf/linux-notes.rst b/Documentation/bpf/standardization/linux-notes.rst index 508d009d3bed..00d2693de025 100644 --- a/Documentation/bpf/linux-notes.rst +++ b/Documentation/bpf/standardization/linux-notes.rst @@ -45,7 +45,8 @@ On Linux, this integer is a BTF ID. 
Legacy BPF Packet access instructions ===================================== -As mentioned in the `ISA standard documentation <instruction-set.rst#legacy-bpf-packet-access-instructions>`_, +As mentioned in the `ISA standard documentation +<instruction-set.html#legacy-bpf-packet-access-instructions>`_, Linux has special eBPF instructions for access to packet data that have been carried over from classic BPF to retain the performance of legacy socket filters running in the eBPF interpreter. diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst index 5c0552e78c58..889fc84ccd1b 100644 --- a/Documentation/core-api/cachetlb.rst +++ b/Documentation/core-api/cachetlb.rst @@ -88,13 +88,17 @@ changes occur: This is used primarily during fault processing. -5) ``void update_mmu_cache(struct vm_area_struct *vma, - unsigned long address, pte_t *ptep)`` +5) ``void update_mmu_cache_range(struct vm_fault *vmf, + struct vm_area_struct *vma, unsigned long address, pte_t *ptep, + unsigned int nr)`` - At the end of every page fault, this routine is invoked to - tell the architecture specific code that a translation - now exists at virtual address "address" for address space - "vma->vm_mm", in the software page tables. + At the end of every page fault, this routine is invoked to tell + the architecture specific code that translations now exist + in the software page tables for address space "vma->vm_mm" + at virtual address "address" for "nr" consecutive pages. + + This routine is also invoked in various other places which pass + a NULL "vmf". A port may use this information in any way it so chooses. For example, it could use this event to pre-load TLB @@ -269,7 +273,7 @@ maps this page at its virtual address. If D-cache aliasing is not an issue, these two routines may simply call memcpy/memset directly and do nothing more.
- ``void flush_dcache_page(struct page *page)`` + ``void flush_dcache_folio(struct folio *folio)`` This routines must be called when: @@ -277,7 +281,7 @@ maps this page at its virtual address. and / or in high memory b) the kernel is about to read from a page cache page and user space shared/writable mappings of this page potentially exist. Note - that {get,pin}_user_pages{_fast} already call flush_dcache_page + that {get,pin}_user_pages{_fast} already call flush_dcache_folio on any page found in the user address space and thus driver code rarely needs to take this into account. @@ -291,7 +295,7 @@ maps this page at its virtual address. The phrase "kernel writes to a page cache page" means, specifically, that the kernel executes store instructions that dirty data in that - page at the page->virtual mapping of that page. It is important to + page at the kernel virtual mapping of that page. It is important to flush here to handle D-cache aliasing, to make sure these kernel stores are visible to user space mappings of that page. @@ -302,21 +306,22 @@ maps this page at its virtual address. If D-cache aliasing is not an issue, this routine may simply be defined as a nop on that architecture. - There is a bit set aside in page->flags (PG_arch_1) as "architecture + There is a bit set aside in folio->flags (PG_arch_1) as "architecture private". The kernel guarantees that, for pagecache pages, it will clear this bit when such a page first enters the pagecache. - This allows these interfaces to be implemented much more efficiently. - It allows one to "defer" (perhaps indefinitely) the actual flush if - there are currently no user processes mapping this page. See sparc64's - flush_dcache_page and update_mmu_cache implementations for an example - of how to go about doing this. + This allows these interfaces to be implemented much more + efficiently. It allows one to "defer" (perhaps indefinitely) the + actual flush if there are currently no user processes mapping this + page. 
See sparc64's flush_dcache_folio and update_mmu_cache_range + implementations for an example of how to go about doing this. - The idea is, first at flush_dcache_page() time, if page_file_mapping() - returns a mapping, and mapping_mapped on that mapping returns %false, - just mark the architecture private page flag bit. Later, in - update_mmu_cache(), a check is made of this flag bit, and if set the - flush is done and the flag bit is cleared. + The idea is, first at flush_dcache_folio() time, if + folio_flush_mapping() returns a mapping, and mapping_mapped() on that + mapping returns %false, just mark the architecture private page + flag bit. Later, in update_mmu_cache_range(), a check is made + of this flag bit, and if set the flush is done and the flag bit + is cleared. .. important:: @@ -326,12 +331,6 @@ maps this page at its virtual address. dirty. Again, see sparc64 for examples of how to deal with this. - ``void flush_dcache_folio(struct folio *folio)`` - This function is called under the same circumstances as - flush_dcache_page(). It allows the architecture to - optimise for flushing the entire folio of pages instead - of flushing one page at a time. - ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page, unsigned long user_vaddr, void *dst, void *src, int len)`` ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page, @@ -352,7 +351,7 @@ maps this page at its virtual address. When the kernel needs to access the contents of an anonymous page, it calls this function (currently only - get_user_pages()). Note: flush_dcache_page() deliberately + get_user_pages()). Note: flush_dcache_folio() deliberately doesn't work for an anonymous page. The default implementation is a nop (and should remain so for all coherent architectures). For incoherent architectures, it should flush @@ -369,7 +368,7 @@ maps this page at its virtual address. 
``void flush_icache_page(struct vm_area_struct *vma, struct page *page)`` All the functionality of flush_icache_page can be implemented in - flush_dcache_page and update_mmu_cache. In the future, the hope + flush_dcache_folio and update_mmu_cache_range. In the future, the hope is to remove this interface completely. The final category of APIs is for I/O to deliberately aliased address diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst index e6f5bc39cf5c..9511e405aabd 100644 --- a/Documentation/core-api/cpu_hotplug.rst +++ b/Documentation/core-api/cpu_hotplug.rst @@ -395,8 +395,8 @@ multi-instance state the following function is available: * cpuhp_setup_state_multi(state, name, startup, teardown) The @state argument is either a statically allocated state or one of the -constants for dynamically allocated states - CPUHP_PREPARE_DYN, -CPUHP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for +constants for dynamically allocated states - CPUHP_BP_PREPARE_DYN, +CPUHP_AP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for which a dynamic state should be allocated. The @name argument is used for sysfs output and for instrumentation. The @@ -588,7 +588,7 @@ notifications on online and offline operations:: Setup and teardown a dynamically allocated state in the ONLINE section for notifications on offline operations:: - state = cpuhp_setup_state(CPUHP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline); + state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline); if (state < 0) return state; .... 
@@ -597,7 +597,7 @@ for notifications on offline operations:: Setup and teardown a dynamically allocated state in the ONLINE section for notifications on online operations without invoking the callbacks:: - state = cpuhp_setup_state_nocalls(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL); + state = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL); if (state < 0) return state; .... @@ -606,7 +606,7 @@ for notifications on online operations without invoking the callbacks:: Setup, use and teardown a dynamically allocated multi-instance state in the ONLINE section for notifications on online and offline operation:: - state = cpuhp_setup_state_multi(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline); + state = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline); if (state < 0) return state; .... @@ -741,6 +741,24 @@ will receive all events. A script like:: can process the event further. +When changes to the CPUs in the system occur, the sysfs file +/sys/devices/system/cpu/crash_hotplug contains '1' if the kernel +updates the kdump capture kernel list of CPUs itself (via elfcorehdr), +or '0' if userspace must update the kdump capture kernel list of CPUs. + +The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration +option. + +To skip userspace processing of CPU hot un/plug events for kdump +(i.e. the unload-then-reload to obtain a current list of CPUs), this sysfs +file can be used in a udev rule as follows: + + SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" + +For a CPU hot un/plug event, if the architecture supports kernel updates +of the elfcorehdr (which contains the list of CPUs), then the rule skips +the unload-then-reload of the kdump capture kernel. 
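Userspace tooling other than udev can make the same decision by reading the sysfs file directly. The sketch below is a minimal illustration in Python. The function name ``kernel_updates_kdump_cpus`` is hypothetical, and treating a missing or unreadable file as "userspace must reload" is an assumption of this example, not something the text above specifies.

```python
from pathlib import Path

CRASH_HOTPLUG = Path("/sys/devices/system/cpu/crash_hotplug")


def kernel_updates_kdump_cpus(path: Path = CRASH_HOTPLUG) -> bool:
    """Return True if the kernel itself refreshes the kdump CPU list."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # File absent or unreadable (for example CONFIG_HOTPLUG_CPU is off):
        # assume userspace still has to unload and reload the capture kernel.
        return False


if kernel_updates_kdump_cpus():
    print("kernel updates elfcorehdr; skip kdump reload")
else:
    print("userspace must reload the kdump capture kernel")
```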
+ Kernel Inline Documentations Reference ====================================== diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst index f5dde5bceaea..2d091c873d1e 100644 --- a/Documentation/core-api/mm-api.rst +++ b/Documentation/core-api/mm-api.rst @@ -115,3 +115,28 @@ More Memory Management Functions .. kernel-doc:: include/linux/mmzone.h .. kernel-doc:: mm/util.c :functions: folio_mapping + +.. kernel-doc:: mm/rmap.c +.. kernel-doc:: mm/migrate.c +.. kernel-doc:: mm/mmap.c +.. kernel-doc:: mm/kmemleak.c +.. #kernel-doc:: mm/hmm.c (build warnings) +.. kernel-doc:: mm/memremap.c +.. kernel-doc:: mm/hugetlb.c +.. kernel-doc:: mm/swap.c +.. kernel-doc:: mm/zpool.c +.. kernel-doc:: mm/memcontrol.c +.. #kernel-doc:: mm/memory-tiers.c (build warnings) +.. kernel-doc:: mm/shmem.c +.. kernel-doc:: mm/migrate_device.c +.. #kernel-doc:: mm/nommu.c (duplicates kernel-doc from other files) +.. kernel-doc:: mm/mapping_dirty_helpers.c +.. #kernel-doc:: mm/memory-failure.c (build warnings) +.. kernel-doc:: mm/percpu.c +.. kernel-doc:: mm/maccess.c +.. kernel-doc:: mm/vmscan.c +.. kernel-doc:: mm/memory_hotplug.c +.. kernel-doc:: mm/mmu_notifier.c +.. kernel-doc:: mm/balloon_compaction.c +.. kernel-doc:: mm/huge_memory.c +.. kernel-doc:: mm/io-mapping.c diff --git a/Documentation/core-api/netlink.rst b/Documentation/core-api/netlink.rst index e4a938a05cc9..9f692b02bfe6 100644 --- a/Documentation/core-api/netlink.rst +++ b/Documentation/core-api/netlink.rst @@ -67,10 +67,11 @@ Globals kernel-policy ~~~~~~~~~~~~~ -Defines if the kernel validation policy is per operation (``per-op``) -or for the entire family (``global``). New families should use ``per-op`` -(default) to be able to narrow down the attributes accepted by a specific -command. +Defines whether the kernel validation policy is ``global`` i.e. 
the same for all +operations of the family, defined for each operation individually - ``per-op``, +or separately for each operation and operation type (do vs dump) - ``split``. +New families should use ``per-op`` (default) to be able to narrow down the +attributes accepted by a specific command. checks ------ diff --git a/Documentation/dev-tools/kunit/run_wrapper.rst b/Documentation/dev-tools/kunit/run_wrapper.rst index dafe8eb28d30..19ddf5e07013 100644 --- a/Documentation/dev-tools/kunit/run_wrapper.rst +++ b/Documentation/dev-tools/kunit/run_wrapper.rst @@ -321,3 +321,15 @@ command line arguments: - ``--json``: If set, stores the test results in a JSON format and prints to `stdout` or saves to a file if a filename is specified. + +- ``--filter``: Specifies filters on test attributes, for example, ``speed!=slow``. + Multiple filters can be used by wrapping input in quotes and separating filters + by commas. Example: ``--filter "speed>slow, module=example"``. + +- ``--filter_action``: If set to ``skip``, filtered tests will be shown as skipped + in the output rather than showing no output. + +- ``--list_tests``: If set, lists all tests that will be run. + +- ``--list_tests_attr``: If set, lists all tests that will be run and all of their + attributes. diff --git a/Documentation/dev-tools/kunit/running_tips.rst b/Documentation/dev-tools/kunit/running_tips.rst index 8e8c493f17d1..766f9cdea0fa 100644 --- a/Documentation/dev-tools/kunit/running_tips.rst +++ b/Documentation/dev-tools/kunit/running_tips.rst @@ -262,3 +262,169 @@ other code executed during boot, e.g. # Reset coverage counters before running the test. $ echo 0 > /sys/kernel/debug/gcov/reset $ modprobe kunit-example-test + + +Test Attributes and Filtering +============================= + +Test suites and cases can be marked with test attributes, such as speed of +test. These attributes will later be printed in test output and can be used to +filter test execution. 
+ +Marking Test Attributes +----------------------- + +Tests are marked with an attribute by including a ``kunit_attributes`` object +in the test definition. + +Test cases can be marked using the ``KUNIT_CASE_ATTR(test_name, attributes)`` +macro to define the test case instead of ``KUNIT_CASE(test_name)``. + +.. code-block:: c + + static const struct kunit_attributes example_attr = { + .speed = KUNIT_VERY_SLOW, + }; + + static struct kunit_case example_test_cases[] = { + KUNIT_CASE_ATTR(example_test, example_attr), + }; + +.. note:: + To mark a test case as slow, you can also use ``KUNIT_CASE_SLOW(test_name)``. + This is a helpful macro as the slow attribute is the most commonly used. + +Test suites can be marked with an attribute by setting the "attr" field in the +suite definition. + +.. code-block:: c + + static const struct kunit_attributes example_attr = { + .speed = KUNIT_VERY_SLOW, + }; + + static struct kunit_suite example_test_suite = { + ..., + .attr = example_attr, + }; + +.. note:: + Not all attributes need to be set in a ``kunit_attributes`` object. Unset + attributes will remain uninitialized and act as though the attribute is set + to 0 or NULL. Thus, if an attribute is set to 0, it is treated as unset. + These unset attributes will not be reported and may act as a default value + for filtering purposes. + +Reporting Attributes +-------------------- + +When a user runs tests, attributes will be present in the raw kernel output (in +KTAP format). Note that attributes will be hidden by default in kunit.py output +for all passing tests but the raw kernel output can be accessed using the +``--raw_output`` flag. This is an example of how test attributes for test cases +will be formatted in kernel output: + +.. code-block:: none + + # example_test.speed: slow + ok 1 example_test + +This is an example of how test attributes for test suites will be formatted in +kernel output: + +.. 
code-block:: none + + KTAP version 2 + # Subtest: example_suite + # module: kunit_example_test + 1..3 + ... + ok 1 example_suite + +Additionally, users can output a full attribute report of tests with their +attributes, using the command line flag ``--list_tests_attr``: + +.. code-block:: bash + + kunit.py run "example" --list_tests_attr + +.. note:: + This report can be accessed when running KUnit manually by passing in the + module_param ``kunit.action=list_attr``. + +Filtering +--------- + +Users can filter tests using the ``--filter`` command line flag when running +tests. As an example: + +.. code-block:: bash + + kunit.py run --filter speed=slow + + +You can also use the following operations on filters: "<", ">", "<=", ">=", +"!=", and "=". Example: + +.. code-block:: bash + + kunit.py run --filter "speed>slow" + +This example will run all tests with speeds faster than slow. Note that the +characters < and > are often interpreted by the shell, so they may need to be +quoted or escaped, as above. + +Additionally, you can use multiple filters at once. Simply separate filters +using commas. Example: + +.. code-block:: bash + + kunit.py run --filter "speed>slow, module=kunit_example_test" + +.. note:: + You can use this filtering feature when running KUnit manually by passing + the filter as a module param: ``kunit.filter="speed>slow, speed<=normal"``. + +Filtered tests will not run or show up in the test output. You can use the +``--filter_action=skip`` flag to skip filtered tests instead. These tests will be +shown in the test output in the test but will not run. To use this feature when +running KUnit manually, use the module param ``kunit.filter_action=skip``. + +Rules of Filtering Procedure +---------------------------- + +Since both suites and test cases can have attributes, there may be conflicts +between attributes during filtering. The process of filtering follows these +rules: + +- Filtering always operates at a per-test level. 
+ +- If a test has an attribute set, then the test's value is filtered on. + +- Otherwise, the value falls back to the suite's value. + +- If neither are set, the attribute has a global "default" value, which is used. + +List of Current Attributes +-------------------------- + +``speed`` + +This attribute indicates the speed of a test's execution (how slow or fast the +test is). + +This attribute is saved as an enum with the following categories: "normal", +"slow", or "very_slow". The assumed default speed for tests is "normal". This +indicates that the test takes a relatively trivial amount of time (less than +1 second), regardless of the machine it is running on. Any test slower than +this could be marked as "slow" or "very_slow". + +The macro ``KUNIT_CASE_SLOW(test_name)`` can be easily used to set the speed +of a test case to "slow". + +``module`` + +This attribute indicates the name of the module associated with the test. + +This attribute is automatically saved as a string and is printed for each suite. +Tests can also be filtered using this attribute. 
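The fallback rules and the comparison operators described for filtering can be modelled with a small Python sketch. This is not how KUnit or kunit.py implements filtering. The names ``effective_speed``, ``parse_filter`` and ``speed_passes`` are invented for illustration, and the ordering very_slow < slow < normal is inferred from the statement that ``speed>slow`` selects tests faster than slow.

```python
import operator

# Assumed ordering: a larger rank means a faster test.
SPEED_RANK = {"very_slow": 1, "slow": 2, "normal": 3}
DEFAULT_SPEED = "normal"

# Two-character operators are listed first so "<=" is not parsed as "<".
OPS = [("<=", operator.le), (">=", operator.ge), ("!=", operator.ne),
       ("<", operator.lt), (">", operator.gt), ("=", operator.eq)]


def effective_speed(test_speed=None, suite_speed=None):
    """Test value first, then the suite's value, then the global default."""
    return test_speed or suite_speed or DEFAULT_SPEED


def parse_filter(text):
    """Split a filter such as 'speed>slow' into (attribute, compare, value)."""
    for symbol, compare in OPS:
        if symbol in text:
            name, value = text.split(symbol, 1)
            return name.strip(), compare, value.strip()
    raise ValueError(f"no operator found in filter: {text!r}")


def speed_passes(filter_text, test_speed=None, suite_speed=None):
    """Return True if a test with the given attributes survives the filter."""
    name, compare, value = parse_filter(filter_text)
    if name != "speed":
        raise ValueError("this sketch only models the 'speed' attribute")
    speed = effective_speed(test_speed, suite_speed)
    return compare(SPEED_RANK[speed], SPEED_RANK[value])


# An unmarked test in an unmarked suite falls back to "normal" and is kept.
print(speed_passes("speed>slow"))
# A test marked very_slow is dropped even if its suite is "normal".
print(speed_passes("speed>slow", test_speed="very_slow", suite_speed="normal"))
```

The second call shows why filtering is described as per-test: the test's own attribute takes precedence over the suite's.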
diff --git a/Documentation/devicetree/bindings/arm/pmu.yaml b/Documentation/devicetree/bindings/arm/pmu.yaml index e14358bf0b9c..99b5e9530707 100644 --- a/Documentation/devicetree/bindings/arm/pmu.yaml +++ b/Documentation/devicetree/bindings/arm/pmu.yaml @@ -49,9 +49,14 @@ properties: - arm,cortex-a77-pmu - arm,cortex-a78-pmu - arm,cortex-a510-pmu + - arm,cortex-a520-pmu - arm,cortex-a710-pmu + - arm,cortex-a715-pmu + - arm,cortex-a720-pmu - arm,cortex-x1-pmu - arm,cortex-x2-pmu + - arm,cortex-x3-pmu + - arm,cortex-x4-pmu - arm,neoverse-e1-pmu - arm,neoverse-n1-pmu - arm,neoverse-n2-pmu diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml index a6b3bb8fdf33..c1d225fcf2d5 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml @@ -49,6 +49,7 @@ properties: - description: Frequency domain 0 register region - description: Frequency domain 1 register region - description: Frequency domain 2 register region + - description: Frequency domain 3 register region reg-names: minItems: 1 @@ -56,6 +57,7 @@ properties: - const: freq-domain0 - const: freq-domain1 - const: freq-domain2 + - const: freq-domain3 clocks: items: @@ -69,7 +71,7 @@ properties: interrupts: minItems: 1 - maxItems: 3 + maxItems: 4 interrupt-names: minItems: 1 @@ -77,6 +79,7 @@ properties: - const: dcvsh-irq-0 - const: dcvsh-irq-1 - const: dcvsh-irq-2 + - const: dcvsh-irq-3 '#freq-domain-cells': const: 1 diff --git a/Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt b/Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt deleted file mode 100644 index 1758051798fe..000000000000 --- a/Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt +++ /dev/null @@ -1,132 +0,0 @@ -TI CPUFreq and OPP bindings -================================ - -Certain TI SoCs, like those in the am335x, am437x, am57xx, and dra7xx -families support 
different OPPs depending on the silicon variant in use. -The ti-cpufreq driver can use revision and an efuse value from the SoC to -provide the OPP framework with supported hardware information. This is -used to determine which OPPs from the operating-points-v2 table get enabled -when it is parsed by the OPP framework. - -Required properties: --------------------- -In 'cpus' nodes: -- operating-points-v2: Phandle to the operating-points-v2 table to use. - -In 'operating-points-v2' table: -- compatible: Should be - - 'operating-points-v2-ti-cpu' for am335x, am43xx, and dra7xx/am57xx, - omap34xx, omap36xx and am3517 SoCs -- syscon: A phandle pointing to a syscon node representing the control module - register space of the SoC. - -Optional properties: --------------------- -- "vdd-supply", "vbb-supply": to define two regulators for dra7xx -- "cpu0-supply", "vbb-supply": to define two regulators for omap36xx - -For each opp entry in 'operating-points-v2' table: -- opp-supported-hw: Two bitfields indicating: - 1. Which revision of the SoC the OPP is supported by - 2. Which eFuse bits indicate this OPP is available - - A bitwise AND is performed against these values and if any bit - matches, the OPP gets enabled. - -Example: --------- - -/* From arch/arm/boot/dts/am33xx.dtsi */ -cpus { - #address-cells = <1>; - #size-cells = <0>; - cpu@0 { - compatible = "arm,cortex-a8"; - device_type = "cpu"; - reg = <0>; - - operating-points-v2 = <&cpu0_opp_table>; - - clocks = <&dpll_mpu_ck>; - clock-names = "cpu"; - - clock-latency = <300000>; /* From omap-cpufreq driver */ - }; -}; - -/* - * cpu0 has different OPPs depending on SoC revision and some on revisions - * 0x2 and 0x4 have eFuse bits that indicate if they are available or not - */ -cpu0_opp_table: opp-table { - compatible = "operating-points-v2-ti-cpu"; - syscon = <&scm_conf>; - - /* - * The three following nodes are marked with opp-suspend - * because they can not be enabled simultaneously on a - * single SoC. 
- */ - opp50-300000000 { - opp-hz = /bits/ 64 <300000000>; - opp-microvolt = <950000 931000 969000>; - opp-supported-hw = <0x06 0x0010>; - opp-suspend; - }; - - opp100-275000000 { - opp-hz = /bits/ 64 <275000000>; - opp-microvolt = <1100000 1078000 1122000>; - opp-supported-hw = <0x01 0x00FF>; - opp-suspend; - }; - - opp100-300000000 { - opp-hz = /bits/ 64 <300000000>; - opp-microvolt = <1100000 1078000 1122000>; - opp-supported-hw = <0x06 0x0020>; - opp-suspend; - }; - - opp100-500000000 { - opp-hz = /bits/ 64 <500000000>; - opp-microvolt = <1100000 1078000 1122000>; - opp-supported-hw = <0x01 0xFFFF>; - }; - - opp100-600000000 { - opp-hz = /bits/ 64 <600000000>; - opp-microvolt = <1100000 1078000 1122000>; - opp-supported-hw = <0x06 0x0040>; - }; - - opp120-600000000 { - opp-hz = /bits/ 64 <600000000>; - opp-microvolt = <1200000 1176000 1224000>; - opp-supported-hw = <0x01 0xFFFF>; - }; - - opp120-720000000 { - opp-hz = /bits/ 64 <720000000>; - opp-microvolt = <1200000 1176000 1224000>; - opp-supported-hw = <0x06 0x0080>; - }; - - oppturbo-720000000 { - opp-hz = /bits/ 64 <720000000>; - opp-microvolt = <1260000 1234800 1285200>; - opp-supported-hw = <0x01 0xFFFF>; - }; - - oppturbo-800000000 { - opp-hz = /bits/ 64 <800000000>; - opp-microvolt = <1260000 1234800 1285200>; - opp-supported-hw = <0x06 0x0100>; - }; - - oppnitro-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <1325000 1298500 1351500>; - opp-supported-hw = <0x04 0x0200>; - }; -}; diff --git a/Documentation/devicetree/bindings/crypto/st,stm32-hash.yaml b/Documentation/devicetree/bindings/crypto/st,stm32-hash.yaml index b767ec72a999..ac480765cde0 100644 --- a/Documentation/devicetree/bindings/crypto/st,stm32-hash.yaml +++ b/Documentation/devicetree/bindings/crypto/st,stm32-hash.yaml @@ -20,6 +20,7 @@ properties: - stericsson,ux500-hash - st,stm32f456-hash - st,stm32f756-hash + - st,stm32mp13-hash reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/gpio/adi,ds4520-gpio.yaml 
b/Documentation/devicetree/bindings/gpio/adi,ds4520-gpio.yaml new file mode 100644 index 000000000000..25b3198c4d3e --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/adi,ds4520-gpio.yaml @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/adi,ds4520-gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: DS4520 I2C GPIO expander + +maintainers: + - Okan Sahin <okan.sahin@analog.com> + +properties: + compatible: + enum: + - adi,ds4520-gpio + + reg: + maxItems: 1 + + gpio-controller: true + + "#gpio-cells": + const: 2 + + ngpios: + minimum: 1 + maximum: 9 + +required: + - compatible + - reg + - gpio-controller + - "#gpio-cells" + - ngpios + +additionalProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + + gpio@50 { + compatible = "adi,ds4520-gpio"; + reg = <0x50>; + ngpios = <9>; + gpio-controller; + #gpio-cells = <2>; + }; + }; diff --git a/Documentation/devicetree/bindings/gpio/brcm,kona-gpio.txt b/Documentation/devicetree/bindings/gpio/brcm,kona-gpio.txt deleted file mode 100644 index 4a63bc96b687..000000000000 --- a/Documentation/devicetree/bindings/gpio/brcm,kona-gpio.txt +++ /dev/null @@ -1,52 +0,0 @@ -Broadcom Kona Family GPIO -========================= - -This GPIO driver is used in the following Broadcom SoCs: - BCM11130, BCM11140, BCM11351, BCM28145, BCM28155 - -The Broadcom GPIO Controller IP can be configured prior to synthesis to -support up to 8 banks of 32 GPIOs where each bank has its own IRQ. The -GPIO controller only supports edge, not level, triggering of interrupts. - -Required properties -------------------- - -- compatible: "brcm,bcm11351-gpio", "brcm,kona-gpio" -- reg: Physical base address and length of the controller's registers. -- interrupts: The interrupt outputs from the controller. There is one GPIO - interrupt per GPIO bank. 
The number of interrupts listed depends on the - number of GPIO banks on the SoC. The interrupts must be ordered by bank, - starting with bank 0. There is always a 1:1 mapping between banks and - IRQs. -- #gpio-cells: Should be <2>. The first cell is the pin number, the second - cell is used to specify optional parameters: - - bit 0 specifies polarity (0 for normal, 1 for inverted) - See also "gpio-specifier" in .../devicetree/bindings/gpio/gpio.txt. -- #interrupt-cells: Should be <2>. The first cell is the GPIO number. The - second cell is used to specify flags. The following subset of flags is - supported: - - trigger type (bits[1:0]): - 1 = low-to-high edge triggered. - 2 = high-to-low edge triggered. - 3 = low-to-high or high-to-low edge triggered - Valid values are 1, 2, 3 - See also .../devicetree/bindings/interrupt-controller/interrupts.txt. -- gpio-controller: Marks the device node as a GPIO controller. -- interrupt-controller: Marks the device node as an interrupt controller. - -Example: - gpio: gpio@35003000 { - compatible = "brcm,bcm11351-gpio", "brcm,kona-gpio"; - reg = <0x35003000 0x800>; - interrupts = - <GIC_SPI 106 IRQ_TYPE_LEVEL_HIGH - GIC_SPI 115 IRQ_TYPE_LEVEL_HIGH - GIC_SPI 114 IRQ_TYPE_LEVEL_HIGH - GIC_SPI 113 IRQ_TYPE_LEVEL_HIGH - GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH - GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH>; - #gpio-cells = <2>; - #interrupt-cells = <2>; - gpio-controller; - interrupt-controller; - }; diff --git a/Documentation/devicetree/bindings/gpio/brcm,kona-gpio.yaml b/Documentation/devicetree/bindings/gpio/brcm,kona-gpio.yaml new file mode 100644 index 000000000000..296fdd6b8f38 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/brcm,kona-gpio.yaml @@ -0,0 +1,100 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/brcm,kona-gpio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Broadcom Kona family GPIO controller + +description: + The Broadcom GPIO 
Controller IP can be configured prior to synthesis to + support up to 8 banks of 32 GPIOs where each bank has its own IRQ. The + GPIO controller only supports edge, not level, triggering of interrupts. + +maintainers: + - Ray Jui <rjui@broadcom.com> + +properties: + compatible: + items: + - enum: + - brcm,bcm11351-gpio + - brcm,bcm21664-gpio + - brcm,bcm23550-gpio + - const: brcm,kona-gpio + + reg: + maxItems: 1 + + interrupts: + minItems: 4 + maxItems: 6 + description: + The interrupt outputs from the controller. There is one GPIO interrupt + per GPIO bank. The number of interrupts listed depends on the number of + GPIO banks on the SoC. The interrupts must be ordered by bank, starting + with bank 0. There is always a 1:1 mapping between banks and IRQs. + + '#gpio-cells': + const: 2 + + '#interrupt-cells': + const: 2 + + gpio-controller: true + + interrupt-controller: true + +required: + - compatible + - reg + - interrupts + - '#gpio-cells' + - '#interrupt-cells' + - gpio-controller + - interrupt-controller + +allOf: + - if: + properties: + compatible: + contains: + const: brcm,bcm11351-gpio + then: + properties: + interrupts: + minItems: 6 + - if: + properties: + compatible: + contains: + enum: + - brcm,bcm21664-gpio + - brcm,bcm23550-gpio + then: + properties: + interrupts: + maxItems: 4 + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interrupt-controller/irq.h> + + gpio@35003000 { + compatible = "brcm,bcm11351-gpio", "brcm,kona-gpio"; + reg = <0x35003000 0x800>; + interrupts = <GIC_SPI 106 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 115 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 114 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 113 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH>; + #gpio-cells = <2>; + #interrupt-cells = <2>; + gpio-controller; + interrupt-controller; + }; +... 
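The `allOf` conditionals in the new `brcm,kona-gpio.yaml` schema narrow the global `interrupts` bounds (4 to 6 items) per SoC. The sketch below restates that rule in plain Python for illustration only. The helper names `interrupt_bounds` and `interrupts_ok` are hypothetical and are not part of the kernel tree or of dt-schema; the compatible strings and item counts are taken from the schema above.

```python
# Illustrative only: hypothetical helpers mirroring the schema's if/then rules.
# Global bounds come from the "interrupts" property (minItems: 4, maxItems: 6).
GLOBAL_MIN, GLOBAL_MAX = 4, 6


def interrupt_bounds(compatible):
    """Return (min, max) interrupt count allowed for a compatible list."""
    lo, hi = GLOBAL_MIN, GLOBAL_MAX
    # First conditional: bcm11351 raises minItems to 6.
    if "brcm,bcm11351-gpio" in compatible:
        lo = max(lo, 6)
    # Second conditional: bcm21664 and bcm23550 lower maxItems to 4.
    if any(c in compatible for c in ("brcm,bcm21664-gpio", "brcm,bcm23550-gpio")):
        hi = min(hi, 4)
    return lo, hi


def interrupts_ok(compatible, n_interrupts):
    """Check an interrupt count against the bounds for this compatible list."""
    lo, hi = interrupt_bounds(compatible)
    return lo <= n_interrupts <= hi
```

Under these rules the example node in the schema (bcm11351 with six GIC_SPI entries) passes, while a bcm21664 node listing six interrupts would not.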
diff --git a/Documentation/devicetree/bindings/gpio/fsl-imx-gpio.yaml b/Documentation/devicetree/bindings/gpio/fsl-imx-gpio.yaml index ae18603697d7..d0ca2af89f1e 100644 --- a/Documentation/devicetree/bindings/gpio/fsl-imx-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/fsl-imx-gpio.yaml @@ -32,10 +32,12 @@ properties: - fsl,imx6sx-gpio - fsl,imx6ul-gpio - fsl,imx7d-gpio + - fsl,imx8dxl-gpio - fsl,imx8mm-gpio - fsl,imx8mn-gpio - fsl,imx8mp-gpio - fsl,imx8mq-gpio + - fsl,imx8qm-gpio - fsl,imx8qxp-gpio - fsl,imxrt1050-gpio - fsl,imxrt1170-gpio diff --git a/Documentation/devicetree/bindings/gpio/gpio-pca95xx.yaml b/Documentation/devicetree/bindings/gpio/gpio-pca95xx.yaml index fa116148ee90..99febb8ea1b6 100644 --- a/Documentation/devicetree/bindings/gpio/gpio-pca95xx.yaml +++ b/Documentation/devicetree/bindings/gpio/gpio-pca95xx.yaml @@ -66,6 +66,7 @@ properties: - ti,tca6408 - ti,tca6416 - ti,tca6424 + - ti,tca9538 - ti,tca9539 - ti,tca9554 diff --git a/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml b/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml index b391cc1b4590..209f03bba0a7 100644 --- a/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml @@ -61,6 +61,10 @@ patternProperties: '#gpio-cells': const: 2 + gpio-line-names: + minItems: 1 + maxItems: 32 + ngpios: default: 32 minimum: 1 diff --git a/Documentation/devicetree/bindings/gpio/st,stmpe-gpio.yaml b/Documentation/devicetree/bindings/gpio/st,stmpe-gpio.yaml index 22c0cae73425..4555f1644a4d 100644 --- a/Documentation/devicetree/bindings/gpio/st,stmpe-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/st,stmpe-gpio.yaml @@ -28,6 +28,10 @@ properties: gpio-controller: true + gpio-line-names: + minItems: 1 + maxItems: 24 + interrupt-controller: true st,norequest-mask: diff --git a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml 
b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml index e84e4f33b358..3d06db98e978 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml +++ b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml @@ -35,6 +35,7 @@ properties: - amlogic,meson-sm1-gpio-intc - amlogic,meson-a1-gpio-intc - amlogic,meson-s4-gpio-intc + - amlogic,c3-gpio-intc - const: amlogic,meson-gpio-intc reg: diff --git a/Documentation/devicetree/bindings/mmc/arasan,sdhci.yaml b/Documentation/devicetree/bindings/mmc/arasan,sdhci.yaml index a6c19a6cc99e..3e99801f77d2 100644 --- a/Documentation/devicetree/bindings/mmc/arasan,sdhci.yaml +++ b/Documentation/devicetree/bindings/mmc/arasan,sdhci.yaml @@ -160,6 +160,12 @@ properties: description: The MIO bank number in which the command and data lines are configured. + iommus: + maxItems: 1 + + power-domains: + maxItems: 1 + dependencies: '#clock-cells': [ clock-output-names ] diff --git a/Documentation/devicetree/bindings/mmc/mmc-controller.yaml b/Documentation/devicetree/bindings/mmc/mmc-controller.yaml index 86c73fd825fd..58ae298cd2fc 100644 --- a/Documentation/devicetree/bindings/mmc/mmc-controller.yaml +++ b/Documentation/devicetree/bindings/mmc/mmc-controller.yaml @@ -269,7 +269,7 @@ properties: post-power-on-delay-ms: description: It was invented for MMC pwrseq-simple which could be referred to - mmc-pwrseq-simple.txt. But now it\'s reused as a tunable delay + mmc-pwrseq-simple.yaml. But now it\'s reused as a tunable delay waiting for I/O signalling and card power supply to be stable, regardless of whether pwrseq-simple is used. Default to 10ms if no available. 
diff --git a/Documentation/devicetree/bindings/mmc/mtk-sd.yaml b/Documentation/devicetree/bindings/mmc/mtk-sd.yaml index 46eefdd19a2c..3fffa467e4e1 100644 --- a/Documentation/devicetree/bindings/mmc/mtk-sd.yaml +++ b/Documentation/devicetree/bindings/mmc/mtk-sd.yaml @@ -91,16 +91,6 @@ properties: should switch dat1 pin to GPIO mode. maxItems: 1 - assigned-clocks: - description: - PLL of the source clock. - maxItems: 1 - - assigned-clock-parents: - description: - parent of source clock, used for HS400 mode to get 400Mhz source clock. - maxItems: 1 - hs400-ds-delay: $ref: /schemas/types.yaml#/definitions/uint32 description: diff --git a/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt b/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt index 69edfd4d3922..a9fb0a91245f 100644 --- a/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt +++ b/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt @@ -5,11 +5,13 @@ Documentation/devicetree/bindings/mmc/mmc.txt and the properties used by the sdhci-of-at91 driver. Required properties: -- compatible: Must be "atmel,sama5d2-sdhci" or "microchip,sam9x60-sdhci". +- compatible: Must be "atmel,sama5d2-sdhci" or "microchip,sam9x60-sdhci" + or "microchip,sam9x7-sdhci", "microchip,sam9x60-sdhci". - clocks: Phandlers to the clocks. - clock-names: Must be "hclock", "multclk", "baseclk" for "atmel,sama5d2-sdhci". Must be "hclock", "multclk" for "microchip,sam9x60-sdhci". + Must be "hclock", "multclk" for "microchip,sam9x7-sdhci". Optional properties: - assigned-clocks: The same with "multclk". 
diff --git a/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml b/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml index 56cbb42b5aea..eba2f3026ab0 100644 --- a/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml +++ b/Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml @@ -19,12 +19,14 @@ properties: - qcom,qca2066-bt - qcom,qca6174-bt - qcom,qca9377-bt + - qcom,wcn3988-bt - qcom,wcn3990-bt - qcom,wcn3991-bt - qcom,wcn3998-bt - qcom,qca6390-bt - qcom,wcn6750-bt - qcom,wcn6855-bt + - qcom,wcn7850-bt enable-gpios: maxItems: 1 @@ -57,6 +59,9 @@ properties: vddaon-supply: description: VDD_AON supply regulator handle + vdddig-supply: + description: VDD_DIG supply regulator handle + vddbtcxmx-supply: description: VDD_BT_CXMX supply regulator handle @@ -72,6 +77,9 @@ properties: vddrfa1p2-supply: description: VDD_RFA_1P2 supply regulator handle + vddrfa1p9-supply: + description: VDD_RFA_1P9 supply regulator handle + vddrfa2p2-supply: description: VDD_RFA_2P2 supply regulator handle @@ -111,6 +119,7 @@ allOf: compatible: contains: enum: + - qcom,wcn3988-bt - qcom,wcn3990-bt - qcom,wcn3991-bt - qcom,wcn3998-bt @@ -155,6 +164,22 @@ allOf: - vddrfa0p8-supply - vddrfa1p2-supply - vddrfa1p7-supply + - if: + properties: + compatible: + contains: + enum: + - qcom,wcn7850-bt + then: + required: + - enable-gpios + - swctrl-gpios + - vddio-supply + - vddaon-supply + - vdddig-supply + - vddrfa0p8-supply + - vddrfa1p2-supply + - vddrfa1p9-supply examples: - | diff --git a/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml b/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml new file mode 100644 index 000000000000..aa3162c74833 --- /dev/null +++ b/Documentation/devicetree/bindings/net/brcm,asp-v2.0.yaml @@ -0,0 +1,155 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/brcm,asp-v2.0.yaml# +$schema: 
http://devicetree.org/meta-schemas/core.yaml# + +title: Broadcom ASP 2.0 Ethernet controller + +maintainers: + - Justin Chen <justin.chen@broadcom.com> + - Florian Fainelli <florian.fainelli@broadcom.com> + +description: Broadcom Ethernet controller first introduced with 72165 + +properties: + compatible: + oneOf: + - items: + - enum: + - brcm,bcm74165-asp + - const: brcm,asp-v2.1 + - items: + - enum: + - brcm,bcm72165-asp + - const: brcm,asp-v2.0 + + "#address-cells": + const: 1 + "#size-cells": + const: 1 + + reg: + maxItems: 1 + + ranges: true + + interrupts: + minItems: 1 + items: + - description: RX/TX interrupt + - description: Port 0 Wake-on-LAN + - description: Port 1 Wake-on-LAN + + clocks: + maxItems: 1 + + ethernet-ports: + type: object + properties: + "#address-cells": + const: 1 + "#size-cells": + const: 0 + + patternProperties: + "^port@[0-9]+$": + type: object + + $ref: ethernet-controller.yaml# + + unevaluatedProperties: false + + properties: + reg: + maxItems: 1 + description: Port number + + brcm,channel: + $ref: /schemas/types.yaml#/definitions/uint32 + description: | + ASP Channel Number + + The depacketizer channel that consumes packets from + the unimac/port. 
+ + required: + - reg + - brcm,channel + + additionalProperties: false + +patternProperties: + "^mdio@[0-9a-f]+$": + type: object + $ref: brcm,unimac-mdio.yaml + + description: + ASP internal UniMAC MDIO bus + +required: + - compatible + - reg + - interrupts + - clocks + - ranges + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + ethernet@9c00000 { + compatible = "brcm,bcm72165-asp", "brcm,asp-v2.0"; + reg = <0x9c00000 0x1fff14>; + interrupts = <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>; + ranges = <0x0 0x9c00000 0x1fff14>; + clocks = <&scmi 14>; + #address-cells = <1>; + #size-cells = <1>; + + mdio@c614 { + compatible = "brcm,asp-v2.0-mdio"; + reg = <0xc614 0x8>; + reg-names = "mdio"; + #address-cells = <1>; + #size-cells = <0>; + + phy0: ethernet-phy@1 { + reg = <1>; + }; + }; + + mdio@ce14 { + compatible = "brcm,asp-v2.0-mdio"; + reg = <0xce14 0x8>; + reg-names = "mdio"; + #address-cells = <1>; + #size-cells = <0>; + + phy1: ethernet-phy@1 { + reg = <1>; + }; + }; + + ethernet-ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + brcm,channel = <8>; + phy-mode = "rgmii"; + phy-handle = <&phy0>; + }; + + port@1 { + reg = <1>; + brcm,channel = <9>; + phy-mode = "rgmii"; + phy-handle = <&phy1>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml index 0be426ee1e44..6684810fcbf0 100644 --- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml +++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml @@ -22,6 +22,8 @@ properties: - brcm,genet-mdio-v3 - brcm,genet-mdio-v4 - brcm,genet-mdio-v5 + - brcm,asp-v2.0-mdio + - brcm,asp-v2.1-mdio - brcm,unimac-mdio reg: diff --git a/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml b/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml index 
9c494957a07a..e42ea28d6ab4 100644 --- a/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml +++ b/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml @@ -21,6 +21,7 @@ properties: - const: allwinner,sun4i-a10-can - const: allwinner,sun4i-a10-can - const: allwinner,sun8i-r40-can + - const: allwinner,sun20i-d1-can reg: maxItems: 1 @@ -37,8 +38,9 @@ properties: if: properties: compatible: - contains: - const: allwinner,sun8i-r40-can + enum: + - allwinner,sun8i-r40-can + - allwinner,sun20i-d1-can then: required: diff --git a/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml b/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml index 67879aab623b..bb518c831f7b 100644 --- a/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml +++ b/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml @@ -122,8 +122,6 @@ required: - compatible - reg - reg-names - - interrupts - - interrupt-names - clocks - clock-names - bosch,mram-cfg @@ -132,6 +130,7 @@ additionalProperties: false examples: - | + // Example with interrupts #include <dt-bindings/clock/imx6sx-clock.h> can@20e8000 { compatible = "bosch,m_can"; @@ -149,4 +148,21 @@ examples: }; }; + - | + // Example with timer polling + #include <dt-bindings/clock/imx6sx-clock.h> + can@20e8000 { + compatible = "bosch,m_can"; + reg = <0x020e8000 0x4000>, <0x02298000 0x4000>; + reg-names = "m_can", "message_ram"; + clocks = <&clks IMX6SX_CLK_CANFD>, + <&clks IMX6SX_CLK_CANFD>; + clock-names = "hclk", "cclk"; + bosch,mram-cfg = <0x0 0 0 32 0 0 0 1>; + + can-transceiver { + max-bitrate = <5000000>; + }; + }; + ... 
diff --git a/Documentation/devicetree/bindings/net/can/tcan4x5x.txt b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt index e3501bfa22e9..170e23f0610d 100644 --- a/Documentation/devicetree/bindings/net/can/tcan4x5x.txt +++ b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt @@ -4,7 +4,10 @@ Texas Instruments TCAN4x5x CAN Controller This file provides device node information for the TCAN4x5x interface contains. Required properties: - - compatible: "ti,tcan4x5x" + - compatible: + "ti,tcan4552", "ti,tcan4x5x" + "ti,tcan4553", "ti,tcan4x5x" or + "ti,tcan4x5x" - reg: 0 - #address-cells: 1 - #size-cells: 0 @@ -21,8 +24,10 @@ Optional properties: - reset-gpios: Hardwired output GPIO. If not defined then software reset. - device-state-gpios: Input GPIO that indicates if the device is in - a sleep state or if the device is active. - - device-wake-gpios: Wake up GPIO to wake up the TCAN device. + a sleep state or if the device is active. Not + available with tcan4552/4553. + - device-wake-gpios: Wake up GPIO to wake up the TCAN device. Not + available with tcan4552/4553. 
Example: tcan4x5x: tcan4x5x@0 { diff --git a/Documentation/devicetree/bindings/net/can/xilinx,can.yaml b/Documentation/devicetree/bindings/net/can/xilinx,can.yaml index 897d2cbda45b..64d57c343e6f 100644 --- a/Documentation/devicetree/bindings/net/can/xilinx,can.yaml +++ b/Documentation/devicetree/bindings/net/can/xilinx,can.yaml @@ -46,6 +46,9 @@ properties: $ref: /schemas/types.yaml#/definitions/uint32 description: CAN Tx mailbox buffer count (CAN FD) + resets: + maxItems: 1 + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.yaml b/Documentation/devicetree/bindings/net/dsa/dsa.yaml index 8d971813bab6..ec74a660beda 100644 --- a/Documentation/devicetree/bindings/net/dsa/dsa.yaml +++ b/Documentation/devicetree/bindings/net/dsa/dsa.yaml @@ -36,7 +36,7 @@ additionalProperties: true $defs: ethernet-ports: description: A DSA switch without any extra port properties - $ref: '#/' + $ref: '#' patternProperties: "^(ethernet-)?ports$": diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt index 33726134f5c9..6ec0c181b6db 100644 --- a/Documentation/devicetree/bindings/net/dsa/marvell.txt +++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt @@ -20,7 +20,7 @@ which is at a different MDIO base address in different switch families. 6171, 6172, 6175, 6176, 6185, 6240, 6320, 6321, 6341, 6350, 6351, 6352 - "marvell,mv88e6190" : Switch has base address 0x00. Use with models: - 6163, 6190, 6190X, 6191, 6290, 6390, 6390X + 6190, 6190X, 6191, 6290, 6361, 6390, 6390X - "marvell,mv88e6250" : Switch has base address 0x08 or 0x18. 
Use with model: 6220, 6250 diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml b/Documentation/devicetree/bindings/net/ethernet-controller.yaml index 6b0d359367da..9f6a5ccbcefe 100644 --- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml +++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml @@ -66,6 +66,7 @@ properties: - mii - gmii - sgmii + - psgmii - qsgmii - qusgmii - tbi diff --git a/Documentation/devicetree/bindings/net/mediatek,net.yaml b/Documentation/devicetree/bindings/net/mediatek,net.yaml index 31cc0c412805..e74502a0afe8 100644 --- a/Documentation/devicetree/bindings/net/mediatek,net.yaml +++ b/Documentation/devicetree/bindings/net/mediatek,net.yaml @@ -19,10 +19,12 @@ properties: enum: - mediatek,mt2701-eth - mediatek,mt7623-eth + - mediatek,mt7621-eth - mediatek,mt7622-eth - mediatek,mt7629-eth - mediatek,mt7981-eth - mediatek,mt7986-eth + - mediatek,mt7988-eth - ralink,rt5350-eth reg: @@ -32,7 +34,7 @@ properties: clock-names: true interrupts: - minItems: 3 + minItems: 1 maxItems: 4 power-domains: @@ -60,6 +62,12 @@ properties: Phandle to the mediatek hifsys controller used to provide various clocks and reset to the system. + mediatek,infracfg: + $ref: /schemas/types.yaml#/definitions/phandle + description: + Phandle to the syscon node that handles the path from GMAC to + PHY variants. 
+ mediatek,sgmiisys: $ref: /schemas/types.yaml#/definitions/phandle-array minItems: 1 @@ -121,6 +129,8 @@ allOf: - const: gp1 - const: gp2 + mediatek,infracfg: false + mediatek,pctl: $ref: /schemas/types.yaml#/definitions/phandle description: @@ -135,6 +145,32 @@ allOf: properties: compatible: contains: + enum: + - mediatek,mt7621-eth + then: + properties: + interrupts: + maxItems: 1 + + clocks: + minItems: 2 + maxItems: 2 + + clock-names: + items: + - const: ethif + - const: fe + + mediatek,infracfg: false + + mediatek,wed: false + + mediatek,wed-pcie: false + + - if: + properties: + compatible: + contains: const: mediatek,mt7622-eth then: properties: @@ -159,6 +195,8 @@ allOf: - const: sgmii_ck - const: eth2pll + mediatek,infracfg: false + mediatek,sgmiisys: minItems: 1 maxItems: 1 @@ -204,12 +242,6 @@ allOf: - const: sgmii_ck - const: eth2pll - mediatek,infracfg: - $ref: /schemas/types.yaml#/definitions/phandle - description: - Phandle to the syscon node that handles the path from GMAC to - PHY variants. 
- mediatek,sgmiisys: minItems: 2 maxItems: 2 @@ -250,6 +282,8 @@ allOf: - const: netsys0 - const: netsys1 + mediatek,infracfg: false + mediatek,sgmiisys: minItems: 2 maxItems: 2 @@ -286,6 +320,67 @@ allOf: - const: netsys0 - const: netsys1 + mediatek,infracfg: false + + mediatek,sgmiisys: + minItems: 2 + maxItems: 2 + + - if: + properties: + compatible: + contains: + const: mediatek,mt7988-eth + then: + properties: + interrupts: + minItems: 4 + + clocks: + minItems: 34 + maxItems: 34 + + clock-names: + items: + - const: crypto + - const: fe + - const: gp2 + - const: gp1 + - const: gp3 + - const: ethwarp_wocpu2 + - const: ethwarp_wocpu1 + - const: ethwarp_wocpu0 + - const: esw + - const: netsys0 + - const: netsys1 + - const: sgmii_tx250m + - const: sgmii_rx250m + - const: sgmii2_tx250m + - const: sgmii2_rx250m + - const: top_usxgmii0_sel + - const: top_usxgmii1_sel + - const: top_sgm0_sel + - const: top_sgm1_sel + - const: top_xfi_phy0_xtal_sel + - const: top_xfi_phy1_xtal_sel + - const: top_eth_gmii_sel + - const: top_eth_refck_50m_sel + - const: top_eth_sys_200m_sel + - const: top_eth_sys_sel + - const: top_eth_xgmii_sel + - const: top_eth_mii_sel + - const: top_netsys_sel + - const: top_netsys_500m_sel + - const: top_netsys_pao_2x_sel + - const: top_netsys_sync_250m_sel + - const: top_netsys_ppefb_250m_sel + - const: top_netsys_warp_sel + - const: wocpu1 + - const: wocpu0 + - const: xgp1 + - const: xgp2 + - const: xgp3 + mediatek,sgmiisys: minItems: 2 maxItems: 2 diff --git a/Documentation/devicetree/bindings/net/motorcomm,yt8xxx.yaml b/Documentation/devicetree/bindings/net/motorcomm,yt8xxx.yaml index 157e3bbcaf6f..26688e2302ea 100644 --- a/Documentation/devicetree/bindings/net/motorcomm,yt8xxx.yaml +++ b/Documentation/devicetree/bindings/net/motorcomm,yt8xxx.yaml @@ -52,6 +52,40 @@ properties: for a timer. type: boolean + motorcomm,rx-clk-drv-microamp: + description: | + drive strength of rx_clk rgmii pad. 
+ The YT8531 RGMII LDO voltage supports 1.8V/3.3V, and the LDO voltage can + be configured with hardware pull-up resistors to match the SOC voltage + (usually 1.8V). + The software can read the registers to obtain the LDO voltage and configure + the legal drive strength(curren). + ===================================================== + | voltage | current Available (uA) | + | 1.8v | 1200 2100 2700 2910 3110 3600 3970 4350 | + | 3.3v | 3070 4080 4370 4680 5020 5450 5740 6140 | + ===================================================== + enum: [ 1200, 2100, 2700, 2910, 3070, 3110, 3600, 3970, + 4080, 4350, 4370, 4680, 5020, 5450, 5740, 6140 ] + default: 2910 + + motorcomm,rx-data-drv-microamp: + description: | + drive strength of rx_data/rx_ctl rgmii pad. + The YT8531 RGMII LDO voltage supports 1.8V/3.3V, and the LDO voltage can + be configured with hardware pull-up resistors to match the SOC voltage + (usually 1.8V). + The software can read the registers to obtain the LDO voltage and configure + the legal drive strength(curren). + ===================================================== + | voltage | current Available (uA) | + | 1.8v | 1200 2100 2700 2910 3110 3600 3970 4350 | + | 3.3v | 3070 4080 4370 4680 5020 5450 5740 6140 | + ===================================================== + enum: [ 1200, 2100, 2700, 2910, 3070, 3110, 3600, 3970, + 4080, 4350, 4370, 4680, 5020, 5450, 5740, 6140 ] + default: 2910 + motorcomm,tx-clk-adj-enabled: description: | This configuration is mainly to adapt to VF2 with JH7110 SoC. 
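The drive-strength table added to `motorcomm,yt8xxx.yaml` ties the legal `microamp` values to the RGMII LDO voltage. A minimal sketch of that lookup follows, assuming the table exactly as quoted in the binding. The names `DRIVE_STRENGTH_UA` and `is_legal_drive_strength` are hypothetical and do not reflect the driver's actual code.

```python
# Illustrative only: values copied from the table in the binding description.
# Keys are LDO voltages in millivolts, values are legal drive strengths in uA.
DRIVE_STRENGTH_UA = {
    1800: (1200, 2100, 2700, 2910, 3110, 3600, 3970, 4350),
    3300: (3070, 4080, 4370, 4680, 5020, 5450, 5740, 6140),
}

# Default stated in the binding for both rx-clk and rx-data properties.
DEFAULT_UA = 2910


def is_legal_drive_strength(ldo_mv, microamp):
    """Return True if microamp appears in the table row for this LDO voltage."""
    return microamp in DRIVE_STRENGTH_UA.get(ldo_mv, ())


def schema_enum():
    """Union of both rows, sorted; this is what the schema's enum lists."""
    return sorted(set(DRIVE_STRENGTH_UA[1800]) | set(DRIVE_STRENGTH_UA[3300]))
```

Note that the schema `enum` is the union of both rows, so schema validation alone cannot reject a value that is legal for one voltage but not the other. Going by the table, the 2910 default appears only in the 1.8V row.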
diff --git a/Documentation/devicetree/bindings/net/oxnas-dwmac.txt b/Documentation/devicetree/bindings/net/oxnas-dwmac.txt deleted file mode 100644 index 27db496f1ce8..000000000000 --- a/Documentation/devicetree/bindings/net/oxnas-dwmac.txt +++ /dev/null @@ -1,41 +0,0 @@ -* Oxford Semiconductor OXNAS DWMAC Ethernet controller - -The device inherits all the properties of the dwmac/stmmac devices -described in the file stmmac.txt in the current directory with the -following changes. - -Required properties on all platforms: - -- compatible: For the OX820 SoC, it should be : - - "oxsemi,ox820-dwmac" to select glue - - "snps,dwmac-3.512" to select IP version. - For the OX810SE SoC, it should be : - - "oxsemi,ox810se-dwmac" to select glue - - "snps,dwmac-3.512" to select IP version. - -- clocks: Should contain phandles to the following clocks -- clock-names: Should contain the following: - - "stmmaceth" for the host clock - see stmmac.txt - - "gmac" for the peripheral gate clock - -- oxsemi,sys-ctrl: a phandle to the system controller syscon node - -Example : - -etha: ethernet@40400000 { - compatible = "oxsemi,ox820-dwmac", "snps,dwmac-3.512"; - reg = <0x40400000 0x2000>; - interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 17 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "macirq", "eth_wake_irq"; - mac-address = [000000000000]; /* Filled in by U-Boot */ - phy-mode = "rgmii"; - - clocks = <&stdclk CLK_820_ETHA>, <&gmacclk>; - clock-names = "gmac", "stmmaceth"; - resets = <&reset RESET_MAC>; - - /* Regmap for sys registers */ - oxsemi,sys-ctrl = <&sys>; - -}; diff --git a/Documentation/devicetree/bindings/net/qca,ar803x.yaml b/Documentation/devicetree/bindings/net/qca,ar803x.yaml index 161d28919316..3acd09f0da86 100644 --- a/Documentation/devicetree/bindings/net/qca,ar803x.yaml +++ b/Documentation/devicetree/bindings/net/qca,ar803x.yaml @@ -75,6 +75,7 @@ properties: description: Initial data for the VDDIO regulator. Set this to 1.5V or 1.8V. 
$ref: /schemas/regulator/regulator.yaml + unevaluatedProperties: false vddh-regulator: type: object @@ -82,6 +83,7 @@ properties: Dummy subnode to model the external connection of the PHY VDDH regulator to VDDIO. $ref: /schemas/regulator/regulator.yaml + unevaluatedProperties: false unevaluatedProperties: false diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml index 7f324c6da915..70bbc4220e2a 100644 --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml @@ -80,6 +80,7 @@ properties: "output" means GMAC provides the reference clock. $ref: /schemas/types.yaml#/definitions/string enum: [input, output] + default: input rockchip,grf: description: The phandle of the syscon node for the general register file. diff --git a/Documentation/devicetree/bindings/net/ti,icss-iep.yaml b/Documentation/devicetree/bindings/net/ti,icss-iep.yaml new file mode 100644 index 000000000000..f5c22d6dcaee --- /dev/null +++ b/Documentation/devicetree/bindings/net/ti,icss-iep.yaml @@ -0,0 +1,45 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/ti,icss-iep.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Texas Instruments ICSS Industrial Ethernet Peripheral (IEP) module + +maintainers: + - Md Danish Anwar <danishanwar@ti.com> + +properties: + compatible: + oneOf: + - items: + - enum: + - ti,am642-icss-iep + - ti,j721e-icss-iep + - const: ti,am654-icss-iep + + - const: ti,am654-icss-iep + + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + description: phandle to the IEP source clock + +required: + - compatible + - reg + - clocks + +additionalProperties: false + +examples: + - | + /* AM65x */ + icssg0_iep0: iep@2e000 { + compatible = "ti,am654-icss-iep"; + reg = <0x2e000 0x1000>; + clocks = <&icssg0_iepclk_mux>; + }; diff --git 
a/Documentation/devicetree/bindings/net/ti,icssg-prueth.yaml b/Documentation/devicetree/bindings/net/ti,icssg-prueth.yaml new file mode 100644 index 000000000000..311c570165f9 --- /dev/null +++ b/Documentation/devicetree/bindings/net/ti,icssg-prueth.yaml @@ -0,0 +1,193 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/ti,icssg-prueth.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Texas Instruments ICSSG PRUSS Ethernet + +maintainers: + - Md Danish Anwar <danishanwar@ti.com> + +description: + Ethernet based on the Programmable Real-Time Unit and Industrial + Communication Subsystem. + +allOf: + - $ref: /schemas/remoteproc/ti,pru-consumer.yaml# + +properties: + compatible: + enum: + - ti,am654-icssg-prueth # for AM65x SoC family + + sram: + $ref: /schemas/types.yaml#/definitions/phandle + description: + phandle to MSMC SRAM node + + dmas: + maxItems: 10 + + dma-names: + items: + - const: tx0-0 + - const: tx0-1 + - const: tx0-2 + - const: tx0-3 + - const: tx1-0 + - const: tx1-1 + - const: tx1-2 + - const: tx1-3 + - const: rx0 + - const: rx1 + + ti,mii-g-rt: + $ref: /schemas/types.yaml#/definitions/phandle + description: + phandle to MII_G_RT module's syscon regmap. + + ti,mii-rt: + $ref: /schemas/types.yaml#/definitions/phandle + description: + phandle to MII_RT module's syscon regmap + + ti,iep: + $ref: /schemas/types.yaml#/definitions/phandle-array + maxItems: 2 + items: + maxItems: 1 + description: + phandle to IEP (Industrial Ethernet Peripheral) for ICSSG + + interrupts: + maxItems: 2 + description: + Interrupt specifiers to TX timestamp IRQ. 
+ + interrupt-names: + items: + - const: tx_ts0 + - const: tx_ts1 + + ethernet-ports: + type: object + additionalProperties: false + + properties: + '#address-cells': + const: 1 + '#size-cells': + const: 0 + + patternProperties: + ^port@[0-1]$: + type: object + description: ICSSG PRUETH external ports + $ref: ethernet-controller.yaml# + unevaluatedProperties: false + + properties: + reg: + items: + - enum: [0, 1] + description: ICSSG PRUETH port number + + interrupts: + maxItems: 1 + + ti,syscon-rgmii-delay: + items: + - items: + - description: phandle to system controller node + - description: The offset to ICSSG control register + $ref: /schemas/types.yaml#/definitions/phandle-array + description: + phandle to system controller node and register offset + to ICSSG control register for RGMII transmit delay + + required: + - reg + anyOf: + - required: + - port@0 + - required: + - port@1 + +required: + - compatible + - sram + - dmas + - dma-names + - ethernet-ports + - ti,mii-g-rt + - interrupts + - interrupt-names + +unevaluatedProperties: false + +examples: + - | + /* Example k3-am654 base board SR2.0, dual-emac */ + pruss2_eth: ethernet { + compatible = "ti,am654-icssg-prueth"; + pinctrl-names = "default"; + pinctrl-0 = <&icssg2_rgmii_pins_default>; + sram = <&msmc_ram>; + + ti,prus = <&pru2_0>, <&rtu2_0>, <&tx_pru2_0>, + <&pru2_1>, <&rtu2_1>, <&tx_pru2_1>; + firmware-name = "ti-pruss/am65x-pru0-prueth-fw.elf", + "ti-pruss/am65x-rtu0-prueth-fw.elf", + "ti-pruss/am65x-txpru0-prueth-fw.elf", + "ti-pruss/am65x-pru1-prueth-fw.elf", + "ti-pruss/am65x-rtu1-prueth-fw.elf", + "ti-pruss/am65x-txpru1-prueth-fw.elf"; + ti,pruss-gp-mux-sel = <2>, /* MII mode */ + <2>, + <2>, + <2>, /* MII mode */ + <2>, + <2>; + dmas = <&main_udmap 0xc300>, /* egress slice 0 */ + <&main_udmap 0xc301>, /* egress slice 0 */ + <&main_udmap 0xc302>, /* egress slice 0 */ + <&main_udmap 0xc303>, /* egress slice 0 */ + <&main_udmap 0xc304>, /* egress slice 1 */ + <&main_udmap 0xc305>, /* egress 
slice 1 */ + <&main_udmap 0xc306>, /* egress slice 1 */ + <&main_udmap 0xc307>, /* egress slice 1 */ + <&main_udmap 0x4300>, /* ingress slice 0 */ + <&main_udmap 0x4301>; /* ingress slice 1 */ + dma-names = "tx0-0", "tx0-1", "tx0-2", "tx0-3", + "tx1-0", "tx1-1", "tx1-2", "tx1-3", + "rx0", "rx1"; + ti,mii-g-rt = <&icssg2_mii_g_rt>; + ti,iep = <&icssg2_iep0>, <&icssg2_iep1>; + interrupt-parent = <&icssg2_intc>; + interrupts = <24 0 2>, <25 1 3>; + interrupt-names = "tx_ts0", "tx_ts1"; + ethernet-ports { + #address-cells = <1>; + #size-cells = <0>; + pruss2_emac0: port@0 { + reg = <0>; + phy-handle = <&pruss2_eth0_phy>; + phy-mode = "rgmii-id"; + interrupts-extended = <&icssg2_intc 24>; + ti,syscon-rgmii-delay = <&scm_conf 0x4120>; + /* Filled in by bootloader */ + local-mac-address = [00 00 00 00 00 00]; + }; + + pruss2_emac1: port@1 { + reg = <1>; + phy-handle = <&pruss2_eth1_phy>; + phy-mode = "rgmii-id"; + interrupts-extended = <&icssg2_intc 25>; + ti,syscon-rgmii-delay = <&scm_conf 0x4124>; + /* Filled in by bootloader */ + local-mac-address = [00 00 00 00 00 00]; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml index 67b63f119f64..252207adbc54 100644 --- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml +++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml @@ -28,6 +28,7 @@ properties: - mediatek,mt76 - mediatek,mt7628-wmac - mediatek,mt7622-wmac + - mediatek,mt7981-wmac - mediatek,mt7986-wmac reg: @@ -71,6 +72,14 @@ properties: ieee80211-freq-limit: true + nvmem-cells: + items: + - description: NVMEM cell with EEPROM + + nvmem-cell-names: + items: + - const: eeprom + mediatek,eeprom-data: $ref: /schemas/types.yaml#/definitions/uint32-array description: @@ -84,6 +93,7 @@ properties: - description: offset containing EEPROM data description: Phandle to a MTD partition + offset containing EEPROM data + deprecated: true 
big-endian: $ref: /schemas/types.yaml#/definitions/flag @@ -258,7 +268,8 @@ examples: interrupt-parent = <&cpuintc>; interrupts = <6>; - mediatek,mtd-eeprom = <&factory 0x0>; + nvmem-cells = <&eeprom>; + nvmem-cell-names = "eeprom"; }; - | diff --git a/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt b/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt deleted file mode 100644 index 038dda48b8e6..000000000000 --- a/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt +++ /dev/null @@ -1,35 +0,0 @@ -XILINX GMIITORGMII Converter Driver Device Tree Bindings --------------------------------------------------------- - -The Gigabit Media Independent Interface (GMII) to Reduced Gigabit Media -Independent Interface (RGMII) core provides the RGMII between RGMII-compliant -Ethernet physical media devices (PHY) and the Gigabit Ethernet controller. -This core can be used in all three modes of operation(10/100/1000 Mb/s). -The Management Data Input/Output (MDIO) interface is used to configure the -Speed of operation. This core can switch dynamically between the three -Different speed modes by configuring the conveter register through mdio write. - -This converter sits between the ethernet MAC and the external phy. -MAC <==> GMII2RGMII <==> RGMII_PHY - -For more details about mdio please refer phy.txt file in the same directory. - -Required properties: -- compatible : Should be "xlnx,gmii-to-rgmii-1.0" -- reg : The ID number for the phy, usually a small integer -- phy-handle : Should point to the external phy device. - See ethernet.txt file in the same directory. - -Example: - mdio { - #address-cells = <1>; - #size-cells = <0>; - phy: ethernet-phy@0 { - ...... 
- }; - gmiitorgmii: gmiitorgmii@8 { - compatible = "xlnx,gmii-to-rgmii-1.0"; - reg = <8>; - phy-handle = <&phy>; - }; - }; diff --git a/Documentation/devicetree/bindings/net/xlnx,gmii-to-rgmii.yaml b/Documentation/devicetree/bindings/net/xlnx,gmii-to-rgmii.yaml new file mode 100644 index 000000000000..0f781dac6717 --- /dev/null +++ b/Documentation/devicetree/bindings/net/xlnx,gmii-to-rgmii.yaml @@ -0,0 +1,55 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/xlnx,gmii-to-rgmii.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Xilinx GMII to RGMII Converter + +maintainers: + - Harini Katakam <harini.katakam@amd.com> + +description: + The Gigabit Media Independent Interface (GMII) to Reduced Gigabit Media + Independent Interface (RGMII) core provides the RGMII between RGMII-compliant + ethernet physical media devices (PHY) and the Gigabit Ethernet controller. + This core can be used in all three modes of operation(10/100/1000 Mb/s). + The Management Data Input/Output (MDIO) interface is used to configure the + speed of operation. This core can switch dynamically between the three + different speed modes by configuring the converter register through mdio write. + The core cannot function without an external phy connected to it. + +properties: + compatible: + const: xlnx,gmii-to-rgmii-1.0 + + reg: + minimum: 0 + maximum: 31 + description: The ID number for the phy. 
+ + phy-handle: + $ref: ethernet-controller.yaml#/properties/phy-handle + +required: + - compatible + - reg + - phy-handle + +unevaluatedProperties: false + +examples: + - | + mdio { + #address-cells = <1>; + #size-cells = <0>; + + phy: ethernet-phy@0 { + reg = <0>; + }; + gmiitorgmii@8 { + compatible = "xlnx,gmii-to-rgmii-1.0"; + reg = <8>; + phy-handle = <&phy>; + }; + }; diff --git a/Documentation/devicetree/bindings/opp/operating-points-v2-ti-cpu.yaml b/Documentation/devicetree/bindings/opp/operating-points-v2-ti-cpu.yaml new file mode 100644 index 000000000000..02d1d2c17129 --- /dev/null +++ b/Documentation/devicetree/bindings/opp/operating-points-v2-ti-cpu.yaml @@ -0,0 +1,92 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/opp/operating-points-v2-ti-cpu.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: TI CPU OPP (Operating Performance Points) + +description: + TI SoCs, like those in the AM335x, AM437x, AM57xx, AM62x, and DRA7xx + families, the CPU frequencies subset and the voltage value of each + OPP vary based on the silicon variant used. The data sheet sections + corresponding to "Operating Performance Points" describe the frequency + and voltage values based on device type and speed bin information + blown in corresponding eFuse bits as referred to by the Technical + Reference Manual. + + This document extends the operating-points-v2 binding by providing + the hardware description for the scheme mentioned above. + +maintainers: + - Nishanth Menon <nm@ti.com> + +allOf: + - $ref: opp-v2-base.yaml# + +properties: + compatible: + const: operating-points-v2-ti-cpu + + syscon: + $ref: /schemas/types.yaml#/definitions/phandle + description: | + points to syscon node representing the control module + register space of the SoC. 
+ + opp-shared: true + +patternProperties: + '^opp(-?[0-9]+)*$': + type: object + additionalProperties: false + + properties: + clock-latency-ns: true + opp-hz: true + opp-microvolt: true + opp-supported-hw: true + opp-suspend: true + turbo-mode: true + + required: + - opp-hz + - opp-supported-hw + +required: + - compatible + - syscon + +additionalProperties: false + +examples: + - | + opp-table { + compatible = "operating-points-v2-ti-cpu"; + syscon = <&scm_conf>; + + opp-300000000 { + opp-hz = /bits/ 64 <300000000>; + opp-microvolt = <1100000 1078000 1122000>; + opp-supported-hw = <0x06 0x0020>; + opp-suspend; + }; + + opp-500000000 { + opp-hz = /bits/ 64 <500000000>; + opp-microvolt = <1100000 1078000 1122000>; + opp-supported-hw = <0x01 0xFFFF>; + }; + + opp-600000000 { + opp-hz = /bits/ 64 <600000000>; + opp-microvolt = <1100000 1078000 1122000>; + opp-supported-hw = <0x06 0x0040>; + }; + + opp-1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt = <1325000 1298500 1351500>; + opp-supported-hw = <0x04 0x0200>; + }; + }; diff --git a/Documentation/devicetree/bindings/opp/opp-v2-base.yaml b/Documentation/devicetree/bindings/opp/opp-v2-base.yaml index 47e6f36b7637..e2f8f7af3cf4 100644 --- a/Documentation/devicetree/bindings/opp/opp-v2-base.yaml +++ b/Documentation/devicetree/bindings/opp/opp-v2-base.yaml @@ -56,7 +56,7 @@ patternProperties: need to be configured and that is left for the implementation specific binding. 
minItems: 1 - maxItems: 16 + maxItems: 32 items: maxItems: 1 diff --git a/Documentation/devicetree/bindings/opp/ti,omap-opp-supply.yaml b/Documentation/devicetree/bindings/opp/ti,omap-opp-supply.yaml new file mode 100644 index 000000000000..693f22539606 --- /dev/null +++ b/Documentation/devicetree/bindings/opp/ti,omap-opp-supply.yaml @@ -0,0 +1,101 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/opp/ti,omap-opp-supply.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Texas Instruments OMAP compatible OPP supply + +description: + OMAP5, DRA7, and AM57 families of SoCs have Class 0 AVS eFuse + registers, which contain OPP-specific voltage information tailored + for the specific device. This binding provides the information + needed to describe such a hardware values and relate them to program + the primary regulator during an OPP transition. + + Also, some supplies may have an associated vbb-supply, an Adaptive + Body Bias regulator, which must transition in a specific sequence + w.r.t the vdd-supply and clk when making an OPP transition. By + supplying two regulators to the device that will undergo OPP + transitions, we can use the multi-regulator support implemented by + the OPP core to describe both regulators the platform needs. The + OPP core binding Documentation/devicetree/bindings/opp/opp-v2.yaml + provides further information (refer to Example 4 Handling multiple + regulators). + +maintainers: + - Nishanth Menon <nm@ti.com> + +properties: + $nodename: + pattern: '^opp-supply(@[0-9a-f]+)?$' + + compatible: + oneOf: + - description: Basic OPP supply controlling VDD and VBB + const: ti,omap-opp-supply + - description: OMAP5+ optimized voltages in efuse(Class 0) VDD along with + VBB. 
+ const: ti,omap5-opp-supply + - description: OMAP5+ optimized voltages in efuse(class0) VDD but no VBB + const: ti,omap5-core-opp-supply + + reg: + maxItems: 1 + + ti,absolute-max-voltage-uv: + $ref: /schemas/types.yaml#/definitions/uint32 + description: Absolute maximum voltage for the OPP supply in micro-volts. + minimum: 750000 + maximum: 1500000 + + ti,efuse-settings: + description: An array of u32 tuple items providing information about + optimized efuse configuration. + minItems: 1 + $ref: /schemas/types.yaml#/definitions/uint32-matrix + items: + items: + - description: Reference voltage in micro-volts (OPP Voltage) + minimum: 750000 + maximum: 1500000 + multipleOf: 10000 + - description: efuse offset where the optimized voltage is located + multipleOf: 4 + maximum: 256 + +required: + - compatible + - ti,absolute-max-voltage-uv + +allOf: + - if: + not: + properties: + compatible: + contains: + const: ti,omap-opp-supply + then: + required: + - reg + - ti,efuse-settings + +additionalProperties: false + +examples: + - | + opp-supply { + compatible = "ti,omap-opp-supply"; + ti,absolute-max-voltage-uv = <1375000>; + }; + - | + opp-supply@4a003b20 { + compatible = "ti,omap5-opp-supply"; + reg = <0x4a003b20 0x8>; + ti,efuse-settings = + /* uV offset */ + <1060000 0x0>, + <1160000 0x4>, + <1210000 0x8>; + ti,absolute-max-voltage-uv = <1500000>; + }; diff --git a/Documentation/devicetree/bindings/opp/ti-omap5-opp-supply.txt b/Documentation/devicetree/bindings/opp/ti-omap5-opp-supply.txt deleted file mode 100644 index b70d326117cd..000000000000 --- a/Documentation/devicetree/bindings/opp/ti-omap5-opp-supply.txt +++ /dev/null @@ -1,63 +0,0 @@ -Texas Instruments OMAP compatible OPP supply description - -OMAP5, DRA7, and AM57 family of SoCs have Class0 AVS eFuse registers which -contain data that can be used to adjust voltages programmed for some of their -supplies for more efficient operation. 
This binding provides the information -needed to read these values and use them to program the main regulator during -an OPP transitions. - -Also, some supplies may have an associated vbb-supply which is an Adaptive Body -Bias regulator which much be transitioned in a specific sequence with regards -to the vdd-supply and clk when making an OPP transition. By supplying two -regulators to the device that will undergo OPP transitions we can make use -of the multi regulator binding that is part of the OPP core described here [1] -to describe both regulators needed by the platform. - -[1] Documentation/devicetree/bindings/opp/opp-v2.yaml - -Required Properties for Device Node: -- vdd-supply: phandle to regulator controlling VDD supply -- vbb-supply: phandle to regulator controlling Body Bias supply - (Usually Adaptive Body Bias regulator) - -Required Properties for opp-supply node: -- compatible: Should be one of: - "ti,omap-opp-supply" - basic OPP supply controlling VDD and VBB - "ti,omap5-opp-supply" - OMAP5+ optimized voltages in efuse(class0)VDD - along with VBB - "ti,omap5-core-opp-supply" - OMAP5+ optimized voltages in efuse(class0) VDD - but no VBB. -- reg: Address and length of the efuse register set for the device (mandatory - only for "ti,omap5-opp-supply") -- ti,efuse-settings: An array of u32 tuple items providing information about - optimized efuse configuration. Each item consists of the following: - volt: voltage in uV - reference voltage (OPP voltage) - efuse_offseet: efuse offset from reg where the optimized voltage is stored. -- ti,absolute-max-voltage-uv: absolute maximum voltage for the OPP supply. - -Example: - -/* Device Node (CPU) */ -cpus { - cpu0: cpu@0 { - device_type = "cpu"; - - ... 
- - vdd-supply = <&vcc>; - vbb-supply = <&abb_mpu>; - }; -}; - -/* OMAP OPP Supply with Class0 registers */ -opp_supply_mpu: opp_supply@4a003b20 { - compatible = "ti,omap5-opp-supply"; - reg = <0x4a003b20 0x8>; - ti,efuse-settings = < - /* uV offset */ - 1060000 0x0 - 1160000 0x4 - 1210000 0x8 - >; - ti,absolute-max-voltage-uv = <1500000>; -}; diff --git a/Documentation/devicetree/bindings/regulator/active-semi,act8846.yaml b/Documentation/devicetree/bindings/regulator/active-semi,act8846.yaml index 3725348bb235..02f45b5834d0 100644 --- a/Documentation/devicetree/bindings/regulator/active-semi,act8846.yaml +++ b/Documentation/devicetree/bindings/regulator/active-semi,act8846.yaml @@ -28,75 +28,37 @@ properties: the VSEL pin is assumed to be low. type: boolean - regulators: - type: object - additionalProperties: false + inl1-supply: + description: Handle to the INL1 input supply (REG5-7) - properties: - REG1: - type: object - $ref: /schemas/regulator/regulator.yaml# - unevaluatedProperties: false + inl2-supply: + description: Handle to the INL2 input supply (REG8-9) - properties: - vp1-supply: - description: Handle to the VP1 input supply + inl3-supply: + description: Handle to the INL3 input supply (REG10-12) - REG2: - type: object - $ref: /schemas/regulator/regulator.yaml# - unevaluatedProperties: false + vp1-supply: + description: Handle to the VP1 input supply (REG1) - properties: - vp2-supply: - description: Handle to the VP2 input supply + vp2-supply: + description: Handle to the VP2 input supply (REG2) - REG3: - type: object - $ref: /schemas/regulator/regulator.yaml# - unevaluatedProperties: false + vp3-supply: + description: Handle to the VP3 input supply (REG3) - properties: - vp3-supply: - description: Handle to the VP3 input supply - - REG4: - type: object - $ref: /schemas/regulator/regulator.yaml# - unevaluatedProperties: false + vp4-supply: + description: Handle to the VP4 input supply (REG4) - properties: - vp4-supply: - description: Handle to the VP4 
input supply + regulators: + type: object + additionalProperties: false patternProperties: - "^REG[5-7]$": + "^REG([1-9]|1[0-2])$": type: object $ref: /schemas/regulator/regulator.yaml# unevaluatedProperties: false - properties: - inl1-supply: - description: Handle to the INL1 input supply - - "^REG[8-9]$": - type: object - $ref: /schemas/regulator/regulator.yaml# - unevaluatedProperties: false - - properties: - inl2-supply: - description: Handle to the INL2 input supply - - "^REG1[0-2]$": - type: object - $ref: /schemas/regulator/regulator.yaml# - unevaluatedProperties: false - - properties: - inl3-supply: - description: Handle to the INL3 input supply - additionalProperties: false required: diff --git a/Documentation/devicetree/bindings/regulator/adi,max77857.yaml b/Documentation/devicetree/bindings/regulator/adi,max77857.yaml new file mode 100644 index 000000000000..d1fa74aca721 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/adi,max77857.yaml @@ -0,0 +1,86 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +# Copyright 2022 Analog Devices Inc. +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/adi,max77857.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Analog Devices MAX77857 Buck-Boost Converter + +maintainers: + - Ibrahim Tilki <Ibrahim.Tilki@analog.com> + - Okan Sahin <Okan.Sahin@analog.com> + +description: Analog Devices MAX77857 Buck-Boost Converter + +properties: + compatible: + enum: + - adi,max77831 + - adi,max77857 + - adi,max77859 + - adi,max77859a + + reg: + description: I2C address of the device + items: + - enum: [0x66, 0x67, 0x6E, 0x6F] + + interrupts: + maxItems: 1 + + adi,switch-frequency-hz: + description: Switching frequency of the Buck-Boost converter in Hz. + items: + - enum: [1200000, 1500000, 1800000, 2100000] + + adi,rtop-ohms: + description: Top feedback resistor value in ohms for external feedback. 
+ minimum: 150000 + maximum: 330000 + + adi,rbot-ohms: + description: Bottom feedback resistor value in ohms for external feedback. + +dependencies: + adi,rtop-ohms: [ 'adi,rbot-ohms' ] + adi,rbot-ohms: [ 'adi,rtop-ohms' ] + +required: + - compatible + - reg + +allOf: + - $ref: regulator.yaml# + - if: + properties: + compatible: + contains: + enum: + - adi,max77831 + + then: + properties: + adi,switch-frequency-hz: + items: + enum: [1200000, 1500000, 1800000] + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + i2c { + #address-cells = <1>; + #size-cells = <0>; + + regulator@66 { + reg = <0x66>; + compatible = "adi,max77857"; + interrupt-parent = <&gpio>; + interrupts = <26 IRQ_TYPE_EDGE_FALLING>; + + adi,rtop-ohms = <312000>; + adi,rbot-ohms = <12000>; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/awinic,aw37503.yaml b/Documentation/devicetree/bindings/regulator/awinic,aw37503.yaml new file mode 100644 index 000000000000..c92a881ed60e --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/awinic,aw37503.yaml @@ -0,0 +1,78 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/awinic,aw37503.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Awinic AW37503 Voltage Regulator + +maintainers: + - Alec Li <like@awinic.com> + +description: + The AW37503 are dual voltage regulator, designed to support positive/negative + supply for driving TFT-LCD panels. It support software-configurable output + switching and monitoring. The output voltages can be programmed via an I2C + compatible interface. + +properties: + compatible: + const: awinic,aw37503 + + reg: + maxItems: 1 + +patternProperties: + "^out[pn]$": + type: object + $ref: regulator.yaml# + unevaluatedProperties: false + description: + Properties for single regulator. 
+ + properties: + enable-gpios: + maxItems: 1 + description: + GPIO specifier to enable the GPIO control (on/off) for regulator. + + required: + - regulator-name + +required: + - compatible + - reg + - outp + - outn + +additionalProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + + i2c { + #address-cells = <1>; + #size-cells = <0>; + + regulator@3e { + compatible = "awinic,aw37503"; + reg = <0x3e>; + + outp { + regulator-name = "outp"; + regulator-boot-on; + regulator-always-on; + enable-gpios = <&gpio 17 GPIO_ACTIVE_LOW>; + }; + + outn { + regulator-name = "outn"; + regulator-boot-on; + regulator-always-on; + enable-gpios = <&gpio 27 GPIO_ACTIVE_LOW>; + }; + }; + }; +... + diff --git a/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml b/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml index dc626517c2ad..13b3f75f8e5e 100644 --- a/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml +++ b/Documentation/devicetree/bindings/regulator/dlg,da9121.yaml @@ -95,11 +95,6 @@ properties: Properties for a single BUCK regulator properties: - regulator-name: - pattern: "^BUCK([1-2])$" - description: | - BUCK2 present in DA9122, DA9220, DA9131, DA9132 only - regulator-initial-mode: enum: [ 0, 1, 2, 3 ] description: Defined in include/dt-bindings/regulator/dlg,da9121-regulator.h @@ -122,6 +117,23 @@ required: - reg - regulators +allOf: + - if: + properties: + compatible: + not: + contains: + enum: + - dlg,da9122 + - dlg,da9131 + - dlg,da9132 + - dlg,da9220 + then: + properties: + regulators: + properties: + buck2: false + additionalProperties: false examples: diff --git a/Documentation/devicetree/bindings/regulator/dlg,slg51000.yaml b/Documentation/devicetree/bindings/regulator/dlg,slg51000.yaml new file mode 100644 index 000000000000..bad140418e49 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/dlg,slg51000.yaml @@ -0,0 +1,132 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: 
http://devicetree.org/schemas/regulator/dlg,slg51000.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Dialog Semiconductor SLG51000 Voltage Regulator + +maintainers: + - Eric Jeong <eric.jeong.opensource@diasemi.com> + - Support Opensource <support.opensource@diasemi.com> + +properties: + compatible: + const: dlg,slg51000 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + dlg,cs-gpios: + maxItems: 1 + description: + GPIO for chip select + + vin3-supply: + description: + Input supply for ldo3, required if regulator is enabled + + vin4-supply: + description: + Input supply for ldo4, required if regulator is enabled + + vin5-supply: + description: + Input supply for ldo5, required if regulator is enabled + + vin6-supply: + description: + Input supply for ldo6, required if regulator is enabled + + vin7-supply: + description: + Input supply for ldo7, required if regulator is enabled + + regulators: + type: object + additionalProperties: false + + patternProperties: + "^ldo[1-7]$": + type: object + $ref: /schemas/regulator/regulator.yaml# + unevaluatedProperties: false + + properties: + enable-gpios: + maxItems: 1 + + required: + - regulator-name + +required: + - compatible + - reg + - regulators + +additionalProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/regulator/dlg,da9121-regulator.h> + i2c { + #address-cells = <1>; + #size-cells = <0>; + + pmic@75 { + compatible = "dlg,slg51000"; + reg = <0x75>; + dlg,cs-gpios = <&tlmm 69 GPIO_ACTIVE_HIGH>; + vin5-supply = <&vreg_s1f_1p2>; + vin6-supply = <&vreg_s1f_1p2>; + + regulators { + ldo1 { + regulator-name = "slg51000_b_ldo1"; + regulator-min-microvolt = <2400000>; + regulator-max-microvolt = <3300000>; + }; + + ldo2 { + regulator-name = "slg51000_b_ldo2"; + regulator-min-microvolt = <2400000>; + regulator-max-microvolt = <3300000>; + }; + + ldo3 { + regulator-name = "slg51000_b_ldo3"; + 
regulator-min-microvolt = <1200000>; + regulator-max-microvolt = <3750000>; + }; + + ldo4 { + regulator-name = "slg51000_b_ldo4"; + regulator-min-microvolt = <1200000>; + regulator-max-microvolt = <3750000>; + }; + + ldo5 { + regulator-name = "slg51000_b_ldo5"; + regulator-min-microvolt = <500000>; + regulator-max-microvolt = <1200000>; + }; + + ldo6 { + regulator-name = "slg51000_b_ldo6"; + regulator-min-microvolt = <500000>; + regulator-max-microvolt = <1200000>; + }; + + ldo7 { + regulator-name = "slg51000_b_ldo7"; + regulator-min-microvolt = <1200000>; + regulator-max-microvolt = <3750000>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml b/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml index 2e720d152890..0221397eb51e 100644 --- a/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml +++ b/Documentation/devicetree/bindings/regulator/mps,mp5416.yaml @@ -29,10 +29,12 @@ properties: patternProperties: "^buck[1-4]$": $ref: regulator.yaml# + unevaluatedProperties: false type: object "^ldo[1-4]$": $ref: regulator.yaml# + unevaluatedProperties: false type: object additionalProperties: false diff --git a/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml b/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml index f3fcfc8be72f..6de5b027f990 100644 --- a/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml +++ b/Documentation/devicetree/bindings/regulator/mps,mpq7920.yaml @@ -21,7 +21,6 @@ properties: regulators: type: object - $ref: regulator.yaml# description: | list of regulators provided by this controller, must be named @@ -39,11 +38,13 @@ properties: ldortc: type: object $ref: regulator.yaml# + unevaluatedProperties: false patternProperties: "^ldo[1-4]$": type: object $ref: regulator.yaml# + unevaluatedProperties: false "^buck[1-4]$": type: object diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.yaml b/Documentation/devicetree/bindings/regulator/pfuze100.yaml 
index e384e4953f0a..0eda44752cdd 100644 --- a/Documentation/devicetree/bindings/regulator/pfuze100.yaml +++ b/Documentation/devicetree/bindings/regulator/pfuze100.yaml @@ -68,18 +68,22 @@ properties: "^sw([1-4]|[1-4][a-c]|[1-4][a-c][a-c])$": $ref: regulator.yaml# type: object + unevaluatedProperties: false "^vgen[1-6]$": $ref: regulator.yaml# type: object + unevaluatedProperties: false "^vldo[1-4]$": $ref: regulator.yaml# type: object + unevaluatedProperties: false "^(vsnvs|vref|vrefddr|swbst|coin|v33|vccsd)$": $ref: regulator.yaml# type: object + unevaluatedProperties: false additionalProperties: false diff --git a/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml b/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml index 8a08698e3484..b4eb4001eb3d 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml @@ -49,7 +49,7 @@ patternProperties: ".*-supply$": description: Input supply phandle(s) for this node - "^((s|l|lvs)[0-9]*)|(s[1-2][a-b])|(ncp)|(mvs)|(usb-switch)|(hdmi-switch)$": + "^((s|l|lvs)[0-9]*|s[1-2][a-b]|ncp|mvs|usb-switch|hdmi-switch)$": description: List of regulators and its properties $ref: regulator.yaml# unevaluatedProperties: false diff --git a/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.yaml b/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.yaml index b9498504ad79..127a6f39b7f0 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.yaml @@ -53,6 +53,7 @@ description: | For PMR735A, smps1 - smps3, ldo1 - ldo7 For PMX55, smps1 - smps7, ldo1 - ldo16 For PMX65, smps1 - smps8, ldo1 - ldo21 + For PMX75, smps1 - smps10, ldo1 - ldo21 properties: compatible: @@ -84,13 +85,14 @@ properties: - qcom,pmr735a-rpmh-regulators - qcom,pmx55-rpmh-regulators - qcom,pmx65-rpmh-regulators + - 
qcom,pmx75-rpmh-regulators qcom,pmic-id: description: | RPMh resource name suffix used for the regulators found on this PMIC. $ref: /schemas/types.yaml#/definitions/string - enum: [a, b, c, d, e, f, g, h, k] + enum: [a, b, c, d, e, f, g, h, i, j, k, l, m, n] qcom,always-wait-for-ack: description: | @@ -109,6 +111,7 @@ properties: bob: type: object $ref: regulator.yaml# + unevaluatedProperties: false description: BOB regulator node. dependencies: regulator-allow-set-load: [ regulator-allowed-modes ] @@ -117,6 +120,7 @@ patternProperties: "^(smps|ldo|lvs|bob)[0-9]+$": type: object $ref: regulator.yaml# + unevaluatedProperties: false description: smps/ldo regulator nodes(s). dependencies: regulator-allow-set-load: [ regulator-allowed-modes ] @@ -424,10 +428,28 @@ allOf: vdd-l11-l13-supply: true patternProperties: "^vdd-l[1347]-supply$": true - "^vdd-l1[0245789]-supply$": true + "^vdd-l1[024579]-supply$": true "^vdd-l2[01]-supply$": true "^vdd-s[1-8]-supply$": true + - if: + properties: + compatible: + enum: + - qcom,pmx75-rpmh-regulators + then: + properties: + vdd-l2-l18-supply: true + vdd-l4-l16-supply: true + vdd-l5-l6-supply: true + vdd-l8-l9-supply: true + vdd-l11-l13-supply: true + vdd-l20-l21-supply: true + patternProperties: + "^vdd-l[137]-supply$": true + "^vdd-l1[024579]-supply$": true + "^vdd-s([1-9]|10)-supply$": true + unevaluatedProperties: false examples: diff --git a/Documentation/devicetree/bindings/regulator/qcom,sdm845-refgen-regulator.yaml b/Documentation/devicetree/bindings/regulator/qcom,sdm845-refgen-regulator.yaml new file mode 100644 index 000000000000..f02f97d4fdd2 --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/qcom,sdm845-refgen-regulator.yaml @@ -0,0 +1,57 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/qcom,sdm845-refgen-regulator.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm Technologies, Inc. 
REFGEN Regulator + +maintainers: + - Konrad Dybcio <konradybcio@kernel.org> + +description: + The REFGEN (reference voltage generator) regulator provides reference + voltage for on-chip IPs (like PHYs) on some Qualcomm SoCs. + +allOf: + - $ref: regulator.yaml# + +properties: + compatible: + oneOf: + - items: + - enum: + - qcom,sc7180-refgen-regulator + - qcom,sc8180x-refgen-regulator + - qcom,sm8150-refgen-regulator + - const: qcom,sdm845-refgen-regulator + + - items: + - enum: + - qcom,sc7280-refgen-regulator + - qcom,sc8280xp-refgen-regulator + - qcom,sm6350-refgen-regulator + - qcom,sm6375-refgen-regulator + - qcom,sm8350-refgen-regulator + - const: qcom,sm8250-refgen-regulator + + - enum: + - qcom,sdm845-refgen-regulator + - qcom,sm8250-refgen-regulator + + reg: + maxItems: 1 + +required: + - compatible + - reg + +unevaluatedProperties: false + +examples: + - | + regulator@162f000 { + compatible = "qcom,sm8250-refgen-regulator"; + reg = <0x0162f000 0x84>; + }; +... diff --git a/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.yaml b/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.yaml index a8ca8e0b27f8..9ea8ac0786ac 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.yaml @@ -110,6 +110,7 @@ patternProperties: "^((s|l|lvs|5vs)[0-9]*)|(boost-bypass)|(bob)$": description: List of regulators and its properties $ref: regulator.yaml# + unevaluatedProperties: false additionalProperties: false diff --git a/Documentation/devicetree/bindings/regulator/richtek,rt4831-regulator.yaml b/Documentation/devicetree/bindings/regulator/richtek,rt4831-regulator.yaml index d9c23333e157..cd06e957b9db 100644 --- a/Documentation/devicetree/bindings/regulator/richtek,rt4831-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/richtek,rt4831-regulator.yaml @@ -29,6 +29,7 @@ patternProperties: "^DSV(LCM|P|N)$": type: object $ref: 
regulator.yaml# + unevaluatedProperties: false description: Properties for single Display Bias Voltage regulator. diff --git a/Documentation/devicetree/bindings/regulator/richtek,rt5739.yaml b/Documentation/devicetree/bindings/regulator/richtek,rt5739.yaml index 358297dd3fb7..e95e046e9ed6 100644 --- a/Documentation/devicetree/bindings/regulator/richtek,rt5739.yaml +++ b/Documentation/devicetree/bindings/regulator/richtek,rt5739.yaml @@ -21,6 +21,7 @@ allOf: properties: compatible: enum: + - richtek,rt5733 - richtek,rt5739 reg: diff --git a/Documentation/devicetree/bindings/regulator/richtek,rtmv20-regulator.yaml b/Documentation/devicetree/bindings/regulator/richtek,rtmv20-regulator.yaml index 446ec5127d1f..fec3d396ca50 100644 --- a/Documentation/devicetree/bindings/regulator/richtek,rtmv20-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/richtek,rtmv20-regulator.yaml @@ -121,6 +121,7 @@ properties: description: load switch current regulator description. type: object $ref: regulator.yaml# + unevaluatedProperties: false required: - compatible diff --git a/Documentation/devicetree/bindings/regulator/richtek,rtq2208.yaml b/Documentation/devicetree/bindings/regulator/richtek,rtq2208.yaml new file mode 100644 index 000000000000..609c06615bdc --- /dev/null +++ b/Documentation/devicetree/bindings/regulator/richtek,rtq2208.yaml @@ -0,0 +1,197 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/regulator/richtek,rtq2208.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Richtek RTQ2208 SubPMIC Regulator + +maintainers: + - Alina Yu <alina_yu@richtek.com> + +description: | + RTQ2208 is a highly integrated power converter that offers functional safety dual + multi-configurable synchronous buck converters and two LDOs. + + Bucks support "regulator-allowed-modes" and "regulator-mode". 
The former defines the permitted + switching operation in normal mode; the latter defines the operation in suspend to RAM mode. + + Whether the RTQ2208 is configured for normal or suspend to RAM mode, there are two switching + operation modes for all buck rails: automatic power saving mode (Auto mode) and forced continuous + conduction mode (FCCM). + + The modes are defined in the datasheet, which is available at the link below, + and their meanings are:: + 0 - Auto mode for power saving, which reduces the switching frequency at light load + to maintain high efficiency. + 1 - FCCM to meet strict voltage regulation accuracy, which keeps the switching frequency constant. + + Datasheet will be available soon at + https://www.richtek.com/assets/Products + +properties: + compatible: + enum: + - richtek,rtq2208 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + richtek,mtp-sel-high: + type: boolean + description: + vout register selection based on this boolean value. + false - Using DVS0 register setting to adjust vout + true - Using DVS1 register setting to adjust vout + + regulators: + type: object + additionalProperties: false + + patternProperties: + "^buck-[a-h]$": + type: object + $ref: regulator.yaml# + unevaluatedProperties: false + description: + description for buck-[a-h] regulator. + + properties: + regulator-allowed-modes: + description: + two buck modes with different switching accuracy. + 0 - Auto mode + 1 - FCCM + items: + enum: [0, 1] + + "^ldo[1-2]$": + type: object + $ref: regulator.yaml# + unevaluatedProperties: false + description: + regulator description for ldo[1-2].
+ +required: + - compatible + - reg + - regulators + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + i2c { + #address-cells = <1>; + #size-cells = <0>; + + pmic@10 { + compatible = "richtek,rtq2208"; + reg = <0x10>; + interrupts-extended = <&gpio26 0 IRQ_TYPE_LEVEL_LOW>; + richtek,mtp-sel-high; + + regulators { + buck-a { + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + buck-b { + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + buck-c { + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + buck-d { + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + buck-e { + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + buck-f { + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + buck-g { + regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + buck-h { + 
regulator-min-microvolt = <400000>; + regulator-max-microvolt = <2050000>; + regulator-allowed-modes = <0 1>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + regulator-mode = <1>; + }; + }; + ldo1 { + regulator-min-microvolt = <1200000>; + regulator-max-microvolt = <1200000>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + }; + }; + ldo2 { + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + regulator-always-on; + regulator-state-mem { + regulator-on-in-suspend; + }; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/regulator/richtek,rtq6752-regulator.yaml b/Documentation/devicetree/bindings/regulator/richtek,rtq6752-regulator.yaml index e6e5a9a7d940..ef62c618de67 100644 --- a/Documentation/devicetree/bindings/regulator/richtek,rtq6752-regulator.yaml +++ b/Documentation/devicetree/bindings/regulator/richtek,rtq6752-regulator.yaml @@ -35,6 +35,7 @@ properties: "^(p|n)avdd$": type: object $ref: regulator.yaml# + unevaluatedProperties: false description: | regulator description for pavdd and navdd. diff --git a/Documentation/devicetree/bindings/regulator/slg51000.txt b/Documentation/devicetree/bindings/regulator/slg51000.txt deleted file mode 100644 index aa0733e49b90..000000000000 --- a/Documentation/devicetree/bindings/regulator/slg51000.txt +++ /dev/null @@ -1,88 +0,0 @@ -* Dialog Semiconductor SLG51000 Voltage Regulator - -Required properties: -- compatible : Should be "dlg,slg51000" for SLG51000 -- reg : Specifies the I2C slave address. -- xxx-supply: Input voltage supply regulator for ldo3 to ldo7. - These entries are required if regulators are enabled for a device. - An absence of these properties can cause the regulator registration to fail. - If some of input supply is powered through battery or always-on supply then - also it is required to have these parameters with proper node handle of always - on power supply. 
- vin3-supply: Input supply for ldo3 - vin4-supply: Input supply for ldo4 - vin5-supply: Input supply for ldo5 - vin6-supply: Input supply for ldo6 - vin7-supply: Input supply for ldo7 - -Optional properties: -- interrupt-parent : Specifies the reference to the interrupt controller. -- interrupts : IRQ line information. -- dlg,cs-gpios : Specify a valid GPIO for chip select - -Sub-nodes: -- regulators : This node defines the settings for the regulators. - The content of the sub-node is defined by the standard binding - for regulators; see regulator.txt. - - The SLG51000 regulators are bound using their names listed below: - ldo1 - ldo2 - ldo3 - ldo4 - ldo5 - ldo6 - ldo7 - -Optional properties for regulators: -- enable-gpios : Specify a valid GPIO for platform control of the regulator. - -Example: - pmic: slg51000@75 { - compatible = "dlg,slg51000"; - reg = <0x75>; - - regulators { - ldo1 { - regulator-name = "ldo1"; - regulator-min-microvolt = <2400000>; - regulator-max-microvolt = <3300000>; - }; - - ldo2 { - regulator-name = "ldo2"; - regulator-min-microvolt = <2400000>; - regulator-max-microvolt = <3300000>; - }; - - ldo3 { - regulator-name = "ldo3"; - regulator-min-microvolt = <1200000>; - regulator-max-microvolt = <3750000>; - }; - - ldo4 { - regulator-name = "ldo4"; - regulator-min-microvolt = <1200000>; - regulator-max-microvolt = <3750000>; - }; - - ldo5 { - regulator-name = "ldo5"; - regulator-min-microvolt = <500000>; - regulator-max-microvolt = <1200000>; - }; - - ldo6 { - regulator-name = "ldo6"; - regulator-min-microvolt = <500000>; - regulator-max-microvolt = <1200000>; - }; - - ldo7 { - regulator-name = "ldo7"; - regulator-min-microvolt = <1200000>; - regulator-max-microvolt = <3750000>; - }; - }; - }; diff --git a/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml b/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml index 7d53cfa2c288..c9586d277f41 100644 --- 
a/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml +++ b/Documentation/devicetree/bindings/regulator/st,stm32mp1-pwr-reg.yaml @@ -25,8 +25,8 @@ properties: patternProperties: "^(reg11|reg18|usb33)$": type: object - $ref: regulator.yaml# + unevaluatedProperties: false required: - compatible diff --git a/Documentation/devicetree/bindings/regulator/wlf,arizona.yaml b/Documentation/devicetree/bindings/regulator/wlf,arizona.yaml index 011819c10988..11e378648b3f 100644 --- a/Documentation/devicetree/bindings/regulator/wlf,arizona.yaml +++ b/Documentation/devicetree/bindings/regulator/wlf,arizona.yaml @@ -29,11 +29,13 @@ properties: Initial data for the LDO1 regulator. $ref: regulator.yaml# type: object + unevaluatedProperties: false micvdd: description: Initial data for the MICVDD regulator. $ref: regulator.yaml# type: object + unevaluatedProperties: false additionalProperties: true diff --git a/Documentation/devicetree/bindings/sound/cirrus,cs42l43.yaml b/Documentation/devicetree/bindings/sound/cirrus,cs42l43.yaml new file mode 100644 index 000000000000..7a6de938b11d --- /dev/null +++ b/Documentation/devicetree/bindings/sound/cirrus,cs42l43.yaml @@ -0,0 +1,313 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/sound/cirrus,cs42l43.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Cirrus Logic CS42L43 Audio CODEC + +maintainers: + - patches@opensource.cirrus.com + +description: | + The CS42L43 is an audio CODEC with integrated MIPI SoundWire interface + (Version 1.2.1 compliant), I2C, SPI, and I2S/TDM interfaces designed + for portable applications. It provides a high dynamic range, stereo + DAC for headphone output, two integrated Class D amplifiers for + loudspeakers, and two ADCs for wired headset microphone input or + stereo line input. PDM inputs are provided for digital microphones. 
+ +allOf: + - $ref: dai-common.yaml# + +properties: + compatible: + enum: + - cirrus,cs42l43 + + reg: + maxItems: 1 + + vdd-p-supply: + description: + Power supply for the high voltage interface. + + vdd-a-supply: + description: + Power supply for internal analog circuits. + + vdd-d-supply: + description: + Power supply for internal digital circuits. Can be internally supplied. + + vdd-io-supply: + description: + Power supply for external interface and internal digital logic. + + vdd-cp-supply: + description: + Power supply for the amplifier 3 and 4 charge pump. + + vdd-amp-supply: + description: + Power supply for amplifier 1 and 2. + + reset-gpios: + maxItems: 1 + + interrupt-controller: true + + "#interrupt-cells": + const: 2 + + interrupts: + maxItems: 1 + + "#sound-dai-cells": + const: 1 + + clocks: + items: + - description: Synchronous audio clock provided on mclk_in. + + clock-names: + const: mclk + + cirrus,bias-low: + type: boolean + description: + Select a 1.8V headset micbias rather than 2.8V. + + cirrus,bias-sense-microamp: + description: + Current at which the headset micbias sense clamp will engage, 0 to + disable. + enum: [ 0, 14, 23, 41, 50, 60, 68, 86, 95 ] + default: 0 + + cirrus,bias-ramp-ms: + description: + Time in milliseconds the hardware allows for the headset micbias to + ramp up. + enum: [ 10, 40, 90, 170 ] + default: 170 + + cirrus,detect-us: + description: + Time in microseconds the type detection will run for. Long values will + cause more audible effects, but give more accurate detection. + enum: [ 20, 100, 1000, 10000, 50000, 75000, 100000, 200000 ] + default: 10000 + + cirrus,button-automute: + type: boolean + description: + Enable the hardware automuting of decimator 1 when a headset button is + pressed. + + cirrus,buttons-ohms: + description: + Impedance in Ohms for each headset button, these should be listed in + ascending order. 
+ minItems: 1 + maxItems: 6 + + cirrus,tip-debounce-ms: + description: + Software debounce on tip sense triggering in milliseconds. + default: 0 + + cirrus,tip-invert: + type: boolean + description: + Indicates tip detect polarity, inverted implies open-circuit whilst the + jack is inserted. + + cirrus,tip-disable-pullup: + type: boolean + description: + Indicates if the internal pullup on the tip detect should be disabled. + + cirrus,tip-fall-db-ms: + description: + Time in milliseconds a falling edge on the tip detect should be hardware + debounced for. Note the falling edge is considered after the invert. + enum: [ 0, 125, 250, 500, 750, 1000, 1250, 1500 ] + default: 500 + + cirrus,tip-rise-db-ms: + description: + Time in milliseconds a rising edge on the tip detect should be hardware + debounced for. Note the rising edge is considered after the invert. + enum: [ 0, 125, 250, 500, 750, 1000, 1250, 1500 ] + default: 500 + + cirrus,use-ring-sense: + type: boolean + description: + Indicates if the ring sense should be used. + + cirrus,ring-invert: + type: boolean + description: + Indicates ring detect polarity, inverted implies open-circuit whilst the + jack is inserted. + + cirrus,ring-disable-pullup: + type: boolean + description: + Indicates if the internal pullup on the ring detect should be disabled. + + cirrus,ring-fall-db-ms: + description: + Time in milliseconds a falling edge on the ring detect should be hardware + debounced for. Note the falling edge is considered after the invert. + enum: [ 0, 125, 250, 500, 750, 1000, 1250, 1500 ] + default: 500 + + cirrus,ring-rise-db-ms: + description: + Time in milliseconds a rising edge on the ring detect should be hardware + debounced for. Note the rising edge is considered after the invert. 
+ enum: [ 0, 125, 250, 500, 750, 1000, 1250, 1500 ] + default: 500 + + pinctrl: + type: object + $ref: /schemas/pinctrl/pinctrl.yaml# + additionalProperties: false + + properties: + gpio-controller: true + + "#gpio-cells": + const: 2 + + gpio-ranges: + items: + - description: A phandle to the CODEC pinctrl node + minimum: 0 + - const: 0 + - const: 0 + - const: 3 + + patternProperties: + "-state$": + oneOf: + - $ref: "#/$defs/cirrus-cs42l43-state" + - patternProperties: + "-pins$": + $ref: "#/$defs/cirrus-cs42l43-state" + additionalProperties: false + + spi: + type: object + $ref: /schemas/spi/spi-controller.yaml# + unevaluatedProperties: false + +$defs: + cirrus-cs42l43-state: + type: object + + allOf: + - $ref: /schemas/pinctrl/pincfg-node.yaml# + - $ref: /schemas/pinctrl/pinmux-node.yaml# + + oneOf: + - required: [ groups ] + - required: [ pins ] + + additionalProperties: false + + properties: + groups: + enum: [ gpio1, gpio2, gpio3, asp, pdmout2, pdmout1, i2c, spi ] + + pins: + enum: [ gpio1, gpio2, gpio3, + asp_dout, asp_fsync, asp_bclk, + pdmout2_clk, pdmout2_data, pdmout1_clk, pdmout1_data, + i2c_sda, i2c_scl, + spi_miso, spi_sck, spi_ssb ] + + function: + enum: [ gpio, spdif, irq, mic-shutter, spk-shutter ] + + drive-strength: + description: Set drive strength in mA + enum: [ 1, 2, 4, 8, 9, 10, 12, 16 ] + + input-debounce: + description: Set input debounce in uS + enum: [ 0, 85 ] + +required: + - compatible + - reg + - vdd-p-supply + - vdd-a-supply + - vdd-io-supply + - vdd-cp-supply + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + + i2c { + #address-cells = <1>; + #size-cells = <0>; + + cs42l43: codec@1a { + compatible = "cirrus,cs42l43"; + reg = <0x1a>; + + vdd-p-supply = <&vdd5v0>; + vdd-a-supply = <&vdd1v8>; + vdd-io-supply = <&vdd1v8>; + vdd-cp-supply = <&vdd1v8>; + vdd-amp-supply = <&vdd5v0>; + + reset-gpios = <&gpio 0>; + + interrupt-controller; + #interrupt-cells = <2>; + interrupt-parent = 
<&gpio>; + interrupts = <56 IRQ_TYPE_LEVEL_LOW>; + + #sound-dai-cells = <1>; + + clocks = <&clks 0>; + clock-names = "mclk"; + + cs42l43_pins: pinctrl { + gpio-controller; + #gpio-cells = <2>; + gpio-ranges = <&cs42l43_pins 0 0 3>; + + pinctrl-names = "default"; + pinctrl-0 = <&pinsettings>; + + pinsettings: default-state { + shutter-pins { + groups = "gpio3"; + function = "mic-shutter"; + }; + }; + }; + + spi { + #address-cells = <1>; + #size-cells = <0>; + + cs-gpios = <&cs42l43_pins 1 0>; + + sensor@0 { + compatible = "bosch,bme680"; + reg = <0>; + spi-max-frequency = <1400000>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/spi/brcm,bcm63xx-spi.yaml b/Documentation/devicetree/bindings/spi/brcm,bcm63xx-spi.yaml new file mode 100644 index 000000000000..fa03cdd68e70 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/brcm,bcm63xx-spi.yaml @@ -0,0 +1,71 @@ +# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/brcm,bcm63xx-spi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Broadcom BCM6348/BCM6358 SPI controller + +maintainers: + - Jonas Gorski <jonas.gorski@gmail.com> + +description: | + Broadcom "Low Speed" SPI controller found in many older MIPS based Broadband + SoCs. + + This controller has a limitation that can not keep the chip select line active + between the SPI transfers within the same SPI message. This can terminate the + transaction to some SPI devices prematurely. The issue can be worked around by + the controller's prepend mode. 
+ +allOf: + - $ref: spi-controller.yaml# + +properties: + compatible: + oneOf: + - items: + - enum: + - brcm,bcm6368-spi + - brcm,bcm6362-spi + - brcm,bcm63268-spi + - const: brcm,bcm6358-spi + - enum: + - brcm,bcm6348-spi + - brcm,bcm6358-spi + + reg: + maxItems: 1 + + clocks: + items: + - description: SPI master reference clock + + clock-names: + items: + - const: spi + + interrupts: + maxItems: 1 + +required: + - compatible + - reg + - clocks + - clock-names + - interrupts + +unevaluatedProperties: false + +examples: + - | + spi@10000800 { + compatible = "brcm,bcm6368-spi", "brcm,bcm6358-spi"; + reg = <0x10000800 0x70c>; + interrupts = <1>; + clocks = <&clkctl 9>; + clock-names = "spi"; + num-cs = <5>; + #address-cells = <1>; + #size-cells = <0>; + }; diff --git a/Documentation/devicetree/bindings/spi/cdns,qspi-nor.yaml b/Documentation/devicetree/bindings/spi/cdns,qspi-nor.yaml index 4f15f9a0cc34..cca81f89e252 100644 --- a/Documentation/devicetree/bindings/spi/cdns,qspi-nor.yaml +++ b/Documentation/devicetree/bindings/spi/cdns,qspi-nor.yaml @@ -86,7 +86,17 @@ properties: maxItems: 1 clocks: - maxItems: 1 + minItems: 1 + maxItems: 3 + + clock-names: + oneOf: + - items: + - const: ref + - items: + - const: ref + - const: ahb + - const: apb cdns,fifo-depth: description: diff --git a/Documentation/devicetree/bindings/spi/loongson,ls2k-spi.yaml b/Documentation/devicetree/bindings/spi/loongson,ls2k-spi.yaml new file mode 100644 index 000000000000..de9d32feadf5 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/loongson,ls2k-spi.yaml @@ -0,0 +1,46 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/loongson,ls2k-spi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Loongson SPI controller + +maintainers: + - Yinbo Zhu <zhuyinbo@loongson.cn> + +allOf: + - $ref: /schemas/spi/spi-controller.yaml# + +properties: + compatible: + oneOf: + - enum: + - loongson,ls2k1000-spi + - items: + - 
enum: + - loongson,ls2k0500-spi + - const: loongson,ls2k1000-spi + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + +required: + - compatible + - reg + - clocks + +unevaluatedProperties: false + +examples: + - | + spi0: spi@1fff0220{ + compatible = "loongson,ls2k1000-spi"; + reg = <0x1fff0220 0x10>; + clocks = <&clk 17>; + #address-cells = <1>; + #size-cells = <0>; + }; diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt b/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt deleted file mode 100644 index db8e0d71c5bc..000000000000 --- a/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.txt +++ /dev/null @@ -1,61 +0,0 @@ -NVIDIA Tegra114 SPI controller. - -Required properties: -- compatible : For Tegra114, must contain "nvidia,tegra114-spi". - Otherwise, must contain '"nvidia,<chip>-spi", "nvidia,tegra114-spi"' where - <chip> is tegra124, tegra132, or tegra210. -- reg: Should contain SPI registers location and length. -- interrupts: Should contain SPI interrupts. -- clock-names : Must include the following entries: - - spi -- resets : Must contain an entry for each entry in reset-names. - See ../reset/reset.txt for details. -- reset-names : Must include the following entries: - - spi -- dmas : Must contain an entry for each entry in clock-names. - See ../dma/dma.txt for details. -- dma-names : Must include the following entries: - - rx - - tx -- clocks : Must contain an entry for each entry in clock-names. - See ../clocks/clock-bindings.txt for details. - -Recommended properties: -- spi-max-frequency: Definition as per - Documentation/devicetree/bindings/spi/spi-bus.txt -Optional properties: -- nvidia,tx-clk-tap-delay: Delays the clock going out to the external device - with this tap value. This property is used to tune the outgoing data from - Tegra SPI master with respect to outgoing Tegra SPI master clock. - Tap values vary based on the platform design trace lengths from Tegra SPI - to corresponding slave devices. 
Valid tap values are from 0 thru 63. -- nvidia,rx-clk-tap-delay: Delays the clock coming in from the external device - with this tap value. This property is used to adjust the Tegra SPI master - clock with respect to the data from the SPI slave device. - Tap values vary based on the platform design trace lengths from Tegra SPI - to corresponding slave devices. Valid tap values are from 0 thru 63. - -Example: - -spi@7000d600 { - compatible = "nvidia,tegra114-spi"; - reg = <0x7000d600 0x200>; - interrupts = <0 82 0x04>; - spi-max-frequency = <25000000>; - #address-cells = <1>; - #size-cells = <0>; - clocks = <&tegra_car 44>; - clock-names = "spi"; - resets = <&tegra_car 44>; - reset-names = "spi"; - dmas = <&apbdma 16>, <&apbdma 16>; - dma-names = "rx", "tx"; - <spi-client>@<bus_num> { - ... - ... - nvidia,rx-clk-tap-delay = <0>; - nvidia,tx-clk-tap-delay = <16>; - ... - }; - -}; diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.yaml b/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.yaml new file mode 100644 index 000000000000..58222ffa53d7 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/nvidia,tegra114-spi.yaml @@ -0,0 +1,100 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/nvidia,tegra114-spi.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NVIDIA Tegra114 SPI controller + +maintainers: + - Thierry Reding <thierry.reding@gmail.com> + - Jon Hunter <jonathanh@nvidia.com> + +properties: + compatible: + oneOf: + - const: nvidia,tegra114-spi + - items: + - enum: + - nvidia,tegra210-spi + - nvidia,tegra124-spi + - const: nvidia,tegra114-spi + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + items: + - description: SPI module clock + + clock-names: + items: + - const: spi + + resets: + items: + - description: SPI module reset + + reset-names: + items: + - const: spi + + dmas: + items: + - description: DMA channel for the 
reception FIFO + - description: DMA channel for the transmission FIFO + + dma-names: + items: + - const: rx + - const: tx + + spi-max-frequency: + description: Maximum SPI clocking speed of the controller in Hz. + $ref: /schemas/types.yaml#/definitions/uint32 + +allOf: + - $ref: spi-controller.yaml + +unevaluatedProperties: false + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + - resets + - reset-names + - dmas + - dma-names + +examples: + - | + spi@7000d600 { + compatible = "nvidia,tegra114-spi"; + reg = <0x7000d600 0x200>; + interrupts = <0 82 0x04>; + clocks = <&tegra_car 44>; + clock-names = "spi"; + resets = <&tegra_car 44>; + reset-names = "spi"; + dmas = <&apbdma 16>, <&apbdma 16>; + dma-names = "rx", "tx"; + + spi-max-frequency = <25000000>; + + #address-cells = <1>; + #size-cells = <0>; + + flash@0 { + compatible = "jedec,spi-nor"; + reg = <0>; + spi-max-frequency = <20000000>; + nvidia,rx-clk-tap-delay = <0>; + nvidia,tx-clk-tap-delay = <16>; + }; + }; diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra20-sflash.txt b/Documentation/devicetree/bindings/spi/nvidia,tegra20-sflash.txt deleted file mode 100644 index c212491929b5..000000000000 --- a/Documentation/devicetree/bindings/spi/nvidia,tegra20-sflash.txt +++ /dev/null @@ -1,37 +0,0 @@ -NVIDIA Tegra20 SFLASH controller. - -Required properties: -- compatible : should be "nvidia,tegra20-sflash". -- reg: Should contain SFLASH registers location and length. -- interrupts: Should contain SFLASH interrupts. -- clocks : Must contain one entry, for the module clock. - See ../clocks/clock-bindings.txt for details. -- resets : Must contain an entry for each entry in reset-names. - See ../reset/reset.txt for details. -- reset-names : Must include the following entries: - - spi -- dmas : Must contain an entry for each entry in clock-names. - See ../dma/dma.txt for details. 
-- dma-names : Must include the following entries: - - rx - - tx - -Recommended properties: -- spi-max-frequency: Definition as per - Documentation/devicetree/bindings/spi/spi-bus.txt - -Example: - -spi@7000c380 { - compatible = "nvidia,tegra20-sflash"; - reg = <0x7000c380 0x80>; - interrupts = <0 39 0x04>; - spi-max-frequency = <25000000>; - #address-cells = <1>; - #size-cells = <0>; - clocks = <&tegra_car 43>; - resets = <&tegra_car 43>; - reset-names = "spi"; - dmas = <&apbdma 11>, <&apbdma 11>; - dma-names = "rx", "tx"; -}; diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra20-sflash.yaml b/Documentation/devicetree/bindings/spi/nvidia,tegra20-sflash.yaml new file mode 100644 index 000000000000..e245bad85a25 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/nvidia,tegra20-sflash.yaml @@ -0,0 +1,81 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/nvidia,tegra20-sflash.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NVIDIA Tegra20 SFLASH controller + +maintainers: + - Thierry Reding <thierry.reding@gmail.com> + - Jon Hunter <jonathanh@nvidia.com> + +properties: + compatible: + const: nvidia,tegra20-sflash + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + items: + - description: module clock + + resets: + items: + - description: module reset + + reset-names: + items: + - const: spi + + dmas: + items: + - description: DMA channel used for reception + - description: DMA channel used for transmission + + dma-names: + items: + - const: rx + - const: tx + + spi-max-frequency: + description: Maximum SPI clocking speed of the controller in Hz. 
+ $ref: /schemas/types.yaml#/definitions/uint32 + +allOf: + - $ref: spi-controller.yaml + +unevaluatedProperties: false + +required: + - compatible + - reg + - interrupts + - clocks + - resets + - reset-names + - dmas + - dma-names + +examples: + - | + #include <dt-bindings/clock/tegra20-car.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + spi@7000c380 { + compatible = "nvidia,tegra20-sflash"; + reg = <0x7000c380 0x80>; + interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>; + spi-max-frequency = <25000000>; + #address-cells = <1>; + #size-cells = <0>; + clocks = <&tegra_car TEGRA20_CLK_SPI>; + resets = <&tegra_car 43>; + reset-names = "spi"; + dmas = <&apbdma 11>, <&apbdma 11>; + dma-names = "rx", "tx"; + }; diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra20-slink.txt b/Documentation/devicetree/bindings/spi/nvidia,tegra20-slink.txt deleted file mode 100644 index 40d80b93e327..000000000000 --- a/Documentation/devicetree/bindings/spi/nvidia,tegra20-slink.txt +++ /dev/null @@ -1,37 +0,0 @@ -NVIDIA Tegra20/Tegra30 SLINK controller. - -Required properties: -- compatible : should be "nvidia,tegra20-slink", "nvidia,tegra30-slink". -- reg: Should contain SLINK registers location and length. -- interrupts: Should contain SLINK interrupts. -- clocks : Must contain one entry, for the module clock. - See ../clocks/clock-bindings.txt for details. -- resets : Must contain an entry for each entry in reset-names. - See ../reset/reset.txt for details. -- reset-names : Must include the following entries: - - spi -- dmas : Must contain an entry for each entry in clock-names. - See ../dma/dma.txt for details. 
-- dma-names : Must include the following entries: - - rx - - tx - -Recommended properties: -- spi-max-frequency: Definition as per - Documentation/devicetree/bindings/spi/spi-bus.txt - -Example: - -spi@7000d600 { - compatible = "nvidia,tegra20-slink"; - reg = <0x7000d600 0x200>; - interrupts = <0 82 0x04>; - spi-max-frequency = <25000000>; - #address-cells = <1>; - #size-cells = <0>; - clocks = <&tegra_car 44>; - resets = <&tegra_car 44>; - reset-names = "spi"; - dmas = <&apbdma 16>, <&apbdma 16>; - dma-names = "rx", "tx"; -}; diff --git a/Documentation/devicetree/bindings/spi/nvidia,tegra20-slink.yaml b/Documentation/devicetree/bindings/spi/nvidia,tegra20-slink.yaml new file mode 100644 index 000000000000..291c25ec015d --- /dev/null +++ b/Documentation/devicetree/bindings/spi/nvidia,tegra20-slink.yaml @@ -0,0 +1,90 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/spi/nvidia,tegra20-slink.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NVIDIA Tegra20/30 SLINK controller + +maintainers: + - Thierry Reding <thierry.reding@gmail.com> + - Jon Hunter <jonathanh@nvidia.com> + +properties: + compatible: + enum: + - nvidia,tegra20-slink + - nvidia,tegra30-slink + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + items: + - description: module clock + + resets: + items: + - description: module reset + + reset-names: + items: + - const: spi + + dmas: + items: + - description: DMA channel used for reception + - description: DMA channel used for transmission + + dma-names: + items: + - const: rx + - const: tx + + operating-points-v2: + $ref: /schemas/types.yaml#/definitions/phandle + + power-domains: + items: + - description: phandle to the core power domain + + spi-max-frequency: + description: Maximum SPI clocking speed of the controller in Hz. 
+ $ref: /schemas/types.yaml#/definitions/uint32 + +allOf: + - $ref: spi-controller.yaml + +unevaluatedProperties: false + +required: + - compatible + - reg + - interrupts + - clocks + - resets + - reset-names + - dmas + - dma-names + +examples: + - | + #include <dt-bindings/clock/tegra20-car.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + spi@7000d600 { + compatible = "nvidia,tegra20-slink"; + reg = <0x7000d600 0x200>; + interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>; + spi-max-frequency = <25000000>; + #address-cells = <1>; + #size-cells = <0>; + clocks = <&tegra_car TEGRA20_CLK_SBC2>; + resets = <&tegra_car 44>; + reset-names = "spi"; + dmas = <&apbdma 16>, <&apbdma 16>; + dma-names = "rx", "tx"; + }; diff --git a/Documentation/devicetree/bindings/spi/spi-bcm63xx.txt b/Documentation/devicetree/bindings/spi/spi-bcm63xx.txt deleted file mode 100644 index 1c16f6692613..000000000000 --- a/Documentation/devicetree/bindings/spi/spi-bcm63xx.txt +++ /dev/null @@ -1,33 +0,0 @@ -Binding for Broadcom BCM6348/BCM6358 SPI controller - -Required properties: -- compatible: must contain one of "brcm,bcm6348-spi", "brcm,bcm6358-spi". -- reg: Base address and size of the controllers memory area. -- interrupts: Interrupt for the SPI block. -- clocks: phandle of the SPI clock. -- clock-names: has to be "spi". -- #address-cells: <1>, as required by generic SPI binding. -- #size-cells: <0>, also as required by generic SPI binding. - -Optional properties: -- num-cs: some controllers have less than 8 cs signals. Defaults to 8 - if absent. - -Child nodes as per the generic SPI binding. 
- -Example: - - spi@10000800 { - compatible = "brcm,bcm6368-spi", "brcm,bcm6358-spi"; - reg = <0x10000800 0x70c>; - - interrupts = <1>; - - clocks = <&clkctl 9>; - clock-names = "spi"; - - num-cs = <5>; - - #address-cells = <1>; - #size-cells = <0>; - }; diff --git a/Documentation/devicetree/bindings/spi/spi-cadence.yaml b/Documentation/devicetree/bindings/spi/spi-cadence.yaml index b7552739b554..d4b61b0e8301 100644 --- a/Documentation/devicetree/bindings/spi/spi-cadence.yaml +++ b/Documentation/devicetree/bindings/spi/spi-cadence.yaml @@ -49,6 +49,12 @@ properties: enum: [ 0, 1 ] default: 0 + power-domains: + maxItems: 1 + + label: + description: Descriptive name of the SPI controller. + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml b/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml index e91425012319..727c5346b8ce 100644 --- a/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml +++ b/Documentation/devicetree/bindings/spi/spi-fsl-lpspi.yaml @@ -63,6 +63,9 @@ properties: maximum: 2 default: 1 + power-domains: + maxItems: 1 + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml b/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml index a813c971ecf6..7fd591145480 100644 --- a/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml +++ b/Documentation/devicetree/bindings/spi/spi-nxp-fspi.yaml @@ -45,6 +45,9 @@ properties: - const: fspi_en - const: fspi + power-domains: + maxItems: 1 + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/spi/spi-pl022.yaml b/Documentation/devicetree/bindings/spi/spi-pl022.yaml index 91e540a92faf..5e5a704a766e 100644 --- a/Documentation/devicetree/bindings/spi/spi-pl022.yaml +++ b/Documentation/devicetree/bindings/spi/spi-pl022.yaml @@ -11,6 +11,7 @@ maintainers: allOf: - $ref: spi-controller.yaml# + - $ref: /schemas/arm/primecell.yaml# # We need a select here so we don't match all nodes with 
'arm,primecell' select: diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml index ba2bfb547909..40bc475ee7e1 100644 --- a/Documentation/devicetree/bindings/trivial-devices.yaml +++ b/Documentation/devicetree/bindings/trivial-devices.yaml @@ -119,6 +119,10 @@ properties: - fsl,mpr121 # Monolithic Power Systems Inc. multi-phase controller mp2888 - mps,mp2888 + # Monolithic Power Systems Inc. multi-phase controller mp2971 + - mps,mp2971 + # Monolithic Power Systems Inc. multi-phase controller mp2973 + - mps,mp2973 # Monolithic Power Systems Inc. multi-phase controller mp2975 - mps,mp2975 # Honeywell Humidicon HIH-6130 humidity/temperature sensor @@ -315,6 +319,8 @@ properties: - plx,pex8648 # Pulsedlight LIDAR range-finding sensor - pulsedlight,lidar-lite-v2 + # Renesas HS3001 Temperature and Relative Humidity Sensors + - renesas,hs3001 # Renesas ISL29501 time-of-flight sensor - renesas,isl29501 # Rohm DH2228FV diff --git a/Documentation/driver-api/s390-drivers.rst b/Documentation/driver-api/s390-drivers.rst index 5158577bc29b..8c0845c4eee7 100644 --- a/Documentation/driver-api/s390-drivers.rst +++ b/Documentation/driver-api/s390-drivers.rst @@ -27,7 +27,7 @@ not strictly considered I/O devices. They are considered here as well, although they are not the focus of this document. Some additional information can also be found in the kernel source under -Documentation/s390/driver-model.rst. +Documentation/arch/s390/driver-model.rst. The css bus =========== @@ -38,7 +38,7 @@ into several categories: * Standard I/O subchannels, for use by the system. They have a child device on the ccw bus and are described below. * I/O subchannels bound to the vfio-ccw driver. See - Documentation/s390/vfio-ccw.rst. + Documentation/arch/s390/vfio-ccw.rst. * Message subchannels. No Linux driver currently exists. * CHSC subchannels (at most one). The chsc subchannel driver can be used to send asynchronous chsc commands. 
diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt index 7f049c251a79..76208db88f3b 100644 --- a/Documentation/features/vm/TLB/arch-support.txt +++ b/Documentation/features/vm/TLB/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | TODO | | arm: | TODO | - | arm64: | N/A | + | arm64: | ok | | csky: | TODO | | hexagon: | TODO | | ia64: | TODO | diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index eccd327e6df5..a624e92f2687 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -332,54 +332,121 @@ Encryption modes and usage fscrypt allows one encryption mode to be specified for file contents and one encryption mode to be specified for filenames. Different directory trees are permitted to use different encryption modes. + +Supported modes +--------------- + Currently, the following pairs of encryption modes are supported: - AES-256-XTS for contents and AES-256-CTS-CBC for filenames -- AES-128-CBC for contents and AES-128-CTS-CBC for filenames +- AES-256-XTS for contents and AES-256-HCTR2 for filenames - Adiantum for both contents and filenames -- AES-256-XTS for contents and AES-256-HCTR2 for filenames (v2 policies only) -- SM4-XTS for contents and SM4-CTS-CBC for filenames (v2 policies only) - -If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair. - -AES-128-CBC was added only for low-powered embedded devices with -crypto accelerators such as CAAM or CESA that do not support XTS. To -use AES-128-CBC, CONFIG_CRYPTO_ESSIV and CONFIG_CRYPTO_SHA256 (or -another SHA-256 implementation) must be enabled so that ESSIV can be -used. - -Adiantum is a (primarily) stream cipher-based mode that is fast even -on CPUs without dedicated crypto instructions. It's also a true -wide-block mode, unlike XTS. It can also eliminate the need to derive -per-file encryption keys. 
However, it depends on the security of two -primitives, XChaCha12 and AES-256, rather than just one. See the -paper "Adiantum: length-preserving encryption for entry-level -processors" (https://eprint.iacr.org/2018/720.pdf) for more details. -To use Adiantum, CONFIG_CRYPTO_ADIANTUM must be enabled. Also, fast -implementations of ChaCha and NHPoly1305 should be enabled, e.g. -CONFIG_CRYPTO_CHACHA20_NEON and CONFIG_CRYPTO_NHPOLY1305_NEON for ARM. - -AES-256-HCTR2 is another true wide-block encryption mode that is intended for -use on CPUs with dedicated crypto instructions. AES-256-HCTR2 has the property -that a bitflip in the plaintext changes the entire ciphertext. This property -makes it desirable for filename encryption since initialization vectors are -reused within a directory. For more details on AES-256-HCTR2, see the paper -"Length-preserving encryption with HCTR2" -(https://eprint.iacr.org/2021/1441.pdf). To use AES-256-HCTR2, -CONFIG_CRYPTO_HCTR2 must be enabled. Also, fast implementations of XCTR and -POLYVAL should be enabled, e.g. CRYPTO_POLYVAL_ARM64_CE and -CRYPTO_AES_ARM64_CE_BLK for ARM64. - -SM4 is a Chinese block cipher that is an alternative to AES. It has -not seen as much security review as AES, and it only has a 128-bit key -size. It may be useful in cases where its use is mandated. -Otherwise, it should not be used. For SM4 support to be available, it -also needs to be enabled in the kernel crypto API. - -New encryption modes can be added relatively easily, without changes -to individual filesystems. However, authenticated encryption (AE) -modes are not currently supported because of the difficulty of dealing -with ciphertext expansion. +- AES-128-CBC-ESSIV for contents and AES-128-CTS-CBC for filenames +- SM4-XTS for contents and SM4-CTS-CBC for filenames + +Authenticated encryption modes are not currently supported because of +the difficulty of dealing with ciphertext expansion. 
Therefore, +contents encryption uses a block cipher in `XTS mode +<https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS>`_ or +`CBC-ESSIV mode +<https://en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_(ESSIV)>`_, +or a wide-block cipher. Filenames encryption uses a +block cipher in `CTS-CBC mode +<https://en.wikipedia.org/wiki/Ciphertext_stealing>`_ or a wide-block +cipher. + +The (AES-256-XTS, AES-256-CTS-CBC) pair is the recommended default. +It is also the only option that is *guaranteed* to always be supported +if the kernel supports fscrypt at all; see `Kernel config options`_. + +The (AES-256-XTS, AES-256-HCTR2) pair is also a good choice that +upgrades the filenames encryption to use a wide-block cipher. (A +*wide-block cipher*, also called a tweakable super-pseudorandom +permutation, has the property that changing one bit scrambles the +entire result.) As described in `Filenames encryption`_, a wide-block +cipher is the ideal mode for the problem domain, though CTS-CBC is the +"least bad" choice among the alternatives. For more information about +HCTR2, see `the HCTR2 paper <https://eprint.iacr.org/2021/1441.pdf>`_. + +Adiantum is recommended on systems where AES is too slow due to lack +of hardware acceleration for AES. Adiantum is a wide-block cipher +that uses XChaCha12 and AES-256 as its underlying components. Most of +the work is done by XChaCha12, which is much faster than AES when AES +acceleration is unavailable. For more information about Adiantum, see +`the Adiantum paper <https://eprint.iacr.org/2018/720.pdf>`_. + +The (AES-128-CBC-ESSIV, AES-128-CTS-CBC) pair exists only to support +systems whose only form of AES acceleration is an off-CPU crypto +accelerator such as CAAM or CESA that does not support XTS. 
+ +The remaining mode pairs are the "national pride ciphers": + +- (SM4-XTS, SM4-CTS-CBC) + +Generally speaking, these ciphers aren't "bad" per se, but they +receive limited security review compared to the usual choices such as +AES and ChaCha. They also don't bring much new to the table. It is +suggested to only use these ciphers where their use is mandated. + +Kernel config options +--------------------- + +Enabling fscrypt support (CONFIG_FS_ENCRYPTION) automatically pulls in +only the basic support from the crypto API needed to use AES-256-XTS +and AES-256-CTS-CBC encryption. For optimal performance, it is +strongly recommended to also enable any available platform-specific +kconfig options that provide acceleration for the algorithm(s) you +wish to use. Support for any "non-default" encryption modes typically +requires extra kconfig options as well. + +Below, some relevant options are listed by encryption mode. Note, +acceleration options not listed below may be available for your +platform; refer to the kconfig menus. File contents encryption can +also be configured to use inline encryption hardware instead of the +kernel crypto API (see `Inline encryption support`_); in that case, +the file contents mode doesn't need to supported in the kernel crypto +API, but the filenames mode still does. 
+ +- AES-256-XTS and AES-256-CTS-CBC + - Recommended: + - arm64: CONFIG_CRYPTO_AES_ARM64_CE_BLK + - x86: CONFIG_CRYPTO_AES_NI_INTEL + +- AES-256-HCTR2 + - Mandatory: + - CONFIG_CRYPTO_HCTR2 + - Recommended: + - arm64: CONFIG_CRYPTO_AES_ARM64_CE_BLK + - arm64: CONFIG_CRYPTO_POLYVAL_ARM64_CE + - x86: CONFIG_CRYPTO_AES_NI_INTEL + - x86: CONFIG_CRYPTO_POLYVAL_CLMUL_NI + +- Adiantum + - Mandatory: + - CONFIG_CRYPTO_ADIANTUM + - Recommended: + - arm32: CONFIG_CRYPTO_CHACHA20_NEON + - arm32: CONFIG_CRYPTO_NHPOLY1305_NEON + - arm64: CONFIG_CRYPTO_CHACHA20_NEON + - arm64: CONFIG_CRYPTO_NHPOLY1305_NEON + - x86: CONFIG_CRYPTO_CHACHA20_X86_64 + - x86: CONFIG_CRYPTO_NHPOLY1305_SSE2 + - x86: CONFIG_CRYPTO_NHPOLY1305_AVX2 + +- AES-128-CBC-ESSIV and AES-128-CTS-CBC: + - Mandatory: + - CONFIG_CRYPTO_ESSIV + - CONFIG_CRYPTO_SHA256 or another SHA-256 implementation + - Recommended: + - AES-CBC acceleration + +fscrypt also uses HMAC-SHA512 for key derivation, so enabling SHA-512 +acceleration is recommended: + +- SHA-512 + - Recommended: + - arm64: CONFIG_CRYPTO_SHA512_ARM64_CE + - x86: CONFIG_CRYPTO_SHA512_SSSE3 Contents encryption ------------------- @@ -493,7 +560,14 @@ This structure must be initialized as follows: be set to constants from ``<linux/fscrypt.h>`` which identify the encryption modes to use. If unsure, use FSCRYPT_MODE_AES_256_XTS (1) for ``contents_encryption_mode`` and FSCRYPT_MODE_AES_256_CTS - (4) for ``filenames_encryption_mode``. + (4) for ``filenames_encryption_mode``. For details, see `Encryption + modes and usage`_. + + v1 encryption policies only support three combinations of modes: + (FSCRYPT_MODE_AES_256_XTS, FSCRYPT_MODE_AES_256_CTS), + (FSCRYPT_MODE_AES_128_CBC, FSCRYPT_MODE_AES_128_CTS), and + (FSCRYPT_MODE_ADIANTUM, FSCRYPT_MODE_ADIANTUM). v2 policies support + all combinations documented in `Supported modes`_. 
- ``flags`` contains optional flags from ``<linux/fscrypt.h>``: diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst index cb845e8e5435..13e4b18e5dbb 100644 --- a/Documentation/filesystems/fsverity.rst +++ b/Documentation/filesystems/fsverity.rst @@ -326,6 +326,8 @@ the file has fs-verity enabled. This can perform better than FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require opening the file, and opening verity files can be expensive. +.. _accessing_verity_files: + Accessing verity files ====================== diff --git a/Documentation/filesystems/idmappings.rst b/Documentation/filesystems/idmappings.rst index ad6d21640576..d095c5838f94 100644 --- a/Documentation/filesystems/idmappings.rst +++ b/Documentation/filesystems/idmappings.rst @@ -146,9 +146,10 @@ For the rest of this document we will prefix all userspace ids with ``u`` and all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So an idmapping will be written as ``u0:k10000:r10000``. -For example, the id ``u1000`` is an id in the upper idmapset or "userspace -idmapset" starting with ``u1000``. And it is mapped to ``k11000`` which is a -kernel id in the lower idmapset or "kernel idmapset" starting with ``k10000``. +For example, within this idmapping, the id ``u1000`` is an id in the upper +idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to +``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset" +starting with ``k10000``. A kernel id is always created by an idmapping. Such idmappings are associated with user namespaces. Since we mainly care about how idmappings work we're not @@ -373,6 +374,13 @@ kernel maps the caller's userspace id down into a kernel id according to the caller's idmapping and then maps that kernel id up according to the filesystem's idmapping. +From the implementation point it's worth mentioning how idmappings are represented. 
+All idmappings are taken from the corresponding user namespace. + + - caller's idmapping (usually taken from ``current_user_ns()``) + - filesystem's idmapping (``sb->s_user_ns``) + - mount's idmapping (``mnt_idmap(vfsmnt)``) + Let's see some examples with caller/filesystem idmapping but without mount idmappings. This will exhibit some problems we can hit. After that we will revisit/reconsider these examples, this time using mount idmappings, to see how diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index eb252fc972aa..09cade7eaefc 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -122,6 +122,7 @@ Documentation for filesystem implementations. virtiofs vfat xfs-delayed-logging-design + xfs-maintainer-entry-profile xfs-self-describing-metadata xfs-online-fsck-design zonefs diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 0ca479dbb1cd..2fd01b9aaced 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -85,13 +85,14 @@ prototypes:: struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); struct posix_acl * (*get_acl)(struct mnt_idmap *, struct dentry *, int); + struct offset_ctx *(*get_offset_ctx)(struct inode *inode); locking rules: all may block -============== ============================================= +============== ================================================== ops i_rwsem(inode) -============== ============================================= +============== ================================================== lookup: shared create: exclusive link: exclusive (both) @@ -115,7 +116,8 @@ atomic_open: shared (exclusive if O_CREAT is set in open flags) tmpfile: no fileattr_get: no or exclusive fileattr_set: exclusive -============== ============================================= +get_offset_ctx no +============== 
================================================== Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem @@ -374,10 +376,17 @@ invalidate_lock before invalidating page cache in truncate / hole punch path (and thus calling into ->invalidate_folio) to block races between page cache invalidation and page cache filling functions (fault, read, ...). -->release_folio() is called when the kernel is about to try to drop the -buffers from the folio in preparation for freeing it. It returns false to -indicate that the buffers are (or may be) freeable. If ->release_folio is -NULL, the kernel assumes that the fs has no private interest in the buffers. +->release_folio() is called when the MM wants to make a change to the +folio that would invalidate the filesystem's private data. For example, +it may be about to be removed from the address_space or split. The folio +is locked and not under writeback. It may be dirty. The gfp parameter +is not usually used for allocation, but rather to indicate what the +filesystem may do to attempt to free the private data. The filesystem may +return false to indicate that the folio's private data cannot be freed. +If it returns true, it should have already removed the private data from +the folio. If a filesystem does not provide a ->release_folio method, +the pagecache will assume that private data is buffer_heads and call +try_to_free_buffers(). ->free_folio() is called when the kernel has dropped the folio from the page cache. 
@@ -627,26 +636,29 @@ vm_operations_struct prototypes:: - void (*open)(struct vm_area_struct*); - void (*close)(struct vm_area_struct*); - vm_fault_t (*fault)(struct vm_area_struct*, struct vm_fault *); + void (*open)(struct vm_area_struct *); + void (*close)(struct vm_area_struct *); + vm_fault_t (*fault)(struct vm_fault *); + vm_fault_t (*huge_fault)(struct vm_fault *, unsigned int order); + vm_fault_t (*map_pages)(struct vm_fault *, pgoff_t start, pgoff_t end); vm_fault_t (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *); vm_fault_t (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *); int (*access)(struct vm_area_struct *, unsigned long, void*, int, int); locking rules: -============= ========= =========================== +============= ========== =========================== ops mmap_lock PageLocked(page) -============= ========= =========================== -open: yes -close: yes -fault: yes can return with page locked -map_pages: read -page_mkwrite: yes can return with page locked -pfn_mkwrite: yes -access: yes -============= ========= =========================== +============= ========== =========================== +open: write +close: read/write +fault: read can return with page locked +huge_fault: maybe-read +map_pages: maybe-read +page_mkwrite: read can return with page locked +pfn_mkwrite: read +access: read +============= ========== =========================== ->fault() is called when a previously not present pte is about to be faulted in. The filesystem must find and return the page associated with the passed in @@ -656,11 +668,18 @@ then ensure the page is not already truncated (invalidate_lock will block subsequent truncate), and then return with VM_FAULT_LOCKED, and the page locked. The VM will unlock the page. +->huge_fault() is called when there is no PUD or PMD entry present. This +gives the filesystem the opportunity to install a PUD or PMD sized page. 
+Filesystems can also use the ->fault method to return a PMD sized page, +so implementing this function may not be necessary. In particular, +filesystems should not call filemap_fault() from ->huge_fault(). +The mmap_lock may not be held when this method is called. + ->map_pages() is called when VM asks to map easy accessible pages. Filesystem should find and map pages associated with offsets from "start_pgoff" till "end_pgoff". ->map_pages() is called with the RCU lock held and must not block. If it's not possible to reach a page without blocking, -filesystem should skip it. Filesystem should use do_set_pte() to setup +filesystem should skip it. Filesystem should use set_pte_range() to setup page table entry. Pointer to entry associated with the page is passed in "pte" field in vm_fault structure. Pointers to entries for other offsets should be calculated relative to "pte". diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index eb7d2c88ddec..35853906accb 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -405,6 +405,53 @@ when a "metacopy" file in one of the lower layers above it, has a "redirect" to the absolute path of the "lower data" file in the "data-only" lower layer. +fs-verity support +---------------------- + +During metadata copy up of a lower file, if the source file has +fs-verity enabled and overlay verity support is enabled, then the +digest of the lower file is added to the "trusted.overlay.metacopy" +xattr. This is then used to verify the content of the lower file +each time the metacopy file is opened. + +When a layer containing verity xattrs is used, it means that any such +metacopy file in the upper layer is guaranteed to match the content +that was in the lower at the time of the copy-up. 
If at any time +(during a mount, after a remount, etc.) such a file in the lower is +replaced or modified in any way, access to the corresponding file in +overlayfs will result in EIO errors (either on open, due to overlayfs +digest check, or from a later read due to fs-verity) and a detailed +error is printed to the kernel logs. For more details of how fs-verity +file access works, see :ref:`Documentation/filesystems/fsverity.rst +<accessing_verity_files>`. + +Verity can be used as a general robustness check to detect accidental +changes in the overlayfs directories in use. But, with additional care +it can also give more powerful guarantees. For example, if the upper +layer is fully trusted (by using dm-verity or something similar), then +an untrusted lower layer can be used to supply validated file content +for all metacopy files. If additionally the untrusted lower +directories are specified as "Data-only", then they can only supply +such file content, and the entire mount can be trusted to match the +upper layer. + +This feature is controlled by the "verity" mount option, which +supports these values: + +- "off": + The metacopy digest is never generated or used. This is the + default if verity option is not specified. +- "on": + Whenever a metacopy file specifies an expected digest, the + corresponding data file must match the specified digest. When + generating a metacopy file the verity digest will be set in it + based on the source file (if it has one). +- "require": + Same as "on", but additionally all metacopy files must specify a + digest (or EIO is returned on open). This means metadata copy up + will only be used if the data file has fs-verity enabled, + otherwise a full copy-up is used. + Sharing and copying layers -------------------------- @@ -610,6 +657,31 @@ can be useful in case the underlying disk is copied and the UUID of this copy is changed. 
This is only applicable if all lower/upper/work directories are on the same filesystem, otherwise it will fallback to normal behaviour. + +UUID and fsid +------------- + +The UUID of overlayfs instance itself and the fsid reported by statfs(2) are +controlled by the "uuid" mount option, which supports these values: + +- "null": + UUID of overlayfs is null. fsid is taken from uppermost filesystem. +- "off": + UUID of overlayfs is null. fsid is taken from uppermost filesystem. + UUID of underlying layers is ignored. +- "on": + UUID of overlayfs is generated and used to report a unique fsid. + UUID is stored in xattr "trusted.overlay.uuid", making overlayfs fsid + unique and persistent. This option requires an overlayfs with upper + filesystem that supports xattrs. +- "auto": (default) + UUID is taken from xattr "trusted.overlay.uuid" if it exists. + Upgrade to "uuid=on" on first time mount of new overlay filesystem that + meets the prerequisites. + Downgrade to "uuid=null" for existing overlay filesystems that were never + mounted with "uuid=on". + + Volatile mount -------------- diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index 0f5da78ef4f9..98969d713e2e 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -938,3 +938,14 @@ file pointer instead of struct dentry pointer. d_tmpfile() is similarly changed to simplify callers. The passed file is in a non-open state and on success must be opened before returning (e.g. by calling finish_open_simple()). + +--- + +**mandatory** + +Calling convention for ->huge_fault has changed. It now takes a page +order instead of an enum page_entry_size, and it may be called without the +mmap_lock held. All in-tree users have been audited and do not seem to +depend on the mmap_lock being held, but out of tree users should verify +for themselves. If they do need it, they can return VM_FAULT_RETRY to +be called with the mmap_lock held. 
diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst index 2cd8fa332feb..56a26c843dbe 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -21,8 +21,8 @@ explained further below, some of which can be reconfigured dynamically on the fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs filesystem can be resized but it cannot be resized to a size below its current usage. tmpfs also supports POSIX ACLs, and extended attributes for the -trusted.* and security.* namespaces. ramfs does not use swap and you cannot -modify any parameter for a ramfs filesystem. The size limit of a ramfs +trusted.*, security.* and user.* namespaces. ramfs does not use swap and you +cannot modify any parameter for a ramfs filesystem. The size limit of a ramfs filesystem is how much memory you have available, and so care must be taken if used so to not run out of memory. @@ -97,6 +97,9 @@ mount with such options, since it allows any user with write access to use up all the memory on the machine; but enhances the scalability of that instance in a system with many CPUs making intensive use of it. +If nr_inodes is not 0, that limited space for inodes is also used up by +extended attributes: "df -i"'s IUsed and IUse% increase, IFree decreases. + tmpfs blocks may be swapped out, when there is a shortage of memory. tmpfs has a mount option to disable its use of swap: @@ -123,6 +126,37 @@ sysfs file /sys/kernel/mm/transparent_hugepage/shmem_enabled: which can be used to deny huge pages on all tmpfs mounts in an emergency, or to force huge pages on all tmpfs mounts for testing. +tmpfs also supports quota with the following mount options + +======================== ================================================= +quota User and group quota accounting and enforcement + is enabled on the mount. Tmpfs is using hidden + system quota files that are initialized on mount. 
+usrquota User quota accounting and enforcement is enabled + on the mount. +grpquota Group quota accounting and enforcement is enabled + on the mount. +usrquota_block_hardlimit Set global user quota block hard limit. +usrquota_inode_hardlimit Set global user quota inode hard limit. +grpquota_block_hardlimit Set global group quota block hard limit. +grpquota_inode_hardlimit Set global group quota inode hard limit. +======================== ================================================= + +None of the quota related mount options can be set or changed on remount. + +Quota limit parameters accept a suffix k, m or g for kilo, mega and giga +and can't be changed on remount. Default global quota limits are taking +effect for any and all user/group/project except root the first time the +quota entry for user/group/project id is being accessed - typically the +first time an inode with a particular id ownership is being created after +the mount. In other words, instead of the limits being initialized to zero, +they are initialized with the particular value provided with these mount +options. The limits can be changed for any user/group id at any time as they +normally can be. + +Note that tmpfs quotas do not support user namespaces so no uid/gid +translation is done if quotas are enabled inside user namespaces. + tmpfs has a mount option to set the NUMA memory allocation policy for all files in that instance (if CONFIG_NUMA is enabled) - which can be adjusted on the fly via 'mount -o remount ...' diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index cb2a97e49872..f8fe815ab1f3 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -260,9 +260,11 @@ filesystem. 
The following members are defined: void (*evict_inode) (struct inode *); void (*put_super) (struct super_block *); int (*sync_fs)(struct super_block *sb, int wait); - int (*freeze_super) (struct super_block *); + int (*freeze_super) (struct super_block *sb, + enum freeze_holder who); int (*freeze_fs) (struct super_block *); - int (*thaw_super) (struct super_block *); + int (*thaw_super) (struct super_block *sb, + enum freeze_holder who); int (*unfreeze_fs) (struct super_block *); int (*statfs) (struct dentry *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); @@ -515,6 +517,7 @@ As of kernel 2.6.22, the following members are defined: int (*fileattr_set)(struct mnt_idmap *idmap, struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); + struct offset_ctx *(*get_offset_ctx)(struct inode *inode); }; Again, all methods are called without any locks being held, unless @@ -675,7 +678,10 @@ otherwise noted. called on ioctl(FS_IOC_SETFLAGS) and ioctl(FS_IOC_FSSETXATTR) to change miscellaneous file flags and attributes. Callers hold i_rwsem exclusive. If unset, then fall back to f_op->ioctl(). - +``get_offset_ctx`` + called to get the offset context for a directory inode. A + filesystem must define this operation to use + simple_offset_dir_operations. The Address Space Object ======================== diff --git a/Documentation/filesystems/xfs-maintainer-entry-profile.rst b/Documentation/filesystems/xfs-maintainer-entry-profile.rst new file mode 100644 index 000000000000..32b6ac4ca9d6 --- /dev/null +++ b/Documentation/filesystems/xfs-maintainer-entry-profile.rst @@ -0,0 +1,194 @@ +XFS Maintainer Entry Profile +============================ + +Overview +-------- +XFS is a well known high-performance filesystem in the Linux kernel. +The aim of this project is to provide and maintain a robust and +performant filesystem. 
+ +Patches are generally merged to the for-next branch of the appropriate +git repository. +After a testing period, the for-next branch is merged to the master +branch. + +Kernel code are merged to the xfs-linux tree[0]. +Userspace code are merged to the xfsprogs tree[1]. +Test cases are merged to the xfstests tree[2]. +Ondisk format documentation are merged to the xfs-documentation tree[3]. + +All patchsets involving XFS *must* be cc'd in their entirety to the mailing +list linux-xfs@vger.kernel.org. + +Roles +----- +There are eight key roles in the XFS project. +A person can take on multiple roles, and a role can be filled by +multiple people. +Anyone taking on a role is advised to check in with themselves and +others on a regular basis about burnout. + +- **Outside Contributor**: Anyone who sends a patch but is not involved + in the XFS project on a regular basis. + These folks are usually people who work on other filesystems or + elsewhere in the kernel community. + +- **Developer**: Someone who is familiar with the XFS codebase enough to + write new code, documentation, and tests. + + Developers can often be found in the IRC channel mentioned by the ``C:`` + entry in the kernel MAINTAINERS file. + +- **Senior Developer**: A developer who is very familiar with at least + some part of the XFS codebase and/or other subsystems in the kernel. + These people collectively decide the long term goals of the project + and nudge the community in that direction. + They should help prioritize development and review work for each release + cycle. + + Senior developers tend to be more active participants in the IRC channel. + +- **Reviewer**: Someone (most likely also a developer) who reads code + submissions to decide: + + 0. Is the idea behind the contribution sound? + 1. Does the idea fit the goals of the project? + 2. Is the contribution designed correctly? + 3. Is the contribution polished? + 4. Can the contribution be tested effectively? 
+ + Reviewers should identify themselves with an ``R:`` entry in the kernel + and fstests MAINTAINERS files. + +- **Testing Lead**: This person is responsible for setting the test + coverage goals of the project, negotiating with developers to decide + on new tests for new features, and making sure that developers and + release managers execute on the testing. + + The testing lead should identify themselves with an ``M:`` entry in + the XFS section of the fstests MAINTAINERS file. + +- **Bug Triager**: Someone who examines incoming bug reports in just + enough detail to identify the person to whom the report should be + forwarded. + + The bug triagers should identify themselves with a ``B:`` entry in + the kernel MAINTAINERS file. + +- **Release Manager**: This person merges reviewed patchsets into an + integration branch, tests the result locally, pushes the branch to a + public git repository, and sends pull requests further upstream. + The release manager is not expected to work on new feature patchsets. + If a developer and a reviewer fail to reach a resolution on some point, + the release manager must have the ability to intervene to try to drive a + resolution. + + The release manager should identify themselves with an ``M:`` entry in + the kernel MAINTAINERS file. + +- **Community Manager**: This person calls and moderates meetings of as many + XFS participants as they can get when mailing list discussions prove + insufficient for collective decision-making. + They may also serve as liaison between managers of the organizations + sponsoring work on any part of XFS. + +- **LTS Maintainer**: Someone who backports and tests bug fixes from + upstream to the LTS kernels. + There tend to be six separate LTS trees at any given time. + + The maintainer for a given LTS release should identify themselves with an + ``M:`` entry in the MAINTAINERS file for that LTS tree. + Unmaintained LTS kernels should be marked with status ``S: Orphan`` in that + same file.
+ +Submission Checklist Addendum +----------------------------- +Please follow these additional rules when submitting to XFS: + +- Patches affecting only the filesystem itself should be based against + the latest -rc or the for-next branch. + These patches will be merged back to the for-next branch. + +- Authors of patches touching other subsystems need to coordinate with + the maintainers of XFS and the relevant subsystems to decide how to + proceed with a merge. + +- Any patchset changing XFS should be cc'd in its entirety to linux-xfs. + Do not send partial patchsets; that makes analysis of the broader + context of the changes unnecessarily difficult. + +- Anyone making kernel changes that have corresponding changes to the + userspace utilities should send the userspace changes as separate + patchsets immediately after the kernel patchsets. + +- Authors of bug fix patches are expected to use fstests[2] to perform + an A/B test of the patch to determine that there are no regressions. + When possible, a new regression test case should be written for + fstests. + +- Authors of new feature patchsets must ensure that fstests will have + appropriate functional and input corner-case test cases for the new + feature. + +- When implementing a new feature, it is strongly suggested that the + developers write a design document to answer the following questions: + + * **What** problem is this trying to solve? + + * **Who** will benefit from this solution, and **where** will they + access it? + + * **How** will this new feature work? This should touch on major data + structures and algorithms supporting the solution at a higher level + than code comments. + + * **What** userspace interfaces are necessary to build off of the new + features? + + * **How** will this work be tested to ensure that it solves the + problems laid out in the design document without causing new + problems? + + The design document should be committed in the kernel documentation + directory. 
+ It may be omitted if the feature is already well known to the + community. + +- Patchsets for the new tests should be submitted as separate patchsets + immediately after the kernel and userspace code patchsets. + +- Changes to the on-disk format of XFS must be described in the ondisk + format document[3] and submitted as a patchset after the fstests + patchsets. + +- Patchsets implementing bug fixes and further code cleanups should put + the bug fixes at the beginning of the series to ease backporting. + +Key Release Cycle Dates +----------------------- +Bug fixes may be sent at any time, though the release manager may decide to +defer a patch when the next merge window is close. + +Code submissions targeting the next merge window should be sent between +-rc1 and -rc6. +This gives the community time to review the changes, to suggest other changes, +and for the author to retest those changes. + +Code submissions also requiring changes to fs/iomap and targeting the +next merge window should be sent between -rc1 and -rc4. +This allows the broader kernel community adequate time to test the +infrastructure changes. + +Review Cadence +-------------- +In general, please wait at least one week before pinging for feedback. +To find reviewers, either consult the MAINTAINERS file, or ask +developers that have Reviewed-by tags for XFS changes to take a look and +offer their opinion. 
+ +References +---------- +| [0] https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/ +| [1] https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/ +| [2] https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/ +| [3] https://git.kernel.org/pub/scm/fs/xfs/xfs-documentation.git/ diff --git a/Documentation/firmware-guide/acpi/chromeos-acpi-device.rst b/Documentation/firmware-guide/acpi/chromeos-acpi-device.rst index f37fc90ce340..89419e116413 100644 --- a/Documentation/firmware-guide/acpi/chromeos-acpi-device.rst +++ b/Documentation/firmware-guide/acpi/chromeos-acpi-device.rst @@ -5,9 +5,8 @@ Chrome OS ACPI Device ===================== Hardware functionality specific to Chrome OS is exposed through a Chrome OS ACPI device. -The plug and play ID of a Chrome OS ACPI device is GGL0001. GGL is a valid PNP ID of Google. -PNP ID can be used with the ACPI devices according to the guidelines. The following ACPI -objects are supported: +The plug and play ID of a Chrome OS ACPI device is GGL0001 and the hardware ID is +GOOG0016. The following ACPI objects are supported: .. flat-table:: Supported ACPI Objects :widths: 1 2 diff --git a/Documentation/hwmon/hs3001.rst b/Documentation/hwmon/hs3001.rst new file mode 100644 index 000000000000..9f59dfc212d9 --- /dev/null +++ b/Documentation/hwmon/hs3001.rst @@ -0,0 +1,37 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Kernel driver HS3001 +==================== + +Supported chips: + + * Renesas HS3001, HS3002, HS3003, HS3004 + + Prefix: 'hs3001' + + Addresses scanned: - + + Datasheet: https://www.renesas.com/us/en/document/dst/hs300x-datasheet?r=417401 + +Author: + + - Andre Werner <andre.werner@systec-electronic.com> + +Description +----------- + +This driver implements support for the Renesas HS3001 chips, a humidity +and temperature family. Temperature is measured in degrees celsius, relative +humidity is expressed as a percentage. In the sysfs interface, all values are +scaled by 1000, i.e. 
the value for 31.5 degrees celsius is 31500. + +The device communicates with the I2C protocol. Sensors have the I2C +address 0x44 by default. + +sysfs-Interface +--------------- + +=================== ================= +temp1_input: temperature input +humidity1_input: humidity input +=================== ================= diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst index 042e1cf9501b..88dadea85cfc 100644 --- a/Documentation/hwmon/index.rst +++ b/Documentation/hwmon/index.rst @@ -78,6 +78,7 @@ Hardware Monitoring Kernel Drivers gxp-fan-ctrl hih6130 hp-wmi-sensors + hs3001 ibmaem ibm-cffps ibmpowernv @@ -195,7 +196,6 @@ Hardware Monitoring Kernel Drivers shtc1 sis5595 sl28cpld - smm665 smpro-hwmon smsc47b397 smsc47m192 diff --git a/Documentation/hwmon/nct6775.rst b/Documentation/hwmon/nct6775.rst index 5ba8276aad4b..9d7a10de61a7 100644 --- a/Documentation/hwmon/nct6775.rst +++ b/Documentation/hwmon/nct6775.rst @@ -80,7 +80,13 @@ Supported chips: Datasheet: Available from Nuvoton upon request + * Nuvoton NCT6796D-S/NCT6799D-R + Prefix: 'nct6799' + + Addresses scanned: ISA address retrieved from Super I/O registers + + Datasheet: Available from Nuvoton upon request Authors: @@ -277,4 +283,7 @@ will not reflect a usable value. It often reports unreasonably high temperatures, and in some cases the reported temperature declines if the actual temperature increases (similar to the raw PECI temperature value - see PECI specification for details). CPUTIN should therefore be ignored on ASUS -boards. The CPU temperature on ASUS boards is reported from PECI 0. +boards. The CPU temperature on ASUS boards is reported from PECI 0 or TSI 0. + +NCT6796D-S and NCT6799D-R chips are very similar and their chip_id indicates +they are different versions. This driver treats them the same way. 
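The HS3001 description above states that sysfs values are scaled by 1000 (31.5 degrees Celsius reads as 31500). A minimal userspace sketch of that conversion follows. It is an illustration only, not part of the patch: the helper names and the example directory ``/sys/class/hwmon/hwmon0`` are assumptions, and the hwmon index assigned to the sensor varies from system to system.

```python
from pathlib import Path


def milli_to_unit(raw: str) -> float:
    """Convert a sysfs reading that is scaled by 1000 into its base unit."""
    return int(raw.strip()) / 1000.0


def read_hs3001(hwmon_dir: str) -> dict:
    """Read temperature and humidity from a hwmon directory.

    hwmon_dir is hypothetical, e.g. "/sys/class/hwmon/hwmon0"; look up the
    directory whose "name" file matches the driver on the target system.
    """
    base = Path(hwmon_dir)
    return {
        # temp1_input is in millidegrees Celsius
        "temperature_c": milli_to_unit((base / "temp1_input").read_text()),
        # humidity1_input is relative humidity in thousandths of a percent
        "humidity_percent": milli_to_unit((base / "humidity1_input").read_text()),
    }
```

Calling ``read_hs3001()`` with the sensor's hwmon directory returns both readings in their base units. The conversion is kept in its own function so it can be checked without the hardware present.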
diff --git a/Documentation/hwmon/pmbus.rst b/Documentation/hwmon/pmbus.rst index 7ecfec6ca2db..eb1569bfa676 100644 --- a/Documentation/hwmon/pmbus.rst +++ b/Documentation/hwmon/pmbus.rst @@ -163,7 +163,7 @@ Emerson DS1200 power modules might look as follows:: .driver = { .name = "ds1200", }, - .probe_new = ds1200_probe, + .probe = ds1200_probe, .id_table = ds1200_id, }; diff --git a/Documentation/hwmon/smm665.rst b/Documentation/hwmon/smm665.rst deleted file mode 100644 index 481e69d8bf39..000000000000 --- a/Documentation/hwmon/smm665.rst +++ /dev/null @@ -1,187 +0,0 @@ -Kernel driver smm665 -==================== - -Supported chips: - - * Summit Microelectronics SMM465 - - Prefix: 'smm465' - - Addresses scanned: - - - Datasheet: - - http://www.summitmicro.com/prod_select/summary/SMM465/SMM465DS.pdf - - * Summit Microelectronics SMM665, SMM665B - - Prefix: 'smm665' - - Addresses scanned: - - - Datasheet: - - http://www.summitmicro.com/prod_select/summary/SMM665/SMM665B_2089_20.pdf - - * Summit Microelectronics SMM665C - - Prefix: 'smm665c' - - Addresses scanned: - - - Datasheet: - - http://www.summitmicro.com/prod_select/summary/SMM665C/SMM665C_2125.pdf - - * Summit Microelectronics SMM764 - - Prefix: 'smm764' - - Addresses scanned: - - - Datasheet: - - http://www.summitmicro.com/prod_select/summary/SMM764/SMM764_2098.pdf - - * Summit Microelectronics SMM766, SMM766B - - Prefix: 'smm766' - - Addresses scanned: - - - Datasheets: - - http://www.summitmicro.com/prod_select/summary/SMM766/SMM766_2086.pdf - - http://www.summitmicro.com/prod_select/summary/SMM766B/SMM766B_2122.pdf - -Author: Guenter Roeck <linux@roeck-us.net> - - -Module Parameters ------------------ - -* vref: int - Default: 1250 (mV) - - Reference voltage on VREF_ADC pin in mV. It should not be necessary to set - this parameter unless a non-default reference voltage is used. 
- - -Description ------------ - -[From datasheet] The SMM665 is an Active DC Output power supply Controller -that monitors, margins and cascade sequences power. The part monitors six -power supply channels as well as VDD, 12V input, two general-purpose analog -inputs and an internal temperature sensor using a 10-bit ADC. - -Each monitored channel has its own high and low limits, plus a critical -limit. - -Support for SMM465, SMM764, and SMM766 has been implemented but is untested. - - -Usage Notes ------------ - -This driver does not probe for devices, since there is no register which -can be safely used to identify the chip. You will have to instantiate -the devices explicitly. When instantiating the device, you have to specify -its configuration register address. - -Example: the following will load the driver for an SMM665 at address 0x57 -on I2C bus #1:: - - $ modprobe smm665 - $ echo smm665 0x57 > /sys/bus/i2c/devices/i2c-1/new_device - - -Sysfs entries -------------- - -This driver uses the values in the datasheet to convert ADC register values -into the values specified in the sysfs-interface document. All attributes are -read only. - -Min, max, lcrit, and crit values are used by the chip to trigger external signals -and/or other activity. Triggered signals can include HEALTHY, RST, Power Off, -or Fault depending on the chip configuration. The driver reports values as lcrit -or crit if exceeding the limits triggers RST, Power Off, or Fault, and as min or -max otherwise. For details please see the SMM665 datasheet. - -For SMM465 and SMM764, values for Channel E and F are reported but undefined. 
- -======================= ======================================================= -in1_input 12V input voltage (mV) -in2_input 3.3V (VDD) input voltage (mV) -in3_input Channel A voltage (mV) -in4_input Channel B voltage (mV) -in5_input Channel C voltage (mV) -in6_input Channel D voltage (mV) -in7_input Channel E voltage (mV) -in8_input Channel F voltage (mV) -in9_input AIN1 voltage (mV) -in10_input AIN2 voltage (mV) - -in1_min 12v input minimum voltage (mV) -in2_min 3.3V (VDD) input minimum voltage (mV) -in3_min Channel A minimum voltage (mV) -in4_min Channel B minimum voltage (mV) -in5_min Channel C minimum voltage (mV) -in6_min Channel D minimum voltage (mV) -in7_min Channel E minimum voltage (mV) -in8_min Channel F minimum voltage (mV) -in9_min AIN1 minimum voltage (mV) -in10_min AIN2 minimum voltage (mV) - -in1_max 12v input maximum voltage (mV) -in2_max 3.3V (VDD) input maximum voltage (mV) -in3_max Channel A maximum voltage (mV) -in4_max Channel B maximum voltage (mV) -in5_max Channel C maximum voltage (mV) -in6_max Channel D maximum voltage (mV) -in7_max Channel E maximum voltage (mV) -in8_max Channel F maximum voltage (mV) -in9_max AIN1 maximum voltage (mV) -in10_max AIN2 maximum voltage (mV) - -in1_lcrit 12v input critical minimum voltage (mV) -in2_lcrit 3.3V (VDD) input critical minimum voltage (mV) -in3_lcrit Channel A critical minimum voltage (mV) -in4_lcrit Channel B critical minimum voltage (mV) -in5_lcrit Channel C critical minimum voltage (mV) -in6_lcrit Channel D critical minimum voltage (mV) -in7_lcrit Channel E critical minimum voltage (mV) -in8_lcrit Channel F critical minimum voltage (mV) -in9_lcrit AIN1 critical minimum voltage (mV) -in10_lcrit AIN2 critical minimum voltage (mV) - -in1_crit 12v input critical maximum voltage (mV) -in2_crit 3.3V (VDD) input critical maximum voltage (mV) -in3_crit Channel A critical maximum voltage (mV) -in4_crit Channel B critical maximum voltage (mV) -in5_crit Channel C critical maximum voltage (mV) -in6_crit 
Channel D critical maximum voltage (mV) -in7_crit Channel E critical maximum voltage (mV) -in8_crit Channel F critical maximum voltage (mV) -in9_crit AIN1 critical maximum voltage (mV) -in10_crit AIN2 critical maximum voltage (mV) - -in1_crit_alarm 12v input critical alarm -in2_crit_alarm 3.3V (VDD) input critical alarm -in3_crit_alarm Channel A critical alarm -in4_crit_alarm Channel B critical alarm -in5_crit_alarm Channel C critical alarm -in6_crit_alarm Channel D critical alarm -in7_crit_alarm Channel E critical alarm -in8_crit_alarm Channel F critical alarm -in9_crit_alarm AIN1 critical alarm -in10_crit_alarm AIN2 critical alarm - -temp1_input Chip temperature -temp1_min Minimum chip temperature -temp1_max Maximum chip temperature -temp1_crit Critical chip temperature -temp1_crit_alarm Temperature critical alarm -======================= ======================================================= diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst index cfd37f31077f..6b64072d4bf2 100644 --- a/Documentation/maintainer/maintainer-entry-profile.rst +++ b/Documentation/maintainer/maintainer-entry-profile.rst @@ -105,3 +105,4 @@ to do something different in the near future. ../driver-api/media/maintainer-entry-profile ../driver-api/vfio-pci-device-specific-driver-acceptance ../nvme/feature-and-quirk-policy + ../filesystems/xfs-maintainer-entry-profile diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index 4bfdf1d30c4a..a20383d01a95 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -380,12 +380,24 @@ number of filters for each scheme. Each filter specifies the type of target memory, and whether it should exclude the memory of the type (filter-out), or all except the memory of the type (filter-in). -As of this writing, anonymous page type and memory cgroup type are supported by -the feature. 
Some filter target types can require additional arguments. For -example, the memory cgroup filter type asks users to specify the file path of -the memory cgroup for the filter. Hence, users can apply specific schemes to -only anonymous pages, non-anonymous pages, pages of specific cgroups, all pages -excluding those of specific cgroups, and any combination of those. +Currently, anonymous page, memory cgroup, address range, and DAMON monitoring +target type filters are supported by the feature. Some filter target types +require additional arguments. The memory cgroup filter type asks users to +specify the file path of the memory cgroup for the filter. The address range +type asks for the start and end addresses of the range. The DAMON monitoring +target type asks for the index of the target from the context's monitoring targets +list. Hence, users can apply specific schemes to only anonymous pages, +non-anonymous pages, pages of specific cgroups, all pages excluding those of +specific cgroups, pages in a specific address range, pages in specific DAMON +monitoring targets, and any combination of those. + +To handle filters efficiently, the address range and DAMON monitoring target +type filters are handled by the core layer, while others are handled by +the operations set. If a memory region is filtered by a core layer-handled filter, +it is not counted as a region the scheme has tried. In contrast, if a +memory region is filtered by an operations set layer-handled filter, it is +counted as a region the scheme has tried. The difference in accounting leads to changes +in the statistics. Application Programming Interface diff --git a/Documentation/mm/frontswap.rst b/Documentation/mm/frontswap.rst deleted file mode 100644 index c892412988af..000000000000 --- a/Documentation/mm/frontswap.rst +++ /dev/null @@ -1,264 +0,0 @@ -========= -Frontswap -========= - -Frontswap provides a "transcendent memory" interface for swap pages.
-In some environments, dramatic performance savings may be obtained because -swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk. - -.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/ - -Frontswap is so named because it can be thought of as the opposite of -a "backing" store for a swap device. The storage is assumed to be -a synchronous concurrency-safe page-oriented "pseudo-RAM device" conforming -to the requirements of transcendent memory (such as Xen's "tmem", or -in-kernel compressed memory, aka "zcache", or future RAM-like devices); -this pseudo-RAM device is not directly accessible or addressable by the -kernel and is of unknown and possibly time-varying size. The driver -links itself to frontswap by calling frontswap_register_ops to set the -frontswap_ops funcs appropriately and the functions it provides must -conform to certain policies as follows: - -An "init" prepares the device to receive frontswap pages associated -with the specified swap device number (aka "type"). A "store" will -copy the page to transcendent memory and associate it with the type and -offset associated with the page. A "load" will copy the page, if found, -from transcendent memory into kernel memory, but will NOT remove the page -from transcendent memory. An "invalidate_page" will remove the page -from transcendent memory and an "invalidate_area" will remove ALL pages -associated with the swap type (e.g., like swapoff) and notify the "device" -to refuse further stores with that swap type. - -Once a page is successfully stored, a matching load on the page will normally -succeed. So when the kernel finds itself in a situation where it needs -to swap out a page, it first attempts to use frontswap. If the store returns -success, the data has been successfully saved to transcendent memory and -a disk write and, if the data is later read back, a disk read are avoided. 
-If a store returns failure, transcendent memory has rejected the data, and the -page can be written to swap as usual. - -Note that if a page is stored and the page already exists in transcendent memory -(a "duplicate" store), either the store succeeds and the data is overwritten, -or the store fails AND the page is invalidated. This ensures stale data may -never be obtained from frontswap. - -If properly configured, monitoring of frontswap is done via debugfs in -the `/sys/kernel/debug/frontswap` directory. The effectiveness of -frontswap can be measured (across all swap devices) with: - -``failed_stores`` - how many store attempts have failed - -``loads`` - how many loads were attempted (all should succeed) - -``succ_stores`` - how many store attempts have succeeded - -``invalidates`` - how many invalidates were attempted - -A backend implementation may provide additional metrics. - -FAQ -=== - -* Where's the value? - -When a workload starts swapping, performance falls through the floor. -Frontswap significantly increases performance in many such workloads by -providing a clean, dynamic interface to read and write swap pages to -"transcendent memory" that is otherwise not directly addressable to the kernel. -This interface is ideal when data is transformed to a different form -and size (such as with compression) or secretly moved (as might be -useful for write-balancing for some RAM-like devices). Swap pages (and -evicted page-cache pages) are a great use for this kind of slower-than-RAM- -but-much-faster-than-disk "pseudo-RAM device". - -Frontswap with a fairly small impact on the kernel, -provides a huge amount of flexibility for more dynamic, flexible RAM -utilization in various system configurations: - -In the single kernel case, aka "zcache", pages are compressed and -stored in local memory, thus increasing the total anonymous pages -that can be safely kept in RAM. 
Zcache essentially trades off CPU -cycles used in compression/decompression for better memory utilization. -Benchmarks have shown little or no impact when memory pressure is -low while providing a significant performance improvement (25%+) -on some workloads under high memory pressure. - -"RAMster" builds on zcache by adding "peer-to-peer" transcendent memory -support for clustered systems. Frontswap pages are locally compressed -as in zcache, but then "remotified" to another system's RAM. This -allows RAM to be dynamically load-balanced back-and-forth as needed, -i.e. when system A is overcommitted, it can swap to system B, and -vice versa. RAMster can also be configured as a memory server so -many servers in a cluster can swap, dynamically as needed, to a single -server configured with a large amount of RAM... without pre-configuring -how much of the RAM is available for each of the clients! - -In the virtual case, the whole point of virtualization is to statistically -multiplex physical resources across the varying demands of multiple -virtual machines. This is really hard to do with RAM and efforts to do -it well with no kernel changes have essentially failed (except in some -well-publicized special-case workloads). -Specifically, the Xen Transcendent Memory backend allows otherwise -"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple -virtual machines, but the pages can be compressed and deduplicated to -optimize RAM utilization. And when guest OS's are induced to surrender -underutilized RAM (e.g. with "selfballooning"), sudden unexpected -memory pressure may result in swapping; frontswap allows those pages -to be swapped to and from hypervisor RAM (if overall host system memory -conditions allow), thus mitigating the potentially awful performance impact -of unplanned swapping. - -A KVM implementation is underway and has been RFC'ed to lkml. 
And, -using frontswap, investigation is also underway on the use of NVM as -a memory extension technology. - -* Sure there may be performance advantages in some situations, but - what's the space/time overhead of frontswap? - -If CONFIG_FRONTSWAP is disabled, every frontswap hook compiles into -nothingness and the only overhead is a few extra bytes per swapon'ed -swap device. If CONFIG_FRONTSWAP is enabled but no frontswap "backend" -registers, there is one extra global variable compared to zero for -every swap page read or written. If CONFIG_FRONTSWAP is enabled -AND a frontswap backend registers AND the backend fails every "store" -request (i.e. provides no memory despite claiming it might), -CPU overhead is still negligible -- and since every frontswap fail -precedes a swap page write-to-disk, the system is highly likely -to be I/O bound and using a small fraction of a percent of a CPU -will be irrelevant anyway. - -As for space, if CONFIG_FRONTSWAP is enabled AND a frontswap backend -registers, one bit is allocated for every swap page for every swap -device that is swapon'd. This is added to the EIGHT bits (which -was sixteen until about 2.6.34) that the kernel already allocates -for every swap page for every swap device that is swapon'd. (Hugh -Dickins has observed that frontswap could probably steal one of -the existing eight bits, but let's worry about that minor optimization -later.) For very large swap disks (which are rare) on a standard -4K pagesize, this is 1MB per 32GB swap. - -When swap pages are stored in transcendent memory instead of written -out to disk, there is a side effect that this may create more memory -pressure that can potentially outweigh the other advantages. A -backend, such as zcache, must implement policies to carefully (but -dynamically) manage memory limits to ensure this doesn't happen. - -* OK, how about a quick overview of what this frontswap patch does - in terms that a kernel hacker can grok? 
- -Let's assume that a frontswap "backend" has registered during -kernel initialization; this registration indicates that this -frontswap backend has access to some "memory" that is not directly -accessible by the kernel. Exactly how much memory it provides is -entirely dynamic and random. - -Whenever a swap-device is swapon'd frontswap_init() is called, -passing the swap device number (aka "type") as a parameter. -This notifies frontswap to expect attempts to "store" swap pages -associated with that number. - -Whenever the swap subsystem is readying a page to write to a swap -device (c.f swap_writepage()), frontswap_store is called. Frontswap -consults with the frontswap backend and if the backend says it does NOT -have room, frontswap_store returns -1 and the kernel swaps the page -to the swap device as normal. Note that the response from the frontswap -backend is unpredictable to the kernel; it may choose to never accept a -page, it could accept every ninth page, or it might accept every -page. But if the backend does accept a page, the data from the page -has already been copied and associated with the type and offset, -and the backend guarantees the persistence of the data. In this case, -frontswap sets a bit in the "frontswap_map" for the swap device -corresponding to the page offset on the swap device to which it would -otherwise have written the data. - -When the swap subsystem needs to swap-in a page (swap_readpage()), -it first calls frontswap_load() which checks the frontswap_map to -see if the page was earlier accepted by the frontswap backend. If -it was, the page of data is filled from the frontswap backend and -the swap-in is complete. If not, the normal swap-in code is -executed to obtain the page of data from the real swap device. 
- -So every time the frontswap backend accepts a page, a swap device read -and (potentially) a swap device write are replaced by a "frontswap backend -store" and (possibly) a "frontswap backend loads", which are presumably much -faster. - -* Can't frontswap be configured as a "special" swap device that is - just higher priority than any real swap device (e.g. like zswap, - or maybe swap-over-nbd/NFS)? - -No. First, the existing swap subsystem doesn't allow for any kind of -swap hierarchy. Perhaps it could be rewritten to accommodate a hierarchy, -but this would require fairly drastic changes. Even if it were -rewritten, the existing swap subsystem uses the block I/O layer which -assumes a swap device is fixed size and any page in it is linearly -addressable. Frontswap barely touches the existing swap subsystem, -and works around the constraints of the block I/O subsystem to provide -a great deal of flexibility and dynamicity. - -For example, the acceptance of any swap page by the frontswap backend is -entirely unpredictable. This is critical to the definition of frontswap -backends because it grants completely dynamic discretion to the -backend. In zcache, one cannot know a priori how compressible a page is. -"Poorly" compressible pages can be rejected, and "poorly" can itself be -defined dynamically depending on current memory constraints. - -Further, frontswap is entirely synchronous whereas a real swap -device is, by definition, asynchronous and uses block I/O. The -block I/O layer is not only unnecessary, but may perform "optimizations" -that are inappropriate for a RAM-oriented device including delaying -the write of some pages for a significant amount of time. Synchrony is -required to ensure the dynamicity of the backend and to avoid thorny race -conditions that would unnecessarily and greatly complicate frontswap -and/or the block I/O subsystem. That said, only the initial "store" -and "load" operations need be synchronous. 
A separate asynchronous thread -is free to manipulate the pages stored by frontswap. For example, -the "remotification" thread in RAMster uses standard asynchronous -kernel sockets to move compressed frontswap pages to a remote machine. -Similarly, a KVM guest-side implementation could do in-guest compression -and use "batched" hypercalls. - -In a virtualized environment, the dynamicity allows the hypervisor -(or host OS) to do "intelligent overcommit". For example, it can -choose to accept pages only until host-swapping might be imminent, -then force guests to do their own swapping. - -There is a downside to the transcendent memory specifications for -frontswap: Since any "store" might fail, there must always be a real -slot on a real swap device to swap the page. Thus frontswap must be -implemented as a "shadow" to every swapon'd device with the potential -capability of holding every page that the swap device might have held -and the possibility that it might hold no pages at all. This means -that frontswap cannot contain more pages than the total of swapon'd -swap devices. For example, if NO swap device is configured on some -installation, frontswap is useless. Swapless portable devices -can still use frontswap but a backend for such devices must configure -some kind of "ghost" swap device and ensure that it is never used. - -* Why this weird definition about "duplicate stores"? If a page - has been previously successfully stored, can't it always be - successfully overwritten? - -Nearly always it can, but no, sometimes it cannot. Consider an example -where data is compressed and the original 4K page has been compressed -to 1K. Now an attempt is made to overwrite the page with data that -is non-compressible and so would take the entire 4K. But the backend -has no more space. In this case, the store must be rejected. Whenever -frontswap rejects a store that would overwrite, it also must invalidate -the old data and ensure that it is no longer accessible. 
Since the -swap subsystem then writes the new data to the read swap device, -this is the correct course of action to ensure coherency. - -* Why does the frontswap patch create the new include file swapfile.h? - -The frontswap code depends on some swap-subsystem-internal data -structures that have, over the years, moved back and forth between -static and global. This seemed a reasonable compromise: Define -them as global but declare them in a new include file that isn't -included by the large number of source files that include swap.h. - -Dan Magenheimer, last updated April 9, 2012 diff --git a/Documentation/mm/highmem.rst b/Documentation/mm/highmem.rst index c964e0848702..aefb03eb386e 100644 --- a/Documentation/mm/highmem.rst +++ b/Documentation/mm/highmem.rst @@ -206,4 +206,5 @@ Functions ========= .. kernel-doc:: include/linux/highmem.h +.. kernel-doc:: mm/highmem.c .. kernel-doc:: include/linux/highmem-internal.h diff --git a/Documentation/mm/hugetlbfs_reserv.rst b/Documentation/mm/hugetlbfs_reserv.rst index d9c2b0f01dcd..4914fbf07966 100644 --- a/Documentation/mm/hugetlbfs_reserv.rst +++ b/Documentation/mm/hugetlbfs_reserv.rst @@ -271,12 +271,12 @@ to the global reservation count (resv_huge_pages). Freeing Huge Pages ================== -Huge page freeing is performed by the routine free_huge_page(). This routine -is the destructor for hugetlbfs compound pages. As a result, it is only -passed a pointer to the page struct. When a huge page is freed, reservation -accounting may need to be performed. This would be the case if the page was -associated with a subpool that contained reserves, or the page is being freed -on an error path where a global reserve count must be restored. +Huge pages are freed by free_huge_folio(). It is only passed a pointer +to the folio as it is called from the generic MM code. When a huge page +is freed, reservation accounting may need to be performed. 
This would +be the case if the page was associated with a subpool that contained +reserves, or the page is being freed on an error path where a global +reserve count must be restored. The page->private field points to any subpool associated with the page. If the PagePrivate flag is set, it indicates the global reserve count should @@ -525,7 +525,7 @@ However, there are several instances where errors are encountered after a huge page is allocated but before it is instantiated. In this case, the page allocation has consumed the reservation and made the appropriate subpool, reservation map and global count adjustments. If the page is freed at this -time (before instantiation and clearing of PagePrivate), then free_huge_page +time (before instantiation and clearing of PagePrivate), then free_huge_folio will increment the global reservation count. However, the reservation map indicates the reservation was consumed. This resulting inconsistent state will cause the 'leak' of a reserved huge page. The global reserve count will diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst index 5a94a921ea40..31d2ac306438 100644 --- a/Documentation/mm/index.rst +++ b/Documentation/mm/index.rst @@ -44,7 +44,6 @@ above structured documentation, or deleted if it has served its purpose. 
balance damon/index free_page_reporting - frontswap hmm hwpoison hugetlbfs_reserv diff --git a/Documentation/mm/split_page_table_lock.rst b/Documentation/mm/split_page_table_lock.rst index a834fad9de12..e4f6972eb6c0 100644 --- a/Documentation/mm/split_page_table_lock.rst +++ b/Documentation/mm/split_page_table_lock.rst @@ -58,7 +58,7 @@ Support of split page table lock by an architecture =================================================== There's no need in special enabling of PTE split page table lock: everything -required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which +required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which must be called on PTE table allocation / freeing. Make sure the architecture doesn't use slab allocator for page table @@ -68,8 +68,8 @@ This field shares storage with page->ptl. PMD split lock only makes sense if you have more than two page table levels. -PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table -allocation and pgtable_pmd_page_dtor() on freeing. +PMD split lock enabling requires pagetable_pmd_ctor() call on PMD table +allocation and pagetable_pmd_dtor() on freeing. Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing @@ -77,7 +77,7 @@ paths: i.e X86_PAE preallocate few PMDs on pgd_alloc(). With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK. -NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must +NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- it must be handled properly. page->ptl @@ -97,7 +97,7 @@ trick: split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs one more cache line for indirect access; -The spinlock_t allocated in pgtable_pte_page_ctor() for PTE table and in -pgtable_pmd_page_ctor() for PMD table. 
+The spinlock_t allocated in pagetable_pte_ctor() for PTE table and in +pagetable_pmd_ctor() for PMD table. Please, never access page->ptl directly -- use appropriate helper. diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_dedup.rst index a4b12ff906c4..c573e08b5043 100644 --- a/Documentation/mm/vmemmap_dedup.rst +++ b/Documentation/mm/vmemmap_dedup.rst @@ -210,6 +210,7 @@ the device (altmap). The following page sizes are supported in DAX: PAGE_SIZE (4K on x86_64), PMD_SIZE (2M on x86_64) and PUD_SIZE (1G on x86_64). +For powerpc equivalent details see Documentation/powerpc/vmemmap_dedup.rst The differences with HugeTLB are relatively minor. diff --git a/Documentation/mm/zsmalloc.rst b/Documentation/mm/zsmalloc.rst index a3c26d587752..76902835e68e 100644 --- a/Documentation/mm/zsmalloc.rst +++ b/Documentation/mm/zsmalloc.rst @@ -263,3 +263,8 @@ is heavy internal fragmentation and zspool compaction is unable to relocate objects and release zspages. In these cases, it is recommended to decrease the limit on the size of the zspage chains (as specified by the CONFIG_ZSMALLOC_CHAIN_SIZE option). + +Functions +========= + +.. kernel-doc:: mm/zsmalloc.c diff --git a/Documentation/netlink/genetlink-c.yaml b/Documentation/netlink/genetlink-c.yaml index 57d1c1c4918f..9806c44f604c 100644 --- a/Documentation/netlink/genetlink-c.yaml +++ b/Documentation/netlink/genetlink-c.yaml @@ -41,7 +41,7 @@ properties: description: Name of the define for the family name. type: string c-version-name: - description: Name of the define for the verion of the family. + description: Name of the define for the version of the family. type: string max-by-define: description: Makes the number of attributes and commands be specified by a define, not an enum value. @@ -274,7 +274,7 @@ properties: description: Kernel attribute validation flags. type: array items: - enum: [ strict, dump ] + enum: [ strict, dump, dump-strict ] do: &subop-type description: Main command handler. 
type: object diff --git a/Documentation/netlink/genetlink-legacy.yaml b/Documentation/netlink/genetlink-legacy.yaml index 43b769c98fb2..12a0a045605d 100644 --- a/Documentation/netlink/genetlink-legacy.yaml +++ b/Documentation/netlink/genetlink-legacy.yaml @@ -41,7 +41,7 @@ properties: description: Name of the define for the family name. type: string c-version-name: - description: Name of the define for the verion of the family. + description: Name of the define for the version of the family. type: string max-by-define: description: Makes the number of attributes and commands be specified by a define, not an enum value. @@ -321,7 +321,7 @@ properties: description: Kernel attribute validation flags. type: array items: - enum: [ strict, dump ] + enum: [ strict, dump, dump-strict ] # Start genetlink-legacy fixed-header: *fixed-header # End genetlink-legacy diff --git a/Documentation/netlink/genetlink.yaml b/Documentation/netlink/genetlink.yaml index 1cbb448d2f1c..3d338c48bf21 100644 --- a/Documentation/netlink/genetlink.yaml +++ b/Documentation/netlink/genetlink.yaml @@ -243,7 +243,7 @@ properties: description: Kernel attribute validation flags. type: array items: - enum: [ strict, dump ] + enum: [ strict, dump, dump-strict ] do: &subop-type description: Main command handler. 
type: object diff --git a/Documentation/netlink/netlink-raw.yaml b/Documentation/netlink/netlink-raw.yaml new file mode 100644 index 000000000000..896797876414 --- /dev/null +++ b/Documentation/netlink/netlink-raw.yaml @@ -0,0 +1,410 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) +%YAML 1.2 +--- +$id: http://kernel.org/schemas/netlink/netlink-raw.yaml# +$schema: https://json-schema.org/draft-07/schema + +# Common defines +$defs: + uint: + type: integer + minimum: 0 + len-or-define: + type: [ string, integer ] + pattern: ^[0-9A-Za-z_]+( - 1)?$ + minimum: 0 + +# Schema for specs +title: Protocol +description: Specification of a raw netlink protocol +type: object +required: [ name, doc, attribute-sets, operations ] +additionalProperties: False +properties: + name: + description: Name of the netlink family. + type: string + doc: + type: string + protocol: + description: Schema compatibility level. + enum: [ netlink-raw ] # Trim + # Start netlink-raw + protonum: + description: Protocol number to use for netlink-raw + type: integer + # End netlink-raw + uapi-header: + description: Path to the uAPI header, default is linux/${family-name}.h + type: string + # Start genetlink-c + c-family-name: + description: Name of the define for the family name. + type: string + c-version-name: + description: Name of the define for the version of the family. + type: string + max-by-define: + description: Makes the number of attributes and commands be specified by a define, not an enum value. + type: boolean + # End genetlink-c + # Start genetlink-legacy + kernel-policy: + description: | + Defines if the input policy in the kernel is global, per-operation, or split per operation type. + Default is split. + enum: [ split, per-op, global ] + # End genetlink-legacy + + definitions: + description: List of type and constant definitions (enums, flags, defines). 
+ type: array + items: + type: object + required: [ type, name ] + additionalProperties: False + properties: + name: + type: string + header: + description: For C-compatible languages, header which already defines this value. + type: string + type: + enum: [ const, enum, flags, struct ] # Trim + doc: + type: string + # For const + value: + description: For const - the value. + type: [ string, integer ] + # For enum and flags + value-start: + description: For enum or flags the literal initializer for the first value. + type: [ string, integer ] + entries: + description: For enum or flags array of values. + type: array + items: + oneOf: + - type: string + - type: object + required: [ name ] + additionalProperties: False + properties: + name: + type: string + value: + type: integer + doc: + type: string + render-max: + description: Render the max members for this enum. + type: boolean + # Start genetlink-c + enum-name: + description: Name for enum, if empty no name will be used. + type: [ string, "null" ] + name-prefix: + description: For enum the prefix of the values, optional. + type: string + # End genetlink-c + # Start genetlink-legacy + members: + description: List of struct members. Only scalars and strings members allowed. + type: array + items: + type: object + required: [ name, type ] + additionalProperties: False + properties: + name: + type: string + type: + description: The netlink attribute type + enum: [ u8, u16, u32, u64, s8, s16, s32, s64, string, binary ] + len: + $ref: '#/$defs/len-or-define' + byte-order: + enum: [ little-endian, big-endian ] + doc: + description: Documentation for the struct member attribute. + type: string + enum: + description: Name of the enum type used for the attribute. + type: string + enum-as-flags: + description: | + Treat the enum as flags. In most cases enum is either used as flags or as values. 
+ Sometimes, however, both forms are necessary, in which case header contains the enum + form while specific attributes may request to convert the values into a bitfield. + type: boolean + display-hint: &display-hint + description: | + Optional format indicator that is intended only for choosing + the right formatting mechanism when displaying values of this + type. + enum: [ hex, mac, fddi, ipv4, ipv6, uuid ] + # End genetlink-legacy + + attribute-sets: + description: Definition of attribute spaces for this family. + type: array + items: + description: Definition of a single attribute space. + type: object + required: [ name, attributes ] + additionalProperties: False + properties: + name: + description: | + Name used when referring to this space in other definitions, not used outside of the spec. + type: string + name-prefix: + description: | + Prefix for the C enum name of the attributes. Default family[name]-set[name]-a- + type: string + enum-name: + description: Name for the enum type of the attribute. + type: string + doc: + description: Documentation of the space. + type: string + subset-of: + description: | + Name of another space which this is a logical part of. Sub-spaces can be used to define + a limited group of attributes which are used in a nest. + type: string + # Start genetlink-c + attr-cnt-name: + description: The explicit name for constant holding the count of attributes (last attr + 1). + type: string + attr-max-name: + description: The explicit name for last member of attribute enum. + type: string + # End genetlink-c + attributes: + description: List of attributes in the space. + type: array + items: + type: object + required: [ name, type ] + additionalProperties: False + properties: + name: + type: string + type: &attr-type + description: The netlink attribute type + enum: [ unused, pad, flag, binary, u8, u16, u32, u64, s32, s64, + string, nest, array-nest, nest-type-value ] + doc: + description: Documentation of the attribute. 
+ type: string + value: + description: Value for the enum item representing this attribute in the uAPI. + $ref: '#/$defs/uint' + type-value: + description: Name of the value extracted from the type of a nest-type-value attribute. + type: array + items: + type: string + byte-order: + enum: [ little-endian, big-endian ] + multi-attr: + type: boolean + nested-attributes: + description: Name of the space (sub-space) used inside the attribute. + type: string + enum: + description: Name of the enum type used for the attribute. + type: string + enum-as-flags: + description: | + Treat the enum as flags. In most cases enum is either used as flags or as values. + Sometimes, however, both forms are necessary, in which case header contains the enum + form while specific attributes may request to convert the values into a bitfield. + type: boolean + checks: + description: Kernel input validation. + type: object + additionalProperties: False + properties: + flags-mask: + description: Name of the flags constant on which to base mask (unsigned scalar types only). + type: string + min: + description: Min value for an integer attribute. + type: integer + min-len: + description: Min length for a binary attribute. + $ref: '#/$defs/len-or-define' + max-len: + description: Max length for a string or a binary attribute. + $ref: '#/$defs/len-or-define' + sub-type: *attr-type + display-hint: *display-hint + # Start genetlink-c + name-prefix: + type: string + # End genetlink-c + # Start genetlink-legacy + struct: + description: Name of the struct type used for the attribute. + type: string + # End genetlink-legacy + + # Make sure name-prefix does not appear in subsets (subsets inherit naming) + dependencies: + name-prefix: + not: + required: [ subset-of ] + subset-of: + not: + required: [ name-prefix ] + + operations: + description: Operations supported by the protocol. 
+ type: object + required: [ list ] + additionalProperties: False + properties: + enum-model: + description: | + The model of assigning values to the operations. + "unified" is the recommended model where all message types belong + to a single enum. + "directional" has the messages sent to the kernel and from the kernel + enumerated separately. + enum: [ unified, directional ] # Trim + name-prefix: + description: | + Prefix for the C enum name of the command. The name is formed by concatenating + the prefix with the upper case name of the command, with dashes replaced by underscores. + type: string + enum-name: + description: Name for the enum type with commands. + type: string + async-prefix: + description: Same as name-prefix but used to render notifications and events to separate enum. + type: string + async-enum: + description: Name for the enum type with notifications/events. + type: string + # Start genetlink-legacy + fixed-header: &fixed-header + description: | + Name of the structure defining the optional fixed-length protocol + header. This header is placed in a message after the netlink and + genetlink headers and before any attributes. + type: string + # End genetlink-legacy + list: + description: List of commands + type: array + items: + type: object + additionalProperties: False + required: [ name, doc ] + properties: + name: + description: Name of the operation, also defining its C enum value in uAPI. + type: string + doc: + description: Documentation for the command. + type: string + value: + description: Value for the enum in the uAPI. + $ref: '#/$defs/uint' + attribute-set: + description: | + Attribute space from which attributes directly in the requests and replies + to this command are defined. + type: string + flags: &cmd_flags + description: Command flags. + type: array + items: + enum: [ admin-perm ] + dont-validate: + description: Kernel attribute validation flags. 
+ type: array + items: + enum: [ strict, dump ] + # Start genetlink-legacy + fixed-header: *fixed-header + # End genetlink-legacy + do: &subop-type + description: Main command handler. + type: object + additionalProperties: False + properties: + request: &subop-attr-list + description: Definition of the request message for a given command. + type: object + additionalProperties: False + properties: + attributes: + description: | + Names of attributes from the attribute-set (not full attribute + definitions, just names). + type: array + items: + type: string + # Start genetlink-legacy + value: + description: | + ID of this message if value for request and response differ, + i.e. requests and responses have different message enums. + $ref: '#/$defs/uint' + # End genetlink-legacy + reply: *subop-attr-list + pre: + description: Hook for a function to run before the main callback (pre_doit or start). + type: string + post: + description: Hook for a function to run after the main callback (post_doit or done). + type: string + dump: *subop-type + notify: + description: Name of the command sharing the reply type with this notification. + type: string + event: + type: object + additionalProperties: False + properties: + attributes: + description: Explicit list of the attributes for the notification. + type: array + items: + type: string + mcgrp: + description: Name of the multicast group generating given notification. + type: string + mcast-groups: + description: List of multicast groups. + type: object + required: [ list ] + additionalProperties: False + properties: + list: + description: List of groups. + type: array + items: + type: object + required: [ name ] + additionalProperties: False + properties: + name: + description: | + The name for the group, used to form the define and the value of the define. + type: string + # Start genetlink-c + c-define-name: + description: Override for the name of the define in C uAPI. 
+ type: string + # End genetlink-c + flags: *cmd_flags + # Start netlink-raw + value: + description: Value of the netlink multicast group in the uAPI. + type: integer + # End netlink-raw diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml index 5d46ca966979..d1ebcd927149 100644 --- a/Documentation/netlink/specs/devlink.yaml +++ b/Documentation/netlink/specs/devlink.yaml @@ -6,6 +6,16 @@ protocol: genetlink-legacy doc: Partial family for Devlink. +definitions: + - + type: enum + name: sb-pool-type + entries: + - + name: ingress + - + name: egress + attribute-sets: - name: devlink @@ -25,6 +35,46 @@ attribute-sets: # TODO: fill in the attributes in between - + name: sb-index + type: u32 + value: 11 + + # TODO: fill in the attributes in between + + - + name: sb-pool-index + type: u16 + value: 17 + + - + name: sb-pool-type + type: u8 + enum: sb-pool-type + + # TODO: fill in the attributes in between + + - + name: sb-tc-index + type: u16 + value: 22 + + # TODO: fill in the attributes in between + + - + name: param-name + type: string + value: 81 + + # TODO: fill in the attributes in between + + - + name: region-name + type: string + value: 88 + + # TODO: fill in the attributes in between + + - name: info-driver-name type: string value: 98 @@ -56,9 +106,34 @@ attribute-sets: # TODO: fill in the attributes in between - + name: health-reporter-name + type: string + value: 115 + + # TODO: fill in the attributes in between + + - + name: trap-name + type: string + value: 130 + + # TODO: fill in the attributes in between + + - + name: trap-group-name + type: string + value: 135 + + - name: reload-failed type: u8 - value: 136 + + # TODO: fill in the attributes in between + + - + name: trap-policer-id + type: u32 + value: 142 # TODO: fill in the attributes in between @@ -103,6 +178,21 @@ attribute-sets: type: nest multi-attr: true nested-attributes: dl-reload-act-stats + + # TODO: fill in the attributes in between + + - + name: 
rate-node-name + type: string + value: 168 + + # TODO: fill in the attributes in between + + - + name: linecard-index + type: u32 + value: 171 + - name: dl-dev-stats subset-of: devlink @@ -165,8 +255,13 @@ operations: name: get doc: Get devlink instances. attribute-set: devlink + dont-validate: + - strict + - dump do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit request: value: 1 attributes: &dev-id-attrs @@ -183,18 +278,212 @@ operations: dump: reply: *get-reply + - + name: port-get + doc: Get devlink port instances. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit-port + post: devlink-nl-post-doit + request: + value: 5 + attributes: &port-id-attrs + - bus-name + - dev-name + - port-index + reply: + value: 7 + attributes: *port-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: + value: 3 # due to a bug, port dump returns DEVLINK_CMD_NEW + attributes: *port-id-attrs + + # TODO: fill in the operations in between + + - + name: sb-get + doc: Get shared buffer instances. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 11 + attributes: &sb-id-attrs + - bus-name + - dev-name + - sb-index + reply: &sb-get-reply + value: 11 + attributes: *sb-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *sb-get-reply + + # TODO: fill in the operations in between + + - + name: sb-pool-get + doc: Get shared buffer pool instances. 
+ attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 15 + attributes: &sb-pool-id-attrs + - bus-name + - dev-name + - sb-index + - sb-pool-index + reply: &sb-pool-get-reply + value: 15 + attributes: *sb-pool-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *sb-pool-get-reply + + # TODO: fill in the operations in between + + - + name: sb-port-pool-get + doc: Get shared buffer port-pool combinations and threshold. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit-port + post: devlink-nl-post-doit + request: + value: 19 + attributes: &sb-port-pool-id-attrs + - bus-name + - dev-name + - port-index + - sb-index + - sb-pool-index + reply: &sb-port-pool-get-reply + value: 19 + attributes: *sb-port-pool-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *sb-port-pool-get-reply + + # TODO: fill in the operations in between + + - + name: sb-tc-pool-bind-get + doc: Get shared buffer port-TC to pool bindings and threshold. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit-port + post: devlink-nl-post-doit + request: + value: 23 + attributes: &sb-tc-pool-bind-id-attrs + - bus-name + - dev-name + - port-index + - sb-index + - sb-pool-type + - sb-tc-index + reply: &sb-tc-pool-bind-get-reply + value: 23 + attributes: *sb-tc-pool-bind-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *sb-tc-pool-bind-get-reply + + # TODO: fill in the operations in between + + - + name: param-get + doc: Get param instances. 
+ attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 38 + attributes: &param-id-attrs + - bus-name + - dev-name + - param-name + reply: &param-get-reply + value: 38 + attributes: *param-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *param-get-reply + + # TODO: fill in the operations in between + + - + name: region-get + doc: Get region instances. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit-port-optional + post: devlink-nl-post-doit + request: + value: 42 + attributes: &region-id-attrs + - bus-name + - dev-name + - port-index + - region-name + reply: &region-get-reply + value: 42 + attributes: *region-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *region-get-reply + # TODO: fill in the operations in between - name: info-get doc: Get device information, like driver name, hardware and firmware versions etc. attribute-set: devlink + dont-validate: + - strict + - dump do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit request: value: 51 attributes: *dev-id-attrs - reply: + reply: &info-get-reply value: 51 attributes: - bus-name @@ -204,3 +493,181 @@ operations: - info-version-fixed - info-version-running - info-version-stored + dump: + reply: *info-get-reply + + - + name: health-reporter-get + doc: Get health reporter instances. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit-port-optional + post: devlink-nl-post-doit + request: + attributes: &health-reporter-id-attrs + - bus-name + - dev-name + - port-index + - health-reporter-name + reply: &health-reporter-get-reply + attributes: *health-reporter-id-attrs + dump: + request: + attributes: *port-id-attrs + reply: *health-reporter-get-reply + + # TODO: fill in the operations in between + + - + name: trap-get + doc: Get trap instances.
+ attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 61 + attributes: &trap-id-attrs + - bus-name + - dev-name + - trap-name + reply: &trap-get-reply + value: 61 + attributes: *trap-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *trap-get-reply + + # TODO: fill in the operations in between + + - + name: trap-group-get + doc: Get trap group instances. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 65 + attributes: &trap-group-id-attrs + - bus-name + - dev-name + - trap-group-name + reply: &trap-group-get-reply + value: 65 + attributes: *trap-group-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *trap-group-get-reply + + # TODO: fill in the operations in between + + - + name: trap-policer-get + doc: Get trap policer instances. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 69 + attributes: &trap-policer-id-attrs + - bus-name + - dev-name + - trap-policer-id + reply: &trap-policer-get-reply + value: 69 + attributes: *trap-policer-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *trap-policer-get-reply + + # TODO: fill in the operations in between + + - + name: rate-get + doc: Get rate instances. + attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 74 + attributes: &rate-id-attrs + - bus-name + - dev-name + - port-index + - rate-node-name + reply: &rate-get-reply + value: 74 + attributes: *rate-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *rate-get-reply + + # TODO: fill in the operations in between + + - + name: linecard-get + doc: Get line card instances. 
+ attribute-set: devlink + dont-validate: + - strict + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 78 + attributes: &linecard-id-attrs + - bus-name + - dev-name + - linecard-index + reply: &linecard-get-reply + value: 78 + attributes: *linecard-id-attrs + dump: + request: + attributes: *dev-id-attrs + reply: *linecard-get-reply + + # TODO: fill in the operations in between + + - + name: selftests-get + doc: Get device selftest instances. + attribute-set: devlink + dont-validate: + - strict + - dump + + do: + pre: devlink-nl-pre-doit + post: devlink-nl-post-doit + request: + value: 82 + attributes: *dev-id-attrs + reply: &selftests-get-reply + value: 82 + attributes: *dev-id-attrs + dump: + reply: *selftests-get-reply diff --git a/Documentation/netlink/specs/fou.yaml b/Documentation/netlink/specs/fou.yaml index 3e13826a3fdf..0af5ab842c04 100644 --- a/Documentation/netlink/specs/fou.yaml +++ b/Documentation/netlink/specs/fou.yaml @@ -107,16 +107,16 @@ operations: flags: [ admin-perm ] do: - request: &select_attrs + request: &select_attrs attributes: - - af - - ifindex - - port - - peer_port - - local_v4 - - peer_v4 - - local_v6 - - peer_v6 + - af + - ifindex + - port + - peer_port + - local_v4 + - peer_v4 + - local_v6 + - peer_v6 - name: get diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml index b99e7ffef7a1..1c7284fd535b 100644 --- a/Documentation/netlink/specs/netdev.yaml +++ b/Documentation/netlink/specs/netdev.yaml @@ -14,7 +14,7 @@ definitions: - name: basic doc: - XDP feautues set supported by all drivers + XDP features set supported by all drivers (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) - name: redirect @@ -62,6 +62,12 @@ attribute-sets: type: u64 enum: xdp-act enum-as-flags: true + - + name: xdp-zc-max-segs + doc: max fragment count supported by ZC driver + type: u32 + checks: + min: 1 operations: list: @@ -77,6 +83,7 @@ operations: attributes: - ifindex - xdp-features + - 
xdp-zc-max-segs dump: reply: *dev-all - diff --git a/Documentation/netlink/specs/ovs_vport.yaml b/Documentation/netlink/specs/ovs_vport.yaml index 17336455bec1..f65ce62cd60d 100644 --- a/Documentation/netlink/specs/ovs_vport.yaml +++ b/Documentation/netlink/specs/ovs_vport.yaml @@ -82,6 +82,10 @@ attribute-sets: enum-name: ovs-vport-attr attributes: - + name: unspec + type: unused + value: 0 + - name: port-no type: u32 - @@ -121,9 +125,34 @@ operations: name-prefix: ovs-vport-cmd- list: - + name: new + doc: Create a new OVS vport + attribute-set: vport + fixed-header: ovs-header + do: + request: + attributes: + - name + - type + - upcall-pid + - dp-ifindex + - ifindex + - options + - + name: del + doc: Delete existing OVS vport from a data path + attribute-set: vport + fixed-header: ovs-header + do: + request: + attributes: + - dp-ifindex + - port-no + - type + - name + - name: get doc: Get / dump OVS vport configuration and state - value: 3 attribute-set: vport fixed-header: ovs-header do: &vport-get-op diff --git a/Documentation/netlink/specs/rt_addr.yaml b/Documentation/netlink/specs/rt_addr.yaml new file mode 100644 index 000000000000..cbee1cedb177 --- /dev/null +++ b/Documentation/netlink/specs/rt_addr.yaml @@ -0,0 +1,179 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) + +name: rt-addr +protocol: netlink-raw +protonum: 0 + +doc: + Address configuration over rtnetlink. 
+ +definitions: + - + name: ifaddrmsg + type: struct + members: + - + name: ifa-family + type: u8 + - + name: ifa-prefixlen + type: u8 + - + name: ifa-flags + type: u8 + enum: ifa-flags + enum-as-flags: true + - + name: ifa-scope + type: u8 + - + name: ifa-index + type: u32 + - + name: ifa-cacheinfo + type: struct + members: + - + name: ifa-prefered + type: u32 + - + name: ifa-valid + type: u32 + - + name: cstamp + type: u32 + - + name: tstamp + type: u32 + + - + name: ifa-flags + type: flags + entries: + - + name: secondary + - + name: nodad + - + name: optimistic + - + name: dadfailed + - + name: homeaddress + - + name: deprecated + - + name: tentative + - + name: permanent + - + name: managetempaddr + - + name: noprefixroute + - + name: mcautojoin + - + name: stable-privacy + +attribute-sets: + - + name: addr-attrs + attributes: + - + name: ifa-address + type: binary + display-hint: ipv4 + - + name: ifa-local + type: binary + display-hint: ipv4 + - + name: ifa-label + type: string + - + name: ifa-broadcast + type: binary + display-hint: ipv4 + - + name: ifa-anycast + type: binary + - + name: ifa-cacheinfo + type: binary + struct: ifa-cacheinfo + - + name: ifa-multicast + type: binary + - + name: ifa-flags + type: u32 + enum: ifa-flags + enum-as-flags: true + - + name: ifa-rt-priority + type: u32 + - + name: ifa-target-netnsid + type: binary + - + name: ifa-proto + type: u8 + + +operations: + fixed-header: ifaddrmsg + enum-model: directional + list: + - + name: newaddr + doc: Add new address + attribute-set: addr-attrs + do: + request: + value: 20 + attributes: &ifaddr-all + - ifa-family + - ifa-flags + - ifa-prefixlen + - ifa-scope + - ifa-index + - ifa-address + - ifa-label + - ifa-local + - ifa-cacheinfo + - + name: deladdr + doc: Remove address + attribute-set: addr-attrs + do: + request: + value: 21 + attributes: + - ifa-family + - ifa-flags + - ifa-prefixlen + - ifa-scope + - ifa-index + - ifa-address + - ifa-local + - + name: getaddr + doc: Dump address 
information. + attribute-set: addr-attrs + dump: + request: + value: 22 + attributes: + - ifa-index + reply: + value: 20 + attributes: *ifaddr-all + +mcast-groups: + list: + - + name: rtnlgrp-ipv4-ifaddr + value: 5 + - + name: rtnlgrp-ipv6-ifaddr + value: 9 diff --git a/Documentation/netlink/specs/rt_link.yaml b/Documentation/netlink/specs/rt_link.yaml new file mode 100644 index 000000000000..d86a68f8475c --- /dev/null +++ b/Documentation/netlink/specs/rt_link.yaml @@ -0,0 +1,1432 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) + +name: rt-link +protocol: netlink-raw +protonum: 0 + +doc: + Link configuration over rtnetlink. + +definitions: + - + name: ifinfo-flags + type: flags + entries: + - + name: up + - + name: broadcast + - + name: debug + - + name: loopback + - + name: point-to-point + - + name: no-trailers + - + name: running + - + name: no-arp + - + name: promisc + - + name: all-multi + - + name: master + - + name: slave + - + name: multicast + - + name: portsel + - + name: auto-media + - + name: dynamic + - + name: lower-up + - + name: dormant + - + name: echo + + - + name: rtgenmsg + type: struct + members: + - + name: family + type: u8 + - + name: ifinfomsg + type: struct + members: + - + name: ifi-family + type: u8 + - + name: padding + type: u8 + - + name: ifi-type + type: u16 + - + name: ifi-index + type: s32 + - + name: ifi-flags + type: u32 + enum: ifinfo-flags + enum-as-flags: true + - + name: ifi-change + type: u32 + - + name: ifla-cacheinfo + type: struct + members: + - + name: max-reasm-len + type: u32 + - + name: tstamp + type: u32 + - + name: reachable-time + type: s32 + - + name: retrans-time + type: u32 + - + name: rtnl-link-stats + type: struct + members: + - + name: rx-packets + type: u32 + - + name: tx-packets + type: u32 + - + name: rx-bytes + type: u32 + - + name: tx-bytes + type: u32 + - + name: rx-errors + type: u32 + - + name: tx-errors + type: u32 + - + name: rx-dropped + type: u32 + - + name: 
tx-dropped + type: u32 + - + name: multicast + type: u32 + - + name: collisions + type: u32 + - + name: rx-length-errors + type: u32 + - + name: rx-over-errors + type: u32 + - + name: rx-crc-errors + type: u32 + - + name: rx-frame-errors + type: u32 + - + name: rx-fifo-errors + type: u32 + - + name: rx-missed-errors + type: u32 + - + name: tx-aborted-errors + type: u32 + - + name: tx-carrier-errors + type: u32 + - + name: tx-fifo-errors + type: u32 + - + name: tx-heartbeat-errors + type: u32 + - + name: tx-window-errors + type: u32 + - + name: rx-compressed + type: u32 + - + name: tx-compressed + type: u32 + - + name: rx-nohandler + type: u32 + - + name: rtnl-link-stats64 + type: struct + members: + - + name: rx-packets + type: u64 + - + name: tx-packets + type: u64 + - + name: rx-bytes + type: u64 + - + name: tx-bytes + type: u64 + - + name: rx-errors + type: u64 + - + name: tx-errors + type: u64 + - + name: rx-dropped + type: u64 + - + name: tx-dropped + type: u64 + - + name: multicast + type: u64 + - + name: collisions + type: u64 + - + name: rx-length-errors + type: u64 + - + name: rx-over-errors + type: u64 + - + name: rx-crc-errors + type: u64 + - + name: rx-frame-errors + type: u64 + - + name: rx-fifo-errors + type: u64 + - + name: rx-missed-errors + type: u64 + - + name: tx-aborted-errors + type: u64 + - + name: tx-carrier-errors + type: u64 + - + name: tx-fifo-errors + type: u64 + - + name: tx-heartbeat-errors + type: u64 + - + name: tx-window-errors + type: u64 + - + name: rx-compressed + type: u64 + - + name: tx-compressed + type: u64 + - + name: rx-nohandler + type: u64 + - + name: rx-otherhost-dropped + type: u64 + - + name: rtnl-link-ifmap + type: struct + members: + - + name: mem-start + type: u64 + - + name: mem-end + type: u64 + - + name: base-addr + type: u64 + - + name: irq + type: u16 + - + name: dma + type: u8 + - + name: port + type: u8 + - + name: ipv4-devconf + type: struct + members: + - + name: forwarding + type: u32 + - + name: 
mc-forwarding + type: u32 + - + name: proxy-arp + type: u32 + - + name: accept-redirects + type: u32 + - + name: secure-redirects + type: u32 + - + name: send-redirects + type: u32 + - + name: shared-media + type: u32 + - + name: rp-filter + type: u32 + - + name: accept-source-route + type: u32 + - + name: bootp-relay + type: u32 + - + name: log-martians + type: u32 + - + name: tag + type: u32 + - + name: arpfilter + type: u32 + - + name: medium-id + type: u32 + - + name: noxfrm + type: u32 + - + name: nopolicy + type: u32 + - + name: force-igmp-version + type: u32 + - + name: arp-announce + type: u32 + - + name: arp-ignore + type: u32 + - + name: promote-secondaries + type: u32 + - + name: arp-accept + type: u32 + - + name: arp-notify + type: u32 + - + name: accept-local + type: u32 + - + name: src-vmark + type: u32 + - + name: proxy-arp-pvlan + type: u32 + - + name: route-localnet + type: u32 + - + name: igmpv2-unsolicited-report-interval + type: u32 + - + name: igmpv3-unsolicited-report-interval + type: u32 + - + name: ignore-routes-with-linkdown + type: u32 + - + name: drop-unicast-in-l2-multicast + type: u32 + - + name: drop-gratuitous-arp + type: u32 + - + name: bc-forwarding + type: u32 + - + name: arp-evict-nocarrier + type: u32 + - + name: ipv6-devconf + type: struct + members: + - + name: forwarding + type: u32 + - + name: hoplimit + type: u32 + - + name: mtu6 + type: u32 + - + name: accept-ra + type: u32 + - + name: accept-redirects + type: u32 + - + name: autoconf + type: u32 + - + name: dad-transmits + type: u32 + - + name: rtr-solicits + type: u32 + - + name: rtr-solicit-interval + type: u32 + - + name: rtr-solicit-delay + type: u32 + - + name: use-tempaddr + type: u32 + - + name: temp-valid-lft + type: u32 + - + name: temp-prefered-lft + type: u32 + - + name: regen-max-retry + type: u32 + - + name: max-desync-factor + type: u32 + - + name: max-addresses + type: u32 + - + name: force-mld-version + type: u32 + - + name: accept-ra-defrtr + type: u32 + - 
+ name: accept-ra-pinfo + type: u32 + - + name: accept-ra-rtr-pref + type: u32 + - + name: rtr-probe-interval + type: u32 + - + name: accept-ra-rt-info-max-plen + type: u32 + - + name: proxy-ndp + type: u32 + - + name: optimistic-dad + type: u32 + - + name: accept-source-route + type: u32 + - + name: mc-forwarding + type: u32 + - + name: disable-ipv6 + type: u32 + - + name: accept-dad + type: u32 + - + name: force-tllao + type: u32 + - + name: ndisc-notify + type: u32 + - + name: mldv1-unsolicited-report-interval + type: u32 + - + name: mldv2-unsolicited-report-interval + type: u32 + - + name: suppress-frag-ndisc + type: u32 + - + name: accept-ra-from-local + type: u32 + - + name: use-optimistic + type: u32 + - + name: accept-ra-mtu + type: u32 + - + name: stable-secret + type: u32 + - + name: use-oif-addrs-only + type: u32 + - + name: accept-ra-min-hop-limit + type: u32 + - + name: ignore-routes-with-linkdown + type: u32 + - + name: drop-unicast-in-l2-multicast + type: u32 + - + name: drop-unsolicited-na + type: u32 + - + name: keep-addr-on-down + type: u32 + - + name: rtr-solicit-max-interval + type: u32 + - + name: seg6-enabled + type: u32 + - + name: seg6-require-hmac + type: u32 + - + name: enhanced-dad + type: u32 + - + name: addr-gen-mode + type: u8 + - + name: disable-policy + type: u32 + - + name: accept-ra-rt-info-min-plen + type: u32 + - + name: ndisc-tclass + type: u32 + - + name: rpl-seg-enabled + type: u32 + - + name: ra-defrtr-metric + type: u32 + - + name: ioam6-enabled + type: u32 + - + name: ioam6-id + type: u32 + - + name: ioam6-id-wide + type: u32 + - + name: ndisc-evict-nocarrier + type: u32 + - + name: accept-untracked-na + type: u32 + - + name: ifla-icmp6-stats + type: struct + members: + - + name: inmsgs + type: u64 + - + name: inerrors + type: u64 + - + name: outmsgs + type: u64 + - + name: outerrors + type: u64 + - + name: csumerrors + type: u64 + - + name: ratelimithost + type: u64 + - + name: ifla-inet6-stats + type: struct + members: + 
- + name: inpkts + type: u64 + - + name: inoctets + type: u64 + - + name: indelivers + type: u64 + - + name: outforwdatagrams + type: u64 + - + name: outpkts + type: u64 + - + name: outoctets + type: u64 + - + name: inhdrerrors + type: u64 + - + name: intoobigerrors + type: u64 + - + name: innoroutes + type: u64 + - + name: inaddrerrors + type: u64 + - + name: inunknownprotos + type: u64 + - + name: intruncatedpkts + type: u64 + - + name: indiscards + type: u64 + - + name: outdiscards + type: u64 + - + name: outnoroutes + type: u64 + - + name: reasmtimeout + type: u64 + - + name: reasmreqds + type: u64 + - + name: reasmoks + type: u64 + - + name: reasmfails + type: u64 + - + name: fragoks + type: u64 + - + name: fragfails + type: u64 + - + name: fragcreates + type: u64 + - + name: inmcastpkts + type: u64 + - + name: outmcastpkts + type: u64 + - + name: inbcastpkts + type: u64 + - + name: outbcastpkts + type: u64 + - + name: inmcastoctets + type: u64 + - + name: outmcastoctets + type: u64 + - + name: inbcastoctets + type: u64 + - + name: outbcastoctets + type: u64 + - + name: csumerrors + type: u64 + - + name: noectpkts + type: u64 + - + name: ect1-pkts + type: u64 + - + name: ect0-pkts + type: u64 + - + name: cepkts + type: u64 + - + name: reasm-overlaps + type: u64 + - name: br-boolopt-multi + type: struct + members: + - + name: optval + type: u32 + - + name: optmask + type: u32 + - + name: if_stats_msg + type: struct + members: + - + name: family + type: u8 + - + name: pad1 + type: u8 + - + name: pad2 + type: u16 + - + name: ifindex + type: u32 + - + name: filter-mask + type: u32 + + +attribute-sets: + - + name: link-attrs + name-prefix: ifla- + attributes: + - + name: address + type: binary + display-hint: mac + - + name: broadcast + type: binary + display-hint: mac + - + name: ifname + type: string + - + name: mtu + type: u32 + - + name: link + type: u32 + - + name: qdisc + type: string + - + name: stats + type: binary + struct: rtnl-link-stats + - + name: cost 
+ type: string + - + name: priority + type: string + - + name: master + type: u32 + - + name: wireless + type: string + - + name: protinfo + type: string + - + name: txqlen + type: u32 + - + name: map + type: binary + struct: rtnl-link-ifmap + - + name: weight + type: u32 + - + name: operstate + type: u8 + - + name: linkmode + type: u8 + - + name: linkinfo + type: nest + nested-attributes: linkinfo-attrs + - + name: net-ns-pid + type: u32 + - + name: ifalias + type: string + - + name: num-vf + type: u32 + - + name: vfinfo-list + type: nest + nested-attributes: vfinfo-attrs + - + name: stats64 + type: binary + struct: rtnl-link-stats64 + - + name: vf-ports + type: nest + nested-attributes: vf-ports-attrs + - + name: port-self + type: nest + nested-attributes: port-self-attrs + - + name: af-spec + type: nest + nested-attributes: af-spec-attrs + - + name: group + type: u32 + - + name: net-ns-fd + type: u32 + - + name: ext-mask + type: u32 + - + name: promiscuity + type: u32 + - + name: num-tx-queues + type: u32 + - + name: num-rx-queues + type: u32 + - + name: carrier + type: u8 + - + name: phys-port-id + type: binary + - + name: carrier-changes + type: u32 + - + name: phys-switch-id + type: binary + - + name: link-netnsid + type: s32 + - + name: phys-port-name + type: string + - + name: proto-down + type: u8 + - + name: gso-max-segs + type: u32 + - + name: gso-max-size + type: u32 + - + name: pad + type: pad + - + name: xdp + type: nest + nested-attributes: xdp-attrs + - + name: event + type: u32 + - + name: new-netnsid + type: s32 + - + name: target-netnsid + type: s32 + - + name: carrier-up-count + type: u32 + - + name: carrier-down-count + type: u32 + - + name: new-ifindex + type: s32 + - + name: min-mtu + type: u32 + - + name: max-mtu + type: u32 + - + name: prop-list + type: nest + nested-attributes: link-attrs + - + name: alt-ifname + type: string + multi-attr: true + - + name: perm-address + type: binary + display-hint: mac + - + name: proto-down-reason + 
type: string + - + name: parent-dev-name + type: string + - + name: parent-dev-bus-name + type: string + - + name: gro-max-size + type: u32 + - + name: tso-max-size + type: u32 + - + name: tso-max-segs + type: u32 + - + name: allmulti + type: u32 + - + name: devlink-port + type: binary + - + name: gso-ipv4-max-size + type: u32 + - + name: gro-ipv4-max-size + type: u32 + - + name: af-spec-attrs + attributes: + - + name: "inet" + type: nest + value: 2 + nested-attributes: ifla-attrs + - + name: "inet6" + type: nest + value: 10 + nested-attributes: ifla6-attrs + - + name: "mctp" + type: nest + value: 45 + nested-attributes: mctp-attrs + - + name: vfinfo-attrs + attributes: [] + - + name: vf-ports-attrs + attributes: [] + - + name: port-self-attrs + attributes: [] + - + name: linkinfo-attrs + attributes: + - + name: kind + type: string + - + name: data + type: binary + # kind specific nest, e.g. linkinfo-bridge-attrs + - + name: xstats + type: binary + - + name: slave-kind + type: string + - + name: slave-data + type: binary + # kind specific nest + - + name: linkinfo-bridge-attrs + attributes: + - + name: forward-delay + type: u32 + - + name: hello-time + type: u32 + - + name: max-age + type: u32 + - + name: ageing-time + type: u32 + - + name: stp-state + type: u32 + - + name: priority + type: u16 + - + name: vlan-filtering + type: u8 + - + name: vlan-protocol + type: u16 + - + name: group-fwd-mask + type: u16 + - + name: root-id + type: binary + - + name: bridge-id + type: binary + - + name: root-port + type: u16 + - + name: root-path-cost + type: u32 + - + name: topology-change + type: u8 + - + name: topology-change-detected + type: u8 + - + name: hello-timer + type: u64 + - + name: tcn-timer + type: u64 + - + name: topology-change-timer + type: u64 + - + name: gc-timer + type: u64 + - + name: group-addr + type: binary + - + name: fdb-flush + type: binary + - + name: mcast-router + type: u8 + - + name: mcast-snooping + type: u8 + - + name: mcast-query-use-ifaddr + 
type: u8 + - + name: mcast-querier + type: u8 + - + name: mcast-hash-elasticity + type: u32 + - + name: mcast-hash-max + type: u32 + - + name: mcast-last-member-cnt + type: u32 + - + name: mcast-startup-query-cnt + type: u32 + - + name: mcast-last-member-intvl + type: u64 + - + name: mcast-membership-intvl + type: u64 + - + name: mcast-querier-intvl + type: u64 + - + name: mcast-query-intvl + type: u64 + - + name: mcast-query-response-intvl + type: u64 + - + name: mcast-startup-query-intvl + type: u64 + - + name: nf-call-iptables + type: u8 + - + name: nf-call-ip6-tables + type: u8 + - + name: nf-call-arptables + type: u8 + - + name: vlan-default-pvid + type: u16 + - + name: pad + type: pad + - + name: vlan-stats-enabled + type: u8 + - + name: mcast-stats-enabled + type: u8 + - + name: mcast-igmp-version + type: u8 + - + name: mcast-mld-version + type: u8 + - + name: vlan-stats-per-port + type: u8 + - + name: multi-boolopt + type: binary + struct: br-boolopt-multi + - + name: mcast-querier-state + type: binary + - + name: xdp-attrs + attributes: + - + name: fd + type: s32 + - + name: attached + type: u8 + - + name: flags + type: u32 + - + name: prog-id + type: u32 + - + name: drv-prog-id + type: u32 + - + name: skb-prog-id + type: u32 + - + name: hw-prog-id + type: u32 + - + name: expected-fd + type: s32 + - + name: ifla-attrs + attributes: + - + name: conf + type: binary + struct: ipv4-devconf + - + name: ifla6-attrs + attributes: + - + name: flags + type: u32 + - + name: conf + type: binary + struct: ipv6-devconf + - + name: stats + type: binary + struct: ifla-inet6-stats + - + name: mcast + type: binary + - + name: cacheinfo + type: binary + struct: ifla-cacheinfo + - + name: icmp6-stats + type: binary + struct: ifla-icmp6-stats + - + name: token + type: binary + - + name: addr-gen-mode + type: u8 + - + name: ra-mtu + type: u32 + - + name: mctp-attrs + attributes: + - + name: mctp-net + type: u32 + - + name: stats-attrs + name-prefix: ifla-stats- + attributes: + 
- + name: link-64 + type: binary + struct: rtnl-link-stats64 + - + name: link-xstats + type: binary + - + name: link-xstats-slave + type: binary + - + name: link-offload-xstats + type: nest + nested-attributes: link-offload-xstats + - + name: af-spec + type: binary + - + name: link-offload-xstats + attributes: + - + name: cpu-hit + type: binary + - + name: hw-s-info + type: array-nest + nested-attributes: hw-s-info-one + - + name: l3-stats + type: binary + - + name: hw-s-info-one + attributes: + - + name: request + type: u8 + - + name: used + type: u8 + +operations: + enum-model: directional + list: + - + name: newlink + doc: Create a new link. + attribute-set: link-attrs + fixed-header: ifinfomsg + do: + request: + value: 16 + attributes: &link-new-attrs + - ifi-index + - ifname + - net-ns-pid + - net-ns-fd + - target-netnsid + - link-netnsid + - linkinfo + - group + - num-tx-queues + - num-rx-queues + - address + - broadcast + - mtu + - txqlen + - operstate + - linkmode + - group + - gso-max-size + - gso-max-segs + - gro-max-size + - gso-ipv4-max-size + - gro-ipv4-max-size + - af-spec + - + name: dellink + doc: Delete an existing link. + attribute-set: link-attrs + fixed-header: ifinfomsg + do: + request: + value: 17 + attributes: + - ifi-index + - ifname + - + name: getlink + doc: Get / dump information about a link. 
+ attribute-set: link-attrs + fixed-header: ifinfomsg + do: + request: + value: 18 + attributes: + - ifi-index + - ifname + - alt-ifname + - ext-mask + - target-netnsid + reply: + value: 16 + attributes: &link-all-attrs + - ifi-family + - ifi-type + - ifi-index + - ifi-flags + - ifi-change + - address + - broadcast + - ifname + - mtu + - link + - qdisc + - stats + - cost + - priority + - master + - wireless + - protinfo + - txqlen + - map + - weight + - operstate + - linkmode + - linkinfo + - net-ns-pid + - ifalias + - num-vf + - vfinfo-list + - stats64 + - vf-ports + - port-self + - af-spec + - group + - net-ns-fd + - ext-mask + - promiscuity + - num-tx-queues + - num-rx-queues + - carrier + - phys-port-id + - carrier-changes + - phys-switch-id + - link-netnsid + - phys-port-name + - proto-down + - gso-max-segs + - gso-max-size + - pad + - xdp + - event + - new-netnsid + - if-netnsid + - target-netnsid + - carrier-up-count + - carrier-down-count + - new-ifindex + - min-mtu + - max-mtu + - prop-list + - alt-ifname + - perm-address + - proto-down-reason + - parent-dev-name + - parent-dev-bus-name + - gro-max-size + - tso-max-size + - tso-max-segs + - allmulti + - devlink-port + - gso-ipv4-max-size + - gro-ipv4-max-size + dump: + request: + value: 18 + attributes: + - target-netnsid + - ext-mask + - master + - linkinfo + reply: + value: 16 + attributes: *link-all-attrs + - + name: setlink + doc: Set information about a link. + attribute-set: link-attrs + fixed-header: ifinfomsg + do: + request: + value: 19 + attributes: *link-all-attrs + - + name: getstats + doc: Get / dump link stats. 
+ attribute-set: stats-attrs + fixed-header: if_stats_msg + do: + request: + value: 94 + attributes: + - ifindex + reply: + value: 92 + attributes: &link-stats-attrs + - family + - ifindex + - filter-mask + - link-64 + - link-xstats + - link-xstats-slave + - link-offload-xstats + - af-spec + dump: + request: + value: 94 + reply: + value: 92 + attributes: *link-stats-attrs + +mcast-groups: + list: + - + name: rtnlgrp-link + value: 1 + - + name: rtnlgrp-stats + value: 36 diff --git a/Documentation/netlink/specs/rt_route.yaml b/Documentation/netlink/specs/rt_route.yaml new file mode 100644 index 000000000000..f4368be0caed --- /dev/null +++ b/Documentation/netlink/specs/rt_route.yaml @@ -0,0 +1,327 @@ +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) + +name: rt-route +protocol: netlink-raw +protonum: 0 + +doc: + Route configuration over rtnetlink. + +definitions: + - + name: rtm-type + name-prefix: rtn- + type: enum + entries: + - unspec + - unicast + - local + - broadcast + - anycast + - multicast + - blackhole + - unreachable + - prohibit + - throw + - nat + - xresolve + - + name: rtmsg + type: struct + members: + - + name: rtm-family + type: u8 + - + name: rtm-dst-len + type: u8 + - + name: rtm-src-len + type: u8 + - + name: rtm-tos + type: u8 + - + name: rtm-table + type: u8 + - + name: rtm-protocol + type: u8 + - + name: rtm-scope + type: u8 + - + name: rtm-type + type: u8 + enum: rtm-type + - + name: rtm-flags + type: u32 + - + name: rta-cacheinfo + type: struct + members: + - + name: rta-clntref + type: u32 + - + name: rta-lastuse + type: u32 + - + name: rta-expires + type: u32 + - + name: rta-error + type: u32 + - + name: rta-used + type: u32 + +attribute-sets: + - + name: route-attrs + attributes: + - + name: rta-dst + type: binary + display-hint: ipv4 + - + name: rta-src + type: binary + display-hint: ipv4 + - + name: rta-iif + type: u32 + - + name: rta-oif + type: u32 + - + name: rta-gateway + type: binary + display-hint: ipv4 
+ - + name: rta-priority + type: u32 + - + name: rta-prefsrc + type: binary + display-hint: ipv4 + - + name: rta-metrics + type: nest + nested-attributes: rta-metrics + - + name: rta-multipath + type: binary + - + name: rta-protoinfo # not used + type: binary + - + name: rta-flow + type: u32 + - + name: rta-cacheinfo + type: binary + struct: rta-cacheinfo + - + name: rta-session # not used + type: binary + - + name: rta-mp-algo # not used + type: binary + - + name: rta-table + type: u32 + - + name: rta-mark + type: u32 + - + name: rta-mfc-stats + type: binary + - + name: rta-via + type: binary + - + name: rta-newdst + type: binary + - + name: rta-pref + type: u8 + - + name: rta-encap-type + type: u16 + - + name: rta-encap + type: binary # tunnel specific nest + - + name: rta-expires + type: u32 + - + name: rta-pad + type: binary + - + name: rta-uid + type: u32 + - + name: rta-ttl-propagate + type: u8 + - + name: rta-ip-proto + type: u8 + - + name: rta-sport + type: u16 + - + name: rta-dport + type: u16 + - + name: rta-nh-id + type: u32 + - + name: rta-metrics + attributes: + - + name: rtax-unspec + type: unused + value: 0 + - + name: rtax-lock + type: u32 + - + name: rtax-mtu + type: u32 + - + name: rtax-window + type: u32 + - + name: rtax-rtt + type: u32 + - + name: rtax-rttvar + type: u32 + - + name: rtax-ssthresh + type: u32 + - + name: rtax-cwnd + type: u32 + - + name: rtax-advmss + type: u32 + - + name: rtax-reordering + type: u32 + - + name: rtax-hoplimit + type: u32 + - + name: rtax-initcwnd + type: u32 + - + name: rtax-features + type: u32 + - + name: rtax-rto-min + type: u32 + - + name: rtax-initrwnd + type: u32 + - + name: rtax-quickack + type: u32 + - + name: rtax-cc-algo + type: string + - + name: rtax-fastopen-no-cookie + type: u32 + +operations: + enum-model: directional + list: + - + name: getroute + doc: Dump route information. 
+ attribute-set: route-attrs + fixed-header: rtmsg + do: + request: + value: 26 + attributes: + - rtm-family + - rta-src + - rtm-src-len + - rta-dst + - rtm-dst-len + - rta-iif + - rta-oif + - rta-ip-proto + - rta-sport + - rta-dport + - rta-mark + - rta-uid + reply: + value: 24 + attributes: &all-route-attrs + - rtm-family + - rtm-dst-len + - rtm-src-len + - rtm-tos + - rtm-table + - rtm-protocol + - rtm-scope + - rtm-type + - rtm-flags + - rta-dst + - rta-src + - rta-iif + - rta-oif + - rta-gateway + - rta-priority + - rta-prefsrc + - rta-metrics + - rta-multipath + - rta-flow + - rta-cacheinfo + - rta-table + - rta-mark + - rta-mfc-stats + - rta-via + - rta-newdst + - rta-pref + - rta-encap-type + - rta-encap + - rta-expires + - rta-pad + - rta-uid + - rta-ttl-propagate + - rta-ip-proto + - rta-sport + - rta-dport + - rta-nh-id + dump: + request: + value: 26 + attributes: + - rtm-family + reply: + value: 24 + attributes: *all-route-attrs + - + name: newroute + doc: Create a new route + attribute-set: route-attrs + fixed-header: rtmsg + do: + request: + value: 24 + attributes: *all-route-attrs + - + name: delroute + doc: Delete an existing route + attribute-set: route-attrs + fixed-header: rtmsg + do: + request: + value: 25 + attributes: *all-route-attrs diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index 1cc35de336a4..dceeb0d763aa 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -462,8 +462,92 @@ XDP_OPTIONS getsockopt Gets options from an XDP socket. The only one supported so far is XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not. +Multi-Buffer Support +==================== + +With multi-buffer support, programs using AF_XDP sockets can receive +and transmit packets consisting of multiple buffers both in copy and +zero-copy mode. 
For example, a packet can consist of two +frames/buffers, one with the header and the other one with the data, +or a 9K Ethernet jumbo frame can be constructed by chaining together +three 4K frames. + +Some definitions: + +* A packet consists of one or more frames + +* A descriptor in one of the AF_XDP rings always refers to a single + frame. In the case the packet consists of a single frame, the + descriptor refers to the whole packet. + +To enable multi-buffer support for an AF_XDP socket, use the new bind +flag XDP_USE_SG. If this is not provided, all multi-buffer packets +will be dropped just as before. Note that the XDP program loaded also +needs to be in multi-buffer mode. This can be accomplished by using +"xdp.frags" as the section name of the XDP program used. + +To represent a packet consisting of multiple frames, a new flag called +XDP_PKT_CONTD is introduced in the options field of the Rx and Tx +descriptors. If it is true (1) the packet continues with the next +descriptor and if it is false (0) it means this is the last descriptor +of the packet. Why the reverse logic of end-of-packet (eop) flag found +in many NICs? Just to preserve compatibility with non-multi-buffer +applications that have this bit set to false for all packets on Rx, +and the apps set the options field to zero for Tx, as anything else +will be treated as an invalid descriptor. + +These are the semantics for producing packets onto AF_XDP Tx ring +consisting of multiple frames: + +* When an invalid descriptor is found, all the other + descriptors/frames of this packet are marked as invalid and not + completed. The next descriptor is treated as the start of a new + packet, even if this was not the intent (because we cannot guess + the intent). As before, if your program is producing invalid + descriptors you have a bug that must be fixed. + +* Zero length descriptors are treated as invalid descriptors. 
+ +* For copy mode, the maximum supported number of frames in a packet is + equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all + descriptors accumulated so far are dropped and treated as + invalid. To produce an application that will work on any system + regardless of this config setting, limit the number of frags to 18, + as the minimum value of the config is 17. + +* For zero-copy mode, the limit is up to what the NIC HW + supports. Usually at least five on the NICs we have checked. We + consciously chose to not enforce a rigid limit (such as + CONFIG_MAX_SKB_FRAGS + 1) for zero-copy mode, as it would have + resulted in copy actions under the hood to fit into whatever limit the + NIC supports. That would defeat the purpose of zero-copy mode. How to + probe for this limit is explained in the "probe for multi-buffer + support" section. + +On the Rx path in copy-mode, the xsk core copies the XDP data into +multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as +detailed before. Zero-copy mode works the same, though the data is not +copied. When the application gets a descriptor with the XDP_PKT_CONTD +flag set to one, it means that the packet consists of multiple buffers +and it continues with the next buffer in the following +descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it +means that this is the last buffer of the packet. AF_XDP guarantees +that only a complete packet (all frames in the packet) is sent to the +application. If there is not enough space in the AF_XDP Rx ring, all +frames of the packet will be dropped. + +If an application reads a batch of descriptors, using for example the libxdp +interfaces, it is not guaranteed that the batch will end with a full +packet. It might end in the middle of a packet and the rest of the +buffers of that packet will arrive at the beginning of the next batch, +since the libxdp interface does not read the whole ring (unless you +have an enormous batch size or a very small ring size).
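Because a batch may stop in the middle of a packet, the consumer has to carry its reassembly state from one batch to the next. The following is a minimal sketch of that bookkeeping, not part of the patch. The names `sketch_desc`, `sketch_rx_state`, `sketch_consume_batch` and `SKETCH_PKT_CONTD` are hypothetical stand-ins chosen so the example builds without kernel headers; real code would presumably use `struct xdp_desc` and `XDP_PKT_CONTD` from linux/if_xdp.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XDP_PKT_CONTD; an assumption made for a header-free sketch. */
#define SKETCH_PKT_CONTD (1u << 0)

/* Stand-in for the Rx descriptor: address, length and options field. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/* Reassembly state that must survive between batches. */
struct sketch_rx_state {
	bool in_packet;     /* true while a packet is still open */
	uint32_t pkt_len;   /* bytes accumulated for the current packet */
	uint32_t pkt_frags; /* frames accumulated for the current packet */
};

/* Walk one batch and return the number of packets completed in it.
 * A packet left open at the end of the batch stays recorded in *st
 * and is continued by the first descriptor of the next batch.
 */
static size_t sketch_consume_batch(struct sketch_rx_state *st,
				   const struct sketch_desc *descs, size_t n)
{
	size_t completed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!st->in_packet) {
			/* First frame of a new packet: reset the counters. */
			st->in_packet = true;
			st->pkt_len = 0;
			st->pkt_frags = 0;
		}

		st->pkt_len += descs[i].len;
		st->pkt_frags++;

		/* A cleared continuation flag marks the last frame. */
		if (!(descs[i].options & SKETCH_PKT_CONTD)) {
			completed++;
			st->in_packet = false;
		}
	}

	return completed;
}
```

Keeping the state in a caller-owned struct rather than in function-local statics is a design choice of this sketch; it allows one state object per socket.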
+ +An example program each for Rx and Tx multi-buffer support can be found +later in this document. + Usage -===== +----- In order to use AF_XDP sockets two parts are needed. The user-space application and the XDP program. For a complete setup and @@ -541,6 +625,131 @@ like this: But please use the libbpf functions as they are optimized and ready to use. Will make your life easier. +Usage Multi-Buffer Rx +--------------------- + +Here is a simple Rx path pseudo-code example (using libxdp interfaces +for simplicity). Error paths have been excluded to keep it short: + +.. code-block:: c + + void rx_packets(struct xsk_socket_info *xsk) + { + static bool new_packet = true; + u32 idx_rx = 0, idx_fq = 0; + static char *pkt; + + int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx); + + xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq); + + for (int i = 0; i < rcvd; i++) { + struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); + char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr); + bool eop = !(desc->options & XDP_PKT_CONTD); + + if (new_packet) + pkt = frag; + else + add_frag_to_pkt(pkt, frag); + + if (eop) + process_pkt(pkt); + + new_packet = eop; + + *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr; + } + + xsk_ring_prod__submit(&xsk->umem->fq, rcvd); + xsk_ring_cons__release(&xsk->rx, rcvd); + } + +Usage Multi-Buffer Tx +--------------------- + +Here is an example Tx path pseudo-code (using libxdp interfaces for +simplicity) ignoring that the umem is finite in size, and that we +eventually will run out of packets to send. Also assumes pkts.addr +points to a valid location in the umem. + +.. 
code-block:: c + + void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts, + int batch_size) + { + u32 idx, i, pkt_nb = 0; + + xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx); + + for (i = 0; i < batch_size;) { + u64 addr = pkts[pkt_nb].addr; + u32 len = pkts[pkt_nb].size; + + do { + struct xdp_desc *tx_desc; + + tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++); + tx_desc->addr = addr; + + if (len > xsk_frame_size) { + tx_desc->len = xsk_frame_size; + tx_desc->options = XDP_PKT_CONTD; + } else { + tx_desc->len = len; + tx_desc->options = 0; + pkt_nb++; + } + len -= tx_desc->len; + addr += xsk_frame_size; + + if (i == batch_size) { + /* Remember len, addr, pkt_nb for next iteration. + * Skipped for simplicity. + */ + break; + } + } while (len); + } + + xsk_ring_prod__submit(&xsk->tx, i); + } + +Probing for Multi-Buffer Support +-------------------------------- + +To discover if a driver supports multi-buffer AF_XDP in SKB or DRV +mode, use the XDP_FEATURES feature of netlink in linux/netdev.h to +query for NETDEV_XDP_ACT_RX_SG support. This is the same flag as for +querying for XDP multi-buffer support. If XDP supports multi-buffer in +a driver, then AF_XDP will also support that in SKB and DRV mode. + +To discover if a driver supports multi-buffer AF_XDP in zero-copy +mode, use XDP_FEATURES and first check the NETDEV_XDP_ACT_XSK_ZEROCOPY +flag. If it is set, it means that at least zero-copy is supported and +you should go and check the netlink attribute +NETDEV_A_DEV_XDP_ZC_MAX_SEGS in linux/netdev.h. An unsigned integer +value will be returned stating the max number of frags that are +supported by this device in zero-copy mode. These are the possible +return values: + +1: Multi-buffer for zero-copy is not supported by this device, as max + one fragment supported means that multi-buffer is not possible. + +>=2: Multi-buffer is supported in zero-copy mode for this device. The + returned number signifies the max number of frags supported. 
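The return values above can be turned into a per-packet fragment budget. This is only a sketch under stated assumptions: it takes the value of NETDEV_A_DEV_XDP_ZC_MAX_SEGS as a plain integer that the caller has already fetched over netlink, and ``zc_multi_buffer_supported()``, ``max_tx_frags()`` and ``COPY_MODE_PORTABLE_MAX_FRAGS`` are hypothetical names, not part of libbpf or libxdp.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Copy-mode limit that is safe for any kernel config: the minimum value of
 * CONFIG_MAX_SKB_FRAGS is 17, and the limit is that value plus one.
 */
#define COPY_MODE_PORTABLE_MAX_FRAGS 18u

/* 1 means no multi-buffer in zero-copy mode; 2 or more means supported. */
static bool zc_multi_buffer_supported(uint32_t zc_max_segs)
{
	return zc_max_segs >= 2;
}

/* Largest number of frags per packet the application should build. */
static uint32_t max_tx_frags(bool zero_copy, uint32_t zc_max_segs)
{
	if (!zero_copy)
		return COPY_MODE_PORTABLE_MAX_FRAGS;
	return zc_multi_buffer_supported(zc_max_segs) ? zc_max_segs : 1;
}
```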
+ +For an example on how these are used through libbpf, please take a +look at tools/testing/selftests/bpf/xskxceiver.c. + +Multi-Buffer Support for Zero-Copy Drivers +------------------------------------------ + +Zero-copy drivers usually use the batched APIs for Rx and Tx +processing. Note that the Tx batch API guarantees that it will provide +a batch of Tx descriptors that ends with full packet at the end. This +to facilitate extending a zero-copy driver with multi-buffer support. + Sample application ================== diff --git a/Documentation/networking/device_drivers/ethernet/google/gve.rst b/Documentation/networking/device_drivers/ethernet/google/gve.rst index 6d73ee78f3d7..31d621bca82e 100644 --- a/Documentation/networking/device_drivers/ethernet/google/gve.rst +++ b/Documentation/networking/device_drivers/ethernet/google/gve.rst @@ -52,6 +52,15 @@ Descriptor Formats GVE supports two descriptor formats: GQI and DQO. These two formats have entirely different descriptors, which will be described below. +Addressing Mode +------------------ +GVE supports two addressing modes: QPL and RDA. +QPL ("queue-page-list") mode communicates data through a set of +pre-registered pages. + +For RDA ("raw DMA addressing") mode, the set of pages is dynamic. +Therefore, the packet buffers can be anywhere in guest memory. + Registers --------- All registers are MMIO. diff --git a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst index bfd233cfac35..1e196cb9ce25 100644 --- a/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst +++ b/Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst @@ -332,3 +332,11 @@ Setup HTB offload # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 1 # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 7 + +4. 
Create tc classes with same priorities and different quantum:: + + # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 2 quantum 409600 + + # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 2 quantum 188416 + + # tc class add dev <interface> parent 1: classid 1:3 htb rate 10Gbit prio 2 quantum 32768 diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst index a395df9c2751..f69ee1ebee01 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst @@ -346,6 +346,24 @@ the software port. - The number of receive packets with CQE compression on ring i [#accel]_. - Acceleration + * - `rx[i]_arfs_add` + - The number of aRFS flow rules added to the device for direct RQ steering + on ring i [#accel]_. + - Acceleration + + * - `rx[i]_arfs_request_in` + - Number of flow rules that have been requested to move into ring i for + direct RQ steering [#accel]_. + - Acceleration + + * - `rx[i]_arfs_request_out` + - Number of flow rules that have been requested to move out of ring i [#accel]_. + - Acceleration + + * - `rx[i]_arfs_expired` + - Number of flow rules that have been expired and removed [#accel]_. + - Acceleration + * - `rx[i]_arfs_err` - Number of flow rules that failed to be added to the flow table. - Error @@ -445,11 +463,6 @@ the software port. context. - Error - * - `rx[i]_xsk_arfs_err` - - aRFS (accelerated Receive Flow Steering) does not occur in the XSK RQ - context, so this counter should never increment. - - Error - * - `rx[i]_xdp_tx_xmit` - The number of packets forwarded back to the port due to XDP program `XDP_TX` action (bouncing). these packets are not counted by other @@ -683,6 +696,12 @@ the software port. time protocol. 
- Error + * - `ptp_cq[i]_late_cqe` + - Number of times a CQE has been delivered on the PTP timestamping CQ when + the CQE was not expected since a certain amount of time had elapsed where + the device typically ensures not posting the CQE. + - Error + .. [#ring_global] The corresponding ring and global counters do not share the same name (i.e. do not follow the common naming scheme). diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst deleted file mode 100644 index a4edf908b707..000000000000 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst +++ /dev/null @@ -1,313 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB -.. include:: <isonum.txt> - -======= -Devlink -======= - -:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - -Contents -======== - -- `Info`_ -- `Parameters`_ -- `Health reporters`_ - -Info -==== - -The devlink info reports the running and stored firmware versions on device. -It also prints the device PSID which represents the HCA board type ID. - -User command example:: - - $ devlink dev info pci/0000:00:06.0 - pci/0000:00:06.0: - driver mlx5_core - versions: - fixed: - fw.psid MT_0000000009 - running: - fw.version 16.26.0100 - stored: - fw.version 16.26.0100 - -Parameters -========== - -flow_steering_mode: Device flow steering mode ---------------------------------------------- -The flow steering mode parameter controls the flow steering mode of the driver. -Two modes are supported: - -1. 'dmfs' - Device managed flow steering. -2. 'smfs' - Software/Driver managed flow steering. - -In DMFS mode, the HW steering entities are created and managed through the -Firmware. -In SMFS mode, the HW steering entities are created and managed though by -the driver directly into hardware without firmware intervention. 
- -SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode. - -User command examples: - -- Set SMFS flow steering mode:: - - $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime - -- Read device flow steering mode:: - - $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode - pci/0000:06:00.0: - name flow_steering_mode type driver-specific - values: - cmode runtime value smfs - -enable_roce: RoCE enablement state ----------------------------------- -If the device supports RoCE disablement, RoCE enablement state controls device -support for RoCE capability. Otherwise, the control occurs in the driver stack. -When RoCE is disabled at the driver level, only raw ethernet QPs are supported. - -To change RoCE enablement state, a user must change the driverinit cmode value -and run devlink reload. - -User command examples: - -- Disable RoCE:: - - $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit - $ devlink dev reload pci/0000:06:00.0 - -- Read RoCE enablement state:: - - $ devlink dev param show pci/0000:06:00.0 name enable_roce - pci/0000:06:00.0: - name enable_roce type generic - values: - cmode driverinit value true - -esw_port_metadata: Eswitch port metadata state ----------------------------------------------- -When applicable, disabling eswitch metadata can increase packet rate -up to 20% depending on the use case and packet sizes. - -Eswitch port metadata state controls whether to internally tag packets with -metadata. Metadata tagging must be enabled for multi-port RoCE, failover -between representors and stacked devices. -By default metadata is enabled on the supported devices in E-switch. -Metadata is applicable only for E-switch in switchdev mode and -users may disable it when NONE of the below use cases will be in use: - -1. HCA is in Dual/multi-port RoCE mode. -2. VF/SF representor bonding (Usually used for Live migration) -3. 
Stacked devices - -When metadata is disabled, the above use cases will fail to initialize if -users try to enable them. - -- Show eswitch port metadata:: - - $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata - pci/0000:06:00.0: - name esw_port_metadata type driver-specific - values: - cmode runtime value true - -- Disable eswitch port metadata:: - - $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime - -- Change eswitch mode to switchdev mode where after choosing the metadata value:: - - $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev - -hairpin_num_queues: Number of hairpin queues --------------------------------------------- -We refer to a TC NIC rule that involves forwarding as "hairpin". - -Hairpin queues are mlx5 hardware specific implementation for hardware -forwarding of such packets. - -- Show the number of hairpin queues:: - - $ devlink dev param show pci/0000:06:00.0 name hairpin_num_queues - pci/0000:06:00.0: - name hairpin_num_queues type driver-specific - values: - cmode driverinit value 2 - -- Change the number of hairpin queues:: - - $ devlink dev param set pci/0000:06:00.0 name hairpin_num_queues value 4 cmode driverinit - -hairpin_queue_size: Size of the hairpin queues ----------------------------------------------- -Control the size of the hairpin queues. - -- Show the size of the hairpin queues:: - - $ devlink dev param show pci/0000:06:00.0 name hairpin_queue_size - pci/0000:06:00.0: - name hairpin_queue_size type driver-specific - values: - cmode driverinit value 1024 - -- Change the size (in packets) of the hairpin queues:: - - $ devlink dev param set pci/0000:06:00.0 name hairpin_queue_size value 512 cmode driverinit - -Health reporters -================ - -tx reporter ------------ -The tx reporter is responsible for reporting and recovering of the following two error scenarios: - -- tx timeout - Report on kernel tx timeout detection. - Recover by searching lost interrupts. 
-- tx error completion - Report on error tx completion. - Recover by flushing the tx queue and reset it. - -tx reporter also support on demand diagnose callback, on which it provides -real time information of its send queues status. - -User commands examples: - -- Diagnose send queues status:: - - $ devlink health diagnose pci/0000:82:00.0 reporter tx - -.. note:: - This command has valid output only when interface is up, otherwise the command has empty output. - -- Show number of tx errors indicated, number of recover flows ended successfully, - is autorecover enabled and graceful period from last recover:: - - $ devlink health show pci/0000:82:00.0 reporter tx - -rx reporter ------------ -The rx reporter is responsible for reporting and recovering of the following two error scenarios: - -- rx queues' initialization (population) timeout - Population of rx queues' descriptors on ring initialization is done - in napi context via triggering an irq. In case of a failure to get - the minimum amount of descriptors, a timeout would occur, and - descriptors could be recovered by polling the EQ (Event Queue). -- rx completions with errors (reported by HW on interrupt context) - Report on rx completion error. - Recover (if needed) by flushing the related queue and reset it. - -rx reporter also supports on demand diagnose callback, on which it -provides real time information of its receive queues' status. - -- Diagnose rx queues' status and corresponding completion queue:: - - $ devlink health diagnose pci/0000:82:00.0 reporter rx - -NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output. - -- Show number of rx errors indicated, number of recover flows ended successfully, - is autorecover enabled, and graceful period from last recover:: - - $ devlink health show pci/0000:82:00.0 reporter rx - -fw reporter ------------ -The fw reporter implements `diagnose` and `dump` callbacks. 
-It follows symptoms of fw error such as fw syndrome by triggering -fw core dump and storing it into the dump buffer. -The fw reporter diagnose command can be triggered any time by the user to check -current fw status. - -User commands examples: - -- Check fw heath status:: - - $ devlink health diagnose pci/0000:82:00.0 reporter fw - -- Read FW core dump if already stored or trigger new one:: - - $ devlink health dump show pci/0000:82:00.0 reporter fw - -.. note:: - This command can run only on the PF which has fw tracer ownership, - running it on other PF or any VF will return "Operation not permitted". - -fw fatal reporter ------------------ -The fw fatal reporter implements `dump` and `recover` callbacks. -It follows fatal errors indications by CR-space dump and recover flow. -The CR-space dump uses vsc interface which is valid even if the FW command -interface is not functional, which is the case in most FW fatal errors. -The recover function runs recover flow which reloads the driver and triggers fw -reset if needed. -On firmware error, the health buffer is dumped into the dmesg. The log -level is derived from the error's severity (given in health buffer). - -User commands examples: - -- Run fw recover flow manually:: - - $ devlink health recover pci/0000:82:00.0 reporter fw_fatal - -- Read FW CR-space dump if already stored or trigger new one:: - - $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal - -.. note:: - This command can run only on PF. - -vnic reporter -------------- -The vnic reporter implements only the `diagnose` callback. -It is responsible for querying the vnic diagnostic counters from fw and displaying -them in realtime. - -Description of the vnic counters: - -- total_q_under_processor_handle - number of queues in an error state due to - an async error or errored command. -- send_queue_priority_update_flow - number of QP/SQ priority/SL update events. -- cq_overrun - number of times CQ entered an error state due to an overflow. 
-- async_eq_overrun - number of times an EQ mapped to async events was overrun. - comp_eq_overrun number of times an EQ mapped to completion events was - overrun. -- quota_exceeded_command - number of commands issued and failed due to quota exceeded. -- invalid_command - number of commands issued and failed dues to any reason other than quota - exceeded. -- nic_receive_steering_discard - number of packets that completed RX flow - steering but were discarded due to a mismatch in flow table. -- generated_pkt_steering_fail - number of packets generated by the VNIC experiencing unexpected steering - failure (at any point in steering flow). -- handled_pkt_steering_fail - number of packets handled by the VNIC experiencing unexpected steering - failure (at any point in steering flow owned by the VNIC, including the FDB - for the eswitch owner). - -User commands examples: - -- Diagnose PF/VF vnic counters:: - - $ devlink health diagnose pci/0000:82:00.1 reporter vnic - -- Diagnose representor vnic counters (performed by supplying devlink port of the - representor, which can be obtained via devlink port command):: - - $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic - -.. note:: - This command can run over all interfaces such as PF/VF and representor ports. 
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst index 3fdcd6b61ccf..581a91caa579 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst @@ -13,7 +13,6 @@ Contents: :maxdepth: 2 kconfig - devlink switchdev tracepoints counters diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst index 43b1f7e87ec4..0a42c3395ffa 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst @@ -36,7 +36,7 @@ Enabling the driver and kconfig options **CONFIG_MLX5_CORE_EN_DCB=(y/n)**: -| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_. +| Enables `Data Center Bridging (DCB) Support <https://enterprise-support.nvidia.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_. **CONFIG_MLX5_CORE_IPOIB=(y/n)** @@ -59,12 +59,12 @@ Enabling the driver and kconfig options **CONFIG_MLX5_EN_ARFS=(y/n)** | Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering. -| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4 +| https://enterprise-support.nvidia.com/s/article/howto-configure-arfs-on-connectx-4 **CONFIG_MLX5_EN_IPSEC=(y/n)** -| Enables `IPSec XFRM cryptography-offload acceleration <https://support.mellanox.com/s/article/ConnectX-6DX-Bluefield-2-IPsec-HW-Full-Offload-Configuration-Guide>`_. +| Enables :ref:`IPSec XFRM cryptography-offload acceleration <xfrm_device>`. 
**CONFIG_MLX5_EN_MACSEC=(y/n)** @@ -87,8 +87,8 @@ Enabling the driver and kconfig options | Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering | and switching for the enabled VFs and PF in two available modes: -| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_. -| 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_. +| 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://enterprise-support.nvidia.com/s/article/HowTo-Configure-SR-IOV-for-ConnectX-4-ConnectX-5-ConnectX-6-with-KVM-Ethernet>`_. +| 2) :ref:`Switchdev mode (eswitch offloads) <switchdev>`. **CONFIG_MLX5_FPGA=(y/n)** @@ -101,13 +101,13 @@ Enabling the driver and kconfig options **CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko) -| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support. +| Provides low-level InfiniBand/RDMA and `RoCE <https://enterprise-support.nvidia.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support. **CONFIG_MLX5_MPFS=(y/n)** | Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC. -| MPFs is required for when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing +| MPFs is required for when `Multi-Host <https://www.nvidia.com/en-us/networking/multi-host/>`_ configuration is enabled to allow passing | user configured unicast MAC addresses to the requesting PF. 
diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst index 6e3f5ee8b0d0..b617e93d7c2c 100644 --- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst +++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst @@ -190,6 +190,26 @@ explicitly enable the VF migratable capability. mlx5 driver support devlink port function attr mechanism to setup migratable capability. (refer to Documentation/networking/devlink/devlink-port.rst) +IPsec crypto capability setup +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Users who want mlx5 PCI VFs to be able to perform IPsec crypto offloading need +to explicitly enable the VF ipsec_crypto capability. Enabling IPsec capability +for VFs is supported on ConnectX6dx devices and above. When a VF has +IPsec capability enabled, any IPsec offloading is blocked on the PF. + +The mlx5 driver supports the devlink port function attr mechanism to set up the ipsec_crypto +capability. (refer to Documentation/networking/devlink/devlink-port.rst) + +IPsec packet capability setup +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Users who want mlx5 PCI VFs to be able to perform IPsec packet offloading need +to explicitly enable the VF ipsec_packet capability. Enabling IPsec capability +for VFs is supported on ConnectX6dx devices and above. When a VF has +IPsec capability enabled, any IPsec offloading is blocked on the PF. + +The mlx5 driver supports the devlink port function attr mechanism to set up the ipsec_packet +capability.
(refer to Documentation/networking/devlink/devlink-port.rst) + SF state setup -------------- diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst index 3da590953ce8..f5adb910427a 100644 --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -128,6 +128,12 @@ Users may also set the RoCE capability of the function using Users may also set the function as migratable using 'devlink port function set migratable' command. +Users may also set the IPsec crypto capability of the function using +`devlink port function set ipsec_crypto` command. + +Users may also set the IPsec packet capability of the function using +`devlink port function set ipsec_packet` command. + Function attributes =================== @@ -240,6 +246,55 @@ Attach VF to the VM. Start the VM. Perform live migration. +IPsec crypto capability setup +----------------------------- +When user enables IPsec crypto capability for a VF, user application can offload +XFRM state crypto operation (Encrypt/Decrypt) to this VF. + +When IPsec crypto capability is disabled (default) for a VF, the XFRM state is +processed in software by the kernel. 
+ +- Get IPsec crypto capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_crypto disabled + +- Set IPsec crypto capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 ipsec_crypto enable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_crypto enabled + +IPsec packet capability setup +----------------------------- +When user enables IPsec packet capability for a VF, user application can offload +XFRM state and policy crypto operation (Encrypt/Decrypt) to this VF, as well as +IPsec encapsulation. + +When IPsec packet capability is disabled (default) for a VF, the XFRM state and +policy is processed in software by the kernel. + +- Get IPsec packet capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_packet disabled + +- Set IPsec packet capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 ipsec_packet enable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_packet enabled + Subfunction ============ diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 202798d6501e..702f204a3dbd 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -18,6 +18,11 @@ Parameters * - ``enable_roce`` - driverinit - Type: Boolean + + If the device supports RoCE disablement, RoCE enablement state controls + device support for RoCE capability. Otherwise, the control occurs in the + driver stack. 
When RoCE is disabled at the driver level, only raw + ethernet QPs are supported. * - ``io_eq_size`` - driverinit - The range is between 64 and 4096. @@ -48,6 +53,9 @@ parameters. * ``smfs`` Software managed flow steering. In SMFS mode, the HW steering entities are created and manage through the driver without firmware intervention. + + SMFS mode is faster and provides better rule insertion rate compared to + default DMFS mode. * - ``fdb_large_groups`` - u32 - driverinit @@ -71,7 +79,24 @@ parameters. deprecated. Default: disabled + * - ``esw_port_metadata`` + - Boolean + - runtime + - When applicable, disabling eswitch metadata can increase packet rate up + to 20% depending on the use case and packet sizes. + + Eswitch port metadata state controls whether to internally tag packets + with metadata. Metadata tagging must be enabled for multi-port RoCE, + failover between representors and stacked devices. By default metadata is + enabled on the supported devices in E-switch. Metadata is applicable only + for E-switch in switchdev mode and users may disable it when NONE of the + below use cases will be in use: + 1. HCA is in Dual/multi-port RoCE mode. + 2. VF/SF representor bonding (Usually used for Live migration) + 3. Stacked devices + When metadata is disabled, the above use cases will fail to initialize if + users try to enable them. * - ``hairpin_num_queues`` - u32 - driverinit @@ -104,3 +129,160 @@ The ``mlx5`` driver reports the following versions * - ``fw.version`` - stored, running - Three digit major.minor.subminor firmware version number. + +Health reporters +================ + +tx reporter +----------- +The tx reporter is responsible for reporting and recovering of the following three error scenarios: + +- tx timeout + Report on kernel tx timeout detection. + Recover by searching lost interrupts. +- tx error completion + Report on error tx completion. + Recover by flushing the tx queue and reset it. 
+- tx PTP port timestamping CQ unhealthy + Report too many CQEs never delivered on port ts CQ. + Recover by flushing and re-creating all PTP channels. + +tx reporter also support on demand diagnose callback, on which it provides +real time information of its send queues status. + +User commands examples: + +- Diagnose send queues status:: + + $ devlink health diagnose pci/0000:82:00.0 reporter tx + +.. note:: + This command has valid output only when interface is up, otherwise the command has empty output. + +- Show number of tx errors indicated, number of recover flows ended successfully, + is autorecover enabled and graceful period from last recover:: + + $ devlink health show pci/0000:82:00.0 reporter tx + +rx reporter +----------- +The rx reporter is responsible for reporting and recovering of the following two error scenarios: + +- rx queues' initialization (population) timeout + Population of rx queues' descriptors on ring initialization is done + in napi context via triggering an irq. In case of a failure to get + the minimum amount of descriptors, a timeout would occur, and + descriptors could be recovered by polling the EQ (Event Queue). +- rx completions with errors (reported by HW on interrupt context) + Report on rx completion error. + Recover (if needed) by flushing the related queue and reset it. + +rx reporter also supports on demand diagnose callback, on which it +provides real time information of its receive queues' status. + +- Diagnose rx queues' status and corresponding completion queue:: + + $ devlink health diagnose pci/0000:82:00.0 reporter rx + +.. note:: + This command has valid output only when interface is up. Otherwise, the command has empty output. 
+ +- Show number of rx errors indicated, number of recover flows ended successfully, + is autorecover enabled, and graceful period from last recover:: + + $ devlink health show pci/0000:82:00.0 reporter rx + +fw reporter +----------- +The fw reporter implements `diagnose` and `dump` callbacks. +It follows symptoms of fw error such as fw syndrome by triggering +fw core dump and storing it into the dump buffer. +The fw reporter diagnose command can be triggered any time by the user to check +current fw status. + +User command examples: + +- Check fw health status:: + + $ devlink health diagnose pci/0000:82:00.0 reporter fw + +- Read FW core dump if already stored or trigger a new one:: + + $ devlink health dump show pci/0000:82:00.0 reporter fw + +.. note:: + This command can run only on the PF which has fw tracer ownership, + running it on another PF or any VF will return "Operation not permitted". + +fw fatal reporter +----------------- +The fw fatal reporter implements `dump` and `recover` callbacks. +It follows fatal error indications by CR-space dump and recover flow. +The CR-space dump uses vsc interface which is valid even if the FW command +interface is not functional, which is the case in most FW fatal errors. +The recover function runs recover flow which reloads the driver and triggers fw +reset if needed. +On firmware error, the health buffer is dumped into dmesg. The log +level is derived from the error's severity (given in health buffer). + +User command examples: + +- Run fw recover flow manually:: + + $ devlink health recover pci/0000:82:00.0 reporter fw_fatal + +- Read FW CR-space dump if already stored or trigger a new one:: + + $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal + +.. note:: + This command can run only on PF. + +vnic reporter +------------- +The vnic reporter implements only the `diagnose` callback. +It is responsible for querying the vnic diagnostic counters from fw and displaying +them in realtime.
+ +Description of the vnic counters: + +- total_q_under_processor_handle + number of queues in an error state due to + an async error or errored command. +- send_queue_priority_update_flow + number of QP/SQ priority/SL update events. +- cq_overrun + number of times CQ entered an error state due to an overflow. +- async_eq_overrun + number of times an EQ mapped to async events was overrun. +- comp_eq_overrun + number of times an EQ mapped to completion events was + overrun. +- quota_exceeded_command + number of commands issued and failed due to quota exceeded. +- invalid_command + number of commands issued and failed due to any reason other than quota + exceeded. +- nic_receive_steering_discard + number of packets that completed RX flow + steering but were discarded due to a mismatch in flow table. +- generated_pkt_steering_fail + number of packets generated by the VNIC experiencing unexpected steering + failure (at any point in steering flow). +- handled_pkt_steering_fail + number of packets handled by the VNIC experiencing unexpected steering + failure (at any point in steering flow owned by the VNIC, including the FDB + for the eswitch owner). + +User command examples: + +- Diagnose PF/VF vnic counters:: + + $ devlink health diagnose pci/0000:82:00.1 reporter vnic + +- Diagnose representor vnic counters (performed by supplying devlink port of the + representor, which can be obtained via devlink port command):: + + $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic + +.. note:: + This command can run over all interfaces such as PF/VF and representor ports. diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 4a010a7cde7f..a66054d0763a 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -321,6 +321,7 @@ tcp_abort_on_overflow - BOOLEAN option can harm clients of your server.
tcp_adv_win_scale - INTEGER + Obsolete since linux-6.6 Count buffering overhead as bytes/2^tcp_adv_win_scale (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), if it is <= 0. @@ -2287,6 +2288,14 @@ accept_ra_min_hop_limit - INTEGER Default: 1 +accept_ra_min_lft - INTEGER + Minimum acceptable lifetime value in Router Advertisement. + + RA sections with a lifetime less than this value shall be + ignored. Zero lifetimes stay unaffected. + + Default: 0 + accept_ra_pinfo - BOOLEAN Learn Prefix Information in Router Advertisement. diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst index 213510698014..15f1919d640c 100644 --- a/Documentation/networking/mptcp-sysctl.rst +++ b/Documentation/networking/mptcp-sysctl.rst @@ -74,3 +74,11 @@ stale_loss_cnt - INTEGER This is a per-namespace sysctl. Default: 4 + +scheduler - STRING + Select the scheduler of your choice. + + Support for selection of different schedulers. This is a per-namespace + sysctl. 
+ + Default: "default" diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst index dd0518e002f6..7a9de0568e84 100644 --- a/Documentation/networking/netconsole.rst +++ b/Documentation/networking/netconsole.rst @@ -13,6 +13,8 @@ IPv6 support by Cong Wang <xiyou.wangcong@gmail.com>, Jan 1 2013 Extended console support by Tejun Heo <tj@kernel.org>, May 1 2015 +Release prepend support by Breno Leitao <leitao@debian.org>, Jul 7 2023 + Please send bug reports to Matt Mackall <mpm@selenic.com> Satyam Sharma <satyam.sharma@gmail.com>, and Cong Wang <xiyou.wangcong@gmail.com> @@ -34,10 +36,11 @@ Sender and receiver configuration: It takes a string configuration parameter "netconsole" in the following format:: - netconsole=[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr] + netconsole=[+][r][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr] where + if present, enable extended console support + r if present, prepend kernel version (release) to the message src-port source for UDP packets (defaults to 6665) src-ip source IP to use (interface address) dev network interface (eth0) @@ -125,6 +128,7 @@ The interface exposes these parameters of a netconsole target to userspace: ============== ================================= ============ enabled Is this target currently enabled? (read-write) extended Extended mode enabled (read-write) + release Prepend kernel release to message (read-write) dev_name Local network interface name (read-write) local_port Source UDP port to use (read-write) remote_port Remote agent's UDP port (read-write) @@ -165,6 +169,11 @@ following format which is the same as /dev/kmsg:: <level>,<sequnum>,<timestamp>,<contflag>;<message text> +If 'r' (release) feature is enabled, the kernel release version is +prepended to the start of the message. Example:: + + 6.4.0,6,444,501151268,-;netconsole: network logging started + Non printable characters in <message text> are escaped using "\xff" notation. 
If the message contains optional dictionary, verbatim newline is used as the delimiter. diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst index 873efd97f822..215ebc92752c 100644 --- a/Documentation/networking/page_pool.rst +++ b/Documentation/networking/page_pool.rst @@ -4,22 +4,8 @@ Page Pool API ============= -The page_pool allocator is optimized for the XDP mode that uses one frame -per-page, but it can fallback on the regular page allocator APIs. - -Basic use involves replacing alloc_pages() calls with the -page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages() -replacing dev_alloc_pages(). - -API keeps track of in-flight pages, in order to let API user know -when it is safe to free a page_pool object. Thus, API users -must run page_pool_release_page() when a page is leaving the page_pool or -call page_pool_put_page() where appropriate in order to maintain correct -accounting. - -API user must call page_pool_put_page() once on a page, as it -will either recycle the page, or in case of refcnt > 1, it will -release the DMA mapping and in-flight state accounting. +.. kernel-doc:: include/net/page_pool/helpers.h + :doc: page_pool allocator Architecture overview ===================== @@ -64,87 +50,68 @@ This lockless guarantee naturally comes from running under a NAPI softirq. The protection doesn't strictly have to be NAPI, any guarantee that allocating a page will cause no race conditions is enough. -* page_pool_create(): Create a pool. - * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV - * order: 2^order pages on allocation - * pool_size: size of the ptr_ring - * nid: preferred NUMA node for allocation - * dev: struct device. Used on DMA operations - * dma_dir: DMA direction - * max_len: max DMA sync memory size - * offset: DMA address offset - -* page_pool_put_page(): The outcome of this depends on the page refcnt. If the - driver bumps the refcnt > 1 this will unmap the page. 
If the page refcnt is 1 - the allocator owns the page and will try to recycle it in one of the pool - caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device - using dma_sync_single_range_for_device(). - -* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync - for the entire memory area configured in area pool->max_len. - -* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but caller - must guarantee safe context (e.g NAPI), since it will recycle the page - directly into the pool fast cache. - -* page_pool_release_page(): Unmap the page (if mapped) and account for it on - in-flight counters. - -* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool - caches. - -* page_pool_get_dma_addr(): Retrieve the stored DMA address. - -* page_pool_get_dma_dir(): Retrieve the stored DMA direction. - -* page_pool_put_page_bulk(): Tries to refill a number of pages into the - ptr_ring cache holding ptr_ring producer lock. If the ptr_ring is full, - page_pool_put_page_bulk() will release leftover pages to the page allocator. - page_pool_put_page_bulk() is suitable to be run inside the driver NAPI tx - completion loop for the XDP_REDIRECT use case. - Please note the caller must not use data area after running - page_pool_put_page_bulk(), as this function overwrites it. - -* page_pool_get_stats(): Retrieve statistics about the page_pool. This API - is only available if the kernel has been configured with - ``CONFIG_PAGE_POOL_STATS=y``. A pointer to a caller allocated ``struct - page_pool_stats`` structure is passed to this API which is filled in. The - caller can then report those stats to the user (perhaps via ethtool, - debugfs, etc.). See below for an example usage of this API. +.. kernel-doc:: net/core/page_pool.c + :identifiers: page_pool_create + +.. kernel-doc:: include/net/page_pool/types.h + :identifiers: struct page_pool_params + +.. 
kernel-doc:: include/net/page_pool/helpers.h
+   :identifiers: page_pool_put_page page_pool_put_full_page
+		 page_pool_recycle_direct page_pool_dev_alloc_pages
+		 page_pool_get_dma_addr page_pool_get_dma_dir
+
+.. kernel-doc:: net/core/page_pool.c
+   :identifiers: page_pool_put_page_bulk page_pool_get_stats
+
+DMA sync
+--------
+The driver is always responsible for syncing the pages for the CPU.
+Drivers may choose to take care of syncing for the device as well
+or set the ``PP_FLAG_DMA_SYNC_DEV`` flag to request that pages
+allocated from the page pool are already synced for the device.
+
+If ``PP_FLAG_DMA_SYNC_DEV`` is set, the driver must inform the core what portion
+of the buffer has to be synced. This allows the core to avoid syncing the entire
+page when the driver knows that the device only accessed a portion of the page.
+
+Most drivers will reserve headroom in front of the frame. This part
+of the buffer is not touched by the device, so to avoid syncing
+it drivers can set the ``offset`` field in struct page_pool_params
+appropriately.
+
+For pages recycled on the XDP xmit and skb paths the page pool will
+use the ``max_len`` member of struct page_pool_params to decide how
+much of the page needs to be synced (starting at ``offset``).
+When directly freeing pages in the driver (page_pool_put_page())
+the ``dma_sync_size`` argument specifies how much of the buffer needs
+to be synced.
+
+If in doubt, set ``offset`` to 0, ``max_len`` to ``PAGE_SIZE`` and
+pass -1 as ``dma_sync_size``. That combination of arguments is always
+correct.
+
+Note that the syncing parameters are for the entire page.
+This is important to remember when using fragments (``PP_FLAG_PAGE_FRAG``),
+where allocated buffers may be smaller than a full page.
+Unless the driver author really understands page pool internals,
+it's recommended to always use ``offset = 0``, ``max_len = PAGE_SIZE``
+with fragmented page pools.
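The interaction of ``offset``, ``max_len`` and ``dma_sync_size`` described in the DMA sync text can be modelled outside the kernel. The sketch below is not the page pool implementation. It is a plain Python model that assumes the core clamps ``dma_sync_size`` to ``max_len`` and starts the sync at ``offset``. The function name ``sync_range``, the 4096-byte ``PAGE_SIZE`` and the 256-byte headroom are illustrative assumptions, not taken from the patch.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def sync_range(offset, max_len, dma_sync_size):
    """Model the byte range of a page that is synced for the device.

    Hypothetical model, not kernel code. It assumes that a dma_sync_size
    of -1 (or any value larger than max_len) is clamped to max_len, and
    that the synced region starts at offset.
    """
    if offset < 0 or max_len < 0 or offset + max_len > PAGE_SIZE:
        raise ValueError("offset + max_len must fit inside the page")
    if dma_sync_size < 0 or dma_sync_size > max_len:
        length = max_len
    else:
        length = dma_sync_size
    # Return the half-open byte range [start, end) within the page.
    return offset, offset + length


# The "always correct" combination from the text: the whole page is synced.
print(sync_range(0, PAGE_SIZE, -1))

# A driver with 256 bytes of headroom whose device wrote a 1500-byte frame.
print(sync_range(256, PAGE_SIZE - 256, 1500))
```

Under these assumptions the first call covers the full page, while the second covers only the 1500 bytes after the headroom, which is the saving the ``offset`` and ``dma_sync_size`` parameters are meant to allow.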
Stats API and structures ------------------------ If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the API -``page_pool_get_stats()`` and structures described below are available. It -takes a pointer to a ``struct page_pool`` and a pointer to a ``struct -page_pool_stats`` allocated by the caller. +page_pool_get_stats() and structures described below are available. +It takes a pointer to a ``struct page_pool`` and a pointer to a struct +page_pool_stats allocated by the caller. -The API will fill in the provided ``struct page_pool_stats`` with +The API will fill in the provided struct page_pool_stats with statistics about the page_pool. -The stats structure has the following fields:: - - struct page_pool_stats { - struct page_pool_alloc_stats alloc_stats; - struct page_pool_recycle_stats recycle_stats; - }; - - -The ``struct page_pool_alloc_stats`` has the following fields: - * ``fast``: successful fast path allocations - * ``slow``: slow path order-0 allocations - * ``slow_high_order``: slow path high order allocations - * ``empty``: ptr ring is empty, so a slow path allocation was forced. - * ``refill``: an allocation which triggered a refill of the cache - * ``waive``: pages obtained from the ptr ring that cannot be added to - the cache due to a NUMA mismatch. - -The ``struct page_pool_recycle_stats`` has the following fields: - * ``cached``: recycling placed page in the page pool cache - * ``cache_full``: page pool cache was full - * ``ring``: page placed into the ptr ring - * ``ring_full``: page released from page pool because the ptr ring was full - * ``released_refcnt``: page released (and not recycled) because refcnt > 1 +.. 
kernel-doc:: include/net/page_pool/types.h + :identifiers: struct page_pool_recycle_stats + struct page_pool_alloc_stats + struct page_pool_stats Coding examples =============== @@ -194,7 +161,7 @@ NAPI poller if XDP_DROP: page_pool_recycle_direct(page_pool, page); } else (packet_is_skb) { - page_pool_release_page(page_pool, page); + skb_mark_for_recycle(skb); new_page = page_pool_dev_alloc_pages(page_pool); } } diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst index b7ac4c64cf67..1283240d7620 100644 --- a/Documentation/networking/phy.rst +++ b/Documentation/networking/phy.rst @@ -323,6 +323,10 @@ Some of the interface modes are described below: contrast with the 1000BASE-X phy mode used for Clause 38 and 39 PMDs, this interface mode has different autonegotiation and only supports full duplex. +``PHY_INTERFACE_MODE_PSGMII`` + This is the Penta SGMII mode, it is similar to QSGMII but it combines 5 + SGMII lines into a single link compared to 4 on QSGMII. + Pause frames / flow control =========================== diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst index 83abdfef4ec3..535077cbeb07 100644 --- a/Documentation/networking/xfrm_device.rst +++ b/Documentation/networking/xfrm_device.rst @@ -1,4 +1,5 @@ .. SPDX-License-Identifier: GPL-2.0 +.. _xfrm_device: =============================================== XFRM device - offloading the IPsec computations diff --git a/Documentation/powerpc/index.rst b/Documentation/powerpc/index.rst index d33b554ca7ba..a50834798454 100644 --- a/Documentation/powerpc/index.rst +++ b/Documentation/powerpc/index.rst @@ -36,6 +36,7 @@ powerpc ultravisor vas-api vcpudispatch_stats + vmemmap_dedup features diff --git a/Documentation/powerpc/vmemmap_dedup.rst b/Documentation/powerpc/vmemmap_dedup.rst new file mode 100644 index 000000000000..dc4db59fdf87 --- /dev/null +++ b/Documentation/powerpc/vmemmap_dedup.rst @@ -0,0 +1,101 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +========== +Device DAX +========== + +The device-dax interface uses the tail deduplication technique explained in +Documentation/mm/vmemmap_dedup.rst + +On powerpc, vmemmap deduplication is only used with radix MMU translation. Also +with a 64K page size, only the devdax namespace with 1G alignment uses vmemmap +deduplication. + +With 2M PMD level mapping, we require 32 struct pages and a single 64K vmemmap +page can contain 1024 struct pages (64K/sizeof(struct page)). Hence there is no +vmemmap deduplication possible. + +With 1G PUD level mapping, we require 16384 struct pages and a single 64K +vmemmap page can contain 1024 struct pages (64K/sizeof(struct page)). Hence we +require 16 64K pages in vmemmap to map the struct page for 1G PUD level mapping. + +Here's how things look like on device-dax after the sections are populated:: + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PUD | +-----------+ | | | + | level | | . | ----------------------+ | | + | mapping | +-----------+ | | + | | | . | ------------------------+ | + | | +-----------+ | + | | | 15 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ + + +With 4K page size, 2M PMD level mapping requires 512 struct pages and a single +4K vmemmap page contains 64 struct pages(4K/sizeof(struct page)). Hence we +require 8 4K pages in vmemmap to map the struct page for 2M pmd level mapping. 
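The page counts quoted in this file all follow from the same arithmetic, which can be checked with a short script. This is only a sketch: it assumes ``sizeof(struct page)`` is 64 bytes, which is inferred from the "64K page holds 1024 struct pages" figure rather than stated in the patch, and the helper name ``vmemmap_pages`` is made up for illustration.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30
STRUCT_PAGE_SIZE = 64  # bytes; assumed, consistent with 64K / 1024


def vmemmap_pages(mapping_size, base_page_size, struct_page_size=STRUCT_PAGE_SIZE):
    """Return (struct pages needed, struct pages per vmemmap page,
    vmemmap pages needed) for one mapping of mapping_size bytes."""
    struct_pages = mapping_size // base_page_size
    per_vmemmap_page = base_page_size // struct_page_size
    # Round up: a partially filled vmemmap page still has to exist.
    needed = -(-struct_pages // per_vmemmap_page)
    return struct_pages, per_vmemmap_page, needed


for base, mapping, label in [
    (64 * KIB, 2 * MIB, "64K pages, 2M PMD"),
    (64 * KIB, 1 * GIB, "64K pages, 1G PUD"),
    (4 * KIB, 2 * MIB, "4K pages, 2M PMD"),
    (4 * KIB, 1 * GIB, "4K pages, 1G PUD"),
]:
    print(label, vmemmap_pages(mapping, base))
```

With a 64K base page and a 2M mapping, the result is a single vmemmap page, which is why no deduplication is possible in that configuration: there are no tail vmemmap pages to share.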
+ +Here's how things look like on device-dax after the sections are populated:: + + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PMD | +-----------+ | | | + | level | | 5 | ----------------------+ | | + | mapping | +-----------+ | | + | | | 6 | ------------------------+ | + | | +-----------+ | + | | | 7 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ + +With 1G PUD level mapping, we require 262144 struct pages and a single 4K +vmemmap page can contain 64 struct pages (4K/sizeof(struct page)). Hence we +require 4096 4K pages in vmemmap to map the struct pages for 1G PUD level +mapping. + +Here's how things look like on device-dax after the sections are populated:: + + +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ + | | | 0 | -------------> | 0 | + | | +-----------+ +-----------+ + | | | 1 | -------------> | 1 | + | | +-----------+ +-----------+ + | | | 2 | ----------------^ ^ ^ ^ ^ ^ + | | +-----------+ | | | | | + | | | 3 | ------------------+ | | | | + | | +-----------+ | | | | + | | | 4 | --------------------+ | | | + | PUD | +-----------+ | | | + | level | | . | ----------------------+ | | + | mapping | +-----------+ | | + | | | . | ------------------------+ | + | | +-----------+ | + | | | 4095 | --------------------------+ + | | +-----------+ + | | + | | + | | + +-----------+ diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst index 5561dae94f85..0bbd040f6a55 100644 --- a/Documentation/process/changes.rst +++ b/Documentation/process/changes.rst @@ -31,8 +31,8 @@ you probably needn't concern yourself with pcmciautils. 
====================== =============== ======================================== GNU C 5.1 gcc --version Clang/LLVM (optional) 11.0.0 clang --version -Rust (optional) 1.68.2 rustc --version -bindgen (optional) 0.56.0 bindgen --version +Rust (optional) 1.71.1 rustc --version +bindgen (optional) 0.65.1 bindgen --version GNU make 3.82 make --version bash 4.2 bash --version binutils 2.25 ld -v diff --git a/Documentation/process/maintainer-netdev.rst b/Documentation/process/maintainer-netdev.rst index 2ab843cde830..c1c732e9748b 100644 --- a/Documentation/process/maintainer-netdev.rst +++ b/Documentation/process/maintainer-netdev.rst @@ -167,6 +167,8 @@ Asking the maintainer for status updates on your patch is a good way to ensure your patch is ignored or pushed to the bottom of the priority list. +.. _Changes requested: + Changes requested ~~~~~~~~~~~~~~~~~ @@ -359,6 +361,10 @@ Make sure you address all the feedback in your new posting. Do not post a new version of the code if the discussion about the previous version is still ongoing, unless directly instructed by a reviewer. +The new version of patches should be posted as a separate thread, +not as a reply to the previous posting. Change log should include a link +to the previous posting (see :ref:`Changes requested`). + Testing ------- diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst index a8931512ed98..f382914f4191 100644 --- a/Documentation/rust/quick-start.rst +++ b/Documentation/rust/quick-start.rst @@ -38,7 +38,9 @@ and run:: rustup override set $(scripts/min-tool-version.sh rustc) -Otherwise, fetch a standalone installer from: +This will configure your working directory to use the correct version of +``rustc`` without affecting your default toolchain. 
If you are not using +``rustup``, fetch a standalone installer from: https://forge.rust-lang.org/infra/other-installation-methods.html#standalone @@ -56,16 +58,17 @@ If ``rustup`` is being used, run:: The components are installed per toolchain, thus upgrading the Rust compiler version later on requires re-adding the component. -Otherwise, if a standalone installer is used, the Rust repository may be cloned -into the installation folder of the toolchain:: +Otherwise, if a standalone installer is used, the Rust source tree may be +downloaded into the toolchain's installation folder:: - git clone --recurse-submodules \ - --branch $(scripts/min-tool-version.sh rustc) \ - https://github.com/rust-lang/rust \ - $(rustc --print sysroot)/lib/rustlib/src/rust + curl -L "https://static.rust-lang.org/dist/rust-src-$(scripts/min-tool-version.sh rustc).tar.gz" | + tar -xzf - -C "$(rustc --print sysroot)/lib" \ + "rust-src-$(scripts/min-tool-version.sh rustc)/rust-src/lib/" \ + --strip-components=3 In this case, upgrading the Rust compiler version later on requires manually -updating this clone. +updating the source tree (this can be done by removing ``$(rustc --print +sysroot)/lib/rustlib/src/rust`` then rerunning the above command). libclang @@ -98,7 +101,24 @@ the ``bindgen`` tool. A particular version is required. Install it via (note that this will download and build the tool from source):: - cargo install --locked --version $(scripts/min-tool-version.sh bindgen) bindgen + cargo install --locked --version $(scripts/min-tool-version.sh bindgen) bindgen-cli + +``bindgen`` needs to find a suitable ``libclang`` in order to work. If it is +not found (or a different ``libclang`` than the one found should be used), +the process can be tweaked using the environment variables understood by +``clang-sys`` (the Rust bindings crate that ``bindgen`` uses to access +``libclang``): + +* ``LLVM_CONFIG_PATH`` can be pointed to an ``llvm-config`` executable. 
+ +* Or ``LIBCLANG_PATH`` can be pointed to a ``libclang`` shared library + or to the directory containing it. + +* Or ``CLANG_PATH`` can be pointed to a ``clang`` executable. + +For details, please see ``clang-sys``'s documentation at: + + https://github.com/KyleMayes/clang-sys#environment-variables Requirements: Developing @@ -179,7 +199,9 @@ be used with many editors to enable syntax highlighting, completion, go to definition, and other features. ``rust-analyzer`` needs a configuration file, ``rust-project.json``, which -can be generated by the ``rust-analyzer`` Make target. +can be generated by the ``rust-analyzer`` Make target:: + + make LLVM=1 rust-analyzer Configuration diff --git a/Documentation/scheduler/sched-design-CFS.rst b/Documentation/scheduler/sched-design-CFS.rst index 03db55504515..f68919800f05 100644 --- a/Documentation/scheduler/sched-design-CFS.rst +++ b/Documentation/scheduler/sched-design-CFS.rst @@ -94,7 +94,7 @@ other HZ detail. Thus the CFS scheduler has no notion of "timeslices" in the way the previous scheduler had, and has no heuristics whatsoever. There is only one central tunable (you have to switch on CONFIG_SCHED_DEBUG): - /sys/kernel/debug/sched/min_granularity_ns + /sys/kernel/debug/sched/base_slice_ns which can be used to tune the scheduler from "desktop" (i.e., low latencies) to "server" (i.e., good batching) workloads. It defaults to a setting suitable diff --git a/Documentation/translations/zh_CN/mm/frontswap.rst b/Documentation/translations/zh_CN/mm/frontswap.rst deleted file mode 100644 index 434975390b48..000000000000 --- a/Documentation/translations/zh_CN/mm/frontswap.rst +++ /dev/null @@ -1,196 +0,0 @@ -:Original: Documentation/mm/frontswap.rst - -:翻译: - - 司延腾 Yanteng Si <siyanteng@loongson.cn> - -:校译: - -========= -Frontswap -========= - -Frontswap为交换页提供了一个 “transcendent memory” 的接口。在一些环境中,由 -于交换页被保存在RAM(或类似RAM的设备)中,而不是交换磁盘,因此可以获得巨大的性能 -节省(提高)。 - -.. 
_Transcendent memory in a nutshell: https://lwn.net/Articles/454795/ - -Frontswap之所以这么命名,是因为它可以被认为是与swap设备的“back”存储相反。存 -储器被认为是一个同步并发安全的面向页面的“伪RAM设备”,符合transcendent memory -(如Xen的“tmem”,或内核内压缩内存,又称“zcache”,或未来的类似RAM的设备)的要 -求;这个伪RAM设备不能被内核直接访问或寻址,其大小未知且可能随时间变化。驱动程序通过 -调用frontswap_register_ops将自己与frontswap链接起来,以适当地设置frontswap_ops -的功能,它提供的功能必须符合某些策略,如下所示: - -一个 “init” 将设备准备好接收与指定的交换设备编号(又称“类型”)相关的frontswap -交换页。一个 “store” 将把该页复制到transcendent memory,并与该页的类型和偏移 -量相关联。一个 “load” 将把该页,如果找到的话,从transcendent memory复制到内核 -内存,但不会从transcendent memory中删除该页。一个 “invalidate_page” 将从 -transcendent memory中删除该页,一个 “invalidate_area” 将删除所有与交换类型 -相关的页(例如,像swapoff)并通知 “device” 拒绝进一步存储该交换类型。 - -一旦一个页面被成功存储,在该页面上的匹配加载通常会成功。因此,当内核发现自己处于需 -要交换页面的情况时,它首先尝试使用frontswap。如果存储的结果是成功的,那么数据就已 -经成功的保存到了transcendent memory中,并且避免了磁盘写入,如果后来再读回数据, -也避免了磁盘读取。如果存储返回失败,transcendent memory已经拒绝了该数据,且该页 -可以像往常一样被写入交换空间。 - -请注意,如果一个页面被存储,而该页面已经存在于transcendent memory中(一个 “重复” -的存储),要么存储成功,数据被覆盖,要么存储失败,该页面被废止。这确保了旧的数据永远 -不会从frontswap中获得。 - -如果配置正确,对frontswap的监控是通过 `/sys/kernel/debug/frontswap` 目录下的 -debugfs完成的。frontswap的有效性可以通过以下方式测量(在所有交换设备中): - -``failed_stores`` - 有多少次存储的尝试是失败的 - -``loads`` - 尝试了多少次加载(应该全部成功) - -``succ_stores`` - 有多少次存储的尝试是成功的 - -``invalidates`` - 尝试了多少次作废 - -后台实现可以提供额外的指标。 - -经常问到的问题 -============== - -* 价值在哪里? 
- -当一个工作负载开始交换时,性能就会下降。Frontswap通过提供一个干净的、动态的接口来 -读取和写入交换页到 “transcendent memory”,从而大大增加了许多这样的工作负载的性 -能,否则内核是无法直接寻址的。当数据被转换为不同的形式和大小(比如压缩)或者被秘密 -移动(对于一些类似RAM的设备来说,这可能对写平衡很有用)时,这个接口是理想的。交换 -页(和被驱逐的页面缓存页)是这种比RAM慢但比磁盘快得多的“伪RAM设备”的一大用途。 - -Frontswap对内核的影响相当小,为各种系统配置中更动态、更灵活的RAM利用提供了巨大的 -灵活性: - -在单一内核的情况下,又称“zcache”,页面被压缩并存储在本地内存中,从而增加了可以安 -全保存在RAM中的匿名页面总数。Zcache本质上是用压缩/解压缩的CPU周期换取更好的内存利 -用率。Benchmarks测试显示,当内存压力较低时,几乎没有影响,而在高内存压力下的一些 -工作负载上,则有明显的性能改善(25%以上)。 - -“RAMster” 在zcache的基础上增加了对集群系统的 “peer-to-peer” transcendent memory -的支持。Frontswap页面像zcache一样被本地压缩,但随后被“remotified” 到另一个系 -统的RAM。这使得RAM可以根据需要动态地来回负载平衡,也就是说,当系统A超载时,它可以 -交换到系统B,反之亦然。RAMster也可以被配置成一个内存服务器,因此集群中的许多服务器 -可以根据需要动态地交换到配置有大量内存的单一服务器上......而不需要预先配置每个客户 -有多少内存可用 - -在虚拟情况下,虚拟化的全部意义在于统计地将物理资源在多个虚拟机的不同需求之间进行复 -用。对于RAM来说,这真的很难做到,而且在不改变内核的情况下,要做好这一点的努力基本上 -是失败的(除了一些广为人知的特殊情况下的工作负载)。具体来说,Xen Transcendent Memory -后端允许管理器拥有的RAM “fallow”,不仅可以在多个虚拟机之间进行“time-shared”, -而且页面可以被压缩和重复利用,以优化RAM的利用率。当客户操作系统被诱导交出未充分利用 -的RAM时(如 “selfballooning”),突然出现的意外内存压力可能会导致交换;frontswap -允许这些页面被交换到管理器RAM中或从管理器RAM中交换(如果整体主机系统内存条件允许), -从而减轻计划外交换可能带来的可怕的性能影响。 - -一个KVM的实现正在进行中,并且已经被RFC'ed到lkml。而且,利用frontswap,对NVM作为 -内存扩展技术的调查也在进行中。 - -* 当然,在某些情况下可能有性能上的优势,但frontswap的空间/时间开销是多少? - -如果 CONFIG_FRONTSWAP 被禁用,每个 frontswap 钩子都会编译成空,唯一的开销是每 -个 swapon'ed swap 设备的几个额外字节。如果 CONFIG_FRONTSWAP 被启用,但没有 -frontswap的 “backend” 寄存器,每读或写一个交换页就会有一个额外的全局变量,而不 -是零。如果 CONFIG_FRONTSWAP 被启用,并且有一个frontswap的backend寄存器,并且 -后端每次 “store” 请求都失败(即尽管声称可能,但没有提供内存),CPU 的开销仍然可以 -忽略不计 - 因为每次frontswap失败都是在交换页写到磁盘之前,系统很可能是 I/O 绑定 -的,无论如何使用一小部分的 CPU 都是不相关的。 - -至于空间,如果CONFIG_FRONTSWAP被启用,并且有一个frontswap的backend注册,那么 -每个交换设备的每个交换页都会被分配一个比特。这是在内核已经为每个交换设备的每个交换 -页分配的8位(在2.6.34之前是16位)上增加的。(Hugh Dickins观察到,frontswap可能 -会偷取现有的8个比特,但是我们以后再来担心这个小的优化问题)。对于标准的4K页面大小的 -非常大的交换盘(这很罕见),这是每32GB交换盘1MB开销。 - -当交换页存储在transcendent memory中而不是写到磁盘上时,有一个副作用,即这可能会 -产生更多的内存压力,有可能超过其他的优点。一个backend,比如zcache,必须实现策略 -来仔细(但动态地)管理内存限制,以确保这种情况不会发生。 - -* 好吧,那就用内核骇客能理解的术语来快速概述一下这个frontswap补丁的作用如何? 
- -我们假设在内核初始化过程中,一个frontswap 的 “backend” 已经注册了;这个注册表 -明这个frontswap 的 “backend” 可以访问一些不被内核直接访问的“内存”。它到底提 -供了多少内存是完全动态和随机的。 - -每当一个交换设备被交换时,就会调用frontswap_init(),把交换设备的编号(又称“类 -型”)作为一个参数传给它。这就通知了frontswap,以期待 “store” 与该号码相关的交 -换页的尝试。 - -每当交换子系统准备将一个页面写入交换设备时(参见swap_writepage()),就会调用 -frontswap_store。Frontswap与frontswap backend协商,如果backend说它没有空 -间,frontswap_store返回-1,内核就会照常把页换到交换设备上。注意,来自frontswap -backend的响应对内核来说是不可预测的;它可能选择从不接受一个页面,可能接受每九个 -页面,也可能接受每一个页面。但是如果backend确实接受了一个页面,那么这个页面的数 -据已经被复制并与类型和偏移量相关联了,而且backend保证了数据的持久性。在这种情况 -下,frontswap在交换设备的“frontswap_map” 中设置了一个位,对应于交换设备上的 -页面偏移量,否则它就会将数据写入该设备。 - -当交换子系统需要交换一个页面时(swap_readpage()),它首先调用frontswap_load(), -检查frontswap_map,看这个页面是否早先被frontswap backend接受。如果是,该页 -的数据就会从frontswap后端填充,换入就完成了。如果不是,正常的交换代码将被执行, -以便从真正的交换设备上获得这一页的数据。 - -所以每次frontswap backend接受一个页面时,交换设备的读取和(可能)交换设备的写 -入都被 “frontswap backend store” 和(可能)“frontswap backend loads” -所取代,这可能会快得多。 - -* frontswap不能被配置为一个 “特殊的” 交换设备,它的优先级要高于任何真正的交换 - 设备(例如像zswap,或者可能是swap-over-nbd/NFS)? 
- -首先,现有的交换子系统不允许有任何种类的交换层次结构。也许它可以被重写以适应层次 -结构,但这将需要相当大的改变。即使它被重写,现有的交换子系统也使用了块I/O层,它 -假定交换设备是固定大小的,其中的任何页面都是可线性寻址的。Frontswap几乎没有触 -及现有的交换子系统,而是围绕着块I/O子系统的限制,提供了大量的灵活性和动态性。 - -例如,frontswap backend对任何交换页的接受是完全不可预测的。这对frontswap backend -的定义至关重要,因为它赋予了backend完全动态的决定权。在zcache中,人们无法预 -先知道一个页面的可压缩性如何。可压缩性 “差” 的页面会被拒绝,而 “差” 本身也可 -以根据当前的内存限制动态地定义。 - -此外,frontswap是完全同步的,而真正的交换设备,根据定义,是异步的,并且使用 -块I/O。块I/O层不仅是不必要的,而且可能进行 “优化”,这对面向RAM的设备来说是 -不合适的,包括将一些页面的写入延迟相当长的时间。同步是必须的,以确保后端的动 -态性,并避免棘手的竞争条件,这将不必要地大大增加frontswap和/或块I/O子系统的 -复杂性。也就是说,只有最初的 “store” 和 “load” 操作是需要同步的。一个独立 -的异步线程可以自由地操作由frontswap存储的页面。例如,RAMster中的 “remotification” -线程使用标准的异步内核套接字,将压缩的frontswap页面移动到远程机器。同样, -KVM的客户方实现可以进行客户内压缩,并使用 “batched” hypercalls。 - -在虚拟化环境中,动态性允许管理程序(或主机操作系统)做“intelligent overcommit”。 -例如,它可以选择只接受页面,直到主机交换可能即将发生,然后强迫客户机做他们 -自己的交换。 - -transcendent memory规格的frontswap有一个坏处。因为任何 “store” 都可 -能失败,所以必须在一个真正的交换设备上有一个真正的插槽来交换页面。因此, -frontswap必须作为每个交换设备的 “影子” 来实现,它有可能容纳交换设备可能 -容纳的每一个页面,也有可能根本不容纳任何页面。这意味着frontswap不能包含比 -swap设备总数更多的页面。例如,如果在某些安装上没有配置交换设备,frontswap -就没有用。无交换设备的便携式设备仍然可以使用frontswap,但是这种设备的 -backend必须配置某种 “ghost” 交换设备,并确保它永远不会被使用。 - - -* 为什么会有这种关于 “重复存储” 的奇怪定义?如果一个页面以前被成功地存储过, - 难道它不能总是被成功地覆盖吗? - -几乎总是可以的,不,有时不能。考虑一个例子,数据被压缩了,原来的4K页面被压 -缩到了1K。现在,有人试图用不可压缩的数据覆盖该页,因此会占用整个4K。但是 -backend没有更多的空间了。在这种情况下,这个存储必须被拒绝。每当frontswap -拒绝一个会覆盖的存储时,它也必须使旧的数据作废,并确保它不再被访问。因为交 -换子系统会把新的数据写到读交换设备上,这是确保一致性的正确做法。 - -* 为什么frontswap补丁会创建新的头文件swapfile.h? 
- -frontswap代码依赖于一些swap子系统内部的数据结构,这些数据结构多年来一直 -在静态和全局之间来回移动。这似乎是一个合理的妥协:将它们定义为全局,但在一 -个新的包含文件中声明它们,该文件不被包含swap.h的大量源文件所包含。 - -Dan Magenheimer,最后更新于2012年4月9日 diff --git a/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst b/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst index b7a0544224ad..0f7e7fb5ca8c 100644 --- a/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst +++ b/Documentation/translations/zh_CN/mm/hugetlbfs_reserv.rst @@ -219,7 +219,7 @@ vma_commit_reservation()之间,预留映射有可能被改变。如果hugetlb_ 释放巨页 ======== -巨页释放是由函数free_huge_page()执行的。这个函数是hugetlbfs复合页的析构器。因此,它只传 +巨页释放是由函数free_huge_folio()执行的。这个函数是hugetlbfs复合页的析构器。因此,它只传 递一个指向页面结构体的指针。当一个巨页被释放时,可能需要进行预留计算。如果该页与包含保 留的子池相关联,或者该页在错误路径上被释放,必须恢复全局预留计数,就会出现这种情况。 @@ -387,7 +387,7 @@ region_count()在解除私有巨页映射时被调用。在私有映射中,预 然而,有几种情况是,在一个巨页被分配后,但在它被实例化之前,就遇到了错误。在这种情况下, 页面分配已经消耗了预留,并进行了适当的子池、预留映射和全局计数调整。如果页面在这个时候被释放 -(在实例化和清除PagePrivate之前),那么free_huge_page将增加全局预留计数。然而,预留映射 +(在实例化和清除PagePrivate之前),那么free_huge_folio将增加全局预留计数。然而,预留映射 显示报留被消耗了。这种不一致的状态将导致预留的巨页的 “泄漏” 。全局预留计数将比它原本的要高, 并阻止分配一个预先分配的页面。 diff --git a/Documentation/translations/zh_CN/mm/index.rst b/Documentation/translations/zh_CN/mm/index.rst index 2f53e37b8049..b950dd118be7 100644 --- a/Documentation/translations/zh_CN/mm/index.rst +++ b/Documentation/translations/zh_CN/mm/index.rst @@ -42,7 +42,6 @@ Linux内存管理文档 damon/index free_page_reporting ksm - frontswap hmm hwpoison hugetlbfs_reserv diff --git a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst index 4fb7aa666037..a2c288670a24 100644 --- a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst +++ b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst @@ -56,16 +56,16 @@ Hugetlb特定的辅助函数: 架构对分页表锁的支持 ==================== -没有必要特别启用PTE分页表锁:所有需要的东西都由pgtable_pte_page_ctor() -和pgtable_pte_page_dtor()完成,它们必须在PTE表分配/释放时被调用。 +没有必要特别启用PTE分页表锁:所有需要的东西都由pagetable_pte_ctor() 
+和pagetable_pte_dtor()完成,它们必须在PTE表分配/释放时被调用。 确保架构不使用slab分配器来分配页表:slab使用page->slab_cache来分配其页 面。这个区域与page->ptl共享存储。 PMD分页锁只有在你有两个以上的页表级别时才有意义。 -启用PMD分页锁需要在PMD表分配时调用pgtable_pmd_page_ctor(),在释放时调 -用pgtable_pmd_page_dtor()。 +启用PMD分页锁需要在PMD表分配时调用pagetable_pmd_ctor(),在释放时调 +用pagetable_pmd_dtor()。 分配通常发生在pmd_alloc_one()中,释放发生在pmd_free()和pmd_free_tlb() 中,但要确保覆盖所有的PMD表分配/释放路径:即X86_PAE在pgd_alloc()中预先 @@ -73,7 +73,7 @@ PMD分页锁只有在你有两个以上的页表级别时才有意义。 一切就绪后,你可以设置CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK。 -注意:pgtable_pte_page_ctor()和pgtable_pmd_page_ctor()可能失败--必 +注意:pagetable_pte_ctor()和pagetable_pmd_ctor()可能失败--必 须正确处理。 page->ptl @@ -90,7 +90,7 @@ page->ptl用于访问分割页表锁,其中'page'是包含该表的页面struc 的指针并动态分配它。这允许在启用DEBUG_SPINLOCK或DEBUG_LOCK_ALLOC的 情况下使用分页锁,但由于间接访问而多花了一个缓存行。 -PTE表的spinlock_t分配在pgtable_pte_page_ctor()中,PMD表的spinlock_t -分配在pgtable_pmd_page_ctor()中。 +PTE表的spinlock_t分配在pagetable_pte_ctor()中,PMD表的spinlock_t +分配在pagetable_pmd_ctor()中。 请不要直接访问page->ptl - -使用适当的辅助函数。 diff --git a/Documentation/userspace-api/netlink/genetlink-legacy.rst b/Documentation/userspace-api/netlink/genetlink-legacy.rst index 802875a37a27..40b82ad5d54a 100644 --- a/Documentation/userspace-api/netlink/genetlink-legacy.rst +++ b/Documentation/userspace-api/netlink/genetlink-legacy.rst @@ -8,11 +8,8 @@ This document describes the many additional quirks and properties required to describe older Generic Netlink families which form the ``genetlink-legacy`` protocol level. -The spec is a work in progress, some of the quirks are just documented -for future reference. - -Specification (defined) -======================= +Specification +============= Attribute type nests -------------------- @@ -156,16 +153,27 @@ it will be allocated 3 for the request (``a`` is the previous operation with a request section and the value of 2) and 8 for response (``c`` is the previous operation in the "from-kernel" direction). 
-Other quirks (todo)
-===================
+Other quirks
+============

 Structures
 ----------

 Legacy families can define C structures both to be used as the contents of
 an attribute and as a fixed message header. Structures are defined in
-``definitions`` and referenced in operations or attributes. Note that
-structures defined in YAML are implicitly packed according to C
+``definitions`` and referenced in operations or attributes.
+
+members
+~~~~~~~
+
+ - ``name`` - The attribute name of the struct member
+ - ``type`` - One of the scalar types ``u8``, ``u16``, ``u32``, ``u64``, ``s8``,
+   ``s16``, ``s32``, ``s64``, ``string`` or ``binary``.
+ - ``byte-order`` - ``big-endian`` or ``little-endian``
+ - ``doc``, ``enum``, ``enum-as-flags``, ``display-hint`` - Same as for
+   :ref:`attribute definitions <attribute_properties>`
+
+Note that structures defined in YAML are implicitly packed according to C
 conventions. For example, the following struct is 4 bytes, not 6 bytes:

 .. code-block:: c
diff --git a/Documentation/userspace-api/netlink/index.rst b/Documentation/userspace-api/netlink/index.rst
index 26f3720cb3be..62725dafbbdb 100644
--- a/Documentation/userspace-api/netlink/index.rst
+++ b/Documentation/userspace-api/netlink/index.rst
@@ -14,5 +14,6 @@ Netlink documentation for users.
    specs
    c-code-gen
    genetlink-legacy
+   netlink-raw

 See also :ref:`Documentation/core-api/netlink.rst <kernel_netlink>`.
diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst
new file mode 100644
index 000000000000..f07fb9b9c101
--- /dev/null
+++ b/Documentation/userspace-api/netlink/netlink-raw.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+======================================================
+Netlink specification support for raw Netlink families
+======================================================
+
+This document describes the additional properties required by raw Netlink
+families such as ``NETLINK_ROUTE`` which use the ``netlink-raw`` protocol
+specification.
+
+Specification
+=============
+
+The netlink-raw schema extends the :doc:`genetlink-legacy <genetlink-legacy>`
+schema with properties that are needed to specify the protocol numbers and
+multicast IDs used by raw netlink families. See :ref:`classic_netlink` for more
+information.
+
+Globals
+-------
+
+protonum
+~~~~~~~~
+
+The ``protonum`` property is used to specify the protocol number to use when
+opening a netlink socket.
+
+.. code-block:: yaml
+
+  # SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+
+  name: rt-addr
+  protocol: netlink-raw
+  protonum: 0 # part of the NETLINK_ROUTE protocol
+
+
+Multicast group properties
+--------------------------
+
+value
+~~~~~
+
+The ``value`` property is used to specify the group ID to use for multicast
+group registration.
+
+.. code-block:: yaml
+
+  mcast-groups:
+    list:
+      -
+        name: rtnlgrp-ipv4-ifaddr
+        value: 5
+      -
+        name: rtnlgrp-ipv6-ifaddr
+        value: 9
+      -
+        name: rtnlgrp-mctp-ifaddr
+        value: 34
diff --git a/Documentation/userspace-api/netlink/specs.rst b/Documentation/userspace-api/netlink/specs.rst
index 2e4acde890b7..cc4e2430997e 100644
--- a/Documentation/userspace-api/netlink/specs.rst
+++ b/Documentation/userspace-api/netlink/specs.rst
@@ -68,6 +68,10 @@ The following sections describe the properties of the most modern ``genetlink``
 schema. See the documentation of :doc:`genetlink-c <c-code-gen>`
 for information on how C names are derived from name properties.

+See also :ref:`Documentation/core-api/netlink.rst <kernel_netlink>` for
+information on the Netlink specification properties that are only relevant to
+the kernel space and not part of the user space API.
+
 genetlink
 =========

@@ -180,6 +184,8 @@ attributes

 List of attributes in the set.

+.. _attribute_properties:
+
 Attribute properties
 --------------------

@@ -264,6 +270,13 @@ a C array of u32 values can be specified with ``type: binary`` and
 ``sub-type: u32``. Binary types and legacy array formats are described in
 more detail in :doc:`genetlink-legacy`.

+display-hint
+~~~~~~~~~~~~
+
+Optional format indicator that is intended only for choosing the right
+formatting mechanism when displaying values of this type. Currently supported
+hints are ``hex``, ``mac``, ``fddi``, ``ipv4``, ``ipv6`` and ``uuid``.
+
 operations
 ----------