diff options
Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/index.rst | 1 | ||||
-rw-r--r-- | Documentation/core-api/netlink.rst | 101 | ||||
-rw-r--r-- | Documentation/core-api/packing.rst | 2 | ||||
-rw-r--r-- | Documentation/core-api/padata.rst | 2 | ||||
-rw-r--r-- | Documentation/core-api/pin_user_pages.rst | 31 | ||||
-rw-r--r-- | Documentation/core-api/workqueue.rst | 4 |
6 files changed, 121 insertions, 20 deletions
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 77eb775b8b42..7a3a08d81f11 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -127,6 +127,7 @@ Documents that don't fit elsewhere or which have yet to be categorized. :maxdepth: 1 librs + netlink .. only:: subproject and html diff --git a/Documentation/core-api/netlink.rst b/Documentation/core-api/netlink.rst new file mode 100644 index 000000000000..e4a938a05cc9 --- /dev/null +++ b/Documentation/core-api/netlink.rst @@ -0,0 +1,101 @@ +.. SPDX-License-Identifier: BSD-3-Clause + +.. _kernel_netlink: + +=================================== +Netlink notes for kernel developers +=================================== + +General guidance +================ + +Attribute enums +--------------- + +Older families often define "null" attributes and commands with value +of ``0`` and named ``unspec``. This is supported (``type: unused``) +but should be avoided in new families. The ``unspec`` enum values are +not used in practice, so just set the value of the first attribute to ``1``. + +Message enums +------------- + +Use the same command IDs for requests and replies. This makes it easier +to match them up, and we have plenty of ID space. + +Use separate command IDs for notifications. This makes it easier to +sort the notifications from replies (and present them to the user +application via a different API than replies). + +Answer requests +--------------- + +Older families do not reply to all of the commands, especially NEW / ADD +commands. User only gets information whether the operation succeeded or +not via the ACK. Try to find useful data to return. Once the command is +added whether it replies with a full message or only an ACK is uAPI and +cannot be changed. It's better to err on the side of replying. + +Specifically NEW and ADD commands should reply with information identifying +the created object such as the allocated object's ID (without having to +resort to using ``NLM_F_ECHO``). + +NLM_F_ECHO +---------- + +Make sure to pass the request info to genl_notify() to allow ``NLM_F_ECHO`` +to take effect. This is useful for programs that need precise feedback +from the kernel (for example for logging purposes). + +Support dump consistency +------------------------ + +If iterating over objects during dump may skip over objects or repeat +them - make sure to report dump inconsistency with ``NLM_F_DUMP_INTR``. +This is usually implemented by maintaining a generation id for the +structure and recording it in the ``seq`` member of struct netlink_callback. + +Netlink specification +===================== + +Documentation of the Netlink specification parts which are only relevant +to the kernel space. + +Globals +------- + +kernel-policy +~~~~~~~~~~~~~ + +Defines if the kernel validation policy is per operation (``per-op``) +or for the entire family (``global``). New families should use ``per-op`` +(default) to be able to narrow down the attributes accepted by a specific +command. + +checks +------ + +Documentation for the ``checks`` sub-sections of attribute specs. + +unterminated-ok +~~~~~~~~~~~~~~~ + +Accept strings without the null-termination (for legacy families only). +Switches from the ``NLA_NUL_STRING`` to ``NLA_STRING`` policy type. + +max-len +~~~~~~~ + +Defines max length for a binary or string attribute (corresponding +to the ``len`` member of struct nla_policy). For string attributes terminating +null character is not counted towards ``max-len``. + +The field may either be a literal integer value or a name of a defined +constant. String types may reduce the constant by one +(i.e. specify ``max-len: CONST - 1``) to reserve space for the terminating +character so implementations should recognize such pattern. + +min-len +~~~~~~~ + +Similar to ``max-len`` but defines minimum length. diff --git a/Documentation/core-api/packing.rst b/Documentation/core-api/packing.rst index d8c341fe383e..3ed13bc9a195 100644 --- a/Documentation/core-api/packing.rst +++ b/Documentation/core-api/packing.rst @@ -161,6 +161,6 @@ xxx_packing() that calls it using the proper QUIRK_* one-hot bits set. The packing() function returns an int-encoded error code, which protects the programmer against incorrect API use. The errors are not expected to occur -durring runtime, therefore it is reasonable for xxx_packing() to return void +during runtime, therefore it is reasonable for xxx_packing() to return void and simply swallow those errors. Optionally it can dump stack or print the error description. diff --git a/Documentation/core-api/padata.rst b/Documentation/core-api/padata.rst index 35175710b43c..05b73c6c105f 100644 --- a/Documentation/core-api/padata.rst +++ b/Documentation/core-api/padata.rst @@ -42,7 +42,7 @@ padata_shells associated with it, each allowing a separate series of jobs. Modifying cpumasks ------------------ -The CPUs used to run jobs can be changed in two ways, programatically with +The CPUs used to run jobs can be changed in two ways, programmatically with padata_set_cpumask() or via sysfs. The former is defined:: int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst index b18416f4500f..9fb0b1080d3b 100644 --- a/Documentation/core-api/pin_user_pages.rst +++ b/Documentation/core-api/pin_user_pages.rst @@ -55,18 +55,17 @@ flags the caller provides. The caller is required to pass in a non-null struct pages* array, and the function then pins pages by incrementing each by a special value: GUP_PIN_COUNTING_BIAS. -For compound pages, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead, -an exact form of pin counting is achieved, by using the 2nd struct page -in the compound page. A new struct page field, compound_pincount, has -been added in order to support this. - -This approach for compound pages avoids the counting upper limit problems that -are discussed below. Those limitations would have been aggravated severely by -huge pages, because each tail page adds a refcount to the head page. And in -fact, testing revealed that, without a separate compound_pincount field, -page overflows were seen in some huge page stress tests. - -This also means that huge pages and compound pages do not suffer +For large folios, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead, +the extra space available in the struct folio is used to store the +pincount directly. + +This approach for large folios avoids the counting upper limit problems +that are discussed below. Those limitations would have been aggravated +severely by huge pages, because each tail page adds a refcount to the +head page. And in fact, testing revealed that, without a separate pincount +field, refcount overflows were seen in some huge page stress tests. + +This also means that huge pages and large folios do not suffer from the false positives problem that is mentioned below.:: Function @@ -221,7 +220,7 @@ Unit testing ============ This file:: - tools/testing/selftests/vm/gup_test.c + tools/testing/selftests/mm/gup_test.c has the following new calls to exercise the new pin*() wrapper functions: @@ -264,9 +263,9 @@ place.) Other diagnostics ================= -dump_page() has been enhanced slightly, to handle these new counting -fields, and to better report on compound pages in general. Specifically, -for compound pages, the exact (compound_pincount) pincount is reported. +dump_page() has been enhanced slightly to handle these new counting +fields, and to better report on large folios in general. Specifically, +for large folios, the exact pincount is reported. References ========== diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst index 3b22ed137662..8ec4d6270b24 100644 --- a/Documentation/core-api/workqueue.rst +++ b/Documentation/core-api/workqueue.rst @@ -370,8 +370,8 @@ of possible problems: The first one can be tracked using tracing: :: - $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event - $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt + $ echo workqueue:workqueue_queue_work > /sys/kernel/tracing/set_event + $ cat /sys/kernel/tracing/trace_pipe > out.txt (wait a few secs) ^C |