summaryrefslogtreecommitdiff
path: root/Documentation/process/adding-syscalls.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/process/adding-syscalls.rst')
-rw-r--r--Documentation/process/adding-syscalls.rst151
1 files changed, 135 insertions, 16 deletions
diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst
index 8cc25a06f353..fc0b0bbcd34d 100644
--- a/Documentation/process/adding-syscalls.rst
+++ b/Documentation/process/adding-syscalls.rst
@@ -1,3 +1,6 @@
+
+.. _addsyscalls:
+
Adding a New System Call
========================
@@ -30,7 +33,7 @@ interface.
to a somewhat opaque API.
- If you're just exposing runtime system information, a new node in sysfs
- (see ``Documentation/filesystems/sysfs.txt``) or the ``/proc`` filesystem may
+ (see ``Documentation/filesystems/sysfs.rst``) or the ``/proc`` filesystem may
be more appropriate. However, access to these mechanisms requires that the
relevant filesystem is mounted, which might not always be the case (e.g.
in a namespaced/sandboxed/chrooted environment). Avoid adding any API to
@@ -222,7 +225,7 @@ your new syscall number may get adjusted to resolve conflicts.
The file ``kernel/sys_ni.c`` provides a fallback stub implementation of each
system call, returning ``-ENOSYS``. Add your new system call here too::
- cond_syscall(sys_xyzzy);
+ COND_SYSCALL(xyzzy);
Your new kernel functionality, and the system call that controls it, should
normally be optional, so add a ``CONFIG`` option (typically to
@@ -232,7 +235,7 @@ normally be optional, so add a ``CONFIG`` option (typically to
by the option.
- Make the option depend on EXPERT if it should be hidden from normal users.
- Make any new source files implementing the function dependent on the CONFIG
- option in the Makefile (e.g. ``obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.c``).
+ option in the Makefile (e.g. ``obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.o``).
- Double check that the kernel still builds with the new CONFIG option turned
off.
@@ -245,6 +248,52 @@ To summarize, you need a commit that includes:
- fallback stub in ``kernel/sys_ni.c``
+.. _syscall_generic_6_11:
+
+Since 6.11
+~~~~~~~~~~
+
+Starting with kernel version 6.11, general system call implementation for the
+following architectures no longer requires modifications to
+``include/uapi/asm-generic/unistd.h``:
+
+ - arc
+ - arm64
+ - csky
+ - hexagon
+ - loongarch
+ - nios2
+ - openrisc
+ - riscv
+
+Instead, you need to update ``scripts/syscall.tbl`` and, if applicable, adjust
+``arch/*/kernel/Makefile.syscalls``.
+
+As ``scripts/syscall.tbl`` serves as a common syscall table across multiple
+architectures, a new entry is required in this table::
+
+ 468 common xyzzy sys_xyzzy
+
+Note that adding an entry to ``scripts/syscall.tbl`` with the "common" ABI
+also affects all architectures that share this table. For more limited or
+architecture-specific changes, consider using an architecture-specific ABI or
+defining a new one.
+
+If a new ABI, say ``xyz``, is introduced, the corresponding updates should be
+made to ``arch/*/kernel/Makefile.syscalls`` as well::
+
+ syscall_abis_{32,64} += xyz (...)
+
+To summarize, you need a commit that includes:
+
+ - ``CONFIG`` option for the new function, normally in ``init/Kconfig``
+ - ``SYSCALL_DEFINEn(xyzzy, ...)`` for the entry point
+ - corresponding prototype in ``include/linux/syscalls.h``
+ - new entry in ``scripts/syscall.tbl``
+ - (if needed) Makefile updates in ``arch/*/kernel/Makefile.syscalls``
+ - fallback stub in ``kernel/sys_ni.c``
+
+
x86 System Call Implementation
------------------------------
@@ -350,6 +399,41 @@ To summarize, you need:
``include/uapi/asm-generic/unistd.h``
+Since 6.11
+~~~~~~~~~~
+
+This applies to all the architectures listed in :ref:`Since 6.11<syscall_generic_6_11>`
+under "Generic System Call Implementation", except arm64. See
+:ref:`Compatibility System Calls (arm64)<compat_arm64>` for more information.
+
+You need to extend the entry in ``scripts/syscall.tbl`` with an extra column
+to indicate that a 32-bit userspace program running on a 64-bit kernel should
+hit the compat entry point::
+
+ 468 common xyzzy sys_xyzzy compat_sys_xyzzy
+
+To summarize, you need:
+
+ - ``COMPAT_SYSCALL_DEFINEn(xyzzy, ...)`` for the compat entry point
+ - corresponding prototype in ``include/linux/compat.h``
+ - modification of the entry in ``scripts/syscall.tbl`` to include an extra
+ "compat" column
+ - (if needed) 32-bit mapping struct in ``include/linux/compat.h``
+
+
+.. _compat_arm64:
+
+Compatibility System Calls (arm64)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+On arm64, there is a dedicated syscall table for compatibility system calls
+targeting 32-bit (AArch32) userspace: ``arch/arm64/tools/syscall_32.tbl``.
+You need to add an additional line to this table specifying the compat
+entry point::
+
+ 468 common xyzzy sys_xyzzy compat_sys_xyzzy
+
+
Compatibility System Calls (x86)
--------------------------------
@@ -360,7 +444,7 @@ First, the entry in ``arch/x86/entry/syscalls/syscall_32.tbl`` gets an extra
column to indicate that a 32-bit userspace program running on a 64-bit kernel
should hit the compat entry point::
- 380 i386 xyzzy sys_xyzzy compat_sys_xyzzy
+ 380 i386 xyzzy sys_xyzzy __ia32_compat_sys_xyzzy
Second, you need to figure out what should happen for the x32 ABI version of
the new system call. There's a choice here: the layout of the arguments
@@ -373,7 +457,7 @@ the compatibility wrapper::
333 64 xyzzy sys_xyzzy
...
- 555 x32 xyzzy compat_sys_xyzzy
+ 555 x32 xyzzy __x32_compat_sys_xyzzy
If no pointers are involved, then it is preferable to re-use the 64-bit system
call for the x32 ABI (and consequently the entry in
@@ -487,6 +571,38 @@ patchset, for the convenience of reviewers.
The man page should be cc'ed to linux-man@vger.kernel.org
For more details, see https://www.kernel.org/doc/man-pages/patches.html
+
+Do not call System Calls in the Kernel
+--------------------------------------
+
+System calls are, as stated above, interaction points between userspace and
+the kernel. Therefore, system call functions such as ``sys_xyzzy()`` or
+``compat_sys_xyzzy()`` should only be called from userspace via the syscall
+table, but not from elsewhere in the kernel. If the syscall functionality is
+useful to be used within the kernel, needs to be shared between an old and a
+new syscall, or needs to be shared between a syscall and its compatibility
+variant, it should be implemented by means of a "helper" function (such as
+``ksys_xyzzy()``). This kernel function may then be called within the
+syscall stub (``sys_xyzzy()``), the compatibility syscall stub
+(``compat_sys_xyzzy()``), and/or other kernel code.
+
+At least on 64-bit x86, it will be a hard requirement from v4.17 onwards to not
+call system call functions in the kernel. It uses a different calling
+convention for system calls where ``struct pt_regs`` is decoded on-the-fly in a
+syscall wrapper which then hands processing over to the actual syscall function.
+This means that only those parameters which are actually needed for a specific
+syscall are passed on during syscall entry, instead of filling in six CPU
+registers with random user space content all the time (which may cause serious
+trouble down the call chain).
+
+Moreover, rules on how data may be accessed may differ between kernel data and
+user data. This is another reason why calling ``sys_xyzzy()`` is generally a
+bad idea.
+
+Exceptions to this rule are only allowed in architecture-specific overrides,
+architecture-specific compatibility wrappers, or other code in arch/.
+
+
References and Sources
----------------------
@@ -506,25 +622,25 @@ References and Sources
:manpage:`syscall(2)` man-page:
http://man7.org/linux/man-pages/man2/syscall.2.html#NOTES
- Collated emails from Linus Torvalds discussing the problems with ``ioctl()``:
- http://yarchive.net/comp/linux/ioctl.html
+ https://yarchive.net/comp/linux/ioctl.html
- "How to not invent kernel interfaces", Arnd Bergmann,
- http://www.ukuug.org/events/linux2007/2007/papers/Bergmann.pdf
+ https://www.ukuug.org/events/linux2007/2007/papers/Bergmann.pdf
- LWN article from Michael Kerrisk on avoiding new uses of CAP_SYS_ADMIN:
https://lwn.net/Articles/486306/
- Recommendation from Andrew Morton that all related information for a new
system call should come in the same email thread:
- https://lkml.org/lkml/2014/7/24/641
+ https://lore.kernel.org/r/20140724144747.3041b208832bbdf9fbce5d96@linux-foundation.org
- Recommendation from Michael Kerrisk that a new system call should come with
- a man page: https://lkml.org/lkml/2014/6/13/309
+ a man page: https://lore.kernel.org/r/CAKgNAkgMA39AfoSoA5Pe1r9N+ZzfYQNvNPvcRN7tOvRb8+v06Q@mail.gmail.com
- Suggestion from Thomas Gleixner that x86 wire-up should be in a separate
- commit: https://lkml.org/lkml/2014/11/19/254
+ commit: https://lore.kernel.org/r/alpine.DEB.2.11.1411191249560.3909@nanos
- Suggestion from Greg Kroah-Hartman that it's good for new system calls to
- come with a man-page & selftest: https://lkml.org/lkml/2014/3/19/710
+ come with a man-page & selftest: https://lore.kernel.org/r/20140320025530.GA25469@kroah.com
- Discussion from Michael Kerrisk of new system call vs. :manpage:`prctl(2)` extension:
- https://lkml.org/lkml/2014/6/3/411
+ https://lore.kernel.org/r/CAHO5Pa3F2MjfTtfNxa8LbnkeeU8=YJ+9tDqxZpw7Gz59E-4AUg@mail.gmail.com
- Suggestion from Ingo Molnar that system calls that involve multiple
arguments should encapsulate those arguments in a struct, which includes a
- size field for future extensibility: https://lkml.org/lkml/2015/7/30/117
+ size field for future extensibility: https://lore.kernel.org/r/20150730083831.GA22182@gmail.com
- Numbering oddities arising from (re-)use of O_* numbering space flags:
- commit 75069f2b5bfb ("vfs: renumber FMODE_NONOTIFY and add to uniqueness
@@ -534,9 +650,12 @@ References and Sources
- commit bb458c644a59 ("Safer ABI for O_TMPFILE")
- Discussion from Matthew Wilcox about restrictions on 64-bit arguments:
- https://lkml.org/lkml/2008/12/12/187
+ https://lore.kernel.org/r/20081212152929.GM26095@parisc-linux.org
- Recommendation from Greg Kroah-Hartman that unknown flags should be
- policed: https://lkml.org/lkml/2014/7/17/577
+ policed: https://lore.kernel.org/r/20140717193330.GB4703@kroah.com
- Recommendation from Linus Torvalds that x32 system calls should prefer
compatibility with 64-bit versions rather than 32-bit versions:
- https://lkml.org/lkml/2011/8/31/244
+ https://lore.kernel.org/r/CA+55aFxfmwfB7jbbrXxa=K7VBYPfAvmu3XOkGrLbB1UFjX1+Ew@mail.gmail.com
+ - Patch series revising system call table infrastructure to use
+ scripts/syscall.tbl across multiple architectures:
+ https://lore.kernel.org/lkml/20240704143611.2979589-1-arnd@kernel.org