Age | Commit message (Collapse) | Author |
|
To get rid of setjmp()/longjmp(), the variant and self need to be usable
from __bail().
Make them available from the test metadata.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-11-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
To get rid of setjmp()/longjmp(), the teardown logic needs to be usable
from __bail(). Introduce a new callback for it.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-10-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
To get rid of setjmp()/longjmp(), the teardown logic needs to be usable
from __bail(). To access the atomic teardown conditional from there,
move it into the test metadata.
This also allows the removal of "setup_completed".
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-9-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This field is unused and has no meaning for tests without fixtures.
Don't set it for them.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-8-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Make the kselftest harness compatible with nolibc which does not implement
signals by replacing the signal logic with pidfds.
The code also becomes simpler.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-7-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
__sync_bool_compare_and_swap() is deprecated and requires libatomic on
GCC. Compiler toolchains don't necessarily have libatomic available, so
avoid this requirement by using atomics that don't need libatomic.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-6-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
The pointers to the wrappers are stored in function pointers,
preventing them from actually being inlined.
Remove the inline qualifier, aligning these wrappers with the other
functions defined through macros.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-5-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
With -Wmissing-prototypes the compiler will warn about non-static
functions which don't have a prototype defined.
As they are not used from a different compilation unit they don't need to
be defined globally.
Avoid the issue by marking the functions static.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
For tests without fixtures the variant argument is unused.
This is intentional, prevent to compiler from complaining.
Example warning:
harness-selftest.c: In function 'wrapper_standalone_pass':
../kselftest_harness.h:181:52: error: unused parameter 'variant' [-Werror=unused-parameter]
181 | struct __fixture_variant_metadata *variant) \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
../kselftest_harness.h:156:25: note: in expansion of macro '__TEST_IMPL'
156 | #define TEST(test_name) __TEST_IMPL(test_name, -1)
| ^~~~~~~~~~~
harness-selftest.c:15:1: note: in expansion of macro 'TEST'
15 | TEST(standalone_pass) {
| ^~~~
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-3-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
All comments in this file use C89 comment style.
Except for this one. Change it to get one step closer to C89
compatibility.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-2-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Add a selftest for the kselftest harness itself so any changes can be
validated.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-1-ee4dd5257135@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Nolibc now provides all the headers required by nolibc-test.c.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-9-74f82eea3b59@weissschuh.net
Acked-by: Willy Tarreau <w@1wt.eu>
|
|
This is the location regular userspace expects these definitions.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-8-74f82eea3b59@weissschuh.net
|
|
This is the location regular userspace expects these definitions.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-7-74f82eea3b59@weissschuh.net
|
|
This is the location regular userspace expects these definitions.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-6-74f82eea3b59@weissschuh.net
|
|
This is the location regular userspace expects these definitions.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-5-74f82eea3b59@weissschuh.net
|
|
This is the location regular userspace expects this definition.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-4-74f82eea3b59@weissschuh.net
|
|
This is the location regular userspace expects this definition.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-3-74f82eea3b59@weissschuh.net
|
|
This is the location regular userspace expects this definition.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-2-74f82eea3b59@weissschuh.net
|
|
This is the location regular userspace expects this definition.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250515-nolibc-sys-v1-1-74f82eea3b59@weissschuh.net
|
|
Newer architectures like riscv 32-bit are missing sys_wait4().
Make use of the fact that wait(&status) is defined to be equivalent to
waitpid(-1, status, 0) to implement it on all architectures.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-15-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Newer architectures (like riscv32) do not implement sys_gettimeofday().
In those cases fall back to sys_clock_gettime().
While that does not support the timezone argument of sys_gettimeofday(),
specifying this argument invokes undefined behaviour, so it's safe to ignore.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-14-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Only the standard POSIX modes are supported.
No extensions nor the (noop) "b" from ISO C are accepted.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-13-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Not all configurations support namespaces, so skip the tests where
necessary. Also if the tests are running without privileges.
Enable the namespace configuration for those architectures where it is not
enabled by default.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-12-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-11-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-10-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-9-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-8-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-7-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-6-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-5-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-4-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Add fstat(), fstatat() and lstat(). All of them use the existing implementation
based on statx().
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-3-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
The %m format can be used to format the current errno.
It is non-standard but supported by other commonly used libcs like glibc and
musl, so applications do rely on them.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-2-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is used in various selftests and will be handy when integrating
those with nolibc.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250428-nolibc-misc-v2-1-3c043eeab06c@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
The UAPI headers already provide definitions for these symbols.
Using them makes the code shorter, more robust and compatible with
applications using linux/poll.h directly.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250430-poll-v1-2-44b5ceabdeee@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is the location regular userspace expects the definition.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250430-poll-v1-1-44b5ceabdeee@linutronix.de
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Add nolibc support for m68k. Should be helpful for nommu where
linking libc can bloat even hello world to the point where you get
an OOM just trying to load it.
Signed-off-by: Daniel Palmer <daniel@thingy.jp>
Link: https://lore.kernel.org/r/20250426224738.284874-1-daniel@0x0f.com
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Prevent regressions of issues validates by the header check by always
running it together with the nolibc selftests.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250424-nolibc-header-check-v1-3-011576b6ed6f@linutronix.de
|
|
Inclusion of any nolibc header file should also bring all other headers.
On the other hand it should also be possible to include any nolibc header
files
in any order.
Currently this is implemented by including the catch-all nolibc.h after the
headers own definitions.
This is problematic if one nolibc header depends on another one.
The first header has to include the other one before defining any symbols.
That in turn will include the rest of nolibc while the current header has
not defined anything yet. If any other part of nolibc depends on
definitions from the current header, errors are encountered.
This is already the case today. Effectively nolibc can only be included in
the order of nolibc.h.
Restructure the way "nolibc.h" is included.
Move it to the beginning of the header files and before the include guards.
Now any header will behave exactly like "nolibc.h" while the include
guards prevent any duplicate definitions.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250424-nolibc-header-check-v1-2-011576b6ed6f@linutronix.de
|
|
Each nolibc header should be valid for inclusion irrespective of any
special ordering requirements.
Add a new make target, based on the old kbuild "make header_check" target
to validate this requirement.
For now the check fails, but the following commits will fix the issues.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250424-nolibc-header-check-v1-1-011576b6ed6f@linutronix.de
|
|
Leaving the CQ critical section in the middle of a overflow flushing
can cause cqe reordering since the cache cq pointers are reset and any
new cqe emitters that might get called in between are not going to be
forced into io_cqe_cache_refill().
Fixes: eac2ca2d682f9 ("io_uring: check if we need to reschedule during overflow flush")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin
io-policy") introduced the creation of the multipath sysfs group under
the NVMe head gendisk device node. However, it also inadvertently added
the same sysfs group under each namespace path device which head node
refers to and that is incorrect.
The multipath sysfs group should only be exposed through the namespace
head gendisk node. This is sufficient, as the head device already
provides symbolic links to the individual namespace paths it manages.
This patch fixes the issue by preventing the creation of the multipath
sysfs group under namespace path devices, ensuring it only appears under
the head disk node.
Fixes: 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Christian Brauner <brauner@kernel.org> says:
Coredumping currently supports two modes:
(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process
spawned as a child of the system_unbound_wq or kthreadd.
For simplicity I'm mostly ignoring (1). There's probably still some
users of (1) out there but processing coredumps in this way can be
considered adventurous especially in the face of set*id binaries.
The most common option should be (2) by now. It works by allowing
userspace to put a string into /proc/sys/kernel/core_pattern like:
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
The "|" at the beginning indicates to the kernel that a pipe must be
used. The path following the pipe indicator is a path to a binary that
will be spawned as a usermode helper process. Any additional parameters
pass information about the task that is generating the coredump to the
binary that processes the coredump.
In the example core_pattern shown above systemd-coredump is spawned as a
usermode helper. There's various conceptual consequences of this
(non-exhaustive list):
- systemd-coredump is spawned with file descriptor number 0 (stdin)
connected to the read-end of the pipe. All other file descriptors are
closed. That specifically includes 1 (stdout) and 2 (stderr). This has
already caused bugs because userspace assumed that this cannot happen
(Whether or not this is a sane assumption is irrelevant.).
- systemd-coredump will be spawned as a child of system_unbound_wq. So
it is not a child of any userspace process and specifically not a
child of PID 1. It cannot be waited upon and is in a weird hybrid
upcall which are difficult for userspace to control correctly.
- systemd-coredump is spawned with full kernel privileges. This
necessitates all kinds of weird privilege dropping excercises in
userspace to make this safe.
- A new usermode helper has to be spawned for each crashing process.
This series adds a new mode:
(3) Dumping into an AF_UNIX socket.
Userspace can set /proc/sys/kernel/core_pattern to:
@/path/to/coredump.socket
The "@" at the beginning indicates to the kernel that an AF_UNIX
coredump socket will be used to process coredumps.
The coredump socket must be located in the initial mount namespace.
When a task coredumps it opens a client socket in the initial network
namespace and connects to the coredump socket.
- The coredump server should use SO_PEERPIDFD to get a stable handle on
the connected crashing task. The retrieved pidfd will provide a stable
reference even if the crashing task gets SIGKILLed while generating
the coredump.
- By setting core_pipe_limit non-zero userspace can guarantee that the
crashing task cannot be reaped behind it's back and thus process all
necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to
detect whether /proc/<pid> still refers to the same process.
The core_pipe_limit isn't used to rate-limit connections to the
socket. This can simply be done via AF_UNIX socket directly.
- The pidfd for the crashing task will contain information how the task
coredumps. The PIDFD_GET_INFO ioctl gained a new flag
PIDFD_INFO_COREDUMP which can be used to retreive the coredump
information.
If the coredump gets a new coredump client connection the kernel
guarantees that PIDFD_INFO_COREDUMP information is available.
Currently the following information is provided in the new
@coredump_mask extension to struct pidfd_info:
* PIDFD_COREDUMPED is raised if the task did actually coredump.
* PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g.,
undumpable).
* PIDFD_COREDUMP_USER is raised if this is a regular coredump and
doesn't need special care by the coredump server.
* PIDFD_COREDUMP_ROOT is raised if the generated coredump should be
treated as sensitive and the coredump server should restrict access
to the generated coredump to sufficiently privileged users.
- The coredump server should mark itself as non-dumpable.
- A container coredump server in a separate network namespace can simply
bind to another well-know address and systemd-coredump fowards
coredumps to the container.
- Coredumps could in the future also be handled via per-user/session
coredump servers that run only with that users privileges.
The coredump server listens on the coredump socket and accepts a
new coredump connection. It then retrieves SO_PEERPIDFD for the
client, inspects uid/gid and hands the accepted client to the users
own coredump handler which runs with the users privileges only
(It must of coure pay close attention to not forward crashing suid
binaries.).
The new coredump socket will allow userspace to not have to rely on
usermode helpers for processing coredumps and provides a safer way to
handle them instead of relying on super privileged coredumping helpers.
This will also be significantly more lightweight since no fork()+exec()
for the usermodehelper is required for each crashing process. The
coredump server in userspace can just keep a worker pool.
* patches from https://lore.kernel.org/20250516-work-coredump-socket-v8-0-664f3caf2516@kernel.org:
selftests/coredump: add tests for AF_UNIX coredumps
selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure
coredump: validate socket name as it is written
coredump: show supported coredump modes
pidfs, coredump: add PIDFD_INFO_COREDUMP
coredump: add coredump socket
coredump: reflow dump helpers a little
coredump: massage do_coredump()
coredump: massage format_corename()
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-0-664f3caf2516@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a simple test for generating coredumps via AF_UNIX sockets.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-9-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add PIDFD_INFO_COREDUMP infrastructure so we can use it in tests.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-8-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In contrast to other parameters written into
/proc/sys/kernel/core_pattern that never fail we can validate enabling
the new AF_UNIX support. This is obviously racy as hell but it's always
been that way.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-7-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow userspace to discover what coredump modes are supported.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-6-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Extend the PIDFD_INFO_COREDUMP ioctl() with the new PIDFD_INFO_COREDUMP
mask flag. This adds the @coredump_mask field to struct pidfd_info.
When a task coredumps the kernel will provide the following information
to userspace in @coredump_mask:
* PIDFD_COREDUMPED is raised if the task did actually coredump.
* PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g.,
undumpable).
* PIDFD_COREDUMP_USER is raised if this is a regular coredump and
doesn't need special care by the coredump server.
* PIDFD_COREDUMP_ROOT is raised if the generated coredump should be
treated as sensitive and the coredump server should restrict to the
generated coredump to sufficiently privileged users.
The kernel guarantees that by the time the connection is made the all
PIDFD_INFO_COREDUMP info is available.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-5-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Coredumping currently supports two modes:
(1) Dumping directly into a file somewhere on the filesystem.
(2) Dumping into a pipe connected to a usermode helper process
spawned as a child of the system_unbound_wq or kthreadd.
For simplicity I'm mostly ignoring (1). There's probably still some
users of (1) out there but processing coredumps in this way can be
considered adventurous especially in the face of set*id binaries.
The most common option should be (2) by now. It works by allowing
userspace to put a string into /proc/sys/kernel/core_pattern like:
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
The "|" at the beginning indicates to the kernel that a pipe must be
used. The path following the pipe indicator is a path to a binary that
will be spawned as a usermode helper process. Any additional parameters
pass information about the task that is generating the coredump to the
binary that processes the coredump.
In the example core_pattern shown above systemd-coredump is spawned as a
usermode helper. There's various conceptual consequences of this
(non-exhaustive list):
- systemd-coredump is spawned with file descriptor number 0 (stdin)
connected to the read-end of the pipe. All other file descriptors are
closed. That specifically includes 1 (stdout) and 2 (stderr). This has
already caused bugs because userspace assumed that this cannot happen
(Whether or not this is a sane assumption is irrelevant.).
- systemd-coredump will be spawned as a child of system_unbound_wq. So
it is not a child of any userspace process and specifically not a
child of PID 1. It cannot be waited upon and is in a weird hybrid
upcall which are difficult for userspace to control correctly.
- systemd-coredump is spawned with full kernel privileges. This
necessitates all kinds of weird privilege dropping excercises in
userspace to make this safe.
- A new usermode helper has to be spawned for each crashing process.
This series adds a new mode:
(3) Dumping into an AF_UNIX socket.
Userspace can set /proc/sys/kernel/core_pattern to:
@/path/to/coredump.socket
The "@" at the beginning indicates to the kernel that an AF_UNIX
coredump socket will be used to process coredumps.
The coredump socket must be located in the initial mount namespace.
When a task coredumps it opens a client socket in the initial network
namespace and connects to the coredump socket.
- The coredump server uses SO_PEERPIDFD to get a stable handle on the
connected crashing task. The retrieved pidfd will provide a stable
reference even if the crashing task gets SIGKILLed while generating
the coredump.
- By setting core_pipe_limit non-zero userspace can guarantee that the
crashing task cannot be reaped behind it's back and thus process all
necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to
detect whether /proc/<pid> still refers to the same process.
The core_pipe_limit isn't used to rate-limit connections to the
socket. This can simply be done via AF_UNIX sockets directly.
- The pidfd for the crashing task will grow new information how the task
coredumps.
- The coredump server should mark itself as non-dumpable.
- A container coredump server in a separate network namespace can simply
bind to another well-know address and systemd-coredump fowards
coredumps to the container.
- Coredumps could in the future also be handled via per-user/session
coredump servers that run only with that users privileges.
The coredump server listens on the coredump socket and accepts a
new coredump connection. It then retrieves SO_PEERPIDFD for the
client, inspects uid/gid and hands the accepted client to the users
own coredump handler which runs with the users privileges only
(It must of coure pay close attention to not forward crashing suid
binaries.).
The new coredump socket will allow userspace to not have to rely on
usermode helpers for processing coredumps and provides a safer way to
handle them instead of relying on super privileged coredumping helpers
that have and continue to cause significant CVEs.
This will also be significantly more lightweight since no fork()+exec()
for the usermodehelper is required for each crashing process. The
coredump server in userspace can e.g., just keep a worker pool.
Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-4-664f3caf2516@kernel.org
Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|