summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-21dt-bindings: can: renesas,rcar-canfd: Document RZ/G3E supportBiju Das
Document support for the CAN-FD Interface on the RZ/G3E (R9A09G047) SoC, which supports up to six channels. The CAN-FD module on RZ/G3E is very similar to the one on both R-Car V4H and RZ/G2L, but differs in some hardware parameters: * No external clock, but instead has ram clock. * Support up to 6 channels. * 20 interrupts. Reviewed-by: "Rob Herring (Arm)" <robh@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20250417054320.14100-3-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-21dt-bindings: can: renesas,rcar-canfd: Simplify the conditional schemaBiju Das
RZ/G3E SoC has 20 interrupts, 2 resets and 6 channels that need more branching with conditional schema. Simplify the conditional schema with if statements rather than the complex if-else statements to prepare for supporting RZ/G3E SoC. Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20250417054320.14100-2-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-21Merge netfs API documentation updatesChristian Brauner
Bring in the netfs API documentation updates which had been in the vfs-6.16.misc branch for most of this cycle. So don't needlessly rewrite the vfs-6.16.misc by dropping it from that branch and moving it to vfs-6.16.netfs. Simply merge vfs-6.16.misc into vfs-6.16.netfs. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21nvmem: Add apple-spmi-nvmem driverHector Martin
Add a driver for a series of SPMI-attached PMICs present on Apple devices Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Hector Martin <marcan@marcan.st> Co-developed-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250509122452.11827-4-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21dt-bindings: spmi: Add Apple SPMI NVMEMSasha Finkelstein
Add bindings for exposing SPMI registers of Apple PMICs as NVMEM cells Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: "Rob Herring (Arm)" <robh@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250509122452.11827-3-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21nvmem: Remove unused nvmem cell table supportGeert Uytterhoeven
Board files are deprecated by DT, and the last user of nvmem_add_cell_table() was removed by commit 2af4fcc0d3574482 ("ARM: davinci: remove unused board support") in v6.3. Hence remove all support for nvmem cell tables, and update the documentation. Device drivers can still register a single cell using nvmem_add_one_cell() (which was not documented before). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250509122452.11827-2-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21nvmem: zynqmp_nvmem: unbreak driver after cleanupPeter Korsgaard
Commit 29be47fcd6a0 ("nvmem: zynqmp_nvmem: zynqmp_nvmem_probe cleanup") changed the driver to expect the device pointer to be passed as the "context", but in nvmem the context parameter comes from nvmem_config.priv which is never set - Leading to null pointer exceptions when the device is accessed. Fixes: 29be47fcd6a0 ("nvmem: zynqmp_nvmem: zynqmp_nvmem_probe cleanup") Cc: stable <stable@kernel.org> Signed-off-by: Peter Korsgaard <peter@korsgaard.com> Reviewed-by: Michal Simek <michal.simek@amd.com> Tested-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250509122407.11763-3-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21nvmem: rmem: select CONFIG_CRC32Arnd Bergmann
The newly added crc checking leads to a link failure if CRC32 itself is disabled: x86_64-linux-ld: vmlinux.o: in function `rmem_eyeq5_checksum': rmem.c:(.text+0x52341b): undefined reference to `crc32_le_arch' Fixes: 7e606c311f70 ("nvmem: rmem: add CRC validation for Mobileye EyeQ5 NVMEM") Cc: stable <stable@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://lore.kernel.org/r/20250509122407.11763-2-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21mux: MAINTAINERS: Mark as Odd FixesKrzysztof Kozlowski
Over last year, several patches for drivers/mux/ were not picked up, even after multiple pings or resends, so mark the mux subsystem as odd fixes to clarify actual status of lack of maintainers with dedicated time and indicate that someone could help here. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250501175303.144102-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21kernfs: Relax constraint in draining guardMichal Koutný
The active reference lifecycle provides the break/unbreak mechanism but the active reference is not truly active after unbreak -- callers don't use it afterwards but it's important for proper pairing of kn->active counting. Assuming this mechanism is in place, the WARN check in kernfs_should_drain_open_files() is too sensitive -- it may transiently catch those (rightful) callers between kernfs_unbreak_active_protection() and kernfs_put_active() as found out by Chen Ridong: kernfs_remove_by_name_ns kernfs_get_active // active=1 __kernfs_remove // active=0x80000002 kernfs_drain ... wait_event //waiting (active == 0x80000001) kernfs_break_active_protection // active = 0x80000001 // continue kernfs_unbreak_active_protection // active = 0x80000002 ... kernfs_should_drain_open_files // warning occurs kernfs_put_active To avoid the false positives (mind panic_on_warn) remove the check altogether. (This is meant as quick fix, I think active reference break/unbreak may be simplified with larger rework.) Fixes: bdb2fd7fc56e1 ("kernfs: Skip kernfs_drain_open_files() more aggressively") Link: https://lore.kernel.org/r/kmmrseckjctb4gxcx2rdminrjnq2b4ipf7562nvfd432ld5v5m@2byj5eedkb2o/ Cc: Chen Ridong <chenridong@huawei.com> Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250505121201.879823-1-mkoutny@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21Documentation: embargoed-hardware-issues.rst: Remove myselfMichael Ellerman
I'm no longer able to perform this role since I left IBM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8734czh8yg.fsf@mpe.ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21Merge tag 'iio-fixes-for-6.15b' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Jonathan writes: IIO: 2nd set of fixes for 6.15 (or 6.16 merge window) Usual mixed bag. adi,ad4851 - Avoid a buffer overrun due to bug in pointer arithmetic. adi,ad7173 - Fix compiling if gpiolib is not enabled adi,ad7606 - Fix raw reads for 18-bit chips by ensuring we mask out upper bits as some SPI controllers do not do so for 18bit words. - Fix wrong masking for register writes. adi,ad7944 - Mask high bits for raw reads. adi,axi-adc - Add check on whether the busy flag has cleared before first access. invensense,icm42600 - Fix the temperature offset to take scale into account. nxp,fxls8962af - Fix temperature to be in milli degrees Celsius not degrees. - Fix sign of temperature channel. * tag 'iio-fixes-for-6.15b' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: iio: accel: fxls8962af: Fix temperature scan element sign iio: accel: fxls8962af: Fix temperature calculation iio: adc: ad7944: mask high bits on direct read iio: adc: ad4851: fix ad4858 chan pointer handling iio: imu: inv_icm42600: Fix temperature calculation iio: dac: adi-axi-dac: fix bus read iio: adc: ad7606_spi: fix reg write value mask iio: adc: ad7606: fix raw read for 18-bit chips iio: adc: ad7173: fix compiling without gpiolib
2025-05-21w1: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Use the `DEFINE_RAW_FLEX()` helper for on-stack definitions of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. So, with these changes, fix the following warnings: drivers/w1/w1_netlink.c:198:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/w1/w1_netlink.c:219:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Link: https://lore.kernel.org/r/Z_RflBe5iDGTMFjV@kspp Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250513105326.27385-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21Merge tag 'mux-drv-6.16' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krzk/linux into char-misc-next Krzysztof writes: Mux drivers for v6.16 Few cleanups and fixes for the mux drivers: 1. Simplify with spi_get_device_match_data(). 2. Fix -Wunused-const-variable and -Wvoid-pointer-to-enum-cast warnings. 3. GPIO mux: add optional regulator for Lenovo T14s laptop headset. 4. MMIO mux: avoid using syscon's device_node_to_regmap(), due to changes in the syscon code. * tag 'mux-drv-6.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krzk/linux: mux: adgs1408: fix Wvoid-pointer-to-enum-cast warning mux: gpio: add optional regulator support dt-bindings: mux: add optional regulator binding to gpio mux mux: mmio: Do not use syscon helper to build regmap mux: adg792a: remove incorrect of_match_ptr annotation mux: adgs1408: simplify with spi_get_device_match_data() mux: mmio: Add missing word in error message
2025-05-21Merge tag 'mhi-for-v6.16' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host ======== - Fix conflict between MHI power up and SYSERR state transitions by issuing MHI reset only if the device is in SYSERR state while in SBL/PBL EEs. The device won't respond to reset if it is not in SYSERR state in SBL/PBL EEs. - Remove redundant call to pci_assign_resource() since PCI core calls this API internally. - Add Telit FN920C04 modem which is based on Qcom SDX35 chipset. * tag 'mhi-for-v6.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: host: pci_generic: Add Telit FN920C04 modem support bus: mhi: host: pci_generic: Remove redundant assign resource usage bus: mhi: host: Fix conflict between power_up and SYSERR
2025-05-21Merge tag 'mhi-fixes-for-v6.15' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Endpoint ============ - Increment the rd_offset after writing the buffer to avoid MHI host accessing the incomplete/wrong buffer element. * tag 'mhi-fixes-for-v6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: ep: Update read pointer only after buffer is written
2025-05-21Merge tag 'fpga-for-6.16-rc1' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga into char-misc-next Xu writes: FPGA Manager changes for 6.16-rc1 - Peter hands over the maintain role of m10bmc-sec driver to Matthew. - Qasim's change fix potential NULL pointer for fpga test. All patches have been reviewed on the mailing list, and have been in the last linux-next releases (as part of our for-next branch). Signed-off-by: Xu Yilun <yilun.xu@intel.com> * tag 'fpga-for-6.16-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga: fpga: fix potential null pointer deref in fpga_mgr_test_img_load_sgt() fpga: m10bmc-sec: change contact for secure update driver
2025-05-21Merge tag 'counter-updates-for-6.16' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter into char-misc-next William writes: Counter updates for 6.16 An update to allow for larger count values in interrupt-cnt. * tag 'counter-updates-for-6.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter: counter: interrupt-cnt: Convert atomic_t -> atomic_long_t
2025-05-21Merge tag 'counter-fixes-for-6.15' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter into char-misc-next William writes: Counter fixes for 6.15 A fix to prevent a race condition when accessing the Count enable component in interrupt-cnt. * tag 'counter-fixes-for-6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wbg/counter: counter: interrupt-cnt: Protect enable/disable OPs with mutex
2025-05-21rust: miscdevice: fix typo in MiscDevice::ioctl documentationChristian Schrefl
Fixes one small typo (`utilties` to `utilities`) in the documentation of `MiscDevice::ioctl`. Fixes: f893691e7426 ("rust: miscdevice: add base miscdevice abstraction") Signed-off-by: Christian Schrefl <chrisi.schrefl@gmail.com> Reviewed-by: Benno Lossin <lossin@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250517-rust_miscdevice_fix_typo-v1-1-8c30a6237ba9@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21PCI: rcar-gen4: Document how to obtain platform firmwareYoshihiro Shimoda
Renesas R-Car V4H (r8a779g0) has PCIe controller, and it requires specific firmware downloading. So, add a document about the firmware how to get. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> [kwilczynski: commit log, refactor the document content and then add this new file to a correct index under the top-level PCI documentation] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250507100947.608875-1-yoshihiro.shimoda.uh@renesas.com
2025-05-21Merge patch series "coredump: add coredump socket"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Coredumping currently supports two modes: (1) Dumping directly into a file somewhere on the filesystem. (2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd. For simplicity I'm mostly ignoring (1). There's probably still some users of (1) out there but processing coredumps in this way can be considered adventurous especially in the face of set*id binaries. The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump. In the example core_pattern shown above systemd-coredump is spawned as a usermode helper. There's various conceptual consequences of this (non-exhaustive list): - systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read-end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (Whether or not this is a sane assumption is irrelevant.). - systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall which are difficult for userspace to control correctly. - systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege dropping excercises in userspace to make this safe. - A new usermode helper has to be spawned for each crashing process. This series adds a new mode: (3) Dumping into an AF_UNIX socket. Userspace can set /proc/sys/kernel/core_pattern to: @/path/to/coredump.socket The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps. The coredump socket must be located in the initial mount namespace. When a task coredumps it opens a client socket in the initial network namespace and connects to the coredump socket. - The coredump server should use SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump. - By setting core_pipe_limit non-zero userspace can guarantee that the crashing task cannot be reaped behind it's back and thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process. The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via AF_UNIX socket directly. - The pidfd for the crashing task will contain information how the task coredumps. The PIDFD_GET_INFO ioctl gained a new flag PIDFD_INFO_COREDUMP which can be used to retreive the coredump information. If the coredump gets a new coredump client connection the kernel guarantees that PIDFD_INFO_COREDUMP information is available. Currently the following information is provided in the new @coredump_mask extension to struct pidfd_info: * PIDFD_COREDUMPED is raised if the task did actually coredump. * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g., undumpable). * PIDFD_COREDUMP_USER is raised if this is a regular coredump and doesn't need special care by the coredump server. * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be treated as sensitive and the coredump server should restrict access to the generated coredump to sufficiently privileged users. - The coredump server should mark itself as non-dumpable. - A container coredump server in a separate network namespace can simply bind to another well-know address and systemd-coredump fowards coredumps to the container. - Coredumps could in the future also be handled via per-user/session coredump servers that run only with that users privileges. The coredump server listens on the coredump socket and accepts a new coredump connection. It then retrieves SO_PEERPIDFD for the client, inspects uid/gid and hands the accepted client to the users own coredump handler which runs with the users privileges only (It must of coure pay close attention to not forward crashing suid binaries.). The new coredump socket will allow userspace to not have to rely on usermode helpers for processing coredumps and provides a safer way to handle them instead of relying on super privileged coredumping helpers. This will also be significantly more lightweight since no fork()+exec() for the usermodehelper is required for each crashing process. The coredump server in userspace can just keep a worker pool. * patches from https://lore.kernel.org/20250516-work-coredump-socket-v8-0-664f3caf2516@kernel.org: selftests/coredump: add tests for AF_UNIX coredumps selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure coredump: validate socket name as it is written coredump: show supported coredump modes pidfs, coredump: add PIDFD_INFO_COREDUMP coredump: add coredump socket coredump: reflow dump helpers a little coredump: massage do_coredump() coredump: massage format_corename() Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-0-664f3caf2516@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21selftests/coredump: add tests for AF_UNIX coredumpsChristian Brauner
Add a simple test for generating coredumps via AF_UNIX sockets. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-9-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructureChristian Brauner
Add PIDFD_INFO_COREDUMP infrastructure so we can use it in tests. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-8-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21coredump: validate socket name as it is writtenChristian Brauner
In contrast to other parameters written into /proc/sys/kernel/core_pattern that never fail we can validate enabling the new AF_UNIX support. This is obviously racy as hell but it's always been that way. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-7-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21coredump: show supported coredump modesChristian Brauner
Allow userspace to discover what coredump modes are supported. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-6-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21pidfs, coredump: add PIDFD_INFO_COREDUMPChristian Brauner
Extend the PIDFD_INFO_COREDUMP ioctl() with the new PIDFD_INFO_COREDUMP mask flag. This adds the @coredump_mask field to struct pidfd_info. When a task coredumps the kernel will provide the following information to userspace in @coredump_mask: * PIDFD_COREDUMPED is raised if the task did actually coredump. * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g., undumpable). * PIDFD_COREDUMP_USER is raised if this is a regular coredump and doesn't need special care by the coredump server. * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be treated as sensitive and the coredump server should restrict to the generated coredump to sufficiently privileged users. The kernel guarantees that by the time the connection is made the all PIDFD_INFO_COREDUMP info is available. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-5-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21coredump: add coredump socketChristian Brauner
Coredumping currently supports two modes: (1) Dumping directly into a file somewhere on the filesystem. (2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd. For simplicity I'm mostly ignoring (1). There's probably still some users of (1) out there but processing coredumps in this way can be considered adventurous especially in the face of set*id binaries. The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump. In the example core_pattern shown above systemd-coredump is spawned as a usermode helper. There's various conceptual consequences of this (non-exhaustive list): - systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read-end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (Whether or not this is a sane assumption is irrelevant.). - systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall which are difficult for userspace to control correctly. - systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege dropping excercises in userspace to make this safe. - A new usermode helper has to be spawned for each crashing process. This series adds a new mode: (3) Dumping into an AF_UNIX socket. Userspace can set /proc/sys/kernel/core_pattern to: @/path/to/coredump.socket The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps. The coredump socket must be located in the initial mount namespace. When a task coredumps it opens a client socket in the initial network namespace and connects to the coredump socket. - The coredump server uses SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump. - By setting core_pipe_limit non-zero userspace can guarantee that the crashing task cannot be reaped behind it's back and thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process. The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via AF_UNIX sockets directly. - The pidfd for the crashing task will grow new information how the task coredumps. - The coredump server should mark itself as non-dumpable. - A container coredump server in a separate network namespace can simply bind to another well-know address and systemd-coredump fowards coredumps to the container. - Coredumps could in the future also be handled via per-user/session coredump servers that run only with that users privileges. The coredump server listens on the coredump socket and accepts a new coredump connection. It then retrieves SO_PEERPIDFD for the client, inspects uid/gid and hands the accepted client to the users own coredump handler which runs with the users privileges only (It must of coure pay close attention to not forward crashing suid binaries.). The new coredump socket will allow userspace to not have to rely on usermode helpers for processing coredumps and provides a safer way to handle them instead of relying on super privileged coredumping helpers that have and continue to cause significant CVEs. This will also be significantly more lightweight since no fork()+exec() for the usermodehelper is required for each crashing process. The coredump server in userspace can e.g., just keep a worker pool. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-4-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21mips/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-17-kan.liang@linux.intel.com
2025-05-21xtensa/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com> Link: https://lore.kernel.org/r/20250520181644.2673067-16-kan.liang@linux.intel.com
2025-05-21sparc/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-15-kan.liang@linux.intel.com
2025-05-21loongarch/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-14-kan.liang@linux.intel.com
2025-05-21csky/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-13-kan.liang@linux.intel.com
2025-05-21arc/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-12-kan.liang@linux.intel.com
2025-05-21alpha/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-11-kan.liang@linux.intel.com
2025-05-21perf/apple_m1: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-10-kan.liang@linux.intel.com
2025-05-21perf/arm: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250520181644.2673067-9-kan.liang@linux.intel.com
2025-05-21s390/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20250520181644.2673067-8-kan.liang@linux.intel.com
2025-05-21powerpc/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-7-kan.liang@linux.intel.com
2025-05-21perf/x86/zhaoxin: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-6-kan.liang@linux.intel.com
2025-05-21perf/x86/amd: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250520181644.2673067-5-kan.liang@linux.intel.com
2025-05-21perf/x86/intel: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-4-kan.liang@linux.intel.com
2025-05-21perf: Only dump the throttle log for the leaderKan Liang
The PERF_RECORD_THROTTLE records are dumped for all throttled events. It's not necessary for group events, which are throttled altogether. Optimize it by only dump the throttle log for the leader. The sample right after the THROTTLE record must be generated by the actual target event. It is good enough for the perf tool to locate the actual target event. Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-3-kan.liang@linux.intel.com
2025-05-21perf: Fix the throttle logic for a groupKan Liang
The current throttle logic doesn't work well with a group, e.g., the following sampling-read case. $ perf record -e "{cycles,cycles}:S" ... $ perf report -D | grep THROTTLE | tail -2 THROTTLE events: 426 ( 9.0%) UNTHROTTLE events: 425 ( 9.0%) $ perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5 0 1020120874009167 0x74970 [0x68]: PERF_RECORD_SAMPLE(IP, 0x1): ... sample_read: .... group nr 2 ..... id 0000000000000327, value 000000000cbb993a, lost 0 ..... id 0000000000000328, value 00000002211c26df, lost 0 The second cycles event has a much larger value than the first cycles event in the same group. The current throttle logic in the generic code only logs the THROTTLE event. It relies on the specific driver implementation to disable events. For all ARCHs, the implementation is similar. Only the event is disabled, rather than the group. The logic to disable the group should be generic for all ARCHs. Add the logic in the generic code. The following patch will remove the buggy driver-specific implementation. The throttle only happens when an event is overflowed. Stop the entire group when any event in the group triggers the throttle. The MAX_INTERRUPTS is set to all throttle events. The unthrottled could happen in 3 places. - event/group sched. All events in the group are scheduled one by one. All of them will be unthrottled eventually. Nothing needs to be changed. - The perf_adjust_freq_unthr_events for each tick. Needs to restart the group altogether. - The __perf_event_period(). The whole group needs to be restarted altogether as well. With the fix, $ sudo perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5 0 3573470770332 0x12f5f8 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2): ... sample_read: .... group nr 2 ..... id 0000000000000a28, value 00000004fd3dfd8f, lost 0 ..... id 0000000000000a29, value 00000004fd3dfd8f, lost 0 Suggested-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-2-kan.liang@linux.intel.com
2025-05-21selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized"Colin Ian King
There is a spelling mistake in a fail error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250520080657.30726-1-colin.i.king@gmail.com
2025-05-21futex: Correct the kernedoc return value for futex_wait_setup().Sebastian Andrzej Siewior
The kerneldoc for futex_wait_setup() states it can return "0" or "<1". This isn't true because the error case is "<0" not less than 1. Document that <0 is returned on error. Drop the possible return values and state possible reasons. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-6-bigeasy@linutronix.de
2025-05-21tools headers: Synchronize prctl.h ABI headerSebastian Andrzej Siewior
The prctl.h ABI header was slightly updated during the development of the interface. In particular the "immutable" parameter became a bit in the option argument. Synchronize prctl.h ABI header again and make use of the definition in the testsuite and "perf bench futex". Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-5-bigeasy@linutronix.de
2025-05-21futex: Use RCU_INIT_POINTER() in futex_mm_init().Sebastian Andrzej Siewior
There is no need for an explicit NULL pointer initialisation plus a comment why it is okay. RCU_INIT_POINTER() can be used for NULL initialisations and it is documented. This has been build tested with gcc version 9.3.0 (Debian 9.3.0-22) on a x86-64 defconfig. Fixes: 094ac8cff7858 ("futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250517151455.1065363-4-bigeasy@linutronix.de
2025-05-21selftests/futex: Use TAP output in futex_numa_mpolSebastian Andrzej Siewior
Use TAP output for easier automated testing. Suggested-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-3-bigeasy@linutronix.de
2025-05-21selftests/futex: Use TAP output in futex_priv_hashSebastian Andrzej Siewior
Use TAP output for easier automated testing. Suggested-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-2-bigeasy@linutronix.de