Age    Commit message    Author
2025-05-21net: hibmcge: fix wrong ndo.open() after reset fail issue.Jijie Shao
If the driver reset fails, it may not work properly. Therefore, the ndo.open() operation should be rejected. In this patch, the driver calls netif_device_detach() before the reset and calls netif_device_attach() after the reset succeeds. If the reset fails, netif_device_attach() is not called. Therefore, the netdev is not present and cannot be opened. If the reset fails, only a PCI reset (via sysfs) can be used to attempt recovery. Fixes: 3f5a61f6d504 ("net: hibmcge: Add reset supported in this module") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250517095828.1763126-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
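A minimal sketch of the detach/attach pattern described above; the function and field names (hbg_reset_prepare(), hbg_reset_done(), priv->netdev) are illustrative, not the driver's actual symbols:
```
#include <linux/netdevice.h>

struct hbg_priv {
	struct net_device *netdev;	/* assumed layout for this sketch */
};

/* Hide the netdev before resetting so ndo_open() is rejected meanwhile. */
static void hbg_reset_prepare(struct hbg_priv *priv)
{
	netif_device_detach(priv->netdev);
}

/* Re-attach only if the reset succeeded; a failed reset leaves the device
 * detached until a PCI reset via sysfs recovers it.
 */
static void hbg_reset_done(struct hbg_priv *priv, int reset_ret)
{
	if (!reset_ret)
		netif_device_attach(priv->netdev);
}
```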
2025-05-21net: hibmcge: fix incorrect statistics update issueJijie Shao
When the user dumps statistics, the hibmcge driver automatically updates all statistics. If the driver is performing a reset at that time, erroneous data of 0xFFFFFFFF is recorded. Therefore, if the driver is resetting, hbg_update_stats_by_info() needs to return directly. Fixes: c0bf9bf31e79 ("net: hibmcge: Add support for dump statistics") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250517095828.1763126-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
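A sketch of the early return described above; the state flag name and the trimmed argument list are assumptions for illustration:
```
static void hbg_update_stats_by_info(struct hbg_priv *priv)
{
	/* While a reset is in flight the counters read back as 0xFFFFFFFF,
	 * so skip the update entirely. (HBG_NIC_STATE_RESETTING is an
	 * assumed flag name.)
	 */
	if (test_bit(HBG_NIC_STATE_RESETTING, &priv->state))
		return;

	/* ... read the hardware counters and accumulate them ... */
}
```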
2025-05-21Merge tag 'mvebu-fixes-6.15-1' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into arm/fixes mvebu fixes for 6.15 (part 1) Fix uDPU board LEDs by correcting pinctrl state * tag 'mvebu-fixes-6.15-1' of https://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs Link: https://lore.kernel.org/r/87wmagpr0a.fsf@BLaptop.bootlin.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21Merge tag 'sunxi-fixes-for-6.15' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes Allwinner fixes for 6.15 Only one fix: Switch back to I2C for PMICs on Allwinner H6 devices. Apparently using Allwinner's proprietary bus ended up causing issues when the PMIC was sharing the bus with other devices. * tag 'sunxi-fixes-for-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection" Link: https://lore.kernel.org/r/aCaeLgjZllV7bauX@wens.tw Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selectedFabio Estevam
Since commit 17ec3e71ba79 ("crypto: lib/Kconfig - Hide arch options from user"), the CRYPTO_CHACHA20_NEON option is no longer selected by default due to changes in how crypto library options are exposed and selected. To restore the previous behavior and ensure CRYPTO_CHACHA20_NEON is enabled, explicitly select CONFIG_CRYPTO_CHACHA20 in the defconfig. This pulls in CRYPTO_LIB_CHACHA_INTERNAL and allows CRYPTO_CHACHA20_NEON to be selected automatically as before. Fixes: 17ec3e71ba79 ("crypto: lib/Kconfig - Hide arch options from user") Signed-off-by: Fabio Estevam <festevam@denx.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()Ming Lei
If ublk_set_auto_buf_reg() fails, we need to unlock and return, otherwise `ub->mutex` is leaked. Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reported-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250521025502.71041-2-ming.lei@redhat.com Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
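A sketch of the corrected error path; the surrounding code of ublk_fetch() is condensed and the exact call arguments are assumptions:
```
static int ublk_fetch(struct ublk_device *ub, struct ublk_io *io)
{
	int ret = 0;

	mutex_lock(&ub->mutex);
	/* ... validation ... */
	ret = ublk_set_auto_buf_reg(io);
	if (ret)
		goto out_unlock;	/* previously returned with ub->mutex held */
	/* ... fetch setup ... */
out_unlock:
	mutex_unlock(&ub->mutex);
	return ret;
}
```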
2025-05-21PCI/MSI: Use bool for MSI enable state trackingHans Zhang
Convert pci_msi_enable and pci_msi_enabled() to use bool type for clarity. No functional changes, only code cleanup. Signed-off-by: Hans Zhang <hans.zhang@cixtech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250516165223.125083-2-18255117159@163.com
2025-05-21Merge tag 'samsung-fixes-6.15' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/fixes Samsung SoC driver fixes for v6.15 1. Exynos ACPM driver (used on Google GS101): Fix timeout due to missing responses from the firmware part. 2. Samsung USI (serial engines) driver: Correct ineffective unconfiguring of the interface during probe removal. * tag 'samsung-fixes-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: soc: samsung: usi: prevent wrong bits inversion during unconfiguring firmware: exynos-acpm: check saved RX before bailing out on empty RX queue Link: https://lore.kernel.org/r/20250513101023.21552-5-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21kunit: Fix wrong parameter to kunit_deactivate_static_stub()Tzung-Bi Shih
kunit_deactivate_static_stub() accepts real_fn_addr instead of replacement_addr. In this case, it always passes NULL to kunit_deactivate_static_stub(). Fix it. Link: https://lore.kernel.org/r/20250520082050.2254875-1-tzungbi@kernel.org Fixes: e047c5eaa763 ("kunit: Expose 'static stub' API to redirect functions") Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
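A sketch of the correct call sequence: both activation and deactivation key off the real (stubbed) function's address; real_probe/fake_probe are placeholder names:
```
/* Redirect real_probe() to fake_probe() for the duration of the test. */
kunit_activate_static_stub(test, real_probe, fake_probe);
exercise_code_that_calls_real_probe();

/* Deactivation takes the real function's address, not the replacement's. */
kunit_deactivate_static_stub(test, real_probe);
```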
2025-05-21genirq/irqdesc: Remove double locking in hwirq_show()Claudiu Beznea
&desc->lock is acquired on 2 consecutive lines in hwirq_show(). This obviously leads to a deadlock. Drop the raw_spin_lock_irq() and keep guard(). Fixes: 5d964a9f7cd8 ("genirq/irqdesc: Switch to lock guards") Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250521142541.3832130-1-claudiu.beznea.uj@bp.renesas.com
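A sketch of the fixed helper: the scoped guard() acquires the lock exactly once and releases it on return, so the explicit raw_spin_lock_irq()/unlock pair is dropped (the body is an approximation of kernel/irq/irqdesc.c):
```
static ssize_t hwirq_show(struct kobject *kobj,
			  struct kobj_attribute *attr, char *buf)
{
	struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj);

	guard(raw_spinlock_irq)(&desc->lock);
	if (desc->irq_data.domain)
		return sprintf(buf, "%lu\n", desc->irq_data.hwirq);
	return 0;
}
```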
2025-05-21io_uring: finish IOU_OK -> IOU_COMPLETE transitionJens Axboe
IOU_COMPLETE is more descriptive, in that it explicitly says that the return value means "please post a completion for this request". This patch completes the transition from IOU_OK to IOU_COMPLETE, replacing existing IOU_OK users. This is a purely mechanical change. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-21io_uring: fix overflow resched cqe reorderingPavel Begunkov
Leaving the CQ critical section in the middle of an overflow flush can cause CQE reordering, since the cached CQ pointers are reset and any new CQE emitters that might get called in between are not going to be forced into io_cqe_cache_refill(). Fixes: eac2ca2d682f9 ("io_uring: check if we need to reschedule during overflow flush") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-21nvme: avoid creating multipath sysfs group under namespace path devicesNilay Shroff
Commit 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy") introduced the creation of the multipath sysfs group under the NVMe head gendisk device node. However, it also inadvertently added the same sysfs group under each namespace path device that the head node refers to, which is incorrect. The multipath sysfs group should only be exposed through the namespace head gendisk node. This is sufficient, as the head device already provides symbolic links to the individual namespace paths it manages. This patch fixes the issue by preventing the creation of the multipath sysfs group under namespace path devices, ensuring it only appears under the head disk node. Fixes: 4dbd2b2ebe4c ("nvme-multipath: Add visibility for round-robin io-policy") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-21Merge patch series "coredump: add coredump socket"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Coredumping currently supports two modes: (1) Dumping directly into a file somewhere on the filesystem. (2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd. For simplicity I'm mostly ignoring (1). There are probably still some users of (1) out there but processing coredumps in this way can be considered adventurous especially in the face of set*id binaries. The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump. In the example core_pattern shown above systemd-coredump is spawned as a usermode helper. There are various conceptual consequences of this (non-exhaustive list): - systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read-end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (whether or not this is a sane assumption is irrelevant). - systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall state which is difficult for userspace to control correctly. - systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege dropping exercises in userspace to make this safe. - A new usermode helper has to be spawned for each crashing process. This series adds a new mode: (3) Dumping into an AF_UNIX socket. Userspace can set /proc/sys/kernel/core_pattern to: @/path/to/coredump.socket The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps. The coredump socket must be located in the initial mount namespace. When a task coredumps, it opens a client socket in the initial network namespace and connects to the coredump socket. - The coredump server should use SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump. - By setting core_pipe_limit non-zero userspace can guarantee that the crashing task cannot be reaped behind its back and thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process. The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via AF_UNIX sockets directly. - The pidfd for the crashing task will contain information about how the task coredumps. The PIDFD_GET_INFO ioctl gained a new flag PIDFD_INFO_COREDUMP which can be used to retrieve the coredump information. If the coredump server gets a new coredump client connection, the kernel guarantees that the PIDFD_INFO_COREDUMP information is available. Currently the following information is provided in the new @coredump_mask extension to struct pidfd_info: * PIDFD_COREDUMPED is raised if the task did actually coredump. * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g., undumpable). * PIDFD_COREDUMP_USER is raised if this is a regular coredump and doesn't need special care by the coredump server. * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be treated as sensitive and the coredump server should restrict access to the generated coredump to sufficiently privileged users. - The coredump server should mark itself as non-dumpable. - A container coredump server in a separate network namespace can simply bind to another well-known address and systemd-coredump forwards coredumps to the container. - Coredumps could in the future also be handled via per-user/session coredump servers that run only with that user's privileges. The coredump server listens on the coredump socket and accepts a new coredump connection. It then retrieves SO_PEERPIDFD for the client, inspects uid/gid and hands the accepted client to the user's own coredump handler which runs with the user's privileges only (it must of course pay close attention to not forward crashing suid binaries). The new coredump socket will allow userspace to not have to rely on usermode helpers for processing coredumps and provides a safer way to handle them instead of relying on super privileged coredumping helpers. This will also be significantly more lightweight since no fork()+exec() for the usermode helper is required for each crashing process. The coredump server in userspace can just keep a worker pool. * patches from https://lore.kernel.org/20250516-work-coredump-socket-v8-0-664f3caf2516@kernel.org: selftests/coredump: add tests for AF_UNIX coredumps selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure coredump: validate socket name as it is written coredump: show supported coredump modes pidfs, coredump: add PIDFD_INFO_COREDUMP coredump: add coredump socket coredump: reflow dump helpers a little coredump: massage do_coredump() coredump: massage format_corename() Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-0-664f3caf2516@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
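A minimal userspace sketch of the coredump-server flow described above, assuming core_pattern has been set to "@/run/coredump.socket"; the socket path, the SOCK_STREAM type, and the handling of the dump data are illustrative, while SO_PEERPIDFD is the real socket option used to obtain a pidfd for the crashing peer:
```
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#ifndef SO_PEERPIDFD
#define SO_PEERPIDFD 77		/* from <asm-generic/socket.h> */
#endif

int main(void)
{
	/* Path and stream type are assumptions for this sketch. */
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int srv = socket(AF_UNIX, SOCK_STREAM, 0);

	strncpy(addr.sun_path, "/run/coredump.socket", sizeof(addr.sun_path) - 1);
	unlink(addr.sun_path);
	if (srv < 0 || bind(srv, (struct sockaddr *)&addr, sizeof(addr)) || listen(srv, 16))
		return 1;

	for (;;) {
		int client = accept(srv, NULL, NULL);
		int pidfd = -1;
		socklen_t len = sizeof(pidfd);
		char buf[4096];

		if (client < 0)
			continue;
		/* Stable handle on the crashing task, even if it gets reaped. */
		getsockopt(client, SOL_SOCKET, SO_PEERPIDFD, &pidfd, &len);

		while (read(client, buf, sizeof(buf)) > 0)
			;	/* write the coredump data somewhere persistent */

		if (pidfd >= 0)
			close(pidfd);
		close(client);
	}
}
```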
2025-05-21selftests/coredump: add tests for AF_UNIX coredumpsChristian Brauner
Add a simple test for generating coredumps via AF_UNIX sockets. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-9-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructureChristian Brauner
Add PIDFD_INFO_COREDUMP infrastructure so we can use it in tests. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-8-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21coredump: validate socket name as it is writtenChristian Brauner
In contrast to other parameters written into /proc/sys/kernel/core_pattern, which never fail, we can validate the new AF_UNIX support as the pattern is written. This is obviously racy as hell, but it's always been that way. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-7-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21coredump: show supported coredump modesChristian Brauner
Allow userspace to discover what coredump modes are supported. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-6-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21pidfs, coredump: add PIDFD_INFO_COREDUMPChristian Brauner
Extend the PIDFD_GET_INFO ioctl() with the new PIDFD_INFO_COREDUMP mask flag. This adds the @coredump_mask field to struct pidfd_info. When a task coredumps, the kernel will provide the following information to userspace in @coredump_mask: * PIDFD_COREDUMPED is raised if the task did actually coredump. * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g., undumpable). * PIDFD_COREDUMP_USER is raised if this is a regular coredump and doesn't need special care by the coredump server. * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be treated as sensitive and the coredump server should restrict access to the generated coredump to sufficiently privileged users. The kernel guarantees that by the time the connection is made, all PIDFD_INFO_COREDUMP info is available. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-5-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
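A sketch of how userspace could query this, assuming a <linux/pidfd.h> that already carries the new PIDFD_INFO_COREDUMP/PIDFD_COREDUMP_* definitions and the @coredump_mask field introduced by this series:
```
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/pidfd.h>	/* must provide the new coredump uapi bits */

static int check_coredump_state(int pidfd)
{
	/* PIDFD_INFO_COREDUMP and coredump_mask come from this series. */
	struct pidfd_info info = { .mask = PIDFD_INFO_COREDUMP };

	if (ioctl(pidfd, PIDFD_GET_INFO, &info) < 0)
		return -1;

	if (info.coredump_mask & PIDFD_COREDUMPED)
		printf("task coredumped%s\n",
		       (info.coredump_mask & PIDFD_COREDUMP_ROOT) ?
		       " (treat as sensitive)" : "");
	return 0;
}
```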
2025-05-21coredump: add coredump socketChristian Brauner
Coredumping currently supports two modes: (1) Dumping directly into a file somewhere on the filesystem. (2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd. For simplicity I'm mostly ignoring (1). There are probably still some users of (1) out there but processing coredumps in this way can be considered adventurous especially in the face of set*id binaries. The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump. In the example core_pattern shown above systemd-coredump is spawned as a usermode helper. There are various conceptual consequences of this (non-exhaustive list): - systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read-end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (whether or not this is a sane assumption is irrelevant). - systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall state which is difficult for userspace to control correctly. - systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege dropping exercises in userspace to make this safe. - A new usermode helper has to be spawned for each crashing process. This series adds a new mode: (3) Dumping into an AF_UNIX socket. Userspace can set /proc/sys/kernel/core_pattern to: @/path/to/coredump.socket The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps. The coredump socket must be located in the initial mount namespace. When a task coredumps, it opens a client socket in the initial network namespace and connects to the coredump socket. - The coredump server uses SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump. - By setting core_pipe_limit non-zero userspace can guarantee that the crashing task cannot be reaped behind its back and thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process. The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via AF_UNIX sockets directly. - The pidfd for the crashing task will grow new information about how the task coredumps. - The coredump server should mark itself as non-dumpable. - A container coredump server in a separate network namespace can simply bind to another well-known address and systemd-coredump forwards coredumps to the container. - Coredumps could in the future also be handled via per-user/session coredump servers that run only with that user's privileges. The coredump server listens on the coredump socket and accepts a new coredump connection. It then retrieves SO_PEERPIDFD for the client, inspects uid/gid and hands the accepted client to the user's own coredump handler which runs with the user's privileges only (it must of course pay close attention to not forward crashing suid binaries). The new coredump socket will allow userspace to not have to rely on usermode helpers for processing coredumps and provides a safer way to handle them instead of relying on super privileged coredumping helpers that have caused and continue to cause significant CVEs. This will also be significantly more lightweight since no fork()+exec() for the usermode helper is required for each crashing process. The coredump server in userspace can, e.g., just keep a worker pool. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-4-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21mips/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-17-kan.liang@linux.intel.com
2025-05-21xtensa/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com> Link: https://lore.kernel.org/r/20250520181644.2673067-16-kan.liang@linux.intel.com
2025-05-21sparc/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-15-kan.liang@linux.intel.com
2025-05-21loongarch/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-14-kan.liang@linux.intel.com
2025-05-21csky/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-13-kan.liang@linux.intel.com
2025-05-21arc/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-12-kan.liang@linux.intel.com
2025-05-21alpha/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-11-kan.liang@linux.intel.com
2025-05-21perf/apple_m1: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-10-kan.liang@linux.intel.com
2025-05-21perf/arm: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250520181644.2673067-9-kan.liang@linux.intel.com
2025-05-21s390/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20250520181644.2673067-8-kan.liang@linux.intel.com
2025-05-21powerpc/perf: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-7-kan.liang@linux.intel.com
2025-05-21perf/x86/zhaoxin: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-6-kan.liang@linux.intel.com
2025-05-21perf/x86/amd: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250520181644.2673067-5-kan.liang@linux.intel.com
2025-05-21perf/x86/intel: Remove driver-specific throttle supportKan Liang
The throttle support has been added in the generic code. Remove the driver-specific throttle support. Besides the throttle, perf_event_overflow may return true because of event_limit. It already does an inatomic event disable. The pmu->stop is not required either. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250520181644.2673067-4-kan.liang@linux.intel.com
2025-05-21perf: Only dump the throttle log for the leaderKan Liang
The PERF_RECORD_THROTTLE records are dumped for all throttled events. It's not necessary for group events, which are throttled altogether. Optimize it by only dumping the throttle log for the leader. The sample right after the THROTTLE record must be generated by the actual target event. It is good enough for the perf tool to locate the actual target event. Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-3-kan.liang@linux.intel.com
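The idea in a condensed sketch; the surrounding throttle path is simplified and perf_log_throttle() is the existing internal helper in kernel/events/core.c:
```
	/* Siblings are throttled together with the leader, so emit the
	 * PERF_RECORD_THROTTLE record only once per group.
	 */
	if (event == event->group_leader)
		perf_log_throttle(event, 0);
```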
2025-05-21perf: Fix the throttle logic for a groupKan Liang
The current throttle logic doesn't work well with a group, e.g., the following sampling-read case. $ perf record -e "{cycles,cycles}:S" ... $ perf report -D | grep THROTTLE | tail -2 THROTTLE events: 426 ( 9.0%) UNTHROTTLE events: 425 ( 9.0%) $ perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5 0 1020120874009167 0x74970 [0x68]: PERF_RECORD_SAMPLE(IP, 0x1): ... sample_read: .... group nr 2 ..... id 0000000000000327, value 000000000cbb993a, lost 0 ..... id 0000000000000328, value 00000002211c26df, lost 0 The second cycles event has a much larger value than the first cycles event in the same group. The current throttle logic in the generic code only logs the THROTTLE event. It relies on the specific driver implementation to disable events. For all ARCHs, the implementation is similar. Only the event is disabled, rather than the group. The logic to disable the group should be generic for all ARCHs. Add the logic in the generic code. The following patch will remove the buggy driver-specific implementation. The throttle only happens when an event overflows. Stop the entire group when any event in the group triggers the throttle. MAX_INTERRUPTS is set for all throttled events. Unthrottling could happen in 3 places. - event/group sched. All events in the group are scheduled one by one. All of them will be unthrottled eventually. Nothing needs to be changed. - perf_adjust_freq_unthr_events() for each tick. Needs to restart the group altogether. - The __perf_event_period(). The whole group needs to be restarted altogether as well. With the fix, $ sudo perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5 0 3573470770332 0x12f5f8 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2): ... sample_read: .... group nr 2 ..... id 0000000000000a28, value 00000004fd3dfd8f, lost 0 ..... id 0000000000000a29, value 00000004fd3dfd8f, lost 0 Suggested-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250520181644.2673067-2-kan.liang@linux.intel.com
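A condensed sketch of the generic group throttling described above; the helper names are simplified relative to the real implementation:
```
static void perf_event_throttle(struct perf_event *event)
{
	/* MAX_INTERRUPTS marks the event as throttled ... */
	event->hw.interrupts = MAX_INTERRUPTS;
	/* ... and the PMU stops counting it until it is unthrottled. */
	event->pmu->stop(event, 0);
}

static void perf_event_throttle_group(struct perf_event *event)
{
	struct perf_event *leader = event->group_leader, *sibling;

	/* Stop the whole group, not just the overflowing event. */
	perf_event_throttle(leader);
	for_each_sibling_event(sibling, leader)
		perf_event_throttle(sibling);
}
```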
2025-05-21selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized"Colin Ian King
There is a spelling mistake in a fail error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250520080657.30726-1-colin.i.king@gmail.com
2025-05-21futex: Correct the kerneldoc return value for futex_wait_setup().Sebastian Andrzej Siewior
The kerneldoc for futex_wait_setup() states it can return "0" or "<1". This isn't true because the error case is "<0" not less than 1. Document that <0 is returned on error. Drop the possible return values and state possible reasons. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-6-bigeasy@linutronix.de
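A sketch of the corrected kerneldoc block; the exact wording in the tree may differ:
```
/**
 * futex_wait_setup() - Prepare to wait on a futex
 *
 * ...
 *
 * Return: 0 on success, <0 on error. On error the futex was not queued;
 *         possible reasons include a fault while reading the userspace
 *         value or a mismatch between that value and the expected one.
 */
```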
2025-05-21tools headers: Synchronize prctl.h ABI headerSebastian Andrzej Siewior
The prctl.h ABI header was slightly updated during the development of the interface. In particular the "immutable" parameter became a bit in the option argument. Synchronize prctl.h ABI header again and make use of the definition in the testsuite and "perf bench futex". Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-5-bigeasy@linutronix.de
2025-05-21futex: Use RCU_INIT_POINTER() in futex_mm_init().Sebastian Andrzej Siewior
There is no need for an explicit NULL pointer initialisation plus a comment explaining why it is okay. RCU_INIT_POINTER() can be used for NULL initialisations and it is documented. This has been build tested with gcc version 9.3.0 (Debian 9.3.0-22) on an x86-64 defconfig. Fixes: 094ac8cff7858 ("futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250517151455.1065363-4-bigeasy@linutronix.de
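A sketch of the simplification (any other initialisation done by futex_mm_init() is omitted here):
```
static inline void futex_mm_init(struct mm_struct *mm)
{
	/* RCU_INIT_POINTER() is documented as valid for NULL assignments,
	 * so no barrier and no explanatory comment are needed.
	 */
	RCU_INIT_POINTER(mm->futex_phash, NULL);
	/* ... */
}
```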
2025-05-21selftests/futex: Use TAP output in futex_numa_mpolSebastian Andrzej Siewior
Use TAP output for easier automated testing. Suggested-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-3-bigeasy@linutronix.de
2025-05-21selftests/futex: Use TAP output in futex_priv_hashSebastian Andrzej Siewior
Use TAP output for easier automated testing. Suggested-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-2-bigeasy@linutronix.de
2025-05-21sched/uclamp: Align uclamp and util_est and call before freq updateXuewen Yan
Commit dfa0a574cbc47 ("sched/uclamg: Handle delayed dequeue") added the sched_delayed check to prevent double uclamp_dec/inc. However, it put uclamp_rq_inc() after enqueue_task(). This may lead to the following issue: when a task with uclamp goes through enqueue_task() and could trigger a cpufreq update, its uclamp won't even be considered in that cpufreq update. The uclamp is only added to the rq buckets after the enqueue, and cpufreq will only pick it up at the next update. This could delay the frequency update, and it may affect performance (uclamp_min > 0) or power (uclamp_max < 1024). So, just like util_est, put uclamp_rq_inc() before enqueue_task(). And for delayed tasks, again like util_est, use the sched_delayed flag to prevent incrementing a sched_delayed task's uclamp, and use the ENQUEUE_DELAYED flag to allow incrementing the uclamp of a sched_delayed task that is being woken up. Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20250417043457.10632-3-xuewen.yan@unisoc.com
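A condensed sketch of the new ordering in enqueue_task(); the real function does more, and the flags-aware uclamp_rq_inc() signature follows this patch:
```
static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
{
	/* Account uclamp before the class enqueue so a cpufreq update
	 * triggered from the enqueue path already sees this task's clamps.
	 * uclamp_rq_inc() itself skips sched_delayed tasks unless
	 * ENQUEUE_DELAYED signals a delayed task being woken up.
	 */
	uclamp_rq_inc(rq, p, flags);

	p->sched_class->enqueue_task(rq, p, flags);
}
```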
2025-05-21sched/util_est: Simplify condition for util_est_{en,de}queue()Xuewen Yan
To prevent double enqueue/dequeue of the util-est for sched_delayed tasks, commit 729288bc6856 ("kernel/sched: Fix util_est accounting for DELAY_DEQUEUE") added the corresponding check. This check excludes double en/dequeue during task migration and priority changes. In fact, these conditions can be simplified. For util_est_dequeue, we know that sched_delayed flag is set in dequeue_entity. When the task is sleeping, we need to call util_est_dequeue to subtract util-est from the cfs_rq. At this point, sched_delayed has not yet been set. If we find that sched_delayed is already set, it indicates that this task has already called dequeue_task_fair once. In this case, there is no need to call util_est_dequeue again. Therefore, simply checking the sched_delayed flag should be sufficient to prevent unnecessary util_est updates during the dequeue. For util_est_enqueue, our goal is to add the util_est to the cfs_rq when task enqueue. However, we don't want to add the util_est of a sched_delayed task to the cfs_rq because the task is sleeping. Therefore, we can exclude the util_est_enqueue for sched_delayed tasks by checking the sched_delayed flag. However, when waking up a delayed task, the sched_delayed flag is cleared after util_est_enqueue. As a result, if we only check the sched_delayed flag, we would miss the util_est_enqueue. Since waking up a sched_delayed task calls enqueue_task with the ENQUEUE_DELAYED flag, we can determine whether to call util_est_enqueue by checking if the enqueue_flag contains ENQUEUE_DELAYED. Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20250417043457.10632-2-xuewen.yan@unisoc.com
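The simplified conditions, sketched; they mirror the reasoning above and sit in the fair-class enqueue/dequeue paths:
```
	/* Enqueue: skip sleeping (delayed) tasks unless this enqueue is the
	 * wakeup of a delayed task, which carries ENQUEUE_DELAYED.
	 */
	if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
		util_est_enqueue(&rq->cfs, p);

	/* Dequeue: a task already marked sched_delayed has gone through
	 * dequeue once; don't subtract its util_est again.
	 */
	if (!p->se.sched_delayed)
		util_est_dequeue(&rq->cfs, p);
```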
2025-05-21sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUEXuewen Yan
The delayed dequeue feature keeps a sleeping task enqueued until its lag has elapsed. As a result, the task also stays visible in rq->nr_running. So in wake_affine_idle(), we should use the number of really running tasks in the rq to check whether to place the woken-up task on the current CPU. Also add a helper function to return the number of delayed tasks. Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue") Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Reviewed-and-tested-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20250303105241.17251-2-xuewen.yan@unisoc.com
2025-05-21s390/crypto: Extend protected key conversion retry loopHarald Freudenberger
CI runs show that the protected key conversion retry loop runs into a timeout if a master key change was initiated on the addressed crypto resource shortly before the conversion request. This patch extends the retry logic to run a total of 5 attempts with increasing delay (200, 400, 800 and 1600 ms) in case of a busy card. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-21s390/pci: Fix __pcilg_mio_inuser() inline assemblyHeiko Carstens
Use "a" constraint for the shift operand of the __pcilg_mio_inuser() inline assembly. The used "d" constraint allows the compiler to use any general purpose register for the shift operand, including register zero. If register zero is used this my result in incorrect code generation: 8f6: a7 0a ff f8 ahi %r0,-8 8fa: eb 32 00 00 00 0c srlg %r3,%r2,0 <---- If register zero is selected to contain the shift value, the srlg instruction ignores the contents of the register and always shifts zero bits. Therefore use the "a" constraint which does not permit to select register zero. Fixes: f058599e22d5 ("s390/pci: Fix s390_mmio_read/write with MIO") Cc: stable@vger.kernel.org Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-05-21x86/bugs: Fix spectre_v2 mitigation default on IntelPawan Gupta
Commit 480e803dacf8 ("x86/bugs: Restructure spectre_v2 mitigation") inadvertently changed the spectre-v2 mitigation default from eIBRS to IBRS on Intel. While splitting the spectre_v2 mitigation into select/update/apply functions, the eIBRS and IBRS selection logic got separated between select and update. This caused IBRS selection to not consider that the eIBRS mitigation is already selected; fix it. Fixes: 480e803dacf8 ("x86/bugs: Restructure spectre_v2 mitigation") Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250520-eibrs-fix-v1-1-91bacd35ed09@linux.intel.com
2025-05-21xsk: Bring back busy polling support in XDP_COPYSamiullah Khawaja
Commit 5ef44b3cb43b ("xsk: Bring back busy polling support") fixed the busy polling support in xsk for XDP_ZEROCOPY after it was broken in commit 86e25f40aa1e ("net: napi: Add napi_config"). The busy polling support with XDP_COPY remained broken since the napi_id setup in xsk_rcv_check was removed. Bring back the setup of napi_id for XDP_COPY so the socket-level SO_BUSY_POLL option can be used to poll the underlying napi. Do the setup of napi_id for XDP_COPY in xsk_bind, as it is done currently for XDP_ZEROCOPY. The setup of napi_id for XDP_COPY in xsk_bind is safe because xsk_rcv_check checks that the rx queue at which the packet arrives is equal to the queue_id that was supplied in bind. This is done for both XDP_COPY and XDP_ZEROCOPY mode. Tested using AF_XDP support in virtio-net by running the xsk_rr AF_XDP benchmarking tool shared here: https://lore.kernel.org/all/20250320163523.3501305-1-skhawaja@google.com/T/ Enabled socket busy polling using the following commands in qemu: ``` sudo ethtool -L eth0 combined 1 echo 400 | sudo tee /proc/sys/net/core/busy_read echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout ``` Fixes: 5ef44b3cb43b ("xsk: Bring back busy polling support") Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: David S. Miller <davem@davemloft.net>
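For completeness, a userspace sketch of the socket-level knob mentioned above, applied to the AF_XDP socket fd; the microsecond value mirrors the busy_read setting in the commands, and the fallback defines cover older uapi headers:
```
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL		46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL	69
#endif

static int xsk_enable_busy_poll(int xsk_fd)
{
	int usecs = 400;	/* like /proc/sys/net/core/busy_read above */
	int prefer = 1;

	if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL,
		       &usecs, sizeof(usecs)) < 0)
		return -1;
	/* Optional: prefer busy polling over interrupt-driven napi. */
	return setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
			  &prefer, sizeof(prefer));
}
```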
2025-05-21can: slcan: allow reception of short error messagesCarlos Sanchez
Allow slcan to receive short messages (typically errors) from the serial interface. When error support was added to the slcan protocol in b32ff4668544e1333b694fcc7812b2d7397b4d6a ("can: slcan: extend the protocol with error info") the minimum valid message size changed from 5 (minimum standard CAN frame, tIII0) to 3 ("e1a" is a valid protocol message, it is one of the examples given in the comments for slcan_bump_err()), but the check for minimum message length predicating all decoding was not adjusted. This causes short error messages to be discarded and error frames not to be generated. This patch changes the minimum length to the new minimum (3 characters, excluding terminator, is now a valid message). Signed-off-by: Carlos Sanchez <carlossanchez@geotab.com> Fixes: b32ff4668544 ("can: slcan: extend the protocol with error info") Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250520102305.1097494-1-carlossanchez@geotab.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
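A sketch of the adjusted length check in the slcan receive path; the constant name and surrounding code are illustrative, the point is the 5 -> 3 change ("e1a"-style error frames are now the shortest valid messages, versus "tIII0" for a minimal standard CAN frame):
```
#define SLCAN_MIN_MSG_LEN	3	/* was 5 before error-frame support */

	/* Only attempt to decode once a complete minimal message arrived. */
	if (sl->rcount < SLCAN_MIN_MSG_LEN)
		return;
```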