summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-21scripts/gdb/symbols: factor out pagination_off()Ilya Leoshkevich
Move the code that turns off pagination into a separate function. It will be useful later in order to prevent hangs when loading symbols for kernel image in physical memory during s390 early boot. Link: https://lkml.kernel.org/r/20250515155811.114392-3-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21scripts/gdb/symbols: factor out get_vmlinux()Ilya Leoshkevich
Patch series "scripts/gdb/symbols: determine KASLR offset on s390 during early boot". I noticed that debugging s390 early boot using the support I introduced in commit 28939c3e9925 ("scripts/gdb/symbols: determine KASLR offset on s390") does not work. The reason is that decompressor does not provide the vmcoreinfo note, so KASLR offset needs to be extracted in a different way, which this series implements. Patches 1-2 are trivial refactorings, and patch 3 is the implementation. This patch (of 3): Move the code that determines the current vmlinux file into a separate function. It will be useful later in order to analyze the kernel image in physical memory during s390 early boot. Link: https://lkml.kernel.org/r/20250515155811.114392-1-iii@linux.ibm.com Link: https://lkml.kernel.org/r/20250515155811.114392-2-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21kernel/panic.c: format kernel-doc commentsSravan Kumar Gundu
kernel-doc function comment don't follows documentation commenting style misinterpreting arguments description with function description. Please see latest docs generated before applying this patch https://docs.kernel.org/driver-api/basics.html#c.panic Link: https://lkml.kernel.org/r/20250516174031.2937-1-sravankumarlpu@gmail.com Signed-off-by: Sravan Kumar Gundu <sravankumarlpu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21mailmap: update and consolidate Casey Connolly's name and emailCasey Connolly
I've used several email addresses and a previous name to contribute. Consolidate all of these to my primary email and update my name. Link: https://lkml.kernel.org/r/20250517223237.15647-2-casey.connolly@linaro.org Signed-off-by: Casey Connolly <casey.connolly@linaro.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21nilfs2: remove wbc->for_reclaim handlingChristoph Hellwig
Since commit 013a07052a1a ("nilfs2: convert metadata aops from writepage to writepages"), nilfs_mdt_write_folio can't be called from reclaim context any more. Remove the code keyed of the wbc->for_reclaim flag, which is now only set for writing out swap or shmem pages inside the swap code, but never passed to file systems. Link: https://lkml.kernel.org/r/20250508054938.15894-7-hch@lst.de Link: https://lkml.kernel.org/r/20250516123417.6779-1-konishi.ryusuke@gmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21fork: define a local GFP_VMAP_STACKLinus Walleij
The current allocation of VMAP stack memory is using (THREADINFO_GFP & ~__GFP_ACCOUNT) which is a complicated way of saying (GFP_KERNEL | __GFP_ZERO): <linux/thread_info.h>: define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_ZERO) <linux/gfp_types.h>: define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) This is an unfortunate side-effect of independent changes blurring the picture: commit 19809c2da28aee5860ad9a2eff760730a0710df0 changed (THREADINFO_GFP | __GFP_HIGHMEM) to just THREADINFO_GFP since highmem became implicit. commit 9b6f7e163cd0f468d1b9696b785659d3c27c8667 then added stack caching and rewrote the allocation to (THREADINFO_GFP & ~__GFP_ACCOUNT) as cached stacks need to be accounted separately. However that code, when it eventually accounts the memory does this: ret = memcg_kmem_charge(vm->pages[i], GFP_KERNEL, 0) so the memory is charged as a GFP_KERNEL allocation. Define a unique GFP_VMAP_STACK to use GFP_KERNEL | __GFP_ZERO and move the comment there. Link: https://lkml.kernel.org/r/20250509-gfp-stack-v1-1-82f6f7efc210@linaro.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reported-by: Mateusz Guzik <mjguzik@gmail.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21fork: check charging success before zeroing stackPasha Tatashin
No need to do zero cached stack if memcg charge fails, so move the charging attempt before the memset operation. [linus.walleij@linaro.org: rebased] Link: https://lkml.kernel.org/r/20250509-fork-fixes-v3-3-e6c69dd356f2@linaro.org Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/20240311164638.2015063-6-pasha.tatashin@soleen.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks codePasha Tatashin
There are two data types: "struct vm_struct" and "struct vm_stack" that have the same local variable names: vm_stack, or vm, or s, which makes the code confusing to read. Change the code so the naming is consistent: struct vm_struct is always called vm_area struct vm_stack is always called vm_stack One change altering vfree(vm_stack) to vfree(vm_area->addr) may look like a semantic change but it is not: vm_area->addr points to the vm_stack. This was done to improve readability. [linus.walleij@linaro.org: rebased and added new users of the variable names, address review comments] Link: https://lore.kernel.org/20240311164638.2015063-4-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20250509-fork-fixes-v3-2-e6c69dd356f2@linaro.org Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21fork: clean-up ifdef logic around stack allocationPasha Tatashin
Patch series "fork: Page operation cleanups in the fork code", v3. This patchset consists of outtakes from a 1 year+ old patchset from Pasha, which all stand on their own. See: https://lore.kernel.org/all/20240311164638.2015063-1-pasha.tatashin@soleen.com/ These are good cleanups for readability so I split these off, rebased on v6.15-rc1, addressed review comments and send them separately. All mentions of dynamic stack are removed from the patchset as we have no idea whether that will go anywhere. This patch (of 3): There is unneeded OR in the ifdef functions that are used to allocate and free kernel stacks based on direct map or vmap. Therefore, clean up by changing the order so OR is no longer needed. [linus.walleij@linaro.org: rebased] Link: https://lkml.kernel.org/r/20250509-fork-fixes-v3-1-e6c69dd356f2@linaro.org Link: https://lkml.kernel.org/r/20250509-fork-fixes-v3-0-e6c69dd356f2@linaro.org Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/20240311164638.2015063-3-pasha.tatashin@soleen.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_countMax Kellermann
Expose a simple counter to userspace for monitoring tools. Link: https://lkml.kernel.org/r/20250504180831.4190860-3-max.kellermann@ionos.com Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Cc: Core Minyard <cminyard@mvista.com> Cc: Doug Anderson <dianders@chromium.org> Cc: Joel Granados <joel.granados@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Song Liu <song@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21kernel/watchdog: add /sys/kernel/{hard,soft}lockup_countMax Kellermann
Patch series "sysfs: add counters for lockups and stalls", v2. Commits 9db89b411170 ("exit: Expose "oops_count" to sysfs") and 8b05aa263361 ("panic: Expose "warn_count" to sysfs") added counters for oopses and warnings to sysfs, and these two patches do the same for hard/soft lockups and RCU stalls. All of these counters are useful for monitoring tools to detect whether the machine is healthy. If the kernel has experienced a lockup or a stall, it's probably due to a kernel bug, and I'd like to detect that quickly and easily. There is currently no way to detect that, other than parsing dmesg. Or observing indirect effects: such as certain tasks not responding, but then I need to observe all tasks, and it may take a while until these effects become visible/measurable. I'd rather be able to detect the primary cause more quickly, possibly before everything falls apart. This patch (of 2): There is /proc/sys/kernel/hung_task_detect_count, /sys/kernel/warn_count and /sys/kernel/oops_count but there is no userspace-accessible counter for hard/soft lockups. Having this is useful for monitoring tools. Link: https://lkml.kernel.org/r/20250504180831.4190860-1-max.kellermann@ionos.com Link: https://lkml.kernel.org/r/20250504180831.4190860-2-max.kellermann@ionos.com Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Cc: Cc: Core Minyard <cminyard@mvista.com> Cc: Doug Anderson <dianders@chromium.org> Cc: Joel Granados <joel.granados@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21x86/crash: make the page that stores the dm crypt keys inaccessibleCoiby Xu
This adds an addition layer of protection for the saved copy of dm crypt key. Trying to access the saved copy will cause page fault. Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Suggested-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21x86/crash: pass dm crypt keys to kdump kernelCoiby Xu
1st kernel will build up the kernel command parameter dmcryptkeys as similar to elfcorehdr to pass the memory address of the stored info of dm crypt key to kdump kernel. Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21Revert "x86/mm: Remove unused __set_memory_prot()"Coiby Xu
This reverts commit 693bbf2a50447353c6a47961e6a7240a823ace02 as kdump LUKS support (CONFIG_CRASH_DM_CRYPT) depends on __set_memory_prot. [akpm@linux-foundation.org: x86 set_memory.h needs pgtable_types.h] Link: https://lkml.kernel.org/r/20250502011246.99238-7-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21crash_dump: retrieve dm crypt keys in kdump kernelCoiby Xu
Crash kernel will retrieve the dm crypt keys based on the dmcryptkeys command line parameter. When user space writes the key description to /sys/kernel/config/crash_dm_crypt_key/restore, the crash kernel will save the encryption keys to the user keyring. Then user space e.g. cryptsetup's --volume-key-keyring API can use it to unlock the encrypted device. Link: https://lkml.kernel.org/r/20250502011246.99238-6-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21crash_dump: reuse saved dm crypt keys for CPU/memory hot-pluggingCoiby Xu
When there are CPU and memory hot un/plugs, the dm crypt keys may need to be reloaded again depending on the solution for crash hotplug support. Currently, there are two solutions. One is to utilizes udev to instruct user space to reload the kdump kernel image and initrd, elfcorehdr and etc again. The other is to only update the elfcorehdr segment introduced in commit 247262756121 ("crash: add generic infrastructure for crash hotplug support"). For the 1st solution, the dm crypt keys need to be reloaded again. The user space can write true to /sys/kernel/config/crash_dm_crypt_key/reuse so the stored keys can be re-used. For the 2nd solution, the dm crypt keys don't need to be reloaded. Currently, only x86 supports the 2nd solution. If the 2nd solution gets extended to all arches, this patch can be dropped. Link: https://lkml.kernel.org/r/20250502011246.99238-5-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21crash_dump: store dm crypt keys in kdump reserved memoryCoiby Xu
When the kdump kernel image and initrd are loaded, the dm crypts keys will be read from keyring and then stored in kdump reserved memory. Assume a key won't exceed 256 bytes thus MAX_KEY_SIZE=256 according to "cryptsetup benchmark". Link: https://lkml.kernel.org/r/20250502011246.99238-4-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21crash_dump: make dm crypt keys persist for the kdump kernelCoiby Xu
A configfs /sys/kernel/config/crash_dm_crypt_keys is provided for user space to make the dm crypt keys persist for the kdump kernel. Take the case of dumping to a LUKS-encrypted target as an example, here is the life cycle of the kdump copies of LUKS volume keys, 1. After the 1st kernel loads the initramfs during boot, systemd uses an user-input passphrase to de-crypt the LUKS volume keys or simply TPM-sealed volume keys and then save the volume keys to specified keyring (using the --link-vk-to-keyring API) and the keys will expire within specified time. 2. A user space tool (kdump initramfs loader like kdump-utils) create key items inside /sys/kernel/config/crash_dm_crypt_keys to inform the 1st kernel which keys are needed. 3. When the kdump initramfs is loaded by the kexec_file_load syscall, the 1st kernel will iterate created key items, save the keys to kdump reserved memory. 4. When the 1st kernel crashes and the kdump initramfs is booted, the kdump initramfs asks the kdump kernel to create a user key using the key stored in kdump reserved memory by writing yes to /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted device is unlocked with libcryptsetup's --volume-key-keyring API. 5. The system gets rebooted to the 1st kernel after dumping vmcore to the LUKS encrypted device is finished Eventually the keys have to stay in the kdump reserved memory for the kdump kernel to unlock encrypted volumes. During this process, some measures like letting the keys expire within specified time are desirable to reduce security risk. This patch assumes, 1) there are 128 LUKS devices at maximum to be unlocked thus MAX_KEY_NUM=128. 2) a key description won't exceed 128 bytes thus KEY_DESC_MAX_LEN=128. And here is a demo on how to interact with /sys/kernel/config/crash_dm_crypt_keys, # Add key #1 mkdir /sys/kernel/config/crash_dm_crypt_keys/7d26b7b4-e342-4d2d-b660-7426b0996720 # Add key #1's description echo cryptsetup:7d26b7b4-e342-4d2d-b660-7426b0996720 > /sys/kernel/config/crash_dm_crypt_keys/description # how many keys do we have now? cat /sys/kernel/config/crash_dm_crypt_keys/count 1 # Add key# 2 in the same way # how many keys do we have now? cat /sys/kernel/config/crash_dm_crypt_keys/count 2 # the tree structure of /crash_dm_crypt_keys configfs tree /sys/kernel/config/crash_dm_crypt_keys/ /sys/kernel/config/crash_dm_crypt_keys/ ├── 7d26b7b4-e342-4d2d-b660-7426b0996720 │   └── description ├── count ├── fce2cd38-4d59-4317-8ce2-1fd24d52c46a │   └── description Link: https://lkml.kernel.org/r/20250502011246.99238-3-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21kexec_file: allow to place kexec_buf randomlyCoiby Xu
Patch series "Support kdump with LUKS encryption by reusing LUKS volume keys", v9. LUKS is the standard for Linux disk encryption, widely adopted by users, and in some cases, such as Confidential VMs, it is a requirement. With kdump enabled, when the first kernel crashes, the system can boot into the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore) to a specified target. However, there are two challenges when dumping vmcore to a LUKS-encrypted device: - Kdump kernel may not be able to decrypt the LUKS partition. For some machines, a system administrator may not have a chance to enter the password to decrypt the device in kdump initramfs after the 1st kernel crashes; For cloud confidential VMs, depending on the policy the kdump kernel may not be able to unseal the keys with TPM and the console virtual keyboard is untrusted. - LUKS2 by default use the memory-hard Argon2 key derivation function which is quite memory-consuming compared to the limited memory reserved for kdump. Take Fedora example, by default, only 256M is reserved for systems having memory between 4G-64G. With LUKS enabled, ~1300M needs to be reserved for kdump. Note if the memory reserved for kdump can't be used by 1st kernel i.e. an user sees ~1300M memory missing in the 1st kernel. Besides users (at least for Fedora) usually expect kdump to work out of the box i.e. no manual password input or custom crashkernel value is needed. And it doesn't make sense to derivate the keys again in kdump kernel which seems to be redundant work. This patchset addresses the above issues by making the LUKS volume keys persistent for kdump kernel with the help of cryptsetup's new APIs (--link-vk-to-keyring/--volume-key-keyring). Here is the life cycle of the kdump copies of LUKS volume keys, 1. After the 1st kernel loads the initramfs during boot, systemd use an user-input passphrase to de-crypt the LUKS volume keys or TPM-sealed key and then save the volume keys to specified keyring (using the --link-vk-to-keyring API) and the key will expire within specified time. 2. A user space tool (kdump initramfs loader like kdump-utils) create key items inside /sys/kernel/config/crash_dm_crypt_keys to inform the 1st kernel which keys are needed. 3. When the kdump initramfs is loaded by the kexec_file_load syscall, the 1st kernel will iterate created key items, save the keys to kdump reserved memory. 4. When the 1st kernel crashes and the kdump initramfs is booted, the kdump initramfs asks the kdump kernel to create a user key using the key stored in kdump reserved memory by writing yes to /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted device is unlocked with libcryptsetup's --volume-key-keyring API. 5. The system gets rebooted to the 1st kernel after dumping vmcore to the LUKS encrypted device is finished After libcryptsetup saving the LUKS volume keys to specified keyring, whoever takes this should be responsible for the safety of these copies of keys. The keys will be saved in the memory area exclusively reserved for kdump where even the 1st kernel has no direct access. And further more, two additional protections are added, - save the copy randomly in kdump reserved memory as suggested by Jan - clear the _PAGE_PRESENT flag of the page that stores the copy as suggested by Pingfan This patchset only supports x86. There will be patches to support other architectures once this patch set gets merged. This patch (of 9): Currently, kexec_buf is placed in order which means for the same machine, the info in the kexec_buf is always located at the same position each time the machine is booted. This may cause a risk for sensitive information like LUKS volume key. Now struct kexec_buf has a new field random which indicates it's supposed to be placed in a random position. Note this feature is enabled only when CONFIG_CRASH_DUMP is enabled. So it only takes effect for kdump and won't impact kexec reboot. Link: https://lkml.kernel.org/r/20250502011246.99238-1-coxu@redhat.com Link: https://lkml.kernel.org/r/20250502011246.99238-2-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Suggested-by: Jan Pazdziora <jpazdziora@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11list: remove redundant 'extern' for function prototypesAndy Shevchenko
The 'extern' keyword is redundant for function prototypes. list.h never used them and new code in general is better without them. Link: https://lkml.kernel.org/r/20250502121742.3997529-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11scripts/gdb: update documentation for lx_per_cpuIllia Ostapyshyn
Commit db08c53fdd542bb7f83b ("scripts/gdb: fix parameter handling in $lx_per_cpu") changed the parameter handling of lx_per_cpu to use GdbValue instead of parsing the variable name. Update the documentation to reflect the new lx_per_cpu usage. Update the hrtimer_bases example to use rb_tree instead of the timerqueue_head.next pointer removed in commit 511885d7061eda3eb1fa ("lib/timerqueue: Rely on rbtree semantics for next timer"). Link: https://lkml.kernel.org/r/20250503123234.2407184-3-illia@yshyn.com Signed-off-by: Illia Ostapyshyn <illia@yshyn.com> Cc: Alex Shi <alexs@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dongliang Mu <dzm91@hust.edu.cn> Cc: Florian Rommel <mail@florommel.de> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11scripts/gdb: fix kgdb probing on single-core systemsIllia Ostapyshyn
Patch series "scripts/gdb: Fixes related to lx_per_cpu()". These patches (1) fix kgdb detection on systems featuring a single CPU and (2) update the documentation to reflect the current usage of lx_per_cpu() and update an outdated example of its usage. This patch (of 2): When requested the list of threads via qfThreadInfo, gdb_cmd_query in kernel/debug/gdbstub.c first returns "shadow" threads for CPUs followed by the actual tasks in the system. Extended qThreadExtraInfo queries yield "shadowCPU%d" as the name for the CPU core threads. This behavior is used by get_gdbserver_type() to probe for KGDB by matching the name for the thread 2 against "shadowCPU". This breaks down on single-core systems, where thread 2 is the first nonshadow thread. Request the name for thread 1 instead. As GDB assigns thread IDs in the order of their appearance, it is safe to assume shadowCPU0 at ID 1 as long as CPU0 is not hotplugged. Before: (gdb) info threads Id Target Id Frame 1 Thread 4294967294 (shadowCPU0) kgdb_breakpoint () * 2 Thread 1 (swapper/0) kgdb_breakpoint () 3 Thread 2 (kthreadd) 0x0000000000000000 in ?? () ... (gdb) p $lx_current().comm Sorry, obtaining the current CPU is not yet supported with this gdb server. After: (gdb) info threads Id Target Id Frame 1 Thread 4294967294 (shadowCPU0) kgdb_breakpoint () * 2 Thread 1 (swapper/0) kgdb_breakpoint () 3 Thread 2 (kthreadd) 0x0000000000000000 in ?? () ... (gdb) p $lx_current().comm $1 = "swapper/0\000\000\000\000\000\000" Link: https://lkml.kernel.org/r/20250503123234.2407184-1-illia@yshyn.com Link: https://lkml.kernel.org/r/20250503123234.2407184-2-illia@yshyn.com Signed-off-by: Illia Ostapyshyn <illia@yshyn.com> Cc: Alex Shi <alexs@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dongliang Mu <dzm91@hust.edu.cn> Cc: Florian Rommel <mail@florommel.de> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests: fix some typos in tools/testing/selftestsChelsy Ratnawat
Fix multiple spelling errors: - "rougly" -> "roughly" - "fielesystems" -> "filesystems" - "Can'" -> "Can't" Link: https://lkml.kernel.org/r/20250503211959.507815-1-chelsyratnawat2001@gmail.com Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11lib/oid_registry.c: remove unused sprint_OIDDr. David Alan Gilbert
sprint_OID() was added as part of 2012's commit 4f73175d0375 ("X.509: Add utility functions to render OIDs as strings") but it hasn't been used. Remove it. Note that there's also 'sprint_oid' (lower case) which is used in a lot of places; that's left as is except for fixing its case in a comment. Link: https://lkml.kernel.org/r/20250501010502.326472-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11nilfs2: do not propagate ENOENT error from nilfs_btree_propagate()Ryusuke Konishi
In preparation for writing logs, in nilfs_btree_propagate(), which makes parent and ancestor node blocks dirty starting from a modified data block or b-tree node block, if the starting block does not belong to the b-tree, i.e. is isolated, nilfs_btree_do_lookup() called within the function fails with -ENOENT. In this case, even though -ENOENT is an internal code, it is propagated to the log writer via nilfs_bmap_propagate() and may be erroneously returned to system calls such as fsync(). Fix this issue by changing the error code to -EINVAL in this case, and having the bmap layer detect metadata corruption and convert the error code appropriately. Link: https://lkml.kernel.org/r/20250428173808.6452-3-konishi.ryusuke@gmail.com Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11nilfs2: add pointer check for nilfs_direct_propagate()Wentao Liang
Patch series "nilfs2: improve sanity checks in dirty state propagation". This fixes one missed check for block mapping anomalies and one improper return of an error code during a preparation step for log writing, thereby improving checking for filesystem corruption on writeback. This patch (of 2): In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr() need to be checked to ensure it is not an invalid pointer. If the pointer value obtained by nilfs_direct_get_ptr() is NILFS_BMAP_INVALID_PTR, means that the metadata (in this case, i_bmap in the nilfs_inode_info struct) that should point to the data block at the buffer head of the argument is corrupted and the data block is orphaned, meaning that the file system has lost consistency. Add a value check and return -EINVAL when it is an invalid pointer. Link: https://lkml.kernel.org/r/20250428173808.6452-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250428173808.6452-2-konishi.ryusuke@gmail.com Fixes: 36a580eb489f ("nilfs2: direct block mapping") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11kexec_file: use SHA-256 library API instead of crypto_shash APIEric Biggers
This user of SHA-256 does not support any other algorithm, so the crypto_shash abstraction provides no value. Just use the SHA-256 library API instead, which is much simpler and easier to use. Tested with '/sbin/kexec --kexec-file-syscall'. Link: https://lkml.kernel.org/r/20250428185721.844686-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11util_macros.h: fix the reference in kernel-docAndy Shevchenko
In PTR_IF() description the text refers to the parameter as (ptr) while the kernel-doc format asks for @ptr. Fix this accordingly. Link: https://lkml.kernel.org/r/20250428072737.3265239-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexandru Ardelean <aardelean@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11sort.h: hoist cmp_int() into generic header fileFedor Pchelkin
Deduplicate the same functionality implemented in several places by moving the cmp_int() helper macro into linux/sort.h. The macro performs a three-way comparison of the arguments mostly useful in different sorting strategies and algorithms. Link: https://lkml.kernel.org/r/20250427201451.900730-1-pchelkin@ispras.ru Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Suggested-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Acked-by: Coly Li <colyli@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Coly Li <colyli@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11ocfs2: remove unnecessary NULL check before unregister_sysctl_table()Chen Ni
unregister_sysctl_table() checks for NULL pointers internally. Remove unneeded NULL check here. Link: https://lkml.kernel.org/r/20250422073051.1334310-1-nichen@iscas.ac.cn Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11ocfs2: fix possible memory leak in ocfs2_finish_quota_recoveryMurad Masimov
If ocfs2_finish_quota_recovery() exits due to an error before passing all rc_list elements to ocfs2_recover_local_quota_file() then it can lead to a memory leak as rc_list may still contain elements that have to be freed. Release all memory allocated by ocfs2_add_recovery_chunk() using ocfs2_free_quota_recovery() instead of kfree(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Link: https://lkml.kernel.org/r/20250402065628.706359-2-m.masimov@mt-integration.ru Fixes: 2205363dce74 ("ocfs2: Implement quota recovery") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11ipc: fix to protect IPCS lookups using RCUJeongjun Park
syzbot reported that it discovered a use-after-free vulnerability, [0] [0]: https://lore.kernel.org/all/67af13f8.050a0220.21dd3.0038.GAE@google.com/ idr_for_each() is protected by rwsem, but this is not enough. If it is not protected by RCU read-critical region, when idr_for_each() calls radix_tree_node_free() through call_rcu() to free the radix_tree_node structure, the node will be freed immediately, and when reading the next node in radix_tree_for_each_slot(), the already freed memory may be read. Therefore, we need to add code to make sure that idr_for_each() is protected within the RCU read-critical region when we call it in shm_destroy_orphaned(). Link: https://lkml.kernel.org/r/20250424143322.18830-1-aha310510@gmail.com Fixes: b34a6b1da371 ("ipc: introduce shm_rmid_forced sysctl") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reported-by: syzbot+a2b84e569d06ca3a949c@syzkaller.appspotmail.com Cc: Jeongjun Park <aha310510@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11compiler_types.h: fix "unused variable" in __compiletime_assert()Marc Herbert
This refines commit c03567a8e8d5 ("include/linux/compiler.h: don't perform compiletime_assert with -O0") and restores #ifdef __OPTIMIZE__ symmetry by evaluating the 'condition' variable in both compile-time variants of __compiletimeassert(). As __OPTIMIZE__ is always true by default, this commit does not change anything by default. But it fixes warnings with _non-default_ CFLAGS like for instance this: make CFLAGS_tcp.o='-Og -U__OPTIMIZE__' from net/ipv4/tcp.c:273: include/net/sch_generic.h: In function `qdisc_cb_private_validate': include/net/sch_generic.h:511:30: error: unused variable `qcb' [-Werror=unused-variable] { struct qdisc_skb_cb *qcb; BUILD_BUG_ON(sizeof(skb->cb) < sizeof(*qcb)); ... } [akpm@linux-foundation.org: regularize comment layout, reflow comment] Link: https://lkml.kernel.org/r/20250424194048.652571-1-marc.herbert@linux.intel.com Signed-off-by: Marc Herbert <Marc.Herbert@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Jan Hendrik Farr <kernel@jfarr.cc> Cc: Macro Elver <elver@google.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Tony Ambardar <tony.ambardar@gmail.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11maccess: fix strncpy_from_user_nofault() empty string handlingMykyta Yatsenko
strncpy_from_user_nofault() should return the length of the copied string including the trailing NUL, but if the argument unsafe_addr points to an empty string ({'\0'}), the return value is 0. This happens as strncpy_from_user() copies terminal symbol into dst and returns 0 (as expected), but strncpy_from_user_nofault does not modify ret as it is not equal to count and not greater than 0, so 0 is returned, which contradicts the contract. Link: https://lkml.kernel.org/r/20250422131449.57177-1-mykyta.yatsenko5@gmail.com Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Reviewed-by: Andrii Nakryiko <andrii@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11watchdog: fix watchdog may detect false positive of softlockupLuo Gengkun
When updating `watchdog_thresh`, there is a race condition between writing the new `watchdog_thresh` value and stopping the old watchdog timer. If the old timer triggers during this window, it may falsely detect a softlockup due to the old interval and the new `watchdog_thresh` value being used. The problem can be described as follow: # We asuume previous watchdog_thresh is 60, so the watchdog timer is # coming every 24s. echo 10 > /proc/sys/kernel/watchdog_thresh (User space) | +------>+ update watchdog_thresh (We are in kernel now) | | # using old interval and new `watchdog_thresh` +------>+ watchdog hrtimer (irq context: detect softlockup) | | +-------+ | | + softlockup_stop_all To fix this problem, introduce a shadow variable for `watchdog_thresh`. The update to the actual `watchdog_thresh` is delayed until after the old timer is stopped, preventing false positives. The following testcase may help to understand this problem. --------------------------------------------- echo RT_RUNTIME_SHARE > /sys/kernel/debug/sched/features echo -1 > /proc/sys/kernel/sched_rt_runtime_us echo 0 > /sys/kernel/debug/sched/fair_server/cpu3/runtime echo 60 > /proc/sys/kernel/watchdog_thresh taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" & echo 10 > /proc/sys/kernel/watchdog_thresh & --------------------------------------------- The test case above first removes the throttling restrictions for real-time tasks. It then sets watchdog_thresh to 60 and executes a real-time task ,a simple while(1) loop, on cpu3. Consequently, the final command gets blocked because the presence of this real-time thread prevents kworker:3 from being selected by the scheduler. This eventually triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating with inconsistent variable - using both the old interval and the updated watchdog_thresh simultaneously. [nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case] Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Cc: Doug Anderson <dianders@chromium.org> Cc: Joel Granados <joel.granados@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Nysal Jan K.A." <nysal@linux.ibm.com> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11treewide: fix typo "previlege"WangYuli
There are some spelling mistakes of 'previlege' in comments which should be 'privilege'. Fix them and add it to scripts/spelling.txt. The typo in arch/loongarch/kvm/main.c was corrected by a different patch [1] and is therefore not included in this submission. [1]. https://lore.kernel.org/all/20250420142208.2252280-1-wheatfox17@icloud.com/ Link: https://lkml.kernel.org/r/46AD404E411A4BAC+20250421074910.66988-1-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11crash: fix spelling mistake "crahskernel" -> "crashkernel"Colin Ian King
There is a spelling mistake in a pr_warn message. Fix it. Link: https://lkml.kernel.org/r/20250418120331.535086-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11lib/test_kmod: do not hardcode/depend on any filesystemHerton R. Krzesinski
Right now test_kmod has hardcoded dependencies on btrfs/xfs. That is not optimal since you end up needing to select/build them, but it is not really required since other fs could be selected for the testing. Also, we can't change the default/driver module used for testing on initialization. Thus make it more generic: introduce two module parameters (start_driver and start_test_fs), which allow to select which modules/fs to use for the testing on test_kmod initialization. Then it's up to the user to select which modules/fs to use for testing based on his config. However, keep test_module as required default. This way, config/modules becomes selectable as when the testing is done from selftests (userspace). While at it, also change trigger_config_run_type, since at module initialization we already set the defaults at __kmod_config_init and should not need to do it again in test_kmod_init(), thus we can avoid to again set test_driver/test_fs. Link: https://lkml.kernel.org/r/20250418165047.702487-1-herton@redhat.com Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Reviewed-by: Luis Chambelrain <mcgrof@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11relay: remove unused relay_late_setup_filesDr. David Alan Gilbert
The last use of relay_late_setup_files() was removed in 2018 by commit 2b47733045aa ("drm/i915/guc: Merge log relay file and channel creation") Remove it and the helper it used. relay_late_setup_files() was used for eventually registering 'buffer only' channels. With it gone, delete the docs that explain how to do that. Which suggests it should be possible to lose the 'has_base_filename' flags. (Are there any other uses??) Link: https://lkml.kernel.org/r/20250418234932.490863-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11rapidio: remove unused functionsDr. David Alan Gilbert
rio_request_dma() and rio_dma_prep_slave_sg() were added in 2012 by commit e42d98ebe7d7 ("rapidio: add DMA engine support for RIO data transfers") but never used. rio_find_mport() last use was removed in 2013 by commit 9edbc30b434f ("rapidio: update enumerator registration mechanism") rio_unregister_scan() was added in 2013 by commit a11650e11093 ("rapidio: make enumeration/discovery configurable") but never used. Remove them. Link: https://lkml.kernel.org/r/20250419203012.429787-3-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11rapidio: remove some dead definesDr. David Alan Gilbert
Patch series "rapidio deadcoding". A couple of rapidio deadcoding patches. The first of these is a repost and was originally posted almost a year ago https://lore.kernel.org/all/20240528002515.211366-1-linux@treblig.org/ but got no answer. Other than being rebased and a typo fixed, it's not changed. This patch (of 2): 'mport_dma_buf', 'rio_mport_dma_map' and 'MPORT_MAX_DMA_BUFS' were added in the original commit e8de370188d0 ("rapidio: add mport char device driver") but never used. 'rio_cm_work' was unused since the original commit b6e8d4aa1110 ("rapidio: add RapidIO channelized messaging driver") but never used. Remove them. Link: https://lkml.kernel.org/r/20250419203012.429787-1-linux@treblig.org Link: https://lkml.kernel.org/r/20250419203012.429787-2-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11scatterlist: inline sg_next()Caleb Sander Mateos
sg_next() is a short function called frequently in I/O paths. Define it in the header file so it can be inlined into its callers. Link: https://lkml.kernel.org/r/20250416160615.3571958-1-csander@purestorage.com Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11ocfs2: simplify return statement in ocfs2_filecheck_attr_store()Thorsten Blum
Don't negate 'ret' and simplify the return statement. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11samples: extend hung_task detector test with semaphore supportZi Li
Extend the existing hung_task detector test module to support multiple lock types, including mutex and semaphore, with room for future additions (e.g., spinlock, etc.). This module creates dummy files under <debugfs>/hung_task, such as 'mutex' and 'semaphore'. The read process on any of these files will sleep for enough long time (256 seconds) while holding the respective lock. As a result, the second process will wait on the lock for a prolonged duration and be detected by the hung_task detector. This change unifies the previous mutex-only sample into a single, extensible hung_task_tests module, reducing code duplication and improving maintainability. Usage is: > cd /sys/kernel/debug/hung_task > cat mutex & cat mutex # Test mutex blocking > cat semaphore & cat semaphore # Test semaphore blocking Update the Kconfig description to reflect multiple debugfs files support. Link: https://lkml.kernel.org/r/20250414145945.84916-4-ioworker0@gmail.com Signed-off-by: Lance Yang <ioworker0@gmail.com> Signed-off-by: Zi Li <amaindex@outlook.com> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11hung_task: show the blocker task if the task is hung on semaphoreLance Yang
Inspired by mutex blocker tracking[1], this patch makes a trade-off to balance the overhead and utility of the hung task detector. Unlike mutexes, semaphores lack explicit ownership tracking, making it challenging to identify the root cause of hangs. To address this, we introduce a last_holder field to the semaphore structure, which is updated when a task successfully calls down() and cleared during up(). The assumption is that if a task is blocked on a semaphore, the holders must not have released it. While this does not guarantee that the last holder is one of the current blockers, it likely provides a practical hint for diagnosing semaphore-related stalls. With this change, the hung task detector can now show blocker task's info like below: [Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds. [Tue Apr 8 12:19:07 2025] Tainted: G E 6.14.0-rc6+ #1 [Tue Apr 8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Tue Apr 8 12:19:07 2025] task:cat state:D stack:0 pid:945 tgid:945 ppid:828 task_flags:0x400000 flags:0x00000000 [Tue Apr 8 12:19:07 2025] Call Trace: [Tue Apr 8 12:19:07 2025] <TASK> [Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0 [Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0 [Tue Apr 8 12:19:07 2025] schedule_timeout+0xe3/0xf0 [Tue Apr 8 12:19:07 2025] ? __folio_mod_stat+0x2a/0x80 [Tue Apr 8 12:19:07 2025] ? set_ptes.constprop.0+0x27/0x90 [Tue Apr 8 12:19:07 2025] __down_common+0x155/0x280 [Tue Apr 8 12:19:07 2025] down+0x53/0x70 [Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x23/0x60 [Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0 [Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350 [Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140 [Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30 [Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260 [Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0 [Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120 [Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f419478f46e [Tue Apr 8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e [Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003 [Tue Apr 8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000 [Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000 [Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [Tue Apr 8 12:19:07 2025] </TASK> [Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938 [Tue Apr 8 12:19:07 2025] task:cat state:S stack:0 pid:938 tgid:938 ppid:584 task_flags:0x400000 flags:0x00000000 [Tue Apr 8 12:19:07 2025] Call Trace: [Tue Apr 8 12:19:07 2025] <TASK> [Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0 [Tue Apr 8 12:19:07 2025] ? _raw_spin_unlock_irqrestore+0xe/0x40 [Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0 [Tue Apr 8 12:19:07 2025] schedule_timeout+0x77/0xf0 [Tue Apr 8 12:19:07 2025] ? __pfx_process_timeout+0x10/0x10 [Tue Apr 8 12:19:07 2025] msleep_interruptible+0x49/0x60 [Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x2d/0x60 [Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0 [Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350 [Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140 [Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30 [Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260 [Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0 [Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120 [Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f7c584a646e [Tue Apr 8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e [Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003 [Tue Apr 8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000 [Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000 [Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [Tue Apr 8 12:19:07 2025] </TASK> [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com Link: https://lkml.kernel.org/r/20250414145945.84916-3-ioworker0@gmail.com Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com> Signed-off-by: Lance Yang <ioworker0@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: Zi Li <amaindex@outlook.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11hung_task: replace blocker_mutex with encoded blockerLance Yang
Patch series "hung_task: extend blocking task stacktrace dump to semaphore", v5. Inspired by mutex blocker tracking[1], this patch series extend the feature to not only dump the blocker task holding a mutex but also to support semaphores. Unlike mutexes, semaphores lack explicit ownership tracking, making it challenging to identify the root cause of hangs. To address this, we introduce a last_holder field to the semaphore structure, which is updated when a task successfully calls down() and cleared during up(). The assumption is that if a task is blocked on a semaphore, the holders must not have released it. While this does not guarantee that the last holder is one of the current blockers, it likely provides a practical hint for diagnosing semaphore-related stalls. With this change, the hung task detector can now show blocker task's info like below: [Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds. [Tue Apr 8 12:19:07 2025] Tainted: G E 6.14.0-rc6+ #1 [Tue Apr 8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Tue Apr 8 12:19:07 2025] task:cat state:D stack:0 pid:945 tgid:945 ppid:828 task_flags:0x400000 flags:0x00000000 [Tue Apr 8 12:19:07 2025] Call Trace: [Tue Apr 8 12:19:07 2025] <TASK> [Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0 [Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0 [Tue Apr 8 12:19:07 2025] schedule_timeout+0xe3/0xf0 [Tue Apr 8 12:19:07 2025] ? __folio_mod_stat+0x2a/0x80 [Tue Apr 8 12:19:07 2025] ? set_ptes.constprop.0+0x27/0x90 [Tue Apr 8 12:19:07 2025] __down_common+0x155/0x280 [Tue Apr 8 12:19:07 2025] down+0x53/0x70 [Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x23/0x60 [Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0 [Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350 [Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140 [Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30 [Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260 [Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0 [Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120 [Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f419478f46e [Tue Apr 8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e [Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003 [Tue Apr 8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000 [Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000 [Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [Tue Apr 8 12:19:07 2025] </TASK> [Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938 [Tue Apr 8 12:19:07 2025] task:cat state:S stack:0 pid:938 tgid:938 ppid:584 task_flags:0x400000 flags:0x00000000 [Tue Apr 8 12:19:07 2025] Call Trace: [Tue Apr 8 12:19:07 2025] <TASK> [Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0 [Tue Apr 8 12:19:07 2025] ? _raw_spin_unlock_irqrestore+0xe/0x40 [Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0 [Tue Apr 8 12:19:07 2025] schedule_timeout+0x77/0xf0 [Tue Apr 8 12:19:07 2025] ? __pfx_process_timeout+0x10/0x10 [Tue Apr 8 12:19:07 2025] msleep_interruptible+0x49/0x60 [Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x2d/0x60 [Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0 [Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350 [Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140 [Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30 [Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260 [Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0 [Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120 [Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f7c584a646e [Tue Apr 8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e [Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003 [Tue Apr 8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000 [Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000 [Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [Tue Apr 8 12:19:07 2025] </TASK> This patch (of 3): This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long blocker', as only one blocker is active at a time. The blocker filed can store both the lock addrees and the lock type, with LSB used to encode the type as Masami suggested, making it easier to extend the feature to cover other types of locks. Also, once the lock type is determined, we can directly extract the address and cast it to a lock pointer ;) Link: https://lkml.kernel.org/r/20250414145945.84916-1-ioworker0@gmail.com Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com [1] Link: https://lkml.kernel.org/r/20250414145945.84916-2-ioworker0@gmail.com Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com> Signed-off-by: Lance Yang <ioworker0@gmail.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: Zi Li <amaindex@outlook.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11ocfs2: o2net_idle_timer: Rename del_timer_sync in commentWangYuli
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()") switched del_timer_sync to timer_delete_sync, but did not modify the comment for o2net_idle_timer(). Now fix it. Link: https://lkml.kernel.org/r/BDDB1E4E2876C36C+20250411102610.165946-1-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11Squashfs: check return result of sb_min_blocksizePhillip Lougher
Syzkaller reports an "UBSAN: shift-out-of-bounds in squashfs_bio_read" bug. Syzkaller forks multiple processes which after mounting the Squashfs filesystem, issues an ioctl("/dev/loop0", LOOP_SET_BLOCK_SIZE, 0x8000). Now if this ioctl occurs at the same time another process is in the process of mounting a Squashfs filesystem on /dev/loop0, the failure occurs. When this happens the following code in squashfs_fill_super() fails. ---- msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE); msblk->devblksize_log2 = ffz(~msblk->devblksize); ---- sb_min_blocksize() returns 0, which means msblk->devblksize is set to 0. As a result, ffz(~msblk->devblksize) returns 64, and msblk->devblksize_log2 is set to 64. This subsequently causes the UBSAN: shift-out-of-bounds in fs/squashfs/block.c:195:36 shift exponent 64 is too large for 64-bit type 'u64' (aka 'unsigned long long') This commit adds a check for a 0 return by sb_min_blocksize(). Link: https://lkml.kernel.org/r/20250409024747.876480-1-phillip@squashfs.org.uk Fixes: 0aa666190509 ("Squashfs: super block operations") Reported-by: syzbot+65761fc25a137b9c8c6e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67f0dd7a.050a0220.0a13.0230.GAE@google.com/ Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11exit: combine work under lock in synchronize_group_exit() and ↵Mateusz Guzik
coredump_task_exit() This reduces single-threaded overhead as it avoids one lock+irq trip on exit. It also improves scalability of spawning and killing threads within one process (just shy of 5% when doing it on 24 cores on my test jig). Both routines are moved below kcov and kmsan exit, which should be harmless. Link: https://lkml.kernel.org/r/20250319195436.1864415-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11errseq: eliminate special limitation for macro MAX_ERRNOZijun Hu
Current errseq implementation depends on a very special precondition that macro MAX_ERRNO must be (2^n - 1). Eliminate the limitation by - redefining macro ERRSEQ_SHIFT - defining a new macro ERRNO_MASK instead of MAX_ERRNO for errno mask. There is no plan to change the value of MAX_ERRNO, but this makes the implementation more generic and eliminates the BUILD_BUG_ON(). Link: https://lkml.kernel.org/r/20250407-improve_errseq-v1-1-7b27cbeb8298@quicinc.com Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>