summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-16ucount: replace get_ucounts_or_wrap() with atomic_inc_not_zero()Sebastian Andrzej Siewior
get_ucounts_or_wrap() increments the counter and if the counter is negative then it decrements it again in order to reset the previous increment. This statement can be replaced with atomic_inc_not_zero() to only increment the counter if it is not yet 0. This simplifies the get function because the put (if the get failed) can be removed. atomic_inc_not_zero() is implement as a cmpxchg() loop which can be repeated several times if another get/put is performed in parallel. This will be optimized later. Increment the reference counter only if not yet dropped to zero. Link: https://lkml.kernel.org/r/20250203150525.456525-3-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mengen Sun <mengensun@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: YueHong Wu <yuehongwu@tencent.com> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16rcu: provide a static initializer for hlist_nulls_headSebastian Andrzej Siewior
Patch series "ucount: Simplify refcounting with rcuref_t". I noticed that the atomic_dec_and_lock_irqsave() in put_ucounts() loops sometimes even during boot. Something like 2-3 iterations but still. This series replaces the refcounting with rcuref_t and adds a RCU lookup. This allows a lockless lookup in alloc_ucounts() if the entry is available and a cmpxchg()less put of the item. This patch (of 4): Provide a static initializer for hlist_nulls_head so that it can be used in statically defined data structures. Link: https://lkml.kernel.org/r/20250203150525.456525-1-bigeasy@linutronix.de Link: https://lkml.kernel.org/r/20250203150525.456525-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mengen Sun <mengensun@tencent.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: YueHong Wu <yuehongwu@tencent.com> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16lib/zlib: drop EQUAL macroYury Norov
The macro is prehistoric, and only exists to help those readers who don't know what memcmp() returns if memory areas differ. This is pretty well documented, so the macro looks excessive. Now that the only user of the macro depends on DEBUG_ZLIB config, GCC warns about unused macro if the library is built with W=2 against defconfig. So drop it for good. Link: https://lkml.kernel.org/r/20250205212933.68695-1-yury.norov@gmail.com Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Cc: Heiko Carsten <heiko.carstens@de.ibm.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16get_maintainer: stop reporting subsystem status as maintainer roleVlastimil Babka
After introducing the --substatus option, we can stop adjusting the reported maintainer role by the subsystem's status. For compatibility with the --git-chief-penguins option, keep the "chief penguin" role. Link: https://lkml.kernel.org/r/20250203-b4-get_maintainer-v2-2-83ba008b491f@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <kees@kernel.org> Cc: Ted Ts'o <tytso@mit.edu> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16get_maintainer: add --substatus for reporting subsystem statusVlastimil Babka
Patch series "get_maintainer: report subsystem status separately", v2. The subsystem status (S: field) can inform a patch submitter if the subsystem is well maintained or e.g. maintainers are missing. In get_maintainer, it is currently reported with --role(stats) by adjusting the maintainer role for any status different from Maintained. This has two downsides: - if a subsystem has only reviewers or mailing lists and no maintainers, the status is not reported. For example Orphan subsystems typically have no maintainers so there's nobody to report as orphan minder. - the Supported status means that someone is paid for maintaining, but it is reported as "supporter" for all the maintainers, which can be incorrect (only some of them may be paid). People (including myself) have been also confused about what "supporter" means. The second point has been brought up in 2022 and the discussion in the end resulted in adjusting documentation only [1]. I however agree with Ted's points that it's misleading to take the subsystem status and apply it to all maintainers [2]. The attempt to modify get_maintainer output was retracted after Joe objected that the status becomes not reported at all [3]. This series addresses that concern by reporting the status (unless it's the most common Maintained one) on separate lines that follow the reported emails, using a new --substatus parameter. Care is taken to reduce the noise to minimum by not reporting the most common Maintained status, by default require no opt-in that would need the users to discover the new parameter, and at the same time not to break existing git --cc-cmd usage. [1] https://lore.kernel.org/all/20221006162413.858527-1-bryan.odonoghue@linaro.org/ [2] https://lore.kernel.org/all/Yzen4X1Na0MKXHs9@mit.edu/ [3] https://lore.kernel.org/all/30776fe75061951777da8fa6618ae89bea7a8ce4.camel@perches.com/ This patch (of 2): The subsystem status is currently reported with --role(stats) by adjusting the maintainer role for any status different from Maintained. This has two downsides: - if a subsystem has only reviewers or mailing lists and no maintainers, the status is not reported (i.e. typically, Orphan subsystems have no maintainers) - the Supported status means that someone is paid for maintaining, but it is reported as "supporter" for all the maintainers, which can be incorrect. People have been also confused about what "supporter" means. This patch introduces a new --substatus option and functionality aimed to report the subsystem status separately, without adjusting the reported maintainer role. After the e-mails are output, the status of subsystems will follow, for example: ... linux-kernel@vger.kernel.org (open list:LIBRARY CODE) LIBRARY CODE status: Supported In order to allow replacing the role rewriting seamlessly, the new option works as follows: - it is automatically enabled when --email and --role are enabled (the defaults include --email and --rolestats which implies --role) - usages with --norolestats e.g. for git's --cc-cmd will thus need no adjustments - the most common Maintained status is not reported at all, to reduce unnecessary noise - THE REST catch-all section (contains lkml) status is not reported - the existing --subsystem and --status options are unaffected so their users will need no adjustments [vbabka@suse.cz: require that script output goes to a terminal] Link: https://lkml.kernel.org/r/66c2bf7a-9119-4850-b6b8-ac8f426966e1@suse.cz Link: https://lkml.kernel.org/r/20250203-b4-get_maintainer-v2-0-83ba008b491f@suse.cz Link: https://lkml.kernel.org/r/20250203-b4-get_maintainer-v2-1-83ba008b491f@suse.cz Fixes: c1565b6f7b53 ("get_maintainer: add --substatus for reporting subsystem status") Closes: https://lore.kernel.org/all/7aodxv46lj6rthjo4i5zhhx2lybrhb4uknpej2dyz3e7im5w3w@w23bz6fx3jnn/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Uwe Kleine-K=F6nig <u.kleine-koenig@baylibre.com> Cc: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <kees@kernel.org> Cc: Ted Ts'o <tytso@mit.edu> Cc: Thorsten Leemhuis <linux@leemhuis.info> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16powerpc/crash: use generic crashkernel reservationSourabh Jain
Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Along with moving to the generic crashkernel reservation, the code related to finding the base address for the crashkernel has been separated into its own function name get_crash_base() for better readability and maintainability. Link: https://lkml.kernel.org/r/20250131113830.925179-8-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16powerpc: insert System RAM resource to prevent crashkernel conflictSourabh Jain
The next patch in the series with title "powerpc/crash: use generic crashkernel reservation" enables powerpc to use generic crashkernel reservation instead of custom implementation. This leads to exporting of `Crash Kernel` memory to iomem_resource (/proc/iomem) via insert_crashkernel_resources():kernel/crash_reserve.c or at another place in the same file if HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY is set. The add_system_ram_resources():arch/powerpc/mm/mem.c adds `System RAM` to iomem_resource using request_resource(). This creates a conflict with `Crash Kernel`, which is added by the generic crashkernel reservation code. As a result, the kernel ultimately fails to add `System RAM` to iomem_resource. Consequently, it does not appear in /proc/iomem. There are multiple approches tried to avoid this: 1. Don't add Crash Kernel to iomem_resource: - This has two issues. First, it requires adding an architecture-specific hook in the generic code. There are already two code paths to choose when to add `Crash Kernel` to iomem_resource. This adds one more code path to skip it. Second, what if `Crash Kernel` is required in /proc/iomem in the future? Many architectures do export it. 2. Don't add `System RAM` to iomem_resource by reverting commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"): - It's not ideal to export `System RAM` via /proc/iomem, but since it already done ealier and userspace tools like kdump and kdump-utils rely on `System RAM` from /proc/iomem, removing it will break userspace. 3. Add Crash Kernel along with System RAM to /proc/iomem: This patch takes the third approach by updating add_system_ram_resources() to use insert_resource() instead of the request_resource() API to add the `System RAM` resource to iomem_resource. insert_resource() allows inserting resources even if they overlap with existing ones. Since `Crash Kernel` and `System RAM` resources are added to iomem_resource early in the boot, any other conflict is not expected. With the changes introduced here and in the next patch, "powerpc/crash: use generic crashkernel reservation," /proc/iomem now exports `System RAM` and `Crash Kernel` as shown below: $ cat /proc/iomem 00000000-3ffffffff : System RAM 10000000-4fffffff : Crash kernel The kdump script is capable of handling `System RAM` and `Crash Kernel` in the above format. The same format is used in other architectures. Link: https://lkml.kernel.org/r/20250131113830.925179-7-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16powerpc/crash: preserve user-specified memory limitSourabh Jain
Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system memory bug") fails crashkernel parsing if the crash size is found to be higher than system RAM, which makes the memory_limit adjustment code ineffective due to an early exit from reserve_crashkernel(). Regardless lets not violate the user-specified memory limit by adjusting it. Remove this adjustment to ensure all reservations stay within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update the user-specified memory limit") did the same for fadump. Link: https://lkml.kernel.org/r/20250131113830.925179-6-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16powerpc/crash: use generic APIs to locate memory hole for kdumpSourabh Jain
On PowerPC, the memory reserved for the crashkernel can contain components like RTAS, TCE, OPAL, etc., which should be avoided when loading kexec segments into crashkernel memory. Due to these special components, PowerPC has its own set of APIs to locate holes in the crashkernel memory for loading kexec segments for kdump. However, for loading kexec segments in the kexec case, PowerPC already uses generic APIs to locate holes. The previous patch in this series, titled "crash: Let arch decide usable memory range in reserved area," introduced arch-specific hook to handle such special regions in the crashkernel area. So, switch PowerPC to use the generic APIs to locate memory holes for kdump and remove the redundant PowerPC-specific APIs. Link: https://lkml.kernel.org/r/20250131113830.925179-5-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16crash: let arch decide usable memory range in reserved areaSourabh Jain
Although the crashkernel area is reserved, on architectures like PowerPC, it is possible for the crashkernel reserved area to contain components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments over these components, PowerPC has its own set of APIs to locate holes in the crashkernel reserved area. Add an arch hook in the generic locate mem hole APIs so that architectures can handle such special regions in the crashkernel area while locating memory holes for kexec segments using generic APIs. With this, a lot of redundant arch-specific code can be removed, as it performs the exact same job as the generic APIs. To keep the generic and arch-specific changes separate, the changes related to moving PowerPC to use the generic APIs and the removal of PowerPC-specific APIs for memory hole allocation are done in a subsequent patch titled "powerpc/crash: Use generic APIs to locate memory hole for kdump. Link: https://lkml.kernel.org/r/20250131113830.925179-4-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16crash: remove an unused argument from reserve_crashkernel_generic()Sourabh Jain
cmdline argument is not used in reserve_crashkernel_generic() so remove it. Correspondingly, all the callers have been updated as well. No functional change intended. Link: https://lkml.kernel.org/r/20250131113830.925179-3-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16kexec: initialize ELF lowest address to ULONG_MAXSourabh Jain
Patch series "powerpc/crash: use generic crashkernel reservation", v3. Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Additionally move powerpc to use generic APIs to locate memory hole for kexec segments while loading kdump kernel. This patch (of 7): kexec_elf_load() loads an ELF executable and sets the address of the lowest PT_LOAD section to the address held by the lowest_load_addr function argument. To determine the lowest PT_LOAD address, a local variable lowest_addr (type unsigned long) is initialized to UINT_MAX. After loading each PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD address is lower, lowest_addr is updated. However, setting lowest_addr to UINT_MAX won't work when the kernel image is loaded above 4G, as the returned lowest PT_LOAD address would be invalid. This is resolved by initializing lowest_addr to ULONG_MAX instead. This issue was discovered while implementing crashkernel high/low reservation on the PowerPC architecture. Link: https://lkml.kernel.org/r/20250131113830.925179-1-sourabhjain@linux.ibm.com Link: https://lkml.kernel.org/r/20250131113830.925179-2-sourabhjain@linux.ibm.com Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()") Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16lib/plist.c: add shortcut for plist_requeue()I Hsin Cheng
In the operation of plist_requeue(), "node" is deleted from the list before queueing it back to the list again, which involves looping to find the tail of same-prio entries. If "node" is the head of same-prio entries which means its prio_list is on the priority list, then "node_next" can be retrieve immediately by the next entry of prio_list, instead of looping nodes on node_list. The shortcut implementation can benefit plist_requeue() running the below test, and the test result is shown in the following table. One can observe from the test result that when the number of nodes of same-prio entries is smaller, then the probability of hitting the shortcut can be bigger, thus the benefit can be more significant. While it tends to behave almost the same for long same-prio entries, since the probability of taking the shortcut is much smaller. ----------------------------------------------------------------------- | Test size | 200 | 400 | 600 | 800 | 1000 | ----------------------------------------------------------------------- | new_plist_requeue | 271521| 1007913| 2148033| 4346792| 12200940| ----------------------------------------------------------------------- | old_plist_requeue | 301395| 1105544| 2488301| 4632980| 12217275| ----------------------------------------------------------------------- The test is done on x86_64 architecture with v6.9 kernel and Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz. Test script( executed in kernel module mode ): int init_module(void) { unsigned int test_data[test_size]; /* Split the list into 10 different priority * , when test_size is larger, the number of * nodes within each priority is larger. */ for (i = 0; i < ARRAY_SIZE(test_data); i++) { test_data[i] = i % 10; } ktime_t start, end, time_elapsed = 0; plist_head_init(&test_head_local); for (i = 0; i < ARRAY_SIZE(test_node_local); i++) { plist_node_init(test_node_local + i, 0); test_node_local[i].prio = test_data[i]; } for (i = 0; i < ARRAY_SIZE(test_node_local); i++) { if (plist_node_empty(test_node_local + i)) { plist_add(test_node_local + i, &test_head_local); } } for (i = 0; i < ARRAY_SIZE(test_node_local); i += 1) { start = ktime_get(); plist_requeue(test_node_local + i, &test_head_local); end = ktime_get(); time_elapsed += (end - start); } pr_info("plist_requeue() elapsed time : %lld, size %d\n", time_elapsed, test_size); return 0; } [akpm@linux-foundation.org: tweak comment and code layout] Link: https://lkml.kernel.org/r/20250119062408.77638-1-richard120310@gmail.com Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16docs,procfs: document /proc/PID/* access permission checksAndrii Nakryiko
Add a paragraph explaining what sort of capabilities a process would need to read procfs data for some other process. Also mention that reading data for its own process doesn't require any extra permissions. Link: https://lkml.kernel.org/r/20250129001747.759990-1-andrii@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16.mailmap: remove redundant mappings of emailsCarlos Bilbao
Remove two redundant mappings: changbin.du@intel.com -> changbin.du@intel.com viresh.kumar@linaro.org -> viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20250129013430.1117720-1-carlos.bilbao@kernel.org Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16scripts: add script to extract built-in firmware blobsGuilherme G. Piccoli
There is currently no tool to extract a firmware blob that is built-in on vmlinux to the best of my knowledge. So if we have a kernel image containing the blobs, and we want to rebuild the kernel with some debug patches for example (and given that the image also has IKCONFIG=y), we currently can't do that for the same versions for all the firmware blobs, _unless_ we have exact commits of linux-firmware for the specific versions for each firmware included. Through the options CONFIG_EXTRA_FIRMWARE{_DIR} one is able to build a kernel including firmware blobs in a built-in fashion. This is usually the case of built-in drivers that require some blobs in order to work properly, for example, like in non-initrd based systems. Add hereby a script to extract these blobs from a non-stripped vmlinux, similar to the idea of "extract-ikconfig". The firmware loader interface saves such built-in blobs as rodata entries, having a field for the FW name as "_fw_<module_name>_<firmware_name>_bin"; the tool extracts files named "<module_name>_<firmware_name>" for each rodata firmware entry detected. It makes use of awk, bash, dd and readelf, pretty standard tooling for Linux development. With this tool, we can blindly extract the FWs and easily re-add them in the new debug kernel build, allowing a more deterministic testing without the burden of "hunting down" the proper version of each firmware binary. Link: https://lkml.kernel.org/r/20250120190436.127578-1-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Reviewed-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Russ Weight <russ.weight@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16MAINTAINERS: add Yang Yang as a co-maintainer of PER-TASK DELAY ACCOUNTINGYang Yang
Balbir Singh is the unique maintainer of PER-TASK DELAY ACCOUNTING, and he had started work on cgroupstats a long time back, this subsystem then is not growing at a very rapid pace. With their excellent work delay accounting is still very useful for observing and optimizing system delay, but still needs continuous improvement. Yang Yang with his team had worked for most of the recent patches of the subsystem, and he has a strong willing to help, Balbir Singh is glad to see that, so add him as a co-maintainer. Link: https://lkml.kernel.org/r/20250117222013817zWHgBaSigRI_eRJt1hqnu@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm,procfs: allow read-only remote mm access under CAP_PERFMONAndrii Nakryiko
It's very common for various tracing and profiling toolis to need to access /proc/PID/maps contents for stack symbolization needs to learn which shared libraries are mapped in memory, at which file offset, etc. Currently, access to /proc/PID/maps requires CAP_SYS_PTRACE (unless we are looking at data for our own process, which is a trivial case not too relevant for profilers use cases). Unfortunately, CAP_SYS_PTRACE implies way more than just ability to discover memory layout of another process: it allows to fully control arbitrary other processes. This is problematic from security POV for applications that only need read-only /proc/PID/maps (and other similar read-only data) access, and in large production settings CAP_SYS_PTRACE is frowned upon even for the system-wide profilers. On the other hand, it's already possible to access similar kind of information (and more) with just CAP_PERFMON capability. E.g., setting up PERF_RECORD_MMAP collection through perf_event_open() would give one similar information to what /proc/PID/maps provides. CAP_PERFMON, together with CAP_BPF, is already a very common combination for system-wide profiling and observability application. As such, it's reasonable and convenient to be able to access /proc/PID/maps with CAP_PERFMON capabilities instead of CAP_SYS_PTRACE. For procfs, these permissions are checked through common mm_access() helper, and so we augment that with cap_perfmon() check *only* if requested mode is PTRACE_MODE_READ. I.e., PTRACE_MODE_ATTACH wouldn't be permitted by CAP_PERFMON. So /proc/PID/mem, which uses PTRACE_MODE_ATTACH, won't be permitted by CAP_PERFMON, but /proc/PID/maps, /proc/PID/environ, and a bunch of other read-only contents will be allowable under CAP_PERFMON. Besides procfs itself, mm_access() is used by process_madvise() and process_vm_{readv,writev}() syscalls. The former one uses PTRACE_MODE_READ to avoid leaking ASLR metadata, and as such CAP_PERFMON seems like a meaningful allowable capability as well. process_vm_{readv,writev} currently assume PTRACE_MODE_ATTACH level of permissions (though for readv PTRACE_MODE_READ seems more reasonable, but that's outside the scope of this change), and as such won't be affected by this patch. Link: https://lkml.kernel.org/r/20250127222114.1132392-1-andrii@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-09Linux 6.14-rc6v6.14-rc6Linus Torvalds
2025-03-09Merge tag 'kbuild-fixes-v6.14-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Use the specified $(LD) when building userprogs with Clang - Pass the correct target triple when compile-testing UAPI headers with Clang - Fix pacman-pkg build error with KBUILD_OUTPUT * tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: install-extmod-build: Fix build when specifying KBUILD_OUTPUT docs: Kconfig: fix defconfig description kbuild: hdrcheck: fix cross build with clang kbuild: userprogs: use correct lld when linking through clang
2025-03-09Merge tag 'usb-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB driver fixes for some reported issues. These contain: - typec driver fixes - dwc3 driver fixes - xhci driver fixes - renesas controller fixes - gadget driver fixes - a new USB quirk added All of these have been in linux-next with no reported issues" * tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: ucsi: Fix NULL pointer access usb: quirks: Add DELAY_INIT and NO_LPM for Prolific Mass Storage Card Reader usb: xhci: Fix host controllers "dying" after suspend and resume usb: dwc3: Set SUSPENDENABLE soon after phy init usb: hub: lack of clearing xHC resources usb: renesas_usbhs: Flush the notify_hotplug_work usb: renesas_usbhs: Use devm_usb_get_phy() usb: renesas_usbhs: Call clk_put() usb: dwc3: gadget: Prevent irq storm when TH re-executes usb: gadget: Check bmAttributes only if configuration is valid xhci: Restrict USB4 tunnel detection for USB3 devices to Intel hosts usb: xhci: Enable the TRB overfetch quirk on VIA VL805 usb: gadget: Fix setting self-powered state on suspend usb: typec: ucsi: increase timeout for PPM reset operations acpi: typec: ucsi: Introduce a ->poll_cci method usb: typec: tcpci_rt1711h: Unmask alert interrupts to fix functionality usb: gadget: Set self-powered based on MaxPower and bmAttributes usb: gadget: u_ether: Set is_suspend flag if remote wakeup fails usb: atm: cxacru: fix a flaw in existing endpoint checks
2025-03-09Merge tag 'driver-core-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg KH: "Here is a single driver core fix that resolves a reported memory leak. It's been in linux-next for 2 weeks now with no reported problems" * tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers: core: fix device leak in __fw_devlink_relax_cycles()
2025-03-09Merge tag 'char-misc-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/IIO driver fixes from Greg KH: "Here are a number of misc and char and iio driver fixes that have been sitting in my tree for way too long. They contain: - iio driver fixes for reported issues - regression fix for rtsx_usb card reader - mei and mhi driver fixes - small virt driver fixes - ntsync permissions fix - other tiny driver fixes for reported problems. All of these have been in linux-next for quite a while with no reported issues" * tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits) Revert "drivers/card_reader/rtsx_usb: Restore interrupt based detection" ntsync: Check wait count based on byte size. bus: simple-pm-bus: fix forced runtime PM use char: misc: deallocate static minor in error path eeprom: digsy_mtc: Make GPIO lookup table match the device drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctl binderfs: fix use-after-free in binder_devices slimbus: messaging: Free transaction ID in delayed interrupt scenario vbox: add HAS_IOPORT dependency cdx: Fix possible UAF error in driver_override_show() intel_th: pci: Add Panther Lake-P/U support intel_th: pci: Add Panther Lake-H support intel_th: pci: Add Arrow Lake support intel_th: msu: Fix less trivial kernel-doc warnings intel_th: msu: Fix kernel-doc warnings MAINTAINERS: change maintainer for FSI ntsync: Set the permissions to be 0666 bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock mei: vsc: Use "wakeuphostint" when getting the host wakeup GPIO mei: me: add panther lake P DID ...
2025-03-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "arm64: - Fix a couple of bugs affecting pKVM's PSCI relay implementation when running in the hVHE mode, resulting in the host being entered with the MMU in an unknown state, and EL2 being in the wrong mode x86: - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with the host's value, which can lead to unexpected bus lock #DBs - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly emulate BTF. KVM's lack of context switching has meant BTF has always been broken to some extent - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest can enable DebugSwap without KVM's knowledge - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO memory" phase without actually generating a write-protection fault - Fix a printf() goof in the SEV smoke test that causes build failures with -Werror - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2 isn't supported by KVM" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM KVM: selftests: Fix printf() format goof in SEV smoke test KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3 KVM: SVM: Save host DR masks on CPUs with DebugSwap KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu() KVM: arm64: Initialize HCR_EL2.E2H early KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled KVM: x86: Snapshot the host's DEBUGCTL in common x86 KVM: SVM: Suppress DEBUGCTL.BTF on AMD KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value KVM: selftests: Assert that STI blocking isn't set after event injection KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow
2025-03-09Merge tag 'kvm-x86-fixes-6.14-rcN.2' of https://github.com/kvm-x86/linux ↵Paolo Bonzini
into HEAD KVM x86 fixes for 6.14-rcN #2 - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow. - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with the host's value, which can lead to unexpected bus lock #DBs. - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly emulate BTF. KVM's lack of context switching has meant BTF has always been broken to some extent. - Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest can enable DebugSwap without KVM's knowledge. - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO memory" phase without actually generating a write-protection fault. - Fix a printf() goof in the SEV smoke test that causes build failures with -Werror. - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2 isn't supported by KVM.
2025-03-09Merge tag 'kvmarm-fixes-6.14-4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #4 - Fix a couple of bugs affecting pKVM's PSCI relay implementation when running in the hVHE mode, resulting in the host being entered with the MMU in an unknown state, and EL2 being in the wrong mode.
2025-03-08Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "33 hotfixes. 24 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 26 are for MM and 7 are for non-MM. - "mm: memory_failure: unmap poisoned folio during migrate properly" from Ma Wupeng fixes a couple of two year old bugs involving the migration of hwpoisoned folios. - "selftests/damon: three fixes for false results" from SeongJae Park fixes three one year old bugs in the SAMON selftest code. The remainder are singletons and doubletons. Please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits) mm/page_alloc: fix uninitialized variable rapidio: add check for rio_add_net() in rio_scan_alloc_net() rapidio: fix an API misues when rio_add_net() fails MAINTAINERS: .mailmap: update Sumit Garg's email address Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone" mm: fix finish_fault() handling for large folios mm: don't skip arch_sync_kernel_mappings() in error paths mm: shmem: remove unnecessary warning in shmem_writepage() userfaultfd: fix PTE unmapping stack-allocated PTE copies userfaultfd: do not block on locking a large folio with raised refcount mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages mm: shmem: fix potential data corruption during shmem swapin mm: fix kernel BUG when userfaultfd_move encounters swapcache selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms selftests/damon/damos_quota: make real expectation of quota exceeds include/linux/log2.h: mark is_power_of_2() with __always_inline NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback mm, swap: avoid BUG_ON in relocate_cluster() mm: swap: use correct step in loop to wait all clusters in wait_for_allocation() ...
2025-03-08Merge tag 'x86-urgent-2025-03-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 fixes from Ingo Molnar: - Add more model IDs to the AMD microcode version check, more people are hitting these checks - Fix a Xen guest boot warning related to AMD northbridge setup - Fix SEV guest bugs related to a recent changes in its locking logic - Fix a missing definition of PTRS_PER_PMD that assembly builds can hit * tag 'x86-urgent-2025-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Add some forgotten models to the SHA check x86/mm: Define PTRS_PER_PMD for assembly code too virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex virt: sev-guest: Allocate request data dynamically x86/amd_nb: Use rdmsr_safe() in amd_get_mmconfig_range()
2025-03-08x86/microcode/AMD: Add some forgotten models to the SHA checkBorislav Petkov (AMD)
Add some more forgotten models to the SHA check. Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches") Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Link: https://lore.kernel.org/r/20250307220256.11816-1-bp@kernel.org
2025-03-08Merge branch 'linus' into x86/urgent, to pick up dependent patchesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-08Merge tag 'loongarch-fixes-6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix bugs in kernel build, hibernation, memory management and KVM" * tag 'loongarch-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Fix GPA size issue about VM LoongArch: KVM: Reload guest CSR registers after sleep LoongArch: KVM: Add interrupt checking for AVEC LoongArch: Set hugetlb mmap base address aligned with pmd size LoongArch: Set max_pfn with the PFN of the last page LoongArch: Use polling play_dead() when resuming from hibernation LoongArch: Eliminate superfluous get_numa_distances_cnt() LoongArch: Convert unreachable() to BUG()
2025-03-08LoongArch: KVM: Fix GPA size issue about VMBibo Mao
Physical address space is 48 bit on Loongson-3A5000 physical machine, however it is 47 bit for VM on Loongson-3A5000 system. Size of physical address space of VM is the same with the size of virtual user space (a half) of physical machine. Variable cpu_vabits represents user address space, kernel address space is not included (user space and kernel space are both a half of total). Here cpu_vabits, rather than cpu_vabits - 1, is to represent the size of guest physical address space. Also there is strict checking about page fault GPA address, inject error if it is larger than maximum GPA address of VM. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: KVM: Reload guest CSR registers after sleepBibo Mao
On host, the HW guest CSR registers are lost after suspend and resume operation. Since last_vcpu of boot CPU still records latest vCPU pointer so that the guest CSR register skips to reload when boot CPU resumes and vCPU is scheduled. Here last_vcpu is cleared so that guest CSR registers will reload from scheduled vCPU context after suspend and resume. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: KVM: Add interrupt checking for AVECBibo Mao
There is a newly added macro INT_AVEC with CSR ESTAT register, which is bit 14 used for LoongArch AVEC support. AVEC interrupt status bit 14 is supported with macro CSR_ESTAT_IS, so here replace the hard-coded value 0x1fff with macro CSR_ESTAT_IS so that the AVEC interrupt status is also supported by KVM. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: Set hugetlb mmap base address aligned with pmd sizeBibo Mao
With ltp test case "testcases/bin/hugefork02", there is a dmesg error report message such as: kernel BUG at mm/hugetlb.c:5550! Oops - BUG[#1]: CPU: 0 UID: 0 PID: 1517 Comm: hugefork02 Not tainted 6.14.0-rc2+ #241 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 pc 90000000004eaf1c ra 9000000000485538 tp 900000010edbc000 sp 900000010edbf940 a0 900000010edbfb00 a1 9000000108d20280 a2 00007fffe9474000 a3 00007ffff3474000 a4 0000000000000000 a5 0000000000000003 a6 00000000003cadd3 a7 0000000000000000 t0 0000000001ffffff t1 0000000001474000 t2 900000010ecd7900 t3 00007fffe9474000 t4 00007fffe9474000 t5 0000000000000040 t6 900000010edbfb00 t7 0000000000000001 t8 0000000000000005 u0 90000000004849d0 s9 900000010edbfa00 s0 9000000108d20280 s1 00007fffe9474000 s2 0000000002000000 s3 9000000108d20280 s4 9000000002b38b10 s5 900000010edbfb00 s6 00007ffff3474000 s7 0000000000000406 s8 900000010edbfa08 ra: 9000000000485538 unmap_vmas+0x130/0x218 ERA: 90000000004eaf1c __unmap_hugepage_range+0x6f4/0x7d0 PRMD: 00000004 (PPLV0 +PIE -PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) PRID: 0014c010 (Loongson-64bit, Loongson-3A5000) Process hugefork02 (pid: 1517, threadinfo=00000000a670eaf4, task=000000007a95fc64) Call Trace: [<90000000004eaf1c>] __unmap_hugepage_range+0x6f4/0x7d0 [<9000000000485534>] unmap_vmas+0x12c/0x218 [<9000000000494068>] exit_mmap+0xe0/0x308 [<900000000025fdc4>] mmput+0x74/0x180 [<900000000026a284>] do_exit+0x294/0x898 [<900000000026aa30>] do_group_exit+0x30/0x98 [<900000000027bed4>] get_signal+0x83c/0x868 [<90000000002457b4>] arch_do_signal_or_restart+0x54/0xfa0 [<90000000015795e8>] irqentry_exit_to_user_mode+0xb8/0x138 [<90000000002572d0>] tlb_do_page_fault_1+0x114/0x1b4 The problem is that base address allocated from hugetlbfs is not aligned with pmd size. Here add a checking for hugetlbfs and align base address with pmd size. After this patch the test case "testcases/bin/hugefork02" passes to run. This is similar to the commit 7f24cbc9c4d42db8a3c8484d1 ("mm/mmap: teach generic_get_unmapped_area{_topdown} to handle hugetlb mappings"). Cc: stable@vger.kernel.org # 6.13+ Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: Set max_pfn with the PFN of the last pageBibo Mao
The current max_pfn equals to zero. In this case, it causes user cannot get some page information through /proc filesystem such as kpagecount. The following message is displayed by stress-ng test suite with command "stress-ng --verbose --physpage 1 -t 1". # stress-ng --verbose --physpage 1 -t 1 stress-ng: error: [1691] physpage: cannot read page count for address 0x134ac000 in /proc/kpagecount, errno=22 (Invalid argument) stress-ng: error: [1691] physpage: cannot read page count for address 0x7ffff207c3a8 in /proc/kpagecount, errno=22 (Invalid argument) stress-ng: error: [1691] physpage: cannot read page count for address 0x134b0000 in /proc/kpagecount, errno=22 (Invalid argument) ... After applying this patch, the kernel can pass the test. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [1701] physpage: [1701] started (instance 0 on CPU 3) stress-ng: debug: [1701] physpage: [1701] exited (instance 0 on CPU 3) stress-ng: debug: [1700] physpage: [1701] terminated (success) Cc: stable@vger.kernel.org # 6.8+ Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: Use polling play_dead() when resuming from hibernationHuacai Chen
When CONFIG_RANDOM_KMALLOC_CACHES or other randomization infrastructrue enabled, the idle_task's stack may different between the booting kernel and target kernel. So when resuming from hibernation, an ACTION_BOOT_CPU IPI wakeup the idle instruction in arch_cpu_idle_dead() and jump to the interrupt handler. But since the stack pointer is changed, the interrupt handler cannot restore correct context. So rename the current arch_cpu_idle_dead() to idle_play_dead(), make it as the default version of play_dead(), and the new arch_cpu_idle_dead() call play_dead() directly. For hibernation, implement an arch-specific hibernate_resume_nonboot_cpu_disable() to use the polling version (idle instruction is replace by nop, and irq is disabled) of play_dead(), i.e. poll_play_dead(), to avoid IPI handler corrupting the idle_task's stack when resuming from hibernation. This solution is a little similar to commit 406f992e4a372dafbe3c ("x86 / hibernate: Use hlt_play_dead() when resuming from hibernation"). Cc: stable@vger.kernel.org Tested-by: Erpeng Xu <xuerpeng@uniontech.com> Tested-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: Eliminate superfluous get_numa_distances_cnt()Yuli Wang
In LoongArch, get_numa_distances_cnt() isn't in use, resulting in a compiler warning. Fix follow errors with clang-18 when W=1e: arch/loongarch/kernel/acpi.c:259:28: error: unused function 'get_numa_distances_cnt' [-Werror,-Wunused-function] 259 | static inline unsigned int get_numa_distances_cnt(struct acpi_table_slit *slit) | ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Link: https://lore.kernel.org/all/Z7bHPVUH4lAezk0E@kernel.org/ Signed-off-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08LoongArch: Convert unreachable() to BUG()Tiezhu Yang
When compiling on LoongArch, there exists the following objtool warning in arch/loongarch/kernel/machine_kexec.o: kexec_reboot() falls through to next function crash_shutdown_secondary() Avoid using unreachable() as it can (and will in the absence of UBSAN) generate fall-through code. Use BUG() so we get a "break BRK_BUG" trap (with unreachable annotation). Cc: stable@vger.kernel.org # 6.12+ Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-07Merge tag 's390-6.14-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix return address recovery of traced function in ftrace to ensure reliable stack unwinding - Fix compiler warnings and runtime crashes of vDSO selftests on s390 by introducing a dedicated GNU hash bucket pointer with correct 32-bit entry size - Fix test_monitor_call() inline asm, which misses CC clobber, by switching to an instruction that doesn't modify CC * tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/ftrace: Fix return address recovery of traced function selftests/vDSO: Fix GNU hash table entry size for s390x s390/traps: Fix test_monitor_call() inline assembly
2025-03-08x86/mm: Define PTRS_PER_PMD for assembly code tooIngo Molnar
Andy reported the following build warning from head_32.S: In file included from arch/x86/kernel/head_32.S:29: arch/x86/include/asm/pgtable_32.h:59:5: error: "PTRS_PER_PMD" is not defined, evaluates to 0 [-Werror=undef] 59 | #if PTRS_PER_PMD > 1 The reason is that on 2-level i386 paging the folded in PMD's PTRS_PER_PMD constant is not defined in assembly headers, only in generic MM C headers. Instead of trying to fish out the definition from the generic headers, just define it - it even has a comment for it already... Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/Z8oa8AUVyi2HWfo9@gmail.com
2025-03-07Merge tag 'slab-for-6.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - Stable fix for kmem_cache_destroy() called from a WQ_MEM_RECLAIM workqueue causing a warning due to the new kvfree_rcu_barrier() (Uladzislau Rezki) * tag 'slab-for-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq
2025-03-07Merge tag 'acpi-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Restore the previous behavior of the ACPI platform_profile sysfs interface that has been changed recently in a way incompatible with the existing user space (Mario Limonciello)" * tag 'acpi-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: platform/x86/amd: pmf: Add balanced-performance to hidden choices platform/x86/amd: pmf: Add 'quiet' to hidden choices ACPI: platform_profile: Add support for hidden choices
2025-03-07Merge tag 'execve-v6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull core dumping fix from Kees Cook: - Only sort VMAs when core_sort_vma sysctl is set * tag 'execve-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: coredump: Only sort VMAs when core_sort_vma sysctl is set
2025-03-07Merge tag 'for-6.14-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix leaked extent map after error when reading chunks - replace use of deprecated strncpy - in zoned mode, fixed range when ulocking extent range, causing a hang * tag 'for-6.14-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix a leaked chunk map issue in read_one_chunk() btrfs: replace deprecated strncpy() with strscpy() btrfs: zoned: fix extent range end unlock in cow_file_range()
2025-03-07Merge tag 'block-6.14-20250306' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - TCP use after free fix on polling (Sagi) - Controller memory buffer cleanup fixes (Icenowy) - Free leaking requests on bad user passthrough commands (Keith) - TCP error message fix (Maurizio) - TCP corruption fix on partial PDU (Maurizio) - TCP memory ordering fix for weakly ordered archs (Meir) - Type coercion fix on message error for TCP (Dan) - Name the RQF flags enum, fixing issues with anon enums and BPF import of it - ublk parameter setting fix - GPT partition 7-bit conversion fix * tag 'block-6.14-20250306' of git://git.kernel.dk/linux: block: Name the RQF flags enum nvme-tcp: fix signedness bug in nvme_tcp_init_connection() block: fix conversion of GPT partition name to 7-bit ublk: set_params: properly check if parameters can be applied nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() nvme-tcp: Fix a C2HTermReq error message nvmet: remove old function prototype nvme-ioctl: fix leaked requests on mapping error nvme-pci: skip CMB blocks incompatible with PCI P2P DMA nvme-pci: clean up CMBMSC when registering CMB fails nvme-tcp: fix possible UAF in nvme_tcp_poll
2025-03-07Merge tag 'io_uring-6.14-20250306' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: "A single fix for a regression introduced in the 6.14 merge window, causing stalls/hangs with IOPOLL reads or writes" * tag 'io_uring-6.14-20250306' of git://git.kernel.dk/linux: io_uring/rw: ensure reissue path is correctly handled for IOPOLL
2025-03-07Merge tag 'sched-urgent-2025-03-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc scheduler fixes from Ingo Molnar: - Fix deadline scheduler sysctl parameter setting bug - Fix RT scheduler sysctl parameter setting bug - Fix possible memory corruption in child_cfs_rq_on_list() * tag 'sched-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Update limit of sched_rt sysctl in documentation sched/deadline: Use online cpus for validating runtime sched/fair: Fix potential memory corruption in child_cfs_rq_on_list
2025-03-07Merge tag 'perf-urgent-2025-03-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fixes from Ingo Molnar: "Fix a race between PMU registration and event creation, and fix pmus_lock vs. pmus_srcu lock ordering" * tag 'perf-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix perf_pmu_register() vs. perf_init_event() perf/core: Fix pmus_lock vs. pmus_srcu ordering
2025-03-07Merge tag 'x86-urgent-2025-03-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix CPUID leaf 0x2 parsing bugs - Sanitize very early boot parameters to avoid crash - Fix size overflows in the SGX code - Make CALL_NOSPEC use consistent * tag 'x86-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Sanitize boot params before parsing command line x86/sgx: Fix size overflows in sgx_encl_create() x86/cpu: Properly parse CPUID leaf 0x2 TLB descriptor 0x63 x86/cpu: Validate CPUID leaf 0x2 EDX output x86/cacheinfo: Validate CPUID leaf 0x2 EDX output x86/speculation: Add a conditional CS prefix to CALL_NOSPEC x86/speculation: Simplify and make CALL_NOSPEC consistent