summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-16Merge branches 'rcu/misc-for-6.16', 'rcu/seq-counters-for-6.16' and ↵Joel Fernandes
'rcu/torture-for-6.16' into rcu/for-next
2025-05-16rcutorture: Fix issue with re-using old images on ARM64Joel Fernandes
On ARM64, when running with --configs '36*SRCU-P', I noticed that only 1 instance instead of 36 for starting. Fix it by checking for Image files, instead of bzImage which ARM does not seem to have. With this I see all 36 instances running at the same time in the batch. Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01Paul E. McKenney
Back in the day, rcutorture was about the only thing that tested off-stack CPU masks, but now any arm64 system with more than 256 CPUs tests it full time. In fact, it is necessary to hack the kernel to prevent such a system from testing off-stack CPU masks. This means that there is no longer much point in rcutorture going out of its way to test this. And given the differences in how CPUMASK_OFFSTACK is enabled in x86 and arm64, rcutorture would need to go out of its way. This commit therefore removes CONFIG_CPUMASK_OFFSTACK=y (and the CONFIG_MAXSMP=y required to enable it on x86) from TREE01. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcutorture: Reduce TREE01 CPU overcommitPaul E. McKenney
The TREE01.boot nr_cpus kernel boot parameter has been set to 43 for more than seven years, but it can cause RCU CPU stall warnings on arm64, most of the time involving the stop-machine subsystem. This should not be too surprising, given that this causes 43 vCPUs to spin with interrupts disabled when there are only eight physical CPUs. The point of this CPU overcommit is to test the ability of expedited RCU grace period initialization to handle races with incoming CPUs that have never previously been online. But limiting to 17 CPUs instead of 43 allows time for this code to be exercised, and eliminates (or at least greatly reduces) the incidence of RCU CPU stall warnings on arm64. So this commit therefore sets nr_cpus=17 in TREE01.boot. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16torture: Check for "Call trace:" as well as "Call Trace:"Paul E. McKenney
Different architectures capitalize their splats differently. Who knew? This commit therefore checks for both arm64 "Call trace:" and x86 "Call Trace:". Reported-by: Joel Fernandes <joelagnelf@nvidia.com> Closes: https://lore.kernel.org/all/553c33d8-2b51-4772-8aef-97b0163bc78e@nvidia.com/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcutorture: Perform more frequent testing of ->gpwrapJoel Fernandes
Currently, the ->gpwrap is not tested (at all per my testing) due to the requirement of a large delta between a CPU's rdp->gp_seq and its node's rnp->gpseq. This results in no testing of ->gpwrap being set. This patch by default adds 5 minutes of testing with ->gpwrap forced by lowering the delta between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is configurable, including the active time for the setting and a full testing cycle. By default, the first 25 minutes of a test will have the _default_ behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes, we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe this is reasonable since we at least add a little bit of testing for usecases where ->gpwrap is set. [ Apply fix for Dan Carpenter's bug report on init path cleanup. ] [ Apply kernel doc warning fix from Akira Yokosawa. ] Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16torture: Add testing of RCU's Rust bindings to torture.shPaul E. McKenney
This commit adds a --do-rcu-rust parameter to torture.sh, which invokes a rust_doctests_kernel kunit run. Note that kunit wants a clean source tree, so this runs "make mrproper", which might come as a surprise to some users. Should there be a --mrproper parameter to torture.sh to make the user explicitly ask for it? Co-developed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16torture: Add --do-{,no-}normal to torture.shPaul E. McKenney
Right now, torture.sh runs normal runs unconditionally, which can be slow and thus annoying when you only want to test --kcsan or --kasan runs. This commit therefore adds a --do-normal argument so that "--kcsan --do-no-kasan --do-no-normal" runs only KCSAN runs. Note that specifying "--do-no-kasan --do-no-kcsan --do-no-normal" gets normal runs, so you should not try to use this as a synonym for --do-none. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite()Paul E. McKenney
Uses of srcu_read_lock_lite() and srcu_read_unlock_lite() are better served by the new srcu_read_lock_fast() and srcu_read_unlock_fast() APIs. As in srcu_read_lock_lite() and srcu_read_unlock_lite() would never have happened had I thought a bit harder a few months ago. Therefore, mark them deprecated. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcutorture: Comment invocations of tick_dep_set_task()Paul E. McKenney
The rcu_torture_reader() and rcu_torture_fwd_prog_cr() functions run CPU-bound for extended periods of time (tens or even hundreds of milliseconds), so they invoke tick_dep_set_task() and tick_dep_clear_task() to ensure that the scheduling-clock tick helps move grace periods forward. So why doesn't rcu_torture_fwd_prog_nr() also invoke tick_dep_set_task() and tick_dep_clear_task()? Because the point of this function is to test RCU's ability to (eventually) force grace periods forward even when the tick has been disabled during long CPU-bound kernel execution. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcu/nocb: Add Safe checks for access offloaded rdpZqiang
For built with CONFIG_PROVE_RCU=y and CONFIG_PREEMPT_RT=y kernels, Disable BH does not change the SOFTIRQ corresponding bits in preempt_count(), but change current->softirq_disable_cnt, this resulted in the following splat: WARNING: suspicious RCU usage kernel/rcu/tree_plugin.h:36 Unsafe read of RCU_NOCB offloaded state! stack backtrace: CPU: 0 UID: 0 PID: 22 Comm: rcuc/0 Call Trace: [ 0.407907] <TASK> [ 0.407910] dump_stack_lvl+0xbb/0xd0 [ 0.407917] dump_stack+0x14/0x20 [ 0.407920] lockdep_rcu_suspicious+0x133/0x210 [ 0.407932] rcu_rdp_is_offloaded+0x1c3/0x270 [ 0.407939] rcu_core+0x471/0x900 [ 0.407942] ? lockdep_hardirqs_on+0xd5/0x160 [ 0.407954] rcu_cpu_kthread+0x25f/0x870 [ 0.407959] ? __pfx_rcu_cpu_kthread+0x10/0x10 [ 0.407966] smpboot_thread_fn+0x34c/0xa50 [ 0.407970] ? trace_preempt_on+0x54/0x120 [ 0.407977] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 0.407982] kthread+0x40e/0x840 [ 0.407990] ? __pfx_kthread+0x10/0x10 [ 0.407994] ? rt_spin_unlock+0x4e/0xb0 [ 0.407997] ? rt_spin_unlock+0x4e/0xb0 [ 0.408000] ? __pfx_kthread+0x10/0x10 [ 0.408006] ? __pfx_kthread+0x10/0x10 [ 0.408011] ret_from_fork+0x40/0x70 [ 0.408013] ? __pfx_kthread+0x10/0x10 [ 0.408018] ret_from_fork_asm+0x1a/0x30 [ 0.408042] </TASK> Currently, triggering an rdp offloaded state change need the corresponding rdp's CPU goes offline, and at this time the rcuc kthreads has already in parking state. this means the corresponding rcuc kthreads can safely read offloaded state of rdp while it's corresponding cpu is online. This commit therefore add softirq_count() check for Preempt-RT kernels. Suggested-by: Joel Fernandes <joelagnelf@nvidia.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcuscale: using kcalloc() to relpace kmalloc()Su Hui
It's safer to using kcalloc() because it can prevent overflow problem. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16doc/RCU/listRCU: refine example code for eliminating stale dataWei Yang
This patch adjust the example code with following two purpose: * reduce the confusion on not releasing e->lock * emphasize e is valid and not stale with e->lock held Signed-off-by: Wei Yang <richard.weiyang@gmail.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Alan Huang <mmpgouride@gmail.com> Reviewed-by: Alan Huang <mmpgouride@gmail.com> Link: https://lore.kernel.org/r/20250218005047.27258-1-richard.weiyang@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16doc: Update LWN RCU API links in whatisRCU.rstPaul E. McKenney
This commit adds the 2024 LWN RCU API article set. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"Frederic Weisbecker
This reverts commit f7345ccc62a4b880cf76458db5f320725f28e400. swake_up_one_online() has been removed because hrtimers can now assign a proper online target to hrtimers queued from offline CPUs. Therefore remove the related hackery. Link: https://lore.kernel.org/all/20241231170712.149394-4-frederic@kernel.org/ Reviewed-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rust: sync: rcu: Mark Guard methods as inlineI Hsin Cheng
Currently the implementation of "Guard" methods are basically wrappers around rcu's function within kernel. Building the kernel with llvm 18.1.8 on x86_64 machine will generate the following symbols: $ nm vmlinux | grep ' _R'.*Guard | rustfilt ffffffff817b6c90 T <kernel::sync::rcu::Guard>::new ffffffff817b6cb0 T <kernel::sync::rcu::Guard>::unlock ffffffff817b6cd0 T <kernel::sync::rcu::Guard as core::ops::drop::Drop>::drop ffffffff817b6c90 T <kernel::sync::rcu::Guard as core::default::Default>::default These Rust symbols are basically wrappers around functions "rcu_read_lock" and "rcu_read_unlock". Marking them as inline can reduce the generation of these symbols, and saves the size of code generation for 132 bytes. $ ./scripts/bloat-o-meter vmlinux_old vmlinux_new (Output is demangled for readability) add/remove: 0/10 grow/shrink: 0/1 up/down: 0/-132 (-132) Function old new delta rust_driver_pci::SampleDriver::probe 1041 1034 -7 kernel::sync::rcu::Guard::default 9 - -9 kernel::sync::rcu::Guard::drop 9 - -9 kernel::sync::rcu::read_lock 9 - -9 kernel::sync::rcu::Guard::unlock 9 - -9 kernel::sync::rcu::Guard::new 9 - -9 __pfx__kernel::sync::rcu::Guard::default 16 - -16 __pfx__kernel::sync::rcu::Guard::drop 16 - -16 __pfx__kernel::sync::rcu::read_lock 16 - -16 __pfx__kernel::sync::rcu::Guard::unlock 16 - -16 __pfx__kernel::sync::rcu::Guard::new 16 - -16 Total: Before=23365955, After=23365823, chg -0.00% Link: https://github.com/Rust-for-Linux/linux/issues/1145 Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Reviewed-by: Charalampos Mitrodimas <charmitro@posteo.net> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcu/cpu_stall_cputime: fix the hardirq count for x86 architectureYongliang Gao
When counting the number of hardirqs in the x86 architecture, it is essential to add arch_irq_stat_cpu to ensure accuracy. For example, a CPU loop within the rcu_read_lock function. Before: [ 70.910184] rcu: INFO: rcu_preempt self-detected stall on CPU [ 70.910436] rcu: 3-....: (4999 ticks this GP) idle=*** [ 70.910711] rcu: hardirqs softirqs csw/system [ 70.910870] rcu: number: 0 657 0 [ 70.911024] rcu: cputime: 0 0 2498 ==> 2498(ms) [ 70.911278] rcu: (t=5001 jiffies g=3677 q=29 ncpus=8) After: [ 68.046132] rcu: INFO: rcu_preempt self-detected stall on CPU [ 68.046354] rcu: 2-....: (4999 ticks this GP) idle=*** [ 68.046628] rcu: hardirqs softirqs csw/system [ 68.046793] rcu: number: 2498 663 0 [ 68.046951] rcu: cputime: 0 0 2496 ==> 2496(ms) [ 68.047244] rcu: (t=5000 jiffies g=3825 q=4 ncpus=8) Fixes: be42f00b73a0 ("rcu: Add RCU stall diagnosis information") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501090842.SfI6QPGS-lkp@intel.com/ Signed-off-by: Yongliang Gao <leonylgao@tencent.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250216084109.3109837-1-leonylgao@gmail.com Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16rcu: Remove swake_up_one_online() bandaidFrederic Weisbecker
It's now ok to perform a wake-up from an offline CPU because the resulting armed scheduler bandwidth hrtimers are now correctly targeted by hrtimer infrastructure. Remove the obsolete hackerry. Link: https://lore.kernel.org/all/20241231170712.149394-3-frederic@kernel.org/ Reviewed-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-16MAINTAINERS: Update Zqiang's email addressZqiang
This patch updates Zqiang's email address to qiang.zhang@linux.dev. Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Acked-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-11rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RTPaul E. McKenney
The torture.sh --do-rt command-line parameter is intended to mimic -rt kernels. Now that CONFIG_PREEMPT_RT is upstream, this commit makes this mimicking more precise. Note that testing of RCU priority boosting is disabled in favor of forward-progress testing of RCU callbacks. If it turns out to be possible to make kernels built with CONFIG_PREEMPT_RT=y to tolerate testing of both, both will be enabled. [ paulmck: Apply Sebastian Siewior feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08srcu: Use rcu_seq_done_exact() for polling APIJoel Fernandes
poll_state_synchronize_srcu() uses rcu_seq_done() unlike poll_state_synchronize_rcu() which uses rcu_seq_done_exact(). The rcu_seq_done_exact() makes more sense for polling API, as with this API, there is a higher chance that there is a significant delay between the get_state..() and poll_state..() calls since a cookie can be stored and reused at a later time. During such a delay, if the gp_seq counter progresses more than ULONG_MAX/2 distance, then poll_state..() may return false for a long time unwantedly. Fix by using the more accurate rcu_seq_done_exact() API which is exactly what straight RCU's polling does. It may make sense, as future work, to add debug code here as well, where we compare a physical timestamp between get_state..() and poll_state() calls and yell if significant time has past but the grace period has still not progressed. Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08rcu: Comment on the extraneous delta test on rcu_seq_done_exact()Frederic Weisbecker
The numbers used in rcu_seq_done_exact() lack some explanation behind their magic. Especially after the commit: 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") which reported a subtle issue where a new GP sequence snapshot was taken on the root node state while a grace period had already been started and reflected on the global state sequence but not yet on the root node sequence, making a polling user waiting on a wrong already started grace period that would ignore freshly online CPUs. The fix involved taking the snaphot on the global state sequence and waiting on the root node sequence. And since a grace period is first started on the global state and only afterward reflected on the root node, a snapshot taken on the global state sequence might be two full grace periods ahead of the root node as in the following example: rnp->gp_seq = rcu_state.gp_seq = 0 CPU 0 CPU 1 ----- ----- // rcu_state.gp_seq = 1 rcu_seq_start(&rcu_state.gp_seq) // snap = 8 snap = rcu_seq_snap(&rcu_state.gp_seq) // Two full GP differences rcu_seq_done_exact(&rnp->gp_seq, snap) // rnp->gp_seq = 1 WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); Add a comment about those expectations and to clarify the magic within the relevant function. Note that the issue arises mainly with the use of rcu_seq_done_exact() which has a much tigher guardband (of 2 GPs) to ensure the false-negative window of the API during wraparound is limited to just 2 GPs. rcu_seq_done() does not have such strict requirements, however its large false-negative window of ULONG_MAX/2 is not ideal for the polling API. However, this also means care is needed to ensure the guardband is as large as needed to avoid the example scenario describe above which a warning added in an earlier patch does. [ Comment wordsmithing by Joel ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08rcu: Add warning to ensure rcu_seq_done_exact() is workingJoel Fernandes
The previous patch improved the rcu_seq_done_exact() function by adding a meaningful constant for the guardband. Ensure that this is working for the future by a quick check during rcu_gp_init(). Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08rcu: Replace magic number with meaningful constant in rcu_seq_done_exact()Joel Fernandes
The rcu_seq_done_exact() function checks if a grace period has completed by comparing sequence numbers. It includes a guard band to handle sequence number wraparound, which was previously expressed using the magic number calculation '3 * RCU_SEQ_STATE_MASK + 1'. This magic number is not immediately obvious in terms of what it represents. Instead, the reason we need this tiny guardband is because of the lag between the setting of rcu_state.gp_seq_polled and root rnp's gp_seq in rcu_gp_init(). This guardband needs to be at least 2 GPs worth of counts, to avoid recognizing the newly started GP as completed immediately, due to the following sequence which arises due to the delay between update of rcu_state.gp_seq_polled and root rnp's gp_seq: rnp->gp_seq = rcu_state.gp_seq = 0 CPU 0 CPU 1 ----- ----- // rcu_state.gp_seq = 1 rcu_seq_start(&rcu_state.gp_seq) // snap = 8 snap = rcu_seq_snap(&rcu_state.gp_seq) // Two full GP differences rcu_seq_done_exact(&rnp->gp_seq, snap) // rnp->gp_seq = 1 WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); This can happen due to get_state_synchronize_rcu_full() sampling rcu_state.gp_seq_polled, however the poll_state_synchronize_rcu_full() sampling the root rnp's gp_seq. The delay between the update of the 2 counters occurs in rcu_gp_init() during which the counters briefly go out of sync. Make the guardband explictly 2 GPs. This improves code readability and maintainability by making the intent clearer as well. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08rcutorture: Split out beginning and end from rcu_torture_one_read()Paul E. McKenney
The rcu_torture_one_read() function is designed for RCU readers that are confined to a task, such that a single thread of control extends from the beginning of a given RCU read-side critical section to its end. This does not suffice for things like srcu_down_read() and srcu_up_read(), where the critical section might start at task level and end in a timer handler. This commit therefore creates separate init_rcu_torture_one_read_state(), rcu_torture_one_read_start(), and rcu_torture_one_read_end() functions, along with a rcu_torture_one_read_state structure to coordinate their actions. These will be used to create tests for srcu_down_read() and friends. One caution: The caller to rcu_torture_one_read_start() must enter the initial read-side critical section prior to the call. This enables use of non-standard primitives such as srcu_down_read() while still using the same validation code. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08rcutorture: Make srcu_lockdep.sh check reader-conflict handlingPaul E. McKenney
Mixing different flavors of RCU readers is forbidden, for example, you should not use srcu_read_lock() and srcu_read_lock_nmisafe() on the same srcu_struct structure. There are checks for this, but these checks are not tested on a regular basis. This commit therefore adds such tests to srcu_lockdep.sh. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08rcutorture: Make srcu_lockdep.sh check kernel KconfigPaul E. McKenney
The srcu_lockdep.sh currently blindly trusts the rcutorture SRCU-P scenario to build its kernel with lockdep enabled. Of course, this dependency might not be obvious to someone rebalancing SRCU scenarios. This commit therefore adds code to srcu_lockdep.sh that verifies that the .config file has lockdep enabled. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-08MAINTAINERS: Update Joel's email addressJoel Fernandes
Update MAINTAINERS file to reflect changes to Joel's email address for upstream work. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-04-06Linux 6.15-rc1v6.15-rc1Linus Torvalds
2025-04-06tools/include: make uapi/linux/types.h usable from assemblyThomas Weißschuh
The "real" linux/types.h UAPI header gracefully degrades to a NOOP when included from assembly code. Mirror this behaviour in the tools/ variant. Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the toolchain automatically. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/ Fixes: c9fbaa879508 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06Merge tag 'turbostat-2025.05.06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - support up to 8192 processors - add cpuidle governor debug telemetry, disabled by default - update default output to exclude cpuidle invocation counts - bug fixes * tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: v2025.05.06 tools/power turbostat: disable "cpuidle" invocation counters, by default tools/power turbostat: re-factor sysfs code tools/power turbostat: Restore GFX sysfs fflush() call tools/power turbostat: Document GNR UncMHz domain convention tools/power turbostat: report CoreThr per measurement interval tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192 tools/power turbostat: Add idle governor statistics reporting tools/power turbostat: Fix names matching tools/power turbostat: Allow Zero return value for some RAPL registers tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options
2025-04-06Merge tag 'soundwire-6.15-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fix from Vinod Koul: - add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT * tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE
2025-04-06tools/power turbostat: v2025.05.06Len Brown
Support up to 8192 processors Add cpuidle governor debug telemetry, disabled by default Update default output to exclude cpuidle invocation counts Bug fixes Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: disable "cpuidle" invocation counters, by defaultLen Brown
Create "pct_idle" counter group, the sofware notion of residency so it can now be singled out, independent of other counter groups. Create "cpuidle" group, the cpuidle invocation counts. Disable "cpuidle", by default. Create "swidle" = "cpuidle" + "pct_idle". Undocument "sysfs", the old name for "swidle", but keep it working for backwards compatibilty. Create "hwidle", all the HW idle counters Modify "idle", enabled by default "idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle") Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06Merge tag 'perf-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix a perf events time accounting bug" * tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix child_total_time_enabled accounting bug at task exit
2025-04-06Merge tag 'sched-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a nonsensical Kconfig combination - Remove an unnecessary rseq-notification * tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Eliminate useless task_work on execve sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-06Disable SLUB_TINY for build testingLinus Torvalds
... and don't error out so hard on missing module descriptions. Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") we used to warn about missing module descriptions, but only when building with extra warnigns (ie 'W=1'). After that commit the warning became an unconditional hard error. And it turns out not all modules have been converted despite the claims to the contrary. As reported by Damian Tometzki, the slub KUnit test didn't have a module description, and apparently nobody ever really noticed. The reason nobody noticed seems to be that the slub KUnit tests get disabled by SLUB_TINY, which also ends up disabling a lot of other code, both in tests and in slub itself. And so anybody doing full build tests didn't actually see this failre. So let's disable SLUB_TINY for build-only tests, since it clearly ends up limiting build coverage. Also turn the missing module descriptions error back into a warning, but let's keep it around for non-'W=1' builds. Reported-by: Damian Tometzki <damian@riscv-rocks.de> Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/ Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06tools/power turbostat: re-factor sysfs codeLen Brown
Probe cpuidle "sysfs" residency and counts separately, since soon we will make one disabled on, and the other disabled off. Clarify that some BIC (build-in-counters) are actually "groups". since we're about to re-name some of those groups. no functional change. Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: Restore GFX sysfs fflush() callZhang Rui
Do fflush() to discard the buffered data, before each read of the graphics sysfs knobs. Fixes: ba99a4fc8c24 ("tools/power turbostat: Remove unnecessary fflush() call") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: Document GNR UncMHz domain conventionLen Brown
Document that on Intel Granite Rapids Systems, Uncore domains 0-2 are CPU domains, and uncore domains 3-4 are IO domains. Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06tools/power turbostat: report CoreThr per measurement intervalLen Brown
The CoreThr column displays total thermal throttling events since boot time. Change it to report events during the measurement interval. This is more useful for showing a user the current conditions. Total events since boot time are still available to the user via /sys/devices/system/cpu/cpu*/thermal_throttle/* Document CoreThr on turbostat.8 Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print") Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Chen Yu <yu.c.chen@intel.com>
2025-04-06tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192Justin Ernst
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output: "turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151" A similar error appears with the use of turbostat --cpu when the inputted cpu range contains a cpu number >= 1024: # turbostat -c 1100-1151 "--cpu 1100-1151" malformed ... Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS. It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low. For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS brings support for parsing cpu numbers >= 1024. Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64. Signed-off-by: Justin Ernst <justin.ernst@hpe.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06Merge tag 'timers-cleanups-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A set of final cleanups for the timer subsystem: - Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). - The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are catched. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality" * tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/timers: Rename the hrtimer_init event to hrtimer_setup hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() hrtimers: Rename debug_init() to debug_setup() hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() hrtimers: Make callback function pointer private hrtimers: Merge __hrtimer_init() into __hrtimer_setup() hrtimers: Switch to use __htimer_setup() hrtimers: Delete hrtimer_init() treewide: Convert new and leftover hrtimer_init() users treewide: Switch/rename to timer_delete[_sync]()
2025-04-06Merge tag 'irq-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more irq updates from Thomas Gleixner: "A set of updates for the interrupt subsystem: - A treewide cleanup for the irq_domain code, which makes the naming consistent and gets rid of the original oddity of naming domains 'host'. This is a trivial mechanical change and is done late to ensure that all instances have been catched and new code merged post rc1 wont reintroduce new instances. - A trivial consistency fix in the migration code The recent introduction of irq_force_complete_move() in the core code, causes a problem for the nostalgia crowd who maintains ia64 out of tree. The code assumes that hierarchical interrupt domains are enabled and dereferences irq_data::parent_data unconditionally. That works in mainline because both architectures which enable that code have hierarchical domains enabled. Though it breaks the ia64 build, which enables the functionality, but does not have hierarchical domains. While it's not really a problem for mainline today, this unconditional dereference is inconsistent and trivially fixable by using the existing helper function irqd_get_parent_data(), which has the appropriate #ifdeffery in place" * tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move() irqdomain: Stop using 'host' for domain irqdomain: Rename irq_get_default_host() to irq_get_default_domain() irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
2025-04-06Merge tag 'timers-urgent-2025-04-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A revert to fix a adjtimex() regression: The recent change to prevent that time goes backwards for the coarse time getters due to immediate multiplier adjustments via adjtimex(), changed the way how the timekeeping core treats that. That change result in a regression on the adjtimex() side, which is user space visible: 1) The forwarding of the base time moves the update out of the original period and establishes a new one. That's changing the behaviour of the [PF]LL control, which user space expects to be applied periodically. 2) The clearing of the accumulated NTP error due to #1, changes the behaviour as well. An attempt to delay the multiplier/frequency update to the next tick did not solve the problem as userspace expects that the multiplier or frequency updates are in effect, when the syscall returns. There is a different solution for the coarse time problem available, so revert the offending commit to restore the existing adjtimex() behaviour" * tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
2025-04-06Merge tag 'sh-for-v6.15-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: "One important fix and one small configuration update. The first patch by Artur Rojek fixes an issue with the J2 firmware loader not being able to find the location of the device tree blob due to insufficient alignment of the .bss section which rendered J2 boards unbootable. The second patch by Johan Korsnes updates the defconfigs on sh to drop the CONFIG_NET_CLS_TCINDEX configuration option which became obsolete after 8c710f75256b ("net/sched: Retire tcindex classifier"). Summary: - sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX - sh: Align .bss section padding to 8-byte boundary" * tag 'sh-for-v6.15-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX sh: Align .bss section padding to 8-byte boundary
2025-04-05Merge tag 'kbuild-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Improve performance in gendwarfksyms - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS - Support CONFIG_HEADERS_INSTALL for ARCH=um - Use more relative paths to sources files for better reproducibility - Support the loong64 Debian architecture - Add Kbuild bash completion - Introduce intermediate vmlinux.unstripped for architectures that need static relocations to be stripped from the final vmlinux - Fix versioning in Debian packages for -rc releases - Treat missing MODULE_DESCRIPTION() as an error - Convert Nios2 Makefiles to use the generic rule for built-in DTB - Add debuginfo support to the RPM package * tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits) kbuild: rpm-pkg: build a debuginfo RPM kconfig: merge_config: use an empty file as initfile nios2: migrate to the generic rule for built-in DTB rust: kbuild: skip `--remap-path-prefix` for `rustdoc` kbuild: pacman-pkg: hardcode module installation path kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally modpost: require a MODULE_DESCRIPTION() kbuild: make all file references relative to source root x86: drop unnecessary prefix map configuration kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS kbuild: Add a help message for "headers" kbuild: deb-pkg: remove "version" variable in mkdebian kbuild: deb-pkg: fix versioning for -rc releases Documentation/kbuild: Fix indentation in modules.rst example x86: Get rid of Makefile.postlink kbuild: Create intermediate vmlinux build with relocations preserved kbuild: Introduce Kconfig symbol for linking vmlinux with relocations kbuild: link-vmlinux.sh: Make output file name configurable kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y Revert "kheaders: Ignore silly-rename files" ...
2025-04-05Merge tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly fixes, mostly from the end of last week, this week was very quiet, maybe you scared everyone away. It's mostly amdgpu, and xe, with some i915, adp and bridge bits, since I think this is overly quiet I'd expect rc2 to be a bit more lively. bridge: - tda998x: Select CONFIG_DRM_KMS_HELPER amdgpu: - Guard against potential division by 0 in fan code - Zero RPM support for SMU 14.0.2 - Properly handle SI and CIK support being disabled - PSR fixes - DML2 fixes - DP Link training fix - Vblank fixes - RAS fixes - Partitioning fix - SDMA fix - SMU 13.0.x fixes - Rom fetching fix - MES fixes - Queue reset fix xe: - Fix NULL pointer dereference on error path - Add missing HW workaround for BMG - Fix survivability mode not triggering - Fix build warning when DRM_FBDEV_EMULATION is not set i915: - Bounds check for scalers in DSC prefill latency computation - Fix build by adding a missing include adp: - Fix error handling in plane setup" # -----BEGIN PGP SIGNATURE----- * tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/i2c: tda998x: select CONFIG_DRM_KMS_HELPER drm/amdgpu/gfx12: fix num_mec drm/amdgpu/gfx11: fix num_mec drm/amd/pm: Add gpu_metrics_v1_8 drm/amdgpu: Prefer shadow rom when available drm/amd/pm: Update smu metrics table for smu_v13_0_6 drm/amd/pm: Remove host limit metrics support Remove unnecessary firmware version check for gc v9_4_2 drm/amdgpu: stop unmapping MQD for kernel queues v3 Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA" drm/amdgpu: Parse all deferred errors with UMC aca handle drm/amdgpu: Update ta ras block drm/amdgpu: Add NPS2 to DPX compatible mode drm/amdgpu: Use correct gfx deferred error count drm/amd/display: Actually do immediate vblank disable drm/amd/display: prevent hang on link training fail Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting" drm/amd/display: Increase vblank offdelay for PSR panels drm/amd: Handle being compiled without SI or CIK support better drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2 ...
2025-04-06kbuild: rpm-pkg: build a debuginfo RPMUday Shankar
The rpm-pkg make target currently suffers from a few issues related to debuginfo: 1. debuginfo for things built into the kernel (vmlinux) is not available in any RPM produced by make rpm-pkg. This makes using tools like systemtap against a make rpm-pkg kernel impossible. 2. debug source for the kernel is not available. This means that commands like 'disas /s' in gdb, which display source intermixed with assembly, can only print file names/line numbers which then must be painstakingly resolved to actual source in a separate editor. 3. debuginfo for modules is available, but it remains bundled with the .ko files that contain module code, in the main kernel RPM. This is a waste of space for users who do not need to debug the kernel (i.e. most users). Address all of these issues by additionally building a debuginfo RPM when the kernel configuration allows for it, in line with standard patterns followed by RPM distributors. With these changes: 1. systemtap now works (when these changes are backported to 6.11, since systemtap lags a bit behind in compatibility), as verified by the following simple test script: # stap -e 'probe kernel.function("do_sys_open").call { printf("%s\n", $$parms); }' dfd=0xffffffffffffff9c filename=0x7fe18800b160 flags=0x88800 mode=0x0 ... 2. disas /s works correctly in gdb, with source and disassembly interspersed: # gdb vmlinux --batch -ex 'disas /s blk_op_str' Dump of assembler code for function blk_op_str: block/blk-core.c: 125 { 0xffffffff814c8740 <+0>: endbr64 127 128 if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op]) 0xffffffff814c8744 <+4>: mov $0xffffffff824a7378,%rax 0xffffffff814c874b <+11>: cmp $0x23,%edi 0xffffffff814c874e <+14>: ja 0xffffffff814c8768 <blk_op_str+40> 0xffffffff814c8750 <+16>: mov %edi,%edi 126 const char *op_str = "UNKNOWN"; 0xffffffff814c8752 <+18>: mov $0xffffffff824a7378,%rdx 127 128 if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op]) 0xffffffff814c8759 <+25>: mov -0x7dfa0160(,%rdi,8),%rax 126 const char *op_str = "UNKNOWN"; 0xffffffff814c8761 <+33>: test %rax,%rax 0xffffffff814c8764 <+36>: cmove %rdx,%rax 129 op_str = blk_op_name[op]; 130 131 return op_str; 132 } 0xffffffff814c8768 <+40>: jmp 0xffffffff81d01360 <__x86_return_thunk> End of assembler dump. 3. The size of the main kernel package goes down substantially, especially if many modules are built (quite typical). Here is a comparison of installed size of the kernel package (configured with allmodconfig, dwarf4 debuginfo, and module compression turned off) before and after this patch: # rpm -qi kernel-6.13* | grep -E '^(Version|Size)' Version : 6.13.0postpatch+ Size : 1382874089 Version : 6.13.0prepatch+ Size : 17870795887 This is a ~92% size reduction. Note that a debuginfo package can only be produced if the following configs are set: - CONFIG_DEBUG_INFO=y - CONFIG_MODULE_COMPRESS=n - CONFIG_DEBUG_INFO_SPLIT=n The first of these is obvious - we can't produce debuginfo if the build does not generate it. The second two requirements can in principle be removed, but doing so is difficult with the current approach, which uses a generic rpmbuild script find-debuginfo.sh that processes all packaged executables. If we want to remove those requirements the best path forward is likely to add some debuginfo extraction/installation logic to the modules_install target (controllable by flags). That way, it's easier to operate on modules before they're compressed, and the logic can be reused by all packaging targets. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-04-06kconfig: merge_config: use an empty file as initfileDaniel Gomez
The scripts/kconfig/merge_config.sh script requires an existing $INITFILE (or the $1 argument) as a base file for merging Kconfig fragments. However, an empty $INITFILE can serve as an initial starting point, later referenced by the KCONFIG_ALLCONFIG Makefile variable if -m is not used. This variable can point to any configuration file containing preset config symbols (the merged output) as stated in Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will contain just the merge output requiring the user to run make (i.e. KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make olddefconfig). Instead of failing when `$INITFILE` is missing, create an empty file and use it as the starting point for merges. Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>