summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2025-03-13posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_tEric Dumazet
The global hash_lock protecting the posix timer hash table can be heavily contended especially when there is an extensive linear search for a timer ID. Timer IDs are handed out by monotonically increasing next_posix_timer_id and then validating that there is no timer with the same ID in the hash table. Both operations happen with the global hash lock held. To reduce the hash lock contention the hash will be reworked to a scaled hash with per bucket locks, which requires to handle the ID counter lockless. Prepare for this by making next_posix_timer_id an atomic_t, which can be used lockless with atomic_inc_return(). [ tglx: Adopted from Eric's series, massaged change log and simplified it ] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250219125522.2535263-2-edumazet@google.com Link: https://lore.kernel.org/all/20250308155624.151545978@linutronix.de
2025-03-13posix-timers: Make lock_timer() use guard()Peter Zijlstra
The lookup and locking of posix timers requires the same repeating pattern at all usage sites: tmr = lock_timer(tiner_id); if (!tmr) return -EINVAL; .... unlock_timer(tmr); Solve this with a guard implementation, which works in most places out of the box except for those, which need to unlock the timer inside the guard scope. Though the only places where this matters are timer_delete() and timer_settime(). In both cases the timer pointer needs to be preserved across the end of the scope, which is solved by storing the pointer in a variable outside of the scope. timer_settime() also has to protect the timer with RCU before unlocking, which obviously can't use guard(rcu) before leaving the guard scope as that guard is cleaned up before the unlock. Solve this by providing the RCU protection open coded. [ tglx: Made it work and added change log ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250224162103.GD11590@noisy.programming.kicks-ass.net Link: https://lore.kernel.org/all/20250308155624.087465658@linutronix.de
2025-03-13posix-timers: Rework timer removalThomas Gleixner
sys_timer_delete() and the do_exit() cleanup function itimer_delete() are doing the same thing, but have needlessly different implementations instead of sharing the code. The other oddity of timer deletion is the fact that the timer is not invalidated before the actual deletion happens, which allows concurrent lookups to succeed. That's wrong because a timer which is in the process of being deleted should not be visible and any actions like signal queueing, delivery and rearming should not happen once the task, which invoked timer_delete(), has the timer locked. Rework the code so that: 1) The signal queueing and delivery code ignore timers which are marked invalid 2) The deletion implementation between sys_timer_delete() and itimer_delete() is shared 3) The timer is invalidated and removed from the linked lists before the deletion callback of the relevant clock is invoked. That requires to rework timer_wait_running() as it does a lookup of the timer when relocking it at the end. In case of deletion this lookup would fail due to the preceding invalidation and the wait loop would terminate prematurely. But due to the preceding invalidation the timer cannot be accessed by other tasks anymore, so there is no way that the timer has been freed after the timer lock has been dropped. Move the re-validation out of timer_wait_running() and handle it at the only other usage site, timer_settime(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/87zfht1exf.ffs@tglx
2025-03-13posix-timers: Simplify lock/unlock_timer()Thomas Gleixner
Since the integration of sigqueue into the timer struct, lock_timer() is only used in task context. So taking the lock with irqsave() is not longer required. Convert it to use spin_[un]lock_irq(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.959825668@linutronix.de
2025-03-13posix-timers: Use guards in a few placesThomas Gleixner
Switch locking and RCU to guards where applicable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.892762130@linutronix.de
2025-03-13posix-timers: Remove SLAB_PANIC from kmem cacheThomas Gleixner
There is no need to panic when the posix-timer kmem_cache can't be created. timer_create() will fail with -ENOMEM and that's it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.829215801@linutronix.de
2025-03-13posix-timers: Remove a few paranoid warningsThomas Gleixner
Warnings about a non-initialized timer or non-existing callbacks are just useful for implementing new posix clocks, but there a NULL pointer dereference is expected anyway. :) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.765462334@linutronix.de
2025-03-13posix-timers: Cleanup includesThomas Gleixner
Remove pointless includes and sort the remaining ones alphabetically. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.701301552@linutronix.de
2025-03-13posix-timers: Add cond_resched() to posix_timer_add() search loopEric Dumazet
With a large number of POSIX timers the search for a valid ID might cause a soft lockup on PREEMPT_NONE/VOLUNTARY kernels. Add cond_resched() to the loop to prevent that. [ tglx: Split out from Eric's series ] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250214135911.2037402-2-edumazet@google.com Link: https://lore.kernel.org/all/20250308155623.635612865@linutronix.de
2025-03-13posix-timers: Initialise timer before adding it to the hash tableEric Dumazet
A timer is only valid in the hashtable when both timer::it_signal and timer::it_id are set to their final values, but timers are added without those values being set. The timer ID is allocated when the timer is added to the hash in invalid state. The ID is taken from a monotonically increasing per process counter which wraps around after reaching INT_MAX. The hash insertion validates that there is no timer with the allocated ID in the hash table which belongs to the same process. That opens a mostly theoretical race condition: If other threads of the same process manage to create/delete timers in rapid succession before the newly created timer is fully initialized and wrap around to the timer ID which was handed out, then a duplicate timer ID will be inserted into the hash table. Prevent this by: 1) Setting timer::it_id before inserting the timer into the hashtable. 2) Storing the signal pointer in timer::it_signal with bit 0 set before inserting it into the hashtable. Bit 0 acts as a invalid bit, which means that the regular lookup for sys_timer_*() will fail the comparison with the signal pointer. But the lookup on insertion masks out bit 0 and can therefore detect a timer which is not yet valid, but allocated in the hash table. Bit 0 in the pointer is cleared once the initialization of the timer completed. [ tglx: Fold ID and signal iniitializaion into one patch and massage change log and comments. ] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250219125522.2535263-3-edumazet@google.com Link: https://lore.kernel.org/all/20250308155623.572035178@linutronix.de
2025-03-13posix-timers: Ensure that timer initialization is fully visibleThomas Gleixner
Frederic pointed out that the memory operations to initialize the timer are not guaranteed to be visible, when __lock_timer() observes timer::it_signal valid under timer::it_lock: T0 T1 --------- ----------- do_timer_create() // A new_timer->.... = .... spin_lock(current->sighand) // B WRITE_ONCE(new_timer->it_signal, current->signal) spin_unlock(current->sighand) sys_timer_*() t = __lock_timer() spin_lock(&timr->it_lock) // observes B if (timr->it_signal == current->signal) return timr; if (!t) return; // Is not guaranteed to observe A Protect the write of timer::it_signal, which makes the timer valid, with timer::it_lock as well. This guarantees that T1 must observe the initialization A completely, when it observes the valid signal pointer under timer::it_lock. sighand::siglock must still be taken to protect the signal::posix_timers list. Reported-by: Frederic Weisbecker <frederic@kernel.org> Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20250308155623.507944489@linutronix.de
2025-03-13clocksource: Remove unnecessary strscpy() size argumentThorsten Blum
The size argument of strscpy() is only required when the destination pointer is not a fixed sized array or when the copy needs to be smaller than the size of the fixed sized destination array. For fixed sized destination arrays and full copies, strscpy() automatically determines the length of the destination buffer if the size argument is omitted. This makes the explicit sizeof() unnecessary. Remove it. [ tglx: Massaged change log ] Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250311110624.495718-2-thorsten.blum@linux.dev
2025-03-13timer_list: Don't use %pK through printk()Thomas Weißschuh
This reverts commit f590308536db ("timer debug: Hide kernel addresses via %pK in /proc/timer_list") The timer list helper SEQ_printf() uses either the real seq_printf() for procfs output or vprintk() to print to the kernel log, when invoked from SysRq-q. It uses %pK for printing pointers. In the past %pK was prefered over %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping looks in atomic contexts. Switch to the regular pointer formatting which is safer, easier to reason about and sufficient here. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Link: https://lore.kernel.org/all/20250311-restricted-pointers-timer-v1-1-6626b91e54ab@linutronix.de
2025-03-12Merge tag 'sched_ext-for-6.14-rc6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fix from Tejun Heo: "BPF schedulers could trigger a crash by passing in an invalid CPU to the scx_bpf_select_cpu_dfl() helper. Fix it by verifying input validity" * tag 'sched_ext-for-6.14-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
2025-03-12PM: s2idle: Extend comment in s2idle_enter()Ulf Hansson
The s2idle_lock must be held while checking for a pending wakeup and while moving into S2IDLE_STATE_ENTER, to make sure a wakeup doesn't get lost. Let's extend the comment in the code to make this clear. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20250311160827.1129643-3-ulf.hansson@linaro.org [ rjw: Rewrote the new comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12PM: s2idle: Drop redundant locks when entering s2idleUlf Hansson
The calls to cpus_read_lock|unlock() protects us from getting CPUS hotplugged, while entering suspend-to-idle. However, when s2idle_enter() is called we should be far beyond the point when CPUs may be hotplugged. Let's therefore simplify the code and drop the use of the lock. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20250311160827.1129643-2-ulf.hansson@linaro.org [ rjw: Rewrote the new comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-12dma-mapping: fix missing clear bdr in check_ram_in_range_map()Baochen Qiang
As discussed in [1], if 'bdr' is set once, it would never get cleared, hence 0 is always returned. Refactor the range check hunk into a new helper dma_find_range(), which allows 'bdr' to be cleared in each iteration. Link: https://lore.kernel.org/all/64931fac-085b-4ff3-9314-84bac2fa9bdb@quicinc.com/ # [1] Fixes: a409d9600959 ("dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM") Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Link: https://lore.kernel.org/r/20250307030350.69144-1-quic_bqiang@quicinc.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-03-11blk-cgroup: Simplify policy files registrationMichal Koutný
Use one set of files when there is no difference between default and legacy files, similar to regular subsys files registration. No functional change. Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11cgroup: Add deprecation message to legacy freezer controllerMichal Koutný
As explained in the commit 76f969e8948d8 ("cgroup: cgroup v2 freezer"), the original freezer is imperfect, some users may unwittingly rely on it when there exists the alternative of v2. Print a message when it happens and explain that in the docs. Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11RFC cgroup/cpuset-v1: Add deprecation messages to sched_relax_domain_levelMichal Koutný
This is not a properly hierarchical resource, it might be better implemented based on a sched_attr. Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11cgroup/cpuset-v1: Add deprecation messages to memory_migrateMichal Koutný
Memory migration (between cgroups) was given up in v2 due to performance reasons of its implementation. Migration between NUMA nodes within one memcg may still make sense to modify affinity at runtime though. Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11cgroup/cpuset-v1: Add deprecation messages to mem_exclusive and mem_hardwallMichal Koutný
The concept of exclusive memory affinity may require complex approaches like with cpuset v2 cpu partitions. There is so far no implementation in cpuset v2. Specific kernel memory affinity may cause unintended (global) bottlenecks like kmem limits. Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11cgroup: Print message when /proc/cgroups is read on v2-only systemMichal Koutný
As a followup to commits 6c2920926b10e ("cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation") and ab03125268679 ("cgroup: Show # of subsystem CSSes in cgroup.stat"), add a runtime message to users who read status of controllers in /proc/cgroups on v2-only system. The detection is based on a) no controllers are attached to v1, b) default hierarchy is mounted (the latter is for setups that never mount v2 but read /proc/cgroups upon boot when controllers default to v2, so that this code may be backported to older kernels). Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11cgroup/cpuset-v1: Add deprecation messages to memory_spread_page and ↵Michal Koutný
memory_spread_slab There is MPOL_INTERLEAVE for user explicit allocations. Deprecate spreading of allocations that users carry out unwittingly. Use straight warning level for slab spreading since such a knob is unnecessarily intertwined with slab allocator. Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11cgroup/cpuset-v1: Add deprecation messages to sched_load_balance and ↵Michal Koutný
memory_pressure_enabled These two v1 feature have analogues in cgroup v2. Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-11stop-machine: Add comment for rcu_momentary_eqs()Paul E. McKenney
Add a comment to explain the purpose of the rcu_momentary_eqs() call from multi_cpu_stop(), which is to suppress false-positive RCU CPU stall warnings. Reported-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/87wmeuanti.ffs@tglx/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org>
2025-03-11printk: Check CON_SUSPEND when unblanking a consoleMarcos Paulo de Souza
The commit 9e70a5e109a4 ("printk: Add per-console suspended state") introduced the CON_SUSPENDED flag for consoles. The suspended consoles will stop receiving messages, so don't unblank suspended consoles because it won't be showing anything either way. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20250226-printk-renaming-v1-5-0b878577f2e6@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-11printk: Rename console_start to console_resumeMarcos Paulo de Souza
The intent of console_start was to resume a previously suspended console, so rename it accordingly. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20250226-printk-renaming-v1-4-0b878577f2e6@suse.com [pmladek@suse.com: Fixed typo in the commit message. Updated also new drm_log.c.] Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-11printk: Rename console_stop to console_suspendMarcos Paulo de Souza
The intent of console_stop was in fact to suspend it, so rename the function accordingly. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20250226-printk-renaming-v1-3-0b878577f2e6@suse.com [pmladek@suse.com: Fixed typo in the commit message. Updated also new drm_log.c] Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-11printk: Rename resume_console to console_resume_allMarcos Paulo de Souza
The function resume_console has a misleading name, since it resumes all consoles, so rename it accordingly. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20250226-printk-renaming-v1-2-0b878577f2e6@suse.com [pmladek@suse.com: Fixed typo in the commit message.] Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-11printk: Rename suspend_console to console_suspend_allMarcos Paulo de Souza
The function suspend_console has a misleading name, since it suspends all consoles, so rename it accordingly. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20250226-printk-renaming-v1-1-0b878577f2e6@suse.com [pmladek@suse.com: Fixed typo in the commit message.] Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-10perf/core: Remove optional 'size' arguments from strscpy() callsThorsten Blum
The 'size' parameter is optional and strscpy() automatically determines the length of the destination buffer using sizeof() if the argument is omitted. This makes the explicit sizeof() calls unnecessary. Furthermore, KSYM_NAME_LEN is equal to sizeof(name) and can also be removed. Remove them to shorten and simplify the code. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250310192336.442994-1-thorsten.blum@linux.dev
2025-03-10Merge 6.14-rc6 into driver-core-nextGreg Kroah-Hartman
We need the driver core fix in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-10sched/clock: Don't define sched_clock_irqtime as static keyYafang Shao
The sched_clock_irqtime was defined as a static key in: 8722903cbb8f ("sched: Define sched_clock_irqtime as static key") However, this change introduces a 'sleeping in atomic context' warning: arch/x86/kernel/tsc.c:1214 mark_tsc_unstable() warn: sleeping in atomic context As analyzed by Dan, the affected code path is as follows: vcpu_load() <- disables preempt -> kvm_arch_vcpu_load() -> mark_tsc_unstable() <- sleeps virt/kvm/kvm_main.c 166 void vcpu_load(struct kvm_vcpu *vcpu) 167 { 168 int cpu = get_cpu(); ^^^^^^^^^^ This get_cpu() disables preemption. 169 170 __this_cpu_write(kvm_running_vcpu, vcpu); 171 preempt_notifier_register(&vcpu->preempt_notifier); 172 kvm_arch_vcpu_load(vcpu, cpu); 173 put_cpu(); 174 } arch/x86/kvm/x86.c 4979 if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable()) { 4980 s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 : 4981 rdtsc() - vcpu->arch.last_host_tsc; 4982 if (tsc_delta < 0) 4983 mark_tsc_unstable("KVM discovered backwards TSC"); arch/x86/kernel/tsc.c 1206 void mark_tsc_unstable(char *reason) 1207 { 1208 if (tsc_unstable) 1209 return; 1210 1211 tsc_unstable = 1; 1212 if (using_native_sched_clock()) 1213 clear_sched_clock_stable(); --> 1214 disable_sched_clock_irqtime(); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ kernel/jump_label.c 245 void static_key_disable(struct static_key *key) 246 { 247 cpus_read_lock(); ^^^^^^^^^^^^^^^^ This lock has a might_sleep() in it which triggers the static checker warning. 248 static_key_disable_cpuslocked(key); 249 cpus_read_unlock(); 250 } Let revert this change for now as {disable,enable}_sched_clock_irqtime are used in many places, as pointed out by Sean, including the following: The code path in clocksource_watchdog(): clocksource_watchdog() | -> spin_lock(&watchdog_lock); | -> __clocksource_unstable() | -> clocksource.mark_unstable() == tsc_cs_mark_unstable() | -> disable_sched_clock_irqtime() And the code path in sched_clock_register(): /* Cannot register a sched_clock with interrupts on */ local_irq_save(flags); ... /* Enable IRQ time accounting if we have a fast enough sched_clock() */ if (irqtime > 0 || (irqtime == -1 && rate >= 1000000)) enable_sched_clock_irqtime(); local_irq_restore(flags); [ lkp@intel.com: reported a build error in the prev version ] [ mingo: cherry-picked it over into sched/urgent ] Closes: https://lore.kernel.org/kvm/37a79ba3-9ce0-479c-a5b0-2bd75d573ed3@stanley.mountain/ Fixes: 8722903cbb8f ("sched: Define sched_clock_irqtime as static key") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Debugged-by: Dan Carpenter <dan.carpenter@linaro.org> Debugged-by: Sean Christopherson <seanjc@google.com> Debugged-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20250205032438.14668-1-laoar.shao@gmail.com
2025-03-10module: Remove unnecessary size argument when calling strscpy()Thorsten Blum
The size parameter is optional and strscpy() automatically determines the length of the destination buffer using sizeof() if the argument is omitted. This makes the explicit sizeof() unnecessary. Remove it to shorten and simplify the code. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250308194631.191670-2-thorsten.blum@linux.dev Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Replace deprecated strncpy() with strscpy()Thorsten Blum
strncpy() is deprecated for NUL-terminated destination buffers; use strscpy() instead. The destination buffer ownername is only used with "%s" format strings and must therefore be NUL-terminated, but not NUL- padded. No functional changes intended. Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250307113546.112237-2-thorsten.blum@linux.dev Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10params: Annotate struct module_param_attrs with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member attrs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Increment num before adding a new param_attribute to the attrs array and adjust the array index accordingly. Increment num immediately after the first reallocation such that the reallocation for the NULL terminator only needs to add 1 (instead of 2) to mk->mp->num. Use struct_size() instead of manually calculating the size for the reallocation. Use krealloc_array() for the additional NULL terminator. Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20250213221352.2625-3-thorsten.blum@linux.dev Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10static_call: Use RCU in all users of __module_text_address().Sebastian Andrzej Siewior
__module_text_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_text_address() with RCU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-28-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10kprobes: Use RCU in all users of __module_text_address().Sebastian Andrzej Siewior
__module_text_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_text_address() with RCU. Cc: David S. Miller <davem@davemloft.net> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: linux-trace-kernel@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250129084925.9ppBjGLC@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10bpf: Use RCU in all users of __module_text_address().Sebastian Andrzej Siewior
__module_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_address() with RCU. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matt Bobrowski <mattbobrowski@google.com> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: bpf@vger.kernel.org Cc: linux-trace-kernel@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250129084751.tH6iidUO@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10jump_label: Use RCU in all users of __module_text_address().Sebastian Andrzej Siewior
__module_text_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_text_address() with RCU. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-25-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10jump_label: Use RCU in all users of __module_address().Sebastian Andrzej Siewior
__module_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_address() with RCU. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-24-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10cfi: Use RCU while invoking __module_address().Sebastian Andrzej Siewior
__module_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. The _notrace() variant was introduced in commit 14c4c8e41511a ("cfi: Use rcu_read_{un}lock_sched_notrace"). The recursive case where __cfi_slowpath_diag() could end up calling itself is no longer present, as all that logic is gone since commit 89245600941e ("cfi: Switch to -fsanitize=kcfi"). Sami Tolvanen said that KCFI checks don't perform function calls. Elliot Berman verified it with | modprobe -a dummy_stm stm_ftrace stm_p_basic | mkdir -p /sys/kernel/config/stp-policy/dummy_stm.0.my-policy/default | echo function > /sys/kernel/tracing/current_tracer | echo 1 > /sys/kernel/tracing/tracing_on | echo dummy_stm.0 > /sys/class/stm_source/ftrace/stm_source_link Replace the rcu_read_lock_sched_notrace() section around __module_address() with RCU. Cc: Elliot Berman <quic_eberman@quicinc.com> Cc: Kees Cook <kees@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: llvm@lists.linux.dev Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Elliot Berman <elliot.berman@oss.qualcomm.com> # sm8650-qrd [1] Link: https://lore.kernel.org/all/20241230185812429-0800.eberman@hu-eberman-lv.qualcomm.com [1] Link: https://lore.kernel.org/r/20250108090457.512198-22-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Use RCU in all users of __module_text_address().Sebastian Andrzej Siewior
__module_text_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_text_address() with RCU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-16-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Use RCU in all users of __module_address().Sebastian Andrzej Siewior
__module_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_address() with RCU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-15-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Use RCU in search_module_extables().Sebastian Andrzej Siewior
search_module_extables() returns an exception_table_entry belonging to a module. The lookup via __module_address() can be performed with RCU protection. The returned exception_table_entry remains valid because the passed address usually belongs to a module that is currently executed. So the module can not be removed because "something else" holds a reference to it, ensuring that it can not be removed. Exceptions here are: - kprobe, acquires a reference on the module beforehand - MCE, invokes the function from within a timer and the RCU lifetime guarantees (of the timer) are sufficient. Therefore it is safe to return the exception_table_entry outside the RCU section which provided the module. Use RCU for the lookup in search_module_extables() and update the comment. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-14-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Allow __module_address() to be called from RCU section.Sebastian Andrzej Siewior
mod_find() uses either the modules list to find a module or a tree lookup (CONFIG_MODULES_TREE_LOOKUP). The list and the tree can both be iterated under RCU assumption (as well as RCU-sched). Remove module_assert_mutex_or_preempt() from __module_address() and entirely since __module_address() is the last user. Update comments. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-13-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Use RCU in __is_module_percpu_address().Sebastian Andrzej Siewior
The modules list can be accessed under RCU assumption. Use RCU protection instead preempt_disable(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-12-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Use RCU in find_symbol().Sebastian Andrzej Siewior
module_assert_mutex_or_preempt() is not needed in find_symbol(). The function checks for RCU-sched or the module_mutex to be acquired. The list_for_each_entry_rcu() below does the same check. Remove module_assert_mutex_or_preempt() from try_add_tainted_module(). Use RCU protection to invoke find_symbol() and update callers. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-11-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10module: Remove module_assert_mutex_or_preempt() from try_add_tainted_module().Sebastian Andrzej Siewior
module_assert_mutex_or_preempt() is not needed in try_add_tainted_module(). The function checks for RCU-sched or the module_mutex to be acquired. The list_for_each_entry_rcu() below does the same check. Remove module_assert_mutex_or_preempt() from try_add_tainted_module(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250108090457.512198-10-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>