summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2023-09-24rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle()Zqiang
Currently, the maxcpu is set by traversing online CPUs, however, if the rcutorture.onoff_holdoff is set zero and onoff_interval is set non-zero, and the some CPUs with larger cpuid has been offline before setting maxcpu, for these CPUs, even if they are online again, also cannot be offload or deoffload. This can result in rcutorture attempting to (de-)offload CPUs that have never been online, but the (de-)offload code handles this. This commit therefore use for_each_possible_cpu() instead of for_each_online_cpu() in rcu_nocb_toggle(). Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20Joel Fernandes (Google)
In the past, spinning on schedule_timeout* with a wait of 1 jiffy has hung the kernel. See for example d52d3a2bf408 ("torture: Fix hang during kthread shutdown phase"). This issue recently recurred in torture's stutter code. The result is that the function instantly returns and never goes to sleep, preempting whatever might otherwise make useful forward progress. To prevent future issues, apply the commit-d52d3a2bf408 fix throughout rcutorture, moving from a 1-jiffy wait to a 50-millisecond wait. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writersPaul E. McKenney
This commit renames the readers_bind and writers_bind module parameters to bind_readers and bind_writers, respectively. This provides added clarity via the imperative mode and better organizes the documentation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add call_rcu_chains module parameterPaul E. McKenney
When running locktorture on large systems, there will normally be enough RCU activity to ensure that there is a grace period in flight at all times. However, on smaller systems, RCU might well be idle the majority of the time. This situation can be inconvenient in cases where the RCU CPU stall warning is part of the debugging process. This commit therefore adds an call_rcu_chains module parameter to locktorture, allowing the user to specify the desired number of self-propagating call_rcu() chains. For good measure, immediately before invoking call_rcu(), the self-propagating RCU callback invokes start_poll_synchronize_rcu() to force the immediate start of a grace period, with the call_rcu() forcing another to start shortly thereafter. Booting with locktorture.call_rcu_chains=2 increases the probability of a stuck locking primitive resulting in an RCU CPU stall warning from about 25% to nearly 100%. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add new module parameters to lock_torture_print_module_parms()Paul E. McKenney
This commit adds new module parameters to lock_torture_print_module_parms, and alphabetizes things while in the area. This change makes locktorture test results more useful and self-contained. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24torture: Print out torture module parametersPaul E. McKenney
The kernel/torture.c module now has several module parameters, so this commit causes them to be printed out. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add acq_writer_lim to complain about long acquistion timesPaul E. McKenney
This commit adds a locktorture.acq_writer_lim module parameter that specifies the maximum number of jiffies that is expected to be consumed by write-side lock acquisition. If this limit is exceeded, a WARN_ONCE() causes a splat. Note that this limit applies to the main lock acquisition only, not to any nested acquisitions. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Consolidate "if" statements in lock_torture_writer()Paul E. McKenney
There is a pair of adjacent "if" statements with identical conditions in the lock_torture_writer() function. This commit therefore combines them. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Alphabetize torture_param() entriesPaul E. McKenney
There are getting to be too many module parameters for a random list to be comfortable, so this commit alphabetizes the list. Strictly code motion. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24rcutorture: Fix stuttering races and other issuesJoel Fernandes (Google)
The stuttering code isn't functioning as expected. Ideally, it should pause the torture threads for a designated period before resuming. Yet, it fails to halt the test for the correct duration. Additionally, a race condition exists, potentially causing the stuttering code to pause for an extended period if the 'spt' variable is non-zero due to the stutter orchestration thread's inadequate CPU time. Moreover, over-stuttering can hinder RCU's progress on TREE07 kernels. This happens as the stuttering code may run within a softirq due to RCU callbacks. Consequently, ksoftirqd keeps a CPU busy for several seconds, thus obstructing RCU's progress. This situation triggers a warning message in the logs: [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9 This warning suggests that an RCU torture object, although invisible to RCU readers, couldn't make it past the pipe array and be freed -- a strong indication that there weren't enough grace periods during the stutter interval. To address these issues, this patch sets the "stutter end" time to an absolute point in the future set by the main stutter thread. This is then used for waiting in stutter_wait(). While the stutter thread still defines this absolute time, the waiters' waiting logic doesn't rely on the stutter thread receiving sufficient CPU time to halt the stuttering as the halting is now self-controlled. Cc: stable@vger.kernel.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24locktorture: Add readers_bind and writers_bind module parametersPaul E. McKenney
This commit adds readers_bind and writers_bind module parameters to locktorture in order to skew tests across socket boundaries. This skewing is intended to provide additional variable-latency stress on the primitive under test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24torture: Move rcutorture_sched_setaffinity() out of rcutorturePaul E. McKenney
The rcutorture_sched_setaffinity() function is needed by locktorture, so move its declaration from rcu.h to torture.h and rename it to the more generic torture_sched_setaffinity() name. Please note that use of this function is still restricted to torture tests, and of those, currently only rcutorture and locktorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24rcu: Include torture_sched_setaffinity() declarationArnd Bergmann
The prototype for torture_sched_setaffinity() will be moved to a different header, which will need to be included from update.c to avoid this W=1 warning: kernel/rcu/update.c:529:6: error: no previous prototype for 'torture_sched_setaffinity' [-Werror=missing-prototypes] 529 | long torture_sched_setaffinity(pid_t pid, const struct cpumask *in_mask) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24torture: Make torture_hrtimeout_ns() take an hrtimer mode parameterPaul E. McKenney
The current torture-test sleeps are waiting for a duration, but there are situations where it is better to wait for an absolute time, for example, when ending a stutter interval. This commit therefore adds an hrtimer mode parameter to torture_hrtimeout_ns(). Why not also the other torture_hrtimeout_*() functions? The theory is that most absolute times will be in nanoseconds, especially not (say) jiffies. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24torture: Share torture_random_state with torture_shuffle_tasks()Paul E. McKenney
Both torture_shuffle_tasks() and its caller torture_shuffle() define a torture_random_state structure. This is suboptimal given that torture_shuffle_tasks() runs for a very short period of time. This commit therefore causes torture_shuffle() to pass a pointer to its torture_random_state structure down to torture_shuffle_tasks(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-24sched/core: Refactor the task_flags check for worker sleeping in ↵Wang Jinchao
sched_submit_work() Simplify the conditional logic for checking worker flags by splitting the original compound `if` statement into separate `if` and `else if` clauses. This modification not only retains the previous functionality, but also reduces a single `if` check, improving code clarity and potentially enhancing performance. Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ZOIMvURE99ZRAYEj@fedora
2023-09-24sched/fair: Fix warning in bandwidth distributionJosh Don
We've observed the following warning being hit in distribute_cfs_runtime(): SCHED_WARN_ON(cfs_rq->runtime_remaining > 0) We have the following race: - CPU 0: running bandwidth distribution (distribute_cfs_runtime). Inspects the local cfs_rq and makes its runtime_remaining positive. However, we defer unthrottling the local cfs_rq until after considering all remote cfs_rq's. - CPU 1: starts running bandwidth distribution from the slack timer. When it finds the cfs_rq for CPU 0 on the throttled list, it observers the that the cfs_rq is throttled, yet is not on the CSD list, and has a positive runtime_remaining, thus triggering the warning in distribute_cfs_runtime. To fix this, we can rework the local unthrottling logic to put the local cfs_rq on a local list, so that any future bandwidth distributions will realize that the cfs_rq is about to be unthrottled. Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922230535.296350-2-joshdon@google.com
2023-09-24sched/fair: Make cfs_rq->throttled_csd_list available on !SMPJosh Don
This makes the following patch cleaner by avoiding extra CONFIG_SMP conditionals on the availability of rq->throttled_csd_list. Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922230535.296350-1-joshdon@google.com
2023-09-23Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three are cc:stable" * tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: proc: nommu: fix empty /proc/<pid>/maps filemap: add filemap_map_order0_folio() to handle order0 folio proc: nommu: /proc/<pid>/maps: release mmap read lock mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement pidfd: prevent a kernel-doc warning argv_split: fix kernel-doc warnings scatterlist: add missing function params to kernel-doc selftests/proc: fixup proc-empty-vm test after KSM changes revert "scripts/gdb/symbols: add specific ko module load command" selftests: link libasan statically for tests with -fsanitize=address task_work: add kerneldoc annotation for 'data' argument mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
2023-09-22ring-buffer: Fix bytes info in per_cpu buffer statsZheng Yejian
The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of bytes in cpu buffer that have not been consumed. However, currently after consuming data by reading file 'trace_pipe', the 'bytes' info was not changed as expected. # cat per_cpu/cpu0/stats entries: 0 overrun: 0 commit overrun: 0 bytes: 568 <--- 'bytes' is problematical !!! oldest event ts: 8651.371479 now ts: 8653.912224 dropped events: 0 read events: 8 The root cause is incorrect stat on cpu_buffer->read_bytes. To fix it: 1. When stat 'read_bytes', account consumed event in rb_advance_reader(); 2. When stat 'entries_bytes', exclude the discarded padding event which is smaller than minimum size because it is invisible to reader. Then use rb_page_commit() instead of BUF_PAGE_SIZE at where accounting for page-based read/remove/overrun. Also correct the comments of ring_buffer_bytes_cpu() in this patch. Link: https://lore.kernel.org/linux-trace-kernel/20230921125425.1708423-1-zhengyejian1@huawei.com Cc: stable@vger.kernel.org Fixes: c64e148a3be3 ("trace: Add ring buffer stats to measure rate of events") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-22Merge tag 'sched-urgent-2023-09-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a PF_IDLE initialization bug that generated warnings on tiny-RCU" * tag 'sched-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/sched: Modify initial boot task idle setup
2023-09-22hardening: Provide Kconfig fragments for basic optionsKees Cook
Inspired by Salvatore Mesoraca's earlier[1] efforts to provide some in-tree guidance for kernel hardening Kconfig options, add a new fragment named "hardening-basic.config" (along with some arch-specific fragments) that enable a basic set of kernel hardening options that have the least (or no) performance impact and remove a reasonable set of legacy APIs. Using this fragment is as simple as running "make hardening.config". More extreme fragments can be added[2] in the future to cover all the recognized hardening options, and more per-architecture files can be added too. For now, document the fragments directly via comments. Perhaps .rst documentation can be generated from them in the future (rather than the other way around). [1] https://lore.kernel.org/kernel-hardening/1536516257-30871-1-git-send-email-s.mesoraca16@gmail.com/ [2] https://github.com/KSPP/linux/issues/14 Cc: Salvatore Mesoraca <s.mesoraca16@gmail.com> Cc: x86@kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-09-22sched/debug: Avoid checking in_atomic_preempt_off() twice in schedule_debug()Liming Wu
in_atomic_preempt_off() already gets called in schedule_debug() once, which is the only caller of __schedule_bug(). Skip the second call within __schedule_bug(), it should always be true at this point. [ mingo: Clarified the changelog. ] Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230825023501.1848-1-liming.wu@jaguarmicro.com
2023-09-22locking/ww_mutex/test: Make sure we bail out instead of livelockJohn Stultz
I've seen what appears to be livelocks in the stress_inorder_work() function, and looking at the code it is clear we can have a case where we continually retry acquiring the locks and never check to see if we have passed the specified timeout. This patch reworks that function so we always check the timeout before iterating through the loop again. I believe others may have hit this previously here: https://lore.kernel.org/lkml/895ef450-4fb3-5d29-a6ad-790657106a5a@intel.com/ Reported-by: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922043616.19282-4-jstultz@google.com
2023-09-22locking/ww_mutex/test: Fix potential workqueue corruptionJohn Stultz
In some cases running with the test-ww_mutex code, I was seeing odd behavior where sometimes it seemed flush_workqueue was returning before all the work threads were finished. Often this would cause strange crashes as the mutexes would be freed while they were being used. Looking at the code, there is a lifetime problem as the controlling thread that spawns the work allocates the "struct stress" structures that are passed to the workqueue threads. Then when the workqueue threads are finished, they free the stress struct that was passed to them. Unfortunately the workqueue work_struct node is in the stress struct. Which means the work_struct is freed before the work thread returns and while flush_workqueue is waiting. It seems like a better idea to have the controlling thread both allocate and free the stress structures, so that we can be sure we don't corrupt the workqueue by freeing the structure prematurely. So this patch reworks the test to do so, and with this change I no longer see the early flush_workqueue returns. Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
2023-09-22locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootupJohn Stultz
Booting w/ qemu without kvm, and with 64 cpus, I noticed we'd sometimes hung task watchdog splats in get_random_u32_below() when using the test-ww_mutex stress test. While entropy exhaustion is no longer an issue, the RNG may be slower early in boot. The test-ww_mutex code will spawn off 128 threads (2x cpus) and each thread will call get_random_u32_below() a number of times to generate a random order of the 16 locks. This intense use takes time and without kvm, qemu can be slow enough that we trip the hung task watchdogs. For this test, we don't need true randomness, just mixed up orders for testing ww_mutex lock acquisitions, so it changes the logic to use the prng instead, which takes less time and avoids the watchdgos. Feedback would be appreciated! Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230922043616.19282-2-jstultz@google.com
2023-09-21cred: add get_cred_many and put_cred_manyMateusz Guzik
Some of the frequent consumers of get_cred and put_cred operate on 2 references on the same creds back-to-back. Switch them to doing the work in one go instead. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> [PM: removed changelog from commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-09-21bpf: Disable zero-extension for BPF_MEMSXIlya Leoshkevich
On the architectures that use bpf_jit_needs_zext(), e.g., s390x, the verifier incorrectly inserts a zero-extension after BPF_MEMSX, leading to miscompilations like the one below: 24: 89 1a ff fe 00 00 00 00 "r1 = *(s16 *)(r10 - 2);" # zext_dst set 0x3ff7fdb910e: lgh %r2,-2(%r13,%r0) # load halfword 0x3ff7fdb9114: llgfr %r2,%r2 # wrong! 25: 65 10 00 03 00 00 7f ff if r1 s> 32767 goto +3 <l0_1> # check_cond_jmp_op() Disable such zero-extensions. The JITs need to insert sign-extension themselves, if necessary. Suggested-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20230919101336.2223655-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21Merge tag 'net-6.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue" * tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast() igc: Expose tx-usecs coalesce setting to user octeontx2-pf: Do xdp_do_flush() after redirects. bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI net: ena: Flush XDP packets on error. net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev' netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once vxlan: Add missing entries to vxlan_get_size() net: rds: Fix possible NULL-pointer dereference team: fix null-ptr-deref when team device type is changed net: bridge: use DEV_STATS_INC() net: hns3: add 5ms delay before clear firmware reset irq source net: hns3: fix fail to delete tc flower rules during reset issue net: hns3: only enable unicast promisc when mac table full net: hns3: fix GRE checksum offload issue net: hns3: add cmdq check for vf periodic service task net: stmmac: fix incorrect rxq|txq_stats reference ...
2023-09-21PM: hibernate: Use __get_safe_page() rather than touching the listBrian Geffon
We found at least one situation where the safe pages list was empty and get_buffer() would gladly try to use a NULL pointer. Signed-off-by: Brian Geffon <bgeffon@google.com> Fixes: 8357376d3df2 ("[PATCH] swsusp: Improve handling of highmem") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-21exit: add internal include file with helpersJens Axboe
Move struct wait_opts and waitid_info into kernel/exit.h, and include function declarations for the recently added helpers. Make them non-static as well. This is in preparation for adding a waitid operation through io_uring. With the abtracted helpers, this is now possible. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-21exit: add kernel_waitid_prepare() helperJens Axboe
Move the setup logic out of kernel_waitid(), and into a separate helper. No functional changes intended in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-21exit: move core of do_wait() into helperJens Axboe
Rather than have a maze of gotos, put the actual logic in __do_wait() and have do_wait() loop deal with waitqueue setup/teardown and whether to call __do_wait() again. No functional changes intended in this patch. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-21exit: abstract out should_wake helper for child_wait_callback()Jens Axboe
Abstract out the helper that decides if we should wake up following a wake_up() callback on our internal waitqueue. No functional changes intended in this patch. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-21futex: Add sys_futex_requeue()peterz@infradead.org
Finish off the 'simple' futex2 syscall group by adding sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are too numerous to fit into a regular syscall. As such, use struct futex_waitv to pass the 'source' and 'destination' futexes to the syscall. This syscall implements what was previously known as FUTEX_CMP_REQUEUE and uses {val, uaddr, flags} for source and {uaddr, flags} for destination. This design explicitly allows requeueing between different types of futex by having a different flags word per uaddr. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105248.511860556@noisy.programming.kicks-ass.net
2023-09-21futex: Add flags2 argument to futex_requeue()peterz@infradead.org
In order to support mixed size requeue, add a second flags argument to the internal futex_requeue() function. No functional change intended. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230921105248.396780136@noisy.programming.kicks-ass.net
2023-09-21futex: Propagate flags into get_futex_key()peterz@infradead.org
Instead of only passing FLAGS_SHARED as a boolean, pass down flags as a whole. No functional change intended. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230921105248.282857501@noisy.programming.kicks-ass.net
2023-09-21futex: Add sys_futex_wait()peterz@infradead.org
To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This syscall implements what was previously known as FUTEX_WAIT_BITSET except it uses 'unsigned long' for the value and bitmask arguments, takes timespec and clockid_t arguments for the absolute timeout and uses FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105248.164324363@noisy.programming.kicks-ass.net
2023-09-21futex: FLAGS_STRICTpeterz@infradead.org
The current semantics for futex_wake() are a bit loose, specifically asking for 0 futexes to be woken actually gets you 1. Adding a !nr check to sys_futex_wake() makes that it would return 0 for unaligned futex words, because that check comes in the shared futex_wake() function. Adding the !nr check there, would affect the legacy sys_futex() semantics. Hence frob a flag :-( Suggested-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230921105248.048643656@noisy.programming.kicks-ass.net
2023-09-21futex: Add sys_futex_wake()peterz@infradead.org
To complement sys_futex_waitv() add sys_futex_wake(). This syscall implements what was previously known as FUTEX_WAKE_BITSET except it uses 'unsigned long' for the bitmask and takes FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105247.936205525@noisy.programming.kicks-ass.net
2023-09-21futex: Validate futex value against futex sizepeterz@infradead.org
Ensure the futex value fits in the given futex size. Since this adds a constraint to an existing syscall, it might possibly change behaviour. Currently the value would be truncated to a u32 and any high bits would get silently lost. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230921105247.828934099@noisy.programming.kicks-ass.net
2023-09-21futex: Flag conversionpeterz@infradead.org
Futex has 3 sets of flags: - legacy futex op bits - futex2 flags - internal flags Add a few helpers to convert from the API flags into the internal flags. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20230921105247.722140574@noisy.programming.kicks-ass.net
2023-09-21futex: Extend the FUTEX2 flagspeterz@infradead.org
Add the definition for the missing but always intended extra sizes, and add a NUMA flag for the planned numa extention. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20230921105247.617057368@noisy.programming.kicks-ass.net
2023-09-21futex: Clarify FUTEX2 flagspeterz@infradead.org
sys_futex_waitv() is part of the futex2 series (the first and only so far) of syscalls and has a flags field per futex (as opposed to flags being encoded in the futex op). This new flags field has a new namespace, which unfortunately isn't super explicit. Notably it currently takes FUTEX_32 and FUTEX_PRIVATE_FLAG. Introduce the FUTEX2 namespace to clarify this Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20230921105247.507327749@noisy.programming.kicks-ass.net
2023-09-21printk: fix illegal pbufs access for !CONFIG_PRINTKJohn Ogness
When CONFIG_PRINTK is not set, PRINTK_MESSAGE_MAX is 0. This leads to a zero-sized array @outbuf in @printk_shared_pbufs. In console_flush_all() a pointer to the first element of the array is assigned with: char *outbuf = &printk_shared_pbufs.outbuf[0]; For !CONFIG_PRINTK this leads to a compiler warning: warning: array subscript 0 is outside array bounds of 'char[0]' [-Warray-bounds] This is not really dangerous because printk_get_next_message() always returns false for !CONFIG_PRINTK, which leads to @outbuf never being used. However, it makes no sense to even compile these functions for !CONFIG_PRINTK. Extend the existing '#ifdef CONFIG_PRINTK' block to contain the formatting and emitting functions since these have no purpose in !CONFIG_PRINTK. This also allows removing several more !CONFIG_PRINTK dummies as well as moving @suppress_panic_printk into a CONFIG_PRINTK block. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309201724.M9BMAQIh-lkp@intel.com/ Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230920155238.670439-1-john.ogness@linutronix.de
2023-09-21sched/debug: Update stale reference to sched_debug.cSebastian Andrzej Siewior
Since commit: 8a99b6833c884 ("sched: Move SCHED_DEBUG sysctl to debugfs") The sched_debug interface moved from /proc to debugfs. The comment mentions still the outdated proc interfaces. Update the comment, point to the current location of the interface. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230920130025.412071-3-bigeasy@linutronix.de
2023-09-21sched/debug: Remove the /proc/sys/kernel/sched_child_runs_first sysctlSebastian Andrzej Siewior
The /proc/sys/kernel/sched_child_runs_first knob is no longer connected since: 5e963f2bd4654 ("sched/fair: Commit to EEVDF") Remove it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230920130025.412071-2-bigeasy@linutronix.de
2023-09-20bpf: unconditionally reset backtrack_state masks on global func exitAndrii Nakryiko
In mark_chain_precision() logic, when we reach the entry to a global func, it is expected that R1-R5 might be still requested to be marked precise. This would correspond to some integer input arguments being tracked as precise. This is all expected and handled as a special case. What's not expected is that we'll leave backtrack_state structure with some register bits set. This is because for subsequent precision propagations backtrack_state is reused without clearing masks, as all code paths are carefully written in a way to leave empty backtrack_state with zeroed out masks, for speed. The fix is trivial, we always clear register bit in the register mask, and then, optionally, set reg->precise if register is SCALAR_VALUE type. Reported-by: Chris Mason <clm@meta.com> Fixes: be2ef8161572 ("bpf: allow precision tracking for programs with subprogs") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230918210110.2241458-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-20livepatch: Fix missing newline character in klp_resolve_symbols()Zheng Yejian
Without the newline character, the log may not be printed immediately after the error occurs. Fixes: ca376a937486 ("livepatch: Prevent module-specific KLP rela sections from referencing vmlinux symbols") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230914072644.4098857-1-zhengyejian1@huawei.com