summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2022-10-20blktrace: remove unnessary stop block trace in 'blk_trace_shutdown'Ye Bin
As previous commit, 'blk_trace_cleanup' will stop block trace if block trace's state is 'Blktrace_running'. So remove unnessary stop block trace in 'blk_trace_shutdown'. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019033602.752383-4-yebin@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-20blktrace: fix possible memleak in '__blk_trace_remove'Ye Bin
When test as follows: step1: ioctl(sda, BLKTRACESETUP, &arg) step2: ioctl(sda, BLKTRACESTART, NULL) step3: ioctl(sda, BLKTRACETEARDOWN, NULL) step4: ioctl(sda, BLKTRACESETUP, &arg) Got issue as follows: debugfs: File 'dropped' in directory 'sda' already present! debugfs: File 'msg' in directory 'sda' already present! debugfs: File 'trace0' in directory 'sda' already present! And also find syzkaller report issue like "KASAN: use-after-free Read in relay_switch_subbuf" "https://syzkaller.appspot.com/bug?id=13849f0d9b1b818b087341691be6cc3ac6a6bfb7" If remove block trace without stop(BLKTRACESTOP) block trace, '__blk_trace_remove' will just set 'q->blk_trace' with NULL. However, debugfs file isn't removed, so will report file already present when call BLKTRACESETUP. static int __blk_trace_remove(struct request_queue *q) { struct blk_trace *bt; bt = rcu_replace_pointer(q->blk_trace, NULL, lockdep_is_held(&q->debugfs_mutex)); if (!bt) return -EINVAL; if (bt->trace_state != Blktrace_running) blk_trace_cleanup(q, bt); return 0; } If do test as follows: step1: ioctl(sda, BLKTRACESETUP, &arg) step2: ioctl(sda, BLKTRACESTART, NULL) step3: ioctl(sda, BLKTRACETEARDOWN, NULL) step4: remove sda There will remove debugfs directory which will remove recursively all file under directory. >> blk_release_queue >> debugfs_remove_recursive(q->debugfs_dir) So all files which created in 'do_blk_trace_setup' are removed, and 'dentry->d_inode' is NULL. But 'q->blk_trace' is still in 'running_trace_lock', 'trace_note_tsk' will traverse 'running_trace_lock' all nodes. >>trace_note_tsk >> trace_note >> relay_reserve >> relay_switch_subbuf >> d_inode(buf->dentry)->i_size To solve above issues, reference commit '5afedf670caf', call 'blk_trace_cleanup' unconditionally in '__blk_trace_remove' and first stop block trace in 'blk_trace_cleanup'. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019033602.752383-3-yebin@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-20blktrace: introduce 'blk_trace_{start,stop}' helperYe Bin
Introduce 'blk_trace_{start,stop}' helper. No functional changed. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019033602.752383-2-yebin@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-18Merge tag 'for-netdev' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-18 We've added 33 non-merge commits during the last 14 day(s) which contain a total of 31 files changed, 874 insertions(+), 538 deletions(-). The main changes are: 1) Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs, from Hou Tao & Paul E. McKenney. 2) Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values. In the wild we have seen OS vendors doing buggy backports where helper call numbers mismatched. This is an attempt to make backports more foolproof, from Andrii Nakryiko. 3) Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions, from Roberto Sassu. 4) Fix libbpf's BTF dumper for structs with padding-only fields, from Eduard Zingerman. 5) Fix various libbpf bugs which have been found from fuzzing with malformed BPF object files, from Shung-Hsi Yu. 6) Clean up an unneeded check on existence of SSE2 in BPF x86-64 JIT, from Jie Meng. 7) Fix various ASAN bugs in both libbpf and selftests when running the BPF selftest suite on arm64, from Xu Kuohai. 8) Fix missing bpf_iter_vma_offset__destroy() call in BPF iter selftest and use in-skeleton link pointer to remove an explicit bpf_link__destroy(), from Jiri Olsa. 9) Fix BPF CI breakage by pointing to iptables-legacy instead of relying on symlinked iptables which got upgraded to iptables-nft, from Martin KaFai Lau. 10) Minor BPF selftest improvements all over the place, from various others. * tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (33 commits) bpf/docs: Update README for most recent vmtest.sh bpf: Use rcu_trace_implies_rcu_gp() for program array freeing bpf: Use rcu_trace_implies_rcu_gp() in local storage map bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator rcu-tasks: Provide rcu_trace_implies_rcu_gp() selftests/bpf: Use sys_pidfd_open() helper when possible libbpf: Fix null-pointer dereference in find_prog_by_sec_insn() libbpf: Deal with section with no data gracefully libbpf: Use elf_getshdrnum() instead of e_shnum selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c selftests/bpf: Fix error failure of case test_xdp_adjust_tail_grow selftest/bpf: Fix memory leak in kprobe_multi_test selftests/bpf: Fix memory leak caused by not destroying skeleton libbpf: Fix memory leak in parse_usdt_arg() libbpf: Fix use-after-free in btf_dump_name_dups selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test selftests/bpf: Alphabetize DENYLISTs selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id() libbpf: Introduce bpf_link_get_fd_by_id_opts() libbpf: Introduce bpf_btf_get_fd_by_id_opts() ... ==================== Link: https://lore.kernel.org/r/20221018210631.11211-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-18kcsan: Instrument memcpy/memset/memmove with newer ClangMarco Elver
With Clang version 16+, -fsanitize=thread will turn memcpy/memset/memmove calls in instrumented functions into __tsan_memcpy/__tsan_memset/__tsan_memmove calls respectively. Add these functions to the core KCSAN runtime, so that we (a) catch data races with mem* functions, and (b) won't run into linker errors with such newer compilers. Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcutorture: Verify NUM_ACTIVE_RCU_POLL_OLDSTATEPaul E. McKenney
This commit adds code to the RTWS_POLL_GET case of rcu_torture_writer() to verify that the value of NUM_ACTIVE_RCU_POLL_OLDSTATE is sufficiently large Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcutorture: Verify NUM_ACTIVE_RCU_POLL_FULL_OLDSTATEPaul E. McKenney
This commit adds code to the RTWS_POLL_GET_FULL case of rcu_torture_writer() to verify that the value of NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE is sufficiently large. [ paulmck: Fix whitespace issue located by checkpatch.pl. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcu: Fix missing nocb gp wake on rcu_barrier()Frederic Weisbecker
In preparation for RCU lazy changes, wake up the RCU nocb gp thread if needed after an entrain. This change prevents the RCU barrier callback from waiting in the queue for several seconds before the lazy callbacks in front of it are serviced. Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcu: Fix late wakeup when flush of bypass cblist happensJoel Fernandes (Google)
When the bypass cblist gets too big or its timeout has occurred, it is flushed into the main cblist. However, the bypass timer is still running and the behavior is that it would eventually expire and wake the GP thread. Since we are going to use the bypass cblist for lazy CBs, do the wakeup soon as the flush for "too big or too long" bypass list happens. Otherwise, long delays can happen for callbacks which get promoted from lazy to non-lazy. This is a good thing to do anyway (regardless of future lazy patches), since it makes the behavior consistent with behavior of other code paths where flushing into the ->cblist makes the GP kthread into a non-sleeping state quickly. [ Frederic Weisbecker: Changes to avoid unnecessary GP-thread wakeups plus comment changes. ] Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcu: Simplify rcu_init_nohz() cpumask handlingZhen Lei
In kernels built with either CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y or CONFIG_NO_HZ_FULL=y, additional CPUs must be added to rcu_nocb_mask. Except that kernels booted without the rcu_nocbs= will not have allocated rcu_nocb_mask. And the current rcu_init_nohz() function uses its need_rcu_nocb_mask and offload_all local variables to track the rcu_nocb and nohz_full state. But there is a much simpler approach, namely creating a cpumask pointer to track the default and then using cpumask_available() to check the rcu_nocb_mask state. This commit takes this approach, thereby simplifying and shortening the rcu_init_nohz() function. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcu: Use READ_ONCE() for lockless read of rnp->qsmaskJoel Fernandes (Google)
The rnp->qsmask is locklessly accessed from rcutree_dying_cpu(). This may help avoid load tearing due to concurrent access, KCSAN issues, and preserve sanity of people reading the mask in tracing. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()Pingfan Liu
Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked concurrently, the following rcu_boost_kthread_setaffinity() race can occur: CPU 1 CPU2 mask = rcu_rnp_online_cpus(rnp); ... mask = rcu_rnp_online_cpus(rnp); ... set_cpus_allowed_ptr(t, cm); set_cpus_allowed_ptr(t, cm); This results in CPU2's update being overwritten by that of CPU1, and thus the possibility of ->boost_kthread_task continuing to run on a to-be-offlined CPU. This commit therefore eliminates this race by relying on the pre-existing acquisition of ->boost_kthread_mutex to serialize the full process of changing the affinity of ->boost_kthread_task. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcu: Remove duplicate RCU exp QS report from rcu_report_dead()Zqiang
The rcu_report_dead() function invokes rcu_report_exp_rdp() in order to force an immediate expedited quiescent state on the outgoing CPU, and then it invokes rcu_preempt_deferred_qs() to provide any required deferred quiescent state of either sort. Because the call to rcu_preempt_deferred_qs() provides the expedited RCU quiescent state if requested, the call to rcu_report_exp_rdp() is potentially redundant. One possible issue is a concurrent start of a new expedited RCU grace period, but this situation is already handled correctly by __sync_rcu_exp_select_node_cpus(). This function will detect that the CPU is going offline via the error return from its call to smp_call_function_single(). In that case, it will retry, and eventually stop retrying due to rcu_report_exp_rdp() clearing the ->qsmaskinitnext bit corresponding to the target CPU. As a result, __sync_rcu_exp_select_node_cpus() will report the necessary quiescent state after dealing with any remaining CPU. This change assumes that control does not enter rcu_report_dead() within an RCU read-side critical section, but then again, the surviving call to rcu_preempt_deferred_qs() has always made this assumption. This commit therefore removes the call to rcu_report_exp_rdp(), thus relying on rcu_preempt_deferred_qs() to handle both normal and expedited quiescent states. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18srcu: Convert ->srcu_lock_count and ->srcu_unlock_count to atomicPaul E. McKenney
NMI-safe variants of srcu_read_lock() and srcu_read_unlock() are needed by printk(), which on many architectures entails read-modify-write atomic operations. This commit prepares Tree SRCU for this change by making both ->srcu_lock_count and ->srcu_unlock_count by atomic_long_t. [ paulmck: Apply feedback from John Ogness. ] Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com>
2022-10-18bpf: Use rcu_trace_implies_rcu_gp() for program array freeingHou Tao
To support both sleepable and normal uprobe bpf program, the freeing of trace program array chains a RCU-tasks-trace grace period and a normal RCU grace period one after the other. With the introduction of rcu_trace_implies_rcu_gp(), __bpf_prog_array_free_sleepable_cb() can check whether or not a normal RCU grace period has also passed after a RCU-tasks-trace grace period has passed. If it is true, it is safe to invoke kfree() directly. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20221014113946.965131-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18bpf: Use rcu_trace_implies_rcu_gp() in local storage mapHou Tao
Local storage map is accessible for both sleepable and non-sleepable bpf program, and its memory is freed by using both call_rcu_tasks_trace() and kfree_rcu() to wait for both RCU-tasks-trace grace period and RCU grace period to pass. With the introduction of rcu_trace_implies_rcu_gp(), both bpf_selem_free_rcu() and bpf_local_storage_free_rcu() can check whether or not a normal RCU grace period has also passed after a RCU-tasks-trace grace period has passed. If it is true, it is safe to call kfree() directly. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20221014113946.965131-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocatorHou Tao
The memory free logic in bpf memory allocator chains a RCU Tasks Trace grace period and a normal RCU grace period one after the other, so it can ensure that both sleepable and non-sleepable programs have finished. With the introduction of rcu_trace_implies_rcu_gp(), __free_rcu_tasks_trace() can check whether or not a normal RCU grace period has also passed after a RCU Tasks Trace grace period has passed. If it is true, freeing these elements directly, else freeing through call_rcu(). Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20221014113946.965131-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18rcu-tasks: Provide rcu_trace_implies_rcu_gp()Paul E. McKenney
As an accident of implementation, an RCU Tasks Trace grace period also acts as an RCU grace period. However, this could change at any time. This commit therefore creates an rcu_trace_implies_rcu_gp() that currently returns true to codify this accident. Code relying on this accident must call this function to verify that this accident is still happening. Reported-by: Hou Tao <houtao@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/r/20221014113946.965131-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-18livepatch: Move the result-invariant calculation out of the loopZhen Lei
The calculation results of the variables 'func_addr' and 'func_size' are not affected by the for loop and do not change due to the changes of entries[i]. The performance can be improved by moving it outside the loop. No functional change. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-10-17Merge tag 'cgroup-for-6.1-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix a recent regression where a sleeping kernfs function is called with css_set_lock (spinlock) held - Revert the commit to enable cgroup1 support for cgroup_get_from_fd/file() Multiple users assume that the lookup only works for cgroup2 and breaks when fed a cgroup1 file. Instead, introduce a separate set of functions to lookup both v1 and v2 and use them where the user explicitly wants to support both versions. - Compat update for tools/perf/util/bpf_skel/bperf_cgroup.bpf.c. - Add Josef Bacik as a blkcg maintainer. * tag 'cgroup-for-6.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: blkcg: Update MAINTAINERS entry mm: cgroup: fix comments for get from fd/file helpers perf stat: Support old kernels for bperf cgroup counting bpf: cgroup_iter: support cgroup1 using cgroup fd cgroup: add cgroup_v1v2_get_from_[fd/file]() Revert "cgroup: enable cgroup_get_from_file() on cgroup1" cgroup: Reorganize css_set_lock and kernfs path processing
2022-10-17audit: unify audit_filter_{uring(), inode_name(), syscall()}Ankur Arora
audit_filter_uring(), audit_filter_inode_name() are substantially similar to audit_filter_syscall(). Move the core logic to __audit_filter_op() which can be parametrized for all three. On a Skylakex system, getpid() latency (all results aggregated across 12 boot cycles): Min Mean Median Max pstdev (ns) (ns) (ns) (ns) - 196.63 207.86 206.60 230.98 (+- 3.92%) + 183.73 196.95 192.31 232.49 (+- 6.04%) Performance counter stats for 'bin/getpid' (3 runs) go from: cycles 805.58 ( +- 4.11% ) instructions 1654.11 ( +- .05% ) IPC 2.06 ( +- 3.39% ) branches 430.02 ( +- .05% ) branch-misses 1.55 ( +- 7.09% ) L1-dcache-loads 440.01 ( +- .09% ) L1-dcache-load-misses 9.05 ( +- 74.03% ) to: cycles 765.37 ( +- 6.66% ) instructions 1677.07 ( +- 0.04% ) IPC 2.20 ( +- 5.90% ) branches 431.10 ( +- 0.04% ) branch-misses 1.60 ( +- 11.25% ) L1-dcache-loads 521.04 ( +- 0.05% ) L1-dcache-load-misses 6.92 ( +- 77.60% ) (Both aggregated over 12 boot cycles.) The increased L1-dcache-loads are due to some intermediate values now coming from the stack. The improvement in cycles is due to a slightly denser loop (the list parameter in the list_for_each_entry_rcu() exit check now comes from a register rather than a constant as before.) Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-10-17audit: cache ctx->major in audit_filter_syscall()Ankur Arora
ctx->major contains the current syscall number. This is, of course, a constant for the duration of the syscall. Unfortunately, GCC's alias analysis cannot prove that it is not modified via a pointer in the audit_filter_syscall() loop, and so always loads it from memory. In and of itself the load isn't very expensive (ops dependent on the ctx->major load are only used to determine the direction of control flow and have short dependence chains and, in any case the related branches get predicted perfectly in the fastpath) but still cache ctx->major in a local for two reasons: * ctx->major is in the first cacheline of struct audit_context and has similar alignment as audit_entry::list audit_entry. For cases with a lot of audit rules, doing this reduces one source of contention from a potentially busy cache-set. * audit_in_mask() (called in the hot loop in audit_filter_syscall()) does cast manipulation and error checking on ctx->major: audit_in_mask(const struct audit_krule *rule, unsigned long val): if (val > 0xffffffff) return false; word = AUDIT_WORD(val); if (word >= AUDIT_BITMASK_SIZE) return false; bit = AUDIT_BIT(val); return rule->mask[word] & bit; The clauses related to the rule need to be evaluated in the loop, but the rest is unnecessarily re-evaluated for every loop iteration. (Note, however, that most of these are cheap ALU ops and the branches are perfectly predicted. However, see discussion on cycles improvement below for more on why it is still worth hoisting.) On a Skylakex system change in getpid() latency (aggregated over 12 boot cycles): Min Mean Median Max pstdev (ns) (ns) (ns) (ns) - 201.30 216.14 216.22 228.46 (+- 1.45%) + 196.63 207.86 206.60 230.98 (+- 3.92%) Performance counter stats for 'bin/getpid' (3 runs) go from: cycles 836.89 ( +- .80% ) instructions 2000.19 ( +- .03% ) IPC 2.39 ( +- .83% ) branches 430.14 ( +- .03% ) branch-misses 1.48 ( +- 3.37% ) L1-dcache-loads 471.11 ( +- .05% ) L1-dcache-load-misses 7.62 ( +- 46.98% ) to: cycles 805.58 ( +- 4.11% ) instructions 1654.11 ( +- .05% ) IPC 2.06 ( +- 3.39% ) branches 430.02 ( +- .05% ) branch-misses 1.55 ( +- 7.09% ) L1-dcache-loads 440.01 ( +- .09% ) L1-dcache-load-misses 9.05 ( +- 74.03% ) (Both aggregated over 12 boot cycles.) instructions: we reduce around 8 instructions/iteration because some of the computation is now hoisted out of the loop (branch count does not change because GCC, for reasons unclear, only hoists the computations while keeping the basic-blocks.) cycles: improve by about 5% (in aggregate and looking at individual run numbers.) This is likely because we now waste fewer pipeline resources on unnecessary instructions which allows the control flow to speculatively execute further ahead shortening the execution of the loop a little. The final gating factor on the performance of this loop remains the long dependence chain due to the linked-list load. Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-10-17bpf: prevent decl_tag from being referenced in func_protoStanislav Fomichev
Syzkaller was able to hit the following issue: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3609 at kernel/bpf/btf.c:1946 btf_type_id_size+0x2d5/0x9d0 kernel/bpf/btf.c:1946 Modules linked in: CPU: 0 PID: 3609 Comm: syz-executor361 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 RIP: 0010:btf_type_id_size+0x2d5/0x9d0 kernel/bpf/btf.c:1946 Code: ef e8 7f 8e e4 ff 41 83 ff 0b 77 28 f6 44 24 10 18 75 3f e8 6d 91 e4 ff 44 89 fe bf 0e 00 00 00 e8 20 8e e4 ff e8 5b 91 e4 ff <0f> 0b 45 31 f6 e9 98 02 00 00 41 83 ff 12 74 18 e8 46 91 e4 ff 44 RSP: 0018:ffffc90003cefb40 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: ffff8880259c0000 RSI: ffffffff81968415 RDI: 0000000000000005 RBP: ffff88801270ca00 R08: 0000000000000005 R09: 000000000000000e R10: 0000000000000011 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000011 R14: ffff888026ee6424 R15: 0000000000000011 FS: 000055555641b300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000f2e258 CR3: 000000007110e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btf_func_proto_check kernel/bpf/btf.c:4447 [inline] btf_check_all_types kernel/bpf/btf.c:4723 [inline] btf_parse_type_sec kernel/bpf/btf.c:4752 [inline] btf_parse kernel/bpf/btf.c:5026 [inline] btf_new_fd+0x1926/0x1e70 kernel/bpf/btf.c:6892 bpf_btf_load kernel/bpf/syscall.c:4324 [inline] __sys_bpf+0xb7d/0x4cf0 kernel/bpf/syscall.c:5010 __do_sys_bpf kernel/bpf/syscall.c:5069 [inline] __se_sys_bpf kernel/bpf/syscall.c:5067 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:5067 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f0fbae41c69 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc8aeb6228 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fbae41c69 RDX: 0000000000000020 RSI: 0000000020000140 RDI: 0000000000000012 RBP: 00007f0fbae05e10 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000246 R12: 00007f0fbae05ea0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Looks like it tries to create a func_proto which return type is decl_tag. For the details, see Martin's spot on analysis in [0]. 0: https://lore.kernel.org/bpf/CAKH8qBuQDLva_hHxxBuZzyAcYNO4ejhovz6TQeVSk8HY-2SO6g@mail.gmail.com/T/#mea6524b3fcd6298347432226e81b1e6155efc62c Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Fixes: bd16dee66ae4 ("bpf: Add BTF_KIND_DECL_TAG typedef support") Reported-by: syzbot+d8bd751aef7c6b39a344@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221015002444.2680969-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-10-17sched: Introduce struct balance_callback to avoid CFI mismatchesKees Cook
Introduce distinct struct balance_callback instead of performing function pointer casting which will trip CFI. Avoids warnings as found by Clang's future -Wcast-function-type-strict option: In file included from kernel/sched/core.c:84: kernel/sched/sched.h:1755:15: warning: cast from 'void (*)(struct rq *)' to 'void (*)(struct callback_head *)' converts to incompatible function type [-Wcast-function-type-strict] head->func = (void (*)(struct callback_head *))func; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ No binary differences result from this change. This patch is a cleanup based on Brad Spengler/PaX Team's modifications to sched code in their last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1724 Link: https://lkml.kernel.org/r/20221008000758.2957718-1-keescook@chromium.org
2022-10-17sched/core: Fix comparison in sched_group_cookie_match()Lin Shengwang
In commit 97886d9dcd86 ("sched: Migration changes for core scheduling"), sched_group_cookie_match() was added to help determine if a cookie matches the core state. However, while it iterates the SMT group, it fails to actually use the RQ for each of the CPUs iterated, use cpu_rq(cpu) instead of rq to fix things. Fixes: 97886d9dcd86 ("sched: Migration changes for core scheduling") Signed-off-by: Lin Shengwang <linshengwang1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008022709.642-1-linshengwang1@huawei.com
2022-10-17x86/ftrace: Make it call depth tracking awarePeter Zijlstra
Since ftrace has trampolines, don't use thunks for the __fentry__ site but instead require that every function called from there includes accounting. This very much includes all the direct-call functions. Additionally, ftrace uses ROP tricks in two places: - return_to_handler(), and - ftrace_regs_caller() when pt_regs->orig_ax is set by a direct-call. return_to_handler() already uses a retpoline to replace an indirect-jump to defeat IBT, since this is a jump-type retpoline, make sure there is no accounting done and ALTERNATIVE the RET into a ret. ftrace_regs_caller() does much the same and gets the same treatment. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111148.927545073@infradead.org
2022-10-17kallsyms: Take callthunks into accountPeter Zijlstra
Since the pre-symbol function padding is an integral part of the symbol make kallsyms report it as part of the symbol by reporting it as sym-x instead of prev_sym+y. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111148.409656012@infradead.org
2022-10-17static_call: Add call depth tracking supportPeter Zijlstra
When indirect calls are switched to direct calls then it has to be ensured that the call target is not the function, but the call thunk when call depth tracking is enabled. But static calls are available before call thunks have been set up. Ensure a second run through the static call patching code after call thunks have been created. When call thunks are not enabled this has no side effects. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111148.306100465@infradead.org
2022-10-17bpf: Fix sample_flags for bpf_perf_event_outputSumanth Korikkar
* Raw data is also filled by bpf_perf_event_output. * Add sample_flags to indicate raw data. * This eliminates the segfaults as shown below: Run ./samples/bpf/trace_output BUG pid 9 cookie 1001000000004 sized 4 BUG pid 9 cookie 1001000000004 sized 4 BUG pid 9 cookie 1001000000004 sized 4 Segmentation fault (core dumped) Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data") Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/r/20221007081327.1047552-1-sumanthk@linux.ibm.com
2022-10-17perf: Fix missing SIGTRAPsPeter Zijlstra
Marco reported: Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... __perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. Additionally; and different to Marco's proposed patch: - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles. - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch the SIGTRAP would be delivered to the wrong task. - observe that if the event gets scheduled out (rotation/migration/context-switch/...) the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user. Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Debugged-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Marco Elver <elver@google.com> Debugged-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com>
2022-10-17timers: Replace in_irq() with in_hardirq()ye xingchen
Replace the obsolete and ambiguous macro in_irq() with new macro in_hardirq(). Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/r/20221012012629.334966-1-ye.xingchen@zte.com.cn
2022-10-16Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-14Merge tag 'sched-psi-2022-10-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull PSI updates from Ingo Molnar: - Various performance optimizations, resulting in a 4%-9% speedup in the mmtests/config-scheduler-perfpipe micro-benchmark. - New interface to turn PSI on/off on a per cgroup level. * tag 'sched-psi-2022-10-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/psi: Per-cgroup PSI accounting disable/re-enable interface sched/psi: Cache parent psi_group to speed up group iteration sched/psi: Consolidate cgroup_psi() sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure sched/psi: Remove NR_ONCPU task accounting sched/psi: Optimize task switch inside shared cgroups again sched/psi: Move private helpers to sched/stats.h sched/psi: Save percpu memory when !psi_cgroups_enabled sched/psi: Don't create cgroup PSI files when psi_disabled sched/psi: Fix periodic aggregation shut off
2022-10-13Merge tag 'trace-v6.1-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Found that the synthetic events were using strlen/strscpy() on values that could have come from userspace, and that is bad. Consolidate the string logic of kprobe and eprobe and extend it to the synthetic events to safely process string addresses. - Clean up content of text dump in ftrace_bug() where the output does not make char reads into signed and sign extending the byte output. - Fix some kernel docs in the ring buffer code. * tag 'trace-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix reading strings from synthetic events tracing: Add "(fault)" name injection to kernel probes tracing: Move duplicate code of trace_kprobe/eprobe.c into header ring-buffer: Fix kernel-doc ftrace: Fix char print issue in print_ip_ins()
2022-10-13bpf: Allow bpf_user_ringbuf_drain() callbacks to return 1David Vernet
The bpf_user_ringbuf_drain() helper function allows a BPF program to specify a callback that is invoked when draining entries from a BPF_MAP_TYPE_USER_RINGBUF ring buffer map. The API is meant to allow the callback to return 0 if it wants to continue draining samples, and 1 if it's done draining. Unfortunately, bpf_user_ringbuf_drain() landed shortly after commit 1bfe26fb0827 ("bpf: Add verifier support for custom callback return range"), which changed the default behavior of callbacks to only support returning 0. This patch corrects that oversight by allowing bpf_user_ringbuf_drain() callbacks to return 0 or 1. A follow-on patch will update the user_ringbuf selftests to also return 1 from a bpf_user_ringbuf_drain() callback to prevent this from regressing in the future. Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper") Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221012232015.1510043-2-void@manifault.com
2022-10-12Merge tag 'mm-nonmm-stable-2022-10-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - hfs and hfsplus kmap API modernization (Fabio Francesco) - make crash-kexec work properly when invoked from an NMI-time panic (Valentin Schneider) - ntfs bugfixes (Hawkins Jiawei) - improve IPC msg scalability by replacing atomic_t's with percpu counters (Jiebin Sun) - nilfs2 cleanups (Minghao Chi) - lots of other single patches all over the tree! * tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype proc: test how it holds up with mapping'less process mailmap: update Frank Rowand email address ia64: mca: use strscpy() is more robust and safer init/Kconfig: fix unmet direct dependencies ia64: update config files nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure fork: remove duplicate included header files init/main.c: remove unnecessary (void*) conversions proc: mark more files as permanent nilfs2: remove the unneeded result variable nilfs2: delete unnecessary checks before brelse() checkpatch: warn for non-standard fixes tag style usr/gen_init_cpio.c: remove unnecessary -1 values from int file ipc/msg: mitigate the lock contention with percpu counter percpu: add percpu_counter_add_local and percpu_counter_sub_local fs/ocfs2: fix repeated words in comments relay: use kvcalloc to alloc page array in relay_alloc_page_array proc: make config PROC_CHILDREN depend on PROC_FS fs: uninline inode_maybe_inc_iversion() ...
2022-10-12tracing: Fix reading strings from synthetic eventsSteven Rostedt (Google)
The follow commands caused a crash: # cd /sys/kernel/tracing # echo 's:open char file[]' > dynamic_events # echo 'hist:keys=common_pid:file=filename:onchange($file).trace(open,$file)' > events/syscalls/sys_enter_openat/trigger' # echo 1 > events/synthetic/open/enable BOOM! The problem is that the synthetic event field "char file[]" will read the value given to it as a string without any memory checks to make sure the address is valid. The above example will pass in the user space address and the sythetic event code will happily call strlen() on it and then strscpy() where either one will cause an oops when accessing user space addresses. Use the helper functions from trace_kprobe and trace_eprobe that can read strings safely (and actually succeed when the address is from user space and the memory is mapped in). Now the above can show: packagekitd-1721 [000] ...2. 104.597170: open: file=/usr/lib/rpm/fileattrs/cmake.attr in:imjournal-978 [006] ...2. 104.599642: open: file=/var/lib/rsyslog/imjournal.state.tmp packagekitd-1721 [000] ...2. 104.626308: open: file=/usr/lib/rpm/fileattrs/debuginfo.attr Link: https://lkml.kernel.org/r/20221012104534.826549315@goodmis.org Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-10-12tracing: Add "(fault)" name injection to kernel probesSteven Rostedt (Google)
Have the specific functions for kernel probes that read strings to inject the "(fault)" name directly. trace_probes.c does this too (for uprobes) but as the code to read strings are going to be used by synthetic events (and perhaps other utilities), it simplifies the code by making sure those other uses do not need to implement the "(fault)" name injection as well. Link: https://lkml.kernel.org/r/20221012104534.644803645@goodmis.org Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-10-12tracing: Move duplicate code of trace_kprobe/eprobe.c into headerSteven Rostedt (Google)
The functions: fetch_store_strlen_user() fetch_store_strlen() fetch_store_string_user() fetch_store_string() are identical in both trace_kprobe.c and trace_eprobe.c. Move them into a new header file trace_probe_kernel.h to share it. This code will later be used by the synthetic events as well. Marked for stable as a fix for a crash in synthetic events requires it. Link: https://lkml.kernel.org/r/20221012104534.467668078@goodmis.org Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-10-12Merge tag 'irq-core-2022-10-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt updates from Thomas Gleixner: "Core code: - Provide a generic wrapper which can be utilized in drivers to handle the problem of force threaded demultiplex interrupts on RT enabled kernels. This avoids conditionals and horrible quirks in drivers all over the place - Fix up affected pinctrl and GPIO drivers to make them cleanly RT safe Interrupt drivers: - A new driver for the FSL MU platform specific MSI implementation - Make irqchip_init() available for pure ACPI based systems - Provide a functional DT binding for the Realtek RTL interrupt chip - The usual DT updates and small code improvements all over the place" * tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) irqchip: IMX_MU_MSI should depend on ARCH_MXC irqchip/imx-mu-msi: Fix wrong register offset for 8ulp irqchip/ls-extirq: Fix invalid wait context by avoiding to use regmap dt-bindings: irqchip: Describe the IMX MU block as a MSI controller irqchip: Add IMX MU MSI controller driver dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support irqchip/gic-v3: Fix typo in comment dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END platform-msi: Export symbol platform_msi_create_irq_domain() irqchip/realtek-rtl: use parent interrupts dt-bindings: interrupt-controller: realtek,rtl-intc: require parents irqchip/realtek-rtl: use irq_domain_add_linear() irqchip: Make irqchip_init() usable on pure ACPI systems bcma: gpio: Use generic_handle_irq_safe() gpio: mlxbf2: Use generic_handle_irq_safe() platform/x86: intel_int0002_vgpio: Use generic_handle_irq_safe() ssb: gpio: Use generic_handle_irq_safe() pinctrl: amd: Use generic_handle_irq_safe() ...
2022-10-12ring-buffer: Fix kernel-docJiapeng Chong
kernel/trace/ring_buffer.c:895: warning: expecting prototype for ring_buffer_nr_pages_dirty(). Prototype was for ring_buffer_nr_dirty_pages() instead. kernel/trace/ring_buffer.c:5313: warning: expecting prototype for ring_buffer_reset_cpu(). Prototype was for ring_buffer_reset_online_cpus() instead. kernel/trace/ring_buffer.c:5382: warning: expecting prototype for rind_buffer_empty(). Prototype was for ring_buffer_empty() instead. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2340 Link: https://lkml.kernel.org/r/20221009020642.12506-1-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-10-12ftrace: Fix char print issue in print_ip_ins()Zheng Yejian
When ftrace bug happened, following log shows every hex data in problematic ip address: actual: ffffffe8:6b:ffffffd9:01:21 But so many 'f's seem a little confusing, and that is because format '%x' being used to print signed chars in array 'ins'. As suggested by Joe, change to use format "%*phC" to print array 'ins'. After this patch, the log is like: actual: e8:6b:d9:01:21 Link: https://lkml.kernel.org/r/20221011120352.1878494-1-zhengyejian1@huawei.com Fixes: 6c14133d2d3f ("ftrace: Do not blindly read the ip address in ftrace_bug()") Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-10-11mm: cgroup: fix comments for get from fd/file helpersYosry Ahmed
Fix the documentation comments for cgroup_[v1v2_]get_from_[fd/file](). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-10-11fork: remove duplicate included header filesXu Panda
linux/sched/mm.h is included more than once. Link: https://lkml.kernel.org/r/20220912071556.16811-1-xu.panda@zte.com.cn Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11treewide: use get_random_bytes() when possibleJason A. Donenfeld
The prandom_bytes() function has been a deprecated inline wrapper around get_random_bytes() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11bpf: cgroup_iter: support cgroup1 using cgroup fdYosry Ahmed
Use cgroup_v1v2_get_from_fd() in cgroup_iter to support attaching to both cgroup v1 and v2 using fds. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-10-11cgroup: add cgroup_v1v2_get_from_[fd/file]()Yosry Ahmed
Add cgroup_v1v2_get_from_fd() and cgroup_v1v2_get_from_file() that support both cgroup1 and cgroup2. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-10-10Merge tag 'gfs2-nopid-for-v6.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 debugfs updates from Andreas Gruenbacher: - Improve the way how the state of glocks is reported in debugfs for glocks which are not held by processes, but rather by other resouces like cached inodes or flocks. * tag 'gfs2-nopid-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Mark the remaining process-independent glock holders as GL_NOPID gfs2: Mark flock glock holders as GL_NOPID gfs2: Add GL_NOPID flag for process-independent glock holders gfs2: Add flocks to glockfd debugfs file gfs2: Add glockfd debugfs file