path: root/kernel
Age    Commit message    Author
2022-07-11  module: Fix ERRORs reported by checkpatch.pl  (Christophe Leroy)
Checkpatch reports the following errors: ERROR: do not use assignment in if condition + if ((colon = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) { ERROR: do not use assignment in if condition + if ((mod = find_module_all(name, colon - name, false)) != NULL) ERROR: do not use assignment in if condition + if ((ret = find_kallsyms_symbol_value(mod, name)) != 0) ERROR: do not initialise globals to 0 +int modules_disabled = 0; Fix them. The following one has to remain, because the condition has to be evaluated multiple times by the macro wait_event_interruptible_timeout(). ERROR: do not use assignment in if condition + if (wait_event_interruptible_timeout(module_wq, Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
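As an illustration of the first class of error, here is a minimal, self-contained sketch (hypothetical function, userspace strchr() standing in for the kernel's strnchr()) of what the assignment-in-if cleanup looks like:

#include <stdio.h>
#include <string.h>

/* before (flagged by checkpatch):
 *	if ((colon = strchr(name, ':')) != NULL) { ... }
 * after: hoist the assignment out of the condition, then test the result.
 */
static void split_name(const char *name)
{
	const char *colon;

	colon = strchr(name, ':');
	if (colon)
		printf("module part is %zu characters long\n", (size_t)(colon - name));
}

int main(void)
{
	split_name("snd_hda_intel:azx_probe");
	return 0;
}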
2022-07-11  module: Add support for default value for module async_probe  (Saravana Kannan)
Add a module.async_probe kernel command line option that allows enabling async probing for all modules. When this command line option is used, there might still be some modules for which we want to explicitly force synchronous probing, so extend <modulename>.async_probe to take an optional bool input so that async probing can be disabled for a specific module. Signed-off-by: Saravana Kannan <saravanak@google.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
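A hedged example of the resulting kernel command line (the module name is only illustrative, and the exact value syntax should be checked against the module parameter documentation): force asynchronous probing globally, then opt one module back out:

	module.async_probe=1 mlx5_core.async_probe=false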
2022-07-11  module: kallsyms: Ensure preemption in add_kallsyms() with PREEMPT_RT  (Aaron Tomlin)
Commit 08126db5ff73 ("module: kallsyms: Fix suspicious rcu usage") disabled preemption, which under PREEMPT_RT=y introduces an unbounded latency because the loop it covers is not bounded. This is a regression: previously preemption was not disabled and we dereferenced the RCU-protected pointers explicitly. That said, these pointers cannot change: before the kallsyms-specific data is prepared or set up, we ensure that the unformed module is known to be unique, i.e. does not already exist (see load_module()). Therefore we can fix this by using the common and more appropriate RCU flavour, as this section of code can be safely preempted. Reported-by: Steven Rostedt <rostedt@goodmis.org> Fixes: 08126db5ff73 ("module: kallsyms: Fix suspicious rcu usage") Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
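A hedged sketch of the flavour switch described above (not the literal add_kallsyms() code): the sched-RCU section, which disables preemption, becomes a plain RCU read-side section that PREEMPT_RT may preempt:

	/* before: preempt_disable(); p = rcu_dereference_sched(ptr); ... preempt_enable(); */

	rcu_read_lock();
	p = rcu_dereference(ptr);	/* cannot change here, see above */
	/* ... prepare kallsyms-specific data ... */
	rcu_read_unlock();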
2022-07-11  fix race between exit_itimers() and /proc/pid/timers  (Oleg Nesterov)
As Chris explains, the comment above exit_itimers() is not correct, we can race with proc_timers_seq_ops. Change exit_itimers() to clear signal->posix_timers with ->siglock held. Cc: <stable@vger.kernel.org> Reported-by: chris@accessvector.net Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-11  Merge 5.19-rc6 into char-misc-next  (Greg Kroah-Hartman)
We need the misc driver fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-10  Merge branch irq/plic-masking into irq/irqchip-next  (Marc Zyngier)
* irq/plic-masking: : . : SiFive PLIC optimisations from Samuel Holland: : : "This series removes the spinlocks and cpumask operations from the PLIC : driver's hot path. As far as I know, using the priority to mask : interrupts is an intended usage and will work on all existing : implementations. [...]" : . irqchip/sifive-plic: Separate the enable and mask operations irqchip/sifive-plic: Make better use of the effective affinity mask PCI: hv: Take a const cpumask in hv_compose_msi_req_get_cpu() genirq: Provide an IRQ affinity mask in non-SMP configs genirq: Return a const cpumask from irq_data_get_affinity_mask genirq: Add and use an irq_data_update_affinity helper genirq: Refactor accessors to use irq_data_get_affinity_mask genirq: Drop redundant irq_init_effective_affinity genirq: GENERIC_IRQ_EFFECTIVE_AFF_MASK depends on SMP genirq: GENERIC_IRQ_IPI depends on SMP irqchip/mips-gic: Only register IPI domain when SMP is enabled Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-07-09  Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  (Jakub Kicinski)
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-09  ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()  (Sven Schnelle)
CI reported the following splat while running the strace testsuite: WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_check_attach+0x12e/0x178 CPU: 1 PID: 3570031 Comm: strace Tainted: G OE 5.19.0-20220624.rc3.git0.ee819a77d4e7.300.fc36.s390x #1 Hardware name: IBM 3906 M04 704 (z/VM 7.1.0) Call Trace: [<00000000ab4b645a>] ptrace_check_attach+0x132/0x178 ([<00000000ab4b6450>] ptrace_check_attach+0x128/0x178) [<00000000ab4b6cde>] __s390x_sys_ptrace+0x86/0x160 [<00000000ac03fcec>] __do_syscall+0x1d4/0x200 [<00000000ac04e312>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<00000000ab4ea3c8>] wait_task_inactive+0x98/0x190 This is because JOBCTL_TRACED is set, but the task is not in TASK_TRACED state. Caused by ptrace_unfreeze_traced() which does: task->jobctl &= ~TASK_TRACED but it should be: task->jobctl &= ~JOBCTL_TRACED Fixes: 31cae1eaae4f ("sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
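For clarity, the one-line change described above, sketched in place (context omitted):

	/* before (buggy): TASK_TRACED is a task-state bit, not a jobctl bit */
	task->jobctl &= ~TASK_TRACED;

	/* after (the fix): clear the jobctl flag that was actually set */
	task->jobctl &= ~JOBCTL_TRACED;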
2022-07-08  Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (Jakub Kicinski)
Daniel Borkmann says: ==================== bpf 2022-07-08 We've added 3 non-merge commits during the last 2 day(s) which contain a total of 7 files changed, 40 insertions(+), 24 deletions(-). The main changes are: 1) Fix cBPF splat triggered by skb not having a mac header, from Eric Dumazet. 2) Fix spurious packet loss in generic XDP when pushing packets out (note that native XDP is not affected by the issue), from Johan Almbladh. 3) Fix bpf_dynptr_{read,write}() helper signatures with flag argument before its set in stone as UAPI, from Joanne Koong. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs bpf: Make sure mac_header was set before using it xdp: Fix spurious packet loss in generic XDP TX path ==================== Link: https://lore.kernel.org/r/20220708213418.19626-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08  bpf: Check attach_func_proto more carefully in check_return_code  (Stanislav Fomichev)
Syzkaller reports the following crash: RIP: 0010:check_return_code kernel/bpf/verifier.c:10575 [inline] RIP: 0010:do_check kernel/bpf/verifier.c:12346 [inline] RIP: 0010:do_check_common+0xb3d2/0xd250 kernel/bpf/verifier.c:14610 With the following reproducer: bpf$PROG_LOAD_XDP(0x5, &(0x7f00000004c0)={0xd, 0x3, &(0x7f0000000000)=ANY=[@ANYBLOB="1800000000000019000000000000000095"], &(0x7f0000000300)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80) Because we don't enforce expected_attach_type for XDP programs, we end up in hitting 'if (prog->expected_attach_type == BPF_LSM_CGROUP' part in check_return_code and follow up with testing `prog->aux->attach_func_proto->type`, but `prog->aux->attach_func_proto` is NULL. Add explicit prog_type check for the "Note, BPF_LSM_CGROUP that attach ..." condition. Also, don't skip return code check for LSM/STRUCT_OPS. The above actually brings an issue with existing selftest which tries to return EPERM from void inet_csk_clone. Fix the test (and move called_socket_clone to make sure it's not incremented in case of an error) and add a new one to explicitly verify this condition. Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor") Reported-by: syzbot+5cc0730bd4b4d2c5f152@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220708175000.2603078-1-sdf@google.com
2022-07-08  sysctl: Fix data races in proc_dointvec_jiffies().  (Kuniyuki Iwashima)
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec_jiffies() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec_jiffies() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
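A hedged sketch of the pattern this series of sysctl patches applies (illustrative helper, not the real proc handler code): the lockless load and store of the sysctl variable are annotated so neither side can be torn:

/* illustrative helper, not kernel code */
static void sketch_sysctl_access(int *valp, int *out, int newval, int write)
{
	if (write)
		WRITE_ONCE(*valp, newval);	/* was: *valp = newval; */
	else
		*out = READ_ONCE(*valp);	/* was: *out = *valp;   */
}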
2022-07-08  sysctl: Fix data races in proc_doulongvec_minmax().  (Kuniyuki Iwashima)
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_doulongvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_doulongvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08  sysctl: Fix data races in proc_douintvec_minmax().  (Kuniyuki Iwashima)
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_douintvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_douintvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 61d9b56a8920 ("sysctl: add unsigned int range support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08  sysctl: Fix data races in proc_dointvec_minmax().  (Kuniyuki Iwashima)
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08  sysctl: Fix data races in proc_douintvec().  (Kuniyuki Iwashima)
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_douintvec() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_douintvec() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08  sysctl: Fix data races in proc_dointvec().  (Kuniyuki Iwashima)
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08  bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs  (Joanne Koong)
Commit 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write") added the bpf_dynptr_write() and bpf_dynptr_read() APIs. However, it will be needed for some dynptr types to pass in flags as well (e.g. when writing to a skb, the user may like to invalidate the hash or recompute the checksum). This patch adds a "u64 flags" arg to the bpf_dynptr_read() and bpf_dynptr_write() APIs before their UAPI signature freezes where we then cannot change them anymore with a 5.19.x released kernel. Fixes: 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20220706232547.4016651-1-joannelkoong@gmail.com
2022-07-07  Documentation: update watch_queue.rst references  (Mauro Carvalho Chehab)
Changeset f5461124d59b ("Documentation: move watch_queue to core-api") renamed: Documentation/watch_queue.rst to: Documentation/core-api/watch_queue.rst. Update the cross-references accordingly. Fixes: f5461124d59b ("Documentation: move watch_queue to core-api") Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/1c220de9c58f35e815a3df9458ac2bea323c8bfb.1656234456.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-07-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07  bpf: Make sure mac_header was set before using it  (Eric Dumazet)
Classic BPF has a way to load bytes starting from the mac header. Some skbs do not have a mac header, and skb_mac_header() in this case is returning a pointer that 65535 bytes after skb->head. Existing range check in bpf_internal_load_pointer_neg_helper() was properly kicking and no illegal access was happening. New sanity check in skb_mac_header() is firing, so we need to avoid it. WARNING: CPU: 1 PID: 28990 at include/linux/skbuff.h:2785 skb_mac_header include/linux/skbuff.h:2785 [inline] WARNING: CPU: 1 PID: 28990 at include/linux/skbuff.h:2785 bpf_internal_load_pointer_neg_helper+0x1b1/0x1c0 kernel/bpf/core.c:74 Modules linked in: CPU: 1 PID: 28990 Comm: syz-executor.0 Not tainted 5.19.0-rc4-syzkaller-00865-g4874fb9484be #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022 RIP: 0010:skb_mac_header include/linux/skbuff.h:2785 [inline] RIP: 0010:bpf_internal_load_pointer_neg_helper+0x1b1/0x1c0 kernel/bpf/core.c:74 Code: ff ff 45 31 f6 e9 5a ff ff ff e8 aa 27 40 00 e9 3b ff ff ff e8 90 27 40 00 e9 df fe ff ff e8 86 27 40 00 eb 9e e8 2f 2c f3 ff <0f> 0b eb b1 e8 96 27 40 00 e9 79 fe ff ff 90 41 57 41 56 41 55 41 RSP: 0018:ffffc9000309f668 EFLAGS: 00010216 RAX: 0000000000000118 RBX: ffffffffffeff00c RCX: ffffc9000e417000 RDX: 0000000000040000 RSI: ffffffff81873f21 RDI: 0000000000000003 RBP: ffff8880842878c0 R08: 0000000000000003 R09: 000000000000ffff R10: 000000000000ffff R11: 0000000000000001 R12: 0000000000000004 R13: ffff88803ac56c00 R14: 000000000000ffff R15: dffffc0000000000 FS: 00007f5c88a16700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdaa9f6c058 CR3: 000000003a82c000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ____bpf_skb_load_helper_32 net/core/filter.c:276 [inline] bpf_skb_load_helper_32+0x191/0x220 net/core/filter.c:264 Fixes: f9aefd6b2aa3 ("net: warn if mac header was not set") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707123900.945305-1-edumazet@google.com
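A hedged sketch of the kind of guard this fix adds (skb_mac_header_was_set() is an existing helper; the surrounding code and 'offset' are illustrative): refuse to compute a mac-header-relative pointer when the skb never had its mac header set:

	if (!skb_mac_header_was_set(skb))
		return NULL;		/* no mac header: nothing sane to load */
	ptr = skb_mac_header(skb) + offset;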
2022-07-07  Merge tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds)
Pull networking fixes from Paolo Abeni: "Including fixes from bpf, netfilter, can, and bluetooth. Current release - regressions: - bluetooth: fix deadlock on hci_power_on_sync Previous releases - regressions: - sched: act_police: allow 'continue' action offload - eth: usbnet: fix memory leak in error case - eth: ibmvnic: properly dispose of all skbs during a failover Previous releases - always broken: - bpf: - fix insufficient bounds propagation from adjust_scalar_min_max_vals - clear page contiguity bit when unmapping pool - netfilter: nft_set_pipapo: release elements in clone from abort path - mptcp: netlink: issue MP_PRIO signals from userspace PMs - can: - rcar_canfd: fix data transmission failed on R-Car V3U - gs_usb: gs_usb_open/close(): fix memory leak Misc: - add Wenjia as SMC maintainer" * tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) wireguard: Kconfig: select CRYPTO_CHACHA_S390 crypto: s390 - do not depend on CRYPTO_HW for SIMD implementations wireguard: selftests: use microvm on x86 wireguard: selftests: always call kernel makefile wireguard: selftests: use virt machine on m68k wireguard: selftests: set fake real time in init r8169: fix accessing unset transport header net: rose: fix UAF bug caused by rose_t0timer_expiry usbnet: fix memory leak in error case Revert "tls: rx: move counting TlsDecryptErrors for sync" mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy mptcp: fix local endpoint accounting selftests: mptcp: userspace PM support for MP_PRIO signals mptcp: netlink: issue MP_PRIO signals from userspace PMs mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags mptcp: Avoid acquiring PM lock for subflow priority changes mptcp: fix locking in mptcp_nl_cmd_sf_destroy() net/mlx5e: Fix matchall police parameters validation net/sched: act_police: allow 'continue' action offload net: lan966x: hardcode the number of external ports ...
2022-07-07  signal handling: don't use BUG_ON() for debugging  (Linus Torvalds)
These are indeed "should not happen" situations, but it turns out recent changes made the 'task_is_stopped_or_traced()' case trigger (a fix for that exists and is pending more testing), and the BUG_ON() makes it unnecessarily hard to actually debug for no good reason. It's been that way for a long time, but let's make it clear: BUG_ON() is not good for debugging, and should never be used in situations where you could just say "this shouldn't happen, but we can continue". Use WARN_ON_ONCE() instead to make sure it gets logged, and then just continue running, instead of making the system basically unusable because you crashed the machine while potentially holding some very core locks (eg this function is commonly called while holding 'tasklist_lock' for writing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
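The shape of the change, sketched (the condition shown is illustrative, not the exact one in kernel/signal.c):

	/* before: take the whole machine down, possibly while holding tasklist_lock */
	BUG_ON(task_is_stopped_or_traced(p));

	/* after: log it once, then keep running */
	WARN_ON_ONCE(task_is_stopped_or_traced(p));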
2022-07-07  genirq: Return a const cpumask from irq_data_get_affinity_mask  (Samuel Holland)
Now that the irq_data_update_affinity helper exists, enforce its use by returning a const cpumask from irq_data_get_affinity_mask. Since the previous commit already updated places that needed to call irq_data_update_affinity, this commit updates the remaining code that either did not modify the cpumask or immediately passed the modified mask to irq_set_affinity. Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-8-samuel@sholland.org
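A hedged sketch of what callers look like after this change (variable names illustrative): the returned mask is read-only, and any modification goes through the helper introduced earlier in the series:

	const struct cpumask *mask = irq_data_get_affinity_mask(d);

	if (!cpumask_empty(mask))
		pr_debug("irq %u affinity: %*pbl\n", d->irq, cpumask_pr_args(mask));

	/* was: cpumask_copy(irq_data_get_affinity_mask(d), newmask); */
	irq_data_update_affinity(d, newmask);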
2022-07-07  genirq: Drop redundant irq_init_effective_affinity  (Samuel Holland)
It does exactly the same thing as irq_data_update_effective_affinity. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-5-samuel@sholland.org
2022-07-07  genirq: GENERIC_IRQ_EFFECTIVE_AFF_MASK depends on SMP  (Samuel Holland)
An IRQ's effective affinity can only be different from its configured affinity if there are multiple CPUs. Make it clear that this option is only meaningful when SMP is enabled. Most of the relevant code in irqdesc.c is already hidden behind CONFIG_SMP anyway. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-4-samuel@sholland.org
2022-07-07  genirq: GENERIC_IRQ_IPI depends on SMP  (Samuel Holland)
The generic IPI code depends on the IRQ affinity mask being allocated and initialized. This will not be the case if SMP is disabled. Fix up the remaining driver that selected GENERIC_IRQ_IPI in a non-SMP config. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-3-samuel@sholland.org
2022-07-07  genirq: Don't return error on missing optional irq_request_resources()  (Antonio Borneo)
Function irq_chip::irq_request_resources() is reported as optional in the declaration of struct irq_chip. If the parent irq_chip does not implement it, we should ignore it and return. Don't return an error if the function is missing. Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220512160544.13561-1-antonio.borneo@foss.st.com
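A hedged sketch of the described behaviour (the wrapper name is hypothetical): a missing optional callback is treated as success rather than an error:

static int sketch_request_resources(struct irq_data *data)
{
	struct irq_chip *chip = irq_data_get_irq_chip(data);

	if (!chip->irq_request_resources)
		return 0;	/* optional hook not implemented: nothing to do */

	return chip->irq_request_resources(data);
}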
2022-07-05  bpf, libbpf: Add type match support  (Daniel Müller)
This patch adds support for the proposed type match relation to relo_core where it is shared between userspace and kernel. It plumbs through both kernel-side and libbpf-side support. The matching relation is defined as follows (copy from source): - modifiers and typedefs are stripped (and, hence, effectively ignored) - generally speaking types need to be of same kind (struct vs. struct, union vs. union, etc.) - exceptions are struct/union behind a pointer which could also match a forward declaration of a struct or union, respectively, and enum vs. enum64 (see below) Then, depending on type: - integers: - match if size and signedness match - arrays & pointers: - target types are recursively matched - structs & unions: - local members need to exist in target with the same name - for each member we recursively check match unless it is already behind a pointer, in which case we only check matching names and compatible kind - enums: - local variants have to have a match in target by symbolic name (but not numeric value) - size has to match (but enum may match enum64 and vice versa) - function pointers: - number and position of arguments in local type has to match target - for each argument and the return value we recursively check match Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-5-deso@posteo.net
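A hedged illustration of the struct rule above (type names are made up): the local type matches the target because every local member exists in the target with the same name and a recursively matching type; extra target members and different ordering do not matter:

/* local type as seen by the BPF program */
struct pkt___local {
	unsigned int len;
	unsigned char *data;
};

/* target type on the kernel side: matches the local one under the relation above */
struct pkt {
	unsigned int len;
	unsigned int truesize;	/* extra member: allowed */
	unsigned char *data;
};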
2022-07-05  MAINTAINERS: Add Paul as context tracking maintainer  (Frederic Weisbecker)
Since most of the bits have been imported from kernel/rcu/tree.c and now that the context tracking code is tightly linked to RCU, add Paul as a context tracking maintainer. Also update the context tracking file header accordingly. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  context_tracking: Convert state to atomic_t  (Frederic Weisbecker)
Context tracking's state and dynticks counter are going to be merged in a single field so that both updates can happen atomically and at the same time. Prepare for that with converting the state into an atomic_t. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  rcu/context-tracking: Remove unused and/or unecessary middle functions  (Frederic Weisbecker)
Some eqs functions are now only used internally by context tracking, so their public declarations can be removed. Also middle functions such as rcu_user_*() and rcu_idle_*() which now directly call to rcu_eqs_enter() and rcu_eqs_exit() can be wiped out as well. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking  (Frederic Weisbecker)
Move the core RCU eqs/dynticks functions to context tracking so that we can later merge all that code within context tracking. Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  rcu/context-tracking: Move deferred nocb resched to context tracking  (Frederic Weisbecker)
To prepare for migrating the RCU eqs accounting code to context tracking, split the last-resort deferred nocb resched from rcu_user_enter() and move it into a separate call from context tracking. Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  rcu/context_tracking: Move dynticks_nmi_nesting to context tracking  (Frederic Weisbecker)
The RCU eqs tracking is going to be performed by the context tracking subsystem. The related nesting counters thus need to be moved to the context tracking structure. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  rcu/context_tracking: Move dynticks_nesting to context tracking  (Frederic Weisbecker)
The RCU eqs tracking is going to be performed by the context tracking subsystem. The related nesting counters thus need to be moved to the context tracking structure. Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  rcu/context_tracking: Move dynticks counter to context tracking  (Frederic Weisbecker)
In order to prepare for merging RCU dynticks counter into the context tracking state, move the rcu_data's dynticks field to the context tracking structure. It will later be mixed within the context tracking state itself. [ paulmck: Move enum ctx_state into global scope. ] Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  rcu/context-tracking: Remove rcu_irq_enter/exit()  (Frederic Weisbecker)
Now rcu_irq_enter/exit() is an unnecessary middle call between ct_irq_enter/exit() and nmi_irq_enter/exit(). Take this opportunity to remove the former functions and move the comments above them to the new entrypoints. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  context_tracking: Take NMI eqs entrypoints over RCU  (Frederic Weisbecker)
The RCU dynticks counter is going to be merged into the context tracking subsystem. Prepare with moving the NMI extended quiescent states entrypoints to context tracking. For now those are dumb redirection to existing RCU calls. Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  context_tracking: Take IRQ eqs entrypoints over RCU  (Frederic Weisbecker)
The RCU dynticks counter is going to be merged into the context tracking subsystem. Prepare with moving the IRQ extended quiescent states entrypoints to context tracking. For now those are dumb redirection to existing RCU calls. [ paulmck: Apply Stephen Rothwell feedback from -next. ] [ paulmck: Apply Nathan Chancellor feedback. ] Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05  context_tracking: Take idle eqs entrypoints over RCU  (Frederic Weisbecker)
The RCU dynticks counter is going to be merged into the context tracking subsystem. Start with moving the idle extended quiescent states entrypoints to context tracking. For now those are dumb redirections to existing RCU calls. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-04  sched/core: Use try_cmpxchg in set_nr_{and_not,if}_polling  (Uros Bizjak)
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) != old in set_nr_{and_not,if}_polling. x86 cmpxchg returns success in ZF flag, so this change saves a compare after cmpxchg. The definition of cmpxchg based fetch_or was changed in the same way as atomic_fetch_##op definitions were changed in e6790e4b5d5e97dc287f3496dd2cf2dbabdfdb35. Also declare these two functions as inline to ensure inlining. In the case of set_nr_and_not_polling, the compiler (gcc) tries to outsmart itself by constructing the boolean return value with logic operations on the fetched value, and these extra operations enlarge the function over the inlining threshold value. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220629151552.6015-1-ubizjak@gmail.com
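An illustrative before/after of the conversion (a plain sketch, not the scheduler code itself): try_cmpxchg() reports success directly (the ZF flag on x86) and refreshes the expected value on failure, so the caller no longer compares against the returned old value:

	/* before: compare the returned old value ourselves */
	if (cmpxchg(ptr, val, val | flag) != val)
		return false;

	/* after: boolean result; on failure 'val' is updated with the
	 * current contents of *ptr */
	if (!try_cmpxchg(ptr, &val, val | flag))
		return false;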
2022-07-04  sched/core: add forced idle accounting for cgroups  (Josh Don)
4feee7d1260 previously added per-task forced idle accounting. This patch extends this to also include cgroups. rstat is used for cgroup accounting, except for the root, which uses kcpustat in order to bypass the need for doing an rstat flush when reading root stats. Only cgroup v2 is supported. Similar to the task accounting, the cgroup accounting requires that schedstats is enabled. Signed-off-by: Josh Don <joshdon@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20220629211426.3329954-1-joshdon@google.com
2022-07-03  mm: shrinkers: provide shrinkers with names  (Roman Gushchin)
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
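A hedged example of the extended registration call described above (the shrinker and the name chosen are hypothetical); the printf-style name is what shows up under /sys/kernel/debug/shrinker/:

	err = register_shrinker(&myfs_shrinker, "myfs-inode:%s", sb->s_id);
	if (err)
		return err;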
2022-07-01  Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (Jakub Kicinski)
Daniel Borkmann says: ==================== pull-request: bpf 2022-07-02 We've added 7 non-merge commits during the last 14 day(s) which contain a total of 6 files changed, 193 insertions(+), 86 deletions(-). The main changes are: 1) Fix clearing of page contiguity when unmapping XSK pool, from Ivan Malov. 2) Two verifier fixes around bounds data propagation, from Daniel Borkmann. 3) Fix fprobe sample module's parameter descriptions, from Masami Hiramatsu. 4) General BPF maintainer entry revamp to better scale patch reviews. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Add verifier test case for jmp32's jeq/jne bpf, selftests: Add verifier test case for imm=0,umin=0,umax=1 scalar bpf: Fix insufficient bounds propagation from adjust_scalar_min_max_vals bpf: Fix incorrect verifier simulation around jmp32's jeq/jne xsk: Clear page contiguity bit when unmapping pool bpf, docs: Better scale maintenance of BPF subsystem fprobe, samples: Add module parameter descriptions ==================== Link: https://lore.kernel.org/r/20220701230121.10354-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-01  panic: Taint kernel if tests are run  (David Gow)
Most in-kernel tests (such as KUnit tests) are not supposed to run on production systems: they may do deliberately illegal things to trigger errors, and have security implications (for example, KUnit assertions will often deliberately leak kernel addresses). Add a new taint type, TAINT_TEST to signal that a test has been run. This will be printed as 'N' (originally for kuNit, as every other sensible letter was taken.) This should discourage people from running these tests on production systems, and to make it easier to tell if tests have been run accidentally (by loading the wrong configuration, etc.) Acked-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
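A hedged sketch of how a test harness might apply the new taint (LOCKDEP_STILL_OK simply means the taint does not invalidate lockdep state):

	/* record that an in-kernel test has been run on this kernel */
	add_taint(TAINT_TEST, LOCKDEP_STILL_OK);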
2022-07-01module: Fix "warning: variable 'exit' set but not used"Christophe Leroy
When CONFIG_MODULE_UNLOAD is not selected, 'exit' is set but never used. It is not possible to replace the #ifdef CONFIG_MODULE_UNLOAD by IS_ENABLED(CONFIG_MODULE_UNLOAD) because mod->exit doesn't exist when CONFIG_MODULE_UNLOAD is not selected. And because of the rcu_read_lock_sched() section it is not easy to regroup everything in a single #ifdef. Let's regroup partially and add missing #ifdef to completely opt out the use of 'exit' when CONFIG_MODULE_UNLOAD is not selected. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-07-01  module: Fix selfAssignment cppcheck warning  (Christophe Leroy)
cppcheck reports the following warnings: kernel/module/main.c:1455:26: warning: Redundant assignment of 'mod->core_layout.size' to itself. [selfAssignment] mod->core_layout.size = strict_align(mod->core_layout.size); ^ kernel/module/main.c:1489:26: warning: Redundant assignment of 'mod->init_layout.size' to itself. [selfAssignment] mod->init_layout.size = strict_align(mod->init_layout.size); ^ kernel/module/main.c:1493:26: warning: Redundant assignment of 'mod->init_layout.size' to itself. [selfAssignment] mod->init_layout.size = strict_align(mod->init_layout.size); ^ kernel/module/main.c:1504:26: warning: Redundant assignment of 'mod->init_layout.size' to itself. [selfAssignment] mod->init_layout.size = strict_align(mod->init_layout.size); ^ kernel/module/main.c:1459:26: warning: Redundant assignment of 'mod->data_layout.size' to itself. [selfAssignment] mod->data_layout.size = strict_align(mod->data_layout.size); ^ kernel/module/main.c:1463:26: warning: Redundant assignment of 'mod->data_layout.size' to itself. [selfAssignment] mod->data_layout.size = strict_align(mod->data_layout.size); ^ kernel/module/main.c:1467:26: warning: Redundant assignment of 'mod->data_layout.size' to itself. [selfAssignment] mod->data_layout.size = strict_align(mod->data_layout.size); ^ This is due to strict_align() being a no-op when CONFIG_STRICT_MODULE_RWX is not selected. Transform strict_align() macro into an inline function. It will allow type checking and avoid the selfAssignment warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
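A hedged sketch of the conversion described above, for the !CONFIG_STRICT_MODULE_RWX case (the real function may differ in detail): the no-op macro becomes a typed static inline, so x = strict_align(x) is no longer a literal self-assignment in the eyes of cppcheck:

/* before:
 *	#define strict_align(X) (X)
 */
static inline unsigned int strict_align(unsigned int size)
{
	return size;
}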
2022-07-01  modules: Fix corruption of /proc/kallsyms  (Adrian Hunter)
The commit 91fb02f31505 ("module: Move kallsyms support into a separate file") changed from using strlcpy() to using strscpy() which created a buffer overflow. That happened because: 1) an incorrect value was passed as the buffer length 2) strscpy() (unlike strlcpy()) may copy beyond the length of the input string when copying word-by-word. The assumption was that because it was already known that the strings being copied would fit in the space available, it was not necessary to correctly set the buffer length. strscpy() breaks that assumption because although it will not touch bytes beyond the given buffer length it may write bytes beyond the input string length when writing word-by-word. The result of the buffer overflow is to corrupt the symbol type information that follows. e.g. $ sudo cat -v /proc/kallsyms | grep '\^' | head ffffffffc0615000 ^@ rfcomm_session_get [rfcomm] ffffffffc061c060 ^@ session_list [rfcomm] ffffffffc06150d0 ^@ rfcomm_send_frame [rfcomm] ffffffffc0615130 ^@ rfcomm_make_uih [rfcomm] ffffffffc07ed58d ^@ bnep_exit [bnep] ffffffffc07ec000 ^@ bnep_rx_control [bnep] ffffffffc07ec1a0 ^@ bnep_session [bnep] ffffffffc07e7000 ^@ input_leds_event [input_leds] ffffffffc07e9000 ^@ input_leds_handler [input_leds] ffffffffc07e7010 ^@ input_leds_disconnect [input_leds] Notably, the null bytes (represented above by ^@) can confuse tools. Fix by correcting the buffer length. Fixes: 91fb02f31505 ("module: Move kallsyms support into a separate file") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
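A hedged sketch of the rule the fix restores (dst, dst_size and src are illustrative names): the count passed to strscpy() must be the space actually available at the destination, precisely because the word-by-word copy may write past the end of the source string up to that count:

	strscpy(dst, src, dst_size);	/* correct: count == space available at dst */

	/* The bug: a count larger than the space really left at dst (even
	 * though strlen(src) fits) lets the word-sized writes clobber the
	 * bytes that follow -- here, the /proc/kallsyms symbol type info. */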
2022-07-01  bpf: Fix insufficient bounds propagation from adjust_scalar_min_max_vals  (Daniel Borkmann)
Kuee reported a corner case where the tnum becomes constant after the call to __reg_bound_offset(), but the register's bounds are not, that is, its min bounds are still not equal to the register's max bounds. This in turn allows to leak pointers through turning a pointer register as is into an unknown scalar via adjust_ptr_min_max_vals(). Before: func#0 @0 0: R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) 0: (b7) r0 = 1 ; R0_w=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) 1: (b7) r3 = 0 ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) 2: (87) r3 = -r3 ; R3_w=scalar() 3: (87) r3 = -r3 ; R3_w=scalar() 4: (47) r3 |= 32767 ; R3_w=scalar(smin=-9223372036854743041,umin=32767,var_off=(0x7fff; 0xffffffffffff8000),s32_min=-2147450881) 5: (75) if r3 s>= 0x0 goto pc+1 ; R3_w=scalar(umin=9223372036854808575,var_off=(0x8000000000007fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767) 6: (95) exit from 5 to 7: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) 7: (d5) if r3 s<= 0x8000 goto pc+1 ; R3=scalar(umin=32769,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767) 8: (95) exit from 7 to 9: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=32768,var_off=(0x7fff; 0x8000)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) 9: (07) r3 += -32767 ; R3_w=scalar(imm=0,umax=1,var_off=(0x0; 0x0)) <--- [*] 10: (95) exit What can be seen here is that R3=scalar(umin=32767,umax=32768,var_off=(0x7fff; 0x8000)) after the operation R3 += -32767 results in a 'malformed' constant, that is, R3_w=scalar(imm=0,umax=1,var_off=(0x0; 0x0)). Intersecting with var_off has not been done at that point via __update_reg_bounds(), which would have improved the umax to be equal to umin. Refactor the tnum <> min/max bounds information flow into a reg_bounds_sync() helper and use it consistently everywhere. After the fix, bounds have been corrected to R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) and thus the register is regarded as a 'proper' constant scalar of 0. 
After: func#0 @0 0: R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) 0: (b7) r0 = 1 ; R0_w=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) 1: (b7) r3 = 0 ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) 2: (87) r3 = -r3 ; R3_w=scalar() 3: (87) r3 = -r3 ; R3_w=scalar() 4: (47) r3 |= 32767 ; R3_w=scalar(smin=-9223372036854743041,umin=32767,var_off=(0x7fff; 0xffffffffffff8000),s32_min=-2147450881) 5: (75) if r3 s>= 0x0 goto pc+1 ; R3_w=scalar(umin=9223372036854808575,var_off=(0x8000000000007fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767) 6: (95) exit from 5 to 7: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) 7: (d5) if r3 s<= 0x8000 goto pc+1 ; R3=scalar(umin=32769,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767) 8: (95) exit from 7 to 9: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=32768,var_off=(0x7fff; 0x8000)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) 9: (07) r3 += -32767 ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) <--- [*] 10: (95) exit Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values") Reported-by: Kuee K1r0a <liulin063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220701124727.11153-2-daniel@iogearbox.net
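Based on the description above, the new helper likely takes roughly this shape (a sketch reconstructed from the commit message, not a verbatim quote of verifier.c): the existing bound-propagation helpers run in sequence and the var_off intersection is folded back into the min/max bounds.

static void reg_bounds_sync(struct bpf_reg_state *reg)
{
	/* Learn new min/max bounds from var_off. */
	__update_reg_bounds(reg);
	/* Derive anything the sign bits now tell us. */
	__reg_deduce_bounds(reg);
	/* Fold the improved bounds back into var_off... */
	__reg_bound_offset(reg);
	/* ...and intersect once more so a constant tnum also ends up with
	 * umin == umax. */
	__update_reg_bounds(reg);
}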
2022-07-01  bpf: Fix incorrect verifier simulation around jmp32's jeq/jne  (Daniel Borkmann)
Kuee reported a quirk in the jmp32's jeq/jne simulation, namely that the register value does not match expectations for the fall-through path. For example: Before fix: 0: R1=ctx(off=0,imm=0) R10=fp0 0: (b7) r2 = 0 ; R2_w=P0 1: (b7) r6 = 563 ; R6_w=P563 2: (87) r2 = -r2 ; R2_w=Pscalar() 3: (87) r2 = -r2 ; R2_w=Pscalar() 4: (4c) w2 |= w6 ; R2_w=Pscalar(umin=563,umax=4294967295,var_off=(0x233; 0xfffffdcc),s32_min=-2147483085) R6_w=P563 5: (56) if w2 != 0x8 goto pc+1 ; R2_w=P571 <--- [*] 6: (95) exit R0 !read_ok After fix: 0: R1=ctx(off=0,imm=0) R10=fp0 0: (b7) r2 = 0 ; R2_w=P0 1: (b7) r6 = 563 ; R6_w=P563 2: (87) r2 = -r2 ; R2_w=Pscalar() 3: (87) r2 = -r2 ; R2_w=Pscalar() 4: (4c) w2 |= w6 ; R2_w=Pscalar(umin=563,umax=4294967295,var_off=(0x233; 0xfffffdcc),s32_min=-2147483085) R6_w=P563 5: (56) if w2 != 0x8 goto pc+1 ; R2_w=P8 <--- [*] 6: (95) exit R0 !read_ok As can be seen on line 5 for the branch fall-through path in R2 [*] is that given condition w2 != 0x8 is false, verifier should conclude that r2 = 8 as upper 32 bit are known to be zero. However, verifier incorrectly concludes that r2 = 571 which is far off. The problem is it only marks false{true}_reg as known in the switch for JE/NE case, but at the end of the function, it uses {false,true}_{64,32}off to update {false,true}_reg->var_off and they still hold the prior value of {false,true}_reg->var_off before it got marked as known. The subsequent __reg_combine_32_into_64() then propagates this old var_off and derives new bounds. The information between min/max bounds on {false,true}_reg from setting the register to known const combined with the {false,true}_reg->var_off based on the old information then derives wrong register data. Fix it by detangling the BPF_JEQ/BPF_JNE cases and updating relevant {false,true}_{64,32}off tnums along with the register marking to known constant. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Kuee K1r0a <liulin063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220701124727.11153-1-daniel@iogearbox.net