summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2023-01-18bpf: Fix off-by-one error in bpf_mem_cache_idx()Hou Tao
According to the definition of sizes[NUM_CACHES], when the size passed to bpf_mem_cache_size() is 256, it should return 6 instead 7. Fixes: 7c8199e24fa0 ("bpf: Introduce any context BPF specific memory allocator.") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230118084630.3750680-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-18kheaders: explicitly validate existence of cpio commandThomas Weißschuh
If the cpio command is not available the error emitted by gen_kheaders.so is not clear as all output of the call to cpio is discarded: GNU make 4.4: GEN kernel/kheaders_data.tar.xz find: 'standard output': Broken pipe find: write error make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127 make[1]: *** [scripts/Makefile.build:504: kernel] Error 2 GNU make < 4.4: GEN kernel/kheaders_data.tar.xz make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [scripts/Makefile.build:504: kernel] Error 2 Add an explicit check that will trigger a clear message about the issue: CHK kernel/kheaders_data.tar.xz ./kernel/gen_kheaders.sh: line 17: type: cpio: not found The other commands executed by gen_kheaders.sh are part of a standard installation, so they are not checked. Reported-by: Amy Parker <apark0006@student.cerritos.edu> Link: https://lore.kernel.org/lkml/CAPOgqxFva=tOuh1UitCSN38+28q3BNXKq19rEsVNPRzRqKqZ+g@mail.gmail.com/ Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-01-17Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2023-01-16 We've added 6 non-merge commits during the last 8 day(s) which contain a total of 6 files changed, 22 insertions(+), 24 deletions(-). The main changes are: 1) Mitigate a Spectre v4 leak in unprivileged BPF from speculative pointer-as-scalar type confusion, from Luis Gerhorst. 2) Fix a splat when pid 1 attaches a BPF program that attempts to send killing signal to itself, from Hao Sun. 3) Fix BPF program ID information in BPF_AUDIT_UNLOAD as well as PERF_BPF_EVENT_PROG_UNLOAD events, from Paul Moore. 4) Fix BPF verifier warning triggered from invalid kfunc call in backtrack_insn, also from Hao Sun. 5) Fix potential deadlock in htab_lock_bucket from same bucket index but different map_locked index, from Tonghao Zhang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation bpf: hash map, avoid deadlock with suitable hash mask bpf: remove the do_idr_lock parameter from bpf_prog_free_id() bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD bpf: Skip task with pid=1 in send_signal_common() bpf: Skip invalid kfunc call in backtrack_insn ==================== Link: https://lore.kernel.org/r/20230116230745.21742-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-17bpf: Do not allow to load sleepable BPF_TRACE_RAW_TP programJiri Olsa
Currently we allow to load any tracing program as sleepable, but BPF_TRACE_RAW_TP can't sleep. Making the check explicit for tracing programs attach types, so sleepable BPF_TRACE_RAW_TP will fail to load. Updating the verifier error to mention iter programs as well. Acked-by: Song Liu <song@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230117223705.440975-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-17genirq/msi: Free the fwnode created by msi_create_device_irq_domain()Jason Gunthorpe
msi_create_device_irq_domain() creates a firmware node for the new domain, which is never freed. kmemleak reports: unreferenced object 0xffff888120ba9a00 (size 96): comm "systemd-modules", pid 221, jiffies 4294893411 (age 635.732s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff ................ 00 00 00 00 00 00 00 00 18 9a ba 20 81 88 ff ff ........... .... backtrace: [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0 [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670 [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0 [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0 [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core] [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core] [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core] [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core] [<00000000df8efb43>] local_pci_probe+0xd6/0x170 [<0000000085cb9924>] pci_device_probe+0x231/0x6e0 Use the proper free operation for the firmware wnode so the name is freed during error unwind of msi_create_device_irq_domain() and also free the node in msi_remove_device_irq_domain() if it was automatically allocated. To avoid extra NULL pointer checks make irq_domain_free_fwnode() tolerant of NULL. Fixes: 27a6dea3ebaa ("genirq/msi: Provide msi_create/free_device_irq_domain()") Reported-by: Omri Barazi <obarazi@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kalle Valo <kvalo@kernel.org> Tested-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/0-v2-24af6665e2da+c9-msi_leak_jgg@nvidia.com
2023-01-16kernel/printk/printk.c: Fix W=1 kernel-doc warningAnuradha Weeraman
Fix W=1 kernel-doc warning: kernel/printk/printk.c: - Include function parameter in console_lock_spinning_disable_and_check() Signed-off-by: Anuradha Weeraman <anuradha@debian.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230116125635.374567-1-anuradha@debian.org
2023-01-16tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()John Ogness
Several mutexes are taken while setting up console serial ports. In particular, the tty_port->mutex and @console_mutex are taken: serial_pnp_probe serial8250_register_8250_port uart_add_one_port (locks tty_port->mutex) uart_configure_port register_console (locks @console_mutex) In order to synchronize kgdb's tty_find_polling_driver() with register_console(), commit 6193bc90849a ("tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()") takes the @console_mutex. However, this leads to the following call chain (with locking): platform_probe kgdboc_probe configure_kgdboc (locks @console_mutex) tty_find_polling_driver uart_poll_init (locks tty_port->mutex) uart_set_options This is clearly deadlock potential due to the reverse lock ordering. Since uart_set_options() requires holding @console_mutex in order to serialize early initialization of the serial-console lock, take the @console_mutex in uart_poll_init() instead of configure_kgdboc(). Since configure_kgdboc() was using @console_mutex for safe traversal of the console list, change it to use the SRCU iterator instead. Add comments to uart_set_options() kerneldoc mentioning that it requires holding @console_mutex (aka the console_list_lock). Fixes: 6193bc90849a ("tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [pmladek@suse.com: Export console_srcu_read_lock_is_held() to fix build kgdboc as a module.] Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230112161213.1434854-1-john.ogness@linutronix.de
2023-01-16sched/core: Fix NULL pointer access fault in sched_setaffinity() with ↵Waiman Long
non-SMP configs The kernel commit 9a5418bc48ba ("sched/core: Use kfree_rcu() in do_set_cpus_allowed()") introduces a bug for kernels built with non-SMP configs. Calling sched_setaffinity() on such a uniprocessor kernel will cause cpumask_copy() to be called with a NULL pointer leading to general protection fault. This is not really a problem in real use cases as there aren't that many uniprocessor kernel configs in use and calling sched_setaffinity() on such a uniprocessor system doesn't make sense. Fix this problem by making sure cpumask_copy() will not be called in such a case. Fixes: 9a5418bc48ba ("sched/core: Use kfree_rcu() in do_set_cpus_allowed()") Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230115193122.563036-1-longman@redhat.com
2023-01-14Merge tag 'modules-6.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module fix from Luis Chamberlain: "Just one fix for modules by Nick" * tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: Fix scheduling with interrupts disabled in self-test
2023-01-13kallsyms: Fix scheduling with interrupts disabled in self-testNicholas Piggin
kallsyms_on_each* may schedule so must not be called with interrupts disabled. The iteration function could disable interrupts, but this also changes lookup_symbol() to match the change to the other timing code. Reported-by: Erhard F. <erhard_f@mailbox.org> Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/ Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com Fixes: 30f3bb09778d ("kallsyms: Add self-test facility") Tested-by: "Erhard F." <erhard_f@mailbox.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-13bpf: Fix pointer-leak due to insufficient speculative store bypass mitigationLuis Gerhorst
To mitigate Spectre v4, 2039f26f3aca ("bpf: Fix leakage due to insufficient speculative store bypass mitigation") inserts lfence instructions after 1) initializing a stack slot and 2) spilling a pointer to the stack. However, this does not cover cases where a stack slot is first initialized with a pointer (subject to sanitization) but then overwritten with a scalar (not subject to sanitization because the slot was already initialized). In this case, the second write may be subject to speculative store bypass (SSB) creating a speculative pointer-as-scalar type confusion. This allows the program to subsequently leak the numerical pointer value using, for example, a branch-based cache side channel. To fix this, also sanitize scalars if they write a stack slot that previously contained a pointer. Assuming that pointer-spills are only generated by LLVM on register-pressure, the performance impact on most real-world BPF programs should be small. The following unprivileged BPF bytecode drafts a minimal exploit and the mitigation: [...] // r6 = 0 or 1 (skalar, unknown user input) // r7 = accessible ptr for side channel // r10 = frame pointer (fp), to be leaked // r9 = r10 # fp alias to encourage ssb *(u64 *)(r9 - 8) = r10 // fp[-8] = ptr, to be leaked // lfence added here because of pointer spill to stack. // // Ommitted: Dummy bpf_ringbuf_output() here to train alias predictor // for no r9-r10 dependency. // *(u64 *)(r10 - 8) = r6 // fp[-8] = scalar, overwrites ptr // 2039f26f3aca: no lfence added because stack slot was not STACK_INVALID, // store may be subject to SSB // // fix: also add an lfence when the slot contained a ptr // r8 = *(u64 *)(r9 - 8) // r8 = architecturally a scalar, speculatively a ptr // // leak ptr using branch-based cache side channel: r8 &= 1 // choose bit to leak if r8 == 0 goto SLOW // no mispredict // architecturally dead code if input r6 is 0, // only executes speculatively iff ptr bit is 1 r8 = *(u64 *)(r7 + 0) # encode bit in cache (0: slow, 1: fast) SLOW: [...] After running this, the program can time the access to *(r7 + 0) to determine whether the chosen pointer bit was 0 or 1. Repeat this 64 times to recover the whole address on amd64. In summary, sanitization can only be skipped if one scalar is overwritten with another scalar. Scalar-confusion due to speculative store bypass can not lead to invalid accesses because the pointer bounds deducted during verification are enforced using branchless logic. See 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") for details. Do not make the mitigation depend on !env->allow_{uninit_stack,ptr_leaks} because speculative leaks are likely unexpected if these were enabled. For example, leaking the address to a protected log file may be acceptable while disabling the mitigation might unintentionally leak the address into the cached-state of a map that is accessible to unprivileged processes. Fixes: 2039f26f3aca ("bpf: Fix leakage due to insufficient speculative store bypass mitigation") Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Henriette Hofmeier <henriette.hofmeier@rub.de> Link: https://lore.kernel.org/bpf/edc95bad-aada-9cfc-ffe2-fa9bb206583c@cs.fau.de Link: https://lore.kernel.org/bpf/20230109150544.41465-1-gerhorst@cs.fau.de
2023-01-13sched/fair: Fixes for capacity inversion detectionQais Yousef
Traversing the Perf Domains requires rcu_read_lock() to be held and is conditional on sched_energy_enabled(). Ensure right protections applied. Also skip capacity inversion detection for our own pd; which was an error. Fixes: 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20230112122708.330667-3-qyousef@layalina.io
2023-01-13sched/uclamp: Fix a uninitialized variable warningsQais Yousef
Addresses the following warnings: > config: riscv-randconfig-m031-20221111 > compiler: riscv64-linux-gcc (GCC) 12.1.0 > > smatch warnings: > kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_min'. > kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_max'. Fixes: 244226035a1f ("sched/uclamp: Fix fits_capacity() check in feec()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20230112122708.330667-2-qyousef@layalina.io
2023-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/usb/r8152.c be53771c87f4 ("r8152: add vendor/device ID pair for Microsoft Devkit") ec51fbd1b8a2 ("r8152: add USB device driver for config selection") https://lore.kernel.org/all/20230113113339.658c4723@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-12bpf: hash map, avoid deadlock with suitable hash maskTonghao Zhang
The deadlock still may occur while accessed in NMI and non-NMI context. Because in NMI, we still may access the same bucket but with different map_locked index. For example, on the same CPU, .max_entries = 2, we update the hash map, with key = 4, while running bpf prog in NMI nmi_handle(), to update hash map with key = 20, so it will have the same bucket index but have different map_locked index. To fix this issue, using min mask to hash again. Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked") Signed-off-by: Tonghao Zhang <tong@infragraf.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230111092903.92389-1-tong@infragraf.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-12Merge tag 'timers-urgent-2023-01-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer doc fixes from Ingo Molnar: - Fix various DocBook formatting errors in kernel/time/ that generated (justified) warnings during a kernel-doc build. * tag 'timers-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Fix various kernel-doc problems
2023-01-12Merge tag 'sched-urgent-2023-01-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix scheduler frequency invariance bug related to overly long tickless periods triggering an integer overflow and disabling the feature. - Fix use-after-free bug in dup_user_cpus_ptr(). - Fix do_set_cpus_allowed() deadlock scenarios related to calling kfree() with the pi_lock held. NOTE: the rcu_free() is the 'lazy' solution here - we looked at patches to free the structure after the pi_lock got dropped, but that looked quite a bit messier - and none of this is truly performance critical. We can revisit this if it's too lazy of a solution ... * tag 'sched-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Use kfree_rcu() in do_set_cpus_allowed() sched/core: Fix use-after-free bug in dup_user_cpus_ptr() sched/core: Fix arch_scale_freq_tick() on tickless systems
2023-01-10bpf: btf: limit logging of ignored BTF mismatchesConnor O'Brien
Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF mismatches are expected and module loading should proceed anyway. Logging with pr_warn() on every one of these "benign" mismatches creates unnecessary noise when many such modules are loaded. Instead, handle this case with a single log warning that BTF info may be unavailable. Mismatches also result in calls to __btf_verifier_log() via __btf_verifier_log_type() or btf_verifier_log_member(), adding several additional lines of logging per mismatched module. Add checks to these paths to skip logging for module BTF mismatches in the "allow mismatch" case. All existing logging behavior is preserved in the default CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case. Signed-off-by: Connor O'Brien <connoro@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230107025331.3240536-1-connoro@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-10Merge tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds
Pull xtensa fixes from Max Filippov: - fix xtensa allmodconfig build broken by the kcsan test - drop unused members of struct thread_struct * tag 'xtensa-20230110' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: drop unused members of struct thread_struct kcsan: test: don't put the expect array on the stack
2023-01-09bpf: remove the do_idr_lock parameter from bpf_prog_free_id()Paul Moore
It was determined that the do_idr_lock parameter to bpf_prog_free_id() was not necessary as it should always be true. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230106154400.74211-2-paul@paul-moore.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-09bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and ↵Paul Moore
PERF_BPF_EVENT_PROG_UNLOAD When changing the ebpf program put() routines to support being called from within IRQ context the program ID was reset to zero prior to calling the perf event and audit UNLOAD record generators, which resulted in problems as the ebpf program ID was bogus (always zero). This patch addresses this problem by removing an unnecessary call to bpf_prog_free_id() in __bpf_prog_offload_destroy() and adjusting __bpf_prog_put() to only call bpf_prog_free_id() after audit and perf have finished their bpf program unload tasks in bpf_prog_put_deferred(). For the record, no one can determine, or remember, why it was necessary to free the program ID, and remove it from the IDR, prior to executing bpf_prog_put_deferred(); regardless, both Stanislav and Alexei agree that the approach in this patch should be safe. It is worth noting that when moving the bpf_prog_free_id() call, the do_idr_lock parameter was forced to true as the ebpf devs determined this was the correct as the do_idr_lock should always be true. The do_idr_lock parameter will be removed in a follow-up patch, but it was kept here to keep the patch small in an effort to ease any stable backports. I also modified the bpf_audit_prog() logic used to associate the AUDIT_BPF record with other associated records, e.g. @ctx != NULL. Instead of keying off the operation, it now keys off the execution context, e.g. '!in_irg && !irqs_disabled()', which is much more appropriate and should help better connect the UNLOAD operations with the associated audit state (other audit records). Cc: stable@vger.kernel.org Fixes: d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.") Reported-by: Burn Alting <burn.alting@iinet.net.au> Reported-by: Jiri Olsa <olsajiri@gmail.com> Suggested-by: Stanislav Fomichev <sdf@google.com> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230106154400.74211-1-paul@paul-moore.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-09sched/core: Use kfree_rcu() in do_set_cpus_allowed()Waiman Long
Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()") may call kfree() if user_cpus_ptr was previously set. Unfortunately, some of the callers of do_set_cpus_allowed() may have pi_lock held when calling it. So the following splats may be printed especially when running with a PREEMPT_RT kernel: WARNING: possible circular locking dependency detected BUG: sleeping function called from invalid context To avoid these problems, kfree_rcu() is used instead. An internal cpumask_rcuhead union is created for the sole purpose of facilitating the use of kfree_rcu() to free the cpumask. Since user_cpus_ptr is not being used in non-SMP configs, the newly introduced alloc_user_cpus_ptr() helper will return NULL in this case and sched_setaffinity() is modified to handle this special case. Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221231041120.440785-3-longman@redhat.com
2023-01-09sched/core: Fix use-after-free bug in dup_user_cpus_ptr()Waiman Long
Since commit 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems"), the setting and clearing of user_cpus_ptr are done under pi_lock for arm64 architecture. However, dup_user_cpus_ptr() accesses user_cpus_ptr without any lock protection. Since sched_setaffinity() can be invoked from another process, the process being modified may be undergoing fork() at the same time. When racing with the clearing of user_cpus_ptr in __set_cpus_allowed_ptr_locked(), it can lead to user-after-free and possibly double-free in arm64 kernel. Commit 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") fixes this problem as user_cpus_ptr, once set, will never be cleared in a task's lifetime. However, this bug was re-introduced in commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()") which allows the clearing of user_cpus_ptr in do_set_cpus_allowed(). This time, it will affect all arches. Fix this bug by always clearing the user_cpus_ptr of the newly cloned/forked task before the copying process starts and check the user_cpus_ptr state of the source task under pi_lock. Note to stable, this patch won't be applicable to stable releases. Just copy the new dup_user_cpus_ptr() function over. Fixes: 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems") Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()") Reported-by: David Wang 王标 <wangbiao3@xiaomi.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221231041120.440785-2-longman@redhat.com
2023-01-07sched/core: Fix arch_scale_freq_tick() on tickless systemsYair Podemsky
In order for the scheduler to be frequency invariant we measure the ratio between the maximum CPU frequency and the actual CPU frequency. During long tickless periods of time the calculations that keep track of that might overflow, in the function scale_freq_tick(): if (check_shl_overflow(acnt, 2*SCHED_CAPACITY_SHIFT, &acnt)) goto error; eventually forcing the kernel to disable the feature for all CPUs, and show the warning message: "Scheduler frequency invariance went wobbly, disabling!". Let's avoid that by limiting the frequency invariant calculations to CPUs with regular tick. Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting") Suggested-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Yair Podemsky <ypodemsk@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Acked-by: Giovanni Gherdovich <ggherdovich@suse.cz> Link: https://lore.kernel.org/r/20221130125121.34407-1-ypodemsk@redhat.com
2023-01-06bpf: Skip task with pid=1 in send_signal_common()Hao Sun
The following kernel panic can be triggered when a task with pid=1 attaches a prog that attempts to send killing signal to itself, also see [1] for more details: Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106 panic+0x2c4/0x60f kernel/panic.c:275 do_exit.cold+0x63/0xe4 kernel/exit.c:789 do_group_exit+0xd4/0x2a0 kernel/exit.c:950 get_signal+0x2460/0x2600 kernel/signal.c:2858 arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd So skip task with pid=1 in bpf_send_signal_common() to avoid the panic. [1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com Signed-off-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com
2023-01-06bpf: Skip invalid kfunc call in backtrack_insnHao Sun
The verifier skips invalid kfunc call in check_kfunc_call(), which would be captured in fixup_kfunc_call() if such insn is not eliminated by dead code elimination. However, this can lead to the following warning in backtrack_insn(), also see [1]: ------------[ cut here ]------------ verifier backtracking bug WARNING: CPU: 6 PID: 8646 at kernel/bpf/verifier.c:2756 backtrack_insn kernel/bpf/verifier.c:2756 __mark_chain_precision kernel/bpf/verifier.c:3065 mark_chain_precision kernel/bpf/verifier.c:3165 adjust_reg_min_max_vals kernel/bpf/verifier.c:10715 check_alu_op kernel/bpf/verifier.c:10928 do_check kernel/bpf/verifier.c:13821 [inline] do_check_common kernel/bpf/verifier.c:16289 [...] So make backtracking conservative with this by returning ENOTSUPP. [1] https://lore.kernel.org/bpf/CACkBjsaXNceR8ZjkLG=dT3P=4A8SBsg0Z5h5PWLryF5=ghKq=g@mail.gmail.com/ Reported-by: syzbot+4da3ff23081bafe74fc2@syzkaller.appspotmail.com Signed-off-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230104014709.9375-1-sunhao.th@gmail.com
2023-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-05Merge tag 'net-6.2-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, wifi, and netfilter. Current release - regressions: - bpf: fix nullness propagation for reg to reg comparisons, avoid null-deref - inet: control sockets should not use current thread task_frag - bpf: always use maximal size for copy_array() - eth: bnxt_en: don't link netdev to a devlink port for VFs Current release - new code bugs: - rxrpc: fix a couple of potential use-after-frees - netfilter: conntrack: fix IPv6 exthdr error check - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes - eth: dsa: qca8k: various fixes for the in-band register access - eth: nfp: fix schedule in atomic context when sync mc address - eth: renesas: rswitch: fix getting mac address from device tree - mobile: ipa: use proper endpoint mask for suspend Previous releases - regressions: - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by Jiri / python tests - net: tc: don't intepret cls results when asked to drop, fix oob-access - vrf: determine the dst using the original ifindex for multicast - eth: bnxt_en: - fix XDP RX path if BPF adjusted packet length - fix HDS (header placement) and jumbo thresholds for RX packets - eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf, avoid memory corruptions Previous releases - always broken: - ulp: prevent ULP without clone op from entering the LISTEN status - veth: fix race with AF_XDP exposing old or uninitialized descriptors - bpf: - pull before calling skb_postpull_rcsum() (fix checksum support and avoid a WARN()) - fix panic due to wrong pageattr of im->image (when livepatch and kretfunc coexist) - keep a reference to the mm, in case the task is dead - mptcp: fix deadlock in fastopen error path - netfilter: - nf_tables: perform type checking for existing sets - nf_tables: honor set timeout and garbage collection updates - ipset: fix hash:net,port,net hang with /0 subnet - ipset: avoid hung task warning when adding/deleting entries - selftests: net: - fix cmsg_so_mark.sh test hang on non-x86 systems - fix the arp_ndisc_evict_nocarrier test for IPv6 - usb: rndis_host: secure rndis_query check against int overflow - eth: r8169: fix dmar pte write access during suspend/resume with WOL - eth: lan966x: fix configuration of the PCS - eth: sparx5: fix reading of the MAC address - eth: qed: allow sleep in qed_mcp_trace_dump() - eth: hns3: - fix interrupts re-initialization after VF FLR - fix handling of promisc when MAC addr table gets full - refine the handling for VF heartbeat - eth: mlx5: - properly handle ingress QinQ-tagged packets on VST - fix io_eq_size and event_eq_size params validation on big endian - fix RoCE setting at HCA level if not supported at all - don't turn CQE compression on by default for IPoIB - eth: ena: - fix toeplitz initial hash key value - account for the number of XDP-processed bytes in interface stats - fix rx_copybreak value update Misc: - ethtool: harden phy stat handling against buggy drivers - docs: netdev: convert maintainer's doc from FAQ to a normal document" * tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits) caif: fix memory leak in cfctrl_linkup_request() inet: control sockets should not use current thread task_frag net/ulp: prevent ULP without clone op from entering the LISTEN status qed: allow sleep in qed_mcp_trace_dump() MAINTAINERS: Update maintainers for ptp_vmw driver usb: rndis_host: Secure rndis_query check against int overflow net: dpaa: Fix dtsec check for PCS availability octeontx2-pf: Fix lmtst ID used in aura free drivers/net/bonding/bond_3ad: return when there's no aggregator netfilter: ipset: Rework long task execution when adding/deleting entries netfilter: ipset: fix hash:net,port,net hang with /0 subnet net: sparx5: Fix reading of the MAC address vxlan: Fix memory leaks in error path net: sched: htb: fix htb_classify() kernel-doc net: sched: cbq: dont intepret cls results when asked to drop net: sched: atm: dont intepret cls results when asked to drop dt-bindings: net: marvell,orion-mdio: Fix examples dt-bindings: net: sun8i-emac: Add phy-supply property net: ipa: use proper endpoint mask for suspend selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier ...
2023-01-04Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-04 We've added 45 non-merge commits during the last 21 day(s) which contain a total of 50 files changed, 1454 insertions(+), 375 deletions(-). The main changes are: 1) Fixes, improvements and refactoring of parts of BPF verifier's state equivalence checks, from Andrii Nakryiko. 2) Fix a few corner cases in libbpf's BTF-to-C converter in particular around padding handling and enums, also from Andrii Nakryiko. 3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata, from Christian Ehrig. 4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks, from Dave Marchevsky. 5) Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers, from Jiri Olsa. 6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps, from Maryam Tahhan. 7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du. 8) Bigger batch of improvements to BPF tracing code samples, from Daniel T. Lee. 9) Add LoongArch support to libbpf's bpf_tracing helper header, from Hengqi Chen. 10) Fix a libbpf compiler warning in perf_event_open_probe on arm32, from Khem Raj. 11) Optimize bpf_local_storage_elem by removing 56 bytes of padding, from Martin KaFai Lau. 12) Use pkg-config to locate libelf for resolve_btfids build, from Shen Jiamin. 13) Various libbpf improvements around API documentation and errno handling, from Xin Liu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) libbpf: Return -ENODATA for missing btf section libbpf: Add LoongArch support to bpf_tracing.h libbpf: Restore errno after pr_warn. libbpf: Added the description of some API functions libbpf: Fix invalid return address register in s390 samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro samples/bpf: Change _kern suffix to .bpf with syscall tracing program samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program samples/bpf: Use kyscall instead of kprobe in syscall tracing program bpf: rename list_head -> graph_root in field info types libbpf: fix errno is overwritten after being closed. bpf: fix regs_exact() logic in regsafe() to remap IDs correctly bpf: perform byte-by-byte comparison only when necessary in regsafe() bpf: reject non-exact register type matches in regsafe() bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule bpf: reorganize struct bpf_reg_state fields bpf: teach refsafe() to take into account ID remapping bpf: Remove unused field initialization in bpf's ctl_table selftests/bpf: Add jit probe_mem corner case tests to s390x denylist ... ==================== Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-03time: Fix various kernel-doc problemsRandy Dunlap
Clean up kernel-doc complaints about function names and non-kernel-doc comments in kernel/time/. Fixes these warnings: kernel/time/time.c:479: warning: expecting prototype for set_normalized_timespec(). Prototype was for set_normalized_timespec64() instead kernel/time/time.c:553: warning: expecting prototype for msecs_to_jiffies(). Prototype was for __msecs_to_jiffies() instead kernel/time/timekeeping.c:1595: warning: contents before sections kernel/time/timekeeping.c:1705: warning: This comment starts with '/**', but isn't a kernel-doc comment. * We have three kinds of time sources to use for sleep time kernel/time/timekeeping.c:1726: warning: This comment starts with '/**', but isn't a kernel-doc comment. * 1) can be determined whether to use or not only when doing kernel/time/tick-oneshot.c:21: warning: missing initial short description on line: * tick_program_event kernel/time/tick-oneshot.c:107: warning: expecting prototype for tick_check_oneshot_mode(). Prototype was for tick_oneshot_mode_active() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230103032849.12723-1-rdunlap@infradead.org
2023-01-02kcsan: test: don't put the expect array on the stackMax Filippov
Size of the 'expect' array in the __report_matches is 1536 bytes, which is exactly the default frame size warning limit of the xtensa architecture. As a result allmodconfig xtensa kernel builds with the gcc that does not support the compiler plugins (which otherwise would push the said warning limit to 2K) fail with the following message: kernel/kcsan/kcsan_test.c:257:1: error: the frame size of 1680 bytes is larger than 1536 bytes Fix it by dynamically allocating the 'expect' array. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com>
2023-01-01Merge tag 'perf_urgent_for_v6.2_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Pass only an initialized perf event attribute to the LSM hook - Fix a use-after-free on the perf syscall's error path - A potential integer overflow fix in amd_core_pmu_init() - Fix the cgroup events tracking after the context handling rewrite - Return the proper value from the inherit_event() function on error * tag 'perf_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Call LSM hook after copying perf_event_attr perf: Fix use-after-free in error path perf/x86/amd: fix potential integer overflow on shift of a int perf/core: Fix cgroup events tracking perf core: Return error pointer if inherit_event() fails to find pmu_ctx
2023-01-01Merge tag 'locking_urgent_for_v6.2_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Prevent the leaking of a debug timer in futex_waitv() - A preempt-RT mutex locking fix, adding the proper acquire semantics * tag 'locking_urgent_for_v6.2_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
2022-12-28bpf: rename list_head -> graph_root in field info typesDave Marchevsky
Many of the structs recently added to track field info for linked-list head are useful as-is for rbtree root. So let's do a mechanical renaming of list_head-related types and fields: include/linux/bpf.h: struct btf_field_list_head -> struct btf_field_graph_root list_head -> graph_root in struct btf_field union kernel/bpf/btf.c: list_head -> graph_root in struct btf_field_info This is a nonfunctional change, functionality to actually use these fields for rbtree will be added in further patches. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20221217082506.1570898-5-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-28bpf: Always use maximal size for copy_array()Kees Cook
Instead of counting on prior allocations to have sized allocations to the next kmalloc bucket size, always perform a krealloc that is at least ksize(dst) in size (which is a no-op), so the size can be correctly tracked by all the various allocation size trackers (KASAN, __alloc_size, etc). Reported-by: Hyunwoo Kim <v4bel@theori.io> Link: https://lore.kernel.org/bpf/20221223094551.GA1439509@ubuntu Fixes: ceb35b666d42 ("bpf/verifier: Use kmalloc_size_roundup() to match ksize() usage") Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221223182836.never.866-kees@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-28bpf: keep a reference to the mm, in case the task is dead.Kui-Feng Lee
Fix the system crash that happens when a task iterator travel through vma of tasks. In task iterators, we used to access mm by following the pointer on the task_struct; however, the death of a task will clear the pointer, even though we still hold the task_struct. That can cause an unexpected crash for a null pointer when an iterator is visiting a task that dies during the visit. Keeping a reference of mm on the iterator ensures we always have a valid pointer to mm. Co-developed-by: Song Liu <song@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Reported-by: Nathan Slingerland <slinger@meta.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221216221855.4122288-2-kuifeng@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-28bpf: Fix panic due to wrong pageattr of im->imageChuang Wang
In the scenario where livepatch and kretfunc coexist, the pageattr of im->image is rox after arch_prepare_bpf_trampoline in bpf_trampoline_update, and then modify_fentry or register_fentry returns -EAGAIN from bpf_tramp_ftrace_ops_func, the BPF_TRAMP_F_ORIG_STACK flag will be configured, and arch_prepare_bpf_trampoline will be re-executed. At this time, because the pageattr of im->image is rox, arch_prepare_bpf_trampoline will read and write im->image, which causes a fault. as follows: insmod livepatch-sample.ko # samples/livepatch/livepatch-sample.c bpftrace -e 'kretfunc:cmdline_proc_show {}' BUG: unable to handle page fault for address: ffffffffa0206000 PGD 322d067 P4D 322d067 PUD 322e063 PMD 1297e067 PTE d428061 Oops: 0003 [#1] PREEMPT SMP PTI CPU: 2 PID: 270 Comm: bpftrace Tainted: G E K 6.1.0 #5 RIP: 0010:arch_prepare_bpf_trampoline+0xed/0x8c0 RSP: 0018:ffffc90001083ad8 EFLAGS: 00010202 RAX: ffffffffa0206000 RBX: 0000000000000020 RCX: 0000000000000000 RDX: ffffffffa0206001 RSI: ffffffffa0206000 RDI: 0000000000000030 RBP: ffffc90001083b70 R08: 0000000000000066 R09: ffff88800f51b400 R10: 000000002e72c6e5 R11: 00000000d0a15080 R12: ffff8880110a68c8 R13: 0000000000000000 R14: ffff88800f51b400 R15: ffffffff814fec10 FS: 00007f87bc0dc780(0000) GS:ffff88803e600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa0206000 CR3: 0000000010b70000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> bpf_trampoline_update+0x25a/0x6b0 __bpf_trampoline_link_prog+0x101/0x240 bpf_trampoline_link_prog+0x2d/0x50 bpf_tracing_prog_attach+0x24c/0x530 bpf_raw_tp_link_attach+0x73/0x1d0 __sys_bpf+0x100e/0x2570 __x64_sys_bpf+0x1c/0x30 do_syscall_64+0x5b/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd With this patch, when modify_fentry or register_fentry returns -EAGAIN from bpf_tramp_ftrace_ops_func, the pageattr of im->image will be reset to nx+rw. Cc: stable@vger.kernel.org Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)") Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20221224133146.780578-1-nashuiliang@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27bpf: fix regs_exact() logic in regsafe() to remap IDs correctlyAndrii Nakryiko
Comparing IDs exactly between two separate states is not just suboptimal, but also incorrect in some cases. So update regs_exact() check to do byte-by-byte memcmp() only up to id/ref_obj_id. For id and ref_obj_id perform proper check_ids() checks, taking into account idmap. This change makes more states equivalent improving insns and states stats across a bunch of selftest BPF programs: File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ------------------------------------------- -------------------------------- --------- --------- -------------- ---------- ---------- ------------- cgrp_kfunc_success.bpf.linked1.o test_cgrp_get_release 141 137 -4 (-2.84%) 13 13 +0 (+0.00%) cgrp_kfunc_success.bpf.linked1.o test_cgrp_xchg_release 142 139 -3 (-2.11%) 14 13 -1 (-7.14%) connect6_prog.bpf.linked1.o connect_v6_prog 139 102 -37 (-26.62%) 9 6 -3 (-33.33%) ima.bpf.linked1.o bprm_creds_for_exec 68 61 -7 (-10.29%) 6 5 -1 (-16.67%) linked_list.bpf.linked1.o global_list_in_list 569 499 -70 (-12.30%) 60 52 -8 (-13.33%) linked_list.bpf.linked1.o global_list_push_pop 167 150 -17 (-10.18%) 18 16 -2 (-11.11%) linked_list.bpf.linked1.o global_list_push_pop_multiple 881 815 -66 (-7.49%) 74 63 -11 (-14.86%) linked_list.bpf.linked1.o inner_map_list_in_list 579 534 -45 (-7.77%) 61 55 -6 (-9.84%) linked_list.bpf.linked1.o inner_map_list_push_pop 190 181 -9 (-4.74%) 19 18 -1 (-5.26%) linked_list.bpf.linked1.o inner_map_list_push_pop_multiple 916 850 -66 (-7.21%) 75 64 -11 (-14.67%) linked_list.bpf.linked1.o map_list_in_list 588 525 -63 (-10.71%) 62 55 -7 (-11.29%) linked_list.bpf.linked1.o map_list_push_pop 183 174 -9 (-4.92%) 18 17 -1 (-5.56%) linked_list.bpf.linked1.o map_list_push_pop_multiple 909 843 -66 (-7.26%) 75 64 -11 (-14.67%) map_kptr.bpf.linked1.o test_map_kptr 264 256 -8 (-3.03%) 26 26 +0 (+0.00%) map_kptr.bpf.linked1.o test_map_kptr_ref 95 91 -4 (-4.21%) 9 8 -1 (-11.11%) task_kfunc_success.bpf.linked1.o test_task_xchg_release 139 136 -3 (-2.16%) 14 13 -1 (-7.14%) test_bpf_nf.bpf.linked1.o nf_skb_ct_test 815 509 -306 (-37.55%) 57 30 -27 (-47.37%) test_bpf_nf.bpf.linked1.o nf_xdp_ct_test 815 509 -306 (-37.55%) 57 30 -27 (-47.37%) test_cls_redirect.bpf.linked1.o cls_redirect 78925 78390 -535 (-0.68%) 4782 4704 -78 (-1.63%) test_cls_redirect_subprogs.bpf.linked1.o cls_redirect 64901 63897 -1004 (-1.55%) 4612 4470 -142 (-3.08%) test_sk_lookup.bpf.linked1.o access_ctx_sk 181 95 -86 (-47.51%) 19 10 -9 (-47.37%) test_sk_lookup.bpf.linked1.o ctx_narrow_access 447 437 -10 (-2.24%) 38 37 -1 (-2.63%) test_sk_lookup_kern.bpf.linked1.o sk_lookup_success 148 133 -15 (-10.14%) 14 12 -2 (-14.29%) test_tcp_check_syncookie_kern.bpf.linked1.o check_syncookie_clsact 304 300 -4 (-1.32%) 23 22 -1 (-4.35%) test_tcp_check_syncookie_kern.bpf.linked1.o check_syncookie_xdp 304 300 -4 (-1.32%) 23 22 -1 (-4.35%) test_verify_pkcs7_sig.bpf.linked1.o bpf 87 76 -11 (-12.64%) 7 6 -1 (-14.29%) ------------------------------------------- -------------------------------- --------- --------- -------------- ---------- ---------- ------------- Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221223054921.958283-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27bpf: perform byte-by-byte comparison only when necessary in regsafe()Andrii Nakryiko
Extract byte-by-byte comparison of bpf_reg_state in regsafe() into a helper function, which makes it more convenient to use it "on demand" only for registers that benefit from such checks, instead of doing it all the time, even if result of such comparison is ignored. Also, remove WARN_ON_ONCE(1)+return false dead code. There is no risk of missing some case as compiler will warn about non-void function not returning value in some branches (and that under assumption that default case is removed in the future). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221223054921.958283-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27bpf: reject non-exact register type matches in regsafe()Andrii Nakryiko
Generalize the (somewhat implicit) rule of regsafe(), which states that if register types in old and current states do not match *exactly*, they can't be safely considered equivalent. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221223054921.958283-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27bpf: generalize MAYBE_NULL vs non-MAYBE_NULL ruleAndrii Nakryiko
Make generic check to prevent XXX_OR_NULL and XXX register types to be intermixed. While technically in some situations it could be safe, it's impossible to enforce due to the loss of an ID when converting XXX_OR_NULL to its non-NULL variant. So prevent this in general, not just for PTR_TO_MAP_KEY and PTR_TO_MAP_VALUE. PTR_TO_MAP_KEY_OR_NULL and PTR_TO_MAP_VALUE_OR_NULL checks, which were previously special-cased, are simplified to generic check that takes into account range_within() and tnum_in(). This is correct as BPF verifier doesn't allow arithmetic on XXX_OR_NULL register types, so var_off and ranges should stay zero. But even if in the future this restriction is lifted, it's even more important to enforce that var_off and ranges are compatible, otherwise it's possible to construct case where this can be exploited to bypass verifier's memory range safety checks. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221223054921.958283-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27bpf: reorganize struct bpf_reg_state fieldsAndrii Nakryiko
Move id and ref_obj_id fields after scalar data section (var_off and ranges). This is necessary to simplify next patch which will change regsafe()'s logic to be safer, as it makes the contents that has to be an exact match (type-specific parts, off, type, and var_off+ranges) a single sequential block of memory, while id and ref_obj_id should always be remapped and thus can't be memcp()'ed. There are few places that assume that var_off is after id/ref_obj_id to clear out id/ref_obj_id with the single memset(0). These are changed to explicitly zero-out id/ref_obj_id fields. Other places are adjusted to preserve exact byte-by-byte comparison behavior. No functional changes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221223054921.958283-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27bpf: teach refsafe() to take into account ID remappingAndrii Nakryiko
states_equal() check performs ID mapping between old and new states to establish a 1-to-1 correspondence between IDs, even if their absolute numberic values across two equivalent states differ. This is important both for correctness and to avoid unnecessary work when two states are equivalent. With recent changes we partially fixed this logic by maintaining ID map across all function frames. This patch also makes refsafe() check take into account (and maintain) ID map, making states_equal() behavior more optimal and correct. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221223054921.958283-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-12-27futex: Fix futex_waitv() hrtimer debug object leak on kcalloc errorMathieu Desnoyers
In a scenario where kcalloc() fails to allocate memory, the futex_waitv system call immediately returns -ENOMEM without invoking destroy_hrtimer_on_stack(). When CONFIG_DEBUG_OBJECTS_TIMERS=y, this results in leaking a timer debug object. Fixes: bf69bad38cf6 ("futex: Implement sys_futex_waitv()") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org # v5.16+ Link: https://lore.kernel.org/r/20221214222008.200393-1-mathieu.desnoyers@efficios.com
2022-12-27perf/core: Call LSM hook after copying perf_event_attrNamhyung Kim
It passes the attr struct to the security_perf_event_open() but it's not initialized yet. Fixes: da97e18458fb ("perf_event: Add support for LSM and SELinux checks") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org
2022-12-27perf: Fix use-after-free in error pathPeter Zijlstra
The syscall error path has a use-after-free; put_pmu_ctx() will reference ctx, therefore we must ensure ctx is destroyed after pmu_ctx is. Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: syzbot+b8e8c01c8ade4fe6e48f@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lkml.kernel.org/r/Y6B3xEgkbmFUCeni@hirez.programming.kicks-ass.net
2022-12-27perf/core: Fix cgroup events trackingChengming Zhou
We encounter perf warnings when using cgroup events like: cd /sys/fs/cgroup mkdir test perf stat -e cycles -a -G test Which then triggers: WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0 Call Trace: <TASK> __schedule+0x4ae/0x9f0 ? _raw_spin_unlock_irqrestore+0x23/0x40 ? __cond_resched+0x18/0x20 preempt_schedule_common+0x2d/0x70 __cond_resched+0x18/0x20 wait_for_completion+0x2f/0x160 ? cpu_stop_queue_work+0x9e/0x130 affine_move_task+0x18a/0x4f0 WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0 Call Trace: <TASK> ? ctx_sched_out+0xb7/0x1b0 perf_cgroup_switch+0x88/0xc0 __schedule+0x4ae/0x9f0 ? _raw_spin_unlock_irqrestore+0x23/0x40 ? __cond_resched+0x18/0x20 preempt_schedule_common+0x2d/0x70 __cond_resched+0x18/0x20 wait_for_completion+0x2f/0x160 ? cpu_stop_queue_work+0x9e/0x130 affine_move_task+0x18a/0x4f0 The above two warnings are not complete here since I remove other unimportant information. The problem is caused by the perf cgroup events tracking: CPU0 CPU1 perf_event_open() perf_event_alloc() account_event() account_event_cpu() atomic_inc(perf_cgroup_events) __perf_event_task_sched_out() if (atomic_read(perf_cgroup_events)) perf_cgroup_switch() // kernel/events/core.c:849 WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0) if (READ_ONCE(cpuctx->cgrp) == cgrp) // false return perf_ctx_lock() ctx_sched_out() cpuctx->cgrp = cgrp ctx_sched_in() perf_cgroup_set_timestamp() // kernel/events/core.c:829 WARN_ON_ONCE(!ctx->nr_cgroups) perf_ctx_unlock() perf_install_in_context() cpu_function_call() __perf_install_in_context() add_event_to_ctx() list_add_event() perf_cgroup_event_enable() ctx->nr_cgroups++ cpuctx->cgrp = X We can see from above that we wrongly use percpu atomic perf_cgroup_events to check if we need to perf_cgroup_switch(), which should only be used when we know this CPU has cgroup events enabled. The commit bd2756811766 ("perf: Rewrite core context handling") change to have only one context per-CPU, so we can just use cpuctx->cgrp to check if this CPU has cgroup events enabled. So percpu atomic perf_cgroup_events is not needed. Fixes: bd2756811766 ("perf: Rewrite core context handling") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lkml.kernel.org/r/20221207124023.66252-1-zhouchengming@bytedance.com
2022-12-27perf core: Return error pointer if inherit_event() fails to find pmu_ctxRavi Bangoria
inherit_event() returns NULL only when it finds orphaned events otherwise it returns either valid child_event pointer or an error pointer. Follow the same when it fails to find pmu_ctx. Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221118051539.820-1-ravi.bangoria@amd.com
2022-12-24Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net* tree. We've added 7 non-merge commits during the last 5 day(s) which contain a total of 11 files changed, 231 insertions(+), 3 deletions(-). The main changes are: 1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case from Martin Lau. 2) Fix BPF verifier's nullness propagation when registers are of type PTR_TO_BTF_ID, from Hao Sun. 3) Fix bpftool build for JIT disassembler under statically built libllvm, from Anton Protopopov. 4) Fix warnings reported by resolve_btfids when building vmlinux with CONFIG_SECURITY_NETWORK disabled, from Hou Tao. 5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-23Merge tag 'hardening-v6.2-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening fixes from Kees Cook: - Fix CFI failure with KASAN (Sami Tolvanen) - Fix LKDTM + CFI under GCC 7 and 8 (Kristina Martsenko) - Limit CONFIG_ZERO_CALL_USED_REGS to Clang > 15.0.6 (Nathan Chancellor) - Ignore "contents" argument in LoadPin's LSM hook handling - Fix paste-o in /sys/kernel/warn_count API docs - Use READ_ONCE() consistently for oops/warn limit reading * tag 'hardening-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: cfi: Fix CFI failure with KASAN exit: Use READ_ONCE() for all oops/warn limit reads security: Restrict CONFIG_ZERO_CALL_USED_REGS to gcc or clang > 15.0.6 lkdtm: cfi: Make PAC test work with GCC 7 and 8 docs: Fix path paste-o for /sys/kernel/warn_count LoadPin: Ignore the "contents" argument of the LSM hooks