summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-11-01tracing/histogram: Document hist trigger variablesKalesh Singh
Update the tracefs README to describe how hist trigger variables can be created. Link: https://lkml.kernel.org/r/20211029183339.3216491-4-kaleshsingh@google.com Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/histogram: Update division by 0 documentationKalesh Singh
If the divisor is a constant and zero, the undeifned case can be detected and an error returned instead of -1. Link: https://lkml.kernel.org/r/20211029183339.3216491-3-kaleshsingh@google.com Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/histogram: Optimize division by constantsKalesh Singh
If the divisor is a constant use specific division functions to avoid extra branches when the trigger is hit. If the divisor constant but not a power of 2, the division can be replaced with a multiplication and shift in the following case: Let X = dividend and Y = divisor. Choose Z = some power of 2. If Y <= Z, then: X / Y = (X * (Z / Y)) / Z (Z / Y) is a constant (mult) which is calculated at parse time, so: X / Y = (X * mult) / Z The division by Z can be replaced by a shift since Z is a power of 2: X / Y = (X * mult) >> shift As long, as X < Z the results will not be off by more than 1. Link: https://lkml.kernel.org/r/20211029232410.3494196-1-kaleshsingh@google.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01Merge tag 'Smack-for-5.16' of https://github.com/cschaufler/smack-nextLinus Torvalds
Pull smack updates from Casey Schaufler: "Multiple corrections to smackfs: - a change for overlayfs support that corrects the initial attributes on created files - code clean-up for netlabel processing - several fixes in smackfs for a variety of reasons - Errors reported by W=1 have been addressed All told, nothing challenging" * tag 'Smack-for-5.16' of https://github.com/cschaufler/smack-next: smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi smackfs: use __GFP_NOFAIL for smk_cipso_doi() Smack: fix W=1 build warnings smack: remove duplicated hook function Smack:- Use overlay inode label in smack_inode_copy_up() smack: Guard smack_ipv6_lock definition within a SMACK_IPV6_PORT_LABELING block smackfs: Fix use-after-free in netlbl_catmap_walk()
2021-11-01Merge tag 'fallthrough-fixes-clang-5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo A. R. Silva: "Fix some fall-through warnings when building with Clang and -Wimplicit-fallthrough" * tag 'fallthrough-fixes-clang-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: pcmcia: db1xxx_ss: Fix fall-through warning for Clang MIPS: Fix fall-through warnings for Clang scsi: st: Fix fall-through warning for Clang
2021-11-01Merge tag 'kspp-misc-fixes-5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull hardening fixes and cleanups from Gustavo A. R. Silva: "Various hardening fixes and cleanups that I've been collecting during the last development cycle: Fix -Wcast-function-type error: - firewire: Remove function callback casts (Oscar Carter) Fix application of sizeof operator: - firmware/psci: fix application of sizeof to pointer (jing yangyang) Replace open coded instances with size_t saturating arithmetic helpers: - assoc_array: Avoid open coded arithmetic in allocator arguments (Len Baker) - writeback: prefer struct_size over open coded arithmetic (Len Baker) - aio: Prefer struct_size over open coded arithmetic (Len Baker) - dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic (Len Baker) Flexible array transformation: - KVM: PPC: Replace zero-length array with flexible array member (Len Baker) Use 2-factor argument multiplication form: - nouveau/svm: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva) - xfs: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)" * tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: firewire: Remove function callback casts nouveau/svm: Use kvcalloc() instead of kvzalloc() firmware/psci: fix application of sizeof to pointer dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic KVM: PPC: Replace zero-length array with flexible array member aio: Prefer struct_size over open coded arithmetic writeback: prefer struct_size over open coded arithmetic xfs: Use kvcalloc() instead of kvzalloc() assoc_array: Avoid open coded arithmetic in allocator arguments
2021-11-01Merge tag 'seccomp-v5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "These are x86-specific, but I carried these since they're also seccomp-specific. This flips the defaults for spec_store_bypass_disable and spectre_v2_user from "seccomp" to "prctl", as enough time has passed to allow system owners to have updated the defensive stances of their various workloads, and it's long overdue to unpessimize seccomp threads. Extensive rationale and details are in Andrea's main patch. Summary: - set spec_store_bypass_disable & spectre_v2_user to prctl (Andrea Arcangeli)" * tag 'seccomp-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: x86: deduplicate the spectre_v2_user documentation x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
2021-11-01Merge tag 'overflow-v5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull overflow updates from Kees Cook: "The end goal of the current buffer overflow detection work[0] is to gain full compile-time and run-time coverage of all detectable buffer overflows seen via array indexing or memcpy(), memmove(), and memset(). The str*() family of functions already have full coverage. While much of the work for these changes have been on-going for many releases (i.e. 0-element and 1-element array replacements, as well as avoiding false positives and fixing discovered overflows[1]), this series contains the foundational elements of several related buffer overflow detection improvements by providing new common helpers and FORTIFY_SOURCE changes needed to gain the introspection required for compiler visibility into array sizes. Also included are a handful of already Acked instances using the helpers (or related clean-ups), with many more waiting at the ready to be taken via subsystem-specific trees[2]. The new helpers are: - struct_group() for gaining struct member range introspection - memset_after() and memset_startat() for clearing to the end of structures - DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in structs Also included is the beginning of the refactoring of FORTIFY_SOURCE to support memcpy() introspection, fix missing and regressed coverage under GCC, and to prepare to fix the currently broken Clang support. Finishing this work is part of the larger series[0], but depends on all the false positives and buffer overflow bug fixes to have landed already and those that depend on this series to land. As part of the FORTIFY_SOURCE refactoring, a set of both a compile-time and run-time tests are added for FORTIFY_SOURCE and the mem*()-family functions respectively. The compile time tests have found a legitimate (though corner-case) bug[6] already. Please note that the appearance of "panic" and "BUG" in the FORTIFY_SOURCE refactoring are the result of relocating existing code, and no new use of those code-paths are expected nor desired. Finally, there are two tree-wide conversions for 0-element arrays and flexible array unions to gain sane compiler introspection coverage that result in no known object code differences. After this series (and the changes that have now landed via netdev and usb), we are very close to finally being able to build with -Warray-bounds and -Wzero-length-bounds. However, due corner cases in GCC[3] and Clang[4], I have not included the last two patches that turn on these options, as I don't want to introduce any known warnings to the build. Hopefully these can be solved soon" Link: https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/ [0] Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE [1] Link: https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/ [2] Link: https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/ [3] Link: https://bugs.llvm.org/show_bug.cgi?id=51682 [4] Link: https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/ [5] Link: https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/ [6] * tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) fortify: strlen: Avoid shadowing previous locals compiler-gcc.h: Define __SANITIZE_ADDRESS__ under hwaddress sanitizer treewide: Replace 0-element memcpy() destinations with flexible arrays treewide: Replace open-coded flex arrays in unions stddef: Introduce DECLARE_FLEX_ARRAY() helper btrfs: Use memset_startat() to clear end of struct string.h: Introduce memset_startat() for wiping trailing members and padding xfrm: Use memset_after() to clear padding string.h: Introduce memset_after() for wiping trailing members/padding lib: Introduce CONFIG_MEMCPY_KUNIT_TEST fortify: Add compile-time FORTIFY_SOURCE tests fortify: Allow strlen() and strnlen() to pass compile-time known lengths fortify: Prepare to improve strnlen() and strlen() warnings fortify: Fix dropped strcpy() compile-time write overflow check fortify: Explicitly disable Clang support fortify: Move remaining fortify helpers into fortify-string.h lib/string: Move helper functions out of string.c compiler_types.h: Remove __compiletime_object_size() cm4000_cs: Use struct_group() to zero struct cm4000_dev region can: flexcan: Use struct_group() to zero struct flexcan_regs regions ...
2021-11-01Merge tag 'hardening-v5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull compiler hardening updates from Kees Cook: "These are various compiler-related hardening feature updates. Notable is the addition of an explicit limited rationale for, and deprecation schedule of, gcc-plugins. gcc-plugins: - remove support for GCC 4.9 and older (Ard Biesheuvel) - remove duplicate include in gcc-common.h (Ye Guojin) - Explicitly document purpose and deprecation schedule (Kees Cook) - Remove cyc_complexity (Kees Cook) instrumentation: - Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO (Kees Cook) Clang LTO: - kallsyms: strip LTO suffixes from static functions (Nick Desaulniers)" * tag 'hardening-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: remove duplicate include in gcc-common.h gcc-plugins: Remove cyc_complexity gcc-plugins: Explicitly document purpose and deprecation schedule kallsyms: strip LTO suffixes from static functions gcc-plugins: remove support for GCC 4.9 and older hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO
2021-11-01Merge tag 'cpu-to-thread_info-v5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull thread_info update to move 'cpu' back from task_struct from Kees Cook: "Cross-architecture update to move task_struct::cpu back into thread_info on arm64, x86, s390, powerpc, and riscv. All Acked by arch maintainers. Quoting Ard Biesheuvel: 'Move task_struct::cpu back into thread_info Keeping CPU in task_struct is problematic for architectures that define raw_smp_processor_id() in terms of this field, as it requires linux/sched.h to be included, which causes a lot of pain in terms of circular dependencies (aka 'header soup') This series moves it back into thread_info (where it came from) for all architectures that enable THREAD_INFO_IN_TASK, addressing the header soup issue as well as some pointless differences in the implementations of task_cpu() and set_task_cpu()'" * tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: riscv: rely on core code to keep thread_info::cpu updated powerpc: smp: remove hack to obtain offset of task_struct::cpu sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y powerpc: add CPU field to struct thread_info s390: add CPU field to struct thread_info x86: add CPU field to struct thread_info arm64: add CPU field to struct thread_info
2021-11-01Merge tag 'm68k-for-v5.16-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - A small comma vs semicolon cleanup - defconfig updates * tag 'm68k-for-v5.16-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v5.15-rc1 m68k: muldi3: Use semicolon instead of comma
2021-11-01Merge tag 'for-5.16/parisc-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "Lots of new features and fixes: - Added TOC (table of content) support, which is a debugging feature which is either initiated by pressing the TOC button or via command in the BMC. If pressed the Linux built-in KDB/KGDB will be called (Sven Schnelle) - Fix CONFIG_PREEMPT (Sven) - Fix unwinder on 64-bit kernels (Sven) - Various kgdb fixes (Sven) - Added KFENCE support (me) - Switch to ARCH_STACKWALK implementation (me) - Fix ptrace check on syscall return (me) - Fix kernel crash with fixmaps on PA1.x machines (me) - Move thread_info into task struct, aka CONFIG_THREAD_INFO_IN_TASK (me) - Updated defconfigs - Smaller cleanups, including Makefile cleanups (Masahiro Yamada), use kthread_run() macro (Cai Huoqing), use swap() macro (Yihao Han)" * tag 'for-5.16/parisc-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (36 commits) parisc: Fix set_fixmap() on PA1.x CPUs parisc: Use swap() to swap values in setup_bootmem() parisc: Update defconfigs parisc: decompressor: clean up Makefile parisc: decompressor: remove repeated depenency of misc.o parisc: Remove unused constants from asm-offsets.c parisc/ftrace: use static key to enable/disable function graph tracer parisc/ftrace: set function trace function parisc: Make use of the helper macro kthread_run() parisc: mark xchg functions notrace parisc: enhance warning regarding usage of O_NONBLOCK parisc: Drop ifdef __KERNEL__ from non-uapi kernel headers parisc: Use PRIV_USER and PRIV_KERNEL in ptrace.h parisc: Use PRIV_USER in syscall.S parisc/kgdb: add kgdb_roundup() to make kgdb work with idle polling parisc: Move thread_info into task struct parisc: add support for TOC (transfer of control) parisc/firmware: add functions to retrieve TOC data parisc: add PIM TOC data structures parisc: move virt_map macro to assembly.h ...
2021-11-01net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.cJean Sacren
In one if branch, (ec->rx_coalesce_usecs != 0) is checked. When it is checked again in two more places, it is always false and has no effect on the whole check expression. We should remove it in both places. In another if branch, (ec->use_adaptive_rx_coalesce != 0) is checked. When it is checked again, it is always false. We should remove the entire branch with it. In addition we might as well let C precedence dictate by getting rid of two pairs of parentheses in the neighboring lines in order to keep expressions on both sides of '||' in balance with checkpatch warning silenced. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Link: https://lore.kernel.org/r/20211031012728.8325-1-sakiwit@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's the usual summary below, but the highlights are support for the Armv8.6 timer extensions, KASAN support for asymmetric MTE, the ability to kexec() with the MMU enabled and a second attempt at switching to the generic pfn_valid() implementation. Summary: - Support for the Arm8.6 timer extensions, including a self-synchronising view of the system registers to elide some expensive ISB instructions. - Exception table cleanup and rework so that the fixup handlers appear correctly in backtraces. - A handful of miscellaneous changes, the main one being selection of CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK. - More mm and pgtable cleanups. - KASAN support for "asymmetric" MTE, where tag faults are reported synchronously for loads (via an exception) and asynchronously for stores (via a register). - Support for leaving the MMU enabled during kexec relocation, which significantly speeds up the operation. - Minor improvements to our perf PMU drivers. - Improvements to the compat vDSO build system, particularly when building with LLVM=1. - Preparatory work for handling some Coresight TRBE tracing errata. - Cleanup and refactoring of the SVE code to pave the way for SME support in future. - Ensure SCS pages are unpoisoned immediately prior to freeing them when KASAN is enabled for the vmalloc area. - Try moving to the generic pfn_valid() implementation again now that the DMA mapping issue from last time has been resolved. - Numerous improvements and additions to our FPSIMD and SVE selftests" [ armv8.6 timer updates were in a shared branch and already came in through -tip in the timer pull - Linus ] * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits) arm64: Select POSIX_CPU_TIMERS_TASK_WORK arm64: Document boot requirements for FEAT_SME_FA64 arm64/sve: Fix warnings when SVE is disabled arm64/sve: Add stub for sve_max_virtualisable_vl() arm64: errata: Add detection for TRBE write to out-of-range arm64: errata: Add workaround for TSB flush failures arm64: errata: Add detection for TRBE overwrite in FILL mode arm64: Add Neoverse-N2, Cortex-A710 CPU part definition selftests: arm64: Factor out utility functions for assembly FP tests arm64: vmlinux.lds.S: remove `.fixup` section arm64: extable: add load_unaligned_zeropad() handler arm64: extable: add a dedicated uaccess handler arm64: extable: add `type` and `data` fields arm64: extable: use `ex` for `exception_table_entry` arm64: extable: make fixup_exception() return bool arm64: extable: consolidate definitions arm64: gpr-num: support W registers arm64: factor out GPR numbering helpers arm64: kvm: use kvm_exception_table_entry arm64: lib: __arch_copy_to_user(): fold fixups into body ...
2021-11-01Merge branch 'accurate-memory-charging-for-msg_zerocopy'Jakub Kicinski
Talal Ahmad says: ==================== Accurate Memory Charging For MSG_ZEROCOPY This series improves the accuracy of msg_zerocopy memory accounting. At present, when msg_zerocopy is used memory is charged twice for the data - once when user space allocates it, and then again within __zerocopy_sg_from_iter. The memory charging in the kernel is excessive because data is held in user pages and is never actually copied to skb fragments. This leads to incorrectly inflated memory statistics for programs passing MSG_ZEROCOPY. We reduce this inaccuracy by introducing the notion of "pure" zerocopy SKBs - where all the frags in the SKB are backed by pinned userspace pages, and none are backed by copied pages. For such SKBs, tracked via the new SKBFL_PURE_ZEROCOPY flag, we elide sk_mem_charge/uncharge calls, leading to more accurate accounting. However, SKBs can also be coalesced by the stack at present, potentially leading to "impure" SKBs. We restrict this coalescing so it can only happen within the sendmsg() system call itself, for the most recently allocated SKB. While this can lead to a small degree of double-charging of memory, this case does not arise often in practice for workloads that set MSG_ZEROCOPY. Testing verified that memory usage in the kernel is lowered. Instrumentation with counters also showed that accounting at time charging and uncharging is balanced. ==================== Link: https://lore.kernel.org/r/20211030020542.3870542-1-mailtalalahmad@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01net: avoid double accounting for pure zerocopy skbsTalal Ahmad
Track skbs with only zerocopy data and avoid charging them to kernel memory to correctly account the memory utilization for msg_zerocopy. All of the data in such skbs is held in user pages which are already accounted to user. Before this change, they are charged again in kernel in __zerocopy_sg_from_iter. The charging in kernel is excessive because data is not being copied into skb frags. This excessive charging can lead to kernel going into memory pressure state which impacts all sockets in the system adversely. Mark pure zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove charge/uncharge for data in such skbs. Initially, an skb is marked pure zerocopy when it is empty and in zerocopy path. skb can then change from a pure zerocopy skb to mixed data skb (zerocopy and copy data) if it is at tail of write queue and there is room available in it and non-zerocopy data is being sent in the next sendmsg call. At this time sk_mem_charge is done for the pure zerocopied data and the pure zerocopy flag is unmarked. We found that this happens very rarely on workloads that pass MSG_ZEROCOPY. A pure zerocopy skb can later be coalesced into normal skb if they are next to each other in queue but this patch prevents coalescing from happening. This avoids complexity of charging when skb downgrades from pure zerocopy to mixed. This is also rare. In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge for SKB_TRUESIZE(MAX_TCP_HEADER) is done for sk_mem_charge in tcp_skb_entail for an skb without data. Testing with the msg_zerocopy.c benchmark between two hosts(100G nics) with zerocopy showed that before this patch the 'sock' variable in memory.stat for cgroup2 that tracks sum of sk_forward_alloc, sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this change it is 0. This is due to no charge to sk_forward_alloc for zerocopy data and shows memory utilization for kernel is lowered. Signed-off-by: Talal Ahmad <talalahmad@google.com> Acked-by: Arjun Roy <arjunroy@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01tcp: rename sk_wmem_free_skbTalal Ahmad
sk_wmem_free_skb() is only used by TCP. Rename it to make this clear, and move its declaration to include/net/tcp.h Signed-off-by: Talal Ahmad <talalahmad@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01netdevsim: fix uninit value in nsim_drv_configure_vfs()Jakub Kicinski
Build bot points out that I missed initializing ret after refactoring. Reported-by: kernel test robot <lkp@intel.com> Fixes: 1c401078bcf3 ("netdevsim: move details of vf config to dev") Link: https://lore.kernel.org/r/20211101221845.3188490-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01selftests/bpf: Fix also no-alu32 strobemeta selftestAndrii Nakryiko
Previous fix aded bpf_clamp_umax() helper use to re-validate boundaries. While that works correctly, it introduces more branches, which blows up past 1 million instructions in no-alu32 variant of strobemeta selftests. Switching len variable from u32 to u64 also fixes the issue and reduces the number of validated instructions, so use that instead. Fix this patch and bpf_clamp_umax() removed, both alu32 and no-alu32 selftests pass. Fixes: 0133c20480b1 ("selftests/bpf: Fix strobemeta selftest regression") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211101230118.1273019-1-andrii@kernel.org
2021-11-01Merge tag 'x86_sgx_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX updates from Borislav Petkov: "Add a SGX_IOC_VEPC_REMOVE ioctl to the /dev/sgx_vepc virt interface with which EPC pages can be put back into their uninitialized state without having to reopen /dev/sgx_vepc, which could not be possible anymore after startup due to security policies" * tag 'x86_sgx_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx/virt: implement SGX_IOC_VEPC_REMOVE ioctl x86/sgx/virt: extract sgx_vepc_remove_page
2021-11-01Merge tag 'x86_sev_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Export sev_es_ghcb_hv_call() so that HyperV Isolation VMs can use it too - Non-urgent fixes and cleanups * tag 'x86_sev_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV x86/sev: Allow #VC exceptions on the VC2 stack x86/sev: Fix stack type check in vc_switch_off_ist() x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c x86/sev: Carve out HV call's return value verification
2021-11-01Merge tag 'x86_misc_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 changes from Borislav Petkov: - Use the proper interface for the job: get_unaligned() instead of memcpy() in the insn decoder - A randconfig build fix * tag 'x86_misc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Use get_unaligned() instead of memcpy() x86/Kconfig: Fix an unused variable error in dell-smm-hwmon
2021-11-01Merge tag 'x86_cpu_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Start checking a CPUID bit on AMD Zen3 which states that the CPU clears the segment base when a null selector is written. Do the explicit detection on older CPUs, zen2 and hygon specifically, which have the functionality but do not advertize the CPUID bit. Factor in the presence of a hypervisor underneath the kernel and avoid doing the explicit check there which the HV might've decided to not advertize for migration safety reasons, or similar. - Add support for a new X86 CPU vendor: VORTEX. Needed for whitelisting those CPUs in the hardware vulnerabilities detection - Force the compiler to use rIP-relative addressing in the fallback path of static_cpu_has(), in order to avoid unnecessary register pressure * tag 'x86_cpu_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Fix migration safety with X86_BUG_NULL_SEL x86/CPU: Add support for Vortex CPUs x86/umip: Downgrade warning messages to debug loglevel x86/asm: Avoid adding register pressure for the init case in static_cpu_has() x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
2021-11-01Merge tag 'x86_cleanups_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: "The usual round of random minor fixes and cleanups all over the place" * tag 'x86_cleanups_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Makefile: Remove unneeded whitespaces before tabs x86/of: Kill unused early_init_dt_scan_chosen_arch() x86: Fix misspelled Kconfig symbols x86/Kconfig: Remove references to obsolete Kconfig symbols x86/smp: Remove unnecessary assignment to local var freq_scale
2021-11-01Merge tag 'x86_cc_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull generic confidential computing updates from Borislav Petkov: "Add an interface called cc_platform_has() which is supposed to be used by confidential computing solutions to query different aspects of the system. The intent behind it is to unify testing of such aspects instead of having each confidential computing solution add its own set of tests to code paths in the kernel, leading to an unwieldy mess" * tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide: Replace the use of mem_encrypt_active() with cc_platform_has() x86/sev: Replace occurrences of sev_es_active() with cc_platform_has() x86/sev: Replace occurrences of sev_active() with cc_platform_has() x86/sme: Replace occurrences of sme_active() with cc_platform_has() powerpc/pseries/svm: Add a powerpc version of cc_platform_has() x86/sev: Add an x86 version of cc_platform_has() arch/cc: Introduce a function to check for confidential computing features x86/ioremap: Selectively build arch override encryption functions
2021-11-01Merge tag 'x86_build_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build fix from Borislav Petkov: - A single fix to hdimage when using older versions of mtools * tag 'x86_build_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Fix make hdimage with older versions of mtools
2021-11-01Merge tag 'ras_core_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Get rid of a bunch of function pointers used in MCA land in favor of normal functions. This is in preparation of making the MCA code noinstr-aware - When the kernel copies data from user addresses and it encounters a machine check, a SIGBUS is sent to that process. Change this action to either an -EFAULT which is returned to the user or a short write, making the recovery action a lot more user-friendly * tag 'ras_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Sort mca_config members to get rid of unnecessary padding x86/mce: Get rid of the ->quirk_no_way_out() indirect call x86/mce: Get rid of msr_ops x86/mce: Get rid of machine_check_vector x86/mce: Get rid of the mce_severity function pointer x86/mce: Drop copyin special case for #MC x86/mce: Change to not send SIGBUS error during copy from user
2021-11-01tracing/osnoise: Remove PREEMPT_RT ifdefs from inside functionsDaniel Bristot de Oliveira
Remove CONFIG_PREEMPT_RT from inside functions, avoiding compilation problems in the future. Link: https://lkml.kernel.org/r/37ee0881b033cdc513efc84ebea26cf77880c8c2.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Remove STACKTRACE ifdefs from inside functionsDaniel Bristot de Oliveira
Remove CONFIG_STACKTRACE from inside functions, avoiding compilation problems in the future. Link: https://lkml.kernel.org/r/3465cca2f28e1ba602a1fc8bdb28d12950b5226e.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Allow multiple instances of the same tracerDaniel Bristot de Oliveira
Currently, the user can start only one instance of timerlat/osnoise tracers and the tracers cannot run in parallel. As starting point to add more flexibility, let's allow the same tracer to run on different trace instances. The workload will start when the first trace_array (instance) is registered and stop when the last instance is unregistered. So, while this patch allows the same tracer to run in multiple instances (e.g., two instances running osnoise), it still does not allow instances of timerlat and osnoise in parallel (e.g., one timerlat and osnoise). That is because the osnoise: events have different behavior depending on which tracer is enabled (osnoise or timerlat). Enabling the parallel usage of these two tracers is in my TODO list. Link: https://lkml.kernel.org/r/38c8f14b613492a4f3f938d9d3bf0b063b72f0f0.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Remove TIMERLAT ifdefs from inside functionsDaniel Bristot de Oliveira
Remove CONFIG_TIMERLAT_TRACER from inside functions, avoiding compilation problems in the future. Link: https://lkml.kernel.org/r/8245abb5a112d249f5da6c1df499244ad9e647bc.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Support a list of trace_array *trDaniel Bristot de Oliveira
osnoise/timerlat were built to run a single instance, and for this, a single variable is enough to store the current struct trace_array *tr with information about the tracing instance. This is done via the *osnoise_trace variable. A trace_array represents a trace instance. In preparation to support multiple instances, replace the *osnoise_trace variable with an RCU protected list of instances. The operations that refer to an instance now propagate to all elements of the list (all instances). Also, replace the osnoise_busy variable with a check if the list has elements (busy). No functional change is expected with this patch, i.e., only one instance is allowed yet. Link: https://lkml.kernel.org/r/91d006e889b9a5d1ff258fe6077f021ae3f26372.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Use start/stop_per_cpu_kthreads() on osnoise_cpus_write()Daniel Bristot de Oliveira
When writing a new CPU mask via osnoise/cpus, if the tracer is running, the workload is restarted to follow the new cpumask. The restart is currently done using osnoise_workload_start/stop(), which disables the workload *and* the instrumentation. However, disabling the instrumentation is not necessary. Calling start/stop_per_cpu_kthreads() is enough to apply the new osnoise/cpus config. Link: https://lkml.kernel.org/r/ee633e82867c5b88851aa6040522a799c0034486.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Split workload start from the tracer startDaniel Bristot de Oliveira
In preparation from supporting multiple trace instances, create workload start/stop specific functions. No functional change. Link: https://lkml.kernel.org/r/74b090971e9acdd13625be1c28ef3270d2275e77.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Improve comments about barrier need for NMI callbacksDaniel Bristot de Oliveira
trace_osnoise_callback_enabled is used by ftrace_nmi_enter/exit() to know when to call the NMI callback. The barrier is used to avoid having callbacks enabled before the resetting date during the start or to touch the values after stopping the tracer. Link: https://lkml.kernel.org/r/a413b8f14aa9312fbd1ba99f96225a8aed831053.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01tracing/osnoise: Do not follow tracing_cpumaskDaniel Bristot de Oliveira
In preparation to support multiple instances, decouple the osnoise/timelat workload from instance-specific tracing_cpumask. Different instances can have conflicting cpumasks, making osnoise workload management needlessly complex. Osnoise already has its global cpumask. I also thought about using the first instance mask, but the "first" instance could be removed before the others. This also fixes the problem that changing the tracing_mask was not re-starting the trace. Link: https://lkml.kernel.org/r/169a71bcc919ce3ab53ae6f9ca5cde57fffaf9c6.1635702894.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-01Merge tag 'efi-next-for-v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Borislav Petkov: "The last EFI pull request which is forwarded through the tip tree, for v5.16. From now on, Ard will be sending stuff directly. Disable EFI runtime services by default on PREEMPT_RT, while adding the ability to re-enable them on demand by passing efi=runtime on the command line" * tag 'efi-next-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: Allow efi=runtime efi: Disable runtime services on RT
2021-11-01Merge tag 'edac_updates_for_v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "A small pile of EDAC updates which the autumn wind blew my way. :) - amd64_edac: Add support for three-rank interleaving mode which is present on AMD zen2 servers - The usual fixes and cleanups all over EDAC land" * tag 'edac_updates_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell EDAC/ti: Remove redundant error messages EDAC/amd64: Handle three rank interleaving mode EDAC/mc_sysfs: Print MC-scope sysfs counters unsigned EDAC/al_mc: Make use of the helper function devm_add_action_or_reset() EDAC/mc: Replace strcpy(), sprintf() and snprintf() with strscpy() or scnprintf()
2021-11-01mm: fix mismerge of folio page flag manipulatorsLinus Torvalds
I had missed a semantic conflict between commit d389a4a81155 ("mm: Add folio flag manipulation functions") from the folio tree, and commit eac96c3efdb5 ("mm: filemap: check if THP has hwpoisoned subpage for PMD page fault") that added a new set of page flags. My build tests had too many options enabled, which hid this issue. But if you didn't have MEMORY_FAILURE or TRANSPARENT_HUGEPAGE enabled, you'd end up with build errors like this: include/linux/page-flags.h:806:29: error: macro "PAGEFLAG_FALSE" requires 2 arguments, but only 1 given 806 | PAGEFLAG_FALSE(HasHWPoisoned) | ^ due to the missing lowercase name used for folio function naming. Fixes: 49f8275c7d92 ("Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Yang Shi <shy828301@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-01bpf: Add missing map_delete_elem method to bloom filter mapEric Dumazet
Without it, kernel crashes in map_delete_elem(), as reported by syzbot. BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 72c97067 P4D 72c97067 PUD 1e20c067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN CPU: 0 PID: 6518 Comm: syz-executor196 Not tainted 5.15.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:0x0 Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. RSP: 0018:ffffc90002bafcb8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: 1ffff92000575f9f RCX: 0000000000000000 RDX: 1ffffffff1327aba RSI: 0000000000000000 RDI: ffff888025a30c00 RBP: ffffc90002baff08 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff818525d8 R11: 0000000000000000 R12: ffffffff8993d560 R13: ffff888025a30c00 R14: ffff888024bc0000 R15: 0000000000000000 FS: 0000555557491300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000070189000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: map_delete_elem kernel/bpf/syscall.c:1220 [inline] __sys_bpf+0x34f1/0x5ee0 kernel/bpf/syscall.c:4606 __do_sys_bpf kernel/bpf/syscall.c:4719 [inline] __se_sys_bpf kernel/bpf/syscall.c:4717 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4717 do_syscall_x64 arch/x86/entry/common.c:50 [inline] Fixes: 9330986c0300 ("bpf: Add bloom filter map implementation") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211031171353.4092388-1-eric.dumazet@gmail.com
2021-11-01Merge branch '"map_extra" and bloom filter fixups'Alexei Starovoitov
Joanne Koong says: ==================== There are 3 patches in this patchset: 1/3 - Bloom filter naming fixups (kernel/bpf/bloom_filter.c) 2/3 - Add alignment padding for map_extra, rearrange fields in bpf_map struct to consolidate holes 3/3 - Bloom filter tests (prog_tests/bloom_filter_map): Add test for successful userspace calls, some refactoring to use bpf_create_map instead of bpf_create_map_xattr v1 -> v2: * In prog_tests/bloom_filter_map: remove unneeded line break, also change the inner_map_test to use bpf_create_map instead of bpf_create_map_xattr. * Add acked-bys to commit messages ==================== Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-11-01perf bpf: Pull in bpf_program__get_prog_info_linear()Dave Marchevsky
To prepare for impending deprecation of libbpf's bpf_program__get_prog_info_linear(), pull in the function and associated helpers into the perf codebase and migrate existing uses to the perf copy. Since libbpf's deprecated definitions will still be visible to perf, it is necessary to rename perf's definitions. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20211011082031.4148337-4-davemarchevsky@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-01selftests/bpf: Add bloom map success test for userspace callsJoanne Koong
This patch has two changes: 1) Adds a new function "test_success_cases" to test successfully creating + adding + looking up a value in a bloom filter map from the userspace side. 2) Use bpf_create_map instead of bpf_create_map_xattr in the "test_fail_cases" and test_inner_map to make the code look cleaner. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211029224909.1721024-4-joannekoong@fb.com
2021-11-01bpf: Add alignment padding for "map_extra" + consolidate holesJoanne Koong
This patch makes 2 changes regarding alignment padding for the "map_extra" field. 1) In the kernel header, "map_extra" and "btf_value_type_id" are rearranged to consolidate the hole. Before: struct bpf_map { ... u32 max_entries; /* 36 4 */ u32 map_flags; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ u64 map_extra; /* 48 8 */ int spin_lock_off; /* 56 4 */ int timer_off; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 id; /* 64 4 */ int numa_node; /* 68 4 */ ... bool frozen; /* 117 1 */ /* XXX 10 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ ... struct work_struct work; /* 144 72 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mutex freeze_mutex; /* 216 144 */ /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */ u64 writecnt; /* 360 8 */ /* size: 384, cachelines: 6, members: 26 */ /* sum members: 354, holes: 2, sum holes: 14 */ /* padding: 16 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 10 */ } __attribute__((__aligned__(64))); After: struct bpf_map { ... u32 max_entries; /* 36 4 */ u64 map_extra; /* 40 8 */ u32 map_flags; /* 48 4 */ int spin_lock_off; /* 52 4 */ int timer_off; /* 56 4 */ u32 id; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ int numa_node; /* 64 4 */ ... bool frozen /* 113 1 */ /* XXX 14 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ ... struct work_struct work; /* 144 72 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mutex freeze_mutex; /* 216 144 */ /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */ u64 writecnt; /* 360 8 */ /* size: 384, cachelines: 6, members: 26 */ /* sum members: 354, holes: 1, sum holes: 14 */ /* padding: 16 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 14 */ } __attribute__((__aligned__(64))); 2) Add alignment padding to the bpf_map_info struct More details can be found in commit 36f9814a494a ("bpf: fix uapi hole for 32 bit compat applications") Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211029224909.1721024-3-joannekoong@fb.com
2021-11-01bpf: Bloom filter map naming fixupsJoanne Koong
This patch has two changes in the kernel bloom filter map implementation: 1) Change the names of map-ops functions to include the "bloom_map" prefix. As Martin pointed out on a previous patchset, having generic map-ops names may be confusing in tracing and in perf-report. 2) Drop the "& 0xF" when getting nr_hash_funcs, since we already ascertain that no other bits in map_extra beyond the first 4 bits can be set. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211029224909.1721024-2-joannekoong@fb.com
2021-11-01Merge branch 'introduce dummy BPF STRUCT_OPS'Alexei Starovoitov
Hou Tao says: ==================== Hi, Currently the test of BPF STRUCT_OPS depends on the specific bpf implementation (e.g, tcp_congestion_ops), but it can not cover all basic functionalities (e.g, return value handling), so introduce a dummy BPF STRUCT_OPS for test purpose. Instead of loading a userspace-implemeted bpf_dummy_ops map into kernel and calling the specific function by writing to sysfs provided by bpf_testmode.ko, only loading bpf_dummy_ops related prog into kernel and calling these prog by bpf_prog_test_run(). The latter is more flexible and has no dependency on extra kernel module. Now the return value handling is supported by test_1(...) ops, and passing multiple arguments is supported by test_2(...) ops. If more is needed, test_x(...) ops can be added afterwards. Comments are always welcome. Regards, Hou Change Log: v4: * add Acked-by tags in patch 1~4 * patch 2: remove unncessary comments and update commit message accordingly * patch 4: remove unnecessary nr checking in dummy_ops_init_args() v3: https://www.spinics.net/lists/bpf/msg48303.html * rebase on bpf-next * address comments for Martin, mainly include: merge patch 3 & patch 4 in v2, fix names of btf ctx access check helpers, handle CONFIG_NET, fix leak in dummy_ops_init_args(), and simplify bpf_dummy_init() * patch 4: use a loop to check args in test_dummy_multiple_args() v2: https://www.spinics.net/lists/bpf/msg47948.html * rebase on bpf-next * add test_2(...) ops to test the passing of multiple arguments * a new patch (patch #2) is added to factor out ctx access helpers * address comments from Martin & Andrii v1: https://www.spinics.net/lists/bpf/msg46787.html RFC: https://www.spinics.net/lists/bpf/msg46117.html ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-11-01selftests/bpf: Add test cases for struct_ops progHou Tao
Running a BPF_PROG_TYPE_STRUCT_OPS prog for dummy_st_ops::test_N() through bpf_prog_test_run(). Four test cases are added: (1) attach dummy_st_ops should fail (2) function return value of bpf_dummy_ops::test_1() is expected (3) pointer argument of bpf_dummy_ops::test_1() works as expected (4) multiple arguments passed to bpf_dummy_ops::test_2() are correct Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211025064025.2567443-5-houtao1@huawei.com
2021-11-01bpf: Add dummy BPF STRUCT_OPS for test purposeHou Tao
Currently the test of BPF STRUCT_OPS depends on the specific bpf implementation of tcp_congestion_ops, but it can not cover all basic functionalities (e.g, return value handling), so introduce a dummy BPF STRUCT_OPS for test purpose. Loading a bpf_dummy_ops implementation from userspace is prohibited, and its only purpose is to run BPF_PROG_TYPE_STRUCT_OPS program through bpf(BPF_PROG_TEST_RUN). Now programs for test_1() & test_2() are supported. The following three cases are exercised in bpf_dummy_struct_ops_test_run(): (1) test and check the value returned from state arg in test_1(state) The content of state is copied from userspace pointer and copied back after calling test_1(state). The user pointer is saved in an u64 array and the array address is passed through ctx_in. (2) test and check the return value of test_1(NULL) Just simulate the case in which an invalid input argument is passed in. (3) test multiple arguments passing in test_2(state, ...) 5 arguments are passed through ctx_in in form of u64 array. The first element of array is userspace pointer of state and others 4 arguments follow. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211025064025.2567443-4-houtao1@huawei.com
2021-11-01bpf: Factor out helpers for ctx access checkingHou Tao
Factor out two helpers to check the read access of ctx for raw tp and BTF function. bpf_tracing_ctx_access() is used to check the read access to argument is valid, and bpf_tracing_btf_ctx_access() checks whether the btf type of argument is valid besides the checking of argument read. bpf_tracing_btf_ctx_access() will be used by the following patch. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211025064025.2567443-3-houtao1@huawei.com
2021-11-01bpf: Factor out a helper to prepare trampoline for struct_ops progHou Tao
Factor out a helper bpf_struct_ops_prepare_trampoline() to prepare trampoline for BPF_PROG_TYPE_STRUCT_OPS prog. It will be used by .test_run callback in following patch. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211025064025.2567443-2-houtao1@huawei.com