summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-21selftests/mm: extend and rename uffd pagemap testPeter Xu
Extend it to all types of mem, meanwhile add one parallel test when EVENT_FORK is enabled, where uffd-wp bits should be persisted rather than dropped. Since at it, rename the test to "wp-fork" to better show what it means. Making the new test called "wp-fork-with-event". Before: Testing pagemap on anon... done After: Testing wp-fork on anon... done Testing wp-fork on shmem... done Testing wp-fork on shmem-private... done Testing wp-fork on hugetlb... done Testing wp-fork on hugetlb-private... done Testing wp-fork-with-event on anon... done Testing wp-fork-with-event on shmem... done Testing wp-fork-with-event on shmem-private... done Testing wp-fork-with-event on hugetlb... done Testing wp-fork-with-event on hugetlb-private... done Link: https://lkml.kernel.org/r/20230417195317.898696-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-21selftests/mm: add a few options for uffd-unit-testPeter Xu
Namely: "-f": add a wildcard filter for tests to run "-l": list tests rather than running any "-h": help msg Link: https://lkml.kernel.org/r/20230417195317.898696-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-21mm/hugetlb: fix uffd-wp bit lost when unsharing happensPeter Xu
When we try to unshare a pinned page for a private hugetlb, uffd-wp bit can get lost during unsharing. When above condition met, one can lose uffd-wp bit on the privately mapped hugetlb page. It allows the page to be writable even if it should still be wr-protected. I assume it can mean data loss. This should be very rare, only if an unsharing happened on a private hugetlb page with uffd-wp protected (e.g. in a child which shares the same page with parent with UFFD_FEATURE_EVENT_FORK enabled). When I wrote the reproducer (provided in the last patch) I needed to use the newest gup_test cmd introduced by David to trigger it because I don't even know another way to do a proper RO longerm pin. Besides that, it needs a bunch of other conditions all met: (1) hugetlb being mapped privately, (2) userfaultfd registered with WP and EVENT_FORK, (3) the user app fork()s, then, (4) RO longterm pin onto a wr-protected anonymous page. If it's not impossible to hit in production I'd say extremely rare. Link: https://lkml.kernel.org/r/20230417195317.898696-3-peterx@redhat.com Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-21mm/hugetlb: fix uffd-wp during fork()Peter Xu
Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins", v2. This patch (of 6): There're a bunch of things that were wrong: - Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp() rather than huge_pte_uffd_wp(). - When copying over a pte, we should drop uffd-wp bit when !EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)). - When doing early CoW for private hugetlb (e.g. when the parent page was pinned), uffd-wp bit should be properly carried over if necessary. No bug reported probably because most people do not even care about these corner cases, but they are still bugs and can be exposed by the recent unit tests introduced, so fix all of them in one shot. Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-21kasan: fix lockdep report invalid wait contextZqiang
For kernels built with the following options and booting CONFIG_SLUB=y CONFIG_DEBUG_LOCKDEP=y CONFIG_PROVE_LOCKING=y CONFIG_PROVE_RAW_LOCK_NESTING=y [ 0.523115] [ BUG: Invalid wait context ] [ 0.523315] 6.3.0-rc1-yocto-standard+ #739 Not tainted [ 0.523649] ----------------------------- [ 0.523663] swapper/0/0 is trying to lock: [ 0.523663] ffff888035611360 (&c->lock){....}-{3:3}, at: put_cpu_partial+0x2e/0x1e0 [ 0.523663] other info that might help us debug this: [ 0.523663] context-{2:2} [ 0.523663] no locks held by swapper/0/0. [ 0.523663] stack backtrace: [ 0.523663] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc1-yocto-standard+ #739 [ 0.523663] Call Trace: [ 0.523663] <IRQ> [ 0.523663] dump_stack_lvl+0x64/0xb0 [ 0.523663] dump_stack+0x10/0x20 [ 0.523663] __lock_acquire+0x6c4/0x3c10 [ 0.523663] lock_acquire+0x188/0x460 [ 0.523663] put_cpu_partial+0x5a/0x1e0 [ 0.523663] __slab_free+0x39a/0x520 [ 0.523663] ___cache_free+0xa9/0xc0 [ 0.523663] qlist_free_all+0x7a/0x160 [ 0.523663] per_cpu_remove_cache+0x5c/0x70 [ 0.523663] __flush_smp_call_function_queue+0xfc/0x330 [ 0.523663] generic_smp_call_function_single_interrupt+0x13/0x20 [ 0.523663] __sysvec_call_function+0x86/0x2e0 [ 0.523663] sysvec_call_function+0x73/0x90 [ 0.523663] </IRQ> [ 0.523663] <TASK> [ 0.523663] asm_sysvec_call_function+0x1b/0x20 [ 0.523663] RIP: 0010:default_idle+0x13/0x20 [ 0.523663] RSP: 0000:ffffffff83e07dc0 EFLAGS: 00000246 [ 0.523663] RAX: 0000000000000000 RBX: ffffffff83e1e200 RCX: ffffffff82a83293 [ 0.523663] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8119a6b1 [ 0.523663] RBP: ffffffff83e07dc8 R08: 0000000000000001 R09: ffffed1006ac0d66 [ 0.523663] R10: ffff888035606b2b R11: ffffed1006ac0d65 R12: 0000000000000000 [ 0.523663] R13: ffffffff83e1e200 R14: ffffffff84a7d980 R15: 0000000000000000 [ 0.523663] default_idle_call+0x6c/0xa0 [ 0.523663] do_idle+0x2e1/0x330 [ 0.523663] cpu_startup_entry+0x20/0x30 [ 0.523663] rest_init+0x152/0x240 [ 0.523663] arch_call_rest_init+0x13/0x40 [ 0.523663] start_kernel+0x331/0x470 [ 0.523663] x86_64_start_reservations+0x18/0x40 [ 0.523663] x86_64_start_kernel+0xbb/0x120 [ 0.523663] secondary_startup_64_no_verify+0xe0/0xeb [ 0.523663] </TASK> The local_lock_irqsave() is invoked in put_cpu_partial() and happens in IPI context, due to the CONFIG_PROVE_RAW_LOCK_NESTING=y (the LD_WAIT_CONFIG not equal to LD_WAIT_SPIN), so acquire local_lock in IPI context will trigger above calltrace. This commit therefore moves qlist_free_all() from hard-irq context to task context. Link: https://lkml.kernel.org/r/20230327120019.1027640-1-qiang1.zhang@intel.com Signed-off-by: Zqiang <qiang1.zhang@intel.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-21PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_argDexuan Cui
4 commits are involved here: A (2016): commit 0de8ce3ee8e3 ("PCI: hv: Allocate physically contiguous hypercall params buffer") B (2017): commit be66b6736591 ("PCI: hv: Use page allocation for hbus structure") C (2019): commit 877b911a5ba0 ("PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer") D (2018): commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments") Patch D introduced the per-CPU hypercall input page "hyperv_pcpu_input_arg" in 2018. With patch D, we no longer need the per-Hyper-V-PCI-bus hypercall input page "hbus->retarget_msi_interrupt_params" that was added in patch A, and the issue addressed by patch B is no longer an issue, and we can also get rid of patch C. The change here is required for PCI device assignment to work for Confidential VMs (CVMs) running without a paravisor, because otherwise we would have to call set_memory_decrypted() for "hbus->retarget_msi_interrupt_params" before calling the hypercall HVCALL_RETARGET_INTERRUPT. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Link: https://lore.kernel.org/r/20230421013025.17152-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-21NFS: Cleanup unused rpc_clnt variableBenjamin Coddington
The root rpc_clnt is not used here, clean it up. Fixes: 4dc73c679114 ("NFSv4: keep state manager thread active if swap is enabled") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-04-21Revert "ACPICA: Events: Support fixed PCIe wake event"Linus Torvalds
This reverts commit 5c62d5aab8752e5ee7bfbe75ed6060db1c787f98. This broke wake-on-lan for multiple people, and for much too long. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217069 Link: https://lore.kernel.org/all/754225a2-95a9-2c36-1886-7da1a78308c2@loongson.cn/ Link: https://github.com/acpica/acpica/pull/866 Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Jianmin Lv <lvjianmin@loongson.cn> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Bob Moore <robert.moore@intel.com> Cc: stable@kernel.org # 6.2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-21ubifs: Fix memory leak in do_renameMårten Lindahl
If renaming a file in an encrypted directory, function fscrypt_setup_filename allocates memory for a file name. This name is never used, and before returning to the caller the memory for it is not freed. When running kmemleak on it we see that it is registered as a leak. The report below is triggered by a simple program 'rename' that renames a file in an encrypted directory: unreferenced object 0xffff888101502840 (size 32): comm "rename", pid 9404, jiffies 4302582475 (age 435.735s) backtrace: __kmem_cache_alloc_node __kmalloc fscrypt_setup_filename do_rename ubifs_rename vfs_rename do_renameat2 To fix this we can remove the call to fscrypt_setup_filename as it's not needed. Fixes: 278d9a243635f26 ("ubifs: Rename whiteout atomically") Reported-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Mårten Lindahl <marten.lindahl@axis.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
2023-04-21ubifs: Free memory for tmpfile nameMårten Lindahl
When opening a ubifs tmpfile on an encrypted directory, function fscrypt_setup_filename allocates memory for the name that is to be stored in the directory entry, but after the name has been copied to the directory entry inode, the memory is not freed. When running kmemleak on it we see that it is registered as a leak. The report below is triggered by a simple program 'tmpfile' just opening a tmpfile: unreferenced object 0xffff88810178f380 (size 32): comm "tmpfile", pid 509, jiffies 4294934744 (age 1524.742s) backtrace: __kmem_cache_alloc_node __kmalloc fscrypt_setup_filename ubifs_tmpfile vfs_tmpfile path_openat Free this memory after it has been copied to the inode. Signed-off-by: Mårten Lindahl <marten.lindahl@axis.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
2023-04-21NFS: set varaiable nfs_netfs_debug_id storage-class-specifier to staticTom Rix
smatch reports fs/nfs/fscache.c:260:10: warning: symbol 'nfs_netfs_debug_id' was not declared. Should it be static? This variable is only used in its defining file, so it should be static Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-04-21ubi: Fix return value overwrite issue in try_write_vid_and_data()Wang YanQing
The commit 2d78aee426d8 ("UBI: simplify LEB write and atomic LEB change code") adds helper function, try_write_vid_and_data(), to simplify the code, but this helper function has bug, it will return 0 (success) when ubi_io_write_vid_hdr() or the ubi_io_write_data() return error number (-EIO, etc), because the return value of ubi_wl_put_peb() will overwrite the original return value. This issue will cause unexpected data loss issue, because the caller of this function and UBIFS willn't know the data is lost. Fixes: 2d78aee426d8 ("UBI: simplify LEB write and atomic LEB change code") Cc: stable@vger.kernel.org Signed-off-by: Wang YanQing <udknight@gmail.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2023-04-21ubifs: Remove return in compr_exit()Yangtao Li
It's redundant, let's remove it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2023-04-21docs: kvm: vfio: Suggest KVM_DEV_VFIO_GROUP_ADD vs VFIO_GROUP_GET_DEVICE_FD ↵Yi Liu
ordering as some vfio_device's open_device op requires kvm pointer and kvm pointer set is part of GROUP_ADD. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230421053611.55839-1-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-04-21ubi: Simplify bool conversionYang Li
./drivers/mtd/ubi/build.c:1261:33-38: WARNING: conversion to bool not needed here Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4061 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2023-04-21selftests/bpf: verifier/value_ptr_arith converted to inline assemblyEduard Zingerman
Test verifier/value_ptr_arith automatically converted to use inline assembly. Test cases "sanitation: alu with different scalars 2" and "sanitation: alu with different scalars 3" are updated to avoid -ENOENT as return value, as __retval() annotation only supports numeric literals. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-25-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/value_illegal_alu converted to inline assemblyEduard Zingerman
Test verifier/value_illegal_alu automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-24-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/unpriv converted to inline assemblyEduard Zingerman
Test verifier/unpriv semi-automatically converted to use inline assembly. The verifier/unpriv.c had to be split in two parts: - the bulk of the tests is in the progs/verifier_unpriv.c; - the single test that needs `struct bpf_perf_event_data` definition is in the progs/verifier_unpriv_perf.c. The tests above can't be in a single file because: - first requires inclusion of the filter.h header (to get access to BPF_ST_MEM macro, inline assembler does not support this isntruction); - the second requires vmlinux.h, which contains definitions conflicting with filter.h. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-23-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/subreg converted to inline assemblyEduard Zingerman
Test verifier/subreg automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-22-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/spin_lock converted to inline assemblyEduard Zingerman
Test verifier/spin_lock automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-21-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/sock converted to inline assemblyEduard Zingerman
Test verifier/sock automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-20-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/search_pruning converted to inline assemblyEduard Zingerman
Test verifier/search_pruning automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-19-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/runtime_jit converted to inline assemblyEduard Zingerman
Test verifier/runtime_jit automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-18-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/regalloc converted to inline assemblyEduard Zingerman
Test verifier/regalloc automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-17-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/ref_tracking converted to inline assemblyEduard Zingerman
Test verifier/ref_tracking automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-16-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/map_ptr_mixing converted to inline assemblyEduard Zingerman
Test verifier/map_ptr_mixing automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-13-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/map_in_map converted to inline assemblyEduard Zingerman
Test verifier/map_in_map automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-12-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/lwt converted to inline assemblyEduard Zingerman
Test verifier/lwt automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-11-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/loops1 converted to inline assemblyEduard Zingerman
Test verifier/loops1 automatically converted to use inline assembly. There are a few modifications for the converted tests. "tracepoint" programs do not support test execution, change program type to "xdp" (which supports test execution) for the following tests that have __retval tags: - bounded loop, count to 4 - bonded loop containing forward jump Also, remove the __retval tag for test: - bounded loop, count from positive unknown to 4 As it's return value is a random number. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-10-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/jeq_infer_not_null converted to inline assemblyEduard Zingerman
Test verifier/jeq_infer_not_null automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/direct_packet_access converted to inline assemblyEduard Zingerman
Test verifier/direct_packet_access automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/d_path converted to inline assemblyEduard Zingerman
Test verifier/d_path automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/ctx converted to inline assemblyEduard Zingerman
Test verifier/ctx automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/btf_ctx_access converted to inline assemblyEduard Zingerman
Test verifier/btf_ctx_access automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/bpf_get_stack converted to inline assemblyEduard Zingerman
Test verifier/bpf_get_stack automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/bounds converted to inline assemblyEduard Zingerman
Test verifier/bounds automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: Add notion of auxiliary programs for test_loaderEduard Zingerman
In order to express test cases that use bpf_tail_call() intrinsic it is necessary to have several programs to be loaded at a time. This commit adds __auxiliary annotation to the set of annotations supported by test_loader.c. Programs marked as auxiliary are always loaded but are not treated as a separate test. For example: void dummy_prog1(void); struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 4); __uint(key_size, sizeof(int)); __array(values, void (void)); } prog_map SEC(".maps") = { .values = { [0] = (void *) &dummy_prog1, }, }; SEC("tc") __auxiliary __naked void dummy_prog1(void) { asm volatile ("r0 = 42; exit;"); } SEC("tc") __description("reference tracking: check reference or tail call") __success __retval(0) __naked void check_reference_or_tail_call(void) { asm volatile ( "r2 = %[prog_map] ll;" "r3 = 0;" "call %[bpf_tail_call];" "r0 = 0;" "exit;" :: __imm(bpf_tail_call), : __clobber_all); } Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21Merge branch 'bpf: add netfilter program type'Alexei Starovoitov
Florian Westphal says: ==================== Changes since last version: - rework test case in last patch wrt. ctx->skb dereference etc (Alexei) - pacify bpf ci tests, netfilter program type missed string translation in libbpf helper. This still uses runtime btf walk rather than extending the btf trace array as Alexei suggested, I would do this later (or someone else can). v1 cover letter: Add minimal support to hook bpf programs to netfilter hooks, e.g. PREROUTING or FORWARD. For this the most relevant parts for registering a netfilter hook via the in-kernel api are exposed to userspace via bpf_link. The new program type is 'tracing style', i.e. there is no context access rewrite done by verifier, the function argument (struct bpf_nf_ctx) isn't stable. There is no support for direct packet access, dynptr api should be used instead. With this its possible to build a small test program such as: #include "vmlinux.h" extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags, struct bpf_dynptr *ptr__uninit) __ksym; extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, uint32_t offset, void *buffer, uint32_t buffer__sz) __ksym; SEC("netfilter") int nf_test(struct bpf_nf_ctx *ctx) { struct nf_hook_state *state = ctx->state; struct sk_buff *skb = ctx->skb; const struct iphdr *iph, _iph; const struct tcphdr *th, _th; struct bpf_dynptr ptr; if (bpf_dynptr_from_skb(skb, 0, &ptr)) return NF_DROP; iph = bpf_dynptr_slice(&ptr, 0, &_iph, sizeof(_iph)); if (!iph) return NF_DROP; th = bpf_dynptr_slice(&ptr, iph->ihl << 2, &_th, sizeof(_th)); if (!th) return NF_DROP; bpf_printk("accept %x:%d->%x:%d, hook %d ifin %d\n", iph->saddr, bpf_ntohs(th->source), iph->daddr, bpf_ntohs(th->dest), state->hook, state->in->ifindex); return NF_ACCEPT; } Then, tail /sys/kernel/tracing/trace_pipe. Changes since v3: - uapi: remove 'reserved' struct member, s/prio/priority (Alexei) - add ctx access test cases (Alexei, see last patch) - some arm32 can only handle cmpxchg on u32 (build bot) - Fix kdoc annotations (Simon Horman) - bpftool: prefer p_err, not fprintf (Quentin) - add test cases in separate patch Changes since v2: 1. don't WARN when user calls 'bpftool loink detach' twice restrict attachment to ip+ip6 families, lets relax this later in case arp/bridge/netdev are needed too. 2. show netfilter links in 'bpftool net' output as well. Changes since v1: 1. Don't fail to link when CONFIG_NETFILTER=n (build bot) 2. Use test_progs instead of test_verifier (Alexei) Changes since last RFC version: 1. extend 'bpftool link show' to print prio/hooknum etc 2. extend 'nft list hooks' so it can print the bpf program id 3. Add an extra patch to artificially restrict bpf progs with same priority. Its fine from a technical pov but it will cause ordering issues (most recent one comes first). Can be removed later. 4. Add test_run support for netfilter prog type and a small extension to verifier tests to make sure we can't return verdicts like NF_STOLEN. 5. Alter the netfilter part of the bpf_link uapi struct: - add flags/reserved members. Not used here except returning errors when they are nonzero. Plan is to allow the bpf_link users to enable netfilter defrag or conntrack engine by setting feature flags at link create time in the future. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: add missing netfilter return value and ctx access testsFlorian Westphal
Extend prog_tests with two test cases: # ./test_progs --allow=verifier_netfilter_retcode #278/1 verifier_netfilter_retcode/bpf_exit with invalid return code. test1:OK #278/2 verifier_netfilter_retcode/bpf_exit with valid return code. test2:OK #278/3 verifier_netfilter_retcode/bpf_exit with valid return code. test3:OK #278/4 verifier_netfilter_retcode/bpf_exit with invalid return code. test4:OK #278 verifier_netfilter_retcode:OK This checks that only accept and drop (0,1) are permitted. NF_QUEUE could be implemented later if we can guarantee that attachment of such programs can be rejected if they get attached to a pf/hook that doesn't support async reinjection. NF_STOLEN could be implemented via trusted helpers that can guarantee that the skb will eventually be free'd. v4: test case for bpf_nf_ctx access checks, requested by Alexei Starovoitov. v5: also check ctx->{state,skb} can be dereferenced (Alexei). # ./test_progs --allow=verifier_netfilter_ctx #281/1 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK #281/2 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK #281/3 verifier_netfilter_ctx/netfilter invalid context access, past end of ctx:OK #281/4 verifier_netfilter_ctx/netfilter invalid context, write:OK #281/5 verifier_netfilter_ctx/netfilter valid context read and invalid write:OK #281/6 verifier_netfilter_ctx/netfilter test prog with skb and state read access:OK #281/7 verifier_netfilter_ctx/netfilter test prog with skb and state read access @unpriv:OK #281 verifier_netfilter_ctx:OK Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED This checks: 1/2: partial reads of ctx->{skb,state} are rejected 3. read access past sizeof(ctx) is rejected 4. write to ctx content, e.g. 'ctx->skb = NULL;' is rejected 5. ctx->state content cannot be altered 6. ctx->state and ctx->skb can be dereferenced 7. ... same program fails for unpriv (CAP_NET_ADMIN needed). Link: https://lore.kernel.org/bpf/20230419021152.sjq4gttphzzy6b5f@dhcp-172-26-102-232.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20230420201655.77kkgi3dh7fesoll@MacBook-Pro-6.local/ Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-8-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpf: add test_run support for netfilter program typeFlorian Westphal
add glue code so a bpf program can be run using userspace-provided netfilter state and packet/skb. Default is to use ipv4:output hook point, but this can be overridden by userspace. Userspace provided netfilter state is restricted, only hook and protocol families can be overridden and only to ipv4/ipv6. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-7-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21tools: bpftool: print netfilter link infoFlorian Westphal
Dump protocol family, hook and priority value: $ bpftool link 2: netfilter prog 14 ip input prio -128 pids install(3264) 5: netfilter prog 14 ip6 forward prio 21 pids a.out(3387) 9: netfilter prog 14 ip prerouting prio 123 pids a.out(5700) 10: netfilter prog 14 ip input prio 21 pids test2(5701) v2: Quentin Monnet suggested to also add 'bpftool net' support: $ bpftool net xdp: tc: flow_dissector: netfilter: ip prerouting prio 21 prog_id 14 ip input prio -128 prog_id 14 ip input prio 21 prog_id 14 ip forward prio 21 prog_id 14 ip output prio 21 prog_id 14 ip postrouting prio 21 prog_id 14 'bpftool net' only dumps netfilter link type, links are sorted by protocol family, hook and priority. v5: fix bpf ci failure: libbpf needs small update to prog_type_name[] and probe_prog_load helper. v4: don't fail with -EOPNOTSUPP in libbpf probe_prog_load, update prog_type_name[] with "netfilter" entry (bpf ci) v3: fix bpf.h copy, 'reserved' member was removed (Alexei) use p_err, not fprintf (Quentin) Suggested-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/eeeaac99-9053-90c2-aa33-cc1ecb1ae9ca@isovalent.com/ Reviewed-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-6-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21netfilter: disallow bpf hook attachment at same priorityFlorian Westphal
This is just to avoid ordering issues between multiple bpf programs, this could be removed later in case it turns out to be too cautious. bpf prog could still be shared with non-bpf hook, otherwise we'd have to make conntrack hook registration fail just because a bpf program has same priority. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-5-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21netfilter: nfnetlink hook: dump bpf prog idFlorian Westphal
This allows userspace ("nft list hooks") to show which bpf program is attached to which hook. Without this, user only knows bpf prog is attached at prio x, y, z at INPUT and FORWARD, but can't tell which program is where. v4: kdoc fixups (Simon Horman) Link: https://lore.kernel.org/bpf/ZEELzpNCnYJuZyod@corigine.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-4-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpf: minimal support for programs hooked into netfilter frameworkFlorian Westphal
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs that will be invoked via the NF_HOOK() points in the ip stack. Invocation incurs an indirect call. This is not a necessity: Its possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the program invocation with the same method already done for xdp progs. This isn't done here to keep the size of this chunk down. Verifier restricts verdicts to either DROP or ACCEPT. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpf: add bpf_link support for BPF_NETFILTER programsFlorian Westphal
Add bpf_link support skeleton. To keep this reviewable, no bpf program can be invoked yet, if a program is attached only a c-stub is called and not the actual bpf program. Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig. Uapi example usage: union bpf_attr attr = { }; attr.link_create.prog_fd = progfd; attr.link_create.attach_type = 0; /* unused */ attr.link_create.netfilter.pf = PF_INET; attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN; attr.link_create.netfilter.priority = -128; err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr)); ... this would attach progfd to ipv4:input hook. Such hook gets removed automatically if the calling program exits. BPF_NETFILTER program invocation is added in followup change. NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it allows to tell userspace which program is attached at the given hook when user runs 'nft hook list' command rather than just the priority and not-very-helpful 'this hook runs a bpf prog but I can't tell which one'. Will also be used to disallow registration of two bpf programs with same priority in a followup patch. v4: arm32 cmpxchg only supports 32bit operand s/prio/priority/ v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if more use cases pop up (arptables, ebtables, netdev ingress/egress etc). Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpftool: Update doc to explain struct_ops register subcommand.Kui-Feng Lee
The "struct_ops register" subcommand now allows for an optional *LINK_DIR* to be included. This specifies the directory path where bpftool will pin struct_ops links with the same name as their corresponding map names. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230420002822.345222-2-kuifeng@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpftool: Register struct_ops with a link.Kui-Feng Lee
You can include an optional path after specifying the object name for the 'struct_ops register' subcommand. Since the commit 226bc6ae6405 ("Merge branch 'Transit between BPF TCP congestion controls.'") has been accepted, it is now possible to create a link for a struct_ops. This can be done by defining a struct_ops in SEC(".struct_ops.link") to make libbpf returns a real link. If we don't pin the links before leaving bpftool, they will disappear. To instruct bpftool to pin the links in a directory with the names of the maps, we need to provide the path of that directory. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230420002822.345222-1-kuifeng@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21Merge tag 'for-6.3-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two patches fixing the problem with aync discard. The default settings had a low IOPS limit and processing a large batch to discard would take a long time. On laptops this can cause increased power consumption due to disk activity. As async discard has been on by default since 6.2 this likely affects a lot of users. Summary: - increase the default IOPS limit 10x which reportedly helped - setting the sysfs IOPS value to 0 now does not throttle anymore allowing the discards to be processed at full speed. Previously there was an arbitrary 6 hour target for processing the pending batch" * tag 'for-6.3-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: reinterpret async discard iops_limit=0 as no delay btrfs: set default discard iops_limit to 1000
2023-04-21Merge tag 'block-6.3-2023-04-21' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fix from Jens Axboe: "Just a single revert of a patch from the 6.3 series" * tag 'block-6.3-2023-04-21' of git://git.kernel.dk/linux: Revert "block: Merge bio before checking ->cached_rq"
2023-04-21Merge tag 'char-misc-6.3-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some last-minute tiny driver fixes for 6.3-final. They include fixes for some fpga and iio drivers: - fpga bridge driver fix - fpga dfl error reporting fix - fpga m10bmc driver fix - fpga xilinx driver fix - iio light driver fix - iio dac fwhandle leak fix - iio adc driver fix All of these have been in linux-next for a few weeks with no reported problems" * tag 'char-misc-6.3-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: iio: light: tsl2772: fix reading proximity-diodes from device tree fpga: bridge: properly initialize bridge device before populating children iio: dac: ad5755: Add missing fwnode_handle_put() iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger() fpga: xilinx-pr-decoupler: Use readl wrapper instead of pure readl fpga: dfl-pci: Drop redundant pci_enable_pcie_error_reporting() fpga: m10bmc-sec: Fix rsu_send_data() to return FW_UPLOAD_ERR_HW_ERROR